Abstract: Speech input is increasingly adopted as an intuitive interface for various embedded mobile devices. Cloud-based solutions provide powerful spoken language understanding (SLU) capabilities but introduce privacy risks, as sensitive information may be processed remotely. To address these concerns, disentanglement-based encoders have been developed to strip sensitive data from audio signals, allowing SLU without compromising privacy. However, such encoders are often memory-intensive and computationally demanding, limiting their practicality on resource-constrained devices. Based on extensive experiments, this paper observes a key phenomenon: SLU relies on global information from the entire sentence, whereas the recognition of privacy-sensitive words depends predominantly on local information. Building on this observation, we implemented a simple encoder designed for efficient privacy-preserving SLU offloading (SILENCE) on an STM32H7 microcontroller and evaluated its performance under various privacy threat scenarios. Results demonstrate that SILENCE provides speech intent classification accuracy and privacy protection competitive with those of more complex encoders. At the same time, it achieves a speedup of up to 53.3 times and a 134.1-fold reduction in memory footprint, marking the first time that privacy-preserving SLU services have been realized on a microcontroller with only 1 MB of memory.
Keywords: spoken language understanding (SLU); resource-constrained devices; privacy-preserving; microcontroller unit; speech intent classification; memory efficient
Abstract: Facial recognition and voiceprint recognition are two core biometric technologies for identity verification and are widely applied in various scenarios. However, research on the correlation between the features of these two modalities remains relatively limited. This study explores the commonality between voice and facial features. Unlike existing studies, which seek solutions directly in the way feature correspondences are realised, this study starts from the characteristics of identity features and derives a universal feature space from an accurate representation of identity information. The distance relationship between identity features in facial recognition tasks is introduced as prior information, ensuring that identity-related relationships are preserved when feature correspondence methods are applied. During voiceprint feature extraction, parameters pre-trained on speech recognition tasks are adjusted so that the model better represents identity information. The experimental results demonstrate that the speech transformer model, when used as a voiceprint signal extractor with the same feature correspondence method, achieves a significant improvement on the verification task compared with the time-delay network. In addition, the method achieves performance similar to that of existing methods on the validation task with lower data requirements and no additional classifier training. Future studies could further incorporate prior knowledge of voiceprint features to enhance the performance of cross-modal feature matching.
Abstract: The increasing complexity of the space environment has made threat assessment and instantaneous parameter estimation of space objects a research hotspot in the field of space situational awareness (SSA). Current monitoring methods predominantly rely on a single sensor, either ground-based radar or a space-based optical camera, and suffer from drawbacks such as a one-dimensional observation angle, insufficient timeliness, and difficulty in estimating target state parameters. To address these issues, this paper first establishes a unified imaging model for spaceborne inverse synthetic aperture radar (ISAR) and optical cameras. The equations relating the geometric projections of space targets on the optical and ISAR imaging planes to their instantaneous attitude and dynamic parameters are derived. Then, based on the whale migration algorithm (WMA), a method is proposed for estimating the instantaneous state parameters of space targets by fusing spaceborne optical and radar images. Finally, the equivalent rotational angular velocity and angular acceleration obtained from the fusion estimation are used to perform geometric calibration and anomaly detection. The proposed method applies spaceborne optical and radar fusion to the imaging and perception of space targets. It overcomes the limited observation range of traditional ground-based observation methods as well as the limited observation angle of a single spaceborne sensor, and it is suitable for instantaneous state parameter estimation of most space targets with complex motions. Simulation results demonstrate that the proposed method accurately estimates the instantaneous state parameters of space targets, thereby enabling effective geometric calibration of ISAR images and detection of anomalous motions of space objects.
Keywords: optical-and-radar image fusion; spaceborne detection; instantaneous state parameter estimation; cross-range geometric calibration; anomaly detection
Abstract: With the explosive growth in the number of parameters of machine learning models and the scale of training datasets, a single computing node can no longer meet the computational demands of large artificial intelligence (AI) models. Distributed machine learning systems have become the primary platform for supporting AI model training. The training time can be reduced by implementing parallel training across tens of thousands of computing nodes. In particular, data parallelism is a widely used parallel training framework in distributed training. It splits the training dataset across many computing nodes and then trains the model collaboratively through periodic parameter synchronization among those nodes. Since computing nodes need to transmit a large amount of data to complete the parameter synchronization before each round of iteration, communication becomes the key factor that affects computational efficiency. Traditional parameter synchronization strategies suffer from excessive communication rounds or congestion at the receiver’s link. Parameter synchronization strategies based on in-network aggregation, in turn, face issues such as the limited computing and storage capabilities of the switches and congestion at server output ports. To this end, a hybrid parameter synchronization strategy termed PASSING (hybrid Parameter Synchronization Strategy with In-host and In-network Aggregation) is proposed. It first pre-aggregates the model parameters locally within the host. The locally aggregated parameters are then sent to the programmable switches, which perform the global parameter synchronization. This approach not only ensures efficient communication between the small-scale computing nodes within the host but also reduces the computational and communication load on the switch side. 
We built a testbed using the multi-GPU (Graphics Processing Unit) servers and programmable switches and deployed PASSING in this testbed. The experimental results demonstrate that PASSING, when compared to traditional parameter synchronization strategies, enhances training performance by up to 65.25%, thus effectively accelerating the speed of distributed training.
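The two-stage idea described for PASSING can be illustrated with a minimal sketch. It assumes that synchronization is a plain element-wise sum of gradient vectors; the function names (`in_host_aggregate`, `in_network_aggregate`, `hybrid_sync`) and the toy cluster are hypothetical and are not taken from the paper. The sketch only shows that pre-aggregating inside each host gives the same result as a flat sum while the switch receives one vector per host instead of one per GPU.

```python
from typing import List

Vector = List[float]


def add_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise sum of equally sized gradient vectors."""
    return [sum(column) for column in zip(*vectors)]


def in_host_aggregate(gpu_gradients: List[Vector]) -> Vector:
    """Stage 1 (hypothetical name): sum the gradients of all GPUs in one host."""
    return add_vectors(gpu_gradients)


def in_network_aggregate(host_sums: List[Vector]) -> Vector:
    """Stage 2 (hypothetical name): the switch sums one vector per host."""
    return add_vectors(host_sums)


def hybrid_sync(cluster: List[List[Vector]]) -> Vector:
    """Pre-aggregate inside every host, then aggregate across hosts."""
    host_sums = [in_host_aggregate(host) for host in cluster]
    return in_network_aggregate(host_sums)


# Toy cluster (assumed values): two hosts with two GPUs each.
cluster = [
    [[1.0, 2.0], [3.0, 4.0]],  # host 0
    [[5.0, 6.0], [7.0, 8.0]],  # host 1
]

# Reference: a flat sum over all four GPU gradients.
flat = add_vectors([grad for host in cluster for grad in host])
hybrid = hybrid_sync(cluster)

# Vectors arriving at the switch: one per host versus one per GPU.
vectors_at_switch_hybrid = len(cluster)
vectors_at_switch_flat = sum(len(host) for host in cluster)
```

In this toy setting the switch handles two vectors instead of four, which is the kind of load reduction the abstract attributes to in-host pre-aggregation.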
Abstract: When discontinuous Galerkin time-domain (DGTD) methods with explicit Runge-Kutta (RK) schemes are applied to the analysis of electromagnetic (EM) problems involving lossy media, the time-step size is strictly limited by the electric and magnetic conductivity. Although the influence of lossy media on stability has been verified by numerical experiments, it lacks rigorous theoretical support. Therefore, the stability condition of explicit RK-DGTD methods in lossy media is established through theoretical derivation. The upper limit of the time-step size is estimated, and numerical results demonstrate its accuracy. In addition, to improve computational efficiency, an implicit-explicit time integration scheme is proposed for DGTD methods. The lossy terms are discretized using an implicit Crank-Nicolson (CN) scheme, while the remaining terms are discretized using an explicit RK scheme. The time stability of the proposed method is theoretically proven to be unaffected by material loss, and numerical experiments validate its reliability and efficiency.
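Why treating the lossy term implicitly removes the conductivity-dependent step limit can be seen on a scalar model problem. The sketch below is not the DGTD scheme: it assumes the pure loss equation du/dt = -sigma * u and uses forward Euler (a one-stage explicit RK method) as a stand-in for the explicit RK scheme. It compares the per-step amplification factors; a factor above 1 means the numerical solution grows without bound.

```python
def explicit_euler_factor(sigma: float, dt: float) -> float:
    """Amplification factor |1 - sigma*dt| of forward Euler on du/dt = -sigma*u."""
    return abs(1.0 - sigma * dt)


def crank_nicolson_factor(sigma: float, dt: float) -> float:
    """Amplification factor of Crank-Nicolson on du/dt = -sigma*u.

    For sigma >= 0 and dt > 0 this is always at most 1.
    """
    half = 0.5 * sigma * dt
    return abs((1.0 - half) / (1.0 + half))


# Assumed illustrative values: a strongly lossy medium and two step sizes.
sigma = 100.0
small_dt = 0.001  # satisfies the explicit limit dt <= 2 / sigma
large_dt = 0.1    # violates the explicit limit

explicit_small = explicit_euler_factor(sigma, small_dt)
explicit_large = explicit_euler_factor(sigma, large_dt)
implicit_large = crank_nicolson_factor(sigma, large_dt)
```

With the large step the explicit factor exceeds 1 while the Crank-Nicolson factor stays below 1, which mirrors the abstract's claim that the implicit treatment of the lossy terms makes stability independent of material loss.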
Abstract: Recognition based on short utterances is one of the challenges in the field of speaker recognition (SR) because of data scarcity and inaccurate feature extraction. For scenarios with limited data, this paper proposes a short-utterance speaker recognition network based on time-frequency (T-F) attention and convolutional enhancement for feature extraction and identity recognition. We introduce a time-frequency attention module and a convolution module into the transformer encoder to form a module called the time-frequency attention conformer (TFA-Conformer). It helps the model capture precise acoustic features by using information from T-F channels to calculate validity weights from global to local perspectives, thereby enabling the feature encoder to produce highly discriminative speaker embeddings under short-utterance conditions (3 s or less). We evaluate the proposed supervised training network on datasets under short-utterance conditions, where its recognition accuracy and other metrics improve by 4.837% on average over those of mainstream methods. Under conditions with shorter duration and less data, the proposed method shows a relative improvement of 2.799% on average. Furthermore, it requires fewer parameters and has lower computational complexity, making it not only suitable for short-utterance scenarios but also more lightweight.
Keywords: speaker recognition; short utterance; time-frequency domain; self-attention; Conformer; voiceprint features
Abstract: With the continuous advancement of technology, wireless communication is evolving toward higher levels and multiple dimensions. While this progression enhances communication quality, it also introduces significant technical challenges, including mutual information loss due to multi-user interference, worsened time-varying fading in networks, and an insufficient overload factor under limited resources. These issues indicate a critical need for coding theories and transmission schemes designed for complex communication scenarios. To address the loss of mutual information and the poor transmission performance caused by inter-user interference in two-user interference channels, this paper proposes a novel transmission scheme and decoding algorithm based on the Han-Kobayashi equivalence. The scheme combines multi-level superposition coded modulation with reversible matrix mapping, effectively improving communication performance in such channels. At the transmitter, multi-level superposition coded modulation is extended to two-user interference channels. Rate-compatible low-density parity-check codes are employed for layered encoding and combined with mapping transformations to enhance layered transmission gains. At the receiver, a nested multi-layer decoder structure and an interference channel-composite nesting decoder algorithm are designed, and Gaussian approximation is used to reduce computational complexity. Additionally, to further improve accuracy, the framework of the proposed scheme is extended with incremental redundancy-hybrid automatic repeat request. Simulation results show that the novel algorithm outperforms the iterative multistage decoding algorithm under both strong and weak interference conditions and that its performance is positively correlated with the number of layers. 
In addition, at the cost of moderate increases in computational complexity, the incremental redundancy-hybrid automatic repeat request enhanced algorithm achieves better performance under high signal-to-noise ratio conditions. These solutions provide practical options for real-world two-user interference channel communication systems.
Abstract: This paper proposes an improved particle swarm algorithm based on population interaction to tackle spatiotemporal uncertainty, complex constraint coupling, and dynamic adaptation in multi-UAV (Unmanned Aerial Vehicle) task allocation for post-disaster rescue. A bi-objective optimization model quantifies rescue value as a time-varying interval function and integrates a penalty mechanism with range constraints to capture the dynamic and multidimensional features of the environment. The proposed method departs from traditional centralized frameworks by introducing a main-auxiliary population co-evolution architecture. A problem-oriented initialization strategy ensures high-quality initial solutions. Additionally, a dual-modal update strategy driven by a learning library, combined with population interaction and local refinement search, balances convergence speed and population diversity. Experiments with 48 UAVs demonstrate an improvement of 11.8% to 26.1% in the dual hypervolume metric compared with recent algorithms. The results confirm the method’s superior robustness, efficiency, and adaptability, underscoring its theoretical and practical significance.
Abstract: Modern radar target detection often faces complex and changeable clutter environments. Traditional model-driven constant false alarm rate (CFAR) detectors are prone to model mismatch, and existing data-driven supervised deep learning methods require cumbersome and expensive labeling. In response to these problems, this paper proposes a clutter modeling method based on deep unsupervised variational networks. The method uses a variational autoencoder to learn the high-dimensional distribution features of radar echoes and thereby reconstruct and model complex clutter distributions in the range-Doppler (R-D) spectrum obtained after radar echo processing. First, a convolutional neural network (CNN) and a recurrent neural network (RNN) are introduced into the unsupervised inference-generation framework of the variational autoencoder. Reconstruction modeling of R-D spectra is achieved by exploiting the local feature capture ability of the CNN and the temporal correlation extraction ability of the RNN, respectively. Second, to fully capture the clutter distribution characteristics and the two-dimensional spatiotemporal information in different regions of the R-D spectrum, this paper proposes a clutter modeling method based on a spatiotemporal variational Transformer. This method introduces the Transformer architecture into the proposed deep unsupervised clutter modeling variational network and captures the global correlation of R-D spectral data through the self-attention mechanism of the Transformer. To fully explore the clutter distribution characteristics of R-D spectra in different scenarios and retain the two-dimensional spatiotemporal information of the original data, a switching mechanism and a two-dimensional position encoding mechanism are designed to match the Transformer architecture. 
Finally, combined with an out-of-distribution (OOD) detection strategy, this paper proposes a clutter modeling and radar target detection method based on deep unsupervised variational networks, in which the reconstruction likelihood of the unsupervised variational network characterizes how difficult each input sample is to reconstruct. The greater the reconstruction likelihood, the more similar the reconstructed sample is to the input sample. Therefore, the OOD score is defined from the reconstruction likelihood and serves as the basis for separating targets from clutter, which accomplishes the radar target detection task. Verification on simulation data shows that the proposed unsupervised clutter modeling method achieves fine reconstruction of the clutter distribution in the radar range-Doppler spectrum. Moreover, compared with the traditional CFAR method, the signal-to-clutter-plus-noise ratio (SCNR) required by the proposed method at a detection probability of 80% is improved by 5.6 dB.
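The last step, turning a reconstruction likelihood into a target-versus-clutter decision, can be sketched as follows. This is an assumption-laden illustration rather than the paper's detector: it takes the OOD score to be the negative reconstruction log-likelihood, and it sets the threshold as an empirical quantile of clutter-only scores for a chosen false alarm rate. The function names and the numbers are hypothetical.

```python
from typing import List


def threshold_from_clutter(clutter_scores: List[float],
                           false_alarm_rate: float) -> float:
    """Pick a threshold from clutter-only OOD scores.

    The threshold is the empirical (1 - false_alarm_rate) quantile, so only
    about that fraction of clutter samples scores above it.
    """
    ordered = sorted(clutter_scores)
    index = int(len(ordered) * (1.0 - false_alarm_rate))
    index = min(max(index, 0), len(ordered) - 1)
    return ordered[index]


def detect(scores: List[float], threshold: float) -> List[bool]:
    """Declare a target when the OOD score exceeds the threshold."""
    return [score > threshold for score in scores]


# Assumed scores: negative reconstruction log-likelihoods of clutter cells.
clutter_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
threshold = threshold_from_clutter(clutter_scores, false_alarm_rate=0.1)

# A well-reconstructed cell (low score) and a poorly reconstructed one.
decisions = detect([0.5, 15.0], threshold)
```

A cell that the clutter model reconstructs poorly receives a high score and is flagged as a target, which is the sense in which the abstract uses the reconstruction likelihood to separate targets from clutter.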
Abstract: The accuracy of time-frequency representation directly influences the interpretation of the intrinsic dynamics and functional significance of electroencephalogram (EEG) signals. To address the limitations of fixed scales and suboptimal regression term selection in existing multi-wavelet-based methods, this paper proposes a novel time-frequency representation framework based on scale-adaptive sparse multi-wavelets. The method adopts a joint sparse Bayesian learning and information entropy optimization framework to globally identify the optimal regression terms of the time-varying model, effectively avoiding the local convergence issues of traditional approaches. Furthermore, scales are allocated to the wavelet basis. The genetic algorithm is enhanced in three key aspects (optimal individual selection, particle swarm mutation, and population update) to optimize the scale. This achieves adaptive matching between the wavelet basis and the optimal scale, thus enhancing the ability of multiple wavelet bases to fit time-varying signals. Ultimately, the estimated time-varying parameters are transformed into accurate time-frequency representations through parametric spectral estimation. Experiments on three simulation models show at least a 23.08% reduction in parameter estimation error and a 2.93% improvement in time-frequency resolution. Compared with state-of-the-art algorithms, the method is strongly competitive in tracking time-varying parameters and extracting time-frequency features. On BCI Competition II data set III, our method enhances the detection of event-related desynchronization/event-related synchronization, with performance improving from 3.37 to 8.78. When combined with a simple convolutional neural network, it achieves 88.04% recognition accuracy on BCI Competition IV dataset 2b, comparable to that of more complex state-of-the-art models, which indirectly validates its effectiveness in time-frequency representation. 
Our method is designed from three perspectives: model structure optimization, algorithm enhancement, and basis function scaling. It jointly improves time-varying parameter estimation and time-frequency resolution, offering a novel methodology for EEG signal analysis.
Abstract: In very large scale integration (VLSI) design, efficient clock tree synthesis (CTS) is crucial for ensuring circuit performance and reliability. To address the co-optimization challenge of clock skew, latency, and power consumption in large-scale circuits, this paper proposes an efficient CTS method based on a bi-partition, four-branch H-like tree. In the bottom-up phase, the method first employs a greedy-based clustering (GBC) algorithm to enhance the fanout utilization of low-level buffers, significantly reducing the number of inserted buffers. Subsequently, it incorporates a buffer re-placement algorithm for the fine-grained control of local clock skew. During the top-down phase, it is first theoretically proven that uniformly inserting a specific number of buffers along a path minimizes clock latency, and a look-up table is constructed based on this principle to guide optimal buffer insertion. Next, the layout is vertically divided into two symmetrical half-regions from the clock source, and a four-branch H-like tree structure is constructed within each half-region. This structure not only applies the long-path buffer insertion algorithm to minimize global clock latency but also leverages its symmetry to merge buffers on symmetrical paths, further optimizing the buffer count while ensuring low clock skew and latency. Finally, to handle potential constraint violations during synthesis, the method first extracts insertable locations for buffers based on Boolean operations and then determines their optimal placement according to the properties of Manhattan rectangles. The proposed algorithm is validated on circuit instances with 1×10^5 to 2×10^5 flip-flops, and the results demonstrate its significant advantages. Compared to OpenROAD, our method reduces clock skew and power consumption by 32.3% and 29.9%, respectively. 
In comparison with GH-Tree, it achieves reductions of 59.9% in clock skew and 28.9% in power consumption, while maintaining a comparable global clock latency.
Keywords: bi-partition four-branch tree; buffer insertion; clock skew; clock latency; clock tree synthesis
Abstract: A design method for circularly polarized antenna arrays featuring high aperture efficiency and reconfigurable scattering characteristics is proposed. First, a reconfigurable antenna element is constructed using p-i-n diodes; it achieves two scattering states with a phase difference of 180° while maintaining stable circularly polarized radiation characteristics. Second, an overlapping feeding structure significantly reduces the number of diodes used in the antenna element, effectively improving the radiation performance. Finally, a 2 × 8 circularly polarized phased array is constructed based on this element, achieving a gain of 14.6 dBic and an aperture efficiency of 86.5% at 9.4 GHz. Additionally, the radiation beam of the array can be scanned within a range of ±45°, and its scattering beam can be effectively adjusted within a range of ±32° without compromising the stability of its circularly polarized radiation characteristics. The close agreement between the measured and simulated results verifies the effectiveness of the proposed design method.
Abstract: To address the problem of real-time gesture recognition in occluded environments, this paper proposes a real-time gesture recognition algorithm based on long range radio (LoRa) signals. Exploiting the low frequency band and good penetration of LoRa signals, the algorithm calculates the ratio of the signals from two receiving antennas and applies the short-time Fourier transform (STFT) to obtain time-frequency maps containing hand motion features. These maps are processed by a neural network encoder called the Gesture Encoder, which generates feature vectors representing the gesture characteristics; the vectors are then used for gesture classification and recognition. The algorithm effectively solves the recognition problem in scenarios with object occlusion, and it introduces a system state transition machine (STM) and data augmentation methods to precisely control the start and end times of gestures, thus enabling automatic segmentation and real-time recognition. Finally, the system is deployed on an edge computing device running Android and tested in an occlusion scenario. Experimental results show that the proposed gesture recognition system can efficiently and accurately complete gesture classification on edge devices, with strong practical value and application prospects.
Keywords: radio frequency sensing; gesture recognition; real-time inference; signal processing; neural networks
Abstract: Secure multi-party computation is an important research branch of modern cryptography. It can effectively protect data, prevent private data from being improperly acquired or exploited, and ensure that participants maintain data privacy and integrity while sharing data. Among its applications, secure computation of set subset relationships is a key technology underpinning private data queries, confidential data outsourcing, similar document retrieval, and other forms of secure sharing of private data. Existing schemes primarily focus on subset determination for sets composed of single elements and lack effective support for complex sets composed of tuples. Furthermore, they face the following challenges in terms of practicality, security, and efficiency: (1) existing schemes require two independent subset determinations on the tuple set, resulting in low computational efficiency, and intermediate results may expose sensitive data unrelated to the subset relationship; (2) existing schemes struggle to protect the privacy of single-element sets effectively (especially in scenarios requiring protection of set intersections and cardinalities), and because tuple sets contain more information requiring protection, the risk of privacy leakage is significantly greater; (3) existing subset protocols for single-element sets may yield erroneous judgments; and (4) existing schemes lack support for efficient batch determination when one participant holds multiple tuple sets. To address these challenges, this paper proposes, for the first time, secure computation protocols for set subsets in which one participant holds multiple sets and the set elements are tuples, with distinct schemes for scenarios in which the participants do or do not possess a universal set. 
The proposed protocols determine, in a single execution, whether one tuple set is a subset of each of multiple other tuple sets, avoiding the privacy leakage risk of intermediate results inherent in stepwise computation. The protocols significantly enhance efficiency and are broadly applicable. Furthermore, they protect not only the cardinality of the tuple sets held by both participating parties but also the cardinality and specific elements of the tuple subset itself. Specifically, for two-party computations with a universal set, Alice selects from encrypted data sent by Bob, thus avoiding complex modular exponentiation and reducing computational costs. For scenarios without a universal set, sets are represented as polynomials, and Bob simply substitutes his data into the encrypted polynomial sent by Alice to determine the subset relationship confidentially. Finally, using established simulation paradigms, this paper proves the security of the protocols, and experimental validation demonstrates the feasibility of the approaches.
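The polynomial representation of a set mentioned for the scenario without a universal set can be sketched in plaintext. The example below contains no encryption and is therefore not a secure protocol and not the paper's construction; it only shows the underlying algebra, namely that a set A can be represented by the polynomial whose roots are its elements, so that substituting b gives zero exactly when b is in A. The prime modulus, the pair encoding `encode_tuple`, and all other names are hypothetical choices made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumption: arithmetic in the prime field modulo the Mersenne prime 2^61 - 1.
PRIME = 2**61 - 1


def encode_tuple(pair: Tuple[int, int], base: int = 1000) -> int:
    """Hypothetical injective encoding of a pair (u, v) with 0 <= v < base."""
    u, v = pair
    return u * base + v


def evaluate_set_polynomial(elements: Iterable[int], x: int,
                            modulus: int = PRIME) -> int:
    """Evaluate the product of (x - a) over all a in elements, modulo modulus."""
    result = 1
    for a in elements:
        result = (result * (x - a)) % modulus
    return result


def is_subset(candidate: Iterable[int], reference: List[int],
              modulus: int = PRIME) -> bool:
    """True when every candidate element is a root of the reference polynomial."""
    return all(evaluate_set_polynomial(reference, b, modulus) == 0
               for b in candidate)


# Alice's tuple set defines the polynomial; Bob substitutes his elements.
alice = [encode_tuple(t) for t in [(1, 2), (3, 4), (5, 6)]]
bob_inside = [encode_tuple(t) for t in [(1, 2), (5, 6)]]
bob_outside = [encode_tuple(t) for t in [(1, 2), (1, 3)]]

inside_result = is_subset(bob_inside, alice)
outside_result = is_subset(bob_outside, alice)
```

In an actual protocol the evaluation would be carried out on encrypted coefficients so that neither party learns the other's elements; that part is deliberately left out here.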
Abstract: Spark is an open-source computing engine that is widely used in big data processing and analysis thanks to its simplicity, speed, and scalability. By default, Spark partitions data with hash partitioning or range partitioning, which often results in severe imbalances in data volume between partitions when the key distribution is skewed. Many optimization methods have been proposed, such as migration partitioning, greedy partitioning, and feedback partitioning, but they often suffer from large data transmission, high extra computing cost, and long running time. To better alleviate the impact of key skew, this paper proposes an adaptive balanced data partitioning method for Spark, which introduces the idea of reward and punishment to regulate the data partitioning process. At the same time, keys with large data volumes are split appropriately so that the data volume of each partition is relatively balanced. After the data are sampled and the key weights are estimated, the sample data are sorted in descending order of key weight so that every partition receives initial data. Then, according to the reward-and-punishment allocation strategy, the allocation probability of each partition is adaptively updated, and each key to be allocated is directed to the partition with the highest probability. The adaptive data partitioning scheme is obtained once all sample data have been allocated. In actual partitioning, the data of keys that appear in the sample are allocated according to this scheme, while the data of keys that do not appear in the sample are partitioned by hashing. The experimental results show that the adaptive data balanced partitioner (ADBP) designed with the new data partitioning method can effectively alleviate the negative impact of key skew. 
On real data sets, the total running time of the WordCount program with ADBP is on average 1.51% and 29.90% shorter than with Spark’s built-in partitioners HashPartitioner and RangePartitioner, respectively, and on average 8.12%, 21.64%, and 19.62% shorter than with the existing partitioners learning automata hash partitioner (LAHP), splitting and combination algorithm for skew intermediate data block (SCID), and fined-coarse grained intermediate data placement (FCGIDP), respectively.
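The sample-then-plan workflow described for ADBP can be illustrated with a simplified sketch. Two simplifications are assumed and should be kept in mind: the reward-and-punishment probability update is replaced by a plain greedy rule that sends each key to the currently least-loaded partition, and the splitting of heavy keys is omitted. Keys are integers so that `key % num_partitions` stands in for hash partitioning; all names and weights are hypothetical.

```python
from typing import Dict, List, Tuple


def hash_partition_loads(key_weights: Dict[int, int],
                         num_partitions: int) -> List[int]:
    """Load per partition when every key goes to key % num_partitions."""
    loads = [0] * num_partitions
    for key, weight in key_weights.items():
        loads[key % num_partitions] += weight
    return loads


def greedy_partition_plan(key_weights: Dict[int, int],
                          num_partitions: int) -> Tuple[Dict[int, int], List[int]]:
    """Assign sampled keys, heaviest first, to the least-loaded partition.

    This greedy rule is a stand-in for the reward-and-punishment strategy.
    """
    loads = [0] * num_partitions
    plan: Dict[int, int] = {}
    by_weight = sorted(key_weights.items(), key=lambda kv: kv[1], reverse=True)
    for key, weight in by_weight:
        target = loads.index(min(loads))
        plan[key] = target
        loads[target] += weight
    return plan, loads


def partition_for(key: int, plan: Dict[int, int], num_partitions: int) -> int:
    """Use the plan for sampled keys and fall back to hashing otherwise."""
    return plan.get(key, key % num_partitions)


# Assumed sample: estimated weights of four keys, all of them even numbers.
sample_weights = {0: 50, 2: 30, 4: 10, 6: 10}
hash_loads = hash_partition_loads(sample_weights, 2)
plan, planned_loads = greedy_partition_plan(sample_weights, 2)
```

With these weights, hashing places all records in one partition, whereas the planned assignment splits the load evenly; keys absent from the sample still fall back to hashing, as the abstract describes.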
Abstract: Image classification methods often lack an effective understanding of complex scenes, which limits the ability of models to capture key features and thus reduces classification accuracy. To address this problem, this paper proposes an image classification network with a background perception mechanism (BPMNet). First, a background perception (BP) module is proposed. Through a dual-branch structure, it processes foreground and background information separately, dynamically adjusts the contribution of the input features, and strengthens the contextual support that background information provides to foreground features, enhancing the model’s perception of background information. Then, building on the BP module, a background perception attention (BPA) module is designed. While considering local feature information and long-range dependencies, it also attends to the relationship between the foreground and background of the image, dynamically regulates the influence of background information on the features of the subject target, and enhances the discriminability and localization of key target features. Finally, the BP module and the BPA module are embedded in the residual block to achieve feature transfer from shallow details to deep semantics, and the feature representation of foreground targets in complex scenes is enhanced by combining local details and global semantics. On the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof image data sets, BPMNet achieves classification accuracies of 96.95%, 80.85%, 97.68%, 90.10%, and 81.70%, respectively, which are on average 2.39%, 3.17%, 2.36%, 2.30%, and 2.67% higher than those of other mainstream networks. 
Compared with the current advanced network models, the proposed method can enhance the model’s understanding of complex scenes, improve the ability to express key regions, extract key features more effectively, and further improve the robustness and generalization ability of the model.
Abstract: Geometry-based point cloud compression (G-PCC) can achieve significant point cloud compression efficiency, but decompressing point clouds in low bit rate scenarios produces severe geometric compression artifacts and degrades the overall visual experience. To address this problem, this paper proposes a geometric quality enhancement method for decompressed point clouds based on attentional fusion of multi-scale features. Specifically, a multi-scale input module downsamples the decompressed point cloud to obtain point cloud data at different scales. The multi-scale point clouds are then fed in parallel into a discrete convolutional network to extract multi-scale feature information from local to global. Finally, a cross-scale attentional feature fusion module fuses the multi-scale features to enhance the completeness and accuracy of the features. The experimental results show that the proposed method achieves an average peak signal-to-noise ratio of 67.9684 dB on the publicly available dataset, an improvement of 1.6294 dB over the standard compression algorithm G-PCC, and both subjective and objective results show that the method further improves the quality of decompressed point clouds.
Abstract: Current session recommendation models excel at extracting users’ immediate preferences but struggle to capture the dynamic evolution of user interests over time and context, making it challenging to extract latent relationships between items from short-term interaction sequences. This paper proposes a neighborhood and hypergraph collaboration model for session-based recommendation (NHG-Rec). The model first combines adaptive multi-hop hypergraph convolution and neighborhood convolution to capture explicit and implicit relationships between items simultaneously. It then employs a context-aware dynamic positional attention mechanism to explore the importance of items within a session, thereby capturing users’ real-time interests. Finally, it adopts multi-view session embeddings through a local-global contrastive learning strategy to capture multi-dimensional item features and distinguish semantic differences. Experimental results demonstrate that on the Tmall, Diginetica, and Nowplaying benchmark datasets, compared with mainstream baseline models such as SR-GNN, GCE-GNN, and DHCN, the model improves the P@10, P@20, MRR@10, and MRR@20 metrics by an average of 12.38%, 5.47%, 6.53%, and 6.39%, respectively. The NHG-Rec model effectively captures the dynamic changes of user interests and the multi-dimensional relationships between items.
摘要:The KAN (Kolmogorov-Arnold Network) model improves image segmentation accuracy through a new linear function fitting method. However, its single fitting angle and poor extraction of label position information limit its ability to process the detailed feature information of labels, which in turn limits further gains in network accuracy. To address these problems, a multi-scale dual-channel 3D image segmentation model is designed. By integrating multi-angle 3D image inputs and combining the multi-angle KAN module with multi-scale convolutional weighted residual channels, it significantly enhances the network’s ability to extract minute features from images. For the attention mechanism, a multi-view self-attention residual module is designed that captures label spatial location information through multi-dimensional feature interactions, so that label regions occupying a relatively small proportion (<10%) still maintain excellent segmentation accuracy. The model is evaluated on the BraTS2021 multimodal MRI 3D brain tumor dataset and the LiTS2017 liver tumor CT 3D dataset, where the improved model reaches accuracies of 86.54% and 88.07%, respectively. On the brain tumor dataset, the Dice scores for the enhancing tumor, whole tumor, and tumor core regions reach 83.67%, 88.79%, and 85.28%, improvements of 3.38, 2.85, and 1.62 percentage points, respectively, over the U-KAN network. On the liver tumor dataset, the Dice scores for the liver and tumor regions reach 91.36% and 84.77%, improvements of 1.69 and 1.02 percentage points, respectively. The experimental results show that the model significantly improves 3D tumor image segmentation.
关键词:3D image segmentation;multi-angle;dual-channel;self-attention;labeled region reinforcement
摘要:Emotion recognition is a key component of intelligent human-computer interaction. The electroencephalogram (EEG) has become an important carrier for emotion analysis because it contains rich biological information and is difficult to disguise. However, EEG signal features are complex and variable, with significant individual differences and temporal variability, which lead to low accuracy and poor generalization in traditional machine learning methods. To address these challenges, this paper proposes a reconstructed transfer subspace based multi-view domain adaptation method (RTS-MVDA). The method treats different features as independent views, explores the uniqueness and importance of each view through multi-view learning, and mines their complementary relationships. Its core is to project the multi-view data of the source and target domains into a reconstructed transfer subspace with low-rank constraints. In this subspace, RTS-MVDA on the one hand uses reconstruction terms to restore the original data information and retains the main discriminative information through the low-rank representation; on the other hand, it applies a linear transformation to align the source and target domains, reducing the distribution difference between them. In addition, RTS-MVDA constructs a multi-view supervised discriminant term and a global structure preserving term. The former uses source domain label information to enhance intra-class compactness and inter-class separation, while the latter maintains the global structure of the data distribution in the transfer subspace, so that discriminative knowledge is transferred more effectively from the source domain to the target domain. Experiments on the public Database for Emotion Analysis using Physiological Signals (DEAP) show that the average accuracy of RTS-MVDA on arousal and valence is 73.15% and 72.91%, respectively. Its precision, recall, and F1-score are significantly better than those of the compared methods, effectively improving the accuracy and generalization ability of cross-subject EEG emotion recognition.
摘要:Multi-angle plane wave coherent compounding (MPWCC) can achieve high frame rate ultrasound scanning, which helps color flow imaging provide more accurate blood flow information and tissue images. However, the low-pass effect of MPWCC leads to underestimated blood flow velocities, and the optimal threshold for clutter suppression filters cannot be determined computationally. To address this, this paper proposes an ultrafast ultrasound color blood flow imaging method based on deep convolutional neural networks (DCNN). Using the Field II ultrasound simulator, a carotid artery model is built to acquire ultrasound Doppler signals at different blood flow velocities. These signals are processed with singular value decomposition (SVD) and then normalized to generate the training dataset. The DCNN model learns the characteristics of Doppler signals at different velocities through supervised learning, enabling clutter suppression and conversion of feature information into velocity information for color flow imaging. Compared with autocorrelation velocimetry combined with high-pass filtering (HPF) or SVD, the proposed method shows superior performance on both simulated and human carotid artery test datasets. When blood flow velocity profiles in both forward and reverse directions are estimated, the normalized root mean square error (NRMSE) of the proposed method is on average 45.65% and 41.95% lower than that of HPF and SVD, respectively. In the color flow imaging results for simulated and human data, the proposed method shows the best clutter suppression and vessel integrity. In summary, this method achieves ultrafast ultrasound color blood flow imaging and is applicable to visualizing blood flow dynamics.
摘要:This paper addresses the limited transferability of adversarial examples in black-box attacks on deep neural networks by proposing a dynamic pseudo-target adversarial attack framework based on categorical semantic correlations. Existing methods often overlook inter-class semantic relationships, causing adversarial perturbations to overfit to model-specific features and severely restricting the cross-model transferability of adversarial examples. Studies have indicated that, during transfer, adversarial examples are more likely to be misclassified into semantically similar classes than into arbitrary categories. This observation underscores class similarity as a pivotal factor influencing transferability. We propose a class-similarity-driven dynamic pseudo-targeted adversarial attack method that exploits the shared adversarial subspace among semantically similar categories in the feature space. First, we establish a dynamic pseudo-target selection strategy. In each perturbation iteration, we identify the class with the highest predicted probability among all incorrect categories as the “pseudo-target”, based on the current model’s confidence distribution for the adversarial example. This pseudo-target is not fixed; instead, it is adaptively adjusted throughout the iterative process, ensuring that the perturbation direction consistently points toward the most transferable semantic region. Second, we introduce a dual-gradient collaborative update framework. This framework integrates the adversarial loss gradient pertaining to the true class with the misleading gradient associated with the pseudo-target class through linear weighting. 
Leveraging the superposition effect in the gradient field, the perturbation update not only circumvents the decision boundary of the source model but also progresses into the shared semantic subspace of multiple models, thereby significantly enhancing the cross-model transferability of adversarial examples. Furthermore, our proposed method demonstrates wide compatibility and extensibility, serving as a versatile optimization mechanism that can be seamlessly integrated with various mainstream gradient-based attack strategies. During each gradient update, the incorporation of a dynamic pseudo-target gradient term markedly amplifies cross-model transfer capability without compromising the original gradient structure of the foundational method. Experimental results illustrate that the proposed approach exhibits superior transfer robustness in cross-architecture (e.g., Convolutional Neural Networks and Transformers) and cross-scale (e.g., lightweight models) adversarial attack scenarios. Additionally, it showcases excellent compatibility, enabling effective integration with diverse gradient attack strategies and data augmentation techniques, thereby outperforming existing methodologies across single, combined, and ensemble attack settings. This study proposes a general optimization paradigm based on semantic similarity for adversarial attacks, offering novel insights to enhance the transferability of black-box attacks.
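The two steps described in this abstract, dynamic pseudo-target selection and the linearly weighted dual-gradient update, can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names `select_pseudo_target` and `combined_direction`, the parameter `weight`, and the sign convention (ascend the true-class loss, descend the pseudo-target loss) are hypothetical and are not taken from the paper.

```python
def select_pseudo_target(probs, true_label):
    """Return the incorrect class with the highest predicted probability.

    probs: list of class probabilities for the current adversarial example.
    true_label: index of the ground-truth class, which is excluded.
    """
    candidates = [c for c in range(len(probs)) if c != true_label]
    return max(candidates, key=lambda c: probs[c])


def combined_direction(grad_true_loss, grad_pseudo_loss, weight):
    """Linearly weight the two gradients into one update direction.

    Assumed convention: move up the loss of the true class and down the
    loss of the pseudo-target class, scaled by `weight` (hypothetical).
    """
    return [g_t - weight * g_p
            for g_t, g_p in zip(grad_true_loss, grad_pseudo_loss)]


# The pseudo-target is re-selected at every iteration from the current
# confidence distribution, so it can change as the perturbation evolves.
target = select_pseudo_target([0.1, 0.6, 0.3], true_label=1)
direction = combined_direction([1.0, -2.0], [0.5, 0.5], weight=0.5)
```

In a real attack the two gradient lists would come from backpropagation through the source model; here they are plain Python lists so that the weighting itself stays visible.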
摘要:Accurately predicting student performance is a prerequisite for intelligent tutoring systems to provide students with personalized learning services. As mainstream methods for student performance prediction, both cognitive diagnosis and knowledge tracing attribute student performance solely to knowledge states, neglecting students’ test-taking psychological states during the answering process, thereby limiting further improvements in prediction accuracy. To this end, this paper proposes a test-taking psychological state enhanced student performance prediction model (TSPP), which integrates students’ test-taking psychological states into the knowledge-centered student performance prediction model and combines the interpretability of cognitive diagnosis with the dynamic prediction capability of knowledge tracing. TSPP represents students’ test-taking psychological states by capturing complex high-order relations between exercises and answering behaviors. Meanwhile, it represents students’ dynamic knowledge states by extracting rich inter-node relations in heterogeneous knowledge graphs. Finally, we design a progressive fusion gate that integrates test-taking psychological states and knowledge states in an interpretable, progressive manner to obtain interpretable prediction results. Extensive experimental results on three real-world datasets demonstrate that the TSPP model achieves 6.05% and 7.27% improvements in AUC (Area Under the Curve) and ACC (Accuracy), respectively, and a 6.76% reduction in RMSE (Root Mean Square Error), compared to the average performance of nine baseline models. Additionally, we validate the explainability of TSPP by visually analyzing its test-taking psychological states and knowledge states, and by investigating the advantages of the explainability parameters designed in the model.
摘要:To address the persistent tracking failures caused by strong background interference or target deformation in existing visual object tracking algorithms that indiscriminately utilize all historical templates and interact with entire search regions, this paper proposes a feature-adaptive selection based visual object tracking algorithm. First, a template feature filter is introduced to optimize traditional image-level template updating into feature-level dynamic updating, which selectively preserves strongly correlated template features while compressing weakly relevant features to reduce redundant information interference. Second, a search feature discriminator is employed to autonomously distinguish potential target features from noise features in search regions, thereby suppressing interactions with irrelevant areas. Furthermore, spatio-temporal information propagation tokens are incorporated to transmit target appearance and positional information across frames for progressive response refinement. A feature interaction encoder based on decoupled attention mechanisms is designed, which separates self-attention and cross-attention operations to better adapt to the proposed modules while enhancing discriminative capabilities. Comprehensive experiments on multiple large-scale public datasets demonstrate robust performance, achieving precision scores of 93.0%, 79.6%, and 91.2% on OTB100, LaSOT, and UAV123 datasets respectively. The algorithm maintains an optimal balance between tracking success rate and operational efficiency, significantly improving tracking accuracy and robustness in complex scenarios.
摘要:The capacity to perceive collisions is essential, from the survival of animals in nature to the safe operation of machines in industrial environments. Inspired by the locust visual neuron LGMD (Lobula Giant Movement Detector), numerous biomimetic computational models have been developed for real-time and reliable collision detection. However, constrained by two-dimensional monocular visual input, existing methods struggle to capture the depth features of moving objects, thus failing to meet the demands of looming perception in complex dynamic scenarios. To address this, this study proposes a 3D looming perception model that integrates bio-plausible motion and disparity pathways. In the presynaptic neural network, the proposed model achieves spatiotemporal integration of neural signals from both visual pathways. This not only effectively eliminates background clutter interference but also significantly suppresses visual stimuli caused by non-looming foreground motion, while reducing attention to targets that suddenly appear within the field of vision. Consequently, the model enhances selectivity for approaching objects in unknown realistic environments. Offline tests on real-scene datasets and online tests on a robot show that the model attains an accuracy of 96.09% while reducing time complexity by an order of magnitude compared with the state-of-the-art method. Furthermore, it enables mobile robots to detect and avoid potential collisions in real time during autonomous navigation. The study demonstrates that the proposed neural network synergistically fuses the efficiency of the motion pathway with the reliability of the disparity pathway.
关键词:looming perception;disparity;locust visual system;bio-plausible;neural signal integration
摘要:To overcome the issues of path length, high time consumption, and the tendency to get trapped in local optima in the traditional beluga whale optimization algorithm (BWO) for 3-5-3 polynomial interpolation robotic arm trajectory optimization, this paper proposes an enhanced whale-manta ray fusion optimization algorithm (EBWO). The algorithm aims to optimize the robotic arm’s motion time, constructing a constrained optimization model, which is then converted into an unconstrained form using the augmented Lagrangian multiplier method. Firstly, an improved logarithmic nonlinear Halton chaotic sequence is used to optimize population initialization, enhancing search diversity and quality. Secondly, a multi-directional cosine whale position update mechanism is designed to strengthen the search ability in the exploitation phase. In the mid-iteration stage, an improved manta ray whirlwind chain hunting strategy is introduced, combined with a Levy flight mechanism to build a new hunting factor, enhancing both local exploitation and global jumping abilities. Lastly, an adaptive whale fall strategy based on a resource-competition coupling mechanism is proposed, incorporating quantum tunneling effects to improve the algorithm’s ability to escape local optima and convergence speed. Experimental results show that EBWO improves time optimization by 8.69% over traditional BWO in 3-5-3 trajectory optimization and reduces time by 42.13% compared to the non-optimized trajectory, demonstrating its effectiveness and practicality in complex optimization tasks.
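The abstract above initializes the population from a Halton-type low-discrepancy sequence. As a hedged illustration, the sketch below builds a population from the plain, textbook Halton sequence; it does not reproduce the paper's "improved logarithmic nonlinear" variant, and the names `halton`, `halton_population`, and the example bounds are hypothetical.

```python
def halton(index, base):
    """Return the index-th element (index >= 1) of the radical-inverse
    (van der Corput) sequence in the given base, a value in (0, 1)."""
    result, fraction = 0.0, 1.0
    i = index
    while i > 0:
        fraction /= base
        result += fraction * (i % base)
        i //= base
    return result


def halton_population(size, lower, upper, bases=(2, 3, 5, 7, 11, 13)):
    """Spread `size` candidates over the box [lower, upper], using one
    prime base per dimension (up to six dimensions with these bases)."""
    dim = len(lower)
    return [
        [lower[d] + (upper[d] - lower[d]) * halton(i, bases[d])
         for d in range(dim)]
        for i in range(1, size + 1)
    ]


# Example: 8 candidate vectors of three segment durations, each assumed
# to lie between 0.5 s and 4.0 s (illustrative bounds only).
population = halton_population(8, [0.5, 0.5, 0.5], [4.0, 4.0, 4.0])
```

Compared with uniform random sampling, consecutive Halton points fill the search box more evenly, which is the usual motivation for using them to seed a population.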
摘要:Fully homomorphic encryption and post-quantum cryptography rely heavily on the number theoretic transform (NTT) to accelerate polynomial multiplication. As the dimension of NTT polynomials increases, the storage and transmission of twiddle factors have an increasingly serious impact on the system. To solve this problem, this paper designs a low-area twiddle factor generator for fully homomorphic encryption. The paper first analyzes the access pattern of twiddle factors in the NTT algorithm and proposes a dynamic twiddle factor generation scheme. Through on-the-fly generation and overwriting, twiddle factor storage is compressed to 0.12% of the original space. For an NTT of dimension 65,536, only 78 units of storage are required. Second, based on the RNS-CKKS scheme in fully homomorphic encryption, this paper evaluates 120 60-bit prime moduli with low Hamming weight, proposes a lightweight Barrett modular multiplication algorithm, and designs a low-area modular multiplication unit based on this algorithm. Finally, based on the dynamic twiddle factor generation scheme, this paper implements a low-cost dynamic twiddle factor generator for the Radix-16 NTT butterfly unit, meeting the practical requirements of NTT operations in fully homomorphic encryption. To verify the hardware design, experiments were conducted on the Zynq UltraScale+ XCZU9EG device. The operating frequency reached 252 MHz, and Slice resource consumption was reduced by 15%. Synthesis was carried out at the 40 nm CMOS process node (ss corner). Compared with existing designs, the throughput per gate (TPG) of the proposed design is more than five times higher.
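For readers unfamiliar with Barrett modular multiplication, the following is a minimal sketch of the textbook algorithm, not the lightweight variant proposed in the paper. The function names and the example modulus `2**61 - 1` are illustrative assumptions; that modulus is not claimed to be one of the 120 moduli evaluated by the authors.

```python
def barrett_setup(q):
    """Precompute the shift k and the constant mu = floor(4**k / q)."""
    k = q.bit_length()            # 2**(k-1) <= q < 2**k
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_mulmod(a, b, q, k, mu):
    """Return (a * b) mod q for 0 <= a, b < q without a division by q."""
    x = a * b                     # x < q**2 < 4**k
    t = (x * mu) >> (2 * k)       # underestimates floor(x / q) by at most 1
    r = x - t * q                 # 0 <= r < 2q
    while r >= q:                 # final correction step
        r -= q
    return r


# Example with an illustrative modulus (assumption, see lead-in).
q = 2**61 - 1
k, mu = barrett_setup(q)
product = barrett_mulmod(123456789012345678, 987654321098765432 % q, q, k, mu)
```

The division needed to compute `mu` happens once per modulus, so in hardware each multiplication costs only multiplies, a shift, and a conditional subtraction. Moduli with low Hamming weight, as the abstract mentions, can further replace some multiplies with shift-and-add logic.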
摘要:In recent years, Industry 5.0 has gradually emerged as a new direction for the development of global manufacturing, with a large number of resource-constrained smart devices being widely deployed in open environments. To address issues such as excessive computational overhead and the lack of critical security attributes in existing Industry 5.0 authentication protocols, this paper proposes a lightweight anonymous authentication protocol based on physical unclonable functions (PUF), effectively resolving the conflict between low computational overhead and high-security requirements in the Industry 5.0 environment. The proposed protocol utilizes trusted execution environment (TEE) to enhance PUF, optimizes the information flow of existing three-party authentication protocols, and introduces a chained challenge-response pair (CRP) update mechanism, achieving three-party authentication and key agreement among users, gateways, and Industry 5.0 smart devices. Furthermore, formal and informal security analyses demonstrate that the protocol can effectively resist smart device theft attacks and other common attack types. Comparative analysis with related protocols in recent years shows that the proposed protocol reduces the average computational overhead by 54% while meeting more security requirements.
摘要:This paper presents a highly efficient dual-band circularly polarized slot-coupled antenna directly integrated with a gallium nitride (GaN) power amplifier (PA). Through electromagnetic-circuit co-optimization, the antenna input impedance is directly matched to the optimal load impedance of the PA drain output at dual frequencies, eliminating the insertion loss and mismatch loss of traditional PA output matching networks. This approach effectively enhances system efficiency while maintaining stable antenna radiation performance. The dual-band circularly polarized antenna employs an “I”-shaped slot to excite a rotating radiation patch, enabling circularly polarized wave radiation while simultaneously regulating the antenna input impedance characteristics. To validate the design, a dual-band power amplifier integrated antenna (PAIA) operating at 2.6 GHz and 3.5 GHz was fabricated and experimentally tested in an anechoic chamber. Measurement results demonstrate that the overlapping regions of the 3 dB axial ratio (AR) bandwidth and impedance bandwidth are 200 MHz and 300 MHz, respectively. The effective isotropic radiated power (EIRP) at the center frequencies reaches 46 dBm and 48.2 dBm, while the saturated power added efficiency (PAE) exceeds 67% and 71%, respectively. These results highlight the excellent overall efficiency and stable circularly polarized radiation characteristics of the proposed design.
摘要:To address the limitations of existing real-time state-of-charge (SOC) prediction models for electric vehicles in terms of operational state awareness, dynamic calibration, and long-sequence forecasting accuracy, this paper proposes a temporal prediction framework that integrates a causal tree-of-thought mechanism with a deep reinforcement learning strategy. By introducing dynamic evolution and multi-branch causal inference, the proposed framework maintains the computational efficiency of a single model while enabling adaptive modeling of battery state transitions under complex operating conditions. First, a multi-level proximal policy optimization (PPO) model based on a hierarchical causal structure is designed. A time-series network is constructed as the core of the Actor network to hierarchically model the direct and indirect causal influences of key variables such as temperature and internal resistance on SOC. Through value function iteration and long-term return optimization strategies, the model continuously evolves its parameters, enhancing its generalization capability, interpretability, and causal reasoning ability. Second, a tree-of-thought structure is introduced to build a multi-path policy evaluation network, which combines policy search, path tracking, and backtracking correction mechanisms to achieve layer-wise policy optimization and anomaly branch correction under dynamic conditions. This design significantly improves the robustness and generalization performance of the model. Experimental results show that under various operating conditions, the proposed algorithm significantly outperforms Transformer, FEDformer, Mamba, and long short-term memory (LSTM) models across multiple evaluation metrics, achieving a mean absolute error (MAE) below 0.26%, root mean squared error (RMSE) below 0.35%, and coefficient of determination (R²) above 99.5%, demonstrating outstanding robustness and stability across different vehicle types.
关键词:real-time state-of-charge (SOC) prediction for electric vehicles;deep reinforcement learning;causal tree-of-thought;time series network
摘要:In this paper, we investigate methods for rapidly and accurately establishing airborne laser communication links in scenarios where stable satellite navigation signals are unavailable. Compared to space laser communication, airborne laser communication faces more challenging factors such as higher platform mobility, more complex vibrations, and greater atmospheric losses. Consequently, the strategies and technologies for acquisition, pointing, and tracking in airborne laser communication require further refinement. To address the above issues, this paper constructs an airborne communication system that integrates millimeter-wave and laser technologies. It explores a method based on the Kalman filter principle, combining ranging and angle measurement information obtained from the millimeter-wave planar array with the platform’s flight state information to improve the accuracy of estimating the position and facilitate the rapid establishment of airborne laser communication links. First, based on the platform’s flight state, a motion model of the aircraft following a rhumb line is established. Next, a measurement model is developed according to the principles of millimeter-wave planar array ranging and angle measurement. Then, to address the nonlinear characteristics of the motion and measurement models, the unscented transformation is employed to generate sample points and weights, avoiding the complicated Jacobian matrix form used in the extended Kalman filter. Finally, the mean and variance of the target aircraft’s position estimation are obtained. Simulation results show that the target aircraft position estimation method proposed in this paper has significantly lower errors than those of the millimeter-wave measurement. The position estimation error is effectively controlled within the range required for rapid acquisition by the laser communication terminal, meeting the demands for establishing and maintaining stable airborne laser links. 
In summary, the millimeter-wave-assisted airborne laser communication method proposed in this paper can provide technical support for rapidly and reliably establishing laser communication links between airborne platforms under conditions of unstable satellite navigation signals. It has promising application prospects in areas such as inter-aircraft broadband networking, disaster area emergency communication relay, and formation flying coordination.
关键词:airborne laser communications;millimeter-wave planar array;Kalman filter;motion model;measurement model
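The unscented transformation mentioned in this abstract can be sketched as below. This is a minimal illustration of the standard sigma-point construction with a single scaling parameter `kappa`, written in pure Python; the function names, the value of `kappa`, and the range/azimuth example are assumptions and do not reproduce the paper's motion or measurement models.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(mean, cov, kappa=1.0):
    """Generate 2n + 1 sigma points and their weights for an n-dimensional state."""
    n = len(mean)
    scale = n + kappa
    root = cholesky([[scale * c for c in row] for row in cov])
    points = [list(mean)]
    for j in range(n):
        column = [root[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, column)])
        points.append([m - c for m, c in zip(mean, column)])
    weights = [kappa / scale] + [0.5 / scale] * (2 * n)
    return points, weights


def unscented_mean(mean, cov, func, kappa=1.0):
    """Propagate the sigma points through func and return the weighted mean."""
    points, weights = sigma_points(mean, cov, kappa)
    outputs = [func(p) for p in points]
    return [sum(w * y[d] for w, y in zip(weights, outputs))
            for d in range(len(outputs[0]))]


def polar_to_cartesian(z):
    """Hypothetical nonlinear mapping: (range, azimuth) to planar (x, y)."""
    r, azimuth = z
    return [r * math.cos(azimuth), r * math.sin(azimuth)]


# Example: range 1000 m (std 5 m) and azimuth 0.5 rad (std 0.01 rad).
xy = unscented_mean([1000.0, 0.5], [[25.0, 0.0], [0.0, 1e-4]], polar_to_cartesian)
```

Because the nonlinear function is only evaluated at the sigma points, no Jacobian has to be derived, which is the advantage over the extended Kalman filter that the abstract points out. A full unscented Kalman filter would also propagate the covariance and apply the measurement update, both omitted here.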
摘要:With the rapid development and extensive application of internet of things (IoT) technologies, the large-scale deployment of IoT (LS-IoT) has become an inevitable trend for building intelligent and efficient social infrastructure. However, the heterogeneous, time-varying, and widely distributed nature of large-scale networks has led to increasingly prominent network and information security issues. Conventional perimeter-based security (PBS) models struggle to address complex and evolving threats in LS-IoT environments. The zero trust architecture (ZTA), which emphasizes the security principle of “never trust, always verify”, provides a potential solution for ensuring the security of LS-IoT systems. First, this paper systematically reviews the three core capabilities of ZTA: software-defined perimeter (SDP), identity and access management (IAM), and micro-segmentation (MSG). Next, in line with the characteristics and requirements of LS-IoT, we examine seven critical enabling technologies for implementing ZTA: continuous identity authentication, dynamic access control, lightweight encryption technology, identity governance and management (IGM), terminal security, network isolation, and continuous monitoring. Then, through practical applications in four representative scenarios, namely industrial IoT, 5G-enabled healthcare, autonomous driving, and remote work, this paper illustrates the effectiveness of ZTA in enhancing network security. Finally, this paper explores the integration of emerging technologies, such as large language models (LLM), generative artificial intelligence (AI), explainable machine learning (XML), edge computing, and post-quantum encryption (PQC) with ZTA, and discusses the future development directions of ZTA. This work aims to provide valuable insights for advancing ZTA implementation and strengthening security assurance in large-scale IoT.
关键词:large-scale internet of things (LS-IoT);zero trust architecture (ZTA);network security;intelligence
摘要:Unmanned aerial vehicles (UAVs) are extensively utilized in both military and civilian domains due to their high flexibility and mobility. However, radio-frequency (RF) communication faces challenges such as limited spectrum resources and interference. As a promising alternative, free-space optical (FSO) communication offers significant advantages, including high bandwidth, fast data rates, strong resistance to interference, and enhanced security. Nevertheless, FSO communication is highly sensitive to atmospheric conditions, and the high mobility of UAVs, coupled with limited onboard resources, introduces several operational challenges. This paper provides an overview of the FSO-based UAV communication architecture, its characteristics, and the underlying channel dynamics. It discusses key technologies and recent advancements, including multiple-input multiple-output (MIMO), hybrid RF-FSO integration, relay communication/mission caching, and intelligent reflecting surfaces (IRS). The challenges related to environmental adaptability, precise positioning, energy efficiency, network architecture, and security are examined in detail. Furthermore, the paper explores future research directions, such as adaptive beam control, multi-modal sensor fusion, energy-efficient hardware innovations, hybrid communication architectures, and quantum security. This study aims to offer insights into the innovative applications of FSO communication for UAVs.
关键词:unmanned aerial vehicles;free-space optical;communication architecture;channel characteristics;relay communication