Abstract: Continual learning generally refers to the ability of intelligent algorithms and agents to learn and adapt in a dynamic, changing world, continually acquiring, updating, accumulating, and utilizing knowledge throughout their deployment cycle. Continual learning technologies thus give intelligent systems the capacity for adaptive development. In deep learning, continual learning specifically refers to learning from non-stationary data streams and adapting to changing training objectives. This setting typically faces catastrophic forgetting, where learning new tasks causes a significant drop in performance on previously learned tasks. In recent years, the rapid development of deep learning in fields such as language and vision has produced numerous advances that extend the understanding and application of continual learning. This work presents a comprehensive survey of existing continual learning research, analyzing it from multiple perspectives, including fundamental definitions, representative methods, and applications in the visual domain. Finally, it discusses current cutting-edge developments and future research trends in continual learning. We hope this review will promote further exploration of the field and support subsequent research.
Abstract: In image-level weakly supervised semantic segmentation (WSSS), class activation maps (CAMs) are commonly used to localize object regions. However, CAMs generated by existing methods often under-activate object regions and erroneously activate background regions. This paper proposes a class-aware contrastive learning (CA-CL) framework for WSSS that integrates text prompts with image category information to improve the localization of object regions. First, we analyze how different text prompt templates affect the CAMs of various categories. On this basis, to obtain more adaptive class representations, we construct a contextual prompt set and design a dynamic contextual prompt selection strategy, which selects the most appropriate contextual prompt according to the similarity between image object regions and text prompts. Second, we adopt image-text contrastive learning to improve the alignment of image and text semantics, and design a contrastive loss function to guide model training. Finally, we introduce a class-specific background suppression module to reduce erroneous activation in background regions closely related to object categories, yielding more complete and compact CAMs and more precise semantic segmentation. Experiments on the PASCAL VOC 2012 and MS COCO 2014 benchmarks achieve mIoU values of 71.9% and 43.9%, respectively, outperforming existing methods and demonstrating the effectiveness of the proposed framework.
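The dynamic contextual prompt selection step can be pictured as a nearest-template lookup in a shared image-text embedding space. The sketch below is hypothetical: the prompt templates, the random stand-in embeddings, and the helper names `cosine_similarity` and `select_prompt` are illustrative assumptions rather than the CA-CL implementation; in practice both embeddings would come from a vision-language encoder.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between vector a and each row of matrix b.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return b @ a

def select_prompt(region_embedding, prompt_embeddings, prompts):
    # Choose the contextual prompt whose text embedding best matches
    # the embedding of the image's object region.
    scores = cosine_similarity(region_embedding, prompt_embeddings)
    best = int(np.argmax(scores))
    return prompts[best], scores

# Hypothetical contextual prompt set for the class "dog".
prompts = [
    "a photo of a dog.",
    "a photo of a dog on the grass.",
    "a photo of a dog in the room.",
]
rng = np.random.default_rng(0)
prompt_embeddings = rng.normal(size=(3, 8))  # stand-in text embeddings
# Stand-in region embedding constructed to lie close to the second prompt.
region_embedding = prompt_embeddings[1] + 0.1 * rng.normal(size=8)

chosen, scores = select_prompt(region_embedding, prompt_embeddings, prompts)
print(chosen)
```

Selecting by maximum cosine similarity keeps the choice per image, which is what lets the prompt adapt to context rather than using one fixed template per class.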
Abstract: Downsampling in image semantic segmentation causes information loss, and existing upsampling methods commonly suffer from inadequate global perception, blurred fine-grained reconstruction, unstable generation, and redundant information handling across scenarios. To address these problems, this paper proposes DFRNet, a lightweight semantic segmentation model that incorporates a physics-inspired diffusion-focusing mechanism. Specifically, inspired by the surface tension of liquids, the model introduces a diffusion-focusing mechanism and designs a dynamic context window selection (DWS) module to optimize information flow, together forming the physics-inspired energy propagation upsampling (PIEPU) framework. PIEPU comprises three core modules (diffusion, focusing, and regulation) that work together to enhance global context propagation, reinforce features in critical regions, and optimize information flow, thereby improving fine-grained perception and semantic consistency in complex scenes. Extensive experiments on 14 datasets covering 7 semantic categories show that DFRNet consistently outperforms state-of-the-art methods in mean intersection over union (mIoU), F1 score, and accuracy, with gains of 0.165% to 4.259% in mIoU, 0.140% to 2.888% in F1 score, and 0.035% to 1.386% in accuracy across datasets. These results validate the robustness and generalization capability of the approach. Notably, DFRNet has a model size of only 3.34 MB, making it suitable for lightweight real-time applications.
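All three reported metrics can be derived from a per-class confusion matrix. The sketch below assumes the common definitions (per-class IoU = TP / (TP + FP + FN) averaged over classes, macro-averaged F1, and pixel accuracy); the paper may use different averaging conventions, and the toy label maps are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows are ground-truth classes, columns are predicted classes.
    idx = num_classes * y_true.ravel() + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def segmentation_metrics(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c but labeled otherwise
    fn = cm.sum(axis=1) - tp  # labeled class c but predicted otherwise
    # Classes absent from both maps get IoU/F1 of 0 here (a convention choice).
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    accuracy = tp.sum() / cm.sum()
    return iou.mean(), f1.mean(), accuracy

# Toy 2x3 label maps with 3 classes.
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, mf1, acc = segmentation_metrics(cm)
print(miou, acc)
```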
Abstract: Driven by the rapid proliferation of AI application scenarios, mobile applications place ever-increasing demands on data communication and computation, and traditional cloud computing, which relies on remote processing, often fails to meet low-latency requirements. A new paradigm has therefore emerged: terminal-side computing power, which aggregates the resources of vast numbers of terminal devices (computing, storage, communication, etc.) through distributed collaboration to execute computational tasks efficiently. However, because a single device has limited resources and high communication overhead impairs task coordination, such terminals still struggle to collaborate efficiently on highly complex computing tasks. This paper presents device-to-device (D2D) communication-assisted collaborative computing among terminal devices and designs a multi-agent soft actor-critic (MA-SAC) algorithm based on a directed graph convolutional network (DGCN) to solve this problem. The subtasks of directed acyclic graph (DAG) tasks are deployed to multiple terminals for collaborative computing, and D2D communication is introduced to meet the demand for task data transmission between different nodes in the DAG, reducing the communication overhead of data transmission in the network. Simulations demonstrate the efficacy of the proposed scheme: it reduces network communication overhead by 38.2% and improves resource utilization by 31.9%.
Keywords: terminal-side computing power;terminal collaboration;multi-hop D2D;terminal-side computing power allocation;directed graph convolutional network
Abstract: As integrated circuit manufacturing technology advances, analog integrated circuit design faces trade-offs among performance metrics such as power consumption and gain. Traditional design methods, which rely on approximate equations and iterative refinement, are inefficient. This paper presents a multi-objective design strategy based on artificial intelligence algorithms for a single-stage fully differential folded cascode operational amplifier. The method uses a neural network model to characterize the mapping between design parameters and eight performance metrics, expresses the target performance of the operational amplifier through fitness functions and constraint conditions, and then uses the particle swarm optimization (PSO) algorithm to search for the parameters with the optimal fitness. Experimental results show that multiple metrics exceed the design targets, with a maximum voltage gain of 65 dB and a phase margin of 74°. The method quickly and accurately yields operational amplifier parameters that meet the design specifications. Compared with manual calculation, it requires a running time of only 906 s, significantly improving design efficiency. It can be applied to larger-scale circuit designs in the future.
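The search loop described above is a particle swarm optimization. The sketch below is a generic PSO that minimizes a hypothetical penalized objective over box-bounded parameters; in the paper the objective would be evaluated through the trained neural-network surrogate and the authors' fitness and constraint definitions, which the abstract does not give, so `fitness`, `target`, and the constraint are stand-ins. The coefficients w = 0.7 and c1 = c2 = 1.5 are common defaults, not the authors' settings.

```python
import numpy as np

def pso(fitness, lower, upper, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over the box [lower, upper] with basic PSO."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity = inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)  # keep parameters inside their ranges
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())

# Hypothetical objective: distance to a target design point plus a penalty
# that enforces the constraint p[0] + p[1] <= 1.
target = np.array([0.3, 0.4, 0.5])

def fitness(p):
    penalty = max(0.0, p[0] + p[1] - 1.0) * 100.0
    return float(np.sum((p - target) ** 2) + penalty)

best_x, best_val = pso(fitness, lower=[0, 0, 0], upper=[1, 1, 1])
print(best_x, best_val)
```

A penalty term is one simple way to fold constraint conditions into a single fitness value; other constraint-handling schemes are equally possible.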
Abstract: Real Web application attack data are scarce and highly variable, and attack payloads are diverse, which leads to poor training of large models. To address these issues, a network attack detection method based on a federated large model (FL-LLMID) is proposed. First, a federated learning network for fine-tuning large models is proposed. The server incrementally aggregates the parameters produced by each client's local large model trained on incremental data, which improves the efficiency of large-model parameter aggregation in federated learning and avoids exposing network traffic data. Second, leveraging the ability of large models to understand code, an attack detection model for application-layer data (CodeBERT-LSTM) is proposed. Application-layer data packets are parsed, the CodeBERT model encodes the valid fields as vectors, and a long short-term memory (LSTM) network classifies them to detect attacks on Web applications. Finally, experimental results show that FL-LLMID achieves an accuracy of 99.63% in attack detection on application-layer data. Compared with traditional federated learning, the efficiency of incremental learning is improved by 12 percentage points.
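The server-side step can be illustrated with a standard FedAvg-style weighted aggregation of client updates. This is a minimal sketch under assumptions: the abstract does not specify FL-LLMID's incremental aggregation rule, so weighting parameter deltas by local sample counts, the parameter name `adapter.weight`, and the function `aggregate_updates` are all hypothetical.

```python
import numpy as np

def aggregate_updates(global_params, client_deltas, client_sizes):
    """FedAvg-style aggregation: average client parameter deltas weighted
    by local sample counts, then apply the average to the global model."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    new_params = {}
    for name, value in global_params.items():
        avg_delta = sum(w * delta[name] for w, delta in zip(weights, client_deltas))
        new_params[name] = value + avg_delta
    return new_params

# Two hypothetical clients; only parameter deltas leave the clients,
# never the raw traffic data.
global_params = {"adapter.weight": np.zeros(2)}
client_deltas = [
    {"adapter.weight": np.array([1.0, 0.0])},
    {"adapter.weight": np.array([0.0, 3.0])},
]
updated = aggregate_updates(global_params, client_deltas, client_sizes=[3, 1])
print(updated["adapter.weight"])  # [0.75 0.75]
```

Sending only deltas computed on new local data is what keeps each round incremental; how FL-LLMID weights or schedules those deltas may differ.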
Abstract: The widespread adoption of complex machine learning models across diverse industries has significantly increased the demand for model interpretability. Counterfactual explanation is a crucial post-hoc explanation method. However, traditional approaches often combine multiple objectives into a single-objective optimization problem, which makes weight assignment difficult and fails to reconcile conflicting objectives. Furthermore, existing methods suffer from low computational efficiency, degraded prediction accuracy, and insufficient global explanations when dealing with high-dimensional, redundant, and noisy data. To address these issues, this article proposes a comprehensive causal multi-objective counterfactual explanation method with feature selection (CCE-FS). CCE-FS first employs the maximal information coefficient (MIC) to select key features, enhancing prediction accuracy and global explanatory power. It then formulates the counterfactual search as a multi-objective optimization problem, balancing the multiple objectives. Domain-specific causal relationships are incorporated as constraints to ensure that the generated counterfactuals are realistic and plausible. Additionally, CCE-FS provides visual feature effect analysis to enhance user understanding and reveal potential model biases. Experiments on the Statlog dataset show that feature selection in CCE-FS significantly improves the validity, normality, and sparsity of counterfactual samples, with a 46.3% improvement in proximity for continuous features. Further validation on the Adult-Income and COMPAS datasets confirms that CCE-FS outperforms existing methods in causal consistency, plausibility of the data distribution, and proximity of continuous features. These results highlight the superior explanatory capability and application potential of CCE-FS.
Keywords: counterfactual explanations;multi-objective optimization;feature selection;causal relationship;maximal information coefficient;visualization of feature effects
Abstract: Voice conversion (VC) is an artificial intelligence technology that uses deep learning to convert the voice of a source speaker into that of a target speaker. It is widely used in movie dubbing, personalized voice customization, and similar applications, but it is also exploited by malicious actors for telecom fraud, identity forgery, and political and social manipulation, posing serious threats to personal privacy, social stability, and even national security. Compared with detecting VC-generated speech, restoring the source speech from VC-generated speech (VC-generated speech restoration) is of greater research significance and practical value for tracing real speakers and preventing the illegal use of VC technologies, yet few studies have addressed it. This paper proposes a restoration method for VC-generated speech based on adversarial learning and enhancement optimization. Specifically, the similarity of VC-generated speech to the source and target speech is first analyzed, and a two-stage restoration framework of preliminary restoration followed by further optimization is presented. Then, an adversarial restoration network based on dynamic convolution and attention mechanisms is designed to learn as much source speaker information as possible from VC-generated speech through adversarial learning among a generator, a classifier, and a discriminator. Next, an enhanced optimization network consisting of a timbre extractor, a content extractor, and a sound encoder is designed to generate optimized restored speech by deeply fusing the timbre information in the preliminarily restored speech with the content information in the VC-generated speech. Finally, the effectiveness of the proposed method is validated on datasets from three high-performance voice conversion models: BNE-PPG-VC, TriAAN-VC, and Free VC.
Comparative experiments show that, for the three VC models, the restored speech increases the mean cosine similarity with the source speech by 11.9, 8.7, and 7.1 percentage points, respectively, and reduces the mean equal error rate (EER) of the speaker verification system by 4.30, 3.40, and 3.98 percentage points, respectively. This indicates that the proposed method not only effectively recovers the source speaker's speech but is also applicable to speech generated by unknown VC models.
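The EER figures above come from a speaker verification system that accepts or rejects trials by thresholding similarity scores. As an illustration, the sketch below approximates the EER from lists of genuine and impostor scores by sweeping the threshold over the observed scores; the score values and the function name `equal_error_rate` are hypothetical, and this generic definition is not the evaluation code used in the paper.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER: the error rate at the threshold where the
    false acceptance rate (FAR) and false rejection rate (FRR) are closest."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= threshold)  # impostor trials wrongly accepted
        frr = np.mean(genuine < threshold)    # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

# Hypothetical cosine-similarity scores against an enrolled speaker.
separated = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping = equal_error_rate([0.6, 0.8], [0.7, 0.2])
print(separated, overlapping)  # 0.0 0.5
```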
Abstract: Unsupervised domain adaptation (UDA) for person re-identification (Re-ID) leverages labeled source-domain data to perform unsupervised Re-ID on unlabeled target-domain data. Recently, contrastive learning has attracted attention in this field. However, current methods construct positive sample pairs with small differences and overlook biases in negative proxy sampling. To resolve these challenges, this paper presents a progressive hybrid contrastive learning (PHCL) method. In each training epoch, PHCL divides the unlabeled dataset into clustered samples with pseudo-labels and un-clustered independent instances through two steps: clustering and progressive refinement. Based on the clustering results, PHCL performs contrastive learning at two levels: it learns intra-class similarity by pulling together similar samples within the same cluster (target domain) or identity label (source domain), and it explores inter-instance discrimination by pushing apart un-clustered individual instances. Moreover, PHCL generates positive proxies for anchor samples through nearest-neighbor mining, increasing the differences within positive sample pairs so that richer semantic information is learned. Additionally, PHCL debiases the negative proxy sampling process, mitigating the adverse impact of false negative proxies on model training. Experimental results show that PHCL achieves mean average precision (mAP) of 85.9% and 42.3% on the Market-1501 and MSMT17 datasets, respectively, improvements of 4.3 and 13.5 percentage points over the baseline model. These results validate the efficacy of PHCL for UDA Re-ID.
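Both levels of contrastive learning described above can be expressed with an InfoNCE-style loss: an anchor is pulled toward a positive proxy (for example a cluster centroid or a mined nearest neighbor) and pushed away from negative proxies. This is a minimal NumPy sketch assuming L2-normalized features and a temperature of 0.05; PHCL's debiasing of negative proxies is not reproduced, and the feature vectors are random stand-ins.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss for one anchor, one positive proxy, and K negative proxies."""
    anchor = l2_normalize(anchor)
    positive = l2_normalize(positive)
    negatives = l2_normalize(negatives)
    # Index 0 holds the positive logit, the rest hold negative logits.
    logits = np.concatenate(([anchor @ positive], negatives @ anchor)) / temperature
    logits = logits - logits.max()  # numerical stability before exponentiation
    # Negative log-probability that the softmax assigns to the positive.
    return float(-logits[0] + np.log(np.exp(logits).sum()))

rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
negatives = rng.normal(size=(8, 16))
close_positive = anchor + 0.1 * rng.normal(size=16)  # e.g. a well-mined neighbor
far_positive = rng.normal(size=16)                   # an uninformative positive
loss_close = info_nce(anchor, close_positive, negatives)
loss_far = info_nce(anchor, far_positive, negatives)
print(loss_close, loss_far)
```

A false negative proxy (a negative that actually shares the anchor's identity) would add a large term to the denominator, which is the bias that negative-proxy debiasing targets.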
Abstract: Collaborative optimization of transmission, computation, and storage resources in “cloud-edge-end” computing power networks is a critical and highly challenging task. Effectively integrating high-performance cloud resources, low-latency edge resources, widely distributed node resources, and low-cost user resources to achieve intelligent resource distribution, association, trading, and allocation is essential for the optimal configuration and efficient utilization of network-wide resources. This paper constructs a detailed mathematical model of “cloud-edge-end” heterogeneous computing power networks with a focus on integrating transmission and computation. Covering multiple dimensions, including computing power demand, resource distribution, trading, and allocation, the joint optimization problem of minimizing delay and cost when scheduling heterogeneous computing and transmission resources is formulated as a mixed-integer nonlinear programming problem. A serial sub-task path allocation mechanism is then proposed and combined with the optimal route and assignment maximization (ORAM) algorithm to jointly optimize task computation and transmission paths. The mechanism divides computing tasks into multiple sub-tasks, perceives and manages the dependencies between serial sub-tasks, and uses ORAM to select, in real time, optimal computation paths that satisfy these dependencies. It routes computation results to target nodes over the fewest hops, forming an efficient end-to-end resource scheduling channel. This approach not only reduces transmission delay and resource costs but also transforms the traditional “transmit-then-compute” model into a “transmit-compute collaborative” model.
Experimental results demonstrate that the proposed algorithm outperforms various benchmark algorithms in terms of delay, cost, and path optimization under different computational demands, sensing ranges, and node quantities.
Keywords: computing power network;task scheduling;computing task;transmission path;computing power architecture
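Routing computation results to the target node over the fewest hops is, on an unweighted topology, a breadth-first search. The sketch below is a generic BFS over a hypothetical cloud-edge-end topology; it is not the ORAM algorithm, which additionally accounts for sub-task dependencies, delay, and cost.

```python
from collections import deque

def fewest_hop_path(graph, source, target):
    """Return a minimum-hop path from source to target, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:  # first visit is along a shortest path
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical topology: end devices, edge servers, and a cloud node.
topology = {
    "end1": ["edge1"],
    "edge1": ["end1", "edge2", "cloud"],
    "edge2": ["edge1", "end2", "cloud"],
    "cloud": ["edge1", "edge2"],
    "end2": ["edge2"],
}
print(fewest_hop_path(topology, "end1", "end2"))  # ['end1', 'edge1', 'edge2', 'end2']
```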
Abstract: This paper designs a low-offset, low-noise, high-precision bandgap reference (BGR) chip for high-resolution analog-to-digital converters (ADCs). To overcome the limitations of traditional architectures, two new techniques are proposed. First, a feedback enhancement technique reduces the operational amplifier's offset voltage and low-frequency noise, referred to the reference output, by a factor of 23. Second, a high-precision base current compensation technique reduces the deviation of the reference output across process corners and device mismatch. Implemented in a 0.18 μm CMOS process, the BGR occupies a chip area of 0.142 mm × 0.258 mm. Measurement results show that the BGR generates a 0.6 V reference voltage from a 1.2 V supply with a quiescent current of 31 μA. The circuit achieves an integrated noise of 2.79 μVrms over 0.1 to 10.0 Hz and a temperature coefficient of 3.6 ppm/°C over -40 to 125 °C.
Keywords: low-offset;low-noise;high-precision;bandgap reference;feedback enhancement;high-precision base current compensation
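Assuming the temperature coefficient is defined by the common box method (the abstract does not state the definition used), the reported figures imply the following maximum output variation over the 165 °C range:

```latex
\mathrm{TC} = \frac{V_{\max} - V_{\min}}{V_{\mathrm{REF}}\,(T_{\max} - T_{\min})}
\;\Rightarrow\;
V_{\max} - V_{\min}
= \left(3.6\times10^{-6}\,{}^{\circ}\mathrm{C}^{-1}\right)(0.6\ \mathrm{V})(165\ {}^{\circ}\mathrm{C})
\approx 356\ \mu\mathrm{V}
```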
Abstract: Real conductors in interconnect structures are lossy, and their skin depth becomes large at low frequencies. The traditional one-region formulation, which approximates conductors as perfect electric conductors (PECs) or by surface impedances, may then no longer be valid, so two-region formulations are needed in the integral equation approach. In addition, the electric field integral equation (EFIE) tends to break down at low frequencies, and augmented electric field integral equations (AEFIEs) have been proposed to remedy this problem. In this work, we treat lossy conductors as penetrable objects and propose two-region augmented hybrid field integral equations (AHFIEs) for low-frequency analysis. The hybrid field integral equations (HFIEs) consist of the EFIE describing the exterior of a conductor and the magnetic field integral equation (MFIE) describing its interior. Since the magnetic current density appears in the operator of the HFIEs, we introduce the magnetic charge density as a new unknown and add the continuity equation for the magnetic current density as an extra equation. By incorporating volume integral equations (VIEs) describing substrates of arbitrary penetrable media in the interconnect structures, we formulate two-region augmented volume-surface integral equations (AVSIEs) for the entire structure. The traditional AEFIE-based method can only solve problems with PEC interconnects and isotropic, homogeneous substrates, whereas the proposed AVSIE-based method can handle arbitrary materials, significantly extending the range of solvable problems.
The AVSIEs are solved by the method of moments (MoM), in which Rao-Wilton-Glisson (RWG) and Schaubert-Wilton-Glisson (SWG) basis functions represent the surface current densities of the AHFIEs and the volume current densities of the VIEs, respectively, while pulse basis functions represent the charge densities of the AHFIEs. Numerical examples illustrate the approach and show good results.
Keywords: augmented volume-surface integral equations (AVSIEs);lossy conductor;interconnect structure;low-frequency breakdown;two-region
Abstract: Many-to-many communication routing is an NP-hard (nondeterministic polynomial time) combinatorial optimization problem. Constructing efficient many-to-many routing paths requires timely acquisition of global network state information to cope with highly dynamic network states. In the context of software-defined wireless networks (SDWN), this paper addresses the problems of existing data-driven multi-agent deep reinforcement learning methods: high computational and deployment costs, difficulty adapting to the non-Euclidean characteristics of network topologies, excessive invalid actions during training that increase storage and time overheads, and slow convergence. This paper designs a new framework for collaborative sensing and intelligent decision-making between the SDN control plane and data plane and proposes a two-stage multi-agent routing method, MAGDS-M2M (Multi-Agent Graph deep reinforcement learning method based on intelligent node Deployment Strategy), for many-to-many communication routing. To avoid the computational and deployment costs of placing an agent on every node, a Q-learning-based intelligent node deployment algorithm determines the network nodes on which agents must be deployed. After multi-agent deployment, a many-to-many routing decision method based on multi-agent graph reinforcement learning is developed within the actor-critic (AC) framework. This method redesigns the actor and critic networks using graph convolutional networks (GCNs) and graph neural networks (GNNs), addressing the weak adaptability of the convolutional neural networks (CNNs) used in existing multi-agent reinforcement learning approaches to topological data.
Additionally, to avoid the large number of invalid actions generated during training by the fixed-length action space of the actor network, a new local observation method for the action space is proposed. Experimental results show that the proposed method reduces task completion delay by 29.33% compared with the benchmark methods, and that adjusting parameters achieves a balance between task completion delay and the standard deviation of cumulative energy consumption across nodes. The source code developed in this work has been released on the open-source platform at https://github.com/GuetYe/MAGDS-M2M.
Keywords: many-to-many communication;intelligent node deployment;multi-agent graph reinforcement learning;action space local observation method;software-defined wireless networks
Abstract: Ultra-wide-bandgap semiconductor gallium oxide (Ga₂O₃) nanowires have attracted much attention in recent years as a nanomaterial with unique properties. For ZnO, a third-generation metal-oxide semiconductor, nanowire growth is substrate-selective: highly uniform arrays can be grown on homogeneous substrates but not easily on heterogeneous substrates, which enables the patterning of self-organized micro- and nanostructures. For Ga₂O₃, a newer metal-oxide semiconductor, nanowire growth is not substrate-selective: arrays grown on homogeneous substrates are neither uniform nor dense, and nanowires can be grown on a variety of substrates. In this paper, we comprehensively and systematically explore the factors affecting the morphology of Ga₂O₃ nanowires through several sets of comparative hydrothermal growth experiments and investigate the relationships among these factors using the controlled-variable method. We find that the selective growth of Ga₂O₃ nanowires is weakened because lattice matching is no longer the only important factor determining nanowire growth. We propose a heterogeneous nucleation mechanism with the grain size and roughness of the seed layer as the core factors, and find that these constitute another decisive factor affecting the morphology and density of nanowire growth, with lattice matching and heterogeneous nucleation both playing key roles simultaneously in the Ga₂O₃ nanowire growth process. This conclusion is of great significance for an in-depth understanding of the growth mechanism of Ga₂O₃ nanowires and for guiding the preparation of self-organized device structures.
Abstract: Based on low temperature co-fired ceramic (LTCC) three-dimensional packaging technology, this paper folds and vertically stacks multiple quarter-wavelength impedance transformation lines to achieve high integration in a multi-section broadband power divider. The design places 7 impedance transformation lines on the odd layers of the LTCC medium and connects adjacent lines with vertical vias, while the even layers isolate the coupling between impedance transformation lines. The power divider achieves a relative bandwidth of 180% with a size of only 4 mm × 4 mm × 1.33 mm. Compared with a planar power divider with the same number of sections, the horizontal size of this design is reduced by 84.6%. In the frequency range of 2 to 38 GHz, the measured values of S11, S21, S22, S31 and S32 are better than , respectively. Owing to its ultra-wideband characteristics, miniaturization, and high integration, the power divider can be widely used in mobile communications, radar detection, satellite navigation, industrial measurement, and other fields.
Keywords: three-dimensional integration;multi-section Wilkinson power divider;miniaturization;ultra-wideband;low temperature co-fired ceramic
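The 180% relative bandwidth is consistent with the measured 2 to 38 GHz band, assuming the usual fractional-bandwidth definition relative to the arithmetic center frequency:

```latex
\mathrm{FBW} = \frac{f_H - f_L}{(f_H + f_L)/2}
= \frac{38\ \mathrm{GHz} - 2\ \mathrm{GHz}}{(38\ \mathrm{GHz} + 2\ \mathrm{GHz})/2}
= \frac{36}{20} = 180\%
```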
Abstract: Utilizing properties of group algebras over finite fields, we construct a class of Hermitian linear complementary dual (LCD) 2-quasi-abelian codes. Using the structure theorem for group algebras over finite fields, we explicitly determine the number of such codes. By enumerating the codes in this class with small relative minimum weights, we show that Hermitian LCD 2-quasi-abelian codes over any finite field are asymptotically good.
Keywords: finite fields;quasi-abelian codes of index 2;Hermitian LCD codes;asymptotically good codes
Abstract: In cell-free massive MIMO (CF-mMIMO) networks, which are characterized by differentiated service requirements, highly dynamic conditions, and decentralized resource deployment, the efficiency of allocating multi-dimensional network resources during cache deployment is constrained. To address this, this paper studies diverse content caching and multi-user association in decentralized CF-mMIMO scenarios. First, based on the coupling between content caching and user association, models for content caching, user association, and multi-dimensional resource allocation are established. Second, given the stochastic, time-varying network environment and incomplete observations of the network state, the joint content caching, user association, and resource allocation problem is formulated as a distributed partially observable Markov decision process (POMDP) with the objective of maximizing network efficiency. Considering the diverse content caching requirements and wide spatial differentiation, a multi-agent deep reinforcement learning algorithm based on a graph attention network is then proposed to learn and optimize policies for content caching, user association, and multi-dimensional resource allocation. Finally, simulation results confirm that the proposed algorithm significantly improves network efficiency, system throughput, and cache hit rate.
Abstract: With the widespread adoption of 5G technology, edge computing has become increasingly important for task offloading and processing, and game-theoretic edge computing strategies have consequently become a research hotspot. This paper aims to maximize the quality of experience (QoE) by studying the multi-user task offloading problem under time constraints. A system model is constructed from three aspects: a communication model, a computation model, and time constraints. The optimization problem is first transformed into a game, and the existence of a Nash equilibrium is proven. This paper proposes a distributed multi-user offloading (DMUO) algorithm that, for the first time, allows multiple users to update their strategies simultaneously within a single time slot, significantly reducing computational overhead and improving convergence speed. Theoretical analysis not only shows that DMUO converges to a Nash equilibrium but also provides an upper bound on the number of iterations. Furthermore, the robustness of the algorithm is verified by analyzing the performance gap between the worst-case strategy and the optimal solution. Simulation results show that DMUO achieves excellent convergence and system performance, demonstrating its scalability and practical applicability in large-scale edge computing environments.
Keywords: mobile edge computing;potential game;Nash equilibrium;computation offloading;quality of experience
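The Nash-equilibrium argument relies on the offloading game being a potential game, in which a sequence of unilateral improvements always terminates. The sketch below runs classical sequential best-response updates on a hypothetical toy offloading game (a fixed local cost per user, and an edge cost that grows with the number of offloading users, which makes it a congestion game). It is not the DMUO algorithm, whose distinguishing feature is that several users update in the same time slot, and the cost values are illustrative assumptions.

```python
def offloading_cost(user, choice, profile, local_cost, edge_base, congestion):
    # Hypothetical cost model: local execution has a fixed per-user cost;
    # offloading cost grows with the number of users sharing the edge server.
    if choice == 0:
        return local_cost[user]
    others = sum(profile) - profile[user]
    return edge_base + congestion * (others + 1)

def best_response_dynamics(local_cost, edge_base, congestion, max_rounds=100):
    n = len(local_cost)
    profile = [0] * n  # 0 = compute locally, 1 = offload to the edge
    for _ in range(max_rounds):
        changed = False
        for user in range(n):  # users update one at a time
            costs = [offloading_cost(user, c, profile, local_cost, edge_base, congestion)
                     for c in (0, 1)]
            best = min((0, 1), key=lambda c: costs[c])
            if costs[best] < costs[profile[user]]:  # switch only on strict improvement
                profile[user] = best
                changed = True
        if not changed:
            return profile  # no user can improve unilaterally: a Nash equilibrium
    return profile

local_cost = [5, 6, 7, 8, 9]
equilibrium = best_response_dynamics(local_cost, edge_base=1, congestion=2)
print(equilibrium)  # [0, 0, 0, 1, 1]
```

Switching only on strict improvement matters: each switch strictly decreases the game's potential, which is why the loop cannot cycle.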
Abstract: Deep learning-based synthetic aperture radar (SAR) target recognition methods are widely used in military reconnaissance and disaster monitoring. However, deep neural networks (DNNs) are vulnerable to adversarial attacks, which compromise the reliability of model decisions. Existing black-box adversarial attack methods for SAR images suffer from high-dimensional parameter design and perceptible perturbations. To address these issues, an adversarial attack method based on frequency-domain multi-objective optimization is proposed. By transforming SAR images from the spatial domain to the frequency domain via the 2D discrete Fourier transform, the method reduces the complexity of perturbation design and modifies a single frequency component to generate texture-like perturbations in the spatial domain. A hypervolume-guided multi-objective evolutionary algorithm is integrated to balance attack performance and visual stealthiness. Experimental results show that, for the T62 category, the adversarial samples generated by the method achieve misclassification confidence of more than 90.39%, 71.43%, and 44.28% on VGG16, AConvNet, and YOLO-series models, respectively. In all cases, the similarity between adversarial and original images exceeds 99%, providing effective technical support for evaluating the security and robustness of SAR imaging systems.
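Why modifying a single frequency component yields a texture-like spatial perturbation can be shown directly: adding a coefficient at one 2D-DFT bin (and its conjugate-symmetric partner, so the image stays real) adds a cosine grating to every pixel. The sketch below assumes a random array as a stand-in for a SAR chip and a hypothetical helper `single_frequency_perturbation`; choosing the bin and amplitude with the hypervolume-guided evolutionary search is not reproduced.

```python
import numpy as np

def single_frequency_perturbation(image, u, v, amplitude, phase=0.0):
    """Perturb the 2D-DFT bin (u, v) of a real image.

    The conjugate-symmetric bin (-u, -v) is set as well, so the spatial
    perturbation is real: a cosine grating with peak magnitude `amplitude`.
    """
    h, w = image.shape
    ku, kv = u % h, v % w
    cu, cv = (-u) % h, (-v) % w
    if (ku, kv) == (cu, cv):
        raise ValueError("choose a bin that is not its own conjugate (e.g. not DC)")
    # ifft2 divides by h*w; scaling by h*w/2 gives amplitude * cos(...) in space.
    coeff = amplitude * h * w / 2 * np.exp(1j * phase)
    delta = np.zeros((h, w), dtype=complex)
    delta[ku, kv] = coeff
    delta[cu, cv] = np.conj(coeff)
    spectrum = np.fft.fft2(image)
    return np.fft.ifft2(spectrum + delta).real

rng = np.random.default_rng(0)
sar_chip = rng.random((64, 64))  # stand-in for a SAR image chip
adv = single_frequency_perturbation(sar_chip, u=5, v=9, amplitude=0.02)
perturbation = adv - sar_chip
print(np.abs(perturbation).max())
```

Because only two conjugate bins change, the search space shrinks to a frequency index, an amplitude, and a phase, which matches the abstract's point about reducing perturbation design complexity.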
Abstract: Lower limb exoskeletons must identify the user's lower-limb motion intentions to provide support during daily activities. However, existing research rarely addresses predicting locomotion modes, and hence user intention, for new subjects. To bridge this gap, this study proposes a lower-limb locomotion mode prediction method based on multi-sensor signal fusion and transfer learning. The study first designs a prediction model that uses long short-term memory (LSTM) networks to extract pattern features from surface electromyography (sEMG) signals. These sEMG features are then fused with joint angle features to predict lower-limb locomotion modes. To account for inter-subject variability in physiological signals, the method uses a two-step transfer learning procedure. First, the model is pre-trained on a source-domain dataset. Next, the weights of the sEMG feature extractor are frozen, and the fully connected layers are fine-tuned on a target-domain dataset. Experimental data are collected from subjects during both normal walking and walking while wearing the exoskeleton. With a prediction time of 100 ms, the proposed method improves locomotion mode prediction accuracy for new subjects by 9.53% during free walking and by 8.29% during exoskeleton-assisted walking. These results suggest that the approach can improve locomotion mode prediction accuracy for new subjects, supporting reliable human motion intention prediction in lower-limb exoskeletons.
Keywords: lower-limb exoskeleton; locomotion mode prediction; surface electromyography; transfer learning; multi-sensor information fusion
Abstract: Mobile crowd sensing (MCS) is a large-scale data sensing paradigm that collects data through sensing devices carried by users, and task allocation is one of its main challenges. This paper studies task allocation for mixed users with delay-sensitive tasks of heterogeneous quality. The design objective is to maximize task completion quality under a total budget shared by opportunistic users and participatory users. Because existing mobility prediction methods lack accuracy, this paper proposes a mobility prediction model based on transfer learning. By transferring data from established participants with rich trajectories to new participants, it reduces the prediction errors caused by scarce historical data. Based on this model, a mixed-user task allocation algorithm is designed. The algorithm uses the mobility prediction model to allocate tasks to opportunistic users. The remaining tasks are then clustered into areas, and a bipartite graph matching problem is constructed to assign participatory users to task areas. Subsequently, an ant colony optimization algorithm based on travel distance balance (ACOTD) is proposed to achieve optimal path planning under each user's travel distance budget. Extensive simulation experiments on real datasets compare the algorithm with existing algorithms. The results show significant advantages in task completion quality and task allocation efficiency, verifying the algorithm's effectiveness.
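The step that binds participatory users to task areas is a bipartite matching. A minimal sketch follows, assuming a hypothetical utility matrix: for example, the expected quality gain of sending user i to area j. The sketch finds the one-to-one assignment with maximum total utility by brute force. The paper's actual edge weights and matching algorithm are not specified in the abstract.

```python
from itertools import permutations


def match_users_to_areas(utility):
    """Max-total-utility one-to-one matching of users (rows) to task areas (cols).

    Brute force over permutations: fine for a handful of users, exponential in
    general. Assumes there are at least as many areas as users.
    """
    n_users, n_areas = len(utility), len(utility[0])
    best_value, best_assignment = float("-inf"), None
    for areas in permutations(range(n_areas), n_users):
        value = sum(utility[user][area] for user, area in enumerate(areas))
        if value > best_value:
            best_value, best_assignment = value, dict(enumerate(areas))
    return best_assignment, best_value
```

For realistic sizes, a polynomial-time solver such as the Hungarian algorithm (for example, SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`) would replace the brute-force loop.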
Abstract: Voting is an important decision-making method in modern society. This paper proposes an efficient multi-center quantum secure voting scheme based on quantum walks and semi-quantum techniques. The scheme involves multiple voters and multiple quantum centers, among other participants. It uses semi-quantum techniques to reduce equipment cost and ease implementation. The quantum centers compute in parallel, and combining ring and star topologies reduces the communication load on the central nodes. This makes voting and vote counting more efficient and suits scenarios with a large number of voters. When vote counts are aggregated among the quantum centers, the initial quantum resources are two-particle product states, which are easy to prepare and require only single-particle measurements, simplifying operation and reducing the difficulty of vote counting. The scheme can effectively detect and resist various attacks, thereby ensuring security.
Keywords: quantum voting; multi-center parallel computing; quantum walk; semi-quantum key distribution; dimensional quantum system
Abstract: Multimodal intent recognition (MIR) is a critical research task for understanding human intent in the real world. It aims to determine the speaker's intent from multiple modalities, including language, visual, and audio. However, existing MIR studies primarily focus on constructing multimodal semantic context for textual data, while the rich semantic information in the visual and audio modalities, such as action and emotional semantics, remains underexplored. Although the visual and audio modalities carry intent-related semantics, their inherent redundancy and noise hinder their effective use. To address these challenges, this paper proposes an MIR model that better leverages audio and visual information while suppressing redundant information. The model constructs primary semantic features that suppress redundancy and guides the learning of intra-modality and inter-modality semantic associations at different scales. Building on this, the model exploits the potential intent consistency across modalities. It pairs audio and visual representations with textual features, which carry more explicit intent-related semantics, to filter out irrelevant semantics that the intent recognition task alone cannot eliminate. Furthermore, the model uses a multimodal fusion gating mechanism to integrate intent semantics from different modalities. Experiments on several intent understanding datasets show that the proposed method effectively extracts audio and visual semantics and filters out semantics irrelevant to intent recognition. It outperforms existing MIR methods, improving accuracy (ACC), precision (P), recall (R), and F1 score (F1) by 0.7 to 1.8 percentage points.
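The abstract mentions a multimodal fusion gating mechanism but does not give its form. The sketch below shows one common pattern, under stated assumptions. A scalar gate per non-text modality, computed from the text feature and that modality's feature, controls how much of the modality is added to the text representation. The gate parameterization and the names `w_audio` and `w_visual` are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(text, audio, visual, w_audio, w_visual):
    """Fuse modality vectors with scalar gates conditioned on the text feature.

    Each gate g_m = sigmoid(w_m . [text; h_m]) scales how much of modality m
    is added to the text representation.
    """
    fused = list(text)
    gates = {}
    for name, h, w in (("audio", audio, w_audio), ("visual", visual, w_visual)):
        g = sigmoid(sum(wi * zi for wi, zi in zip(w, text + h)))
        gates[name] = g
        fused = [f + g * hi for f, hi in zip(fused, h)]
    return fused, gates
```

A gate near 0 effectively discards a noisy modality for that sample, which is one way such a mechanism can suppress redundant audio or visual information. Real models typically use learned vector-valued gates rather than fixed scalars.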
Abstract: Anti-spoofing of deepfake speech is an important technique in generative artificial intelligence (AI) security. Beyond binary classification of real versus forged speech, speech forgery method recognition is becoming an important part of interpretable anti-spoofing strategies. To evade this recognition, attackers may use adversarial attacks that add perturbations, imperceptible to the human ear, into forged speech to degrade the accuracy of speech forgery method recognition (SFMR) models. To address this threat, the concept of an adversarial defense boundary is proposed from the defender's perspective. On this basis, Taylor expansion is used to analyze theoretically how network randomness and decision boundary distance affect adversarial robustness, and an SFMR algorithm based on a robust adversarial defense boundary (RADB) is proposed. The algorithm adopts two modules, random transform (RT) and decision boundary distance regularization (DBDR), to achieve robust adversarial defense. The RT module simulates interference that forged speech may encounter in real-world scenarios and randomly transforms the input speech during both training and inference, exploiting randomness to improve adversarial robustness. The DBDR module introduces a decision boundary distance regularization loss that encourages the model to raise the upper bound of its adversarial robustness and reduces the sensitivity of its class predictions to adversarial perturbations. Experiments use two typical SFMR datasets: Chinese fake audio detection (CFAD) and the 2019 automatic speaker verification spoofing and countermeasures challenge (ASVspoof2019). Compared with existing state-of-the-art baselines, the proposed algorithm improves SFMR accuracy under adversarial attacks by 5.63% and 5.95%, reaching 93.98% and 91.71%, respectively.
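The RT module's use of randomness at inference can be sketched as follows. Each input is randomly transformed several times, and the classifier's class probabilities are averaged. The specific transforms (random gain, additive Gaussian noise, circular time shift), their ranges, and the helper names are assumptions chosen to mimic playback or channel interference; the paper's actual transform set may differ. The DBDR loss is not shown.

```python
import random


def random_transform(wave, rng, max_gain_db=3.0, noise_std=0.005, max_shift=160):
    """Randomly rescale, add Gaussian noise to, and circularly shift a waveform."""
    gain = 10 ** (rng.uniform(-max_gain_db, max_gain_db) / 20)
    shift = rng.randint(0, max_shift)
    noisy = [gain * s + rng.gauss(0.0, noise_std) for s in wave]
    # Circular shift to the right by `shift` samples (no-op when shift == 0).
    return noisy[-shift:] + noisy[:-shift] if shift else noisy


def randomized_predict(classifier, wave, n_draws=8, seed=0):
    """Average class probabilities over several randomly transformed copies."""
    rng = random.Random(seed)
    totals = None
    for _ in range(n_draws):
        probs = classifier(random_transform(wave, rng))
        totals = probs if totals is None else [t + p for t, p in zip(totals, probs)]
    return [t / n_draws for t in totals]
```

During training, the same `random_transform` would be applied to each batch so that the model sees the transformed distribution it faces at inference. Here, `classifier` is any hypothetical callable that maps a waveform to class probabilities.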
Abstract: Natural language description-driven object tracking guides the visual tracking task with natural language descriptions. It fuses textual descriptions with image information so that the model perceives and understands the world “like a human”. With the development of deep learning, new methods for natural language-driven visual tracking are emerging. However, most existing methods are limited to two-dimensional space and do not fully exploit position information in three-dimensional space, so they cannot perceive the world in three dimensions as naturally as humans do. In addition, most existing 3D object tracking tasks rely on expensive sensors and face limitations in data acquisition, which makes 3D object tracking even more complicated. To address these challenges, this paper proposes a new task, natural language-driven object tracking in 3D (NLOT3D) from a monocular view, and constructs a corresponding dataset, NLOT3D-SPD. This paper also designs an end-to-end model, NLOT3D-TR (Natural Language-driven Object Tracking in 3D based on Transformer), which fuses visual and textual cross-modal features and achieves excellent experimental results. Several comparative experiments and ablation studies provide a comprehensive benchmark for the NLOT3D task and strong support for further development in 3D object tracking.
Abstract: To address concept drift in non-stationary data streams that evolve over time, this paper proposes ICDC, an incremental density-based clustering algorithm designed for concept drift detection and adaptation over data streams. ICDC enhances the one-pass clustering framework with a lazy outlier handling mechanism, in which outlier evaluation is triggered by newly arrived data to distinguish potential micro-clusters from noise. During clustering, data points and micro-clusters must satisfy both feature dependency and temporal dependency conditions, which effectively filters outliers from the potential outlier set. This approach prevents the irreversible deterioration of cluster structures caused by absorbing outliers, a limitation of existing outlier processing methods. ICDC also incorporates an outlier life cycle adjustment mechanism to efficiently control the growth of the buffer. Using changes in cluster structure as concept drift indicators, we propose a detection algorithm that enhances ICDC's sensitivity to local and global pattern shifts as the data stream evolves. We evaluate ICDC on multiple real and synthetic datasets, assessing clustering quality, performance, concept drift detection and adaptation, memory overhead, and computational overhead. Experimental results show that ICDC outperforms existing algorithms on most datasets, achieving superior clustering accuracy and effectively detecting concept drift.
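The interplay of lazy outlier handling, the feature and temporal dependency conditions, and outlier lifetimes can be illustrated with a toy one-pass clusterer. This is a hypothetical sketch, not ICDC itself. The class name and the parameters `radius`, `min_pts`, `time_window`, and `lifetime` are assumptions, and the concept drift detector is omitted.

```python
import math


class LazyOutlierBuffer:
    """Toy one-pass clusterer with lazy outlier handling (hypothetical sketch).

    - Points near an existing micro-cluster center are absorbed.
    - Other points wait in an outlier buffer. Each arrival re-evaluates the
      buffer: >= min_pts points that are close in feature space (radius) AND
      in time (time_window) are promoted to a new micro-cluster.
    - Buffered points older than `lifetime` are dropped as noise.
    """

    def __init__(self, radius, min_pts, time_window, lifetime):
        self.radius, self.min_pts = radius, min_pts
        self.time_window, self.lifetime = time_window, lifetime
        self.clusters = []  # each entry: [center, count]
        self.buffer = []    # each entry: (point, arrival_time)

    def insert(self, point, t):
        # Life cycle control: expire stale outliers so the buffer stays bounded.
        self.buffer = [item for item in self.buffer if t - item[1] <= self.lifetime]
        for cluster in self.clusters:
            center, count = cluster
            if math.dist(center, point) <= self.radius:
                # Incremental mean update of the micro-cluster center.
                cluster[0] = [c + (x - c) / (count + 1) for c, x in zip(center, point)]
                cluster[1] = count + 1
                return "absorbed"
        # Lazy evaluation triggered by the new arrival: feature + temporal dependency.
        neighbours = [item for item in self.buffer
                      if math.dist(item[0], point) <= self.radius
                      and t - item[1] <= self.time_window]
        if len(neighbours) + 1 >= self.min_pts:
            members = [p for p, _ in neighbours] + [list(point)]
            center = [sum(col) / len(members) for col in zip(*members)]
            self.clusters.append([center, len(members)])
            used = {id(item) for item in neighbours}
            self.buffer = [item for item in self.buffer if id(item) not in used]
            return "new micro-cluster"
        self.buffer.append((list(point), t))
        return "buffered"
```

Outliers are never merged into existing clusters directly. They either accumulate into a new micro-cluster or expire, which mirrors the abstract's point about not letting outliers corrupt the established cluster structure.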
Abstract: Single-image high dynamic range (HDR) reconstruction avoids the ghosting artifacts that multi-exposure HDR imaging can cause, and it has attracted extensive research. However, existing methods still struggle to restore details in poorly exposed regions because they do not focus on critical information. To address this issue, this paper proposes a single-image HDR reconstruction method based on multi-attention and perceptual weighted learning, which aims to infer a high-fidelity HDR image from a single low dynamic range (LDR) image. Because restoring poorly exposed regions requires compensating information from other regions, a multi-attention vision transformer (MA-ViT) with global-local receptive fields is designed. It combines depthwise separable convolution with attention mechanisms to achieve more effective global and local feature extraction and interaction. In addition, a loss-aware weighted map is proposed to guide the network to focus on poorly exposed regions, further improving HDR reconstruction quality. Comprehensive comparative experiments on multiple benchmark datasets show that the proposed method improves the average peak signal-to-noise ratio (PSNR) by 0.23 dB over the state-of-the-art method while producing HDR reconstructions of higher visual quality.
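The idea of steering the loss toward poorly exposed regions can be sketched with a per-pixel weight map applied to an L1 loss. This simplified sketch marks LDR pixels below `low` or above `high` (values in [0, 1]) as poorly exposed and up-weights them by `boost`. The thresholds, the boost value, and the purely exposure-based map are assumptions. The paper's "loss-aware" map presumably also depends on the reconstruction loss, which is not modeled here.

```python
def exposure_weight_map(ldr, low=0.05, high=0.95, boost=4.0):
    """Per-pixel weights that up-weight poorly exposed LDR pixels (values in [0, 1])."""
    return [[boost if (v < low or v > high) else 1.0 for v in row] for row in ldr]


def weighted_l1(pred, target, weights):
    """Weighted mean absolute error, normalized by the total weight."""
    num = den = 0.0
    for pr, tr, wr in zip(pred, target, weights):
        for p, t, w in zip(pr, tr, wr):
            num += w * abs(p - t)
            den += w
    return num / den
```

Normalizing by the total weight keeps the loss scale comparable across images with different fractions of saturated or underexposed pixels.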
Keywords: single-image high dynamic range reconstruction; deep learning; inverse tone mapping; attention mechanism; perceptual weighting
Abstract: Open-world object detection aims to identify both known and unknown categories in dynamic environments while enabling incremental learning of new categories. However, because unknown categories lack semantic representation, the guidance information for known and unknown categories is mutually coupled, which limits detection performance. To solve this problem, this paper proposes an open-world object detection method based on causal prompt distillation, which combines a vision-language model with causal inference to address semantic bias between categories in open scenes. Specifically, by constructing a structural causal model, this paper reveals, from a causal perspective, the semantic interference path between known and unknown categories. Causal prompt learning is then proposed, which explicitly introduces semantic priors of the open scene by generating semantic vectors for unknown categories, enhancing the model's perception of unknown objects. Finally, to address semantic bias in knowledge transfer, a causal distillation mechanism is proposed, in which a dual distillation loss decouples the teacher model's guidance information for known and unknown categories. Experimental results show that the method performs well on multiple datasets, improving mean average precision (mAP) for known categories by 1.3% and recall for unknown categories (U-Recall) by 6.5%. These results validate the effectiveness and robustness of the proposed approach.
Abstract: The adversarial robustness of deep learning models is crucial for developing trustworthy artificial intelligence. The field widely uses adversarial attack methods to evaluate model robustness indirectly. However, such evaluations depend on a specific attack method and perturbation level and fail to reflect the intrinsic characteristics of models. Meanwhile, the few existing metrics that assess adversarial robustness directly require prior knowledge of adversarial perturbations or assume that training data follow a specific distribution, which limits their applicability. To address these challenges, this paper starts from the intrinsic characteristics of models and proposes DBSE, a simple and effective adversarial robustness evaluation metric. DBSE exploits the correlation between adversarial robustness and decision boundary smoothness. A decision boundary sampling strategy obtains samples near the decision boundary to approximate and characterize the model's actual decision boundary. Singular value decomposition then extracts the spatial structure of the decision boundary, and Shannon entropy quantifies how variation is distributed across directions, yielding the DBSE metric. Experimental results show that DBSE outperforms representative evaluation metrics in independence, effectiveness, and efficiency. These metrics include ASR (Attack Success Rate), EBD (Empirical Boundary Distance), ACTC (Average Confidence of True Class), ACAC (Average Confidence of Adversarial Class), MP (Minimal Perturbation), and ROBY. DBSE also reduces time consumption by 55% compared with EBD.
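The SVD-plus-entropy step can be sketched for a small set of boundary samples. This is a minimal sketch, assuming the samples are already given; the paper's boundary sampling strategy is not shown. It centers the samples, computes the singular values as square roots of the eigenvalues of the Gram matrix (via a cyclic Jacobi eigenvalue routine, to stay dependency-free), normalizes them into a distribution, and takes its Shannon entropy. Whether DBSE normalizes singular values or their squares is not stated in the abstract. The function names are hypothetical.

```python
import math


def sym_eigenvalues(a, sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via the cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def boundary_singular_entropy(samples, rel_eps=1e-12):
    """Shannon entropy of normalized singular values of centered boundary samples.

    Low entropy: variation concentrated in few directions (a flatter boundary).
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in samples]
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(d)] for i in range(d)]
    lams = sym_eigenvalues(gram)
    top = max(lams)
    if top <= 0.0:
        return 0.0
    # Singular values are sqrt of Gram eigenvalues; drop numerically zero ones.
    sigmas = [math.sqrt(lam) for lam in lams if lam > rel_eps * top]
    total = sum(sigmas)
    return -sum((s / total) * math.log(s / total) for s in sigmas)
```

Samples lying on a line give entropy 0, while samples spread evenly in two directions give ln 2. In practice, `numpy.linalg.svd` would replace the hand-written eigenvalue routine.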
Abstract: Question generation over knowledge graphs (KGQG) aims to automatically generate natural language questions from knowledge graph (KG) facts. Existing methods directly transform an instantiated KG subgraph into a question and usually adopt the teacher-forcing training strategy. However, current methods still face two major challenges: (1) instantiated KG subgraphs lack a deterministic query intention, causing a semantic mismatch between the input and the target output; (2) teacher-forcing training suffers from exposure bias at the inference stage. To resolve this semantic ambiguity, this paper proposes a two-stage framework for complex question generation consisting of facts-to-query and query-to-question. In the first stage, a query graph generator converts KG subgraphs into query graphs with different query intentions. In the second stage, a question generation model encodes the query graphs with densely connected graph convolutional networks (GCNs) and decodes questions with the bidirectional and auto-regressive transformers (BART) model. To alleviate exposure bias, we train the question generator with generative adversarial imitation learning. The discriminator adaptively learns reward functions by imitating the labeled data and guides the question generator to explore high-reward regions of the potential question space. Extensive experiments on three widely used datasets demonstrate the effectiveness of the proposed framework.
Abstract: Terahertz (THz) communication is one of the key technologies for sixth-generation (6G) mobile communications. It offers enormous bandwidth, supports ultra-high transmission rates, and has broad application prospects. THz channel modeling is fundamental to the design, simulation, and optimization of THz communication systems. Research on THz channels mainly covers channel measurement, characteristic analysis, and channel modeling. This paper first introduces time-domain and frequency-domain channel measurement methods. It then summarizes recent research activities, including research institutions, measurement scenarios, methods, antenna configurations, and measured channel characteristics. Next, it summarizes the propagation characteristics of THz waves, including the propagation mechanisms of electromagnetic waves in the THz band and the large- and small-scale channel parameters. Finally, it outlines prospective directions and challenges for future THz channel research, aiming to enhance the understanding and exploitation of this promising spectrum for 6G and beyond.
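Two textbook quantities help make the large- and small-scale parameters concrete: Friis free-space path loss (a large-scale baseline) and the RMS delay spread of a power delay profile (a small-scale dispersion measure). The sketch below computes both. It is illustrative only. Measured THz path loss also includes effects such as molecular absorption, which the free-space formula ignores. The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_path_loss_db(distance_m, freq_hz):
    """Friis free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def rms_delay_spread(delays_s, powers_lin):
    """RMS delay spread of a power delay profile (delays in s, powers linear)."""
    total = sum(powers_lin)
    mean = sum(p * t for p, t in zip(powers_lin, delays_s)) / total
    second = sum(p * t * t for p, t in zip(powers_lin, delays_s)) / total
    return math.sqrt(max(second - mean * mean, 0.0))
```

At 300 GHz, free-space loss over 1 m is already about 82 dB, and each doubling of distance adds about 6 dB. In frequency-domain measurements, the power delay profile fed to `rms_delay_spread` is typically obtained from the inverse DFT of the measured channel transfer function.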
Abstract: In recent years, autonomous driving has gained increasing attention because of its significant potential to improve road safety and traffic efficiency. The perception system plays a crucial role in modern autonomous driving systems, aiming to accurately estimate the state of the surrounding environment and provide reliable observations for prediction and planning. Within the perception system, 3D object detection is an important component that predicts the positions, sizes, and categories of objects around the autonomous vehicle. This paper comprehensively reviews recent research advances in 3D object detection for autonomous driving. From the perspectives of single-modal detection and multi-modal fusion detection, it discusses the advantages and limitations of methods that use different sensors. The paper also compares the performance of representative algorithms on public datasets, summarizes commonly used training strategies, and discusses future development directions in this field.