Latest Issue

    Vol. 53, Issue 11, 2025

      Large-Scale Models and the Internet

    • Multi-Model Serial and Parallel Collaborative Inference in AI-ModelNet

      LIU Zhong-ren, LI Zhe-tao, WANG Jian-hui, XIAO Yong, ZENG Xi-yu, LI Jun, MO Guang-feng
      Vol. 53, Issue 11, Pages: 3817-3835(2025) DOI: 10.12263/DZXB.20250503
      Abstract: Large language models (LLMs), empowered by massive parameter scales and strong semantic representation capabilities, have achieved breakthrough progress in natural language processing, computer vision, and related fields, and have gradually become a key foundation of modern intelligent systems. However, increasing demands for lightweight deployment, on-device customization, and scenario-specific specialization have led to the rapid emergence of task-specific models. Although these specialized models exhibit strong capabilities within their respective domains, they cannot handle complex multi-task and multi-domain reasoning on their own, which motivates research on multi-model collaborative inference. Existing studies primarily focus on model fusion or a single collaboration paradigm, which limits the exploitation of complementary strengths across models, and they lack a systematic exploration of collaboration structures and path mechanisms. To address these challenges, this study proposes a collaborative inference framework for model-interconnection scenarios, enabling an evolutionary shift from linear chain structures to multi-path composite structures. The framework formalizes two basic paradigms, serial inference (SI) and parallel inference (PI), and further introduces two hybrid strategies, serial-to-parallel (S2P) and parallel-to-serial (P2S), to dynamically coordinate depth- and breadth-oriented collaboration pathways. Comprehensive experiments on mathematical reasoning, knowledge understanding, and symbolic reasoning show that SI, PI, S2P, and P2S improve accuracy by 24.33, 16.66, 26.66, and 25.33 percentage points, respectively, compared with single-model inference. Additional analysis shows that hybrid collaboration significantly reduces overall inference latency while achieving higher accuracy, demonstrating a superior performance-efficiency trade-off. Moreover, the study reveals the structural impacts of different collaboration paths, offering theoretical insights and empirical evidence for the design of multi-model networks and efficient model-interconnection systems.
      Keywords: large model; model interconnection; multi-model collaboration; serial inference; parallel inference; composite collaborative inference
      Updated: 2026-02-10
    • ZHANG Hu, SUN Ming-hui, LIU Yang, DAI Hong-jun, WANG Ji-bin, ZHANG You-li
      Vol. 53, Issue 11, Pages: 3836-3851(2025) DOI: 10.12263/DZXB.20250468
      Abstract: The rapid evolution of artificial intelligence (AI) has propelled the large-scale application of open-source large language models (LLMs) across diverse scenarios. However, as the performance of individual graphics processing units (GPUs) has increased substantially, GPU resources often sit idle when serving inference workloads for small- to medium-sized LLMs, leading to insufficient overall computing utilization. To enhance GPU efficiency in data centers, spatio-temporal sharing and virtual GPU (vGPU) technologies are widely adopted for resource multiplexing. Notably, vGPU has emerged as the mainstream solution for providing GPU services in multi-tenant and multi-task environments, owing to its fine-grained resource partitioning and robust security isolation. Nevertheless, GPU resource sharing inevitably introduces performance interference among workloads, particularly given the dynamic and bursty resource demands characteristic of LLM inference. Neglecting such interference can lead to a significant surge in inference latency and trigger service level objective (SLO) violations, thereby compromising the stability and user experience of LLM services. To address this challenge, this paper proposes an efficient, interference-aware resource provisioning method for LLM inference workloads on vGPUs. First, we construct a multi-dimensional performance characterization dataset through large-scale concurrent inference experiments, covering various LLM parameter sizes, workload co-location combinations, and load intensities. On this basis, a lightweight performance interference prediction model is established, incorporating model features, hardware specifications, and system monitoring metrics. This model provides precise estimates of key performance indicators while meeting the real-time requirements of resource decision-making. Leveraging this prediction model, we further design a constraint-optimization-based economic resource allocation algorithm. With the objective of minimizing GPU resource consumption, subject to constraints that keep inference latency within SLO thresholds and throughput at the level that business demands require, the algorithm optimizes GPU resource allocation by dynamically adjusting the vGPU partition ratio of each workload. We evaluate the proposed method in a mixed workload environment comprising two categories and six typical LLMs. The experiments are conducted on NVIDIA A100 and RTX6000 platforms using the HAMi vGPU solution and are benchmarked against traditional GPU provisioning strategies. Experimental results demonstrate that the proposed method reduces GPU resource overhead by over 20% compared to mainstream schemes while strictly adhering to SLO constraints. These findings validate the effectiveness and economic viability of the approach in LLM inference scenarios, providing technical support for data centers to enhance GPU utilization, reduce AI service deployment costs, and facilitate the large-scale adoption of open-source LLMs.
      Keywords: large models; inference workloads; vGPU; performance interference; resource allocation
      Updated: 2026-02-10
    • ZHANG Kui-yuan, ZHANG Qi-liang, CHEN Peng-peng, GAO Shou-wan
      Vol. 53, Issue 11, Pages: 3852-3864(2025) DOI: 10.12263/DZXB.20250472
      Abstract: As underground exploration moves toward deeper, larger, and unmanned operations, mobile robots have become crucial in underground detection and rescue. As the basis of mobile robots, simultaneous localization and mapping (SLAM) provides reliable support for their autonomous navigation and obstacle avoidance. To cope with sensor degradation, constrained computing resources, limited sensing range, and serious cumulative drift of mobile robots in large-scale underground environments, a robot-edge collaborative SLAM (Re-CoSLAM) method with tightly coupled ultra-wideband (UWB) is proposed. Based on an edge-assisted multi-modal SLAM framework, Re-CoSLAM designs a UWB tightly coupled absolute pose estimation method based on the error-state Kalman filter to improve absolute localization performance. Combined with the UWB absolute localization, a scalable multi-agent collaborative SLAM framework and an adaptive transmission mechanism are further established. To ensure global consistency, Re-CoSLAM proposes a joint pose graph optimization algorithm with UWB relative range constraints between the agents. In addition, considering the constrained computing resources of the edge server, a task scheduling strategy based on request priority is devised to reduce queuing latency and improve tracking accuracy. Re-CoSLAM is fully deployed on three mobile robots equipped with NVIDIA on-board computers and on an edge server, and extensive experiments and evaluations are performed in indoor corridor, underground garage, and underground tunnel scenarios. The results indicate that Re-CoSLAM achieves an absolute localization accuracy of 7.3 cm and a speed of 13 frames per second across these scenarios, with localization errors reduced by more than 50% compared to existing solutions.
      Keywords: multi-modal fusion; simultaneous localization and mapping; ultra-wideband localization; robot-edge collaborative; multi-agent cooperation
      Updated: 2026-02-10
    • SHEN Hang, WANG Xu, WANG Tian-jing, DAI Yuan-fei, BAI Guang-wei
      Vol. 53, Issue 11, Pages: 3865-3879(2025) DOI: 10.12263/DZXB.20250470
      Abstract: In cross-platform and cross-lingual social network environments, the spread of misinformation is characterized by high concealment and cross-cultural complexity, posing serious challenges to public opinion governance and social trust systems. Due to significant differences in linguistic and cultural expression, traditional deep learning-based detection methods often suffer from performance degradation in cross-domain generalization and semantic modeling, exhibiting insufficient cross-domain feature alignment, incomplete semantic representation, and limited understanding of metaphors, emotions, and cultural contexts. To address these limitations, this paper proposes a large language model (LLM)-enhanced self-supervised domain adaptation (DA) detection framework. By integrating the deep semantic modeling capacity of LLMs with the discriminative feature learning capability of contrastive learning (CL), the framework achieves robust and generalizable cross-lingual misinformation detection. The framework establishes a closed-loop system encompassing semantic augmentation, feature alignment, and feedback optimization. First, a prompt-based cross-lingual text augmentation mechanism is designed to guide the LLM in maintaining semantic integrity and cultural adaptability during data generation. This enables the production of high-quality samples that preserve the semantic core of the original text while conforming to the linguistic style of the target language, effectively mitigating semantic gaps in cross-lingual contexts. Next, a dual-dimensional contrastive strategy aligns local lexical features at the token level and global semantic logic at the sentence level, unifying source and target domain representations at multiple levels to enhance feature distribution consistency and cross-lingual detection stability. Finally, an LLM-assisted cross-lingual training mechanism is introduced, in which the contrastive loss serves as a dynamic feedback signal to guide the iterative fine-tuning of the LLM. This process progressively refines the augmentation strategy, ensuring that the generated data distribution converges toward the CL detector’s decision boundary and enabling the co-evolution of cross-lingual data augmentation and feature learning. Experimental results on two heterogeneous social media datasets, Weibo (a Chinese social platform) and PHEME (an English dataset of event-related rumor propagation), demonstrate that the proposed method significantly outperforms direct detection with commercial LLMs (e.g., ChatGPT-4o), mainstream deep learning models (e.g., LSTM, TextCNN, RCNN, HAN), and existing LLM-enhanced methods (e.g., LACL) in terms of accuracy and F1 score. In cross-lingual detection, the average detection accuracy of the proposed approach exceeds that of the baseline methods by more than 10 percentage points. Further feature visualization analysis confirms that the method compresses intra-class variance and enlarges inter-class separability, resulting in clearer decision boundaries and higher classification confidence.
      Keywords: social network misinformation; large language model (LLM); contrastive learning (CL); cross-language text augmentation; domain adaptation (DA)
      Updated: 2026-02-10
    • LIAO Ling-ling, TAO Ming, XIE Ren-ping, ZHANG Yin, YUAN Hua-qiang
      Vol. 53, Issue 11, Pages: 3880-3893(2025) DOI: 10.12263/DZXB.20250411
      Abstract: Large language models (LLMs) have exhibited exceptional performance in inference. However, achieving real-time and high-efficiency inference in complex industrial scenarios remains a significant challenge. Traditional centralized cloud-based inference architectures are constrained by the latency of long chain-of-thought (CoT) reasoning and by transmission bottlenecks, which makes them unable to meet the stringent low-latency requirements of complex industrial inference. Conversely, although lightweight LLMs deployed at the edge can respond rapidly, their limited inference capabilities compromise inference quality. Edge-cloud collaborative inference therefore emerges as a natural choice. However, single-modal LLMs struggle to accommodate modality-specific characteristics and diverse task requirements, while the widespread applicability of multimodal LLMs is limited by their high computational costs. Moreover, directly employing an LLM for complex inference often leads to hallucinations, undermining inference reliability. To address these issues, this paper proposes a fine-grained LLM inference task offloading framework based on edge-cloud collaboration. Specifically, lightweight, modality-specialized LLMs are deployed at the edge to process simple tasks with minimal latency, while a powerful multimodal deep LLM resides in the cloud to execute complex logical reasoning tasks, ensuring inference quality. Complex LLM inference is decomposed into three stages and modeled as a directed acyclic graph (DAG). With this representation, the communication and inference models are constructed, and LLM inference is formulated as the problem of minimizing a weighted sum of overall inference latency and cost. After proving that this problem can be transformed into a discrete Markov decision process (MDP), and considering the complex interactions between subtask features and dynamic system resource states, a solution named UCB-COMA is designed. It integrates an upper confidence bound (UCB)-based action selection mechanism with counterfactual multi-agent policy gradient (COMA) to jointly optimize the scheduling order and the execution location of the inference subtasks. Experimental results demonstrate that UCB-COMA outperforms the comparison schemes.
      Keywords: large language model; edge-cloud collaboration; task offloading; deep reinforcement learning; industrial internet of things
      Updated: 2026-02-10
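
The abstract above names a UCB-based action selection mechanism without giving its formula. As a rough illustration only, the sketch below uses the standard UCB1-style exploration bonus; the function name `ucb_select`, the constant `c`, and the way the value estimates are supplied are all assumptions, and how the paper couples this rule with COMA is not shown here.

```python
import math
from typing import List


def ucb_select(q_values: List[float], counts: List[int], step: int, c: float = 2.0) -> int:
    """Pick an action index with a UCB1-style rule (illustrative, not the paper's exact rule).

    q_values: current value estimate per action (e.g. per offloading target).
    counts:   how many times each action has been tried so far.
    step:     total number of selections made so far (>= 1).
    c:        exploration constant (hypothetical default).
    """
    # Any action that has never been tried is selected first.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # Otherwise add an exploration bonus that shrinks as an action is tried more often.
    scores = [
        q + c * math.sqrt(math.log(step) / n)
        for q, n in zip(q_values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal visit counts the rule falls back to the higher value estimate; a rarely tried action can still win through its larger bonus, which is the exploration behavior such a mechanism is meant to provide.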

      PAPERS

    • P-Slicer: A Program Slicing Approach Based on Learning Path Representations

      LIU Tian-yang, SHI Jian-jun, YE Jia-wei, JI Wei-xing
      Vol. 53, Issue 11, Pages: 3894-3909(2025) DOI: 10.12263/DZXB.20250824
      Abstract: Program slicing is a foundational technique in software analysis, indispensable for tasks such as program understanding, defect localization, and code refactoring. Its primary challenge is to precisely identify code fragments related to a given slicing criterion within complex control and data flow structures. Recently, program slicing approaches based on pre-trained large language models have shown promising results, owing to their strong capability in capturing program semantics. However, because of the models’ input-length limits, these approaches struggle with practical scenarios such as long methods and interprocedural dependencies. To address these problems, this paper proposes P-Slicer, a program slicing approach based on learning path representations. The approach first extracts multiple execution paths by building a control flow graph from the syntactic structure, achieving high code coverage while preserving contextual information. Then, a learning-based classification model is employed to determine the relevance of each statement to the slicing criterion. Finally, a define-use propagation mechanism for variables is employed to obtain interprocedural slices through recursive analysis. The approach integrates semantic comprehension while preserving scalability, thereby enhancing the accuracy and practicality of the slicing results. The experimental results demonstrate that P-Slicer achieves 95.95% accuracy, 86.89% precision, and 88.95% recall on the slicing task, while maintaining robust performance when handling long methods and interprocedural slices, indicating its promising potential for application in software engineering.
      Keywords: program slicing; path extraction; interprocedural analysis
      Updated: 2026-02-10
    • LIU Chao-yi, GENG Hao-bang, GE Ya-wei, LIN Han, HOU Na, ZHAO Er-hu, HUANG Li-bo, XU Yong-jun
      Vol. 53, Issue 11, Pages: 3910-3919(2025) DOI: 10.12263/DZXB.20250716
      Abstract: Denoising diffusion probabilistic models (DDPMs), as a core technology in the current generative AI field, have achieved revolutionary breakthroughs in high-quality image synthesis tasks. However, their internal working mechanisms have long been regarded as a “black box”, severely restricting their large-scale application in high-trust scenarios such as medical imaging and autonomous driving. Existing research mostly focuses on macroscopic behavior analysis of the reverse denoising process and lacks a fine-grained deconstruction of the dynamic interaction mechanisms among different semantic regions in the latent space, resulting in a significant gap between model interpretability and precise control. This study explores the interpretability of DDPMs from the new perspective of decoupled visual concept generation. The findings not only explain the manifestation of locality in DDPMs from a theoretical standpoint but also enable fine-grained image manipulation in downstream applications. Inspired by game theory, we propose to use Shapley values to evaluate the interactions between regions. However, calculating Shapley values according to the traditional definition is infeasible in terms of time complexity. Therefore, we further propose a theorem and an accompanying sampling strategy that reduce the time complexity to O(KC), where K is the number of regions and C is the number of samples. Qualitative and quantitative experiments show that our method, when applied to real image processing, achieves a 30% to 55% performance improvement in local manipulation compared with existing methods. In practical applications, users can modify specific visual concepts without interfering with other regions. By deeply integrating game theory with DDPMs, this work theoretically clarifies, for the first time, the mathematical essence and implementation path of locality in diffusion models, and constructs in practice the first interpretable DDPM framework with semantic decoupling capability.
      Keywords: computer science; artificial intelligence; large models; interpretability; DDPMs
      Updated: 2026-02-10
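
To give a feel for how sampling brings Shapley estimation down to a cost that grows with K times C, here is a minimal sketch of the generic permutation-sampling estimator. It is not the theorem or sampling strategy proposed in the paper; `sampled_shapley` and `value_fn` are hypothetical names, and `value_fn` stands in for whatever score the model assigns to a set of active regions.

```python
import random
from typing import Callable, FrozenSet, List


def sampled_shapley(
    num_regions: int,
    value_fn: Callable[[FrozenSet[int]], float],
    num_samples: int,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal gains over random orderings.

    Each sampled ordering needs num_regions + 1 calls to value_fn, so the total
    is num_samples * (num_regions + 1) calls, i.e. O(K * C) evaluations.
    """
    rng = random.Random(seed)
    phi = [0.0] * num_regions
    for _ in range(num_samples):
        order = list(range(num_regions))
        rng.shuffle(order)
        coalition = set()
        previous = value_fn(frozenset(coalition))
        for region in order:
            # Marginal contribution of `region` when it joins the current coalition.
            coalition.add(region)
            current = value_fn(frozenset(coalition))
            phi[region] += current - previous
            previous = current
    return [total / num_samples for total in phi]
```

For an additive toy game, where the value of a set is the sum of per-region weights, every ordering yields the same marginal gains, so the estimate matches the weights for any number of samples. In general the estimates always sum to the value of the full set minus the value of the empty set.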
    • An Infrared Polarization Based UAV Detection Method for Complex Environment

      QIAO Xin-bo, GUO Yang, ZHAO Yong-qiang, LIU Liang
      Vol. 53, Issue 11, Pages: 3920-3931(2025) DOI: 10.12263/DZXB.20250496
      Abstract: When detecting unmanned aerial vehicles (UAVs) under complex interference, existing methods based on radar, radio frequency, and vision systems suffer from high false alarm rates and low accuracy. To address these problems, a UAV detection method for complex scenes is built on an infrared division-of-focal-plane (DoFP) polarization imager. By exploiting the polarization difference between the UAV and the background, the proposed attention network with distillation demosaic (ANDD) obtains detection results directly from the mosaic image captured by the DoFP imager, with high accuracy and in real time. First, the method reconstructs low-noise multi-polarization information using the pretrained polarization demosaic distillation network (PDMDN). Then, primary features are extracted by the backbone network. Finally, the polarization attention (PAT) network further exploits the polarized features to produce the UAV detection result. To evaluate the proposed ANDD network, an infrared polarization UAV detection dataset with complex interference is constructed using an infrared DoFP imager. Comparative experiments against state-of-the-art methods show that the proposed method achieves favorable results, demonstrating the effectiveness of ANDD.
      Keywords: infrared polarization; division of focal plane; deep learning; demosaic; attention network; object detection
      Updated: 2026-02-10
    • EL-RIS Empowered Multicast-Unicast Covert Communication

      LÜ Lu, LIANG Qi-hui, FENG Yun-peng, YANG Long, GUAN Xin-rong
      Vol. 53, Issue 11, Pages: 3932-3942(2025) DOI: 10.12263/DZXB.20250532
      Abstract: This paper proposes an extremely large-scale reconfigurable intelligent surface (EL-RIS)-empowered hybrid multicast-unicast covert communication system to overcome the “double-fading” effect and energy leakage issues in conventional RIS-assisted systems. By leveraging its massive array scale, the EL-RIS significantly expands the near-field region. This expansion not only enhances the user channel gain but also introduces a degree of freedom in the distance domain for covert communication, thereby substantially reducing the signal energy leakage to the eavesdropper. The multicast signal is used to broadcast public information to multiple users, with its transmit power following a uniform distribution to create power uncertainty, while the unicast signal carries the users’ covert information. The multicast signal provides a “cover” for the covert transmission of the unicast signal, confusing the eavesdropper’s detection and effectively improving the covert rate. Energy focusing is achieved at the user locations via the EL-RIS, while energy leakage at the eavesdropper is suppressed. To maximize system performance, we formulate a non-convex optimization problem that jointly designs the base station beamforming and the EL-RIS reflection coefficients, aiming to maximize the users’ covert sum rate under constraints on covertness and successful multicast signal decoding. An efficient alternating optimization (AO) algorithm is proposed, which decouples the original problem into two subproblems: base station beamforming optimization and EL-RIS reflection coefficient optimization. The weighted minimum mean square error (WMMSE) algorithm is employed to reformulate the objective function, and the non-convex constraints are transformed into second-order cone (SOC) forms and further represented as linear matrix inequalities (LMIs), thereby reducing the complexity of the optimization problem. For the design of the EL-RIS reflection coefficients, the unit-modulus constraints render the problem non-convex and computationally challenging. To address this, a low-complexity algorithm based on the alternating direction method of multipliers (ADMM) is proposed. By introducing auxiliary variables, the original problem is decoupled. Subsequently, an augmented Lagrangian function is constructed to decompose the problem into multiple tractable subproblems, thereby enhancing the algorithm’s computational efficiency. Simulation results demonstrate that the proposed scheme significantly outperforms benchmark schemes in terms of covert communication rate and that the unicast signal energy is focused at the user locations. Under the near-field channel model, the beam focusing realized by the EL-RIS improves the covert sum rate by 57% compared to the far-field model. Increasing the number of EL-RIS units can further extend the near-field region, enhance the beam focusing effect, and improve the communication system’s robustness against changes in the eavesdropper’s location. Even in extreme eavesdropping scenarios, the system can maintain a high covert communication rate.
      Keywords: covert communication; extremely large-scale reconfigurable intelligent surface; hybrid multicast-unicast transmission; near-field communication; weighted minimum mean square error
      Updated: 2026-02-10
    • JIA Xi-bin, YANG Chuan-xu, FAN Chao, ZHENG Yi-ming, YANG Zheng-han, YANG Da-wei, XU Hui
      Vol. 53, Issue 11, Pages: 3943-3955(2025) DOI: 10.12263/DZXB.20250677
      Abstract: Multi-modal medical image segmentation can fully exploit the rich complementary information across different magnetic resonance imaging (MRI) sequences (e.g., T1, T1ce, T2, FLAIR), achieving markedly superior accuracy and robustness compared to single-modality approaches in complex lesion segmentation tasks such as brain tumor segmentation. However, the vast majority of existing methods rely on the strong assumption that all modalities are available during inference, whereas in real-world clinical practice one or several modalities are frequently missing due to patient motion, heterogeneous scanning protocols, equipment constraints, or absent historical data, causing drastic performance degradation and severely limiting practical deployability. To address this challenge, we propose a novel spatial Mamba-based missing-modality-robust lesion segmentation network (SM3RNet), which is designed to maintain stable performance across arbitrary modality combinations from the encoding to the decoding stage. SM3RNet first introduces a Mamba-based multi-branch spatial feature encoder (SME) that assigns an independent Mamba branch to each modality and performs efficient bidirectional long-range dependency modeling along the x, y, and z axes, realizing global contextual modeling of 3D medical volumes with only linear computational and memory complexity, in contrast to the quadratic burden of Transformer-based alternatives. To maintain segmentation stability when some modalities are absent, SM3RNet extracts and leverages discriminative features shared across modalities through a multi-view attention-guided cross-modal feature fusion module (MACF). By operating simultaneously from channel, spatial, and inter-modality perspectives, MACF dynamically amplifies the contribution of shared semantic features, adaptively coordinates varying modality combinations via attention mechanisms, and effectively alleviates the performance drops caused by missing modalities. Furthermore, a parallel dual-stream attention decoder (DSD) is integrated into the skip connections to refine multi-scale fused features along both spatial and channel dimensions, significantly enhancing lesion discriminability and boundary detail recovery and thereby yielding more accurate and complete final segmentation maps. Extensive comparative and ablation experiments on the BraTS2020 and BraTS2018 datasets validate the proposed method: when all modalities are available, SM3RNet outperforms existing methods on metrics such as Dice, and with randomly missing modalities it still surpasses state-of-the-art methods specifically designed to handle missing modalities. This demonstrates strong robustness and significant potential for clinical deployment, providing an efficient and reliable approach to clinically practical multi-modal medical image segmentation.
      Keywords: multi-modal; medical image; semantic segmentation; missing modalities; Mamba; attention mechanisms
      Updated: 2026-02-10
    • LIU Yu-xin, WANG Yi-hang, WANG Si-yang, XIA Wen-chao, ZHAO Hai-tao, BU Xian-de
      Vol. 53, Issue 11, Pages: 3956-3969(2025) DOI: 10.12263/DZXB.20250658
      Abstract: With the rapid advancement of the Internet of Things, the volume of data generated by large-scale, ubiquitous distributed terminals is surging. To enhance the intelligence of network services, we adopt the semi-decentralized federated edge learning (SD-FEEL) paradigm, in which multiple edge servers each coordinate a cluster of terminals to perform local model updates while periodically exchanging updates among themselves. This approach preserves learning performance while effectively mitigating network congestion. However, real-world deployments encounter key challenges: inadequate incentives reduce terminals’ motivation to participate in training, and wireless communication interruptions during training can degrade overall training efficiency. To address these issues, this paper proposes an incentive mechanism for SD-FEEL scenarios that leverages evolutionary game theory and the optimization of interruption probabilities. First, we design a terminal contribution metric that incorporates both data quantity and quality, along with a corresponding reward function to encourage participation from high-quality terminals. This not only boosts global model performance but also ensures fairness in incentives. Second, we introduce an evolutionary game framework to model terminals’ bounded rationality and dynamic decision-making behaviors. This framework balances edge server loads, determines the optimal proportions of terminals associating with each server within the population, and maximizes the population’s overall utility. Building on this foundation, we further optimize the specific terminal-to-edge-server association strategies with the goal of minimizing the probability of wireless communication interruptions. Simulation results demonstrate that the proposed method can effectively balance the edge server load. Compared to the random access method and the reputation-aware incentive mechanism (RAIM) scheme, the communication interruption probability is reduced by 32.04% and 35.55%, respectively, and the model accuracy is improved by 3.58% and 4.34%, respectively.
      Keywords: semi-distributed federated learning; incentive mechanisms; evolutionary games; probability of interruption; terminal contribution evaluation; smart IoT
      Updated: 2026-02-10
    • QIU Ben-liu, WANG Lan-xiao, QIU He-qian, GAO Xiang-yu, WEN Hai-tao, LI Hong-liang
      Vol. 53, Issue 11, Pages: 3970-3982(2025) DOI: 10.12263/DZXB.20250413
      摘要:Online continual learning (OCL) aims at learning a non-stationary data stream in a way of reading each data sample only once, and hence suffers from insufficient learning. To address this problem, we propose a feature fusion method in this work. Our method leverages augmented samples of an image for producing anchor features, and incorporates them to obtain a fused feature via a weighted summation operation. The weights are determined by the similarity between anchor features and a pre-designated pivotal feature of the image. Optimizing the cross-entropy loss of this fused feature can accelerate the learning process, resulting in better performance on the current task. Additionally, we propose a consistency loss that restricts the mean-square error between the fused feature and the pivotal feature, which can further improve the performance on the current task. Finally, we provide a theoretical analysis about the gradients of cross-entropy loss to model parameters. This analysis reveals the relationship between the feature fusion and the gradient re-weighting. Extensive experiments are conducted on three benchmarks under OCL settings, including CIFAR-10, CIFAR-100 and Tiny-ImageNet. Our method surpasses baselines at most 7.00%, 8.04%, 6.33% for average end accuracy on CIFAR-10, CIFAR-100 and Tiny-ImageNet, respectively. Experimental results demonstrate the proposed method is effective, and achieves substantial improvement over previous methods for online continual learning.  
      Keywords: image recognition; continual learning; online learning; class incremental learning; feature fusion; gradient re-weighting
      Updated: 2026-02-10
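The similarity-weighted fusion and the consistency loss described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which similarity measure or normalisation is used, so cosine similarity and a softmax are assumed here, and the names `fuse_features` and `consistency_loss` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def fuse_features(anchors, pivot):
    """Weighted sum of anchor features.

    Weights are a softmax over each anchor's cosine similarity to the
    pivotal feature (both choices are assumptions, see the lead-in).
    """
    sims = [cosine(a, pivot) for a in anchors]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]  # shift by max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        sum(w * a[i] for w, a in zip(weights, anchors))
        for i in range(len(pivot))
    ]
    return fused, weights

def consistency_loss(fused, pivot):
    """Mean-square error between the fused and the pivotal feature."""
    return sum((f - p) ** 2 for f, p in zip(fused, pivot)) / len(pivot)

# Three toy anchor features (e.g. from augmented views) and one pivot.
anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pivot = [1.0, 1.0]
fused, weights = fuse_features(anchors, pivot)
loss = consistency_loss(fused, pivot)
```

In this toy setting the anchor that points in the same direction as the pivot receives the largest weight, which is the behaviour the abstract attributes to similarity-based weighting.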
    • A Hybrid Quantum-Graph Neural Network for Multimodal Sentiment Analysis

      LI Xing-guang, CAI Yu-jian, CUI Wei, LI Jin-song, ZHANG Ying-yu
      Vol. 53, Issue 11, Pages: 3983-3995(2025) DOI: 10.12263/DZXB.20250554
      Abstract: Multimodal sentiment analysis (MSA) is one of the most promising technologies in the field of affective computing. Visual, acoustic, and textual modalities encode most human emotional features. Integrating them yields a finer, multidimensional representation of subjective affect. However, achieving accurate and robust sentiment analysis still faces significant challenges. When the sentiment feature subsets extracted from each modality differ in element quantity or temporal alignment, an effective strategy for selecting representative emotional features is essential to prevent key features from being overlooked or over-extracted, thereby ensuring the reliability of subsequent fusion analysis. Direct fusion of representative features across modalities often fails to fully exploit information transmission and complementarity, which can cause excessive reliance on a single modality’s semantic representation and lead to overfitting or misclassification. Furthermore, human emotional expression exhibits modality heterogeneity and inconsistency, often resulting in uneven feature distributions and polarity ambiguity. Algorithmic models must not only capture cross-modal complementary information and fine-grained correlations but also suppress redundant features that interfere with emotional discrimination. The presence of a “semantic gap” in data fusion further limits result stability. To address these issues, this paper proposes a hybrid quantum-graph neural network, inspired by multi-scale temporal representation and qubit-based polymorphic encoding. First, a topological graph network of representative sequences is constructed to capture dynamic relationships among feature nodes, and a multi-head graph attention mechanism is introduced to adaptively adjust node and edge weights, ensuring reliable selection of critical sentiment features.
Then, a quantum sentiment feature computation network is designed, mapping multimodal features into a high-dimensional Hilbert space via quantum encoding. Leveraging quantum superposition and entanglement, the model enhances deep intermodal coupling and dependency modeling. Through quantum measurement, superposed states collapse into specific eigenstates, establishing a correspondence between quantum states and sentiment features, and yielding more discriminative multimodal fusion representations. Finally, single-modal and multimodal predictions are formulated as multiple subtasks under a multitask collaborative optimization framework. Pseudo-label generation and shared representations improve task-specific performance, while a joint multitask loss mitigates inconsistencies among modality representations, enhancing the model’s generalization ability. Experimental results on the CMU-MOSI, CH-SIMS, and CMU-MOSEI benchmark datasets demonstrate that, compared with conventional baselines, the proposed method improves binary classification accuracy by 1.5%~8.7%, five-class accuracy by 3.3%~10.7%, and seven-class accuracy by 1.5%~14.5%. The F1 score increases by up to 8.5 points, the Pearson correlation coefficient improves by up to 0.146, and the mean absolute error decreases by up to 0.304.
      Keywords: multimodal sentiment analysis; graph neural network; quantum machine learning; cross-modal information fusion; multitask optimization
      Updated: 2026-02-10
    • TU Hua-qing, LIU Shuo, FANG Xu-xin, MA Bo, LI Chuan-huang, ZHU Jun, ZOU Tao
      Vol. 53, Issue 11, Pages: 3996-4009(2025) DOI: 10.12263/DZXB.20250421
      Abstract: With the rapid development of emerging services such as the Industrial Internet, Internet of Vehicles, and telemedicine, multimodal networks have emerged. This architecture is based on the design principle of “separating technical systems from network environments”, allowing multiple network modalities to coexist on a unified infrastructure platform. However, existing studies mostly focus on the construction of multimodal network environments, compilation optimization, and network element design, while lacking systematic research on load balancing in the distributed control plane. Some approaches inspired by the switch migration and dynamic reallocation mechanisms of software-defined networking (SDN) can alleviate controller overload to some extent, but they require frequent synchronization of state information among controllers. This leads to high migration overhead and response delays, making it difficult to meet the real-time and scalability requirements of multimodal networks. To address these challenges, this paper proposes a method called joint optimization of routing and polymorphic network element controller allocation (JRECA), which optimizes the control plane load distribution in multimodal networks by rationally planning data-plane traffic routing. The method explicitly incorporates the differences in control information scales among different modalities into the optimization framework, comprehensively considering constraints such as network element allocation, routing selection, controller processing capacity, and link bandwidth. Considering the heterogeneous nature of multimodal networks, this paper introduces a load-balancing mechanism that accounts for the differences in control information scale among modalities within the controller load constraints.
It constructs a unified model that simultaneously achieves control-plane load balancing and data-plane throughput maximization, thereby realizing coordinated optimization between the control plane load and data-plane throughput—addressing the shortcomings of decoupled optimization in prior research. Furthermore, a theoretically grounded two-step algorithm framework is designed: First, a multimodal network element-controller allocation algorithm based on maximum-load priority is developed to determine the matching relationships between network elements and controllers, with a proven approximation ratio that strictly bounds algorithmic performance. Then, under dynamic traffic conditions, an online routing algorithm based on the primal-dual method is designed, with competitive ratio analysis providing a theoretical lower bound on online optimization performance. Simulation experiments on two representative topologies—Fat-Tree and ARPANet—demonstrate that the proposed algorithm achieves significant performance improvements across five network modalities: IPv4, IPv6, industrial control identifiers, named data identifiers, and identity identifiers. Compared with benchmark algorithms, the proposed method reduces controller load by 17.56%~20.97% and increases system throughput by 13.86%~29.82%.  
      Keywords: polymorphic network; distributed; control plane; load balancing; routing; approximation algorithm
      Updated: 2026-02-10
    • ZHU Qiang-qiang, HU Zhi-ming, WANG Si-fan, WU Yang, SHEN Zi-ang, JIA Hao-wen, XU Ruo-feng, GUO Qing-chao, ZHAO Lei
      Vol. 53, Issue 11, Pages: 4010-4021(2025) DOI: 10.12263/DZXB.20250241
      Abstract: The electromagnetic properties of dispersive media play a critical role in various engineering fields such as radar stealth and antenna design. Accurately and efficiently modeling dispersive media has long been a challenging research focus in computational electromagnetics. Although classical dispersion models can describe specific dispersive effects, they exhibit limitations in fitting complex broadband dispersive responses, making them difficult to apply in multi-band, multi-mechanism coupled complex scenarios. To achieve accurate and efficient time-domain analysis of the electromagnetic characteristics of dispersive media, this paper constructs a generalized dispersive media (GDM) mathematical model based on the vector fitting technique, enabling precise characterization of the dispersive properties of media. Combined with the auxiliary differential equation method and local time-stepping (LTS) technique, an efficient solving algorithm for dispersive media based on the discontinuous Galerkin time-domain (DGTD) method is developed. For given frequency-domain response data of dispersive media, this paper introduces the vector fitting technique and, under physical constraints and mathematical transformations, establishes a generalized dispersive media model that includes real poles and complex conjugate pole pairs, thereby unifying the description of relaxation-type and resonance-type dispersive behaviors. To overcome the computational complexity of time-domain convolution introduced by dispersive constitutive relations, the auxiliary differential equation method is employed to construct a DGTD solving scheme suitable for the generalized dispersive media model. This transforms the convolution operation into a set of coupled ordinary differential equations, enabling efficient time-domain stepping solutions.
To further enhance computational efficiency, a local time-stepping strategy based on a low-storage Runge-Kutta integration method is designed, significantly improving the solving speed for dispersive media problems. This paper numerically solves the radar cross section (RCS) of dispersive spheres, dispersive material-coated warheads, and the reflection coefficients of frequency selective surface (FSS) periodic units. The results demonstrate that the generalized dispersive media model constructed via vector fitting accurately describes the frequency-domain dispersive characteristics of the media, with fitting errors consistently maintained at a low level. The obtained RCS and reflection coefficient results are in strong agreement with those from CST commercial software and traditional finite-difference methods, with absolute errors controlled within 3 dB. While ensuring computational accuracy, the introduction of the local time-stepping technique improves overall computational efficiency by over 40.42%. The proposed method provides a numerical analysis tool for electromagnetic simulations of complex dispersive media that combines generality, efficiency, and reliability.  
      Keywords: vector fitting (VF); generalized dispersive media (GDM); discontinuous Galerkin time-domain (DGTD); auxiliary differential equation (ADE); Runge-Kutta integration; local time-stepping (LTS)
      Updated: 2026-02-10
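A pole-residue model with real poles and complex-conjugate pole pairs, as mentioned in the abstract above, can be evaluated with a few lines of code. The sketch below is an assumption-laden illustration rather than the paper's formulation: it assumes the common vector-fitting form ε(ω) = ε∞ + Σ r/(jω − p) with an e^{jωt} time convention, and the function name `gdm_permittivity` and all numerical values are hypothetical.

```python
def gdm_permittivity(omega, eps_inf, real_terms, pair_terms):
    """Evaluate a pole-residue permittivity model at angular frequency omega.

    real_terms: list of (residue, pole) with real values (relaxation type).
    pair_terms: list of (residue, pole) with complex values; each entry is
                expanded into the term and its complex conjugate
                (resonance type).
    """
    s = 1j * omega
    eps = complex(eps_inf)
    for r, p in real_terms:
        eps += r / (s - p)
    for r, p in pair_terms:
        eps += r / (s - p) + r.conjugate() / (s - p.conjugate())
    return eps

# A Debye medium eps_inf + d_eps / (1 + j*omega*tau) is the special case of
# one real pole p = -1/tau with residue r = d_eps/tau.
d_eps, tau = 3.0, 0.5
debye_as_gdm = gdm_permittivity(1.0, 2.0, [(d_eps / tau, -1.0 / tau)], [])
debye_direct = 2.0 + d_eps / (1 + 1j * 1.0 * tau)

# One conjugate pole pair (illustrative numbers only).
pair = [(1 + 2j, -0.3 + 4j)]
eps_pos = gdm_permittivity(1.5, 1.0, [], pair)
eps_neg = gdm_permittivity(-1.5, 1.0, [], pair)
```

Adding each complex pole together with its conjugate makes ε(−ω) equal to the conjugate of ε(ω), which is the condition for a real-valued time-domain response. Under the same convention, each term r/(jω − p) acting on the field corresponds to a first-order ordinary differential equation in time, which is the idea behind replacing the convolution with auxiliary differential equations.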
    • XU Min, HU Chun-ling, HU Ting, ZHANG Fang-fang, DAI Xiang-long
      Vol. 53, Issue 11, Pages: 4022-4034(2025) DOI: 10.12263/DZXB.20250851
      Abstract: Protein function prediction is one of the core tasks in bioinformatics. Although existing methods can fuse multimodal features of proteins, they still suffer from issues such as insufficient prediction accuracy and limited application scope due to reliance on limited experimental data. To address these problems, this study proposes a sequence- and cross-modal alignment-based protein function prediction model (SCMAGO), which takes protein sequences as the sole input. Specifically, it predicts tertiary structure and family domain information using the mainstream tools AlphaFold2 and InterProScan, respectively. It employs the protein large language model (Evolutionary Scale Model Cambrian, ESMC) to achieve sequence embedding, uses the geometric vector perceptron graph neural network (GVP-GNN) to extract tertiary structure features, and further obtains family domain representations through the broadcast embedding method. The SCMAGO model is designed with a two-step cross-modal alignment approach: first, it aligns sequence and structure features at the residue level based on bidirectional cross-attention; second, it further fuses family domain features by combining the graph attention pooling method. Experimental results show that SCMAGO outperforms existing benchmark methods on the Swiss-Prot dataset. Its Fmax values for biological process (BP), molecular function (MF), and cellular component (CC) are 0.487, 0.739, and 0.736, respectively, while the corresponding AUPR values reach 0.507, 0.760, and 0.800. Furthermore, SCMAGO still maintains stable prediction performance for proteins with sequence identity below 40%.
      Keywords: protein function prediction; multimodal fusion; attention mechanism; Gene Ontology
      Updated: 2026-02-10
    • LI Xiao-long, LI Xi, LIU Yang, LI Bing-ting, YI Chang-yan, ZENG Ning-jun
      Vol. 53, Issue 11, Pages: 4035-4050(2025) DOI: 10.12263/DZXB.20250591
      Abstract: Building accurate traffic flow prediction models is crucial for optimizing traffic system management, alleviating urban congestion, and enhancing road network operational efficiency. However, real-world traffic flow exhibits significant non-stationary characteristics and complex spatio-temporal dependencies. In particular, the distribution shifts caused by unexpected events, rush hours, and holidays, coupled with the delayed propagation of traffic congestion across the network, pose severe challenges to traditional forecasting methods. Most existing models, relying on stationary assumptions or static spatio-temporal modeling, struggle to effectively capture the dynamic evolution patterns and heterogeneous delayed dependencies within traffic data, leading to limited prediction accuracy and insufficient practical applicability. To address these limitations, this paper proposes a non-stationary time series traffic flow forecasting model based on delayed spatio-temporal dependencies (NSFM), designed to deeply characterize the dynamic evolution mechanisms of traffic flow from both frequency and spatial domains. The model first employs the Fourier transform to decompose the non-stationary time series into time-varying and time-invariant components, capturing local dynamic fluctuations and global steady-state trends respectively, with orthogonality proven to ensure the independence between the two components, laying a theoretical foundation for subsequent differentiated modeling. Furthermore, the model constructs a feature fusion module with a delay feature extraction mechanism, integrating traffic flow, spatial adjacency relationships, temporal periodic information, and delay propagation features through pointwise convolution and positional encoding, thereby accurately capturing the spatio-temporal evolution and lagged response patterns of traffic states between stations.
To model the spatial autocorrelation structure among discrete stations, this paper introduces the Moran operator to build a function-on-function regression prediction framework. Through basis function expansion and orthogonalization processing, a consistent mapping between the continuous function space and discrete observation stations is achieved, effectively quantifying the spatial dependency strength between regions and enhancing the model’s prediction robustness in complex road networks. To validate the effectiveness and generalization capability of the NSFM model, systematic experiments are conducted on four real-world traffic flow datasets (PEMS03, PEMS04, PEMS07, PEMS08). Experimental results demonstrate that NSFM significantly outperforms existing mainstream models across multiple evaluation metrics. Specifically, the mean absolute percentage error (MAPE) is reduced by 7.48%, 9.86%, 3.20%, and 1.73% respectively compared to SOTA models, demonstrating superior prediction accuracy and stability in non-stationary scenarios.  
      Keywords: delayed spatio-temporal dependence; non-stationary; traffic flow prediction; spatio-temporal evolution characteristics
      Updated: 2026-02-10
    • HUANG Ya-ning, YAN Guang-hui, CHANG Wen-wen, CHENG Wen-xin, WU Bai-jing
      Vol. 53, Issue 11, Pages: 4051-4064(2025) DOI: 10.12263/DZXB.20250482
      Abstract: In traditional driving behavior detection technology based on electroencephalography (EEG), the extraction and fusion methods of multi-dimensional features significantly affect classification performance. Existing approaches are predominantly based on single-modal feature extraction from time or frequency domains, failing to fully utilize nonlinear dynamics or spatial domain analysis. This limitation hinders the comprehensive capture of effective features across different brain regions and frequency bands, thus restricting recognition accuracy. To address this, we propose a multi-dimensional feature fusion model integrating multi-scale time-domain, frequency-domain, and spatial-domain features through dual branches utilizing graph convolutional neural networks (GCN) and EEGNet. First, we extract geometric properties and frequency band distributions from the raw EEG signals to construct time-frequency features. Next, brain network connectivity under different states is measured by calculating phase locking value (PLV), phase lag index (PLI), and mutual information (MI). Subsequently, GCN dynamically optimizes the adjacency matrix and aggregates node information to build spatial-domain features. EEGNet is then employed to extract local spatio-temporal features, enhancing model interpretability. The resulting multi-dimensional features are concatenated, fused, and classified. Our proposed model was evaluated across various dimensions on public datasets, achieving an average classification accuracy exceeding 95.87%, with a peak accuracy of 98.65%. This represents an improvement of 2.95% over the current state-of-the-art results. Our method effectively resolves the problems of suboptimal classification performance and low robustness stemming from reliance on single-modal features.
This work provides a theoretical foundation for the development of wearable intelligent driving systems, particularly offering novel assistive technology pathways for individuals with disabilities who experience difficulties with physical vehicle operation during driving.  
      Keywords: electroencephalography (EEG); emergency braking; driving behavior; graph convolution neural networks (GCN); phase locking value (PLV); phase lag index (PLI); mutual information (MI)
      Updated: 2026-02-10
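The two phase-based connectivity measures named in the abstract above, PLV and PLI, have standard definitions that are easy to state in code. The sketch below uses those textbook definitions and assumes the instantaneous phases of two channels are already available (for example from a Hilbert transform, which the abstract does not specify); the function names and the synthetic phases are hypothetical.

```python
import cmath
import math

def plv(phase_a, phase_b):
    """Phase locking value: magnitude of the mean unit phasor of the
    phase difference. 1 means a constant phase difference."""
    diffs = [a - b for a, b in zip(phase_a, phase_b)]
    return abs(sum(cmath.exp(1j * d) for d in diffs) / len(diffs))

def pli(phase_a, phase_b):
    """Phase lag index: magnitude of the mean sign of sin(phase difference).
    Zero-lag coupling contributes nothing."""
    def sign(x):
        return (x > 0) - (x < 0)
    diffs = [a - b for a, b in zip(phase_a, phase_b)]
    return abs(sum(sign(math.sin(d)) for d in diffs) / len(diffs))

# Synthetic phases: channel B trails channel A by a constant 0.5 rad.
n = 200
phase_a = [0.1 * k for k in range(n)]
phase_lagged = [p - 0.5 for p in phase_a]

plv_lagged = plv(phase_a, phase_lagged)   # constant lag: close to 1
pli_lagged = pli(phase_a, phase_lagged)   # consistent nonzero lag: 1
pli_zero_lag = pli(phase_a, phase_a)      # identical phases: 0
```

The zero-lag case shows the practical difference between the two measures: PLV reports perfect locking for identical phases, while PLI discards it, which is why PLI is often preferred when volume conduction is a concern.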
    • HUANG Chen, MA Hao-bo, ZHANG Yan, YANG Chao, SONG Jian-hua
      Vol. 53, Issue 11, Pages: 4065-4076(2025) DOI: 10.12263/DZXB.20250555
      Abstract: Graph neural networks (GNNs) have gained significant attention in electroencephalography (EEG)-based emotion recognition for their ability to model spatial-temporal dependencies across brain regions and capture context-aware neural patterns. However, most GNN-based EEG emotion recognition methods encounter two primary challenges: (1) many existing models fail to account for the emotional commonality and diversity across local brain regions, resulting in overly homogeneous node embeddings for spatially or functionally adjacent regions; (2) current approaches often rely on simple concatenation or correlation-based priors, which are inadequate for capturing the complex and distributed emotional patterns across multiple EEG channels and frequency bands. In this paper, we propose a tri-subspace decoupling clustering graph neural network (TS-DCGNN) to address the above challenges. Specifically, TS-DCGNN decouples EEG signals into three subspaces: the explicit emotional, implicit emotional, and explicit-implicit resonance subspaces, aiming to capture observable experiences (e.g., “happiness”), automatic responses (e.g., “startle”), and their coupling. Moreover, we introduce a dual-branch propagation architecture where graph attention networks (GATs) and graph convolutional networks (GCNs) operate in parallel to extract explicit and implicit features via attention-driven interaction and hierarchical learning. This enhances regional emotional representations. Furthermore, we present a unified representation learning module that integrates these features and employs information theory to obtain a minimal, sufficient, and discriminative emotional representation. Experiments on three benchmark datasets demonstrate state-of-the-art performance and improved interpretability.
      Keywords: EEG-based emotion recognition; graph neural networks; feature decoupling; local-global modeling; information-theoretic representation learning
      Updated: 2026-02-10
    • DENG Qiao, JIANG Lin, LIU Le-xin, TANG Lü-xin, YANG Ying-li
      Vol. 53, Issue 11, Pages: 4077-4090(2025) DOI: 10.12263/DZXB.20250250
      Abstract: The rapid advancement of artificial intelligence (AI)-generated image technologies poses significant threats to cybersecurity and public trust, as human visual detection accuracy remains as low as 59%, close to random guessing. Existing detection methods suffer from limited performance and poor generalization across generative models, and particularly struggle to capture physical inconsistencies in illumination. To address this gap, we propose L-KAN (Light-enhanced Kolmogorov-Arnold Networks), a novel detection framework that integrates illumination-sensitive features with the Kolmogorov-Arnold (K-A) representation theorem. Building upon red-green-blue (RGB) semantics, frequency-domain cues, and edge information, we construct physically grounded features that encode global illumination distribution, shadow geometry, and multi-scale illumination gradients to expose lighting inconsistencies in synthetic images. Leveraging the K-A theorem for feature fusion, our method synergizes inner and outer functions to enhance feature complementarity while suppressing redundancy. Experimental results on three public datasets demonstrate that L-KAN achieves competitive performance compared with state-of-the-art methods.
      Keywords: AI-generated image detection; light-shadow sensitive features; feature fusion; Kolmogorov-Arnold representation theorem
      Updated: 2026-02-10
    • WANG Hou-neng, YIN Jin-xiao, LIAO Xiao-bing, YE Shi-feng
      Vol. 53, Issue 11, Pages: 4091-4103(2025) DOI: 10.12263/DZXB.20250514
      Abstract: Driven by energy transition and the “dual carbon” goals, islanded microgrids serve as flexible and reliable distributed energy carriers. Their multi-mode switching control, such as grid-connected/islanded operation mode switching and internal multi-energy coordination switching during islanded operation, is crucial for ensuring power supply quality, system stability, and economic efficiency. However, with the continuous expansion and increasing complexity of islanded microgrids, multiple operational challenges emerge: strong internal nonlinear couplings, uncertain external random disturbances, parameter uncertainties, and high control costs and complexity. These issues can lead to switching instability and poor transient performance. Microgrid switched systems enable mutual switching among internal energy sources and can also connect to the main grid, allowing transitions between grid-connected and islanded modes, with more complex dynamic characteristics and operational conditions. This paper focuses on a nonlinear switching system of microgrids with planned islanding, where the nonlinear characteristics between grid-connected and islanded modes intensify coupling relationships among variables and disturbances may cause chattering or even system instability. First, an improved cross backstepping sliding mode variable control (ICBSMVC) is employed to decouple non-strict feedback systems, utilizing a barrier Lyapunov function to enhance convergence speed. Second, error compensation and sliding mode control are integrated to improve system robustness, and an improved extended state observer is designed to compensate for external stochastic uncertainties. Additionally, dynamic surface control (DSC) is adopted to mitigate the “explosion in computation” problem.
Finally, combining the Harris Hawk optimization algorithm and the sardine swarm optimization algorithm, a hybrid sardine swarm optimization algorithm is proposed to achieve smooth switching between grid-connected and islanded modes under external disturbances, enabling rapid and stable microgrid switching control while ensuring voltage and frequency stability. Simulation experiments on the Matlab platform validate the effectiveness of the proposed control method. Numerical examples demonstrate that the improved extended state observer can track disturbance signals more quickly, with zero tracking error and no chattering. The modified fal function enables more accurate estimation of external disturbances, ensuring rapid stabilization during microgrid switching and reducing steady-state errors. The proposed method achieves frequency stabilization at 50 Hz within 0.16 s when transitioning from islanded to grid-connected mode and restores frequency to 50 Hz within 0.166 s when switching from grid-connected to islanded mode. Voltage waveforms exhibit minimal abrupt changes during mode transitions, confirming the effectiveness of the control method.
      Keywords: microgrid; nonlinear switched system; improved cross backstepping; improved extended state observer; sliding mode control; hybrid salp swarm
      Updated: 2026-02-10
    • WANG Zheng-qiang, LI Chun, REN Xin-zhi, XU Yong-jun
      Vol. 53, Issue 11, Pages: 4104-4115(2025) DOI: 10.12263/DZXB.20240937
      Abstract: Unmanned aerial vehicles (UAVs) and intelligent reflecting surfaces (IRSs) are two key technologies in the sixth-generation mobile communication system. With their high mobility and intelligent beamforming capabilities, they offer a new paradigm for building highly reliable and secure next-generation wireless networks. However, the broadcast nature of wireless channels poses severe challenges to the secure communication of UAVs. Particularly in multi-user scenarios, ensuring secure transmission while maintaining service fairness among multiple legitimate users is a complex and urgent problem. This paper studies an IRS-assisted multi-antenna UAV covert communication system, aiming to address the issue of unbalanced resource allocation among multiple users. To account for user fairness, this paper takes maximizing the minimum average covert rate of the worst legitimate user as the optimization objective, ensuring that all users can obtain an acceptable minimum quality of service. This problem is solved by jointly optimizing user scheduling, the three-dimensional flight trajectory of the UAV, multi-antenna transmit beamforming, and the phase shift matrix of the IRS. Because the optimization problem is highly non-convex and tightly coupled, it is difficult to solve directly with convex optimization methods. Therefore, this paper designs an efficient iterative algorithm based on block coordinate descent, decoupling the original problem into four more tractable sub-problems. According to the characteristics of each sub-problem, methods such as successive convex approximation, quadratic transformation, scaling, and variable substitution are adopted to transform them into convex optimization problems, which are then solved efficiently through an alternating optimization mechanism. Simulation results show that the proposed algorithm converges quickly.
Compared with the benchmark schemes without IRS assistance and without trajectory optimization, the proposed joint optimization algorithm can significantly improve the minimum average covert rate of system users.  
      Keywords: UAV communication; IRS; covert communication; resource allocation; trajectory optimization; convex optimization
      Updated: 2026-02-10
    • LI Yu-tong, MA Miao, CHEN Jian-rui
      Vol. 53, Issue 11, Pages: 4116-4131(2025) DOI: 10.12263/DZXB.20250652
      Abstract: Action recognition aims to model and analyze human motions to automatically identify and understand human behaviors, and it has been widely applied in various fields such as intelligent surveillance, human-computer interaction, and smart education. In recent years, self-supervised skeleton-based action recognition has emerged as an important research area due to its low computational cost, strong adaptability, and minimal reliance on labeled samples. However, existing methods often rely on template-based prompts to generate action concept descriptions, which suffer from the lack of spatio-temporal information and limited semantic modeling capability. To address these issues, this paper proposes a cross-modal prior-assisted self-supervised skeleton-based action recognition method, aiming to effectively integrate skeletal structural features with semantic priors to achieve more semantically rich action representations. On the one hand, it employs a dual-branch decoupled skeleton encoder to separately model the spatial structure and temporal dynamics of actions, and integrates a cross-domain contrastive learning strategy to establish feature alignment and consistency constraints from spatial, temporal, and global perspectives, thereby obtaining skeleton-modal features rich in spatio-temporal structure and global context. On the other hand, it feeds temporally concatenated action images along with prompt instructions into a vision-language model to generate action descriptions, and utilizes the text encoder of the contrastive language-image pre-training (CLIP) model to extract text features, thereby supplementing the limited fine-grained semantic representation capability of the skeleton modality.
Furthermore, a cross-modal contrastive learning strategy is proposed, where the textual semantics are dynamically modulated under the guidance of skeleton features using a feature-wise linear modulation (FiLM) mechanism, enabling effective semantic alignment between skeleton and text modalities. Experimental results show that the recognition accuracy of the proposed method outperforms more than ten state-of-the-art approaches, including C2VL, on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets, and surpasses eight competitive methods, such as ACA2Net, on the PKU-MMD-II dataset. The proposed method integrates skeletal structural information with semantic priors, achieving effective complementarity between skeleton features and language semantics, and providing a new perspective for skeleton-based action recognition with low annotation cost. In future work, we will further explore domain-adaptive fine-tuning strategies to enhance the open-set description capability of vision-language models, and develop an online collaborative optimization framework to jointly optimize description generation and action recognition, thereby improving the practicality, intelligence, and interpretability of the proposed method in complex dynamic scenarios such as real-time human-computer interaction and smart education.  
      Keywords: skeleton-based action recognition; action description generation; cross-modal semantic alignment; vision-language model; contrastive learning; self-supervised learning
      Updated: 2026-02-10
    • WANG Hua-dong, YANG Jian-peng, ZHANG Tian-qi
      Vol. 53, Issue 11, Pages: 4132-4141(2025) DOI: 10.12263/DZXB.20250653
      Abstract: In satellite communication systems, transmission performance degrades because nonlinear distortion from high-power amplifiers is coupled with the linear fading of multipath channels. Traditional blind equalization algorithms, such as the constant modulus algorithm (CMA), are reasonably effective against the linear intersymbol interference caused by multipath. However, they cannot effectively compensate for nonlinear distortion in high-order modulated signals, especially in blind equalization scenarios without training sequences, where sufficient supervision information is difficult to provide. To overcome this challenge, this paper proposes a blind equalization algorithm for nonlinear satellite channels based on decision reconstruction (DR-NEA). DR-NEA adopts a decision-interpolation-reconstruction mode to generate reference signals, thereby jointly compensating for nonlinear and linear distortions under unsupervised conditions. First, the algorithm performs linear equalization on the received signal with the CMA to eliminate the linear distortion caused by multipath effects. Next, a reference signal is generated through decision, interpolation, and reconstruction, providing supervision information for the parameter identification of the nonlinear equalizer. Finally, DR-NEA uses the quasi-Newton method to identify the parameters of the Wiener-type equalizer under the minimum mean square error criterion, thereby jointly compensating for the linear and nonlinear distortions in the channel. Simulation results show that DR-NEA outperforms traditional linear equalization algorithms under high-order modulation modes (32APSK, 32QAM, 64QAM). At a bit error rate of 1×10⁻³, its performance gain over traditional linear equalization algorithms exceeds 4 dB, which reflects its strong nonlinear compensation capability under high-order modulation. In addition, when the decision error rate is lower than 9.44%, DR-NEA remains stable and its output performance is hardly affected, which further verifies the robustness of the proposed algorithm. By introducing a reference signal generation method based on decision reconstruction, DR-NEA addresses the inability of traditional blind equalization algorithms to provide sufficient supervision information. It also adopts the quasi-Newton method for Wiener model parameter identification, which optimizes the nonlinear equalizer efficiently. Experimental results verify the superior performance of the algorithm in compensating for nonlinear and linear distortions, and it is particularly suitable for the transmission of high-order modulated signals. In summary, DR-NEA effectively addresses the joint interference of nonlinear distortion and multipath fading in satellite communication, and it has theoretical significance and broad practical application prospects. In high-data-rate, high-order modulation satellite communication scenarios in particular, it can significantly improve the transmission performance of the system.
      Keywords: satellite channels;decision reconstruction;blind equalization;nonlinear distortion;coefficient identification;bit error rate of decisions
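The first stage described in the abstract above, linear blind equalization with the constant modulus algorithm (CMA), can be sketched as follows. This is a textbook stochastic-gradient CMA, not the authors' DR-NEA: the decision, interpolation, and reconstruction stages and the Wiener-type equalizer are not reproduced. The tap count, step size, center-spike initialization, and the helper `make_qpsk` are assumptions chosen for illustration.

```python
import random


def make_qpsk(count, seed=0):
    """Hypothetical helper: unit-modulus QPSK symbols from a seeded generator."""
    rng = random.Random(seed)
    scale = 2 ** 0.5
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / scale
            for _ in range(count)]


def dispersion(samples, r2=1.0):
    """CMA cost estimate: mean of (|y|^2 - r2)^2 over the samples."""
    return sum((abs(y) ** 2 - r2) ** 2 for y in samples) / len(samples)


def cma_equalize(received, num_taps=5, step=0.01, r2=1.0):
    """Blind FIR equalization with the standard CMA update.

    y = sum_k w_k * x_k, then w_k <- w_k - step * y * (|y|^2 - r2) * conj(x_k).
    No training sequence is used; only the constant-modulus property.
    """
    taps = [0j] * num_taps
    taps[num_taps // 2] = 1 + 0j  # center-spike initialization
    outputs = []
    for n in range(num_taps - 1, len(received)):
        window = received[n - num_taps + 1:n + 1][::-1]  # newest sample first
        y = sum(w * x for w, x in zip(taps, window))
        err = y * (abs(y) ** 2 - r2)
        taps = [w - step * err * x.conjugate() for w, x in zip(taps, window)]
        outputs.append(y)
    return outputs, taps
```

A typical check is to pass QPSK symbols through a mild two-tap channel and compare `dispersion` of the received samples with that of the last equalizer outputs; the latter should be clearly smaller once the taps have adapted.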
      Updated: 2026-02-10
    • NAHGNN: Neighborhood Aware Heterogeneous Graph Neural Network

      LI Qiang, ZHENG Wei, CHEN Ming, TAN Xing-yi, MA Hua
      Vol. 53, Issue 11, Pages: 4142-4156(2025) DOI: 10.12263/DZXB.20250420
      Abstract: Heterogeneous graphs are widely present in complex scenarios such as social networks, recommendation systems, and biological networks. Meta-path-based heterogeneous graph neural networks (HGNNs) explicitly model cross-type indirect relationships via high-order semantic paths, enhancing the ability to capture complex dependencies. However, existing studies either use all meta-path features within a specified length without distinction, which leads to redundant semantic information because the number of generated features grows exponentially with meta-path length, or suffer from over-smoothing caused by high-order aggregation, which results in the loss of edge information. To address these issues, this paper proposes a neighborhood aware heterogeneous graph neural network (NAHGNN). From the perspective of neighborhood awareness and through task decoupling, feature generation is divided into two steps: associative meta-path generation and neighborhood-aware feature aggregation. First, an associative meta-path generation module learns rich semantic information between target nodes by leveraging associative meta-paths, whose start and end nodes are both of the target type. Second, a simple and efficient neighborhood-aware feature aggregation module is designed based on the neighborhood-aware modalities of target nodes to extract the neighborhood information neglected by associative meta-paths. Finally, to fit the semantic representations of the corresponding neighborhood-aware modalities and avoid mutual interference between neighborhood-aware features, a semantic fusion module with a band mask is designed to integrate semantic information across different features. Experimental comparisons are conducted with six mainstream heterogeneous graph neural network baselines on four public heterogeneous graph datasets (DBLP, ACM, IMDB, and Freebase). The results show that NAHGNN achieves a Micro-F1 improvement of 0.63 to 12.50 percentage points in node classification tasks, significantly reduces training time and GPU memory consumption, and exhibits favorable interpretability.
      Keywords: heterogeneous graph;attention mechanism;heterogeneous graph neural networks;meta-path;graph representation learning
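The abstract above characterizes associative meta-paths as those whose start and end nodes are both of the target type. A minimal sketch of that constraint is a brute-force enumeration over a node-type schema. The abstract does not describe how the authors' generation module works, so the enumeration strategy, the function name, and the DBLP-style schema in the usage note are assumptions for illustration only.

```python
def associative_meta_paths(schema_edges, target_type, max_hops):
    """List node-type sequences that start and end at `target_type`.

    `schema_edges` holds undirected (type_a, type_b) pairs describing which
    node types can be linked. Paths of 1..max_hops hops are considered.
    """
    neighbors = {}
    for a, b in schema_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    paths = []
    frontier = [(target_type,)]
    for _ in range(max_hops):
        # Extend every partial path by one hop, in a deterministic order.
        frontier = [path + (nxt,)
                    for path in frontier
                    for nxt in sorted(neighbors.get(path[-1], ()))]
        # Keep only the paths that have returned to the target type.
        paths.extend(path for path in frontier if path[-1] == target_type)
    return paths
```

For an assumed schema with author-paper, paper-term, and paper-venue links and target type `author`, four hops give author-paper-author plus the three four-hop paths through a second paper via author, term, or venue. The frontier itself still grows exponentially with the hop count, which is the redundancy problem the abstract points to.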
      Updated: 2026-02-10

      SURVEYS AND REVIEWS

    • Near-Zero-Index Metamaterials and Applications

      YAN Wen-di, LI Yue
      Vol. 53, Issue 11, Pages: 4157-4170(2025) DOI: 10.12263/DZXB.20250732
      Abstract: Near-zero-index (NZI) media have emerged in recent years as a significant research direction in artificial electromagnetic (EM) media because of their unique physical properties in EM wave manipulation. Unlike traditional materials, NZI media exhibit features such as infinitely stretched wavelength, infinite phase velocity, and unchanged propagation phase when the permittivity or permeability approaches zero. This leads to a spatiotemporal decoupling characteristic described as “temporal oscillation, spatial stillness”. These properties provide new physical pathways to overcome the bottlenecks of conventional devices in terms of size, bandwidth, and shape constraints. This article systematically reviews the physical fundamentals, implementation mechanisms, and typical applications of NZI metamaterials. Starting from the physical mechanisms, it introduces their wavelength stretching, supercoupling effects, and ideal power flow characteristics. Subsequently, the implementation methods of NZI media are summarized, and the rapidly developing theory of “photonic doping” is introduced. This theory involves introducing heterogeneous doping elements into NZI media to fine-tune the effective permeability, thereby constructing NZI metamaterials at subwavelength scales. This method offers advantages such as tunable parameters, geometry independence, and ease of integration, making it an important engineering approach for NZI media. In terms of applications, this article summarizes the typical functions and performance advantages of NZI metamaterials from three perspectives: absorption, transmission, and radiation. In absorption, leveraging the field enhancement effects, impedance matching mechanisms, and perfect coherent absorption in NZI media enables ultra-high-sensitivity sensing, efficient thermal radiation control, and ultra-thin absorbing surfaces. In transmission, utilizing the supercoupling effects, impedance control capabilities, and dispersion engineering of NZI media enables functional devices such as reflectionless energy transmission through arbitrary shapes, high-efficiency bendable interconnects, multiport power distribution, and multichannel frequency division multiplexing. In radiation, exploiting the geometry independence and zero-phase-shift characteristics of NZI media enables wavefront shaping, directional radiation, and reconfigurable radiation patterns, facilitating the construction of shape-independent, highly integrated tunable antenna devices. Currently, NZI metamaterials still face critical challenges such as limited bandwidth, significant losses, and poor process compatibility. Future development directions include developing broadband, low-loss material systems to achieve synergistic optimization of structures and modes; promoting interdisciplinary integration across mechanical, thermal, quantum, and other physical fields; and realizing deep integration with chip and optical platforms.
      Keywords: near-zero-index medium;near-zero-index metamaterials;photonic doping;impedance matching;transmission line;antenna
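The wavelength stretching, infinite phase velocity, and unchanged propagation phase listed in the abstract above follow from the standard plane-wave relations for a homogeneous medium. The block below is a worked sketch for the idealized lossless case with relative permittivity $\varepsilon_r$, relative permeability $\mu_r$, free-space wavelength $\lambda_0$, and propagation distance $d$; these symbols are introduced here for illustration and do not come from the article.

```latex
n = \sqrt{\varepsilon_r \mu_r}, \qquad
\lambda = \frac{\lambda_0}{n}, \qquad
v_p = \frac{c}{n}, \qquad
\Delta\varphi = \frac{2\pi n d}{\lambda_0}

\varepsilon_r \to 0 \ \text{or} \ \mu_r \to 0
\;\Longrightarrow\; n \to 0
\;\Longrightarrow\; \lambda \to \infty, \quad v_p \to \infty, \quad \Delta\varphi \to 0
```

Because the accumulated phase $\Delta\varphi$ vanishes for any $d$, the field oscillates in time with a spatially uniform phase, which is the “temporal oscillation, spatial stillness” picture. Real NZI media have loss and finite bandwidth, so these limits are only approached.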
      Updated: 2026-02-10
    • WANG Zhong-tian, WU Yi-quan
      Vol. 53, Issue 11, Pages: 4171-4198(2025) DOI: 10.12263/DZXB.20250775
      Abstract: With the rapid development of unmanned aerial vehicle (UAV) technology, UAVs are increasingly applied in fields such as military defense, intelligent transportation, facility inspection, disaster relief, and agricultural management, and have become a core driving force of the low-altitude economy. Autonomous landing, one of the key technologies of UAVs, directly determines the safety and reliability of UAV operations. In emergency scenarios such as low battery power, deteriorating weather conditions, or communication disruptions in particular, it can effectively prevent equipment damage and accidents, and it is a crucial step towards full automation of UAVs. Scene perception technology based on vision and deep learning, with its powerful feature learning and pattern recognition capabilities, overcomes the limitations of traditional technologies such as GPS (Global Positioning System) and LiDAR (Light Detection And Ranging) in complex environments, offering a new solution for UAV autonomous landing. This paper systematically reviews scene perception methods for UAV autonomous landing based on vision and deep learning. First, it describes the application background and significance of deep learning in UAV autonomous landing, and traces the technological evolution from traditional sensor-driven approaches to intelligent perception. Then, it analyzes in detail the features and technical challenges of different scenarios: static platform landing focuses on three types of scenarios (landing marks, runway detection, and ground guidance), where the core demand is to improve landing accuracy and reliability; dynamic platform landing covers land vehicles, ships at sea, and other mobile platforms, and must solve the problems of motion tracking and interference suppression; special scenario landing faces multiple challenges such as obstacle occlusion, signal interference, and extreme weather in complex environments like mountains, forests, and urban canyons. This paper examines the core technical system, including the principles and applications of key technologies such as object detection, semantic segmentation, pose estimation, optical flow prediction, and 3D reconstruction. It also analyzes the effects and performance of feature extraction optimization, semantic understanding enhancement, and scene adaptation strategies. Finally, it summarizes the challenges in this field, such as insufficient adaptability to complex environments, computational resource constraints, and data dependence and annotation difficulties, and discusses future research directions. It points out that multi-source sensor data fusion can enhance perception in complex environments, that lightweight models can accommodate the resource limitations of UAVs, and that combining simulation with real scenarios can improve the generalization ability of models. Through this summary and analysis, the paper presents the current technical status and development trends in the field, providing a reference for further research and engineering applications of UAV autonomous landing technology.
      Keywords: autonomous landing;UAV;deep learning;computer vision;object detection;semantic segmentation
      Updated: 2026-02-10