Latest Issue

    Vol. 54, Issue 1, 2026

      PAPERS

    • QIU Jing, NONG Lichen, SUN Yifei, CAO Xiaochun, CHEN Ximing, ZHANG Ruizhi
      Vol. 54, Issue 1, Pages: 1-18(2026) DOI: 10.12263/DZXB.20250681
      Abstract: Guided by the MITRE ATT&CK framework, assessing cybersecurity risks by modeling attackers’ tactical objectives and technical methods through attack graphs has become one of the key approaches to countering complex multi-step attack threats. However, as attack scenarios and attack chains grow increasingly intricate, existing ATT&CK-based attack path modeling and risk assessment methods exhibit certain limitations. On the one hand, current attack path modeling processes only consider direct transition relationships between attack techniques within the ATT&CK framework, overlooking tactical-level attack semantics and weakening the ability to impose high-level semantic constraints on complex multi-stage attack paths. On the other hand, attack graph-based risk quantification methods relying on generic vulnerability characteristics overlook differences in organizational focus on critical assets, resulting in assessment outcomes that lack personalized asset adaptation. To address these challenges, this paper proposes a personalized risk assessment method based on dual-layer association modeling of attack techniques and tactics. First, a dual-layer association model is constructed to capture potential relationships between techniques and tactics. Combined with the Viterbi algorithm, this model infers the evolution paths of attack tactics, introducing tactical-level stage constraints during path inference. Subsequently, a customized threat quantification model is developed by integrating attack behavior attributes with asset-specific characteristics. Through a forward algorithm, state transition probabilities are coupled with threat quantification metrics to achieve holistic network security risk assessment. Experimental results demonstrate that the proposed method outperforms existing mainstream assessment models in both path modeling and risk evaluation capabilities in real-world network environments.
Compared with competing approaches, the proposed method achieves an average improvement of 48.95% in comprehensive risk assessment accuracy, validating its effectiveness and practical value in complex attack scenarios.  
      Keywords: cybersecurity; logical attack diagram; risk assessment; hidden Markov model; ATT&CK framework; risk path identification
      Updated: 2026-06-04
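The abstract by QIU Jing et al. above names the Viterbi algorithm as the tool for inferring the evolution path of attack tactics. The following is a minimal sketch of plain Viterbi decoding on a toy hidden Markov model, not the authors' dual-layer model: the tactic names, technique labels, and every probability below are hypothetical values chosen only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, log-probability of that path)."""
    # best[t][s]: best log-probability of any state path ending in s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[-1][s])
    state_path = [last]
    for t in range(len(observations) - 1, 0, -1):
        state_path.append(back[t][state_path[-1]])
    state_path.reverse()
    return state_path, best[-1][last]


# Hypothetical toy model: hidden states are tactics, observations are techniques.
states = ["initial_access", "execution", "exfiltration"]
observations = ["phishing", "script", "upload"]
start_p = {"initial_access": 0.8, "execution": 0.1, "exfiltration": 0.1}
trans_p = {
    "initial_access": {"initial_access": 0.2, "execution": 0.7, "exfiltration": 0.1},
    "execution": {"initial_access": 0.1, "execution": 0.3, "exfiltration": 0.6},
    "exfiltration": {"initial_access": 0.1, "execution": 0.1, "exfiltration": 0.8},
}
emit_p = {
    "initial_access": {"phishing": 0.8, "script": 0.1, "upload": 0.1},
    "execution": {"phishing": 0.1, "script": 0.8, "upload": 0.1},
    "exfiltration": {"phishing": 0.1, "script": 0.1, "upload": 0.8},
}

path, log_prob = viterbi(observations, states, start_p, trans_p, emit_p)
```

With these toy numbers the decoded tactic sequence is initial_access, execution, exfiltration. Working in log space avoids numerical underflow on longer observation sequences; it assumes every probability in the tables is strictly positive.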
    • Text Prompted Image Coding for Machine

      HUANG Zhimeng, GAO Feng, YANG Fan, MA Siwei
      Vol. 54, Issue 1, Pages: 19-31(2026) DOI: 10.12263/DZXB.20250778
      Abstract: In recent years, with the rapid development of classic machine-to-machine (M2M) communication scenarios such as the Internet of Things (IoT), semantic communication, and smart cities, the real-time transmission and efficient processing of massive visual data between devices have become a critical challenge. In this context, traditional image coding methods, which are primarily optimized for human perceptual quality, often suffer from insufficient analysis accuracy when applied to machine vision tasks due to a fundamental mismatch between their optimization objectives and the requirements of machine analysis. Consequently, image coding for machine (ICM) has emerged, aiming to maintain high analysis accuracy for downstream machine vision tasks (e.g., classification, detection, segmentation) while achieving the lowest possible bitrate, thereby better adapting to the bandwidth and storage constraints in M2M scenarios. However, existing ICM methods still face two major bottlenecks. First, their performance degrades sharply under extremely low bitrates. This is because most current approaches rely on end-to-end nonlinear transformations to extract visual features, failing to fully exploit the compact representation of high-level semantic information within images, which leads to inefficient feature coding. Second, they exhibit weak generalization in open-set scenarios. Most methods are optimized for single tasks or single datasets, lacking the adaptability to unseen categories or cross-domain data, and thus struggle to maintain stable analytical performance in practical, dynamic environments. To overcome these limitations, this paper proposes a novel text-prompted image coding for machine (T-ICM) framework. The core idea is to decouple image information into two complementary components: semantic information and texture information.
The semantic information is represented and encoded in the form of structured text prompts (e.g., object categories, location descriptions), while the texture information is extracted and compressed as task-agnostic general visual features. At the encoder side, the text prompts, owing to their highly abstract and semantically compact nature, can significantly reduce the overall bitrate. The general features are efficiently compressed via our proposed grouped feature coding module. At the decoder side, the text prompts serve not only for direct parsing to accomplish tasks like classification and detection but, more importantly, act as guidance signals. Through a prompt encoder and a mask decoder, they dynamically adjust the semantically relevant regions of the reconstructed general features, enabling feature-level domain adaptation and task-specific adaptation, thereby significantly enhancing the model’s robustness in open-set scenarios. The proposed T-ICM is comprehensively evaluated on multiple standard datasets and tasks. Experiments demonstrate that on dense prediction tasks such as semantic segmentation and instance segmentation, T-ICM can maintain analysis accuracy close to that of using the original uncompressed images even at very low bitrates, significantly outperforming H.266/VVC, learned image codecs, and other existing ICM methods. By migrating semantic information to the highly compressed text modality for transmission and utilizing it to guide feature reconstruction, T-ICM achieves a superior trade-off between coding efficiency and task performance. This work provides a novel perspective and technical foundation for the future development of semantic communication, collaborative edge intelligence, and adaptive machine vision systems.  
      Keywords: video coding; intelligent compression; feature coding; feature coding for machine; deep learning; signal processing
      Updated: 2026-06-04
    • Transposed Projection Envelope Linear Discriminant Analysis

      LI Yongming, ZHAO Wenqiang, LI Fan, ZHANG Xiaoheng, WANG Pin
      Vol. 54, Issue 1, Pages: 32-49(2026) DOI: 10.12263/DZXB.20250938
      Abstract: Linear discriminant analysis (LDA) is a widely used feature extraction method guided by Fisher’s discriminant criterion. It enhances the separability of dissimilar samples and the compactness of similar samples within the subspace, thereby improving the quality of dimensionality reduction results. Because it is mature, interpretable, simple, and efficient, it remains a research hotspot in both academia and industry. Numerous scholars have refined LDA to further enhance its performance. However, these LDA variants model directly at the original sample granularity, utilizing only the information inherent within the samples themselves. The data-information-knowledge (DIK) model indicates that human knowledge acquisition occurs across three levels: data, information, and knowledge. Data must first be transformed into information, from which knowledge is then learned. Human cognitive mechanisms reveal that the information layer encompasses not only the inherent properties of raw inputs but also correlation information among similar inputs. Analogously, in LDA’s dimensionality reduction process, extracted features represent information that should also incorporate correlation information among similar samples to enhance downstream task performance. Furthermore, existing research demonstrates that the correlation information between similar samples is crucial for machine learning model construction and knowledge acquisition. This indicates that existing LDA has limitations, as it does not fully utilize sample information. To address these issues, this paper proposes transposed projection envelope linear discriminant analysis (TPELDA). First, transposed projection transforms original samples into envelope samples that encapsulate correlation information among similar samples.
The core idea of transposed projection is to reduce the dimensionality of a batch of nearest neighbor samples along the sample dimension, ensuring that the resulting envelope sample retains as much information as possible from the original batch. Subsequently, Fisher’s discriminant criterion is employed to learn a reduced-dimension subspace based on these envelope samples. A distribution-difference penalty term is introduced to ensure the reduced subspace’s adaptability to the original samples. Finally, through joint optimization, this method enhances the discriminative features of samples projected into the subspace by incorporating the correlation information among similar samples. Thus, the resulting features simultaneously represent both the intrinsic information of individual samples and the correlation information among similar samples. Experimental results demonstrate that TPELDA outperforms relevant comparison methods across multiple datasets, achieving performance improvements ranging from 2.25% to 13.19%. Additional experimental findings further confirm the effectiveness of the proposed method.
      Keywords: linear discriminant analysis; dimensionality reduction; correlation information; distribution discrepancy; envelope learning; feature extraction
      Updated: 2026-06-04
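The TPELDA abstract above builds on Fisher’s discriminant criterion. As a reminder of what that baseline criterion computes, here is a minimal sketch of classical two-class LDA in two dimensions, where the projection direction is proportional to the inverse within-class scatter matrix applied to the difference of class means. It is not TPELDA: there is no transposed projection, no envelope samples, and no distribution-difference penalty. The sample points are hypothetical.

```python
import math


def class_mean(points):
    """Mean of a list of 2D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def class_scatter(points, m):
    """2x2 scatter matrix of the points around their mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return [[sxx, sxy], [sxy, syy]]


def fisher_direction(class_a, class_b):
    """Unit vector w proportional to Sw^{-1} (mean_a - mean_b)."""
    ma, mb = class_mean(class_a), class_mean(class_b)
    sa, sb = class_scatter(class_a, ma), class_scatter(class_b, mb)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]


# Hypothetical samples: two classes separated along the first axis.
class_a = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.0), (5.0, 1.0)]
class_b = [(0.0, 2.0), (1.0, 3.0), (2.0, 2.0), (1.0, 1.0)]

w = fisher_direction(class_a, class_b)
proj_a = [w[0] * x + w[1] * y for x, y in class_a]
proj_b = [w[0] * x + w[1] * y for x, y in class_b]
```

For this symmetric toy data the direction comes out along the first axis, and the one-dimensional projections of the two classes do not overlap.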
    • Building Privacy Shield in Online Generative AI Services

      QI Tao, WANG Huili, YANG Peiru, WANG Wendan, TAN Zhipeng, HUANG Yongfeng, WANG Shangguang, XU Hongyan, LUO Chuanwen
      Vol. 54, Issue 1, Pages: 50-67(2026) DOI: 10.12263/DZXB.20250793
      Abstract: In recent years, state-of-the-art online artificial intelligence systems have demonstrated remarkable capabilities in various fields, exerting broad social impacts. To access these model services, users are typically required to upload their personal data to the cloud platform. However, these queries may contain sensitive or confidential information, and directly sharing them with cloud platforms introduces potential privacy leakage risks. Moreover, platforms may exploit user data for further model training, causing private information to be memorized by the model and later regenerated in public services, thereby aggravating the risk of privacy breaches. Existing privacy-preserving mechanisms in generative AI applications predominantly rely on prompt sanitization techniques, whose security critically depends on the accuracy of sensitive information identification. These approaches usually require large amounts of annotated data for model training, which not only raises implementation costs but may also introduce new privacy vulnerabilities in specific scenarios. To address this issue, this paper proposes a novel privacy-preserving collaborative learning framework named PrivateAI. The core idea of this framework is to fully exploit sensitive data distributed across different devices to train local privacy identification models, while strictly ensuring data privacy. Meanwhile, PrivateAI extracts the implicit knowledge embedded in large foundation models and compresses it into a lightweight distilled dataset, thereby effectively enhancing the privacy detection performance of local models. In addition, to tackle the heterogeneity challenge between the knowledge extracted from labeled data and foundation models, the framework introduces a heterogeneous knowledge fusion mechanism that aligns and integrates multi-source knowledge from both the foundation models and distributed labeled datasets.
We evaluate PrivateAI on two datasets, and the results demonstrate that models learned by PrivateAI improve the privacy protection success rate by up to 53.7 percentage points. PrivateAI holds significant potential in mitigating privacy breaches, acting as a sentinel against severe privacy leakage incidents within online AI applications.
      Keywords: privacy protection; collaborative learning; online artificial intelligence services; differential privacy; federated learning
      Updated: 2026-06-04
    • CHEN Rongjun, WANG Hongchao, WANG Qinding, QIAO Kai, TIAN Weikang, YANG Dong
      Vol. 54, Issue 1, Pages: 68-85(2026) DOI: 10.12263/DZXB.20250734
      Abstract: With the rapid development of industrial wireless networks and wireless communication technologies, deterministic transmission in wireless networks has emerged as an important research direction. However, the inherent uncertainties of wireless channels, such as multipath fading and co-channel interference, pose significant challenges to achieving deterministic transmission. To address these challenges, the Internet Engineering Task Force (IETF) proposed the reliable and available wireless (RAW) architecture, which adopts time-slotted channel hopping (TSCH) as the underlying technology in industrial wireless network scenarios. To meet reliability and stringent delay requirements, RAW incorporates a variety of mechanisms, including the use of packet replication, elimination and ordering functions (PREOF) to exploit path redundancy and thereby enhance transmission reliability and determinism. Nevertheless, existing scheduling schemes have not sufficiently considered PREOF or the joint optimization of routing and scheduling. This results in redundancy and inefficient resource allocation in the time-frequency domain, limiting the network’s ability to support critical flows. In this work, we formulate the joint optimization problem of multipath routing and scheduling for deterministic flow transmission and propose a hierarchical reinforcement learning-based resource allocation algorithm, termed hierarchical reinforcement resource allocation (HRRA). In HRRA, the high-level policy is responsible for selecting multipath routes, while the low-level policy allocates time-frequency resources based on the high-level routing decisions, explicitly accounting for the elimination of redundant packets by PREOF at aggregation nodes. To address variations in topology size and heterogeneous traffic demands, a graph neural network (GNN) is integrated into the high-level policy to enhance feature representation.
The HRRA algorithm selects appropriate actions according to flow requirements such as deadlines and reliability, thereby maximizing both the number of schedulable flows and overall resource utilization. Through this cross-layer optimization framework and explicit support for PREOF, HRRA not only mitigates redundancy and improves scheduling efficiency but also better supports deterministic communication requirements. Experimental results demonstrate that, compared to baseline schemes such as DGRL+MWIS and EDF-MO, HRRA improves scheduling capability by 10.6% and 36.6%, respectively, while achieving higher resource utilization.  
      Keywords: reliable and available wireless; packet replication, elimination and ordering functions; hierarchical reinforcement learning; graph neural network; network resource scheduling
      Updated: 2026-06-04
    • Textual Semantic Guidance for Infrared and Visible Image Fusion

      ZHU Mingrui, CHEN Xiru, WEI Xin, WANG Nannan, GAO Xinbo
      Vol. 54, Issue 1, Pages: 86-101(2026) DOI: 10.12263/DZXB.20250906
      Abstract: Infrared and visible image fusion (IVF) aims to integrate the complementary information contained in both image modalities by effectively combining the salient targets in infrared images with the rich texture details present in visible images. Through this integration, IVF produces more informative and comprehensive fused images that surpass single-modality inputs. Existing research has demonstrated that deep learning-based fusion methods have achieved remarkable progress in improving fused image quality. However, most of these approaches focus mainly on low-level visual features, and the deep semantic associations between high-level semantic information and visual features have not yet been sufficiently explored. In recent years, with the rapid development of large vision-language models (VLMs), text-guided image fusion methods have exhibited great potential due to their flexibility and versatility. However, the effective integration and utilization of textual semantic information in the image fusion process remain insufficiently studied. To tackle these challenges, this paper proposes a textual semantic guidance method for infrared and visible image fusion, termed textual semantic guidance (TeSG), which guides the image synthesis process in a way that is optimized for downstream tasks such as object detection and semantic segmentation. By explicitly introducing high-level semantic information generated by VLMs into the fusion pipeline, TeSG achieves precise regulation of the fusion process and enhances the semantic consistency of the fused results. TeSG introduces textual semantics at two levels: the mask semantic level and the text semantic level. First, automatically generated textual descriptions from VLMs are employed as global text-level semantic guidance, providing high-level semantic constraints for the fusion process.
Second, based on these textual descriptions, mask semantics corresponding to key target regions are constructed, enabling accurate localization and differentiated modeling of foreground and background regions. Building on this, three core modules are designed to implement the proposed framework. The semantic information generator (SIG) module generates both mask semantics and text semantics from automatically produced textual descriptions. The mask-guided cross-attention (MGCA) module performs preliminary attention-based fusion of visual features from both infrared and visible images under the guidance of mask semantics, thereby realizing mask-level cross-modal feature interaction. Finally, the text-driven attentional fusion (TDAF) module achieves text-level fusion and dynamic weighting through text-guided attention and a gating mechanism, allowing semantic cues to modulate the contribution of different modalities in an adaptive manner. Experimental results demonstrate that the proposed TeSG method, through its dual-level textual semantic guidance paradigm, performs favorably against existing state of the art (SOTA) methods in preserving multimodal texture information and enhancing contrast in the fused images. In addition, TeSG yields superior performance in downstream tasks such as object detection and semantic segmentation, highlighting its task-oriented fusion capability. Compared with current SOTA image fusion approaches, the proposed TeSG achieves an average improvement of 1.4% on downstream tasks, validating its competitiveness and effectiveness while also exhibiting strong generalization ability across different datasets and scene conditions. The proposed method effectively addresses the insufficient exploration of deep correlations between textual and visual features in existing image fusion algorithms, achieving simultaneous improvements in fusion quality and downstream task performance.  
      Keywords: image fusion; infrared and visible images; textual semantic guidance; deep learning; vision-language models; attention
      Updated: 2026-06-04
    • CIMOT3D: Chinese-Instruction-Based Monocular 3D Multi-Object Tracking

      WANG Rong, HU Haixiang, WEI Hongkai, LIANG Haoxiang, QIAN Xiaowei, LI Kaifei, GUO Keyu, SONG Xiangyu, SUN Shijie
      Vol. 54, Issue 1, Pages: 102-114(2026) DOI: 10.12263/DZXB.20250826
      Abstract: Natural language-driven object tracking parses human-like language descriptions and fuses them with visual information to achieve accurate recognition and continuous tracking of specific targets in complex environments. However, existing methods focus on 2D tracking or 3D single-target tracking, and they have not been effectively extended to 3D multi-target tracking. They lack the capability to align text with multiple candidate targets in 3D visual space and to establish associations. In addition, existing natural language-driven 3D object tracking tasks suffer from redundancy in language descriptions, which makes it hard to track multiple specific targets using flexible and concise instructions as humans do. To address these challenges, this paper introduces a new task, Chinese-instruction-based monocular 3D multi-object tracking (CIMOT3D). The paper also constructs a new dataset, CIMOT3D-5k, which contains 5 562 video sequences with human-like Chinese descriptions. Furthermore, this paper designs a neural network model for this task, the Chinese-instruction-based monocular 3D multi-object tracking synchronization tracker (CIMOT3D-SyncTracker), which consists of a multimodal feature extractor, a vision-language encoder-decoder, and a detection-tracking module. Compared with baseline methods, the proposed approach achieves an improvement of 4.1% in tracking accuracy and 5.0% in the identity consistency metric on the CIMOT3D-5k dataset, verifying its performance advantage. This paper advances research on vision-language fusion in 3D multi-object tracking and offers new ideas for further exploration in related fields.
      Keywords: scene understanding; 3D object tracking; multi-object tracking; vision-language model; multimodal learning; machine vision
      Updated: 2026-06-04
    • WANG Ziyao, TIAN Yu, HUANG Junjie, TAN Jie, YANG Wenjing
      Vol. 54, Issue 1, Pages: 115-124(2026) DOI: 10.12263/DZXB.20250975
      Abstract: With the rapid evolution of cloud computing, big data, and artificial intelligence applications, the scale of data centers continues to expand, and the reliability of storage systems has become a critical factor affecting their stable operation and service availability. As a key component of data center storage systems, solid-state drives (SSDs) are widely deployed in the core storage layers of data centers owing to their advantages of high throughput, low latency, and low power consumption. However, under large-scale and long-term operating conditions, SSD failures tend to occur suddenly and follow complex evolution patterns, posing severe challenges to service continuity and data security. To enhance the accuracy and practicality of failure prediction, this paper investigates a machine learning prediction methodology based on classification models and feature engineering, alongside a rule-based reasoning prediction approach utilizing an explicit rule engine and dynamic feature compensation. The machine learning methodology, through multi-stage feature engineering and ensemble learning, achieves a macro-average F1-score of 0.968 under complete data conditions; however, its “black-box” nature somewhat limits its industrial applicability. In contrast, the rule-based reasoning approach constructs an explicit rule engine integrating multiple algorithms and introduces a dynamic feature compensation mechanism based on SHAP (SHapley Additive exPlanations) values. This method attains an accuracy of 0.988 with complete data and maintains an accuracy of 0.941 under extreme conditions with eight missing features, demonstrating strong robustness. Comparative analysis of experimental results indicates that the machine learning methodology excels in prediction accuracy with complete data, while the rule-based reasoning approach offers superior interpretability, real-time performance, and adaptability to missing data.
This paper further explores potential pathways for integrating these two methodologies, providing theoretical support and practical references for constructing next-generation intelligent operation and maintenance systems that possess both perceptual capability and transparent reasoning.  
      Keywords: SSD failure prediction; rule-based reasoning; machine learning; feature engineering; real-time prediction
      Updated: 2026-06-04
    • ZHANG Shihui, ZHAO Pengyu, ZHANG Yao, HAN Shaojie
      Vol. 54, Issue 1, Pages: 125-140(2026) DOI: 10.12263/DZXB.20250521
      Abstract: Despite the remarkable performance of deep neural networks across various fields, the existence of adversarial examples reveals significant security vulnerabilities. Existing black-box attack methods typically operate within a single domain, overlooking the importance of multi-domain feature co-perturbation in enhancing the transferability of adversarial examples. Moreover, many methods suffer from a single-purpose loss function, making it difficult to balance target class guidance and gradient stability. To address these issues, this paper proposes a high-transferability adversarial example generation method based on spatial-frequency dual-domain feature fusion (SFDFF). Specifically, the input examples are first transformed from the spatial domain to the frequency domain using the discrete cosine transform, and region-level feature fusion is performed between the input and clean examples in the frequency domain. Then, the input examples are restored to the spatial domain via the inverse discrete cosine transform, and noise based on the statistical characteristics of the original examples is injected. Next, channel-level fusion of spatial features between the input and clean examples is conducted. Finally, a dual-guidance loss function is designed to simultaneously enhance target class directionality and gradient stability. Extensive experiments on the ImageNet-Compatible and CIFAR-10 datasets demonstrate the performance of the proposed method. For instance, the attack success rate of the proposed SFDFF increases by 2.5% compared to the state-of-the-art method when transferred from the adv-RN-50 model to the LeViT model on the ImageNet-Compatible dataset. The code is available at https://github.com/ipkpkpk/SFDFF.
      Keywords: adversarial examples; feature fusion; frequency domain; spatial domain; black-box attack; transferability
      Updated: 2026-06-04
    • XIE Zhengbin, SUN Zhe, ZHAN Xufeng, LI Xuelong
      Vol. 54, Issue 1, Pages: 141-152(2026) DOI: 10.12263/DZXB.20250753
      Abstract: After performing long-duration ocean exploration and other operational tasks, autonomous underwater vehicles (AUVs) must return to a recovery station for energy replenishment and data transmission. During the AUV’s terminal recovery phase, the positioning accuracy and speed of its positioning system directly influence the success rate of AUV guidance. Among current guidance technologies, acoustic guidance methods offer long operating ranges, but their positioning accuracy struggles to meet close-range docking requirements. Vision-based guidance methods offer higher accuracy, but they are susceptible to interference from external environmental factors such as water turbidity and light scattering. Furthermore, such methods involve complex image feature extraction and matrix operations, placing higher demands on the computing power and power consumption of the computational platform carried by the AUV. To address the limited computing power of AUVs and the poor real-time performance and high computational load of traditional visual methods, this paper proposes a hardware-software integrated lightweight high-speed optical positioning scheme. This study constructs an AUV guidance model based on a multi-quadrant photoelectric detector. In terms of hardware, this scheme uses an 8 × 8 array multi-quadrant area detector mounted on the front of the AUV as the signal receiver, with a group of three light-emitting diode (LED) guidance lights arranged in an equilateral triangle deployed at the front of the recovery station as the signal transmitter. The detector calculates the incident deviation angles of the three optical signals by measuring the centroid position of the incident light spots, avoiding the massive image matrix calculations of traditional visual systems. In the mathematical model, this paper establishes the mapping relationship from angular deviation information to relative spatial coordinates.
Considering that the AUV’s roll angle is constrained during the structural design phase, information regarding the roll angle is removed from this model, effectively reducing positioning accuracy degradation caused by attitude measurement errors and enhancing system robustness. Addressing the non-linear solving problem in space, this paper introduces an improved particle swarm optimization (PSO) algorithm, using the sum of errors between the predicted deviation angles and the actual measured deviation angles as the loss function, achieving rapid estimation of the AUV’s relative pose. To verify the performance of this algorithm, this paper conducted physical simulations and sea trial validations. First, based on a physical model, a simulation dataset containing 100 000 sets of different data was generated, covering different distance and attitude information within 0 m to 20 m. Subsequently, the algorithm was deployed on the low-power edge computing platform Jetson Orin NX for actual testing. Experimental results show that in terms of speed, the system can stably solve for the AUV’s pose information at a frequency of 192 Hz; in terms of accuracy, within the terminal guidance distance of 0.6 m to 2 m, the average positioning error of this algorithm is only 7.81 mm; within the medium-to-long guidance distance of 2 m to 20 m, the average positioning error is 159.90 mm. Furthermore, in sea trial experiments based on a remotely operated vehicle (ROV), this paper used global positioning system (GPS) data as the ground truth benchmark. After deducting hardware baseline errors, the accuracy level of the simulation experiments was maintained, further illustrating the robustness and efficiency of the algorithm in a real underwater environment. 
Compared with existing vision-based guidance methods, this method demonstrates specific advantages in computational load and power consumption while ensuring millimeter-level guidance positioning accuracy: the computation for a single solution of this algorithm is reduced to 1 million floating-point operations (1 MFLOPs), a decrease of 2 to 3 orders of magnitude compared to other methods listed in the paper, and the operating power consumption on the Jetson Orin NX is only about 10 W. This research further alleviates the contradiction between the requirements for high accuracy, high speed, and low computing power in the underwater terminal guidance of AUVs, providing a new approach for the efficient autonomous docking of edge-type underwater robots.
      Keywords: optical guidance; AUV docking; multi-quadrant photoelectric detection; particle swarm optimization algorithm; lightweight; edge computing
      Updated: 2026-06-04
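The abstract above describes estimating the AUV pose with particle swarm optimization, using the sum of errors between predicted and measured deviation angles as the loss. A minimal sketch of that idea follows. It uses a plain global-best PSO rather than the paper's improved variant, a 2D position instead of a full pose, and hypothetical detector positions, bounds, and PSO weights; none of these values come from the paper.

```python
import math
import random

# Hypothetical detector positions (metres); not taken from the paper.
DETECTORS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]


def bearings(pos):
    """Predicted deviation angle from each detector to the point pos."""
    return [math.atan2(pos[1] - dy, pos[0] - dx) for dx, dy in DETECTORS]


def angle_loss(pos, measured):
    """Sum of absolute wrapped differences between predicted and measured angles."""
    total = 0.0
    for p, m in zip(bearings(pos), measured):
        diff = (p - m + math.pi) % (2.0 * math.pi) - math.pi
        total += abs(diff)
    return total


def pso(measured, n_particles=30, n_iter=100, lo=0.0, hi=10.0, seed=0):
    """Plain global-best PSO; returns (best position, best-loss history)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed values)
    xs = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(n_particles)]
    vs = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_loss = [angle_loss(x, measured) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_loss[i])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    history = [gbest_loss]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(2):
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                # Clamp each coordinate to the search bounds.
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)
            loss = angle_loss(xs[i], measured)
            if loss < pbest_loss[i]:
                pbest[i], pbest_loss[i] = xs[i][:], loss
                if loss < gbest_loss:
                    gbest, gbest_loss = xs[i][:], loss
        history.append(gbest_loss)
    return gbest, history


true_pos = (3.0, 4.0)  # illustrative ground truth used to synthesize measurements
estimate, history = pso(bearings(true_pos))
```

The global best is only replaced when a particle improves on it, so `history` never increases. The fixed seed makes the run repeatable.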
    • Joint Detection and Classification in Unsupervised Graph Learning

      LI Sicong, WANG Fei, WEI Ziling, CHEN Shuhui
      Vol. 54, Issue 1, Pages: 153-166(2026) DOI: 10.12263/DZXB.20250761
      Abstract: Real-world graph machine learning systems typically operate in open environments, where test-time data inevitably deviate from the training distribution, violating the common assumption of identical training and testing distributions in supervised learning. In this setting, models are required not only to maintain stable classification performance on in-distribution (ID) samples, but also to accurately identify and reject out-of-distribution (OOD) data to avoid overconfident erroneous predictions. Due to the strong coupling between node attributes and graph topology, distribution shifts in graph data often occur implicitly, making graph OOD detection more challenging than its Euclidean counterpart. Existing graph OOD detection methods commonly rely on strong supervision assumptions, such as the availability of pre-labeled anomalous samples or the assumption that auxiliary OOD data are clearly separable from ID data in the feature space. However, in practical applications, OOD samples typically appear unlabeled and naturally mixed with ID data, as observed with cross-platform users in social networks or cold-start nodes in recommendation systems. Such wild data are difficult to distinguish using prior rules, which limits the applicability of existing approaches in open environments. To address this issue, we propose a fully open training paradigm that jointly optimizes graph node classification and OOD detection using unlabeled ID/OOD mixed data, without requiring any OOD annotations or distributional priors. The proposed method formulates a constrained optimization objective that strictly controls ID classification error and false positive rates, while encouraging the model to improve its capability to identify potential OOD samples, thereby capturing the implicit coupling between ID and OOD distributions in real-world open settings.
At the methodological level, we introduce an energy-based detection mechanism that maps the outputs of graph neural networks to energy values, which quantify the consistency of samples with the training distribution. The imposed energy constraints guide the model to learn separable representations, where ID samples concentrate in low-energy regions while potential OOD samples are pushed toward higher-energy regions. This design alleviates the overconfidence issue of Softmax-based confidence methods under distribution shifts and allows the detection objective to directly influence graph representation learning. To effectively solve the resulting constrained optimization problem, we adopt an augmented Lagrangian approach that dynamically balances constraint satisfaction and objective optimization during training, enhancing model stability under mixed distributions. Experimental results on multiple real-world graph datasets demonstrate significant performance improvements. On the Twitch dataset, the proposed method achieves an AUROC of 95.97% and an AUPR of 92.84%, outperforming the current state-of-the-art baseline GNNsafe++ by over 21 percentage points, while maintaining a false positive rate of 12.30%. These results confirm the effectiveness and robustness of the proposed framework under fully unsupervised and open-world conditions.  
      Keywords: out-of-distribution detection; graph neural networks; node classification; machine learning; wild data; energy function
      Updated: 2026-06-04
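The abstract above maps network outputs to energy values so that ID samples sit in low-energy regions and OOD samples in higher-energy ones. The abstract does not give the formula. The sketch below assumes the widely used log-sum-exp energy score over class logits, with a hypothetical threshold chosen only for illustration.

```python
import math


def energy(logits, temperature=1.0):
    """Energy score E(x) = -T * log(sum_k exp(logit_k / T)), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def is_ood(logits, threshold):
    """Flag a node as OOD when its energy exceeds a chosen threshold."""
    return energy(logits) > threshold


confident = [10.0, 0.0, 0.0]   # peaked logits, typical of an ID node
uncertain = [0.0, 0.0, 0.0]    # flat logits, more typical of an OOD node
```

Peaked logits give a strongly negative energy, while flat logits give an energy near `-log(K)` for `K` classes, which is what lets a single threshold separate the two cases.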
    • LUO Haitong, ZHANG Weiyao, LIN Chungang, MENG Xuying, ZHANG Yujun
      Vol. 54, Issue 1, Pages: 167-182(2026) DOI: 10.12263/DZXB.20250576
      Abstract: With the evolution of network technologies, the scale of network traffic has grown exponentially, and attack methods (such as protocol obfuscation and skipping connections) have become increasingly covert and complex, posing unprecedented challenges to traditional detection methods. Although graph neural networks (GNNs) have demonstrated potential in modeling traffic topological dependencies, they generally face two major bottlenecks in real-world network security scenarios: first, the significant structural heterophily in network traffic graphs, where anomalous traffic tends to establish atypical connections with normal nodes possessing vastly different features, causing GNNs based on homophily assumptions to fail; second, the extreme scarcity of high-quality anomaly labels, where full-parameter fine-tuning easily induces overfitting or the negative transfer of pre-trained knowledge. To this end, this paper proposes a spectral-aware graph pre-training and prompt tuning framework tailored for network traffic anomaly detection.
Abandoning the reliance of traditional graph learning paradigms on homophilic structures and massive labeled data, the core innovations of this framework lie in: (1) Introducing complementary spectral filters to jointly model low-pass signals (capturing stable communication patterns) and high-pass signals (identifying abnormal connection perturbations) for the first time during the pre-training phase, accurately characterizing the strong heterophilic nature of network traffic from a frequency domain perspective; (2) Designing a spectral-aware contrastive learning mechanism to extract robust frequency-invariant features by maximizing representational consistency across cross-frequency views; (3) Proposing a parameter-efficient prompt tuning strategy that, while freezing backbone parameters, utilizes learnable prompt vectors to adaptively adjust the fusion weights of high- and low-frequency channels, achieving precise transfer to downstream few-shot tasks. Experiments on three real-world network security datasets, including CICIDS2017, CICIDS2018, and HIKARI2021, demonstrate that the proposed method comprehensively outperforms existing baseline models in detection performance under sample-scarce scenarios. With a maximum improvement exceeding 20%, these results verify the robustness and practicality of the proposed method in complex and heterophilic network environments.  
      Keywords: network anomaly detection; graph neural networks; pre-training; spectral graph filters; prompt tuning; traffic classification
      Updated: 2026-06-04
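The abstract above combines low-pass and high-pass spectral filters and uses a learnable prompt to weight the two channels. One common way to realize such a pair is sketched below, assuming the filters (I + Â)/2 and (I - Â)/2 on the symmetrically normalized adjacency Â. The filter forms, the scalar `alpha`, and the toy triangle graph are illustrative assumptions, not the paper's definitions.

```python
import math


def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} of a dense adjacency matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def spectral_views(adj, x):
    """Return (low-pass, high-pass) filtered copies of the node signal x."""
    a_hat = normalized_adjacency(adj)
    ax = [sum(a_hat[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    low = [(xi + axi) / 2.0 for xi, axi in zip(x, ax)]    # (I + A_hat) / 2
    high = [(xi - axi) / 2.0 for xi, axi in zip(x, ax)]   # (I - A_hat) / 2
    return low, high


def fuse(low, high, alpha):
    """Prompt-style fusion: alpha in [0, 1] weights the low-frequency channel."""
    return [alpha * l + (1.0 - alpha) * h for l, h in zip(low, high)]


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
smooth_low, smooth_high = spectral_views(triangle, [1.0, 1.0, 1.0])
spiky_low, spiky_high = spectral_views(triangle, [1.0, -1.0, 0.0])
```

On this regular graph a constant signal passes the low-pass filter unchanged and is removed by the high-pass filter, while a signal that flips sign between neighbours keeps most of its magnitude in the high-pass channel.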
    • GAO Shang, JIA Maoshen
      Vol. 54, Issue 1, Pages: 183-194(2026) DOI: 10.12263/DZXB.20250937
      Abstract: With the rapid advancement of technologies such as the Internet of Everything, intelligent sensing, and human-machine interaction, multi-source separation in complex acoustic environments has become a crucial front-end challenge in speech signal processing. However, non-stationary speech signals exhibit distinct energy distribution characteristics across different temporal and frequency scales, encompassing both rapidly changing formant structures and relatively stable harmonic and periodic information. Traditional single-resolution time-frequency analysis methods face fundamental constraints in such scenarios: short analysis windows yield insufficient frequency resolution, hindering the distinction of harmonic structures across multiple sources; conversely, longer windows degrade temporal resolution, compromising the capture of rapidly changing transient features in speech. Consequently, existing fixed-resolution multi-source separation methods frequently suffer from blurred time-frequency structures, loss of speech detail, separation imbalance, and distorted separated signals in real complex acoustic environments, which limits the robustness and practicality of the system in real-world scenarios. To address these challenges, the proposed method implements a multi-branch parallel deep neural network. Each branch independently processes time-frequency spectra generated with different window lengths and employs nested hierarchical recurrent units for feature modeling.
Specifically, each branch incorporates a two-stage recursive module: a frequency-spatial modeling unit (Frequency Long Short-Term Memory, F-LSTM) that operates along the frequency axis to extract cross-channel spatial correlations and spectral structures, and a time-spatial modeling unit (Time Long Short-Term Memory, T-LSTM) that recurs over time to capture the long-term dynamic evolution and temporal dependencies of speech signals. Furthermore, the approach feeds multiple sets of time-frequency spectra—generated from different analysis windows and featuring varying resolutions—into the network in parallel. During training, all branches are jointly optimized through a shared time-domain reconstruction loss, promoting the learning of consistent and complementary representations across resolutions. Each branch incorporates a nested architecture to enhance the interaction and fusion of cross-resolution features. At the output stage, the complex spectral masks estimated by each branch are integrated via a fusion layer, and the time-domain signal is reconstructed through inverse short-time Fourier transform, ultimately enabling end-to-end training under both time-domain and spectral constraints. Through multi-resolution joint optimization, the model simultaneously captures transient details and periodic harmonic structures within the speech spectrogram. The proposed multi-resolution fusion scheme significantly improves both objective metrics and subjective listening quality in highly reverberant and multi-speaker environments, and demonstrates structural flexibility, making it transferable to other time-frequency analysis-based network frameworks, thereby providing a scalable design approach and methodological foundation for future multi-source separation models targeting complex acoustic fields.  
      Keywords: multi-resolution time-frequency analysis; sparse component analysis; sound source separation; deep neural network
      Updated: 2026-06-04
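The window-length trade-off that motivates the multi-branch design above can be made concrete with a little arithmetic: a window of N samples at sampling rate fs spans N/fs seconds and gives frequency bins fs/N apart. The sketch below tabulates this for three branches. The sampling rate, window lengths, and 50% hop are assumptions for illustration, not the paper's settings.

```python
def stft_resolution(window_length, sample_rate, hop_fraction=0.5):
    """Time/frequency resolution implied by one STFT analysis window."""
    hop = int(window_length * hop_fraction)
    return {
        "window_ms": 1000.0 * window_length / sample_rate,
        "hop_ms": 1000.0 * hop / sample_rate,
        "bin_hz": sample_rate / window_length,
        "n_bins": window_length // 2 + 1,
    }


def frame_count(n_samples, window_length, hop):
    """Number of full frames obtained without padding."""
    if n_samples < window_length:
        return 0
    return 1 + (n_samples - window_length) // hop


SAMPLE_RATE = 16000                 # assumed sampling rate
BRANCH_WINDOWS = [256, 512, 1024]   # hypothetical per-branch window lengths
branches = {n: stft_resolution(n, SAMPLE_RATE) for n in BRANCH_WINDOWS}
```

With these assumed values the 256-sample branch resolves 16 ms in time but only 62.5 Hz in frequency, while the 1024-sample branch resolves 15.625 Hz but smears 64 ms of signal into each frame, which is the complementarity the parallel branches are meant to exploit.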
    • ZHANG Zehui, WANG Yang, CHEN Boyang, ZHANG Haoxuan, XU Xiaobin, WU Fulong, CHENG Shenglong, SHAO Haibin, LI Hao
      Vol. 54, Issue 1, Pages: 195-205(2026) DOI: 10.12263/DZXB.20250504
      Abstract: In recent years, with the rapid development of fields such as autonomous driving, robot navigation, and 3D reconstruction, depth estimation technology, as a key means of perceiving the three-dimensional structure of the environment, has garnered widespread attention. However, although existing depth estimation methods based on supervised learning perform well on specific datasets, their generalization ability is weak and they rely on large-scale, high-quality labeled data, which severely limits their application in real industrial scenarios. Hence, this study proposes a binocular vision depth estimation method based on geometric prior knowledge constraints. First, this study combines residual convolution with the context encoder to extract multi-scale features from image data, and utilizes the feature pyramid structure to capture matching information at different scales so as to retain the edge structure details of the image. Then, a multi-level gated recurrent unit (GRU) is designed to update the feature matching parameters in combination with feature information of different scales, optimize the disparity matching results, and achieve binocular vision depth estimation. Notably, this paper constructs a hybrid loss function that combines supervised signals with physical priors. Based on the traditional supervised loss, this function introduces geometric constraints derived from the self-supervised learning paradigm as regularization terms, specifically including the left-right disparity consistency constraint and the disparity structure consistency constraint. The left-right consistency constraint enforces geometric correspondence between the predicted disparities of the left and right views, enhancing the model's geometric understanding and mitigating mismatches in occluded areas.
The structural consistency constraint guides the disparity map to remain smooth in texture-flat regions and sharp at object edges, thereby improving the structural integrity and visual quality of the depth map, ultimately enhancing the generalization capability of the binocular vision depth estimation model. To verify the effectiveness of the proposed method, this paper conducts training and evaluation on public datasets such as KITTI 2015 and Middlebury, and uses the SceneFlow dataset to assess cross-dataset generalization performance. Experimental results show that after introducing geometric prior constraints, the baseline model’s performance is consistently improved: on the KITTI dataset, the endpoint error (EPE) is reduced by 3% to 5%, and the overall mismatch rate (D1-all) is reduced by 5% to 8%. Results on the Middlebury dataset further confirm the method’s good generalization and robustness across different scenarios. Ablation experiments verify the contributions of each module, while hyperparameter sensitivity experiments determine the optimal configuration for the loss function weights. Additionally, transfer experiments demonstrate that the proposed geometric prior constraint mechanism exhibits good portability, adapting to various mainstream depth estimation network architectures and generally providing performance gains.
      Keywords: depth estimation; stereo matching; prior knowledge; deep learning; geometric constraints; hybrid supervised learning
      Updated: 2026-06-04
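The left-right disparity consistency constraint described above can be illustrated on a single image row. For a rectified pair, a left-view pixel at column x with disparity d matches the right-view pixel at column x - d, and the two predicted disparities should agree. The sketch below is a simplified 1D version with nearest-pixel lookup; the L1 form and the regularization weight are assumptions, not the paper's exact loss.

```python
def left_right_consistency(disp_left, disp_right):
    """Mean |d_L(x) - d_R(x - d_L(x))| over pixels whose match lies in the row."""
    errors = []
    for x, d in enumerate(disp_left):
        x_right = x - int(round(d))   # matching column in the right view
        if 0 <= x_right < len(disp_right):
            errors.append(abs(d - disp_right[x_right]))
    return sum(errors) / len(errors) if errors else 0.0


def hybrid_loss(pred_left, pred_right, gt_left, weight=0.1):
    """Supervised L1 term plus the consistency term as a regularizer."""
    supervised = sum(abs(p - g) for p, g in zip(pred_left, gt_left)) / len(gt_left)
    return supervised + weight * left_right_consistency(pred_left, pred_right)


consistent = left_right_consistency([2.0] * 6, [2.0] * 6)
inconsistent = left_right_consistency([2.0] * 6, [3.0] * 6)
```

Pixels whose match falls outside the row are skipped, which loosely mirrors how such terms avoid penalizing regions that are not visible in the other view.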
    • GUAN Yongming, SHI Yuliang, WANG Jihu, LYU Liang, CHEN Zhiyong, LI Hui
      Vol. 54, Issue 1, Pages: 206-218(2026) DOI: 10.12263/DZXB.20250980
      Abstract: To address the voltage fluctuation issues caused by high-penetration distributed photovoltaic (PV) integration into distribution areas, a distributed PV voltage regulation strategy that takes into account the power quality of the area is proposed. First, based on the topology structure of the area, a distributed PV-load node connection graph is constructed. This serves as the basis for dynamic regulation area division, key voltage node screening, and multi-objective optimization function design. On this basis, a PV output prediction model based on time-frequency classification and mixture of experts (MoE) is developed. By integrating time-domain variation characteristics and frequency-domain periodic patterns, the classification and representation ability of output fluctuations is enhanced. Additionally, MoE is utilized to improve prediction accuracy and stability. Furthermore, using the prediction results as input, a model predictive control method is adopted to directly embed multiple constraints such as voltage constraints, active power output, and regulation frequency into the rolling optimization objective, generating a forward-looking collaborative regulation strategy to address issues such as lagging regulation, frequent actions, and output derating of traditional inverters. To enhance the efficiency of area-level regulation and reduce computational burden, an experience replay area with a self-optimizing update mechanism is designed. Combined with regulation boundary self-sensing rules, a simplified regulation mode can be triggered after prediction completion, directly selecting similar historical strategies for execution. The strategy library is continuously optimized through a reward mechanism, significantly improving response speed while ensuring regulation stability.
Simulation results show that the proposed method significantly outperforms multiple comparative schemes in prediction accuracy, achieving a test accuracy of 99.29%, a standard deviation of only 0.71%, and a fluctuation range controlled within 3.27%. In terms of voltage regulation effects, the method achieves rapid and smooth voltage recovery in various scenarios such as voltage undershoot caused by sudden load increases and voltage overshoot caused by PV output fluctuations. Specifically, in scenarios where the voltage undershoots by 2%~10%, the regulation completion speed is increased by more than 2.4 times compared to existing methods. In scenarios where the voltage overshoots by 2%~7%, the regulation speed is increased by more than 1.5 times, and the voltage deviation throughout the entire process remains within ±2%, effectively avoiding frequent regulation and power generation losses. In scenarios where the voltage overshoots by 7%~10%, the proposed method achieves regulation within 2 seconds by reducing active power output, and the active power output is increased by about 3% compared to traditional methods, significantly mitigating the risk of downtime caused by overvoltage protection. In summary, the voltage regulation strategy that integrates precise prediction, rolling optimization, and experience replay mechanisms not only exhibits high prediction accuracy and response speed, but also effectively ensures voltage stability in the substation area and enhances photovoltaic output. This provides feasible technical support for the transformation of distributed renewable energy from ‘scale expansion’ to ‘quality improvement’.  
      Keywords: distributed photovoltaics; substation voltage regulation; time-frequency classification; hybrid expert network; model predictive control; regulation boundary perception
      Updated: 2026-06-04
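The mixture-of-experts (MoE) step in the abstract above combines several expert forecasters through a gate. A minimal sketch of the usual softmax-gated combination is shown below. The number of experts, their outputs, and the gate scores are hypothetical, and the paper's actual gating network is not described in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(expert_outputs, gate_scores):
    """Gate-weighted combination of per-expert PV output forecasts."""
    weights = softmax(gate_scores)
    return sum(w * y for w, y in zip(weights, expert_outputs))


experts = [4.0, 6.0, 8.0]          # hypothetical forecasts (kW) from three experts
uniform = moe_predict(experts, [0.0, 0.0, 0.0])
peaked = moe_predict(experts, [0.0, 0.0, 50.0])
```

Equal gate scores reduce the prediction to the plain average of the experts, while a dominant score hands the forecast to a single expert. In a time-frequency setting the gate would presumably be driven by the fluctuation class of the input.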
    • TAO Hanqing, CHENG Yuhu, WANG Xuesong, WANG Jun
      Vol. 54, Issue 1, Pages: 219-233(2026) DOI: 10.12263/DZXB.20250863
      Abstract: Intent classification is a fundamental and critical task in natural language processing, aiming to accurately identify the underlying intentions expressed in user utterances. It serves as an essential technical foundation for dialogue systems, intelligent customer service, and human-computer interaction. In recent years, deep learning-based approaches have achieved remarkable progress in intent classification; however, their performance heavily relies on large-scale annotated corpora and stable domain distributions, which poses significant challenges in real-world applications. In low-resource scenarios characterized by sparse short-text information, abstract label semantics, and insufficient domain prior knowledge, user expressions often exhibit low information density, implicit semantic dependencies, and diverse surface forms. Meanwhile, intent labels are typically highly abstract with blurred semantic boundaries, making it difficult for existing models to capture deep semantic representations and contextual associations solely from literal textual features. These issues severely limit the generalization ability and robustness of intent classification models under low-resource and cross-domain settings. To address these challenges, this paper explores intent classification from the perspective of semantic expansion and contextual modeling, aiming to reduce the reliance of traditional supervised learning methods on explicit annotations and shallow lexical features. Unlike approaches that directly formulate the task as zero-shot intent classification, we introduce the zero-shot contextual association capability of large language models (LLMs) into a supervised learning framework. By leveraging the rich world knowledge and semantic reasoning ability encoded in LLMs, the proposed approach expands the learnable semantic space, thereby alleviating the modeling limitations caused by sparse textual information and insufficient label semantics.
Based on this idea, we propose an LLM-based zero-shot context association model (L-ZCAM). The model constructs structured prompts to guide LLMs to generate complementary contextual semantic information related to the input utterance from two complementary perspectives: associative intents and label definitions. This design enables joint mining of in-text features and out-of-text knowledge while explicitly enhancing label semantics. From a structural perspective, L-ZCAM adopts multi-branch feature encoders and a cross-attention mechanism to deeply model the interactions among original textual features, associative semantic features, and label semantic features. In addition, a constraint-guided joint loss function is introduced to enforce semantic consistency between associative semantics and label semantics, mitigating the impact of semantic noise and achieving effective alignment between internal and external information. Through these designs, L-ZCAM is able to better capture semantic associations under complex contexts involving polysemy, abstract labels, and diverse expressions, thereby improving the accuracy and stability of intent prediction. Experimental results on three public datasets, i.e., CLINC150, Banking77, and HWU64, demonstrate that L-ZCAM outperforms state-of-the-art methods by 2.25%, 1.28%, and 1.29% in terms of macro-averaged F1 score, respectively, exhibiting stronger generalization ability and robustness across different task scenarios.  
      Keywords: large language model; intent classification; zero-shot context association; semantic expansion; feature generation; cross attention
      Updated: 2026-06-04
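The cross-attention mechanism mentioned above lets utterance features attend over the context generated by the LLM. A minimal single-head sketch of scaled dot-product cross-attention follows. It omits the learned projection matrices, and the toy feature vectors are hypothetical stand-ins for the encoders' outputs.

```python
import math


def cross_attention(queries, keys, values):
    """Scaled dot-product attention; queries attend over (keys, values)."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights


text_feats = [[1.0, 0.0], [0.0, 1.0]]                 # utterance tokens (queries)
context_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # generated context (keys/values)
fused, weights = cross_attention(text_feats, context_feats, context_feats)
```

Each row of `weights` is a distribution over the context items, so every fused vector is a convex combination of context features that are most aligned with the corresponding utterance token.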
    • QU Zixuan, SHI Jiale, XU Wei, REN Qingying, LI Jinze, LI Wei
      Vol. 54, Issue 1, Pages: 234-247(2026) DOI: 10.12263/DZXB.20250781
      Abstract: Based on first-principles calculations, this work systematically investigates the geometric structures and electronic properties of pristine silicene as well as ruthenium (Ru)-doped and hafnium (Hf)-doped silicene. The calculated results indicate that among the two doped systems, Ru-doped silicene exhibits higher structural stability, whereas Hf-doped silicene demonstrates superior gas adsorption performance. After determining the most favorable adsorption sites for gas molecules, a comprehensive comparative study is carried out on the adsorption behaviors of six gas molecules (CO, CO₂, H₂S, NH₃, SO₂, and H₂CO) on the surfaces of pristine silicene, Ru-doped silicene, and Hf-doped silicene. The adsorption mechanisms and the effects of doping on adsorption capacity and gas-sensing performance are analyzed by comparing the adsorption distance, adsorption energy, charge transfer, recovery time, and density of states. Theoretical results reveal that, except for NH₃, the selected gas molecules exhibit negligible interactions with pristine silicene, indicating its limited sensitivity toward these gases. In contrast, significant adsorption interactions are observed for all selected gas molecules in the Ru-doped and Hf-doped systems. Compared with the pristine system, the doped silicene exhibits stronger binding strength and more pronounced charge transfer after gas adsorption. Moreover, some gas molecules show moderate adsorption energies and acceptable recovery times on the doped surfaces, suggesting good potential for reversible adsorption.
In summary, this study demonstrates that Ru and Hf atom doping can effectively modulate the electronic properties of silicene and significantly enhance its adsorption capability and gas-sensing performance toward CO, CO₂, H₂S, NH₃, SO₂, and H₂CO, providing theoretical support for its potential applications in gas adsorption and other related fields, and offering valuable insights for the development of novel high-performance adsorption materials to address environmental and energy-related challenges.  
      Keywords: silicene; first-principles; doping; gas sensor; gas sensing property; two-dimensional materials
      Updated: 2026-06-04
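The link between adsorption energy and recovery time mentioned above is often estimated in gas-sensing studies with a transition-state-theory expression, τ = ν⁻¹ exp(−E_ads / (k_B T)). The abstract does not state which expression the authors use, so the sketch below is an assumption, as are the attempt frequency of 10¹² Hz and the example energies.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def recovery_time(adsorption_energy_ev, temperature_k=300.0, attempt_freq_hz=1e12):
    """tau = (1 / nu) * exp(-E_ads / (k_B * T)); E_ads is negative for binding."""
    return math.exp(-adsorption_energy_ev / (K_B_EV * temperature_k)) / attempt_freq_hz


weak = recovery_time(-0.3)      # illustrative energies, not values from the paper
moderate = recovery_time(-0.8)
heated = recovery_time(-0.8, temperature_k=400.0)
```

Under this model a more negative adsorption energy lengthens the recovery time exponentially, and heating the sensor shortens it, which is why moderate adsorption energies are associated with reversible adsorption.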
    • Generative Image Detection Based on Diffusion Artifact Contrast Learning

      YUAN Chengsheng, CHEN Jinrui, CAO Yi, LIU Qingcheng, ZHOU Zhili, FU Zhangjie
      Vol. 54, Issue 1, Pages: 248-261(2026) DOI: 10.12263/DZXB.20250663
      Abstract: With the continuous breakthroughs in generative artificial intelligence represented by diffusion models in the field of visual content synthesis, the generated images have approached or even partially surpassed real photographic levels in terms of visual realism and content diversity. However, the rapid development of this technology has also made the detection and identification of generated images, especially deepfake content that may be used for malicious purposes, increasingly complex and challenging. Most existing detection algorithms perform well in controlled laboratory environments, but in open real-world scenarios, once they encounter significant distributional differences between training and testing data (such as unknown generative models, unseen image styles, or forged samples subjected to complex post-processing), their generalization capability and robustness often exhibit notable deficiencies. To address these challenges, this paper proposes a generated image detection method based on contrastive learning of diffusion artifacts (CLDA) from the perspective of hard sample classification. The approach employs multi-module collaborative optimization to enhance the detection accuracy and robustness of the model for generated images. First, challenging generated samples are constructed using high-quality diffusion models to provide a richer data foundation for model training. Subsequently, an artifact enhancement module is designed, introducing a latent space cross-domain enhancement strategy. This strategy expands the forged feature space through feature interpolation weighted by cosine similarity, while incorporating a domain loss mechanism to guide the encoder in learning discriminative features across different forgery domains, thereby preventing the model from over-relying on specific forgery patterns.
Furthermore, a contrastive loss function based on latent space boundaries is proposed, which employs dynamic weighting to focus on hard sample pairs near the decision boundary. This enhances the model’s ability to discern subtle differences between real images, generated images, and inverted images. This loss is then combined with binary cross-entropy loss to construct a unified multi-objective optimization function. To validate the effectiveness of the proposed method, comparative experiments were conducted on two public datasets, GenImage and DRCT-2M. The experimental results demonstrate that the detector optimized by the proposed framework achieves an average accuracy improvement of 1.1 percentage points on the GenImage dataset and 4.8 percentage points on the DRCT-2M dataset. Additionally, under challenging scenarios such as image scaling, JPEG compression, and Gaussian noise, the proposed method maintains a high average detection accuracy, with its robustness significantly outperforming existing comparative methods.  
      Keywords: generated image detection; diffusion model; fake image detection; image forensics; cross-domain enhancement; contrastive learning
      Updated: 2026-06-04
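The abstract above combines a boundary-based contrastive loss, dynamically weighted toward hard pairs, with binary cross-entropy. The sketch below shows one way such a combination could look, using a standard margin-based pairwise contrastive loss. The margin, the `hardness_weight` rule, and the mixing factor `lam` are hypothetical choices, not the paper's formulation.

```python
import math


def distance(a, b):
    """Euclidean distance between two latent feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_pair_loss(a, b, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs past the margin."""
    d = distance(a, b)
    return d ** 2 if same_class else max(0.0, margin - d) ** 2


def hardness_weight(a, b, same_class, margin=1.0):
    """Assumed dynamic weight: larger for pairs that violate the margin more."""
    d = distance(a, b)
    violation = d if same_class else max(0.0, margin - d)
    return 1.0 + violation


def bce(prob_fake, label):
    """Binary cross-entropy for one prediction; label is 1 for generated images."""
    eps = 1e-12
    p = min(max(prob_fake, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(a, b, same_class, prob_fake, label, lam=0.5):
    """BCE plus the hardness-weighted contrastive term."""
    pair = hardness_weight(a, b, same_class) * contrastive_pair_loss(a, b, same_class)
    return bce(prob_fake, label) + lam * pair
```

A real/generated pair that already sits far apart contributes nothing to the contrastive term, while a pair inside the margin is both penalized and up-weighted, which concentrates the gradient on samples near the decision boundary.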
    • A Reliability-Aware Mechanism for Hierarchical Federated Learning

      LIU Xiaoyan, YU Zhen, LIANG Jingyu, LIN Botao, HUANG Jiwei
      Vol. 54, Issue 1, Pages: 262-275(2026) DOI: 10.12263/DZXB.20250852
      Abstract: Hierarchical federated learning (HFL) operates in a client-edge-cloud architecture, where intra-group aggregation is carried out at the edge and global aggregation is performed in the cloud, enabling efficient distributed collaborative training. However, client data is typically non-independent and identically distributed (Non-IID), which may yield inconsistent local updates, leading to gradient drift and convergence instability, and degrading global model performance. Meanwhile, edge servers are subject to resource limitations, workload fluctuations, and unstable links, which can cause performance degradation or even failures. Such events may interrupt intra-group aggregation, undermining system stability and task completion efficiency. To address these challenges, this paper proposes a reliability-aware hierarchical federated learning framework (R-HFL) that decomposes the training procedure into a reliability-aware grouping stage and a global aggregation stage. In the grouping stage, we jointly cluster clients by integrating model semantic similarity and geographic proximity, improving intra-group statistical consistency and mitigating gradient drift induced by Non-IID data. In addition, an edge reliability metric is incorporated as a reliability-aware selection criterion, prioritizing highly reliable edge servers as group-level aggregators to reduce the risk of aggregation interruption. Furthermore, to account for the time-varying reliability of edge servers and the long-term horizon of federated training, we design a failure-triggered task migration mechanism: when a group-level aggregator fails, the aggregation task is dynamically migrated to an available edge server to maintain training continuity.
To enable adaptive migration decisions, we formulate the migration process as a Markov decision process (MDP) and adopt multi-agent proximal policy optimization (MAPPO) under centralized training and decentralized execution (CTDE) to learn migration policies. A unified reward function with constraints is further designed to dynamically balance migration cost, post-migration communication overhead, and semantic distribution similarity, facilitating an adaptive trade-off among objectives, fast migration adaptation, and sustained convergence stability. Finally, extensive experiments are conducted on two real-world datasets under different Non-IID scenarios. The results demonstrate that R-HFL consistently outperforms baseline methods in terms of global accuracy and convergence rate, while substantially reducing the risk of training disruption and migration overhead under edge server failures, thereby improving overall system robustness and fault tolerance.
      Keywords: hierarchical federated learning; non-independent and identically distributed; edge server failure; service migration; multi-agent proximal policy optimization
      Updated: 2026-06-04
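The reliability-aware aggregator selection and failure-triggered migration described above can be illustrated with a simple greedy rule. The paper learns its migration policy with MAPPO, so the sketch below is only a stand-in: the `EdgeServer` fields, the scoring rule, and `cost_weight` are all hypothetical.

```python
class EdgeServer:
    """Toy edge server record; every field is an illustrative assumption."""

    def __init__(self, name, reliability, migration_cost, alive=True):
        self.name = name
        self.reliability = reliability        # score in [0, 1]
        self.migration_cost = migration_cost  # cost of moving the task here
        self.alive = alive


def select_aggregator(servers):
    """Pick the most reliable live server as the group-level aggregator."""
    live = [s for s in servers if s.alive]
    return max(live, key=lambda s: s.reliability) if live else None


def migrate_on_failure(current, servers, cost_weight=0.5):
    """Keep the aggregator if alive; otherwise move to the best-scoring live server."""
    if current.alive:
        return current
    live = [s for s in servers if s.alive]
    if not live:
        return None
    return max(live, key=lambda s: s.reliability - cost_weight * s.migration_cost)


servers = [EdgeServer("e1", 0.95, 0.4), EdgeServer("e2", 0.90, 0.1),
           EdgeServer("e3", 0.70, 0.0)]
aggregator = select_aggregator(servers)
aggregator.alive = False          # simulate a failure of the chosen aggregator
replacement = migrate_on_failure(aggregator, servers)
```

The greedy score trades reliability against migration cost in a single step. A learned policy, as in the abstract, can additionally account for time-varying reliability and the long training horizon.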
    • ZHANG Yadong, CUI Zhanqi, LAN Wenwei, XU Weili, CAO Heling
      Vol. 54, Issue 1, Pages: 276-290(2026) DOI: 10.12263/DZXB.20250880
      Abstract: Text input components are essential to Web applications and are widely used in scenarios such as search queries and content creation. Their inputs are typically constrained by syntactic rules and complex business logic. If text input components fail to correctly handle malicious or unexpected input texts, they may cause application crashes. Existing automated graphical user interface (GUI) testing tools for web applications often ignore these constraints. As a result, they cannot generate diverse inputs to effectively detect faults in text input components. Moreover, existing methods often overlook complex constraints among multiple text input components, which makes it difficult to generate diverse input combinations. To address this issue, this paper proposes an approach for testing text input components of web applications based on large language models (LLMs), named LLM-based text input component testing (LTICT). First, LTICT extracts information about text input components from the HTML files of the application under test. It then uses an LLM to infer the constraints of the text input components and to synthesize a text generation program that respects these constraints. Next, LTICT executes the program to produce input texts in batches to test text input components. Finally, LTICT feeds component contexts and execution outcomes back to the LLM. This feedback helps the LLM analyze inter-component constraints and generate more diverse combinations of inputs. To evaluate the effectiveness of LTICT, comparative experiments are conducted on four open-source web applications against three automated testing tools: WebExplor, DBInputs, and QTypist. The experimental results show that LTICT detects more text input component faults, with improvements of 34.21%, 37.84%, and 8.51% over WebExplor, DBInputs, and QTypist, respectively.
In addition, LTICT reduces the average time required to detect text input component faults by 10.69%, 11.87%, and 6.99%, respectively.  
      Keywords: web GUI testing; text input generation; web applications; prompt construction; large language model; automated testing tool
      Updated: 2026-06-04
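
The first LTICT step described above (extracting information about text input components from HTML) might look roughly like the following sketch, which uses only Python's standard `html.parser`. The names `TextInputCollector` and `extract_text_inputs`, and the particular attributes collected, are assumptions made for illustration; the abstract does not say which information LTICT actually extracts.

```python
from html.parser import HTMLParser

# Input types treated as free-text fields (an illustrative choice, not from the paper).
TEXT_TYPES = {"text", "search", "email", "url", "tel", "password"}
# Attributes that hint at input constraints (also an illustrative choice).
CONSTRAINT_ATTRS = ("name", "type", "maxlength", "minlength", "pattern",
                    "required", "placeholder")


class TextInputCollector(HTMLParser):
    """Collects constraint-related attributes of text input components."""

    def __init__(self):
        super().__init__()
        self.components = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        input_type = (attr_map.get("type") or "text").lower()
        is_text_input = tag == "input" and input_type in TEXT_TYPES
        if is_text_input or tag == "textarea":
            info = {"tag": tag}
            for key in CONSTRAINT_ATTRS:
                if key in attr_map:
                    # Boolean attributes such as `required` arrive with value None.
                    info[key] = True if attr_map[key] is None else attr_map[key]
            self.components.append(info)


def extract_text_inputs(html):
    """Return one dict per text input component found in the HTML string."""
    collector = TextInputCollector()
    collector.feed(html)
    return collector.components
```

The resulting dictionaries could then be serialized into a prompt for constraint inference; that step depends on the LLM interface and is not sketched here.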
    • Global Dependency Guided Feature Reconstruction for Image Classification

      YUAN Heng, WU Jingrui, ZHANG Shengchong
      Vol. 54, Issue 1, Pages: 291-307(2026) DOI: 10.12263/DZXB.20250619
      摘要:To address the inadequacy of modelling long-range dependencies in convolutional neural networks for image classification tasks, this paper proposes the global dependency guided feature reconstruction for image classification(GDFRNet). GDFRNet constructs a synergistic dual-path architecture through the design of a novel feature reconstruction module (FRM) and a feature optimization branch, achieving both long-range dependency modelling and fine-grained feature enhancement. The FRM introduces parallel horizontal and vertical global mean pooling to compress features across two spatial dimensions. This extracts context vectors with global vision, remapping them into a two-dimensional feature space to establish long-range feature dependencies spanning the entire image. Concurrently, operations such as transposed convolutions reconstruct the feature space, suppressing irrelevant background noise while reinforcing coherent semantic representations of the target subject. The feature optimization branch refines and fuses detail information through the fine-grained feature capture module (FGCM) and feature optimization module (FOM), reducing the loss of detail information during the network abstraction process. FGCM employs Gaussian-Laplacian convolution to focus on capturing easily lost fine details within images. The FOM performs adaptive fusion and optimization of the global semantic feature map provided by the FRM and the rich detail features extracted by the FGCM within a high-resolution feature pool. 
These two pathways establish a complementary “global contour-local detail” working mechanism: the global semantic map provided by the FRM guides detail enhancement, ensuring that detail reinforcement does not deviate from the overall semantic context, while the rich underlying details refined by the feature optimization branch provide essential fine-grained feedback and task-relevant guidance for the FRM’s feature reconstruction process, establishing a virtuous optimization cycle. This complementary mechanism enables the network to ultimately fuse reconstructed semantic information with enhanced local detail, generating more discriminative image representations. This holistically strengthens the model’s understanding of overall image structure and significantly enhances the discriminative power of the feature space. Comparative experiments between the proposed model and state-of-the-art (SOTA) models were conducted across five benchmark datasets: CIFAR-10, CIFAR-100, SVHN, Imagenette and Imagewoof. GDFRNet demonstrated outstanding performance across all datasets. Compared with other advanced methods, GDFRNet achieved average improvements in classification accuracy of 2.39%, 3.73%, 2.35%, 3.33%, and 2.92% on the five datasets, respectively, demonstrating the effectiveness of GDFRNet.  
      Keywords: image classification; global dependency; feature reconstruction; detail enhancements; feature optimization; CNN
      Updated: 2026-06-04
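
The directional pooling idea in the FRM description can be illustrated with a small pure-Python sketch: a single-channel feature map is averaged along its width and along its height, giving two context vectors that are then broadcast back to two dimensions. The additive combination in `remap_to_2d` is an assumption for illustration only; the abstract does not state how GDFRNet remaps the vectors, and both function names are hypothetical.

```python
def directional_mean_pool(feature_map):
    """Return (row_means, col_means) for a 2-D feature map given as a list of rows.

    row_means averages along the width (one value per row);
    col_means averages along the height (one value per column).
    """
    height, width = len(feature_map), len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    return row_means, col_means


def remap_to_2d(row_means, col_means):
    """Broadcast both context vectors to an H x W map (assumed additive combination)."""
    return [[r + c for c in col_means] for r in row_means]
```

Every entry of the remapped map depends on a full row and a full column of the input, which is one simple way a pooled context can span the entire image.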
    • CHAI Rong, LIU Jin, LIANG Chengchao, CHEN Qianbin
      Vol. 54, Issue 1, Pages: 308-317(2026) DOI: 10.12263/DZXB.20250114
      Abstract: Multi-beam low earth orbit (LEO) satellite communication systems have attracted significant attention due to their wide coverage, high throughput, low latency and low deployment cost. In this paper, user clustering, hybrid wide-spot beam scheduling and precoding are investigated for rate splitting multiple access (RSMA)-based multi-beam LEO satellite communication systems. Considering intra- and inter-cluster similarities, a user clustering strategy based on an improved mean shift algorithm is proposed. Initial clustering results are first obtained according to the geographic distribution of users and the coverage area of satellite spot beams, and a clustering evaluation function is constructed by incorporating intra-cluster dispersion and inter-cluster distance. A Gaussian kernel-based parameter adjustment mechanism is then designed to dynamically tune parameters according to the evaluation results, thereby jointly optimizing intra-cluster user compactness and inter-cluster separability. A dual spatial-scale resource allocation strategy is further developed based on the resulting user clusters. Specifically, at the large spatial scale, the problem of wide-beam coverage for multiple user clusters is studied. Taking both inter-satellite transmission performance differences and user access performance into account, a system cost function is modeled, and the wide-beam coverage problem is formulated as a system cost minimization problem, which is solved using the branch-and-bound method. By systematically decomposing the search space and exploiting upper- and lower-bound pruning strategies, the feasible solution space is progressively reduced to obtain the set of user clusters for wide-beam coverage. At the small spatial scale, based on the wide-beam coverage strategy, spot beam scheduling and precoding schemes are designed for each user cluster. 
The spot beam scheduling and precoding problem is formulated as a long-term satellite cache queue length minimization problem and decomposed into a precoding subproblem and a spot beam scheduling subproblem, which are solved in sequence. For the precoding subproblem, the objective function is first transformed into a convex form by introducing relaxation variables. The non-convex constraints are then made convex using a first-order Taylor expansion, and the convex constraints involving nonlinear product terms are further transformed into second-order cone constraints, leading to a convex problem that can be efficiently solved using standard toolkits. The spot beam scheduling subproblem is modeled as a Markov decision process, and the spot beam scheduling strategy is determined using the proximal policy optimization (PPO) algorithm. Simulation results validate the effectiveness of the proposed algorithm.  
      Keywords: multi-beam LEO satellites; RSMA; user clustering; hybrid wide-spot beam; beam scheduling; precoding
      Updated: 2026-06-04
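
For readers unfamiliar with mean shift, the sketch below shows the textbook algorithm with a Gaussian kernel on 2-D user positions. It is not the improved variant from the paper: the clustering evaluation function and the dynamic parameter adjustment are not reproduced, and `bandwidth`, `merge_radius` and all function names are hypothetical.

```python
import math


def gaussian_weight(dist_sq, bandwidth):
    """Gaussian kernel weight for a squared distance."""
    return math.exp(-dist_sq / (2.0 * bandwidth ** 2))


def mean_shift_mode(start, points, bandwidth, iterations=50, tol=1e-6):
    """Move a point uphill to a local density mode by repeated weighted averaging."""
    x, y = start
    for _ in range(iterations):
        weights = [gaussian_weight((px - x) ** 2 + (py - y) ** 2, bandwidth)
                   for px, py in points]
        total = sum(weights)
        new_x = sum(w * px for w, (px, _) in zip(weights, points)) / total
        new_y = sum(w * py for w, (_, py) in zip(weights, points)) / total
        if math.hypot(new_x - x, new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


def cluster_users(points, bandwidth, merge_radius):
    """Assign each user to the mode it converges to; nearby modes are merged."""
    modes, labels = [], []
    for point in points:
        mode = mean_shift_mode(point, points, bandwidth)
        for index, known in enumerate(modes):
            if math.hypot(mode[0] - known[0], mode[1] - known[1]) < merge_radius:
                labels.append(index)
                break
        else:
            modes.append(mode)
            labels.append(len(modes) - 1)
    return labels, modes
```

In this plain version the bandwidth is fixed; the abstract indicates that the proposed method instead tunes its parameters from the evaluation of intra-cluster dispersion and inter-cluster distance.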
    • DAI Yeling, GUO Yan, LIU Xiaoyu, HAN Lue, LIN Min
      Vol. 54, Issue 1, Pages: 318-328(2026) DOI: 10.12263/DZXB.20250996
      摘要:For the multi-user uplink access scenario in low-earth orbit (LEO) satellite networks, the system faces significant challenges arising from large-scale user populations, stochastic traffic arrivals, heterogeneous information timeliness requirements, and stringent onboard constraints. To address these issues, this paper proposes an age of information(AoI)-based dynamic user scheduling and resource allocation algorithm, aiming to guarantee information timeliness while effectively reducing the long-term average transmit power. Specifically, under random packet arrivals modeled as Bernoulli processes, the long-term average transmit power minimization problem is formulated subject to constraints on users’ maximum long-term average AoI, the per-slot number of scheduled users, and quality-of-service (QoS) requirements. The resulting optimization problem jointly considers user scheduling, beamforming, and power allocation, and is characterized by long-term objectives and constraints as well as coupling among optimization variables, which renders it intractable for direct solution. To overcome this challenge, Lyapunov optimization theory is employed to transform the original long-term problem into a per-slot drift-plus-penalty minimization problem, enabling online decision-making while ensuring the long-term satisfaction of AoI constraints. Furthermore, to mitigate the exponential growth in scheduling complexity with respect to the number of users, a spectral clustering-based grouping method is developed based on users’ angular information, which groups spatially weakly correlated users together to reduce intra-group interference and enhance transmission reliability. On this basis, a low-complexity dynamic scheduling policy is designed via a scheduling cost function that jointly incorporates users’ AoI states, packet arrival characteristics, and estimated power consumption, achieving a balanced tradeoff between information freshness and power consumption. 
In the resource allocation stage, the non-convex coupling between beamforming and transmit power for the scheduled user set is addressed by leveraging the S-procedure and Taylor series expansion, whereby the original non-convex constraints are transformed into convex forms, yielding an optimal QoS-satisfying and power-constrained resource allocation algorithm. Simulation results demonstrate that, compared with fixed-size scheduling, greedy AoI-based scheduling, and minimum mean square error-based schemes, the proposed algorithm effectively satisfies information timeliness requirements while significantly reducing the long-term average transmit power under various user scales and AoI constraints, establishing the proposed algorithm as an effective and superior solution for multi-user LEO satellite access.  
      Keywords: LEO satellite networks; uplink access; AoI; dynamic user scheduling; resource allocation; Lyapunov optimization
      Updated: 2026-06-04
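
The drift-plus-penalty idea can be shown on a deliberately simplified single-user toy model. The assumptions, none of which come from the paper, are: a fresh packet is always available, a transmission always succeeds and resets the AoI to 1, the transmit power is fixed, and a virtual queue tracks violation of the average-AoI limit. All names (`next_aoi`, `update_virtual_queue`, `should_transmit`, `simulate`) and the weight `v` are hypothetical.

```python
def next_aoi(aoi, delivered):
    """AoI resets to 1 after a delivery, otherwise grows by one slot."""
    return 1 if delivered else aoi + 1


def update_virtual_queue(queue, aoi, aoi_max):
    """Virtual queue that grows whenever the AoI exceeds its long-term limit."""
    return max(queue + aoi - aoi_max, 0.0)


def should_transmit(queue, aoi, tx_power, v):
    """Per-slot rule: pick the action with the smaller drift-plus-penalty cost.

    cost = v * power spent + queue * resulting AoI
    """
    cost_transmit = v * tx_power + queue * next_aoi(aoi, True)
    cost_idle = queue * next_aoi(aoi, False)
    return cost_transmit < cost_idle


def simulate(slots, aoi_max, tx_power, v):
    """Return (average AoI, average transmit power) over the given number of slots."""
    aoi, queue, energy, aoi_sum = 1, 0.0, 0.0, 0
    for _ in range(slots):
        transmit = should_transmit(queue, aoi, tx_power, v)
        energy += tx_power if transmit else 0.0
        aoi = next_aoi(aoi, transmit)
        queue = update_virtual_queue(queue, aoi, aoi_max)
        aoi_sum += aoi
    return aoi_sum / slots, energy / slots
```

A larger `v` puts more weight on saving power and lets the virtual queue grow further before a transmission is triggered, which is the usual freshness versus power tradeoff in this family of methods.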
    • LI Guojun, CHEN Shi’ao, WANG Jie, ZHANG Zheming, ZHENG Jianzhong
      Vol. 54, Issue 1, Pages: 329-339(2026) DOI: 10.12263/DZXB.20250566
      摘要:Modulation recognition is a key technology in communication countermeasures. Most of the existing modulation recognition studies based on deep learning are conducted on simulated datasets or open-source datasets. As a result, the models obtained through training face huge challenges of being unable to adapt to specific scenarios in practical applications. First, this paper proposes a mirror data augmentation method for modulated signals. The signals transmitted by a vector signal source and received by a receiver are used as the original data. Signal augmentation is achieved through operations such as filtering, different rate sampling, phase shift, frequency shift, and noise addition. The augmented dataset generated in this way can adapt to the influence of various factors in real-world scenarios, such as different symbol rates, Doppler frequency shifts, receiver carrier offsets, signal-to-noise ratios (SNRs), and receiver characteristics, and is similar to real signals. Next, a signal modality transformation module is designed to perform modality transformation on IQ sampling data samples, providing a data basis for subsequent multi-modality processing. Then, a Transformer-based modulation recognition model with multi-modality multi-scale convolution fusion and SE denoising mechanism is designed. After that, the proposed model is trained using the RadioML2018.10a dataset. When the SNR is above 12 dB, the accuracy of the test set reaches 98.3%. However, when the trained model is used for testing in real-world scenarios, the result is only 10.4%. Finally, the proposed model is trained using the augmented dataset. When the SNR is above 16 dB, the average accuracy is 90.1%. The trained model is used for online practical testing, and the recognition rate reaches 91.9% when the SNR is 12 dB.  
      Keywords: deep learning; modulation recognition; data augmentation; feature fusion; modal transformation; multi-head self-attention
      Updated: 2026-06-04
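
Three of the augmentation operations named in the abstract (phase shift, frequency shift and noise addition) have standard textbook forms for complex IQ samples, sketched below in pure Python. This is a generic illustration rather than the authors' mirror augmentation pipeline; filtering and resampling are omitted, and the function names and parameters are hypothetical.

```python
import cmath
import math
import random


def phase_shift(iq, phi):
    """Rotate every IQ sample by a constant phase phi (radians)."""
    rotation = cmath.exp(1j * phi)
    return [sample * rotation for sample in iq]


def frequency_shift(iq, delta_f, sample_rate):
    """Apply a carrier offset of delta_f Hz to samples taken at sample_rate Hz."""
    return [sample * cmath.exp(2j * math.pi * delta_f * n / sample_rate)
            for n, sample in enumerate(iq)]


def add_awgn(iq, snr_db, rng=random):
    """Add complex white Gaussian noise so the result has roughly the given SNR."""
    signal_power = sum(abs(sample) ** 2 for sample in iq) / len(iq)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # noise power is split between I and Q
    return [sample + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for sample in iq]
```

Passing a seeded `random.Random` instance as `rng` makes the noise reproducible, which helps when regenerating an augmented dataset.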
    • HUANG Chen, MA Haobo, ZHANG Yan, YANG Chao, SONG Jianhua
      Vol. 54, Issue 1, Pages: 340-351(2026) DOI: 10.12263/DZXB.20250772
      摘要:Multimodal emotion recognition in conversation (MERC) refers to the identification of emotional states in conversations by integrating various modalities such as text, speech, and visual information. With the rapid development of conversational AI and affective computing, MERC has become a research hotspot in the fields of affective computing and human-computer interaction. Compared to traditional unimodal emotion recognition, multimodal approaches can capture the multifaceted characteristics of emotions more comprehensively and accurately. For instance, text conveys explicit emotional content, speech provides subtle emotional cues like tone, speed, and intonation, while visual information (such as facial expressions) reflects non-verbal emotional expressions. These multimodal signals complement each other, enhancing the accuracy and robustness of emotion recognition. However, multimodal emotion recognition faces several challenges. First, there are significant differences in the representation of information across different modalities, and traditional methods like feature concatenation or weighted averaging fail to fully capture the complex interactions between modalities, which can lead to information loss. Second, emotion recognition tasks often suffer from local noise and outlier samples, which can degrade model stability. Lastly, the accuracy of emotion recognition is closely tied to the effective use of contextual information in a conversation, as emotions are often influenced by preceding and succeeding dialogue. Thus, how to effectively extract and utilize contextual information becomes a major challenge in improving accuracy. To address these issues, this paper proposes a novel emotion recognition method, LLM-EmoGraph, which combines large language model (LLM) with global-local cross-domain graph structures to achieve precise fusion and efficient modeling of multimodal data. 
This method introduces a multimodal masking mechanism to handle missing and inconsistent information across modalities, ensuring that the model maintains good performance even with incomplete or low-quality information. Through large-scale cross-domain multi-graph pretraining, LLM-EmoGraph enhances the model’s transferability between modalities and graph structures, further improving its robustness. The innovative adaptive dual-scale feature fusion strategy aligns textual, speech, and visual semantic features efficiently, improving emotion recognition accuracy, particularly in scenarios involving high interaction among modalities. Additionally, the paper designs a weakly supervised hierarchical emotion classification scheme based on LLM. This approach guides the extraction of emotional information layer by layer, effectively preventing interference from global emotional patterns, and allows the model to learn emotional features accurately, even with limited annotated data. Experimental results show that LLM-EmoGraph significantly outperforms existing mainstream methods on multiple benchmark datasets, demonstrating its effectiveness and advancement in multimodal emotion recognition tasks. In summary, LLM-EmoGraph, through its innovative multimodal fusion strategies, large-scale pretraining, and weakly supervised learning methods, provides effective solutions to a series of challenges in multimodal emotion recognition, offering strong support for improving the accuracy and stability of emotion recognition systems.  
      Keywords: multimodal emotion recognition; dialogue system; large language model; graph neural network; feature fusion
      Updated: 2026-06-04
    • LU Yuming, CAO Longhao, GUO Xin, AI Yihao, CHEN Hao, JIE Lilin
      Vol. 54, Issue 1, Pages: 352-367(2026) DOI: 10.12263/DZXB.20250777
      Abstract: The multi-objective vehicle routing problem (MOVRP) is a key optimization problem in logistics distribution and transportation. It directly affects logistics operational efficiency, cost control and customer service quality. The problem arises widely in practical scenarios such as e-commerce warehousing and distribution, urban cold chain transportation and emergency material scheduling. As logistics systems expand in scale and operating environments become increasingly dynamic, the number of constraints and optimization objectives involved in MOVRP continues to grow, rendering the problem structure progressively more complex. This places heightened demands on optimization algorithms regarding computational efficiency, solution quality, and robustness. Existing optimization algorithms employ a single-task, independent solution approach, in which a solution model is built and the search process is initialized from scratch for each new MOVRP instance. This approach fails to effectively utilize useful information accumulated during historical searches. Consequently, it leads to redundant searches, slower convergence, and a tendency to become trapped in local optima in complex scenarios, resulting in suboptimal algorithmic performance. To overcome these deficiencies, a multi-objective vehicle routing multi-task evolutionary algorithm (MO-MTEA) is proposed in this paper. First, the original problem is divided into several simple and similar sub-tasks by dimensionality reduction, so that the complexity of the original problem is reduced through hierarchical solution of the sub-tasks. 
While preserving the key constraints of the original problem, this strategy effectively reduces the search space of individual sub-tasks, thereby alleviating the algorithm’s search burden and enhancing solution efficiency. Then, based on evolutionary multitasking, knowledge transfer is used to pass search information between sub-tasks, so that the sub-tasks benefit from one another and assist in solving the original task. This multi-task coordination mechanism fully exploits latent correlations between sub-tasks, significantly enhancing the algorithm’s global search capability and convergence performance. Finally, while the main population evolves, an independent archive population is introduced, into which the elite individuals of the main population are saved. This ensures that high-quality solutions are not lost while maintaining population diversity and uniform distribution, effectively preventing the main population from becoming trapped in local optima. To evaluate the performance of the proposed algorithm, it is tested on the classic Solomon benchmark dataset and compared with four mainstream evolutionary algorithms in the field, namely ACO-Tabu, M-MOEA/D, HMOMA and CCMO. Experimental results show that MO-MTEA outperforms the other evolutionary algorithms and achieves superior solutions for MOVRP.  
      Keywords: multi-objective optimization; multitasking; vehicle routing problems; time window; evolutionary algorithm
      Updated: 2026-06-04
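
An archive of elite individuals in multi-objective optimization is commonly maintained as a set of mutually non-dominated solutions. The sketch below shows that common scheme for minimization objectives; whether the paper's archive uses exactly this rule is not stated in the abstract, so treat it as an assumption. The function names and the example objective pair (vehicle count, total distance) are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def update_archive(archive, candidate):
    """Return a new archive after offering one candidate objective vector.

    The candidate is rejected if an archived solution dominates or equals it;
    otherwise it is added and every archived solution it dominates is dropped.
    """
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return archive
    survivors = [kept for kept in archive if not dominates(candidate, kept)]
    return survivors + [candidate]
```

For example, offering (vehicle count, total distance) pairs one at a time keeps only the trade-off front, so a good solution found early cannot be lost when the main population later drifts elsewhere.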
    • SHANG Yihan, DONG Xinghui
      Vol. 54, Issue 1, Pages: 368-380(2026) DOI: 10.12263/DZXB.20251002
      摘要:With the growing demand for deep-sea exploration and marine resource exploitation, underwater vision technologies have become a critical enabler for applications, such as robotic operations and marine biological monitoring. Among various vision tasks, underwater image instance segmentation (UIIS) is particularly challenging, as it requires both precise object localization and pixel-level mask generation. In recent years, vision foundation models, in particular, the segment anything model (SAM), have demonstrated remarkable zero-shot generalization capabilities in generic scenes. However, their performance remains unsatisfactory in complex underwater environments. Severe light absorption and scattering in underwater environments lead to significant image degradation, including color distortion, extremely low contrast, and blurred boundaries, which substantially hinder effective feature extraction. Moreover, the segmentation performance of SAM heavily relies on manually provided explicit prompts (e.g., points, boxes, and masks). This dependency not only increases annotation costs but also limits its applicability in unattended or complex underwater scenarios. To address these challenges, we propose a dynamically-guided SAM (DGD-SAM). By introducing a dynamically-guided mechanism and integrating feature aggregation with a multi-scale feature enhancement module, DGD-SAM establishes a complete pipeline for automatic prompt generation and refined segmentation. First, to mitigate the feature distribution discrepancy between detection and segmentation tasks, an adaptive feature aggregator (AFA) is designed. This module re-models inter-channel dependencies through a channel attention mechanism, achieving task alignment across both spatial and channel dimensions and effectively enhancing the model’s sensitivity to weak underwater targets. 
Second, considering the large variation in underwater target scales and the complexity of background interference, a multi-scale feature enhancement module is constructed. By building a cross-resolution feature pyramid, this module significantly improves the model’s ability to capture targets of various scales in complex scenes. During the decoding stage, a dynamically-guided decoder (DGD) is proposed, which first integrates the initial segmentation mask with image features to generate dynamic guidance information, and then performs refined mask prediction through bidirectional attention interactions between the prompts and image features. Experimental results demonstrate that DGD-SAM consistently outperforms state-of-the-art methods on four public underwater data sets, including LIACI, USIS10K, UIIS, and UIIS10K, as well as two terrestrial scene data sets, i.e., COME15K-E and COME15K-H. These results indicate that the proposed method not only achieves superior performance in underwater environments but also maintains stable and competitive segmentation performance in terrestrial scenes, suggesting that the model does not overly rely on scene-specific characteristics and exhibits strong generalizability and scalability.  
      Keywords: segment anything model; vision foundation model; underwater image instance segmentation; image segmentation; dynamically-guided decoder; prompt generation
      Updated: 2026-06-04
    • WANG Yulu, WU Min, ZHAN Tianming, SUN Yubao
      Vol. 54, Issue 1, Pages: 381-394(2026) DOI: 10.12263/DZXB.20250371
      Abstract: Compressed sensing magnetic resonance imaging (CS-MRI) accelerates image acquisition by substantially reducing the amount of data sampled in the frequency domain (k-space). The core scientific challenge lies in the high-fidelity reconstruction of MRI images from such incomplete, undersampled k-space data. While deep neural network-based reconstruction methods have recently driven significant progress and continuously improved output quality, a key limitation remains: most existing deep models employ real-valued network architectures. This creates a critical mismatch, as the raw data from magnetic resonance imaging (MRI) scanners is inherently complex-valued, containing both magnitude and phase information. Real-valued networks typically process this data by separating or discarding its complex components, which hinders the full exploitation of the detailed structural features inherent in complex k-space signals, thereby limiting further gains in reconstruction fidelity. Furthermore, prevailing reconstruction networks often operate within a single domain (image or k-space) or use simple sequential processing, lacking a sophisticated interactive mechanism that explicitly enforces consistency between the frequency and image domains. This leads to insufficient and sub-optimal dual-domain feature learning, leaving potential performance improvements unrealized. To address these issues, this paper proposes a dual-domain, three-party complex-valued generative adversarial network reconstruction model, named the dual-domain tri-edge complex generative adversarial network (DualTri-CGAN). Its core architecture features two principal generators: a k-space generator and an image-domain generator, forming a comprehensive dual-domain generation framework. This framework is paired with a real-valued discriminator that evaluates the authenticity of the generated outputs. 
Both generators are built on a multi-scale encoder-decoder structure, enabling effective extraction and utilization of image features across different scales, from local textures to global anatomy. Additionally, residual connections are integrated within the generators to effectively fuse multi-scale features, significantly enhancing overall feature representation. A pivotal innovation is the introduction of a three-party adversarial learning paradigm. This advanced scheme goes beyond the conventional adversarial game between the generators and the discriminator by incorporating a novel, direct adversarial mechanism between the two sub-generators, fostering a competitive yet collaborative dynamic. For the loss function, alongside standard adversarial losses, a novel similarity adversarial loss is designed. This specialized loss explicitly enforces consistency and alignment between the outputs of the two generators, compelling them to mutually inform, regularize, and optimize each other during adversarial training. This results in superior collaborative performance and, ultimately, higher-quality MRI reconstructions. For experimental validation, the proposed DualTri-CGAN model was systematically evaluated on the public information extraction from images brain (IXI Brain) dataset. Results demonstrate that, compared to existing state-of-the-art generative adversarial network (GAN)-based models, DualTri-CGAN exhibits superior native handling of complex-valued k-space data. This approach effectively avoids the reconstruction errors and information loss typically arising from the separate processing of real and imaginary components in real-valued networks. Moreover, the synergistic benefits of the dual-domain generator framework and the three-party adversarial learning strategy collectively lead to measurable improvements in key image quality metrics, namely higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). 
Notably, even under a stringent 10% sampling rate, DualTri-CGAN maintains a robust capability to accurately recover fine edge details and nuanced textures in MRI images. These findings underscore the model’s excellent reconstruction performance, generalization ability, and strong robustness, marking a promising advancement for fast, high-quality CS-MRI.  
      Keywords: magnetic resonance imaging (MRI) compressed sensing reconstruction; complex-valued generative adversarial network (GAN); dual-domain generator; tripartite adversarial learning
      Updated: 2026-06-04
    • QIAN Zhongsheng, RAO Yuxian, WU Minxuan, PENG Shaoqiang, WANG Rongrong, XU Kewen
      Vol. 54, Issue 1, Pages: 395-416(2026) DOI: 10.12263/DZXB.20250993
      Abstract: Click-through rate (CTR) prediction is a core task in recommendation systems, whose goal is to predict the probability that a user will click on a candidate item by modeling the user’s historical behaviors and item features. However, existing CTR methods still have problems in modeling global interaction structures, extracting multi-hop neighbor information, and improving the efficiency of high-dimensional feature interaction learning. The interactions between users and items usually exhibit multi-level and strongly structured association characteristics; direct modeling leads to excessive computational complexity and difficulty in capturing the semantic relationships between different levels of neighborhoods, thereby limiting the in-depth exploration of potential semantic associations and user preferences. Moreover, most existing CTR models rely on the fixed activation functions of traditional neural networks, which lack flexibility in modeling high-order nonlinear feature interactions and are prone to problems such as feature redundancy and weak generalization ability, making it difficult to further improve prediction accuracy. To address these problems, this paper proposes a Kolmogorov-Arnold networks (KAN)-based CTR prediction model integrating hybrid community division and cluster-level feature extraction (HCCF-KCTR). First, a hierarchical hybrid community clustering strategy is designed, which combines coarse-grained global community division and fine-grained intra-cluster optimization to decompose complex global interaction relationships into cluster-level units with clear structure and coherent semantics. This strategy significantly reduces the modeling complexity while retaining key structural information. 
Secondly, based on the results of global community division, multi-hop neighbors are mapped at the cluster level, and a cluster-aware attention pooling mechanism is introduced to dynamically evaluate the semantic importance of each hop of neighbors within and between clusters, adaptively assign attention weights, and generate high-quality cluster-level embedding representations of multi-hop neighbors, so as to fully capture the multi-level interaction characteristics between users and items. Finally, the learnable function of the KAN network is used to replace the fixed activation function, and multiple cross-hop and cross-cluster feature combinations are constructed to convert complex multi-hop interaction features into interpretable low-order function combination, realizing the in-depth fusion of structural information and semantic features, and further improving the prediction accuracy and expressive ability of the model. Comparative experiments are conducted with 13 mainstream CTR models on four real-world datasets, namely MovieLens, Electronics, Book, and Taobao. The experimental results show that, in terms of the three metrics of AUC, GAUC, and LogLoss, the proposed HCCF-KCTR model achieves an average minimum improvement of 2.74%, 2.19%, and 3.68% respectively compared with the existing optimal baseline model, verifying its superiority in feature interaction modeling and prediction. In addition, this work verifies the necessity and synergistic effectiveness of each module as well as the balance of the model in overall efficiency through ablation experiments, parameter sensitivity experiments and model efficiency experiments, further demonstrating that the model has excellent generalization ability.  
      Keywords: click-through rate prediction; community clustering; Kolmogorov-Arnold networks; recommender system; attention pooling; cluster-level features
      Updated: 2026-06-04
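
Attention pooling over multi-hop neighbor embeddings can be sketched generically as a softmax-weighted average. The version below scores each hop embedding by a dot product with a query vector; this scoring rule, the absence of learned parameters, and the names `softmax` and `attention_pool` are simplifying assumptions, and the cluster-aware details of HCCF-KCTR are not reproduced.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention_pool(hop_embeddings, query):
    """Pool one embedding per hop into a single vector.

    Each hop is scored by its dot product with the query; the softmax of the
    scores gives the attention weights used for the weighted average.
    """
    scores = [sum(q * h for q, h in zip(query, emb)) for emb in hop_embeddings]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * emb[d] for w, emb in zip(weights, hop_embeddings))
              for d in range(dim)]
    return pooled, weights
```

Hops whose embeddings align with the query receive larger weights, so the pooled vector leans toward the neighbors judged most relevant.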
    • CHEN Mengyuan, ZHANG Tantan, TANG Zhe
      Vol. 54, Issue 1, Pages: 417-432(2026) DOI: 10.12263/DZXB.20250899
      Abstract: To address the limitations of deep learning models in the textile industry caused by the scarcity of fabric defect samples, this paper proposes a method that combines upstream data augmentation with in-depth optimization of the downstream detection model. In actual production, the extreme scarcity of defect samples creates a "small-sample dilemma" that hinders model training. In the upstream stage, a generative network, GAN-Data, is designed based on the cycle-consistent generative adversarial network (Cycle-GAN) architecture. The network uses a mask-guided mechanism to decouple defect features from background textures, ensuring precise defect positioning and resolving the randomness of defect distribution. To handle significant scale variations, GAN-Data incorporates an enhanced defect generation module (EDGM), which employs parallel multi-scale dilated convolution branches to achieve adaptive feature extraction for various defect types. Furthermore, a texture-preservation loss function based on the VGG19 network and Gram matrix constraints is introduced to maintain the integrity of periodic fabric textures in non-defect regions. In the downstream stage, the FD-DETR detection network is constructed. Its backbone embeds a four-directional edge enhancement module based on the Prewitt operator to strengthen the capture of weak defect contours. To improve efficiency, a sparse attention-based intra-scale feature interaction (SparseAIFI) mechanism is designed, which reduces computational complexity by fusing local-window, striped-sampling, and block-level sparse patterns. Additionally, an aspect-ratio-aware IoU (ARA-IoU) loss function is introduced to improve localization accuracy for irregular defects through center-distance normalization and an adaptive weighting mechanism. The method is validated on the MVTec AD dataset, the industrial textile dataset (ITD), and a self-built production-line dataset. Initial evaluations using the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and Fréchet inception distance (FID) demonstrate that GAN-Data achieves superior image quality and cross-domain generalization. Subsequent comparative experiments show that the FD-DETR model trained with GAN-augmented data significantly improves detection accuracy while meeting industrial requirements. Finally, collaborative optimization experiments confirm that integrating GAN-Data and FD-DETR achieves faster convergence and a higher performance ceiling than single-stage improvements. In conclusion, this bidirectional synergistic route provides an efficient solution for fabric defect detection under small-sample conditions.
      Keywords: fabric defect detection; dataset augmentation; generative adversarial network; deep learning
      Updated: 2026-06-04
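The fabric-defect abstract above mentions a four-directional edge enhancement module based on the Prewitt operator but gives no details. As a rough illustration only, the sketch below applies the classic 3x3 Prewitt kernels in four orientations and keeps the strongest absolute response per pixel. The kernel set, the max-fusion rule, and all names are assumptions; the module in FD-DETR may combine the directions differently.

```python
# Classic 3x3 Prewitt kernels in four orientations (assumed set).
PREWITT_KERNELS = {
    "vertical_edges": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edges": [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],
    "diagonal_edges": [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],
    "anti_diagonal_edges": [[1, 1, 0], [1, 0, -1], [0, -1, -1]],
}


def correlate_valid(image, kernel):
    """Slide a 3x3 kernel over the image with no padding (output shrinks by 2)."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(3) for j in range(3))
            for c in range(width - 2)
        ]
        for r in range(height - 2)
    ]


def four_direction_edges(image):
    """Per-pixel maximum of the absolute responses of the four kernels."""
    maps = [correlate_valid(image, k) for k in PREWITT_KERNELS.values()]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [max(abs(m[r][c]) for m in maps) for c in range(cols)]
        for r in range(rows)
    ]
```

On an image containing a single vertical step, the response is nonzero only in windows that straddle the step, which is the behavior an edge-enhancement branch would feed back into the backbone features.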
    • LEI Xiaochun, WU Weilin, JIANG Zetao, ZHU Wencai, LIU Yingjian, CHEN Dongmei, WU Siqi
      Vol. 54, Issue 1, Pages: 433-450(2026) DOI: 10.12263/DZXB.20251221
      Abstract: Semantic segmentation plays an important role in practical applications such as autonomous driving, medicine-engineering interdisciplinary applications, and security monitoring. However, nighttime semantic segmentation remains an unsolved problem. Insufficient illumination at night blurs image details, which makes dataset annotation difficult. As a result, segmentation performance on nighttime scenes is unsatisfactory, and unsupervised domain adaptation methods are preferred for this task. To solve this problem, this paper proposes an unsupervised style and distribution domain adaptation (SDDA) method for nighttime semantic segmentation. Domain adaptation for the nighttime segmentation task is divided into style domain adaptation and distribution domain adaptation, which reduces the difficulty of the task. The better-performing Mamba architecture is introduced into unsupervised domain adaptation for nighttime semantic segmentation, and its advantages on this task are explored to improve segmentation accuracy. A semantic pairing GAN (SPG) module is proposed, which combines unpaired translation and rough paired translation through semantic information. This semantically couples the translation module with the segmentation task, so that the translated content suits the segmentation task rather than being independent of it. The SPG module translates daytime images of the source domain into nighttime images, and the segmentation model is then trained on the translated images so that it learns style-domain information and reduces style-domain differences. A semantic domain mixing (SDM) strategy is also proposed, which uses semantic information to extract the dynamic objects translated by SPG, move them to reasonable positions in nighttime static-object images of the target domain, and recombine them into new images. Training the segmentation model on images with small style-domain differences makes adaptation from the distribution perspective easier, narrowing the distribution-domain gap. By combining style and distribution domain adaptation, the model reduces domain differences from two perspectives and achieves overall domain adaptation for nighttime segmentation, alleviating the problem that the cross-domain gap in existing datasets is too large to adapt directly. Experimental results show that the proposed method achieves mIoU of 60.0%, 59.8%, and 59.1% on the Dark Zurich, ACDC Night, and Nighttime Driving datasets, respectively, which is 0.9%, 0.4%, and 1.6% higher than the best existing method, and that it can accurately segment targets in complex real-world nighttime scenes.
      Keywords: unsupervised domain adaptation; nighttime semantic segmentation; image-to-image translation; deep learning; image segmentation
      Updated: 2026-06-04
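The SDDA results above are reported in mIoU. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition on flattened label arrays: per-class intersection over union, averaged over the classes that occur. Benchmark-specific details such as ignored labels or dataset-level accumulation are omitted, and the function name is illustrative.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correctly labeled pixel is in both the intersection and the union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mislabeled pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else None
```

Because every class contributes equally to the mean, small or rare classes weigh as much as large ones, which is why mIoU is harder to raise than pixel accuracy.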
    • SHU Xiangbo, LI Chengjian, YIN Zheng, LI Pengpeng, LI Zechao, TANG Jinhui
      Vol. 54, Issue 1, Pages: 451-465(2026) DOI: 10.12263/DZXB.20251089
      Abstract: Generating high-quality human motions that are semantically consistent with textual descriptions remains a challenging problem. Although recent diffusion-based, autoregressive, and multimodal pre-trained approaches have improved motion naturalness and diversity, they still struggle with complex semantic understanding and fine-grained motion modeling. These limitations mainly stem from two factors: (1) the lack of explicit modeling of hierarchical dependency relationships among sentence components, which hampers accurate textual semantic understanding; (2) the reliance on either global-level or word-level text-motion alignment, while neglecting the complementarity between global and local semantics, making coarse-to-fine collaborative modeling difficult. To address these limitations, we propose the hierarchical textual-semantic-driven multi-granularity human motion generation framework (HTMG), which models textual semantics while enabling coarse-to-fine cross-modal interactions to ensure text-motion consistency. Specifically, we introduce a hierarchical semantic capture strategy (HSCS) that constructs a textual structure tree via syntactic parsing and embeds it into hyperbolic space, where hierarchical semantic dependencies are dynamically modeled using a hyperbolic graph attention mechanism. Furthermore, we design a multi-granularity cross-modal attention mechanism (MGCA) that adaptively fuses global-level and word-level semantic representations with motion features, allowing the model to jointly capture overall motion intent and fine-grained action variations. Extensive experiments demonstrate that HTMG achieves state-of-the-art performance on the HumanML3D and KIT-ML benchmarks, validating the effectiveness of our framework in textual semantic understanding and text-motion alignment.
      Keywords: human motion generation; hierarchical semantic capturing strategy; hyperbolic space; hyperbolic graph attention mechanism; textual structure tree; multi-granularity cross-modal attention
      Updated: 2026-06-04

      SURVEY AND REVIEW

    • WANG Qianfan, GUO Yangeng, BI Sheng, WANG Yiwen, SONG Linqi, MA Xiao
      Vol. 54, Issue 1, Pages: 466-478(2026) DOI: 10.12263/DZXB.20251062
      Abstract: Ultra-reliable low-latency communication (URLLC) is one of the three key scenarios in 5G, and its enhanced form, hyper-reliable and low-latency communication (HRLLC), has been proposed as one of the six typical scenarios for 6G. These services impose very stringent constraints on latency and reliability, presenting new opportunities and challenges for the encoding and decoding of short-blocklength codes. Ordered statistics decoding (OSD) is a universal near-maximum-likelihood (near-ML) decoding algorithm for short codes with strong potential in such scenarios, but its high computational complexity severely limits practical deployment. This paper surveys recent advances in OSD with a focus on the design of test error patterns (TEPs), including ordering rules, skipping mechanisms, and early termination strategies, and outlines future research directions. Specifically, we first examine Hamming-weight, soft-weight, and logical-weight TEP orderings, and show that logical-weight ordering achieves an effective balance between reliability and implementation complexity. We then review existing skipping and termination mechanisms, which exploit dynamic soft information or probabilistic decisions to avoid redundant re-encodings. Moreover, we concentrate on TEP generation, skipping, and termination schemes driven by soft metrics and additional parity checks, as well as their joint design. By combining structural constraints with hybrid decision strategies, such schemes can reduce the average number of re-encodings by one to two orders of magnitude with almost no loss in frame error rate. Simulation results show that, for the [127,64] Bose-Chaudhuri-Hocquenghem (BCH) code, a combined skipping mechanism requires only tens of re-encodings at an SNR of 4 dB, reducing the computational cost by more than 90% compared with the original OSD algorithm. Finally, we discuss open challenges related to non-binary codes, time-varying channels, and hardware implementations for longer blocklengths and medium-rate codes, and outline several promising directions for future research.
      Keywords: error-correcting code; ordered statistics decoding; test order; skipping mechanism; stop mechanism
      Updated: 2026-06-04
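To give a sense of why the number of re-encodings dominates OSD complexity in the survey above, the sketch below enumerates test error patterns in plain Hamming-weight order, the simplest of the three orderings the abstract names. It assumes the k information positions are already sorted by reliability and represents each TEP as a tuple of flipped indices. The function names are illustrative, and the soft-weight and logical-weight orderings discussed in the survey are not implemented here.

```python
from itertools import combinations
from math import comb


def hamming_ordered_teps(k, max_order):
    """Yield TEPs over k positions as tuples of flipped indices, lightest first."""
    for weight in range(max_order + 1):
        for flipped in combinations(range(k), weight):
            yield flipped


def tep_count(k, max_order):
    """Number of TEPs (hence re-encodings) an order-max_order search visits without skipping."""
    return sum(comb(k, w) for w in range(max_order + 1))
```

For k = 64, as in the [127,64] BCH code, an order-2 search without skipping already visits 1 + 64 + 2016 = 2081 patterns, which illustrates why skipping and early-termination rules matter.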

      CORRESPONDENCE

    • ZHAO Yun, MU Yunxi
      Vol. 54, Issue 1, Pages: 479-486(2026) DOI: 10.12263/DZXB.20250632
      Abstract: Multi-beam antenna technology plays an irreplaceable role in achieving efficient signal coverage and enhancing system capacity in high-frequency wireless communication systems. The sub-millimeter wave band has become a key frontier for next-generation communication systems. However, existing mainstream solutions face significant bottlenecks: traditional Butler matrices suffer from high insertion loss and complex structures at high frequencies; lens and reflector antennas are bulky, making it difficult to meet compactness requirements; and traditional substrate integrated waveguide (SIW) multi-mode networks are limited by narrow bandwidths and high dielectric loss. To address these challenges, this paper proposes a compact broadband multi-beam horn antenna operating in the Y-band (170-260 GHz), aimed at overcoming existing technical limitations and providing a low-loss, easy-to-fabricate, and high-performance solution for sub-millimeter wave communications. The antenna consists of four parts: ridge-waveguide transmission lines, a ridge-waveguide multi-mode network, periodic slot phase shifters, and a horn antenna. To overcome the narrow bandwidth and high sidelobes of traditional multi-mode beamforming networks, a single-ridged waveguide structure is introduced to alter the internal electromagnetic field distribution. The electric field is highly concentrated in the ridge region, which acts as an added shunt capacitor in the equivalent circuit. The cutoff frequency is therefore lowered, significantly broadening the operating bandwidth of the multi-mode beamforming network. Additionally, periodic slot phase shifters are designed as key components for optimizing the phase of the antenna. The phase shifters are loaded into the waveguides connected to output ports 6 and 7 of the multi-mode network, and consist of periodically arranged rectangular slots protruding outward from the waveguide's broad wall. Based on the theory of phase delay caused by waveguide discontinuities, they compensate the inherent phase deviation of the ridge-waveguide multi-mode network, flattening the overall output phase distribution. Simulation analysis confirms that the designed antenna achieves suppressed sidelobe radiation and enhanced beam gain. The antenna is fabricated and measured; it is made entirely of aluminum alloy by high-precision computer numerical control (CNC) milling. To minimize assembly errors, the structure adopts an H-plane split-block design with alignment pin holes reserved to ensure precise cavity alignment. Measurement results show that the antenna achieves a 25% relative impedance bandwidth over the 175-225 GHz range, with reflection coefficients below -10 dB at all four feeding ports and good port isolation. The four synthesized beams cover a maximum scanning range of ±30°. Within the operating band, the antenna achieves broadband radiation characteristics with a peak gain exceeding 15.6 dBi, a gain fluctuation of less than 3 dB, and an aperture efficiency maintained above 48%.
      Keywords: sub-millimeter wave; Y-band; multi-beam antenna; ridged waveguide multi-mode network; broadband antenna; waveguide phase shifter
      Updated: 2026-06-04
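The 25% relative impedance bandwidth quoted in the antenna abstract above can be checked from the stated band edges, assuming the usual definition of fractional bandwidth relative to the arithmetic center frequency (the symbol FBW is introduced here for illustration):

```latex
\[
\mathrm{FBW}
  = \frac{f_{\max} - f_{\min}}{(f_{\max} + f_{\min})/2}
  = \frac{225 - 175}{(225 + 175)/2}
  = \frac{50~\text{GHz}}{200~\text{GHz}}
  = 25\%
\]
```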