Acta Electronica Sinica

25 April 2025, Volume 53 Issue 4

Select all

|

On-Device Intelligence Acceleration Technologies: A Survey

ZHANG Jin-rui, LONG Ting-ting, ZHANG De-yu, XU Yuan, REN Ju, ZHANG Yao-xue

2025, 53(4): 1063-1102.
https://doi.org/10.12263/DZXB.20240691

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Intelligent edge computing is an essential pathway towards the era of pervasive intelligence, and it has propelled the rapid advancement of on-device intelligence technology. By directly deploying and running deep learning models on edge devices, on-device intelligence holds natural advantages in real-time processing, security, and personalization, among other aspects, and has found extensive applications in various scenarios such as autonomous driving, satellite reconnaissance, virtual reality/augmented reality (VR/AR), and more. However, as the parameters of deep learning models continue to increase, the limited hardware resources at the edge struggle to sustain the growing computational costs. To enhance the computational efficiency of model inference on edge devices, researchers have systematically optimized from multiple perspectives including model algorithms, compilation software, and device hardware, driving the advancement and evolution of on-device intelligence. This paper summarizes existing optimization efforts for deep learning model inference at the edge, covering techniques such as model compression, collaborative design of model-software-hardware, heterogeneous model parallel deployment strategies, and optimizations for large models. Lastly, it outlines the challenges faced by current on-device intelligence inference acceleration technologies and provides insights into future development trends.
GaussDB: Intelligent Cloud-Native Distributed Database

LI Guo-liang, WANG Lei, ZHANG Jin-yu, WANG Jiang, REN Yang, ZHANG Qiong, SHEN Yu, DONG Ya-hui, SONG Tao, QIAO Dian

2025, 53(4): 1103-1122.
https://doi.org/10.12263/DZXB.20231026

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

With the rapid development of Internet technologies and digital transformation of traditional enterprises, databases are facing great challenges in terms of large data volume, high availability, cloud-native elasticity, intelligence, and security protection as data infrastructure in the digital age. To address these challenges, GaussDB came into being. It proposes the distributed query optimization technology of distributed databases to improve data query performance, proposes the HA disaster recovery technology of distributed databases to improve enterprise data availability and reliability, proposes the cloud-native computing and storage separation and elastic scaling technology of distributed databases to improve storage resource utilization, proposes the autonomous management technology of distributed database is proposed to enhance the intelligent management of database, and proposes the all-round security protection technology of distributed database is proposed to improve the security protection ability of data. With all these new technologies, GaussDB supports the digital transformation of core scenarios in key basic industries
Log Fault Diagnosis Based on Large Language Models

XU Ting, XIAO Tong, ZHANG Sheng-lin, SUN Yi-dan, SUN Yong-qian, PEI Dan

2025, 53(4): 1123-1141.
https://doi.org/10.12263/DZXB.20240801

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

As the software service systems become increasingly large and complex, log-based fault diagnosis is critical to ensure the reliability of software services. Although existing research in log fault diagnosis methods can identify the type of the fault, they often fails to explain the reasoning process to convince the operation and maintenance personnel, which makes the above method challenging to apply in the production environment. The LogCoT (Log Chain of Thought) is proposed in this paper as a new framework for fault diagnosis based on automatically constructing chain of thought prompting (CoT-Prompting) to address the above issues. The auto-few-shot-CoT (Auto-FSC) algorithm of the two-stage CoT-Prompting engineering extracts semantic information from the large language mode (LLM) table root cause analysis reports. In addition, the combination of prompt-tuning with category-unlabelled and preference-tuning with category-labelled is used to optimally align the base model Mistral. Then, the large language model feedback identity preference optimisation(LLMf-IPO) algorithm is used to correct the wrong diagnosis results generated by the base model Mistral to better align the user’s intention. Finally, we provide a comprehensive experimental evaluation of LogCoT’s performance based on two log datasets collected from the production environment of the top-tier global Internet service provider and a cloud service provider. The experimental results show that LogCoT outperforms the three baseline models in three performance metrics, including Accuracy, Macro-F1, and Weighted-F1 on two datasets, and outperforms the Accuracy of the best existing model by 31.88 percentage points, 10.51 percentage points, respectively.
Model-Driven Deep Learning-Enhanced Markov Chain Monte Carlo MIMO Detector: Design, Simulation and Prototyping

CAO Yi-xiao, ZHOU Xing-yu, ZHANG Jing, LIANG Le, LI Yong, JIN Shi

2025, 53(4): 1142-1152.
https://doi.org/10.12263/DZXB.20240798

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

The scale of multiple-input multiple-output (MIMO) systems is growing rapidly, leading to a dramatic increase in the computational complexity of receiver signal detection. Traditional detection algorithms struggle to achieve a good balance between bit error rate (BER) performance and computational complexity. Markov chain Monte Carlo (MCMC)-based detection algorithms can achieve near-optimal detection performance with polynomial complexity, but their performance deteriorates significantly with low sampling rates. To address this issue, this paper introduces a model-driven deep learning approach, which transforms the MCMC iterative process into a cascade network structure. Trainable parameters are incorporated into the network, and deep learning techniques are employed to optimize their settings. Based on complexity analysis and simulation results, the proposed method outperforms the original algorithm in terms of BER by approximately 1 dB in coding scenarios, while significantly reducing computational complexity. To validate the performance of the model-driven deep learning approach in real-world transmission, a 2×2 MIMO smart communication prototype is developed, and end-to-end air interface transmission tests are conducted. The test results demonstrate that the MCMC detection algorithm enhanced by the model-driven deep learning approach still achieves a significant BER performance advantage with lower computational complexity, thereby confirming the effectiveness and robustness of the proposed solution in practical transmission environments.
PS Selecting Method of GB-InSAR Based on GMM

TIAN Wei-ming, WANG Long-yue, GAO Song, DENG Yun-kai

2025, 53(4): 1153-1163.
https://doi.org/10.12263/DZXB.20240803

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Permanent scatterer (PS) selection is a crucial step in the processing of ground-based interferometric synthetic aperture radar (GB-InSAR).Existing methods rely on amplitude stability, phase stability or high coherence between pixels to select PS. Amplitude stability and phase stability are sensitive to phase fluctuations and may not well represent phase errors in some cases. The methods based on high coherence can easily lead to false detections due to their reliance on local windows. To address these issues, this paper analyzes the differences in the distribution characteristics of interferometric phases between PS and non-PS in GB-InSAR. A new PS selection method based on Gaussian mixture model (GMM) is proposed. The method firstly selects enough PS as prior reference PS. Then, GMM is used to fit the probability distribution of the interferometric phases of the reference PS. Finally, the PS and non-PS are distinguished based on the matching degree between the interferometric phase series of all image pixels and the GMM. The results of measured data show that, comparing to the traditional methods based on amplitude and phase stability, when the number of obtained PS is close, the proposed method yields stronger coherence and higher degree of aggregating of phase series among the obtained PS. The PS obtained by GMM are with less residues than other methods. This demonstrates the method's ability to accurately select PS.
Systems and Key Technologies of 4K/8K UHD TV Production Broadcasting and Presentation

JIANG Wen-bo, ZHAO Gui-hua

2025, 53(4): 1164-1173.
https://doi.org/10.12263/DZXB.20230911

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

The television production and presentation system has undergone an evolution from black and white to color, and from analog to digital. Currently, it is in a stage of rapid development from high definition (HD) to ultra-high definition (UHD). The signal transmission rate of the traditional HD baseband system is only 1.5 Gbps, which is unable to carry 4K/8K UHD signals (48 Gbps@8K, 12 Gbps@4K). Moreover, the luminance dynamic range of HD television is only 10³, while the human eye's visible range without pupil adjustment is 10⁵. Therefore, UHD television should enhance the luminance dynamic range to 10⁵ in accordance with the human eye's recognition ability. Focusing on the technical challenges such as uncompressed cross-domain multi-address Internet Protocol (IP) switching and high dynamic range (HDR) production and presentation for UHD, this paper comprehensively introduces the UHD television production and presentation system and key technologies, with a particular emphasis on the innovative points of UHD IP signal switching, 8K UHD video imaging and image processing, video intelligent enhancement, extended reality (XR) virtual-real fusion production, Audio Vivid, heterogeneous network audio-video synchronization transmission, and 4K/8K UHD terminal display, as well as the full-process program production and presentation capabilities of UHD high dynamic characteristics.
The Vortex Electron and Vortex Microwave Photon: The Intrinsic and Extrinsic Orbital Angular Momentum

WANG Zhe-yuan, ZHANG Chao

2025, 53(4): 1174-1181.
https://doi.org/10.12263/DZXB.20240455

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

The theoretical model of cyclotron electron to radiate vortex electromagnetic wave photons is crucial for the technology of quantum state vortex electromagnetic wave. This paper is the part of “The Vortex Electron and Vortex electromagnetic wave Photon”, which establishes the theoretical model related to the “The Intrinsic and Extrinsic Orbital Angular Momentum (OAM)”. Single electron or electromagnetic wave photon can form vortices and are determined by their intrinsic OAM. Aiming to analyze the intrinsic OAM of a single electron transferring in the energy level transition radiation, the eigenvalues of the intrinsic OAM are calculated theoretically in different scenarios of free space and magnetic field in this paper. The results show that the intrinsic OAM is only determined by the electron wave packet, while the extrinsic OAM is affected by the choice of coordinate system. When the quantum number in the intrinsic OAM is changed, the physical variation is the expansion or contraction of the electron wave packet. This paper also gives expressions for the intrinsic OAM of electromagnetic wave photons in free space and the extrinsic OAM in twisted optical fiber. Depending on the presence of the intrinsic OAM, a mass of electromagnetic wave photons can constitute a quantum state OAM electromagnetic wave and a statistical state OAM electromagnetic wave.
Research on Reconfigurable NTT Arithmetic Unit and Efficient Scheduling Algorithm for Lattice Post-Quantum Cryptography

FU Qiu-xing, LI Wei, BIE Meng-ni, CHEN Tao, NAN Long-mei

2025, 53(4): 1182-1191.
https://doi.org/10.12263/DZXB.20240788

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

In order to further improve the rate of polynomial multiplication in lattices post-quantum cryptography, and considering the different parameters of polynomial multiplication in different lattices, a high-speed reconfigurable number theory transformation (NTT) arithmetic unit is proposed in this paper, and the corresponding data scheduling scheme is proposed to solve the problem of time sequence conflict and space conflict. In this paper, we first analyze the operation characteristics of NTT algorithm in different lattice-based post-quantum cryptography algorithms, and propose a 4×4 reconfigurable operating unit to meet the needs of 2/3/4-NTT operation in different bit widths. Secondly, based on the above hardware design, a data scheduling scheme based on the basic 4-NTT algorithm is proposed to solve the timing conflict problem in the highly parallel multi-pipeline-level design. Finally, a multi-bank data storage scheme based on m-coloring algorithm is proposed to solve the problem of data access conflict. Experimental results show that the hardware structure designed in this paper is capable of implementing base 2/3/4-NTT and its inverse operation functions, and can support a variety of latry-based post-quantum cryptography algorithms including Kyber and Dilithium. The maximum parallelism degree supported by the hardware is 4. In order to further verify the superiority of the hardware design in this paper, Xilinx Virtex-7 device is used for experimental verification. The working frequency is up to 169 MHz, and the NTT algorithm function can be completed within 0.40 μs, and ATP is reduced by about 42%. Integrated implementation on 40 nm CMOS process nodes results in a 18%~90% reduction in the AT volume of the hardware design compared with existing designs.
High-Speed Column Bus Readout Method Based on Negative Capacitance Circuit

XU Rui-ming, GUO Zhong-jie, LIU Sui-yang, YU Ning-mei

2025, 53(4): 1192-1200.
https://doi.org/10.12263/DZXB.20240948

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Aiming at the problem of the long establishment time of the column bus signal in large array CMOS image sensors, a high-speed column bus signal reading method is proposed in this paper. Based on negative capacitance technology, the negative capacitance circuit is integrated into the column-level readout circuit to offset the negative influence of column bus parasitic capacitance on the establishment time of column bus signal. At the same time, based on the dynamic loop stability regulation technology, the design contradiction between readout speed and loop stability is balanced. Based on 55 nm 1P4M CMOS technology, detailed circuit design and comprehensive simulation verification of the proposed high-speed column bus readout method are completed in this paper. Under the design conditions that the pixel size is 10 μm × 10 μm, the tail current is 5 μA, and the column bus output voltage swing is 1.2 V, the rise time of the column bus signal is reduced from 1.721 μs to 1.204 μs, which is reduced by 30.04%. The fall time of column bus signal is reduced by 51.28% from 5.780 μs to 2.816 μs. In addition, row fixed mode noise (RFPN) is reduced from 1.30% to 0.01%. Under the power consumption of 1.6 W, the frame rate and dynamic range of the large array CMOS image sensor designed in this paper reach 27 fps and 85 dB respectively. It provides a certain theoretical support for the design of large array high-speed and low-power CMOS image sensors.
Aerospace Heterogeneous Image Matching Algorithm Based on Improved SuperPoint

LIU Geng-chen, JIANG Liang, WU Guo-qiang, HUANG Kun

2025, 53(4): 1201-1211.
https://doi.org/10.12263/DZXB.20240724

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

It is quite difficult to extract features from heterogeneous aerospace images, and the image matching accuracy is relatively low, which has a negative impact on the precise target positioning of unmanned aerial vehicles (UAVs). The SuperPoint-SuperGlue algorithm has been widely applied in the field of image matching in recent years due to its characteristics such as self-supervision, easy training, and high accuracy. However, in the field of heterogeneous aerospace image matching, the feature extraction ability of SuperPoint still needs to be improved. In order to improve the matching accuracy of heterogeneous aerospace images, this paper proposes a heterogeneous aerospace image matching algorithm based on the improved SuperPoint. Firstly, the spatial group-wise enhance (SGE) module and the global attention mechanism (GAM) are introduced into the SuperPoint encoder to form a supplementary encoder, which to a certain extent solves the problems of uneven distribution of image features and the difficulty in extracting features from weakly textured images. Secondly, to further enhance the feature extraction ability of the algorithm, the supplementary encoder is connected in parallel with the original SuperPoint encoder to form a combined encoder. By combining the advantages of the two, it can extract image features with greater differences, reduce the false matching of feature points in similar regions, and improve the matching accuracy of heterogeneous aerospace images. Finally, through experimental verification, within the error range of 80 pixels on the UAV-VisLoc dataset, the number of matchable images can reach 82.14%. Compared with the original SuperPoint algorithm, the number of matchable images within the error range of 80 pixels has increased by 6.05%. Compared with other advanced algorithms, the number of matchable images within each pixel error range has increased. The experiments show that the algorithm proposed in this paper can effectively solve the problems such as weak feature extraction ability and uneven feature distribution in the matching of heterogeneous aerospace images.
Quantitative Evaluation Defects in Rod Workpieces Based on Acoustic Field Characteristics and C-Scan Images

DONG Ming, TIAN Hui, MA Hong-wei, CHEN Yuan, CAO Xian-gang, WAN Xiang

2025, 53(4): 1212-1220.
https://doi.org/10.12263/DZXB.20240594

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Flaw sizing is the focus of research in the field of non-destructive testing. The diffusion of ultrasonic waves causes edge blurring of C-scan images, which affects the accuracy of flaw sizing. A defects quantitative evaluation method for rod workpieces is proposed based on acoustic field characteristics and C-scan images. Based on the multi-Gaussian beam model, according to the propagation law of the ultrasonic wave at a curved interface, the acoustic field distribution of the spherical focusing probe under the curved surface condition is deduced, and the acoustic field characteristic values of the target plane where the defect is located are extracted. Nylon rod samples with flat-bottomed holes of different depths and diameters are scanned by ultrasonic C-scan system, and the characteristic values of the C-scan images are extracted. A dataset is created and a random forest regression model is trained. The test set is processed by the trained model, and the predicted results are closer to the standard values compared with the quantitative results of the 6 dB drop method. The quantitative error for the 1.5 mm flat-bottomed hole is 19.33%, a 27.34 percentage points reduction compared to the 6 dB drop method. The quantitative evaluation is performed on nylon rod with natural defects, the results show that the model can effectively predict the size information of natural defects in nylon rods.
Lightweight MR Image Reconstruction Network Based on Wavelet Domain Complex Convolution and Complex Transformer

ZHANG Xiao-hua, LIAN Qiu-sheng

2025, 53(4): 1221-1231.
https://doi.org/10.12263/DZXB.20241058

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Convolutional neural networks (CNNs) have demonstrated remarkable capabilities in learning image priors from large-scale datasets, achieving exceptional performance across various image processing tasks. However, the local receptive field inherently limit their ability to capture long-range dependencies between pixels. In contrast, the transformer architecture, renowned for its global receptive field, has exhibited outstanding performance in natural language processing and high-level vision tasks. Nevertheless, its computational complexity, which scales quadratically with image size, poses significant challenges for high-resolution image processing applications. Furthermore, many magnetic resonance (MR) reconstruction algorithms exhibit limitations by either relying exclusively on magnitude data or processing real and imaginary components as separate channels, thereby failing to account for the intrinsic correlations within complex-valued images. By integrating complex convolution and complex transformer, an innovative hybrid module is introduced, which leverages the high-resolution spatial information extracted by CNNs to enhance the details of MR images and capture long-range features through global contextual information obtained by the self-attention module. Building on this hybrid module and wavelet transform, a lightweight MR image reconstruction method using complex convolution and complex transformer in the wavelet domain is further proposed. Experimental results on the Calgary-Campinas and fastMRI datasets demonstrate that the proposed model achieves superior reconstruction performance and while maintaining lower resource consumption compared to four representative MR image reconstruction algorithms. The source code is available at https://github.com/zhangxh-qhd/WCCTNet.
Cost Function Optimization-Based Beam Scheduling and Resource Allocation Algorithm for Multibeam Satellite Communication Systems

ZHANG Si-ya, CHAI Rong, LIANG Cheng-chao, CHEN Qian-bin

2025, 53(4): 1232-1240.
https://doi.org/10.12263/DZXB.20240116

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Multibeam satellite communication systems have received widespread attentions due to their high throughput and efficient resource utilization. This paper investigates the beam scheduling and resource allocation problem in multibeam satellite communication system. By jointly considering user position and service characteristics, an optics-based initial user grouping algorithm is proposed. To enhance beam coverage performance, a minimum circle algorithm is proposed to optimally design satellite beam positions and coverage radius. Given the determined user grouping strategy, system cost function is defined and the joint beam scheduling, sub-channel allocation and power allocation problem is formulated as a system cost function minimization problem. To solve the formulated optimization problem, aggregate nodes are introduced to describe the characteristics of user groups, and a parameterized deep $Q$ -network-based joint beam scheduling and power allocation algorithm is proposed. Based on the obtained user group beam scheduling and power allocation strategy, a double deep $Q$ -network algorithm and a proximal policy optimization-based joint subchannel and power allocation strategies are proposed. Simulation results validate the effectiveness of the proposed algorithms.
Wireless Frequency Synchronization Method for Transmit Beamforming of Distributed Coherent Aperture Radar

WANG Nan, WANG Hua-lin, GAO Jing, YANG Chang-shan

2025, 53(4): 1241-1250.
https://doi.org/10.12263/DZXB.20241025

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

In Distributed Coherent Aperture Radar (DCAR), inter-node frequency synchronization is required to adjust the time and phase of the emitted signals from various radar nodes, thereby completing the distributed transmission beamforming. However, existing wired synchronization methods cannot achieve frequency synchronization in scenarios such as mobile platforms or complex terrains. Therefore, this paper proposes a wireless frequency synchronization method for DCAR, aimed at transmission beamforming. Firstly, the paper derives the quantitative relationship between frequency synchronization deviation and transmission beam gain. Secondly, the DCAR nodes are divided into a master node and multiple slave nodes, and a two-dimensional coherent frequency measurement algorithm is proposed to estimate the frequency deviation of the slave nodes relative to the master node, and compensate for the frequency deviation, thus achieving frequency synchronization. Thirdly, the performance limits of frequency synchronization and beamforming using the proposed method are derived. Finally, the simulation confirmed that the performance of the proposed method is consistent with the theoretical performance limits, and by reasonably selecting the synchronization signal parameters, the frequency synchronization results can meet the requirements for the transmission beamforming of distributed coherent aperture radar.
Radar Active Electromagnetic Interference Visual Detection and Parameter Estimation Method

ZHU Xuan, RUAN Jiang-jun, WU Hao, SHI Zhao-peng, HE Fang-min, MENG Jin

2025, 53(4): 1251-1263.
https://doi.org/10.12263/DZXB.20240875

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Towards the factors such as high dynamic variation, time-frequency aliasing and unknown interference, this paper proposes a radar active electromagnetic interference visual detection and parameter estimation method, aiming to improve the electromagnetic compatibility and anti-jamming ability of the radar system. Firstly, the time-frequency image dataset is constructed based on the modelling and simulation of electromagnetic interference signals, and the adaptive contrast and edge enhancement network (ACEENet) is used for preprocessing to strengthen the edge details and suppress noise. Then, the proposed parameter reduction enhancement network (PRENet), slim-neck with triplet attention mechanism (Slim-Neck-TAM) and combined loss function are used to improve the YOLOv8n object detection algorithm, and a high-precision electromagnetic interference visual detection network (EIVDNet) is constructed to obtain the pattern and location of interference signals. Finally, based on the location information and parameter estimation principle, the rough estimation of the key parameters of the interference signal is realized, and the accurate estimation value is obtained after correction by the XGBoost regression algorithm. The results show that the detection precision and speed of the electromagnetic interference signal can reach 99.30% and 82.75 frames/s, and the overall error rate of parameter estimation is 1.01%, which has favourable perception performance under low signal-to-noise ratio / jamming-to-noise ratio and unknown interference and can be conducive to improve the level of radar cognitive intelligence.
Counterfactual User Behavior Generation for Session-Based Recommendation

LU Xiang-kui, WU Jun

2025, 53(4): 1264-1278.
https://doi.org/10.12263/DZXB.20240783

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

To protect user privacy, many platforms offer anonymous login options, limiting recommendation systems to accessing only user behavior records within the current session, thereby leading to the development of session-based recommendation (SBR). Existing SBR approaches mainly follow the traditional paradigms of non-anonymous user behavior modeling, focusing on learning session representations through sequential modeling. However, when sessions are short, the performance of these techniques drops significantly, making it challenging to address real-world SBR scenarios dominated by short sessions. To this end, we propose a method called counterfactual inference by frequent pattern guided long sequence generation (CLSG), which aims to answer the counterfactual question: “what would be the model’s prediction if the session contained richer interactions?” CLSG follows the classical three-stage counterfactual inference process of “induction-action-prediction”. The induction stage constructs a frequent pattern knowledge base from the observed session set. The action stage generates counterfactual long sessions with the guide of the knowledge base. The prediction stage measures the discrepancy between the predictions of the observed and counterfactual sessions, and incorporates such discrepancy as a regularization term into the objective function to achieve representation consistency. Notably, CLSG is model-agnostic and can be easily applied to enhancing current SBR models. Experimental results on three benchmark datasets demonstrate that CLSG significantly improves the recommendation performance of five existing SBR models, with an average improvement of 6% in terms of both hit rate (HR) and mean reciprocal rank (MRR) metrics.
Binary Code Similarity Detection Method Based on Cross-Modal Coordinated Representation Learning

YANG Hong-yu, WANG Yun-long, HU Ze, CHENG Xiang

2025, 53(4): 1279-1292.
https://doi.org/10.12263/DZXB.20240769

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Existing binary code similarity detection (BCSD) methods often overlook the actual execution information and local semantic details of programs, leading to suboptimal performance in assembly code semantic representation learning, high training resource consumption, and poor similarity detection performance. To address these issues, this paper proposes a cross-modal coordinated representation learning method (CMRL) for binary code similarity detection. First, we extract the semantic correspondence between assembly instruction sequences and programming language fragments to construct a contrastive learning dataset. We then propose an assembly code-programming language coordinated representation learning method (APECL), which uses the high-level semantics of source code as supervisory information. Through contrastive learning tasks, we align the feature representations of the APECL-Asm encoder and the programming language encoder in the semantic space, thereby enhancing the semantic representation learning capability of APECL-Asm for assembly instructions. Next, we design a graph neural network-based method for generating binary function embedding vectors. This method uses a semantic structure-aware network to fuse the semantic information extracted by APECL-Asm with the actual execution information of the program, generating function embedding vectors for similarity detection. Experimental results show that compared to existing methods, CMRL improves the Recall@1 metric for binary code similarity detection by 8%~33%. Additionally, in the context of code obfuscation, CMRL exhibits stronger resilience, with less degradation in the Recall@1 metric.
Dual-Stream Attention Image Inpainting Method Based on Interacting and Fusing Internal-External Features

HUANG Guang-yuan, HUANG Rong, ZHOU Shu-bo, JIANG Xue-qin

2025, 53(4): 1293-1307.
https://doi.org/10.12263/DZXB.20240780

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

The attention mechanism and its variants have been widely applied in the field of image inpainting. They divide corrupted images into complete and missing regions, and capture long-range contextual information only within the complete regions to fill in the missing regions. As the area of missing regions increases, the features of complete regions decrease, which limits the performance of the attention mechanisms and leads to suboptimal inpainting results. In order to extend the context range of the attention mechanism, we employ a vector-quantized codebook to learn visual atoms. These visual atoms, which describe the structural and textural of image patches, constitute external features for image inpainting and thus compensate for the internal features of the image. On this basis, we propose a dual-stream attention image inpainting method based on interacting and fusing internal-external features. Based on internal and external information sources, we design an internal mask attention module and an internal-external cross attention module. These two attention modules form a dual-stream attention to facilitate interaction within internal features and between internal and external features, thereby generating internal and external source inpainting features. The internal mask attention shields the interference of missing region features with a mask. It captures contextual information exclusively within the complete regions, thereby generating internal-source inpainting features. The internal-external cross attention interacts with internal and external features by calculating the similarity relationship between internal features and external features composed of visual atoms, thereby generating external-source inpainting features. In addition, we design a controllable feature fusion module that generates spatial weight maps based on the correlation between internal and external source inpainting features. These spatial weight maps fuse internal and external features by element-wise weighting of internal and external source inpainting features. Extensive experimental results on Places2, FFHQ and Paris StreetView datasets demonstrate that the proposed method achieves average improvements of 3.45%, 1.34%, 13.91%, 13.64%, and 16.92% for PSNR, SSIM, L1, LPIPS, and FID metrics respectively, compared with the state-of-the-art methods. Visualization experimental results demonstrate that both internal features and external features composed of visual atoms are beneficial for repairing corrupted images.
The Pedestrian Trajectory Prediction Model Based on Hierarchical Envelope Domain Adaptation

LI Yong-ming, LI Wen-zheng, ZHANG Xiao-heng, WANG Pin, HU Jie

2025, 53(4): 1308-1321.
https://doi.org/10.12263/DZXB.20240562

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

In complex environments, short-term pedestrian trajectory prediction finds extensive applications in autonomous driving, social robotics, intelligent security, and smart city infrastructures. Interactions among pedestrians and between pedestrians and their environment exhibit multi-scale complexities and uncertainties, posing substantial challenges. Although current deep learning models are effective in uncovering complex pedestrian interactions, they typically assume uniform motion patterns across various scenes, thereby neglecting potential distributional discrepancies. While domain adaptation models partially address this issue, they often overlook the multi-level characteristics of pedestrian interactions and environmental influences. To address these challenges, this study proposes a pedestrian trajectory prediction model founded on hierarchical envelope domain adaptation. We design a local-level envelope sample construction module by establishing local-level pedestrian adjacency relationships. An individual-level envelope sample construction module is devised based on individual pedestrian relationships. These two modules are subsequently integrated to form a bi-level envelope sample construction module. Leveraging the bi-level envelope sample construction module, we compute the spatio-temporal feature distribution of all pedestrian trajectories to construct global-level envelope samples. Employing the attention mechanism and cross-domain distribution alignment, we respectively design the local-level envelope domain adaptation and global-level envelope domain adaptation modules. These modules are then integrated into a unified framework using a weighted prediction loss function, which is jointly optimized. The experimental section utilizes two representative public datasets and compares them with five representative algorithm models. Comprehensive validation is conducted through ablation studies, parameter analysis, method comparison, and trajectory visualization. The experimental results in the ETH and UCY datasets show that compared with T-GNN, the average displacement error is reduced by 22.7% and the final displacement error is reduced by 19.8%. For the full version of the article, please refer to the link: https://github.com/LWZ9910/MESC-HEDA.git.
A Context-Aware Approach for Smart Contract Element Extraction

QIAN Xiao, JIANG Zhong-yuan, TAO Mei-yue, LIU Bing-cheng, LI Ren-xiang, GAO Sheng, MA Jian-feng

2025, 53(4): 1322-1336.
https://doi.org/10.12263/DZXB.20241038

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Extracting key data elements from text is the primary foundation for the intelligent contract conversion demand of massive text documents in various industries. Compared with traditional named entity recognition (NER), contract element extraction (CEE) aims to extract ubiquitous, lengthy, diverse, and redundant contract elements. However, it faces challenges such as limited research in Chinese, lack of application of novel large language model (LLM) techniques, and insufficient perception of contextual features in text. This article first proposes a novel context-sensitive dynamic padding method (CDPM), a triple attention layer, and an edge-weighted loss function. They provide additional context semantics without increasing hardware requirements, enhance the perception of context related features, and improve the efficiency of element extraction training under the sequential annotation paradigm; Secondly, a context-aware deep learning framework context-aware model for contract element extraction (CAM-CEE) was proposed by integrating the above modules with the bidirectional encoder representations from transformers (BERT) embedding model, achieving high-performance element extraction for smart contract scenarios; Finally, extensive experiments are conducted on the independently constructed and publicly available datasets in this article. The results indicate that the proposed framework CAM-CEE outperforms the best baseline model in metrics such as micro F ₁ and macro F ₁, and has high generality.
Low-Rank Adaptation Based Flexibility-Aware Distillation Method

LI Jia-ming, BAO Zhi-qiang, HUANG Zhen-hua, SUN Sheng-li, CHEN Yun-wen

2025, 53(4): 1337-1346.
https://doi.org/10.12263/DZXB.20240894

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Knowledge distillation is a learning paradigm that transfers knowledge from a complex and deep teacher model to a lightweight student model to enhance performance. To address the issues of insufficient diversity in the teacher model’s knowledge distribution and the significant resource consumption caused by the search space for constructing the student model’s architecture, we propose a low-rank adaptation based flexibility-aware distillation (LAFA) method. The LAFA method constructs low-rank transformation matrices to map teacher knowledge to both student model knowledge and class labels, thereby enhancing the diversity of distributed knowledge. Meanwhile, LAFA introduces a decision support module that dynamically adjusts the student model’s capacity, achieving a balance between distillation performance and model capacity. Furthermore, we propose the warm-up and relaxation strategies to optimize decision variables. The warm-up strategy constrains the gradual increase in model capacity to alleviate convergence difficulties caused by capacity scaling, while the relaxation strategy removes the constraints in the later stages of distillation, achieving significant performance improvements with minimal resource consumption. On the CIFAR-100 dataset, LAFA integrated into 13 distillation methods achieved an average performance improvement of 0.28 percentage points. Moreover, through ablation experiments and analytical experiments, the effectiveness of the LAFA method is further validated.
Research on Dynamic Continuous Emotional Recognition of EEG Based on Improved TCNN Algorithm

JIE Li-lin, LIU Yong, WANG Ming-xun, ZOU Yang-meng, XU Yi-lu, LU Yu-ming

2025, 53(4): 1347-1360.
https://doi.org/10.12263/DZXB.20240997

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

In real life, human emotions possess dynamic and diverse characteristics, influenced by external environments, social interactions, and an individual’s internal state. Given that EEG emotion recognition research is often confined to static laboratory scenarios and fails to adequately consider the dynamic continuity of emotions, this paper proposes a novel method for dynamic continuous emotional recognition of EEG based on an improved TCNN algorithm. Firstly, an EEG acquisition paradigm suitable for dynamic scenarios was designed. A 64-channel EEG device was used to collect EEG signals from 24 subjects experiencing six types of dynamic emotional transitions: happy to calm, calm to happy, calm to sad, sad to calm, calm to tense, and tense to calm. Dynamic continuous emotional labels are also annotated for these signals. Secondly, the existing TCNN algorithm is improved to construct a dual-stream network model for dynamic continuous emotion recognition. This model captures local temporal features through a short-term stream utilizing a time-series convolutional module, while the long-term stream captures global temporal features via a Transformer module. Lastly, feature-level fusion of the extracted EEG features is performed to achieve more accurate dynamic continuous emotion recognition results. The results show that, on the collected dataset, the proposed method achieves the smallest mean errors of 0.083 and 0.084 for valence and arousal across six emotions, respectively. On the DEAP dataset, the errors for valence and arousal are reduced to 0.108 and 0.113, respectively. Moreover, compared to four traditional machine learning methods and six deep learning approaches including GRU, CGRU, CNN, CNN-LSTM, CNN-Bi-LSTM, and TCNN, the proposed method demonstrats higher recognition accuracy and stability, effectively meeting the requirements of application scenarios.
Fundamental Research of High Computation-Capability Devices for Optical Tensor Convolution Operation

ZHANG Wen-jia

2025, 53(4): 1361-1364.
https://doi.org/10.12263/DZXB.20240294

Abstract ( ) Download PDF ( ) HTML ( ) Knowledge map Save

Convolutional Neural Network is one of the most successful algorithms in the fields of computer vision and object detection. With the explosive bandwidth growth of high-definition images and videos, intelligent computing processors require higher computational capability with less power consumption. Photonic technology has inherent capability of coherent combination and multidimensional manipulation, and will become an inevitable approach to realize tensor convolution operations. This paper introduces the research motivation, primary research challenges, solution approaches, and future prospects of high computation-capability optical tensor convolutional devices. It also explores the main limiting factors restraining the application of optical tensor convolution operations, aiming to drive this promising technology from basic research to large-scale applications.

Peer Review System

News

More >>

25 April 2025
Volume 53 Issue 4

Most Download Most Read

Visited

Please choose a citation manager

Content to export

模态框（Modal）标题

Please choose a citation manager

Content to export