Collections

Papers released on the website under the Paper Data Sharing Support Program
  • PAPERS
    GUO Xiang-xing, ZHOU Wei, YANG Zheng-yi, WEN Jun-hao, YANG Jia-jia, LIU Man
    ACTA ELECTRONICA SINICA. 2025, 53(1): 151-162. https://doi.org/10.12263/DZXB.20230387

    Social recommender systems based on graph neural networks (GNNs) have achieved promising performance, but two challenges remain. First, the neighborhood aggregation operation of GNN-based models amplifies noise in users' implicit behaviors, resulting in suboptimal user and item representations. Second, the heterogeneity of edges in the user-item graph and the user social relationship graph means that user representations are learned in two different semantic spaces, so fusing them directly also yields suboptimal representations. To address these issues, this paper proposes a social recommendation model based on self-supervised graph convolution and an attention mechanism to reduce noise in implicit feedback. The model captures users' true interests from the original user-item graph and generates a denoised user-item interaction graph; a novel fusion method is introduced to integrate the heterogeneous user vector representations. Experimental results on two public datasets demonstrate that the proposed model significantly improves recommendation performance over the baseline models. Specifically, the improvement ranges from 1.18% to 3.87% on the LastFM dataset and from 3.56% to 7.31% on the Ciao dataset. The effectiveness of each module is verified through ablation experiments.

  • PAPERS
    CUI Jian-feng, LIANG Hong
    ACTA ELECTRONICA SINICA. 2025, 53(1): 63-71. https://doi.org/10.12263/DZXB.20240292

    We investigate the optimal intersection problem of a direction-finding cross-location system composed of 1D and 2D passive sensors. Using closed-form solutions for localization accuracy, extremum analysis, and geometric intersection analysis, we identify the global optimal intersection point and explore the spatial distribution of optimal intersection positions, together with their influencing factors and underlying principles. The study reveals that the global optimal intersection point lies in the horizontal plane of the baseline (or of the 2D sensor). The optimal intersection locations are jointly determined by the geometric intersection characteristics and the distance diffusion effect of measurement errors. They are distributed around an arc on the horizontal plane whose center is the midpoint of the baseline and whose diameter is the baseline length, and they collapse towards the baseline. Variations in sensor positions do not affect the position of the optimal intersection location relative to the baseline; once the variance ratio of the baseline and angular measurement errors is fixed, the optimal intersection location is determined. Furthermore, case analysis suggests that the optimal intersection area converges towards the sensor with the larger angular measurement error. In practical engineering applications, the optimal intersection area is more useful than the optimal intersection point; matching the optimal intersection locations with target detection results or estimated positions can effectively enhance the system's positioning performance.

  • PAPERS
    ZHANG Yu-xiang, LI Wei, ZHANG Meng-meng, TAO Ran
    ACTA ELECTRONICA SINICA. 2025, 53(1): 248-258. https://doi.org/10.12263/DZXB.20230937

    In cross-scene classification tasks, most domain adaptation (DA) methods focus on transfer tasks in which the source domain data and the target domain data are obtained with the same sensor and share the same land cover classes. However, adaptation performance drops significantly when new classes are present in the target data. Moreover, many hyperspectral image (HSI) classification methods rely on a global representation mechanism, in which representation learning is performed on samples with fixed-size windows, limiting their ability to represent ground object classes effectively. A framework called local representation few-shot learning (LrFSL) is proposed, which overcomes the limitations of global representation by constructing a local representation mechanism within few-shot learning. In this framework, meta-tasks are created from all labeled source domain data and a few labeled target domain data, and episodic training is performed on both simultaneously using a meta-learning strategy. Additionally, an intra-domain local representation block (ILR-block) is designed to extract semantic information from multiple local representations within each sample. Furthermore, an inter-domain local alignment block (ILA-block) is designed to align the cross-domain class-wise distributions, thereby mitigating the impact of domain shift on few-shot learning. Experimental results on three publicly available HSI datasets demonstrate that the proposed method outperforms state-of-the-art methods by a significant margin.

  • PAPERS
    LEI Tao, ZHANG Jun-ming, DU Xiao-gang, MIN Chong-dan, YANG Zi-yao
    ACTA ELECTRONICA SINICA. 2024, 52(12): 4142-4152. https://doi.org/10.12263/DZXB.20231011

    Large variation in target scale and blurred boundaries are long-standing problems that make medical image segmentation difficult. To address them, we propose a novel dual-branch hybrid network framework that combines feature encoding with a gated decoder based on multi-scale features for accurate multi-organ segmentation. To fully exploit the strengths of convolutional neural networks (CNNs) in local information extraction and of transformers in modeling long-range dependencies, we employ U-Net and Swin-Unet to construct the dual-branch network. The innovation of this method lies in the shuffling of high-dimensional features extracted at multiple stages from the different branches of the network. It efficiently integrates local and global information by means of a dual-branch channel cross-fusion, enhancing information interaction between the two branches at different stages. This addresses the limitation in segmentation accuracy caused by blurred object contours. Additionally, to address the large scale variation among multiple organs, we introduce a new gated decoder based on multi-scale features (GDMF), which extracts multi-scale high-dimensional features at different stages of the network, performs adaptive feature enhancement, and adopts attention mechanisms and feature mappings to help acquire accurate target information. Experimental results on the automated cardiac diagnosis challenge (ACDC) and the fast and low GPU memory abdominal organ segmentation challenge 2021 (FLARE21) datasets demonstrate that the proposed method outperforms existing mainstream medical image segmentation methods and effectively handles both large scale variation in target sizes and blurred boundaries in medical images.

  • PAPERS
    GU Tao-chen, WAN Fa-yu, RAVELO Blaise
    ACTA ELECTRONICA SINICA. 2024, 52(12): 3967-3975. https://doi.org/10.12263/DZXB.20240185

    This paper establishes a fundamental theory of a novel ultra-wideband (UWB) bandpass (BP) negative group delay (NGD) topology. The microwave circuit under study consists of lossy transmission lines and stepped impedance resonators (SIRs). The flat NGD topology is constructed using fully distributed elements. The ABCD- and S-parameter models are formulated to derive the optimal NGD values and bandwidth. To verify the theoretical feasibility, NGD prototypes are designed, fabricated, and measured. The flat BP-NGD microstrip circuit has a compact size of 11 mm × 81 mm (0.13 λ_g × 1.01 λ_g) with an NGD center frequency of f_n = 2.14 GHz. Excellent agreement is observed between experimental and theoretical results, revealing an NGD bandwidth of Δf_NGD = 1.28 GHz (BW_NGD = 61% f_n) and an NGD value of t_n = -0.52 ns. Furthermore, within the NGD frequency band, the flat BP-NGD prototype achieves a flat bandwidth of about Δf_NGD = 1.01 GHz (BW_flat-NGD = 48% f_n) with a group delay fluctuation of t_n ± 0.05 ns. Compared with similar broadband flat NGD circuits, the flat NGD bandwidth of the proposed SIR NGD circuit is increased by about 215%. The return loss of the flat BP-NGD prototype at the center frequency is better than 18.8 dB.

  • PAPERS
    CHEN Shuang, TIAN Ye, FU Ying
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3600-3612. https://doi.org/10.12263/DZXB.20230343

    The quantum image sensor (QIS) has ultra-high single-photon sensitivity and spatial resolution, making it a promising alternative to the CMOS image sensor (CIS) as the next-generation image sensor. However, QIS image reconstruction differs from traditional image reconstruction: it aims to recover the original scene from binary measurements. Existing methods are either model-based or deep learning-based. Model-based methods are largely built on optimization and are highly sensitive to the choice of hyperparameters. Deep learning-based methods require designing and training separate models for QIS image reconstruction tasks that differ only slightly in detail, which is inflexible and greatly limits their usefulness. To tackle these problems, this paper proposes a tuning-free plug-and-play alternating direction method of multipliers (TFPnP-ADMM) QIS image reconstruction method, which adaptively selects appropriate parameters for different input images with various oversampling factors, so as to achieve better reconstruction performance. Specifically, the selection of the parameters that must be tuned manually in QIS image reconstruction under the plug-and-play (PnP) framework is modeled as a sequential decision problem, and a mixed model-free and model-based reinforcement learning algorithm is introduced to learn an optimal policy that determines the hyperparameters at each iteration for different input images. Experimental results on synthetic and real datasets demonstrate that, compared with existing state-of-the-art methods, the proposed method improves the peak signal-to-noise ratio by approximately 0.44~0.60 dB under oversampling rates of 4, 6, and 8. Furthermore, the visual results show that the proposed method retains more texture details. 
Real extremely low light QIS image data is available at https://github.com/ying-fu/Real-SPAD-Dataset.
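
The binary measurement model mentioned in the abstract above can be illustrated with a toy simulation. This is a minimal sketch, assuming the commonly used one-bit model in which photon counts are Poisson distributed and a pixel reports 1 when at least one photon arrives. The function names and the closed-form per-pixel estimator are illustrative assumptions; this is not the TFPnP-ADMM method of the paper.

```python
import math
import random


def simulate_one_bit_frames(exposure, num_frames, rng):
    """Simulate binary QIS measurements for one pixel.

    Assumption: photon counts are Poisson with mean `exposure`, and the
    sensor outputs 1 when at least one photon is detected, so each bit is
    Bernoulli with probability 1 - exp(-exposure).
    """
    p_one = 1.0 - math.exp(-exposure)
    return [1 if rng.random() < p_one else 0 for _ in range(num_frames)]


def estimate_exposure(bits):
    """Maximum-likelihood exposure estimate from one-bit measurements."""
    frac_ones = sum(bits) / len(bits)
    # Clamp so a fully saturated pixel does not produce log(0).
    frac_ones = min(frac_ones, 1.0 - 1.0 / (2 * len(bits)))
    return -math.log(1.0 - frac_ones)


rng = random.Random(0)
bits = simulate_one_bit_frames(exposure=0.8, num_frames=20000, rng=rng)
estimate = estimate_exposure(bits)
```

With many frames per pixel the estimate approaches the true exposure; with few frames it becomes noisy, which is why practical reconstruction methods add image priors on top of this per-pixel model.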

  • PAPERS
    TIAN Sheng-jing, HAN Yi-nan, ZHAO Xian-tong, LIU Xiu-ping, ZHANG Ming
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3527-3540. https://doi.org/10.12263/DZXB.20231009

    The potential of sparse convolution for single target tracking in LiDAR (light detection and ranging) point clouds has not been fully explored. The vast majority of point cloud tracking algorithms use point-based backbone networks, which incur high computation costs and model the target-aware relationship insufficiently. To address this problem, this paper proposes a 3D target tracking algorithm based on a sparse convolutional framework and combines it with a point-voxel dual-channel relationship modeling module to embed target discrimination information in such a sparse framework. Firstly, a 3D convolutional residual network extracts the features of the template and the search area separately, and deconvolution is then used to obtain the pointwise features needed for spatial positions in tracking tasks. Secondly, the relationship modeling module calculates a semantic similarity query table from these template and search area features. To capture fine-grained correlation, in the spatial point channel the module uses the nearest neighbor algorithm to find the template points for each search area point and extracts the corresponding features from the query table; in the voxel channel, local multi-scale voxels are constructed around each search area point, and the accumulated similarity of the template points falling into each voxel is used as a cue to extract features. Finally, the fused dual-channel features are fed into a candidate bounding box generation module based on the bird's-eye view to estimate the target bounding box. To verify the proposed method, we evaluated it on the KITTI and NuScenes datasets; compared with the baseline algorithm that adopts sparse convolution, the mean success and precision rates improved considerably, by 11.0% and 12.0%, respectively. 
The proposed method not only inherits the efficient characteristics of sparse convolution but also improves tracking accuracy.

  • PAPERS
    YU Yi-feng, QIAN Jiang-bo, YAN Di-qun, WANG Chong, DONG Li
    ACTA ELECTRONICA SINICA. 2024, 52(7): 2491-2502. https://doi.org/10.12263/DZXB.20230622

    Coloring long sequences of animated sketch frames is a challenging task in computer vision. On the one hand, the information contained in sketches is sparse, and coloring algorithms need to infer the missing information. On the other hand, colors must stay consistent across consecutive frames to ensure visual quality throughout the video. Most existing coloring algorithms are designed for single images and provide only one open-ended, plausible color result, which is not suitable for coloring frame sequences. Other, reference-based coloring algorithms do not establish an organic connection between the two frames, resulting in unsatisfactory coloring results. Within the same shot sequence, the features of the same object usually do not change much, so a model can be designed to color sketches automatically from a given reference frame. This paper proposes a new model called Cross-CNN that combines a convolutional neural network (CNN) and a Transformer. Cross-CNN finds and matches colors from the reference frame, thus ensuring temporal feature consistency. In this model, the reference frame and the sketch frame are stacked along the channel dimension, and a pre-trained ResNet50 network is used to extract locally fused features. The fused feature map is then passed to the Transformer structure for encoding to extract global features. In the Transformer structure, a cross-attention mechanism is designed to better match long-distance features. Finally, a convolutional decoder with skip connections outputs the colored image. For the dataset, frames were extracted from eight movies and strictly screened to create a dataset containing 20 000 pairs of reference and sketch frames. The SSIM (Structural SIMilarity) of Cross-CNN reaches 0.932, which is 0.014 higher than that of the SOTA algorithm. The code for this paper is available at https://github.com/silenye/Cross-CNN.

  • PAPERS
    LI Fei, GUO Shao-zhong, HAO Jiang-wei, HOU Ming, SONG Guang-hui, XU Jin-chen
    ACTA ELECTRONICA SINICA. 2024, 52(5): 1633-1647. https://doi.org/10.12263/DZXB.20220375

    The RISC-V instruction set architecture (ISA), a new streamlined ISA, has developed rapidly because it is free and open source. Since research on RISC-V at home and abroad mainly focuses on hardware development, its software ecosystem is still weak compared with those of mature ISAs. Implementing a set of high-performance basic math libraries for the RISC-V instruction set can further enrich the RISC-V software ecosystem. This paper ports the Sunway math library to RISC-V using automatic porting technology and provides the first basic math library system optimized with vector instructions for the RISC-V architecture. It proposes an automatic branch look-up table method and a path marker insertion method for vector registers, which solve the problem of register reuse when mapping registers between different architectures and achieve correct and efficient register mapping. A total of 69 mathematical functions are ported automatically according to different instruction equivalence conversion strategies. The test results show that the RISC-V basic math library functions compute correctly, with a maximum error of 1.90 ULP and an average function performance of 157.03 clock cycles.

  • PAPERS
    QIAO Tong, CHEN Yu-xing, XIE Shi-chuang, YAO Heng, LUO Xiang-yang
    ACTA ELECTRONICA SINICA. 2024, 52(3): 924-936. https://doi.org/10.12263/DZXB.20220711

    Currently, it is very difficult to identify images synthesized by generative adversarial networks (GANs), which poses a serious threat to national cyber security and social stability. Meanwhile, most classifiers based on deep neural networks require large-scale samples for training, and problems such as low model interpretability and poor generalization performance are rarely addressed. To overcome these limitations, we design an ensemble classifier that uses fused features from multiple color channels. First, by studying how adjacent pixels in the multiple color channels differ between natural and GAN-synthesized images, a difference metric based on the correlation of adjacent pixels is designed to select the optimal color channels. Second, exploiting the high correlation among pixels, the difference array between adjacent pixels is modeled by a second-order Markov chain along eight directions, and the subtractive pixel adjacency matrix features are extracted. Finally, based on the extracted features, a simple but efficient detector for identifying GAN-synthesized images is constructed. On the image dataset synthesized by the StyleGAN model, the accuracy of the proposed detector reaches 100.00%. It also identifies GAN-synthesized images very well when only 2 pairs of positive and negative training samples are available (99.65% accuracy) or when only 50 positive training samples are provided (92.84% accuracy). The accuracy also exceeds 99.96% on the image datasets synthesized by the StyleGAN2 and PGGAN models. Numerous experiments show that the proposed method outperforms the compared forensic methods. Our code is available at https://github.com/cyxcyx559/ccss.
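
The adjacent-pixel difference modelling described in the abstract above can be sketched for a single direction. This is a simplified illustration under stated assumptions: it uses a first-order Markov chain instead of the second-order chain named in the abstract, only the left-to-right direction instead of eight, and a clipping threshold of 2. The function name is hypothetical and the code is not taken from the authors' repository.

```python
from collections import Counter


def transition_probabilities(image, threshold=2):
    """First-order transition probabilities of horizontal pixel differences.

    `image` is a list of rows of integer intensities. Differences between
    horizontally adjacent pixels are clipped to [-threshold, threshold];
    the result maps (d1, d2) to the estimate of P(next = d2 | current = d1).
    """
    pair_counts = Counter()
    first_counts = Counter()
    for row in image:
        diffs = [
            max(-threshold, min(threshold, row[i] - row[i + 1]))
            for i in range(len(row) - 1)
        ]
        for d1, d2 in zip(diffs, diffs[1:]):
            pair_counts[(d1, d2)] += 1
            first_counts[d1] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


toy_image = [
    [10, 10, 11, 13, 13],
    [12, 12, 12, 14, 20],
]
features = transition_probabilities(toy_image)
```

Each conditional distribution sums to 1, and the full feature vector would concatenate (or average) such tables over all directions before being passed to a classifier.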

  • PAPERS
    LI Fan, ZHANG Xiao-heng, LI Yong-ming, WANG Pin
    ACTA ELECTRONICA SINICA. 2024, 52(3): 751-761. https://doi.org/10.12263/DZXB.20220712

    Ensemble methods have become an important branch of imbalanced learning. However, the existing imbalanced ensemble methods all rely on the original instances without considering the structure information of the instances, so their effectiveness is still limited. The research shows that the structure information of instances includes local and global structure information. In order to solve the above problem, this paper proposes an imbalanced ensemble algorithm based on deep instance envelope network (DIEN) and hierarchical structure consistency mechanism (HSCM). Considering the local manifold and global structure information, the algorithm generates high-quality deep envelope instances to achieve class balance. Firstly, based on the instance neighborhood concatenation and fuzzy c-means clustering algorithm, the DIEN is designed to mine the structure information of instances, obtaining the deep envelope instances. Then, the local manifold structure measure and global structure distribution measure are designed to construct the HSCM to enhance the distribution consistency of interlayer instances. Next, DIEN and HSCM are combined to construct the optimized deep instance envelope network—DH (DIEN with HSCM). Then, the base classifier is applied to the deep envelope instances. Finally, the bagging ensemble learning mechanism is designed to fuse the prediction results of the base classifier to obtain the final results. At the end of this paper, several groups of experiments are organized. More than 10 public datasets and representative related algorithms are used for verification. Experimental results show that the proposed algorithm is significantly better in four performance metrics, such as AUC (Area Under Curve) and F-measure.

  • PAPERS
    CHEN Jun-yi, JIANG De-chen, WANG Zhi-ming, CAO Jia-he, WANG Yong
    ACTA ELECTRONICA SINICA. 2023, 51(8): 2179-2187. https://doi.org/10.12263/DZXB.20211410

    This paper proposes a gesture recognition algorithm based on frequency modulated continuous wave (FMCW) radar echo signals. Firstly, a two-dimensional filtering algorithm is proposed to filter the gesture echo signals in the range and velocity dimensions, which effectively reduces the static noise of the system. Secondly, the data is passed through the moving target indicator (MTI) algorithm to filter out noise in the time dimension. Then a time-adaptive fixed-length method is proposed, which keeps the number of frames consistent across gesture samples while reducing the loss of gesture information. Finally, a range-Doppler network (RD-Net) is built for training and classification. The algorithm achieves 98.28% accuracy on Google's open-source Deep-Soli dataset, which is 11.11% higher than the algorithm proposed with the dataset. The algorithm achieves 90.8% accuracy in real-time inference experiments and has better generalization ability.
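
The MTI step mentioned in the abstract above can be illustrated with a two-pulse canceller, one common form of MTI filter that subtracts consecutive frames so that echoes from static objects cancel. This is a minimal sketch; the abstract does not state which MTI variant the paper uses, so the choice of a two-pulse canceller and the function name are assumptions.

```python
def mti_two_pulse_canceller(frames):
    """Suppress static clutter by differencing consecutive radar frames.

    `frames` is a list of equal-length lists (one list of range-bin samples
    per frame). The output has one frame fewer than the input.
    """
    return [
        [curr - prev for prev, curr in zip(prev_frame, curr_frame)]
        for prev_frame, curr_frame in zip(frames, frames[1:])
    ]


# Toy data: a constant (static) echo of 5.0 in range bin 0 and a moving
# target whose amplitude in range bin 2 changes from frame to frame.
frames = [
    [5.0, 0.0, 1.0],
    [5.0, 0.0, 2.0],
    [5.0, 0.0, 4.0],
]
filtered = mti_two_pulse_canceller(frames)
```

In the toy data the static echo in range bin 0 is removed entirely, while the changing echo in range bin 2 survives the filter.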

  • LUO Zhong-tao, GONG Yan-ru, LI Ji-xuan, LU Kun
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.240403
    Online available: 2024-11-15

    The effectiveness of sky-wave over-the-horizon radar (OTHR) is limited by its operating environment. When the ionospheric state is poor or the operating parameters are unsuitable, the radar signal does not illuminate the scheduled area. Hence, whether the land-sea clutter (LSC) is normal or abnormal directly reflects the working status of the OTHR. To address the scarcity and imbalance of OTHR clutter signals, a data enhancement method based on a generative adversarial network is proposed for clutter range-Doppler image enhancement. A lightweight ResNet18 model is used for real-time identification of the radar images. Further, an LSC anomaly detector (LSCAD) is designed to identify the radar LSC situation automatically. The LSCAD extracts the high-amplitude region from the radar range-Doppler map, classifies it with the classification network trained on the augmented dataset, and feeds the result back to the radar operator. Simulation results show that the LSC data enhancement increases the LSC classifier accuracy by 25.26%. The LSCAD makes correct judgements on the LSC status of both real data and images from the literature. Therefore, the LSCAD can be used as an extension module of the OTHR to provide automatic detection and warning of LSC anomalies, which helps improve the degree of automation of the OTHR.

  • GUO Zhe, ZHANG Zhi-bo, ZHOU Wei-jie, FAN Yang-yu, ZHANG Yan-ning
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.230429
    Online available: 2024-11-28

    Current research on Chinese long text summarization based on deep learning has the following problems: (1) summarization models lack information guidance and fail to focus on keywords and key sentences, so critical information is lost over long-distance spans; (2) the vocabularies of existing Chinese long text summarization models are often word-based and do not contain common Chinese words and punctuation, which hinders the extraction of multi-granularity semantic information. To solve these problems, this paper proposes a Chinese long text summarization method with guided attention (CLSGA). Firstly, for the long text summarization task, an extraction model is presented that extracts the core words and sentences of the long text to construct a guided text, which guides the generation model to focus on the more important information during encoding. Secondly, a Chinese long text vocabulary is designed to change the text structure from word statistics to phrase statistics, which helps extract richer multi-granularity features. Hierarchical location decomposition encoding is then introduced to extend the location encoding of long text efficiently and accelerate network convergence. Finally, the local attention mechanism is combined with the guided attention mechanism to capture the important information across the long text span and improve summarization accuracy. Experimental results on four public Chinese summarization datasets of different lengths, LCSTS, CNewSum, NLPCC2017 and SFZY2020, show that the proposed method has significant advantages in long text summarization and effectively improves the ROUGE-1, ROUGE-2 and ROUGE-L scores.

  • ZHANG Qing-long, HAN Rui, LIU Chi
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240518
    Online available: 2024-12-17

    Foundation models deployed in dynamic edge environments encounter continuously evolving input data distributions and must be retrained to maintain high accuracy. However, existing retraining techniques can only train fixed compressed models within the constraints of device resources and retraining windows, which considerably lowers accuracy because of the limited generalization ability of these small models. To address this issue, this paper proposes BlockTrainer, an edge-cloud collaborative approach for retraining foundation models at block granularity. BlockTrainer first introduces a model retraining scaling law to evaluate the accuracy contributions of different blocks in a foundation model according to its latest input data at the edge. Based on this evaluation, it generates the optimal retraining solution under resource constraints and dynamically converts the most accuracy-relevant parts of the model into retrainable small models at the edge, thereby constructing a collaborative training system between large and small models. Comparative experiments on real edge-cloud platforms show that BlockTrainer improves the retraining accuracy of foundation models by 81.24% with the same resource consumption and supports retraining models of up to 33 billion parameters.

  • ZENG Kai, WAN Zi-xin, WANG Ming-tao, SHEN Tao
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240640
    Online available: 2024-12-23

    Restoring the weight distribution, activation distribution, and gradient of a binary network as closely as possible to those of the original full-precision network can greatly improve its inference ability. However, existing methods apply the restoration operation in forward propagation directly to binary data, and the gradient approximation functions used in backpropagation are fixed or determined manually, so the restoration efficiency of binary networks leaves room for improvement. To address this problem, an efficient restoration method for binary neural networks is investigated. Firstly, a distribution recovery method that maximizes information entropy is proposed. By shifting the mean of the original full-precision weights and scaling their modulus, the quantized binary weights directly attain maximum distribution restoration. At the same time, a simple statistical translation and scaling factor greatly improves the restoration efficiency of weights and activations. Furthermore, a gradient function based on adaptive distribution approximation is proposed, which dynamically determines the update range of the current gradient at the P-percentile according to the actual distribution of the current full-precision data. It adaptively changes the shape of the approximation function to update the gradient efficiently during training, thereby improving the convergence of the model. Theoretical analysis confirms that the proposed method achieves maximum restoration of binary data while still improving execution efficiency. Compared with existing advanced binary network models, the proposed method performs well in experiments, reducing the computation time of the distribution restoration operation by 60% and 67% when quantizing ResNet-18 and ResNet-20, respectively. 
An accuracy of 93.0% was achieved for VGG-Small binary quantization on the CIFAR-10 dataset, and 61.9% was achieved for ResNet-18 binary quantization on the ImageNet dataset, both of which are the best results among current binary neural networks. The relevant code is available at https://github.com/sjmp525/IA/tree/ER-BNN.
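
The idea of shifting the weight mean before binarization can be illustrated with a toy example. This is a minimal sketch under stated assumptions: weights are centred on their mean, binarized with the sign function, and scaled by the mean absolute value of the centred weights. The abstract does not give the paper's exact shift and scaling formulas, so these choices and the function names are illustrative only.

```python
import math


def binarize_weights(weights):
    """Binarize weights after centring them on their mean.

    Returns (binary, scale): `binary` holds +1/-1 values and `scale` is the
    mean absolute value of the centred weights (an assumed scaling factor).
    """
    mean = sum(weights) / len(weights)
    centred = [w - mean for w in weights]
    scale = sum(abs(c) for c in centred) / len(centred)
    binary = [1 if c >= 0 else -1 for c in centred]
    return binary, scale


def sign_entropy(binary):
    """Shannon entropy (in bits) of the +1/-1 distribution."""
    p = sum(1 for b in binary if b == 1) / len(binary)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


weights = [0.9, 1.1, 1.3, 0.7, 1.0, 1.2, 0.8, 0.6]
naive = [1 if w >= 0 else -1 for w in weights]
balanced, scale = binarize_weights(weights)
```

Here the naive sign of the all-positive weights carries no information, whereas the centred version splits evenly between +1 and -1, which is the maximum-entropy case for a binary variable.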

  • ZHU Zheng-yu, ZHAO Hang-ran, WANG Zi-xuan, WANG Zhong-yong, KONG Ke-xian, LIANG Jing
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240487
    Online available: 2024-12-24

    Traditional frequency hopping network station sorting techniques are ineffective under low signal-to-noise ratio conditions and have poor real-time detection performance. To address this, this paper proposes a shortwave frequency hopping signal sorting algorithm based on an improved YOLOv8. First, the short-time Fourier transform is applied to the received aliased signal to generate a grayscale time-frequency image as the input of the YOLOv8 network model. Secondly, because frequency collisions between aliased signals such as sweep frequency signals, fixed frequency signals and frequency hopping signals affect detection accuracy, Deformable Convolutional Networks v2 is introduced in the C2f layer to improve the generalization ability of network feature extraction. Thirdly, the SimAM attention mechanism is added to the backbone layer to solve the problem that, at low signal-to-noise ratio, background noise is easily confused with frequency hopping signals and degrades detection accuracy. Finally, the convolution kernel of the Detect module is replaced by a Partial Convolution kernel, which reduces the computational complexity of the network by 32.18% with an mAP@0.5 accuracy loss of no more than 0.37% and improves the inference speed of the network model. Experimental results show that the improved YOLOv8 algorithm achieves a separation rate of 97.68% at a signal-to-noise ratio of -5 dB, and that the model converges quickly and is highly robust.

  • LUO Ke, LI Wei, JIAN Yu-gen, GAO Hong-yu, ZHANG Ke-zheng, LIAO Yan-zhe, WU Yu-fei, CHEN Jin-cai, LU Ping
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20230527
    Online available: 2025-01-06

    As the recording density of magnetic storage increases, the recording bit spacing decreases and the magnetization transition noise increases significantly, which greatly degrades the quality of the readback signal. To mitigate magnetization transition noise among recording patterns in ultra-high-density magnetic storage systems, the maximum transition run (MTR) constrained code MTR(j=1), which forbids consecutive transitions, is proposed; it suppresses magnetization transition noise more effectively than the constrained codes MTR(j=2) and MTR(j=3), which allow consecutive transitions. We investigate the detection performance on the readback signal experimentally. At a signal-to-noise ratio of 12 dB, the detection bit error rate (BER) of MTR(j=1) is about 30% and 60% lower than that of MTR(j=2) and MTR(j=3), respectively. These results confirm that the MTR(j=1) constrained code, which forbids consecutive transitions, achieves higher data detection reliability.
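
The MTR(j) constraint discussed in the abstract above limits runs of consecutive transitions to at most j. The sketch below checks the constraint and counts the sequences that satisfy it, assuming an NRZI representation in which a 1 marks a transition; the function names are hypothetical, and this is a constraint checker rather than the encoder or detector used in the paper.

```python
from itertools import product


def satisfies_mtr(nrzi_bits, j):
    """Return True if no run of consecutive transitions exceeds `j`.

    Assumption: the sequence is in NRZI form, where 1 marks a magnetization
    transition and 0 marks no transition.
    """
    run = 0
    for bit in nrzi_bits:
        run = run + 1 if bit == 1 else 0
        if run > j:
            return False
    return True


def count_valid_sequences(length, j):
    """Count binary sequences of the given length that satisfy MTR(j)."""
    return sum(
        1 for bits in product((0, 1), repeat=length) if satisfies_mtr(bits, j)
    )


counts = {j: count_valid_sequences(8, j) for j in (1, 2, 3)}
```

The tighter the constraint, the fewer sequences are admissible, so MTR(j=1) trades some code rate for the noise suppression reported in the abstract.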

  • YANG Hong-yu, WANG Yun-long, HU Ze, CHENG Xiang
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240769
    Online available: 2025-02-26

    Existing binary code similarity detection (BCSD) methods often overlook the actual execution information and local semantic details of programs, leading to suboptimal assembly code semantic representation learning, high training resource consumption, and poor similarity detection performance. To address these issues, this paper proposes a cross-modal coordinated representation learning method (CMRL) for binary code similarity detection. First, we extract the semantic correspondence between assembly instruction sequences and programming language fragments to construct a contrastive learning dataset. We then propose an assembly code-programming language coordinated representation learning method (APECL), which uses the high-level semantics of source code as supervisory information. Through contrastive learning tasks, we align the feature representations of the APECL-Asm encoder and the programming language encoder in the semantic space, thereby enhancing the ability of APECL-Asm to learn semantic representations of assembly instructions. Next, we design a graph neural network-based method for generating binary function embedding vectors. This method uses a semantic structure-aware network to fuse the semantic information extracted by APECL-Asm with the actual execution information of the program, generating function embedding vectors for similarity detection. Experimental results show that, compared with existing methods, CMRL improves the Recall@1 metric for binary code similarity detection by 8%-33%. In addition, CMRL is more resilient to code obfuscation, showing less degradation in Recall@1.
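
    The Recall@1 metric mentioned above can be sketched as follows, assuming the usual retrieval setup in which query function i has exactly one true match, pool function i, and candidates are ranked by cosine similarity. The function name and the toy embeddings are hypothetical; the paper's evaluation protocol may differ.

```python
import numpy as np

def recall_at_1(query_emb, pool_emb):
    """Fraction of queries whose most similar pool entry is the true match.

    Assumes row i of query_emb matches row i of pool_emb.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    sims = q @ p.T                       # cosine similarity matrix
    top1 = sims.argmax(axis=1)           # best-ranked pool index per query
    return float(np.mean(top1 == np.arange(len(q))))

# Toy embeddings: the third query is closest to the wrong pool entry.
pool = np.eye(3)
queries = np.array([[1.0, 0.1, 0.0],
                    [0.0, 1.0, 0.1],
                    [1.0, 0.0, 0.5]])
score = recall_at_1(queries, pool)
```

    With these toy vectors two of the three queries retrieve their true match first, so `score` is 2/3.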

  • ZHANG Si-ya, CHAI Rong, LIANG Cheng-chao, CHEN Qian-bin
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240116
    Online available: 2025-02-28

    Multibeam satellite communication systems have received widespread attention due to their high throughput and efficient resource utilization. This paper investigates the beam scheduling and resource allocation problem in multibeam satellite communication systems. By jointly considering user positions and service characteristics, an optics-based initial user grouping algorithm is proposed. To enhance beam coverage performance, a minimum circle algorithm is proposed to optimally design satellite beam positions and coverage radii. Given the determined user grouping strategy, a system cost function is defined, and the joint beam scheduling, subchannel allocation, and power allocation problem is formulated as a system cost function minimization problem. To solve the formulated optimization problem, aggregate nodes are introduced to describe the characteristics of user groups, and a parameterized deep Q-network-based joint beam scheduling and power allocation algorithm is proposed. Based on the obtained user group beam scheduling and power allocation strategy, joint subchannel and power allocation strategies based on a double deep Q-network algorithm and on proximal policy optimization are proposed. Simulation results validate the effectiveness of the proposed algorithms.
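
    The abstract does not specify the minimum circle algorithm. One plausible reading is the classic smallest-enclosing-circle problem: given the 2D positions of the users in a group, find the smallest circle covering all of them, whose center and radius then give a beam position and coverage radius. The sketch below is a brute-force version for small groups under that assumption; all names are hypothetical.

```python
import itertools
import math

def circle_from_two(a, b):
    """Circle having segment ab as its diameter: (cx, cy, r)."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)

def circle_from_three(a, b, c):
    """Circumcircle of a, b, c, or None if the points are collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))

def covers(circle, points, eps=1e-9):
    cx, cy, r = circle
    return all(math.dist((cx, cy), p) <= r + eps for p in points)

def min_enclosing_circle(points):
    """Smallest circle covering all points (needs at least one point).

    The optimum is always defined by two or three of the points, so it
    suffices to test every pair and triple. This is O(n^4) and only meant
    for small user groups.
    """
    if len(points) == 1:
        return (points[0][0], points[0][1], 0.0)
    candidates = [circle_from_two(a, b)
                  for a, b in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    feasible = [c for c in candidates if covers(c, points)]
    return min(feasible, key=lambda c: c[2])

users = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -0.5)]
cx, cy, radius = min_enclosing_circle(users)
```

    For the four toy users the circle is centered at (1, 0) with radius 1. For larger groups, an expected linear-time method such as Welzl's algorithm would be the usual replacement for the brute-force search.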

  • GAO Ning, LI Yu-rong, CHEN Hong, CHEN Wen-sheng, JIA Zi-hao
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240998
    Online available: 2025-03-04

    Atrial fibrillation (AF) is a common arrhythmia often associated with cardiovascular diseases such as stroke and heart failure. Although researchers have made substantial progress in AF detection using deep learning methods in recent years, most of these methods require extensive computational resources. Moreover, the black-box nature of deep learning models makes their clinical application challenging. Therefore, this paper proposes a lightweight AF detection model based on feature fusion and conducts an interpretability study. The model comprises an electrocardiogram (ECG) backbone network and an R-R interval (RRI) branch. The ECG backbone network uses depthwise separable convolutions along with a few standard convolutions to extract deep morphological features of the ECG signals, while the RRI branch employs multi-scale convolutions to extract deep rhythm features of the RRI. The network learns robust feature representations by fusing morphological and rhythm features to detect AF accurately. For interpretability analysis, Grad-CAM++ is used to visualize the contribution of different features to the classification results. Training and internal testing are conducted on the LTAFDB, achieving an accuracy of 97.99%. To validate the generalization performance of the model, external tests are conducted on the AFDB and CPSC2021, achieving accuracies of 95.17% and 93.81%, respectively. Experimental results demonstrate that the proposed method is lightweight, stable, and accurate, and the incorporation of interpretable deep-learning techniques suggests that it holds significant potential for the clinical diagnosis of AF.
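
    Why depthwise separable convolutions make a network lightweight can be seen from a parameter count. The sketch below compares a standard 1D convolution with its depthwise separable counterpart, ignoring biases; the kernel size and channel counts are illustrative assumptions, not values from the paper.

```python
def conv1d_params(kernel, c_in, c_out):
    """Weights in a standard 1D convolution (bias ignored)."""
    return kernel * c_in * c_out

def separable_conv1d_params(kernel, c_in, c_out):
    """Weights in a depthwise separable 1D convolution (bias ignored)."""
    depthwise = kernel * c_in      # one kernel per input channel
    pointwise = c_in * c_out       # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: kernel size 7, 32 input channels, 64 output channels.
standard = conv1d_params(7, 32, 64)
separable = separable_conv1d_params(7, 32, 64)
reduction = standard / separable
```

    For this hypothetical layer the standard convolution has 14336 weights and the separable one 2272, roughly a 6.3-fold reduction.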

  • HUANG Guang-yuan, HUANG Rong, ZHOU Shu-bo, JIANG Xue-qin
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240780
    Online available: 2025-03-04

    The attention mechanism and its variants have been widely applied in the field of image inpainting. They divide corrupted images into complete and missing regions, and capture long-range contextual information only within the complete regions to fill in the missing regions. As the area of the missing regions increases, the features available from the complete regions decrease, which limits the performance of the attention mechanisms and leads to suboptimal inpainting results. To extend the context range of the attention mechanism, we employ a vector-quantized codebook to learn visual atoms. These visual atoms, which describe the structure and texture of image patches, constitute external features for image inpainting and thus complement the internal features of the image. On this basis, we propose a dual-stream attention image inpainting method based on interacting and fusing internal and external features. Based on the internal and external information sources, we design an internal mask attention module and an internal-external cross attention module. These two attention modules form a dual-stream attention that facilitates interaction among internal features and between internal and external features, thereby generating internal-source and external-source inpainting features. The internal mask attention uses a mask to shield the interference of missing-region features. It captures contextual information exclusively within the complete regions, thereby generating internal-source inpainting features. The internal-external cross attention relates internal and external features by calculating the similarity between internal features and external features composed of visual atoms, thereby generating external-source inpainting features. In addition, we design a controllable feature fusion module that generates spatial weight maps based on the correlation between internal-source and external-source inpainting features. These spatial weight maps fuse internal and external features by element-wise weighting of the internal-source and external-source inpainting features. Extensive experimental results on the Places2, FFHQ, and Paris StreetView datasets demonstrate that the proposed method achieves average improvements of 3.45%, 1.34%, 13.91%, 13.64%, and 16.92% in the PSNR, SSIM, L1, LPIPS, and FID metrics, respectively, compared with state-of-the-art methods. Visualization results demonstrate that both internal features and external features composed of visual atoms are beneficial for repairing corrupted images.
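
    The fusion step can be sketched as an element-wise weighted combination of two feature maps. The abstract only states that spatial weight maps are derived from the correlation between the two feature sources. The specific choices here (a channel-wise dot product as the correlation, a sigmoid to obtain weights, and a convex combination) are assumptions for illustration, and `fuse` is a hypothetical name.

```python
import numpy as np

def fuse(internal, external):
    """Fuse two (C, H, W) feature maps with a spatial weight map.

    Assumed form: weight = sigmoid(channel-wise correlation),
    fused = weight * internal + (1 - weight) * external.
    """
    corr = (internal * external).sum(axis=0, keepdims=True)   # (1, H, W)
    weight = 1.0 / (1.0 + np.exp(-corr))                      # values in (0, 1)
    fused = weight * internal + (1.0 - weight) * external
    return fused, weight

rng = np.random.default_rng(0)
feat_int = rng.normal(size=(4, 8, 8))   # toy internal-source features
feat_ext = rng.normal(size=(4, 8, 8))   # toy external-source features
fused, weight = fuse(feat_int, feat_ext)
```

    The weight map has one value per spatial location and is broadcast across channels, so each pixel decides how much to trust the internal versus the external source.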

  • GAO Yun-long, SHI Shu-guang, ZHAO Zhi-xiang, CAO Chao, PAN Jin-yan
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240682
    Online available: 2025-03-06

    Due to the curse of dimensionality, effectively discarding redundant features while retaining critical information in high-dimensional data has become a key issue. Unsupervised feature selection, which performs dimensionality reduction without any prior class information, has attracted increasing attention. However, existing unsupervised feature selection methods ignore two common issues: (1) fuzziness is a common characteristic of data, but most existing methods based on regularized regression ignore it, resulting in suboptimal feature subsets; (2) most methods fail to distinguish effectively between normal and noisy samples and are therefore susceptible to noise. To tackle these issues, robust unsupervised feature selection with double fuzzy learning (DFRFS) is proposed. Specifically, DFRFS introduces fuzzy membership into unsupervised feature selection based on regularized regression, allowing data to be shared among multiple clusters and thereby better reflecting the complex structure and uncertainty of the data. Additionally, DFRFS assigns different weights to samples through a robust weight learning framework, suppressing the impact of noise while retaining the effect of normal samples. Experiments on toy and real-world datasets demonstrate the effectiveness of the proposed DFRFS method.
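
    The idea of fuzzy membership, where a sample is shared among several clusters instead of being assigned to exactly one, can be illustrated with the standard fuzzy c-means membership formula. This is a generic sketch, not the DFRFS objective, which the abstract does not give; the function name and the fuzzifier value m=2 are assumptions.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships: one row per sample, one column per cluster.

    u[i, k] is proportional to d(x_i, c_k) ** (-2 / (m - 1)); rows sum to 1.
    """
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
centers = np.array([[0.0, 0.0], [2.0, 0.0]])
u = fuzzy_memberships(X, centers)
```

    The middle sample lies halfway between the two centers and receives a membership of 0.5 in each cluster, whereas a hard assignment would force it into one of them.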

  • LU Xiangkui, WU Jun
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240783
    Online available: 2025-03-17

    To protect user privacy, many platforms offer anonymous login options, which restrict recommender systems to the user behavior records within the current session and have led to the development of session-based recommendation (SBR). Existing SBR approaches mainly follow the traditional paradigms of non-anonymous user behavior modeling, focusing on learning session representations through sequential modeling. However, when sessions are short, the performance of these techniques drops significantly, making it difficult to handle real-world SBR scenarios dominated by short sessions. To this end, we propose a method called counterfactual inference by frequent pattern guided long sequence generation (CLSG), which aims to answer the counterfactual question: “what would be the model’s prediction if the session contained richer interactions?” CLSG follows the classical three-stage counterfactual inference process of “induction-action-prediction”. The induction stage constructs a frequent pattern knowledge base from the observed session set. The action stage generates counterfactual long sessions under the guidance of the knowledge base. The prediction stage measures the discrepancy between the predictions for the observed and counterfactual sessions, and incorporates this discrepancy as a regularization term into the objective function to achieve representation consistency. Notably, CLSG is model-agnostic and can be easily applied to enhance current SBR models. Experimental results on three benchmark datasets demonstrate that CLSG significantly improves the recommendation performance of five existing SBR models, with an average improvement of 6% in terms of both hit rate (HR) and mean reciprocal rank (MRR).
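
    The prediction stage can be sketched as a loss with a consistency regularizer. The abstract does not state which discrepancy measure CLSG uses. The sketch below assumes a KL divergence between the next-item distributions predicted for the observed and the counterfactual session, added to a cross-entropy loss with a hypothetical weight `lam`.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def regularized_loss(logits_obs, logits_cf, target, lam=0.1):
    """Cross-entropy on the observed session plus a consistency term.

    The KL divergence and the weight `lam` are illustrative assumptions.
    """
    p_obs = softmax(logits_obs)
    p_cf = softmax(logits_cf)
    cross_entropy = -np.log(p_obs[target])
    kl = float(np.sum(p_obs * (np.log(p_obs) - np.log(p_cf))))
    return cross_entropy + lam * kl, kl

obs = np.array([2.0, 0.5, 0.1])          # toy scores from the short session
cf = np.array([1.0, 1.5, 0.1])           # toy scores from the generated long session
loss, discrepancy = regularized_loss(obs, cf, target=0)
_, zero_discrepancy = regularized_loss(obs, obs, target=0)
```

    When the two sessions yield the same prediction the regularizer vanishes, so the term only penalizes the model when enriching the session would change its output.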

  • LIU Qi-hang, LEI Qian-qian, XIONG Jian-hui, ZHANG Xu-dong
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240241
    Online available: 2025-03-25

    To solve the problem of multi-band compatibility on a single chip in the RF front-end of a receiver, this paper proposes a new bandwidth-reconfigurable low noise amplifier (LNA) structure for UWB applications. The LNA is based on a switchable reconfigurable design method, with the switchable elements embedded in the load of the cascaded LNA circuit. By reconfiguring the load inductance of the resistive shunt negative feedback structure, the design controls the position of the low-frequency impedance resonance point and the corresponding gain pole, and thereby switches the in-band input impedance matching and gain curves among different UWB operating modes. Compared with design methods that introduce switches in the input/output matching path, placing the switches at the load optimizes gain and noise performance without affecting impedance matching. The resistors and inductors of the traditional inductive peaking technique are made adjustable to maintain gain flatness within the different operating bandwidths. Based on SMIC 28 nm CMOS technology, simulation results with electromagnetic modeling demonstrate that the LNA operates in three modes, 3.1~10.6 GHz, 6~10.6 GHz, and 3.1~5 GHz, with in-band voltage gain (S21) above 16.59 dB and minimum noise figure below 3 dB. Under a 0.8 V supply voltage, all three modes exhibit input and output matching (S11, S22) below -10 dB, with a static power consumption of only 9.03 mW; after the MOS switches are introduced, the noise figure degradation of the LNA in all three bands is less than 0.2 dB.

  • WU Qi, WANG Zi-tong, ZHANG Dong-liang, XIA Si-yu, FAN Wen-qi, CHEN Yi-long
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240933
    Online available: 2025-04-02

    The measurement and control of advanced air vehicles requires multiple functions such as telemetry, remote control, communication, and tracking. Traditionally, these functions are provided by multiple wireless transceiver systems and discrete antennas, whose volume, weight, cost, and installation requirements conflict increasingly with the limited resources of the air vehicle. Antenna aperture synthesis enables a single multi-functional antenna aperture to perform the functions of multiple dedicated antenna apertures. This greatly reduces the number of antenna apertures and significantly eases the pressure on the antenna aperture layout of the air vehicle platform, offering a new way to enhance system-level electromagnetic compatibility. This paper systematically elaborates on the technical route of antenna aperture synthesis for air vehicle measurement and control communication. It focuses on multi-band and multi-polarization antenna technology for the synthesis of multiple discrete antennas, diplexer antenna technology for the synthesis of transmit and receive antennas, shared-aperture antenna technology for the synthesis of multiple antennas in the same aperture, and coupling suppression technology for the integration of same-frequency antenna arrays. In addition, based on the working characteristics of software-defined radio systems, it analyzes the advantages and feasibility of applying software-defined radio to air vehicle measurement and control communication systems. Finally, this paper discusses the outlook for antenna aperture synthesis technology in air vehicle measurement and control communication and suggests possible development directions.

  • WU Hai-yang, YU Ning-mei
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240624
    Online available: 2025-04-15

    It is crucial to monitor ethanol in real time due to the safety risks posed by its high volatility and flammability. However, current methods for improving the performance of SnO2 ethanol sensors often hinder the miniaturization of devices. To address this, the paper designs an intrinsic SnO2 ethanol sensor with a field-effect transistor structure and employs magnetron sputtering to fabricate the sensitive film. The study systematically investigates the influence of gate voltage on the gas-sensing performance of the sensor. Experimental results indicate that the SnO2 sensor prepared by sputtering is an n-channel depletion-mode device. Gas-sensing tests reveal significant differences in the sensor’s response under different operating gate voltages: at a gate voltage of 10 V, the sensor current in 100 ppm ethanol changes by a factor of 2.40, while at a gate voltage of -30 V, the channel current change is significantly enhanced to a factor of 3.42, a 42% improvement over the 10 V case. Further investigation shows that the gas-sensing properties of SnO2 arise from the modulation of the carrier concentration in the channel by the surface adsorption of ethanol molecules. This effect is significantly enhanced under negative gate voltage but suppressed under positive gate voltage. However, a positive gate voltage of 10 V induces more electrons in the channel, effectively accelerating the adsorption and desorption of ethanol. As a result, the sensor's response and recovery times to 100 ppm ethanol are reduced to 8 s and 17 s, respectively, demonstrating faster dynamic characteristics. The findings indicate that both the degree and the rate of the ethanol vapor reaction on the SnO2 surface are significantly regulated by the sensor’s gate voltage. This research provides a new approach for optimizing the gas-sensing performance of SnO2 sensors and contributes to advancing their application in miniaturized, fast-response, and high-precision gas sensing.
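
    The 42% figure follows directly from the two reported current-change factors, as this short check shows. The variable names are illustrative; the two factors are taken from the abstract.

```python
response_at_10v = 2.40       # current change factor at a gate voltage of 10 V
response_at_neg30v = 3.42    # current change factor at a gate voltage of -30 V

# Relative improvement of the -30 V response over the 10 V response, in percent.
improvement_pct = (response_at_neg30v / response_at_10v - 1.0) * 100.0
```

    The ratio 3.42 / 2.40 is 1.425, so the improvement is about 42.5%, which the abstract reports as 42%.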