Collections

Paper Data Sharing Support Program: Website Release
  • PAPERS
    CHEN Shuang, TIAN Ye, FU Ying
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3600-3612. https://doi.org/10.12263/DZXB.20230343

    Quantum image sensors (QIS) offer ultra-high single-photon sensitivity and spatial resolution, making them a promising alternative to CMOS image sensors (CIS) as the next-generation image sensor. However, QIS image reconstruction differs from traditional image reconstruction: it aims to recover the original scene from binary measurements. Existing methods are either model-based or deep learning-based. Model-based methods rely largely on optimization and are highly sensitive to the choice of hyperparameters. Deep learning-based methods require a separate model to be designed and trained for each slightly different QIS reconstruction task, which is inflexible and greatly limits their usefulness. To tackle these problems, this paper proposes a tuning-free plug-and-play alternating direction method of multipliers (TFPnP-ADMM) QIS image reconstruction method, which adaptively selects appropriate parameters for different input images with various oversampling factors, so as to achieve better reconstruction performance. Specifically, the parameters that must be manually tuned in QIS image reconstruction under the plug-and-play (PnP) framework are modeled as a sequential decision problem, and a mixed model-free and model-based reinforcement learning algorithm is introduced to learn an optimal policy that determines the hyperparameters at each iteration for different input images. Experimental results on synthetic and real datasets demonstrate that, compared with existing state-of-the-art methods, the proposed method improves the peak signal-to-noise ratio by approximately 0.44 to 0.60 dB at oversampling rates of 4, 6, and 8. Furthermore, the visual results show that the proposed method retains more texture details. 
Real extremely low light QIS image data is available at https://github.com/ying-fu/Real-SPAD-Dataset.
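
To make "recovering the scene from binary measurements" concrete, here is a minimal sketch of the classical per-pixel maximum-likelihood estimate for one-bit Poisson data. It is not the TFPnP-ADMM method of the paper. The function names, the gain `alpha`, the number of one-bit samples `K` per pixel, and the threshold of one photon are all assumptions made for illustration.

```python
import math
import random


def simulate_jots(c, K, alpha, rng):
    """Simulate K one-bit samples for a pixel of intensity c (assumed in [0, 1]).

    Each sample sees Poisson light with mean alpha * c / K and reports 1
    when at least one photon arrives (threshold of 1 is an assumption).
    """
    p_one = 1.0 - math.exp(-alpha * c / K)  # P(at least one photon)
    return [1 if rng.random() < p_one else 0 for _ in range(K)]


def mle_intensity(bits, alpha):
    """Maximum-likelihood intensity estimate from one-bit samples."""
    K = len(bits)
    ones = min(sum(bits), K - 1)  # clamp to avoid log(0) when every sample fires
    return -(K / alpha) * math.log(1.0 - ones / K)


rng = random.Random(0)
bits = simulate_jots(c=0.5, K=4096, alpha=4096.0, rng=rng)
estimate = mle_intensity(bits, alpha=4096.0)  # should land close to 0.5
```

The estimate inverts the relation between the fraction of ones and the light level, which is why its quality depends on how many binary samples cover each pixel.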

  • PAPERS
    TIAN Sheng-jing, HAN Yi-nan, ZHAO Xian-tong, LIU Xiu-ping, ZHANG Ming
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3527-3540. https://doi.org/10.12263/DZXB.20231009

    The potential of sparse convolution for single target tracking in LiDAR (Light Detection And Ranging) point clouds has not been fully explored. The vast majority of point cloud tracking algorithms use point-based backbone networks, which incur higher computation costs and model target-aware relationships insufficiently. To address this problem, this paper proposes a 3D target tracking algorithm based on a sparse convolutional framework and combines it with a point-voxel dual-channel relationship modeling module to embed target discrimination information in such a sparse framework. Firstly, a 3D convolutional residual network extracts the features of the template and the search area separately, and deconvolution is then used to obtain the pointwise features required for spatial positions in tracking tasks. Secondly, the relationship modeling module calculates a semantic similarity query table from these template and search area features. To capture fine-grained correlation, the module works in two channels: in the spatial point channel, it uses the nearest neighbor algorithm to find the template points for each search area point and extracts the corresponding features from the query table; in the voxel channel, it constructs local multi-scale voxels centered on each search area point and uses the accumulated similarity of the template points falling into each voxel unit as a clue for feature extraction. Finally, the fused dual-channel features are sent to a candidate bounding box generation module based on the bird’s-eye view to estimate the target bounding box. To verify the proposed method, we evaluated it on the KITTI and NuScenes datasets; compared with the baseline algorithm that adopts sparse convolution, the mean success and precision rates improved by 11.0% and 12.0%, respectively. 
The proposed method not only inherits the efficient characteristics of sparse convolution but also improves tracking accuracy.

  • PAPERS
    YU Yi-feng, QIAN Jiang-bo, YAN Di-qun, WANG Chong, DONG Li
    ACTA ELECTRONICA SINICA. 2024, 52(7): 2491-2502. https://doi.org/10.12263/DZXB.20230622

    Coloring long sequences of animated sketch frames is a challenging task in computer vision. On the one hand, the information contained in sketches is sparse, so coloring algorithms need to infer the missing information. On the other hand, colors must be consistent between consecutive frames to ensure visual quality throughout the video. Most existing coloring algorithms are designed for single images and provide only one open-ended, reasonable color result, which is not suitable for coloring frame sequences. Other reference-based coloring algorithms lack an organic connection between the two frames, resulting in unsatisfactory coloring results. Within the same shot sequence, the features of the same object usually do not change much, so a model can be designed to color sketches automatically from a given reference frame. This paper proposes a new model called Cross-CNN that combines convolutional neural networks (CNN) and the Transformer. Cross-CNN finds and matches colors from the reference frame, thus ensuring temporal feature consistency. In this model, the reference frame and the sketch frame are stacked along the channel dimension, and a pre-trained ResNet50 network extracts locally fused features. The fused feature map is then passed to the Transformer structure, which encodes it to extract global features. Within the Transformer structure, a cross-attention mechanism is designed to better match long-distance features. Finally, a convolutional decoder with skip connections outputs the colored image. For the dataset, frames were extracted from eight movies and strictly screened to create a dataset of 20 000 pairs of reference and sketch frames for experimental research. The SSIM (Structural SIMilarity) of Cross-CNN reaches 0.932, which is 0.014 higher than that of the SOTA algorithm. The code for this paper is available at https://github.com/silenye/Cross-CNN.

  • PAPERS
    LI Fei, GUO Shao-zhong, HAO Jiang-wei, HOU Ming, SONG Guang-hui, XU Jin-chen
    ACTA ELECTRONICA SINICA. 2024, 52(5): 1633-1647. https://doi.org/10.12263/DZXB.20220375

    The RISC-V instruction set architecture (ISA), a new streamlined ISA, has developed rapidly because it is free of charge, open source, and freely usable. Since research on RISC-V at home and abroad focuses mainly on hardware development, its software ecosystem is still weak compared with those of mature ISAs. Implementing a set of high-performance basic math libraries for the RISC-V instruction set can further enrich the RISC-V software ecosystem. This paper ports the Sunway math library to RISC-V using automatic porting technology and provides the first basic math library system for the RISC-V architecture that is optimized with vector instructions. It proposes an automatic branch look-up table method and a path marker insertion method for vector registers, which address the problem of register multiplexing when mapping registers between different architectures and achieve correct and efficient register mapping; 69 mathematical functions are ported automatically according to different instruction equivalence conversion strategies. Test results show that the RISC-V basic math library functions compute correctly, with a maximum error of 1.90 ULP and an average function performance of 157.03 clock cycles.
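
For readers unfamiliar with error figures quoted in ULP (units in the last place), the sketch below shows one common way to measure them in double precision. It is an illustration only and not the test harness used in the paper. The helper name `ulp_error` is hypothetical, and the exact reference value is assumed to be available as a rational number.

```python
import math
from fractions import Fraction


def ulp_error(computed: float, exact: Fraction) -> float:
    """Distance between `computed` and `exact`, in ULPs of the rounded exact value."""
    reference = float(exact)                  # nearest double to the exact value
    err = abs(Fraction(computed) - exact)     # exact rational arithmetic, no rounding
    return float(err / Fraction(math.ulp(reference)))


# A correctly rounded result is at most half an ULP away from the exact value.
print(ulp_error(1 / 3, Fraction(1, 3)))

# The neighbouring double above 1.0 is exactly one ULP away from 1.
print(ulp_error(math.nextafter(1.0, 2.0), Fraction(1)))
```

A library-wide figure such as a maximum error would then be the largest value of this measure over the tested inputs.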

  • PAPERS
    QIAO Tong, CHEN Yu-xing, XIE Shi-chuang, YAO Heng, LUO Xiang-yang
    ACTA ELECTRONICA SINICA. 2024, 52(3): 924-936. https://doi.org/10.12263/DZXB.20220711

    Images synthesized by generative adversarial networks (GAN) are currently very difficult to identify, which poses a severe threat to national cyber security and social stability. Meanwhile, most classifiers based on deep neural networks require large-scale samples for training, and problems such as low model interpretability and poor generalization performance are rarely addressed. To overcome these limitations, we propose an ensemble classifier that uses fused features from multiple color channels. First, by studying how adjacent pixels in the multi-color channels differ between natural and GAN-synthesized images, a difference metric based on the correlation of adjacent pixels is designed to select the optimal color channels. Second, exploiting the high correlation among pixels, the difference array between adjacent pixels is modeled by a second-order Markov chain along eight directions, from which the subtractive pixel adjacency matrix features are extracted. Finally, based on the extracted features, a simple but efficient detector for identifying GAN-synthesized images is constructed. On the image dataset synthesized by the StyleGAN model, the accuracy of the proposed detector reaches 100.00%. It also identifies GAN-synthesized images very well when only 2 pairs of positive and negative training samples are available (99.65% accuracy) or when only 50 positive training samples are provided (92.84% accuracy). The accuracy also exceeds 99.96% on the image datasets synthesized by the StyleGAN2 and PGGAN models. Extensive experiments show that the proposed method outperforms the compared forensic methods. Our code is available at https://github.com/cyxcyx559/ccss.
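
The second-order Markov modeling of adjacent-pixel differences can be sketched for a single direction as follows. This is a generic illustration of subtractive pixel adjacency matrix style features, not the code from the linked repository. The function name, the clipping threshold `T`, and the choice to clip (rather than discard) large differences are assumptions.

```python
from collections import Counter


def spam_second_order(img, T=2):
    """Second-order transition probabilities of left-to-right pixel differences.

    `img` is a list of rows of integer pixel values (one color channel).
    Returns {(w, v, u): P(u | v, w)} where w, v, u are three consecutive
    differences along a row, each clipped to [-T, T].
    """
    triples, pairs = Counter(), Counter()
    for row in img:
        # Difference between each pixel and its right neighbour, clipped.
        d = [max(-T, min(T, a - b)) for a, b in zip(row, row[1:])]
        for w, v, u in zip(d, d[1:], d[2:]):
            triples[(w, v, u)] += 1
            pairs[(w, v)] += 1  # counts only pairs that have a successor
    return {key: n / pairs[key[:2]] for key, n in triples.items()}


features = spam_second_order([[0, 1, 2, 3, 3]])
# features == {(-1, -1, -1): 0.5, (-1, -1, 0): 0.5}
```

A full feature vector would repeat this for all eight directions and for each selected color channel before feeding a classifier.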

  • PAPERS
    LI Fan, ZHANG Xiao-heng, LI Yong-ming, WANG Pin
    ACTA ELECTRONICA SINICA. 2024, 52(3): 751-761. https://doi.org/10.12263/DZXB.20220712

    Ensemble methods have become an important branch of imbalanced learning. However, existing imbalanced ensemble methods all rely on the original instances without considering their structure information, so their effectiveness is still limited. Research shows that the structure information of instances includes local and global structure information. To solve this problem, this paper proposes an imbalanced ensemble algorithm based on a deep instance envelope network (DIEN) and a hierarchical structure consistency mechanism (HSCM). Considering both the local manifold and the global structure information, the algorithm generates high-quality deep envelope instances to achieve class balance. Firstly, based on instance neighborhood concatenation and the fuzzy c-means clustering algorithm, the DIEN is designed to mine the structure information of instances and obtain the deep envelope instances. Then, a local manifold structure measure and a global structure distribution measure are designed to construct the HSCM, which enhances the distribution consistency of interlayer instances. Next, DIEN and HSCM are combined to construct the optimized deep instance envelope network, DH (DIEN with HSCM). The base classifier is then applied to the deep envelope instances. Finally, a bagging ensemble learning mechanism fuses the prediction results of the base classifiers to obtain the final results. Several groups of experiments use more than 10 public datasets and representative related algorithms for verification. Experimental results show that the proposed algorithm is significantly better on four performance metrics, including AUC (Area Under Curve) and F-measure.

  • PAPERS
    CHEN Jun-yi, JIANG De-chen, WANG Zhi-ming, CAO Jia-he, WANG Yong
    ACTA ELECTRONICA SINICA. 2023, 51(8): 2179-2187. https://doi.org/10.12263/DZXB.20211410

    This paper proposes a gesture recognition algorithm based on frequency modulated continuous wave (FMCW) radar echo signals. Firstly, a two-dimensional filtering algorithm is proposed to filter the gesture echo signals in the range and velocity dimensions, which effectively reduces the static noise of the system. Secondly, the data is filtered by the moving target indicator (MTI) algorithm to remove noise in the time dimension. Then a time-adaptive fixed-length method is proposed, which keeps the number of frames consistent across gesture samples while reducing the loss of gesture information. Finally, a range Doppler net (RD-Net) is established for training and classification. The algorithm achieves 98.28% accuracy on Google's open-source deepsoli dataset, which is 11.11% higher than the algorithm proposed with the dataset. It achieves 90.8% accuracy in real-time inference experiments and has better generalization ability.
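
The abstract does not say which MTI filter is used. As a minimal sketch, the simplest variant is a two-pulse canceller, which subtracts consecutive frames so that returns from stationary objects cancel. The function name and the frame layout (a list of range profiles, one per frame) are assumptions.

```python
def mti_two_pulse(frames):
    """Two-pulse canceller: y[n] = x[n] - x[n-1] along the frame (slow-time) axis.

    `frames` is a list of equally long range profiles (real or complex samples).
    The output has one frame fewer than the input.
    """
    return [
        [cur - prev for cur, prev in zip(f1, f0)]
        for f0, f1 in zip(frames, frames[1:])
    ]


# The first range bin holds a static return of constant amplitude 5 and cancels;
# the second bin changes between frames and survives.
frames = [[5, 1], [5, 2], [5, 4]]
print(mti_two_pulse(frames))  # [[0, 1], [0, 2]]
```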

  • LUO Zhong-tao, GONG Yan-ru, LI Ji-xuan, LU Kun
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.240403
    Online available: 2024-11-15

    The effectiveness of sky-wave over-the-horizon radar (OTHR) is limited by the operating environment. When the ionospheric state is poor or the operating parameters are unsuitable, the radar signal does not illuminate the scheduled area. Hence, whether the land-sea clutter (LSC) is normal or abnormal directly reflects the working status of the OTHR. To address the scarcity and imbalance of OTHR clutter signals, a data enhancement method based on a generative adversarial network is proposed for clutter range-Doppler image enhancement. A lightweight ResNet18 model is used for real-time identification of the radar images. Further, an LSC anomaly detector (LSCAD) is designed to identify the radar LSC situation automatically. The LSCAD extracts the high-amplitude region from the radar range-Doppler map, classifies it with the classification network trained on the augmented dataset, and feeds the result back to the radar operator. Simulation results show that the LSC data enhancement increases the LSC classifier accuracy by 25.26%. The LSCAD judges the LSC status correctly on both real data and images from the literature. Therefore, the LSCAD can be used as an extension module of the OTHR that provides automatic detection and warning of LSC anomalies, which helps improve the degree of automation of the OTHR.

  • CUI Jian-feng, LIANG Hong
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240292
    Online available: 2024-11-28

    We investigated the optimal intersection problem of a direction-finding cross-location system composed of 1D and 2D passive sensors. Using closed-form solutions for localization accuracy, extremum analysis, and geometric intersection analysis, we identified the global optimal intersection point and explored the spatial distribution characteristics of the optimal intersection positions, as well as their influencing factors and underlying principles. The study reveals that the global optimal intersection point lies in the horizontal plane of the baseline (or 2D sensor). The optimal intersection locations are jointly determined by the geometric intersection characteristics and the distance diffusion effect of measurement errors; they are distributed around an arc on the horizontal plane whose center is the midpoint of the baseline and whose diameter is the baseline length, and they collapse towards the baseline. Variations in sensor positions do not affect the position of the optimal intersection location relative to the baseline; once the variance ratio of the baseline and angular measurement errors is established, the optimal intersection location is determined. Furthermore, case analysis suggests that the optimal intersection area converges towards the sensor with the larger angular measurement errors. In practical engineering applications, the optimal intersection area is more useful than the optimal intersection point; matching the optimal intersection locations with target detection results or estimated positions can effectively enhance the system’s positioning performance.

  • LEI Tao, ZHANG Jun-ming, DU Xiao-gang, MIN Chong-dan, YANG Zi-yao
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20231011
    Online available: 2024-11-28

    To solve two long-standing problems that make medical image segmentation difficult, namely the large scale variation in target sizes and blurred boundaries, we propose a novel dual-branch hybrid network framework that combines feature encoding with a gated decoder based on multi-scale features for accurate multi-organ segmentation. To fully exploit the strengths of convolutional neural networks (CNN) in local information extraction and of transformers in modeling long-range dependency, we employ U-Net and Swin-Unet to construct the dual-branch network. The innovation of this method lies in the shuffling operation applied to high-dimensional features extracted at multiple stages from different branches of the network. It efficiently integrates local and global information through a dual-branch channel cross-fusion, enhancing information interaction between the two branches at different stages. This addresses the limitation in segmentation accuracy caused by the blurring of object contours in images. Additionally, to address the large scale variation among multiple organs, we introduce a new gated decoder based on multi-scale features (GDMF), which extracts multi-scale high-dimensional features at different stages of the network, performs adaptive feature enhancement, and adopts attention mechanisms and feature mappings to help acquire accurate target information. Experimental results on the automated cardiac diagnosis challenge (ACDC) and fast and low GPU memory abdominal organ segmentation challenge 2021 (FLARE21) datasets demonstrate that the proposed method outperforms existing mainstream medical image segmentation methods and effectively handles the large scale variation in target sizes and the blurred boundaries in medical images.

  • GU Tao-chen, WAN Fa-yu, RAVELO Blaise
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240185
    Online available: 2024-11-28

    This paper establishes a fundamental theory of a novel ultra-wideband (UWB) bandpass (BP) negative group delay (NGD) topology. The microwave circuit under study consists of lossy transmission lines and stepped impedance resonators (SIR). The flat NGD topology is constructed using fully distributed elements. ABCD- and S-parameter models are formulated to derive the optimal NGD values and bandwidth. To verify the theoretical feasibility, NGD prototypes were designed, fabricated, and measured. The flat BP-NGD microstrip circuit has a compact size of 11 mm × 81 mm (0.13 λg × 1.01 λg) with an NGD center frequency of f_n = 2.14 GHz. Excellent agreement is observed between experimental and theoretical results, revealing an NGD bandwidth of Δf_NGD = 1.28 GHz (BW_NGD = 61% f_n) and an NGD value of t_n = -0.52 ns. Furthermore, within the NGD frequency band, the flat BP-NGD prototype performs well, with a flat bandwidth of about Δf_NGD = 1.01 GHz (BW_flat-NGD = 48% f_n) and a group delay fluctuation of t_n ± 0.05 ns. Compared with similar broadband flat NGD circuits, the flat NGD bandwidth of the proposed SIR NGD circuit is increased by about 120%. The return loss of the flat BP-NGD prototype at the center frequency is better than 18.8 dB.

  • GUO Zhe, ZHANG Zhi-bo, ZHOU Wei-jie, FAN Yang-yu, ZHANG Yan-ning
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.230429
    Online available: 2024-11-28

    Current deep learning research on Chinese long text summarization has the following problems: (1) summarization models lack information guidance and fail to focus on keywords and key sentences, so critical information is lost over long-distance spans; (2) the vocabularies of existing Chinese long text summarization models are often word-based and do not contain common Chinese words and punctuation, which hinders the extraction of multi-grained semantic information. To solve these problems, this paper proposes a Chinese long text summarization method with guided attention (CLSGA). Firstly, for the long text summarization task, an extraction model extracts the core words and sentences of the long text to construct a guided text, which guides the generation model to focus on more important information during encoding. Secondly, a Chinese long text vocabulary is designed to change the text structure from word statistics to phrase statistics, which helps extract richer multi-granularity features. Hierarchical location decomposition encoding is then introduced to efficiently extend the location encoding of long text and accelerate network convergence. Finally, the local attention mechanism is combined with the guided attention mechanism to capture the important information across long text spans and improve summarization accuracy. Experimental results on four public Chinese summarization datasets of different lengths, LCSTS, CNewSum, NLPCC2017 and SFZY2020, show that the proposed method has significant advantages in long text summarization and effectively improves the ROUGE-1, ROUGE-2 and ROUGE-L scores.
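
As background for the ROUGE-1 and ROUGE-2 scores mentioned above, the sketch below computes a basic ROUGE-N from clipped n-gram overlap. It is a simplified illustration rather than the official ROUGE toolkit, and it does not cover ROUGE-L. The function name and the assumption that both texts are already tokenized into lists are mine.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of clipped n-gram overlap.

    `candidate` and `reference` are lists of tokens (for Chinese text these
    could be characters or segmented words; that choice is left to the caller).
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # each n-gram counted at most min(cand, ref) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


summary = "the cat sat".split()
gold = "the cat sat down".split()
print(rouge_n(summary, gold, n=1))  # precision 1.0, recall 0.75
print(rouge_n(summary, gold, n=2))  # precision 1.0, recall 2/3
```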

  • ZHANG Yu-xiang, LI Wei, ZHANG Meng-meng, TAO Ran
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20230937
    Online available: 2024-12-10

    In cross-scene classification tasks, most domain adaptation (DA) methods focus on transfer tasks in which the source domain data and the target domain data are obtained with the same sensor and share the same land cover classes. However, adaptation performance drops significantly when new classes are present in the target data. Moreover, many hyperspectral image (HSI) classification methods rely on a global representation mechanism, in which representation learning is performed on samples with fixed-size windows, limiting their ability to represent ground object classes effectively. A framework called local representation few-shot learning (LrFSL) is proposed, which aims to overcome the limitations of global representation by constructing a local representation mechanism in few-shot learning. In this framework, meta-tasks are created for all labeled source domain data and a few labeled target domain data, and episodic training is performed on both simultaneously using a meta-learning strategy. Additionally, an intra-domain local representation block (ILR-block) is designed to extract semantic information from multiple local representations within each sample. Furthermore, an inter-domain local alignment block (ILA-block) is designed to align the cross-domain class-wise distribution, thereby mitigating the impact of domain shift on few-shot learning. Experimental results on three publicly available HSI datasets demonstrate that the proposed method outperforms state-of-the-art methods by a significant margin.

  • ZENG Kai, WAN Zi-xin, Wang Ming-tao, SHEN Tao
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240640
    Online available: 2024-12-23

    Restoring the weight distribution, activation distribution, and gradients of a binary network as closely as possible to those of the original full-precision network can greatly improve the inference ability of the binary network. However, existing methods apply the restoration operation directly to binary data in forward propagation, and their gradient approximation functions for backpropagation are fixed or manually determined, so the restoration efficiency of binary networks still needs improvement. To address this problem, an efficient restoration method for binary neural networks is investigated. Firstly, a distribution recovery method that maximizes information entropy is proposed. By shifting the mean of the original full-precision weights and scaling their modulus, the quantized binary weights directly attain maximum distribution restoration. At the same time, simple statistical translation and scaling factors are used to greatly improve the restoration efficiency of weights and activations. Furthermore, a gradient function based on adaptive distribution approximation is proposed, which dynamically determines the update range of the current gradient at the P-percentile according to the actual distribution of the current full-precision data. It adaptively changes the shape of the approximation function to update the gradient efficiently during training, thereby improving the convergence ability of the model. While ensuring improved execution efficiency, theoretical analysis confirms that the proposed method achieves maximum restoration of binary data. Compared with existing advanced binary network models, our method shows excellent experimental performance, reducing the computational time of the distribution restoration operation by 60% and 67% when quantizing ResNet-18 and ResNet-20, respectively. 
An accuracy of 93.0% was achieved for VGG-Small binary quantization on the CIFAR-10 dataset, and 61.9% was achieved for ResNet-18 binary quantization on the ImageNet dataset, both of which are the best performance among current binary neural networks. The relevant code is available at https://github.com/sjmp525/IA/tree/ER-BNN.
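
The idea of shifting the weight mean and rescaling before binarization can be illustrated with a generic sketch in the style of earlier binary-network work. It is not the procedure from the linked repository. The function names, the use of the mean absolute value as the scale, and the entropy check are assumptions made for illustration.

```python
import math


def binarize(weights):
    """Center weights on their mean, then map each to +scale or -scale."""
    mean = sum(weights) / len(weights)
    centered = [w - mean for w in weights]
    scale = sum(abs(c) for c in centered) / len(centered)  # mean absolute deviation
    return [scale if c >= 0 else -scale for c in centered]


def sign_entropy(values):
    """Entropy in bits of the sign distribution; 1.0 means a perfectly balanced split."""
    p = sum(1 for v in values if v > 0) / len(values)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


weights = [1.0, 3.0, 5.0, -1.0]
print(sign_entropy(weights))            # below 1.0: three positive signs, one negative
print(binarize(weights))                # [-2.0, 2.0, 2.0, -2.0]
print(sign_entropy(binarize(weights)))  # 1.0 after centering
```

Centering balances the two signs in this toy example, which is the sense in which a shifted binarization carries more information than taking signs of the raw weights.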

  • ZHU Zheng-yu, ZHAO Hang-ran, WANG Zi-xuan, WANG Zhong-yong, KONG Ke-xian, LIANG Jing
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20240487
    Online available: 2024-12-24

    Traditional frequency-hopping network station sorting technology is ineffective at low signal-to-noise ratios and has poor real-time detection performance. To address this problem, this paper proposes a shortwave frequency-hopping signal sorting algorithm based on an improved YOLOv8. First, the short-time Fourier transform is applied to the received aliased signal to generate a grayscale time-frequency image as the input of the YOLOv8 network model. Secondly, because frequency collisions between aliased signals such as swept-frequency signals, fixed-frequency signals and frequency-hopping signals affect detection accuracy, Deformable Convolutional Networks v2 is introduced in the C2f layer to improve the generalization ability of network feature extraction. Thirdly, the SimAM attention mechanism is added to the backbone layer to address the problem that background noise is easily confused with frequency-hopping signals at low signal-to-noise ratios, which degrades detection accuracy. Finally, the convolutional kernel of the Detect module is replaced by a Partial Convolution kernel, which reduces the computational complexity of the network by 32.18% while keeping the loss in mAP@0.5 within 0.37%, and improves the inference speed of the network model. Experimental results show that the proposed improved YOLOv8 algorithm achieves a separation rate of 97.68% at a signal-to-noise ratio of -5 dB, and that the model converges quickly and is highly robust.

  • LUO Ke, LI Wei, JIAN Yu-gen, GAO Hong-yu, ZHANG Ke-zheng, LIAO Yan-zhe, WU Yu-fei, CHEN Jin-cai, LU Ping
    ACTA ELECTRONICA SINICA. https://doi.org/10.12263/DZXB.20230527
    Online available: 2025-01-06

    As the recording density of magnetic storage increases, the recording bit spacing decreases and the magnetization transition noise increases significantly, which greatly affects the quality of the readback signal. To mitigate the interference caused by magnetization transition noise among recording patterns in ultra-high density magnetic storage systems, the maximum transition run (MTR) constraint code MTR(j=1), which forbids consecutive transitions, is proposed. It suppresses the magnetization transition noise more effectively than the constraint codes MTR(j=2) and MTR(j=3), which allow consecutive transitions. We investigate the detection performance on the readback signal experimentally. At a signal-to-noise ratio of 12 dB, the detection bit error rate (BER) of MTR(j=1) is about 30% and 60% lower, in relative terms, than that of MTR(j=2) and MTR(j=3), respectively. This confirms that MTR(j=1) constrained coding, which forbids consecutive transitions, achieves higher data detection reliability.
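
The MTR(j) constraint itself is easy to state in code: in a sequence where 1 marks a magnetization transition (the usual NRZI convention, assumed here), no run of consecutive 1s may be longer than j. The checker below is a minimal sketch of that definition only, not the encoder or detector studied in the paper, and the function names are hypothetical.

```python
def max_transition_run(nrzi_bits):
    """Length of the longest run of consecutive transitions (1s)."""
    longest = run = 0
    for bit in nrzi_bits:
        run = run + 1 if bit else 0
        longest = max(longest, run)
    return longest


def satisfies_mtr(nrzi_bits, j):
    """True when the sequence obeys the MTR(j) constraint."""
    return max_transition_run(nrzi_bits) <= j


print(satisfies_mtr([1, 0, 1, 0, 0, 1], j=1))  # True: transitions are always isolated
print(satisfies_mtr([1, 1, 0, 1], j=1))        # False: two transitions in a row
print(satisfies_mtr([1, 1, 0, 1], j=2))        # True
```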