Most accessed

  • PAPERS
    ZHANG Ze-wei, BAO Wei-min, FANG Hai-yan, SU Jian-yu, LI Xiao-ping, YAO Yun-feng
    ACTA ELECTRONICA SINICA. 2024, 52(9): 2939-2949. https://doi.org/10.12263/DZXB.20221003
    Abstract (2046) Download PDF (446) HTML (1992)

    A high-precision X-ray photon arrival time conversion model is crucial to the accuracy of X-ray pulsar-based navigation. Because the complete model is complex and existing simplified models have limited accuracy, this paper proposes a fast simplified model whose accuracy is no lower than that of the existing simplified models. By deriving the existing complete model, the influence of each delay term on model accuracy is analyzed theoretically, showing that the Roemer delay remains the key to the accuracy of the simplified model. The fast simplified model is obtained by changing the expression of the Roemer delay and its second-order expansion, taking into account how easily the required physical quantities can be obtained in practice. The accuracy and computational efficiency of the proposed model are evaluated by applying both the complete model and the proposed simplified model to the time conversion of measured photon data from the NICER (Neutron star Interior Composition Explorer) and HXMT (Hard X-ray Modulation Telescope) satellites. Furthermore, the influence of orbital altitude and pulsar angular position measurement errors on the accuracy of the simplified model is analyzed by numerical simulation, and the accuracy and computational efficiency of the simplified model in Earth orbits at different altitudes are discussed. The results show that the computational efficiency of the proposed simplified model is 50% higher than that of Sheikh's simplified model and 10% higher than that of Fei's model, with no loss of accuracy.
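
    As a rough illustration of the quantity this abstract singles out, the first-order Roemer delay is the projection of the spacecraft position onto the pulsar direction, divided by the speed of light. The sketch below is not the paper's model (which also changes the second-order expansion); the function name and the example geometry are hypothetical.

```python
import math

C = 299_792_458.0          # speed of light, m/s
AU = 149_597_870_700.0     # astronomical unit, m

def roemer_delay_first_order(r_m, n_hat):
    """First-order Roemer delay in seconds: (n . r) / c.

    r_m   -- spacecraft position relative to the reference origin, metres
    n_hat -- direction to the pulsar (normalised here for safety)
    """
    norm = math.sqrt(sum(x * x for x in n_hat))
    n = [x / norm for x in n_hat]
    return sum(a * b for a, b in zip(n, r_m)) / C

# Hypothetical geometry: spacecraft 1 AU from the origin, pulsar along the same axis.
delay = roemer_delay_first_order([AU, 0.0, 0.0], [1.0, 0.0, 0.0])
```

    With this geometry the delay is roughly 499 s, which suggests why small errors in this term outweigh the other delay terms.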

  • PAPERS
    LIU Xin, HAI Yang, DAI Wei
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3052-3064. https://doi.org/10.12263/DZXB.20230957
    Abstract (2009) Download PDF (1242) HTML (1930)

    The state-space model is a common and important model structure in automation and control. This paper investigates the robust identification of nonlinear state-space models corrupted by outliers. Outliers affecting both the state transition process and the output measurement process are considered, and a more comprehensive and robust identification algorithm is proposed. To ensure robustness, two independent heavy-tailed Student's t-distributions are used to describe the state noise and the output noise, respectively. The particle smoothing method is then applied to estimate the posterior distribution of the unknown states. Finally, the expectation maximization algorithm is used to estimate the parameters. The mathematical decomposition of the Student's t-distribution is employed in the identification process, which brings two main advantages: (1) it facilitates the derivation and implementation of the proposed algorithm; (2) it provides a clearer explanation of the algorithm's robustness. The usefulness of the proposed algorithm is demonstrated on numerical and mechanical examples.
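
    The decomposition mentioned above is commonly written as a Gaussian scale mixture: a Student's t variable is a Gaussian whose precision is scaled by a Gamma-distributed latent weight. The sketch below assumes that standard scalar form (the paper's exact formulation may differ); both function names are hypothetical.

```python
import random

def sample_student_t(mu, sigma, nu, rng):
    """Draw from t_nu(mu, sigma^2) via its Gaussian scale mixture."""
    # Latent weight: Gamma with shape nu/2 and scale 2/nu (mean 1).
    lam = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Conditioned on lam, the sample is Gaussian with variance sigma^2 / lam.
    return rng.gauss(mu, sigma / lam ** 0.5)

def expected_weight(x, mu, sigma, nu):
    """Posterior mean of the latent weight given one scalar observation x."""
    d2 = ((x - mu) / sigma) ** 2
    return (nu + 1.0) / (nu + d2)

rng = random.Random(0)
draw = sample_student_t(0.0, 1.0, 3.0, rng)
w_inlier = expected_weight(0.0, 0.0, 1.0, 3.0)    # point at the mean
w_outlier = expected_weight(10.0, 0.0, 1.0, 3.0)  # point far from the mean
```

    The weight of the distant point is much smaller than that of the central point, so an outlier contributes little to an EM-style parameter update; this is one way to read the robustness claim.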

  • PAPERS
    JIN Xiao-zhong, LIU Hai-kun, LAI Hao, MAO Fu-bing, ZHANG Yu, LIAO Xiao-fei, JIN Hai
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3038-3051. https://doi.org/10.12263/DZXB.20221257
    Abstract (1969) Download PDF (1550) HTML (1887)

    Heterogeneous memory systems composed of traditional dynamic random access memory (DRAM) and new non-volatile memory (NVM) can be organized in a horizontal architecture or a hierarchical architecture. The horizontal DRAM/NVM architecture often requires page migration to improve memory access performance. However, hot page monitoring and migration implemented in the operating system incur significant software overhead. The hardware-supported hierarchical architecture can even increase memory access latency for big data applications with poor data locality, because of its deeper memory hierarchy. To this end, this paper proposes a reconfigurable heterogeneous memory architecture that can switch between horizontal and hierarchical architectures at runtime to adapt dynamically to the memory access characteristics of different applications. We design a DRAM/NVM heterogeneous memory controller (HMC) based on the new instruction set architecture RISC-V (Reduced Instruction Set Computing-V). The HMC uses a few hardware counters for memory access monitoring and analysis, and achieves dynamic address mapping and efficient page migration between DRAM and NVM pages. Experimental results show that the HMC can improve application performance by 43%.

  • PAPERS
    CAI Hua, YI Ya-xi, FU Qiang, RAN Yue, SUN Jun-xi
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3368-3381. https://doi.org/10.12263/DZXB.20240271
    Abstract (1901) Download PDF (2040) HTML (1819)

    Current multimodal pre-training techniques for visual languages predominantly focus on aligning global semantic features between images and text, yet they inadequately explore the granular feature interactions between modalities. Addressing this gap, this paper proposes a novel multimodal pre-training strategy informed by cross-modal guidance and alignment. Our method employs a dual-stream feature extraction network designed for visual sequence compression, to facilitate modality feature extraction. During this phase, a synergistic image-text guidance is integrated within the visual encoder, orchestrating the compression of visual sequences layer by layer. This approach mitigates the obfuscation of modality-specific fine-grained interactions by irrelevant visual information. Subsequently, in the modality feature alignment phase, we implement fine-grained relational reasoning on the image and textual features to achieve localized feature alignment among visual tokens and textual tokens. This advancement bolsters the model's comprehension of fine-grained alignment relationships. After fine-tuning, in the image-text retrieval tasks, our approach achieves an average recall rate of 86.4% for images and 94.88% for texts, which represents a significant 5.36% improvement in zero-shot image-text retrieval over the canonical CLIP (Contrastive Language-Image Pre-training) algorithm. Moreover, our method also surpasses existing mainstream multimodal pre-training methods in accuracy for classification tasks like visual question answering.

  • PAPERS
    CHEN Xu-chu, PU Yu, ZHANG Wei-qiang
    ACTA ELECTRONICA SINICA. 2024, 52(9): 2971-2978. https://doi.org/10.12263/DZXB.20230050
    Abstract (1857) Download PDF (1370) HTML (1746)

    Alzheimer's disease (AD) is a neurodegenerative disease that causes symptoms such as aphasia and decreased speech fluency. Researchers have used articulatory features, paralinguistic features such as fluency and pauses, or features extracted from transcribed text to detect Alzheimer's disease. However, traditional acoustic feature-based detection methods have difficulty capturing semantic information, while transcribing speech into text is time-consuming and laborious, and transcription quality degrades significantly because of the accents and illness of elderly speakers. In this paper, we propose a dVAE-BERT (discrete Variational Autoencoders-Bidirectional Encoder Representations from Transformers) model, which uses discrete Variational Autoencoders (dVAE) to convert speech into pseudo-phoneme sequences, and then uses the Bidirectional Encoder Representations from Transformers (BERT) model to capture the relations within the pseudo-phoneme sequences and extract a representation of the audio in the language dimension. The accuracy of the model on the ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech only) dataset is 70.42%, which is 5.63% better than the baseline system, and its accuracy rises to 76.06% and 71.83% after fusion with the Wav2vec2.0 and Hidden-unit BERT (HuBERT) models, respectively.

  • PAPERS
    XING Chang-da, WANG Mei-ling, XU Yong-chang, WANG Zhi-sheng
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3010-3022. https://doi.org/10.12263/DZXB.20230077
    Abstract (1813) Download PDF (1308) HTML (1757)

    Feature extraction is a key operation in hyperspectral image (HSI) classification. Current classification approaches usually ignore information preservation and spatial distribution during feature extraction, which may yield features with low information utilization and a disordered distribution, leading to unsatisfactory prediction results. To remedy these deficiencies, a novel method based on structure-wise feature reconstruction is proposed for HSI classification. This method reduces information loss and improves information preservation during feature extraction. In addition, the distribution is fully considered to enhance discriminability and separability. Drawing on the idea of reconstruction and on self-expression theory, a structure-wise feature reconstruction model is constructed to extract HSI features, which improves the utilization of the original HSI information and describes a structure that reflects a well-ordered distribution. An optimization scheme with alternating updates is presented to solve this model. A support vector machine is finally used to classify the extracted features and predict the labels of the HSI. The Salinas, Pavia Center, Botswana, and Houston datasets are used for experimental validation. Results show that the proposed method achieves better classification performance than several state-of-the-art approaches, with average gains of 2.6%, 3.9%, and 3.3% in OA (Overall Accuracy), AA (Average Accuracy), and Kappa, respectively.

  • PAPERS
    JIANG Shun-rong, SHI Kun, ZHOU Yong
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3023-3037. https://doi.org/10.12263/DZXB.20221299
    Abstract (1745) Download PDF (1519) HTML (1716)

    A micro-grid is a distributed small-scale power generation and distribution system that realizes the circular flow of electricity through energy trading between neighbors, according to the different needs of prosumers. To develop optimal price and transaction strategies for energy trading in micro-grids, we propose a double sealed bid (DSB) auction scheme built on the characteristics of consortium blockchains. Besides satisfying key economic properties (individual rationality, budget balance, and so on), this scheme determines the final winner based on the users' offers, bids, volumes, average price, and other factors. Meanwhile, to protect users' personal privacy during the auction, we propose the blockchain-based differential privacy (BDP) algorithm, based on differential privacy theory and the characteristics of the DSB auction scheme; privacy analysis and data validity analysis show that it satisfies the differential privacy requirements and mean validity. Finally, we apply the BDP algorithm to the DSB auction scheme and obtain a safe and efficient privacy-preserving double energy auction scheme, differential privacy-based double auction on blockchain (DPDAB), which not only develops the optimal price and transaction strategy but also protects users' privacy during the auction. In addition, we analyze through experiments the influence of the BDP algorithm on auction data and the computation time overhead it adds to the auction scheme, and we verify the validity of the DPDAB scheme in terms of average benefit, user satisfaction, and social welfare through comparative experiments.
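
    The abstract does not say which noise mechanism BDP uses. As an assumption for illustration only, the sketch below applies the standard Laplace mechanism to a bid and checks that the average of many perturbed bids stays close to the true value, which is one reading of "mean validity". The function name and all parameter values are hypothetical.

```python
import random

def privatize_bid(bid, sensitivity, epsilon, rng):
    """Add Laplace(0, sensitivity / epsilon) noise to a bid."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return bid + noise

rng = random.Random(42)
true_bid = 10.0
noisy_bids = [privatize_bid(true_bid, 1.0, 1.0, rng) for _ in range(20000)]
mean_noisy = sum(noisy_bids) / len(noisy_bids)
```

    A smaller epsilon gives stronger privacy but noisier individual bids; the mean remains close to the true bid because the noise is zero-mean.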

  • PAPERS
    CUI Yi-han, LIANG Yan, SONG Qian-qian, ZHANG Hui-xia, WANG Fan
    ACTA ELECTRONICA SINICA. 2024, 52(9): 2961-2970. https://doi.org/10.12263/DZXB.20230440
    Abstract (1671) Download PDF (1502) HTML (1592)

    With the increasing complexity of the modern battlefield environment and the upgrading of aviation equipment technology, massive multi-source heterogeneous sensor data inevitably suffer from inconsistency and incompleteness. Traditional multi-sensor fusion methods ignore the correlation between sensor features and form a closed, purely data-driven recognition system. Because expert cognition, domain experience, attribute rules, and other knowledge can guide model construction and inference in comprehensive target recognition in the form of expert experience, rule constraints, and so on, this paper presents a knowledge-assisted integrated identification method for aerial targets. First, a military combat knowledge map of typical aerial target features is constructed, and key feature parameters are extracted to establish a target identification framework model. Then, the basic trust assignment of the data and the evidence conflict credibility are constructed at the recognition level and the decision recognition level, respectively. In addition, time-domain fusion rules for high-conflict evidence are formulated to adjust the temporal fusion weights using historical data. Finally, multi-sensor type recognition is realized hierarchically through static reasoning and dynamic fusion. The recognition accuracy of the proposed method is better than that of existing algorithms in typical aerial target recognition tasks, demonstrating its effectiveness.
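
    The abstract does not name its combination rule. To show what "evidence conflict" means in this setting, the sketch below uses the classical Dempster rule of combination as an assumed stand-in; the paper's time-domain rules for high-conflict evidence are not reproduced. The target labels and mass values are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Returns the normalised fused masses and the conflict coefficient K.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in fused.items()}, conflict

A = frozenset({"type_A"})
B = frozenset({"type_B"})
AB = A | B  # "either type", i.e. ignorance

sensor_1 = {A: 0.6, B: 0.1, AB: 0.3}
sensor_2 = {A: 0.5, B: 0.2, AB: 0.3}
fused, k = dempster_combine(sensor_1, sensor_2)
```

    When K approaches 1, the normalisation amplifies small agreements, which is the usual motivation for special handling of high-conflict evidence.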

  • PAPERS
    WANG Yu, WANG Zhen, WEN Li-qiang, LI Wei-ping, ZHAO Wen
    ACTA ELECTRONICA SINICA. 2024, 52(9): 2950-2960. https://doi.org/10.12263/DZXB.20221187
    Abstract (1639) Download PDF (210) HTML (1623)

    Document-level relation extraction aims to extract facts from multiple sentences of unstructured documents, and is a key step in constructing domain knowledge graphs and knowledge question-answering applications. The task requires the model not only to capture the complex interactions between entities based on the structural features of documents, but also to deal with a severely long-tailed category distribution. Existing table-based relation extraction models try to solve this issue, but they mainly model documents in a two-dimensional “entity/entity” space and use multi-layer convolutional networks or restricted self-attention mechanisms to extract the interaction features between entities. Such models can neither avoid the influence of category overlap nor capture the directional features of relations, so they lack decoupled semantic information about the interactions. To address these challenges, this paper proposes a new document-level relation extraction model, named DRE-3DC (Document-Level Relation Extraction with Three-Dimensional Representation Combination Modeling), in which the “entity/entity” modeling is extended to a three-dimensional “entity/entities/relationship” modeling method. Based on deformable convolution in a triple attention mechanism, the model effectively distinguishes and integrates the interaction features of different semantic spaces and adaptively captures the structural features of documents. At the same time, we propose a multi-task learning method that enhances the model's perception of the combinations of relation categories in a document, to alleviate the long-tail distribution problem. Experimental results show better scores on both the DocRED and Revisit-DocRED datasets. The effectiveness of the proposed method is verified by ablation experiments, comparative analysis, and example analysis.

  • PAPERS
    GUO Zi-yue, QUAN Hui-min, PENG Zi-shun, DAI Yu-xing
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3000-3009. https://doi.org/10.12263/DZXB.20230094
    Abstract (1629) Download PDF (1325) HTML (1543)

    Si/SiC cascaded H-bridge inverters combine different devices to ensure low output current total harmonic distortion (THD) and high device efficiency. However, this also presents the challenge of switching and assigning the Si/SiC cells. In this paper, a model predictive control (MPC) scheme with variable weight is designed to select the total switching state and assign the cell switching combination. In this method, a variable weight based on the switching loss of the devices is introduced into the cost function that selects the total switching state of the inverter and the switching combination of the Si/SiC cells, to improve the inverter's efficiency and reduce its output current harmonic distortion. The effectiveness of variable-weight MPC is verified on a five-level Si/SiC cascaded H-bridge inverter: compared with fixed-weight MPC, the output current THD is reduced by up to 2.05% and the device loss by up to 4.53%.

  • PAPERS
    DING Jing-yi, JIN Jia-hui, YANG Feng-he, XIONG Run-qun, SHAN Feng, DONG Fang
    ACTA ELECTRONICA SINICA. 2024, 52(9): 2988-2999. https://doi.org/10.12263/DZXB.20221018
    Abstract (1474) Download PDF (945) HTML (1393)

    With the rapid development of the industrial Internet, industrial production needs to satisfy personalized user requirements. Due to the wide variety of personalized product specifications, an efficient and intelligent scheduling method is particularly important for manufacturing enterprises. From the perspective of deployment mode, existing intelligent scheduling systems can be divided into two categories: enterprise on-premises deployment and cloud scheduling services. The computing and storage resources of the local scheduling system are relatively limited, making it difficult to meet the needs of accurate scheduling algorithms. In contrast, cloud scheduling systems require the support of a large amount of industrial core scheduling data and charge on demand. The overhead of computing, storage, and network transmission makes scheduling service costs high. Additionally, uploading core industrial data to the cloud may carry the risk of data leakage. To address these issues, this paper takes the hot rolling production of iron and steel as an example, introduces edge computing technology into intelligent production scheduling, and proposes a cloud-edge collaborative industrial internet production scheduling framework (PSECC). The framework preprocesses the original industrial data at the edge to ensure that core production data is kept at the enterprise end, while the algorithm is solved in the cloud. The framework is also extended by deploying a general-purpose algorithm. Based on the PSECC framework, we designed and realized a cloud-edge decomposition method for hot rolling production scheduling tasks in steel. Experiments show that the performance of the cloud-edge collaborative production scheduling method proposed in this paper is similar to that of the conventional solver, but it can avoid uploading industrial core data to the cloud, and the choice of cloud solver is more flexible. 
    In terms of performance, the total scheduling time of cloud scheduling is 1.4 to 3.7 times that of PSECC, and its network transmission time is 10 to 15 times that of PSECC.

  • PAPERS
    YANG Jia-yi, WANG Qian-fan, YAO Xin-yuan-meng, LI Cong-duan, MA Xiao
    ACTA ELECTRONICA SINICA. 2024, 52(9): 2979-2987. https://doi.org/10.12263/DZXB.20230112
    Abstract (1443) Download PDF (1139) HTML (1371)

    Constellation shaping, one of the key techniques in communication systems, can provide shaping gain. However, the recently proposed constant composition distribution matching (CCDM) probabilistic amplitude shaping (PAS) scheme is only suitable for square constellations and not for general structured 2D constellations. This paper presents a generalized CCDM shaping scheme that can be applied directly to any 2D constellation with a symmetric structure. Furthermore, taking into account the 5G low-density parity-check (LDPC) standard (especially its puncturing structure), the presented CCDM shaping is combined with the 5G LDPC codes, resulting in a 5G LDPC coded shaping modulation scheme. Numerical results show that the performance of the proposed scheme is consistent with that of the conventional PAS scheme. Simulation results also show that the proposed 5G LDPC coded shaping modulation scheme can achieve a shaping gain of about 0.6 dB and a puncturing gain of about 0.5 dB (compared with the non-punctured design).

  • PAPERS
    CAI Mei-ling, LUO Di, XIAO Jing-ri, LI Jing-yan, LIU Jin-ping
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3291-3300. https://doi.org/10.12263/DZXB.20240466
    Abstract (1427) Download PDF (835) HTML (1237)

    Industrial process data encompasses continuous and discrete variables, whose underlying statistical characteristics are crucial for revealing operational conditions. However, current process monitoring models predominantly focus on continuous variables with Gaussian assumptions, which often overlook the significant effects of the multimodal distribution characteristics of process variables, as well as the noises and outliers in process data. These limitations hinder the models' ability to capture complex statistical characteristics, leading to low detection performance particularly in non-Gaussian and nonstationary processes. This article introduces a robust anomaly detection method termed continuous and discrete variables-concurrent analysis-based variational Bayesian mixture discriminator (CDVCA-VBMD). It models continuous variables with a mixed student's t-distribution and discrete variables with a mixed multinomial distribution based on variational Bayesian inference, which can adeptly manage and analyze the complex interdependencies between process variables and overcome the non-Gaussian nature of continuous variables effectively. Furthermore, CDVCA-VBMD incorporates continuous learning to ensure the effective detection of nonstationary industrial processes. Extensive validation and comparative experiments were conducted on a numerical simulation system and the Tennessee Eastman (TE) process. The outcomes demonstrate that CDVCA-VBMD can accurately characterize the mixed multimodal distribution characteristics of time-varying industrial processes, facilitating accurate anomaly detection. Additionally, the method exhibits robustness against noise and outliers in process data, supporting long-term and reliable monitoring of complex and non-Gaussian industrial processes.

  • PAPERS
    LI Jing-yu, CHEN Tuo-chao, LI Ming-zhe, XU Xu-hai, ZHANG Cheng, XU Zi-chen, LIU Xuan-zhe, HUANG Gang, FENG Yun, XU Chen-ren
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3643-3656. https://doi.org/10.12263/DZXB.20220677
    Abstract (1402) Download PDF (581) HTML (1217)

    In mobile augmented reality (MAR) applications, users interact with nearby smart objects to complete collaboration or interaction tasks, and the efficiency and user experience of these tasks are determined by the underlying directional interaction technology. However, current directional interaction technologies are inefficient. As interaction means, they rely on wireless technologies such as Wi-Fi and BLE, which propagate omnidirectionally and thus cannot use the user's spatial context (i.e., location and direction) to shorten the interaction time, requiring unnecessary effort. As for the interaction interface, current vision-based interfaces suffer from low reliability and low scalability, which further limits the adaptability and efficiency of the system. To address these issues, we developed RetroAR, an optical-sensing solution that leverages visible light backscatter communication to support directional interaction with intelligent objects on commodity smartphones. RetroAR exploits the directional propagation of light to preserve the user's spatial context, which enables fast, connection-free directional interaction between the user and the target devices. RetroAR instruments objects with custom retro-reflective markers called ViTags. When users interact with these smart objects, the ViTags communicate with the camera on the mobile reader by backscattering the flashlight beam. We first conducted a system evaluation, which showed that RetroAR works reliably at distances of up to 4 meters and view angles of up to 100 degrees, and achieves 6-DoF 3D tracking with errors as low as 1 cm in translation and 4.7 degrees in rotation. To evaluate interaction performance, we then conducted a user study with 12 participants, which demonstrated that RetroAR shortens the interaction time of MAR contactless control by at least a factor of two compared with Wi-Fi-based solutions.
    RetroAR uses the user's spatial context together with visible light backscatter communication to keep the interaction process intuitive. Users can interact with multiple targets in a point-and-control manner, which reduces interaction cost and provides a natural and intuitive interaction experience.

  • SURVEYS AND REVIEWS
    TONG Shuai, WANG Ji-liang
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3623-3642. https://doi.org/10.12263/DZXB.20240471
    Abstract (1382) Download PDF (1865) HTML (1292)

    The rapid development of the internet of things (IoT) has spawned a large number of new applications. IoT empowers ordinary devices with computing and networking capabilities by connecting sensors, wearable devices, smart meters, and other low-data-rate, low-power end devices. Traditional wireless technologies struggle to adapt to the large-scale, low-power, long-distance connectivity requirements of IoT. How to lower the barrier to device access and achieve low-power, long-distance device connectivity is an important challenge facing current IoT systems. LoRa, a representative low-power wide-area network (LPWAN) technology, effectively solves the problem of long-distance connectivity for low-power devices and has become a core supporting technology of the IoT. However, LoRa still faces three important challenges in practice: (1) high-concurrency transmission in large-scale connection scenarios leads to signal collisions, making concurrent device access difficult; (2) signal attenuation over long-distance wireless links makes it difficult to transmit weak signals reliably; (3) interference from heterogeneous protocols in shared IoT channels is prominent, making heterogeneous coexistence difficult. This article outlines the current research progress of LoRa, focusing on these three challenges and the corresponding technological progress. For high-concurrency collisions, existing research has proposed collision avoidance and concurrent decoding methods; for weak signals, it explores enhanced weak-signal transmission and receiver decoding optimization; for heterogeneous protocol competition, it has designed various cross-protocol communication mechanisms. The article analyzes the innovations and limitations of existing research and points out directions for future research.

  • PAPERS
    GAO Si-hua, LI Jun-hui, LI Jian-fu, LIU Bao-yu
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3699-3710. https://doi.org/10.12263/DZXB.20230299
    Abstract (1380) Download PDF (499) HTML (1307)

    UAV (Unmanned Aerial Vehicle)-assisted WSNs (Wireless Sensor Networks) suffer from single-source data collection and uneven energy replenishment. In this article, we first investigate the fairness problem of data collection and energy replenishment and develop a mathematical model for it. Then, a novel deep reinforcement learning algorithm, named DPDQN (Double Parametrized Deep Q-Networks), is designed to solve this problem. The DPDQN algorithm incorporates a hybrid discrete-continuous action strategy, which consists of two components: a discrete action network and a continuous action network. The former schedules the order in which the UAV visits the sensors in the WSN, and the latter optimizes the UAV's hover position around each visited sensor. Numerical results demonstrate that the DPDQN algorithm outperforms three existing solutions in data collection fairness, energy replenishment fairness, flying distance, and four factors that influence fairness. Furthermore, the results validate that our algorithm is robust and stable.

  • PAPERS
    QIAN Zhong-sheng, HUANG Heng, WAN Zi-long
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3684-3698. https://doi.org/10.12263/DZXB.20231084
    Abstract (1376) Download PDF (568) HTML (1285)

    Graph convolutional networks have been widely applied in multi-behavior recommender systems because of their powerful ability to learn high-order collaborative signals. However, most existing graph convolution-based multi-behavior recommendation methods fail to model effectively the relationships between different user-item nodes and various behaviors. The sparsity of target behaviors also makes it hard to further improve the performance of multi-behavior recommendation algorithms. To address this, we propose the multi-behavior graph contrastive learning recommendation model with self-attention mechanism (SA-MBGCL). The method combines user-item node embeddings with behavior embeddings and employs a self-attention mechanism to enhance the embedding representations, effectively modeling the dependencies between different nodes and behaviors. Meanwhile, a graph contrastive learning approach is constructed that treats the target behavior and auxiliary behaviors of the same user as positive pairs and those of different users as negative pairs, thereby reinforcing behavioral differences among users to alleviate the sparsity of target behaviors. The proposed model combines unsampled recommendation tasks with multi-behavior graph contrastive learning to perform multi-task joint optimization. It was compared with 6 single-behavior models and 10 multi-behavior models on two public datasets, Beibei and Taobao. The results show that SA-MBGCL achieves an average improvement of 5.21% in Hit Ratio (HR) and 8.30% in Normalized Discounted Cumulative Gain (NDCG), demonstrating the effectiveness of the method.
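
    The positive/negative pairing described above can be illustrated with an InfoNCE-style loss. This is a minimal sketch under assumptions: the abstract does not give the loss, and the function name, temperature, and toy embeddings are hypothetical.

```python
import math

def info_nce(target, auxiliary, temperature=0.2):
    """Mean InfoNCE loss over users.

    target[i] and auxiliary[i] are embeddings of user i under the target
    and an auxiliary behavior (a positive pair); auxiliary[j] for j != i
    act as negatives.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    losses = []
    for i, t in enumerate(target):
        logits = [cosine(t, a) / temperature for a in auxiliary]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

target_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned_aux = [[1.0, 0.0], [0.0, 1.0]]  # each user's behaviors agree
swapped_aux = [[0.0, 1.0], [1.0, 0.0]]  # behaviors match the wrong user
loss_aligned = info_nce(target_emb, aligned_aux)
loss_swapped = info_nce(target_emb, swapped_aux)
```

    The loss is small when a user's two behavior embeddings agree and large when they resemble another user's, which is how the auxiliary behaviors supply training signal for the sparse target behavior.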

  • Intelligent Vision Algorithms for Unmanned Systems
    GU Mei-ying, LI Hang, ZHANG Jia-wei, BAI Xiao, ZHENG Jin
    ACTA ELECTRONICA SINICA. 2025, 53(3): 651-685. https://doi.org/10.12263/DZXB.20240699
    Abstract (1357) Download PDF (858) HTML (1318)

    As the cost of unmanned aerial vehicles (UAVs) decreases, they have attracted increasing research interest. UAVs are now widely applied in various fields, including agriculture, firefighting, surveying, aerial photography, and recreational applications. These applications require UAVs to perform autonomous flights with precise self-localization, typically relying heavily on global navigation satellite systems (GNSS). However, GNSS has multiple shortcomings related to long-distance radio communications, such as non-line-of-sight reception, multi-path effects, and spoofing. This has driven the development of new methods to supplement or replace satellite navigation. Vision-based UAV localization and navigation methods, utilizing onboard visual sensors for autonomous localization and navigation, have become crucial in addressing this issue. This review contributes to the field by systematically reviewing vision-based UAV localization and navigation technologies, providing a comprehensive summary of the current research landscape and developmental trends. First, it introduces vision-based UAV localization methods, which are categorized into image retrieval and feature matching approaches. The technical characteristics, applicable scenarios, relevant datasets, and evaluation metrics of these methods are analyzed in detail. Second, this review elaborates on vision-based UAV navigation methods, distinguishing between obstacle detection and avoidance techniques and path planning methods based on their functional objectives, while highlighting the strengths and limitations of existing technologies. Finally, this review further discusses the potential challenges faced by vision-based UAV localization and navigation methods, including the lack of publicly available datasets, the need for hardware acceleration, the complexity of operating environments, real-time processing requirements, energy constraints, and the gap between simulated and real-world environments.

  • PAPERS
    SONG Yan-tao, LU Yun-li
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3835-3846. https://doi.org/10.12263/DZXB.20230904
    Abstract (1346) Download PDF (663) HTML (1258)   Knowledge map   Save

    Ultrasound image segmentation plays a key role in disease diagnosis and treatment, but accurately segmenting the regions of interest is still a challenging task due to the low contrast, noise interference, and variability in shape, size, and location of the lesions in ultrasound images. To address this problem, we propose a dual-channel self-attention mechanism U-shaped network (SwinT-Unet), which utilizes Swin-Transformer and Unet encoder to simultaneously extract features. To effectively fuse the different-level features extracted by Swin-Transformer and Unet encoder, we also propose a gated dual-layer feature fusion module (GDFF), which achieves the effective fusion of global and local features through the gating mechanism, thereby improving the accuracy and robustness of the segmentation results. We conduct experiments on two different ultrasound image datasets, and the results show that our proposed model outperforms the existing convolutional neural network and Transformer-based models in terms of segmentation accuracy and robustness. Our paper provides a new method for ultrasound image segmentation, and offers more accurate and reliable support for clinical medical diagnosis and treatment.

  • PAPERS
    LEI Tian-liang, JI Li-xin, WANG Geng-run, LIU Shu-xin, WU Lan
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3741-3750. https://doi.org/10.12263/DZXB.20221225
    Abstract (1295) Download PDF (758) HTML (1175)   Knowledge map   Save

    As an important spatio-temporal data mining task, user trajectory identification is widely used in location-based personalized service recommendation, itinerary planning, criminal behavior detection, and target tracking. However, its prediction accuracy remains low, mainly because trajectory data are sparsely sampled and the number of trajectory categories is huge. To fill this research gap, a user trajectory identification model based on an expandable self-attention spatio-temporal graph convolutional neural network (ESAST-GCNN) is proposed, which adopts the spatio-temporal graph convolutional neural network to deeply mine the relationship between time sequence features and spatial features to predict and expand the sequence. This model combines the self-attention mechanism to obtain the internal correlation of user trajectory feature vectors and identify user trajectories. In tests on two real datasets, the accuracy of ESAST-GCNN is improved by 13.95% and 10.63% on Geolife and Gowalla, respectively, compared with TUL via Embedding and RNN (TULER-GRU). The experimental results illustrate that ESAST-GCNN is superior to the other comparative models, with better identification performance and wider applicability.

  • PAPER
    WANG Xu-xin, CHEN Hui, LIAN Feng, ZHANG Guang-hua
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3135-3147. https://doi.org/10.12263/DZXB.20230873
    Abstract (1241) Download PDF (1320) HTML (1127)   Knowledge map   Save

    To address the problem of extended target tracking (ETT) with irregular shapes, this paper proposes a random hypersurface model-adaptive progressive Bayesian filter (RHM-APBF). First, the local cumulative distribution of the continuous state prior probability density of the extended target is randomly sampled, and the optimal positions of the sampling points are obtained by minimizing the modified Cramér-von Mises distance between the local cumulative distribution of the continuous probability density and the Dirac mixture probability density. Then, the sampled particles are migrated to the dense region of the posterior to obtain a more accurate posterior probability density approximation through a progressive update with an adaptive variable step size. Furthermore, the random hypersurface model is used to represent the measurement source distribution of arbitrary star-convex extended targets, and an adaptive progressive filter for tracking star-convex irregularly shaped extended targets is proposed, which effectively recurses the multi-feature probability density of irregularly shaped extended targets. Finally, the effectiveness of the proposed method is verified by tracking simulation experiments on the extended target (ET) and group target (GT) at different noise levels and in complex random environments.

  • PAPERS
    LIU Shuai, REN Xiao-guang, WANG Shi-xiong, GUAN Jie, ZHANG Xiao-chuan, TAN Jie, WANG Jun
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3065-3074. https://doi.org/10.12263/DZXB.20230673
    Abstract (1240) Download PDF (213) HTML (1149)   Knowledge map   Save

    The linear properties of the lightweight ciphers ACE and SPIX were studied. The linear property of the ring AND-gate combination was described accurately with mixed-integer linear programming. The nonlinear operations of ACE and SPIX were transformed into ring AND-gate combinations. Based on this, linear models of the ACE permutation and the SLISCP permutation were constructed with mixed-integer linear programming. The models returned the optimal linear characteristics of the 2-step to 4-step ACE permutation and the 2-step to 5-step SLISCP permutation. It was proved that the 7-step and 12-step ACE permutations achieve 128-bit and 320-bit security, respectively, and that the 7-step and 13-step SLISCP permutations achieve 128-bit and 256-bit security, respectively. For the ACE and SLISCP permutations with any number of steps, the authenticated encryption algorithms ACE-AE-128 and SPIX can resist linear distinguishing attacks in the plaintext processing stage.

  • PAPERS
    WANG Wen-tao, YE Chen, TIAN Jun
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3780-3797. https://doi.org/10.12263/DZXB.20230382
    Abstract (1220) Download PDF (626) HTML (1148)   Knowledge map   Save

    The 3D UAV (Unmanned Aerial Vehicle) path planning problem aims to plan an optimal flight path for the UAV while satisfying safety conditions. In this paper, a cost function for UAV path planning is constructed by means of mathematical modeling, so that the UAV path planning problem is transformed into a multi-constrained optimization problem, and metaheuristic algorithms are applied to solve this problem. To address the slow convergence of the artificial rabbit optimization algorithm and its tendency to fall into local optima, this paper develops an improved Artificial Rabbit Optimization algorithm based on Levy flight, adaptive Cauchy mutation, and an elite population Genetic strategy (LCGARO). Multifaceted comparison experiments are conducted between LCGARO and six classical and advanced heuristic algorithms on 29 CEC2017 test functions and six 3D UAV path-planning terrain scenarios of varying complexity. The results show that, in the CEC2017 comparison experiments, the proposed LCGARO algorithm has better optimization accuracy on 22 test functions. In the UAV path planning experiments, the LCGARO algorithm is able to plan a flight path with the smallest total cost function value in five terrain scenarios.

  • PAPERS
    YANG Le, MA Yong-jie, PING Hao-yu, YANG Yue
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3278-3290. https://doi.org/10.12263/DZXB.20221330
    Abstract (1178) Download PDF (1264) HTML (1102)   Knowledge map   Save

    In order to better cope with the environmental changes in dynamic multi-objective optimization, an evolutionary algorithm with angular correction of difference vectors and hierarchical multi-population co-evolution (ACHMP) is proposed. Based on historical information, the unscented Kalman filter model is used to predict the population centroids, different difference vectors are generated from the centroids at different times, and the unscented Kalman filter is then used to correct the angles of the difference vectors. A multi-population co-evolution model is proposed, which divides the population into three parts that evolve in different directions. The sub-populations supervise the evolution of the master population, which not only improves the performance of the algorithm but also ensures the diversity of the population. Experimental comparisons with 10 algorithms on different test problems show that the ACHMP algorithm generally performs better than the other algorithms, which indicates that the angle correction and hierarchical multi-population method proposed in this paper is highly competitive in dealing with dynamic multi-objective optimization problems.

  • PAPER
    HE Qian-hua, CHEN Yong-qiang, ZHENG Ruo-wei, HUANG Jin-xin
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3482-3492. https://doi.org/10.12263/DZXB.20240048
    Abstract (1176) Download PDF (296) HTML (1069)   Knowledge map   Save

    End-to-end deep learning is the main technology for speech keyword spotting. Research has focused on exploring better network structures, modeling units, and search strategies, and has made a lot of progress. However, less attention has been paid to training efficiency. In this paper, a novel class uncertainty sampling (CUS) strategy is proposed to select effective samples for each training epoch. Since only a subset is used, much training time is saved. The core idea of CUS is to measure the class uncertainty of samples with the forward information of the output layer during the middle and late training stages, and to select samples with a probability given by their class uncertainty. More attention is therefore paid to samples near the decision boundary, which are prone to missed detections or false alarms. Furthermore, the proposed method can shield the interference of mislabeled samples. Experimental results on the AISHELL-1 Mandarin dataset showed that fast convergence and better training performance were achieved. Compared with the conventional training strategy, the average training time and the average convergence time were shortened by 60% and 47.5%, respectively. At a false accept rate (FAR) of 0.5 FP/h, the false reject rate (FRR) was reduced from 4.75% to 3.65%, a relative reduction of 30.1%, and the maximum term weighted value (MTWV) was increased from 0.837 4 to 0.853 1. Moreover, it was experimentally verified that the method could shield most of the mislabeled samples. This conclusion was confirmed by experiments on the large-scale AISHELL-2 Mandarin dataset.
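
    The sampling idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that class uncertainty is the normalized entropy of the softmax output, which the abstract does not specify, and the names `class_uncertainty` and `select_subset` are hypothetical. The shielding of mislabeled samples is not modeled here.

```python
import math
import random

def class_uncertainty(probs):
    """Normalized entropy in [0, 1] of one softmax output vector (assumed measure)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))

def select_subset(batch_probs, rng):
    """Keep each sample with a probability equal to its class uncertainty."""
    kept = []
    for i, probs in enumerate(batch_probs):
        if rng.random() < class_uncertainty(probs):
            kept.append(i)
    return kept

# Hypothetical output-layer probabilities for three training samples.
batch = [
    [0.5, 0.5],  # on the decision boundary: highest uncertainty
    [0.9, 0.1],  # fairly confident: selected only occasionally
    [1.0, 0.0],  # fully confident: uncertainty 0, never selected
]
uncertainties = [class_uncertainty(p) for p in batch]
subset = select_subset(batch, random.Random(0))
```

    Under this assumption, samples near the decision boundary are drawn most often, so each epoch trains on a smaller subset concentrated where missed detections and false alarms are likely.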

  • PAPERS
    ZHOU Xin-min, XIONG Zhi-mou, SHI Chang-fa, YANG Jian
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3159-3171. https://doi.org/10.12263/DZXB.20231068
    Abstract (1172) Download PDF (1437) HTML (1046)   Knowledge map   Save

    Currently, more and more medical image segmentation models are using Transformer as their basic structure. However, the computational complexity of the Transformer model is quadratic with respect to the input sequence, and it requires a large amount of data for pre-training in order to achieve good results. In situations where there is insufficient data, the Transformer's advantages cannot be fully realized. Additionally, the Transformer often fails to effectively extract local information from images. In contrast, convolutional neural networks can effectively avoid these two problems. In order to fully leverage the strengths of both convolutional neural networks and Transformers and further explore the potential of convolutional neural networks, this paper proposes a multi-scale convolution modulation network (MSCMNet) model. This model incorporates the design methodology of visual Transformer models into traditional convolutional networks. By using convolution modulation and multi-scale feature extraction strategies, a feature extraction module based on multi-scale convolution modulation (MSCM) is constructed. Efficient patch combination and patch decomposition strategies are also proposed for downsampling and upsampling of feature maps, respectively, further enhancing the model's representation ability. The mDice scores obtained on four different types and sizes of medical image segmentation datasets - multiple organs in the abdomen, heart, skin cancer, and nucleus - are 0.805 7, 0.923 3, 0.923 9 and 0.854 8, respectively. With lower computational complexity and parameter count, MSCMNet achieves the best segmentation performance, providing a novel and efficient model structure design paradigm for convolutional neural networks and Transformers in the field of medical image segmentation.

  • PAPERS
    LIAN Xiao-juan, JIANG Ji-yuan, WAN Xiang, XIAO Wan-ang, WANG Lei
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3886-3898. https://doi.org/10.12263/DZXB.20230948
    Abstract (1137) Download PDF (252) HTML (1071)   Knowledge map   Save

    Phase-change integrated photonic devices are widely considered strong competitors to conventional electronic devices due to their large bandwidth, short delay, multiplexing capability and strong anti-interference performance. However, current phase-change integrated photonic devices consume a large amount of energy, which severely limits their commercial application prospects. To address this issue, this paper proposes a silicon dioxide (SiO₂) / magnesium fluoride (MgF₂) based photonic architecture to replace the mainstream silicon-based devices. The device uses the widely applied materials Ge₂Sb₂Te₅ (GST) and indium tin oxide (ITO) as the functional and microheater materials, respectively, and its programming and readout processes were simulated with an independently developed model that couples electro-thermal and phase-change field processes. Results indicated that the energy consumption for crystallization and amorphization was 78 aJ/nm³ and 90 aJ/nm³, respectively, much lower than that of the majority of other silicon-based devices. The device also exhibited good light propagation in the near-infrared band (1 550 nm), multilevel characteristics with more than 5 intermediate states, and a short pulse width of 50 ns. Additionally, further research suggested that photonic neural networks constructed from the proposed device can be used to classify the Iris dataset with an accuracy of up to 90%, close to that of conventional neural networks (~94.7%). This work provides a new strategy for developing emerging phase-change photonic devices with low power, in-memory computing and neuromorphic computing functionalities, and is of great significance to the general non-von Neumann regime that combines electronic and photonic performance advantages.

  • PAPERS
    YAN Li, XU Gao-tian, ZHANG Ting-hao
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3472-3481. https://doi.org/10.12263/DZXB.20240059
    Abstract (1103) Download PDF (250) HTML (1062)   Knowledge map   Save

    Multi-stage sub-image merging is a key method for accelerating synthetic aperture radar (SAR) imaging in the time domain. However, high-squint acquisition on a maneuvering platform increases the irregularity of the support region of the spectrum, which degrades the efficiency and accuracy of image merging. To address these issues, a modified hybrid coordinate system is designed in this paper, based on which a fast time-domain imaging algorithm is developed for high-squint diving maneuvering platform SAR. Benefiting from the equivalent slant range model in the modified hybrid coordinate system, the sensitivity of the spectrum to the squint angle is reduced, and the space variation of the spectrum is eliminated. Hence, the spectral preprocessing function can be easily designed to effectively compress and merge the spectrum, which improves the efficiency and accuracy of image merging. Both simulated and raw data are processed to validate the superior performance of the proposed algorithm.

  • PAPERS
    HE Chao-bo, CHENG Qi-wei, CHENG Jun-wei, YANG Jia-qi, CHENG Hao, TANG Yong
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3757-3768. https://doi.org/10.12263/DZXB.20230239
    Abstract (1099) Download PDF (379) HTML (1019)   Knowledge map   Save

    Semantic community discovery and evolution analysis in dynamic attributed networks has important research value. It requires simultaneously accomplishing the tasks of dynamic community discovery, community semantic interpretation and community evolution analysis, but existing methods struggle to achieve this goal. In view of this, this paper proposes DAN-NMF (NMF for Dynamic Attributed Networks), a method based on joint nonnegative matrix factorization. DAN-NMF uniformly integrates network topology information, attribute information and smoothness constraint information from community evolution, and derives iterative update rules for the related factor matrices using the majorization-minimization optimization framework, which allows it to directly obtain the results of dynamic community discovery, community semantic interpretation and community evolution analysis. Extensive experiments are conducted on multiple synthetic and real-world dynamic attributed networks. The results show that DAN-NMF improves the accuracy metric by at least 7.3% compared to the best baseline. Moreover, the data analysis results on real-world dynamic attributed networks also demonstrate that DAN-NMF can effectively discover the evolution patterns of dynamic communities and provide rich community semantic interpretations.

  • PAPERS
    JIA Qiong-qiong, ZHOU Yue-ying
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3148-3158. https://doi.org/10.12263/DZXB.20230739
    Abstract (1094) Download PDF (1175) HTML (975)   Knowledge map   Save

    Global Positioning System (GPS) L5, Beidou B2 and Galileo E5 are important components of the global navigation satellite system (GNSS), providing safety-of-life application services for civil aviation. The L5, B2 and E5 signals operate in the protected aeronautical radio navigation service (ARNS) band (962~1213 MHz). Distance measuring equipment (DME), a civil aviation navigation system, also operates in this frequency band. The high-power pulse signals emitted by DME interfere with satellite navigation signals such as L5/B2/E5, leading to abnormal acquisition of satellite signals by the receiver or loss of lock in the tracking loop. Traditional sparse-domain interference suppression methods, such as DME interference zeroing in the time domain and in the time-frequency hybrid domain, completely eliminate the satellite signal that overlaps with the interference while suppressing the interference. In order to reduce the loss of satellite signals while mitigating the DME interference, this paper proposes a DME interference suppression method based on local robust preprocessing using robust statistical theory. According to the sparse-domain characteristics of the DME interference, the robust statistical theory of non-Gaussian distributions is applied to the extraction of the data samples, which reduces the effect on the satellite signal while suppressing the interference. The experimental results show that the DME interference suppression method based on local robust preprocessing outperforms the corresponding traditional sparse-domain method, with an output acquisition factor 1~2 dB higher than that of the traditional sparse-domain method.
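
    The drawback of the traditional time-domain zeroing method mentioned in this abstract can be illustrated with a minimal sketch. It shows the baseline only, not the proposed robust preprocessing; the function name `blank_pulses`, the threshold and the sample values are hypothetical.

```python
def blank_pulses(samples, threshold):
    """Time-domain zeroing: set every sample above the magnitude threshold to zero."""
    return [0.0 if abs(s) > threshold else s for s in samples]

# Hypothetical received samples; indices 2 and 3 are hit by a strong DME pulse.
received = [0.1, -0.2, 5.0, -4.5, 0.15, -0.1]
cleaned = blank_pulses(received, threshold=1.0)

# Count how many samples (interference plus satellite signal) were discarded.
lost = sum(1 for before, after in zip(received, cleaned) if before != after)
```

    Every zeroed sample also discards the satellite signal component it contained, which is the loss that the abstract's robust statistical approach aims to reduce.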

  • PAPERS
    WANG Wei, XIE Hui, WEI Zhong-cheng, ZHAO Ji-jun, PENG Li
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3552-3561. https://doi.org/10.12263/DZXB.20240444
    Abstract (1090) Download PDF (769) HTML (1003)   Knowledge map   Save

    In disaster scenarios, the application of UAVs (Unmanned Aerial Vehicles) for resource delivery holds considerable promise. However, the complexity and volatility of emergency environments, along with the spatial and temporal uncertainties associated with various unexpected events, can lead to inaccuracies in assessing resource demands at target points, which in turn may affect the UAV task allocation strategies in resource distribution. To address this issue, a two-stage robust optimization approach is introduced into the UAV task assignment model. By integrating UAV assignment with task allocation, the model leverages the resources of the UAV fleet to minimize task assignment costs under maximum demand variability. This paper models the relationship between injury severity levels and resource demand variations, categorizing resource demand into three levels to achieve an accurate representation of total task allocation cost variations. The C&CG (Column-and-Constraint Generation) algorithm is used to address UAV task assignment under uncertain resource demand conditions. Finally, three types of experiments were designed, and the simulation results validated the effectiveness and superiority of the algorithm. Compared to the deterministic model, this algorithm showed greater robustness in handling demand variation.

  • PAPERS
    LU Qing-yang, YUAN Guang-lin, ZHU Hong, QIN Xiao-yan, XUE Mo-gen
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3448-3458. https://doi.org/10.12263/DZXB.20230364
    Abstract (1081) Download PDF (431) HTML (1015)   Knowledge map   Save

    The one-stage visual grounding method, which uses fused features of images and text to predict target boxes, has received widespread attention due to its speed. However, existing methods do not align image and text features before feature fusion, which limits the accuracy of visual grounding. To solve this problem, this paper proposes a visual grounding method based on a large contrastive learning model. This method extracts image and text features with CLIP (Contrastive Language-Image Pre-training), a large-scale pre-trained model based on contrastive learning. It uses Transformer encoders to fuse the image-text features and predicts target boxes from the fused features using a multi-layer perceptron. The method overcomes the above shortcoming for two reasons: it extracts semantically well-aligned image-text features via the CLIP encoders, and it uses global attention to interactively fuse contextual features of images and text. The proposed method was experimentally validated on five datasets, and the experimental results show that it achieves an improvement in overall accuracy compared to existing visual grounding methods.

  • PAPERS
    LI Ming, XU Shan-zhi, YIN Yu-qing, YANG Xu, NIU Qiang, LI Zi-long
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3858-3864. https://doi.org/10.12263/DZXB.20230715
    Abstract (1074) Download PDF (155) HTML (974)   Knowledge map   Save

    The exponential increase of IoT devices has accelerated the process of interconnecting heterogeneous wireless devices, and the cross-technology communication (CTC) technique enables wireless devices that operate in the same band but use different underlying protocols to connect directly without gateways. Nevertheless, systematic research on two-way CTC between heterogeneous mobile devices is still lacking. This paper proposes MobiCTC, a CTC scheme based on energy sensing that supports bidirectional CTC between mobile WiFi and ZigBee devices. In the WiFi-to-ZigBee direction, the scheme uses RSSI as the decoding information and an energy-level mapping scheme to achieve information decoding. In the ZigBee-to-WiFi direction, the scheme adopts CSI as the decoding information, fully exploits CSI’s amplitude and phase information and uses a machine learning method for decoding. Finally, this paper designs and implements MobiCTC using the TelosB node and the USRP X310 platform, and verifies it experimentally. The experimental results show that in the mobile state, the WiFi-to-ZigBee communication throughput is 139.535 bps, which is 1.82 times higher than WiZig, and the symbol error rate is 0.016, which is basically the same as WiZig; the ZigBee-to-WiFi communication throughput is 250 bps, which is 15.7% higher than FreeBee, and the symbol error rate is 0.0516, which is a decrease of 23.21% compared to ZigFi.

  • PAPERS
    FENG Jin-yuan, CHEN Min, LI Jun-ying, CHEN Jia-le, PU Zhi-qiang, CHEN Min-jie, SUN Fang-yi
    ACTA ELECTRONICA SINICA. 2024, 52(11): 3809-3822. https://doi.org/10.12263/DZXB.20230985
    Abstract (1068) Download PDF (450) HTML (972)   Knowledge map   Save

    The rapid development of artificial intelligence technology has endowed autonomous air combat strategies with the potential to surpass human experts. Existing intelligent air combat strategies can be classified into two categories based on their driving methods: knowledge-based strategies, which heavily rely on application scenarios and expert knowledge; and data-driven strategies, represented by reinforcement learning, which have poor interpretability and weak generalization. In this study, focusing on the multi-agent cooperative air combat scenario of the air intelligence game (AIG), a strategy design method that integrates knowledge-based and data-driven approaches is proposed. The knowledge-based part utilizes expert knowledge to design a parameterized and stylized knowledge-based artificial intelligence (AI) system, which generates high-quality offline data and initializes the strategy. The data-driven part employs graph attention networks to selectively represent information about teammates and opponents, aiming to improve training efficiency and convergence performance. Furthermore, a dynamic opponent matching mechanism is introduced for multi-agent reinforcement learning training to enhance strategy generalization. The proposed strategy achieved a statistical winning rate of over 70% when competing against 12 of the top 16 teams in AIG. These teams all adopt the latest knowledge-based or data-driven methods, have diverse styles, and possess strong combat capabilities.

  • PAPERS
    LIU Zong-hao, PENG Wen-jie, DAI Gang, HUANG Shuang-ping, LIU Yong-ge
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3347-3358. https://doi.org/10.12263/DZXB.20240286
    Abstract (1068) Download PDF (975) HTML (980)   Knowledge map   Save

    Oracle bone character recognition holds significant value for understanding Chinese history and the inheritance of Chinese culture. Currently, manual recognition of oracle bone character requires extensive expert experience and consumes a great deal of time, while the majority of methods for automatic recognition are constrained by the closed-set assumption. This limitation becomes pronounced in the context of oracle bones, where new characters are continuously discovered. To address this, some researchers achieved zero-shot oracle character recognition by visual matching. This method employs handprinted images as category references, achieving character recognition in scanned images through similarity matching with handprinted references. However, this approach overlooks the challenge of large intra-class variance in oracle bone scanned images, leading to potential mismatches due to the variability in glyphs. This paper proposes a two-stage semantic-enhanced zero-shot oracle character recognition method. The first stage is domain-independent character semantic learning, where the contrastive vision-language pre-training model CLIP is used to extract character semantics from oracle rubbings and template images through prompt learning, addressing the lack of semantic information in oracle characters. To cope with the domain differences between rubbings and templates, we set learnable domain-specific prompts and character category prompts, decoupling their semantics to achieve more accurate feature extraction. The second stage is semantic-enhanced character image visual matching. The model extracts intra-class shared features and inter-class distinctive features through two branches. The first branch uses contrastive learning to align the visual features of different glyphs within the same character category to the character semantics, guiding the model to focus on intra-class shared features. 
    The second branch employs the loss function N-Pair to enhance the model’s ability to learn distinctive features between different character categories. During the testing phase, the model does not require semantic features; instead, it utilizes the intra-class similarity and inter-class distinctiveness learned during training to achieve more accurate matching between rubbings and templates, improving zero-shot recognition performance. Experimental validation on the scanned images dataset OBC306 and the handprinted images dataset SOC5519 demonstrates that our proposed method surpasses the baseline method in zero-shot oracle character recognition accuracy by over 25%.

  • PAPERS
    DU Ming-jing, WU Fu-yu, LI Yu-rui, DONG Yong-quan
    ACTA ELECTRONICA SINICA. 2024, 52(10): 3459-3471. https://doi.org/10.12263/DZXB.20231146
    Abstract (1068) Download PDF (451) HTML (1016)   Knowledge map   Save

    Density-based clustering is a classical approach in cluster analysis, which can find non-spherical clusters without specifying the number of clusters in advance. In real-world scenarios, there are still some issues, including unclear boundaries between clusters, varying densities of data, and complex cluster shapes. Most existing density-based clustering algorithms do not tackle these problems in a unified way. We address this difficulty by taking inspiration from the natural erosion phenomenon to present erosion clustering (EC). First, the proposed dynamic density evaluation method is integrated into the erosion strategy, which identifies and removes the data on the cluster boundary layer by layer, revealing the cores of the latent clusters. After that, a mutual-reachability-graph-based clustering is used to group the core data. Finally, an allocation strategy based on the local density peak is designed to assign the eroded data to different clusters. The experimental results on 12 benchmark datasets demonstrate that the clustering performance of the proposed EC algorithm is improved by an average of 96%, 53%, and 36% in the adjusted Rand index, adjusted mutual information, and F1 score, respectively, compared with the other seven algorithms.

  • PAPERS
    HU Yu-hong, WANG De-guang, YANG Ming, WANG Xi
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3172-3184. https://doi.org/10.12263/DZXB.20221267
    Abstract (1067) Download PDF (1247) HTML (988)   Knowledge map   Save

    When several controllable events (control commands) are allowed to execute simultaneously, the supervisor in the framework of discrete event systems (DESs) selects one randomly. However, in practical applications, such as traffic scheduling and robot path planning, the problems of directed control and numerical optimization should be considered. This paper introduces an optimization mechanism to quantify the control cost and combines supervisory control theory (SCT) with reinforcement learning. A systematic procedure is proposed to synthesize the optimal directed supervisor of a DES based on reinforcement learning, which makes the controlled system achieve the following three goals: (1) the control specifications relevant to safety and liveness are not violated; (2) at most one controllable event can be executed at each state; (3) the cumulative cost of event execution from the initial state to a marked state is minimal. First, given the automaton models of the plant and specifications, the target automaton model is obtained by the synchronous operation of these two models; a cost function is defined that assigns an execution cost to each event in the target model. Second, the non-blocking and maximally permissive supervisor is synthesized by SCT. Finally, the supervisor is transformed into a Markov decision process and then the Q-learning algorithm is utilized to compute the optimal directed supervisor. Two applications are used to verify the effectiveness and correctness of the proposed method. The simulation results show that the proposed method can realize the directed control of the system, and the numerical cost of the directed supervisor is minimized.
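
    The final step described in this abstract, applying Q-learning to a supervisor viewed as a Markov decision process, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the automaton `SUPERVISOR`, its event costs, the initial state 0, the marked state 3 and all hyperparameters are hypothetical, and transitions are taken to be deterministic and undiscounted.

```python
import random

# Hypothetical supervisor: state -> {controllable event: (next state, cost)}.
SUPERVISOR = {
    0: {"a": (1, 1.0), "b": (2, 4.0)},
    1: {"c": (3, 5.0), "d": (2, 1.0)},
    2: {"e": (3, 1.0)},
}
MARKED = 3  # hypothetical marked (goal) state

def q_learning(supervisor, marked, episodes=2000, alpha=0.5, epsilon=0.3, seed=0):
    """Learn cost-to-go values; lower Q means cheaper path to the marked state."""
    rng = random.Random(seed)
    q = {s: {e: 0.0 for e in events} for s, events in supervisor.items()}
    for _ in range(episodes):
        state = 0  # assumed initial state
        while state != marked:
            events = list(supervisor[state])
            if rng.random() < epsilon:
                event = rng.choice(events)             # explore
            else:
                event = min(events, key=q[state].get)  # exploit the cheapest event
            nxt, cost = supervisor[state][event]
            future = 0.0 if nxt == marked else min(q[nxt].values())
            q[state][event] += alpha * (cost + future - q[state][event])
            state = nxt
    # Directed supervisor: exactly one enabled event per state.
    policy = {s: min(q[s], key=q[s].get) for s in q}
    return q, policy

q_table, directed = q_learning(SUPERVISOR, MARKED)
total_cost = q_table[0][directed[0]]
```

    In this toy model the learned policy enables one event per state (`a`, then `d`, then `e`), and the cumulative cost from the initial state to the marked state is 3, the minimum over all paths.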

  • PAPERS
    LI Zhuang-ju, DU Peng-da, WANG Ning
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3185-3194. https://doi.org/10.12263/DZXB.20230854
    Abstract (1042) Download PDF (1355) HTML (963)   Knowledge map   Save
    CSCD(1)

    A quadrotor unmanned aerial vehicle (UAV) system is subject to parameter uncertainties and strong couplings, and its performance is easily degraded by external disturbances. To ensure the flight stability of the quadrotor UAV, a fuzzy linear active disturbance rejection control based on an improved linear extended state observer (LESO) is proposed in this paper. The parameters of the linear active disturbance rejection control are adaptively adjusted by a fuzzy algorithm. The second-order differential signals of the position and attitude angles of the quadrotor UAV are extracted by a Levant tracking differentiator, from which the total disturbance of the quadrotor UAV is then extracted. The fuzzy controller takes the total disturbance deviation and its differential as inputs, thus improving the estimation accuracy of the LESO for the total disturbance. The convergence of the LESO and the stability of the closed-loop system are analyzed. Finally, the proposed control strategy is verified in terms of the control signals, dynamic responses, and robustness of the system.
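
    For readers unfamiliar with the LESO, the sketch below shows the standard bandwidth-parameterized form for a second-order channel, not the improved observer or the fuzzy tuning proposed in the paper. The plant (a double integrator with a constant disturbance), the numeric values, and the function name `simulate_leso` are assumptions made for illustration; integration uses a plain forward-Euler step.

```python
def simulate_leso(omega=20.0, b0=1.0, disturbance=1.5, u=0.5,
                  dt=0.001, steps=2000):
    """Estimate the total disturbance of x'' = disturbance + b0 * u.

    Observer states: z1 tracks the output, z2 its derivative, and z3 the
    total disturbance. Gains place all observer poles at -omega.
    """
    beta1, beta2, beta3 = 3.0 * omega, 3.0 * omega ** 2, omega ** 3
    x1 = x2 = 0.0            # plant: position and velocity
    z1 = z2 = z3 = 0.0       # observer states
    for _ in range(steps):
        err = x1 - z1        # only the measured output x1 is used
        dz1 = z2 + beta1 * err
        dz2 = z3 + beta2 * err + b0 * u
        dz3 = beta3 * err
        dx1 = x2
        dx2 = disturbance + b0 * u
        # Update plant and observer simultaneously from the old values.
        z1, z2, z3 = z1 + dt * dz1, z2 + dt * dz2, z3 + dt * dz3
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return z3, x1 - z1


if __name__ == "__main__":
    estimate, output_error = simulate_leso()
    print(f"estimated disturbance: {estimate:.4f}, output error: {output_error:.2e}")
```

    With a constant disturbance the estimate `z3` settles on the true value; a larger `omega` speeds up convergence at the price of higher noise sensitivity, which is the kind of trade-off that adaptive parameter adjustment aims to manage.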

  • PAPERS
    HE Peng, WANG Liang-jun, ZHANG Wu, ZHU Wen-hao
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3097-3110. https://doi.org/10.12263/DZXB.20230809
    Abstract (1023) Download PDF (1106) HTML (916)   Knowledge map   Save

    The local refinement technique for multi-layer grids based on the lattice Boltzmann method computes the flow characteristics at different levels through multi-layer grids, which avoids the inefficiency and waste of computational resources of single-layer uniform Cartesian grids. However, it still has an undesirable effect on parallel performance. This paper considers the effect of load balancing in parallel computing. Starting from a single-layer grid, we study a load-balancing-based grid partitioning method that takes the computational characteristics of multi-layer grids into account. At the same time, the grid partitioning is separated from the program implementation, and parallel computation with arbitrary grid partitioning is achieved in both single-layer and multi-layer grids. The relationship between load partitioning and the respective time overheads of the different processes is investigated in a single-layer grid with different parallel strategies for 2D vascular flow. For multi-layer grids, the characteristics of multiscale grids with respect to the order of operations are discussed first. Second, three different multi-layer grids are used to verify the computational results for two-dimensional aerofoils. Finally, the relationship between load balancing and time overhead is further investigated by using three different meshing methods in each grid. Parallel performance tests on a 128-core HPC (High Performance Computing) platform show that the strong scalability can reach up to 60% and the weak scalability can reach 82.78%. This scalability result shows that improving load balancing significantly improves the parallel performance of multi-layer grid computing.
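
    The idea of load-balanced partitioning for multi-layer grids can be sketched with a greedy assignment of weighted grid blocks to processes. This is not the partitioning method of the paper. The cost model (a block on level k costs its cell count times 2^k, reflecting the extra sub-steps that finer levels take when the refinement ratio is 2), the block names, and the functions `block_cost`, `partition`, and `imbalance` are all assumptions for illustration.

```python
import heapq


def block_cost(cells, level):
    """Assumed cost model: finer levels are updated 2**level times per coarse step."""
    return cells * (2 ** level)


def partition(blocks, n_procs):
    """Greedy partitioning: give the heaviest remaining block to the lightest process."""
    heap = [(0, rank) for rank in range(n_procs)]  # (current load, rank)
    heapq.heapify(heap)
    assignment = {rank: [] for rank in range(n_procs)}
    for name, cost in sorted(blocks.items(), key=lambda kv: kv[1], reverse=True):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(name)
        heapq.heappush(heap, (load + cost, rank))
    loads = {rank: sum(blocks[b] for b in names)
             for rank, names in assignment.items()}
    return assignment, loads


def imbalance(loads):
    """Ratio of the maximum load to the mean load (1.0 is perfectly balanced)."""
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean


if __name__ == "__main__":
    # Hypothetical blocks given as (cell count, refinement level).
    raw = {"coarse_a": (4000, 0), "coarse_b": (4000, 0), "mid": (1500, 1),
           "fine_a": (500, 2), "fine_b": (750, 2)}
    blocks = {name: block_cost(cells, level) for name, (cells, level) in raw.items()}
    assignment, loads = partition(blocks, n_procs=2)
    print(assignment, loads, imbalance(loads))
```

    Because the partitioner only sees block names and costs, it stays separate from the solver itself, which mirrors the separation of grid partitioning from program implementation mentioned in the abstract.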

  • PAPER
    JIAO Jie, QI Yong-sheng, LIU Li-qiang, LI Yong-ting, WANG Zhao-xia
    ACTA ELECTRONICA SINICA. 2024, 52(9): 3251-3261. https://doi.org/10.12263/DZXB.20230829
    Abstract (1022) Download PDF (1258) HTML (933)   Knowledge map   Save
    CSCD(1)

    With the rapid development of intelligent animal husbandry, cattle facial recognition has become a key aspect of intelligent farming in cattle ranches. However, due to the complexity of the ranching environment and the limited autonomy of animals, the process of collecting and identifying cattle facial data is severely affected by environmental factors such as blurriness, occlusion, and lighting. To address this issue, a complex scene-adaptive dual-branch efficient cattle facial recognition algorithm is proposed. This algorithm first designs a data augmentation strategy based on pixel fusion. By calculating fusion coefficients using the beta distribution, the left and right facial images of cattle are integrated at the pixel level, enriching the sample's feature information. Simultaneously, the algorithm enhances the network's ability to learn cattle facial features under blurriness and occlusion, improving its generalization ability to complex scenes. Furthermore, a novel attention mechanism called composite dual-branch adaptive attention (CDAA) is introduced into the main feature extraction network. This mechanism adaptively strengthens the weights of the channel and spatial attention branches as scene information changes, enhancing the network's feature selection ability in complex scenarios. Next, a dual-branch feature extraction structure combining FaceNet and U-LBP (Uniform Local Binary Patterns) is designed. The extracted feature vectors are adaptively weighted and fused to increase the network's robustness in overly bright or dark environments. Finally, an improved cross-entropy loss (Focal Loss) is incorporated into the loss function. Weight coefficients are dynamically adjusted based on the complexity of the scene information to autonomously control the classification of difficult and easy samples. 
    To evaluate the effectiveness and real-time performance of the algorithm, ablation experiments are conducted on a specific dataset, comparing it with various typical recognition algorithms. The experimental results indicate that the proposed algorithm effectively meets real-time requirements, achieving an accuracy of 87.53% on the open test set with a recognition speed of 108 frames per second. Moreover, in complex scenarios, the recognition performance of the proposed algorithm surpasses that of the comparative networks.
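
    Two ingredients named in this abstract, pixel-level fusion with a beta-distributed coefficient and the focal loss, can be sketched in a few lines. The sketch uses plain nested lists as grayscale images and a fixed focusing parameter `gamma`; the scene-dependent weight adjustment described in the abstract is not reproduced, and the function names `fuse_pixels` and `focal_loss` as well as the default parameters are assumptions.

```python
import math
import random


def fuse_pixels(left, right, alpha=1.0, rng=None):
    """Blend two equally sized images with a coefficient drawn from Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    fused = [[lam * a + (1.0 - lam) * b for a, b in zip(row_l, row_r)]
             for row_l, row_r in zip(left, right)]
    return fused, lam


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)**gamma * log(p_t).

    probs is a list of class probabilities; target is the true class index.
    With gamma = 0 this reduces to the ordinary cross-entropy loss.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    left_face = [[0.0, 0.2], [0.4, 0.6]]
    right_face = [[1.0, 0.8], [0.6, 0.4]]
    fused, lam = fuse_pixels(left_face, right_face, rng=random.Random(0))
    print(lam, fused)
    # An easy sample (p_t = 0.9) is down-weighted far more than a hard one (p_t = 0.3).
    print(focal_loss([0.9, 0.1], 0), focal_loss([0.3, 0.7], 0))
```

    The factor `(1 - p_t)**gamma` shrinks the loss of well-classified samples, so training concentrates on the difficult ones, such as blurred or occluded faces.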