Latest Issue

    Vol. 52, Issue 7, 2024

      PAPERS

    • CHEN Zhe, WANG Pin-qing, ZHOU Pei-gen, CHEN Ji-xin, HONG Wei
      Vol. 52, Issue 7, Pages: 2161-2169(2024) DOI: 10.12263/DZXB.20230645
      Abstract: This paper presents the design of a millimeter-wave dual-band low-phase-noise voltage-controlled oscillator (VCO) in a 45 nm CMOS SOI (Complementary Metal Oxide Semiconductor Silicon On Insulator) process, covering the 24.25~27.5 GHz and 37~43.5 GHz bands for 5G millimeter-wave communications. Exploiting the high performance of SOI transistors as RF switches, a switched capacitor bank and a switched inductor topology are proposed to enhance the quality factor Q of the widely tunable inductance and capacitance, extend the VCO operating bandwidth, and lower the phase noise. In addition, a switched capacitor is adopted in the output matching network to achieve good matching and stable output power in both bands. Measured results show that the VCO covers the 24.25~27.5 GHz and 37~43.5 GHz bands specified for 5G millimeter-wave communications at WRC-19, with an output power of -4.8~0 dBm in the low band and -6.4~-2.3 dBm in the high band. The measured phase noise is -105.1 dBc/Hz at 1 MHz offset from a 24.482 GHz carrier and -95.3 dBc/Hz at 1 MHz offset from a 43.308 GHz carrier. The DC power consumption of the core circuit is 15.3~18.5 mW, and the core area is 0.198 mm². The FoM (Figure of Merit) and FoMT for the low (high) band are -181.3 dBc/Hz (-175.4 dBc/Hz) and -194.3 dBc/Hz (-188.3 dBc/Hz), respectively.
      Keywords: complementary metal oxide semiconductor silicon on insulator;voltage controlled oscillator;5G millimeter wave;dual-band
      Updated: 2025-12-24
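The FoM and FoMT values above are conventionally computed from phase noise, carrier frequency, offset frequency and DC power. The abstract does not state its exact definitions, so this sketch assumes the widely used forms FoM = PN − 20·log10(f0/Δf) + 10·log10(P_DC/1 mW) and FoMT = FoM − 20·log10(TR/10%), where TR is the tuning range in percent. The function names are illustrative.

```python
import math

def fom(pn_dbc_hz: float, f0_hz: float, offset_hz: float, p_dc_mw: float) -> float:
    """Common VCO figure of merit in dBc/Hz (more negative is better).

    Assumed definition: PN - 20log10(f0/df) + 10log10(P_DC / 1 mW).
    """
    return pn_dbc_hz - 20 * math.log10(f0_hz / offset_hz) + 10 * math.log10(p_dc_mw / 1.0)

def tuning_range_pct(f_min_hz: float, f_max_hz: float) -> float:
    """Tuning range as a percentage of the center frequency."""
    return 100 * (f_max_hz - f_min_hz) / ((f_max_hz + f_min_hz) / 2)

def fomt(fom_value: float, tr_pct: float) -> float:
    """FoM including tuning range, normalized to a 10% tuning range (assumed form)."""
    return fom_value - 20 * math.log10(tr_pct / 10)

# Low-band phase-noise point from the abstract with an assumed 15.3 mW core power
print(round(fom(-105.1, 24.482e9, 1e6, 15.3), 1))
```

The result depends on which DC power and tuning range are plugged in, so it is not expected to reproduce the published figures exactly.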
    • LI Ze-chao, FU Xiao-de, PAN Li-yong, YAN Rui, TANG Jin-hui
      Vol. 52, Issue 7, Pages: 2170-2182(2024) DOI: 10.12263/DZXB.20230971
      Abstract: Video privacy protection is one of the important challenges facing society today, and blurring videos is an important means of protecting people's privacy. Because blurred videos inherently lack visual modality information, mainstream video action recognition algorithms cannot achieve satisfactory results on them. As multimodal media, blurred videos contain not only visual modality information but also rich audio modality information, and from a human cognitive perspective, audio is also an important source of information. In view of this, this article proposes a privacy-preserving video action recognition method based on multimodal fusion, which can recognize human actions without infringing on user privacy. Specifically, an audio-visual feature fusion module integrates audio modality feature maps into the visual modality, fully fusing the deep semantic information of the audio and video modalities. In addition, during training the model uses clear (unblurred) video frames as labels to supervise the parameter updates of the action recognition network, providing clear semantic information for the privacy video action recognition network. The effectiveness of the proposed method was verified through extensive ablation and comparative experiments on multiple privacy behavior datasets.
      Keywords: audio-visual feature fusion;semantic clarity;privacy preserving
      Updated: 2025-12-24
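The abstract does not detail how the audio-visual feature fusion module works. As a hedged illustration of one generic way to inject an audio embedding into a visual feature map, this sketch projects the audio vector to the visual channel dimension and adds it at every spatial position (FiLM-style additive conditioning). The projection matrix `w_proj` and all shapes are hypothetical, not the authors' module.

```python
import numpy as np

def fuse_audio_into_visual(visual: np.ndarray, audio: np.ndarray, w_proj: np.ndarray) -> np.ndarray:
    """Additively condition a visual feature map on an audio embedding.

    visual: (C, H, W) visual feature map
    audio:  (D,) audio embedding
    w_proj: (C, D) hypothetical learned projection from audio to visual channels
    """
    audio_c = w_proj @ audio                 # (C,) audio projected to visual channels
    return visual + audio_c[:, None, None]   # broadcast the same vector over H and W

rng = np.random.default_rng(0)
visual = rng.standard_normal((64, 7, 7))
audio = rng.standard_normal(128)
w_proj = rng.standard_normal((64, 128)) * 0.01
fused = fuse_audio_into_visual(visual, audio, w_proj)
print(fused.shape)  # (64, 7, 7)
```

Additive broadcasting keeps the visual map's shape unchanged, so such a block could sit between existing layers of a visual backbone.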
    • 200 V All-SiC Integration Technology

      GU Yong, MA Jie, LIU Ao, HUANG Run-hua, LIU Si-yang, BAI Song, ZHANG Long, SUN Wei-feng
      Vol. 52, Issue 7, Pages: 2183-2189(2024) DOI: 10.12263/DZXB.20230782
      Abstract: An all-silicon-carbide integrated process platform based on wafers with an N-type substrate and a P-type epitaxial layer is proposed in this paper. It is compatible with CMOS (Complementary Metal Oxide Semiconductor field-effect transistor) devices, LDMOS (Laterally-Diffused MOS) devices and high-voltage diodes. A P-buffer layer is adopted to modulate the vertically distributed electric field and potential, improving the vertical blocking voltage by 212.4%. The LDMOS, high-voltage diode and high-side region achieve breakdown voltages above 300 V in a 2 μm P-type epitaxial layer. Based on this platform, SiC (Silicon Carbide) CMOS inverters and inverter chains are constructed, all of which achieve rail-to-rail voltage output from 0~20 V. A half-bridge driving circuit is designed with a four-stage inverter chain as the low-side driver. The high-side driver consists of a level-shifting circuit and a high-side-region inverter chain, producing a 180~200 V floating gate drive signal.
      Keywords: silicon carbide (SiC);integration;silicon carbide integrated circuit;SiC inverter;SiC laterally-diffused metal oxide semiconductor
      Updated: 2025-12-24
    • CHEN Liang-dong, HUANG Zhi-tao, WANG Xiang, WU Gui-zhou
      Vol. 52, Issue 7, Pages: 2190-2200(2024) DOI: 10.12263/DZXB.20230720
      Abstract: To address the low accuracy of the traditional two-step positioning method in fixed passive single-station positioning, a fixed passive single-station direct positioning method based on prior angular velocity is proposed. Firstly, the positioning scenario and the radiation source motion model are given. Based on the sampling characteristics of the radar radiation source within pulses, between pulses and across space, a 3D observation signal model is constructed along fast time, slow time and snapshot. Secondly, the fast-time dimension is transformed into the frequency domain and the strongest set of signals is extracted. The STSAF (Space Time Symmetric Autocorrelation Function) proposed in this paper is then used to eliminate the quadratic phase term in slow time. Next, the two processed observation signals are mixed, the direct positioning model is constructed, and the direct positioning cost function is given. Meanwhile, an improved MUSIC (MUltiple SIgnal Classification) algorithm is proposed that exploits the relationship between the range information contained in the slow-time domain and the azimuth information contained in the spatial domain to search over the horizontal and vertical coordinates of the radiation source, thereby locating it directly. Finally, the computational cost and the CRLB (Cramer-Rao Lower Bound) of the algorithm are quantified, the factors affecting positioning accuracy are analyzed, the root-mean-square error of the proposed method is compared with that of the traditional two-step method, and the GDOP (Geometric Dilution Of Precision) curve of the proposed method is plotted.
      Keywords: fixed passive single-station;direct position determination;STSAF function;location selection MUSIC
      Updated: 2025-12-24
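The improved MUSIC search described above operates over source coordinates using range and azimuth information, and its details are not given in the abstract. For orientation, this sketch shows only the classic 1D MUSIC pseudo-spectrum for a uniform linear array with half-wavelength spacing, the baseline technique that such methods extend. The array size, snapshot count, SNR and source angle are assumptions chosen for illustration.

```python
import numpy as np

def steering_vector(theta_rad: float, n_elements: int, spacing_wl: float = 0.5) -> np.ndarray:
    """ULA steering vector for a plane wave arriving from angle theta."""
    k = np.arange(n_elements)
    return np.exp(-2j * np.pi * spacing_wl * k * np.sin(theta_rad))

def music_spectrum(x: np.ndarray, n_sources: int, grid_rad: np.ndarray) -> np.ndarray:
    """Classic MUSIC pseudo-spectrum from snapshots x of shape (elements, snapshots)."""
    n = x.shape[0]
    r = x @ x.conj().T / x.shape[1]            # sample covariance matrix
    _, eigvecs = np.linalg.eigh(r)             # eigenvalues returned in ascending order
    noise_sub = eigvecs[:, : n - n_sources]    # eigenvectors of the smallest eigenvalues
    spectrum = np.empty(len(grid_rad))
    for i, theta in enumerate(grid_rad):
        a = steering_vector(theta, n)
        # Peaks where the steering vector is nearly orthogonal to the noise subspace
        spectrum[i] = 1.0 / np.linalg.norm(noise_sub.conj().T @ a) ** 2
    return spectrum

# Simulated single source at 20 degrees, 8 elements, 200 snapshots, about 10 dB SNR
rng = np.random.default_rng(1)
n_el, n_snap, true_deg = 8, 200, 20.0
s = (rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)) / np.sqrt(2)
noise = (rng.standard_normal((n_el, n_snap)) + 1j * rng.standard_normal((n_el, n_snap))) / np.sqrt(2)
x = np.outer(steering_vector(np.deg2rad(true_deg), n_el), s) + 0.316 * noise
grid = np.deg2rad(np.arange(-90, 90.5, 0.5))
est_deg = np.rad2deg(grid[np.argmax(music_spectrum(x, 1, grid))])
print(est_deg)
```

A direct-positioning variant would replace the angle grid with a grid of candidate (x, y) positions and a steering model that depends on both; that extension is the paper's contribution and is not reproduced here.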
    • Inter-Network Load Balancing Based on Hash Access

      XIE Hui, LING Xin-tong, WANG Zi-han, WANG Jia-heng
      Vol. 52, Issue 7, Pages: 2201-2211(2024) DOI: 10.12263/DZXB.20230063
      Abstract: The complexity of wireless access networks continues to grow, accompanied by a substantial increase in the number of wireless devices. To improve network capacity and resource utilization, effective access control and resource allocation schemes are needed to balance traffic loads among subnetworks and promote cross-domain resource coordination and sharing. To this end, this paper builds on the recently proposed Hash Access protocol and devises an optimization method that dynamically adjusts access parameters according to network load to alleviate congestion. Furthermore, it presents traffic balancing and resource reallocation schemes for multi-subnetwork scenarios, offering approaches for integrating wireless resources and offloading traffic. Simulation results demonstrate that the proposed Hash Access optimization method maximizes network throughput while ensuring network stability, and that the proposed resource allocation scheme effectively addresses load imbalance, thereby improving the performance and fairness of complex integrated networks.
      Keywords: radio access network;Hash Access;access control;network load balancing
      Updated: 2025-12-24
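The abstract does not specify how Hash Access adjusts its access parameters with load. As a generic, well-known illustration of why load-adaptive access parameters matter (this is slotted ALOHA, not the Hash Access protocol itself): with n backlogged devices each transmitting with probability p, the per-slot success probability is n·p·(1−p)^(n−1), which peaks at p = 1/n. The helper names are hypothetical.

```python
def aloha_throughput(n_devices: int, p: float) -> float:
    """Probability that exactly one of n devices transmits in a slot (slotted ALOHA)."""
    return n_devices * p * (1 - p) ** (n_devices - 1)

def adaptive_access_prob(estimated_load: int) -> float:
    """Load-adaptive rule: transmit with probability 1/n (capped at 1)."""
    return 1.0 if estimated_load <= 1 else 1.0 / estimated_load

for n in (5, 20, 100):
    fixed = aloha_throughput(n, 0.1)                       # access probability ignores load
    adaptive = aloha_throughput(n, adaptive_access_prob(n))
    print(n, round(fixed, 3), round(adaptive, 3))
```

With a fixed access probability, throughput collapses as the load grows, whereas the 1/n rule keeps it near 1/e ≈ 0.368; load-aware protocols aim to track this kind of operating point using an estimate of n.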
    • LI Qing, ZHONG Jiang, NI Hang
      Vol. 52, Issue 7, Pages: 2212-2218(2024) DOI: 10.12263/DZXB.20231106
      Abstract: Graph anomaly detection, a crucial data mining task, focuses on identifying anomalous nodes that deviate significantly from the majority of nodes. With the advancement of unsupervised graph neural network techniques, various efficient methods have been developed to detect potential anomalies in graph data, including those based on density estimation and generative adversarial networks. However, these methods often focus on generating high-quality representations for unsupervised graph anomaly detection and tend to overlook the characteristics of graph anomalies themselves. This paper therefore proposes a dual-channel heterogeneous graph anomaly detection model (HD-GAD). Its architecture includes two graph neural networks, a global substructure-aware GNN (Graph Neural Network) and a local substructure-aware GNN, designed to capture global and local substructural properties for graph anomaly detection. Additionally, the model introduces a multi-hypersphere learning (MHL) objective based on dual inference, which measures anomalies that deviate from the overall graph structure or the community structure from macro and meso hypersphere perspectives. HD-GAD uses the similarity function EmbSim to optimize the training objective, mitigating model collapse in multi-hypersphere learning. Comprehensive experiments on five datasets show AUC (Area Under Curve) values above 0.9 in most cases, reaching industry-leading levels and demonstrating the efficiency and performance advantages of HD-GAD in graph anomaly detection tasks.
      Keywords: graph anomaly detection;graph neural network;hypersphere learning;dual-channel graph neural network;unsupervised learning;dual learning
      Updated: 2025-12-24
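The multi-hypersphere objective scores nodes by how far their embeddings deviate from the global and community structure. The abstract gives no formula, so this sketch assumes a Deep SVDD-style score: the squared distance of each node embedding to a global center (macro view) plus the squared distance to its community's center (meso view). The embeddings, community labels and equal weighting are hypothetical.

```python
import numpy as np

def hypersphere_scores(z: np.ndarray, communities: np.ndarray) -> np.ndarray:
    """Anomaly score = squared distance to global center + squared distance to community center.

    z: (N, d) node embeddings; communities: (N,) integer community labels.
    """
    global_center = z.mean(axis=0)
    macro = np.sum((z - global_center) ** 2, axis=1)   # deviation from the whole graph
    meso = np.empty(len(z))
    for c in np.unique(communities):
        idx = communities == c
        center = z[idx].mean(axis=0)
        meso[idx] = np.sum((z[idx] - center) ** 2, axis=1)  # deviation from own community
    return macro + meso

rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0, 0.1, (20, 4)), rng.normal(3, 0.1, (20, 4))])
z[5] += 3.0  # push one node of community 0 away from its community
labels = np.array([0] * 20 + [1] * 20)
scores = hypersphere_scores(z, labels)
print(int(np.argmax(scores)))
```

In a learned model the embeddings come from the GNN encoders and the centers are trained jointly; this sketch only shows how the two distances combine into one score.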
    • LIU Bing, LI Sui, LIU Ming-ming, LIU Hao
      Vol. 52, Issue 7, Pages: 2219-2227(2024) DOI: 10.12263/DZXB.20231156
      Abstract: Limited by their latent space modeling ability and pre-defined diversity metrics, most diverse image captioning models fail to balance diversity and accuracy. To this end, we propose a novel diverse image captioning framework consisting of a Transformer-based variational inference encoder and a generator. Specifically, the variational inference network learns a latent space for each word to enhance the modeling of caption diversity, while the generator network produces diverse captions conditioned on each image and a sequence of latent variables. To overcome the limitations of pre-defined metrics, we introduce introspective adversarial learning into the model, where the variational inference network also serves as a discriminator that distinguishes ground-truth captions from generated ones, without extra discriminators. The proposed method is thereby endowed with the ability to self-evaluate the quality of generated captions. Experimental results on the MSCOCO dataset show that, compared with conventional methods, the proposed method with 100 samples improves the mBLEU (mutual overlap-BiLingual Evaluation Understudy) score by 1.9% and the CIDEr (Consensus-based Image Description Evaluation) score by 7.5%. Compared with typical multimodal large models, the proposed method is better suited to generating diverse declarative descriptive captions with fewer parameters.
      Keywords: image captioning;variational inference;adversarial learning;latent embedding;multi-modal learning;generative model
      Updated: 2025-12-24
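The variational inference network above learns a latent variable for each word. The abstract does not give the loss, but such encoders are typically trained with the reparameterization trick and a KL term against a prior. This sketch assumes a diagonal Gaussian posterior and a standard normal prior, which may differ from the authors' choice.

```python
import numpy as np

def reparameterize(mu: np.ndarray, log_var: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample z = mu + sigma * eps with eps ~ N(0, I); z stays differentiable in mu and sigma."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """KL(N(mu, sigma^2) || N(0, I)), summed over the latent dimension."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((5, 16))       # e.g. 5 words, a 16-dim latent per word (hypothetical sizes)
log_var = np.zeros((5, 16))
z = reparameterize(mu, log_var, rng)
print(z.shape, kl_to_standard_normal(mu, log_var))
```

A posterior equal to the prior gives zero KL for every word; during training the KL term keeps the per-word latents close to the prior while the reconstruction term pulls them toward informative values.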
    • XU Si-ya, GUO Jia-hui
      Vol. 52, Issue 7, Pages: 2228-2241(2024) DOI: 10.12263/DZXB.20230065
      Abstract: As an emerging distributed machine learning architecture, federated learning (FL) allows multiple users to train local models and aggregate them globally while protecting data privacy, thus providing reliable Internet of Vehicles (IoV) services. However, during FL training, many training terminals may switch among domains due to their high mobility, resulting in low global model accuracy. Besides, malicious terminals may frequently upload invalid or incorrect model data, which lowers service reliability. Therefore, we build a dual-layer FL-based edge collaborative computing mechanism for highly dynamic IoV services. Firstly, we jointly consider mobility, computing ability and reliability to construct a service capability model for each terminal, and then propose an edge collaborative computing domain (ECCD) construction algorithm based on deep reinforcement learning. Clustering the vehicle terminals covered by multiple edge nodes reduces the switching probability of terminal local models and ensures the sustainability of FL model training. Furthermore, we design a dual-layer FL framework comprising an intra-ECCD aggregation layer and a cross-ECCD aggregation layer. It adopts a semi-asynchronous aggregation mechanism for local models, based on an adaptive aggregation factor, in the intra-ECCD layer, and an asynchronous aggregation mechanism for each domain's regional model, based on data volume, in the cross-ECCD layer; together these improve the aggregation efficiency of the FL system. In particular, because high-speed terminals may cause cross-domain problems, we introduce a partial conditional update mechanism for local models to prevent high-quality models from being overwritten by low-quality ones, which further improves global model accuracy and FL system resource utilization. Simulation results verify that the proposed framework outperforms local computing and asynchronous/synchronous FL algorithms in terms of model accuracy and service reliability.
      Keywords: federated learning;edge computing;reliability;high dynamic;Internet of Vehicles
      Updated: 2025-12-24
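The framework above aggregates regional models according to data volume and uses an adaptive aggregation factor for local models within each domain. The exact factors are not given in the abstract, so this is a hedged sketch of two generic building blocks: a data-volume-weighted average in the FedAvg style, and a common staleness-discounted asynchronous mixing step. The base rate `alpha0` and the 1/(1 + staleness) discount are assumptions.

```python
import numpy as np

def weighted_average(models: list, data_sizes: list) -> np.ndarray:
    """FedAvg-style aggregation: weight each model by its share of the total data."""
    weights = np.asarray(data_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, models))

def async_update(global_model: np.ndarray, incoming_model: np.ndarray,
                 staleness: int, alpha0: float = 0.5) -> np.ndarray:
    """Mix one late-arriving model into the global model.

    The mixing factor shrinks with staleness (rounds since the model was
    dispatched), so outdated updates have less influence. alpha0 is a
    hypothetical base rate.
    """
    alpha = alpha0 / (1.0 + staleness)
    return (1.0 - alpha) * global_model + alpha * incoming_model

regional = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
print(weighted_average(regional, data_sizes=[100, 300]))   # [2.5 2.5]
print(async_update(np.zeros(2), np.ones(2), staleness=1))  # [0.25 0.25]
```

Down-weighting stale or small contributions is one simple way to keep slow or briefly connected vehicles from dominating the global model.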
    • CHEN Yu-zhong, CHEN You-kun, LIN Min-hu, NIU Yu-zhen
      Vol. 52, Issue 7, Pages: 2242-2256(2024) DOI: 10.12263/DZXB.20230607
      Abstract: Unlike natural images captured from real-world scenes, screen content images (SCIs) are synthetic images typically composed of various multimedia contents, such as computer-generated text, graphics and animations. Existing SCI quality assessment methods usually fail to fully consider the impact of image edges and global context on the perceived quality of SCIs. To address these issues, this paper proposes a no-reference SCI quality assessment model based on edge assistance and a multi-scale Transformer. Firstly, an edge structure map containing the high-frequency information of a distorted SCI is constructed using the Laplacian of Gaussian operator. Then a convolutional neural network (CNN) extracts and fuses multi-scale features from the distorted SCI and its edge structure map, providing additional edge information for model training. In addition, a Transformer-based multi-scale feature encoding module is proposed to model the global context of image and edge features at different scales on top of the local features obtained by the CNN. Experimental results show that the proposed model outperforms state-of-the-art no-reference and full-reference SCI quality assessment methods and achieves higher consistency with subjective visual perception.
      Keywords: no-reference screen content image quality assessment;Laplacian of Gaussian;convolutional neural network;Transformer;multi-scale features
      Updated: 2025-12-24
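The edge structure map described above is built with a Laplacian of Gaussian (LoG) operator. This NumPy-only sketch builds a zero-mean LoG kernel and filters a grayscale image with it. The kernel size, sigma, and the use of the absolute response as the edge map are assumptions, since the abstract does not state them.

```python
import numpy as np

def log_kernel(sigma: float = 1.0, size: int = 9) -> np.ndarray:
    """Laplacian-of-Gaussian kernel, shifted to zero mean so flat regions give 0."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x ** 2 + y ** 2) / (2 * sigma ** 2)
    k = -(1 - r2) * np.exp(-r2) / (np.pi * sigma ** 4)
    return k - k.mean()

def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size 2D correlation with edge padding (LoG is symmetric, so this equals convolution)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def edge_structure_map(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """High-frequency edge map: magnitude of the LoG response."""
    return np.abs(filter2d(img.astype(float), log_kernel(sigma)))

img = np.zeros((32, 32))
img[:, 16:] = 255.0  # a vertical step edge, like the boundary of a text stroke
edges = edge_structure_map(img)
print(edges[:, :8].max(), edges[:, 14:18].max() > 0)
```

Flat regions respond with (numerically) zero and the response concentrates around the step, which is the high-frequency structure the quality model is meant to receive alongside the image.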
    • JING Xiao-wei, JING Jian-wei, YAN Li-ping, LIU Chang-jun
      Vol. 52, Issue 7, Pages: 2257-2261(2024) DOI: 10.12263/DZXB.20230932
      Abstract: With the rapid development of aerospace technology, wireless power transmission (WPT) in closed cavities has attracted extensive attention. A frequency-controlled WPT method is proposed that enables controllable, high-efficiency wireless charging of multi-directional sensors in electrically large closed cavities (10³×λ³). The electric field distribution in an electrically large cavity is very sensitive to frequency, so the field distribution in the closed cavity can be controlled by changing the frequency. Experimental results show that the highest WPT efficiency in the S-band is 96.6%. The measured rectification efficiency of the designed broadband rectifier circuit reaches 80%, and the bandwidth over which the rectification efficiency exceeds 50% is 1.65 GHz. The working states of dual receivers can be controlled within 2.401~2.495 GHz, showing application prospects for wireless power supply of sensors in closed spaces such as aerospace vehicles.
      Keywords: closed electrically large cavity;frequency control;high efficiency;S-band;microwave wireless power transmission;rectifying circuit
      Updated: 2025-12-24
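To give a sense of scale for an electrically large cavity of about 10³λ³, this short calculation converts that electrical volume into a physical volume at a chosen frequency. The 2.45 GHz value is an assumption picked from inside the 2.401~2.495 GHz band mentioned above.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s

def physical_volume_m3(freq_hz: float, volume_in_wavelengths_cubed: float = 1e3) -> float:
    """Physical volume of a cavity whose size is given in units of lambda^3."""
    wavelength = C0 / freq_hz
    return volume_in_wavelengths_cubed * wavelength ** 3

lam = C0 / 2.45e9
print(f"lambda = {lam * 100:.2f} cm, volume = {physical_volume_m3(2.45e9):.2f} m^3")
# lambda = 12.24 cm, volume = 1.83 m^3
```

At S-band, a cavity of roughly two cubic meters already supports many resonant modes, which is why small frequency changes can reshape the field distribution so strongly.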
    • HUANG Jun-yang, CHEN Hong-hui, WANG Jia-bao, CHEN Ping-ping, LIN Zhi-jian
      Vol. 52, Issue 7, Pages: 2262-2270(2024) DOI: 10.12263/DZXB.20240090
      Abstract: Scene text image super-resolution (STISR) aims to enhance the resolution and legibility of text in low-resolution images. For spatially deformed or low-resolution text images, the lack of detail in text regions and the difficulty of aligning semantic cues and visual features with character positions make effective text recognition difficult. To address these challenges, this paper proposes a scene text image super-resolution method that perceives multi-domain character distance (PMDC), which improves the text regions and edge texture details of images. Firstly, visual and semantic features are extracted using an asymmetric convolution module together with a semantic prior module. Then an enhanced position encoding is obtained by a character distance perception module, which perceives changes in distance and semantic similarity between characters. Finally, the guiding cues and visual features are combined to reconstruct pixels and generate a super-resolved text image. Compared with TATT, experimental results on the public TextZoom dataset show a 0.11 dB increase in peak signal-to-noise ratio (PSNR), effectively improving the clarity of text regions and edge texture details. Additionally, recognition accuracy improves by 1.4%, effectively enhancing the readability of text images.
      Keywords: computer vision;scene text images;super-resolution;attention mechanism;feature information association
      Updated: 2025-12-24
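The 0.11 dB gain above is measured in peak signal-to-noise ratio. For reference, this sketch computes PSNR for 8-bit images as 10·log10(MAX²/MSE). How TextZoom evaluations handle color channels or image sizes is not specified here, so this is the plain single-array form.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

ref = np.full((16, 64), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[::2] += 1          # every other row off by one gray level -> MSE = 0.5
print(round(psnr(ref, noisy), 2))
```

Because PSNR is logarithmic in MSE, a 0.11 dB gain corresponds to an MSE ratio of 10^(-0.011) ≈ 0.975, i.e. roughly 2.5% lower mean squared error.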
    • CHONG Yi-ning, LI Jue, QIAO Ming
      Vol. 52, Issue 7, Pages: 2271-2278(2024) DOI: 10.12263/DZXB.20230845
      Abstract: In this paper, a high-voltage super junction power MOS (Metal Oxide Semiconductor) device is designed using a semi-super-junction structure. The super junction cell structure is designed on the Sentaurus TCAD (Technology Computer Aided Design) simulation platform, the breakdown voltage and on-resistance of the device are optimized, and its parasitic capacitance characteristics are then explored. Finally, based on a multiple-epitaxy process, a high-voltage super junction power MOS device is independently designed with a simulated breakdown voltage of 1 658 V, a process-simulated breakdown voltage of 1 598 V and a specific on-resistance of 303 mΩ·cm², about 50% lower than that of devices with the same voltage rating. The influence of four main structural parameters, namely the doping concentration and thickness of the super junction and of the voltage support layer, on the device's parasitic capacitance is also explored.
      Keywords: super junction VDMOS;cell;breakdown voltage;specific on-resistance;parasitic capacitance
      Updated: 2025-12-24
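As context for the 303 mΩ·cm² figure, super junction devices are often compared with the one-dimensional silicon limit for unipolar devices, commonly quoted (Baliga) as R_on,sp ≈ 5.93×10⁻⁹·BV^2.5 Ω·cm². This sketch evaluates that textbook approximation at the simulated breakdown voltage. It is a reference point only, not the same-rating comparison device used by the authors.

```python
def silicon_limit_ron_sp(bv_volts: float) -> float:
    """Ideal 1D silicon unipolar limit for specific on-resistance, in ohm*cm^2."""
    return 5.93e-9 * bv_volts ** 2.5

bv = 1658.0                                  # simulated breakdown voltage from the abstract
limit_mohm = silicon_limit_ron_sp(bv) * 1e3  # convert to mOhm*cm^2
reported_mohm = 303.0
print(f"Si limit at {bv:.0f} V: {limit_mohm:.0f} mOhm*cm^2; "
      f"reported/limit = {reported_mohm / limit_mohm:.2f}")
```

Going below this one-dimensional limit is exactly what charge balance in a super junction drift region is designed to allow.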
    • LIU Wen-xi, ZHANG Jia-bang, LI Yue-zhou, LAI Yu, NIU Yu-zhen
      Vol. 52, Issue 7, Pages: 2279-2290(2024) DOI: 10.12263/DZXB.20230668
      Abstract: Camouflaged object detection aims to detect highly concealed objects hidden in complex environments and has important application value in many fields such as medicine and agriculture. Existing methods that incorporate boundary priors overemphasize boundary regions and lack the ability to represent the interior of camouflaged objects, leading to inaccurate detection of their internal regions. At the same time, existing methods do not effectively mine the foreground features of camouflaged objects, so background regions are mistakenly detected as camouflaged objects. To address these issues, this paper proposes a camouflaged object detection method based on boundary feature fusion and foreground guidance, consisting of feature extraction, boundary feature fusion, backbone feature enhancement and prediction stages. In the boundary feature fusion stage, boundary features are first obtained by a boundary feature extraction module and a boundary mask is predicted. A boundary feature fusion module then fuses the boundary features and boundary mask with the lowest-level backbone features, enhancing the boundary position and internal region features of camouflaged objects. In addition, a foreground guidance module enhances the backbone features using the predicted camouflaged object mask: the mask predicted from the previous layer's features serves as foreground attention for the current layer's features, and spatial interaction is performed on the features to strengthen the network's ability to recognize spatial relationships, enabling it to focus on fine and complete camouflaged object regions. Extensive experiments on four widely used benchmark datasets show that the proposed method outperforms the 19 mainstream methods compared and has stronger robustness and generalization ability for camouflaged object detection tasks.
      Keywords: camouflaged object detection;boundary prior;foreground guidance;boundary features;boundary mask;spatial interaction
      Updated: 2025-12-24
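In the foreground guidance module, the mask predicted from the previous (coarser) layer acts as foreground attention on the current layer's features. The abstract does not give the exact operator, so this sketch assumes a simple form: upsample the previous mask logits by nearest-neighbor repetition, squash them with a sigmoid, and reweight the features with a residual so that foreground features are amplified relative to background ones rather than background being zeroed. Shapes and the residual form are hypothetical.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def foreground_guidance(features: np.ndarray, prev_mask_logits: np.ndarray) -> np.ndarray:
    """Reweight (C, H, W) features with a coarser (h, w) mask from the previous layer."""
    c, h, w = features.shape
    sh, sw = h // prev_mask_logits.shape[0], w // prev_mask_logits.shape[1]
    mask = np.repeat(np.repeat(prev_mask_logits, sh, axis=0), sw, axis=1)  # nearest upsampling
    attn = sigmoid(mask)[None, :, :]   # (1, H, W) foreground probability, shared by all channels
    return features * (1.0 + attn)     # residual reweighting: foreground up to 2x, background ~1x

feats = np.ones((8, 16, 16))
prev_logits = np.full((8, 8), -10.0)
prev_logits[2:4, 2:4] = 10.0           # coarse foreground blob
out = foreground_guidance(feats, prev_logits)
print(out[0, 4, 4], out[0, 0, 0])
```

The residual keeps gradients flowing through background positions, which matters early in training when the previous-layer mask is still unreliable.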
    • WANG Xin-rui, JI Yuan, ZHANG Yin, CHEN Hong-gang, MU Ting-zhou
      Vol. 52, Issue 7, Pages: 2291-2299(2024) DOI: 10.12263/DZXB.20230049
      Abstract: Based on super pixel technology, a digital driving strategy for color silicon-based OLED (Organic Light Emitting Diode) micro-displays is proposed. By reusing information from adjacent pixels, a single pixel can contribute to imaging several adjacent pixels, greatly improving the display resolution. A digital driving circuit for color OLEDoS (Organic Light Emitting Diode on Silicon) micro-displays is designed. At a 120 Hz frame rate, it achieves 256 gray levels and 4K display resolution while requiring only 50% of the circuit area and per-second data transmission of the traditional driving mode. Test results show that the average OLED pixel current achieved by the driving circuit ranges from 13.1 pA~3.74 nA, which meets the requirements of near-eye micro-display.
      Keywords: organic light emitting diode on silicon;micro-display;pixel driving circuit;super pixel strategy;field programmable gate array
      Updated: 2025-12-24
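Digital driving of OLEDoS pixels commonly produces gray levels by pulse-width modulation, splitting each frame into binary-weighted subframes. The abstract does not describe its exact scheme, so this sketch assumes plain 8-bit binary weighting at the stated 120 Hz frame rate and 256 gray levels, ignoring blanking and row-scan overheads.

```python
FRAME_RATE_HZ = 120
BITS = 8  # 256 gray levels

def subframe_durations_us(frame_rate_hz: int = FRAME_RATE_HZ, bits: int = BITS) -> list[float]:
    """Binary-weighted subframe lengths (LSB first) that sum to one frame period."""
    frame_us = 1e6 / frame_rate_hz
    unit = frame_us / (2 ** bits - 1)
    return [unit * 2 ** b for b in range(bits)]

def on_time_us(gray: int, durations: list[float]) -> float:
    """Total emission time in a frame: sum the subframes whose bit is set in the gray code."""
    return sum(d for b, d in enumerate(durations) if (gray >> b) & 1)

d = subframe_durations_us()
print(f"LSB subframe = {d[0]:.2f} us, frame = {sum(d):.1f} us")
print(on_time_us(128, d) / sum(d))  # fraction of the frame the pixel is lit
```

The shortest subframe is only about 33 μs at 120 Hz, which hints at why cutting per-second data transmission, as the super pixel strategy does, eases the timing budget.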
    • HUANG He, MA Rui-hua
      Vol. 52, Issue 7, Pages: 2300-2306(2024) DOI: 10.12263/DZXB.20240018
      Abstract: In this paper, a wideband, dual-polarized antenna with an extremely low profile is developed for base station applications. The antenna evolves from two crossed fan-shaped dipoles. Adding annular branches and metallized vias at the ends of the dipoles increases the port input impedance when the antenna has a lower height. In addition, the flare angle of the fan-shaped arms is increased so that a second resonance is generated, expanding the bandwidth. The dual-polarized antenna provides a bandwidth of 22% over 2.17~2.7 GHz. Because the two dipoles are highly symmetrical about the geometric center, isolation and cross-polarization discrimination are high across the operating band: the simulated isolation reaches 51 dB, and the simulated cross-polarization discrimination at 0° reaches 48 dB. In addition, the simulated peak gain of the antenna is as high as 9.6 dBi. With its high isolation, high cross-polarization discrimination and high gain, the antenna has good application prospects in base station systems.
      Keywords: base station;dual-polarized antenna;dipole antenna;input impedance;low profile antenna;wideband antenna
      Updated: 2025-12-24
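The 22% bandwidth figure above follows from the usual fractional-bandwidth definition: the absolute bandwidth divided by the center frequency. A quick check with the stated band edges:

```python
def fractional_bandwidth_pct(f_low_ghz: float, f_high_ghz: float) -> float:
    """Fractional bandwidth relative to the arithmetic center frequency, in percent."""
    center = (f_low_ghz + f_high_ghz) / 2
    return 100 * (f_high_ghz - f_low_ghz) / center

print(f"{fractional_bandwidth_pct(2.17, 2.7):.1f}%")  # 21.8%
```

The 0.53 GHz span around a 2.435 GHz center gives about 21.8%, which rounds to the quoted 22%.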
    • A Deep Semantic Mining Model Based on Aspect-Level Sentiment Analysis

      ZHANG Huan-xiang, PENG Jun-jie
      Vol. 52, Issue 7, Pages: 2307-2319(2024) DOI: 10.12263/DZXB.20230037
      Abstract: Aspect-level sentiment analysis is a fine-grained sentiment classification task with wide application prospects, and it has therefore attracted extensive attention and research. In recent years in particular, graph neural networks based on dependency trees and attention-based network models have made great progress. However, these studies are limited by factors such as the difficulty of dependency parsing and the complex expression of online reviews. To overcome these challenges, this paper proposes a deep semantic mining model (DSMM) that considers both syntactic and contextual semantics. Specifically, to mine the deep semantics hidden behind the syntax, the model uses parallel graph convolution and multi-head self-attention to extract rich semantics. To make full use of the intrinsic correlation between syntactic and contextual semantics, a relevance attention mechanism is used to capture this correlation, and an adaptive aspect routing mechanism is used to obtain the sentiment semantics of aspects effectively. Moreover, a semantic location embedding based on the dependency tree is introduced to further strengthen aspect-opinion word correlation. Experimental results on three public datasets show that the model not only mines the semantic features of sentences from different semantic spaces but also effectively uses syntactic features to strengthen sentence semantic representations in sentiment analysis of complex sentences, and that its classification accuracy and generalization ability exceed those of related work.
      Keywords: aspect level sentiment analysis;graph convolutional neural network;multi-head attention mechanism;relevance attention;syntax;contextual semantics
      Updated: 2025-12-24
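The parallel graph convolution branch above operates over the dependency tree of each sentence. The abstract does not give the layer definition, so this sketch assumes a form common in aspect-level sentiment models, H' = ReLU(D̃⁻¹(A + I)HW), with an illustrative dependency adjacency and random weights.

```python
import numpy as np

def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution over a dependency graph with self-loops and row normalization."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)          # node degrees (including self-loop)
    return np.maximum(0.0, (a_hat / deg) @ h @ w)   # ReLU(D^-1 (A+I) H W)

# Toy sentence "food was great": 'great' as head of 'food' and 'was' (illustrative arcs, undirected)
adj = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
h = rng.standard_normal((3, 8))    # word representations (e.g. from a sentence encoder)
w = rng.standard_normal((8, 4))    # hypothetical layer weights
out = gcn_layer(adj, h, w)
print(out.shape)  # (3, 4)
```

After one layer, the aspect word "food" already mixes in the representation of its opinion word "great", which is the syntactic shortcut these models rely on.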
    • CAI Qi, ZHU Hao-shen, ZENG Ding-yuan, WANG Xi-yao, XUE Quan, CHE Wen-quan
      Vol. 52, Issue 7, Pages: 2320-2330(2024) DOI: 10.12263/DZXB.20240026
      Abstract: This work presents a high-efficiency on-chip harmonic-tuned power amplifier (PA) monolithic microwave integrated circuit (MMIC) for millimeter-wave applications. The efficiency of an MMIC PA at high frequencies can be improved by accurate harmonic tuning and proper harmonic terminations at both the input and output ports of the transistor. The proposed matching network controls the output second and third harmonic impedances simultaneously. In addition, the input second harmonic impedance is tuned to the optimum region to achieve high efficiency. The proposed PA topology and design method are verified by simulation and measurement in a 0.15 μm GaN-on-SiC (Gallium Nitride on Silicon Carbide) process. The fabricated PA has a measured bandwidth of 21.4 to 23 GHz, over which the PAE (Power Added Efficiency) exceeds 39.2% and the output power exceeds 33 dBm. The maximum measured drain efficiency is 63.7% at an output power of 34.1 dBm at 22.2 GHz, with a corresponding PAE of 50.2%. Simulated and measured results agree closely. The total size of the PA is 1.87 mm², giving a power density of 1.31 W/mm². Compared with other reported high-efficiency PAs, the proposed PA offers high efficiency and power density.
      Keywords: drain efficiency;GaN PA;second and third harmonic tuned;input harmonic tuned
      Updated: 2025-12-24
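The drain efficiency and PAE quoted above are related by simple definitions: DE = P_out/P_DC and PAE = (P_out − P_in)/P_DC. This sketch converts dBm to watts and evaluates the DC and input power implied by the 34.1 dBm, 63.7% DE, 50.2% PAE operating point. The derived values are arithmetic from those three numbers only, not reported measurements.

```python
def dbm_to_w(p_dbm: float) -> float:
    """Convert power from dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000

def drain_efficiency(p_out_w: float, p_dc_w: float) -> float:
    """Fraction of DC power converted to RF output power."""
    return p_out_w / p_dc_w

def pae(p_out_w: float, p_in_w: float, p_dc_w: float) -> float:
    """Power-added efficiency: RF power added by the amplifier per unit DC power."""
    return (p_out_w - p_in_w) / p_dc_w

p_out = dbm_to_w(34.1)             # about 2.57 W
p_dc = p_out / 0.637               # DC power implied by 63.7% drain efficiency
p_in = p_out - 0.502 * p_dc        # input drive implied by 50.2% PAE
print(f"P_out = {p_out:.2f} W, P_DC = {p_dc:.2f} W, P_in = {p_in:.2f} W")
```

The gap between drain efficiency and PAE reflects the input drive power, so a PA with modest gain shows a noticeably lower PAE than DE.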
    • LI Si-cong, WANG Jian, SONG Ya-fei, WANG Shuo
      Vol. 52, Issue 7, Pages: 2331-2340(2024) DOI: 10.12263/DZXB.20240162
      Abstract: With the increasing severity of cyber threats, the detection and classification of malicious code has become particularly critical. Traditional analysis methods rely on manual feature extraction, which is time-consuming and struggles to keep up with the rapid mutation of malicious code. Deep learning techniques, in contrast, show great potential for malicious code classification, but model complexity and resource consumption remain challenges for practical deployment. In this study, we propose TriCh-LKRepNet (Triple-Channel Large Kernel Reparameterisation Network), which focuses on lightweight design and aims to maintain detection performance while reducing computation and memory requirements. The proposed three-channel mapping technique effectively converts the multi-dimensional information of malicious code into image channels, enhancing feature discrimination. An efficient deep learning architecture is designed by combining the advantages of convolutional neural networks (CNN) and Transformers, and its connection paths are optimized by reparameterization to reduce memory consumption and improve runtime efficiency. In addition, linear training-time over-parameterization and large convolutional kernels further reduce the number of parameters and the computational burden of the model. Experiments demonstrate that TriCh-LKRepNet improves malicious code classification accuracy while keeping the model lightweight, showing better performance and wider application potential than existing techniques and providing an effective solution, especially in resource-constrained environments that require real-time detection.
      Keywords: malicious code classification;malicious code visualization;structural reparameterisation;large convolutional kernel;assembly information;semantic relations
      Updated: 2025-12-24
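      The abstract does not detail TriCh-LKRepNet's branches, but the general idea behind structural reparameterization can be sketched: because convolution is linear, parallel training-time branches can be folded into a single kernel for inference. The minimal 1D sketch below, with hypothetical names (`conv1d_same`, `merge_branches`) and toy values, zero-pads a 1-tap branch to 3 taps and adds it to a 3-tap branch; the batch-normalization folding that practical implementations also perform is omitted.

```python
def conv1d_same(x, k):
    """1D cross-correlation with zero padding so the output has len(x) entries (odd-length k)."""
    r = len(k) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(k[j] * padded[i + j] for j in range(len(k))) for i in range(len(x))]

def merge_branches(k3, b3, k1, b1):
    """Fold a parallel 1-tap branch into a 3-tap branch: zero-pad the 1-tap kernel to
    3 taps (centered) and add kernels and biases; exact because convolution is linear."""
    k1_padded = [0.0, k1[0], 0.0]
    return [a + c for a, c in zip(k3, k1_padded)], b3 + b1

x = [1.0, 2.0, -1.0, 0.5, 3.0]
k3, b3 = [0.2, -0.5, 0.1], 0.3     # toy 3-tap branch
k1, b1 = [0.7], -0.1               # toy 1-tap branch

# Training-time form: two parallel branches whose outputs are summed.
multi = [a + b3 + c + b1 for a, c in zip(conv1d_same(x, k3), conv1d_same(x, k1))]
# Inference-time form: one merged kernel and bias.
k_m, b_m = merge_branches(k3, b3, k1, b1)
single = [v + b_m for v in conv1d_same(x, k_m)]
```

      Both forms produce the same outputs up to floating-point rounding, but only the merged kernel has to be evaluated at inference time, which is the kind of saving reparameterization targets.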
    • A Multi-Modal Medical Image Analysis Algorithm Based on Text Guidance

      FAN Lin, GONG Xun, ZHENG Cen-yang
      Vol. 52, Issue 7, Pages: 2341-2355(2024) DOI: 10.12263/DZXB.20231135
      Abstract: Combining gastroscopic ultrasound and white light endoscopy can improve the accuracy of identifying gastrointestinal stromal tumors (GISTs). However, existing multi-modal methods often focus solely on image features and overlook the semantic information contained in diagnostic text, which is crucial for precisely understanding and diagnosing medical images. To address this issue, we propose a novel text-guided multi-modal medical image analysis framework (TMM-Net). TMM-Net uses diagnostic text to guide, over multiple stages, the extraction of key diagnostic features from images, and then promotes interaction among the multi-modal features through cross-modal attention mechanisms. Notably, TMM-Net simulates the clinical diagnostic process by predicting lesion attributes, which enhances interpretability. Validation experiments were conducted on a dataset of 10 025 multi-modal data pairs from two centers. The results show that the proposed method improves accuracy by 7.7% over the current state-of-the-art GIST diagnostic method and achieves the highest AUC (Area Under the Curve) of 0.927, and its interpretability may better suit clinical needs.
      Keywords: multi-modal fusion; model interpretability; image-text matching; gastrointestinal stromal tumor; gastroscopic ultrasound; white light endoscopy
      Updated: 2025-12-24
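      The abstract states that TMM-Net promotes interaction between modalities through cross-modal attention but does not give its exact form. A minimal sketch of standard scaled dot-product cross-attention follows, assuming queries from one modality (hypothetical text-token features) and keys/values from another (hypothetical image-region features); the learned query/key/value projections are omitted, and this is not TMM-Net's implementation.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query (one modality) takes a softmax-weighted
    average of the values (other modality), weighted by query-key similarity."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d) for key in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out

# Hypothetical toy features: 2 text-token queries attending over 3 image-region keys/values.
text_q = [[1.0, 0.0], [0.0, 1.0]]
img_k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
img_v = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
fused = cross_attention(text_q, img_k, img_v)
```

      Each text token ends up with an image summary dominated by the region whose key it most resembles, which is the mechanism that lets one modality select relevant evidence from the other.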
    • JIANG Lin, LI Yun-fei, LEI Bin, TANG Bo, LIU Qi, GUO Yu-fei
      Vol. 52, Issue 7, Pages: 2356-2368(2024) DOI: 10.12263/DZXB.20230358
      Abstract: To address the failure of the original AMCL (Adaptive Monte Carlo Localization) algorithm at kidnapping detection and re-localization in similar environments, an improved AMCL algorithm based on the semantic dimension chain of the corner family is proposed. First, the robot's multi-sensor information is fused, and a two-dimensional grid map is built with the Gmapping algorithm. Second, bounding boxes and category information for objects in the indoor environment are obtained with YOLOv5, and a semantic map is built incrementally by combining the GrabCut algorithm with a Bayesian method. Corners are classified by their convexity or concavity and by their azimuth relative to the robot, and the category and position relationships between corners and indoor objects in the semantic map are fully exploited to construct the semantic dimension chain of the corner family and the corresponding retrieval table. During localization, global pre-localization is performed using the semantic dimension chain of the corner family, kidnapping is detected by the proposed kidnapping detection mechanism, and once a kidnapping event is detected, localization self-recovery is achieved with the improved AMCL algorithm. Finally, the effectiveness of the method is verified by kidnapping experiments in real environments. The experiments show that the proposed method improves global localization accuracy, global localization rate, kidnapping detection accuracy, and localization self-recovery success rate by 42%, 214%, 88%, and 72%, respectively, in similar environments; by 44%, 152%, 12%, and 92%, respectively, in non-similar environments; and by 36%, 426%, 26%, and 68%, respectively, in a long-corridor environment.
      Keywords: kidnapping detection; semantic dimension chain of the corner family; Bayesian method; global pre-localization; localization self-recovery
      Updated: 2025-12-24
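      For context, the recovery logic in standard AMCL follows the augmented MCL scheme described in Thrun, Burgard, and Fox's Probabilistic Robotics: it tracks short- and long-term averages of the particle weights and injects random particles when the short-term average drops below the long-term one. A minimal sketch of that baseline mechanism follows; it is not the paper's corner-family method, and the class name and the smoothing rates `alpha_slow` and `alpha_fast` are illustrative assumptions.

```python
class KidnapMonitor:
    """Tracks long-term (w_slow) and short-term (w_fast) averages of particle weights."""

    def __init__(self, alpha_slow=0.001, alpha_fast=0.1):
        assert 0.0 < alpha_slow < alpha_fast, "the slow average must adapt more slowly"
        self.alpha_slow, self.alpha_fast = alpha_slow, alpha_fast
        self.w_slow = self.w_fast = None

    def update(self, weights):
        """Feed one filter step's unnormalized particle weights (measurement likelihoods).
        Returns the probability of replacing a resampled particle with a random pose."""
        w_avg = sum(weights) / len(weights)
        if self.w_slow is None:
            # The textbook starts both averages at 0; seeding them with the first
            # average here avoids a division by zero below.
            self.w_slow = self.w_fast = w_avg
        else:
            self.w_slow += self.alpha_slow * (w_avg - self.w_slow)
            self.w_fast += self.alpha_fast * (w_avg - self.w_fast)
        # A short-term average well below the long-term one suggests a kidnapping.
        return max(0.0, 1.0 - self.w_fast / self.w_slow)

mon = KidnapMonitor()
for _ in range(50):            # well localized: measurements match the map
    p_ok = mon.update([0.8] * 100)
for _ in range(10):            # after a kidnapping: likelihoods collapse
    p_kidnapped = mon.update([0.05] * 100)
```

      While localization is consistent the injection probability stays at zero; after the likelihood collapse it rises above one half within a few steps. Because this signal depends only on how well scans match the map, it can be slow or ambiguous in visually similar places, which is the weakness the abstract targets.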
    • LI Guang-li, YE Yi-yuan, WU Guang-ting, LI Chuan-xiu, LÜ Jing-qin, ZHANG Hong-bin
      Vol. 52, Issue 7, Pages: 2369-2381(2024) DOI: 10.12263/DZXB.20230305
      Abstract: Breast cancer is the most common cancer in women. Using a single neural network for breast cancer pathological image classification has a drawback: a convolutional neural network (CNN) lacks the ability to extract global context information, whereas a Transformer lacks the ability to depict local lesion details. To alleviate this problem, a novel model named multi-view Transformer coding and online fusion mutual learning (MVT-OFML) is proposed for breast cancer pathological image classification. First, ResNet-50 is employed to extract local image features. Then, a new multi-view Transformer (MVT) coding module is designed to capture global context information. Finally, a novel online fusion mutual learning (OFML) framework based on the logits and intermediate feature layers is designed to implement bi-directional knowledge transfer between ResNet-50 and the MVT coding module, so that the two networks complement each other in classifying breast cancer pathological images. Experiments on BreakHis and BACH show that, compared with the best baseline, accuracy improves by 0.90% and 2.26%, respectively, and average F1 score improves by 4.75% and 3.21%, respectively.
      Keywords: breast cancer; pathological image classification; multi-view Transformer; convolutional neural network; online fusion mutual learning
      Updated: 2025-12-24
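      The abstract describes bi-directional knowledge transfer between ResNet-50 and the MVT module at the logit and intermediate-feature levels. A minimal sketch of the logit part, in the style of generic deep mutual learning (each peer adds a KL term pulling its prediction toward the other's), is given below; the function names, the temperature parameter, and the toy logits are assumptions, and the feature-level transfer is not shown.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) for discrete distributions, with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_losses(logits_a, logits_b, label, temperature=1.0):
    """Per-sample losses for two peer networks: cross-entropy on the true label
    plus KL toward the peer's (softened) prediction."""
    p_a, p_b = softmax(logits_a, temperature), softmax(logits_b, temperature)
    ce_a = -math.log(softmax(logits_a)[label])
    ce_b = -math.log(softmax(logits_b)[label])
    # During training the peer's distribution is treated as a fixed target (no gradient through it).
    return ce_a + kl_div(p_b, p_a), ce_b + kl_div(p_a, p_b)

cnn_logits = [2.0, 0.5, -1.0]           # e.g. a CNN branch (local features)
transformer_logits = [1.0, 1.5, -0.5]   # e.g. a Transformer branch (global context)
loss_cnn, loss_vit = mutual_losses(cnn_logits, transformer_logits, label=0)
```

      Training both peers at once with these losses is what makes the transfer "online": neither network is a frozen teacher, and each is regularized toward the other's current prediction.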
    • KANG Hai-yan, JI Shan-shan
      Vol. 52, Issue 7, Pages: 2382-2392(2024) DOI: 10.12263/DZXB.20231158
      Abstract: In today's rapidly evolving landscape of distributed machine learning, conventional data incentive solutions often fall short because they rely on simplistic single-server architectures. Moreover, as computing environments become increasingly complex, particularly in heterogeneous wireless networks, these traditional approaches struggle to meet dynamic computational demands and suffer from unbalanced resource allocation and exorbitant communication costs. In response, this paper proposes a hierarchical Stackelberg game swarm learning incentive method for wireless edge networks (HSISL), which introduces the Stackelberg game mechanism into swarm learning. Based on the performance differences among computing terminals, the cloud aggregation platform, edge cluster nodes, and edge computing nodes play dynamic games and jointly formulate personalized hierarchical resource allocation strategies through a fair, dual-pricing incentive process, which effectively guides and accelerates the training of edge computing models. Theoretical and experimental analyses show that HSISL obtains the optimal incentive Nash equilibrium solution for model training. Compared with other incentive methods, HSISL effectively improves the fairness and training efficiency of the model, and its accuracy on the MNIST dataset reaches 96.06%.
      Keywords: data sharing; swarm learning; Stackelberg game; dynamic games; wireless network communication; incentives
      Updated: 2025-12-24
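      HSISL's hierarchical, dual-pricing game is not specified in the abstract. To illustrate only the basic Stackelberg structure (a leader sets a price, followers best-respond, and the leader optimizes while anticipating those responses), the sketch below uses hypothetical quadratic follower costs and a logarithmic leader valuation, solved by backward induction with a grid search. The utilities and names are assumptions, not the paper's model.

```python
import math

def follower_best_response(price, cost):
    """Follower maximizes price * x - cost * x**2, giving x* = price / (2 * cost)."""
    return price / (2.0 * cost)

def leader_utility(price, costs, value):
    """Leader's hypothetical utility: diminishing-returns value of total contribution minus payments."""
    total = sum(follower_best_response(price, c) for c in costs)
    return value * math.log(1.0 + total) - price * total

def stackelberg_price(costs, value, p_max=10.0, steps=10_000):
    """Backward induction by grid search: the leader evaluates each candidate price
    using the followers' best responses and keeps the most profitable one."""
    best_p, best_u = 0.0, leader_utility(0.0, costs, value)
    for k in range(1, steps + 1):
        p = p_max * k / steps
        u = leader_utility(p, costs, value)
        if u > best_u:
            best_p, best_u = p, u
    return best_p, [follower_best_response(best_p, c) for c in costs]

price, contributions = stackelberg_price(costs=[1.0, 2.0, 4.0], value=10.0)
```

      With costs 1, 2, and 4 and value 10, the leader's first-order condition reduces to 1.75p^2 + 2p - 10 = 0, so p is about 1.886, which the grid search recovers to within its step size; cheaper followers contribute more at that price.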
    • Image Classification Network of Gating Mechanism

      JIANG Wen-tao, GAO Yuan, YUAN Heng, LIU Wan-jun
      Vol. 52, Issue 7, Pages: 2393-2406(2024) DOI: 10.12263/DZXB.20240104
      Abstract: To extract more expressive and discriminative key features, reduce the loss of key features during network transmission, and improve the image classification ability of neural networks, a new image classification network with a gating mechanism (GMNet) is proposed. First, shallow features are extracted with gated convolution, in which the gating mechanism selectively applies the convolution operation, improving the network's ability to extract key features from the original image. Second, an interpolation gated convolution (IGC) module is designed, which combines Lanczos interpolation with gated convolution to enhance shallow features while extracting more discriminative features, improving the non-linear expressive ability of features. Then, a large kernel gated attention mechanism (LGAM) module is designed, which combines large kernel attention with gated convolution to selectively enhance and fuse features and to increase the contribution of key-region features. Finally, the LGAM module is embedded into the residual branch so that the model learns the features and contextual information of the input data more effectively, reducing the loss of key features during network information transmission and improving the network's classification ability. The method achieves classification accuracies of 97.05%, 83.68%, 97.68%, 90.60%, and 83.05% on the CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof image datasets, respectively, an average improvement of 3.26%, 7.08%, 3.44%, 2.65%, and 5.02% over current advanced methods. Compared with existing mainstream network models, the proposed gated-mechanism image classification network enhances the non-linear expressive ability of features, extracts more expressive and discriminative key features, reduces the loss of key features, increases the contribution of key-region features, and effectively improves the image classification ability of neural networks.
      Keywords: image classification; gating mechanism; gated convolution; interpolation gated convolution; large kernel gated attention; residual network
      Updated: 2025-12-24
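      The abstract combines gated convolution with Lanczos interpolation without giving formulas. The two generic building blocks can be sketched as follows: a gated convolution multiplies a feature convolution by a sigmoid gate computed by a second convolution, and the Lanczos kernel is sinc(t)·sinc(t/a) for |t| < a. The names, kernel sizes, and identity feature activation are assumptions (implementations often apply an activation such as ELU to the feature branch), and how GMNet's IGC module wires the two together is not shown.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def conv1d_valid(x, k):
    """1D cross-correlation without padding."""
    return [sum(k[j] * x[i + j] for j in range(len(k))) for i in range(len(x) - len(k) + 1)]

def gated_conv1d(x, k_feat, k_gate):
    """Gated convolution: the feature response is scaled by a soft gate in (0, 1) computed
    by a second convolution, so the layer can suppress positions it deems uninformative."""
    feats = conv1d_valid(x, k_feat)                 # identity activation on the feature branch
    gates = [sigmoid(g) for g in conv1d_valid(x, k_gate)]
    return [f * g for f, g in zip(feats, gates)]

def lanczos(t, a=3):
    """Lanczos kernel: sinc(t) * sinc(t / a) for |t| < a, else 0 (normalized sinc)."""
    if t == 0:
        return 1.0
    if abs(t) >= a:
        return 0.0
    pt = math.pi * t
    return a * math.sin(pt) * math.sin(pt / a) / (pt * pt)

x = [1.0, 2.0, 3.0, 4.0]
open_gate = gated_conv1d(x, k_feat=[1.0, 1.0], k_gate=[10.0, 10.0])      # gates close to 1
closed_gate = gated_conv1d(x, k_feat=[1.0, 1.0], k_gate=[-10.0, -10.0])  # gates close to 0
```

      The same feature kernel passes its response almost unchanged through an open gate and almost nothing through a closed one; in a trained network the gate kernel is learned, so this selection happens per position.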
    • Multi-Graph Learning Based on Structure-Aware

      FU Dong-lai, GAO Ze-an
      Vol. 52, Issue 7, Pages: 2407-2417(2024) DOI: 10.12263/DZXB.20230565
      Abstract: Multi-graph learning is an important learning paradigm. In multi-graph learning, a bag represents an object and each graph in the bag corresponds to a sub-object; compared with multi-instance learning, this data representation can express the structural information of sub-objects. However, existing multi-graph learning methods not only implicitly assume that the graphs in a bag are independent and identically distributed, but also mostly transform multi-graph learning problems into multi-instance learning problems. Such methods easily lose the structural information of each graph and the relationships between graphs. To address these problems, a structure-aware multi-graph learning method is proposed that effectively learns both the structural information within each graph and the relationships between graphs. The method uses graph kernels to compute the similarity between graphs, preserving each graph's own structural information; it expresses the structural information between graphs by generating bag-level graphs, and it designs a bag encoder to learn this inter-graph structure effectively. Experimental results on the NCI(1), NCI(109), and AIDB datasets show that, compared with existing methods, the proposed method improves accuracy, precision, F1 score, and AUC by 5.97%, 3.44%, 4.48%, and 2.56%, respectively, while recall decreases by 2.12%.
      Keywords: multi-graph learning; graph kernel; structural information; bag-structure graph; independent and identical distribution
      Updated: 2025-12-24
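      The abstract does not name the graph kernel it uses. As an illustration of how a graph kernel scores structural similarity, the sketch below implements the Weisfeiler-Lehman subtree kernel, a widely used choice, on small labeled graphs. The graph encoding (adjacency dict plus label dict), the iteration count, and the string relabeling (instead of compressing labels to integers) are simplifying assumptions.

```python
from collections import Counter

def wl_features(adj, labels, iterations=2):
    """Weisfeiler-Lehman relabeling. Returns a Counter of (iteration, label) features.
    adj: {node: [neighbors]}, labels: {node: str}."""
    feats = Counter((0, lab) for lab in labels.values())
    current = dict(labels)
    for it in range(1, iterations + 1):
        # New label = own label + sorted multiset of neighbor labels, kept as a string
        # signature so that identical signatures in different graphs compare equal.
        current = {v: current[v] + "|" + ",".join(sorted(current[u] for u in adj[v])) for v in adj}
        feats.update((it, lab) for lab in current.values())
    return feats

def wl_kernel(g1, g2, iterations=2):
    """WL subtree kernel: dot product of the two graphs' WL feature counts."""
    f1, f2 = wl_features(*g1, iterations), wl_features(*g2, iterations)
    return sum(count * f2[key] for key, count in f1.items())

# Hypothetical toy graphs: (adjacency, node labels)
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "C", 1: "C", 2: "C"})
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
```

      Two identical triangles score 27 (nine matching feature pairs at each of three levels), while the triangle and the C-C-O path share only their initial "C" labels and score 6, so the kernel rewards shared neighborhood structure, not just shared node labels.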
    • A CatBoost Optimization-Based Fault Diagnosis Model for Photovoltaic Arrays

      PENG Zi-ran, XU Huai-shun, XIAO Shen-ping
      Vol. 52, Issue 7, Pages: 2418-2428(2024) DOI: 10.12263/DZXB.20240236
      Abstract: Most photovoltaic (PV) power stations are located in remote areas with complex terrain, where they are exposed to the external environment and prone to various faults. Traditional PV array fault diagnosis methods suffer from low accuracy and poor utilization of PV data. To address these problems, this paper first improves the sparrow search algorithm (SSA) by introducing a Levy flight strategy and a dynamic step-factor adjustment strategy, which reduce the risk of SSA falling into local optima and improve its optimization ability. The improved Levy adjustment sparrow search algorithm (LASSA) is then used to optimize the key hyperparameters of the CatBoost model, yielding LASSA-CatBoost, a PV array fault diagnosis model based on CatBoost with LASSA as its optimization strategy, for accurate diagnosis of short-circuit, open-circuit, aging, and shading faults in PV arrays. The experimental results show that the fault diagnosis accuracy of the LASSA-CatBoost model is 99.7%, 3.6% higher than that of the CatBoost model before optimization. Compared with existing PV array fault diagnosis models, LASSA-CatBoost achieves higher accuracy and stability.
      Keywords: photovoltaic array; fault diagnosis; I-V characteristic curve; CatBoost; Levy adjustment sparrow search algorithm
      Updated: 2025-12-24
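      The abstract introduces Levy flight into the sparrow search algorithm but does not give the update rule. Levy-distributed steps are commonly drawn with Mantegna's algorithm, sketched below. The perturbation rule in `levy_perturb`, the step scale, and β = 1.5 are assumptions borrowed from common Levy-flight optimizers, not LASSA's published equations.

```python
import math
import random

def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm for a Levy index beta in (0, 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(beta=1.5, rng=random):
    """One heavy-tailed Levy-flight step: mostly small moves with occasional long jumps."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def levy_perturb(position, best, step_scale=0.01, beta=1.5, rng=random):
    """Hypothetical update: perturb a sparrow in proportion to its offset from the best-known position."""
    return [x + step_scale * levy_step(beta, rng) * (x - b) for x, b in zip(position, best)]

rng = random.Random(42)
steps = [levy_step(rng=rng) for _ in range(1000)]
```

      The occasional long jumps are what let a swarm escape a local optimum, while the many small steps keep refining around good solutions; a sparrow already at the best position is left unchanged by this particular rule.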
    • WANG Hai-rong, WANG Tong, XU Xi, JING Bo-xiang, CHEN Fang-ping
      Vol. 52, Issue 7, Pages: 2429-2437(2024) DOI: 10.12263/DZXB.20231160
      Abstract: To address visual semantic understanding bias and multimodal semantic bias in multimodal named entity recognition, a confidence learning guided label fusion (CLGLF) method is proposed. The method uses the pre-trained BLIP-2 model to generate image captions, concatenates them with the input text, and encodes them jointly to fuse multimodal features. Candidate labels and text labels are obtained by decoding the multimodal representations and the text representations, respectively. After the two sets of labels are aligned with a KL divergence loss, a confidence score is calculated to evaluate the quality of the multimodal representation, and a confidence threshold is set to screen out biased candidate labels; these are replaced by the text labels at the corresponding positions, which accomplishes label fusion and completes multimodal named entity recognition. To verify the proposed method, experiments are carried out on the Twitter-2015 and Twitter-2017 multimodal datasets, and the results are compared with seven mainstream methods, including MSB and UMT. The experimental results demonstrate the effectiveness of CLGLF.
      Keywords: multimodal named entity recognition; image caption; confidence learning; multimodal semantic bias; information extraction
      Updated: 2025-12-24
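      The thresholded label replacement described in the abstract can be sketched as follows. The confidence definition used here (peak probability of the multimodal candidate distribution), the threshold value, and the BIO tag set are hypothetical; the abstract computes confidence after KL-based label alignment, whose exact form it does not give.

```python
def fuse_labels(mm_probs, text_labels, label_names, threshold=0.6):
    """Per token, keep the multimodal candidate label if its confidence reaches the
    threshold; otherwise fall back to the text-only label at the same position."""
    fused = []
    for probs, text_label in zip(mm_probs, text_labels):
        best = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[best]   # hypothetical confidence: peak candidate probability
        fused.append(label_names[best] if confidence >= threshold else text_label)
    return fused

names = ["O", "B-PER", "I-PER", "B-LOC"]
mm = [[0.90, 0.05, 0.03, 0.02],    # confident "O"
      [0.10, 0.40, 0.05, 0.45],    # uncertain: B-LOC narrowly beats B-PER
      [0.05, 0.05, 0.85, 0.05]]    # confident "I-PER"
text = ["O", "B-PER", "I-PER"]     # labels decoded from the text-only branch
fused = fuse_labels(mm, text, names)
```

      With the default threshold the uncertain second token falls back to the text label `B-PER`; lowering the threshold to 0.4 would keep the multimodal `B-LOC`, so the threshold controls how much the image-derived signal is trusted.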
    • CHAI Rong, LIU Lei, LIANG Cheng-chao, CHEN Qian-bin
      Vol. 52, Issue 7, Pages: 2438-2448(2024) DOI: 10.12263/DZXB.20240107
      Abstract: Multi-beam satellite communication systems have received considerable attention because of their high throughput and resource utilization. Existing research considers channel or power allocation in multi-beam satellite communication systems but rarely addresses the joint optimization of user grouping and dynamic resource allocation strategies, which limits system performance. Furthermore, current studies often assume a fixed beam coverage radius, overlooking how a variable beam coverage radius could improve beam coverage performance. In this paper, we study user grouping and resource allocation in multi-beam satellite communication systems and propose a two-stage resource management scheme. To accommodate dynamic and diverse user service requirements, we first design a Voronoi diagram-based iterative user grouping algorithm to achieve load balancing among user groups. We then formulate subchannel and power allocation as the maximization of the system's average utility function. To solve this problem, we treat each satellite beam as an agent and propose a multi-agent deep Q network (DQN)-based algorithm to determine the subchannel and power allocation strategy. Simulation results demonstrate that the proposed Voronoi diagram-based iterative user grouping algorithm reduces the discrepancy in user group loads by 49.2% compared with the K-means user grouping scheme, highlighting its advantage in balancing load among user groups. Furthermore, compared with algorithms proposed in the existing literature, the two-stage resource management scheme reduces the gap between system capacity and user demand by 83.43%, showing its advantage in efficiently utilizing system resources while meeting user service demands.
      Keywords: multi-beam satellite; user grouping; subchannel allocation; power allocation; multi-agent deep Q network; load balancing
      Updated: 2025-12-24
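      Grouping users by Voronoi diagram amounts to assigning each user to the nearest beam center. The sketch below shows that assignment, a load-gap metric, and one demand-weighted, Lloyd-style center update. The paper's actual iterative load-balancing rule, its demand model, and the use of planar coordinates are not given in the abstract, so all of these are assumptions.

```python
def voronoi_groups(users, centers):
    """Assign each user (x, y, demand) to its nearest beam center, i.e. its Voronoi cell."""
    groups = [[] for _ in centers]
    for x, y, demand in users:
        nearest = min(range(len(centers)),
                      key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2)
        groups[nearest].append((x, y, demand))
    return groups

def load_gap(groups):
    """Load-balance metric: total demand of the busiest group minus that of the idlest."""
    loads = [sum(d for _, _, d in g) for g in groups]
    return max(loads) - min(loads)

def demand_weighted_centroids(groups, centers):
    """One Lloyd-style step: move each center to its cell's demand-weighted centroid."""
    new_centers = []
    for g, c in zip(groups, centers):
        total = sum(d for _, _, d in g)
        if total == 0:
            new_centers.append(c)   # empty cell: keep the old center
        else:
            new_centers.append((sum(x * d for x, _, d in g) / total,
                                sum(y * d for _, y, d in g) / total))
    return new_centers

users = [(0, 0, 5), (1, 0, 3), (9, 0, 2), (10, 0, 4)]   # hypothetical positions and demands
centers = [(0, 0), (10, 0)]
groups = voronoi_groups(users, centers)
new_centers = demand_weighted_centroids(groups, centers)
```

      Repeating assignment and center updates, and measuring `load_gap` after each round, gives the basic loop that an iterative Voronoi grouping algorithm refines with its own balancing criterion.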
    • FANG Cheng, GUAN Fang-heng, LI Tian-chi, ZOU Zheng-feng, YANG Lei
      Vol. 52, Issue 7, Pages: 2449-2460(2024) DOI: 10.12263/DZXB.20230536
      Abstract: Synthetic aperture radar (SAR) image detection often encounters problems such as error sensitivity and high computational complexity, which pose challenges to SAR target recognition. Researchers have proposed many novel and efficient deep learning methods for SAR data. However, most deep learning networks for SAR target recognition treat SAR data the same way as real-valued optical images, directly applying real-valued deep neural networks to SAR images. Real-valued neural networks partly lose phase information and therefore cannot fully exploit the complex-valued characteristics of SAR data. Because phase information is a unique feature of SAR images, it plays a crucial role in applications such as SAR interferometry, information retrieval, and target recognition. To make the network better suited to extracting complex-valued features from SAR data, this paper departs from the architecture of traditional neural networks and proposes a novel end-to-end, fully complex-valued multi-stage convolutional neural network architecture (Complex-valued mUltI-Stage convolutIonal Neural nEtworks, CUISINE). All computation, from the input SAR complex image data through the convolutional calculations to the final classification labels, is performed in the complex-valued domain. Experimental comparisons on the publicly available MSTAR dataset show that the method performs well in SAR target classification: accuracy reaches 99.42% on the test set with a phase error of 0 rad and 88.05% on the test set with a phase error of 50 rad.
      Keywords: synthetic aperture radar; phase information; full complex fields; end-to-end; neural networks
      Updated: 2025-12-24
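      The core idea of CUISINE, keeping computation in the complex domain, can be illustrated with Python's built-in complex numbers. The sketch below, with hypothetical names and toy values, implements a complex-valued 1D convolution and the CReLU activation (ReLU applied separately to the real and imaginary parts, one common choice in complex-valued networks); CUISINE's actual layers and activations are not specified in the abstract.

```python
import cmath

def complex_conv1d(x, k):
    """1D cross-correlation with complex signal and kernel; the complex products
    mix real and imaginary parts, so phase is carried through the layer."""
    return [sum(k[j] * x[i + j] for j in range(len(k))) for i in range(len(x) - len(k) + 1)]

def crelu(z):
    """CReLU: ReLU applied separately to the real and imaginary parts."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))

x = [1 + 1j, 2 - 1j, 0.5j, -1 + 0j]     # toy complex samples (e.g. SAR I/Q values)
k = [0.5 - 0.5j, 1j]                     # toy complex kernel
y = complex_conv1d(x, k)
phase = cmath.exp(0.3j)                  # global phase rotation by 0.3 rad
y_rotated = complex_conv1d([phase * v for v in x], k)
```

      Rotating the input by a common phase rotates every output by the same phase while leaving its magnitude unchanged, so a pipeline that keeps only magnitudes cannot distinguish the two inputs, whereas the complex-valued layer preserves the difference.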
    • YU Xin-kuo, LI Jian-ping, QIN Yu-wen, YANG Hai-lin, PENG Di, XIANG Meng, XU Ou, FU Song-nian
      Vol. 52, Issue 7, Pages: 2461-2467(2024) DOI: 10.12263/DZXB.20230079
      Abstract: By exploiting the multiple spatial mode channels of conventional multimode optical fiber, multi-dimensional multiplexing can effectively increase the capacity of optical fiber transmission systems and thus meet the rapidly growing demand for data services. In this paper, we demonstrate large-capacity optical transmission over conventional OM2 fiber that combines wavelength division multiplexing (WDM), polarization division multiplexing (PDM), and mode division multiplexing (MDM). Each of the 80 channels, formed by 40 wavelengths (1 535.04~1 566.31 nm) and 2 mode channels (LP01 and LP11b), is modulated with a 60 Gbaud polarization-division-multiplexed 16-ary quadrature amplitude modulation (PDM-16QAM) signal. The MDM link consists of a pair of mode multiplexers/de-multiplexers based on multi-plane light conversion (MPLC) and 20 m of OM2 fiber. Thanks to the high isolation between the two mode channels used (<-20 dB), only a 2×2 multiple-input multiple-output (MIMO) algorithm is needed for polarization de-multiplexing, and no MIMO-based mode de-multiplexing is required. To improve system capacity, key system parameters are optimized, including the roll-off factor of the pulse-shaping filter, the clipping ratio, and the received optical power (ROP). Volterra decision feedback equalization (VDFE) is also adopted, both to compensate for the nonlinear impairments introduced by the optical modulator and to alleviate the high-frequency noise enhancement caused by feed-forward equalization (FFE). As a result, a total capacity of up to 38.4 Tbit/s is achieved, with the bit error rate (BER) of all 80 channels below the 20% soft-decision forward error correction (SD-FEC) threshold of 2.7×10⁻². The experimental results reveal the potential of multimode-fiber-based MDM coherent optical transmission for future ultra-large-capacity short-reach optical interconnect systems.
      Keywords: optical fiber communications; mode division multiplexing; wavelength division multiplexing; coherent detection; nonlinear equalization
      Updated: 2025-12-24
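      The reported 38.4 Tbit/s follows directly from the channel count and modulation format given in the abstract: 40 wavelengths × 2 modes × 2 polarizations, each carrying 60 Gbaud 16QAM (4 bit per symbol):

```latex
C = N_\lambda \, N_\text{mode} \, N_\text{pol} \, R_s \log_2 M
  = 40 \times 2 \times 2 \times 60\ \text{Gbaud} \times \log_2 16
  = 38\,400\ \text{Gbit/s} = 38.4\ \text{Tbit/s}
```

      This is the raw line rate. Assuming a code with 20% overhead matching the quoted SD-FEC threshold, the net rate after FEC would be roughly 38.4/1.2 = 32 Tbit/s; any further overhead (for example pilots or training symbols) is not stated in the abstract.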
    • WANG Xin, SHEN Bin, HUANG Xiao-ge
      Vol. 52, Issue 7, Pages: 2468-2476(2024) DOI: 10.12263/DZXB.20240004
      Abstract: Spectrum cartography based on tensor completion algorithms has been widely studied in recent years. Most current tensor completion algorithms for spectrum cartography implicitly assume that the tensor is balanced; for unbalanced tensors, they may be unable to exploit the low-rank structure to estimate the entire tensor, which degrades performance. To address this degradation when traditional tensor completion algorithms are applied to unbalanced tensors, this paper proposes an unbalanced spectrum cartography algorithm based on overlapping Ket augmentation (OKA) and tensor train (TT). First, OKA represents the low-order high-dimensional tensor as a high-order low-dimensional tensor, so that the low-rank structure of an unbalanced tensor can be exploited for completion without information loss. Second, TT matricization yields more balanced matrices, and these more balanced dimensions improve the accuracy of the completion algorithm. Finally, exploiting the low-rank structure of the high-order low-dimensional tensor, completion is accomplished with either parallel matrix factorization or a Frobenius-norm-based singular value decomposition free (SVDFree) algorithm. Simulation results show that, for unbalanced tensors, the proposed scheme obtains more accurate radio maps than existing tensor completion algorithms, while the proposed SVDFree algorithm has lower computational complexity.
      Keywords: spectrum cartography; tensor completion; tensor train; overlapping Ket augmentation; parallel matrix factorization; singular value decomposition
      Updated: 2025-12-24
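      OKA builds on Ket augmentation, which reorders a 2^n × 2^n array into an order-n tensor whose modes each have size 4 by pairing the k-th row bit with the k-th column bit. The sketch below shows only this plain, non-overlapping version; the overlapping variant, TT matricization, and the completion step are not shown, and the function names and toy "radio map" are hypothetical.

```python
from itertools import product

def ket_index(i, j, n):
    """Map pixel (i, j) of a 2**n x 2**n array to an order-n index whose modes have size 4.
    Mode k pairs the k-th most significant bits of i and j, so mode 0 selects one of
    the four top-level quadrants and later modes select ever finer sub-blocks."""
    return tuple(2 * ((i >> (n - 1 - k)) & 1) + ((j >> (n - 1 - k)) & 1) for k in range(n))

def ket_augment(image):
    """Reshape a 2**n x 2**n list-of-lists into {order-n index: value} without losing entries."""
    size = len(image)
    n = size.bit_length() - 1
    assert size == 2 ** n and all(len(row) == size for row in image), "expects a 2**n x 2**n array"
    return {ket_index(i, j, n): image[i][j] for i, j in product(range(size), repeat=2)}

image = [[4 * i + j for j in range(4)] for i in range(4)]   # toy 4 x 4 "radio map"
tensor = ket_augment(image)
```

      Every pixel maps to a distinct index, so the reshaping is lossless; its value lies in producing many small, equal-sized modes, which is the "high-order low-dimensional" form the abstract relies on for more balanced matricizations.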
    • Optimal BN Structure Learning Algorithm Based on Double Constraints

      CHEN Yi-wei, DI Ruo-hai, WANG Peng, ZHANG Xin-lan, ZHANG Huan, XU Wen
      Vol. 52, Issue 7, Pages: 2477-2490(2024) DOI: 10.12263/DZXB.20230268
      Abstract: Existing Bayesian network (BN) structure learning algorithms based on dynamic programming are too complex to learn large-scale networks within a reasonable time. To address this problem, a BN structure learning algorithm based on double constraints is proposed. First, the neighbor node set is obtained through conditional independence (CI) tests that use a candidate node set and a constraint set, based on the maximum information coefficient and the Markov blanket. Second, the neighbor node set is used to constrain the search of the parent node graph to obtain the candidate parent node set; on this basis, the optimal parent set of each node is extracted from its candidate parent node set to construct an initial directed graph. Third, the strongly connected components of the initial directed graph are computed with the Tarjan algorithm to obtain the node block order. Finally, the optimal BN structure is obtained by using the node block order to constrain the search of the node order graph. Experiments show that, compared with five existing dynamic-programming-based structure learning algorithms, the proposed algorithm greatly improves learning efficiency at the cost of slightly reduced accuracy. For the Sachs network, compared with the DPCMB (Dynamic Programming Constrained with Markov Blanket) algorithm, the proposed algorithm reduces time consumption by 40.3%, while accuracy decreases by 12.1%.
      Keywords: Bayesian network; maximum information coefficient; conditional independence test; Markov blanket
      Updated: 2025-12-24
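      The third step of the algorithm computes strongly connected components with Tarjan's algorithm. A minimal recursive sketch follows; the graph is a hypothetical adjacency dict with edges pointing from parent to child. Tarjan's algorithm emits components in reverse topological order of the condensed graph, so reversing its output gives a block order with parent blocks first. How the paper then uses that order to constrain the order-graph search is not shown.

```python
def tarjan_scc(graph):
    """Strongly connected components of {node: [successors]} via Tarjan's algorithm.
    Components come out in reverse topological order of the condensed DAG."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:            # tree edge: recurse, then propagate the low-link
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:           # edge back into the component being built
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component: pop it off
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return comps

# Hypothetical "initial directed graph": edges point from parent to child.
g = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D"], "F": []}
comps = tarjan_scc(g)
block_order = list(reversed(comps))   # parent blocks before child blocks
```

      Here the cycle A-B-C forms one block, D-E another, and F a singleton; the reversed output places the A-B-C block before the D-E block it points to, which is the kind of block order the abstract uses to constrain the search.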
    • YU Yi-feng, QIAN Jiang-bo, YAN Di-qun, WANG Chong, DONG Li
      Vol. 52, Issue 7, Pages: 2491-2502(2024) DOI: 10.12263/DZXB.20230622
      Abstract: Coloring long sequences of animated sketch frames is a challenging task in computer vision. On one hand, the information contained in sketches is sparse, so coloring algorithms need to infer missing information. On the other hand, colors must stay consistent across consecutive frames to ensure visual quality throughout the video. Most existing coloring algorithms are designed for single images and provide only one open-ended, plausible coloring, which is unsuitable for coloring frame sequences. Other reference-based coloring algorithms do not establish an organic connection between the two frames, yielding unsatisfactory results. Within the same shot, the features of the same object usually change little, so a model can be designed to color sketches automatically from a given reference frame. This paper proposes a new model called Cross-CNN that combines convolutional neural networks (CNN) and Transformer. Cross-CNN finds and matches colors from the reference frame, thus ensuring temporal feature consistency. In this model, the reference frame and the sketch frame are stacked along the channel dimension, and a pre-trained ResNet-50 network extracts locally fused features. The fused feature map is then passed to a Transformer structure for encoding to extract global features, in which a cross-attention mechanism is designed to better match long-distance features. Finally, a convolutional decoder with skip connections outputs the colored image. For the dataset, frames were extracted from eight movies and strictly screened to create a dataset of 20 000 reference-sketch frame pairs for experimental research. Cross-CNN reaches an SSIM (Structural SIMilarity) of 0.932, which is 0.014 higher than the state-of-the-art (SOTA) algorithm. The code is available at: https://github.com/silenye/Cross-CNN.
      Keywords: sketch coloring; convolutional neural network; Transformer; color matching; animation production
      Updated: 2025-12-24
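      The abstract reports SSIM as its quality metric. For reference, the SSIM statistic is sketched below over a single global window. Standard SSIM instead averages this statistic over local, typically Gaussian-weighted windows, so values from this simplified version will differ from library implementations; K1 = 0.01 and K2 = 0.03 are the usual default constants.

```python
def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM statistic computed over one window covering all pixels (equal-length lists, len >= 2)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2   # stabilizing constants
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

x = [10, 50, 90, 130, 170]
y = [255 - v for v in x]     # intensity-inverted copy
```

      An image compared with itself scores exactly 1, while the inverted copy, despite identical contrast, scores below zero because its structure is anti-correlated; SSIM thus rewards matching luminance, contrast, and structure jointly, which suits judging whether colors were transferred faithfully.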
    • Improved Adaptive Model Pools for Online Anomaly Detection Algorithms

      XIANG Qiu-yan, ZI Ling-ling, CONG Xin
      Vol. 52, Issue 7, Pages: 2503-2514(2024) DOI: 10.12263/DZXB.20230731
      Abstract: Accurate online anomaly detection methods are central to the development of IoT-related industries, and online anomaly identification for complex, dynamic data streams is an important research hotspot. Existing online anomaly detection methods suffer from excessive processing complexity, while offline deep anomaly detection methods suffer from concept drift caused by changes in the data distribution. To address these problems, this paper proposes an online anomaly detection framework based on an improved adaptive model pool, which works together with autoencoder-based anomaly detection methods to achieve online anomaly detection. First, basic anomaly identification is performed with an autoencoder-based anomaly detection model. Second, a concept drift detection algorithm is integrated into the adaptive model pool to accurately identify concept drift and adapt to the dynamically changing data stream. Finally, the model merging method of the adaptive model pool is optimized, which enhances online anomaly identification. The experimental results show that, compared with the streaming variant of the autoencoder model and the original adaptive model pool algorithm, the proposed algorithm improves the anomaly detection accuracy metrics by 20.2% and 5.83%, respectively, and exceeds the best accuracy metrics of existing online anomaly detection algorithms by about 16.7%.
      Keywords: unsupervised learning; autoencoder; concept drift; anomaly detection; adaptive model pool; data stream
      Updated: 2025-12-24
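      The abstract does not name its concept drift detector. As a stand-in, the sketch below implements the Page-Hinkley test, a classic detector for a sustained increase in a stream's mean, which could for example be applied to an autoencoder's reconstruction errors. The parameters `delta` and `threshold` and the toy stream are assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for a sustained increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=1.0):
        self.delta, self.threshold = delta, threshold   # tolerated fluctuation, alarm level
        self._reset()

    def _reset(self):
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

    def update(self, x):
        """Consume one value; return True when drift is signalled (statistics then restart)."""
        self.n += 1
        self.mean += (x - self.mean) / self.n           # running mean of the stream
        self.cum += x - self.mean - self.delta          # cumulative deviation m_t
        self.cum_min = min(self.cum_min, self.cum)      # M_t = minimum of m_t so far
        if self.cum - self.cum_min > self.threshold:    # PH_t = m_t - M_t exceeds the alarm level
            self._reset()
            return True
        return False

# Toy stream, e.g. reconstruction errors: the error level jumps after 200 samples.
stream = [0.1] * 200 + [1.0] * 200
ph = PageHinkley()
detections = []
for i, err in enumerate(stream):
    if ph.update(err):
        detections.append(i)
```

      The detector stays silent on the stable segment and fires two samples after the jump; in a model-pool framework such a signal would be the trigger to add or switch models rather than keep scoring with a stale one.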
    • GUO Yan, WANG Zhi-wen, ZHAO Run-xing
      Vol. 52, Issue 7, Pages: 2515-2528(2024) DOI: 10.12263/DZXB.20230772
      摘要:With the widespread application of electronic devices, printed circuit boards (PCB) hold significant importance in the electronics manufacturing industry. However, due to imperfections in the manufacturing process and interference from environmental factors, tiny defects may in PCB. Therefore, the development of efficient and accurate defect detection algorithms is crucial in ensuring product quality. To address the challenge of detecting tiny defects on PCB, this paper proposes a high-precision PCB tiny defect detection algorithm based on multi-dimensional attention mechanism. To reduce model parameters and computational complexity, partial convolution (PConv) is introduced, and the ELAN module is redesigned as the more efficient P-ELAN. Additionally, to enhance the network’s feature extraction capability for tiny defects, the omni-dimensional dynamic convolution (ODConv) based on the multi-dimensional attention mechanism (MDAM) is introduced. By combining partial convolution, the POD-CSP (Partial ODConv-Cross Stage Partial) and POD-MP (Partial ODConv-Max Pooling) cross-stage partial network modules are designed, along with the OD-Neck structure. Finally, based on YOLOv7, a more efficient YOLO-POD model for small object detection is proposed, and the network is optimized during the training phase using a novel loss function called Alpha-SIoU. Experimental results demonstrate that YOLO-POD achieves a detection precision of 98.31% and recall rate of 97.09%, exhibiting substantial advantages across multiple metrics. Notably, it achieves a 28% improvement over the original YOLOv7 model, as to more stringent mAP75 metric. These results validate the high accuracy and robustness of YOLO-POD in PCB defect detection, fulfilling the requirements for high-precision detection and providing an effective detection solution for the PCB manufacturing industry.  
      Keywords: PCB; tiny-defect detection; partial ODConv-cross stage partial; partial ODConv-max pooling; omni-dimensional dynamic convolution; multi-dimensional attention mechanism
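The abstract introduces partial convolution (PConv) to cut parameters and FLOPs inside P-ELAN. As a rough illustration of the general PConv idea (as popularized by FasterNet), the sketch below convolves only the first `c_p` channels and passes the remaining channels through unchanged. It is an assumption-laden NumPy sketch, not the authors' P-ELAN code: the function names, the 1/4 channel ratio, and the naive stride-1 "same" convolution are illustrative choices. Under this formulation the spatial convolution costs about h·w·k²·c_p² FLOPs, so with c_p = c/4 it is roughly 1/16 of a full convolution over all c channels.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive stride-1, zero-padded ('same') 2D cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd), dtype=np.result_type(x, w))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            # Sum over input channels and kernel taps for every output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def partial_conv(x, w, ratio=0.25):
    """Hypothetical PConv sketch: convolve the first c_p channels only.

    The remaining C - c_p channels are copied through untouched (identity),
    which is where the parameter and FLOP savings come from.
    """
    c = x.shape[0]
    cp = int(c * ratio)
    assert cp >= 1 and w.shape[:2] == (cp, cp), "kernel must map c_p -> c_p"
    y = x.copy()
    y[:cp] = conv2d_same(x[:cp], w)
    return y
```

In a real network this would be a framework layer followed by pointwise convolutions that mix information across all channels; the sketch only shows which channels the spatial convolution touches.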

      SURVEY AND REVIEW

    • A Survey of Network Attack Investigation Based on Provenance Graph

      QIU Jing, CHEN Rong-rong, ZHU Hao-jin, XIAO Yan-jun, YIN Li-hua, TIAN Zhi-hong
      Vol. 52, Issue 7, Pages: 2529-2556(2024) DOI: 10.12263/DZXB.20231057
      Abstract: Investigating network attacks is crucial for implementing proactive defenses and formulating traceback countermeasures. With the rise of sophisticated and stealthy network threats, developing efficient, automated investigation methods has become pivotal to advancing intelligent network attack and defense capabilities. Existing studies model system audit logs as provenance graphs that represent the causal dependencies among attack events. By leveraging the strong association-analysis and semantic-representation capabilities of provenance graphs, complex and stealthy attacks can be investigated effectively, with better results than conventional methods. This paper systematically reviews the literature on provenance-graph-based attack investigation and classifies the methods into three principal groups: causality analysis, deep representation learning, and anomaly detection. For each group, it briefly presents the workflow and the core frameworks underlying the methods. It also examines optimization techniques for provenance graphs and traces the evolution of these technologies from theoretical constructs to industrial deployment. The study further collects and reviews the datasets commonly used in attack investigation research and provides a comparative analysis of representative provenance-graph-based techniques and their reported performance metrics. Finally, it outlines directions for future research and development in this field, offering a structured roadmap for advancing both academic research and practical applications.
      Keywords: attack investigation; provenance graph; advanced persistent threat; deep learning; anomaly detection
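Causality analysis, the first category in the survey, typically starts from a detection point (an alerting node and timestamp) and walks the provenance graph backwards to find the events that could have influenced it. The sketch below shows a classic time-constrained backward traversal in that spirit: an edge is followed only if it happened no later than the time bound propagated to its destination. It is a minimal illustration under stated assumptions, not any specific system from the survey: the `Event` record, the `backtrack` function, and all node names in the example are hypothetical, and edges are assumed to point in the direction of information flow (for example, a read is file → process).

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical audit-log event: information flows src -> dst at time ts."""
    src: str
    dst: str
    ts: float
    op: str


def backtrack(events, alert_node, alert_ts):
    """Return the events that may causally precede (alert_node, alert_ts)."""
    incoming = defaultdict(list)
    for e in events:
        incoming[e.dst].append(e)
    # Latest time bound seen so far for each node on the backward path.
    bound = {alert_node: alert_ts}
    queue = deque([alert_node])
    selected = set()
    while queue:
        node = queue.popleft()
        t = bound[node]
        for e in incoming[node]:
            if e.ts <= t:  # only earlier events can influence this node
                selected.add(e)
                # Re-visit the source if this edge loosens its time bound.
                if e.src not in bound or e.ts > bound[e.src]:
                    bound[e.src] = e.ts
                    queue.append(e.src)
    return selected


# Hypothetical toy log: a downloaded script runs, reads a file, then exfiltrates.
events = [
    Event("firefox", "dropper.sh", 1, "write"),
    Event("dropper.sh", "bash", 2, "exec"),
    Event("secret.txt", "bash", 3, "read"),
    Event("bash", "10.0.0.9:443", 4, "send"),
    Event("cron", "bash", 5, "signal"),       # after the alert: excluded
    Event("vim", "notes.txt", 0.5, "write"),  # unrelated: excluded
]
hits = backtrack(events, "10.0.0.9:443", 4)
print(sorted({e.src for e in hits}))  # ['bash', 'dropper.sh', 'firefox', 'secret.txt']
```

Real systems layer dependency-explosion mitigations (graph reduction, edge weighting, learned filters) on top of this basic traversal, which is where the optimization techniques discussed in the survey come in.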