Abstract: Industrial process data encompasses continuous and discrete variables, whose underlying statistical characteristics are crucial for revealing operational conditions. However, current process monitoring models predominantly focus on continuous variables under Gaussian assumptions and often overlook the multimodal distribution of process variables, as well as the noise and outliers in process data. These limitations hinder the models' ability to capture complex statistical characteristics, leading to low detection performance, particularly in non-Gaussian and nonstationary processes. This article introduces a robust anomaly detection method termed continuous and discrete variables-concurrent analysis-based variational Bayesian mixture discriminator (CDVCA-VBMD). Based on variational Bayesian inference, it models continuous variables with a mixture of Student's t-distributions and discrete variables with a mixture of multinomial distributions, which allows it to capture the complex interdependencies between process variables and to handle the non-Gaussian nature of continuous variables. Furthermore, CDVCA-VBMD incorporates continuous learning to ensure effective detection in nonstationary industrial processes. Extensive validation and comparative experiments were conducted on a numerical simulation system and the Tennessee Eastman (TE) process. The results demonstrate that CDVCA-VBMD can accurately characterize the mixed multimodal distributions of time-varying industrial processes, facilitating accurate anomaly detection. Additionally, the method is robust against noise and outliers in process data, supporting long-term and reliable monitoring of complex, non-Gaussian industrial processes.
Abstract: Transformer-based models, such as large language models (LLMs) and vision Transformers (ViTs), have achieved state-of-the-art performance across natural language processing and machine vision tasks. However, the activation functions prevalent in ViTs and LLMs, such as GELU (Gaussian Error Linear Unit) and Swish, suffer from insufficient precision and low computational efficiency during fully quantized inference, which constrains their deployment on resource-limited edge devices. This paper introduces a high-precision segmented quadratic polynomial fitting method (SQPF) and its corresponding quantized inference process to achieve high-performance deployment of nonlinear activation functions on edge devices. SQPF uses the least squares method and particle swarm optimization to find the optimal coefficients and interval divisions for the quadratic polynomial fitting of activation functions. The resulting quadratic polynomials are subjected to dynamic fixed-point symmetric quantization, enabling pure integer inference that requires only shift operations and multiply-accumulate computations. This paper derives quadratic polynomial approximations of GELU and Swish, termed Si-GELU and Si-Swish, and evaluates their inference accuracy. The experimental results demonstrate that on ImageNet, Si-GELU induces an accuracy reduction of only 0.09% in classification tasks for ViTs (ViT, DeiT, and Swin), which is 27.3% of the reduction incurred by other methods. On the large language model benchmark dataset MMLU, Si-Swish causes negligible precision degradation, not exceeding 0.77% for subcategories and 0.23% for major categories. This minimal loss in precision indicates that the optimal quadratic polynomials derived from SQPF can directly substitute for the full-precision floating-point activation functions in Transformer models, with no need for parameter fine-tuning or retraining.
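The segmented least-squares fitting step described in this abstract can be illustrated with a minimal sketch. It is not the SQPF implementation: the breakpoints below are hand-picked rather than found by particle swarm optimization, no quantization is performed, and the names `fit_segments`, `piecewise_gelu`, and `BREAKPOINTS` are hypothetical.

```python
import math
import numpy as np

def gelu(x):
    """Exact GELU, x * Phi(x), computed with the error function."""
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def fit_segments(breakpoints, samples_per_segment=200):
    """Least-squares quadratic fit of GELU on each interval [lo, hi]."""
    coeffs = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        xs = np.linspace(lo, hi, samples_per_segment)
        # polyfit returns [a, b, c] for a*x^2 + b*x + c
        coeffs.append(np.polyfit(xs, gelu(xs), deg=2))
    return coeffs

def piecewise_gelu(x, breakpoints, coeffs):
    """Evaluate the segmented quadratic approximation."""
    x = np.asarray(x, dtype=float)
    # Outside the fitted range GELU is close to 0 (left) or to x (right).
    out = np.where(x < breakpoints[0], 0.0, x)
    for lo, hi, c in zip(breakpoints[:-1], breakpoints[1:], coeffs):
        mask = (x >= lo) & (x < hi)
        out = np.where(mask, np.polyval(c, x), out)
    return out

# Assumed interval division, chosen by hand for illustration only.
BREAKPOINTS = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]
COEFFS = fit_segments(BREAKPOINTS)

grid = np.linspace(-6.0, 6.0, 2001)
max_err = float(np.max(np.abs(piecewise_gelu(grid, BREAKPOINTS, COEFFS) - gelu(grid))))
print(f"max abs error on [-6, 6]: {max_err:.4f}")
```

Replacing the fixed `BREAKPOINTS` with a search over interval divisions is where an optimizer such as particle swarm optimization would plug in.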
Abstract: Facial action unit (AU) recognition is an active topic in computer vision and affective computing. AU recognition is a multi-label binary classification task and currently faces challenges such as label imbalance. Most existing methods re-balance labels by adjusting the sampling rate and weights of AUs based on the correlations among AUs. However, these methods only shift the model's prediction bias from high-frequency labels to low-frequency ones, leaving the bias unresolved. Fair treatment of each AU class, including the head and tail classes, is the key to achieving unbiased AU recognition. By introducing causal inference theory, we propose an unbiased AU recognition method, CIU (Causal Intervention for Unbiased facial action unit recognition), which adjusts the empirical risks in both the imbalanced domain and the balanced but invisible domain to achieve model unbiasedness. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on the BP4D and DISFA benchmarks, with a 1.1% margin over the previous best method on DISFA, and that it can learn unbiased feature representations.
Keywords: causal inference; unbiasedness; facial action unit recognition; multi-label binary classification; label imbalance; empirical risk
Abstract: Spark is a distributed big data processing framework based on in-memory computing, with the advantages of fast execution and strong versatility. When running computation tasks, Spark's default partitioner, HashPartitioner, tends to produce data skew among partitions, which results in low resource utilization and poor operating efficiency. Most existing balanced partitioning methods for Spark, such as multi-stage partitioning, migration partitioning, and sampling partitioning, suffer from difficult scale control, high communication overhead, and excessive dependence on sampling. To solve these problems, we propose a partitioning method based on a first filling strategy, which considers the allocation of sampled and non-sampled data at the same time to achieve balanced data partitioning. After sampling the data and estimating the weight of each key from the sample, the keys are sorted in descending order of weight. The keys are assigned in turn to the earlier partitions whenever adding them satisfies the partition tolerance, and the space of the last partition is reserved for keys that were not sampled, which yields the partitioning plan for the sampled data. Spark partitions the data of keys that appear in the sample according to this plan, and the data of keys that do not appear is allocated directly to the last available partition. The experimental results show that the new method effectively achieves balanced partitioning for Spark data. On real datasets from the Bureau of Transportation Statistics, the first filling partitioner (FFP), designed on the basis of the proposed method, shortens the total running time by 15.3% on average compared with HashPartitioner. In addition, FFP's total running time is on average 38.7% shorter than that of the balanced Spark data partitioner and 30.2% shorter than that of the hash based key reassigning partitioner.
Keywords: balanced partitioning; first fill strategy; data skew; Spark operator; big data
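The planning step of the first filling strategy can be sketched in a few lines. This is an illustration under stated assumptions, not the FFP implementation: the per-partition limit is assumed to be `tolerance * total_weight / num_partitions`, oversized keys fall back to the lightest partition, and the names `first_fill_plan` and `partition_of` are hypothetical. It runs on plain Python dictionaries rather than Spark RDDs.

```python
def first_fill_plan(key_weights, num_partitions, tolerance=1.1):
    """Assign sampled keys to partitions by first fit, heaviest key first.

    key_weights maps each sampled key to its estimated weight.
    Returns (plan, loads): key -> partition index, and the load per partition.
    """
    total = sum(key_weights.values())
    capacity = tolerance * total / num_partitions  # assumed tolerance rule
    loads = [0.0] * num_partitions
    plan = {}
    ordered = sorted(key_weights.items(), key=lambda kv: kv[1], reverse=True)
    for key, weight in ordered:
        # Fallback for keys too heavy to fit anywhere: the lightest partition.
        lightest = min(range(num_partitions), key=loads.__getitem__)
        # First partition, in index order, that stays within the limit.
        target = next(
            (p for p in range(num_partitions) if loads[p] + weight <= capacity),
            lightest,
        )
        plan[key] = target
        loads[target] += weight
    return plan, loads

def partition_of(key, plan, num_partitions):
    """Keys absent from the sample go to the last partition."""
    return plan.get(key, num_partitions - 1)

plan, loads = first_fill_plan({"a": 50, "b": 30, "c": 10, "d": 10}, num_partitions=3)
print(plan)   # {'a': 0, 'b': 1, 'c': 2, 'd': 2}
print(loads)  # [50.0, 30.0, 20.0]
print(partition_of("unseen-key", plan, 3))  # 2
```

Because keys are filled into earlier partitions first, the last partition tends to stay the lightest, which is what leaves room for the unsampled keys routed to it by `partition_of`.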
Abstract: There has been growing interest in fine-grained facial expression recognition due to its ability to capture more subtle and realistic human emotions. Existing facial expression recognition algorithms enhance image representations by extracting local key regions and other relevant features. However, these methods disregard the inherent structural relationships within the image dataset and fail to fully exploit the semantic correlation between labels and the relationship between images and labels, which restricts the enhancement of feature learning. Moreover, current fine-grained expression recognition methods do not effectively explore and utilize the hierarchical relationship between the coarse-grained and fine-grained levels, which limits recognition performance. They also ignore the label ambiguity caused by labeling subjectivity and emotional complexity, which greatly affects recognition performance. To address these issues, we propose a fine-grained facial expression recognition algorithm based on relationship-awareness and label disambiguation (RALD). The algorithm constructs a hierarchy-aware image feature enhancement network that thoroughly explores the dependencies among images, among hierarchical labels, and between images and labels to obtain more discriminative image features. To handle label ambiguity, the algorithm includes a nearest neighbors-based label distribution learning module, which further improves recognition performance by integrating neighborhood information for label disambiguation. Our algorithm achieves an accuracy of 97.34% on the FG-Emotions dataset for fine-grained expression recognition. Additionally, it outperforms existing mainstream facial expression recognition algorithms by 0.80% to 4.55% on the RAF-DB dataset for coarse-grained expression recognition.
Keywords: fine-grained facial expression recognition; attention mechanism; relation awareness; feature optimization; label distribution learning
Abstract: Oracle bone character recognition holds significant value for understanding Chinese history and for the inheritance of Chinese culture. Currently, manual recognition of oracle bone characters requires extensive expert experience and consumes a great deal of time, while the majority of automatic recognition methods are constrained by the closed-set assumption. This limitation becomes pronounced for oracle bones, where new characters are continuously discovered. To address this, some researchers have achieved zero-shot oracle character recognition by visual matching. This method employs handprinted images as category references and recognizes characters in scanned images through similarity matching with these references. However, the approach overlooks the large intra-class variance in scanned oracle bone images, which can lead to mismatches due to the variability in glyphs. This paper proposes a two-stage semantic-enhanced zero-shot oracle character recognition method. The first stage is domain-independent character semantic learning, where the contrastive vision-language pre-training model CLIP is used to extract character semantics from oracle rubbings and template images through prompt learning, addressing the lack of semantic information in oracle characters. To cope with the domain differences between rubbings and templates, we set learnable domain-specific prompts and character category prompts, decoupling their semantics to achieve more accurate feature extraction. The second stage is semantic-enhanced character image visual matching. The model extracts intra-class shared features and inter-class distinctive features through two branches. The first branch uses contrastive learning to align the visual features of different glyphs within the same character category to the character semantics, guiding the model to focus on intra-class shared features. The second branch employs the N-pair loss function to enhance the model's ability to learn distinctive features between different character categories. During the testing phase, the model does not require semantic features; instead, it utilizes the intra-class similarity and inter-class distinctiveness learned during training to achieve more accurate matching between rubbings and templates, improving zero-shot recognition performance. Experimental validation on the scanned image dataset OBC306 and the handprinted image dataset SOC5519 demonstrates that the proposed method surpasses the baseline method in zero-shot oracle character recognition accuracy by over 25%.
Keywords: oracle character recognition; zero-shot recognition; visual matching; semantic-enhanced; vision language model; contrastive learning
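For readers unfamiliar with the N-pair loss used by the second branch, the following NumPy sketch shows its standard multi-class form, the mean over anchors of log(1 + Σ_{j≠i} exp(a_i·p_j − a_i·p_i)). How the paper builds anchor and positive pairs from rubbings and templates is not stated in the abstract, so the inputs here are generic embeddings and the function name `n_pair_loss` is hypothetical.

```python
import numpy as np

def n_pair_loss(anchors, positives):
    """Multi-class N-pair loss.

    anchors[i] and positives[i] are embeddings of the same class;
    positives[j] with j != i serve as negatives for anchors[i].
    """
    sim = anchors @ positives.T            # sim[i, j] = a_i . p_j
    diff = sim - np.diag(sim)[:, None]     # a_i.p_j - a_i.p_i, zero on the diagonal
    # The diagonal contributes exp(0) = 1, so a row-wise log-sum-exp equals
    # log(1 + sum over j != i of exp(diff[i, j])).
    m = diff.max(axis=1, keepdims=True)    # subtract the row max for stability
    lse = m[:, 0] + np.log(np.exp(diff - m).sum(axis=1))
    return float(lse.mean())

# Well-separated classes give a loss near zero; mismatched pairs give a large loss.
emb = 10.0 * np.eye(3)
print(n_pair_loss(emb, emb))
print(n_pair_loss(emb, np.roll(emb, 1, axis=0)))
```

Minimizing this quantity pushes each anchor closer to its own positive than to the positives of every other class in the batch, which matches the stated goal of learning inter-class distinctive features.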
Abstract: This paper studies robust resource allocation in reconfigurable intelligent surface (RIS) assisted multi-user non-orthogonal multiple access (NOMA) under imperfect channel state information (CSI) and imperfect successive interference cancellation (SIC). A transmit power minimization problem is formulated subject to the quality of service (QoS) constraints of two types of users (information users and energy users) and the SIC constraints of information users. This is a multi-variable coupled non-convex optimization problem. First, the non-convex constraints of the problem are transformed by using relaxation variables, linear approximation, the S-procedure, and sign-definiteness methods. Then, the optimization problem is decomposed into two sub-problems. Finally, the alternating optimization method is used to solve these sub-problems iteratively. The simulation results show that the proposed approach converges well, achieves robust resource allocation, and can effectively reduce the transmit power of the base station.
Keywords: reconfigurable intelligent surface; non-orthogonal multiple access; imperfect channel state information; imperfect successive interference cancellation; transmit power optimization; power minimization
Abstract: Current vision-language multimodal pre-training techniques predominantly focus on aligning global semantic features between images and text, yet they inadequately explore fine-grained feature interactions between modalities. To address this gap, this paper proposes a multimodal pre-training strategy based on cross-modal guidance and alignment. Our method employs a dual-stream feature extraction network designed for visual sequence compression to extract modality features. During this phase, joint image-text guidance is integrated within the visual encoder, directing the compression of visual sequences layer by layer. This mitigates the interference of irrelevant visual information with modality-specific fine-grained interactions. Subsequently, in the modality feature alignment phase, we apply fine-grained relational reasoning to the image and text features to achieve localized feature alignment between visual tokens and textual tokens. This strengthens the model's understanding of fine-grained alignment relationships. After fine-tuning, in image-text retrieval tasks, our approach achieves an average recall rate of 86.4% for images and 94.88% for texts, which represents a significant 5.36% improvement in zero-shot image-text retrieval over the canonical CLIP (Contrastive Language-Image Pre-training) algorithm. Moreover, our method also surpasses existing mainstream multimodal pre-training methods in accuracy on classification tasks such as visual question answering.
Abstract: Financial fraud poses a serious threat to economic and social stability, making the development of effective fraud detection algorithms crucial for safeguarding the integrity of the financial system. Currently, various graph-based fraud detection algorithms have been applied in practical scenarios. These methods either classify based on the structural information of graphs or utilize graph convolutional neural networks to learn embedded representations of nodes for fraud detection. However, these approaches have relatively narrow perspectives and cannot comprehensively handle fraud detection on imbalanced multi-relational graphs. To address these issues, this paper proposes RWK-GNN (Random Walk feature enhancement and Kcore subkernel decomposition Graph Neural Network), which efficiently extracts topological information at both the node level and the global network level in imbalanced graphs with multiple relationships. It optimizes the propagation and aggregation of graph structural features from the perspective of community evolution through a subkernel decomposition algorithm, ultimately achieving fraud detection and identification. To validate the performance of RWK-GNN, this study trains and tests the model on public datasets commonly used for graph neural network fraud detection tasks. Experimental results demonstrate significant improvements over other machine learning algorithms and graph neural network algorithms under the same evaluation metrics. The proposed method increases the AUC value by 17% compared to the CARE-GNN algorithm, by 8% compared to the PC-GNN algorithm, and by 7% compared to the SIGN algorithm.
Abstract: Multi-label classification tasks are common in real life, but they often suffer from imbalanced data, which seriously affects classification performance. At present, the mainstream technique for solving this problem is resampling, which is mainly divided into over-sampling and under-sampling. Over-sampling generates samples related to minority class labels, while under-sampling removes samples related to majority class labels. However, these methods each focus on a single imbalance problem, namely intra-label imbalance or inter-label imbalance, and may introduce the other imbalance problem while solving the first. In response to this issue, this paper proposes ESUS (Ensemble learning method based on Safe Under-Sampling), an ensemble learning method for imbalanced multi-label data. First, the imbalanced multi-label dataset is divided into single-label datasets and label-pair datasets through label partitioning. For single-label datasets, this paper proposes a safe under-sampling method to solve the problem of intra-label imbalance and constructs binary classification models using the sampled balanced dataset. For label-pair datasets, ensemble learning is applied to the pruned data to solve the problem of inter-label imbalance, which can maintain the classification performance of the model while reducing time and space complexity. Finally, the single-label dataset models and label-pair dataset models are integrated into the final classification model. The experimental results on six imbalanced multi-label datasets show that, compared with seven other methods, ESUS is more stable and effective on four evaluation metrics.
Abstract: Existing WiFi sensing methods place high demands on data collection and receiver hardware resources, and processing massive amounts of data also consumes substantial hardware and software resources. Model-based WiFi sensing reduces the dependence on the amount of data to a certain extent by establishing a mathematical model between the action pattern and the signal change, but mainstream solutions still require multiple receiving antennas or antenna arrays. This paper proposes, for the first time, a sensing scheme that uses a single-antenna receiver. It uses the ratio of the channel state information (CSI) of different subcarriers to eliminate hardware and noise interference, and proposes a subcarrier combination selection algorithm based on variance and range to screen out high-quality subcarrier combinations for extracting action features. A high-availability feature generation algorithm based on Fresnel zone theory is further proposed, which exploits the relationship between reflection path changes and CSI dynamic phase rotation and obtains high-availability features through data fitting and phase alignment on the complex plane. Both theoretical analysis and experimental results show that the proposed single-antenna scheme is fully consistent with Fresnel zone theory and effectively improves the recognition of different actions in different scenarios. For the seven actions studied in this paper, the overall recognition accuracy of the scheme is maintained at about 95%, and CSI selection and feature enhancement improve accuracy by approximately 2%.
Abstract: The core idea of knowledge distillation is to use a large model as the teacher network to guide a small model as the student network, improving the performance of the student network in image classification tasks. Existing knowledge distillation methods often extract category probabilities or feature information as knowledge from a single input sample. They cannot model the relationships between samples, which weakens the network's representation learning ability. To solve this problem, this paper introduces a graph convolutional neural network, which treats the input samples as graph nodes to construct a relationship graph. Each sample in the graph can aggregate information from other samples, improving its own representation ability. This paper constructs the distillation loss of graph representation knowledge from the perspectives of graph nodes and relationships. It uses meta-learning to guide the student network to adaptively learn better graph representations from the teacher network, thereby improving the graph modeling ability of the student network. Compared to the baseline method, the graph-based representation knowledge distillation method improves classification accuracy by 3.70% on the 100-class dataset published by the Canadian Institute For Advanced Research (CIFAR-100). The result indicates that the proposed method makes the student network learn a more discriminative feature space, thereby improving its image classification ability.
Abstract: One-stage visual grounding methods, which use fused image and text features to predict target boxes, have received widespread attention due to their speed. However, existing methods do not align image and text features before feature fusion, which limits the accuracy of visual grounding. To solve this problem, this paper proposes a visual grounding method based on a large contrastive learning model. The method extracts image and text features with CLIP (Contrastive Language-Image Pre-training), a large-scale pre-trained model based on contrastive learning. It uses Transformer encoders to fuse the image-text features and predicts target boxes from the fused features using a multi-layer perceptron. The method overcomes the above shortcoming for two reasons: the CLIP encoders extract image and text features that are highly aligned in semantics, and global attention interactively fuses the contextual features of images and text. The proposed method was experimentally validated on five datasets, and the results show that it improves overall accuracy compared to existing visual grounding methods.
Abstract: Density-based clustering is a classical approach in cluster analysis that can find non-spherical clusters without specifying the number of clusters in advance. In real-world scenarios, however, challenges remain, including unclear boundaries between clusters, varying data densities, and complex cluster shapes. Most existing density-based clustering algorithms do not tackle these problems in a unified way. We address this difficulty by taking inspiration from the natural erosion phenomenon and present erosion clustering (EC). First, the proposed dynamic density evaluation method is integrated into the erosion strategy, which identifies and removes the data on the cluster boundary layer by layer, revealing the cores of the latent clusters. After that, mutual-reachability-graph-based clustering is used to group the core data. Finally, an allocation strategy based on the local density peak is designed to associate the eroded data with different clusters. The experimental results on 12 benchmark datasets demonstrate that, compared with seven other algorithms, the proposed EC algorithm improves clustering performance by an average of 96%, 53%, and 36% in the adjusted Rand index, adjusted mutual information, and F1 score, respectively.
Keywords: density-based clustering; cluster analysis; density estimation; local density peak; mutual k-nearest neighbor; erosion strategy
Abstract: Multi-stage sub-image merging is a key method for accelerating synthetic aperture radar (SAR) imaging in the time domain. However, high-squint acquisition on a maneuvering platform increases the irregularity of the support region of the spectrum, which degrades the efficiency and accuracy of image merging. To address these issues, this paper designs a modified hybrid coordinate system and, based on it, develops a fast time domain imaging algorithm for high-squint diving maneuvering platform SAR. Benefiting from the equivalent slant range model in the modified hybrid coordinate system, the sensitivity of the spectrum to the squint angle is reduced, and the spatial variation of the spectrum is eliminated. Hence, the spectral preprocessing function can be easily designed to effectively compress and merge the spectrum, which improves the efficiency and accuracy of image merging. Both simulated and raw data are processed to validate the superior performance of the proposed algorithm.
Keywords: fast time domain algorithm; high-squint diving maneuvering platform; hybrid coordinate system; equivalent slant range model; spectral preprocessing
Abstract: End-to-end deep learning is the main technology for speech keyword spotting. Research has focused on exploring better network structures, modeling units, and search strategies, and has made considerable progress. However, less attention has been paid to training efficiency. In this paper, a novel class uncertainty sampling (CUS) strategy is proposed to select effective samples for each training epoch. Since only a subset is used, much training time is saved. The core idea of CUS is to measure the class uncertainty of samples with the forward information of the output layer during the middle and late training stages, and to select samples with a probability given by their class uncertainty. Therefore, more attention is paid to samples near the decision boundary, which are prone to missed detections or false alarms. Furthermore, the proposed method can shield the interference of mislabeled samples. Experimental results on the AISHELL-1 Mandarin dataset show that fast convergence and better training performance are achieved. Compared with the conventional training strategy, the average training time and the average convergence time are shortened by 60% and 47.5%, respectively. At a false accept rate (FAR) of 0.5 FP/h, the false reject rate (FRR) is reduced from 4.75% to 3.65%, a relative reduction of 30.1%, and the maximum term weighted value (MTWV) is increased from 0.8374 to 0.8531. Moreover, it is experimentally verified that the method can shield most of the mislabeled samples. These conclusions are confirmed by experiments on the large-scale AISHELL-2 Mandarin dataset.
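The sampling idea behind CUS can be sketched as follows. The abstract does not define how class uncertainty is computed or how mislabeled samples are shielded, so this sketch assumes the normalized entropy of the softmax output as the uncertainty measure and draws samples with probability proportional to it; the names `class_uncertainty` and `select_samples` are hypothetical.

```python
import numpy as np

def class_uncertainty(logits):
    """Normalized softmax entropy in [0, 1] (an assumed uncertainty measure)."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    return entropy / np.log(logits.shape[1])

def select_samples(logits, keep_fraction, rng):
    """Draw a training subset, favoring samples with high class uncertainty."""
    u = class_uncertainty(logits) + 1e-8  # keep every probability strictly positive
    n_keep = max(1, int(round(keep_fraction * len(u))))
    return rng.choice(len(u), size=n_keep, replace=False, p=u / u.sum())

# Output-layer logits for four samples: two confident, two near the decision boundary.
logits = np.array([[10.0, 0.0], [0.0, 10.0], [0.1, 0.0], [0.0, 0.0]])
rng = np.random.default_rng(0)
print(class_uncertainty(logits))
print(select_samples(logits, keep_fraction=0.5, rng=rng))
```

In a training loop, the logits would come from a forward pass at the start of each epoch in the middle and late stages, and only the selected indices would be used for the backward pass.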
Abstract: Electrocardiogram (ECG) signals are widely used in the medical detection of heart disease, and wearable dynamic ECG monitoring devices enable the detection and early warning of cardiac arrhythmias. Compared to resting ECG signals, dynamic ECG signals are more susceptible to interference from motion artifacts during data acquisition. These motion artifacts can obscure critical information within the ECG signal, limiting its clinical utility. Taking into account the local and global characteristics of the ECG signal and exploiting its periodicity, this paper investigates a two-stage adaptive threshold filtering algorithm that processes the low-frequency P and T waves and the high-frequency QRS complex separately and is suitable for motion artifact filtering in single-channel ECG signals. In the first step, motion artifacts in the low-frequency part of the ECG signal are suppressed by a multi-resolution threshold. In the second step, the imbalanced QRS wave affected by motion artifacts is repaired by an adaptive threshold, adjusting the QRS waveform to reduce motion artifacts in the high-frequency portion of the ECG signal, while adaptive thresholds are set to process the wavelet coefficients corresponding to the P wave and T wave. Wavelet coefficients beyond the adaptive threshold range are adjusted via waveform scaling to further suppress the low-frequency motion artifacts. The performance of the algorithm is evaluated using different ECG databases. When the input SNR is -10 dB and 10 dB, the SNR of the ECG signal increases by 10.9122 dB and 4.3912 dB, respectively; the correlation coefficients between the filtered ECG signal and the pure ECG signal are 0.6876 and 0.9783, respectively; and the correlation coefficients between the extracted motion artifacts and the original motion artifacts are 0.9530 and 0.8529, respectively. The experimental results show that, under different noise levels, the proposed algorithm can effectively recover ECG waveform characteristics contaminated by motion artifacts by exploiting the advantages of the adaptive threshold, retains the clinical information of ECG signals to the maximum extent, and can serve as an effective tool for filtering motion artifacts in wearable ECG devices.
Abstract: In 6G communication systems, the Fresnel region gradually expands as the antenna size increases, and the existing far-field assumption introduces serious energy diffusion; that is, the angle domain is no longer sparse. Near-field communication uses a spherical wavefront for modeling, and the channel model depends on both the angle and the distance from the user to the base station, which makes it possible to estimate angles and distances while communicating, enabling integrated sensing and communication (ISAC). In this paper, a near-field model based on polar coordinates is proposed to solve the ISAC problem in the near-field environment. We transform ISAC into a sparse estimation problem through non-uniform meshing and then use sparse Bayesian learning models for active user detection, location awareness, and communication. In addition, by adopting differential modulation, the proposed algorithm can realize blind ISAC without pilots and effectively improves the spectral efficiency of the communication system. Simulation results show that the proposed ISAC algorithm achieves higher sensing accuracy and better BER performance than uniform region partitioning and existing methods in the literature.
Keywords: near field communication; integrated sensing and communication; non-uniform meshing model; sparse estimation; Bayesian method
Abstract: Short-text classification is broadly used and is currently an active research topic. However, its performance is hampered by the scarcity of annotated data for short texts and by the challenges of centralized training on private data. To address these issues, we propose Fed-ASSL-HGAT (Active Semi-Supervised Heterogeneous Graph ATtention network model based on Federated learning). This model utilizes an active semi-supervised learning (ASSL) framework to generate high-quality labeled samples for empowering the heterogeneous graph attention network (HGAT) model. Additionally, federated learning is introduced to facilitate the joint training of the models deployed on different nodes, thereby satisfying the requirements of data privacy protection. The proposed ASSL framework significantly reduces the annotation difficulty by transforming the multi-class annotation task into a binary classification task. To mitigate information loss, we employ a selection strategy based on information gain to filter soft and hard labels. Semi-supervised learning is employed to select positive and negative samples with high accuracy and stability for pseudo-labeling, thereby ensuring the labeling quality. Experimental results demonstrate that the proposed ASSL-HGAT (Active Semi-supervised Learning Empowered Heterogeneous Graph Attention Network) model improves F1 scores by 2.45%, 8.11%, and 7.46% compared with the HGAT baseline model on the AGNews, Snippets, and TagMyNews datasets, respectively. By incorporating federated learning, the Fed-ASSL-HGAT model can meet the performance requirements without sacrificing data privacy.
摘要:The potential of sparse convolution for single-target tracking in LiDAR (Light Detection And Ranging) point clouds has not been fully explored. The vast majority of point cloud tracking algorithms use point-based backbone networks, which incur higher computation costs and model target-aware relationships insufficiently. To address this problem, this paper proposes a 3D target tracking algorithm based on a sparse convolutional framework and combines it with a point-voxel dual-channel relationship modeling module to embed target discrimination information in this sparse framework. Firstly, a 3D convolutional residual network extracts the features of the template and the search area separately, and deconvolution is then used to obtain the pointwise features needed for spatial positions in tracking tasks. Secondly, the relationship modeling module calculates a semantic similarity query table from these template and search area features. To capture fine-grained correlation, the module works in two channels. In the spatial point channel, it uses the nearest neighbor algorithm to find the template points for each search area point and extracts the corresponding features from the query table. In the voxel channel, it constructs local multi-scale voxels centered on each search area point and uses the accumulated similarity of the template points falling into each voxel unit as clues to extract features. Finally, the fused dual-channel features are sent to a candidate bounding box generation module based on the bird's-eye view to estimate the target bounding box. To verify the proposed method, we evaluated it on the KITTI and NuScenes datasets. Compared with the baseline algorithm adopting sparse convolution, the mean success and precision rates improved by 11.0% and 12.0%, respectively. The proposed method not only inherits the efficiency of sparse convolution but also improves tracking accuracy.
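The spatial point channel described in the abstract above (a similarity query table plus a nearest-neighbor lookup) can be sketched as follows. This is only a minimal illustration under stated assumptions: cosine similarity as the similarity measure, Euclidean k-nearest neighbors, and all function names are hypothetical choices, not details confirmed by the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity_table(search_feats, template_feats):
    """Query table: entry [i][j] compares search point i with template point j."""
    return [[cosine(s, t) for t in template_feats] for s in search_feats]


def knn_similarity(search_xyz, template_xyz, table, k):
    """For each search point, gather table entries of its k nearest template points."""
    gathered = []
    for i, p in enumerate(search_xyz):
        # Rank template points by Euclidean distance to search point i.
        order = sorted(range(len(template_xyz)),
                       key=lambda j: math.dist(p, template_xyz[j]))
        gathered.append([table[i][j] for j in order[:k]])
    return gathered


if __name__ == "__main__":
    table = similarity_table([[1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(knn_similarity([(0, 0, 0)], [(5, 0, 0), (1, 0, 0)], table, k=1))
```

The voxel channel would replace the k-nearest-neighbor selection with a sum of table entries over the template points that fall inside each voxel around the search point.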
摘要:In recent years, convolutional neural networks have demonstrated outstanding performance in HSIC (Hyperspectral Image Classification). However, improving model performance typically involves adopting deeper and broader network architectures, which increases the number of parameters and operations and thus hinders deployment on airborne or on-board devices. To this end, this paper introduces an HSIC method based on the LiteFCTMN (Lightweight Fully-Connected Tensorial Mapping Network). We design two convolutional units based on the mapping scheme of FCTN (Fully-Connected Tensor Network) decomposition and the structural characteristics of HSIs. By mapping the original convolution kernel to multiple small-sized convolution kernels with fully-connected structures, the complexity of the novel units is reduced while their expressiveness is improved. In addition, the RDT (Residual Double-Branch Tensorial) module is constructed using the designed units. In this module, two branches share the same weights, and a channel split operation is employed to reduce the number of feature channels, thereby reducing complexity. The proposed model leverages both local spatial-spectral information from RDT and global spectral information from the new units, resulting in enhanced classification performance and reduced hardware consumption. Experimental results on three widely used HSI datasets demonstrate that the proposed model achieves superior classification performance and lower complexity compared with state-of-the-art works.
摘要:In disaster scenarios, using UAVs (Unmanned Aerial Vehicles) for resource delivery holds considerable promise. However, the complexity and volatility of emergency environments, along with the spatial and temporal uncertainties associated with various unexpected events, can lead to inaccuracies in assessing resource demands at target points, which in turn may affect the UAV task allocation strategies in resource distribution. To address this issue, a two-stage robust optimization approach is introduced into the UAV task assignment model. By integrating UAV assignment with task allocation, the model leverages the resources of the UAV fleet to minimize task assignment costs under maximum demand variability. This paper models the relationship between injury severity levels and resource demand variations, categorizing resource demand into three levels to accurately represent variations in the total task allocation cost. The C&CG (Column-and-Constraint Generation) algorithm is used to solve UAV task assignment under uncertain resource demand. Finally, three types of experiments were designed, and the simulation results validated the effectiveness and superiority of the algorithm. Compared with the deterministic model, the algorithm showed greater robustness in handling demand variation.
摘要:Light field images, which capture light information from every position in a scene, hold broad application prospects in fields such as electronic imaging, medical imaging, and virtual reality. Light field image quality assessment (LFIQA) aims to measure the quality of such images, yet current methods confront significant challenges arising from the heterogeneity between visual effects and textual modalities. To address these issues, this paper proposes a multi-modal light field image quality assessment model grounded in text-vision integration. Specifically, for the visual modality, we devise a multi-task model that enriches the crucial representational features of light field images by incorporating an edge auto-thresholding algorithm. On the textual side, we identify noise categories in light field images by comparing input noise features with predicted noise features, thereby validating the importance of noise prediction in optimizing visual representations. Building upon these findings, we further introduce an optimized universal noise text configuration approach combined with an edge enhancement strategy, which notably enhances the accuracy and generalization capabilities of the baseline model in LFIQA. Additionally, ablation experiments are conducted to assess the contribution of each component to the overall model performance, thereby verifying the effectiveness and robustness of the proposed method. Experimental results demonstrate that our approach performs well not only on public datasets such as Win5-LID and NBU-LF1.0 but also on fused datasets. Compared with state-of-the-art algorithms, our method achieves performance improvements of 2% and 6%, respectively, on the two databases. The noise verification strategy and configuration method presented in this paper not only provide valuable insights for light field noise prediction tasks but can also serve as auxiliary tools for other types of noise prediction.
关键词:image quality assessment;light field images;visual-textual model;multi-task model;noise prediction;image enhancement
摘要:This paper proposes a method based on subgraph rephrasing to solve the problem of unseen predicates in question generation over knowledge graphs. Traditional KBQG (Question Generation over Knowledge Base) methods mainly use annotated Q&A (Question and Answer) data, that is, pairs of questions and logical forms, to generate questions. However, annotated data cannot fully cover all predicates in the knowledge graph, so generating questions with unseen predicates remains a challenge. In this paper, we propose a semantic decoupling method based on subgraph structure. By decomposing the subgraph corresponding to a complex question into atomic subgraphs, a multi-hop subgraph containing unseen predicates can be divided into single-hop subgraphs that are easy to handle. In addition, we design a subgraph rephrasing procedure that trains a subgraph rewriter on large-scale unsupervised data obtained by sampling subgraphs over the predicates in the dataset. The subgraph rewriter provides a natural language form for subgraphs and effective information for generating questions. This paper quantitatively analyzes the performance of the model at different difficulty levels. The experimental results on GrailQA and other datasets show that our method achieves state-of-the-art performance.
摘要:It is believed that the arrangement and number of registers in a processor chip have a heavy impact on the operating speed of the processor. This has driven improvements in the structure of the on-chip cache, whose central task is to provide fast access to the data in registers in terms of both time and space. This fast access to registers can be investigated via the process by which the data and structures in the on-chip cache are accessed. By introducing a new on-chip cache, the percolation cache, we show that the temporal and spatial just-in-time locality it provides contributes substantially to shortening memory access delay by raising the hit rate when the processor core accesses the percolation cache.
摘要:The quantum image sensor (QIS) has ultra-high single-photon sensitivity and spatial resolution, making it a promising alternative to the CMOS image sensor (CIS) as the next-generation image sensor. However, QIS image reconstruction differs from traditional image reconstruction in that it aims to recover the original scene from binary measurements. Existing methods are either model-based or deep learning-based. Model-based methods are largely based on optimization and are highly sensitive to the selection of hyperparameters, while deep learning-based methods require designing and training separate models for QIS image reconstruction tasks that differ only slightly in detail, which is inflexible and considerably limits their usefulness. To tackle these problems, this paper proposes a tuning-free plug-and-play alternating direction method of multipliers (TFPnP-ADMM) QIS image reconstruction method, which adaptively selects appropriate parameters for different input images with various oversampling factors so as to achieve better image reconstruction performance. Specifically, the selection of the parameters that need to be manually tuned in the QIS image reconstruction process under the plug-and-play (PnP) framework is modeled as a sequential decision problem, and a mixed model-free and model-based reinforcement learning algorithm is introduced to learn an optimal strategy that determines the hyperparameters at each iteration for different input images. The experimental results on synthetic and real datasets demonstrate that, compared with existing state-of-the-art methods, the proposed method improves the peak signal-to-noise ratio by approximately 0.44~0.60 dB under oversampling rates of 4, 6, and 8. Furthermore, the visual results show that the proposed method retains more texture details. Real extremely low-light QIS image data is available at https://github.com/ying-fu/Real-SPAD-Dataset.
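To make the idea of recovering a scene from binary measurements concrete, the sketch below simulates a single one-bit pixel and recovers its exposure with a closed-form maximum-likelihood estimate. It assumes a Poisson photon-count model with a detection threshold of one photon and unit gain, and treats repeated frames as independent. These modelling choices and the function names are assumptions for illustration. This is not the TFPnP-ADMM method of the abstract, only the simplest per-pixel baseline for the same measurement model.

```python
import math
import random


def simulate_one_bit(theta, num_frames, rng):
    """Binary frames of one pixel with mean photon count theta per frame.

    With Poisson arrivals and a threshold of one photon, the pixel reads 1
    with probability 1 - exp(-theta).
    """
    p_one = 1.0 - math.exp(-theta)
    return [1 if rng.random() < p_one else 0 for _ in range(num_frames)]


def mle_exposure(bits):
    """Maximum-likelihood exposure estimate: -log(fraction of zero readings)."""
    n = len(bits)
    frac_zero = 1.0 - sum(bits) / n
    # Clip so that an all-ones pixel does not produce log(0).
    frac_zero = max(frac_zero, 0.5 / n)
    return -math.log(frac_zero)


if __name__ == "__main__":
    rng = random.Random(0)
    bits = simulate_one_bit(0.7, 20000, rng)
    print(round(mle_exposure(bits), 3))
```

Because this estimate treats every pixel independently, it becomes noisy when few frames are available, which is where an image prior (as in the plug-and-play framework) can help.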
摘要:Gray failures are minor switch malfunctions that have a subtle impact on production networks. However, when these minor malfunctions are superimposed on each other or on a new malfunction, they can paralyze production networks. Thus, the detection of gray failures is essential to the stability of production networks. Prior methods focus on using the control plane to collect flow records from data plane switches and process them to detect packet loss. However, they fall short due to (1) the high resource overhead of handling massive flow records and (2) non-trivial delays that result in out-of-date failure detection. Recently, the emergence of programmable switches provides a promising alternative: the detection of gray failures can be offloaded to line-rate switch ASIC pipelines, enabling low-cost, low-latency, and high-accuracy in-network gray failure detection. This paper presents an illustrative survey of programmable switch-assisted techniques for in-network gray failure detection. First, we describe the concept of gray failures, their prevalence, and their impact on production networks. Second, we analyze and discuss the characteristics of state-of-the-art gray failure detection techniques built on programmable switches. Third, we illustrate the principle and workflow of each detection technique. Fourth, we build a real-world testbed to evaluate the metrics of each detection technique. Finally, we highlight the problems and challenges faced by existing techniques.
摘要:The rapid development of the Internet of Things (IoT) has spawned a large number of new applications. IoT empowers ordinary devices with computing and networking capabilities by connecting sensors, wearable devices, smart meters, and other low-data-rate, low-power end devices. Traditional wireless technologies struggle to meet the large-scale, low-power, long-distance connectivity requirements of IoT. How to lower the barrier to device access and achieve low-power, long-distance device connectivity is an important challenge facing current IoT systems. LoRa, as a representative low-power wide-area network (LPWAN) technology, effectively solves the problem of long-distance connectivity for low-power devices and has become a core supporting technology of the IoT. However, LoRa still faces three important challenges in practice: (1) high-concurrency transmission in large-scale connection scenarios leads to signal collisions, making it difficult for devices to access the network concurrently; (2) signal attenuation over long-distance wireless links makes it difficult to transmit weak signals reliably; (3) interference from heterogeneous protocols in shared IoT channels is prominent, making heterogeneous coexistence difficult. This article outlines the current research progress of LoRa, focusing on these three challenges and the corresponding technical advances. To address high-concurrency collisions, existing research has proposed collision avoidance and concurrent decoding methods; to address weak signals, it has explored enhanced weak-signal transmission and receiver decoding optimization; to address competition among heterogeneous protocols, it has designed various cross-protocol communication mechanisms. This article reviews the latest research progress of LoRa, analyzes the innovations and limitations of existing research, and points out directions for future research.