
A Speech Steganalysis Algorithm Based on Multi-Feature Fusion and BiLSTM
Traditional speech steganography for the internet low bit rate codec (iLBC) mainly embeds data in a single stage: line spectral frequency (LSF) coefficient vector quantization, codebook search vector quantization, or gain quantization. This makes joint steganalysis across multiple stages difficult. To address this, an iLBC speech steganalysis algorithm based on multi-feature fusion and a bi-directional long short-term memory (BiLSTM) network is proposed. The impact of steganography on the iLBC parameters is first analyzed for the LSF coefficient vector quantization, dynamic codebook search, and gain quantization processes. Multiple steganographic features are then extracted from these three stages and fed to three corresponding BiLSTM-based detection networks. Finally, the outputs of the detection networks are fused to obtain the final steganalysis result. Experimental results show that the proposed algorithm achieves joint steganalysis across multiple stages and maintains an average detection accuracy above 90% even when the speech duration is short.
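
The abstract states that the outputs of the three detection networks are fused but does not give the fusion rule. The following is a minimal sketch, assuming a simple OR-style rule in which a sample is flagged as stego when any branch score reaches a threshold. The function name `fuse_scores`, the threshold of 0.5, and the example scores are all hypothetical; only the branch names come from Table 1.

```python
from typing import Dict


def fuse_scores(scores: Dict[str, float], threshold: float = 0.5) -> bool:
    """Return True (stego) if any branch score reaches the threshold.

    ASSUMPTION: this max/OR rule is illustrative only; the fusion
    strategy actually used by the algorithm is not given in the abstract.
    """
    if not scores:
        raise ValueError("at least one branch score is required")
    return max(scores.values()) >= threshold


# Hypothetical per-branch stego probabilities for two samples.
cover = {"LSF-BiLSTM": 0.08, "CB-BiLSTM": 0.12, "GQ-BiLSTM": 0.05}
stego_in_gain = {"LSF-BiLSTM": 0.10, "CB-BiLSTM": 0.20, "GQ-BiLSTM": 0.91}

print(fuse_scores(cover))          # no branch fires
print(fuse_scores(stego_in_gain))  # the gain-quantization branch fires
```

An OR-style rule is one natural choice for joint detection, since embedding in any single stage should be enough to flag the sample; weighted averaging or a learned fusion layer would be alternatives.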
Keywords: joint steganalysis / internet low bit rate codec (iLBC) / bi-directional long short-term memory (BiLSTM) network / steganographic feature extraction / multi-feature fusion

Table 1. Network parameters of each model

| Model | | BiLSTM1 | BiLSTM2 | Flatten | Dense |
|---|---|---|---|---|---|
| LSF-BiLSTM | Input | 6, | 12, | 12, | 12× |
| LSF-BiLSTM | Output | 12, | 12, | 12× | 1 |
| CB-BiLSTM | Input | 15, | 30, | 30, | 30× |
| CB-BiLSTM | Output | 30, | 30, | 30× | 1 |
| GQ-BiLSTM | Input | 15, | 30, | 30, | 30× |
| GQ-BiLSTM | Output | 30, | 30, | 30× | 1 |
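
The layer sizes in Table 1 follow a regular pattern, which can be traced with a few lines of shape arithmetic. This is a sketch under stated assumptions: each BiLSTM layer is assumed to use as many units per direction as the model's input feature dimension, with the two directions concatenated, which reproduces the 6 → 12 and 15 → 30 entries. The second dimension of each entry is missing from the table, so `seq_len` is a hypothetical placeholder, and `trace_shapes` is an illustrative helper rather than part of the published model.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def trace_shapes(feature_dim: int, seq_len: int) -> List[Tuple[str, Shape, Shape]]:
    """List (layer, input shape, output shape) for one detection network.

    ASSUMPTIONS: units per direction == feature_dim (inferred from the
    6 -> 12 and 15 -> 30 entries in Table 1); seq_len stands in for the
    dimension that is missing from the table.
    """
    bi_out = 2 * feature_dim  # forward and backward outputs concatenated
    return [
        ("BiLSTM1", (feature_dim, seq_len), (bi_out, seq_len)),
        ("BiLSTM2", (bi_out, seq_len), (bi_out, seq_len)),
        ("Flatten", (bi_out, seq_len), (bi_out * seq_len,)),
        ("Dense", (bi_out * seq_len,), (1,)),
    ]


# Feature dimensions taken from Table 1; seq_len=10 is an arbitrary example.
for model, feature_dim in [("LSF-BiLSTM", 6), ("CB-BiLSTM", 15), ("GQ-BiLSTM", 15)]:
    print(model)
    for layer, shape_in, shape_out in trace_shapes(feature_dim, seq_len=10):
        print(f"  {layer}: {shape_in} -> {shape_out}")
```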

Table 2. Detection rates of different steganalyzers on Chinese speech samples with 30 ms frames, by sample duration

| Steganography method | Steganalyzer | 0.1 s | 0.3 s | 0.5 s | 0.7 s | 1 s |
|---|---|---|---|---|---|---|
| QIMC | FCEM | 1 | 0.972 | 1 | 0.997 | 1 |
| QIMC | SRCNet | 0.024 | 0.354 | 0 | 0 | 0.009 |
| QIMC | G-LSTM | 0.588 | 0.547 | 0.57 | 0.628 | 0.595 |
| QIMC | SpecResNet | 0.456 | 0.574 | 0.606 | 0.572 | 0.623 |
| QIMC | MSFNet | 0.948 | 0.948 | 0.959 | 0.988 | 0.959 |
| FCB | FCEM | 0.153 | 0.17 | 0.002 | 0.021 | 0.051 |
| FCB | SRCNet | 0.995 | 0.942 | 1 | 1 | 0.999 |
| FCB | G-LSTM | 0.278 | 0.505 | 0.004 | 0.001 | 0.008 |
| FCB | SpecResNet | 0.488 | 0.568 | 0.636 | 0.565 | 0.654 |
| FCB | MSFNet | 0.992 | 1 | 1 | 1 | 0.997 |
| GQS | FCEM | 0.153 | 0.17 | 0.002 | 0.021 | 0.051 |
| GQS | SRCNet | 0.033 | 0.376 | 0.002 | 0 | 0.003 |
| GQS | G-LSTM | 0.814 | 0.826 | 0.817 | 0.914 | 0.944 |
| GQS | SpecResNet | 0.48 | 0.584 | 0.617 | 0.567 | 0.633 |
| GQS | MSFNet | 0.845 | 0.938 | 0.995 | 0.995 | 0.998 |
| HS | FCEM | 0.153 | 0.17 | 0.002 | 0.021 | 0.051 |
| HS | SRCNet | 0.996 | 0.938 | 1 | 1 | 1 |
| HS | G-LSTM | 0.618 | 0.513 | 0.79 | 0.83 | 0.869 |
| HS | SpecResNet | 0.526 | 0.577 | 0.607 | 0.586 | 0.677 |
| HS | MSFNet | 0.996 | 1 | 1 | 1 | 1 |

Table 3. Detection rates on 0.1 s Chinese speech samples with 30 ms frames, by embedding rate

| Steganalyzer | Steganography method | 0.1 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
|---|---|---|---|---|---|---|---|
| FCEM | QIMC | 0.998 | 1 | 1 | 1 | 1 | 1 |
| FCEM | FCB | 0.153 | 0.153 | 0.153 | 0.153 | 0.153 | 0.153 |
| FCEM | GQS | 0.153 | 0.153 | 0.153 | 0.153 | 0.153 | 0.153 |
| FCEM | HS | 0.153 | 0.153 | 0.153 | 0.153 | 0.153 | 0.153 |
| FCEM | Average | 0.364 | 0.365 | 0.365 | 0.365 | 0.365 | 0.365 |
| SRCNet | QIMC | 0.034 | 0.029 | 0.024 | 0.036 | 0.032 | 0.024 |
| SRCNet | FCB | 0.932 | 0.958 | 0.993 | 0.992 | 0.994 | 0.995 |
| SRCNet | GQS | 0.03 | 0.03 | 0.031 | 0.032 | 0.027 | 0.033 |
| SRCNet | HS | 0.023 | 0.027 | 0.027 | 0.027 | 0.052 | 0.996 |
| SRCNet | Average | 0.255 | 0.261 | 0.269 | 0.272 | 0.276 | 0.512 |
| G-LSTM | QIMC | 0.553 | 0.54 | 0.542 | 0.589 | 0.577 | 0.588 |
| G-LSTM | FCB | 0.167 | 0.219 | 0.237 | 0.231 | 0.264 | 0.278 |
| G-LSTM | GQS | 0.5 | 0.541 | 0.541 | 0.771 | 0.802 | 0.814 |
| G-LSTM | HS | 0.5 | 0.561 | 0.669 | 0.505 | 0.457 | 0.618 |
| G-LSTM | Average | 0.43 | 0.465 | 0.497 | 0.524 | 0.525 | 0.575 |
| SpecResNet | QIMC | 0.463 | 0.451 | 0.471 | 0.439 | 0.477 | 0.456 |
| SpecResNet | FCB | 0.469 | 0.483 | 0.482 | 0.482 | 0.472 | 0.488 |
| SpecResNet | GQS | 0.497 | 0.482 | 0.491 | 0.495 | 0.489 | 0.48 |
| SpecResNet | HS | 0.532 | 0.517 | 0.535 | 0.532 | 0.506 | 0.526 |
| SpecResNet | Average | 0.49 | 0.483 | 0.495 | 0.487 | 0.486 | 0.488 |
| MSFNet | QIMC | 0.84 | 0.955 | 0.971 | 0.977 | 0.962 | 0.948 |
| MSFNet | FCB | 0.966 | 0.969 | 0.987 | 0.993 | 0.997 | 0.992 |
| MSFNet | GQS | 0.646 | 0.637 | 0.658 | 0.804 | 0.829 | 0.845 |
| MSFNet | HS | 0.615 | 0.59 | 0.647 | 0.748 | 0.757 | 0.996 |
| MSFNet | Average | 0.767 | 0.788 | 0.816 | 0.881 | 0.886 | 0.945 |
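
Each average row in these tables is the arithmetic mean of a steganalyzer's detection rates over the four steganography methods (QIMC, FCB, GQS, HS). The sketch below recomputes it for the MSFNet rows of Table 3 and compares the result with the reported average row. The helper name `column_means` is illustrative; the numbers are copied from the table.

```python
# MSFNet detection rates from Table 3, one list per steganography method,
# ordered by embedding rate 0.1, 0.2, 0.4, 0.6, 0.8, 1.
msfnet_rates = {
    "QIMC": [0.84, 0.955, 0.971, 0.977, 0.962, 0.948],
    "FCB": [0.966, 0.969, 0.987, 0.993, 0.997, 0.992],
    "GQS": [0.646, 0.637, 0.658, 0.804, 0.829, 0.845],
    "HS": [0.615, 0.59, 0.647, 0.748, 0.757, 0.996],
}
reported_average = [0.767, 0.788, 0.816, 0.881, 0.886, 0.945]


def column_means(rates):
    """Average the detection rate over all methods for each embedding rate."""
    columns = zip(*rates.values())
    return [sum(column) / len(column) for column in columns]


means = column_means(msfnet_rates)
for reported, mean in zip(reported_average, means):
    # The two values should agree to within three-decimal rounding.
    print(f"reported={reported:.3f} recomputed={mean:.4f}")
```

The same check can be applied to any steganalyzer block in Tables 3 to 6 by swapping in its four method rows.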

Table 4. Detection rates on 0.1 s Chinese speech samples with 20 ms frames, by embedding rate

| Steganalyzer | Steganography method | 0.1 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
|---|---|---|---|---|---|---|---|
| FCEM | QIMC | 0.959 | 0.962 | 0.978 | 0.998 | 0.999 | 1 |
| FCEM | FCB | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 |
| FCEM | GQS | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 |
| FCEM | HS | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 | 0.013 |
| FCEM | Average | 0.25 | 0.25 | 0.254 | 0.259 | 0.26 | 0.26 |
| SRCNet | QIMC | 0.002 | 0.003 | 0.001 | 0.003 | 0 | 0.003 |
| SRCNet | FCB | 0.987 | 0.991 | 1 | 1 | 1 | 0.999 |
| SRCNet | GQS | 0.005 | 0.008 | 0.004 | 0.007 | 0.004 | 0.009 |
| SRCNet | HS | 0.005 | 0.01 | 0.008 | 0.016 | 0.41 | 0.988 |
| SRCNet | Average | 0.25 | 0.253 | 0.253 | 0.257 | 0.354 | 0.502 |
| G-LSTM | QIMC | 0.44 | 0.438 | 0.442 | 0.472 | 0.502 | 0.462 |
| G-LSTM | FCB | 0.159 | 0.078 | 0.109 | 0.117 | 0.133 | 0.122 |
| G-LSTM | GQS | 0.416 | 0.564 | 0.743 | 0.854 | 0.844 | 0.811 |
| G-LSTM | HS | 0.51 | 0.559 | 0.7 | 0.682 | 0.627 | 0.719 |
| G-LSTM | Average | 0.381 | 0.41 | 0.499 | 0.518 | 0.527 | 0.529 |
| SpecResNet | QIMC | 0.528 | 0.524 | 0.531 | 0.549 | 0.528 | 0.531 |
| SpecResNet | FCB | 0.495 | 0.525 | 0.534 | 0.587 | 0.617 | 0.556 |
| SpecResNet | GQS | 0.487 | 0.499 | 0.492 | 0.52 | 0.498 | 0.51 |
| SpecResNet | HS | 0.485 | 0.487 | 0.501 | 0.518 | 0.552 | 0.523 |
| SpecResNet | Average | 0.499 | 0.509 | 0.515 | 0.544 | 0.549 | 0.53 |
| MSFNet | QIMC | 0.862 | 0.925 | 0.948 | 0.964 | 0.961 | 0.967 |
| MSFNet | FCB | 0.996 | 0.996 | 0.999 | 0.999 | 1 | 1 |
| MSFNet | GQS | 0.637 | 0.712 | 0.794 | 0.882 | 0.876 | 0.872 |
| MSFNet | HS | 0.69 | 0.719 | 0.837 | 0.796 | 0.909 | 1 |
| MSFNet | Average | 0.796 | 0.838 | 0.895 | 0.91 | 0.937 | 0.96 |

Table 5. Detection rates on 0.1 s English speech samples with 30 ms frames, by embedding rate

| Steganalyzer | Steganography method | 0.1 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
|---|---|---|---|---|---|---|---|
| FCEM | QIMC | 0.908 | 0.989 | 0.998 | 0.999 | 0.997 | 1 |
| FCEM | FCB | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| FCEM | GQS | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| FCEM | HS | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| FCEM | Average | 0.227 | 0.247 | 0.25 | 0.25 | 0.249 | 0.25 |
| SRCNet | QIMC | 0.008 | 0.013 | 0.019 | 0.018 | 0.014 | 0.017 |
| SRCNet | FCB | 0.959 | 0.959 | 0.996 | 1 | 0.997 | 0.999 |
| SRCNet | GQS | 0.019 | 0.016 | 0.021 | 0.018 | 0.019 | 0.018 |
| SRCNet | HS | 0.014 | 0.015 | 0.018 | 0.014 | 0.029 | 0.996 |
| SRCNet | Average | 0.25 | 0.251 | 0.264 | 0.275 | 0.265 | 0.508 |
| G-LSTM | QIMC | 0.3 | 0.335 | 0.371 | 0.356 | 0.37 | 0.417 |
| G-LSTM | FCB | 0.46 | 0.61 | 0.7 | 0.59 | 0.68 | 0.84 |
| G-LSTM | GQS | 0.449 | 0.676 | 0.889 | 0.897 | 0.95 | 0.886 |
| G-LSTM | HS | 0.295 | 0.588 | 0.434 | 0.416 | 0.509 | 0.426 |
| G-LSTM | Average | 0.376 | 0.62 | 0.599 | 0.565 | 0.627 | 0.642 |
| SpecResNet | QIMC | 0.45 | 0.509 | 0.489 | 0.492 | 0.546 | 0.512 |
| SpecResNet | FCB | 0.418 | 0.43 | 0.429 | 0.47 | 0.482 | 0.527 |
| SpecResNet | GQS | 0.417 | 0.393 | 0.407 | 0.402 | 0.425 | 0.444 |
| SpecResNet | HS | 0.412 | 0.406 | 0.4 | 0.448 | 0.495 | 0.512 |
| SpecResNet | Average | 0.424 | 0.435 | 0.431 | 0.453 | 0.487 | 0.499 |
| MSFNet | QIMC | 0.712 | 0.963 | 0.98 | 0.983 | 0.971 | 0.951 |
| MSFNet | FCB | 0.946 | 0.945 | 0.988 | 0.988 | 0.994 | 0.992 |
| MSFNet | GQS | 0.522 | 0.726 | 0.887 | 0.905 | 0.907 | 0.871 |
| MSFNet | HS | 0.396 | 0.55 | 0.56 | 0.583 | 0.671 | 0.99 |
| MSFNet | Average | 0.647 | 0.796 | 0.854 | 0.865 | 0.886 | 0.951 |

Table 6. Detection rates on 0.1 s English speech samples with 20 ms frames, by embedding rate

| Steganalyzer | Steganography method | 0.1 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
|---|---|---|---|---|---|---|---|
| FCEM | QIMC | 0.982 | 0.991 | 0.996 | 0.998 | 1 | 1 |
| FCEM | FCB | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 |
| FCEM | GQS | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 |
| FCEM | HS | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 |
| FCEM | Average | 0.249 | 0.252 | 0.253 | 0.253 | 0.254 | 0.254 |
| SRCNet | QIMC | 0.001 | 0 | 0 | 0.003 | 0.002 | 0.006 |
| SRCNet | FCB | 0.99 | 0.989 | 0.999 | 0.998 | 0.999 | 1 |
| SRCNet | GQS | 0.004 | 0.003 | 0.005 | 0.003 | 0.004 | 0.004 |
| SRCNet | HS | 0.005 | 0.004 | 0.007 | 0.009 | 0.49 | 0.997 |
| SRCNet | Average | 0.25 | 0.249 | 0.253 | 0.253 | 0.374 | 0.502 |
| G-LSTM | QIMC | 0.408 | 0.468 | 0.489 | 0.531 | 0.524 | 0.518 |
| G-LSTM | FCB | 0.135 | 0.122 | 0.103 | 0.087 | 0.099 | 0.099 |
| G-LSTM | GQS | 0.446 | 0.691 | 0.745 | 0.809 | 0.91 | 0.709 |
| G-LSTM | HS | 0.569 | 0.541 | 0.858 | 0.793 | 0.635 | 0.721 |
| G-LSTM | Average | 0.39 | 0.308 | 0.549 | 0.555 | 0.542 | 0.512 |
| SpecResNet | QIMC | 0.452 | 0.422 | 0.51 | 0.502 | 0.346 | 0.356 |
| SpecResNet | FCB | 0.396 | 0.41 | 0.44 | 0.452 | 0.449 | 0.433 |
| SpecResNet | GQS | 0.392 | 0.388 | 0.4 | 0.376 | 0.375 | 0.381 |
| SpecResNet | HS | 0.386 | 0.38 | 0.388 | 0.389 | 0.399 | 0.408 |
| SpecResNet | Average | 0.407 | 0.4 | 0.435 | 0.43 | 0.392 | 0.395 |
| MSFNet | QIMC | 0.781 | 0.956 | 0.962 | 0.982 | 0.991 | 0.996 |
| MSFNet | FCB | 0.978 | 0.98 | 0.996 | 0.998 | 0.997 | 0.999 |
| MSFNet | GQS | 0.571 | 0.706 | 0.763 | 0.82 | 0.889 | 0.769 |
| MSFNet | HS | 0.6 | 0.603 | 0.866 | 0.8 | 0.863 | 1 |
| MSFNet | Average | 0.735 | 0.811 | 0.897 | 0.9 | 0.935 | 0.941 |