CN104021791B - Detecting method based on digital audio waveform sudden changes

Info

Publication number: CN104021791B
Application number: CN201410285152.0A
Authority: CN
Inventors: 徐晶
Original assignee: Guizhou University
Current assignee: Guangxi Pingguo Runmin Poverty Alleviation Development Co Ltd
Priority date: 2014-06-24
Filing date: 2014-06-24
Publication date: 2017-02-22
Anticipated expiration: 2034-06-24
Also published as: CN104021791A

Abstract

The invention discloses a method for detecting sudden changes in audio waveforms. It is a statistical discrimination method based on the ridge that a sudden waveform change produces in the speech spectrogram, and it belongs to the field of multimedia information security. The method targets waveform discontinuities introduced into digital audio by copy-and-paste operations. It analyzes, in the logarithmic domain of the spectrum, how the span of the ridge changes before and after an audio splicing point; constructs a ridge factor that describes the ridge bandwidth of one frame of the logarithmic audio spectrum and thereby represents the change in short-time energy; and uses a difference operator to distinguish ridge factors produced by naturally abrupt sounds from those produced by a sudden change in the waveform. The method comprises the following steps: applying a short-time Fourier transform and a logarithmic transformation to the audio signal to obtain the audio spectrum in the logarithmic domain; calculating the ridge factor of each frame of the spectrum; applying a difference transformation to the ridge factors; and testing the result to reach a decision. The invention can effectively identify sudden changes in audio waveforms and provides an effective way to detect the boundaries of digital audio editing operations.

Description

Method for detecting abrupt changes in digital audio waveforms

Technical field

The invention relates to the field of multimedia information security, and in particular to a method for detecting abrupt changes in digital audio waveforms.

Background art

The digitization of multimedia technology and advances in transmission technology have led to rapid growth in the use of digital audio. Digital audio is easy to transmit and distribute through physical or electronic systems, but these same advantages bring new problems: digital audio may be tampered with, intentionally or unintentionally, during recording and copying. Whether the cause is deliberate tampering that destroys integrity and authenticity or an error that occurs during transmission and distribution, the originality of the data is compromised. For information of special significance, such as court evidence, confidential departmental documents, and backups of historical documents, malicious tampering can have very serious consequences.

Copy-paste/deletion tampering of digital speech means copying one segment of the speech into another segment, or deleting a segment from a passage of speech. It is a simple and effective way to alter important information in speech, as shown in Figures 1 and 2 of the accompanying drawings. Because a single recording has consistent or similar noise and the same speaker voiceprint, it is difficult for the human ear to identify the tampered segments, so detecting this form of tampering is of considerable practical importance for judging the authenticity of speech.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method for detecting abrupt changes in digital audio waveforms. For a given passage of speech, the method determines whether the waveform contains an abrupt change caused by copying and pasting a speech segment or by deleting a segment, and it also locates the time range of the tampering, thereby confirming the authenticity of the speech and overcoming the shortcomings of the prior art.

The present invention is realized as follows. The method for detecting abrupt changes in digital audio waveforms comprises the following steps: 1) transform the audio signal to obtain the audio spectrum Y in the logarithmic domain, applying a logarithmic transformation to the obtained audio spectrum to obtain the logarithmic spectrum G; 2) perform energy binarization on the logarithmic spectrum G; 3) calculate the abrupt-change coefficient σ_t of each frame G_t of the logarithmic spectrum; 4) evaluate the abrupt-change coefficient σ_t to detect abrupt waveform changes and locate the region of abrupt change.

In step 1), the digital audio signal y of length h is divided into frames, giving a matrix with N_1 frames of frame length 2·N; a window function is applied and a short-time Fourier transform is performed, giving an audio spectrum Y of size N × N_1; a logarithmic transformation is then applied to Y to obtain the logarithmic spectrum G, also of size N × N_1.
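
Step 1) can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the default values follow Embodiment 1 (N = 128, overlap ratio 0.5, Hamming window), while the hop size derived from the overlap ratio, the choice to keep only complete frames, the natural logarithm, the small constant `eps`, and the function name `log_spectrogram` are all assumptions. The result is stored as (frames, frequency bins), that is, the transpose of the N × N_1 layout described above.

```python
import numpy as np

def log_spectrogram(y, N=128, overlap=0.5, eps=1e-10):
    """Frame y into length-2N frames, window, FFT, and take the log magnitude.

    Returns an array of shape (num_frames, N).
    """
    y = np.asarray(y, dtype=float)
    frame_len = 2 * N
    # Assumption: the hop between frames is frame_len * (1 - overlap).
    hop = int(round(frame_len * (1.0 - overlap)))
    if hop < 1:
        raise ValueError("overlap must be smaller than 1")
    if y.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Assumption: only complete frames are kept (no zero padding).
    num_frames = 1 + (y.size - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([y[t * hop : t * hop + frame_len] for t in range(num_frames)])
    # rfft of a length-2N frame gives N + 1 bins; keep the first N.
    spectrum = np.fft.rfft(frames * window, axis=1)[:, :N]
    # Assumption: natural log of the magnitude; eps avoids log(0).
    return np.log(np.abs(spectrum) + eps)

# One second of a 440 Hz tone sampled at 8 kHz (synthetic test input).
y = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
G = log_spectrogram(y)
```

With these settings the hop is 128 samples, so an 8000-sample signal yields 61 complete frames of 128 bins each.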

In step 2), the maximum value G_max and the minimum value G_min of the logarithmic spectrum G are calculated first. Let the energy value at each frequency of each frame be G_ti (1 ≤ t ≤ N, 1 ≤ i ≤ N_1); the energy binarization value δ(t, i) is then calculated by formula (1),

where λ is the threshold factor.
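
The binarization of step 2) might look like the sketch below. Formula (1) is not reproduced in this text, so the threshold used here, the point a fraction λ of the way from G_min to G_max, is an assumption chosen only because it uses exactly the three quantities the step names (G_max, G_min, λ). The default λ = 0.65 is the value set in Embodiment 1; the function name `binarize_energy` is hypothetical.

```python
import numpy as np

def binarize_energy(G, lam=0.65):
    """Map a log spectrum to a 0/1 matrix of low/high energy."""
    G = np.asarray(G, dtype=float)
    g_min, g_max = G.min(), G.max()
    # ASSUMPTION: threshold lies a fraction `lam` of the way from G_min to G_max.
    # The patent's own formula (1) may differ.
    threshold = g_min + lam * (g_max - g_min)
    return (G > threshold).astype(np.uint8)

# Tiny hand-made spectrum: 2 frames x 3 frequency bins.
G = np.array([[0.0, 2.0, 10.0],
              [7.0, 6.0, 9.0]])
Delta = binarize_energy(G)  # threshold is 6.5 for this input
```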

In step 3), the abrupt-change coefficient σ_t (1 ≤ t ≤ N_1) of each frame G_t of the logarithmic spectrum is calculated by formula (2):

$$\sigma_t = \frac{\sum_{i=1}^{n} \delta(t, i)}{n} \tag{2}$$
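
Formula (2), as given in claim 1, is the mean of the binary values of one frame, so it reduces to a row-wise average. The sketch below assumes the binary spectrum is stored as (frames, frequency bins); the function name `mutation_coefficients` and the sample matrix are illustrative only.

```python
import numpy as np

def mutation_coefficients(delta):
    """Return sigma_t: the fraction of high-energy bins in each frame."""
    delta = np.asarray(delta, dtype=float)
    # Mean over the frequency axis, one value per frame.
    return delta.mean(axis=1)

# 3 frames x 4 bins; the middle frame is high-energy at every frequency.
delta = np.array([[0, 0, 1, 0],
                  [1, 1, 1, 1],
                  [0, 1, 0, 0]])
sigma = mutation_coefficients(delta)
```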

In step 4), suppose that step 3) has yielded the abrupt-change coefficients σ_{i-1}, σ_i, and σ_{i+1} of frame i of the logarithmic spectrum G, denoted G_i, and of its two adjacent frames. If they satisfy

$$\sigma_i > 0.85 \quad \text{and} \quad |\sigma_i - \sigma_{i-1}| \cdot |\sigma_i - \sigma_{i+1}| > \frac{\sigma_i}{16}$$

then the audio is determined to contain an abrupt change, and frame i, G_i, is the detected region of the abrupt waveform change.
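
To see how the criterion stated in claim 1 (σ_i > 0.85 and |σ_i − σ_{i−1}| · |σ_i − σ_{i+1}| > σ_i / 16) separates an isolated peak from a sustained one, consider two triples of coefficients. The numbers are invented for illustration and are not taken from the patent's experiments.

```latex
\begin{aligned}
\text{Isolated peak: } & (\sigma_{i-1}, \sigma_i, \sigma_{i+1}) = (0.30,\ 0.90,\ 0.35) \\
& \sigma_i = 0.90 > 0.85, \qquad
  |0.90 - 0.30| \cdot |0.90 - 0.35| = 0.60 \cdot 0.55 = 0.33 > \tfrac{0.90}{16} = 0.05625 \\
& \Rightarrow \text{frame } i \text{ is flagged.} \\[4pt]
\text{Sustained level: } & (\sigma_{i-1}, \sigma_i, \sigma_{i+1}) = (0.88,\ 0.90,\ 0.89) \\
& \sigma_i = 0.90 > 0.85, \qquad
  |0.90 - 0.88| \cdot |0.90 - 0.89| = 0.02 \cdot 0.01 = 0.0002 < 0.05625 \\
& \Rightarrow \text{frame } i \text{ is not flagged.}
\end{aligned}
```

Both triples pass the level test, so it is the product of the two differences that rejects the sustained case.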

Compared with the prior art, the present invention exploits the fact that tampering by splicing introduces an abrupt change in the signal waveform, which causes a sudden short-time increase in energy across frequencies in that time segment. A difference algorithm distinguishes abrupt waveform changes caused by editing from the abrupt changes of strong signals. After the abrupt-change coefficients are calculated, traces of copy-paste/deletion tampering are detected and the time of tampering is accurately located. The method detects and localizes well: it identifies abrupt waveform changes even when they are inaudible to the human ear and invisible in the spectrogram, making it an effective technique for detecting copy-paste/deletion tampering of audio.

Description of the drawings

Fig. 1 is a schematic diagram of inserting an audio segment into audio material;

Fig. 2 is a schematic diagram of deleting an audio segment from audio material;

Fig. 3 is a schematic diagram of the abrupt waveform change at a splicing point in tampered speech;

Fig. 4 is a flowchart of the detection algorithm;

Fig. 5 is the waveform of the test speech;

Fig. 6 is the logarithmic spectrum of the test speech;

Fig. 7 shows the abrupt-change coefficients of the test speech.

Detailed description

Embodiment 1 of the present invention: a method for detecting abrupt changes in digital audio waveforms,

1) Transform the audio signal y to obtain the audio spectrum Y in the logarithmic domain: the digital audio signal y of length h is divided into frames, each of length 2·N (here N = 128), with overlap ratio l (here l = 0.5); the number of frames N_1 is then:

Let the framed signal be y_i, i = 1, ..., N_1. The discrete short-time Fourier transform is computed with formula (1) to obtain Y_ti, where w(N − m) is the window function (here a Hamming window).

The spectrogram is computed with formula (2), and a logarithmic transformation gives the logarithmic spectrum G_i (i = 1, ..., N_1), namely

All the logarithmic magnitude values, arranged as a matrix, form the logarithmic spectrum G of the speech signal, of size N × N_1;

2) Perform energy binarization on the logarithmic spectrum G_i: first calculate the maximum value G_max and the minimum value G_min of the logarithmic spectrum; then, for the energy value G_i(k), k = 1, ..., N, at each frequency of each frame, calculate the energy binarization value δ(i, k) by formula (3):

where λ is the threshold factor (here λ = 0.65). The matrix Δ formed by the values δ is defined as the short-time energy binary spectrum: δ(i, k) = 0 means that the k-th frequency component of the i-th frame has low energy, and δ(i, k) = 1 means that it has high energy;

3) Calculate the abrupt-change coefficient σ_i of each frame G_i of the logarithmic spectrum: when a speech segment is copied and pasted by hand, the audio signal shows an abrupt waveform change at the edit point. This introduces new frequency components, so that in the log-magnitude spectrum the energy at all frequencies of the frame containing the abrupt change rises suddenly relative to the adjacent frames. The number of non-zero values in that frame of the short-time energy binary spectrum should therefore be clearly larger than in the adjacent frames. The mean of each frame of the short-time energy binary spectrum from step 2) is computed, and the abrupt-change coefficient σ_i (1 ≤ i ≤ N_1) is obtained according to formula (4);

4) Evaluate the abrupt-change coefficient σ_t to detect abrupt waveform changes and locate the region of abrupt change: because of the nature of speech, a strong speech signal also raises the detected abrupt-change coefficient, but such a signal persists over time, whereas the elevated coefficient caused by manual copy-paste appears in only one frame. A difference algorithm is therefore used to distinguish abrupt waveform changes caused by editing from the abrupt changes of strong signals. Specifically, the abrupt-change coefficient σ_t (1 ≤ t ≤ N_1) of each frame G_i of the logarithmic spectrum is calculated by formula (5);

The evaluation of the abrupt-change coefficient σ_t in step 4), which detects abrupt waveform changes and locates the region of abrupt change, proceeds as follows. Suppose that step 3) has yielded the abrupt-change coefficients σ_{i-1}, σ_i, and σ_{i+1} of frame i of the logarithmic spectrum, denoted G_i, and of its two adjacent frames. If they satisfy

$$\sigma_i > 0.85 \quad \text{and} \quad |\sigma_i - \sigma_{i-1}| \cdot |\sigma_i - \sigma_{i+1}| > \frac{\sigma_i}{16}$$

then the audio is determined to contain an abrupt change, and frame i is the detected region of the abrupt waveform change.
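
The decision rule of step 4) can be applied to a whole sequence of coefficients as in the sketch below. The constants 0.85 and 16 are those of claim 1; the function name `detect_mutation_frames`, the choice to skip the first and last frames (which lack one neighbor), and the sample sequence are assumptions made for illustration.

```python
import numpy as np

def detect_mutation_frames(sigma, level=0.85, ratio=16.0):
    """Return indices i where sigma[i] is high and differs sharply from both neighbors."""
    sigma = np.asarray(sigma, dtype=float)
    hits = []
    # Assumption: the first and last frames are skipped, since each lacks a neighbor.
    for i in range(1, len(sigma) - 1):
        left = abs(sigma[i] - sigma[i - 1])
        right = abs(sigma[i] - sigma[i + 1])
        if sigma[i] > level and left * right > sigma[i] / ratio:
            hits.append(i)
    return hits

# Invented sequence: an isolated spike at index 2, then a sustained loud passage.
sigma = [0.20, 0.25, 0.95, 0.30, 0.90, 0.92, 0.91, 0.20]
frames = detect_mutation_frames(sigma)
```

Only the isolated spike is reported; every frame of the sustained passage has at least one neighbor at nearly the same level, so the product of differences stays below σ_i / 16.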

Fig. 5 shows the waveform of speech that has been tampered with by deleting part of it; the dotted line marks the position of the cut, and the edit cannot be detected by ear. As Fig. 6 shows, the logarithmic spectrum of this audio contains no obvious short-time energy change; nevertheless, once the abrupt-change coefficients are calculated, as shown in Fig. 7, traces of copy-paste/deletion tampering are detected and the time of tampering is accurately located.

The above embodiment shows that the method of the present invention detects and localizes well. It identifies abrupt waveform changes even when they are inaudible to the human ear and invisible in the spectrogram, making it an effective technique for detecting copy-paste/deletion tampering of audio.

Claims

1. A method for detecting abrupt changes in digital audio waveforms, characterized in that it comprises the following steps: 1) transforming the audio signal to obtain the audio spectrum Y in the logarithmic domain, and applying a logarithmic transformation to the obtained audio spectrum to obtain the logarithmic spectrum G; 2) performing energy binarization on the logarithmic spectrum G; 3) calculating the abrupt-change coefficient σ_t of each frame G_t of the logarithmic spectrum; 4) evaluating the abrupt-change coefficient σ_t to detect abrupt audio waveform changes and locate the region of abrupt change;

wherein transforming the audio signal to obtain the audio spectrum Y in the logarithmic domain and applying a logarithmic transformation to obtain the logarithmic spectrum G, as described in step 1), specifically comprises: dividing the digital audio signal y of length h into frames to obtain a matrix with N_1 frames of frame length 2·N; applying a window function and performing a short-time Fourier transform to obtain an audio spectrum Y of size N × N_1; and applying a logarithmic transformation to the audio spectrum Y to obtain the logarithmic spectrum G, of size N × N_1;

wherein the energy binarization of the logarithmic spectrum G described in step 2) specifically comprises: first calculating the maximum value G_max and the minimum value G_min of the logarithmic spectrum G; letting the energy value at each frequency of each frame be G_ti (1 ≤ t ≤ N, 1 ≤ i ≤ N_1); and calculating the energy binarization value δ(t, i) by the following formula (1),

where λ is the threshold factor;

wherein the calculation of the abrupt-change coefficient of each frame G_t of the logarithmic spectrum described in step 3) specifically comprises calculating the abrupt-change coefficient σ_t (1 ≤ t ≤ N_1) by formula (2):

$$\sigma_t = \frac{\sum_{i=1}^{n} \delta(t, i)}{n} \tag{2}$$

wherein the evaluation of the abrupt-change coefficient σ_t described in step 4), which detects abrupt audio waveform changes and locates the region of abrupt change, specifically comprises: supposing that step 3) has yielded the abrupt-change coefficients σ_{i-1}, σ_i, and σ_{i+1} of frame i of the logarithmic spectrum G, denoted G_i, and of its adjacent frames, if they satisfy:

$$\sigma_i > 0.85 \quad \text{and} \quad |\sigma_i - \sigma_{i-1}| \cdot |\sigma_i - \sigma_{i+1}| > \frac{\sigma_i}{16}$$

then it is determined that the audio contains an abrupt change, wherein frame i, G_i, is the detected region of the abrupt audio waveform change.