CN105513600A - Duration-related mp3 double compression detection method under same bit rate - Google Patents


Info

Publication number: CN105513600A
Application number: CN201610018814.7A
Authority: CN (China)
Prior art keywords: voice, compression, compressed, duration, wav
Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Other languages: Chinese (zh)
Other versions: CN105513600B (en)
Inventors: 王让定, 陶表犁, 严迪群, 金超, 周劲蕾
Current Assignee: Ningbo University (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Ningbo University
Application filed by Ningbo University; priority to CN201610018814.7A; published as CN105513600A; granted and published as CN105513600B.

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/18: Vocoders using multiple modes

Abstract

The invention discloses a duration-related mp3 double compression detection method under the same bit rate. The duration and compression rate of mp3 speech to be detected are first obtained; the mp3 speech to be detected is then decoded, so that wav speech is obtained, and a QMDCT coefficient matrix is extracted; the wav speech which is obtained by decompression is then encoded and compressed with the same compression rate, so that mp3 speech is obtained, and a QMDCT coefficient matrix is extracted; whereafter, a difference matrix of the two QMDCT coefficient matrices is obtained; the duration of the mp3 speech to be detected is then substituted into a primary compression fitting function and a secondary compression fitting function, so that a primary compression fitting function value and a secondary compression fitting function value are obtained correspondingly; finally, by comparing the distance from the value of the number of non-zero elements in the difference matrix to the primary compression fitting function value with the distance from the value of the number of the non-zero elements in the difference matrix to the secondary compression fitting function value, the mp3 speech to be detected is determined as primarily compressed speech or secondarily compressed speech. The duration-related mp3 double compression detection method has the following advantages: the accuracy of detection is high and the complexity of detection is low, and in particular, the relation between the accuracy of detection and the duration of speech is disclosed.

Description

A duration-related method for detecting mp3 double compression at the same bit rate
Technical field
The present invention relates to speech double compression detection, and in particular to a duration-related method for detecting double compression of mp3 speech at the same bit rate.
Background art
With the rapid development of digital and Internet technology, digital multimedia is widely used in daily life, and people can easily obtain the multimedia they want. However, every advanced technology cuts both ways, and digital technology is no exception. It has enriched cultural life, eased communication, and benefited those who master it, but it is also easy to manipulate, deliberately or not, so some people use it to tamper with information and thereby harm society. Digital forensics arose in response. Digital speech is an important part of digital multimedia and holds a key place in digital forensics. Tampering forensics is a prominent branch of digital speech forensics, and it matters for verifying the integrity and authenticity of digital speech content. If digital speech has been tampered with, it has inevitably been compressed twice. Double compression detection for digital speech is therefore also significant.
mp3 is currently one of the most popular speech formats. It uses MPEG-1 Audio Layer III coding. In the more than ten years since its introduction it has been well received for its near-CD sound quality, high compression ratio, openness, and ease of use. It is widespread on the Internet, matching encoding and decoding software and hardware keep appearing, and it has deeply influenced the field of digital speech compression. Tampering with an mp3 speech file involves two compressions: the original mp3 speech must first be decompressed to the time domain, tampering operations such as insertion, deletion, or splicing are then applied to the time-domain data, and finally the tampered time-domain data are recompressed into mp3 speech, as shown in Fig. 1.
If the original mp3 speech was formed by a first compression, the tampered mp3 speech has undergone double compression. A method that detects double compression of mp3 speech can therefore also detect that the speech may have been tampered with.
Many researchers have studied double compression detection for images and video, and some have worked on double compression detection for mp3 audio. That work usually targets the fake-quality problem, that is, detecting audio originally at a low bit rate that has been recompressed at a high bit rate. For example, Yang R, Shi Y Q, Huang J. Defeating fake-quality MP3 [J]. MM&Sec: Proceedings of the ACM Workshop on Multimedia & Security, 2009: 117-124, proposes a method for detecting fake quality. It first observes that low-bit-rate audio has fewer nonzero MDCT (modified discrete cosine transform) coefficients, then takes the numbers of MDCT coefficients equal to 0, ±1, ±2, ±3, and ±4 as features and classifies them with machine learning; the results show good detection performance. Another example is Yu X, Wang R, Yan D, et al. Detecting Fake-Quality MP3 based on Huffman Table Index [J]. Journal of Software, 2014, 9(4), which proposes a fake-quality detection method based on the Huffman table index and also reports good detection performance.
Yang R, Shi Y Q, Huang J. Detecting double compression of audio signal [J]. Proc. SPIE, 2010, 7541: 75410K-75410K-10, proposes a double compression detection method based on the observation that double-compressed speech has fewer MDCT coefficients in the range -1 to 1 than single-compressed speech, and uses Benford's law and a vertical shift to detect whether a speech signal has been double-compressed. The results show good detection of mp3 double compression from a low bit rate to a high bit rate, but poor detection for conversion from a high bit rate to a low bit rate or at the same bit rate. Liu Q, Sung A H, Qiao M. Detection of Double MP3 Compression [J]. Cognitive Computation, 2010, 2(4): 291-296, proposes using the number of nonzero MDCT coefficients, the mean distance between them, and the distribution of zero and nonzero coefficients as features to distinguish single from double compression; the method detects high-to-low and low-to-high bit rate conversion well. Luo D, Luo W, Yang R, Huang J. Compression history identification for digital audio signal [C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2012: 1733-1736, proposes audio compression history detection that extracts the number of zero-valued MDCT coefficients and the mel-frequency cepstral coefficients (MFCC) as features to detect the compression history of wav audio, that is, whether it has undergone mp3 compression and at what bit rate.
At present, speech double compression detection concentrates almost entirely on compression at different bit rates. In research on the authenticity of speech content, however, a forger often saves the speech at the original bit rate after tampering with it, which makes the forgery more convincing and detection harder. Double compression detection at the same bit rate is therefore of real practical significance. Currently only a small amount of work addresses same-bit-rate compression. For example, Yu Xianmin, Wang Rangding, Yan Diqun, et al. Double compression detection method for MP3 at the same compression rate [J]. Computer Engineering and Applications, 2013, (12): 93-96, proposes a same-bit-rate double compression detection method that relies on the number of QMDCT (quantized modified discrete cosine transform) coefficients that differ between two adjacent compressions, and classifies the values obtained from repeated differencing as features with a support vector machine; the experimental results show good detection performance. Another example is Ma P F, Wang R, Yan D, et al. Detecting Double-compressed MP3 with the Same Bit-rate [J]. Journal of Software, 2014, 9(10), which proposes a same-bit-rate mp3 double compression detection method based on scale factor statistics: the scale factor band matrix of the long windows in the mp3 encoding process is fed as a feature to a trained support vector machine for detection, and the results show good detection performance. Although both methods have some effect on same-bit-rate double compression, they consider only the relation between detection accuracy and compression bit rate. Their scope is too narrow: they do not treat speech length as a factor affecting detection accuracy, while experiments show that detection accuracy differs markedly between speech of different durations. Both methods thus overlook a key factor in double compression detection, so it is necessary to study a speech double compression detection technique that takes duration into account.
Summary of the invention
The technical problem to be solved by the present invention is to provide a duration-related method for detecting mp3 double compression at the same bit rate that has high detection accuracy and low detection complexity.
The technical solution adopted by the present invention to solve the above problem is a duration-related method for detecting mp3 double compression at the same bit rate, characterized by comprising the following steps:
1. Obtain an mp3 speech file to be detected, denoted fr;
2. Obtain the duration and the compression bit rate of fr, the latter denoted br, where the duration is in seconds and br is in kbps;
3. Decode fr with an mp3 codec (an mp3 encoder that also provides decoding) to obtain wav speech, and extract the QMDCT coefficient matrix during decoding, denoted $q_1$;
4. Using the mp3 codec of step 3, encode the wav speech obtained in step 3 at the compression bit rate br of fr to obtain mp3 speech, and extract the QMDCT coefficient matrix during encoding, denoted $q_2$;
5. Compute the difference matrix of $q_1$ and $q_2$, denoted $D$, $D = q_1 - q_2$;
6. Substitute the duration of fr for $t$ in the primary compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$ and in the secondary compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$ to obtain the two fitting function values, then judge whether $\|D\|_0$ is closer to the secondary value than to the primary value, that is, whether $\big|\,\|D\|_0 - Y_2(t)\,\big| < \big|\,\|D\|_0 - Y_1(t)\,\big|$ holds. If it holds, fr is determined to be double-compressed speech at the same bit rate; otherwise fr is determined to be single-compressed speech. Here $Y_1(t)$ and $Y_2(t)$ are functions of the duration variable $t$, $A_1$ and $B_1$ are the slope and the intercept in $Y_1(t) = 10^3 (A_1 \times t - B_1)$, $A_2$ and $B_2$ are the slope and the intercept in $Y_2(t) = 10^3 (A_2 \times t - B_2)$, and the symbol $\|\cdot\|_0$ denotes the number of nonzero elements in a matrix.
In step 2, the duration of fr refers to the time length of the speech content of fr.
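The decision in step 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the coefficients are passed in rather than fixed, and the handling of an exact tie between the two distances is a choice made for this sketch, not something the text specifies.

```python
def fitted_value(a: float, b: float, t: float) -> float:
    """Evaluate a fitting function of the form Y(t) = 10^3 * (a * t - b), t in seconds."""
    return 1e3 * (a * t - b)


def is_double_compressed(nonzero_count: int, t: float,
                         a1: float, b1: float,
                         a2: float, b2: float) -> bool:
    """Return True when the nonzero count of D is closer to Y2(t) than to Y1(t)."""
    d1 = abs(nonzero_count - fitted_value(a1, b1, t))  # distance to the primary value
    d2 = abs(nonzero_count - fitted_value(a2, b2, t))  # distance to the secondary value
    # Assumption of this sketch: an exact tie is reported as single-compressed.
    return d2 < d1
```

The function takes the already computed count $\|D\|_0$ as input, so it stays independent of how the QMDCT coefficient matrices are extracted.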
In step 6, the primary compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$ and the secondary compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$ are obtained as follows:
6_1. Choose N original wav speech clips for each of several different durations, where N ≥ 50; the N original wav speech clips of each duration then form a speech set;
6_2. Define the speech set currently to be processed as the current speech set;
6_3. Define the n-th original wav speech clip currently to be processed in the current speech set as the current wav speech, where 1 ≤ n ≤ N and the initial value of n is 1;
6_4. Using the mp3 codec of step 3, encode the current wav speech at the compression bit rate br of fr to obtain the mp3 speech formed by the first compression, and extract the QMDCT coefficient matrix during encoding, denoted $Q_{n,1}$;
6_5. Using the mp3 codec of step 3, decode the mp3 speech formed by the first compression to obtain the corresponding decoded wav speech;
6_6. Using the mp3 codec of step 3, encode the wav speech obtained in step 6_5 at the compression bit rate br of fr to obtain the mp3 speech formed by the second compression, and extract the QMDCT coefficient matrix during encoding, denoted $Q_{n,2}$;
6_7. Using the mp3 codec of step 3, decode the mp3 speech formed by the second compression to obtain the corresponding decoded wav speech;
6_8. Using the mp3 codec of step 3, encode the wav speech obtained in step 6_7 at the compression bit rate br of fr, and extract the QMDCT coefficient matrix during encoding, denoted $Q_{n,3}$;
6_9. Let n = n + 1, take the next original wav speech clip to be processed in the current speech set as the current wav speech, and return to step 6_4, until all N original wav speech clips in the current speech set have been processed, where "=" in n = n + 1 is assignment;
6_10. Let Mean1 denote the primary compression QMDCT coefficient mean, $\mathrm{Mean1} = \frac{1}{N}\sum_{n=1}^{N} \|Q_{n,2} - Q_{n,1}\|_0$, and let Mean2 denote the secondary compression QMDCT coefficient mean, $\mathrm{Mean2} = \frac{1}{N}\sum_{n=1}^{N} \|Q_{n,3} - Q_{n,2}\|_0$, where the symbol $\|\cdot\|_0$ denotes the number of nonzero elements in a matrix;
6_11. Take the next speech set to be processed as the current speech set and return to step 6_3, until the speech sets of all durations have been processed, which yields the primary compression QMDCT coefficient mean and the secondary compression QMDCT coefficient mean for each duration;
6_12. Fit a line to all durations and their corresponding primary compression QMDCT coefficient means to obtain the primary compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$, and fit a line to all durations and their corresponding secondary compression QMDCT coefficient means to obtain the secondary compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$. Here $Y_1(t)$ and $Y_2(t)$ are functions of the duration variable $t$, $A_1$ and $B_1$ are the slope and the intercept in $Y_1(t) = 10^3 (A_1 \times t - B_1)$, and $A_2$ and $B_2$ are the slope and the intercept in $Y_2(t) = 10^3 (A_2 \times t - B_2)$.
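The linear fit at the end of this procedure can be sketched as follows. This is a minimal sketch, assuming the per-duration means have already been computed; `numpy.polyfit` stands in for whatever fitting tool is used, the helper name `fit_line` is hypothetical, and the data in the usage lines are synthetic placeholders, not measurements.

```python
import numpy as np


def fit_line(durations, means):
    """Fit means ~ 10^3 * (A * t - B) by least squares and return (A, B).

    np.polyfit with degree 1 returns (slope, intercept) of means ~ slope * t + intercept,
    so A = slope / 10^3 and B = -intercept / 10^3.
    """
    t = np.asarray(durations, dtype=float)
    y = np.asarray(means, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    return slope / 1e3, -intercept / 1e3


# Usage with synthetic placeholder data lying exactly on a line (not measured values).
durations = [1, 2, 3, 4, 5]
placeholder_means = [1e3 * (2.0 * t - 0.5) for t in durations]
A, B = fit_line(durations, placeholder_means)
```

The same helper would be called once with the Mean1 values and once with the Mean2 values to obtain the two pairs of coefficients.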
Compared with the prior art, the advantages of the present invention are:
1) The method substitutes the duration of the mp3 speech to be detected into the duration-related primary and secondary compression fitting functions to obtain the two fitting function values, and then uses the distances from the number of nonzero elements in the difference matrix to these two values to decide whether the speech is single-compressed or double-compressed. Experimental results show high detection accuracy and low detection complexity, and clearly reveal the relation between detection accuracy and duration.
2) The method is the first to propose that, in same-bit-rate double compression detection, detection accuracy is related to speech duration. Existing methods, whether they use statistics of spectral coefficients, quantized spectral coefficients, scale factors, or Huffman table indices as features, consider only the relation between detection results and compression bit rate. Yet these statistical features vary with the amount of data over which they are computed, which affects the detection result, and in this method that amount is determined precisely by the duration of the speech. Experimental results show a clear correlation between detection accuracy and duration, so by exploiting this correlation the method further improves speech double compression detection.
3) The method can be applied to tampering detection based on inconsistent compression history. When two segments with different compression histories are spliced together, the speech can be cut into multiple segments of equal duration and the method applied to each segment; if the detected numbers of compressions of the segments are inconsistent, the speech can be determined to have been tampered with. The method can therefore also be applied to speech tampering detection.
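The segment-wise check described in advantage 3 can be sketched as below. This is an illustration under assumptions: the helper names are hypothetical, the per-segment labels are taken as given (each would come from running the detection method on one segment), and any trailing remainder shorter than a full segment is simply dropped.

```python
from typing import List, Sequence, Tuple


def segment_bounds(total_seconds: float, segment_seconds: float) -> List[Tuple[float, float]]:
    """Start and end times of consecutive equal-duration segments; the remainder is dropped."""
    count = int(total_seconds // segment_seconds)
    return [(i * segment_seconds, (i + 1) * segment_seconds) for i in range(count)]


def compression_history_inconsistent(segment_is_double: Sequence[bool]) -> bool:
    """True when the segments disagree on single versus double compression."""
    return len(set(segment_is_double)) > 1


# Usage: a 10-second clip cut into 2.5-second segments, with made-up per-segment labels.
bounds = segment_bounds(10.0, 2.5)
suspicious = compression_history_inconsistent([False, False, True, True])
```

Using equal-duration segments matters because the fitting functions depend on duration, so every segment is judged against the same pair of fitted values.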
Brief description of the drawings
Fig. 1 is a schematic diagram of how tampering with mp3 speech produces double compression;
Fig. 2 is the overall block diagram of the method of the present invention;
Fig. 3a shows the numbers of QMDCT coefficients that differ between the mp3 audio of adjacent compressions, obtained by compressing 20 wav audio clips of 1 second duration three times at a compression bit rate of 96 kbps and extracting the corresponding QMDCT coefficient matrices;
Fig. 3b shows the numbers of QMDCT coefficients that differ between the mp3 audio of adjacent compressions, obtained by compressing 20 wav audio clips of 10 seconds duration three times at a compression bit rate of 96 kbps and extracting the corresponding QMDCT coefficient matrices;
Fig. 4 shows, for 600 audio clips at each of 10 durations (1, 2, 3, 4, ..., 10 seconds) compressed at 96 kbps, the mean number of QMDCT coefficients that differ between the first and second compressions and the mean number that differ between the second and third compressions, computed over all audio of each duration;
Fig. 5 shows the results of testing single-compressed and double-compressed samples of 9 different durations against the fitted lines.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
The overall block diagram of the duration-related method for detecting mp3 double compression at the same bit rate proposed by the present invention is shown in Fig. 2. The method comprises the following steps:
1. Obtain an mp3 speech file to be detected, denoted fr.
2. Use existing techniques to obtain the duration and the compression bit rate of fr, the latter denoted br, where the duration is in seconds and br is in kbps.
In this embodiment, the duration of fr in step 2 refers to the time length of the speech content of fr.
Typical compression bit rates for mp3 audio are 32 kbps, 64 kbps, 96 kbps, 128 kbps, 192 kbps, 256 kbps, and 320 kbps.
3. Decode fr with an mp3 codec (an mp3 encoder that also provides decoding) to obtain wav speech, and extract the QMDCT coefficient matrix during decoding, denoted $q_1$.
4. Using the mp3 codec of step 3, encode the wav speech obtained in step 3 at the compression bit rate br of fr (that is, at the same compression bit rate) to obtain mp3 speech, and extract the QMDCT coefficient matrix during encoding, denoted $q_2$.
The QMDCT coefficient matrix $q_1$ extracted during decoding and the QMDCT coefficient matrix $q_2$ extracted during encoding have the same dimensions.
5. Compute the difference matrix of $q_1$ and $q_2$, denoted $D$, $D = q_1 - q_2$.
6. Substitute the duration of fr for $t$ in the primary compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$ and in the secondary compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$ to obtain the two fitting function values, then judge whether $\|D\|_0$ is closer to the secondary value than to the primary value, that is, whether $\big|\,\|D\|_0 - Y_2(t)\,\big| < \big|\,\|D\|_0 - Y_1(t)\,\big|$ holds. If it holds, fr is determined to be double-compressed speech at the same bit rate; otherwise fr is determined to be single-compressed speech. Here $Y_1(t)$ and $Y_2(t)$ are functions of the duration variable $t$, $A_1$ and $B_1$ are the slope and the intercept in $Y_1(t) = 10^3 (A_1 \times t - B_1)$, $A_2$ and $B_2$ are the slope and the intercept in $Y_2(t) = 10^3 (A_2 \times t - B_2)$, and the symbol $\|\cdot\|_0$ denotes the number of nonzero elements in a matrix.
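Step 5 and the count $\|D\|_0$ used in step 6 amount to counting the positions where the two QMDCT coefficient matrices differ. A minimal NumPy sketch, assuming the two matrices are already available as integer arrays of equal shape (the function name is hypothetical, and extracting the matrices from a codec is outside this sketch):

```python
import numpy as np


def count_changed_coefficients(q1, q2) -> int:
    """Return ||q1 - q2||_0, the number of nonzero entries of the difference matrix D."""
    q1 = np.asarray(q1)
    q2 = np.asarray(q2)
    if q1.shape != q2.shape:
        # The text states that q1 and q2 have the same dimensions.
        raise ValueError("q1 and q2 must have the same shape")
    return int(np.count_nonzero(q1 - q2))


# Usage with small made-up matrices: two entries differ.
example = count_changed_coefficients([[0, 1, -2], [3, 0, 0]],
                                     [[0, 1, -1], [2, 0, 0]])
```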
First, the present invention statistically analyzes the numbers of QMDCT coefficients that differ between the mp3 speech of adjacent compressions when wav speech of the same duration is compressed three times.
Experiment one: 20 wav speech clips of 1 second duration, sampled at 44.1 kHz, mono, are each compressed three times at a compression bit rate of 96 kbps and the corresponding QMDCT coefficient matrices are extracted, denoted $Q_1'$, $Q_2'$, and $Q_3'$. Then $\|Q_2' - Q_1'\|_0$ and $\|Q_3' - Q_2'\|_0$ are computed for each wav speech clip. The results for all wav speech clips are shown in Fig. 3a, where the horizontal axis is the index of the 20 wav speech clips, the vertical axis is the number of differing QMDCT coefficients, asterisks represent $\|Q_2' - Q_1'\|_0$, and squares represent $\|Q_3' - Q_2'\|_0$.
Experiment two: 20 wav speech clips of 10 seconds duration, sampled at 44.1 kHz, mono, are each compressed three times at a compression bit rate of 96 kbps and the corresponding QMDCT coefficient matrices are extracted, denoted $Q_1''$, $Q_2''$, and $Q_3''$. Then $\|Q_2'' - Q_1''\|_0$ and $\|Q_3'' - Q_2''\|_0$ are computed for each wav speech clip. The results for all wav speech clips are shown in Fig. 3b, where the horizontal axis is the index of the 20 wav speech clips, the vertical axis is the number of differing QMDCT coefficients, asterisks represent $\|Q_2'' - Q_1''\|_0$, and squares represent $\|Q_3'' - Q_2''\|_0$.
Fig. 3a and Fig. 3b show that, for both the 1-second and the 10-second speech clips, the number of QMDCT coefficients that differ between the first and second compressions is greater than the number that differ between the second and third compressions, and that the separation between the two may become more pronounced as duration increases.
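The three-pass experiment above can be outlined in code. The sketch below is hedged heavily: `encode` and `decode` are hypothetical callables supplied by the caller (for instance, wrappers around an mp3 codec instrumented to expose its QMDCT coefficients), and nothing here reflects the interface of any real encoder.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical interfaces, assumed for illustration only:
#   encode(wav_samples, bitrate_kbps) -> (mp3_bytes, qmdct_matrix)
#   decode(mp3_bytes) -> wav_samples
Encode = Callable[[np.ndarray, int], Tuple[bytes, np.ndarray]]
Decode = Callable[[bytes], np.ndarray]


def adjacent_difference_counts(wav: np.ndarray, bitrate_kbps: int,
                               encode: Encode, decode: Decode) -> Tuple[int, int]:
    """Compress three times at one bit rate and count differing QMDCT coefficients.

    Returns (||Q2 - Q1||_0, ||Q3 - Q2||_0) for one wav clip.
    """
    mp3_1, q1 = encode(wav, bitrate_kbps)            # first compression
    mp3_2, q2 = encode(decode(mp3_1), bitrate_kbps)  # second compression
    _, q3 = encode(decode(mp3_2), bitrate_kbps)      # third compression
    return int(np.count_nonzero(q2 - q1)), int(np.count_nonzero(q3 - q2))
```

Passing the codec in as callables keeps the counting logic testable with a toy codec and avoids tying the sketch to a particular encoder build.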
Second, the present invention takes 600 speech clips at each of 10 durations (1, 2, 3, 4, ..., 10 seconds), compressed at a bit rate of 96 kbps, and computes over all speech of each duration the mean number of QMDCT coefficients that differ between the first and second compressions and the mean number that differ between the second and third compressions. The results are shown in Fig. 4. Fig. 4 shows that both counts increase linearly with duration, and that the separation between them grows with duration, which indicates that detection accuracy is higher for longer speech.
Based on the statistical analysis of the numbers of differing QMDCT coefficients between adjacent compressions when wav speech of the same duration is compressed three times, and of the means of those numbers over all speech of each duration, the present invention determines the primary compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$, which relates duration to the mean number of QMDCT coefficients that differ between the first and second compressions, and the secondary compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$, which relates duration to the mean number that differ between the second and third compressions. That is, in this embodiment, the primary compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$ and the secondary compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$ in step 6 are obtained as follows:
6. the original wav voice _ 1, choosing different duration are each N number of, and wherein, N >=50, get N=600 in the present embodiment; Then the N number of original wav voice of each duration are formed a voice set.
In the present embodiment, be respectively for duration and within 1 second, 2 seconds, 3 seconds, 4 seconds, 5 seconds, 6 seconds, 7 seconds, 8 seconds, 9 seconds, 10 seconds, respectively choose 600 original wav voice.
6. _ 2, current pending voice set is defined as current speech set.
6. _ 3, pending n-th original wav voice current in current speech set are defined as current wav voice, wherein, the initial value of 1≤n≤N, n is 1.
6. the mp3 scrambler in _ 4, utilizing step 3. carries out compression coding with the compression bit rate br of fr to current wav voice, obtains the mp3 voice that current wav voice are formed after first compression, extracts QMDCT matrix of coefficients simultaneously, be designated as Q in compression coding process n, 1.
6. the mp3 scrambler in _ 5, utilizing step is 3. decoded to the mp3 voice that current wav voice are formed after first compression, obtains mp3 voice that current wav voice are formed after first compression again through wav voice that decoding is formed.
6. the mp3 scrambler in _ 6, utilizing step 3. carries out compression coding with the compression bit rate br of fr to current wav voice (the mp3 voice formed after first compression are again through the wav voice of decoding formation), obtain the mp3 voice that current wav voice are formed after second-compressed, in compression coding process, extract QMDCT matrix of coefficients simultaneously, be designated as Q n, 2.
6. the mp3 scrambler in _ 7, utilizing step is 3. decoded to the mp3 voice that current wav voice are formed after second-compressed, obtains mp3 voice that current wav voice are formed after second-compressed again through wav voice that decoding is formed.
6. the mp3 scrambler in _ 8, utilizing step 3. carries out compression coding with the compression bit rate br of fr to current wav voice (the mp3 voice formed after second-compressed are again through the wav voice of decoding formation), in compression coding process, extract QMDCT matrix of coefficients simultaneously, be designated as Q n, 3; Q n, 1, Q n, 2and Q n, 3dimension identical.
6. _ 9, n=n+1 is made, using original wav voice next pending in current speech set as current wav voice, then return step 6. _ 4 and continue to perform, until the N number of original wav voice in current speech set are all disposed, wherein, "=" in n=n+1 is assignment.
6. Mean1 _ 10, is made to represent first compression QMDCT Coefficient Mean, and make Mean2 represent second-compressed QMDCT Coefficient Mean, wherein, symbol " || || 0" for asking the number of nonzero element in matrix.
6. _ 11, using voice set pending for the next one as current speech set, then return step 6. _ 3 to continue to perform, until each self-corresponding voice process of aggregation of all durations is complete, obtain first compression QMDCT Coefficient Mean corresponding to each duration and second-compressed QMDCT Coefficient Mean.
Step 6_12: Using the MATLAB polynomial fitting tool polyfit, perform a linear fit of the single-compression QMDCT coefficient means against their durations to obtain the single-compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$; likewise, perform a linear fit of the double-compression QMDCT coefficient means against their durations to obtain the double-compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$. Here $Y_1(t)$ and $Y_2(t)$ are functions of the duration variable t; $A_1$ and $B_1$ represent the slope and the intercept in $Y_1(t)$, and $A_2$ and $B_2$ represent the slope and the intercept in $Y_2(t)$; all four values are obtained by the fit. For example, when the compression bit rate br of fr is 96 kbps, the fit gives $A_1 = 9.0963$, $B_1 = 0.1174$, $A_2 = 7.8639$, $B_2 = 0.5952$, i.e. $Y_1(t) = 10^3 (9.0963t - 0.1174)$ and $Y_2(t) = 10^3 (7.8639t - 0.5952)$.
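The averaging and fitting steps above can be sketched in Python, with numpy.polyfit standing in for MATLAB's polyfit. This is only an illustration under stated assumptions: the function names and the training durations are hypothetical, the QMDCT matrices are assumed to be available already as integer arrays, and each mean is assumed to be the average count of nonzero entries in the difference of two successive QMDCT matrices. The sample means are generated from the 96 kbps coefficients quoted in the text, not measured.

```python
import numpy as np

def mean_nonzero_diff(pairs):
    """Average number of nonzero entries of (a - b) over a list of (a, b) matrix pairs."""
    return float(np.mean([np.count_nonzero(a - b) for a, b in pairs]))

def fit_line(durations, means):
    """Degree-1 least-squares fit; returns (A, B) such that Y(t) = 1e3 * (A*t - B)."""
    slope, intercept = np.polyfit(durations, means, 1)
    return slope / 1e3, -intercept / 1e3

# Hypothetical training durations in seconds (the text does not list them).
durations = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
# Synthetic means generated from the 96 kbps single-compression line in the text.
means = 1e3 * (9.0963 * durations - 0.1174)

A1, B1 = fit_line(durations, means)
print(f"A1 = {A1:.4f}, B1 = {B1:.4f}")
```

In practice, one mean per duration would come from mean_nonzero_diff applied to the matrix pairs of that duration's speech set, once for the single-compression pairs and once for the double-compression pairs.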
To further demonstrate the feasibility and effectiveness of the method of the invention, experiments were carried out.
The test sample library contains two groups of mp3 speech of different durations: the first group is single-compressed mp3 speech and the second group is double-compressed mp3 speech. Both groups cover the same 9 durations: 0.5, 1.25, 1.75, 2.5, 4.25, 6.5, 8.75, 12.5 and 18 seconds (the durations were chosen arbitrarily, to simulate the fact that the duration of an mp3 speech file to be detected in practice is not known in advance). There are 600 mp3 speech samples per duration, so there are 5400 single-compressed and 5400 double-compressed samples, 10800 mp3 speech samples to be detected in total. The compression bit rate is 96 kbps throughout, the audio is mono, and the mp3 codec is the widely used LAME 3.99.5. The test procedure first decodes the mp3 speech of both groups, extracting the corresponding QMDCT coefficient matrix during decoding. The decoded wav speech is then compressed into mp3 speech again at the same bit rate (96 kbps), and its QMDCT coefficient matrix is extracted as well. Each mp3 speech sample to be detected thus has a pair of QMDCT coefficient matrices: its own and the one obtained by recompression. Next, the difference of each pair of matrices is taken and the number of nonzero elements in the difference matrix is counted. Finally, for each sample, this count is compared with the values of the two fitting functions at the sample's duration: if the distance to the single-compression fitting function value is smaller than the distance to the double-compression fitting function value, the speech is single-compressed; otherwise it is double-compressed.
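The decision rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the nonzero count of the difference matrix is assumed to have been computed already, and the coefficients are the 96 kbps values quoted in the text, so the sketch applies only at that bit rate.

```python
# Fitted coefficients for 96 kbps, as quoted in the text.
A1, B1 = 9.0963, 0.1174   # single-compression line
A2, B2 = 7.8639, 0.5952   # double-compression line

def fitted(t, a, b):
    """Value of Y(t) = 1e3 * (a*t - b) at duration t (seconds)."""
    return 1e3 * (a * t - b)

def is_double_compressed(nonzero_count, duration_s):
    """True if the count is not strictly closer to the single-compression line."""
    d_single = abs(nonzero_count - fitted(duration_s, A1, B1))
    d_double = abs(nonzero_count - fitted(duration_s, A2, B2))
    # Text rule: closer to the single-compression value means single-compressed,
    # otherwise double-compressed.
    return not (d_single < d_double)

# Hypothetical counts for a 2.5 s clip.
print(is_double_compressed(22000, 2.5))
print(is_double_compressed(19500, 2.5))
```

For a 2.5 second clip the two lines evaluate to roughly 22623 and 19065 nonzero coefficients, so a count near the upper value points to single compression and a count near the lower value points to double compression.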
To present the test results more intuitively, the single-compression and double-compression points for each duration are plotted against the fitted curves, as shown in Fig. 5. In Fig. 5, the horizontal axis is the duration and the vertical axis is the number of nonzero elements in each difference matrix, that is, the number of QMDCT coefficients that differ between two successive compressions. Stars mark speech that has undergone single compression and squares mark speech that has undergone double compression. The two curves are the two fitted curves: the upper one represents single compression and the lower one represents double compression. As Fig. 5 shows, almost all single-compression test samples lie on the single-compression fitted curve, and almost all double-compression test samples lie on the double-compression fitted curve. This indicates that the fitted curves are reasonable.
To measure the detection rate of the fitted curves, the final detection rate is computed over the 600 test samples of each of the 9 durations. The test results comprise TP (true positive), TN (true negative), FP (false positive) and FN (false negative). TP is the number of double-compressed speech samples detected as double-compressed, TN is the number of single-compressed samples detected as single-compressed, FN is the number of double-compressed samples detected as single-compressed, and FP is the number of single-compressed samples detected as double-compressed. The final detection accuracy is AR = (TPR + TNR)/2, where TPR = TP/(TP + FN) and TNR = TN/(TN + FP). The accuracy of the experimental results is shown in Table 1. As Table 1 shows, the detection accuracy rises steadily as the duration increases, which indicates that detection accuracy is related to duration.
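The accuracy measure defined above can be computed as in the following sketch. The function name and the counts in the usage lines are hypothetical and are not results from Table 1.

```python
def accuracy_rate(tp, tn, fp, fn):
    """AR = (TPR + TNR) / 2, with TPR = TP/(TP+FN) and TNR = TN/(TN+FP)."""
    tpr = tp / (tp + fn)   # share of double-compressed samples detected as such
    tnr = tn / (tn + fp)   # share of single-compressed samples detected as such
    return (tpr + tnr) / 2

# Hypothetical counts for one duration with 600 samples per class.
ar = accuracy_rate(tp=540, tn=570, fp=30, fn=60)
print(f"AR = {ar:.3f}")
```

Averaging TPR and TNR weights the two classes equally, which matches the balanced test set of 600 single-compressed and 600 double-compressed samples per duration.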
Table 1: Detection accuracy at different durations

Claims (3)

1. A duration-related method for detecting mp3 double compression at the same bit rate, characterized by comprising the following steps:
Step 1: Obtain an mp3 speech file to be detected, denoted fr;
Step 2: Obtain the duration and the compression bit rate of fr, the compression bit rate being denoted br, where the duration is in seconds and br is in kbps;
Step 3: Decode fr using an mp3 codec that has both encoding and decoding functions to obtain wav speech, and during decoding extract the QMDCT coefficient matrix, denoted $q_1$;
Step 4: Use the mp3 codec from step 3 to encode the wav speech obtained in step 3 at the compression bit rate br of fr to obtain mp3 speech, and during encoding extract the QMDCT coefficient matrix, denoted $q_2$;
Step 5: Compute the difference matrix of $q_1$ and $q_2$, denoted D, $D = q_1 - q_2$;
Step 6: Substitute the duration of fr into the single-compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$ and the double-compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$ to obtain the two corresponding fitted values; then judge whether $\|D\|_0$ is closer to the double-compression fitted value than to the single-compression fitted value; if so, determine that fr is double-compressed speech at the same bit rate; otherwise, determine that fr is single-compressed speech. Here $Y_1(t)$ and $Y_2(t)$ are functions of the duration variable t, $A_1$ and $B_1$ represent the slope and the intercept in $Y_1(t)$, $A_2$ and $B_2$ represent the slope and the intercept in $Y_2(t)$, and the symbol $\|\cdot\|_0$ denotes the number of nonzero elements in a matrix.
2. The duration-related method for detecting mp3 double compression at the same bit rate according to claim 1, characterized in that the duration of fr in step 2 is the time length of the speech content of fr.
3. The duration-related method for detecting mp3 double compression at the same bit rate according to claim 1 or 2, characterized in that the single-compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$ and the double-compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$ in step 6 are obtained as follows:
Step 6_1: Select N original wav speech samples for each of several different durations, where N ≥ 50; the N original wav speech samples of each duration then form one speech set;
Step 6_2: Define the speech set currently to be processed as the current speech set;
Step 6_3: Define the n-th original wav speech currently to be processed in the current speech set as the current wav speech, where 1 ≤ n ≤ N and the initial value of n is 1;
Step 6_4: Use the mp3 codec from step 3 to encode the current wav speech at the compression bit rate br of fr, obtaining the mp3 speech formed by the first compression of the current wav speech; during encoding, extract the QMDCT coefficient matrix, denoted $Q_{n,1}$;
Step 6_5: Use the mp3 codec from step 3 to decode the mp3 speech formed by the first compression of the current wav speech, obtaining the wav speech decoded from the single-compressed mp3 speech;
Step 6_6: Use the mp3 codec from step 3 to encode this decoded wav speech (the wav speech decoded from the single-compressed mp3 speech) at the compression bit rate br of fr, obtaining the mp3 speech formed by the second compression of the current wav speech; during encoding, extract the QMDCT coefficient matrix, denoted $Q_{n,2}$;
Step 6_7: Use the mp3 codec from step 3 to decode the mp3 speech formed by the second compression of the current wav speech, obtaining the wav speech decoded from the double-compressed mp3 speech;
Step 6_8: Use the mp3 codec from step 3 to encode this decoded wav speech (the wav speech decoded from the double-compressed mp3 speech) at the compression bit rate br of fr; during encoding, extract the QMDCT coefficient matrix, denoted $Q_{n,3}$;
Step 6_9: Let n = n + 1, take the next pending original wav speech in the current speech set as the current wav speech, and return to step 6_4, until all N original wav speech samples in the current speech set have been processed; here the "=" in n = n + 1 denotes assignment;
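Step 6_10: Let Mean1 denote the single-compression QMDCT coefficient mean, $Mean1 = \frac{1}{N}\sum_{n=1}^{N} \|Q_{n,1} - Q_{n,2}\|_0$, and let Mean2 denote the double-compression QMDCT coefficient mean, $Mean2 = \frac{1}{N}\sum_{n=1}^{N} \|Q_{n,2} - Q_{n,3}\|_0$, where the symbol $\|\cdot\|_0$ denotes the number of nonzero elements in a matrix;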
Step 6_11: Take the next pending speech set as the current speech set and return to step 6_3, until the speech sets of all durations have been processed, obtaining the single-compression QMDCT coefficient mean and the double-compression QMDCT coefficient mean for each duration;
Step 6_12: Perform a linear fit of the single-compression QMDCT coefficient means against their durations to obtain the single-compression fitting function $Y_1(t) = 10^3 (A_1 \times t - B_1)$, and perform a linear fit of the double-compression QMDCT coefficient means against their durations to obtain the double-compression fitting function $Y_2(t) = 10^3 (A_2 \times t - B_2)$; here $Y_1(t)$ and $Y_2(t)$ are functions of the duration variable t, $A_1$ and $B_1$ represent the slope and the intercept in $Y_1(t)$, and $A_2$ and $B_2$ represent the slope and the intercept in $Y_2(t)$.
CN201610018814.7A 2016-01-13 2016-01-13 A kind of bis- compressed detected methods of same code rate mp3 relevant to duration Active CN105513600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610018814.7A CN105513600B (en) 2016-01-13 2016-01-13 A kind of bis- compressed detected methods of same code rate mp3 relevant to duration

Publications (2)

Publication Number Publication Date
CN105513600A true CN105513600A (en) 2016-04-20
CN105513600B CN105513600B (en) 2019-02-05

Family

ID=55721527

Country Status (1)

Country Link
CN (1) CN105513600B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366753A (en) * 2013-06-28 2013-10-23 宁波大学 Moving picture experts group audio layer-3 (MP3) audio double-compression detection method under same code rate
CN104282310A (en) * 2014-09-26 2015-01-14 宁波大学 Steganography detection method for audio subjected to MP3Stego steganography
CN105070297A (en) * 2015-07-16 2015-11-18 宁波大学 MP3 audio compression history detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINGLEI ZHOU ETC: "Multiple MP3 Compression Detection Based", 《DIGITAL-FORENSICS,14TH INTERNATIONAL WORKSHOP, IWDW 2015》 *
MEILING HUANG ETC: "Detection of Double Compression for HEVC", 《DIGITAL-FORENSICS,14TH INTERNATIONAL WORKSHOP, IWDW 2015》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant