CN102867513B - Pseudo-Zernike moment based voice content authentication method - Google Patents
- Publication number
- CN102867513B CN102867513B CN201210278724.3A CN201210278724A CN102867513B CN 102867513 B CN102867513 B CN 102867513B CN 201210278724 A CN201210278724 A CN 201210278724A CN 102867513 B CN102867513 B CN 102867513B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a pseudo-Zernike moment based voice content authentication method. During watermark embedding, an original voice signal A is divided into P frames and each frame into N segments; a watermark W is generated from the mean magnitude of the n-order pseudo-Zernike moments of the discrete cosine transform (DCT) low-frequency coefficients of the first N/2 segments of each frame; the watermark is then embedded by quantizing the pseudo-Zernike moments of the DCT low-frequency coefficients of the last N/2 segments of each frame, yielding a watermarked voice signal A'. The method fully exploits the fact that the magnitudes of the pseudo-Zernike moments of a voice signal's DCT low-frequency coefficients are closely related to the voice content while being robust to conventional signal processing, so it is sensitive to malicious tampering attacks while tolerating common voice signal processing operations well.
Description
Technical field
The present invention relates to speech signal processing, and in particular to the problem of authenticating the authenticity and integrity of voice content.
Background technology
In recent years, the rapid development of digital voice communication, the wide popularization of voice products, and the emergence of powerful audio processing software have made the transmission and application of digital speech increasingly frequent and extensive. At the same time, tampering with transmitted and stored voice content has become relatively easy. For example, if key parts of an important court-testimony recording are maliciously tampered with during storage or transmission, the consequences are easy to imagine. Therefore, how to determine whether an important or sensitive piece of speech has been tampered with, where it has been tampered with, and whether the recording source is authentic and credible are questions of digital speech authenticity that have attracted great research interest from scholars at home and abroad. Audio watermarking, as a technical means of protecting audio, has attracted attention since the 1990s and has become a focus of information security research.
Compared with general audio signals, speech signals have lower sampling rates and are more sensitive to conventional signal processing. Consequently, many existing audio content authentication algorithms cannot be applied to speech content, or perform poorly on it. In practice, the main concern for audio is copyright protection, whereas for speech it is content authenticity and integrity. In watermarking-based voice content authentication, a watermark unrelated to the voice content itself increases the amount of transmitted information on the one hand and introduces potential security risks on the other, so authentication algorithms that generate the watermark from the speech's own features or content have greater research significance and practical value.
The magnitudes of pseudo-Zernike moments (and Zernike moments) are rotation invariant, a property widely used in image representation, image retrieval, and image watermarking, but rarely applied to audio. The paper "Robust audio watermarking based on low-order Zernike moments" (Xiang Shi-jun, Huang Ji-wu, Yang Rui, 5th International Workshop on Digital Watermarking, pp. 226-240, Oct. 2006) first maps the audio from one dimension to two dimensions and then applies the Zernike transform to the resulting 2-D signal. It demonstrates experimentally that Zernike moment magnitudes are highly robust to conventional signal processing, analyzes the linear relationship between Zernike moment magnitudes and audio sample values, and on this basis proposes a robust audio watermarking algorithm using low-order Zernike moments. The paper "A pseudo-Zernike moments based audio watermarking scheme robust against desynchronization attacks" (Wang Xiang-yang, Ma Tian-xiao, Niu Pan-pan, Computers and Electrical Engineering, vol. 37, no. 4, pp. 425-443, July 2011) first embeds a synchronization code in the time domain based on a statistical mean and then embeds the watermark by quantizing pseudo-Zernike moment magnitudes, yielding an audio watermarking algorithm resistant to desynchronization attacks. These (pseudo-)Zernike-moment-based watermarking algorithms have two drawbacks. First, they must compute the pseudo-Zernike moments of all sample points, which is computationally expensive and time-consuming, and the watermark is embedded by proportionally scaling the sample values of each audio segment; analysis shows that directly scaling audio sample values changes the original audio considerably and seriously degrades its quality. Second, the embedding positions and methods are public, and the computation of each audio frame's feature (its pseudo-Zernike moments) is also known, so an attacker can locate each audio frame, compute its feature, and re-quantize the pseudo-Zernike moments to remove the embedded watermark, defeating the copyright protection. Alternatively, an attacker can replace a watermarked audio segment with other audio and then quantize the replaced content so that it still satisfies the watermark extraction condition, thereby mounting a content attack. Therefore, studying content-based voice content authentication algorithms with strong attack resistance has important practical significance.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a pseudo-Zernike moment based voice content authentication algorithm that can effectively distinguish conventional signal processing operations on speech from malicious attacks and can effectively locate maliciously tampered voice content, thereby authenticating the authenticity and integrity of voice content.
To achieve this object, the present invention takes as its basis the robustness of the pseudo-Zernike moment magnitudes of DCT low-frequency coefficients to conventional signal processing, and designs a new watermark generation and embedding method.
A pseudo-Zernike moment based voice content authentication method that can effectively distinguish conventional signal processing operations from malicious attacks, while effectively locating malicious tampering, and thereby authenticates the authenticity and integrity of voice content, comprises the following steps:
(1) Watermark embedding: starting from the K-th sample point of the voice signal (K serves as the key of the watermarking system), divide the original voice signal A into P frames, and divide each frame into N segments. Compute the sum of the n-order pseudo-Zernike moment magnitudes of the DCT low-frequency coefficients of the first N/2 segments of each frame, take the mean of these magnitudes, and generate the watermark W from the mean. Embed the watermark into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients; the resulting watermarked voice signal is denoted A'.
(2) Voice content authentication: similarly to the embedding process, starting from the k1-th sample point of the voice signal A* to be verified, divide A* into P frames and each frame into N segments. Compute the sum of the n-order pseudo-Zernike moment magnitudes of the DCT low-frequency coefficients of the first N/2 segments of each frame, take the mean, and generate the watermark W' from it. Compute the n-order pseudo-Zernike moment magnitudes of the DCT low-frequency coefficients of the last N/2 segments of each frame and extract the watermark W* from these magnitudes. Compare W* and W': positions where they differ are positions where the voice signal has been tampered with, thereby authenticating the authenticity and integrity of the voice content.
Compared with existing speech watermarking algorithms for content authentication, the present invention generates the watermark from the voice content itself, so the receiver obtains the embedded watermark simply by receiving the voice signal. This reduces transmission bandwidth, saves resources, and strengthens the security of watermark transmission. Watermark embedding requires the pseudo-Zernike transform only of the DCT low-frequency coefficients, which improves the efficiency of the algorithm and the watermark's tolerance to conventional signal processing. The present invention is therefore well suited to practical application.
Description of the accompanying drawings
Fig. 1 is the watermarked voice signal of the embodiment of the present invention.
Fig. 2 is the voice signal after a muting attack on part of the voice content of Fig. 1.
Fig. 3 is the voice signal after a substitution attack on part of the content of Fig. 1.
Fig. 4 is the tamper localization result for Fig. 2.
Fig. 5 is the tamper localization result for Fig. 3.
Fig. 6 lists the inaudibility test results.
Fig. 7 lists the robustness test results for conventional signal processing.
Embodiment
The technical scheme of the present invention is further described below in conjunction with the accompanying drawings and an embodiment.
1. Watermark generation and embedding:
(1) Frame division of the speech data and segment division of each frame. Divide the original voice signal A = {a(l), 1 ≤ l ≤ LA + K} into P frames (K serves as the key of the watermarking system); each frame has length I = LA/P, and the i-th frame is denoted A(i) (i = 1, 2, ..., P). Divide each frame into N segments of length I/N; the j-th segment of the i-th frame is denoted A(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N.
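The frame and segment division of step (1) can be sketched as follows (a minimal sketch; the function name and the use of NumPy reshaping are illustrative choices, not from the patent):

```python
import numpy as np

def frame_and_segment(a, K, P, N):
    """Divide speech signal a into P frames starting at sample K (the key),
    then divide each frame into N segments.
    Returns an array of shape (P, N, segment_length)."""
    body = np.asarray(a)[K:]
    frame_len = len(body) // P          # I = LA / P
    seg_len = frame_len // N            # segment length I / N
    frames = body[:P * frame_len].reshape(P, frame_len)
    return frames[:, :N * seg_len].reshape(P, N, seg_len)
```

Each entry `result[i, j]` then corresponds to segment A(i, j) of the scheme.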
(2) DCT. Apply the DCT to A(i, j); D(i, j) denotes the DCT coefficients of the j-th segment of the i-th frame, and the DCT coefficients of the first N/2 segments of the i-th frame are denoted D1(i, j).
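For reference, the segment-wise DCT can be written in plain NumPy (a sketch; the patent does not specify the DCT normalization, so the orthonormal DCT-II variant is assumed here):

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II of a 1-D segment; D(i, j) in the scheme is
    this transform applied to segment A(i, j)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    X = 2.0 * (np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ x)
    X[0] *= np.sqrt(1.0 / (4 * N))      # orthonormal scaling for k = 0
    X[1:] *= np.sqrt(1.0 / (2 * N))     # orthonormal scaling for k > 0
    return X
```

Because the transform is orthonormal, it preserves signal energy, which is convenient when later scaling low-frequency coefficients.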
(3) Computing the n-order, m-repetition pseudo-Zernike moments. Transform the first m1 × m1 low-frequency coefficients of D1(i, j) into a 2-D signal, and compute its n-order, m-repetition pseudo-Zernike moments as follows.
Let {Vnm} denote the pseudo-Zernike polynomials, a set of complex-valued polynomials forming a complete orthogonal basis on the unit disk, defined by
Vnm(x, y) = Vnm(ρ, θ) = Rnm(ρ) exp(imθ)
where n is a non-negative integer and m is an integer satisfying |m| ≤ n. Let l be the vector from the origin to the point (x, y); then ρ = |l|, and θ is the counterclockwise angle from the positive x-axis to l. Rnm(ρ) is the radial polynomial
Rnm(ρ) = Σ_{s=0..n−|m|} (−1)^s [(2n + 1 − s)! / (s! (n + |m| + 1 − s)! (n − |m| − s)!)] ρ^(n−s)
A 2-D signal f(x, y) in the coordinate plane (x² + y² ≤ 1) can be expressed as a linear combination of the Vnm(x, y):
f(x, y) = Σn Σm Anm Vnm(x, y)
where V*nm(x, y) denotes the complex conjugate of Vnm(x, y), and Anm, the n-order m-repetition pseudo-Zernike moment, is defined by
Anm = ((n + 1)/π) ∫∫_{x²+y²≤1} f(x, y) V*nm(x, y) dx dy
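The definitions above can be implemented directly. The sketch below (the grid mapping and function name are illustrative assumptions, not from the patent) computes Anm of a square array mapped onto the unit disk, ignoring pixels outside the disk:

```python
import math
import numpy as np

def pseudo_zernike_moment(f, n, m):
    """Pseudo-Zernike moment A_nm of a square 2-D array f whose pixel
    grid is mapped onto the unit disk; pixels outside the disk are ignored."""
    size = f.shape[0]
    # pixel centers mapped to [-1, 1] x [-1, 1]
    coords = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    # radial polynomial R_nm(rho)
    R = np.zeros_like(rho)
    for s in range(n - abs(m) + 1):
        c = ((-1)**s * math.factorial(2 * n + 1 - s)
             / (math.factorial(s)
                * math.factorial(n + abs(m) + 1 - s)
                * math.factorial(n - abs(m) - s)))
        R += c * rho**(n - s)
    V = R * np.exp(1j * m * theta)        # basis function V_nm
    dA = (2.0 / size)**2                  # pixel area in unit-disk coordinates
    return (n + 1) / math.pi * np.sum(f[inside] * np.conj(V[inside])) * dA
```

The magnitude |Anm| is what the scheme uses: it is invariant to rotations of f, which is the property the background section highlights.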
(4) Voice watermark generation. The watermark is generated from the first N/2 segments of each frame. Let C1(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N/2, be the sum of the n-order pseudo-Zernike moment magnitudes, and compute the mean of C1(i, j) over j. Let M1(i) be the most significant digit of this mean, and let W1(i) = {w1(i, t), 1 ≤ t ≤ N/2} be the binary representation of M1(i); W1(i) is the watermark generated for the i-th frame.
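Step (4) reduces to taking the most significant decimal digit of the mean magnitude and binarizing it. A sketch (helper names are illustrative, and the bit width is passed as a parameter since the patent fixes it only implicitly at N/2 bits):

```python
def most_significant_digit(v):
    """Most significant decimal digit of a positive value, e.g. 982.5 -> 9."""
    return int(f"{v:.10e}"[0])   # first digit of the scientific notation

def generate_watermark(mean_magnitude, n_bits):
    """Watermark bits for one frame: the binary code of the most
    significant digit of the mean pseudo-Zernike magnitude."""
    d = most_significant_digit(mean_magnitude)
    return [int(b) for b in format(d, "b").zfill(n_bits)][-n_bits:]
```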
(5) Watermark embedding. Denote the DCT coefficients of the last N/2 segments of the i-th frame by D2(i, j), N/2 + 1 ≤ j ≤ N. Transform the first m2 × m2 low-frequency coefficients of D2(i, j) into a 2-D signal and compute the sum of its n-order pseudo-Zernike moment magnitudes, denoted C2(i, j). Let M2(i, j) be the most significant digit of C2(i, j). The watermark is embedded by quantizing M2(i, j) to a new digit M'2(i, j): one quantization formula applies when w1(i, t) = 1 and another when w1(i, t) = 0, and in these formulas, when M2(i, j) = 9, M'2(i, j) = M2(i, j) − 1; here j = t + N/2, 1 ≤ t ≤ N/2. Replace the most significant digit of the integer part of C2(i, j) with M'2(i, j) and quantize the next-highest digit to 5; the resulting value is denoted C'2(i, j). Expand the first m2 × m2 low-frequency coefficients of D2(i, j) by the factor α2(i, j) = C'2(i, j)/C2(i, j); the resulting coefficients are denoted D'2(i, j). Apply the inverse DCT to D'2(i, j); the resulting signal is the latter-half content of the i-th frame, and the first half and latter half of the i-th frame together form the watermarked i-th frame.
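The digit replacement and scaling of step (5) can be sketched as follows. Note two assumptions labeled in the comments: the fractional part of C2 is kept unchanged, and the scale factor is taken as alpha2 = C2'/C2, which the source presents as a formula image; it is inferred here from the linearity of the pseudo-Zernike moments in the coefficients (scaling the coefficients by alpha scales the magnitude sum by alpha):

```python
def quantize_magnitude(c2, m_new):
    """Replace the most significant digit of c2's integer part with m_new
    and quantize the next-highest digit to 5; returns (c2_new, alpha),
    where alpha is the factor applied to the DCT low-frequency
    coefficients.  Assumes c2 has at least a two-digit integer part."""
    digits = list(str(int(c2)))
    digits[0] = str(m_new)      # replace the most significant digit
    digits[1] = "5"             # quantize the next-highest digit to 5
    # keeping the fractional part unchanged is an assumption of this sketch
    c2_new = float("".join(digits)) + (c2 - int(c2))
    return c2_new, c2_new / c2
```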
(6) Embed the watermark in each of the P speech frames in turn; once all speech frames have been processed, the watermarked speech A' is obtained.
2. Voice content authentication:
(1) Analogously to steps (1)-(4) of watermark generation and embedding, divide the voice signal A* to be verified into P frames starting from the K-th sample point, and divide each frame into N segments; the i-th frame is denoted A*(i) (i = 1, 2, ..., P) and its j-th segment A*(i, j), 1 ≤ j ≤ N. Apply the DCT to A*(i, j); the resulting DCT coefficients are denoted D*(i, j), and the DCT coefficients of the first N/2 segments of the i-th frame are denoted D*1(i, j). Transform the first m1 × m1 low-frequency coefficients of D*1(i, j) into a 2-D signal and compute the sum of its n-order pseudo-Zernike moment magnitudes, denoted C*1(i, j), 1 ≤ j ≤ N/2. Compute the mean of C*1(i, j) over 1 ≤ j ≤ N/2, take its most significant digit M*1(i), and binarize it; the result W'(i) is the watermark regenerated for the i-th frame.
(2) Denote the DCT coefficients of the last N/2 segments of the i-th frame by D*2(i, j). Transform the first m2 × m2 low-frequency coefficients of D*2(i, j) into a 2-D signal and compute the sum of its n-order pseudo-Zernike moment magnitudes, denoted C*2(i, j), N/2 + 1 ≤ j ≤ N. Let M*2(i, j) be the most significant digit of C*2(i, j); the extracted watermark W*(i) is computed from M*2(i, j).
(3) Define the authentication sequence TA(i) by comparing the regenerated and extracted watermarks of each frame: TA(i) = 0 if W'(i) = W*(i), and TA(i) = 1 otherwise. If TA(i) = 0, the voice content of the i-th frame is authentic; if TA(i) = 1, the voice content of the i-th frame has been tampered with.
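The frame-wise decision is a simple comparison; a minimal sketch, assuming each frame's watermark is represented as a list of bits:

```python
def authenticate(w_regenerated, w_extracted):
    """TA(i) = 0 when the regenerated and extracted watermarks of frame i
    agree (content authentic), 1 otherwise (frame tampered)."""
    return [0 if w1 == w2 else 1
            for w1, w2 in zip(w_regenerated, w_extracted)]
```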
The effectiveness of the inventive method can be verified by the following performance analysis:
1. Inaudibility
A monophonic voice signal with a sampling rate of 22.05 kHz, a length of 1024078 samples, and 16-bit quantization was used for the inaudibility test. Fig. 6 gives the SNR values for three types of voice; the test results show that the algorithm has good inaudibility.
2. Robustness to conventional signal processing
The robustness of the algorithm to conventional signal processing is tested with the bit error rate (BER), defined as
BER = E / T
where E is the number of erroneous extracted watermark bits and T is the total number of watermark bits embedded in the voice signal. The smaller the BER, the stronger the algorithm's robustness to conventional signal processing.
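The BER above is straightforward to compute; a minimal sketch over flat bit sequences:

```python
def bit_error_rate(embedded, extracted):
    """BER = E / T: erroneous extracted bits over total embedded bits."""
    errors = sum(a != b for a, b in zip(embedded, extracted))
    return errors / len(embedded)
```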
Fig. 7 lists the BER values of an adult male voice after several conventional signal processing operations (the test results for other types of voice signals are similar); the inventive method exhibits strong robustness to conventional voice signal processing such as MP3 compression, low-pass filtering, and resampling.
3. Malicious tamper localization
The watermarked voice signal shown in Fig. 1 was subjected to muting and substitution attacks. The attacked voice signals are shown in Fig. 2 and Fig. 3, and the corresponding tamper localization results in Fig. 4 and Fig. 5, respectively. In Fig. 4 and Fig. 5, frames with TA(i) = 1 are the maliciously attacked parts and frames with TA(i) = 0 are the parts not attacked. The tamper localization results show that the inventive method locates malicious tampering effectively.
The above description of the preferred embodiment is quite specific; those of ordinary skill in the art will appreciate that the embodiment described here is intended to help the reader understand the principle of the present invention, and the scope of protection of the invention is not limited to this particular statement and embodiment.
Claims (1)
1. A pseudo-Zernike moment based voice content authentication method for distinguishing conventional signal processing operations from malicious attacks while effectively locating malicious tampering, comprising the steps of:
(1) watermark embedding: starting from the K-th sample point of the voice signal, dividing the original voice signal A into P frames, and dividing each frame into N segments; computing the sum of the n-order pseudo-Zernike moment magnitudes of the discrete cosine transform low-frequency coefficients of the first N/2 segments of each frame, taking the mean of these magnitudes, and generating a watermark W from the mean; embedding the watermark into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients, obtaining a watermarked speech signal A';
(2) voice content authentication: similarly to the embedding process, starting from the k1-th sample point of the voice signal A* to be verified, dividing the voice into P frames and each frame into N segments; computing the sum of the n-order pseudo-Zernike moment magnitudes of the DCT low-frequency coefficients of the first N/2 segments of each frame, taking the mean, and generating a watermark W' from it; computing the n-order pseudo-Zernike moment magnitudes of the DCT low-frequency coefficients of the last N/2 segments of each frame and extracting a watermark W* from these magnitudes; and comparing W* and W': positions where they differ are positions where the voice signal has been tampered with, thereby authenticating the authenticity and integrity of the voice content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210278724.3A CN102867513B (en) | 2012-08-07 | 2012-08-07 | Pseudo-Zernike moment based voice content authentication method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102867513A CN102867513A (en) | 2013-01-09 |
CN102867513B true CN102867513B (en) | 2014-02-19 |
Family
ID=47446337
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210278724.3A Expired - Fee Related CN102867513B (en) | 2012-08-07 | 2012-08-07 | Pseudo-Zernike moment based voice content authentication method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102867513B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103456308B (en) * | 2013-08-05 | 2015-08-19 | 西南交通大学 | A kind of recoverable ciphertext domain voice content authentication method |
EP3093846A1 (en) * | 2015-05-12 | 2016-11-16 | Nxp B.V. | Accoustic context recognition using local binary pattern method and apparatus |
CN105304091B (en) * | 2015-06-26 | 2018-10-26 | 信阳师范学院 | A kind of voice tamper recovery method based on DCT |
GB2567703B (en) * | 2017-10-20 | 2022-07-13 | Cirrus Logic Int Semiconductor Ltd | Secure voice biometric authentication |
CN107886956B (en) * | 2017-11-13 | 2020-12-11 | 广州酷狗计算机科技有限公司 | Audio recognition method and device and computer storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7280970B2 (en) * | 1999-10-04 | 2007-10-09 | Beepcard Ltd. | Sonic/ultrasonic authentication device |
CN101609675B (en) * | 2009-07-27 | 2011-09-21 | 西南交通大学 | Fragile audio frequency watermark method based on mass center |
- 2012-08-07: application CN201210278724.3A filed; granted as CN102867513B (not active: Expired - Fee Related)
Legal Events
Code | Title
---|---
C06 / PB01 | Publication
C10 / SE01 | Entry into substantive examination / Entry into force of request for substantive examination
C14 / GR01 | Grant of patent or utility model / Patent grant
CF01 | Termination of patent right due to non-payment of annual fee

Granted publication date: 20140219; Termination date: 20160807