CN102867513B - Pseudo-Zernike moment based voice content authentication method - Google Patents

Pseudo-Zernike moment based voice content authentication method Download PDF

Info

Publication number
CN102867513B
CN102867513B (application CN201210278724.3A)
Authority
CN
China
Prior art keywords
frame
watermark
pseudo
voice
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210278724.3A
Other languages
Chinese (zh)
Other versions
CN102867513A (en)
Inventor
王宏霞 (Wang Hongxia)
刘正辉 (Liu Zhenghui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN201210278724.3A priority Critical patent/CN102867513B/en
Publication of CN102867513A publication Critical patent/CN102867513A/en
Application granted granted Critical
Publication of CN102867513B publication Critical patent/CN102867513B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention discloses a pseudo-Zernike moment based voice content authentication method. During watermark embedding, the original voice signal A is divided into P frames, and each frame into N segments. A watermark W is generated from the mean amplitude of the n-order pseudo-Zernike moments of the discrete cosine transform (DCT) low-frequency coefficients of the first N/2 segments of each frame, and is embedded by quantizing the pseudo-Zernike moments of the DCT low-frequency coefficients of the last N/2 segments of each frame, yielding the watermarked voice signal A'. Because the amplitudes of the pseudo-Zernike moments of the DCT low-frequency coefficients are closely tied to the voice content yet robust to common signal processing, the method is sensitive to malicious tampering while remaining highly tolerant of common voice signal processing operations.

Description

A pseudo-Zernike moment based voice content authentication method
Technical field
The present invention relates to speech signal processing, and in particular to authenticating the authenticity and integrity of voice content.
Background technology
In recent years, the rapid development of digital voice communication, the wide spread of voice products, and the availability of powerful audio processing software have made the transmission and use of digital speech increasingly frequent and extensive. At the same time, tampering with stored or transmitted voice content has become relatively easy. For example, if key passages of an important court-testimony recording are maliciously altered during storage or transmission, the consequences can be severe. How to determine whether an important or sensitive piece of speech has been tampered with, where it was tampered with, and whether its source is authentic and trustworthy are questions of digital speech authenticity that have attracted great research interest at home and abroad. Audio watermarking, as a technical means of protecting audio, has drawn attention since its emergence in the 1990s and has become a focus of information security research.
Compared with general audio signals, voice signals have a lower sampling rate and are more sensitive to common signal processing. Many existing audio content authentication algorithms therefore cannot be applied to voice content, or perform poorly on it. In practice, audio watermarking mostly addresses copyright protection, whereas for voice the central problem is content authenticity and integrity. In digital-watermark-based voice content authentication, a watermark unrelated to the voice content increases the amount of information to be transmitted and introduces security risks; algorithms that generate the watermark from the voice's own features or content therefore have greater research significance and practical value.
The amplitudes of pseudo-Zernike moments (and Zernike moments) are rotation-invariant, a property widely used in image representation, image retrieval, and image watermarking, but rarely applied to audio. The paper "Robust audio watermarking based on low-order Zernike moments" (Xiang Shi-jun, Huang Ji-wu, Yang Rui, 5th International Workshop on Digital Watermarking, pp. 226-240, Oct. 2006) first converts the audio from one dimension to two dimensions and then applies the Zernike transform to the resulting 2-D signal. It demonstrates experimentally that the amplitudes of Zernike moments are highly robust to common signal processing, analyzes the linear relationship between the moment amplitudes and the audio sample values, and on that basis proposes a robust audio watermarking algorithm using low-order Zernike moments. The paper "A pseudo-Zernike moments based audio watermarking scheme robust against desynchronization attacks" (Wang Xiang-yang, Ma Tian-xiao, Niu Pan-pan, Computers and Electrical Engineering, vol. 37, no. 4, pp. 425-443, July 2011) first embeds a synchronization code in the time domain based on statistical means and then embeds the watermark by quantizing the amplitudes of pseudo-Zernike moments, yielding an audio watermarking algorithm resistant to desynchronization attacks. These (pseudo-)Zernike moment based watermarking algorithms have two drawbacks. First, they must compute the moments over all sample points, which is computationally expensive and time-consuming, and they embed the watermark by proportionally scaling the sample values of each audio segment; analysis shows that directly scaling the sample values changes the original audio considerably and noticeably degrades its quality. Second, the embedding positions and method are public, and the computation of each frame's feature (its pseudo-Zernike moments) is known, so an attacker can locate each frame, compute its feature, and re-quantize the pseudo-Zernike moments to remove the embedded watermark, defeating the copyright protection. Alternatively, an attacker can replace a watermarked audio segment with other audio and then quantize the replacement so that it still passes watermark extraction, mounting an attack on the content. Studying content-based voice content authentication algorithms with strong resistance to such attacks is therefore of real practical significance.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a pseudo-Zernike moment based voice content authentication algorithm that can effectively distinguish common signal processing operations from malicious attacks and can locate maliciously tampered voice content, thereby authenticating the authenticity and integrity of voice content.
To realize this object, the present invention takes as its foundation the robustness of the pseudo-Zernike moment amplitudes of DCT low-frequency coefficients to common signal processing, and designs a new watermark generation and embedding method.
A pseudo-Zernike moment based voice content authentication method that can effectively distinguish common signal processing operations from malicious attacks and can locate malicious tampering, thereby authenticating the authenticity and integrity of voice content, comprising the following steps:
(1) Watermark embedding: starting from the K-th sample point of the voice signal (K serves as the key of the watermarking system), divide the original voice signal A into P frames, and each frame into N segments. Compute the sum of the amplitudes of the n-order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame, take the mean of these amplitudes, and generate the watermark W from the mean. Embed the watermark into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients; the resulting watermarked voice signal is denoted A'.
(2) Voice content authentication: similarly to the embedding process, starting from the k1-th sample point of the voice signal to be authenticated, divide A* into P frames, and each frame into N segments. Compute the sum of the amplitudes of the n-order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame, take the mean, and generate the watermark W' from it. Compute the n-order pseudo-Zernike moment amplitudes of the DCT low-frequency coefficients of the last N/2 segments of each frame, and extract the watermark W* from these amplitudes. Compare W* and W'; the positions where they differ are the positions where the voice signal was tampered with, thereby authenticating the authenticity and integrity of the voice content.
Compared with existing voice watermarking algorithms for content authentication, the present invention generates the watermark from the voice content itself, so the receiver obtains the embedded watermark along with the voice signal. This reduces transmission bandwidth, saves resources, and strengthens the security of watermark transmission. Moreover, embedding only requires a pseudo-Zernike transform of the DCT low-frequency coefficients, which improves the efficiency of the algorithm and the watermark's tolerance of common signal processing. The present invention is therefore well suited to practical application.
Brief description of the drawings
Fig. 1 shows the watermarked voice signal of the embodiment of the present invention.
Fig. 2 shows the voice signal after a muting attack on part of the voice content of Fig. 1.
Fig. 3 shows the voice signal after a substitution attack on part of the content of Fig. 1.
Fig. 4 shows the tamper localization result for Fig. 2.
Fig. 5 shows the tamper localization result for Fig. 3.
Fig. 6 lists the inaudibility test results.
Fig. 7 lists the robustness test results against common signal processing.
Embodiment
The technical scheme of the present invention is further described below in conjunction with the drawings and an embodiment.
1. Generation and embedding of the watermark:
(1) Framing of the speech data and division of each frame into segments. Divide the original voice signal A = {a(l), 1 ≤ l ≤ LA + K} into P frames (K serves as the key of the watermarking system); each frame has length I = LA/P, and the i-th frame is denoted A(i) (i = 1, 2, ..., P). Divide each frame into N segments of length I/N each; the j-th segment of the i-th frame is denoted A(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N.
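As a rough sketch (not the patented implementation; the function name `frame_and_segment` and the use of NumPy are our assumptions), the framing and segmentation of this step might look like:

```python
import numpy as np

def frame_and_segment(signal, K, P, N):
    """Split a speech signal into P frames starting at sample K,
    and split each frame into N equal segments."""
    body = signal[K:]               # discard the first K samples (K acts as a key)
    I = len(body) // P              # frame length I = LA / P
    frames = [body[i * I:(i + 1) * I] for i in range(P)]
    seg_len = I // N                # segment length I / N
    segments = [[f[j * seg_len:(j + 1) * seg_len] for j in range(N)]
                for f in frames]
    return segments                 # segments[i][j] corresponds to A(i+1, j+1)
```

Lengths not divisible by P or N would need padding or truncation in practice; this sketch assumes they divide evenly.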
(2) DCT. Apply the DCT to A(i, j); D(i, j) denotes the DCT coefficients of the j-th segment of the i-th frame, and the DCT coefficients of the first N/2 segments of the i-th frame are denoted D1(i, j).
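A minimal, self-contained orthonormal DCT-II sketch for one segment (the explicit cosine-matrix construction and the names `dct2_type2`, `low` are ours; a production implementation would use an FFT-based DCT):

```python
import numpy as np

def dct2_type2(x):
    """Orthonormal DCT-II of a 1-D segment, via an explicit cosine matrix."""
    L = len(x)
    k = np.arange(L)[:, None]
    n = np.arange(L)[None, :]
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * L))
    scale = np.sqrt(2.0 / L) * np.ones((L, 1))
    scale[0] = np.sqrt(1.0 / L)     # DC row gets the smaller scale
    return (scale * C) @ x

segment = np.sin(2 * np.pi * 5 * np.arange(64) / 64)  # toy 64-sample segment
D = dct2_type2(segment)
low = D[:16]                        # keep only low-frequency coefficients
```

The orthonormal scaling preserves energy (Parseval), which is convenient when the coefficients are later rescaled during embedding.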
(3) Computation of the (n, m)-order pseudo-Zernike moments. Transform the first m1 × m1 low-frequency coefficients of D1(i, j) into a 2-D signal, and compute its (n, m)-order pseudo-Zernike moments as follows.
Let {Vnm} denote the pseudo-Zernike polynomials, a set of complex-valued polynomials that form a complete orthogonal basis on the unit disk, defined by
Vnm(x, y) = Vnm(ρ, θ) = Rnm(ρ) exp(imθ)
where n is a nonnegative integer and m is an integer satisfying |m| ≤ n. Let l be the vector from the origin to the point (x, y); then ρ = |l|, and θ is the counterclockwise angle from the positive x-axis to l. Rnm(ρ) is the radial polynomial
Rnm(ρ) = Σ_{s=0}^{n−|m|} (−1)^s · (2n + 1 − s)! / [ s! (n + |m| + 1 − s)! (n − |m| − s)! ] · ρ^(n−s)
A 2-D signal f(x, y) on the unit disk (x² + y² ≤ 1) can be expressed as a linear combination of the Vnm(x, y):
f(x, y) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} Anm V*nm(x, y)
where V*nm(x, y) is the complex conjugate of Vnm(x, y), and Anm is the (n, m)-order pseudo-Zernike moment, defined as
Anm = (n + 1)/π · Σ_x Σ_y f(x, y) V*nm(x, y),  x² + y² ≤ 1
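The definitions above can be evaluated directly. The following sketch (function names and the pixel-to-unit-disk mapping are our assumptions) computes Rnm and Anm for a small square block; the amplitude |Anm| is invariant under rotation of the block:

```python
import math
import numpy as np

def radial_poly(n, m, rho):
    """Pseudo-Zernike radial polynomial Rnm(rho)."""
    m = abs(m)
    total = 0.0
    for s in range(n - m + 1):
        num = (-1) ** s * math.factorial(2 * n + 1 - s)
        den = (math.factorial(s) * math.factorial(n + m + 1 - s)
               * math.factorial(n - m - s))
        total = total + num / den * rho ** (n - s)
    return total

def pseudo_zernike_moment(f, n, m):
    """Anm = (n+1)/pi * sum of f(x,y) V*nm(x,y) over the unit disk.
    f is a square 2-D array whose pixel centres are mapped onto [-1, 1]^2."""
    size = f.shape[0]
    coords = (2 * np.arange(size) + 1) / size - 1   # pixel centres in [-1, 1]
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                               # restrict to the unit disk
    V_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(f[mask] * V_conj[mask])
```

The rotation invariance of the amplitude is exact on this symmetric grid for 90-degree rotations, which makes a convenient sanity check.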
(4) Generation of the voice watermark. The watermark is generated from the first N/2 segments of each frame. Let C1(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N/2, denote the sum of the amplitudes of the n-order pseudo-Zernike moments of segment j, and compute the mean
C̄1(i) = Σ_{j=1}^{N/2} C1(i, j) / (N/2)
Let M1(i) be the most significant digit of C̄1(i), and let W1(i) = {w1(i, t), 1 ≤ t ≤ N/2} be the binary representation of M1(i); W1(i) is the watermark generated for the i-th frame.
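A small illustration of this step (function names are ours; padding the binary representation of the most significant digit to N/2 bits is our reading of the fixed per-frame watermark length):

```python
def most_significant_digit(value):
    """Leading decimal digit of a positive number, e.g. 0.0347 -> 3."""
    s = f"{abs(value):.15e}"        # scientific notation exposes the leading digit
    return int(s[0])

def generate_watermark(C1_row):
    """Frame watermark W1(i): binary bits of the MSD of the mean of C1(i, j)."""
    half_N = len(C1_row)            # number of first-half segments, N/2
    mean = sum(C1_row) / half_N
    msd = most_significant_digit(mean)
    # binary representation of the MSD, zero-padded to N/2 bits
    return [int(b) for b in format(msd, f"0{half_N}b")]
```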
(5) Embedding of the watermark. Denote the DCT coefficients of the last N/2 segments of the i-th frame by D2(i, j), N/2+1 ≤ j ≤ N. Transform the first m2 × m2 low-frequency coefficients of D2(i, j) into a 2-D signal and compute the sum of the amplitudes of its n-order pseudo-Zernike moments, denoted C2(i, j). Let M2(i, j) be the most significant digit of C2(i, j). The watermark is embedded as follows:
When w1(i, t) = 1:
M'2(i, j) = M2(i, j),      if M2(i, j) mod 2 = 1
M'2(i, j) = M2(i, j) + 1,  if M2(i, j) mod 2 = 0
When w1(i, t) = 0:
M'2(i, j) = M2(i, j),      if M2(i, j) mod 2 = 0
M'2(i, j) = M2(i, j) + 1,  if M2(i, j) mod 2 = 1
In the above, when M2(i, j) = 9, set M'2(i, j) = M2(i, j) − 1; here j = t + N/2, 1 ≤ t ≤ N/2. Replace the most significant digit of the integer part of C2(i, j) with M'2(i, j) and quantize the second most significant digit to 5; the resulting value is denoted C'2(i, j). Scale the first m2 × m2 low-frequency coefficients of D2(i, j) by the factor α2(i, j), obtaining D'2(i, j), where
α2(i, j) = C'2(i, j) / C2(i, j),  N/2+1 ≤ j ≤ N
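The parity quantization and the scale factor α2(i, j) can be sketched as follows. This is an interpretation rather than the patented code: we assume that when the second most significant digit is quantized to 5, the digits below it are kept unchanged:

```python
def embed_bit(C2, bit):
    """Quantize C2(i, j) so the parity of the most significant digit of its
    integer part encodes one watermark bit; returns (C2_prime, alpha)."""
    digits = list(str(int(C2)))
    M = int(digits[0])                      # most significant digit
    if M % 2 != bit:                        # parity rule from the patent
        M = M - 1 if M == 9 else M + 1      # special case: 9 steps down, not up
    digits[0] = str(M)
    if len(digits) > 1:
        digits[1] = "5"                     # quantize the second digit to 5
    C2_prime = int("".join(digits)) + (C2 - int(C2))
    alpha = C2_prime / C2                   # scale factor for the DCT coefficients
    return C2_prime, alpha
```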
Apply the inverse DCT to D'2(i, j); the result forms the second-half content of the i-th frame. The first half and the second half of the frame together constitute the watermarked i-th frame.
(6) Embed in each of the P speech frames in turn; once all frames have been processed, the watermarked speech signal A' is obtained.
2. Voice content authentication:
(1) As in steps (1)-(4) of watermark generation and embedding, divide the voice signal A* to be authenticated into P frames starting from the K-th sample point, with each frame divided into N segments; the i-th frame is denoted A*(i) (i = 1, 2, ..., P) and its j-th segment A*(i, j), 1 ≤ j ≤ N. Apply the DCT to A*(i, j); the corresponding coefficients are denoted D*(i, j), and the DCT coefficients of the first N/2 segments of the i-th frame are denoted D1*(i, j). Transform the first m1 × m1 low-frequency coefficients of D1*(i, j) into a 2-D signal and compute the sum of the amplitudes of its n-order pseudo-Zernike moments, denoted C1*(i, j), 1 ≤ j ≤ N/2. Compute the mean
C̄1*(i) = Σ_{j=1}^{N/2} C1*(i, j) / (N/2)
Let M1*(i) denote the most significant digit of C̄1*(i), and binarize M1*(i) as W1*(i) = {w1*(i, t), 1 ≤ t ≤ N/2}; W1*(i) is the watermark regenerated from the i-th frame.
(2) Denote the DCT coefficients of the last N/2 segments of the i-th frame by D2*(i, j). Transform the first m2 × m2 low-frequency coefficients of D2*(i, j) into a 2-D signal and compute the sum of the amplitudes of its n-order pseudo-Zernike moments, denoted C2*(i, j), N/2+1 ≤ j ≤ N. Let M2*(i, j) be the most significant digit of C2*(i, j). The extracted watermark ŵ1*(i, t) is obtained by
ŵ1*(i, t) = 1, if M2*(i, t + N/2) mod 2 = 1
ŵ1*(i, t) = 0, if M2*(i, t + N/2) mod 2 = 0
(3) Define the identification sequence TA(i) as
TA(i) = ∨_{t=1}^{N/2} ( ŵ1*(i, t) ⊕ w1*(i, t) ),  TA(i) ∈ {0, 1}
where ⊕ is the exclusive-or and ∨ the logical OR over t. If TA(i) = 0, the content of the i-th frame is authentic; otherwise TA(i) = 1 and the content of the i-th frame has been tampered with.
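The extraction rule and identification sequence above translate into a short sketch (function names are ours):

```python
def extract_bit(C2_star):
    """Extract one watermark bit: parity of the MSD of the integer part."""
    return int(str(int(C2_star))[0]) % 2

def identification(extracted_bits, regenerated_bits):
    """TA(i): 0 if every extracted bit matches the regenerated watermark,
    1 otherwise (frame flagged as tampered)."""
    return int(any(e != w for e, w in zip(extracted_bits, regenerated_bits)))
```

In use, `extracted_bits` comes from the last N/2 segments of a frame and `regenerated_bits` from the first N/2 segments; any mismatch flags the frame.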
The effectiveness of the method of the present invention can be verified by the following performance analysis:
1. Inaudibility
A mono voice signal with a sampling rate of 22.05 kHz, a sample length of 1024078, and 16-bit quantization was used for the inaudibility test. Fig. 6 gives the SNR values for three types of voice; the test results show that the algorithm has good inaudibility.
2. Robustness to common signal processing
The robustness of the algorithm to common signal processing is measured by the bit error rate (BER), defined as
BER = E / T × 100%
where E is the number of erroneous extracted watermark bits and T is the total number of watermark bits embedded in the voice signal. A smaller BER indicates stronger robustness to common signal processing.
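The BER formula translates directly into code (the function name is ours):

```python
def bit_error_rate(extracted, embedded):
    """BER = E / T * 100%, where E is the number of erroneous watermark bits
    and T the total number of embedded bits."""
    assert len(extracted) == len(embedded)
    E = sum(e != w for e, w in zip(extracted, embedded))
    return E / len(embedded) * 100.0
```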
Fig. 7 lists the BER values of an adult male voice after several common signal processing operations (the results for other voice types are similar). The method of the present invention is robust to common voice signal processing such as MP3 compression, low-pass filtering, and resampling.
3, malice tampering location
The watermarked voice signal shown in Fig. 1 was subjected to muting and substitution attacks. The attacked voice signals are shown in Figs. 2 and 3, and the corresponding tamper localization results in Figs. 4 and 5. In Figs. 4 and 5, frames with TA(i) = 1 mark the maliciously attacked parts, and frames with TA(i) = 0 mark the unattacked parts. The localization results show that the method of the present invention effectively locates malicious tampering.
The above description of the preferred embodiment is quite specific; those of ordinary skill in the art will appreciate that the embodiment described here is intended to help the reader understand the principles of the present invention, and that the scope of protection of the invention is not limited to this particular statement and embodiment.

Claims (1)

1. A pseudo-Zernike moment based voice content authentication method, for distinguishing common signal processing operations from malicious attacks and for locating malicious tampering, comprising the following steps:
(1) watermark embedding: starting from the K-th sample point of the voice signal, dividing the original voice signal A into P frames, and each frame into N segments; computing the sum of the amplitudes of the n-order pseudo-Zernike moments of the discrete cosine transform (DCT) low-frequency coefficients of the first N/2 segments of each frame, taking the mean of the pseudo-Zernike moment amplitudes, and generating a watermark W from the mean; embedding the watermark into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients, obtaining a watermarked voice signal A';
(2) voice content authentication: similarly to the embedding process, starting from the k1-th sample point of the voice signal A* to be authenticated, dividing the voice into P frames, and each frame into N segments; computing the sum of the amplitudes of the n-order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame, taking the mean, and generating a watermark W' from it; computing the n-order pseudo-Zernike moment amplitudes of the DCT low-frequency coefficients of the last N/2 segments of each frame, and extracting a watermark W* from the amplitudes; comparing W* and W', the positions where they differ being the positions where the voice signal was tampered with, thereby authenticating the authenticity and integrity of the voice content.
CN201210278724.3A 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method Expired - Fee Related CN102867513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210278724.3A CN102867513B (en) 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210278724.3A CN102867513B (en) 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method

Publications (2)

Publication Number Publication Date
CN102867513A CN102867513A (en) 2013-01-09
CN102867513B true CN102867513B (en) 2014-02-19

Family

ID=47446337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210278724.3A Expired - Fee Related CN102867513B (en) 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method

Country Status (1)

Country Link
CN (1) CN102867513B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103456308B (en) * 2013-08-05 2015-08-19 西南交通大学 A kind of recoverable ciphertext domain voice content authentication method
EP3093846A1 (en) * 2015-05-12 2016-11-16 Nxp B.V. Accoustic context recognition using local binary pattern method and apparatus
CN105304091B (en) * 2015-06-26 2018-10-26 信阳师范学院 A kind of voice tamper recovery method based on DCT
GB2567703B (en) * 2017-10-20 2022-07-13 Cirrus Logic Int Semiconductor Ltd Secure voice biometric authentication
CN107886956B (en) * 2017-11-13 2020-12-11 广州酷狗计算机科技有限公司 Audio recognition method and device and computer storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7280970B2 (en) * 1999-10-04 2007-10-09 Beepcard Ltd. Sonic/ultrasonic authentication device
CN101609675B (en) * 2009-07-27 2011-09-21 西南交通大学 Fragile audio frequency watermark method based on mass center

Also Published As

Publication number Publication date
CN102867513A (en) 2013-01-09

Similar Documents

Publication Publication Date Title
CN101290772B (en) Embedding and extracting method for audio zero water mark based on vector quantization of coefficient of mixed domain
CN101345054B (en) Digital watermark production and recognition method used for audio document
CN101458810B (en) Vector map watermark method based on object property characteristic
CN102867513B (en) Pseudo-Zernike moment based voice content authentication method
Wang et al. Centroid-based semi-fragile audio watermarking in hybrid domain
Liu et al. A novel speech content authentication algorithm based on Bessel–Fourier moments
CN100596061C (en) Method for watermarking small wave threshold digital audio multiple mesh based on blind source separation
CN103208288A (en) Dual encryption based discrete wavelet transform-discrete cosine transform (DWT-DCT) domain audio public watermarking algorithm
CN107993669B (en) Voice content authentication and tampering recovery method based on modification of least significant digit weight
CN105632506A (en) Robust digital audio watermark embedding and detection method based on polar harmonic transform
CN103456308B (en) A kind of recoverable ciphertext domain voice content authentication method
CN103050120B (en) high-capacity digital audio reversible watermark processing method
CN102324234A (en) Audio watermarking method based on MP3 encoding principle
CN106780281A (en) Digital image watermarking method based on Cauchy's statistical modeling
CN105304091A (en) Voice tamper recovery method based on DCT
CN101609675B (en) Fragile audio frequency watermark method based on mass center
CN108877819B (en) Voice content evidence obtaining method based on coefficient autocorrelation
Dutta et al. Generation of biometric based unique digital watermark from iris image
Chen et al. Multipurpose audio watermarking algorithm
Khanduja et al. A scheme for robust biometric watermarking in web databases for ownership proof with identification
CN103745725B An audio watermark embedding method based on the constant-Q transform
CN107910010A (en) Digital watermark detection method based on multi-parameter Weibull statistical modelings
Yang et al. A novel dual watermarking algorithm for digital audio
Yue et al. Rights protection for trajectory streams
Chen et al. A new fragile audio watermarking scheme

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140219

Termination date: 20160807