CN102867513B - Pseudo-Zernike moment based voice content authentication method - Google Patents

Pseudo-Zernike moment based voice content authentication method Download PDF

Info

Publication number
CN102867513B
CN102867513B (application CN201210278724.3A)
Authority
CN
China
Prior art keywords
frame
watermark
pseudo
voice
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210278724.3A
Other languages
Chinese (zh)
Other versions
CN102867513A (en)
Inventor
王宏霞 (Wang Hongxia)
刘正辉 (Liu Zhenghui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN201210278724.3A priority Critical patent/CN102867513B/en
Publication of CN102867513A publication Critical patent/CN102867513A/en
Application granted granted Critical
Publication of CN102867513B publication Critical patent/CN102867513B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention discloses a pseudo-Zernike moment based voice content authentication method. During watermark embedding, the original voice signal A is divided into P frames, and each frame into N segments. A watermark W is generated from the mean amplitude of the n-order pseudo-Zernike moments of the discrete cosine transform (DCT) low-frequency coefficients of the first N/2 segments of each frame, and is embedded by quantizing the pseudo-Zernike moments of the DCT low-frequency coefficients of the last N/2 segments of each frame, yielding the watermarked voice signal A'. Because the amplitudes of the pseudo-Zernike moments of the DCT low-frequency coefficients are closely tied to the voice content yet robust to common signal processing, the method is sensitive to malicious tampering while remaining highly tolerant of common voice signal processing operations.

Description

A pseudo-Zernike moment based voice content authentication method
Technical field
The present invention relates to speech signal processing, and in particular to authenticating the authenticity and integrity of voice content.
Background technology
In recent years, the rapid development of digital voice communication, the wide spread of voice products, and the availability of powerful audio processing software have made the transmission and use of digital speech increasingly frequent and extensive. At the same time, tampering with stored or transmitted voice content has become relatively easy. For example, if key passages of an important court-testimony recording are maliciously altered during storage or transmission, the consequences can be severe. How to determine whether an important or sensitive piece of speech has been tampered with, where it was tampered with, and whether its source is authentic and trustworthy are questions of digital speech authenticity that have attracted great research interest at home and abroad. Audio watermarking, as a technical means of protecting audio, has drawn attention since its emergence in the 1990s and has become a focus of information security research.
Compared with general audio signals, voice signals have a lower sampling rate and are more sensitive to common signal processing. Many existing audio content authentication algorithms therefore cannot be applied to voice content, or perform poorly on it. In practice, audio watermarking mostly addresses copyright protection, whereas for voice the central problem is content authenticity and integrity. In digital-watermark-based voice content authentication, a watermark unrelated to the voice content increases the amount of information to be transmitted and introduces security risks; algorithms that generate the watermark from the voice's own features or content therefore have greater research significance and practical value.
The amplitudes of pseudo-Zernike moments (and Zernike moments) are rotation-invariant, a property widely used in image representation, image retrieval, and image watermarking, but rarely applied to audio. The paper "Robust audio watermarking based on low-order Zernike moments" (Xiang Shi-jun, Huang Ji-wu, Yang Rui, 5th International Workshop on Digital Watermarking, pp. 226-240, Oct. 2006) first converts the audio from one dimension to two dimensions and then applies the Zernike transform to the resulting 2-D signal. It demonstrates experimentally that the amplitudes of Zernike moments are highly robust to common signal processing, analyzes the linear relationship between the moment amplitudes and the audio sample values, and on that basis proposes a robust audio watermarking algorithm using low-order Zernike moments. The paper "A pseudo-Zernike moments based audio watermarking scheme robust against desynchronization attacks" (Wang Xiang-yang, Ma Tian-xiao, Niu Pan-pan, Computers and Electrical Engineering, vol. 37, no. 4, pp. 425-443, July 2011) first embeds a synchronization code in the time domain based on statistical means and then embeds the watermark by quantizing the amplitudes of pseudo-Zernike moments, yielding an audio watermarking algorithm resistant to desynchronization attacks. These (pseudo-)Zernike moment based watermarking algorithms have two drawbacks. First, they must compute the moments over all sample points, which is computationally expensive and time-consuming, and they embed the watermark by proportionally scaling the sample values of each audio segment; analysis shows that directly scaling the sample values changes the original audio considerably and noticeably degrades its quality. Second, the embedding positions and method are public, and the computation of each frame's feature (its pseudo-Zernike moments) is known, so an attacker can locate each frame, compute its feature, and re-quantize the pseudo-Zernike moments to remove the embedded watermark, defeating the copyright protection. Alternatively, an attacker can replace a watermarked audio segment with other audio and then quantize the replacement so that it still passes watermark extraction, mounting an attack on the content. Studying content-based voice content authentication algorithms with strong resistance to such attacks is therefore of real practical significance.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a pseudo-Zernike moment based voice content authentication algorithm that can effectively distinguish common signal processing operations from malicious attacks and can locate maliciously tampered voice content, thereby authenticating the authenticity and integrity of voice content.
To realize this object, the present invention takes as its foundation the robustness of the pseudo-Zernike moment amplitudes of DCT low-frequency coefficients to common signal processing, and designs a new watermark generation and embedding method.
A pseudo-Zernike moment based voice content authentication method that can effectively distinguish common signal processing operations from malicious attacks and can locate malicious tampering, thereby authenticating the authenticity and integrity of voice content, comprising the following steps:
(1) Watermark embedding: starting from the K-th sample point of the voice signal (K serves as the key of the watermarking system), divide the original voice signal A into P frames, and each frame into N segments. Compute the sum of the amplitudes of the n-order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame, take the mean of these amplitudes, and generate the watermark W from the mean. Embed the watermark into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients; the resulting watermarked voice signal is denoted A'.
(2) Voice content authentication: similarly to the embedding process, starting from the k1-th sample point of the voice signal to be authenticated, divide A* into P frames, and each frame into N segments. Compute the sum of the amplitudes of the n-order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame, take the mean, and generate the watermark W' from it. Compute the n-order pseudo-Zernike moment amplitudes of the DCT low-frequency coefficients of the last N/2 segments of each frame, and extract the watermark W* from these amplitudes. Compare W* and W'; the positions where they differ are the positions where the voice signal was tampered with, thereby authenticating the authenticity and integrity of the voice content.
Compared with existing voice watermarking algorithms for content authentication, the present invention generates the watermark from the voice content itself, so the receiver obtains the embedded watermark along with the voice signal. This reduces transmission bandwidth, saves resources, and strengthens the security of watermark transmission. Moreover, embedding only requires a pseudo-Zernike transform of the DCT low-frequency coefficients, which improves the efficiency of the algorithm and the watermark's tolerance of common signal processing. The present invention is therefore well suited to practical application.
Brief description of the drawings
Fig. 1 shows the watermarked voice signal of the embodiment of the present invention.
Fig. 2 shows the voice signal after a muting attack on part of the voice content of Fig. 1.
Fig. 3 shows the voice signal after a substitution attack on part of the content of Fig. 1.
Fig. 4 shows the tamper localization result for Fig. 2.
Fig. 5 shows the tamper localization result for Fig. 3.
Fig. 6 lists the inaudibility test results.
Fig. 7 lists the robustness test results against common signal processing.
Embodiment
The technical scheme of the present invention is further described below in conjunction with the drawings and an embodiment.
1. Generation and embedding of the watermark:
(1) Framing of the speech data and division of each frame into segments. Divide the original voice signal A = {a(l), 1 ≤ l ≤ LA + K} into P frames (K serves as the key of the watermarking system); each frame has length I = LA/P, and the i-th frame is denoted A(i) (i = 1, 2, ..., P). Divide each frame into N segments of length I/N each; the j-th segment of the i-th frame is denoted A(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N.
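As a rough sketch (not the patented implementation; the function name `frame_and_segment` and the use of NumPy are our assumptions), the framing and segmentation of this step might look like:

```python
import numpy as np

def frame_and_segment(signal, K, P, N):
    """Split a speech signal into P frames starting at sample K,
    and split each frame into N equal segments."""
    body = signal[K:]               # discard the first K samples (K acts as a key)
    I = len(body) // P              # frame length I = LA / P
    frames = [body[i * I:(i + 1) * I] for i in range(P)]
    seg_len = I // N                # segment length I / N
    segments = [[f[j * seg_len:(j + 1) * seg_len] for j in range(N)]
                for f in frames]
    return segments                 # segments[i][j] corresponds to A(i+1, j+1)
```

Lengths not divisible by P or N would need padding or truncation in practice; this sketch assumes they divide evenly.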
(2) DCT. Apply the DCT to A(i, j); D(i, j) denotes the DCT coefficients of the j-th segment of the i-th frame, and the DCT coefficients of the first N/2 segments of the i-th frame are denoted D1(i, j).
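A minimal, self-contained orthonormal DCT-II sketch for one segment (the explicit cosine-matrix construction and the names `dct2_type2`, `low` are ours; a production implementation would use an FFT-based DCT):

```python
import numpy as np

def dct2_type2(x):
    """Orthonormal DCT-II of a 1-D segment, via an explicit cosine matrix."""
    L = len(x)
    k = np.arange(L)[:, None]
    n = np.arange(L)[None, :]
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * L))
    scale = np.sqrt(2.0 / L) * np.ones((L, 1))
    scale[0] = np.sqrt(1.0 / L)     # DC row gets the smaller scale
    return (scale * C) @ x

segment = np.sin(2 * np.pi * 5 * np.arange(64) / 64)  # toy 64-sample segment
D = dct2_type2(segment)
low = D[:16]                        # keep only low-frequency coefficients
```

The orthonormal scaling preserves energy (Parseval), which is convenient when the coefficients are later rescaled during embedding.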
(3) Computation of the (n, m)-order pseudo-Zernike moments. Transform the first m1 × m1 low-frequency coefficients of D1(i, j) into a 2-D signal, and compute its (n, m)-order pseudo-Zernike moments as follows.
Let {Vnm} denote the pseudo-Zernike polynomials, a set of complex-valued polynomials that form a complete orthogonal basis on the unit disk, defined by
Vnm(x, y) = Vnm(ρ, θ) = Rnm(ρ) exp(imθ)
where n is a nonnegative integer and m is an integer satisfying |m| ≤ n. Let l be the vector from the origin to the point (x, y); then ρ = |l|, and θ is the counterclockwise angle from the positive x-axis to l. Rnm(ρ) is the radial polynomial
Rnm(ρ) = Σ_{s=0}^{n−|m|} (−1)^s · (2n + 1 − s)! / [ s! (n + |m| + 1 − s)! (n − |m| − s)! ] · ρ^(n−s)
A 2-D signal f(x, y) on the unit disk (x² + y² ≤ 1) can be expressed as a linear combination of the Vnm(x, y):
f(x, y) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} Anm V*nm(x, y)
where V*nm(x, y) is the complex conjugate of Vnm(x, y), and Anm is the (n, m)-order pseudo-Zernike moment, defined as
Anm = (n + 1)/π · Σ_x Σ_y f(x, y) V*nm(x, y),  x² + y² ≤ 1
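The definitions above can be evaluated directly. The following sketch (function names and the pixel-to-unit-disk mapping are our assumptions) computes Rnm and Anm for a small square block; the amplitude |Anm| is invariant under rotation of the block:

```python
import math
import numpy as np

def radial_poly(n, m, rho):
    """Pseudo-Zernike radial polynomial Rnm(rho)."""
    m = abs(m)
    total = 0.0
    for s in range(n - m + 1):
        num = (-1) ** s * math.factorial(2 * n + 1 - s)
        den = (math.factorial(s) * math.factorial(n + m + 1 - s)
               * math.factorial(n - m - s))
        total = total + num / den * rho ** (n - s)
    return total

def pseudo_zernike_moment(f, n, m):
    """Anm = (n+1)/pi * sum of f(x,y) V*nm(x,y) over the unit disk.
    f is a square 2-D array whose pixel centres are mapped onto [-1, 1]^2."""
    size = f.shape[0]
    coords = (2 * np.arange(size) + 1) / size - 1   # pixel centres in [-1, 1]
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                               # restrict to the unit disk
    V_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(f[mask] * V_conj[mask])
```

The rotation invariance of the amplitude is exact on this symmetric grid for 90-degree rotations, which makes a convenient sanity check.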
(4) Generation of the voice watermark. The watermark is generated from the first N/2 segments of each frame. Let C1(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N/2, denote the sum of the amplitudes of the n-order pseudo-Zernike moments of segment j, and compute the mean
C̄1(i) = Σ_{j=1}^{N/2} C1(i, j) / (N/2)
Let M1(i) be the most significant digit of C̄1(i), and let W1(i) = {w1(i, t), 1 ≤ t ≤ N/2} be the binary representation of M1(i); W1(i) is the watermark generated for the i-th frame.
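A small illustration of this step (function names are ours; padding the binary representation of the most significant digit to N/2 bits is our reading of the fixed per-frame watermark length):

```python
def most_significant_digit(value):
    """Leading decimal digit of a positive number, e.g. 0.0347 -> 3."""
    s = f"{abs(value):.15e}"        # scientific notation exposes the leading digit
    return int(s[0])

def generate_watermark(C1_row):
    """Frame watermark W1(i): binary bits of the MSD of the mean of C1(i, j)."""
    half_N = len(C1_row)            # number of first-half segments, N/2
    mean = sum(C1_row) / half_N
    msd = most_significant_digit(mean)
    # binary representation of the MSD, zero-padded to N/2 bits
    return [int(b) for b in format(msd, f"0{half_N}b")]
```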
(5) Embedding of the watermark. Denote the DCT coefficients of the last N/2 segments of the i-th frame by D2(i, j), N/2+1 ≤ j ≤ N. Transform the first m2 × m2 low-frequency coefficients of D2(i, j) into a 2-D signal and compute the sum of the amplitudes of its n-order pseudo-Zernike moments, denoted C2(i, j). Let M2(i, j) be the most significant digit of C2(i, j). The watermark is embedded as follows:
When w1(i, t) = 1:
M'2(i, j) = M2(i, j),      if M2(i, j) mod 2 = 1
M'2(i, j) = M2(i, j) + 1,  if M2(i, j) mod 2 = 0
When w1(i, t) = 0:
M'2(i, j) = M2(i, j),      if M2(i, j) mod 2 = 0
M'2(i, j) = M2(i, j) + 1,  if M2(i, j) mod 2 = 1
In the above, when M2(i, j) = 9, set M'2(i, j) = M2(i, j) − 1; here j = t + N/2, 1 ≤ t ≤ N/2. Replace the most significant digit of the integer part of C2(i, j) with M'2(i, j) and quantize the second most significant digit to 5; the resulting value is denoted C'2(i, j). Scale the first m2 × m2 low-frequency coefficients of D2(i, j) by the factor α2(i, j), obtaining D'2(i, j), where
α2(i, j) = C'2(i, j) / C2(i, j),  N/2+1 ≤ j ≤ N
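The parity quantization and the scale factor α2(i, j) can be sketched as follows. This is an interpretation rather than the patented code: we assume that when the second most significant digit is quantized to 5, the digits below it are kept unchanged:

```python
def embed_bit(C2, bit):
    """Quantize C2(i, j) so the parity of the most significant digit of its
    integer part encodes one watermark bit; returns (C2_prime, alpha)."""
    digits = list(str(int(C2)))
    M = int(digits[0])                      # most significant digit
    if M % 2 != bit:                        # parity rule from the patent
        M = M - 1 if M == 9 else M + 1      # special case: 9 steps down, not up
    digits[0] = str(M)
    if len(digits) > 1:
        digits[1] = "5"                     # quantize the second digit to 5
    C2_prime = int("".join(digits)) + (C2 - int(C2))
    alpha = C2_prime / C2                   # scale factor for the DCT coefficients
    return C2_prime, alpha
```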
Apply the inverse DCT to D'2(i, j); the result forms the second-half content of the i-th frame. The first half and the second half of the frame together constitute the watermarked i-th frame.
(6) Embed in each of the P speech frames in turn; once all frames have been processed, the watermarked speech signal A' is obtained.
2. Voice content authentication:
(1) As in steps (1)-(4) of watermark generation and embedding, divide the voice signal A* to be authenticated into P frames starting from the K-th sample point, with each frame divided into N segments; the i-th frame is denoted A*(i) (i = 1, 2, ..., P) and its j-th segment A*(i, j), 1 ≤ j ≤ N. Apply the DCT to A*(i, j); the corresponding coefficients are denoted D*(i, j), and the DCT coefficients of the first N/2 segments of the i-th frame are denoted D1*(i, j). Transform the first m1 × m1 low-frequency coefficients of D1*(i, j) into a 2-D signal and compute the sum of the amplitudes of its n-order pseudo-Zernike moments, denoted C1*(i, j), 1 ≤ j ≤ N/2. Compute the mean
C̄1*(i) = Σ_{j=1}^{N/2} C1*(i, j) / (N/2)
Let M1*(i) denote the most significant digit of C̄1*(i), and binarize M1*(i) as W1*(i) = {w1*(i, t), 1 ≤ t ≤ N/2}; W1*(i) is the watermark regenerated from the i-th frame.
(2) Denote the DCT coefficients of the last N/2 segments of the i-th frame by D2*(i, j). Transform the first m2 × m2 low-frequency coefficients of D2*(i, j) into a 2-D signal and compute the sum of the amplitudes of its n-order pseudo-Zernike moments, denoted C2*(i, j), N/2+1 ≤ j ≤ N. Let M2*(i, j) be the most significant digit of C2*(i, j). The extracted watermark ŵ1*(i, t) is obtained by
ŵ1*(i, t) = 1, if M2*(i, t + N/2) mod 2 = 1
ŵ1*(i, t) = 0, if M2*(i, t + N/2) mod 2 = 0
(3) Define the identification sequence TA(i) as
TA(i) = ∨_{t=1}^{N/2} ( ŵ1*(i, t) ⊕ w1*(i, t) ),  TA(i) ∈ {0, 1}
where ⊕ is the exclusive-or and ∨ the logical OR over t. If TA(i) = 0, the content of the i-th frame is authentic; otherwise TA(i) = 1 and the content of the i-th frame has been tampered with.
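The extraction rule and identification sequence above translate into a short sketch (function names are ours):

```python
def extract_bit(C2_star):
    """Extract one watermark bit: parity of the MSD of the integer part."""
    return int(str(int(C2_star))[0]) % 2

def identification(extracted_bits, regenerated_bits):
    """TA(i): 0 if every extracted bit matches the regenerated watermark,
    1 otherwise (frame flagged as tampered)."""
    return int(any(e != w for e, w in zip(extracted_bits, regenerated_bits)))
```

In use, `extracted_bits` comes from the last N/2 segments of a frame and `regenerated_bits` from the first N/2 segments; any mismatch flags the frame.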
The effectiveness of the method of the present invention can be verified by the following performance analysis:
1. Inaudibility
A mono voice signal with a sampling rate of 22.05 kHz, a sample length of 1024078, and 16-bit quantization was used for the inaudibility test. Fig. 6 gives the SNR values for three types of voice; the test results show that the algorithm has good inaudibility.
2. Robustness to common signal processing
The robustness of the algorithm to common signal processing is measured by the bit error rate (BER), defined as
BER = E / T × 100%
where E is the number of erroneous extracted watermark bits and T is the total number of watermark bits embedded in the voice signal. A smaller BER indicates stronger robustness to common signal processing.
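The BER formula translates directly into code (the function name is ours):

```python
def bit_error_rate(extracted, embedded):
    """BER = E / T * 100%, where E is the number of erroneous watermark bits
    and T the total number of embedded bits."""
    assert len(extracted) == len(embedded)
    E = sum(e != w for e, w in zip(extracted, embedded))
    return E / len(embedded) * 100.0
```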
Fig. 7 lists the BER values of an adult male voice after several common signal processing operations (the results for other voice types are similar). The method of the present invention is robust to common voice signal processing such as MP3 compression, low-pass filtering, and resampling.
3, malice tampering location
The watermarked voice signal shown in Fig. 1 was subjected to muting and substitution attacks. The attacked voice signals are shown in Figs. 2 and 3, and the corresponding tamper localization results in Figs. 4 and 5. In Figs. 4 and 5, frames with TA(i) = 1 mark the maliciously attacked parts, and frames with TA(i) = 0 mark the unattacked parts. The localization results show that the method of the present invention effectively locates malicious tampering.
The above description of the preferred embodiment is quite specific; those of ordinary skill in the art will appreciate that the embodiment described here is intended to help the reader understand the principles of the present invention, and that the scope of protection of the invention is not limited to this particular statement and embodiment.

Claims (1)

1. A pseudo-Zernike moment based voice content authentication method, for distinguishing common signal processing operations from malicious attacks and for locating malicious tampering, comprising the following steps:
(1) watermark embedding: starting from the K-th sample point of the voice signal, dividing the original voice signal A into P frames, and each frame into N segments; computing the sum of the amplitudes of the n-order pseudo-Zernike moments of the discrete cosine transform (DCT) low-frequency coefficients of the first N/2 segments of each frame, taking the mean of the pseudo-Zernike moment amplitudes, and generating a watermark W from the mean; embedding the watermark into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients, obtaining a watermarked voice signal A';
(2) voice content authentication: similarly to the embedding process, starting from the k1-th sample point of the voice signal A* to be authenticated, dividing the voice into P frames, and each frame into N segments; computing the sum of the amplitudes of the n-order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame, taking the mean, and generating a watermark W' from it; computing the n-order pseudo-Zernike moment amplitudes of the DCT low-frequency coefficients of the last N/2 segments of each frame, and extracting a watermark W* from the amplitudes; comparing W* and W', the positions where they differ being the positions where the voice signal was tampered with, thereby authenticating the authenticity and integrity of the voice content.
CN201210278724.3A 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method Expired - Fee Related CN102867513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210278724.3A CN102867513B (en) 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210278724.3A CN102867513B (en) 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method

Publications (2)

Publication Number Publication Date
CN102867513A CN102867513A (en) 2013-01-09
CN102867513B true CN102867513B (en) 2014-02-19

Family

ID=47446337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210278724.3A Expired - Fee Related CN102867513B (en) 2012-08-07 2012-08-07 Pseudo-Zernike moment based voice content authentication method

Country Status (1)

Country Link
CN (1) CN102867513B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103456308B (en) * 2013-08-05 2015-08-19 西南交通大学 A kind of recoverable ciphertext domain voice content authentication method
EP3093846A1 (en) * 2015-05-12 2016-11-16 Nxp B.V. Accoustic context recognition using local binary pattern method and apparatus
CN105304091B (en) * 2015-06-26 2018-10-26 信阳师范学院 A kind of voice tamper recovery method based on DCT
GB2567703B (en) * 2017-10-20 2022-07-13 Cirrus Logic Int Semiconductor Ltd Secure voice biometric authentication
CN107886956B (en) * 2017-11-13 2020-12-11 广州酷狗计算机科技有限公司 Audio recognition method and device and computer storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7280970B2 (en) * 1999-10-04 2007-10-09 Beepcard Ltd. Sonic/ultrasonic authentication device
CN101609675B (en) * 2009-07-27 2011-09-21 西南交通大学 Fragile audio frequency watermark method based on mass center

Also Published As

Publication number Publication date
CN102867513A (en) 2013-01-09

Similar Documents

Publication Publication Date Title
CN101290772B (en) Embedding and extracting method for audio zero water mark based on vector quantization of coefficient of mixed domain
CN101345054B (en) Digital watermark production and recognition method used for audio document
CN101458810B (en) Vector map watermark method based on object property characteristic
CN102867513B (en) Pseudo-Zernike moment based voice content authentication method
Wang et al. Centroid-based semi-fragile audio watermarking in hybrid domain
Liu et al. A novel speech content authentication algorithm based on Bessel–Fourier moments
CN100596061C (en) Method for watermarking small wave threshold digital audio multiple mesh based on blind source separation
CN103208288A (en) Dual encryption based discrete wavelet transform-discrete cosine transform (DWT-DCT) domain audio public watermarking algorithm
CN107993669B (en) Voice content authentication and tampering recovery method based on modification of least significant digit weight
CN105632506A (en) Robust digital audio watermark embedding and detection method based on polar harmonic transform
CN103456308B (en) A kind of recoverable ciphertext domain voice content authentication method
CN103050120B (en) high-capacity digital audio reversible watermark processing method
CN102324234A (en) Audio watermarking method based on MP3 encoding principle
CN106780281A (en) Digital image watermarking method based on Cauchy's statistical modeling
CN105304091A (en) Voice tamper recovery method based on DCT
CN101609675B (en) Fragile audio frequency watermark method based on mass center
CN108877819B (en) Voice content evidence obtaining method based on coefficient autocorrelation
Dutta et al. Generation of biometric based unique digital watermark from iris image
Chen et al. Multipurpose audio watermarking algorithm
Khanduja et al. A scheme for robust biometric watermarking in web databases for ownership proof with identification
CN103745725B An audio watermark embedding method based on the constant-Q transform
CN107910010A (en) Digital watermark detection method based on multi-parameter Weibull statistical modelings
Yang et al. A novel dual watermarking algorithm for digital audio
Yue et al. Rights protection for trajectory streams
Chen et al. A new fragile audio watermarking scheme

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140219

Termination date: 20160807