CN110163787B - Audio digital robust blind watermark embedding method based on dual-tree complex wavelet transform


Info

Publication number
CN110163787B
Authority
CN
China
Prior art keywords
watermark
information
embedding
audio
signal
Prior art date
Legal status
Active
Application number
CN201910343271.XA
Other languages
Chinese (zh)
Other versions
CN110163787A (en)
Inventor
钱振兴
周立波
钱阳
景旭
Current Assignee
Jiangsu Watermark Technology Co ltd
Original Assignee
Jiangsu Watermark Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Watermark Technology Co ltd
Priority to CN201910343271.XA
Publication of CN110163787A
Application granted
Publication of CN110163787B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G06T 1/0021: Image watermarking
    • G06T 1/005: Robust watermarking, e.g. average attack or collusion attack resistant
    • G06T 2201/00: General purpose image data processing
    • G06T 2201/005: Image watermarking
    • G06T 2201/0052: Embedding of the watermark in the frequency domain
    • G06T 2201/0061: Embedding of the watermark in each block of the image, e.g. segmented watermarking
    • G06T 2201/0065: Extraction of an embedded watermark; reliable detection

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses an audio digital robust blind watermark embedding method based on the dual-tree complex wavelet transform (DTCWT). The original code stream of the audio signal serves as the embedding object of the watermark information, and the extremely low correlation between a pseudo-random sequence and a natural audio signal fully ensures that the embedded information can be extracted correctly. The watermark information is embedded cyclically throughout the whole audio signal, which greatly improves extraction accuracy and gives the algorithm robustness against cropping; the watermarked audio is audibly indistinguishable from the original, so the method is an auditorily imperceptible blind watermark embedding method. The watermark is embedded in the dual-tree complex wavelet transform domain of the carrier using a block-wise embedding method, which increases the embedding capacity and resists common attacks such as lossy compression and scaling; the method therefore has great significance for the copyright protection of audio and images.

Description

Audio digital robust blind watermark embedding method based on dual-tree complex wavelet transform
Technical Field
The invention relates to digital watermark embedding, in particular to an audio digital robust blind watermark embedding method based on the dual-tree complex wavelet transform, and belongs to the technical field of image, audio, and video processing.
Background
With the rapid development of mobile networks in recent years, multimedia products, including digital images and digital audio, have become ever easier and faster to distribute. Copying multimedia content has also become simpler and piracy easier, seriously harming the legal rights of creators. Copyright protection of multimedia is therefore an increasingly serious issue, and audio digital watermarking, as a feasible solution, has become one of the hot spots of research.
An audio digital watermark is a secret signal embedded in an audio signal that lets an audio author establish copyright authentication for his or her works and mark ownership; it is a reliable technical means for protecting the copyright of multimedia content. Audio watermarking technology has developed alongside audio signal coding technology. Early methods placed an audible logo directly in a region of the audio; this is convenient, but the watermark noticeably degrades the quality of the carrier audio and is easy to crop out. More recently, a series of robust, imperceptible audio watermarking algorithms have been proposed; they are typically either strongly robust or strongly imperceptible, but since sound quality and robustness are in tension when a watermark signal is embedded, the two are rarely well balanced.
In view of the above, a new embedding method is needed that balances the robustness of the watermark against the sound quality of the carrier audio.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide an audio digital robust blind watermark embedding method based on the dual-tree complex wavelet transform. The algorithm embeds a segment of imperceptible digital information dispersed in the frequency domain of each audio frame, so that it effectively resists various common audio attacks while preserving the quality of the watermarked audio, balancing watermark robustness against carrier sound quality.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The original code stream of the audio signal is used as the embedding object of the watermark information; the design of the algorithm ensures that the system's robustness against attack is sufficiently high while the sound quality of the carrier signal is effectively preserved.
First, the sample information of the carrier audio is read and all samples are partitioned into frames. The method exploits the extremely low correlation between a pseudo-random sequence and a natural audio signal, which fully ensures that the embedded information can be extracted correctly. The watermark is formed by concatenating pseudo-random sequences generated from 16 different keys; the basic watermarks are numbered 0-15 and correspond to the hexadecimal symbols 0x0-0xF, so each pseudo-random sequence carries 4 bits of information. For each frame, one pseudo-random sequence is embedded into the carrier audio, yielding a long watermark and hence a large embedding capacity. The sequence to be embedded is converted to the frequency domain with a 2-level dual-tree complex wavelet transform (DTCWT) and then embedded into the carrier audio. To ensure that the watermark information can be extracted correctly, the method adopts a repeated embedding rule: each symbol is embedded redundantly over several consecutive frames, and the watermark information is embedded cyclically over the entire audio signal. This greatly improves extraction accuracy and gives the algorithm robustness against cropping.
After the watermark information is embedded, a time-domain signal for frame synchronization is embedded and the audio is stored as the watermarked signal. The watermarked audio is audibly indistinguishable from the original, so the method is an auditorily imperceptible blind watermark embedding method. At extraction, a cross-correlation between the received signal and the synchronization time-domain signal is first computed to locate the synchronization positions and achieve frame synchronization; the audio is then partitioned into frames, a 2-level DTCWT is applied to each frame to extract the mid-frequency sub-band, and the correlation between this sub-band and the regenerated pseudo-random sequences is computed to recover the embedded watermark sequence.
The specific process of watermark embedding is as follows:
(1) Audio framing and sequence mapping
First, the information embedder divides the carrier audio into frames of 1024 consecutive time-domain samples each; any trailing samples short of a full frame are excluded from embedding. Then 16 different pseudo-random seeds are selected to generate 16 different pseudo-random sequences whose elements lie in [-1, 1], consistent with the amplitude range of the audio samples. In this algorithm, embedding a different sequence represents embedding a different hexadecimal watermark symbol: the 16 pseudo-random sequences serve as mapping sequences, numbered 0-15 and corresponding to the hexadecimal symbols 0x0-0xF, so embedding one pseudo-random sequence embeds 4 bits of binary watermark information. The information embedder thus maps the watermark information to the sequences to be embedded. In addition, a 17th pseudo-random seed is selected by the embedder to generate the audio synchronization signal, introduced in step (3).
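The framing and hex-to-sequence mapping just described can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the seed values, the use of the standard-library `random` generator, and the function names are all assumptions; the text only requires 16 distinct secret seeds producing sequences with elements in [-1, 1].

```python
import random

FRAME_LEN = 1024          # time-domain samples per frame (from the description)
NUM_SYMBOLS = 16          # one PN sequence per hex symbol 0x0..0xF

def make_pn_sequences(base_seed=1234, length=FRAME_LEN):
    """Generate 16 pseudo-random sequences with elements in [-1, 1].

    The seeds (base_seed + i) are illustrative; the method only requires
    16 distinct secret seeds shared between embedder and extractor.
    """
    seqs = []
    for i in range(NUM_SYMBOLS):
        rng = random.Random(base_seed + i)
        seqs.append([rng.uniform(-1.0, 1.0) for _ in range(length)])
    return seqs

def frame_audio(samples, frame_len=FRAME_LEN):
    """Split carrier samples into whole frames; a trailing partial frame is skipped."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def map_watermark(hex_message, seqs):
    """Map each hex digit (4 bits) of the watermark to its PN mapping sequence."""
    return [seqs[int(ch, 16)] for ch in hex_message]

# Example: 2.5 frames of carrier -> exactly 2 embeddable frames
audio = [0.0] * (FRAME_LEN * 2 + 512)
frames = frame_audio(audio)
seqs = make_pn_sequences()
payload = map_watermark("C0FFEE", seqs)
```

Here the hypothetical message "C0FFEE" maps to 6 sequences, the first being mapping sequence number 12 (0xC).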
(2) Mapping sequence embedding
The information embedder embeds the watermark information by superposing the DTCWT mid-frequency signal of the sequence to be embedded onto the original audio signal. For each frame, the embedder applies a 2-level dual-tree complex wavelet transform (2-level DTCWT) to both the sequence to be embedded and the original sequence, converting each signal from the time domain into low-, mid-, and high-frequency sub-bands. To keep the influence of embedding on the original signal as small as possible while retaining sufficient robustness, the mid-frequency sub-band is selected as the target region for embedding. Let A, X, and W denote the second-level DTCWT mid-frequency coefficients of the original signal, the sequence to be embedded, and the output sequence, respectively; the embedding of the mapping sequence then satisfies the following rule:
W = A + αX
α = E(X)/E(A) = 10*log(SNR)    (a)
In formula (a), α is the embedding strength of the signal, E(·) is the energy of a signal, and SNR is the signal-to-noise ratio of the watermarked signal. Experiments show that when the SNR exceeds 60 dB, the audible effect of the embedded watermark information on the original signal is negligible, so the imperceptibility of the watermark is satisfied while correct extraction remains possible. After this linear superposition in the frequency domain, an inverse DTCWT is applied to W to obtain the output audio signal containing the watermark information.
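The superposition rule and the role of the embedding strength can be illustrated with the sketch below. How formula (a) fixes α is read here as "scale X so that its energy sits a target number of dB below the energy of A", which is one plausible interpretation, not a definitive one; the plain lists stand in for the 2-level DTCWT mid-band coefficients, and all names are illustrative.

```python
import math

def energy(x):
    """Signal energy E(.) as the sum of squared samples."""
    return sum(v * v for v in x)

def alpha_for_snr(A, X, snr_db=60.0):
    """Choose the embedding strength so the watermark sits snr_db below the carrier.

    One plausible reading of rule (a): SNR = 10*log10(E(A)/E(alpha*X)),
    solved here for alpha. The 60 dB default follows the text's claim that
    SNR above 60 dB keeps the watermark inaudible.
    """
    return math.sqrt(energy(A) / (energy(X) * 10 ** (snr_db / 10.0)))

def embed_midband(A, X, alpha):
    """W = A + alpha*X on the (stand-in) mid-frequency coefficients.

    A real implementation would apply this to the 2-level DTCWT mid-band
    of each frame; plain lists stand in for those coefficients here.
    """
    return [a + alpha * x for a, x in zip(A, X)]

A = [math.sin(0.01 * n) for n in range(1024)]            # toy carrier coefficients
X = [1.0 if n % 2 == 0 else -1.0 for n in range(1024)]   # toy PN sequence
alpha = alpha_for_snr(A, X, snr_db=60.0)
W = embed_midband(A, X, alpha)
```

With this choice, 10*log10(E(A)/E(αX)) evaluates back to the 60 dB target.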
To ensure the robustness of the embedded information, the system specifies that each codeword is embedded 4 times in succession, i.e., the codewords embedded in 4 consecutive frames are identical; the overall structure is shown in Fig. 3. Even if the extracting end occasionally makes a codeword extraction error, the system can extract the codewords of the 4 consecutive frames and take the codeword with the highest statistical frequency as the correctly extracted one. At the same time, the watermark information is embedded cyclically over the entire carrier audio signal, so even if the original audio is cropped, the information extractor can with high probability still recover at least one complete copy of the embedded information from the remaining audio.
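The four-fold repetition and the cyclic looping of the message over the carrier frames amount to a simple frame-to-symbol schedule, sketched below; the function name and message format are illustrative assumptions.

```python
REPEAT = 4  # each codeword occupies 4 consecutive frames (from the description)

def symbol_schedule(message, n_frames, repeat=REPEAT):
    """Assign one watermark symbol to every carrier frame.

    Each symbol fills `repeat` consecutive frames, and the whole message
    loops until the carrier runs out; the looping is what gives the method
    its robustness against cropping.
    """
    return [message[(f // repeat) % len(message)] for f in range(n_frames)]

# Hypothetical 2-symbol message over 10 frames:
# frames 0-3 carry 'A', frames 4-7 carry 'B', frames 8-9 wrap back to 'A'
sched = symbol_schedule("AB", n_frames=10)
```

Cropping any contiguous run shorter than one full cycle still leaves at least one complete copy of the message in the remaining frames.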
(3) Synchronization information embedding
After the watermarked audio is cropped, the user must accurately locate the starting position of each frame of the audio signal to achieve signal synchronization and thereby guarantee extraction accuracy; the receiver must also know where each complete copy of the watermark information begins. Therefore, after each complete watermark embedding, the information embedder superposes a pseudo-random signal in the time domain onto the next frame (the synchronization frame). The superposition rule is essentially the same as formula (a); the difference is that the superposition is performed in the time domain, i.e., over the full frequency band, to ensure accurate location of the frame boundaries when the synchronization information is extracted.
The specific process of watermark extraction is as follows:
(1) Pseudo-random sequence generation
The information extractor must share the same pseudo-random seeds as the information embedder so as to generate the same pseudo-random sequences. A 2-level DTCWT is applied to each information-hiding pseudo-random sequence, the mid-frequency sub-band is retained, the high- and low-frequency sub-bands are zeroed, and the inverse DTCWT yields the mid-frequency-only signal, abbreviated P_M.
The pseudo-random seeds may be shared by prior agreement or transmitted over a more covert communication channel. Alternatively, the seeds can serve as a key held by the information embedder and provided only to legitimate information receivers, achieving better information encryption.
(2) Signal synchronization
After receiving the audio signal, the information extractor first performs a time-domain correlation between the synchronization sequence and the entire audio signal and finds the correlation peak, denoted C_p. For the remaining sample points, any point whose correlation is greater than or equal to T·C_p is taken as the starting position of a synchronization frame. T is a decision threshold, set to 0.8 by default: a higher threshold better rejects synchronization interference but raises the probability of missing a synchronization point, while a lower threshold raises the probability of false synchronization.
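A minimal sketch of this synchronization search, using plain normalized cross-correlation with the default threshold T = 0.8 described above; the sync length, noise level, and seed are illustrative assumptions rather than values from the text.

```python
import random

def norm_corr(a, b):
    """Normalized cross-correlation of two equal-length windows, in [-1, 1]."""
    ea = sum(x * x for x in a) ** 0.5
    eb = sum(x * x for x in b) ** 0.5
    if ea == 0 or eb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (ea * eb)

def find_sync_positions(signal, sync, T=0.8):
    """Slide the shared sync sequence over the signal.

    Keep every offset whose correlation reaches T times the global peak C_p,
    mirroring the decision rule with threshold T from the description.
    """
    corrs = [norm_corr(signal[i:i + len(sync)], sync)
             for i in range(len(signal) - len(sync) + 1)]
    peak = max(corrs)  # C_p
    return [i for i, c in enumerate(corrs) if c >= T * peak]

rng = random.Random(7)
sync = [rng.uniform(-1, 1) for _ in range(64)]           # shared sync sequence
carrier = [0.05 * rng.uniform(-1, 1) for _ in range(400)]  # weak background noise
for k, v in enumerate(sync):                              # superpose sync at offset 100
    carrier[100 + k] += v
hits = find_sync_positions(carrier, sync)
```

Because a pseudo-random sequence correlates strongly only with an exactly aligned copy of itself, the detected offsets cluster at the true sync position.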
(3) Correlation calculation
After signal synchronization, the receiver applies a 2-level DTCWT to each frame of the signal, retains the mid-frequency sub-band, zeroes the high- and low-frequency sub-bands, and applies the inverse DTCWT to obtain the mid-frequency-only signal, abbreviated W_M. The receiver then computes the correlation between W_M and the elements of each P_M and takes the pseudo-random sequence with the peak correlation; the number of its seed is the watermark information (M) extracted from the current frame:
M = argmax_i cov(W_M, P_M^(i)),  i ∈ {0, 1, …, 15}    (b)
In the formula (b), cov (x, y) represents the correlation between x and y.
(4) Statistical average of results
The symbol with the highest frequency of occurrence among the watermark symbols extracted from 4 consecutive frames is taken, correctly recovering the embedded watermark information. If the extractor finds more than one synchronization code in the audio, then for each symbol the value extracted most frequently across the different cycles can be taken as the result, further improving the information extraction accuracy.
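Extraction per rule (b) followed by the per-group majority vote can be sketched as follows. The plain inner-product correlation and the toy inputs are illustrative assumptions; in the real method `midband` would be the DTCWT mid-frequency signal W_M and `pn_seqs` the 16 mid-frequency-only sequences P_M.

```python
from collections import Counter

def detect_symbol(midband, pn_seqs):
    """Pick the PN sequence index whose correlation with the frame is largest,
    mirroring rule (b)."""
    def corr(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [corr(midband, s) for s in pn_seqs]
    return max(range(len(scores)), key=scores.__getitem__)

def vote(symbols_per_frame, repeat=4):
    """Majority-vote each group of `repeat` consecutive frame decisions,
    so an occasional per-frame error does not corrupt the message."""
    out = []
    for i in range(0, len(symbols_per_frame), repeat):
        group = symbols_per_frame[i:i + repeat]
        out.append(Counter(group).most_common(1)[0][0])
    return out

# One frame decision is wrong (5 instead of 3); the vote still recovers [3, 9].
decoded = vote([3, 3, 5, 3, 9, 9, 9, 9])
```

The same `Counter`-based vote extends naturally across multiple message cycles when more than one synchronization code is found.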
The invention has the advantages that:
(1) The method uses the original code stream of the audio signal as the embedding object of the watermark information; the algorithm design based on the dual-tree complex wavelet transform ensures that the system's robustness against attack is sufficiently high while the sound quality of the carrier signal is effectively preserved;
(2) The extremely low correlation between a pseudo-random sequence and a natural audio signal fully ensures that the embedded information can be extracted correctly; moreover, one pseudo-random sequence is embedded into each frame of the carrier audio, yielding a long watermark and hence a large embedding capacity;
(3) The watermark information is embedded cyclically over the entire audio signal, which greatly improves extraction accuracy and gives the algorithm robustness against cropping; the watermarked audio is audibly indistinguishable from the original, making this an auditorily imperceptible blind watermark embedding method;
(4) With this embedding method, a certain amount of information, such as a serial number or other copyright identification, can be embedded in audio or in images and video; the watermark is embedded in the dual-tree complex wavelet transform domain of the carrier using a block-wise embedding method, which increases the embedding capacity and resists common attacks such as lossy compression and scaling, and is therefore of great significance for the copyright protection of audio and images.
Drawings
Fig. 1 is a block diagram of a watermark embedding process in the overall algorithm framework of the audio robust blind watermark embedding method of the present invention;
FIG. 2 is a block diagram of the watermark extraction process in the overall algorithm framework of the audio robust blind watermark embedding method of the present invention;
FIG. 3 is a schematic diagram of audio framing and sequence mapping during watermark embedding;
fig. 4 is a schematic diagram of embedding a watermark signal and a synchronization signal in a watermark embedding process.
Detailed Description
The invention is described in detail below with reference to the figures and the embodiments.
The audio digital robust blind watermark embedding method based on the dual-tree complex wavelet transform can be divided into watermark embedding and watermark extraction.
(I) Watermark embedding process, with reference to the block diagram in Fig. 1:
(1) Audio framing and sequence mapping
A block diagram of audio framing and sequence mapping is shown in Fig. 3. First, the information embedder divides the carrier audio into frames of 1024 consecutive time-domain samples each; any trailing samples short of a full frame are excluded from embedding. Then 16 different pseudo-random seeds are selected to generate 16 different pseudo-random sequences whose elements lie in [-1, 1], consistent with the amplitude range of the audio samples. In this algorithm, embedding a different sequence represents embedding a different hexadecimal watermark symbol: the 16 pseudo-random sequences serve as mapping sequences, numbered 0-15 and corresponding to the hexadecimal symbols 0x0-0xF, so embedding one pseudo-random sequence embeds 4 bits of binary watermark information. The information embedder thus maps the watermark information to the sequences to be embedded. In addition, a 17th pseudo-random seed is selected by the embedder to generate the audio synchronization signal, introduced in step (3).
(2) Mapping sequence embedding
The information embedder embeds the watermark information by superposing the DTCWT mid-frequency signal of the sequence to be embedded onto the original audio signal. For each frame, the embedder applies a 2-level dual-tree complex wavelet transform (2-level DTCWT) to both the sequence to be embedded and the original sequence, converting each signal from the time domain into low-, mid-, and high-frequency sub-bands. To keep the influence of embedding on the original signal as small as possible while retaining sufficient robustness, the mid-frequency sub-band is selected as the target region for embedding. Let A, X, and W denote the second-level DTCWT mid-frequency coefficients of the original signal, the sequence to be embedded, and the output sequence, respectively; the embedding of the mapping sequence then satisfies the following rule:
W=A+αX
α=E(X)/E(A)=10*log(SNR) (a)
In formula (a), α is the embedding strength of the signal, E(·) is the energy of a signal, and SNR is the signal-to-noise ratio of the watermarked signal. Experiments show that when the SNR exceeds 60 dB, the audible effect of the embedded watermark information on the original signal is negligible, so the imperceptibility of the watermark is satisfied while correct extraction remains possible. After this linear superposition in the frequency domain, an inverse DTCWT is applied to W to obtain the output audio signal containing the watermark information.
To ensure the robustness of the embedded information, the system specifies that each codeword is embedded 4 times in succession, i.e., the codewords embedded in 4 consecutive frames are identical; the overall structure is shown in Fig. 4. Even if the extracting end occasionally makes a codeword extraction error, the system can extract the codewords of the 4 consecutive frames and take the codeword with the highest statistical frequency as the correctly extracted one. At the same time, the watermark information is embedded cyclically over the entire carrier audio signal, so even if the original audio is cropped, the information extractor can with high probability still recover at least one complete copy of the embedded information from the remaining audio.
(3) Synchronization information embedding
After the watermarked audio is cropped, the user must accurately locate the starting position of each frame of the audio signal to achieve signal synchronization and thereby guarantee extraction accuracy; the receiver must also know where each complete copy of the watermark information begins. Therefore, after each complete watermark embedding, the information embedder superposes a pseudo-random signal in the time domain onto the next frame (the synchronization frame). The superposition rule is essentially the same as formula (a); the difference is that the superposition is performed in the time domain, i.e., over the full frequency band, to ensure accurate location of the frame boundaries when the synchronization information is extracted.
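The time-domain, full-band superposition of the synchronization frame can be sketched as below. The seed, strength, and zero-valued host frame are illustrative stand-ins: the description derives the actual strength from rule (a) and uses the 17th shared seed for the sync signal.

```python
import random

FRAME_LEN = 1024

def add_sync_frame(frames, sync_seed=4242, strength=0.02):
    """Append a full-band, time-domain sync frame after the watermarked frames.

    Unlike the watermark payload (embedded in the DTCWT mid-band), the sync
    signal is superposed directly on the time-domain samples, so frame
    boundaries can later be located by plain cross-correlation.
    """
    rng = random.Random(sync_seed)
    sync = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]
    host = [0.0] * FRAME_LEN               # toy samples of the frame after the payload
    marked = [h + strength * s for h, s in zip(host, sync)]
    return frames + [marked], sync

frames, sync = add_sync_frame([[0.0] * FRAME_LEN])
```

The extractor, holding the same 17th seed, regenerates `sync` and correlates it against the received samples to find each synchronization frame.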
(II) Watermark extraction process, with reference to the block diagram in Fig. 2:
(1) Pseudo-random sequence generation
The information extractor must share the same pseudo-random seeds as the information embedder so as to generate the same pseudo-random sequences. A 2-level DTCWT is applied to each information-hiding pseudo-random sequence, the mid-frequency sub-band is retained, the high- and low-frequency sub-bands are zeroed, and the inverse DTCWT yields the mid-frequency-only signal, abbreviated P_M.
The pseudo-random seeds may be shared by prior agreement or transmitted over a more covert communication channel. Alternatively, the seeds can serve as a key held by the information embedder and provided only to legitimate information receivers, achieving better information encryption.
(2) Signal synchronization
After receiving the audio signal, the information extractor first performs a time-domain correlation between the synchronization sequence and the entire audio signal and finds the correlation peak, denoted C_p. For the remaining sample points, any point whose correlation is greater than or equal to T·C_p is taken as the starting position of a synchronization frame. T is a decision threshold, set to 0.8 by default: a higher threshold better rejects synchronization interference but raises the probability of missing a synchronization point, while a lower threshold raises the probability of false synchronization.
(3) Correlation calculation
After signal synchronization, the receiver applies a 2-level DTCWT to each frame of the signal, retains the mid-frequency sub-band, zeroes the high- and low-frequency sub-bands, and applies the inverse DTCWT to obtain the mid-frequency-only signal, abbreviated W_M. The receiver then computes the correlation between W_M and the elements of each P_M and takes the pseudo-random sequence with the peak correlation; the number of its seed is the watermark information (M) extracted from the current frame:
M = argmax_i cov(W_M, P_M^(i)),  i ∈ {0, 1, …, 15}    (b)
In the formula (b), cov (x, y) represents the correlation between x and y.
(4) Statistical average of results
The symbol with the highest frequency of occurrence among the watermark symbols extracted from 4 consecutive frames is taken, correctly recovering the embedded watermark information. If the extractor finds more than one synchronization code in the audio, then for each symbol the value extracted most frequently across the different cycles can be taken as the result, further improving the information extraction accuracy.
In summary, with the embedding method of the present application, a certain amount of information, such as a serial number or other copyright identification, can be embedded in audio or in images and video. The watermark is embedded in the dual-tree complex wavelet transform domain of the carrier using a block-wise embedding method, which increases the embedding capacity and resists common attacks such as lossy compression and scaling; the method is therefore of great significance for the copyright protection of audio and images.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It should be understood by those skilled in the art that the above embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the scope of the present invention.

Claims (10)

1. An audio digital robust blind watermark embedding method based on the dual-tree complex wavelet transform, characterized by comprising the following steps:
(I) Watermark embedding:
(1) The carrier audio is divided into a plurality of frames, each frame comprising 1024 consecutive time-domain samples; trailing samples short of one frame are excluded from the embedding range;
(2) 16 different pseudo-random seeds are selected to generate 16 different pseudo-random sequences, and a 17th pseudo-random seed is selected to generate the audio synchronization signal;
(3) The DTCWT mid-frequency signal of the sequence to be embedded is superposed onto the original audio signal to embed the watermark information; the signals are superposed as in formula (a), and an inverse DTCWT is applied to W to obtain the output audio signal containing the watermark information:
assuming that the second-level DTCWT mid-frequency coefficients of the original signal, the sequence to be embedded, and the output sequence are A, X, and W, respectively, the embedding of the mapping sequence satisfies the following rule:
W=A+αX
α=E(X)/E(A)=10*log(SNR) (a)
in formula (a), α is the embedding strength of the signal, E(·) is the energy of a signal, and SNR is the signal-to-noise ratio of the watermarked signal;
(4) Each codeword is embedded 4 times in succession, i.e., the codewords embedded in 4 consecutive frames are identical;
(5) After the information embedding is finished, a pseudo-random signal is superposed in the time domain onto the next frame; the superposition rule follows formula (a), and the superposition is performed in the time domain, i.e., over the full frequency band;
(II) watermark extraction:
(1) The information extractor shares the same pseudo-random seeds as the information embedder; a 2-level DTCWT is applied to each information-hiding pseudo-random sequence, the mid-frequency sub-band is retained, the high- and low-frequency sub-bands are zeroed, and the inverse DTCWT yields the mid-frequency-only signal, abbreviated P_M;
(2) A time-domain correlation is computed between the synchronization sequence and the entire audio signal, and the correlation peak, denoted C_p, is found; for the remaining sample points, if the correlation at a point is greater than or equal to T·C_p, that point is taken as the starting position of a synchronization frame, where T is the decision threshold;
(3) After signal synchronization, the receiver applies a 2-level DTCWT to each frame of the signal, retains the mid-frequency sub-band, zeroes the high- and low-frequency sub-bands, and applies the inverse DTCWT to obtain the mid-frequency-only signal W_M; the receiver then computes the correlation between W_M and the elements of each P_M and takes the pseudo-random sequence with the peak correlation, the number of whose seed is the watermark information M extracted from the current frame;
(4) The symbol occurring most frequently among the watermark information extracted from the 4 consecutive frames is taken as the embedded watermark information.
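The 4-frame majority vote of step (4) can be sketched as follows (the helper name is illustrative):

```python
from collections import Counter

def vote_symbol(frame_symbols):
    """Return the symbol extracted most often across the repeated frames.

    Step (4): each code word is embedded in 4 consecutive frames, so the
    most frequent per-frame decision wins.
    """
    return Counter(frame_symbols).most_common(1)[0][0]
```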
2. The method for embedding an audio digital robust blind watermark based on dual-tree complex wavelet transform as claimed in claim 1, wherein the method employs a "dual repetition" embedding rule: each symbol is repeatedly embedded in multi-frame content, and the watermark information is embedded in an infinite loop over the whole audio signal.
3. The method of claim 1, wherein the original code stream of the audio signal is used as the embedding object of the watermark information.
4. The method for embedding an audio digital robust blind watermark based on dual-tree complex wavelet transform as claimed in claim 1, wherein 16 pseudo-random sequences are generated from 16 different keys; these basic watermarks are numbered 0-15, corresponding to the hexadecimal values 0x0-0xF, so that each pseudo-random sequence represents 4 bits of information.
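An illustrative construction of such a 16-sequence codebook; the key derivation scheme and the ±1 sequence alphabet are assumptions not stated in the claims:

```python
import random

def make_codebook(master_key, length):
    """Generate 16 pseudo-random sequences, one per 4-bit symbol 0x0-0xF.

    The claim only states that 16 different keys produce 16 sequences
    numbered 0-15; deriving each key as master_key*16 + symbol is an
    assumption for this sketch.
    """
    book = {}
    for symbol in range(16):
        rng = random.Random(master_key * 16 + symbol)  # a distinct key per symbol
        book[symbol] = [rng.choice((-1.0, 1.0)) for _ in range(length)]
    return book
```

With this layout, a byte of payload is carried by two consecutive symbols (high nibble, then low nibble).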
5. The method for embedding an audio digital robust blind watermark based on dual-tree complex wavelet transform as claimed in claim 1, wherein in step (3) of (I) watermark embedding, for each frame, the embedder performs a 2-level dual-tree complex wavelet transform on the embedded sequence and the original sequence, decomposing the time-domain signal into low-frequency, intermediate-frequency and high-frequency sub-bands, and selects the intermediate-frequency sub-band as the target region for embedding information.
6. The method of claim 1, wherein the SNR is greater than 60 dB.
7. The method for embedding an audio digital robust blind watermark based on dual-tree complex wavelet transform as claimed in claim 1, wherein in step (1) of (II) watermark extraction, the pseudo-random seed is shared through a predetermined agreement or over a more secret communication channel, or is held by the information embedder as a key and provided to legitimate information receivers, thereby achieving better information encryption.
8. The dual-tree complex wavelet transform-based audio digital robust blind watermark embedding method as claimed in claim 1, wherein in step (2) of (II) watermark extraction, the decision threshold T defaults to 0.8.
9. The method for embedding an audio digital robust blind watermark based on dual-tree complex wavelet transform as claimed in claim 1, wherein in step (3) of (II) watermark extraction, the watermark information M is given by:

M = argmax_{i ∈ {0, 1, …, 15}} Cov(W_M, P_i)

where Cov(x, y) denotes the correlation of x and y, and P_i is the intermediate-frequency pseudo-random sequence generated from seed number i.
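A sketch of this correlation-based decision, treating Cov as a zero-mean correlation (an assumption; the claim names Cov without defining it) and taking the codebook as a mapping from symbol numbers to the intermediate-frequency sequences P_i:

```python
def extract_symbol(w_m, codebook):
    """Recover M as the codebook index whose sequence correlates best with W_M.

    codebook maps symbol numbers 0-15 to the mid-frequency pseudo-random
    sequences; names are illustrative.
    """
    def cov(x, y):
        # zero-mean correlation of equal-length sequences
        mx, my = sum(x) / len(x), sum(y) / len(y)
        return sum((a - mx) * (b - my) for a, b in zip(x, y))
    return max(codebook, key=lambda m: cov(w_m, codebook[m]))
```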
10. The method for embedding an audio digital robust blind watermark based on dual-tree complex wavelet transform as claimed in claim 1, wherein in step (4) of (II) watermark extraction, if the extractor finds more than one synchronization code in the audio, then for each symbol the result extracted most frequently across the different cycles is taken as the output, further improving the accuracy of information extraction.
CN201910343271.XA 2019-04-26 2019-04-26 Audio digital robust blind watermark embedding method based on dual-tree complex wavelet transform Active CN110163787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910343271.XA CN110163787B (en) 2019-04-26 2019-04-26 Audio digital robust blind watermark embedding method based on dual-tree complex wavelet transform


Publications (2)

Publication Number Publication Date
CN110163787A CN110163787A (en) 2019-08-23
CN110163787B true CN110163787B (en) 2023-02-28

Family

ID=67640006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910343271.XA Active CN110163787B (en) 2019-04-26 2019-04-26 Audio digital robust blind watermark embedding method based on dual-tree complex wavelet transform

Country Status (1)

Country Link
CN (1) CN110163787B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168908A (en) * 2020-09-11 2022-03-11 四川大学 Copyright protection technology based on audio and video analysis
CN112927700B (en) * 2021-02-06 2024-03-19 兰州理工大学 Blind audio watermark embedding and extracting method and system
CN113470666B (en) * 2021-06-21 2023-05-16 南京信息工程大学 Reversible robust medical audio method based on two-stage embedding
CN115602179B (en) * 2022-11-28 2023-03-24 腾讯科技(深圳)有限公司 Audio watermark processing method and device, computer equipment and storage medium

Citations (3)

CN101039371A (en) * 2006-03-18 2007-09-19 辽宁师范大学 Novel method of digital watermarking for protecting literary property of presswork
CN102880997A (en) * 2011-05-26 2013-01-16 江苏技术师范学院 Method for embedding watermark image
CN106504757A (en) * 2016-11-09 2017-03-15 天津大学 A kind of adaptive audio blind watermark method based on auditory model


Non-Patent Citations (1)

Title
Video steganography algorithm combining visual perception characteristics and wet paper coding; Zhang Minqing et al.; Application Research of Computers (《计算机应用研究》); June 2012; Vol. 29, No. 6; pp. 2228-2231 *


Similar Documents

Publication Publication Date Title
CN110163787B (en) Audio digital robust blind watermark embedding method based on dual-tree complex wavelet transform
Nosrati et al. Audio steganography: a survey on recent approaches
Wang et al. A robust digital audio watermarking based on statistics characteristics
Dhar et al. A new audio watermarking system using discrete fourier transform for copyright protection
CN100481867C (en) A digital watermark embedding and extraction algorithm for space domain and small wave domain double sub-set
Dhar et al. Digital watermarking scheme based on fast Fourier transformation for audio copyright protection
CN105632506A (en) Robust digital audio watermark embedding and detection method based on polar harmonic transform
Bazyar et al. A new method to increase the capacity of audio steganography based on the LSB algorithm
KR20030073369A (en) A Real-time Blind Video Watermarking Method using Quantization
Bazyar et al. A recent review of MP3 based steganography methods
EP2777247B1 (en) Digital communications
Mingguang et al. A wav-audio steganography algorithm based on amplitude modifying
CN104217725A (en) Audio watermarking method based on multi-echo core
Meligy et al. A hybrid technique for enhancing the efficiency of audio steganography
Weina Digital audio blind watermarking algorithm based on audio characteristic and scrambling encryption
Pooyan et al. Adaptive and robust audio watermarking in wavelet domain
CN110047495B (en) High-capacity audio watermarking algorithm based on 2-level singular value decomposition
Dhavale et al. Walsh Hadamard transform based robust blind watermarking for digital audio copyright protection
Dhavale et al. Lossless audio watermarking based on the alpha statistic modulation
Liu et al. Image information hiding encryption using chaotic sequence
Dhavale et al. High capacity lossless semi-fragile audio watermarking in the time domain
Petrovic Digital watermarks for audio integrity verification
Wang et al. An audio watermarking scheme with neural network
Hu et al. A perceptually adaptive and retrievable QIM scheme for efficient blind audio watermarking
Huang et al. A dither modulation audio watermarking algorithm based on HAS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210000 Jiangdong North Road, Gulou District, Nanjing, Jiangsu 101

Applicant after: Jiangsu Watermark Technology Co.,Ltd.

Address before: 210000 Jiangdong North Road, Gulou District, Nanjing, Jiangsu 101

Applicant before: JIANGSU XINSHI CLOUD SAFETY TECHNOLOGY Co.,Ltd.

GR01 Patent grant