CN108766449B

CN108766449B - Reversible watermark realization method for audio signal

Info

Publication number: CN108766449B
Application number: CN201810542591.3A
Authority: CN
Inventors: 张卫明; 俞能海; 吴媛欣; 姚远志
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2018-05-30
Filing date: 2018-05-30
Publication date: 2020-10-27
Anticipated expiration: 2038-05-30
Also published as: CN108766449A

Abstract

The invention discloses a reversible watermarking method for an audio signal, comprising the following steps: predicting the values of the left channel signal points with a first prediction model to obtain a residual; using the residual to divide the right channel signal into regions of different smoothness, and selecting the optimal right channel smooth region with an optimization algorithm; and predicting the values of all points in the optimal right channel smooth region with a second prediction model, obtaining a residual, and embedding, extracting and recovering watermark information with a spread spectrum method. The method increases the embedding capacity of the audio while greatly reducing the embedding distortion, restores the original audio losslessly after the watermark is extracted, and can be used to verify that an audio signal is authentic and undamaged.

Description

Reversible watermark realization method for audio signal

Technical Field

The invention relates to an information hiding technology, in particular to a reversible watermark realization method of an audio signal.

Background

Conventional encryption methods cannot prevent multimedia content from being redistributed and stolen, and cannot by themselves support copyright protection and integrity authentication. Digital watermarking supplements and extends encryption, and has developed rapidly in copyright protection and integrity authentication. Digital watermarking embeds signals (images, text, signatures, symbols with special meanings, etc.) as marks in digital media works for purposes such as copyright protection, ownership authentication, and integrity protection, and is an important research direction in information hiding.

Audio digital watermarking embeds watermark information into digital audio. Compared with traditional audio protection methods, first, the watermark cannot be removed easily, because arbitrary modification or stripping degrades the usability and quality of the audio; second, the audio digital watermark exploits the correlation within the audio itself, which reduces computational complexity; more importantly, the audio watermark is imperceptible: superimposing the watermark on the audio does not affect what the human ear perceives.

According to their characteristics, audio watermarks can be divided into robust watermarks and fragile watermarks. A robust watermark can resist malicious attacks (compression, digital-to-analog conversion, time delay, etc.) to a certain degree; its design improves imperceptibility while ensuring robustness, seeking the best balance between the two, so that the watermark can protect audio copyright even in harsh environments. A fragile watermark is not robust and changes whenever the audio content is changed; it facilitates tamper detection and tamper localization, and aims to protect the integrity of the audio content.

However, audio watermarking causes a certain amount of damage to the carrier information: the receiving party can only extract the embedded watermark information and cannot completely recover the carrier signal, which is a limitation in some practical applications. In business scenarios with high requirements on sound quality, even slight modifications to the carrier audio affect the music quality, so the demand for lossless restoration of the original carrier is urgent in some fields.

Reversible watermarking solves this problem. Reversible information hiding allows the original carrier audio to be restored losslessly while the watermark information is extracted in full. Reversible information hiding in speech suits many application scenarios, such as forensic evidence collection, criminal investigation, military intelligence, and high-quality music, where degraded voice quality, blurred key points, or deleted segments can have serious consequences. In addition, reversible information hiding can authenticate the authenticity and losslessness of speech. However, current reversible audio watermarking schemes cause relatively large audio distortion at a given embedding rate, and reducing the audio distortion at the same embedding rate is the technical problem to be solved.

Disclosure of Invention

The invention aims to provide a reversible watermarking method for an audio signal, which can be used to authenticate the authenticity and losslessness of the audio signal.

The purpose of the invention is realized by the following technical scheme:

A method of reversible watermarking of an audio signal, comprising:

predicting the values of the left channel signal points with a first prediction model to obtain a residual;

using the residual to distinguish different smooth regions of the right channel signal, and selecting the optimal combination of right channel smooth regions with an optimization algorithm;

and predicting the values of all points in the optimal right channel smooth-region combination with a second prediction model, obtaining a residual, and embedding, extracting and recovering watermark information with a spread spectrum method.

The technical scheme provided by the invention can improve the audio embedding capacity, greatly reduce the audio embedding distortion, and recover the original audio without loss after extracting the watermark.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a reversible watermarking method for an audio signal according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an audio left channel signal according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of an audio right channel signal according to an embodiment of the present invention;

Fig. 4 is a graph comparing the average distortion over the audio library according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a method for realizing reversible watermark of an audio signal, which mainly comprises the following steps as shown in figure 1:

1. The values of the left channel signal points are predicted using a first prediction model, thereby obtaining a residual.

Fig. 2 is a schematic diagram of an audio left channel signal. The value $x_i^L$ of the $i$-th signal point of the left channel is locally correlated with its context, so it can be predicted from the values of the locally adjacent signal points $x_{i-k}^L, \dots, x_{i-1}^L, x_{i+1}^L, \dots, x_{i+k}^L$, where $k < i < L' - k + 1$, by the following formula:

$$\tilde{x}_i^L = \sum_{p=-k,\ p \neq 0}^{k} v_p \, x_{i+p}^L$$

In the above formula, $\tilde{x}_i^L$ is the predicted value of $x_i^L$; $x$ is the value of a signal point, its subscript is the serial number of the signal point, the superscript $L$ indicates that the signal point belongs to the left channel, and $L'$ is the number of signal points of the left channel; $v_p$ denotes the prediction coefficients.

The predicted value $\tilde{x}_i^L$ is subtracted from $x_i^L$ to obtain the residual:

$$x_i^L - \tilde{x}_i^L$$
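As a rough illustration of step 1, the sketch below predicts each interior left channel sample from its k neighbours on either side and forms the residual as the sample minus its prediction. The function names, the 0-based indexing, and the equal-weight coefficients are assumptions made for this example only; the patent obtains the coefficients by least squares regression.

```python
def predict_left(x, v, k=3):
    """Predict the interior samples of a left-channel sequence x.

    v maps each offset p (-k..k, excluding 0) to a coefficient v_p.
    Indices are 0-based, so the 1-based range k < i < L'-k+1 becomes
    range(k, len(x) - k).
    """
    return {i: sum(v[p] * x[i + p] for p in v) for i in range(k, len(x) - k)}


def residuals(x, predictions):
    """Residual of each predicted sample: actual value minus prediction."""
    return {i: x[i] - pred for i, pred in predictions.items()}


# Hypothetical equal weights; real coefficients would come from regression.
equal_weights = {p: 1 / 6 for p in (-3, -2, -1, 1, 2, 3)}
ramp = list(range(10))
ramp_residuals = residuals(ramp, predict_left(ramp, equal_weights))
```

On a perfectly linear ramp the symmetric average reproduces every interior sample, so the residuals are zero up to floating-point rounding; smoother audio regions behave similarly, which is why small residuals indicate smoothness.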

Specifically, in the embodiment of the present invention, least squares regression is used to obtain the optimal prediction coefficients in vector form, $v_p$:

$$X_p v_p = y_p$$

where $p = -3, -2, -1, 1, 2, 3$ and $X_p$ is a 3 x 6 matrix, represented as:

In the above formula, $x$ with a tilde denotes a predicted value;

$$v_p = [v_{-1}\ v_{-2}\ v_{-3}\ v_1\ v_2\ v_3]^T$$

The optimal prediction coefficient is:

where the superscript $T$ denotes the matrix transpose and $y_p$ is a vector; $w$ is a regularization term introduced to avoid the NaN (not a number) problem, for which the empirical value 1e-5 is used (scientific notation for $1 \times 10^{-5}$).
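A minimal sketch of regularized least squares for the prediction coefficients, assuming the common ridge form $(X^T X + wI)\,v = X^T y$ with $w = 10^{-5}$. The way the rows of $X$ are assembled here (one row of six neighbours per interior sample) and all function names are assumptions for illustration and may differ from the patent's own matrix construction.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ridge_coefficients(x, k=3, w=1e-5):
    """Fit coefficients v_p for offsets p = -k..k (p != 0) on the sequence x."""
    offsets = [p for p in range(-k, k + 1) if p != 0]
    idx = range(k, len(x) - k)
    rows = [[x[i + p] for p in offsets] for i in idx]
    targets = [x[i] for i in idx]
    n = len(offsets)
    # Normal equations with the regularizer w added on the diagonal.
    xtx = [[sum(r[a] * r[b] for r in rows) + (w if a == b else 0.0)
            for b in range(n)] for a in range(n)]
    xty = [sum(r[a] * t for r, t in zip(rows, targets)) for a in range(n)]
    return dict(zip(offsets, solve(xtx, xty)))


def sum_squared_error(x, v, k=3):
    """Sum of squared residuals of the predictor v over the interior of x."""
    return sum((x[i] - sum(v[p] * x[i + p] for p in v)) ** 2
               for i in range(k, len(x) - k))
```

Adding $w$ to the diagonal keeps the system solvable even when the neighbour columns are nearly collinear, which is the situation in which an unregularized solve can produce NaN values.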

2. Different smooth regions of the right channel signal are distinguished using the residual, and the optimal combination of right channel smooth regions is selected with an optimization algorithm.

According to the magnitude of the residual obtained in step 1, the right channel signal is divided into different smooth regions:

In the above formula, $l$ denotes a smooth region and its subscript $j$ is the serial number of the smooth region; $x$ is the value of a signal point and its subscript is the serial number of the signal point; the superscript $R$ indicates that the signal point or smooth region belongs to the right channel; $Tr$ is the total number of smooth regions set.

Trial embedding is performed in each smooth region to obtain its embedding capacity and corresponding embedding distortion. For a specified embedding capacity $C$, the optimal smooth-region combination is then computed by the following formula, so that the embedding distortion is minimized while the embedding capacity $C$ is met:

In the above formula, $c_j$ and $d_j$ denote the embedding capacity and the embedding distortion of the $j$-th smooth region, respectively.

The optimal solution is as follows:
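As an illustration of the selection problem described in step 2 (not the patent's own formula), the sketch below searches all combinations of smooth regions and keeps the one with the lowest total distortion whose total capacity reaches the target. It assumes that capacity and distortion simply add up across regions; the function name and the sample numbers are hypothetical.

```python
from itertools import combinations


def select_regions(capacity, distortion, target):
    """Return (region_indices, total_distortion) of the cheapest feasible
    combination, or None if no combination reaches the target capacity.

    capacity[j] and distortion[j] come from trial embedding in region j.
    Exhaustive search is adequate when the number of regions is small.
    """
    best = None
    n = len(capacity)
    for size in range(1, n + 1):
        for combo in combinations(range(n), size):
            total_c = sum(capacity[j] for j in combo)
            total_d = sum(distortion[j] for j in combo)
            if total_c >= target and (best is None or total_d < best[1]):
                best = (combo, total_d)
    return best


# Hypothetical trial-embedding results for four regions.
choice = select_regions([100, 80, 60, 40], [1.0, 2.5, 4.0, 9.0], target=150)
```

For a larger number of regions, a dynamic-programming formulation of the same knapsack-style problem would avoid the exponential search.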

3. The values of all points in the optimal right channel smooth-region combination are predicted with a second prediction model to obtain a residual, and watermark information is embedded, extracted and recovered with a spread spectrum method.

In the embodiment of the invention, watermark embedding is carried out in the optimal right channel smooth-region combination selected in step 2.

Fig. 3 is a schematic diagram of an audio right channel signal; the following description deals with each signal point in it. To prevent the modification of earlier signal points from affecting the prediction of later signal points, assume that the number of right channel signal points to be predicted is N, divide the signal points into an odd set and an even set according to their positions, and predict the right channel signal points of the two sets separately.
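The odd/even split can be sketched as follows. The 1-based positions, the function names, and the `radius` parameter are assumptions for illustration; the point is only that every even position is predicted from odd positions and vice versa, so modifying one set never disturbs the context used to predict that same set.

```python
def split_even_odd(n):
    """Return the 1-based positions of the even set and the odd set for n points."""
    even = [i for i in range(1, n + 1) if i % 2 == 0]
    odd = [i for i in range(1, n + 1) if i % 2 == 1]
    return even, odd


def context(i, n, radius):
    """Positions within `radius` of point i that have the opposite parity.

    These are the only neighbours used to predict point i, so a point is
    never predicted from members of its own set.
    """
    lo, hi = max(1, i - radius), min(n, i + radius)
    return [j for j in range(lo, hi + 1) if (j - i) % 2 == 1]


even_set, odd_set = split_even_odd(7)
```

In a two-round scheme of this kind, a decoder would undo the rounds in reverse order: first the odd set, whose context is the watermarked even set it already holds, and then the even set, whose context is the restored odd set.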

1) The first round predicts the even set, by the following formula:

In the above formula, $x$ is the value of a signal point, its subscript is the serial number of the signal point, and the superscript $R$ indicates that the signal point belongs to the right channel; the coefficient vector in the formula denotes the even-set prediction coefficients in vector form.

The optimal even-set prediction coefficients are obtained by solving the following equation:

If N is even, the matrix and the vector are expressed as:

If N is odd, the matrix and the vector are expressed as:

The optimal even-set prediction coefficients can thus be calculated:

In the above formula, $d \le 4$.

2) The second round predicts the odd set, by the following formula:

In the above formula, the neighbouring signal point values used for prediction are the values of the signal points after watermark embedding; that is, after the predicted value of a signal point is calculated, the corresponding residual is calculated and the watermark is embedded. The coefficient vector in the formula denotes the odd-set prediction coefficients in vector form.

The optimal odd-set prediction coefficients are obtained by solving the following equation:

If N is even, the matrix and the vector are expressed as:

If N is odd, the matrix and the vector are expressed as:

The optimal odd-set prediction coefficients can thus be calculated:

After the predicted values of the signal points are obtained, the corresponding residuals can be calculated, and the watermark information is embedded, extracted and recovered with the spread spectrum method. The specific steps are as follows:

1) When embedding watermark information, the residual is first calculated:

Then the watermark information is embedded according to the following formula:

In the above formula, $b$ denotes the watermark information and $t$ is a threshold that determines the embedding capacity.

The watermarked (stego) signal is obtained, denoted as:

The unmodified left channel signal is then combined with the corresponding watermarked right channel signal into a two-channel signal, which is the final watermarked signal.

2) During extraction and recovery, the watermark information is first obtained by the following formula:

The original residual is then restored by:

The original carrier signal is finally recovered by:
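The sketch below is not the patent's own embedding formula; it illustrates one widely used reversible rule with the same ingredients (an integer residual, a watermark bit $b$, and a threshold $t$), namely threshold-based prediction-error expansion. It assumes integer samples and integer predictions that the decoder can reproduce exactly, and it ignores sample-range overflow. All names are hypothetical.

```python
def embed(samples, predictions, bits, t):
    """Embed bits into integer samples and return the watermarked samples.

    Residuals in [-t, t) are expanded to carry one bit each (padding with
    0 once the payload runs out); all other residuals are shifted by t so
    that the two cases remain distinguishable at the decoder.
    """
    payload = iter(bits)
    marked = []
    for x, p in zip(samples, predictions):
        e = x - p
        if -t <= e < t:
            e = 2 * e + next(payload, 0)
        elif e >= t:
            e += t
        else:
            e -= t
        marked.append(p + e)
    return marked


def extract(marked, predictions, t):
    """Return (bits, restored_samples) from the watermarked samples."""
    bits, restored = [], []
    for y, p in zip(marked, predictions):
        e = y - p
        if -2 * t <= e < 2 * t:
            bits.append(e % 2)  # non-negative in Python for modulus 2
            e //= 2             # floor division undoes 2 * e + b
        elif e >= 2 * t:
            e -= t
        else:
            e += t
        restored.append(p + e)
    return bits, restored


samples = [10, 12, 9, 30, -5]
predictions = [9, 12, 11, 20, -5]
marked = embed(samples, predictions, [1, 0, 1], t=2)
bits, restored = extract(marked, predictions, t=2)
```

A larger `t` makes more residuals expandable and so raises capacity, at the cost of larger modifications; the final extracted bit in this example is padding, so a receiver would need to know the payload length.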

The scheme of the embodiment of the invention greatly reduces the embedding distortion of the audio while increasing the audio embedding capacity, and the original audio can be recovered losslessly after the watermark is extracted. The technique can be used to authenticate the authenticity and losslessness of audio signals.

Comparative experiments were also conducted to illustrate the effects of the above-described schemes of the embodiments of the present invention.

In this comparison experiment, 70 clips of standard two-channel audio were selected, each with a sampling frequency of 44.1 kHz and a length of 2,000,000 sample points. Embedding capacity and audio distortion are two important evaluation indexes: the embedding capacity is measured by the number of watermark bits embedded in the audio, and the signal-to-noise ratio (SNR) is used to measure the degree of audio distortion:

In the above formula, $s$ is the number of sample points.
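For reference, the sketch below computes SNR under the standard definition, $10 \log_{10}$ of the original signal energy over the energy of the embedding error, summed over all sample points. This is assumed to match the measure used in the experiment; the function name is hypothetical.

```python
import math


def snr_db(original, marked):
    """Signal-to-noise ratio in dB between original and watermarked samples."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, marked))
    if noise == 0:
        return math.inf  # identical signals: no distortion at all
    return 10 * math.log10(signal / noise)


example_snr = snr_db([100, 100, 100, 100], [101, 99, 100, 100])
```

A higher SNR means the watermarked audio is closer to the original, so at a fixed number of embedded bits the scheme with the higher SNR introduces less distortion.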

The results of the comparative experiments are shown in Fig. 4, and the specific experimental values are shown in Table 1.

Number of embedded bits   10000  20000  30000  40000  50000  60000  70000  80000  90000  100000
The invention             72.45  68.35  65.72  63.98  62.20  60.71  59.01  57.73  56.55  55.68
Li et al.                 69.35  65.69  63.34  61.52  60.04  58.77  57.79  56.78  55.50  54.58
Akira et al.              63.69  59.60  57.24  54.14  51.25  49.37  47.67  45.90  44.31  42.88
Xiang et al.              63.55  60.55  57.65  55.36  53.14  51.25  49.50  47.44  45.90  44.21

Table 1. Average SNR (dB) of the audio library at different embedding capacities

The schemes labeled "Li et al.", "Akira et al." and "Xiang et al." in Fig. 4 and Table 1 are the existing schemes used in the comparative experiments.

The results of the comparative experiments show that the scheme of the embodiment of the invention achieves a higher average SNR than all three existing schemes at every tested capacity, exceeding the best of them (Li et al.) by roughly 0.95 to 3.10 dB in Table 1.

From the above description of the embodiments, those skilled in the art will clearly understand that the above embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for reversible watermarking of an audio signal, comprising:

predicting values of all points in the optimal right channel signal smoothness area combination by using a second prediction module, obtaining a residual error, and embedding, extracting and recovering watermark information by using a spread spectrum method;

the steps of embedding, extracting and recovering the watermark information by using the spread spectrum method comprise:

when embedding watermark information, firstly, calculating a residual:

where $x$ is the value of a signal point, the superscript $R$ indicates that the signal point belongs to the right channel, $N$ is the number of right channel signal points to be predicted, and $\tilde{x}^R$ is the predicted value of $x^R$;

the watermarked (stego) signal is obtained, denoted as:

when extraction and recovery are carried out, firstly, watermark information is obtained through the following formula:

the original residual is then restored by:

the original carrier signal is finally recovered by:

2. The method of reversible watermarking of an audio signal according to claim 1, wherein the step of predicting the values of the left channel signal points with the first prediction model to obtain the prediction residual comprises:

the value $x_i^L$ of the $i$-th signal point of the left channel is locally correlated with its context, and is predicted from the values of the locally adjacent signal points $x_{i-k}^L, \dots, x_{i-1}^L, x_{i+1}^L, \dots, x_{i+k}^L$ by the following formula:

$$\tilde{x}_i^L = \sum_{p=-k,\ p \neq 0}^{k} v_p \, x_{i+p}^L$$

in the above formula, $\tilde{x}_i^L$ is the predicted value of $x_i^L$;

the predicted value $\tilde{x}_i^L$ is subtracted from $x_i^L$ to obtain the residual:

$$x_i^L - \tilde{x}_i^L$$

3. The reversible watermark realization method for audio signals according to claim 2, characterized in that least squares regression is used to obtain the optimal prediction coefficients in vector form, $v_p$:

$$X_p v_p = y_p$$

where $p = -3, -2, -1, 1, 2, 3$ and $X_p$ is a 3 x 6 matrix, represented as:

in the above formula, $x$ with a tilde denotes a predicted value;

$$v_p = [v_{-1}\ v_{-2}\ v_{-3}\ v_1\ v_2\ v_3]^T$$

the optimal prediction coefficient is:

where the superscript $T$ denotes the matrix transpose and $y_p$ is a vector; $w$ denotes the regularization term, for which the following empirical value is used:

4. The method of claim 1, wherein the step of using the residual to distinguish different smooth regions of the right channel signal and selecting the optimal combination of right channel smooth regions with the optimization algorithm comprises:

dividing the right channel signal into different smooth regions according to the magnitude of the residual;

performing trial embedding in each smooth region to obtain its embedding capacity and corresponding embedding distortion, and, for a specified embedding capacity C, calculating the optimal smooth-region combination by the following formula, so that the embedding distortion is minimized while the embedding capacity C is met:

the optimal solution is as follows:

5. The method of claim 1 or 4, wherein the step of predicting the values of the points in the optimal combination of right channel smooth regions with the second prediction model comprises:

assuming that the number of right channel signal points to be predicted is N, dividing the signal points into an odd set and an even set according to their positions, and predicting the right channel signal points of the two sets separately;

the first round predicts the even set, by the following formula:

in which the coefficient vector denotes the even-set prediction coefficients in vector form;

the second round predicts the odd set, by the following formula:

in the above formula, the neighbouring signal point values used for prediction are the values of the signal points after watermark embedding, and the coefficient vector denotes the odd-set prediction coefficients in vector form.

6. The reversible watermark realization method for audio signals according to claim 5, characterized in that the optimal even-set prediction coefficients are obtained by solving the following equation:

if N is even, the matrix and the vector are expressed as:

if N is odd, the matrix and the vector are expressed as:

the optimal even-set prediction coefficients can thus be calculated:

7. The reversible watermark realization method for audio signals according to claim 5, characterized in that the optimal odd-set prediction coefficients are obtained by solving the following equation:

if N is even, the matrix and the vector are expressed as:

if N is odd, the matrix and the vector are expressed as:

the optimal odd-set prediction coefficients can thus be calculated: