CN115116453A

CN115116453A - Audio watermark embedding method and device, electronic equipment and storage medium

Info

Publication number: CN115116453A
Application number: CN202210605835.4A
Authority: CN
Inventors: 黄樱; 张树武; 刘杰
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2022-09-27
Anticipated expiration: 2042-05-30
Also published as: CN115116453B

Abstract

The invention provides an audio watermark embedding method, an audio watermark embedding device, electronic equipment and a storage medium, wherein the method comprises the following steps: extracting local salient feature points of the carrier audio in a time domain; determining the watermark embedding position of the carrier audio on a time domain according to the local significant characteristic points; and embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information. In the embodiment of the invention, the extracted local significant characteristic points of the carrier audio are positioned at the positions where audio signals change violently, and the watermark embedding positions determined based on the local significant characteristic points can keep the relative positions unchanged when the desynchronizing attack is faced, so that the audio segments embedded with the watermark information can be accurately positioned, the watermark information embedded in the audio segments can be accurately extracted, and the robustness of the watermark technology in the desynchronizing attack situation is effectively improved.

Description

Audio watermark embedding method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of audio processing technologies, and in particular, to an audio watermark embedding method and apparatus, an electronic device, and a storage medium.

Background

With the popularization of the mobile internet, more and more creators release works to the network, which brings convenience to the spreading of the works and also has serious copyright hidden trouble. Audio watermarking provides a solution to copyright protection for digital audio by imperceptibly embedding a copyright-identifying digital mark, called watermark, into the audio, which can be determined by extracting the watermark from the audio when a rights dispute occurs.

Currently, regarding embedding a watermark, it is possible to choose to embed a watermark in a complete segment of audio, or to embed a watermark in a partial segment of audio.

However, the method of embedding a watermark into a complete section of audio has no effect in the face of desynchronization attacks represented by shearing and resampling, and the watermark cannot be correctly extracted; the method for embedding the watermark in the partial audio segment has the advantage that after the audio faces desynchronizing attack, the watermark embedded in the audio segment cannot be correctly extracted because the audio segment embedded with the watermark cannot be accurately positioned. It can be seen that the related watermarking technology has low robustness in the face of desynchronization attack situations.

Disclosure of Invention

The invention provides an audio watermark embedding method, an audio watermark embedding device, electronic equipment and a storage medium, which are used for solving the problem of low robustness of a related watermark technology in the face of desynchronization attack.

The invention provides an audio watermark embedding method, which comprises the following steps: extracting local salient feature points of the carrier audio in a time domain;

determining the watermark embedding position of the carrier audio on a time domain according to the local significant characteristic points;

and embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.

According to the embedding method of the audio watermark provided by the invention, the extracting of the local significant feature point of the carrier audio in the time domain comprises the following steps:

low-pass filtering the audio signal of the carrier audio to obtain a filtered audio signal x _s (n)；

According to x _s (n), calculating a first order difference signal d (n) using equation (1):

d(n)＝x _s (n+1)-x _s (n) (1)

calculating local extreme points of d (n) as candidate feature points;

extracting the candidate characteristic points which meet a first condition as the local significant characteristic points;

wherein the meeting the first condition comprises at least one of:

the distance between the last sampling point of the carrier audio and the last sampling point of the carrier audio is greater than or equal to a first distance;

the corresponding contrast | d (i) | is greater than a first threshold value, i is a positive integer;

the corresponding steepness t (i-1) -t (i) is larger than a second threshold value, wherein t (n) is a differential signal of a contrast signal | d (n) | of d (n), and i is a positive integer;

among other candidate feature points whose distances from the candidate feature point are smaller than the second distance, the candidate feature point is a candidate feature point whose contrast is the greatest.

According to the audio watermark embedding method provided by the invention, the calculating the local extreme point of d (n) as the candidate feature point comprises the following steps:

calculating a contrast signal | d (n) | of d (n);

calculating a differential signal t (n) of | d (n) |;

and selecting local extreme points satisfying { i | t (i-1) >0, t (i) <0} in t (n) as the candidate feature points.

According to the audio watermark embedding method provided by the invention, before embedding watermark information in the carrier audio based on the watermark embedding position to obtain the carrier audio including the watermark information, the method further includes:

generating a pseudo-random sequence cluster according to the key sequence;

wherein the pseudo-random sequence cluster comprises two pairwise orthogonal 2 ^m A pseudo-random sequence, m being a positive integer;

modulating the binary watermark to be embedded into a spread spectrum watermark by using the pseudo-random sequence cluster;

the embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio including the watermark information includes:

and based on the watermark embedding position, embedding the spread spectrum watermark serving as watermark information into the carrier audio to obtain the carrier audio comprising the watermark information.

According to the audio watermark embedding method provided by the invention, the generating of the pseudo-random sequence cluster according to the key sequence comprises the following steps:

obtaining the key sequence

k _i ∈{-1，+1}，i＝0，…，l _f -1；

To k is paired ₀ Cyclic shift to obtain matrix

wherein ,

the full rank of matrix K is decomposed into matrix F and matrix H using equation (2):

K＝F ^T H (2)

where matrix H is the non-zero row portion of the Hermite standard version of matrix K, F ^T Is a full rank matrix consisting of c columns of the matrix K, c being the rank of the matrix K;

first 2 of matrix F using equation (3) ^m Performing gram-Schmidt orthogonalization on the rows to obtain a pseudo-random sequence cluster

Wherein m is equal to

f _j Is a row matrix of matrix F.

According to the audio watermark embedding method provided by the present invention, based on the watermark embedding position, the spread spectrum watermark is embedded in the carrier audio as watermark information to obtain the carrier audio including the watermark information, including:

carrying out domain transformation on the audio signals corresponding to the watermark embedding positions to obtain transformation domain coefficients;

selecting coefficients with stable attack in a transform domain to form a transform domain coefficient block;

calculating the embedding strength corresponding to each transform domain coefficient block by using the pseudorandom sequence cluster;

based on the embedding strength, embedding the spread spectrum watermark into a transform domain coefficient block corresponding to the embedding strength to obtain a transform domain coefficient block with a watermark;

and carrying out inverse domain transformation on the transformation domain coefficient block with the watermark to obtain the carrier audio comprising the watermark information.

According to the audio watermark embedding method provided by the invention, the calculating the embedding strength corresponding to each transform domain coefficient block by using the pseudo-random sequence cluster comprises the following steps:

calculating the embedding strength alpha corresponding to each transform domain coefficient block by using the pseudo-random sequence cluster and adopting a formula (4):

α＝γmax|xp _k |，k＝0，1，…，2 ^m -1 (4)

where γ is a constant greater than 1, the vector x characterizes each of said transform domain system blocks, the vector p _k Characterizing the pseudorandom sequence cluster.

The invention also provides an audio watermark embedding device, which comprises:

the extraction module is used for extracting local significant feature points of the carrier audio frequency in a time domain;

a determining module, configured to determine, according to the local significant feature point, a watermark embedding position of the carrier audio in a time domain;

and the embedding module is used for embedding the watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the audio watermark embedding method.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of embedding an audio watermark as described in any of the above.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the method of embedding an audio watermark as described in any one of the above.

According to the audio watermark embedding method, the device, the electronic equipment and the storage medium, the extracted local significant characteristic points of the carrier audio are located at the positions where audio signals change violently, and the watermark embedding positions determined based on the local significant characteristic points can keep the relative positions unchanged when desynchronizing attack is faced, so that the audio segments in which watermark information is embedded can be accurately positioned, the watermark information embedded in the audio segments can be accurately extracted, and the robustness of the watermark technology in the desynchronizing attack situation is effectively improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of an audio watermark embedding method provided by the present invention;

FIG. 2 is a second flowchart illustrating an audio watermark embedding method according to the present invention;

fig. 3 is an exemplary diagram of step 202 in the audio watermark embedding method provided by the present invention;

FIG. 4 is SWR for 30 cases of audio with both fixed embedding strength and adaptive embedding strength;

FIG. 5 is an ODG of 30 cases of audio with both fixed embedding strength and adaptive embedding strength;

FIG. 6 shows the MOS of 30 cases of audio frequency with both fixed embedding strength and adaptive embedding strength;

FIG. 7 is a third flowchart of the audio watermark embedding method provided by the present invention;

fig. 8 is a schematic structural diagram of an audio watermark embedding device provided by the present invention;

fig. 9 is a schematic physical structure diagram of an electronic device provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

The embodiments of the application provide a solution to the problem that the robustness of the related watermarking technology is low when the related watermarking technology faces the desynchronization attack situation. In order to facilitate a clearer understanding of the embodiments of the present application, some relevant technical knowledge will be first described as follows.

In recent years, piracy network stream events are frequently generated, network piracy infringement increasingly becomes a problem of common attention of creators and media platforms, and related technical researches on copyright infringement are also carried out in academic circles and industry circles. Audio watermarking offers a good solution to copyright protection for digital audio, which imperceptibly embeds a copyright-identifying digital mark, called watermark, into the audio; when the right dispute occurs, the watermark in the audio is extracted to determine the copyright of the audio. Audio watermarking is an effective technique for protecting digital audio copyright, however, there still exist some problems to be solved, no matter in academic research or project application.

An audio watermarking method applicable to project application needs to have both concealment, robustness and large embedding capacity. This requires that the watermark be embedded into the audio work without affecting its perceptual quality and use, i.e. the concealment of the watermark needs to be satisfied; even if the audio is subjected to various attacks to generate distortion, such as noise superposition, coding compression, resampling and the like, the watermark can be correctly extracted, namely the robustness of the watermark needs to be met; there is also sufficient capacity to label copyright information such as the authority, the type of rights, the time to start the rights, etc., i.e. a large embedding capacity of the watermark needs to be met.

However, there is a conflict between the three performance requirements: increased robustness is usually at the expense of concealment and embedding capacity; increasing concealment can also reduce embedding capacity or compromise robustness. An effective audio watermarking method is to satisfy three performance requirements as simultaneously as possible.

A complete watermark embedding framework mainly comprises three aspects: selection of an embedding domain, determination of an embedding position, and design of an embedding method and an extraction method corresponding to the embedding method.

The embedded domain of the watermark can be divided into a time domain and a transform domain. The time domain refers to the audio signal itself, the transform domain refers to a coefficient space obtained by performing domain transform processing on the audio signal, and compared with the time domain technology, the transform domain watermarking technology has better concealment and robustness. Commonly used domain Transformation methods include Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Lifting Wavelet Transform (LWT).

The determination of the watermark embedding location includes two dimensions: firstly, the determination of the embedding position in the time domain can select a section of complete audio to be used for embedding the watermark, and can also select partial audio segments to be embedded with the watermark, wherein the former has no effect in the face of desynchronization attack represented by shearing and resampling, and the latter can not accurately position the audio segments embedded with the watermark after the audio is attacked by the existing method, so that the watermark embedded in the audio segments can not be correctly extracted; the second is the determination of the embedding position on the transform domain, which depends on the transform domain characteristics of the digital audio, and usually determines the embedding position by digging some stable coefficient ranges in the transform domain, so that the watermark can be robust enough even if the audio is attacked by signals.

The design of the embedding method is the core of the watermarking technology, and is closely related to the performance of the watermarking technology. The currently common embedding method is a spread spectrum method, which first modulates each bit in the watermark into a Pseudo Noise (PN) sequence, which is then superimposed into the transform domain coefficients of the carrier audio. When extracting the watermark, the correlation function of the transform domain coefficient containing the watermark and the PN sequence can be calculated to extract the watermark. The spread spectrum method is concerned about because of its simple extraction method and robustness to noise attack, but in the spread spectrum method, in order to ensure the accuracy of the extracted watermark, the PN sequence is designed to be very long, and this can only embed one bit of watermark information, which severely limits the embedding capacity of the audio watermark.

Spread spectrum methods often use a parameter to control the strength of watermark embedding, i.e. the embedding strength. The embedding strength is used to balance the performance of the watermarking technique. Specifically, the greater the embedding strength, the stronger the robustness of the watermark, but the weaker the concealment; and vice versa. The setting of the embedding strength needs to take robustness and concealment into consideration at the same time. Some existing spread spectrum methods use the same fixed strength when a watermark is embedded into each section of audio, ignore the difference of different audios and cannot give consideration to the requirements of robustness and concealment; some spread spectrum watermarking methods use different intensities for different audio frequencies, but use the same embedding intensity at different embedding locations of a segment of audio frequency, which also impairs the concealment of the watermarking technique.

The audio watermark embedding method of the present invention is described below with reference to the drawings. Fig. 1 is a schematic flowchart of an audio watermark embedding method provided by the present invention, and as shown in fig. 1, the method includes steps 101 to 103; wherein:

step 101, extracting local significant feature points of the carrier audio in a time domain.

And step 102, determining the watermark embedding position of the carrier audio on the time domain according to the local significant characteristic points.

And 103, embedding the watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.

Specifically, local significant feature points of a carrier audio to be embedded with a watermark in a time domain are extracted, the local significant feature points can be located at positions where audio signals change violently, a watermark embedding position of the carrier audio in the time domain is determined based on the local significant feature points, and then watermark information is embedded into the carrier audio based on the determined watermark embedding position to obtain the carrier audio including the watermark information.

The watermark information may be a character string, a 01 string, a text or an image, etc. which may identify a copyright.

In the embodiment of the invention, the extracted local significant characteristic points of the carrier audio are positioned at the positions with violent audio signal changes, and the watermark embedding positions determined based on the local significant characteristic points can keep the relative positions unchanged when the carrier audio faces desynchronizing attack, so that the audio segments embedded with watermark information can be accurately positioned, the watermark information embedded in the audio segments can be accurately extracted, and the robustness of the watermark technology in desynchronizing attack conditions is effectively improved.

Alternatively, the above-mentioned local significant feature point may be extracted by:

s1, low-pass filtering the audio signal of the carrier audio to obtain a filtered audio signal x _s (n)；

Specifically, the audio signal x (n) of the carrier audio may be low-pass filtered by using formula (5), so as to obtain a filtered audio signal x _s (n)：

x _s (n)＝G(n)*x(n) (5)

Wherein g (n) characterizes a low-pass filter, which represents the convolution calculation.

The above-mentioned g (n) for example characterizes gaussian filters,

s2, according to x _s (n), calculating a first order difference signal d (n) using equation (1):

d(n)＝x _s (n+1)-x _s (n) (1)

s3, calculating the local extreme point of d (n) as a candidate characteristic point;

specifically, calculating the local maximum point of the first-order difference signal d (n) can be understood as calculating the local maximum point of the contrast signal | d (n) | of d (n).

Alternatively, the step S3 may include the following steps:

s3-1, calculating a contrast signal | d (n) | of d (n);

s3-2, calculating a differential signal t (n) of | d (n) |;

specifically, the difference signal t (n) of the contrast signal | d (n) | can be calculated using equation (6):

t(n)＝|d(n+1)|-|d(n)| (6)

s3-3, selecting local extreme points in t (n) which satisfy { i | t (i-1) >0, t (i) <0} as candidate feature points.

Specifically, it is also possible to select a value satisfying { i | t (i-1)>0，t(i)<0, adding a candidate feature point set S ₀ I.e. S ₀ ＝{i|t(i-1)>0，t(i)<0}。

S4, extracting candidate feature points meeting the first condition to serve as local significant feature points;

wherein satisfying the first condition comprises at least one of:

a. the distance between the last sampling point of the carrier audio and the last sampling point of the carrier audio is greater than or equal to a first distance;

the first distance may be set by a technician according to actual conditions.

b. The corresponding contrast | d (i) | is greater than a first threshold value, i is a positive integer;

specifically, the first threshold may be set by a technician according to actual situations, for example, the first threshold may be set as a median of contrast of the entire candidate feature points.

c. The corresponding steepness t (i-1) -t (i) is larger than a second threshold value, wherein t (n) is a differential signal of a contrast signal | d (n) | of d (n), and i is a positive integer;

specifically, the second threshold may be set by a technician according to actual situations, for example, the second threshold may be set as a median of the steepness of the overall candidate feature point.

d. Among other candidate feature points whose distances from the candidate feature points are smaller than the second distance, the candidate feature point is the candidate feature point with the largest contrast.

The second distance can be set by a technician according to actual conditions.

It is understood that step S4 is to determine the candidate feature point set S ₀ In the method, candidate feature points which are not at the tail end of the audio frequency, have high contrast, are in a steep peak or have dominant contrast in adjacent candidate feature points are selected, and the candidate feature points which are suitable for the watermark embedding task are used as local significant feature points.

Optionally, the candidate feature point set S may be ₀ In the method, candidate feature points which simultaneously meet four conditions of no audio tail end, high contrast, steep peak and contrast dominance in adjacent candidate feature points are selected as local significant feature points, so that the robustness of the watermark technology in the face of desynchronization attack is further improved.

Specifically, selecting a candidate feature point which simultaneously satisfies four conditions of no audio tail end, high contrast, steep peak and contrast dominance in adjacent candidate feature points as a local significant feature point can be realized through the following steps:

a. filtering out set S ₀ Filtering out candidate feature points with the distance to the last sampling point of the carrier audio less than the first distance to obtain a candidate feature point set S ₁ ；

The first distance is, for example, a set distance l.

b. Filtering out set S ₀ Candidate feature points with medium contrast, i.e. computing candidate feature points

Corresponding contrast | d (i) |, and filtering out the candidate feature points with the contrast value smaller than the median of the contrast of the whole candidate feature points to obtain a candidate feature point set S ₂ ；

c. Filtering out set S ₀ Calculating the steepness of the candidate feature point

Filtering out candidate points with the steepness value smaller than the median of the steepness of the whole candidate feature points to obtain a candidate feature point set S ₃ ；

d. Filtering out set S ₀ The candidate feature point with less dominant contrast in the adjacent candidate feature points, namely two candidate feature points p exist, q belongs to S ₀ And | p-q<l, l is a set distance, and may be set to a second distance not equal to l if both contrast | d (p) respectively<If p is filtered out, a candidate feature point set S is obtained ₄ ；

Finally, the intersection of all candidate feature point sets is used as a feature point set S, namely

Optionally, before embedding the watermark information in the carrier audio based on the watermark embedding position to obtain the carrier audio including the watermark information, that is, before step 103, the following steps are further included:

generating a pseudo-random sequence cluster according to the key sequence;

wherein the pseudo-random sequence cluster comprises two-two orthogonal 2 ^m A pseudo-random sequence, m being a positive integer;

modulating a binary watermark to be embedded into a spread spectrum watermark by using the pseudo-random sequence cluster;

step 103 specifically includes:

Specifically, a pseudo-random sequence cluster is generated according to the key sequence, and the pseudo-random sequence cluster comprises 2 which are two-two orthogonal ^m And m is a positive integer, wherein the larger the value of m is, the larger the embedding capacity of the watermark is, but the weaker the concealment of the watermark is, and vice versa, and in order to combine the embedding capacity and the concealment of the watermark, the value of m can be set to be 4. And modulating the binary watermark to be embedded into a spread spectrum watermark by utilizing the pseudorandom sequence cluster, and embedding the obtained spread spectrum watermark into the carrier audio to obtain the carrier audio comprising the watermark information.

In the embodiment of the invention, the binary watermark is modulated into the spread spectrum watermark by utilizing the pseudo-random sequence cluster comprising a plurality of pseudo-random sequences which are orthogonal pairwise, so that one pseudo-random sequence in the related technology corresponds to one-bit binary watermark and is changed into one pseudo-random sequence corresponding to a multi-bit binary watermark, further more binary watermarks can be embedded into the carrier audio, and the problem of low embedding capacity in the watermark technology is effectively solved.

Optionally, generating a pseudo-random sequence cluster according to the key sequence includes the following steps:

obtaining a key sequence

k _i ∈{-1，+1}，i＝0，…，l _f -1；

To k is paired ₀ Cyclic shift to obtain matrix

wherein ,

the matrix K is decomposed into a matrix F and a matrix H by adopting a formula (2):

K＝F ^T H (2)

Wherein m is equal to

f _j Is a row matrix of matrix F.

As described above

Is not more than log ₂ And c is an integer.

Optionally, based on the watermark embedding position, embedding the spread spectrum watermark as watermark information in the carrier audio to obtain the carrier audio including the watermark information, including the following steps:

specifically, a section of audio with a fixed distance behind each local feature point may be extracted as a watermark embedding position in a time domain, each audio section is divided into a plurality of non-overlapping audio frames, each audio frame is subjected to domain transformation, such as DCT transformation, and then DCT intermediate-frequency coefficients are extractedAnd splitting the block into several coefficient blocks with equal length, e.g. extracting the interval [ f ] where the corresponding physical frequency in DCT coefficient is located _l ，f _h ]The inner components are separated into a plurality of non-overlapping parts with length l _f If the length of the coefficient block of (2) is not divisible, the remaining frequency part is not processed. Wherein the interval [ f _l ，f _h ]I.e. the intermediate frequency bandwidth, wherein the parameter f _l ，f _h Is the verified frequency bandwidth critical point in this embodiment. It should be noted that suitable if coefficients help to improve the robustness of the watermarking technique.

Calculating the embedding strength corresponding to each transform domain coefficient block by utilizing the pseudorandom sequence cluster;

embedding the spread spectrum watermark into a transform domain coefficient block corresponding to the embedding strength based on the embedding strength to obtain a transform domain coefficient block with the watermark;

Optionally, calculating an embedding strength corresponding to each transform domain coefficient block by using the pseudo random sequence cluster, including:

and (3) calculating the embedding strength alpha corresponding to each transform domain coefficient block by using the pseudo-random sequence cluster and adopting a formula (4):

α＝γmax|xp _k |，k＝0，1，…，2 ^m -1 (4)

where γ is a constant greater than 1, the vector x characterizes each transform domain system block, the vector p _k A pseudo-random sequence cluster is characterized.

Compared with the related technology which uses fixed embedding strength to embed the watermark, the embodiment of the invention can adaptively calculate the embedding strength corresponding to each transform domain coefficient block, and embeds the spread spectrum watermark into the corresponding transform domain coefficient block based on the embedding strength, thereby improving the robustness and the concealment of the watermark technology.

Fig. 2 is a second flowchart illustrating an audio watermark embedding method according to the present invention, as shown in fig. 2, the method includes steps 201 to 206; wherein:

step 201, extracting local significant feature points of the carrier audio in a time domain.

In this step, local salient feature points on the time domain are extracted from the audio to be embedded with the watermark. The characteristic points are located at the positions where the audio signals change violently, and the relative positions of the characteristic points can be kept unchanged in the desynchronization attack, so that the robustness of the watermarking technology in the desynchronization attack situation is improved.

Specifically, step 201 may include steps 2011 to 2014.

2011, low-pass filtering is performed on the audio signal of the carrier audio to obtain a filtered audio signal x _s (n)；

x _s (n)＝G(n)*x(n) (5)

It will be appreciated that the filtered audio signal x _s (n) represents the low-pass filtered smoothed audio signal.

Optionally, G (n) above characterizes, for example, a Gaussian filter,

step 2012, an audio signal x of the filtered carrier audio is calculated _s (n) first order difference signal.

Wherein a first order difference signal is used to describe a variation of the audio signal, in particular according to x _s (n), calculating a first order difference signal d (n) using equation (1):

d(n)＝x _s (n+1)-x _s (n) (1)

step 2013, calculating local extreme points of the first-order difference signals, and adding the local extreme points into the initial candidate feature point set S of the carrier audio ₀ 。

The local maximum point of the first-order difference signal is the local maximum of the contrast signal.

Specifically, step 2013 may include steps 2013-1 through 2013-3; wherein:

step 2013-1, calculating a contrast signal | d (n) | of the first-order difference signal d (n);

step 2013-2, calculate the difference signal t (n) of the contrast signal | d (n) |.

t(n)＝|d(n+1)|-|d(n)| (6)

step 2013-3, satisfying { i | t (i-1) in the differential signal t (n)>0，t(i)<0, adding the initial candidate feature point set S of the carrier audio ₀ 。

Step 2014, in the initial candidate feature point set S ₀ And selecting candidate feature points which are not at the tail end of the audio frequency, have high contrast, are in a sharp peak and have dominant contrast among adjacent candidate feature points as local significant feature points.

In this step, four types of candidate feature points unsuitable for the watermark embedding task may be sequentially excluded, and it is required that if the distance between any two candidate feature points is smaller than a set distance l, only the feature point with a relatively large contrast is retained, and the set distance l represents the minimum distance between adjacent feature points.

Specifically, step 2014 may include step 2014-1 through step 2014-5; wherein:

step 2014-1, filter set S ₀ Filtering out candidate feature points with the distance to the last sampling point of the carrier audio less than the first distance to obtain a candidate feature point set S ₁ ；

The first distance is, for example, a set distance l.

Step 2014-2, filter set S ₀ Candidate feature points with medium contrast, i.e. computing candidate feature points

Corresponding contrast | d (i) |, and filtering out candidate feature points with a contrast value less than the median of the contrast of the whole candidate feature points,obtaining a candidate feature point set S ₂ ；

Step 2014-3, filtering out set S ₀ Calculating the steepness of the candidate feature point

And filtering candidate points with a steepness value smaller than the median of the steepness of the whole candidate feature points to obtain a candidate feature point set S ₃ ；

Step 2014-4, filter set S ₀ The candidate feature point with less dominant contrast in the adjacent candidate feature points, namely two candidate feature points p exist, q belongs to S ₀ And | p-q<l, l is a set distance, and may also be set to a second distance not equal to l if both correspond to a contrast | d (p) to count light<If p is filtered out, a candidate feature point set S is obtained ₄ ；

Step 2014-5, taking the intersection of all candidate feature point sets as a feature point set S, namely

Step 202, according to the local significant feature points, determining a watermark embedding position on a proper time domain, performing domain transformation on the audio signal corresponding to the watermark embedding position, and selecting coefficients with stable partial attack in a transform domain to form a group of transform domain coefficient blocks.

In the step, a section of audio after each characteristic point is extracted as a watermark embedding position on a time domain, then non-overlapped framing is carried out on the audio section, domain transformation is carried out on each audio frame, and partial stable attack coefficients in a transformation domain are selected to form a group of transformation domain coefficient blocks.

Alternatively, the present invention may use DCT transform as the domain transform, because DCT transform has better energy compression advantages, embedding watermark in DCT domain can improve the concealment of watermark, and at the same time, the computational complexity is lower than other transforms.

Alternatively, the mid-frequency coefficient of the DCT coefficient is selected as the watermark embedding location in the transform domain, because the human ear is sensitive to the low-frequency component of the audio, embedding the watermark in the low-frequency coefficient is likely to affect the audio auditory quality, while the high-frequency component is likely to be damaged by low-pass filtering and resampling, and so on, and thus the mid-frequency coefficient of the DCT is relatively more stable.

It should be noted that, in the embodiment, the watermark is embedded in the DCT intermediate frequency coefficient of the audio segment after the feature point, but the present invention is not limited to the DCT and the intermediate frequency coefficient of the DCT, and the present invention can be applied to any transform domain coefficient.

Fig. 3 is an exemplary diagram of step 202 in the audio watermark embedding method provided by the present invention.

Specifically, step 202 may include steps 2021 to 2024.

Step 2021, extracting a section of audio with a fixed distance behind the feature point for each feature point;

note that the fixed distance may be represented by d _m Length of audio segment is denoted as l _c . There is a need to meet the requirements: d _m +l _c <l, where l may be the set distance in step 2014.

It should be noted that the inequality d _m +l _c <l, indicating that the audio segment corresponding to each feature point should not contain adjacent feature points, so as to avoid embedding of the watermark from destroying the local features of the feature points. It should also be noted that embedding the watermark near the region where the audio amplitude changes strongly at the feature points may enhance the concealment of the watermark by using a masking effect, and thus d _m The value of (A) is not too large. Length l of audio segment _c Depending on the length of the binary watermark to be embedded, and also on the frequency bandwidth of the watermark embedding.

Step 2022, dividing each audio segment into a plurality of non-overlapping frames;

this step aims at splitting the audio segment into shorter audio frames, and then applying a DCT transform. Appropriate audio frame length l _s The method can enhance the concealment of the watermark and also provide convenience for butterfly operation in DCT transformation.

Step 2023, perform DCT transform on each audio frame.

Step 2024, extracting the DCT intermediate frequency coefficients, and splitting them into a plurality of coefficient blocks of equal length.

Specifically, the corresponding physical frequency in the extracted DCT coefficient is located in the interval [ f _l ，f _h ]The inner components are separated into a plurality of non-overlapping parts with length l _f If the length of the coefficient block of (2) is not divisible, the remaining frequency portion is not processed. Wherein the interval [ f _l ，f _h ]Is medium frequency bandwidth, wherein the parameter f _l ，f _h Is the verified frequency bandwidth critical point in this embodiment. It should be noted that suitable if coefficients help to improve the robustness of the watermarking technique.

Step 203, generating a pseudo-random sequence cluster according to the key sequence;

in this step, a cluster of pseudo-random sequences is generated, which are orthogonal in pairs and modulo 1.

Specifically, step 203 may include steps 2031 to 2034.

Step 2031, acquiring a normalized key sequence;

the composition of the key sequence may be different in different situations, and in this step it is uniformly transformed into a canonical form, where the key sequence is represented as

k _i ∈{-1，+1}，i＝0，…，l _f -1, length l _f It can be seen that the key sequence is a set of PN sequences.

Step 2032, circularly shifting the key sequence to obtain a matrix K;

specifically, for k ₀ The values in (3) are cyclically shifted to obtain a set of sequences:

combining the above sequence and the key sequence into a matrix

Wherein the superscript is ^T Representing a transpose operation.

2033, decomposing the matrix K into a matrix F and a matrix H;

specifically, the full rank of matrix K is decomposed into matrix F and matrix H using formula (2):

K＝F ^T H (2)

where matrix H is the non-zero row portion of the Hermite standard version of matrix K, F ^T Is a full rank matrix consisting of c columns of matrix K, c being the rank of matrix K.

Step 2034, performing Graham-Schmidt orthogonalization on the matrix F to obtain a pseudo-random sequence cluster

In particular, for facilitating subsequent watermark modulation, the first 2 of the matrix F needs to be updated ^m The rows are subjected to a gram-schmitt orthogonalization, specifically using formula (3) for the first 2 of the matrix F ^m Performing gram-Schmidt orthogonalization on the rows to obtain a pseudo-random sequence cluster

Wherein m is equal to

f _j Is a row matrix of matrix F. It can be seen that p _j (j＝0,…,2 ^m -1) is a pseudo-random sequence cluster

Step 204, adaptively calculating the embedding strength corresponding to each transform domain coefficient block by utilizing a pseudorandom sequence cluster;

specifically, using the pseudo-random sequence cluster, calculating the embedding strength α corresponding to each transform domain coefficient block by using formula (4):

α＝γmax|xp _k |，k＝0，1，…，2 ^m -1 (4)

Step 205, modulating the binary watermark to be embedded into a spread spectrum watermark by utilizing the pseudorandom sequence cluster, and superposing the spread spectrum watermark into a transform domain coefficient block according to the corresponding embedding strength;

in this step, the binary watermark is split into non-overlapping m-bit watermark groups, and if the length cannot be divided completely, zero padding is performed, and the length of the binary watermark after 0 padding is recorded as l _w . These m-bit watermark groups are then modulated into spread spectrum watermarks, and an m-bit watermark group is embedded into an intermediate frequency coefficient block.

Specifically, step 205 may include steps 2051 to 2053; wherein:

step 2051, splitting the binary watermark to obtain a plurality of m-bit watermark groups;

in particular, the binary watermark will be split into several m-bit watermark groups, which do not overlap each other.

It should be noted that the number of watermark groups is the same as the number of coefficient blocks obtained in step 2024. For example, if an audio segment is divided into a audio frames, and each audio frame is DCT-transformed to obtain b blocks of intermediate frequency coefficients, there should be a × b m-bit watermark sets.

Step 2052, modulating each watermark group into a pseudo-random sequence;

in this step, each binary m-bit watermark group is first converted into a decimal watermark group, and the decimal watermark group is mapped to one of the pseudo-random sequence clusters.

The above-mentioned m-bit watermark group may be characterized as b ═ b ₀ ，b ₁ ，b ₂ ，…，b _m-1 ]。

The specific binary conversion expression is the following formula (7):

the mapping relation between the decimal watermark group and the pseudo-random sequence cluster is as follows:

t∈{0，1，…，2 ^m -1} whereby a mapping between the set of m-bit watermarks and the pseudo-random sequence can be constructed.

Step 2053, embedding the spread spectrum watermark into the DCT intermediate frequency coefficient block to obtain the DCT intermediate frequency coefficient block with the watermark;

specifically, the spread spectrum watermark is embedded into the DCT intermediate frequency coefficient block by using embedding expression (8) to obtain the watermarked DCT intermediate frequency coefficient block:

the DCT intermediate frequency coefficient block is x, and the intermediate frequency coefficient block with the watermark is y.

And step 206, replacing the original coefficient block with the transform domain coefficient block with the watermark, and obtaining the carrier audio comprising the watermark information after inverse domain transform.

This process may be the reverse of step 202.

Specifically, step 206 may include steps 2061 to 2064; wherein:

step 2061, replacing the original coefficient block with the DCT coefficient block with the watermark, and forming a complete DCT coefficient;

this process is the reverse of step 2024, replacing the original coefficient block x with the watermarked coefficient block y, and recombining into DCT coefficients.

Step 2062, performing inverse discrete cosine transform;

the step is to perform inverse transformation on the recombined DCT coefficients to obtain corresponding audio frames.

Step 2063, recombining all audio frames to obtain audio segments;

step 2064, replacing the corresponding audio segment with the watermark to obtain the audio y (n) embedded with the watermark.

It should be noted that the audio segment corresponding to each feature point is embedded with a complete watermark. The binary watermark is repeatedly embedded in the carrier audio.

The following describes the audio watermark embedding method according to an embodiment of the present invention in detail.

This example randomly selects 30 examples of binaural audio with a length of 30 seconds and a sampling frequency of 44.1 khz as the carrier audio.

This embodiment uses a randomly generated 320-bit binary sequence as the original watermark information, i.e. | _w A 0-1 sequence of 320 bits in length may enable a more comprehensive description of copyright information.

And extracting feature points of the carrier audio, enabling the minimum distance between two adjacent feature points to be larger than 1.5 times of sampling frequency, and taking the set distance l as 66150.

Extracting 20480 samples from 100 milliseconds after each feature point as audio segments, i.e./ _c 20480. Each audio segment will be divided into 10 audio frames of length 2048, i.e./ _s ＝2048。

DCT transform is carried out on each audio frame to obtain 2048 DCT coefficients, and the physical frequency is selected to be positioned in the interval [ f _l ，f _h ]Inner intermediate frequency coefficient, f _l ＝f _s 16 ≈ 2756 Hz, f _h ＝f _s 5512 Hz, wherein f _s Is the sampling frequency, which corresponds to 256 coefficients from 256 th bit to 511 th bit in the DCT coefficient, and uses the 256 coefficients as the intermediate frequency coefficients to be embedded, and at this time, the 256 intermediate frequency coefficients will be split into 8 coefficient blocks with the length of 32. Each feature point thus corresponds to a total of 80 coefficient blocks.

Generating a pseudo-random sequence cluster from the key sequence with the length of 32 according to the step 203

In the embodiment, when m is 4, the cyclic shift process may cause the matrix K to be discontentedRank, so the sequence length l can be set _f ＝2 ^m+1 32, to avoid the occurrence of insufficient number of sequences in the pseudo-random sequence cluster. At this time, the pseudo random sequence cluster is expressed as

For each coefficient block, the embedding strength α is calculated according to the above formula (4).

Splitting the 320-bit binary watermark into 80 4-bit watermark groups, each watermark group being represented by the above equations (7) and (d)

t∈{0，1，…，2 ^m -1} mapping the 4-bit watermark groups to the corresponding pseudo-random sequence p _t 。

And then, according to the formula (8), the 80 m-bit watermark groups are embedded into the 80 coefficient blocks one by one.

The carrier audio comprising the watermark information is obtained according to step 206 described above.

It should be noted that, when extracting feature points for different carrier audios, the distance between adjacent feature points is often greater than the minimum distance set in the embodiment, so that the method of the present invention can only estimate the upper bound of the embedding capacity when calculating the embedding capacity. The upper bound of embedding capacity of this embodiment is 500 bits per second, which is much higher than the comparable method.

In order to embody the improvement of the adaptive embedding strength strategy on the concealment of the watermark technology, in this embodiment, a set of comparison experiments with fixed embedding strength are performed.

Setting fixed embedding strength; the average embedding loss obtained after embedding the watermark in 30 audio samples is basically the same as that of the present invention, and at this time, the fixed embedding strength is set to be 0.33, the corresponding Signal Watermark Ratio (SWR) is 25.20dB, and the adaptive embedding strength used in the present invention corresponds to SWR of 25.13 dB. Objective quality assessment and subjective assessment are introduced to measure the concealment of the watermark.

Fig. 4 to 6 show SWR, objective quality rating (ODG), and subjective quality rating (MOS) of 30 audio samples under the above two embedding strength conditions.

FIG. 4 shows SWR of 30 cases of audio with both fixed embedding strength and adaptive embedding strength; FIG. 5 is an ODG of 30 cases of audio with both fixed embedding strength and adaptive embedding strength; fig. 6 shows the MOS of 30 cases of audio with both fixed embedding strength and adaptive embedding strength.

In fig. 4 to 6, the higher the vertical axis, the better the concealment of the watermark. It can be seen that with a fixed embedding strength, although SWR is similar to the adaptive embedding strength, MOS and ODG perform worse, which means that the adaptive embedding strength strategy can achieve better watermark concealment, thereby ensuring that watermark embedding does not destroy the use value of the carrier audio.

The SWR is calculated as follows in equation (9):

it can be seen that the SWR reflects the amount of change of the watermark information embedding to the carrier audio.

Fig. 7 is a third schematic flowchart of an extracting method corresponding to the audio watermark embedding method provided by the present invention, as shown in fig. 7, the method includes steps 701 to 705; wherein:

step 701, extracting local significant feature points of the carrier audio in a time domain;

step 702, according to the local significant feature points, determining a watermark embedding position on a proper time domain, performing domain transformation on an audio signal corresponding to the watermark embedding position, and selecting coefficients with stable partial attack in a transform domain to form a group of transform domain coefficient blocks;

703, generating a pseudo-random sequence cluster according to the key sequence;

it should be noted that the specific implementation details of steps 701 to 703 are substantially the same as those of steps 201 to 203, and are not repeated herein to avoid repetition.

And 704, extracting the embedded watermark according to the correlation between the pseudo-random sequence cluster and the coefficient block.

In this step, let the coefficient block of the watermark to be extracted be

Then the coefficient block is pre-extracted using each sequence in the pseudo-random sequence cluster using equation (10):

and then selecting the largest item from the pre-extraction result as the extracted watermark by adopting a formula (11):

will be provided with

The conversion into binary is the extracted m-bit watermark group.

Repeating the above steps for each coefficient block to obtain the complete binary watermark.

The invention also analyzes the formula (8), the formula (10), the formula (11) and the formula (4) to perfect the derivation process of the self-adaptive embedding strength. Let the embedded m-bit watermark group be represented as a decimal number t, then the following formula (12) exists for extraction:

if j equals t, then there are the following derivation equations (13) to (15):

wherein, the formula (14) is obtained by mutually orthogonal sequences in the pseudo random sequence cluster and the modulus is 1; equation (15) is obtained because the two terms of absolute value in equation (14) are in the same sign.

Correspondingly, if j ≠ t, there are lower derivation equations (16) to (17):

in order for equation (11) to extract the embedded watermark t, R needs to be satisfied _t Is 2 ^m A pre-extraction result R _j Is the largest, so equation (18) can be derived:

the value of the embedding capacity α is actually the result of scaling the inequality (21), specifically see the derivation formulas (19) to (21):

considering that the audio will experience some attacks during transmission, and the distortion caused by these attacks is considered to be much smaller than the audio energy, adding a constant factor on the basis of the inequality (19) can further enhance the robustness of the watermark, i.e. the above equation (4) can be obtained.

The robustness of a watermarking technique is usually evaluated by using a Bit Error Rate (BER), which is a ratio of the number of bits in Error to the total number of bits, to compare an extracted watermark with an original embedded watermark, and the lower the BER, the better the robustness of the watermark. Original binary watermark sequence w ═ w ₀ ，w ₁ ，…，w _n-1 ]With the extracted watermark sequence

The bit error rate between (a) and (b) is calculated as follows:

wherein ,

for 30 cases of watermark-embedded audio obtained in the above embodiments, a plurality of signal processing attacks are respectively implemented, where the signal processing attacks include: MP3 format compression with amplitude reduced to 0.7 times and expanded to 1.3 times, bit rates of 128kbps and 96kbps, low pass filtering with a cut-off frequency of 8 khz, additive white gaussian noise with a signal-to-noise ratio of 15 db and 10 db, respectively, and 16-8 bit quantization, the watermark in the attacked watermarked audio is extracted using an embodiment of the audio watermark embedding method and compared with the original watermark to obtain an average bit error rate of 30 extracted watermarks, as shown in table 1.

Table 1 mean BER of 30 extracted watermarks under different signal processing attacks

For 30 cases of embedded watermark audio obtained in the above embodiment, two types of desynchronization attacks are implemented, including: resampling with a sampling frequency of 22.05 khz or 11.025 khz and preserving the middle section for 10 seconds and 20 seconds respectively to cut off the attack, extracting the watermark in the attacked watermarked audio by using the embodiment of the audio watermark-based embedding method, and comparing the watermark with the original watermark, the average bit error rate of 30 extracted watermarks is shown in table 2.

Table 2 mean BER of 30 extracted watermarks under different desynchronization attacks

As can be seen from the results shown in table 1, table 2 and fig. 4 to fig. 6, in the face of common signal processing attacks and desynchronization attacks, the embodiment of the present invention can basically extract the watermark with 100% accuracy, and can also ensure the audio quality (ODG > -1.0, MOS > 3.5). The invention can be proved to resist various signal processing attacks and desynchronization attacks, has stronger robustness and can ensure the concealment of the watermark in each audio.

In the above embodiment of the method for embedding an audio watermark, the length l of the binary watermark is adjusted _w The concealment, robustness and embedding capacity of the audio watermark embedding method were evaluated, and SWR, ODG, BER and embedding capacity (upper bound) of 30 cases of embedded audio are shown in table 3.

TABLE 3 different watermark lengths l _w Corresponding watermark embedding parameters

Watermark length l _w	SWR(dB)	ODG	BER(％)	Capacity (bps)
					80	30.98	-0.51	0	500
160	27.94	-0.57	0	500
					320	25.13	-0.61	0	500
640	22.03	-0.58	0	500
					800	21.07	-0.64	15.17	500
900	20.55	-0.62	51.23	500

From table 3, the watermark length l can be seen _w An increase in SWR will increase, but will not affect the ODG score. This is because SWR reflects global embedding effects, while ODG focuses more on local perceptual quality, which is also more logical in the human auditory system. When the watermark length is greater than 750, the bit error rate increases rapidly. This is thatSince this exceeds 1.5 times the upper bound of the watermark capacity, the (minimum) distance between adjacent feature points, l 66150, cannot accommodate so much watermark embedding.

It should be noted that, the 320-bit watermark embedding is selected by this embodiment because of the requirement of practical application, and is not a limitation to the present invention.

In the above embodiment of the method for embedding audio watermark, the length l of the audio frame is _s The change of (2) evaluates the concealment, robustness and embedding capacity of the audio watermark embedding method. The SWR, ODG, BER and embedding capacity (upper bound) for 30 cases of embedded audio are shown in table 4.

TABLE 4 different Audio frame lengths l _s Corresponding watermark embedding parameters

Audio frame length l _s	SWR(dB)	ODG	BER(％)	Capacity (bps)
					256	25.02	-0.54	0	500
512	24.94	-0.53	0	500
					1024	24.89	-0.55	0	500
2048	25.13	-0.61	0	500
					4096	25.10	-0.72	0	500
8192	24.99	-0.90	0	500

From Table 4, it can be seen that the frame length l is varied with the audio frame _s SWR, BER and embedding capacity are hardly affected, but ODG is decreased. Audio frame length l _s The global embedding effect is not changed by the change in (b), but locally the watermark of the same pattern (a pseudo-random sequence) will last longer, which changes the local perceptual quality.

It should be noted that, for convenience of implementation, the audio frame length is set to 2048 in this embodiment, which is not a limitation on the present invention.

Implementation of the above-described method for embedding audio watermarksIn the example, the watermark bits m or the length l of the block of DCT coefficients represented for each pseudorandom sequence _f (in fact, in this example there is l _f ＝2 ^m+1 ) The change of (2) evaluates the concealment, robustness and embedding capacity of the audio watermark embedding method. The SWR, ODG, BER and embedding capacity (upper bound) for 30 cases of embedded audio are shown in table 5.

TABLE 5 different coefficients block length l _f Corresponding watermark embedding parameters

From Table 5, it can be seen that the block length l is dependent on the DCT coefficient _f There is a deterioration in SWR, and as the sequence becomes longer, this deterioration is not alleviated. This is because when the sequence is shorter, the corresponding watermark group length m is smaller, the embedding strength obtained by equation (4) is smaller (in the distribution that maximizes the random number, the probability that the maximum value is larger is smaller as the number of samples is smaller), and as the sequence length continues to increase, the embedding strength gradually tends to saturate (i.e., does not become infinitely larger as the number of samples becomes larger). When the coefficient block length is small, the pseudo-random sequence is also shorter, and the independence between them is affected. The extraction method described in this embodiment is based on the correlation between the pseudorandom sequence cluster and the coefficient block, and the independence between them has a significant influence on the robustness. Since the present embodiment is designed to have a 50% redundancy in sequence length in order to ensure that a sufficient number of mutually orthogonal pseudo-random sequences can be generated, the longer the coefficient block, the lower the embedding capacity (upper bound).

It should be noted that, in combination with the experimental results of concealment, robustness and embedding capacity, the length of the coefficient block set in this embodiment is 32, which is not a limitation to the present invention.

The following describes the audio watermark embedding apparatus provided by the present invention, and the audio watermark embedding apparatus described below and the audio watermark embedding method described above may be referred to in correspondence with each other.

Fig. 8 is a schematic structural diagram of an audio watermark embedding apparatus provided by the present invention, and as shown in fig. 8, the audio watermark embedding apparatus 800 includes:

an extracting module 801, configured to extract a local significant feature point of a carrier audio in a time domain;

a determining module 802, configured to determine, according to the local significant feature point, a watermark embedding position of the carrier audio in a time domain;

and an embedding module 803, configured to embed the watermark information in the carrier audio based on the watermark embedding location, so as to obtain the carrier audio including the watermark information.

Optionally, the extracting module 801 is specifically configured to:

d(n)＝x _s (n+1)-x _s (n) (1)

calculating local extreme points of d (n) as candidate feature points;

extracting candidate feature points meeting a first condition to serve as local significant feature points;

wherein satisfying the first condition comprises at least one of:

the distance between the last sampling point of the carrier audio and the last sampling point of the carrier audio is larger than or equal to a first distance;

among other candidate feature points whose distances from the candidate feature points are smaller than the second distance, the candidate feature point is the candidate feature point with the largest contrast.

Optionally, the extracting module 801 is further specifically configured to:

calculating a contrast signal | d (n) | of d (n);

calculating a differential signal t (n) of | d (n) |;

and selecting local extreme points satisfying { i | t (i-1) >0, t (i) <0} in t (n) as candidate feature points.

Optionally, the apparatus 800 for embedding an audio watermark further includes: the device comprises a generation module and a modulation module.

The generating module is used for generating a pseudo-random sequence cluster according to the key sequence; wherein the pseudo-random sequence cluster comprises two pairwise orthogonal 2 ^m A pseudo-random sequence, m being a positive integer;

and the modulation module is used for modulating the binary watermark to be embedded into the spread spectrum watermark by utilizing the pseudo-random sequence cluster.

The embedding module 802 is specifically configured to embed the spread spectrum watermark as watermark information in a carrier audio based on the watermark embedding position, so as to obtain the carrier audio including the watermark information.

Optionally, the generating module is specifically configured to:

obtaining a key sequence

k _i ∈{-1，+1}，i＝0，…，l _f -1；

To k is paired ₀ Cyclic shift to obtain matrix

wherein ,

K＝F ^T H (2)

wherein matrix H is a non-zero row portion of the hermitian standard version of matrix KMinute, F ^T Is a full rank matrix consisting of c columns of the matrix K, c being the rank of the matrix K;

Wherein m is equal to

f _j Is a row matrix of matrix F.

Optionally, the embedded module 802 is further specifically configured to:

carrying out domain transformation on the audio signal corresponding to each watermark embedding position to obtain a transformation domain coefficient;

calculating the embedding strength corresponding to each transform domain coefficient block by utilizing the pseudo-random sequence cluster;

Optionally, the embedded module 802 is further specifically configured to:

α＝γmax|xp _k |，k＝0，1，…，2 ^m -1 (4)

In the embodiment of the invention, the extraction module extracts the local significant characteristic points of the carrier audio at the position with severe audio signal change, and the watermark embedding position determined based on the local significant characteristic points can keep the relative position unchanged when the desynchronization attack is faced, so that the audio segment embedded with the watermark can be accurately positioned, the watermark embedded in the audio segment can be accurately extracted, and the robustness of the watermark technology in the desynchronization attack situation is effectively improved.

Fig. 9 is a schematic physical structure diagram of an electronic device provided in the present invention, and as shown in fig. 9, the electronic device 900 may include: a processor (processor)910, a communication Interface (Communications Interface)920, a memory (memory)930, and a communication bus 940, wherein the processor 910, the communication Interface 920, and the memory 930 communicate with each other via the communication bus 940. Processor 910 may invoke logic instructions in memory 930 to perform a method of audio watermark embedding, the method comprising:

extracting local salient feature points of the carrier audio in a time domain;

determining the watermark embedding position of the carrier audio on a time domain according to the local significant feature points;

and embedding the watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.

Furthermore, the logic instructions in the memory 930 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the method for embedding an audio watermark provided by the above methods, the method including:

extracting local salient feature points of the carrier audio in a time domain;

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for embedding an audio watermark provided by the above methods, the method comprising:

extracting local salient feature points of the carrier audio in a time domain;

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on the understanding, the above technical solutions substantially or otherwise contributing to the prior art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An audio watermark embedding method, comprising:

extracting local salient feature points of the carrier audio in a time domain;

2. The method for embedding an audio watermark according to claim 1, wherein the extracting the locally significant feature points of the carrier audio in the time domain comprises:

According to x _s (n) use ofEquation (1) calculates the first order difference signal d (n):

d(n)＝x _s (n+1)-x _s (n) (1)

calculating local extreme points of d (n) as candidate feature points;

extracting the candidate feature points meeting a first condition as the local significant feature points;

wherein the meeting the first condition comprises at least one of:

3. The method for embedding an audio watermark according to claim 2, wherein the calculating local extreme points of d (n) as candidate feature points comprises:

calculating a contrast signal | d (n) | of d (n);

calculating a differential signal t (n) of | d (n) |;

4. The method for embedding an audio watermark according to any one of claims 1 to 3, wherein before the embedding watermark information in the carrier audio based on the watermark embedding position, the method further comprises:

generating a pseudo-random sequence cluster according to the key sequence;

wherein the pseudo-random sequence cluster comprises two pairwise orthogonal 2 ^m A pseudo random sequence, m is positive integerCounting;

5. The method for embedding an audio watermark according to claim 4, wherein the generating a pseudo-random sequence cluster according to a key sequence comprises:

obtaining the key sequence

k _i ∈{-1，+1}，i＝0，…，l _f -1；

To k is paired ₀ Cyclically shifting to obtain matrix

wherein ,

K＝F ^T H (2)

where matrix H is a non-zero row portion of the Hermite standard form of matrix K, F ^T Is a full rank matrix consisting of c columns of the matrix K, c being the rank of the matrix K;

Wherein m is equal to

f _j Is a row matrix of matrix F.

6. The method for embedding an audio watermark according to claim 4, wherein the embedding the spread spectrum watermark in the carrier audio as watermark information based on the watermark embedding position to obtain the carrier audio including the watermark information comprises:

calculating the embedding strength corresponding to each transform domain coefficient block by using the pseudo random sequence cluster;

7. The method of claim 6, wherein said calculating an embedding strength for each of the transform domain coefficient blocks using the pseudorandom sequence clusters comprises:

α＝γmax|xp _k |，k＝0，1，…，2 ^m -1 (4)

8. An apparatus for embedding an audio watermark, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of embedding an audio watermark according to any one of claims 1 to 7 when executing the program.

10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of embedding an audio watermark according to any one of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the audio watermark embedding method according to any one of claims 1 to 7.