CN115116453B - Audio watermark embedding method and device, electronic equipment and storage medium - Google Patents

Audio watermark embedding method and device, electronic equipment and storage medium

Info

Publication number
CN115116453B
CN115116453B (application CN202210605835.4A)
Authority
CN
China
Prior art keywords
watermark
audio
embedding
matrix
feature points
Prior art date
Legal status
Active
Application number
CN202210605835.4A
Other languages
Chinese (zh)
Other versions
CN115116453A (en)
Inventor
黄樱
张树武
刘杰
Current Assignee
Institute of Automation, Chinese Academy of Sciences
Original Assignee
Institute of Automation, Chinese Academy of Sciences
Priority date / Filing date / Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN202210605835.4A
Publication of CN115116453A
Application granted
Publication of CN115116453B

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/018 — Audio watermarking, i.e. embedding inaudible data in the audio signal

Abstract

The invention provides an audio watermark embedding method and device, an electronic device, and a storage medium. The method comprises: extracting local salient feature points of the carrier audio in the time domain; determining watermark embedding positions of the carrier audio in the time domain according to the local salient feature points; and embedding watermark information into the carrier audio based on the watermark embedding positions to obtain carrier audio comprising the watermark information. In embodiments of the invention, the extracted local salient feature points lie where the audio signal changes sharply, and the watermark embedding positions determined from them keep their relative positions unchanged under desynchronization attacks. The audio segments embedded with watermark information can therefore be located accurately, the embedded watermark information can be extracted correctly, and the robustness of the watermarking technique in desynchronization attack scenarios is effectively improved.

Description

Audio watermark embedding method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of audio processing technologies, and in particular, to an audio watermark embedding method, an audio watermark embedding device, an electronic device, and a storage medium.
Background
With the popularization of the mobile internet, more and more creators publish their works online, which facilitates the distribution of the works but also creates serious copyright risks. Audio watermarking provides a solution for protecting the copyright of digital audio: a digital mark identifying the copyright, known as a watermark, is imperceptibly embedded into the audio, and when a rights dispute occurs, the copyright can be established by extracting the watermark from the audio.
With regard to watermark embedding, a watermark may currently be embedded either into a complete piece of audio or into partial audio segments.
However, the method that embeds the watermark into a complete piece of audio cannot extract the watermark correctly in the face of desynchronization attacks, represented by cropping and resampling; in the method that embeds the watermark into partial audio segments, after the audio suffers a desynchronization attack, the watermarked audio segments cannot be located accurately, so the watermark embedded in them cannot be extracted correctly. The related watermarking techniques are therefore poorly robust in desynchronization attack scenarios.
Disclosure of Invention
The invention provides an audio watermark embedding method, an audio watermark embedding device, electronic equipment and a storage medium, which are used for solving the problem of low robustness of a related watermark technology in the face of desynchronization attack.
The invention provides an embedding method of an audio watermark, which comprises the following steps: extracting local salient feature points of the carrier audio on a time domain;
determining watermark embedding positions of the carrier audio on a time domain according to the local salient feature points;
and embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.
According to the method for embedding the audio watermark, provided by the invention, the local significant feature points of the carrier audio on the time domain are extracted, and the method comprises the following steps:
performing low-pass filtering on the audio signal of the carrier audio to obtain a filtered audio signal x_s(n);
according to x_s(n), calculating a first-order difference signal d(n) using formula (1):
d(n) = x_s(n+1) − x_s(n)    (1)
calculating local extremum points of d (n) as candidate feature points;
extracting the candidate feature points meeting a first condition as the local significant feature points;
wherein the meeting the first condition includes at least one of:
the distance from the candidate feature point to the last sampling point of the carrier audio is greater than or equal to a first distance;
the corresponding contrast |d(i)| is greater than a first threshold, i being a positive integer;
the corresponding steepness t(i−1) − t(i) is greater than a second threshold, where t(n) is the difference signal of the contrast signal |d(n)| of d(n), and i is a positive integer;
among the other candidate feature points whose distance to the candidate feature point is smaller than a second distance, the candidate feature point has the largest contrast.
According to the audio watermark embedding method provided by the invention, the local extreme point of d (n) is calculated and used as a candidate feature point, and the method comprises the following steps:
calculating the contrast signal |d(n)| of d(n);
calculating the difference signal t(n) of |d(n)|;
and selecting the local extremum points in t(n) that satisfy {i | t(i−1) > 0 and t(i) < 0} as the candidate feature points.
According to the method for embedding the audio watermark provided by the invention, before the watermark information is embedded in the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information, the method further comprises the following steps:
generating a pseudo-random sequence cluster according to the key sequence;
wherein the pseudo-random sequence cluster comprises 2^m pairwise-orthogonal pseudo-random sequences, m being a positive integer;
modulating the binary watermark to be embedded into a spread spectrum watermark by utilizing the pseudo random sequence cluster;
the embedding watermark information in the carrier audio based on the watermark embedding position to obtain carrier audio including the watermark information includes:
And based on the watermark embedding position, embedding the spread spectrum watermark into the carrier audio as watermark information to obtain the carrier audio comprising the watermark information.
According to the method for embedding the audio watermark, which is provided by the invention, the pseudo-random sequence cluster is generated according to the key sequence, and the method comprises the following steps:
acquiring the key sequence k_0 = (k_0(0), k_0(1), …, k_0(l_f − 1)), with k_0(i) ∈ {−1, +1}, i = 0, …, l_f − 1;
cyclically shifting k_0 to obtain the matrix K,
wherein the i-th row of K is k_0 cyclically shifted by i positions, i = 0, …, l_f − 1;
performing full-rank decomposition of the matrix K into a matrix F and a matrix H using formula (2):
K = F^T H    (2)
wherein the matrix H is the non-zero-row portion of the Hermite normal form of K, F^T is a full-column-rank matrix consisting of c columns of K, and c is the rank of K;
performing Gram-Schmidt orthogonalization on the first 2^m rows of the matrix F using formula (3) to obtain the pseudo-random sequence cluster {p_0, p_1, …, p_{2^m − 1}}:
p_j = f_j − Σ_{t=0}^{j−1} (⟨f_j, p_t⟩ / ⟨p_t, p_t⟩) · p_t,  j = 0, 1, …, 2^m − 1    (3)
wherein m is a positive integer not larger than log_2 c, and f_j is the j-th row of the matrix F.
According to the method for embedding the audio watermark provided by the invention, the spread spectrum watermark is embedded into the carrier audio as watermark information based on the watermark embedding position, so as to obtain the carrier audio comprising the watermark information, and the method comprises the following steps:
performing domain transformation on the audio signals corresponding to the watermark embedding positions to obtain transformation domain coefficients;
selecting transform-domain coefficients that are stable under attacks to form transform-domain coefficient blocks;
Calculating the embedding strength corresponding to each transformation domain coefficient block by utilizing the pseudo-random sequence cluster;
based on the embedding strength, embedding the spread spectrum watermark into a transform domain coefficient block corresponding to the embedding strength to obtain a transform domain coefficient block with watermark;
and carrying out inverse domain transformation on the transformation domain coefficient block with the watermark to obtain carrier audio comprising the watermark information.
According to the method for embedding the audio watermark provided by the invention, the embedding strength corresponding to each transform domain coefficient block is calculated by using the pseudo random sequence cluster, and the method comprises the following steps:
calculating the embedding strength α corresponding to each transform-domain coefficient block using the pseudo-random sequence cluster and formula (4):
α = γ · max_k |x · p_k|,  k = 0, 1, …, 2^m − 1    (4)
where γ is a constant greater than 1, the vector x denotes a transform-domain coefficient block, and the vector p_k denotes a sequence in the pseudo-random sequence cluster.
The invention also provides an embedding device of the audio watermark, which comprises:
the extraction module is used for extracting local significant feature points of the carrier audio frequency in a time domain;
the determining module is used for determining the watermark embedding position of the carrier audio on the time domain according to the local significant feature points;
and the embedding module is used for embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of embedding an audio watermark as described in any of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of embedding an audio watermark as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of embedding an audio watermark as described in any of the above.
According to the audio watermark embedding method and device, the electronic device, and the storage medium provided by the invention, the extracted local salient feature points of the carrier audio lie where the audio signal changes sharply, and the watermark embedding positions determined based on these feature points keep their relative positions unchanged under desynchronization attacks. The audio segments embedded with watermark information can therefore be located accurately, the embedded watermark information can be extracted correctly, and the robustness of the watermarking technique in desynchronization attack scenarios is effectively improved.
Drawings
In order to illustrate the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show some embodiments of the invention, and that a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of an audio watermark embedding method provided by the invention;
FIG. 2 is a second flowchart of an audio watermark embedding method according to the present invention;
fig. 3 is an exemplary schematic diagram of step 202 in the audio watermark embedding method provided in the present invention;
FIG. 4 shows the SWR of 30 audio cases under both fixed and adaptive embedding strengths;
FIG. 5 shows the ODG of 30 audio cases under both fixed and adaptive embedding strengths;
FIG. 6 shows the MOS of 30 audio cases under both fixed and adaptive embedding strengths;
FIG. 7 is a third flow chart of the audio watermark embedding method according to the present invention;
fig. 8 is a schematic structural diagram of an audio watermark embedding apparatus provided by the present invention;
Fig. 9 is a schematic diagram of the physical structure of the electronic device provided by the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiments of the present application provide a solution to the problem of low robustness of the related watermark technology in the face of a desynchronization attack scenario. In order to facilitate a clearer understanding of the embodiments of the present application, some related technical knowledge will be described first.
In recent years, piracy of network streaming content has been frequent, and online piracy and infringement have become a common concern of creators and media platforms; academia and industry have accordingly carried out research on technologies against copyright infringement. Audio watermarking provides a good solution for the copyright protection of digital audio: a digital mark identifying the copyright, called a watermark, is imperceptibly embedded into the audio; when a rights dispute occurs, extracting the watermark from the audio can establish its copyright. Audio watermarking is an effective technique for protecting digital audio copyright, but some problems remain to be solved, in both academic research and practical applications.
An audio watermarking method suitable for practical applications needs concealment, robustness, and a large embedding capacity. The watermark must be embedded into the audio work without affecting its perceived quality and use, i.e., the concealment requirement; even if the audio is distorted by various attacks, such as noise superposition, coding compression, and resampling, the watermark must still be extracted correctly, i.e., the robustness requirement; and there must be enough capacity to mark copyright information, such as the rights holder, the type of rights, and the start and end times of the rights, i.e., the large-embedding-capacity requirement.
However, there are contradictions between the three performance requirements: increased robustness is typically at the expense of concealment and embedding capacity; improving concealment also reduces embedding capacity or compromises robustness. An effective audio watermarking method should meet the requirements of three properties as simultaneously as possible.
A complete watermark embedding framework mainly involves three aspects: the selection of the embedding domain, the determination of the embedding position, and the embedding method together with the design of an extraction method matching it.
The embedded domain of the watermark can be divided into a time domain and a transform domain. The time domain refers to the audio signal, the transform domain refers to the coefficient space obtained by performing domain transform processing on the audio signal, and compared with the time domain technology, the transform domain watermark technology has better concealment and robustness. Common domain transformation methods are discrete cosine transform (Discrete Cosine Transformation, DCT), discrete wavelet transform (Discrete Wavelet Transformation, DWT), lifting wavelet transform (Lifting Wavelet Transformation, LWT), etc.
The determination of the watermark embedding position involves two dimensions. The first is the embedding position in the time domain: either a complete piece of audio is used for embedding the watermark, or only part of the audio is watermarked. The former cannot withstand desynchronization attacks represented by cropping and resampling, while with the latter the existing methods cannot accurately locate the watermarked audio segments after the audio is attacked, so the embedded watermark cannot be extracted correctly. The second is the embedding position in the transform domain, which depends on the transform-domain characteristics of digital audio and is usually determined by mining coefficient ranges that remain stable in the transform domain, so that the watermark stays sufficiently robust even when the audio suffers signal attacks.
The design of the embedding method is the core of a watermarking technique and is closely related to its performance. A common embedding method at present is the spread spectrum method, which first modulates each bit of the watermark into a pseudo-noise (PN) sequence and then superimposes the sequence onto the transform-domain coefficients of the carrier audio. During extraction, the correlation between the watermarked transform-domain coefficients and the PN sequence is computed to recover the watermark. The spread spectrum method has attracted attention for its simple extraction and its robustness to noise attacks; however, to ensure accurate extraction the PN sequence must be long, and each sequence can carry only one bit of watermark information, which severely limits the embedding capacity of audio watermarks.
The spread spectrum method usually uses a parameter, the embedding strength, to control how strongly the watermark is embedded. The embedding strength balances the performance of the watermarking technique: the greater the embedding strength, the more robust the watermark but the weaker its concealment, and vice versa. The setting of the embedding strength must therefore reconcile robustness and concealment. Existing spread spectrum methods use the same fixed strength when embedding the watermark into every piece of audio, ignoring the differences between audios, and cannot satisfy both robustness and concealment; some spread spectrum watermarking methods use different strengths for different audios but the same strength at different embedding positions within one piece of audio, which still compromises the concealment of the watermarking technique.
The following describes an audio watermark embedding method of the present invention with reference to the accompanying drawings. Fig. 1 is a schematic flow chart of an audio watermark embedding method provided by the present invention, as shown in fig. 1, the method includes steps 101 to 103; wherein:
and 101, extracting local salient feature points of the carrier audio in a time domain.
Step 102, determining the watermark embedding position of the carrier audio in the time domain according to the local salient feature points.
Step 103, based on the watermark embedding position, embedding watermark information into the carrier audio to obtain the carrier audio comprising the watermark information.
Specifically, the local salient feature points of the carrier audio to be watermarked are extracted in the time domain; these points lie where the audio signal changes sharply. The watermark embedding positions of the carrier audio in the time domain are determined based on the local salient feature points, and watermark information is embedded into the carrier audio at the determined positions, yielding carrier audio comprising the watermark information.
The watermark information may be a character string, a 0/1 string, a text, an image, or the like that can identify the copyright.
In the embodiment of the invention, the extracted local salient feature points of the carrier audio lie where the audio signal changes sharply, and the watermark embedding positions determined based on these feature points keep their relative positions unchanged under desynchronization attacks, so that the audio segments embedded with watermark information can be located accurately, the embedded watermark information can be extracted correctly, and the robustness of the watermarking technique in desynchronization attack scenarios is effectively improved.
Alternatively, the local salient feature points may be extracted by:
S1, performing low-pass filtering on the audio signal of the carrier audio to obtain a filtered audio signal x_s(n).
Specifically, the audio signal x(n) of the carrier audio may be low-pass filtered using formula (5) to obtain the filtered audio signal x_s(n):
x_s(n) = G(n) * x(n)    (5)
where G(n) denotes the low-pass filter and * denotes convolution.
G(n) may be, for example, a Gaussian filter.
S2, according to x_s(n), calculating the first-order difference signal d(n) using formula (1):
d(n) = x_s(n+1) − x_s(n)    (1)
S3, calculating the local extremum points of d(n) as candidate feature points.
Specifically, calculating the local extremum points of the first-order difference signal d(n) can be understood as calculating the local extremum points of the contrast signal |d(n)| of d(n).
Optionally, the step S3 may include the steps of:
S3-1, calculating the contrast signal |d(n)| of d(n);
S3-2, calculating the difference signal t(n) of |d(n)|;
specifically, the difference signal t(n) of the contrast signal |d(n)| may be calculated using formula (6):
t(n) = |d(n+1)| − |d(n)|    (6)
S3-3, selecting the local extremum points in t(n) satisfying {i | t(i−1) > 0 and t(i) < 0} as candidate feature points.
Specifically, the points satisfying {i | t(i−1) > 0, t(i) < 0} may be added to a candidate feature point set S_0, i.e., S_0 = {i | t(i−1) > 0, t(i) < 0}.
S4, extracting candidate feature points meeting the first condition to serve as local significant feature points;
wherein satisfying the first condition includes at least one of:
a. the distance from the candidate feature point to the last sampling point of the carrier audio is greater than or equal to a first distance;
the first distance may be set by a technician according to the actual situation.
b. the corresponding contrast |d(i)| is greater than a first threshold, i being a positive integer;
specifically, the first threshold may be set by a technician according to the actual situation; for example, it may be set to the median contrast of all candidate feature points.
c. the corresponding steepness t(i−1) − t(i) is greater than a second threshold, where t(n) is the difference signal of the contrast signal |d(n)| of d(n), and i is a positive integer;
specifically, the second threshold may be set by a technician according to the actual situation; for example, it may be set to the median steepness of all candidate feature points.
d. among the other candidate feature points whose distance to this candidate feature point is smaller than a second distance, this candidate feature point has the largest contrast.
The second distance may be set by a technician according to the actual situation.
It will be appreciated that step S4 selects, from the candidate feature point set S_0, the candidate feature points that are not at the tail end of the audio, have relatively large contrast, lie in steep peaks, or have dominant contrast among neighboring candidate feature points, and uses these candidate feature points, which are suitable for watermark embedding, as the local salient feature points.
Optionally, candidate feature points that simultaneously satisfy all four conditions (not at the audio tail end, relatively large contrast, in a steep peak, and dominant contrast among neighboring candidate feature points) may also be selected from the candidate feature point set S_0 as the local salient feature points, to further improve the robustness of the watermarking technique against desynchronization attacks.
Specifically, selecting the candidate feature points that simultaneously satisfy the four conditions as the local salient feature points may be achieved as follows:
a. filtering out of the set S_0 the candidate feature points whose distance to the last sampling point of the carrier audio is smaller than the first distance, to obtain a candidate feature point set S_1;
the first distance is, for example, a set distance l.
b. filtering out of the set S_0 the candidate feature points with low contrast, i.e., calculating the contrast |d(i)| of each candidate feature point and filtering out those whose contrast is smaller than the median contrast of all candidate feature points, to obtain a candidate feature point set S_2;
c. filtering out of the set S_0 the candidate feature points not in steep peaks, i.e., calculating the steepness t(i−1) − t(i) of each candidate feature point and filtering out those whose steepness is smaller than the median steepness of all candidate feature points, to obtain a candidate feature point set S_3;
d. filtering out of the set S_0 the candidate feature points whose contrast is not dominant among neighboring candidate feature points: if there exist two candidate feature points p, q ∈ S_0 with |p − q| < l (where l is the set distance, and may also be set to a second distance different from the first distance), and the corresponding contrasts satisfy |d(p)| < |d(q)|, then p is filtered out; this yields a candidate feature point set S_4.
Finally, the intersection of all the candidate feature point sets is taken as the feature point set S, i.e., S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
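Steps S1 to S4 above can be sketched end to end as follows. This is a minimal illustration rather than the patent's implementation: the Gaussian low-pass filter, the distance parameters, and the function name are assumptions chosen for the example, and the four conditions are applied as boolean masks over the candidate set S_0 (with medians computed over S_0), mirroring the intersection S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.

```python
import numpy as np

def salient_feature_points(x, sigma=4.0, min_tail_dist=2048, neighbor_dist=1024):
    """Sketch of steps S1-S4: local salient feature points of audio x."""
    # S1: Gaussian low-pass filtering, x_s = G * x (formula (5))
    radius = int(3 * sigma)
    n = np.arange(-radius, radius + 1)
    g = np.exp(-n**2 / (2 * sigma**2))
    g /= g.sum()
    xs = np.convolve(x, g, mode="same")
    # S2: first-order difference d(n) = x_s(n+1) - x_s(n) (formula (1))
    d = np.diff(xs)
    # S3: contrast |d(n)| and its difference t(n) (formula (6))
    c = np.abs(d)
    t = np.diff(c)
    i = np.arange(1, len(t))
    cand = i[(t[i - 1] > 0) & (t[i] < 0)]        # candidate set S_0
    contrast = c[cand]
    steep = t[cand - 1] - t[cand]
    # S4: keep candidates that pass all four conditions (S_1 .. S_4)
    m1 = (len(x) - cand) >= min_tail_dist        # a. away from the audio tail
    m2 = contrast >= np.median(contrast)         # b. contrast above the median
    m3 = steep >= np.median(steep)               # c. steepness above the median
    m4 = np.array([contrast[j] >= contrast[np.abs(cand - p) < neighbor_dist].max()
                   for j, p in enumerate(cand)]) # d. dominant among neighbors
    return cand[m1 & m2 & m3 & m4]
```

On a signal with a few sharp transitions, the function returns sample indices near the strongest transitions; points within `min_tail_dist` of the end are discarded so that a full embedding segment can follow each feature point.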
Optionally, before embedding watermark information in the carrier audio based on the watermark embedding position to obtain carrier audio comprising the watermark information, i.e., before step 103, the method further includes the following steps:
generating a pseudo-random sequence cluster according to a key sequence;
wherein the pseudo-random sequence cluster comprises 2^m pairwise-orthogonal pseudo-random sequences, m being a positive integer;
modulating a binary watermark to be embedded into a spread spectrum watermark by using a pseudo random sequence cluster;
step 103 specifically includes:
and based on the watermark embedding position, embedding the spread spectrum watermark as watermark information into the carrier audio to obtain the carrier audio comprising the watermark information.
Specifically, a pseudo-random sequence cluster is generated according to the key sequence, the cluster comprising 2^m pairwise-orthogonal pseudo-random sequences, where m is a positive integer. The larger the value of m, the larger the embedding capacity of the watermark but the weaker its concealment, and vice versa; to balance embedding capacity and concealment, m may be set to 4. The binary watermark to be embedded is then modulated into a spread spectrum watermark using the pseudo-random sequence cluster, and the resulting spread spectrum watermark is embedded into the carrier audio to obtain carrier audio comprising the watermark information.
In the embodiment of the invention, the binary watermark is modulated into a spread spectrum watermark using a pseudo-random sequence cluster comprising multiple pairwise-orthogonal pseudo-random sequences. Whereas in the related art one pseudo-random sequence corresponds to a single bit of the binary watermark, here each pseudo-random sequence corresponds to multiple watermark bits, so more watermark bits can be embedded in the carrier audio, effectively alleviating the problem of low embedding capacity in watermarking techniques.
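The capacity gain can be illustrated with a toy cluster. The sketch below is a stand-in under stated assumptions, not the patent's key-derived construction: it uses a Sylvester-Hadamard matrix to obtain 2^m pairwise-orthogonal ±1 sequences, maps each group of m watermark bits to the sequence whose index the group encodes, and recovers the group by maximum correlation. All function names are illustrative.

```python
import numpy as np

def hadamard_cluster(m):
    """2**m pairwise-orthogonal +/-1 sequences of length 2**m
    (Sylvester construction; a stand-in for the key-derived cluster)."""
    h = np.array([[1.0]])
    for _ in range(m):
        h = np.block([[h, h], [h, -h]])
    return h

def modulate(bits, cluster, m):
    """Each m-bit group of the binary watermark selects one sequence."""
    assert len(bits) % m == 0
    idx = [int("".join(str(b) for b in bits[i:i + m]), 2)
           for i in range(0, len(bits), m)]
    return [cluster[k] for k in idx]

def demodulate(segments, cluster, m):
    """Recover each m-bit group as the index of the best-correlated sequence."""
    bits = []
    for seg in segments:
        k = int(np.argmax(cluster @ seg))   # orthogonality isolates the index
        bits.extend(int(b) for b in format(k, "0%db" % m))
    return bits
```

With m = 2, each transmitted sequence carries two watermark bits instead of one, which is exactly the capacity advantage of the pairwise-orthogonal cluster.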
Optionally, generating a pseudo random sequence cluster according to the key sequence includes the following steps:
acquiring the key sequence k_0 = (k_0(0), k_0(1), …, k_0(l_f − 1)), with k_0(i) ∈ {−1, +1}, i = 0, …, l_f − 1;
cyclically shifting k_0 to obtain the matrix K,
wherein the i-th row of K is k_0 cyclically shifted by i positions, i = 0, …, l_f − 1;
performing full-rank decomposition of the matrix K into a matrix F and a matrix H using formula (2):
K = F^T H    (2)
wherein the matrix H is the non-zero-row portion of the Hermite normal form of K, F^T is a full-column-rank matrix consisting of c columns of K, and c is the rank of K;
performing Gram-Schmidt orthogonalization on the first 2^m rows of the matrix F using formula (3) to obtain the pseudo-random sequence cluster {p_0, p_1, …, p_{2^m − 1}}:
p_j = f_j − Σ_{t=0}^{j−1} (⟨f_j, p_t⟩ / ⟨p_t, p_t⟩) · p_t,  j = 0, 1, …, 2^m − 1    (3)
wherein f_j is the j-th row of the matrix F.
The above m is a positive integer not larger than log_2 c.
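A small numerical sketch of the construction above, under stated assumptions: the greedy column selection stands in for the Hermite-normal-form bookkeeping of formula (2) (only the full-column-rank factor F is needed to apply formula (3)), the key is a short ±1 toy sequence, and all names are illustrative.

```python
import numpy as np

def sequence_cluster(key, m):
    """Sketch of formulas (2)-(3): derive 2**m pairwise-orthogonal sequences
    from a +/-1 key sequence via cyclic shifts and Gram-Schmidt."""
    key = np.asarray(key, dtype=float)
    lf = len(key)
    # matrix K: row i is the key cyclically shifted by i positions
    K = np.stack([np.roll(key, i) for i in range(lf)])
    c = np.linalg.matrix_rank(K)
    assert 2**m <= c, "m must not exceed log2(rank(K))"
    # stand-in for formula (2): greedily collect c linearly independent
    # columns of K; stacked, they play the role of the first rows of F
    rows = []
    for j in range(lf):
        trial = rows + [K[:, j]]
        if np.linalg.matrix_rank(np.stack(trial)) == len(trial):
            rows = trial
        if len(rows) == c:
            break
    F = np.stack(rows)
    # formula (3): Gram-Schmidt on the first 2**m rows of F
    cluster = []
    for j in range(2**m):
        p = F[j].copy()
        for q in cluster:
            p -= (p @ q) / (q @ q) * q
        cluster.append(p)
    return np.stack(cluster)
```

The returned rows are pairwise orthogonal by construction, which is the property the spread-spectrum modulation and the strength formula (4) rely on.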
Optionally, based on the watermark embedding location, embedding the spread spectrum watermark as watermark information in the carrier audio, resulting in a carrier audio comprising watermark information, comprising the steps of:
performing domain transformation on the audio signals corresponding to the watermark embedding positions to obtain transformation domain coefficients;
selecting stable attack coefficients in a transform domain to form a transform domain coefficient block;
Specifically, a section of audio at a fixed distance after each local feature point can be extracted as the watermark embedding position in the time domain. Each audio segment is divided into several non-overlapping audio frames, each audio frame is subjected to a domain transform, such as the DCT, and the DCT intermediate-frequency coefficients are extracted and split into several equal-length coefficient blocks. For example, the components of the DCT coefficients whose corresponding physical frequency lies in the interval [f_l, f_h] are extracted and divided into several non-overlapping coefficient blocks of length l_f; if the length is not divisible, the remaining frequency portion is not processed. The interval [f_l, f_h] is the intermediate-frequency bandwidth, where the parameters f_l, f_h are the experimentally verified critical points of the frequency bandwidth in this embodiment. It should be noted that appropriate intermediate-frequency coefficients help to improve the robustness of the watermarking technique.
Calculating the embedding strength corresponding to each transform domain coefficient block by using the pseudo-random sequence cluster;
based on the embedding strength, embedding the spread spectrum watermark into a transform domain coefficient block corresponding to the embedding strength to obtain a transform domain coefficient block with watermark;
and carrying out inverse domain transformation on the transformation domain coefficient block with the watermark to obtain carrier audio comprising watermark information.
Optionally, calculating the embedding strength corresponding to each transform domain coefficient block by using the pseudo random sequence cluster includes:
calculating the embedding strength α corresponding to each transform domain coefficient block by using the pseudo-random sequence cluster and formula (4):
α = γ max |x p_k^T|, k = 0, 1, …, 2^m - 1 (4)
where γ is a constant greater than 1, the vector x characterizes each transform domain coefficient block, and the vectors p_k characterize the pseudo-random sequence cluster.
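The adaptive strength of formula (4) is a one-liner in practice. A minimal sketch, in which the value of γ and the array shapes are illustrative assumptions rather than values prescribed by this embodiment:

```python
import numpy as np

def embedding_strength(x, P, gamma=1.2):
    # Formula (4): alpha = gamma * max_k |x . p_k|, with gamma > 1.
    # P holds the pseudo-random sequences as rows; gamma = 1.2 is illustrative.
    return gamma * np.max(np.abs(P @ x))
```

With an orthonormal cluster P, this strength guarantees that the correlation contributed by the watermark dominates the carrier block's own correlation with every sequence.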
Compared with the prior art that the watermark is embedded by using fixed embedding strength, the embodiment of the invention can adaptively calculate the embedding strength corresponding to each transform domain coefficient block and embed the spread spectrum watermark into the corresponding transform domain coefficient block based on the embedding strength, thereby improving the robustness and concealment of the watermark technology.
Fig. 2 is a second flowchart of the audio watermark embedding method provided by the present invention, as shown in fig. 2, the method includes steps 201 to 206; wherein:
step 201, extracting local salient feature points of carrier audio on a time domain.
In this step, locally significant feature points on the time domain are extracted for the audio to be watermarked. The characteristic points are positioned at the position with severe change of the audio signal, and the relative positions of the characteristic points can be kept unchanged in the desynchronization attack, so that the robustness of the watermarking technology in the desynchronization attack situation is improved.
Specifically, step 201 may include steps 2011 through 2014.
Step 2011, performing low-pass filtering on the audio signal of the carrier audio to obtain a filtered audio signal x_s(n);
Specifically, formula (5) can be used to perform low-pass filtering on the audio signal x(n) of the carrier audio to obtain the filtered audio signal x_s(n):
x_s(n) = G(n) * x(n) (5)
where G(n) characterizes the low-pass filter and * represents the convolution operation.
As can be appreciated, the filtered audio signal x_s(n) represents a smooth audio signal after low-pass filtering.
Optionally, the above G(n) may be, for example, a Gaussian filter.
Step 2012, calculating the first-order differential signal of the filtered audio signal x_s(n) of the carrier audio.
The first-order differential signal is used to describe the variation of the audio signal; specifically, from x_s(n), the first-order differential signal d(n) is calculated using formula (1):
d(n) = x_s(n+1) - x_s(n) (1)
Step 2013, calculating the local extreme points of the first-order differential signal, and adding the local extreme points to the initial candidate feature point set S_0 of the carrier audio.
A local extreme point of the first-order differential signal is a local maximum of its contrast signal |d(n)|.
Specifically, step 2013 may include steps 2013-1 to 2013-3; wherein:
Step 2013-1, calculating the contrast signal |d(n)| of the first-order differential signal d(n);
Step 2013-2, calculating the differential signal t(n) of the contrast signal |d(n)|.
Specifically, the differential signal t(n) of the contrast signal |d(n)| can be calculated using formula (6):
t(n) = |d(n+1)| - |d(n)| (6)
Step 2013-3, adding the points {i | t(i-1) > 0, t(i) < 0} of the differential signal t(n) to the initial candidate feature point set S_0 of the carrier audio.
Step 2014, selecting from the initial candidate feature point set S_0, as the locally salient feature points, the candidate feature points that are not located at the end of the audio, have larger contrast, lie in steep peaks, and whose contrast is dominant among adjacent candidate feature points.
In this step, four types of candidate feature points unsuitable for the watermark embedding task are excluded in turn; if the distance between any two candidate feature points is smaller than a set distance l, only the feature point with larger contrast is retained, where the set distance l represents the minimum distance between adjacent feature points.
Specifically, step 2014 may include steps 2014-1 to 2014-5; wherein:
Step 2014-1, filtering out from set S_0 the candidate feature points whose distance from the last sampling point of the carrier audio is smaller than the first distance, to obtain the candidate feature point set S_1.
The first distance is, for example, the set distance l.
Step 2014-2, filtering out from set S_0 the candidate feature points of low contrast: calculating the contrast |d(i)| corresponding to each candidate feature point i, and filtering out the candidate feature points whose contrast is smaller than the median contrast of all candidate feature points, to obtain the candidate feature point set S_2.
Step 2014-3, filtering out from set S_0 the candidate feature points not located in steep peaks: calculating the steepness t(i-1) - t(i) of each candidate feature point i, and filtering out the candidate points whose steepness is smaller than the median steepness of all candidate feature points, to obtain the candidate feature point set S_3.
Step 2014-4, filtering out from set S_0 the candidate feature points whose contrast is not dominant among adjacent candidate feature points: if there exist two candidate feature points p, q ∈ S_0 with |p - q| < l, where l is the set distance (a second distance different from l may also be used), and the corresponding contrasts satisfy |d(p)| < |d(q)|, then p is filtered out, to obtain the candidate feature point set S_4.
Step 2014-5, taking the intersection of all candidate feature point sets as the feature point set S, i.e. S = S_1 ∩ S_2 ∩ S_3 ∩ S_4.
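The candidate extraction and the screening passes of steps 2011 to 2014 can be sketched as follows. This is a minimal, non-authoritative sketch: the Gaussian width, the set distance value, and all function and parameter names are our assumptions.

```python
import numpy as np

def salient_feature_points(x, sigma=4.0, l=800):
    """Sketch of steps 2011-2014; parameter values are illustrative."""
    # 2011: Gaussian low-pass filtering, x_s = G * x
    n = np.arange(-4 * sigma, 4 * sigma + 1)
    g = np.exp(-n**2 / (2 * sigma**2)); g /= g.sum()
    xs = np.convolve(x, g, mode="same")
    # 2012: first-order differential signal d(n)
    d = np.diff(xs)
    # 2013: candidates are local maxima of the contrast signal |d(n)|,
    # i.e. points i with t(i-1) > 0 and t(i) < 0 for t(n) = |d(n+1)| - |d(n)|
    t = np.diff(np.abs(d))
    cand = np.array([i for i in range(1, len(t)) if t[i - 1] > 0 and t[i] < 0])
    contrast = np.abs(d[cand])
    steep = t[cand - 1] - t[cand]
    # 2014-1: drop points too close to the end of the audio
    keep = cand < len(x) - l
    # 2014-2 / 2014-3: drop points below the median contrast / steepness
    keep &= contrast >= np.median(contrast)
    keep &= steep >= np.median(steep)
    pts, vals = cand[keep], contrast[keep]
    # 2014-4: among neighbours closer than l, keep the higher-contrast point
    out = []
    for p, v in sorted(zip(pts, vals)):
        if out and p - out[-1][0] < l:
            if v > out[-1][1]:
                out[-1] = (p, v)
        else:
            out.append((p, v))
    return [p for p, _ in out]
```

The surviving points are at least l samples apart, which is the property the embedding stage relies on.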
Step 202, determining a watermark embedding position on a proper time domain according to the local salient feature points, performing domain transformation on an audio signal corresponding to the watermark embedding position, and selecting a part of attack stable coefficients in a transformation domain to form a group of transformation domain coefficient blocks.
In the step, a section of audio after each characteristic point is extracted as a watermark embedding position on a time domain, non-overlapping framing is carried out on the audio section, domain transformation is carried out on each audio frame, and coefficients with stable attack in part in a transformation domain are selected to form a group of transformation domain coefficient blocks.
Optionally, the present invention may employ the DCT as the domain transform, because the DCT has a better energy compaction property: embedding the watermark in the DCT domain can improve the concealment of the watermark, while the computational complexity is lower than that of other transforms.
Optionally, intermediate-frequency coefficients of the DCT may be selected as the watermark embedding locations in the transform domain: since the human ear is more sensitive to the low-frequency components of audio, embedding the watermark in the low-frequency coefficients tends to affect the audible quality of the audio, while the high-frequency components are easily damaged by attacks such as low-pass filtering and resampling; the intermediate-frequency coefficients of the DCT are therefore relatively more stable.
It should be noted that, in the present embodiment, the watermark is embedded in the DCT intermediate frequency coefficient of the audio segment after the feature point, but the present invention is not limited to DCT and the DCT intermediate frequency coefficient, and the present invention can be applied to any transform domain coefficient.
Fig. 3 is an exemplary schematic diagram of step 202 in the audio watermark embedding method provided in the present invention.
Specifically, step 202 may include steps 2021 to 2024.
Step 2021, for each feature point, extracting a section of audio at a fixed distance after the feature point;
The fixed distance may be denoted d_m, and the length of an audio segment denoted l_c. Here the requirement d_m + l_c < l must be satisfied, where l may be the set distance in step 2014.
It should be noted that the inequality d_m + l_c < l indicates that the audio segment corresponding to each feature point should not contain adjacent feature points, so as to avoid the embedded watermark destroying the local features of the feature points. It should also be noted that the feature points are located in regions of strongly varying audio amplitude; embedding the watermark in the vicinity of such a region can enhance the concealment of the watermark through the masking effect, so the value of d_m should not be too large. The length l_c of an audio segment depends on the length of the binary watermark to be embedded, and is also affected by the frequency bandwidth of the watermark embedding.
Step 2022, dividing each audio segment into a number of non-overlapping frames;
This step aims at splitting the audio segment into shorter audio frames, to which the DCT is then applied. A suitable audio frame length l_s can enhance the concealment of the watermark and facilitate the butterfly operations in the DCT.
Step 2023, DCT transforming each audio frame.
Step 2024, extract the DCT intermediate frequency coefficients and split them into several equal length coefficient blocks.
Specifically, the components of the DCT coefficients whose corresponding physical frequency lies in the interval [f_l, f_h] are extracted and divided into several non-overlapping coefficient blocks of length l_f; if the length is not divisible, the remaining frequency portion is not processed. The interval [f_l, f_h] is the intermediate-frequency bandwidth, where the parameters f_l, f_h are the experimentally verified critical points of the frequency bandwidth in this embodiment. It should be noted that appropriate intermediate-frequency coefficients help to improve the robustness of the watermarking technique.
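The mid-band selection of step 2024 can be sketched as follows. The bin-to-frequency mapping assumed here, namely that DCT bin k of a length-N frame corresponds to a physical frequency of about k·f_s/(2N), is the usual DCT-II convention, and the treatment of the interval endpoints is our choice:

```python
import math

def midband_blocks(coeffs, fs, fl, fh, lf):
    """Select DCT coefficients with physical frequency in [fl, fh] and
    split them into non-overlapping blocks of length lf."""
    N = len(coeffs)
    k_lo = math.ceil(fl * 2 * N / fs)     # first bin at or above fl
    k_hi = math.floor(fh * 2 * N / fs)    # last bin at or below fh
    band = coeffs[k_lo:k_hi + 1]
    nblocks = len(band) // lf             # remainder frequencies unprocessed
    return [band[i * lf:(i + 1) * lf] for i in range(nblocks)]
```

With the embodiment's parameters (frames of 2048 samples at 44.1 kHz, f_l = f_s/16, f_h = f_s/8, l_f = 32), this reproduces the 8 blocks starting at DCT bin 256 described below.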
Step 203, generating a pseudo-random sequence cluster according to the key sequence;
In this step, a cluster of pseudo-random sequences is generated, which are pairwise orthogonal and of unit norm.
Specifically, step 203 may include steps 2031 to 2034.
Step 2031, obtaining a normalized key sequence;
The composition of the key sequence may differ in different situations; in this step, the key sequence is uniformly transformed into a canonical form, represented as k_0 = [k_0, k_1, …, k_{l_f-1}], k_i ∈ {-1, +1}, i = 0, …, l_f - 1, with length l_f. It can be seen that the key sequence is a PN (pseudo-noise) sequence.
Step 2032, circularly shifting the key sequence to obtain a matrix K;
Specifically, the values of k_0 are cyclically shifted to obtain a set of sequences k_1, k_2, …, k_{l_f-1}, where k_j denotes k_0 cyclically shifted by j positions.
The above sequences and the key sequence are combined into the matrix K = [k_0^T, k_1^T, …, k_{l_f-1}^T]^T, where the superscript T represents the transpose operation.
Step 2033, decomposing the full rank of the matrix K into a matrix F and a matrix H;
specifically, the full rank of the matrix K is decomposed into a matrix F and a matrix H by adopting the formula (2):
K=F T H (2)
wherein matrix H is the non-zero row portion of the Hermite normal form of matrix K, F^T is a full-rank matrix consisting of c columns of matrix K, and c is the rank of matrix K.
Step 2034, performing Gram-Schmidt orthogonalization on the matrix F to obtain the pseudo-random sequence cluster {p_0, p_1, …, p_{2^m-1}}.
Specifically, to facilitate the subsequent watermark modulation, Gram-Schmidt orthogonalization is performed on the first 2^m rows of the matrix F using formula (3), yielding the pseudo-random sequence cluster.
Here m is an integer not greater than log_2 c, and f_j is the j-th row of the matrix F. It can be seen that the p_j (j = 0, …, 2^m - 1) constitute the pseudo-random sequence cluster.
Step 204, adaptively calculating the embedding strength corresponding to each transform domain coefficient block by using the pseudo-random sequence cluster;
Specifically, using the pseudo-random sequence cluster, the embedding strength α corresponding to each transform domain coefficient block is calculated with formula (4):
α = γ max |x p_k^T|, k = 0, 1, …, 2^m - 1 (4)
where γ is a constant greater than 1, the vector x characterizes each transform domain coefficient block, and the vectors p_k characterize the pseudo-random sequence cluster.
Step 205, modulating a binary watermark to be embedded into a spread spectrum watermark by using a pseudo random sequence cluster, and superposing the spread spectrum watermark into a transform domain coefficient block according to the corresponding embedding strength;
In this step, the binary watermark is split into non-overlapping m-bit watermark groups; if the length is not an integer multiple of m, the split is completed by zero padding, and the length of the zero-padded binary watermark is denoted l_w. The m-bit watermark groups are then modulated into spread spectrum watermarks, with one m-bit watermark group embedded in one intermediate-frequency coefficient block.
Specifically, step 205 may include steps 2051 to 2053; wherein:
step 2051, splitting binary watermarks to obtain a plurality of m-bit watermark groups;
in particular, the binary watermark will be split into several m-bit watermark groups, which do not overlap each other.
The number of watermark sets is the same as the number of coefficient blocks obtained in step 2024. For example, if an audio segment is divided into a audio frames and each audio frame is DCT transformed to obtain b intermediate frequency coefficient blocks, a×b m-bit watermark sets should be used here.
Step 2052, modulating each watermark group into a pseudo-random sequence;
in this step, each binary m-bit watermark set is first converted into a decimal watermark set, and the decimal watermark set is mapped to a pseudo-random sequence in a pseudo-random sequence cluster.
The m-bit watermark group described above may be characterized as b = [b_0, b_1, b_2, …, b_{m-1}].
The specific binary-to-decimal conversion expression is the following formula (7):
t = Σ_{i=0}^{m-1} b_i 2^{m-1-i} (7)
The mapping relation between the decimal watermark group and the pseudo-random sequence cluster is then b → p_t, t ∈ {0, 1, …, 2^m - 1}, whereby a mapping relationship between the m-bit watermark groups and the pseudo-random sequences can be constructed.
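The splitting and mapping of steps 2051 and 2052 can be sketched as follows. The bit order assumed for the binary-to-decimal conversion (b[0] as the most significant bit) is our reading of formula (7), and the helper names are ours:

```python
import numpy as np

def group_to_index(b):
    # Formula (7): m-bit group -> decimal index t (b[0] taken as the MSB).
    t = 0
    for bit in b:
        t = (t << 1) | int(bit)
    return t

def modulate(watermark, m, P):
    """Split the binary watermark into non-overlapping m-bit groups
    (zero-padding if needed) and map each to a sequence of cluster P."""
    bits = list(watermark) + [0] * ((-len(watermark)) % m)
    groups = [bits[i:i + m] for i in range(0, len(bits), m)]
    return [P[group_to_index(g)] for g in groups]
```

Using the identity matrix as a stand-in cluster makes the mapping easy to inspect: group [1,0,1,1] selects row 11, group [0,0,0,1] selects row 1.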
Step 2053, embedding the spread spectrum watermark into the DCT intermediate frequency coefficient block to obtain the DCT intermediate frequency coefficient block with the watermark;
Specifically, the spread spectrum watermark is embedded into the DCT intermediate-frequency coefficient block using embedding formula (8), to obtain the watermarked DCT intermediate-frequency coefficient block:
y = x + α sign(x p_t^T) p_t (8)
wherein x is the DCT intermediate-frequency coefficient block, y is the watermarked intermediate-frequency coefficient block, and p_t is the pseudo-random sequence to which the m-bit watermark group is mapped.
And 206, replacing the original coefficient block with the transform domain coefficient block with the watermark, and obtaining the carrier audio containing the watermark information after the inverse domain transform.
This process may be the inverse of step 202.
Specifically, step 206 may include steps 2061 to 2064; wherein:
step 2061, replacing the original coefficient block with the DCT coefficient block with watermark, and forming complete DCT coefficient;
the process is the inverse of step 2024, replacing the corresponding original coefficient block x with the watermarked coefficient block y, and recombining into DCT coefficients.
Step 2062, performing inverse discrete cosine transform;
the step carries out inverse transformation to the recombined DCT coefficient to obtain a corresponding audio frame.
Step 2063, reorganizing all audio frames to obtain audio segments;
step 2064, replacing the corresponding audio segment with the watermark to obtain the watermarked audio y (n).
It should be noted that the audio segment corresponding to each feature point is embedded with a complete watermark; the binary watermark is thus embedded repeatedly in the carrier audio.
The following describes in detail an audio watermark embedding method according to an embodiment of the present invention.
In this embodiment, 30 pieces of binaural audio with a length of 30 seconds and a sampling frequency of 44.1 kHz are randomly selected as the carrier audio.
This embodiment employs a randomly generated 0-1 sequence of length 320 bits as the original watermark information, i.e. l_w = 320; a 320-bit 0-1 sequence can provide a fairly comprehensive description of copyright information.
Feature points of the carrier audio are extracted such that the minimum distance between two adjacent feature points is greater than 1.5 times the sampling frequency, taking the set distance l = 66150.
20480 samples starting 100 milliseconds after each feature point are extracted as an audio segment, i.e. l_c = 20480. Each audio segment is divided into 10 audio frames of length 2048, i.e. l_s = 2048.
DCT transformation is performed on each audio frame to obtain 2048 DCT coefficients, and the coefficients whose physical frequency lies in the interval [f_l, f_h] are selected as the intermediate-frequency coefficients, with f_l = f_s/16 ≈ 2756 Hz and f_h = f_s/8 ≈ 5512 Hz, where f_s is the sampling frequency. The 256 coefficients corresponding to the 256th to 511th positions of the DCT coefficients are used as the intermediate-frequency coefficients to be embedded, and these 256 intermediate-frequency coefficients are split into 8 coefficient blocks of length 32. Thus, each feature point corresponds to a total of 80 coefficient blocks.
A pseudo-random sequence cluster {p_0, p_1, …, p_15} is generated from a key sequence of length 32 according to step 203, with m = 4 set in this embodiment. Since the cyclic shifts may leave the matrix K rank-deficient, the sequence length may be set to l_f = 2^(m+1) = 32 to avoid an insufficient number of sequences in the pseudo-random sequence cluster.
For each coefficient block, the embedding strength α is calculated according to the above formula (4).
The 320-bit binary watermark is split into 80 4-bit watermark groups, and according to formula (7) and the mapping b → p_t, t ∈ {0, 1, …, 2^m - 1}, each 4-bit watermark group is mapped to the corresponding pseudo-random sequence p_t.
And then embedding the 80 m-bit watermark groups into the 80 coefficient blocks one by one according to the formula (8).
Carrier audio comprising watermark information is obtained according to step 206 described above.
It should be noted that, when extracting feature points from different carrier audios, the distance between adjacent feature points is often greater than the minimum distance set in the embodiment, so that the method of the present invention can only estimate the upper bound when calculating the embedding capacity. The upper bound of the embedding capacity of this embodiment is 500 bits per second, which is much higher than the similar approach.
In order to embody the improvement of the self-adaptive embedding strength strategy on the concealment of the watermark technology, in the embodiment, a set of comparison experiments for fixing the embedding strength are carried out.
A fixed embedding strength is set such that the average embedding distortion obtained after embedding the watermark into the 30 audio samples is essentially the same as that of the present invention; the fixed embedding strength is then 0.33, with a corresponding signal-to-watermark ratio (SWR) of 25.20 dB, while the SWR corresponding to the adaptive embedding strength used by the present invention is 25.13 dB. Objective quality assessment and subjective assessment are introduced to measure the concealment of the watermark.
Figs. 4 to 6 show the SWR, objective difference grade (ODG), and mean opinion score (MOS) of the 30 cases of audio under the two embedding strength settings described above.
Wherein, fig. 4 is SWR of 30 cases of audio in both fixed embedding strength and adaptive embedding strength; FIG. 5 is an ODG for 30 cases of audio with both fixed embedding strength and adaptive embedding strength; fig. 6 is a MOS of 30 cases of audio in both fixed embedding strength and adaptive embedding strength.
In fig. 4 to 6, the more upward the vertical axis is, the better the concealment of the watermark. It can be seen that with a fixed embedding strength, although SWR approximates the adaptive embedding strength, MOS and ODG perform worse, meaning that the adaptive embedding strength strategy can achieve better watermark concealment, thus ensuring that watermark embedding does not destroy the value of use of the carrier audio.
The SWR is calculated by the following formula (9):
SWR = 10 log_10 ( Σ_n x^2(n) / Σ_n (y(n) - x(n))^2 ) (9)
It can be seen that the SWR reflects the amount of change caused to the carrier audio by embedding the watermark information.
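Under this reading of formula (9), the SWR computation is a one-liner. A sketch, with x the original and y the watermarked signal:

```python
import numpy as np

def swr_db(x, y):
    # Formula (9): signal-to-watermark ratio in dB; the denominator is the
    # energy of the embedding distortion y - x.
    return 10 * np.log10(np.sum(x**2) / np.sum((y - x)**2))
```

A distortion whose energy is 1/100 of the signal energy gives 20 dB, which matches the scale of the SWR values reported in this embodiment.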
Fig. 7 is a third flow chart of an extraction method corresponding to the audio watermark embedding method provided by the present invention, as shown in fig. 7, the method includes steps 701 to 705; wherein:
Step 701, extracting local significant feature points of carrier audio on a time domain;
step 702, determining a watermark embedding position on a proper time domain according to local salient feature points, performing domain transformation on an audio signal corresponding to the watermark embedding position, and selecting coefficients with stable partial attack in a transformation domain to form a group of transformation domain coefficient blocks;
step 703, generating a pseudo-random sequence cluster according to the key sequence;
It should be noted that the specific implementation details of steps 701 to 703 are substantially the same as those of steps 201 to 203, and are not repeated here.
Step 704, extracting the embedded watermark according to the correlation between the pseudo random sequence cluster and the coefficient block.
In this step, let the coefficient block from which the watermark is to be extracted be y′. The coefficient block is pre-extracted with each sequence in the pseudo-random sequence cluster using formula (10):
R_j = |y′ p_j^T|, j = 0, 1, …, 2^m - 1 (10)
The largest item among the pre-extraction results is then selected as the extracted watermark using formula (11):
t̂ = arg max_j R_j (11)
Converting t̂ to binary yields the extracted m-bit watermark group.
Repeating the above steps for each coefficient block yields the complete binary watermark.
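The pre-extraction of formulas (10) and (11) can be exercised end to end with a toy embedding. This is a sketch under stated assumptions: the cluster is built by QR decomposition rather than the key-sequence construction of step 203, γ = 1.5 is illustrative, and the sign-aligned embedding rule used here is one reading of formula (8) that is consistent with the derivation in this section:

```python
import numpy as np

rng = np.random.default_rng(7)
m, lf = 4, 32

# Orthonormal stand-in for the pseudo-random cluster (rows orthonormal)
Q, _ = np.linalg.qr(rng.standard_normal((lf, lf)))
P = Q[:, : 2 ** m].T                    # shape (16, 32)

x = rng.standard_normal(lf)             # mid-frequency coefficient block
t = 11                                  # embedded 4-bit group, as a decimal

alpha = 1.5 * np.max(np.abs(P @ x))     # formula (4), gamma = 1.5
# Sign-aligned embedding, so that |y . p_t| = |x . p_t| + alpha
y = x + alpha * np.sign(x @ P[t]) * P[t]

# Formulas (10)-(11): correlate with every sequence, take the argmax
R = np.abs(P @ y)
t_hat = int(np.argmax(R))
```

Because alpha exceeds every |x · p_j|, the correlation at index t dominates all others and the embedded group is recovered exactly in the attack-free case.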
The present invention further analyzes formulas (8), (10), (11) and (4) to complete the derivation of the adaptive embedding strength. Let the embedded m-bit watermark group be represented as the decimal number t; in the attack-free case the block to be extracted is y′ = y, so the pre-extraction result of formula (10) becomes formula (12):
R_j = |y p_j^T| = |x p_j^T + α sign(x p_t^T) p_t p_j^T| (12)
If j = t, the following derivations (13) to (15) hold:
R_t = |x p_t^T + α sign(x p_t^T) p_t p_t^T| (13)
R_t = |x p_t^T + α sign(x p_t^T)| (14)
R_t = |x p_t^T| + α (15)
wherein formula (14) follows because the sequences in the pseudo-random sequence cluster are mutually orthogonal with unit norm, so that p_t p_t^T = 1; formula (15) then follows because the two terms inside the absolute value in formula (14) have the same sign.
Correspondingly, if j ≠ t, the following derivations (16) to (17) hold, since p_t p_j^T = 0:
R_j = |x p_j^T + α sign(x p_t^T) p_t p_j^T| (16)
R_j = |x p_j^T| (17)
In order for formula (11) to extract the embedded watermark t, R_t must be the maximum of the 2^m pre-extraction results R_j, which gives formula (18):
|x p_t^T| + α > |x p_j^T| for all j ≠ t (18)
The value of the embedding strength α is in fact the result of relaxing this inequality, specifically through derivations (19) to (21):
α > |x p_j^T| - |x p_t^T| for all j ≠ t (19)
α > max_k |x p_k^T| - |x p_t^T| (20)
α ≥ max_k |x p_k^T| (21)
Considering that the audio experiences attacks during transmission, and that the distortion caused by these attacks is much smaller than the audio energy, multiplying the right-hand side of inequality (21) by a constant factor γ > 1 further enhances the robustness of the watermark, which yields formula (4) above.
The robustness of watermarking techniques is usually evaluated with the bit error rate (BER), i.e. the ratio of the number of erroneous bits to the total number of bits obtained by comparing the extracted watermark with the originally embedded watermark; the lower the BER, the better the robustness of the watermark. The bit error rate between the original binary watermark sequence w = [w_0, w_1, …, w_{n-1}] and the extracted watermark sequence ŵ = [ŵ_0, ŵ_1, …, ŵ_{n-1}] is calculated by formula (22):
BER = (1/n) Σ_{i=0}^{n-1} (w_i ⊕ ŵ_i) (22)
wherein ⊕ denotes the exclusive-or operation.
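As a small self-check, formula (22) amounts to a normalized Hamming distance; a minimal sketch with an illustrative function name:

```python
def ber(w, w_hat):
    # Formula (22): fraction of positions where the extracted watermark
    # differs from the original (bitwise XOR, then average).
    assert len(w) == len(w_hat)
    return sum(a != b for a, b in zip(w, w_hat)) / len(w)
```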
for 30 cases of watermark-embedded audio obtained in the above embodiment, various signal processing attacks are respectively implemented, where the signal processing attacks include: the amplitude is reduced to 0.7 times and expanded to 1.3 times, MP3 format compression with bit rate of 128kbps and 96kbps respectively, low-pass filtering with cut-off frequency of 8 KHz, additive Gaussian white noise with signal-to-noise ratio of 15 dB and 10 dB respectively and quantization with 16 bits-8 bits are adopted, then the watermark in the attacked watermark audio is extracted by using the embodiment of the embedding method of the audio watermark, and compared with the original watermark, and the average bit error rate of 30 extracted watermarks is obtained, as shown in the table 1.
Table 1 average BER of 30 extracted watermarks under different signal processing attacks
Two types of desynchronization attacks are also applied to the 30 cases of watermarked audio obtained in the above embodiment, including: resampling at sampling frequencies of 22.05 kHz and 11.025 kHz, and cropping that retains only the middle 10-second or 20-second section. The watermark is then extracted from the attacked watermarked audio using the above embodiment of the audio watermark embedding method and compared with the original watermark, giving the average bit error rate of the 30 extracted watermarks shown in Table 2.
Table 2 average BER of 30 extracted watermarks under different desynchronisation attacks
From the results shown in Tables 1 and 2 and Figs. 4 to 6, it can be seen that in the face of common signal processing attacks and desynchronization attacks, the embodiment of the present invention can extract the watermark with 100% accuracy while still ensuring the audible quality of the audio (ODG > -1.0, MOS > 3.5). The present invention can thus resist various signal processing attacks and desynchronization attacks, has strong robustness, and ensures the concealment of the watermark in every case of audio.
In the embodiment of the audio watermark embedding method described above, the influence of the binary watermark length l_w on the concealment, robustness and embedding capacity of the method was evaluated, yielding the SWR, ODG, BER and embedding capacity (upper bound) of the 30 cases of embedded audio shown in Table 3.
Table 3. Watermark embedding parameters corresponding to different watermark lengths l_w
Watermark length l_w  SWR(dB)  ODG  BER(%)  Capacity (bps)
80 30.98 -0.51 0 500
160 27.94 -0.57 0 500
320 25.13 -0.61 0 500
640 22.03 -0.58 0 500
800 21.07 -0.64 15.17 500
900 20.55 -0.62 51.23 500
From Table 3 it can be seen that as the watermark length l_w increases, the SWR decreases, but the ODG scores are not affected. This is because the SWR reflects the global embedding effect, while the ODG focuses more on local perceived quality, which better matches the logic of the human auditory system. When the watermark length exceeds about 750 bits, the bit error rate increases rapidly, because such a length exceeds 1.5 times the upper bound of the watermark capacity: the minimum distance l = 66150 between adjacent feature points cannot accommodate so much watermark embedding.
It should be noted that, the 320-bit watermark embedding is selected in this embodiment because of practical application, and is not a limitation of the present invention.
In the embodiment of the audio watermark embedding method described above, the influence of the audio frame length l_s on the concealment, robustness and embedding capacity of the method was evaluated. The SWR, ODG, BER and embedding capacity (upper bound) of the 30 cases of embedded audio are shown in Table 4.
Table 4. Watermark embedding parameters corresponding to different audio frame lengths l_s
Audio frame length l_s  SWR(dB)  ODG  BER(%)  Capacity (bps)
256 25.02 -0.54 0 500
512 24.94 -0.53 0 500
1024 24.89 -0.55 0 500
2048 25.13 -0.61 0 500
4096 25.10 -0.72 0 500
8192 24.99 -0.90 0 500
From Table 4 it can be seen that as the audio frame length l_s increases, the SWR, BER and embedding capacity are hardly affected, but the ODG decreases. A change in the audio frame length l_s does not change the global embedding effect, but locally it causes a watermark of the same pattern (one pseudo-random sequence) to last longer, which changes the local perceived quality.
It should be noted that, for convenience of implementation, the audio frame length is set to 2048 in this embodiment, which is not a limitation of the present invention.
In the embodiment of the audio watermark embedding method described above, the influence of the number of watermark bits m represented by each pseudo-random sequence, or equivalently the length l_f of the DCT coefficient blocks (in this embodiment l_f = 2^(m+1)), on the concealment, robustness and embedding capacity of the method was evaluated. The SWR, ODG, BER and embedding capacity (upper bound) of the 30 cases of embedded audio are shown in Table 5.
Table 5. Watermark embedding parameters corresponding to different coefficient block lengths l_f
From Table 5 it can be seen that as the DCT coefficient block length l_f increases, the SWR decreases, with the decrease gradually leveling off as the sequences become longer. This is because when the sequences are shorter, the corresponding watermark group length m is smaller and the embedding strength obtained by formula (4) is smaller (when maximizing over fewer random samples, the expected maximum is smaller), while the embedding strength gradually saturates as the sequence length continues to increase (i.e. it does not grow without bound as the number of samples increases). When the coefficient block length is small, the pseudo-random sequences are also shorter, which affects the independence between them. The extraction method described in this embodiment is based on the correlation between the pseudo-random sequence cluster and the coefficient blocks, and the independence between the sequences has a significant impact on robustness. Since this embodiment is designed to guarantee that a sufficient number of mutually orthogonal pseudo-random sequences can be generated, the designed sequence length has a redundancy of 50%, so the longer the coefficient blocks, the lower the embedding capacity (upper bound).
It should be noted that, taking the experimental results on concealment, robustness and embedding capacity together, the coefficient block length is set to 32 in this embodiment; this is not a limitation of the present invention.
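The saturation argument above (the maximum over more random samples grows with the sample count, but ever more slowly) can be illustrated with a small simulation. A standard Gaussian is used here purely as an illustrative assumption; it is not the distribution of the correlation values in the patent:

```python
import numpy as np

def mean_max_abs(n, trials=10000, seed=0):
    """Average, over many trials, of the maximum of n i.i.d. |N(0,1)| draws."""
    rng = np.random.default_rng(seed)
    return np.abs(rng.normal(size=(trials, n))).max(axis=1).mean()

# The expected maximum grows with n, but with diminishing increments:
for n in (2, 8, 32, 128):
    print(n, round(mean_max_abs(n), 2))
```

The printed values increase with n while the step between successive values shrinks, matching the saturation behaviour described for the embedding strength.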
The audio watermark embedding apparatus provided by the present invention is described below; the apparatus described below and the audio watermark embedding method described above may be referred to in correspondence with each other.
Fig. 8 is a schematic structural diagram of an audio watermark embedding apparatus provided by the present invention, and as shown in fig. 8, the audio watermark embedding apparatus 800 includes:
an extracting module 801, configured to extract local salient feature points of the carrier audio in a time domain;
a determining module 802, configured to determine a watermark embedding position of the carrier audio in a time domain according to the local salient feature points;
an embedding module 803 is configured to embed watermark information in carrier audio based on the watermark embedding location, to obtain carrier audio including watermark information.
Optionally, the extracting module 801 is specifically configured to:
low-pass filtering the audio signal of the carrier audio to obtain a filtered audio signal x_s(n);
calculating a first-order difference signal d(n) from x_s(n) using formula (1):
d(n) = x_s(n+1) − x_s(n)   (1)
calculating local extremum points of d (n) as candidate feature points;
Extracting candidate feature points meeting the first condition as local significant feature points;
wherein satisfying the first condition includes at least one of:
the distance from the candidate feature point to the last sampling point of the carrier audio is greater than or equal to a first distance;
the corresponding contrast |d(i)| is larger than a first threshold, i being a positive integer;
the corresponding steepness t(i−1) − t(i) is larger than a second threshold, where t(n) is the difference signal of the contrast signal |d(n)| of d(n), and i is a positive integer;
among all other candidate feature points whose distance to the candidate feature point is smaller than a second distance, the candidate feature point has the largest contrast.
Optionally, the extracting module 801 is further specifically configured to:
calculating the contrast signal |d(n)| of d(n);
calculating the difference signal t(n) of |d(n)|;
and selecting local extremum points in t(n) satisfying {i | t(i−1) > 0 and t(i) < 0} as candidate feature points.
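The candidate-feature-point steps above (low-pass filtering, first-order difference, local maxima of the contrast signal, contrast thresholding) can be sketched as follows. The filter order, cutoff frequency and threshold values here are illustrative assumptions; the patent does not fix them:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def salient_feature_points(x, fs, cutoff=1000.0, contrast_thr=0.01):
    """Sketch: candidate feature points of an audio signal x sampled at fs.
    Finds local maxima of |d(n)| via the sign change t(i-1) > 0, t(i) < 0,
    then keeps points whose contrast exceeds a threshold.
    (cutoff and contrast_thr are illustrative assumptions.)"""
    b, a = butter(4, cutoff / (fs / 2))          # low-pass filter -> x_s(n)
    xs = filtfilt(b, a, x)
    d = np.diff(xs)                              # formula (1): d(n) = x_s(n+1) - x_s(n)
    t = np.diff(np.abs(d))                       # t(n): difference of contrast |d(n)|
    cand = np.where((t[:-1] > 0) & (t[1:] < 0))[0] + 1   # local maxima of |d(n)|
    return cand[np.abs(d[cand]) > contrast_thr]  # contrast condition
```

For a signal with a sharp onset, the returned indices cluster around the onset position, which is exactly the "position where the audio signal changes sharply" used for synchronization.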
Optionally, the audio watermark embedding apparatus 800 further includes: a generation module and a modulation module.
The generation module is used for generating a pseudo-random sequence cluster according to a key sequence; wherein the pseudo-random sequence cluster comprises 2^m mutually orthogonal pseudo-random sequences, m being a positive integer;
and the modulation module is used for modulating the binary watermark to be embedded into a spread spectrum watermark by utilizing the pseudo random sequence cluster.
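The modulation step can be sketched as follows: each group of m binary watermark bits selects one of the 2^m orthogonal sequences in the cluster. The big-endian bit-to-index mapping is an assumption for illustration; the patent does not specify the exact mapping:

```python
import numpy as np

def modulate(bits, P):
    """Sketch: spread-spectrum modulation. Each group of m watermark bits
    selects one of the 2**m orthogonal sequences (rows of P).
    (The bit-group-to-sequence-index mapping is an assumption.)"""
    m = int(np.log2(len(P)))
    chunks = [bits[i:i + m] for i in range(0, len(bits), m)]
    idx = [int("".join(map(str, c)), 2) for c in chunks]  # bit group -> row index
    return np.concatenate([P[k] for k in idx])
```

With an identity matrix standing in for the cluster, `modulate([1, 0, 0, 1], np.eye(4))` selects rows 2 and 1 in turn, which makes the grouping easy to verify by eye.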
The embedding module 803 is specifically configured to embed the spread spectrum watermark as watermark information in the carrier audio based on the watermark embedding location, so as to obtain the carrier audio including the watermark information.
Optionally, the generating module is specifically configured to:
acquiring a key sequence k_0 of length l_f, with k_i ∈ {−1, +1}, i = 0, …, l_f − 1;
cyclically shifting k_0 to obtain a matrix K whose rows are the cyclic shifts of k_0;
performing full-rank decomposition of the matrix K into a matrix F and a matrix H using formula (2):
K = F^T H   (2)
where the matrix H is the non-zero row portion of the Hermite normal form of the matrix K, F^T is a full-rank matrix consisting of c columns of the matrix K, and c is the rank of the matrix K;
applying Gram-Schmidt orthogonalization, formula (3), to the first 2^m rows of the matrix F to obtain the pseudo-random sequence cluster {p_k}:
p_0 = f_0,  p_k = f_k − Σ_{j=0}^{k−1} (⟨f_k, p_j⟩ / ⟨p_j, p_j⟩) p_j,  k = 1, …, 2^m − 1   (3)
where m satisfies 2^m ≤ c, and f_j is a row vector of the matrix F.
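The cluster-generation steps can be sketched as follows. For simplicity this sketch orthogonalizes rows of the circulant matrix K directly rather than performing the full-rank decomposition K = F^T H first, so it is an illustrative simplification of the patented procedure, not a faithful implementation:

```python
import numpy as np

def gram_schmidt_rows(F, num):
    """Gram-Schmidt orthogonalization of the first `num` rows of F (formula (3))."""
    P = []
    for j in range(num):
        v = F[j].astype(float).copy()
        for p in P:
            v -= (v @ p) / (p @ p) * p   # subtract projection onto earlier vectors
        P.append(v)
    return np.vstack(P)

def prng_cluster(key, m):
    """Sketch: build the circulant matrix K from the ±1 key sequence k_0,
    then orthogonalize its first 2**m rows into a sequence cluster.
    (The patent first decomposes K = F^T * H and orthogonalizes rows of F;
    working on K directly assumes those rows are linearly independent.)"""
    K = np.vstack([np.roll(key, i) for i in range(len(key))])  # cyclic shifts of k_0
    return gram_schmidt_rows(K, 2 ** m)
```

A key with a single −1 entry gives a provably full-rank circulant matrix, so the resulting 2^m sequences are pairwise orthogonal by construction.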
Optionally, the embedding module 803 is further specifically configured to:
performing domain transformation on the audio signals corresponding to the watermark embedding positions to obtain transformation domain coefficients;
selecting coefficients in the transform domain that are stable under attacks to form transform domain coefficient blocks;
calculating the embedding strength corresponding to each transform domain coefficient block by using the pseudo-random sequence cluster;
based on the embedding strength, embedding the spread spectrum watermark into a transform domain coefficient block corresponding to the embedding strength to obtain a transform domain coefficient block with watermark;
and carrying out inverse domain transformation on the transformation domain coefficient block with the watermark to obtain carrier audio comprising watermark information.
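The transform, additive embedding and inverse-transform steps can be sketched for a single frame as follows. The fixed coefficient range `start` is an illustrative assumption; the patent instead selects coefficients that are stable under attacks:

```python
import numpy as np
from scipy.fft import dct, idct

def embed_frame(frame, spread_wm, alpha, start=64):
    """Sketch: DCT the frame, add the spread-spectrum watermark to a block of
    coefficients with embedding strength alpha, then inverse-DCT.
    (`start` is an illustrative assumption, not the patent's stability-based
    coefficient selection.)"""
    C = dct(frame, norm='ortho')                          # domain transform
    C[start:start + len(spread_wm)] += alpha * spread_wm  # additive embedding
    return idct(C, norm='ortho')                          # inverse domain transform
```

Because the orthonormal DCT is exactly invertible, the DCT of the output frame differs from that of the input only on the watermarked coefficient block, by exactly alpha times the spread-spectrum watermark.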
Optionally, the embedding module 803 is further specifically configured to:
calculating the embedding strength α corresponding to each transform domain coefficient block using the pseudo-random sequence cluster and formula (4):
α = γ max |x · p_k|,  k = 0, 1, …, 2^m − 1   (4)
where γ is a constant greater than 1, the vector x characterizes each transform domain coefficient block, and the vectors p_k characterize the pseudo-random sequence cluster.
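Formula (4) is a one-liner in code. The default γ = 1.5 is an illustrative choice satisfying the γ > 1 requirement; the patent only constrains γ to be greater than 1:

```python
import numpy as np

def embedding_strength(x, P, gamma=1.5):
    """Formula (4) sketch: alpha = gamma * max_k |x · p_k|, with gamma > 1.
    x is a transform-domain coefficient block, rows of P are the cluster.
    (gamma = 1.5 is an illustrative default.)"""
    return gamma * np.max(np.abs(P @ x))
```

Choosing α above the largest existing correlation with any cluster sequence ensures the embedded sequence dominates at extraction time.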
In the embodiment of the present invention, the extraction module extracts local salient feature points of the carrier audio at positions where the audio signal changes sharply. The watermark embedding positions determined from these local salient feature points keep their relative positions unchanged under desynchronization attacks, so the watermarked audio segments can be accurately located and the watermarks embedded in them accurately extracted, effectively improving the robustness of the watermarking technique against desynchronization attacks.
Fig. 9 is a schematic diagram of the physical structure of an electronic device provided by the present invention. As shown in fig. 9, the electronic device 900 may include: a processor 910, a communication interface (Communications Interface) 920, a memory 930 and a communication bus 940, wherein the processor 910, the communication interface 920 and the memory 930 communicate with each other via the communication bus 940. The processor 910 may invoke logic instructions in the memory 930 to perform a method of embedding an audio watermark, the method comprising:
Extracting local salient feature points of the carrier audio on a time domain;
according to the local salient feature points, determining watermark embedding positions of carrier audio on a time domain;
and embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.
Further, the logic instructions in the memory 930 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing a method of embedding an audio watermark provided by the methods described above, the method comprising:
extracting local salient feature points of the carrier audio on a time domain;
according to the local salient feature points, determining watermark embedding positions of carrier audio on a time domain;
and embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a method of embedding an audio watermark provided by the above methods, the method comprising:
extracting local salient feature points of the carrier audio on a time domain;
according to the local salient feature points, determining watermark embedding positions of carrier audio on a time domain;
and embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information.
The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method of embedding an audio watermark, comprising:
extracting local salient feature points of the carrier audio on a time domain;
determining watermark embedding positions of the carrier audio on a time domain according to the local salient feature points;
embedding watermark information into the carrier audio based on the watermark embedding position to obtain carrier audio comprising the watermark information;
wherein before the watermark information is embedded in the carrier audio based on the watermark embedding location to obtain carrier audio including the watermark information, the method further comprises:
generating a pseudo-random sequence cluster according to the key sequence;
Wherein the pseudo-random sequence cluster comprises 2^m mutually orthogonal pseudo-random sequences, m being a positive integer;
modulating the binary watermark to be embedded into a spread spectrum watermark by utilizing the pseudo random sequence cluster;
the embedding watermark information in the carrier audio based on the watermark embedding position to obtain carrier audio including the watermark information includes:
based on the watermark embedding position, embedding the spread spectrum watermark as watermark information in the carrier audio to obtain carrier audio comprising the watermark information;
the generating a pseudo-random sequence cluster according to the key sequence comprises the following steps:
acquiring the key sequence k_0 of length l_f, with k_i ∈ {−1, +1}, i = 0, …, l_f − 1;
cyclically shifting k_0 to obtain a matrix K whose rows are the cyclic shifts of k_0;
performing full-rank decomposition of the matrix K into a matrix F and a matrix H using formula (2):
K = F^T H   (2)
where the matrix H is the non-zero row portion of the Hermite normal form of the matrix K, F^T is a full-rank matrix consisting of c columns of the matrix K, and c is the rank of the matrix K;
applying Gram-Schmidt orthogonalization, formula (3), to the first 2^m rows of the matrix F to obtain the pseudo-random sequence cluster {p_k}:
p_0 = f_0,  p_k = f_k − Σ_{j=0}^{k−1} (⟨f_k, p_j⟩ / ⟨p_j, p_j⟩) p_j,  k = 1, …, 2^m − 1   (3)
where m satisfies 2^m ≤ c, and f_j is a row vector of the matrix F.
2. The method of embedding an audio watermark according to claim 1, wherein said extracting locally significant feature points of the carrier audio in the time domain comprises:
low-pass filtering the audio signal of the carrier audio to obtain a filtered audio signal x_s(n);
calculating a first-order difference signal d(n) from x_s(n) using formula (1):
d(n) = x_s(n+1) − x_s(n)   (1)
calculating local extremum points of d (n) as candidate feature points;
extracting the candidate feature points meeting a first condition as the local significant feature points;
wherein the meeting the first condition includes at least one of:
the distance from the candidate feature point to the last sampling point of the carrier audio is greater than or equal to a first distance;
the corresponding contrast |d(i)| is larger than a first threshold, i being a positive integer;
the corresponding steepness t(i−1) − t(i) is larger than a second threshold, where t(n) is the difference signal of the contrast signal |d(n)| of d(n), and i is a positive integer;
among all other candidate feature points whose distance to the candidate feature point is smaller than a second distance, the candidate feature point has the largest contrast.
3. The method of embedding an audio watermark according to claim 2, wherein said calculating a local extremum point of d (n) as a candidate feature point comprises:
calculating the contrast signal |d(n)| of d(n);
calculating the difference signal t(n) of |d(n)|;
and selecting local extremum points in t(n) satisfying {i | t(i−1) > 0 and t(i) < 0} as the candidate feature points.
4. The method according to claim 1, wherein the embedding the spread spectrum watermark as watermark information in the carrier audio based on the watermark embedding position, to obtain carrier audio including the watermark information, comprises:
performing domain transformation on the audio signals corresponding to the watermark embedding positions to obtain transformation domain coefficients;
selecting coefficients in the transform domain that are stable under attacks to form transform domain coefficient blocks;
calculating the embedding strength corresponding to each transformation domain coefficient block by utilizing the pseudo-random sequence cluster;
based on the embedding strength, embedding the spread spectrum watermark into a transform domain coefficient block corresponding to the embedding strength to obtain a transform domain coefficient block with watermark;
and carrying out inverse domain transformation on the transformation domain coefficient block with the watermark to obtain carrier audio comprising the watermark information.
5. The method according to claim 4, wherein calculating the embedding strength corresponding to each transform domain coefficient block using the pseudo random sequence cluster, comprises:
calculating the embedding strength α corresponding to each transform domain coefficient block using the pseudo-random sequence cluster and formula (4):
α = γ max |x · p_k|,  k = 0, 1, …, 2^m − 1   (4)
where γ is a constant greater than 1, the vector x represents each of the transform domain coefficient blocks, and the vectors p_k characterize the pseudo-random sequence cluster.
6. An audio watermark embedding apparatus, comprising:
the extraction module is used for extracting local significant feature points of the carrier audio frequency in a time domain;
the determining module is used for determining the watermark embedding position of the carrier audio on the time domain according to the local significant feature points;
the embedding module is used for embedding watermark information into the carrier audio based on the watermark embedding position to obtain the carrier audio comprising the watermark information;
wherein the apparatus further comprises:
a generation module for: generating a pseudo-random sequence cluster according to the key sequence; wherein the pseudo-random sequence cluster comprises 2^m mutually orthogonal pseudo-random sequences, m being a positive integer;
a modulation module for: modulating the binary watermark to be embedded into a spread spectrum watermark by utilizing the pseudo random sequence cluster;
the embedded module is specifically used for: based on the watermark embedding position, embedding the spread spectrum watermark as watermark information in the carrier audio to obtain carrier audio comprising the watermark information;
The generating module is specifically configured to:
acquiring the key sequence k_0 of length l_f, with k_i ∈ {−1, +1}, i = 0, …, l_f − 1;
cyclically shifting k_0 to obtain a matrix K whose rows are the cyclic shifts of k_0;
performing full-rank decomposition of the matrix K into a matrix F and a matrix H using formula (2):
K = F^T H   (2)
where the matrix H is the non-zero row portion of the Hermite normal form of the matrix K, F^T is a full-rank matrix consisting of c columns of the matrix K, and c is the rank of the matrix K;
applying Gram-Schmidt orthogonalization, formula (3), to the first 2^m rows of the matrix F to obtain the pseudo-random sequence cluster {p_k}:
p_0 = f_0,  p_k = f_k − Σ_{j=0}^{k−1} (⟨f_k, p_j⟩ / ⟨p_j, p_j⟩) p_j,  k = 1, …, 2^m − 1   (3)
where m satisfies 2^m ≤ c, and f_j is a row vector of the matrix F.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of embedding an audio watermark as claimed in any one of claims 1 to 5 when executing the program.
8. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the method of embedding an audio watermark as claimed in any one of claims 1 to 5.
CN202210605835.4A 2022-05-30 2022-05-30 Audio watermark embedding method and device, electronic equipment and storage medium Active CN115116453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210605835.4A CN115116453B (en) 2022-05-30 2022-05-30 Audio watermark embedding method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115116453A CN115116453A (en) 2022-09-27
CN115116453B (en) 2023-09-12

Family

ID=83327216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210605835.4A Active CN115116453B (en) 2022-05-30 2022-05-30 Audio watermark embedding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115116453B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101754A (en) * 2007-06-25 2008-01-09 中山大学 Steady audio-frequency water mark method based on Fourier discrete logarithmic coordinate transformation
CN111028850A (en) * 2018-10-10 2020-04-17 中国移动通信集团浙江有限公司 Audio watermark embedding method and audio watermark extracting method
CN114255767A (en) * 2020-09-11 2022-03-29 四川大学 Audio digital watermarking technology based on cross-media perception

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7376242B2 (en) * 2001-03-22 2008-05-20 Digimarc Corporation Quantization-based data embedding in mapped data
US9305559B2 (en) * 2012-10-15 2016-04-05 Digimarc Corporation Audio watermark encoding with reversing polarity and pairwise embedding


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Spread Spectrum Based Audio Watermarking Method with Embedding Positions Adaption Using Predominant Local Pulse Extraction; Huang Ying; 2021 International Conference on Culture-oriented Science & Technology (ICCST); full text *

Also Published As

Publication number Publication date
CN115116453A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
Kasmani et al. A new robust digital image watermarking technique based on joint DWT-DCT transformation
Ernawan et al. An improved watermarking technique for copyright protection based on tchebichef moments
JP4266677B2 (en) Digital watermark embedding method and encoding device and decoding device capable of using the method
Bhat K et al. A new audio watermarking scheme based on singular value decomposition and quantization
Maity et al. Perceptually adaptive spread transform image watermarking scheme using Hadamard transform
Khalili DCT-Arnold chaotic based watermarking using JPEG-YCbCr
KR100506661B1 (en) The method for embedding Watermarks on the lowest wavelet subband and the extracting method
Ernawan et al. Image watermarking based on integer wavelet transform-singular value decomposition with variance pixels
CN111028850B (en) Audio watermark embedding method and audio watermark extracting method
Khare et al. Digital image watermarking scheme in wavelet domain using chaotic encryption
JP2003143388A (en) Image processing method and apparatus, and printed matter
Gaata An efficient image watermarking approach based on Fourier transform
JP2008536380A (en) Quantization / watermarking method
CN115116453B (en) Audio watermark embedding method and device, electronic equipment and storage medium
Sharma et al. Robust technique for steganography on Red component using 3-DWT-DCT transform
Zhang et al. A novel look-up table design method for data hiding with reduced distortion
CN112488899B (en) Visual encryption color blind watermarking method based on repetition code and Schur decomposition
Bhat K et al. Audio watermarking based on quantization in wavelet domain
CN110047495B (en) High-capacity audio watermarking algorithm based on 2-level singular value decomposition
Phadikar et al. A new model of QIM data hiding for quality access control of digital image
KR100397752B1 (en) Watermarking method using block based on wavelet transform
Bahrushin et al. Robust Image Watermarking Technique Based on Genetic Algorithm Optimization and Even Odd Modulation.
Goswami et al. Coloured and Gray Scale Image Steganography using Block Level DWT DCT Transformation
Phadikar et al. Multibit QIM watermarking using M-ary modulation and lifting
Hossen et al. Hybrid Digital Watermarking Scheme Based on Discrete Wavelet Transform for Copyright Protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant