CN113362835B - Audio watermarking method, device, electronic equipment and storage medium - Google Patents

Audio watermarking method, device, electronic equipment and storage medium

Info

Publication number
CN113362835B
Authority
CN
China
Prior art keywords
embedded
audio
audio frame
watermark information
sequence
Prior art date
Legal status
Active
Application number
CN202010147458.5A
Other languages
Chinese (zh)
Other versions
CN113362835A (en)
Inventor
熊贝尔
朱一闻
曹偲
郑博
刘华平
Current Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202010147458.5A
Publication of CN113362835A
Application granted
Publication of CN113362835B
Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/018 — Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G10L 19/02 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L 19/0212 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, using spectral analysis, using orthogonal transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The application discloses an audio watermarking method and apparatus, an electronic device and a storage medium, which reduce the energy of the embedded watermark, ensure that the watermark can be extracted accurately, and achieve better robustness and concealment. The method comprises the following steps: obtaining a frequency domain signal corresponding to each audio frame in the audio; embedding watermark information in the imaginary component of the frequency domain signal corresponding to each audio frame; and performing time domain reconstruction on the spectrum signal embedded with the watermark information to obtain audio containing the watermark information.

Description

Audio watermarking method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio watermarking method, an audio watermarking device, an electronic device, and a storage medium.
Background
This section is intended to provide a background or context to the embodiments of the application that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
A conventional method for embedding a watermark in audio generally transforms the time-domain audio into the frequency domain through an FFT (Fast Fourier Transform), embeds the watermark information in the spectral amplitude or phase angle of the frequency-domain signal, and then performs an IFFT (Inverse Fast Fourier Transform) on the frequency-domain signal embedded with the watermark information to reconstruct the time-domain audio and obtain the watermarked audio. When extracting the watermark, the audio is again transformed into the frequency domain, from which the watermark information is extracted. Because existing audio watermark embedding methods typically embed the watermark in the spectral amplitude or phase angle, they affect the quality of the audio and are vulnerable to attacks that destroy the watermark. Existing audio watermarking methods therefore have poor robustness and concealment.
Disclosure of Invention
In view of the above technical problems, an improved method is needed that can reduce the energy of the embedded watermark and ensure accurate extraction of the watermark, so as to obtain better robustness and concealment.
In one aspect, an embodiment of the present application provides an audio watermarking method, including:
Obtaining frequency domain signals corresponding to each audio frame in the audio respectively;
watermark information is embedded in the imaginary component of the frequency domain signal corresponding to each audio frame;
and carrying out time domain reconstruction on the frequency spectrum signal embedded with the watermark information to obtain the audio containing the watermark information.
Optionally, determining, as the amplitude corresponding to each target frequency point, a product of the element value corresponding to each target frequency point and the masking threshold corresponding to each target frequency point, includes:
and determining the product of the element value corresponding to each target frequency point, the masking threshold corresponding to each target frequency point, and the intensity coefficient as the amplitude corresponding to each target frequency point.
Optionally, the threshold value corresponding to each audio frame is determined according to a maximum value of imaginary components of each frequency point in the frequency domain signal corresponding to the audio frame.
Optionally, the obtaining the frequency domain signals corresponding to each audio frame in the audio specifically includes:
and respectively performing a windowed fast Fourier transform on each audio frame in the audio to obtain the frequency domain signal corresponding to each audio frame, wherein the sine component in each frequency domain signal is the imaginary component.
In one aspect, an embodiment of the present application provides an audio watermarking method, including:
obtaining frequency domain signals corresponding to each audio frame in the audio embedded with watermark information;
And extracting the watermark information from the imaginary component of the frequency domain signal corresponding to each audio frame.
Optionally, the extracting the watermark information from the imaginary component of the frequency domain signal corresponding to each audio frame specifically includes:
extracting imaginary components of each frequency point from the frequency domain signals corresponding to each audio frame;
carrying out logarithmic operation processing on the imaginary components of each frequency point corresponding to each audio frame;
And extracting watermark information from the imaginary component subjected to logarithmic operation.
Optionally, the watermark information is a binary sequence, and if one value of the watermark information is embedded in each audio frame, extracting the watermark information from the imaginary component after the logarithmic operation specifically includes:
respectively calculating the inner product of the log-processed imaginary component corresponding to each audio frame and a specified sequence p to obtain a detection decision quantity corresponding to each audio frame, where the specified sequence p is the sequence used to spread the watermark information when it was embedded in the audio;
if the detection decision quantity is greater than 0, determining that the value embedded in the corresponding audio frame is 0, and otherwise determining that the value embedded in the corresponding audio frame is 1;
and concatenating, in the order of the audio frames in the audio, the values determined to be embedded in each audio frame to obtain the watermark information.
Optionally, the watermark information is a binary sequence, and if one value of the watermark information is embedded in two adjacent audio frames in the audio, extracting the watermark information from the imaginary component after the logarithmic operation specifically includes:
respectively calculating the inner product of the log-processed imaginary component corresponding to each audio frame and a specified sequence p to obtain a detection decision quantity corresponding to each audio frame, where the specified sequence p is the sequence used to spread the watermark information when it was embedded in the audio;
if the difference between the detection decision quantities corresponding to two adjacent audio frames embedded with the same value is greater than 0, determining that the value embedded in the two adjacent audio frames is 0, and otherwise determining that the value embedded in the two adjacent audio frames is 1;
and concatenating, in the order of each pair of adjacent audio frames in the audio, the values determined to be embedded in each pair of adjacent audio frames to obtain the watermark information.
In one aspect, an embodiment of the present application provides an audio watermarking apparatus, including:
The first conversion module is used for obtaining frequency domain signals corresponding to each audio frame in the audio;
The watermark embedding module is used for embedding watermark information in the imaginary component of the frequency domain signal corresponding to each audio frame;
And the time domain reconstruction module is used for performing time domain reconstruction on the frequency spectrum signal embedded with the watermark information to obtain the audio containing the watermark information.
Optionally, the watermark embedding module is specifically configured to embed the watermark information in any one of the audio frames by:
according to the sequence position of that audio frame among the audio frames, obtaining the information to be embedded at the corresponding sequence position in a sequence to be embedded, wherein the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to the watermark information;
Determining a frequency point of which the absolute value of the imaginary component in the frequency domain signal corresponding to any audio frame is smaller than a threshold value as a target frequency point;
And embedding information to be embedded corresponding to any audio frame in the imaginary component of each target frequency point.
Optionally, the watermark information is a binary sequence, and the watermark embedding module is further configured to determine the sequence to be embedded according to the watermark information by:
performing spread spectrum processing on the value of each sequence bit in the watermark information according to the specified sequence p to obtain the sequence to be embedded;
wherein the value 0 in the watermark information is spread to obtain the information to be embedded +p and the value 1 is spread to obtain the information to be embedded −p; or the value 0 in the watermark information is spread to obtain the two pieces of information to be embedded +p and −p, and the value 1 is spread to obtain the two pieces of information to be embedded −p and +p.
Optionally, the watermark embedding module is specifically configured to:
According to the sequence of each target frequency point in the frequency domain signal, determining the element value corresponding to each target frequency point in the information to be embedded corresponding to any audio frame;
And respectively adding corresponding amplitude values to the imaginary component of each target frequency point according to the element values corresponding to each target frequency point.
Optionally, the watermark embedding module is further configured to input each audio frame of the audio into a psychoacoustic model, to obtain a masking threshold corresponding to each target frequency point in the frequency domain signal corresponding to each audio frame;
The watermark embedding module is specifically configured to:
determining the product of the element value corresponding to each target frequency point and the masking threshold value corresponding to each target frequency point as the amplitude value corresponding to each target frequency point;
And adding corresponding amplitude values to the imaginary component of each target frequency point respectively.
Optionally, the watermark embedding module is specifically configured to: determine the product of the element value corresponding to each target frequency point, the masking threshold corresponding to each target frequency point, and the intensity coefficient as the amplitude corresponding to each target frequency point.
Optionally, the threshold value corresponding to each audio frame is determined according to a maximum value of imaginary components of each frequency point in the frequency domain signal corresponding to the audio frame.
Optionally, the first conversion module is specifically configured to: respectively perform a windowed fast Fourier transform on each audio frame in the audio to obtain the frequency domain signal corresponding to each audio frame, wherein the sine component in each frequency domain signal is the imaginary component.
In one aspect, an embodiment of the present application provides an audio watermarking apparatus, including:
The second conversion module is used for obtaining frequency domain signals corresponding to each audio frame in the audio containing watermark information;
and the watermark extraction module is used for extracting the watermark information from the imaginary component of the frequency domain signal corresponding to each audio frame.
Optionally, the watermark extraction module is specifically configured to:
extracting imaginary components of each frequency point from the frequency domain signals corresponding to each audio frame;
carrying out logarithmic operation processing on the imaginary components of each frequency point corresponding to each audio frame;
And extracting watermark information from the imaginary component subjected to logarithmic operation.
Optionally, the watermark information is a binary sequence, and if one value of the watermark information is embedded in each audio frame, the watermark extraction module is specifically configured to:
respectively calculate the inner product of the log-processed imaginary component corresponding to each audio frame and a specified sequence p to obtain a detection decision quantity corresponding to each audio frame, where the specified sequence p is the sequence used to spread the watermark information when it was embedded in the audio;
if the detection decision quantity is greater than 0, determine that the value embedded in the corresponding audio frame is 0, and otherwise determine that the value embedded in the corresponding audio frame is 1;
and concatenate, in the order of the audio frames in the audio, the values determined to be embedded in each audio frame to obtain the watermark information.
Optionally, the watermark information is a binary sequence, and if one value of the watermark information is embedded in two adjacent audio frames in the audio, the watermark extraction module is specifically configured to:
respectively calculate the inner product of the log-processed imaginary component corresponding to each audio frame and a specified sequence p to obtain a detection decision quantity corresponding to each audio frame, where the specified sequence p is the sequence used to spread the watermark information when it was embedded in the audio;
if the difference between the detection decision quantities corresponding to two adjacent audio frames embedded with the same value is greater than 0, determine that the value embedded in the two adjacent audio frames is 0, and otherwise determine that the value embedded in the two adjacent audio frames is 1;
and concatenate, in the order of each pair of adjacent audio frames in the audio, the values determined to be embedded in each pair of adjacent audio frames to obtain the watermark information.
In one aspect, an embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any of the methods described above.
In one aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, perform the steps of any of the methods described above.
In one aspect, an embodiment of the present application provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which when executed by a processor implement the steps of any of the methods described above.
According to the audio watermarking method and apparatus, electronic device and storage medium of the application, watermark information is embedded only in the imaginary component of the spectrum signal corresponding to the audio, and no watermark is embedded in the real component. Because the various kinds of processing applied to audio act on the real component of the audio spectrum and rarely on the imaginary component, the real component is prone to various digital attacks while the imaginary component is attacked little; embedding the watermark information in the imaginary component therefore effectively resists various digital attacks and allows the watermark information to be extracted accurately. In addition, because the imaginary component is generally much smaller than the real component, the total energy of the embedded watermark is very small, and a user can hardly perceive any difference in the audio before and after the watermark is embedded, which preserves the sound quality of the watermarked audio. The audio watermarking method provided by the application therefore preserves the sound quality of the watermarked audio, ensures accurate extraction of the watermark, and achieves better robustness and concealment.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 is a schematic diagram of an application scenario of an audio watermarking method according to an embodiment of the present application;
fig. 2 is a flowchart of an audio watermarking method according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of embedding watermark information in an audio frame according to an embodiment of the present application;
Fig. 4 is a flowchart of an audio watermarking method according to an embodiment of the present application;
fig. 5 is a schematic flow chart of extracting watermark information according to an embodiment of the present application;
fig. 6 is a schematic flow chart of extracting watermark information according to an embodiment of the present application;
fig. 7 is a schematic flow chart of extracting watermark information according to an embodiment of the present application;
FIG. 8 is a diagram of a watermark pattern embedded in audio and a watermark pattern extracted from audio used in a test procedure according to an embodiment of the present application;
FIG. 9 is a waveform diagram of the original audio used in the test procedure and the audio after embedding the watermark pattern shown in FIG. 8 according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an audio watermarking apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an audio watermarking apparatus according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The principles and spirit of the present application will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the application and are not intended to limit the scope of the application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the application may be implemented as a system, apparatus, device, method, or computer program product. Thus, the application may be embodied in the form of: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, it should be understood that any number of elements in the drawings is for illustration and not limitation, and that any naming is used only for distinction and not for any limitation.
For convenience of understanding, the terms involved in the embodiments of the present application are explained below:
Audio watermarking: an information hiding technique that embeds digital watermark information into audio with little impact on sound quality, so that it is not perceived by the human ear. Once the audio is illegally transmitted or copied, the watermark information can be extracted from the sample audio to identify the copyright owner and pursue the infringement.
Concealment (imperceptibility): also called imperceptibility; reflects the degree to which the human ear can perceive the change in sound quality produced by the watermark.
Robustness: reflects the ability to accurately extract the watermark information after the watermarked audio has been attacked during transmission and copying.
Information rate: the number of bits of watermark information that can be embedded in one second of audio.
Blind extraction: extracting the watermark information from the sample audio alone, without access to the original audio.
Transform domain: another domain to which the time domain signal is equivalently transformed; common transform domains are the discrete Fourier transform (DFT) domain (i.e. the frequency domain), the discrete cosine transform (DCT) domain and the discrete wavelet transform (DWT) domain.
Fast Fourier transform (FFT): a fast algorithm for the discrete Fourier transform (DFT); in engineering the FFT is generally used as the implementation of the DFT. The IFFT is the inverse transform of the FFT, i.e. it transforms a frequency domain signal back to the time domain.
Spread spectrum: a technique first used in wireless communication. In an audio watermarking system, a single bit is modulated onto a pseudo-random sequence (PN sequence); because the PN sequence has good autocorrelation properties, the watermark information can be extracted accurately even if the audio suffers various attacks during transmission. Spread spectrum watermarking is recognized in the industry as one of the most robust watermarking schemes.
Host signal interference: an inherent disturbance in spread-spectrum-based audio watermarking systems. Because the watermark information is extracted blindly using correlation detection and the correlation between the spread-spectrum watermark and the audio signal is unknown, the audio signal itself is also regarded as noise.
Masking threshold: a modification of the audio can be considered imperceptible to the human ear as long as the modification stays below the masking threshold. Masking thresholds are calculated from a psychoacoustic model.
Psychoacoustic model: a model originally used for perceptual audio coding. In a watermarking system, the masking threshold of each frequency point obtained from the model determines the upper limit of the energy that can be embedded at that frequency point, ensuring that the change in sound quality stays within a range undetectable by the human ear. Most audio watermarking algorithms make use of this model to some extent.
The principles and spirit of the present application are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
The inventor of the present application has found that existing audio watermark embedding methods typically embed a watermark in the spectral amplitude or phase angle of the audio spectrum, which affects the quality of the audio and is vulnerable to attack, resulting in the watermark being corrupted. Therefore, the existing audio watermarking method is poor in robustness and concealment.
In order to solve the above problems, the present application provides an audio watermarking method which embeds watermark information only in the imaginary component of the spectrum signal corresponding to the audio and embeds no watermark in the real component. The method specifically includes: obtaining the frequency domain signal corresponding to each audio frame in the audio; embedding watermark information in the imaginary component of the frequency domain signal corresponding to each audio frame; and performing time domain reconstruction on the spectrum signal embedded with the watermark information to obtain audio containing the watermark information. Because the various kinds of processing applied to audio act only on the real component of the audio spectrum and rarely on the imaginary component, the real component is prone to various digital attacks while the imaginary component is attacked little; embedding the watermark information in the imaginary component therefore effectively resists various digital attacks and allows the watermark information to be extracted accurately. In addition, because the imaginary component is generally much smaller than the real component, the total energy of the embedded watermark is very small, and a user can hardly perceive any difference in the audio before and after the watermark is embedded, which preserves the sound quality of the watermarked audio. Therefore, the audio watermarking method provided by the application preserves the sound quality of the watermarked audio, ensures accurate extraction of the watermark, and achieves better robustness and concealment.
Having described the basic principles of the present application, various non-limiting embodiments of the application are described in detail below.
Application scene overview
Referring to fig. 1, an application scenario of the audio watermarking method according to an embodiment of the present application is shown. The application scenario comprises a watermark embedding device 101, a server 102 and a watermark extraction device 103, where the watermark embedding device 101, the watermark extraction device 103 and the server 102 are connected through a wired or wireless communication network. The watermark embedding device 101 has the function of adding watermark information to audio, and the watermark extraction device 103 has the function of extracting watermark information from audio; the two may be independent devices or may be integrated in the same device, that is, a single device may have both the watermark adding and the watermark extracting functions. The watermark embedding device 101 and the watermark extraction device 103 may be separate physical devices, or may be application programs installed in a terminal device. The server 102 provides services such as listening to, downloading or uploading audio, and may be a single server, a server cluster formed by several servers, or a cloud computing center. A user embeds watermark information into audio through the watermark embedding device 101 and then sends the watermarked audio to the server 102 of a network platform to publish it. Other users may listen to and download the audio published on the network platform. When the copyright of the audio needs to be traced or verified, the user may extract the watermark information from the audio through the watermark extraction device 103 to prove ownership of the copyright.
Exemplary method
An audio watermarking method according to an exemplary embodiment of the present application is described below in connection with the application scenario of fig. 1. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principle of the present application, and the embodiments of the present application are not limited in any way. Rather, embodiments of the application may be applied to any scenario where applicable.
Referring to fig. 2, an audio watermarking method provided by an embodiment of the present application may be applied to the watermark embedding apparatus 101 shown in fig. 1, and may specifically include the following steps:
S201, obtaining frequency domain signals corresponding to each audio frame in the audio.
The length of one audio frame may be set according to practical application requirements, for example, the length of one audio frame may be 10 ms, 25 ms, etc.
In particular, in order to prevent spectral leakage, a windowed fast Fourier transform may be performed on each audio frame in the audio to obtain the frequency domain signal corresponding to each audio frame, where the sine component in each frequency domain signal is the imaginary component. The window size used in the fast Fourier transform may be determined according to the actual application scenario, which is not limited in the embodiment of the present application.
Assuming that the time domain signal corresponding to an audio frame is x, after the windowed fast Fourier transform the obtained frequency domain signal is X, and the transform formula is: X[n] = Σ_{k=0}^{N−1} h[k]·x[k]·cos(2πnk/N) − j·Σ_{k=0}^{N−1} h[k]·x[k]·sin(2πnk/N), n = 0, 1, …, N−1, where h[k] is the window function and X[n] is one frequency point in the frequency domain signal X (there are N frequency points in total); the cosine term is the real component of the frequency point X[n], and the sine term is the imaginary component of X[n].
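Purely as an illustration of this step, a minimal NumPy sketch of the framing and windowed FFT might look as follows; the frame length and the Hann window are assumptions made for the example, not values fixed by the application.

```python
import numpy as np

def frame_spectra(audio, frame_len=1024):
    """Split the audio into frames and return the windowed FFT of each frame.

    Assumed parameters: non-overlapping frames of frame_len samples and a Hann
    window; the application does not fix these values.
    """
    window = np.hanning(frame_len)
    n_frames = len(audio) // frame_len
    spectra = []
    for i in range(n_frames):
        frame = np.asarray(audio[i * frame_len:(i + 1) * frame_len], dtype=float)
        X = np.fft.fft(window * frame)   # complex frequency domain signal
        spectra.append(X)                # X.real: cosine (real) part, X.imag: sine (imaginary) part
    return spectra
```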
S202, watermark information is embedded in the imaginary part component of the frequency domain signal corresponding to each audio frame.
In implementation, the watermark information embedded in the audio is a binary sequence, for example 00111010. The value of each sequence bit in the watermark information is spread according to a specified sequence p to obtain the sequence to be embedded, where the specified sequence p is a binary sequence generated by a fixed key. A specific spreading method may be: the value "0" in the watermark information is spread into +p and the value "1" into −p. For example, spreading the watermark information "00111010" gives the sequence to be embedded {+p, +p, −p, −p, −p, +p, −p, +p}, where each +p or −p in the sequence to be embedded is the information to be embedded corresponding to one bit of the watermark information. Of course, the spreading method may also be: the value "0" in the watermark information is spread into −p and the value "1" into +p; the embodiment of the application does not limit this.
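A minimal sketch of this spreading step is shown below; the ±1 PN sequence, its length and the fixed key value are illustrative assumptions only.

```python
import numpy as np

def spread_watermark(bits, key=12345, pn_len=512):
    """Spread each watermark bit onto a key-generated +/-1 PN sequence p.

    Bit 0 -> +p, bit 1 -> -p, matching the first spreading rule above;
    key and pn_len are assumed values used only for illustration.
    """
    rng = np.random.default_rng(key)
    p = rng.choice([-1.0, 1.0], size=pn_len)          # specified sequence p from a fixed key
    to_embed = [p if b == 0 else -p for b in bits]    # sequence to be embedded
    return p, to_embed

# Example: the watermark 00111010 becomes {+p, +p, -p, -p, -p, +p, -p, +p}
p, to_embed = spread_watermark([0, 0, 1, 1, 1, 0, 1, 0])
```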
Then, for any one of the audio frames, the information to be embedded at the corresponding sequence position in the sequence to be embedded is obtained according to the sequence position of that audio frame among the audio frames, and the obtained information to be embedded is added to the imaginary component of the frequency domain signal corresponding to the audio frame, where the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to the watermark information. Specifically, the watermark information may be embedded in the imaginary component of the frequency domain signal by the formula Y.im = X.im ± p, where X.im is the imaginary component, Y is the frequency domain signal embedded with the watermark information, and Y.im is the imaginary component of the frequency domain signal embedded with the watermark information.
In implementation, assuming that the audio has M audio frames in total and the watermark information includes N bits, the correspondence between the audio frames and the pieces of information to be embedded in the sequence to be embedded may be: the 1st audio frame corresponds to the 1st information to be embedded, the 2nd audio frame corresponds to the 2nd information to be embedded, ..., and the N-th audio frame corresponds to the N-th information to be embedded; the (N+1)-th audio frame then corresponds again to the 1st information to be embedded, the (N+2)-th audio frame to the 2nd, ..., and the 2N-th audio frame to the N-th, and so on until the information corresponding to the last audio frame (i.e., the M-th audio frame) is determined. Of course, the watermark information may also be embedded in only zN audio frames, where z is at least 1 and zN is at most M.
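The cyclic correspondence described above amounts to simple modular indexing, sketched here for illustration:

```python
def info_index_for_frame(frame_index, n_bits):
    """Map the i-th audio frame (0-based) to the index of the piece of information
    to be embedded, repeating the N-bit watermark cyclically over the M frames."""
    return frame_index % n_bits
```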
Furthermore, in order to improve the confidentiality of the watermark, when the information rate requirement is not strict, the watermark information may first be encrypted or encoded, the encrypted or encoded watermark information is then spread, and the spread watermark information is embedded in the imaginary component of the frequency domain signal. It should be noted that the encrypted watermark information is still a binary sequence; the present application does not limit the encryption method. For example, a repetition code of length two may be used, converting the value "0" of one bit in the original watermark information into "00" and the value "1" into "11", or other channel coding methods may be used. When the watermark information is extracted, the information extracted from the imaginary component is correspondingly decrypted or decoded to recover the original watermark information.
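As one illustration of the channel-coding option mentioned above, a length-two repetition code could be sketched as follows (the decoding rule shown is an assumption; the application does not fix it):

```python
def repeat_encode(bits):
    """Length-two repetition code: 0 -> 00, 1 -> 11."""
    return [b for b in bits for _ in range(2)]

def repeat_decode(bits):
    """Undo the length-two repetition; here the first bit of each pair is taken,
    and a disagreement within a pair would indicate an extraction error."""
    return [bits[i] for i in range(0, len(bits), 2)]
```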
S203, performing time domain reconstruction on the frequency spectrum signal embedded with the watermark information to obtain the audio containing the watermark information.
In specific implementation, an IFFT is performed on the spectrum signal (including the real component and the imaginary component) embedded with the watermark information to obtain the time domain signal corresponding to each audio frame, and the complete audio embedded with the watermark information is obtained by splicing the audio frames together.
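A sketch of this reconstruction step, under the same simplified non-overlapping framing assumed in the earlier sketch; a practical implementation would normally use overlap-add with a matching synthesis window:

```python
import numpy as np

def reconstruct_audio(watermarked_spectra):
    """IFFT each watermarked spectrum and splice the frames back into audio.

    Simplified to non-overlapping frames as in the earlier sketch; with a
    non-rectangular analysis window, overlap-add with a synthesis window
    would normally be used instead.
    """
    frames = [np.real(np.fft.ifft(Y)) for Y in watermarked_spectra]
    return np.concatenate(frames)
```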
With the audio watermarking method provided by the embodiment of the application, watermark information is embedded only in the imaginary component of the spectrum signal corresponding to the audio and no watermark is embedded in the real component, so various digital attacks (such as compression coding, resampling and DA/AD conversion) can be effectively resisted and the accuracy of watermark extraction is improved. Because the imaginary component is generally much smaller than the real component, the total energy of the embedded watermark is very small, and a user can hardly perceive any difference in the audio before and after the watermark is embedded, which preserves the sound quality of the watermarked audio. Therefore, the audio watermarking method provided by the application preserves the sound quality of the watermarked audio, ensures accurate extraction of the watermark, and achieves better robustness and concealment.
In order to further improve the concealment of the watermark information embedded in the audio and the fidelity of the audio, a threshold value is set for the imaginary component, and the watermark embedding operation is carried out only on frequency points whose imaginary component has an absolute value smaller than the threshold value, so as to reduce the total energy of the embedded watermark information. To this end, referring to fig. 3, watermark information may be embedded in any one of the audio frames of the audio as follows:
S301, obtaining information to be embedded in corresponding sequence positions in a sequence to be embedded according to the sequence positions of the audio frames in each audio frame, wherein the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to watermark information.
In specific implementation, the value of each sequence bit in the watermark information can be spread according to the specified sequence p to obtain the sequence to be embedded.
In one possible implementation, one bit of the watermark information may be embedded in one audio frame. In this case, the spreading method may be: the value of one bit in the watermark information is spread into one piece of information to be embedded, +p or −p. For example, the value 0 in the watermark information is spread into the information to be embedded +p and the value 1 into −p; or the value 1 is spread into +p and the value 0 into −p.
In another possible implementation, one bit of the watermark information may be embedded in two adjacent audio frames. In this case, the spreading method may be: the value of one bit in the watermark information is spread into two copies of the specified sequence with opposite signs, that is, into two pieces of information to be embedded. For example, the value 0 in the watermark information is spread into the two pieces of information to be embedded +p and −p, and the value 1 into −p and +p; spreading the watermark information "00111010" in this way gives the sequence to be embedded {+p, −p, +p, −p, −p, +p, −p, +p, −p, +p, +p, −p, −p, +p, +p, −p}. Alternatively, the value 1 may be spread into +p and −p and the value 0 into −p and +p. When the watermark information is extracted, the value of one bit of the watermark information can be determined from two adjacent audio frames; specifically, the information extracted from the two adjacent frames can be differenced, which reduces the influence of host signal interference on the watermark extraction result and improves the interference resistance of the watermark. The specific watermark extraction method is described in detail later.
S302, determining a frequency point with an absolute value of an imaginary component smaller than a threshold value in the frequency domain signal corresponding to the audio frame as a target frequency point.
In one possible embodiment, the threshold value corresponding to each audio frame is determined according to the maximum value of the imaginary components of the frequency points in the frequency domain signal corresponding to the audio frame. For example, if the maximum value of the imaginary component in the frequency domain signal of a certain audio frame is M_i, the threshold value corresponding to the audio frame is β = a·M_i, where a is a value greater than 0 and less than 1 (for example, between 0.1 and 0.8). In this way, the frequency points with higher energy are screened out, ensuring that the watermark information is embedded only in frequency points with lower energy and thereby reducing the total energy of the embedded watermark information.
For example, if the absolute value of the imaginary component of the frequency point X[n] satisfies |X[n].im| < β, the frequency point X[n] is determined to be a target frequency point in which watermark information needs to be embedded; otherwise, watermark information does not need to be embedded at that frequency point.
In another possible implementation, for each audio frame, the distribution of the absolute values of the imaginary components of the frequency points in the audio frame may be computed, and the threshold value determined according to this distribution. For example, percentiles may be computed over the absolute values of the imaginary components in ascending order and the value at the q-th percentile taken as the threshold value; when q = 80, the frequency points in the lowest 80% are determined as the target frequency points.
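A sketch of the first selection rule above (β = a·M_i), with a = 0.5 as an assumed value in the stated 0.1–0.8 range:

```python
import numpy as np

def select_target_bins(X, a=0.5):
    """Return the indices of frequency points whose imaginary component is small.

    beta = a * max|X.im| as described above; a = 0.5 is an assumed value in the
    stated range of 0.1 to 0.8.
    """
    beta = a * np.max(np.abs(X.imag))
    return np.flatnonzero(np.abs(X.imag) < beta)   # target frequency points
```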
S303, embedding information to be embedded corresponding to the audio frame in the imaginary component of each target frequency point.
By the method shown in fig. 3, watermark information is embedded in only frequency points with frequency domain amplitude lower than a threshold value, so that the total energy of embedding can be further reduced, the tone quality of audio is ensured, and meanwhile, the concealment and anti-interference capability of the watermark information are improved.
In specific implementation, step S303 specifically includes: according to the sequence of each target frequency point in the frequency domain signal, determining the corresponding element value of each target frequency point in the information to be embedded; and respectively adding corresponding amplitude values to the imaginary component of each target frequency point according to the element values corresponding to each target frequency point.
Further, the method of the embodiment of the application further comprises the following steps: and respectively inputting each audio frame of the audio into a psychoacoustic model to obtain a masking threshold corresponding to each frequency point in the frequency domain signal corresponding to each audio frame. Correspondingly, in step S303, according to the element values corresponding to each target frequency point, corresponding magnitudes are added to the imaginary component of each target frequency point, which specifically includes: determining the product of the element value corresponding to each target frequency point and the masking threshold value corresponding to each target frequency point as the amplitude value corresponding to each target frequency point; and adding corresponding amplitude values to the imaginary component of each target frequency point respectively. Wherein, the masking threshold calculated by the psychoacoustic model is used as the upper limit value of the embeddable watermark energy, and the total energy of the embedded watermark is controlled.
In specific implementation, the watermark information may be embedded in the imaginary component of the frequency domain signal by the formula Y.im = X.im ± p·m, where X.im is the imaginary component of the target frequency points, Y is the frequency domain signal embedded with the watermark information, Y.im is the imaginary component of the frequency domain signal embedded with the watermark information, and m is a vector composed of the masking thresholds corresponding to the target frequency points; m determines the amplitude added to the imaginary component, that is, the upper limit of the embedded watermark energy.
Further, in step S303, a product of an element value corresponding to each target frequency point in the information to be embedded and a masking threshold corresponding to each target frequency point is determined as an amplitude corresponding to each target frequency point, which specifically includes: and determining the product of the element value corresponding to each target frequency point, the product of the masking threshold value corresponding to each target frequency point and the intensity coefficient as the amplitude value corresponding to each target frequency point.
In specific implementation, the watermark information may be embedded in the imaginary component of the frequency domain signal by the formula Y.im = X.im ± α·p·m, where X.im is the imaginary component of the target frequency points, Y is the frequency domain signal embedded with the watermark information, Y.im is the imaginary component of the frequency domain signal embedded with the watermark information, m is a vector composed of the masking thresholds corresponding to the target frequency points, and α is an intensity coefficient used to adjust the overall embedding energy.
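Putting the pieces together, a hedged sketch of the per-frame embedding rule Y.im = X.im ± α·p·m might look as follows; the masking thresholds are taken as given (computing them from a psychoacoustic model is outside this sketch), and the parameter names are assumptions:

```python
import numpy as np

def embed_in_frame(X, info, mask, targets, alpha=0.1):
    """Add alpha * info * mask to the imaginary part of the target frequency points.

    X       : complex spectrum of one audio frame
    info    : +p / -p element values for this frame (assumed at least as long as
              targets; the k-th target point takes the k-th element, one reading
              of the ordering rule described above)
    mask    : masking thresholds m for the target points, from a psychoacoustic model
    targets : indices of the target frequency points
    alpha   : assumed intensity coefficient controlling the overall embedding energy
    """
    Y = X.copy()
    delta = alpha * np.asarray(info)[:len(targets)] * np.asarray(mask)[:len(targets)]
    Y[targets] = Y[targets].real + 1j * (Y[targets].imag + delta)
    return Y
```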
Referring to fig. 4, the audio watermarking method provided by the embodiment of the present application may be applied to the watermark extraction apparatus 103 shown in fig. 1, and may specifically include the following steps:
s401, obtaining frequency domain signals corresponding to all audio frames in the audio with the embedded watermark information.
The framing of the audio embedded with watermark information follows the same processing as when the watermark information was embedded, and is not described again.
In particular, in order to prevent spectral leakage, a windowed fast Fourier transform may be performed on each audio frame in the audio embedded with watermark information to obtain the frequency domain signal corresponding to each audio frame, where the sine component in each frequency domain signal is the imaginary component. For the specific implementation, reference may be made to the processing used when the watermark information was embedded, which is not repeated here.
S402, watermark information is extracted from imaginary components of frequency domain signals corresponding to each audio frame.
According to the audio watermarking method provided by the embodiment of the application, since the watermark information is embedded in the imaginary component of the spectrum signal corresponding to the audio, various digital attacks can be effectively resisted and the accuracy of extracting the watermark information is improved.
In specific implementation, referring to fig. 5, step S402 specifically includes:
s501, extracting imaginary components of all frequency points from frequency domain signals corresponding to all audio frames.
S502, carrying out logarithmic operation processing on the imaginary components of the frequency points corresponding to each audio frame.
S503, extracting watermark information from the imaginary component after logarithmic operation processing.
Assume that the frequency domain signal of each audio frame in the audio in which watermark information has been embedded is S = Y + n, where Y is the watermark-embedded audio and n is the frequency domain noise added to Y by digital attacks, compression, resampling and the like in the transmission environment. A logarithmic operation is applied to the imaginary component S.im: S.im(dB) = 10·log(S.im). The watermark information is then extracted from S.im(dB).
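A sketch of this log-domain mapping; to keep the logarithm defined for negative imaginary components, the sketch applies it to the magnitude and restores the sign, which is an interpretation not spelled out in the description (base 10 is assumed from the dB notation):

```python
import numpy as np

def log_domain(im, eps=1e-12):
    """Map imaginary components to the log domain: S.im(dB) = 10*log(S.im).

    Assumptions: base-10 logarithm, applied to the magnitude with the sign
    restored so it stays defined for negative components; eps avoids log(0).
    """
    im = np.asarray(im, dtype=float)
    return np.sign(im) * 10.0 * np.log10(np.abs(im) + eps)
```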
In the method shown in fig. 5, when the watermark is extracted, the frequency points are scaled to the logarithmic domain by exploiting the properties of the logarithmic function: frequency points whose imaginary components have small absolute values are pulled to lower positions while the energy of the watermark is relatively widened, which improves the accuracy of watermark extraction and ensures robustness.
On the basis of any of the above embodiments, referring to fig. 6, when the spreading method used during embedding embeds the value of one bit of the watermark information in one audio frame, the watermark information can be extracted as follows:
S601, respectively calculating the inner product of the log-processed imaginary component corresponding to each audio frame and the specified sequence p to obtain the detection decision quantity corresponding to each audio frame, where the specified sequence p is the sequence used to spread the watermark information when it was embedded in the audio.
S602, if the detection decision quantity is greater than 0, determining that the value embedded in the corresponding audio frame is 0, and otherwise determining that the value embedded in the corresponding audio frame is 1.
S603, concatenating, in the order of the audio frames in the audio, the values determined to be embedded in each audio frame to obtain the watermark information.
In specific implementation, the key-generated specified sequence p used when the watermark information was embedded is taken, and the inner product of the log-processed imaginary component S.im(dB) and the specified sequence p is computed, giving the correlation peak as the detection decision quantity r = S.im(dB)·p. If the spreading method adopted when embedding was to spread the value 0 in the watermark information into the information to be embedded +p and the value 1 into −p, then when r > 0 the value embedded in the corresponding audio frame is 0, and otherwise it is 1. If the spreading method adopted was to spread the value 1 into +p and the value 0 into −p, then when r > 0 the value embedded in the corresponding audio frame is 1, and otherwise it is 0.
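A sketch of this per-frame decision under the first spreading rule (0 → +p, 1 → −p); it assumes the log-domain imaginary components are aligned with the same frequency points used at embedding time:

```python
import numpy as np

def extract_bit(im_db, p):
    """Decide the bit embedded in one frame from its log-domain imaginary component.

    r = <S.im(dB), p>; r > 0 -> bit 0, otherwise bit 1 (first spreading rule).
    Assumes im_db is aligned with the frequency points used at embedding time.
    """
    r = float(np.dot(np.asarray(im_db)[:len(p)], p))   # detection decision quantity
    return 0 if r > 0 else 1
```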
On the basis of any of the above embodiments, referring to fig. 7, when the spreading method used during embedding embeds the two opposite-sign copies of the specified sequence p corresponding to one bit of the watermark information in two adjacent audio frames, the watermark information can be extracted as follows:
S701, respectively calculating the inner product of the log-processed imaginary component corresponding to each audio frame and the specified sequence p to obtain the detection decision quantity corresponding to each audio frame, where the specified sequence p is the sequence used to spread the watermark information when it was embedded in the audio.
S702, if the difference between the detection decision quantities corresponding to two adjacent audio frames embedded with the same value is greater than 0, determining that the value embedded in the two adjacent audio frames is 0, and otherwise determining that the value embedded in the two adjacent audio frames is 1.
S703, concatenating, in the order of each pair of adjacent audio frames in the audio, the values determined to be embedded in each pair of adjacent audio frames to obtain the watermark information.
Assume the detection decision quantity r = S.im(dB)·p = (X.im(dB) ± αp·m + n)·p, where n is the host signal interference inherent in a spread spectrum system; the short-time stationarity of the audio can be used to suppress this interference. If copies of the specified sequence p with opposite signs are embedded in two adjacent audio frames when the watermark information is embedded, that is, the spreading rule is to spread the value "0" in the watermark information into +p and −p and the value "1" into −p and +p (or the watermark bit "1" into +p and −p and the watermark bit "0" into −p and +p), then when extracting the watermark the detection decision quantities r of the two adjacent audio frames are differenced: r' = (ΔX.im(dB) ± αp·m' + Δn)·p. Because adjacent audio frames maintain a certain stationarity over a short time, ΔX is small and so is ΔX.im(dB); moreover the distribution of ΔX.im(dB) is highly random and its correlation with the specified sequence p is poor, so the inner product of ΔX.im(dB) and p can be regarded as approximately zero in a statistical sense. Thus r' can be approximated as (±αp·m' + Δn)·p.
When the value "0" in the watermark information is spread into +p and −p and the value "1" into −p and +p, if the value "0" is embedded in two adjacent audio frames, the difference of the detection decision quantities r of the two adjacent audio frames is:
r' = (X[1].im(dB) + αp·m1 + n1)·p − (X[2].im(dB) − αp·m2 + n2)·p
   = (ΔX.im(dB) + 2αp·m' + Δn)·p,
where r' can be approximated as (2αp·m' + Δn)·p; since Δn is approximately zero, r' > 0. That is, when r' > 0 it can be determined that the value embedded in the two adjacent audio frames is 0.
Similarly, when the value "1" in the watermark information is spread to be-p and +p, and the value "1" in the watermark information is spread to be +p and-p, if the value "1" is embedded in two adjacent audio frames, the difference value of the detection decision r of the two adjacent audio frames is:
r' = (X[1].im(dB) - αp·m1 + n1)·p - (X[2].im(dB) + αp·m2 + n2)·p
= (ΔX.im(dB) - 2αp·m' + Δn)·p,
where r' may be approximated as (-2αp·m' + Δn)·p; since Δn is approximately zero, r' < 0. That is, when r' < 0, it can be determined that the value embedded in the two adjacent audio frames is 1.
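The sign argument above can be checked numerically. The following Python sketch uses made-up toy values (the sequence length, noise levels, α and m are illustrative assumptions) to show that, when two adjacent frames share an almost identical host log-spectrum, differencing their detection decisions leaves a term dominated by 2αp·m and yields the expected sign.

import numpy as np

rng = np.random.default_rng(0)
n_bins = 512
p = rng.choice([-1.0, 1.0], size=n_bins)       # specified spreading sequence
host = rng.normal(0.0, 10.0, size=n_bins)      # host imaginary spectrum in the log domain
drift = rng.normal(0.0, 0.1, size=n_bins)      # small inter-frame change (ΔX.im(dB))
alpha, m = 0.5, 1.0                            # toy intensity coefficient and masking value

frame1 = host + alpha * m * p                  # first frame carries +p (bit 0)
frame2 = host + drift - alpha * m * p          # adjacent frame carries -p (bit 0)
r_diff = frame1 @ p - frame2 @ p               # approximately 2*alpha*m*||p||^2, so positive
print(r_diff > 0)                              # expected output: True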
By embedding two specified sequences p with opposite signs, corresponding to one bit in the watermark information, in two adjacent audio frames respectively, the anti-interference capability of the watermark information can be improved, thereby improving the watermark extraction accuracy.
The following are results of testing the audio watermarking method provided by the embodiments of the present application.
The watermark pattern on the left side of fig. 8 is embedded in a section of audio with a sampling rate of 44.1 kHz; the intensity coefficient α is adjusted so that the ratio of the energy of the watermarked audio to the energy of the watermark information is about 45 dB, and the difference before and after watermarking is hardly perceptible to the human ear. The upper graph in fig. 9 is the waveform of the original audio and the lower graph in fig. 9 is the waveform of the audio after watermark embedding; no obvious difference can be seen between the waveforms. The audio watermarking method provided by the embodiments of the present application can therefore improve the concealment of the embedded watermark information and preserve audio quality after watermark embedding.
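The 45 dB figure above is an energy ratio. A minimal helper for computing it is sketched below, assuming the original and watermarked signals are aligned floating-point arrays; the function and variable names are illustrative.

import numpy as np

def signal_to_watermark_ratio_db(audio_orig: np.ndarray, audio_wm: np.ndarray) -> float:
    watermark = audio_wm - audio_orig          # the embedded perturbation
    return 10.0 * np.log10(np.sum(audio_wm ** 2) / np.sum(watermark ** 2))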
The watermark pattern on the left side of fig. 8 is embedded in a section of audio with a sampling rate of 44.1 kHz. After the audio is compressed with AAC (Advanced Audio Coding) at a 96 kbps bit rate, watermark information is extracted from the compressed audio by the audio watermarking method provided by the embodiments of the present application; the watermark pattern on the right side of fig. 8 is obtained from the extracted watermark information, and the extraction accuracy is higher than 90%. After the audio is resampled at 32 kHz, the extraction accuracy is higher than 95%, and after DA/AD conversion the extraction accuracy is higher than 98%.
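The extraction accuracy quoted above is the fraction of watermark bits recovered correctly after the channel (AAC compression, resampling or DA/AD conversion). A minimal sketch of this measurement, assuming the embedded and extracted watermarks are available as binary arrays, is:

import numpy as np

def extraction_accuracy(bits_embedded: np.ndarray, bits_extracted: np.ndarray) -> float:
    n = min(len(bits_embedded), len(bits_extracted))
    return float(np.mean(bits_embedded[:n] == bits_extracted[:n]))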
Exemplary apparatus
Having described the method of an exemplary embodiment of the present application, an audio watermarking apparatus of an exemplary embodiment of the present application is described next.
Fig. 10 is a schematic structural diagram of an audio watermarking apparatus according to an embodiment of the present application. In one embodiment, the audio watermarking apparatus 100 comprises: a first conversion module 1001, a watermark embedding module 1002 and a time domain reconstruction module 1003.
A first conversion module 1001, configured to obtain frequency domain signals corresponding to respective audio frames in audio;
A watermark embedding module 1002, configured to embed watermark information in the imaginary component of the frequency domain signal corresponding to each audio frame;
The time domain reconstruction module 1003 is configured to perform time domain reconstruction on the frequency spectrum signal embedded with the watermark information, to obtain the audio containing the watermark information.
Optionally, the watermark embedding module 1002 is specifically configured to embed watermark information in any one of the audio frames by:
According to the sequence position of any audio frame among the audio frames, obtaining the information to be embedded at the corresponding sequence position in a sequence to be embedded, wherein the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to the watermark information;
determining a frequency point with an absolute value of an imaginary component smaller than a threshold value in a frequency domain signal corresponding to any audio frame as a target frequency point;
And embedding information to be embedded corresponding to any audio frame in the imaginary component of each target frequency point.
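A minimal Python sketch of this per-frame embedding flow is given below; the relative threshold rule (a fraction of the maximum absolute imaginary value, consistent with the optional embodiment described later) and the direct addition to the imaginary part are illustrative assumptions rather than the only possible implementation.

import numpy as np

def embed_in_frame(spectrum: np.ndarray, to_embed: np.ndarray, rel_threshold: float = 0.1) -> np.ndarray:
    out = spectrum.astype(complex)
    # threshold derived from the maximum absolute imaginary component of the frame
    threshold = rel_threshold * np.max(np.abs(out.imag))
    # target frequency points: bins whose imaginary component is small in magnitude
    targets = np.flatnonzero(np.abs(out.imag) < threshold)
    k = min(len(targets), len(to_embed))
    out.imag[targets[:k]] += to_embed[:k]      # embed in the imaginary component
    return out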
Optionally, the watermark information is a binary sequence, and the watermark embedding module 1002 is further configured to determine the sequence to be embedded according to the watermark information by: performing spread spectrum processing on the value at each sequence position in the watermark information according to the specified sequence p, to obtain the sequence to be embedded; wherein the value 0 in the watermark information is spread to obtain the information to be embedded +p, and the value 1 in the watermark information is spread to obtain the information to be embedded -p; or the value 0 in the watermark information is spread to obtain two pieces of information to be embedded +p and -p, and the value 1 in the watermark information is spread to obtain two pieces of information to be embedded -p and +p.
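A short sketch of this spreading step follows, assuming the single-frame mapping (0 to +p, 1 to -p) and the two-frame mapping with opposite signs in adjacent frames; the function name and the Boolean switch are illustrative.

import numpy as np

def spread_watermark(bits, p: np.ndarray, two_frames: bool = False):
    seq = []
    for b in bits:
        s = p if b == 0 else -p
        seq.append(s)
        if two_frames:
            seq.append(-s)     # the adjacent frame carries the opposite sign
    return seq                 # one piece of information to be embedded per audio frame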
Optionally, the watermark embedding module 1002 is specifically configured to:
According to the sequence of each target frequency point in the frequency domain signal, determining the element value corresponding to each target frequency point in the information to be embedded corresponding to any audio frame;
And respectively adding corresponding amplitude values to the imaginary component of each target frequency point according to the element values corresponding to each target frequency point.
Optionally, the watermark embedding module 1002 is further configured to input each audio frame of the audio into a psychoacoustic model, to obtain a masking threshold corresponding to each target frequency point in the frequency domain signal corresponding to each audio frame;
the watermark embedding module 1002 is specifically configured to: determining the product of the element value corresponding to each target frequency point and the masking threshold value corresponding to each target frequency point as the amplitude value corresponding to each target frequency point; and adding corresponding amplitude values to the imaginary component of each target frequency point respectively.
Optionally, the watermark embedding module 1002 is specifically configured to: and determining the product of the element value corresponding to each target frequency point, the product of the masking threshold value corresponding to each target frequency point and the intensity coefficient as the amplitude corresponding to each target frequency point.
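In other words, the amplitude added at each target frequency point is the product of the element value, the masking threshold obtained from the psychoacoustic model, and the intensity coefficient. A minimal sketch, assuming the masking thresholds have already been computed per target frequency point, is:

import numpy as np

def amplitudes_for_targets(elements: np.ndarray, mask: np.ndarray, alpha: float) -> np.ndarray:
    # elements: element values of the information to be embedded, one per target frequency point
    # mask:     masking threshold per target frequency point (from the psychoacoustic model)
    # alpha:    intensity coefficient
    return alpha * elements * mask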
Optionally, the threshold value corresponding to each audio frame is determined according to a maximum value of imaginary components of each frequency point in the frequency domain signal corresponding to the audio frame.
Optionally, the first conversion module 1001 is specifically configured to: respectively perform windowed fast Fourier transform on each audio frame in the audio to obtain the frequency domain signal corresponding to each audio frame, wherein the sine component in each frequency domain signal is the imaginary component.
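A minimal sketch of this conversion, assuming fixed-length non-overlapping frames and a Hann window (frame length, overlap and window type are illustrative choices not fixed by this section), is:

import numpy as np

def frames_to_spectra(audio: np.ndarray, frame_len: int = 2048) -> np.ndarray:
    n_frames = len(audio) // frame_len
    window = np.hanning(frame_len)
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len) * window
    return np.fft.rfft(frames, axis=1)   # the imaginary part corresponds to the sine component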
Fig. 11 is a schematic structural diagram of an audio watermarking apparatus according to an embodiment of the present application. In one embodiment, the audio watermarking apparatus 110 comprises: a second conversion module 1101 and a watermark extraction module 1102.
A second conversion module 1101, configured to obtain frequency domain signals corresponding to respective audio frames in audio containing watermark information;
watermark extraction module 1102 is configured to extract watermark information from imaginary components of the frequency domain signal corresponding to each audio frame.
Optionally, the watermark extraction module 1102 is specifically configured to:
extracting imaginary components of each frequency point from the frequency domain signals corresponding to each audio frame;
carrying out logarithmic operation processing on the imaginary component of each frequency point corresponding to each audio frame;
And extracting watermark information from the imaginary component subjected to logarithmic operation.
Optionally, the watermark information is a binary sequence, and if a value in the watermark information is embedded in each audio frame, the watermark extraction module 1102 is specifically configured to:
Respectively calculating the inner product of the imaginary component corresponding to each audio frame after logarithmic operation processing and a designated sequence p to obtain a detection judgment quantity corresponding to each audio frame, wherein the designated sequence p is a sequence used when watermark information is subjected to spread spectrum processing when watermark information is embedded in audio;
If the detection decision quantity is greater than 0, determining that the value embedded in the corresponding audio frame is 0, otherwise, determining that the value embedded in the corresponding audio frame is 1;
and sequentially and serially connecting the determined numerical values embedded in each audio frame according to the sequence of each audio frame in the audio to obtain watermark information.
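A minimal sketch of this single-frame extraction path follows. The signed dB mapping used for the logarithmic operation is an assumption made for illustration; this section only states that a logarithmic operation is applied to the imaginary components before the inner product with p.

import numpy as np

def extract_bits_per_frame(spectra: np.ndarray, p: np.ndarray) -> list:
    im = spectra.imag                                        # imaginary component of each frequency point
    im_db = np.sign(im) * 20.0 * np.log10(1.0 + np.abs(im))  # assumed signed dB mapping
    r = im_db @ p                                            # detection decision quantity per frame
    return [0 if ri > 0 else 1 for ri in r]                  # concatenated in frame order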
Optionally, the watermark information is a binary sequence, and if one value in the watermark information is embedded in two adjacent audio frames in the audio, the watermark extraction module 1102 is specifically configured to:
Respectively calculating the inner product of the imaginary component corresponding to each audio frame after logarithmic operation processing and a designated sequence p to obtain a detection judgment quantity corresponding to each audio frame, wherein the designated sequence p is a sequence used when watermark information is subjected to spread spectrum processing when watermark information is embedded in audio;
If the difference value between the detection decision amounts corresponding to two adjacent audio frames embedded with the same value is larger than 0, determining that the value embedded in the two adjacent audio frames is 0, otherwise, determining that the value embedded in the two adjacent audio frames is 1;
and sequentially concatenating the determined numerical values embedded in each pair of adjacent two audio frames according to the sequence of each pair of adjacent two audio frames in the audio, to obtain the watermark information.
The audio watermarking device provided by the embodiment of the application adopts the same inventive concept as the audio watermarking method, can obtain the same beneficial effects, and is not described herein again.
Based on the same inventive concept as the above-mentioned audio watermarking method, the embodiment of the present application further provides an electronic device, which may specifically be a terminal device or a server in fig. 1. As shown in fig. 12, the electronic device 120 may include a processor 1201 and a memory 1202.
The processor 1201 may be a general purpose processor, such as a central processing unit (CPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be directly executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The memory 1202, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, or optical disc. The memory may also be, without limitation, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1202 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
Exemplary program product
An embodiment of the present application provides a computer-readable storage medium storing computer program instructions for use with the above-described electronic device, which contains a program for executing the audio watermarking method in any of the exemplary embodiments of the present application.
The computer storage medium described above may be any available medium or data storage device that can be accessed by a computer, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MO), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND flash), solid state disks (SSD)), etc.
In some possible embodiments, the aspects of the present application may also be implemented as a computer program product comprising program code for causing a server device to carry out the steps of the audio watermarking method according to the various exemplary embodiments of the application as described in the "exemplary methods" section of this specification, when the computer program product is run on the server device.
The computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer program product for the audio watermarking method according to embodiments of the present application may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a server device. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present application have been described with reference to several particular embodiments, it is to be understood that the application is not limited to the disclosed embodiments; the division into aspects is merely for convenience of description and does not imply that features of these aspects cannot be used to advantage in combination. The application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (20)

1. A method of audio watermarking comprising:
Obtaining frequency domain signals corresponding to each audio frame in the audio respectively;
watermark information is embedded in the imaginary component of the frequency domain signal corresponding to each audio frame;
Performing time domain reconstruction on the frequency spectrum signal embedded with the watermark information to obtain audio containing the watermark information;
The spread spectrum method adopted when the watermark is embedded is as follows: embedding a value of one bit in watermark information in an audio frame, the watermark extraction device performs the following operations when extracting the watermark information: obtaining frequency domain signals corresponding to each audio frame in the audio embedded with watermark information; extracting imaginary components of each frequency point from the frequency domain signals corresponding to each audio frame; carrying out logarithmic operation processing on the imaginary components of each frequency point corresponding to each audio frame; respectively calculating inner products of imaginary components corresponding to each audio frame after logarithmic operation processing and a specified sequence p to obtain detection judgment quantity corresponding to each audio frame, wherein the specified sequence p is a sequence used when the watermark information is subjected to spread spectrum processing when the watermark information is embedded in the audio; if the detection decision quantity is greater than 0, determining that the value embedded in the corresponding audio frame is 0, otherwise, determining that the value embedded in the corresponding audio frame is 1; sequentially concatenating the determined numerical values embedded in each audio frame according to the sequence of each audio frame in the audio to obtain the watermark information;
The embedding watermark information in the imaginary component of the frequency domain signal corresponding to each audio frame specifically includes: embedding the watermark information in any one of the individual audio frames by: according to the sequence position of any audio frame in each audio frame, obtaining information to be embedded in a sequence to be embedded in a corresponding sequence position, wherein the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to the watermark information; determining a frequency point of which the absolute value of the imaginary component in the frequency domain signal corresponding to any audio frame is smaller than a threshold value as a target frequency point; and embedding information to be embedded corresponding to any audio frame in the imaginary component of each target frequency point.
2. The method of claim 1, wherein the watermark information is a binary sequence, and the sequence to be embedded is determined from the watermark information by:
according to the appointed sequence p, carrying out spread spectrum processing on the numerical value of each sequence bit in the watermark information to obtain the sequence to be embedded;
the method comprises the steps of obtaining the information to be embedded +p after spreading a value 0 in the watermark information, and obtaining the information to be embedded -p after spreading a value 1 in the watermark information; or the value 0 in the watermark information is spread to obtain two pieces of information to be embedded +p and -p, and the value 1 in the watermark information is spread to obtain two pieces of information to be embedded -p and +p.
3. The method according to claim 2, wherein the embedding the information to be embedded corresponding to the arbitrary audio frame in the imaginary component of each target frequency point specifically includes:
According to the sequence of each target frequency point in the frequency domain signal, determining the element value corresponding to each target frequency point in the information to be embedded corresponding to any audio frame;
And respectively adding corresponding amplitude values to the imaginary component of each target frequency point according to the element values corresponding to each target frequency point.
4. A method according to claim 3, characterized in that the method further comprises:
Respectively inputting each audio frame of the audio into a psychoacoustic model to obtain a masking threshold corresponding to each target frequency point in a frequency domain signal corresponding to each audio frame;
according to the element values corresponding to the target frequency points, corresponding amplitude values are respectively added to the imaginary component of each target frequency point, and the method specifically comprises the following steps:
determining the product of the element value corresponding to each target frequency point and the masking threshold value corresponding to each target frequency point as the amplitude value corresponding to each target frequency point;
And adding corresponding amplitude values to the imaginary component of each target frequency point respectively.
5. The method according to claim 4, wherein determining the product of the element value corresponding to each target frequency point and the masking threshold corresponding to each target frequency point as the amplitude corresponding to each target frequency point specifically includes:
and determining the product of the element value corresponding to each target frequency point, the product of the masking threshold value corresponding to each target frequency point and the intensity coefficient as the amplitude corresponding to each target frequency point.
6. The method of any of claims 1 to 4, wherein the threshold value for each audio frame is determined based on a maximum value of imaginary components of each frequency bin in the frequency domain signal for the audio frame.
7. The method according to any one of claims 1 to 4, wherein the obtaining the frequency domain signals corresponding to each audio frame in the audio respectively specifically includes:
And respectively carrying out windowed fast Fourier transform on each audio frame in the audio to obtain frequency domain signals respectively corresponding to each audio frame, wherein a sine component in each frequency domain signal is an imaginary component.
8. A method of audio watermarking comprising:
obtaining frequency domain signals corresponding to each audio frame in the audio embedded with watermark information;
extracting imaginary components of each frequency point from the frequency domain signals corresponding to each audio frame;
carrying out logarithmic operation processing on the imaginary components of each frequency point corresponding to each audio frame;
extracting watermark information from the imaginary component subjected to logarithmic operation;
The spread spectrum method adopted when the watermark is embedded is as follows: embedding a bit value in watermark information in an audio frame, extracting watermark information from the imaginary component after logarithmic operation processing, including: respectively calculating inner products of imaginary components corresponding to each audio frame after logarithmic operation processing and a specified sequence p to obtain detection judgment quantity corresponding to each audio frame, wherein the specified sequence p is a sequence used when the watermark information is subjected to spread spectrum processing when the watermark information is embedded in the audio; if the detection decision quantity is greater than 0, determining that the value embedded in the corresponding audio frame is 0, otherwise, determining that the value embedded in the corresponding audio frame is 1; sequentially concatenating the determined numerical values embedded in each audio frame according to the sequence of each audio frame in the audio to obtain the watermark information;
the watermark information is embedded in any one of the audio frames in the following manner: according to the sequence position of any audio frame in each audio frame, obtaining information to be embedded in a sequence to be embedded in a corresponding sequence position, wherein the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to the watermark information; determining a frequency point of which the absolute value of the imaginary component in the frequency domain signal corresponding to any audio frame is smaller than a threshold value as a target frequency point; and embedding information to be embedded corresponding to any audio frame in the imaginary component of each target frequency point.
9. The method according to claim 8, wherein the watermark information is a binary sequence, and if one value of the watermark information is embedded in two adjacent audio frames in the audio, extracting watermark information from the imaginary component after the logarithmic operation processing specifically includes:
Respectively calculating inner products of imaginary components corresponding to each audio frame after logarithmic operation processing and a specified sequence p to obtain detection judgment quantity corresponding to each audio frame, wherein the specified sequence p is a sequence used when the watermark information is subjected to spread spectrum processing when the watermark information is embedded in the audio;
If the difference value between the detection decision amounts corresponding to two adjacent audio frames embedded with the same value is larger than 0, determining that the value embedded in the two adjacent audio frames is 0, otherwise, determining that the value embedded in the two adjacent audio frames is 1;
And sequentially concatenating the determined numerical values embedded in each pair of adjacent two audio frames according to the sequence of each pair of adjacent two audio frames in the audio, to obtain the watermark information.
10. An audio watermarking apparatus, comprising:
The first conversion module is used for obtaining frequency domain signals corresponding to each audio frame in the audio;
The watermark embedding module is used for embedding watermark information in the imaginary component of the frequency domain signal corresponding to each audio frame;
the time domain reconstruction module is used for performing time domain reconstruction on the frequency spectrum signal embedded with the watermark information to obtain audio containing the watermark information;
The spread spectrum method adopted when the watermark is embedded is as follows: embedding a value of one bit in watermark information in an audio frame, the watermark extraction device performs the following operations when extracting the watermark information: obtaining frequency domain signals corresponding to each audio frame in the audio embedded with watermark information; extracting imaginary components of each frequency point from the frequency domain signals corresponding to each audio frame; carrying out logarithmic operation processing on the imaginary components of each frequency point corresponding to each audio frame; respectively calculating inner products of imaginary components corresponding to each audio frame after logarithmic operation processing and a specified sequence p to obtain detection judgment quantity corresponding to each audio frame, wherein the specified sequence p is a sequence used when the watermark information is subjected to spread spectrum processing when the watermark information is embedded in the audio; if the detection decision quantity is greater than 0, determining that the value embedded in the corresponding audio frame is 0, otherwise, determining that the value embedded in the corresponding audio frame is 1; sequentially concatenating the determined numerical values embedded in each audio frame according to the sequence of each audio frame in the audio to obtain the watermark information;
The watermark embedding module is specifically configured to embed the watermark information in any one of the audio frames by: according to the sequence position of any audio frame in each audio frame, obtaining information to be embedded in a sequence to be embedded in a corresponding sequence position, wherein the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to the watermark information; determining a frequency point of which the absolute value of the imaginary component in the frequency domain signal corresponding to any audio frame is smaller than a threshold value as a target frequency point; and embedding information to be embedded corresponding to any audio frame in the imaginary component of each target frequency point.
11. The apparatus of claim 10, wherein the watermark information is a binary sequence, and wherein the watermark embedding module is further configured to determine the sequence to be embedded from the watermark information by:
according to the appointed sequence p, carrying out spread spectrum processing on the numerical value of each sequence bit in the watermark information to obtain the sequence to be embedded;
the method comprises the steps of obtaining the information to be embedded +p after spreading a value 0 in the watermark information, and obtaining the information to be embedded -p after spreading a value 1 in the watermark information; or the value 0 in the watermark information is spread to obtain two pieces of information to be embedded +p and -p, and the value 1 in the watermark information is spread to obtain two pieces of information to be embedded -p and +p.
12. The apparatus according to claim 11, wherein the watermark embedding module is specifically configured to:
According to the sequence of each target frequency point in the frequency domain signal, determining the element value corresponding to each target frequency point in the information to be embedded corresponding to any audio frame;
And respectively adding corresponding amplitude values to the imaginary component of each target frequency point according to the element values corresponding to each target frequency point.
13. The apparatus of claim 12, wherein the watermark embedding module is further configured to input each audio frame of the audio into a psychoacoustic model, respectively, to obtain a masking threshold corresponding to each target frequency point in the frequency domain signal corresponding to each audio frame;
The watermark embedding module is specifically configured to:
determining the product of the element value corresponding to each target frequency point and the masking threshold value corresponding to each target frequency point as the amplitude value corresponding to each target frequency point;
And adding corresponding amplitude values to the imaginary component of each target frequency point respectively.
14. The apparatus according to claim 13, wherein the watermark embedding module is specifically configured to:
and determining the product of the element value corresponding to each target frequency point, the product of the masking threshold value corresponding to each target frequency point and the intensity coefficient as the amplitude corresponding to each target frequency point.
15. The apparatus according to any one of claims 10 to 13, wherein the threshold value for each audio frame is determined based on a maximum value of imaginary components of each frequency point in the frequency domain signal for the audio frame.
16. The apparatus according to any one of claims 10 to 13, wherein the first conversion module is specifically configured to:
And respectively carrying out windowed fast Fourier transform on each audio frame in the audio to obtain frequency domain signals respectively corresponding to each audio frame, wherein a sine component in each frequency domain signal is an imaginary component.
17. An audio watermarking apparatus, comprising:
The second conversion module is used for obtaining frequency domain signals corresponding to each audio frame in the audio containing watermark information;
The watermark extraction module is used for extracting the imaginary component of each frequency point from the frequency domain signal corresponding to each audio frame; carrying out logarithmic operation processing on the imaginary components of each frequency point corresponding to each audio frame; extracting watermark information from the imaginary component subjected to logarithmic operation;
The watermark information is a binary sequence, and the spread spectrum method adopted when the watermark is embedded is as follows: a value of one bit in the watermark information is embedded in one audio frame; the watermark extraction module is specifically configured to: respectively calculating inner products of imaginary components corresponding to each audio frame after logarithmic operation processing and a specified sequence p to obtain detection judgment quantity corresponding to each audio frame, wherein the specified sequence p is a sequence used when the watermark information is subjected to spread spectrum processing when the watermark information is embedded in the audio; if the detection decision quantity is greater than 0, determining that the value embedded in the corresponding audio frame is 0, otherwise, determining that the value embedded in the corresponding audio frame is 1; sequentially concatenating the determined numerical values embedded in each audio frame according to the sequence of each audio frame in the audio to obtain the watermark information;
the watermark information is embedded in any one of the audio frames in the following manner: according to the sequence position of any audio frame in each audio frame, obtaining information to be embedded in a sequence to be embedded in a corresponding sequence position, wherein the sequence to be embedded comprises a plurality of pieces of information to be embedded determined according to the watermark information; determining a frequency point of which the absolute value of the imaginary component in the frequency domain signal corresponding to any audio frame is smaller than a threshold value as a target frequency point; and embedding information to be embedded corresponding to any audio frame in the imaginary component of each target frequency point.
18. The apparatus according to claim 17, wherein the watermark information is a binary sequence, and the watermark extraction module is specifically configured to:
Respectively calculating inner products of imaginary components corresponding to each audio frame after logarithmic operation processing and a specified sequence p to obtain detection judgment quantity corresponding to each audio frame, wherein the specified sequence p is a sequence used when the watermark information is subjected to spread spectrum processing when the watermark information is embedded in the audio;
If the difference value between the detection decision amounts corresponding to two adjacent audio frames embedded with the same value is larger than 0, determining that the value embedded in the two adjacent audio frames is 0, otherwise, determining that the value embedded in the two adjacent audio frames is 1;
And sequentially concatenating the determined numerical values embedded in each pair of adjacent two audio frames according to the sequence of each pair of adjacent two audio frames in the audio, to obtain the watermark information.
19. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed by the processor.
20. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 9.
CN202010147458.5A 2020-03-05 2020-03-05 Audio watermarking method, device, electronic equipment and storage medium Active CN113362835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010147458.5A CN113362835B (en) 2020-03-05 2020-03-05 Audio watermarking method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010147458.5A CN113362835B (en) 2020-03-05 2020-03-05 Audio watermarking method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113362835A CN113362835A (en) 2021-09-07
CN113362835B true CN113362835B (en) 2024-06-07

Family

ID=77523762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010147458.5A Active CN113362835B (en) 2020-03-05 2020-03-05 Audio watermarking method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113362835B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275494B (en) * 2023-11-21 2024-02-20 科大讯飞(苏州)科技有限公司 Audio watermark embedding method, audio watermark extracting method and audio detecting method

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006171110A (en) * 2004-12-13 2006-06-29 Amk:Kk Method for embedding additional information to audio data, method for reading embedded additional information from audio data, and apparatus therefor
JP2006201527A (en) * 2005-01-21 2006-08-03 Dainippon Printing Co Ltd Device for embedding information in sound signal, device for extracting information from sound signal, and sound signal reproducing device and method
CN101271690A (en) * 2008-05-09 2008-09-24 中国人民解放军重庆通信学院 Audio spread-spectrum watermark processing method for protecting audio data
JP2013168789A (en) * 2012-02-15 2013-08-29 Nippon Telegr & Teleph Corp <Ntt> Digital watermark embedding apparatus, digital watermark detection apparatus, digital watermark embedding method, and digital watermark detection method, and program
CN103650039A (en) * 2011-07-08 2014-03-19 汤姆逊许可公司 Method and apparatus for quantisation index modulation for watermarking an input signal
JP5728106B1 (en) * 2014-03-24 2015-06-03 三菱電機インフォメーションシステムズ株式会社 Digital watermark embedding device, digital watermark embedding program, digital watermark detection device, and digital watermark detection program
CN105976823A (en) * 2016-06-22 2016-09-28 华中师范大学 Adaptive audio watermarking method based on phase coding and system
CN106209297A (en) * 2016-07-05 2016-12-07 华中科技大学 Radio frequency watermark embedding based on the modulation of QPSK signal and extracting method and system
WO2017016363A1 (en) * 2015-07-27 2017-02-02 李庆成 Method for processing digital audio signal
CN106409302A (en) * 2016-06-22 2017-02-15 华中师范大学 Audio frequency watermark method and system based on embedding area selection
CN106504757A (en) * 2016-11-09 2017-03-15 天津大学 A kind of adaptive audio blind watermark method based on auditory model
CN106875326A (en) * 2017-02-21 2017-06-20 湖南工业大学 A kind of method hidden in image is printed and extract audio anti-counterfeiting signal
KR20180058611A (en) * 2016-11-24 2018-06-01 한국전자통신연구원 Apparatus and method for inserting watermark to the audio signal and detecting watermark from the audio signal
CN108712666A (en) * 2018-04-04 2018-10-26 聆刻互动(北京)网络科技有限公司 A kind of mobile terminal based on interactive audio frequency watermark and television interaction method and system
CN109584890A (en) * 2018-12-18 2019-04-05 中央电视台 Audio frequency watermark insertion, extraction, television program interaction method and device
CN109697978A (en) * 2018-12-18 2019-04-30 百度在线网络技术(北京)有限公司 Method and apparatus for generating model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180144755A1 (en) * 2016-11-24 2018-05-24 Electronics And Telecommunications Research Institute Method and apparatus for inserting watermark to audio signal and detecting watermark from audio signal

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006171110A (en) * 2004-12-13 2006-06-29 Amk:Kk Method for embedding additional information to audio data, method for reading embedded additional information from audio data, and apparatus therefor
JP2006201527A (en) * 2005-01-21 2006-08-03 Dainippon Printing Co Ltd Device for embedding information in sound signal, device for extracting information from sound signal, and sound signal reproducing device and method
CN101271690A (en) * 2008-05-09 2008-09-24 中国人民解放军重庆通信学院 Audio spread-spectrum watermark processing method for protecting audio data
CN103650039A (en) * 2011-07-08 2014-03-19 汤姆逊许可公司 Method and apparatus for quantisation index modulation for watermarking an input signal
JP2013168789A (en) * 2012-02-15 2013-08-29 Nippon Telegr & Teleph Corp <Ntt> Digital watermark embedding apparatus, digital watermark detection apparatus, digital watermark embedding method, and digital watermark detection method, and program
JP5728106B1 (en) * 2014-03-24 2015-06-03 三菱電機インフォメーションシステムズ株式会社 Digital watermark embedding device, digital watermark embedding program, digital watermark detection device, and digital watermark detection program
WO2017016363A1 (en) * 2015-07-27 2017-02-02 李庆成 Method for processing digital audio signal
CN106409301A (en) * 2015-07-27 2017-02-15 北京音图数码科技有限公司 Digital audio signal processing method
CN105976823A (en) * 2016-06-22 2016-09-28 华中师范大学 Adaptive audio watermarking method based on phase coding and system
CN106409302A (en) * 2016-06-22 2017-02-15 华中师范大学 Audio frequency watermark method and system based on embedding area selection
CN106209297A (en) * 2016-07-05 2016-12-07 华中科技大学 Radio frequency watermark embedding based on the modulation of QPSK signal and extracting method and system
CN106504757A (en) * 2016-11-09 2017-03-15 天津大学 A kind of adaptive audio blind watermark method based on auditory model
KR20180058611A (en) * 2016-11-24 2018-06-01 한국전자통신연구원 Apparatus and method for inserting watermark to the audio signal and detecting watermark from the audio signal
CN106875326A (en) * 2017-02-21 2017-06-20 湖南工业大学 A kind of method hidden in image is printed and extract audio anti-counterfeiting signal
CN108712666A (en) * 2018-04-04 2018-10-26 聆刻互动(北京)网络科技有限公司 A kind of mobile terminal based on interactive audio frequency watermark and television interaction method and system
CN109584890A (en) * 2018-12-18 2019-04-05 中央电视台 Audio frequency watermark insertion, extraction, television program interaction method and device
CN109697978A (en) * 2018-12-18 2019-04-30 百度在线网络技术(北京)有限公司 Method and apparatus for generating model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
An Optimized Anti-Noise Receiving Method Based on Spreading Spectrum Audio Watermarking System;Bei’er Xiong;2018 IEEE 4th International Conference on Computer and Communications;1351-1355 *
Blind and robust audio watermarking scheme based on SVD-DCT; Bai Ying Lei et al.; Signal Processing; 2011-03-01; Vol. 91, No. 8; 1973-1984 *
Research on Image Digital Watermarking Algorithm Based on Fourier Transform and Synchronization Template; Peng Shenbao; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; I138-83 *
Research on Audio Information Hiding Technology; Xiong Bei'er; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; I138-125 *

Also Published As

Publication number Publication date
CN113362835A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
Liu et al. Patchwork-based audio watermarking robust against de-synchronization and recapturing attacks
Lei et al. Blind and robust audio watermarking scheme based on SVD–DCT
Lin et al. Audio watermarking techniques
Wang et al. A norm-space, adaptive, and blind audio watermarking algorithm by discrete wavelet transform
Karajeh et al. A robust digital audio watermarking scheme based on DWT and Schur decomposition
Bhat K et al. A new audio watermarking scheme based on singular value decomposition and quantization
El-Khamy et al. A security enhanced robust audio steganography algorithm for image hiding using sample comparison in discrete wavelet transform domain and RSA encryption
Xiang Audio watermarking robust against D/A and A/D conversions
Charfeddine et al. A new DCT audio watermarking scheme based on preliminary MP3 study
Stanković Time-frequency analysis and its application in digital watermarking
Wang et al. A robust digital audio watermarking scheme using wavelet moment invariance
Dhar et al. Digital watermarking scheme based on fast Fourier transformation for audio copyright protection
Mosleh et al. A robust intelligent audio watermarking scheme using support vector machine
Pourhashemi et al. Audio watermarking based on synergy between Lucas regular sequence and Fast Fourier Transform
Attari et al. Robust audio watermarking algorithm based on DWT using Fibonacci numbers
Dutta et al. A perceptible watermarking algorithm for audio signals
CN113362835B (en) Audio watermarking method, device, electronic equipment and storage medium
Suresh et al. False-positive-free SVD based audio watermarking with integer wavelet transform
Hemis et al. Adjustable audio watermarking algorithm based on DWPT and psychoacoustic modeling
Zhang Audio dual watermarking scheme for copyright protection and content authentication
Yang et al. A robust digital audio watermarking using higher-order statistics
Subhashini et al. Robust audio watermarking for monitoring and information embedding
Liu et al. Pseudo-zernike moments-based audio content authentication algorithm robust against feature-analysed substitution attack
Naqash et al. Robust Audio Watermarking Based on Iterative Filtering
Orović et al. Speech signals protection via logo watermarking based on the time–frequency analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant