CN113990330A - Method and device for embedding and identifying audio watermark based on deep network - Google Patents


Info

Publication number
CN113990330A
Authority
CN
China
Prior art keywords
network
audio
watermark
decoder
embedding
Prior art date
Legal status
Pending
Application number
CN202111250867.9A
Other languages
Chinese (zh)
Inventor
李平
蒋升
Current Assignee
Suirui Technology Group Co Ltd
Original Assignee
Suirui Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Suirui Technology Group Co Ltd
Priority to CN202111250867.9A
Publication of CN113990330A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a device for embedding and identifying audio watermarks based on a deep network, belonging to the field of audio digital watermarking. The embedding method comprises the following steps: S1: framing the original audio, and performing windowing and a short-time Fourier transform on each frame; S2: using the frequency-domain features extracted by the short-time Fourier transform as the input of a U-net network, and embedding the watermark information into the U-net network for encoding; S3: performing an inverse short-time Fourier transform on the encoded frequency-domain features to obtain the watermarked audio. The invention gives the watermark strong robustness under different types of noise scenes.

Description

Method and device for embedding and identifying audio watermark based on deep network
Technical Field
The invention belongs to the field of audio digital watermarks, and particularly relates to a method and a device for embedding and identifying an audio watermark based on a deep network.
Background
The digital watermarking technology is an information hiding technology. In an audio digital watermarking algorithm, a digital watermark is embedded into an audio file (such as wav, mp3 or avi) by a watermark embedding algorithm without significantly affecting the original sound quality of the audio file, or at least without the change being perceptible to the human ear. Conversely, the audio digital watermark can be extracted intact from the host audio file by a watermark extraction algorithm; the embedded and extracted information is referred to as the audio digital watermark.
Digital watermarking is not a new concept and has long been studied as a pure signal-processing problem. Classical methods hide information in encoded images and audio files and include LSB encoding, phase encoding and spread-spectrum watermarking, among others. Recently another class of watermarking methods has appeared that represents the information in the media through machine learning: a deep neural network performs the encoding and decoding to embed and recover the watermark.
In the prior art there is an image steganography algorithm in which the transmitted information is itself another image; this is a lossy transmission that can carry a large amount of information, and it shows the potential of machine learning for watermarking and steganography in the big-data era of artificial intelligence. There is also a method for steganography and watermarking in images that deliberately distorts the carrier with noise introduced after encoding, so that the trained encoded information becomes robust to the introduced noise; in this way the watermark can, for example, be made robust to lossy JPEG compression.
If these methods can be applied to audio, the characteristics of audio signals can be exploited to create even more robust watermarks in the audio domain.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
The invention aims to provide a method and a device for embedding and identifying an audio watermark based on a deep network, which can enable the watermark to have stronger robustness under different types of noise scenes.
In order to achieve the above object, the present invention provides a method for embedding an audio watermark based on a deep network, comprising the following steps:
S1: framing the original audio, and performing windowing and a short-time Fourier transform on each frame;
S2: using the frequency-domain features extracted by the short-time Fourier transform as the input of a U-net network, and embedding the watermark information into the U-net network for encoding;
S3: performing an inverse short-time Fourier transform on the encoded frequency-domain features to obtain the watermarked audio.
Further, in step S2, the up-sampling stage and the down-sampling stage of the U-net network use the same number of convolution levels, and the down-sampling layers are connected to the up-sampling layers by skip connections.
Furthermore, the down-sampling part of the U-net consists of four encoder blocks, each composed of a 2-dimensional convolutional network with batch normalization; after the frequency-domain features are mapped by the 4 encoder blocks, the resulting feature map is reduced to 8 x 2 x 256; the up-sampling is then performed by four further blocks, each likewise composed of a 2-dimensional convolutional network with batch normalization.
The invention also provides a method for identifying an audio watermark based on a deep network, comprising the following step:
S4: decoding the watermarked audio through a 2-dimensional convolutional network to obtain the watermark information, the decoding process using a 2-dimensional convolutional network with batch normalization.
Further, in step S4, a decoder is used for the decoding; the decoder outputs 32 prediction probability values between 0 and 1, and the watermark information is obtained through a decoder loss function.
The invention also provides a device for embedding an audio watermark based on a deep network, comprising a preprocessing module, an encoder and a processing module, wherein
the preprocessing module is used for framing the original audio and performing windowing and a short-time Fourier transform on each frame;
the encoder is used for taking the frequency-domain features extracted by the short-time Fourier transform as the input of a U-net network and embedding the watermark information into the U-net network for encoding;
and the processing module is used for performing an inverse short-time Fourier transform on the encoded frequency-domain features to obtain the watermarked audio.
Further, the up-sampling stage and the down-sampling stage of the U-net network used in the encoder adopt the same number of convolution levels, and the down-sampling layers are connected to the up-sampling layers by skip connections.
Furthermore, the down-sampling part of the U-net consists of four encoder blocks, each composed of a 2-dimensional convolutional network with batch normalization; after the frequency-domain features are mapped by the 4 encoder blocks, the resulting feature map is reduced to 8 x 2 x 256; the up-sampling is then performed by four further blocks, each likewise composed of a 2-dimensional convolutional network with batch normalization.
The invention also provides a device for identifying an audio watermark based on a deep network, comprising a decoder, wherein
the decoder is used for decoding the watermarked audio through a 2-dimensional convolutional network to obtain the watermark information, the decoding process using a 2-dimensional convolutional network with batch normalization.
Furthermore, the decoder outputs 32 prediction probability values between 0 and 1, and the watermark information is obtained through a decoder loss function.
Compared with prior-art algorithms, the method and device for embedding and identifying an audio watermark based on a deep network according to the invention can encode information and accurately extract it under various noise types, and have good robustness to noise.
Drawings
Fig. 1 is a flowchart of a method for embedding an audio watermark based on a deep network according to this embodiment.
Fig. 2 is a spectrogram of the signal obtained in step S1 after the short-time Fourier transform according to this embodiment.
Fig. 3 is a schematic diagram of a U-net network architecture framework adopted in step S2 according to an embodiment.
Fig. 4 is a schematic diagram of a network structure of a decoder according to an embodiment.
Fig. 5 is a flowchart of a method for identifying an audio watermark based on a deep network according to this embodiment.
Fig. 6 is a schematic diagram of an apparatus for embedding an audio watermark based on a deep network according to this embodiment.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments, so that those skilled in the art can better understand the scheme of the present invention.
As shown in fig. 1, an embodiment of the present invention is a method for embedding an audio watermark based on a deep network. It is a robust audio watermarking method based on U-net deep learning, where robustness means that the watermark can still be extracted after the audio has been played over the air and re-recorded.
The method for embedding the watermark specifically comprises the following steps:
s1: the original audio is framed and each frame is windowed and short-time fourier transformed.
The formula for performing the short-time fourier transform is as follows:
S(m,w)=DTFT{x(n)w(n-m*N/2)}
In the formula, S(m, w) is a two-dimensional function of time and frequency; x(n) is the speech signal; w(n) is the window function, shifted along the time axis in hops of N/2; m is the index of the window shift; N is 2048 points in the present invention; and DTFT denotes the discrete-time Fourier transform.
In the invention, the original audio has a sampling rate of 16000 Hz, a single 16-bit channel and a duration of 2 s, i.e. 32000 sampling points. It is framed and windowed with a window length of 2048 and a window shift of 1024, and the short-time Fourier transform (STFT) for non-stationary signals is then applied, yielding 1025 frequency bins and 32 time frames, each with an amplitude and a phase.
The invention uses only the frequency-domain amplitudes of bins 10-512, which correspond to roughly 87-4000 Hz; the other parts are kept unchanged. The resulting spectrogram of the signal after the short-time Fourier transform is shown in figure 2, where the abscissa is the time axis and the ordinate is the frequency axis.
The frequency-domain amplitudes of bins 10-512 are selected for the following reason: the channel plays an important role when audio is played or recorded in the air with hardware products (such as a mobile phone or a computer). This parameter range is chosen because low frequencies may be lost in the air, and higher frequencies are lost when the hardware bandwidth is limited.
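For concreteness, the following sketch reproduces step S1 and the band selection in PyTorch. The Hann window, the helper names and the use of torch.stft are choices made here for illustration, not details fixed by the invention.

```python
import torch

SR = 16000             # sampling rate (Hz)
N_FFT = 2048           # window length N
HOP = 1024             # window shift N/2
BAND = slice(10, 513)  # frequency bins 10-512, roughly 87-4000 Hz at 16 kHz

def analyze(audio: torch.Tensor) -> torch.Tensor:
    """audio: mono waveform of 32000 samples (2 s at 16 kHz).
    Returns a complex spectrogram of shape (1025, 32)."""
    window = torch.hann_window(N_FFT)  # window type assumed, not specified
    return torch.stft(audio, n_fft=N_FFT, hop_length=HOP,
                      window=window, return_complex=True)

wave = torch.randn(32000)        # stand-in for 2 s of 16 kHz mono audio
spec = analyze(wave)             # 1025 frequency bins x 32 time frames
band_mag = spec[BAND].abs()      # (503, 32) magnitudes fed to the U-net;
                                 # phase and remaining bins stay untouched
```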
S2: The 87 Hz-4000 Hz frequency-domain features extracted by the short-time Fourier transform are used as the input of the U-net network, and the watermark information is embedded into the U-net network for encoding.
The encoder used for the encoding is the U-net structure used by Jansson et al. (Andreas Jansson, Eric Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, and Tillman Weyde, "Singing voice separation with deep U-Net convolutional networks", 2017). The U-net is a U-shaped symmetric structure divided into a down-sampling stage and an up-sampling stage; the network contains only convolutional and pooling layers and no fully connected layer. The up-sampling and down-sampling stages use the same number of convolution levels, and skip connections link the down-sampling layers to the up-sampling layers, so that the features extracted by the down-sampling layers are passed directly to the up-sampling layers and the fine-grained features of the U-net network become more accurate.
The U-net network framework adopted in the invention is shown in figure 3. The down-sampling part consists of four encoder blocks, each composed of a 2-dimensional convolutional network with stride 2, a 5 x 5 convolution kernel, batch normalization and a ReLU activation function. After the frequency-domain features are mapped by the 4 encoder blocks, the resulting feature map is reduced to 8 x 2 x 256, and the watermark information is inserted at this bottleneck as an 8 x 2 x 32 block. To recover the original input size, the up-sampling is likewise performed by four blocks, each composed of a 2-dimensional convolutional network with stride 2, a 5 x 5 convolution kernel, batch normalization and a ReLU activation function. A final 2-dimensional convolution with stride 1 and a 5 x 5 kernel reduces the number of channels to 1.
To this end, the 1 x 32 watermark information is stacked and repeated into an 8 x 2 x 32 block and concatenated to the U-net bottleneck.
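As a concrete illustration, the following PyTorch sketch follows the shapes described above: stride-2 encoder blocks with 5 x 5 kernels, batch normalization and ReLU; an 8 x 2 x 256 bottleneck into which an 8 x 2 x 32 watermark block is concatenated; skip connections between mirrored layers; and a final stride-1 convolution reducing the channels to 1. The channel widths, the 128 x 32 input map (chosen so that four halvings yield the 8 x 2 bottleneck) and the use of transposed convolutions for up-sampling are assumptions made here, since the invention does not fix them.

```python
import torch
import torch.nn as nn

def down_block(c_in, c_out):
    # stride-2 5x5 convolution + batch normalization + ReLU, as described
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=5, stride=2, padding=2),
        nn.BatchNorm2d(c_out), nn.ReLU())

def up_block(c_in, c_out):
    # transposed 5x5 convolution that exactly doubles both spatial dims
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=5, stride=2,
                           padding=2, output_padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU())

class WatermarkUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.d1, self.d2 = down_block(1, 32), down_block(32, 64)
        self.d3, self.d4 = down_block(64, 128), down_block(128, 256)
        self.u1 = up_block(256 + 32, 128)  # +32 watermark channels at bottleneck
        self.u2 = up_block(128 + 128, 64)  # skip connection from d3
        self.u3 = up_block(64 + 64, 32)    # skip connection from d2
        self.u4 = up_block(32 + 32, 16)    # skip connection from d1
        self.final = nn.Conv2d(16, 1, kernel_size=5, stride=1, padding=2)

    def forward(self, x, bits):
        # x: (B, 1, 128, 32) magnitude map; bits: (B, 32) float 0/1 watermark
        e1 = self.d1(x)                    # (B, 32, 64, 16)
        e2 = self.d2(e1)                   # (B, 64, 32, 8)
        e3 = self.d3(e2)                   # (B, 128, 16, 4)
        e4 = self.d4(e3)                   # (B, 256, 8, 2) bottleneck
        wm = bits.view(-1, 32, 1, 1).expand(-1, -1, 8, 2)  # 1x32 -> 8x2x32
        u = self.u1(torch.cat([e4, wm], dim=1))
        u = self.u2(torch.cat([u, e3], dim=1))
        u = self.u3(torch.cat([u, e2], dim=1))
        u = self.u4(torch.cat([u, e1], dim=1))
        return self.final(u)               # (B, 1, 128, 32), channels -> 1

net = WatermarkUNet()
out = net(torch.randn(4, 1, 128, 32), torch.randint(0, 2, (4, 32)).float())
```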
S3: An inverse short-time Fourier transform is performed on the encoded frequency-domain features to obtain the watermarked audio.
The formula of the inverse short-time Fourier transform is as follows:
y(n) = [ Σ_m S(m,n) w(n-m*N/2) ] / [ Σ_m w²(n-m*N/2) ]
In the formula, y(n) is the signal reconstructed from the short-time Fourier transform; S(m, n) is the function obtained by applying the inverse discrete Fourier transform to the time-frequency spectrum S(m, w); w(n) is the window function; and N is 2048 points.
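Continuing the S1 sketch under the same assumptions, step S3 writes the encoded magnitudes back into bins 10-512, reuses the original phases and untouched bins, and inverts the transform. Here encoded_mag is only a placeholder for the U-net output on that band.

```python
# encoded_mag: (503, 32) magnitudes after watermark encoding; the identity
# assignment below is a placeholder so the sketch runs end to end.
encoded_mag = band_mag

spec_out = spec.clone()
spec_out[BAND] = torch.polar(encoded_mag, spec[BAND].angle())  # keep phase
watermarked = torch.istft(spec_out, n_fft=N_FFT, hop_length=HOP,
                          window=torch.hann_window(N_FFT), length=32000)
```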
As shown in fig. 5, an embodiment of the present invention is a method for identifying an audio watermark based on a deep network, which includes the following step:
S4: The watermarked audio is decoded to obtain the watermark information.
A decoder is used for the decoding. The decoder is a multi-label classifier that takes the spectrogram of the speech signal as input and outputs 32 prediction probability values between 0 and 1; the bit corresponding to each predicted value is a binary bit encoded into the audio watermark, i.e. 0 or 1.
The network structure of the decoder is shown in fig. 4. Specifically, the decoding is performed by 6 decoder blocks, each of which is a 2-dimensional convolutional network with batch normalization and a ReLU activation function. Different strides are tried during training, and the stride that minimizes the loss function is kept. The feature map after the last convolution has size 32 x 1, and the last layer is a fully connected layer with 32 output neurons and a sigmoid function. Each output is a value between 0 and 1, and the decoder loss function is the cross entropy between the output of the decoder network and the binary watermark message of the encoder. The watermark information is finally obtained from these outputs.
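The following PyTorch sketch mirrors this description: six Conv2d + BatchNorm + ReLU blocks, then a fully connected layer with 32 outputs and a sigmoid. The invention treats the per-block strides as tunable (the values minimizing the training loss are kept); the fixed strides, the channel widths and the 128 x 32 input map assumed below are illustrative only.

```python
import torch
import torch.nn as nn

class WatermarkDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [1, 16, 32, 64, 64, 128, 128]
        strides = [2, 2, 2, 1, (2, 1), 1]   # assumed; tuned during training
        layers = []
        for c_in, c_out, s in zip(chans[:-1], chans[1:], strides):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=5, stride=s, padding=2),
                       nn.BatchNorm2d(c_out),
                       nn.ReLU()]
        self.conv = nn.Sequential(*layers)
        self.fc = nn.LazyLinear(32)          # fully connected, 32 output neurons

    def forward(self, spec_mag):
        # spec_mag: (B, 1, 128, 32) magnitude spectrogram of watermarked audio
        h = self.conv(spec_mag).flatten(1)
        return torch.sigmoid(self.fc(h))     # 32 probabilities in (0, 1)

decoder = WatermarkDecoder()
probs = decoder(torch.randn(4, 1, 128, 32))  # (4, 32) predicted bit probabilities
```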
Wherein, the formula of the ReLU function is as follows:
y(x)=max(0,x)
the use of the ReLU function can overcome the problem of gradient disappearance during training and speed up the training.
The formula for the loss function is as follows:
Loss = - Σ_{i=1..M} Σ_{c=0,1} y_ic log(pre_ic)
In the formula, Loss denotes the value of the loss function; y_ic is the label (0 or 1) of a binary watermark bit; and pre_ic is the predicted probability that the watermarked speech signal belongs to class c. The information embedded in the audio watermark is a bit, i.e. 0 or 1, so in this patent there are two classes and the value of c is 0 or 1. M is the number of watermark labels; 10,000 different watermark labels are adopted in the invention.
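This is the standard binary cross-entropy over the 32 predicted bit probabilities. Assuming the decoder sketch above, a minimal PyTorch equivalent is:

```python
import torch
import torch.nn.functional as F

probs = decoder(torch.randn(4, 1, 128, 32))  # (4, 32) predicted probabilities
bits = torch.randint(0, 2, (4, 32)).float()  # ground-truth watermark bits
loss = F.binary_cross_entropy(probs, bits)   # the cross entropy defined above
```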
By the above method, information can be encoded and accurately extracted under various noise types, and the watermark model has good robustness to noise.
As shown in fig. 6, an embodiment of the present invention is an apparatus for embedding an audio watermark based on a deep network, and the apparatus includes a preprocessing module 1, an encoder 2, and a processing module 3.
The preprocessing module 1 is used to frame the original audio and perform windowing and a short-time Fourier transform on each frame.
The formula for performing the short-time fourier transform is as follows:
S(m,w)=DTFT{x(n)w(n-m*N/2)}
In the formula, S(m, w) is a two-dimensional function of time and frequency; x(n) is the speech signal; w(n) is the window function, shifted along the time axis in hops of N/2; m is the index of the window shift; N is 2048 points in the present invention; and DTFT denotes the discrete-time Fourier transform.
In the invention, the original audio has a sampling rate of 16000 Hz, a single 16-bit channel and a duration of 2 s, i.e. 32000 sampling points. It is framed and windowed with a window length of 2048 and a window shift of 1024, and the short-time Fourier transform (STFT) for non-stationary signals is then applied, yielding 1025 frequency bins and 32 time frames, each with an amplitude and a phase.
The invention uses only the frequency-domain amplitudes of bins 10-512, which correspond to roughly 87-4000 Hz; the other parts are kept unchanged. The resulting spectrogram of the signal after the short-time Fourier transform is shown in figure 2, where the abscissa is the time axis and the ordinate is the frequency axis.
The encoder 2 is used for taking the 87 Hz-4000 Hz frequency-domain features extracted by the short-time Fourier transform as the input of the U-net network and embedding the watermark information into the U-net network for encoding.
The encoder adopts the U-net structure used by Jansson et al. The U-net is a U-shaped symmetric structure divided into a down-sampling stage and an up-sampling stage; the network contains only convolutional and pooling layers and no fully connected layer. The up-sampling and down-sampling stages use the same number of convolution levels, and skip connections link the down-sampling layers to the up-sampling layers, so that the features extracted by the down-sampling layers are passed directly to the up-sampling layers and the fine-grained features of the U-net network become more accurate.
The U-net network framework adopted in the invention is shown in figure 3. The down-sampling part consists of four encoder blocks, each composed of a 2-dimensional convolutional network with stride 2, a 5 x 5 convolution kernel, batch normalization and a ReLU activation function. After the frequency-domain features are mapped by the 4 encoder blocks, the resulting feature map is reduced to 8 x 2 x 256, and the watermark information is inserted at this bottleneck as an 8 x 2 x 32 block. To recover the original input size, the up-sampling is likewise performed by four blocks, each composed of a 2-dimensional convolutional network with stride 2, a 5 x 5 convolution kernel, batch normalization and a ReLU activation function. A final 2-dimensional convolution with stride 1 and a 5 x 5 kernel reduces the number of channels to 1.
In this way, the encoder 2 of the invention stacks and repeats the 1 x 32 watermark information into an 8 x 2 x 32 block that is concatenated to the U-net bottleneck.
The processing module 3 is configured to perform an inverse short-time Fourier transform on the encoded frequency-domain features to obtain the watermarked audio.
The formula of the short-time inverse Fourier transform is as follows:
y(n) = [ Σ_m S(m,n) w(n-m*N/2) ] / [ Σ_m w²(n-m*N/2) ]
In the formula, y(n) is the signal reconstructed from the short-time Fourier transform; S(m, n) is the function obtained by applying the inverse discrete Fourier transform to the time-frequency spectrum S(m, w); w(n) is the window function; and N is 2048 points.
One embodiment of the present invention is an apparatus for identifying an audio watermark based on a deep network, which includes a decoder 4.
The decoder 4 is configured to decode the watermarked audio to obtain the watermark information.
A decoder is used for the decoding. The decoder is a multi-label classifier that takes the spectrogram of the speech signal as input and outputs 32 prediction probability values between 0 and 1, together with the bit corresponding to each predicted value.
The network structure of the decoder is as shown in fig. 4. The decoding is performed by 6 decoder blocks, each of which is a 2-dimensional convolutional network with batch normalization and a ReLU activation function. Different strides are tried during training, and the stride that minimizes the loss function is kept. The feature map after the last convolution has size 32 x 1, and the last layer is a fully connected layer with 32 output neurons and a sigmoid function. Each output is a value between 0 and 1, and the decoder loss function is the cross entropy between the output of the decoder network and the binary watermark message of the encoder. The watermark information is finally obtained from these outputs.
Wherein, the formula of the ReLU function is as follows:
y(x)=max(0,x)
the use of the ReLU function can overcome the problem of gradient disappearance during training and speed up the training.
The formula for the loss function is as follows:
Loss = - Σ_{i=1..M} Σ_{c=0,1} y_ic log(pre_ic)
In the formula, Loss denotes the value of the loss function; y_ic is the label (0 or 1) of a binary watermark bit; pre_ic is the predicted probability that the watermarked speech signal belongs to class c; and M is the number of watermark labels, with 10,000 different watermark labels adopted in the invention.
The invention has been explained in detail through specific examples; the above description of the embodiments is only intended to help understand the core idea of the invention. It should be understood that any obvious modifications, equivalents and other improvements made by those skilled in the art without departing from the spirit of the present invention fall within the scope of the present invention.

Claims (10)

1. A method for embedding an audio watermark based on a deep network, characterized by comprising the following steps:
S1: framing the original audio, and performing windowing and a short-time Fourier transform on each frame;
S2: using the frequency-domain features extracted by the short-time Fourier transform as the input of a U-net network, and embedding the watermark information into the U-net network for encoding;
S3: performing an inverse short-time Fourier transform on the encoded frequency-domain features to obtain the watermarked audio.
2. The method for embedding an audio watermark according to claim 1, wherein in step S2 the up-sampling stage and the down-sampling stage of the U-net network use the same number of convolution levels, and the down-sampling layers and the up-sampling layers are connected by skip connections.
3. The method for embedding an audio watermark based on a deep network according to claim 2, wherein the down-sampling part of the U-net consists of four encoder blocks, each composed of a 2-dimensional convolutional network with batch normalization; after the frequency-domain features are mapped by the 4 encoder blocks, the resulting feature map is reduced to 8 x 2 x 256; and the up-sampling is then performed by four further blocks, each likewise composed of a 2-dimensional convolutional network with batch normalization.
4. A method for identifying an audio watermark based on a deep network, characterized by comprising the following step:
S4: decoding the watermarked audio through a 2-dimensional convolutional network to obtain the watermark information, the decoding process using a 2-dimensional convolutional network with batch normalization.
5. The method for identifying an audio watermark based on a deep network according to claim 4, wherein in step S4 a decoder is used for the decoding; the decoder outputs 32 prediction probability values between 0 and 1, and the watermark information is obtained through a decoder loss function.
6. A device for embedding an audio watermark based on a deep network, characterized by comprising a preprocessing module, an encoder and a processing module, wherein
the preprocessing module is used for framing the original audio and performing windowing and a short-time Fourier transform on each frame;
the encoder is used for taking the frequency-domain features extracted by the short-time Fourier transform as the input of a U-net network and embedding the watermark information into the U-net network for encoding;
and the processing module is used for performing an inverse short-time Fourier transform on the encoded frequency-domain features to obtain the watermarked audio.
7. The device according to claim 6, wherein the up-sampling stage and the down-sampling stage of the U-net network used in the encoder adopt the same number of convolution levels, and the down-sampling layers and the up-sampling layers are connected by skip connections.
8. The device for embedding an audio watermark based on a deep network according to claim 7, wherein the down-sampling part of the U-net consists of four encoder blocks, each composed of a 2-dimensional convolutional network with batch normalization; after the frequency-domain features are mapped by the 4 encoder blocks, the resulting feature map is reduced to 8 x 2 x 256; and the up-sampling is then performed by four further blocks, each likewise composed of a 2-dimensional convolutional network with batch normalization.
9. A device for identifying an audio watermark based on a deep network, characterized by comprising a decoder, wherein
the decoder is used for decoding the watermarked audio through a 2-dimensional convolutional network to obtain the watermark information, the decoding process using a 2-dimensional convolutional network with batch normalization.
10. The device for identifying an audio watermark based on a deep network according to claim 9, wherein the decoder outputs 32 prediction probability values between 0 and 1, and the watermark information is obtained through a decoder loss function.
CN202111250867.9A 2021-10-26 2021-10-26 Method and device for embedding and identifying audio watermark based on deep network Pending CN113990330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111250867.9A CN113990330A (en) 2021-10-26 2021-10-26 Method and device for embedding and identifying audio watermark based on deep network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111250867.9A CN113990330A (en) 2021-10-26 2021-10-26 Method and device for embedding and identifying audio watermark based on deep network

Publications (1)

Publication Number Publication Date
CN113990330A true CN113990330A (en) 2022-01-28

Family

ID=79741966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111250867.9A Pending CN113990330A (en) 2021-10-26 2021-10-26 Method and device for embedding and identifying audio watermark based on deep network

Country Status (1)

Country Link
CN (1) CN113990330A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114936962A (en) * 2022-06-23 2022-08-23 晋城市大锐金马工程设计咨询有限公司 One-to-one full text watermark encryption adding technology based on document
CN115018734A (en) * 2022-07-15 2022-09-06 北京百度网讯科技有限公司 Video restoration method and training method and device of video restoration model
CN115018734B (en) * 2022-07-15 2023-10-13 北京百度网讯科技有限公司 Video restoration method and training method and device of video restoration model
CN117935820A (en) * 2024-03-21 2024-04-26 北京和人广智科技有限公司 Watermark batch embedding method for audio library

Similar Documents

Publication Publication Date Title
CN113990330A (en) Method and device for embedding and identifying audio watermark based on deep network
CN111091841B (en) Identity authentication audio watermarking algorithm based on deep learning
US6219634B1 (en) Efficient watermark method and apparatus for digital signals
Ahani et al. A sparse representation-based wavelet domain speech steganography method
CN1311581A (en) Method and device for computerized voice data hidden
CN109996073B (en) Image compression method, system, readable storage medium and computer equipment
Shirali-Shahreza et al. High capacity error free wavelet domain speech steganography
Kumsawat A genetic algorithm optimization technique for multiwavelet-based digital audio watermarking
Mosleh et al. A robust intelligent audio watermarking scheme using support vector machine
Ye et al. Heard more than heard: An audio steganography method based on gan
JP2005513543A (en) QIM digital watermarking of multimedia signals
Baziyad et al. Maximizing embedding capacity for speech steganography: a segment-growing approach
CN111292756B (en) Compression-resistant audio silent watermark embedding and extracting method and system
Bao et al. MP3-resistant music steganography based on dynamic range transform
CN114999502B (en) Adaptive word framing based voice content watermark generation and embedding method and voice content integrity authentication and tampering positioning method
Irawati et al. QR-based watermarking in audio subband using DCT
Wei et al. Lightweight AAC Audio Steganalysis Model Based on ResNeXt
Wei et al. Controlling bitrate steganography on AAC audio
KR20030016381A (en) Watermarking
Peng et al. Optimal audio watermarking scheme using genetic optimization
Zhang et al. A CNN based visual audio steganography model
CN110047495B (en) High-capacity audio watermarking algorithm based on 2-level singular value decomposition
Chowdhury A Robust Audio Watermarking In Cepstrum Domain Composed Of Sample's Relation Dependent Embedding And Computationally Simple Extraction Phase
Wang et al. Audio zero watermarking for MP3 based on low frequency energy
Tegendal Watermarking in audio using deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination