CN117223055A - Robust authentication of digital audio

Robust authentication of digital audio

Info

Publication number
CN117223055A
Authority
CN
China
Prior art keywords
watermark
digital audio
audio file
score
bandwidth
Prior art date
Legal status
Pending
Application number
CN202180059403.1A
Other languages
Chinese (zh)
Inventor
崔洋
王科
何磊
F·K-P·宋
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Publication of CN117223055A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 - Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

Solutions for authenticating digital audio include: generating a first band-limited watermark using a first key; generating a second band-limited watermark using a second key, wherein the bandwidth of the second watermark does not overlap the bandwidth of the first watermark; and embedding the first watermark and the second watermark into a segment of a digital audio file. The solutions further include: determining a first watermark score for the segment of the digital audio file for the first watermark using the first key; determining a second watermark score for the segment of the digital audio file for the second watermark using the second key; determining a probability that the digital audio file is watermarked based at least on the first watermark score and the second watermark score; and generating a report indicating whether the digital audio file is watermarked. In some examples, the solutions may also embed and decode messages.

Description

Robust authentication of digital audio
Background
Digital audio watermarking is a technique that assists in enforcing copyrights by using data hiding techniques to embed messages into digital audio content. The embedded messages can later be recovered, but should ideally remain inaudible to humans listening to the audio. However, hackers and pirates are aware of the use of watermarks and may therefore attempt to tamper with the watermark in a digital audio file, such as by attempting to overwrite it with a different watermark or by copying the recording in a way that erases or degrades the watermark. One such approach is to play the audio through a speaker and record the played audio into a different digital file. If the watermark becomes unrecoverable, the expected authentication value for copyright enforcement may be reduced or lost.
Conventional watermarking methods have a number of disadvantages. For example, multiple watermarks placed within the same audio segment may interfere with each other, possibly rendering one of the watermarks unrecoverable (destroying its authentication value), whereas common techniques, such as inserting a bit sequence, typically use the less significant bits, resulting in a watermark that is vulnerable to corruption. A common tradeoff with conventional watermarking methods is that increasing the robustness of authentication reduces transparency to the user, making the watermark potentially audible to humans and thus degrading the user's listening experience.
Disclosure of Invention
The disclosed examples are described in detail below with reference to the drawings listed below. The following summary is provided to illustrate some examples disclosed herein. However, this is not meant to limit all examples to any particular configuration or order of operation.
The solution for authenticating digital audio includes: receiving a digital audio file; generating a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; generating a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embedding the first watermark into a segment of the digital audio file; and embedding the second watermark into the segment of the digital audio file.
The solution for authenticating digital audio includes: receiving a digital audio file; determining a first watermark score for a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; determining a second watermark score for the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap the first bandwidth; determining a probability of watermarking of the digital audio file based at least on the first watermark score and the second watermark score; and generating a report indicating whether the digital audio file is watermarked based at least on determining the probability that the digital audio file is watermarked. In some examples, solutions for authenticating digital audio may also embed and decode messages.
Drawings
The disclosed examples are described in detail below with reference to the drawings listed below:
FIG. 1 illustrates an arrangement for robust authentication of digital audio;
FIG. 2 illustrates a spectrogram of an input audio segment and a watermarked audio segment that may be generated using the arrangement of FIG. 1;
Fig. 3 illustrates further details of the watermark embedding module of the arrangement of fig. 1;
fig. 4 illustrates the phase of generating a spread spectrum watermark that may occur in the arrangement of fig. 1;
fig. 5 illustrates the phase of generating an autocorrelation watermark that may occur in the arrangement of fig. 1;
FIG. 6 is a flowchart illustrating exemplary operations that may be performed by the arrangement of FIG. 1;
fig. 7 illustrates further details of the watermark detection module of the arrangement of fig. 1;
fig. 8 illustrates a stage of detecting a spread spectrum watermark that may occur in the arrangement of fig. 1;
fig. 9 illustrates a stage of detecting an autocorrelation watermark that may occur in the arrangement of fig. 1;
FIG. 10 illustrates a Machine Learning (ML) component that can be advantageously employed to enhance watermark detection in the arrangement of FIG. 1;
FIG. 11 is another flowchart illustrating exemplary operations that may be performed by the arrangement of FIG. 1;
FIG. 12 is another flowchart illustrating exemplary operations that may be performed by the arrangement of FIG. 1;
FIG. 13 is another flowchart illustrating exemplary operations that may be performed by the arrangement of FIG. 1;
FIG. 14 is a block diagram of an example computing environment suitable for implementing some of the various examples disclosed herein.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.
Detailed Description
Various examples will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References throughout this disclosure to specific examples and implementations are provided for illustrative purposes only and are not meant to limit all examples unless indicated to the contrary.
Solutions for authenticating digital audio include: generating a first band-limited watermark using a first key; generating a second band-limited watermark using a second key, wherein the bandwidth of the second watermark does not overlap the bandwidth of the first watermark; and embedding the first watermark and the second watermark into a segment of a digital audio file. The solutions further include: determining a first watermark score for the segment of the digital audio file for the first watermark using the first key; determining a second watermark score for the segment of the digital audio file for the second watermark using the second key; determining a probability that the digital audio file is watermarked based at least on the first watermark score and the second watermark score; and generating a report indicating whether the digital audio file is watermarked. In some examples, solutions for authenticating digital audio may also embed and decode messages.
Aspects of the present disclosure operate in an unconventional manner by embedding multiple (different) watermarks within the same segment of a digital audio file, placing each watermark in its own limited bandwidth within that segment. This technique allows the watermarks to coexist without interference, thereby improving robustness, such as tamper resistance. Aspects of the present disclosure also operate in an unconventional manner by detecting multiple watermarks within different frequency bands of the same segment of a digital audio file. This technique improves the reliability of watermark detection, and thus the robustness of the detection process when tampering has occurred.
The disclosed solutions for watermark embedding and detection employ a watermark embedding module and a watermark detection module. Watermark keys are used for synchronization parameters and provide additional security. In some examples, a Machine Learning (ML) component using a Neural Network (NN) is used to enhance robustness. By limiting the bandwidth of each watermark, multiple watermarks can be embedded in the same segment of digital audio without interference. Using multiple different watermarking schemes within the same segment of digital audio increases the likelihood of detecting at least one watermark despite natural noise and distortion, or even an intentional attack (i.e., increases robustness). An example is disclosed that uses a bandwidth of 6 kilohertz (KHz) to 8 KHz for one watermark and a bandwidth of 3-4 KHz for a second watermark.
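As a minimal illustration of the band-limiting idea, the following Python sketch (not the disclosed implementation) filters two pseudo-random watermark signals into the non-overlapping 6-8 KHz and 3-4 KHz bands before adding both to the same audio segment; the sample rate, filter order, and strength value are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Illustrative sketch only: band-limit two watermark signals to
# non-overlapping bands (6-8 kHz and 3-4 kHz) so they can coexist
# in the same audio segment without interfering.
fs = 48000                      # sample rate in Hz (assumed)
n = fs                          # one second of audio
rng = np.random.default_rng(0)

audio = rng.standard_normal(n) * 0.1          # stand-in for a real segment
wm_a = rng.choice([-1.0, 1.0], size=n)        # pseudo-random watermark A
wm_b = rng.choice([-1.0, 1.0], size=n)        # pseudo-random watermark B

def bandlimit(x, lo_hz, hi_hz, fs):
    """Restrict signal x to the band [lo_hz, hi_hz] with a Butterworth bandpass."""
    sos = butter(8, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

wm_a_band = bandlimit(wm_a, 6000, 8000, fs)   # first watermark: 6-8 kHz
wm_b_band = bandlimit(wm_b, 3000, 4000, fs)   # second watermark: 3-4 kHz

alpha = 0.01                                  # small strength keeps the watermarks inaudible
watermarked = audio + alpha * wm_a_band + alpha * wm_b_band
```

Because the two band-limited signals occupy disjoint frequency ranges, each can later be detected in its own band without being masked by the other.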
The solutions can be used for audio books, music, and other types of digital audio recordings where imperceptibility (perceptual transparency) is important to the user, such as high quality audio. Versions have been tested and produced a Mean Opinion Score (MOS) gap of less than 0.02 and a Comparison Mean Opinion Score (CMOS) gap of less than 0.05. Other advantages include low computational cost and low latency for real-time applications, and the flexibility to accommodate various sampling rates and quantization resolutions. Watermarks can be embedded in a variety of digital audio formats, for example sample rates from 8 KHz to 48 KHz and quantization from 8 bits to 48 bits, stored in WAV, PCM, OGG, MP3, OPUS, SILK, Siren, and other formats, including formats that use codecs for lossy compression.
Security is provided to resist brute-force cracking. For example, the use of two 96-bit keys is described, providing on the order of 2^96 possibilities per key against brute-force search. Robustness preserves performance despite distortion or corruption from transmission, playback and re-recording, noise, and even deliberate attacks. Versions have been successfully tested with noise levels ranging from -10 decibels (dB) to 30 dB. Intentional attacks that may be defeated by various examples of the present disclosure include synchronization attacks (which adjust the timing properties of the audio, e.g., making it faster or slower, swapping the order of certain audio segments, or inserting other audio segments); signal processing attacks (e.g., low-pass or high-pass filtering); and digital watermark attacks (which add new watermarks in an attempt to mask the original watermark(s)). Robustness has been demonstrated with more than 95% correct detection (a combined precision and recall measurement) in real-world scenarios.
Fig. 1 illustrates an arrangement 100 for robust authentication of digital audio. The digital audio file 102 is passed through the watermark embedding module 300 to become a watermarked digital audio file 104. The watermarked digital audio file 104 is distributed and stored on a digital medium 106. When it is desired to identify a watermark, the watermarked digital audio file 104 is passed through a watermark detection module 700 that outputs a watermark report 108 indicating the detection (or absence) of a watermark. Watermark embedding module 300 generates a first watermark using watermark key 402 and generates a second watermark using watermark key 502. Watermark detection module 700 uses watermark key 402 and watermark key 502 to detect a watermark.
In some examples, watermark message 110 is inserted into one of the watermarks embedded into digital audio file 102 by watermark embedding module 300, and is then extracted by watermark detection module 700. Watermark embedding module 300 is described in further detail in connection with fig. 3. Watermark detection module 700 is described in further detail in connection with fig. 7. Watermark keys 402 and 502 are described in further detail in connection with fig. 4 and 5, respectively.
In general, there are three requirements for the performance of digital audio watermarking. The first is imperceptibility, also known as perceptual transparency, which requires that the watermark not be audible to the human ear. The second is robustness, which measures the ability of the watermark to survive distortion or damage during transmission. The third is security, which refers to the complexity of brute-force cracking of the digital watermark. In general, the longer the key length, the higher the complexity, and the more secure the watermark.
There are various watermarking schemes, such as spread spectrum methods, which spread a pseudo-random sequence across the spectrum and then embed it into the audio; splice (patch) methods, which embed a watermark into two channels of a data block; Quantization Index Modulation (QIM); perceptual methods; and autocorrelation methods. Perceptual methods enhance robustness while improving the imperceptibility of the watermark by computing a psychoacoustic model. Autocorrelation methods divide the audio into several equal-length data blocks. For example, two blocks are used to embed different watermark vectors that are orthogonal to each other in the Discrete Cosine Transform (DCT) domain. For the detection procedure, the presence of the watermark is estimated by calculating the autocorrelation of the (watermarked) audio signal. The higher the correlation, the higher the probability that an autocorrelation watermark is present.
Fig. 2 illustrates a spectrogram 200a of a digital audio file segment 200 and a spectrogram 220a of a watermarked digital audio file segment 220. In operation, digital audio file segment 200 is input to watermark embedding module 300, which outputs watermarked digital audio file segment 220. Digital audio file segment 200 is a 1.4 second portion of digital audio file 102, and watermarked digital audio file segment 220 is a 1.4 second portion of watermarked digital audio file 104. A first watermark (e.g., spread spectrum watermark 410) occupies a first bandwidth 201, shown as 6-8 KHz, and a second watermark (e.g., autocorrelation watermark 510) occupies a second bandwidth 202, shown as 3-4 KHz. The 6-8 KHz bandwidth of the first watermark does not overlap the 3-4 KHz bandwidth of the second watermark. This allows the two watermarks to coexist in the same audio segment without interference. Close examination of fig. 2 reveals a slight difference within bandwidth 202 at about 0.6 seconds.
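The band separation can be inspected much as a spectrogram like that of fig. 2 would show it. The following Python sketch (illustrative only; the sample rate and the stand-in segment are assumptions) computes a spectrogram of a segment and compares the average energy in the 3-4 KHz and 6-8 KHz bands.

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative check of the two watermark bands on a 1.4 s segment.
fs = 48000
rng = np.random.default_rng(3)
segment = rng.standard_normal(int(1.4 * fs)) * 0.1   # stand-in for a real 1.4 s segment

f, t, sxx = spectrogram(segment, fs=fs, nperseg=1024)
band_lo = sxx[(f >= 3000) & (f <= 4000)].mean()      # second watermark band (3-4 kHz)
band_hi = sxx[(f >= 6000) & (f <= 8000)].mean()      # first watermark band (6-8 kHz)
print(f"3-4 kHz energy: {band_lo:.3e}, 6-8 kHz energy: {band_hi:.3e}")
```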
The autocorrelation (SC) method is employed in the lower frequency band (3-4 KHz) and is robust to reverberant scenarios. The Spread Spectrum (SS) method is employed in the higher frequency band (6-8 KHz) and is robust to additive noise scenarios. This combination provides better robustness than either method alone. At low frequencies, higher robustness may be achieved at the expense of imperceptibility, while at high frequencies, higher imperceptibility may be achieved at the expense of robustness. The autocorrelation method enhances imperceptibility at low frequencies, and the spread spectrum method enhances robustness at high frequencies. Spread spectrum watermark 410 is described in further detail in connection with fig. 4, and autocorrelation watermark 510 is described in further detail in connection with fig. 5.
Fig. 3 illustrates further details of the watermark embedding module 300. The watermark embedding module 300 includes a Linear Predictive Coding (LPC) analysis component 302 that receives the digital audio file 102. LPC analysis is used to decompose the audio signal into a spectral envelope and an excitation signal, improving imperceptibility and enhancing robustness in LPC-based codec scenarios. Watermark embedding module 300 then embeds the autocorrelation watermark and the spread spectrum watermark separately, although different watermark combinations may be used (including additional watermarks embedded in the same audio segment in other non-overlapping bandwidths).
The excitation signal from the LPC analysis component 302 is transformed by the DCT component 304. Autocorrelation embedding 340 generates autocorrelation watermark 510 as shown in fig. 5. An Inverse DCT (IDCT) component 314 transforms the audio data back into the time domain. Analysis filter bank 306 also follows LPC analysis component 302 and performs subband decomposition. Spread spectrum embedding 360 generates spread spectrum watermark 410, as shown in fig. 4, and synthesis filter bank 316 converts the signal for combination with the output of IDCT component 314. These orthogonal transforms, DCT and subband decomposition, preserve signal quality close to that of the original audio.
The strength of the watermark is controlled by a psychoacoustic strength control 308, which determines the strength of the audio power in any segment of the digital audio file 102 in which a watermark is to be embedded. The strength is controlled based on a psychoacoustic model of the human auditory system. The strength is a multiplicative factor applied to the watermark to ensure that the watermark energy remains below the human hearing threshold. A mask curve is calculated from the input audio according to the psychoacoustic model, and a strength factor is determined to control the strength of the watermark so that the energy of the watermark stays below the mask curve.
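A minimal sketch of the strength-control idea follows, assuming a fixed energy margin in place of a full psychoacoustic mask curve (the disclosure's psychoacoustic model is not reproduced here; the margin value is an assumption).

```python
import numpy as np

# Simplified stand-in for psychoacoustic strength control: derive a per-block
# strength factor that keeps the watermark energy a fixed margin below the
# energy of the host audio block.
def strength_factor(audio_block: np.ndarray,
                    watermark_block: np.ndarray,
                    margin_db: float = 20.0) -> float:
    """Return alpha so that alpha**2 * E(watermark) sits margin_db below E(audio)."""
    e_audio = np.mean(audio_block ** 2) + 1e-12
    e_wm = np.mean(watermark_block ** 2) + 1e-12
    target = e_audio * 10.0 ** (-margin_db / 10.0)   # allowed watermark energy
    return float(np.sqrt(target / e_wm))

rng = np.random.default_rng(1)
block = rng.standard_normal(1024) * 0.2
wm = rng.choice([-1.0, 1.0], size=1024)
alpha = strength_factor(block, wm)
watermarked_block = block + alpha * wm
```

A psychoacoustic model would replace the fixed margin with a frequency-dependent mask curve, but the role of the strength factor is the same: it scales the watermark below what a listener can hear.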
The LPC synthesis component 312 completes the process, embedding the autocorrelation watermark 510 and the spread spectrum watermark 410 into the digital audio file 102 and producing the watermarked digital audio file 104.
Fig. 4 illustrates various stages of generating the spread spectrum watermark 410. Spread spectrum watermark 410 is embedded into watermarked digital audio file segment 220 along with autocorrelation watermark 510. As illustrated, the watermark key 402 has three portions, each of which is 32 bits in some examples. These are a Pseudo Noise (PN) portion 406 that provides a PN generator seed, a permutation portion 404 that provides permutation information, and a symbol portion 408 that provides symbol information. The watermark message 110 is permuted into a permuted watermark message 414 according to a permutation array 412 generated from the permutation portion 404. A PN sequence 416 (values +1 and -1) is generated from the PN portion 406 and multiplied with the permuted watermark message 414. This is then multiplied by a symbol sequence 418 generated using the symbol portion 408. The result is combined with block 420 (along with the autocorrelation watermark 510) from the digital audio file segment 200 to produce the watermarked digital audio file segment 220.
The process can be expressed as:

$$\hat{x}_i = x_i + \alpha \, s_i \, g_i \, w_i \tag{1}$$

where $\hat{x}_i$ is the watermarked block, $x_i$ is the corresponding audio block, $\alpha$ is the intensity, $s_i$ is the symbol, $g_i$ is related to the energy of $x_i$, and $w_i$ is the watermark.
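For illustration, the following Python sketch embeds blocks in the spirit of equation (1); the key layout, block size, and the way the permutation, PN, and symbol sequences are derived from the 32-bit key portions are assumptions, not the disclosed construction.

```python
import numpy as np

# Hedged sketch of spread-spectrum embedding in the spirit of equation (1):
# x_hat_i = x_i + alpha * s_i * g_i * w_i.
def split_key(key_96bit: int):
    """Split a 96-bit key into three 32-bit portions: PN seed, permutation, symbols."""
    mask = (1 << 32) - 1
    return (key_96bit >> 64) & mask, (key_96bit >> 32) & mask, key_96bit & mask

def embed_spread_spectrum(blocks, message_bits, key_96bit, alpha=0.01):
    pn_seed, perm_seed, sym_seed = split_key(key_96bit)
    n_blocks, block_len = blocks.shape

    perm = np.random.default_rng(perm_seed).permutation(len(message_bits))
    permuted_msg = np.array(message_bits)[perm] * 2 - 1          # bits -> +/-1
    pn = np.random.default_rng(pn_seed).choice([-1.0, 1.0], size=(n_blocks, block_len))
    symbols = np.random.default_rng(sym_seed).choice([-1.0, 1.0], size=n_blocks)

    out = blocks.copy()
    for i in range(n_blocks):
        bit = permuted_msg[i % len(permuted_msg)]                # message spread over blocks
        g = np.sqrt(np.mean(blocks[i] ** 2) + 1e-12)             # energy-related factor g_i
        w = bit * pn[i]                                          # watermark w_i
        out[i] = blocks[i] + alpha * symbols[i] * g * w          # equation (1)
    return out

rng = np.random.default_rng(2)
blocks = rng.standard_normal((8, 512)) * 0.1
watermarked = embed_spread_spectrum(blocks, [1, 0, 1, 1, 0, 0, 1, 0],
                                    key_96bit=0x0123456789ABCDEF01234567)
```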
Fig. 5 illustrates various stages of generating the autocorrelation watermark 510, which is embedded into watermarked digital audio file segment 220 along with the spread spectrum watermark 410. As illustrated, the watermark key 502 has three portions, each of which is 32 bits in some examples. These are a position portion 504 that provides position information as a position array 514, an eigenvector portion 506 that provides eigenvector information, and a symbol portion 508 that provides symbol information. The position array 514 controls the positions, within eigenvector array 516, of eigenvectors V1 and V2 generated from eigenvector portion 506. The eigenvector array 516 provides an alternating embedded series of mutually orthogonal vectors, denoted V1 and V2. This is multiplied by a symbol sequence 518 generated using the symbol portion 508. The result is combined with block 420 (along with spread spectrum watermark 410) from the digital audio file segment 200 to produce the watermarked digital audio file segment 220.
The process can be expressed as:

$$\hat{x}_i = x_i + \alpha \, s_i \, g_i \, v_i \tag{2}$$

where $\hat{x}_i$ is the watermarked block, $x_i$ is the audio block, $\alpha$ is the intensity, $s_i$ is the symbol, $g_i$ is related to the energy of $x_i$, and $v_i$ is the watermark.
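A corresponding sketch for embedding in the spirit of equation (2) follows, with two mutually orthogonal vectors V1 and V2 alternated according to a position array; the vector generation and the derivation of the position and symbol sequences are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of autocorrelation-style embedding in the spirit of equation (2):
# x_hat_i = x_i + alpha * s_i * g_i * v_i.
def embed_autocorrelation(blocks, key_parts, alpha=0.01):
    pos_seed, vec_seed, sym_seed = key_parts
    n_blocks, block_len = blocks.shape

    vec_rng = np.random.default_rng(vec_seed)
    v1 = vec_rng.standard_normal(block_len)
    v2 = vec_rng.standard_normal(block_len)
    v2 -= v1 * (v1 @ v2) / (v1 @ v1)                 # make V2 orthogonal to V1
    v1 /= np.linalg.norm(v1)
    v2 /= np.linalg.norm(v2)

    positions = np.random.default_rng(pos_seed).integers(0, 2, size=n_blocks)  # position array
    symbols = np.random.default_rng(sym_seed).choice([-1.0, 1.0], size=n_blocks)

    out = blocks.copy()
    for i in range(n_blocks):
        v = v1 if positions[i] == 0 else v2          # alternate V1 / V2 by position
        g = np.sqrt(np.mean(blocks[i] ** 2) + 1e-12) # energy-related factor g_i
        out[i] = blocks[i] + alpha * symbols[i] * g * v   # equation (2)
    return out

rng = np.random.default_rng(4)
blocks = rng.standard_normal((8, 512)) * 0.1
watermarked = embed_autocorrelation(blocks, key_parts=(11, 22, 33))
```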
Fig. 6 is a flowchart 600 illustrating exemplary operations involved in embedding watermarks for authenticating digital audio. In some examples, the operations described with respect to flowchart 600 are performed by computing device 1400 of fig. 14. Flowchart 600 begins with operation 602, which comprises receiving digital audio file 102, and operation 604 comprises generating spread spectrum watermark 410 (first watermark) using watermark key 402 (first key), wherein spread spectrum watermark 410 is band limited to bandwidth 201 (first bandwidth). In some examples, spread spectrum watermark 410 contains watermark message 110. In some examples, bandwidth 201 extends from 6KHz to 8KHz.
Operation 606 comprises generating autocorrelation watermark 510 (second watermark) using watermark key 502 (second key), wherein autocorrelation watermark 510 is band limited to bandwidth 202 (second bandwidth). In some examples, the autocorrelation watermark 510 includes the watermark message 110 (or another watermark message). In some examples, bandwidth 202 extends from 3KHz to 4KHz. Operation 608 comprises embedding the spread spectrum watermark 410 into the digital audio file segment 200. Operation 610 comprises embedding the autocorrelation watermark 510 into the digital audio file segment 200. In some examples, the first bandwidth has a lower frequency limit above 5KHz and the second bandwidth has an upper frequency limit below 5KHz, such that the second bandwidth does not overlap the first bandwidth.
In some examples, the first watermark and the second watermark comprise different watermark schemes, each watermark scheme selected from a list comprising: spread spectrum watermarking, autocorrelation watermarking, and splice watermarking. In some examples, the first watermark comprises a spread spectrum watermark and is band limited to 6KHz to 8KHz. In some examples, the second watermark comprises an autocorrelation watermark and is band limited to 3KHz to 4KHz. In some examples, watermark key 402 includes a first set of at least 96 bits. In some examples, watermark key 502 includes a second set of at least 96 bits. In some examples, watermark key 502 has a different value than watermark key 402. In some examples, the key for the spread spectrum watermark includes three 32-bit portions, a first of the three portions serving as a PN generator seed, a second of the three portions providing permutation information, and a third of the three portions providing symbol information. In some examples, the key for the autocorrelation watermark comprises three 32-bit portions, a first of the three portions serving as a position array, a second of the three portions providing eigenvector information, and a third of the three portions providing symbol information.
In some examples, a third watermark (or more) may also be added to watermarked digital audio file fragment 220. For example, a splice watermark may be used as the third watermark. Thus, in an example using the third watermark, operation 612 includes generating the third watermark using the third key. In some examples, the third watermark is band limited to a third bandwidth. In some examples, the third bandwidth overlaps the first bandwidth or the second bandwidth. Operation 614 comprises embedding a third watermark in the digital audio file segment 200. Operation 616 comprises distributing the watermarked digital audio file 104.
Fig. 7 illustrates further details of the watermark detection module 700. The watermark detection module 700 includes an LPC analysis component 702 that receives the watermarked digital audio file 104. A search method is used to locate watermark embedding positions in the audio. After the search, the score of the watermark is calculated at the position where the probability of the watermark being present is maximized. The higher the score, the higher the probability that the watermark is present. Watermark detection module 700 separately detects both autocorrelation watermark 510 and spread spectrum watermark 410 (and/or other watermarks that may have been embedded in watermarked digital audio file 104).
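A minimal sketch of such a synchronization search follows, assuming a fixed step size and a caller-supplied scoring function (both assumptions for illustration, not the disclosed search method).

```python
import numpy as np

# Hedged sketch of the synchronization search: slide over candidate offsets,
# score each candidate position with a supplied scoring function, and keep
# the offset where the watermark is most likely present.
def search_embedding_position(audio, block_len, score_fn, step=64):
    best_offset, best_score = 0, -np.inf
    for offset in range(0, len(audio) - block_len, step):
        s = score_fn(audio[offset:offset + block_len])   # e.g. a correlation score
        if s > best_score:
            best_offset, best_score = offset, s
    return best_offset, best_score

# usage (illustrative): offset, score = search_embedding_position(audio, 512,
#     lambda b: float(b @ reference))   # 'audio' and 'reference' supplied by the caller
```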
The excitation signal from the LPC analysis component 702 is transformed by the DCT component 704. The autocorrelation watermark search 740 generates an autocorrelation watermark score 714, as shown in fig. 9. Analysis filter bank 706 also follows LPC analysis component 702 and performs subband decomposition. The spread spectrum watermark search 760 generates the spread spectrum watermark score 716, as shown in fig. 8. In some examples, to further enhance robustness, ML component 1000 generates an ML watermark score 1010, as shown in fig. 10. The various scores are combined into a composite watermark score 712, which is provided to a watermark decision component 718 (e.g., a watermark detector). Watermark decision component 718 generates and outputs watermark report 108, which indicates whether a watermark was detected in watermarked digital audio file 104 and/or reports any individual scores (e.g., composite watermark score 712, autocorrelation watermark score 714, spread spectrum watermark score 716, and/or ML watermark score 1010).
In some examples, if watermark decision component 718 detects a watermark in watermarked digital audio file 104, ML component 1000 and message decoder 720 output the recovered watermark message 110.
Fig. 8 illustrates the stages of detecting the spread spectrum watermark 410. The watermark key 402 used for detection is the same as the watermark key used for generation. The watermark message 110 is permuted into a permuted watermark message 814 according to a permutation array 812 generated from the permutation portion 404. This is multiplied by a symbol sequence 818 generated using the symbol portion 408. A PN sequence 816 (values +1 and -1) is generated from the PN portion 406 and multiplied by the product of the permuted watermark message 814 and the symbol sequence 818. The result is combined with a block 820 from the watermarked digital audio file segment 220 using a cross-correlation operation 822 to generate the spread spectrum watermark score 716.
The score process can be expressed as:

$$\mathrm{BER} = \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}\left[\rho_n < 0\right] \tag{3}$$

using

$$\rho_n = \frac{\langle \hat{x}_n,\; s_n\, w_n \rangle}{\lVert \hat{x}_n \rVert \, \lVert s_n\, w_n \rVert} \tag{4}$$

where $\rho_n$ is the correlation and BER represents the bit error rate. The BER varies from 0 (zero), if the watermark is detected without error, to 50%, if there is no trace of a watermark, since a random bit gives a correct or erroneous result with equal likelihood. Since the encoded watermark sequence is known, the BER can be calculated. The closer the BER is to 0, the higher the probability that the watermark is present. The closer the BER is to 50%, the lower the probability that the watermark is present.
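For illustration, the following sketch scores a sequence of blocks in the spirit of equations (3) and (4), deciding each bit from the sign of the correlation with the key-derived reference and mapping the resulting BER to a simple score; the reference construction and the final score mapping are assumptions.

```python
import numpy as np

# Hedged sketch of a spread-spectrum detection score: correlate each received
# block with the expected key-derived pattern, decide each bit by the sign of
# the correlation, and compute the bit error rate against the known sequence.
def spread_spectrum_score(blocks, expected_bits, pn, symbols):
    n_blocks = blocks.shape[0]
    errors = 0
    for i in range(n_blocks):
        ref = symbols[i] * pn[i]                       # expected watermark pattern
        rho = blocks[i] @ ref / (np.linalg.norm(blocks[i]) * np.linalg.norm(ref) + 1e-12)
        detected_bit = 1 if rho >= 0 else 0            # sign of the correlation
        errors += int(detected_bit != expected_bits[i % len(expected_bits)])
    ber = errors / n_blocks                            # 0 -> watermark present, ~0.5 -> absent
    return 1.0 - 2.0 * ber                             # simple score: 1 = certain, 0 = no trace
```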
Fig. 9 illustrates the stages of detecting the autocorrelation watermark 510. The watermark key 502 used for detection is the same as the watermark key used for generation. The position portion 504 provides position information to a position array 914, which controls the positions, within eigenvector array 916, of eigenvectors V1 and V2 generated from eigenvector portion 506. The eigenvector array 916 is multiplied by a symbol sequence 918 generated using the symbol portion 508. The result is combined with a block 820 from the watermarked digital audio file segment 220 using an autocorrelation operation 922 to generate the autocorrelation watermark score 714.
The score process can be expressed as:

$$R(\hat{x}) = \sum_{i} \langle \hat{x}_i,\; \hat{x}_{i+1} \rangle \tag{5}$$

and

$$R(\hat{x}) = R(x) + c \tag{6}$$

where $c$ is a scalar constant.
According to equations (5) and (6), if no watermark is present, the autocorrelation remains at a low level. However, if a watermark is present, the autocorrelation is increased by a roughly constant value contributed by the watermark. This makes it possible to determine whether a watermark is present.
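A minimal sketch of such a score follows, assuming the detector projects each block onto the expected key-derived vector and averages the symbol-weighted projections; this is an illustrative simplification of the autocorrelation operation 922, not the disclosed computation.

```python
import numpy as np

# Hedged sketch of an autocorrelation-style score: without a watermark the
# accumulated value stays near zero, while an embedded watermark adds a
# roughly constant, same-sign contribution per block.
def autocorrelation_score(blocks, v1, v2, positions, symbols):
    acc = 0.0
    for i in range(blocks.shape[0]):
        v = v1 if positions[i] == 0 else v2            # vector expected at this block
        proj = blocks[i] @ v / (np.linalg.norm(blocks[i]) + 1e-12)
        acc += symbols[i] * proj                        # watermark adds a constant-sign term
    return acc / blocks.shape[0]                        # near 0 if unwatermarked
```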
Fig. 10 illustrates further details of ML component 1000. Block 820 from the watermarked digital audio file segment 220 is provided to the feature extraction network 1002. Features from the feature extraction network 1002 are provided to the pooling layer 1004 and then to the classification network 1006. The softmax layer 1008 generates the ML watermark score 1010. Features from the feature extraction network 1002 are also provided to the decoder network 1012, which, together with the message decoder 720, outputs (recovers) the watermark message 110. In some examples, the feature extraction network 1002, classification network 1006, and decoder network 1012 comprise neural networks and are trained on thousands of hours of watermarked audio data using a multi-task training method and/or an adversarial training method.
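A structural sketch of such an ML component is shown below; the layer types and sizes are assumptions for illustration only and do not reflect the trained networks of the disclosure.

```python
import torch
import torch.nn as nn

# Hedged structural sketch mirroring Fig. 10: a feature extraction network,
# a pooling layer, a classification head with softmax for the watermark
# score, and a decoder head for recovering the embedded message.
class WatermarkDetector(nn.Module):
    def __init__(self, n_features=64, n_message_bits=32):
        super().__init__()
        self.feature_extraction = nn.Sequential(       # feature extraction network
            nn.Conv1d(1, n_features, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(n_features, n_features, kernel_size=9, padding=4), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)            # pooling layer
        self.classifier = nn.Linear(n_features, 2)     # watermarked vs. not
        self.decoder = nn.Linear(n_features, n_message_bits)  # message decoder head

    def forward(self, audio_block):                    # audio_block: (batch, 1, samples)
        feats = self.feature_extraction(audio_block)
        pooled = self.pool(feats).squeeze(-1)
        score = torch.softmax(self.classifier(pooled), dim=-1)[:, 1]  # ML watermark score
        message_logits = self.decoder(pooled)          # recovered message bits (logits)
        return score, message_logits

model = WatermarkDetector()
score, msg = model(torch.randn(4, 1, 16000))           # four one-second blocks at 16 kHz
```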
Fig. 11 is a flowchart 1100 illustrating exemplary operations involved in authenticating digital audio. In some examples, the operations described with respect to flowchart 1100 are performed by computing device 1400 of fig. 14. Flowchart 1100 begins with operation 1102, which comprises receiving a digital audio file (watermarked digital audio file 104), and operation 1104 comprises determining spread spectrum watermark score 716 (first watermark score) of digital audio file segment 220 for spread spectrum watermark 410 using watermark key 402, wherein spread spectrum watermark 410 is band limited to bandwidth 201. Operation 1106 comprises determining autocorrelation watermark score 714 (second watermark score) of the digital audio file segment 220 for the autocorrelation watermark 510 using watermark key 502, wherein the autocorrelation watermark 510 is band limited to bandwidth 202, and wherein bandwidth 202 does not overlap bandwidth 201.
In an example using a third watermark, operation 1108 comprises determining a watermark score of the digital audio file segment 220 for the third watermark using a third watermark key. Operation 1110 comprises determining ML watermark score 1010 (third watermark score) for digital audio file segment 220 using ML component 1000. In some examples, ML component 1000 includes feature extraction network 1002 and classification network 1006. In some examples, ML component 1000 further includes decoder network 1012.
Operation 1112 includes determining a probability that the watermarked digital audio file 104 is watermarked, based at least on the spread spectrum watermark score 716 and the autocorrelation watermark score 714. In some examples, determining the probability that the watermarked digital audio file 104 is watermarked includes determining the probability based at least on the spread spectrum watermark score 716, the autocorrelation watermark score 714, and the watermark score of the third watermark. In some examples, determining the probability that the watermarked digital audio file 104 is watermarked includes determining the probability based at least on the spread spectrum watermark score 716, the autocorrelation watermark score 714, and the ML watermark score 1010.
Decision operation 1114 determines whether to report the received digital audio file as watermarked. If not, in operation 1116, watermark report 108 indicates that a watermark was not found. Otherwise, operation 1118 comprises generating a watermark report 108 indicating that the digital audio file 102 is watermarked, based at least on the determined probability that the watermarked digital audio file 104 is watermarked. In some examples, a hard decision (decision operation 1114) is not used, and operation 1118 reports only the probability. Operations 1116 and 1118 together comprise generating watermark report 108, which indicates whether digital audio file 102 is watermarked. If a watermark is detected, operation 1120 comprises determining a decoded watermark message 110 using ML component 1000.
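A minimal sketch of combining the individual scores into a probability and a report follows, corresponding loosely to operations 1112 through 1118; the weights and the threshold are illustrative assumptions, not values from the disclosure.

```python
# Hedged sketch: combine the spread spectrum, autocorrelation, and ML scores
# into a composite score, a probability, and a yes/no report.
def watermark_report(ss_score, ac_score, ml_score,
                     weights=(0.4, 0.3, 0.3), threshold=0.5):
    composite = (weights[0] * ss_score
                 + weights[1] * ac_score
                 + weights[2] * ml_score)              # composite watermark score
    probability = max(0.0, min(1.0, composite))        # clamp to [0, 1]
    return {
        "probability_watermarked": probability,
        "watermark_detected": probability >= threshold,  # hard decision (optional)
    }

print(watermark_report(0.9, 0.7, 0.95))
```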
Fig. 12 is a flowchart 1200 illustrating exemplary operations involved in embedding watermarks for authenticating digital audio. In some examples, the operations described with respect to flowchart 1200 are performed by computing device 1400 of fig. 14. Flowchart 1200 begins with operation 1202, which comprises receiving a digital audio file. Operation 1204 comprises generating a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth. Operation 1206 comprises generating a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap the first bandwidth. Operation 1208 comprises embedding the first watermark into a segment of the digital audio file. Operation 1210 comprises embedding the second watermark into the segment of the digital audio file.
Fig. 13 is a flowchart 1300 illustrating exemplary operations involved in authenticating digital audio. In some examples, the operations described with respect to flowchart 1300 are performed by computing device 1400 of fig. 14. Flowchart 1300 begins with operation 1302, operation 1302 comprising receiving a digital audio file. Operation 1304 includes determining a first watermark score for a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth. Operation 1306 includes determining a second watermark score for the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap the first bandwidth. Operation 1308 comprises determining a probability that the digital audio file is watermarked based at least on the first watermark score and the second watermark score. Operation 1310 comprises generating a report indicating whether the digital audio file is watermarked based at least on determining the probability that the digital audio file is watermarked.
Additional examples
An example method of authenticating digital audio includes: receiving a digital audio file; generating a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; generating a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embedding the first watermark into a segment of the digital audio file; and embedding the second watermark into the segment of the digital audio file.
An example system for authenticating digital audio includes: a processor; and a computer readable medium storing instructions that when executed by the processor are operable to: receiving a digital audio file; generating a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; generating a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embedding the first watermark into a segment of the digital audio file; and embedding the second watermark into the segment of the digital audio file.
One or more example computer storage devices having stored thereon computer-executable instructions that, when executed by a computer, cause the computer to perform operations comprising: receiving a digital audio file; generating a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; generating a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embedding the first watermark into a segment of the digital audio file; and embedding the second watermark into the segment of the digital audio file.
An example method of authenticating digital audio includes: receiving a digital audio file; determining a first watermark score for a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; determining a second watermark score for the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap the first bandwidth; determining a probability of watermarking of the digital audio file based at least on the first watermark score and the second watermark score; and generating a report indicating whether the digital audio file is watermarked based at least on determining the probability that the digital audio file is watermarked.
An example system for authenticating digital audio includes: a processor; and a computer readable medium storing instructions that when executed by the processor are operable to: receiving a digital audio file; determining a first watermark score for a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; determining a second watermark score for the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap the first bandwidth; determining a probability of watermarking of the digital audio file based at least on the first watermark score and the second watermark score; and generating a report indicating whether the digital audio file is watermarked based at least on determining the probability that the digital audio file is watermarked.
One or more example computer storage devices having stored thereon computer-executable instructions that, when executed by a computer, cause the computer to perform operations comprising: receiving a digital audio file; determining a first watermark score for a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth; determining a second watermark score for the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap the first bandwidth; determining a probability of watermarking of the digital audio file based at least on the first watermark score and the second watermark score; and generating a report indicating whether the digital audio file is watermarked based at least on determining the probability that the digital audio file is watermarked.
Alternatively or additionally to other examples described herein, examples include any combination of the following:
the first watermark comprising a message;
the second watermark comprising a message;
the first bandwidth has a lower frequency limit above 5 KHz;
the first bandwidth extends from 6KHz to 8KHz;
the second bandwidth has an upper frequency limit below 5 KHz;
the second bandwidth extends from 3KHz to 4KHz;
the first watermark and the second watermark comprise different watermark schemes, each watermark scheme selected from a list comprising: spread spectrum watermarking, autocorrelation watermarking and splicing watermarking;
the first watermark comprises a spread spectrum watermark and is band limited to 6KHz to 8KHz;
the second watermark comprises an autocorrelation watermark and is band limited to 3KHz to 4KHz;
the first key comprises a first set of at least 96 bits;
the second key comprises a second set of at least 96 bits;
the second key has a different value than the first key;
the key for the spread spectrum watermark comprises three 32 bit portions, a first of which serves as a PN generator seed, a second of which provides permutation information, and a third of which provides symbol information;
the key for the autocorrelation watermark comprises three 32-bit portions, a first of which serves as an array of positions, a second of which provides eigenvector information, and a third of which provides symbol information;
generating a third watermark using the third key;
the third watermark is band limited to a third bandwidth;
the third bandwidth overlaps with the first bandwidth or the second bandwidth;
embedding the third watermark into the segment of the digital audio file;
determining a first watermark score for the segment of the digital audio file for the first watermark using the first key;
determining a second watermark score for the segment of the digital audio file for the second watermark using the second key;
determining a probability of watermarking of the digital audio file based at least on the first watermark score and the second watermark score;
determining a fourth watermark score for the segment of the digital audio file for a third watermark using a third key;
determining the probability that the digital audio file is watermarked includes: determining the probability of the digital audio file being watermarked based at least on the first watermark score, the second watermark score, and the fourth watermark score;
determining a third watermark score for the segment of the digital audio file using an ML component;
determining the probability that the digital audio file is watermarked includes: determining the probability of the digital audio file being watermarked based at least on the first watermark score, the second watermark score, and the third watermark score;
the ML component comprises a feature extraction network and a classification network;
determining a decoded watermark message using the ML component;
the ML component further includes a decoder network;
generating a report indicating whether the digital audio file is watermarked based at least on determining the probability that the digital audio file is watermarked;
generating the first watermark using the first key;
generating the second watermark using the second key;
embedding the first watermark into the segment of the digital audio file; and
the second watermark is embedded into the segment of the digital audio file.
While aspects of the disclosure have been described in terms of various examples and their associated operations, those skilled in the art will recognize that combinations of operations from any number of the different examples are also within the scope of aspects of the disclosure.
Example operating Environment
Fig. 14 is a block diagram of an example computing device 1400 for implementing aspects disclosed herein and designated generally as computing device 1400. Computing device 1400 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the examples disclosed herein. Neither should the computing device 1400 be interpreted as having any dependency or requirement relating to any one or combination of components/modules illustrated. Examples disclosed herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program components, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. The disclosed examples may be implemented in a variety of system configurations, including personal computers, laptop computers, smart phones, mobile tablets, hand-held devices, consumer electronics, professional computing devices, and the like. The disclosed examples may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
Computing device 1400 includes a bus 1410 that directly or indirectly couples the following devices: computer storage memory 1412, one or more processors 1414, one or more presentation components 1416, I/O ports 1418, I/O components 1420, power supply 1422, and network components 1424. Although computing device 1400 is depicted as a single device, multiple computing devices 1400 may work together and share the depicted device resources. For example, memory 1412 may be distributed across multiple devices, and processor(s) 1414 may be housed in different devices.
Bus 1410 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 14 are shown with lines for the sake of clarity, delineating the various components may be accomplished with alternative representations. For example, in some examples, a presentation component such as a display device is an I/O component, and some examples of the processor have their own memory. No distinction is made between such categories as "workstation," "server," "laptop," "handheld device," and the like, all of which are considered to be within the scope of FIG. 14 and are referred to herein as "computing devices." Memory 1412 may take the form of the computer storage media referenced below and is operable to provide storage of computer-readable instructions, data structures, program modules, and other data for computing device 1400. In some examples, memory 1412 stores one or more of an operating system, a general-purpose application platform, or other program modules and program data. Thus, the memory 1412 is capable of storing and accessing data 1412a and instructions 1412b, which are executable by the processor 1414 and configured to perform various operations disclosed herein.
In some examples, memory 1412 includes computer storage media in the form of volatile and/or nonvolatile memory, removable or non-removable memory, data disks in a virtual environment, or a combination thereof. Memory 1412 may include any number of memories associated with computing device 1400 or accessible by computing device 1400. Memory 1412 may be internal to computing device 1400 (as shown in fig. 14), external to computing device 1400 (not shown), or both (not shown). Examples of memory 1412 include, but are not limited to, random Access Memory (RAM); read Only Memory (ROM); electronically Erasable Programmable Read Only Memory (EEPROM); flash memory or other memory technology; CD-ROM, digital Versatile Disks (DVD) or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices; a memory wired to the analog computing device; or any other medium used to encode desired information and be accessed by computing device 1400. Additionally or alternatively, the memory 1412 may be distributed across multiple computing devices 1400, for example, in a virtualized environment in which instruction processing is performed on multiple devices 1400. For purposes of this disclosure, "computer storage medium," "computer storage memory," "memory," and "memory device" are synonymous terms of computer storage memory 1412, and none of these terms include carrier wave or propagated signaling.
Processor 1414 may include any number of processing units that read data from various entities such as memory 1412 or I/O component 1420. In particular, processor 1414 is programmed to execute computer-executable instructions for implementing aspects of the present disclosure. These instructions may be executed by a processor, by multiple processors within computing device 1400, or by a processor external to client computing device 1400. In some examples, processor 1414 is programmed to execute instructions such as those shown in the flowcharts and depicted in the figures discussed below. Moreover, in some examples, processor 1414 represents one implementation of simulation techniques to perform the operations described herein. For example, these operations may be performed by analog client computing device 1400 and/or digital client computing device 1400. The presentation component 1416 presents data indications to a user or other device. Exemplary presentation components include display devices, speakers, printing components, vibration components, and the like. Those skilled in the art will understand and appreciate that computer data may be presented in a variety of ways, such as visually in a Graphical User Interface (GUI), audibly through speakers, wirelessly between computing devices 1400, through a wired connection, or otherwise. The I/O ports 1418 allow the computing device 1400 to be logically coupled to other devices, some of which may be built-in, including the I/O component 1420. Example I/O components 1420 include, for example, but are not limited to, a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, or the like.
Computing device 1400 may operate in a networked environment using logical connections to one or more remote computers via a network component 1424. In some examples, the network component 1424 includes a network interface card and/or computer-executable instructions (e.g., drivers) for operating the network interface card. Communication between computing device 1400 and other devices may occur over any wired or wireless connection using any protocol or mechanism. In some examples, the network component 1424 is operable to communicate between public, private, or hybrid (public and private) devices using a transport protocol, short-range communication techniques (e.g., Near Field Communication (NFC), Bluetooth™ brand communication, etc.), or a combination thereof. Network component 1424 communicates with cloud resources 1428 across network 1430 via wireless communication link 1426 and/or wired communication link 1426a. Various examples of communication links 1426 and 1426a include wireless connections, wired connections, and/or dedicated links, and in some examples, at least a portion is routed through the internet.
While described in connection with an example computing device 1400, the examples of this disclosure are capable of being implemented with numerous other general purpose or special purpose computing system environments, configurations, or devices. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with aspects of the disclosure include, but are not limited to: smart phones, mobile tablets, mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, game consoles, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile phones, mobile computing and/or communication devices with wearable or accessory form factors (e.g., watches, glasses, headphones, or earplugs), network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, virtual Reality (VR) devices, augmented Reality (AR) devices, mixed reality devices, holographic devices, and the like. Such a system or device may accept input from a user in any manner, including from an input device such as a keyboard or pointing device, through gesture input, proximity input (such as through hovering), and/or through voice input.
Examples of the disclosure may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices in software, firmware, hardware, or combinations thereof. The computer-executable instructions may be organized into one or more computer-executable components or modules. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the disclosure may be implemented with any number of such components or modules, and any organization thereof. For example, aspects of the present disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other examples of the disclosure may include different computer-executable instructions or components having more or less functionality than illustrated and described herein. In examples involving general-purpose computers, aspects of the present disclosure transform general-purpose computers into special-purpose computing devices when configured to execute the instructions described herein.
By way of example, and not limitation, computer readable media comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable memory implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, and the like. Computer storage media are tangible and mutually exclusive to communication media. The computer storage media is implemented in hardware and excludes carrier waves and propagated signals. The computer storage media used for the purposes of this disclosure are not signals themselves. Exemplary computer storage media include hard disk, flash drive, solid state memory, phase change random access memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information for access by a computing device. In contrast, communication media typically embodies computer readable instructions, data structures, program modules, etc. in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The order of execution or performance of the operations in the examples of the disclosure illustrated and described herein is not essential; the operations may be performed in a different order in various examples. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure. When introducing elements of aspects of the present disclosure or the examples thereof, the articles "a," "an," "the," and "said" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements. The term "exemplary" is intended to mean "an example of." The phrase "one or more of the following: A, B, and C" means "at least one A and/or at least one B and/or at least one C."
Having described aspects of the present disclosure in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the disclosure defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

Claims (15)

1. A method of authenticating digital audio, the method comprising:
receiving a digital audio file;
generating a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth;
generating a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth;
embedding the first watermark into a segment of the digital audio file; and
embedding the second watermark into the segment of the digital audio file.
2. The method of claim 1, wherein the first bandwidth has a lower frequency limit above 5 kilohertz (kHz) and the second bandwidth has an upper frequency limit below 5 kHz.
3. The method of claim 1, wherein the first watermark and the second watermark comprise different watermark schemes, each watermark scheme selected from a list comprising:
spread spectrum watermarking, autocorrelation watermarking and splice watermarking.
4. The method of claim 1, further comprising:
determining a first watermark score for the segment of the digital audio file for the first watermark using the first key;
determining a second watermark score for the segment of the digital audio file for the second watermark using the second key; and
determining a probability that the digital audio file is watermarked based at least on the first watermark score and the second watermark score.
5. The method as recited in claim 4, further comprising:
determining a third watermark score for the segment of the digital audio file using a machine learning (ML) component, wherein determining the probability that the digital audio file is watermarked comprises: determining the probability that the digital audio file is watermarked based at least on the first watermark score, the second watermark score, and the third watermark score.
6. A system for authenticating digital audio, the system comprising:
a processor; and
a computer-readable medium storing instructions that, when executed by the processor, are operable to:
receiving a digital audio file;
generating a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth;
generating a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth;
embedding the first watermark into a segment of the digital audio file; and
embedding the second watermark into the segment of the digital audio file.
7. The system of claim 6, wherein the first bandwidth has a lower frequency limit above 5 kilohertz (kHz) and the second bandwidth has an upper frequency limit below 5 kHz.
8. The system of claim 6, wherein the first watermark and the second watermark comprise different watermark schemes, each watermark scheme selected from a list comprising:
spread spectrum watermarking, autocorrelation watermarking and splice watermarking.
9. The system of claim 6, wherein the instructions are further operable to:
determining a first watermark score for the segment of the digital audio file for the first watermark using the first key;
determining a second watermark score for the segment of the digital audio file for the second watermark using the second key; and
determining a probability that the digital audio file is watermarked based at least on the first watermark score and the second watermark score.
10. The system of claim 9, wherein the instructions are further operable to:
determining a third watermark score for the segment of the digital audio file using a machine learning (ML) component, wherein determining the probability that the digital audio file is watermarked comprises: determining the probability that the digital audio file is watermarked based at least on the first watermark score, the second watermark score, and the third watermark score.
11. One or more computer storage devices having stored thereon computer-executable instructions that, when executed by a computer, cause the computer to perform operations comprising:
receiving a digital audio file;
determining a first watermark score for a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band limited to a first bandwidth;
determining a second watermark score for the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band limited to a second bandwidth, and wherein the second bandwidth does not overlap the first bandwidth;
determining a probability that the digital audio file is watermarked based at least on the first watermark score and the second watermark score; and
generating, based at least on the determined probability, a report indicating whether the digital audio file is watermarked.
12. The one or more computer storage devices of claim 11, wherein the first bandwidth has a lower frequency limit above 5 kilohertz (kHz) and the second bandwidth has an upper frequency limit below 5 kHz.
13. The one or more computer storage devices of claim 11, wherein the first watermark and the second watermark comprise different watermark schemes, each watermark scheme selected from a list comprising:
spread spectrum watermarking, autocorrelation watermarking and splice watermarking.
14. The one or more computer storage devices of claim 11, wherein the operations further comprise:
determining a fourth watermark score for the segment of the digital audio file for a third watermark using a third key, wherein the third watermark is band limited to a third bandwidth, wherein the third bandwidth overlaps the first bandwidth or the second bandwidth, and wherein determining the probability that the digital audio file is watermarked comprises: determining the probability that the digital audio file is watermarked based at least on the first watermark score, the second watermark score, and the fourth watermark score.
15. The one or more computer storage devices of claim 11, wherein the operations further comprise:
generating the first watermark using the first key;
generating the second watermark using the second key;
embedding the first watermark into the segment of the digital audio file; and
embedding the second watermark into the segment of the digital audio file.
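For illustration only, and not as a statement of the claimed or any actual implementation, the embedding steps of claims 1 and 2 can be sketched as adding two key-seeded, spread-spectrum-style noise patterns to one segment of audio, each confined to one of two non-overlapping frequency bands. The sample rate, band edges, normalization, and embedding strength in the following Python sketch are assumptions chosen for readability, not values taken from the disclosure.

```python
# Illustrative sketch only -- not the claimed or any productized implementation.
import numpy as np

SAMPLE_RATE = 16_000                  # Hz (assumed)
BAND_1 = (5_200.0, 7_800.0)           # first bandwidth: lower limit above 5 kHz (assumed)
BAND_2 = (200.0, 4_800.0)             # second bandwidth: upper limit below 5 kHz, no overlap (assumed)


def band_limited_watermark(key, n_samples, band_hz, sample_rate=SAMPLE_RATE):
    """Key-seeded pseudorandom (spread-spectrum-style) watermark, band-limited
    with an FFT-domain brick-wall filter and normalized to unit RMS."""
    rng = np.random.default_rng(key)                  # the key seeds the generator
    noise = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    spectrum[(freqs < band_hz[0]) | (freqs > band_hz[1])] = 0.0   # keep only the band
    wm = np.fft.irfft(spectrum, n=n_samples)
    return wm / (np.std(wm) + 1e-12)


def embed(segment, key1, key2, strength=0.05):
    """Embed the first (high-band) and second (low-band) watermarks into one
    segment; `strength` trades imperceptibility against detection robustness."""
    n = len(segment)
    wm1 = band_limited_watermark(key1, n, BAND_1)
    wm2 = band_limited_watermark(key2, n, BAND_2)
    return segment + strength * wm1 + strength * wm2
```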
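Continuing the same sketch, and reusing SAMPLE_RATE, BAND_1, BAND_2, and band_limited_watermark from the block above, the detection steps of claims 4 and 11 can be pictured as computing a per-band normalized-correlation score for each key and fusing the two scores into a single probability from which a report is generated. The logistic fusion rule and its constants are likewise assumptions; a learned (ML) score, as recited in claims 5 and 10, could be fused in as an additional input.

```python
def watermark_score(segment, key, band_hz, sample_rate=SAMPLE_RATE):
    """Normalized correlation between the segment, restricted to the watermark's
    band, and the watermark regenerated from the key; near zero if absent."""
    n = len(segment)
    wm = band_limited_watermark(key, n, band_hz, sample_rate)
    spectrum = np.fft.rfft(segment)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    spectrum[(freqs < band_hz[0]) | (freqs > band_hz[1])] = 0.0   # isolate the band
    band_audio = np.fft.irfft(spectrum, n=n)
    denom = np.linalg.norm(band_audio) * np.linalg.norm(wm) + 1e-12
    return float(np.dot(band_audio, wm) / denom)


def detect(segment, key1, key2, threshold=0.5):
    """Fuse the two per-band scores into one probability and build a report."""
    s1 = watermark_score(segment, key1, BAND_1)
    s2 = watermark_score(segment, key2, BAND_2)
    prob = 1.0 / (1.0 + np.exp(-(s1 + s2 - 0.05) / 0.01))         # assumed fusion rule
    return {"first_score": s1, "second_score": s2,
            "probability": float(prob), "watermarked": bool(prob >= threshold)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(SAMPLE_RATE)          # one second of stand-in audio
    marked = embed(clean, key1=1234, key2=5678)
    print(detect(marked, 1234, 5678))                 # expected: watermarked True
    print(detect(clean, 1234, 5678))                  # expected: watermarked False
```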
CN202180059403.1A 2021-05-08 2021-05-08 Robust authentication of digital audio Pending CN117223055A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/092281 WO2022236451A1 (en) 2021-05-08 2021-05-08 Robust authentication of digital audio

Publications (1)

Publication Number Publication Date
CN117223055A (en) 2023-12-12

Family

ID=84027825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180059403.1A Pending CN117223055A (en) 2021-05-08 2021-05-08 Robust authentication of digital audio

Country Status (3)

Country Link
EP (1) EP4334934A1 (en)
CN (1) CN117223055A (en)
WO (1) WO2022236451A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7206649B2 (en) * 2003-07-15 2007-04-17 Microsoft Corporation Audio watermarking with dual watermarks
CN101290773B (en) * 2008-06-13 2011-03-30 清华大学 Adaptive mp3 digital watermark embedding and extracting method
CN102222504A (en) * 2011-06-10 2011-10-19 深圳市金光艺科技有限公司 Digital audio multilayer watermark implanting and extracting method

Also Published As

Publication number Publication date
WO2022236451A1 (en) 2022-11-17
EP4334934A1 (en) 2024-03-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination