WO2011160966A1 - Audio watermarking - Google Patents

Audio watermarking

Info

Publication number
WO2011160966A1
Authority
WO
WIPO (PCT)
Prior art keywords
time frame
magnitude
audio signal
frequency
signal
Prior art date
Application number
PCT/EP2011/059688
Other languages
French (fr)
Inventor
Jian Wang
Ron Healy
Joseph Timoney
Original Assignee
National University Of Ireland, Maynooth
Priority date
Filing date
Publication date
Application filed by National University Of Ireland, Maynooth filed Critical National University Of Ireland, Maynooth
Publication of WO2011160966A1 publication Critical patent/WO2011160966A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018Audio watermarking, i.e. embedding inaudible data in the audio signal


Abstract

A method of providing a digital watermark in an audio signal comprises selecting a key frequency value determining how watermark information is to be embedded into a first time frame of the audio signal. A plurality of discrete frequency component values of the audio signal is provided for the first time frame. At least two frequency components for the time frame are selected as a function of the key frequency value. The two frequency components are tested to determine if they meet a given mutual criterion for the signal in the first time frame. If the components do not meet the criterion, the magnitude of at least one of the two frequency components is adjusted in the first time frame.

Description

Audio Watermarking
Field of the Invention
The present invention relates to steganography for digital audio files.
Background
Steganography comprises concealing a message, image, or file within another message, image, or file. Digital watermarking of audio and/or video is a form of steganography, in that audio or video can be used to 'hide' the presence of other information. In recent years, digital watermarking of audio/video files has been considered in an attempt to protect, track, identify or authenticate media such as photographs, music and/or movies. Moulin, P., & Koetter, R., "Data-Hiding Codes", Proc. Of the IEEE, Vol. 93, No. 12, Dec. 2005 provides an overview of techniques for hiding data in cover signals.
It is an object of the present invention to provide an improved method and apparatus for adding watermark information to an audio file in a relatively processor-efficient manner and without unduly affecting the quality of the audio signal.
Summary of the Invention
According to the present invention there is provided a method of providing a digital watermark in an audio signal according to claim 1.
According to a second aspect there is provided a method of extracting a digital watermark from an audio signal according to claim 13.
In further aspects there are provided a corresponding encoder, a decoder, a computer program product and an audio signal watermarked according to the invention.
Brief Description of the Drawings
Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1 is a flow diagram for adding watermark information to an audio file in accordance with an embodiment of the invention;
Figure 2 is a flow diagram illustrating the operation of CSPE employed in embodiments of the present invention;
Figure 3 is an exemplary output after CSPE transformation of an audio signal;
Figure 4 is a flow diagram illustrating the extraction of watermark information from an audio signal;
Figure 5 is a flow diagram for adding watermark information to an audio file in accordance with a second embodiment of the invention;
Figure 6 shows FFT components illustrating a Type I click introduced into an audio signal during processing according to the second embodiment; and
Figure 7 shows FFT components illustrating a Type II click introduced into an audio signal during processing according to the second embodiment.
Description of the Preferred Embodiments
Referring now to Figure 1, a key value is first chosen as the basis for determining which components within an audio signal are to be chosen to hide a watermark message within the audio signal, step 10. In some cases, the key value can simply be a frequency value. However, as this value is needed in order to extract the watermark message from a signal, in alternative implementations, the key value(s) can be used as a private key which is mapped to a frequency and ultimately to the chosen frequency components. As such, the key value can be used to add security as required according to the environment where the hidden message is to be used. Also, as an indication of the windowing scheme used to embed information in the signal will be required to decode the watermark information, this can also be used as a private key, if required.
In implementations of the invention, the key value is mapped to identify frequency components from a window within the signal onto which watermark information is to be written. This mapping may depend on various factors, such as the type or content of audio used as host/cover for the message. For example, human speech generally includes lower frequency components - and fewer of them - than a modern Rock or Pop song, so hiding data in a recording of speech would naturally limit the choice of frequency components. However, even in an audio file with such a limited range, there could still be thousands of components to choose from.
In the embodiments described below, a relatively simple scheme is described where the key value is mapped to a given frequency value, e.g. approximately 1 kHz, and the adjacent distinct frequency components identified within the signal above and below this frequency value are chosen for receiving watermark information.
So for a given time frame within the signal, distinct frequency components may occur at 990Hz and 1015Hz, whereas for another frame in the signal, the components may occur at 950Hz and 1020Hz. Various criteria may be set for determining the distinctiveness of the components, e.g. they may need to comprise more than a given threshold percentage of the overall energy of the signal for the time window, or they may need to contrast by more than a given amount with the surrounding signal components, or they may simply need to have an amplitude above a given threshold.
It should also be appreciated that frequency components immediately adjacent the frequency derived from the key value need not be chosen. In other implementations, the second or third next adjacent frequency components could be chosen, or indeed this might change from frame to frame according to a keying scheme.
In any case, the signal S(t) intended as the cover or host audio is segmented into frames or windows of uniform length, step 12, for example, 20 ms, and, in the preferred embodiments, the frame is analyzed using Complex Spectral Phase Estimation (CSPE) to identify the presence, magnitude and phase of its frequency components, step 14. CSPE is disclosed more fully in K. M. Short and R. A. Garcia, 'Signal Analysis using the Complex Spectral Phase Evolution (CSPE) Method', Audio Engineering Society 120th Convention, May 2006, Paris, France and provides one computationally efficient method of accurately estimating the frequency and phase of components that exist in an audio signal.
Douglas Nelson, "Cross Spectral Methods for Processing Speech", Journal of the Acoustic Society of America, vol . 1 10, No.5, pt.1 , Nov.2001 , pp.2575-2592 discloses a related cross-spectrogram technique.
Referring now to Figure 2, in CSPE, an FFT analysis is performed twice: firstly on a frame S0 of the signal of interest and the second time upon a frame S1 for the same signal but shifted in time by one sample. Then, by multiplying the sample-shifted FFT spectrum with the complex conjugate of the initial FFT spectrum, step 20, a frequency dependent function is formed from which the magnitude and phase angle of the frequency components it contains can be detected. Referring to Figure 3, for a 1 second window of a signal containing components with frequency values (in Hz) of 17, 293.5, 313.9, 204.6, 153.7, 378 and 423 and for a sampling frequency of 1024 Hz, CSPE produces a graph with a staircase-like appearance where the flat parts of the graph indicated by the arrows are the frequencies of the components.
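The cross-spectral step can be sketched in a few lines of numpy. The function below is an illustrative reading of the CSPE analysis described above, not the patented implementation: it forms the product of the one-sample-shifted FFT with the conjugate of the initial FFT and converts the resulting per-bin phase angle into a frequency estimate in Hz, which is what produces the flat 'staircase' regions of Figure 3 near strong components. The signal in the usage example is assumed for illustration and only loosely matches the components listed for Figure 3.

```python
import numpy as np

def cspe_frequencies(frame0, frame1, fs):
    """Illustrative sketch of CSPE frequency estimation (steps 14/20).

    frame0 is an N-sample window of the host signal and frame1 is the
    same window shifted in time by one sample. Returns a per-bin refined
    frequency estimate in Hz; near a strong component the estimate is
    roughly constant over adjacent bins, giving the staircase of Figure 3.
    """
    F0 = np.fft.rfft(frame0)                 # initial FFT spectrum
    F1 = np.fft.rfft(frame1)                 # sample-shifted FFT spectrum
    cross = F1 * np.conj(F0)                 # shifted spectrum x conjugate of initial spectrum
    return fs * np.angle(cross) / (2 * np.pi)

# Example roughly in the spirit of Figure 3: a 1 s window at a 1024 Hz sampling rate.
fs = 1024
t = np.arange(fs + 1) / fs
x = np.cos(2 * np.pi * 293.5 * t) + 0.7 * np.cos(2 * np.pi * 423.0 * t)
estimates = cspe_frequencies(x[:fs], x[1:fs + 1], fs)
```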
Referring back to Figure 1, the key value, if not already, is transformed to a frequency value and this is used to identify a frequency component in the CSPE frequency domain for the chosen window and calculate its magnitude within the CSPE frequency domain, step 16. The magnitude of this first identified component can then be modified by comparison with a value for a second component from within the signal window, in order to represent a single bit '1' or '0' from a watermark message. Alternatively, the value of both the first and the second component can be modified, or just the second component can be modified by comparison with the first identified component to represent the watermark information.
In the first embodiment, identified component(s) are dynamically chosen dependent on the key value and the signal in the window under analysis. In the embodiment, the components chosen for modification are the nearest distinct components beyond a calculated threshold (say 10 Hz) above (CompB) and below (CompA) the frequency derived from the key value.
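As an illustration of this selection step, the small helper below is one assumed way it might be coded: from a list of already-identified distinct components, the nearest component more than the separation threshold below the key frequency becomes CompA and the nearest above it becomes CompB. The function name, tuple layout and the 10 Hz default are illustrative assumptions taken from the example threshold mentioned above.

```python
def pick_components(components, key_freq, min_sep=10.0):
    """Illustrative selection of CompA/CompB around the key frequency.

    components: list of (frequency_hz, amplitude, phase) tuples already
    judged 'distinct' by whatever criteria are in use. Returns the nearest
    component more than min_sep Hz below the key frequency (CompA) and the
    nearest more than min_sep Hz above it (CompB), or None where absent.
    """
    below = [c for c in components if c[0] < key_freq - min_sep]
    above = [c for c in components if c[0] > key_freq + min_sep]
    comp_a = max(below, key=lambda c: c[0]) if below else None
    comp_b = min(above, key=lambda c: c[0]) if above else None
    return comp_a, comp_b
```

For instance, with distinct components at 950 Hz and 1020 Hz and a 1 kHz key frequency, this returns the 950 Hz component as CompA and the 1020 Hz component as CompB.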
When modifying the amplitude of an identified frequency component, care must be taken to ensure that a user perceptible artefact is not introduced into the signal and that the modification does not have a negative impact on the timbre of the original signal.
The first embodiment is based on a set of rules that lead to the modification of only one of the identified components (CompA or CompB) in approximately half the frames. This is achieved with the rule:
If watermark bit = 1, let Amp(CompA) > Amp(CompB) + margin
If watermark bit = 0, let Amp(CompB) > Amp(CompA) + margin
where Amp is the amplitude of the identified component.
Thus, the magnitude of both components (compA and compB) in any given frame is compared before deciding if any modification would be required in order to satisfy these criteria, depending on the watermark bit to be embedded and the magnitudes of the two components in that particular frame. If they are already in the correct relationship relative to the watermark bit selected at step 18, no modification is required at step 22. If, however, they are not in the correct relationship, at least one of them should be modified at step 22.
Let us assume that the magnitude of CompA is lower than that of CompB, in a frame in which it needs to be of a higher magnitude to represent a '1' bit. In order to increase the magnitude of a particular frequency component in the particular window of the cover signal S(t), a component is added at a defined magnitude and matched to the phase of the component it is being combined with, as follows:
S(t) = S(t) + (rAmp - lAmp + threshold)cos(2π(f_CompA)t + lp)
where rAmp and lAmp are the amplitudes of CompB and CompA respectively;
f_CompA is the frequency of CompA; and
lp is the phase of CompA. Alternatively, and possibly more desirably, to reduce the magnitude of a component of S(t) so that it satisfies the requirements for embedding a '1' bit in a window of the signal, the magnitude of CompB is reduced, by adding in a component that is 180° out of phase with the original component in the signal, as follows:
S(t) = S(t) + (rAmp - lAmp + threshold)cos(2π(f_CompB)t + rp + π)
where f_CompB and rp are the frequency and phase of CompB respectively.
In each of the above cases, the threshold value is set to provide a 25% difference in magnitude between CompA and CompB; however, this value can be varied dynamically from window to window or signal to signal as required.
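A compact sketch of this embedding step, under the same notation, is given below. The frame, sampling rate and the CSPE-estimated (amplitude, frequency, phase) triples for CompA and CompB are assumed to be available; taking the margin as 25% of the larger amplitude is an assumed reading of the "25% difference" above, not a statement of the patented values.

```python
import numpy as np

def embed_bit(frame, fs, bit, comp_a, comp_b, margin=None):
    """Illustrative first-embodiment embedding (steps 18/22).

    comp_a, comp_b: (amplitude, frequency_hz, phase) for CompA and CompB.
    If the rule for the bit is already satisfied the frame is returned
    unchanged; otherwise the 'louder' component is reduced by adding a
    180-degree out-of-phase partial, as in the equations above.
    """
    l_amp, f_a, lp = comp_a
    r_amp, f_b, rp = comp_b
    if margin is None:
        margin = 0.25 * max(l_amp, r_amp)   # assumed form of the 25% threshold
    t = np.arange(len(frame)) / fs
    if bit == 1 and not (l_amp > r_amp + margin):
        # reduce CompB so that Amp(CompA) exceeds Amp(CompB) by the margin
        frame = frame + (r_amp - l_amp + margin) * np.cos(2 * np.pi * f_b * t + rp + np.pi)
    elif bit == 0 and not (r_amp > l_amp + margin):
        # reduce CompA so that Amp(CompB) exceeds Amp(CompA) by the margin
        frame = frame + (l_amp - r_amp + margin) * np.cos(2 * np.pi * f_a * t + lp + np.pi)
    return frame
```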
It is possible that more than 1 bit would be written within a given window of a signal and if it is determined at step 24, that more information is to be embedded, the process loops around to choose the next pair of components as determined by the key value and the applicable keying algorithm.
Once watermarking for a given window is complete, and if more signal remains, step 26, the process continues to the next window to be processed.
Referring now to Figure 4, in order to decode an embedded watermarked message from an audio signal, a decoder must be provided with the key value, an indication of the windowing used as the basis for embedding the watermark message as well as the rules that define a '1' bit and a '0' bit within the audio signal. The candidate audio signal is then segmented into frames using the same windowing as was used for embedding, step 40. In the embodiment, the system uses CSPE to calculate the magnitudes of the frequency components for the window. The two components above and below the frequency determined by the key value are then identified, step 42. These two components then have their magnitude compared and a '1' or a '0' bit is determined according to the rules used in their embedding, step 44. From this comparison, the watermarked bit sequence can be recreated from the sequence of windows in the signal.
In some cases, a relatively short watermark message can be repeatedly written into successive windows of the signal and in decoding, each impression of the message can be correlated with the others to correct for any errors in decoding any portion of the message from the signal.
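In code, the per-frame decision of step 44 and this redundancy idea can be sketched as follows. A simple majority vote per message bit is used here as one assumed form of the correlation across repeated impressions; the function names are illustrative.

```python
def decode_bit(comp_a_amp, comp_b_amp):
    """Step 44 under the first-embodiment rule: CompA louder means '1'."""
    return 1 if comp_a_amp > comp_b_amp else 0

def recover_message(frame_bits, msg_len):
    """Rebuild a short, repeatedly embedded message by majority vote
    across its repetitions (an assumed form of the correlation step)."""
    votes = [[0, 0] for _ in range(msg_len)]
    for i, b in enumerate(frame_bits):
        votes[i % msg_len][b] += 1
    return [0 if zeros >= ones else 1 for zeros, ones in votes]
```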
Referring now to Figure 5, in a second embodiment of the invention, the bin index values for the Fourier components corresponding to CSPE-derived frequency components identified through using the key value are used as criteria for determining whether or not the signal in a window needs to be adjusted to accommodate watermark information. Thus, 1st and 2nd components are identified from a key value as in the first embodiment and their index value k is calculated as k = f*N/Fs, where f is the identified component's frequency, Fs is the sampling frequency used to produce the Fourier transform (Figure 2) and N is the transform window length.
In one scheme based on this principle, if the watermark bit is 0, the bin index values (k) of both identified CSPE frequency components should be either both odd or both even. If the embed bit is 1, then one bin location should be odd and one even.
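The parity criterion can be written directly from the k = f*N/Fs definition above. Rounding to the nearest integer bin is an assumption, since the text does not state how a non-integer index is mapped to a bin.

```python
def parity_satisfied(f1, f2, n, fs, bit):
    """Second-embodiment criterion: for a 0 bit the two bin indices must
    share parity (both odd or both even); for a 1 bit they must differ."""
    k1 = round(f1 * n / fs)
    k2 = round(f2 * n / fs)
    same_parity = (k1 % 2) == (k2 % 2)
    return same_parity if bit == 0 else not same_parity
```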
If the bin index values for the identified frequency components selected at step 56 do not satisfy the criteria for the watermark message bit to be embedded in the window at that point, then the magnitude of one or other of the frequency components is reduced, step 58. In one implementation, the first reduction is 25% of the component's magnitude. CSPE is then run again on the adjusted signal for the window, and possibly new 1st and 2nd frequency components are selected. If the bin index values for these components satisfy the embedding criteria, the process proceeds, and if not, the magnitude of one or other of these components is again reduced, step 58, before repeating the process. It has been found that this loop has not had to be repeated more than 3 times before the criteria for embedding a bit have been met.
It is appreciated that this loop adds a processing overhead to the embedding phase but, since embedding is a one-off process and not time-critical, it is a satisfactory compromise for improved accuracy.
In a steganographic audio watermarking system, audible artefacts are unacceptable as they allow listeners to deduce that there might be a watermark present, or simply affect the quality of the recording. One possible result of the watermarking schemes described above could be unexpected audible artefacts comprising 'pops' or 'clicks'.
In embodiments of the present invention, the signal, whether modified according to the first or second embodiment, is analyzed for two types of artefact, Type I and Type II, as follows:
If a modified frequency component in the adjusted signal has a magnitude greater than 10 times the original component's magnitude, it is identified as a Type I click.
If the Fourier transformed spectrum of the watermarked signal has bins with magnitudes that are different from the corresponding bins in the original spectrum, peaks are picked from those differing bins in each spectrum. If a peak exists in the spectrum of the watermarked signal with a magnitude greater than 3 times the magnitude of the original corresponding peak, and if this peak is also greater than the magnitude of the neighbouring peaks, then this is identified as a Type II click.
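These two tests translate fairly directly into code. The sketch below assumes the magnitude spectra of the original and watermarked frames are available and uses a simple local-maximum comparison for the Type II peak test; the function names and the tolerance used to find changed bins are illustrative assumptions.

```python
import numpy as np

def is_type1_click(adjusted_mag, original_mag):
    """Type I: the modified component's magnitude exceeds 10x its original value."""
    return adjusted_mag > 10 * original_mag

def has_type2_click(wm_mag, orig_mag):
    """Type II: among bins whose magnitude changed, a watermarked-spectrum peak
    more than 3x its original counterpart and larger than its neighbours."""
    wm = np.asarray(wm_mag, dtype=float)
    orig = np.asarray(orig_mag, dtype=float)
    changed = np.nonzero(~np.isclose(wm, orig))[0]   # bins that differ between spectra
    for k in changed:
        if 0 < k < len(wm) - 1:
            if wm[k] > 3 * orig[k] and wm[k] > wm[k - 1] and wm[k] > wm[k + 1]:
                return True
    return False
```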
Figure 6 shows a Type I click where a selected component's magnitude in a finally adjusted signal satisfying the criteria for embedding is apparently much larger than its value in the original signal and noticeably larger than neighboring components' magnitudes. On the other hand, the spectrum shown in Figure 7 shows a Type II click.
It is thought that these clicks result from CSPE identifying a 'ghost frequency component' not actually present in the original signal. When the magnitude of such a ghost component is adjusted when embedding watermark information, a real component can then be added into the signal.
Type II clicks occur only relatively rarely by comparison to Type I. Since they occur so occasionally, in embodiments of the present invention the solution to this artefact is to return the adjusted component to its original state, step 60. This of course introduces an inaccuracy in the embedded information; however, as the event is so rare, it can be compensated for by building redundancy into the watermark message, either by repeatedly embedding a short watermark message into the signal, allowing a true version to be built up by a decoder, or simply by using CRC codes or equivalent within the watermark message information.
For Type I clicks, the magnitude of the selected component is reduced again, step 58 and the process is repeated. The solid line in Figure 6 represents the original signal, the dash-dot line represents the signal after it has been modified once (denoted as 'intermediate signal'), while the dashed line represents the final signal, in which the selected bins satisfy the condition for embedding.
In the embodiments described above, two components are selected in encoding step 16 and decoding step 42 and their mutual values are determined to enable the embedding/extraction of watermark information in/from an audio signal. However, it should be appreciated that the invention could equally be implemented by selecting more than two components and using their mutual values to determine how to embed information in the audio signal.
It will be appreciated that the watermark information embedded and extracted in/from the audio signal can be used for any number of applications. For example, by embedding an ISRC code for a song in a recording of the song, its broadcast on radio stations can be detected by listening for such watermark information, including, but not limited to, where such stations broadcast through the Internet. This in turn can be used to assist musicians to properly recover royalties for the broadcast of their works from the responsible agencies around the world.
In addition or alternatively, public key information for stakeholders in a piece of audio material, including the author, performer, publisher, distributor, retailer etc., can be included in the audio material as a way of verifying/authenticating the audio material and especially for the purposes of combating illegal distribution of a piece of music. Thus, it will be seen that an audio signal watermarked according to the present invention may simultaneously include many threads of information which can be used for different applications relating to that audio material. The invention is not limited to the embodiment(s) described herein but can be amended or modified without departing from the scope of the present invention.

Claims

Claims:
1. A method of providing a digital watermark in an audio signal comprising: a) selecting a key frequency value determining how watermark information is to be embedded into a first time frame of said audio signal;
b) providing a plurality of discrete frequency component values of said audio signal for said first time frame;
c) selecting at least two frequency components for said time frame as a function of at least said key frequency value;
d) determining if said at least two frequency components meet a given mutual criterion for said signal in said first time frame; and
e) responsive to said components not meeting said criterion, adjusting the magnitude of at least one of said at least two frequency components in said first time frame.
2. A method according to claim 1 wherein step b) further comprises:
b1 ) providing a first Fourier transform for said first time frame within said audio signal, said transform including a number of indexed bins representing the Fourier components of said first time frame;
b2) providing a second Fourier transform for a second time frame shifted in time relative to said first time frame and overlapping in time with said first time frame; and
b3) convolving said first and second transform components to provide said plurality of frequency component values for said first time frame.
3. A method according to claim 1 comprising performing CSPE analysis on said first time frame to provide said frequency component values.
4. A method according to claim 1 wherein step d) further comprises:
d1) determining the respective bin index values of said selected frequency components; and
d2) testing said bin index values for meeting a given criterion for said signal in said first time frame, and wherein step e) comprises reducing the magnitude of one of said selected frequency components.
5. A method according to claim 4 further comprising the step of:
f) repeating steps b) to e) with said adjusted audio signal until said audio signal does not need to be adjusted to accommodate digital watermark information.
6. A method according to claim 4 in which step d2) comprises testing each of the bin index values for being even valued and, responsive to said test values, reducing the magnitude of one of said selected frequency components to accommodate a bit of a given value in said time frame.
7. A method according to claim 6 wherein said test values are tested for having mutually different values.
8. A method according to claim 1 wherein said adjusting comprises reducing the magnitude of one of said selected frequency components and further comprising:
g) analysing the frequency component values for said adjusted signal; and
h) responsive to said adjusted frequency component's magnitude exceeding its magnitude prior to said adjustment by a first threshold, further reducing the magnitude of said adjusted frequency component in said first time frame.
9. A method according to claim 8 wherein said first threshold is an order of magnitude.
10. A method according to claim 1 wherein said adjusting comprises reducing the magnitude of one of said selected frequency components and further comprising:
g) analysing frequency component values for said adjusted signal for said first time frame;
h) identifying any frequency component having a magnitude exceeding its magnitude prior to said adjustment by a second threshold; and
i) responsive to any identified component's magnitude being greater than the magnitudes for respective adjacent frequency components of said identified component, restoring the original magnitude of said adjusted frequency component in said first time frame of said audio signal.
11. A method according to claim 10 wherein said second threshold is 3 times the original value.
12. A method according to claim 1 comprising: providing watermark information to be included in an audio signal; and adjusting successive time frames of said audio signal according to claim 1 to accommodate one or more bits of said watermark information in said successive time frames.
13. A method of extracting a digital watermark from an audio signal comprising: a) selecting a key frequency value determining how watermark information was embedded into a first time frame of said audio signal;
b) providing a plurality of discrete frequency component values of said audio signal for said first time frame;
c) selecting at least two frequency components for said time frame as a function of at least said key frequency value; and
d) determining if said at least two frequency components meet a given mutual criterion for said signal in said first time frame, to determine a value for said digital watermark in said first time frame.
14. A computer program product comprising computer readable code stored on a computer readable medium which when executed on a computing device is arranged to process an audio signal according to any one of claims 1 to 13.
15. An audio signal including a digital watermark embedded according to any one of claims 1 to 12.
PCT/EP2011/059688 2010-06-21 2011-06-10 Audio watermarking WO2011160966A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US35692410P 2010-06-21 2010-06-21
US61/356,924 2010-06-21

Publications (1)

Publication Number Publication Date
WO2011160966A1 (en) 2011-12-29

Family

ID=44350600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2011/059688 WO2011160966A1 (en) 2010-06-21 2011-06-10 Audio watermarking

Country Status (1)

Country Link
WO (1) WO2011160966A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462765A (en) * 2020-04-02 2020-07-28 宁波大学 One-dimensional convolution kernel-based adaptive audio complexity characterization method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DOUGLAS NELSON: "Cross Spectral Methods for Processing Speech", JOURNAL OF THE ACOUSTIC SOCIETY OF AMERICA, vol. 110, no. 5, November 2001 (2001-11-01), pages 2575 - 2592, XP012002610, DOI: doi:10.1121/1.1402616
JIAN WANG ET AL: "Digital Audio Watermarking by Magnitude Modification of Frequency Components Using the CSPE Algorithm", 18 August 2009 (2009-08-18), pages 1 - 7, XP055004535, Retrieved from the Internet <URL:http://eprints.nuim.ie/1635/2/HealyDAWCSPE.pdf> [retrieved on 20110810] *
JIAN WANG ET AL: "Perceptually Transparent Audio Watermarking of Real Audio Signals Based On The CSPE Algorithm", 8 June 2010 (2010-06-08), pages 1 - 6, XP055004541, Retrieved from the Internet <URL:http://eprints.nuim.ie/1970/1/Perceptually_Transparent_Audio_Watermarking_of_Real_Audio_Signals_Based_On_The_CSPE_Algorithm[1].pdf> [retrieved on 20110810] *
K. M. SHORT, R. A. GARCIA: "Signal Analysis using the Complex Spectral Phase Evolution (CSPE) Method", AUDIO ENGINEERING SOCIETY 120TH CONVENTION, May 2006 (2006-05-01)
MOULIN, P., KOETTER, R.: "Data-Hiding Codes", PROC. OF THE IEEE, vol. 93, no. 12, December 2005 (2005-12-01)

Similar Documents

Publication Publication Date Title
Li et al. Localized audio watermarking technique robust against time-scale modification
Kirovski et al. Spread-spectrum watermarking of audio signals
US6952774B1 (en) Audio watermarking with dual watermarks
Xiang et al. Histogram-based audio watermarking against time-scale modification and cropping attacks
US8116514B2 (en) Water mark embedding and extraction
US7206649B2 (en) Audio watermarking with dual watermarks
US6442283B1 (en) Multimedia data embedding
Kang et al. Geometric invariant audio watermarking based on an LCM feature
US6738744B2 (en) Watermark detection via cardinality-scaled correlation
Wang et al. Centroid-based semi-fragile audio watermarking in hybrid domain
Dhar et al. A new audio watermarking system using discrete fourier transform for copyright protection
Xiang Audio watermarking robust against D/A and A/D conversions
Dhar et al. A new DCT-based watermarking method for copyright protection of digital audio
Dhar et al. Digital watermarking scheme based on fast Fourier transformation for audio copyright protection
Nikmehr et al. A new approach to audio watermarking using discrete wavelet and cosine transforms
Kirovski et al. Spread-spectrum audio watermarking: requirements, applications, and limitations
Hu et al. Frame-synchronized blind speech watermarking via improved adaptive mean modulation and perceptual-based additive modulation in DWT domain
Kirovski et al. Audio watermark robustness to desynchronization via beat detection
Li et al. An audio watermarking technique that is robust against random cropping
Bibhu et al. Secret key watermarking in WAV audio file in perceptual domain
Megías et al. A robust audio watermarking scheme based on MPEG 1 layer 3 compression
Huang et al. A reversible acoustic steganography for integrity verification
Cichowski et al. Analysis of impact of audio modifications on the robustness of watermark for non-blind architecture
WO2011160966A1 (en) Audio watermarking
Lin et al. Audio watermarking techniques

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11729940

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11729940

Country of ref document: EP

Kind code of ref document: A1