EP2787503A1 - Method and system of audio signal watermarking - Google Patents
- Publication number
- EP2787503A1 (application EP13162596.4A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- watermark
- audio signal
- profile
- class
- embedding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
Definitions
- the present invention relates to a method and system of audio signal watermarking.
- Audio signal watermarking is a process for embedding information data (watermark) into an audio signal without affecting the perceptual quality of the host signal itself.
- the watermark should be imperceptible or nearly imperceptible to the Human Auditory System (HAS). However, the watermark should be detectable through an automated detection process.
- EP 1 594 122 discloses a watermarking method and apparatus employing spread spectrum technology and psycho-acoustic model.
- In spread spectrum technology, a small baseband signal bandwidth is spread over a larger bandwidth by injecting or adding a higher-frequency signal, or spreading function. The energy used for transmitting the signal is thereby spread over a wider bandwidth and appears as noise.
- In a psycho-acoustic model, based on psycho-acoustical properties of the HAS, the watermark signal is shaped to reduce its magnitude so that its level lies below a masking threshold of the host audio/video signal.
- a spreading function is modulated by the watermark data bits to provide a watermark signal; the current masking level of the audio/video signal is determined and a corresponding psycho-acoustic shaping of the watermark signal is performed; the psycho-acoustically shaped watermark signal is additionally shaped to reduce, on average, the magnitude of the watermark signal, whereby for each spectral line the phase of the audio/video signal values into which the watermark signal is embedded is kept unchanged by the additional shaping; finally, the psycho-acoustically and additionally shaped watermark signal is embedded into the audio/video signal.
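The spread-spectrum embed/detect flow described above can be sketched as follows. This is a minimal illustration: the constant shaping gain `alpha`, the function names and all numeric values are assumptions standing in for the per-band psycho-acoustic shaping a real system would apply.

```python
import numpy as np

def embed_spread_spectrum(host, bits, chip_rate, alpha=0.05, seed=0):
    """Modulate a pseudo-random spreading sequence by the watermark bits
    and add it, at low amplitude, to the host signal. `alpha` stands in
    for psycho-acoustic shaping (a real system would use a per-band
    masking threshold instead of a constant gain)."""
    rng = np.random.default_rng(seed)
    n = len(bits) * chip_rate
    assert len(host) >= n, "host signal too short for this payload"
    pn = rng.choice([-1.0, 1.0], size=n)                    # spreading function
    symbols = np.repeat(np.where(np.asarray(bits) > 0, 1.0, -1.0), chip_rate)
    out = host.copy()
    out[:n] += alpha * pn * symbols                          # shaped watermark
    return out, pn

def detect_spread_spectrum(signal, pn, chip_rate, n_bits):
    """Correlate each chip block with the spreading sequence: a positive
    correlation decodes as bit 1, a negative one as bit 0."""
    bits = []
    for i in range(n_bits):
        seg = signal[i * chip_rate:(i + 1) * chip_rate]
        ref = pn[i * chip_rate:(i + 1) * chip_rate]
        bits.append(1 if np.dot(seg, ref) > 0 else 0)
    return bits
```

Because the spreading sequence is pseudo-random, the host signal contributes only a small, zero-mean term to each correlation, which is why the watermark "appears as noise" yet remains detectable.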
- a watermarking technique should achieve a trade-off between three basic features: imperceptibility, robustness and payload, which are strictly linked to each other by inverse relationships.
- a watermarking technique should find a correct balance between the need to keep the watermark imperceptible; to make the watermark robust against attacks and manipulations of the host signal (e.g., noise distortion, A/D or D/A conversion, lossy coding, resizing, filtering, lossy compression); and to achieve the highest possible payload.
- a psycho-acoustic model makes it possible to determine the maximum distortion, i.e. the maximum watermark signal energy, that can be introduced into a host signal without being perceptible by human senses.
- however, this model does not provide any information about robustness and payload, nor about optimization of the trade-off among imperceptibility, robustness and payload.
- the Applicant found that the above objects are achieved by a method and system of audio signal watermarking wherein audio signals are classified based on their semantic content and watermarks are embedded into the audio signals by using watermark profiles selected on the basis of the classes assigned to the audio signals.
- the Applicant found that, given an audio signal, the trade-off among watermark imperceptibility, robustness and payload can be optimized by fitting the watermark profile depending on the semantic content of the audio signal.
- semantic content in relation to an audio signal refers to the audio type contained in the audio signal.
- the semantic content of an audio signal can be, for example, speech (e.g. talks from movies, from TV or radio programs, from TV or radio advertisements, from TV or radio talk shows, and similar) or music.
- the semantic content can be, for example, a musical genre (e.g., rock, classic, jazz, blues, instrumental, singing and similar).
- the semantic content can be, for example, a tone of voice (e.g. conversation, a single person speaking, a whisper, a loud quarrel, and similar).
- the expression "watermark profile" is used to indicate a set of parameters used for embedding the watermark into the audio signal according to a predetermined watermarking technique.
- the present disclosure relates to a method of watermarking an audio signal comprising:
- the present disclosure relates to a system of watermarking an audio signal comprising an encoding device comprising:
- the method and system of the present disclosure may have at least one of the following preferred features.
- each watermark profile is associated with a corresponding class so that the trade-off among watermark imperceptibility, robustness and payload is optimized for said class, depending on the watermark application.
- one, two or all of the features among imperceptibility, robustness and payload could be optimized for each class, by keeping the other feature(s), if any, unchanged among the classes.
- robustness could be maximized for each class, by keeping the same payload and imperceptibility level among the classes.
- payload could be maximized for each class, by keeping the same robustness and imperceptibility level among the classes.
- imperceptibility could be maximized for each class, by keeping the same payload and robustness level among the classes.
- the plurality of watermark profiles can relate to a single watermarking technique or to a plurality of watermarking techniques.
- the watermark profiles differ from each other in the value taken by at least one parameter of said set of parameters.
- the watermark profiles differ from each other in at least one of the parameters and/or in the values taken by at least one of the common parameters.
- the watermarking technique(s) can be selected, for example, from the group comprising: spread spectrum watermarking technique, echo hiding watermarking technique, phase coding technique, informed watermarking schemes like QIM (Quantization Index Modulation) and Spread Transform Dither Modulation (STDM).
- the method is a computer-implemented method.
- the embedding unit can comprise a plurality of embedding sub-units.
- Each embedding sub-unit can be configured to embed the watermark into the audio signal by using one watermark profile of said plurality of watermarking profiles or one watermarking technique of said plurality of watermarking techniques.
- At least one parameter of said set of parameters defining the watermark profile may be selected, for example, from the group comprising: watermark bit rate; frequency range hosting the watermark; Document to Watermark Ratio (DWR); watermark frame length; masking threshold modulation factor F, intended as a quantity by which the masking threshold of the audio signal (computed according to a psychoacoustic model) is multiplied to vary its amplitude with respect to the computed value; channel coding scheme (which may also include error detection techniques such as, for example, Cyclic Redundancy Check); number, amplitude, offset and decay rate of echo pulses (in the case of an echo hiding watermarking technique); spreading factor, intended as the number of audio signal frequency or phase samples needed to insert one bit of watermark (in the case of a spread spectrum watermarking technique with, respectively, frequency or phase modulation).
- the plurality of classes is associated with the corresponding plurality of watermark profiles in a database.
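As an illustration, such a class-to-profile association could be modelled as a small in-memory database. The parameter names echo those listed in the disclosure; the concrete values, class labels and type names are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WatermarkProfile:
    """Illustrative subset of the parameter set defining a profile."""
    bit_rate_bps: float       # watermark bit rate
    freq_range_hz: tuple      # frequency range hosting the watermark
    frame_length: int         # watermark frame length (samples)
    masking_factor_f: float   # masking threshold modulation factor F

# Hypothetical watermark profile database: each class maps to one profile.
PROFILE_DB = {
    "low sparse":    WatermarkProfile(40.0, (500, 4000), 2048, 1.1),
    "medium sparse": WatermarkProfile(20.0, (500, 4000), 4096, 1.0),
    "high sparse":   WatermarkProfile(10.0, (500, 4000), 8192, 0.9),
}

def get_profile(assigned_class):
    """Retrieve the watermark profile associated with a class."""
    return PROFILE_DB[assigned_class]
```

The assumed values mirror the idea developed later in the description: classes allowing more distortion (low sparse) get a higher bit rate and a higher factor F.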
- the database can be internal or external to the encoding device.
- the expression watermarking refers to digital watermarking.
- Digital watermarking relates to a computer-implemented watermarking process.
- a masking threshold of the audio signal according to a psychoacoustic model is computed.
- the audio signal is split into time windows and a masking threshold is computed for each time window of the audio signal.
- the masking threshold can be computed after, before or at the same time of the audio signal classification.
- the psychoacoustic model can be a psychoacoustic model known in the art.
- the psychoacoustic model is adapted to calculate the masking threshold in time and/or frequency domain and is based on one of the following analysis: block based FFT (Fast Fourier Transform), block based DCT (Discrete Cosine Transform), block based MDCT (Modified Discrete Cosine Transform), block based MCLT (Modified Complex Lapped Transform), block based STFT (Short-Time Fourier Transform), sub-band or wavelet packet analysis.
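A minimal block-FFT sketch of the per-window threshold computation follows. The fixed dB offset is a crude assumption standing in for a true psychoacoustic model, which would spread energy across critical bands.

```python
import numpy as np

def masking_thresholds(audio, window=1024, offset_db=-12.0):
    """Split the signal into non-overlapping time windows and compute a
    simplified per-window 'masking threshold' as the magnitude spectrum
    offset by a fixed amount in dB. The offset is purely illustrative."""
    gain = 10.0 ** (offset_db / 20.0)
    thresholds = []
    for start in range(0, len(audio) - window + 1, window):
        spectrum = np.abs(np.fft.rfft(audio[start:start + window]))
        thresholds.append(spectrum * gain)    # one threshold per window
    return thresholds
```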
- the encoding device comprises a masking unit configured to perform the masking threshold computation and, optionally, the time windows splitting.
- embedding the watermark into the audio signal comprises the step of shaping the energy of the watermark according to the computed masking threshold.
- the watermark is shaped to reduce its energy below the computed masking threshold of the audio signal.
- the set of parameters defining the obtained watermark profile comprises the masking threshold modulation factor F
- the watermark is preferably shaped to reduce its energy below the computed masking threshold, multiplied by the masking threshold modulation factor F.
- the watermark is shaped by the embedding unit.
- the masking threshold modulation factor F is at least equal to 0.5. More preferably, the masking threshold modulation factor F is at least equal to 0.7, even more preferably at least equal to 0.8, even more preferably at least equal to 0.9.
- the masking threshold modulation factor F is not higher than 1.5. More preferably, the masking threshold modulation factor F is not higher than 1.3, even more preferably not higher than 1.2, even more preferably not higher than 1.1.
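The preferred bounds on F can be enforced directly when the threshold is modulated; a minimal sketch, using the broadest range stated above (0.5 to 1.5):

```python
def modulated_threshold(masking_threshold, f):
    """Multiply each masking-threshold value by the modulation factor F,
    rejecting values outside the preferred range [0.5, 1.5]."""
    if not 0.5 <= f <= 1.5:
        raise ValueError("F outside the preferred range [0.5, 1.5]")
    return [t * f for t in masking_threshold]
```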
- Assigning the audio signal to a class, according to the semantic content of the audio signal, is preferably performed based upon analysis of at least one audio signal feature.
- the audio signal feature is related to the semantic content of the audio signal.
- the audio signal feature is preferably related to time, frequency, energy or cepstrum domain.
- the at least one audio signal feature can be selected from the group comprising: loudness, brightness, beats per minute (BPM), bandwidth, pitch, odd-to-even harmonic energy ratio, spectral energy bands (e.g. spectrum sparsity), spectral and tonal complexities, spectral roll-off point (intended as any percentile of the power spectral distribution), spectral centroid (defined as the center of gravity of the magnitude spectrum), spectral "flux" (intended as the squared difference between the normalized magnitudes of successive spectral distributions), time-domain Zero-Crossing Rate, Cepstrum Resynthesis Residual Magnitude, Mel-Frequency Cepstral Coefficients, band periodicity (defined as the periodicity of a sub-band, derived by sub-band correlation analysis).
- the analysis of at least one audio signal feature preferably comprises checking if the at least one audio signal feature meets one or more predetermined constraints. For example, the value of the at least one audio signal feature can be compared to one or more predetermined thresholds or one or more predetermined ranges of values. Each class is advantageously defined by predetermined constraints (e.g. a set of values) to be met by the at least one audio signal feature.
- the audio signal is split into sub-signals of shorter duration and a class is assigned to each sub-signal independently from the other sub-signals.
- the duration of the sub-signals is longer than the duration of the time windows in which the audio signal is split for performing masking threshold computation.
- the method of audio signal watermarking comprises a decoding process comprising extraction of the watermark from the watermarked audio signal.
- the watermark is extracted by using the same watermark profile used for embedding the watermark into the audio signal.
- the system also comprises a decoding device configured to extract the watermark from the watermarked audio signal.
- the watermarked audio signal is assigned to a class, among the plurality of classes, depending on the semantic content of the watermarked audio signal, the plurality of classes being associated with the corresponding plurality of watermark profiles.
- the class is assigned by a classification unit of the decoding device.
- the watermark profile associated with the class assigned to the audio signal is obtained and used for extracting the watermark from the watermarked audio signal.
- the watermark is extracted from the watermarked audio signal by an extraction unit of the decoding device.
- said plurality of watermark profiles are tried in sequence for extracting the watermark until the watermark is successfully extracted from the watermarked audio signal.
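This sequential trial can be sketched as follows; `try_extract` is a hypothetical callable standing in for a concrete extraction routine:

```python
def extract_with_profiles(watermarked, profiles, try_extract):
    """Try each watermark profile in sequence until one succeeds.
    `try_extract(signal, profile)` is an assumed callable that returns
    the recovered watermark, or None when extraction fails."""
    for profile in profiles:
        watermark = try_extract(watermarked, profile)
        if watermark is not None:
            return profile, watermark
    raise LookupError("no watermark profile could extract a watermark")
```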
- the decoding device can comprise a single extraction unit for trying in sequence the plurality of watermark profiles.
- the extraction unit can comprise a plurality of sub-extraction units, one for each watermark profile or for each watermarking technique, for trying the plurality of watermark profiles at least partly in parallel.
- audio signal classification is not necessary at the decoding side.
- a second watermark, comprising the class assigned to the audio signal, is embedded into the audio signal by using a predefined watermark profile common to all audio signals, independently of their class.
- the second watermark can be embedded into the watermarked audio signal, already watermarked with the first watermark.
- the first and the second watermarks can be embedded into different sub-bands of the audio signal. Watermark extraction can then be performed by first extracting the second watermark from the watermarked audio signal (by using the common watermark profile) so as to retrieve the class of the audio signal, and then by obtaining the watermark profile associated with the retrieved class and extracting the watermark from the watermarked audio signal with the obtained watermark profile.
- audio signal classification is not necessary at the decoding side.
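The two-step decoding of this second-watermark scheme can be sketched as follows; `try_extract` is an assumed callable returning the data embedded with a given profile:

```python
def extract_via_class_watermark(signal, common_profile, profile_db,
                                try_extract):
    """First extract the class (second watermark) with the common
    profile, then use the profile associated with that class to extract
    the payload watermark."""
    assigned_class = try_extract(signal, common_profile)   # second watermark
    class_profile = profile_db[assigned_class]             # class -> profile
    return try_extract(signal, class_profile)              # first watermark
```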
- Figure 1 discloses a system 1 of audio signal watermarking according to an embodiment of the invention.
- the system 1 comprises an encoding device 10 comprising an input 11 for an audio signal, an input 13 for a watermark and an output 15 for a watermarked audio signal.
- the encoding device 10 comprises a classification unit 12, a watermark profile unit 14, a masking unit 18 and an embedding unit 16.
- the classification unit 12, watermark profile unit 14, masking unit 18 and embedding unit 16 comprise hardware and/or software and/or firmware configured to implement the method of the present disclosure.
- the classification unit 12 is configured to assign the audio signal to a class depending on the semantic content of the audio signal.
- the classification unit 12 is configured to analyse at least one audio signal feature related to the semantic content of the audio signal, to compare the at least one audio signal feature with one or more constraints and to assign the audio signal a class, selected among a predetermined plurality of classes, depending on the result of the comparison.
- the at least one audio signal feature can be selected from the group comprising: loudness, brightness, beats per minute (BPM), bandwidth, pitch, odd-to-even harmonic energy ratio, spectral energy bands (e.g. spectrum sparsity), spectral and tonal complexities, spectral roll-off point (intended as any percentile of the power spectral distribution), spectral centroid (defined as the center of gravity of the magnitude spectrum), spectral "flux" (intended as the squared difference between the normalized magnitudes of successive spectral distributions), time-domain Zero-Crossing Rate, Cepstrum Resynthesis Residual Magnitude, Mel-Frequency Cepstral Coefficients, band periodicity (defined as the periodicity of a sub-band, derived by sub-band correlation analysis).
- the at least one audio signal feature can be compared with one or more predetermined thresholds or one or more predetermined ranges of values and each class can be defined by a predetermined set of values that can be taken by the at least one audio signal feature.
- the plurality of classes can be stored in a suitable class database 17 internal (as shown in figure 1 ) or external to the encoding device 10.
- the encoding device 10, before assigning the audio signal to a class, is configured to split the audio signal into sub-signals of shorter duration (e.g. from a few tenths of a second to a few tens of seconds) and the classification unit 12 is configured to classify each sub-signal independently from the other sub-signals.
- the classification unit 12 is configured to classify the audio signals (or sub-signals) by analysing the spectrum sparsity of their energy spectrum.
- the spectrum sparsity is an audio signal feature indicative of the energy concentration in a sub-band compared to the energy in the whole audio signal (or sub-signal) band.
- the energy spectrum of the audio signal (or sub-signal) is considered sparse (or colored) if most part of its energy is concentrated in a small spectrum sub-band, otherwise it is considered non-sparse (or noise-like).
- figure 2 shows the energy spectrum of four audio signals having a different semantic content: speech, rock, jazz, piano solo.
- three different classes can be defined by analyzing the spectrum sparsity feature and, in the example, by comparing the fraction of signal energy (normalized to the total energy) contained in the 0-1000 Hz sub-band with two threshold levels S_L and S_H. If said fraction of energy in the 0-1000 Hz sub-band is lower than S_L, the audio signal (or sub-signal) can be classified into a "low sparse" class; if said fraction of energy in the 0-1000 Hz sub-band is between S_L and S_H, the audio signal (or sub-signal) can be classified into a "medium sparse" class; if it is higher than S_H, the audio signal can be classified into a "high sparse" class.
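The sparsity-based classification described above can be sketched as follows; the numeric values chosen for S_L and S_H are illustrative assumptions, not taken from the disclosure:

```python
import numpy as np

def classify_sparsity(audio, sample_rate, s_low=0.3, s_high=0.6):
    """Classify a signal (or sub-signal) by the fraction of its spectral
    energy below 1000 Hz, compared with thresholds S_L and S_H."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    fraction = spectrum[freqs <= 1000.0].sum() / spectrum.sum()
    if fraction < s_low:
        return "low sparse"
    if fraction <= s_high:
        return "medium sparse"
    return "high sparse"
```

For instance, a pure 200 Hz tone concentrates almost all its energy below 1000 Hz and is classified as "high sparse", while a 3000 Hz tone is classified as "low sparse".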
- the threshold levels S_L and S_H can thus be significant in such a sub-band.
- the plurality of classes used by the classification unit 12 are associated with a corresponding plurality of watermark profiles in a suitable watermark profile database 19 internal (as shown in figure 1 ) or external to the encoding device 10.
- although database 17 and database 19 are shown in the figures as two distinct entities, they can also be implemented as a single database.
- Each watermark profile is defined by a set of parameters used for embedding the watermark into the audio signal according to a predetermined watermarking technique.
- the watermarking technique can be a technique known in the art as, for example, a spread spectrum watermarking technique (e.g. wherein the watermark is spread over many frequency bins so that the energy in one bin is very small and undetectable), an echo hiding watermarking technique (e.g. wherein the watermark is embedded into an audio signal by introducing one or more echoes that are offset in time from the audio signal by an offset value associated with the data value of the bit), a phase coding technique (e.g. wherein phase differences between selected frequency component portions of an audio signal are modified to embed the watermark in the audio signal) or any other watermarking technique known in the art.
- the set of parameters can comprise at least one parameter selected from the group comprising: watermark bit rate; frequency range hosting the watermark; Document to Watermark Ratio (DWR); watermark frame length; masking threshold modulation factor F; channel coding scheme; number, amplitude, offset and decay rate of echo pulses; spreading factor.
- the plurality of watermark profiles associated with the plurality of classes can relate to a single watermarking technique or to a plurality of watermarking techniques.
- the watermark profiles are all defined by the same set of parameters and differ from each other in the values taken by at least one of the parameters.
- the watermark profiles relating to different watermarking techniques are defined by different sets of parameters.
- the watermark profiles can thus differ from each other in at least one parameter and/or in at least one value taken by a common parameter.
- each class is associated with a corresponding watermark profile that makes it possible to optimize the trade-off among watermark imperceptibility, robustness and payload for each class, depending on the watermark application.
- talk and rock signals, classified as "low sparse" signals, allow a higher level of distortion to be introduced than jazz signals, classified as "medium sparse" signals, and than piano solo signals, classified as "high sparse" signals.
- the level of distortion which is actually "available" for each audio signal class is advantageously exploited in order to optimize the trade-off among watermark imperceptibility, robustness and payload, depending on the watermark application.
- the higher level of distortion available for the "low sparse" and "medium sparse" signals compared with the "high sparse" signals could be exploited to maximize, for each class, one or two features among imperceptibility, robustness and payload, by keeping the other feature(s) unchanged among the classes.
- when the audio signal is intended to be transmitted over a low-noise channel and/or played in a low-noise environment (e.g. a domestic environment), payload could be maximized for each class, by keeping the same robustness and imperceptibility levels among the classes.
- when the audio signal is intended to be transmitted over a high-noise channel and/or played in a high-noise environment (e.g. a public place such as a train station or an airport), robustness could be maximized for each class, by keeping the same payload and imperceptibility level among the classes.
- when the audio signal is intended to be played in a low-noise environment, imperceptibility could be maximized for each class, by keeping the same payload and robustness level among the classes.
- payload could be optimized by acting on the watermark bit rate; imperceptibility could be optimized by acting on the masking threshold modulation factor F; and robustness could be optimized by acting on at least one of: frequency range hosting the watermark; Document to Watermark Ratio; watermark frame length; channel coding scheme; number, amplitude, offset and decay rate of echo pulses; and spreading factor.
- the level of distortion actually "available" for each audio signal class could be exploited in order to maximize the payload feature for each class, by associating a watermark profile with a higher bit rate with the low sparse class, a watermark profile with an intermediate bit rate with the medium sparse class and a watermark profile with a lower bit rate with the high sparse class.
- the robustness feature could be maximized for each class, by associating a more robust watermark profile with the low sparse class, an intermediate robust watermark profile with the medium sparse class and a lower robust watermark profile with the high sparse class.
- as for the masking threshold modulation factor F, it is also observed that it could be set to different values, depending on the frequency ranges of the audio signal.
- the watermark is shaped by the embedding unit 16 according to a masking threshold as computed by the masking unit 18, on the basis of a psycho-acoustic model.
- when F is higher than 1, the watermark is shaped according to a higher masking threshold, whereby the imperceptibility level of the watermark is decreased with respect to the level set according to the psycho-acoustic model.
- when F is lower than 1, the watermark is shaped according to a lower masking threshold, whereby the imperceptibility level of the watermark is increased with respect to the level set according to the psycho-acoustic model.
- the psychoacoustic model produces a representation of sound perception based on the average human auditory system, without taking into account high- or low-level psychoacoustic effects.
- the masking threshold computed according to the psychoacoustic model can be too strict in some situations (e.g. in the case of rock music, a noisy signal, conversation or a loud quarrel) or too light in other situations (e.g. in the case of classical or instrumental music and an expert listener).
- the masking threshold modulation factor F makes it possible to vary the amplitude of the masking threshold, as computed according to the psycho-acoustic model, depending on the semantic content of the audio signal.
- the imperceptibility level of the watermark can be finely tuned, with respect to the level set according to the psycho-acoustic model, depending on the semantic content of the audio signal and on the watermark application.
- the watermark profile unit 14 is configured to retrieve from the watermark profile database 19 the watermark profile associated with said class and to provide it to the embedding unit 16.
- the masking unit 18 is configured to compute a masking threshold of the audio signal according to a psycho-acoustic model and to provide it to the embedding unit 16.
- the psychoacoustic model can be any psychoacoustic model known in the art.
- the psychoacoustic model calculates the masking threshold in time and/or frequency domain and is based on one of the following analysis: block based FFT (Fast Fourier Transform), block based DCT (Discrete Cosine Transform), block based MDCT (Modified Discrete Cosine Transform), block based MCLT (Modified Complex Lapped Transform), block based STFT (Short-Time Fourier Transform), sub-band or wavelet packet analysis.
- the masking unit 18 is configured to split the audio signal into suitable time windows (e.g. of a few milliseconds) and to compute a masking threshold for each time window.
- the masking threshold computation is performed in parallel to audio signal classification.
- the embedding unit 16 is configured to embed the watermark into the audio signal by using the watermark profile obtained by the watermark profile unit 14 so as to provide a watermarked audio signal.
- the embedding unit 16 is also configured to shape the energy of the watermark according to the masking threshold computed by the masking unit 18.
- the embedding unit 16 is also preferably configured to shape the watermark so as to reduce its energy below the masking threshold computed by the masking unit 18, multiplied by the masking threshold modulation factor F.
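A minimal sketch of this shaping step, clipping the watermark's spectral magnitudes to the computed threshold multiplied by F (function name and array layout are assumptions):

```python
import numpy as np

def shape_watermark(wm_spectrum, masking_threshold, f=1.0):
    """Scale down any spectral component of the watermark whose magnitude
    exceeds the masking threshold multiplied by the modulation factor F,
    leaving smaller components untouched."""
    limit = np.asarray(masking_threshold) * f
    mags = np.abs(wm_spectrum)
    # Per-bin attenuation: 1.0 where already below the limit, limit/mag otherwise.
    scale = np.minimum(1.0, np.divide(limit, mags,
                                      out=np.ones_like(mags),
                                      where=mags > 0))
    return wm_spectrum * scale
```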
- the embedding unit 16 can comprise a plurality of embedding sub-units, one for each different watermark profile of said plurality of watermarking profiles or one for each of the watermarking techniques to which the plurality of watermarking profiles relates.
- Figure 3 shows an embodiment wherein the system 1 comprises the encoding device 10, a communication network 30 and a decoding device 20.
- the communication network 30 and the decoding device 20 comprise hardware and/or software and/or firmware configured to implement the method of the present disclosure.
- the communication network 30 can be any type of communication network adapted to transmit the watermarked audio signal.
- the decoding device 20 is configured to receive the watermarked audio signal and to extract the watermark from it.
- the watermark is extracted by using the same watermark profile used for embedding the watermark into the audio signal.
- the decoding device 20 needs to know the watermark profile used for embedding the watermark.
- Figure 4 shows a first embodiment of the decoding device 20 comprising a classification unit 22, a watermark profile unit 24, an extraction unit 26, a class database 27 and a watermark profile database 29.
- the classification unit 22 is configured to assign the watermarked audio signal a class depending upon the semantic content of the audio signal, in the same way as disclosed above with reference to classification unit 12 of encoding device 10.
- the class assigned to the watermarked audio signal will thus be the same as that assigned in the encoding device 10.
- the class database 27 (which could also be external to the decoding device 20) stores the plurality of classes in which the audio signal can be classified.
- the watermark profile database 29 (which could also be external to the decoding device 20) stores an association between the plurality of classes and the corresponding plurality of watermark profiles.
- although database 27 and database 29 are shown in the figures as two distinct entities, they can also be implemented as a single database.
- the watermark profile unit 24 is configured to retrieve from the watermark profile database 29 the watermark profile associated with said class and to provide it to the extraction unit 26.
- the association in the watermark profile database 29 is the same as that in the watermark profile database 19 of the encoding device 10.
- the watermark profile retrieved by the watermark profile unit 24 will thus be the same as that used in the encoding device 10.
- the extraction unit 26 is configured to use the watermark profile retrieved by watermark profile unit 24 for extracting the watermark from the watermarked audio signal.
- Figure 5 shows a second embodiment of the decoding device 20 comprising an extraction unit 26.
- watermarked audio signal classification is not performed in decoding device 20.
- extraction unit 26 is configured to try the plurality of watermark profiles in sequence until the watermark is successfully extracted from the watermarked audio signal.
- the extraction unit 26 can comprise a plurality of extraction sub-units (not shown), one for each watermark profile or for each watermarking technique, for trying the plurality of watermark profiles at least partly (or wholly) in parallel.
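The sequential-trial strategy of this embodiment can be sketched as follows. The bit-level extractor and the sync-pattern validation are illustrative assumptions (the disclosure does not specify how a successful extraction is recognized); only the try-each-profile-until-success loop reflects the text.

```python
# Sketch of the "try profiles in sequence" decoding of figure 5. A profile is
# reduced to a single hypothetical parameter (a bit offset), and extraction is
# deemed successful when the recovered bits end with a known sync pattern.

SYNC = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical validation pattern

def try_extract(signal_bits, profile):
    """Toy extractor: reads 16 bits at the profile's offset, validates the tail."""
    bits = signal_bits[profile["offset"]:profile["offset"] + 16]
    return bits if bits[-8:] == SYNC else None

def extract_any(signal_bits, profiles):
    """Try each watermark profile in turn until one extraction validates."""
    for profile in profiles:
        payload = try_extract(signal_bits, profile)
        if payload is not None:
            return profile["name"], payload[:-8]
    return None, None
```

A parallel variant (one extraction sub-unit per profile, as in the text) would run the same `try_extract` calls concurrently and keep the first validated result.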
- the class of the audio signal can be inserted into the audio signal by the embedding unit 16 of the encoding device 10 by embedding a second watermark (containing said class) into the audio signal with a predefined watermark profile, common to all audio signals independently from their class.
- the second watermark can be embedded into the already watermarked audio signal.
- the two watermarks can be embedded into different sub-bands of the audio signal.
- the extraction unit 26 is preferably configured to first use the predefined common watermark profile to extract the second watermark from the watermarked audio signal thereby retrieving the class of the audio signal. Then, the extraction unit 26 is configured to obtain (e.g., from a watermark profile database - not shown in figure 5 - similar to watermark profile database 29) the watermark profile associated with the retrieved class and to extract the watermark from the watermarked audio signal with the obtained watermark profile.
- Figure 6 shows an exemplary implementation of the system 1 of audio signal watermarking.
- the encoding device 10 is deployed on an entity 50 for embedding watermarks into audio signals.
- the entity 50 can be, for example, a recording company, a music producer or a service supply company providing services to the users.
- the watermark can comprise data relating to signature information, copyright information, serial numbers of broadcasted audio signals, product identification in audio signal broadcasting and similar.
- the audio signal can comprise music or speech as, for example, talks from movies, from TV or radio programs, from TV or radio advertisements, from TV or radio talk shows, and similar.
- the decoding device 20 is deployed on a user device 60.
- the user device 60 can be, for example, a PC, a smart phone, a tablet, a portable media player (e.g. an iPod®), or other similar device.
- the user device 60 can be adapted to download or stream a video from the internet or to detect audio signals of a TV program broadcasted on the TV or to detect audio signals of a movie played by means of a DVD player, a VHS player, a decoder or similar.
- a media provider 40 can obtain watermarked audio signals from entity 50 and supply them to user device 60, through communication network 30.
- the watermarked audio signals can be, for example, supplied to the user device 60 by means of broadcasting (e.g. from a TV or radio station), streaming (e.g. from the internet) or downloading (e.g. from a PC).
- the media provider 40 can be, for example, a TV station, a radio station, a PC or other similar device.
- User device 60, equipped with the decoding device 20, will be configured to extract the watermark from the watermarked audio signals.
- the watermark can comprise information enabling the user device 60 to connect, through communication network 30, to a service provider 70 that supplies predetermined services to users.
- the audio signals can be, for example, audio signals of a TV talk show or movie (or similar) and the user service can involve the provision of information to users about the TV images the users are watching on TV (e.g. information about the actors, about items of clothing and/or furnishing, about the movies or talk shows, about the set and similar).
Abstract
Method and system of watermarking an audio signal wherein:
- the audio signal is assigned to a class, among a plurality of classes, depending on the semantic content of the audio signal, the plurality of classes being associated with a corresponding plurality of watermark profiles;
- the watermark profile associated with the class assigned to the audio signal is obtained;
- a watermark is embedded into the audio signal by using the obtained watermark profile so as to provide a watermarked audio signal.
Description
- The present invention relates to a method and system of audio signal watermarking.
- Audio signal watermarking is a process for embedding information data (watermark) into an audio signal without affecting the perceptual quality of the host signal itself.
- The watermark should be imperceptible or nearly imperceptible to Human Auditory System (HAS). However, the watermark should be detectable through an automated detection process.
- Watermarking techniques are known in the art.
- For example, EP 1 594 122 discloses a watermarking method and apparatus employing spread spectrum technology and a psycho-acoustic model.
- The applicant observes that a watermarking technique should achieve a trade-off between three basic features: imperceptibility, robustness and payload, which are strictly linked to each other by inverse relationships. Depending upon the purpose of using watermarking, a watermarking technique should find a correct balance between the need to keep the watermark imperceptible; to make the watermark robust against attacks and manipulations of the host signal (e.g., noise distortion, A/D or D/A conversion, lossy coding, resizing, filtering, lossy compression); and to achieve the highest possible payload.
- The applicant notes that a psycho-acoustic model makes it possible to determine the maximum distortion, i.e. the maximum watermark signal energy, that can be introduced into a host signal without being perceptible by human senses. However, this model does not provide any information about robustness and payload, nor about optimization of the trade-off among imperceptibility, robustness and payload.
- It is thus an object of the invention to provide an alternative method and system of audio signal watermarking.
- It is a further object of the invention to provide an improved method and system of audio signal watermarking with high performance in terms of the trade-off among imperceptibility, robustness and payload.
- The Applicant found that the above objects are achieved by a method and system of audio signal watermarking wherein audio signals are classified based on their semantic content and watermarks are embedded into the audio signals by using watermark profiles selected on the basis of the classes assigned to the audio signals.
- Indeed, as described in more detail below, the Applicant found that, given an audio signal, the trade-off among watermark imperceptibility, robustness and payload can be optimized by fitting the watermark profile depending on the semantic content of the audio signal.
- In the present disclosure, the expression "semantic content" in relation to an audio signal refers to the audio type contained in the audio signal. The semantic content of an audio signal can be, for example, speech (e.g. talks from movies, from TV or radio programs, from TV or radio advertisements, from TV or radio talk shows, and similar) or music. In the case of music, the semantic content can be, for example, a musical genre (e.g., rock, classical, jazz, blues, instrumental, singing and similar). In the case of speech, the semantic content can be, for example, a tone of voice, a conversation, a single person speaking, a whisper, a loud quarrel, and similar.
- In the present disclosure, the expression "watermark profile" is used to indicate a set of parameters used for embedding the watermark into the audio signal according to a predetermined watermarking technique.
- In a first aspect, the present disclosure relates to a method of watermarking an audio signal comprising:
- assigning the audio signal to a class, among a plurality of classes, depending on the semantic content of the audio signal, the plurality of classes being associated with a corresponding plurality of watermark profiles;
- obtaining the watermark profile associated with the class assigned to the audio signal;
- embedding a watermark into the audio signal by using the obtained watermark profile so as to provide a watermarked audio signal.
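The three steps of the method can be sketched end to end as follows. The zero-crossing-based classifier, the class-to-profile table and the dither-style embedder are toy stand-ins (assumptions, not the disclosed implementations); only the classify, obtain-profile, embed flow mirrors the claim.

```python
import numpy as np

# Hypothetical class -> watermark profile association (invented values).
PROFILES = {
    "speech": {"bit_rate_bps": 40, "F": 1.1},
    "music":  {"bit_rate_bps": 20, "F": 0.9},
}

def classify(signal):
    """Toy semantic classifier: a high zero-crossing rate suggests speech."""
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2.0
    return "speech" if zcr > 0.1 else "music"

def embed(signal, watermark_bits, profile):
    """Placeholder embedder: adds a tiny profile-scaled dither to the host."""
    chips = np.repeat(np.array(watermark_bits) * 2 - 1,
                      len(signal) // len(watermark_bits))
    chips = np.pad(chips, (0, len(signal) - len(chips)))
    return signal + 1e-4 * profile["F"] * chips

def watermark(signal, bits):
    cls = classify(signal)               # step 1: assign a class
    profile = PROFILES[cls]              # step 2: obtain the associated profile
    return embed(signal, bits, profile)  # step 3: embed using that profile
```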
- In a second aspect, the present disclosure relates to a system of watermarking an audio signal comprising an encoding device comprising:
- a classification unit configured to assign the audio signal to a class, among a plurality of classes, depending on the semantic content of the audio signal, the plurality of classes being associated with a corresponding plurality of watermark profiles;
- a watermark profile unit configured to obtain the watermark profile associated with the class assigned to the audio signal;
- an embedding unit configured to embed a watermark into the audio signal by using the watermark profile obtained by watermark profile unit so as to provide a watermarked audio signal.
- The method and system of the present disclosure may have at least one of the following preferred features.
- Advantageously, each watermark profile is associated with a corresponding class so that the trade-off among watermark imperceptibility, robustness and payload is optimized for said class, depending on the watermark application. For example, depending on the watermark application, one, two or all of the features among imperceptibility, robustness and payload could be optimized for each class, by keeping the other feature(s), if any, unchanged among the classes. For example, in noisy applications, robustness could be maximized for each class, by keeping the same payload and imperceptibility level among the classes. On the other hand, in low-noise applications and/or when many data need to be contained in the watermark, payload could be maximized for each class, by keeping the same robustness and imperceptibility level among the classes. Otherwise, in low-noise applications, imperceptibility could be maximized for each class, by keeping the same payload and robustness level among the classes.
- The plurality of watermark profiles can relate to a single watermarking technique or to a plurality of watermarking techniques.
- In case of a single watermarking technique, the watermark profiles differ from each other in the value taken by at least one parameter of said set of parameters.
- In case of a plurality of watermarking techniques, the watermark profiles differ from each other in at least one of the parameters and/or in the values taken by at least one of the common parameters.
- The watermarking technique(s) can be selected, for example, from the group comprising: spread spectrum watermarking technique, echo hiding watermarking technique, phase coding technique, informed watermarking schemes like QIM (Quantization Index Modulation) and Spread Transform Dither Modulation (STDM).
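The spread spectrum technique listed above can be illustrated with a minimal sketch: each watermark bit is spread over `spreading_factor` samples by a pseudo-random ±1 chip sequence and recovered by correlating with the same sequence. The parameter names, the key-seeded chip generator and the small amplitude `alpha` are illustrative choices, not values from the disclosure.

```python
import numpy as np

def spread_embed(host, bits, key=42, spreading_factor=256, alpha=0.01):
    """Add a low-amplitude +/-1 chip sequence per bit to consecutive segments."""
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=(len(bits), spreading_factor))
    out = host.copy()
    for i, b in enumerate(bits):
        s = slice(i * spreading_factor, (i + 1) * spreading_factor)
        out[s] += alpha * (1 if b else -1) * chips[i]
    return out

def spread_detect(signal, n_bits, key=42, spreading_factor=256):
    """Regenerate the chips from the key and decide each bit by correlation."""
    rng = np.random.default_rng(key)
    chips = rng.choice([-1.0, 1.0], size=(n_bits, spreading_factor))
    bits = []
    for i in range(n_bits):
        s = slice(i * spreading_factor, (i + 1) * spreading_factor)
        bits.append(1 if np.dot(signal[s], chips[i]) > 0 else 0)
    return bits
```

Because the chip energy is spread over many samples, each individual sample perturbation stays near the noise floor, which is the property the text describes.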
- In a preferred embodiment, the method is a computer-implemented method.
- In an embodiment, the embedding unit can comprise a plurality of embedding sub-units. Each embedding sub-unit can be configured to embed the watermark into the audio signal by using one watermark profile of said plurality of watermarking profiles or one watermarking technique of said plurality of watermarking techniques.
- At least one parameter of said set of parameters defining the watermark profile may be selected, for example from the group comprising: watermark bit rate; frequency range hosting the watermark; Document to Watermark Ratio (DWR); watermark frame length; masking threshold modulation factor F, intended as a quantity by which the masking threshold of the audio signal -computed according to a psychoacoustic model- is multiplied to vary its amplitude with respect to the computed value; channel coding scheme (which may also include error detection techniques such as, for example, Cyclic Redundancy Check); number, amplitude, offset and decay rate of echo pulses (in case of echo hiding watermarking technique); spreading factor, intended as number of audio signal frequency or phase samples needed to insert one bit of watermark (in case of spread spectrum watermarking technique with, respectively, frequency or phase modulation).
- Preferably, the plurality of classes is associated with the corresponding plurality of watermark profiles in a database. The database can be internal or external to the encoding device.
- In a preferred embodiment, the expression watermarking refers to digital watermarking.
- Digital watermarking relates to a computer-implemented watermarking process.
- In a preferred embodiment, a masking threshold of the audio signal is computed according to a psychoacoustic model. Preferably, before computing the masking threshold, the audio signal is split into time windows and a masking threshold is computed for each time window of the audio signal. The masking threshold can be computed after, before or at the same time as the audio signal classification.
- The psychoacoustic model can be a psychoacoustic model known in the art.
- Preferably, the psychoacoustic model is adapted to calculate the masking threshold in the time and/or frequency domain and is based on one of the following analyses: block based FFT (Fast Fourier Transform), block based DCT (Discrete Cosine Transform), block based MDCT (Modified Discrete Cosine Transform), block based MCLT (Modified Complex Lapped Transform), block based STFT (Short-Time Fourier Transform), sub-band or wavelet packet analysis.
- Preferably, the encoding device comprises a masking unit configured to perform the masking threshold computation and, optionally, the time windows splitting.
- Preferably, embedding the watermark into the audio signal comprises the step of shaping the energy of the watermark according to the computed masking threshold.
- This advantageously enables to guarantee watermark imperceptibility to human auditory system.
- Preferably, the watermark is shaped to reduce its energy below the computed masking threshold of the audio signal. When the set of parameters defining the obtained watermark profile comprises the masking threshold modulation factor F, the watermark is preferably shaped to reduce its energy below the computed masking threshold, multiplied by the masking threshold modulation factor F.
- Preferably, the watermark is shaped by the embedding unit.
- Preferably, the masking threshold modulation factor F is at least equal to 0.5. More preferably, the masking threshold modulation factor F is at least equal to 0.7, even more preferably at least equal to 0.8, even more preferably at least equal to 0.9.
- Preferably, the masking threshold modulation factor F is not higher than 1.5. More preferably, the masking threshold modulation factor F is not higher than 1.3, even more preferably not higher than 1.2, even more preferably not higher than 1.1.
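The shaping step described above, with the modulation factor F, can be sketched per frequency bin. The per-bin threshold here is a made-up array standing in for the output of a real psychoacoustic model, and clamping the magnitude bin by bin is a simplification of the energy-shaping the text describes.

```python
import numpy as np

def shape_watermark(wm_spectrum, masking_threshold, F=1.0):
    """Scale each frequency bin so that |W(f)| <= F * threshold(f)."""
    limit = F * masking_threshold
    mag = np.abs(wm_spectrum)
    # Only bins exceeding the (modulated) threshold are attenuated.
    scale = np.where(mag > limit, limit / np.maximum(mag, 1e-12), 1.0)
    return wm_spectrum * scale
```

With F > 1 the limit is looser (more watermark energy, less imperceptibility margin); with F < 1 it is stricter, matching the ranges given above.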
- Assigning the audio signal to a class, according to the semantic content of the audio signal, is preferably performed based upon analysis of at least one audio signal feature.
- In the present disclosure, the audio signal feature is related to the semantic content of the audio signal.
- In the present disclosure, the audio signal feature is preferably related to time, frequency, energy or cepstrum domain.
- For example, the at least one audio signal feature can be selected from the group comprising: loudness, brightness, beats per minute (BPM), bandwidth, pitch, odd-to-even harmonic energy ratio, spectral energy bands (e.g. spectrum sparsity), spectral and tonal complexities, spectral roll-off point (intended as any percentile of the power spectral distribution), spectral centroid (defined as the center of gravity of the magnitude spectrum), spectral "flux" (intended as the squared difference between the normalized magnitudes of successive spectral distributions), time domain Zero-Crossing Rate, Cepstrum Resynthesis Residual Magnitude, Mel-Frequency Cepstral Coefficients, band periodicity (defined as the periodicity of a sub-band and derived by sub-band correlation analysis).
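Two of the listed features can be computed directly from a frame of samples; the definitions follow the parenthetical glosses above (centroid as the center of gravity of the magnitude spectrum, zero-crossing rate in the time domain), while framing and windowing details are simplified.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Center of gravity of the magnitude spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs."""
    signs = np.sign(frame)
    return float(np.mean(np.abs(np.diff(signs)) > 0))
```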
- The analysis of the at least one audio signal feature preferably comprises checking if the at least one audio signal feature meets one or more predetermined constraints. For example, the value of the at least one audio signal feature can be compared to one or more predetermined thresholds or one or more predetermined ranges of values. Each class is advantageously defined by predetermined constraints (e.g. a set of values) to be met by the at least one audio signal feature.
- Preferably, before assigning the audio signal to a class, the audio signal is split into sub-signals of shorter duration and a class is assigned to each sub-signal independently from the other sub-signals.
- Suitably, the duration of the sub-signals is longer than the duration of the time windows in which the audio signal is split for performing masking threshold computation.
- Preferably, the method of audio signal watermarking comprises a decoding process comprising extraction of the watermark from the watermarked audio signal. Preferably, the watermark is extracted by using the same watermark profile used for embedding the watermark into the audio signal.
- Preferably, the system also comprises a decoding device configured to extract the watermark from the watermarked audio signal.
- In an embodiment of the decoding process, the watermarked audio signal is assigned to a class, among the plurality of classes, depending on the semantic content of the watermarked audio signal, the plurality of classes being associated with the corresponding plurality of watermark profiles. Preferably, the class is assigned by a classification unit of the decoding device. According to this embodiment, the watermark profile associated with the class assigned to the audio signal is obtained and used for extracting the watermark from the watermarked audio signal. Preferably, the watermark is extracted from the watermarked audio signal by an extraction unit of the decoding device.
- According to another embodiment of the decoding process, said plurality of watermark profiles are tried in sequence for extracting the watermark until the watermark is successfully extracted from the watermarked audio signal. In this case, the decoding device can comprise a single extraction unit for trying the plurality of watermark profiles in sequence. Alternatively, the extraction unit can comprise a plurality of extraction sub-units, one for each watermark profile or for each watermarking technique, for trying the plurality of watermark profiles at least partly in parallel. In this embodiment, audio signal classification is not necessary at the decoding side.
- According to a further embodiment, a second watermark, comprising the class assigned to the audio signal, is embedded into the audio signal by using a predefined watermark profile, common to all audio signals independently from their class. The second watermark can be embedded into the watermarked audio signal, already watermarked with the first watermark. Alternatively, the first and the second watermarks can be embedded into different sub-bands of the audio signal. Watermark extraction can then be performed by first extracting the second watermark from the watermarked audio signal (by using the common watermark profile) so as to retrieve the class of the audio signal, and then by obtaining the watermark profile associated with the retrieved class and extracting the watermark from the watermarked audio signal with the obtained watermark profile. In this embodiment, audio signal classification is not necessary at the decoding side.
- Further characteristics and advantages of the present invention will become clearer from the following detailed description of some preferred embodiments thereof, made as an example and not for limiting purposes with reference to the attached drawings. In such drawings,
- figure 1 schematically shows a system of audio signal watermarking according to an embodiment of the invention;
- figure 2 schematically shows the energy spectrum of four audio signals having a different semantic content;
- figure 3 schematically shows a system of audio signal watermarking according to another embodiment of the invention;
- figure 4 schematically shows a first embodiment of a decoding device of the system of figure 3;
- figure 5 schematically shows a second embodiment of the decoding device of the system of figure 3;
- figure 6 schematically shows an exemplary implementation of the system of figure 3.
- Figure 1 discloses a system 1 of audio signal watermarking according to an embodiment of the invention.
- The system 1 comprises an encoding device 10 comprising an input 11 for an audio signal, an input 13 for a watermark and an output 15 for a watermarked audio signal.
- The encoding device 10 comprises a classification unit 12, a watermark profile unit 14, a masking unit 18 and an embedding unit 16.
- The classification unit 12, watermark profile unit 14, masking unit 18 and embedding unit 16 comprise hardware and/or software and/or firmware configured to implement the method of the present disclosure.
- The classification unit 12 is configured to assign the audio signal to a class depending on the semantic content of the audio signal.
- The classification unit 12 is configured to analyse at least one audio signal feature related to the semantic content of the audio signal, to compare the at least one audio signal feature with one or more constraints and to assign the audio signal a class, selected among a predetermined plurality of classes, depending on the result of the comparison.
- For example, the at least one audio signal feature can be selected from the group comprising: loudness, brightness, beats per minute (BPM), bandwidth, pitch, odd-to-even harmonic energy ratio, spectral energy bands (e.g. spectrum sparsity), spectral and tonal complexities, spectral roll-off point (intended as any percentile of the power spectral distribution), spectral centroid (defined as the center of gravity of the magnitude spectrum), spectral "flux" (intended as the squared difference between the normalized magnitudes of successive spectral distributions), time domain Zero-Crossing Rate, Cepstrum Resynthesis Residual Magnitude, Mel-Frequency Cepstral Coefficients, band periodicity (defined as the periodicity of a sub-band and derived by sub-band correlation analysis).
- For example, the at least one audio signal feature can be compared with one or more predetermined thresholds or one or more predetermined ranges of values and each class can be defined by a predetermined set of values that can be taken by the at least one audio signal feature.
- The plurality of classes can be stored in a suitable class database 17, internal (as shown in figure 1) or external to the encoding device 10.
- In a preferred embodiment (not shown), before assigning the audio signal to a class, the encoding device 10 is configured to split the audio signal into sub-signals of shorter duration (e.g. from a few tenths of a second to a few tens of seconds) and the classification unit 12 is configured to classify each sub-signal independently from the other sub-signals.
- In a preferred embodiment, the classification unit 12 is configured to classify the audio signals (or sub-signals) by analysing the spectrum sparsity of their energy spectrum.
- The spectrum sparsity is an audio signal feature indicative of the energy concentration in a sub-band compared to the energy in the whole audio signal (or sub-signal) band.
- The energy spectrum of the audio signal (or sub-signal) is considered sparse (or colored) if most of its energy is concentrated in a small spectrum sub-band; otherwise it is considered non-sparse (or noise-like).
- For example, figure 2 shows the energy spectrum of four audio signals having a different semantic content: speech, rock, jazz, piano solo.
- For example, in the case of figure 2, three different classes can be defined by analyzing the spectrum sparsity feature and, in the example, by comparing the fraction of signal energy (normalized to the total energy) contained in the 0-1000 Hz sub-band with two threshold levels SL and SH. If said fraction of energy in the 0-1000 Hz sub-band is lower than SL, the audio signal (or sub-signal) can be classified into a "low sparse" class; if said fraction of energy in the 0-1000 Hz sub-band is between SL and SH, the audio signal (or sub-signal) can be classified into a "medium sparse" class; if it is higher than SH, the audio signal can be classified into a "high sparse" class.
- In the example of figure 2, by setting SL = 0.85 and SH = 0.90, the talk and rock signals are classified as "low sparse" signals, the jazz signal is classified as a "medium sparse" signal, and the piano solo signal is classified as a "high sparse" signal.
- It is noted that most audio signals have their energy concentrated in the 0-1000 Hz sub-band. Even slight differences (e.g. of about 0.05) between threshold levels SL and SH can thus be significant in such a sub-band.
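The three-class sparsity rule of this example can be sketched in a few lines. The 0-1000 Hz band, SL = 0.85 and SH = 0.90 come from the text; the test signals are synthetic stand-ins for the four spectra of figure 2.

```python
import numpy as np

def sparsity_class(signal, sr, band_hz=1000.0, SL=0.85, SH=0.90):
    """Classify by the fraction of spectral energy below band_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    fraction = power[freqs <= band_hz].sum() / power.sum()
    if fraction < SL:
        return "low sparse"
    if fraction <= SH:
        return "medium sparse"
    return "high sparse"
```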
- The plurality of classes used by the classification unit 12 are associated with a corresponding plurality of watermark profiles in a suitable watermark profile database 19, internal (as shown in figure 1) or external to the encoding device 10.
- It is noted that, even if database 17 and database 19 are shown in the figures as two distinct entities, they can also be implemented into a single database.
- Each watermark profile is defined by a set of parameters used for embedding the watermark into the audio signal according to a predetermined watermarking technique.
- The watermarking technique can be a technique known in the art as, for example, a spread spectrum watermarking technique (e.g. wherein the watermark is spread over many frequency bins so that the energy in one bin is very small and undetectable), an echo hiding watermarking technique (e.g. wherein the watermark is embedded into an audio signal by introducing one or more echoes that are offset in time from the audio signal by an offset value associated with the data value of the bit), a phase coding technique (e.g. wherein phase differences between selected frequency component portions of an audio signal are modified to embed the watermark in the audio signal) or any other watermarking technique known in the art.
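The echo hiding idea glossed above can be sketched as follows: one bit is carried by an attenuated echo whose delay encodes the bit value. The delays, the decay rate and the autocorrelation-based detector are illustrative choices, not parameters from the disclosure.

```python
import numpy as np

def echo_embed(host, bit, d0=50, d1=100, decay=0.4):
    """Add an attenuated echo at delay d1 for bit 1, d0 for bit 0."""
    delay = d1 if bit else d0
    out = host.copy()
    out[delay:] += decay * host[:-delay]
    return out

def echo_detect(signal, d0=50, d1=100):
    """Decide the bit from which candidate lag shows the stronger correlation."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    return 1 if ac[d1] > ac[d0] else 0
```

A real detector typically uses the cepstrum rather than the raw autocorrelation, but the delay-peak principle is the same.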
- The set of parameters can comprise at least one parameter selected from the group comprising: watermark bit rate; frequency range hosting the watermark; Document to Watermark Ratio (DWR); watermark frame length; masking threshold modulation factor F; channel coding scheme; number, amplitude, offset and decay rate of echo pulses; spreading factor.
- The plurality of watermark profiles associated with the plurality of classes can relate to a single watermarking technique or to a plurality of watermarking techniques.
- In case of a single watermarking technique, the watermark profiles are all defined by the same set of parameters and differ from each other in the values taken by at least one of the parameters.
- In case of a plurality of watermarking techniques, the watermark profiles relating to different watermarking techniques are defined by different sets of parameters. The watermark profiles can thus differ from each other in at least one parameter and/or in at least one value taken by a common parameter.
- According to the present disclosure, within the watermark profile database 19, each class is associated with a corresponding watermark profile that enables the trade-off among watermark imperceptibility, robustness and payload to be optimized for that class, depending on the watermark application.
- In fact, the applicant observed that it is possible to obtain different optimized trade-offs for the audio signals, depending on the semantic content of each audio signal.
- For example, as visible in figure 2, talk and rock signals, classified as "low sparse" signals, allow the introduction of a higher level of distortion than jazz signals, classified as "medium sparse" signals, and than piano solo signals, classified as "high sparse" signals.
- According to the invention, the level of distortion which is actually "available" for each audio signal class is advantageously exploited in order to optimize the trade-off among watermark imperceptibility, robustness and payload, depending on the watermark application. For example, the higher level of distortion available for the "low sparse" and "medium sparse" signals, compared with the "high sparse" signals, could be exploited to maximize, for each class, one or two features among imperceptibility, robustness and payload, by keeping the other feature(s) unchanged among the classes.
- For example, when the audio signal is intended to be transmitted in a low-noise channel and/or played in a low-noise environment (e.g. a domestic environment), payload could be maximized for each class, by keeping the same robustness and imperceptibility levels among the classes. On the other hand, when the audio signal is intended to be transmitted in a high-noise channel and/or played in a high-noise environment (e.g. a public environment such as a train station or airport), robustness could be maximized for each class, by keeping the same payload and imperceptibility level among the classes. Otherwise, when the audio signal is intended to be played in a low-noise environment, imperceptibility could be maximized for each class, by keeping the same payload and robustness level among the classes.
- Within the set of parameters defining a watermark profile, payload could be optimized by acting on the watermark bit rate; imperceptibility could be optimized by acting on the masking threshold modulation factor F; and robustness could be optimized by acting on at least one of: frequency range hosting the watermark; Document to Watermark Ratio; watermark frame length; channel coding scheme; number, amplitude, offset and decay rate of echo pulses; and spreading factor.
- For example, in the case of figure 2, supposing that the same robustness and imperceptibility levels are kept among classes, the level of distortion actually "available" for each audio signal class could be exploited in order to maximize the payload feature for each class, by associating a watermark profile with a higher bit rate with the low sparse class, a watermark profile with an intermediate bit rate with the medium sparse class and a watermark profile with a lower bit rate with the high sparse class. On the other hand, supposing that the same payload and imperceptibility level are kept among classes, the robustness feature could be maximized for each class, by associating a more robust watermark profile with the low sparse class, a watermark profile of intermediate robustness with the medium sparse class and a less robust watermark profile with the high sparse class. Otherwise, supposing that the same payload and robustness level are kept among classes, the imperceptibility feature could be maximized for each class, by associating, for example, a watermark profile having a masking threshold modulation factor F ≥ 1 with the low sparse class, a watermark profile having a masking threshold modulation factor F = 1 with the medium sparse class and a watermark profile having a masking threshold modulation factor F ≤ 1 with the high sparse class.
- As to the masking threshold modulation factor F, it is also observed that it could be set to different values, depending on the frequency ranges of the audio signal.
- As also explained in more detail below, in the case of F = 1, the watermark is shaped by the embedding unit 16 according to the masking threshold as computed by the masking unit 18 on the basis of a psycho-acoustic model. In the case of F > 1, the watermark is shaped according to a higher masking threshold, whereby the imperceptibility level of the watermark is decreased with respect to the level set according to the psycho-acoustic model. In the case of F < 1, the watermark is shaped according to a lower masking threshold, whereby the imperceptibility level of the watermark is increased with respect to the level set according to the psycho-acoustic model.
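The class-to-profile association discussed above can be sketched as a small table: the low sparse class gets a higher bit rate and a higher F, the high sparse class a lower bit rate and a lower F. All numeric values are invented for illustration and are not taken from the disclosure.

```python
# Toy stand-in for the watermark profile database 19 (values are assumptions).
WATERMARK_PROFILES = {
    "low sparse":    {"bit_rate_bps": 60, "F": 1.1, "dwr_db": 20, "frame_len": 1024},
    "medium sparse": {"bit_rate_bps": 40, "F": 1.0, "dwr_db": 24, "frame_len": 2048},
    "high sparse":   {"bit_rate_bps": 20, "F": 0.9, "dwr_db": 28, "frame_len": 4096},
}

def profile_for(audio_class):
    """Look up the watermark profile associated with a class."""
    return WATERMARK_PROFILES[audio_class]
```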
- Accordingly, the masking threshold modulation factor F makes it possible to vary the amplitude of the masking threshold, as computed according to the psycho-acoustic model, depending on the semantic content of the audio signal. In this way, the imperceptibility level of the watermark can be finely tuned, with respect to the level set according to the psycho-acoustic model, depending on the semantic content of the audio signal and on the watermark application.
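By way of illustration, applying the factor F during embedding can be sketched as follows; the per-bin hard limiting of the watermark spectrum is an illustrative assumption, not the specific shaping rule of the present disclosure:

```python
import numpy as np

def shape_watermark(wm_spectrum, mask_threshold, F=1.0):
    """Attenuate watermark spectral magnitudes that exceed F times the
    masking threshold. F > 1 relaxes the threshold (louder, less
    imperceptible watermark); F < 1 tightens it (more imperceptible
    watermark). Per-bin clipping is a simplifying assumption."""
    limit = F * np.asarray(mask_threshold, dtype=float)
    mag = np.abs(wm_spectrum)
    # Scale down only the bins that exceed the modulated threshold.
    scale = np.where(mag > limit, limit / np.maximum(mag, 1e-12), 1.0)
    return wm_spectrum * scale
```

A frequency-dependent F, as contemplated above, would simply make `F` an array over the frequency bins instead of a scalar.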
- Once the audio signal is assigned to a class by the
classification unit 12, the watermark profile unit 14 is configured to retrieve from the watermark profile database 19 the watermark profile associated with said class and to provide it to the embedding unit 16. - The masking
unit 18 is configured to compute a masking threshold of the audio signal according to a psycho-acoustic model and to provide it to the embedding unit 16. - The psychoacoustic model can be any psychoacoustic model known in the art.
- Preferably, the psychoacoustic model calculates the masking threshold in the time and/or frequency domain and is based on one of the following analyses: block-based FFT (Fast Fourier Transform), block-based DCT (Discrete Cosine Transform), block-based MDCT (Modified Discrete Cosine Transform), block-based MCLT (Modified Complex Lapped Transform), block-based STFT (Short-Time Fourier Transform), sub-band analysis or wavelet packet analysis.
- Preferably, the masking
unit 18 is configured to split the audio signal into suitable time windows (e.g. of a few milliseconds) and to compute a masking threshold for each time window. - In the embodiment shown, the masking threshold computation is performed in parallel with the audio signal classification.
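A deliberately simplified, block-based FFT sketch of the per-window threshold computation is given below. The smoothing kernel and fixed offset are crude stand-ins for the spreading function and masking offset of a real psychoacoustic model; tonality estimation, Bark-scale spreading and the absolute threshold of hearing are omitted:

```python
import numpy as np

def masking_thresholds(signal, win_len=1024):
    """Toy per-window masking threshold in the spirit of a block-based
    FFT analysis: each window's power spectrum is smoothed (a crude
    stand-in for a spreading function) and lowered by 10 dB. Purely
    illustrative; not the psychoacoustic model of the disclosure."""
    kernel = np.hanning(9)
    kernel /= kernel.sum()
    window = np.hanning(win_len)
    thresholds = []
    for start in range(0, len(signal) - win_len + 1, win_len):
        frame = signal[start:start + win_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Smooth across bins, then apply a fixed -10 dB masking offset.
        thresholds.append(np.convolve(power, kernel, mode="same") * 0.1)
    return np.array(thresholds)
```

Each row of the returned array is the threshold for one time window, matching the per-window operation of the masking unit 18 described above.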
- The embedding
unit 16 is configured to embed the watermark into the audio signal by using the watermark profile obtained by the watermark profile unit 14 so as to provide a watermarked audio signal. - The embedding
unit 16 is also configured to shape the energy of the watermark according to the masking threshold computed by the masking unit 18. - When the set of parameters defining the watermark profile obtained by the
watermark profile unit 14 comprises the masking threshold modulation factor F, the embedding unit 16 is also preferably configured to shape the watermark so as to reduce its energy below the masking threshold computed by the masking unit 18, multiplied by the masking threshold modulation factor F. - In an embodiment (not shown), the embedding
unit 16 can comprise a plurality of embedding sub-units, one for each different watermark profile of said plurality of watermarking profiles or one for each of the watermarking techniques to which the plurality of watermarking profiles relates. -
Figure 3 shows an embodiment wherein the system 1 comprises the encoding device 10, a communication network 30 and a decoding device 20. - As far as the
encoding device 10 is concerned, reference is made to what is disclosed above. - The
communication network 30 and the decoding device 20 comprise hardware and/or software and/or firmware configured to implement the method of the present disclosure. - The
communication network 30 can be any type of communication network adapted to transmit the watermarked audio signal. - The
decoding device 20 is configured to receive the watermarked audio signal and to extract the watermark from it. - Preferably, the watermark is extracted by using the same watermark profile used for embedding the watermark into the audio signal. In view of this, the
decoding device 20 needs to know the watermark profile used for embedding the watermark. -
Figure 4 shows a first embodiment of the decoding device 20 comprising a classification unit 22, a watermark profile unit 24, an extraction unit 26, a class database 27 and a watermark profile database 29. - The
classification unit 22 is configured to assign the watermarked audio signal to a class depending upon the semantic content of the audio signal, in the same way as disclosed above with reference to the classification unit 12 of the encoding device 10. The class assigned to the watermarked audio signal will thus be the same as that assigned in the encoding device 10. - The class database 27 (which could also be external to the decoding device 20) stores the plurality of classes in which the audio signal can be classified.
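By way of illustration, a classifier with the property relied upon here (the same deterministic rule at encoder and decoder yields the same class) can be sketched as follows. Spectral flatness is used purely as a hypothetical sparsity feature, and the thresholds are arbitrary; neither is prescribed by the present disclosure:

```python
import numpy as np

def classify_audio(signal, low_th=0.1, high_th=0.4):
    """Hypothetical sparsity classifier: high spectral flatness
    (noise-like, dense spectrum) maps to the low sparse class, low
    flatness (peaky, sparse spectrum) to the high sparse class.
    Feature choice and thresholds are illustrative assumptions."""
    power = np.abs(np.fft.rfft(np.asarray(signal, dtype=float))) ** 2 + 1e-12
    # Spectral flatness: geometric mean over arithmetic mean of power.
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    if flatness > high_th:
        return "low_sparse"      # flat, noise-like spectrum
    if flatness > low_th:
        return "medium_sparse"
    return "high_sparse"         # peaky, sparse spectrum
```

Because the rule depends only on the signal, running it again on the watermarked signal at the decoder reproduces the encoder's class, as stated above (assuming the watermark does not materially change the feature).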
- The watermark profile database 29 (which could also be external to the decoding device 20) stores an association between the plurality of classes and the corresponding plurality of watermark profiles.
- It is noted that, even if
database 27 and database 29 are shown in the figures as two distinct entities, they can also be implemented as a single database. - Once the audio signal is assigned to a class by the
classification unit 22, the watermark profile unit 24 is configured to retrieve from the watermark profile database 29 the watermark profile associated with said class and to provide it to the extraction unit 26. - The association in the
watermark profile database 29 is the same as that in the watermark profile database 19 of the encoding device 10. The watermark profile retrieved by the watermark profile unit 24 will thus be the same as that used in the encoding device 10. - The
extraction unit 26 is configured to use the watermark profile retrieved by the watermark profile unit 24 for extracting the watermark from the watermarked audio signal. -
Figure 5 shows a second embodiment of the decoding device 20 comprising an extraction unit 26. In this embodiment, watermarked audio signal classification is not performed in the decoding device 20. - According to a first implementation of this embodiment,
extraction unit 26 is configured to try the plurality of watermark profiles in sequence until the watermark is successfully extracted from the watermarked audio signal. The extraction unit 26 can comprise a plurality of extraction sub-units (not shown), one for each watermark profile or for each watermarking technique, so as to try the plurality of watermark profiles at least partly (or wholly) in parallel. - According to a second implementation, the class of the audio signal can be inserted into the audio signal by the embedding
unit 16 of the encoding device 10 by embedding a second watermark (containing said class) into the audio signal with a predefined watermark profile, common to all audio signals independently of their class. The second watermark can be embedded into the already watermarked audio signal. Alternatively, the two watermarks can be embedded into different sub-bands of the audio signal. - In this second implementation, the
extraction unit 26 is preferably configured to first use the predefined common watermark profile to extract the second watermark from the watermarked audio signal, thereby retrieving the class of the audio signal. Then, the extraction unit 26 is configured to obtain (e.g., from a watermark profile database - not shown in figure 5 - similar to the watermark profile database 29) the watermark profile associated with the retrieved class and to extract the watermark from the watermarked audio signal with the obtained watermark profile.
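The two decoder-side strategies of this embodiment can be sketched as follows. `try_extract` and `profile_db` are placeholders standing in for a profile-specific extraction routine and the watermark profile database; their signatures are illustrative assumptions:

```python
def extract_by_trial(signal, profiles, try_extract):
    """First implementation: try each known watermark profile in
    sequence until one succeeds. `try_extract` is assumed to return
    the watermark on success and None on failure."""
    for profile in profiles:
        watermark = try_extract(signal, profile)
        if watermark is not None:
            return watermark, profile
    raise ValueError("no watermark found with any known profile")

def extract_two_stage(signal, common_profile, profile_db, try_extract):
    """Second implementation: a second watermark, embedded with a
    profile common to all classes, carries the class identifier; the
    retrieved class then selects the profile used to extract the main
    watermark."""
    audio_class = try_extract(signal, common_profile)
    return try_extract(signal, profile_db[audio_class])
```

The sequential loop in `extract_by_trial` corresponds to the extraction sub-units trying profiles one by one (they could equally run in parallel), while `extract_two_stage` mirrors the common-profile-then-class-profile procedure described above.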
- Figure 6 shows an exemplary implementation of the system 1 of audio signal watermarking. - According to this exemplary implementation, the
encoding device 10 is deployed on an entity 50 for embedding watermarks into audio signals. - The
entity 50 can be, for example, a recording company, a music producer or a service supply company providing services to users. - The watermark can comprise data relating to signature information, copyright information, serial numbers of broadcast audio signals, product identification in audio signal broadcasting and the like.
- The audio signal can comprise music or speech such as, for example, talks from movies, TV or radio programs, TV or radio advertisements, TV or radio talk shows, and the like.
- The
decoding device 20 is deployed on a user device 60. - The
user device 60 can be, for example, a PC, a smart phone, a tablet, a portable media player (e.g. an iPod®), or other similar device. - The
user device 60 can be adapted to download or stream a video from the internet, to detect audio signals of a TV program broadcast on TV, or to detect audio signals of a movie played by means of a DVD player, a VHS player, a decoder or the like. - A
media provider 40 can obtain watermarked audio signals from the entity 50 and supply them to the user device 60, through the communication network 30. - The watermarked audio signals can be, for example, supplied to the
user device 60 by means of broadcasting (e.g. from a TV or radio station), streaming (e.g. from the internet) or downloading (e.g. from a PC). - The
media provider 40 can be, for example, a TV station, a radio station, a PC or other similar device. -
User device 60, equipped with the decoding device 20, will be configured to extract the watermark from the watermarked audio signals. - For example, the watermark can comprise information enabling the
user device 60 to connect, through the communication network 30, to a service provider 70 that supplies predetermined services to users. In this case, the audio signals can be, for example, audio signals of a TV talk show or movie (or similar) and the user service can involve providing users with information about the TV images they are watching (e.g. information about the actors, about items of clothing and/or furnishing, about the movies or talk shows, about the set and the like).
Claims (15)
- Method of watermarking an audio signal comprising:
  - assigning the audio signal to a class, among a plurality of classes, depending on the semantic content of the audio signal, the plurality of classes being associated with a corresponding plurality of watermark profiles;
  - obtaining the watermark profile associated with the class assigned to the audio signal;
  - embedding a watermark into the audio signal by using the obtained watermark profile so as to provide a watermarked audio signal.
- Method according to claim 1, wherein each watermark profile is associated with a corresponding class so that the trade-off among watermark imperceptibility, robustness and payload is optimized for said class, depending on the watermarking application.
- Method according to claim 1 or 2, wherein the plurality of watermark profiles relate to a single watermarking technique or to a plurality of watermarking techniques.
- Method according to any of claims 1 to 3, wherein each watermark profile is defined by a set of parameters and the watermark profiles differ from each other in the value taken by at least one parameter of said set of parameters and/or in at least one of the set of parameters.
- Method according to any of claims 1 to 4, further comprising: computing a masking threshold of the audio signal according to a psychoacoustic model.
- Method according to claim 5, wherein embedding the watermark into the audio signal comprises the step of shaping the energy of the watermark according to the computed masking threshold.
- Method according to claim 6, wherein each watermark profile is defined by a set of parameters comprising a masking threshold modulation factor F, and the energy of the watermark is shaped according to the computed masking threshold, multiplied by the masking threshold modulation factor F.
- Method according to any of claims 1 to 7, wherein assigning the audio signal to a class, depending on the semantic content of the audio signal, is performed based upon analysis of at least one audio signal feature related to time, frequency, energy or cepstrum domain of the audio signal.
- Method according to any of claims 1 to 8, further comprising a decoding process comprising: extracting the watermark from the watermarked audio signal by using the same watermark profile used for embedding the watermark into the audio signal.
- Method according to claim 9, wherein the decoding process comprises:
  - assigning the watermarked audio signal to a class, among said plurality of classes, depending on the semantic content of the watermarked audio signal;
  - obtaining the watermark profile associated with the class assigned to the audio signal; and
  - extracting the watermark from the watermarked audio signal by using the obtained watermark profile.
- Method according to claim 9, wherein the decoding process comprises: trying in sequence said plurality of watermark profiles until the watermark is successfully extracted from the watermarked audio signal.
- Method according to claim 9, wherein embedding the watermark into the audio signal comprises: embedding a second watermark into the audio signal, comprising the class assigned to the audio signal, by using a common watermark profile; and the decoding process comprises:
  - extracting the second watermark from the watermarked audio signal by using the common watermark profile so as to retrieve the class of the audio signal,
  - obtaining the watermark profile associated with the retrieved class, and
  - extracting the watermark from the watermarked audio signal with the obtained watermark profile.
- System (1) of watermarking an audio signal comprising an encoding device (10) comprising:
  - a classification unit (12) configured to assign the audio signal to a class, among a plurality of classes, depending on the semantic content of the audio signal, the plurality of classes being associated with a corresponding plurality of watermark profiles;
  - a watermark profile unit (14) configured to obtain the watermark profile associated with the class assigned to the audio signal;
  - an embedding unit (16) configured to embed a watermark into the audio signal by using the watermark profile obtained by the watermark profile unit so as to provide a watermarked audio signal.
- System (1) according to claim 13, further comprising a database (19) storing the plurality of classes associated with the corresponding plurality of watermark profiles.
- System (1) according to claim 13 or 14, further comprising a decoding device (20) configured to extract the watermark from the watermarked audio signal, by using the same watermark profile used by the embedding unit (16) for embedding the watermark into the audio signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13162596.4A EP2787503A1 (en) | 2013-04-05 | 2013-04-05 | Method and system of audio signal watermarking |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13162596.4A EP2787503A1 (en) | 2013-04-05 | 2013-04-05 | Method and system of audio signal watermarking |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2787503A1 true EP2787503A1 (en) | 2014-10-08 |
Family
ID=48045325
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP13162596.4A Withdrawn EP2787503A1 (en) | 2013-04-05 | 2013-04-05 | Method and system of audio signal watermarking |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP2787503A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104700841A (en) * | 2015-02-10 | 2015-06-10 | 浙江省广电科技股份有限公司 | Watermark embedding and detecting method based on audio content classification |
EP3109860A1 (en) * | 2015-06-26 | 2016-12-28 | Thomson Licensing | Method and apparatus for increasing the strength of phase-based watermarking of an audio signal |
CN106504757A (en) * | 2016-11-09 | 2017-03-15 | 天津大学 | A kind of adaptive audio blind watermark method based on auditory model |
CN111292756A (en) * | 2020-01-19 | 2020-06-16 | 成都嗨翻屋科技有限公司 | Compression-resistant audio silent watermark embedding and extracting method and system |
CN111968654A (en) * | 2020-08-24 | 2020-11-20 | 成都潜在人工智能科技有限公司 | Self-adaptive mixed domain audio watermark embedding method |
EP3769281A4 (en) * | 2018-03-21 | 2022-01-12 | The Nielsen Company (US), LLC | Methods and apparatus to identify signals using a low power watermark |
EP4258139A1 (en) * | 2022-04-07 | 2023-10-11 | Siemens Aktiengesellschaft | Method for preventing the theft of machine learning modules and prevention system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6456726B1 (en) * | 1999-10-26 | 2002-09-24 | Matsushita Electric Industrial Co., Ltd. | Methods and apparatus for multi-layer data hiding |
US6674861B1 (en) * | 1998-12-29 | 2004-01-06 | Kent Ridge Digital Labs | Digital audio watermarking using content-adaptive, multiple echo hopping |
US20040059918A1 (en) * | 2000-12-15 | 2004-03-25 | Changsheng Xu | Method and system of digital watermarking for compressed audio |
EP1594122A1 (en) | 2004-05-06 | 2005-11-09 | Deutsche Thomson-Brandt Gmbh | Spread spectrum watermarking |
US7127065B1 (en) * | 1999-11-23 | 2006-10-24 | Koninklijke Philips Electronics N.V. | Watermark embedding and detection |
US20100057231A1 (en) * | 2008-09-01 | 2010-03-04 | Sony Corporation | Audio watermarking apparatus and method |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104700841A (en) * | 2015-02-10 | 2015-06-10 | 浙江省广电科技股份有限公司 | Watermark embedding and detecting method based on audio content classification |
EP3109860A1 (en) * | 2015-06-26 | 2016-12-28 | Thomson Licensing | Method and apparatus for increasing the strength of phase-based watermarking of an audio signal |
US9922658B2 (en) | 2015-06-26 | 2018-03-20 | Thomson Licensing | Method and apparatus for increasing the strength of phase-based watermarking of an audio signal |
CN106504757A (en) * | 2016-11-09 | 2017-03-15 | 天津大学 | A kind of adaptive audio blind watermark method based on auditory model |
EP3769281A4 (en) * | 2018-03-21 | 2022-01-12 | The Nielsen Company (US), LLC | Methods and apparatus to identify signals using a low power watermark |
CN111292756A (en) * | 2020-01-19 | 2020-06-16 | 成都嗨翻屋科技有限公司 | Compression-resistant audio silent watermark embedding and extracting method and system |
CN111292756B (en) * | 2020-01-19 | 2023-05-26 | 成都潜在人工智能科技有限公司 | Compression-resistant audio silent watermark embedding and extracting method and system |
CN111968654A (en) * | 2020-08-24 | 2020-11-20 | 成都潜在人工智能科技有限公司 | Self-adaptive mixed domain audio watermark embedding method |
EP4258139A1 (en) * | 2022-04-07 | 2023-10-11 | Siemens Aktiengesellschaft | Method for preventing the theft of machine learning modules and prevention system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11183198B2 (en) | Multi-mode audio recognition and auxiliary data encoding and decoding | |
EP2787503A1 (en) | Method and system of audio signal watermarking | |
US10026410B2 (en) | Multi-mode audio recognition and auxiliary data encoding and decoding | |
US10236006B1 (en) | Digital watermarks adapted to compensate for time scaling, pitch shifting and mixing | |
RU2624549C2 (en) | Watermark signal generation and embedding watermark | |
CN1264137C (en) | Method for comparing audio signal by characterisation based on auditory events | |
CN1808568B (en) | Audio encoding/decoding apparatus having watermark insertion/abstraction function and method using the same | |
US9947327B2 (en) | Methods and apparatus for performing variable block length watermarking of media | |
CN100550723C (en) | Camouflage communication method based on speech recognition | |
KR20030064381A (en) | Modulating One or More Parameter of An Audio or Video Perceptual Coding System in Response to Supplemental Information | |
JP4885812B2 (en) | Music detector | |
Kanhe et al. | DCT based audio steganography in voiced and un-voiced frames | |
Doets et al. | On the comparison of audio fingerprints for extracting quality parameters of compressed audio | |
CN111199745A (en) | Advertisement identification method, equipment, media platform, terminal, server and medium | |
Attari et al. | Robust and transparent audio watermarking based on spread spectrum in wavelet domain | |
Karnjana et al. | Tampering detection in speech signals by semi-fragile watermarking based on singular-spectrum analysis | |
Xu et al. | Performance analysis of data hiding in MPEG-4 AAC audio | |
Wei et al. | Controlling bitrate steganography on AAC audio | |
Xu et al. | Content-based digital watermarking for compressed audio | |
WO2017016363A1 (en) | Method for processing digital audio signal | |
Bhatt et al. | Implementation of variable bitrate data hiding techniques on standard and proposed GSM 06.10 full rate coder and its overall comparative evaluation of performance | |
Xu et al. | Content-adaptive digital music watermarking based on music structure analysis | |
Ballesteros L et al. | On the ability of adaptation of speech signals and data hiding | |
Muhaimin et al. | An efficient audio watermark by autocorrelation methods | |
Sasaki et al. | Manipulating vocal signal in mixed music sounds using small amount of side information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed |
Effective date: 20130405 |
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the european patent |
Extension state: BA ME |
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn |
Effective date: 20150409 |