WO2001088915A1 - Adding imperceptible noise to audio and other types of signals to cause significant degradation when compressed and decompressed

Publication number
WO2001088915A1
WO2001088915A1 PCT/US2001/015328 US0115328W WO0188915A1 WO 2001088915 A1 WO2001088915 A1 WO 2001088915A1 US 0115328 W US0115328 W US 0115328W WO 0188915 A1 WO0188915 A1 WO 0188915A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
audio
audio signal
compression
compressed
Prior art date
Application number
PCT/US2001/015328
Other languages
English (en)
Inventor
Paul R. Goldberg
Vlad Fruchter
Mauricio Greene
Sergiy Bilobrov
Jason Lesperance
Dimitrij Chmounk
Original Assignee
Qdesign Usa, Inc.
Zoran Corporation
Priority date
Filing date
Publication date
Application filed by Qdesign Usa, Inc. and Zoran Corporation
Priority to AU2001261475A (AU2001261475A1)
Publication of WO2001088915A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32144 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
    • H04N1/32149 Methods relating to embedding, encoding, decoding, detection or retrieval operations
    • H04N1/32267 Methods relating to embedding, encoding, decoding, detection or retrieval operations combined with processing of the image
    • H04N1/32277 Compression
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 Audio watermarking, i.e. embedding inaudible data in the audio signal
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/00007 Time or data compression or expansion
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/00086 Circuits for prevention of unauthorised reproduction or copying, e.g. piracy
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/00086 Circuits for prevention of unauthorised reproduction or copying, e.g. piracy
    • G11B20/00884 Circuits for prevention of unauthorised reproduction or copying, e.g. piracy involving a watermark, i.e. a barely perceptible transformation of the original data which can nevertheless be recognised by an algorithm
    • G11B20/00891 Circuits for prevention of unauthorised reproduction or copying, e.g. piracy involving a watermark, i.e. a barely perceptible transformation of the original data which can nevertheless be recognised by an algorithm embedded in audio data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/66 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission
    • H04B1/665 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission using psychoacoustic properties of the ear, e.g. masking effect

Definitions

  • This invention is related to the processing, transmission and recording of signals intended for interfacing with humans, particularly music and other audio signals, and, more specifically, to techniques that prevent or discourage the unauthorized copying and/or distribution of audio or other content of such signals.
  • Psychoacoustic audio compression technologies operate by making quantized noise imperceptible to the human hearing system.
  • digital audio systems such as those used by compact disks to deliver music to consumers
  • 16 bit resolution is considered to be about the practical minimum number of bits to use to keep the quantized noise down to an acceptable level (in this case about 96dB below the maximum signal level).
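The 96 dB figure quoted above follows from the ratio between full scale and a single quantization step. A minimal sketch of that arithmetic is given here; the function name `quantization_range_db` is an illustrative assumption and does not come from the patent text.

```python
import math


def quantization_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, expressed in dB.

    Each extra bit doubles the number of steps, which adds about 6 dB.
    """
    return 20.0 * math.log10(2 ** bits)


if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(bits, "bits:", round(quantization_range_db(bits), 1), "dB")
```

Under this simple model, halving the word length from 16 to 8 bits would raise the quantizing noise by roughly 48 dB, which is why a compression algorithm has to hide the extra noise rather than merely tolerate it.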
  • the objective of an audio compression algorithm is to use as few bits as possible to represent the input audio signal. In order to use fewer bits, mechanisms need to be found to minimize the increased level of quantized noise, or make this higher level of noise indiscernible to the listener.
  • the characteristics of the human hearing process provide several opportunities to do the latter. The first is the basic threshold of hearing.
  • the cochlea is a spiral, tapering passage with the basilar membrane that is stretched, more or less, across the diameter along its length. Sound is conducted from the outer ear to the fluid in the cochlea where it travels the length of the basilar membrane.
  • Different frequency components of a sound vibrate the hair cells at different locations along the membrane, stimulating the auditory nerves.
  • the frequency dependent movement of the hair cells makes the ear act like a spectrum analyzer.
  • a high level frequency component will not only vibrate the hair cells at the location sensitive to that specific frequency, but it will also vibrate the hair cells at some of the adjacent locations as well.
  • the spreading of the response to a specific frequency over multiple hair cell sensors can and will override, or mask, the response to other lower level, nearby frequency components.
  • the ability of relatively loud sounds to mask lower level ones is usually described by sets of frequency and level-dependent "masking curves". If the quantizing noise produced by a coarse quantizer can be confined to the spectral region near to the signal component being quantized (or encoded), and if that noise is low enough to fall below the masking curve of the signal being coded, then the listener will not hear the quantized noise. That is, the amount of data that represent spectral regions near to the signal component being quantized can be reduced without it becoming noticeable to the listener.
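The masking condition described above can be sketched with a toy model. Everything numeric here is an assumption made for illustration: real masking curves are frequency and level dependent, whereas this sketch uses a fixed offset below each component and a fixed fall-off per band. The names `masking_curve` and `noise_is_masked` are hypothetical.

```python
def masking_curve(spectrum_db, slope_db_per_band=12.0, offset_db=10.0, floor_db=0.0):
    """Toy masking curve over equally spaced bands (levels in dB).

    Each component masks its neighbours at a level that starts offset_db
    below the component and falls by slope_db_per_band per band of distance.
    The curve at each band is the strongest such contribution, never lower
    than floor_db (a stand-in for the threshold of hearing).
    """
    mask = []
    for k in range(len(spectrum_db)):
        level = floor_db
        for j, s in enumerate(spectrum_db):
            level = max(level, s - offset_db - slope_db_per_band * abs(k - j))
        mask.append(level)
    return mask


def noise_is_masked(noise_db, mask_db):
    """True when the noise lies below the masking curve in every band."""
    return all(n < m for n, m in zip(noise_db, mask_db))


if __name__ == "__main__":
    spectrum = [20, 30, 80, 35, 20, 10]   # one loud component in band 2
    mask = masking_curve(spectrum)
    print(mask)
    print(noise_is_masked([40, 50, 60, 50, 40, 30], mask))
    print(noise_is_masked([50] * 6, mask))
```

In this model the loud component in band 2 hides noise in the bands around it, so a quantizer could be made coarser there without the listener noticing.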
  • Electronic Music Distribution (EMD)
  • an electronic signal that is perceptible to the senses of a human such as an audio or video signal
  • an audio or video signal is modified in a manner that is not perceptible until, after the signal is compressed and decompressed, the decompressed signal is noticeably degraded.
  • the specific embodiments and examples provided herein relate primarily to the processing of audio signals but the principles used with audio signals also apply to other types of observed signals, such as video signals.
  • An audio signal is modified in a manner that is not perceptible to the human ear until, after compression according to one of various specific compression algorithms, an uncompressed version of the compressed signal is noticeably distorted to the human ear.
  • the audio signal may be modified by an amount such that a small degradation is perceived by a limited number of trained observers but generally not noticed by ordinary listeners. It is the imperceptibility to ordinary listeners that is important, of course, not the perception of a relatively small number of audio experts. A subsequent compression and decompression of the modified signal then results in a reproduction of it that is perceived by ordinary listeners, as well as audio experts, to be significantly degraded.
  • the original audio signal is modified so that its subsequent compression and decompression changes it from one that is acceptable to almost all listeners to one that is not acceptable to those same listeners.
  • the perceptibility of the signal modifications can also be determined electronically by comparing the original and the modified signals with data of masking characteristics of the human ear that are in common use in sound signal processing, particularly as part of audio compression and decompression techniques.
  • the original audio signal is so modified, so that any such compression and decompression results in the distorted signal
  • a compressed audio signal is modified in a manner that provides a high quality signal when decompressed but which, when that decompressed signal is again compressed, its further decompression results in a noticeably distorted signal.
  • an audio signal is modified by increasing levels of its masked frequency components while still retaining those levels below the masking level of a typical human ear.
  • the resulting distortion caused by this "anti-compression" processing of the signal is thus not heard by a listener.
  • When the modified audio signal is compressed and then decompressed by algorithms of the type discussed above, the resulting sound is significantly degraded in quality. This is because the compression algorithm is operating on a different sound signal than the original one that is desired to be reproduced.
  • the masking levels are different and the reduced number of bits used to represent the spectrum are thus allocated differently. When these different bit allocations are used to reconstruct the sound signal, it does not represent the original signal.
  • the compression algorithm may need to allocate a limited number of bits to an expanded portion of the signal's spectrum, thus not representing the unmasked, audible portions with enough resolution.
  • the resulting decompressed sound signal is a significantly degraded, noisy version of the original signal and is therefore not desirable for listening.
  • In a second example of the first embodiment of the anti-compression techniques, relationships between multiple audio data channels are used.
  • the example of this embodiment employs the alteration of timing and/or phase relationships found within an audio signal with two or more channels. Alteration of these relationships in a multi-channel signal causes subsequent compression and decompression processes to incorrectly combine the multiple channel data during the data reduction process, and thus cause a degraded version of the original audio signal to be produced after the compression process is complete.
  • a third example of the first embodiment of anti-compression techniques again uses relationships between multiple audio data channels. In this case, data from one channel of a multi-channel signal is added to the data of another channel of the multi-channel signal in a manner such that the donor signal is masked by the receiver signal.
  • This data is altered in phase on a periodic or aperiodic basis and can also be altered in phase on a frequency component basis.
  • the effect is to once again cause a subsequent compression and decompression process, which attempts to combine the data in the multiple channels as a strategy to reduce data rate, to incorrectly perform this combination process and thus cause the resulting compressed signal to be degraded when decompressed.
  • a fourth example of the first anti-compression embodiment once again uses the relationships between multiple audio data channels, but in this case they are used to unmask data embedded into the original signal that are masked by the audio data prior to the compression process being performed.
  • the mechanisms employed to reduce the data rate of monophonic and multichannel signals often employ detectors which monitor input audio signals, partial results being available during the encoding process and/or included with the encoded output signal characteristics. The results of this monitoring activity are used to initiate different compression processing modes. These different modes are initiated in order to encode special case audio signals with fewer artifacts. The selection mechanisms driven by these detectors can and do make the wrong choices when encountering unanticipated changes in audio signal characteristics.
  • This fifth example of the first anti-compression embodiment takes advantage of this fact by placing phase, timing and/or amplitude discontinuities in the original signal, which are masked by the audio signal itself. These discontinuities cause the aforementioned detectors to switch to an incorrect mode with respect to the audio signal being processed, thus choosing an inappropriate processing function for the audio signal being encoded. Thus, when the encoded audio signal is decompressed, a compromised quality audio output is realized.
  • discontinuities can be monophonic in nature, in that a mode detector's confusion can be caused by discontinuities injected into only one channel of the data stream that are independently analyzed with respect to activity in other audio channels. They can also be multi-channel in nature, in that a mode detector's confusion can be caused by injected discontinuities which are analyzed in relationship to activity in one or more of the other audio channels.
  • an encode/decode compression algorithm pair is described which has the characteristic of producing compressed audio data that can be decompressed for listening, but cannot be compressed with quality for a second time, thus effectively disallowing retransmission of the audio data over the Internet.
  • a first example of this "one generation" codec with built in anti-compression processing uses the addition of noise or other data to achieve the desired unique results.
  • a second example of the second embodiment employs the generational characteristics of compression algorithms to a similar end.
  • a third example of the one generation codec embodiment of the present invention uses the fact that compression algorithms with improved generational qualities often use additional techniques to reduce bit requirements without adding quantization noise. These techniques, Huffman encoding for example, form the basis of additional methods for producing compressed audio data that can be decompressed for listening, but cannot be compressed with quality for a second time.
  • the unique concept, presented in this third example of the one generation codec, of embedding data within a compressed audio signal that is decoded by a subsequent decoding process as if it was part of the originally encoded data, and which is in a form that is compatible with the compressed audio data which comprises said compressed audio data stream, may be included as a central idea in all the examples of the second embodiment of the present invention.
  • an alteration of the timing of the processing of defined blocks of audio data is employed to create a compressed version of the original audio data that displays high quality when decompressed and listened to, but will cause following compression and decompression processes to be unable to choose the size and process timing necessary to mask transient noise added to the audio data during the initial compression process.
  • phase, timing and/or amplitude discontinuities are inserted into one or more of the channels of the encoded audio. These discontinuities are designed to be as imperceptible to the human ear as possible when they appear in the decompressed audio.
  • these discontinuities are tailored to cause the initiation of different compression processing modes in a subsequent encoding (compression) process, as described in the fifth example of the first anti-compression embodiment of this invention.
  • the incorporation of these discontinuities in the codec allows for the discontinuities to be embedded in the encoded signal at the time of encoding, or the passing of discontinuity information from the encoder to the decoder by means of carrying the additional discontinuity data along with the encoded data stream in the data structure of the encoded signal.
  • discontinuities are added to the encoded, compressed audio data itself such that the decompression decoder will pass these discontinuities into the decompressed data stream without acting upon them, and thus these discontinuities will appear in the decompressed data stream with minimal or no alteration
  • In the latter case, the mixing of the discontinuities with the decoded data stream takes place in the decoder.
  • a decoder can be constructed such that the discontinuity data is generated within the decoder, with no discontinuity information passed to the decoder from the encoder. This discontinuity information is then derived from analysis of the signal characteristics of the decoded audio signal and mixed with the decoded audio signal before it is delivered to the user as a time domain audio output.
  • a unique method of adaptively optimizing anti-compression processing of audio data is also included as part of the present invention. For example, any of the foregoing processing techniques can be adjusted as a function of characteristics of the input audio signal being processed during such processing.
  • a unique concept is included that discourages, and makes it difficult for, computer hackers to compromise the beneficial effects of the audio processing being disclosed.
  • the techniques of the present invention apply those principles to change the character of the sound signal so that it cannot be compressed without significant degradation in the quality of the signal.
  • existing compression algorithms have been designed to allow a signal to be compressed and decompressed two or more times without significant degradation of the quality of the signal that is perceptible to the human ear, termed their "generational" quality.
  • the present invention uses the principles of compression in a reverse manner, modifying a sound signal so that it will not retain its quality when compressed. This contrary use of the principles underlying compression algorithms greatly improves the ability of a music provider to control the distribution of its music. Additional features, advantages and objects of the present invention are included in the following description of its embodiments, which description should be taken in conjunction with the accompanying drawings.
  • Figure 2 is a curve representing an audio signal being processed
  • Figure 3 is an example frequency spectrum for a block of the audio signal that shows its processing according to the present invention
  • Figure 4 shows an example frequency spectrum for a block of the audio signal after it is modified by the processing of the present invention
  • Figure 5 illustrates a recording application of the present invention
  • Figure 6 illustrates an Internet music delivery application of the present invention
  • Figure 7 shows a key card for use in the delivery application of Figure 6
  • Figure 8 illustrates a one generation codec with built-in anti-compression components as part of the compression process
  • Figure 9 illustrates the application of "adaptive processing", referred to as optimization, to maximize the difference between the high quality of a processed but not compressed audio signal as compared with the reduced quality of a processed and compressed audio signal
  • Figure 10 shows a multi-channel audio compression encoding technique with which various aspects of the present invention may be used
  • Figure 11 illustrates a method of adding discontinuities to multichannel audio signals
  • Figure 12 shows example frequency and phase characteristics of two channel audio anti-compression filters of Figure 11
  • Figure 13 provides example two-channel audio signal characteristics and resulting compression algorithm encoding modes
  • Figure 14 includes waveforms before and after an example anti-compression processing according to an example of the present invention
  • Figure 15 illustrates anti-compression processing according to an example of the present invention.
  • Figure 16 is a block diagram showing a single ended one-generation encoding technique according to the present invention.
  • the block diagram of Figure 1 shows an example anti-compression signal modification system 511 of the first embodiment of the present invention, which operates to process an input audio signal 513.
  • the first three processing steps 515, 517 and 519 are substantially the same as those of a compression algorithm of the type discussed above.
  • a block of data of the signal 513 is acquired.
  • a portion 527 of the signal is shown divided into time successive blocks, such as blocks 529 and 531.
  • data representing samples of the signal 527 during a block are quantized in the step 515.
  • the signal block is then filtered in a step 517 in order to obtain floating point coefficients of the frequency spectrum of the block of data.
  • Each sampled frequency is expressed as an exponent (coarse measure) and mantissa (fine). Those values are then used by a non-linear quantizer 519 to calculate a masking function 535 (Figure 3) and compare it to the spectrum 533 of the block.
  • When used as part of a compression algorithm, the quantizer 519 also allocates a lesser number of bits than in the incoming signal 513 to represent the signal in limited frequency ranges 537 where the spectrum 533 is greater than the mask 535.
  • the remaining frequency ranges need not be included in the compressed signal, since they are below the levels indicated by the mask 535 that a human ear can hear. They can therefore be omitted, and it is this omission that allows the amount of data representing the signal to be reduced.
  • a step 521 is added that does not exist in compression algorithms. This step calculates increases that can be made to various frequency components of the incoming signal 513.
  • the block spectrum 533 and mask 535 calculated in the non-linear quantizer 519 are used in this calculation. This calculation increases the value of frequency components that are less than the mask 535, increasing the signal spectrum 533 into shaded regions 539 of Figure 3. Since, as expressed by the masking function, the human ear cannot separately resolve these frequencies, this will not be perceived to degrade the signal, so long as the spectrum 533 is not increased above the level of the mask 535. Indeed, it is preferable to maintain the spectrum 533 below the mask 535 by some margin in the regions 539 to assure that these added signal components will not be heard by the human ear.
  • Example margins are ten or twenty percent of the level of the masking function 535.
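The calculation attributed to step 521 can be sketched as a per-component headroom computation. This is an illustration only: it assumes the spectrum 533 and mask 535 are available as lists of linear magnitudes, and the function name `allowed_increase` and the sample values are hypothetical.

```python
def allowed_increase(spectrum, mask, margin=0.1):
    """Amount that may be added to each masked component (linear magnitudes).

    Components at or above the mask are audible and are left alone.
    Components below the mask may be raised up to mask * (1 - margin),
    i.e. kept below the mask by the chosen safety margin.
    """
    increases = []
    for level, m in zip(spectrum, mask):
        if level >= m:
            increases.append(0.0)
        else:
            target = m * (1.0 - margin)
            increases.append(max(0.0, target - level))
    return increases


if __name__ == "__main__":
    spectrum = [1.0, 0.2, 0.05, 0.5]
    mask = [0.4, 0.5, 0.3, 0.45]
    # A twenty percent margin, one of the example values given above.
    print(allowed_increase(spectrum, mask, margin=0.2))
```

A larger margin gives up some of the anti-compression effect in exchange for more confidence that the added components stay inaudible.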
  • the level of some frequency components of the signal 533 may be increased above the mask 535 without affecting the quality of the sound to the human ear, such as at frequencies adjacent to peak frequency levels of the spectrum.
  • This type of change to the signal 533 can also affect the ability of a decompression algorithm operating on a compressed version of the altered signal to provide a good quality decompressed signal.
  • changes to the spectrum 533 may be more modest so that the modified signal can be subject to one compression and decompression cycle without significantly degrading the quality of the incoming signal 513 but would result in serious degradation if again compressed and decompressed.
  • This partial degradation has application to the Internet, wherein the partially degraded signal is initially sent over the Internet and re-transmissions of the audio signal are discouraged when the second or more cycle of compression and decompression makes the sound undesirable.
  • the additional calculated signal is then added to the input signal 513 at 523 in order to provide a modified signal output 525.
  • An implementation of the processing of Figure 1 includes a digital signal processor that operates under controlling software to perform the functions described above.
  • the step 521 may determine in one of several ways the amount that the level of the audio signal 513 is to be increased in the step 523 over a portion or all of the frequency ranges 531.
  • One way is to generate random or pseudo-random noise that is uncorrelated with the signal 513 and add appropriate levels of such noise to the signal in the block 523.
  • Another way is to generate a defined signal, such as a sine wave or a combination of sine waves of different frequencies, that is uncorrelated with the audio signal, and then add such a signal(s) to the audio signal.
  • a further way to modify the audio signal 513 is to add an amount of signal data that is correlated to it. This last technique may be implemented by simply increasing the levels of the frequency components already in the signal that are below the masking curve 535. This preserves the original audio qualities of the initial signal because the added data is correlated with that signal. The added data is then also difficult to distinguish from the original signal when listening to the resulting output audio signal 525.
  • One way to increase the signal levels is to multiply the levels of some or all of the various frequency components of the audio signal 513 within the frequency ranges 539 by a frequency dependent factor greater than unity to increase the level of some or all of such frequencies to a level that is equal to or some defined amount below the masking function 535.
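The frequency dependent factor described above can be sketched as a gain applied to each complex frequency coefficient, which raises the magnitude while leaving the phase unchanged so that the added data stays correlated with the signal. The function name `boost_masked_bins`, the list-based representation and the margin value are assumptions made for illustration.

```python
def boost_masked_bins(coeffs, mask, margin=0.1):
    """Scale coefficients lying below the mask up toward it.

    coeffs: complex frequency coefficients for one block
    mask:   masking level (linear magnitude) for each coefficient
    Returns (new_coeffs, gains). Every gain is at least 1.0.
    """
    new_coeffs, gains = [], []
    for c, m in zip(coeffs, mask):
        magnitude = abs(c)
        target = m * (1.0 - margin)
        if 0.0 < magnitude < target:
            gain = target / magnitude   # factor greater than unity
        else:
            gain = 1.0                  # audible, already near the mask, or empty
        new_coeffs.append(c * gain)
        gains.append(gain)
    return new_coeffs, gains


if __name__ == "__main__":
    coeffs = [3 + 4j, 0.3 + 0.4j, 0j]
    boosted, gains = boost_masked_bins(coeffs, [2.0, 2.0, 2.0], margin=0.1)
    print(gains)
    print([abs(c) for c in boosted])
```

Empty bins are left untouched in this sketch because a multiplicative factor cannot raise them; filling them would require one of the additive approaches mentioned earlier.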
  • Yet another way to modify the audio signal of 513 is to add a replica of the original signal from one or more frequency bands, position shifted in time by one or more clock cycles with respect to the original audio signal, to the original audio signal.
  • the original audio qualities of the initial signal are preserved because the added data is presented in very rapid sequence with respect to the original data and is correlated with the original audio signal.
  • the added data is also difficult to distinguish from the original signal when listening to the resulting processed output audio signal 525.
  • One way to add this replicated time shifted data is to store a block of the original audio signal's frequency domain coefficients, delay this coefficient data in time, recreate a time domain representation from the frequency coefficient data, and add this delayed time domain data back to the time domain representation of the original signal.
  • Another way is to first use a narrow band filter bank in the time domain to separate the frequency components of the original signal into multiple narrow bands. Then select which frequency band or bands of the original audio data are most beneficial to replicate and delay by one or more clock cycles with respect to the original audio data, based on which one of these frequency components will require the most bits to accurately represent the original signal in a compressed version of the original signal. Then amplitude normalize these frequency components with respect to the original signal, such that their amplitude is above, equal to or below the masking curve amplitude defined by the frequency components of the original audio signal, based on the masking properties associated with each band of frequencies. Then time synchronize this frequency band data, and combine it with the original audio data.
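The core of the time shifted replica idea can be sketched in the time domain as follows. This is a deliberate simplification: it delays and scales the whole signal, and omits the narrow band filter bank, the band selection and the masking-based amplitude normalization described above. The function name `add_delayed_replica` and its parameters are hypothetical.

```python
def add_delayed_replica(samples, delay, gain):
    """Return y[n] = x[n] + gain * x[n - delay].

    delay is in sample periods (clock cycles) and must be at least 1.
    Samples before the start of the block are treated as silence.
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    return [
        s + (gain * samples[n - delay] if n >= delay else 0.0)
        for n, s in enumerate(samples)
    ]


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(add_delayed_replica(impulse, delay=2, gain=0.25))
```

In a fuller implementation the input to this function would be the output of one selected band filter, and `gain` would be chosen from the masking level for that band.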
  • FIG. 4 illustrates the effect of one specific application of the signal processing described with respect to Figures 1-3.
  • a frequency spectrum 541 is shown for a block of the output audio signal 525 in the same time interval as illustrated in Figure 3.
  • the input signal 513 has been modified by increasing the level of the spectrum 533 in all frequency ranges where it was below the mask 535 (shaded regions 539) up to the level of the mask 535. This represents the maximum increase of the input signal 513 that is desirable, and, as discussed above, is more than what is normally prudent to add.
  • the output signal 525 now has a different frequency spectrum than the input signal 513. If the output signal is then compressed by the type of algorithm discussed above, a resulting mask 543 is different.
  • the mask of a block is calculated as part of compression algorithms from the frequency spectrum of the block itself and, in some algorithms, from data of the frequency spectra of adjacent blocks occurring in time before and/or after the block represented by Figure 4.
  • the example shown in Figure 4 shows a large extent 545 of frequencies where the spectrum 541 is higher than the mask 543.
  • the compression algorithm then must allocate its limited number of bits across the frequency bands 545 which are much larger in extent of frequency than the bands 537 (Figure 3) of frequencies for the original signal 513.
  • the signal spectrum 541 (Figure 4) of the output signal 525 is much different than the spectrum 533 (Figure 3) of the input signal 513, differences being noted over ranges 547 of frequencies.
  • the increased signal has the effect of causing the signal spectrum 541 and the mask 543 calculated (at least in part) from it to follow each other more closely (curves of Figure 4 vs. those of Figure 3). This also makes the signal less compressible after the signal has been increased.
  • the result is a compressed signal calculated from the output signal 525 that is much different than one calculated from the input signal 513.
  • the output signal 525 because of the nature of the data intentionally added to the input signal 513, does not lend itself to compression if a faithful reproduction of the input signal 513 is desired upon decompression.
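The effect on bit allocation described above can be illustrated with a small sketch. The numbers are invented, and the even split of a fixed bit budget across audible bands is an assumption; real encoders weight the allocation by how far each band exceeds the mask. The function names are hypothetical.

```python
def audible_bands(spectrum, mask):
    """Indices of bands where the spectrum exceeds the mask."""
    return [k for k, (s, m) in enumerate(zip(spectrum, mask)) if s > m]


def bits_per_band(bit_budget, spectrum, mask):
    """Bits available per audible band if the budget is split evenly."""
    bands = audible_bands(spectrum, mask)
    return bit_budget / len(bands) if bands else float("inf")


if __name__ == "__main__":
    budget = 48
    # Original block: only two bands rise above the mask.
    original = bits_per_band(budget, [9, 2, 1, 8, 1, 2], [5, 5, 4, 5, 4, 4])
    # Modified block: masked components were raised, and the mask
    # recomputed from the modified spectrum now lies below every band.
    modified = bits_per_band(budget, [9, 5, 4, 8, 4, 4], [6, 4, 3, 6, 3, 3])
    print(original, modified)
```

With the same budget, the two components the listener actually hears receive a third of the resolution they had before, which is the mechanism behind the degraded reproduction.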
  • the embodiment described above transforms the complex audio signals that are input to the system into the frequency domain, and masking curves for the different signal components are computed.
  • the masking (hearing) threshold curves are compared with the spectrum of the input audio signal, and the limits on the level of quantizing noise or other added data that can be "hidden" by the audio signal input to the system are thus determined.
  • the encoder then makes decisions about the coarseness of the quantizer, or the number of bits that need to be assigned to each of the frequency components of the audio signal, in order to assure that the added quantizing noise, caused by the coarser quantizing process, is masked and thus imperceptible to the listener.
  • this information is employed to determine how much extra noise, for example, can be added to the original audio signal input to the system, before this noise can be heard by the listener.
  • the present techniques output the original signal with noise added on a frequency component by frequency component basis, the level of added noise chosen to be just low enough to be masked by adjacent frequency components in the original audio signal.
  • the audio output signal then no longer has the uniform low level noise floor of the original input audio signal. Instead it has a dynamically changing, program dependent noise floor.
  • If this digital audio signal is converted into its analog audio presentation and listened to, the added noise will properly be masked by the adjacent higher level frequency components in the signal, and thus not heard. If, however, this processed signal is fed into a compression encoder/decoder process for Internet distribution, the additional quantizing noise caused by this following audio compression/decompression process will add to the noise injected into the audio signal by the techniques described above. The resulting audio signal will then contain a total noise which is over the masking curve limit, and thus the noise will be perceptible to the listener. These noise artifacts will make the compressed audio signal unsuitable for distribution over the Internet, which is an objective of the present invention. It should be noted that the injected "noise" can have a wide range of characteristics. These characteristics are chosen to be most annoying to the listener in the event the noise is made perceptible by a follow-on compression process.
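The way the two noise contributions combine can be illustrated numerically. The following sketch assumes that the injected noise and the quantizing noise of the later compression process are uncorrelated, so that their powers add, and that each sits 1 dB below the masking threshold in a given band. The levels and function names are hypothetical.

```python
import math


def db_to_power(level_db):
    """Convert a level in dB to a linear power value."""
    return 10.0 ** (level_db / 10.0)


def power_to_db(power):
    """Convert a linear power value to a level in dB."""
    return 10.0 * math.log10(power)


def total_noise_db(*noise_levels_db):
    """Combine uncorrelated noise sources by summing their powers."""
    return power_to_db(sum(db_to_power(n) for n in noise_levels_db))


mask_db = 40.0            # hypothetical masking threshold in one band
injected_noise_db = 39.0  # noise added by the present technique
codec_noise_db = 39.0     # assumed quantizing noise of a later encoder

before_db = total_noise_db(injected_noise_db)
after_db = total_noise_db(injected_noise_db, codec_noise_db)
print(before_db < mask_db, after_db > mask_db)
```

Each contribution alone stays under the threshold, but two equal contributions sum to about 3 dB above either one, which places the total over the assumed mask.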
  • timing and/or phase relationships between two channels (a stereo pair) of an audio signal composed of two or more channels are modified.
  • This modification can be a fixed phase or timing change, or a phase or timing change that varies over time.
  • the modified phase or timing relationship can be different for each audio frequency encountered in the original audio signal.
  • This technique is designed to work best with "Intensity" stereo or "Coupled" multi-channel compression processes. Intensity stereo and coupled compression processes are well known in the art. These methods combine input audio data from two or more channels above a predefined frequency, and retain only the intensity of the total energy appearing in each frequency band above this predefined frequency.
  • the intensity envelope of the total energy is encoded on a frequency by frequency basis, and the amplitude of the signal in each channel is retained.
  • This channel amplitude information is delivered separately in the encoded bit stream to the decoder, so that the decoder can parcel the monophonic intensity envelope to each channel based on the original amplitude of the signal that appeared in any particular channel.
  • a simple implementation of the above concept calls for advancing or retarding the phase of one channel with respect to the other by a predetermined number of degrees, for example 180 degrees, of all frequencies above a predetermined frequency. 1500 Hz has proven to be a good frequency to choose for this purpose.
  • This process produces an audio signal which sounds identical to the original stereo audio signal, but will be degraded by a subsequent compression process which employs intensity stereo techniques.
  • the resulting intensity stereo compressed and decompressed audio signal sounds very much as if it is emanating from an underwater source because of the amplitude variations introduced in the audio program material by complete or partial phase cancellation as described above.
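The cancellation described above can be sketched in a few lines. The sketch represents each channel as a mapping from frequency in Hz to a complex amplitude and assumes program material that is identical in both channels; the component values and function names are hypothetical, and the channel summation is a simplification of what an intensity stereo encoder does.

```python
def invert_above(spectrum, cutoff_hz):
    """Shift every component above cutoff_hz by 180 degrees."""
    return {f: (-a if f > cutoff_hz else a) for f, a in spectrum.items()}


def combine_channels(left, right):
    """Sum two channels per frequency, as a coupling stage would before
    retaining only the combined energy."""
    freqs = sorted(set(left) | set(right))
    return {f: left.get(f, 0) + right.get(f, 0) for f in freqs}


# Hypothetical centre-panned content: the same components in both channels.
left = {500: 1.0 + 0j, 3000: 0.5 + 0j, 8000: 0.25 + 0j}
right = dict(left)

processed_right = invert_above(right, 1500)

plain_mix = combine_channels(left, right)
processed_mix = combine_channels(left, processed_right)

for f in plain_mix:
    print(f, abs(plain_mix[f]), abs(processed_mix[f]))
```

Each channel keeps its original component magnitudes, so the processed pair is unchanged in level when played as stereo, while the combined signal loses the components above the chosen frequency.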
  • a similar effect can be produced if, instead of introducing 180 degree phase inversion above a predefined frequency, one of the two channels of the stereo audio pair being processed is advanced or retarded in time with respect to the other channel. This can be implemented in the digital domain by advancing or retarding one of these two channels with respect to the other channel by 1 or more bits.
  • a more advanced version of the above concept calls for modulating the timing and/or phase of a particular frequency or frequencies. For example, a rate below or above the lowest or highest frequency the human ear can detect can be employed. Such a rate could be 1 Hz.
  • the modulation would be imposed on one or more frequency components present in one channel of a stereo channel pair as compared to the other channel of the stereo channel pair.
  • This phase modulation will not significantly affect the processed original stereo audio data, but, when the processed data is compressed and decompressed by the use of an intensity stereo compression algorithm, causes an audio output whose amplitude varies in time and is quite degraded. This degradation is caused by the varying phase cancellation of the data which is common to both channels.
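As an illustration of the varying cancellation, the sketch below sums two unit-amplitude components whose relative phase is swept at 1 Hz. The raised-cosine modulation law, the modulation depth of 180 degrees and the function names are assumptions made for the example only.

```python
import cmath
import math


def summed_amplitude(phase_offset_rad):
    """Magnitude of the sum of two unit components with a phase offset."""
    return abs(1 + cmath.exp(1j * phase_offset_rad))


def modulated_phase(t_seconds, rate_hz=1.0, depth_rad=math.pi):
    """Phase offset swept between 0 and depth_rad at rate_hz (assumed law)."""
    return 0.5 * depth_rad * (1 - math.cos(2 * math.pi * rate_hz * t_seconds))


# Sample the combined amplitude every 1/8 second over one modulation period.
envelope = [summed_amplitude(modulated_phase(n / 8)) for n in range(9)]
print([round(a, 3) for a in envelope])
```

The component in each channel keeps unit amplitude throughout, but the combined amplitude swings between full level and complete cancellation once per second.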
  • In a third example of the first embodiment of anti-compression, relationships between two or more audio data channels are again used to create an audio signal that will cause a compression and decompression process, which attempts to combine data in multiple channels as a strategy to reduce data rate, to incorrectly perform this combination process during encode and thus cause the resulting decoded signal to be degraded when decompressed. In this technique, data from one channel of a stereo pair of a multi-channel signal is reversed in phase and added, in the frequency domain, to data in the other channel of the stereo pair. For clarity of discussion we will call one of these channels the "right" or "R" channel and the other channel the "left" or "L" channel.
  • Any two channels of a multi-channel audio signal, that is, an audio signal with three or more channels, can be designated for the purposes herein as the "R" and "L" channels.
  • the use of "R" and "L" nomenclature refers to a two channel stereo music source solely to aid in visualizing the concept, but there is no intent to limit this technique to such a source. Care is taken to insert this cross-channel data in a manner such that the donor channel signal data is masked after insertion into the receiver channel and does not significantly affect the quality of the resulting pre-compressed audio signal.
  • the added L to R cross-signal can be reversed in phase on a periodic or aperiodic basis.
  • the reversed phase L signal can be periodically or aperiodically inserted and not inserted into the R channel.
  • Additional anti-compression effects can be realized by reversing the phase of only some of the frequency components of the L signal that is added to the R signal. For example, the phase of every second or third frequency bin of the L signal can be reversed before the L signal is inserted into the R channel. Note that although this discussion has referred to the addition of L data in the R channel, this is for example purposes only. The technique is equally valid for the insertion of R data into the L channel.
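A minimal sketch of the cross-channel insertion with selective phase reversal follows. It assumes the two channels are already available as lists of frequency bin values and that a fixed insertion gain keeps the inserted data masked; no masking calculation is performed here, and the gain, bin values and function name are hypothetical.

```python
def cross_insert(left_bins, right_bins, gain, invert_every=2):
    """Add scaled L data into the R channel, reversing the phase of every
    invert_every-th L bin before insertion."""
    out = []
    for k, (l, r) in enumerate(zip(left_bins, right_bins)):
        sign = -1 if k % invert_every == 0 else 1
        out.append(r + sign * gain * l)
    return out


# Hypothetical bin values; a gain of 0.1 is assumed to be low enough
# for the inserted data to remain masked in the R channel.
left_bins = [1.0, 1.0, 1.0, 1.0]
right_bins = [1.0, 1.0, 1.0, 1.0]

processed_right = cross_insert(left_bins, right_bins, gain=0.1)
print(processed_right)
```

The same function applies to the insertion of R data into the L channel by swapping the two arguments.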
  • a fourth method of modifying audio signal 513 once again uses the relationships between multiple audio data channels. In this case spurious data which is masked by the original audio signal is embedded into each channel of the original audio signal. This data is caused to be "unmasked" when the audio signal is compressed.
  • This approach is to first alter or totally reverse the phase of one channel of a stereo audio signal with respect to its other channel.
  • This alteration in phase which could be either fixed, varying in time, or applied periodically or aperiodically, could be implemented on frequencies which lie above a predetermined frequency, over a range of frequencies, or over one or more bands of frequencies.
  • the spurious data is then added in phase into both channels. By choosing the spurious data such that it is below the masking threshold of the original audio signal, the spurious data will be inaudible when this now processed audio signal is reproduced for listening.
  • a modification of the above strategy is to add spurious data, at a selected frequency or frequencies, continuously, periodically or aperiodically, to one channel of a stereo audio signal, phase shift this added data by 180 degrees, and add it to the second channel of the stereo audio signal.
  • the intensity and frequency components of this added signal energy would be chosen to be below the masking threshold set by the audio data in each channel. Being 180 degrees out of phase the spurious data added to the two channels would additionally tend to cancel when reproduced either in free air, through speakers or through headphones, and thus be virtually inaudible to the listener.
  • When the audio processed in this manner is encoded with a compression algorithm that sums the absolute values of one or more of the frequency components in each channel of said two channel audio signal in order to reduce the data rate requirements of the compressed signal, the absolute values of the embedded spurious signals in each channel will constructively add and the embedded spurious signals will become audible to the listener.
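The difference between the acoustic sum and a sum of absolute values can be illustrated as follows. The sketch treats each channel as a short list of frequency component values; the program and spurious values and the function names are hypothetical.

```python
def embed_antiphase(left, right, spurious):
    """Add spurious data to one channel and its 180 degree shifted copy
    to the other channel."""
    processed_left = [l + s for l, s in zip(left, spurious)]
    processed_right = [r - s for r, s in zip(right, spurious)]
    return processed_left, processed_right


def acoustic_sum(left, right):
    """Signed sum, as occurs when both channels are reproduced together."""
    return [l + r for l, r in zip(left, right)]


def magnitude_sum(left, right):
    """Sum of absolute values, as in the compression strategy described."""
    return [abs(l) + abs(r) for l, r in zip(left, right)]


# Hypothetical component values for the program and the spurious data.
left = [1.0, 0.5]
right = [1.0, 0.5]
spurious = [0.125, 0.0625]

proc_left, proc_right = embed_antiphase(left, right, spurious)

# Reproduced together, the spurious contributions cancel.
heard = acoustic_sum(proc_left, proc_right)

# Summed by absolute value, the spurious contributions alone double.
coded_spurious = magnitude_sum(spurious, [-s for s in spurious])
print(heard, coded_spurious)
```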
  • a fifth example of the first anti-compression embodiment takes advantage of compression strategies that detect characteristics of input and in-process audio data.
  • Audio data compression mechanisms that use different signal processing modes are employed by both monophonic and multichannel encoders.
  • Two examples of such audio compression strategies are "Middle/Side" or "M/S" stereo encoding, sometimes referred to as "Sum/Difference" stereo encoding, for compressing two channel audio signals, and "window switching", which is used for monophonic as well as multi-channel audio data compression.
  • the selection mechanisms driven by these detectors can and do make the wrong choices when encountering unanticipated changes in audio signal characteristics. When this occurs an incorrect set of processing functions is employed to encode the incoming audio signal and the resulting encoded output signal does not accurately reflect the properties of the input signal.
  • the present example of the first anti-compression embodiment takes advantage of this fact by inserting discontinuities into the original signal which cause the encoder to switch to an incorrect mode with respect to the audio data being processed. These discontinuities can be phase, timing, frequency, amplitude or other signal discontinuities.
  • discontinuities can be monophonic in nature.
  • the mode detector's false analysis is prompted by discontinuities in a single channel of the audio data stream, without regard to activity in other channels of the audio data stream. They can also be multi-channel in nature. In this case the mode detector's confusion is caused by discontinuities which are analyzed in relationship to activity in one or more of the other audio data channels.
  • this example five of the first embodiment of anti-compression includes the unique concept of adding and removing the aforementioned discontinuities on a temporal basis in order to cause a compression encoder to switch between one or more inappropriate and one or more appropriate encoder modes throughout the portions of the audio which are so processed.
  • FIG. 10 is an illustrative embodiment of a M/S stereo encoder.
  • Perceptual Model Processor 679 evaluates thresholds for the left and right channels. The two thresholds are then compared on a frequency subband basis. For example, the Right and Left input signals 669 and 671 respectively, could have been divided into 32 coder frequency bands. In each band, where the two thresholds vary between Right and Left by less than some amount, typically 2 dB, but not necessarily 2 dB, perceptual encoder 673 is switched into the M/S mode by the action of line 681 becoming a "1". In the M/S mode perceptual encoder 673 uses M and S as its source data instead of R and L. That is, the Right signal for that band of frequencies is replaced by the sum of the
  • encoded outputs 675 and 683 are derived from M/S data not R/L data. The actual amount of threshold difference that triggers this substitution will vary with bit rate constraints and other signal system parameters.
  • DIFFERENCE channels mode M/S. This decision is based on the assumption that human binaural perception is a function of the output of the same critical bands at the two ears. If the signals are such that they generate a stereo image, then the choice of R/L coding is more appropriate. If the signals are similar then additional coding gains, that is either a maintaining of encoded audio quality at a lower data rate or the improvement of audio quality at the same data rate, may be exploited by choosing the M/S coding mode. A convenient way to detect the similarity of the two channels being encoded is by comparing the monophonic threshold between Right and Left channels. If the thresholds in a particular band do not differ by more than a predefined value, then the M/S coding mode is chosen.
  • This mode is chosen because this situation most often occurs when the amplitude of the frequency components, which comprise both signals, are very similar. Otherwise the independent mode R/L is assumed.
  • associated with each band is a one bit flag that specifies the coding mode of that band and that flag must be transmitted to the decoder as side chain information.
  • the coding mode decision is adaptive in time since for the same band it may differ for subsequent segments, and is also adaptive in frequency since for the same segment, the coding mode for subsequent bands may be different. An illustration of a coding decision is given in Figure 13.
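The per-band mode decision can be sketched as below. The 2 dB limit follows the typical value given above; the threshold values, the function names and the scaling of the sum and difference signals by one half are assumptions made for the example.

```python
def choose_modes(left_thresholds_db, right_thresholds_db, max_diff_db=2.0):
    """Select M/S for each band whose Left and Right thresholds differ by
    less than max_diff_db, and R/L otherwise."""
    return [
        "M/S" if abs(l - r) < max_diff_db else "R/L"
        for l, r in zip(left_thresholds_db, right_thresholds_db)
    ]


def mode_flags(modes):
    """One bit per band for the decoder: 1 for M/S, 0 for R/L."""
    return [1 if m == "M/S" else 0 for m in modes]


def mid_side(left, right):
    """Sum and difference signals (the 1/2 scaling is an assumption)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


# Hypothetical thresholds (dB) for four coder bands of one segment.
left_thr = [30.0, 35.0, 40.0, 50.0]
right_thr = [31.0, 34.5, 45.0, 50.0]

modes = choose_modes(left_thr, right_thr)
flags = mode_flags(modes)
print(modes, flags)
```

Calling `choose_modes` once per segment gives the adaptivity in time, and the per-band list it returns gives the adaptivity in frequency.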
  • The MPEG 1 Layer 3 (MP3) Version 1.0 audio compression encoder developed by Fraunhofer Gesellschaft IIS, which is used in the Opticom "MP3 Producer" Version 2.1 application, is an example of an audio compression encoder which employs M/S stereo techniques as described above.
  • the Fraunhofer MP3 audio encoder determines whether it should use the R/L or M/S mode on a frame by frame basis and will switch into M/S mode when the averages of the monophonic thresholds of the Right and Left channel subbands do not differ by more than a predefined value.
  • Although the Fraunhofer MP3 encoder evaluates and performs a threshold comparison, the effect, as seen in the external behavior of the encoder, is that the encoder will assume the M/S mode when the average energy in the frequency components of the R channel is almost equal to the average energy in the frequency components of the L channel. When the average energies of the frequency components in the R and L channels differ by more than a certain amount, then the encoder will go into the R/L mode. When the difference between the average energies of the frequency components in the R and L channels varies around this predefined level, the Fraunhofer MP3 encoder can become confused and toggle between the M/S and R/L modes. This uncertainty is exploited in this fifth example of the first anti-compression embodiment.
  • Figure 11 is a block diagram of an implementation of the fifth example of the first anti-compression embodiment. It depicts the addition of phase and amplitude discontinuities to a stereo audio signal. As will be shown, these discontinuities cause the MP3 encoder, which follows the anti-compression processor depicted, to be uncertain as to the choice of M/S or R/L mode. This results in switching between these modes during the process of encoding the stereo audio signal. As shown in Figure 11, which depicts anti-compression processor 627, Right channel input signal 629 and Left Channel input signal 631 are divided into low and high pass signals by passing them through respective filters 633, 635, 637 and 639.
  • The resulting signals are Right channel high pass signal 715, Right channel low pass signal 717, Left channel high pass signal 719 and Left channel low pass signal 721.
  • Left channel high pass signal 719 is further processed by the 180 degree phase inverter 655 and added to the Left channel low pass signal 721 in mixer 643.
  • This 180 degree phase inversion is not included in the processing chain for Right channel high pass signal 717 which is added to Right channel low pass signal 715 in mixer 641.
  • Low pass filter block 633, high pass filter block 635, high pass filter block 637 and low pass filter block 639 serve to add phase and amplitude discontinuities around a predefined frequency.
  • this frequency has been chosen to be approximately 1600 Hz.
  • 1600Hz has been chosen for illustrative purposes only and could have been chosen to be any frequency above or below 1600Hz. How effective the chosen frequency will be depends on the audio signals being processed.
  • the phase and amplitude characteristics of these filter blocks are shown in Figure 12. Of course, the exact characteristics of these discontinuities will be dependent on the filter characteristics chosen and how the falling slopes of the low pass filters and the rising slopes of the high pass filters are related.
  • the falling slopes of low pass filters 633 and 639 and the rising slopes of high pass filters 635 and 637 have been chosen to be quite sharp, about 60 dB per octave, and their cross over point 659 has been chosen to be -6 dB from the flat portion of the filters' frequency response.
  • This selection of filter characteristics is for a specific example only. Other filter characteristics can alternatively be chosen. However, this set of characteristics will cause the frequency spectrum discontinuities injected into the Right and Left signals to assume minimum audibility in the uncompressed Right and Left stereo signal. They also can cause the M/S-R/L selection determination in the subsequent MP3 encoder process to be uncertain.
  • low pass filter falling slope 657 causes an amplitude dip in both the Right and Left Channels that begins at about 1500 Hz, before the high pass filter rising slope 661 has an opportunity to compensate for this loss in signal energy.
  • Figure 12 depicts rapidly changing non-linear phase responses 665 and 669 which culminate at an inflection point 667. This inflection point occurs at approximately 1600 Hz.
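The band splitting and single-channel inversion of Figure 11 can be sketched in simplified form. The sketch substitutes a first-order low pass filter and its complement for the sharp 60 dB per octave filters described above, omits the cross channel mixing network, and uses hypothetical test tones and function names; it is intended only to show how the sum of the two processed channels loses its high pass content.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low pass filter (a stand-in for filters 633 and 639)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def split_bands(samples, cutoff_hz, sample_rate):
    """Return (low, high) where high is the complement of the low band."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    high = [x - l for x, l in zip(samples, low)]
    return low, high


def anti_compress(left, right, cutoff_hz=1600.0, sample_rate=44100):
    """Invert the Left high band only, then recombine each channel."""
    l_low, l_high = split_bands(left, cutoff_hz, sample_rate)
    r_low, r_high = split_bands(right, cutoff_hz, sample_rate)
    left_out = [lo - hi for lo, hi in zip(l_low, l_high)]   # inverted high band
    right_out = [lo + hi for lo, hi in zip(r_low, r_high)]  # no inversion
    return left_out, right_out


SAMPLE_RATE = 44100
# Hypothetical program: a 200 Hz and a 6000 Hz tone, identical in both channels.
program = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 6000 * n / SAMPLE_RATE)
    for n in range(512)
]

left_out, right_out = anti_compress(program, program)
low_band, _ = split_bands(program, 1600.0, SAMPLE_RATE)
mid = [(l + r) / 2.0 for l, r in zip(left_out, right_out)]
```

For identical channel inputs, `mid` equals the low band alone, which is the condition under which an encoder operating on the sum signal receives degraded high frequency data.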
  • If the average thresholds of the Right and Left Channels of a musical selection, which is to undergo Anti-Compression processing, are either solidly within the predetermined threshold difference band defined by a subsequent MP3 encoding process, or are substantially outside this difference band, the addition of the above described transients may be insufficient to cause the MP3 M/S - R/L analysis and detection mechanism to become confused and switch between M/S and R/L modes. If the Right and Left average thresholds are within this difference band, the MP3 encoder would remain in the M/S mode. If they are substantially outside this difference band, the MP3 encoder would continuously assume the R/L mode.
  • a narrow threshold band be maintained between the channels in order to add Anti-Compression characteristics to the input audio signal, using the example Anti-Compression processing scheme.
  • This situation is resolved by the cross channel mixing processing network composed of circuit blocks 647, 645, 649, 653, 651, and 723 of Figure 11.
  • this network is adjusted such that the difference between the average thresholds of the Right and Left channels is forced to reside in the range of M/S - R/L switch uncertainty, where the MP3 encoder will switch between the two modes if the thresholds of the music vary.
  • An audio signal 757 is input to a Combiner 753 and a Psychoacoustic Analyzer 761.
  • the Psychoacoustic Analyzer 761 determines the acoustic elements that comprise input audio signal 757, in terms of both spectral components and the timing of these spectral components, and inputs this data, which appears on line 765, to a Degradation Generator 763, a Forcing Function Generator 791 and a Masking Function Generator 803.
  • the Degradation Function Generator 763, Forcing Function Generator 791 and Masking Function Generator 803 all employ the data on line 765 to create signals 755, 751 and 801, respectively, that are combined with the original audio signal in the Combiner 753.
  • a degradation function Input 755 is created such that it is minimally audible in the Anti-Compressed audio output appearing on line 759, but, following a compression process, is perceptible in the decompressed version of this signal.
  • a Forcing function Input 751 is also created such that it is minimally audible in the Anti-Compressed audio output appearing on line 759, but in this case the objective is to force audio compression encoding processes, which subsequently act on the Anti-Compressed audio output 759, to employ encoding techniques or parameters during the encoding process that are inappropriate for the proper encoding of the Anti-Compressed audio output 759.
  • Masking Function Input 801 serves the purpose of reducing the audibility and/or increasing the acceptability of the additional signals added to the input audio data stream by the Forcing Function and/or Degradation Function generators. Note that the Forcing function 751 is also input to the Degradation Generator 763 and the Masking Function Generator 803.
  • Forcing function 751 also provides timing information to Degradation Generator 763 and Masking Function Generator 803.
  • This allows the Degradation Function 755 and the Masking Function 801 to be inserted in the Anti-Compressed signal 759 at the time or times during which they will be most effective in causing the desired effect. In the case of the Degradation Function 755 this time or times are chosen to cause the Degradation Function to be audible after a compression-decompression cycle and non-offensive in the Anti-Compressed (ACTed) output signal 759. In the case of the Masking Function 801, this time or times are chosen to reduce the audibility of the Degradation Function and/or the Forcing Function in ACTed Audio Output 759.
  • Masking Function could be perceivable by a human listener, listening to an audio reproduction of the ACTed Audio Output 759, and still be acceptable. This case would occur if the Masking Function added to 759 is chosen to complement the artistry of the music signal appearing on 759. Such would be the case if the Masking Function was chosen to be, for example, a synthesized or naturally occurring trumpet sound that contained frequency components of the appropriate amplitude to mask the audibility of the inserted Degradation and/or Forcing Functions, and said Masking Function was inserted into an appropriate musical passage.
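The signal flow of the generalized process can be sketched as a single function. The generators are passed in as callables because their internals are not specified here; the placeholder generators in the usage lines are purely hypothetical and serve only to show how the analysis data and the forcing function are routed.

```python
def generalized_anti_compress(audio, analyze, make_forcing,
                              make_degradation, make_masking):
    """Route analysis data to the three generators and sum their outputs
    with the original audio, following the flow described above."""
    analysis = analyze(audio)                           # data on line 765
    forcing = make_forcing(analysis)                    # forcing function 751
    degradation = make_degradation(analysis, forcing)   # degradation function 755
    masking = make_masking(analysis, forcing)           # masking function 801
    return [
        a + f + d + m                                   # Combiner 753, output 759
        for a, f, d, m in zip(audio, forcing, degradation, masking)
    ]


# Placeholder generators (hypothetical): the degradation data is placed
# wherever the forcing function is active, reflecting its timing role.
audio = [0.0, 1.0, 0.5, -0.5]
output = generalized_anti_compress(
    audio,
    analyze=lambda a: max(abs(s) for s in a),
    make_forcing=lambda info: [0.0, 0.0, 0.01 * info, 0.0],
    make_degradation=lambda info, forcing: [2 * f for f in forcing],
    make_masking=lambda info, forcing: [0.0] * len(forcing),
)
print(output)
```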
  • the processing elements defined in the generalized Anti-Compression process depicted in Figure 15 are often encountered as compound elements that perform one or more of the Anti-Compression processing functions.
  • forcing function 751 produced by Forcing Function generator 791 of Figure 15 is created by the actions of the Low Pass Filters 633 and 639 and the High Pass Filters 635 and 637.
  • These elements add the temporal and spectral discontinuities that are desirable to cause a subsequent MP3 encoding process to switch between M/S and R/L modes. Thus they provide the forcing function required to cause audio compression encoder mode uncertainty.
  • the Degradation Generator function 763 of Figure 15 is provided by the Inverter 655 of Figure 11.
  • Since in the M/S mode the MP3 encoder provides the majority of the bits to the M signal, and the M signal has been degraded above 1600 Hz, the resulting decoded M and S signals will provide R and L signals that do not display the same high frequency characteristics as the original Anti-Compressed R and L signals appearing on lines 775 and 779 of Figure 11.
  • the Inverter 655 serves the same purpose as the Degradation Generator 763 of Figure 15.
  • the function of the Combiner 753 of Figure 15 is provided by adders 641, 643, 645, and 723 of Figure 11. The only functions provided for in Figure 15 and not present in Figure 11 are those of the Psychoacoustic Analyzer 761 and the Masking Function generator 803.
  • One important application of the signal modification system 511 depicted in Figure 1 is illustrated in Figure 5.
  • a Compact Disc ("CD") is assembled as a digital file, indicated by a block 551.
  • that file is processed by one or more of the techniques described above to add signal data to the audio signals of the file before making a CD master recording 553 from it.
  • the content of the resulting replica CDs that are sold to consumers cannot then be compressed without a significant loss of quality of the content signals when decompressed.
  • the same techniques can also be used when storing or distributing audio content by other means such as with audio tape, as a component of a Digital Video Disc ("DVD"), or as the digital or analog sound track on a motion picture release print. Since such compression is currently required before the audio content can be stored or distributed in several ways, such as storing in nonvolatile semiconductor memory cards or transmission over the Internet or other communications network, unauthorized copying and distribution of the content is thus greatly discouraged. The degraded music or other audio content is of little value.
  • the block diagram of Figure 6 illustrates a use of the present invention in the distribution of music or other audio content over the Internet in a manner that greatly discourages copying and re-distribution of the content by the recipient over the Internet.
  • a master audio source file 555 is compressed, as indicated by a block 557, and then encoded, as indicated by a block 559, in order to provide a secure transmission that can be decoded only by the intended recipient.
  • the compressed and encoded digital signal is then transmitted over the Internet 561 to the intended recipient who, in the normal case, has paid the content provider for it.
  • the recipient must then decode the incoming signal, as indicated by a block 565, by use of a key or other accepted technique, and then decompress it, as indicated by a block 567.
  • the master audio source file 555 is available to the recipient in a decoded and decompressed form that can easily be distributed to others over the Internet by a recipient who is willing to violate the copyright of the content provider. But since such unauthorized distribution is practical only if the content file is first again compressed by the recipient, noise or other data is added to the decoded and decompressed content file by the recipient's audio player or other utilization device, as indicated by a block 569. The recipient can, however, reproduce the audio content without degradation after the audio signal has been modified.
  • the content in the form of an analog or pulse code modulated ("PCM") signal, for example, is applied to standard audio circuits 571 that drive a loud speaker or head phones.
  • Such a signal addition in the recipient's utilization device is made effective when the recipient has no effective choice but to receive an output of the content from his or her utilization device after the audio signal has been modified.
  • the signal modification is preferably performed in a physically sealed module 115' that also includes the decoding function 565.
  • a key necessary for decoding the signal is included within the module in a manner that renders it inaccessible to the recipient. Since the content provider can make it a condition of supplying the music or other content that the recipient use such a sealed module to decode the transmitted encoded content, the added security against the recipient being able to easily redistribute the audio content is conveniently included in the same sealed module.
  • a decoded digital signal of the content is not available except within the sealed module 115'.
  • An input to that module is an encoded signal which the recipient cannot decode except with use of the module.
  • An output of the module 115' presents the content in a standard format, such as an analog or PCM signal, which could normally be re-digitized or otherwise manipulated by the recipient for unauthorized redistribution. But since such redistribution normally requires that the signal be compressed prior to doing so, the noise or other data that is added to the output signal by the processing step 569 makes that highly undesirable or even impossible.
  • the sealed module 115' is a variation of the module 115 described in the aforementioned Secure Transmission Patent Application, with a specific version shown in Figure 7 hereof, where the reference numbers are the same as used in the
  • the module 115' is preferably implemented in the form of a small key card that is made personal to a particular user by storing decryption (decoding) key(s) in its memory 147' that are unique to the user.
  • the key card is removably inserted into the user's audio player when connected to the Internet, a kiosk in a music store, or other content providing device, in order to purchase content from a provider with use of the user's key(s) stored within the card.
  • the key card is also inserted into the recipient's player, as well as others, in order to allow the received content to be played by the recipient while restricting the extent to which the content can be transferred to or played by others.
  • Second Embodiment: Allowing One Compression and Decompression of an Audio Signal
  • Figure 8 shows a second embodiment of the present invention.
  • an encode/decode compression algorithm pair is described which has the characteristic of producing compressed audio data that can be decompressed for listening, but cannot be compressed with quality for a second time, thus effectively disallowing retransmission of the audio data over the Internet.
  • a compression algorithm with this characteristic is called a "one generation" algorithm.
  • the use of a one generation algorithm serves as an alternative to including anti-compression signal modification in the recipient's player, as described with respect to Figures 6 and 7.
  • an audio source file 577 is compressed with an available algorithm, as indicated by a block 579, and some noise or other data for the same purpose is added, as shown by a block 581.
  • the amount that the audio signal is increased by 581 is below that which significantly affects the quality of the content when decompressed by the user. But it is sufficient to cause the quality of the content signal to be significantly degraded if the decompressed signal is again compressed with the type of algorithm described previously.
  • the block 581 can be combined with the block 579 to form a single stage compression algorithm which provides a compressed audio output with anti-compression signal components added.
  • a "calculate signal increases" block such as block 521 of Figure 1
  • an “adder” block such as block 525 of Figure 1
  • a second approach applicable to the one generation codec embodiment described above employs the fact that compression algorithms inherently add quantization noise to the original signal during the compression process itself. As previously described, this is due to the fact that individual frequency components of the signal are more coarsely digitized in an effort to reduce the number of bits used to describe the signal. This leads to "generation loss" when "cascading" compression processes. When compression algorithms are cascaded, that is, a signal is compressed, then decompressed and then compressed and decompressed once again, the resulting signal is naturally noisier than the original signal.
  • the second embodiment of the present invention can take advantage of the mechanisms that produce generational loss, by employing those techniques that inherently modify the signal.
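Generation loss can be illustrated with a simple noise budget. The sketch assumes that every compression and decompression pass adds the same amount of uncorrelated quantization noise, so that noise power grows in proportion to the number of passes; the 30 dB starting figure and the function name are hypothetical.

```python
import math


def snr_after_generations(single_pass_snr_db, generations):
    """Signal-to-noise ratio after several cascaded passes, assuming each
    pass adds equal, uncorrelated quantization noise power."""
    return single_pass_snr_db - 10.0 * math.log10(generations)


# Hypothetical single-pass signal-to-noise ratio of 30 dB in one band.
for passes in (1, 2, 4):
    print(passes, round(snr_after_generations(30.0, passes), 2))
```

Under this assumption each doubling of the number of passes costs about 3 dB, so a signal whose noise is already close to the masking threshold after one pass is pushed over it by a second.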
  • a third approach to implement the second embodiment of the present invention uses the fact that compression algorithms with improved generational qualities often use additional techniques to reduce bit requirements without adding quantization noise. These techniques can provide the basis for further one generation functionality methods. For example, some algorithms, such as the Dolby AC-3 compression algorithm, employ a technique called Huffman encoding in addition to reduced quantization resolution on a frequency band by frequency band basis. Huffman encoding uses the elimination of redundancies in the audio signal over time to reduce data requirements. It decreases the number of bits needed to describe an audio signal by first encoding the audio signal using complete information and then only using differences in this information to describe the audio signal over a defined sequential time interval.
  • Compression algorithms using such a technique have better generational characteristics than those that do not because they can use finer frequency band quantization and still maintain the desired compression ratio. They suffer, however, from having reduced audio data time resolution.
  • the underlying assumption that significant changes in input audio signal characteristics will not take place over the time window used by the Huffman encoding process can be used by the one generation compression process.
  • One example of such use is the addition by a one generation audio compression process of short duration audio data or noise bursts to its output audio data stream. It is well known in the art that as an audio data sample is reduced in duration it must be of greater amplitude to be perceived by the listener when in the presence of competing sounds.
  • an 8 kHz tone with a duration of 1 millisecond, beginning 2 milliseconds after the initiation of 60 dB of Uniform Masking noise, must be 33 dB greater in amplitude than an 8 kHz tone with a duration of 20 milliseconds, beginning 2 milliseconds after the initiation of 60 dB of Uniform Masking noise, to be perceived by the human ear.
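As a worked illustration of the figures quoted above, a 33 dB level difference corresponds to a linear amplitude ratio of 10^(33/20). The short sketch below computes that ratio and the number of samples spanned by the 1 millisecond and 20 millisecond tones. The 44.1 kHz sampling rate and the function names are assumptions made for illustration, not values from the specification.

```python
SAMPLE_RATE_HZ = 44_100  # assumed sampling rate, not given in the text


def db_to_amplitude_ratio(level_db):
    """Convert a level difference in decibels to a linear amplitude ratio."""
    return 10 ** (level_db / 20)


def duration_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Whole number of samples that fit in a tone of `duration_ms` milliseconds."""
    return (duration_ms * sample_rate_hz) // 1000


ratio = db_to_amplitude_ratio(33)            # roughly 44.7 times the amplitude
short_tone_samples = duration_to_samples(1)  # 44 samples at 44.1 kHz
long_tone_samples = duration_to_samples(20)  # 882 samples at 44.1 kHz
print(f"33 dB is an amplitude ratio of about {ratio:.1f}")
```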
  • Audio data samples which occur randomly in time, or at chosen predetermined time intervals, and are short enough in time duration will therefore not be easily sensed by the listener, but will be detected by an audio compression process attempting to compress the audio signal.
  • Using some of the specific techniques described above, as exemplified in Figures 3 and 4, will further hide the randomly added audio samples from a listener. If this audio compression process employs Huffman encoding, these pulses will asynchronously occur at the time the Huffman encoding process is preparing the data which is used as the reference for subsequent audio difference samples, and cause these subsequent samples to incorrectly represent the audio being compressed.
  • the Huffman encoding window is 30 milliseconds. This means that the output compressed audio will be corrupted for 30 milliseconds each time the Huffman reference information is spuriously altered by these embedded short audio noise bursts. This corruption will represent a significant degradation of the decompressed audio signal.
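The effect of the 30 millisecond window can be sketched as simple bookkeeping. The model below is an assumption, not the behavior of any particular codec: it treats the encoder as using fixed, non-overlapping 30 millisecond windows aligned at time zero, and counts a window as fully corrupted if at least one burst lands in it. The burst positions are hypothetical.

```python
WINDOW_MS = 30  # reference window length quoted in the text


def corrupted_windows(burst_times_ms, window_ms=WINDOW_MS):
    """Indices of the fixed-length windows that contain at least one burst."""
    return sorted({t // window_ms for t in burst_times_ms})


def corrupted_duration_ms(burst_times_ms, window_ms=WINDOW_MS):
    """Total audio time affected, assuming each hit window is fully corrupted."""
    return len(corrupted_windows(burst_times_ms, window_ms)) * window_ms


# Hypothetical burst positions in milliseconds; two fall in the same window.
bursts = [5, 12, 95, 400]
print(corrupted_windows(bursts))
print(corrupted_duration_ms(bursts))
```

Under these assumptions, bursts that share a window add no extra damage, so spreading bursts across windows maximizes the corrupted duration for a given number of bursts.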
  • the addition of embedded short noise bursts can be used to anti-compress an audio signal that has not been previously compressed. Any compressed and subsequently decompressed version of an audio signal that has been anti-compressed in this manner will thereby be degraded as compared to the original audio signal.
  • these bursts will be decoded by a subsequent MP3 decoder as if they were part of the original signal. Since, as previously described, these noise bursts were masked by the original signal, the presence of these noise bursts in the decoded version of this encoded audio stream will be difficult to detect.
  • a fourth approach applicable to the one generation algorithm of the second embodiment of the current invention shown in Figure 8 uses a different method of accomplishing similar ends. It employs the concept of temporal unmasking. As described above, a usual compression encoding algorithm operates on successive, uniform blocks 529, 531 etc. of digital samples of the signal 527 ( Figure 2). If these blocks are not uniform, information defining the timing and number of bytes of data associated with each of these blocks of digital samples must be sent along with the compressed data for use by the compression decoding algorithm in order to reconstruct a replica of the signal 527. It is the alteration of this block timing and block size that can constitute the noise or data added by block 581 in the embodiment of Figure 8, either alone or in combination with some level of spectral alteration.
  • each successive block of audio data includes 256 new time samples as well as the previous 256 time samples.
  • This block of 512 overlapping samples is windowed and the data in this window, which moves in time, is transformed into 256 unique frequency coefficients.
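The block framing described above, 256 new samples joined to the previous 256 and then windowed, can be sketched as follows. Only the framing and windowing are shown; the transform into 256 frequency coefficients is omitted. The sine window, the silent history before the first block, and the function names are assumptions of this sketch rather than details from the specification.

```python
import math

HOP = 256          # new time samples per block, as stated in the text
BLOCK = 2 * HOP    # 512 samples: previous 256 plus 256 new


def overlapping_blocks(samples, hop=HOP):
    """Yield 2*hop-sample blocks, each reusing the previous hop samples.

    The first block is preceded by silence (an assumption of this sketch);
    trailing samples that do not fill a whole hop are ignored.
    """
    previous = [0.0] * hop
    for start in range(0, len(samples) - hop + 1, hop):
        current = list(samples[start:start + hop])
        yield previous + current
        previous = current


def sine_window(length=BLOCK):
    """Sine window, a common choice for 50% overlapped transforms (assumed)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


signal = [float(n) for n in range(1024)]
blocks = list(overlapping_blocks(signal))
window = sine_window()
windowed = [[w * s for w, s in zip(window, block)] for block in blocks]
```

With this window, the squared weights of the two blocks covering any given sample sum to one, which is what lets overlapped blocks be recombined without gain ripple.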
  • the input signals are analyzed with a high frequency bandpass filter, to detect the presence of transients. This information is used to adjust the block size of the data transformed, restricting quantization noise associated with the transient to within a small temporal region about the transient, avoiding temporal unmasking.
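A minimal sketch of transient-driven block switching is given below. It is illustrative only: the first difference of the samples is used as a crude stand-in for the high frequency bandpass filter mentioned above, and the short block length and the energy threshold are hypothetical values.

```python
import math

LONG_BLOCK = 512    # regular block length used elsewhere in the text
SHORT_BLOCK = 128   # hypothetical shorter length chosen around transients
THRESHOLD = 0.5     # hypothetical high-frequency energy threshold


def high_frequency_energy(block):
    """Energy of the first difference, a crude stand-in for a high-pass filter."""
    return sum((b - a) ** 2 for a, b in zip(block, block[1:]))


def choose_block_size(block, threshold=THRESHOLD):
    """Pick the short block when the block contains a sharp transient."""
    if high_frequency_energy(block) > threshold:
        return SHORT_BLOCK
    return LONG_BLOCK


# A slow sine has almost no high-frequency energy; a single click has plenty.
steady = [math.sin(2 * math.pi * n / LONG_BLOCK) for n in range(LONG_BLOCK)]
click = [0.0] * LONG_BLOCK
click[200] = 1.0

print(choose_block_size(steady))
print(choose_block_size(click))
```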
  • the method under consideration utilizes the fact that the changing data block size and/or windowing time position, occurring on compression encode, must be transmitted to the decompression decoder in order to accurately decompress the encoded audio signal.
  • One method of doing this is through the use of side chain information, although other methods, which embed this information into the compressed audio data stream itself, may be employed.
  • This permits the decoder to accurately synchronize the decode operation with the varying encoded data block size and assure the same block size is employed for decode as was used for encode, thus avoiding temporal unmasking.
  • the present method takes advantage of the fact that this additional side chain information is not included in the decompressed audio data stream and is thus not available to subsequent compression processes.
  • the present method calls for the one generation compression algorithm under consideration to place transient noise or data at locations in the audio data stream being compressed which is synchronized with the sample block size and sample block timing used during the process of transforming the audio data stream data from the time to the frequency domain.
  • This transient extraneous data is tailored such that the audio data present in the audio signal being compressed, which occurs immediately before and immediately after the transient, masks the audibility of these transients, so they will not be perceptible to the listener when the audio signal is decompressed.
  • the one generation compression algorithm under consideration uses a varying sample block size during the process of transforming the data from the time to the frequency domain.
  • phase, timing and/or amplitude discontinuities are inserted into one or more of the channels of the encoded audio. These discontinuities are designed to be as imperceptible to the human ear as possible when they appear in the decompressed audio.
  • these discontinuities are tailored to cause the initiation of different compression processing modes in a subsequent encoding process, as described in the fifth example of the first anti-compression embodiment of this invention.
  • the incorporation of these discontinuities in the codec allows for the discontinuities to be embedded in the encoded signal at the time of encoding, or the passing of discontinuity information from the encoder to the decoder by means of carrying the additional discontinuity data along with the encoded data stream in the data structure of the encoded signal.
  • discontinuities are embedded into the encoded signal at the time of compression encoding
  • encoded discontinuities are added to the encoded, compressed audio data itself, such that the decompression decoder will pass these discontinuities into the decompressed data stream without acting upon them, other than to decode them and convert them from the frequency domain to the time domain. They will therefore appear in the decompressed data stream with minimal or no alteration and be difficult to perceive in the decoded data stream.
  • this decoded data stream is again compressed and subsequently decompressed, these discontinuities cause this second decoded data stream version to be degraded, as previously described, compared to the audio signal that was first encoded.
  • Figure 16 depicts an implementation of this unique One Generation encoder approach.
  • a Right audio input channel 821 and a Left audio input channel 823 are simultaneously inputted into the ACT processing scheme beginning with a Psychoacoustic analyzer block 761 and ending with a Combiner block 753, and the audio compression encoding scheme beginning with a Buffer block 825 and ending with a Bit Stream Composing and Buffering block 829.
  • the ACT processing scheme depicted in Figure 16 is the same method previously described and depicted in Figure 15 of the present patent specification.
  • the audio compression encoding scheme depicted in Figure 16 is fully described in the previously mentioned United States Patent 5,285,498, of James D Johnston.
  • ACT Data Signal 827 is equivalent to ACTed Audio output 759 of Figure 15 hereof, less the PCM Audio Input 757.
  • the ACTed Audio Output is composed of a Forcing Function 751 combined with a Masking Function 801, a Degradation Function 755 and a PCM Audio Input 757.
  • 827 represents the ACT signal derived from the aforementioned Anti-Compression signal components before they are combined with the input signal which is undergoing Anti-Compression processing.
  • the ACT Data Signal 827 is then input to an Encoder and Formatter block 817 to be converted into the frequency domain and formatted such that it can be combined in Combiner blocks 831 and 833 with the transform coded and quantized version of the input audio signals appearing on lines 835 and 837.
  • the combined encoded audio and Anti-Compression elements are then passed through Huffman Coding block 839 to losslessly remove redundant information. Note that the addition of Anti-Compression data elements, that appear on lines 815 and 813, to the encoded audio signal components that appear on lines 835 and 837, will, in general, increase the data rate of the encoded signal.
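The remark that adding Anti-Compression data elements will, in general, increase the data rate can be illustrated with a small Huffman cost calculation. This is a sketch under stated assumptions: the coefficient values and the `act_elements` array are invented for illustration, the combination is modeled as a plain element-wise sum, and only the payload bits are counted, not the cost of transmitting the code table.

```python
import heapq
from collections import Counter


def huffman_payload_bits(symbols):
    """Total bits needed to Huffman-code `symbols`, excluding the code table."""
    counts = list(Counter(symbols).values())
    if len(counts) == 1:
        return counts[0]  # a lone symbol still needs one bit per occurrence
    heapq.heapify(counts)
    total_bits = 0
    while len(counts) > 1:
        merged = heapq.heappop(counts) + heapq.heappop(counts)
        total_bits += merged  # each merge adds one bit to every symbol below it
        heapq.heappush(counts, merged)
    return total_bits


# Hypothetical quantized transform coefficients for one block of one channel.
quantized_audio = [0] * 12 + [1] * 4
# Hypothetical anti-compression elements, already in the same quantized domain.
act_elements = [0] * 8 + [2, 2, 3, 3] + [0] * 4

combined = [q + a for q, a in zip(quantized_audio, act_elements)]

bits_audio_only = huffman_payload_bits(quantized_audio)
bits_combined = huffman_payload_bits(combined)
print(bits_audio_only, bits_combined)
```

In this example the added elements introduce new symbol values into an otherwise highly redundant block, so the entropy coder has less redundancy to remove and the payload grows.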
  • the resulting encoded compressed audio signal is now in a form that can be decoded and decompressed by any appropriate decoder using techniques which are well known in the art.
  • the decoded signal produced by these decoders will be unique in that the decoded audio output delivered will contain Anti-Compression elements that disallow a subsequent compression and decompression process from delivering a high quality audio experience.
  • the "single ended" one generation codec approach described above, a technique that does all anti-compression processing of the input audio signal during the encoding of the compressed audio data stream without using the decompression decoder as part of the process, is a unique concept.
  • this methodology allows the establishment of an installed base of players and customers before One Generation encoders and One Generation compressed audio content are generally available. For example, if one were to choose to make an MP3 compatible One Generation encoder, there would be an established base of hundreds of millions of One Generation MP3 players in the field at the present time, each player capable of producing anti-compressed audio signals from One Generation MP3 encoded content.
  • the decoding and mixing of the discontinuities with the decoded data stream takes place in the decoder. This has the benefit of permitting the original, unprocessed encoded data stream to be recovered, if this should be desired, but requires that the discontinuity information be hidden in the encoded data structure so it cannot be removed before it is added to the decoded audio data.
  • a decoder can be constructed such that the discontinuity data is generated as part of, or as a separate process from, the decoder, using the principles illustrated in Figure 15, with the PCM Audio input 757 being the PCM decoded output of the decompression decoder.
  • no discontinuity information is passed to the decoder from the encoder.
  • the discontinuity information would be derived from analysis of the signal characteristics of the decoded audio signal and combined with the decoded audio signal before it is delivered to the user as a time domain audio output.
  • This one-generation approach provides compressed audio data that can be stored and distributed in any of a number of ways.
  • the distribution of such audio data in a form for use with individual portable audio players is mentioned above.
  • the players contain the software necessary to decompress the data.
  • the media storing the compressed data can be any one of commercially available media, such as non-volatile semiconductor memory in the player itself or in removable cards, small rotating magnetic disk drives and small optical disks.
  • security techniques be applied to restrict access to such compressed data in order to prevent it from being distributed in its compressed form.
  • An audio signal decompressed from a copy of the compressed data file will have a high quality. Security techniques, such as those described in the Secure Transmission Patent Applications referenced above, are therefore desirably applied. Another application is with the sound track of motion picture films.
  • Sound is commonly recorded in a compressed form. Movies are often video taped during an opening theater showing of them by a member of the audience. The video tape is then used to make copies of the film that are then distributed illegally. In order to obtain a good quality sound signal, an infrared audio signal transmission that is available in many theaters for use by people who are hard of hearing is intercepted and used. This uncompressed sound signal is then recompressed for recordation on the copies. If the sound track of the film has been compressed with one of the techniques described above, however, the audio signal decompressed from the illegal copies will have an unacceptable quality.
  • the block diagram of Figure 9 depicts anti-compression method 619 which can be used alone to add anti-compression characteristics to uncompressed audio signals or as part of a one generation audio compression codec 619 that operates on two channel stereo audio signals and tunes anti-compression processing as a function of input signal characteristics.
  • only blocks 583, 585, 587, 589 and 593 of 619 would be required because the additional blocks shown, 611, 603, 601, 599, 597 and 595, are for second channel relationship analysis and second channel anti-compression processing.
  • elements of method 619 are replicated to accommodate the processing and relationship analysis required by the additional channels.
  • the method 619 assumes the use of a sub-band based process, so no prior block quantizing step is shown.
  • a sub-band based process uses narrow band time domain filters to continuously partition the input audio signal into its critical frequency bands. The input audio signal is therefore not transformed into its frequency domain representation and thus no block quantizing step is required.
  • the frequency component activity analysis derived by blocks 583 and 603, which corresponds to block spectrum 533 of system 511, is used by blocks 585 and 601 respectively to calculate the masking functions associated with each of the two stereo channels as well as to derive, for example, temporal audio activity, audio signal dynamic range, and audio signal baseline offset.
  • this analysis data is used by spurious signal generator blocks 587 and 599 respectively, often in conjunction with data from signal relationship block 611, to create spurious signals, which are combined with the input stereo signals 617 and 605 by adder blocks 593 and 595 and output on lines 591 and 621 as anti-compressed treated signals. It is also used by signal modification blocks 589 and 597, also often in conjunction with data from block 611, to alter, but not add to, the signals output on 591 and 621.
  • time related masking curve information from blocks 585 and 601 can be employed by blocks 587 and 599 to create noise bursts inserted into the output audio signals 591 and 621 that are optimized in both timing and in frequency characteristics, so as to maximally confuse audio compression codecs employing Huffman encoding techniques, as previously described, but which are masked by the audio signal frequency components present so they are minimally audible to the listener.
  • the frequency and phase relationships between the input audio signals appearing on lines 617 and 605, which are derived by the actions of block 611, can be used by audio signal modification blocks 589 and 597 to adaptively shift the relative phase of frequency elements common to both output signals 591 and 621, so as to cause audio compression codecs employing joint stereo encoding techniques to be optimally confused, as previously described, and produce degraded results.
  • signal relationship data from block 611 can be used by blocks 587 and 599 to add out of phase extraneous signals into each of the output channels, through the use of blocks
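The inter-channel phase manipulation described above can be illustrated with mid/side stereo coding, used here as an assumed stand-in for the joint stereo encoding techniques the text mentions. When both channels carry the same component in phase, the side signal S = (L - R) / 2 is zero and costs almost nothing to encode; shifting the phase of that component in one channel makes the side signal non-zero. The 30 degree shift and the tone parameters are hypothetical.

```python
import math


def side_energy(left, right):
    """Energy of the side signal S = (L - R) / 2 used in mid/side stereo coding."""
    return sum(((a - b) / 2) ** 2 for a, b in zip(left, right))


def tone(num_samples, cycles, phase=0.0):
    """Sine tone with `cycles` periods over `num_samples` samples."""
    return [math.sin(2 * math.pi * cycles * n / num_samples + phase)
            for n in range(num_samples)]


N = 512
left = tone(N, 8)
right_identical = tone(N, 8)
right_shifted = tone(N, 8, phase=math.pi / 6)  # hypothetical 30 degree shift

energy_identical = side_energy(left, right_identical)  # exactly zero
energy_shifted = side_energy(left, right_shifted)      # clearly non-zero
print(energy_identical, energy_shifted)
```

A joint stereo encoder that relies on the channels being nearly identical would have to spend additional bits on the side signal after such a shift, or accept a coarser representation.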
  • each of multiple incoming audio signals is modified according to a common algorithm.
  • the algorithm can be changed by a content provider for subsequent audio signal processing. This would then make it necessary for the hacker to determine the new algorithm each time it is changed.
  • many different algorithms can be alternately used by content providers in order to make the task of removing the modifications from the signal even more difficult.
  • This notion can be taken one step further by using a different algorithm on different parts of the same song or other audio content.
  • it will allow a single song to be tailored to the characteristics of multiple audio compression technologies and thus prevent this processed song from being compressed with quality by a large number of different compression encoder algorithms.
  • the perceptibility of the processing techniques can be measured by electronic means.
  • the anti-compressed processed signal is first passed through a series of bandpass filters in order to decompose this signal into the frequency components that comprise the processed audio signal.
  • the input audio signal is also passed through a series of bandpass filters in order to decompose this signal into the frequency components that comprise the input audio signal.
  • the unprocessed signal is subtracted from the anti-compressed processed signal to obtain the frequency components added to the input audio signal that comprise the added anti-compression signal.
  • the added anti-compression signal is then compared, by use of a spectrum analyzer, with well known human hearing masking curves, which are used in all perceptual compression encoders, to determine the audibility of the applied anti-compression signal as it appears in the anti-compressed version of the original audio signal.
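The measurement steps above (subtract the unprocessed signal, decompose the difference by frequency, compare against masking levels) can be sketched as follows. This is an illustration under assumptions: single DFT bins stand in for the bandpass filters and spectrum analyzer, and the threshold values are hypothetical placeholders rather than real human hearing masking curves.

```python
import math


def band_level_db(signal, bin_index):
    """Level in dB re full scale of one DFT bin (direct, single-bin DFT)."""
    n_samples = len(signal)
    real = sum(s * math.cos(2 * math.pi * bin_index * n / n_samples)
               for n, s in enumerate(signal))
    imag = sum(s * math.sin(2 * math.pi * bin_index * n / n_samples)
               for n, s in enumerate(signal))
    amplitude = 2 * math.hypot(real, imag) / n_samples
    return 20 * math.log10(max(amplitude, 1e-12))  # floor avoids log10(0)


def audible_bands(original, processed, thresholds_db):
    """Return the bins where the added signal exceeds its masking threshold."""
    added = [p - x for p, x in zip(processed, original)]
    return [k for k, limit in thresholds_db.items()
            if band_level_db(added, k) > limit]


N = 64
original = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
added_tone = [0.001 * math.sin(2 * math.pi * 10 * n / N) for n in range(N)]
processed = [x + a for x, a in zip(original, added_tone)]

# Hypothetical per-bin masking thresholds in dB, not real psychoacoustic data.
masked = audible_bands(original, processed, {10: -50.0})
unmasked = audible_bands(original, processed, {10: -70.0})
print(masked, unmasked)
```

Here the added component sits at about -60 dB, so it is reported as inaudible against a -50 dB threshold and as audible against a -70 dB threshold.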
  • the effect of the processing in the examples of the second embodiment described above can also be measured by electronic techniques.
  • the effect is a measure of anti-compression processing on a decompressed audio signal derived from an input audio signal that has undergone anti-compression processing and a compression encoding step. Discontinuities in the decompressed audio data stream are analyzed, where the decompressed audio data stream is derived from an input audio signal that has undergone anti-compression processing and a compression encoding step.
  • the compressed audio data stream is frequency decomposed by using a series of bandpass filters. The average energy is measured, on a frequency bin basis, of the decompressed audio data stream under test.
  • the deviations from these average energy values are then measured at the times at which anti-compression elements were added to the input, uncompressed, audio data stream. These energy variations are then electronically compared, on a frequency bin basis, with well known human masking curves, by means of an audio spectrum analyzer, to determine a measure of the audibility of the anti-compression signal included in the output decompressed signal.
  • the techniques of the second embodiment described above for compressing audio data can also be used when compressing the video data.
  • the compression and decompression algorithms are necessarily different, their characteristics are similar to those used with sound.
  • a decompressed video signal, such as one obtained from a DVD disc, cannot be satisfactorily copied and again compressed since the decompressed video signal will have high levels of noise and distortion that make the video unpleasant for a viewer to watch. This is especially the case when the video image repeatedly switches between a reasonably good image and a very poor image, or between two levels of poor images.
  • the invention is particularly suitable for use with signals that are interfaced with humans, such as audio, particularly music, and video signals, since the poor quality of unauthorized copies will not be tolerated by humans.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

In order to discourage the compression of data contained in signals intended to be interfaced with humans, such as those having audio content, particularly music, and consequently to discourage the unauthorized reproduction and distribution of this type of content, particularly via the Internet, the data signals are modified in a manner that is not normally perceptible by humans when the signal is reproduced but that causes significant perceptible degradation of the signal if it is subsequently compressed and decompressed. In one embodiment, an audio signal is directly modified in a manner that causes significant degradation of the signal if it is compressed and subsequently decompressed. In another embodiment, a compressed version of an audio signal is modified, as part of a signal compression process, in a manner that yields good signal quality when the signal results from a subsequent decompression but that significantly and perceptibly degrades the signal if it is again compressed and decompressed.
PCT/US2001/015328 2000-05-15 2001-05-11 Adding imperceptible noise to audio and other types of signals to cause significant degradation when compressed and decompressed WO2001088915A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001261475A AU2001261475A1 (en) 2000-05-15 2001-05-11 Adding imperceptible noise to audio and other types of signals to cause significant degradation when compressed and decompressed

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US57065500A 2000-05-15 2000-05-15
US09/570,655 2000-05-15
US66734500A 2000-09-22 2000-09-22
US09/667,345 2000-09-22

Publications (1)

Publication Number Publication Date
WO2001088915A1 true WO2001088915A1 (fr) 2001-11-22

Family

ID=27075368

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/015328 WO2001088915A1 (fr) 2000-05-15 2001-05-11 Adding imperceptible noise to audio and other types of signals to cause significant degradation when compressed and decompressed

Country Status (2)

Country Link
AU (1) AU2001261475A1 (fr)
WO (1) WO2001088915A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2372416A (en) * 2000-10-19 2002-08-21 Stom C & C Inc Method of preventing reduction of record sales due to digital music files being illegally distributed through a communication network
US7280689B2 (en) 2002-07-05 2007-10-09 Qdesign U.S.A., Inc. Anti-compression techniques for visual images
US8019201B2 (en) 2002-06-28 2011-09-13 Dcs Copy Protection Limited Method and apparatus for providing a copy-protected video signal
US8032006B2 (en) 2003-06-05 2011-10-04 Dcs Copy Protection Limited Digital processing disruption systems
US8160423B2 (en) 2004-10-13 2012-04-17 Dcs Copy Protection Limited Audio copy protection system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07161140A (ja) * 1993-12-07 1995-06-23 Sony Corp ディジタルオーディオ信号の伝送装置及び受信装置、並びにディジタルオーディオ信号の伝送方法及び受信方法
EP0831596A2 (fr) * 1996-09-20 1998-03-25 Deutsche Thomson-Brandt Gmbh Procédé et dispositif de circuit pour le codage et décodage de signaux audio
EP0889470A2 (fr) * 1997-07-03 1999-01-07 AT&T Corp. Dégradation de la qualité par compression/décompression
EP0947953A2 (fr) * 1998-03-30 1999-10-06 Seiko Epson Corporation Filligramme pour détecter des effractions dans des images
JP2000182320A (ja) * 1998-12-17 2000-06-30 Victor Co Of Japan Ltd 圧縮符号化防止方法

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BONEY L ET AL: "Digital Watermarks for Audio Signals", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, LOS ALAMITOS, CA, US, 17 June 1996 (1996-06-17), pages 473 - 480, XP002160496 *
PATENT ABSTRACTS OF JAPAN vol. 1995, no. 09 31 October 1995 (1995-10-31) *
PATENT ABSTRACTS OF JAPAN vol. 2000, no. 09 13 October 2000 (2000-10-13) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2372416A (en) * 2000-10-19 2002-08-21 Stom C & C Inc Method of preventing reduction of record sales due to digital music files being illegally distributed through a communication network
US8019201B2 (en) 2002-06-28 2011-09-13 Dcs Copy Protection Limited Method and apparatus for providing a copy-protected video signal
US9264657B2 (en) 2002-06-28 2016-02-16 Dcs Copy Protection Limited Method and apparatus for providing a copy-protected video signal
US7280689B2 (en) 2002-07-05 2007-10-09 Qdesign U.S.A., Inc. Anti-compression techniques for visual images
US8032006B2 (en) 2003-06-05 2011-10-04 Dcs Copy Protection Limited Digital processing disruption systems
US8837909B2 (en) 2003-06-05 2014-09-16 Dcs Copy Protection Ltd. Digital processing disruption systems
US8160423B2 (en) 2004-10-13 2012-04-17 Dcs Copy Protection Limited Audio copy protection system
US8639092B2 (en) 2004-10-13 2014-01-28 Dcs Copy Protection Limited Audio copy protection system

Also Published As

Publication number Publication date
AU2001261475A1 (en) 2001-11-26

Similar Documents

Publication Publication Date Title
US20020009000A1 (en) Adding imperceptible noise to audio and other types of signals to cause significant degradation when compressed and decompressed
CA2418722C (fr) Modulation d'un ou plusieurs parametres d'un systeme de codage perceptuel audio ou video en reponse a des informations supplementaires
Noll MPEG digital audio coding
US6879652B1 (en) Method for encoding an input signal
AU774862B2 (en) Scalable coding method for high quality audio
EP0873614B1 (fr) Transport de donnees cachees apres compression
US7372375B2 (en) Signal reproducing method and device, signal recording method and device, and code sequence generating method and device
JP5678020B2 (ja) オーディオストリームの段階的な適応型スクランブル
JPH02183468A (ja) デジタル信号記録装置
Xiang et al. Digital audio watermarking: fundamentals, techniques and challenges
US7702404B2 (en) Digital audio processing
WO2001088915A1 (fr) Adding imperceptible noise to audio and other types of signals to cause significant degradation when compressed and decompressed
JP4207109B2 (ja) データ変換方法およびデータ変換装置、データ再生方法、データ復元方法、並びにプログラム
EP1554877A2 (fr) Desembrouillage adaptatif et progressif de flux audio
EP1431961B1 (fr) Transport de données cachées après compression
Xu et al. Digital Audio Watermarking
EP1582022A2 (fr) SYSTEME D’EMBROUILLAGE SECURISE DE FLUX AUDIO
JP2003308099A (ja) データ変換方法およびデータ変換装置、データ復元方法およびデータ復元装置、データフォーマット、記録媒体、並びにプログラム
Xu et al. Audio watermarking
Jayant Digital audio communications
JP2003308013A (ja) データ変換方法およびデータ変換装置、データ復元方法およびデータ復元装置、データフォーマット、記録媒体、並びにプログラム

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP