EP2240928B1 - Device and method for calculating a fingerprint of an audio signal, device and method for synchronizing and device and method for characterizing a test audio signal - Google Patents
- Publication number
- EP2240928B1 (application EP09710004A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio signal
- fingerprint
- value
- sequence
- bit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Definitions
- the present invention relates to the fingerprint technology for audio signals, and more particularly to the calculation of a fingerprint, the use of a fingerprint for synchronizing multi-channel extension data with an audio signal, and the characterization of an audio signal with the fingerprint.
- BCC: Binaural Cue Coding
- S. Disch, C. Ertel, J. Hilpert, A. Hoelzer, K. Linzmeier, C. Spenger, P. Kroon: "Spatial Audio Coding: Next-Generation Efficient and Compatible Coding of Multi-Channel Audio", 117th AES Convention, San Francisco 2004, Preprint 6186, is referenced.
- Such methods, in a sequential transmission system such as broadcast or the Internet, separate the audio program to be transmitted into audio base data, i.e. the audio signal, which may be a mono or stereo downmix audio signal, and extension data, also referred to as multichannel overhead information or multichannel extension data.
- the multi-channel extension data can be broadcast together with the audio signal, i.e. combined, or the multi-channel extension data can be broadcast separately from the audio signal.
- the multichannel extension data can also be transmitted separately to a version of the downmix channel that is already available to the user, for example.
- the transmission of the audio signal, for example in the form of an Internet download or the purchase of a compact disc or DVD, takes place spatially and temporally separated from the transmission of the multi-channel extension data, which can be supplied, for example, by a multi-channel extension data server.
- the separation of a multi-channel audio signal into an audio signal and multi-channel extension data has the following advantages.
- a "classical" receiver is at any time able to receive and reproduce the audio base data, i.e. the audio signal, independently of the content and version of the multi-channel additional information. This property is called backward compatibility.
- a newer-generation receiver may evaluate the transmitted multichannel overhead data and combine it with the audio base data, i.e. the audio signal, so that the user can be provided with the full extension, i.e. the multi-channel sound.
- the previously broadcast stereo audio signal can be extended to the multi-channel format 5.1 by a small additional transmission effort.
- the multichannel format 5.1 has five playback channels, i.e. a left channel L, a right channel R, a center channel C, a left rear channel LS (left surround) and a right rear channel RS (right surround).
- on the sender side, the program provider generates the multi-channel additional information from multi-channel sound sources, such as those found on DVD-Audio/Video. Subsequently, this multichannel additional information can be transmitted in parallel to the previously emitted audio stereo signal, which now contains a stereo downmix of the multichannel signal.
- An advantage of this method is the compatibility with the previously existing digital broadcasting system.
- a classical receiver, which cannot evaluate this additional information, will be able to receive and reproduce the two-channel signal without any qualitative restrictions.
- a receiver of a newer design can, in addition to the previously received stereo sound signal, evaluate and decode the multichannel information and reconstruct the original 5.1 multichannel signal therefrom.
- the first solution is to combine the multichannel overhead information with the encoded downmix audio signal so that it can be appended to the data stream generated by an audio encoder as a suitable and compatible extension.
- the receiver then sees only one (valid) audio data stream and can extract the multi-channel additional information from it, via a corresponding upstream data distributor and synchronously with the associated audio data block, decode it, and output it as 5.1 multi-channel sound.
- This solution requires the extension of the existing infrastructure / data paths, so that instead of just the stereo audio signals, as before, they can now transport the data signals consisting of downmix signals and extension. This is possible without additional effort or problems if, for example, the downmix signals are transmitted in a data-reduced representation, i.e. as a bit stream. A field for the extension information can then be inserted into this bit stream.
- a second conceivable solution is not to couple the multi-channel additional information to the audio coding system used.
- the multichannel extension data will not be injected into the actual audio stream.
- the transmission takes place via a separate, but not necessarily synchronized, additional channel, which may be, for example, a parallel digital additional channel.
- Such a situation occurs, for example, when the downmix data, i.e. the audio signal, is routed in unreduced form, e.g. as PCM data in the AES/EBU data format, through the audio distribution infrastructure commonly present in studios.
- These infrastructures are designed to digitally distribute audio signals between various sources ("crossbars") and/or to process them, for example by means of tone control, dynamic compression, etc.
- the problem of skew of the downmix audio signal and multi-channel overhead information in the receiver may occur because both signals go through different, non-synchronized data paths.
- a time lag between downmix signal and additional information leads to a deterioration of the sound quality of the reconstructed multi-channel signal, since on the playback side an audio signal is then processed with multi-channel extension data that actually belong not to the current audio signal, but to an earlier or later section or block of the audio signal.
- Another example of this situation is when an already running 2-channel transmission system is to be extended to multi-channel transmission, e.g. in the case of a receiver for digital radio.
- the downmix signal is decoded by means of an audio decoder already present in the receiver, for example a stereo audio decoder according to the MPEG-4 standard.
- the delay time of this audio decoder is not always known or cannot always be predicted with certainty, due to the system-inherent data compression of audio signals. Therefore, the delay time of such an audio decoder cannot be reliably compensated.
- the audio signal may even reach the multi-channel audio decoder via a transmission chain containing analog parts.
- a digital / analog conversion is carried out at a point in the transmission, which is again followed by an analog / digital conversion after further storage / transmission.
- no clues are initially available as to how a proper delay equalization of the downmix signal relative to the multichannel overhead data can be performed. If the sampling frequencies of the analog/digital conversion and the digital/analog conversion differ slightly, there is even a slow time drift of the necessary compensation delay, corresponding to the ratio of the two sampling rates.
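- the slow drift caused by mismatched converter clocks can be made concrete with a short calculation. The following Python sketch is illustrative only; the function name and the example rates are assumptions and are not taken from the patent.

```python
def accumulated_drift_seconds(fs_da_hz: float, fs_ad_hz: float, elapsed_s: float) -> float:
    """Time offset that builds up when D/A and A/D converters run at different rates.

    Audio played out at fs_da_hz for elapsed_s seconds is re-sampled at fs_ad_hz.
    A downstream stage that assumes the nominal rate fs_da_hz sees the signal
    stretched by the ratio fs_ad_hz / fs_da_hz.
    """
    return elapsed_s * (fs_ad_hz / fs_da_hz - 1.0)


# Assumed example: nominal 44.1 kHz, A/D clock running 1 Hz fast, one hour of audio.
drift = accumulated_drift_seconds(44100.0, 44101.0, 3600.0)
print(f"{drift * 1000:.1f} ms after one hour")
```

- under these assumed rates the offset grows to roughly 82 ms per hour, i.e. several blocks of typical length, which is why a one-time delay compensation is not sufficient.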
- the German patent DE 10 2004 046 746 B4 discloses a method and apparatus for synchronizing additional data and base data.
- a user provides a fingerprint based on his stereo data.
- An extension data server identifies the stereo signal based on the obtained fingerprint and accesses a database to retrieve the extension data for that stereo signal.
- the server identifies an ideal stereo signal that corresponds to the stereo signal present at the user and generates two test fingerprints of the ideal audio signal associated with the enhancement data.
- These two test fingerprints are then delivered to the client, which determines from them a compression/expansion factor and a reference offset; based on the reference offset, the supplemental channels are stretched/compressed and cut off at the beginning and at the end.
- a multi-channel file can be generated using the basic data and the extension data.
- the international publication WO 2006/102991 A1 also discloses a synchronization for multichannel reconstruction by means of correlation of fingerprints.
- Fingerprints generally must, on the one hand, be characteristic of an audio signal. On the other hand, they should also be as compressed a representation of the audio signal as possible. This means that the fingerprint should consume much less storage space than the audio signal itself; otherwise creating and using a fingerprint would not make sense.
- a fingerprint should reflect the time course of an audio signal in order to be suitable for synchronization purposes on the one hand, but also for identification purposes on the other hand.
- an audio signal such as a broadcast
- the fingerprint does not have to be decompressible, since the fingerprint generation can be considered as a particularly lossy compression.
- since fingerprint information is additional information, it should, as stated, be as compressed as possible, yet characteristic. Further, the more compressed the representation, the faster and more manageable any correlations become, that is, computational methods involving a fingerprint, e.g. to synchronize or characterize an audio signal.
- the object of the present invention is to provide an efficient fingerprint concept.
- the present invention is based on the finding that a well-compressing fingerprint is obtained by a block processing of an audio signal, i.e. that a fingerprint value is derived per block of the audio signal. It has also been found that a progression of this fingerprint value from block to block is particularly characteristic of the audio signal. Therefore, in the sense of a differential coding, a comparison of successive fingerprint values is carried out for successive blocks, in order then to characterize only the change in a binary manner. If the first fingerprint value is greater than the second fingerprint value, then a first binary value is assigned, while if the second fingerprint value is greater than the first fingerprint value, a different second binary value is assigned. This sequence of binary values is output as a fingerprint for the audio signal. Preferably, this change is quantized by only a single bit. This 1-bit quantization provides only a single bit of fingerprint information per block of the audio signal, and the audio signal is represented by a simple bit sequence with which a fast, efficient, and surprisingly accurate correlation with a corresponding test bit sequence can be carried out.
- Audio signals have the property that their characteristics do not change so much from block to block, so that a full quantization of the fingerprint value, e.g. with 8 or 16 bits, is not absolutely necessary. Further, audio signals have the property that a change of the fingerprint value from one block to the next is very meaningful for the audio signal. The preferred 1-bit quantization strongly emphasizes this change from one block to the next. In particular, audio signals have the property that the fingerprint value does not change very much from one block to the next. It is precisely in this small change, however, that the characterization information for the audio signal lies, which is particularly required for fingerprint processing purposes and which is effectively utilized by the inventive 1-bit quantization.
- the fingerprint value is an energy-dependent or power-dependent value
- changes from one block to the next are relatively small, but especially if blocks of fewer than 5,000 samples, in particular fewer than 2,000 samples, and of more than 500 samples are formed, the change of the energy-dependent or power-dependent value from one block to another is particularly characteristic of the audio signal.
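- an energy-dependent fingerprint value per block can be sketched as follows. This is a minimal Python illustration, assuming non-overlapping blocks and the plain sum of squared samples as the energy measure; the function name and the default block length are hypothetical choices, and the text equally allows a power-dependent value.

```python
def block_energies(samples, block_len=1024):
    """Return one energy value (sum of squared samples) per complete block.

    Samples left over after the last complete block are ignored.
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    n_blocks = len(samples) // block_len
    energies = []
    for b in range(n_blocks):
        block = samples[b * block_len:(b + 1) * block_len]
        energies.append(sum(x * x for x in block))
    return energies


# Tiny example with a block length of 2 so the numbers are easy to follow.
print(block_energies([1, 2, 3, 4, 5], block_len=2))  # [5, 25]
```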
- the fingerprint according to the invention can be used particularly advantageously for the synchronization of multichannel extension data with an audio signal, wherein synchronization by means of a block-based fingerprint technology is achieved efficiently and reliably.
- fingerprints calculated in blocks represent a good and efficient characteristic for an audio signal.
- the audio signal preferably comprises a block division information which can be used at the time of synchronization. This will ensure that the fingerprints, which are derived from the audio signal when synchronizing, are based on the same block division as fingerprints of the audio signal associated with the multi-channel extension data.
- the multi-channel extension data comprises a sequence of reference audio signal fingerprint information. This reference audio signal fingerprint information provides an association contained in the multichannel extension stream between a block of multichannel extension data and the portion or block of the audio signal to which the multichannel extension data belongs.
- the reference audio signal fingerprints are extracted from the multichannel extension data and correlated with the test audio signal fingerprints computed by the synchronizer.
- the correlator only has to achieve a block correlation because, due to the use of the block allocation information, the block rasterization underlying the two sequences of fingerprints is already identical.
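- a block correlation between two 1-bit fingerprint sequences can be sketched as a sliding comparison that counts agreeing bits. The Python code below is a minimal illustration under the assumption that the test sequence is no longer than the reference sequence; the function name and the match-count score are hypothetical and not the patent's prescribed correlator.

```python
def best_block_offset(test_bits, ref_bits):
    """Return (offset, matches): the block offset into ref_bits where test_bits agrees best."""
    if not test_bits or len(test_bits) > len(ref_bits):
        raise ValueError("test sequence must be non-empty and no longer than the reference")
    best_offset, best_matches = 0, -1
    for offset in range(len(ref_bits) - len(test_bits) + 1):
        window = ref_bits[offset:offset + len(test_bits)]
        # Count positions where both fingerprints carry the same bit.
        matches = sum(1 for t, r in zip(test_bits, window) if t == r)
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


reference = [0, 1, 1, 0, 1, 0, 0, 1]   # e.g. extracted from the extension data
test = [1, 0, 1, 0]                     # e.g. computed from the received audio
print(best_block_offset(test, reference))  # (2, 4)
```

- because both sequences rest on the same block raster, the resulting offset is an integer number of blocks and no sub-block search is needed.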
- the block division information contained in the audio signal may be specified as explicit side information, e.g. in a header of the audio signal.
- this block division information may also be given by a particular sample, which was, for example, the first sample of a block that was formed to calculate the reference audio signal fingerprints contained in the multichannel extension data.
- the block allocation information may also be introduced directly into the audio signal itself, e.g. by means of watermark embedding. A pseudo noise sequence is particularly suitable for this purpose; however, other ways of watermark embedding may be used to introduce block division information into the audio signal.
- it is preferred to embed the reference audio signal fingerprint information directly, block by block, into the data stream of the multi-channel extension data.
- finding a suitable time offset using a fingerprint is achieved with a data fingerprint not stored separately from the multichannel extension data. Instead, for each block of multi-channel extension data, the fingerprint is embedded in that block itself.
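- block-wise embedding of the fingerprint into the extension data can be pictured as a simple pairing of each extension data block with its fingerprint bit. The Python sketch below uses a hypothetical container type; the field names and the layout are assumptions for illustration and do not describe an actual bit stream syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtensionBlock:
    """One block of multichannel extension data plus its reference fingerprint bit."""
    payload: bytes          # opaque extension data for this block (hypothetical field)
    fingerprint_bit: int    # 0 or 1, derived from the matching audio block


def embed_fingerprints(payloads: List[bytes], bits: List[int]) -> List[ExtensionBlock]:
    """Attach exactly one fingerprint bit to each extension data block."""
    if len(payloads) != len(bits):
        raise ValueError("need exactly one fingerprint bit per extension block")
    return [ExtensionBlock(p, b) for p, b in zip(payloads, bits)]


def extract_fingerprints(blocks: List[ExtensionBlock]) -> List[int]:
    """Recover the reference fingerprint sequence from the extension data stream."""
    return [blk.fingerprint_bit for blk in blocks]


stream = embed_fingerprints([b"blk0", b"blk1", b"blk2"], [1, 0, 1])
print(extract_fingerprints(stream))  # [1, 0, 1]
```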
- the reference audio signal fingerprint information associated with multichannel extension data may come from a separate source.
- Fig. 1 shows a schematic diagram of an apparatus for processing an audio signal, wherein the audio signal is shown at 100 with block allocation information, while the audio signal 102 may not include any block allocation information.
- the device for processing an audio signal of Fig. 1, which can be used in an encoder scenario still to be described with reference to Fig. 9, includes a fingerprint calculator 104 for calculating a fingerprint per block of the audio signal for a plurality of successive blocks to obtain a series of reference audio signal fingerprint information.
- the fingerprint calculator is configured to use a predetermined block division information 106.
- the predetermined block division information 106 may be detected by block detector 108 from the audio signal 100 with block allocation information, for example.
- the fingerprint calculator 104 is capable of calculating from the audio signal 100 the sequence of reference fingerprints.
- the fingerprint calculator 104 will choose any block division and perform a very first block division.
- This block division is signaled via block division information 110 to a block allocation information embedder 112, which is designed to embed the block allocation information 110 into the audio signal 102 without block allocation information.
- on the output side, the block division information embedder 112 thus provides an audio signal 114 with block division information, which may be output via an output interface 116 or, independently of that output, separately stored or output via a different path, as schematically indicated at 118.
- the fingerprint calculator 104 is configured to calculate a sequence of reference audio signal fingerprint information 120. This sequence of reference audio signal fingerprint information is supplied to a fingerprint information embedder 122.
- the fingerprint information embedder embeds the reference audio signal fingerprint information 120 into multichannel extension data 124, which may be provided separately, or may also be computed directly from a multichannel extension data calculator 126 that receives a multichannel audio signal 128 on the input side.
- the fingerprint information embedder 122 thus provides multi-channel extension data with associated reference audio signal fingerprint information, which data is labeled 130.
- the fingerprint information embedder 122 is configured to directly embed the reference audio signal fingerprint information into the multichannel extension data at a block level, as it were.
- the fingerprint information embedder 122 will also store or provide the sequence of reference audio signal fingerprint information in association with a block of multichannel extension data, this block of multichannel extension data together with a block of the audio signal representing as good an approximation as possible of a multi-channel audio signal, i.e. of the multi-channel audio signal 128.
- the output interface 116 is configured to output an output signal 132 comprising the sequence of reference audio signal fingerprint information and the multichannel extension data in unique association, such as within an embedded data stream.
- the output signal may also be a sequence of blocks of multi-channel extension data without reference audio signal fingerprint information.
- the fingerprint information is then provided in a separate sequence of fingerprint information, for example, each fingerprint being "connected" to a block of multichannel extension data over a consecutive block number.
- Alternative assignments of fingerprint data to blocks, such as implicit signaling of order, etc., are also usable.
- the output signal 132 may further include an audio signal with block scheduling information. However, in special applications, such as broadcasting, the audio signal with block scheduling information will go a separate path 118.
- Fig. 2 shows a more detailed representation of the fingerprint calculator 104.
- the fingerprint calculator 104 includes a blocker 104a, a downstream fingerprint value calculator 104b, and a fingerprint processor 104c to provide a sequence of reference audio signal fingerprint information 120.
- the blocker 104a is configured to provide the block division information 110 for storage / embedding when performing the very first block division. However, if the audio signal already has block division information, then the blocker 104a will be controllable to form blocks depending on the predetermined block division information 106.
- the blocker 104a provides a means for dividing the audio signal into successive blocks of samples. Further, the fingerprint value calculator 104b acts as a means for calculating a first fingerprint value for a first block of the successive blocks and a second fingerprint value for a second block of the successive blocks.
- the fingerprint correlator 312 of FIG. 3a provides a means of comparing, as at 806 in FIG. 8 , wherein the first fingerprint value is compared with the second fingerprint value.
- a preferred implementation of the means 806 for comparing is difference formation, as will be described with reference to FIG. 8, since the sign of the difference then indicates whether the first fingerprint value was greater or less than the second fingerprint value.
- the fingerprint postprocessor 104c of FIG. 2 is configured in accordance with the invention to preferably perform a 1-bit quantization 814 or, generally, to assign a first binary value when the first fingerprint value is greater than the second fingerprint value and to assign a different second binary value when the first fingerprint value is less than the second fingerprint value.
- the device according to the invention for calculating a fingerprint also comprises a device for outputting information about a sequence of binary values as a fingerprint for the audio signal, which may be formed as the output interface 116 of FIG. 1 or as any other data stream or bit stream writer.
- the two binary values are preferably complementary to one another.
- the first binary value is a 0 or a 1
- the second binary value is also a 0 or a 1, the second value being complementary to the first value.
- a 1-bit quantization is performed, wherein exactly one bit is generated per block of the audio signal.
- the sequence of bits as generated by block 814 is then the test fingerprint or reference fingerprint.
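- the comparison of successive fingerprint values and the 1-bit quantization can be sketched in a few lines of Python. This is a minimal illustration, assuming that "first greater than second" maps to 1 and the opposite case to 0; the treatment of equal values (mapped to 0 here) is an assumption, since the text does not specify it, and the function name is hypothetical.

```python
def fingerprint_bits(values):
    """Quantize the change between successive fingerprint values to one bit each.

    Emits 1 when the earlier value is greater than the later one, otherwise 0.
    Equal values are mapped to 0 (an assumption of this sketch).
    """
    bits = []
    for first, second in zip(values, values[1:]):
        difference = first - second      # only the sign of the difference matters
        bits.append(1 if difference > 0 else 0)
    return bits


# Five block values yield four comparisons, hence four fingerprint bits.
print(fingerprint_bits([5, 25, 25, 3, 10]))  # [0, 0, 1, 0]
```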
- the block divider 104a of FIG. 2 is adapted either to form successive adjacent blocks that do not overlap or to form blocks that overlap, for example with a 50% overlap.
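- block formation with and without overlap can be sketched as follows. The Python function below is illustrative; its name, the `overlap` parameter given as a fraction of the block length, and the choice to drop an incomplete final block are assumptions of this sketch.

```python
def split_blocks(samples, block_len, overlap=0.0):
    """Divide samples into blocks of block_len; overlap is a fraction in [0, 1)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = int(block_len * (1.0 - overlap))   # distance between block starts
    if hop <= 0:
        raise ValueError("overlap too large for this block length")
    last_start = len(samples) - block_len
    return [samples[s:s + block_len] for s in range(0, last_start + 1, hop)]


signal = list(range(8))
print(split_blocks(signal, 4))               # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(split_blocks(signal, 4, overlap=0.5))  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```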
- the blocker 104a is configured to provide blocks of the audio signal having at least 500 time samples and preferably fewer than 5,000 samples. More preferably, blocks of between 1,000 and 2,500 samples are used, with 1024 or 2048 samples being preferred, particularly when frequency-based measures are used for fingerprint calculation. The longer the blocks, the lower the bit requirement of fingerprint information per audio signal.
- the block lengths described above, which may relate to an audio sample rate of, for example, 44.1 kHz, are preferred; however, corresponding block lengths for other sample rates will also provide reasonable results as long as one block covers a temporal length of the audio signal of about 10 ms to about 100 ms.
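- the relation between block length in samples and block duration is simple arithmetic, shown here as a small Python helper. The function name is hypothetical; the sample counts and the 44.1 kHz rate are the ones mentioned in the text.

```python
def block_duration_ms(block_len: int, sample_rate_hz: float) -> float:
    """Duration of one block in milliseconds."""
    return 1000.0 * block_len / sample_rate_hz


for n in (500, 1024, 2048, 5000):
    print(n, "samples at 44.1 kHz:", round(block_duration_ms(n, 44100.0), 1), "ms")
```

- at 44.1 kHz the preferred lengths of 1024 and 2048 samples correspond to about 23 ms and 46 ms, and the outer limits of 500 and 5,000 samples to about 11 ms and 113 ms, which matches the stated range of roughly 10 ms to 100 ms.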
- the fingerprint according to the invention can preferably be used for synchronization, as described with reference to FIG. 3, wherein even without block allocation information an accuracy on the order of one block length is obtained, which can be increased to the range of 1 sample by adding the block allocation information.
- if block-accurate synchronization is sufficient, a satisfactory result can be obtained even without block allocation information.
- in fingerprint applications for characterizing or identifying an audio signal, it is not necessary to obtain a sample-exact synchronization between the test fingerprint and the reference fingerprint.
- the audio signal is watermarked as shown in Fig. 4a.
- Fig. 4a shows an audio signal with a sequence of samples, with a block division into blocks i, i + 1, i + 2 indicated schematically.
- in the embodiment shown in Fig. 4a, the audio signal itself includes no such explicit block division.
- a watermark 400 is embedded in the audio signal such that each audio sample includes a watermark portion.
- This watermark portion is indicated schematically for a sample 402 at 404.
- the watermark 400 is embedded in such a way that the block structure can be detected on the basis of the watermark.
- the watermark is, for example, a known periodic pseudo noise sequence, as shown at 500 in Fig. 5.
- This known pseudo noise sequence has a period equal to or longer than the block length.
- For watermark embedding, as shown in Fig. 5, block formation 502 of the audio signal is performed first. Then, a block of the audio signal is converted into the frequency domain by means of a time / frequency conversion 504. Analogously, the known pseudo noise sequence 500 is also transformed into the frequency domain via a time / frequency conversion 506. Thereafter, a psychoacoustic module 508 calculates the psychoacoustic masking threshold of the audio signal block, where, as is known in psychoacoustics, a signal in a band is masked by the audio signal, i.e. inaudible, if the energy of the signal in the band is below the value of the masking threshold for this band.
- a spectral weighting 510 is performed for the spectral representation of the pseudo noise sequence.
- the spectrally weighted pseudo-noise sequence then has, in front of a combiner 512, a spectrum whose shape corresponds to the psychoacoustic masking threshold. This signal is then combined, spectral value by spectral value, with the spectrum of the audio signal in combiner 512.
- at the output of the combiner 512 there is thus an audio signal block with an embedded watermark, the watermark being masked by the audio signal.
- by means of a frequency / time converter 514, the block of the audio signal is transformed back into the time domain, yielding the audio signal shown in Fig. 4a, which now carries a watermark that represents block allocation information.
- the spectral weighting 510 may be made by a dual operation in the time domain such that a time / frequency conversion 506 is not necessary.
- the spectrally weighted watermark could also be transformed into the time domain prior to its combination with the audio signal, such that the combination 512 would occur in the time domain, in which case a time / frequency conversion 504 would not be necessary, provided the masking threshold can be calculated without a transformation.
- a calculation of the masking threshold used independently of the audio signal or of a transformation length of the audio signal can also be undertaken.
- the length of the known pseudo noise sequence is equal to the length of a block. Then a correlation for watermark extraction works very efficiently and unambiguously.
- longer pseudo noise sequences may also be used as long as a period of the pseudo noise sequence is equal to or greater than the block length.
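- how a periodic pseudo noise sequence can mark the block raster is illustrated by the following simplified Python sketch. It works purely in the time domain and replaces the psychoacoustic spectral weighting with a constant `strength` factor, which is a deliberate simplification and not the method of Fig. 5. The 31-sample maximal-length sequence, all function names and all parameter values are assumptions chosen for illustration.

```python
def pn_sequence():
    """31-sample maximal-length sequence (polynomial x^5 + x^2 + 1) mapped to +1/-1."""
    state = [1, 0, 0, 0, 0]               # any non-zero start state works
    bits = []
    for _ in range(31):
        bits.append(state[0])
        new_bit = state[0] ^ state[2]     # a[n+5] = a[n] XOR a[n+2]
        state = state[1:] + [new_bit]
    return [1.0 if b else -1.0 for b in bits]


def embed_watermark(samples, pn, strength=0.05):
    """Add the periodic PN sequence so that each period starts at a block boundary."""
    return [x + strength * pn[i % len(pn)] for i, x in enumerate(samples)]


def detect_block_start(samples, pn):
    """Return the index of the first block boundary, found by circular correlation."""
    period = len(pn)
    usable = (len(samples) // period) * period   # correlate over whole periods only
    best_shift, best_score = 0, float("-inf")
    for shift in range(period):
        score = sum(samples[i] * pn[(i - shift) % period] for i in range(usable))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

- in this sketch the block length equals the sequence period, so the correlation has a single peak per block. The audio itself acts as noise in the correlation, so a detector for a watermark that is really kept below the masking threshold would typically need to accumulate over many more blocks than this toy example suggests.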
- a watermark may be used which does not have a white spectrum but which, for example, is designed in such a way that it has spectral components only in specific frequency bands, for example the lower spectral band or a middle spectral band. In this way it can be ensured that the watermark is not introduced, for example, only in the upper bands, which are eliminated or parameterized in a data-saving transmission, e.g. by a "Spectral Band Replication" technique as known from the MPEG-4 standard.
- as an alternative to a watermark, a block division can also be signaled if, for example, a digital channel exists in which each block of the audio signal of Fig. 4 can be marked, e.g. such that the first sample value of a block receives a flag.
- alternatively, a block division that was used to calculate the fingerprint and also to compute the multi-channel extension data from the original multi-channel audio channels may be signaled, for example, in a header of the audio signal.
- Fig. 9 shows an encoder-side scenario as used to reduce the data rate of multi-channel audio signals.
- a 5.1 scenario is shown, although a 7.1, 3.0 or an alternative scenario can also be used.
- spatial audio object coding, which is likewise known and in which audio objects are coded instead of audio channels, so that the multichannel extension data are actually data with which objects can be reconstructed, also uses the basically two-part structure indicated in Fig. 9.
- the multi-channel audio signal having the plurality of audio channels or audio objects is supplied to a downmixer 900, which provides a downmixed audio signal, for example a mono downmix or a stereo downmix. Further, a multi-channel extension data calculation is performed in a corresponding multi-channel extension data calculator 902. There the multi-channel extension data are calculated, e.g. according to the BCC technique or according to the standard known as MPEG Surround. An extension data calculation for audio objects, the results of which are also referred to as multi-channel extension data, can take place in the audio signal 102. The apparatus for processing the audio signal shown in Fig. 1 is connected downstream of these two known blocks 900, 902.
- the apparatus 904 for processing according to Fig. 1, shown in Fig. 9, receives, e.g., an audio signal 102 without block scheduling information as a mono downmix or stereo downmix and further receives the multichannel extension data via line 124.
- the multi-channel extension data calculator 126 of Fig. 1 thus corresponds to the multi-channel extension data calculator 902 of Fig. 9.
- the device 904 for processing provides, e.g., an audio signal 118 with embedded block allocation information and a data stream with multi-channel extension data including associated or embedded reference audio signal fingerprint information, as shown in Fig. 1 at 132.
- FIG. 12 shows a more detailed representation of multichannel extension data calculator 902.
- block formation is first performed in respective blockers 910 to obtain a block for the original channel of the multichannel audio signal.
- this is followed by a time/frequency conversion per block in a time/frequency converter 912.
- the time/frequency converter may be a filter bank for performing subband filtering, a general transformation or, in particular, a transformation in the form of an FFT. Alternative transformations, such as the MDCT, are also known.
- a separate correlation parameter between the channel and a reference channel which is denoted by ICC, is calculated.
- a separate energy parameter ICLD is also calculated per band and block and channel, this being done in a parameter calculator 914.
- the blocker 910 uses block allocation information 106 if such block allocation information already exists.
- the blocker 910 may also define the block scheduling information itself when the first block division is made, and then output it and thereby control, e.g., the fingerprint calculator of Fig. 1.
- the output block allocation information is also designated 110.
- it is thus ensured that the block formation for the calculation of the multichannel extension data is matched to the block formation for the calculation of the fingerprints of Fig. 1. This ensures that a sample-accurate synchronization of the multi-channel extension data to the audio signal is achievable.
- the parameter data calculated by the parameter calculator 914 is fed to a data stream formatter 916, which can be designed like the fingerprint information embedder 122 of Fig. 1.
- Data stream formatter 916 also receives one fingerprint per block of the downmix signal, as indicated at 918.
- the data stream formatter uses the fingerprint and the received parameter data 915 to generate multichannel extension data 130 with embedded fingerprint information, a block of which is shown schematically in Fig. 11b.
- the fingerprint information for this block is entered after an optional synchronization word 950 at 960.
- the parameters 915 calculated by the parameter calculator 940 namely, z. B. in the in Fig.
- the channel is indicated by the index of "ICLD", where an index "1" stands for the left channel, an index "2" for the middle channel, an index "3" for the right channel, an index "4" for the left rear channel (LS) and an index "5" for the right rear channel (RS).
- the fingerprint information for a block may also be inserted in the direction of transmission after the multi-channel extension data or anywhere between the multi-channel extension data.
- the fingerprint information can also be transmitted in a separate data stream or in a separate table, which is associated with the multi-channel extension data, e.g., via an explicit block identifier, or for which the association is given implicitly, namely by the order of the fingerprints relative to the order of the multi-channel extension data for the individual blocks. Other associations without explicit embedding are also usable.
- Fig. 3a shows an apparatus for synchronizing multichannel extension data with an audio signal 114.
- the audio signal 114 comprises block allocation information, as has been described with reference to Fig. 1.
- the multi-channel extension data is assigned reference audio signal fingerprint information.
- the audio signal having the block division information is supplied to a block detector 300 configured to detect the block division information in the audio signal and to supply the detected block division information 302 to a fingerprint calculator 304.
- the fingerprint calculator 304 also receives the audio signal. An audio signal without block scheduling information would be sufficient here, but the fingerprint calculator may also be configured to use the audio signal with block scheduling information for the fingerprint calculation.
- the fingerprint calculator 304 now calculates one fingerprint per block of the audio signal for a plurality of consecutive blocks to obtain a sequence of test audio signal fingerprints 306.
- the fingerprint calculator 304 is configured to use the block allocation information 302 to calculate the sequence of test audio signal fingerprints 306.
- the synchronization device according to the invention or the synchronization method according to the invention is further based on a fingerprint extractor 308 for extracting a series of reference audio signal fingerprints 310 from the reference audio signal fingerprint information 120 as supplied to the fingerprint extractor 308.
- Both the sequence of test fingerprints 306 and the sequence of reference fingerprints 310 are supplied to a fingerprint correlator 312, which is configured to correlate the two sequences.
- a balancer 316 is controlled to reduce, or ideally eliminate, a temporal offset between the multichannel extension data 132 and the audio signal 114.
- in Fig. 10, the synchronizer shown in Fig. 3a is indicated at 1000.
- as explained with reference to Fig. 3a, the synchronizer 1000 receives the audio signal 114 and the multichannel extension data in unsynchronized form and, on the output side, provides the audio signal and the multichannel extension data in synchronized form to an up-converter 1102.
- the up-converter 1102, also referred to as an "upmix" block, can now compute reconstructed multichannel audio signals L', C', R', LS' and RS' on the basis of the audio signal and the multichannel extension data synchronized with it. These reconstructed multichannel audio signals approximate the original multichannel audio signals as received at the input of block 900 in Fig.
- the reconstructed multichannel audio signals provided at the output of block 1102 in Fig. 10 may also be reconstructed audio objects, or reconstructed audio objects already modified at certain positions, as is known from audio object coding.
- the reconstructed multichannel audio signals now have the maximum achievable audio quality because the multichannel extension data have been synchronized with the audio signal with sample accuracy.
- Fig. 3b shows a specific implementation of the equalizer 316.
- the equalizer 316 has two delay blocks, of which a block 320 may be a maximum delay fixed block and the second block 322 may be a variable delay block controllable between a delay equal to zero and a maximum delay D max .
- the control takes place on the basis of the correlation result 314.
- the fingerprint correlator 312 provides a correlation offset control in integer multiples (x) of a block length (ΔD), where ΔD denotes the block length.
- although the fingerprint has been calculated block by block, and thus reflects the time course of the audio signal, and correspondingly of the multichannel extension data, only relatively coarsely, a sample-accurate correlation is nevertheless achieved. This is solely because the block division of the fingerprint calculator 304 in the synchronizer has been synchronized to the block division that was used to calculate the multichannel extension data block by block and, above all, to calculate the fingerprints that are embedded in the multichannel extension data stream or associated with it.
- with regard to the implementation of the equalizer 316, it should be noted that two variable delays can also be used, so that the correlation result 314 controls both variable delay stages. Alternative implementation options within an equalizer for eliminating temporal offsets for synchronization purposes may also be used.
- Fig. 6 shows a detailed implementation of the block detector 300 of Fig. 3a for the case in which the block division information has been introduced into the audio signal as a watermark.
- the watermark extractor in Fig. 6 can be constructed analogously to the watermark embedder of Fig. 5, but need not be constructed in exact analogy.
- the watermarked audio signal is supplied to a blocker 600 which generates successive blocks from the audio signal.
- a block is then supplied to a time / frequency converter 602 to transform the block.
- a psychoacoustic module 604 is able to compute a masking threshold to prefilter the block of the audio signal in a prefilter 606 using this masking threshold.
- the implementation of module 604 and prefilter 606 serve to increase the watermark detection accuracy. They may also be omitted so that the output of the time / frequency converter 602 is directly coupled to a correlator 608.
- the correlator 608 is configured to correlate the known pseudo-noise sequence 500, which was already used for watermark embedding in Fig. 5, after a time/frequency conversion in a converter 502, with a block of the audio signal.
- a test block classification is specified, which does not necessarily have to correspond to the final block classification. Instead, the correlator 608 will now perform a correlation over several blocks, for example over twenty or even more blocks.
- the spectrum of the known noise sequence is correlated with the spectrum of each block at different delay values, so that after a plurality of blocks a correlation result 610 is obtained, which could look, for example, as shown in Fig. 7.
- a controller 612 may monitor the correlation result 610 and perform a peak detection. For this purpose, the controller 612 recognizes a peak 700, which emerges more and more clearly with a longer correlation, i.e. with a larger number of blocks used for the correlation.
- the controller 612 determines a corrected block schedule 614, e.g. according to the formula set forth in Fig. 7.
- the offset value Δn is subtracted from the test block schedule to calculate the corrected block schedule 614, which is then to be observed by the fingerprint calculator 304 of Fig. 3a when calculating the test fingerprints.
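The search for the block offset can be illustrated with a minimal sketch. It assumes a trial block grid that starts at sample 0 and, unlike the spectral correlation described above, correlates directly in the time domain to stay short. The function name `find_block_offset` and its parameters are hypothetical, and whether the returned lag is added to or subtracted from the trial grid depends on the sign convention chosen for the lag.

```python
def find_block_offset(signal, block_len, pn, n_blocks=20):
    """Estimate the offset of the watermark block grid relative to a trial grid.

    The trial grid starts at sample 0 (assumption). For every candidate
    delay, the correlation between the known PN sequence and the signal is
    accumulated over n_blocks blocks; the delay with the largest sum wins.
    """
    needed = (n_blocks + 1) * block_len
    if len(signal) < needed:
        raise ValueError("signal too short for the requested number of blocks")
    best_delay, best_value = 0, float("-inf")
    for delay in range(block_len):
        acc = 0.0
        for b in range(n_blocks):
            start = b * block_len + delay
            acc += sum(pn[i] * signal[start + i] for i in range(block_len))
        if acc > best_value:  # simple peak detection over all delays
            best_delay, best_value = delay, acc
    return best_delay
```

Accumulating over more blocks makes the peak stand out more clearly against the audio signal, which matches the behaviour described for peak 700.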
- a special approach on the transmitter side and the receiver side is thus preferred to solve the assignment problem.
- a calculation of time-variable and suitable fingerprint information is made from the corresponding (mono or stereo) downmix audio signal.
- these fingerprints can be keyed regularly into the transmitted multichannel additional data stream as a synchronization aid. This can be done as a data field in the middle of the block-organized spatial audio coding side information, or in such a way that the fingerprint signal is sent as the first or last information of the data block so that it can easily be added or removed.
- a watermark, such as a known noise sequence, may be embedded in the audio signal to be sent. This helps the receiver determine the frame phase and eliminate an offset within a frame.
- a two-stage synchronization is preferred.
- the watermark is extracted from the received audio signal and the position of the noise sequence is determined.
- the frame boundaries can be determined from the position of the noise sequence, and the audio data stream can be subdivided accordingly.
- the characteristic audio features, i.e. the fingerprints, are calculated over almost the same sections as in the transmitter, which increases the quality of the result of a later correlation.
- the downmix signal can also have more than two channels, as long as the number of channels in the downmix signal is smaller than the number of channels, or generally audio objects, in the original audio signal before the downmix.
- the fingerprints may be extracted from the multichannel overhead information, and a temporal offset between the multichannel overhead information and the received signal may be determined via appropriate and well-known correlation techniques.
- An overall time offset is composed of the frame phase and the offset between multichannel overhead information and received audio signal.
- the audio signal and the multi-channel additional information are synchronized for subsequent multi-channel decoding by a downstream, actively controlled delay equalization stage.
- the multichannel audio signal is divided into blocks of fixed size to obtain the multichannel overhead data.
- a noise sequence also known to the receiver is embedded, or in general a watermark is embedded.
- a fingerprint suitable for characterizing the temporal structure of the signal as unambiguously as possible is calculated block by block, simultaneously with or at least synchronized to the calculation of the multichannel additional data.
- An embodiment of this is to use the energy content of the current downmix audio signal of the audio block, for example in logarithmic form, ie in a decibel-related representation.
- the fingerprint is a measure of the temporal envelope of the audio signal.
- this synchronization information may also be expressed as a difference to the energy value of the previous block followed by appropriate entropy coding, such as Huffman coding, adaptive scaling and quantization.
- the audio signal is present in successive blocks.
- the fingerprint value may be one energy value per block, as illustrated in step 802.
- the signal value s_left(i) with the number i stands for a temporal sample of a left channel of the audio signal.
- s_right(i) stands for the i-th sample of a right channel of the audio signal.
- a minimum limitation of the energy is now preferably carried out for the purpose of a subsequent logarithmic display.
- a minimum energy offset E offset is applied, so that in the case of zero energy, a meaningful logarithmic calculation results.
- This energy measure in dB describes a number range from 0 to 90 (dB) with an audio signal resolution of 16 bits.
- $E_{db} = 10 \cdot \log\left(E_{monosum} + E_{offset}\right)$
- E_db(diff) is the difference of the energy values of two preceding blocks in a dB representation, while E_db is the energy in dB of the current block or of the previous block, as given by the above equation. This difference of energies is formed in a step 806.
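A minimal sketch of the energy fingerprint may help here. The formula for the block energy in step 802 is not reproduced in this text, so the sketch assumes that the energy is the sum of the squared mono-sum samples s_left(i) + s_right(i) of a block. The default value of `e_offset`, the base-10 logarithm and the function names are likewise assumptions made for illustration.

```python
import math

def block_energy_db(left, right, e_offset=1.0):
    """Energy of the mono sum of one stereo block in dB.

    Assumption: the energy is the sum of squared (left + right) samples.
    e_offset keeps the logarithm defined for an all-zero block.
    """
    e_monosum = sum((l + r) ** 2 for l, r in zip(left, right))
    return 10.0 * math.log10(e_monosum + e_offset)

def energy_differences(energies_db):
    """Difference between the dB energy of each block and its predecessor."""
    return [cur - prev for prev, cur in zip(energies_db, energies_db[1:])]
```

For a sequence of N blocks, `energy_differences` yields N - 1 values, so the first block has no difference value in this sketch.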
- this step may be performed, e.g., only in the encoder, i.e. in the fingerprint calculator 104 of Fig. 1, so that the fingerprint embedded in the multichannel extension data consists of differentially coded values.
- the subtraction step 806 may also be implemented purely on the decoder side, that is, in the fingerprint calculator 304 of Fig. 3a.
- in that case the transmitted fingerprint consists only of non-differentially encoded values, and the difference formation of step 806 is performed only in the decoder.
- This possibility is illustrated by the dashed signal flow line 808 which bypasses the subtraction block 806.
- This latter option 808 has the advantage that the fingerprint still contains information about the absolute energy of the downmix signal, but requires a slightly higher fingerprint word length.
- the scaling of the energy (envelope of the signal) for optimal modulation according to the block 808 ensures that in the subsequent quantization of this fingerprint both the number range is maximally utilized and the resolution at low energy values is improved.
- E_scaled is, in this case, the scaled energy,
- E_db(diff) is the difference energy in dB calculated by the difference formation in block 806, and
- a_gain is the gain factor, which may depend on the time t, in particular if a dynamic gain control is used.
- the amplification factor will depend on the envelope signal in that with a larger envelope the amplification factor decreases and with a smaller envelope the amplification factor increases in order to obtain the most uniform possible modulation of the available number range.
- the gain factor may be replicated in the fingerprint calculator 304 by measuring the energy of the transmitted audio signal so that the gain factor does not have to be explicitly transmitted.
- E_quantized is the quantized energy value and represents a quantization index of 8 bits.
- Q_8Bit is the quantization operation, which assigns the quantization index for the maximum value 255 to any value > 255. Finer quantizations with more than 8 bits or coarser quantizations with fewer than 8 bits can also be used; with coarser quantization the additional bit requirement decreases, while with finer quantization the additional bit requirement increases, but so does the accuracy.
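Scaling and quantization can be sketched as follows. The text only states that values above 255 are mapped to the index 255; rounding to the nearest integer and clipping negative values to 0 are assumptions of this sketch, as are the names `scale_energy` and `quantize_8bit`.

```python
def scale_energy(e_db, a_gain):
    """Scale an energy value in dB by the gain factor a_gain."""
    return e_db * a_gain

def quantize_8bit(value):
    """Map a scaled value to a quantization index in the range 0..255.

    Values above 255 saturate at 255, as described in the text.
    Clipping below 0 and round-to-nearest are assumptions of this sketch.
    """
    index = int(round(value))
    return max(0, min(255, index))
```

A call such as `quantize_8bit(scale_energy(45.0, 2.0))` would then produce the index 90, and any scaled value beyond the number range ends up at 255.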
- entropy coding of the fingerprint may then take place.
- the bit requirement for the quantized fingerprint can be further reduced.
- a suitable entropy method is, for example, the Huffman coding. Statistically different frequencies of fingerprint values can be expressed by different code lengths and thus on average reduce the bit requirements of the fingerprint representation.
- the result of the entropy encoding block 812 is then written into the extension channel data stream, as shown at 813.
- non-entropy-coded fingerprints may also be written into the bitstream as quantized values, as shown at 811.
- another fingerprint value may also be calculated, as shown in block 818.
- the crest factor of the power density spectrum (PSD crest) can also be calculated.
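As an illustration of such an alternative fingerprint value, the sketch below computes a crest factor as the ratio of the maximum to the mean of a power spectrum. This definition, the use of a plain DFT on the unwindowed block and the names `power_spectrum` and `psd_crest` are assumptions, since the text does not specify how the PSD crest is obtained.

```python
import cmath

def power_spectrum(block):
    """Squared DFT magnitudes for bins 0..N/2 of a real-valued block."""
    n = len(block)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(block))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def psd_crest(block):
    """Crest factor of the power spectrum: maximum divided by mean.

    Returns 0.0 for an all-zero block (assumption of this sketch).
    """
    p = power_spectrum(block)
    mean = sum(p) / len(p)
    return max(p) / mean if mean > 0 else 0.0
```

A flat spectrum yields a crest factor close to 1, whereas a block dominated by a single tone yields a large value, so the measure reflects how tonal a block is.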
- a 1-bit quantization as shown in block 814 may be used.
- a 1-bit quantization is additionally performed. It has been shown that this can increase the accuracy of the correlation. This 1-bit quantization is realized so that the fingerprint is equal to 1 if the new value is greater than the old one (slope positive) and equal to -1 if the slope is negative. A negative slope is reached when the new value is less than the old value.
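The 1-bit quantization of the slope can be written in a few lines. The text does not say what happens when two consecutive values are equal; the sketch treats that case as a negative slope, which is an assumption, and the function name is hypothetical.

```python
def one_bit_fingerprint(values):
    """Return +1 where a value exceeds its predecessor, otherwise -1.

    Equal consecutive values are mapped to -1 (assumption of this sketch).
    """
    bits = []
    for prev, cur in zip(values, values[1:]):
        bits.append(1 if cur > prev else -1)
    return bits
```

For example, the per-block values 1.0, 2.0, 1.5, 1.5, 3.0 give the sequence +1, -1, -1, +1.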
- the inventively preferred 1-bit quantization considerably simplifies the correlation calculation in the fingerprint correlator 312. Because the test fingerprint and the reference fingerprint are bit sequences, the correlation can be reduced to a simple XOR operation followed by a summation of the bitwise results of the XOR operation. Thus, if the sequence of test audio signal fingerprint values and the sequence of reference audio signal fingerprint values are each a sequence of 1-bit values, one bit per block of audio samples, the fingerprint correlator 312 of Fig. 3a is designed to combine a bit sequence of the test audio signal fingerprints and a bit sequence of the reference audio signal fingerprints by a bitwise XOR operation and to sum the resulting bits. The result of this summation represents a first correlation value.
- the bit sequences have a length of, e.g., 32 bits, or, e.g., between 10 bits and 100 bits.
- the fingerprint correlator 312 is further configured to combine a bit sequence of the test audio signal fingerprints or of the reference audio signal fingerprints, shifted by a shift value, with the respective other sequence, again by a bitwise XOR operation, and to sum the resulting bits, whereby a second correlation value is obtained. For the shift value that has yielded the maximum correlation value, it can then be concluded that the test fingerprint and the reference fingerprint coincide. This shift value thus represents the correlation result, since it has yielded the largest correlation value.
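The shift search can be sketched as follows, with the 1-bit fingerprints represented as lists of 0 and 1 (mapping -1 to 0 is an assumption). Since a bitwise XOR yields 1 where two bits differ, the sketch reports the number of agreeing positions, i.e. the sequence length minus the XOR sum, so that the best alignment has the largest value. The function names are hypothetical.

```python
def xor_correlation(bits_a, bits_b):
    """Number of agreeing positions of two equally long 0/1 sequences.

    Computed as length minus the sum of the bitwise XOR results.
    """
    if len(bits_a) != len(bits_b):
        raise ValueError("sequences must have the same length")
    mismatches = sum(a ^ b for a, b in zip(bits_a, bits_b))
    return len(bits_a) - mismatches

def find_shift(test_bits, ref_bits):
    """Slide the test sequence along a longer reference sequence.

    Returns (shift, value) for the shift with the largest correlation value;
    the first such shift wins if several are equal.
    """
    window = len(test_bits)
    best = (0, -1)
    for shift in range(len(ref_bits) - window + 1):
        value = xor_correlation(test_bits, ref_bits[shift:shift + window])
        if value > best[1]:
            best = (shift, value)
    return best
```

The returned shift is expressed in blocks, so it would still have to be multiplied by the block length to obtain a delay in samples.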
- this quantization also affects the bandwidth needed to transmit the fingerprint. Whereas at least 8 bits previously had to be used for the fingerprint in order to provide a sufficiently accurate value, a single bit is sufficient here. Since the fingerprint and its 1-bit counterpart are already determined in the transmitter, a more accurate calculation of the difference is achieved, because the actual fingerprint is present with maximum resolution, so that even minimal changes between the fingerprints can be taken into account both in the transmitter and in the receiver. It has also been found that most consecutive fingerprints differ only minimally. Such a difference would, however, be nullified by a quantization performed before the difference formation.
- 1-bit quantization as a special fingerprint post-processing can also be used regardless of whether there is an audio signal with additional information or not, since 1-bit quantization on the basis of differential coding is inherently a robust yet accurate fingerprint method, which can also be used for purposes other than synchronization, e.g. for identification or classification.
- a calculation of the multichannel overhead data is performed using the multichannel audio data.
- the multi-channel additional information calculated in this case is then extended by the newly added synchronization information in the form of the calculated fingerprints by suitable embedding in the bit stream.
- the preferred watermark/fingerprint hybrid solution allows a synchronizer to detect a time offset between the downmix signal and the additional data and to realize a time-correct adjustment, i.e. a delay compensation, between the audio signal and the multi-channel extension data on the order of +/- one sample value.
- the multi-channel association can thus be reconstructed in the receiver almost completely, i.e. except for a barely perceptible time difference of a few samples, which does not significantly affect the quality of the reconstructed multichannel audio signal.
- the fingerprint according to the invention can be used to characterize a test audio signal.
- a device 104 or 304 is provided in order to obtain a sequence of test audio signal fingerprints from the test audio signal.
- a correlator such as correlator 312 is provided to correlate the sequence of binary values with different reference fingerprints provided in a reference database, the reference database containing, for each reference fingerprint, information about the audio signal associated with that reference fingerprint.
- the information about the test audio signal is, for example, an identification of the audio signal, that is to say what the song is called and, if applicable, from which author it originates and on which CD or on which sound carrier this piece can be found and where it can be ordered.
- An alternative characterization of an audio signal is to identify a test audio signal, e.g., as an audio signal belonging to a certain style epoch or a certain style, or as originating from a particular music group. Such a characterization can be done, for example, by determining not only qualitatively but also quantitatively how the reference fingerprint relates to the test fingerprint, or what distance exists between the two. This comparison of the fingerprint sequences, or the calculation of the quantitative distance between the fingerprint sequences, can take place, e.g., once a correlation has been performed to eliminate the time offset between the reference fingerprint and the test fingerprint.
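A quantitative distance of this kind can be illustrated with a Hamming distance against a small reference database. The sketch assumes that the time offset has already been removed, that all sequences are equally long lists of 0 and 1, and that the database is a plain dictionary from a descriptive label to a reference fingerprint. These choices and the function names are hypothetical and not prescribed by the text.

```python
def hamming_distance(bits_a, bits_b):
    """Number of positions in which two equally long bit sequences differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))

def characterize(test_bits, reference_db):
    """Return (distance, label) of the closest reference fingerprint.

    reference_db maps a label (e.g. a title) to a reference bit sequence;
    this layout is an assumption of the sketch.
    """
    return min((hamming_distance(test_bits, ref), label)
               for label, ref in reference_db.items())
```

A small distance suggests an identification of the test audio signal, while the size of the distance can serve as the quantitative measure of similarity mentioned above.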
- the method according to the invention can be implemented in hardware or in software.
- the implementation may be on a digital storage medium, in particular a floppy disk, CD or DVD with electronically readable control signals, which may interact with a programmable computer system such that the method is performed.
- the invention thus also consists in a computer program product with a program code stored on a machine-readable carrier for carrying out the method according to the invention, when the computer program product runs on a computer.
- the invention can thus be realized as a computer program with a program code for carrying out the method when the computer program runs on a computer.
Abstract
Description
The present invention relates to fingerprint technology for audio signals and, in particular, to calculating a fingerprint, to using a fingerprint for synchronizing multi-channel extension data with an audio signal, and to characterizing an audio signal with the fingerprint.
Technologies currently under development enable ever more efficient transmission of audio signals through data reduction, but also enhance listening pleasure through extensions such as the use of multi-channel technology.
Examples of such an extension of the usual transmission techniques have become known under the names "Binaural Cue Coding" (BCC) and "Spatial Audio Coding". In this regard, reference is made by way of example to
In a sequentially operating transmission system such as broadcasting or the Internet, such methods separate the audio program to be transmitted into audio base data, i.e. an audio signal, which may be a mono or a stereo downmix audio signal, and into extension data, which are also referred to as multi-channel additional information or multi-channel extension data. The multi-channel extension data can be broadcast together with the audio signal, i.e. in combined form, or separately from the audio signal. As an alternative to broadcasting a radio program, the multi-channel extension data can also be transmitted separately to a version of the downmix channel that is, for example, already available to the user. In this case, the transmission of the audio signal, for example in the form of an Internet download or the purchase of a compact disc or DVD, takes place spatially and temporally separate from the transmission of the multi-channel extension data, which can be supplied, for example, by a multi-channel extension data server.
In principle, separating a multi-channel audio signal into an audio signal and multi-channel extension data has the following advantages. A "classic" receiver is able at any time to receive and reproduce the audio base data, i.e. the audio signal, regardless of the content and version of the multi-channel additional information. This property is referred to as backward compatibility. In addition, a receiver of a newer generation can evaluate the transmitted multi-channel additional data and combine them with the audio base data, i.e. the audio signal, such that the full extension, i.e. the multi-channel sound, can be made available to the user.
In an example application scenario in digital broadcasting, these multi-channel extension data allow the stereo audio signal broadcast so far to be extended to the 5.1 multi-channel format with little additional transmission effort. The 5.1 multi-channel format has five playback channels, namely a left channel L, a right channel R, a center channel C, a left rear channel LS (left surround) and a right rear channel RS (right surround). For this purpose, the program provider generates the multi-channel additional information on the transmitter side from multi-channel sound sources, such as those found, e.g., on a DVD/Audio/Video. This multi-channel additional information can then be transmitted in parallel with the audio stereo signal broadcast as before, which now contains a stereo downmix of the multi-channel signal.
One advantage of this method is its compatibility with the existing digital broadcast transmission system. A classical receiver that cannot evaluate this additional information will, as before, be able to receive and reproduce the two-channel sound signal without any qualitative restrictions.
A receiver of newer design, by contrast, can evaluate and decode the multi-channel information in addition to the stereo sound signal received so far, and reconstruct the original 5.1 multi-channel signal from it.
To enable simultaneous transmission of the multi-channel additional information as a supplement to the stereo sound signal used so far, two solutions for compatible broadcasting over a digital broadcasting system are conceivable.
The first solution is to combine the multi-channel additional information with the encoded downmix audio signal such that it can be appended to the data stream generated by an audio encoder as a suitable and compatible extension. In this case, the receiver sees only one (valid) audio data stream and can, via a correspondingly upstream data distributor, extract the multi-channel additional information from it in synchronism with the associated audio data block, decode it and output it as 5.1 multi-channel sound.
This solution requires extending the existing infrastructure/data paths so that, instead of only the stereo audio signals as before, they can now transport the data signals consisting of downmix signals and extension. This is possible without additional effort, or is unproblematic, for example when a data-reduced representation is involved, i.e. a bit stream that transmits the downmix signals. A field for the extension information can then be inserted into this bit stream.
A second conceivable solution is not to couple the multi-channel additional information to the audio coding system used. In this case, the multi-channel extension data is not coupled into the actual audio data stream. Instead, transmission takes place via a separate additional channel that is not necessarily synchronized in time, which may be, for example, a parallel digital additional channel. Such a situation arises, for example, when the downmix data, i.e. the audio signal, is routed in unreduced form, e.g. as PCM data in the AES/EBU data format, through a conventional audio distribution infrastructure as found in studios. These infrastructures are designed to distribute audio signals digitally between various sources ("crossbars") and/or to process them, for example by means of tone control, dynamic compression, etc.
In the second conceivable solution described above, the problem of a time offset between the downmix audio signal and the multi-channel additional information can arise in the receiver, since the two signals pass through different, non-synchronized data paths. A time offset between downmix signal and additional information, however, degrades the sound quality of the reconstructed multi-channel signal, because on the playback side an audio signal is then processed with multi-channel extension data that does not actually belong to the current audio signal, but to an earlier or later section or block of the audio signal.
Since the magnitude of the time shift can no longer be determined from the received audio signal and the additional information, a temporally correct reconstruction and assignment of the multi-channel signal in the receiver is not guaranteed, which leads to losses in quality.
Another example of this situation arises when an already operating two-channel transmission system is to be extended to multi-channel transmission, for example in a receiver for digital radio. Here it is often the case that the downmix signal is decoded by an audio decoder already present in the receiver, for example a stereo audio decoder according to the MPEG-4 standard. The delay of this audio decoder is not always known, or cannot always be predicted with certainty, owing to the data compression of audio signals inherent in the system. The delay of such an audio decoder therefore cannot be reliably compensated.
In an extreme case, the audio signal may even reach the multi-channel audio decoder via a transmission chain that contains analog parts. Here, a digital-to-analog conversion is performed at some point in the transmission, which, after further storage/transmission, is followed by an analog-to-digital conversion. Here too, no indication is initially available as to how a suitable delay compensation of the downmix signal relative to the multi-channel additional data can be performed. If the sampling frequencies of the analog-to-digital conversion and the digital-to-analog conversion differ slightly, the necessary compensation delay even drifts slowly over time, in accordance with the ratio of the two sampling rates.
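To give a feel for the size of such a drift, the following minimal sketch estimates how far the compensation delay moves when the two converters run at slightly different rates. The simple linear model, the function names and the example rates are assumptions made for illustration only; they are not taken from the description.

```python
def drift_samples(elapsed_s: float, dac_rate_hz: float, adc_rate_hz: float) -> float:
    """Sample-count mismatch accumulated after elapsed_s seconds.

    Assumed model: the D/A converter consumes dac_rate_hz samples per second
    while the A/D converter produces adc_rate_hz samples per second.
    """
    return elapsed_s * (adc_rate_hz - dac_rate_hz)


def drift_seconds(elapsed_s: float, dac_rate_hz: float, adc_rate_hz: float) -> float:
    """The same mismatch expressed as a time offset at the D/A rate."""
    return elapsed_s * (adc_rate_hz / dac_rate_hz - 1.0)


# Hypothetical example: converters that differ by 1 Hz at a nominal 48 kHz.
one_hour = 3600.0
print(drift_samples(one_hour, 48000.0, 48001.0))
print(drift_seconds(one_hour, 48000.0, 48001.0))
```

Under these assumed numbers the offset grows by one sample per second, which is why a delay determined once cannot simply be kept fixed.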
The German patent
The international publication
Generally speaking, fingerprint technologies must be characteristic of an audio signal. On the other hand, they should also be a strongly compressed representation of the audio signal. This means that the fingerprint must take up considerably less storage space than the audio signal itself, since otherwise generating and using a fingerprint would make no sense.
Furthermore, a fingerprint should reflect the temporal course of an audio signal in order to be suitable for synchronization purposes on the one hand and for identification purposes on the other. Particularly with regard to identification or characterization purposes, the situation often arises that an audio signal, such as a radio broadcast, does not play an audio piece in full, but begins transmitting at some point within the piece and possibly even stops transmitting before the piece has ended. The fingerprint does not, however, have to be decompressible, since fingerprint generation can be regarded as a particularly lossy compression.
Since fingerprint information is additional information, it should, as stated, be a representation that is as compressed as possible yet still characteristic. A further argument for a compressed representation is that the more compressed the representation, the faster and more manageable any correlations become, i.e. calculation methods involving a fingerprint, e.g. for synchronizing or characterizing an audio signal.
The object of the present invention is to provide an efficient fingerprint concept.
This object is achieved by a device or a method according to one of claims 1-13 or by a computer program according to claim 14.
The present invention is based on the finding that a well-compressing fingerprint is obtained by block processing of an audio signal, i.e. by deriving one fingerprint value per block of the audio signal. It has further been found that the course of this fingerprint value from block to block is particularly characteristic of the audio signal. Therefore, in the manner of differential coding, successive fingerprint values for successive blocks are compared, and only the change is then characterized in binary form. If the first fingerprint value is greater than the second fingerprint value, a first binary value is assigned, whereas if the second fingerprint value is greater than the first fingerprint value, a different, second binary value is assigned. This sequence of binary values is output as the fingerprint for the audio signal. Preferably, this change is quantized with only a single bit. With this 1-bit quantization, only a single bit of fingerprint information is supplied per block of the audio signal, and the audio signal is represented by a simple bit sequence with which a fast, efficient and surprisingly accurate correlation with a corresponding test bit sequence can be performed.
Audio signals have the property that their characteristics do not change very strongly from block to block, so that a full quantization of the fingerprint value, e.g. with 8 or 16 bits, is not strictly necessary. Audio signals also have the property that a change of the fingerprint value from one block to the next is highly indicative of the audio signal. The preferred 1-bit quantization strongly emphasizes this change from one block to the next. In particular, audio signals have the property that the fingerprint value does not change particularly strongly from one block to the next. This change, small as it is, nevertheless carries the characterization information for the audio signal that is needed especially for fingerprint processing, and it is exploited effectively by the inventive 1-bit quantization.
Particularly when the fingerprint value is an energy-dependent or power-dependent value, changes from one block to the next are relatively small; however, especially when blocks of fewer than 5,000 samples, in particular fewer than 2,000 samples, and more than 500 samples are formed, the change of the energy-dependent or power-dependent value from one block to the next is particularly characteristic of the audio signal.
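The block-wise calculation and 1-bit quantization described above can be illustrated with a minimal sketch. It assumes that the fingerprint value is the sum of squared samples per block, that a 1 is assigned when the earlier value is greater and a 0 otherwise, and that equal values also map to 0. The text does not fix which binary value goes with which case, so these choices and all function names are illustrative only.

```python
from typing import List, Sequence


def block_energies(samples: Sequence[float], block_size: int) -> List[float]:
    """One energy-dependent fingerprint value (sum of squares) per complete block."""
    n_blocks = len(samples) // block_size
    return [
        sum(x * x for x in samples[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]


def one_bit_fingerprint(values: Sequence[float]) -> List[int]:
    """Compare each value with its successor and keep only the direction of change.

    Assumption: 1 if the earlier value is greater, else 0 (ties map to 0).
    """
    return [1 if prev > nxt else 0 for prev, nxt in zip(values, values[1:])]


# Tiny hypothetical signal with four blocks of four samples each.
samples = [1.0] * 4 + [2.0] * 4 + [0.5] * 4 + [3.0] * 4
energies = block_energies(samples, block_size=4)
bits = one_bit_fingerprint(energies)
print(energies, bits)
```

In this sketch N blocks yield N - 1 comparison bits; for real audio a block size between 500 and 2,000 samples, as suggested in the text, would replace the toy value of 4.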
The inventive fingerprint can be used particularly advantageously for synchronizing multi-channel extension data with an audio signal, with synchronization being achieved efficiently and reliably by means of a block-based fingerprint technology.
It has been found that fingerprints calculated block by block are a good and efficient characteristic of an audio signal. However, to bring the synchronization to a resolution finer than one block duration, it is preferred to provide the audio signal with block division information that can be detected during synchronization and used for the fingerprint calculation.
The audio signal preferably comprises block division information that can be used at the time of synchronization. This ensures that the fingerprints derived from the audio signal during synchronization are based on the same block division, or block raster, as the fingerprints of the audio signal that are associated with the multi-channel extension data. In particular, the multi-channel extension data comprises a sequence of reference audio signal fingerprint information. This reference audio signal fingerprint information provides an association, contained in the multi-channel extension stream, between a block of multi-channel extension data and the section or block of the audio signal to which the multi-channel extension data belongs.
For synchronization, the reference audio signal fingerprints are extracted from the multi-channel extension data and correlated with the test audio signal fingerprints calculated by the synchronizer. The correlator only has to achieve a block-level correlation since, owing to the use of the block division information, the block raster underlying the two sequences of fingerprints is already identical.
Thus, in this embodiment, a nearly sample-accurate synchronization of the multi-channel extension data with the audio signal can be achieved, even though only fingerprint sequences at block level have to be correlated.
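A block-level correlation of two 1-bit fingerprint sequences can be sketched as follows. The scoring rule (+1 for agreeing bits, -1 for disagreeing bits), the search range and the function name are assumptions made for illustration; the description does not prescribe a particular correlation measure.

```python
from typing import Sequence, Tuple


def best_block_offset(
    reference: Sequence[int], test: Sequence[int], max_offset: int
) -> Tuple[int, int]:
    """Return (offset, score) such that test[i] best matches reference[i + offset].

    Assumed score: +1 for each agreeing bit pair, -1 for each disagreeing pair,
    counted only where both sequences overlap.
    """
    best_offset, best_score = 0, -len(test) - 1
    for offset in range(-max_offset, max_offset + 1):
        score = 0
        for i, bit in enumerate(test):
            j = i + offset
            if 0 <= j < len(reference):
                score += 1 if reference[j] == bit else -1
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical reference fingerprint and a test excerpt that starts 3 blocks later.
reference = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1]
test = reference[3:11]
print(best_block_offset(reference, test, max_offset=5))
```

Because both sequences share the same block raster, the offset found in blocks can be converted into a delay in samples by multiplying it by the block length.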
The block division information contained in the audio signal can be given as explicit side information, e.g. in a header of the audio signal. Alternatively, if a digital but uncompressed transmission is present, this block division information can also be contained in a sample, e.g. the sample that was the first sample of a block formed to calculate the reference audio signal fingerprints contained in the multi-channel extension data. Alternatively or additionally, the block division information can also be introduced directly into the audio signal itself, e.g. by means of watermark embedding. A pseudo-noise sequence is particularly suitable for this purpose, but other kinds of watermark embedding can also be used to introduce block division information into the audio signal. An advantage of this watermark implementation is that arbitrary analog-to-digital or digital-to-analog conversions are uncritical as well. Furthermore, there are watermarks robust to data compression that will survive compression/decompression or even tandem coding stages and can be used as reliable block division information for synchronization purposes.
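How a pseudo-noise sequence can carry the block raster may be illustrated with a deliberately simplified sketch. It adds the same +/-1 sequence at every block start and recovers the raster by correlating the received signal with that sequence at every candidate shift. The additive embedding, the fixed strength, the seeds and all function names are assumptions for illustration; a practical watermark would additionally use psychoacoustic masking, which is omitted here.

```python
import random
from typing import List, Sequence


def pn_sequence(length: int, seed: int) -> List[float]:
    """Pseudo-noise sequence of +/-1 chips, reproducible from a shared seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed_block_marker(signal: Sequence[float], pn: Sequence[float],
                       first_block_start: int, strength: float) -> List[float]:
    """Add the scaled PN sequence at every block start (block size = len(pn))."""
    out = list(signal)
    start = first_block_start
    while start + len(pn) <= len(out):
        for i, chip in enumerate(pn):
            out[start + i] += strength * chip
        start += len(pn)
    return out


def detect_block_start(signal: Sequence[float], pn: Sequence[float]) -> int:
    """Return the shift (0 .. block size - 1) with the highest summed correlation."""
    block_size = len(pn)
    n_blocks = len(signal) // block_size - 1  # keeps every index inside the signal
    best_shift, best_score = 0, float("-inf")
    for shift in range(block_size):
        score = 0.0
        for k in range(n_blocks):
            base = shift + k * block_size
            score += sum(signal[base + i] * pn[i] for i in range(block_size))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Hypothetical host signal (Gaussian noise) with the block raster starting at sample 37.
rng = random.Random(1)
host = [rng.gauss(0.0, 1.0) for _ in range(17 * 256)]
pn = pn_sequence(256, seed=7)
marked = embed_block_marker(host, pn, first_block_start=37, strength=0.2)
print(detect_block_start(marked, pn))
```

Summing the correlation over several blocks is what lets a comparatively weak marker stand out against the host signal in this toy setup.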
Moreover, it is preferred to embed the reference audio signal fingerprint information directly, block by block, in the data stream of the multi-channel extension data. In this embodiment, a suitable time offset is found using a fingerprint whose data is not stored separately from the multi-channel extension data. Instead, the fingerprint for each block of the multi-channel extension data is embedded in that block itself. Alternatively, the reference audio signal fingerprint information can be associated with the multi-channel extension data but originate from a separate source.
Preferred embodiments of the present invention are explained in detail below with reference to the accompanying drawings, in which:
- Fig. 1 shows a block diagram of a device for processing the audio signal to provide a synchronizable output signal with multi-channel extension data, according to an embodiment of the invention;
- Fig. 2 shows a detailed representation of the fingerprint calculator of Fig. 1;
- Fig. 3a shows a block diagram of a device for synchronizing according to an embodiment of the invention;
- Fig. 3b shows a more detailed representation of the compensator of Fig. 3a;
- Fig. 4a shows a schematic representation of an audio signal with block division information;
- Fig. 4b shows a schematic representation of multi-channel extension data with fingerprints embedded block by block;
- Fig. 5 shows a schematic representation of a watermark embedder for generating an audio signal with a watermark;
- Fig. 6 shows a schematic representation of a watermark extractor for extracting the block division information;
- Fig. 7 shows a schematic representation of a result diagram as it appears after a correlation over, e.g., 30 blocks of the test block division;
- Fig. 8 shows a flowchart illustrating various fingerprint calculation options;
- Fig. 9 shows a multi-channel encoder scenario with an inventive device for processing;
- Fig. 10 shows a multi-channel decoder scenario with an inventive synchronizer;
- Fig. 11a shows a more detailed representation of the multi-channel extension data calculator of Fig. 9; and
- Fig. 11b shows a more detailed representation of a block with multi-channel extension data as can be generated by the arrangement shown in Fig. 11a.
If, however, the fingerprint calculator 104 receives an audio signal 102 without block division information, the fingerprint calculator 104 selects an arbitrary block division and performs a very first block division. This block division is signaled via block division information 110 to a block division information embedder 112, which is designed to embed the block division information 110 into the audio signal 102 without block division information. On the output side, the block division information embedder thus supplies an audio signal 114 with block division information, which can be output via an output interface 116 or, independently of output via the output interface 116, stored separately or output via another path, as shown schematically at 118.
The fingerprint calculator 104 is designed to calculate a sequence of reference audio signal fingerprint information 120. This sequence of reference audio signal fingerprint information is fed to a fingerprint information embedder 122. The fingerprint information embedder embeds the reference audio signal fingerprint information 120 into multi-channel extension data 124, which can be provided separately or calculated directly by a multi-channel extension data calculator 126 that receives a multi-channel audio signal 128 on its input side. On the output side, the fingerprint information embedder 122 thus supplies multi-channel extension data with associated reference audio signal fingerprint information, this data being designated 130. The fingerprint information embedder 122 is designed to embed the reference audio signal fingerprint information directly into the multi-channel extension data, at block level as it were. Alternatively or additionally, the fingerprint information embedder 122 will also store or provide the sequence of reference audio signal fingerprint information on the basis of its association with a block of multi-channel extension data, this block of multi-channel extension data together with a block of the audio signal representing the best possible approximation of a multi-channel audio signal, or of the multi-channel audio signal 128.
The output interface 116 is configured to output an output signal 132 that comprises the sequence of reference audio signal fingerprint information and the multichannel extension data in unique association, for example within an embedded data stream. Alternatively, the output signal may be a sequence of blocks of multichannel extension data without reference audio signal fingerprint information. The fingerprint information is then supplied in a separate sequence of fingerprint information, each fingerprint being "linked" to a block of multichannel extension data, for example via a consecutive block number. Alternative associations of fingerprint data with blocks, such as implicit signaling via an order, can also be used.
The output signal 132 may further comprise an audio signal with block division information. In special applications, such as broadcasting, however, the audio signal with block division information will take a separate path 118.
Unabhängig von der Verwendung von Blockeinteilungsinformationen wird auch ein besonders guter, charakteristischer und effizienter Fingerabdruck durch eine Vorrichtung zur Berechnung eines Fingerabdrucks eines Audiosignals, wie sie z.B. in
Der Fingerabdruck-Korrelator 312 von
Der Fingerabdrucknachverarbeiter 104c von
Schließlich umfasst die erfindungsgemäße Vorrichtung zum Berechnen eines Fingerabdrucks noch eine Einrichtung zum Ausgeben von Informationen über eine Folge von binären Werten als Fingerabdruck für das Audiosignal, wobei die Einrichtung beispielsweise in Form der Ausgangsschnittstelle 116 von
Vorzugsweise sind die beiden binären Werte, also der erste binäre Wert und der zweite unterschiedliche binäre Wert komplementär zueinander. Bei dem in
The sequence of bits as generated by block 814 is then the test fingerprint or the reference fingerprint.
Die Blockeinteilungseinrichtung 104a von
Der erfindungsgemäße Fingerabdruck kann vorzugsweise zum Synchronisieren verwendet werden, wie es anhand von
Bei einem Ausführungsbeispiel der vorliegenden Erfindung ist das Audiosignal mit einem Wasserzeichen versehen, wie es in
Zur Wasserzeicheneinbettung wird, wie es in
It should be noted that many different watermark embedding strategies exist. For example, the spectral weighting 510 may be performed by a dual operation in the time domain, so that a time/frequency conversion 506 is not necessary.
Furthermore, the spectrally weighted watermark could also be transformed into the time domain before it is combined with the audio signal, so that the combination 512 would take place in the time domain; in this case a time/frequency conversion 504 would not necessarily be required, provided the masking threshold can be calculated without a transform. Of course, the masking threshold may also be calculated independently of the audio signal or of a transform length of the audio signal.
Preferably, the length of the known pseudo-noise sequence is equal to the length of a block. A correlation for watermark extraction then works particularly efficiently and clearly. However, longer pseudo-noise sequences may also be used, as long as a period of the pseudo-noise sequence is equal to or greater than the block length. Furthermore, a watermark may be used that does not have a white spectrum but is designed, for example, to have spectral components only in certain frequency bands, such as the lower spectral band or a middle spectral band. This makes it possible to ensure that the watermark is not introduced, for example, only into the upper bands, which are eliminated or parameterized in a data-rate-saving transmission, for example by a "Spectral Band Replication" technique as known from the MPEG-4 standard.
Alternativ zur Verwendung eines Wasserzeichens kann auch eine Blockeinteilung vorgenommen werden, wenn z. B. ein digitaler Kanal existiert, bei dem jeder Block des Audiosignals von
Um das Szenario der Berechnung der Mehrkanalerweiterungsdaten zu veranschaulichen, wird nachfolgend auf
Die vom Parameterberechner 914 berechneten Parameterdaten werden einem Datenstromformatierer 916 zugeführt, der gleich dem Fingerabdruckinformationseinbetter 122 von
Daraus ergibt sich dann allgemein gesagt ein Datenstrom mit Mehrkanalerweiterungsdaten, wie er in
The audio signal with the block division information is fed to a block detector 300, which is configured to detect the block division information in the audio signal and to supply the detected block division information 302 to a fingerprint calculator 304. The fingerprint calculator 304 also receives the audio signal; an audio signal without block division information would be sufficient here, although the fingerprint calculator may also be configured to use the audio signal with block division information for the fingerprint calculation.
The fingerprint calculator 304 now calculates one fingerprint per block of the audio signal for a plurality of consecutive blocks in order to obtain a sequence of test audio signal fingerprints 306. In particular, the fingerprint calculator 304 is configured to use the block division information 302 in calculating the sequence of test audio signal fingerprints 306.
The synchronization device and the synchronization method according to the invention are further based on a fingerprint extractor 308 for extracting a sequence of reference audio signal fingerprints 310 from the reference audio signal fingerprint information 120 supplied to the fingerprint extractor 308.
Both the sequence of test fingerprints 306 and the sequence of reference fingerprints 310 are fed to a fingerprint correlator 312, which is configured to correlate the two sequences. Depending on a correlation result 314, in which an offset value is obtained that is an integer multiple (x) of the block length (ΔD), an equalizer 316 is controlled to reduce or, in the best case, eliminate a temporal offset between the multichannel extension data 132 and the audio signal 114. At the output of the equalizer 316, both the audio signal and the multichannel extension data are thus output in synchronized form in order to be fed to a multichannel reconstruction, as described with reference to
In
With regard to the implementation of the equalizer 316, it should be noted that two variable delays may also be used, so that the correlation result 314 controls both variable delay stages. Alternative implementations within an equalizer for synchronization purposes, in order to eliminate temporal offsets, may also be used.
Nachfolgend wird bezugnehmend auf
Bei dem in
For block formation in block 600, a test block division is specified, which need not correspond to the final block division. Instead, the correlator 608 now performs a correlation over several blocks, for example over twenty or even more blocks. In the correlator 608, the spectrum of the known noise sequence is correlated with the spectrum of each block at different delay values, so that after several blocks a correlation result 610 is obtained which could look, for example, as shown in
Im Hinblick auf den beispielhaften Wasserzeichen-Extraktor in
In a preferred embodiment of the present invention, a specific procedure on the transmitter side and the receiver side is thus preferred for solving the association problem. On the transmitter side, time-varying and suitable fingerprint information can be calculated from the corresponding (mono or stereo) downmix audio signal. These fingerprints can also be inserted regularly into the transmitted multichannel additional data stream as a synchronization aid. This can be done as a data field within the block-wise organized spatial audio coding side information, or in such a way that the fingerprint signal is sent as the first or last information of the data block so that it can easily be added or removed. Furthermore, a watermark, such as a known noise sequence, can be embedded into the audio signal to be transmitted. This helps the receiver determine the frame phase and eliminate an offset within a frame.
On the receiver side, a two-stage synchronization is preferred. In a first stage, the watermark is extracted from the received audio signal and the position of the noise sequence is determined. The frame boundaries can then be determined from the position of the noise sequence, and the audio data stream can be divided accordingly. Within these frame or block boundaries, the characteristic audio features, i.e. the fingerprints, can be calculated over almost the same sections as were used in the transmitter, which improves the quality of the result in a later correlation. In a second stage, time-varying and suitable fingerprint information is then calculated from the corresponding stereo or mono audio signal or, generally speaking, from the downmix signal; the downmix signal may also have more than two channels, as long as it has fewer channels than the original audio signal before the downmix has channels or, more generally, audio objects.
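The first stage described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the correlation is done directly in the time domain rather than on spectra, the watermark is added at full amplitude without psychoacoustic weighting, and the names `find_block_phase`, `pn` and `true_phase` are hypothetical.

```python
import random

def find_block_phase(signal, pn, num_blocks):
    """Return the lag (0 .. len(pn)-1) at which the known sequence lines up best.

    The correlation is accumulated over several consecutive blocks so that
    the audio content averages out and the embedded sequence dominates.
    """
    block_len = len(pn)
    best_lag, best_score = 0, float("-inf")
    for lag in range(block_len):
        score = 0.0
        for b in range(num_blocks):
            start = lag + b * block_len
            block = signal[start:start + block_len]
            score += sum(x * p for x, p in zip(block, pn))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical test data: random "audio" with the known sequence repeated
# once per block, the block grid starting at sample index true_phase.
rng = random.Random(0)
block_len, num_blocks, true_phase = 64, 20, 17
pn = [rng.choice((-1.0, 1.0)) for _ in range(block_len)]
n = (num_blocks + 2) * block_len
audio = [rng.uniform(-1.0, 1.0) for _ in range(n)]
signal = [audio[i] + pn[(i - true_phase) % block_len] for i in range(n)]

phase = find_block_phase(signal, pn, num_blocks)
```

The lag with the largest accumulated correlation is taken as the frame phase; the fingerprints of the second stage would then be calculated on blocks starting at that lag.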
Furthermore, the fingerprints can be extracted from the multichannel additional information, and a temporal offset between the multichannel additional information and the received signal can be determined using suitable, known correlation methods. The total temporal offset is made up of the frame phase and the offset between the multichannel additional information and the received audio signal. The audio signal and the multichannel additional information can then be synchronized for subsequent multichannel decoding by a downstream, actively controlled delay compensation stage.
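As a small worked illustration of how the two contributions combine, the following Python sketch assumes that the fingerprint correlation yields the offset as a whole number of blocks and that the frame phase is given in samples. The function name and the example values are hypothetical.

```python
def total_offset_samples(frame_phase, block_offset, block_len):
    """Total offset in samples: intra-frame phase plus whole blocks."""
    return frame_phase + block_offset * block_len

# Example: a frame phase of 17 samples and an offset of 3 blocks of
# 1152 samples each give 17 + 3 * 1152 samples in total.
offset = total_offset_samples(17, 3, 1152)
```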
To obtain the multichannel additional data, the multichannel audio signal is divided, for example, into blocks of fixed size. A noise sequence that is also known to the receiver is embedded into each block or, more generally, a watermark is embedded. In the same raster, block by block, a fingerprint is calculated simultaneously with, or at least synchronized to, the derivation of the multichannel additional data; this fingerprint is suited to characterizing the temporal structure of the signal as unambiguously as possible.
One embodiment is to use the energy content of the current downmix audio signal of the audio block, for example in logarithmic form, i.e. in a decibel-related representation. In this case the fingerprint is a measure of the temporal envelope of the audio signal. To reduce the amount of information to be transmitted and to increase the accuracy of the measured value, this synchronization information can also be expressed as a difference from the energy value of the preceding block, followed by suitable entropy coding, such as Huffman coding, adaptive scaling and quantization.
Nachfolgend wird bezugnehmend auf
Nach einer Blockeinteilung in einem Blockeinteilungsschritt 800 liegt das Audiosignal in aufeinander folgenden Blöcken vor. Hierauf wird eine Fingerabdruckwertberechnung gemäß Block 104b von
In particular, the signal value s_left(i) with index i stands for a temporal sample of a left channel of the audio signal, and s_right(i) stands for the i-th sample of a right channel of the audio signal. In the embodiment shown, the block length is 1152 audio samples, which is why the 1153 audio samples (including the sample for i = 0) of both the left and the right downmix channel are each squared and summed. If the audio signal is monophonic, the summation over channels is omitted. If the audio signal has, for example, three channels, the squared samples of the three channels are summed. It is further preferred to remove the (non-meaningful) DC components of the downmix audio signals before the calculation.
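The energy calculation described above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: removing the DC component by subtracting the per-block mean is an assumption (the text only states that DC components are preferably removed), and the function names are hypothetical.

```python
def remove_dc(channel):
    """Subtract the mean of the block (assumed way of removing the DC part)."""
    mean = sum(channel) / len(channel)
    return [s - mean for s in channel]

def block_energy(channels):
    """Sum of squared samples of one block, accumulated over all channels.

    `channels` is a list with one sample list per channel: one entry for
    mono, two for stereo, three for a three-channel signal, and so on.
    """
    return sum(s * s for ch in channels for s in remove_dc(ch))

# Stereo example with four samples per channel: 4 * 1 + 4 * 4 = 20
energy = block_energy([[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]])
```

A block that contains only a constant offset yields zero energy after DC removal, which is one reason a minimum energy offset is useful before taking a logarithm.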
In a step 804, a minimum limitation of the energy is now preferably applied for the purpose of a subsequent logarithmic representation. For a decibel-related evaluation of the energy, a minimum energy offset E_offset is applied so that a meaningful logarithmic calculation results in the case of zero energy. This energy measure in dB covers a numerical range from 0 to 90 (dB) at an audio signal resolution of 16 bits. In a block 804, the following equation is thus implemented:
Vorzugsweise wird für eine exakte Bestimmung des zeitlichen Versatzes zwischen den Mehrkanalzusatzinformationen und dem empfangenen Audiosignal nicht der absolute Energie-Höhekurvenwert verwendet, sondern vielmehr die Steigung beziehungsweise Steilheit der Signalhüllkurve. Hierbei wird für die Korrelationsmessung in dem Fingerabdruck-Korrelator 312 von
E_db(diff) is the difference between the energy values of two preceding blocks in a dB representation, while E_db is the energy in dB of the current block or of the preceding block, as is self-explanatory from the above equation. This differencing of the energies is performed in a step 806.
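The logarithmic representation and the differencing of step 806 can be sketched in Python as follows. The exact equations are not reproduced in this text, so the form 10·log10(E + E_offset) and the offset value of 1.0 are assumptions chosen so that zero energy maps to 0 dB; the function names are hypothetical.

```python
import math

E_OFFSET = 1.0  # assumed minimum energy offset; zero energy then gives 0 dB

def energy_db(energy, offset=E_OFFSET):
    """Decibel-related energy measure with a minimum offset (assumed form)."""
    return 10.0 * math.log10(energy + offset)

def db_differences(energies):
    """Difference between the dB energy of each block and its predecessor."""
    db = [energy_db(e) for e in energies]
    return [cur - prev for prev, cur in zip(db, db[1:])]

# Block energies 9, 99, 9 become 10 dB, 20 dB, 10 dB with the offset of 1,
# so the differences are +10 dB and -10 dB.
diffs = db_differences([9.0, 99.0, 9.0])
```

The differences describe the slope of the envelope rather than its absolute level, which is the quantity the text prefers for the correlation.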
Es sei darauf hingewiesen, dass dieser Schritt z. B. nur im Encoder, also im Fingerabdruck-Berechner 104 von
Alternativ kann der Schritt 806 der Differenzbildung auch rein decodiererseitig implementiert werden, also in dem Fingerabdruck-Berechner 304 von
Während die Blöcke 802, 804, 806 zur Fingerabdruckwertberechnung gemäß 104b von
When the energy (envelope of the signal) is scaled for optimal level control according to block 808, it is ensured that the subsequent quantization of this fingerprint both makes maximum use of the number range and improves the resolution at low energy values. For this purpose, an additional scaling or gain is introduced. It can be realized either as a fixed or static weighting quantity or via a dynamic gain control adapted to the envelope signal. Combinations of a static weighting quantity and an adapted dynamic gain control can also be used. In particular, the following equation is applied:
E_scaled represents the scaled energy. E_db(diff) represents the difference energy in dB calculated by the differencing in block 806, and A_gain is the gain factor, which may depend on the time t, in particular when a dynamic gain control is used. The gain factor will depend on the envelope signal in such a way that the gain factor becomes smaller as the envelope becomes larger and larger as the envelope becomes smaller, in order to make use of the available number range as uniformly as possible. In particular, the gain factor can be reproduced in the fingerprint calculator 304 by measuring the energy of the transmitted audio signal, so that the gain factor need not be transmitted explicitly.
In a block 810, the fingerprint calculated by block 808 is quantized. This is done to prepare the fingerprint for insertion into the multichannel additional information. This reduced fingerprint resolution has proven to be a good compromise between bit requirement and reliability of the delay detection. In particular, overflows of > 255 can be limited to the maximum value of 255 with a saturation characteristic, as can be expressed, for example, by the following equation:
E_quantized is the quantized energy value and represents a quantization index of 8 bits. Q_8bit is the quantization operation that assigns the quantization index for the maximum value 255 to any value > 255. It should be noted that finer quantizations with more than 8 bits or coarser quantizations with fewer than 8 bits may also be used; coarser quantization reduces the additional bit requirement, while finer quantization with more bits increases both the bit overhead and the accuracy.
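Scaling and saturating 8-bit quantization can be sketched in Python as follows. The equations themselves are not reproduced in this text, so several points are assumptions: the scaling is taken to be a plain multiplication by the gain factor, rounding to the nearest integer is used as the quantizer, and values below 0 are clamped to 0. Only the saturation of values above 255 to 255 is stated in the text.

```python
def scale(e_db_diff, gain):
    """Scaled value, assuming the gain is applied as a simple factor."""
    return e_db_diff * gain

def quantize_8bit(value):
    """Quantization index in 0..255 with a saturation characteristic."""
    index = int(round(value))
    # Overflows above 255 saturate at 255; clamping at 0 is an assumption.
    return max(0, min(255, index))

index = quantize_8bit(scale(10.0, 2.5))   # 10 dB difference, gain 2.5
saturated = quantize_8bit(300.7)          # overflow is limited to 255
```

With a dynamic gain control, `gain` would vary over time; a receiver that derives the same gain from the transmitted audio signal would not need it to be transmitted.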
In a block 812, the fingerprint can then be entropy-coded. By evaluating statistical properties of the fingerprint, the bit requirement for the quantized fingerprint can be reduced further. A suitable entropy coding method is, for example, Huffman coding. Statistically different frequencies of fingerprint values can be expressed by different code lengths, which on average reduces the bit requirement of the fingerprint representation.
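How different frequencies of fingerprint values lead to different code lengths can be illustrated with a short Python sketch of the standard Huffman construction. It is a generic textbook version, not the coder of the embodiment; the function name and the example values are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return a dict mapping each symbol to its Huffman code length in bits."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (count, tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees puts every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

# Sixteen quantized fingerprint values; the value 0 is by far the most frequent.
values = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(values)
total_bits = sum(lengths[v] for v in values)
```

In this example the frequent value receives a 1-bit code and the rare values 3-bit codes, so the sixteen values need 28 bits instead of the 32 bits of a fixed 2-bit code.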
The result of the entropy coding block 812 is then written into the extension channel data stream, as shown at 813. Alternatively, non-entropy-coded fingerprints can be written into the bit stream as quantized values, as shown at 811.
As an alternative to the energy calculation per block in step 802, a different fingerprint value can be calculated, as shown in block 818.
As an alternative to the energy of a block, the crest factor of the power spectral density (PSD crest) can be calculated. The crest factor is generally calculated as the quotient of the maximum value X_Max of the signal in a block and the arithmetic mean of the signal values X_n (e.g. spectral values) in the block, as illustrated by way of example in the following equation, in which N denotes the number of values in the block:

$$\mathrm{Crest} = \frac{X_{Max}}{\frac{1}{N}\sum_{n=1}^{N} X_n}$$
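The crest factor as defined in words above, i.e. the maximum value divided by the arithmetic mean of the values in a block, can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes non-negative values with a non-zero mean, as is the case for power spectral density values of a non-silent block.

```python
def crest_factor(values):
    """Maximum value of the block divided by the arithmetic mean of the block."""
    mean = sum(values) / len(values)
    return max(values) / mean

flat = crest_factor([2.0, 2.0, 2.0, 2.0])     # flat spectrum
peaky = crest_factor([1.0, 1.0, 1.0, 5.0])    # one dominant spectral value
```

A flat block gives a crest factor of 1, while a block dominated by a single value gives a larger crest factor.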
Um eine robustere Synchronisierung zu erreichen, kann ferner ein weiteres Verfahren eingesetzt werden. Anstelle des Nachverarbeitens mittels den Blöcken 808, 810, 812 kann als alternative Fingerabdrucknachverarbeitung 104c (
The 1-bit quantization preferred according to the invention considerably simplifies the correlation calculation in the fingerprint correlator 312. Because the test fingerprint and the reference fingerprint are bit sequences, the correlation can be reduced to a simple XOR operation followed by summation of the bitwise XOR results. Thus, if the sequence of test audio signal fingerprint values and the sequence of reference audio signal fingerprint values are each a sequence of 1-bit values, each bit standing for one block of audio samples, the fingerprint correlator 312 of
Furthermore, the fingerprint correlator 312 is configured to combine a bit sequence of the sequence of test audio signal fingerprints or reference audio signal fingerprints, shifted by a shift value, with the respective other sequence, again by a bitwise XOR operation, and to sum the resulting bits, whereby a second correlation value is obtained. For the shift value that yielded the maximum correlation value, it can then be established that the test fingerprint and the reference fingerprint matched. This shift value therefore represents the correlation result, since this particular shift value yielded the largest correlation value.
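The XOR-based correlation over several shift values can be sketched in Python as follows. One point is an assumption of this sketch: since a bitwise XOR yields 1 where two bits differ, the correlation value is taken here as the number of agreeing bits, i.e. the sequence length minus the XOR sum, so that the best match has the largest value. The function names and bit sequences are hypothetical.

```python
def xor_mismatches(test_bits, ref_bits, shift):
    """XOR sum of test_bits against the reference window starting at shift."""
    window = ref_bits[shift:shift + len(test_bits)]
    return sum(t ^ r for t, r in zip(test_bits, window))

def best_shift(test_bits, ref_bits):
    """Shift (in blocks) at which test and reference fingerprints agree best."""
    n = len(test_bits)
    shifts = range(len(ref_bits) - n + 1)
    # Correlation value = number of agreeing bits = n - XOR sum (assumption).
    return max(shifts, key=lambda s: n - xor_mismatches(test_bits, ref_bits, s))

ref = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]   # reference fingerprint bits
test = [0, 0, 1, 0, 1]                       # test bits, equal to ref[4:9]
shift = best_shift(test, ref)
```

Because each bit stands for one block, the resulting shift is an offset in whole blocks.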
In addition to improving the synchronization results, this quantization also affects the bandwidth required to transmit the fingerprint. Whereas at least 8 bits previously had to be used for the fingerprint in order to provide a sufficiently accurate value, a single bit suffices here. Since the fingerprint and its 1-bit counterpart are already determined in the transmitter, the difference is calculated more accurately, because the actual fingerprint is available at maximum resolution and even minimal changes between fingerprints can thus be taken into account both in the transmitter and in the receiver. It has also been found that most consecutive fingerprints differ only minimally. Quantization before differencing would, however, wipe out this difference.
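Why the difference should be formed before quantization can be illustrated with a minimal Python sketch. The assignment of the bit values is an assumption of this sketch: a bit is 1 when the fingerprint value rises from one block to the next and 0 otherwise. The function name and the example values are hypothetical.

```python
def one_bit_fingerprint(values):
    """One bit per block transition: 1 if the value rises, otherwise 0."""
    return [1 if cur > prev else 0 for prev, cur in zip(values, values[1:])]

# Full-resolution values: the small rise from 10.2 to 10.4 is still visible.
fine = one_bit_fingerprint([10.2, 10.4, 10.1])

# Coarsely quantized first: both small changes are lost before differencing.
coarse = one_bit_fingerprint([round(v) for v in [10.2, 10.4, 10.1]])
```

Here `fine` retains the small rise, whereas `coarse` sees three identical values and reports no rise at all.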
Depending on the implementation, and if block-wise accuracy is sufficient, the 1-bit quantization can also be used as a special fingerprint post-processing regardless of whether or not an audio signal with additional information is present, since 1-bit quantization based on differential coding is in itself a robust yet accurate fingerprint method that can also be used for purposes other than synchronization, for example for identification or classification.
Wie es anhand von
The preferred word-mark/fingerprint hybrid solution allows a synchronizer to detect a temporal offset between the downmix signal and the additional data and to perform a time-correct adjustment, i.e. a delay compensation between the audio signal and the multichannel extension data, to within the order of +/- one sample. The multichannel association can thus be reconstructed almost completely in the receiver, i.e. except for a barely perceptible time difference of a few samples, which does not noticeably affect the quality of the reconstructed multichannel audio signal.
The fingerprint according to the invention, as calculated, for example, by the fingerprint calculator 104 or the fingerprint calculator 304 with or without block division information, can be used to characterize a test audio signal. For this purpose, a device 104 or 304 is provided to obtain a sequence of test audio signal fingerprints from the test audio signal.
Furthermore, a correlator, such as the correlator 312, is provided to correlate the sequence of binary values with various reference fingerprints held in a reference database, the reference database containing, for each reference fingerprint, information about the audio signal associated with that reference fingerprint.
Based on these various correlations, i.e. on the correlation of the test audio signal fingerprint, as a 1-bit sequence, with the various reference fingerprints of the reference database, a statement about the test audio signal can then be made.
The information about the test audio signal is, for example, an identification of the audio signal, i.e. the title of the piece and, where applicable, its author, the CD or other sound carrier on which the piece can be found, and where it can be ordered. An alternative characterization of an audio signal is to identify a test audio signal, e.g., as belonging to a certain stylistic period or style, or as originating from a particular music group. Such a characterization can be made, for example, by determining not only qualitatively but also quantitatively how the reference fingerprint relates to the test fingerprint, i.e. what distance exists between the two. This comparison of the fingerprint sequences, or the calculation of the quantitative distance between them, can take place, e.g., once a correlation has been performed to eliminate the time offset between the reference fingerprint and the test fingerprint.
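One possible way to express such a quantitative distance is the fraction of differing bits after the time offset has been removed. The following Python sketch assumes this normalized Hamming distance as the measure; the text does not prescribe a specific distance, and the function name `aligned_distance` is hypothetical.

```python
from typing import Sequence


def aligned_distance(test: Sequence[int], ref: Sequence[int], offset: int) -> float:
    """Fraction of differing bits after compensating the given offset.

    test[i] is compared with ref[i + offset], where `offset` is assumed to be
    the result of a preceding correlation. 0.0 means all compared bits agree,
    1.0 means all compared bits differ.
    """
    pairs = [
        (bit, ref[i + offset])
        for i, bit in enumerate(test)
        if 0 <= i + offset < len(ref)
    ]
    if not pairs:
        raise ValueError("sequences do not overlap at this offset")
    # XOR yields 1 exactly where the two 1-bit values differ.
    return sum(a ^ b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    reference = [1, 0, 1, 1, 0, 0, 1, 0]
    print(aligned_distance([1, 1, 0, 0], reference, 2))
    print(aligned_distance([1, 0, 0, 1], reference, 2))
```

A small distance would suggest the same recording, while a moderate distance might be used as a hint for related material; where to set such thresholds is not specified in the text.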
Depending on the circumstances, the method according to the invention can be implemented in hardware or in software. The implementation can be on a digital storage medium, in particular a floppy disk, CD or DVD with electronically readable control signals that can cooperate with a programmable computer system such that the method is performed. In general, the invention thus also consists in a computer program product with a program code, stored on a machine-readable carrier, for performing the method according to the invention when the computer program product runs on a computer. In other words, the invention can be realized as a computer program with a program code for performing the method when the computer program runs on a computer.
Claims (14)
- An apparatus for synchronizing multichannel extension data (132) with an audio signal (114), wherein reference audio signal fingerprint information is associated with the multichannel extension data, comprising:
a fingerprint calculator (304) for calculating a fingerprint of the audio signal (114), comprising:
a means (104a) for dividing the audio signal into subsequent blocks of samples;
a means (104b) for calculating a first fingerprint value for a first block of the subsequent blocks and a second fingerprint value for a second block of the subsequent blocks;
a means for comparing (806) the first fingerprint value with the second fingerprint value;
a means for assigning (814) a first binary value when the first fingerprint value is higher than the second fingerprint value, or a second different binary value when the first fingerprint value is smaller than the second fingerprint value; and
a means (104c) for outputting information about a sequence of binary values as fingerprint for the audio signal;
a fingerprint extractor (308) for extracting a sequence of reference audio signal fingerprints from the reference audio signal fingerprint information associated with the multichannel extension data (132);
wherein the sequence of test audio signal fingerprints and the sequence of reference audio signal fingerprints are each a sequence of 1-bit values, wherein one bit each is associated with one block of audio samples,
a fingerprint correlator (312) for correlating the sequence of test audio signal fingerprints and the sequence of reference audio signal fingerprints, the fingerprint correlator (312) being implemented
to combine a bit sequence of the sequence of test audio signal fingerprints and a bit sequence of the reference audio signal fingerprints by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a first correlation value,
to further combine a bit sequence of the sequence of test audio signal fingerprints or the reference audio signal fingerprints shifted by an offset value with a respectively different sequence by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a second correlation value, and
to select that offset value as the correlation result for which the largest correlation value has resulted; and
a compensator (316) for reducing or eliminating a time offset between the multichannel extension data (132) and the audio signal based on the correlation result (314).
- The apparatus according to claim 1, wherein the means for assigning (814) is implemented to take a binary value that is complementary to the first binary value as a second different value.
- The apparatus according to claim 2, wherein the first binary value and the second binary value are exactly one bit.
- The apparatus according to claim 3, wherein the means for assigning (814) is implemented to assign a first bit value as first binary value and a second bit value complementary to the first value as second different value.
- The apparatus according to one of the previous claims, wherein the means (116) for outputting is implemented to output a sequence of bits as fingerprint.
- The apparatus according to one of the previous claims, wherein the means for comparing (806) is implemented to calculate a difference between the first fingerprint value and the second fingerprint value; and
wherein the means for assigning (814) is implemented to assign the first binary value when the difference is more than 0 and to assign the second binary value when the difference is less than 0.
- The apparatus according to one of the previous claims, wherein the means (104a) for dividing is implemented to provide adjacent or overlapping blocks as subsequent blocks.
- The apparatus according to one of the previous claims, wherein the means (104b) for calculating is implemented to calculate an energy or power-dependent amount of the block as first or second fingerprint value.
- The apparatus according to one of the previous claims, wherein the means (104b) for calculating is implemented to square and sum up time samples per block in order to obtain the first or second fingerprint value for the block.
- The apparatus according to one of claims 1 to 8, wherein the means (104b) for calculating is implemented to calculate a crest factor of a power spectrum of the block as first or second fingerprint value.
- An apparatus for characterizing a test audio signal, comprising:
a means for calculating a test fingerprint of the test audio signal (114), comprising:
a means (104a) for dividing the audio signal into subsequent blocks of samples;
a means (104b) for calculating a first fingerprint value for a first block of the subsequent blocks and a second fingerprint value for a second block of the subsequent blocks;
a means for comparing (806) the first fingerprint value with the second fingerprint value;
a means for assigning (814) a first binary value when the first fingerprint value is higher than the second fingerprint value, or a second different binary value when the first fingerprint value is smaller than the second fingerprint value; and
a means (104c) for outputting information about a sequence of binary values as fingerprint for the audio signal;
a means for correlating the information about the sequence of binary values with different reference fingerprints in a reference database, wherein the reference database comprises information about an audio signal for every reference fingerprint, which is associated to the reference fingerprint; and
wherein the sequence of test audio signal fingerprints and the sequence of reference audio signal fingerprints are each a sequence of 1-bit values, wherein one bit each is associated with one block of audio samples,
the means for correlating (312) being implemented
to combine a bit sequence of the sequence of test audio signal fingerprints and a bit sequence of the reference audio signal fingerprints by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a first correlation value,
to further combine a bit sequence of the sequence of test audio signal fingerprints or the reference audio signal fingerprints shifted by an offset value with a respectively different sequence by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a second correlation value, and
to select that offset value as the correlation result for which the largest correlation value has resulted,
a means for providing information about the test audio signal based on the correlation result.
- A method for synchronizing multichannel extension data (132) with an audio signal (114), wherein the multichannel extension data are associated with the reference audio signal fingerprint information, comprising:
calculating (304) a fingerprint of an audio signal, comprising;
dividing (104a) the audio signal into subsequent blocks of samples;
calculating (104b) a first fingerprint value for a first block of the subsequent blocks and a second fingerprint value for a second block of the subsequent blocks;
comparing (806) the first fingerprint value with the second fingerprint value;
assigning (814) a first binary value when the first fingerprint value is higher than the second fingerprint value, or a second different binary value when the first fingerprint value is smaller than the second fingerprint value; and
outputting (104c) information about a sequence of binary values as fingerprint for the audio signal;
extracting (308) a sequence of reference audio signal fingerprints from the reference audio signal fingerprint information associated with the multichannel extension data (132);
wherein the sequence of test audio signal fingerprints and the sequence of reference audio signal fingerprints are each a sequence of 1-bit values, wherein one bit each is associated with one block of audio samples,
correlating (312) the sequence of test audio signal fingerprints and the sequence of reference audio signal fingerprints, the correlating (312) comprising:
combining a bit sequence of the sequence of test audio signal fingerprints and a bit sequence of the reference audio signal fingerprints by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a first correlation value,
combining a bit sequence of the sequence of test audio signal fingerprints or the reference audio signal fingerprints shifted by an offset value with a respectively different sequence by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a second correlation value, and
selecting that offset value as the correlation result for which the largest correlation value has resulted; and
reducing (316) or eliminating a time offset between the multichannel extension data (132) and the audio signal based on the correlation result (314).
- A method for characterizing a test audio signal, comprising:
calculating a test fingerprint of an audio signal, comprising the steps of
dividing (104a) the audio signal into subsequent blocks of samples;
calculating (104b) a first fingerprint value for a first block of the subsequent blocks and a second fingerprint value for a second block of the subsequent blocks;
comparing (806) the first fingerprint value with the second fingerprint value;
assigning (814) a first binary value when the first fingerprint value is higher than the second fingerprint value, or a second different binary value when the first fingerprint value is smaller than the second fingerprint value; and
outputting (104c) information about a sequence of binary values as fingerprint for the audio signal, wherein a sequence of binary values is obtained as test fingerprint;
wherein the sequence of test audio signal fingerprints and the sequence of reference audio signal fingerprints are each a sequence of 1-bit values, wherein one bit each is associated with one block of audio samples, and
correlating the information about a sequence of binary values with different reference fingerprints in a reference database, wherein the reference database comprises, for every reference finger print, information about an audio signal associated with the reference fingerprint, the correlating (132) comprising:
combining a bit sequence of the sequence of test audio signal fingerprints and a bit sequence of the reference audio signal fingerprints by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a first correlation value,
combining a bit sequence of the sequence of test audio signal fingerprints or the reference audio signal fingerprints shifted by an offset value with a respectively different sequence by a bit-by-bit XOR operation, and to sum up obtained bit results in order to obtain a second correlation value, and
selecting that offset value as the correlation result for which the largest correlation value has resulted; and
providing information about the test audio signal based on the correlations.
- A computer program comprising a program code for performing the method according to claims 12 or 13, when the program runs on a computer.
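The steps recited in the claims (block division, one fingerprint value per block, comparison of neighbouring values, 1-bit output, XOR-based correlation over several offsets) can be illustrated with the following Python sketch. It is a simplified reading under stated assumptions and not the claimed implementation: all function names are hypothetical, blocks are taken as adjacent and non-overlapping, the fingerprint value is the sum of squared samples, equal neighbouring values are mapped to 0 (the claims leave this case open), and the correlation value is taken as the number of agreeing bits, i.e. the complement of the XOR sum, so that the largest value marks the best alignment.

```python
from typing import List, Sequence, Tuple


def block_energies(samples: Sequence[float], block_len: int) -> List[float]:
    """Split the signal into adjacent blocks and sum the squared samples per block."""
    n_blocks = len(samples) // block_len
    return [
        sum(x * x for x in samples[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]


def one_bit_fingerprint(values: Sequence[float]) -> List[int]:
    """Emit 1 when a block value is higher than that of the following block, else 0."""
    # Assumption: equal values map to 0.
    return [1 if a > b else 0 for a, b in zip(values, values[1:])]


def xor_correlate(test: Sequence[int], ref: Sequence[int],
                  max_offset: int) -> Tuple[int, int]:
    """Return (offset, score) for the offset with the most agreeing bits.

    An offset k means that test[i] is compared with ref[i + k].
    """
    best = (0, -1)
    for offset in range(-max_offset, max_offset + 1):
        score = 0
        for i, bit in enumerate(test):
            j = i + offset
            if 0 <= j < len(ref):
                score += 1 - (bit ^ ref[j])  # XOR is 0 where the bits agree
        if score > best[1]:
            best = (offset, score)
    return best


if __name__ == "__main__":
    energies = block_energies([1, 1, 2, 2, 0, 0, 3, 3], block_len=2)
    print(energies, one_bit_fingerprint(energies))

    ref_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
    test_bits = ref_bits[3:11]
    print(xor_correlate(test_bits, ref_bits, max_offset=5))
```

In this sketch the returned offset is measured in blocks; a compensator would have to convert it into a delay in samples using the block length.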
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102008009025A DE102008009025A1 (en) | 2008-02-14 | 2008-02-14 | Apparatus and method for calculating a fingerprint of an audio signal, apparatus and method for synchronizing and apparatus and method for characterizing a test audio signal |
PCT/EP2009/000917 WO2009100875A1 (en) | 2008-02-14 | 2009-02-10 | Device and method for calculating a fingerprint of an audio signal, device and method for synchronizing and device and method for characterizing a test audio signal |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2240928A1 (en) | 2010-10-20 |
EP2240928B1 (en) | 2011-06-22 |
Family
Family ID: 40821819
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
EP09710004A | Active | EP2240928B1 (en) | 2008-02-14 | 2009-02-10 | Device and method for calculating a fingerprint of an audio signal, device and method for synchronizing and device and method for characterizing a test audio signal |
Country Status (8)
Country | Link |
---|---|
US (1) | US8634946B2 (en) |
EP (1) | EP2240928B1 (en) |
JP (1) | JP5302977B2 (en) |
CN (1) | CN101971249B (en) |
AT (1) | ATE514161T1 (en) |
DE (1) | DE102008009025A1 (en) |
HK (1) | HK1149842A1 (en) |
WO (1) | WO2009100875A1 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010135623A1 (en) * | 2009-05-21 | 2010-11-25 | Digimarc Corporation | Robust signatures derived from local nonlinear filters |
EP2458890B1 (en) * | 2010-11-29 | 2019-01-23 | Nagravision S.A. | Method to trace video content processed by a decoder |
US8586847B2 (en) * | 2011-12-02 | 2013-11-19 | The Echo Nest Corporation | Musical fingerprinting based on onset intervals |
EP2648418A1 (en) | 2012-04-05 | 2013-10-09 | Thomson Licensing | Synchronization of multimedia streams |
EP2880654B1 (en) * | 2012-08-03 | 2017-09-13 | FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. | Decoder and method for a generalized spatial-audio-object-coding parametric concept for multichannel downmix/upmix cases |
CN103000180A (en) * | 2012-11-20 | 2013-03-27 | 上海中科高等研究院 | Surround array coding and decoding system and achieving method thereof |
IL290275B2 (en) | 2013-05-24 | 2023-02-01 | Dolby Int Ab | Coding of audio scenes |
US9666198B2 (en) | 2013-05-24 | 2017-05-30 | Dolby International Ab | Reconstruction of audio scenes from a downmix |
CN104239306A (en) * | 2013-06-08 | 2014-12-24 | 华为技术有限公司 | Multimedia fingerprint Hash vector construction method and device |
KR20150009757A (en) * | 2013-07-17 | 2015-01-27 | 삼성전자주식회사 | Image processing apparatus and control method thereof |
US9244042B2 (en) * | 2013-07-31 | 2016-01-26 | General Electric Company | Vibration condition monitoring system and methods |
DE102014102163B4 (en) * | 2014-02-20 | 2017-08-03 | Denso Corporation | Transmission technology for analog measured values |
KR102086047B1 (en) * | 2015-12-11 | 2020-03-06 | 한국전자통신연구원 | Method and apparatus for inserting data to audio signal or extracting data from audio signal |
CN107666638B (en) * | 2016-07-29 | 2019-02-05 | 腾讯科技(深圳)有限公司 | A kind of method and terminal device for estimating tape-delayed |
US10237608B2 (en) * | 2016-09-13 | 2019-03-19 | Facebook, Inc. | Systems and methods for evaluating synchronization between content streams |
US20180144755A1 (en) * | 2016-11-24 | 2018-05-24 | Electronics And Telecommunications Research Institute | Method and apparatus for inserting watermark to audio signal and detecting watermark from audio signal |
JP7380382B2 (en) | 2020-03-30 | 2023-11-15 | 沖電気工業株式会社 | range finder |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7461002B2 (en) * | 2001-04-13 | 2008-12-02 | Dolby Laboratories Licensing Corporation | Method for time aligning audio signals using characterizations based on auditory events |
WO2003091990A1 (en) * | 2002-04-25 | 2003-11-06 | Shazam Entertainment, Ltd. | Robust and invariant audio pattern matching |
US7382905B2 (en) * | 2004-02-11 | 2008-06-03 | Microsoft Corporation | Desynchronized fingerprinting method and system for digital multimedia data |
CN101002500A (en) * | 2004-08-12 | 2007-07-18 | 皇家飞利浦电子股份有限公司 | Audio source selection |
DE102004046746B4 (en) * | 2004-09-27 | 2007-03-01 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method for synchronizing additional data and basic data |
US7761304B2 (en) * | 2004-11-30 | 2010-07-20 | Agere Systems Inc. | Synchronizing parametric coding of spatial audio with externally provided downmix |
DE102005014477A1 (en) * | 2005-03-30 | 2006-10-12 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for generating a data stream and generating a multi-channel representation |
US7516074B2 (en) * | 2005-09-01 | 2009-04-07 | Auditude, Inc. | Extraction and matching of characteristic fingerprints from audio signals |
GB2431837A (en) | 2005-10-28 | 2007-05-02 | Sony Uk Ltd | Audio processing |
US20070217626A1 (en) * | 2006-03-17 | 2007-09-20 | University Of Rochester | Watermark Synchronization System and Method for Embedding in Features Tolerant to Errors in Feature Estimates at Receiver |
WO2007144813A2 (en) * | 2006-06-13 | 2007-12-21 | Koninklijke Philips Electronics N.V. | Fingerprint, apparatus, method for identifying and synchronizing video |
2008
- 2008-02-14 DE DE102008009025A (DE102008009025A1): not active, withdrawn
2009
- 2009-02-10 JP JP2010546255A (JP5302977B2): active
- 2009-02-10 AT AT09710004T (ATE514161T1): active
- 2009-02-10 EP EP09710004A (EP2240928B1): active
- 2009-02-10 WO PCT/EP2009/000917 (WO2009100875A1): active, application filing
- 2009-02-10 CN CN2009801053183A (CN101971249B): active
- 2009-02-10 US US12/867,460 (US8634946B2): active
2011
- 2011-04-20 HK HK11104000.7A (HK1149842A1): status unknown
Also Published As
Publication number | Publication date |
---|---|
EP2240928A1 (en) | 2010-10-20 |
JP5302977B2 (en) | 2013-10-02 |
DE102008009025A1 (en) | 2009-08-27 |
ATE514161T1 (en) | 2011-07-15 |
CN101971249B (en) | 2013-03-13 |
US20110112669A1 (en) | 2011-05-12 |
CN101971249A (en) | 2011-02-09 |
HK1149842A1 (en) | 2011-10-14 |
WO2009100875A1 (en) | 2009-08-20 |
US8634946B2 (en) | 2014-01-21 |
JP2011512554A (en) | 2011-04-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP2240928B1 (en) | Device and method for calculating a fingerprint of an audio signal, device and method for synchronizing and device and method for characterizing a test audio signal | |
EP2240929B1 (en) | Device and method for synchronizing multi-channel expansion data with an audio signal and for processing said audio signal | |
EP1864279B1 (en) | Device and method for producing a data flow and for producing a multi-channel representation | |
EP0954909B1 (en) | Method for coding an audio signal | |
EP1741215B1 (en) | Watermark incorporation | |
DE69927505T2 (en) | METHOD FOR INSERTING ADDITIONAL DATA INTO AN AUDIO DATA STREAM | |
DE60303209T2 (en) | PARAMETRIC AUDIOCODING | |
EP0931386B1 (en) | Method for signalling a noise substitution during audio signal coding | |
EP1687809B1 (en) | Device and method for reconstruction a multichannel audio signal and for generating a parameter data record therefor | |
DE602005006424T2 (en) | STEREO COMPATIBLE MULTICHANNEL AUDIO CODING | |
DE4320990B4 (en) | Redundancy reduction procedure | |
DE60112407T2 (en) | METHOD AND DEVICE FOR CONVERTING AN AUDIO SIGNAL BETWEEN DIFFERENT DATA COMPRESSION FORMATS | |
WO1993025015A1 (en) | Process for reducing data in the transmission and/or storage of digital signals from several interdependent channels | |
DE102007029381A1 (en) | Digital signal e.g. audio signal, processing device, has decision section, which assumes forecast data before deletion as interpolation data, when absolute value is lower than resolution | |
DE602004009926T2 (en) | DEVICE AND METHOD FOR EMBEDDING A WATERMARK USING SUBBAND FILTERING | |
EP1277346B1 (en) | Device and method for analysing a spectral representation of a decoded time-variable signal | |
DE10065363B4 (en) | Apparatus and method for decoding a coded data signal | |
DE69914345T2 (en) | TANDEM AUDIO COMPRESSION |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20100805 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA RS |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
DAX | Request for extension of the european patent (deleted) | ||
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: FIESEL, WOLFGANG Inventor name: NEUSINGER, MATTHIAS Inventor name: SCHARRER, SEBASTIAN |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D Free format text: LANGUAGE OF EP DOCUMENT: GERMAN |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 502009000830 Country of ref document: DE Effective date: 20110811 |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: VDEP Effective date: 20110622 |
|
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 1149842 Country of ref document: HK |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110922 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110923 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FD4D |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20111024 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20111022 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
REG | Reference to a national code |
Ref country code: HK Ref legal event code: GR Ref document number: 1149842 Country of ref document: HK |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20120323 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 502009000830 Country of ref document: DE Effective date: 20120323 |
|
BERE | Be: lapsed |
Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FORDERUNG DER ANGEWAN Effective date: 20120228 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20120229 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20120228 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20111003 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110922 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] |
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20130228; Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20130228 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110622 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20120210 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20090210 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MM01 Ref document number: 514161 Country of ref document: AT Kind code of ref document: T Effective date: 20140210 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] |
Ref country code: AT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20140210 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 8 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 9 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 10 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] |
Ref country code: FR Payment date: 20230217 Year of fee payment: 15 |
|
P01 | Opt-out of the competence of the Unified Patent Court (UPC) registered |
Effective date: 20230512 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] |
Ref country code: DE Payment date: 20240216 Year of fee payment: 16; Ref country code: GB Payment date: 20240222 Year of fee payment: 16 |