EP1864279B1 - Device and method for generating a data stream and for generating a multi-channel representation - Google Patents

Device and method for generating a data stream and for generating a multi-channel representation

Info

Publication number
EP1864279B1
EP1864279B1 (application EP06707562A)
Authority
EP
European Patent Office
Prior art keywords
channel
fingerprint
data stream
information
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP06707562A
Other languages
German (de)
English (en)
French (fr)
Other versions
EP1864279A1 (de)
Inventor
Wolfgang Fiesel
Matthias Neusinger
Harald Popp
Stephan Geyersberger
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung eV
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung eV
Publication of EP1864279A1
Application granted granted Critical
Publication of EP1864279B1
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • the present invention relates to audio signal processing, and more particularly to multi-channel processing techniques based on generating a multi-channel reconstruction of an original multi-channel signal from at least one downmix channel and multi-channel additional information.
  • Such techniques include Binaural Cue Coding (BCC) and Spatial Audio Coding, as described in J. Herre, C. Faller, S. Disch, C. Ertel, J. Hilbert, A. Hoelzer, K. Linzmeier, C. Sprenger, P. Kroon: "Spatial Audio Coding: Next-Generation Efficient and Compatible Coding of Multi-Channel Audio", 117th AES Convention, San Francisco 2004, Preprint 6186.
  • Fig. 3 shows a joint stereo device 60.
  • This device may be one that implements, for example, the intensity stereo (IS) technique or the binaural cue coding (BCC) technique.
  • Such a device typically receives as input at least two channels CH1, CH2, ..., CHn, and outputs a single carrier channel as well as multi-channel parametric information.
  • the parametric data is defined so that an approximation of an original channel (CH1, CH2, ..., CHn) can be calculated in a decoder.
  • the carrier channel will include subband samples, spectral coefficients, time domain samples, etc. that provide a relatively fine representation of the underlying signal, while the parametric data does not include such samples or spectral coefficients, but control parameters for controlling a particular reconstruction algorithm, such as multiplying by weights, by time shifting, by frequency shifting, etc.
  • the parametric multi-channel information therefore comprises a relatively rough representation of the signal or the associated channel.
  • the amount of data needed by a carrier channel is about 60 to 70 kbps, while the amount of data required by the parametric side information for one channel is in the range of 1.5 to 2.5 kbps.
  • the above figures apply to compressed data.
  • a non-compressed CD channel requires data rates on the order of about ten times.
  • An example of parametric data is the known scale factors, intensity stereo information, or BCC parameters, as set forth below.
  • the reconstructed signals differ in their amplitude, but they are identical in terms of their phase information.
  • the energy-time envelopes of both original audio channels are maintained by the selective scaling operation, which typically operates in a frequency-selective manner. This corresponds to the human perception of sound at high frequencies, where the dominant spatial information is determined by the energy envelopes.
  • In intensity stereo, the transmitted signal, i.e. the carrier channel, is generated from the sum signal of the left channel and the right channel instead of a rotation of both components.
  • It is preferred to perform this processing, i.e. the generation of intensity-stereo parameters for the scaling operations, in a frequency-selective manner, i.e. independently for each scale factor band, i.e. for each encoder frequency partition.
  • In intensity stereo coding, both channels are combined to form a combined or "carrier" channel, and the intensity stereo information is determined in addition to the combined channel.
  • the intensity stereo information depends on the energy of the first channel, the energy of the second channel or the energy of the combined channel.
  • the BCC technique is described in AES Convention Paper 5574, "Binaural Cue Coding applied to stereo and multichannel audio compression", C. Faller, F. Baumgarte, May 2002, Munich.
  • In BCC coding, a number of audio input channels are converted to a spectral representation using a DFT-based transform with overlapping windows. The resulting spectrum is divided into non-overlapping partitions, each of which has an index. Each partition has a bandwidth proportional to the equivalent rectangular bandwidth (ERB).
  • the Inter Channel Level Differences (ICLD) and the Inter Channel Time Differences (ICTD) are determined for each partition and for each frame k.
  • the ICLD and ICTD are quantized and encoded to eventually arrive as side information in a BCC bitstream.
  • the inter-channel level differences and the inter-channel time differences are given for each channel relative to a reference channel. Then, the parameters are calculated according to predetermined formulas that depend on the particular partitions of the signal to be processed.
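  • As an illustration of how such cues might be computed, the following sketch (hypothetical; the partition boundaries and the dB convention are assumptions, not taken from the patent) determines per-partition ICLDs of one channel relative to a reference channel:

```python
import numpy as np

def partition_energies(spectrum, bounds):
    """Sum |X|^2 of the DFT bins inside each spectral partition.

    spectrum : complex DFT coefficients of one channel for one frame
    bounds   : list of (start_bin, stop_bin) tuples, e.g. ERB-like partitions
    """
    return np.array([np.sum(np.abs(spectrum[a:b]) ** 2) for a, b in bounds])

def icld_per_partition(channel_spec, reference_spec, bounds, eps=1e-12):
    """ICLD in dB of one channel relative to the reference channel,
    one value per partition and frame."""
    e_ch = partition_energies(channel_spec, bounds)
    e_ref = partition_energies(reference_spec, bounds)
    return 10.0 * np.log10((e_ch + eps) / (e_ref + eps))
```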
  • On the decoder side, the decoder typically receives a mono signal and the BCC bit stream.
  • the mono signal is transformed into the frequency domain and input to a spatial synthesis block which also receives decoded ICLD and ICTD values.
  • the BCC parameters ICLD and ICTD are used to perform a weighting operation of the mono signal to synthesize the multi-channel signals representing, after a frequency / time conversion, a reconstruction of the original multi-channel audio signal.
  • the joint stereo module 60 operates to output the channel-side information such that the parametric channel data is quantized and coded ICLD or ICTD parameters using one of the original channels as the reference channel for encoding the channel side information.
  • the carrier signal is formed from the sum of the participating source channels.
  • the above techniques provide only a mono representation for a decoder that can only process the carrier channel but is unable to process the parametric data to produce one or more approximations of more than one input channel.
  • Fig. 5 shows such a BCC scheme for encoding / transmission of multi-channel audio signals.
  • the multi-channel audio input signal at an input 110 of a BCC encoder 112 is down-mixed in a so-called downmix block 114.
  • the original multi-channel signal at the input 110 is a 5-channel surround signal having a front left channel, a front right channel, a left surround channel, a right surround channel and a center channel.
  • the downmix block 114 generates a sum signal by simply adding these five channels into a mono signal.
  • This single channel is output on a sum signal line 115.
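  • A sketch of such a downmix operation is given below; the optional per-channel weights reflect the weighting mentioned later in the description, everything else is purely illustrative:

```python
import numpy as np

def downmix(channels, weights=None):
    """Form the sum (downmix) signal from one block of multi-channel samples.

    channels : array of shape (num_channels, block_len), e.g. the five surround
               channels L, R, C, Ls, Rs for one block of 1152 samples
    weights  : optional per-channel weights; plain addition if omitted
    """
    channels = np.asarray(channels, dtype=float)
    if weights is None:
        weights = np.ones(channels.shape[0])
    return weights @ channels   # one downmixed sample sequence for the block
```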
  • Side information obtained from the BCC analysis block 116 is output on a side information line 117. The side information includes inter-channel level differences (ICLD) and inter-channel time differences (ICTD).
  • the BCC analysis block 116 is also capable of calculating inter-channel correlation (ICC) values.
  • the sum signal and the side information are transmitted in a quantized and encoded format to a BCC decoder 120.
  • the BCC decoder decomposes the transmitted sum signal into a number of subbands and performs scaling, delays and other processing to provide the subbands of the multichannel audio channels to be output. This processing is performed so that the ICLD, ICTD and ICC parameters (cues) of a reconstructed multichannel signal at output 121 match the corresponding cues for the original multichannel signal at input 110 in BCC encoder 112.
  • the BCC decoder 120 includes a BCC synthesis block 122 and a side information processing block 123.
  • the sum signal on line 115 is fed to a time / frequency conversion unit or filter bank FB 125.
  • At the output of the block 125 there exists a number N of subband signals or, in an extreme case, a block of spectral coefficients, when the audio filter bank 125 performs a 1:1 transform, i.e. a transform producing N spectral coefficients from N time-domain samples.
  • the BCC synthesis block 122 further includes a delay stage 126, a level modification stage 127, a correlation processing stage 128 and an inverse filter bank stage IFB 129.
  • At the output of stage 129, the reconstructed multichannel audio signal with, for example, five channels in the case of a 5-channel surround system may be output to a set of loudspeakers 124, as shown in Fig. 5 or Fig. 4.
  • the input signal sn is converted into the frequency domain or the filter bank region by means of the element 125.
  • the signal output by element 125 is copied so as to obtain multiple versions of the same signal, as represented by copy node 130.
  • the number of versions of the original signal is equal to the number of output channels in the output signal.
  • each version of the original signal at node 130 undergoes a certain delay d1, d2, ..., di, ..., dN.
  • the delay parameters are determined by the side information processing block 123 in Fig. 5 and are derived from the inter-channel time differences calculated by the BCC analysis block 116 of Fig. 5.
  • the ICC parameters calculated by BCC analysis block 116 are used to control the functionality of block 128 so that certain correlations between the delayed and level-manipulated signals are obtained at the outputs of block 128. It should be noted here that the order of stages 126, 127, 128 may differ from the one shown in Fig. 6.
  • It should further be noted that the BCC analysis is carried out frame-wise, i.e. in a time-varying manner, and also frequency-wise, as determined by the filter bank division apparent from Fig. 6.
  • the audio filter bank 125 decomposes the input signal into, for example, 32 bandpass signals
  • the BCC analysis block obtains a set of BCC parameters for each of the 32 bands.
  • the BCC synthesis block 122 of Fig. 5, which is detailed in Fig. 6, performs a reconstruction that is likewise based on the exemplified 32 bands.
  • Fig. 4 presents scenarios used to determine individual BCC parameters. Normally, the ICLD, ICTD and ICC parameters can be defined between channel pairs. However, it is preferred to determine the ICLD and ICTD parameters between a reference channel and each other channel. This is shown in Fig. 4A.
  • ICC parameters can be defined in several ways. In general, ICC parameters can be determined in the encoder between all possible channel pairs, as shown in Fig. 4B. However, it has been proposed to calculate ICC parameters only between the two strongest channels at a time, as in Fig. 4C, which shows an example in which an ICC parameter is calculated between channels 1 and 2 at one instant and between channels 1 and 5 at another instant.
  • the decoder then synthesizes the inter-channel correlation between the strongest channels in the decoder and uses certain heuristic rules to compute and synthesize the inter-channel coherence for the remaining channel pairs.
  • the multiplication parameters a1, ..., aN are derived from the ICLD parameters such that the total energy of all reconstructed output channels is the same as (or proportional to) the energy of the transmitted sum signal.
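  • A minimal sketch of such a gain derivation from ICLD values, normalized so that the energies of the reconstructed channels sum to the energy of the transmitted sum signal (the dB-to-amplitude convention is an assumption):

```python
import numpy as np

def gains_from_icld(icld_db):
    """Derive multiplication parameters a_1 ... a_N from ICLDs given in dB
    relative to a reference channel (whose ICLD is 0 dB), normalized so that
    the sum of the squared gains equals one, i.e. the total output energy
    equals the energy of the transmitted sum signal."""
    amp = 10.0 ** (np.asarray(icld_db, dtype=float) / 20.0)  # linear amplitude ratios
    return amp / np.sqrt(np.sum(amp ** 2))

# example: reference channel plus two channels at -3 dB and -6 dB
print(gains_from_icld([0.0, -3.0, -6.0]))
```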
  • Generally, in such multi-channel parametric coding schemes, at least one base channel as well as side information is generated, as is apparent from Fig. 5.
  • block-based schemes are used in which, as can also be seen from Fig. 5, the original multichannel signal at the input 110 is subjected to block processing by a block stage 111, such that from one block of, for example, 1152 samples the downmix signal or the at least one base channel is formed for this block, while at the same time the corresponding multichannel parameters are generated for this block by the BCC analysis.
  • the sum signal is typically encoded again with a block-based encoder, such as an MP3 encoder or an AAC encoder, to obtain further data rate reduction.
  • the parameter data is coded, for example by differential coding, scaling / quantization and entropy coding.
  • a common data stream is written in which a block of the at least one base channel follows an earlier block of the at least one base channel, and in which the encoded multi-channel additional information is also keyed in, for example by a bit stream multiplexer.
  • the data stream of basic channel data and multi-channel additional information always comprises one block of basic channel data and, in association with this block, a block of multi-channel additional data, which then together form, for example, a common transmission frame. This transmission frame is then sent over a transmission link to a decoder.
  • the decoder again includes a data stream demultiplexer on the input side to split a frame of the data stream into a block of basic channel data and a block of associated multichannel additional information. The block of basic data is then decoded, for example by an MP3 decoder or an AAC decoder. This block of decoded basic data is then supplied to the BCC decoder 120 together with the block of optionally likewise decoded multichannel additional information.
  • the time allocation of the additional information to the basic channel data is thus automatically determined and can easily be restored by a decoder that works frame by frame.
  • the decoder will to a certain extent automatically find the additional information associated with a block of basic channel data, so that high-quality multi-channel reconstruction is possible. So there will be no problem that the multi-channel additional information have a time offset to the basic channel data.
  • a situation may arise, for example, in a sequential transmission system, such as broadcasting or the Internet.
  • the audio program to be transmitted is divided into audio base data (a mono or stereo downmix audio signal) and extension data (multichannel additional information), which are broadcast singly or in combination.
  • In particular, a coder/decoder with non-constant output data rate may be used in order to achieve particularly good bit efficiency.
  • this processing also depends on the actually used hardware components for decoding, such as must be present in a PC or digital receiver, for example.
  • There is thus a systemic or algorithm-inherent temporal blurring since, in particular with bit-reservoir ("bit savings bank") technology, a constant output data rate is generated on average; locally, however, bits that are not needed for a block that is easy to code are saved so that they can later be taken from the bit reservoir again for another block that is particularly difficult to code, for example because the audio signal is particularly transient.
  • the separation of the common data stream described above into two individual data streams has particular advantages. A classic receiver, e.g. a pure mono or stereo receiver, is thus able at any time, regardless of the content and version of the multi-channel additional information, to receive and reproduce the audio base data. The separation into separate data streams thus ensures the backward compatibility of the entire concept.
  • a newer generation receiver can evaluate this multi-channel additional data and combine it with the audio base data in such a way that the user can be provided with the complete extension, here the multi-channel sound.
  • a particularly interesting application scenario of separate transmission of audio base data and extension data is in digital broadcasting.
  • the previously broadcast stereo audio signal can be extended by a small additional transmission effort to a multi-channel format, such as 5.1.
  • on the transmitter side, the program provider generates the multi-channel additional information from multi-channel sound sources, such as those found on DVD-Audio/Video.
  • this multichannel additional information is transmitted in parallel with the previously broadcast stereo audio signal, which is now not simply a stereo signal but comprises two base channels derived from the multichannel signal by a downmix.
  • the stereo signal of the two base channels sounds like a normal stereo signal because multichannel analysis ultimately takes similar steps as those made by a sound engineer who mixed a stereo signal out of multiple tracks.
  • a major advantage of the separation is the compatibility with the existing digital broadcasting systems.
  • a classic receiver that cannot evaluate this additional information will continue to receive and reproduce the two-channel signal without any qualitative restrictions.
  • a receiver of a newer design can, in addition to the previously received stereo sound signal, evaluate and decode this multichannel information and reconstruct the original 5.1 multichannel signal therefrom.
  • One possibility is to key the multi-channel additional information, as a supplement, into the previously used stereo signal data stream.
  • In this case, the receiver sees only one (valid) audio data stream and, if it is a receiver of the newer type, can extract the multichannel additional information from the data stream via a corresponding upstream data distributor, again synchronously with the associated audio data block, decode it and output it as 5.1 multichannel sound.
  • a disadvantage of this approach is that the existing infrastructure or the existing data paths must be extended so that, instead of only the stereo audio signals as before, the data signals combined from the downmix signals and the extension can be transported.
  • Leaving the standard transmission format for stereo data does, however, ensure synchronicity through the common data stream, even in broadcast transmissions.
  • the other alternative is not to couple the multichannel additional information to the audio coding system used and therefore not to key it into the actual audio data stream.
  • the transmission takes place via a separate, but not necessarily synchronized parallel digital additional channel.
  • This situation can occur when the downmix data is passed in unreduced form, for example as PCM data in the AES/EBU data format, through a common audio distribution infrastructure existing in studios. These infrastructures are designed to digitally distribute audio signals between diverse sources. For this purpose, functional units normally known as "crossbars" are used. Alternatively or additionally, audio signals in PCM format are also processed for purposes of equalization and dynamic compression. All of these steps lead to incalculable delays on the path from the sender to the receiver.
  • the separate transmission of base channel data and multi-channel additional information is particularly interesting here, since existing stereo infrastructures do not need to be changed, so that the disadvantages of non-standard conformity described above with respect to the first possibility do not occur.
  • a broadcasting system only needs to broadcast one additional channel, but not change the infrastructure for the existing stereo channel.
  • the additional effort is therefore effectively incurred solely on the receiver side, but in such a way that backwards compatibility is maintained, so that a user who has a new receiver gets better sound quality than a user who has an old receiver.
  • Without further measures, however, the magnitude of the time shift can no longer be determined from the received audio signal and the additional information.
  • a timely correct reconstruction and assignment of the multi-channel signal in the receiver is no longer guaranteed.
  • Another example of such a delay problem is when an already-running two-channel transmission system is to be extended to multi-channel transmission, for example in a receiver of a digital radio.
  • In this case, the downmix signal is decoded by means of a two-channel audio decoder already existing in the receiver, whose delay time is not known and thus cannot be compensated.
  • the downmix audio signal may even reach the multi-channel reconstruction audio decoder via a transmission chain containing analog parts, i.e. a digital-to-analog conversion takes place at one point and an analog-to-digital conversion takes place at another point after further storage/transmission. Something like this regularly takes place in a radio transmission. Again, no clues are initially available as to how a proper delay equalization of the downmix signal relative to the multichannel additional data can be performed. If, in addition, the sampling frequency of the A/D conversion and the sampling frequency of the D/A conversion differ slightly, there is a slow time drift of the necessary compensation delay corresponding to the ratio of the two sampling rates to one another.
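  • To get a feel for the magnitude involved, the following small computation (with assumed example sampling rates, not values from the patent) shows how quickly such a drift accumulates:

```python
fs_da = 48_000.0   # sampling rate of the D/A conversion in Hz (assumed example)
fs_ad = 48_000.5   # slightly different sampling rate of the A/D conversion in Hz (assumed example)

drift_per_minute = 60.0 * (fs_ad / fs_da - 1.0)   # seconds of drift per minute of audio
print(f"{drift_per_minute * 1000:.2f} ms of compensation-delay drift per minute")
# about 0.63 ms per minute here, so the offset must be tracked and readjusted over time
```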
  • To synchronize the additional data with the basic data, various techniques known under the term "time synchronization method" can be used. These are based on inserting timestamps into both data streams in such a way that a correct assignment of the mutually associated data can be achieved in the receiver on the basis of these timestamps. However, time stamping also alters the normal stereo infrastructure.
  • the WO 2005/011281 A1 discloses a method and apparatus for generating and capturing fingerprints for synchronizing audio and video signals.
  • a first fingerprint is generated from a segment of a first signal, for example an audio signal, and a second fingerprint is generated from a segment of a second signal, for example a video signal; these fingerprints are usable for the synchronization of at least two signals.
  • the generated fingerprint pairs are stored in a database and transmitted to a synchronization device.
  • fingerprints of the audio signal and fingerprints of the video signal are generated and compared with the fingerprints in the database. If a match has been found, the fingerprints also designate the synchronization timing used to synchronize the two signals.
  • the object of the present invention is to provide a concept for generating a data stream or for generating a multi-channel representation, by means of which a synchronization of basic channel data and multi-channel additional information can be achieved.
  • This object is achieved by a device for generating a data stream according to claim 1, a device for generating a multi-channel representation according to claim 17, a method for generating a data stream according to claim 26, a method for generating a multi-channel representation according to claim 27, a computer program product according to claim 28, or a data stream representation according to claim 29.
  • the present invention is based on the finding that a separate transmission and time-synchronous merging of a basic channel data stream and a multi-channel additional information data stream is made possible by modifying the multichannel data stream on the sender side in such a way that fingerprint information representing a time profile of the at least one base channel is introduced into the data stream with the multi-channel additional information, such that a relationship between the multi-channel additional information and the fingerprint information is derivable from the data stream. Certain multi-channel additional information thus belongs to certain basic channel data, and exactly this assignment must also be preserved when the two data streams are transferred separately.
  • the affiliation of multichannel additional information to basic channel data is signaled on the sender side by the fact that fingerprint information is determined from the base channel data with which the multichannel additional information which belongs to precisely this basic channel data is as it were marked.
  • In block-wise data processing, this marking of the relationship between the multichannel additional information and the fingerprint information is achieved by associating with a block of multichannel additional information the block fingerprint of precisely that block of basic channel data to which the considered block of multichannel additional information belongs.
  • In other words, the multichannel additional information is assigned a fingerprint of exactly that basic channel data block together with which the multichannel additional information must be processed during the reconstruction.
  • the block fingerprint of the block of base channel data may be keyed into the block structure of the multichannel additional data stream such that each block of multichannel additional information contains the block fingerprint of the associated base channel data.
  • the block fingerprint may be written immediately following a previously used block of multichannel additional information, or may be written before the previously existing block, or may be written at any known location within that block, such that in the multichannel reconstruction the block fingerprint is readable for synchronization purposes.
  • the data stream therefore contains normal multichannel additional data as well as the block fingerprints interspersed accordingly.
  • Alternatively, the data stream could also be written so that, for example, all block fingerprints, provided with additional information such as a block counter, are located at the beginning of the data stream generated in accordance with the invention, so that a first portion of the data stream contains only block fingerprints and a second portion of the data stream contains the block-wise written multi-channel additional data belonging to the block fingerprint information.
  • a large number of block fingerprints could simply be read in first to obtain the reference fingerprint information.
  • the test fingerprints are added until there is a minimum number of test fingerprints used for a correlation.
  • the set of reference fingerprints could, for example, already be subjected to differential coding when the correlation in the multi-channel reconstruction is performed using differences, while the data stream contains not differential block fingerprints but absolute block fingerprints.
  • on the receiver side, the data stream with the basic channel data is processed, that is to say initially decoded, for example, and then supplied to a multichannel reconstructor.
  • this multi-channel reconstructor is designed such that, if it receives no additional information, it simply makes a through-connection in order to output the preferably two base channels as a stereo signal.
  • In parallel to this, the reference fingerprint information is extracted and the test fingerprint information is calculated from the decoded base channel data, in order then to perform a correlation calculation to determine the offset of the base channel data relative to the multi-channel additional data.
  • It is then checked whether this offset is also the correct offset. This will be the case if the offset obtained by the second correlation calculation does not deviate by more than a predetermined threshold from the offset obtained by the first correlation calculation.
  • Alternatively, the rendering may be performed so that the entire synchronization calculation is carried out without stereo data being output in parallel; the multichannel additional information is then already synchronized with the basic channel data from the very first block. The listener will then have a synchronized 5.1 experience right from the first block.
  • the time for synchronization is normally about 5 seconds, since about 200 reference fingerprints are needed as reference fingerprint information for an optimal offset calculation. If this delay of about 5 seconds is irrelevant, as is the case for unidirectional transmissions, for example, playback can start directly with 5.1 output, but only after the time required for the offset calculation. For interactive applications, such as dialogs or the like, this delay would be annoying, so that playback starts in stereo and switches to multi-channel output only once the synchronization is finished. It has thus been found that it is better to provide only stereo playback than multichannel playback with non-synchronized multi-channel additional information.
  • the temporal allocation problem between basic channel data and multi-channel additional data is solved both by measures on the transmitter side and by measures on the receiver side.
  • On the transmitter side, time-varying and suitable fingerprint information is calculated from the corresponding mono or stereo downmix audio signal.
  • this fingerprint information is regularly keyed into the transmitted multi-channel additional data stream as a synchronization aid. This is preferably done as a data field in the middle of the block-organized side information, e.g. spatial audio coding side information, or such that the fingerprint signal is sent as the first or last information of the data block so that it can easily be added or removed.
  • On the receiver side, temporally variable and suitable test fingerprint information is calculated from the corresponding stereo audio signal, i.e. the basic channel data, a number of two base channels being preferred according to the invention. Furthermore, the fingerprints are extracted from the multi-channel additional information. Thereafter, the time offset between the multichannel additional information and the received audio signal is calculated via correlation methods, such as a calculation of the cross-correlation between the test fingerprint information and the reference fingerprint information. Alternatively, trial-and-error methods may also be performed in which fingerprint information calculated from the base channel data on the basis of various block rasters is compared to the reference fingerprint information in order to determine the temporal offset from the test block raster whose associated test fingerprint information best matches the reference fingerprint information.
  • the audio signals of the base channels are synchronized with the multichannel overhead information for subsequent multichannel reconstruction by a downstream delay balancing stage.
  • Depending on the implementation, only an initial delay can be compensated by this downstream delay balancing stage.
  • the offset computation is performed in parallel to the reproduction in order to be able to readjust the offset as needed and according to the result of the correlation calculation in the event of a drifting apart of the base channel data and the multi-channel additional information despite a compensated initial delay.
  • the delay equalization stage can thus also be actively regulated.
  • the present invention is advantageous in that there is no need to make any changes to the base channel data or to the basic channel data processing path.
  • the basic channel data stream fed to a receiver is no different from a common base channel data stream. Changes are made only to the multi-channel data stream, which is modified in that the fingerprint information is keyed in.
  • changing the multichannel additional data stream does not lead to an unwanted departure from an already standardized, implemented and established solution, as would be the case if the base channel data stream were modified.
  • the scenario according to the invention provides a particular flexibility for the propagation of multichannel additional information.
  • Since the multichannel additional information is parameter information that is very compact in terms of the required data rate or storage capacity, a digital receiver can also be supplied with such data completely separately from the stereo signal.
  • a user could obtain multi-channel additional information from a separate provider for stereo recordings that already exist on his solid-state player or on his CDs, and store it on his playback device. This storage is not a problem because the memory requirement, especially for parametric multi-channel additional information, is not particularly large.
  • Whenever desired, the corresponding multi-channel additional data stream can then be retrieved from the multi-channel additional data memory and synchronized with the stereo signal on the basis of the fingerprint information in the multi-channel additional data stream in order to achieve a multi-channel reconstruction.
  • the solution according to the invention thus makes it possible, completely independently of the path taken by the stereo signal, that is, regardless of whether it comes from a digital radio receiver, from a CD, from a DVD, or has arrived e.g. via the Internet, to synchronize multi-channel additional data that may come from a completely different source with the stereo signal; the stereo signal then acts as the base channel data on the basis of which the multi-channel reconstruction is then performed.
  • Fig. 1 shows a device for generating a data stream for a multi-channel reconstruction of an original multi-channel signal, wherein the multi-channel signal has at least two channels, according to a preferred embodiment of the present invention.
  • the device comprises a fingerprint generator 2, to which at least one base channel derived from the original multi-channel signal can be supplied via an input line 3.
  • the number of base channels is greater than or equal to 1 and less than a number of channels of the original multi-channel signal. If the original multi-channel signal is just a stereo signal with only two channels, then there is only a single base channel derived from the two stereo channels. However, if the original multi-channel signal is a signal having three or more channels, the number of base channels may be equal to two.
  • In the case of a 5.1 signal with an additional LFE (Low Frequency Enhancement) channel, the five channels are a left surround channel Ls, a left channel L, a center channel C, a right channel R, and a right surround channel Rs.
  • the two base channels are then the left base channel and the right base channel.
  • the one or more base channels are also referred to as downmix channels.
  • the fingerprint generator 2 is designed to generate fingerprint information from the at least one base channel, the fingerprint information representing a time profile of the at least one base channel.
  • Depending on the implementation, the fingerprint information can be calculated with more or less computational effort.
  • For example, very elaborately calculated fingerprints, which are known under the keyword "Audio-ID", could be used.
  • Alternatively, any other quantity could be used that somehow represents the time profile of the one or more base channels.
  • the fingerprint information is composed of a series of block fingerprints, where a block fingerprint is a measure of the energy of the one or more base channels in the block.
  • the fingerprint generator 2 supplies, on the output side, the fingerprint information which is supplied to a data stream generator 4.
  • the data stream generator 4 is designed to generate a data stream from the fingerprint information and the typically time-varying multi-channel additional information, wherein the multi-channel additional information together with the at least one base channel enable the multi-channel reconstruction of the original multi-channel signal.
  • the data stream generator is designed to generate the data stream at an output 5 such that a connection between the multichannel additional information and the fingerprint information can be derived from the data stream.
  • the data stream of multichannel additional information is thus marked with the fingerprint information derived from the at least one base channel, such that, via the fingerprint information assigned to the multichannel additional information by the data stream generator 4, the association of certain multi-channel additional information with the basic channel data can be determined.
  • Fig. 2 shows an inventive device for generating a multi-channel representation of an original multi-channel signal from at least one base channel and a data stream comprising fingerprint information, which represents a time course of the at least one base channel, and multi-channel additional information, which together with the at least one base channel enables the multi-channel reconstruction of the original multi-channel signal, a relationship between the multi-channel additional information and the fingerprint information being derivable from the data stream.
  • the at least one base channel is fed via an input 10 to a receiver or decoder-side fingerprint generator 11.
  • the fingerprint generator 11 provides output test fingerprint information via an output 12 to a synchronizer 13.
  • the test fingerprint information is derived from the at least one base channel by exactly the same algorithm as is performed in block 2 of Fig. 1. However, depending on the implementation, the algorithms do not necessarily have to be identical.
  • For example, the fingerprint generator 2 may generate a block fingerprint in absolute encoding while the fingerprint generator 11 on the decoder side performs a differential fingerprint determination, such that the test block fingerprint associated with a block is the difference between two absolute fingerprints.
  • In this case, a fingerprint extractor 14 will extract the fingerprint information from the data stream and at the same time form differences, so that the synchronizer 13 is provided via an output 15, as reference fingerprint information, with data that is comparable to the test fingerprint information.
  • In any case, the algorithm for calculating the test fingerprint information on the decoder side and the algorithm for calculating the fingerprint information on the encoder side, which in Fig. 2 may also be referred to as reference fingerprint information, are at least so similar that the synchronizer 13, using these two pieces of information, can synchronize the multichannel additional data in the data stream obtained via an input 16 with the data of the at least one base channel.
  • On the output side, a synchronized multi-channel representation is obtained, which comprises the basic channel data and, synchronously thereto, the multi-channel additional data.
  • the synchronizer 13 determines a time offset between the basic channel data and the multi-channel additional data and then delays the multi-channel additional data by this offset. It has been found that the multichannel additional data usually arrives earlier, that is, too early, which can be attributed to the much smaller amount of data typically corresponding to the multichannel additional data compared to the amount of data for the base channel data. Thus, if the multichannel additional data is delayed, the data of the at least one base channel is fed from the input 10 via a base channel data line 17 to the synchronizer 13 and is in effect only "looped through" by it and output again at an output 18.
  • the multi-channel additional data obtained via the input 16 is fed to the synchronizer via a multi-channel additional data line 19, delayed there by a predetermined offset, and fed together with the base channel data via an output 20 of the synchronizer to a multi-channel reconstructor 21, which then executes the actual audio rendering in order to output, for example, five audio channels and a low-frequency channel (not shown in Fig. 2).
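  • A minimal sketch of the delay compensation applied to the additional-data stream is given below; the buffering strategy and the per-block granularity are assumptions for illustration (in the device this task is performed by the time shifter described with reference to Fig. 8):

```python
from collections import deque

class TimeShifter:
    """Delays the stream of multichannel additional-data blocks by a given
    number of blocks so that each block Pi reaches the multichannel
    reconstructor together with its base channel block BKi."""

    def __init__(self, offset_blocks: int):
        # pre-fill with empty slots so the first real block emerges 'offset_blocks' later
        self.fifo = deque([None] * offset_blocks)

    def shift(self, additional_block):
        """Feed the additional-data block arriving in the current block period and
        get back the delayed block to be used now; None means 'not yet available',
        i.e. only a stereo output of the base channels is possible."""
        self.fifo.append(additional_block)
        return self.fifo.popleft()
```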
  • the data on lines 18 and 20 thus form the synchronized multi-channel representation, with the data stream on line 20 corresponding to the data stream at input 16, apart from any coding of the multichannel additional data, except for the fact that the fingerprint information has been removed from the data stream, which, depending on the implementation, can happen in the synchronizer 13 or even before.
  • For example, the fingerprint removal can already be done in the fingerprint extractor 14, so that there is then no line 19 but a line 19', which goes from the fingerprint extractor directly into the synchronizer 13.
  • the synchronizer 13 is thus supplied in parallel by the fingerprint extractor with both the multi-channel additional data and with the reference fingerprint information.
  • the synchronizer is thus configured to synchronize the multichannel additional information and the at least one base channel using the test fingerprint information and the reference fingerprint information, and using the relationship, derivable from the data stream, between the multichannel additional information and the fingerprint information contained in the data stream.
  • the relationship between the multichannel additional information and the fingerprint information is preferably established simply by whether the fingerprint information precedes a set of multichannel additional information, follows a set of multichannel additional information, or is located within a set of multichannel additional information. Depending on whether the fingerprints are in front of, behind, or in the midst of a set of multichannel additional information, it is thus determined on the encoder side that this multichannel information belongs to that fingerprint information.
  • Preferably, block processing is used.
  • the keying of the fingerprints is made so that a block of multi-channel additional data always follows a block fingerprint, so that a block of multi-channel additional information alternates with a block fingerprint and vice versa.
  • Alternatively, a data stream format could be used in which the entire fingerprint information is contained in a separate part at the beginning of the data stream, whereupon all the multi-channel additional information follows. In this case, block fingerprints and blocks of multi-channel additional information would not alternate.
  • Alternative ways of assigning fingerprints to multi-channel additional information are known to those skilled in the art. According to the invention, a connection between the multi-channel additional information and the fingerprint information merely needs to be derivable from the data stream on the decoder side, so that the fingerprint information can be used to synchronize the multichannel additional information with the basic channel data.
  • Fig. 7a shows an original multi-channel signal, for example a 5.1 signal, consisting of a sequence of blocks B1 to B8, wherein in the example shown in Fig. 7a a block contains multi-channel values MKi.
  • a block, such as block B1, comprises a certain number of samples per channel, for example 1152 samples.
  • Such a block size is used, for example, in the BCC encoder 112 of Fig. 5, in which the block formation, that is to say in a sense the windowing in order to obtain a sequence of blocks from a continuous signal, is effected by the element 111 in Fig. 5, which is labeled "block".
  • In Fig. 7b, the at least one base channel is shown.
  • the basic channel data can again be represented as a sequence of blocks B1 to B8, the blocks B1 to B8 of Fig. 7b corresponding to the blocks B1 to B8 in Fig. 7a.
  • a block now no longer contains the original 5.1 signal but, if a time-domain representation is retained, only a mono signal or a stereo signal with two stereo base channels.
  • the block B1 therefore again comprises 1152 time samples of both the first stereo base channel and the second stereo base channel, these 1152 samples of both the left stereo base channel and the right stereo base channel each having been calculated by sample-wise addition/subtraction and optionally weighting, i.e. by the operation that is performed, for example, in the downmix block 114 of Fig. 5.
  • In Fig. 7c, the multichannel information stream again comprises blocks B1 through B8, each block in Fig. 7c corresponding to the respective block of the original multi-channel signal in Fig. 7a or of the one or more base channels of Fig. 7b.
  • For a correct reconstruction, the base channel data in the block B1 of the basic channel data stream, indicated by BK1, must be combined with the multi-channel information P1 of the block B1 in Fig. 7c.
  • In the embodiment shown in Fig. 6, this combination is performed by the BCC synthesis block, which, in order to obtain a block-by-block processing of the basic channel data, again has a blocking stage at its input.
  • P3 thus designates, as set out in Fig. 7c, the multi-channel information which, together with the block of values BK3 of the base channels, allows a reconstruction of the block of values MK3 of the original multi-channel signal.
  • According to the invention, each block Bi of the data stream of Fig. 7c is provided with a block fingerprint.
  • For block B3, this block fingerprint is derived exactly from the corresponding block of values BK3 of the base channels.
  • Alternatively, the block fingerprint F3 could also be subjected to differential coding, so that the block fingerprint F3 equals the difference between the block fingerprint of the block of values BK3 of the base channels and the block fingerprint of the block of values BK2 of the base channels.
  • Preferably, a block energy or a differential block energy is used as the block fingerprint.
  • As set out above, the data stream with the one or more base channels of Fig. 7b is transmitted to a multichannel reconstructor separately from the data stream with the multichannel information and fingerprint information of Fig. 7c. If nothing else were done, the case could arise that at the multichannel reconstructor, for example at the BCC synthesis block 122 of Fig. 5, block BK5 is about to be processed while, due to some temporal blurring of the multichannel information, block B7 of the multichannel information is present instead of block B5. Without further action, a reconstruction of the block of basic channel data BK5 would therefore be made with the multi-channel information P7, which would lead to artifacts. According to the invention, as will be explained below, an offset of two blocks is now calculated, such that the data stream of Fig. 7c is delayed by two blocks and a multi-channel representation consisting of the data stream of Fig. 7b and the data stream of Fig. 7c is obtained in which the two streams are synchronized with each other.
  • the offset determination according to the invention is not limited to the calculation of an offset as an integer multiple of a block but can, given a sufficiently accurate correlation calculation and a sufficiently large number of block fingerprints (which is of course at the expense of the time needed for calculating the correlation), also achieve an offset accuracy that equals a fraction of a block and can go down to one sample.
  • However, it has been found that such a high accuracy is not necessarily required, and that a synchronization accuracy of +/- half a block (at a block length of 1152 samples) already leads to a multi-channel reconstruction that a listener judges as artifact-free.
  • Fig. 7d shows a preferred embodiment of a block Bi, for example the block B3 of the data stream in Fig. 7c.
  • the block is initiated with a sync word, which may be one byte long, for example.
  • The sync word is followed by length information, since it is preferred, as known in the art, to scale, quantize and entropy-encode the multichannel information P3 after its calculation, so that the length of the multi-channel information, which may be, for example, parameter information but also a waveform signal, e.g. of a side channel, is not known from the outset and therefore must be signaled in the data stream.
  • Following the multichannel information, the block fingerprint according to the invention is then inserted.
  • As set out in Fig. 7d, an absolute energy measure or even a differential energy measure can be introduced as the block fingerprint. In the latter case, the difference between the energy measure for the base channel data BK3 and the energy measure for the base channel data BK2 would be added to the block B3 of the data stream as the block fingerprint.
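  • A sketch of how such a block of the additional-information data stream could be packed and unpacked is given below; the one-byte sync word and the one-byte fingerprint follow the description, while the sync word value, the 16-bit length field and the exact field order are assumptions for illustration:

```python
import struct

SYNC_WORD = 0xA5   # assumed value; the description only says the sync word may be one byte long

def pack_block(multichannel_payload: bytes, block_fingerprint: int) -> bytes:
    """Pack one block Bi: sync word, length information, multichannel
    information Pi, block fingerprint Fi (quantized to 8 bits)."""
    header = struct.pack(">BH", SYNC_WORD, len(multichannel_payload))
    return header + multichannel_payload + struct.pack(">B", block_fingerprint & 0xFF)

def unpack_block(buf: bytes):
    """Decoder-side inverse: separate the multichannel information from the
    keyed-in block fingerprint so that the fingerprint can be used for
    synchronization and then removed from the stream."""
    sync, length = struct.unpack_from(">BH", buf, 0)
    if sync != SYNC_WORD:
        raise ValueError("sync word not found")
    payload = buf[3:3 + length]
    (fingerprint,) = struct.unpack_from(">B", buf, 3 + length)
    return payload, fingerprint
```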
  • Fig. 8 shows a more detailed representation of the synchronizer, the fingerprint generator 11 and the fingerprint extractor 9 of Fig. 2 in cooperation with the multi-channel reconstructor 21.
  • the base channel data is fed to a base channel data buffer 25 and buffered.
  • the additional information or the data stream with the additional information and the fingerprint information is supplied to an additional information buffer 26.
  • Both buffers are generally constructed as FIFO buffers, but the buffer 26 has the further capability that the fingerprint information can be extracted by the reference fingerprint extractor 9 and also removed from the data stream, so that on a buffer output line 27 only multi-channel additional information, without keyed-in fingerprints, can be output.
  • Alternatively, the removal of the fingerprints from the data stream may also be performed by a time shifter 28 or some other element, such that the multi-channel reconstructor 21 is not disturbed by fingerprint bytes during the multi-channel reconstruction.
  • the fingerprint information calculated by the fingerprint generator 11, as well as the fingerprint information obtained by the fingerprint extractor 9, can be fed directly to a correlator 29 within the synchronizer 13 of Fig. 2.
  • the correlator then calculates the offset value and provides it to the time shifter 28 via an offset line 30.
  • the synchronizer 13 is further configured so that, when a valid offset value has been generated and fed to the time shifter 28, an enable signal 31 closes a switch 32, such that the stream of multi-channel additional data from the buffer 26 is fed to the multichannel reconstructor 21 via the time shifter 28 and the switch 32.
  • Upstream of the multichannel reconstructor 21, therefore, only a time delay of the multichannel additional information is performed.
  • Preferably, a multichannel reconstruction is already performed in parallel with the calculation of the correct offset value.
  • Initially, this multichannel reconstruction is merely a "trivial" multichannel reconstruction, because the preferably two stereo base channels are simply output by the multi-channel reconstructor 21. As long as the switch 32 is open, only a stereo output therefore takes place. However, if the switch 32 is closed, the multichannel reconstructor 21 also receives the multichannel additional information in addition to the stereo base channels and can perform a multichannel output, which is now, however, synchronized. A listener only notices this as a switch from stereo quality to multi-channel quality.
  • Alternatively, the output of the multichannel reconstructor 21 may be held back until a valid offset is available. Then already the very first block (BK1 of Fig. 7b) is supplied to the multi-channel reconstructor 21 together with the now correctly delayed multi-channel additional data P1 (Fig. 7c), so that the output only starts when multi-channel data is present. In this embodiment, there is no output of the multichannel reconstructor 21 while the switch is open.
  • In Fig. 9, the functionality of the correlator 29 of Fig. 8 is shown.
  • A sequence of test fingerprint information is provided, as can be seen in the top field of Fig. 9.
  • For each block, these blocks being designated 1, 2, 3, 4, ..., i, a block fingerprint is present.
  • the reference fingerprint determiner 9 also generates a sequence of discrete reference fingerprints that it extracts from the data stream.
  • If differentially encoded fingerprint information is included in the data stream and the correlator is to operate on the basis of absolute fingerprints, a differential decoder 35 in Fig. 8 is activated. Preferably, however, absolute fingerprints, i.e. absolute energy measures, are used in the data stream, since this information about the total energy per block can also be advantageously exploited for level correction purposes by the multi-channel reconstructor 21.
  • The correlator 29 now correlates the curves or sequences of discrete values shown in the two upper fields of Fig. 9 and provides a correlation result, which is shown in the lower field of Fig. 9.
  • the result is a correlation result whose offset component provides exactly the offset between the two fingerprint information sequences. Since the offset here is positive, the multichannel additional information must be shifted in the positive time direction, i.e. delayed. It should be noted that, of course, the basic channel data could instead be shifted in the negative time direction, or that the multi-channel additional information can be shifted by part of the offset in the positive direction while the base channel data is shifted by the remaining part in the negative time direction, as long as the multichannel reconstructor receives a synchronized multi-channel representation at its two inputs.
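  • A minimal sketch of such a correlation-based offset calculation over two sequences of block fingerprints is given below; mean removal and the use of numpy's correlation routine are implementation choices, not prescribed by the patent:

```python
import numpy as np

def offset_by_cross_correlation(test_fp, ref_fp):
    """Estimate the block offset between the test fingerprint sequence
    (computed from the received base channel data) and the reference
    fingerprint sequence (extracted from the additional-information data
    stream).  A positive result means the additional information arrives
    early and has to be delayed by that many blocks."""
    a = np.asarray(test_fp, dtype=float) - np.mean(test_fp)
    b = np.asarray(ref_fp, dtype=float) - np.mean(ref_fp)
    xcorr = np.correlate(a, b, mode="full")      # lags -(len(b)-1) .. +(len(a)-1)
    return int(np.argmax(xcorr)) - (len(b) - 1)  # lag of the correlation maximum
```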
  • the basic channel data is buffered to calculate one fingerprint at a time, after which the block from which a test block fingerprint has just been calculated is fed to the multichannel reconstructor for multichannel reconstruction. Thereafter, the next block of the base channel data is again fed to the buffer 25 so that a block test fingerprint can be calculated from this block again.
  • Depending on the implementation, fewer than 200 blocks or more than 200 blocks may be used. According to the invention, it has been found that a number between 100 and 300 blocks, and preferably 200 blocks, provides a reasonable compromise between the computation time for the correlation and the offset accuracy.
  • Thereafter, a block 37 is entered in which the correlation between the 200 calculated test block fingerprints and the 200 extracted reference block fingerprints is performed by the correlator 29.
  • the offset result obtained there is saved now.
  • Then, in a block 38 corresponding to the block 36, the test fingerprints of the next, for example, 200 blocks of the base channel data are calculated. Accordingly, 200 block fingerprints are again extracted from the data stream with the multi-channel additional information.
  • a correlation is again performed, and the offset result obtained there is stored.
  • Then, the deviation between the offset result obtained from the second 200 blocks and the offset result obtained from the first 200 blocks is determined.
  • A predetermined value for the deviation threshold is, for example, one or two blocks, because if the offset does not change by more than one or two blocks from one calculation to the next, it can be assumed that no error has been made in the correlation calculation.
  • Preferably, a sliding window of, for example, 200 blocks is used. A correlation calculation is made with 200 blocks and a result is obtained. Then the window is advanced by one block, i.e. the oldest block is removed from the set of blocks used for the correlation calculation and the new block is used instead. The result obtained is then stored, together with the previously obtained results, in a histogram. This procedure is repeated for a number of correlation calculations, such as 100 or 200, so that the histogram gradually fills. The peak of the histogram is then used as the calculated offset in order to provide the initial offset or to obtain a dynamic offset adjustment.
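  • The sliding-window histogram procedure could look roughly as follows; the window size, the stopping criterion and the block-synchronous reading of both fingerprint streams are assumptions, and offset_by_cross_correlation refers to the sketch given earlier:

```python
from collections import Counter, deque

def histogram_offset(test_fp_stream, ref_fp_stream, window=200, num_windows=100):
    """Collect the offsets of many overlapping correlation windows in a
    histogram and return its peak as the offset estimate."""
    test_win, ref_win = deque(maxlen=window), deque(maxlen=window)
    votes = Counter()
    for test_fp, ref_fp in zip(test_fp_stream, ref_fp_stream):
        test_win.append(test_fp)
        ref_win.append(ref_fp)
        if len(test_win) == window:
            votes[offset_by_cross_correlation(list(test_win), list(ref_win))] += 1
            if sum(votes.values()) >= num_windows:
                break
    return votes.most_common(1)[0][0] if votes else None
```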
  • the offset calculation taking place in parallel with the output runs in a block 42, and adaptive dynamic offset tracking is achieved as needed in that, when a drifting apart of the data stream with the multichannel information and the data stream with the base channel data has been detected, an updated offset value is supplied via line 30 to the time shifter 28 of Fig. 8.
  • With regard to adaptive tracking, it should be noted that, depending on the implementation, a smoothing of the offset change can also be carried out, so that if a deviation of, for example, two blocks has been determined, the offset is first incremented by one and then incremented again as required, so that the jumps do not become too large.
  • In Fig. 11, a preferred embodiment of the fingerprint generator 2 on the encoder side, as shown in Fig. 1, and of the fingerprint generator 11 of Fig. 2, as used on the decoder side, is shown.
  • the multi-channel audio signal from which the multi-channel additional data is obtained is divided into blocks of fixed size.
  • in parallel with obtaining the multi-channel additional data, a fingerprint is calculated for each block which characterizes the temporal structure of the signal as unambiguously as possible.
  • one embodiment is to use the energy content of the downmix audio signal in the current audio block, for example in logarithmic form, i.e. in a decibel-related representation.
  • the fingerprint is a measure of the temporal envelope of the audio signal.
  • this synchronization information can also be expressed as a difference to the energy value of the previous block, followed by adaptive scaling, quantization and a suitable entropy coding, for example Huffman coding.
  • an energy calculation of the downmix audio signal in the current block is performed, optionally for a stereo signal.
  • for this, e.g. 1152 audio samples from both the left and the right downmix channel are squared and summed (see the fingerprint sketch after this list). s_left(i) denotes a time sample of the left base channel at time i, while s_right(i) denotes a time sample of the right base channel at time i. With a monophonic downmix signal, the summation over the channels is omitted. Furthermore, it is preferred to remove the meaningless DC component of the downmix audio signal before the calculation.
  • the energy is limited to a minimum value for the purpose of the subsequent logarithmic representation.
  • regarding the minimum energy value, it is preferred to use a minimum energy offset so that a meaningful logarithm can be calculated even in the case of zero energy.
  • This energy metric in dB covers a range of 0 to 90 (dB) with an audio signal resolution of 16 bits.
  • this step is, for example, already completed in the encoder.
  • the fingerprint consists of difference coded values.
  • alternatively, this step can also be implemented purely on the decoder side.
  • in that case, the transmitted fingerprint consists of non-differentially coded values, and the difference is only formed in the decoder. The latter option has the advantage that the fingerprint contains information about the absolute energy of the downmix signal; however, a slightly higher fingerprint word length is typically required.
  • a quantization of the fingerprint is performed. To prepare the fingerprint for embedding in the multi-channel additional information, it is quantized to 8 bits. In practice, this reduced fingerprint resolution has proven to be a good compromise between bit demand and reliability of the delay detection. Values greater than 255 are limited to a maximum of 255 with a saturation characteristic (see the quantization sketch after this list).
  • optionally, an entropy coding of the fingerprint can additionally be applied.
  • the bit requirement of the quantized fingerprint can be further reduced.
  • a suitable entropy coding method is, for example, Huffman coding or arithmetic coding. Statistically different frequencies of fingerprint values can be expressed by different code lengths and thus reduce the bit requirement of the fingerprint representation on average.
  • the calculation of the multi-channel additional data is performed using the multi-channel audio data.
  • the calculated multi-channel additional information is then extended by the newly added synchronization information through suitable embedding in the bit stream.
  • the receiver is now able to detect a time offset between the downmix signal and the additional data and to realize a time-correct alignment, i.e. a delay compensation between the stereo audio signals and the multi-channel additional information, with an accuracy on the order of +/- 1/2 audio block.
  • the inventive method for generating or decoding can be implemented in hardware or in software.
  • the implementation may be on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which may interact with a programmable computer system such that the method is performed.
  • the invention thus also consists in a computer program product with a program code stored on a machine-readable carrier for carrying out the method when the computer program product runs on a computer.
  • the invention can thus be realized as a computer program with a program code for carrying out the method when the computer program runs on a computer.
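
As an illustration of the fingerprint described above (energy of the downmix signal per block in a decibel-related representation, with removal of the DC component and a minimum energy limitation), the following Python sketch shows one possible realization. The block length of 1152 samples is the example value from the text; the function name, the minimum-energy constant and the use of NumPy are purely illustrative assumptions and not taken from the patent.

```python
import numpy as np

BLOCK_LEN = 1152      # samples per block, as in the example above
MIN_ENERGY = 1.0      # illustrative minimum energy floor so that log10(0) never occurs

def block_fingerprint_db(left, right=None):
    """Energy of one downmix block in dB, used as a temporal-envelope fingerprint."""
    block = np.asarray(left, dtype=np.float64)
    if right is not None:                    # stereo downmix: include the right channel as well
        block = np.concatenate([block, np.asarray(right, dtype=np.float64)])
    block = block - block.mean()             # remove the meaningless DC component
    energy = float(np.sum(block ** 2))       # sum of the squared samples of the block
    energy = max(energy, MIN_ENERGY)         # minimum limitation for the logarithm
    return 10.0 * np.log10(energy)           # decibel-related representation of the block energy
```

For a monophonic downmix, the right-channel argument is simply omitted, which corresponds to dropping the channel summation mentioned above.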
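The 8-bit quantization with saturation at 255 and the difference coding relative to the previous block could be sketched as follows. The quantizer step of roughly 1 dB per code value and the handling of the first value are assumptions, since the concrete scaling is not fixed in the text above.

```python
def quantize_to_8bit(fingerprints_db):
    """Saturating quantization of dB fingerprints to the 8-bit range 0..255."""
    quantized = []
    for value in fingerprints_db:
        q = int(round(value))                  # assumed step size of roughly 1 dB per code value
        quantized.append(max(0, min(255, q)))  # saturation characteristic: clip at 0 and 255
    return quantized

def difference_code(values):
    """Difference coding relative to the previous block (first value kept absolute)."""
    return [values[0]] + [values[i] - values[i - 1] for i in range(1, len(values))]
```

Whether difference_code is applied before transmission or only in the decoder corresponds to the two variants described above; in the first case the fingerprint no longer carries information about the absolute energy of the downmix signal.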
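A minimal sketch of the correlation step between the test block fingerprints and the reference block fingerprints might look as follows. The use of NumPy's cross-correlation and the sign convention of the returned lag are assumptions; the description above only states that the correlation result yields the offset between the two fingerprint curves.

```python
import numpy as np

def estimate_offset(test_fp, ref_fp):
    """Return the lag (in blocks) of the cross-correlation maximum between the
    fingerprints computed from the base channel data (test) and the fingerprints
    extracted from the multi-channel additional information (reference)."""
    test = np.asarray(test_fp, dtype=np.float64)
    ref = np.asarray(ref_fp, dtype=np.float64)
    test = test - test.mean()                    # remove the mean so the correlation peak
    ref = ref - ref.mean()                       # reflects similarity of the temporal envelopes
    corr = np.correlate(test, ref, mode="full")  # lags from -(len(ref)-1) to len(test)-1
    return int(np.argmax(corr)) - (len(ref) - 1) # which stream to delay follows from the sign
```

Called with e.g. 200 test and 200 reference fingerprints, this corresponds to the correlation performed in block 37.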
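The sliding-window histogram described above (advance by one block, correlate again, collect the offsets, take the histogram peak) can be sketched like this. It reuses the estimate_offset helper from the previous sketch; the window size of 200 blocks and the 100 correlation runs are the example values from the text, everything else is illustrative.

```python
from collections import Counter

def histogram_offset(test_fp, ref_fp, window=200, num_runs=100):
    """Collect per-window correlation offsets in a histogram and return its peak."""
    histogram = Counter()
    for start in range(num_runs):
        t = test_fp[start:start + window]
        r = ref_fp[start:start + window]
        if len(t) < window or len(r) < window:
            break                                 # not enough blocks left for a full window
        histogram[estimate_offset(t, r)] += 1     # one offset estimate per window position
    if not histogram:
        return None                               # no complete window could be evaluated
    return histogram.most_common(1)[0][0]         # the histogram peak is the offset estimate
```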
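Finally, the smoothed adaptive tracking (never jump by more than one block per update) could be expressed as follows. The step limit of one block matches the example above, while the function name and the update interface are assumptions.

```python
def track_offset(current_offset, measured_offset, max_step=1):
    """Move the applied offset towards the newly measured offset by at most
    max_step blocks per update, so that a measured change of e.g. two blocks
    is applied as two consecutive single-block steps."""
    if measured_offset > current_offset:
        return current_offset + min(max_step, measured_offset - current_offset)
    if measured_offset < current_offset:
        return current_offset - min(max_step, current_offset - measured_offset)
    return current_offset
```

The value returned here would take the role of the updated offset supplied via line 30 to the time shifter 28.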

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Stereophonic System (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Time-Division Multiplex Systems (AREA)
  • Synchronisation In Digital Transmission Systems (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Studio Circuits (AREA)
  • Television Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
EP06707562A 2005-03-30 2006-03-15 Vorrichtung und verfahren zum erzeugen eines datenstroms und zum erzeugen einer multikanal-darstellung Active EP1864279B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102005014477A DE102005014477A1 (de) 2005-03-30 2005-03-30 Vorrichtung und Verfahren zum Erzeugen eines Datenstroms und zum Erzeugen einer Multikanal-Darstellung
PCT/EP2006/002369 WO2006102991A1 (de) 2005-03-30 2006-03-15 Vorrichtung und verfahren zum erzeugen eines datenstroms und zum erzeugen einer multikanal-darstellung

Publications (2)

Publication Number Publication Date
EP1864279A1 EP1864279A1 (de) 2007-12-12
EP1864279B1 true EP1864279B1 (de) 2009-06-17

Family

ID=36598142

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06707562A Active EP1864279B1 (de) 2005-03-30 2006-03-15 Vorrichtung und verfahren zum erzeugen eines datenstroms und zum erzeugen einer multikanal-darstellung

Country Status (12)

Country Link
US (1) US7903751B2 (zh)
EP (1) EP1864279B1 (zh)
JP (1) JP5273858B2 (zh)
CN (1) CN101189661B (zh)
AT (1) ATE434253T1 (zh)
AU (1) AU2006228821B2 (zh)
CA (1) CA2603027C (zh)
DE (2) DE102005014477A1 (zh)
HK (1) HK1111259A1 (zh)
MY (1) MY139836A (zh)
TW (1) TWI318845B (zh)
WO (1) WO2006102991A1 (zh)

Families Citing this family (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1962082A1 (de) 2007-02-21 2008-08-27 Agfa HealthCare N.V. System und Verfahren zur optischen Kohärenztomographie
US8612237B2 (en) * 2007-04-04 2013-12-17 Apple Inc. Method and apparatus for determining audio spatial quality
US8566108B2 (en) 2007-12-03 2013-10-22 Nokia Corporation Synchronization of multiple real-time transport protocol sessions
DE102008009024A1 (de) * 2008-02-14 2009-08-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum synchronisieren von Mehrkanalerweiterungsdaten mit einem Audiosignal und zum Verarbeiten des Audiosignals
DE102008009025A1 (de) * 2008-02-14 2009-08-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum Berechnen eines Fingerabdrucks eines Audiosignals, Vorrichtung und Verfahren zum Synchronisieren und Vorrichtung und Verfahren zum Charakterisieren eines Testaudiosignals
CN101809656B (zh) * 2008-07-29 2013-03-13 松下电器产业株式会社 音响编码装置、音响解码装置、音响编码解码装置及会议系统
CN102177726B (zh) * 2008-08-21 2014-12-03 杜比实验室特许公司 用于音频和视频签名生成和检测的特征优化和可靠性估计
HUE041788T2 (hu) * 2008-10-06 2019-05-28 Ericsson Telefon Ab L M Eljárás és berendezés igazított többcsatornás hang szállítására
CN103177725B (zh) * 2008-10-06 2017-01-18 爱立信电话股份有限公司 用于输送对齐的多通道音频的方法和设备
WO2010103442A1 (en) * 2009-03-13 2010-09-16 Koninklijke Philips Electronics N.V. Embedding and extracting ancillary data
GB2470201A (en) * 2009-05-12 2010-11-17 Nokia Corp Synchronising audio and image data
US8436939B2 (en) * 2009-10-25 2013-05-07 Tektronix, Inc. AV delay measurement and correction via signature curves
US9426574B2 (en) * 2010-03-19 2016-08-23 Bose Corporation Automatic audio source switching
EP2458890B1 (en) * 2010-11-29 2019-01-23 Nagravision S.A. Method to trace video content processed by a decoder
US9075806B2 (en) * 2011-02-22 2015-07-07 Dolby Laboratories Licensing Corporation Alignment and re-association of metadata for media streams within a computing device
KR101748760B1 (ko) * 2011-03-18 2017-06-19 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에.베. 오디오 콘텐츠를 표현하는 비트스트림의 프레임들 내의 프레임 요소 배치
US8832039B1 (en) 2011-06-30 2014-09-09 Amazon Technologies, Inc. Methods and apparatus for data restore and recovery from a remote data store
US8706834B2 (en) 2011-06-30 2014-04-22 Amazon Technologies, Inc. Methods and apparatus for remotely updating executing processes
US10754813B1 (en) 2011-06-30 2020-08-25 Amazon Technologies, Inc. Methods and apparatus for block storage I/O operations in a storage gateway
US8806588B2 (en) 2011-06-30 2014-08-12 Amazon Technologies, Inc. Storage gateway activation process
US8639921B1 (en) 2011-06-30 2014-01-28 Amazon Technologies, Inc. Storage gateway security model
US9294564B2 (en) 2011-06-30 2016-03-22 Amazon Technologies, Inc. Shadowing storage gateway
US8639989B1 (en) 2011-06-30 2014-01-28 Amazon Technologies, Inc. Methods and apparatus for remote gateway monitoring and diagnostics
US8793343B1 (en) 2011-08-18 2014-07-29 Amazon Technologies, Inc. Redundant storage gateways
US8789208B1 (en) 2011-10-04 2014-07-22 Amazon Technologies, Inc. Methods and apparatus for controlling snapshot exports
US9635132B1 (en) 2011-12-15 2017-04-25 Amazon Technologies, Inc. Service and APIs for remote volume-based block storage
KR20130101629A (ko) * 2012-02-16 2013-09-16 삼성전자주식회사 보안 실행 환경 지원 휴대단말에서 컨텐츠 출력 방법 및 장치
EP2670157B1 (en) * 2012-06-01 2019-10-02 Koninklijke KPN N.V. Fingerprint-based inter-destination media synchronization
CN102820964B (zh) * 2012-07-12 2015-03-18 武汉滨湖电子有限责任公司 一种基于系统同步与参考通道的多通道数据对齐的方法
EP2693392A1 (en) 2012-08-01 2014-02-05 Thomson Licensing A second screen system and method for rendering second screen information on a second screen
CN102937938B (zh) * 2012-11-29 2015-05-13 北京天诚盛业科技有限公司 指纹处理装置及其控制方法和控制装置
TWI557727B (zh) 2013-04-05 2016-11-11 杜比國際公司 音訊處理系統、多媒體處理系統、處理音訊位元流的方法以及電腦程式產品
JP6349977B2 (ja) * 2013-10-21 2018-07-04 ソニー株式会社 情報処理装置および方法、並びにプログラム
US20150302086A1 (en) 2014-04-22 2015-10-22 Gracenote, Inc. Audio identification during performance
US20160344902A1 (en) * 2015-05-20 2016-11-24 Gwangju Institute Of Science And Technology Streaming reproduction device, audio reproduction device, and audio reproduction method
EP3115932A1 (en) * 2015-07-07 2017-01-11 Idex Asa Image reconstruction
RU2718418C2 (ru) * 2015-11-09 2020-04-02 Сони Корпорейшн Устройство декодирования, способ декодирования и программа
EP3249646B1 (en) * 2016-05-24 2019-04-17 Dolby Laboratories Licensing Corp. Measurement and verification of time alignment of multiple audio channels and associated metadata
US10015612B2 (en) 2016-05-25 2018-07-03 Dolby Laboratories Licensing Corporation Measurement, verification and correction of time alignment of multiple audio channels and associated metadata
EP3324407A1 (en) * 2016-11-17 2018-05-23 Fraunhofer Gesellschaft zur Förderung der Angewand Apparatus and method for decomposing an audio signal using a ratio as a separation characteristic
EP3324406A1 (en) 2016-11-17 2018-05-23 Fraunhofer Gesellschaft zur Förderung der Angewand Apparatus and method for decomposing an audio signal using a variable threshold
CN112986963B (zh) * 2021-02-08 2024-05-03 武汉徕得智能技术有限公司 一种激光脉冲测距回波信号多路缩放结果选择控制方法
CN112995708A (zh) * 2021-04-21 2021-06-18 湖南快乐阳光互动娱乐传媒有限公司 一种多视频同步方法及装置
CN114003546B (zh) * 2022-01-04 2022-04-12 之江实验室 一种多通道开关量复合编码设计方法和装置

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000155598A (ja) * 1998-11-19 2000-06-06 Matsushita Electric Ind Co Ltd 多チャンネル・オーディオ信号の符号化/復号化方法と装置
CA2859333A1 (en) * 1999-04-07 2000-10-12 Dolby Laboratories Licensing Corporation Matrix improvements to lossless encoding and decoding
US7013301B2 (en) * 2003-09-23 2006-03-14 Predixis Corporation Audio fingerprinting system and method
US6990453B2 (en) * 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
TW510144B (en) 2000-12-27 2002-11-11 C Media Electronics Inc Method and structure to output four-channel analog signal using two channel audio hardware
US7461002B2 (en) * 2001-04-13 2008-12-02 Dolby Laboratories Licensing Corporation Method for time aligning audio signals using characterizations based on auditory events
US7116787B2 (en) * 2001-05-04 2006-10-03 Agere Systems Inc. Perceptual synthesis of auditory scenes
US20030035553A1 (en) * 2001-08-10 2003-02-20 Frank Baumgarte Backwards-compatible perceptual coding of spatial cues
US7006636B2 (en) * 2002-05-24 2006-02-28 Agere Systems Inc. Coherence-based audio coding and synthesis
US7292901B2 (en) * 2002-06-24 2007-11-06 Agere Systems Inc. Hybrid multi-channel/cue coding/decoding of audio signals
CN1315110C (zh) * 2002-04-25 2007-05-09 兰德马克数字服务有限责任公司 坚固而且不变的音频图样匹配
EP1506550A2 (en) * 2002-05-16 2005-02-16 Koninklijke Philips Electronics N.V. Signal processing method and arrangement
KR20060037403A (ko) * 2003-07-25 2006-05-03 코닌클리케 필립스 일렉트로닉스 엔.브이. 오디오 및 비디오를 동기화시키기 위하여 핑거프린트들을생성하여 검출하는 방법 및 장치
WO2005086139A1 (en) 2004-03-01 2005-09-15 Dolby Laboratories Licensing Corporation Multichannel audio coding
DE102004046746B4 (de) * 2004-09-27 2007-03-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Verfahren zum Synchronisieren von Zusatzdaten und Basisdaten
US7567899B2 (en) * 2004-12-30 2009-07-28 All Media Guide, Llc Methods and apparatus for audio recognition

Also Published As

Publication number Publication date
CN101189661B (zh) 2011-10-26
US7903751B2 (en) 2011-03-08
AU2006228821B2 (en) 2009-07-23
JP2008538239A (ja) 2008-10-16
JP5273858B2 (ja) 2013-08-28
US20080013614A1 (en) 2008-01-17
CA2603027C (en) 2012-09-11
AU2006228821A1 (en) 2006-10-05
EP1864279A1 (de) 2007-12-12
CN101189661A (zh) 2008-05-28
TW200644704A (en) 2006-12-16
HK1111259A1 (en) 2008-08-01
DE502006003997D1 (de) 2009-07-30
TWI318845B (en) 2009-12-21
ATE434253T1 (de) 2009-07-15
CA2603027A1 (en) 2006-10-05
WO2006102991A1 (de) 2006-10-05
MY139836A (en) 2009-10-30
DE102005014477A1 (de) 2006-10-12

Similar Documents

Publication Publication Date Title
EP1864279B1 (de) Vorrichtung und verfahren zum erzeugen eines datenstroms und zum erzeugen einer multikanal-darstellung
EP2240929B1 (de) Vorrichtung und verfahren zum synchronisieren von mehrkanalerweiterungsdaten mit einem audiosignal und zum verarbeiten des audiosignals
EP2240928B1 (de) Vorrichtung und verfahren zum berechnen eines fingerabdrucks eines audiosignals, vorrichtung und verfahren zum synchronisieren und vorrichtung und verfahren zum charakterisieren eines testaudiosignals
EP1687809B1 (de) Vorrichtung und verfahren zur wiederherstellung eines multikanal-audiosignals und zum erzeugen eines parameterdatensatzes hierfür
DE602004008613T2 (de) Treueoptimierte kodierung mit variabler rahmenlänge
DE602005006424T2 (de) Stereokompatible mehrkanal-audiokodierung
EP1794564B1 (de) Vorrichtung und verfahren zum synchronisieren von zusatzdaten und basisdaten
EP1763870B1 (de) Erzeugung eines codierten multikanalsignals und decodierung eines codierten multikanalsignals
DE602004004168T2 (de) Kompatible mehrkanal-codierung/-decodierung
EP1854334B1 (de) Vorrichtung und verfahren zum erzeugen eines codierten stereo-signals eines audiostücks oder audiodatenstroms
DE69210689T2 (de) Kodierer/dekodierer für mehrdimensionale schallfelder
DE602004002390T2 (de) Audiocodierung
DE602006000239T2 (de) Energieabhängige quantisierung für effiziente kodierung räumlicher audioparameter
EP0750811B1 (de) Verfahren zum codieren mehrerer audiosignale
EP2005421A1 (de) Vorrichtung und verfahren zum erzeugen eines umgebungssignals
DE60024729T2 (de) System und verfahren zum effizienten antialiasing im zeitbereich (tdac)
JP2017532603A (ja) オーディオ信号のエンコードおよびデコード
WO1993025015A1 (de) Verfahren zur reduzierung von daten bei der übertragung und/oder speicherung digitaler signale mehrerer voneinander abhängiger kanäle
DE102007029381A1 (de) Digitalsignal-Verarbeitungsvorrichtung, Digitalsignal-Verarbeitungsverfahren, Digitalsignal-Verarbeitungsprogramm, Digitalsignal-Wiedergabevorrichtung und Digitalsignal-Wiedergabeverfahren
EP1430750B1 (de) Verfahren und vorrichtung zur auswahl eines klangalgorithmus
DE102020210917B4 (de) Verbesserter M/S-Stereo-Codierer und -Decodierer
DE602004006401T2 (de) Aktualisieren eines verborgenen datenkanals
DE202004003000U1 (de) Vorrichtung zum Beschreiben einer Audio-CD und Audio-CD

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20070913

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1111259

Country of ref document: HK

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: GERMAN

REF Corresponds to:

Ref document number: 502006003997

Country of ref document: DE

Date of ref document: 20090730

Kind code of ref document: P

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1111259

Country of ref document: HK

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090917

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20091017

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090928

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20091017

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090917

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

26N No opposition filed

Effective date: 20100318

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090918

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20091218

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20090617

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 11

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 12

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: CH

Payment date: 20230402

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: LU

Payment date: 20240321

Year of fee payment: 19

Ref country code: IE

Payment date: 20240319

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: AT

Payment date: 20240318

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: MC

Payment date: 20240320

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20240321

Year of fee payment: 19

Ref country code: GB

Payment date: 20240322

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20240320

Year of fee payment: 19

Ref country code: BE

Payment date: 20240320

Year of fee payment: 19