EP1354314A2

EP1354314A2 - Method and device for producing a scalable data stream, and method and device for decoding a scalable data stream while taking a bit bank function into account

Info

Publication number: EP1354314A2
Application number: EP02708282A
Authority: EP
Inventors: Ralph Sperschneider; Bodo Teichmann; Manfred Lutzky; Bernhard Grill
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2001-01-18
Filing date: 2002-01-14
Publication date: 2003-10-22
Anticipated expiration: 2022-01-14
Also published as: KR20030076614A; CA2434783A1; ATE272884T1; US7496517B2; CA2434783C; DE50200750D1; EP1354314B1; WO2002058051A3; JP3890298B2; US20040107289A1; DE10102154C2; KR100516985B1; DE10102154A1; AU2002242667B2; WO2002058051A2; JP2004520739A; HK1056790A1

Abstract

In a method for generating a scalable data stream from one or several blocks of output data of a first encoder and from one or several blocks of output data of a second encoder, a determining data block for a current section of an input signal is written. In addition, output data of the second encoder representing a preceding section of the input signal are written after the determining data block in the transmission direction from an encoder to a decoder. Once the output data of the second encoder for the preceding section of the input signal have been written, the output data of the second encoder representing the current section of the input signal are written. In order to signal where the output data of the second encoder for the preceding section end and where the output data of the second encoder for the current section begin, buffer information is written into the scalable data stream. Because output data of a preceding section follow a determining data block for the current section, a bit savings bank function can be implemented in the scalable encoder and simply be signaled in the bit stream.

Description

Method and device for generating a scalable data stream and method and device for decoding a scalable data stream taking into account a bit savings bank function


The present invention relates to scalable encoders and decoders and in particular to the generation of scalable data streams by means of which a bit savings bank can be signaled.

Scalable encoders are shown in EP 0 846 375 B1. In general, scalability is understood to mean the possibility of decoding a subset of a bit stream that represents an encoded data signal, e.g. an audio signal or a video signal, into a usable signal. This property is particularly desirable when, e.g., a data transmission channel does not provide the full bandwidth required to transmit the complete bit stream. On the other hand, it also permits incomplete decoding on a decoder of lower complexity. In practice, different discrete scalability layers are generally defined.

An example of a scalable encoder as defined in subpart 4 (General Audio) of part 3 (Audio) of the MPEG-4 standard (ISO/IEC 14496-3:1999 subpart 4) is shown in FIG. 1. An audio signal s(t) to be coded is fed into the scalable encoder on the input side. The scalable encoder shown in FIG. 1 includes a first encoder 12, which is an MPEG CELP encoder. The second encoder 14 is an AAC encoder that provides high-quality audio coding and is defined in the MPEG-2 AAC standard (ISO/IEC 13818). The CELP encoder 12 supplies a first scaling layer via an output line 16, while the AAC encoder 14 supplies a second scaling layer via a second output line 18 to a bitstream multiplexer (BitMux) 20. On the output side, the bitstream multiplexer then outputs an MPEG-4 LATM bitstream 22 (LATM = Low-overhead MPEG-4 Audio Transport Multiplex). The LATM format is described in section 6.5 of part 3 (Audio) of the first amendment to the MPEG-4 standard (ISO/IEC 14496-3:1999/AMD1:2000).

The scalable audio encoder also includes some further elements. First, there is a delay stage 24 in the AAC branch and a delay stage 26 in the CELP branch. An optional delay can be set for each branch by means of these two delay stages. The delay stage 26 of the CELP branch is followed by a downsampling stage 28 in order to adapt the sampling rate of the input signal s(t) to the sampling rate required by the CELP encoder. A CELP decoder 30 is connected downstream of the CELP encoder 12, the CELP-coded/decoded signal being fed to an upsampling stage 32. The upsampled signal is then fed to a further delay stage 34, which is referred to in the MPEG-4 standard as "CoreCoderDelay".

The CoreCoderDelay stage 34 has the following function. If the delay is set to zero, the first encoder 12 and the second encoder 14 process exactly the same samples of the audio input signal in a so-called superframe. A superframe can consist, for example, of three AAC frames, which together represent a certain number of samples No. x to No. y of the audio signal. The superframe also includes, e.g., 8 CELP blocks, which in the case of CoreCoderDelay = 0 represent the same number of samples and also the same samples No. x to No. y.

If, on the other hand, a CoreCoderDelay D other than zero is set as a time value, the three AAC frames nevertheless represent the same samples No. x to No. y. The eight CELP blocks, on the other hand, represent samples No. x - Fs·D to No. y - Fs·D, where Fs is the sampling frequency of the input signal.
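The relation between the two sample ranges can be illustrated with a minimal sketch. The function name and the example numbers (sampling rate, delay, sample indices) are assumptions chosen for illustration only; they are not taken from the standard or from the figures.

```python
def celp_sample_range(x, y, fs_hz, delay_s):
    """Return (first, last) sample numbers covered by the CELP blocks of a
    superframe whose AAC frames cover samples x..y, for a CoreCoderDelay of
    delay_s seconds at sampling frequency fs_hz."""
    shift = round(fs_hz * delay_s)  # Fs * D expressed in samples
    return x - shift, y - shift


# Hypothetical example: the AAC frames cover samples 2880..5759 at 48 kHz.
aac_range = (2880, 5759)
no_delay = celp_sample_range(*aac_range, fs_hz=48000, delay_s=0.0)
with_delay = celp_sample_range(*aac_range, fs_hz=48000, delay_s=0.0025)
```

In both cases the number of samples covered is the same; only the position of the CELP range relative to the AAC range changes.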

The current time segments of the input signal in a superframe for the AAC blocks and the CELP blocks can thus either be identical, if CoreCoderDelay D = 0, or, in the case of D not equal to zero, be shifted relative to one another by CoreCoderDelay. For reasons of simplicity, however, CoreCoderDelay = 0 is assumed for the following explanations without loss of generality, so that the current time period of the input signal for the first encoder and the current time period for the second encoder are identical. In general, however, the only requirement for a superframe is that the AAC block(s) and the CELP block(s) in a superframe represent the same number of samples; the samples themselves need not be identical but may also be shifted relative to one another by CoreCoderDelay.

It should be noted that the CELP encoder 12, depending on the configuration, processes a section of the input signal s(t) faster than the AAC encoder 14. In the AAC branch, the optional delay stage 24 is followed by a block decision stage 26 which, among other things, determines whether short or long windows should be used for windowing the input signal s(t). Short windows should be selected for strongly transient signals, while long windows are preferred for less transient signals, since their ratio of payload data to side information is better than with short windows.

The block decision stage 26 introduces a fixed delay of, e.g., 5/8 of a block. This is referred to in the art as the look-ahead function. The block decision stage has to look ahead for a certain time in order to be able to determine whether there are transient signals in the future that must be coded with short windows. Thereupon both the corresponding signal in the CELP branch and the signal in the AAC branch are fed to a device for converting the temporal representation into a spectral representation, which is designated in FIG. 1 as MDCT 36 and 38, respectively (MDCT = Modified Discrete Cosine Transform). The output signals of the MDCT blocks 36, 38 are then fed to a subtractor 40.

At this point, samples that belong together in time must be available, i.e. the delay must be identical in both branches.

The subsequent block 44 determines whether it is more favorable to feed the input signal per se to the AAC encoder 14. This is made possible by the bypass branch 42. However, if it is determined that the difference signal at the output of the subtractor 40 is, e.g., lower in energy than the signal output by the MDCT block 38, it is not the input signal per se but the difference signal that is taken and encoded by the AAC encoder 14 in order to finally form the second scaling layer 18. This comparison can be carried out band by band, which is indicated by a frequency-selective switching device (FSS) 44. The detailed functions of the individual elements are known in the art and are described, for example, in the MPEG-4 standard and in further MPEG standards.
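The band-wise decision described above can be sketched as follows. This is only an illustration under stated assumptions: energy is used as the criterion because the text names it as an example, and the function names, the band layout and the coefficient values are hypothetical rather than taken from the MPEG-4 standard.

```python
def band_energy(coeffs):
    """Sum of squared spectral coefficients of one band."""
    return sum(c * c for c in coeffs)


def frequency_selective_switch(original_bands, difference_bands):
    """For each band, pick the difference signal when it has lower energy
    than the original spectrum; otherwise pick the original (bypass)."""
    chosen, flags = [], []
    for orig, diff in zip(original_bands, difference_bands):
        use_diff = band_energy(diff) < band_energy(orig)
        chosen.append(diff if use_diff else orig)
        flags.append(use_diff)
    return chosen, flags


# Hypothetical MDCT coefficients, grouped into three bands.
original = [[4.0, -3.0], [0.5, 0.2], [1.0, 1.0]]
difference = [[1.0, 0.5], [0.6, -0.4], [1.0, -1.0]]
chosen, flags = frequency_selective_switch(original, difference)
```

The per-band flags would have to be conveyed to the decoder as side information so that it knows in which bands to add the decoded first layer back in.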

An essential feature of the MPEG-4 standard and also of other encoder standards is that the transmission of the compressed data signal should take place over a channel with a constant bit rate. All high-quality audio codecs work block-based, i.e. they process blocks of audio data (on the order of 480 to 1024 samples) into pieces of a compressed bit stream, which are also referred to as frames. The bitstream format must be structured such that a decoder without a priori information about where a frame begins is able to recognize the beginning of a frame in order to start outputting the decoded audio signal data with the least possible delay. Therefore, each header or determination data block of a frame begins with a particular synchronization word that can be searched for in a continuous bit stream. Further common components of the data stream, in addition to the determination data block, are the main data or "payload data" of the individual layers, which contain the actual compressed audio data.
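How a decoder might search a continuous stream for the synchronization word can be sketched as follows. The two-byte, byte-aligned pattern used here is purely hypothetical and does not correspond to the sync word of any particular standard; real formats may use sync words that are not byte-aligned, and a real decoder additionally has to guard against the pattern occurring by chance inside the payload.

```python
SYNC_WORD = b"\xa5\x5a"  # hypothetical byte-aligned sync pattern, illustration only


def find_frame_starts(stream: bytes, sync: bytes = SYNC_WORD):
    """Return the byte offsets at which the sync word occurs in the stream."""
    starts = []
    pos = stream.find(sync)
    while pos != -1:
        starts.append(pos)
        pos = stream.find(sync, pos + 1)
    return starts


# A decoder tuning in mid-stream sees two stray bytes before the first header.
stream = b"\x12\x34" + SYNC_WORD + b"\x01\x02\x03" + SYNC_WORD + b"\x04"
frame_starts = find_frame_starts(stream)
```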

FIG. 4 shows a bit stream format with a fixed frame length. In this bitstream format, the headers or determination data blocks are inserted equidistantly into the bitstream. The side information and main data associated with a header follow immediately behind it. The length, i.e. the number of bits, of the main data is the same in every frame. Such a bit stream format, as shown in FIG. 4, is used for example in MPEG Layer 2 or MPEG CELP.

FIG. 5 shows another bit stream format with a fixed frame length and a back pointer. In this bitstream format, the header and side information are arranged equidistantly as in the format shown in FIG. 4. The start of the associated main data, however, occurs immediately after a header only in exceptional cases. In most cases, the start is in one of the previous frames. The number of bits by which the start of the main data is shifted in the bit stream is transmitted by the side information variable "back pointer". The end of this main data can be in this frame or in a previous frame. The length of the main data is thus no longer constant, so the number of bits with which a block is encoded can be adapted to the properties of the signal. At the same time, however, a constant bit rate can be achieved. This technique is called the "bit savings bank" and increases the theoretical delay in the transmission chain. Such a bitstream format is used for example in MPEG Layer 3 (MP3). The bit savings bank technique is also described in the MPEG Layer 3 standard.

Generally speaking, the bit savings bank represents a buffer of bits that can be used to provide more bits for coding a block of temporal samples than are actually permitted by the constant output data rate. The bit savings bank technique takes into account the fact that some blocks of audio samples can be coded with fewer bits than specified by the constant transmission rate, so that these blocks fill the bit savings bank, while other blocks of audio samples have psychoacoustic properties that do not allow such a large compression, so that the available bits would not be sufficient for low-interference or interference-free coding of these blocks. The required surplus bits are taken from the bit savings bank, so that the bit savings bank is emptied by such blocks.
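The filling and emptying of the bit savings bank can be illustrated with a small simulation. This is a simplified sketch under assumptions: the function name, the starting fullness of zero and all numbers are hypothetical, and a real encoder derives the bit demand of a block from its psychoacoustic model rather than receiving it as a list.

```python
def simulate_bit_reservoir(demands, bits_per_frame, max_fullness):
    """Track the fullness of the bit savings bank for per-block bit demands.

    Each block is granted min(demand, bits_per_frame + fullness) bits.
    Unused bits fill the bank (capped at max_fullness); blocks that need
    more than bits_per_frame drain it.
    Returns a list of (granted_bits, fullness_after) tuples."""
    fullness = 0
    history = []
    for demand in demands:
        available = bits_per_frame + fullness
        granted = min(demand, available)
        fullness = min(max_fullness, available - granted)
        history.append((granted, fullness))
    return history


# Two easy blocks save bits; the third, demanding block spends the savings.
history = simulate_bit_reservoir([400, 300, 900, 500],
                                 bits_per_frame=500, max_fullness=300)
```

In this run the third block asks for 900 bits but only 800 are available (500 from the constant rate plus 300 saved), so it is granted 800 and the bank is empty afterwards.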

However, as shown in FIG. 6, such an audio signal could also be transmitted in a format with a variable frame length. With the "variable frame length" bit stream format shown in FIG. 6, the fixed order of the bit stream elements header, side information and main data is maintained as with the "fixed frame length" format. Since the length of the main data is not constant, the bit savings bank technique can also be used here, but no back pointers as in FIG. 5 are required. An example of a bit stream format as shown in FIG. 6 is the transport format ADTS (Audio Data Transport Stream), as defined in the MPEG-2 AAC standard.

It should be noted that none of the aforementioned encoders is a scalable encoder; each comprises only a single audio encoder. MPEG-4 provides for the combination of different encoders/decoders to form a scalable encoder/decoder. It is possible and useful to combine a CELP speech coder as the first coder with an AAC coder for the further scaling layer or layers and to package them in one bit stream. The purpose of this combination is to make it possible either to decode all scaling layers and thus achieve the best possible audio quality, or to decode only parts of them, possibly only the first scaling layer with correspondingly limited audio quality. One reason for decoding only the lowest scaling layer can be that the decoder has received only the first scaling layer of the bit stream because the bandwidth of the transmission channel is too small. For this reason, the portions of the first scaling layer in the bit stream are given priority over the second and further scaling layers during transmission, which ensures the transmission of the first scaling layer in the event of capacity bottlenecks in the transmission network, while the second scaling layer may be lost in whole or in part.

Another reason may be that a decoder wants to achieve the lowest possible codec delay and therefore only decodes the first scaling layer. It should be noted that the codec delay of a Celp codec is generally significantly smaller than the delay of the AAC codec.

The MPEG 4 version 2 standardizes the LATM transport format, which can also transmit scalable data streams.

In the following, reference is made to FIG. 2a. FIG. 2a is a schematic representation of the samples of the input signal s(t). The input signal can be divided into different successive sections 0, 1, 2, 3, each section having a certain fixed number of temporal samples. Typically, the AAC encoder 14 (FIG. 1) processes an entire section 0, 1, 2 or 3 to provide an encoded data signal for that section. However, the CELP encoder 12 (FIG. 1) usually processes a smaller number of temporal samples per coding step. For example, FIG. 2b shows that the CELP encoder, or generally speaking the first encoder or coder 1, has a block length which is one quarter of the block length of the second encoder. It should be noted that this division is completely arbitrary. The block length of the first encoder could also be half as long, or one eleventh of the block length of the second encoder. Thus, the first encoder will generate four blocks (11, 12, 13, 14) from the section of the input signal from which the second encoder supplies one block of data. A conventional LATM bitstream format is shown in FIG. 2c.

A superframe can have different ratios of the number of AAC frames to the number of CELP frames, as tabulated in MPEG-4. Thus, a superframe can have, e.g., one AAC block and 1 to 12 CELP blocks, or 3 AAC blocks and 8 CELP blocks, but also, e.g., more AAC blocks than CELP blocks, depending on the configuration. A LATM frame, which has a LATM determination data block, comprises one or more superframes.

The generation of the LATM frame opened by header 1 is described as an example. First, the output data blocks 11, 12, 13, 14 of the CELP encoder 12 (FIG. 1) are generated and buffered. In parallel, the output data block of the AAC encoder, which is labeled "1" in FIG. 2c, is generated. Only when the output data block of the AAC encoder has been generated is the determination data block (header 1) written. Depending on the convention, the output data block of the first encoder that was generated first, designated 11 in FIG. 2c, can then be written, i.e. transmitted, directly after header 1. For the further writing or transmission of the bit stream, an equidistant spacing of the output data blocks of the first encoder is usually chosen (in view of the small amount of signaling information required), as shown in FIG. 2c. This means that after block 11 has been written or transmitted, the second output data block 12, then the third output data block 13 and then the fourth output data block 14 of the first encoder are written or transmitted at equidistant intervals. The output data block 1 of the second encoder is filled into the remaining gaps during transmission. A LATM frame is then completely written, i.e. transmitted.
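The frame layout of FIG. 2c described above can be sketched as a simple interleaving. The function name, the segment labels and the byte strings are hypothetical; the sketch only yields an exactly equidistant grid when the AAC payload length is a multiple of the number of CELP blocks, and it does not model real LATM syntax.

```python
def build_frame(header, celp_blocks, aac_data):
    """Interleave CELP blocks with slices of the AAC payload.

    Returns a list of (label, payload) segments in transmission order:
    header, CELP block, AAC slice, CELP block, AAC slice, ..."""
    n = len(celp_blocks)
    base, extra = divmod(len(aac_data), n)  # size of each gap filler
    segments = [("header", header)]
    pos = 0
    for i, block in enumerate(celp_blocks):
        segments.append((f"celp{i + 1}", block))
        size = base + (1 if i < extra else 0)
        segments.append(("aac", aac_data[pos:pos + size]))
        pos += size
    return segments


# Four buffered CELP blocks and one AAC block of the same section.
frame = build_frame(b"H", [b"c11", b"c12", b"c13", b"c14"], b"A" * 8)
stream = b"".join(payload for _, payload in frame)
```

Note that in this conventional layout the header can only be written once the complete AAC block is available, which is the source of the first-layer delay discussed later.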

A disadvantage of the known bit stream formats shown in FIGS. 4 to 6 is the fact that they are not suitable for scalable data streams.

Another disadvantage of the known bit stream formats is that there is no bit stream format for a scalable data stream by which the bit savings bank function could currently be made usable for scalable data streams with output data from encoders with different time bases, in particular for the combination of an AAC encoder and a CELP encoder in a scalable encoding device. However, since a constant transmission rate is required while the AAC encoder outputs blocks of different lengths depending on the properties of the signal to be encoded, the situation may well arise that the AAC encoder needs more bits for encoding one section of the time signal than specified by the transmission rate, while it needs fewer bits for another section than specified by the output data rate. Thus, in the latter case the AAC encoder of the scalable coding device will waste bits, while in the former case it cannot avoid introducing audible interference into the encoded and decoded signal in order to maintain the constant output data rate.

The object of the present invention is to provide a method and a device for generating a scalable data stream which allow a bit savings bank function to be used for a scaling layer.

This object is achieved by a method according to claim 1 or by a device according to claim 9.

Another object of the present invention is to provide a method and a device for decoding a scalable data stream.

This object is achieved by a method according to claim 10 or by an apparatus according to claim 11.

The present invention is based on the finding that the known concept set out in FIG. 2c, according to which all data of an output data block of the second encoder are arranged between two successive LATM headers, has to be abandoned. Instead, it is permitted that output data of the second encoder which represent a preceding time period of the input signal are also written after a determination data block for the current time period. This fact, or how much data is still written behind the determination data block in the transmission direction, is signaled to a decoder by special buffer information that is also transmitted.

The decoder can then easily determine, starting from a determination data block and using the buffer information, where the output data of the second encoder for the preceding time period end and where the output data of the second encoder for the current time period begin, so that the decoder is able to associate the corresponding output data blocks of the first encoder with the corresponding output data blocks of the second encoder in order to decode the signal in all layers again. The expression "corresponding" refers to the fact that the corresponding data of the first and the second encoder relate to the same section of the input signal in the case of CoreCoderDelay equal to zero (see FIG. 1), or to current sections shifted by CoreCoderDelay for the first and the second encoder.

In a method according to the invention for generating a scalable data stream from one or more blocks of output data from a first encoder and from one or more blocks of output data from a second encoder, a determination data block is therefore written for a current section of the input signal. In addition, the output data of the second encoder which represent a preceding section of the input signal are written behind the determination data block in the transmission direction from an encoder to a decoder. The output data of the second encoder which relate to the current section of the input signal, that is to say which actually belong to the determination data block, can then be written when the output data of the second encoder for the preceding section have been written completely. In addition, buffer information is written into the scalable data stream, the buffer information indicating how far the output data of the second encoder for the preceding section extend behind the determination data block for the current section. The output data of the first encoder may or may not be written equidistantly into the scalable data stream. For delay reasons, however, in order to enable low-delay decoding of the first scaling layer alone, i.e. of the output data blocks of the first encoder only, it is desirable to write these data blocks equidistantly and in a delay-optimized manner.

Usually a bit savings bank is defined, among other things, by the maximum size of the bit savings bank, this value being referred to in FIG. 3 as "Max Bufferfullness". This value is fixed and known to the decoder. In addition, the current value of the bit savings bank occupancy, which is referred to as "Bufferfullness", is transmitted in the data stream. When the present invention is applied to an MPEG-4 encoder, the difference between the variables Max Bufferfullness and Bufferfullness provides the buffer information. As will be explained later, in this case the CELP blocks interspersed among the AAC data, or data of other scaling layers, must not be counted when determining the exact position at which the output data of the second encoder for the current section start behind the LATM determination data block.

Regardless of the bit savings bank functionality, the format according to the invention also enables output data blocks of varying length of the second encoder to be transmitted in an equidistant grid of determination data blocks. It may make sense to select the grid for the determination data blocks and the grid for the output data blocks of the first encoder to be equidistant, and in particular to choose them such that a determination data block is always followed by an output data block of the first encoder. The output data block of the second encoder is then written into the remaining gaps, the buffer information signaling how much of the second-encoder data behind a determination data block belongs to the time period to which the determination data block refers and how much belongs to the preceding time period of the input signal, so that the decoder can unambiguously establish an association between output data blocks of the first encoder and an output data block of the second encoder for a time period of the input signal. It is also an advantage of the present invention that the signaling of the output data block behind the determination data block can easily be combined with a signaling of output data blocks of the first encoder before the determination data block for the current time period, in order to enable low-delay decoding of the first scaling layer alone.

The scalable data stream according to the invention is particularly useful for real-time applications, but can equally be used for non-real-time applications.

Preferred embodiments of the present invention are explained in detail below with reference to the accompanying drawings, in which:

FIG. 1 shows a scalable encoder according to MPEG-4;

FIG. 2a shows a schematic representation of an input signal which is divided into successive time segments;

FIG. 2b shows a schematic representation of an input signal which is divided into successive time segments, the ratio of the block length of the first encoder to the block length of the second encoder being shown;

FIG. 2c shows a schematic illustration of a scalable data stream with a high delay in the decoding of the first scaling layer;

FIG. 2d shows a schematic illustration of a scalable data stream with a low delay in the decoding of the first scaling layer;

FIG. 2e shows a bitstream format according to the present invention, in which output data of the second encoder from a preceding time period are arranged behind the determination data block for a current section;

FIG. 3 shows a detailed illustration of the scalable data stream according to the invention using the example of a CELP encoder as the first encoder and an AAC encoder as the second encoder with a bit savings bank function;

FIG. 4 shows an example of a bit stream format with a fixed frame length;

FIG. 5 shows an example of a bit stream format with a fixed frame length and back pointer; and

FIG. 6 shows an example of a bit stream format with a variable frame length.

In the following, FIG. 2d is discussed in comparison to FIG. 2c in order to explain a bit stream with a low delay for the first scaling layer. As in FIG. 2c, the scalable data stream contains successive determination data blocks, which are designated header 1 and header 2. In MPEG-4, the determination data blocks are LATM headers. In the direction of transmission from an encoder to a decoder, which is shown by an arrow 202 in FIG. 2d, the LATM header 200 is followed by the parts of the output data block of the AAC encoder, hatched from top left to bottom right, which are entered in the remaining gaps between output data blocks of the first encoder.

Furthermore, in contrast to FIG. 2c, the frame started by the LATM header 200 no longer contains only output data blocks of the first encoder which belong to this frame, such as the output data blocks 13 and 14, but also the output data blocks 21 and 22 of the subsequent section of input data. In other words, in the example shown in FIG. 2d, the two output data blocks of the first encoder designated 11 and 12 are present in the bit stream in front of the LATM header 200 in the transmission direction (arrow 202). In the example shown in FIG. 2d, the offset information 204 indicates an offset of the output data blocks of the first encoder by two output data blocks. If FIG. 2d is compared with FIG. 2c, it can be seen that a decoder that is only interested in the first scaling layer can decode this lowest scaling layer earlier than in the case of FIG. 2c by a time corresponding to this offset. The offset information, which can be signaled, e.g., in the form of a "core frame offset", is used to determine the position of the first output data block 11 in the bit stream.

In the case of core frame offset = zero, the bit stream shown in FIG. 2c results. However, if the core frame offset is greater than zero, the corresponding output data block 11 of the first encoder is transmitted earlier by core frame offset output data blocks of the first encoder. In other words, the delay between the first output data block of the first encoder after the LATM header and the first AAC frame results from CoreCoderDelay (FIG. 1) + core frame offset x core block length (block length of encoder 1 in FIG. 2b). As is clear from the comparison of FIGS. 2c and 2d, for core frame offset = zero (FIG. 2c), the output data blocks 11 and 12 of the first encoder are transmitted after the LATM header 200. By transmitting core frame offset = 2, the output data blocks 13 and 14 can follow the LATM header 200, whereby the delay in the case of pure CELP decoding, that is to say decoding of the first scaling layer only, is reduced by two CELP block lengths. In the example, an offset of three blocks would be optimal. However, an offset of one or two blocks also brings a delay advantage. With this bitstream structure, it is possible for the CELP encoder to transmit a generated CELP block immediately after encoding. In this case, no additional delay is added to the CELP delay by the bitstream multiplexer 20, i.e. by the scalable combination, so that the delay becomes minimal.
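The effect of the core frame offset can be illustrated with a short sketch. The function names are hypothetical, the block labels follow the numbering of FIGS. 2b to 2d (11, 12, ..., 21, ...) and therefore only work for fewer than ten blocks per section, and the CELP block length of 240 samples is an assumed example value.

```python
def first_layer_delay(core_coder_delay, core_frame_offset, core_block_length):
    """Delay in samples between the first CELP block after the header and
    the first AAC frame: CoreCoderDelay + offset * CELP block length."""
    return core_coder_delay + core_frame_offset * core_block_length


def blocks_after_header(section, blocks_per_section, core_frame_offset):
    """Figure-style labels of the CELP blocks written in the frame opened
    by the header of the given section (sections counted from 1)."""
    first = (section - 1) * blocks_per_section + core_frame_offset
    labels = []
    for k in range(first, first + blocks_per_section):
        sec, idx = divmod(k, blocks_per_section)
        labels.append((sec + 1) * 10 + idx + 1)
    return labels


fig_2c = blocks_after_header(section=1, blocks_per_section=4, core_frame_offset=0)
fig_2d = blocks_after_header(section=1, blocks_per_section=4, core_frame_offset=2)
delay = first_layer_delay(core_coder_delay=0, core_frame_offset=2,
                          core_block_length=240)  # 240 is an assumed value
```

With an offset of two, blocks 11 and 12 have already been sent before the header, which is exactly the two-block head start a CELP-only decoder gains.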

It is pointed out that the case shown in FIG. 2d is only exemplary. Thus, different ratios of the block length of the first encoder to the block length of the second encoder are possible, which can vary, e.g., from 1:2 to 1:12 or can also assume other ratios, including ratios greater or less than one.

In extreme cases (1:12 for MPEG-4 CELP/AAC), this means that for the same time period of the input signal for which the AAC encoder generates one output data block, the CELP encoder generates twelve output data blocks. The delay advantage of the data stream shown in FIG. 2d compared to the data stream shown in FIG. 2c can in this case be on the order of a quarter to half a second. This advantage increases with the ratio between the block length of the second encoder and the block length of the first encoder; in the case of the AAC encoder as the second encoder, the greatest possible block length is sought, if the signal to be coded allows, because of the then more favorable ratio between useful information and side information.

In the following, reference is made to FIG. 2e. In contrast to FIG. 2d, which shows the offset function, that is to say the displacement of the output data blocks of the first encoder with respect to a determination data block, FIG. 2e shows the displacement of the output data blocks of the second encoder with respect to the grid given by the determination data blocks. The arrangement of the output data blocks of the first encoder, which are denoted by 11, 12, 13, 14, 21, 22, 23, 24, 31 in FIG. 2e, is unchanged compared to FIG. 2d. In FIG. 2d no bit savings bank function is possible or, if the determination data blocks are to lie in a fixed grid, no output data blocks of variable length can be used for the second encoder; in FIG. 2e this is now possible according to the present invention.

For this purpose, the data of the output data block of the second encoder for the preceding section, which is denoted by "0" in FIGS. 2a to 2e, are written after the LATM header 200 in the transmission direction from an encoder to a decoder until the scalable encoder has written all the data of the preceding section into the bit stream. Only then, from a transition limit 220 onwards, does the writing of the output data of the second encoder for the current section of the input signal into the bit stream begin. The transition limit 220 may or may not coincide with a boundary of a CELP data block. Depending on the signaling, the buffer information signals either the distance from the end of the determination data block to the transition limit 220, or the distance from the start of the determination data block to the transition limit 220, or the distance from the rear limit of the CELP block 13 to the transition limit 220, with or without the length of the CELP blocks 13, 14 and/or the length of the determination data block. The latter variant is shown in more detail with reference to FIG. 3.
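On the encoder side, the writing order described above can be sketched as follows. The function name, the gap sizes and the byte strings are hypothetical, and the sketch assumes that the leftover data of the preceding section fit into the gaps of the current frame.

```python
def fill_aac_gaps(gap_sizes, previous_rest, current_data):
    """Fill the AAC gaps of one frame: first the leftover data of the
    preceding section, then the data of the current section.

    Returns (gaps, carry_over, transition): gaps is a list of byte strings
    (one per gap), carry_over is the current-section data that did not fit
    and must follow the next header, and transition is the number of AAC
    bytes written before the current section starts."""
    queue = previous_rest + current_data
    gaps, pos = [], 0
    for size in gap_sizes:
        gaps.append(queue[pos:pos + size])
        pos += size
    return gaps, queue[pos:], len(previous_rest)


# Two gaps of three bytes each; two bytes are still owed to section "0".
gaps, carry_over, transition = fill_aac_gaps([3, 3], b"pp", b"CCCCCC")
```

The returned carry-over becomes the leftover input for the next frame, which is how a block that needed more than its share of bits spills behind the following header.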

According to the invention, in the case of application to a scalable encoder, it is preferred not to provide any separate side information for signaling the buffer information, but instead to use the Bufferfullness value already transmitted in the bit stream. The length of the pointer designated "buffer information" in FIG. 2e, which is identified by the reference symbol 314 in FIG. 3, is exactly equal to the difference between Max Bufferfullness and Bufferfullness if the length of the determination data blocks, the length of any CELP blocks and any further scaling layers that may be present are not taken into account, as is represented by the broken arrow in FIG. 3.
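How a decoder could locate the transition from this difference can be sketched as follows. This is a minimal illustration under assumptions: the function name, the segment description and all lengths are hypothetical, the offset is measured from the end of the header, and only AAC bits are counted while interspersed CELP blocks are skipped.

```python
def locate_transition(segments, max_bufferfullness, bufferfullness):
    """Return the bit offset, measured from the end of the header, at which
    the second-encoder data of the current section start.

    segments: list of (kind, length_in_bits) in transmission order after
    the header, with kind being "celp" or "aac"."""
    remaining = max_bufferfullness - bufferfullness  # buffer information
    offset = 0
    for kind, length in segments:
        if kind == "aac":
            if remaining <= length:
                return offset + remaining
            remaining -= length
        offset += length
    raise ValueError("previous-section data extend beyond this frame")


layout = [("celp", 160), ("aac", 300), ("celp", 160), ("aac", 300)]
borrowed = locate_transition(layout, max_bufferfullness=1000, bufferfullness=600)
full_bank = locate_transition(layout, max_bufferfullness=1000, bufferfullness=1000)
```

With a full bank the current section starts at the first AAC bit after the first CELP block; with 400 borrowed bits the transition moves into the second AAC gap.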

Reference is now made to FIG. 3, which is similar to FIG. 2 but represents the special implementation using the example of MPEG 4. A current time period is shown hatched in the first line. In the second line, the windowing used in the AAC encoder is shown schematically. As is known, an overlap-and-add of 50% is used, so that a window usually spans twice as many time samples as the current time period, which is hatched in the top line of FIG. 3. FIG. 3 also shows the delay tdip, which corresponds to block 26 of FIG. 1 and which in the selected example has a size of 5/8 of the block length. A block length of 960 samples is typically used for the current time period, so that the delay tdip of 5/8 of the block length is 600 samples. For example, the AAC encoder delivers a bit stream of 24 kbit/s, while the CELP encoder shown schematically below it delivers a bit stream at a rate of 8 kbit/s. This results in a total bit rate of 32 kbit/s.

As can be seen from FIG. 3, the output data blocks zero and one of the CELP encoder correspond to the current time period of the first encoder. The output data block number 2 of the CELP encoder already corresponds to the next time period. The same applies to CELP block number 3. FIG. 3 also shows, by an arrow identified by the reference symbol 302, the delay of the downsampling stage 28 and the CELP encoder 12. From this follows the delay that must be set by the stage 34 so that the same conditions exist at the subtracting point 40 of FIG. 1. This delay is designated as core coder delay and is illustrated by an arrow 304 in FIG. 3. Alternatively, this delay can also be generated by block 26. For example:

Core coder delay = tdip - CELP encoder delay - downsampling delay = 600 - 120 - 117 = 363 samples.
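The delay calculation above can be checked with a few lines of Python. This is only an illustrative sketch: the constant and function names are assumptions introduced here, while the numeric values (960 samples, 5/8, 120 and 117 samples) are those given in the description.

```python
# Values taken from the description; all names are illustrative assumptions.
BLOCK_LENGTH = 960                 # samples in the current time period
TDIP = BLOCK_LENGTH * 5 // 8       # delay of block 26: 5/8 of the block length
CELP_ENCODER_DELAY = 120           # samples
DOWNSAMPLING_DELAY = 117           # samples


def core_coder_delay(tdip, celp_delay, downsampling_delay):
    """Delay to be set by stage 34 so both paths align at subtracting point 40."""
    return tdip - celp_delay - downsampling_delay


print(TDIP)  # 600
print(core_coder_delay(TDIP, CELP_ENCODER_DELAY, DOWNSAMPLING_DELAY))  # 363
```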

For the case without a bit savings bank function, or for the case that the bit savings bank (bit mux output buffer) is full, which is indicated by the variable Bufferfullness = max, the case shown in FIG. 2d results. In contrast to FIG. 2d, in which four output data blocks of the first encoder are generated for one output data block of the second encoder, in FIG. 3 two output data blocks of the CELP encoder, designated "0" and "1", are generated for one output data block of the second encoder, which is drawn in black in the last two lines of FIG. 3. According to the invention, however, it is no longer the output data block of the CELP encoder with the number "0" that is written behind a first LATM header 306, but the output data block with the number "1", since the output data block with the number "0" has already been transmitted to the decoder. In the equidistant grid provided for the CELP data blocks, CELP block 1 is followed by CELP block 2 for the next time period, and the remaining data of the output data block of the AAC encoder are written into the data stream to complete the frame, until a further LATM header 308 follows for the next time period.

As shown in the last line of FIG. 3, the present invention can easily be combined with the bit savings bank function. If the variable "Bufferfullness", which indicates the fill level of the bit savings bank, is less than the maximum value, the AAC frame for the immediately preceding time period required more bits than were actually permitted. This means that the CELP frames are written after the LATM header 306 as before, but the output data block or blocks of the AAC encoder from preceding time periods must first be written into the bit stream before writing of the output data block of the AAC encoder for the current time period can begin.

A comparison of the last two lines of FIG. 3, which are labeled "1" and "2", shows that the bit savings bank function also leads directly to a delay in the encoder for the AAC frame. The data for the AAC frame of the current time period, designated by 310 in FIG. 3, are available at exactly the same time as in case "1", but can only be written into the bit stream after the AAC data 312 for the immediately preceding time period have been written. The starting position of the AAC frame is thus shifted depending on the bit savings bank level of the AAC encoder.
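The relationship between the bit savings bank level and the shifted starting position of the AAC frame can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions that are not part of the description: the reservoir starts full, all quantities are counted in plain bits, and CELP blocks and headers are ignored. The function name and the frame sizes are hypothetical.

```python
def pointer_lengths(frame_bits, mean_bits, reservoir_max):
    """Return, per frame, how many bits of earlier AAC data follow its header.

    Simplified model (assumption): the reservoir starts full, levels are
    counted in plain bits, and CELP blocks and headers are ignored.
    """
    level = reservoir_max  # "Bufferfullness = max": nothing spills over
    pointers = []
    for bits in frame_bits:
        # Backlog of preceding data still to be written behind this header.
        pointers.append(reservoir_max - level)
        if bits > level + mean_bits:
            raise ValueError("frame exceeds the bits the reservoir can lend")
        # Spending more than the mean drains the reservoir, spending less
        # refills it; the level is capped at its maximum.
        level = min(reservoir_max, level + mean_bits - bits)
    return pointers


print(pointer_lengths([100, 130, 100, 80, 90], mean_bits=100, reservoir_max=50))
# [0, 0, 30, 30, 10]
```

In this toy run the second frame borrows 30 bits, so the data of that frame extend 30 bits behind the header of the third frame. The backlog only shrinks once later frames use fewer bits than the mean.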

The bit savings bank status is transmitted according to MPEG 4 in the element StreamMuxConfig by the variable "Bufferfullness". The variable Bufferfullness is calculated as the variable bit reservoir divided by 32 times the number of audio channels currently present.
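As a hedged illustration of this calculation, the sketch below computes the transmitted value from a reservoir level. The function name, the use of integer (floor) division and the example reservoir size of 6144 bits are assumptions made here for illustration. The description itself only states the division by 32 times the number of channels.

```python
def buffer_fullness(bit_reservoir_bits: int, num_channels: int) -> int:
    """Bit reservoir level divided by 32 times the number of audio channels.

    Floor division is an assumption; the text does not specify rounding.
    """
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    return bit_reservoir_bits // (32 * num_channels)


# Hypothetical reservoir level of 6144 bits.
print(buffer_fullness(6144, 1))  # 192
print(buffer_fullness(6144, 2))  # 96
```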

It should be noted that the pointer identified by reference numeral 314 in FIG. 3, whose length equals max Bufferfullness - Bufferfullness, is a forward pointer that points, as it were, to the future, whereas the pointer in FIG. 5 is a backward pointer that points, as it were, to the past. This is because, according to the present exemplary embodiment, the LATM header is always written into the bit stream after the current time period has been processed by the AAC encoder, although it may still be necessary to write AAC data from previous time periods into the bit stream.

It should also be noted that the pointer 314 is deliberately drawn interrupted below the celp block 2, since it does not take into account the length of the celp block 2 or the length of the celp block 1, since this data naturally has nothing to do with the bit savings bank of the AAC encoder. Furthermore, no header data and bits from any other layers that may be present are taken into account.
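How a forward pointer that ignores the CELP blocks can still be resolved to a position in the stream may be sketched as follows. The representation is hypothetical: positions are bit offsets, CELP blocks are given as (start, length) pairs, and the pointer length is assumed to be already converted to bits. None of these names come from the description.

```python
def locate_current_aac_start(payload_start, pointer_bits, celp_blocks):
    """Return the position where the AAC data of the current period begin.

    payload_start: bit position directly behind the determination data block
    pointer_bits:  length of the pointer, counted in AAC bits only
    celp_blocks:   (start, length) bit ranges occupied by CELP blocks
    """
    pos = payload_start
    remaining = pointer_bits
    for start, length in sorted(celp_blocks):
        end = start + length
        if end <= pos:
            continue              # block lies entirely before the position
        if start < pos:
            pos = end             # position falls inside a block: skip it
            continue
        gap = start - pos         # AAC bits available before this CELP block
        if remaining < gap:
            break                 # pointer ends before the next CELP block
        remaining -= gap
        pos = end                 # jump over the CELP block without counting it
    return pos + remaining


# Header ends at bit 40, CELP blocks of 160 bits at 40 and 520, pointer 400 bits.
print(locate_current_aac_start(40, 400, [(40, 160), (520, 160)]))  # 760
```

In the example, 320 AAC bits lie between the two CELP blocks and the remaining 80 bits follow the second block, so the data of the current period start at bit 760.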

In the decoder, the celp frames are first extracted from the bit stream, which is readily possible since, for example, they are arranged equidistantly and have a fixed length.

However, the length and spacing of all CELP blocks can be signaled in the LATM header anyway, so that immediate decoding is possible in any case.

The parts of the output data of the AAC encoder for the immediately preceding time period, which are separated by CELP block 2, are thus reassembled, and the LATM header 306 moves, as it were, to the beginning of the pointer 314. From the length of the pointer 314 the decoder knows where the data of the immediately preceding time period end, so that once these data have been read in completely, the immediately preceding time period can be decoded with full audio quality together with the CELP data blocks available for it.
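The decoder-side steps described above (extract the CELP frames, reassemble the AAC data, split them at the pointer) can be sketched in a few lines. This is a toy model under stated assumptions: the payload behind the header is a string in which each character stands for one unit of data, the CELP ranges are known in advance, and the function name is hypothetical.

```python
def split_frame(payload, celp_ranges, pointer_len):
    """Separate CELP blocks and split the AAC data at the pointer.

    payload:     data following the determination data block (toy: a string)
    celp_ranges: (start, length) ranges of the CELP blocks within payload
    pointer_len: amount of AAC data belonging to the preceding period
    """
    celp, aac_parts = [], []
    pos = 0
    for start, length in sorted(celp_ranges):
        aac_parts.append(payload[pos:start])          # AAC data before block
        celp.append(payload[start:start + length])    # the CELP block itself
        pos = start + length
    aac_parts.append(payload[pos:])                   # AAC data after last block
    aac = "".join(aac_parts)                          # reassembled AAC data
    return celp, aac[:pointer_len], aac[pointer_len:]


# "1"/"2": CELP blocks, "p": preceding period, "c": current period.
celp, previous, current = split_frame("11pppp22ppcccc", [(0, 2), (6, 2)], 6)
print(celp, previous, current)  # ['11', '22'] pppppp cccc
```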

In contrast to the case shown in FIG. 2c, in which both the output data blocks of the first encoder and the output data block of the second encoder follow a LATM header, the variable core frame offset can now be used to shift output data blocks of the first encoder forward in the bit stream, while the arrow 314 (max Bufferfullness - Bufferfullness) shifts the output data block of the second encoder to the rear in the scalable data stream. The bit savings bank function can thus also be implemented in the scalable data stream in a simple and reliable manner. The basic grid of the bit stream is maintained by the successive LATM determination data blocks, which are written whenever the AAC encoder has encoded a time period and which can therefore serve as reference points. This holds even if, as shown in the last line of FIG. 3, a large part of the data in the frame designated by a LATM header originates either from the next time period (with regard to the CELP frames) or from immediately preceding time periods (with regard to the AAC frame). The respective shifts are communicated to a decoder by the two variables additionally transmitted in the bit stream.

Claims

1. A method for generating a scalable data stream from one or more blocks of output data of a first encoder (12) and from one or more blocks of output data of a second encoder (14), wherein the one or more blocks of output data of the first encoder (12) together represent a number of samples of the input signal for the first encoder, which form a current section of the input signal for the first encoder, and wherein the one or more blocks of output data of the second encoder (14) together represent a number of samples of the input signal for the second encoder, the number of samples for the second encoder forming a current section of the input signal for the second encoder, the number of samples for the first encoder and the number of samples for the second encoder being the same, and wherein the current sections for the first and the second encoder are identical or are shifted from each other by a time period (34), with the following steps:

Writing a determination data block (306) for the current portion of the input signal for the first or second encoder;

Writing output data (312) of the second encoder, which represents a preceding section of the input signal for the second encoder, in the direction of transmission from an encoder to a decoder behind the determination data block (306);

Writing output data (310) of the second encoder, which represent the current portion of the input signal for the second encoder, once the output data of the second encoder for the preceding portion of the input signal have been written;

Writing buffer information (314) into the scalable data stream, the buffer information indicating how far the output data of the second encoder for the preceding section extend behind the determination data block; and

Writing the one or more blocks of the output data of the first encoder (12) into the scalable data stream.

2. The method according to claim 1,

in which the lengths of the blocks of output data of the second encoder are different for sections of the same length of the input signal, the lengths of the blocks of output data depending on signal properties of the input signal,

in which the one or more blocks of the output data of the first encoder are of equal length for sections of the input signal of equal length, and

at which the transmission rate of the bit stream is constant.

3. The method according to claim 1 or 2,

in which the second encoder (14) has a bit savings bank function, the maximum size of the bit savings bank being given by maximum buffer size information, and the current state of the bit savings bank being given by current buffer information,

where the buffer information (314) is the current buffer information, and

where the extent to which the output data of the second encoder for the preceding section extend behind the determination data block (306) can be derived from the difference between the maximum buffer size information and the current buffer information.

4. The method according to any one of the preceding claims,

wherein the writing of output data of the first encoder is carried out such that a block of output data of the first encoder is arranged immediately after a determination data block (306), and

in which the length of this determination data block (306), the length of existing output data blocks of the first encoder and any existing data of further scaling layers are ignored when determining, using the current buffer information and the maximum buffer size information, how far the output data of the second encoder extend behind the determination data block.

5. The method according to any one of the preceding claims,

in which, in the step of writing the one or more blocks of output data of the first encoder, the blocks of output data of the first encoder are written equidistantly into the scalable data stream.

6. The method according to any one of the preceding claims,

in which the first encoder (12) is a Celp encoder,

in which the second encoder (14) is an AAC encoder, and

in which the determination data block is a LATM header according to MPEG 4.

7. The method according to any one of the preceding claims, wherein the at least one block of output data of the second encoder (14) and the at least one block of output data of the first encoder (12) are useful data in a superframe which, in addition to the useful data, has exactly one determination data block.

8. The method according to any one of the preceding claims,

in which, in the step of writing the blocks of output data of the first encoder, at least one block of output data of the first encoder for the current section of the input signal for the first encoder is written in the transmission direction before the determination data block for the current time section.

9. Device for generating a scalable data stream from one or more blocks of output data of a first encoder (12) and from one or more blocks of output data of a second encoder (14), wherein the one or more blocks of output data of the first encoder (12) together represent a number of samples of the input signal for the first encoder, which form a current section of the input signal for the first encoder, and wherein the one or more blocks of output data of the second encoder (14) together represent a number of samples of the input signal for the second encoder, the number of samples for the second encoder forming a current section of the input signal for the second encoder, the number of samples for the first encoder and the number of samples for the second encoder being the same, and wherein the current sections for the first and the second encoder are identical or are shifted from one another by a time period (34), with the following features:

means for writing a determination data block (306) for the current portion of the input signal for the first or second encoder;

means for writing output data (312) of the second encoder, representing a preceding portion of the input signal for the second encoder, in the direction of transmission from an encoder to a decoder behind the determination data block (306);

means for writing output data (310) of the second encoder, which represent the current portion of the input signal for the second encoder, once the output data of the second encoder for the preceding portion of the input signal have been written;

means for writing buffer information (314) into the scalable data stream, the buffer information indicating how far the output data of the second encoder for the preceding section extend behind the determination data block; and

means for writing the one or more blocks of the output data of the first encoder (12) into the scalable data stream.

10. A method for decoding a scalable data stream from one or more blocks of output data of a first encoder (12) and from one or more blocks of output data of a second encoder (14), wherein the one or more blocks of output data of the first encoder (12) together represent a number of samples of the input signal for the first encoder, which form a current section of the input signal for the first encoder, and wherein the one or more blocks of output data of the second encoder (14) together represent a number of samples of the input signal for the second encoder, the number of samples for the second encoder forming a current section of the input signal for the second encoder, the number of samples for the first encoder and the number of samples for the second encoder being the same, and wherein the current sections for the first and the second encoder are identical or are shifted from one another by a time period (34), the scalable data stream comprising a determination data block for the current section for the first or second encoder, output data of the second encoder for a preceding section of the input signal in the transmission direction behind the determination data block, and buffer information indicating how far the output data of the second encoder for the preceding section extend behind the determination data block, with the following steps:

Reading the determination data block (306) for the current portion of the input signal for the first or second encoder;

Reading the output data of the first encoder for the current section of the first encoder (12);

Reading the buffer information (314);

Reading the output data (310) of the second encoder for the current section starting from a position in the scalable data stream indicated by the buffer information (314); and

Decoding the output data (310) of the second encoder and the output data of the first encoder to obtain a decoded signal.

11. Device for decoding a scalable data stream from one or more blocks of output data of a first encoder (12) and from one or more blocks of output data of a second encoder (14), wherein the one or more blocks of output data of the first encoder (12) together represent a number of samples of the input signal for the first encoder, which form a current section of the input signal for the first encoder, and wherein the one or more blocks of output data of the second encoder (14) together represent a number of samples of the input signal for the second encoder, the number of samples for the second encoder forming a current section of the input signal for the second encoder, the number of samples for the first encoder and the number of samples for the second encoder being the same, and wherein the current sections for the first and the second encoder are identical or are shifted from one another by a time period (34), the scalable data stream comprising a determination data block for the current section for the first or second encoder, output data of the second encoder for a preceding section of the input signal in the transmission direction behind the determination data block, and buffer information indicating how far the output data of the second encoder for the preceding section extend behind the determination data block, with the following features:

a bitstream demultiplexer which is designed to be able to carry out the following steps:

Reading the output data of the first encoder for the current section of the first encoder (12);

Reading the buffer information (314);

means for decoding the output data (310) of the second encoder and the output data of the first encoder to obtain a decoded signal.