CN1816847A - Fidelity-optimised variable frame length encoding - Google Patents
- Publication number
- CN1816847A
- Authority
- CN
- China
- Prior art keywords
- signal
- coding
- mono
- frame
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
Abstract
Polyphonic signals are used to create a main signal, typically a mono signal, and a side signal (Xside). A number of encoding schemes (81) for the side signal (Xside) are provided. Each encoding scheme (81) is characterised by a set of sub-frames (90) of different lengths. The total length of the sub-frames (90) corresponds to the length of the encoding frame (80) of the encoding scheme (81). The encoding scheme (81) to be used on the side signal (Xside) is selected dependent on the present signal content of the polyphonic signals. In a preferred embodiment, a side residual signal is created as the difference between the side signal and the main signal scaled with a balance factor. The balance factor is selected to minimise the side residual signal. The optimised side residual signal and the balance factor are encoded and provided as encoding parameters representing the side signal.
Description
Technical Field
The present invention relates generally to encoding of audio signals, and in particular to encoding of multi-channel audio signals.
Background
There is a high market demand for transmitting and storing audio signals at low bit rates while maintaining high audio quality. In particular, where transmission resources or memory are limited, low bit rate operation is an essential cost factor. This is typically the case, for example, in streaming and messaging applications in mobile communication systems such as GSM, UMTS or CDMA.
Today, no standardized codec is available that provides high stereo audio quality at bit rates that are economically viable in mobile communication systems. With the available codecs, audio signals can be transmitted monophonically, and stereo transmission is available to some extent. However, bit rate limitations usually require the stereo representation to be limited quite drastically.
The simplest way of stereo or multi-channel coding of audio signals is to encode the signals of the different channels separately, as individual and independent signals. Another basic way, used in stereo FM radio transmission to ensure compatibility with legacy mono radio receivers, is to transmit the sum and difference signals of the two channels involved.
State-of-the-art audio codecs, such as MPEG-1/2 Layer III and MPEG-2/4 AAC, use so-called joint stereo coding. According to this technique, the signals of the different channels are processed jointly rather than separately and individually. The two most commonly used joint stereo coding techniques are known as "mid/side" (M/S) stereo coding and intensity stereo coding, which are usually applied to subbands of the stereo or multi-channel signals to be coded.
M/S stereo coding is similar to the process used in stereo FM radio in the sense that it encodes and transmits the sum and difference signals of the channel subbands, thereby exploiting redundancy between the channel subbands. The structure and operation of an encoder based on M/S stereo coding are described, for example, in U.S. patent 5,285,498 to J. D. Johnston.
Intensity stereo, on the other hand, is able to exploit stereo irrelevancy. It transmits the joint intensity of the channels (of the different subbands) together with some location information indicating how the intensity is distributed among the channels. Intensity stereo provides only spectral magnitude information of the channels; no phase information is conveyed. For this reason, and because temporal inter-channel information (more specifically, the time difference between the channels) is of major psychoacoustic relevance, especially at lower frequencies, intensity stereo can only be used at high frequencies, e.g. above 2 kHz. An intensity stereo coding method is described, for example, in European patent 0497413 to R. Veldhuis et al.
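As a rough illustration of why only magnitude information survives, the following Python sketch encodes one subband as a joint (sum) signal plus two energy-matching gains, which stand in here for the "location information". It assumes the subband signals are already available; the function names and the choice of gains are hypothetical and are not taken from EP 0497413.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def intensity_encode(left, right):
    """Encode one subband as a joint signal plus two energy-matching gains.

    Only per-channel energy survives; any phase or time difference between
    left and right is discarded.
    """
    joint = [l + r for l, r in zip(left, right)]
    e_joint = energy(joint) or 1e-12  # guard against a silent joint signal
    g_left = math.sqrt(energy(left) / e_joint)
    g_right = math.sqrt(energy(right) / e_joint)
    return joint, g_left, g_right


def intensity_decode(joint, g_left, g_right):
    """Both outputs are scaled copies of the same joint waveform."""
    return [g_left * v for v in joint], [g_right * v for v in joint]
```

Because both decoded channels are scaled copies of one waveform, any time or phase difference between the original channels is lost, which is why the technique is restricted to the higher frequencies.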
A recently developed stereo coding method is described in the conference paper "Binaural cue coding applied to stereo and multi-channel audio compression" by C. Faller et al., presented at the 112th AES Convention in Munich, Germany, 2002. The method is a parametric multi-channel audio coding method. Its basic principle is that, on the encoding side, the input signals from N channels c1, c2, … cN are combined into one mono signal m. The mono signal is audio-encoded using any conventional mono audio codec. In parallel, parameters describing the multi-channel image are derived from the channel signals. These parameters are encoded and transmitted to the decoder together with the audio bitstream. The decoder first decodes the mono signal m' and then regenerates the channel signals c1', c2', …, cN' based on the parametric description of the multi-channel image.
The principle of the binaural cue coding (BCC) method is that it transmits the encoded mono signal together with so-called BCC parameters. The BCC parameters comprise encoded inter-channel level differences and inter-channel time differences for subbands of the original multi-channel input signal. The decoder regenerates the different channel signals by applying subband-wise level and phase adjustments to the mono signal based on the BCC parameters. An advantage compared to e.g. M/S or intensity stereo is that stereo information, including temporal inter-channel information, is transmitted at much lower bit rates. However, this technique requires computationally demanding time-frequency transforms of each channel at both the encoder and the decoder.
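The two kinds of BCC parameters can be sketched for a single subband as follows. The sketch assumes the subband signals of two channels are already available from a time-frequency transform, and it estimates the inter-channel time difference as the lag of the cross-correlation peak. This estimator, the function names and the `max_lag` parameter are illustrative assumptions, not the procedure of the cited paper.

```python
import math


def icld_db(x1, x2, eps=1e-12):
    """Inter-channel level difference of one subband, in dB."""
    e1 = sum(v * v for v in x1)
    e2 = sum(v * v for v in x2)
    return 10.0 * math.log10((e1 + eps) / (e2 + eps))


def ictd_samples(x1, x2, max_lag):
    """Lag (in samples) of x2 relative to x1 that maximises their cross-correlation.

    A positive result means x2 is a delayed version of x1.
    Both inputs are assumed to have the same length.
    """
    n = len(x1)

    def xcorr(lag):
        return sum(x1[i] * x2[i + lag] for i in range(n) if 0 <= i + lag < n)

    return max(range(-max_lag, max_lag + 1), key=xcorr)
```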
Moreover, BCC does not handle the fact that much stereo information, especially at low frequencies, is diffuse, i.e. it does not come from any particular direction. Diffuse sound fields are present in both channels of a stereo recording, but they are largely out of phase with respect to each other. If an algorithm such as BCC encounters a recording with a large diffuse sound field, the reproduced stereo image becomes chaotic, with sound jumping between left and right, since the BCC algorithm can only pan the signal in a specific frequency band to the left or to the right channel.
One possible method for encoding a stereo signal and ensuring good reproduction of the diffuse sound field is to use an encoding scheme that is very similar to the technique used for FM stereo radio, i.e. to encode the mono (left + right) and difference (left-right) signals separately.
US patent 5,434,948 to C. E. Holt et al. describes a technique that, similarly to BCC, encodes a mono signal and side information. In this case, the side information consists of prediction filters and optionally a residual signal. The prediction filters, estimated by a least-mean-square algorithm, allow the multi-channel audio signals to be predicted when applied to the mono signal. With this technique, a multi-channel audio source can be encoded at a very low bit rate, but at the cost of degraded quality, as discussed further below.
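The idea of predicting a channel or side signal from the mono signal with an adaptively estimated filter can be illustrated with a normalized LMS (NLMS) predictor. This is a minimal sketch, assuming an FIR predictor acting on the current and past mono samples; the filter order, step size `mu` and function name are hypothetical, and NLMS is used here as a stand-in for whatever least-mean-square variant US 5,434,948 specifies.

```python
def nlms_predict(mono, target, order=4, mu=0.5, eps=1e-9):
    """Adaptively predict `target` from mono[n], mono[n-1], ..., mono[n-order+1].

    Returns (prediction, residual, final filter taps).
    """
    w = [0.0] * order
    pred, resid = [], []
    for n in range(len(mono)):
        # Regressor vector, zero-padded before the first sample
        x = [mono[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = target[n] - y
        # Normalised update: step size scaled by the regressor energy
        norm = eps + sum(xk * xk for xk in x)
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        pred.append(y)
        resid.append(e)
    return pred, resid, w
```

The residual returned by the sketch corresponds to the optional residual signal; in a codec, the filter taps (and possibly the residual) would be quantized and transmitted.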
Finally, for completeness, a technique used in 3D audio should be mentioned. This technique synthesizes the right and left channel signals by filtering sound source signals with so-called head-related filters. However, it requires the different sound source signals to be separated and therefore cannot generally be applied to stereo or multi-channel coding.
Disclosure of Invention
A problem with existing coding schemes based on the encoding of frames of a signal, in particular of a main signal and one or more side signals, is that the division of the audio information into frames introduces annoying perceptual artifacts. Dividing the information into frames of relatively long duration generally reduces the average required bit rate. This may be beneficial for e.g. music containing a lot of diffuse sound. However, for transient-rich music or speech, rapid temporal changes are smeared out over the frame duration, creating phantom sounds or even pre-echo problems. Conversely, encoding short frames gives a more accurate representation of the sound, thereby minimizing the error energy, but requires a higher transmission bit rate and more computational resources. Coding efficiency therefore also decreases for very short frame lengths. Introducing more frame boundaries also introduces discontinuities in the encoding parameters, which can appear as perceptual artifacts.
Another problem with solutions based on encoding a main signal and one or several side signals is that they typically require relatively large computational resources. In particular, when short frames are used, handling parameter discontinuities from one frame to the next is a complex task. When long frames are used, estimation errors for transient sounds can give rise to a very large side signal, which in turn increases the transmission rate requirement.
It is therefore an object of the present invention to provide an encoding method and apparatus that improves the perceptual quality of a multi-channel audio signal, in particular avoiding artifacts such as pre-echoes, phantom sounds or frame discontinuity artifacts. It is a further object of the invention to provide an encoding method and apparatus which requires less processing power and has a more constant transmission bit rate requirement.
The above objects are achieved by a method and an apparatus according to the appended patent claims. Generally, a polyphonic signal is used to create a main signal (typically a mono signal) and a side signal. The main signal is encoded according to prior art encoding principles. A number of encoding schemes are provided for the side signal. Each encoding scheme is characterized by a set of subframes of different lengths, and the total length of the subframes corresponds to the length of the encoding frame of the encoding scheme. Each set contains at least one subframe. The encoding scheme to be used on the side signal is selected at least partly in dependence on the current signal content of the polyphonic signal.
In one embodiment, the selection is made prior to encoding, based on an analysis of the signal characteristics. In another embodiment, the side signal is encoded by each of the encoding schemes, and the best encoding scheme is selected based on a measure of the encoding quality.
In a preferred embodiment, a side residual signal is created as the difference between the side signal and the main signal scaled with the balance factor. The balance factor is selected to minimize the side residual signal. The optimized side residual signal and the balance factor are encoded and provided as parameters representing the side signal. On the decoder side, the side residual signal and the main signal are used to recover the side signal.
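Minimizing the energy of the side residual over a frame is a least-squares problem: setting the derivative of sum_k (x_side[k] - g * x_mono[k])^2 with respect to g to zero gives g = <x_side, x_mono> / <x_mono, x_mono>. The Python sketch below assumes that this energy criterion is what "minimize the side residual signal" means, and it ignores quantization of both g and the residual; the function names are hypothetical.

```python
def balance_factor(side, mono, eps=1e-12):
    """Least-squares gain g minimising sum((side - g * mono) ** 2) over one frame."""
    num = sum(s * m for s, m in zip(side, mono))
    den = sum(m * m for m in mono)
    return num / (den + eps)


def side_residual(side, mono, g):
    """Encoder: residual left after removing the scaled main signal."""
    return [s - g * m for s, m in zip(side, mono)]


def recover_side(residual, mono, g):
    """Decoder: add the scaled (decoded) main signal back to the residual."""
    return [r + g * m for r, m in zip(residual, mono)]
```

With the optimal g, the residual is orthogonal to the main signal, so its energy can never exceed that of the side signal itself.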
In another preferred embodiment, the encoding of the side signal comprises energy profile scaling in order to avoid pre-echo effects. Furthermore, the different encoding schemes may use different encoding processes in separate subframes.
The main advantage of the invention is that the perceptual quality of the audio signal is better preserved. Furthermore, the invention still allows multi-channel signals to be transmitted at very low bit rates.
Drawings
The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of a system for transmitting polyphonic signals;
FIG. 2a is a block diagram of an encoder in a transmitter;
FIG. 2b is a block diagram of a decoder in a receiver;
FIG. 3a is a diagram illustrating encoding frames of different lengths;
FIGS. 3b and 3c are block diagrams of embodiments of a side signal encoder unit according to the present invention;
FIG. 4 is a block diagram of an embodiment of an encoder for encoding a side signal using a balance factor;
FIG. 5 is a block diagram of an embodiment of an encoder for a multi-signal system;
FIG. 6 is a block diagram of an embodiment of a decoder adapted to decode signals from the apparatus of FIG. 5;
FIGS. 7a and 7b are diagrams illustrating a pre-echo artifact;
FIG. 8 is a block diagram of an embodiment of a side signal encoder unit according to the present invention, which employs different encoding principles in different subframes;
FIG. 9 illustrates the use of different encoding principles in different frequency subbands;
FIG. 10 is a flow chart of the basic steps of an embodiment of an encoding method according to the present invention; and
FIG. 11 is a flow chart of the basic steps of an embodiment of a decoding method according to the invention.
Detailed Description
Fig. 1 illustrates a typical system 1 in which the present invention may be advantageously employed. The transmitter 10 comprises an antenna 12, including associated hardware and software, to enable transmission of the radio signal 5 to the receiver 20. The transmitter 10 comprises, among other things, a multi-channel encoder 14 that transforms the signals of a number of input channels 16 into output signals suitable for radio transmission. An example of a suitable multi-channel encoder 14 is described in further detail below. The signals of the input channels 16 may be supplied from, for example, an audio signal storage 18, such as a data file with a digital representation of an audio recording, a magnetic tape or a vinyl disc. The signals of the input channels 16 may also be provided "live", for example from a set of microphones 19. If the audio signals are not already in digital format, they are digitized before entering the multi-channel encoder 14.
On the receiver 20 side, an antenna 22 with associated hardware and software handles the reception of the radio signal 5 representing the polyphonic audio signal. Here, the usual functions, for example error correction, are performed. The decoder 24 decodes the received radio signal 5 and transforms the audio data carried by it into signals of a number of output channels 26. The output signals may be provided to e.g. loudspeakers 29 for immediate rendering, or may be stored in any kind of audio signal storage 28.
The system 1 may be, for example, a teleconferencing system, a system for providing audio services or another audio application. In some systems, for example teleconferencing systems, the communication must be duplex, whereas the distribution of music from a service provider to subscribers can be essentially unidirectional. The transmission of signals from the transmitter 10 to the receiver 20 can also take place in any other way, for example by means of other kinds of electromagnetic waves, cables or optical fibres, and combinations thereof.
Fig. 2a illustrates an embodiment of an encoder according to the present invention. In this embodiment, the polyphonic signal is a stereo signal comprising two channels A and B received at the input terminals 16A and 16B. The signals of channels A and B are provided to a pre-processing unit 32, where different signal conditioning processes may be performed. The (possibly modified) signals from the outputs of the pre-processing unit 32 are summed in a summing unit 34, which also divides the resulting sum by a factor of 2. The signal x_mono generated in this way is the main signal of the stereo signal, since it contains substantially all data from both channels. In this embodiment, the main signal thus represents a pure "mono" signal. The main signal x_mono is supplied to a main signal encoder unit 38, which encodes it according to any suitable encoding principle. Such principles are available in the prior art and will not be discussed further herein. The main signal encoder unit 38 gives an output signal p_mono, the encoding parameters representing the main signal.
In the subtraction unit 36, the difference between the channel signals (divided by a factor of 2) is provided as the side signal x_side. In this embodiment, the side signal represents the difference between the two channels of the stereo signal. The side signal x_side is supplied to the side signal encoder unit 30, preferred embodiments of which are discussed further below. According to a side signal encoding process, also discussed in further detail below, the side signal x_side is converted into encoding parameters p_side representing the side signal x_side. In some embodiments, information about the main signal x_mono is also used in this encoding. Arrow 42 indicates such an arrangement, in which the original, unencoded main signal x_mono is used. In still other embodiments, the main signal information used in the side signal encoder unit 30 may be derived from the encoding parameters p_mono representing the main signal, as indicated by dashed line 44.
The encoding parameters p_mono representing the main signal x_mono form a first output signal, and the encoding parameters p_side representing the side signal x_side form a second output signal. In the usual case, the two output signals p_mono and p_side, which together represent the complete stereo sound, are multiplexed into one transmission signal 52 in the multiplexer unit 40. However, in other embodiments, the first and second output signals p_mono and p_side may be transmitted separately.
In Fig. 2b, an embodiment of the decoder 24 according to the invention is illustrated in block diagram form. The received signal 54, containing encoding parameters representing the main and side signal information, is supplied to a demultiplexer unit 56, which separates it into a first and a second input signal. The first input signal, the encoding parameters p_mono corresponding to the main signal, is provided to the main signal decoder unit 64. In a conventional manner, the encoding parameters p_mono are used to generate a decoded main signal x''_mono, which is as similar as possible to the main signal x_mono in the encoder 14 (Fig. 2a).
Similarly, the second input signal, corresponding to the side signal, is provided to a side signal decoder unit 60. Here, the encoding parameters p_side representing the side signal are used to recover a decoded side signal x''_side. In some embodiments, the decoding process makes use of the associated main signal x''_mono, as indicated by arrow 65.
The decoded main and side signals x''_mono and x''_side are supplied to a summing unit 70, which provides an output signal representing the original signal of channel A. Similarly, the difference provided by a subtraction unit 68 gives an output signal representing the original signal of channel B. These channel signals may be post-processed in a post-processor unit 74 according to prior art procedures. Finally, the channel signals A and B are provided at the outputs 26A and 26B of the decoder.
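The matrixing performed by the summing unit 34 and subtraction unit 36 in the encoder, and by the summing unit 70 and subtraction unit 68 in the decoder, can be sketched in a few lines of Python. This minimal sketch assumes sample lists of equal length and leaves out the pre- and post-processing units as well as the actual encoding of the main and side signals, so the reconstruction here is exact; the function names are hypothetical.

```python
def ms_encode(a, b):
    """Encoder matrixing: main = (A + B) / 2, side = (A - B) / 2."""
    x_mono = [(l + r) / 2 for l, r in zip(a, b)]
    x_side = [(l - r) / 2 for l, r in zip(a, b)]
    return x_mono, x_side


def ms_decode(x_mono, x_side):
    """Decoder matrixing: A = main + side, B = main - side."""
    a = [m + s for m, s in zip(x_mono, x_side)]
    b = [m - s for m, s in zip(x_mono, x_side)]
    return a, b
```

In a real codec the decoder works on the decoded signals x''_mono and x''_side, so A and B are only approximated.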
As described in the summary, encoding is typically performed one frame at a time. A frame contains the audio samples within a predetermined time period. At the bottom of Fig. 3a, a frame SF2 of duration L is illustrated. The audio samples within the unshaded part are to be encoded together; the preceding and following samples are encoded in other frames. In any case, dividing the samples into frames introduces some discontinuities at the frame boundaries. A varying sound gives varying encoding parameters, which may change substantially at each frame boundary and produce perceptible errors. One way to partly compensate for this is to base the encoding not only on the samples to be encoded, but also on samples in the immediate vicinity of the frame, as indicated by the shaded portions. In this way, the transitions between frames become softer. Alternatively or additionally, interpolation techniques are sometimes used to reduce perceptible artifacts caused by frame boundaries. However, all of these processes require a significant amount of additional computational resources, and for some specific encoding techniques it may be difficult to provide such resources at all.
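Basing the encoding of a frame also on neighbouring samples can be sketched as extracting an analysis segment that extends beyond the frame. This sketch assumes a linear taper over `margin` samples on each side and zero padding at the signal edges; the taper shape, the function name and how an encoder would use the weights are illustrative assumptions, since the text does not specify them.

```python
def analysis_segment(signal, start, frame_len, margin):
    """Return the frame samples plus `margin` neighbouring samples on each side.

    Weights are 1 inside the frame and taper linearly towards 0 in the margins.
    Samples outside the signal are treated as zeros.
    """
    seg, weights = [], []
    for i in range(start - margin, start + frame_len + margin):
        seg.append(signal[i] if 0 <= i < len(signal) else 0.0)
        if i < start:  # leading margin (preceding frame)
            w = (i - (start - margin) + 1) / (margin + 1)
        elif i >= start + frame_len:  # trailing margin (following frame)
            w = (start + frame_len + margin - i) / (margin + 1)
        else:  # samples actually belonging to this frame
            w = 1.0
        weights.append(w)
    return seg, weights
```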
Therefore, it would be beneficial to use frames that are as long as possible, so that the number of frame boundaries is small. Coding efficiency also generally becomes high and the necessary transmission bit rate is generally minimized. However, long frames give rise to pre-echo artifacts and phantom sounds.
By instead using shorter frames, such as SF1 or even SF0 with durations of L/2 and L/4, respectively, those skilled in the art will recognize that coding efficiency is reduced, the transmission bit rate must be higher and the problem of frame boundary artifacts increases. However, shorter frames are less susceptible to other perceptible artifacts, such as phantom sounds and pre-echoes. To minimize the coding error as far as possible, frames that are as short as possible should therefore be used.
According to the present invention, audio perception can be improved by encoding the side signal using a frame length that depends on the current signal content. Since the effect of different frame lengths on audio perception differs depending on the characteristics of the sound to be encoded, an improvement can be obtained by letting the characteristics of the signal itself influence the frame length used. The encoding of the main signal is not an object of the present invention and will not be described in detail. However, the frame length used for the main signal may or may not be equal to the frame length used for the side signal.
Because of small temporal variations, it may in some cases be beneficial to encode the side signal using relatively long frames. This situation may arise for recordings with a largely diffuse sound field, such as concert recordings. In other situations, such as stereo speech conversations, short frames may be preferable. Two basic methods can be used to decide which frame length to choose.
An embodiment of the side signal encoder unit 30 according to the invention, in which a closed-loop decision is used, is illustrated in Fig. 3b. Here, a basic encoding frame of length L is used. A number of encoding schemes 81 are provided, each characterized by a separate set 80 of subframes 90. Each set 80 of subframes 90 includes one or more subframes 90, which may be of equal or different lengths. However, the total length of a set 80 of subframes 90 is always equal to the basic encoding frame length L. Referring to Fig. 3b, the top encoding scheme is characterized by a set containing only one subframe of length L. The next set contains two subframes of length L/2. The third set contains two subframes of length L/4 followed by one subframe of length L/2.
The side signal x_side supplied to the side signal encoder unit 30 is encoded by all encoding schemes 81. In the top encoding scheme, the entire basic encoding frame is encoded in one block, whereas in the other encoding schemes the signal x_side is encoded separately in each subframe. The results from each encoding scheme are provided to a selector 85. A fidelity measurement means 83 determines a fidelity measure for each encoded signal. The fidelity measure is an objective quality value, preferably a signal-to-noise ratio or a weighted signal-to-noise ratio. The fidelity measures associated with the encoding schemes are compared, and the result controls a switching means 87, which selects the encoding parameters representing the side signal from the encoding scheme giving the best fidelity measure as the output signal p_side from the side signal encoder unit 30.
Preferably, all possible combinations of frame lengths are tested, and the set of subframes giving the best objective quality (e.g. signal-to-noise ratio) is selected.
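The closed-loop decision can be sketched as a loop that encodes the frame with every scheme and keeps the one with the best fidelity. The sketch uses the three subframe sets of Fig. 3b and plain SNR as the fidelity measure; the per-subframe uniform quantizer `toy_encode` is a hypothetical stand-in for a real side signal coder, and all function names are illustrative.

```python
import math

# Subframe sets of FIG. 3b, as fractions of the basic frame length L.
SCHEMES = [
    [1.0],              # one subframe of length L
    [0.5, 0.5],         # two subframes of length L/2
    [0.25, 0.25, 0.5],  # two of length L/4, then one of length L/2
]


def toy_encode(sub, bits=3):
    """Hypothetical stand-in coder: uniform quantiser with one scale per subframe.

    Returns the decoded (reconstructed) samples.
    """
    scale = max(abs(v) for v in sub) or 1.0
    step = scale / 2 ** (bits - 1)
    return [round(v / step) * step for v in sub]


def snr_db(ref, test):
    """Fidelity measure: signal-to-noise ratio in dB."""
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    if noise == 0:
        return math.inf
    return 10 * math.log10(sum(r * r for r in ref) / noise)


def encode_frame_closed_loop(frame, schemes=SCHEMES):
    """Encode the frame with every scheme; return (index, SNR, decoded) of the best."""
    L = len(frame)
    best = None
    for idx, scheme in enumerate(schemes):
        decoded, start = [], 0
        for frac in scheme:
            n = int(round(frac * L))
            decoded += toy_encode(frame[start:start + n])
            start += n
        fidelity = snr_db(frame, decoded)
        if best is None or fidelity > best[1]:
            best = (idx, fidelity, decoded)
    return best
```

This toy coder does not charge shorter subframes for their extra scale factors, so on its own it tends to favour short subframes; a practical selection would also have to respect the bit rate, as discussed in the text.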
In this embodiment, the lengths of the subframes used are selected according to:
$$l_{sf} = \frac{l_f}{2^{n}},$$
where $l_{sf}$ is the length of a subframe, $l_f$ is the length of the encoding frame and $n$ is an integer. In the present embodiment, $n$ is selected between 0 and 3. However, any subframe lengths could be used as long as the total length of the set remains constant.
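Assuming that subframes of any of the allowed lengths may appear in any order (as in the third set of Fig. 3b, where two L/4 subframes precede one L/2 subframe), the candidate sets for a closed-loop search can be enumerated recursively. The sketch expresses lengths in samples and requires l_f to be divisible by 2^max_n; the function name is hypothetical.

```python
from functools import lru_cache


def subframe_sets(frame_len, max_n=3):
    """All ordered sets of subframe lengths l_f / 2**n (0 <= n <= max_n) summing to l_f."""
    if frame_len % (2 ** max_n):
        raise ValueError("frame length must be divisible by 2**max_n")
    lengths = [frame_len // 2 ** n for n in range(max_n + 1)]

    @lru_cache(maxsize=None)
    def fill(remaining):
        # Ordered compositions of `remaining` using the allowed subframe lengths
        if remaining == 0:
            return [[]]
        out = []
        for l in lengths:
            if l <= remaining:
                out += [[l] + rest for rest in fill(remaining - l)]
        return out

    return fill(frame_len)
```

With n between 0 and 3, this yields 56 ordered subframe sets per encoding frame, which gives an idea of the computational cost of testing all combinations.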
In Fig. 3c, another embodiment of the side signal encoder unit 30 according to the invention is illustrated. Here, the frame length decision is an open-loop decision based on statistical properties of the signal. In other words, the spectral characteristics of the side signal are used as a basis for deciding which encoding scheme to use. As before, different encoding schemes characterized by different sets of subframes are available. However, in this embodiment, the selector 85 is placed before the actual encoding. The input side signal x_side enters the selector 85 and a signal analysis unit 84. The result of the analysis becomes the input to the switch 86, so that only one encoding scheme 81 is used. The output from this encoding scheme is also the output signal p_side from the side signal encoder unit 30.
The advantage of an open-loop decision is that the actual encoding is performed only once. A disadvantage, however, is that the analysis of the signal characteristics can be very complex in practice, and it is difficult to predict the relevant characteristics in advance so that an appropriate choice can be made in the switch 86. Extensive statistical analysis of sound has to be performed and incorporated into the signal analysis unit 84. Any small change in the encoding schemes may completely reverse the statistical properties.
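An open-loop decision could, for example, be based on a simple transient measure computed before any encoding. The sketch below compares the energies of consecutive blocks and chooses short subframes when the energy rises sharply. Note that it uses a temporal energy measure rather than spectral characteristics; the detector, the threshold of 8 and the number of blocks are hypothetical choices for illustration, not the analysis performed by the signal analysis unit 84, which the text leaves unspecified.

```python
def choose_scheme_open_loop(frame, n_blocks=4, threshold=8.0):
    """Pick a subframe set from a simple energy-based transient measure.

    Returns a list of subframe lengths summing to len(frame).
    """
    L = len(frame)
    if L % n_blocks:
        raise ValueError("frame length must be divisible by n_blocks")
    blk = L // n_blocks
    # Small offset avoids division by zero for silent blocks
    energies = [sum(v * v for v in frame[i * blk:(i + 1) * blk]) + 1e-12
                for i in range(n_blocks)]
    ratios = [energies[i] / energies[i - 1] for i in range(1, n_blocks)]
    if max(ratios) > threshold:
        # Sharp energy rise: short subframes for fine time resolution
        return [blk] * n_blocks
    # Roughly stationary: one long subframe for fine frequency resolution
    return [L]
```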
By using a closed loop selection (fig. 3b), the coding scheme can be interchanged without any changes to the rest of the unit. On the other hand, if many coding schemes are to be investigated, the computational requirements can be high.
The benefit of such variable frame length coding of the side signal is that it allows a choice between fine time resolution with coarse frequency resolution on the one hand, and coarse time resolution with fine frequency resolution on the other. The embodiments above thereby preserve the stereo image as well as possible.
There may also be requirements on the actual coding used in the different coding schemes. In particular, with closed-loop selection, the computational resources needed to perform several more or less simultaneous encodings are large: the more complex the encoding process, the more computing power is required. A low transmission bit rate is also desirable.
The method presented in US 5,434,948 uses a filtered version of the mono (main) signal to predict the side or difference signal. The filter parameters are optimized and allowed to vary over time, and the filter parameters representing the encoding of the side signal are then transmitted. In one embodiment, a residual side signal is also transmitted. In many cases, this method could be used as a side signal encoding method within the scope of the present invention. However, this approach has some drawbacks. Quantizing the filter coefficients and any residual side signal typically requires a relatively high transmission bit rate, since the filter order has to be high to give an accurate side signal estimate. Estimating the filter itself can also be problematic, especially for transient-rich music. Estimation errors give a modified side signal that is sometimes larger in amplitude than the unmodified signal, which raises the bit rate requirements. Also, if a new set of filter coefficients is computed every N samples, the coefficients must be interpolated to give a smooth transition from one set to the next, as discussed above. Interpolating filter coefficients is a complex task, and interpolation errors appear as large side error signals, raising the bit rate required by the difference error signal encoder.
One way to avoid interpolation is to update the filter coefficients sample by sample and rely on backward-adaptive analysis. For this to work well, however, the residual encoder needs a relatively high bit rate, so it is not a good alternative for low-rate stereo coding.
There are situations, very common in music for example, where the mono signal and the difference signal are almost uncorrelated. Filter estimation then becomes very difficult, with the added risk of merely making matters worse for the difference error signal encoder.
The solution of US 5,434,948 works well where the filter coefficients vary slowly over time, such as in conference telephony systems. For music signals, the method does not work well, because the filter must change quickly to track the stereo image. This means that subframes of very different lengths must be used, so the number of combinations to be tested grows rapidly, which in turn makes the computational requirements for evaluating all possible coding schemes impractically high.
Thus, in a preferred embodiment, the side signal is encoded based on the following idea: the redundancy between the mono signal and the side signal is reduced with a simple balance factor instead of a complex, bit-rate-consuming prediction filter. The residual of this operation is then encoded. The residual has relatively low amplitude and does not require a very high bit rate for transmission. Because of its low computational complexity, this idea combines very well with the variable frame length method described earlier.
Using a balance factor together with the variable frame length method eliminates the need for complex interpolation and the problems that interpolation may cause. A simple balance factor also creates fewer estimation problems than a complex filter, since possible estimation errors in the balance factor have less influence. The preferred solution can reproduce both panned signals and diffuse sound fields with good quality, at a limited bit rate and computational cost.
Fig. 4 illustrates a preferred embodiment of a stereo encoder according to the present invention. This embodiment is very similar to the one shown in Fig. 2a, but details of the side signal encoder unit 30 are disclosed. The encoder 14 of this embodiment has no pre-processing unit, and the input signals are supplied directly to the adding and subtracting units 34, 36. In a multiplier 33, the mono signal $x_{mono}$ is multiplied by a balance factor $g_{sm}$. In subtracting unit 35, the multiplied mono signal is subtracted from the side signal $x_{side}$ (i.e. essentially the difference between the two channels) to produce a side residual signal. The balance factor $g_{sm}$ is determined by an optimizer 37 from the content of the mono and side signals so as to minimize the side residual signal according to a quality criterion, preferably a least-mean-square criterion. The side residual signal is encoded in a side residual encoder 39 using any encoding process; preferably, the side residual encoder 39 is a low-bit-rate transform encoder or a Code-Excited Linear Prediction (CELP) encoder. The coding parameters $p_{side}$ representing the side signal then comprise the coding parameters $p_{side\,residual}$ representing the side residual signal and the optimized balance factor 49.
In the embodiment of Fig. 4, the mono signal 42 used for synthesizing the side signal is the target signal $x_{mono}$ of the mono encoder 38. As mentioned above in connection with Fig. 2a, the locally synthesized signal of the mono encoder 38 may be used instead. In that case, the total encoder delay and the computational complexity of the side signal encoding increase. On the other hand, the quality is better, since coding errors introduced in the mono encoder can then be compensated for.
The basic coding scheme is described more precisely as follows. The two channel signals are denoted a and b, which may be the left and right channels of a stereo pair. The channel signals are combined into a mono signal by addition and into a side signal by subtraction:
$$x_{mono}(n) = 0.5\,\big(a(n) + b(n)\big)$$
$$x_{side}(n) = 0.5\,\big(a(n) - b(n)\big)$$
It is beneficial to scale the $x_{mono}$ and $x_{side}$ signals down by a factor of 2, but other ways of producing $x_{mono}$ and $x_{side}$ are also possible. For example, one may use:
$$x_{mono}(n) = \gamma\, a(n) + (1-\gamma)\, b(n)$$
$$x_{side}(n) = \gamma\, a(n) - (1-\gamma)\, b(n)$$
$$0 \le \gamma \le 1.0$$
On a block of the input signal, a modified or residual side signal is calculated according to:
$$x_{side\,residual}(n) = x_{side}(n) - f(x_{mono}, x_{side})\, x_{mono}(n),$$
where $f(x_{mono}, x_{side})$ is a balance factor function that strives to remove as much as possible of the side signal, based on a block of N samples (i.e. a subframe) of the side and mono signals. In other words, the balance factor is used to minimize the residual side signal. In the special case of mean-square minimization, this is equivalent to minimizing the energy of the residual side signal $x_{side\,residual}$.
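In this special case, $f(x_{mono}, x_{side})$ is given by:
$$f(x_{mono}, x_{side}) = \frac{\sum_{n=\text{frame start}}^{\text{frame end}} x_{side}(n)\, x_{mono}(n)}{\sum_{n=\text{frame start}}^{\text{frame end}} x_{mono}^2(n)}$$
where $x_{side}$ is the side signal and $x_{mono}$ is the mono signal. Note that the function is computed over a block starting at "frame start" and ending at "frame end".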
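The mean-square special case can be sketched as follows. Minimizing the residual energy over a block gives the standard least-squares factor $f = \sum x_{side}x_{mono} / \sum x_{mono}^2$, and the resulting residual is orthogonal to the mono signal. The function names are hypothetical, and returning 0 for a silent mono block is an assumption.

```python
import numpy as np

def balance_factor(x_mono: np.ndarray, x_side: np.ndarray) -> float:
    """Least-squares balance factor for one block (subframe):
    f = sum(x_side * x_mono) / sum(x_mono ** 2)."""
    energy = float(np.dot(x_mono, x_mono))
    if energy == 0.0:
        return 0.0  # assumption: a silent mono block predicts nothing
    return float(np.dot(x_side, x_mono)) / energy

def side_residual(x_mono: np.ndarray, x_side: np.ndarray) -> np.ndarray:
    """Residual side signal x_side - f * x_mono for one block."""
    return x_side - balance_factor(x_mono, x_side) * x_mono

# A source panned towards channel a: b = 0.5 * a.
a = np.array([1.0, -2.0, 3.0, 0.5])
b = 0.5 * a
x_mono, x_side = 0.5 * (a + b), 0.5 * (a - b)
print(balance_factor(x_mono, x_side))  # approximately 1/3
print(side_residual(x_mono, x_side))   # zero up to rounding
```

For a purely panned source, the side signal is a scaled copy of the mono signal, so a single balance factor cancels it completely and nothing but quantization noise would remain for the residual coder.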
Weighting in the frequency domain can be added when calculating the balance factor. This is done by convolving the $x_{side}$ and $x_{mono}$ signals with the impulse response of a weighting filter, which makes it possible to move the estimation error to frequency ranges where it is less audible. This is called perceptual weighting.
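A minimal sketch of perceptual weighting: both signals are convolved with the impulse response `h` of a weighting filter before the least-squares balance factor is computed. The two-tap filter below is purely illustrative; a real codec would derive its weighting filter from a perceptual model.

```python
import numpy as np

def weighted_balance_factor(x_mono: np.ndarray, x_side: np.ndarray,
                            h: np.ndarray) -> float:
    """Balance factor computed on signals convolved with the impulse
    response h of a (hypothetical) perceptual weighting filter."""
    m_w = np.convolve(x_mono, h)  # full convolution keeps the filter tails
    s_w = np.convolve(x_side, h)
    energy = float(np.dot(m_w, m_w))
    if energy == 0.0:
        return 0.0
    return float(np.dot(s_w, m_w)) / energy

h = np.array([1.0, -0.9])  # assumed first-order weighting taps
```

If the side signal is an exact multiple of the mono signal, weighting does not change the factor; it only changes which frequency regions dominate the fit when the match is imperfect.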
A quantized version of the balance factor value given by the function $f(x_{mono}, x_{side})$ is sent to the decoder. The quantization is best taken into account already when generating the modified side signal, which gives the following expression:
$$x_{side\,residual}(n) = x_{side}(n) - g_Q\, x_{mono}(n)$$
where $g_Q = Q_g\big(f(x_{mono}, x_{side})\big)$ and $Q_g(\dots)$ is a quantization function applied to the balance factor given by $f(x_{mono}, x_{side})$. The balance factor is transmitted on the transmission channel. For normally left-right panned signals, the balance factor is limited to the interval $[-1.0, 1.0]$. If, on the other hand, the channels are out of phase with respect to each other, the balance factor may exceed these limits.
As an alternative way of stabilizing the stereo image, the balance factor may be limited when the normalized cross-correlation between the mono signal and the side signal is poor, as given by the following equation:
wherein,
Such situations occur very frequently in classical music or studio music with a lot of diffuse sound, where in some cases the a and b channels may almost cancel each other out when the mono signal is formed. The balance factor then tends to jump rapidly, disturbing the stereo image. The above adjustment alleviates this problem.
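The limiting rule itself is not reproduced here. As a hedged sketch, the function below only computes the standard normalized cross-correlation $\rho = \sum x_{mono}x_{side} / \sqrt{\sum x_{mono}^2 \sum x_{side}^2}$ on which such a rule is based; values of $\rho$ near 0 flag the nearly uncorrelated case described above. The function name is hypothetical.

```python
import numpy as np

def normalized_cross_correlation(x_mono: np.ndarray, x_side: np.ndarray) -> float:
    """rho = sum(x_mono * x_side) / sqrt(sum(x_mono**2) * sum(x_side**2)),
    in [-1, 1]; values near 0 mean the signals are nearly uncorrelated."""
    denom = float(np.sqrt(np.dot(x_mono, x_mono) * np.dot(x_side, x_side)))
    if denom == 0.0:
        return 0.0
    return float(np.dot(x_mono, x_side)) / denom
```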
The filter-based approach in US 5,434,948 has a similar problem, but the solution is not so simple in that case.
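If $E_s$ is the coding function of the residual side signal (e.g. a transform coder) and $E_m$ is the coding function of the mono signal, the decoded signals $a''$ and $b''$ at the decoder end can be described as (assuming $\gamma = 0.5$):
$$a''(n) = (1 + g_Q)\, x''_{mono}(n) + x''_{side}(n)$$
$$b''(n) = (1 - g_Q)\, x''_{mono}(n) - x''_{side}(n)$$
where $x''_{mono} = E_m(x_{mono})$ is the decoded mono signal and $x''_{side} = E_s(x_{side\,residual})$ is the decoded residual side signal.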
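The encode/decode path for $\gamma = 0.5$ can be checked with a short sketch that skips the residual and mono coders (so $x'' = x$). The uniform quantizer `quantize` (step 1/16, clipped to ±2) is a hypothetical choice of $Q_g$; note that the round trip is exact for any quantized $g_Q$, because the decoder uses the same $g_Q$ as the encoder.

```python
import numpy as np

def quantize(g: float, step: float = 1.0 / 16, g_max: float = 2.0) -> float:
    """Hypothetical uniform scalar quantizer Q_g for the balance factor."""
    return float(np.clip(np.round(g / step) * step, -g_max, g_max))

def encode(a: np.ndarray, b: np.ndarray):
    """gamma = 0.5 mono/side split plus a quantized balance factor.
    Assumes the mono block is not silent (non-zero energy)."""
    x_mono = 0.5 * (a + b)
    x_side = 0.5 * (a - b)
    g_q = quantize(float(np.dot(x_side, x_mono) / np.dot(x_mono, x_mono)))
    return x_mono, x_side - g_q * x_mono, g_q

def decode(x_mono_dec: np.ndarray, residual_dec: np.ndarray, g_q: float):
    """a'' = (1 + g_Q) x''_mono + x''_side, b'' = (1 - g_Q) x''_mono - x''_side."""
    a_dec = (1.0 + g_q) * x_mono_dec + residual_dec
    b_dec = (1.0 - g_q) * x_mono_dec - residual_dec
    return a_dec, b_dec
```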
One important benefit of calculating the balance factor for each frame is that interpolation is avoided. Instead, frame processing is normally performed with overlapping frames, as described above.
The coding principle using a balance factor works particularly well in the case of music signals, where fast changes are usually required to track the stereo image.
Multichannel coding has recently become widespread. One example is 5.1-channel surround sound in DVD movies, where the channels are arranged as front left, front center, front right, rear left, rear right and subwoofer. Fig. 5 shows an embodiment of an encoder according to the invention that encodes the three front channels of such an arrangement, exploiting inter-channel redundancy.
The three channel signals L, C, R are provided on three inputs 16A-C, and a mono signal $x_{mono}$ is generated as the sum of these three signals. A center signal encoder unit 130 is added, which receives the center signal $x_{centre}$. In this embodiment, the mono signal 42 is the encoded and decoded mono signal $x''_{mono}$, which is multiplied by a balance factor $g_Q$ in multiplier 133. In the subtraction unit 135, the multiplied mono signal is subtracted from the center signal $x_{centre}$ to produce a center residual signal. The balance factor $g_Q$ is determined by the optimizer 137 from the content of the mono and center signals so as to minimize the center residual signal according to a quality criterion. The center residual signal is encoded in the center residual encoder 139 using any encoding process; preferably, the center residual encoder 139 is a low-bit-rate transform encoder or a CELP encoder. The coding parameters $p_{centre}$ representing the center signal then comprise the coding parameters $p_{centre\,residual}$ representing the center residual signal and the optimized balance factor 149. The center residual signal is added to the scaled mono signal in an adding unit 235, giving a modified center signal 142 that compensates for coding errors.
As in the previous embodiments, the side signal $x_{side}$, i.e. the difference between the left L and right R channels, is supplied to a side signal encoder unit 30. Here, however, the optimizer 37 also uses the modified center signal 142 provided by the center signal encoder unit 130. The side residual signal is thus generated in the subtraction unit 35 as an optimal linear combination of the mono signal 42, the modified center signal 142 and the side signal.
The above-described concept of variable frame length may be applied to either or both of the side and center signals.
Fig. 6 illustrates a decoder unit adapted to receive an encoded audio signal from the encoder unit of Fig. 5. The received signal 54 is divided into coding parameters $p_{mono}$ representing the main signal, coding parameters $p_{centre}$ representing the center signal and coding parameters $p_{side}$ representing the side signal. In the decoder 64, the coding parameters $p_{mono}$ are used to generate the main signal $x''_{mono}$. In the decoder 160, the coding parameters $p_{centre}$ are used to generate the center signal $x''_{centre}$ based on the main signal $x''_{mono}$. In the decoder 60, the coding parameters $p_{side}$ are decoded using the main signal $x''_{mono}$ and the center signal $x''_{centre}$, producing the side signal $x''_{side}$.
This process can be expressed mathematically as follows:
The input signals $x_{left}$, $x_{right}$ and $x_{centre}$ are combined into one mono signal according to:
$$x_{mono}(n) = \alpha\, x_{left}(n) + \beta\, x_{right}(n) + \chi\, x_{centre}(n)$$
For simplicity, α, β and χ are set to 1.0 in the remainder, but they may take arbitrary values. The values of α, β and χ may be constant or depend on the signal content, in order to emphasize one or two channels for optimal quality.
The normalized cross-correlation between the mono and the center signal is calculated as follows:
wherein
$x_{centre}$ is the center signal and $x_{mono}$ is the mono signal. Here the mono signal is the mono target signal, but a local synthesis of the mono encoder may also be used.
The center residual signal to be encoded is:
$$x_{centre\,residual}(n) = x_{centre}(n) - g_Q\, x_{mono}(n)$$
where $g_Q$ is the balance factor after the quantization function $Q_g(\dots)$ has been applied. The balance factor is transmitted on the transmission channel.
If $E_c$ is the coding function of the center residual signal (e.g. a transform coder) and $E_m$ is the coding function of the mono signal, the decoded signal $x''_{centre}$ at the decoder end is described as:
$$x''_{centre}(n) = g_Q\, x''_{mono}(n) + x''_{centre\,residual}(n)$$
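The side residual signal to be encoded is:
$$x_{side\,residual}(n) = \big(x_{left}(n) - x_{right}(n)\big) - g_{Qsm}\, x''_{mono}(n) - g_{Qsc}\, x''_{centre}(n)$$
where $g_{Qsm}$ and $g_{Qsc}$ are the quantized values of the parameters $g_{sm}$ and $g_{sc}$ that minimize the expression:
$$\sum_{n=\text{frame start}}^{\text{frame end}} \big| x_{left}(n) - x_{right}(n) - g_{sm}\, x''_{mono}(n) - g_{sc}\, x''_{centre}(n) \big|^{\eta}$$
For a least-mean-square minimization of the error, η may for example be equal to 2. The $g_{sm}$ and $g_{sc}$ parameters may be quantized jointly or separately.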
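If $E_s$ is the coding function of the side residual signal, the decoded channel signals $x''_{left}$ and $x''_{right}$ are given by:
$$x''_{left}(n) = \tfrac{1}{2}\big(x''_{mono}(n) - x''_{centre}(n) + x''_{side}(n)\big)$$
$$x''_{right}(n) = \tfrac{1}{2}\big(x''_{mono}(n) - x''_{centre}(n) - x''_{side}(n)\big)$$
$$x''_{side}(n) = x''_{side\,residual}(n) + g_{Qsm}\, x''_{mono}(n) + g_{Qsc}\, x''_{centre}(n)$$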
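The three-channel scheme can be sketched end to end under simplifying assumptions: α = β = χ = 1, no quantization and no residual coding (so every $x''$ equals its $x$). The side fit uses η = 2, i.e. a joint least-squares solve with `numpy.linalg.lstsq`. Since $x_{mono} = L + R + C$ and $x_{side} = L - R$, the decoder recovers each channel as half of $x_{mono} - x_{centre} \pm x_{side}$. The function names are hypothetical.

```python
import numpy as np

def encode_lcr(left, center, right):
    """alpha = beta = chi = 1; no coding loss, so x'' = x (assumption)."""
    x_mono = left + right + center
    x_side = left - right
    # Center: one least-squares balance factor against the mono signal.
    g_c = float(np.dot(center, x_mono) / np.dot(x_mono, x_mono))
    center_res = center - g_c * x_mono
    # Side: joint least-squares fit (eta = 2) on mono and center.
    basis = np.column_stack([x_mono, center])
    (g_sm, g_sc), *_ = np.linalg.lstsq(basis, x_side, rcond=None)
    side_res = x_side - g_sm * x_mono - g_sc * center
    return x_mono, center_res, side_res, g_c, g_sm, g_sc

def decode_lcr(x_mono, center_res, side_res, g_c, g_sm, g_sc):
    center = g_c * x_mono + center_res
    side = side_res + g_sm * x_mono + g_sc * center
    left = 0.5 * (x_mono - center + side)
    right = 0.5 * (x_mono - center - side)
    return left, center, right
```

Because the least-squares residual is orthogonal to both $x_{mono}$ and $x_{centre}$, whatever remains in `side_res` is the part of the side signal that neither predictor can explain.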
One of the most annoying perceptible artifacts is the pre-echo effect, illustrated in Figs. 7a-b. Assume a signal component with the temporal development shown by curve 100. At the beginning (from t = 0), the audio samples contain no signal component. At some time between t1 and t2, the signal component suddenly appears. When the signal component is encoded with a frame length of t2 - t1, its presence is "smeared" over the entire frame, as shown by curve 101. When curve 101 is decoded, the signal component appears a time Δt before its expected onset, and a "pre-echo" is perceived.
Pre-echo artifacts become more pronounced when long coded frames are used; shorter frames suppress them somewhat. Another way to deal with the pre-echo problem is to exploit the fact that the mono signal is available at both the encoder and the decoder. This makes it possible to scale the side signal according to the energy contour of the mono signal. At the decoder, the inverse scaling is performed, which mitigates some of the pre-echo problems.
The energy contour of the mono signal is calculated over the entire frame as:
where w(n) is a windowing function. The simplest windowing function is a rectangular window, but other window types, such as a Hamming window, may be more desirable.
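The side residual signal is then scaled as:
$$x_{side}(n) \leftarrow \frac{x_{side}(n)}{\sqrt{E_c(n)}}, \quad \text{frame start} \le n \le \text{frame end}$$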
The above equation can be written in a more general form as:
$$x_{side}(n) \leftarrow \frac{x_{side}(n)}{f\big(E_c(n)\big)}, \quad \text{frame start} \le n \le \text{frame end}$$
where $f(\dots)$ is a monotonic, continuous function. In the decoder, an energy contour is calculated for the decoded mono signal and applied to the decoded side signal:
$$x''_{side}(n) \leftarrow x''_{side}(n)\, f\big(E_c(n)\big), \quad \text{frame start} \le n \le \text{frame end}$$
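A minimal sketch of energy contour scaling, with $f$ taken as the square root (as in claim 11). The form of $E_c$ used here, a centred rectangular moving average of $x_{mono}^2$ computed per sample plus a small `eps` to avoid division by zero, is an assumption for illustration; the function names are hypothetical.

```python
import numpy as np

def energy_contour(x_mono: np.ndarray, win_len: int = 32,
                   eps: float = 1e-9) -> np.ndarray:
    """Hypothetical energy contour E_c(n): mean of x_mono**2 over a
    rectangular window centred on n (len(x_mono) >= win_len assumed)."""
    window = np.ones(win_len) / win_len
    return np.convolve(x_mono ** 2, window, mode="same") + eps

def scale_side(x_side: np.ndarray, x_mono: np.ndarray) -> np.ndarray:
    """Encoder: divide by f(E_c) with f = sqrt."""
    return x_side / np.sqrt(energy_contour(x_mono))

def unscale_side(x_side_dec: np.ndarray, x_mono_dec: np.ndarray) -> np.ndarray:
    """Decoder: multiply by f(E_c) computed from the decoded mono signal."""
    return x_side_dec * np.sqrt(energy_contour(x_mono_dec))
```

Coding noise added to the scaled side signal in a quiet stretch before an onset is multiplied by a small $f(E_c)$ in the decoder, so it is attenuated there rather than being heard as a pre-echo.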
Since energy contour scaling is, to some extent, an alternative to using shorter frame lengths, this concept is particularly suitable for combination with the variable frame length concept described above. A more flexible set of coding schemes can be provided by having some coding schemes that apply energy contour scaling, some that do not, and some that apply it only in certain subframes. Fig. 8 illustrates such an embodiment of a side signal encoder unit 30 according to the invention. Here, the different coding schemes 81 comprise shaded subframes 91 (coding with energy contour scaling applied) and unshaded subframes 92 (coding without energy contour scaling). In this way, not only combinations of subframes of different lengths but also combinations of subframes with different coding principles are obtained. In this illustrative example, the coding schemes differ in where energy contour scaling is applied. More generally, any coding principle can be combined with the variable length concept in a similar way.
The set of coding schemes of Fig. 8 thus includes schemes that handle, for example, pre-echo artifacts in different ways. Some schemes use longer subframes with pre-echo minimization according to the energy contour principle; others use shorter subframes without energy contour scaling. Depending on the signal content, one of the alternatives may be more advantageous. For very severe pre-echo situations, a coding scheme with short subframes and energy contour scaling must be used.
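One way to represent such mixed schemes is to pair every subframe with an on/off flag for energy contour scaling. The sketch below expands a list of subframe partitions into all flag combinations; the representation and the name `schemes_with_scaling` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

def schemes_with_scaling(
    partitions: Sequence[Tuple[int, ...]],
) -> List[Tuple[Tuple[int, bool], ...]]:
    """Pair every subframe of every partition with an on/off flag for
    energy contour scaling, covering all flag combinations."""
    expanded = []
    for partition in partitions:
        for flags in product((False, True), repeat=len(partition)):
            expanded.append(tuple(zip(partition, flags)))
    return expanded

print(len(schemes_with_scaling([(1024,), (512, 512)])))  # 2 + 4 = 6
```

A partition with k subframes contributes 2^k variants, so a practical encoder would probably restrict the flag patterns it evaluates.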
The proposed solution may be used over the whole frequency band or in one or more separate subbands. Subbands may be applied to both the main and side signals, or to only one of them. The preferred embodiment splits the side signal into several frequency bands, simply because it is easier to remove redundancy within isolated bands than over the whole band. This is particularly important when encoding signals with rich spectral content.
One possible use is to encode the frequency band below a predetermined threshold with the method above. The threshold is preferably 2 kHz, or even more preferably 1 kHz. The rest of the frequency range of interest may be encoded as another band using the method described above, or a completely different method may be used.
One motivation for using the method preferably at low frequencies is that diffuse sound fields generally have little energy at high frequencies, the natural reason being that sound absorption generally increases with frequency. Diffuse sound field components also appear to play a less important role for the human auditory system at higher frequencies. It is therefore beneficial to apply the solution at low frequencies (below 1 or 2 kHz) and, depending on other conditions, to use a more bit-efficient coding scheme at higher frequencies. Applying the scheme only at low frequencies saves considerable bit rate, since the bit rate needed by the proposed method is proportional to the required bandwidth. In most cases the mono encoder can encode the entire frequency band, whereas it is suggested to perform the proposed side signal encoding only in the lower part of the band, as schematically illustrated in Fig. 9. Reference numeral 301 refers to a side signal encoding scheme according to the invention, reference numeral 302 refers to any other side signal encoding scheme, and reference numeral 303 refers to one encoding scheme of the side signal.
It is also possible to use the proposed method for several different frequency bands.
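A simple way to experiment with band-limited side coding is to split the side signal at the threshold frequency. The sketch below uses a brick-wall split by zeroing FFT bins, with the 2 kHz default taken from the preferred threshold above; the FFT split and the name `split_bands` are illustrative assumptions, and a real codec would more likely use a filter bank. The low band would then go to the balance-factor, variable-frame-length coder and the high band to another scheme.

```python
import numpy as np

def split_bands(x: np.ndarray, fs: float, cutoff_hz: float = 2000.0):
    """Split x into a low band (< cutoff_hz) and the remaining high band by
    zeroing FFT bins; low + high reproduces x up to rounding."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs < cutoff_hz, spectrum, 0.0), n=len(x))
    return low, x - low
```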
Fig. 10 is a flow chart of the main steps of an embodiment of the encoding method according to the invention. The process starts at step 200. In step 210, a main signal derived from a multi-channel signal is encoded. In step 212, coding schemes comprising subframes of different lengths and/or orders are provided. In step 214, the side signal derived from the multi-channel signal is encoded with a coding scheme selected at least partly on the basis of the actual signal content of the current multi-channel signal. The process ends in step 299.
Fig. 11 is a flow chart of the main steps of an embodiment of the decoding method according to the invention. The process starts at step 200. In step 220, the received encoded main signal is decoded. In step 222, coding schemes comprising subframes of different lengths and/or orders are provided. In step 224, the received side signal is decoded with the selected coding scheme. In step 226, the decoded main and side signals are combined into a multi-channel signal. The process ends in step 299.
The above-described embodiments are to be understood as some illustrative examples of the invention. Those skilled in the art will appreciate that various modifications, combinations, and alterations to these embodiments may be made without departing from the scope of the invention. In particular, different partial solutions in different embodiments can be combined in other solutions, as long as they are technically feasible. The scope of the invention is, however, defined by the appended claims.
References
European Patent 0497413
US Patent 5,285,498
US Patent 5,434,948
C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression", 112th AES Convention, Munich, Germany, May 2002.
Claims (26)
1. A method of encoding a multi-channel signal, comprising the steps of:
generating (210) a first output signal ($p_{mono}$), being coding parameters representing a main signal, based on signals of at least a first and a second channel (a, b; L, R); and
generating (214) a second output signal ($p_{side}$), being coding parameters representing a side signal, based on signals of at least the first and second channels (a, b; L, R) within an encoded frame (80),
the method is characterized by further comprising the following steps:
providing (212) at least two coding schemes (81), each of said at least two coding schemes (81) being characterized by a set of respective sub-frames (90) which together constitute the coding frame (80), whereby the sum of the lengths of the sub-frames (90) in each coding scheme (81) is equal to the length of the coding frame (80);
each set of subframes (90) comprises at least one subframe (90);
whereby the step of generating (214) the second output signal ($p_{side}$) comprises selecting a coding scheme (81) based at least partly on the signal content of the current side signal ($x_{side}$); and
separately encoding the second output signal ($p_{side}$) in each sub-frame (90) of the selected set of sub-frames (90).
2. A method according to claim 1, characterized in that the step of generating (214) the second output signal ($p_{side}$) comprises the successive steps of:
generating, separately in all sub-frames (90) of each of said at least two sets of sub-frames (90), encoded signals representing a side signal ($x_{side}$) of the signals of at least the first and second channels (a, b; L, R);
calculating a total fidelity measure for each of the at least two coding schemes (81); and
selecting the encoded signals from the coding scheme (81) having the best fidelity measure as the coding parameters ($p_{side}$) representing the side signal.
3. The method of claim 2, wherein the fidelity measure is based on a signal-to-noise measure.
4. A method according to any of claims 1-3, characterized in that the sub-frames (90) have lengths $l_{sf}$ according to:
$$l_{sf} = l_f / 2^n,$$
where $l_f$ is the length of the encoded frame (80) and n is an integer.
5. The method of claim 4, wherein n is less than a predetermined value.
6. The method according to claim 5, characterized in that the at least two coding schemes (81) comprise all permutations of subframe (90) length.
7. A method according to any of claims 1-6, characterized in that the step of generating (210) the coding parameters ($p_{mono}$) representing the main signal comprises the successive steps of:
creating a main signal ($x_{mono}$) as a sum of the signals of at least the first and second channels (a, b; L, R); and
encoding the main signal into the coding parameters ($p_{mono}$) representing the main signal,
and the step of encoding the side signal comprises the successive steps of:
creating a side residual signal ($x_{side\,residual}$) as the difference between the side signal and the main signal ($x_{mono}$) scaled by a balance factor ($g_{sm}$);
the balance factor ($g_{sm}$) being determined as a factor that minimizes the side residual signal according to a quality criterion; and
encoding the side residual signal and the balance factor ($g_{sm}$) into the coding parameters ($p_{side}$) representing the side signal.
8. The method of claim 7, wherein the quality criterion is based on a least mean square measurement.
9. The method according to any of claims 1-8, characterized in that the step of encoding the side signal further comprises the step of:
scaling the side signal ($x_{side}$) according to the energy contour of the main signal ($x_{mono}$).
10. Method according to claim 9, characterized in that the scaling of the side signal ($x_{side}$) comprises dividing it by a factor that is a monotonic continuous function of the energy contour of the main signal ($x_{mono}$).
11. The method of claim 10, wherein said monotonic continuous function is a square root function.
12. Method according to claim 10 or 11, characterized in that the energy contour $E_c$ of the main signal $x_{mono}$ is calculated over one sub-frame according to:
where L is an arbitrary factor, n is a summation index, m is a sample within the sub-frame and w(n) is a windowing function.
13. The method of claim 12, wherein the windowing function is a rectangular window function.
14. The method of claim 12, wherein the windowing function is a Hamming window function.
15. The method according to any of claims 1-14, characterized in that said at least two coding schemes (81) comprise different coding principles for said side signal ($x_{side}$).
16. The method according to claim 15, characterized in that at least a first coding scheme of said at least two coding schemes (81) comprises a first coding principle of said side signal ($x_{side}$) for all sub-frames (90), and at least a second coding scheme of the at least two coding schemes (81) comprises a second coding principle of the side signal ($x_{side}$) for all sub-frames (90).
17. Method according to claim 15 or 16, characterized in that at least one of said at least two coding schemes (81) comprises a first coding principle of said side signal ($x_{side}$) for one sub-frame and a second coding principle of the side signal ($x_{side}$) for another sub-frame.
18. A method according to claim 1, characterized in that the step of generating (214) the second output signal ($p_{side}$) comprises the successive steps of:
analyzing spectral features of a side signal ($x_{side}$) of the signals of at least the first and second channels (a, b; L, R);
selecting a set of sub-frames (90) based on the analyzed spectral features; and
separately encoding the side signal ($x_{side}$) within all sub-frames (90) of the selected set of sub-frames (90).
19. A method according to any of claims 1-18, characterized in that the step of generating (214) the second output signal ($p_{side}$) is applied within a limited frequency band.
20. A method according to claim 19, characterized in that the step of generating (214) the second output signal ($p_{side}$) is applied only to frequencies below 2 kHz.
21. A method according to claim 20, characterized in that the step of generating (214) the second output signal ($p_{side}$) is applied only to frequencies below 1 kHz.
22. The method according to any of claims 1-20, wherein the multi-channel signal represents a music signal.
23. A method of decoding a multi-channel signal, comprising the steps of:
decoding (220) coding parameters ($p_{mono}$) representing a main signal;
decoding (224) coding parameters ($p_{side}$) representing a side signal within an encoded frame (80); and
combining (226) at least the decoded main signal ($x''_{mono}$) and the decoded side signal ($x''_{side}$) into signals of at least a first and a second channel (a, b; L, R),
the method being characterized by the further steps of:
providing (222) at least two coding schemes (81), each of said at least two coding schemes (81) being characterized by a set of sub-frames (90) that together constitute the coding frame (80), whereby the sum of the lengths of the sub-frames (90) in each coding scheme (81) is equal to the length of the coding frame (80);
each set of sub-frames (90) comprising at least one sub-frame (90),
whereby the step of decoding (224) the coding parameters ($p_{side}$) representing the side signal comprises separately decoding the coding parameters ($p_{side}$) representing the side signal in each sub-frame (90) of one of said at least two coding schemes (81).
24. Encoder apparatus (14), comprising:
input means (16; 16A-C) for a multi-channel signal (a, b; L, R, C) comprising at least a first and a second channel (a, b; L, R);
means for generating a first output signal ($p_{mono}$) from the signals of at least said first and second channels (a, b; L, R), the first output signal being coding parameters representing a main signal;
means (30) for generating a second output signal ($p_{side}$) from the signals of at least said first and second channels (a, b; L, R) within an encoded frame (80), the second output signal being coding parameters representing a side signal; and
output means (52);
characterized by
means for providing at least two coding schemes (81), each of said at least two coding schemes (81) being characterized by a set of respective sub-frames (90) that together constitute the coding frame (80), whereby in each coding scheme (81) the sum of the lengths of the sub-frames (90) is equal to the length of said coding frame (80);
each set of sub-frames (90) comprising at least one sub-frame (90);
whereby said means (30) for generating the second output signal ($p_{side}$) comprises means (86; 87) for selecting a coding scheme based at least partly on the signal content of the current side signal ($x_{side}$); and
means for separately encoding the side signal ($x_{side}$) in each sub-frame (90) of the selected coding scheme.
25. Decoder device (24) comprising:
encoding parameter (p) for representing a host signalmono) And a coding parameter (p) representing the side signalside) An input device (54);
for decoding said coding parameters (p) representative of the main signalmono) The device (64);
encoding parameters (p) for decoding a side signal represented in an encoded frame (80)side) The device (60);
for at least decoding the main signal (x) "mono) And the decoded side signal (x "side) Is combined into at least a first and a second channel (a, b; l, R) signal means (68, 70); and
an output device (26; 26A-C),
characterized in that said coding parameters (p) for decoding the side-representative signalside) The device (60) comprising in sequence:
-means for providing at least two coding schemes (81), each of said at least two coding schemes (81) being characterized by a set of respective sub-frames (90) that together constitute the coding frame (80), whereby in each coding scheme the sum of the lengths of these sub-frames (90) is equal to the length of said coding frame (80);
each set of subframes (90) comprises at least one subframe (90); and
means for separately decoding the encoding parameters (p_side) representing the side signal in the sub-frames (90) of one of the at least two coding schemes (81).
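The decoder side of claim 25 can be sketched the same way. This is a hypothetical illustration only: it assumes each sub-frame carries a single gain applied to the decoded mono signal, and it uses the mid/side convention L = M + S, R = M - S for the combining step (68, 70); the claim specifies neither. The names `decode_side` and `upmix` are invented for the example.

```python
from typing import List, Sequence, Tuple


def decode_side(mono: Sequence[float], scheme: Sequence[int],
                gains: Sequence[float]) -> List[float]:
    """Rebuild the side signal sub-frame by sub-frame from the decoded mono signal,
    assuming (hypothetically) one gain per sub-frame with side ~ gain * mono."""
    if sum(scheme) != len(mono) or len(scheme) != len(gains):
        raise ValueError("scheme must tile the frame and carry one gain per sub-frame")
    side, start = [], 0
    for length, gain in zip(scheme, gains):
        side.extend(gain * m for m in mono[start:start + length])
        start += length
    return side


def upmix(mono: Sequence[float], side: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Assumed convention mono = (L + R) / 2, side = (L - R) / 2, hence L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mono, side)]
    right = [m - s for m, s in zip(mono, side)]
    return left, right
```

For example, `decode_side([1.0] * 8, (4, 4), [0.5, -0.5])` rebuilds a side signal of `[0.5] * 4 + [-0.5] * 4`, and `upmix` then yields a left channel that is louder in the first half of the frame and a right channel that is louder in the second.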
26. An audio system (1) comprising at least one of:
the encoder device (14) according to claim 24, and
the decoder device (24) according to claim 25.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SE03035011 | 2003-12-19 | ||
SE0303501A SE0303501D0 (en) | 2003-12-19 | 2003-12-19 | Filter-based parametric multi-channel coding |
SE04004172 | 2004-02-20 | ||
SE0400417A SE527670C2 (en) | 2003-12-19 | 2004-02-20 | Fidelity-optimized coding with variable frame length |
PCT/SE2004/001867 WO2005059899A1 (en) | 2003-12-19 | 2004-12-15 | Fidelity-optimised variable frame length encoding |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200710138487XA Division CN101118747B (en) | Fidelity-optimized pre-echo suppression encoding | 2003-12-19 | 2004-12-15 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1816847A (en) | 2006-08-09 |
CN100559465C (en) | 2009-11-11 |
Family ID: 31996354
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004800186630A Active CN100559465C (en) | Fidelity-optimized variable frame length coding | 2003-12-19 | 2004-12-15 |
CN200710138487XA Expired - Fee Related CN101118747B (en) | Fidelity-optimized pre-echo suppression encoding | 2003-12-19 | 2004-12-15 |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200710138487XA Expired - Fee Related CN101118747B (en) | Fidelity-optimized pre-echo suppression encoding | 2003-12-19 | 2004-12-15 |
Country Status (15)
Country | Link |
---|---|
EP (2) | EP1845519B1 (en) |
JP (2) | JP4335917B2 (en) |
CN (2) | CN100559465C (en) |
AT (2) | ATE371924T1 (en) |
AU (1) | AU2004298708B2 (en) |
BR (2) | BRPI0419281B1 (en) |
CA (2) | CA2527971C (en) |
DE (2) | DE602004008613T2 (en) |
HK (2) | HK1115665A1 (en) |
MX (1) | MXPA05012230A (en) |
PL (1) | PL1623411T3 (en) |
RU (2) | RU2305870C2 (en) |
SE (1) | SE527670C2 (en) |
WO (1) | WO2005059899A1 (en) |
ZA (1) | ZA200508980B (en) |
Families Citing this family (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2003244932A1 (en) * | 2002-07-12 | 2004-02-02 | Koninklijke Philips Electronics N.V. | Audio coding |
EP1905004A2 (en) | 2005-05-26 | 2008-04-02 | LG Electronics Inc. | Method of encoding and decoding an audio signal |
JP4639966B2 (en) * | 2005-05-31 | 2011-02-23 | ヤマハ株式会社 | Audio data compression method, audio data compression circuit, and audio data expansion circuit |
AU2006266655B2 (en) | 2005-06-30 | 2009-08-20 | Lg Electronics Inc. | Apparatus for encoding and decoding audio signal and method thereof |
WO2007004831A1 (en) | 2005-06-30 | 2007-01-11 | Lg Electronics Inc. | Method and apparatus for encoding and decoding an audio signal |
US8082157B2 (en) | 2005-06-30 | 2011-12-20 | Lg Electronics Inc. | Apparatus for encoding and decoding audio signal and method thereof |
US7966190B2 (en) * | 2005-07-11 | 2011-06-21 | Lg Electronics Inc. | Apparatus and method for processing an audio signal using linear prediction |
JP5108767B2 (en) | 2005-08-30 | 2012-12-26 | エルジー エレクトロニクス インコーポレイティド | Apparatus and method for encoding and decoding audio signals |
JP5173811B2 (en) | 2005-08-30 | 2013-04-03 | エルジー エレクトロニクス インコーポレイティド | Audio signal decoding method and apparatus |
US8577483B2 (en) | 2005-08-30 | 2013-11-05 | Lg Electronics, Inc. | Method for decoding an audio signal |
US7788107B2 (en) | 2005-08-30 | 2010-08-31 | Lg Electronics Inc. | Method for decoding an audio signal |
US7751485B2 (en) | 2005-10-05 | 2010-07-06 | Lg Electronics Inc. | Signal processing using pilot based coding |
US7646319B2 (en) | 2005-10-05 | 2010-01-12 | Lg Electronics Inc. | Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor |
US7696907B2 (en) | 2005-10-05 | 2010-04-13 | Lg Electronics Inc. | Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor |
ES2478004T3 (en) | 2005-10-05 | 2014-07-18 | Lg Electronics Inc. | Method and apparatus for decoding an audio signal |
US7672379B2 (en) | 2005-10-05 | 2010-03-02 | Lg Electronics Inc. | Audio signal processing, encoding, and decoding |
KR100857111B1 (en) | 2005-10-05 | 2008-09-08 | 엘지전자 주식회사 | Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor |
US7653533B2 (en) | 2005-10-24 | 2010-01-26 | Lg Electronics Inc. | Removing time delays in signal paths |
WO2007080211A1 (en) * | 2006-01-09 | 2007-07-19 | Nokia Corporation | Decoding of binaural audio signals |
US7752053B2 (en) | 2006-01-13 | 2010-07-06 | Lg Electronics Inc. | Audio signal processing using pilot based coding |
WO2007091927A1 (en) * | 2006-02-06 | 2007-08-16 | Telefonaktiebolaget Lm Ericsson (Publ) | Variable frame offset coding |
US7461106B2 (en) | 2006-09-12 | 2008-12-02 | Motorola, Inc. | Apparatus and method for low complexity combinatorial coding of signals |
US8576096B2 (en) | 2007-10-11 | 2013-11-05 | Motorola Mobility Llc | Apparatus and method for low complexity combinatorial coding of signals |
US8209190B2 (en) | 2007-10-25 | 2012-06-26 | Motorola Mobility, Inc. | Method and apparatus for generating an enhancement layer within an audio coding system |
US7889103B2 (en) | 2008-03-13 | 2011-02-15 | Motorola Mobility, Inc. | Method and apparatus for low complexity combinatorial coding of signals |
US8639519B2 (en) | 2008-04-09 | 2014-01-28 | Motorola Mobility Llc | Method and apparatus for selective signal coding based on core encoder performance |
EP2124486A1 (en) * | 2008-05-13 | 2009-11-25 | Clemens Par | Angle-dependent operating device or method for generating a pseudo-stereophonic audio signal |
BRPI0908630B1 (en) | 2008-05-23 | 2020-09-15 | Koninklijke Philips N.V. | PARAMETRIC STEREO 'UPMIX' APPLIANCE, PARAMETRIC STEREO DECODER, METHOD FOR GENERATING A LEFT SIGN AND A RIGHT SIGN FROM A MONO 'DOWNMIX' SIGN BASED ON SPATIAL PARAMETERS, AUDIO EXECUTION DEVICE, DEVICE FOR AUDIO EXECUTION. DOWNMIX 'STEREO PARAMETRIC, STEREO PARAMETRIC ENCODER, METHOD FOR GENERATING A RESIDUAL FORECAST SIGNAL FOR A DIFFERENCE SIGNAL FROM A LEFT SIGN AND A RIGHT SIGNAL BASED ON SPACE PARAMETERS, AND PRODUCT PRODUCT PRODUCTS. |
US20110137661A1 (en) * | 2008-08-08 | 2011-06-09 | Panasonic Corporation | Quantizing device, encoding device, quantizing method, and encoding method |
US8676365B2 (en) * | 2008-09-17 | 2014-03-18 | Orange | Pre-echo attenuation in a digital audio signal |
JP5309944B2 (en) | 2008-12-11 | 2013-10-09 | 富士通株式会社 | Audio decoding apparatus, method, and program |
US8200496B2 (en) | 2008-12-29 | 2012-06-12 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
US8219408B2 (en) | 2008-12-29 | 2012-07-10 | Motorola Mobility, Inc. | Audio signal decoder and method for producing a scaled reconstructed audio signal |
US8140342B2 (en) | 2008-12-29 | 2012-03-20 | Motorola Mobility, Inc. | Selective scaling mask computation based on peak detection |
US8175888B2 (en) | 2008-12-29 | 2012-05-08 | Motorola Mobility, Inc. | Enhanced layered gain factor balancing within a multiple-channel audio coding system |
WO2011013381A1 (en) * | 2009-07-31 | 2011-02-03 | パナソニック株式会社 | Coding device and decoding device |
CN102576539B (en) * | 2009-10-20 | 2016-08-03 | 松下电器(美国)知识产权公司 | Code device, communication terminal, base station apparatus and coded method |
EP2346028A1 (en) * | 2009-12-17 | 2011-07-20 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
EP2517201B1 (en) * | 2009-12-23 | 2015-11-04 | Nokia Technologies Oy | Sparse audio processing |
US8442837B2 (en) | 2009-12-31 | 2013-05-14 | Motorola Mobility Llc | Embedded speech and audio coding using a switchable model core |
US8428936B2 (en) | 2010-03-05 | 2013-04-23 | Motorola Mobility Llc | Decoder for audio signal including generic audio and speech frames |
US8423355B2 (en) | 2010-03-05 | 2013-04-16 | Motorola Mobility Llc | Encoder for audio signal including generic audio and speech frames |
EP2544466A1 (en) | 2011-07-05 | 2013-01-09 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and apparatus for decomposing a stereo recording using frequency-domain processing employing a spectral subtractor |
US9129600B2 (en) | 2012-09-26 | 2015-09-08 | Google Technology Holdings LLC | Method and apparatus for encoding an audio signal |
IN2015DN02595A (en) * | 2012-11-15 | 2015-09-11 | Ntt Docomo Inc | |
US10060955B2 (en) * | 2014-06-25 | 2018-08-28 | Advanced Micro Devices, Inc. | Calibrating power supply voltages using reference measurements from code loop executions |
ES2904275T3 (en) | 2015-09-25 | 2022-04-04 | Voiceage Corp | Method and system for decoding the left and right channels of a stereo sound signal |
CN107742521B (en) | 2016-08-10 | 2021-08-13 | 华为技术有限公司 | Coding method and coder for multi-channel signal |
CN109215668B (en) | 2017-06-30 | 2021-01-05 | 华为技术有限公司 | Method and device for encoding inter-channel phase difference parameters |
CN115831130A (en) * | 2018-06-29 | 2023-03-21 | 华为技术有限公司 | Coding method, decoding method, coding device and decoding device for stereo signal |
CN112233682B (en) * | 2019-06-29 | 2024-07-16 | 华为技术有限公司 | Stereo encoding method, stereo decoding method and device |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5434948A (en) * | 1989-06-15 | 1995-07-18 | British Telecommunications Public Limited Company | Polyphonic coding |
NL9100173A (en) * | 1991-02-01 | 1992-09-01 | Philips Nv | SUBBAND CODING DEVICE, AND A TRANSMITTER EQUIPPED WITH THE CODING DEVICE. |
US5285498A (en) * | 1992-03-02 | 1994-02-08 | At&T Bell Laboratories | Method and apparatus for coding audio signals based on perceptual model |
US5694332A (en) * | 1994-12-13 | 1997-12-02 | Lsi Logic Corporation | MPEG audio decoding system with subframe input buffering |
US5956674A (en) * | 1995-12-01 | 1999-09-21 | Digital Theater Systems, Inc. | Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels |
US5812971A (en) * | 1996-03-22 | 1998-09-22 | Lucent Technologies Inc. | Enhanced joint stereo coding method using temporal envelope shaping |
US5796842A (en) * | 1996-06-07 | 1998-08-18 | That Corporation | BTSC encoder |
US6463410B1 (en) * | 1998-10-13 | 2002-10-08 | Victor Company Of Japan, Ltd. | Audio signal processing apparatus |
US6226616B1 (en) * | 1999-06-21 | 2001-05-01 | Digital Theater Systems, Inc. | Sound quality of established low bit-rate audio coding systems without loss of decoder compatibility |
JP3335605B2 (en) * | 2000-03-13 | 2002-10-21 | 日本電信電話株式会社 | Stereo signal encoding method |
CN1244904C (en) * | 2001-05-08 | 2006-03-08 | 皇家菲利浦电子有限公司 | Audio coding |
JP2003084790A (en) * | 2001-09-17 | 2003-03-19 | Matsushita Electric Ind Co Ltd | Speech component emphasizing device |
CN1219415C (en) * | 2002-07-23 | 2005-09-14 | 华南理工大学 | 5.1 path surround sound earphone repeat signal processing method |
2004
- 2004-02-20 SE SE0400417A patent/SE527670C2/en unknown
- 2004-12-15 AU AU2004298708A patent/AU2004298708B2/en not_active Ceased
- 2004-12-15 DE DE602004008613T patent/DE602004008613T2/en active Active
- 2004-12-15 MX MXPA05012230A patent/MXPA05012230A/en active IP Right Grant
- 2004-12-15 PL PL04820553T patent/PL1623411T3/en unknown
- 2004-12-15 BR BRPI0419281-8A patent/BRPI0419281B1/en not_active IP Right Cessation
- 2004-12-15 AT AT04820553T patent/ATE371924T1/en not_active IP Right Cessation
- 2004-12-15 DE DE602004023240T patent/DE602004023240D1/en active Active
- 2004-12-15 EP EP07109801A patent/EP1845519B1/en active Active
- 2004-12-15 AT AT07109801T patent/ATE443317T1/en not_active IP Right Cessation
- 2004-12-15 RU RU2005134365/09A patent/RU2305870C2/en active
- 2004-12-15 CN CNB2004800186630A patent/CN100559465C/en active Active
- 2004-12-15 CA CA2527971A patent/CA2527971C/en active Active
- 2004-12-15 BR BRPI0410856A patent/BRPI0410856B8/en not_active IP Right Cessation
- 2004-12-15 JP JP2006518596A patent/JP4335917B2/en not_active Expired - Fee Related
- 2004-12-15 WO PCT/SE2004/001867 patent/WO2005059899A1/en active IP Right Grant
- 2004-12-15 CN CN200710138487XA patent/CN101118747B/en not_active Expired - Fee Related
- 2004-12-15 ZA ZA200508980A patent/ZA200508980B/en unknown
- 2004-12-15 EP EP04820553A patent/EP1623411B1/en not_active Ceased
- 2004-12-15 CA CA2690885A patent/CA2690885C/en active Active
2006
- 2006-11-01 HK HK08106066.8A patent/HK1115665A1/en not_active IP Right Cessation
- 2006-11-01 HK HK06112026.7A patent/HK1091585A1/en not_active IP Right Cessation
2007
- 2007-06-05 RU RU2007121143/09A patent/RU2425340C2/en active
- 2007-08-22 JP JP2007216374A patent/JP4589366B2/en not_active Expired - Fee Related
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1091585; Country of ref document: HK |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: GR; Ref document number: 1091585; Country of ref document: HK |