MX2008009186A - Complex-transform channel coding with extended-band frequency coding - Google Patents

Complex-transform channel coding with extended-band frequency coding

Info

Publication number
MX2008009186A
MX2008009186A (application MXMX/A/2008/009186A)
Authority
MX
Mexico
Prior art keywords
channel
audio
transformation
channels
encoder
Prior art date
Application number
MXMX/A/2008/009186A
Other languages
Spanish (es)
Inventor
Chen Weige
Mehrotra Sanjeev
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation filed Critical Microsoft Corporation
Publication of MX2008009186A publication Critical patent/MX2008009186A/en

Abstract

An audio encoder receives multi-channel audio data comprising a group of plural source channels and performs channel extension coding, which comprises encoding a combined channel for the group and determining plural parameters for representing individual source channels of the group as modified versions of the encoded combined channel. The encoder also performs frequency extension coding. The frequency extension coding can comprise, for example, partitioning frequency bands in the multi-channel audio data into a baseband group and an extended band group, and coding audio coefficients in the extended band group based on audio coefficients in the baseband group. The encoder also can perform other kinds of transforms. An audio decoder performs corresponding decoding and/or additional processing tasks, such as a forward complex transform.

Description

COMPLEX-TRANSFORM CHANNEL CODING WITH EXTENDED-BAND FREQUENCY CODING

BACKGROUND

Engineers use a variety of techniques to process digital audio efficiently while still maintaining the quality of the digital audio. To understand these techniques, it helps to understand how audio information is represented and processed in a computer.
I. Representation of Audio Information in a Computer

A computer processes audio information as a series of numbers representing the audio information. For example, a single number can represent an audio sample, which is an amplitude value at a particular time. Several factors affect the quality of the audio information, including sample depth, sampling rate, and channel mode.

Sample depth (or precision) indicates the range of numbers used to represent a sample. The more possible values for the sample, the higher the quality, because the number can capture more subtle variations in amplitude. For example, an 8-bit sample has 256 possible values, while a 16-bit sample has 65,536 possible values. The sampling rate (usually measured as the number of samples per second) also affects quality. The higher the sampling rate, the higher the quality, because more frequencies of sound can be represented. Some common sampling rates are 8,000, 11,025, 22,050, 32,000, 44,100, 48,000, and 96,000 samples/second.

Mono and stereo are two common channel modes for audio. In mono mode, audio information is present in one channel. In stereo mode, audio information is present in two channels, usually labeled the left and right channels. Other modes with more channels, such as 5.1 channel, 7.1 channel, or 9.1 channel surround sound (the "1" indicates a sub-woofer or low-frequency effects channel), are also possible. Table 1 shows several formats of audio with different quality levels, along with corresponding raw bit rate costs.

Table 1: Bit rates for different quality audio information. [Table not reproduced in this text.]

Surround sound audio typically has even higher raw bit rate. As Table 1 shows, the cost of high quality audio information is high bit rate. High quality audio information consumes large amounts of computer storage and transmission capacity. Companies and consumers increasingly depend on computers, however, to create, distribute, and play back high quality audio content.
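The raw bit rate costs discussed above follow directly from the three quality factors; a small helper (illustrative, not part of the patent) makes the arithmetic explicit:

```python
def raw_bitrate_bps(sample_depth_bits, sampling_rate_hz, channels):
    # Raw (uncompressed) bit rate is simply depth x rate x channel count.
    return sample_depth_bits * sampling_rate_hz * channels

# CD-quality stereo: 16-bit samples, 44,100 samples/second, 2 channels.
cd = raw_bitrate_bps(16, 44100, 2)      # 1,411,200 bits/second
telephone = raw_bitrate_bps(8, 8000, 1)  # 64,000 bits/second
```

Adding channels multiplies the cost, which is why surround sound audio has an even higher raw bit rate.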
II. Processing Audio Information in a Computer

Many computers and computer networks lack the resources to process raw digital audio. Compression (also called encoding or coding) decreases the cost of storing and transmitting audio information by converting the information into a lower bit rate form. Decompression (also called decoding) extracts a reconstructed version of the original information from the compressed form. Encoder and decoder systems include certain versions of Microsoft Corporation's Windows Media Audio ("WMA") encoder and decoder and WMA Pro encoder and decoder.

Compression can be lossless (in which quality does not suffer) or lossy (in which quality suffers but the bit rate reduction from subsequent lossless compression is more dramatic). For example, lossy compression is used to approximate the original audio information, and the approximation is then losslessly compressed. Lossless compression techniques include run-length coding, run-level coding, variable length coding, and arithmetic coding. The corresponding decompression techniques (also called entropy decoding techniques) include run-length decoding, run-level decoding, variable length decoding, and arithmetic decoding.

One goal of audio compression is to digitally represent audio signals to provide maximum perceived signal quality with the least possible amount of bits. With this goal in mind, various contemporary audio encoding systems make use of a variety of different lossy compression techniques. These lossy compression techniques typically involve perceptual modeling/weighting and quantization after a frequency transform. The corresponding decompression involves inverse quantization, inverse weighting, and inverse frequency transforms.
Frequency transform techniques convert data into a form that makes it easier to separate perceptually important information from perceptually unimportant information. The less important information can then be subjected to more lossy compression, while the more important information is preserved, so as to provide the best perceived quality for a given bit rate. A frequency transform typically receives audio samples and converts them from the time domain into data in the frequency domain, sometimes called frequency coefficients or spectral coefficients.

Perceptual modeling involves processing audio data according to a model of the human auditory system to improve the perceived quality of the reconstructed audio signal for a given bit rate. For example, an auditory model typically considers the range of human hearing and critical bands. Using the results of the perceptual modeling, an encoder shapes the distortion (for example, quantization noise) in the audio data with the goal of minimizing the audibility of the distortion for a given bit rate.

Quantization maps ranges of input values to single values, introducing irreversible loss of information but also allowing an encoder to regulate the quality and bit rate of the output.
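The mapping from ranges of input values to single values, and the corresponding reconstruction, can be sketched for the simple uniform scalar case (an illustration of the general idea, not the codec's actual quantizer):

```python
def quantize(coeffs, step):
    # Map each coefficient to an integer level; information inside one
    # step-sized range collapses to a single value (irreversible loss).
    return [round(c / step) for c in coeffs]

def inverse_quantize(levels, step):
    # Reconstruct an approximation of the original coefficients.
    return [level * step for level in levels]

coeffs = [0.9, -2.3, 4.05]
levels = quantize(coeffs, step=0.5)       # integer levels for transmission
approx = inverse_quantize(levels, 0.5)    # approximation, not the original
```

A larger step lowers the bit rate (fewer distinct levels to entropy code) at the cost of more quantization noise, which is the lever a rate controller adjusts.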
Sometimes, the encoder performs quantization in conjunction with a rate controller that adjusts the quantization to regulate bit rate and/or quality. There are various kinds of quantization, including adaptive and non-adaptive, scalar and vector, and uniform and non-uniform. Perceptual weighting can be considered a form of non-uniform quantization.

Inverse quantization and inverse weighting reconstruct the weighted, quantized frequency coefficient data to an approximation of the original frequency coefficient data. An inverse frequency transform then converts the reconstructed frequency coefficient data into reconstructed time domain audio samples.

Joint coding of audio channels involves coding information from more than one channel together to reduce bit rate. For example, mid/side coding (also called M/S coding or sum-difference coding) involves performing a matrix operation on left and right stereo channels at an encoder, and sending resulting "mid" and "side" channels (normalized sum and difference channels) to a decoder. The decoder reconstructs the actual physical channels from the "mid" and "side" channels. M/S coding is lossless, allowing perfect reconstruction if no other lossy techniques (e.g., quantization) are used in the coding process.

Intensity stereo coding is an example of a lossy joint coding technique that can be used at low bit rates. Intensity stereo coding involves summing a left and right channel at an encoder and then scaling information from the sum channel at a decoder during reconstruction of the left and right channels. Typically, intensity stereo coding is performed at higher frequencies, where the artifacts introduced by this lossy technique are less noticeable.

Given the importance of compression and decompression to media processing, it is not surprising that compression and decompression are richly developed fields.
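The mid/side matrix operation described above is simple enough to sketch directly (a minimal illustration; real codecs apply it per block or per band):

```python
def ms_encode(left, right):
    # Normalized sum ("mid") and difference ("side") channels.
    mid  = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    # Invert the matrix to recover the actual physical channels.
    left  = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

Absent quantization or other lossy steps, decoding inverts encoding exactly, which is why M/S coding by itself is lossless.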
Whatever the advantages of previous techniques and systems, however, they do not have the various advantages of the techniques and systems described here.
BRIEF DESCRIPTION OF THE INVENTION

This Brief Description is provided to introduce a selection of concepts in simplified form that are further described below in the Detailed Description. This Brief Description is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In summary, the detailed description is directed to strategies for encoding and decoding multi-channel audio. For example, an audio encoder uses one or more techniques to improve the quality and/or bit rate of multi-channel audio data. This improves the overall listening experience and makes computer systems a more compelling platform for creating, distributing, and playing back high quality multi-channel audio. The encoding and decoding strategies described here include various techniques and tools, which can be used in combination or independently.

For example, an audio encoder receives multi-channel audio data comprising a group of plural source channels. The encoder performs channel extension coding on the multi-channel audio data. The channel extension coding comprises encoding a combined channel for the group, and determining plural parameters for representing individual source channels of the group as modified versions of the encoded combined channel. The encoder also performs frequency extension coding on the multi-channel audio data.
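The channel extension coding just summarized can be sketched roughly as follows. This is a deliberately simplified illustration: the patent's parameters can be per-band, complex-valued quantities derived from a complex transform, whereas here each source channel gets a single real scale parameter matching its energy to the combined channel's energy.

```python
def encode_combined(channels):
    # Combined channel: per-sample sum of the source channels in the group.
    n = len(channels[0])
    combined = [sum(ch[i] for ch in channels) for i in range(n)]
    # One real scale parameter per source channel (simplified stand-in
    # for the patent's plural parameters).
    e_comb = sum(v * v for v in combined)
    params = [(sum(v * v for v in ch) / e_comb) ** 0.5 for ch in channels]
    return combined, params

def reconstruct_channels(combined, params):
    # Each source channel is represented as a modified (here, scaled)
    # version of the decoded combined channel.
    return [[p * v for v in combined] for p in params]
```

Only the combined channel plus a few parameters need to be coded, rather than every source channel, which is where the bit rate saving comes from.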
The frequency extension coding can comprise, for example, partitioning frequency bands in the multi-channel audio data into a baseband group and an extended band group, and coding audio coefficients in the extended band group based on audio coefficients in the baseband group.

As another example, an audio decoder receives encoded multi-channel audio data comprising channel extension coding data and frequency extension coding data. The decoder reconstructs plural audio channels using the channel extension coding data and the frequency extension coding data. The channel extension coding data comprises a combined channel for the plural audio channels and plural parameters for representing individual channels of the plural audio channels as modified versions of the combined channel.

As another example, an audio decoder receives multi-channel audio data and performs an inverse multi-channel transform, an inverse base frequency transform, frequency extension processing, and channel extension processing on the received multi-channel audio data. The decoder can perform decoding corresponding to what the encoder did, and/or additional steps such as a forward complex transform on the received data, and can perform the steps in various orders.

For various aspects described here in terms of an audio encoder, an audio decoder performs corresponding processing and decoding.

The foregoing and other objects, features, and advantages will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a generalized operating environment in conjunction with which various described embodiments can be implemented.
Figures 2, 3, 4, and 5 are block diagrams of generalized encoders and/or decoders in conjunction with which various described embodiments can be implemented.
Figure 6 is a diagram showing an illustrative tile configuration.
Figure 7 is a flow chart showing a generalized technique for multi-channel pre-processing.
Figure 8 is a flow chart showing a generalized technique for multi-channel post-processing.
Figure 9 is a flow chart showing a technique for deriving complex scale factors for combined channels in channel extension coding.
Figure 10 is a flow chart showing a technique for using complex scale factors in channel extension decoding.
Figure 11 is a diagram showing scaling of combined channel coefficients in channel reconstruction.
Figure 12 is a chart showing a graphical comparison of actual energy ratios and energy ratios interpolated from energy ratios at anchor points.
Figures 13-33 are equations and related matrix arrangements showing details of channel extension processing in some implementations.
Figure 34 is a block diagram of aspects of an encoder that performs frequency extension coding.
Figure 35 is a flow chart showing an illustrative technique for coding extended-band sub-bands.
Figure 36 is a block diagram of aspects of a decoder that performs frequency extension decoding.
Figure 37 is a block diagram of aspects of an encoder that performs channel extension coding and frequency extension coding.
Figures 38, 39, and 40 are block diagrams of aspects of decoders that perform channel extension decoding and frequency extension decoding.
Figure 41 is a diagram showing representations of displacement vectors for two audio blocks.
Figure 42 is a diagram showing an arrangement of audio blocks having anchor points for scale parameter interpolation.

DETAILED DESCRIPTION

Various techniques and tools for representing, coding, and decoding audio information are described. These techniques and tools facilitate the creation, distribution, and playback of high quality audio content, even at very low bit rates.
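As context for the detailed description that follows, the extended-band frequency coding summarized earlier (coding extended-band coefficients based on baseband coefficients) can be sketched in its simplest form. This is an illustration of the general idea only: one real scale parameter per extended band, reusing a given baseband band's coefficient shape; the patent's actual technique involves more (e.g., selecting which baseband band or noise shape to reuse).

```python
def energy(band):
    # Sum of squared coefficients in a band.
    return sum(v * v for v in band)

def encode_extended_band(ext_band, base_band):
    # One scale parameter: match the extended band's energy to that of a
    # baseband band whose coefficients will be reused at the decoder.
    return (energy(ext_band) / energy(base_band)) ** 0.5

def decode_extended_band(scale, base_band):
    # Reconstruct extended-band coefficients as a scaled copy of
    # baseband coefficients.
    return [scale * v for v in base_band]
```

Because only a scale parameter (plus a band reference) is coded for each extended band, the high-frequency region costs far fewer bits than coding its coefficients directly.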
The various techniques and tools described here can be used independently. Some of the techniques and tools can be used in combination (for example, in different phases of a combined encoding and/or decoding process).

Various techniques are described below with reference to flow charts of processing acts. The various processing acts shown in the flow charts can be consolidated into fewer acts or separated into more acts. For the sake of simplicity, the relation of acts shown in a particular flow chart to acts described elsewhere is often not shown. In many cases, the acts in a flow chart can be reordered.

Much of the detailed description addresses representing, coding, and decoding audio information. Many of the techniques and tools described here for representing, coding, and decoding audio information can also be applied to video information, still image information, or other media information sent in single or multiple channels.

I. Computing Environment

Figure 1 illustrates a generalized example of a suitable computing environment 100 in which described embodiments can be implemented. The computing environment 100 is not intended to suggest any limitation as to scope of use or functionality, as the described embodiments can be implemented in diverse general purpose or special purpose computing environments.

With reference to Figure 1, the computing environment 100 includes at least one processing unit 110 and memory 120. In Figure 1, this most basic configuration 130 is included within a dashed line. The processing unit 110 executes computer-executable instructions and can be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory 120 can be volatile memory (for example, registers, cache, RAM), non-volatile memory (for example, ROM, EEPROM, flash memory), or some combination of the two.
The memory 120 stores software 180 that implements one or more audio processing techniques and/or systems according to one or more of the described embodiments.

A computing environment can have additional features. For example, the computing environment 100 includes storage 140, one or more input devices 150, one or more output devices 160, and one or more communication connections 170. An interconnection mechanism (not shown), such as a bus, controller, or network, interconnects the components of the computing environment 100. Typically, operating system software (not shown) provides an operating environment for software executing in the computing environment 100 and coordinates activities of the components of the computing environment 100.

The storage 140 can be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CDs, DVDs, or any other medium that can be used to store information and that can be accessed within the computing environment 100. The storage 140 stores instructions for the software 180.

The input device(s) 150 can be a touch input device such as a keyboard, mouse, pen, touch screen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 100. For audio or video, the input device(s) 150 can be a microphone, sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD or DVD from which audio or video samples are read into the computing environment. The output device(s) 160 can be a display, printer, speakers, CD/DVD writer, network adapter, or another device that provides output from the computing environment 100.

The communication connection(s) 170 enable communication over a communication medium with one or more other computing entities. The communication medium conveys information such as computer-executable instructions, audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.

Embodiments can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 100, computer-readable media include memory 120, storage 140, communication media, and combinations of any of the above.

Embodiments can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules can be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules can be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like "determine", "receive", and "perform" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on the implementation.

II. Illustrative Encoders and Decoders

Figure 2 shows a first audio encoder 200 in which one or more described embodiments can be implemented. The encoder 200 is a transform-based, perceptual audio encoder. Figure 3 shows a corresponding audio decoder 300.

Figure 4 shows a second audio encoder 400 in which one or more described embodiments can be implemented. The encoder 400 again is a transform-based, perceptual audio encoder, but the encoder 400 includes additional modules, such as modules for processing multi-channel audio. Figure 5 shows a corresponding audio decoder 500.

Although the systems shown in Figures 2 to 5 are generalized, each has characteristics found in real-world systems. In any case, the relationships shown between modules within the encoders and decoders indicate flows of information in the encoders and decoders; other relationships are not shown for the sake of simplicity. Depending on the implementation and the type of compression desired, modules of an encoder or decoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with similar modules. In alternative embodiments, encoders or decoders with different modules and/or other configurations process audio data or some other type of data according to one or more described embodiments.
A. First Audio Encoder

The encoder 200 receives a time series of input audio samples 205 at some sample depth and sampling rate. The input audio samples 205 are for multi-channel audio (e.g., stereo) or mono audio. The encoder 200 compresses the audio samples 205 and multiplexes information produced by the various modules of the encoder 200 to output a bitstream 295 in a compression format such as a WMA format, a container format such as Advanced Streaming Format ("ASF"), or another compression or container format.

The frequency transformer 210 receives the audio samples 205 and converts them into data in the frequency (or spectral) domain. For example, the frequency transformer 210 splits frames of the audio samples 205 into sub-frame blocks, which can have variable size to allow variable temporal resolution. Blocks can overlap to reduce perceptible discontinuities between blocks that could otherwise be introduced by later quantization. The frequency transformer 210 applies to the blocks a time-varying Modulated Lapped Transform ("MLT"), modulated DCT ("MDCT"), some other variety of MLT or DCT, or some other type of modulated or non-modulated, overlapped or non-overlapped frequency transform, or uses sub-band or wavelet coding.
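A direct (non-fast) MDCT can be sketched as follows; this is a generic textbook formulation with a sine window, not the codec's specified transform, and real implementations use FFT-based fast algorithms:

```python
import math

def mdct(block):
    """MDCT: 2N windowed time samples -> N spectral coefficients.
    Consecutive blocks overlap by N samples (50%), which is what lets a
    lapped transform reduce blocking discontinuities between blocks."""
    n2 = len(block)
    n = n2 // 2
    # Sine window: one common choice satisfying perfect-reconstruction
    # (Princen-Bradley) conditions; the actual window may differ.
    window = [math.sin(math.pi / n2 * (i + 0.5)) for i in range(n2)]
    x = [s * w for s, w in zip(block, window)]
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(n2))
            for k in range(n)]
```

Note the transform is critically sampled overall: each block of 2N samples yields N coefficients, but blocks advance by N samples.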
The frequency transformer 210 outputs blocks of spectral coefficient data and outputs side information such as block sizes to the multiplexer ("MUX") 280.

For multi-channel audio data, the multi-channel transformer 220 can convert the multiple original, independently coded channels into jointly coded channels. Or, the multi-channel transformer 220 can pass the left and right channels through as independently coded channels. The multi-channel transformer 220 produces side information to the MUX 280 indicating the channel mode used. The encoder 200 can apply multi-channel rematrixing to a block of audio data after a multi-channel transform.

The perception modeler 230 models properties of the human auditory system to improve the perceived quality of the reconstructed audio signal for a given bit rate. The perception modeler 230 uses any of various auditory models and passes excitation pattern information or other information to the weighter 240. For example, an auditory model typically considers the range of human hearing and critical bands (e.g., Bark bands). Aside from range and critical bands, interactions between audio signals can dramatically affect perception. In addition, an auditory model can consider a variety of other factors relating to physical or neural aspects of human perception of sound.

The perception modeler 230 outputs information that the weighter 240 uses to shape noise in the audio data so as to reduce the audibility of the noise. For example, using any of various techniques, the weighter 240 generates weighting factors for quantization matrices (sometimes called masks) based upon the received information.
The weighting factors for a quantization matrix include a weight for each of multiple quantization bands in the matrix, where the quantization bands are frequency ranges of frequency coefficients. Thus, the weighting factors indicate proportions in which noise/quantization error is spread across the quantization bands, thereby controlling the spectral/temporal distribution of the noise/quantization error, with the goal of minimizing the audibility of the noise by putting more noise in bands where it is less audible, and vice versa. The weighter 240 then applies the weighting factors to the data received from the multi-channel transformer 220.

The quantizer 250 quantizes the output of the weighter 240, producing quantized coefficient data to the entropy encoder 260 and side information including quantization step size to the MUX 280. In Figure 2, the quantizer 250 is an adaptive, uniform, scalar quantizer. The quantizer 250 applies the same quantization step size to each spectral coefficient, but the quantization step size itself can change from one iteration of a quantization loop to the next to affect the bit rate of the entropy encoder 260 output. Other kinds of quantization are non-uniform, vector quantization, and/or non-adaptive quantization.

The entropy encoder 260 losslessly compresses quantized coefficient data received from the quantizer 250, for example, performing run-level coding and vector variable length coding. The entropy encoder 260 can compute the number of bits spent encoding audio information and pass this information to the rate/quality controller 270.

The controller 270 works with the quantizer 250 to regulate the bit rate and/or quality of the output of the encoder 200. The controller 270 outputs the quantization step size to the quantizer 250 with the goal of satisfying bit rate and quality constraints.
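The run-level coding mentioned for the entropy encoder 260 exploits the long runs of zeros that quantization produces; a minimal sketch (illustrative, without the subsequent variable length coding of the pairs):

```python
def run_level_encode(levels):
    # Emit (run, level) pairs: run = number of zeros before a nonzero level.
    pairs, run = [], 0
    for v in levels:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run  # trailing zero count kept so decoding is exact

def run_level_decode(pairs, trailing_zeros):
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * trailing_zeros)
    return out
```

In a real codec, the (run, level) pairs would then be mapped to variable length codes, with shorter codes for the most probable pairs.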
The encoder 200 can apply noise substitution and/or band truncation to a block of audio data.

The MUX 280 multiplexes the side information received from the other modules of the audio encoder 200 along with the entropy encoded data received from the entropy encoder 260. The MUX 280 can include a virtual buffer that stores the bitstream 295 to be output by the encoder 200.

B. First Audio Decoder

The decoder 300 receives a bitstream 305 of compressed audio information including entropy encoded data as well as side information, from which the decoder 300 reconstructs audio samples 395.

The demultiplexer ("DEMUX") 310 parses information in the bitstream 305 and sends information to the modules of the decoder 300. The DEMUX 310 includes one or more buffers to compensate for short-term variations in bit rate due to fluctuations in complexity of the audio, network jitter, and/or other factors.

The entropy decoder 320 losslessly decompresses entropy codes received from the DEMUX 310, producing quantized spectral coefficient data. The entropy decoder 320 typically applies the inverse of the entropy encoding techniques used in the encoder.

The inverse quantizer 330 receives a quantization step size from the DEMUX 310 and receives quantized spectral coefficient data from the entropy decoder 320. The inverse quantizer 330 applies the quantization step size to the quantized frequency coefficient data to partially reconstruct the frequency coefficient data, or otherwise performs inverse quantization.

From the DEMUX 310, the noise generator 340 receives information indicating which bands in a block of data are noise substituted as well as any parameters for the form of the noise. The noise generator 340 generates the patterns for the indicated bands, and passes the information to the inverse weighter 350.

The inverse weighter 350 receives the weighting factors from the DEMUX 310, patterns for any noise-substituted bands from the noise generator 340, and the partially reconstructed frequency coefficient data from the inverse quantizer 330. As necessary, the inverse weighter 350 decompresses the weighting factors. The inverse weighter 350 applies the weighting factors to the partially reconstructed frequency coefficient data for bands that were not noise substituted. The inverse weighter 350 then adds in the noise patterns received from the noise generator 340 for the noise-substituted bands.

The inverse multi-channel transformer 360 receives the reconstructed spectral coefficient data from the inverse weighter 350 and channel mode information from the DEMUX 310. If the multi-channel audio is in independently coded channels, the inverse multi-channel transformer 360 passes the channels through. If the multi-channel data is in jointly coded channels, the inverse multi-channel transformer 360 converts the data into independently coded channels.

The inverse frequency transformer 370 receives the spectral coefficient data output by the multi-channel transformer 360 as well as side information such as block sizes from the DEMUX 310. The inverse frequency transformer 370 applies the inverse of the frequency transform used in the encoder and outputs blocks of reconstructed audio samples 395.

C. Second Audio Encoder

With reference to Figure 4, the encoder 400 receives a time series of input audio samples 405 at some sample depth and sampling rate.
The input audio samples 405 are for multi-channel audio (e.g., stereo, surround) or mono audio. The encoder 400 compresses the audio samples 405 and multiplexes information produced by the various modules of the encoder 400 to output a bitstream 495 in a compression format such as a WMA Pro format, a container format such as ASF, or another compression or container format.

The encoder 400 selects between multiple encoding modes for the audio samples 405. In Figure 4, the encoder 400 switches between a mixed/pure lossless coding mode and a lossy coding mode. The lossless coding mode includes the mixed/pure lossless coder 472 and is typically used for high quality (and high bit rate) compression. The lossy coding mode includes components such as the weighter 442 and quantizer 460 and is typically used for adjustable quality (and controlled bit rate) compression. The selection decision depends on user input and other criteria.

For lossy coding of multi-channel audio data, the multi-channel pre-processor 410 optionally re-matrixes the time-domain audio samples 405. For example, the multi-channel pre-processor 410 selectively re-matrixes the audio samples 405 to drop one or more coded channels or increase inter-channel correlation in the encoder 400, while still allowing reconstruction (in some form) in the decoder 500.
The multi-channel pre-processor 410 can send side information such as instructions for multi-channel post-processing to the MUX 490.

The windowing module 420 partitions a frame of input audio samples 405 into sub-frame blocks (windows). The windows can have time-varying size and window shaping functions. When the encoder 400 uses lossy coding, variable-size windows allow variable temporal resolution. The windowing module 420 outputs blocks of partitioned data and outputs side information such as block sizes to the MUX 490.

In Figure 4, the tile configurer 422 partitions frames of multi-channel audio on a per-channel basis. The tile configurer 422 independently partitions each channel in the frame, if quality/bit rate allows. For example, this allows the tile configurer 422 to isolate transients that appear in a particular channel with smaller windows, but use larger windows for frequency resolution or compression efficiency in other channels. This can improve compression efficiency by isolating transients on a per channel basis, but additional information specifying the partitions in individual channels is needed in many cases. Windows of the same size that are co-located in time can qualify for further redundancy reduction through multi-channel transformation. Thus, the tile configurer 422 groups windows of the same size that are co-located in time as a tile.

Figure 6 shows an illustrative tile configuration 600 for a frame of 5.1 channel audio.
The tile configuration 600 includes seven tiles, numbered 0 through 6. Tile 0 includes samples from channels 0, 2, 3, and 4 and spans the first quarter of the frame. Tile 1 includes samples from channel 1 and spans the first half of the frame. Tile 2 includes samples from channel 5 and spans the entire frame. Tile 3 is like tile 0, but spans the second quarter of the frame. Tiles 4 and 6 include samples in channels 0, 2, and 3, and span the third and fourth quarters, respectively, of the frame. Finally, tile 5 includes samples from channels 1 and 4 and spans the last half of the frame. As shown, a particular tile can include windows in non-contiguous channels. The frequency transformer 430 receives the audio samples and converts them into data in the frequency domain, applying a transform as described above for the frequency transformer 210 of Figure 2. The frequency transformer 430 outputs blocks of spectral coefficient data to the weighter 442 and outputs side information such as block sizes to the MUX 490. The frequency transformer 430 outputs both the frequency coefficients and the side information to the perception modeler 440. The perception modeler 440 models properties of the human auditory system, processing the audio data according to an auditory model, generally as described above with reference to the perception modeler 230 of Figure 2. The weighter 442 generates weighting factors for quantization matrices based on the information received from the perception modeler 440, generally as described above with reference to the weighter 240 of Figure 2. The weighter 442 applies the weighting factors to the data received from the frequency transformer 430. The weighter 442 outputs side information such as the quantization matrices and channel weight factors to the MUX 490. The quantization matrices can be compressed. For multi-channel audio data, the multi-channel transformer 450 can apply a multi-channel transform to take advantage of inter-channel correlation.
For example, the multi-channel transformer 450 selectively and flexibly applies the multi-channel transform to some but not all of the channels and/or quantization bands in the tile. The multi-channel transformer 450 selectively uses pre-defined matrices or custom matrices, and applies efficient compression to the custom matrices. The multi-channel transformer 450 produces side information to the MUX 490 indicating, for example, the multi-channel transforms used and the multi-channel transformed parts of tiles. The quantizer 460 quantizes the output of the multi-channel transformer 450, producing quantized coefficient data to the entropy encoder 470 and side information including quantization step sizes to the MUX 490. In Figure 4, the quantizer 460 is an adaptive, uniform, scalar quantizer that computes a quantization factor per tile, but the quantizer 460 could instead perform some other kind of quantization. The entropy encoder 470 losslessly compresses quantized coefficient data received from the quantizer 460, generally as described with reference to the entropy encoder 260 of Figure 2. The controller 480 works with the quantizer 460 to regulate the bit rate and/or quality of the output of the encoder 400. The controller 480 outputs the quantization factors to the quantizer 460 with the goal of satisfying quality and/or bit rate constraints. The mixed/pure lossless coder 472 and associated entropy coder 474 compress audio data for the mixed/pure lossless coding mode. The encoder 400 uses the mixed/pure lossless coding mode for an entire sequence or switches between coding modes on a frame-by-frame, block-by-block, tile-by-tile, or other basis. The MUX 490 multiplexes the side information received from the other modules of the audio encoder 400 along with the entropy-coded data received from the entropy encoders 470, 474. The MUX 490 includes one or more buffers for rate control or other purposes. D.
Second Audio Decoder. With reference to Figure 5, the second audio decoder 500 receives a bitstream 505 of compressed audio information. The bitstream 505 includes entropy-coded data as well as side information from which the decoder 500 reconstructs audio samples 595. The DEMUX 510 parses information in the bitstream 505 and sends the information to the modules of the decoder 500. The DEMUX 510 includes one or more buffers to compensate for short-term variations in bit rate due to fluctuations in complexity of the audio, network jitter, and/or other factors. The entropy decoder 520 losslessly decompresses entropy codes received from the DEMUX 510, typically applying the inverse of the entropy coding techniques used in the encoder 400. When decoding data compressed in the lossy coding mode, the entropy decoder 520 produces quantized spectral coefficient data. The mixed/pure lossless decoder 522 and associated entropy decoder(s) 520 decompress losslessly encoded audio data for the mixed/pure lossless coding mode. The tile configuration decoder 530 receives and, if necessary, decodes information indicating the tile patterns for frames from the DEMUX 510. The tile pattern information may be entropy coded or otherwise parameterized. The tile configuration decoder 530 then passes the tile pattern information to various other modules of the decoder 500. The inverse multi-channel transformer 540 receives the quantized spectral coefficient data from the entropy decoder 520 as well as tile pattern information from the tile configuration decoder 530 and side information from the DEMUX 510 indicating, for example, the multi-channel transform used and the transformed parts of tiles. Using this information, the inverse multi-channel transformer 540 decompresses the transform matrix as necessary, and selectively and flexibly applies one or more inverse multi-channel transforms
to the audio data. The inverse quantizer/weighter 550 receives information such as tile and channel quantization factors as well as quantization matrices from the DEMUX 510, and receives quantized spectral coefficient data from the inverse multi-channel transformer 540. The inverse quantizer/weighter 550 decompresses the received weighting factor information as necessary, then performs the inverse quantization and weighting. The inverse frequency transformer 560 receives the spectral coefficient data output by the inverse quantizer/weighter 550 as well as side information from the DEMUX 510 and tile pattern information from the tile configuration decoder 530. The inverse frequency transformer 560 applies the inverse of the frequency transform used in the encoder and outputs blocks to the overlapper/adder 570. In addition to receiving tile pattern information from the tile configuration decoder 530, the overlapper/adder 570 receives decoded information from the inverse frequency transformer 560 and/or the mixed/pure lossless decoder 522. The overlapper/adder 570 overlaps and adds audio data as necessary and interleaves frames or other sequences of audio data encoded with different modes. The multi-channel post-processor 580 optionally re-matrixes the time-domain audio samples output by the overlapper/adder 570. For bitstream-controlled post-processing, the post-processing transform matrices vary over time and are signaled or included in the bitstream 505. III. Overview of Multi-Channel Processing. This section is an overview of some multi-channel processing techniques used in some encoders and decoders, including multi-channel pre-processing techniques, flexible multi-channel transform techniques, and multi-channel post-processing techniques. A.
Multi-Channel Pre-Processing. Some encoders perform multi-channel pre-processing on input audio samples in the time domain. In traditional encoders, when there are N source audio channels as input, the number of output channels produced by the encoder is also N. The number of coded channels may correspond one-to-one with the source channels, or the coded channels may be multi-channel transform-coded channels. When the coding complexity of the source makes compression difficult or when the encoder buffer is full, however, the encoder may alter or drop (i.e., not code) one or more of the original input audio channels or multi-channel transform-coded channels. This can be done to reduce coding complexity and improve the overall perceived quality of the audio. For quality-driven pre-processing, an encoder may perform multi-channel pre-processing in reaction to measured audio quality so as to smoothly control overall audio quality and/or channel separation. For example, an encoder may alter a multi-channel audio image to make one or more channels less critical, so that the channels are dropped in the encoder yet still reconstructed in a decoder as "phantom" or uncoded channels. This helps avoid the need for outright deletion of channels or severe quantization, either of which can have a dramatic effect on quality. An encoder can indicate to the decoder what action to take when the number of coded channels is less than the number of channels for output. Then, a multi-channel post-processing transform can be used in the decoder to create phantom channels. For example, an encoder (through a bitstream) can instruct a decoder to create a phantom center channel by averaging the decoded left and right channels. Later multi-channel transforms can exploit redundancy between averaged back left and back right channels (without post-processing), or an encoder can instruct a decoder to perform some multi-channel post-processing for the back left and back right channels. Or, an encoder can
signal a decoder to perform multi-channel post-processing for another purpose. Figure 7 shows a generalized technique 700 for multi-channel pre-processing. An encoder performs (710) multi-channel pre-processing on time-domain multi-channel audio data, producing transformed audio data in the time domain. For example, the pre-processing involves a general transform matrix with real, continuous valued elements. The general transform matrix can be chosen to artificially increase inter-channel correlation. This reduces complexity for the rest of the encoder, but at the cost of lost channel separation. The output is then fed to the rest of the encoder, which, in addition to any other processing the encoder may perform, encodes (720) the data using techniques described with reference to Figure 4 or other compression techniques, producing encoded multi-channel audio data. A syntax used by an encoder and decoder may allow description of general or pre-defined post-processing multi-channel transform matrices, which can vary or be turned on/off on a frame-to-frame basis. An encoder can use this flexibility to limit stereo/surround image impairments, trading off channel separation for better overall quality in certain circumstances by artificially increasing inter-channel correlation. Alternatively, an encoder and decoder can use another syntax for multi-channel pre- and post-processing, for example, one that allows changes in transform matrices on a basis other than frame-to-frame. B. Flexible Multi-Channel Transforms. Some encoders can perform flexible multi-channel transforms that effectively take advantage of inter-channel correlation. Corresponding decoders can perform corresponding inverse multi-channel transforms.
For example, an encoder can position a multi-channel transform after perceptual weighting (and the decoder can position the inverse multi-channel transform before inverse weighting) such that the cross-channel leaked signal is controlled, measurable, and has a spectrum like the original signal. An encoder can apply weighting factors to multi-channel audio in the frequency domain (for example, both weighting factors and per-channel quantization step modifiers) before the multi-channel transforms. An encoder can perform one or more multi-channel transforms on the weighted audio data, and quantize the multi-channel transformed audio data. A decoder can collect samples from multiple channels at a particular frequency index into a vector and perform an inverse multi-channel transform to generate the output. Subsequently, the decoder can inverse quantize and inverse weight the multi-channel audio, coloring the output of the inverse multi-channel transform with the mask(s). Thus, leakage that occurs across channels (due to quantization) can be spectrally shaped so that the audibility of the leaked signal is measurable and controllable, and the leakage from other channels into a given reconstructed channel is spectrally shaped like the original uncorrupted signal of the given channel. An encoder can group channels for multi-channel transforms to limit which channels get transformed together. For example, an encoder can determine which channels within a tile are correlated and group the correlated channels. An encoder can consider pair-wise correlations between the signals of channels, as well as correlations between bands, and/or other additional factors when grouping channels for multi-channel transformation. For example, an encoder can compute pair-wise correlations between signals in channels and then group channels accordingly. A channel that is not pair-wise correlated with any of the channels in a group may still be compatible with that group.
For channels that are incompatible with a group, an encoder can check compatibility at the band level and adjust one or more groups of channels accordingly. An encoder can identify channels that are compatible with a group in some bands but incompatible in some other bands. Turning off a transform for the incompatible bands can improve the correlation among the bands that actually get multi-channel transform coded, and thus improve coding efficiency. The channels in a channel group need not be contiguous. A single tile can include multiple channel groups, and each channel group can have a different associated multi-channel transform. After deciding which channels are compatible, an encoder can put channel group information into a bitstream. A decoder can then retrieve and process the information from the bitstream. An encoder can selectively turn multi-channel transforms on or off at the frequency band level to control which bands are transformed together. In this way, an encoder can selectively exclude bands that are not compatible from the multi-channel transforms. When a multi-channel transform is turned off for a particular band, the encoder can use the identity transform for that band, passing through the data at that band without altering it. The number of frequency bands relates to the sampling frequency of the audio data and the tile size. In general, the higher the sampling frequency or the larger the tile size, the greater the number of frequency bands. An encoder can selectively turn multi-channel transforms on or off at the frequency band level for the channels of a channel group of a tile. A decoder can retrieve band on/off information for a multi-channel transform for a channel group of a tile from a bitstream according to a particular bitstream syntax.
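As a concrete illustration of the per-band on/off behavior described above, the following sketch (assumed for illustration, not taken from the patent; the two-channel sum/difference transform and the function names are hypothetical) applies either a sum/difference transform or the identity transform to one band of two channels:

```python
def apply_band_transform(ch0_band, ch1_band, enabled):
    """Transform one frequency band of two channels, or pass it through."""
    if not enabled:
        # Identity transform: the band's coefficients pass through unaltered.
        return ch0_band[:], ch1_band[:]
    # Assumed example transform: sum/difference across the two channels.
    s = [a + b for a, b in zip(ch0_band, ch1_band)]
    d = [a - b for a, b in zip(ch0_band, ch1_band)]
    return s, d

on = apply_band_transform([1.0, 2.0], [0.5, 1.0], True)    # ([1.5, 3.0], [0.5, 1.0])
off = apply_band_transform([1.0, 2.0], [0.5, 1.0], False)  # band left unchanged
```

When `enabled` is false, the band is passed through without alteration, which is the identity-transform behavior the text describes for bands excluded from the multi-channel transform.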
An encoder can use hierarchical multi-channel transforms to limit computational complexity, especially in the decoder. With a hierarchical transform, an encoder can split an overall transform into multiple stages, reducing the computational complexity of individual stages and in some cases reducing the amount of information needed to specify the multi-channel transforms. Using this cascaded structure, an encoder can emulate the larger overall transform with smaller transforms, up to some accuracy. A decoder can then perform the corresponding hierarchical inverse transform. An encoder can combine frequency band on/off information for the multiple multi-channel transforms. A decoder can retrieve information for the multi-channel transforms for channel groups from a bitstream according to a particular bitstream syntax. An encoder can use pre-defined multi-channel transform matrices to reduce the bit rate used to specify the transform matrices. The encoder can select from among multiple available pre-defined matrix types and signal the selected matrix in the bitstream. Some types of matrices may require no additional signaling in the bitstream; others may require additional specification. A decoder can retrieve the information indicating the matrix type and (if necessary) the additional information specifying the matrix. An encoder can compute and apply quantization matrices for the channels of tiles, per-channel quantization step modifiers, and overall tile quantization factors. This allows the encoder to shape noise according to an auditory model, balance noise between channels, and control overall distortion. A corresponding decoder can decode and apply the overall tile quantization factors, per-channel quantization step modifiers, and quantization matrices for the channels of tiles, and can combine the inverse quantization and inverse weighting steps. C.
Multi-Channel Post-Processing. Some decoders perform multi-channel post-processing on reconstructed audio samples in the time domain. For example, the number of decoded channels may be less than the number of channels for output (for example, because the encoder dropped one or more input channels). If so, a multi-channel post-processing transform can be used to create one or more "phantom" channels based on actual data in the decoded channels. If the number of decoded channels equals the number of output channels, the post-processing transform can be used for arbitrary spatial rotation of the presentation, remapping output channels between speaker positions, or other spatial or special effects. If the number of decoded channels is greater than the number of output channels (for example, playing surround sound audio on stereo equipment), a post-processing transform can be used to "fold down" channels. The transform matrices for these scenarios and applications can be provided or signaled by the encoder. The decoder decodes (810) encoded multi-channel audio data, producing reconstructed time-domain multi-channel audio data.
Figure 8 shows a generalized technique 800 for multi-channel post-processing. The decoder then performs (820) multi-channel post-processing on the time-domain multi-channel audio data. When the encoder produces a number of coded channels and the decoder outputs a larger number of channels, the post-processing involves a general transform to produce the larger number of output channels from the smaller number of coded channels. For example, the decoder takes co-located (in time) samples, one from each of the reconstructed coded channels, then fills in any missing channels (i.e., the channels dropped by the encoder) with zeros. The decoder multiplies the samples by a general post-processing transform matrix. The general post-processing transform matrix can be a matrix with pre-determined elements, or it can be a general matrix with elements specified by the encoder. The encoder signals the decoder to use a pre-determined matrix (for example, with one or more flag bits), or sends the elements of a general matrix to the decoder, or the decoder can be configured to always use the same general post-processing transform matrix. For additional flexibility, the multi-channel post-processing can be turned on/off on a frame-by-frame or other basis (in which case, the decoder can use an identity matrix to leave channels unaltered). For more information on multi-channel pre-processing, multi-channel post-processing, and flexible multi-channel transforms, see U.S. Patent Application Publication No. 2004-0049379, entitled "Multi-Channel Audio Encoding and Decoding."
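The zero-filling and matrix multiplication steps of the post-processing just described can be sketched as follows (a hypothetical example; the 3 x 3 matrix that synthesizes a phantom center channel as the average of left and right is assumed for illustration, not specified by the patent):

```python
def postprocess_samples(coded, n_out, matrix):
    """Zero-fill dropped channels, then apply a post-processing matrix."""
    x = coded + [0.0] * (n_out - len(coded))   # missing channels -> zeros
    return [sum(matrix[i][j] * x[j] for j in range(n_out))
            for i in range(n_out)]

# Assumed matrix: pass decoded left/right through unchanged, and derive a
# phantom center channel as the average of the decoded left and right.
A = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
out = postprocess_samples([0.8, 0.4], 3, A)   # third entry is (0.8 + 0.4) / 2
```

Replacing `A` with an identity matrix leaves the channels unaltered, matching the on/off behavior described above.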
IV. Channel Extension Processing for Multi-Channel Audio. In a typical coding scheme for coding a multi-channel source, a time-to-frequency transform using a transform such as a modulated lapped transform ("MLT") or discrete cosine transform ("DCT") is performed at an encoder, with a corresponding inverse transform at the decoder. The MLT or DCT coefficients for some of the channels are grouped together into a channel group, and a linear transform is applied across the channels to obtain the channels that are to be coded. If the left and right channels of a stereo source are correlated, they can be coded using a sum-difference transform (also called M/S or mid/side coding). This removes correlation between the two channels, resulting in fewer bits needed to code them. At low bit rates, however, the difference channel may not be coded (resulting in loss of the stereo image), or quality may suffer from heavy quantization of both channels. The described techniques and tools provide a desirable alternative to existing joint coding schemes (e.g., mid/side coding, intensity stereo coding, etc.). Instead of coding sum and difference channels for channel groups (e.g., left/right pairs, front left/front right pairs, back left/back right pairs, or other groups), the described techniques and tools code one or more combined channels (which may be sums of channels, a principal major component after applying a de-correlating transform, or some other combined channel) along with additional parameters that describe the cross-channel correlation and power of the respective physical channels and allow reconstruction of the physical channels such that the cross-channel correlation and power of the respective physical channels are maintained. In other words, second-order statistics of the physical channels are maintained. Such processing can be referred to as channel extension processing.
For example, using complex transforms allows channel reconstruction that maintains the cross-channel correlation and power of the respective channels. For a narrowband signal approximation, maintaining second-order statistics is sufficient to provide a reconstruction that maintains the power and phase of individual channels, without sending explicit correlation coefficient information or phase information. The described techniques and tools represent uncoded channels as modified versions of coded channels. The channels to be coded can be actual, physical channels or transformed versions of physical channels (using, for example, a linear transform applied to each sample). For example, the described techniques and tools allow reconstruction of plural physical channels using one coded channel and plural parameters. In one implementation, the parameters include ratios of power (also referred to as intensity or energy) between two physical channels and a coded channel on a per-band basis. For example, to code a signal having left (L) and right (R) stereo channels, the power ratios are L/M and R/M, where M is the power of the coded channel (the "sum" or "mono" channel), L is the power of the left channel, and R is the power of the right channel. Although channel extension coding can be used for all frequency ranges, this is not required. For example, for lower frequencies an encoder can code both channels of a channel transform (for example, using sum and difference), while for higher frequencies an encoder can code the sum channel and plural parameters. The described embodiments can significantly reduce the bit rate needed to code a multi-channel source. The parameters for modifying the channels take up a small portion of the total bit rate, leaving more bit rate for coding combined channels.
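A minimal encoder-side sketch of deriving the per-band power-ratio parameters just described (the band boundaries, function names, and use of plain Python lists are assumptions for illustration, not part of the patent):

```python
def band_power(coeffs, lo, hi):
    """Power of the spectral coefficients in one band."""
    return sum(c * c for c in coeffs[lo:hi])

def power_ratio_params(left, right, bands):
    """Per-band (L/M, R/M) power ratios against the combined sum channel."""
    mono = [l + r for l, r in zip(left, right)]   # combined "sum" channel
    params = []
    for lo, hi in bands:
        m = band_power(mono, lo, hi)
        params.append((band_power(left, lo, hi) / m,
                       band_power(right, lo, hi) / m))
    return params

# Two bands over four coefficients per channel (made-up numbers).
params = power_ratio_params([1.0, 2.0, 3.0, 4.0],
                            [1.0, 0.0, 1.0, 0.0],
                            [(0, 2), (2, 4)])
# params -> [(0.625, 0.125), (0.78125, 0.03125)]
```

These per-band ratio pairs are the small side stream of parameters; the combined `mono` channel is what consumes most of the bit rate.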
For example, for a two-channel source, if coding the parameters takes 10% of the available bit rate, then 90% of the bits can be used to code the combined channel. In many cases, this is a significant savings over coding both channels, even after accounting for cross-channel dependencies. Channels can be reconstructed at a reconstructed channel/coded channel ratio other than the 2:1 ratio described above. For example, a decoder can reconstruct left and right channels and a center channel from a single coded channel. Other arrangements also are possible. In addition, the parameters can be defined in different ways. For example, the parameters can be defined on some basis other than a per-band basis. A. Complex Transforms and Scale/Shape Parameters. In described embodiments, an encoder forms a combined channel and provides parameters to a decoder for reconstruction of the channels that were used to form the combined channel. A decoder derives complex spectral coefficients (each having a real component and an imaginary component) for the combined channel using a forward complex transform. Then, to reconstruct physical channels from the combined channel, the decoder scales the complex coefficients using the parameters provided by the encoder. For example, the decoder derives scale factors from the parameters provided by the encoder and uses them to scale the complex coefficients. The combined channel is often a sum channel (sometimes referred to as a mono channel), but it may also be another combination of physical channels. The combined channel may be a difference channel (for example, the difference between left and right channels) in cases where the physical channels are out of phase and summing the channels would cause them to cancel each other out. For example, the encoder sends a sum channel for left and right physical channels, and plural parameters, which may include one or more complex parameters, to a decoder. (Complex parameters are derived in some way from one or more
complex numbers, although a complex parameter sent by an encoder (for example, a ratio involving an imaginary number and a real number) may not itself be a complex number.) The encoder can also send only real parameters from which the decoder can derive complex scale factors for scaling spectral coefficients. (The encoder typically does not use a complex transform to encode the combined channel itself. Instead, the encoder can use any of various coding techniques to encode the combined channel.) Figure 9 shows a simplified channel extension coding technique 900 performed by an encoder. At 910, the encoder forms one or more combined channels (for example, sum channels). Then, at 920, the encoder derives one or more parameters to be sent along with the combined channel to a decoder. Figure 10 shows a simplified inverse channel extension decoding technique 1000 performed by a decoder. At 1010, the decoder receives one or more parameters for one or more combined channels. Then, at 1020, the decoder scales combined channel coefficients using the parameters. For example, the decoder derives complex scale factors from the parameters and uses the scale factors to scale the coefficients. After a time-to-frequency transform at an encoder, the spectrum of each channel is usually divided into sub-bands. In the described embodiments, an encoder can determine different parameters for different frequency sub-bands, and a decoder can scale the coefficients in a band of the combined channel for the respective band in the reconstructed channel using one or more parameters provided by the encoder. In a coding arrangement where the left and right channels are to be reconstructed from one coded channel, each coefficient in the sub-band for each of the left and right channels is represented by a scaled version of a sub-band in the coded channel. For example, Figure 11 shows scaling of coefficients in a band 1110 of a combined channel 1120 during channel reconstruction. The decoder uses one or more
parameters provided by the encoder to derive scaled coefficients in corresponding sub-bands for the left channel 1230 and the right channel 1240 reconstructed by the decoder. In one implementation, each sub-band in each of the left and right channels has a scale parameter and a shape parameter. The shape parameter may be determined by the encoder and sent to the decoder, or the shape parameter may be assumed by taking spectral coefficients in the same location as those being coded. The encoder represents all the frequencies in a channel using a scaled version of the spectrum from one or more of the coded channels. A complex transform (having a real number component and an imaginary number component) is used, so that cross-channel second-order statistics of the channels can be maintained for each sub-band. Because the coded channels are a linear transform of the actual channels, parameters do not need to be sent for all of the channels. For example, if P channels are coded using N channels (where N < P), then parameters do not need to be sent for all P channels. More information on scale and shape parameters is provided below in Section V. The parameters may change over time as the power ratios between the physical channels and the combined channel change. Accordingly, the parameters for the frequency bands in a frame can be determined on a frame-by-frame basis or some other basis.
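On the decoder side, the scale parameter can be applied roughly as follows (a simplified sketch under assumed values, using real rather than complex coefficients): scaling a band of the combined channel by the square root of the power ratio reproduces the original channel's band power.

```python
import math

def reconstruct_band(mono_band, power_ratio):
    """Scale the combined channel's band so it has the target band power."""
    scale = math.sqrt(power_ratio)
    return [scale * c for c in mono_band]

mono_band = [2.0, 2.0]       # combined-channel coefficients for one band (assumed)
ratio_left = 0.625           # L/M power-ratio parameter from the encoder (assumed)
left_band = reconstruct_band(mono_band, ratio_left)

# Reconstructed band power = ratio * mono band power = 0.625 * 8 = 5.
assert abs(sum(c * c for c in left_band) - 5.0) < 1e-9
```

With complex coefficients, as the text describes, the same scaling idea is applied to complex scale factors so that phase relationships can also be controlled.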
In described embodiments, the parameters for a current band in a current frame can be coded differentially based on parameters from other frequency bands and/or other frames. The decoder performs a forward complex transform to derive the complex spectral coefficients of the combined channel. It then uses the parameters sent in the bitstream (such as power ratios and an imaginary-to-real ratio for the cross-correlation, or a normalized correlation matrix) to scale the spectral coefficients. The output of the complex scaling is sent to the post-processing filter, and the output of this filter is scaled and added to reconstruct the physical channels. Channel extension coding need not be performed for all frequency bands or for all time blocks. For example, channel extension coding can be adaptively switched on or off on a per-band basis, a per-block basis, or some other basis. In this way, an encoder can choose to perform this processing when it is efficient or otherwise beneficial to do so. The remaining bands or blocks can be processed by traditional channel decorrelation, without decorrelation, or using other methods. The achievable complex scale factors in described embodiments are limited to values within certain bounds. For example, the described embodiments encode the parameters in the log domain, and the values are bounded by the amount of cross-channel correlation between the channels. The channels that can be reconstructed from the combined channel using complex transforms are not limited to left and right channel pairs, nor are the combined channels limited to combinations of left and right channels. For example, the combined channels may represent two, three, or more physical channels. The channels reconstructed from combined channels may be groups such as back left/back right, back left/left, back right/right, left/center, right/center, and left/center/right. Other groups also are possible. All of the reconstructed channels may be reconstructed using
complex transforms, or some channels may be reconstructed using complex transforms while others are not.
B. Interpolation of Parameters. An encoder can choose anchor points at which to determine explicit parameters, and interpolate parameters between the anchor points. The amount of time between anchor points and the number of anchor points may be fixed or may vary depending on content and/or encoder-side decisions. When an anchor point is selected at a given time, the encoder can use that anchor point for all frequency bands in the spectrum. Alternatively, the encoder can select anchor points at different times for different frequency bands. Figure 12 is a graphical comparison of actual power ratios and power ratios interpolated from power ratios at anchor points. In the example shown in Figure 12, interpolation smooths variations in power ratios (for example, between anchor points 1200 and 1202, 1202 and 1204, 1204 and 1206, and 1206 and 1208), which can help avoid artifacts from frequently changing power ratios. The encoder can turn interpolation on or off, or not interpolate the parameters at all. For example, the encoder can choose to interpolate parameters when changes in power ratios are gradual over time, or turn off interpolation when parameters are not changing very much from frame to frame (for example, between anchor points 1208 and 1210 in Figure 12), or when parameters are changing so rapidly that interpolation would provide an inaccurate representation of the parameters.
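A linear interpolation of a power-ratio parameter between two anchor points can be sketched as follows (the frame indices and ratio values are made up for illustration; the patent does not prescribe this exact form):

```python
def interpolate(anchor_a, anchor_b, t):
    """Linearly interpolate between two (time, value) anchor points at time t."""
    (t0, v0), (t1, v1) = anchor_a, anchor_b
    w = (t - t0) / (t1 - t0)
    return (1 - w) * v0 + w * v1

# Power ratio 0.75 at frame 0 and 0.25 at frame 4; intermediate frames get
# smoothly interpolated values instead of explicitly coded parameters.
assert interpolate((0, 0.75), (4, 0.25), 2) == 0.5
assert interpolate((0, 0.75), (4, 0.25), 0) == 0.75
```

Only the anchor-point values need to be transmitted; the decoder regenerates the in-between values, which is what smooths the ratio curves in Figure 12.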
Detailed Explanation

A general linear channel transform can be written as Y = AX, where X is a group of L vectors of coefficients from P channels (a P x L dimensional matrix), A is a P x P channel transform matrix, and Y is the group of L transformed vectors from the P channels that are to be encoded (a P x L dimensional matrix). L (the vector dimension) is the band size for a given subframe on which the linear channel transform algorithm operates. If an encoder encodes a subgroup N of the P channels in Y, this can be expressed as Z = BX, where the vector Z is an N x L matrix, and B is an N x P matrix formed by taking the N rows of matrix A that correspond to the N channels to be encoded. Reconstruction of the N channels involves another matrix multiplication with a matrix C after coding the vector Z, to obtain W = CQ(Z), where Q represents quantization of the vector Z. Substituting for Z gives the equation W = CQ(BX). Assuming that the quantization noise is negligible, W = CBX. C can be chosen appropriately to maintain the cross-channel second-order statistics between the vector X and W. In equation form, this can be represented as WW* = CBXX*B*C* = XX*, where XX* is a symmetric P x P matrix.

Since XX* is a symmetric P x P matrix, there are P(P+1)/2 degrees of freedom in the matrix. If N >= (P+1)/2, then it may be possible to come up with a P x N matrix C satisfying the equation. If N < (P+1)/2, then more information is needed to solve this. If that is the case, complex transforms can be used to come up with other solutions that satisfy some portion of the constraint.
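The dimension bookkeeping of this transform chain can be illustrated with a minimal numpy sketch. The sum/difference matrix chosen for A is an assumption for illustration only; the actual coder works on complex transform coefficients and includes the quantization step Q, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(7)
P, L, N = 2, 64, 1            # P source channels, band size L, N coded channels

X = rng.standard_normal((P, L))    # channel coefficients, one row per channel
A = np.array([[0.5, 0.5],          # illustrative sum/difference channel transform
              [0.5, -0.5]])
Y = A @ X                          # all P transformed channels (P x L)
B = A[:N, :]                       # keep the N coded rows of A (here: the sum channel)
Z = B @ X                          # the combined channel that is actually encoded (N x L)

# Cross-channel second-order statistics of the source:
Rxx = X @ X.T                      # symmetric P x P matrix
free_params = P * (P + 1) // 2     # independent entries of a symmetric P x P matrix
```

With P = 2 there are three independent second-order statistics, which is why the two energy-ratio parameters alone can only match part of the constraint and a complex scale factor is brought in.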
For example, if X is a complex vector and C is a complex matrix, we can try to find C such that Re(CBXX*B*C*) = Re(XX*). According to this equation, for an appropriate complex matrix C the real portion of the symmetric matrix XX* is equal to the real portion of the symmetric matrix product CBXX*B*C*.

Example 1: For the case where P = 2 and N = 1, BXX*B* is simply a real scalar (a 1 x 1 matrix), denoted as a. We solve for the equation shown in Figure 13. If B0 = B1 = β (where β is some constant), then the constraint in Figure 14 holds. Solving, the values shown in Figure 15 are obtained for |C0|², |C1|², and |C0||C1|cos(θ0 − θ1). The encoder sends |C0| and |C1|. Then we can solve using the constraint shown in Figure 16. It should be clear from Figure 15 that these quantities are essentially the L/M and R/M energy ratios. The sign in the constraint shown in Figure 16 can be used to control the sign of the phase so as to match the imaginary portion of XX*. This allows solving for θ0 − θ1, but not for the actual values of θ0 and θ1. In order to solve for the exact values, another assumption is made: that the angle of the mono channel for each coefficient is preserved, as expressed in Figure 17. To maintain this, it is sufficient that |C0|sinθ0 + |C1|sinθ1 = 0, which gives the results for θ0 and θ1 shown in Figure 18.

Using the constraint shown in Figure 16, we can solve for the real and imaginary portions of the two scale factors. For example, the real portions of the two scale factors can be found by solving for |C0|cosθ0 and |C1|cosθ1, respectively, as shown in Figure 19. The imaginary portions of the two scale factors can be found by solving for |C0|sinθ0 and |C1|sinθ1, respectively, as shown in Figure 20. In this way, when the encoder sends the magnitudes of the complex scale factors, the decoder is able to reconstruct two individual channels that maintain the cross-channel second-order characteristics of the original physical channels, and the two reconstructed channels maintain the appropriate phase of the encoded channel.

Example 2: In Example 1, although the imaginary portion of the cross-channel second-order statistics is solved for (as shown in Figure 20), only the real portion is maintained at the decoder, which reconstructs from only a single mono source. However, the imaginary portion of the cross-channel second-order statistics can also be maintained if (in addition to the complex scaling) the output from the previous stage as described in Example 1 is post-processed to achieve an additional spatialization effect. The output is filtered through a linear filter, scaled, and added back to the output of the previous stage.

Suppose that in addition to the actual signal from the previous analysis (W0 and W1 for the two channels, respectively), the decoder has the effect signal available (a processed version of both channels, W0F and W1F respectively), as shown in Figure 21. Then the overall transform can be represented as shown in Figure 23, which assumes that W0F = C0Z0F and W1F = C1Z0F. It can be shown that, by following the reconstruction procedure shown in Figure 22, the decoder can maintain the second-order statistics of the original signal. The decoder takes a linear combination of the original and filtered versions of W to create a signal that maintains the second-order statistics of X.
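A decoder-side recovery of the two complex scale factors from Example 1 can be sketched as follows. The closed-form arctangent step is our own algebraic rearrangement of the two constraints named in the text (the phase difference θ0 − θ1, and the preserved mono-channel angle |C0|sinθ0 + |C1|sinθ1 = 0); it is an illustration, not the patented procedure itself.

```python
import numpy as np

def complex_scale_factors(c0_mag, c1_mag, delta):
    """Recover complex scale factors C0, C1 from transmitted magnitudes.

    c0_mag, c1_mag: magnitudes |C0|, |C1| (essentially the L/M and R/M
    energy ratios).  delta: the phase difference theta0 - theta1 implied by
    the cross-correlation.  The constraint |C0|sin(theta0) + |C1|sin(theta1)
    = 0 keeps the angle of the mono channel unchanged; substituting
    theta1 = theta0 - delta and solving gives the arctangent below.
    """
    theta0 = np.arctan2(c1_mag * np.sin(delta),
                        c0_mag + c1_mag * np.cos(delta))
    theta1 = theta0 - delta
    return c0_mag * np.exp(1j * theta0), c1_mag * np.exp(1j * theta1)

# Hypothetical transmitted values, for illustration only
C0, C1 = complex_scale_factors(0.9, 0.6, 0.5)
```

Both constraints can be checked directly on the returned factors, which is what makes the reconstruction phase-consistent with the coded mono channel.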
In Example 1, it was determined that the complex constants C0 and C1 can be chosen to match the real portion of the cross-channel second-order statistics by sending two parameters (for example, left-to-mono (L/M) and right-to-mono (R/M) energy ratios). If another parameter is sent by the encoder, then the complete cross-channel second-order statistics of a multi-channel source can be maintained. For example, the encoder can send an additional, complex parameter representing the imaginary-to-real ratio of the cross-correlation between the two channels in order to maintain the complete cross-channel second-order statistics of a two-channel source.

Suppose that the correlation matrix is given by RXX, as defined in Figure 24, where U is an orthonormal matrix of complex eigenvectors and Λ is a diagonal matrix of eigenvalues. Note that this factorization must exist for any symmetric matrix. For any achievable energy correlation matrix, the eigenvalues must also be real. This factorization allows finding a Karhunen-Loeve Transform ("KLT"). A KLT has been used to create decorrelated sources for compression. Here, we want to do the inverse operation: take decorrelated sources and create a desired correlation. The KLT of the vector X is given by U*, since U*UΛU*U = Λ, a diagonal matrix. The energy in Z is a. Therefore, if we choose the transform accordingly, and assume that W0F and W1F have the same energy as, and are decorrelated from, W0 and W1, respectively, the reconstruction procedure in Figure 23 or 22 produces the desired correlation matrix for the final output. In practice, the encoder sends the energy ratios |C0| and |C1|, and the imaginary-to-real ratio Im(X0X1*)/a. The decoder can reconstruct a normalized version of the cross-correlation matrix (as shown in Figure 25). The decoder can then calculate Λ by finding the eigenvalues and eigenvectors, arriving at the desired transform.

Due to the relationship between |C0| and |C1|, they do not have independent values. Hence, the encoder quantizes them jointly or conditionally. This applies to both Examples 1 and 2.

Other parameterizations are also possible, such as the encoder sending the decoder a normalized version of the energy matrix directly, where the normalization is by the geometric mean of the energies, as shown in Figure 26. Now the encoder can send only the first row of the matrix, which is sufficient since the product of the diagonal entries is 1. However, now the decoder scales the eigenvalues as shown in Figure 27.

Another parameterization is possible to represent U and Λ. It can be shown that U can be factored into a series of Givens rotations. Each Givens rotation can be represented by an angle. The encoder transmits the Givens rotation angles and the eigenvalues. Also, an additional arbitrary pre-rotation V can be incorporated into either parameterization and still produce the same correlation matrix, since VV* = I, where I represents the identity matrix. That is, the relation shown in Figure 28 will work for any arbitrary rotation V. For example, the decoder chooses a pre-rotation such that the amount of filtered signal that goes into each channel is the same, as represented in Figure 29. The decoder can choose the rotation such that the relationship in Figure 30 holds.

Once the matrix shown in Figure 31 is known, the decoder can reconstruct as before to obtain the channels W0 and W1. The decoder then obtains W0F and W1F (the effect signals) by applying a linear filter to W0 and W1. For example, the decoder uses an all-pass filter and can take the output at any of the filter taps to obtain the effect signals. (For more information on uses of all-pass filters, see M.R. Schroeder and B.F. Logan, "'Colorless' Artificial Reverberation," presented at the 12th Annual Meeting of the Audio Engineering Society, page 18 (1960).) The strength of the signal that is added as post-processing is given in the matrix shown in Figure 31.

The all-pass filter can be represented as a cascade of other all-pass filters. Depending on the amount of reverberation needed to accurately model the source, the output of any of the all-pass filters in the cascade can be taken. This parameter can also be sent on a band, subframe, or source basis. For example, the output of the first, second, or third stage in the all-pass filter cascade can be taken. By taking the output of the filter, scaling it, and adding it back to the original reconstruction, the decoder is able to maintain the cross-channel second-order statistics.

Although the analysis makes certain assumptions about the energy and the correlation structure in the effect signal, such assumptions are not always perfectly satisfied in practice. Additional processing and better approximation can be used to refine these assumptions. For example, if the filtered signal has more energy than desired, the filtered signal can be scaled as shown in Figure 32 so that it has the correct energy. This ensures that the energy is correctly maintained if the energy is too high. A calculation to determine whether the energy exceeds the threshold is shown in Figure 33.

Sometimes there may be cases when the signals in the two physical channels being combined are out of phase, and thus, if sum coding is used, the matrix will be singular. In such cases, the maximum norm of the matrix can be limited. This parameter (a threshold) to limit the maximum scale of the matrix can also be sent in the bit stream on a band, subframe, or source basis.

As in Example 1, the analysis in this Example assumes that B0 = B1 = β. However, the same principles of algebra can be used for any transform to obtain similar results.

V. Channel Extension Coding with Other Coding Transforms

The channel extension coding techniques and tools described in Section IV above can be used in combination with other techniques and tools. For example, an encoder can use base coding transforms, frequency extension coding transforms (e.g., extended-band perceptual similarity coding transforms) and channel extension coding transforms (frequency extension coding is described in Section V.A, below). In the encoder, these transforms can be performed in a base coding module, a frequency extension coding module separate from the base coding module, and a channel extension coding module separate from the base coding module and the frequency extension coding module. Or, the different transforms can be performed in various combinations within the same module.

A.
Review of Frequency Extension Coding

This section is a review of frequency extension coding techniques and tools used in some encoders and decoders to code higher-frequency spectral data as a function of baseband data in the spectrum (sometimes referred to as extended-band perceptual similarity frequency coding, or wide-sense perceptual similarity coding). Coding spectral coefficients for transmission in an output bit stream to a decoder can consume a relatively large portion of the available bit rate. At low bit rates, therefore, an encoder can choose to code a reduced number of coefficients by coding a baseband within the bandwidth of the spectral coefficients and representing coefficients outside the baseband as scaled and shaped versions of the baseband coefficients.

Figure 34 illustrates a generalized module 3400 that can be used in an encoder. The illustrated module 3400 receives a group of spectral coefficients 3415. At low bit rates, an encoder can choose to code a reduced number of coefficients: a baseband within the bandwidth of the spectral coefficients 3415, typically at the lower end of the spectrum. The spectral coefficients outside the baseband are referred to as "extended band" spectral coefficients. The division into baseband and extended band is performed in the baseband/extended band division section 3420. Sub-band division can also be performed (for example, for extended-band sub-bands) in this section. To avoid distortion (for example, a muffled or low-pass sound) in the reconstructed audio, the extended-band spectral coefficients are represented as shaped noise, shaped versions of other frequency components, or a combination of the two. The extended-band spectral coefficients can be divided into a number of sub-bands (for example, of 64 or 128 coefficients), which can be disjoint or overlapping. Even though the actual spectrum may be somewhat different, this extended-band coding provides a perceptual effect that is similar to the original.

The baseband/extended band division section 3420 outputs the baseband spectral coefficients 3463, the extended-band spectral coefficients, and side information (which can be compressed) describing, for example, the base bandwidth and the individual sizes and number of extended-band sub-bands.

In the example shown in Figure 34, the encoder codes the coefficients and side information (3435) in coding module 3430. An encoder can include separate entropy coders for baseband and extended-band spectral coefficients and/or use different entropy coding techniques to code the different categories of coefficients. A corresponding decoder will typically use complementary decoding techniques. (To show another possible implementation, Figure 36 shows separate decoding modules for baseband and extended-band coefficients.)

An extended-band coder can code each sub-band using two parameters. One parameter (referred to as a scale parameter) is used to represent the total energy in the band. The other parameter (referred to as a shape parameter) is used to represent the shape of the spectrum within the band.

Figure 35 shows an illustrative technique 3500 for coding each sub-band of the extended band in an extended-band coder. The extended-band coder calculates the scale parameter at 3510 and the shape parameter at 3520. Each sub-band coded by the extended-band coder can be represented as a product of a scale parameter and a shape parameter.

For example, the scale parameter can be the root-mean-square value of the coefficients within the current sub-band. This is found by taking the square root of the mean squared value of all the coefficients. The mean squared value is found by taking the sum of the squared values of all the coefficients in the sub-band and dividing by the number of coefficients.

The shape parameter can be a displacement vector that specifies a normalized version of a portion of the spectrum that has already been coded (for example, a portion of the baseband spectral coefficients coded with a baseband coder), a normalized random noise vector, or a vector for a spectral shape from a fixed codebook. A displacement vector that specifies another portion of the spectrum is useful in audio since there are typically harmonic components in tonal signals that repeat across the spectrum. The use of noise or some other fixed codebook can facilitate low-bit-rate coding of components that are not well represented in a previously coded portion of the baseband spectrum.

Some encoders allow modification of vectors to better represent the spectral data. Some possible modifications include a linear or non-linear transform of the vector, or representing the vector as a combination of two or more other original or modified vectors. In the case of a combination of vectors, the modification can involve taking one or more portions of one vector and combining them with one or more portions of other vectors. When vector modification is used, bits are sent to inform a decoder about how to form the new vector. Despite the additional bits, the modification consumes fewer bits to represent the spectral data than actual waveform coding would.

The extended-band coder need not code a separate scale factor per sub-band of the extended band. Instead, the extended-band coder can represent the scale parameters for the sub-bands as a function of frequency, such as by coding a group of coefficients of a polynomial function that yields the scale parameters of the extended sub-bands as a function of their frequency. Furthermore, the extended-band coder can code additional values characterizing the shape of an extended sub-band. For example, the extended-band coder can code values to specify a shift or stretch of the baseband portion indicated by the displacement vector. In such a case, the shape parameter is coded as a group of values (for example, specifying position, shift, and/or stretch) to better represent the shape of the extended sub-band with respect to a vector from the coded baseband, fixed codebook, or random noise vector.

The scale and shape parameters that code each sub-band of the extended band can both be vectors. For example, the extended sub-bands can be represented as a product scale(f)·shape(f) in the time domain of a filter with frequency response scale(f) and an excitation with frequency response shape(f). This coding can take the form of a linear predictive coding (LPC) filter and an excitation. The LPC filter is a low-order representation of the scale and shape of the extended sub-band, and the excitation represents the tonal and/or noise characteristics of the extended sub-band. The excitation can come from analyzing the coded baseband portion of the spectrum and identifying a portion of the coded baseband spectrum, a fixed codebook spectrum, or random noise that matches the excitation being coded. This represents the extended sub-band as a portion of the coded baseband spectrum, but the matching is done in the time domain.

Referring back to Figure 35, at 3530 the extended-band coder searches the baseband spectral coefficients for a similar band of baseband spectral coefficients having a shape similar to the current sub-band of the extended band (for example, using a least-mean-squares comparison on a normalized version of each portion of the baseband). At 3532, the extended-band coder checks whether this similar band of baseband spectral coefficients is close enough in shape to the current extended band (for example, whether the least-mean-squares value is lower than a pre-selected threshold). If so, the extended-band coder determines a vector pointing to this similar band of baseband spectral coefficients at 3534. The vector can be the starting coefficient position in the baseband. Other methods (such as checking tonality versus non-tonality) can also be used to see whether the similar band of baseband spectral coefficients is sufficiently close in shape to the current extended band.

If no sufficiently similar portion of the baseband is found, the extended-band coder then searches a fixed codebook (3540) of spectral shapes to represent the current sub-band. If one is found (3542), the extended-band coder uses its index in the codebook as the shape parameter at 3544. Otherwise, at 3550, the extended-band coder represents the shape of the current sub-band as a normalized random noise vector.
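The search at 3530/3532/3534 and the matching decoder-side copy-up can be sketched roughly as follows. The single-coefficient search step, the RMS normalization, and the error threshold value are assumptions made for illustration; they are not specified by the text.

```python
import numpy as np

def rms(v):
    """Root-mean-square value: the scale parameter described above."""
    return float(np.sqrt(np.mean(np.square(v))))

def best_displacement(baseband, subband, threshold=0.05):
    """Encoder side: find the baseband portion whose normalized shape is
    closest (least-mean-squares) to the current extended-band sub-band.
    Returns (scale, displacement); displacement is None when no candidate
    beats the threshold, in which case a codebook shape or a noise vector
    would be used instead."""
    n = len(subband)
    scale = rms(subband)
    target = subband / (scale or 1.0)
    best_err, best_d = np.inf, None
    for d in range(len(baseband) - n + 1):
        cand = baseband[d:d + n]
        cand_rms = rms(cand)
        if cand_rms == 0.0:
            continue
        err = float(np.mean((cand / cand_rms - target) ** 2))
        if err < best_err:
            best_err, best_d = err, d
    return scale, (best_d if best_err < threshold else None)

def copy_up(baseband, displacement, size, scale):
    """Decoder side: rebuild the sub-band as scale times the normalized
    baseband portion indicated by the displacement vector."""
    shape = baseband[displacement:displacement + size]
    return scale * shape / rms(shape)

base = np.sin(0.3 * np.arange(128))      # stand-in coded baseband spectrum
ext = 0.25 * base[40:72]                 # extended band shaped like a slice of it
scale, disp = best_displacement(base, ext)
rebuilt = copy_up(base, disp, len(ext), scale)
```

Because the extended band here really is a scaled slice of the baseband, the search recovers the exact displacement and the copy-up reproduces the sub-band; real spectra only match approximately, which is what the threshold test at 3532 decides.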
Alternatively, the extended-band coder can decide how the spectral coefficients are to be represented using some other decision process.

The extended-band coder can compress the scale and shape parameters (for example, using predictive coding, quantization, and/or entropy coding). For example, the scale parameter can be predictively coded based on a preceding extended sub-band. For multi-channel audio, scale parameters for sub-bands can be predicted from a preceding sub-band in the channel. Scale parameters can also be predicted across channels, from more than one other sub-band, from the baseband spectrum, or from previous audio input blocks, among other variations. The prediction choice can be made by noting which previous band (for example, within the same extended band, channel, or frame (input block)) provides the higher correlation. The extended-band coder can quantize the scale parameters using uniform or non-uniform quantization, and the resulting quantized value can be entropy coded. The extended-band coder can also use predictive coding (for example, from a preceding sub-band), quantization, and entropy coding for the shape parameters.

If sub-band sizes are variable for a given implementation, this provides the opportunity to size the sub-bands so as to improve coding efficiency. Often, sub-bands that have similar characteristics can be merged with very little effect on quality. Sub-bands with highly variable data can be better represented if the sub-band is split. However, smaller sub-bands require more sub-bands (and, typically, more bits) to represent the same spectral data than larger sub-bands. To balance these interests, an encoder can make sub-band decisions based on quality measurements and bit rate information.

A decoder demultiplexes a bit stream with baseband/extended band division and decodes the bands (for example, in a baseband decoder and an extended-band decoder) using corresponding decoding techniques. The decoder can also perform additional functions.

Figure 36 shows aspects of an audio decoder 3600 for decoding a bit stream produced by an encoder using frequency extension coding and separate coding modules for baseband data and extended-band data. In Figure 36, the baseband data and the extended-band data in the coded bit stream 3605 are decoded in the baseband decoder 3640 and the extended-band decoder 3650, respectively. The baseband decoder 3640 decodes the baseband spectral coefficients using conventional decoding of the baseband codec. The extended-band decoder 3650 decodes the extended-band data, including copying portions of the baseband spectral coefficients. The baseband and extended-band spectral coefficients are combined into a single spectrum, which is converted by the inverse transform 3680 to reconstruct the audio signal.

Section IV describes techniques for representing all frequencies in a non-coded channel using a scaled version of the spectrum of one or more coded channels. Frequency extension coding differs in that the extended-band coefficients are represented using scaled versions of the baseband coefficients. However, these techniques can be used together, such as by performing frequency extension coding on a combined channel, and in other ways as described below.

B.
Examples of Channel Extension Coding with Other Coding Transforms

Figure 37 is a diagram showing aspects of an illustrative encoder 3700 that uses a time-to-frequency (T/F) base transform 3710, a T/F frequency extension transform 3720, and a T/F channel extension transform 3730 to process multi-channel source audio 3705. (Other encoders can use different combinations of transforms, or transforms other than those shown.)

The T/F transform can be different for each of the three transforms. For the base transform, after a multi-channel transform 3712, coding 3715 comprises coding of spectral coefficients. If channel extension coding is also used, at least some frequency ranges for at least some of the multi-channel transform coded channels need not be coded. If frequency extension coding is also used, at least some frequency ranges need not be coded. For the frequency extension transform, coding 3715 comprises coding of scale and shape parameters for bands in a subframe. If channel extension coding is also used, then these parameters may not need to be sent for some frequency ranges for some of the channels. For the channel extension transform, coding 3715 comprises coding of parameters (for example, energy ratios and a complex parameter) to accurately maintain cross-channel correlation for bands in a subframe. For simplicity, coding is shown as being performed in a single coding module 3715; however, different coding tasks can be performed in different coding modules.

Figures 38, 39 and 40 are diagrams showing aspects of decoders 3800, 3900 and 4000 that decode a bit stream such as bit stream 3795 produced by illustrative encoder 3700. In the decoders 3800, 3900 and 4000, some modules (for example, entropy decoding, inverse quantization/weighting, additional post-processing) that are present in some decoders are not shown for simplicity. Also, the modules shown can in some cases be rearranged, combined, or divided in different ways. For example, although single paths are shown, the processing paths can conceptually be divided into two or more processing paths.

In decoder 3800, the base spectral coefficients are processed with an inverse base multi-channel transform 3810, inverse base T/F transform 3820, forward T/F frequency extension transform 3830, frequency extension processing 3840, inverse frequency extension T/F transform 3850, forward T/F channel extension transform 3860, channel extension processing 3870, and inverse channel extension T/F transform 3880 to produce reconstructed audio 3895. However, for practical purposes, this decoder can be undesirably complicated. Also, the channel extension transform is complex, while the other two are not. Therefore, the decoders can be adjusted as follows.

In other decoders, the T/F transform for frequency extension coding can be limited to (1) the base T/F transform, or (2) the real portion of the channel extension T/F transform. This allows configurations such as those shown in Figures 39 and 40.

In Figure 39, the decoder 3900 processes the base spectral coefficients with frequency extension processing 3910, inverse multi-channel transform 3920, inverse base T/F transform 3930, forward channel extension transform 3940, channel extension processing 3950, and inverse channel extension T/F transform 3960 to produce reconstructed audio 3995.

In Figure 40, the decoder 4000 processes the base spectral coefficients with inverse multi-channel transform 4010, inverse base T/F transform 4020, the real portion of the forward channel extension transform 4030, frequency extension processing 4040, derivation of the imaginary portion of the forward channel extension transform 4050, channel extension processing 4060, and inverse channel extension T/F transform 4070 to produce reconstructed audio 4095.

Any of these configurations can be used, and a decoder can dynamically change which configuration is used. In one implementation, the transform used for base coding and frequency extension coding is the MLT (which is the real portion of the MCLT (modulated complex lapped transform)), and the transform used for the channel extension transform is the MCLT. However, the two have different subframe sizes. Each MCLT coefficient in a subframe has a basis function that spans that subframe. Since each subframe only overlaps with the two neighboring subframes, only the MLT coefficients of the current subframe, the previous subframe, and the next subframe are needed to find the exact MCLT coefficients for a given subframe.
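The three-subframe neighbor relationship can be illustrated numerically. The matrices below are random stand-ins with a near-diagonal decay mimicking the real DST/DCT basis dot products (which a real implementation would precompute per window shape and size); the sketch only demonstrates the threshold-to-sparse idea this section relies on, under those assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 32                                   # coefficients per subframe

def bandlike(rng, n, width=3.0):
    """Stand-in mapping matrix: random entries with near-diagonal decay,
    mimicking how only basis vectors of similar frequency carry energy."""
    i, j = np.indices((n, n))
    return rng.standard_normal((n, n)) * np.exp(-((i - j) / width) ** 2)

A, B, C = (bandlike(rng, n) for _ in range(3))

def sparsify(M, frac=0.05):
    """Zero entries far below the peak magnitude, leaving a sparse matrix."""
    return np.where(np.abs(M) >= frac * np.abs(M).max(), M, 0.0)

Xprev, Xcur, Xnext = (rng.standard_normal(n) for _ in range(3))

# Exact neighbor-block combination versus the sparsified approximation
exact = A @ Xprev + B @ Xcur + C @ Xnext
approx = (sparsify(A) @ Xprev + sparsify(B) @ Xcur + sparsify(C) @ Xnext)
rel_err = float(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
```

Most entries are dropped while the result changes little, which is why computing only with the non-zero elements gives a large complexity reduction.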
The transforms can use the same transform block sizes, or the transform blocks can have different sizes for the different kinds of transforms. Different-size transform blocks in the base coding transform and the frequency extension coding transform can be desirable, such as when frequency extension coding can improve quality by acting on smaller time windows. However, changing the transform sizes between base coding, frequency extension coding and channel coding introduces significant complexity in the encoder and the decoder. Thus, sharing transform sizes between at least some of the transform types can be desirable.

As an example, if the base coding transform and the frequency extension coding transform share the same transform block size, the channel extension coding transform can have a transform block size independent of the base coding/frequency extension coding transform block size. In this example, the decoder can comprise frequency reconstruction followed by an inverse base coding transform. Then the decoder performs a forward complex transform to derive the spectral coefficients used to scale the coded, combined channel. The channel extension coding transform uses its own transform block size, independent of the other two transforms. The decoder reconstructs the physical channels in the frequency domain from the coded, combined channel (for example, a sum channel) using the derived spectral coefficients, and performs an inverse complex transform to obtain time-domain samples of the reconstructed physical channels.

As another example, if the base coding transform and the frequency extension coding transform have different transform block sizes, the channel extension coding transform can have the same transform block size as the frequency extension coding transform block size. In this example, the decoder can comprise an inverse base coding transform followed by frequency reconstruction. The decoder performs an inverse channel transform using the same transform block size as was used for frequency reconstruction. Then the decoder performs a forward transform of the complex component to derive the complex spectral coefficients.

In the forward transform, the decoder can calculate the imaginary portion of the MCLT coefficients of the channel extension transform from the real portion. For example, the decoder can calculate the imaginary portion in a current block by looking at the real portions of some bands (for example, three bands or more) of a previous block, some bands (for example, two bands) of the current block, and some bands (for example, three bands or more) of the next block.

The mapping of the real portion to an imaginary portion involves taking the dot product between the inverse modulated DCT basis and the forward modulated discrete sine transform (DST) basis vectors. Calculating the imaginary portion for a given subframe involves finding all the DST coefficients within a subframe. These can be non-zero only for DCT basis vectors of the previous subframe, the current subframe, and the next subframe. In addition, only the DCT basis vectors of approximately similar frequency to the DST coefficient being sought have significant energy. If the subframe sizes for the previous, current, and next subframes are the same, then the energy drops off significantly for frequencies different from the one for which the DST coefficient is being sought. Therefore, a low-complexity solution can be found to compute the DST coefficients for a subframe given the DCT coefficients.
Specifically, we compute XS = A·XC(-1) + B·XC(0) + C·XC(1), where XC(-1), XC(0) and XC(1) represent the DCT coefficients of the previous block, the current block and the next block, and XS represents the DST coefficients of the current block: 1) pre-compute the matrices A, B and C for the different window shapes/sizes; 2) threshold the matrices A, B, and C so that values significantly lower than the peak values are reduced to zero, which reduces them to sparse matrices; and 3) compute the matrix multiplication using only the non-zero matrix elements. In applications where complex filter banks are needed, this is a fast way to derive the imaginary portion from the real portion, or vice versa, without computing the imaginary portion directly. The decoder reconstructs the physical channels in the frequency domain from the coded, combined channel (for example, a sum channel) using the derived scale factors, and performs an inverse complex transform to obtain time-domain samples of the reconstructed physical channels. This approach results in a significant reduction in complexity compared to the brute-force approach involving an inverse DCT and a forward DST.

C. Computational Complexity Reduction in Frequency/Channel Coding

Frequency/channel coding can be done with base coding transforms, frequency coding transforms, and channel coding transforms. Switching from one transform to another on a block or frame basis can improve perceptual quality, but it is computationally expensive. In some scenarios (for example, devices with low processing power), such high complexity may not be acceptable. One solution to reduce the complexity is to force the encoder to always select the same transforms. However, this approach places a limitation on quality even for playback devices that have no power constraints.
Another solution is to allow the encoder to perform without transform limitations and to have the decoder map the frequency/channel coding parameters to the base coding transform domain if low complexity is required. If the mapping is done in an appropriate way, this second solution can achieve better quality for high-power devices and good quality for low-power devices with reasonable complexity. The mapping of the parameters from the other domains to the base transform domain can be done without extra information in the bit stream, or with additional information put in the bit stream by the encoder to improve the mapping performance.

D. Improvement of Energy Tracking of Frequency Coding in Transitions between Different Window Sizes

As indicated in section V.B, a frequency coding encoder can use base coding transforms, frequency coding transforms (e.g., transforms for extended-band perceptual similarity coding), and channel coding transforms. However, when the frequency coding switches between two different transforms, the starting point of the frequency coding may need extra attention. This is because the signal in one of the transforms, such as the base transform, is usually band-passed, with a clear pass band defined by the last coded coefficient. However, such a clear boundary, when mapped to a different transform, can become blurred.
In some implementations, the frequency encoder ensures that no signal energy is lost by carefully defining the starting point. Specifically:
1) for each band, the frequency encoder calculates the energy of the previously compressed signal (e.g., after base coding), E1;
2) for each band, the frequency encoder calculates the energy of the original signal, E2;
3) if (E2 - E1) > T, where T is a predefined threshold, the frequency encoder marks this band as the starting point;
4) the frequency encoder starts its operation here; and
5) the frequency encoder transmits the starting point to the decoder.
In this way, a frequency encoder, when switching between different transforms, detects the energy difference and transmits a starting point accordingly.

VI. Shape and Scale Parameters for Frequency Extension Coding

A. Displacement Vectors for Encoders Using Modulated DCT Coding

As mentioned in section V above, extended-band perceptual frequency coding involves determining shape parameters and scale parameters for frequency bands within time windows. Shape parameters specify a portion of a baseband (typically a lower band) that will act as the basis for encoding coefficients in an extended band (typically a band higher than the baseband). For example, the coefficients in the specified portion of the baseband can be scaled and then applied to the extended band. A displacement vector d can be used to modulate the signal of a channel at time t, as shown in Figure 41. Figure 41 shows representations of displacement vectors for two audio blocks 4100 and 4110 at times t0 and t1, respectively.
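The five-step starting-point rule described above can be sketched as follows. The band layout (bands as index ranges into a coefficient array), the function names, and the threshold handling are assumptions for illustration, not details fixed by the patent.

```python
def find_start_band(coded, original, band_edges, threshold):
    """Return the index of the first band whose original energy E2 exceeds
    the base-coded energy E1 by more than `threshold`, or None if no band
    does. `band_edges` lists coefficient indices delimiting the bands."""
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        e1 = sum(x * x for x in coded[lo:hi])     # energy after base coding
        e2 = sum(x * x for x in original[lo:hi])  # energy of original signal
        if e2 - e1 > threshold:
            return b  # frequency extension coding starts at this band
    return None
```

With this rule, the first band where the base-coded signal has lost significant energy relative to the original becomes the transmitted starting point.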
Although the example shown in Figure 41 involves concepts of frequency extension coding, this principle can be applied to other modulation schemes that are not related to frequency extension coding. In the example shown in Figure 41, audio blocks 4100 and 4110 comprise N sub-bands on the scale from 0 to N-1, with the sub-bands in each block divided into a lower-frequency baseband and a higher-frequency extended band. For audio block 4100, the displacement vector d0 is shown as the displacement between its sub-bands m0 and n0. Similarly, for audio block 4110, the displacement vector d1 is shown as the displacement between its sub-bands m1 and n1. Since the displacement vector is meant to accurately describe the shape of the extended-band coefficients, one might assume that it is desirable to allow maximum flexibility in the displacement vector. However, restricting the values of displacement vectors in some situations improves perceptual quality. For example, an encoder can choose the sub-bands m and n so that the displacement vector d always covers a fixed, even number of sub-bands. In an encoder that uses modulated discrete cosine transforms (DCT), when the number of sub-bands covered by the displacement vector d is fixed and even, better reconstruction is possible. When extended-band perceptual similarity frequency coding is performed using modulated DCTs, a cosine wave of the baseband is modulated to produce a modulated cosine wave for the extended band. If the number of sub-bands covered by the displacement vector d is even, the modulation leads to exact reconstruction. However, if the number of sub-bands covered by the displacement vector is odd, the modulation leads to distortion in the reconstructed audio. Thus, by restricting displacement vectors to cover only fixed, even numbers of sub-bands (and sacrificing some flexibility in d), better overall sound quality can be achieved by avoiding distortion in the modulated signal. Accordingly, in the example shown in Figure 41, the displacement vectors of audio blocks 4100 and 4110 each cover an even number of sub-bands.

B. Anchor Points for Scale Parameters

When the frequency coding uses windows smaller than the base encoder's, the bit rate tends to increase. This is because, while the windows are smaller, it is still important to maintain frequency resolution at a fairly high level to avoid unpleasant artifacts. Figure 42 shows a simplified arrangement of audio blocks of different sizes. The time window 4210 has a longer duration than the time windows 4212-4222, but each time window has the same number of frequency bands. The check marks in Figure 42 indicate anchor points for each frequency band. As shown in Figure 42, the number of anchor points can vary between bands, as can the temporal distances between anchor points. (For simplicity, not all windows, bands, or anchor points are shown in Figure 42.) At these anchor points, the scale parameters are determined. The scale parameters for the same bands in other time windows can then be interpolated from the parameters at the anchor points. Alternatively, the anchor points can be determined in other ways.

Having described and illustrated the principles of our invention with reference to described embodiments, it will be recognized that the described embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various general-purpose or specialized computing environments may be used with or perform operations in accordance with the teachings described herein. Elements of the described embodiments shown in software may be implemented in hardware, and vice versa. In view of the many possible embodiments to
which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

Claims (20)

1. In an audio encoder, a computer-implemented method comprising: receiving multi-channel audio data, the multi-channel audio data comprising a group of plural source channels; performing channel extension coding on the multi-channel audio data, the channel extension coding comprising encoding a combined channel for the group and determining plural parameters for representing individual source channels of the group as modified versions of the encoded combined channel; and performing frequency extension coding.

2. The method according to claim 1, wherein the frequency extension coding comprises partitioning frequency bands in the multi-channel audio data into a baseband group and an extended band group.

3. The method according to claim 2, wherein the frequency extension coding further comprises coding audio coefficients in the extended band group based on audio coefficients in the baseband group.

4. The method according to claim 1, further comprising sending the encoded combined channel and the plural parameters to an audio decoder, and sending frequency extension coding data to the audio decoder, wherein the encoded combined channel, the plural parameters, and the frequency extension coding data facilitate reconstruction in the audio decoder of at least two of the plural source channels.

5. The method according to claim 4, wherein the plural parameters comprise energy ratios for at least two source channels.

6. The method according to claim 4, wherein the plural parameters comprise a complex parameter for maintaining second-order statistics across at least two source channels.

7. The method according to claim 4, wherein the audio decoder maintains second-order statistics across at least two source channels.

8. The method according to claim 1, wherein the audio encoder comprises a base transform module, a frequency extension transform module, and a channel extension transform module.

9. The method according to claim 1, further comprising performing base coding on the multi-channel audio data.

10. The method according to claim 9, further comprising performing a multi-channel transform on the base-coded multi-channel audio data.

11. A computer-readable medium storing computer-executable instructions for causing a computer programmed thereby to perform the method according to claim 1.

12. In an audio decoder, a computer-implemented method comprising: receiving encoded multi-channel audio data, the encoded multi-channel audio data comprising channel extension coding data and frequency extension coding data; and reconstructing plural audio channels using the channel extension coding data and the frequency extension coding data, wherein the channel extension coding data comprise a combined channel for the plural audio channels and plural parameters for representing individual channels of the plural audio channels as modified versions of the combined channel.

13. A computer-readable medium storing computer-executable instructions for causing a computer programmed thereby to perform the method according to claim 12.

14. In an audio decoder, a computer-implemented method comprising: receiving multi-channel audio data; performing an inverse multi-channel transform on the received multi-channel audio data; performing an inverse base frequency-to-time transform on the received multi-channel audio data; performing frequency extension processing on the received multi-channel audio data; and performing channel extension processing on the received multi-channel audio data.

15. The method according to claim 14, wherein the frequency extension processing is performed on the received multi-channel audio data before the inverse multi-channel transform and the inverse base frequency-to-time transform.

16. The method according to claim 14, further comprising performing a forward channel extension transform and an inverse channel extension transform on the received multi-channel audio data.

17. The method according to claim 16, wherein the frequency extension processing is performed on the received multi-channel audio data after at least part of the forward channel extension transform.

18. The method according to claim 17, wherein the at least part of the forward channel extension transform is a real portion of the forward channel extension transform.

19. The method according to claim 16, wherein an imaginary portion of the forward channel extension transform is derived from a real portion of the forward channel extension transform.

20. A computer-readable medium storing computer-executable instructions for causing a computer programmed thereby to perform the method according to claim 14.
MXMX/A/2008/009186A 2006-01-20 2008-07-17 Complex-transform channel coding with extended-band frequency coding MX2008009186A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11336606 2006-01-20

Publications (1)

Publication Number Publication Date
MX2008009186A 2008-09-26
