CN104347077B

CN104347077B - A stereo encoding/decoding method

Info

Publication number: CN104347077B
Application number: CN201410573759.9A
Authority: CN
Inventors: Dou Weibei (窦维蓓); Lu Min (卢敏)
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2014-10-23
Filing date: 2014-10-23
Publication date: 2018-01-16
Anticipated expiration: 2034-10-23
Also published as: CN104347077A

Abstract

The invention discloses a stereo encoding method comprising the following steps: a time-domain signal acquisition step; a block-length selection step; a time-frequency transform step; a stereo parameter encoding step; a mono processing step; and an encapsulation step. The invention also discloses a stereo decoding method comprising the following steps: a decapsulation step; a mono decoding step; a stereo parameter decoding step; a stereo processing step; and an inverse time-frequency transform step. The encoding/decoding method of the present invention provides a compact framework that is integrated as a whole with the core mono codec and reuses the time-frequency transform step, which reduces the computation and delay caused by repeated time-frequency transforms. It also provides a more flexible coding scheme that combines inter-frame and intra-frame block combinations, which reduces the data volume of the final coding result and thus ensures higher audio quality under different stereo parameter bit-rate configurations.

Description

A stereo encoding/decoding method

Technical field

The present invention relates to the field of audio-visual technology, and in particular to a stereo encoding/decoding method.

Background art

Recording sound digitally has clear advantages over analog recording, such as strong noise immunity during transmission, a larger audio dynamic range, and no signal degradation after repeated copying. Audio files are therefore usually recorded digitally. However, as expectations for multimedia audio keep rising, the amount of data needed to record sound keeps growing. Large audio files not only occupy a wide transmission bandwidth but also require a large amount of storage space. To address this, digital audio compression is generally used to compress the audio data.

Many frequency-domain audio coding methods use time-frequency transform modules with adaptive block lengths to give the audio signal the most suitable time-frequency resolution and thereby obtain higher coding gain. For example, Advanced Audio Coding (AAC) uses a filter bank that combines 1024-point long blocks and 128-point short blocks, applying modified discrete cosine transforms (MDCT) of different lengths to stationary and transient signals, respectively.

Parametric stereo coding was subsequently proposed on this basis. In this technique, the two channels are downmixed to a single channel that is encoded by a mono encoder, and a small amount of stereo parameter information is added so that the two-channel audio can be reconstructed. Because the stereo coding module and the mono coding module use time-frequency transforms of different resolutions, the audio signal must pass through several time-frequency transforms. This increases the overall complexity of the encoding and decoding chain and also adds delay.

Therefore, to address the excessive complexity of the overall flow of existing stereo encoding/decoding methods, a new stereo encoding/decoding method that reduces this complexity is needed.

Summary of the invention

To address the excessive complexity of the overall flow of existing stereo encoding/decoding methods, the invention provides a stereo encoding method comprising the following steps:

Time-domain signal acquisition step: obtaining the paired-channel time-domain signal of the target audio;

Block-length selection step: performing block-length selection on the paired-channel time-domain signal to obtain its time-frequency transform blocks and the corresponding block-length control signal;

Time-frequency transform step: transforming the time-frequency transform blocks based on the block-length control signal to obtain the corresponding paired-channel spectral coefficient blocks;

Stereo parameter encoding step: performing stereo parameter encoding on the paired-channel spectral coefficient blocks based on the block-length control signal to obtain stereo parameter coded data blocks;

Mono processing step: obtaining mono coded data blocks from the paired-channel spectral coefficient blocks based on the block-length control signal and the stereo parameter coded data blocks;

Encapsulation step: packing the mono coded data blocks, the stereo parameter coded data blocks and the block-length control signal into a bitstream to obtain stereo coding packets.

In one embodiment, the stereo parameter encoding step comprises the following steps:

Stereo parameter extraction mode selection step: selecting a stereo parameter extraction mode according to the encoding bit rate of the stereo parameter part of the target audio and the block-length control signal, and generating the corresponding stereo parameter extraction mode flag;

Stereo parameter extraction step: extracting the corresponding stereo parameters from the paired-channel spectral coefficient blocks according to the selected stereo parameter extraction mode;

Stereo parameter quantization and encoding step: quantizing and encoding the stereo parameters to obtain quantized stereo parameters and, from them, the stereo parameter coded data blocks.

In one embodiment, the stereo parameter extraction modes include an ordinary extraction mode and a superframe extraction mode, wherein:

in the ordinary extraction mode, each mono coded data block corresponds to one stereo parameter coded data block;

in the superframe extraction mode, multiple consecutive mono coded data blocks correspond to one stereo parameter coded data block.

In one embodiment, the stereo parameter extraction mode selection step comprises the following steps:

Encoding bit-rate analysis step: selecting the stereo parameter extraction mode according to the encoding bit rate of the stereo parameter part; the ordinary extraction mode is selected when this bit rate is higher than a given value;

Block-length analysis step: when the encoding bit rate of the stereo parameter part is less than or equal to the given value, analyzing the block-length control signal and selecting the stereo parameter extraction mode according to the result of that analysis.

In one embodiment, the stereo parameter extraction mode selection step further includes a downmix energy attenuation analysis step: analyzing the downmix energy attenuation of the paired-channel spectral coefficient blocks under the different stereo parameter extraction modes, and selecting the extraction mode based on the result.

In one embodiment, the mono processing step comprises the following steps:

Spectral downmix step: downmixing the paired-channel spectral coefficient blocks based on the quantized stereo parameters to obtain the corresponding mono spectral coefficient blocks;

Mono encoding step: performing mono encoding on the mono spectral coefficient blocks based on the block-length control signal to obtain the mono coded data blocks.

In one embodiment, in the encapsulation step, the mono coded data blocks, together with their corresponding stereo parameter coded data blocks, the block-length control signal and the stereo parameter extraction mode flag, are packed into a bitstream in a defined format to obtain the stereo coding packets.

The present invention also provides a stereo decoding method comprising the following steps:

Decapsulation step: unpacking the stereo coding packet to obtain the mono coded data blocks, the stereo parameter coded data blocks, the block-length control signal and the stereo parameter extraction mode flag;

Mono decoding step: performing mono decoding on the mono coded data blocks based on the block-length control signal to obtain mono spectral coefficient blocks;

Stereo parameter decoding step: decoding the stereo parameter coded data blocks based on the block-length control signal and the stereo parameter extraction mode flag to obtain the stereo parameters corresponding to the mono spectral coefficient blocks;

Stereo processing step: performing stereo processing on the mono spectral coefficient blocks and the stereo parameters based on the block-length control signal to obtain paired-channel spectral coefficient blocks;

Inverse time-frequency transform step: performing an inverse time-frequency transform on the paired-channel spectral coefficient blocks based on the block-length control signal to obtain the paired-channel time-domain signal of the target audio.

In one embodiment, in the stereo parameter decoding step, the stereo parameter decoding mode is selected according to the block-length control signal and the stereo parameter extraction mode flag.

In one embodiment, the stereo parameter decoding modes include an ordinary decoding mode and a superframe decoding mode.

Compared with the prior art, the invention has the following advantages:

The encoding/decoding method of the present invention provides a compact framework integrated with the core mono codec and reuses the time-frequency transform step, reducing the computation and delay caused by repeated time-frequency transforms;

The encoding method of the present invention allows different extraction modes to be mixed flexibly, which reduces the data volume of the final coding result and thus ensures higher audio quality under different stereo bit-rate configurations.

Other features and advantages of the invention will be set forth in the following description; some of them will be apparent from the description or may be learned by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the steps particularly pointed out in the description, the claims and the accompanying drawings.

Brief description of the drawings

The accompanying drawings provide a further understanding of the present invention and constitute part of the description. Together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:

Fig. 1 is an encoding flowchart according to an embodiment of the invention;

Fig. 2 is a schematic diagram of time-frequency transform block division according to an embodiment of the invention;

Fig. 3 is a flowchart of stereo parameter extraction mode selection according to an embodiment of the invention;

Fig. 4 is a schematic diagram of the encoded data structure according to an embodiment of the invention;

Fig. 5 is a schematic diagram of the stereo coding packet structure according to an embodiment of the invention;

Fig. 6 is a decoding flowchart according to an embodiment of the invention.

Detailed description of embodiments

Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that practitioners can fully understand how the invention applies technical means to solve the technical problem and achieve the technical effect, and can implement the invention accordingly. It should be noted that, provided there is no conflict, the embodiments of the present invention and the features within each embodiment may be combined with one another, and the resulting technical solutions all fall within the scope of protection of the invention.

The present invention proposes a stereo encoding/decoding method. The specific encoding and decoding flow is described below with reference to the flowcharts. The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order of the steps is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

The encoding process is described first. As shown in Fig. 1, step S100, the time-domain signal acquisition step, is executed first to obtain the paired-channel time-domain signal of the target audio. The encoding method of the invention first applies a time-frequency transform to the time-domain audio signal, converting it into spectral coefficients. The paired-channel time-domain signal is a set of many signal sample pairs. Because audio is generally processed frame by frame during encoding, the whole time-domain signal is first divided into consecutive time-domain signal frames before the transform. Let M denote the frame length: M consecutive signal sample pairs form one frame.

In this embodiment, the paired channels are the left and right channels, and each coding period processes one superframe-length time-domain signal block. The superframe length is an integer multiple of the frame length. Letting N denote the superframe length parameter, one superframe-length block contains N time-domain signal frames, i.e. M × N signal sample pairs. In other words, each execution of step S100 reads M × N samples from each of the left and right channels (M × N sample pairs).
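The framing described above can be sketched as follows. This is a minimal illustration assuming NumPy arrays and a hypothetical helper name, `split_superframe`; it only shows how one coding period of M × N sample pairs maps onto N frames of M sample pairs each.

```python
import numpy as np

def split_superframe(left, right, M, N):
    """Split one superframe of stereo input into N frames of M sample pairs.

    left, right: 1-D arrays holding M*N samples each (one coding period).
    Returns an array of shape (N, 2, M): frame index, channel (0 = L, 1 = R), sample.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != (M * N,) or right.shape != (M * N,):
        raise ValueError("each channel must contain exactly M*N samples")
    pairs = np.stack([left, right])   # shape (2, M*N): the sample pairs
    frames = pairs.reshape(2, N, M)   # channel, frame, sample (frame k = samples k*M .. k*M+M-1)
    return frames.transpose(1, 0, 2)  # frame, channel, sample
```

For example, with M = 4 and N = 3, `split_superframe(np.arange(12), -np.arange(12), 4, 3)[1, 0]` holds left-channel samples 4 to 7, i.e. the second frame.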

To obtain higher coding gain, the encoding method of the invention uses an adaptive-block-length time-frequency transform to give the audio signal the most suitable time-frequency resolution. This requires dividing each time-domain signal frame into one or more time-frequency transform blocks according to the characteristics of the audio signal. Therefore, in this embodiment, step S110, the block-length selection step, is executed after step S100: block-length selection is performed on the paired-channel time-domain signal to obtain its time-frequency transform blocks and the corresponding block-length control signal.

In this embodiment, the processing of the audio signal supports both inter-frame long-block combination and intra-frame short-block combination.

Intra-frame short-block combination means that the spectral coefficient data of a frame after the transform (the data block formed by the spectral coefficients of M sample pairs) may instead be a combination of several data blocks, each formed by the spectral coefficients of m (m < M) sample pairs: for example, two data blocks of M/2 sample pairs each, or four data blocks of M/4 sample pairs each. In this case, the sub-band division of the transformed spectral coefficients and the stereo parameter extraction are both performed separately for each time-frequency transform block. The sub-band division scheme may also differ for different transform block lengths, including the number of spectral sub-bands and the number of coefficients in each sub-band.

Inter-frame long-block combination means that the spectral coefficients of 2 (or more) consecutive M-point signal frames may be combined into one coefficient matrix, provided that all of these spectral coefficients were obtained with M-point time-frequency transforms. In this case, the sub-band division of the spectral coefficients and the stereo parameter extraction are both performed on the coefficient matrix.

Based on intra-frame short-block combination and inter-frame long-block combination, in step S110 each time-domain signal frame (M sample pairs) is divided into either one time-frequency transform block (M sample pairs) or several time-frequency transform blocks (m sample pairs each). Suppose a superframe-length time-domain signal block contains 3M signal sample pairs (superframe length N = 3). Then, as shown in Fig. 2, the superframe-length time-domain signal block 210 contains signal frames 211, 212 and 213, each containing M signal sample pairs. Each signal frame (211, 212 or 213) is further divided according to the characteristics of the audio signal. First, a 1/2 downmix is applied to the time-domain signal (M signal sample pairs) of each signal frame. Denoting the left- and right-channel signals by X_L and X_R, the downmix result X_M is computed as:

$$X_M(i) = \frac{1}{2}\left(X_L(i) + X_R(i)\right), \quad i = 0, \dots, M-1 \qquad (1)$$

The resulting M downmixed samples are analyzed to obtain the block-length control signal for the time-domain signal frame. This control signal indicates the block length of each time-frequency transform block used when transforming the frame (a single M-point block, or a series of shorter blocks such as M/2, M/4 or M/8). The left- and right-channel time-domain signal frames are then divided into one or more time-frequency transform blocks according to the corresponding block-length control signal.
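A minimal sketch of the 1/2 downmix, X_M = (X_L + X_R) / 2, and of cutting a frame according to a block-length control signal is shown below. The transient analysis that actually chooses the block lengths is not specified in the text, so the control signal is simply passed in. Representing it as a list of block lengths, and the helper names `downmix_half` and `split_frame`, are assumptions for illustration.

```python
import numpy as np

def downmix_half(x_l, x_r):
    """1/2 downmix of one time-domain frame: X_M = (X_L + X_R) / 2."""
    return 0.5 * (np.asarray(x_l, dtype=float) + np.asarray(x_r, dtype=float))

def split_frame(frame, block_lengths):
    """Cut one channel's M-sample frame into time-frequency transform blocks.

    block_lengths is a hypothetical representation of the block-length
    control signal: the length of each block in time order, e.g. [M] or
    [M // 2, M // 4, M // 4]. The lengths must add up to the frame length M.
    """
    frame = np.asarray(frame)
    if sum(block_lengths) != frame.shape[-1]:
        raise ValueError("block lengths must sum to the frame length M")
    cut_points = np.cumsum(block_lengths)[:-1]  # start index of every block after the first
    return np.split(frame, cut_points, axis=-1)
```

In the flow above, the control signal would be derived from `downmix_half(x_l, x_r)`, and the same `block_lengths` would then be applied to both `x_l` and `x_r` so that the two channels share one block structure.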

The division result may be, for example, one of the time-frequency transform block division results 220, 230, 240 and 250 in Fig. 2. The division result may of course also differ from those shown in Fig. 2.

In time-frequency transform block division result 220 in Fig. 2, signal frame 211 is divided into time-frequency transform block 221 containing M/2 signal sample pairs and time-frequency transform blocks 222 and 223 each containing M/4 signal sample pairs; signal frame 212 is kept as time-frequency transform block 224 containing M signal sample pairs; and signal frame 213 is kept as time-frequency transform block 225 containing M signal sample pairs.

In division result 230 in Fig. 2, signal frame 211 is divided into time-frequency transform blocks 231 and 232, signal frame 212 into time-frequency transform blocks 233 and 234, and signal frame 213 into time-frequency transform blocks 235 and 236, each block containing M/2 signal sample pairs.

In division result 240 in Fig. 2, signal frames 211, 212 and 213 are each kept as a single time-frequency transform block (241, 242 and 243, respectively) containing M signal sample pairs.

In division result 250 in Fig. 2, signal frame 211 is kept as time-frequency transform block 251 containing M signal sample pairs; signal frame 212 is divided into time-frequency transform block 252 containing M/2 signal sample pairs and time-frequency transform blocks 253 and 254 each containing M/4 signal sample pairs; and signal frame 213 is kept as time-frequency transform block 255 containing M signal sample pairs.

Next, step S120 in Fig. 1, the time-frequency transform step, is executed: each time-frequency transform block is transformed based on the block-length control signal to obtain the corresponding paired-channel spectral coefficient block. Each time-frequency transform block yields exactly one paired-channel spectral coefficient block.

After step S120, the paired-channel spectral coefficient blocks can be encoded. In this embodiment, encoding includes the stereo parameter encoding step, in which stereo parameter encoding is performed on the paired-channel spectral coefficient blocks based on the block-length control signal to obtain stereo parameter coded data blocks. The stereo parameter encoding step includes step S140, the stereo parameter extraction step, in which stereo parameters are extracted from the paired-channel spectral coefficient blocks based on the block-length control signal.

To reduce the encoding bit rate effectively, the encoding method of the invention provides two stereo parameter extraction modes: a superframe extraction mode and an ordinary extraction mode. Step S142, the stereo parameter extraction mode selection step, must therefore be executed before step S140: the stereo parameter extraction mode is selected according to the encoding bit rate of the stereo parameter part of the target audio and the block-length control signal, and the corresponding stereo parameter extraction mode flag is generated. In this embodiment, the flag indicates whether the superframe extraction mode is used in the current coding period, and it is defined as a 1-bit superframe extraction mode identifier, sflag: sflag = 1 indicates that the superframe extraction mode is used, and sflag = 0 indicates that the ordinary extraction mode is used.

In the ordinary extraction mode, the spectral coefficient block of each of the paired channels is first divided into frequency sub-bands according to the block-length control signal, yielding several spectral coefficient vectors; stereo parameters are then extracted from the corresponding spectral coefficient vectors of the paired channels. In the ordinary mode, each paired-channel spectral coefficient block produces one set of stereo parameters.
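The text does not name the stereo parameters, so the sketch below uses the inter-channel level difference (ICLD) per sub-band purely as an illustrative stand-in, with hypothetical sub-band edges and a hypothetical function name. It shows only the structure of the ordinary mode: one paired spectral coefficient block in, one parameter set (one value per sub-band) out.

```python
import numpy as np

def icld_per_subband(spec_l, spec_r, band_edges, eps=1e-12):
    """Ordinary-mode sketch: one parameter set per paired spectral block.

    spec_l, spec_r: spectral coefficients of one paired block (length m).
    band_edges: hypothetical sub-band boundaries, e.g. [0, 4, 8, 16] for m = 16;
    a different set of edges would be used for each block length.
    Returns the level difference in dB of each sub-band (illustrative parameter).
    """
    spec_l = np.asarray(spec_l, dtype=float)
    spec_r = np.asarray(spec_r, dtype=float)
    params = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_l = np.sum(spec_l[lo:hi] ** 2)  # left-channel energy in this sub-band
        e_r = np.sum(spec_r[lo:hi] ** 2)  # right-channel energy in this sub-band
        params.append(10.0 * np.log10((e_l + eps) / (e_r + eps)))
    return np.array(params)
```

Short blocks (M/2, M/4, ...) each get their own call with their own, shorter `band_edges`, which matches the per-block extraction described for intra-frame short-block combination.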

In the superframe extraction mode, the stereo coding module processes, in each pass, a number of samples equal to an integer multiple n (2 ≤ n ≤ N) of the frame length handled by the core encoder module. First, the n M-point spectral coefficient blocks of each of the paired channels are stacked into a two-dimensional spectral coefficient matrix; next, the matrix of each channel is divided into sub-bands along the frequency dimension, yielding several spectral coefficient submatrices; finally, stereo parameters are extracted from each pair of corresponding submatrices of the paired channels. That is, in the superframe extraction mode, n paired-channel spectral coefficient blocks of M sample pairs each produce one set of stereo parameters, so the amount of stereo parameter data is greatly reduced compared with the ordinary extraction mode.

The detailed implementation of step S142 is shown in Fig. 3. Step S310, the encoding bit-rate analysis step, is executed first: the stereo parameter extraction mode is selected according to the encoding bit rate of the stereo parameter part. In this embodiment, when this bit rate is higher than a specific threshold, the ordinary extraction mode is used to extract stereo parameters from the paired-channel spectral coefficient blocks. The encoding bit rate of the stereo parameter part is usually a fixed parameter set before encoding, and the threshold is set in advance according to the actual coding conditions of the audio. In this embodiment, the threshold is 12 kbps.
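The superframe extraction mode described above can be sketched in the same way. As before, the per-sub-band level difference (ICLD) is only an illustrative stand-in for the unnamed stereo parameters, and the function name and sub-band edges are hypothetical. The key point is that the n spectra are stacked into an M × n matrix and split along frequency only, so each sub-band submatrix spans all n blocks and yields a single value.

```python
import numpy as np

def superframe_icld(blocks_l, blocks_r, band_edges, eps=1e-12):
    """Superframe-mode sketch: one parameter set for n consecutive M-point blocks.

    blocks_l, blocks_r: sequences of n spectra (each of length M) from n
    consecutive M-point transform blocks of the left and right channel.
    """
    mat_l = np.stack([np.asarray(b, dtype=float) for b in blocks_l], axis=1)  # (M, n)
    mat_r = np.stack([np.asarray(b, dtype=float) for b in blocks_r], axis=1)  # (M, n)
    if mat_l.shape != mat_r.shape or mat_l.shape[1] < 2:
        raise ValueError("need matching channels with at least two blocks")
    params = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_l = np.sum(mat_l[lo:hi, :] ** 2)  # energy of the whole left submatrix
        e_r = np.sum(mat_r[lo:hi, :] ** 2)  # energy of the whole right submatrix
        params.append(10.0 * np.log10((e_l + eps) / (e_r + eps)))
    return np.array(params)
```

With three M-point blocks and two sub-bands, this returns 2 values, whereas running the ordinary-mode extraction on each block separately would produce 3 × 2 = 6 values; that difference is the data reduction the superframe mode aims for.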

When the encoding bit rate of the stereo parameter part is less than or equal to this threshold, step S320, the block-length analysis step, is executed: the block-length control signal is analyzed, and the stereo parameter extraction mode is selected according to the result.

As described above for superframe processing, the encoder reads one superframe-length time-domain signal (M × N signal sample pairs) at a time. After the time-frequency transform, the superframe extraction mode can be used whenever there are n (n ≥ 2) consecutive M-point paired-channel spectral coefficient blocks. In other words, if the division of the time-domain signal frames within one superframe contains n consecutive M-point time-frequency transform blocks, stereo parameters can be extracted from the corresponding paired-channel spectral coefficient blocks in the superframe extraction mode, yielding one shared set of stereo parameters. It follows that when a superframe-length time-domain signal block contains only one signal frame (N = 1), the superframe extraction mode cannot be used.

In this embodiment, the block-length analysis condition is defined as the presence of n consecutive M-point block-length control signals. When this condition is not met, i.e. when the M-point time-frequency transform blocks are not consecutive or a time-frequency transform block is shorter than M, step S301 is executed and the ordinary extraction mode is selected for those time-frequency transform blocks.
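Steps S310 and S320 can be sketched as a per-block labelling pass over one superframe. This is a minimal illustration under stated assumptions: the block-length control signals of the superframe are flattened into one list of block lengths in time order, the function name `select_modes` is hypothetical, and the optional energy check of step S330 is not modelled (it is treated as always passing).

```python
def select_modes(block_lengths, M, bitrate_kbps, threshold_kbps=12.0):
    """Label every transform block of one superframe "ordinary" or "superframe".

    block_lengths: lengths of all transform blocks in the superframe, in time order.
    """
    modes = ["ordinary"] * len(block_lengths)
    if bitrate_kbps > threshold_kbps:     # S310: parameter bit rate above threshold
        return modes
    i = 0
    while i < len(block_lengths):         # S320: look for runs of M-point blocks
        if block_lengths[i] != M:
            i += 1                        # shorter block: stays ordinary
            continue
        j = i
        while j < len(block_lengths) and block_lengths[j] == M:
            j += 1                        # extend the run of consecutive M-point blocks
        if j - i >= 2:                    # n >= 2 consecutive M-point blocks
            modes[i:j] = ["superframe"] * (j - i)
        i = j
    return modes
```

With M = 8 and a bit rate of 10 kbps, division result 220 of Fig. 2 corresponds to `[4, 2, 2, 8, 8]` and gives three ordinary blocks followed by two superframe blocks, while division result 250, `[8, 4, 2, 2, 8]`, stays entirely ordinary because its two M-point blocks are not consecutive.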

Because the ordinary extraction mode and the superframe extraction mode produce different downmix results for the paired channels (i.e. different mono spectral coefficients), in this embodiment the stereo parameter extraction mode selection step also includes a downmix energy attenuation analysis step (step S330) to obtain better coding results: the downmix energy attenuation of the paired-channel spectral coefficient blocks under the different extraction modes is analyzed, and the extraction mode is selected based on the result.

In this embodiment, step S330 is executed when step S320 finds that the block-length analysis condition is met, i.e. when there are n (2 ≤ n ≤ N) consecutive M-point paired-channel spectral coefficient blocks. These blocks are first downmixed under both the superframe extraction mode and the ordinary extraction mode, and whether to use the superframe extraction mode is decided from the energy attenuation of the two downmixes. The specific steps are as follows:

Suppose the n (2 ≤ n ≤ N) consecutive M-point paired-channel spectral coefficient blocks are denoted $L(i,j)$ and $R(i,j)$, where $i = 0, \dots, M-1$ and $j = 0, \dots, n-1$. Denote the downmix spectral coefficient blocks obtained in the superframe extraction mode and in the ordinary extraction mode by $S(i,j)$ and $\hat{S}(i,j)$, respectively. Their downmix energies are

$$E_S = \sum_{i,j} S^2(i,j), \qquad \hat{E}_S = \sum_{i,j} \hat{S}^2(i,j) \qquad (2)$$

and the corresponding energy losses are $E_D$ and $\hat{E}_D$ (3),

where $E_S$ is the downmix energy for the superframe extraction mode, $\hat{E}_S$ the downmix energy for the ordinary extraction mode, $E_D$ the energy loss for the superframe extraction mode, and $\hat{E}_D$ the energy loss for the ordinary extraction mode.

If either of the following conditions is met, the n spectral coefficient blocks use the superframe extraction mode; otherwise, the ordinary extraction mode is used.

a) $E_D / E_S < T_1$ (in theory, the threshold $T_1$ may be any value between 0 and 0.5; 0.05 is used in this embodiment);

b) the condition on threshold $T_2$ (in theory, $T_2$ may be any value greater than 1.0; 5.0 is used in this embodiment).
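A minimal sketch of the decision in step S330 is shown below, assuming NumPy and hypothetical function names. Only the downmix energy $E_S = \sum_{i,j} S^2(i,j)$ and condition a) are implemented directly; the energy loss $E_D$ and the outcome of condition b) are taken as inputs computed elsewhere, because their formulas are not given here.

```python
import numpy as np

def downmix_energy(S):
    """Downmix energy E_S: the sum over i, j of S(i, j)^2."""
    return float(np.sum(np.asarray(S, dtype=float) ** 2))

def prefer_superframe(S, E_D, cond_b=False, T1=0.05):
    """Return True if the n blocks should use the superframe extraction mode.

    S:      M x n downmix spectrum S(i, j) obtained in the superframe mode.
    E_D:    energy loss of the superframe mode, supplied by the caller.
    cond_b: result of condition b) (threshold T2), supplied by the caller.
    """
    E_S = downmix_energy(S)
    cond_a = E_S > 0 and E_D / E_S < T1   # condition a): E_D / E_S < T1
    return cond_a or cond_b
```

For instance, an all-ones 4 × 2 downmix has $E_S = 8$, so an energy loss of 0.2 gives a ratio of 0.025, below the embodiment's $T_1 = 0.05$, and the superframe mode is kept.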

It should be noted that the purpose of step S330 is to obtain better coding quality; it may be omitted in view of computational cost or other practical considerations. That is, in another embodiment of the invention, the stereo parameter extraction mode selection step includes only the encoding bit-rate analysis step (S310) and the block-length analysis step (S320).

Next, the time-frequency transform block division results 220, 230, 240 and 250 shown in Fig. 2 are used as examples to describe the outcome of the stereo parameter extraction mode selection step. Assume that the encoding bit rate of the stereo parameter part does not exceed the threshold and that every downmix energy attenuation analysis satisfies the conditions (or, equivalently, that the downmix energy attenuation analysis step is omitted).

For division result 220, stereo parameters are extracted in the ordinary extraction mode from the paired-channel spectral coefficient blocks generated by time-frequency transform blocks 221, 222 and 223, and in the superframe extraction mode from those generated by time-frequency transform blocks 224 and 225.

For division result 230, stereo parameters are extracted in the ordinary extraction mode from the paired-channel spectral coefficient blocks generated by time-frequency transform blocks 231, 232, 233, 234, 235 and 236.

For division result 240, stereo parameters are extracted in the superframe extraction mode from the paired-channel spectral coefficient blocks generated by time-frequency transform blocks 241, 242 and 243.

For division result 250, stereo parameters are extracted in the ordinary extraction mode from the paired-channel spectral coefficient blocks generated by time-frequency transform blocks 251, 252, 253, 254 and 255.

After the stereo parameters are extracted, step S141 in Fig. 1, the quantization and encoding step, is executed: the stereo parameters are quantized and encoded to obtain a set of quantized stereo parameters and, from them, the stereo parameter coded data block.

In this embodiment, encoding also includes the mono processing step, in which mono coded data blocks are obtained from the paired-channel spectral coefficient blocks and the quantized stereo parameters based on the block-length control signal. In this step, step S130, the spectral downmix step, is executed first: the paired-channel spectral coefficient blocks are downmixed based on the quantized stereo parameters obtained in step S141 to obtain the corresponding mono spectral coefficient blocks.

Step S131 (the mono encoding step) is then performed: the mono spectral coefficient blocks are mono-encoded to obtain mono coded data blocks. All mono spectral coefficient blocks corresponding to the N time-domain signal frames, together with their block length control signals, are fed in order to the core encoder, which outputs the corresponding mono coded data blocks. Each channel-pair spectral coefficient block produces one mono coded data block. It follows that if common extraction mode is used in step S140, each mono coded data block corresponds to one stereo parameter coded data block, whereas if superframe extraction mode is used in step S140, several mono coded data blocks share one stereo parameter coded data block. Finally, step S150 (the encapsulation step) is performed: the mono coded data blocks, the stereo parameter coded data blocks corresponding to them, the block length control signals and the stereo parameter extraction mode flag are packed into a bitstream in a given format to obtain a stereo coded data packet.
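The correspondence between mono coded data blocks and stereo parameter coded data blocks can be sketched as a grouping rule over the transform blocks of one superframe. This minimal Python sketch rests on an assumption the text does not state outright: consecutive transform blocks marked for superframe extraction share one stereo parameter coded data block, while every common-mode block gets its own. Block numbers are those of Fig. 2; the function and constant names are hypothetical.

```python
from typing import List, Tuple

COMMON = "common"          # common extraction mode
SUPERFRAME = "superframe"  # superframe extraction mode


def group_parameter_blocks(blocks: List[Tuple[int, str]]) -> List[List[int]]:
    """Group transform-block IDs by the stereo parameter coded data block they use.

    Hypothetical rule: a common-mode block always opens its own group; a
    superframe-mode block joins the previous group when the previous block
    was also in superframe mode, otherwise it opens a new shared group.
    """
    groups: List[List[int]] = []
    prev_mode = None
    for block_id, mode in blocks:
        if mode == SUPERFRAME and prev_mode == SUPERFRAME:
            groups[-1].append(block_id)  # share the previous parameter block
        else:
            groups.append([block_id])    # start a new parameter block
        prev_mode = mode
    return groups


# Partition results 220, 240 and 250 as described for Fig. 2
result_220 = [(221, COMMON), (222, COMMON), (223, COMMON),
              (224, SUPERFRAME), (225, SUPERFRAME)]
result_240 = [(241, SUPERFRAME), (242, SUPERFRAME), (243, SUPERFRAME)]
result_250 = [(b, COMMON) for b in (251, 252, 253, 254, 255)]

print(group_parameter_blocks(result_220))  # [[221], [222], [223], [224, 225]]
print(group_parameter_blocks(result_240))  # [[241, 242, 243]]
```

Under this rule partition result 220 needs four parameter blocks for five mono blocks, which agrees with the four stereo parameter coded data blocks listed for packet 501 in Fig. 5.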

In this embodiment, each coding cycle reads in one superframe-length time-domain signal block, and the encapsulation step is performed at the end of each coding cycle. Each stereo coded data packet therefore contains the information of one superframe-length time-domain signal block. Alternatively, the results for several superframe-length blocks can be encapsulated together once all of them have been processed.

Concatenating the coded data packets produced by successive coding cycles in a specific format yields the final coding result for the target audio file. As shown in Fig. 4, format 400 depicts the data structure of an audio signal after it has been encoded by the method of the invention. It consists of one header packet and multiple stereo coded data packets (coded data packets 1, 2, 3, ...).

Take stereo coded data packet 1 as an example, and assume that its source information is the superframe-length time-domain block 210 shown in Fig. 2. For time-frequency transform block partition result 220, the resulting stereo coded data packet has the data structure shown as stereo coded data packet 501 in Fig. 5. Packet 501 contains channel pair identifier 1, stereo parameter extraction mode flag 1, multiple block length control signals, multiple mono coded data blocks and multiple stereo parameter coded data blocks.

In stereo coded data packet 501, block length control signal 1-1 corresponds to signal frame 211 in Fig. 2, i.e., time-frequency transform blocks 221, 222 and 223; block length control signal 2-1 corresponds to signal frame 212, i.e., time-frequency transform block 224; and block length control signal 3-1 corresponds to signal frame 213, i.e., time-frequency transform block 225. Mono coded data block 1-1 and stereo parameter coded data block 1-1 correspond to time-frequency transform block 221; mono coded data block 2-1 and stereo parameter coded data block 2-1 correspond to block 222; mono coded data block 3-1 and stereo parameter coded data block 3-1 correspond to block 223; mono coded data block 4-1 corresponds to block 224; and mono coded data block 5-1 corresponds to block 225. Stereo parameter coded data block 4-1 corresponds to both time-frequency transform blocks 224 and 225. Stereo parameter extraction mode flag 1 is sflag = 1.

The data blocks contained in the coded data packet generated for time-frequency transform block partition result 240 are shown in data structure diagram 502 of Fig. 5. Stereo coded data packet 502 contains channel pair identifier 2, stereo parameter extraction mode flag 2, multiple block length control signals, multiple mono coded data blocks and a single stereo parameter coded data block. Block length control signal 1-2 and mono coded data block 1-2 correspond to signal frame 211 in Fig. 2, i.e., time-frequency transform block 241; block length control signal 2-2 and mono coded data block 2-2 correspond to signal frame 212, i.e., block 242; and block length control signal 3-2 and mono coded data block 3-2 correspond to signal frame 213, i.e., block 243. Stereo parameter coded data block 1-2 corresponds to time-frequency transform blocks 241, 242 and 243. Stereo parameter extraction mode flag 2 is sflag = 1.

The data blocks contained in the coded data packet generated for time-frequency transform block partition result 250 are shown in data structure diagram 503 of Fig. 5. Stereo coded data packet 503 contains channel pair identifier 3, stereo parameter extraction mode flag 3, multiple block length control signals, multiple mono coded data blocks and multiple stereo parameter coded data blocks. Block length control signal 1-3 corresponds to signal frame 211 in Fig. 2, i.e., time-frequency transform block 251; block length control signal 2-3 corresponds to signal frame 212, i.e., time-frequency transform blocks 252, 253 and 254; and block length control signal 3-3 corresponds to signal frame 213, i.e., time-frequency transform block 255. Mono coded data block 1-3 and stereo parameter coded data block 1-3 correspond to time-frequency transform block 251; mono coded data block 2-3 and stereo parameter coded data block 2-3 correspond to block 252; mono coded data block 3-3 and stereo parameter coded data block 3-3 correspond to block 253; mono coded data block 4-3 and stereo parameter coded data block 4-3 correspond to block 254; and mono coded data block 5-3 and stereo parameter coded data block 5-3 correspond to block 255. Stereo parameter extraction mode flag 3 is sflag = 0.

Comparing data structures 501, 502 and 503 shows that extracting stereo parameters in superframe extraction mode markedly reduces the data volume of the stereo coded data packet. The coding method of the invention offers a more flexible stereo coding scheme that mixes superframe extraction mode with common extraction mode, so that equivalent audio quality can be obtained at a lower stereo bit rate.
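The partial packet contents of Fig. 5 can be modelled as a small data structure to make this comparison concrete. The class, field and helper names below are hypothetical, the labels are the block numbers used in the text, and only the blocks named in the text are represented (as noted for Fig. 5, real packets may carry further blocks).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StereoCodedPacket:
    """Partial view of a stereo coded data packet (hypothetical field names)."""
    channel_pair_id: int
    sflag: int                       # stereo parameter extraction mode flag
    block_length_signals: List[str]  # one per signal frame 211, 212, 213
    mono_blocks: List[str]           # one per time-frequency transform block
    stereo_param_blocks: List[str]   # one per block, or shared in superframe mode

    def counts(self) -> Tuple[int, int, int]:
        return (len(self.block_length_signals),
                len(self.mono_blocks),
                len(self.stereo_param_blocks))


def labels(count: int, suffix: int) -> List[str]:
    # Build labels such as "1-2", "2-2", "3-2"
    return [f"{i}-{suffix}" for i in range(1, count + 1)]


p501 = StereoCodedPacket(1, 1, labels(3, 1), labels(5, 1), labels(4, 1))
p502 = StereoCodedPacket(2, 1, labels(3, 2), labels(3, 2), labels(1, 2))
p503 = StereoCodedPacket(3, 0, labels(3, 3), labels(5, 3), labels(5, 3))

for p in (p501, p502, p503):
    print(p.channel_pair_id, p.sflag, p.counts())
# 1 1 (3, 5, 4)
# 2 1 (3, 3, 1)
# 3 0 (3, 5, 5)
```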

Note that data structures 501, 502 and 503 in Fig. 5 show only part of the data blocks contained in a stereo coded data packet. In actual encoding, the arrangement of the data blocks is determined by the encapsulation format of the coded data packet, and, depending on that format and other practical coding requirements, a stereo coded data packet may also contain data blocks of other types and contents.

The decoding method of the invention is described next. The stereo decoding method of the invention decodes the coded data packets generated by the stereo encoding method of the invention. As shown in Fig. 6, step S600 (the decapsulation step) is performed first: the coded data packet is decapsulated to obtain mono coded data blocks, stereo parameter coded data blocks and block length control signals.

Next, step S610 (the mono decoding step) is performed: based on the block length control signal, the mono coded data blocks are mono-decoded to obtain mono spectral coefficient blocks.

Step S620 (the stereo parameter decoding step) is performed alongside step S610: based on the block length control signal, the stereo parameter coded data blocks are decoded to obtain the stereo parameters corresponding to the mono spectral coefficient blocks.

Corresponding to the common and superframe stereo parameter extraction modes of the encoding method, the decoding method of the invention includes two stereo parameter decoding modes: common decoding mode and superframe decoding mode.

In step S620, the way in which the current data frame obtains its stereo parameters is determined from the block length control signal and the stereo parameter extraction mode flag, and the corresponding stereo parameter decoding mode is selected accordingly.

The specific operations of steps S600, S610 and S620 under the common and superframe decoding modes are as follows:

In step S600, the superframe length parameter N and the frame length M are determined first. Each data packet to be decoded is then decapsulated to obtain the stereo parameter extraction mode flag (the superframe extraction mode identifier sflag). Finally, the packet is split frame by frame into N data frames, each of which contains the information of one time-domain signal frame.

In step S610, the mono coded data block and the block length control signal of the current data frame are fed to the core decoder, which performs inverse quantization to obtain the corresponding mono spectral coefficient block.

In step S620, if the block length control signal of the current data frame and the superframe extraction mode identifier sflag satisfy all three of the following conditions, step S620.1 is performed; otherwise, step S620.2 is performed.

A. The superframe extraction mode identifier is sflag = 1;

B. The current data frame is not the first frame of the current decoding cycle;

C. The block length control signals of both the current and the previous data frame indicate an M-point time-frequency transform.

Step S620.1: The current data frame reuses the same set of stereo parameters as the previous data frame.

Step S620.2: The stereo parameter coded data block is extracted from the current data frame and inverse-quantized to obtain the stereo parameters.
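The choice between steps S620.1 and S620.2 over one decoding cycle can be sketched as follows. This is a minimal illustration under stated assumptions: each data frame is given as a pair (is_long, coded_params), where is_long is True when the block length control signal indicates an M-point transform and coded_params is a list of quantization indices (None when the frame carries no stereo parameter coded data block). The uniform dequantizer and its step size are hypothetical stand-ins for the real inverse quantization.

```python
import math
from typing import List, Optional, Tuple

STEP = math.pi / 32  # hypothetical quantizer step size


def dequantize(indices: List[int]) -> List[float]:
    # Hypothetical uniform inverse quantization used by step S620.2
    return [q * STEP for q in indices]


def decode_stereo_params(frames: List[Tuple[bool, Optional[List[int]]]],
                         sflag: int) -> List[List[float]]:
    """Return the stereo parameters used by each data frame of one cycle."""
    params: List[List[float]] = []
    for i, (is_long, coded) in enumerate(frames):
        reuse = (sflag == 1                         # condition A
                 and i > 0                          # condition B: not the first frame
                 and is_long and frames[i - 1][0])  # condition C: both M-point
        if reuse:
            params.append(params[-1])               # step S620.1: reuse previous set
        else:
            if coded is None:
                raise ValueError(f"frame {i} carries no stereo parameter block")
            params.append(dequantize(coded))        # step S620.2: decode a new set
    return params
```

With N = 1 the loop runs once and always takes step S620.2, which matches the remark that a single-frame packet can only obtain its parameters that way.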

It can be seen that when N = 1, each data packet contains the information of only one time-domain signal frame, so its stereo parameters can only be obtained through step S620.2. When N ≥ 2, each data frame in the packet must be checked to decide whether step S620.1 or step S620.2 applies.

Building on steps S610 and S620, step S630 (the stereo processing step) is performed: based on the block length control signal, stereo processing is applied to the mono spectral coefficient block using the stereo parameters to obtain the channel-pair spectral coefficient blocks. In step S630, the M-point spectral coefficients of the current data frame are upmixed with the corresponding stereo parameters, yielding M spectral coefficients for each channel of the pair.

Next, step S640 (the inverse time-frequency transform step) is performed: based on the block length control signal, the channel-pair spectral coefficients are inverse-transformed to obtain M-point time-domain reconstructed signals for each channel of the target audio's channel pair. Finally, step S650 outputs the time-domain reconstructed signals obtained in step S640.

Throughout encoding and decoding, the stereo coding/decoding method of the invention shares the same block length selection scheme and time-frequency transform step with the core codec. During encoding, the mono spectral data obtained after stereo parameter encoding is fed directly into the core encoder for quantization and encoding; during decoding, the mono spectral data output by the core decoder is fed into the stereo decoding module. This provides a compact framework that is fully integrated with the core mono codec and avoids the computation and delay overhead caused by repeated time-frequency transforms.

Building on inter-frame long-block combination, the coding method of the invention also introduces superframe extraction mode as a stereo parameter extraction mode, which effectively reduces the final stereo coded data volume. By mixing superframe extraction mode with the common extraction mode based on intra-frame short-block combination, the coding method offers a more flexible stereo coding scheme that achieves higher audio quality under different stereo bit rate configurations.

The execution flow of the stereo coding/decoding method of the invention is now described with a specific application example. Take the superframe length parameter N = 2, i.e., the encoder reads 2M pairs of signal samples at a time. The stereo coding/decoding module uses maximum-correlation rotation; the block length selection module outputs either one M-point long block or a combination of eight M/8-point short blocks; and the time-frequency transform module uses the MDCT.

During encoding, the 2M-point time-domain signals of the left and right channels first undergo a 1/2-time mixing operation and are divided into two M-point time-domain signal frames, which are fed in turn to the block length selection module. The corresponding MDCT is then applied to the left and right channels according to the block length output, yielding the spectral coefficients.
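For reference, a textbook MDCT/IMDCT pair with a sine window is sketched below in plain Python. It is an illustrative O(M²) implementation, not the patent's transform module: it assumes a fixed frame length and omits the transition windows needed when switching between M-point long blocks and M/8-point short blocks. Because the sine window satisfies the Princen-Bradley condition, overlap-adding consecutive IMDCT outputs cancels the time-domain aliasing.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    # Satisfies the Princen-Bradley condition w[n]^2 + w[n+M]^2 = 1
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame: List[float]) -> List[float]:
    """Windowed MDCT: 2M time samples -> M spectral coefficients."""
    n2 = len(frame)
    m = n2 // 2
    w = sine_window(n2)
    return [sum(frame[n] * w[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(n2))
            for k in range(m)]


def imdct(coeffs: List[float]) -> List[float]:
    """Windowed IMDCT: M coefficients -> 2M aliased samples for overlap-add."""
    m = len(coeffs)
    n2 = 2 * m
    w = sine_window(n2)
    return [w[n] * (2.0 / m) * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                                   for k in range(m))
            for n in range(n2)]


# Two half-overlapping 2M-sample frames: the overlap-add of their IMDCT
# outputs reconstructs the M samples the frames share.
M = 8
x = [math.sin(0.3 * i) + 0.05 * i for i in range(3 * M)]
y0 = imdct(mdct(x[0:2 * M]))
y1 = imdct(mdct(x[M:3 * M]))
middle = [y0[M + i] + y1[i] for i in range(M)]
err = max(abs(a - b) for a, b in zip(middle, x[M:2 * M]))  # only round-off remains
```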

A) When the block length outputs of both frames are M-point long blocks, the two M-point spectra of each channel are combined into an M × 2 coefficient matrix, denoted L for the left channel and R for the right channel. Matrices L and R are partitioned along the frequency dimension into N_L sub-blocks. Then, for each pair of coefficient sub-blocks L_k and R_k (k = 0, ..., N_L − 1), a maximum-correlation rotation angle θ_k is extracted.

Here θ₀ denotes the correlation rotation angle parameter.

The definition of the inner product is extended to the two-dimensional matrix space:

$$\langle L_k, R_k \rangle = \sum_{i}\sum_{j} l(i,j)\, r(i,j)$$

where l(i, j) and r(i, j) denote the elements of coefficient sub-blocks L_k and R_k, respectively.
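The matrix inner product and a per-band rotation angle can be computed as follows. The patent's own angle formula (including the parameter θ₀) is not reproduced in this text, so the sketch substitutes the common principal-axis closed form θ = ½·atan2(2⟨L,R⟩, ⟨L,L⟩ − ⟨R,R⟩), which maximizes the energy of the first rotated component. The band edges and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = frequency bins, columns = the 2 frames


def inner(a: Matrix, b: Matrix) -> float:
    """Inner product extended to matrices: sum over i, j of a(i, j) * b(i, j)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def split_bands(m: Matrix, edges: List[int]) -> List[Matrix]:
    """Partition a coefficient matrix along the frequency dimension."""
    return [m[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]


def rotation_angle(lk: Matrix, rk: Matrix) -> float:
    # Assumed principal-axis closed form, not the patent's exact formula
    return 0.5 * math.atan2(2.0 * inner(lk, rk), inner(lk, lk) - inner(rk, rk))


def extract_angles(L: Matrix, R: Matrix, edges: List[int]) -> List[float]:
    """One angle per frequency sub-block pair (L_k, R_k)."""
    return [rotation_angle(lk, rk)
            for lk, rk in zip(split_bands(L, edges), split_bands(R, edges))]
```

Identical channels give θ = π/4 (an equal-weight mix), a silent right channel gives θ = 0, and a silent left channel gives θ = π/2.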

All rotation angles are quantized and encoded to obtain one stereo parameter coded data block. At the same time, each pair of coefficient sub-blocks L_k and R_k undergoes the maximum-correlation rotation transform using the quantized rotation angle, yielding two rotated coefficient blocks.

The two rotated coefficient sub-blocks are then downmixed to obtain the corresponding downmixed coefficient sub-block.

All downmixed coefficient sub-blocks are reassembled into an M × 2 downmix coefficient matrix. Each column vector of this matrix is fed in turn to the core encoder as input, yielding two mono coded data blocks.
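The quantize, rotate and downmix chain of case A can be sketched as follows. The text does not spell out the quantizer or the downmix formula, so two assumptions are made and flagged in the code: a uniform angle quantizer with a hypothetical step size, and a downmix that keeps only the first (principal) rotated component. The input angles stand for the maximum-correlation angles extracted earlier in case A; all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = frequency bins, columns = the 2 frames
STEP = math.pi / 32         # hypothetical angle quantizer step


def quantize_angle(theta: float) -> Tuple[int, float]:
    """Return the quantization index and the reconstructed (quantized) angle."""
    idx = round(theta / STEP)
    return idx, idx * STEP


def rotate(lk: Matrix, rk: Matrix, theta: float) -> Tuple[Matrix, Matrix]:
    """Rotate each (l, r) coefficient pair by theta."""
    c, s = math.cos(theta), math.sin(theta)
    main = [[c * l + s * r for l, r in zip(rl, rr)] for rl, rr in zip(lk, rk)]
    side = [[-s * l + c * r for l, r in zip(rl, rr)] for rl, rr in zip(lk, rk)]
    return main, side


def encode_superframe(L: Matrix, R: Matrix, edges: List[int],
                      angles: List[float]) -> Tuple[List[int], List[List[float]]]:
    """Return angle indices (one parameter block) and the two downmix columns."""
    indices: List[int] = []
    downmix: Matrix = []
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        idx, theta_q = quantize_angle(angles[k])
        indices.append(idx)
        main, _side = rotate(L[lo:hi], R[lo:hi], theta_q)  # rotate with quantized angle
        downmix.extend(main)  # assumption: downmix = principal component only
    # Each column (one per frame) is what the core encoder would receive
    columns = [[row[j] for row in downmix] for j in range(2)]
    return indices, columns
```

Rotating with the quantized angle rather than the exact one keeps the encoder's downmix consistent with what the decoder will reconstruct.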

During bitstream encapsulation, the first mono coded data block, its block length control signal and the stereo parameter coded data block together form one coded data frame; the second mono coded data block and its block length control signal form a second coded data frame. Finally, the two coded data frames and the superframe extraction mode identifier sflag (here sflag = 1) form one coded data packet.

B) When the block length output of at least one of the two frames is a combination of eight M/8-point short blocks, maximum-correlation rotation angle extraction, the maximum-correlation rotation transform of the sub-band coefficient vectors and the downmix are performed independently for each time-frequency transform block (M-point or M/8-point).

For a frame that uses eight consecutive M/8-point short blocks, the downmixed spectral coefficients can be reassembled into one M-point spectral coefficient block and fed to the core encoder together with the corresponding block length control signal, yielding one mono coded data block; the corresponding eight sets of maximum-correlation rotation angles are quantized, encoded and combined into one stereo parameter coded data block.

For a frame that uses an M-point long block, the downmixed spectral coefficients are fed directly to the core encoder together with the corresponding block length control signal, yielding one mono coded data block. The corresponding single set of maximum-correlation rotation angles is likewise quantized, encoded and combined into one stereo parameter coded data block.

During bitstream encapsulation, each mono coded data block forms one coded data frame together with its block length control signal and its stereo parameter coded data block. Finally, the two coded data frames and the superframe extraction mode identifier sflag (here sflag = 0) form one coded data packet.
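The two encapsulation rules of the example (sflag = 1 in case A, sflag = 0 in case B) can be sketched as below for N = 2. The dictionary layout is hypothetical and stands in for the real bitstream format, which the text leaves open; block contents are represented by opaque labels.

```python
from typing import Dict, List


def assemble_packet(mono_blocks: List[str], block_signals: List[str],
                    param_blocks: List[str], sflag: int) -> Dict:
    """Build one coded data packet from the coded frames of a superframe."""
    if sflag == 1 and len(param_blocks) != 1:
        raise ValueError("superframe mode expects one shared parameter block")
    if sflag == 0 and len(param_blocks) != len(mono_blocks):
        raise ValueError("common mode expects one parameter block per frame")
    frames = []
    for i, (mono, bls) in enumerate(zip(mono_blocks, block_signals)):
        frame: Dict[str, str] = {"block_length": bls, "mono": mono}
        if sflag == 0:
            frame["stereo_params"] = param_blocks[i]  # each frame has its own
        elif i == 0:
            frame["stereo_params"] = param_blocks[0]  # shared by all frames
        frames.append(frame)
    return {"sflag": sflag, "frames": frames}


case_a = assemble_packet(["mono1", "mono2"], ["bl1", "bl2"], ["par1"], sflag=1)
case_b = assemble_packet(["mono1", "mono2"], ["bl1", "bl2"], ["par1", "par2"], sflag=0)
```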

During decoding, the superframe extraction mode identifier sflag is first extracted from each data packet, and the packet is then split into two data frames.

The block length control signal, mono coded data block and stereo parameter coded data block are then separated out of the first data frame. Based on the block length control signal, the mono coded data block and the stereo parameter coded data block are each inverse-quantized and decoded, yielding M-point spectral coefficients and the corresponding maximum-correlation rotation angles. The upmix transform of the sub-band coefficient vectors and the IMDCT are then performed independently for each time-frequency transform block, reconstructing the first group of M-point time-domain signals for the left and right channels.

The second data frame is handled in one of two ways:

(1) If sflag = 1, only the mono coded data block is inverse-quantized to obtain M-point spectral coefficients; the maximum-correlation rotation angles are the same as those of the previous frame.

(2) If sflag = 0, the data frame is decoded with the same stereo decoding procedure as the first data frame, reconstructing the second group of M-point time-domain signals for the left and right channels.
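The decoder side of this example, up to but not including the IMDCT, can be sketched as follows for both values of sflag. Because the downmix and upmix formulas are not given in the text, the sketch assumes that the encoder kept only the principal rotated component and reconstructs each band as L = cos θ · m and R = sin θ · m, i.e. the discarded side component is taken as zero. Band edges and function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def upmix_band(mono: List[float], theta: float) -> Tuple[List[float], List[float]]:
    # Inverse rotation with the side component assumed to be zero
    c, s = math.cos(theta), math.sin(theta)
    return [c * m for m in mono], [s * m for m in mono]


def decode_two_frames(mono_frames: List[List[float]],
                      angle_sets: List[Optional[List[float]]],
                      edges: List[int], sflag: int):
    """Return (left, right) M-point spectra for each of the two data frames."""
    out = []
    angles: Optional[List[float]] = None
    for i, mono in enumerate(mono_frames):
        if not (sflag == 1 and i > 0):
            angles = angle_sets[i]  # this frame carries its own angles
        # otherwise (sflag == 1, second frame): keep the previous frame's angles
        left: List[float] = []
        right: List[float] = []
        for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
            l_band, r_band = upmix_band(mono[lo:hi], angles[k])
            left += l_band
            right += r_band
        out.append((left, right))
    return out
```

Each (left, right) pair would then go through the IMDCT and overlap-add to give the M-point time-domain signals.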

As the example shows, the encoding process of this stereo coding/decoding method applies only one MDCT to each channel, and the decoding process applies only one inverse modified discrete cosine transform (IMDCT). Compared with conventional methods, because the method presents the input signal at the same time-frequency resolution as the core codec module, it eliminates the intermediate step of passing a time-domain signal between the stereo stage and the core codec, which reduces codec complexity; for codec systems that use the MDCT as their time-frequency transform, it also reduces codec delay. In addition, because some consecutive mono coded data blocks share one stereo parameter coded data block, the bit rate can be reduced effectively: in the best case, the bit rate of the stereo parameter part can be cut by about 50%.
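The "about 50%" figure follows from a simple count, under two assumptions not stated in the text: every stereo parameter coded data block has the same size, and every superframe qualifies for superframe extraction mode. With N frames per superframe, superframe mode sends one parameter block where common mode sends N:

```latex
\frac{R_{\mathrm{param}}^{\mathrm{superframe}}}{R_{\mathrm{param}}^{\mathrm{common}}}
  = \frac{1\ \text{parameter block per superframe}}{N\ \text{parameter blocks per superframe}}
  = \frac{1}{N},
\qquad
N = 2 \;\Rightarrow\; \text{saving} = 1 - \frac{1}{2} = 50\%
```

Superframes that contain short blocks fall back to common extraction mode (sflag = 0 in the example), so the saving actually achieved lies at or below this bound.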

Although the embodiments are disclosed above, they are described only to facilitate understanding of the invention and are not intended to limit it. The method of the invention may also have various other embodiments. Without departing from the essence of the invention, those skilled in the art may make various corresponding changes or modifications according to the invention, but all such changes or modifications shall fall within the scope of the claims of the invention.

Claims

1. A stereo encoding method, characterized in that the method comprises the following steps:

a block length selection step: performing block length selection on the channel-pair time-domain signals to obtain time-frequency transform blocks of the channel-pair time-domain signals and corresponding block length control signals;

a time-frequency transform step: applying, based on the block length control signal, a time-frequency transform to the time-frequency transform blocks to obtain corresponding channel-pair spectral coefficient blocks;

a stereo parameter encoding step: performing, based on the block length control signal, stereo parameter encoding on the channel-pair spectral coefficient blocks to obtain stereo parameter coded data blocks;

a mono processing step: obtaining, based on the block length control signal and the stereo parameter coded data blocks, mono coded data blocks from the channel-pair spectral coefficient blocks;

an encapsulation step: packing the mono coded data blocks, the stereo parameter coded data blocks and the block length control signals into a bitstream to obtain a stereo coded data packet;

wherein the stereo parameter encoding step comprises the following steps:

a stereo parameter extraction mode selection step: selecting a corresponding stereo parameter extraction mode according to the bit rate of the stereo parameter part of the target audio and the block length control signal, and generating a corresponding stereo parameter extraction mode flag;

a stereo parameter extraction step: extracting corresponding stereo parameters from the channel-pair spectral coefficient blocks according to the stereo parameter extraction mode;

a stereo parameter quantization encoding step: quantizing and encoding the stereo parameters to obtain quantized stereo parameters, and from them the stereo parameter coded data block;

wherein the stereo parameter extraction mode selection step comprises the following steps:

a bit rate analysis step: selecting a corresponding stereo parameter extraction mode according to the bit rate of the stereo parameter part, the common extraction mode being selected when the bit rate of the stereo parameter part is higher than a specific value;

a block length analysis step: when the bit rate of the stereo parameter part is less than or equal to the specific value, analyzing the block length control signal and selecting a corresponding stereo parameter extraction mode according to the result of that analysis.
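The two-stage selection of claim 1 can be illustrated as below. The threshold value and the block length rule are assumptions: the claim only states that a specific value is used and that the block length control signals are analyzed. The sketch selects superframe extraction mode only when every frame of the superframe uses an M-point long block, which is consistent with the application example (sflag = 1 when all blocks are long, sflag = 0 otherwise). Names and the bit rate units are hypothetical.

```python
from typing import List

COMMON, SUPERFRAME = 0, 1  # values written to the extraction mode flag sflag


def select_extraction_mode(param_bitrate: float, threshold: float,
                           frames_use_long_block: List[bool]) -> int:
    # Bit rate analysis step: enough parameter bit rate -> common extraction mode
    if param_bitrate > threshold:
        return COMMON
    # Block length analysis step (assumed rule): share one parameter set
    # only when every frame of the superframe uses an M-point long block
    return SUPERFRAME if all(frames_use_long_block) else COMMON


print(select_extraction_mode(3000, 4000, [True, True]))   # 1
print(select_extraction_mode(3000, 4000, [True, False]))  # 0
```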

2. The method according to claim 1, characterized in that the stereo parameter extraction modes comprise a common extraction mode and a superframe extraction mode, wherein:

in the common extraction mode, one mono coded data block corresponds to one stereo parameter coded data block;

in the superframe extraction mode, multiple consecutive mono coded data blocks correspond to one stereo parameter coded data block.

3. The method according to claim 1, characterized in that the stereo parameter extraction mode selection step further comprises a downmix energy attenuation analysis step: analyzing the downmix energy attenuation of the channel-pair spectral coefficient blocks under the different stereo parameter extraction modes, and selecting a corresponding stereo parameter extraction mode based on the result of that analysis.
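Claim 3 leaves the attenuation measure open. One possible reading, sketched below purely as an assumption, measures the fraction of stereo energy lost when each frame is reduced to its principal rotated component, once with a single angle shared across the superframe and once with a per-frame angle, and picks superframe mode when sharing costs no more than a tolerance. The rotation-based downmix, the tolerance and all names are hypothetical.

```python
import math
from typing import List, Tuple

Stats = Tuple[float, float, float]  # (E_L, E_R, C) of one frame


def frame_stats(l: List[float], r: List[float]) -> Stats:
    return (sum(x * x for x in l), sum(y * y for y in r),
            sum(x * y for x, y in zip(l, r)))


def best_angle(el: float, er: float, c: float) -> float:
    # Angle maximizing the energy kept by the first rotated component
    return 0.5 * math.atan2(2.0 * c, el - er)


def kept_energy(el: float, er: float, c: float, theta: float) -> float:
    co, si = math.cos(theta), math.sin(theta)
    return co * co * el + si * si * er + 2.0 * si * co * c


def attenuation(frames_l, frames_r, shared: bool) -> float:
    """Fraction of stereo energy lost by a principal-component downmix."""
    stats = [frame_stats(l, r) for l, r in zip(frames_l, frames_r)]
    if shared:  # superframe mode: one angle from the summed statistics
        totals = tuple(sum(col) for col in zip(*stats))
        angles = [best_angle(*totals)] * len(stats)
    else:       # common mode: one angle per frame
        angles = [best_angle(*s) for s in stats]
    total = sum(el + er for el, er, _ in stats)
    kept = sum(kept_energy(*s, th) for s, th in zip(stats, angles))
    return 1.0 - kept / total if total > 0 else 0.0


def choose_by_attenuation(frames_l, frames_r, tolerance: float) -> int:
    """Return 1 (superframe mode) if sharing one angle costs at most `tolerance`."""
    extra = (attenuation(frames_l, frames_r, True)
             - attenuation(frames_l, frames_r, False))
    return 1 if extra <= tolerance else 0
```

For example, a superframe whose first frame is left-only and whose second frame is right-only loses half its energy with a shared angle but none with per-frame angles, so this rule keeps it in common mode.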

4. The method according to claim 1, characterized in that the mono processing step comprises the following steps:

a spectral downmix step: downmixing the channel-pair spectral coefficient blocks based on the quantized stereo parameters to obtain corresponding mono spectral coefficient blocks;

a mono encoding step: mono-encoding the mono spectral coefficient blocks based on the block length control signal to obtain the mono coded data blocks.

5. The method according to claim 1, characterized in that, in the encapsulation step, the mono coded data blocks, the stereo parameter coded data blocks corresponding to the mono coded data blocks, the block length control signals and the stereo parameter extraction mode flag are packed into a bitstream in a given format to obtain the stereo coded data packet.

6. A stereo decoding method, characterized in that the method comprises the following steps:

a decapsulation step: decapsulating a stereo coded data packet to obtain mono coded data blocks, stereo parameter coded data blocks, block length control signals and a stereo parameter extraction mode flag, and selecting a corresponding stereo parameter decoding mode according to the block length control signal and the stereo parameter extraction mode flag, wherein the stereo parameter extraction mode flag is generated during encoding after a corresponding stereo parameter extraction mode has been selected according to the bit rate of the stereo parameter part of the target audio and the block length control signal;

a mono decoding step: mono-decoding the mono coded data blocks based on the block length control signal to obtain mono spectral coefficient blocks;

a stereo parameter decoding step: decoding, based on the block length control signal and the stereo parameter extraction mode flag, the stereo parameter coded data blocks to obtain the stereo parameters corresponding to the mono spectral coefficient blocks;

a stereo processing step: performing, based on the block length control signal, stereo processing on the mono spectral coefficient blocks using the stereo parameters to obtain channel-pair spectral coefficient blocks;

an inverse time-frequency transform step: applying, based on the block length control signal, an inverse time-frequency transform to the channel-pair spectral coefficient blocks to obtain the channel-pair time-domain signals of the target audio.

7. The method according to claim 6, characterized in that the stereo parameter decoding modes comprise a common decoding mode and a superframe decoding mode.