CN100571043C - Spatial parametric stereo encoding/decoding method and device - Google Patents

Spatial parametric stereo encoding/decoding method and device

Info

Publication number
CN100571043C
Authority
CN
China
Prior art keywords: module, frequency, domain, signal, parameter
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2007100537702A
Other languages
Chinese (zh)
Other versions
CN101162904A (en
Inventor
胡瑞敏
陈水仙
艾浩军
涂卫平
曹晟
王恒
李璇
周婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Application filed by Wuhan University WHU
Priority to CNB2007100537702A
Publication of CN101162904A
Application granted
Publication of CN100571043C
Legal status: Expired - Fee Related

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a spatial parametric stereo encoding/decoding method and device. The encoder first downmixes the left and right stereo channel signals in the time domain or the frequency domain and sends the downmix signal to a transform coder, which produces the mono coded data. At the same time, the encoder obtains the downmix signal that contains the quantization error, divides its transform domain into contiguous, non-overlapping subbands according to its short-term spectral characteristics, and extracts the spatial parameters of the left and right channels subband by subband. The decoder uses the transform-coded data to generate the same downmix signal containing the quantization error as the encoder, applies the same division method to obtain the subband division, reconstructs the left and right stereo subband signals from the spatial parameter information, and outputs the time-domain stereo signal after the inverse transform. Compared with the prior art, the invention avoids the additional delay of a separate forward and inverse transform and achieves dynamic time-frequency division without transmitting subband division information, which improves the real-time performance and efficiency of spatial parametric stereo encoding and decoding.

Description

Spatial parametric stereo encoding/decoding method and device
Technical field
The invention belongs to the field of digital audio compression coding, and in particular relates to a stereo encoding/decoding method and device that represent spatial information with parameters.
Background art
Digital audio coding originated in the late 1980s, with MP3 (MPEG-1 Layer III) and AAC (Advanced Audio Coding) as typical representatives. Coding techniques of that period exploit inter-channel correlation when processing stereo by applying sum/difference (Mid/Side) stereo or intensity stereo (Intensity Stereo) processing, and then code the two resulting complete audio signals independently, so the bit rate is roughly proportional to the number of channels.
In 2002, C. Faller et al. proposed BCC (Binaural Cue Coding), a parametric stereo encoding/decoding system based on spatial psychoacoustics. The system extracts the inter-channel level difference ILD (Interchannel Level Difference), the inter-channel time difference ITD (Interchannel Time Difference) and the inter-channel coherence IC (Interchannel Coherence) in the transform domain, and its decoder reconstructs the multichannel output from these parameters and the downmix channel. The parametric stereo PS (Parametric Stereo) encoding/decoding system proposed by J. Breebaart et al. in 2004 and the MPEG Surround system released by MPEG in 2005 are both stereo/multichannel encoding/decoding systems developed on the basis of BCC, and their core architecture is consistent with that of the BCC system.
Compared with encoding/decoding systems such as MP3/AAC that use only sum/difference stereo and intensity stereo, these spatial parametric systems offer a clear performance gain: they achieve stereo quality at a mono bit rate, and have therefore been applied in audio broadcasting and mobile audio, where the requirements on sound quality and bit rate are strict. A prominent problem of these systems, however, is increased delay: the time-frequency analysis modules at the encoder and decoder introduce at least one frame of delay, usually between 20 ms and 40 ms, which is unfavorable for two-way real-time communication. Reducing the delay of spatial parametric stereo encoding/decoding is therefore a problem in urgent need of a solution in this field.
Another problem of BCC, PS and MPEG Surround is that dynamic time-frequency division is difficult to achieve. According to spatial psychoacoustics, ILD, ITD and IC have a clear meaning only for a single sound source or for sound sources that are spatially contiguous. All of the above systems use subband division to separate the different sound sources in the signal before extracting spatial parameters. To avoid the bit-rate increase caused by transmitting subband division information, this division is static and independent of the signal characteristics. Because the spectral range of each sound source in a real signal changes dynamically, this approach reduces the efficiency of spatial parameter coding. Achieving dynamic time-frequency division without increasing the bit rate is therefore another problem in urgent need of a solution in this field.
Summary of the invention
The object of the invention is to overcome the deficiencies of existing spatial parametric encoding/decoding systems by providing a spatial parametric stereo encoding/decoding method and device that introduce no additional delay and support dynamic time-frequency division, thereby reducing system delay and improving the efficiency of stereo signal compression.
The encoding solution of the invention comprises the following steps:
Step 1: downmix the input left and right time-domain audio signals into a single signal;
Step 2: apply transform audio coding to the downmix signal to generate the coded data of the downmix signal;
Step 3: obtain the dequantized spectrum of the downmix signal that corresponds to the coded data of the downmix signal;
Step 4: analyze the frequency-domain dequantized downmix signal and divide its spectrum into several contiguous, non-overlapping subbands;
Step 5: taking the divided subbands as units, extract the spatial parameter information of the input left and right signals in each frequency-domain subband and generate the spatial parameter coded data;
Step 6: combine the coded data of the downmix signal and the spatial parameter coded data into a single coded bitstream in a given format.
Moreover, the downmix of step 1 may be performed in the time domain: the time-domain mean of the left and right audio signals is taken as the downmix signal, where the time-domain mean is half the sum of the sample values of the two signals at the same instant.
Moreover, the downmix of step 1 may be performed in the frequency domain: the frequency-domain mean of the left and right audio signals, optionally multiplied by a gain coefficient, is taken as the frequency-domain downmix signal. The frequency-domain mean is half the sum of the spectral line values of the two signals at the same frequency, and the gain coefficient is a positive real number applied to the mean to adjust the energy of the downmix signal.
Moreover, the dequantized spectrum of the downmix signal in step 3 is obtained by decoding the coded data of the downmix signal; alternatively, when the transform coding process has already generated the dequantized spectrum of the downmix signal, it is taken directly from the coding process.
The invention also provides the corresponding spatial parametric stereo decoding method, comprising the following steps in order:
Step I: separate the received bitstream into the downmix signal coded data and the spatial parameter coded data;
Step II: apply transform audio decoding to the downmix signal coded data to generate the frequency-domain dequantized downmix signal, and divide the spectrum of this signal into several contiguous, non-overlapping subbands;
Step III: from the spatial parameter coded data and the frequency-domain dequantized downmix signal, taking the subbands divided in Step II as units, generate two frequency-domain audio signals that carry the spatial information given by the spatial parameter coded data;
Step IV: apply an inverse time-frequency transform or synthesis filterbank filtering to the two frequency-domain audio signals to generate the left and right time-domain audio signals.
The invention provides a device corresponding to the spatial parametric stereo encoding method. It consists of a downmix module, a core encoder module, a core decoder module, a dynamic time-frequency division module, an analysis filterbank, a parameter extraction module and a bitstream forming module. The left and right time-domain audio signals are fed to the downmix module and the analysis filterbank; the output of the downmix module is connected to the core encoder module; the output of the core encoder module is connected to the core decoder module; the output of the core decoder module is connected to the dynamic time-frequency division module; the outputs of the dynamic time-frequency division module and the analysis filterbank are connected to the parameter extraction module; and the outputs of the core encoder module and the parameter extraction module are connected to the bitstream forming module.
Moreover, the core encoder module uses an AAC encoder.
The invention also provides a device corresponding to the spatial parametric stereo decoding method. It consists of a bitstream parsing module, a core decoder module, a dynamic time-frequency division module, a parameter synthesis module and a synthesis filterbank. The combined bitstream is fed to the bitstream parsing module, which separates the core coded data from the spatial parameter data; the core coded data is fed to the core decoder module; the output of the core decoder module, after passing through the dynamic time-frequency division module, is fed to the parameter synthesis module together with the spatial parameter data; and the output of the parameter synthesis module is connected to the synthesis filterbank.
The invention downmixes the input stereo signal directly in the time domain, or downmixes it in the frequency domain using the time-frequency transform of the core encoder, and feeds the resulting single signal to the core encoder, which avoids the additional delay of a separate forward and inverse time-frequency transform. Following an analysis-by-synthesis approach, spatial parameter extraction is placed after the core encoder: the dynamic time-frequency division is computed from the dequantized data produced after coding, and the spatial parameters of each division unit are then extracted. Because the dequantized data can be reproduced exactly at the decoder, the decoder only needs to apply the same time-frequency division method to obtain the same division as the encoder, without any division information being transmitted, and it synthesizes the left and right stereo signals from the spatial parameters unit by unit. The invention thus not only reduces the delay of the spatial parametric encoding/decoding system but also achieves dynamic time-frequency division without transmitting spectrum division information, significantly improving both the real-time performance and the efficiency of spatial parametric stereo encoding and decoding.
Brief description of the drawings
Fig. 1 is the spatial parameter encoding flow of an embodiment of the invention, where Fig. 1a shows time-domain downmixing and Fig. 1b shows frequency-domain downmixing;
Fig. 2 is the spatial parameter decoding flow of an embodiment of the invention;
Fig. 3 is the basic structure of the spatial parametric stereo encoding/decoding device of the invention;
Fig. 4 is the structure of the encoding device of an embodiment of the invention that uses AAC as the core encoder;
Fig. 5 is the structure of the decoding device of an embodiment of the invention that uses AAC as the core decoder.
Detailed description of the embodiments
The spatial parametric stereo encoding method provided by the invention comprises the following steps:
Step 1: downmix the input left and right time-domain audio signals into a single signal;
Step 2: apply transform audio coding to the downmix signal to generate the coded data of the downmix signal;
Step 3: obtain the dequantized spectrum of the downmix signal that corresponds to the coded data of the downmix signal;
Step 4: analyze the frequency-domain dequantized downmix signal and divide its spectrum into several contiguous, non-overlapping subbands;
Step 5: taking the divided subbands as units, extract the spatial parameter information of the input left and right signals in each frequency-domain subband and generate the spatial parameter coded data;
Step 6: combine the coded data of the downmix signal and the spatial parameter coded data into a single coded bitstream in a given format.
In a concrete implementation, the correlated left and right time-domain audio signals are generally first passed through an invertible time-frequency transform or an analysis filterbank to generate the left and right frequency-domain audio signals; either of the two kinds of processing may be used. The transform audio coding generally uses perceptual audio coding, which is the general term for a class of transform-domain audio coding methods based on the characteristics of human hearing. When the downmix is performed in the frequency domain, only part of the perceptual audio coding (the processing other than the time-frequency transform) needs to be applied to the downmix signal. Correspondingly, the transform decoding should also use perceptual audio decoding.
Parametric stereo is a stereo coding method founded on spatial psychoacoustics. Its main characteristic is that only one main signal (also called the downmix signal) is coded, while the spatial information is separated from the stereo signal and represented parametrically (also called spatial parameter information). The invention gives a downmix method that works well. For time-domain downmixing, the time-domain mean of the left and right audio signals is taken as the downmix signal, where the time-domain mean is half the sum of the sample values of the two signals at the same instant. For frequency-domain downmixing, the frequency-domain mean of the left and right audio signals, optionally multiplied by a gain coefficient, is taken as the frequency-domain downmix signal, where the frequency-domain mean is half the sum of the spectral line values of the two signals at the same frequency and the gain coefficient is a positive real number applied to the mean to adjust the energy of the downmix signal.
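The two downmix definitions above can be written out directly. The following is a minimal Python sketch; the function names `downmix_time` and `downmix_freq` and the list-based signal representation are illustrative assumptions, not part of the patent text.

```python
def downmix_time(left, right):
    """Time-domain downmix: half the sum of the two samples at each instant."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [0.5 * (l + r) for l, r in zip(left, right)]


def downmix_freq(left_spec, right_spec, gain=1.0):
    """Frequency-domain downmix: half the sum of the two spectral lines at each
    frequency, scaled by a positive real gain (gain=1.0 gives the plain mean)."""
    if gain <= 0:
        raise ValueError("gain must be a positive real number")
    if len(left_spec) != len(right_spec):
        raise ValueError("spectra must have the same length")
    return [gain * 0.5 * (l + r) for l, r in zip(left_spec, right_spec)]
```

The spectral lines may be real (for example MDCT coefficients) or complex (for example DFT bins); the arithmetic is the same in both cases.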
In a concrete implementation, the whole encoding process can be automated in software. The spatial parameter encoding flow of the embodiment is given below for reference. For time-domain downmixing, see Fig. 1 (a):
(101) Downmix the input correlated left and right time-domain audio signals into a single signal in the time domain, then go to step (102);
(102) Apply complete perceptual audio coding to the time-domain downmix signal to generate the coded data of the downmix signal; if this process also produces the frequency-domain dequantized data, go to step (104), otherwise go to step (103);
(103) Partially decode the coded data of the downmix signal to generate the frequency-domain dequantized data, then go to step (104);
(104) Analyze the frequency-domain dequantized downmix signal and divide its spectrum into several contiguous, non-overlapping subbands, then go to step (105);
(105) Apply analysis filterbank filtering to each of the two input time-domain channel signals to generate two frequency-domain signals, then go to step (106);
(106) Taking the subband division of step (104) as the unit, extract the spatial parameter information of each subband of the two frequency-domain signals and generate the parameter bitstream, then go to step (107);
(107) Combine the downmix signal coded data generated in step (102) and the parameter coded data generated in step (106) into a single coded bitstream in a given format.
For frequency-domain downmixing, see Fig. 1 (b):
(111) Apply analysis filterbank filtering to each of the two input time-domain channel signals to generate two frequency-domain signals, then go to step (112);
(112) Downmix the two frequency-domain signals in the frequency domain, using either the arithmetic mean or a weighted mean, to obtain a single frequency-domain downmix signal, then go to step (113);
(113) Apply to the frequency-domain downmix signal the perceptual audio coding processing other than the time-frequency transform, including frequency-domain processing and quantization entropy coding, to generate the coded data of the downmix signal; if this process also produces the frequency-domain dequantized data, go to step (115), otherwise go to step (114);
(114) Decode the coded data of the downmix signal to generate the frequency-domain dequantized data of the downmix signal, then go to step (115);
(115) Analyze the frequency-domain dequantized signal and divide its spectrum into contiguous, non-overlapping subbands, then go to step (116);
(116) Taking the subband division of step (115) as the unit, extract the spatial parameter information of each subband of the two frequency-domain signals and generate the parameter bitstream, then go to step (117);
(117) Combine the downmix signal coded data generated in step (113) and the parameter coded data generated in step (116) into a single coded bitstream in a given format.
The spatial parameter decoding flow of the embodiment of the invention, shown in Fig. 2, comprises the following steps:
(201) Take as input the single bitstream containing the downmix signal and the spatial parameter information, and separate it into the downmix signal coded data and the spatial parameter coded data;
(202) Apply perceptual audio decoding to the downmix signal coded data output by step (201) to generate the frequency-domain dequantized downmix signal, which contains the quantization error;
(203) Analyze the frequency-domain dequantized downmix signal output by step (202) and, using the same method as step (104) or step (115) of the spatial parametric stereo encoding, divide its spectrum into several contiguous, non-overlapping subbands;
(204) From the spatial parameter coded data output by step (201) and the frequency-domain dequantized downmix signal output by step (202), taking the subbands given by step (203) as units, generate two frequency-domain audio signals that carry the spatial information given by the spatial parameter coded data;
(205) Apply an inverse time-frequency transform or synthesis filterbank filtering to the two frequency-domain audio signals given by step (204) to generate the left and right time-domain audio signals.
In the codec field, a process that can be realized in software is often hardened into a codec hardware product for use in the market. The invention therefore also provides the basic structure of a spatial parametric stereo encoding/decoding device. As shown in Fig. 3, the encoder comprises 7 modules: downmix module 301, core encoder module 302, core decoder module 303, dynamic time-frequency division module 304, analysis filterbank 305, parameter extraction module 306 and bitstream forming module 307. The input left and right stereo signals are first combined into a single signal by downmix module 301, which serves as the input of core encoder module 302. Core decoder module 303 restores the coded data to a signal that contains the quantization error and approximates the original downmix signal. Dynamic time-frequency division module 304 divides the spectrum according to the short-term characteristics of this signal. Taking each division unit as the basic unit, parameter extraction module 306 extracts the spatial parameters between the two frequency-domain signals that analysis filterbank 305 produces from the original left and right signals. Finally, bitstream forming module 307 combines the outputs of core encoder module 302 and parameter extraction module 306 in a given format into a bitstream that the decoder can recognize.
The input of downmix module 301 is the time-domain left and right stereo signals; its output is a single time-domain signal, the mean of the left and right signals, also called the downmix signal.
The input of core encoder module 302 is the single downmix signal. The core encoder module can be an existing mono transform coder such as MP3 or AAC. Its output has two parts: the coded data of the downmix signal and the quantization index values of the downmix signal in the transform domain, where the transform domain can be a subband domain, the discrete Fourier transform (DFT) domain or the modified discrete cosine transform (MDCT) domain.
The input of core decoder module 303 is the coded data of the downmix signal, and its output is the dequantized transform-domain downmix signal. Core decoder module 303 is the mono transform decoder that corresponds to core encoder module 302, such as MP3 or AAC. Unlike an ordinary mono transform decoder, decoding here only needs to proceed as far as dequantization; no inverse transform to a time-domain signal is required.
The input of dynamic time-frequency division module 304 is the dequantized transform-domain downmix signal. According to the characteristics of this signal, the module divides the transform-domain spectral lines into contiguous subbands, usually of different bandwidths. The frequency-domain division used by existing spatial parameter coding is normally the Bark-band division, a non-linear division that follows the auditory characteristics of the human ear and is independent of the signal characteristics. Here the Bark bands are taken as the basis and are subdivided and merged according to the short-term time-frequency characteristics of the signal: if one Bark band contains two or more independent sound sources, it is split into the corresponding two or more segments; if adjacent Bark bands all lie within the range of a single sound source, they are merged into one segment. Independent sound sources can be identified by analyzing the spectral envelope, phase and correlation.
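The text does not give a concrete split/merge criterion, so the following Python sketch shows only the merge half of the rule under a loudly stated assumption: adjacent bands are treated as belonging to the same sound source when their mean spectral levels differ by no more than `max_diff_db`. The function names, the threshold and the level-based test are hypothetical; the one property taken from the text is that the result stays a contiguous, non-overlapping division derived solely from the dequantized downmix spectrum.

```python
import math


def band_mean_level_db(spectrum, start, stop):
    """Mean energy per spectral line of spectrum[start:stop], in dB."""
    energy = sum(abs(x) ** 2 for x in spectrum[start:stop]) / (stop - start)
    return 10.0 * math.log10(energy + 1e-12)  # small floor avoids log(0)


def merge_bands(spectrum, edges, max_diff_db=3.0):
    """Merge adjacent bands with similar level (hypothetical criterion).

    edges is an ascending list [e0, e1, ..., eN]; band k covers
    spectrum[e_k:e_{k+1}]. Returns a new edge list in the same format.
    """
    merged = [edges[0]]
    prev_level = None
    for start, stop in zip(edges[:-1], edges[1:]):
        level = band_mean_level_db(spectrum, start, stop)
        # Keep a boundary only where the level jumps relative to the previous band.
        if prev_level is not None and abs(level - prev_level) > max_diff_db:
            merged.append(start)
        prev_level = level
    merged.append(edges[-1])
    return merged
```

For example, with `spectrum = [1.0] * 4 + [0.01] * 4` and `edges = [0, 2, 4, 6, 8]`, the two loud bands and the two quiet bands are each merged, giving `[0, 4, 8]`. A real implementation would also need the splitting step and the phase and correlation analysis mentioned above.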
The input of analysis filterbank 305 is the original left and right time-domain signals, and its output is the left and right frequency-domain signals. Analysis filterbank 305 can use a polyphase quadrature modulated filterbank (Polyphase Quadrature Modulated Filterbank, PQMF), the discrete Fourier transform (Discrete Fourier Transform, DFT) or the modified discrete cosine transform (Modified Discrete Cosine Transform, MDCT).
The inputs of parameter extraction module 306 are the dynamic time-frequency division and the two frequency-domain signals obtained by analysis filtering of the original left and right signals. Its output is the spatial parameters extracted for each division unit, including the time difference ITD, the level difference ILD and the coherence IC. For each division unit, parameter extraction can use existing techniques, such as the parameter extraction methods of BCC and PS.
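As an illustration of per-subband extraction, the sketch below computes three values for each division unit from complex (for example DFT) spectra. The formulas are commonly used definitions assumed here for illustration, not quoted from the patent or from the BCC/PS specifications: ILD as an energy ratio in dB, IC as the normalized magnitude of the cross-spectrum, and an inter-channel phase difference used as a stand-in for ITD. All names are hypothetical.

```python
import cmath
import math

EPS = 1e-12  # guards against division by zero in silent bands


def extract_params(left_spec, right_spec, edges):
    """Return one dict of spatial parameters per subband.

    edges is an ascending list [e0, ..., eN]; subband k covers lines e_k..e_{k+1}-1.
    """
    params = []
    for start, stop in zip(edges[:-1], edges[1:]):
        l = left_spec[start:stop]
        r = right_spec[start:stop]
        e_left = sum(abs(x) ** 2 for x in l) + EPS
        e_right = sum(abs(x) ** 2 for x in r) + EPS
        cross = sum(a * b.conjugate() for a, b in zip(l, r))
        params.append({
            "ild_db": 10.0 * math.log10(e_left / e_right),  # level difference
            "ipd_rad": cmath.phase(cross),                  # phase difference (ITD proxy)
            "ic": abs(cross) / math.sqrt(e_left * e_right), # coherence in [0, 1]
        })
    return params
```

With `left_spec = [2 + 0j, 2 + 0j]`, `right_spec = [1 + 0j, 1 + 0j]` and `edges = [0, 2]`, the single subband has an ILD of about 6.02 dB, zero phase difference and a coherence of essentially 1.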
The inputs of bitstream forming module 307 are the coded data of the downmix channel and the coded data of the spatial parameters, and its output is a bitstream that the decoder can recognize. Following the given bitstream syntax, the module combines the two parts of data and adds the given identification information, such as specific flag bits.
The decoder comprises 5 modules: bitstream parsing module 311, core decoder module 312, dynamic time-frequency division module 313, parameter synthesis module 314 and synthesis filterbank 315. The input of the decoder is a bitstream that conforms to the given syntax. Bitstream parsing module 311 first parses the input bitstream and separates the core coded data from the spatial parameter data. Core decoder module 312 generates the dequantized transform-domain downmix signal from the core coded data; this signal is exactly identical to the dequantized transform-domain downmix signal of the corresponding time segment at the encoder. Dynamic time-frequency division module 313, which is identical to the one at the encoder, then performs the dynamic time-frequency division on it and obtains exactly the same division as the encoder. Finally, parameter synthesis module 314 reconstructs the left and right transform-domain signals of each division unit from the time-frequency division and the spatial parameter data, and the inverse time-frequency transform of synthesis filterbank 315 yields the final time-domain stereo output.
The input of bitstream parsing module 311 is the bitstream produced by the encoder, and its output is the parsed core coded data and spatial parameter data that correspond to the time segment being decoded. Following the given syntax and using specific identification information such as flag bits, the module determines the meaning of each bit sequence in the bitstream and separates out the synchronized data needed by the core decoder module and the parameter synthesis module.
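The patent leaves the bitstream syntax open ("a given format"), so the following Python sketch uses a purely hypothetical frame layout to show the forming and parsing modules as a round trip: a 16-bit sync word, two 16-bit length fields, then the core coded data followed by the spatial parameter data. The sync value, field widths and function names are assumptions for illustration only.

```python
import struct

SYNC = 0xA55A                     # hypothetical identification word
HEADER = struct.Struct(">HHH")    # sync, core length, parameter length (big-endian)


def form_frame(core_data: bytes, param_data: bytes) -> bytes:
    """Bitstream forming: prepend the header and concatenate both payloads."""
    return HEADER.pack(SYNC, len(core_data), len(param_data)) + core_data + param_data


def parse_frame(frame: bytes):
    """Bitstream parsing: check the header and split the payloads again."""
    sync, n_core, n_param = HEADER.unpack_from(frame, 0)
    if sync != SYNC:
        raise ValueError("sync word not found")
    body = frame[HEADER.size:]
    if len(body) != n_core + n_param:
        raise ValueError("frame length does not match header")
    return body[:n_core], body[n_core:]
```

Note that the frame carries no subband division information: only the core data and the parameter values are transmitted, which is the point of the scheme.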
The input of core decoder module 312 is the parsed core coded data, and its output is the dequantized transform-domain downmix signal. The core decoder module is the mono transform decoder that corresponds to the core encoder module at the encoder, such as MP3 or AAC. Unlike an ordinary mono transform decoder, decoding here only needs to proceed as far as dequantization; no inverse transform to a time-domain signal is required. When the coded bitstream is delivered to the decoder without error, the core decoder module exactly recovers the dequantized transform-domain downmix signal of the encoder. The difference between this signal and the original transform-domain downmix signal is the quantization error, and the original signal itself normally cannot be reconstructed exactly at the decoder.
The input of dynamic time-frequency division module 313 is the dequantized transform-domain downmix signal produced by the core decoder module, and its output is the transform-domain subband division derived from the current signal characteristics. This module is identical to the dynamic time-frequency division module at the encoder, and its input, the dequantized transform-domain downmix signal, is also identical, so the subband division it outputs is identical to that of the encoder.
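The reason no division information has to be transmitted can be shown with a toy example. The Python sketch below assumes a uniform scalar quantizer and a deliberately simple, hypothetical partition rule (a new subband starts wherever the dequantized spectrum switches between zero and non-zero lines); neither is taken from the patent. What matters is that both sides run the same deterministic rule on the same dequantized values.

```python
def quantize(spectrum, step):
    """Uniform scalar quantizer (toy stand-in for the core encoder)."""
    return [round(x / step) for x in spectrum]


def dequantize(indices, step):
    """Inverse quantization, available at both encoder and decoder."""
    return [i * step for i in indices]


def partition(deq_spectrum):
    """Hypothetical deterministic rule: boundary at each zero/non-zero switch."""
    edges = [0]
    for k in range(1, len(deq_spectrum)):
        if (deq_spectrum[k] == 0) != (deq_spectrum[k - 1] == 0):
            edges.append(k)
    edges.append(len(deq_spectrum))
    return edges


# Encoder side: the partition is computed from the dequantized data,
# not from the original spectrum.
original = [0.9, 1.1, 0.2, 0.1, 0.7, 0.8]
step = 0.5
indices = quantize(original, step)            # carried in the core coded data
encoder_edges = partition(dequantize(indices, step))

# Decoder side: sees only the indices and the step size.
decoder_edges = partition(dequantize(indices, step))
assert encoder_edges == decoder_edges
```

Had the encoder partitioned the original spectrum instead, the decoder could not reproduce the result, because the original values are lost in quantization.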
The inputs of parameter synthesis module 314 are the dequantized transform-domain downmix signal, the subband division and the spatial parameter data; its output is the reconstructed left and right stereo frequency-domain signals. Like the parameter extraction module at the encoder, the parameter synthesis module can use existing techniques, such as the parameter synthesis methods of BCC and PS. Taking each dynamically divided subband as a unit, it generates left and right subband signals that have the given time difference ITD, level difference ILD and coherence IC.
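A reduced form of the synthesis step, restricted to the level difference, can be sketched as follows. It assumes the downmix is the plain mean of the two channels and derives per-subband gains so that the amplitude ratio of the outputs matches the given ILD while their mean still equals the downmix. Restoring ITD and IC would need phase rotation and decorrelation, which are omitted here; the function name and the gain formula are illustrative assumptions rather than the patent's method.

```python
def synthesize_ild(downmix_spec, edges, ild_db):
    """Rebuild left/right spectra from the downmix using one ILD per subband.

    With c = 10**(ILD/20), the gains g_left = 2c/(1+c) and g_right = 2/(1+c)
    satisfy g_left / g_right = c and (g_left + g_right) / 2 = 1.
    """
    if len(ild_db) != len(edges) - 1:
        raise ValueError("need exactly one ILD value per subband")
    left = list(downmix_spec)
    right = list(downmix_spec)
    for (start, stop), ild in zip(zip(edges[:-1], edges[1:]), ild_db):
        c = 10.0 ** (ild / 20.0)
        g_left = 2.0 * c / (1.0 + c)
        g_right = 2.0 / (1.0 + c)
        for k in range(start, stop):
            left[k] = g_left * downmix_spec[k]
            right[k] = g_right * downmix_spec[k]
    return left, right
```

With an ILD of 0 dB both gains are 1 and the two outputs equal the downmix; a positive ILD shifts energy toward the left channel.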
The input of synthesis filterbank 315 is the left and right frequency-domain signals, and its output is the left and right time-domain signals. Synthesis filterbank 315 is the inverse of analysis filterbank 305 and can be an inverse polyphase quadrature modulated filterbank (Inverse PQMF, IPQMF), the inverse discrete Fourier transform (Inverse DFT, IDFT) or the inverse modified discrete cosine transform (Inverse MDCT, IMDCT).
Specific embodiments of the invention are further described below with reference to Figs. 4 and 5. Fig. 4 shows the structure of the spatial parametric stereo encoding system of the invention with AAC as the core encoder; it is matched by the spatial parametric stereo decoding system of Fig. 5.
The spatial parametric stereo encoding system with AAC as the core encoder comprises 7 modules: psychoacoustic analysis module 401, MDCT module 402, downmix module 403, AAC frequency-domain processing and quantization entropy coding module 404, dynamic time-frequency division module 405, parameter extraction module 406 and bitstream forming module 407. The time-domain left and right channel signals are first processed by psychoacoustic analysis module 401 to obtain the psychoacoustic data required by AAC encoding and the MDCT transform length. MDCT module 402 performs the corresponding time-frequency transform according to the transform length and produces the frequency-domain data of the left and right channels. Downmix module 403 produces a single channel of frequency-domain data, which is the input of AAC frequency-domain processing and quantization entropy coding module 404. Module 404 is the AAC core encoder; it outputs the coded bitstream of the downmix signal and the frequency-domain dequantized downmix signal. Dynamic time-frequency division module 405 derives a spectrum division from the short-term characteristics of the frequency-domain dequantized downmix signal. Taking each division unit as a unit, parameter extraction module 406 extracts the spatial parameter information between the original frequency-domain left and right channel signals and forms the spatial parameter bitstream. Finally, bitstream forming module 407 combines the AAC downmix channel bitstream and the spatial parameter bitstream into a bitstream that conforms to the given syntax.
Psychoacoustic analysis module 401 is one of the main modules of the AAC encoder. Its input is the original left and right channel time-domain signals, and its output is the psychoacoustic parameters required for AAC encoding and the length of the MDCT transform. The psychoacoustic parameters include the perceptual entropy, the masking threshold, and so on. The MDCT transform length depends mainly on the short-time stationarity of the signal: a long transform is used for steady-state signals and a short transform for transient signals.
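The choice between a long and a short transform can be illustrated with a minimal sketch. The segment-energy ratio test, its threshold, and the function name choose_transform_length are assumptions made for illustration; they are not the psychoacoustic model of the AAC standard and are not taken from the patent. The constants reflect the 1024 and 128 spectral lines of AAC long and short blocks.

```python
import math

LONG_TRANSFORM = 1024   # spectral lines per AAC long block
SHORT_TRANSFORM = 128   # spectral lines per AAC short block


def choose_transform_length(frame, num_segments=8, ratio_threshold=8.0):
    """Pick a long transform for steady frames and a short one for transients.

    Assumption: a transient is flagged when the energy of one segment
    exceeds that of the previous segment by more than ratio_threshold.
    """
    seg_len = len(frame) // num_segments
    energies = [
        sum(x * x for x in frame[i * seg_len:(i + 1) * seg_len]) + 1e-12
        for i in range(num_segments)
    ]
    # A sudden energy jump between neighbouring segments marks a transient.
    for prev, cur in zip(energies, energies[1:]):
        if cur / prev > ratio_threshold:
            return SHORT_TRANSFORM
    return LONG_TRANSFORM


steady = [math.sin(0.1 * n) for n in range(1024)]   # stationary tone
attack = [0.0] * 512 + [1.0] * 512                  # silence, then an onset
print(choose_transform_length(steady), choose_transform_length(attack))
```

The stationary tone keeps the long transform, while the frame with an abrupt onset switches to the short one.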
MDCT module 402 is also one of the main modules of the AAC encoder. Its input is the time-domain signals of the original left and right channels, and its output is the frequency-domain signals of the left and right channels.
Downmix module 403 is specific to this embodiment of the invention. Its input is the left and right frequency-domain signals, and its output is a single frequency-domain signal. The downmix can be a simple arithmetic mean; alternatively, a gain control coefficient can be applied on top of the mean to avoid the reinforcement or cancellation that occurs when the left and right frequency-domain signals are in phase or in anti-phase.
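A minimal sketch of such a downmix follows. The arithmetic mean is as described; the particular gain rule (matching the downmix energy to the mean energy of the two channels, with an upper cap) is an assumption, since the text only states that a gain control coefficient may be introduced. The function name downmix and the parameter max_gain are hypothetical.

```python
import math


def downmix(left, right, use_gain=True, max_gain=4.0):
    """Downmix two spectra to one: arithmetic mean, optionally gain-compensated."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    if not use_gain:
        return mid
    # Assumed gain rule: restore the mean energy of the two input channels.
    target = 0.5 * (sum(l * l for l in left) + sum(r * r for r in right))
    actual = sum(m * m for m in mid)
    if actual <= 1e-12:
        return mid  # fully cancelled (anti-phase) input: no gain can help
    # The gain is a positive real number; the cap limits near-cancellation blow-up.
    gain = min(math.sqrt(target / actual), max_gain)
    return [gain * m for m in mid]


print(downmix([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # identical channels: gain 1
print(downmix([1.0, 0.0], [0.0, 1.0]))             # uncorrelated: gain > 1
```

With identical channels the gain is 1 and the mean is returned unchanged; with uncorrelated channels the plain mean would lose energy, and the gain compensates for that loss.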
AAC frequency-domain processing and quantization/entropy coding module 404 is the core module of the AAC encoder. Its input is the single-channel frequency-domain downmix signal, and its output is the encoded bitstream and the frequency-domain dequantized data. The specific frequency-domain processing and quantization/entropy coding methods are described in detail in the MPEG standard documents. For the spatial-parameter stereo encoding system of the invention, an important benefit of using AAC as the core encoder is that the AAC encoding process itself already generates the frequency-domain dequantized data, so no separate dequantization module is needed.
Dynamic time-frequency division module 405 is specific to this embodiment of the invention. Its input is the frequency-domain dequantized data, and its output is a division of the spectrum based on the short-time characteristics of the signal. Module 405 has the same function and processing method as dynamic time-frequency division module 304 of Fig. 3.
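As an illustration of a signal-dependent division into contiguous, non-overlapping subbands, the following sketch splits the spectral lines so that each subband holds roughly an equal share of the energy. The equal-energy rule, the function name divide_spectrum, and the parameter num_bands are assumptions; the actual rule of module 304 is not reproduced in this passage.

```python
def divide_spectrum(dequantized, num_bands=4):
    """Split spectral-line indices into contiguous, non-overlapping subbands.

    Returns a list of (start, stop) pairs with stop exclusive.
    Assumption: bands are closed when the cumulative energy reaches the
    next 1/num_bands share of the total energy.
    """
    n = len(dequantized)
    if n < num_bands:
        raise ValueError("need at least one spectral line per subband")
    energy = [x * x for x in dequantized]
    total = sum(energy)
    bands, start, acc = [], 0, 0.0
    for k, e in enumerate(energy):
        acc += e
        remaining_lines = n - (k + 1)
        remaining_bands = num_bands - len(bands) - 1
        # Close the band when it has its energy share, or when the lines
        # left are just enough to give each remaining band one line.
        if remaining_bands > 0 and (
            acc >= total * (len(bands) + 1) / num_bands
            or remaining_lines == remaining_bands
        ):
            bands.append((start, k + 1))
            start = k + 1
    bands.append((start, n))
    return bands


print(divide_spectrum([1.0] * 8, num_bands=4))
print(divide_spectrum([4.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0], num_bands=2))
```

A flat spectrum yields uniform subbands, while a spectrum dominated by one strong line gets a narrow subband around that line. Because the rule uses only the dequantized data, any party holding the same data obtains the same division.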
Parameter extraction module 406 is specific to this embodiment of the invention. Its inputs are the frequency-domain left and right channel signals and the spectrum division, and its output is the spatial parameter bitstream. Module 406 has the same function and processing method as parameter extraction module 305 of Fig. 3.
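Per-subband extraction can be sketched as follows. The two parameters used here, an inter-channel level difference in dB and a normalized inter-channel correlation, are common parametric-stereo cues chosen as an assumption; this passage does not name the specific spatial parameters, and the function name extract_parameters is hypothetical. Quantization and coding of the parameters into a bitstream are omitted.

```python
import math


def extract_parameters(left, right, bands, eps=1e-12):
    """Return one (level_difference_db, correlation) pair per subband.

    left, right: original frequency-domain channel signals.
    bands: list of (start, stop) index pairs, stop exclusive.
    """
    params = []
    for start, stop in bands:
        l_band, r_band = left[start:stop], right[start:stop]
        e_l = sum(x * x for x in l_band) + eps   # eps avoids division by zero
        e_r = sum(x * x for x in r_band) + eps
        cross = sum(l * r for l, r in zip(l_band, r_band))
        level_db = 10.0 * math.log10(e_l / e_r)
        correlation = cross / math.sqrt(e_l * e_r)
        params.append((level_db, correlation))
    return params


left = [2.0, 2.0, 1.0, 1.0]
right = [1.0, 1.0, 1.0, 1.0]
print(extract_parameters(left, right, [(0, 2), (2, 4)]))
```

In this example the left channel carries four times the energy of the right channel in the first subband, which corresponds to a level difference of about 6 dB, and the channels are equal in the second subband.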
Bitstream formatting module 407 is one of the main modules of the encoder. Its inputs are the AAC bitstream of the downmix signal and the spatial parameter bitstream, and its output is a complete bitstream that conforms to the given syntax.
Fig. 5 shows the structure of the spatial-parameter stereo decoding system of the invention with AAC as the core decoder; it is the counterpart of the spatial-parameter stereo encoding system of Fig. 4.
The spatial-parameter stereo decoding system with AAC as the core decoder comprises five modules: bitstream parsing module 501, AAC decoding, dequantization and inverse frequency-domain processing module 502, dynamic time-frequency division module 503, parameter synthesis module 504, and IMDCT module 505. The bitstream generated by the encoding system of Fig. 4 is transmitted to the decoding system. Bitstream parsing module 501 first separates it into two parts, the AAC bitstream of the downmix signal and the spatial parameter bitstream, which are sent to module 502 and to parameter synthesis module 504 respectively. Module 502 performs entropy decoding, dequantization, and the inverse of the AAC encoder's frequency-domain processing on the downmix bitstream to obtain the frequency-domain dequantized downmix signal. Dynamic time-frequency division module 503 analyzes this signal and produces a division of the spectrum based on its short-time characteristics. Because both the dequantized signal and the division method are exactly identical to those of the encoding system, the resulting division is also exactly identical to the encoder's. Parameter synthesis module 504 takes each division unit as the basic unit and generates the left and right frequency-domain signals according to the spatial parameter information. Finally, the IMDCT yields the time-domain left and right channel signals.
Bitstream parsing module 501 is one of the main modules of the decoding system. Its input is the bitstream, and its output is the AAC bitstream of the downmix signal and the spatial parameter bitstream. Parsing consists of cutting the bitstream into elementary stream units according to the given syntax.
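The relationship between forming and parsing the bitstream can be sketched with a toy syntax. The layout used here (each elementary stream preceded by a 2-byte big-endian length field) is purely hypothetical; the patent only requires that both sides follow the same given syntax. The function names form_bitstream and parse_bitstream are likewise illustrative.

```python
import struct


def form_bitstream(core_payload: bytes, spatial_payload: bytes) -> bytes:
    """Concatenate two elementary streams, each with a 2-byte length prefix."""
    return (
        struct.pack(">H", len(core_payload)) + core_payload
        + struct.pack(">H", len(spatial_payload)) + spatial_payload
    )


def parse_bitstream(frame: bytes):
    """Cut a frame back into (core_payload, spatial_payload)."""
    units, pos = [], 0
    for _ in range(2):
        (length,) = struct.unpack_from(">H", frame, pos)
        pos += 2
        unit = frame[pos:pos + length]
        if len(unit) != length:
            raise ValueError("truncated elementary stream")
        units.append(unit)
        pos += length
    if pos != len(frame):
        raise ValueError("trailing bytes after the last elementary stream")
    return tuple(units)


frame = form_bitstream(b"core", b"sp")
print(parse_bitstream(frame))
```

Parsing recovers exactly the two payloads that were combined, which is the only property the decoding system relies on.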
AAC decoding, dequantization and inverse frequency-domain processing module 502 is the core module of the AAC decoder. Its input is the bitstream of the downmix signal, and its output is the frequency-domain dequantized downmix signal. For the specific entropy decoding, dequantization, and inverse frequency-domain processing methods, refer to the MPEG AAC standard documents. The dequantized signal output here is exactly identical to the frequency-domain dequantized signal of the encoding system.
Dynamic time-frequency division module 503 is specific to this embodiment of the invention. Its input is the frequency-domain dequantized downmix signal, and its output is the spectrum division of the signal. Module 503 is exactly identical to module 405, so its spectrum division is exactly identical to the one output by module 405.
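Why the division must be derived from the dequantized signal rather than from the original downmix can be seen in a small experiment. The uniform quantizer and the peak_split rule below are deliberately simple, hypothetical stand-ins for the AAC quantizer and for the actual division method.

```python
def quantize(spectrum, step=0.5):
    """Toy uniform quantizer: spectral values to integer indices."""
    return [round(x / step) for x in spectrum]


def dequantize(indices, step=0.5):
    """Inverse of the toy quantizer, identical in encoder and decoder."""
    return [q * step for q in indices]


def peak_split(spectrum):
    """Hypothetical division rule: split the spectrum at its strongest line."""
    peak = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
    cut = min(max(peak, 1), len(spectrum) - 1)
    return [(0, cut), (cut, len(spectrum))]


original = [0.9, 1.1, 1.2, 0.1]      # unquantized downmix, known to the encoder only
indices = quantize(original)         # what the core bitstream carries
encoder_view = dequantize(indices)   # dequantized inside the encoder
decoder_view = dequantize(indices)   # dequantized inside the decoder

print(peak_split(encoder_view) == peak_split(decoder_view))
print(peak_split(original) == peak_split(decoder_view))
```

Both sides obtain the same division from the dequantized data, so no division information has to be transmitted. A division computed from the unquantized spectrum can differ, because quantization error changes which line is strongest, and the decoder would have no way to reproduce it.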
Parameter synthesis module 504 is specific to this embodiment of the invention. Its inputs are the spatial parameter bitstream, the frequency-domain dequantized downmix signal, and the spectrum division, and its output is two frequency-domain signals. Module 504 has the same function and processing method as module 314 of the system shown in Fig. 3, and the two output signals carry the spatial information transmitted by the encoding system.
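A minimal sketch of per-subband synthesis follows, assuming the only spatial parameter is a level difference in dB and that the downmix is the arithmetic mean of the two channels. Under those assumptions the channel gains g_l = 2c/(1+c) and g_r = 2/(1+c), with c the amplitude ratio, satisfy g_l + g_r = 2 and g_l/g_r = c. The function name synthesize is hypothetical, and the processing of module 314 itself is not reproduced here.

```python
import math


def synthesize(downmix, bands, level_db_per_band):
    """Rebuild left/right spectra from a mean downmix and per-band level differences."""
    left = [0.0] * len(downmix)
    right = [0.0] * len(downmix)
    for (start, stop), level_db in zip(bands, level_db_per_band):
        c = 10.0 ** (level_db / 20.0)    # amplitude ratio left/right
        g_left = 2.0 * c / (1.0 + c)     # gains chosen so that
        g_right = 2.0 / (1.0 + c)        # (left + right) / 2 equals the downmix
        for k in range(start, stop):
            left[k] = g_left * downmix[k]
            right[k] = g_right * downmix[k]
    return left, right


# Original channels [2, 4] and [1, 2]: mean downmix [1.5, 3], amplitude ratio 2.
level_db = 20.0 * math.log10(2.0)
print(synthesize([1.5, 3.0], [(0, 2)], [level_db]))
```

For in-phase channels that differ only in level, this reconstruction is exact up to rounding; less correlated material would need additional parameters, which this sketch does not model.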
IMDCT module 505 is one of the main modules of the AAC decoder. Its input is the two frequency-domain signals, and its output is the time-domain left and right channel signals. Module 505 and MDCT module 402 form a pair of mutually inverse transforms.

Claims (8)

1. A spatial-parameter stereo encoding method, characterized by comprising the following steps:
Step 1, downmixing the input left and right time-domain audio signals to generate a single-channel signal;
Step 2, performing transform audio coding on the resulting downmix signal to generate the coded data of the downmix signal;
Step 3, obtaining the dequantized spectrum of the downmix signal corresponding to the coded data of the downmix signal;
Step 4, analyzing the frequency-domain dequantized downmix signal and dividing the spectrum into a number of contiguous, non-overlapping subbands;
Step 5, taking each subband of the division as a unit, extracting the spatial parameter information of the input left and right signals in each frequency-domain subband, and generating the spatial parameter coded data;
Step 6, combining the coded data of the downmix signal and the spatial parameter coded data into a single encoded bitstream in a prescribed format.
2. The spatial-parameter stereo encoding method as claimed in claim 1, characterized in that: the downmix of step 1 operates in the time domain, that is, the time-domain mean of the left and right audio signals is taken as the downmix signal, where the time-domain mean is half the sum of the sample values of the two signals at the same instant.
3. The spatial-parameter stereo encoding method as claimed in claim 1, characterized in that: the downmix of step 1 operates in the frequency domain, that is, the frequency-domain mean of the left and right audio signals, or that mean multiplied by a gain coefficient, is taken as the frequency-domain downmix signal, where the frequency-domain mean is half the sum of the spectral line values of the two signals at the same frequency, and the gain coefficient is a positive real number applied to the mean in order to adjust the energy of the downmix signal.
4. The spatial-parameter stereo encoding method as claimed in claim 1, 2 or 3, characterized in that: the dequantized spectrum of the downmix signal in step 3 is obtained by decoding the coded data of the downmix signal; or, when the dequantized spectrum of the downmix signal has already been generated during transform coding, it is obtained directly from the encoding process.
5. A spatial-parameter stereo decoding method, characterized in that a synthesized bitstream comprising the downmix signal and the spatial parameter information is decoded by the following steps:
Step I, separating the synthesized bitstream into downmix signal coded data and spatial parameter coded data;
Step II, performing transform audio decoding on the downmix signal coded data to generate the frequency-domain dequantized downmix signal, and at the same time dividing the spectrum of this signal into a number of contiguous, non-overlapping subbands;
Step III, generating two frequency-domain audio signals from the spatial parameter coded data and the frequency-domain dequantized downmix signal, taking each subband divided in step II as a unit, the two frequency-domain audio signals carrying the spatial information given by the spatial parameter coded data;
Step IV, applying an inverse time-frequency transform or synthesis filter bank filtering to the two frequency-domain audio signals to generate the left and right time-domain audio signals.
6. A spatial-parameter stereo encoding device, characterized in that: it consists of a downmix module, a core encoder module, a core decoder module, a dynamic time-frequency division module, an analysis filter bank, a parameter extraction module, and a bitstream formatting module; the left and right time-domain audio signals are input to the downmix module and to the analysis filter bank; the output of the downmix module is connected to the core encoder module; the output of the core encoder module is connected to the core decoder module; the output of the core decoder module is connected to the dynamic time-frequency division module; the outputs of the dynamic time-frequency division module and of the analysis filter bank are connected to the parameter extraction module; and the outputs of the core encoder module and of the parameter extraction module are connected to the bitstream formatting module.
7. The spatial-parameter stereo encoding device as claimed in claim 6, characterized in that: the core encoder module is an AAC encoder.
8. A spatial-parameter stereo decoding device, characterized in that: it consists of a bitstream parsing module, a core decoder module, a dynamic time-frequency division module, a parameter synthesis module, and a synthesis filter bank; the synthesized bitstream is input to the bitstream parsing module, which separates it into core codec data and spatial parameter data; the core codec data is input to the core decoder module; the output of the core decoder module, after passing through the dynamic time-frequency division module, is input to the parameter synthesis module together with the spatial parameter data; and the output of the parameter synthesis module is connected to the synthesis filter bank.
CNB2007100537702A 2007-11-06 2007-11-06 A kind of space parameter stereo coding/decoding method and device thereof Expired - Fee Related CN100571043C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007100537702A CN100571043C (en) 2007-11-06 2007-11-06 A kind of space parameter stereo coding/decoding method and device thereof

Publications (2)

Publication Number Publication Date
CN101162904A CN101162904A (en) 2008-04-16
CN100571043C true CN100571043C (en) 2009-12-16

Family

ID=39297757

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007100537702A Expired - Fee Related CN100571043C (en) 2007-11-06 2007-11-06 A kind of space parameter stereo coding/decoding method and device thereof

Country Status (1)

Country Link
CN (1) CN100571043C (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5243527B2 (en) * 2008-07-29 2013-07-24 パナソニック株式会社 Acoustic encoding apparatus, acoustic decoding apparatus, acoustic encoding / decoding apparatus, and conference system
CN101499279B (en) * 2009-03-06 2011-11-02 武汉大学 Bit distribution method and apparatus with progressively fine spacing parameter
CN101499280B (en) * 2009-03-09 2011-11-02 武汉大学 Spacing parameter choosing method and apparatus based on spacing perception entropy judgement
CN102157152B (en) 2010-02-12 2014-04-30 华为技术有限公司 Method for coding stereo and device thereof
WO2012058805A1 (en) * 2010-11-03 2012-05-10 Huawei Technologies Co., Ltd. Parametric encoder for encoding a multi-channel audio signal
CN102231868A (en) * 2011-05-18 2011-11-02 上海大学 High-order-recording-way-based three-dimensional (3D) sound reproducing system
CN102682779B (en) * 2012-06-06 2013-07-24 武汉大学 Double-channel encoding and decoding method for 3D audio frequency and codec
CN103065634B (en) * 2012-12-20 2014-11-19 武汉大学 Three-dimensional audio space parameter quantification method based on perception characteristic
CN104240712B (en) * 2014-09-30 2018-02-02 武汉大学深圳研究院 A kind of three-dimensional audio multichannel grouping and clustering coding method and system
CN105405445B (en) * 2015-12-10 2019-03-22 北京大学 A kind of parameter stereo coding, coding/decoding method based on transmission function between sound channel
CN108694955B (en) 2017-04-12 2020-11-17 华为技术有限公司 Coding and decoding method and coder and decoder of multi-channel signal
CN109389985B (en) 2017-08-10 2021-09-14 华为技术有限公司 Time domain stereo coding and decoding method and related products

Also Published As

Publication number Publication date
CN101162904A (en) 2008-04-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091216

Termination date: 20151106

EXPY Termination of patent right or utility model