CN101651752A - Decoding method and decoding device - Google Patents

Decoding method and decoding device

Info

Publication number
CN101651752A
CN101651752A (application CN200910166740A)
Authority
CN
China
Prior art keywords
frame
superframe
background noise
parameter
lpc filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910166740A
Other languages
Chinese (zh)
Other versions
CN101651752B (en)
Inventor
艾雅·舒默特
张立斌
代金良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN2009101667401A priority Critical patent/CN101651752B/en
Publication of CN101651752A publication Critical patent/CN101651752A/en
Application granted granted Critical
Publication of CN101651752B publication Critical patent/CN101651752B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An embodiment of the invention discloses a decoding method comprising the following steps: obtaining the CNG parameters of the first frame of a first superframe from a speech coding frame preceding the first frame of the first superframe; and performing background noise decoding on the first frame of the first superframe according to the CNG parameters, where the CNG parameters comprise a target excitation gain and LPC filter coefficients, the target excitation gain being determined from the long-term smoothed quantized fixed codebook gain of the speech coding frames, and the LPC filter coefficients being defined by the long-term smoothed quantized LPC filter coefficients of the speech coding frames. An embodiment of the invention also discloses a decoding device. With these embodiments, the occupied bandwidth can be significantly reduced while signal quality is guaranteed.

Description

Decoding method and decoding device
Technical field
The present invention relates to the field of communications, and in particular to a decoding method and a decoding device.
Background technology
In voice communication, background noise is encoded and decoded according to the noise processing scheme specified in Recommendation G.729B, formulated by the ITU (International Telecommunication Union).
A silence compression technology has been introduced into speech coders; its signal processing block diagram is shown in Figure 1.
Silence compression mainly comprises three modules: VAD (Voice Activity Detection), DTX (Discontinuous Transmission) and CNG (Comfort Noise Generation). VAD and DTX are modules on the encoder side, while CNG is a module on the decoder side. Figure 1 shows the block diagram of a simple silence compression system; its basic procedure is as follows:
First, at the transmitting end (encoder side), for each input signal frame the VAD module analyzes the current input signal and detects whether it contains a speech signal. If it does, the current frame is marked as a speech frame; otherwise it is marked as a non-speech frame.
Next, the encoder encodes the current signal according to the VAD result: if the VAD result indicates a speech frame, the signal enters the speech encoder and a speech frame is output; if it indicates a non-speech frame, the signal enters the DTX module, is processed as background noise by the non-speech encoder, and a non-speech frame is output.
Finally, the receiving end (decoder side) decodes the received signal frames (both speech frames and non-speech frames). A received speech frame is decoded by the speech decoder; otherwise the frame enters the CNG module, which decodes the background noise according to the parameters carried in the non-speech frame and produces comfortable background noise or silence, so that the decoded signal sounds more natural and continuous.
By introducing this variable-rate coding scheme into the codec and encoding the silent stages appropriately, silence compression effectively solves the problem of discontinuous background noise and improves the quality of the synthesized signal; the background noise at the decoder side can therefore also be called comfort noise. In addition, because the coding rate of the background noise is far lower than the speech coding rate, the average bit rate of the system is also greatly reduced, which effectively saves bandwidth.
G.729B processes the signal frame by frame with a frame length of 10 ms. To save bandwidth, G.729.1 also defines requirements for a silence compression system: without reducing speech coding quality, background noise must be encoded and transmitted at a low rate during background-noise periods; that is, DTX and CNG requirements are defined, and, more importantly, the DTX/CNG system is required to be compatible with G.729B. Although the G.729B DTX/CNG system could simply be transplanted into G.729.1, two problems must be solved. First, the two coders use different frame lengths, so direct transplantation causes problems; moreover, the G.729B DTX/CNG system is somewhat simple, especially its parameter extraction, and it must be extended to satisfy the G.729.1 DTX/CNG requirements. Second, G.729.1 processes a wideband signal while G.729B processes a narrowband signal, so processing of the high-band part of the background noise signal (4000 Hz - 7000 Hz) must be added to the G.729.1 DTX/CNG system to make it complete.
The prior art therefore has at least the following problem: because the existing G.729B system handles only narrowband background noise, the quality of the coded signal cannot be guaranteed when it is transplanted into the G.729.1 system.
Summary of the invention
In view of this, the purpose of one or more embodiments of the present invention is to provide a decoding method and device, so that after G.729B is extended the requirements of the G.729.1 technical standard can be met, and the communication bandwidth of the signal can be significantly reduced while coding quality is guaranteed. To solve the above problem, an embodiment of the invention provides a decoding method comprising:
obtaining the CNG parameters of the first frame of a first superframe from a speech coding frame preceding the first frame of the first superframe;
performing background noise decoding on the first frame of the first superframe according to the CNG parameters, where the CNG parameters comprise:
a target excitation gain, determined from the long-term smoothed quantized fixed codebook gain of the speech coding frames;
LPC filter coefficients, defined by the long-term smoothed quantized LPC filter coefficients of the speech coding frames.
A decoding device is also provided, comprising:
a CNG parameter obtaining unit, configured to obtain the CNG parameters of the first frame of a first superframe from a speech coding frame preceding the first frame of the first superframe; and
a first decoding unit, configured to perform background noise decoding on the first frame of the first superframe according to the CNG parameters, where the CNG parameters comprise:
a target excitation gain, determined from the long-term smoothed quantized fixed codebook gain of the speech coding frames;
LPC filter coefficients, defined by the long-term smoothed quantized LPC filter coefficients of the speech coding frames.
Compared with the prior art, the embodiments of the invention have the following advantages:
The embodiments of the invention extract background noise characteristic parameters during the hangover period; for the first superframe after the hangover period, background noise coding is performed according to the extracted background noise characteristic parameters and the background noise characteristic parameters of the first superframe; for the superframes after the first superframe, background noise characteristic parameter extraction and a DTX decision are performed for every frame; and for the superframes after the first superframe, background noise coding is performed according to the background noise characteristic parameters of the current superframe, the background noise characteristic parameters of several superframes preceding the current superframe, and the final DTX decision. This achieves the following:
First, the communication bandwidth of the signal is significantly reduced while coding quality is guaranteed.
Second, by extending the G.729B system, the G.729.1 system requirements are met.
Third, accurate and flexible extraction of the background noise characteristic parameters makes the coding of the background noise more accurate.
Description of drawings
Figure 1 is a block diagram of a simple silence compression system;
Figure 2 is a functional block diagram of the G.729.1 encoder;
Figure 3 is a block diagram of the G.729.1 decoder;
Figure 4 is a flowchart of embodiment one of the coding method of the present invention;
Figure 5 is a flowchart of the coding of the first superframe;
Figure 6 is a flowchart of narrowband parameter extraction and the DTX decision;
Figure 7 is a flowchart of narrowband background noise parameter extraction and the DTX decision in the current superframe;
Figure 8 is a flowchart of embodiment one of the decoding method of the present invention;
Figure 9 is a block diagram of embodiment one of the coding device of the present invention;
Figure 10 is a block diagram of embodiment one of the decoding device of the present invention.
Embodiment
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
First, the relevant principles of the G.729B system are introduced.
1.1.2. Similarities and differences between the coding parameters of the speech coding bitstream and of the background noise coding bitstream
In current speech coders, background noise is synthesized on the same principle as speech; the model used in both cases is the CELP (Code Excited Linear Prediction) model. The principle of speech production is that speech s(n) can be regarded as the output produced by an excitation signal e(n) driving a synthesis filter v(n), i.e. s(n) = e(n) * v(n); this is the mathematical model of speech production. The same model is used when synthesizing background noise, so the characteristic parameters describing background noise and silence transmitted in the background noise bitstream are essentially the same in content as those in the speech coding bitstream: the synthesis filter parameters and the excitation parameters used for signal synthesis.
In the speech coding bitstream, the synthesis filter parameters are mainly the quantized line spectral frequency (LSF) parameters, and the excitation signal parameters comprise the pitch delay, pitch gain, fixed codebook and fixed codebook gain parameters. Different speech coders use different numbers of quantization bits and different quantization forms for these parameters; within the same coder, if it supports several rates, the quantization bits and quantization forms of the coding parameters also differ between rates, because the emphasis in describing the signal characteristics differs.
Unlike the speech coding parameters, the background noise coding parameters describe the background noise characteristics. The excitation signal of background noise can be regarded as a simple random noise sequence, and such sequences can easily be generated by a random noise generation module at both the encoding and decoding ends; the amplitude of the sequence is then controlled with an energy parameter to produce the final excitation signal. The excitation characteristics can therefore be represented simply by an energy parameter, without further characteristic parameters, so in the background noise bitstream the excitation parameter is the energy parameter of the current background noise frame, which differs from a speech frame. As for a speech frame, the synthesis filter parameters in the background noise bitstream are also quantized LSF parameters; only the concrete quantization method differs. From this analysis, the coding of background noise can be regarded in essence as a simple kind of 'speech' coding.
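As an illustration of this principle, the following Python sketch synthesizes a comfort-noise-like signal from an energy parameter and a set of LPC coefficients. It is a minimal sketch only; the white Gaussian excitation, the normalization and all names are assumptions for illustration, not the exact G.729B procedure.

    import numpy as np

    def synthesize_comfort_noise(lpc_coeffs, energy, frame_len=80, seed=None):
        """lpc_coeffs: a(1..M) of A(z) = 1 + sum_k a(k) z^-k.
        Scale a random excitation to the target energy and pass it
        through the all-pole synthesis filter 1/A(z)."""
        rng = np.random.default_rng(seed)
        excitation = rng.standard_normal(frame_len)
        excitation *= np.sqrt(energy / np.sum(excitation ** 2))
        order = len(lpc_coeffs)
        out = np.zeros(frame_len)
        for n in range(frame_len):
            acc = excitation[n]
            for k in range(1, order + 1):
                if n - k >= 0:
                    acc -= lpc_coeffs[k - 1] * out[n - k]
            out[n] = acc
        return out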
G.729B noise processing scheme (citing the G.729B Recommendation)
1.2.1 General introduction to the DTX/CNG technique
The G.729B silence compression scheme is an early silence compression technology. Its background noise coding/decoding is based on the CELP algorithm model, so the background noise parameters it transmits are also extracted on the basis of the CELP model, namely the synthesis filter parameters and the excitation parameters describing the background noise, where the excitation parameter is an energy parameter describing the background noise energy; the adaptive and fixed codebook parameters of the speech excitation are not described, and the filter parameters are basically consistent with the speech coding parameters, i.e. the LSF parameters. At the encoder side, for each frame of the input speech signal, if the VAD decision is '0' the current signal is background noise, and the encoder sends the signal to the DTX module, where the background noise parameters are extracted; the background noise is then encoded according to how the parameters change from frame to frame: if the filter parameters and the energy parameter extracted for the current frame differ considerably from those of previous frames, the current background noise characteristics differ considerably from the previous ones, and the background noise parameters extracted for the current frame are encoded in the noise coding module, assembled into an SID frame (Silence Insertion Descriptor) and sent to the decoder; otherwise a NODATA frame (no data) is sent to the decoder. SID frames and NODATA frames are both called non-speech frames. At the decoder side, once the background noise stage is entered, comfort noise describing the encoder-side background noise characteristics is synthesized in the CNG module according to the received non-speech frames.
G.729B processes the signal frame by frame with a frame length of 10 ms. The DTX, noise coding and CNG modules of G.729B are described in the following three sections.
1.2.2 DTX module
The DTX module mainly performs background noise parameter estimation and quantization, and the transmission of SID frames. In the non-speech stage the DTX module must send background noise information to the decoder, encapsulated in SID frames: if the current background noise is not stationary, an SID frame is sent; otherwise no SID frame is sent and a NODATA frame carrying no data is sent instead. The interval between two adjacent SID frames is also constrained, to a minimum of two frames, so if the background noise is unstable and consecutive SID frames would have to be sent, the transmission of the later SID frame may be delayed.
At the encoder side, the DTX module receives from the encoder the VAD output, past autocorrelation coefficients and past excitation samples. For each frame, the DTX module uses the three values 0, 1 and 2 to describe a non-transmitted frame, a speech frame and an SID frame respectively; the corresponding frame types are Ftyp = 0, Ftyp = 1 and Ftyp = 2.
The quantities estimated for the background noise are its energy level and spectral envelope. These are essentially consistent with the speech coding parameters, so the spectral envelope is computed in basically the same way as the speech coding parameters, using parameters that include those of the previous two frames; the energy parameter is likewise an average of previous frame energies.
Main operations of the DTX module:
a. Storage of the autocorrelation coefficients of every frame
For every input signal frame, whether a speech frame or a non-speech frame, the autocorrelation coefficients of the current frame t are kept in a buffer; they are denoted r'_t(j), j = 0...10, where j is the lag index of the autocorrelation function within a frame.
b. Estimation of the current frame type
If the current frame is a speech frame (VAD = 1), the current frame type is set to 1. If it is a non-speech frame, a current LPC filter A_t(z) is computed from the autocorrelation coefficients of the previous frame and the current frame. To compute A_t(z), the autocorrelation coefficients of the two most recent adjacent frames are first averaged:

$$R_t(j) = \sum_{i=t-N_{cur}+1}^{t} r'_i(j), \quad j = 0 \ldots 10$$

where N_cur = 2. After R_t(j) is obtained, A_t(z) is computed with the Levinson-Durbin algorithm. The Levinson-Durbin algorithm also yields the residual energy E_t, which is used as a simple estimate of the frame excitation energy.
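A minimal Python sketch of this step is given below, assuming a standard Levinson-Durbin recursion; the function and variable names are illustrative and not taken from the Recommendation.

    import numpy as np

    def levinson_durbin(R, order=10):
        """Return the LPC coefficients a(0..order) (a(0) = 1) and the
        residual energy from autocorrelation values R(0..order)."""
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = R[0]                       # assumed > 0
        for i in range(1, order + 1):
            acc = R[i]
            for j in range(1, i):
                acc += a[j] * R[i - j]
            k = -acc / err
            a_new = a.copy()
            for j in range(1, i):
                a_new[j] = a[j] + k * a[i - j]
            a_new[i] = k
            a = a_new
            err *= (1.0 - k * k)
        return a, err

    def dtx_lpc_from_buffer(r_buffer):
        """r_buffer: list of buffered autocorrelation vectors r'_i(0..10);
        sum the last N_cur = 2 frames and run Levinson-Durbin on them."""
        R_t = np.sum(np.asarray(r_buffer[-2:]), axis=0)
        return levinson_durbin(R_t, order=10)   # (A_t coefficients, E_t)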
The frame type of the current frame is estimated as follows:
(1) If the current frame is the first inactive frame, it is set to an SID frame, the variable Ē characterizing the signal energy is set equal to E_t, and the frame-count parameter k_E is set to 1:

$$(Vad_{t-1} = 1) \Rightarrow \begin{cases} Ftyp_t = 2 \\ \bar{E} = E_t \\ k_E = 1 \end{cases}$$

(2) For the other non-speech frames, the algorithm compares the previous SID frame parameters with the current parameters. If the current filter differs considerably from the previous filter, or the current excitation energy differs considerably from the previous excitation energy, the flag flag_change is set to 1; otherwise the flag keeps its value.
(3) The counter count_fr gives the number of frames between the current frame and the last SID frame. If its value is at least N_min, an SID frame is sent; an SID frame is also sent if flag_change equals 1; in all other cases the current frame is not sent:

$$\left( count\_fr \ge N_{min} \;\text{ or }\; flag\_change = 1 \right) \Rightarrow Ftyp_t = 2, \qquad \text{otherwise } Ftyp_t = 0$$

When an SID frame is sent, the counter count_fr and the flag flag_change are reinitialized to 0.
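The decision logic of items (1)-(3) can be summarized by the following sketch. It is a simplified illustration: the flag and counter names mirror the text, while the parameter comparison is abstracted into a caller-supplied helper, which is an assumption.

    def dtx_frame_type(vad, vad_prev, params, last_sid_params,
                       count_fr, flag_change, n_min, differs):
        """Return (frame_type, count_fr, flag_change) with frame_type
        0 = not transmitted, 1 = speech frame, 2 = SID frame."""
        if vad == 1:
            return 1, count_fr, flag_change
        if vad_prev == 1:                       # first inactive frame
            return 2, 0, 0                      # always coded as an SID frame
        if differs(params, last_sid_params):    # filter or energy changed a lot
            flag_change = 1
        count_fr += 1
        if count_fr >= n_min or flag_change == 1:
            return 2, 0, 0                      # send SID, reset counter and flag
        return 0, count_fr, flag_change         # NODATA frame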
c. LPC filter coefficients:
Let the coefficients of the LPC filter A_sid(z) of the last SID frame be a_sid(j), j = 0...10. The SID-LPC filters of the current frame and of the previous SID frame are considered very different if their Itakura distance exceeds a threshold:

$$\sum_{j=0}^{10} R_a(j) \times R_t(j) \ge E_t \times thr1$$

where R_a(j), j = 0...10, is the autocorrelation of the SID filter coefficients:

$$R_a(j) = 2 \sum_{k=0}^{10-j} a_{sid}(k)\, a_{sid}(k+j) \quad (j \ne 0), \qquad R_a(0) = \sum_{k=0}^{10} a_{sid}(k)^2$$
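A sketch of this spectral-distance test, assuming the quantities above are available as NumPy arrays (the function name and argument order are illustrative):

    import numpy as np

    def filters_differ(a_sid, R_t, E_t, thr1):
        """Itakura-style test comparing the last SID LPC coefficients
        a_sid(0..10) with the current frame autocorrelation R_t(0..10)."""
        order = len(a_sid) - 1
        R_a = np.zeros(order + 1)
        R_a[0] = np.sum(a_sid ** 2)
        for j in range(1, order + 1):
            R_a[j] = 2.0 * np.sum(a_sid[:order + 1 - j] * a_sid[j:])
        return np.sum(R_a * R_t) >= E_t * thr1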
d. Frame energy:
The frame energy sum is computed:

$$\bar{E} = \sum_{i=t-k_E+1}^{t} E_i$$

Ē is then quantized with a 5-bit logarithmic quantizer. The decoded logarithmic energy E_q is compared with the previously decoded SID logarithmic energy E_q^sid; if the difference between the two exceeds 2 dB, their energies are considered significantly different.
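A minimal sketch of a 5-bit logarithmic energy quantizer and the 2 dB comparison follows. The dB range of the codebook and all names are assumptions made for illustration, not the actual G.729B quantizer tables.

    import numpy as np

    def quantize_log_energy(E, n_bits=5, e_min_db=-10.0, e_max_db=66.0):
        """Uniformly quantize 10*log10(E) over an assumed dB range and
        return (index, decoded dB value)."""
        levels = 2 ** n_bits
        e_db = 10.0 * np.log10(max(E, 1e-12))
        step = (e_max_db - e_min_db) / (levels - 1)
        index = int(round((e_db - e_min_db) / step))
        index = min(max(index, 0), levels - 1)
        return index, e_min_db + index * step

    def energy_changed(e_q_db, e_q_sid_db, thr_db=2.0):
        """True if the decoded energy differs from the last SID energy
        by more than thr_db decibels."""
        return abs(e_q_db - e_q_sid_db) > thr_db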
1.2.3 Noise coding and the SID frame
The parameters in the SID frame are the quantized LPC filter coefficients (spectral envelope) and the quantized energy.
The SID-LPC filter is computed taking into account the stationarity between adjacent noise frames:
First, the average LPC filter Ā_p(z) of the N_p frames preceding the current SID frame is computed. This uses the summed autocorrelation R̄_p(j), which is then fed into the Levinson-Durbin algorithm to obtain Ā_p(z); R̄_p(j) is given by:

$$\bar{R}_p(j) = \sum_{k=t'-N_p}^{t'} r'_k(j), \quad j = 0 \ldots 10$$

where the value of N_p is set to 6 and the frame index t' lies in [t-1, t-N_cur]. The SID-LPC filter is then:

$$A_{sid}(z) = \begin{cases} A_t(z) & \text{if } distance(A_t(z), \bar{A}_p(z)) \ge thr3 \\ \bar{A}_p(z) & \text{otherwise} \end{cases}$$

That is, the algorithm computes the average LPC filter coefficients Ā_p(z) of the previous frames and compares them with the current LPC filter coefficients A_t(z). If the two differ little, the average Ā_p(z) of the previous frames is used when quantizing the LPC coefficients of the current frame; otherwise the current frame's A_t(z) is used. After the LPC filter coefficients are chosen, the algorithm transforms them into the LSF domain and quantizes them, using the same quantization scheme as in speech coding.
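The filter selection can be sketched as follows; the Itakura-style form of the distance test is an interpretation reusing the comparison above, and the names are illustrative rather than normative.

    import numpy as np

    def coeff_autocorrelation(a):
        """R_a(j) of a coefficient vector a(0..M), as defined above."""
        M = len(a) - 1
        R = np.zeros(M + 1)
        R[0] = np.sum(a ** 2)
        for j in range(1, M + 1):
            R[j] = 2.0 * np.sum(a[:M + 1 - j] * a[j:])
        return R

    def choose_sid_lpc(a_t, a_p_avg, R_t, E_t, thr3):
        """Quantize the per-frame filter a_t when it is far from the
        averaged filter a_p_avg, otherwise keep the averaged filter."""
        far = np.sum(coeff_autocorrelation(a_p_avg) * R_t) >= E_t * thr3
        return a_t if far else a_p_avg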
The energy parameter is quantized in the logarithmic domain with uniform (linear) quantization and then coded with 5 bits. The coding of the background noise is thus complete, and the coded bits are encapsulated in the SID frame, as shown in Table A:
Table A (Table B.2/G.729)

    Parameter description                          Bits
    Switched predictor index of LSF quantizer        1
    First stage vector of LSF quantizer              5
    Second stage vector of LSF quantizer             4
    Gain (Energy)                                    5

The SID frame thus consists of four codebook indices: one indicating the energy quantization index (5 bits) and three indicating the spectrum quantization (10 bits).
1.2.4 CNG module
At the decoder side, the algorithm obtains comfortable background noise by driving a level-controlled pseudo-white-noise excitation through an LPC synthesis filter obtained by interpolation; this is essentially the same as speech synthesis. The excitation level and the LPC filter coefficients are obtained from the last SID frame. The LPC filter coefficients of each subframe are obtained by interpolating the LSP parameters of the SID frame, using the same interpolation method as in the speech coder.
The pseudo-white-noise excitation ex(n) is a mixture of a speech-like excitation ex1(n) and a white Gaussian noise excitation ex2(n). The gain of ex1(n) is kept small; ex1(n) is used in order to make the transition between speech and non-speech more natural.
After the excitation signal is obtained, comfortable background noise is obtained by driving the synthesis filter with it.
Because the non-speech encoding and decoding at the two ends must remain synchronized, both sides generate the excitation signal for SID frames and for non-transmitted frames alike.
First, a target excitation gain G̃_t is defined as the square root of the average excitation energy of the current frame. It is obtained by the following smoothing, where G̃_sid is the gain decoded from the SID frame:

$$\tilde{G}_t = \begin{cases} \tilde{G}_{sid} & \text{if } Vad_{t-1} = 1 \\ \tfrac{7}{8}\,\tilde{G}_{t-1} + \tfrac{1}{8}\,\tilde{G}_{sid} & \text{otherwise} \end{cases}$$
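A one-line Python sketch of this smoothing (names are illustrative):

    def smooth_target_gain(g_prev, g_sid, vad_prev):
        """7/8-1/8 smoothing of the CNG target excitation gain."""
        return g_sid if vad_prev == 1 else 0.875 * g_prev + 0.125 * g_sid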
The 80 samples of a frame are divided into two subframes, and for each subframe the excitation signal of the CNG module is synthesized as follows:
(1) A pitch delay is selected at random in the range [40, 103];
(2) The positions and signs of the non-zero pulses in the fixed codebook vector of the subframe are selected at random (the structure of these non-zero pulse positions and signs is consistent with G.729);
(3) An adaptive codebook excitation signal with its gain is selected and denoted e_a(n), n = 0...39, and the selected fixed codebook excitation signal is denoted e_f(n), n = 0...39. The adaptive gain G_a and the fixed codebook gain G_f are then computed from the subframe energy according to

$$\frac{1}{40}\sum_{n=0}^{39}\left(G_a\,e_a(n) + G_f\,e_f(n)\right)^2 = \tilde{G}_t^2$$

It should be noted that G_f may take a negative value.
Define E_a = Σ_{n=0}^{39} e_a(n)^2, I = Σ_{n=0}^{39} e_a(n)·e_f(n) and K = 40·G̃_t², and note that the ACELP excitation structure gives Σ_{n=0}^{39} e_f(n)^2 = 4.
If the adaptive codebook gain G_a is held fixed, the above equation becomes a quadratic equation in G_f:

$$G_f^2 + \frac{G_a I}{2}\,G_f + \frac{E_a G_a^2 - K}{4} = 0$$

The value of G_a is bounded to guarantee that this equation has a solution; in addition, the use of large adaptive codebook gain values is restricted. The adaptive codebook gain G_a is therefore selected at random in the range

$$\left[\,0,\ \min\left\{0.5,\ \sqrt{K/A}\right\}\right], \qquad A = E_a - I^2/4,$$

and the root of the equation (1/40)·Σ_{n=0}^{39}(G_a e_a(n) + G_f e_f(n))² = G̃_t² with the smallest absolute value is taken as the value of G_f.
Finally, the excitation signal is constructed with the following formula, as in G.729:

$$ex_1(n) = G_a\,e_a(n) + G_f\,e_f(n), \qquad n = 0 \ldots 39$$
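The gain computation for one subframe can be sketched as follows, under the assumptions stated in the text; the random selection of G_a, the choice of the root of smallest absolute value and the helper names are illustrative.

    import numpy as np

    def cng_subframe_gains(e_a, e_f, g_target, rng):
        """Pick G_a at random in [0, min(0.5, sqrt(K/A))], then solve the
        quadratic for G_f and keep the root of smallest absolute value."""
        E_a = np.sum(e_a ** 2)
        I = np.sum(e_a * e_f)
        K = 40.0 * g_target ** 2
        A = E_a - I ** 2 / 4.0
        g_a_max = min(0.5, np.sqrt(K / A)) if A > 0 else 0.5
        G_a = rng.uniform(0.0, g_a_max)
        # G_f^2 + (G_a*I/2)*G_f + (E_a*G_a^2 - K)/4 = 0
        b = G_a * I / 2.0
        c = (E_a * G_a ** 2 - K) / 4.0
        disc = max(b * b - 4.0 * c, 0.0)
        roots = np.array([(-b + np.sqrt(disc)) / 2.0,
                          (-b - np.sqrt(disc)) / 2.0])
        G_f = roots[np.argmin(np.abs(roots))]
        return G_a, G_f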
The synthesized excitation ex(n) is obtained as follows:
Let E_1 be the energy of ex_1(n), E_2 the energy of ex_2(n), and E_3 the dot product of ex_1(n) and ex_2(n):

$$E_1 = \sum_n ex_1^2(n), \qquad E_2 = \sum_n ex_2^2(n), \qquad E_3 = \sum_n ex_1(n)\,ex_2(n)$$
These sums are computed over more samples than a single subframe.
Let α and β be the mixing coefficients of ex_1(n) and ex_2(n) in the mixed excitation, where α is set to 0.6 and β is determined from the following quadratic equation:

$$\beta^2 E_2 + 2\alpha\beta E_3 + (\alpha^2 - 1) E_1 = 0, \qquad \beta > 0$$

If this equation has no solution for β, then β is set to 0 and α is set to 1. The final excitation of the CNG module is ex(n):

$$ex(n) = \alpha\,ex_1(n) + \beta\,ex_2(n)$$
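A sketch of the mixing step under these definitions; ex1 and ex2 are assumed to be NumPy arrays of equal length, and the fallback to beta = 0, alpha = 1 follows the text.

    import numpy as np

    def mix_cng_excitation(ex1, ex2, alpha=0.6):
        """Solve beta^2*E2 + 2*alpha*beta*E3 + (alpha^2 - 1)*E1 = 0 for
        beta > 0 and mix the two excitations."""
        E1 = np.sum(ex1 ** 2)
        E2 = np.sum(ex2 ** 2)
        E3 = np.sum(ex1 * ex2)
        a, b, c = E2, 2.0 * alpha * E3, (alpha ** 2 - 1.0) * E1
        disc = b * b - 4.0 * a * c
        beta = 0.0
        if disc >= 0.0 and a > 0.0:
            root = (-b + np.sqrt(disc)) / (2.0 * a)   # the larger root
            if root > 0.0:
                beta = root
        if beta == 0.0:
            alpha = 1.0
        return alpha * ex1 + beta * ex2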
This concludes the basic principles of the DTX/CNG modules of the G.729B codec.
1.3 Basic flow of the G.729.1 codec
G.729.1 is a new-generation speech coding standard recently issued by the ITU (see reference [1]). It is the scalable wideband (50-7000 Hz) extension of ITU-T G.729 at 8-32 kbit/s. By default, the encoder input and the decoder output are sampled at 16000 Hz. The bitstream produced by the encoder is scalable and comprises 12 embedded layers, referred to as layers 1 to 12. Layer 1 is the core layer, with a bit rate of 8 kbit/s; this layer is consistent with the G.729 bitstream, which makes G.729EV interoperable with G.729. Layer 2 is a narrowband enhancement layer that adds 4 kbit/s, and layers 3 to 12 are wideband enhancement layers that add a total of 20 kbit/s in steps of 2 kbit/s per layer.
The G.729.1 codec is based on a three-stage structure: embedded code-excited linear prediction (CELP) coding, time-domain bandwidth extension (TDBWE), and transform coding known as time-domain aliasing cancellation (TDAC). The embedded CELP stage produces layers 1 and 2, generating the 8 kbit/s and 12 kbit/s narrowband synthesized signal (50-4000 Hz). The TDBWE stage produces layer 3, generating a 14 kbit/s wideband output signal (50-7000 Hz). The TDAC stage operates in the modified discrete cosine transform (MDCT) domain and generates layers 4 to 12, raising the signal quality from 14 kbit/s to 32 kbit/s. The TDAC coding jointly represents the weighted CELP coding error signal of the 50-4000 Hz band and the input signal of the 4000-7000 Hz band.
Figure 2 shows a functional block diagram of the G.729.1 encoder. The encoder operates on 20 ms input superframes. By default the input signal s_WB(n) is sampled at 16000 Hz, so an input superframe is 320 samples long.
First, the input signal s_WB(n) is split into two sub-bands by QMF filtering (H_1(z), H_2(z)). The low sub-band signal s_LB^qmf(n) is pre-processed by a high-pass filter with a 50 Hz cut-off frequency, and the output signal s_LB(n) is encoded by the 8-12 kbit/s narrowband embedded CELP encoder. The difference d_LB(n) between s_LB(n) and the local CELP synthesis ŝ_LB(n) at 12 kbit/s is passed through the perceptual weighting filter W_LB(z) to obtain the signal d_LB^w(n), which is transformed to the frequency domain by MDCT. The weighting filter W_LB(z) includes gain compensation in order to keep spectral continuity between the filter output d_LB^w(n) and the high sub-band input signal s_HB(n).
The high sub-band component is multiplied by (-1)^n for spectral folding, giving s_HB^fold(n), which is pre-processed by a low-pass filter with a 3000 Hz cut-off frequency; the filtered signal s_HB(n) is encoded by the TDBWE encoder. The signal s_HB(n) is also transformed into the frequency domain by MDCT.
Finally, the two sets of MDCT coefficients D_LB^w(k) and S_HB(k) are encoded by the TDAC encoder.
In addition, some parameters are transmitted by the FEC (frame erasure concealment) encoder to improve the recovery from frame loss during transmission.
The decoder block diagram is shown in Figure 3. The actual operating mode of the decoder is determined by the number of bitstream layers received, i.e. by the received bit rate.
(1) If the received rate is 8 kbit/s or 12 kbit/s (i.e. only the first layer, or the first two layers, are received): the first one or two layers of the bitstream are decoded by the embedded CELP decoder to obtain the decoded signal ŝ_LB(n), which is post-filtered to obtain ŝ_LB^post(n) and then high-pass filtered, giving ŝ_LB^qmf(n) = ŝ_LB^hpf(n). The output signal is produced by the QMF synthesis filter bank, with the high-band synthesis signal ŝ_HB^qmf(n) set to zero.
(2) If the received rate is 14 kbit/s (i.e. the first three layers are received): in addition to the narrowband component decoded by the CELP decoder, the TDBWE decoder decodes the high-band signal component ŝ_HB^bwe(n). An MDCT is applied to ŝ_HB^bwe(n), the frequency components above 3000 Hz of the high sub-band spectrum (corresponding to frequencies above 7000 Hz at the 16 kHz sampling rate) are set to 0, and an inverse MDCT is applied; after overlap-add and spectral folding, the reconstructed high-band signal ŝ_HB^qmf(n) is combined in the QMF filter bank with the low-band component decoded by the CELP decoder, ŝ_LB^qmf(n) = ŝ_LB^post(n), to synthesize the 16 kHz wideband signal (without high-pass filtering).
(3) If a bitstream rate above 14 kbit/s is received (corresponding to the first four layers or more): in addition to the low sub-band component ŝ_LB(n) decoded by the CELP decoder and the high sub-band component ŝ_HB^bwe(n) decoded by the TDBWE decoder, the TDAC decoder reconstructs the MDCT coefficients D̂_LB^w(k) and Ŝ_HB(k), which correspond respectively to the reconstructed weighted difference of the low band (0-4000 Hz) and the reconstructed signal of the high band (4000-7000 Hz). (Note that in the high band, sub-bands that are not received or that receive no bits in the TDAC allocation are replaced by the level-adjusted sub-band signal Ŝ_HB^bwe(k).) D̂_LB^w(k) and Ŝ_HB(k) are transformed into time-domain signals by inverse MDCT and overlap-add. The low-band signal is then processed by the inverse perceptual weighting filter. To reduce the artefacts introduced by transform coding, pre-echo/post-echo detection and attenuation are applied to the low-band and high-band signals. The low-band synthesis signal is post-filtered, while the high-band synthesis signal is processed by (-1)^n spectral folding. The QMF synthesis filter bank then combines and upsamples the signals ŝ_LB^qmf(n) = ŝ_LB^post(n) and ŝ_HB^qmf(n) to obtain the final 16 kHz wideband signal.
1.4 Requirements of the G.729.1 DTX/CNG system
To save bandwidth, G.729.1 also defines requirements for a silence compression system: without reducing speech coding quality, background noise must be encoded and transmitted at a low rate during background-noise periods; that is, DTX and CNG requirements are defined, and, more importantly, the DTX/CNG system is required to be compatible with G.729B. Although the G.729B DTX/CNG system could simply be transplanted into G.729.1, two problems must be solved. First, the two coders use different frame lengths, so direct transplantation causes problems; moreover, the G.729B DTX/CNG system is somewhat simple, especially its parameter extraction, and it must be extended to satisfy the G.729.1 DTX/CNG requirements. Second, G.729.1 processes a wideband signal while G.729B processes a narrowband signal, so processing of the high-band part of the background noise signal (4000 Hz - 7000 Hz) must be added to the G.729.1 DTX/CNG system to make it complete.
In G.729.1, the high band and the low band of the background noise can be processed separately. The high-band processing is relatively simple: the coding of its background noise characteristic parameters follows the TDBWE coding of the speech coder, and it is sufficient to judge the stability of the frequency-domain envelope and the temporal envelope. The technical solution of the present invention and the problem it solves lie in the low band, i.e. the narrowband. In the following, the G.729.1 DTX/CNG system refers to the processing applied to the narrowband DTX/CNG part.
Referring to Figure 4, embodiment one of the coding method of the present invention comprises the steps of:
Step 401: extracting background noise characteristic parameters during the hangover period;
Step 402: for the first superframe after the hangover period, performing background noise coding according to the extracted background noise characteristic parameters of the hangover period and the background noise characteristic parameters of the first superframe, to obtain the first SID frame;
Step 403: for the superframes after the first superframe, performing background noise characteristic parameter extraction and a DTX decision for every frame;
Step 404: for the superframes after the first superframe, performing background noise coding according to the background noise characteristic parameters of the current superframe, the background noise characteristic parameters of several superframes preceding the current superframe, and the final DTX decision.
With this embodiment of the invention, background noise characteristic parameters are extracted during the hangover period; for the first superframe after the hangover period, background noise coding is performed according to the extracted background noise characteristic parameters of the hangover period and the background noise characteristic parameters of the first superframe;
for the superframes after the first superframe, background noise characteristic parameter extraction and a DTX decision are performed for every frame;
and for the superframes after the first superframe, background noise coding is performed according to the background noise characteristic parameters of the current superframe, the background noise characteristic parameters of several superframes preceding the current superframe, and the final DTX decision. This achieves the following:
First, the communication bandwidth of the signal is significantly reduced while coding quality is guaranteed.
Second, by extending the G.729B system, the G.729.1 system requirements are met.
Third, accurate and flexible extraction of the background noise characteristic parameters makes the coding of the background noise more accurate.
In the embodiments of the present invention, to meet the requirements of the relevant G.729.1 technical standard, each superframe may be set to 20 ms and each frame contained in a superframe to 10 ms. With each embodiment of the invention, the extension of G.729B can be realized and the G.729.1 technical specifications can be met. At the same time, those skilled in the art will understand that the technical solutions provided by the embodiments can equally be used, in systems other than G.729.1, to transmit the background noise at a lower bandwidth and bring higher communication quality; that is, the scope of application of the invention is not confined to the G.729.1 system.
Embodiment two of the coding method of the present invention is described in detail below with reference to the accompanying drawings.
The coding frame lengths of G.729.1 and G.729B differ: the former is 20 ms per frame and the latter 10 ms per frame, so one G.729.1 frame corresponds in length to two G.729B frames. For convenience of description, the G.729.1 frame is here called a superframe and the G.729B frame a frame. The present invention describes the G.729.1 DTX/CNG system mainly with respect to this difference, i.e. it upgrades and extends the G.729B DTX/CNG system to fit the characteristics of ITU G.729.1.
1. Noise learning:
First, the first 120 ms of the background noise is encoded at the speech coding rate.
To extract the background noise characteristic parameters accurately, for a period after the speech frames end (according to the VAD indication, i.e. the current frame changes from active speech to inactive background noise), the background noise processing stage is not entered immediately; instead the background noise continues to be encoded at the speech coding rate. This hangover time is generally 6 superframes, i.e. 120 ms (cf. AMR and AMR-WB).
Next, during this hangover time, for every 10 ms frame of every superframe, the autocorrelation coefficients of the background noise r'_{t,k}(j), j = 0...10, are buffered, where t is the superframe index and k = 1, 2 is the index of the first and second 10 ms frame within each superframe. Because these autocorrelation coefficients characterize the background noise of the hangover stage, the background noise characteristic parameters can later be extracted from them accurately when the background noise is encoded, making the coding of the background noise more accurate. In practice, the duration of noise learning can be set according to actual needs and is not limited to 120 ms; the hangover time may likewise be set to other values as required.
2. Coding of the first superframe after the hangover stage
After the hangover stage ends, the background noise is processed in the background noise mode. Referring to Figure 5, the flow of coding the first superframe comprises the following steps:
For the first superframe after the hangover stage, the background noise characteristic parameters extracted in the noise learning phase and in the current superframe are encoded to obtain the first SID superframe. Because the first superframe after the hangover stage carries the coded background noise parameters, this superframe is usually called the first SID superframe; the first SID superframe generated by the coding is sent to the decoder and decoded there. Because a superframe corresponds to two 10 ms frames, in order to obtain the coding parameters accurately, the background noise characteristic parameters A_t(z) and E_t can be extracted at the second 10 ms frame.
The LPC filter A_t(z) and the residual energy E_t are computed as follows:
Step 501: compute the mean of all autocorrelation coefficients in the buffer:

$$R_t(j) = \frac{1}{2 N_{cur}} \sum_{i=t-N_{cur}+1}^{t} \sum_{k=1}^{2} r'_{i,k}(j), \quad j = 0 \ldots 10$$

where N_cur = 5, i.e. the buffer size is ten 10 ms frames.
Step 502: from the mean autocorrelation R_t(j), compute the LPC filter A_t(z), with coefficients a_t(j), j = 0, ..., 10, using the Levinson-Durbin algorithm. The Levinson-Durbin algorithm also yields the residual energy E_t, which is used as a simple estimate of the current superframe energy parameter.
In practice, to obtain a more stable estimate of the superframe energy parameter, the estimated residual energy E_t may additionally be smoothed over the long term, and the smoothed energy estimate E_LT assigned back to E_t as the final estimate of the current superframe energy parameter. The smoothing is:
E_LT = α·E_LT + (1 − α)·E_t
E_t = E_LT
where the range of α is 0 < α < 1; in a preferred embodiment α may be 0.9, and it may also be set to other values as required.
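A minimal sketch of this long-term energy smoothing (class and attribute names are illustrative; the initialization on the first frame is an assumption):

    class EnergySmoother:
        """First-order (exponential) smoothing of the residual energy."""
        def __init__(self, alpha=0.9):
            self.alpha = alpha
            self.e_lt = None

        def update(self, e_t):
            if self.e_lt is None:
                self.e_lt = e_t          # initialize on the first call
            else:
                self.e_lt = self.alpha * self.e_lt + (1.0 - self.alpha) * e_t
            return self.e_lt             # assigned back to E_t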
Step 503: the algorithm transforms the LPC filter coefficients A_t(z) into the LSF domain and then quantizes and encodes them;
Step 504: the residual energy parameter E_t is quantized in the logarithmic domain using uniform (linear) quantization.
After the coding of the narrowband part of the background noise is completed, these coded bits are placed into an SID frame and sent to the decoder; this completes the coding of the narrowband part of the first SID frame.
In this embodiment of the invention, the coding of the narrowband part of the first SID frame fully takes into account the characteristics of the background noise during the hangover stage, and these characteristics are reflected in the coding parameters, so that the coding parameters characterize the current background noise to the greatest possible extent. The parameter extraction in the embodiments of the invention is therefore more accurate and reasonable than in G.729B.
3. The DTX decision
For clarity of description, let the extracted parameters take the form PARA_{t,k}, where t is the superframe index and k = 1, 2 is the index of the first and second 10 ms frame within each superframe. For the non-speech superframes other than the first superframe, parameter extraction and a DTX decision are needed for each 10 ms frame.
Referring to Figure 6, the flow of narrowband parameter extraction and DTX decision comprises the steps of:
First, background noise parameter extraction and the DTX decision are performed for the first 10 ms frame after the first superframe.
For this first 10 ms frame, the spectral parameter A_{t,1}(z) and the excitation energy parameter E_{t,1} of the background noise are computed as follows:
Step 601: from the values of the four most recent adjacent 10 ms frame autocorrelations r'_{t,1}(j), r'_{(t-1),2}(j), r'_{(t-1),1}(j) and r'_{(t-2),2}(j), compute the steady-state mean of the current autocorrelation, R_{t,1}(j):

$$R_{t,1}(j) = 0.5\,r_{min1}(j) + 0.5\,r_{min2}(j), \quad j = 0 \ldots 10$$
where r_min1(j) and r_min2(j) denote, among r'_{t,1}(j), r'_{(t-1),2}(j), r'_{(t-1),1}(j) and r'_{(t-2),2}(j), the autocorrelations with the second-smallest and third-smallest autocorrelation norms, i.e. the autocorrelations of the two 10 ms frames with the middle norm values that remain after removing the frames with the largest and smallest autocorrelation norm values.
The autocorrelation norms of r'_{t,1}(j), r'_{(t-1),2}(j), r'_{(t-1),1}(j) and r'_{(t-2),2}(j) are respectively:

$$norm_{t,1} = \sum_{j=0}^{10} r'^{\,2}_{t,1}(j), \quad norm_{(t-1),2} = \sum_{j=0}^{10} r'^{\,2}_{(t-1),2}(j), \quad norm_{(t-1),1} = \sum_{j=0}^{10} r'^{\,2}_{(t-1),1}(j), \quad norm_{(t-2),2} = \sum_{j=0}^{10} r'^{\,2}_{(t-2),2}(j)$$

These four autocorrelation norms are sorted; r_min1(j) and r_min2(j) then correspond to the autocorrelations of the two 10 ms frames whose norms lie in the middle.
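A sketch of this 'middle two of four' selection (illustrative names; the inputs are the four buffered autocorrelation vectors):

    import numpy as np

    def steady_state_autocorrelation(r_frames):
        """r_frames: the four most recent 10 ms autocorrelation vectors
        r'(0..10). Drop the frames with the largest and smallest norm and
        average the remaining two with weights 0.5/0.5."""
        r_frames = [np.asarray(r) for r in r_frames]
        norms = [np.sum(r ** 2) for r in r_frames]
        order = np.argsort(norms)                 # indices sorted by norm
        return 0.5 * r_frames[order[1]] + 0.5 * r_frames[order[2]]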
Step 602: from the steady-state mean of the current autocorrelation, R_{t,1}(j), compute the background noise LPC filter A_{t,1}(z), with coefficients a_t(j), j = 0, ..., 10, using the Levinson-Durbin algorithm; the Levinson-Durbin algorithm also yields the residual energy E_{t,1}.
In practice, to obtain a more stable frame energy estimate, the algorithm may additionally apply long-term smoothing to the estimated E_{t,1} and assign the smoothed energy estimate back to E_{t,1} as the current frame excitation energy estimate, as follows:
E_LT1 = α·E_LT + (1 − α)·E_{t,1}
E_{t,1} = E_LT1
where α is 0.9.
Step 603: after the parameter extraction, the DTX decision of the current 10 ms frame is made. The DTX decision is as follows:
The algorithm compares the narrowband coding parameters of the previous SID superframe (the SID superframe is the background noise superframe that is finally encoded and sent after the DTX decision; if the DTX decision is that a superframe is not to be sent, it is not called an SID superframe) with the corresponding parameters of the current 10 ms frame. If the current LPC filter coefficients differ considerably from the LPC filter coefficients of the previous SID superframe, or the current energy parameter differs considerably from the energy parameter of the previous SID superframe (see the formulas below), the parameter change flag flag_change_first of the current 10 ms frame is set to 1; otherwise it is cleared. The concrete determination in this step is similar to G.729B:
First, let the coefficients of the LPC filter A_sid(z) of the last SID superframe be a_sid(j), j = 0...10. If the Itakura distance between the LPC filter of the current 10 ms frame and that of the last SID superframe exceeds a threshold, flag_change_first is set to 1, otherwise it is set to 0:
if ( Σ_{j=0}^{10} R_a(j) × R_{t,1}(j) > E_{t,1} × thr )
    flag_change_first = 1
else
    flag_change_first = 0
where thr is a concrete threshold, generally between 1.0 and 1.5; in the present embodiment it is 1.342676475. R_a(j), j = 0...10, is the autocorrelation of the LPC filter coefficients of the last SID superframe:

$$R_a(j) = 2 \sum_{k=0}^{10-j} a_{sid}(k)\,a_{sid}(k+j) \quad (j \ne 0), \qquad R_a(0) = \sum_{k=0}^{10} a_{sid}(k)^2$$
Next, the mean of the residual energies of the current 10 ms frame and the three most recent 10 ms frames (four 10 ms frames in total) is computed:
E_{t,1} = (E_{t,1} + E_{t-1,2} + E_{t-1,1} + E_{t-2,2}) / 4
It should be noted that if the current superframe is the second superframe of the noise coding stage (i.e. the previous superframe is the first superframe), the value of E_{t-2,2} is 0. E_{t,1} is quantized with the logarithmic quantizer. The decoded logarithmic energy E_{q,1} is compared with the decoded logarithmic energy E_q^sid of the last SID superframe; if the difference between the two exceeds 3 dB, flag_change_first is set to 1, otherwise it is set to 0:
if ( |E_q^sid − E_{q,1}| > 3 )
    flag_change_first = 1
else
    flag_change_first = 0
Those skilled in the art may set the threshold on the difference between the two excitation energies to another value according to actual needs without departing from the protection scope of the present invention.
After the background noise parameter extraction and DTX decision of the first 10 ms frame, the background noise parameter extraction and DTX decision of the second 10 ms frame are performed.
The background noise parameter extraction and DTX decision of the second 10 ms frame follow the same flow as for the first 10 ms frame; the corresponding quantities for the second 10 ms frame are the steady-state mean R_{t,2}(j) of the four adjacent 10 ms frame autocorrelations, the mean frame energy E_{t,2} of the four adjacent 10 ms frames, and the DTX flag flag_change_second of the second 10 ms frame.
4. Narrowband background noise parameter extraction and DTX decision in the current superframe
Referring to Figure 7, the narrowband background noise parameter extraction and the DTX decision in the current superframe comprise the steps of:
Step 701: determine the final DTX flag flag_change of the narrowband part of the current superframe, as follows:
flag_change = flag_change_first || flag_change_second
That is, as long as the DTX decision of at least one 10 ms frame is 1, the final decision of the narrowband part of the current superframe is 1.
Step 702: determine the final DTX decision of the current superframe. The final DTX decision of the current superframe, which also covers the high-band part of the current superframe, must additionally take the characteristics of the high band into account; it is determined jointly by the narrowband part and the high-band part. If the final DTX decision of the current superframe is 1, proceed to step 703; if the DTX decision of the current superframe is 0, no coding is performed and only a NODATA frame carrying no data is sent to the decoder.
Step 703: if the final DTX decision of the current superframe is 1, the background noise characteristic parameters of the current superframe are extracted. The source of the background noise characteristic parameters of the current superframe is the parameters of the two current 10 ms frames; that is, the parameters of the two current 10 ms frames are smoothed to obtain the background noise coding parameters of the current superframe. The extraction and smoothing of the background noise characteristic parameters proceed as follows (a sketch of the whole step is given at the end of this section):
First, determine the smoothing factor smooth_rate:
if (flag_change_first == 0 && flag_change_second == 1)
    smooth_rate = 0.1
else
    smooth_rate = 0.5
That is, if the DTX decision of the first 10 ms frame is 0 and the DTX decision of the second 10 ms frame is 1, then in the smoothing the weight of the background noise characteristic parameters of the first 10 ms frame is 0.1 and the weight of those of the second 10 ms frame is 0.9; otherwise the smoothing weights of the background noise characteristic parameters of the two 10 ms frames are both 0.5.
Then the background noise characteristic parameters of the two 10 ms frames are smoothed to obtain the LPC filter coefficients of the current superframe and the mean of the two 10 ms frame energies, as follows:
First, compute the smoothed mean R_t(j) of the steady-state autocorrelation means of the two 10 ms frames:
R_t(j) = smooth_rate·R_{t,1}(j) + (1 − smooth_rate)·R_{t,2}(j)
After the smoothed autocorrelation R_t(j) is obtained, the LPC filter A_t(z), with coefficients a_t(j), j = 0, ..., 10, is obtained by the Levinson-Durbin algorithm;
Next, compute the mean E_t of the two 10 ms frame energies:
E_t = smooth_rate·E_{t,1} + (1 − smooth_rate)·E_{t,2}
This gives the coding parameters of the narrowband part of the current superframe: the LPC filter coefficients and the average frame energy. Because the background noise characteristic parameter extraction and the DTX control make full use of the characteristics of each 10 ms frame of the current superframe, the algorithm is fairly rigorous.
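A sketch of steps 701-703 follows. The high-band contribution to the final DTX decision is abstracted into a boolean argument, which is an assumption, and all names are illustrative.

    def superframe_noise_parameters(flag_first, flag_second, R1, R2, E1, E2,
                                    highband_changed=False):
        """Combine the two 10 ms frame DTX decisions and smooth their
        parameters into the coding parameters of the current superframe."""
        flag_change = 1 if (flag_first or flag_second) else 0
        final_dtx = 1 if (flag_change or highband_changed) else 0
        if final_dtx == 0:
            return None                            # send a NODATA frame only
        rate = 0.1 if (flag_first == 0 and flag_second == 1) else 0.5
        R_t = rate * R1 + (1.0 - rate) * R2        # smoothed autocorrelation
        E_t = rate * E1 + (1.0 - rate) * E2        # smoothed frame energy
        return R_t, E_t                            # then run Levinson-Durbin on R_t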
5. Coding of the SID frame
As in G.729B, the final coding of the spectral parameters of the SID frame takes into account the stationarity between adjacent noise frames; the concrete operation is consistent with G.729B:
First, compute the average LPC filter Ā_p(z) of the N_p superframes before the current superframe. This uses the autocorrelation mean R̄_p(j), which is then fed into the Levinson-Durbin algorithm to obtain Ā_p(z); R̄_p(j) is expressed as:

$$\bar{R}_p(j) = \frac{1}{2 N_p} \sum_{i=t-1-N_p}^{t-1} \sum_{k=1}^{2} r'_{i,k}(j), \quad j = 0 \ldots 10$$

where the value of N_p is set to 5. The SID-LPC filter is then:

$$A_{sid}(z) = \begin{cases} A_t(z) & \text{if } distance(A_t(z), \bar{A}_p(z)) > thr3 \\ \bar{A}_p(z) & \text{otherwise} \end{cases}$$

That is, the algorithm computes the average LPC filter coefficients Ā_p(z) of the previous superframes and compares them with the current LPC filter coefficients A_t(z). If the two differ little, the average Ā_p(z) of the previous superframes is chosen when the current superframe quantizes the LPC coefficients; otherwise the current superframe's A_t(z) is chosen. The concrete comparison method is the same as in the 10 ms frame DTX decision of step 602, where thr3 is a concrete threshold, generally between 1.0 and 1.5; in the present embodiment it is 1.0966466. Those skilled in the art may choose another value according to actual needs without departing from the protection scope of the present invention.
After the LPC filter coefficients are chosen, the algorithm transforms them into the LSF domain and then quantizes and encodes them, with a quantization scheme similar to that of G.729B.
The energy parameter is quantized in the logarithmic domain using uniform (linear) quantization and then encoded. The coding of the background noise is thus complete, and the coded bits are encapsulated in the SID frame.
6. The CNG mode
In CELP-based coding, the encoder side also contains the decoding process in order to obtain the best coding parameters; this is equally true of the CNG system, i.e. in G.729.1 the encoder side also contains a CNG module. The CNG processing in G.729.1 is based on G.729B: although its frame length is 20 ms, it processes the background noise with a basic processing length of 10 ms. From the previous section, however, the coding parameters of the first SID superframe can only be obtained at the second 10 ms frame, whereas the system must already produce CNG parameters for the first 10 ms frame of the first SID superframe. Obviously, the CNG parameters of the first 10 ms frame of the first SID superframe cannot be obtained from the coding parameters of the SID superframe, but only from the preceding speech coding superframes. Because of this special situation, the CNG mode of the first 10 ms frame of the first SID superframe in G.729.1 differs from G.729B; compared with the G.729B CNG mode introduced above, the differences are as follows:
(1) The target excitation gain G̃_t is defined by the long-term smoothed quantized fixed codebook gain LT_G_f of the speech coding superframes:

$$\tilde{G}_t = \overline{LT\_G}_f \cdot \gamma$$

where 0 < γ < 1; in the present embodiment γ = 0.4 may be chosen.
(2) The LPC filter coefficients A_sid(z) are defined by the long-term smoothed quantized LPC filter coefficients LT_A(z) of the speech coding superframes:
A_sid(z) = LT_A(z)
All other operations are consistent with G.729B.
If the fixed codebook gain and the LPC filter coefficients quantized for a speech coding frame are gain_code and A_q(z) respectively, the long-term smoothed parameters are computed as:
LT_G_f = β·LT_G_f + (1 − β)·gain_code
LT_A(z) = β·LT_A(z) + (1 − β)·A_q(z)
The above smoothing is performed for every subframe of every speech superframe, where the range of the smoothing factor β is 0 < β < 1; in the present embodiment it is 0.5.
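A sketch of this long-term smoothing and of deriving the CNG parameters of the first 10 ms frame of the first SID superframe from it (illustrative names; gamma and beta follow the values given in this embodiment):

    import numpy as np

    class LongTermCngState:
        """Long-term smoothing of the quantized fixed codebook gain and
        LPC coefficients over the speech subframes."""
        def __init__(self, lpc_order=10, beta=0.5):
            self.beta = beta
            self.lt_gf = 0.0
            self.lt_a = np.zeros(lpc_order + 1)

        def update(self, gain_code, a_q):
            # Called for every subframe of every speech superframe.
            self.lt_gf = self.beta * self.lt_gf + (1.0 - self.beta) * gain_code
            self.lt_a = self.beta * self.lt_a + (1.0 - self.beta) * np.asarray(a_q)

        def first_frame_cng_params(self, gamma=0.4):
            """Target excitation gain and A_sid(z) for the first 10 ms
            frame of the first SID superframe."""
            return gamma * self.lt_gf, self.lt_a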
Apart from the first 10 ms frame of the first SID superframe, which is slightly different from G.729B as described above, the CNG mode of all other 10 ms frames is consistent with G.729B.
In the above embodiments, the hangover time is 120 ms or 140 ms.
In the above embodiments, extracting the background noise characteristic parameters during the hangover time is specifically:
during the hangover time, for each frame of each superframe, saving the autocorrelation coefficients of the background noise of every frame.
In the above embodiments, encoding the background noise of the first superframe after the hangover time according to the extracted background noise characteristic parameters of the hangover time and of that first superframe comprises:
in the first frame and the second frame, saving the autocorrelation coefficients of the background noise of each frame;
in the second frame, extracting the LPC filter coefficients and the residual energy of the first superframe according to the extracted autocorrelation coefficients of the two frames and the background noise characteristic parameters of the hangover time, and carrying out the background noise encoding.
In the above embodiments, extracting the LPC filter coefficients specifically comprises:
calculating the mean of the autocorrelation coefficients of the first superframe and of the four superframes of the hangover time before it;
computing the LPC filter coefficients from this mean autocorrelation according to the Levinson-Durbin algorithm (a sketch follows below).
Extracting the residual energy E_t specifically comprises:
computing the residual energy according to the Levinson-Durbin algorithm.
Carrying out the background noise encoding in the second frame specifically comprises:
transforming the LPC filter coefficients into the LSF domain and quantizing and encoding them;
quantizing the residual energy uniformly in the log domain and encoding it.
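The Levinson-Durbin step referred to above can be sketched as follows; this is the standard recursion, while the averaging simply takes the element-wise mean of the saved autocorrelation vectors. LSF conversion and quantization are omitted, and the function names are illustrative.

    def average_autocorrelation(r_vectors):
        """Element-wise mean of the saved per-frame autocorrelation vectors
        (first superframe plus the four hangover superframes before it)."""
        n = len(r_vectors)
        return [sum(r[j] for r in r_vectors) / n for j in range(len(r_vectors[0]))]

    def levinson_durbin(r, order):
        """Standard Levinson-Durbin recursion on autocorrelation r[0..order].
        Returns (lpc, residual_energy), with A(z) = 1 + a1*z^-1 + ... ."""
        a = [1.0] + [0.0] * order
        e = r[0]
        for i in range(1, order + 1):
            acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
            k = -acc / e
            new_a = a[:]
            for j in range(1, i):
                new_a[j] = a[j] + k * a[i - j]
            new_a[i] = k
            a = new_a
            e *= (1.0 - k * k)
        return a, e

    # Toy example (a 10th-order filter would be used in practice):
    lpc, e_t = levinson_durbin([1.0, 0.5, 0.2, 0.1], order=3)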
In the above embodiments, after the residual energy has been calculated and before the quantization encoding, the method further comprises:
long-term smoothing of the residual energy;
the smoothing formula is E_LT = α · E_LT + (1-α) · E_t, where 0 < α < 1;
taking the smoothed energy estimate E_LT as the value of the residual energy.
In the above embodiments, extracting the background noise characteristic parameters for every frame of the superframes after the first superframe specifically comprises:
calculating the stable-state mean of the current autocorrelation coefficients from the autocorrelation coefficients of the four most recent consecutive frames, the stable-state mean being the mean of the autocorrelation coefficients of the two frames whose autocorrelation norms take the middle values among those four frames (see the sketch below);
computing the background noise LPC filter coefficients and the residual energy from this stable-state mean according to the Levinson-Durbin algorithm.
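A minimal sketch of the stable-state mean described above, assuming the norm of an autocorrelation vector is its Euclidean norm (the text does not fix the norm, so that choice is an assumption):

    import math

    def stable_state_mean(last4_r):
        """last4_r: autocorrelation vectors of the four most recent frames.
        Averages the two vectors whose norms are the middle values."""
        norms = [math.sqrt(sum(x * x for x in r)) for r in last4_r]
        mid = sorted(range(4), key=lambda i: norms[i])[1:3]
        return [(last4_r[mid[0]][j] + last4_r[mid[1]][j]) / 2.0
                for j in range(len(last4_r[0]))]
    # The result is fed to the Levinson-Durbin recursion to obtain the
    # background noise LPC coefficients and the residual energy.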
In the above embodiments, after the residual energy has been calculated, the method further comprises:
long-term smoothing of the residual energy to obtain the energy estimate of the current frame, the smoothing being:
E_LT = α · E_LT + (1-α) · E_t,k,
where 0 < α < 1;
assigning the smoothed current-frame energy estimate back to the residual energy:
E_t,k = E_LT,
where k = 1, 2 denotes the first frame and the second frame respectively.
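A small sketch of this energy smoothing and assignment; the helper name is illustrative and α = 0.9 is the value used in the embodiments:

    ALPHA = 0.9   # 0 < alpha < 1; 0.9 is the value used in the embodiments

    def smooth_residual_energy(e_lt, e_tk):
        """E_LT = alpha*E_LT + (1-alpha)*E_t,k; the smoothed estimate is then
        assigned back to E_t,k."""
        e_lt = ALPHA * e_lt + (1.0 - ALPHA) * e_tk
        return e_lt, e_lt   # new E_LT and new E_t,k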
In each of the embodiments, α = 0.9.
In the above embodiments, carrying out the DTX decision for every frame of the superframes after the first superframe specifically comprises:
if the value obtained by comparing the LPC filter coefficients of the current frame with those of the last SID superframe exceeds a preset threshold, or the energy estimate of the current frame differs considerably from the energy estimate of the last SID superframe, setting the parameter change flag of the current frame to 1;
if the value obtained by comparing the LPC filter coefficients of the current 10 ms frame with those of the last SID superframe does not exceed the preset threshold, or the energy estimate of the current 10 ms frame differs little from the energy estimate of the last SID superframe, setting the parameter change flag of the current 10 ms frame to 0.
In the above embodiments, determining that the energy estimate of the current frame differs considerably from the energy estimate of the last SID superframe specifically comprises (see the sketch below):
calculating the mean residual energy of the current 10 ms frame and of the 3 most recent preceding frames, 4 frames in total, as the energy estimate of the current frame;
quantizing this mean residual energy with a logarithmic quantizer;
if the difference between the decoded log energy and the decoded log energy of the last SID superframe exceeds a preset value, determining that the energy estimate of the current frame differs considerably from that of the last SID superframe.
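A sketch of this energy-based DTX test. The logarithmic quantizer step and the preset threshold below are placeholders chosen for the example, not values taken from the standard:

    import math

    LOG_STEP = 1.5     # assumed step of the logarithmic quantizer (illustrative)
    ENERGY_THR = 3.0   # assumed preset decision threshold (illustrative)

    def decoded_log_energy(e_mean):
        """Quantize the mean residual energy in the log domain and return the
        decoded (reconstructed) log energy."""
        log_e = 10.0 * math.log10(max(e_mean, 1e-12))
        return round(log_e / LOG_STEP) * LOG_STEP

    def energy_differs(current_energies, last_sid_log_energy):
        """current_energies: residual energies of the current 10 ms frame and
        of the 3 most recent preceding frames (4 values in total)."""
        e_mean = sum(current_energies) / len(current_energies)
        return abs(decoded_log_energy(e_mean) - last_sid_log_energy) > ENERGY_THR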
In the above embodiments, carrying out the DTX decision for every frame specifically comprises:
if the DTX decision of any frame in the current superframe is 1, the DTX decision of the narrowband part of the current superframe is 1.
In the above embodiments, when the final DTX decision of the current superframe is 1, the process of "encoding the background noise of the superframes after the first superframe according to the extracted background noise characteristic parameters of the current superframe and of several superframes before it, and according to the final DTX decision" comprises:
determining a smoothing factor for the current superframe, namely:
if the DTX of the first frame of the current superframe is 0 and the DTX of the second frame is 1, the smoothing factor is 0.1; otherwise the smoothing factor is 0.5;
smoothing the parameters of the two frames of the current superframe and using the smoothed parameters as the characteristic parameters for encoding the background noise of the current superframe, the parameter smoothing comprising (see the sketch below):
calculating the sliding average R_t(j) of the stable-state autocorrelation means of the two frames:
R_t(j) = smooth_rate · R_t,1(j) + (1-smooth_rate) · R_t,2(j),
where smooth_rate is the smoothing factor, R_t,1(j) is the stable-state autocorrelation mean of the first frame and R_t,2(j) is that of the second frame;
applying the Levinson-Durbin algorithm to the sliding average R_t(j) to obtain the LPC filter coefficients;
calculating the sliding average E_t of the energy estimates of the two frames:
E_t = smooth_rate · E_t,1 + (1-smooth_rate) · E_t,2,
where E_t,1 is the energy estimate of the first frame and E_t,2 is that of the second frame.
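The two-frame parameter smoothing can be sketched as follows; the function names are illustrative:

    def choose_smooth_rate(dtx_frame1, dtx_frame2):
        """0.1 when the first frame's DTX is 0 and the second frame's is 1,
        otherwise 0.5."""
        return 0.1 if (dtx_frame1 == 0 and dtx_frame2 == 1) else 0.5

    def smooth_two_frames(r1, r2, e1, e2, smooth_rate):
        """r1, r2: stable-state autocorrelation means of frames 1 and 2;
        e1, e2: their energy estimates. Returns (R_t, E_t)."""
        r_t = [smooth_rate * a + (1.0 - smooth_rate) * b for a, b in zip(r1, r2)]
        e_t = smooth_rate * e1 + (1.0 - smooth_rate) * e2
        return r_t, e_t   # R_t then goes through Levinson-Durbin for the LPC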
In the above embodiments, "encoding the background noise according to the extracted background noise characteristic parameters of the current superframe and of several superframes before it, and according to the final DTX decision" specifically comprises:
calculating the mean of the autocorrelation coefficients of the several superframes before the current superframe;
computing, from this mean autocorrelation, the average LPC filter coefficients of the several superframes before the current superframe;
if the difference between the average LPC filter coefficients and the LPC filter coefficients of the current superframe is less than or equal to a preset value, transforming the average LPC filter coefficients into the LSF domain and quantizing and encoding them;
if the difference between the average LPC filter coefficients and the LPC filter coefficients of the current superframe is greater than the preset value, transforming the LPC filter coefficients of the current superframe into the LSF domain and quantizing and encoding them (see the sketch below);
quantizing the energy parameter uniformly in the log domain and encoding it.
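A sketch of the choice between the averaged LPC coefficients and those of the current superframe. The spectral comparison itself is not spelled out in this passage (the text refers to the same comparison as the DTX decision), so the distance value is left as an input here; thr3 is the threshold named earlier:

    def mean_autocorrelation(r_list):
        """Element-wise mean of the autocorrelation vectors of the several
        superframes before the current one (5 in the embodiment)."""
        n = len(r_list)
        return [sum(r[j] for r in r_list) / n for j in range(len(r_list[0]))]

    def choose_lpc_for_quantization(avg_lpc, cur_lpc, distance, thr3=1.0966466):
        """distance: spectral difference between avg_lpc and cur_lpc, computed
        elsewhere (assumed here). The chosen set is converted to the LSF
        domain and quantized."""
        return avg_lpc if distance <= thr3 else cur_lpc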
In the above embodiments, the number of said several frames is 5. Those skilled in the art may also select another number of frames as required.
In the above embodiments, before the step of extracting the background noise characteristic parameters during the hangover time, the method further comprises:
encoding the background noise during the hangover time at the speech coding rate.
Referring to FIG. 8, Embodiment 1 of the decoding method of the present invention comprises the following steps:
Step 801: obtaining the CNG parameters of the first frame of the first superframe from the speech coding frames before the first frame of the first superframe;
Step 802: carrying out background noise decoding on the first frame of the first superframe according to the CNG parameters, where the CNG parameters comprise:
a target excitation gain, determined by the long-term smoothed fixed codebook gain quantized from the speech coding frames;
in practice, determining the target excitation gain may specifically be: target excitation gain = γ × fixed codebook gain, with 0 < γ < 1;
filter coefficients, defined by the long-term smoothed filter coefficients quantized from the speech coding frames;
in practice, defining the filter coefficients may specifically be:
filter coefficients = the long-term smoothed filter coefficients quantized from the speech coding frames.
In the foregoing embodiment, the long-term smoothing factor lies in the range greater than 0 and less than 1.
In the foregoing embodiment, the long-term smoothing factor may be 0.5.
In the foregoing embodiment, γ = 0.4.
In the foregoing embodiment, after the background noise decoding of the first frame of the first superframe, the method may further comprise:
for all frames other than the first frame of the first superframe, obtaining the CNG parameters from the last SID superframe and carrying out background noise decoding according to the obtained CNG parameters.
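Putting the decoder-side behaviour of this embodiment together, a minimal sketch (illustrative names, γ = 0.4 as in the embodiment) could look like this:

    GAMMA = 0.4   # 0 < gamma < 1; 0.4 in this embodiment

    def cng_params_first_frame(lt_fixed_codebook_gain, lt_lpc):
        """CNG parameters of the first frame of the first superframe, taken
        from the long-term smoothed parameters of the preceding speech
        coding frames."""
        target_gain = GAMMA * lt_fixed_codebook_gain
        return target_gain, list(lt_lpc)

    def cng_params_other_frames(last_sid_params):
        """Every other frame takes its CNG parameters from the last received
        SID superframe (dictionary keys are illustrative)."""
        return last_sid_params["target_gain"], last_sid_params["lpc"]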
Referring to FIG. 9, Embodiment 1 of the encoding apparatus of the present invention comprises:
a first extraction unit 901, configured to extract the background noise characteristic parameters during the hangover time;
a second encoding unit 902, configured to encode the background noise of the first superframe after the hangover time according to the extracted background noise characteristic parameters of the hangover time and of that first superframe;
a second extraction unit 903, configured to extract the background noise characteristic parameters for every frame of the superframes after the first superframe;
a DTX decision unit 904, configured to carry out the DTX decision for every frame of the superframes after the first superframe;
a third encoding unit 905, configured to encode, for the superframes after the first superframe, the background noise according to the extracted background noise characteristic parameters of the current superframe and of several superframes before it, and according to the final DTX decision.
In the foregoing embodiment, the hangover time is 120 milliseconds or 140 milliseconds.
In the foregoing embodiment, the first extraction unit specifically comprises:
a cache module, configured to save, during the hangover time and for each frame of each superframe, the autocorrelation coefficients of the background noise of every frame.
In the foregoing embodiment, the second encoding unit specifically comprises:
an extraction module, configured to save the autocorrelation coefficients of the background noise of each frame in the first frame and the second frame;
an encoding module, configured to extract, in the second frame, the LPC filter coefficients and the residual energy of the first superframe according to the extracted autocorrelation coefficients of the two frames and the background noise characteristic parameters of the hangover time, and to carry out the background noise encoding.
In the foregoing embodiment, the second encoding unit may further comprise:
a residual energy smoothing module, configured to perform long-term smoothing of the residual energy;
the smoothing formula is E_LT = α · E_LT + (1-α) · E_t, where 0 < α < 1;
the smoothed energy estimate E_LT is taken as the value of the residual energy.
In the foregoing embodiment, the second extraction unit specifically comprises:
a first calculation module, configured to calculate the stable-state mean of the current autocorrelation coefficients from the autocorrelation coefficients of the four most recent consecutive frames, the stable-state mean being the mean of the autocorrelation coefficients of the two frames whose autocorrelation norms take the middle values among those four frames;
a second calculation module, configured to compute the background noise LPC filter coefficients and the residual energy from the stable-state mean according to the Levinson-Durbin algorithm.
In the foregoing embodiment, the second extraction unit may further comprise:
a second residual energy smoothing module, configured to perform long-term smoothing of the residual energy and obtain the energy estimate of the current frame, the smoothing being:
E_LT = α · E_LT + (1-α) · E_t,k,
where 0 < α < 1;
and to assign the smoothed current-frame energy estimate back to the residual energy:
E_t,k = E_LT,
where k = 1, 2 denotes the first frame and the second frame respectively.
In the foregoing embodiment, the DTX decision unit specifically comprises:
a threshold comparison module, configured to generate a decision instruction if the value obtained by comparing the LPC filter coefficients of the current frame with those of the last SID superframe exceeds a preset threshold;
an energy comparison module, configured to calculate the mean residual energy of the current frame and of the 3 most recent preceding frames, 4 frames in total, as the energy estimate of the current frame, to quantize this mean residual energy with a logarithmic quantizer, and to generate a decision instruction if the difference between the decoded log energy and the decoded log energy of the last SID superframe exceeds a preset value;
a first decision module, configured to set the parameter change flag of the current frame to 1 according to the decision instruction.
In the foregoing embodiment, the apparatus may further comprise:
a second decision unit, configured to set the DTX decision of the narrowband part of the current superframe to 1 if the DTX decision of any frame in the current superframe is 1;
the third encoding unit specifically comprises:
a smoothing indication module, configured to generate a smoothing instruction if the final DTX decision of the current superframe is 1;
a smoothing factor determination module, configured to determine, after receiving the smoothing instruction, the smoothing factor of the current superframe:
if the DTX of the first frame of the current superframe is 0 and the DTX of the second frame is 1, the smoothing factor is 0.1; otherwise the smoothing factor is 0.5;
a parameter smoothing module, configured to smooth the parameters of the two frames of the current superframe and use the smoothed parameters as the characteristic parameters for encoding the background noise of the current superframe, the smoothing comprising:
calculating the sliding average R_t(j) of the stable-state autocorrelation means of the two frames:
R_t(j) = smooth_rate · R_t,1(j) + (1-smooth_rate) · R_t,2(j),
where smooth_rate is the smoothing factor, R_t,1(j) is the stable-state autocorrelation mean of the first frame and R_t,2(j) is that of the second frame;
applying the Levinson-Durbin algorithm to the sliding average R_t(j) to obtain the LPC filter coefficients;
calculating the sliding average E_t of the energy estimates of the two frames:
E_t = smooth_rate · E_t,1 + (1-smooth_rate) · E_t,2,
where E_t,1 is the energy estimate of the first frame and E_t,2 is that of the second frame.
In the foregoing embodiment, the third encoding unit specifically comprises:
a third calculation module, configured to calculate, from the computed mean of the autocorrelation coefficients of several superframes before the current superframe, the average LPC filter coefficients of those superframes;
a first encoding module, configured to transform the average LPC filter coefficients into the LSF domain and quantize and encode them if the difference between the average LPC filter coefficients and the LPC filter coefficients of the current superframe is less than or equal to a preset value;
a second encoding module, configured to transform the LPC filter coefficients of the current superframe into the LSF domain and quantize and encode them if the difference between the average LPC filter coefficients and the LPC filter coefficients of the current superframe is greater than the preset value;
a third encoding module, configured to quantize the energy parameter uniformly in the log domain and encode it.
In the foregoing embodiment, α = 0.9.
In the foregoing embodiment, the apparatus may further comprise:
a first encoding unit, configured to encode the background noise during the hangover time at the speech coding rate.
The specific operation of the encoding apparatus of the present invention corresponds to the encoding method of the present invention, and the apparatus accordingly achieves the same technical effects as the method embodiments.
Referring to FIG. 10, Embodiment 1 of the decoding apparatus of the present invention comprises:
a CNG parameter obtaining unit 1001, configured to obtain the CNG parameters of the first frame of the first superframe from the speech coding frames before the first frame of the first superframe;
a first decoding unit 1002, configured to carry out background noise decoding on the first frame of the first superframe according to the CNG parameters, where the CNG parameters comprise:
a target excitation gain, determined by the long-term smoothed fixed codebook gain quantized from the speech coding frames; in practice, the target excitation gain is determined as:
target excitation gain = γ × fixed codebook gain, where the range of γ is 0 < γ < 1;
LPC filter coefficients, defined by the long-term smoothed LPC filter coefficients quantized from the speech coding frames; in practice, the LPC filter coefficients may be defined as:
LPC filter coefficients = the long-term smoothed LPC filter coefficients quantized from the speech coding frames.
In the above embodiments, the long-term smoothing factor lies in the range greater than 0 and less than 1.
In a preferred case, the long-term smoothing factor may be 0.5.
In the above embodiments, the apparatus may further comprise:
a second decoding unit, configured to obtain, for all frames other than the first frame of the first superframe, the CNG parameters from the last SID superframe and to carry out background noise decoding according to the obtained CNG parameters.
In the above embodiments, γ = 0.4.
The specific operation of the decoding apparatus of the present invention corresponds to the decoding method of the present invention, and the apparatus accordingly achieves the same technical effects as the decoding method embodiments.
The embodiments described above do not limit the protection scope of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A decoding method, characterized by comprising:
obtaining CNG parameters of a first frame of a first superframe from speech coding frames before the first frame of the first superframe;
carrying out background noise decoding on the first frame of the first superframe according to the CNG parameters, wherein the CNG parameters comprise:
a target excitation gain, the target excitation gain being determined by a long-term smoothed fixed codebook gain quantized from the speech coding frames;
LPC filter coefficients, the LPC filter coefficients being defined by long-term smoothed LPC filter coefficients quantized from the speech coding frames.
2. The method according to claim 1, characterized in that the long-term smoothing factor lies in a range greater than 0 and less than 1.
3. The method according to claim 1, characterized in that, after the background noise decoding of the first frame of the first superframe, the method further comprises:
for all frames other than the first frame of the first superframe, obtaining the CNG parameters from the last SID superframe and carrying out background noise decoding according to the obtained CNG parameters.
4. The method according to claim 2, characterized in that the long-term smoothing factor is 0.5.
5. The method according to claim 1, characterized in that determining the target excitation gain is specifically: the target excitation gain = γ × the fixed codebook gain, where 0 < γ < 1.
6. The method according to claim 5, characterized in that γ = 0.4.
7. The method according to claim 1, characterized in that defining the LPC filter coefficients is specifically: the LPC filter coefficients = the long-term smoothed LPC filter coefficients quantized from the speech coding frames.
8. A decoding apparatus, characterized by comprising:
a CNG parameter obtaining unit, configured to obtain CNG parameters of a first frame of a first superframe from speech coding frames before the first frame of the first superframe;
a first decoding unit, configured to carry out background noise decoding on the first frame of the first superframe according to the CNG parameters, wherein the CNG parameters comprise:
a target excitation gain, the target excitation gain being determined by a long-term smoothed fixed codebook gain quantized from the speech coding frames;
LPC filter coefficients, the LPC filter coefficients being defined by long-term smoothed LPC filter coefficients quantized from the speech coding frames.
9. The apparatus according to claim 8, characterized in that the long-term smoothing factor lies in a range greater than 0 and less than 1.
10. The apparatus according to claim 8, characterized by further comprising:
a second decoding unit, configured to obtain, for all frames other than the first frame of the first superframe, the CNG parameters from the last SID superframe and to carry out background noise decoding according to the obtained CNG parameters.
11. The apparatus according to claim 8, characterized in that determining the target excitation gain is specifically:
the target excitation gain = γ × the fixed codebook gain, where the range of γ is 0 < γ < 1.
12. The apparatus according to claim 8, characterized in that defining the LPC filter coefficients is specifically:
the LPC filter coefficients = the long-term smoothed LPC filter coefficients quantized from the speech coding frames.
CN2009101667401A 2008-03-26 2008-03-26 Decoding method and decoding device Active CN101651752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101667401A CN101651752B (en) 2008-03-26 2008-03-26 Decoding method and decoding device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN2008100840776A Division CN101335000B (en) 2008-03-26 2008-03-26 Method and apparatus for encoding

Publications (2)

Publication Number Publication Date
CN101651752A true CN101651752A (en) 2010-02-17
CN101651752B CN101651752B (en) 2012-11-21

Family

ID=41673849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101667401A Active CN101651752B (en) 2008-03-26 2008-03-26 Decoding method and decoding device

Country Status (1)

Country Link
CN (1) CN101651752B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2356538A (en) * 1999-11-22 2001-05-23 Mitel Corp Comfort noise generation for open discontinuous transmission systems
US6662155B2 (en) * 2000-11-27 2003-12-09 Nokia Corporation Method and system for comfort noise generation in speech communication

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093756A (en) * 2011-11-01 2013-05-08 联芯科技有限公司 Comfort noise generation method and comfort noise generator
CN103093756B (en) * 2011-11-01 2015-08-12 联芯科技有限公司 Method of comfort noise generation and Comfort Noise Generator
CN102903365A (en) * 2012-10-30 2013-01-30 山东省计算中心 Method for refining parameter of narrow band vocoder on decoding end
WO2015154397A1 (en) * 2014-04-08 2015-10-15 华为技术有限公司 Noise signal processing and generation method, encoder/decoder and encoding/decoding system
US9728195B2 (en) 2014-04-08 2017-08-08 Huawei Technologies Co., Ltd. Noise signal processing method, noise signal generation method, encoder, decoder, and encoding and decoding system
US10134406B2 (en) 2014-04-08 2018-11-20 Huawei Technologies Co., Ltd. Noise signal processing method, noise signal generation method, encoder, decoder, and encoding and decoding system
US10734003B2 (en) 2014-04-08 2020-08-04 Huawei Technologies Co., Ltd. Noise signal processing method, noise signal generation method, encoder, decoder, and encoding and decoding system

Also Published As

Publication number Publication date
CN101651752B (en) 2012-11-21

Similar Documents

Publication Publication Date Title
CN101335000B (en) Method and apparatus for encoding
US9715883B2 (en) Multi-mode audio codec and CELP coding adapted therefore
US11450329B2 (en) Method and device for quantization of linear prediction coefficient and method and device for inverse quantization
JP4270866B2 (en) High performance low bit rate coding method and apparatus for non-speech speech
US20050251387A1 (en) Method and device for gain quantization in variable bit rate wideband speech coding
KR101698905B1 (en) Apparatus and method for encoding and decoding an audio signal using an aligned look-ahead portion
US11238878B2 (en) Method and device for quantizing linear predictive coefficient, and method and device for dequantizing same
EP2633521A1 (en) Coding generic audio signals at low bitrates and low delay
CN103050121A (en) Linear prediction speech coding method and speech synthesis method
CN101430880A (en) Encoding/decoding method and apparatus for ambient noise
KR101931273B1 (en) Concept for encoding an audio signal and decoding an audio signal using deterministic and noise like information
CN100555414C (en) A kind of DTX decision method and device
Jelinek et al. Wideband speech coding advances in VMR-WB standard
CN101651752B (en) Decoding method and decoding device
Krishnan et al. EVRC-Wideband: the new 3GPP2 wideband vocoder standard
CN101192408A (en) Method and device for selecting conductivity coefficient vector quantization
Jelinek et al. On the architecture of the cdma2000/spl reg/variable-rate multimode wideband (VMR-WB) speech coding standard
Hiwasaki et al. An LPC vocoder based on phase-equalized pitch waveform
Kaur et al. MATLAB based encoder designing of 5.90 kbps narrow-band AMR codec
Ekudden et al. ITU-t g. 729 extension at 6.4 kbps.
Ohmuro et al. Dual-Pulse CS-CELP: a toll-quality low-complexity speech coder at 7.8 kbit/s

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant