CN101609683B - Encoder and method for self adapting to discontinuous transmission of multi-rate narrowband - Google Patents


Publication number
CN101609683B
Authority
CN
China
Prior art keywords
signal
voice signal
weighting
delay
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008100389866A
Other languages
Chinese (zh)
Other versions
CN101609683A (en)
Inventor
向为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN2008100389866A
Publication of CN101609683A
Application granted
Publication of CN101609683B

Abstract

The invention provides an encoder and an encoding method that use backward correlation detection in a discontinuous transmission (DTX) mechanism. Backward correlation detection takes the current frame to be encoded and the signal immediately following it as the objects of detection; when backward correlation is detected, the DTX transmission type is set to normal speech. The adaptive multi-rate encoder and encoding method thereby ensure that the sound signal synthesized by the decoder accurately reflects how the original sound is heard. The encoder and method can be applied directly to the speech coding of third-generation mobile communication systems, namely the Universal Mobile Telecommunications System (UMTS).

Description

An encoder and method for adaptive multi-rate narrowband discontinuous transmission
Technical field
The present invention relates to an adaptive multi-rate narrowband (AMR-NB) encoder and its encoding method. Specifically, it relates to an apparatus and method by which the AMR-NB encoder determines the transmission type of discontinuous transmission from the signal that follows the input signal frame, and to solving the encoding delay that this causes.
Background art
Code-excited linear prediction (CELP) coders have been widely used since they were proposed in 1985. The vocoders of both Code Division Multiple Access (CDMA) and the Universal Mobile Telecommunications System (UMTS) use CELP coding.
CELP comprises linear prediction and quantization, adaptive codebook search, and fixed codebook search. Because speech itself contains silent periods, the speech data can be compressed effectively by lowering the data rate during those periods. Qualcomm's patent application No. 92104618.9 for a variable rate vocoder is one scheme based on this approach.
UMTS uses adaptive multi-rate (AMR) speech coding, the speech compression coding defined by 3GPP (3rd Generation Partnership Project) for third-generation mobile communication. AMR speech coding is divided into adaptive multi-rate narrowband (AMR-NB), adaptive multi-rate wideband (AMR-WB) and extended adaptive multi-rate wideband (AMR-WB+) speech coding, all of which are based on code-excited linear prediction. The CELP coder used in the AMR codec divides a sound signal frame into several subframes and carries out linear prediction and quantization, adaptive codebook search and quantization, and fixed codebook search and quantization. AMR-NB speech coding supports eight speech-mode coding rates (12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kb/s, kilobits per second) and a low-rate (1.80 kb/s) background noise coding rate. Table 1 in clause 5 of 3GPP TS 26.071-500 gives the encoder modes corresponding to these rates: AMR_12.20, AMR_10.20, AMR_7.95, AMR_7.40, AMR_6.70, AMR_5.90, AMR_5.15, AMR_4.75 and AMR_SID.
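As a small illustration of the rates listed above, the following sketch converts each mode's bit rate into a nominal number of bits per frame. It assumes the 20 ms frame length used elsewhere in this description; the names `AMR_NB_RATES_BPS` and `bits_per_frame` are hypothetical, and the result is plain arithmetic on the quoted rates, not the bit allocation tables of the standard.

```python
# Bit rates in bits per second, copied from the rates quoted in the text.
AMR_NB_RATES_BPS = {
    "AMR_12.20": 12200,
    "AMR_10.20": 10200,
    "AMR_7.95": 7950,
    "AMR_7.40": 7400,
    "AMR_6.70": 6700,
    "AMR_5.90": 5900,
    "AMR_5.15": 5150,
    "AMR_4.75": 4750,
    "AMR_SID": 1800,  # low-rate background noise mode
}

FRAME_MS = 20  # assumed frame length in milliseconds


def bits_per_frame(mode):
    """Nominal bits in one frame: rate (bit/s) times frame length (s)."""
    return AMR_NB_RATES_BPS[mode] * FRAME_MS // 1000
```

For example, `bits_per_frame("AMR_12.20")` gives 244 and `bits_per_frame("AMR_4.75")` gives 95.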
Linear prediction and quantization comprise the following steps. The sampled sound signal frames are preprocessed by high-pass filtering, and the resulting frames, at a sampling rate of 8 kHz, form a sequence. The samples in this sequence are multiplied by a window function to give a windowed sound data frame. A set of autocorrelation coefficients is computed from the windowed frame. A set of linear prediction coefficients is computed from the autocorrelation coefficients with the Levinson-Durbin algorithm. The linear prediction coefficients are transformed into another spectral domain, and the transformed coefficients are quantized according to the rate given in the coding instruction: for example, a set of 10th-order line spectral pair (LSP) values, or a set of 16th-order immittance spectral pair (ISP) values. Line spectral pairs are explained in the article "Line Spectrum Pair (LSP) and Speech Data Compression" published at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) '84, in Qualcomm's patent application No. 92104618.9 for a variable rate vocoder, and in 3GPP TS (Technical Specification) 26.090 and 3GPP2 C.S0014-A.
In schemes that determine the coding rate from characteristics of the sound signal in the input frame, the quantity examined is often the energy of the sound signal over a short period (for example, one frame). The short-time energy of a sound signal is usually defined as follows:
$$ E_n = \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m) $$
E_n is the energy of the signal x(m) over the window of N sample points selected by h(n-m), where h(n) is a rectangular window:
$$ h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} $$
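The short-time energy definition can be sketched as follows. This is a minimal illustration, assuming the rectangular window is 1 for 0 ≤ n ≤ N-1 and 0 elsewhere, so that h(n-m) selects the samples m = n-N+1, ..., n; the function name `short_time_energy` is hypothetical.

```python
def short_time_energy(x, n, N):
    """Sum of x(m)^2 over the N samples selected by the window h(n - m).

    Assumes h is 1 on 0..N-1 and 0 elsewhere, so h(n - m) is 1 exactly
    when n - N + 1 <= m <= n. Samples outside x are treated as zero.
    """
    start = max(0, n - N + 1)
    end = min(len(x) - 1, n)
    return sum(x[m] * x[m] for m in range(start, end + 1))
```

With `x = [1, 2, 3, 4]`, `short_time_energy(x, 3, 2)` sums the squares of the last two samples.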
In CELP coding, the best codevectors obtained by the adaptive codebook search and the fixed codebook search are each multiplied by their optimal gain and then added; the sum is the excitation signal. The excitation signal is indispensable to the encoding process: CELP coding searches for the excitation-based synthesized speech that has the minimum error relative to the original speech.
3GPP TS 26.090 describes the adaptive codebook search of AMR-NB, for example in clause 5.6 of TS 26.090-310. The adaptive codebook search comprises a closed-loop pitch search based on the past excitation signal, followed by computation of the adaptive codebook by interpolating the past excitation at the selected integer and fractional pitch delay. The adaptive codebook parameters obtained by the search are the adaptive codebook of the excitation signal, the integer and fractional pitch delay, the adaptive codebook gain, and the quantized adaptive codebook gain.
The closed-loop pitch search is performed by minimizing the mean-squared weighted error between the original speech and the reconstructed speech. This requires finding the smallest of the mean-squared weighted errors corresponding to the delay values in the search range; the error for each delay value is determined by the adaptive codebook search target signal and by the response of the weighted synthesis filter to the past excitation. For AMR-NB, clause 5.6 of 3GPP TS 26.090-310 explains this: the best integer delay is obtained by first finding the integer delay k that maximizes the term r(k) given by formula (1),
$$ r(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}} \tag{1} $$
where x(n) is the target signal and y_k(n) is the past excitation filtered at integer delay k. The fractional delays around the best integer delay are evaluated by interpolating the normalized term r(k), and the fractional delay that gives the maximum is the best fractional delay. The excitation values are stored in the excitation buffer u(n), n = -(143+11), ..., 39; during the search stage the values u(n), n = 0, 1, ..., 39, are the linear prediction (LP) residual. The excitation signal of each subframe is the sum of the adaptive codebook signal of the current subframe scaled by the quantized adaptive codebook gain and the fixed codebook signal scaled by the quantized fixed codebook gain. On this point see clause 5.9 of 3GPP TS 26.090-310, where equation (64) gives the mathematical expression of the excitation.
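The integer-delay stage of the search described above can be sketched as follows. This is an illustration only: it assumes the filtered past excitation y_k has already been computed for each candidate delay and is supplied in a dictionary, and it assumes the square-root normalization of the denominator used in the standard's closed-loop search. The function name and argument layout are hypothetical.

```python
import math


def best_integer_delay(x, y_by_delay):
    """Return (k, r(k)) for the delay k that maximizes the normalized term.

    x          : target signal of one subframe
    y_by_delay : dict mapping each candidate delay k to its filtered
                 past excitation y_k (same length as x)
    """
    best_k, best_r = None, -math.inf
    for k, y in y_by_delay.items():
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # r(k) is undefined for an all-zero y_k
        r = sum(a * b for a, b in zip(x, y)) / math.sqrt(energy)
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r
```

The fractional-delay refinement by interpolation is not shown.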
The fixed codebook search of AMR-NB is described in detail in clause 5.7 of 3GPP TS 26.090-500. The fixed codebook of AMR-NB is an algebraic codebook, and the fixed codebook parameters obtained by the search are the fixed codebook vector, the fixed codebook gain, and the quantized fixed codebook gain.
In AMR-NB speech decoding, the LP (linear prediction) filter parameters are decoded for every frame, giving the LP filter coefficients used to reconstruct the sound signal of each subframe. The excitation signal of each subframe is constructed by adding the adaptive codebook signal scaled by the adaptive codebook gain to the fixed codebook signal scaled by the fixed codebook gain; the adaptive codebook gain and the fixed codebook signal used here are the quantized values looked up in the quantization tables from the decoded adaptive codebook gain index and fixed codebook index. The adaptive codebook signal of AMR-NB is synthesized from the excitation signal of the previous subframe: the integer and fractional pitch delay are obtained by decoding the adaptive codebook index, and the adaptive codebook signal is obtained by interpolating the excitation signal of the previous subframe at that integer and fractional pitch delay.
Fixed codebook gain quantization in AMR-NB comprises obtaining the predicted fixed codebook gain from the quantified prediction errors of previous subframes, and quantizing the correction factor between the fixed codebook gain and the predicted fixed codebook gain. The quantified prediction error of a subframe is the logarithm of the quantized correction factor scaled by a fixed ratio.
3GPP TS 26.090 explains the fixed codebook gain quantization of AMR-NB. For example, formulas (54) and (56) in clause 5.8 of TS 26.090-310, reproduced below as formulas (3) and (4), show how the quantified prediction error affects the predicted gain:
$$ \tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i) \tag{3} $$
$$ g_c' = 10^{0.05\,(\tilde{E}(n) + \bar{E} - E_I)} \tag{4} $$
Formula (3) defines the predicted energy $\tilde{E}(n)$ of subframe n. [b1 b2 b3 b4] = [0.68 0.58 0.34 0.19] are the moving-average (MA) prediction coefficients, and $\hat{R}(k)$ is the quantified prediction error of subframe k. Formula (4) defines the predicted fixed-codebook gain $g_c'$. $\bar{E}$ is the mean of the innovation energy and takes a different constant value for each mode, for example 36 decibels (dB) at 12.2 kb/s; $E_I$ is the mean innovation energy. The correction factor between the fixed codebook gain and the predicted fixed codebook gain is the ratio of the former to the latter. Formula (58) in clause 5.8 of TS 26.090-310 states that the prediction error R(n) is 20 times the logarithm of this correction factor; the quantified prediction error is 20 times the logarithm of the quantized correction factor.
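Formulas (3) and (4) can be worked through in a few lines. The sketch below is illustrative: the function names and the argument order (most recent subframe first) are assumptions, and all energies are taken to be in dB as in the text.

```python
# MA prediction coefficients b1..b4 as given in the text.
B = (0.68, 0.58, 0.34, 0.19)


def predicted_energy(past_errors_db):
    """Formula (3). past_errors_db[0] is the quantified prediction error
    of subframe n-1, past_errors_db[3] that of subframe n-4."""
    return sum(b * r for b, r in zip(B, past_errors_db))


def predicted_gain(past_errors_db, mean_energy_db, innovation_energy_db):
    """Formula (4): g'_c = 10 ** (0.05 * (E~(n) + E_bar - E_I))."""
    e_tilde = predicted_energy(past_errors_db)
    return 10.0 ** (0.05 * (e_tilde + mean_energy_db - innovation_energy_db))
```

For instance, with all past errors at 0 dB, a mean energy of 36 dB and an innovation energy of 16 dB, the exponent is 0.05 × 20 = 1, so the predicted gain is 10.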
The sampled digital sound frame is preprocessed, and the preprocessed frame passes through linear prediction and quantization, adaptive codebook search and fixed codebook search to form the synthesized digital sound frame. The formants of the synthesized frame are determined mainly by the linear prediction analysis (LPC). More precisely, for AMR-NB, once the LSPs have been converted into LP coefficients, a 10th-order linear prediction synthesis filter is determined by formula (7):
$$ H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}} \tag{7} $$
where $\hat{a}_i$ (i = 1, ..., m; m = 10) are the quantized LP coefficients.
For AMR-NB and AMR-WB, the output obtained by filtering the excitation signal through the linear prediction synthesis filter is the synthesized digital sound frame. The poles of the synthesis filter therefore correspond to the frequencies and bandwidths of the formants of the synthesized frame. These formants show up in the strength of the time-domain waveform and strongly affect what is heard.
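Filtering the excitation through the all-pole filter of formula (7) amounts to the recursion s(n) = u(n) - Σ â_i s(n-i). The sketch below shows that recursion for a single block, assuming zero filter memory at the start; the function name `lp_synthesis` is hypothetical, and a real decoder would carry the memory across subframes.

```python
def lp_synthesis(excitation, a_hat):
    """Filter the excitation through H(z) = 1 / (1 + sum_i a_hat[i-1] z^-i).

    a_hat holds the quantized LP coefficients a_1..a_m. Past outputs
    before the start of the block are assumed to be zero.
    """
    out = []
    for n, u in enumerate(excitation):
        acc = u
        for i, a in enumerate(a_hat, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out
```

With the single coefficient `a_hat = [-0.5]` the filter has one pole at z = 0.5, so a unit impulse decays by half at each sample.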
The current voice activity detection (VAD) method of the AMR vocoder first computes the difference between the level of the preprocessed input signal and the background noise estimate, and then computes the VAD decision threshold. The initial VAD decision is made by comparing the difference with the threshold: if the difference is greater than the threshold, the initial decision is a speech frame; if it is less than or equal to the threshold, the initial decision is a no-speech frame. The final VAD decision combines the initial decision with the results of other detections, such as pitch detection on the preprocessed digital sound signal.
The purpose of pitch (tone) detection is to detect tones in the signal, and also other signals with strong periodicity. It is done by comparing the open-loop pitch gain with a set threshold. If the open-loop pitch gain is greater than the threshold (TONE_THR), a tone is detected and the tone flag is set. The algorithm is as follows:
if(t0>TONE_THR×t1)
tone=1
where
$$ t_0 = \sum_{n} s_w(n)\, s_w(n-k) \tag{8} $$
$$ t_1 = \sum_{n} s_w^2(n-k) \tag{9} $$
s_w is the weighted sound signal of the frame under detection, k is the open-loop pitch delay, and n ranges over 0 to 159, 0 to 79, or 80 to 159 of the current frame.
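The tone test built from formulas (8) and (9) can be sketched as below. The layout is an assumption made for illustration: the weighted signal is held in one list with enough history before the frame, `frame_start` is the list index of sample n = 0, and the threshold is passed in by the caller rather than fixed to the standard's value.

```python
def tone_flag(sw, frame_start, k, n_range, tone_thr):
    """Return 1 if t0 > tone_thr * t1, else 0.

    sw          : weighted signal, with at least k samples of history
                  before frame_start
    frame_start : index in sw of sample n = 0 of the current frame
    k           : open-loop pitch delay in samples
    n_range     : the values of n to sum over, e.g. range(0, 160)
    """
    if frame_start < k:
        raise ValueError("not enough history before the frame")
    t0 = sum(sw[frame_start + n] * sw[frame_start + n - k] for n in n_range)
    t1 = sum(sw[frame_start + n - k] ** 2 for n in n_range)
    return 1 if t0 > tone_thr * t1 else 0
```

A signal that repeats every 4 samples sets the flag when k = 4, because then t0 equals t1.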
The VAD of AMR-NB also has to work together with discontinuous transmission (DTX). DTX uses the VAD results of several input signal frames to detect that a segment of speech has ended, and only then starts the discontinuous transmission of silence descriptor (SID) frames. 3GPP TS 26.093 describes one DTX implementation.
DTX requires that, when a segment of speech ends, several (for example 8) consecutive frames be used to produce a SID frame. That is, after several (for example 7) consecutive input signal frames whose VAD result is no speech have been encoded at a speech-mode coding rate, the following frame (for example the 8th) is encoded as SID_FIRST to mark the end of the speech segment. Once the SID_FIRST frame has been sent, SID_UPDATE frames are sent periodically (for example every 8 frames) for as long as there is no speech; the first SID_UPDATE frame must be sent at a specific time after the SID_FIRST frame (for example at the 3rd frame). There is one exception: if the VAD result of the input signal frame that follows a speech frame is no speech, and less than a certain time (for example 24 frames) has passed since the previous segment of speech ended, that frame is encoded as a SID_FIRST frame.
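The frame sequence described above can be sketched as a small scheduler. This is a simplified illustration, not the procedure of TS 26.093: it uses the example numbers from the text (7 hangover frames, first SID_UPDATE at the 3rd frame after SID_FIRST, then one every 8 frames), it marks the frames in between as NO_DATA, and it leaves out the 24-frame exception. The function and parameter names are hypothetical.

```python
def tx_types(vad_flags, hangover=7, first_update_offset=3, update_period=8):
    """Map per-frame VAD results (True = speech) to transmission types."""
    out = []
    silent_run = 0      # consecutive no-speech frames seen so far
    since_first = None  # frames elapsed since SID_FIRST was sent
    for speech in vad_flags:
        if speech:
            silent_run, since_first = 0, None
            out.append("SPEECH_GOOD")
            continue
        silent_run += 1
        if silent_run <= hangover:
            # Hangover frames are still coded at a speech-mode rate.
            out.append("SPEECH_GOOD")
        elif silent_run == hangover + 1:
            out.append("SID_FIRST")
            since_first = 0
        else:
            since_first += 1
            due = (since_first >= first_update_offset
                   and (since_first - first_update_offset) % update_period == 0)
            out.append("SID_UPDATE" if due else "NO_DATA")
    return out
```

For one speech frame followed by 20 no-speech frames, the 8th no-speech frame becomes SID_FIRST and SID_UPDATE frames follow 3 and 11 frames after it.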
Summary of the invention
Technical problem to be solved
The VAD used in the prior art operates on the digital sound signal frame formed by sampling the speech input, or on the preprocessed frame formed from that sampled frame. The correlation detection in pitch detection, however, involves not only the frame formed by weighting the current preprocessed frame but also the weighted signal on some sample points of the previous preprocessed frame. Consequently, even when pitch detection finds strong correlation and so causes the VAD result to be speech, the VAD result of that previous preprocessed frame may still be no speech.
In formula (8), s_w(n) (n = 0, 1, ..., 159) involves only sample points of the current frame, whereas s_w(n-k) (n = 0, 1, ..., 159) involves sample points of both the current frame and the preceding frame. Existing pitch detection makes a VAD decision only for the frame that contains the former sample points; for the preceding frame, which contains some of the latter sample points, no corresponding VAD decision is made. Under certain conditions the following can happen: pitch detection finds that the correlation between the current frame and the previous frame is high enough to indicate a tone, so the coding rate of the current frame is set to a speech mode, yet the previous frame has been assigned the background noise mode.
Technical solution
The present invention applies the result of sound-signal correlation detection to determining the coding rate of every sound signal frame whose sample points are involved in that correlation detection.
For a sound signal frame that is being encoded or is about to be encoded, backward correlation detection is performed. That is, in addition to the pitch detection provided by the 3GPP standard, it is also detected whether the sample points in the immediately following signal frame are correlated with the frame, and the transmission type TX_TYPE of the current frame is determined from the result. For example, the correlation between the current frame and the signal S(n) (n = 0, 1, ..., 79) on the sample points of the first half of the following frame is calculated.
Starting to encode the AMR-NB frame only after all sample points of the following signal frame have been obtained introduces a larger delay. In the prior art, VAD is run on a sound signal frame as soon as the frame has been obtained (or as soon as the frame and the adjacent following 1/4 frame have been obtained), and the frame is then AMR-NB encoded according to the VAD result; compared with these methods, waiting for all sample points of the following frame before encoding causes a larger delay. To reduce the delay, the range of the backward correlation search for the current frame can be narrowed: for a delay j, only the signal on the sample points of the first half of the following adjacent frame is used. For example, when j = 143, correlation is detected only for the signal S(n) (n = 160, 161, ..., 239) on sample points of the following adjacent frame, and the signal S(n) (n = 240, 241, ..., 303) is not examined.
In fact, encoding of the AMR-NB frame can start as soon as the signal on all sample points of the current sound signal frame is available. Encoding an AMR-NB frame takes a certain processing time, which is less than one frame length (20 milliseconds) but is generally between 1/4 and 3/4 of a frame length. To use the signal on the sample points of the next frame received during this processing time for correlation detection, the frame can first be encoded at a speech-mode coding rate; the transmission type and coding rate of the AMR-NB coded frame sent to the decoder side are then determined from the result of the correlation detection. If the determined transmission type TX_TYPE is not normal speech SPEECH_GOOD, the speech-mode coded frame is discarded and the background-noise-mode coding rate is selected instead. Coded frames at the background-noise-mode rate are those whose transmission type is silence descriptor SID or no data NO_DATA; they too can be formed during the encoding processing time.
To avoid interference from unnecessary low-frequency components, the sound signal used for correlation detection can be one that has been preprocessed by a high-pass filter. The preprocessing can follow the method given in clause 5.1, Pre-processing, of 3GPP TS 26.090.
The degree of correlation of a sound signal is obtained by calculating the value of its autocorrelation function, which has the following form:
$$ r(d) = \sum_{n=0}^{N-1} s(n)\, s(n-d) $$
where d is the delay, r(d) is the autocorrelation function, s(n) is the value of the sound signal at sample point n, and N is the number of sample points involved in the autocorrelation function.
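The autocorrelation sum can be written directly. In this sketch the signal is stored in a list that includes the earlier samples needed for s(n-d), and `start` (a hypothetical parameter) is the list index that corresponds to n = 0.

```python
def autocorrelation(s, start, N, d):
    """r(d) = sum over n = 0..N-1 of s(n) * s(n - d).

    Sample n of the formula is stored at s[start + n], so the list must
    hold at least d samples before index start.
    """
    if start < d:
        raise ValueError("not enough past samples for delay d")
    return sum(s[start + n] * s[start + n - d] for n in range(N))
```

For `s = [1, 2, 3, 4, 5, 6]`, `start = 2`, `N = 4` and `d = 2`, the sum is 3·1 + 4·2 + 5·3 + 6·4.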
The sound signal can also be weighted before the autocorrelation function is calculated. The following is one scheme that weights the input sound signal subframe by subframe:
$$ s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \quad n = 0, \ldots, L-1 $$
where s_w(n) is the weighted sound signal; s(n) is the signal in the input sound signal frame or the subsequent sound signal; the weighting factor γ1 is greater than or equal to 0 and less than 1; the weighting factor γ2 is greater than or equal to 0 and less than 0.7; a_i are the linear prediction (LP) coefficients; and L, the length of a subframe, is 40 sample points. If γ1 and γ2 are both 0, the weighting leaves the original sound signal unchanged. Because the correlation detection uses only the signal on part of the sample points of the adjacent following frame, the autocorrelation function need only be calculated over those sample points.
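The weighting recursion can be sketched as follows. For brevity this illustration applies one fixed set of LP coefficients to the whole input and starts from zero filter memory, whereas the text weights subframe by subframe; the function name `weight_signal` is hypothetical.

```python
def weight_signal(s, a, gamma1, gamma2):
    """Apply s_w(n) = s(n) + sum a_i g1^i s(n-i) - sum a_i g2^i s_w(n-i).

    s      : input samples
    a      : LP coefficients a_1..a_10 (any length is accepted here)
    gamma1 : weighting factor, 0 <= gamma1 < 1
    gamma2 : weighting factor, 0 <= gamma2 < 0.7
    Samples before the start of s are assumed to be zero.
    """
    if not (0 <= gamma1 < 1 and 0 <= gamma2 < 0.7):
        raise ValueError("weighting factor out of range")
    sw = []
    for n in range(len(s)):
        acc = s[n]
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * gamma1 ** i * s[n - i]
                acc -= a_i * gamma2 ** i * sw[n - i]
        sw.append(acc)
    return sw
```

Setting both factors to 0 returns the input unchanged, which matches the remark that the weighting then has no effect.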
The weighted autocorrelation function, and the energy of the weighted sound signal corresponding to the past signal of the subsequent sound signal at the optimum delay, have the following forms:
$$ R(d) = \sum_{n=160}^{M+159} s_w(n)\, s_w(n-d)\, w(d) $$
$$ E(d_{max}) = \sum_{n=160}^{M+159} s_w(n-d_{max})\, s_w(n-d_{max}) $$
where R(d) is the weighted autocorrelation function; d is the delay; d_max is the delay at which R(d) takes its maximum; E(d_max) is the energy of the weighted sound signal corresponding to the past signal of the subsequent sound signal at the optimum delay; w(d) is a weighting function, and when w(d) is 1, R(d) is simply the autocorrelation function; s_w(n) is the weighted sound signal; and M is the number of sample points contained in the subsequent sound signal of no more than 20 milliseconds. Integers n in the range 0 to 159 correspond to sample points of the input sound signal frame, and integers n in the range 160 to M+159 correspond to sample points of the subsequent sound signal.
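Putting the two sums together, the backward correlation test can be sketched as below. This is an illustration under stated assumptions: the whole delay region is treated as a single range, the default delay bounds of 20 and 143 samples follow the pitch search range mentioned in this description, the default weighting is w(d) = 1, and the decision threshold is supplied by the caller. All function and parameter names are hypothetical.

```python
FRAME = 160  # samples in one 20 ms frame at 8 kHz


def backward_correlation(sw, m, d_min=20, d_max=143, w=lambda d: 1.0):
    """Return (best delay, autocorrelation at it, energy E at it).

    sw holds the weighted current frame at indices 0..159 followed by
    the m weighted samples of the subsequent signal at 160..159+m.
    """
    if not 0 < m <= FRAME or len(sw) < FRAME + m:
        raise ValueError("need 160 frame samples plus m following samples")
    span = range(FRAME, FRAME + m)

    def r(d):
        return sum(sw[n] * sw[n - d] for n in span)

    # The weighted function R(d) = r(d) * w(d) selects the optimum delay.
    best = max(range(d_min, d_max + 1), key=lambda d: r(d) * w(d))
    energy = sum(sw[n - best] ** 2 for n in span)
    return best, r(best), energy


def is_backward_correlated(sw, m, threshold, **kwargs):
    """True when r(d_max) / E(d_max) exceeds the threshold."""
    _, corr, energy = backward_correlation(sw, m, **kwargs)
    return energy > 0 and corr > threshold * energy
```

When this test is true, the current frame would be given the transmission type SPEECH_GOOD; otherwise the decision is left to the ordinary VAD and DTX logic. Comparing `corr` with `threshold * energy` avoids a division when the energy is zero.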
The technical solution for an AMR-NB encoder that chooses between the background noise coding rate and the speech-mode (non-background-noise) coding rates according to the backward correlation detection of the current frame is as follows:
An adaptive multi-rate narrowband (AMR-NB) encoder having a discontinuous transmission (DTX) control and operation unit. The encoder receives an input sound signal frame with a frame length of 20 milliseconds, also receives a subsequent sound signal of no more than 20 milliseconds that is adjacent to the input sound signal frame, and generates a weighted sound signal for the input sound signal frame and the subsequent sound signal. It comprises an autocorrelation calculating unit which, for the weighted sound signal corresponding to the subsequent sound signal, determines the autocorrelation function and the weighted autocorrelation function between a predetermined minimum delay and a predetermined maximum delay; divides the region between the maximum delay and the minimum delay into at least one range; identifies, in each range, the delay corresponding to the maximum of the weighted autocorrelation function as an optimum delay; calculates the value of the autocorrelation function at the optimum delay; and calculates the energy of the weighted sound signal corresponding to the past signal of the subsequent sound signal at the optimum delay. If, for at least one optimum delay, the ratio of the value of the autocorrelation function at that delay to the energy of the weighted sound signal corresponding to the past signal of the subsequent sound signal at that delay is greater than a predetermined value, the DTX control and operation unit sets the transmission type TX_TYPE of the input sound signal frame to normal speech SPEECH_GOOD.
The weighting function that generates the weighted sound signal for the input sound signal frame and the subsequent sound signal has the following form:
$$ s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \quad n = 0, \ldots, L-1 $$
where s_w(n) is the weighted sound signal; s(n) is the signal in the input sound signal frame or the subsequent sound signal; the weighting factor γ1 is greater than or equal to 0 and less than 1; the weighting factor γ2 is greater than or equal to 0 and less than 0.7; γ1 and γ2 both equal to 0 is equivalent to applying no weighting (s_w(n) = s(n)); a_i are the linear prediction (LP) coefficients; and L is the length of a subframe. The LP coefficients are calculated by the scheme given in clause 5.2, Linear prediction analysis and quantization, of 3GPP TS (Technical Specification) 26.090-500, that is, by the linear prediction analysis and quantization method mentioned in the background art.
The weighted autocorrelation function, and the energy of the weighted sound signal corresponding to the past signal of the subsequent sound signal at the optimum delay, have the following forms:
$$ R(d) = \sum_{n=160}^{M+159} s_w(n)\, s_w(n-d)\, w(d) $$
$$ E(d_{max}) = \sum_{n=160}^{M+159} s_w(n-d_{max})\, s_w(n-d_{max}) $$
where R(d) is the weighted autocorrelation function; d is the delay; w(d) is a weighting function, and when w(d) is 1, R(d) is simply the autocorrelation function; d_max is the delay at which R(d) takes its maximum; and E(d_max) is the energy of the weighted sound signal corresponding to the past signal of the subsequent sound signal at the optimum delay. When the region between the maximum delay and the minimum delay is divided into more than one range, R(d) has one maximum in each range and there are several values of d_max; when the whole region is treated as a single range, R(d) has only one maximum and there is only one d_max. s_w(n) is the weighted sound signal, and M is the number of sample points contained in the subsequent sound signal of no more than 20 milliseconds. Integers n in the range 0 to 159 correspond to sample points of the input sound signal frame, and integers n in the range 160 to M+159 correspond to sample points of the subsequent sound signal.
A suitable weighting function w(d) helps reduce to 1 the number of ranges into which the region between the maximum delay and the minimum delay is divided. For example, w(d) can take the following form:
$$ w(d) = w_l(d)\, w_n(d) $$
where d is the delay. The low-delay weighting function is $w_l(d) = d^{\log_2 K_{nw}}$. The adjacent-past-frame delay weighting function is $w_n(d) = 1$ or $w_n(d) = (|T_{old} - d| + d_L)^{\log_2 K_w}$. Here d_L is the minimum delay, T_old is the open-loop pitch delay of the input sound signal frame, K_nw is the parameter that adjusts the weighting in the neighbourhood of the delay d, and K_w is the parameter that adjusts the weighting in the neighbourhood of the delay (|T_old - d| + d_L).
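The product form of w(d) can be evaluated as in the sketch below. It assumes the power-law reading of the two factors, w_l(d) = d^(log2 K_nw) and w_n(d) = (|T_old - d| + d_L)^(log2 K_w), which is a reconstruction of the flattened formulas; the function names and the convention of passing `t_old=None` to select w_n(d) = 1 are hypothetical.

```python
import math


def low_delay_weight(d, k_nw):
    """w_l(d) = d ** log2(K_nw); values of K_nw below 1 favour short delays."""
    return d ** math.log2(k_nw)


def neighbour_weight(d, t_old, d_l, k_w):
    """w_n(d) = (|T_old - d| + d_L) ** log2(K_w); favours delays near T_old
    when K_w is below 1."""
    return (abs(t_old - d) + d_l) ** math.log2(k_w)


def delay_weight(d, k_nw, t_old=None, d_l=20, k_w=1.0):
    """w(d) = w_l(d) * w_n(d), with w_n(d) = 1 when no T_old is given."""
    w_n = 1.0 if t_old is None else neighbour_weight(d, t_old, d_l, k_w)
    return low_delay_weight(d, k_nw) * w_n
```

With K_nw = 0.5 the exponent is -1, so `low_delay_weight(8, 0.5)` is 1/8, and doubling the delay halves the weight.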
Sampling the adjacent subsequent signal of the current frame while the speech-mode AMR-NB frame of the current frame is being encoded suits the above encoder. Because linear prediction and quantization, adaptive codebook search and fixed codebook search for the current frame run in parallel with the sampling of the adjacent subsequent signal, the result of correlation detection on the sampled subsequent signal can control whether the coding rate of the current frame is chosen from the speech modes or the background noise mode.
Given the real-time requirement on the decoded sound signal, the time length of the subsequent signal needs to be limited, so that the time from receiving the current frame and starting to encode it until its AMR-NB frame is complete can be kept within 20 milliseconds. For a given length of subsequent signal, as long as the total time needed to complete the steps for encoding each 20 millisecond signal frame (linear prediction and quantization, codebook search, voice activity detection VAD, backward correlation detection, and DTX control and operation) is less than 20 milliseconds, the sound signal can be encoded without interruption and sent continuously to the decoding side. Because sampling the subsequent signal can proceed in parallel with linear prediction and quantization, codebook search (adaptive codebook search and fixed codebook search) and VAD, introducing backward detection into AMR-NB variable-rate coding still keeps the delay small. Given the computing speed of current microprocessors and DSPs (digital signal processors), 10 milliseconds is a suitable length for the subsequent signal.
The minimum delay and maximum delay used in the correlation detection can be those of the pitch search range of 3GPP AMR-NB, namely 20 sample points (or 18 sample points) and 143 sample points respectively.
The technical scheme of tut signal coder can be used for the field of other any voice coding, so the present invention proposes the method for the transmission types of following definite AMR-NB:
A method for determining the transmission type TX_TYPE of discontinuous transmission (DTX) in adaptive multi-rate narrowband (AMR-NB) coding. In this method, an input speech signal frame with a frame length of 20 milliseconds is received; a follow-up speech signal that is adjacent to the input speech signal frame and no longer than 20 milliseconds is also received; and a weighted speech signal is generated for the input speech signal frame and the follow-up speech signal;
for the weighted speech signal corresponding to the follow-up speech signal, an autocorrelation function and a weighted autocorrelation function are determined between a predetermined maximum delay and minimum delay; the region between the maximum delay and the minimum delay is divided into at least one range; within each range, the delay at which the weighted autocorrelation function reaches its maximum is taken as the optimum delay; the value of the autocorrelation function at the optimum delay is calculated; and the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at the optimum delay is calculated;
if, for at least one optimum delay, the ratio of the autocorrelation value to the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at that optimum delay is greater than a predetermined value, the transmission type TX_TYPE of the input speech signal frame is determined to be normal speech, SPEECH_GOOD.
In the coding method above, the weighting function that generates the weighted speech signal for the input speech signal frame and the follow-up speech signal has the following form:
$$s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^{i}\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^{i}\, s_w(n-i), \qquad n = 0, \ldots, L-1$$
where $s_w(n)$ is the weighted speech signal, $s(n)$ is the signal in the input speech signal frame or the follow-up speech signal, the weighting factor $\gamma_1$ is greater than or equal to 0 and less than 1, the weighting factor $\gamma_2$ is greater than or equal to 0 and less than 0.7, $a_i$ are the linear prediction (LP) coefficients, and $L$ is the subframe length.
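The weighting recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `weighted_speech` is hypothetical, the samples before n = 0 are assumed to be zero (a real encoder would carry filter memory across subframes), and the default factors 0.94 and 0.6 are the values quoted for Embodiment 1.

```python
def weighted_speech(s, a, gamma1=0.94, gamma2=0.6):
    """Compute s_w(n) = s(n) + sum_i a_i*gamma1^i*s(n-i) - sum_i a_i*gamma2^i*s_w(n-i).

    s      : input samples (list of floats)
    a      : LP coefficients a_1..a_p (the document uses p = 10)
    Assumption: samples with negative index are treated as zero.
    """
    sw = []
    for n in range(len(s)):
        acc = s[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc += a[i - 1] * gamma1 ** i * s[n - i]   # numerator part
                acc -= a[i - 1] * gamma2 ** i * sw[n - i]  # recursive part
        sw.append(acc)
    return sw
```

With both factors set to 0 the function returns its input unchanged, which matches the remark in Embodiment 3 that zero weighting factors leave the signal as it is.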
In the method above, the weighted autocorrelation function, and the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at the optimum delay, have the following forms:
$$R(d) = \sum_{n=160}^{M+159} s_w(n)\, s_w(n-d)\, w(d)$$
$$E(d_{max}) = \sum_{n=160}^{M+159} s_w(n-d_{max})\, s_w(n-d_{max})$$
where $R(d)$ is the weighted autocorrelation function; $d$ is the delay; $d_{max}$ is the delay at which $R(d)$ reaches its maximum; $E(d_{max})$ is the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at the optimum delay; $w(d)$ is a weighting function; $s_w(n)$ is the weighted speech signal; and $M$ is the number of sample points contained in the follow-up signal of no more than 20 milliseconds. Integers $n$ from 0 to 159 correspond to the sample points of the input speech signal frame, and integers $n$ from 160 to $M+159$ correspond to the sample points of the follow-up speech signal.
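As a rough illustration of the two sums above, the following Python sketch evaluates the weighted autocorrelation, picks the delay that maximizes it, and computes the energy of the past signal at that delay. The function names are hypothetical, the buffer `sw` is assumed to hold the weighted samples of the current frame at indices 0 to 159 followed by the M follow-up samples, and the default weighting w(d) = 1 is an assumption made only to keep the example short.

```python
def weighted_autocorr(sw, d, m, w=lambda d: 1.0):
    """R(d): sum over the follow-up samples n = 160 .. m+159, scaled by w(d)."""
    return w(d) * sum(sw[n] * sw[n - d] for n in range(160, m + 160))


def past_energy(sw, d, m):
    """E(d): energy of the weighted signal delayed by d from the follow-up span."""
    return sum(sw[n - d] * sw[n - d] for n in range(160, m + 160))


def best_delay(sw, d_min, d_max, m, w=lambda d: 1.0):
    """Delay in [d_min, d_max] with the largest R(d); ties go to the smallest delay."""
    return max(range(d_min, d_max + 1),
               key=lambda d: weighted_autocorr(sw, d, m, w))
```

For a signal that repeats every 40 samples, `best_delay(sw, 20, 143, 80)` returns 40, and the ratio of the autocorrelation value to `past_energy` at that delay is 1.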
Likewise, a suitable weighting function $w(d)$ helps reduce to 1 the number of ranges into which the region between the maximum delay and the minimum delay is divided, for example a $w(d)$ of the following form:
$$w(d) = w_l(d)\, w_n(d)$$
where $d$ is the delay; the low-delay weighting function is $w_l(d) = d^{\log_2 K_{nw}}$; the adjacent-past-frame delay weighting function is $w_n(d) = 1$ or $w_n(d) = (|T_{old} - d| + d_L)^{\log_2 K_w}$; $d_L$ is the minimum delay; $T_{old}$ is the open-loop pitch delay of the input speech signal frame; $K_{nw}$ is the tuning parameter for the weighting around delay $d$; and $K_w$ is the tuning parameter for the weighting around $(|T_{old} - d| + d_L)$.
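The product form of the delay weighting can be sketched as follows. This is only an illustration under stated assumptions: the function name `delay_weight` is hypothetical, the text gives no numeric values for the tuning parameters, and passing `t_old=None` is used here to select the case in which the adjacent-past-frame weighting equals 1.

```python
import math


def delay_weight(d, k_nw, t_old=None, d_low=20, k_w=1.0):
    """w(d) = w_l(d) * w_n(d) with power-law weights.

    k_nw  : tuning parameter of the low-delay weighting (assumed value)
    k_w   : tuning parameter of the adjacent-past-frame weighting (assumed value)
    t_old : open-loop pitch delay of the input frame, or None for w_n(d) = 1
    d_low : minimum delay d_L
    """
    w_l = d ** math.log2(k_nw)
    if t_old is None:
        w_n = 1.0
    else:
        w_n = (abs(t_old - d) + d_low) ** math.log2(k_w)
    return w_l * w_n
```

With a tuning parameter below 1 the exponent is negative, so shorter delays receive larger weights; this is the property that lets a single search range replace several sub-ranges.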
The approach of the encoder above, in which sampling of the follow-up speech runs in parallel with linear prediction and codebook search, still applies here.
When correlation detection is performed on the adjacent follow-up signal, the energy of the excitation signal at the sample points reached by delaying the adjacent follow-up signal by the optimum delay can also be checked, and the coding mode of the current frame (speech mode or background noise mode) can be determined by comparing that excitation energy with a threshold. One point needs explaining here: of the excitation signal at the sample points reached by delaying the adjacent follow-up signal by the optimum delay, only the part at sample points of the current frame can be obtained without running linear prediction and codebook search on the follow-up signal. Where there is a real-time requirement, the coding mode of the current frame can be determined solely from the energy of the excitation signal at the sample points within the current frame that are occupied by the past signal determined by the optimum delay. Where there is no real-time requirement, the coding mode can be determined from the energy at all sample points of the past signal determined by the optimum delay, regardless of whether those sample points lie within the current frame; that is, the excitation signal of the follow-up signal is generated so that the energy at all sample points can be obtained.
The following is the technical scheme for determining the AMR-NB coding mode that takes the excitation signal energy into account:
An adaptive multi-rate narrowband (AMR-NB) encoder having a discontinuous transmission (DTX) control and operation unit. The encoder receives an input speech signal frame with a frame length of 20 milliseconds; also receives a follow-up speech signal that is adjacent to the input speech signal frame and no longer than 20 milliseconds; and generates a weighted speech signal for the input speech signal frame and the follow-up speech signal. After receiving the input speech signal frame, the encoder begins linear prediction and quantization, the adaptive codebook search, and the fixed codebook search on that frame, and it receives the follow-up speech signal while these operations are in progress; that is, at least the first of these operations is scheduled between receiving the input speech signal frame and receiving the follow-up speech signal, and the excitation signal of the input speech signal frame is generated.
The encoder comprises an autocorrelation calculation unit. For the weighted speech signal corresponding to the follow-up speech signal, this unit determines an autocorrelation function and a weighted autocorrelation function between a predetermined maximum delay and minimum delay; divides the region between the maximum delay and the minimum delay into at least one range; identifies, within each range, the delay at which the weighted autocorrelation function reaches its maximum as the optimum delay; calculates the value of the autocorrelation function at the optimum delay; calculates the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at the optimum delay; and calculates the energy of the excitation signal at those sample points of the past signal of the follow-up speech signal at the optimum delay that lie within the input speech signal frame.
If, for at least one optimum delay, the ratio of the autocorrelation value to the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at that optimum delay is greater than a predetermined value, and the energy of the excitation signal at the sample points within the input speech signal frame of the past signal at that optimum delay is greater than the threshold set for this excitation signal, the set threshold being the product of the number of sample points of the past signal and a per-sample energy threshold, then the DTX control and operation unit determines the transmission type TX_TYPE of the input speech signal frame to be normal speech, SPEECH_GOOD.
Here the threshold compared with the excitation energy is the number of excitation sample points within the current frame multiplied by a predetermined value. This predetermined value depends on the dynamic range and representation of the input signal frame. For example, to obtain the same effect for the same input speech signal, the predetermined value for an input with a 13-bit dynamic range represented as 16-bit signed integers is entirely different from that for an input with a 20-bit dynamic range represented as 32-bit signed integers.
The weighted speech signal for this encoder can be generated using the scheme given earlier in this document; likewise, the earlier schemes for calculating the weighted autocorrelation function and the energy of the weighted speech signal apply here too. The maximum of the weighted autocorrelation function can also be searched over the whole region, or over several ranges, between the minimum delay and the maximum delay, following the scheme given earlier.
Considering the real-time requirement and the processing speed achievable by current CPUs (central processing units) and DSPs (digital signal processors), the follow-up speech signal adjacent to the current frame that is used for correlation detection should not be too long, so a duration of 10 milliseconds is a compromise. To reduce the computational cost of the search, the search range can be narrowed, for example by setting the minimum delay to 80.
The backward correlation detection scheme for the current frame that takes the excitation signal energy into account applies not only to the AMR-NB encoder above but also to other situations that require AMR-NB coding. Hence the following method:
A method for determining the transmission type TX_TYPE of discontinuous transmission (DTX) in adaptive multi-rate narrowband (AMR-NB) coding. In this method, an input speech signal frame with a frame length of 20 milliseconds is received; a follow-up speech signal that is adjacent to the input speech signal frame and no longer than 20 milliseconds is also received; a weighted speech signal is generated for the input speech signal frame and the follow-up speech signal; after the input speech signal frame is received, linear prediction and quantization, the adaptive codebook search, and the fixed codebook search are started on that frame; and at least the first of these operations is scheduled between receiving the input speech signal frame and receiving the follow-up speech signal;
for the weighted speech signal corresponding to the follow-up speech signal, an autocorrelation function and a weighted autocorrelation function are determined between a predetermined maximum delay and minimum delay; the region between the maximum delay and the minimum delay is divided into at least one range; within each range, the delay at which the weighted autocorrelation function reaches its maximum is taken as the optimum delay; the value of the autocorrelation function at the optimum delay is calculated; the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at the optimum delay is calculated; and the energy of the excitation signal at those sample points of the past signal of the follow-up speech signal at the optimum delay that lie within the input speech signal frame is calculated;
if, for at least one optimum delay, the ratio of the autocorrelation value to the energy of the weighted speech signal corresponding to the past signal of the follow-up speech signal at that optimum delay is greater than a predetermined value, and the energy of the excitation signal at the sample points within the input speech signal frame of the past signal at that optimum delay is greater than the threshold set for this excitation signal, the set threshold being the product of the number of sample points of the past signal and a per-sample energy threshold, the transmission type TX_TYPE of the input speech signal frame is determined to be normal speech, SPEECH_GOOD.
For the encoder or coding method above, the approach based on the energy of the excitation signal can be replaced by one based on the level of the excitation signal. The level of the excitation signal over a period of time is the sum of the absolute values of the excitation signal at all sample points in that period.
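The difference between the two measures is small enough to show directly. The helper names below are hypothetical; the definitions follow the text: energy is the sum of squared samples, level is the sum of absolute values.

```python
def excitation_energy(v):
    """Energy of an excitation segment: sum of squared sample values."""
    return sum(x * x for x in v)


def excitation_level(v):
    """Level of an excitation segment: sum of absolute sample values."""
    return sum(abs(x) for x in v)
```

The level avoids multiplications, which may matter on fixed-point hardware, but any threshold compared against it would need to be chosen separately from an energy threshold because the two quantities scale differently.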
Beneficial effect
The present invention applies the prior-art pitch detection method, which works on the correlation between the current frame to be encoded and already encoded past frames, to the signal of the current frame to be encoded and the not-yet-encoded adjacent following frame. When the correlation between the current frame and the signal in the adjacent following frame exceeds a predetermined threshold, the coding rate of the current frame is set to a speech-mode coding rate. In the prior art, by contrast, if the correlation between the current frame and the adjacent preceding frame does not reach the predetermined threshold, the coding rate of the current frame still cannot be set to a speech-mode rate, even when the correlation between the current frame and the adjacent following frame, as found when that following frame is encoded, does exceed the threshold.
As the background section explains, interpolating the previous subframe and the earlier excitation signal at the integer and fractional pitch delay yields the adaptive codebook signal. This adaptive codebook signal, amplified by the quantized adaptive codebook gain, is added to the fixed codebook signal amplified by the quantized fixed codebook gain, and the result is the excitation signal.
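The final summation step of that construction can be written out as a short sketch. The function name `build_excitation` and its argument names are hypothetical, and the interpolation that produces the adaptive codebook vector is deliberately left out; only the gain-scaled sum described in the text is shown.

```python
def build_excitation(adaptive, fixed, g_p, g_c):
    """Excitation = g_p * adaptive codebook vector + g_c * fixed codebook vector.

    adaptive : adaptive codebook signal for one subframe (from past excitation)
    fixed    : fixed codebook signal for the same subframe
    g_p, g_c : quantized adaptive and fixed codebook gains
    """
    if len(adaptive) != len(fixed):
        raise ValueError("codebook vectors must have the same length")
    return [g_p * a + g_c * c for a, c in zip(adaptive, fixed)]
```

Because the adaptive codebook vector is taken from past excitation, a past excitation that has been reset to zero contributes nothing through the first term, whatever the gain.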
The method of the present invention can detect the degree of correlation between the current signal frame and the signal in the adjacent following signal frame, so that a current frame whose correlation with the adjacent following frame exceeds the predetermined threshold is encoded at a speech-mode coding rate and a non-zero excitation signal is formed for it. Then, when the adjacent following frame is encoded in speech mode, the pitch delay given by the adaptive codebook parameters obtained from coding can be applied to the excitation signal of the current frame; that is, the excitation of the current frame, which is not all zeros, can contribute to forming the excitation of the adjacent following frame. If the current frame is encoded in background noise mode, then whatever the pitch delay of the adjacent following frame, the excitation of its preceding frame (the current frame), which has been reset to all zeros, contributes nothing to forming the excitation of the adjacent following frame. In other words, the strong correlation between the current frame and the following frame does not appear in the speech frames decoded from the coded frames.
The scheme of the present invention for determining the coding mode of the current frame from its correlation with the adjacent following frame can also be combined with a scheme that checks the contribution of the current frame's excitation signal to the following frame. Because a search method consistent with the pitch delay search is used, the optimum delay obtained by the scheme of the invention is a candidate for the open-loop pitch delay. The energy of the excitation signal corresponding to this candidate reflects, or predicts, the energy of the excitation signal at the pitch delay of the adjacent following frame. Current frames whose correlation detection result is "correlated" but whose excitation energy is very low can therefore be ignored and no longer encoded at a speech-mode coding rate, which reduces the transmission rate.
The method proposed by the invention for correlation detection on the weighted speech signal is close to the open-loop pitch detection method, so the open-loop pitch delay obtained by searching the frame following the current frame is close to the optimum delay obtained by correlation detection on the weighted speech signal. When the minimum and maximum delays used in the correlation detection of the invention match those used in the open-loop pitch search, one of the optimum delays found by correlation detection is certain to be close to the open-loop pitch delay. As long as the ratio of the autocorrelation value at this pitch delay to the weighted speech signal energy is greater than the threshold, and the energy of the excitation signal at those sample points in the current frame reached by delaying the follow-up signal by this pitch delay is greater than the predetermined value, the scheme of the invention detects this condition and sends a signal with the content "correlated" to the DTX control and operation module, so that the coding rate of the current frame is a speech-mode, non-background-noise rate. The benefit is clear: when the excitation of the current frame has a certain energy, and the ratio of the autocorrelation value at the optimum delay to the weighted speech signal energy shows that this excitation contributes to the composition of the adjacent next frame, encoding the current frame as a speech-mode frame keeps its excitation non-zero, whereas encoding it at the background noise rate makes its excitation zero, which harms the speech-mode coding of the adjacent next frame.
Because linear prediction and codebook search are performed first and the coding rate is determined afterwards, the excitation signal generated at the non-background-noise coding rate is available before the rate decision is made. Determining the coding rate of the current frame from the follow-up signal and the excitation signal is something the prior art does not offer.
Description of drawings
Fig. 1 is a block diagram of a dual-mode AMR-NB encoder with a backward correlation detection unit.
Fig. 2 is a block diagram of the backward correlation detection module in the AMR-NB encoder shown in Fig. 1.
Fig. 3 is a block diagram of the speech coding module in the AMR-NB encoder shown in Fig. 1.
Fig. 4 is a block diagram of a dual-mode AMR-NB encoder with a backward correlation detection unit that checks the excitation signal.
Fig. 5 is a block diagram of the backward correlation detection module in the AMR-NB encoder shown in Fig. 3.
Fig. 6 is an AMR-NB encoder with a backward correlation detection unit.
Fig. 7 is a block diagram of the backward correlation detection module in the AMR-NB encoder shown in Fig. 5.
Embodiment
Embodiment 1: a dual-mode adaptive multi-rate narrowband (AMR-NB) encoder for a UMTS system. As shown in Fig. 1, a 13-bit uniform pulse code modulation (PCM) signal frame 1, sampled at 8 kHz, is fed simultaneously to the speech coding module for the non-background-noise coding rates and to the background noise coding module for the background noise coding rate. The AMR-NB encoder also receives a 13-bit uniform PCM signal frame 5 sampled at 8 kHz. Signal frame 5 is related to signal frame 1 as follows: for the AMR-NB encoder, the first two subframes (the first 10 milliseconds) of the signal frame 5 it receives are identical to the last two subframes (the last 10 milliseconds) of the most recently received signal frame 1, and the last two subframes of signal frame 5 are identical to the first two subframes of the signal frame 1 that will be received next. Signal frame 5, which reaches the AMR-NB encoder 10 milliseconds later than signal frame 1, therefore contains the 10-millisecond follow-up signal adjacent to signal frame 1.
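The relation between signal frame 1 and signal frame 5 amounts to reading the same PCM stream through two windows offset by half a frame. The sketch below illustrates this under the stated 8 kHz rate (160 samples per 20 ms frame, 80 samples per 10 ms); the generator name `lookahead_frames` is hypothetical, and a real encoder would receive samples incrementally rather than slice a complete list.

```python
FRAME = 160  # samples in a 20 ms frame at 8 kHz
HALF = 80    # samples in 10 ms


def lookahead_frames(pcm):
    """Yield (frame1, frame5) pairs from a list of PCM samples.

    frame5 starts 10 ms after frame1, so its second half is the
    10 ms follow-up signal adjacent to frame1.
    """
    for start in range(0, len(pcm) - FRAME - HALF + 1, FRAME):
        frame1 = pcm[start:start + FRAME]
        frame5 = pcm[start + HALF:start + FRAME + HALF]
        yield frame1, frame5
```

The first half of each `frame5` equals the second half of its `frame1`, and its second half equals the first half of the next `frame1`, as the embodiment describes.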
The speech coding module (speech mode, non-background-noise coding rate) outputs the AMR-NB coded frame 11 of signal frame 1 at the non-background-noise coding rate to the coded-frame output selection module. The background noise coding module (background noise mode) outputs the AMR-NB silence descriptor coded frame 12 of signal frame 1 at the background noise coding rate to the coded-frame output selection module. The speech coding module also outputs the pre-processed digital speech signal frame 17, produced while encoding signal frame 1, to the voice activity detection module. As the coding flow of the speech coding module in Fig. 3 shows, digital speech signal frame 17 is the output of the pre-processing module: the values of signal frame 1 at all input sample points are divided by 2 and then passed through a high-pass filter with a cutoff frequency of 80 Hz. The voice activity detection module performs voice activity detection on digital speech signal frame 17 and outputs the result, VAD flag 18, to the discontinuous transmission (DTX) control and operation module. The DTX control and operation module outputs the transmission type (TX_TYPE) signal 19 to the coded-frame output selection module, which passes the received transmission type signal 19 on to the 3G (third-generation mobile communication) radio access network (AN).
Transmission type signal 19 is one of four types: normal speech (SPEECH_GOOD), first silence descriptor (SID_FIRST), silence descriptor update (SID_UPDATE), and no data (NO_DATA). When transmission type signal 19 is SPEECH_GOOD, the information bits 2 output by the coded-frame output selection module are the AMR-NB coded frame 11 at the non-background-noise coding rate. When it is SID_UPDATE, the information bits 2 are the AMR-NB silence descriptor (AMR-NB_SID) frame 12 at the background noise coding rate. When it is SID_FIRST, the information bits 2 are the SID_FIRST frame formed according to 3GPP technical specification TS 26.093. When it is NO_DATA, the information bits 2 are invalid for the 3G AN.
If the DTX control and operation module, based on the input VAD flag 18, sets transmission type signal 19 to SPEECH_GOOD, it also sends this transmission type indication for the AMR-NB coded frame of the current 8 kHz 13-bit uniform PCM signal frame 1 to the speech coding module. After receiving this signal 19, the speech coding module, when encoding the AMR-NB frame of the frame following the current PCM signal frame, continues to use the excitation signal in its own excitation buffer and the quantized energy prediction errors in its own quantized energy prediction error buffer; that is, it continues to use them in the manner described in 3GPP TS 26.090.
If the DTX control and operation module, based on the input VAD flag 18, sets transmission type signal 19 to any of SID_FIRST, SID_UPDATE, or NO_DATA, it also sends this signal 19 to the speech coding module. After receiving a transmission type signal 19 of one of these types, the speech coding module replaces the excitation signal in its own excitation buffer with excitation signal 35, which the background noise coding module produces after it finishes encoding the current PCM signal frame 1, for use when encoding the AMR-NB frame of the frame following the current PCM signal frame. Likewise, the speech coding module replaces the quantized energy prediction errors of the four subframes in its own buffer with the quantized energy prediction errors 37 of the four subframes that the background noise coding module produces after it finishes encoding the current PCM signal frame 1, for use when encoding the AMR-NB frame of the following frame.
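The selection logic described for transmission type signal 19 can be summarized in a small sketch. The function and parameter names are hypothetical, frames and encoder state are represented by opaque Python objects, and `None` stands in for information bits that are invalid for the access network.

```python
def select_information_bits(tx_type, speech_frame, sid_update_frame, sid_first_frame):
    """Choose the information bits to output for a given TX_TYPE."""
    if tx_type == "SPEECH_GOOD":
        return speech_frame        # coded frame 11
    if tx_type == "SID_UPDATE":
        return sid_update_frame    # silence descriptor frame 12
    if tx_type == "SID_FIRST":
        return sid_first_frame     # formed per 3GPP TS 26.093
    if tx_type == "NO_DATA":
        return None                # bits are invalid for the access network
    raise ValueError(f"unknown TX_TYPE: {tx_type}")


def state_for_next_frame(tx_type, own_state, noise_coder_state):
    """Pick the excitation and energy-prediction state used for the next frame.

    The speech coder keeps its own buffers after SPEECH_GOOD; otherwise it
    takes over the state produced by the background noise coder.
    """
    return own_state if tx_type == "SPEECH_GOOD" else noise_coder_state
```

Keeping the state hand-over in a separate function mirrors the text, where frame selection and buffer replacement are driven by the same signal but carried out by different modules.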
Fig. 1 resembles the block diagram of the coding part of the transmit side (TRANSMIT SIDE) in Figure 1 of 3GPP TS 26.071. The difference is the additional backward correlation detection module, which performs pre-processing, linear prediction and quantization, weighted speech calculation, and follow-up signal correlation detection on the input digital speech signal frame 5. Pre-processing, linear prediction and quantization, and weighted speech calculation are all described in 3GPP TS 26.090; what needs explaining here is how the follow-up signal correlation detection module works.
Fig. 2 shows the processing flow of the backward correlation detection module. As shown, the follow-up signal correlation detection module receives the weighted speech signal frame from the weighted speech calculation module and takes the second half (the last 10 milliseconds) of speech signal frame 5, as output by that module, as the object of correlation detection. That is, the correlation function is computed by formula (8) given earlier, with n ranging over the last 10 milliseconds of the speech signal frame. When formula (8) is evaluated, the sample points of the current signal frame 1 are indexed 0 to 159 and those of the follow-up signal are indexed 160 to 239.
The pre-processing module here has the same function as the pre-processing module in the speech coding module that produces digital speech frame 17. The weighted speech calculation module applies perceptual weighting to the pre-processed signal frame: from the received unquantized coefficients, which define the inverse filter $A(z)$, it constructs the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$ with $\gamma_1 = 0.94$ and $\gamma_2 = 0.6$. That is, for each subframe of speech of length $L$ (taken as 40, as specified in AMR-NB), the weighted speech is obtained from the following formula:
$$s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^{i}\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^{i}\, s_w(n-i), \qquad n = 0, \ldots, 39$$
In this embodiment the minimum delay is 20 and the maximum delay is 143. The autocorrelation function is evaluated at every delay $k$ from 20 to 143, and its maximum is found in each of the three ranges 20-39, 40-80 and 80-143. The delay $k$ at each maximum is taken as an optimum delay, which gives 3 optimum delays. Here the autocorrelation function $r(k)$ is:
$$r(k) = \sum_{n=160}^{M+159} s_w(n)\, s_w(n-k)$$
Here $M$ is 80 (a duration of 10 milliseconds).
The energy at each of the 3 optimum delays $k_{max}$ is calculated as:
$$E(k_{max}) = \sum_{n=160}^{M+159} s_w(n-k_{max})\, s_w(n-k_{max})$$
For each of the 3 values of $k_{max}$, the ratio of the autocorrelation value to the energy, $r(k_{max})/E(k_{max})$, is computed and compared with 0.65. If any one of the ratios is greater than 0.65, a backward correlation result signal 28 with the content "correlated" is sent to the discontinuous transmission control and operation module; if none of the 3 ratios is greater than 0.65, a backward correlation result signal 28 with the content "uncorrelated" is sent instead.
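The decision procedure of Embodiment 1 can be sketched end to end as follows. This is an illustration only: the function names are hypothetical, `sw` is assumed to hold 240 weighted samples (the current frame at indices 0 to 159 and the 10 ms follow-up at 160 to 239), and a range whose energy is zero is simply skipped, which is an assumption the text does not address.

```python
RANGES = ((20, 39), (40, 80), (80, 143))  # the three search ranges of Embodiment 1


def autocorr(sw, k, m=80):
    """r(k) over the follow-up samples n = 160 .. m+159."""
    return sum(sw[n] * sw[n - k] for n in range(160, m + 160))


def energy(sw, k, m=80):
    """E(k): energy of the weighted signal delayed by k from the follow-up span."""
    return sum(sw[n - k] * sw[n - k] for n in range(160, m + 160))


def backward_correlated(sw, m=80, ratio_threshold=0.65):
    """Return True if any range's optimum delay gives r/E above the threshold."""
    for lo, hi in RANGES:
        k_max = max(range(lo, hi + 1), key=lambda k: autocorr(sw, k, m))
        e = energy(sw, k_max, m)
        if e > 0 and autocorr(sw, k_max, m) / e > ratio_threshold:
            return True
    return False
```

A return value of `True` corresponds to sending the "correlated" result, after which the frame is treated as normal speech.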
As soon as the discontinuous transmission control and operation module receives a backward correlation result signal 28 with the content "correlated", it outputs normal speech, SPEECH_GOOD, so the information bits 2 output by the coded-frame output selection module are the speech-mode AMR-NB coded frame generated by the speech coding module at a non-background-noise coding rate.
Embodiment 2: an adaptive multi-rate narrowband (AMR-NB) encoder that determines backward correlation using the energy of the excitation signal of the frame to be encoded. As shown in Fig. 4, it differs from Embodiment 1 as follows. Neither the speech coding module nor the backward correlation detection module contains a pre-processing module; instead, the signals in signal frame 1 and signal frame 5 are the 13-bit PCM input signal 0 after pre-processing by a separate pre-processing module, which performs high-pass filtering and division of the signal values by 2. The first half of signal frame 5 is identical to the second half of signal frame 1, and the second half of signal frame 5 is the follow-up signal adjacent to signal frame 1. This has the same effect as placing the pre-processing module inside the speech coding and backward correlation detection modules. The speech coding module for the non-background-noise coding rates also outputs the excitation signal of speech signal frame 1 to the backward correlation detection module, shown as excitation signal 45 in Fig. 4. For signal frame 5, the excitation of its first 2 subframes is therefore the signal of the last 2 subframes of excitation signal 45. Fig. 5 shows the backward correlation detection module, in which the follow-up signal correlation detection module receives excitation signal 45.
As in Embodiment 1, this embodiment computes the autocorrelation value at every delay $k$ from 20 to 143 and the energy at the optimum delay $k_{max}$ corresponding to the autocorrelation maximum in each of the three ranges 20-39, 40-80 and 80-143. In addition, for each of these 3 optimum delays $k_{max}$, the follow-up signal correlation detection module computes the energy of the corresponding excitation signal within signal frame 1. The excitation energy for each optimum delay $k_{max}$ is expressed as follows:
$$E_v(k_{\max}) = \sum_{n=160}^{k_{\max}+159} v_w(n-k_{\max})\, v_w(n-k_{\max}), \quad 20 \le k_{\max} < 80,$$
$$E_v(k_{\max}) = \sum_{n=160}^{239} v_w(n-k_{\max})\, v_w(n-k_{\max}), \quad 80 \le k_{\max} \le 143,$$
where v(n) (n = 0, 1, ..., 159) is the excitation signal of all 4 subframes of signal frame 1. E_v(k_max) is the energy of the excitation signal on those sample points of the delayed signal that lie inside signal frame 1, the delayed signal being the follow-up signal (the last 10 milliseconds of signal frame 5) shifted into the past by the delay k_max. In other words, it is the energy of the excitation signal on the sample points, inside signal frame 1, of the past signal of the follow-up signal at delay k_max.
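The two sums above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `excitation_energy` and the plain-list representation of the excitation are hypothetical, and the excitation is indexed n = 0..159 as in the text.

```python
def excitation_energy(v, k_max):
    """Energy of the excitation on the delayed sample points inside the frame.

    v      -- excitation of the 160-sample frame, indexed n = 0..159 (assumed layout)
    k_max  -- best delay, expected in 20..143
    """
    if len(v) != 160:
        raise ValueError("expected a 160-sample excitation frame")
    if not 20 <= k_max <= 143:
        raise ValueError("k_max must lie in 20..143")
    # Follow-up samples are n = 160..239. After delaying by k_max, only the
    # positions n - k_max <= 159 still fall inside the frame.
    last_n = k_max + 159 if k_max < 80 else 239
    return sum(v[n - k_max] ** 2 for n in range(160, last_n + 1))
```

With a unit-amplitude excitation the result simply counts the contributing sample points: k_max points when k_max is below 80 and 80 points otherwise.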
In this embodiment the condition for producing a backward correlation result signal 28 whose content is "correlated" differs slightly from Embodiment 1. The signal is output when, for some k_max among the 3 best delays, the ratio of the autocorrelation value to the energy, r(k_max)/E(k_max), is greater than 0.65 and E_v(k_max) at that k_max is greater than a given threshold. In this embodiment the 13-bit dynamic range of input signal 0 is represented at each sample point as a 16-bit signed integer (the lowest 3 bits of the sample value are all 0 and the upper 13 bits carry the 13-bit value of input signal 0). The given threshold is the product of 25000 and the length of that part of the delayed signal which lies inside signal frame 1, the delayed signal being the last 10 milliseconds of signal frame 5 delayed by this k_max. That is, when k_max is greater than or equal to 80 the threshold is the product of 25000 and 80, namely 2000000, and when k_max is less than 80 the threshold is the product of 25000 and k_max.
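The threshold rule and the two-part decision described above might be expressed as follows. The constants 25000, 0.65 and 80 come from the text; the function names and the guard against a zero energy are assumptions added for the sketch.

```python
PER_SAMPLE_ENERGY = 25000   # per-sample-point energy threshold quoted in the text
RATIO_LIMIT = 0.65          # lower bound on r(k_max) / E(k_max)
HALF_FRAME = 80             # sample points in 10 ms at 8 kHz

def excitation_threshold(k_max):
    """Threshold = 25000 times the number of delayed samples inside the frame."""
    return PER_SAMPLE_ENERGY * min(k_max, HALF_FRAME)

def is_backward_correlated(r, e, e_v, k_max):
    """Return True when both conditions of this embodiment hold.

    r, e -- autocorrelation value and weighted-signal energy at k_max
    e_v  -- excitation energy at k_max
    """
    if e <= 0:
        return False  # assumption: a silent past signal is treated as uncorrelated
    return r / e > RATIO_LIMIT and e_v > excitation_threshold(k_max)
```

For example, `excitation_threshold(100)` gives 2000000, while `excitation_threshold(40)` gives 1000000.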
Embodiment 3 differs from Embodiment 2 in that γ1 and γ2 are both set to 0, which is equivalent to the weighted-speech computation module leaving its input signal unchanged. The minimum delay is 80 and the maximum delay is 143, so the maximum of the autocorrelation function and the corresponding best delay are searched only in the range 80-143. In all other respects Embodiment 3 is identical to Embodiment 2. The energy E_v(k_max) of the excitation signal at the single best delay k_max can therefore be computed with the following formula:
$$E_v(k_{\max}) = \sum_{n=160}^{239} v_w(n-k_{\max})\, v_w(n-k_{\max})$$
This is because any k_max found in the range 80-143 is greater than or equal to 80, while the follow-up signal part of signal frame 5 spans only 80 samples (10 milliseconds). The sample points of the past signal of the follow-up sound signal at the best delay k_max are therefore the 80 sample points from (n − k_max) to (n + 79 − k_max), where n = 160 is the first sample point of the follow-up signal.
In this embodiment, when the ratio of the autocorrelation value to the energy at k_max, r(k_max)/E(k_max), is greater than 0.65 and E_v(k_max) at this k_max is greater than the given threshold of 2000000, a backward correlation result signal 28 whose content is "correlated" is output to the DTX control and operation module.
The search range 80-143 of this embodiment is much smaller than the pitch search range of 3GPP. The object of the invention, however, is not to search for the pitch. Correlation also appears at integer multiples of the pitch period, so when the pitch period is less than 80 but one of its integer multiples lies between 80 and 143 and correlation is detected at that multiple, the current frame can be coded in speech mode at a non-background-noise coding rate.
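The point about integer multiples of the pitch period can be checked with a toy signal. The sketch below is an assumption-laden illustration: the pulse train with period 50 is a hypothetical test input, and `normalized_correlation` is a hypothetical helper that evaluates r(k)/E(k) over the follow-up samples n = 160..239 of a 240-sample buffer.

```python
def normalized_correlation(s, k):
    """Return r(k)/E(k) over the follow-up samples n = 160..239."""
    r = sum(s[n] * s[n - k] for n in range(160, 240))
    e = sum(s[n - k] ** 2 for n in range(160, 240))
    return r / e if e > 0 else 0.0

# Hypothetical test signal: one pulse every 50 samples (pitch period 50 < 80).
pulse_train = [1.0 if n % 50 == 0 else 0.0 for n in range(240)]

# Search only the 80-143 range used by this embodiment.
best = max(range(80, 144), key=lambda k: normalized_correlation(pulse_train, k))
```

Although the period itself lies outside the search range, the search settles on its second multiple, and the ratio at that delay exceeds the 0.65 limit.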
Embodiment 4, as shown in Fig. 6, is an AMR-NB encoder whose coding module operates at 10.2 kbit/s (kilobits per second) and generates an AMR-NB coded frame for an input sound signal frame 42. Input sound signal frame 42 is a 13-bit uniform PCM frame. VAD flag 43 indicates the VAD result. The speech coding module generates AMR-NB coded speech frame 44 (an adaptive multi-rate narrowband coded frame at a non-background-noise coding rate), and the background noise coding module generates AMR-NB silence descriptor (SID) frame 41. Transmission type indication 46 indicates the type of the content of information bits 47 passed to the 3G access network. The speech coding module applies pre-processing, linear prediction and quantization, adaptive codebook search and fixed codebook search to the 13-bit uniform PCM frame to obtain synthesized digital sound signal frame 48; pre-processing the 13-bit uniform PCM frame yields pre-processed sound signal frame 49. When the background noise coding module codes a frame at the background noise coding rate, that is, a silence descriptor (SID) frame, it generates the quantized energy prediction error 50 of the subframes, which is the quantized logarithmic mean of the frame energy, i.e. the averaged logarithmic energy of the frame after quantization. The quantized energy prediction error of all four subframes uses this value. The logarithmic mean of the frame energy and its quantized value are defined in section 5.2, frame energy calculation, of 3GPP TS26.092-500.
The transmit-side block diagram described in Fig. 5 differs from Fig. 1 (Transmit side) of 3GPP 26.071-400 as follows. The voice activity detection module in Fig. 5 of the invention operates on synthesized digital sound signal 48, whereas the 3GPP method operates on the pre-processed digital sound signal. Fig. 5 additionally contains a backward correlation detection module, which applies pre-processing, linear prediction and quantization, weighted-speech computation and follow-up signal correlation detection to the input digital sound signal frame 55. This backward correlation detection module also receives excitation signal 45 and open-loop gain 51 from the speech coding module.
When the background noise coding module codes an SID frame, it supplies the speech coding module with the quantized energy prediction error 50 that it produced, namely the quantized logarithmic mean of the frame energy; this is shown explicitly in Fig. 6.
In this embodiment the adaptive multi-rate narrowband (AMR-NB) encoder receives, in addition to the 13-bit uniform PCM frame 42, a 13-bit uniform pulse code modulation (PCM) signal frame 55 sampled at 8 kHz. Signal frame 55 is related to signal frame 42 as follows: the first 2 subframes (10 milliseconds) of a signal frame 55 received by the AMR-NB encoder are identical to the last 2 subframes (10 milliseconds) of the signal frame 42 most recently received by the encoder, and its last 2 subframes (10 milliseconds) are identical to the first 2 subframes (10 milliseconds) of the signal frame 42 that the encoder will receive next. Signal frame 55, which reaches the AMR-NB encoder 10 milliseconds later than signal frame 42, therefore contains the 10-millisecond follow-up signal adjacent to signal frame 42. After receiving the 13-bit uniform PCM frame 42 represented in 16-bit format (the 3 least significant bits set to zero, the remaining 13 bits being the valid 13-bit PCM value), the speech coding module carries out the following operations:
Pre-process signal frame 42 by high-pass filtering with a cut-off frequency of 80 Hz, to remove low-frequency noise (for example 50 Hz AC mains noise), and by dividing the sample values by 2;
Apply linear prediction and quantization to the pre-processed digital speech signal;
Perform the adaptive codebook search and the fixed codebook search and obtain the synthesized digital sound signal frame, that is: the adaptive codebook vector scaled by the adaptive codebook gain is added to the fixed codebook vector scaled by the fixed codebook gain to give the excitation signal, and the excitation signal is passed through the linear prediction synthesis filter determined by the linear prediction (LP) coefficients to give synthesized digital sound signal frame 48;
Send synthesized digital sound signal frame 48 to the voice activity detection module.
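The third operation in the list above, building the excitation from the two codebook contributions and passing it through the synthesis filter, can be sketched as below. This is a simplified illustration only: the function names are hypothetical, the filter is assumed to be 1/A(z) with A(z) = 1 + Σ a_i z^(-i), and the filter memory is assumed to start at zero unless supplied.

```python
def build_excitation(adaptive, fixed, gain_pitch, gain_code):
    """Excitation = adaptive vector * its gain + fixed vector * its gain."""
    return [gain_pitch * a + gain_code * c for a, c in zip(adaptive, fixed)]

def lp_synthesis(excitation, a, memory=None):
    """All-pole synthesis: y(n) = u(n) - sum_i a[i-1] * y(n-i).

    a      -- LP coefficients a_1..a_p (sign convention is an assumption)
    memory -- past outputs, most recent first; zeros when omitted
    """
    order = len(a)
    past = list(memory) if memory is not None else [0.0] * order
    out = []
    for u in excitation:
        y = u - sum(a[i] * past[i] for i in range(order))
        out.append(y)
        past = ([y] + past)[:order]  # shift the newest output into the memory
    return out
```

A one-tap filter with a_1 = -0.5 driven by a unit impulse produces the decaying sequence 1, 0.5, 0.25, which is a quick way to confirm the sign convention.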
The voice activity detection module outputs the VAD result obtained from its detection on synthesized digital sound signal frame 48, VAD flag 43, to the DTX control and operation module.
During the above coding process of the speech coding module, the new input sound is sampled, producing signal values sample point by sample point. Once all samples of the 10 milliseconds following signal frame 42 have been taken, they are used as the second half-frame (3rd and 4th subframes) of signal frame 55, and the signal on the last 10 milliseconds of sample points of signal frame 42 is used as the first half-frame of signal frame 55. The sampling of the follow-up signal adjacent to signal frame 42 can thus be arranged to run concurrently with some or all of the operations performed while coding signal frame 42: linear prediction and quantization, adaptive codebook search, fixed codebook search and voice activity detection on the synthesized sound signal. That is, the signal in the second half-frame of signal frame 55 is sampled while the speech coding module performs linear prediction and quantization, adaptive codebook search and fixed codebook search on signal frame 42. Compared with starting to code only after the complete signal frame 55 has been obtained, the benefit is that the processing that produces the speech-mode AMR-NB frame can begin as early as the moment the complete signal frame 42 has been received.
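The half-frame overlap between signal frame 42 and signal frame 55 described above can be illustrated with simple slicing. The sketch assumes the PCM samples sit in one contiguous list and uses hypothetical helper names; a real encoder would receive the frames incrementally.

```python
FRAME = 160   # sample points in 20 ms at 8 kHz
HALF = 80     # sample points in 10 ms at 8 kHz

def coding_frame(pcm, index):
    """Frame to be coded (signal frame 42 in the text) for a given frame index."""
    start = index * FRAME
    return pcm[start:start + FRAME]

def lookahead_frame(pcm, index):
    """Frame offset by 10 ms (signal frame 55): second half of the coded
    frame followed by the 10 ms follow-up signal."""
    start = index * FRAME + HALF
    return pcm[start:start + FRAME]
```

Because the second half of the offset frame is the only new material, it can be collected while the coded frame is already being processed.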
The processing of the backward correlation detection module of Fig. 6 is shown in Fig. 7. Sound signal frame 55 is high-pass filtered and divided by 2 by the pre-processing module and then passed to the weighted-speech computation module, whose output signal is s_w(n). The processing applied by the weighted-speech computation module to the input signal frame is perceptual weighting. From the unquantized coefficients it receives for the inverse filter A(z), this module constructs the weighting filter A(z/γ1)/A(z/γ2), where γ1 is 0.94 and γ2 is 0.6. That is, for a subframe of speech of subframe length L (defined as 40 in AMR-NB), the weighted speech is obtained by the following formula:
$$s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \quad n = 0, \ldots, L-1$$
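The recursion above maps directly onto a short loop. The following is a minimal sketch under stated assumptions: the function name is hypothetical, samples before the start of the buffer are taken as zero (a real coder carries filter memory from one subframe to the next), and the LP coefficients are passed as a plain list a_1..a_p.

```python
def perceptual_weighting(s, a, gamma1=0.94, gamma2=0.6):
    """Apply A(z/gamma1)/A(z/gamma2) to s, assuming zero initial filter state."""
    # Bandwidth-expanded coefficients a_i * gamma^i for i = 1..p.
    num = [a_i * gamma1 ** (i + 1) for i, a_i in enumerate(a)]
    den = [a_i * gamma2 ** (i + 1) for i, a_i in enumerate(a)]
    sw = []
    for n in range(len(s)):
        acc = s[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc += num[i - 1] * s[n - i] - den[i - 1] * sw[n - i]
        sw.append(acc)
    return sw
```

Setting both weighting factors to 0 makes every correction term vanish, so the output equals the input, which matches the remark in Embodiment 3 that the weighting module then leaves the signal unchanged.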
In this embodiment the minimum delay is 20 and the maximum delay is 143. The follow-up signal correlation detection module in Fig. 6 computes the value of the weighted autocorrelation function at each delay k between 20 and 143, searches for the maximum of the weighted autocorrelation function in the range 20-143, and identifies the delay corresponding to that maximum as the best delay. The weighted autocorrelation function R(d) is:
$$R(d) = \sum_{n=160}^{239} s_w(n)\, s_w(n-d)\, w(d)$$
$$w(d) = w_l(d)\, w_n(d)$$
The 10-millisecond time span comprises 80 sample points. Integers n in the range 0 to 159 correspond to the sample points of sound signal frame 42, and integers n in the range 160 to 239 correspond to the sample points of the second half-frame of signal frame 55. Here d is the delay. The low-delay weighting function w_l(d) has the form $w_l(d) = d^{\log_2 K_{nw}}$. The adjacent-past-frame delay weighting function w_n(d) has the form $w_n(d) = 1$ or $w_n(d) = (|T_{old} - d| + d_L)^{\log_2 K_w}$, where d_L is the minimum delay, T_old is the open-loop pitch delay of the input sound signal frame, K_nw is the tuning parameter for the weighting of the neighbourhood of delay d, and K_w is the tuning parameter for the weighting of the neighbourhood of the delay (|T_old − d| + d_L).
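A sketch of the weighted search follows. It assumes a 240-sample weighted-signal buffer laid out as in the text and takes the weighting function as a caller-supplied callable; the function name and the linear weights used in the usage note are hypothetical stand-ins, not the weighting actually specified.

```python
def best_delay(sw, weight, d_min=20, d_max=143):
    """Return the delay in d_min..d_max that maximizes R(d).

    sw     -- weighted signal, indices 0..239 (frame plus 10 ms follow-up)
    weight -- callable d -> w(d); supplied by the caller (assumption)
    """
    def weighted_autocorr(d):
        return weight(d) * sum(sw[n] * sw[n - d] for n in range(160, 240))
    return max(range(d_min, d_max + 1), key=weighted_autocorr)
```

For a pulse train with one pulse every 60 samples, the unweighted correlation is equal at delays 60 and 120. A weight that decreases with delay, such as `lambda d: 1.0 - d / 1000`, selects 60, while one that increases with delay selects 120, which shows how the weighting resolves ties between multiples of the period.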
For fast computation, the expression shown in Figure S2008100389866D00184 is denoted cw(d), and cw(d) for d from 0 to 250 is represented by a fixed table of values. In this embodiment the table for cw(d) uses the data provided in the file corrwght.tab of 3GPP TS26.073.
$$w_l(d) = cw(d)$$
Figure S2008100389866D00185
Here v is 1 when the open-loop gain 51 of input signal frame 42 is greater than 0.4; otherwise v is the product of the v of the previous frame and 0.9. The method of computing the open-loop gain of input signal frame 42 is explained in detail in the 10.2 kbit/s part of section 5.3, open-loop gain analysis, of 3GPP TS26.090-500.
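The update rule for v stated above is small enough to write out directly. The function name is hypothetical; the constants 0.4 and 0.9 are those given in the text.

```python
def update_v(open_loop_gain, previous_v):
    """v becomes 1 for a sufficiently high open-loop gain, else decays by 0.9."""
    return 1.0 if open_loop_gain > 0.4 else 0.9 * previous_v

# Example: v decays over three consecutive low-gain frames, then resets.
v = 1.0
history = []
for gain in (0.1, 0.2, 0.3, 0.8):
    v = update_v(gain, v)
    history.append(v)
```

The decay means that the influence of a past strongly correlated frame fades gradually rather than disappearing after a single low-gain frame.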
The follow-up signal correlation detection module in Fig. 7 searches for the maximum of the weighted autocorrelation function in the range 20-143, obtains the delay k_max at which this maximum occurs, and computes the autocorrelation value at k_max. It also computes the energy of the excitation signal on those sample points, of the second half-frame of signal frame 55 delayed by this k_max, that fall within signal frame 42: when k_max is less than 80 only the excitation energy on k_max sample points is computed, and when k_max is greater than or equal to 80 the excitation energy on 80 sample points is computed. The value r(k_max) of the autocorrelation function at k_max, the energy E_v(k_max) of the excitation signal on the sample points, inside input sound signal frame 42, of the past signal of the follow-up sound signal (the second half-frame of signal frame 55) at delay k_max, and the energy E(k_max) of the weighted sound signal s_w(n) at k_max are:
$$r(k_{\max}) = \sum_{n=160}^{239} s_w(n)\, s_w(n-k_{\max})$$
$$E_v(k_{\max}) = \sum_{n=160}^{k_{\max}+159} v_w(n-k_{\max})\, v_w(n-k_{\max}), \quad 20 \le k_{\max} < 80,$$
$$E_v(k_{\max}) = \sum_{n=160}^{239} v_w(n-k_{\max})\, v_w(n-k_{\max}), \quad 80 \le k_{\max} \le 143,$$
$$E(k_{\max}) = \sum_{n=160}^{239} s_w(n-k_{\max})\, s_w(n-k_{\max})$$
where v(n) (n = 0, 1, ..., 159) is the excitation signal of all 4 subframes of signal frame 42. In this embodiment, when the ratio of the autocorrelation value to the energy at k_max, r(k_max)/E(k_max), is greater than 0.65 and E_v(k_max) at this k_max is greater than a given threshold, a backward correlation result signal 28 whose content is "correlated" is produced. For the 13-bit uniform PCM frame 55 represented in 16-bit format (the 3 least significant bits set to zero, the remaining 13 bits being the valid 13-bit PCM value), this threshold is the product of 25000 and the length of that part of the delayed signal which lies inside signal frame 42, the delayed signal being the last 10 milliseconds of signal frame 55 delayed by this k_max. That is, when k_max is greater than or equal to 80 the threshold is the product of 25000 and 80, namely 2000000, and when k_max is less than 80 the threshold is the product of 25000 and k_max.
As soon as the discontinuous transmission (DTX) control and operation module receives a backward correlation result signal 28 whose content is "correlated", it outputs normal speech (SPEECH_GOOD). When the speech coding module receives a transmission type indication 46 from the DTX control and operation module whose content is normal speech (SPEECH_GOOD), it generates an AMR-NB speech-mode coded frame (a coded frame at a non-background-noise coding rate). When the speech coding module receives a transmission type indication 46 that is not normal speech (SPEECH_GOOD), it sends pre-processed digital speech signal frame 49 to the background noise coding module, and the background noise coding module generates AMR-NB silence descriptor (SID) frame 41 after receiving a transmission type indication 46 whose content is silence descriptor update (SID_UPDATE). When transmission type indication 46 is normal speech (SPEECH_GOOD), the DTX control and operation module places AMR-NB coded speech frame 44 in information bits 47 and sends it to the 3G access network (AN). When transmission type indication 46 is silence descriptor update (SID_UPDATE), the DTX control and operation module places AMR silence descriptor (AMR_SID) frame 41 in information bits 47 and sends it to the 3G access network (AN). When transmission type indication 46 is silence descriptor first (SID_FIRST), the DTX control and operation module places an SID_FIRST frame, formed according to 3GPP technical specification TS26.093, in information bits 47 and sends it to the 3G access network (AN). When transmission type indication 46 is no data (NO_DATA), the DTX control and operation module instructs the 3G access network not to transmit a speech frame, so the content of the information bits does not matter.
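The dispatch described above can be summarized as a small lookup. This is a hedged sketch: the function names, the string constants and the choice of returning `None` for NO_DATA are assumptions made for illustration, and the decision taken when no backward correlation is reported is left to a caller-supplied fallback because the text does not restate it here.

```python
def select_tx_type(backward_correlated, fallback_tx_type):
    """Force SPEECH_GOOD when backward correlation is reported."""
    return "SPEECH_GOOD" if backward_correlated else fallback_tx_type

def information_bits(tx_type, speech_frame, sid_frame, sid_first_frame):
    """Pick the payload handed to the access network for a given TX_TYPE."""
    if tx_type == "SPEECH_GOOD":
        return speech_frame        # AMR-NB coded speech frame
    if tx_type == "SID_UPDATE":
        return sid_frame           # silence descriptor frame
    if tx_type == "SID_FIRST":
        return sid_first_frame     # SID_FIRST frame
    if tx_type == "NO_DATA":
        return None                # nothing is transmitted; content is irrelevant
    raise ValueError("unknown transmission type: " + str(tx_type))
```

For instance, a frame that the fallback logic would have sent as SID_UPDATE is still sent as a speech frame when the backward correlation result is "correlated".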

Claims (30)

1. An adaptive multi-rate narrowband (AMR-NB) encoder having a discontinuous transmission (DTX) control and operation unit, characterized in that:
the encoder receives an input sound signal frame with a frame length of 20 milliseconds, and also receives a follow-up sound signal, adjacent to said input sound signal frame, with a time span of no more than 20 milliseconds; generates a weighted sound signal for said input sound signal frame and said follow-up sound signal; and, during the period from receiving said input sound signal frame to receiving said follow-up sound signal, schedules at least the first of the following operations on said input sound signal frame: linear prediction and quantization, adaptive codebook search, and fixed codebook search;
the encoder comprises an autocorrelation calculation unit which, for the weighted sound signal corresponding to said follow-up sound signal, determines an autocorrelation function and a weighted autocorrelation function between a predetermined maximum delay and a predetermined minimum delay; divides the region between said maximum delay and said minimum delay into at least one range; identifies as the best delay the delay corresponding to the maximum of the weighted autocorrelation function within said range; computes the value of the autocorrelation function at the best delay; and computes the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at the best delay;
if the ratio of the value of said autocorrelation function at at least one best delay to the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at that best delay is greater than a predetermined value, said DTX control and operation unit determines the transmission type TX_TYPE of said input sound signal frame to be normal speech SPEECH_GOOD.
2. The encoder according to claim 1, characterized in that the weighting function that generates the weighted sound signal for the input sound signal frame and the follow-up sound signal has the following form:
Figure RE-FSB00000798094800011
where s_w(n) is the weighted sound signal, s(n) is the signal in said input sound signal frame or said follow-up sound signal, the weighting factor γ1 is greater than or equal to 0 and less than 1, the weighting factor γ2 is greater than or equal to 0 and less than 0.7, a_i are the linear prediction (LP) coefficients, and L is the subframe length.
3. The encoder according to claim 2, characterized in that said weighted autocorrelation function and the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at said best delay have the following forms:
Figure RE-FSB00000798094800012
Figure RE-FSB00000798094800013
where R(d) is the weighted autocorrelation function, d is the delay, d_max is the delay at which R(d) takes its maximum, E(d_max) is the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at said best delay, w(d) is the weighting function, s_w(n) is said weighted sound signal, M is the number of sample points contained in said time span of no more than 20 milliseconds, integers n in the range 0 to 159 correspond to the sample points of said input sound signal frame, and integers n in the range 160 to M+159 correspond to the sample points of said follow-up sound signal.
4. The encoder according to claim 3, characterized in that the weighted autocorrelation function is identical to the autocorrelation function, that is, the weighting function w(d) therein is 1.
5. The encoder according to claim 3, characterized in that said at least one range is the single range from said minimum delay to said maximum delay, said at least one best delay is a single best delay, and said weighting function w(d) has the following form:
$$w(d) = w_l(d)\, w_n(d)$$
where d is the delay and the low-delay weighting function w_l(d) has the form
Figure RE-FSB00000798094800021
the adjacent-past-frame delay weighting function w_n(d) has the form w_n(d) = 1 or
Figure RE-FSB00000798094800022
d_L is said minimum delay, T_old is the open-loop pitch delay of said input sound signal frame, K_nw is the tuning parameter for the weighting of the neighbourhood of delay d, and K_w is the tuning parameter for the weighting of the neighbourhood of the delay (|T_old − d| + d_L).
6. The encoder according to any one of claims 1 to 5, characterized in that the time span of said follow-up sound signal is 10 milliseconds.
7. An adaptive multi-rate narrowband (AMR-NB) encoder having a discontinuous transmission (DTX) control and operation unit, characterized in that:
the encoder receives an input sound signal frame with a frame length of 20 milliseconds, and also receives a follow-up sound signal, adjacent to said input sound signal frame, with a time span of no more than 20 milliseconds; generates a weighted sound signal for said input sound signal frame and said follow-up sound signal; performs linear prediction and quantization, adaptive codebook search and fixed codebook search on said input sound signal frame; and generates the excitation signal of said input sound signal frame;
the encoder comprises an autocorrelation calculation unit which, for the weighted sound signal corresponding to said follow-up sound signal, determines an autocorrelation function and a weighted autocorrelation function between a predetermined maximum delay and a predetermined minimum delay; divides the region between said maximum delay and said minimum delay into at least one range; identifies as the best delay the delay corresponding to the maximum of the weighted autocorrelation function within said range; computes the value of the autocorrelation function at the best delay; computes the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at the best delay; and computes the energy of the excitation signal on the sample points, located within said input sound signal frame, of the past signal of said follow-up sound signal at the best delay;
if the ratio of the value of said autocorrelation function at at least one best delay to the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at that best delay is greater than a predetermined value, and the energy of the excitation signal on the sample points, located within said input sound signal frame, of the past signal of the follow-up sound signal at that best delay is greater than a threshold set for this excitation signal, said set threshold being the product of the number of sample points occupied by said past signal and a sample-point energy threshold, said DTX control and operation unit determines the transmission type TX_TYPE of said input sound signal frame to be normal speech SPEECH_GOOD.
8. The encoder according to claim 7, characterized in that the weighting function that generates the weighted sound signal for the input sound signal frame and the follow-up sound signal has the following form:
Figure RE-FSB00000798094800031
where s_w(n) is the weighted sound signal, s(n) is the signal in said input sound signal frame or said follow-up sound signal, the weighting factor γ1 is greater than or equal to 0.88 and less than 1, the weighting factor γ2 is greater than or equal to 0.4 and less than 0.7, a_i are the linear prediction (LP) coefficients, and L is the subframe length.
9. The encoder according to claim 8, characterized in that said weighted autocorrelation function and the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at said best delay have the following forms:
Figure RE-FSB00000798094800032
Figure RE-FSB00000798094800033
where R(d) is the weighted autocorrelation function, d is the delay, d_max is the delay at which R(d) takes its maximum, E(d_max) is the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at said best delay, w(d) is the weighting function, s_w(n) is said weighted sound signal, M is the number of sample points contained in said time span of no more than 20 milliseconds, integers n in the range 0 to 159 correspond to the sample points of said input sound signal frame, and integers n in the range 160 to M+159 correspond to the sample points of said follow-up sound signal.
10. The encoder according to claim 9, characterized in that said at least one range is the single range from said minimum delay to said maximum delay, said at least one best delay is a single best delay, and the weighting function w(d) therein has the following form:
$$w(d) = w_l(d)\, w_n(d)$$
where d is the delay and the low-delay weighting function w_l(d) has the form
Figure RE-FSB00000798094800034
the adjacent-past-frame delay weighting function w_n(d) has the form w_n(d) = 1 or
Figure RE-FSB00000798094800035
d_L is said minimum delay, T_old is the open-loop pitch delay of said input sound signal frame, K_nw is the tuning parameter for the weighting of the neighbourhood of delay d, and K_w is the tuning parameter for the weighting of the neighbourhood of the delay (|T_old − d| + d_L).
11. The encoder according to claim 9, characterized in that the weighted autocorrelation function is identical to the autocorrelation function, that is, said weighting function w(d) is 1.
12. The encoder according to any one of claims 7 to 11, characterized in that the time span of said follow-up sound signal is 10 milliseconds.
13. The encoder according to any one of claims 7 to 11, characterized in that, during the period from receiving said input sound signal frame to receiving said follow-up sound signal, at least the first of the following operations on said input sound signal frame is scheduled: linear prediction and quantization, adaptive codebook search, and fixed codebook search.
14. The encoder according to any one of claims 7 to 11, characterized in that said sample-point energy threshold is determined according to the ratio of the value of said autocorrelation function at said at least one best delay to the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at said at least one best delay.
15. A method for determining the transmission type TX_TYPE of discontinuous transmission (DTX) in adaptive multi-rate narrowband (AMR-NB) coding, characterized in that:
an input sound signal frame with a frame length of 20 milliseconds is received, and a follow-up sound signal, adjacent to said input sound signal frame, with a time span of no more than 20 milliseconds is also received; a weighted sound signal is generated for said input sound signal frame and said follow-up sound signal; and, during the period from receiving said input sound signal frame to receiving said follow-up sound signal, at least the first of the following operations on said input sound signal frame is scheduled: linear prediction and quantization, adaptive codebook search, and fixed codebook search;
for the weighted sound signal corresponding to said follow-up sound signal, an autocorrelation function and a weighted autocorrelation function between a predetermined maximum delay and a predetermined minimum delay are determined; the region between said maximum delay and said minimum delay is divided into at least one range; the delay corresponding to the maximum of the weighted autocorrelation function within said range is taken as the best delay; the value of the autocorrelation function at the best delay is computed; and the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at the best delay is computed;
if the ratio of the value of said autocorrelation function at at least one best delay to the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at that best delay is greater than a predetermined value, the transmission type TX_TYPE of said input sound signal frame is determined to be normal speech SPEECH_GOOD.
16. The method according to claim 15, characterized in that the weighting function that generates the weighted sound signal for the input sound signal frame and the follow-up sound signal has the following form:
Figure S2008100389866C00041
where s_w(n) is the weighted sound signal, s(n) is the signal in said input sound signal frame or said follow-up sound signal, the weighting factor γ1 is greater than or equal to 0 and less than 1, the weighting factor γ2 is greater than or equal to 0 and less than 0.7, a_i are the linear prediction (LP) coefficients, and L is the subframe length.
17. The method according to claim 16, characterized in that said weighted autocorrelation function and the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at said best delay have the following forms:
Figure S2008100389866C00042
Figure S2008100389866C00043
where R(d) is the weighted autocorrelation function, d is the delay, d_max is the delay at which R(d) takes its maximum, E(d_max) is the energy of the weighted sound signal corresponding to the past signal of said follow-up sound signal at said best delay, w(d) is the weighting function, s_w(n) is said weighted sound signal, M is the number of sample points contained in said time span of no more than 20 milliseconds, integers n in the range 0 to 159 correspond to the sample points of said input sound signal frame, and integers n in the range 160 to M+159 correspond to the sample points of said follow-up sound signal.
18. The method according to claim 17, characterized in that the weighted autocorrelation function is identical to the autocorrelation function, that is, said weighting function w(d) is 1.
19. The method according to claim 18, characterized in that said at least one range is the single range from said minimum delay to said maximum delay, said at least one best delay is a single best delay, and the weighting function w(d) therein has the following form:
$$w(d) = w_l(d)\, w_n(d)$$
where d is the delay and the low-delay weighting function w_l(d) has the form
Figure RE-FSB00000798094800041
the adjacent-past-frame delay weighting function w_n(d) has the form w_n(d) = 1 or
Figure RE-FSB00000798094800042
d LBe the said minimum delay, T OldIt is institute
State the open-loop pitch delay of input audio signal frame, K NwBe the adjustment parameter that postpones the weighting of d adjacent domain, K wBe postpone (| T Old-d|+d L) the adjustment parameter of adjacent domain weighting., the time span of described follow-up voice signal is 10 milliseconds.
20., it is characterized in that the time span of described follow-up voice signal is 10 milliseconds according to each method in the claim 15 to 19.
21. A method for determining the transmission type TX_TYPE of discontinuous transmission (DTX) in adaptive multi-rate narrowband (AMR-NB) coding, characterized by:
receiving an input audio signal frame with a frame length of 20 milliseconds; also receiving a follow-up voice signal, of a time span of no more than 20 milliseconds, adjacent to said input audio signal frame; generating a weighted voice signal for said input audio signal frame and said follow-up voice signal; performing linear prediction and quantization, adaptive codebook search and fixed codebook search on said input audio signal frame; and generating the excitation signal of said input audio signal frame;
determining, for the weighted voice signal corresponding to said follow-up voice signal, the autocorrelation function and the weighted autocorrelation function between a predetermined maximum delay and a predetermined minimum delay; dividing the region between said maximum delay and said minimum delay into at least one range; taking, within said range, the delay corresponding to the maximum of the weighted autocorrelation function as the optimum delay; calculating the value of the autocorrelation function at the optimum delay; calculating the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at the optimum delay; and calculating the energy of the excitation signal on those sample points of the past signal of said follow-up voice signal at the optimum delay that lie within said input audio signal frame;
if, for at least one optimum delay, the ratio of the value of said autocorrelation function at that optimum delay to the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at that optimum delay is greater than a predetermined value, and the energy of the excitation signal on the sample points of that past signal lying within said input audio signal frame is greater than a threshold set for this excitation signal, said set threshold being the product of the number of sample points at which said past signal is located and a sample point energy threshold, then determining the transmission type TX_TYPE of said input audio signal frame to be normal speech SPEECH_GOOD.
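The two-part condition of claim 21 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the 160-sample frame constant, and the threshold values passed in are all hypothetical, and the claim does not say which transmission type applies when the condition fails, so the sketch only reports whether the SPEECH_GOOD condition holds.

```python
from typing import Sequence

FRAME_LEN = 160  # assumed: 20 ms frame at 8 kHz, samples n = 0..159


def past_indices_in_frame(d_opt: int, m: int) -> range:
    """Sample points of the past signal (n - d_opt for n = 160..M+159)
    that fall inside the input frame 0..159."""
    start = max(0, FRAME_LEN - d_opt)
    stop = min(FRAME_LEN, FRAME_LEN + m - d_opt)
    return range(start, stop)


def speech_good_condition(r_opt: float, e_opt: float,
                          excitation: Sequence[float], d_opt: int, m: int,
                          ratio_threshold: float,
                          per_sample_energy_threshold: float) -> bool:
    """True when both conditions of the claim hold for this optimum delay.

    r_opt: autocorrelation value at the optimum delay
    e_opt: energy of the weighted past signal at the optimum delay
    excitation: excitation signal of the input frame (length FRAME_LEN)
    """
    idx = past_indices_in_frame(d_opt, m)
    exc_energy = sum(excitation[n] ** 2 for n in idx)
    if e_opt <= 0.0:
        return False  # ratio undefined; treated here as "condition not met"
    ratio_ok = r_opt / e_opt > ratio_threshold
    # Threshold = number of past-signal sample points * per-sample threshold.
    energy_ok = exc_energy > len(idx) * per_sample_energy_threshold
    return ratio_ok and energy_ok
```

When several ranges are used, the claim requires the condition for at least one optimum delay, so a caller would apply `any(...)` over the per-range results.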
22. The method according to claim 21, characterized in that the weighting function used to generate the weighted voice signal from the input audio signal frame and the follow-up voice signal has the following form:
Figure RE-FSB00000798094800051
where s_w(n) is the weighted voice signal, s(n) is the signal in said input audio signal frame or said follow-up voice signal, the weighting factor γ1 is greater than or equal to 0.88 and less than 1, the weighting factor γ2 is greater than or equal to 0.4 and less than 0.7, a_i are the linear prediction (LP) coefficients, and L is the length of a subframe.
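The formula image of claim 22 is not reproduced in this text. The sketch below assumes the perceptual weighting filter commonly used in AMR-type coders, W(z) = A(z/γ1)/A(z/γ2) with A(z) = 1 + Σ a_i z^(-i), that is, s_w(n) = s(n) + Σ a_i γ1^i s(n − i) − Σ a_i γ2^i s_w(n − i). This form matches the listed symbols but is an assumption. The function name, the default γ values (chosen inside the claimed ranges), and the zero initial filter state are likewise illustrative.

```python
from typing import List, Sequence


def weight_speech(s: Sequence[float], a: Sequence[float],
                  gamma1: float = 0.94, gamma2: float = 0.6) -> List[float]:
    """Filter s through the assumed W(z) = A(z/gamma1) / A(z/gamma2).

    a holds the LP coefficients a_1..a_p. Filter memory starts at zero here,
    whereas a real coder would carry it over between subframes.
    """
    # Ranges stated in the claim: 0.88 <= gamma1 < 1 and 0.4 <= gamma2 < 0.7.
    if not (0.88 <= gamma1 < 1.0 and 0.4 <= gamma2 < 0.7):
        raise ValueError("weighting factors outside the claimed ranges")
    p = len(a)
    num = [a[i] * gamma1 ** (i + 1) for i in range(p)]  # a_i * gamma1^i
    den = [a[i] * gamma2 ** (i + 1) for i in range(p)]  # a_i * gamma2^i
    s_w: List[float] = []
    for n in range(len(s)):
        acc = s[n]
        for i in range(1, p + 1):
            if n - i >= 0:
                acc += num[i - 1] * s[n - i] - den[i - 1] * s_w[n - i]
        s_w.append(acc)
    return s_w
```

In a coder the filter would be applied subframe by subframe (n = 0 … L − 1) with the LP coefficients of that subframe; the sketch processes one buffer with one coefficient set for brevity.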
23. The method according to claim 22, characterized in that said weighted autocorrelation function, and the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at said optimum delay, have the following forms:
where R(d) is the weighted autocorrelation function; d is the delay; d_max is the delay at which R(d) attains its maximum; E(d_max) is the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at said optimum delay; w(d) is the weighting function; s_w(n) is said weighted voice signal; M is the number of sample points contained in said time span of no more than 20 milliseconds; integers n in the range 0 to 159 correspond to the sample points of said input audio signal frame; and integers n in the range 160 to M+159 correspond to the sample points of said follow-up voice signal.
24. The coding method according to claim 23, characterized in that said at least one range is the single range from said minimum delay to said maximum delay, said at least one optimum delay is a single optimum delay, and the weighting function w(d) has the following form:
w(d) = w_l(d) w_n(d)
where d is the delay, and the low-delay weighting function w_l(d) has the form
Figure RE-FSB00000798094800054
and the adjacent-past-frame delay weighting function w_n(d) has the form w_n(d) = 1 or
Figure RE-FSB00000798094800055
where d_L is said minimum delay, T_old is the open-loop pitch delay of said input audio signal frame, K_nw is the adjustment parameter for weighting the region adjacent to delay d, and K_w is the adjustment parameter for weighting the region adjacent to delay (|T_old - d| + d_L).
25. The coding method according to claim 24, characterized in that the weighted autocorrelation function is identical to the autocorrelation function, that is, said weighting function w(d) equals 1.
26. The method according to any one of claims 21 to 25, characterized in that, during the period from receiving said input audio signal frame to receiving said follow-up voice signal, at least the first of the linear prediction and quantization, the adaptive codebook search and the fixed codebook search of said input audio signal frame is scheduled to be carried out.
27. The method according to any one of claims 21 to 25, characterized in that the time span of said follow-up voice signal is 10 milliseconds.
28. The method according to any one of claims 21 to 25, characterized in that said sample point energy threshold is determined according to the ratio of the value of said autocorrelation function at said at least one optimum delay to the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at said at least one optimum delay.
29. An adaptive multi-rate narrowband (AMR-NB) encoder having a discontinuous transmission (DTX) control and functional unit, characterized in that: the encoder receives an input audio signal frame with a frame length of 20 milliseconds; also receives a follow-up voice signal, of a time span of no more than 20 milliseconds, adjacent to said input audio signal frame; generates a weighted voice signal for said input audio signal frame and said follow-up voice signal; performs linear prediction and quantization, adaptive codebook search and fixed codebook search on said input audio signal frame; and generates the excitation signal of said input audio signal frame;
the encoder comprises an autocorrelation calculation unit, which determines, for the weighted voice signal corresponding to said follow-up voice signal, the autocorrelation function and the weighted autocorrelation function between a predetermined maximum delay and a predetermined minimum delay; divides the region between said maximum delay and said minimum delay into at least one range; identifies, within said range, the delay corresponding to the maximum of the weighted autocorrelation function as the optimum delay; calculates the value of the autocorrelation function at the optimum delay; calculates the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at the optimum delay; and calculates the level of the excitation signal on those sample points of the past signal of said follow-up voice signal at the optimum delay that lie within said input audio signal frame;
if, for at least one optimum delay, the ratio of the value of said autocorrelation function at that optimum delay to the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at that optimum delay is greater than a predetermined value, and the level of the excitation signal on the sample points of that past signal lying within said input audio signal frame is greater than a threshold set for this excitation signal, said set threshold being the product of the number of sample points at which said past signal is located and a sample point level threshold, then said discontinuous transmission (DTX) control and functional unit determines the transmission type TX_TYPE of said input audio signal frame to be normal speech SPEECH_GOOD.
30. A method for determining the transmission type TX_TYPE of discontinuous transmission (DTX) in adaptive multi-rate narrowband (AMR-NB) coding, characterized by:
receiving an input audio signal frame with a frame length of 20 milliseconds; also receiving a follow-up voice signal, of a time span of no more than 20 milliseconds, adjacent to said input audio signal frame; generating a weighted voice signal for said input audio signal frame and said follow-up voice signal; performing linear prediction and quantization, adaptive codebook search and fixed codebook search on said input audio signal frame; and generating the excitation signal of said input audio signal frame;
determining, for the weighted voice signal corresponding to said follow-up voice signal, the autocorrelation function and the weighted autocorrelation function between a predetermined maximum delay and a predetermined minimum delay; dividing the region between said maximum delay and said minimum delay into at least one range; taking, within said range, the delay corresponding to the maximum of the weighted autocorrelation function as the optimum delay; calculating the value of the autocorrelation function at the optimum delay; calculating the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at the optimum delay; and calculating the level of the excitation signal on those sample points of the past signal of said follow-up voice signal at the optimum delay that lie within said input audio signal frame;
if, for at least one optimum delay, the ratio of the value of said autocorrelation function at that optimum delay to the energy of the weighted voice signal corresponding to the past signal of said follow-up voice signal at that optimum delay is greater than a predetermined value, and the level of the excitation signal on the sample points of that past signal lying within said input audio signal frame is greater than a threshold set for this excitation signal, said set threshold being the product of the number of sample points at which said past signal is located and a sample point level threshold, then determining the transmission type TX_TYPE of said input audio signal frame to be normal speech SPEECH_GOOD.
CN2008100389866A 2008-06-16 2008-06-16 Encoder and method for self adapting to discontinuous transmission of multi-rate narrowband Expired - Fee Related CN101609683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100389866A CN101609683B (en) 2008-06-16 2008-06-16 Encoder and method for self adapting to discontinuous transmission of multi-rate narrowband

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100389866A CN101609683B (en) 2008-06-16 2008-06-16 Encoder and method for self adapting to discontinuous transmission of multi-rate narrowband

Publications (2)

Publication Number Publication Date
CN101609683A CN101609683A (en) 2009-12-23
CN101609683B true CN101609683B (en) 2012-08-08

Family

ID=41483407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100389866A Expired - Fee Related CN101609683B (en) 2008-06-16 2008-06-16 Encoder and method for self adapting to discontinuous transmission of multi-rate narrowband

Country Status (1)

Country Link
CN (1) CN101609683B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111896808B (en) * 2020-07-31 2023-02-03 中国电子科技集团公司第四十一研究所 Method for integrally designing frequency spectrum track processing and adaptive threshold generation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1333981A (en) * 1998-11-24 2002-01-30 艾利森电话股份有限公司 Efficient in-band signaling for discontinuous transmission and configuration changes in adaptive multi-rate communications systems
CN1398126A (en) * 2001-07-18 2003-02-19 华为技术有限公司 Method for implementing multi-language coding-decoding in universal mobile communication system
CN1428953A (en) * 2002-04-22 2003-07-09 西安大唐电信有限公司 Implement method of multi-channel AMR vocoder and its equipment
CN1805565A (en) * 2005-01-14 2006-07-19 华为技术有限公司 Decoding method for adaptive multi-rate speech coding in non-consecutive transmitting mode

Also Published As

Publication number Publication date
CN101609683A (en) 2009-12-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120808

Termination date: 20130616