CN102034476B - Methods and devices for detecting and repairing error voice frame

Info

Publication number: CN102034476B
Application number: CN2009101745877A
Authority: CN
Inventors: 刘加; 王林芳; 李明; 刘小青
Original assignee: Tsinghua University; Huawei Technologies Co Ltd
Current assignee: Tsinghua University; Huawei Technologies Co Ltd
Priority date: 2009-09-30
Filing date: 2009-09-30
Publication date: 2013-09-11
Anticipated expiration: 2029-09-30
Also published as: CN102034476A

Abstract

Embodiments of the invention provide methods and devices for detecting and repairing erroneous speech frames, and relate to the field of communications. Erroneous frames can be detected and repaired using the characteristics of the speech signal and the prior statistical characteristics of the coding parameters. The detection method comprises: in silent mode, when a speech frame whose indication parameters mark it as correct is received, checking the parameters of the speech frame against a preset detection rule, and determining that the speech frame is an erroneous speech frame when the parameters meet the condition specified by the detection rule; and in speech mode, when a speech frame whose indication parameters mark it as erroneous is received, checking the silence insertion descriptor (SID) of the speech frame using the parameter averages of the speech frame, and determining that the speech frame is an SID frame when the detection condition is met. Embodiments of the invention can be used for full-rate speech coding and decoding in the Global System for Mobile Communications (GSM).

Description

Method and device for speech frame error detection

Technical field

The present invention relates to the field of communications, and in particular to a method and device for speech frame error detection.

Background art

GSM (Global System for Mobile Communications) is a typical digital cellular mobile communication system that realizes multi-user communication on the basis of TDMA (Time Division Multiple Access) and FDMA (Frequency Division Multiple Access). Its capacity is much greater than that of analog cellular mobile systems.

Voice communication is an important application of GSM. GSM currently uses four speech coding and decoding schemes: full rate, enhanced full rate, adaptive multi-rate, and half rate.

GSM full-rate speech uses the RPE-LTP-LPC (Regular Pulse Excited, Long Term Prediction, Linear Predictive Coding) scheme. The coding parameters of GSM full-rate speech are shown in Table 1:

Table 1. Coding parameters of GSM full-rate speech

In a typical two-way call, each transmission channel is in effective use about 50% of the time on average. To improve channel utilization, reduce power consumption, and reduce overall interference on the radio interface, wireless voice communication adopts the discontinuous transmission (DTX) mode: the transmitter transmits only while a speech signal is present and cuts off radio transmission when speech stops. The DTX mechanism requires the following functions:

Voice activity detection (VAD) at the transmitter;

Background noise estimation and silence insertion descriptor (SID) frame generation at the transmitter, which convey the comfort noise parameters to the receiver;

Comfort noise generation at the receiver.

The VAD module detects and distinguishes speech segments from silent segments. Its output is a binary speech flag: 0 denotes silence and 1 denotes speech. When the flag is 1, DTX passes the encoded speech parameters directly to the radio subsystem (RSS) for normal transmission. When the flag is 0, DTX applies a hangover measure that takes history into account, so that radio transmission is not cut off during brief speech pauses; only when the accumulated number of frames with VAD=0 exceeds the hangover period does DTX cut off radio transmission.
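The hangover behaviour described above can be sketched as follows. This is a minimal illustration rather than the DTX handler of any standard: the function name `dtx_transmit_flags` and the hangover length passed to it are assumptions made for the example, and the scheduling of SID frames is ignored.

```python
def dtx_transmit_flags(vad_flags, hangover_frames):
    """Return, for each frame, whether radio transmission stays on.

    vad_flags: iterable of 0/1 VAD decisions, one per frame.
    hangover_frames: hypothetical hangover period, in frames.
    """
    flags = []
    silent_run = 0  # number of consecutive frames with VAD == 0
    for vad in vad_flags:
        if vad == 1:
            silent_run = 0
            flags.append(True)
        else:
            silent_run += 1
            # Transmission is cut only once the silent run exceeds the hangover.
            flags.append(silent_run <= hangover_frames)
    return flags
```

With a hypothetical hangover of 2 frames, `dtx_transmit_flags([1, 0, 0, 0, 1], 2)` keeps transmitting through the first two silent frames and cuts off only at the third.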

If the receiver output no signal at all while the DTX processor has cut off radio transmission, the resulting complete silence would be very uncomfortable for the listener and would lower the perceived quality of the whole call. To solve this problem, the receiver uses a "comfort noise" function: it uses the spectral parameters provided by SID frames to generate sound that resembles the background noise at the transmitter. In non-speech segments, an SID frame containing one set of parameters is sent at predetermined intervals. The interval between SID frames is much longer than the speech frame length, which effectively reduces the transmission rate and channel congestion.

At the receiver, the antenna stays on at all times. When no signal is being sent, the antenna receives random noise. Channel decoding at the receiver applies a CRC (Cyclic Redundancy Check) to the received bit stream, and a random-noise bit stream is usually judged to be an erroneous frame. In that case the receiver substitutes the parameters of the last valid SID frame and feeds them to the speech decoder, which then outputs comfort noise.

In wireless voice communication, bit errors and frame loss caused by channel noise, channel fading, and similar effects are common. A speech parameter frame containing bit errors usually carries corrupted speech parameters; decoding it directly may produce harsh noise and degrade voice quality. Outputting no signal at all, however, leaves gaps and discontinuities in the speech signal, which the listener finds very uncomfortable and which seriously harms the subjective quality of the call. Detecting and repairing erroneous frames is therefore particularly important for the subjective speech quality of the system. The GSM FR codec was designed with robustness in mind. The usual approach when a frame is lost is to smooth using the parameters of the preceding frames and so recover the speech waveform as far as possible.

In implementing the above GSM full-rate speech frame error detection and frame-loss error concealment, the inventors found at least the following problems in the prior art:

The protection offered by the CRC check is weak and cannot detect all transmission errors in the significant bits; moreover, the binary decision model is too simple to handle bit errors flexibly.

In addition, existing frame-loss error concealment methods discard the erroneous frame entirely, so the useful information it contains is not exploited; and parameter substitution cannot track the dynamic variation of the speech signal.

Summary of the invention

Embodiments of the invention provide a method and device for speech frame error detection that detect erroneous frames using the characteristics of the speech signal and the prior statistical properties of the speech coding parameters.

To achieve the above object, embodiments of the invention adopt the following technical solutions:

An erroneous frame detection method, comprising:

In silent mode, when a speech frame whose indication parameters mark it as correct is received, checking the parameters of the speech frame against a preset detection rule, and determining that the speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied.

An erroneous frame detection device, comprising:

A receiving unit, configured to receive speech frames in silent mode.

A detecting unit, configured to, when the received speech frame is one whose indication parameters mark it as correct, check the parameters of the speech frame against a preset detection rule, and determine that the speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied.

With the erroneous frame detection method and device provided by the embodiments of the invention, when a speech frame whose indication parameters mark it as correct is received in silent mode, the parameters of the speech frame are checked against a preset detection rule, and the speech frame is determined to be an erroneous speech frame when the condition specified by the detection rule is satisfied. In this way, the large amount of silent-segment noise caused by CRC misjudgment can be eliminated, which significantly reduces the impact on decoded speech quality and improves the user experience.

Brief description of the drawings

To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1A is a flowchart of an erroneous frame detection method provided by embodiment one of the invention;

Fig. 1B is a flowchart of another erroneous frame detection method provided by embodiment one of the invention;

Fig. 2 is a schematic diagram of the DTX processing module provided by an embodiment of the invention;

Fig. 3 is a flowchart of the speech frame repair method provided by embodiment two of the invention;

Fig. 4 is a flowchart of the speech frame repair method provided by embodiment three of the invention;

Fig. 5 is structural block diagram one of the erroneous frame detection device provided by an embodiment of the invention;

Fig. 6 is structural block diagram two of the erroneous frame detection device provided by an embodiment of the invention;

Fig. 7 is structural block diagram three of the erroneous frame detection device provided by an embodiment of the invention;

Fig. 8 is structural block diagram four of the erroneous frame detection device provided by an embodiment of the invention;

Fig. 9 is structural block diagram one of the speech frame repair device provided by an embodiment of the invention;

Fig. 10 is structural block diagram two of the speech frame repair device provided by an embodiment of the invention;

Fig. 11 is structural block diagram three of the speech frame repair device provided by an embodiment of the invention;

Fig. 12 is a structural block diagram of the speech frame repair device provided by another embodiment of the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment one of the invention provides an erroneous frame detection method. As shown in Fig. 1A, the method comprises:

S101: In silent mode, when a speech frame whose indication parameters mark it as correct is received, check the parameters of the speech frame against a preset detection rule, and determine that the speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied.

In this embodiment, a speech frame marked as correct is a speech frame with BFI=0 and SID=0.

Specifically, the preset detection rule may be: in silent mode, a "correct" speech frame with BFI=0 and SID=0 is received; if the frame following it has SID=2 or BFI=1, the speech frame is determined to be an erroneous speech frame. Further, the BFI of the speech frame may be reset to 1.

Alternatively, the preset detection rule may be: in silent mode, a "correct" speech frame with BFI=0 and SID=0 is received; the prior probabilities of the LAR parameters of the speech frame and of the RPEMax, LTPLag, and LTPGain parameters of its subframes are obtained, and if the prior probability of any one parameter is less than a first threshold, the speech frame is determined to be an erroneous speech frame. Or the prior transition probabilities of the LAR parameters across the previous frame, this speech frame, and the following frame, and of the RPEMax parameters across the subframes of the previous frame, this speech frame, and the following frame, are obtained; if the prior transition probability of any one parameter is less than a second threshold, the speech frame is determined to be an erroneous speech frame, and its BFI is further reset to 1. Optionally, this processing may be performed immediately after the "correct" frame is received, or only after the preceding detection rule has not been satisfied.

In this way, speech frames that are actually erroneous can be identified accurately in silent mode, improving the accuracy of speech frame detection and avoiding erroneous decoding caused by detection mistakes, thereby improving voice quality.

Further, to detect speech frames accurately in speech mode as well, as shown in Fig. 1B, the method provided by embodiment one may further comprise:

S102: In speech mode, when a speech frame whose indication parameters mark it as erroneous is received, check whether the frame is an SID frame using the parameter averages of the speech frame, and determine that the speech frame is an SID frame when the detection condition is satisfied.

In this embodiment, a speech frame marked as erroneous may be a speech frame with BFI=1 and SID=0.

Specifically, the detection condition in this step may be as follows. Obtain the mean $\bar{L}$ of the LTPLag parameters of the 4 subframes of this speech frame. Obtain the mean $\bar{R}_{-1}$ of the RPEMax parameters of the 4 subframes of the previous frame and the mean $\bar{R}_{0}$ of the RPEMax parameters of the 4 subframes of this speech frame. If $\bar{L}$ is less than a third threshold, $\bar{R}_{-1}$ is less than a fourth threshold, and $\bar{R}_{0}$ is less than the fourth threshold, the speech frame is determined to be an SID frame. Further, the SID of the speech frame may be reset to 2.

In this way, the large amount of silent-segment noise caused by CRC misjudgment can be eliminated and corrupted SID frames can be identified correctly, so that the valid information in speech frames is used to the greatest extent, the impact on decoded speech quality is significantly reduced, and the user experience is improved.

A specific example is given below to illustrate one implementation of the erroneous frame detection method of embodiment one. The method can be implemented directly in a conventional GSM full-rate speech receiver, inserted after the channel decoder and before the speech decoder, and can be regarded as a preprocessing stage for speech decoding. The input of the method is the 76 frame parameters produced by the channel decoder together with the corresponding bad frame indication (BFI) and silence insertion descriptor (SID) indication, 78 parameters in total. The output of the method is the repaired or smoothed frame parameters (76 in total), the modified bad frame indication (M_BFI), and the modified silence insertion descriptor indication (M_SID), again 78 parameters in total.

This example uses a DTX state machine to describe the transitions between speech segments and silent segments. The initial state of the DTX state machine is the silent state, and its transitions depend on the modified BFI (M_BFI) and modified SID (M_SID) indications output by the enhanced error detection module shown on the left of Fig. 2. Conversely, the operation of the enhanced error detection module depends on the DTX state. The DTX state machine and the enhanced error detection module thus complement and constrain each other, which makes the state transitions more robust and reliable.

Specifically, when the DTX state machine is in the silent state and receives a "correct speech frame", that is, a speech frame with SID=0 and BFI=0, the indication may itself be wrong even though the frame is marked as correct, and the frame may in fact be an erroneous speech frame. In this example, a preliminary judgment of whether the frame is erroneous is first made according to two rules:

(1) If the frame following this speech frame is an SID frame, the speech frame is judged to be erroneous and its BFI indication is reset to 1;

(2) If the BFI of the frame following this speech frame is 1, the speech frame is judged to be erroneous and its BFI indication is reset to 1.
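The two preliminary rules can be sketched as a small function. This is an illustrative reading of the text, not the patented implementation; the function name and argument names are assumptions, and an SID frame is taken to be one with SID=2, as stated for the detection rule earlier in this embodiment.

```python
def preliminary_check(cur_bfi, cur_sid, next_bfi, next_sid):
    """Return the (possibly reset) BFI of the current frame in silent mode.

    The rules only apply to a frame flagged as a correct speech frame
    (BFI=0, SID=0). If the following frame is an SID frame (SID=2) or is
    flagged bad (BFI=1), the current frame is treated as erroneous.
    """
    flagged_correct = (cur_bfi == 0 and cur_sid == 0)
    if flagged_correct and (next_sid == 2 or next_bfi == 1):
        return 1  # rule (1) or rule (2) fired: reset BFI to 1
    return cur_bfi
```

Because both rules look at the following frame, a receiver using this check has to buffer one frame before it can pass the current frame on.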

If the above judgment cannot establish that this "correct speech frame" is erroneous, the initial speech frame parameter detection module 401 is started. Alternatively, module 401 may be started directly, without the preliminary judgment.

The initial speech frame referred to here is the first speech frame received when switching from silent mode to speech mode; on the basis of this frame the DTX state machine switches from silent mode to speech mode.

The speech onset parameter detection mainly controls the transition from silent mode to speech mode. The module uses two kinds of information for error detection:

(1) The prior probability of each parameter value is obtained. If any of the LAR parameters of the current speech frame, or any of the RPEMax, LTPLag, or LTPGain parameters of its subframes, takes a value whose prior probability is less than a given threshold (the first threshold), the frame is determined to be erroneous and the BFI indication of the current speech frame is reset to 1.

(2) The prior transition probabilities of the LAR parameters across the previous frame, the current speech frame, and the following frame are obtained, as are the prior transition probabilities of the RPEMax parameters across the subframes of those three frames. If any parameter value has a prior transition probability less than a given threshold (the second threshold), the frame is determined to be erroneous and the BFI indication of the current frame is reset to 1.

The prior probabilities and prior transition probabilities of the parameters were estimated from more than 2,700,000 speech frames covering six languages (Mandarin, American English, German, French, Spanish, and Japanese), with noise and music present in part of the corpus. In this embodiment of the invention, both probability thresholds (the first and second thresholds) are set to 0.0001; this is an empirical value and is not fixed. Using this knowledge and these rules, initial speech frame parameter detection can correct more than 90% of the CRC detection errors in silent segments and eliminate most silent-segment noise.
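A minimal sketch of the prior-probability test in item (1) follows. The layout of the probability tables is an assumption made for illustration (a dictionary from quantized value to probability per parameter), and the numbers in the usage example are toy values, not statistics from the corpus described above. The transition-probability test in item (2) would have the same shape, with tables keyed on the values in neighbouring frames.

```python
FIRST_THRESHOLD = 1e-4  # empirical value given in the text


def is_improbable_frame(frame_params, prior_tables, threshold=FIRST_THRESHOLD):
    """Return True if any parameter value has a prior below the threshold.

    frame_params: dict mapping a parameter name to its list of quantized
        values in the frame (hypothetical layout).
    prior_tables: dict mapping a parameter name to {value: prior probability}.
    Values missing from a table are treated as having probability 0.
    """
    for name, values in frame_params.items():
        table = prior_tables[name]
        for value in values:
            if table.get(value, 0.0) < threshold:
                return True  # the frame would have its BFI reset to 1
    return False
```

For example, with toy tables `{"LTPLag": {40: 0.02, 41: 0.00001}, "RPEMax": {5: 0.1}}`, a frame whose subframes all use lag 40 passes the test, while a frame containing lag 41 is flagged.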

Further, for the transition from speech mode to silent mode, the embodiment of the invention uses the SID frame detection module 402 to identify corrupted SID frames. In some cases, when a speech segment is about to end and channel noise is severe, the SID frame is also corrupted by channel noise. If the SID flag is corrupted, the speech decoder cannot recognize the frame as an SID frame and processes it as a speech frame, and the DTX state machine fails to jump to the silent state. This causes a series of follow-on problems. In addition, the speech onset parameter detection module may fail to identify some random-noise frames in silent segments. If, in such situations, the corrupted SID frame can be identified correctly and the DTX state machine moved quickly to the silent state, the impact on decoded speech quality is minimized. This is precisely the function of SID frame detection in the enhanced DTX module.

When the DTX state machine is in the speech state and the current signal amplitude is small, every received "erroneous speech frame", that is, a speech frame with SID=0 and BFI=1, is subjected to an SID frame pattern check. Two classes of parameters are checked:

For the long-term prediction lag parameter (LTPLag), compute the mean of the LTPLag parameters of the 4 subframes of the current frame:

$$\bar{L} = \frac{1}{4} \sum_{i=0}^{3} L_{i};$$

For the subframe block amplitude parameter (RPEMax), compute the means of the RPEMax parameters of the 4 subframes of the previous frame and of the 4 subframes of the current frame:

$$\bar{R}_{-1} = \frac{1}{4} \sum_{i=0}^{3} R_{-1,i},$$

$$\bar{R}_{0} = \frac{1}{4} \sum_{i=0}^{3} R_{0,i}.$$

When $\bar{L} < T_L$, $\bar{R}_{-1} < T_P$, and $\bar{R}_{0} < T_P$, the current frame is considered to be an SID frame: the SID flag is reset to 2, that is, the current speech frame is determined to be an SID frame, and the LTPLag and RPEMax parameters of the 4 subframes of the current frame are all reset to 0. In this scheme, the threshold $T_L$ (the third threshold) takes the empirical value 10 and the threshold $T_P$ (the fourth threshold) takes the empirical value 4.
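The SID pattern check can be sketched as below, using the empirical thresholds quoted in the text (10 for the LTPLag mean, 4 for the RPEMax means). The function names and the way the frame parameters are passed in are assumptions made for the example.

```python
T_L = 10  # third threshold, empirical value from the text
T_P = 4   # fourth threshold, empirical value from the text


def mean4(values):
    """Mean of the four subframe values of one frame."""
    if len(values) != 4:
        raise ValueError("expected 4 subframe values")
    return sum(values) / 4.0


def matches_sid_pattern(ltp_lag_cur, rpe_max_prev, rpe_max_cur):
    """True when all three means fall below their thresholds."""
    return (mean4(ltp_lag_cur) < T_L
            and mean4(rpe_max_prev) < T_P
            and mean4(rpe_max_cur) < T_P)


def apply_sid_check(sid, ltp_lag_cur, rpe_max_prev, rpe_max_cur):
    """Return (sid, ltp_lag, rpe_max) for the current frame after the check.

    On a match the SID flag becomes 2 and the LTPLag and RPEMax values of
    the current frame are reset to 0; otherwise everything is unchanged.
    """
    if matches_sid_pattern(ltp_lag_cur, rpe_max_prev, rpe_max_cur):
        return 2, [0, 0, 0, 0], [0, 0, 0, 0]
    return sid, list(ltp_lag_cur), list(rpe_max_cur)
```

The caller is expected to invoke this only for frames with BFI=1 and SID=0 received in the speech state, as the text requires.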

SID frame detection is combined with speech onset parameter detection, and the state transitions of the DTX state machine are controlled by the modified M_BFI and M_SID indications. As shown on the right of Fig. 2, the initial state is silent mode. In silent mode, when M_BFI=1 or M_SID=2 is received, that is, an erroneous frame or an SID frame, the machine remains in silent mode; when M_BFI=0 is received, that is, a correct speech frame, it switches from silent mode to speech mode. In speech mode, when M_SID=0 is received, that is, a speech frame, it remains in speech mode; when M_SID=2 is received, that is, an SID frame, it switches from speech mode to silent mode.
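The transitions just described can be written as a small state function. This is a sketch of the rules as stated, with hypothetical names; the text does not say what happens for M_SID=1, so the sketch simply treats any value other than 2 as "not an SID frame".

```python
SILENT, SPEECH = "silent", "speech"


def next_dtx_state(state, m_bfi, m_sid):
    """Return the next DTX state given the modified BFI and SID indications."""
    if state == SILENT:
        # An erroneous frame or an SID frame keeps the machine in silent mode.
        if m_bfi == 1 or m_sid == 2:
            return SILENT
        return SPEECH  # M_BFI == 0 and not an SID frame: correct speech frame
    # Speech mode: only an SID frame returns the machine to silent mode.
    if m_sid == 2:
        return SILENT
    return SPEECH


def run_dtx(frames, state=SILENT):
    """Feed (m_bfi, m_sid) pairs through the machine; return visited states."""
    states = []
    for m_bfi, m_sid in frames:
        state = next_dtx_state(state, m_bfi, m_sid)
        states.append(state)
    return states
```

Note that in speech mode a bad frame that is not an SID frame leaves the state unchanged, which matches the text: only M_SID=2 triggers the switch back to silent mode.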

The method provided by embodiment one applies stricter control over silence-to-speech state transitions: it uses statistics of real speech signals to validate the initial speech frame, and uses SID frame detection to identify SID frames whose flag information is wrong. This solves the prior-art problem that relying on the CRC check alone makes the indication information unreliable and leaves a large amount of silent-segment noise in the decoded signal. Random-noise frames in silent segments that the CRC misjudges are detected correctly, and silent-segment noise is thereby effectively eliminated.

Embodiment two of the invention provides a speech frame repair method. As shown in Fig. 3, the method comprises:

S301: Generate reference parameters from the parameters of the frames, or the subframes of the frames, before and after the speech frame, or from the parameters of the subframes before and after a subframe of the speech frame.

In a speech frame, the LAR parameter is a frame-level parameter with 8 components; the RPEMax, LTPLag, and LTPGain parameters are subframe-level parameters, and each speech frame has 4 subframes.

Specifically, this step may take one of the following forms:

Generate the reference LAR parameter from the LAR parameters of the frame two before the current speech frame, the frame immediately before it, and the frame after it.

Or: let the RPEMax parameters of the 4 subframes of the frame before the current speech frame be the first set, let the RPEMax parameters of the 4 subframes of the frame after the current speech frame be the second set, and let the 8 RPEMax parameters of the 4 subframes of the previous frame and the 4 subframes of the following frame together be the third set; take the set whose elements vary least among the three sets as the reference RPEMax parameter.

Or: from the LTPLag parameters of the 4 subframes of the current speech frame, of the previous frame, and of the following frame, take all elements whose values lie between 40 and 120 and cluster them using the minimum clustering method; the set of mean values of the clusters containing more than one element is taken as the reference LTPLag parameter.

Or: from the LTPGain parameter of the current subframe of the current speech frame, of the preceding subframe, and of the following subframe, obtain the reference LTPGain parameter using prior conditional probabilities.

S302: Determine the components to be repaired among the parameters of the speech frame or its subframes.

Specifically, for the different parameters this step may be:

If the distance between a component of the LAR parameter of the current speech frame and the corresponding component of the reference LAR parameter is greater than a seventh threshold, that component of the LAR parameter of the current speech frame is determined to be a component to be repaired.

Or: if the distance between the mean of the RPEMax parameters of the 4 subframes of the current speech frame and the mean of the reference RPEMax parameter is greater than an eleventh threshold, the RPEMax parameter, among those of the 4 subframes of the current speech frame, that is farthest from the mean of the reference RPEMax parameter is determined to be the component to be repaired.

Or: among the LTPLag parameters of the 4 subframes of the current speech frame, those whose value is less than 40 or greater than 120, or which were not assigned to a cluster, are determined to be components to be repaired.

Or: if the LTPGain parameter of any subframe of the current speech frame is greater than the reference LTPGain parameter, or less than the reference LTPGain parameter minus 1, the LTPGain parameter of that subframe is determined to be a component to be repaired.

S303: Apply bit masks to the component to be repaired to perform bit-level repair and obtain a repair result; and/or substitute the reference parameter for the component to be repaired to obtain a repair result.

For the LAR and RPEMax parameters, bit-level repair with bit masks is needed. Specifically:

Apply a bit mask to each bit of the component to be repaired, obtaining one repair candidate per bit; from the candidates, select the one closest to the reference LAR component corresponding to the component to be repaired as the repair result.

Or: apply a bit mask to each bit of the component to be repaired, obtaining one repair candidate per bit; from the candidates, select the one closest to the mean of the reference RPEMax parameter as the repair result.

For the LTPLag parameter, bit-mask repair is performed and, where necessary, substitution by the reference parameter as well. Specifically: apply a bit mask to each bit of the component to be repaired, obtaining one repair candidate per bit; from the candidates, select the one closest to any element of the reference LTPLag parameter as the repair result. If the distance between this repair result and its nearest reference LTPLag element is less than a fourteenth threshold, the repair result is usable; if the distance is greater than the fourteenth threshold, the reference LTPLag element nearest to the repair result is taken as the final repair result.
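The bit-level repair can be sketched as follows. The text does not spell out what applying a bit mask to a bit does; this sketch assumes it means flipping that one bit (XOR with a single-bit mask), so each component yields one candidate per bit. The function names, the `max_distance` argument standing in for the fourteenth threshold (whose value is not given here), and the default width of 7 bits for the lag field are all assumptions made for the example.

```python
def single_bit_candidates(value, n_bits):
    """All values obtained by flipping exactly one of the n_bits low-order bits."""
    return [value ^ (1 << bit) for bit in range(n_bits)]


def repair_towards_reference(value, n_bits, reference):
    """LAR / RPEMax style repair: the candidate closest to the reference wins."""
    return min(single_bit_candidates(value, n_bits),
               key=lambda cand: abs(cand - reference))


def repair_ltp_lag(value, references, max_distance, n_bits=7):
    """LTPLag style repair with fallback to the nearest reference element.

    references: elements of the reference LTPLag parameter.
    max_distance: hypothetical stand-in for the fourteenth threshold.
    """
    pairs = [(cand, ref)
             for cand in single_bit_candidates(value, n_bits)
             for ref in references]
    best, nearest_ref = min(pairs, key=lambda p: abs(p[0] - p[1]))
    if abs(best - nearest_ref) < max_distance:
        return best        # the bit-level repair result is usable
    return nearest_ref     # otherwise substitute the reference element
```

As an example of the idea, a lag of 57 whose top bit has been flipped reads as 121; with a single reference element of 56 and a distance limit of 3, `repair_ltp_lag(121, [56], 3)` returns 57. The case where the distance equals the threshold is not covered by the text; the sketch treats it as unusable.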

For the LTPGain parameter, substitution is performed: the reference LTPGain parameter directly replaces the LTPGain value of the subframe of the current speech frame, giving the repair result.
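Combining the LTPGain condition from S302 with the substitution described here gives a very small routine. This is a sketch under the assumption that the gain is handled as its quantized integer index; the function name is hypothetical.

```python
def repair_ltp_gain(gain, reference_gain):
    """Return the subframe LTPGain after detection and substitution.

    The value is replaced by the reference when it is greater than the
    reference or less than the reference minus 1; otherwise it is kept.
    """
    if gain > reference_gain or gain < reference_gain - 1:
        return reference_gain
    return gain
```

The asymmetric condition means a value exactly one step below the reference is accepted, while any value above the reference is replaced.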

The parameters of the current speech frame or subframe are updated with the repair result. After this, it may further be checked whether the repaired parameter still satisfies the condition for being a component to be repaired; if it does, repair can be repeated in a loop until the result is usable, or until the number of iterations reaches a certain threshold with the result still unusable, in which case the repair has failed and the repair process exits.

In this way, instead of discarding erroneous frames entirely as in the prior art, the method makes the fullest effective use of them and repairs random bit errors, further reducing the impact on decoded speech quality and improving the user experience. It should be noted that the speech frame repair method of embodiment two may be applied to the result of the CRC check or to the detection result provided by embodiment one above; repair based on the result of embodiment one is more accurate.

A specific example is given below to illustrate one implementation of the speech frame repair method of embodiment two. Unlike the prior art, this example achieves error concealment through parameter-based error detection and repair. It applies dedicated error detection and error repair strategies to each of the four main classes of parameters in GSM full-rate speech, namely LAR, RPEMax, LTPLag, and LTPGain. The reason is as follows. The purpose of a speech coding algorithm is to remove the redundancy inherent in the speech signal, so that a speech segment of fixed duration (usually one frame, 20 ms) is transmitted with as few bits as possible. In practice, however, because of constraints on complexity, delay, and so on, the coding parameters output by the algorithm still retain some redundancy. As Table 2 shows, three classes of GSM full-rate speech coding parameters have considerable redundancy; in the table, M denotes the number of bits of each class of parameter, and $H(x_n)$, $H(x_n \mid x_{n-1})$, and $H(x_n \mid x_{n-1}, x_{n+1})$ denote the entropy, the first-order conditional entropy, and the second-order conditional entropy of the parameter, respectively.

Table 2. Number of bits, entropy, and conditional entropy of GSM full-rate speech coding parameters

It can be seen from the table, LAR, LTPLag and RPEMax parameter have bigger redundancy.Simultaneously, in GSMFR voice coding relevant criterion, Ia class bit, namely 50 bits that subjective quality is had the greatest impact mainly concentrate in LAR, LTPLag and the RPE Max parameter.Basically the RPEGrid, the RPEPulse parameter bit that do not possess redundancy mainly are distributed in the II class, and be minimum relatively to the influence of subjective auditory perception.And the redundancy of LTPGain parameter itself is less, and its bit all is the Ib class, and namely subjective significance level is lower than the Ia class, is higher than the II class, and this parameter has certain influence to the decoded speech quality, but less relatively.

Effort should therefore be concentrated on recovering or smoothing the major parameters that have a large influence on speech quality. Parameters of lower subjective importance may be repaired to some extent, but in principle a minimal-effort strategy is used, that is, only very obvious errors are corrected. In the embodiments of the present invention, mainly the LAR, LTPLag and RPEMax parameters are repaired or smoothed; a certain smoothing measure is applied to the LTPGain parameter when many bit errors may be present.

One, error detection and error repair of the LAR parameters

Let x_i, i = 1, …, 8, denote the 8 LAR parameters. This embodiment uses the LAR deviation measure (LARDeviation)

d(X, \hat{X}) = \frac{\sum_{i=1}^{8} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{8} x_i^2 + \sum_{i=1}^{8} \hat{x}_i^2}

to describe the distance between two sets of LAR parameters X = (x_1, x_2, …, x_8) and \hat{X} = (\hat{x}_1, \hat{x}_2, …, \hat{x}_8).
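The measure is a normalized squared distance, so it is 0 for identical vectors and does not depend on the overall scale of the parameters. A minimal Python sketch follows; the function name `lar_deviation` and the handling of two all-zero vectors are illustrative assumptions, not part of the embodiment.

```python
from typing import Sequence


def lar_deviation(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Normalized squared distance between two LAR vectors (LARDeviation)."""
    if len(x) != len(x_hat):
        raise ValueError("both LAR vectors must have the same length")
    num = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    den = sum(a * a for a in x) + sum(b * b for b in x_hat)
    # Assumption: two all-zero vectors are treated as identical (deviation 0).
    return num / den if den else 0.0
```

With this definition, a frame whose received LAR values coincide with the reference gives 0.0, and two orthogonal vectors of equal energy give 1.0.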

For each speech frame, let its actual LAR parameter values be X_C = (x_{C1}, x_{C2}, …, x_{C8}), and take the interpolation of the LAR parameters of the preceding and following frames as the estimate \hat{X}_C = (\hat{x}_{C1}, \hat{x}_{C2}, …, \hat{x}_{C8}) of the current frame's parameters. Simple linear interpolation is used:

\hat{x}_{Ci} = \frac{x_{Pi} + x_{Ni}}{2}, i = 1, …, 8

where x_{Pi} denotes the i-th LAR parameter value of the previous frame and x_{Ni} denotes the i-th LAR parameter value of the next frame.

Statistics over about 2,700,000 speech frames show that in 95% of cases the LAR deviation between X_C = (x_{C1}, x_{C2}, …, x_{C8}) and \hat{X}_C is less than 0.05. In the embodiments of the present invention, the LAR deviation threshold used is 0.01.

The error detection and repair method for the LAR parameters in an "erroneous speech frame" is as follows:

(1) Using the received LAR parameter values of the frame before the previous frame (pprev), the previous frame (prev) and the next frame (next), denoted X_{PP} = (x_{PP1}, x_{PP2}, …, x_{PP8}), X_P = (x_{P1}, x_{P2}, …, x_{P8}) and X_N = (x_{N1}, x_{N2}, …, x_{N8}) respectively, generate by interpolation, for each component and according to the following rules, the reference LAR parameters

\hat{X}_C = (\hat{x}_{C1}, \hat{x}_{C2}, …, \hat{x}_{C8}):

1. First compute the distance between the LAR parameter values of the previous frame (prev) and the next frame (next):

d_i = |x_{Ni} - x_{Pi}|, i = 1, …, 8

2. Select the threshold (the sixth threshold) according to the BFI flag of the next frame: if the next frame is flagged as an "erroneous frame", use the stronger threshold STd_i; if the next frame is flagged as a "correct frame", use the weaker threshold WTd_i. In the embodiments of the present invention, the values of STd_i and WTd_i are as given in Table 3:

Parameter	LAR1	LAR2	LAR3	LAR4	LAR5	LAR6	LAR7	LAR8
Threshold STd_i	10	10	7	7	5	5	3	3
Threshold WTd_i	16	16	12	12	8	8	4	4

Table 3, values of STd_i and WTd_i

3. If d_i is less than or equal to the sixth threshold, x_{Pi} and x_{Ni} are used to interpolate the "reference LAR parameter" \hat{x}_{Ci}. If d_i is greater than the threshold Td_{1i}, x_{PPi} and x_{Pi} are used instead to compute \hat{x}_{Ci}, and the distance between x_{PPi} and x_{Pi} is computed to replace d_i. For convenience below, the two actually received values selected for computing the "reference LAR parameter" are denoted x_{1i} and x_{2i}.

4. By default the interpolation coefficients a_{1i} and a_{2i} are both 0.5. If d_i exceeds the predetermined threshold Td_{2i}, the interpolation coefficients are adjusted. First, compute the distances d_{1i} and d_{2i} between x_{1i}, x_{2i} and the actually received value x_{Ci} of the current frame:

d_{1i} = |x_{1i} - x_{Ci}|, d_{2i} = |x_{2i} - x_{Ci}|

Then set the interpolation coefficients a_{1i} and a_{2i} to:

a_{1i} = 1 - \frac{d_{1i}^2}{d_{1i}^2 + d_{2i}^2},

a_{2i} = 1 - \frac{d_{2i}^2}{d_{1i}^2 + d_{2i}^2}.

5. Compute the interpolated "reference LAR parameter":

\hat{x}_{Ci} = a_{1i} \times x_{1i} + a_{2i} \times x_{2i}
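Sub-steps 1 to 5 can be sketched for a single LAR component as below. The function name, the argument names and the choice to keep the default 0.5/0.5 weights when both distances are zero are illustrative assumptions; the sketch also treats the switch to x_{PPi} and x_{Pi} as the complement of the sixth-threshold test.

```python
def reference_lar(x_pp, x_p, x_n, x_c, next_is_bad, st, wt, td2):
    """Reference value for one LAR component.

    x_pp, x_p, x_n: values from the frame before the previous frame, the
    previous frame and the next frame; x_c: received value of the current
    frame; st / wt: strong / weak threshold; td2: threshold Td_2i.
    """
    d = abs(x_n - x_p)                      # sub-step 1
    limit = st if next_is_bad else wt       # sub-step 2
    if d <= limit:                          # sub-step 3
        x1, x2 = x_p, x_n
    else:
        x1, x2 = x_pp, x_p
        d = abs(x_pp - x_p)
    a1 = a2 = 0.5                           # sub-step 4, default weights
    if d > td2:
        d1, d2 = abs(x1 - x_c), abs(x2 - x_c)
        total = d1 * d1 + d2 * d2
        if total > 0:  # assumption: keep 0.5/0.5 when both distances are 0
            a1 = 1 - d1 * d1 / total
            a2 = 1 - d2 * d2 / total
    return a1 * x1 + a2 * x2                # sub-step 5
```

The adjusted weights favour whichever neighbour lies closer to the received value, so a received value that already agrees with one neighbour pulls the reference toward that neighbour.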

(2) Compute the distance dd_i between the actually received value x_{Ci} of each LAR component and the "reference LAR parameter" \hat{x}_{Ci}.

(3) If dd_i is greater than the threshold Td_{1i} (the seventh threshold), apply a preliminary correction to the actually received value x_{Ci}.

Td_{1i} and Td_{2i} are empirical values; in the embodiments of the present invention they take the values given in Table 4:

Parameter	LAR1	LAR2	LAR3	LAR4	LAR5	LAR6	LAR7	LAR8
Threshold Td_{1i}	10	10	7	7	5	5	3	3
Threshold Td_{2i}	10	10	8	8	4	4	2	2

Table 4, values of Td_{1i} and Td_{2i}

The correction method is as follows: using a bit mask, modify the significant bits of the actually received value x_{Ci} one at a time to obtain a set of possible transmitted values. This embodiment calls this the bit repair method. The specific steps are as follows.

Suppose x_{Ci} = 43 is the 1st component of the LAR parameters. According to the coding standard, x_{Ci} has 6 significant bits; its binary representation is 101011, and the initial value of the bit mask can be taken as BM = 100000 (binary). The procedure is:

1. Let y_1 = x_{Ci} ⊕ BM (bitwise exclusive OR); y_1 is the first possible transmitted value. In this example, the binary representation of y_1 is 001011 and its decimal value is 11;

2. Shift BM right by one bit, that is, BM = BM >> 1;

3. Let y_2 = x_{Ci} ⊕ BM; y_2 is the second possible transmitted value. In this example, the binary representation of y_2 is 111011 and its decimal value is 59;

4. Repeat in this way until BM = 0, obtaining 6 possible transmitted values in total: y_1 = 11, y_2 = 59, y_3 = 35, y_4 = 47, y_5 = 41, y_6 = 42;

5. Let Y = (y_1, y_2, …, y_6) be the candidate set;

6. Select from the candidate set Y the value closest to the "reference LAR parameter" \hat{x}_{Ci} as the preliminary correction result \tilde{x}_{Ci}.

Repeat this process for the 8 LAR components to obtain the preliminary correction result \tilde{X}_C = (\tilde{x}_{C1}, \tilde{x}_{C2}, …, \tilde{x}_{C8}).

Then carry out the parameter error detection and repair process described in steps (4) to (9) below. This process is executed in a loop until all parameter errors have been correctly repaired or the number of iterations exceeds a threshold (the ninth threshold), which is set to 10 in the embodiments of the present invention.
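The bit repair method can be sketched as follows, on the assumption that each candidate is obtained by flipping exactly one bit of the received value (an exclusive OR with a one-bit mask). That reading reproduces the values 11, 59, 35, 47 and 41 of the worked example; under it the last candidate for 43 is 42. The function names are illustrative.

```python
from typing import List


def bitmask_candidates(value: int, n_bits: int) -> List[int]:
    """Candidates obtained by flipping one bit at a time, MSB first."""
    mask = 1 << (n_bits - 1)        # e.g. 0b100000 for a 6-bit parameter
    candidates = []
    while mask:
        candidates.append(value ^ mask)
        mask >>= 1
    return candidates


def closest_candidate(value: int, n_bits: int, reference: float) -> int:
    """Candidate nearest to the reference parameter value."""
    return min(bitmask_candidates(value, n_bits),
               key=lambda y: abs(y - reference))
```

For example, `closest_candidate(43, 6, 12.0)` selects 11, the candidate in which only the most significant bit has been flipped.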

(4) Compute the LAR deviation measure between \tilde{X}_C and \hat{X}_C:

LARD(\tilde{X}, \hat{X}) = \frac{\sum_{i=1}^{8} (\tilde{x}_{Ci} - \hat{x}_{Ci})^2}{\sum_{i=1}^{8} \tilde{x}_{Ci}^2 + \sum_{i=1}^{8} \hat{x}_{Ci}^2}

(5) If LARD(\tilde{X}, \hat{X}) is less than or equal to the threshold (the eighth threshold) 0.01, mark the current LAR parameters as usable and output \tilde{X}_C as the correction result; if LARD(\tilde{X}, \hat{X}) is higher than the threshold (the eighth threshold), go to step (6) for further correction.

(6) Compute the deviation measure between each LAR component \tilde{x}_{Ci} and \hat{x}_{Ci}:

LD_i = \frac{(\tilde{x}_{Ci} - \hat{x}_{Ci})^2}{\tilde{x}_{Ci}^2 + \hat{x}_{Ci}^2}

(7) Select the component with the largest deviation measure as the component to be repaired.

(8) Repair the component to be repaired using the bit repair method described in (3), selecting the value in the candidate set closest to the "reference LAR parameter" as the correction result.

(9) Update the LAR parameters, recompute the LAR deviation measure LARD(\tilde{X}, \hat{X}) between \tilde{X}_C and \hat{X}_C, and return to step (5).
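The loop of steps (4) to (9) can be sketched as below. This is a simplified illustration, not the embodiment's implementation: the helper names are hypothetical, candidates are generated by single-bit flips, and the per-component bit widths are passed in by the caller.

```python
from typing import List, Sequence, Tuple


def lar_deviation(x: Sequence[float], ref: Sequence[float]) -> float:
    """Deviation measure of step (4); 0.0 if both vectors are all zero."""
    num = sum((a - b) ** 2 for a, b in zip(x, ref))
    den = sum(a * a for a in x) + sum(b * b for b in ref)
    return num / den if den else 0.0


def component_deviation(a: float, b: float) -> float:
    """Per-component measure LD_i of step (6)."""
    den = a * a + b * b
    return (a - b) ** 2 / den if den else 0.0


def flip_candidates(value: int, n_bits: int) -> List[int]:
    """All values obtained by flipping exactly one of the n_bits bits."""
    return [value ^ (1 << k) for k in range(n_bits - 1, -1, -1)]


def repair_lar(x: Sequence[int], ref: Sequence[float], n_bits: Sequence[int],
               limit: float = 0.01, max_rounds: int = 10) -> Tuple[List[int], bool]:
    """Return (repaired values, success flag)."""
    x = list(x)
    for _ in range(max_rounds):
        if lar_deviation(x, ref) <= limit:                        # step (5)
            return x, True
        worst = max(range(len(x)),                                # steps (6)-(7)
                    key=lambda i: component_deviation(x[i], ref[i]))
        x[worst] = min(flip_candidates(x[worst], n_bits[worst]),  # step (8)
                       key=lambda y: abs(y - ref[worst]))
    return x, lar_deviation(x, ref) <= limit                      # budget used up
```

A possible call is `repair_lar([11, 30, 15, 15, 8, 8, 4, 4], [42.5, 30, 15, 15, 8, 8, 4, 4], [6, 6, 5, 5, 4, 4, 3, 3])`, where only the first width (6 bits for LAR1) is taken from the text and the remaining widths are an assumption of this example.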

The above is the complete process of error detection and repair for the LAR parameters of an "erroneous frame" (BFI=1). If the LAR parameter repair process fails, further processing is performed in the frame error smoothing stage, which is described in detail later.

The error detection and repair method for the LAR parameters in a "correct speech frame" is as follows:

(1) The following are known: each component x_{Ci}, i = 1, …, 8, of the current frame's LAR parameters; the corresponding component x_{PPi} of the frame before the previous frame; the corresponding component x_{Pi} of the previous frame; and the corresponding component x_{Ni} of the next frame.

(2) Compute the distances d_{PPi}, d_{Pi} and d_{Ni} between the current frame's LAR component and the LAR components of the adjacent frames.

(3) Select the minimum of d_{PPi}, d_{Pi} and d_{Ni}, denoted d_{Min}.

(4) If this minimum distance d_{Min} is greater than the threshold T_i (the fifth threshold), go to step (5) for correction; if it is less than or equal to the threshold T_i, skip this component. The threshold T_i for each component is given in Table 5:

Parameter	LAR1	LAR2	LAR3	LAR4	LAR5	LAR6	LAR7	LAR8
Threshold T_i	20	20	15	15	8	8	5	5

Table 5, values of T_i

(5) Obtain the "reference LAR parameter" of the current component; the method is the same as that described in step (1) of the above embodiment.

(6) Repair this component using the bit-mask repair method described in step (3) of the above embodiment, choosing the value in the candidate set closest to the "reference LAR parameter" as the repair result.

(7) Verify the repair result of each component of the LAR parameters; the verification method is the same as that described in steps (2), (3) and (4) of the above embodiment.

(8) If no unreliable component is found, flag the current frame's LAR parameters as successfully repaired; if an unreliable component remains, flag them as a repair failure. Here, reliable means the following: if d_{Min} is less than or equal to the threshold for all 8 LAR parameters, the LAR parameter repair is considered successful and the parameters reliable; if d_{Min} exceeds the threshold for any one of the 8 LAR parameters, the repair is considered failed and the parameters unreliable.

If the repair fails at this stage, the repair result is sent to the parameter smoothing module for parameter smoothing, which is described in detail later.
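The detection part of this procedure, steps (2) to (4), reduces to a per-component minimum-distance test against the Table 5 thresholds. A minimal sketch, with a hypothetical function name, is:

```python
from typing import List, Sequence

# Per-component thresholds T_i from Table 5.
T = [20, 20, 15, 15, 8, 8, 5, 5]


def suspicious_components(x_c: Sequence[int], x_pp: Sequence[int],
                          x_p: Sequence[int], x_n: Sequence[int],
                          thresholds: Sequence[int] = T) -> List[int]:
    """Indices (0-based) of LAR components whose d_Min exceeds T_i."""
    flagged = []
    for i, (c, pp, p, n) in enumerate(zip(x_c, x_pp, x_p, x_n)):
        d_min = min(abs(c - pp), abs(c - p), abs(c - n))   # steps (2)-(3)
        if d_min > thresholds[i]:                          # step (4)
            flagged.append(i)
    return flagged
```

An empty result corresponds to a frame in which every component is close to at least one of its three neighbouring frames; the same test can be reused for the verification in step (7).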

Two, error detection and repair of the RPEMax parameter

The RPEMax parameter among the GSM full-rate speech coding parameters has very large redundancy. This redundancy shows in the non-uniform distribution of RPEMax values and in the strong correlation between the RPEMax values of adjacent subframes. The RPEMax parameter represents the amplitude of the random part of the LPC excitation signal. Because speech is short-term stationary, the value of the RPEMax parameter is generally stable over a span of one to several frames.

In the embodiments of the present invention, the short-term stationarity of the speech signal serves as an important criterion for RPEMax parameter error detection and repair.

The RPEMax parameter error detection and repair method for an "erroneous speech frame" is as follows:

(1) Let the RPEMax parameters of the 4 subframes of the current frame be R_C = (r_{C1}, r_{C2}, r_{C3}, r_{C4}), those of the 4 subframes of the previous frame be R_P = (r_{P1}, r_{P2}, r_{P3}, r_{P4}), and those of the 4 subframes of the next frame be R_N = (r_{N1}, r_{N2}, r_{N3}, r_{N4}).

(2) Let R_P be set 1, R_N be set 2, and R = (r_{P1}, r_{P2}, r_{P3}, r_{P4}, r_{N1}, r_{N2}, r_{N3}, r_{N4}) be set 3.

(3) Compute the mean of each of sets 1, 2 and 3, and determine from the variation range of the elements in each set whether the element values in that set are stable. The stability criterion is the ratio of the variation of the elements in the set to the sum of all elements in the set.

Specifically, let the elements of a set be x_i, i = 1, …, N.

1. Compute the element mean

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i;

2. Compute the element jitter value

d(x) = \frac{\sum_{i=1}^{N} |x_i - \bar{x}|}{\sum_{i=1}^{N} x_i};

3. If d(x) > 0.25 (0.25 being the tenth threshold), the elements of the set are considered unstable.
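The jitter value is the total absolute deviation from the mean divided by the sum of the elements. A minimal sketch follows; the function names and the convention for an all-zero set are assumptions made for illustration.

```python
from typing import Sequence


def jitter(values: Sequence[float]) -> float:
    """Jitter d(x): sum of |x_i - mean| divided by the sum of x_i."""
    total = sum(values)
    if total == 0:
        return 0.0  # assumption: an all-zero set is treated as stable
    mean = total / len(values)
    return sum(abs(v - mean) for v in values) / total


def is_stable(values: Sequence[float], limit: float = 0.25) -> bool:
    """A set is unstable when its jitter exceeds the tenth threshold."""
    return jitter(values) <= limit
```

For instance, `[30, 32, 31, 29]` has a jitter of about 0.03 and counts as stable, whereas `[0, 0, 20, 20]` has a jitter of 1.0 and does not.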

(4) Select the most stable of sets 1, 2 and 3 as the "reference RPEMax parameter". If all three sets are unreliable, construct a "mean set" to serve as the "reference RPEMax parameter". The construction method is: let the mean of the "mean set" be

(5) Compute the mean of the received RPEMax values R_C = (r_{C1}, r_{C2}, r_{C3}, r_{C4}) of the 4 subframes of the current frame, and determine the stability of the RPEMax parameters of the 4 subframes of the current frame according to the procedure in (3);

Then perform error detection and repair of the RPEMax parameters. This process is executed in a loop until no erroneous parameter is detected or the number of iterations exceeds a threshold (the twelfth threshold), which is set to 10 in this embodiment.

(6) If the RPEMax parameters R_C = (r_{C1}, r_{C2}, r_{C3}, r_{C4}) of the 4 subframes of the current frame are stable, that is, their jitter value is less than or equal to the tenth threshold, and the distance between their mean and the mean of the "reference RPEMax parameter" is less than or equal to a given threshold (the eleventh threshold), mark the subframe RPEMax parameters of the current frame as usable, output them as the repair result, and exit the process; if the distance is higher than the threshold (the eleventh threshold), go to step (7).

(7) If the number of iterations exceeds the threshold (the twelfth threshold), the repair process is considered to have failed and the process exits; otherwise, go to (8) for parameter correction.

(8) Select the RPEMax component farthest from the mean of the "reference RPEMax parameter" and correct it with the bit-mask bit repair method described in the preceding embodiment; after correction, select the candidate value nearest to the mean of the "reference RPEMax parameter" as the repair result.

(9) Update the set of subframe RPEMax parameter values of the current frame, determine their stability according to the procedure in (3), and return to step (6).

The above is the complete process of error detection and repair for the RPEMax parameters of an "erroneous frame". If the parameter repair fails, the repair result is sent to the parameter smoothing module for processing, which is described in detail later.
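The choice of the reference set in steps (2) to (4) can be sketched as follows. The function names are hypothetical, and because the construction of the fallback "mean set" is not spelled out in the text, the sketch simply returns `None` when none of the three sets is stable.

```python
from typing import List, Optional, Sequence


def jitter(values: Sequence[float]) -> float:
    """Jitter d(x) as defined in step (3)."""
    total = sum(values)
    if total == 0:
        return 0.0  # assumption: an all-zero set is treated as stable
    mean = total / len(values)
    return sum(abs(v - mean) for v in values) / total


def pick_reference(prev: Sequence[float], nxt: Sequence[float],
                   limit: float = 0.25) -> Optional[List[float]]:
    """Most stable of set 1 (prev), set 2 (next) and set 3 (both)."""
    sets = [list(prev), list(nxt), list(prev) + list(nxt)]
    best = min(sets, key=jitter)
    # None signals that a fallback "mean set" would have to be constructed.
    return best if jitter(best) <= limit else None
```

The mean of the returned set is what steps (6) and (8) compare against the current frame's subframe values.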

RPEMax parameter error detection and repair for a "correct speech frame".

Here a "correct speech frame" is a speech frame that is indicated as correct but may in fact be incorrect. The RPEMax parameter error detection and repair method for a "correct speech frame" is essentially the same as the method described above. The difference is that the conditions for error detection on the current frame's parameters are relatively loose, that is, the thresholds are relatively larger; the details are not repeated here.

Three, error detection and error repair of the LTPLag parameter

The embodiments of the present invention use an LTPLag parameter error detection and repair method based on cluster analysis. LTPLag parameter repair is carried out mainly for "erroneous frames" (BFI=1). The specific steps are as follows:

(1) Collect into one large set S all valid LTPLag parameters, that is, those with values between 40 and 120, among the LTPLag parameters of the 4 subframes of the frame preceding the current speech frame, the 4 subframes of the current frame, and the 4 subframes of the next frame;

(2) Cluster the elements of S using the minimum-distance clustering method. During clustering, the maximum distance between the elements of each class must be less than 10 (10 being the thirteenth threshold);

(3) Mark every class with more than 1 element as a valid class, and take the set of the means of the valid classes as the "reference LTPLag parameter" for LTPLag parameter error detection and repair;

(4) If the LTPLag component value of one or more subframes of the current frame is invalid, that is, less than 40 or greater than 120, or has not been clustered into a valid class, mark it as a component to be repaired; if there is no component to be repaired, mark the LTPLag parameters of the 4 subframes of this frame as usable and exit this process;

(5) Correct each LTPLag component to be repaired using the bit-mask bit repair method described in the preceding embodiment, selecting the candidate value closest to one of the valid class means in the "reference LTPLag parameter" as the repair result;

(6) If the distance between the repair result and the nearest reference LTPLag parameter element is less than a threshold (the fourteenth threshold), the repair result is usable; if this distance is greater than the threshold (the fourteenth threshold), the reference LTPLag parameter element nearest to the repair result is taken as the final repair result.

The above is the complete process of error detection and repair for the LTPLag parameter.
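Steps (1) to (3) can be illustrated with the sketch below. The text does not define the minimum-distance clustering method further, so for these one-dimensional lag values the sketch substitutes a simple greedy grouping of the sorted values in which each class spans less than 10; this substitution and the function name are assumptions.

```python
from typing import List, Sequence


def ltplag_references(lags: Sequence[int], lo: int = 40, hi: int = 120,
                      max_span: int = 10) -> List[float]:
    """Means of the valid classes formed from the valid lag values."""
    valid = sorted(v for v in lags if lo <= v <= hi)       # step (1)
    clusters: List[List[int]] = []
    for v in valid:                                        # step (2), greedy
        if clusters and v - clusters[-1][0] < max_span:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    # Step (3): only classes with more than one element are valid.
    return [sum(c) / len(c) for c in clusters if len(c) > 1]
```

With the twelve subframe lags of the previous, current and next frames as input, the returned means play the role of the "reference LTPLag parameter"; a current-frame lag that is out of range or belongs to no valid class is then a component to be repaired.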

Four, error detection and error repair of the LTPGain parameter

The embodiments of the present invention use an LTPGain parameter error detection and repair method based on prior conditional probability. Error detection and repair are performed only on the LTPGain parameters of "erroneous frames" (BFI=1). The specific steps are as follows:

(1) Let x_C be the LTPGain parameter of any subframe of the current speech frame, x_P the LTPGain parameter of the subframe preceding that subframe, and x_N the LTPGain parameter of the subframe following it;

(2) Look up the prior conditional probability p(x_C | x_P, x_N). If this conditional probability is greater than 0.1 (0.1 being the fifteenth threshold), accept the LTPGain value x_C of the current subframe; otherwise, perform the error detection and repair of steps (3) and (4);

(3) Compute the reference LTPGain parameter of the current subframe as follows:

\hat{x}_C = \mathrm{int}[0.5 + \sum_{k=0}^{3} k \cdot p(x_C = k | x_P, x_N)]

(4) If

or

x_C is considered to be erroneous and is replaced with the reference LTPGain parameter \hat{x}_C, which completes the repair; otherwise, x_C does not need to be repaired.

The above is the complete process of error detection and repair for the LTPGain parameter.
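Steps (2) and (3) amount to an acceptance test followed by a rounded conditional expectation. In the sketch below, `cond_probs` stands for a hypothetical list holding p(x_C = k | x_P, x_N) for k = 0, …, 3, already looked up for the given neighbours; the table itself and the function names are assumptions, and the comparison of step (4) is left out.

```python
from typing import Sequence


def ltpgain_accept(x_c: int, cond_probs: Sequence[float],
                   limit: float = 0.1) -> bool:
    """Step (2): accept x_C if its conditional probability exceeds 0.1."""
    return cond_probs[x_c] > limit


def ltpgain_reference(cond_probs: Sequence[float]) -> int:
    """Step (3): conditional mean of x_C, rounded to the nearest integer."""
    return int(0.5 + sum(k * p for k, p in enumerate(cond_probs)))
```

For example, with the hypothetical distribution `[0.05, 0.05, 0.1, 0.8]` a received value of 0 is not accepted, and the reference value is 3.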

The above is the error detection and repair process for the four classes of major parameters. If the repair processes of all four classes of parameters return a success flag, the BFI flag of the current frame is changed to 0, and the repaired parameters are returned and sent to the speech decoder for decoding.

In this way, prior statistical information can be used to obtain an estimate from the parameters of the frames before and after the current speech frame, or from the parameters of the subframes of those frames, and the estimate is then substituted for the parameter value of the current speech frame or subframe. This avoids the relatively simple parameter substitution method of the prior art, better matches the characteristics of speech signals, preserves the dynamic variation of the speech signal well, reduces the impact on decoded speech quality, and improves the user experience.

Embodiment 3 of the present invention provides another speech frame repair method. This method may be carried out on the basis of Embodiment 1 and/or Embodiment 2; for example, if repair still fails after the repair method of Embodiment 2 has been applied one or more times, this method is then used. Of course, this method may also be carried out without Embodiment 1 and/or Embodiment 2, repairing the speech frame directly. The method repairs the speech frame by parameter smoothing. Unlike parameter-specific error detection and repair, parameter smoothing does not perform bit-level repair of the actually received value; instead it uses prior statistical information to compute a suitable substitute value from adjacent parameter values. The prior statistical information here is the conditional probability obtained from statistics over a large number of speech frames.

As shown in Figure 4, the steps comprise:

S401, using prior statistical information, obtain an estimate from the parameters of the frames before and after the speech frame, or from the parameters of the subframes of those frames.

In this embodiment, the prior statistical information is a conditional probability, which may be obtained from statistics over a large number of speech frames.

Specifically, for the LAR parameters, this step may be: obtain the conditional probability of the LAR parameter of the frame following the current speech frame and the LAR parameter of the frame preceding the current speech frame; if this conditional probability is less than the sixteenth threshold, use extrapolation to obtain the LAR parameter estimate; if this conditional probability is greater than or equal to the sixteenth threshold, use interpolation to obtain the LAR parameter estimate.

For the RPEMax parameter, this step may be: obtain the conditional probability of the RPEMax parameter of the subframe of the frame following the current speech frame and the RPEMax parameter of the subframe of the frame preceding the current speech frame; if the conditional probability is less than the sixteenth threshold, use extrapolation to obtain the RPEMax parameter estimate; if the conditional probability is greater than or equal to the sixteenth threshold, use interpolation to obtain the RPEMax parameter estimate.

S402, replace the parameter value of the speech frame or subframe with the estimate.

Specifically, for the LAR parameters, this step may be: when the distance between the LAR parameter of the current speech frame and the LAR parameter estimate is determined to be greater than or equal to the seventeenth threshold, replace the LAR parameter of the current speech frame with the LAR parameter estimate.

For the RPEMax parameter, this step may be: when the distance between the RPEMax parameter of the subframe of the current speech frame and the RPEMax parameter estimate is greater than or equal to the seventeenth threshold, replace the RPEMax parameter of the subframe of the current speech frame with the RPEMax parameter estimate.

This avoids the relatively simple parameter substitution method of the prior art, better matches the characteristics of speech signals, preserves the dynamic variation of the speech signal well, further reduces the impact on decoded speech quality, and improves the user experience.

An implementation of the speech frame repair method of Embodiment 3 is described in detail below by way of a specific embodiment, which provides interpolation-based parameter smoothing. It introduces a delay of one frame (20 ms), has low computational complexity, and, relative to the baseline GSM full-rate speech decoding algorithm, performs stably and yields good decoded speech quality.

When parameter smoothing is performed, it is first checked whether the interpolation condition is satisfied. Specifically, the received or repaired parameter value x_P of the previous frame or subframe, the received parameter value x_C of the current frame or subframe, and the received parameter value x_N of the next frame or subframe are known. First look up the conditional probability P(x_N | x_P). If this probability is less than a given threshold (the sixteenth threshold), which is set to 0.0001 in the embodiments of the present invention, interpolation is considered unsuitable for smoothing, and extrapolation is used instead to compute the estimate \hat{x}_C of this parameter for the current frame or subframe. If it is greater than or equal to the given threshold (the sixteenth threshold), interpolation is used. The formulas are as follows:

Extrapolation:

\hat{x}_C = \sum_{k=0}^{K} k \cdot P(x_C = k | x_P)

Interpolation:

\hat{x}_C = \sum_{k=0}^{K} k \cdot P(x_C = k | x_P, x_N)

where K denotes the maximum possible value of x, and P(x_C = k | x_P) and P(x_C = k | x_P, x_N) are conditional probabilities obtained from statistics over a large number of coded samples. The estimate \hat{x}_C covers both the LAR parameter estimate and the RPEMax parameter estimate.
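The choice between extrapolation and interpolation can be sketched as below. The three lookup tables are hypothetical stand-ins for the prior statistics (the text gives no table format): `p_next_given_prev[x_p][x_n]` holds P(x_N | x_P), `p_cur_given_prev[x_p]` holds the distribution of x_C given x_P, and `p_cur_given_both[(x_p, x_n)]` holds the distribution of x_C given both neighbours.

```python
from typing import Dict, Sequence, Tuple


def expected_value(dist: Sequence[float]) -> float:
    """Sum over k of k * P(x_C = k | ...)."""
    return sum(k * p for k, p in enumerate(dist))


def smooth_estimate(x_p: int, x_n: int,
                    p_next_given_prev: Dict[int, Sequence[float]],
                    p_cur_given_prev: Dict[int, Sequence[float]],
                    p_cur_given_both: Dict[Tuple[int, int], Sequence[float]],
                    limit: float = 0.0001) -> float:
    """Estimate of the current parameter from its neighbours."""
    if p_next_given_prev[x_p][x_n] < limit:
        # The (x_P, x_N) pair is implausible, so x_N is not trusted.
        return expected_value(p_cur_given_prev[x_p])        # extrapolation
    return expected_value(p_cur_given_both[(x_p, x_n)])     # interpolation
```

In this sketch extrapolation conditions only on the previous value, so a corrupted next-frame value cannot distort the estimate.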

The input to this module is the parameters output by the parameter-specific error detection and repair module. Among these input parameters there are correct, reliable parameters as well as erroneous or unreliable ones. As far as possible, the parameter smoothing module smooths only the erroneous or unreliable parameters. After the interpolated or extrapolated smoothed value has been computed, the repaired value \tilde{x}_C output by the parameter-specific error detection and repair module is compared with the estimate \hat{x}_C. If the distance between the two is less than a given threshold T (the seventeenth threshold), the repaired value is considered usable and is not replaced by the estimate \hat{x}_C; otherwise, the repaired value \tilde{x}_C is replaced by the estimate \hat{x}_C.

The threshold T for each parameter is given in Table 6:

Parameter	LAR1	LAR2	LAR3	LAR4	LAR5	LAR6	LAR7	LAR8	RPEMax
Threshold T	6	6	4	4	3	3	2	2	4

Table 6, values of T
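The substitution rule with the Table 6 thresholds can be written compactly; the dictionary layout and the function name are illustrative assumptions.

```python
# Thresholds T from Table 6, keyed by parameter name.
SMOOTHING_T = {"LAR1": 6, "LAR2": 6, "LAR3": 4, "LAR4": 4, "LAR5": 3,
               "LAR6": 3, "LAR7": 2, "LAR8": 2, "RPEMax": 4}


def smooth_parameter(name: str, repaired: float, estimate: float) -> float:
    """Keep the repaired value if it is within T of the estimate."""
    if abs(repaired - estimate) < SMOOTHING_T[name]:
        return repaired
    return estimate
```

A repaired LAR1 value of 30 is kept against an estimate of 33.0, for example, but replaced against an estimate of 36.0.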

After smoothing ends, the erroneous frame indication BFI is reset to 0, giving the corrected erroneous frame indication. At this point the error detection and repair process for the speech frame parameters is complete.

The embodiments of the present invention overcome two shortcomings of the prior art: the binary "correct"/"erroneous" decision on speech frames is too simple and discards useful information entirely, and simple parameter substitution cannot track the dynamic variation of speech within an erroneous speech frame. First, reasonable error criteria are established separately for the three classes of parameters LAR, LTPLag and RPEMax, so that parameters that may contain errors are detected and reliable parameters are kept; error detection and repair are carried out at the parameter level instead of making an all-or-nothing decision on the whole frame. Second, parameters that contain errors are corrected by bit-level repair using the correlation between the parameters of the preceding and following frames, which can eliminate occasional bit errors and repair the parameters effectively. Finally, parameters whose repair failed are smoothed by extrapolation or interpolation based on statistical probability, which, compared with simple parameter substitution, better matches the characteristics of speech signals and preserves the dynamic variation of the speech signal to some extent.

The speech frame repair method provided by this embodiment repairs speech frames by parameter smoothing. The parameter smoothing method uses the statistical characteristics of speech parameters and, compared with the simple parameter substitution method, better matches the characteristics of speech signals and preserves the dynamic variation of the speech signal well. It can reduce the impact on decoded speech quality and improve the user experience.

It should be noted that the methods provided by Embodiments 1, 2 and 3 above may be executed in combination, or each may be executed independently on the basis of the prior art. When combined, they may be combined in pairs or all three together. When all three are combined, they may be executed in the order of Embodiments 1, 2 and 3, so that the executed method comprises three parts: first, for CRC check errors that may occur in silent segments, an enhanced DTX processing algorithm is used to eliminate silent-segment noise completely; second, for speech frames, parameter-specific speech frame error detection and repair is performed on several classes of major parameters; third, for speech frames whose repair failed in the second part, interpolation-based parameter smoothing is performed. The pairwise combinations are Embodiments 1 and 2; Embodiments 1 and 3; and Embodiments 2 and 3. For the implementation details of each part when combined, see the descriptions of Embodiments 1, 2 and 3 and the corresponding specific implementation flows, which are not repeated here.

The erroneous frame detection device provided by the embodiments of the present invention, as shown in Figure 5, comprises:

Receiving unit 501, configured to receive a speech frame in silent mode.

Detecting unit 502, configured to, when the received speech frame is a speech frame that a parameter indicates as correct, check the parameters of this speech frame according to a preset detection rule, and determine that this speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied.

Further, detecting unit 502 is specifically configured to, when the received speech frame is a speech frame with BFI=0 and SID=0, determine that this speech frame is an erroneous speech frame if the frame following it has SID=2 or BFI=1.

Alternatively, detecting unit 502 is specifically configured to determine that this speech frame is an erroneous speech frame if the prior probability of any one of the LAR parameters of this speech frame and the RPEMax, LTPLag and LTPGain parameters of its subframes is less than a first threshold.

Alternatively, detecting unit 502 is specifically configured to determine that this speech frame is an erroneous speech frame if the prior transition probability of any one of the LAR parameters of the frame preceding this speech frame, this speech frame and the frame following it, and the RPEMax parameters of the subframes of the preceding frame, this speech frame and the following frame, is less than a second threshold.

The device further comprises: reset unit 503, configured to reset to 1 the BFI of the erroneous speech frame detected by detecting unit 502.

In addition, receiving unit 501 is further configured to receive a speech frame in speech mode.

Detecting unit 502 is further configured to, when the received speech frame is a speech frame that a parameter indicates as erroneous, check the SID of this speech frame according to the parameter means of this speech frame, and determine that this speech frame is an SID frame when the detection condition is satisfied.

Specifically, detecting unit 502 is configured to determine that this speech frame is an SID frame if the mean of the LTPLag parameters of the 4 subframes of this speech frame is less than a third threshold, the mean of the RPEMax parameters of the 4 subframes of the frame preceding this speech frame is less than a fourth threshold, and the mean of the RPEMax parameters of the 4 subframes of this speech frame is less than the fourth threshold.

Reset unit 503 is configured to reset to 2 the SID of the erroneous speech frame detected by detecting unit 502.
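The SID test applied by detecting unit 502 in speech mode is a conjunction of three mean comparisons. In the sketch below the third and fourth thresholds are left as arguments, since their values are not given in this passage; the function names are hypothetical.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def looks_like_sid(ltplag_cur: Sequence[float], rpemax_prev: Sequence[float],
                   rpemax_cur: Sequence[float], lag_limit: float,
                   rpemax_limit: float) -> bool:
    """True when all three subframe means fall below their thresholds."""
    return (mean(ltplag_cur) < lag_limit
            and mean(rpemax_prev) < rpemax_limit
            and mean(rpemax_cur) < rpemax_limit)
```

A frame for which this returns `True` would have its SID reset to 2 by reset unit 503.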

In this way, when a speech frame that a parameter indicates as correct is received in silent mode, the parameters of the speech frame are checked according to the preset detection rule, and the speech frame is determined to be an erroneous speech frame when the condition specified by the detection rule is satisfied; this eliminates the large amount of silent-segment noise caused by CRC check misjudgment. When a speech frame that a parameter indicates as erroneous is received in speech mode, the SID of the speech frame is checked according to the parameter means of the speech frame, and the speech frame is determined to be an SID frame when the detection condition is satisfied; this allows disturbed SID frames to be identified correctly. In short, the useful information in the speech frame is exploited to the greatest extent, the impact on decoded speech quality is significantly reduced, and the user experience is improved.

The above detection device may include a speech frame repair function to implement the method provided in Embodiment 2. As shown in Fig. 6, the device comprises:

A generation unit 601, configured to generate reference parameters according to the parameters of the frames before and after the speech frame or of the subframes of those frames, or according to the parameters of the subframes before and after a subframe of the speech frame;

A judging unit 602, configured to determine the component to be repaired of the speech frame or subframe parameters;

A repair unit 603, configured to perform bit-level repair on the component to be repaired using a bit mask to obtain a repair result; and/or to replace the component to be repaired with the reference parameter to obtain a repair result.

Further, as shown in Fig. 7, the generation unit 601 may comprise:

An LAR reference parameter generating module 601A, configured to generate a reference LAR parameter according to the LAR parameter of the frame two frames before the speech frame, the LAR parameter of the frame preceding the speech frame, and the LAR parameter of the frame following the speech frame;

An RPEMax reference parameter generating module 601B, configured to take the RPEMax parameters of the 4 subframes of the frame preceding the speech frame as a first set, the RPEMax parameters of the 4 subframes of the frame following the speech frame as a second set, and the 8 RPEMax parameters of the 4 subframes of the preceding frame and the 4 subframes of the following frame as a third set, and to use as the reference RPEMax parameter the set, among the three, whose elements vary least;

An LTPLag reference parameter generating module 601C, configured to apply a minimum clustering method to cluster all elements with values between 40 and 120 among the LTPLag parameters of the 4 subframes of the speech frame, the 4 subframes of the frame preceding the speech frame, and the 4 subframes of the frame following the speech frame, and to take the set of mean values of each cluster containing more than 1 element as the reference LTPLag parameter, wherein the distance between the elements within each cluster is less than a thirteenth threshold;

An LTPGain reference parameter generating module 601D, configured to obtain a reference LTPGain parameter using prior conditional probabilities, according to the LTPGain parameter of a subframe of the speech frame, the LTPGain parameter of the subframe preceding that subframe, and the LTPGain parameter of the subframe following that subframe.
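The behaviour of modules 601B and 601C can be illustrated with the following sketch, which rests on two labeled assumptions. First, "varies least" is measured here by population variance; the text does not name the measure. Second, the clustering is a simple greedy grouping of sorted values that keeps every pairwise distance within a cluster below the thirteenth threshold; it stands in for the minimum clustering method, whose exact procedure is not given in this passage. All function names are hypothetical.

```python
from statistics import mean, pvariance


def reference_rpemax(prev_rpemax, next_rpemax):
    """Pick the candidate set whose elements vary least (assumed: variance)."""
    candidates = [
        list(prev_rpemax),                      # first set: previous frame
        list(next_rpemax),                      # second set: following frame
        list(prev_rpemax) + list(next_rpemax),  # third set: all 8 values
    ]
    return min(candidates, key=pvariance)


def reference_ltplag(lags, thirteenth_threshold):
    """Cluster LTPLag values in [40, 120]; return means of clusters of size > 1.

    Values are sorted, and a value joins the current cluster only if it lies
    within the threshold of the cluster's smallest element, which bounds
    every pairwise distance inside the cluster.
    """
    candidates = sorted(v for v in lags if 40 <= v <= 120)
    clusters = []
    for v in candidates:
        if clusters and v - clusters[-1][0] < thirteenth_threshold:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [mean(c) for c in clusters if len(c) > 1]
```

In this sketch, `lags` would hold the 12 LTPLag values of the preceding, current, and following frames, and the threshold value is supplied by the caller.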

The judging unit 602 may comprise:

An LAR parameter judging module 602A, configured to determine that a component of the LAR parameter of the speech frame is a component to be repaired if the distance between that component and the corresponding component of the reference LAR parameter is greater than a seventh threshold;

An RPEMax parameter judging module 602B, configured to, if the distance between the mean value of the RPEMax parameters of the 4 subframes of the speech frame and the mean value of the reference RPEMax parameter is greater than an eleventh threshold, determine as the component to be repaired the RPEMax parameter, among those of the 4 subframes of the speech frame, that is farthest from the mean value of the reference RPEMax parameter;

An LTPLag parameter judging module 602C, configured to determine as components to be repaired of the subframe parameters of the speech frame those LTPLag parameters of the 4 subframes of the speech frame whose values are less than 40 or greater than 120, or which were not clustered;

An LTPGain parameter judging module 602D, configured to determine that the LTPGain parameter of a subframe of the speech frame is a component to be repaired if the LTPGain parameter of that subframe is greater than the reference LTPGain parameter, or is less than the reference LTPGain parameter minus 1.
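The judging rules of modules 602A, 602B, and 602D reduce to simple comparisons, sketched below. The sketch assumes that "distance" means absolute difference, which this passage does not state; the function names are hypothetical, and the seventh and eleventh thresholds are supplied by the caller.

```python
from statistics import mean


def lar_components_to_repair(lar, ref_lar, seventh_threshold):
    """Indices of LAR components farther than the threshold from the reference."""
    return [i for i, (x, r) in enumerate(zip(lar, ref_lar))
            if abs(x - r) > seventh_threshold]


def rpemax_component_to_repair(rpemax, ref_rpemax, eleventh_threshold):
    """Index of the subframe RPEMax to repair, or None if the frame passes."""
    ref_mean = mean(ref_rpemax)
    if abs(mean(rpemax) - ref_mean) <= eleventh_threshold:
        return None
    # Repair the subframe value farthest from the reference mean.
    return max(range(len(rpemax)), key=lambda i: abs(rpemax[i] - ref_mean))


def ltpgain_needs_repair(gain, ref_gain):
    """True if gain lies outside the interval [ref_gain - 1, ref_gain]."""
    return gain > ref_gain or gain < ref_gain - 1
```

Under the LTPGain rule, a subframe value is accepted only if it equals the reference value or is at most 1 below it.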

In this way, instead of discarding erroneous frames entirely as in the prior art, the erroneous frames are used as effectively as possible and random bit errors are repaired, which further reduces the impact on decoded speech quality and improves user experience.

As shown in Fig. 8, the above erroneous frame detection device may also include another speech frame repair function capable of performing the method of Embodiment 3, in which case the device may further comprise:

A computing unit 801, configured to obtain an estimated value using prior statistical information, according to the parameters of the frames before and after the current speech frame, or according to the parameters of the subframes of the frames before and after the current speech frame;

A replacing unit 802, configured to replace the parameter value of the current speech frame or subframe with the estimated value.

Further, the computing unit 801 is configured to obtain the LAR parameter estimate by extrapolation when the conditional probability between the LAR parameter of the frame following the speech frame and the LAR parameter of the frame preceding the speech frame is less than a sixteenth threshold, and to obtain the LAR parameter estimate by interpolation when that conditional probability is greater than or equal to the sixteenth threshold;

Alternatively, the computing unit 801 is configured to obtain the RPEMax parameter estimate by extrapolation when the conditional probability between the RPEMax parameter of a subframe of the frame following the speech frame and the RPEMax parameter of a subframe of the frame preceding the speech frame is less than the sixteenth threshold, and to obtain the RPEMax parameter estimate by interpolation when that conditional probability is greater than or equal to the sixteenth threshold.
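The choice between extrapolation and interpolation can be sketched as follows. The concrete formulas are assumptions made for illustration, since this passage does not specify them: extrapolation is taken to mean holding the value from the preceding frame, and interpolation is taken to mean the arithmetic mean of the preceding and following values. The conditional probability is assumed to have been looked up from prior statistics by the caller, and the function name is hypothetical.

```python
def estimate_parameter(prev_value, next_value, cond_prob, sixteenth_threshold):
    """Estimate a LAR or RPEMax value for the current frame or subframe.

    cond_prob -- conditional probability relating next_value and prev_value,
                 taken from prior statistics (supplied by the caller)
    """
    if cond_prob < sixteenth_threshold:
        # The two neighbours are unlikely to occur together, so only the
        # preceding value is used (assumed form of extrapolation).
        return prev_value
    # The neighbours are consistent, so both are used (assumed form of
    # interpolation).
    return (prev_value + next_value) / 2
```

The replacing unit 802 would then substitute the returned estimate for the parameter value of the current speech frame or subframe.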

An embodiment of the present invention provides a speech frame repair device, shown in Fig. 9. Further, its generation unit 601 and judging unit 602 may comprise the modules shown in Fig. 10.

As shown in Fig. 11, that device may also include another repair function to implement the method provided in Embodiment 3.

An embodiment of the present invention also provides the speech frame repair device shown in Fig. 12.

For implementation details of the above device embodiments, refer to the description of the corresponding parts of the method embodiments; the reference values and threshold values are the same as in the method embodiments and are not repeated here.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.

Claims

1. An erroneous frame detection method, characterized by comprising:

In silent mode, upon receiving a speech frame whose parameters mark it as correct, checking the speech frame parameters according to a preset detection rule, and determining that the speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied;

Wherein the speech frame whose parameters mark it as correct is a speech frame with bad frame indication BFI=0 and silence insertion descriptor SID=0.

2. The method according to claim 1, characterized in that determining that the speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied is specifically:

If the frame following the speech frame has SID=2 or BFI=1, determining that the speech frame is an erroneous speech frame; the method further comprises: resetting the BFI of the speech frame to 1.

3. The method according to claim 1, characterized in that the speech frame whose parameters mark it as correct is a speech frame with BFI=0 and SID=0; and checking the speech frame parameters according to the preset detection rule and determining that the speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied is specifically:

Obtaining the prior probabilities of the LAR parameter of the speech frame and of the regular pulse excitation maximum RPEMAX, long-term prediction lag LTPlag, and long-term prediction gain LTPGain parameters of the subframes of the speech frame, and, if the prior probability of any one of these parameters is less than a first threshold, determining that the speech frame is an erroneous speech frame; or obtaining the prior transition probabilities of the LAR parameters of the frame preceding the speech frame, the speech frame, and the frame following the speech frame, and of the RPEMAX parameters of the subframes of the frame preceding the speech frame, the subframes of the speech frame, and the subframes of the frame following the speech frame, and, if the prior transition probability of any one of these parameters is less than a second threshold, determining that the speech frame is an erroneous speech frame;

The method further comprises: resetting the BFI of the speech frame to 1.

4. The method according to claim 2, characterized in that the method further comprises: when the frame following the speech frame has neither SID=2 nor BFI=1, obtaining the prior probabilities of the LAR parameter of the speech frame and of the regular pulse excitation maximum RPEMAX, long-term prediction lag LTPlag, and long-term prediction gain LTPGain parameters of the subframes of the speech frame, and, if the prior probability of any one of these parameters is less than a first threshold, determining that the speech frame is an erroneous speech frame; or obtaining the prior transition probabilities of the LAR parameters of the frame preceding the speech frame, the speech frame, and the frame following the speech frame, and of the RPEMAX parameters of the subframes of the frame preceding the speech frame, the subframes of the speech frame, and the subframes of the frame following the speech frame, and, if the prior transition probability of any one of these parameters is less than a second threshold, determining that the speech frame is an erroneous speech frame.

5. The method according to claim 1, characterized in that the method further comprises:

In speech mode, upon receiving a speech frame whose parameters mark it as erroneous, checking the silence insertion descriptor SID of the speech frame according to the parameter mean values of the speech frame, and determining that the speech frame is a SID frame when the detection condition is satisfied;

Wherein the speech frame whose parameters mark it as erroneous is a speech frame with BFI=1 and SID=0.

6. The method according to claim 5, characterized in that checking the silence insertion descriptor SID of the speech frame according to the parameter mean values of the speech frame, and determining that the speech frame is a SID frame when the detection condition is satisfied, comprises:

Obtaining the mean value of the long-term prediction lag LTPLag parameters of the 4 subframes of the speech frame;

Obtaining the mean value of the regular pulse excitation maximum RPEMAX parameters of the 4 subframes of the frame preceding the speech frame, and the mean value of the regular pulse excitation maximum RPEMAX parameters of the 4 subframes of the speech frame;

If the mean value of the LTPLag parameters is less than a third threshold, the mean value of the RPEMAX parameters of the preceding frame is less than a fourth threshold, and the mean value of the RPEMAX parameters of the speech frame is less than the fourth threshold, determining that the speech frame is a SID frame;

The method further comprises: resetting the SID of the speech frame to 2.

7. The method according to claim 1, characterized in that the method further comprises:

Generating reference parameters according to the parameters of the frames before and after the speech frame or of the subframes of those frames, or according to the parameters of the subframes before and after a subframe of the speech frame;

Determining the component to be repaired of the speech frame or subframe parameters;

Performing bit-level repair on the component to be repaired using a bit mask to obtain a repair result; and/or replacing the component to be repaired with the reference parameter to obtain a repair result;

Updating the parameters of the speech frame or subframe with the repair result.

8. The method according to claim 1 or 7, characterized in that the method further comprises:

Obtaining an estimated value using prior statistical information, according to the parameters of the frames before and after the speech frame, or according to the parameters of the subframes of the frames before and after the speech frame;

Replacing the parameter value of the speech frame or subframe with the estimated value.

9. An erroneous frame detection device, characterized by comprising:

A receiving unit, configured to receive speech frames in silent mode;

A detecting unit, configured to, when the received speech frame is one whose parameters mark it as correct, check the speech frame parameters according to a preset detection rule, and determine that the speech frame is an erroneous speech frame when the condition specified by the detection rule is satisfied; wherein the speech frame whose parameters mark it as correct is a speech frame with bad frame indication BFI=0 and silence insertion descriptor SID=0.

10. The erroneous frame detection device according to claim 9, characterized in that:

The detecting unit is specifically configured to, when the received speech frame is a speech frame with BFI=0 and SID=0, determine that the speech frame is an erroneous speech frame if the frame following the speech frame has SID=2 or BFI=1;

Alternatively, the detecting unit is specifically configured to determine that the speech frame is an erroneous speech frame if the prior probability of any one of the LAR parameter of the speech frame and the regular pulse excitation maximum RPEMAX, long-term prediction lag LTPlag, and long-term prediction gain LTPGain parameters of the subframes of the speech frame is less than a first threshold;

Alternatively, the detecting unit is specifically configured to determine that the speech frame is an erroneous speech frame if the prior transition probability of any one of the LAR parameters of the frame preceding the speech frame, the speech frame, and the frame following the speech frame, and the RPEMAX parameters of the subframes of the frame preceding the speech frame, the subframes of the speech frame, and the subframes of the frame following the speech frame, is less than a second threshold;

The device further comprises: a reset unit, configured to reset to 1 the BFI of the speech frame detected by the detecting unit.

11. The erroneous frame detection device according to claim 9, characterized in that:

The receiving unit is further configured to receive speech frames in speech mode;

The detecting unit is further configured to, when the received speech frame is one whose parameters mark it as erroneous, check the SID of the speech frame according to the parameter mean values of the speech frame, and determine that the speech frame is a SID frame when the detection condition is satisfied; wherein the speech frame whose parameters mark it as erroneous is a speech frame with BFI=1 and SID=0.

12. The erroneous frame detection device according to claim 11, characterized in that:

The detecting unit is specifically configured to determine that the speech frame is a SID frame if the mean value of the long-term prediction lag LTPLag parameters of the 4 subframes of the speech frame is less than a third threshold, the mean value of the regular pulse excitation maximum RPEMAX parameters of the 4 subframes of the frame preceding the speech frame is less than a fourth threshold, and the mean value of the regular pulse excitation maximum RPEMAX parameters of the 4 subframes of the speech frame is less than the fourth threshold;

The reset unit is specifically configured to reset to 2 the SID of the speech frame detected by the detecting unit.

13. The device according to claim 9, characterized in that the device further comprises:

A generation unit, configured to generate reference parameters according to the parameters of the frames before and after the speech frame or of the subframes of those frames, or according to the parameters of the subframes before and after a subframe of the speech frame;

A judging unit, configured to determine the component to be repaired of the speech frame or subframe parameters;

A repair unit, configured to perform bit-level repair on the component to be repaired using a bit mask to obtain a repair result; and/or to replace the component to be repaired with the reference parameter to obtain a repair result.

14. The device according to claim 9, characterized in that the device further comprises:

A computing unit, configured to obtain an estimated value using prior statistical information, according to the parameters of the frames before and after the speech frame, or according to the parameters of the subframes of the frames before and after the speech frame;

A replacing unit, configured to replace the parameter value of the speech frame or subframe with the estimated value.