Embodiments
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described below in further detail with reference to the accompanying drawings.
Embodiment One
Because the speech frames handled by an inter-frame relative decoder are correlated with one another, the decoding result of a preceding speech frame affects the decoding of the current speech frame. When a voice packet is lost during network transmission, the speech frames it carries are lost as well. Since the decoding result of the preceding speech frame is then unavailable as a reference, the decoding of the speech frames that follow the lost frame suffers a large negative impact, and the quality of the decoded voice signal deteriorates.
To reduce as far as possible the negative impact of packet loss on inter-frame relative decoding, the present invention provides a packet loss concealment method in the parameter domain. The method applies to devices equipped with an inter-frame relative decoder, including but not limited to terminals and servers; this embodiment places no specific restriction on the device. To decode a speech frame in a voice packet, the embodiment of the present invention uses the parameters of the last valid frame, or of the last valid frame together with the next valid frame, as the basis for determining the parameters of the lost frame. Taking the receiving end as the executing entity by way of example, the method provided by this embodiment is described below. Referring to Fig. 1, the method flow provided by this embodiment comprises:
101: determine whether the current speech frame to be decoded is lost;
102: if the current speech frame is lost, obtain the parameters of the last valid frame before the current speech frame;
103: determine the parameters of the current speech frame according to the parameters of the last valid frame;
104: decode the current speech frame according to the parameters of the current speech frame.
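Steps 101 to 104 can be outlined as follows. This is an illustrative sketch only; the function names (`conceal_and_decode`, `derive_params`, `decode`) are hypothetical stand-ins, not identifiers from the patent:

```python
def conceal_and_decode(frame, last_valid_params, derive_params, decode):
    """Outline of steps 101-104 for one speech frame.

    frame is None when the current speech frame was lost; last_valid_params
    holds the parameters of the last valid frame; derive_params and decode
    are placeholders for the concealment and decoding routines.
    """
    if frame is None:                              # 101: loss detected
        params = derive_params(last_valid_params)  # 102-103: rebuild parameters
    else:
        params = frame
    return decode(params)                          # 104: normal decoding
```

The point of the sketch is that the lost frame goes through the same decoding path as a received frame, only with substituted parameters.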
On the basis of the method shown in Fig. 1, the method provided by this embodiment can be further divided into the following two cases, depending on whether a valid frame following the current speech frame is available in the buffer:
Referring to Fig. 2, for the case where the next valid frame after the current speech frame is buffered, the method flow provided by this embodiment is as follows:
201: determine whether the current speech frame to be decoded is lost;
202: if the current speech frame is lost, obtain the parameters of the last valid frame before the current speech frame;
203: judge whether the next valid frame after the current speech frame is buffered;
204: if the next valid frame is buffered, obtain the parameters of the next valid frame;
205: determine the parameters of the current speech frame according to the parameters of the last valid frame and the parameters of the next valid frame;
206: decode the current speech frame according to the parameters of the current speech frame.
Referring to Fig. 3, for the case where the next valid frame after the current speech frame is not buffered, the method flow provided by this embodiment comprises:
301: determine whether the current speech frame to be decoded is lost;
302: if the current speech frame is lost, obtain the parameters of the last valid frame before the current speech frame;
303: judge whether the next valid frame after the current speech frame is buffered;
304: if no next valid frame is buffered, determine the parameters of the current speech frame by extrapolation from the parameters of the last valid frame;
305: decode the current speech frame according to the parameters of the current speech frame.
With the method provided by this embodiment, when the current speech frame to be decoded is determined to be lost, the parameters of the last valid frame, or of the last valid frame and the next valid frame, are obtained, and the parameters of the current speech frame are determined accordingly; the lost speech frame is then decoded normally according to those parameters. Because the normal operation of the decoder is simulated under packet loss, decoding continuity is preserved: when packet loss occurs during transmission of the voice packets, decoding can proceed on the basis of the determined parameters of the lost frame, which improves the quality of the decoded voice.
Embodiment Two
The embodiment of the present invention provides a packet loss concealment method in the parameter domain. In combination with Embodiment One above, the packet loss concealment provided by the invention is explained in detail for the case where the current speech frame is lost, distinguishing whether or not the jitter buffer holds a subsequent voice packet. Referring to Fig. 4, the method flow provided by this embodiment comprises:
401: determine whether the current speech frame to be decoded is lost;
This embodiment places no specific restriction on how it is determined whether the current speech frame to be decoded is lost. One possibility, among others, is the following: before sending a voice packet, the transmitting end numbers each speech frame in the packet and sends the numbered speech frames to the receiving end. As shown in Fig. 5, the decoder is provided with a jitter buffer, and received speech frames are pre-stored in the jitter buffer. By comparing the number of the last valid frame before the current speech frame with the numbers of the subsequent valid frames stored in the jitter buffer, the decoder can determine whether the current speech frame is lost.
For example, suppose the first speech frame is numbered 1. After decoding the first speech frame, the decoder retrieves the subsequent valid frames in the jitter buffer; if the next valid frame retrieved is numbered 4, it can be determined that the second and third speech frames are lost. If the second speech frame is the one to be decoded next, the current speech frame is determined to be lost.
Of course, other means may also be adopted to determine whether the current speech frame is lost; this embodiment places no specific restriction on this. This embodiment illustrates the speech decoding method only for the case where the current speech frame is lost; when the current speech frame is determined not to be lost, decoding can proceed directly according to the predefined decoding procedure, which is not described further here.
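The numbering scheme described above admits a simple sketch. The names below are illustrative assumptions, and a real jitter buffer would also have to handle wrap-around of the frame numbers:

```python
def lost_frame_numbers(last_decoded_no, buffered_numbers):
    """Return the numbers of the frames lost between the last decoded valid
    frame and the next valid frame held in the jitter buffer."""
    later = sorted(n for n in buffered_numbers if n > last_decoded_no)
    if not later:
        return []   # nothing newer buffered yet
    return list(range(last_decoded_no + 1, later[0]))
```

With the example above, `lost_frame_numbers(1, [4, 5])` yields `[2, 3]`, so the second and third frames are treated as lost.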
402: if the current speech frame is lost, judge whether the next valid frame after the current speech frame is buffered; if so, perform step 403; otherwise, perform step 407;
In this step, judging whether the next valid frame after the current speech frame is buffered may use the same mechanism as determining whether the current speech frame is lost. As described in step 401 above, the transmitting end numbers each speech frame before sending and sends the numbered frames to the receiving end. The receiving end sets up a jitter buffer in advance and pre-stores the received speech frames in it. Whether the next valid frame after the current speech frame is buffered is then judged by comparing the number of the last valid frame before the current speech frame with the numbers of the subsequent valid frames stored in the jitter buffer.
For example, suppose the current speech frame is numbered 3. If a subsequent speech frame numbered 4 is retrieved from the jitter buffer, it can be determined that the next valid frame after the current speech frame is buffered. As another example, with the current speech frame again numbered 3, if the subsequent speech frame retrieved is numbered 5, it can be determined that the next valid frame after the current speech frame is not buffered.
Of course, other means may also be adopted to judge whether the next valid frame after the current speech frame is buffered; this embodiment places no specific restriction on this.
403: obtain the binary-decision parameters of the last valid frame and the next valid frame, determine the signal type of the current speech frame according to those parameters, and thereby obtain the binary-decision parameter of the current speech frame;
Specifically, the binary-decision parameter serves to judge the signal type. Because speech divides into unvoiced and voiced sounds, common speech models treat the modeling and coding of periodic and aperiodic signals quite differently. Broadly speaking, a periodic signal corresponds to a voiced frame and an aperiodic signal to an unvoiced frame, so the signal type is one of two kinds: unvoiced or voiced. After the binary-decision parameters of the last valid frame and the next valid frame are obtained, whether each of those frames is a periodic signal can be determined from its parameter, hence their signal types, and the binary-decision parameter of the current speech frame is derived from them. In determining the signal type of the current speech frame according to this embodiment, the following three situations arise, among others:
Situation one: both the last valid frame and the next valid frame are periodic signals. Their binary-decision parameters then indicate that both are voiced frames, and the signal type of the current speech frame is determined to be a voiced frame.
Situation two: the last valid frame is a periodic signal and the next valid frame an aperiodic signal, so that from their binary-decision parameters the last valid frame is determined to be voiced and the next valid frame unvoiced. Or, conversely, the last valid frame is an aperiodic signal and the next valid frame a periodic signal, so that the last valid frame is determined to be unvoiced and the next valid frame voiced.
In both of these situations, because one of the two neighbouring valid frames is a periodic signal, it can be judged that a transition between periodic and aperiodic signal occurred within the lost frames. It is therefore reasonable to assume that the lost frames contain at least some periodic signal, and the current speech frame is determined to be a voiced frame.
Situation three: both the last valid frame and the next valid frame are aperiodic signals. Their binary-decision parameters then indicate that both are unvoiced frames, and the signal type of the current speech frame is determined to be an unvoiced frame.
Whichever of the above situations applies, the determined signal type can be converted into the corresponding binary-decision parameter. For example, in a concrete implementation the binary-decision parameter of an unvoiced frame may be set to 0 and that of a voiced frame to 1; after the signal type of the current speech frame is determined, its binary-decision parameter is 0 if it is unvoiced and, likewise, 1 if it is voiced. Of course, other value assignments for the binary-decision parameter may also be adopted; this embodiment places no specific restriction on this.
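Under the example assignment of 0 (unvoiced) and 1 (voiced), the three situations reduce to a single rule, sketched below on the assumption that those example values are used:

```python
UNVOICED, VOICED = 0, 1   # example binary-decision parameter values from the text

def lost_frame_type(last_flag, next_flag):
    """Situations one to three: the lost frame is taken as voiced if either
    neighbouring valid frame is voiced, and as unvoiced only if both are
    unvoiced."""
    if last_flag == VOICED or next_flag == VOICED:
        return VOICED
    return UNVOICED
```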
404: obtain the temporal-evolution parameters of the last valid frame and the next valid frame, and determine the temporal-evolution parameters of the current speech frame according to the binary-decision and temporal-evolution parameters of the last valid frame and the next valid frame;
Specifically, the temporal-evolution parameters may include but are not limited to the pitch period, the gain parameter and the LSP (Line Spectrum Pair) coefficients; this embodiment places no specific restriction on them, nor on the manner of obtaining the temporal-evolution parameters of the last valid frame and the next valid frame. In a concrete implementation, taking the pitch period first as an example, after the signal types have been determined from the binary-decision parameters of the last valid frame and the next valid frame, the pitch period of the current speech frame can be determined from the signal types of those two frames according to the following four situations.
Situation one: both the last valid frame and the next valid frame are voiced frames;
After the pitch periods of the last valid frame and the next valid frame are obtained, it must be considered that in a real scenario a speaker may suddenly raise or lower the pitch, so the pitch period may mutate even during stable voiced speech. To judge whether the pitch periods of the last valid frame and the next valid frame have mutated, the following method may be used: take the absolute value of the difference between the pitch period of the last valid frame and that of the next valid frame, compare this absolute value with a preset pitch-period offset threshold, and decide from the comparison whether the pitch periods of the last valid frame and the next valid frame have mutated.
For example, let next_pitch be the pitch period of the next valid frame and last_pitch that of the last valid frame; the absolute value of their difference is compared with a preset pitch-period offset threshold δ to determine whether a pitch-period mutation exists.
Specifically, if |next_pitch − last_pitch| < δ, that is, if the absolute value of the difference is less than the pitch-period offset threshold δ, it can be determined that the pitch periods of the last valid frame and the next valid frame have not mutated; otherwise, it can be determined that a mutation has occurred. The threshold δ may be set from historical experience; this embodiment places no specific restriction on it. In addition, other methods may also be used in practice to determine whether the pitch periods of the last valid frame and the next valid frame have mutated, and this embodiment places no specific restriction on that either.
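The mutation test is a one-line comparison against the preset threshold δ:

```python
def pitch_mutated(last_pitch, next_pitch, delta):
    """True when |next_pitch - last_pitch| >= delta, i.e. the pitch periods
    of the neighbouring valid frames are judged to have mutated."""
    return abs(next_pitch - last_pitch) >= delta
```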
When determining the pitch period of the current speech frame, two cases arise according to whether the pitch periods of the last valid frame and the next valid frame have mutated:
First case: the pitch periods of the last valid frame and the next valid frame have not mutated;
When no mutation has occurred, the pitch contour is smooth and evolves over time, so the pitch periods of the subframes of the current speech frame can be determined by linear interpolation, after which the pitch period of the current speech frame is determined from the pitch periods of its subframes. Of course, other interpolation algorithms may also be selected to determine the pitch period of the current speech frame; this embodiment places no specific restriction on this.
In a concrete implementation, the parameters needed for the linear interpolation can be set according to the actual situation, and different numerical values can be used; this embodiment places no specific restriction on the linear interpolation algorithm. Taking the following linear interpolation algorithm merely as an example, it can be expressed by the following formula:
pitch[k] = last_pitch + pIncr * (k + lossCnt * subFrameCount + 1)    (2)
In formula (1), pIncr is the evolution increment, lostFrameCount is the total number of lost frames reported by the jitter buffer, and subFrameCount is the number of subframes contained in each frame.
In formula (2), the value of lostFrameCount can be determined by counting the consecutive lost frames before the next valid frame in the jitter buffer. For example, suppose the jitter buffer can hold 5 frames in total, each carrying a number. At some moment the second frame is lost, and the next valid frame in the jitter buffer is the fifth frame, numbered 5; it can then be determined that the second, third and fourth frames are lost, so the value of lostFrameCount is 3. In addition, subFrameCount, the number of subframes per frame, can be set according to actual needs; this embodiment places no specific restriction on it.
Once the values of these parameters have been determined in the manner described above, the value of the evolution increment pIncr can be determined and substituted into formula (2) for further computation.
In formula (2), lossCnt is the number of frames lost from the start of the loss up to the current position, k denotes the k-th subframe of the current speech frame, and pitch[k] represents the pitch period of the k-th subframe of the current speech frame.
After the values of the parameters are determined as above, the values of lossCnt and k can be determined; substituting both into formula (2) yields the value of pitch[k], that is, the pitch period of the k-th subframe of the current speech frame.
After the pitch periods of all subframes of the current speech frame are obtained, the pitch period of the current speech frame can be determined by a chosen method, for example by a weighted combination of the subframe pitch periods; this embodiment places no specific restriction on this.
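Formula (2), together with a reconstructed formula (1) for pIncr, can be sketched as follows. The expression for p_incr is an assumption: it is chosen so that the interpolated pitch runs linearly from last_pitch toward next_pitch across the lost subframes, consistent with formula (2), since formula (1) itself is not reproduced in the text:

```python
def interp_subframe_pitch(last_pitch, next_pitch, loss_cnt,
                          lost_frame_count, sub_frame_count):
    """Pitch periods of the subframes of one lost frame by linear
    interpolation (formula (2)); loss_cnt counts the frames already lost
    before this one within the same loss burst."""
    # assumed formula (1): one evolution increment per lost subframe
    p_incr = (next_pitch - last_pitch) / (lost_frame_count * sub_frame_count + 1)
    return [last_pitch + p_incr * (k + loss_cnt * sub_frame_count + 1)
            for k in range(sub_frame_count)]
```

For example, with last_pitch 40, next_pitch 50 and one lost frame of four subframes, the subframe pitches come out evenly spaced between the two endpoints.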
Second case: the pitch periods of the last valid frame and the next valid frame have mutated.
For ease of subsequent computation, this embodiment takes as an example the assumption that the pitch mutation occurs at the midpoint of the loss. For instance, if by the frame numbering the lost frames are determined to be the second, third and fourth frames while the first and fifth frames arrived normally, the pitch mutation is placed at the midpoint of the loss, in the third frame. Of course, depending on the actual situation, other methods may be used to determine the midpoint of the loss; this embodiment places no specific restriction on this. For example, let last_pitch be the pitch period of the previous frame, next_pitch that of the next frame and pitch that of the current speech frame; the pitch period of the current speech frame can then be determined by the following formulas:
pitch = last_pitch,  if lossCnt < (lostFrameCount >> 1)    (3)
pitch = next_pitch,  if lossCnt ≥ (lostFrameCount >> 1)    (4)
In formulas (3) and (4) above, once the value of lostFrameCount has been determined, lossCnt, the number of frames lost from the start of the loss up to the position of the current speech frame, is determined. If lostFrameCount divided by 2 is greater than lossCnt, the pitch period last_pitch of the last valid frame is taken as the pitch period of the current speech frame; otherwise, the pitch period of the next valid frame is taken as the pitch period of the current speech frame.
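Formulas (3) and (4) amount to a midpoint rule, sketched below with illustrative names:

```python
def mutated_pitch(last_pitch, next_pitch, loss_cnt, lost_frame_count):
    """Formulas (3)-(4): lost frames before the midpoint of the loss reuse
    the previous valid frame's pitch; the rest reuse the next frame's."""
    if loss_cnt < (lost_frame_count >> 1):
        return last_pitch
    return next_pitch
```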
Situation two: the last valid frame is a voiced frame and the next valid frame an unvoiced frame;
It can now be predicted that the periodic signal decays continuously during the loss. From the physical model of speech production, such decay manifests itself as a slow lengthening of the pitch period, that is, a slow decrease of the fundamental frequency. On this basis, the pitch period of the current speech frame can be obtained by incrementally extrapolating the pitch period of the last valid frame. The pitch period of the k-th subframe of the current speech frame is evaluated as follows:
pitch[k] = last_pitch + lossCnt * subFrameCount + k    (5)
The meanings of the parameters in formula (5) are as annotated in the steps above. After the pitch periods of all subframes of the current speech frame are obtained, the pitch period of the current speech frame can be determined; the detailed procedure is as in the steps above and is not repeated here.
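Formula (5) can be sketched directly: the pitch period grows by one per lost subframe, simulating the decay of the periodic signal during the loss:

```python
def extrapolate_pitch_forward(last_pitch, loss_cnt, sub_frame_count):
    """Formula (5): subframe pitch periods of a lost frame, extrapolated
    upward from the last valid (voiced) frame's pitch period."""
    return [last_pitch + loss_cnt * sub_frame_count + k
            for k in range(sub_frame_count)]
```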
Situation three: the last valid frame is an unvoiced frame and the next valid frame a voiced frame;
It can now be predicted that the periodic signal forms gradually during the loss. By a principle analogous to that in situation two, the formation of the periodic signal can be simulated by incrementally extrapolating backwards from the pitch period of the next valid frame, to obtain the pitch period of the current speech frame. The pitch period of the k-th subframe of the current speech frame is evaluated as follows:

pitch[subFrameCount − k − 1] = next_pitch − lossCnt * subFrameCount − k    (6)

The meanings of the parameters in formula (6) are as annotated in the steps above. After the pitch periods of all subframes of the current speech frame are obtained, the pitch period of the current speech frame can be determined; the detailed procedure is as in the steps above and is not repeated here.
Situation four: both the last valid frame and the next valid frame are unvoiced frames.
It can now be determined that the current speech frame is an unvoiced frame; since an unvoiced frame is not a periodic signal, the current speech frame has no pitch period.
This embodiment continues by taking the determination of the gain parameter of the current speech frame as an example, to explain how the temporal-evolution parameters of the current speech frame are determined from those of the last valid frame or the next valid frame.
In a concrete implementation, this embodiment determines the gain parameter by linear interpolation. Of course, the interpolation method may be chosen in the actual environment according to the complexity, delay and effect of different algorithms, and this embodiment places no specific restriction on this. For example, when consecutive packet loss is not very severe, polynomial interpolation may be selected to obtain the gain of the lost frames; but for such methods to achieve better results, more of the subsequent frames in the jitter buffer must be decoded in advance, which increases the decoding delay. According to the concrete application scenario, this embodiment provides a linear interpolation algorithm whose formula is as follows:
gain[k] = last_gain + gIncr * (k + lossCnt * subFrameCount + 1)    (8)
In formula (7), last_gain is the gain parameter of the last valid frame and next_gain the gain parameter of the next valid frame; the evolution increment gIncr is analogous to the pitch-period evolution increment in the steps above. Substituting the concrete parameter values into formula (7) yields the value of the evolution increment gIncr.
After gIncr is determined, substituting the parameter values into formula (8) yields the gain parameter of the k-th subframe of the current speech frame.
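Formula (8) can be sketched with gIncr reconstructed by analogy with the pitch increment; since formula (7) is not reproduced in the text, the expression for g_incr below is an assumption:

```python
def interp_subframe_gain(last_gain, next_gain, loss_cnt,
                         lost_frame_count, sub_frame_count):
    """Subframe gains of one lost frame by linear interpolation
    (formula (8))."""
    # assumed formula (7), by analogy with the pitch evolution increment
    g_incr = (next_gain - last_gain) / (lost_frame_count * sub_frame_count + 1)
    return [last_gain + g_incr * (k + loss_cnt * sub_frame_count + 1)
            for k in range(sub_frame_count)]
```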
Similarly, this embodiment also determines the LSP coefficients by linear interpolation, with the following formula:
lsp[i] = (1 − α) * last_lsp[i] + α * next_lsp[i],  i = 1, 2, …, P    (10)
In formula (9), α is a weight coefficient. In this embodiment, the weight coefficient α for the linear interpolation between the preceding and following frames is determined from the total number of lost frames lostFrameCount reported by the jitter buffer and the count lossCnt of frames lost up to the current speech frame; substituting the concrete parameter values into formula (9) yields the value of α.
After α is determined, substituting the parameter values into formula (10) yields the i-th-order LSP coefficient of the current speech frame. Assuming the current speech frame has P orders of LSP coefficients in total, computing in this manner finally yields the LSP coefficients of all orders of the current speech frame.
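Formula (10) can be sketched with the weight α of formula (9) reconstructed as the relative position of the lost frame within the loss burst; that expression for α is an assumption consistent with the description, not the patent's own formula:

```python
def interp_lsp(last_lsp, next_lsp, loss_cnt, lost_frame_count):
    """Order-by-order linear interpolation of the LSP coefficients
    (formula (10)); the inputs are the coefficient lists of the last and
    next valid frames."""
    # assumed formula (9): alpha grows with the position inside the loss burst
    alpha = (loss_cnt + 1) / (lost_frame_count + 1)
    return [(1 - alpha) * l + alpha * n for l, n in zip(last_lsp, next_lsp)]
```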
405: obtain the non-temporal-evolution parameters of the last valid frame and the next valid frame, and determine the non-temporal-evolution parameters of the current speech frame according to the binary-decision and non-temporal-evolution parameters of the last valid frame and the next valid frame;
Specifically, the non-temporal-evolution parameters may include but are not limited to the LTP (Long Term Prediction) coefficients and the excitation signal; this embodiment places no specific restriction on them, nor on the manner of determining the non-temporal-evolution parameters of the current speech frame. One possibility, among others: after the signal types have been determined from the binary-decision parameters of the last valid frame and the next valid frame, the non-temporal-evolution parameters of the current speech frame are determined from those of the last valid frame or the next valid frame according to the following four situations.
Situation one: both the last valid frame and the next valid frame are voiced frames;
Taking the LTP coefficients first as an example: if consecutive packet loss is currently in progress, or the pitch periods of the last valid frame and the next valid frame of the current speech frame have mutated, it can be inferred that the LTP coefficients of the last valid frame and the next valid frame may differ significantly. Consecutive packet loss is judged as follows: if the number of consecutively lost packets exceeds a packet-loss threshold, the LTP coefficients of the last valid frame and the next valid frame may differ significantly. The packet-loss threshold can be set according to the actual situation; this embodiment places no specific restriction on it. The manner of judging whether the pitch periods of the last valid frame and the next valid frame have mutated is as in the steps above and is not repeated here. On this basis, the explanation divides into the following two cases:
First case: consecutive packet loss is not in progress, and the pitch periods of the last valid frame and the next valid frame of the current speech frame have not mutated;
The energy values of the last valid frame and the next valid frame of the current speech frame can now be determined. Let the energy of the last valid frame be Last_Energy and that of the next valid frame be Next_Energy; dividing the energy of the last valid frame by that of the next valid frame yields a scaling factor β. If the current speech frame is determined to lie in the first half of the lost frames, multiplying the LTP coefficients of the last valid frame by β gives the LTP coefficients of the current speech frame; if it lies in the second half, multiplying the LTP coefficients of the next valid frame by β gives the LTP coefficients of the current speech frame. Of course, other manners of determining the LTP coefficients of the current speech frame may also be adopted in practice; this embodiment places no specific restriction on this.
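The first case can be sketched as below. The direction of the energy ratio (last over next) follows the translated text and is flagged here as an assumption:

```python
def conceal_ltp(last_ltp, next_ltp, last_energy, next_energy, first_half):
    """Scale the neighbouring valid frame's LTP coefficients by
    beta = Last_Energy / Next_Energy (assumed reading of the text);
    first_half selects which neighbour supplies the coefficients."""
    beta = last_energy / next_energy
    source = last_ltp if first_half else next_ltp
    return [c * beta for c in source]
```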
Second case: consecutive packet loss is in progress, or the pitch periods of the last valid frame and the next valid frame of the current speech frame have mutated.
Since the LTP coefficients of the last valid frame and the next valid frame of the current speech frame have now changed significantly, if the current speech frame is determined to lie in the first half of the lost frames, the LTP coefficients of the last valid frame are used directly as those of the current speech frame; if it lies in the second half, the LTP coefficients of the next valid frame are used as those of the current speech frame. Of course, other manners of determining the LTP coefficients of the current speech frame may also be adopted in practice; this embodiment places no specific restriction on this.
Situation two: the last valid frame is a voiced frame and the next valid frame an unvoiced frame;
Because the last valid frame is voiced and the next valid frame unvoiced, it can be predicted that the periodic signal decays continuously during the loss. The LTP coefficients of the current lost frame are therefore obtained by uniformly multiplying the LTP coefficients of the last valid frame by an attenuation factor, which may be obtained from the energy ratio of the last valid frame to the next valid frame; of course, other manners of computing the attenuation factor may also be adopted, and this embodiment places no specific restriction on this.
Situation three: the last valid frame is an unvoiced frame and the next valid frame a voiced frame;
Because the last valid frame is unvoiced and the next valid frame voiced, it can be predicted that the periodic signal strengthens continuously during the loss. The LTP coefficients of the current lost frame are therefore obtained by uniformly multiplying the LTP coefficients of the next valid frame by an attenuation factor; this embodiment places no specific restriction on the manner of determining the attenuation factor.
Situation four: both the last valid frame and the next valid frame are unvoiced frames.
Because both the last valid frame and the next valid frame are unvoiced, the current speech frame is also an unvoiced frame; it can be judged that there is no periodic signal during the loss, and the LTP coefficients of the current speech frame need not be determined.
This embodiment continues by taking the excitation signal as an example, to explain the handling of the non-temporal-evolution parameters. It should be noted that the excitation signal is usually the highly random residual that remains after the voice signal has undergone long-term and short-term prediction and post-processing (such as noise shaping). It may nevertheless still contain non-white information that the speech model cannot decompose, so simply substituting white noise does not yield good synthesis quality. At the same time, this class of parameter does not evolve temporally and is therefore not suited to interpolation. On this basis, this embodiment provides a method of determining the excitation signal of the current speech frame, explained as follows:
If the current speech frame is determined to lie in the first half of the lost frames, the excitation signal of the last valid frame is used as that of the current speech frame; if it lies in the second half, the excitation signal of the next valid frame is used as that of the current speech frame.
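The excitation rule copies rather than interpolates. The half-of-the-burst test below reuses the midpoint rule of formulas (3) and (4) as an assumed way of deciding "first half"; the text itself does not spell the test out:

```python
def conceal_excitation(last_excitation, next_excitation,
                       loss_cnt, lost_frame_count):
    """Copy the excitation of the nearer valid frame instead of
    interpolating (the excitation does not evolve temporally)."""
    if loss_cnt < (lost_frame_count >> 1):   # assumed first-half test
        return list(last_excitation)
    return list(next_excitation)
```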
406: decode the current speech frame according to its binary decision class parameter, temporal evolution class parameters and non-temporal-evolution class parameters.
Specifically, after the binary decision class parameter, temporal evolution class parameters and non-temporal-evolution class parameters of the current speech frame have been obtained through the above steps, these parameters are delivered to the decoder, whose structure is shown in Fig. 5, and decoded by it. The decoding algorithm can be determined according to the encoding algorithm, which is not specifically limited in the present embodiment; after decoding, the speech signal of the current speech frame is obtained.
407: obtain the parameters of the previous valid frame of the current speech frame, extrapolate the parameters of the current speech frame from the parameters of the previous valid frame, and decode the current speech frame according to its parameters.
Specifically, since the current speech frame is judged to have no next valid frame, after the parameters of the previous valid frame are obtained, obtaining the binary decision class parameter of the current speech frame covers, but is not limited to, two situations. In the first situation, the previous valid frame is determined from its binary decision class parameter to be an unvoiced frame; the signal type of the current speech frame can then be extrapolated to be an unvoiced frame. In the second situation, the previous valid frame is determined to be a voiced frame; its pitch period and gain parameter are obtained, and the speech signal of the previous valid frame is allowed to decay slowly at a certain rate, so that the pitch period slowly lengthens and the gain parameter slowly decreases. On reaching the current speech frame, if the pitch period of the current speech frame is greater than a preset pitch period threshold, or its gain parameter is less than a preset gain parameter threshold, the signal type of the current speech frame is determined to be an unvoiced frame; otherwise it is determined to be a voiced frame.
The decay rate of the speech signal of the previous valid frame can be set from historical experience; of course, other determination methods can also be adopted, which is not specifically limited in the present embodiment. The preset thresholds of the pitch period and the gain parameter can likewise be set empirically, and the present embodiment does not specifically limit how they are set either.
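The second situation above can be sketched as follows. The per-frame step sizes and both thresholds are illustrative values only; the embodiment says merely that they are set empirically:

```python
def fade_and_classify(pitch, gain, num_lost,
                      pitch_step=1.05, gain_step=0.9,
                      pitch_threshold=160, gain_threshold=0.05):
    """Fade the last voiced frame over `num_lost` lost frames, then classify.

    The pitch period is lengthened and the gain reduced step by step; once
    either crosses its threshold the current frame is treated as unvoiced,
    otherwise it is still voiced.
    """
    for _ in range(num_lost):
        pitch *= pitch_step   # pitch period slowly lengthens
        gain *= gain_step     # gain slowly decays
    if pitch > pitch_threshold or gain < gain_threshold:
        return pitch, gain, "unvoiced"
    return pitch, gain, "voiced"
```

With these illustrative settings, a single lost frame after a voiced frame stays voiced, while a long loss burst eventually flips to unvoiced.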
Further, the present embodiment does not specifically limit the manner of determining the temporal evolution class parameters of the current speech frame either, which includes but is not limited to: determining them according to the signal type of the previous valid frame, which can be divided into the following two situations, explained in detail as follows:
Situation one: the previous valid frame is an unvoiced frame.
In this case the current speech frame can be determined to be an unvoiced frame. Since an unvoiced frame is not a periodic signal, the current speech frame has no pitch period. Taking the gain parameter as an example, the present embodiment explains how to determine the gain parameter of the current speech frame from the gain parameter of the previous valid frame.
The speech signal of the unvoiced frame is decayed at a certain rate, and the gain parameter correspondingly decreases slowly; on reaching the current speech frame, the gain parameter at that point can be used as the gain parameter of the current speech frame.
The decay rate of the speech signal of the previous valid frame can be set from historical experience; of course, other determination methods can also be adopted, which is not specifically limited in the present embodiment.
Situation two: the previous valid frame is a voiced frame.
In this case the speech signal of the previous valid frame can be decayed at a certain rate: the pitch period slowly increases, the gain parameter correspondingly decreases slowly, and through bandwidth expansion the formants corresponding to the LSP coefficients gradually weaken. On reaching the current speech frame, according to the decision procedure in step 403 above, if the current speech frame is still a voiced frame, the pitch period, gain parameter, LSP coefficients and other parameters at that point can be used as the temporal evolution class parameters of the current speech frame.
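One step of this voiced fade can be sketched as follows. The step sizes and the bandwidth-expansion constant gamma are illustrative; the sketch applies the expansion to the LP coefficients that underlie the LSP representation (a_i -> a_i * gamma**i), which is one common way to broaden the formants:

```python
def fade_voiced_parameters(pitch, gain, lp_coeffs, gamma=0.98):
    """One lost-frame step of the voiced fade described above.

    The pitch period lengthens, the gain decays, and the LP coefficients
    are bandwidth-expanded, gradually weakening the formants of the
    corresponding LSP coefficients.
    """
    new_pitch = pitch * 1.02          # pitch period slowly increases
    new_gain = gain * 0.95            # gain slowly decreases
    new_lp = [a * gamma ** i for i, a in enumerate(lp_coeffs, start=1)]
    return new_pitch, new_gain, new_lp
```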
Further, the non-temporal-evolution class parameters of the current speech frame are determined according to the binary decision class parameter and the non-temporal-evolution class parameters of the previous valid frame.
Specifically, if the current speech frame is determined from its binary decision class parameter to be a voiced frame, then taking the LTP coefficients as the example of a non-temporal-evolution class parameter: the LTP coefficients of the previous valid frame are obtained and multiplied by a zoom factor to serve as the LTP coefficients of the current speech frame. The zoom factor can be obtained empirically and weakens frame by frame, which is not specifically limited in the present embodiment. If the current speech frame is an unvoiced frame, then taking the excitation signal as the example, the excitation signal of the current speech frame can be taken from a lower-energy part of the previous valid frame, which is not specifically limited in the present embodiment.
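The voiced branch above, with a zoom factor that weakens frame by frame, can be sketched minimally; the base value 0.9 and the exponential weakening schedule are assumptions of this sketch, since the embodiment only says the factor is obtained empirically:

```python
def ltp_without_next_frame(prev_ltp, frame_index, base_zoom=0.9):
    """LTP coefficients for consecutive voiced lost frames when no next
    valid frame is buffered.

    The previous valid frame's coefficients are scaled by a zoom factor
    that weakens with each successive lost frame (`frame_index` is
    0-based within the loss burst).
    """
    zoom = base_zoom ** (frame_index + 1)   # shrinks frame by frame
    return [c * zoom for c in prev_ltp]
```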
The current speech frame is decoded according to its binary decision class parameter, temporal evolution class parameters and non-temporal-evolution class parameters; the process of obtaining the speech signal of the current speech frame can refer to the related content of step 406 above and is not repeated here.
With the method provided by the present embodiment, when the current speech frame to be decoded is determined to be lost, the parameters of the previous valid frame, or of the previous valid frame and the next valid frame, of the current speech frame are obtained; the parameters of the current speech frame are determined according to the specific situation; and the lost speech frame is then decoded normally according to those parameters. Because the normal operation of the decoder under packet loss conditions is simulated, the continuity of decoding is maintained, so that when packet loss occurs while a voice packet is in transit, decoding can proceed according to the determined parameters of the lost frame, thereby improving the decoded voice quality.
Embodiment 3
The embodiment of the present invention provides a packet loss concealment apparatus in the parameter domain, which is configured to perform the packet loss concealment method in the parameter domain provided by Embodiment 1 or Embodiment 2. Referring to Fig. 6, the apparatus comprises:
a determination module 601, configured to determine whether the current speech frame to be decoded is lost;
a previous-frame acquisition module 602, configured to obtain the parameters of the previous valid frame of the current speech frame when the current speech frame is lost;
a current-frame determination module 603, configured to determine the parameters of the current speech frame according to the parameters of the previous valid frame;
a decoding module 604, configured to decode the current speech frame according to the parameters of the current speech frame.
As a preferred embodiment, referring to Fig. 7, this speech decoding apparatus further comprises:
a judgment module 605, configured to judge whether the next valid frame of the current speech frame is buffered;
a next-frame acquisition module 606, configured to obtain the parameters of the next valid frame when the next valid frame is buffered;
the current-frame determination module 603, configured to determine the parameters of the current speech frame according to the parameters of the previous valid frame and the parameters of the next valid frame.
As a preferred embodiment, the parameters of the previous valid frame and the parameters of the next valid frame comprise a binary decision class parameter; the binary decision class parameter is used to decide the signal type, the signal type comprising the two types of unvoiced and voiced;
the current-frame determination module 603, configured to determine that the signal type of the current speech frame is a voiced frame when one of the binary decision class parameter of the previous valid frame and the binary decision class parameter of the next valid frame decides the signal type to be a voiced frame;
As a preferred embodiment, the current-frame determination module 603 is configured to determine that the signal type of the current speech frame is an unvoiced frame when both the binary decision class parameter of the previous valid frame and the binary decision class parameter of the next valid frame decide the signal type to be an unvoiced frame.
As a preferred embodiment, the parameters of the previous valid frame and the parameters of the next valid frame further comprise temporal evolution class parameters, the temporal evolution class parameters at least comprising the pitch period;
the current-frame determination module 603, further configured to determine the temporal evolution class parameters of the current speech frame according to the binary decision class parameters and the temporal evolution class parameters of the previous valid frame and the next valid frame.
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal types of the previous valid frame and the next valid frame are determined to be voiced frames from their binary decision class parameters, and the pitch periods of the previous valid frame and the next valid frame are determined from their temporal evolution class parameters not to change abruptly, perform linear interpolation on the pitch periods of the previous valid frame and the next valid frame to obtain the pitch period of the current speech frame.
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal types of the previous valid frame and the next valid frame are determined to be voiced frames from their binary decision class parameters, and the pitch periods of the previous valid frame and the next valid frame are determined from their temporal evolution class parameters to change abruptly, determine that the pitch period of the current speech frame is consistent with the pitch period of the previous valid frame if the current speech frame lies in the first half of all lost speech frames, and determine that the pitch period of the current speech frame is consistent with the pitch period of the next valid frame if the current speech frame lies in the latter half of all lost speech frames.
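The two pitch rules above can be sketched together. The function name, the 0-based position convention, and the ratio test used to detect an abrupt change (with an illustrative mutation_ratio) are assumptions of this sketch; the embodiment does not fix how the mutation is detected:

```python
def pitch_for_lost_frame(position, num_lost, prev_pitch, next_pitch,
                         mutation_ratio=1.5):
    """Pitch period for a lost frame between two voiced valid frames.

    Without an abrupt change (here: the ratio of the two pitch periods
    stays under mutation_ratio), linear interpolation over the loss burst
    is used; with an abrupt change, the first half of the burst copies the
    previous frame's pitch and the latter half copies the next frame's.
    """
    lo, hi = sorted((prev_pitch, next_pitch))
    if hi / lo < mutation_ratio:                      # no abrupt change
        t = (position + 1) / (num_lost + 1)
        return prev_pitch + t * (next_pitch - prev_pitch)
    return prev_pitch if position < num_lost / 2 else next_pitch
```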
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal type of the previous valid frame is determined from the binary decision class parameters to be a voiced frame and the signal type of the next valid frame to be an unvoiced frame, obtain the pitch period of the current speech frame by extrapolation from the pitch period of the previous valid frame.
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal type of the previous valid frame is determined from the binary decision class parameters to be an unvoiced frame and the signal type of the next valid frame to be a voiced frame, obtain the pitch period of the current speech frame by extrapolation from the pitch period of the next valid frame.
As a preferred embodiment, the parameters of the previous valid frame and the parameters of the next valid frame further comprise non-temporal-evolution class parameters, the non-temporal-evolution class parameters at least comprising long-term prediction (LTP) coefficients;
the current-frame determination module 603, further configured to determine the non-temporal-evolution class parameters of the current speech frame according to the binary decision class parameters and the non-temporal-evolution class parameters of the previous valid frame and the next valid frame.
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal types of the previous valid frame and the next valid frame are determined to be voiced frames from their binary decision class parameters, the pitch periods of the previous valid frame and the next valid frame are determined from their temporal evolution class parameters not to change abruptly, and the number of lost packets is less than a packet loss threshold, obtain the LTP coefficients of the current speech frame by multiplying the LTP coefficients of the previous valid frame by a zoom factor if the current speech frame lies in the first half of all lost speech frames, and by multiplying the LTP coefficients of the next valid frame by a zoom factor if the current speech frame lies in the latter half of all lost speech frames.
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal types of the previous valid frame and the next valid frame are determined to be voiced frames from their binary decision class parameters, and the pitch periods of the previous valid frame and the next valid frame are determined from their temporal evolution class parameters to change abruptly, or the number of lost packets is greater than the packet loss threshold, determine that the LTP coefficients of the current speech frame are consistent with the LTP coefficients of the previous valid frame if the current speech frame lies in the first half of all lost speech frames, and that the LTP coefficients of the current speech frame are consistent with the LTP coefficients of the next valid frame if the current speech frame lies in the latter half of all lost speech frames.
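The two LTP rules above for a lost frame between two voiced valid frames can be sketched together; the function name and the concrete zoom and loss_threshold values are illustrative only:

```python
def ltp_between_voiced_frames(position, num_lost, prev_ltp, next_ltp,
                              pitch_mutated, loss_threshold=3, zoom=0.9):
    """LTP coefficients for a lost frame between two voiced valid frames.

    With no abrupt pitch change and a short loss burst, the nearer valid
    frame's coefficients are scaled by a zoom factor; with an abrupt
    change or a long burst they are copied unchanged.
    """
    first_half = position < num_lost / 2
    source = prev_ltp if first_half else next_ltp
    if not pitch_mutated and num_lost < loss_threshold:
        return [c * zoom for c in source]
    return list(source)
```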
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal type of the previous valid frame is determined from the binary decision class parameters to be a voiced frame and the signal type of the next valid frame to be an unvoiced frame, obtain the LTP coefficients of the current speech frame by multiplying the LTP coefficients of the previous valid frame by a decay factor.
As a preferred embodiment, the current-frame determination module 603 is configured to: when the signal type of the previous valid frame is determined from the binary decision class parameters to be an unvoiced frame and the signal type of the next valid frame to be a voiced frame, obtain the LTP coefficients of the current speech frame by multiplying the LTP coefficients of the next valid frame by a decay factor.
With the apparatus provided by the present embodiment, when the current speech frame to be decoded is determined to be lost, the parameters of the previous valid frame, or of the previous valid frame and the next valid frame, of the current speech frame are obtained; the parameters of the current speech frame are determined according to the specific situation; and the lost speech frame is then decoded normally according to those parameters. Because the normal operation of the decoder under packet loss conditions is simulated, the continuity of decoding is maintained, so that when packet loss occurs while a voice packet is in transit, decoding can proceed according to the determined parameters of the lost frame, thereby improving the decoded voice quality.
It should be noted that, when the packet loss concealment apparatus in the parameter domain provided by the above embodiment performs packet loss concealment, the division into the above functional modules is only used as an example; in practical applications, the above functions can be assigned to different functional modules as required, that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. In addition, the packet loss concealment apparatus in the parameter domain provided by the above embodiment and the embodiments of the packet loss concealment method in the parameter domain belong to the same conception; for its specific implementation process, refer to the method embodiments, which are not repeated here.
The sequence numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
A person of ordinary skill in the art can understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.