CN104347076A - Network audio packet loss concealment method and device - Google Patents
Abstract
The invention discloses a network audio packet loss concealment method and device, relating to the field of audio transmission. The method comprises the steps of decoding an audio stream affected by packet loss, time-reversing the decoded audio signal, carrying out packet loss compensation on the audio signal and the reversed audio signal respectively, reversing the compensated reversed audio signal again, and finally averaging the amplitudes of the two compensated audio signals to obtain the output signal. This bidirectionally predictive error concealment technique further improves audio quality and, in particular, yields comparatively clear speech in network environments with a high packet loss rate.
Description
Technical field
The present invention relates to the field of audio transmission, and in particular to a network audio packet loss concealment method and device, which reduce the discontinuity produced in a network audio stream by packet loss and improve the listening quality of the stream.
Background art
In VoIP (Voice over Internet Protocol) applications, packet loss may occur in the audio stream for reasons such as poor network quality; severe loss can degrade voice quality and interfere with the call.
The iLBC (internet Low Bitrate Codec) algorithm is a low-bit-rate speech coding algorithm based on CELP (Code-Excited Linear Prediction). Its excellent voice quality, outstanding long-term prediction method and packet loss concealment (PLC) technique make it a good solution for voice transmission over interconnected networks. iLBC was designed primarily for packet networks, and its main advantage lies in its handling of network packet loss. iLBC encodes the excitation signal directly by constructing an adaptive codebook from the original state, achieving independent coding of each speech frame so that the effect of a lost frame is confined to that frame. After data is lost, iLBC can generate a simulated speech signal from previously recorded excitation signals and speech parameters through operations such as correlation processing, residual enhancement and white-noise mixing, thereby substituting for the lost speech. iLBC can therefore reduce the discontinuity produced when packet loss occurs in the audio stream.
However, severe packet loss may occur when network conditions are poor, and the existing iLBC packet loss concealment technique alone still cannot deliver satisfactory sound quality in that case. It is therefore necessary to propose a packet loss concealment technique that can still obtain comparatively clear speech in network environments with a high packet loss rate.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is how to obtain comparatively clear speech in a network environment with a high packet loss rate.
According to one aspect of the embodiments of the present invention, a network audio packet loss concealment method is proposed, comprising: decoding the audio stream affected by packet loss into a pulse code modulation (PCM) signal, setting the PCM values of data segments missing due to packet loss to 0, and taking the processed PCM signal as audio signal S; time-reversing audio signal S to obtain audio signal S'; passing audio signal S through internet Low Bitrate Codec (iLBC) packet loss compensation to obtain audio signal Si, and passing audio signal S' through iLBC packet loss compensation to obtain audio signal Si'; time-reversing audio signal Si' to obtain audio signal Sit; and performing sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So.
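The bidirectional data flow of this aspect can be sketched as follows. This is a minimal illustration only: `plc` is a placeholder for any packet-loss compensation routine (the patent uses iLBC's), and the function name is hypothetical.

```python
import numpy as np

def bidirectional_conceal(s, plc):
    """Run packet-loss compensation in both time directions.

    s   -- 1-D PCM audio signal with lost segments zero-filled (signal S)
    plc -- placeholder for a concealment routine mapping a signal to a
           compensated signal of the same length
    """
    s_rev = s[::-1]           # time-reverse S to obtain S'
    si = plc(s)               # forward-direction compensation: Si
    sit = plc(s_rev)[::-1]    # backward compensation Si', reversed back to Sit
    return si, sit            # Si and Sit feed the amplitude averaging step
```

With an identity `plc`, both outputs simply reproduce the input, which makes the data flow easy to verify.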
Performing sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So specifically comprises: dividing audio signals Si, Sit and So into frames with a sliding window, each frame having length 2N and adjacent frames being separated by a sliding interval of N, and denoting the i-th frame of Si as vector α_i, the i-th frame of Sit as vector β_i, the i-th frame of So as vector O_i, and the window function as vector w; windowing α_i with the window function and computing the amplitude spectrum A_i and phase spectrum φ_i of the result after a fast Fourier transform (FFT); windowing β_i with the window function and computing the amplitude spectrum B_i and phase spectrum θ_i of the result after an FFT; averaging the amplitude spectra A_i and B_i to obtain (A_i + B_i)/2; and applying an inverse fast Fourier transform to the signal whose amplitude spectrum is (A_i + B_i)/2 and whose phase spectrum is φ_i, and computing O_i from the result of the inverse fast Fourier transform.
The iLBC packet loss compensation process specifically comprises: reconstructing the linear prediction coefficients of a lost frame, and reconstructing the residual signal.
According to another aspect of the embodiments of the present invention, a network audio packet loss concealment device is proposed, comprising: a decoding module for decoding the audio stream affected by packet loss into a pulse code modulation (PCM) signal, setting the PCM values of data segments missing due to packet loss to 0, and taking the processed PCM signal as audio signal S; a reversal module for time-reversing audio signal S to obtain audio signal S'; a compensation module for passing audio signal S through internet Low Bitrate Codec (iLBC) packet loss compensation to obtain audio signal Si, and passing audio signal S' through iLBC packet loss compensation to obtain audio signal Si'; the reversal module also being used for time-reversing audio signal Si' to obtain audio signal Sit; and an amplitude averaging module for performing sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So.
The amplitude averaging module is specifically configured to: divide audio signals Si, Sit and So into frames with a sliding window, each frame having length 2N and adjacent frames being separated by a sliding interval of N, denoting the i-th frame of Si as vector α_i, the i-th frame of Sit as vector β_i, the i-th frame of So as vector O_i, and the window function as vector w; window α_i with the window function and compute the amplitude spectrum A_i and phase spectrum φ_i of the result after a fast Fourier transform; window β_i with the window function and compute the amplitude spectrum B_i and phase spectrum θ_i of the result after a fast Fourier transform; average the amplitude spectra A_i and B_i to obtain (A_i + B_i)/2; and apply an inverse fast Fourier transform to the signal whose amplitude spectrum is (A_i + B_i)/2 and whose phase spectrum is φ_i, and compute O_i from the result of the inverse fast Fourier transform.
The iLBC packet loss compensation process of the compensation module specifically comprises: reconstructing the linear prediction coefficients of a lost frame, and reconstructing the residual signal.
In the present invention, the audio stream affected by packet loss is decoded, the decoded audio signal is time-reversed, packet loss compensation is applied to both the original and the reversed audio signal, the compensated reversed signal is reversed again, and finally the two compensated signals are amplitude-averaged to produce the output signal. This bidirectionally predictive error concealment technique further improves audio quality and obtains comparatively clear speech even in network environments with a high packet loss rate.
Further features and advantages of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of an embodiment of the network audio packet loss concealment method of the present invention.
Fig. 2 is a diagram of the Hanning window function adopted by the present invention.
Fig. 3 is a diagram of the sliding-window overlap-add operation applied to the speech signal in the present invention.
Fig. 4 is a diagram of the assessment results for speech quality after error concealment in the present invention.
Fig. 5 is a structural diagram of an embodiment of the network audio packet loss concealment device of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention; the following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
To obtain comparatively clear speech in a network environment with a high packet loss rate, the present invention adopts a bidirectionally predictive error concealment technique that can further improve audio quality and, in particular, still obtains comparatively clear speech when the packet loss rate is high.
Fig. 1 is a flow diagram of an embodiment of the network audio packet loss concealment method of the present invention.
As shown in Fig. 1, the network audio packet loss concealment method of this embodiment comprises the following steps:
S101: decode the audio stream affected by packet loss into a PCM (pulse code modulation) signal, set the PCM values of data segments missing due to packet loss to 0, and take the processed PCM signal as audio signal S.
S102: time-reverse audio signal S to obtain audio signal S'.
In this embodiment, audio signals S and S' may each be stored in a separate buffer.
S103: pass audio signal S through iLBC packet loss compensation to obtain audio signal Si, and pass audio signal S' through iLBC packet loss compensation to obtain audio signal Si'.
The iLBC packet loss compensation process is described in detail later.
S104: time-reverse audio signal Si' to obtain audio signal Sit.
S105: perform sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So.
The present invention proposes an exemplary amplitude-spectrum averaging method, which proceeds through steps (1) to (5) below:
(1) Suppose there are two audio signals Si and Sit of equal length; they are superposed to obtain the output audio signal So as follows.
A sliding window is used to divide audio signals Si, Sit and So into frames of length 2N, with a sliding interval of N between adjacent frames, so that each frame overlaps the frames before and after it by 50%. This overlap ensures the smoothness of the superposed speech. Denote the i-th frame of Si as vector α_i, the i-th frame of Sit as vector β_i, the i-th frame of So as vector O_i, and the 2N-point Hanning window function as vector w.
Fig. 2 shows the Hanning window adopted by the present invention; as shown in Fig. 2, the window length is 2N, with N = 80.
Fig. 3 shows the sliding-window overlap-add operation on the speech signal; as shown in Fig. 3, the spacing between two adjacent windows is half a frame.
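With the N = 80 parameters above, the window and the 50%-overlap framing can be sketched as follows (illustrative only; the helper name `frames` is an assumption, not from the patent):

```python
import numpy as np

N = 80
w = np.hanning(2 * N)  # 2N-point Hanning window, as in Fig. 2

def frames(x, N):
    """Split x into frames of length 2N with hop N (50% overlap)."""
    return [x[i:i + 2 * N] for i in range(0, len(x) - 2 * N + 1, N)]
```

Each frame shares its first N samples with the previous frame and its last N samples with the next one, which is the property the overlap-add step relies on.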
(2) Window α_i with the window function and apply a fast Fourier transform (FFT) to the result to compute its amplitude spectrum A_i and phase spectrum φ_i.
The windowing operation can refer to the following formula:
    α̃_i = w ⊙ α_i
where ⊙ denotes the element-wise (windowing) multiplication.
The amplitude spectrum calculation can refer to the following formula:
    A_i = abs(F(α̃_i))
where abs takes the magnitude of a complex number and F denotes the fast Fourier transform.
The phase spectrum calculation can refer to the following formula:
    φ_i = ang(F(α̃_i))
where ang takes the phase angle of a complex number, in radians.
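The windowing, amplitude-spectrum and phase-spectrum computations map directly onto NumPy's FFT routines. A sketch (the function name is illustrative):

```python
import numpy as np

def spectrum(frame, w):
    """Window a frame and return its amplitude and phase spectra.

    Computes A_i = abs(F(w * frame)) and phi_i = ang(F(w * frame)).
    """
    X = np.fft.fft(w * frame)        # windowing, then FFT
    return np.abs(X), np.angle(X)    # amplitude spectrum, phase in radians
```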
(3) Window β_i with the window function and apply an FFT to the result to compute its amplitude spectrum B_i and phase spectrum θ_i.
The calculations are the same as in step (2) and are not repeated here.
(4) Average the amplitude spectra A_i and B_i, with reference to the following formula:
    (A_i + B_i) / 2
(5) Apply an inverse fast Fourier transform (IFFT) to the signal whose amplitude spectrum is the average (A_i + B_i)/2 and whose phase spectrum is φ_i, and compute O_i from the result, with reference to the following formula:
    O_i = real(F⁻¹(((A_i + B_i)/2) ⊙ e^{jφ_i}))
where real(·) takes the real part of a complex number.
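Steps (2) to (5) for a single frame pair can be sketched as below. This is an assumed reading of the description: average the two amplitude spectra, keep the phase of the forward-compensated frame, then invert.

```python
import numpy as np

def average_frame(alpha_i, beta_i, w):
    """Amplitude-spectrum averaging for one frame pair.

    alpha_i -- i-th frame of Si; beta_i -- i-th frame of Sit;
    w       -- 2N-point window vector.
    """
    Fa = np.fft.fft(w * alpha_i)        # FFT of windowed alpha_i
    Fb = np.fft.fft(w * beta_i)         # FFT of windowed beta_i
    A, phi = np.abs(Fa), np.angle(Fa)   # amplitude A_i and phase phi_i
    B = np.abs(Fb)                      # amplitude B_i
    C = 0.5 * (A + B)                   # averaged amplitude spectrum
    # rebuild the complex spectrum from C and phi, invert, keep real part
    return np.real(np.fft.ifft(C * np.exp(1j * phi)))
```

When the two frames are identical, the averaged spectrum equals A_i and the output reduces to the windowed input frame, which is a convenient sanity check.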
It should be noted that the packet loss compensation process based on iLBC can follow the prior art, for example Andersen, S.V.; Kleijn, W.B.; Hagen, R.; Linden, J.; Murthi, M.N.; Skoglund, J., "iLBC - a linear predictive coder with robustness to packet losses," Speech Coding, 2002, IEEE Workshop Proceedings, pp. 23-25, 6-9 Oct. 2002. As an example, the iLBC packet loss compensation process is briefly described below.
While the decoder decodes the received bitstream frame by frame, the iLBC decoder first judges, on receiving each frame, whether the current frame is complete, and then handles the two cases as follows.
If the frame is complete, the speech signal is reconstructed by the normal iLBC decoding process, and the state information of the current frame is saved, including the linear prediction coefficient (LPC) information and the decoded residual signal. If the bitstream of the next frame is lost, this saved information will be used.
If packet loss has occurred, packet loss concealment (PLC) processing is performed. PLC mainly uses the decoded information of the previous frame and pitch-synchronous repetition to approximately substitute for the current lost frame, thereby achieving packet loss compensation. If the current frame is lost, the PLC method introduced below is used to reconstruct the lost speech data; if several consecutive frames are lost, that PLC method is applied repeatedly. Note that the later a frame lies in a burst of losses, the harder it is to reconstruct accurately, so the gain applied during consecutive losses is decreased frame by frame to avoid introducing large signal distortion.
PLC reconstruction of lost speech data mainly comprises two steps:
(1) Reconstructing the linear prediction coefficients (LPC)
Because, in both space and time, the last subframe has the greatest correlation with the linear prediction coefficients of the current lost frame, the present invention uses the linear prediction coefficients of the last subframe of the previous frame as the linear prediction coefficients of the lost frame.
(2) Reconstructing the residual signal (also called the excitation signal)
The residual signal can usually be divided into two parts: a quasi-periodic component and a noise-like component. PLC therefore first reconstructs these two parts: the quasi-periodic component can be approximated from the pitch period measured on the previous frame, the noise-like component can be obtained by generating random noise, and the energy ratio of the two can follow the ratio observed in the previous frame. Accordingly, pitch detection is performed on the previous frame, the voiced part of the lost frame is reconstructed pitch-synchronously, the gain of the noise-like part is obtained using the correlation, and the voiced part is mixed with the noise-like part to reconstruct the residual signal.
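As a hedged illustration of step (2) — a simplified sketch, not iLBC's exact procedure — pitch-synchronous repetition of the last pitch cycle plus scaled random noise might look like this (all names and the seeding are assumptions):

```python
import numpy as np

def reconstruct_residual(prev_residual, pitch, n, noise_gain, seed=0):
    """Hypothetical sketch: build a lost frame's residual from the
    previous frame's decoded residual.

    prev_residual -- decoded residual of the previous frame
    pitch         -- pitch period in samples, detected on the previous frame
    n             -- number of residual samples to synthesize
    noise_gain    -- gain of the noise-like component (e.g. from correlation)
    """
    cycle = prev_residual[-pitch:]          # last pitch cycle
    periodic = np.resize(cycle, n)          # pitch-synchronous repetition
    noise = noise_gain * np.random.default_rng(seed).standard_normal(n)
    return periodic + noise                 # quasi-periodic + noise-like mix
```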
With the network audio packet loss concealment method of the present invention, the audio stream affected by packet loss is decoded, the decoded audio signal is time-reversed, packet loss compensation is applied to both the original and the reversed audio signal, the compensated reversed signal is reversed again, and finally the two compensated signals are amplitude-averaged to produce the output signal. This bidirectionally predictive error concealment technique further improves audio quality and obtains comparatively clear speech even in network environments with a high packet loss rate.
To demonstrate that the bidirectionally predictive error concealment technique can further improve audio quality, an experimental simulation was carried out; the experimental procedure and simulation results are described below.
The audio signal used in the experiment is a 100 s speech signal with a sampling rate of 8 kHz, stored in 16-bit PCM format and denoted X.
The experimental steps are as follows:
(1) Encode X in G.729 mode. The G.729 frame length is 160 sample points, and the length of X divides exactly into an integer number of audio frames; the voice packets are numbered in order of generation time.
(2) Simulate network packet loss. According to a preset packet loss rate r, generate a 0/1 random sequence whose length equals the number of speech frames, so that each random number corresponds to one voice packet; voice packets corresponding to the digit "1" are considered dropped.
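The loss-flag generation in step (2) can be sketched as follows (the function name and seeding are illustrative assumptions):

```python
import numpy as np

def loss_flags(n_frames, r, seed=0):
    """0/1 sequence, one flag per voice packet; '1' marks a dropped
    packet and occurs with probability r."""
    rng = np.random.default_rng(seed)
    return (rng.random(n_frames) < r).astype(int)
```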
(3) Decode the voice packets in numbered order. Packets whose loss flag is "0" are decoded normally and the decoded PCM data of the frame is output; for packets whose loss flag is "1", 20 ms of PCM data with value 0 is output directly. The speech data obtained by this decoding process is denoted S.
(4) Encode S with the iLBC coder, with the iLBC frame length set to 160 sample points, then decode the encoded speech again. Besides the speech frame data, iLBC decoding also needs the loss flag corresponding to each frame. For speech frames whose loss flag is "1" (lost frames), the iLBC decoder predicts the speech synthesis parameters of the frame from those obtained by decoding the previous frame, and reconstructs the lost speech data.
(5) Time-reverse S to obtain audio signal S', and encode S' with iLBC. Also time-reverse the previously generated 0/1 loss flag sequence. Decode the speech frames and their corresponding loss flags with iLBC to obtain the compensated speech Si', and time-reverse signal Si' to obtain signal Sit.
(6) Perform the amplitude-spectrum overlap-add operation on speech signals Si and Sit to obtain the output audio signal So.
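Step (6) over the whole signals — framing, per-frame amplitude averaging, and overlap-add with hop N — can be sketched as follows. This is an assumed assembly of the steps described earlier, not the patent's exact implementation:

```python
import numpy as np

def amplitude_average(si, sit, N=80):
    """Overlap-add amplitude-spectrum averaging of Si and Sit into So."""
    w = np.hanning(2 * N)
    so = np.zeros(len(si))
    for i in range(0, len(si) - 2 * N + 1, N):   # hop N, frame length 2N
        Fa = np.fft.fft(w * si[i:i + 2 * N])
        Fb = np.fft.fft(w * sit[i:i + 2 * N])
        C = 0.5 * (np.abs(Fa) + np.abs(Fb))      # averaged amplitude spectrum
        o_i = np.real(np.fft.ifft(C * np.exp(1j * np.angle(Fa))))
        so[i:i + 2 * N] += o_i                   # overlap-add frame O_i
    return so
```

When `si` and `sit` coincide, each reconstructed frame is just the windowed input frame, so the output equals the plain overlap-add of the windowed frames — a useful correctness check.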
The present invention uses Perceptual Evaluation of Speech Quality (PESQ; see ITU-T Recommendation P.862) to assess speech quality after error concealment, with the assessment results shown in Fig. 4. The results show that when the packet loss rate exceeds 5%, the proposed method improves the speech quality after error concealment more effectively than using the iLBC algorithm alone.
Fig. 5 is a structural diagram of an embodiment of the network audio packet loss concealment device of the present invention.
As shown in Fig. 5, the network audio packet loss concealment device of this embodiment comprises:
a decoding module 501 for decoding the audio stream affected by packet loss into a pulse code modulation (PCM) signal, setting the PCM values of data segments missing due to packet loss to 0, and taking the processed PCM signal as audio signal S;
a reversal module 502 for time-reversing audio signal S to obtain audio signal S';
a compensation module 503 for passing audio signal S through internet Low Bitrate Codec (iLBC) packet loss compensation to obtain audio signal Si, and passing audio signal S' through iLBC packet loss compensation to obtain audio signal Si';
the reversal module 502 also being used for time-reversing audio signal Si' to obtain audio signal Sit; and
an amplitude averaging module 504 for performing sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So.
The amplitude averaging module 504 is specifically configured to:
divide audio signals Si, Sit and So into frames with a sliding window, each frame having length 2N and adjacent frames being separated by a sliding interval of N, denoting the i-th frame of Si as vector α_i, the i-th frame of Sit as vector β_i, the i-th frame of So as vector O_i, and the window function as vector w;
window α_i with the window function and compute the amplitude spectrum A_i and phase spectrum φ_i of the result after a fast Fourier transform;
window β_i with the window function and compute the amplitude spectrum B_i and phase spectrum θ_i of the result after a fast Fourier transform;
average the amplitude spectra A_i and B_i to obtain (A_i + B_i)/2; and
apply an inverse fast Fourier transform to the signal whose amplitude spectrum is (A_i + B_i)/2 and whose phase spectrum is φ_i, and compute O_i from the result of the inverse fast Fourier transform.
The iLBC packet loss compensation process of the compensation module 503 specifically comprises: reconstructing the linear prediction coefficients of a lost frame, and reconstructing the residual signal.
When reconstructing the linear prediction coefficients, the compensation module 503 is specifically configured to use the linear prediction coefficients of the last subframe of the previous frame as the linear prediction coefficients of the lost frame.
When reconstructing the residual signal, the compensation module 503 is specifically configured to perform pitch detection on the previous frame, reconstruct the voiced part of the lost frame pitch-synchronously, obtain the gain of the noise-like part using the correlation, and mix the voiced part with the noise-like part to reconstruct the residual signal.
With the network audio packet loss concealment device of the present invention, the audio stream affected by packet loss is decoded, the decoded audio signal is time-reversed, packet loss compensation is applied to both the original and the reversed audio signal, the compensated reversed signal is reversed again, and finally the two compensated signals are amplitude-averaged to produce the output signal. This bidirectionally predictive error concealment technique further improves audio quality and obtains comparatively clear speech even in network environments with a high packet loss rate.
One of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented in hardware or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc.
The foregoing are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A network audio packet loss concealment method, comprising:
decoding the audio stream affected by packet loss into a pulse code modulation (PCM) signal, setting the PCM values of data segments missing due to packet loss to 0, and taking the processed PCM signal as audio signal S;
time-reversing audio signal S to obtain audio signal S';
passing audio signal S through internet Low Bitrate Codec (iLBC) packet loss compensation to obtain audio signal Si, and passing audio signal S' through iLBC packet loss compensation to obtain audio signal Si';
time-reversing audio signal Si' to obtain audio signal Sit; and
performing sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So.
2. The method according to claim 1, characterized in that performing sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So specifically comprises:
dividing audio signals Si, Sit and So into frames with a sliding window, each frame having length 2N and adjacent frames being separated by a sliding interval of N, and denoting the i-th frame of Si as vector α_i, the i-th frame of Sit as vector β_i, the i-th frame of So as vector O_i, and the window function as vector w;
windowing α_i with the window function and computing the amplitude spectrum A_i and phase spectrum φ_i of the result after a fast Fourier transform;
windowing β_i with the window function and computing the amplitude spectrum B_i and phase spectrum θ_i of the result after a fast Fourier transform;
averaging the amplitude spectra A_i and B_i to obtain (A_i + B_i)/2; and
applying an inverse fast Fourier transform to the signal whose amplitude spectrum is (A_i + B_i)/2 and whose phase spectrum is φ_i, and computing O_i from the result of the inverse fast Fourier transform.
3. The method according to claim 1, characterized in that the iLBC packet loss compensation process specifically comprises: reconstructing the linear prediction coefficients of a lost frame, and reconstructing the residual signal.
4. The method according to claim 3, characterized in that reconstructing the linear prediction coefficients of a lost frame specifically comprises: using the linear prediction coefficients of the last subframe of the previous frame as the linear prediction coefficients of the lost frame.
5. The method according to claim 3, characterized in that reconstructing the residual signal specifically comprises: performing pitch detection on the previous frame, reconstructing the voiced part of the lost frame pitch-synchronously, obtaining the gain of the noise-like part using the correlation, and mixing the voiced part with the noise-like part to reconstruct the residual signal.
6. A network audio packet loss concealment device, comprising:
a decoding module for decoding the audio stream affected by packet loss into a pulse code modulation (PCM) signal, setting the PCM values of data segments missing due to packet loss to 0, and taking the processed PCM signal as audio signal S;
a reversal module for time-reversing audio signal S to obtain audio signal S';
a compensation module for passing audio signal S through internet Low Bitrate Codec (iLBC) packet loss compensation to obtain audio signal Si, and passing audio signal S' through iLBC packet loss compensation to obtain audio signal Si';
the reversal module also being used for time-reversing audio signal Si' to obtain audio signal Sit; and
an amplitude averaging module for performing sliding-window amplitude-spectrum averaging on audio signals Si and Sit to obtain the output audio signal So.
7. The device according to claim 6, characterized in that the amplitude averaging module is specifically configured to:
divide audio signals Si, Sit and So into frames with a sliding window, each frame having length 2N and adjacent frames being separated by a sliding interval of N, denoting the i-th frame of Si as vector α_i, the i-th frame of Sit as vector β_i, the i-th frame of So as vector O_i, and the window function as vector w;
window α_i with the window function and compute the amplitude spectrum A_i and phase spectrum φ_i of the result after a fast Fourier transform;
window β_i with the window function and compute the amplitude spectrum B_i and phase spectrum θ_i of the result after a fast Fourier transform;
average the amplitude spectra A_i and B_i to obtain (A_i + B_i)/2; and
apply an inverse fast Fourier transform to the signal whose amplitude spectrum is (A_i + B_i)/2 and whose phase spectrum is φ_i, and compute O_i from the result of the inverse fast Fourier transform.
8. The device according to claim 6, characterized in that the iLBC packet loss compensation process of the compensation module specifically comprises: reconstructing the linear prediction coefficients of a lost frame, and reconstructing the residual signal.
9. The device according to claim 8, characterized in that when reconstructing the linear prediction coefficients the compensation module is specifically configured to use the linear prediction coefficients of the last subframe of the previous frame as the linear prediction coefficients of the lost frame.
10. The device according to claim 8, characterized in that when reconstructing the residual signal the compensation module is specifically configured to perform pitch detection on the previous frame, reconstruct the voiced part of the lost frame pitch-synchronously, obtain the gain of the noise-like part using the correlation, and mix the voiced part with the noise-like part to reconstruct the residual signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310345063.6A CN104347076B (en) | 2013-08-09 | 2013-08-09 | Network audio packet loss covering method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310345063.6A CN104347076B (en) | 2013-08-09 | 2013-08-09 | Network audio packet loss covering method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104347076A true CN104347076A (en) | 2015-02-11 |
CN104347076B CN104347076B (en) | 2017-07-14 |
Family
ID=52502544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310345063.6A Active CN104347076B (en) | 2013-08-09 | 2013-08-09 | Network audio packet loss covering method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104347076B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101325631A (en) * | 2007-06-14 | 2008-12-17 | 华为技术有限公司 | Method and apparatus for implementing bag-losing hide |
CN102110441A (en) * | 2010-12-22 | 2011-06-29 | 中国科学院声学研究所 | Method for generating sound masking signal based on time reversal |
US20120101814A1 (en) * | 2010-10-25 | 2012-04-26 | Polycom, Inc. | Artifact Reduction in Packet Loss Concealment |
WO2012070340A1 (en) * | 2010-11-26 | 2012-05-31 | 株式会社エヌ・ティ・ティ・ドコモ | Concealment signal generating device, concealment signal generation method and concealment signal generation program |
WO2012158159A1 (en) * | 2011-05-16 | 2012-11-22 | Google Inc. | Packet loss concealment for audio codec |
- 2013-08-09 CN CN201310345063.6A patent/CN104347076B/en active Active
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11310592B2 (en) | 2015-04-30 | 2022-04-19 | Shure Acquisition Holdings, Inc. | Array microphone system and method of assembling the same |
US11832053B2 (en) | 2015-04-30 | 2023-11-28 | Shure Acquisition Holdings, Inc. | Array microphone system and method of assembling the same |
US11678109B2 (en) | 2015-04-30 | 2023-06-13 | Shure Acquisition Holdings, Inc. | Offset cartridge microphones |
US11477327B2 (en) | 2017-01-13 | 2022-10-18 | Shure Acquisition Holdings, Inc. | Post-mixing acoustic echo cancellation systems and methods |
CN106960673A (en) * | 2017-02-08 | 2017-07-18 | 中国人民解放军信息工程大学 | A kind of voice covering method and equipment |
CN107689226A (en) * | 2017-08-29 | 2018-02-13 | 中国民航大学 | A kind of low capacity Methods of Speech Information Hiding based on iLBC codings |
US11798575B2 (en) | 2018-05-31 | 2023-10-24 | Shure Acquisition Holdings, Inc. | Systems and methods for intelligent voice activation for auto-mixing |
US10997982B2 (en) | 2018-05-31 | 2021-05-04 | Shure Acquisition Holdings, Inc. | Systems and methods for intelligent voice activation for auto-mixing |
US11800281B2 (en) | 2018-06-01 | 2023-10-24 | Shure Acquisition Holdings, Inc. | Pattern-forming microphone array |
US11523212B2 (en) | 2018-06-01 | 2022-12-06 | Shure Acquisition Holdings, Inc. | Pattern-forming microphone array |
US11297423B2 (en) | 2018-06-15 | 2022-04-05 | Shure Acquisition Holdings, Inc. | Endfire linear array microphone |
US11770650B2 (en) | 2018-06-15 | 2023-09-26 | Shure Acquisition Holdings, Inc. | Endfire linear array microphone |
US11310596B2 (en) | 2018-09-20 | 2022-04-19 | Shure Acquisition Holdings, Inc. | Adjustable lobe shape for array microphones |
US11558693B2 (en) | 2019-03-21 | 2023-01-17 | Shure Acquisition Holdings, Inc. | Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality |
US11303981B2 (en) | 2019-03-21 | 2022-04-12 | Shure Acquisition Holdings, Inc. | Housings and associated design features for ceiling array microphones |
US11438691B2 (en) | 2019-03-21 | 2022-09-06 | Shure Acquisition Holdings, Inc. | Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition functionality |
US11778368B2 (en) | 2019-03-21 | 2023-10-03 | Shure Acquisition Holdings, Inc. | Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition functionality |
US11445294B2 (en) | 2019-05-23 | 2022-09-13 | Shure Acquisition Holdings, Inc. | Steerable speaker array, system, and method for the same |
US11800280B2 (en) | 2019-05-23 | 2023-10-24 | Shure Acquisition Holdings, Inc. | Steerable speaker array, system and method for the same |
US11688418B2 (en) | 2019-05-31 | 2023-06-27 | Shure Acquisition Holdings, Inc. | Low latency automixer integrated with voice and noise activity detection |
US11302347B2 (en) | 2019-05-31 | 2022-04-12 | Shure Acquisition Holdings, Inc. | Low latency automixer integrated with voice and noise activity detection |
US11750972B2 (en) | 2019-08-23 | 2023-09-05 | Shure Acquisition Holdings, Inc. | One-dimensional array microphone with improved directivity |
US11297426B2 (en) | 2019-08-23 | 2022-04-05 | Shure Acquisition Holdings, Inc. | One-dimensional array microphone with improved directivity |
US11552611B2 (en) | 2020-02-07 | 2023-01-10 | Shure Acquisition Holdings, Inc. | System and method for automatic adjustment of reference gain |
CN111326166A (en) * | 2020-02-25 | 2020-06-23 | 网易(杭州)网络有限公司 | Voice processing method and device, computer readable storage medium and electronic equipment |
CN111883170A (en) * | 2020-04-08 | 2020-11-03 | 珠海市杰理科技股份有限公司 | Voice signal processing method and system, audio processing chip and electronic equipment |
CN111883170B (en) * | 2020-04-08 | 2023-09-08 | 珠海市杰理科技股份有限公司 | Voice signal processing method and system, audio processing chip and electronic equipment |
US11706562B2 (en) | 2020-05-29 | 2023-07-18 | Shure Acquisition Holdings, Inc. | Transducer steering and configuration systems and methods using a local positioning system |
CN111640442B (en) * | 2020-06-01 | 2023-05-23 | 北京猿力未来科技有限公司 | Method for processing audio packet loss, method for training neural network and respective devices |
CN111640442A (en) * | 2020-06-01 | 2020-09-08 | 北京猿力未来科技有限公司 | Method for processing audio packet loss, method for training neural network and respective devices |
CN111883147A (en) * | 2020-07-23 | 2020-11-03 | 北京达佳互联信息技术有限公司 | Audio data processing method and device, computer equipment and storage medium |
CN111883147B (en) * | 2020-07-23 | 2024-05-07 | 北京达佳互联信息技术有限公司 | Audio data processing method, device, computer equipment and storage medium |
US11785380B2 (en) | 2021-01-28 | 2023-10-10 | Shure Acquisition Holdings, Inc. | Hybrid audio beamforming system |
CN113035207A (en) * | 2021-03-03 | 2021-06-25 | 北京猿力未来科技有限公司 | Audio processing method and device |
CN113035207B (en) * | 2021-03-03 | 2024-03-22 | 北京猿力未来科技有限公司 | Audio processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN104347076B (en) | 2017-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104347076A (en) | Network audio packet loss concealment method and device | |
CN101471073B (en) | Package loss compensation method, apparatus and system based on frequency domain | |
US7552048B2 (en) | Method and device for performing frame erasure concealment on higher-band signal | |
EP2438592B1 (en) | Method, apparatus and computer program product for reconstructing an erased speech frame | |
US9524721B2 (en) | Apparatus and method for concealing frame erasure and voice decoding apparatus and method using the same | |
CA2332596C (en) | Improved lost frame recovery techniques for parametric, lpc-based speech coding systems | |
JP2012098740A (en) | Frame erasure cancel in voice communications | |
CN100578618C (en) | Decoding method and device | |
WO2013061584A1 (en) | Hybrid sound-signal decoder, hybrid sound-signal encoder, sound-signal decoding method, and sound-signal encoding method | |
TW202044231A (en) | Decoder and decoding method for lc3 concealment including full frame loss concealment and partial frame loss concealment | |
CN108109629A (en) | A kind of more description voice decoding methods and system based on linear predictive residual classification quantitative | |
CN101783142B (en) | Transcoding method, device and communication equipment | |
CN106030704A (en) | Method and apparatus for encoding/decoding an audio signal | |
US9858939B2 (en) | Methods and apparatus for post-filtering MDCT domain audio coefficients in a decoder | |
Gueham et al. | Packet loss concealment method based on interpolation in packet voice coding | |
Gueham et al. | An enhanced insertion packet loss concealment method for voice over IP network services | |
KR101452635B1 (en) | Method for packet loss concealment using LMS predictor, and thereof recording medium | |
Merazka | Packet loss concealment by interpolation for speech over IP network services | |
US20120087231A1 (en) | Packet Loss Recovery Method and Device for Voice Over Internet Protocol | |
CN101533639A (en) | Voice signal processing method and device | |
Merazka | The use of FEC method for packet loss concealment for CELP based coders in packet networks | |
Benamirouche et al. | Low complexity forward error correction for CELP-type speech coding over erasure channel transmission | |
Deng et al. | Phase unwrapping based packet loss concealment using deep neural networks | |
Hellerud et al. | Perceptually controlled error protection for audio streaming over IP networks | |
Merazka | Packet loss concealment using time scale modification for CELP based coders in packet network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |