CN101366079B - Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform - Google Patents

Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform

Info

Publication number
CN101366079B
Authority
CN
China
Prior art keywords
signal
audio signal
frame
full band
output audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200780001854XA
Other languages
Chinese (zh)
Other versions
CN101366079A (en)
Inventor
Juin-Hwey Chen
Jes Thyssen
Robert W. Zopf
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zyray Wireless Inc
Original Assignee
Zyray Wireless Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zyray Wireless Inc filed Critical Zyray Wireless Inc
Priority claimed from PCT/US2007/075975 external-priority patent/WO2008022176A2/en
Publication of CN101366079A publication Critical patent/CN101366079A/en
Application granted granted Critical
Publication of CN101366079B publication Critical patent/CN101366079B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

A technique for concealing the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system is provided. In accordance with the technique, one or more received frames in the series are decoded to generate a full-band output audio signal. The full-band output audio signal corresponding to the one or more received frames is stored. Then, a full-band output audio signal corresponding to the lost frame is synthesized by performing waveform extrapolation based on the stored full-band output audio signal corresponding to the one or more received frames.

Description

Packet loss concealment for sub-band predictive coding based on extrapolation of the full-band audio waveform
Technical field
The present invention relates to systems and methods for concealing the quality-degrading effects of packet loss in a speech or audio coder.
Background
In digital transmission of voice or audio signals over packet networks, the encoded voice/audio signal is typically divided into frames, which are then packaged into packets, where each packet may contain one or more frames of encoded voice/audio data. The packets are then transmitted over a packet network. Sometimes packets are lost, and sometimes packets arrive too late to be useful and are therefore declared lost. Such packet loss causes significant degradation of audio quality unless special techniques are used to conceal its effects.
There exist packet loss concealment (PLC) methods, based on extrapolation of the audio signal, for block-independent coders or full-band predictive coders. Such PLC methods include the techniques disclosed in U.S. Patent Application No. 11/234,291 to Chen, entitled "Packet Loss Concealment for Block-Independent Speech Codecs," and U.S. Patent Application No. 10/183,608 to Chen, entitled "Method and System for Frame Erasure Concealment for Predictive Speech Coding Based on Extrapolation of Speech Waveform." However, the techniques described in those applications cannot be applied directly to a sub-band predictive coder such as the ITU-T Recommendation G.722 wideband speech coder, because they do not address issues specific to the sub-band structure. In addition, for each sub-band the G.722 coder uses an adaptive differential pulse code modulation (ADPCM) predictive coder that performs sample-by-sample backward adaptation of the quantizer step size and the predictor coefficients based on a gradient method, which poses particular challenges not addressed by existing PLC techniques. There is therefore a need for a suitable PLC method designed specifically for sub-band predictive coders such as G.722.
Summary of the invention
The present invention conceals the quality-degrading effects of packet loss in a sub-band predictive coder. In particular, the invention addresses sub-band-specific structural issues that arise when audio waveform extrapolation techniques are applied in a sub-band predictive coder, and it further addresses PLC challenges specific to backward-adaptive ADPCM coders in general and to the G.722 sub-band ADPCM coder in particular.
Specifically, described herein is a method for concealing the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system. In accordance with the method, one or more received frames in the series are decoded to generate a full-band output audio signal, wherein the full-band output audio signal comprises a combination of at least a first sub-band decoded audio signal and a second sub-band decoded audio signal. The full-band output audio signal corresponding to the one or more received frames is stored. A full-band output audio signal corresponding to the lost frame is then synthesized, wherein synthesizing comprises performing waveform extrapolation based on the stored full-band output audio signal corresponding to the one or more received frames.
A system is also described herein. The system comprises a decoder, a buffer, and a full-band audio signal synthesizer. The decoder is configured to decode one or more received frames in a series of frames representing an encoded audio signal to generate a full-band output audio signal, wherein the full-band output audio signal comprises a combination of at least a first sub-band decoded audio signal and a second sub-band decoded audio signal. The buffer is configured to store the full-band output audio signal corresponding to the one or more received frames. The full-band audio signal synthesizer is configured to synthesize a full-band output audio signal corresponding to a lost frame in the series, wherein synthesizing comprises performing waveform extrapolation based on the stored full-band output audio signal corresponding to the one or more received frames.
A computer program product is also described herein. The computer program product comprises a computer-readable medium having computer program logic recorded thereon for causing a processor to conceal the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system. The computer program logic comprises a first module, a second module, and a third module. The first module causes the processor to decode one or more received frames in the series to generate a full-band output audio signal, wherein the full-band output audio signal comprises a combination of at least a first sub-band decoded audio signal and a second sub-band decoded audio signal. The second module causes the processor to store the full-band output audio signal corresponding to the one or more received frames. The third module causes the processor to synthesize a full-band output audio signal corresponding to a lost frame in the series, wherein synthesizing comprises performing waveform extrapolation based on the stored full-band output audio signal corresponding to the one or more received frames.
Further features and advantages of the invention, as well as the structure and operation of various embodiments of the invention, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.
Brief Description of the Drawings
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate one or more embodiments of the present invention and, together with the description, further serve to explain the purpose, advantages and principles of the invention and to enable persons skilled in the pertinent art to make and use the invention.
Fig. 1 is a schematic diagram of the encoder structure of a conventional ITU-T G.722 sub-band predictive coder;
Fig. 2 is a schematic diagram of the decoder structure of a conventional ITU-T G.722 sub-band predictive coder;
Fig. 3 is a block diagram of a decoder/PLC system according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method for processing frames in a decoder/PLC system to produce an output speech signal according to an embodiment of the present invention;
Fig. 5 is a timing diagram showing the different types of frames that may be processed by a decoder/PLC system according to an embodiment of the present invention;
Fig. 6 is a timeline showing the amplitude of an original speech signal and an extrapolated speech signal;
Fig. 7 is a flowchart of a method for calculating a time lag between a decoded speech signal and an extrapolated speech signal according to an embodiment of the present invention;
Fig. 8 is a flowchart of a two-stage method for calculating a time lag between a decoded speech signal and an extrapolated speech signal according to an embodiment of the present invention;
Fig. 9 is a diagram illustrating the manner in which an extrapolated speech signal may be shifted relative to a decoded speech signal during a time-lag calculation according to an embodiment of the present invention;
Fig. 10A is a timeline showing a decoded speech signal that leads an extrapolated speech signal, and the associated effect of a re-encoding operation, according to an embodiment of the present invention;
Fig. 10B is a timeline showing a decoded speech signal that lags an extrapolated speech signal, and the associated effect of a re-encoding operation, according to an embodiment of the present invention;
Fig. 10C is a timeline showing an extrapolated speech signal and a decoded speech signal that are in phase at a frame boundary, and the associated effect of a re-encoding operation, according to an embodiment of the present invention;
Fig. 11 is a flowchart of a method for performing re-phasing of the internal states of the sub-band ADPCM decoders after a packet loss according to an embodiment of the present invention;
Fig. 12A is a diagram illustrating the application of time warping to a decoded speech signal that leads an extrapolated speech signal according to an embodiment of the present invention;
Figs. 12B and 12C are diagrams illustrating the application of time warping to a decoded speech signal that lags an extrapolated speech signal according to an embodiment of the present invention;
Fig. 13 is a flowchart of a method for performing time warping to shrink a signal along the time axis according to an embodiment of the present invention;
Fig. 14 is a flowchart of a method for performing time warping to stretch a signal along the time axis according to an embodiment of the present invention;
Fig. 15 is a block diagram of logic used in a decoder/PLC system to process received frames beyond a predetermined number of received frames after a packet loss according to an embodiment of the present invention;
Fig. 16 is a block diagram of logic used in a decoder/PLC system to perform waveform extrapolation to generate an output speech signal associated with a lost frame according to an embodiment of the present invention;
Fig. 17 is a block diagram of logic used in a decoder/PLC system to update the states of the sub-band ADPCM decoders according to an embodiment of the present invention;
Fig. 18 is a block diagram of logic used in a decoder/PLC system to perform re-phasing and time warping according to an embodiment of the present invention;
Fig. 19 is a block diagram of logic used in a decoder/PLC system to perform constrained and controlled decoding of good frames received after a packet loss according to an embodiment of the present invention;
Fig. 20 is a block diagram of a simplified low-band ADPCM encoder used to update the internal state of a low-band ADPCM decoder during packet loss according to an embodiment of the present invention;
Fig. 21 is a block diagram of a simplified high-band ADPCM encoder used to update the internal state of a high-band ADPCM decoder during packet loss according to an embodiment of the present invention;
Figs. 22A, 22B and 22C are timelines illustrating the application of time warping to a decoded speech signal according to an embodiment of the present invention;
Fig. 23 is a block diagram of another decoder/PLC system according to an embodiment of the present invention;
Fig. 24 is a block diagram of a computer system in which embodiments of the present invention may be implemented.
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings. In the drawings, the leftmost digit(s) of a reference numeral identifies the drawing in which an element first appears.
Detailed Description
A. Introduction
Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. Other embodiments are possible, and modifications may be made to the exemplary embodiments within the spirit and scope of the invention. Therefore, the following detailed description is not meant to limit the invention. Rather, the scope of the invention is defined by the appended claims.
It will be apparent to persons skilled in the relevant art(s) that the present invention, as described below, may be implemented in hardware, software, a combination of hardware and software, and/or the entities illustrated in the figures. Any actual software code with specialized control hardware used to implement the present invention is not limiting of the invention. Thus, the operation and behavior of the present invention are described with the understanding that modifications and variations of the embodiments are possible given the level of detail presented herein.
It should be understood that, while the detailed description of the invention set forth herein refers to the processing of speech signals, the invention may also be used in connection with the processing of other types of audio signals. Therefore, the terms "speech" and "speech signal" are used herein purely for convenience of description and are not limiting. Persons skilled in the relevant art(s) will appreciate that such terms can be replaced with the more general terms "audio" and "audio signal." Furthermore, although speech and audio signals are described herein as being divided into frames, persons skilled in the relevant art(s) will appreciate that such signals may instead be divided into other discrete segments, including but not limited to sub-frames. Thus, the operations described herein as being performed on frames also encompass like operations performed on other segments of a speech or audio signal, such as sub-frames.
In addition, although the following description discusses the loss of frames of an audio signal transmitted over a packet network (referred to as packet loss), the present invention is not limited to packet loss concealment (PLC). For example, in wireless networks, frames of an audio signal may also be lost or erased due to channel impairments. This condition is referred to as "frame erasure." When it occurs, to avoid substantial degradation of output speech quality, the decoder in the wireless system needs to perform "frame erasure concealment" (FEC) in an attempt to conceal the quality degradation caused by the lost frames. For a PLC or FEC algorithm, packet loss and frame erasure amount to the same problem: certain transmitted frames are not available for decoding, so the PLC or FEC algorithm needs to generate a waveform to fill up the waveform gap corresponding to the lost frames and thereby conceal the quality degradation that the loss would otherwise cause. Because the terms FEC and PLC generally refer to the same kind of technique, they can be used interchangeably. Thus, for the sake of convenience, the term "packet loss concealment," or PLC, is used herein to refer to both.
B. Review of Sub-Band Predictive Coding
To facilitate a better understanding of the various embodiments of the present invention described in later sections, the basic principles of sub-band predictive coding are reviewed here. In general, a sub-band predictive coder splits an input speech signal into N sub-bands, where N ≥ 2. Without loss of generality, the two-band predictive coding system of the ITU-T G.722 coder is described here as an example. Persons skilled in the relevant art(s) can readily generalize this description to an N-band sub-band predictive coder.
Fig. 1 shows a simplified encoder structure 100 of the G.722 sub-band predictive coder. Encoder structure 100 includes a quadrature mirror filter (QMF) analysis filter bank 110, a low-band adaptive differential pulse code modulation (ADPCM) encoder 120, a high-band ADPCM encoder 130, and a bit-stream multiplexer 140. QMF analysis filter bank 110 splits the input speech signal into a low-band speech signal and a high-band speech signal. Low-band ADPCM encoder 120 encodes the low-band speech signal into a low-band bit stream. High-band ADPCM encoder 130 encodes the high-band speech signal into a high-band bit stream. Bit-stream multiplexer 140 multiplexes the low-band bit stream and the high-band bit stream into a single output bit stream. In the packet transmission applications discussed herein, this output bit stream is packaged into packets, which are then transmitted to a sub-band predictive decoder 200, as shown in Fig. 2.
As shown in Fig. 2, decoder 200 includes a bit-stream de-multiplexer 210, a low-band ADPCM decoder 220, a high-band ADPCM decoder 230, and a QMF synthesis filter bank 240. Bit-stream de-multiplexer 210 separates the input bit stream into the low-band bit stream and the high-band bit stream. Low-band ADPCM decoder 220 decodes the low-band bit stream into a decoded low-band speech signal. High-band ADPCM decoder 230 decodes the high-band bit stream into a decoded high-band speech signal. QMF synthesis filter bank 240 then combines the decoded low-band speech signal and the decoded high-band speech signal into the full-band output speech signal.
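For orientation only, the two-band QMF split and merge around the sub-band coders can be sketched as follows with a generic linear-phase prototype low-pass filter; the prototype design, the helper names and the example parameters are assumptions of this sketch and not the exact filter specified in the G.722 Recommendation.

```python
import numpy as np
from scipy.signal import firwin

def qmf_split(x, h):
    """Two-band QMF analysis: split a full-band signal into low-band and
    high-band signals at half the sampling rate."""
    n = np.arange(len(h))
    lo = np.convolve(x, h)[: len(x)]                # low-pass branch H0(z)
    hi = np.convolve(x, h * (-1.0) ** n)[: len(x)]  # high-pass branch H0(-z)
    return lo[::2], hi[::2]                         # decimate by 2

def qmf_merge(lo, hi, h):
    """Two-band QMF synthesis: upsample, filter and recombine the sub-bands
    (reconstruction is exact only up to the usual QMF approximation and delay)."""
    n = np.arange(len(h))
    up_lo = np.zeros(2 * len(lo)); up_lo[::2] = lo
    up_hi = np.zeros(2 * len(hi)); up_hi[::2] = hi
    y_lo = np.convolve(up_lo, h)[: len(up_lo)]
    y_hi = np.convolve(up_hi, h * (-1.0) ** n)[: len(up_hi)]
    return 2.0 * (y_lo - y_hi)                      # sign flip cancels aliasing

# Example with a generic half-band prototype (not the G.722 coefficients):
h = firwin(24, 0.5)
low_band, high_band = qmf_split(np.random.randn(160), h)
full_band = qmf_merge(low_band, high_band, h)
```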
Further details regarding the structure and operation of encoder 100 and decoder 200 can be found in ITU-T Recommendation G.722, which is incorporated herein by reference in its entirety.
C. Packet Loss Concealment for a Sub-Band Predictive Coder Based on Full-Band Speech Waveform Extrapolation
A high-quality PLC system and method in accordance with an embodiment of the present invention will now be described. A general overview of the system and method is provided in this section, while further details concerning a specific implementation of the system and method are described below in Section D. The example system and method are designed for an ITU-T Recommendation G.722 speech decoder. However, persons skilled in the relevant art(s) will appreciate that many of the concepts described with reference to this specific embodiment may be used to perform PLC in other types of sub-band predictive speech coders, as well as in other types of speech and audio coders.
As described in more detail herein, this embodiment performs PLC in the 16 kHz output domain of the G.722 speech decoder. Periodic waveform extrapolation is used to fill in the waveform associated with lost frames of the speech signal, wherein the extrapolated waveform is mixed with filtered noise according to the signal characteristics prior to the frame loss. To update the states of the sub-band ADPCM decoders, the extrapolated 16 kHz signal is passed through a QMF analysis filter bank to generate sub-band signals, and the sub-band signals are then processed by simplified sub-band ADPCM encoders. Additional processing takes place after each packet loss in order to provide a smooth transition from the extrapolated waveform associated with the lost frames to the normally-decoded waveform associated with the good frames received after the packet loss. Among other things, the states of the sub-band ADPCM decoders are phase-aligned with the first good frame received after the packet loss, and the normally-decoded waveform associated with the first good frame is time-warped to align it with the extrapolated waveform before the two are overlap-added, thereby achieving a smooth transition. For prolonged packet loss, the system and method gradually mute the output signal.
Fig. 3 is a high-level block diagram of a G.722 speech decoder 300 that implements this PLC functionality. Although the decoder/PLC system 300 described herein includes a G.722 decoder, persons skilled in the relevant art(s) will appreciate that many of the concepts described herein may generally be applied to any N-band sub-band predictive coding system. Similarly, the predictive coder for each sub-band need not be the ADPCM coder shown in Fig. 3; it may be any general predictive coder, and it may be either forward-adaptive or backward-adaptive.
As shown in Fig. 3, decoder/PLC system 300 includes a bit-stream de-multiplexer 310, a low-band ADPCM decoder 320, a high-band ADPCM decoder 330, a switch 336, a QMF synthesis filter bank 340, a full-band speech signal synthesizer 350, a sub-band ADPCM decoder state update module 360, and a decoding constraint and control module 370.
As used herein, the term "lost frame" or "bad frame" refers to a frame of the speech signal that is not received at decoder/PLC system 300 or that is otherwise deemed unsuitable for normal decoding operations. A "received frame" or "good frame" is a frame of the speech signal that is normally received at decoder/PLC system 300. The "current frame" is the frame currently being processed by decoder/PLC system 300 to produce an output speech signal, while a "previous frame" is a frame that was previously processed by decoder/PLC system 300 to produce an output speech signal. The terms "current frame" and "previous frame" may be used to refer both to received frames and to lost frames for which PLC operations are being performed.
The manner in which decoder/PLC system 300 operates will now be described with reference to flowchart 400 of Fig. 4. As shown in Fig. 4, the method of flowchart 400 begins at step 402, in which decoder/PLC system 300 determines the frame type of the current frame. Decoder/PLC system 300 distinguishes between six different types of frames, denoted type 1 through type 6. Fig. 5 provides a timeline 500 illustrating the different frame types. A type 1 frame is any received frame beyond the eighth received frame after a packet loss. A type 2 frame is either of the first two lost frames associated with a packet loss. A type 3 frame is any of the third through sixth lost frames associated with a packet loss. A type 4 frame is any lost frame beyond the sixth lost frame associated with a packet loss. A type 5 frame is the received frame immediately following a packet loss. Finally, a type 6 frame is any of the second through eighth received frames following a packet loss. Persons skilled in the relevant art(s) will readily appreciate that other schemes for classifying frame types may be used in accordance with alternative embodiments of the present invention. For example, in a system with a different frame size, the frame counts associated with each frame type may differ from those set forth above. Likewise, for a different codec (that is, a codec other than G.722), the frame counts associated with each frame type may differ.
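Purely as an illustration of the classification just described, the following sketch maps a frame's position relative to a packet loss to its type; the function name and its counter arguments are assumptions introduced for this sketch.

```python
def classify_frame(is_received, lost_count, good_count):
    """Return the frame type (1-6) used by the decoder/PLC system.

    lost_count -- 1-based index of the current frame within the packet loss
    good_count -- 1-based index of the current frame among the good frames
                  received since the last packet loss
    """
    if not is_received:                 # lost (bad) frame
        if lost_count <= 2:
            return 2                    # first or second lost frame
        if lost_count <= 6:
            return 3                    # third through sixth lost frame
        return 4                        # beyond the sixth lost frame
    # received (good) frame
    if good_count == 1:
        return 5                        # first good frame after the loss
    if good_count <= 8:
        return 6                        # second through eighth good frame
    return 1                            # beyond the eighth good frame
```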
The manner in which decoder/PLC system 300 processes the current frame to produce an output speech signal is determined by the frame type of the current frame. This is reflected in Fig. 4 by the series of decision steps 404, 406, 408 and 410. Specifically, if it is determined at step 402 that the current frame is a type 1 frame, then a first sequence of processing steps is performed to produce the output speech signal, as shown at decision step 404. If it is determined at step 402 that the current frame is a type 2, type 3 or type 4 frame, then a second sequence of processing steps is performed to produce the output speech signal, as shown at decision step 406. If it is determined at step 402 that the current frame is a type 5 frame, then a third sequence of processing steps is performed to produce the output speech signal, as shown at decision step 408. Finally, if it is determined at step 402 that the current frame is a type 6 frame, then a fourth sequence of processing steps is performed to produce the output speech signal, as shown at decision step 410. The processing steps associated with each of the different frame types are described below.
After the appropriate sequence of processing steps has been performed, it is determined at decision step 430 whether there are additional frames to process. If there are additional frames to process, processing returns to step 402. If there are no additional frames to process, processing ends, as shown at step 432.
1. Processing of Type 1 Frames
As shown at step 412 of flowchart 400, if the current frame is a type 1 frame, then decoder/PLC system 300 performs normal G.722 decoding of the current frame. In this case, blocks 310, 320, 330 and 340 of decoder/PLC system 300 perform exactly the same functions as their counterpart blocks 210, 220, 230 and 240 of the conventional G.722 decoder 200, respectively. Specifically, bit-stream de-multiplexer 310 separates the input bit stream into a low-band bit stream and a high-band bit stream. Low-band ADPCM decoder 320 decodes the low-band bit stream into a decoded low-band speech signal. High-band ADPCM decoder 330 decodes the high-band bit stream into a decoded high-band speech signal. QMF synthesis filter bank 340 then recombines the decoded low-band speech signal and the decoded high-band speech signal into the full-band speech signal. During the processing of type 1 frames, switch 336 is connected to the upper position labeled "type 1," so that the output signal of QMF synthesis filter bank 340 is used as the final output speech signal of decoder/PLC system 300 for type 1 frames.
After step 412 is complete, decoder/PLC system 300 updates various state memories and performs some processing to facilitate the PLC operations that may be performed for a subsequent lost frame, as shown at step 414. The state memories include PLC-related low-band ADPCM decoder state memory, PLC-related high-band ADPCM decoder state memory, and PLC-related full-band state memory. As part of this step, full-band speech signal synthesizer 350 stores the output signal of QMF synthesis filter bank 340 in an internal signal buffer in preparation for possible speech waveform extrapolation during the processing of subsequent lost frames. Sub-band ADPCM decoder state update module 360 and decoding constraint and control module 370 are inactive during the processing of type 1 frames. Further details concerning type 1 frame processing are provided below in Section D with reference to a specific implementation of decoder/PLC system 300.
2. Processing of Type 2, Type 3 and Type 4 Frames
During the processing of type 2, type 3 and type 4 frames, the input bit stream associated with the lost frames is unavailable. Consequently, blocks 310, 320, 330 and 340 cannot perform their usual functions and are inactive. Instead, switch 336 is connected to the lower position labeled "types 2-6," and full-band speech signal synthesizer 350 becomes active to synthesize the output speech signal of decoder/PLC system 300. Full-band speech signal synthesizer 350 synthesizes the output speech signal of decoder/PLC system 300 by extrapolating the previously-stored output speech signal associated with the last few received frames before the packet loss. This is reflected at step 416 of flowchart 400.
After full-band speech signal synthesizer 350 has completed its waveform synthesis task, sub-band ADPCM decoder state update module 360 properly updates the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330 in preparation for a possible good frame in the next frame, as shown at step 418. The manner in which steps 416 and 418 are performed will now be described in more detail.
a. Waveform Extrapolation
There are many prior-art techniques for performing the waveform extrapolation function of step 416. The technique used by the implementation of decoder/PLC system 300 described below in Section D is a modified version of the technique described in U.S. Patent Application No. 11/234,291 to Chen, filed September 26, 2005, entitled "Packet Loss Concealment for Block-Independent Speech Codecs." A high-level description of this technique is provided here, while further details are presented in Section D.
To perform the waveform extrapolation function, full-band speech signal synthesizer 350 analyzes the output speech signal from QMF synthesis filter bank 340 that was stored during the processing of received frames in order to extract a pitch period, short-term predictor coefficients and long-term predictor coefficients. These parameters are then stored for subsequent use.
Full-band speech signal synthesizer 350 extracts the pitch period by performing a two-stage search. In the first stage, a low-resolution pitch period (or "coarse pitch") is determined by performing a search on a decimated version of the input speech signal or a filtered version of it. In the second stage, the coarse pitch is refined to normal resolution by searching the neighborhood of the coarse pitch using the undecimated signal. Such a two-stage search requires significantly lower computational complexity than a single-stage full search at the undecimated sampling rate. Before the speech signal or its filtered version is decimated, it normally needs to pass through an anti-aliasing low-pass filter. To reduce complexity, a common prior-art approach is to use a low-order infinite impulse response (IIR) filter, such as an elliptic filter. However, the poles of a good low-order IIR filter are usually very close to the unit circle, so the filtering operation corresponding to the all-pole section of the filter requires double-precision arithmetic when implemented in 16-bit fixed-point arithmetic.
In contrast with this prior-art approach, full-band speech signal synthesizer 350 uses a finite impulse response (FIR) filter as the anti-aliasing low-pass filter. By using an FIR filter in this manner, only single-precision 16-bit fixed-point arithmetic is needed, and the FIR filter can operate at the lower sampling rate of the decimated signal. This approach can therefore significantly reduce the computational complexity of the anti-aliasing low-pass filtering. For example, in the implementation of decoder/PLC system 300 described below in Section D, the undecimated signal has a sampling rate of 16 kHz, while the decimated signal used for pitch extraction has a sampling rate of only 2 kHz. Under the prior-art approach, a 4th-order elliptic filter could be used. The all-pole section of the elliptic filter requires double-precision fixed-point arithmetic and needs to operate at the 16 kHz sampling rate. For this reason, even though the all-zero section can operate at the 2 kHz sampling rate, the complete 4th-order elliptic filter plus the down-sampling operation requires a computational complexity of 0.66 WMOPS (weighted million operations per second). In contrast, even if a relatively high-order 60th-order FIR filter is used in place of the 4th-order elliptic filter, because the 60th-order FIR filter operates at the very low 2 kHz sampling rate, the complete 60th-order FIR filter plus the down-sampling operation requires a complexity of only 0.18 WMOPS, a 73% reduction relative to the 4th-order elliptic filter.
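A minimal sketch of this FIR-decimated coarse pitch search is given below; the filter design call, the 8:1 decimation from 16 kHz to 2 kHz, the lag range and the normalized-correlation measure are illustrative assumptions, not the exact procedure of the Section D implementation.

```python
import numpy as np
from scipy.signal import firwin

def coarse_pitch(x_16k, min_lag=40, max_lag=265, decim=8, taps=61):
    """Stage-1 pitch search on an FIR-decimated (2 kHz) copy of the signal."""
    # FIR anti-aliasing low-pass; only the retained (decimated) output samples
    # are needed, so the filtering effectively runs at the 2 kHz rate.
    h = firwin(taps, cutoff=1.0 / decim)
    x_2k = np.convolve(x_16k, h)[taps // 2 :: decim]
    lo, hi = max(min_lag // decim, 2), max_lag // decim
    best_lag, best = lo, -1.0
    for lag in range(lo, hi + 1):       # normalized-correlation pitch measure
        a, b = x_2k[lag:], x_2k[:-lag]
        score = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-9)
        if score > best:
            best, best_lag = score, lag
    return best_lag * decim             # coarse pitch in 16 kHz samples
```

The second-stage refinement would then search a small neighborhood of this coarse value directly on the undecimated 16 kHz signal.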
At the beginning of the first lost frame of a packet loss, full-band speech signal synthesizer 350 uses the cascaded long-term synthesis filter and short-term synthesis filter to generate a signal, referred to as the "ringing signal," which is the output produced when the input to the cascaded synthesis filters is set to zero. Full-band speech signal synthesizer 350 then analyzes certain signal parameters, such as the pitch prediction gain and the normalized autocorrelation, to determine the degree of "voicing" in the stored output speech signal. If the previous output speech signal is highly voiced, the speech signal is extrapolated in a periodic manner to generate a replacement waveform for the current bad frame. The periodic waveform extrapolation is performed using a refined version of the pitch period extracted from the most recently received frames. If the previous output speech signal is unvoiced or noise-like, scaled random noise is passed through the short-term synthesis filter to generate a replacement signal for the current bad frame. If the degree of voicing lies between the two extremes, the two components are mixed in proportion to the degree of voicing. The extrapolated signal is then overlap-added with the ringing signal to ensure that there is no waveform discontinuity at the beginning of the first bad frame of the packet loss. In addition, the waveform extrapolation is extended beyond the end of the current bad frame by a period at least equal to the overlap-add length, so that the extra samples of the extrapolated signal can be used as the ringing signal for the overlap-add at the beginning of the next frame.
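The periodic extrapolation and its overlap-add with the ringing signal can be sketched roughly as follows; the raised-cosine window, the white-noise stand-in for the short-term-filtered noise, and the buffer layout are assumptions made for illustration.

```python
import numpy as np

def extrapolate_frame(history, ringing, pitch, voicing, frame_len, ola_len):
    """Synthesize one lost frame plus ola_len extra samples for the next frame."""
    total = frame_len + ola_len
    buf = np.concatenate([history, np.zeros(total)])
    n0 = len(history)
    for i in range(total):                      # repeat the last pitch cycle
        buf[n0 + i] = buf[n0 + i - pitch]
    out = buf[n0:]
    # Mix with noise according to the degree of voicing (0 = unvoiced, 1 = voiced);
    # plain scaled white noise stands in here for the short-term-filtered noise.
    noise = np.random.randn(total) * np.std(history[-frame_len:])
    out = voicing * out + (1.0 - voicing) * noise
    # Overlap-add with the ringing signal so the frame start has no discontinuity.
    w = 0.5 - 0.5 * np.cos(np.pi * np.arange(ola_len) / ola_len)   # fade-in ramp
    out[:ola_len] = w * out[:ola_len] + (1.0 - w) * ringing[:ola_len]
    return out[:frame_len], out[frame_len:]     # frame output, ringing for next frame
```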
For a bad frame that is not the first bad frame of a packet loss (that is, a type 3 or type 4 frame), the operation of full-band speech signal synthesizer 350 is essentially the same as that described in the preceding paragraph, except that full-band speech signal synthesizer 350 does not need to calculate a ringing signal. Instead, it can use, as the ringing signal for the overlap-add operation, the extra samples of the extrapolated signal that were calculated in the previous frame beyond the end of the previous frame, thereby ensuring that there is no waveform discontinuity at the beginning of the current frame.
For the case of prolonged packet loss, full-band speech signal synthesizer 350 gradually mutes the output speech signal of decoder/PLC system 300. For example, in the implementation of the decoder/PLC system described in Section D, the output speech signal generated during a packet loss is attenuated, or "ramped down," to zero in a linear fashion, starting at 20 ms into the packet loss and reaching zero at 60 ms. This is done because the uncertainty regarding the shape and form of the "real" waveform increases with time. In practice, many PLC schemes start to produce buzzy output when the extrapolated segment extends much beyond about 60 ms.
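A sketch of such a linear ramp-down is shown below, using the 20 ms and 60 ms points from the example above; the sampling-rate argument and the per-sample application are assumptions of the sketch.

```python
def mute_gain(t_ms, start_ms=20.0, end_ms=60.0):
    """Linear attenuation gain as a function of time elapsed within the loss."""
    if t_ms <= start_ms:
        return 1.0
    if t_ms >= end_ms:
        return 0.0
    return (end_ms - t_ms) / (end_ms - start_ms)

def apply_fade(frame, samples_into_loss, fs=16000):
    """Scale each sample of an extrapolated frame by the ramp-down gain."""
    return [x * mute_gain((samples_into_loss + i) * 1000.0 / fs)
            for i, x in enumerate(frame)]
```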
In an alternative embodiment of the present invention, directed at PLC in background noise, the level of the ambient background noise is tracked, and for long frame erasures the output is attenuated to that level rather than to zero. This eliminates the disruptive effect that would otherwise be produced in background noise by a PLC system that mutes its output during packet loss.
A further alternative embodiment of the present invention addresses the foregoing problem of PLC in background noise by implementing a comfort noise generation (CNG) function. When this embodiment of the invention begins to attenuate the output speech signal of decoder/PLC system 300 in response to a prolonged packet loss, it also begins to mix in comfort noise produced by the CNG function. By mixing in, and eventually substituting, comfort noise as the output speech signal of decoder/PLC system 300 is attenuated and finally muted, the disruptive effect described above is eliminated and a faithful reproduction of the ambient environment of the signal is provided. This approach has proven to be widely accepted in other applications. For example, in a sub-band acoustic echo canceller (SBAEC), or in an acoustic echo canceller (AEC) in general, when residual echo is detected the signal is attenuated and replaced with comfort noise. This is commonly referred to as non-linear processing (NLP). The premise of this embodiment of the invention is that PLC presents a very similar scenario. As with AEC, this approach provides a much more pleasant experience for the listener, one that is far less disruptive.
b. Updating the Internal States of the Low-Band and High-Band ADPCM Decoders
After full-band speech signal synthesizer 350 has completed the waveform synthesis task performed at step 416, sub-band ADPCM decoder state update module 360 properly updates the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330 at step 418, in preparation for a possible good frame in the next frame. There are many ways in which the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330 can be updated. Because the G.722 encoder of Fig. 1 and the G.722 decoder of Fig. 2 have the same kinds of internal states, one straightforward way to update the internal states of decoders 320 and 330 is to feed the output signal of full-band speech signal synthesizer 350 through the standard G.722 encoder shown in Fig. 1, starting from the internal states left behind by the last sample of the previous frame. Then, after the extrapolated speech signal for the current bad frame has been encoded, the internal states left behind by the last sample of the current bad frame are used to update the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330.
The foregoing approach, however, carries the full complexity of the two sub-band encoders. To save complexity, the implementation of decoder/PLC system 300 described in Section D uses an approximation of this approach. For the high-band ADPCM encoder, it was recognized that the high-band adaptive quantizer step size ΔH(n) is not needed in the first received frame after a packet loss. Instead, the quantizer step size is reset to a moving average of its value before the packet loss (as described elsewhere in this application). Therefore, the adaptive predictor in the high-band ADPCM encoder is updated using the unquantized difference signal (or prediction error signal) eH(n), and the quantization of eH(n) is avoided entirely.
For the low-band ADPCM encoder, the scheme is slightly different. Because of the importance of preserving the pitch modulation of the low-band adaptive quantizer step size ΔL(n), the implementation of decoder/PLC system 300 described below in Section D continues to update this parameter during lost frames. The standard G.722 low-band ADPCM encoder applies a 6-bit quantization to the difference signal (or prediction error signal) eL(n). However, according to the G.722 standard, only a subset of 8 of the quantizer magnitude indices is used to update the low-band adaptive quantizer step size ΔL(n). By using the unquantized difference signal eL(n) in place of the quantized difference signal for the adaptive prediction update in the low-band ADPCM encoder, while still updating the low-band adaptive quantizer step size ΔL(n) in the same manner, the embodiment described in Section D can use a much less complex quantization of the difference signal.
Persons skilled in the relevant art(s) will readily appreciate that wherever the high-band adaptive quantizer step size ΔH(n) is referred to in this application, the high-band adaptive quantizer step size may be replaced by the high-band log scale factor ∇H(n). Likewise, wherever the low-band adaptive quantizer step size ΔL(n) is referred to in this application, the low-band adaptive quantizer step size may be replaced by the low-band log scale factor ∇L(n).
Compared with the standard G.722 sub-band ADPCM encoders, another difference in the low-band and high-band ADPCM encoders used by the embodiment of Section D is an adaptive reset of the encoders based on signal properties and on the duration of the packet loss. This function will now be described.
As noted above, for long packet losses, full-band speech signal synthesizer 350 mutes the output speech waveform after a predetermined time. In the implementation of decoder/PLC system 300 described below in Section D, the output signal from full-band speech signal synthesizer 350 is passed through the G.722 QMF analysis filter bank to obtain the sub-band signals used to update the internal states of low-band ADPCM decoder 320 and high-band ADPCM decoder 330 during lost frames. Consequently, once the output signal from full-band speech signal synthesizer 350 has been attenuated to zero, the sub-band signals used to update the internal states of the sub-band ADPCM decoders also become zero. A sustained zero input can cause the adaptive predictors in the decoders to diverge from the adaptive predictors in the encoders, because it causes the predictor sections to keep adapting artificially in the same direction. This is most evident in a conventional high-band ADPCM decoder, which commonly produces high-frequency chirping when good frames are processed after a long packet loss. For a conventional low-band ADPCM decoder, this problem can occasionally cause unnatural energy increases because the predictor attains too high a filter gain.
In view of the foregoing, the implementation of decoder/PLC system 300 described below in Section D resets the sub-band ADPCM decoders once the PLC output waveform has been muted to zero. This approach almost completely eliminates the high-frequency chirping after long frame erasures. The uncertainty of the synthesized waveform generated by full-band speech signal synthesizer 350 increases as the duration of the packet loss increases, which suggests that, at some point, there is little benefit in continuing to use it to update sub-band ADPCM decoders 320 and 330.
However, even when sub-band ADPCM decoders 320 and 330 are reset once the output of full-band speech signal synthesizer 350 has been fully muted, some problems remain in the form of occasional chirping (from high-band ADPCM decoder 330) and occasional unnatural energy increases (from low-band ADPCM decoder 320). These problems are addressed in the implementation described in Section D by making the reset of each sub-band ADPCM decoder adaptive. A reset still takes place when the waveform has been muted, but one or both of sub-band ADPCM decoders 320 and 330 may also be reset earlier.
As will be described in Section D, the decision to reset early is based on monitoring, during bad frames (that is, while sub-band ADPCM decoders 320 and 330 are being updated based on the output signal of the full-band speech signal synthesizer), certain properties of the signals that control the adaptation of the pole sections of the adaptive predictors of sub-band ADPCM decoders 320 and 330. For low-band ADPCM decoder 320, the partially reconstructed signal pLt(n) drives the adaptation of the all-pole filter section, while the partially reconstructed signal pH(n) drives the adaptation of the all-pole filter section of high-band ADPCM decoder 330. In essence, each of these signals is monitored during the lost frames to determine whether it is largely constant over a 10 ms interval, or is predominantly positive or negative during the current lost frames. It should be noted that in the implementation described in Section D the adaptive reset is restricted to occur only after 30 ms of packet loss.
3. Processing of Type 5 and Type 6 Frames
When type 5 and type 6 frames are processed, the input bit stream associated with the current frame is once again available, and blocks 310, 320, 330 and 340 are therefore active again. However, the decoding operations performed by low-band ADPCM decoder 320 and high-band ADPCM decoder 330 are constrained and controlled by decoding constraint and control module 370 in order to reduce artifacts and distortion at the transition from lost frames to received frames, thereby improving the performance of decoder/PLC system 300 after a packet loss. For type 5 frames this is reflected at step 420 of flowchart 400, and for type 6 frames it is reflected at step 426.
For type 5 frames, additional modifications are made to the output speech signal to ensure a smooth transition between the synthesized signal generated by full-band speech signal synthesizer 350 and the output signal generated by QMF synthesis filter bank 340. Thus, the output signal of QMF synthesis filter bank 340 is not used directly as the output speech signal of decoder/PLC system 300. Instead, full-band speech signal synthesizer 350 modifies the output of QMF synthesis filter bank 340 and uses the modified version as the output speech signal of decoder/PLC system 300. Consequently, when type 5 or type 6 frames are processed, switch 336 remains connected to the lower position labeled "types 2-6" so that the output speech signal is taken from full-band speech signal synthesizer 350.
In this regard, if there is a misalignment between the synthesized signal generated by full-band speech signal synthesizer 350 and the output signal generated by QMF synthesis filter bank 340, the operations performed by full-band speech signal synthesizer 350 include time warping and re-phasing. The performance of these operations is shown at step 422 of flowchart 400 and will be described in more detail below.
Also for type 5 frames, the output speech signal generated by full-band speech signal synthesizer 350 is overlap-added with the ringing signal from the previously-processed lost frame. This is done to ensure a smooth transition from the synthesized waveform associated with the previous frame to the output waveform associated with the current type 5 frame. The performance of this step is shown at step 424 of flowchart 400.
After the output speech signal has been generated for a type 5 or type 6 frame, decoder/PLC system 300 updates various state memories and performs some processing to facilitate the PLC operations performed for subsequent lost frames, in a like manner to step 414, as shown at step 428.
a. Constraint and Control of Sub-Band ADPCM Decoding
As noted above, decoding constraint and control module 370 constrains and controls the decoding operations performed by low-band ADPCM decoder 320 and high-band ADPCM decoder 330 during the processing of type 5 and type 6 frames in order to improve the performance of decoder/PLC system 300 after a packet loss. The various constraints and controls applied by decoding constraint and control module 370 will now be described. Further details regarding these constraints and controls are provided below in Section D with reference to a specific implementation of decoder/PLC system 300.
i. Setting of the Adaptive Quantizer Step Size for the High-Band ADPCM Decoder
For type 5 frames, decoding constraint and control module 370 sets the adaptive quantizer step size ΔH(n) used by high-band ADPCM decoder 330 to a moving average of its values associated with the good frames received before the packet loss. By reducing the audible energy drops otherwise caused by packet loss in segments of background noise, this improves the performance of decoder/PLC system 300 in background noise.
ii. Setting of the Adaptive Quantizer Step Size for the Low-Band ADPCM Decoder
For type 5 frames, decoding constraint and control module 370 applies an adaptive strategy for setting the adaptive quantizer step size ΔL(n) for low-band ADPCM decoder 320. In an alternative embodiment, this method may also be used for high-band ADPCM decoder 330. As described in the preceding sub-section, for high-band ADPCM decoder 330 it is beneficial to the performance of decoder/PLC system 300 in background noise to set the adaptive quantizer step size ΔH(n) in the first good frame to the moving average of its values before the packet loss. However, applying the same method to low-band ADPCM decoder 320 occasionally produces large unnatural energy increases in voiced speech. This is because in voiced speech ΔL(n) is modulated by the pitch period, so setting ΔL(n) to the moving average before the frame loss can cause a very large abnormal increase in ΔL(n) in the first good frame after the packet loss.
Therefore, in cases where ΔL(n) is modulated by the pitch period, it is preferable to use the ΔL(n) obtained from sub-band ADPCM decoder state update module 360 rather than the moving average of ΔL(n) before the packet loss. Recall that sub-band ADPCM decoder state update module 360 updates low-band ADPCM decoder 320 by passing the output signal of full-band speech signal synthesizer 350 through the G.722 QMF analysis filter bank to obtain a low-band signal. If full-band speech signal synthesizer 350 is doing its job well, which is likely for voiced speech, then the signal used to update low-band ADPCM decoder 320 is likely to closely match the signal seen at the encoder, and the parameter ΔL(n) is therefore also likely to be very close to the step size at the encoder. For voiced speech, this approach is preferable to setting ΔL(n) to the moving average of ΔL(n) before the packet loss.
In view of the foregoing, decoding constraint and control module 370 adopts an adaptive strategy for setting ΔL(n) in the first good frame after a packet loss. If the speech signal before the packet loss is fairly stationary, such as stationary background noise, then ΔL(n) is set to the moving average of ΔL(n) before the packet loss. If, however, the speech signal before the packet loss exhibits variations in ΔL(n), for example because it is deemed to be voiced speech, then ΔL(n) is set to the value obtained by updating the low-band ADPCM decoder based on the output of full-band speech signal synthesizer 350. For intermediate cases, ΔL(n) is set to a linear weighting between these two values based on the variation of ΔL(n) before the packet loss.
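A rough sketch of this adaptive choice follows; the stability measure (the relative spread of recent step-size values) and its two thresholds are assumptions introduced for illustration, not values from the Section D implementation.

```python
def set_low_band_step(delta_moving_avg, delta_from_update, recent_deltas,
                      stable_thresh=0.1, voiced_thresh=0.5):
    """Choose the low-band step size Delta_L(n) for the first good frame."""
    # Measure how much Delta_L(n) varied before the loss; pitch modulation in
    # voiced speech shows up as a large relative spread.
    spread = (max(recent_deltas) - min(recent_deltas)) / (
        sum(recent_deltas) / len(recent_deltas) + 1e-9)
    if spread <= stable_thresh:          # stationary (e.g. background noise)
        return delta_moving_avg
    if spread >= voiced_thresh:          # strongly pitch-modulated (voiced)
        return delta_from_update
    # Intermediate case: linear weighting between the two values.
    w = (spread - stable_thresh) / (voiced_thresh - stable_thresh)
    return (1.0 - w) * delta_moving_avg + w * delta_from_update
```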
iii. Adaptive Low-Pass Filtering of the Adaptive Quantizer Step Size for the High-Band ADPCM Decoder
During the processing of the first few good frames after a packet loss (type 5 and type 6 frames), in order to reduce the risk of local fluctuations producing an overly strong high-frequency content (due to the temporary loss of synchronization between the G.722 encoder and the G.722 decoder), decoding constraint and control module 370 controls the adaptive quantizer step size ΔH(n) of the high-band ADPCM decoder. Such fluctuations would otherwise produce a high-frequency wavering effect, which is in effect a mild form of chirping. Therefore, during the first few good frames, an adaptive low-pass filter is applied to the high-band quantizer step size ΔH(n). The degree of low-pass filtering is phased out in a quadratic fashion over the adaptation period so as to provide a smooth transition back to normal adaptation. For segments in which the speech signal before the packet loss is highly stationary, the duration of this period is longer (80 ms in the implementation of decoder/PLC system 300 described below in Section D). For cases in which the speech signal before the packet loss is less stationary, the duration is somewhat shorter (40 ms in the implementation of decoder/PLC system 300 described below in Section D), and for unstable segments no low-pass filtering is applied at all.
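One possible form of such a smoothing, phased out quadratically over the chosen period, is sketched below; the first-order smoother and the exact ramp shape are assumptions of this sketch.

```python
def smoothed_high_band_step(delta_h, delta_h_prev_smooth, t_ms, period_ms):
    """Low-pass filter Delta_H(n) with a smoothing weight that decays
    quadratically to zero over period_ms (e.g. 40 or 80 ms)."""
    if period_ms <= 0.0 or t_ms >= period_ms:
        return delta_h                   # back to normal, unfiltered adaptation
    alpha = (1.0 - t_ms / period_ms) ** 2    # quadratic phase-out of smoothing
    return alpha * delta_h_prev_smooth + (1.0 - alpha) * delta_h
```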
iv. Adaptive Safety Margin on the All-Pole Filter Section During the First Few Good Frames
Because some deviation between the G.722 decoder and the G.722 encoder during and after a packet loss is unavoidable, decoding constraint and control module 370 imposes certain constraints on the adaptive predictor of low-band ADPCM decoder 320 during the first few good frames after a packet loss (type 5 and type 6 frames). According to the G.722 standard, the encoders and decoders by default enforce a minimum "safety" margin of 1/16 on the pole section of the sub-band predictors. It has been found, however, that the all-pole section of the two-pole, six-zero predictive filter of the low-band ADPCM decoder frequently causes abnormal energy increases after a packet loss. This is typically perceived as a waveform "pop." Evidently, the packet loss results in a reduced safety margin, which corresponds to an all-pole filter section with a higher gain that produces a waveform of very high energy.
By adaptively enforcing a stricter constraint on the all-pole filter section of the adaptive predictor of low-band ADPCM decoder 320, decoding constraint and control module 370 greatly reduces such abnormal energy increases after a packet loss. An increased minimum safety margin is enforced during the first few good frames after a packet loss and is gradually reduced to the standard G.722 minimum safety margin. In addition, a moving average of the safety margin before the packet loss is monitored, and the increased minimum safety margin during the first few good frames after the packet loss is controlled so that it does not exceed this moving average.
v. DC Removal on the Internal Signals of the High-Band ADPCM Decoder
During the first few good frames after a packet loss (type 5 and type 6 frames), it has been observed that the G.722 decoder often produces a very annoying, pronounced high-frequency chirping distortion. This distortion originates from the high-band ADPCM decoder having lost synchronization with the high-band ADPCM encoder as a result of the packet loss, thereby producing a biased prediction. The loss of synchronization that causes the chirping distortion manifests itself in the input signal to the adaptation of the pole predictor, pH(n), and in the reconstructed high-band signal rH(n), having a constant sign over extended periods of time. This causes the pole section of the predictor to drift, because the adaptation is sign-based and therefore keeps updating in the same direction.
To avoid this problem, decoding constraint and control module 370 adds a DC-removal operation to these signals by replacing the signals pH(n) and rH(n) with high-pass filtered versions pH,HP(n) and rH,HP(n), respectively, during the first few good frames after a packet loss. This has been found to eliminate the chirping entirely. The DC removal is implemented as a subtraction of the respective moving averages of pH(n) and rH(n). These moving averages are updated continuously, during both good frames and bad frames. In the implementation of decoder/PLC system 300 described below in Section D, this substitution takes place during the first 40 ms after a packet loss.
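A minimal sketch of moving-average-based DC removal of this kind is shown below; the exponential update constant is an assumption, and the actual moving-average update is specified by the Section D implementation.

```python
class DcRemover:
    """Subtract a running mean so the filtered signal has (near) zero DC."""
    def __init__(self, beta=1.0 / 256.0):
        self.beta = beta        # moving-average update constant (assumed)
        self.mean = 0.0

    def update_mean(self, x):
        # The moving average keeps being updated in good and bad frames alike.
        self.mean += self.beta * (x - self.mean)
        return self.mean

    def process(self, x, substitute=True):
        m = self.update_mean(x)
        # Substitute the high-pass filtered value only during the first
        # good frames after a loss (e.g. the first ~40 ms).
        return x - m if substitute else x
```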
B. Re-phasing and time warping
As introduced above, at step 422 of flowchart 400, if a misalignment exists between the synthesized speech signal produced by full-band speech signal synthesizer 350 during the packet loss and the speech signal produced by QMF synthesis filter bank 340 and full-band speech signal synthesizer 350 during the initial received frames after the packet loss, techniques referred to as "re-phasing" and "time warping" are performed.
As described above, when processing lost frames, if the decoded speech signal associated with the frames received before the packet loss is nearly periodic, such as a voiced speech signal, full-band speech signal synthesizer 350 extrapolates the speech waveform on the basis of the pitch period. Also as described above, this waveform extrapolation is continued beyond the end of the last lost frame in order to obtain additional samples for overlap-adding with the speech signal associated with the next frame, thereby ensuring a smooth transition and avoiding any discontinuity. However, the actual pitch period of the decoded speech signal generally does not follow the pitch track used while performing the waveform extrapolation during the lost frames. As a result, the extrapolated speech signal will generally not match the decoded speech signal associated with the first good frame exactly.
This is illustrated in Fig. 6, which shows a timeline 600 of the amplitude of the decoded speech signal 602 before the packet loss and during the first received frame after the packet loss (for convenience, the decoded speech signal during the lost frames is also shown, although it should be understood that decoder/PLC system 300 cannot decode this portion of the original signal), and of the amplitude of the extrapolated speech signal 604 produced during the lost frames and during the first received frame after the packet loss. As shown in Fig. 6, the two signals are out of phase in the first received frame.
This out-of-phase condition causes two problems in decoder/PLC system 300. First, as can be seen from Fig. 6, in the first received frame after the packet loss, the decoded speech signal 602 and the extrapolated speech signal 604 in the overlap-add region are out of phase and partially cancel each other, resulting in an audible artifact. Second, the state memories associated with sub-band ADPCM decoders 320 and 330 exhibit a certain degree of pitch modulation and are therefore sensitive to the phase of the speech signal. This problem is especially pronounced if the speech signal is near a pitch pulse, i.e., near a portion of the voiced speech signal where the signal level rises and falls sharply. Because sub-band ADPCM decoders 320 and 330 are sensitive to the phase of the speech signal, and because the extrapolated speech signal 604 is used to update the state memories of these decoders during the packet loss (as described above), the phase difference between the extrapolated speech signal 604 and the decoded speech signal 602 will produce significant artifacts in the received frames after the packet loss, owing to the mismatch between the internal states of the sub-band ADPCM encoders and decoders.
As described in more detail below, time warping is used to address the first problem of destructive interference in the overlap-add region. Specifically, time warping is used to stretch or shrink the time axis of the decoded speech signal associated with the first received frame after the packet loss so as to align it with the extrapolated speech signal used to conceal the last lost frame. Although time warping is described here with reference to a sub-band predictive coder with memory, it is a general technique that can also be applied to other coders, including but not limited to coders with and without memory, predictive and non-predictive coders, and sub-band and full-band coders.
Also as described in more detail below, re-phasing is used to address the second problem, namely the mismatch of the internal states of the sub-band ADPCM encoders and decoders caused by the misalignment between the last lost frame and the first received frame after the packet loss. Re-phasing is the process of setting the internal states of sub-band ADPCM decoders 320 and 330 to the point in time where the extrapolated speech waveform is in phase with the last input signal sample immediately before the first received frame after the packet loss. Although re-phasing is described below in the context of a backward-adaptive system, it can also be used to perform PLC in forward-adaptive predictive coders, or in any coder with memory.
i. Time lag calculation
Both the re-phasing and the time warping techniques require the calculation of the number of samples by which the extrapolated speech signal and the decoded speech signal associated with the first received frame after the packet loss are misaligned. This misalignment is termed the "lag" and is labeled in Fig. 6. It can be thought of as the number of samples by which the decoded speech signal lags the extrapolated speech signal. In the case of Fig. 6, the lag is negative.
One general method of performing the time lag calculation is shown in flowchart 700 of Fig. 7, although other methods may also be used. One particular manner of performing this method is described in Section D below.
As shown in Fig. 7, the method of flowchart 700 begins at step 702, in which the speech waveform generated by full-band speech signal synthesizer 350 during the last lost frame is extrapolated into the first received frame after the packet loss.
At step 704, the time lag is calculated. Conceptually, the time lag is calculated by maximizing the correlation between the extrapolated speech signal and the decoded speech signal associated with the first received frame after the packet loss. As shown in Fig. 9, the extrapolated speech signal (denoted 904) is shifted over the range from -MAXOS to +MAXOS relative to the decoded speech signal associated with the first received frame (denoted 902), where MAXOS denotes the maximum offset, and the shift that maximizes the correlation is taken as the time lag. This can be accomplished by searching for the peak of the normalized cross-correlation function R(k) over a time lag range of ±MAXOS around zero:
R(k) = \frac{\sum_{i=0}^{LSW-1} es(i-k)\, x(i)}{\sqrt{\sum_{i=0}^{LSW-1} es^{2}(i-k)\, \sum_{i=0}^{LSW-1} x^{2}(i)}}, \qquad k = -MAXOS, \ldots, MAXOS \qquad (1)
where es is the extrapolated speech signal, x is the decoded speech signal associated with the first received frame after the packet loss, MAXOS is the maximum offset allowed, LSW is the lag search window length, and i = 0 denotes the first sample in the lag search window. The time lag that maximizes this function corresponds to the relative time offset between the two waveforms.
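As a rough illustration of the lag search of equation (1), the following C sketch (floating-point, with illustrative names; not the fixed-point routine of Section D) shifts the extrapolated signal over ±MAXOS samples and returns the shift that maximizes the normalized cross-correlation.

```c
/* Sketch of the time-lag search of equation (1): returns the shift k in
 * [-max_os, +max_os] that maximizes the normalized cross-correlation between
 * the extrapolated speech es[] and the decoded speech x[] of the first
 * received frame.  es must be valid for indices i - k for all i in [0, lsw)
 * and k in [-max_os, max_os].  If no positive correlation is found, the lag
 * defaults to 0.  Names and floating-point arithmetic are illustrative.    */
static int find_time_lag(const double *es, const double *x,
                         int lsw /* lag search window length */, int max_os)
{
    double best_num2 = -1.0, best_den = 1.0;   /* track R(k)^2 as a ratio   */
    int best_k = 0;

    for (int k = -max_os; k <= max_os; k++) {
        double num = 0.0, e_es = 0.0, e_x = 0.0;
        for (int i = 0; i < lsw; i++) {
            num  += es[i - k] * x[i];
            e_es += es[i - k] * es[i - k];
            e_x  += x[i] * x[i];
        }
        double den = e_es * e_x;
        if (den <= 0.0 || num <= 0.0)
            continue;
        /* compare num^2/den against best_num2/best_den without division    */
        if (num * num * best_den > best_num2 * den) {
            best_num2 = num * num;
            best_den  = den;
            best_k    = k;
        }
    }
    return best_k;   /* positive: decoded speech lags the extrapolation     */
}
```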
In one embodiment, the number of samples over which the correlation is computed (also referred to as the lag search window) is determined in an adaptive manner based on the pitch period. For example, in the embodiment described in Section D below, the window size in samples (at the 16 kHz sampling rate) used for the coarse lag search is given by:
[Equation (2), which defines the lag search window length LSW as a function of the pitch period, appears as an image in the original publication and is not reproduced here.]
where ppfe is the pitch period. The equation uses the floor function: the floor function ⌊x⌋ of a real number x returns the largest integer less than or equal to x.
If the time lag calculated at step 704 is zero, this indicates that the extrapolated speech signal and the decoded speech signal associated with the first received frame are in phase, whereas a positive value indicates that the decoded speech signal associated with the first received frame lags (is delayed relative to) the extrapolated speech signal, and a negative value indicates that the decoded speech signal associated with the first received frame leads the extrapolated speech signal. If the time lag equals zero, re-phasing and time warping need not be performed. In the example implementation presented in Section D below, the time lag is also set to zero if the last received frame before the packet loss is unvoiced (as indicated by a degree of voicing calculated for that frame, as described above in connection with the processing of type 2, type 3 and type 4 frames), or if the first received frame after the packet loss is unvoiced.
To minimize the complexity of the correlation computation, the lag search can be performed using a multi-stage process. Flowchart 800 of Fig. 8 illustrates such a method, in which a coarse time lag search is first performed at step 802 using down-sampled representations of the signals, and a refined time lag search is then performed at step 804 using a higher sampling rate representation of the signals. For example, the signals may be down-sampled to 4 kHz for the coarse time lag search, while signals at 8 kHz are used for the refined time lag search. To further reduce complexity, the down-sampling can be carried out by simply sub-sampling the signals and ignoring any aliasing effects.
One issue is which signal to correlate with the extrapolated speech signal within the first received frame. A "brute force" approach is to fully decode the first received frame to obtain the decoded speech signal and then compute the correlation at 16 kHz. To decode the first received frame, the internal states of sub-band ADPCM decoders 320 and 330 obtained by re-encoding the extrapolated speech signal (as described above) up to the frame boundary could be used. However, because the re-phasing algorithm described below provides a better set of states for sub-band ADPCM decoders 320 and 330, the G.722 decoding would then have to be re-run. Since this approach performs the complete decoding operation twice, it is very wasteful in terms of computational complexity. To address this problem, embodiments of the present invention implement a lower-complexity method.
According to the lower-complexity method, the G.722 bit stream received in the first received frame is only partially decoded to obtain the low-band quantized difference signal d_Lt(n). In normal G.722 decoding, the bits received from bit-stream de-multiplexer 310 are converted by sub-band ADPCM decoders 320 and 330 into the difference signals d_Lt(n) and d_H(n); these two signals are scaled by backward-adaptive scale factors and passed through backward-adaptive pole-zero predictors to obtain the sub-band speech signals, which are then combined by QMF synthesis filter bank 340 to produce the output speech signal. At each sample of this process, the coefficients of the adaptive predictors in sub-band ADPCM decoders 320 and 330 are updated. This update accounts for a significant portion of the decoder complexity. Since the signal is needed only for the time lag calculation, in the lower-complexity method the coefficients of the two-pole, six-zero predictive filter are kept fixed (they are not updated sample by sample). Moreover, because the lag is governed by the pitch, and the fundamental frequency of human speech is below 4 kHz, only a low-band approximation signal r_L(n) is obtained. More details on this method are provided in Section D below.
In the embodiment described in Section D below, the fixed filter coefficients of the two-pole, six-zero predictive filter are those obtained from re-encoding the extrapolated waveform during the packet loss, up to the end of the last lost frame. In an alternative implementation, the fixed filter coefficients could be those in use at the end of the last received frame before the packet loss. In yet another alternative implementation, one or the other of these coefficient sets could be selected in an adaptive manner based on the characteristics of the speech signal or on other criteria.
ii. Re-phasing
In the re-phasing process, the internal states of sub-band ADPCM decoders 320 and 330 are adjusted to take account of the time lag between the extrapolated speech waveform and the decoded speech waveform associated with the first received frame after the packet loss. As described previously, before the first received frame is processed, the internal states of sub-band ADPCM decoders 320 and 330 are estimated by re-encoding the output speech signal synthesized by full-band speech signal synthesizer 350 during the last lost frame. The internal states of these decoders exhibit a certain degree of pitch modulation. Consequently, if the pitch period used during the waveform extrapolation associated with the last lost frame exactly followed the pitch track of the decoded speech signal, then stopping the re-encoding at the boundary between the last lost frame and the first received frame would leave the states of sub-band ADPCM decoders 320 and 330 in phase with the original signal. However, as noted above, the pitch used for the extrapolation generally does not match the pitch track of the decoded speech signal, and at the beginning of the first received frame after the packet loss the extrapolated speech signal and the decoded speech signal are not aligned.
To overcome this problem, re-phasing uses the time lag to control where the re-encoding process is stopped. In the example of Fig. 6, the time lag between extrapolated speech signal 604 and decoded speech signal 602 is negative. Let this time lag be denoted lag. It can then be seen that if the extrapolated speech signal is re-encoded -lag samples beyond the frame boundary, the re-encoding will stop at a point where the phase of extrapolated speech signal 604 is consistent with the phase of decoded speech signal 602 at the frame boundary. The resulting state memories of sub-band ADPCM decoders 320 and 330 will then be in phase with the received data in the first good frame, thereby providing a better decoded signal. Accordingly, the number of samples of the sub-band reconstructed signals to be re-encoded is:
N = FS - lag (3)
where FS is the frame size and all parameters are in units of the sub-band sampling rate (8 kHz).
Figures 10A, 10B and 10C illustrate three re-phasing scenarios. On timeline 1000 of Fig. 10A, the decoded speech signal 1002 leads the extrapolated speech signal 1004, so the re-encoding extends -lag samples beyond the frame boundary. On timeline 1010 of Fig. 10B, the decoded speech signal 1012 lags the extrapolated speech signal 1014, and the re-encoding stops lag samples before the frame boundary. On timeline 1020 of Fig. 10C, the extrapolated speech signal 1024 is in phase with the decoded speech signal 1022 at the frame boundary (even though the pitch track during the lost frames is different), and the re-encoding stops at the frame boundary. Note that, for convenience, Figs. 10A, 10B and 10C all show the decoded speech signal during the lost frames, but it should be understood that decoder 300 cannot decode this portion of the original signal.
If re-phasing of the internal states of sub-band ADPCM decoders 320 and 330 were not performed, the re-encoding used to update those internal states could be carried out entirely during the processing of the lost frames. However, because the lag is not known until the first received frame after the packet loss arrives, the re-encoding cannot be completed entirely during the lost frames. A straightforward solution to this problem is to store the entire extrapolated waveform used to replace the last lost frame and then perform the re-encoding during the first received frame. However, this requires memory to store FS+MAXOS samples, and the complexity of the re-encoding also falls entirely within the first received frame.
Figure 11 is a flowchart 1100 of a method of performing the re-encoding in a manner that redistributes a large part of the computation to the preceding lost frame. Since MAXOS << FS, this is reasonable and feasible from a computational load-balancing point of view.
As shown in Fig. 11, the method of flowchart 1100 begins at step 1102, in which re-encoding is performed during the lost frame up to the frame boundary, and the internal states of sub-band ADPCM decoders 320 and 330 at the frame boundary are stored. In addition, the intermediate internal states after re-encoding FS-MAXOS samples are also stored, as shown at step 1104. At step 1106, the waveform extrapolation generated for re-encoding samples FS-MAXOS+1 through FS+MAXOS is saved in memory. At step 1108, in the first received frame after the packet loss, the internal states stored at the frame boundary are used as the initial states for the approximate sub-band decoding (used to determine lag, as described above). Then, at decision step 1110, it is determined whether lag is positive or negative. If lag is positive, the internal states stored at sample FS-MAXOS are restored and MAXOS-lag samples are re-encoded, as shown at step 1112. Conversely, if lag is negative, the internal states at the frame boundary are used and an additional |lag| samples are re-encoded. With this method, at most MAXOS samples are re-encoded in the first received frame.
Persons skilled in the relevant art will appreciate that storing more G.722 states along the way during the re-encoding process in the lost frame can further reduce the amount of re-encoding in the first good frame. In the extreme case, the G.722 state could be stored for every sample between FRAMESIZE-MAXOS and FRAMESIZE+MAXOS, and no re-encoding would need to be performed in the first received frame at all.
Compared with the method of flowchart 1100, in an alternative method that requires more re-encoding in the first received frame, the re-encoding during the lost frame is performed only up to sample FS-MAXOS. The internal states of sub-band ADPCM decoders 320 and 330 at that point and the remaining 2*MAXOS samples are stored in memory for use in the first received frame. In the first received frame, the lag is computed, and based on this lag an appropriate number of samples are re-encoded starting from the stored G.722 state. This method requires storage of the 2*MAXOS reconstructed samples and one copy of the G.722 state, and the re-encoding of at most 2*MAXOS samples in the first received frame. A drawback of this alternative method is that the internal states of sub-band ADPCM decoders 320 and 330 at the frame boundary, which are used for the lower-complexity decoding and time lag calculation described above, are not stored.
Ideally, the lag should correspond to the phase offset at the frame boundary between the extrapolated speech signal and the decoded speech signal associated with the first received frame. According to one embodiment of the present invention, a coarse lag estimate is calculated using a relatively long lag search window whose center may not coincide with the frame boundary. For example, the lag search window may be 1.5 times the pitch period. The lag search range (that is, the number of samples by which the extrapolated speech signal is offset relative to the original speech signal) is also relatively wide (i.e., ±28 samples). To improve the alignment, a lag search refinement is then performed. As part of the lag search refinement, the search window is moved so that it begins at the first sample of the first received frame. This can be accomplished by offsetting the extrapolated speech signal by the coarse lag estimate. The lag search window used in the refinement can be smaller, and the lag search range can also be smaller (i.e., ±4 samples). The search method can be the same as that described in Section 3.b.i above.
The concept of re-phasing has been presented above in the context of the backward-adaptive predictive G.722 coder. The concept extends easily to other backward-adaptive predictive coders, such as G.726. However, the use of re-phasing is not limited to backward-adaptive predictive coders. On the contrary, most memory-based coders exhibit phase dependence in their state memories and would therefore benefit from re-phasing.
iii. Time warping
As used herein, the term time warping refers to the process of stretching or shrinking a signal along the time axis. As discussed elsewhere herein, in order to maintain a continuous signal, embodiments of the present invention merge the extrapolated speech signal used to replace the lost frames and the decoded speech signal associated with the first received frame after the packet loss so as to avoid a discontinuity. This is done by performing an overlap-add between the two signals. However, if the signals are out of phase with each other, waveform cancellation may occur and produce an audible artifact, as in the overlap-add region of Fig. 6. In this region, performing an overlap-add between the negative-going portion of decoded speech signal 602 and extrapolated speech signal 604 would result in significant waveform cancellation.
In accordance with embodiments of the present invention, the decoded speech signal associated with the first received frame after the packet loss is time-warped so that the decoded speech signal and the extrapolated speech signal are phase-aligned at some point within the first received frame. The amount of time warping is controlled by the value of the time lag. Thus, in one embodiment, if the time lag is positive, the decoded speech signal associated with the first received frame is stretched, and the overlap-add region can be placed at the very beginning of the first received frame. If, however, the time lag is negative, the decoded speech signal is compressed, and the overlap-add region is positioned |lag| samples into the first received frame.
In the case of G.722, some of the samples at the beginning of the first received frame after the packet loss are not reliable, because the internal states of sub-band ADPCM decoders 320 and 330 are not accurate at the beginning of the frame. Therefore, in an embodiment of the present invention, when time warping is applied to the decoded speech signal associated with the first received frame, the first MIN_UNSTBL samples of the first received frame are not included in the overlap-add region. For example, in the embodiment described in Section D below, MIN_UNSTBL is set to 16, i.e., the first 1 ms of a 160-sample, 10 ms frame. In this region, the extrapolated speech signal can be used as the output speech signal of decoder/PLC system 300. This embodiment effectively accounts for the re-convergence time of the decoded speech signal in the first received frame.
Figures 12A, 12B and 12C show several examples of this concept. In the example of Fig. 12A, timeline 1200 shows the decoded speech signal leading the extrapolated signal in the first received frame. The decoded speech signal is therefore shrunk by -lag samples via time warping (the time lag lag is negative). The result after applying the time warping is shown in timeline 1210. As shown in timeline 1210, the signals are in phase at or near the center of the overlap-add region. In this case, the overlap-add region is centered at MIN_UNSTBL - lag + OLA/2, where OLA is the number of samples in the overlap-add region. In the example of Fig. 12B, timeline 1220 shows the decoded speech signal lagging the extrapolated signal in the first received frame. The decoded speech signal is therefore stretched by lag samples via time warping to achieve the alignment. The result of applying the time warping is shown in timeline 1230. In this case, MIN_UNSTBL > lag and there is still an unstable region within the first received frame. In the example of Fig. 12C, timeline 1240 again shows the decoded speech signal lagging the extrapolated signal, so the decoded speech signal is stretched, and the time-warped result is shown in timeline 1250. However, as shown in timeline 1250, because MIN_UNSTBL ≤ lag, the overlap-add region can begin at the first sample of the first received frame.
The "in-phase point" between the decoded speech signal and the extrapolated signal should ideally lie in the middle of the overlap-add region, and the overlap-add region should be placed as close to the beginning of the first received frame as possible. This reduces the length of time over which the synthesized speech signal associated with the last lost frame must be extrapolated into the first received frame. In one embodiment of the invention, this is achieved by performing the time lag estimation in two stages. In the first stage, a coarse lag estimate is calculated using a relatively long lag search window whose center may not coincide with the center of the overlap-add region. For example, the lag search window may be 1.5 times the pitch period. The lag search range (that is, the number of samples by which the extrapolated speech signal is offset relative to the original speech signal) is also relatively wide (i.e., ±28 samples). To improve the alignment, a lag search refinement is then performed. As part of the lag search refinement, the lag search window is positioned so that its center coincides with the center of the expected overlap-add region obtained from the coarse lag estimate. This can be accomplished by offsetting the extrapolated speech signal by the coarse lag estimate. The lag search window used in the refinement can be smaller (for example, the size of the overlap-add region), and the lag search range can also be smaller (i.e., ±4 samples). The search method can be the same as that described in Section 3.b.i above.
Many techniques exist for performing time warping; one such technique comprises piece-wise single-sample shifts and overlap-adds. Flowchart 1300 of Fig. 13 describes a method of shrinking a signal using this technique. According to this method, a sample is periodically dropped, as shown at step 1302. From the point at which the sample is dropped, the original signal and the signal shifted to the left (as a result of the drop) are overlap-added, as shown at step 1304. Flowchart 1400 of Fig. 14 describes a method of stretching a signal using this technique. According to this method, a sample is periodically repeated, as shown at step 1402. From the point at which the sample is repeated, the original signal and the signal shifted to the right (as a result of the repetition) are overlap-added, as shown at step 1404. The length of the overlap-add window used in these operations depends on the periodicity of the sample addition/drop. To avoid excessive smoothing of the signal, a maximum overlap-add period (e.g., 8 samples) can be defined. The period at which samples are added or dropped depends on various factors, such as the frame size, the number of samples to be added or dropped, and whether stretching or shrinking is being performed.
The amount of time warping can be limited. For example, in the G.722 system described in Section D below, the amount of time warping is limited to ±1.75 ms (or 28 samples of a 160-sample, 10 ms frame) for a 10 ms frame. Warping by more than this amount could still eliminate the destructive interference described above, but would usually introduce other audible distortions. Accordingly, in that embodiment, if the time lag exceeds this range, no time warping is performed.
The system described in Section D below is designed to guarantee zero sample delay after the first received frame following a packet loss. For this reason, that system does not apply time warping to decoded speech beyond the first received frame. This, in turn, limits the amount of time warping that can be applied without incurring the audible distortion described in the preceding paragraph. However, persons skilled in the relevant art will appreciate that in a system that can tolerate some sample delay (after the first received frame following a packet loss), the time warping could be applied to decoded speech beyond the first good frame, so that larger time lags could be accommodated without audible distortion. Of course, in such a system, if the frame following the first received frame is lost, the time warping could only be applied to the decoded speech signal associated with the first good frame. Such alternative embodiments are also within the scope and spirit of the present invention.
In an alternative embodiment of the present invention, both the decoded speech signal and the extrapolated speech signal can be time-warped. For several reasons, this approach can provide better performance.
For example, if the time lag is -20, the method described above shrinks the decoded speech signal by 20 samples. In other words, 20 samples of the extrapolated speech signal would need to be generated for use in the first received frame. This number can be reduced by also shrinking the extrapolated speech signal. For example, the extrapolated speech signal could be shrunk by 4 samples, leaving 16 samples of shrinking for the decoded speech signal. This reduces the number of extrapolated samples that must be used in the first received frame, and also reduces the amount of warping that must be applied to the decoded speech signal. As noted above, in the embodiment of Section D, the time warping has to be limited to 28 samples. Reducing the amount of time warping required to align the signals means that less distortion is introduced in the time warping process, and it increases the number of cases that can be improved.
By time-warping both the decoded speech signal and the extrapolated speech signal, a better waveform match in the overlap-add region should also be obtained. This can be explained as follows: if the lag is -20 samples, as in the previous example, then the decoded speech signal leads the extrapolated signal by 20 samples. The most likely cause of this situation is that the pitch period used for the extrapolation was larger than the actual pitch. By also shrinking the extrapolated speech signal, the effective pitch of that signal in the overlap-add region becomes smaller and closer to the actual pitch period. Likewise, because the original (decoded) signal is shrunk by less, its effective pitch period is larger than it would be if it alone were shrunk. The two waveforms in the overlap-add region therefore have more closely matched pitch periods, and consequently the waveforms match better.
If the lag is positive, the decoded speech signal is stretched. In this case, although stretching the extrapolated signal as well would increase the number of extrapolated samples used in the first received frame, it is not clear whether this yields an improvement. However, if there is a long packet loss and the two waveforms are significantly out of phase, this approach can provide improved performance. For example, if the lag is 30 samples, then because it exceeds the 28-sample limit, no warping is performed in the method described above, and a warp of 30 samples would most likely introduce distortion of its own. However, if these 30 samples are distributed between the two signals, for example by stretching the extrapolated speech signal by 10 samples and the decoded speech signal by 20 samples, then the signals can be aligned without applying an excessive amount of time warping to either one.
D. Details of an example implementation in a G.722 decoder
This section provides specific details relating to a particular implementation of the present invention in an ITU-T G.722 speech decoder. The example implementation operates on an intrinsic frame size of 10 milliseconds (ms), and can also operate on packets or frames whose size is any multiple of 10 ms. A longer input frame is treated as a super frame, for which the PLC logic is called an appropriate number of times at its intrinsic 10 ms frame size. Compared with conventional G.722 decoding at the same frame size, it introduces no additional delay. These implementation details are provided by way of example only, and the content presented below is not intended to limit the present invention.
The embodiment described in this section meets the same complexity requirements as the PLC algorithm described in G.722 Appendix IV, but provides significantly better speech quality than the PLC algorithm described in that appendix. Because of its high quality, the embodiment described in this section is suitable for general G.722 applications in which frame erasures or packet losses occur. Such applications include, for example, Voice over Internet Protocol (VoIP), Voice over Wireless Fidelity (WiFi), and Digital Enhanced Cordless Telecommunications (DECT) of the next generation. The embodiment described in this section is readily applicable, except in applications that leave essentially no complexity headroom beyond that of carrying out a basic G.722 decoder without PLC.
1. Abbreviations and conventions
Table 1 lists some of the abbreviations used in this section.
Abbreviation   Description
ADPCM          Adaptive Differential PCM
ANSI           American National Standards Institute
dB             Decibel
DECT           Digital Enhanced Cordless Telecommunications
DC             Direct Current
FIR            Finite Impulse Response
Hz             Hertz
LPC            Linear Predictive Coding
OLA            Overlap-Add
PCM            Pulse Code Modulation
PLC            Packet Loss Concealment
PWE            Periodic Waveform Extrapolation
STL2005        Software Tool Library 2005
QMF            Quadrature Mirror Filter
VoIP           Voice over Internet Protocol
WB             Wideband
WiFi           Wireless Fidelity
Table 1: Abbreviations
The description of the invention also uses certain conventions, some of which are explained here. The PLC algorithm operates at an intrinsic frame size of 10 ms, and hence the algorithm is described only in terms of 10 ms frames. For larger packets (multiples of 10 ms), the received packet is decoded in 10 ms segments. The discrete time index of signals at the 16 kHz sampling rate level is generally denoted by "j" or "i". The discrete time index of signals at the 8 kHz sampling level is generally denoted by "n". Low-band signals (0-4 kHz) are identified by a subscript "L", and high-band signals (4-8 kHz) are identified by a subscript "H". Wherever possible, this description reuses the conventions of ITU-T Recommendation G.722.
The most frequently used symbols and their descriptions are listed in Table 2 below.
[The entries of Table 2 appear as images in the original publication and are not reproduced here.]
Table 2: Frequently used symbols and their descriptions
2. General description of the PLC algorithm
As described above with reference to Fig. 5, decoder/PLC system 300 processes frames of six types: type 1, type 2, type 3, type 4, type 5 and type 6. A type 1 frame is any received frame beyond the eighth received frame after a packet loss. A type 2 frame is either the first or the second lost frame associated with a packet loss. A type 3 frame is any one of the third through sixth lost frames associated with a packet loss. A type 4 frame is any lost frame beyond the sixth that is associated with a packet loss. A type 5 frame is the received frame that immediately follows a packet loss. Finally, a type 6 frame is any one of the second through eighth received frames that follow a packet loss. The PLC algorithm described in this section operates on a constant frame size of 10 ms in duration.
Type 1 frames are decoded in accordance with the G.722 standard, with the addition of maintaining certain state memory to facilitate the PLC and the associated processing. Figure 15 is a block diagram 1500 of the logic that performs these operations in accordance with an embodiment of the present invention. Specifically, as shown in Fig. 15, when a type 1 frame is processed, the indices I_L(n) of the low-band ADPCM encoder are received from a bit de-multiplexer (not shown in Fig. 15) and decoded by low-band ADPCM decoder 1510 to produce a sub-band speech signal. Similarly, the indices I_H(n) of the high-band ADPCM encoder are received from the bit de-multiplexer and decoded by high-band ADPCM decoder 1520 to produce a sub-band speech signal. QMF synthesis filter bank 1530 combines the low-band speech signal and the high-band speech signal to produce the decoded output signal x_out(j). These operations are consistent with standard G.722 decoding.
In addition to these standard G.722 decoding operations, when a type 1 frame is processed, logic block 1540 updates the PLC-related low-band ADPCM state memory, logic block 1550 updates the PLC-related high-band ADPCM state memory, and logic block 1560 updates the state memory related to the WB PCM PLC. These state memory updates serve to facilitate the PLC processing associated with the other frame types.
For frames of type 2, type 3 and type 4, wideband (WB) PCM PLC is performed in the 16 kHz output speech domain. A block diagram 1600 of the logic used to perform the WB PCM PLC is provided in Fig. 16. The prior output speech x_out(j) of the G.722 decoder is buffered and passed to the WB PCM PLC logic. The WB PCM PLC algorithm is based on periodic waveform extrapolation (PWE), and pitch estimation is an important component of the WB PCM PLC logic. First, a coarse pitch is estimated based on a signal down-sampled (to 2 kHz) in the weighted speech domain. Subsequently, this estimate is refined at full resolution using the original 16 kHz sampling. The output x_PLC(i) of the WB PCM PLC logic is a linear combination of the periodically extrapolated waveform and shaped noise controlled by the PLC. For sustained frame erasures, the output waveform x_PLC(i) is gradually attenuated. The attenuation begins 20 ms after the start of the frame loss and is complete 60 ms after the start of the loss.
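The gradual attenuation of the extrapolated waveform can be illustrated by the following gain function; the linear fade shape is an assumption, since the text above only fixes the 20 ms and 60 ms end points.

```c
/* Illustrative gain applied to the extrapolated waveform x_PLC(i) as a
 * function of the time elapsed since the start of the frame loss, at a
 * 16 kHz sampling rate: unity for the first 20 ms, then a fade to zero at
 * 60 ms.  The linear shape is assumed, not taken from the text above.      */
static double plc_attenuation_gain(int samples_into_loss)
{
    const int start = 20 * 16;   /* 20 ms at 16 kHz */
    const int end   = 60 * 16;   /* 60 ms at 16 kHz */
    if (samples_into_loss <= start) return 1.0;
    if (samples_into_loss >= end)   return 0.0;
    return (double)(end - samples_into_loss) / (double)(end - start);
}
```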
As shown in block diagram 1700 of Fig. 17, for frames of type 2, type 3 and type 4, the output x_PLC(i) of the WB PCM PLC logic is passed through G.722 QMF analysis filter bank 1702 to obtain the corresponding sub-band signals, which are subsequently fed to modified low-band ADPCM encoder 1704 and modified high-band ADPCM encoder 1706, respectively, in order to update the states and memories of the decoders. Only partial, simplified sub-band ADPCM encoders are used for this update.
The processing carried out by the logic shown in Figs. 16 and 17 takes place during lost frames. Modified low-band ADPCM encoder 1704 and modified high-band ADPCM encoder 1706 are both simplified in order to reduce complexity. They are described in more detail elsewhere in this application. One feature present in encoders 1704 and 1706 (which does not exist in conventional G.722 sub-band ADPCM encoders) is an adaptive reset of the encoders based on signal properties and on the duration of the packet loss.
The most complex processing associated with the PLC algorithm is that for type 5 frames, a type 5 frame being the first received frame immediately following a packet loss. It is during this frame that the transition from the extrapolated waveform to the normally decoded waveform takes place. The techniques used when processing a type 5 frame include re-phasing and time warping, which are described in more detail herein. Figure 18 provides a block diagram 1800 of the logic used to carry out these techniques. Furthermore, when a type 5 frame is processed, the QMF synthesis filter bank in the decoder is updated in a manner described in more detail herein. Another function associated with the processing of type 5 frames involves setting the low-band and high-band logarithmic scale factors at the beginning of the first received frame after the packet loss.
Frames of type 5 and type 6 are both decoded using sub-band ADPCM decoders that are modified and constrained as described herein. Figure 19 depicts a block diagram 1900 of the logic used to process frames of type 5 and type 6. As shown in Fig. 19, logic 1970 imposes constraints and control on sub-band ADPCM decoders 1910 and 1920 when frames of type 5 and/or type 6 are processed. The constraints and control on the sub-band ADPCM decoders are applied during the 80 ms following a packet loss. Some of them do not extend beyond 40 ms, while others are adaptive in duration or in degree. The constraint and control mechanisms are described in more detail in this application. As shown in Fig. 19, logic blocks 1940, 1950 and 1960 are used to update the state memory after a frame of type 5 or type 6 has been processed.
Under error-free channel conditions, the PLC algorithm described in this section is bit-exact with G.722. Furthermore, under error conditions, the algorithm is identical to G.722 beyond the eighth frame after a packet loss, and in the absence of bit errors it should converge toward the error-free G.722 output.
The PLC algorithm described in this section supports any frame size that is a multiple of 10 ms. For packet sizes larger than 10 ms, the PLC algorithm is simply called multiple times per packet, at 10 ms intervals. Accordingly, in the remainder of this section, the PLC algorithm is described in terms of a constant frame size of 10 ms.
3. Waveform extrapolation of the G.722 output
For lost frames corresponding to a packet loss (frames of type 2, type 3 and type 4), the WB PCM PLC logic depicted in Fig. 16 extrapolates the G.722 output waveform associated with the previous frames to generate a replacement waveform for the current frame. This extrapolated wideband signal waveform x_PLC(i) is then used as the output waveform of the G.722 PLC logic when frames of type 2, type 3 and type 4 are processed. For convenience in describing the various blocks of Fig. 16, after the WB PCM PLC logic has calculated the signal x_PLC(i) for a lost frame, the signal x_PLC(i) is written into and stored in the x_out(j) buffer, where x_out(j) is the final output of the overall G.722 decoder/PLC system. Each processing block of Fig. 16 is now described in more detail.
a. Eighth-order LPC analysis
Block 1604 performs an eighth-order LPC analysis near the end of the frame processing loop, after the signal x_out(j) associated with the current frame has been calculated and stored in the buffer. This eighth-order LPC analysis is an autocorrelation LPC analysis with a 10 ms asymmetric analysis window applied to the x_out(j) signal associated with the current frame. The asymmetric window is defined as follows:
w(j) = \begin{cases} \frac{1}{2}\left[1 - \cos\left(\frac{(j+1)\pi}{121}\right)\right], & j = 0, 1, 2, \ldots, 119 \\ \cos\left(\frac{(j-120)\pi}{80}\right), & j = 120, 121, \ldots, 159 \end{cases} \qquad (4)
Let x_out(0), x_out(1), ..., x_out(159) denote the wideband output signal samples of the G.722 decoder/PLC system that are associated with the current frame. The windowing operation is performed as follows:
x_w(j) = x_out(j) w(j), j = 0, 1, 2, ..., 159. (5)
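A simple C sketch of the window of equation (4) and the windowing of equation (5) is given below (floating-point, illustrative only).

```c
/* Generate the 160-sample asymmetric LPC analysis window of equation (4)
 * and apply it to the current frame as in equation (5).                    */
#include <math.h>

static void lpc_window(const double *x_out, double *xw)
{
    const double pi = 3.14159265358979323846;
    for (int j = 0; j < 160; j++) {
        double w;
        if (j < 120)
            w = 0.5 * (1.0 - cos((j + 1) * pi / 121.0));
        else
            w = cos((j - 120) * pi / 80.0);
        xw[j] = x_out[j] * w;
    }
}
```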
Next, the autocorrelation coefficients are calculated as follows:
r(i) = \sum_{j=i}^{159} x_w(j)\, x_w(j-i), \quad i = 0, 1, 2, \ldots, 8. \qquad (6)
Spectral smoothing and white-noise correction operations are then applied to the autocorrelation coefficients, as follows:
\hat{r}(i) = \begin{cases} 1.0001 \times r(0), & i = 0 \\ r(i)\, e^{-\frac{1}{2}\left(2\pi i \sigma / f_s\right)^{2}}, & i = 1, 2, \ldots, 8 \end{cases} \qquad (7)
where f_s = 16000 is the sampling rate of the input signal and σ = 40.
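The autocorrelation and the spectral smoothing/white-noise correction of equations (6) and (7) can be sketched as follows (floating-point; the Section D implementation itself is fixed-point).

```c
#include <math.h>

/* Equations (6) and (7): autocorrelation of the windowed frame xw[0..159],
 * followed by white-noise correction and Gaussian spectral smoothing
 * (sigma = 40 Hz, fs = 16000 Hz).                                          */
static void autocorr_smooth(const double *xw, double *r_hat /* size 9 */)
{
    const double pi = 3.14159265358979323846;
    const double fs = 16000.0, sigma = 40.0;
    double r[9];

    for (int i = 0; i <= 8; i++) {
        r[i] = 0.0;
        for (int j = i; j <= 159; j++)
            r[i] += xw[j] * xw[j - i];
    }
    r_hat[0] = 1.0001 * r[0];
    for (int i = 1; i <= 8; i++) {
        double a = 2.0 * pi * i * sigma / fs;
        r_hat[i] = r[i] * exp(-0.5 * a * a);
    }
}
```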
Next, a Levinson-Durbin recursion is used to convert the autocorrelation coefficients r̂(i) into the LPC predictor coefficients â_i, i = 0, 1, ..., 8. If the Levinson-Durbin recursion exits prematurely before it is completed (for example, because the prediction residual energy E(i) is less than zero), then the short-term predictor coefficients associated with the most recently processed frame are also used in the current frame. To handle such exceptions, initial values of the â_i array are needed; they are set to â_0 = 1 and â_i = 0 for i = 1, 2, ..., 8. The Levinson-Durbin recursion algorithm is specified as follows:
1. If r̂(0) ≤ 0, use the â_i array of the last frame and exit the Levinson-Durbin recursion.
2. E(0) = r̂(0)
3. k_1 = -r̂(1)/r̂(0)
4. â_1^(1) = k_1
5. E(1) = (1 - k_1^2) E(0)
6. If E(1) ≤ 0, use the â_i array of the last frame and exit the Levinson-Durbin recursion.
7. For i = 2, 3, 4, ..., 8, perform the following operations:
a. k_i = [ -r̂(i) - Σ_{j=1}^{i-1} â_j^(i-1) r̂(i-j) ] / E(i-1)
b. â_i^(i) = k_i
c. â_j^(i) = â_j^(i-1) + k_i â_{i-j}^(i-1), for j = 1, 2, ..., i-1
d. E(i) = (1 - k_i^2) E(i-1)
e. If E(i) ≤ 0, use the â_i array of the last frame and exit the Levinson-Durbin recursion.
If the recursion exits prematurely, the â_i array of the previously processed frame is used. If the recursion completes successfully (the normal case), the LPC predictor coefficients are taken as:
â_0 = 1 (8)
and
â_i = â_i^(8), for i = 1, 2, ..., 8. (9)
By applying bandwidth expansion to the coefficients obtained above, the final set of LPC predictor coefficients is obtained as:
a_i = (0.96852)^i â_i, i = 0, 1, ..., 8. (10)
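A floating-point sketch of the recursion above, including the early-exit handling and the bandwidth expansion of equation (10), might look as follows (illustrative only; the caller is assumed to initialize the previous-frame coefficient array to 1, 0, ..., 0 before the first frame).

```c
#include <math.h>
#include <string.h>

/* Levinson-Durbin recursion with early exit, followed by the bandwidth
 * expansion of equation (10).  a_hat[] holds the previous frame's a-hat
 * coefficients on entry (reused on premature exit) and the new ones on
 * normal completion; a[] receives the final expanded coefficients a_0..a_8.
 * Returns 0 on normal completion, -1 if the last frame's set was reused.   */
static int levinson_bw_expand(const double *r_hat, double *a_hat, double *a)
{
    double E = 0.0, anew[9], tmp[9];
    int failed = 0;

    anew[0] = 1.0;
    if (r_hat[0] <= 0.0) {
        failed = 1;
    } else {
        E = r_hat[0];
        for (int i = 1; i <= 8; i++) {
            double k = -r_hat[i];
            for (int j = 1; j < i; j++)
                k -= anew[j] * r_hat[i - j];
            k /= E;
            memcpy(tmp, anew, sizeof(tmp));   /* keep the order-(i-1) set   */
            anew[i] = k;
            for (int j = 1; j < i; j++)
                anew[j] = tmp[j] + k * tmp[i - j];
            E *= (1.0 - k * k);
            if (E <= 0.0) { failed = 1; break; }
        }
    }
    if (!failed)
        memcpy(a_hat, anew, 9 * sizeof(double));   /* equations (8), (9)    */
    for (int i = 0; i <= 8; i++)                   /* equation (10)          */
        a[i] = pow(0.96852, i) * a_hat[i];
    return failed ? -1 : 0;
}
```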
b. Calculation of the short-term prediction residual signal
Block 1602 of Fig. 16 (labeled "A(z)") represents a short-term linear prediction error filter, with the filter coefficients a_i, i = 0, 1, ..., 8, as calculated above. Block 1602 operates after the eighth-order LPC analysis has been performed. It calculates the short-term prediction residual signal d(j) as follows:
d(j) = x_out(j) + \sum_{i=1}^{8} a_i\, x_out(j-i), \quad j = 0, 1, 2, \ldots, 159. \qquad (11)
By convention, the time index of the current frame continues from the time index of the previously processed frame. In other words, if the time index range 0, 1, 2, ..., 159 represents the current frame, then the time index range -160, -159, ..., -1 represents the previously processed frame. Accordingly, in the equation above, if the index (j - i) is negative, it points to a signal sample near the end of the previously processed frame.
c. Calculation of the scaling factor
Block 1606 of Fig. 16 calculates the average magnitude of the short-term prediction residual signal associated with the current frame. This operation is performed immediately after block 1602 has calculated the short-term prediction residual signal d(j) in the manner described above. The average magnitude avm is calculated as follows:
avm = \frac{1}{160} \sum_{j=0}^{159} |d(j)|. \qquad (12)
If the next frame to be processed is a lost frame (in other words, a frame corresponding to a packet loss), this average magnitude may be used as a scaling factor to scale a white Gaussian noise sequence (if the current frame is unvoiced).
d. Calculation of the weighted speech signal
Block 1608 of Fig. 16 (labeled "1/A(z/γ)") represents a weighted short-term synthesis filter. Block 1608 operates after the short-term prediction residual signal d(j) of the current frame has been calculated in the manner described above (see block 1602). The coefficients a'_i (i = 0, 1, ..., 8) of this weighted short-term synthesis filter are calculated as follows (with γ_1 = 0.75):
a'_i = \gamma_1^{i}\, a_i, \quad i = 1, 2, \ldots, 8. \qquad (13)
The short-term prediction residual signal d(j) is passed through this weighted synthesis filter. The corresponding output weighted speech signal xw(j) is calculated as:
xw(j) = d(j) - \sum_{i=1}^{8} a'_i\, xw(j-i), \quad j = 0, 1, 2, \ldots, 159. \qquad (14)
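Blocks 1602, 1606 and 1608 can be sketched together as follows. The form used here for equation (14) is the standard all-pole synthesis recursion implied by the filter 1/A(z/γ1); the names and the floating-point arithmetic are illustrative.

```c
#include <math.h>

/* Equations (11)-(14): short-term prediction residual d(j), its average
 * magnitude avm, and the weighted speech xw(j) produced by the weighted
 * synthesis filter 1/A(z/gamma1) with gamma1 = 0.75.  x_out and xw must
 * each carry 8 samples of history before index 0 (previous-frame memory). */
static double residual_and_weighted_speech(const double *x_out, const double *a,
                                           double *d, double *xw)
{
    double ap[9], avm = 0.0;
    ap[0] = 1.0;
    for (int i = 1; i <= 8; i++)
        ap[i] = pow(0.75, i) * a[i];            /* equation (13)            */

    for (int j = 0; j < 160; j++) {
        d[j] = x_out[j];
        for (int i = 1; i <= 8; i++)
            d[j] += a[i] * x_out[j - i];        /* equation (11)            */
        avm += fabs(d[j]);

        xw[j] = d[j];
        for (int i = 1; i <= 8; i++)
            xw[j] -= ap[i] * xw[j - i];         /* assumed form of eq. (14) */
    }
    return avm / 160.0;                          /* equation (12)            */
}
```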
e. Eight-to-one decimation
Block 1616 of Fig. 16 passes the weighted speech signal output by block 1608 through a 60th-order minimum-phase finite impulse response (FIR) filter, and then performs 8:1 decimation to down-sample the resulting 16 kHz low-pass filtered weighted speech signal to the 2 kHz down-sampled weighted speech signal xwd(n). This decimation is performed after the weighted speech signal has been calculated. To reduce complexity, the FIR low-pass filtering operation is carried out only when a new sample of xwd(n) is needed. Thus, the down-sampled weighted speech signal xwd(n) is calculated as:
xwd(n) = \sum_{i=0}^{59} b_i\, xw(8n + 7 - i), \quad n = 0, 1, 2, \ldots, 19, \qquad (15)
where b_i (i = 0, 1, 2, ..., 59) are the filter coefficients of the 60th-order FIR low-pass filter, as given in Table 3.
Lag i    b_i (Q15)    Lag i    b_i (Q15)    Lag i    b_i (Q15)
0 1209 20 -618 40 313
1 728 21 -941 41 143
2 1120 22 -1168 42 -6
3 1460 23 -1289 43 -126
4 1845 24 -1298 44 -211
5 2202 25 -1199 45 -259
6 2533 26 -995 46 -273
7 2809 27 -701 47 -254
8 3030 28 -348 48 -210
9 3169 29 20 49 -152
10 3207 30 165 50 -89
11 3124 31 365 51 -30
12 2927 32 607 52 21
13 2631 33 782 53 58
14 2257 34 885 54 81
15 1814 35 916 55 89
16 1317 36 881 56 84
17 789 37 790 57 66
18 267 38 654 58 41
19 -211 39 490 59 17
Table 3: Coefficients of the 60th-order FIR low-pass filter
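Equation (15) can be sketched as follows, with the Table 3 coefficients supplied by the caller as Q15 integers (floating-point arithmetic is used here for clarity).

```c
/* Equation (15): 8:1 decimation of the weighted speech xw into xwd using
 * the 60th-order FIR low-pass filter of Table 3.  b_q15[] holds the 60
 * coefficients of Table 3 as Q15 integers; xw must provide at least 52
 * samples of history before index 0 (indices 8n+7-i reach back to -52).    */
static void decimate_8_to_1(const double *xw, const int *b_q15,
                            double *xwd /* 20 samples for a 10 ms frame */)
{
    for (int n = 0; n < 20; n++) {
        double acc = 0.0;
        for (int i = 0; i < 60; i++)
            acc += (b_q15[i] / 32768.0) * xw[8 * n + 7 - i];   /* Q15 scale */
        xwd[n] = acc;
    }
}
```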
f. Coarse pitch period extraction
To reduce computational complexity, the WB PCM PLC logic performs the pitch extraction in two stages: first, a coarse pitch period is determined with the time resolution of the 2 kHz down-sampled signal; then, a pitch period refinement is performed with the time resolution of the undecimated 16 kHz signal. The pitch extraction is performed only after the down-sampled weighted speech signal xwd(n) has been calculated. This subsection describes the first-stage coarse pitch period extraction algorithm performed by block 1620 of Fig. 16. The algorithm is based on maximizing a normalized cross-correlation with some additional decision logic.
A pitch analysis window of 15 ms is used in the coarse pitch period extraction. The end of the pitch analysis window is aligned with the end of the current frame. At a sampling rate of 2 kHz, 15 ms corresponds to 30 samples. Without loss of generality, let the index range n = 0 to n = 29 correspond to the pitch analysis window for xwd(n). The coarse pitch period extraction algorithm begins by calculating the following values:
c(k) = \sum_{n=0}^{29} xwd(n)\, xwd(n-k), \qquad (16)
E(k) = \sum_{n=0}^{29} \left[ xwd(n-k) \right]^{2}, \qquad (17)
and
[Equation (18), which defines c2(k), appears as an image in the original publication and is not reproduced here.]
All of the above quantities are calculated for all integers k in the range from k = MINPPD - 1 to k = MAXPPD + 1, where MINPPD = 5 and MAXPPD = 33 are, respectively, the minimum and maximum pitch periods in the down-sampled domain. The coarse pitch period extraction algorithm then searches the range k = MINPPD, MINPPD+1, MINPPD+2, ..., MAXPPD to find all local peaks of the array {c2(k)/E(k)} for which c(k) > 0. (A value is defined as a local peak if both of its adjacent values are smaller than it.) Let N_p denote the number of such positive local peaks. Let k_p(j) (j = 1, 2, ..., N_p) be the indices at which c2(k_p(j))/E(k_p(j)) is a local peak with c(k_p(j)) > 0, and let k_p(1) < k_p(2) < ... < k_p(N_p). For convenience, c2(k)/E(k) will be referred to as the "normalized correlation square".
If N_p = 0, that is, if the function c2(k)/E(k) has no positive local peaks, the algorithm searches for the largest negative local peak, i.e., the one with the largest magnitude of |c2(k)/E(k)|. If such a largest negative local peak is found, the corresponding index k is used as the output coarse pitch period cpp, and the processing of block 1620 is terminated. If the normalized correlation square function c2(k)/E(k) has neither a positive local peak nor a negative local peak, the output coarse pitch period is set to cpp = MINPPD and the processing of block 1620 is terminated. If N_p = 1, the output coarse pitch period is set to cpp = k_p(1) and the processing of block 1620 is terminated.
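The computation of c(k), E(k) and c2(k) and the search for positive local peaks can be sketched as follows. Since equation (18) is not reproduced above, the sign-preserving square c2(k) = c(k)|c(k)| used here is an assumption, chosen so that c2(k)/E(k) keeps the sign of c(k).

```c
#include <math.h>

#define MINPPD 5
#define MAXPPD 33

/* Equations (16)-(17), an assumed form of (18), and the positive local-peak
 * search.  xwd points at the first sample of the 30-sample (15 ms, 2 kHz)
 * pitch analysis window and must have MAXPPD+1 samples of history before
 * it.  c, E and c2 must hold at least MAXPPD+2 entries.  Returns the number
 * of positive local peaks; their lags are written to kp[].                 */
static int find_positive_peaks(const double *xwd, double *c, double *E,
                               double *c2, int *kp)
{
    for (int k = MINPPD - 1; k <= MAXPPD + 1; k++) {
        c[k] = 0.0; E[k] = 0.0;
        for (int n = 0; n < 30; n++) {
            c[k] += xwd[n] * xwd[n - k];
            E[k] += xwd[n - k] * xwd[n - k];
        }
        c2[k] = c[k] * fabs(c[k]);   /* assumed sign-preserving square       */
    }
    int np = 0;
    for (int k = MINPPD; k <= MAXPPD; k++) {
        double q  = (E[k]     > 0.0) ? c2[k]     / E[k]     : 0.0;
        double ql = (E[k - 1] > 0.0) ? c2[k - 1] / E[k - 1] : 0.0;
        double qr = (E[k + 1] > 0.0) ? c2[k + 1] / E[k + 1] : 0.0;
        if (c[k] > 0.0 && q > ql && q > qr)
            kp[np++] = k;   /* positive local peak of c2(k)/E(k)             */
    }
    return np;
}
```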
If there are at least two local peaks (N_p ≥ 2), this block uses Algorithms A, B, C and D (described below), in that order, to determine the output coarse pitch period cpp. Variables calculated in the earlier of these four algorithms are carried over to, and used in, the later algorithms.
Algorithm A below is used to identify the largest quadratically interpolated peak around the local peaks of the normalized correlation square c2(k_p)/E(k_p). Quadratic interpolation is applied to c(k_p), and linear interpolation is applied to E(k_p). The interpolation is performed with the time resolution of the undecimated 16 kHz speech signal. In the algorithm below, D denotes the decimation factor used when decimating xw(n) to xwd(n); hence D = 8 here.
Algorithm A - Find the largest quadratically interpolated peak around c2(k_p)/E(k_p):
A. Set c2max = -1, Emax = 1, jmax = 0.
B. For j = 1, 2, ..., N_p, perform the following 12 steps:
1. Set a = 0.5 [c(k_p(j)+1) + c(k_p(j)-1)] - c(k_p(j))
2. Set b = 0.5 [c(k_p(j)+1) - c(k_p(j)-1)]
3. Set ji = 0
4. Set ei = E(k_p(j))
5. Set c2m = c2(k_p(j))
6. Set Em = E(k_p(j))
7. If c2(k_p(j)+1) E(k_p(j)-1) > c2(k_p(j)-1) E(k_p(j)+1), perform the remaining part of step 7:
a. Δ = [E(k_p(j)+1) - ei]/D
b. For k = 1, 2, ..., D/2, perform the following part of step 7:
i. ci = a(k/D)^2 + b(k/D) + c(k_p(j))
ii. ei ← ei + Δ
iii. If (ci)^2 Em > (c2m) ei, perform the following three lines:
a. ji = k
b. c2m = (ci)^2
c. Em = ei
8. If c2(k_p(j)+1) E(k_p(j)-1) ≤ c2(k_p(j)-1) E(k_p(j)+1), perform the remaining part of step 8:
a. Δ = [E(k_p(j)-1) - ei]/D
b. For k = -1, -2, ..., -D/2, perform the following part of step 8:
i. ci = a(k/D)^2 + b(k/D) + c(k_p(j))
ii. ei ← ei + Δ
iii. If (ci)^2 Em > (c2m) ei, perform the following three lines:
a. ji = k
b. c2m = (ci)^2
c. Em = ei
9. Set lag(j) = k_p(j) + ji/D
10. Set c2i(j) = c2m
11. Set Ei(j) = Em
12. If c2m × Emax > c2max × Em, perform the following three lines:
a. jmax = j
b. c2max = c2m
c. Emax = Em
The notation ← means that the parameter on the left-hand side is updated with the value on the right-hand side.
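The per-peak quadratic interpolation of steps B.1 through B.11 of Algorithm A can be sketched as follows (floating-point, illustrative only; D = 8 as stated above).

```c
/* Quadratic interpolation of Algorithm A around one local peak kp.  c[],
 * E[] and c2[] are the arrays of equations (16)-(18).  On return, *lag is
 * the fractional lag, *c2i and *Ei the interpolated numerator and energy
 * of the normalized correlation square.                                    */
static void interpolate_peak(const double *c, const double *E, const double *c2,
                             int kp, double *lag, double *c2i, double *Ei)
{
    const int D = 8;
    double a = 0.5 * (c[kp + 1] + c[kp - 1]) - c[kp];
    double b = 0.5 * (c[kp + 1] - c[kp - 1]);
    int    ji = 0;
    double ei = E[kp], c2m = c2[kp], Em = E[kp];
    /* step 7 searches to the right, step 8 to the left of kp               */
    int    dir = (c2[kp + 1] * E[kp - 1] > c2[kp - 1] * E[kp + 1]) ? 1 : -1;
    double delta = (E[kp + dir] - ei) / D;

    for (int k = dir; k != dir * (D / 2 + 1); k += dir) {
        double x  = (double)k / D;
        double ci = a * x * x + b * x + c[kp];
        ei += delta;                               /* linear interp. of E    */
        if (ci * ci * Em > c2m * ei) {             /* step 7/8 (iii)         */
            ji = k; c2m = ci * ci; Em = ei;
        }
    }
    *lag = kp + (double)ji / D;
    *c2i = c2m;
    *Ei  = Em;
}
```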
To avoid selecting a coarse pitch period that is around an integer multiple of the true coarse pitch period, the time lags corresponding to the local peaks of c2(k_p)/E(k_p) are searched to determine whether any of them is close enough to the output coarse pitch period of the previously processed frame (denoted cpplast; for the very first frame, cpplast is initialized to 12). A time lag is considered close enough if it lies within 25% of cpplast. For all time lags within 25% of cpplast, the corresponding quadratically interpolated peak values of the normalized correlation square c2(k_p)/E(k_p) are compared, and the interpolated time lag corresponding to the largest normalized correlation square is selected for further processing. Algorithm B below performs this task. It uses the interpolated arrays c2i(j) and Ei(j) calculated in Algorithm A above.
Algorithm B - Among all time lags close to the output coarse pitch period of the last frame, find the one that maximizes the interpolated c2(k_p)/E(k_p):
A. Set the index im = -1
B. Set c2m = -1
C. Set Em = 1
D. For j = 1, 2, ..., N_p, perform the following:
1. If |k_p(j) - cpplast| ≤ 0.25 × cpplast, perform the following:
a. If c2i(j) × Em > c2m × Ei(j), perform the following three lines:
i. im = j
ii. c2m = c2i(j)
iii. Em = Ei(j)
Note that if there is no time lag k_p(j) within 25% of cpplast, the value of the index im remains -1 after Algorithm B has been performed. If there are one or more time lags within 25% of cpplast, the index im corresponds to the one among them with the largest normalized correlation square.
Next, Algorithm C determines whether an alternative time lag in the first half of the pitch range should be selected as the output coarse pitch period. The algorithm searches all interpolated time lags lag(j) that are less than 16, and checks whether each of them has a sufficiently large local peak of the normalized correlation square near every integer multiple of it (up to 32, including itself). If one or more time lags satisfy this condition, the smallest of the qualifying time lags is selected as the output coarse pitch period.
Again, the variables calculated in Algorithms A and B above carry their final values into Algorithm C below. In the following, the parameter MPDTH is 0.06, and the threshold array MPTH(k) is given by MPTH(2) = 0.7, MPTH(3) = 0.55, MPTH(4) = 0.48, MPTH(5) = 0.37, and MPTH(k) = 0.30 for k > 5.
Algorithm C-checks that another time lag that whether should select in the interior preceding half cycle of thick pitch period scope is as exporting thick pitch period:
A, for j=1,2,3 ..., N p, when lag (j)<16, operate in proper order as follows by this:
If 1 j ≠ im sets threshold=0.73; Otherwise, set threshold=0.4.
If 2 c2i (j) * Emax≤threshold * c2max * Ei (j) cancel this j, and skip the step (3) of corresponding this j, j is increased 1 and return step (1).
If 3 c2i (j) * Emax>threshold * c2max * Ei (j) operate as follows:
A, for k=2,3,4 ..., when k * lag (j)<32, carry out as follows:
i、s=k×lag(j)
ii、a=(1-MPDTH)s
iii、b=(1+MPDTH)s
Iv, pass through m=j+1 in order, j+2, j+3 ..., N p, see if there is time lag lag (m) between a and b.If there is not time lag to be between a and the b, skip this j, stop step 3, j is increased 1 and return step 1.If there is at least one m satisfy a<lag (m)<b and c2i (m) * Emax>MPTH (k) * c2max * Ei (m), think near the enough big peak value that k the integral multiple of lag (j), has found standardization correlativity square so; In this case, stop step 3.a.iv, k is increased 1 and return step 3.a.i.
B if under situation about not stopping too early completing steps 3.a; Just; If less than each integral multiple of 32 lag (j) ± have the enough big interior slotting peak value of standardization correlativity square among the 100xMPDTH%; Stop this algorithm so, skip algorithm D and with cpp=lag (j) as the thick pitch period of final output.
If Algorithm C above runs to completion without finding a qualifying output coarse pitch period cpp, Algorithm D examines the largest local peak of the normalized correlation square around the coarse pitch period of the most recent frame (obtained in Algorithm B above) and makes the final decision on the output coarse pitch period cpp. Again, the variables computed in Algorithms A and B above pass their final values to Algorithm D below. The parameters are SMDTH = 0.095 and LPTH1 = 0.78.
Algorithm D - Final decision on the output coarse pitch period:
A. If im = −1, that is, if there is no sufficiently large local peak of the normalized correlation square around the coarse pitch period of the most recent frame, then use the cpp computed at the end of Algorithm A as the final output coarse pitch period, and exit this algorithm.
B. If im = jmax, that is, if the largest local peak of the normalized correlation square around the coarse pitch period of the most recent frame is also the global maximum of all interpolated peaks of the normalized correlation square in this frame, then use the cpp computed at the end of Algorithm A as the final output coarse pitch period, and exit this algorithm.
C. If im < jmax, carry out the following:
   1. If c2m·Emax > 0.43·c2max·Em, carry out the following part of step C:
      a. If lag(im) > MAXPPD/2, set the output cpp = lag(im) and exit this algorithm.
      b. Otherwise, for k = 2, 3, 4, 5, carry out the following:
         i.   s = lag(jmax)/k
         ii.  a = (1 − SMDTH)·s
         iii. b = (1 + SMDTH)·s
         iv.  If lag(im) > a and lag(im) < b, set the output cpp = lag(im) and exit this algorithm.
D. If im > jmax, carry out the following:
   1. If c2m·Emax > LPTH1·c2max·Em, set the output cpp = lag(im) and exit this algorithm.
E. If the algorithm reaches this point, none of the preceding steps has selected a final output coarse pitch period. In that case, simply accept the cpp computed at the end of Algorithm A as the final output coarse pitch period.
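A hedged Python sketch of Algorithm D is given below. The argument names, the 0-based indexing, and the requirement to pass MAXPPD in explicitly are assumptions; cpp is the value computed at the end of Algorithm A.

```python
def algorithm_d(cpp, lag, im, jmax, c2m, Em, c2max, Emax, MAXPPD,
                SMDTH=0.095, LPTH1=0.78):
    """Final selection of the output coarse pitch period."""
    if im == -1 or im == jmax:          # steps A and B: keep the Algorithm A result
        return cpp
    if im < jmax:                       # step C
        if c2m * Emax > 0.43 * c2max * Em:
            if lag[im] > MAXPPD / 2:
                return lag[im]
            for k in range(2, 6):       # k = 2, 3, 4, 5
                s = lag[jmax] / k
                a, b = (1 - SMDTH) * s, (1 + SMDTH) * s
                if a < lag[im] < b:     # lag[im] close to a sub-multiple of lag[jmax]
                    return lag[im]
    elif c2m * Emax > LPTH1 * c2max * Em:   # step D (im > jmax)
        return lag[im]
    return cpp                          # step E: fall back to Algorithm A
```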
G. Pitch period refinement
Module 1622 in FIG. 16 performs the second-stage processing of the pitch period extraction algorithm, by searching the neighborhood of the coarse pitch period at the full 16 kHz time resolution using the G.722 decoded output speech signal. The module first converts the coarse pitch period cpp to the undecimated signal domain by multiplying it by the decimation factor D, where D = 8. The pitch refinement analysis window size WSZ is chosen as the smaller of cpp × D samples and 160 samples (corresponding to 10 ms): WSZ = min(cpp × D, 160).
Next, the lower bound of the search range is computed as lb = max(MINPP, cpp × D − 4), where MINPP = 40 samples is the minimum pitch period, and the upper bound of the search range is computed as ub = min(MAXPP, cpp × D + 4), where MAXPP = 265 samples is the maximum pitch period.
Module 1622 maintains a buffer of the G.722 decoded speech signal x_out(j) holding a total of XQOFF = MAXPP + 1 + FRSZ samples at 16 kHz, where FRSZ = 160 is the frame size. The last FRSZ samples of this buffer contain the G.722 decoded speech signal of the current frame; the MAXPP + 1 samples before them are the output of the G.722 decoder/PLC system in the previous frames processed before the current frame. The last sample of the analysis window is aligned with the last sample of the current frame. Let the index range from j = 0 to j = WSZ − 1 correspond to the analysis window (that is, the last WSZ samples in the x_out(j) buffer), and let negative indices denote samples before the analysis window. For each time lag k in the search range [lb, ub], the following correlation and energy terms are computed in the undecimated signal domain:
c̃(k) = Σ_{j=0}^{WSZ−1} x_out(j)·x_out(j−k),   (19)
Ẽ(k) = Σ_{j=0}^{WSZ−1} x_out²(j−k).   (20)
The time lag k ∈ [lb, ub] that maximizes the ratio c̃²(k)/Ẽ(k) is then chosen as the final refined pitch period for frame erasure, ppfe. That is,
ppfe = argmax_{k ∈ [lb, ub]} [ c̃²(k) / Ẽ(k) ].   (21)
Next, module 1622 also computes two additional pitch-related scaling factors. The first, called ptfe (pitch tap for frame erasure), is the scaling factor used for the periodic waveform extrapolation. It is computed as the ratio of the average magnitude of the x_out(j) signal in the analysis window to the average magnitude of the portion of the x_out(j) signal that is ppfe samples earlier, with the same sign as the correlation between these two signal portions:
ptfe = sign(c̃(ppfe)) · [ Σ_{j=0}^{WSZ−1} |x_out(j)| / Σ_{j=0}^{WSZ−1} |x_out(j−ppfe)| ].   (22)
In the degenerate case where Σ_{j=0}^{WSZ−1} |x_out(j−ppfe)| = 0, ptfe is set to 0. After this calculation, the value of ptfe is bounded to the range [−1, 1].
The second pitch-related scaling factor, called ppt (pitch prediction tap), is used to compute the long-term ringing signal (described later) and is computed as ppt = 0.75 × ptfe.
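The refinement search of equations (19)-(22) and the two scaling factors can be sketched as follows; the buffer layout (most recent sample last) and the clamping of the upper search bound to MAXPP are assumptions of this sketch.

```python
import numpy as np

def refine_pitch(xout, cpp, D=8, MINPP=40, MAXPP=265):
    """Second-stage pitch refinement (module 1622), eqs. (19)-(22).

    xout holds the 16 kHz decoded/concealed speech with the most recent
    sample last; it is assumed to be long enough (at least MAXPP + WSZ + 4
    samples).  Returns the refined pitch period ppfe, the extrapolation
    scale factor ptfe and the pitch prediction tap ppt."""
    WSZ = min(cpp * D, 160)
    lb = max(MINPP, cpp * D - 4)
    ub = min(MAXPP, cpp * D + 4)      # the source prints max(); min() is assumed here
    win = np.asarray(xout[-WSZ:], dtype=float)            # j = 0 .. WSZ-1
    best_k, best_c2, best_E = lb, -1.0, 1.0
    for k in range(lb, ub + 1):
        lagged = np.asarray(xout[-WSZ - k:-k], dtype=float)   # x_out(j - k)
        c = float(np.dot(win, lagged))                    # eq. (19)
        E = float(np.dot(lagged, lagged))                 # eq. (20)
        if c * c * best_E > best_c2 * E:                  # maximize c^2/E, eq. (21)
            best_k, best_c2, best_E = k, c * c, E
    ppfe = best_k
    lagged = np.asarray(xout[-WSZ - ppfe:-ppfe], dtype=float)
    denom = float(np.sum(np.abs(lagged)))
    if denom == 0.0:
        ptfe = 0.0                                        # degenerate case
    else:
        sign = 1.0 if np.dot(win, lagged) >= 0 else -1.0
        ptfe = sign * float(np.sum(np.abs(win))) / denom  # eq. (22)
        ptfe = float(np.clip(ptfe, -1.0, 1.0))
    ppt = 0.75 * ptfe
    return ppfe, ptfe, ppt
```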
H. Calculating the mixing ratio
Module 1618 in FIG. 16 computes a figure of merit that determines the mixing ratio between the periodically extrapolated waveform and the filtered noise waveform during lost frames. This calculation is performed only during the first lost frame of each packet loss. The figure of merit is a weighted sum of three signal features: the logarithmic gain, the first normalized autocorrelation, and the pitch prediction gain, each of which is computed as follows.
Using the same indexing convention for x_out(j) as in the preceding sub-section, the energy of the x_out(j) signal in the pitch refinement analysis window is
sige = Σ_{j=0}^{WSZ−1} x_out²(j),   (23)
and the base-2 logarithmic gain lg is then computed from sige according to equation (24).
If Ẽ(ppfe) ≠ 0, the pitch prediction residual energy is computed as
rese = sige − c̃²(ppfe)/Ẽ(ppfe),   (25)
and the pitch prediction gain pg is computed from sige and rese according to equation (26). If Ẽ(ppfe) = 0, pg is set to 0; likewise, if sige = 0, pg is set to 0.
The first normalized autocorrelation ρ₁ is computed according to equation (27), as the correlation between x_out(j) and x_out(j−1) over the analysis window, normalized by the window energy. After these three signal features are obtained, the figure of merit is computed as
merit = lg + pg + 12·ρ₁.   (28)
The merit computed above determines the two scaling factors Gp and Gr, which in turn determine the mixing ratio between the periodically extrapolated waveform and the filtered noise waveform. Two thresholds are used for merit: the figure-of-merit high threshold MHI and the figure-of-merit low threshold MLO, set to MHI = 28 and MLO = 20. The scaling factor Gr of the random (filtered noise) component is computed as
Gr = (MHI − merit) / (MHI − MLO),   (29)
and the scaling factor Gp of the periodic component is computed as
Gp = 1 − Gr.   (30)
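A small sketch of how the figure of merit maps to the mixing gains, assuming lg, pg and ρ₁ have already been computed as described above; the explicit saturation outside [MLO, MHI] mirrors the selection rules given later for the mixing step.

```python
def mixing_gains(lg, pg, rho1, MHI=28.0, MLO=20.0):
    """Compute the figure of merit (eq. 28) and the mixing scale factors
    Gr (random/filtered-noise component, eq. 29) and Gp (periodic
    component, eq. 30), saturating outside [MLO, MHI]."""
    merit = lg + pg + 12.0 * rho1
    if merit >= MHI:
        Gr = 0.0            # all periodic
    elif merit <= MLO:
        Gr = 1.0            # all filtered noise
    else:
        Gr = (MHI - merit) / (MHI - MLO)
    Gp = 1.0 - Gr
    return merit, Gp, Gr
```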
I. Periodic waveform extrapolation
Module 1624 in FIG. 16 periodically extrapolates the previous output speech waveform during lost frames (if merit > MLO). The manner in which module 1624 performs this function is now described.
For the first lost frame of each packet loss, the average pitch period increment per frame is computed. A pitch period history buffer pph(m), m = 1, 2, ..., 5, holds the pitch period ppfe of the previous five frames. The average pitch period increment is obtained as follows. Starting with the most recent of those frames, the pitch period increment from its preceding frame to that frame is computed (a negative value denotes a pitch period decrement). If this increment is zero, the algorithm checks the pitch period increment of the preceding frame. This process continues until the first frame with a non-zero pitch period increment is found, or until the fourth previous frame has been examined. If all five previous frames have identical pitch periods, the average pitch period increment is set to zero. Otherwise, if the first non-zero pitch period increment is found at the m-th previous frame, and if the magnitude of that increment is less than 5% of the pitch period at that frame, the average pitch period increment ppinc is computed as that pitch period increment divided by m, and the resulting value is limited to the range [−1, 2].
In the second and subsequent consecutive lost frames of a packet loss, the average pitch period increment is added to the pitch period ppfe, the result is rounded to the nearest integer, and the value is then limited to the range [MINPP, MAXPP].
If the current frame is the first lost frame of a packet loss, a so-called "ringing signal" is computed for use in the overlap-add operation, to ensure a smooth waveform transition at the beginning of the frame. For the first lost frame, the overlap-add length between the ringing signal and the periodically extrapolated waveform is 20 samples. Let the index range j = 0, 1, 2, ..., 19 correspond to the first 20 samples of the current first lost frame, which form the overlap-add period, and let negative indices correspond to the previous frame. The long-term ringing signal is obtained as a scaled version of the short-term prediction residual signal that is one pitch period earlier than the overlap-add period:
ltring(j) = x_out(j − ppfe) + Σ_{i=1}^{8} a_i·x_out(j − ppfe − i),  j = 0, 1, 2, ..., 19.   (31)
After these 20 samples of ltring(j) are computed, they are further scaled by the factor ppt computed by module 1622:
ltring(j) ← ppt·ltring(j),  j = 0, 1, 2, ..., 19.   (32)
With the filter memory ring(j), j = −8, −7, ..., −1, initialized to the last 8 samples of the x_out(j) signal in the most recent frame, the final ringing signal is obtained as
ring(j) = ltring(j) − Σ_{i=1}^{8} a_i·ring(j − i),  j = 0, 1, 2, ..., 19.   (33)
Let the index range j = 0, 1, 2, ..., 159 correspond to the current first lost frame, and let the index range j = 160, 161, 162, ..., 209 correspond to the first 50 samples of the next frame. Furthermore, let wi(j) and wo(j), j = 0, 1, ..., 19, be triangular fade-in and fade-out windows, respectively, with wi(j) + wo(j) = 1. The periodic waveform extrapolation is then performed in the following two steps:
Step 1:
x_out(j) = wi(j)·ptfe·x_out(j − ppfe) + wo(j)·ring(j),  j = 0, 1, 2, ..., 19.   (34)
Step 2:
x_out(j) = ptfe·x_out(j − ppfe),  j = 20, 21, 22, ..., 209.   (35)
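A sketch of the two-step extrapolation, under the assumption that the fade-in/fade-out windows are simple complementary 20-sample triangles (their exact shape is not specified here beyond wi + wo = 1):

```python
import numpy as np

def periodic_extrapolation(history, ring, ppfe, ptfe):
    """Two-step periodic waveform extrapolation, eqs. (34)-(35).

    history: past decoded/concealed 16 kHz speech (most recent sample
    last); ring: the 20-sample ringing signal of eq. (33).  Returns the
    210 extrapolated samples (current lost frame plus 50 look-ahead
    samples)."""
    n0 = len(history)
    out = np.concatenate([np.asarray(history, dtype=float), np.zeros(210)])
    wi = (np.arange(20) + 1) / 20.0         # assumed fade-in window
    wo = 1.0 - wi                           # complementary fade-out window
    for j in range(20):                     # Step 1, eq. (34): overlap-add with ring
        out[n0 + j] = wi[j] * ptfe * out[n0 + j - ppfe] + wo[j] * ring[j]
    for j in range(20, 210):                # Step 2, eq. (35): plain extrapolation
        out[n0 + j] = ptfe * out[n0 + j - ppfe]
    return out[n0:]
```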
J. Normalized noise generator
If merit < MHI, module 1610 in FIG. 16 generates a white Gaussian random noise sequence with unit average magnitude. To reduce computational complexity, the white Gaussian noise is pre-computed and stored in a table. To avoid using a very long table, and yet also avoid repeating the same noise pattern because the table is too short, a special indexing scheme is used. In this scheme the white Gaussian noise table wn(j) has 127 entries, and the scaled version of the noise generator output is
wgn(j) = avm × wn(mod(cfecount × j, 127)),  j = 0, 1, 2, ..., 209,   (36)
where cfecount is the frame counter, equal to k for the k-th consecutive lost frame in the current packet loss, and mod(·,·) denotes the modulo operation.
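The table indexing of equation (36) is compact enough to show directly; wn is the 127-entry noise table and avm the average magnitude computed elsewhere in the algorithm:

```python
def scaled_noise(wn, avm, cfecount, n=210):
    """Eq. (36): pick samples from the 127-entry white Gaussian noise table
    wn with a stride that depends on the consecutive lost-frame counter
    cfecount (so the same pattern is not repeated frame after frame) and
    scale them by the average magnitude avm."""
    return [avm * wn[(cfecount * j) % 127] for j in range(n)]
```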
K. Filtering of the noise sequence
Module 1614 in FIG. 16 represents the short-term synthesis filter. If merit < MHI, module 1614 filters the scaled white Gaussian noise so that it has the same spectral envelope as that of the x_out(j) signal in the most recent frame. The filtered noise fn(j) is obtained as
fn(j) = wgn(j) − Σ_{i=1}^{8} a_i·fn(j − i),  j = 0, 1, 2, ..., 209.   (37)
L. Mixing of the periodic and random components
If merit > MHI, only the periodically extrapolated waveform x_out(j) computed by module 1624 is used as the output of the WB PCM PLC logic. If merit < MLO, only the filtered noise signal fn(j) produced by module 1614 is used as the output of the WB PCM PLC logic. If MLO ≤ merit ≤ MHI, the two components are mixed as
x_out(j) ← Gp·x_out(j) + Gr·fn(j),  j = 0, 1, 2, ..., 209.   (38)
The first 40 extra samples of the extrapolated x_out(j) signal, j = 160, 161, 162, ..., 199, become the ringing signal ring(j), j = 0, 1, 2, ..., 39, of the next frame. If the next frame is also a lost frame, only the first 20 samples of this ringing signal are used in the overlap-add; if the next frame is a received frame, all 40 samples of the ringing signal are used in the overlap-add.
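A sketch of the selection/mixing rule of equation (38), applied to the 210-sample concealment span:

```python
def mix_components(xout_ext, fn, merit, Gp, Gr, MHI=28.0, MLO=20.0):
    """Select or mix the periodic extrapolation xout_ext and the filtered
    noise fn over the 210-sample concealment span (eq. 38)."""
    if merit > MHI:
        return list(xout_ext)                  # periodic component only
    if merit < MLO:
        return list(fn)                        # filtered noise only
    return [Gp * x + Gr * f for x, f in zip(xout_ext, fn)]
```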
M. Conditional ramp-down
If the packet loss lasts 20 ms or less, the x_out(j) signal produced by mixing the periodic and random components is used as the WB PCM PLC output signal. If the packet loss lasts longer than 60 ms, the WB PCM PLC output signal is completely muted. If the packet loss lasts longer than 20 ms but less than 60 ms, the x_out(j) signal produced by mixing the periodic and random components is linearly ramped down (attenuated toward zero in a linear fashion). As specified in the algorithm below, this conditional ramp-down is carried out during lost frames with cfecount > 2. It uses an array gawd() given in Q15 format by {−52, −69, −104, −207}. As before, the index range j = 0, 1, 2, ..., 159 corresponds to the current frame of x_out(j).
Conditional ramp-down algorithm:
A. If cfecount ≤ 6, carry out the following 9 lines:
   1. delta = gawd(cfecount − 3)
   2. gaw = 1
   3. For j = 0, 1, 2, ..., 159, carry out the following two lines:
      a. x_out(j) = gaw·x_out(j)
      b. gaw = gaw + delta
   4. If cfecount < 6, carry out the following three lines:
      a. For j = 160, 161, 162, ..., 209, carry out the following two lines:
         i.  x_out(j) = gaw·x_out(j)
         ii. gaw = gaw + delta
B. Otherwise (if cfecount > 6), set x_out(j) = 0 for j = 0, 1, 2, ..., 209.
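A Python sketch of the conditional ramp-down; the sign of the first gawd entry and the Q15-to-float conversion are assumptions of this sketch:

```python
GAWD_Q15 = [-52, -69, -104, -207]              # assumed all-negative Q15 steps

def conditional_ramp_down(xout, cfecount):
    """Apply the conditional linear ramp-down in place to the 210-sample
    buffer xout; cfecount counts consecutive lost frames and ramping only
    starts when cfecount > 2, as stated in the text."""
    if cfecount <= 2:
        return
    if cfecount <= 6:
        delta = GAWD_Q15[cfecount - 3] / 32768.0   # Q15 -> float
        gaw = 1.0
        for j in range(160):                       # current frame
            xout[j] *= gaw
            gaw += delta
        if cfecount < 6:
            for j in range(160, 210):              # 50-sample look-ahead part
                xout[j] *= gaw
                gaw += delta
    else:                                          # cfecount > 6: mute
        for j in range(210):
            xout[j] = 0.0
```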
N. Overlap-add in the first received frame
For a type 5 frame, the output x_out(j) of the G.722 decoder is overlap-added by module 1624 with the ringing signal ring(j) of the last lost frame (computed in the manner described above):
x_out(j) = wi(j)·x_out(j) + wo(j)·ring(j),  j = 0, ..., L_OLA − 1,   (39)
where
L_OLA = 8 if Gp = 0, and L_OLA = 40 otherwise.   (40)
4. Re-encoding of the PLC output
In order to update the memories and parameters of the G.722 ADPCM decoders during lost frames (frames of type 2, type 3 and type 4), the PLC output is essentially passed through a G.722 encoder. FIG. 17 is a block diagram 1700 of the logic used to perform this re-encoding. As shown in FIG. 17, the PLC output x_out(j) is passed through a QMF analysis filter bank 1702 to produce a low-band sub-band signal x_L(n) and a high-band sub-band signal x_H(n). The low-band sub-band signal x_L(n) is encoded by a low-band ADPCM encoder 1704, and the high-band sub-band signal x_H(n) is encoded by a high-band ADPCM encoder 1706. To reduce complexity, the ADPCM sub-band encoders 1704 and 1706 are simplified relative to conventional ADPCM sub-band encoders. These operations are now described in more detail.
a. Passing the PLC output through the QMF analysis filter bank
The memory of the QMF analysis filter bank 1702 is initialized so that the filter bank produces sub-band signals that are continuous with the decoded sub-band signals. The first 22 samples of the WB PCM PLC output constitute the filter memory, and the sub-band signals are computed according to
x_L(n) = Σ_{i=0}^{11} h_{2i}·x_PLC(23 + j − 2i) + Σ_{i=0}^{11} h_{2i+1}·x_PLC(22 + j − 2i), and   (41)
x_H(n) = Σ_{i=0}^{11} h_{2i}·x_PLC(23 + j − 2i) − Σ_{i=0}^{11} h_{2i+1}·x_PLC(22 + j − 2i),   (42)
where x_PLC(0) corresponds to the first sample of the 16 kHz WB PCM PLC output of the current frame, and x_L(n=0) and x_H(n=0) correspond to the first samples of the 8 kHz low-band and high-band sub-band signals of the current frame, respectively. Apart from the offset of 22 samples, this filtering is identical to the transmit QMF of the G.722 encoder, with the WB PCM PLC output (rather than the encoder input) passed to the filter bank. Moreover, to produce a full frame of sub-band signals (80 samples, 10 ms), the WB PCM PLC output must extend 22 samples beyond the current frame, i.e., 182 samples (11.375 ms) must be produced. The sub-band signals x_L(n), n = 0, 1, ..., 79, and x_H(n), n = 0, 1, ..., 79, are produced according to equations (41) and (42), respectively.
b. Re-encoding of the low-band signal
The low-band signal x_L(n) is encoded using a simplified low-band ADPCM encoder. A block diagram of the simplified low-band ADPCM encoder 2000 is shown in FIG. 20. As can be seen in FIG. 20, the inverse quantizer of the standard low-band ADPCM encoder has been eliminated, and the quantized prediction error has been replaced with the unquantized prediction error. Furthermore, because the update of the adaptive quantizer is based only on an 8-member subset of the 64-member set represented by the 6-bit low-band encoder index I_L(n), the prediction error is only quantized into that 8-member set. This provides an identical update of the adaptive quantizer while simplifying the quantization. Table 4 lists the decision levels, output codes and multipliers of the simplified 8-level quantizer based on the absolute value of e_L(n).
m_L   Low threshold   High threshold   I_L   Multiplier W_L
 1       0.00000         0.14103        3c      -0.02930
 2       0.14103         0.45482        38      -0.01465
 3       0.45482         0.82335        34       0.02832
 4       0.82335         1.26989        30       0.08398
 5       1.26989         1.83683        2c       0.16309
 6       1.83683         2.61482        28       0.26270
 7       2.61482         3.86796        24       0.58496
 8       3.86796            ∞           20       1.48535
Table 4: Decision levels, output codes and multipliers of the 8-level simplified quantizer
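A sketch of how Table 4 could be used to classify the (unquantized) prediction error; normalizing |e_L(n)| by the adaptive quantizer scale factor before the comparison is an assumption of this sketch:

```python
# Decision levels, output codes and multipliers of Table 4 (low band).
QL_LOW  = [0.00000, 0.14103, 0.45482, 0.82335, 1.26989, 1.83683, 2.61482, 3.86796]
QL_CODE = [0x3C, 0x38, 0x34, 0x30, 0x2C, 0x28, 0x24, 0x20]
QL_MULT = [-0.02930, -0.01465, 0.02832, 0.08398, 0.16309, 0.26270, 0.58496, 1.48535]

def quantize_low(eL_normalized):
    """Classify |e_L(n)| (assumed already normalized by the quantizer scale
    factor) against the 8 decision levels of Table 4, returning the index
    code I_L and the multiplier W_L used to update the adaptive quantizer."""
    mag = abs(eL_normalized)
    m = 0
    while m < 7 and mag >= QL_LOW[m + 1]:
        m += 1
    return QL_CODE[m], QL_MULT[m]
```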
The entities in FIG. 20 are computed according to their equivalents in the G.722 low-band ADPCM sub-band encoder:
s_Lz(n) = Σ_{i=1}^{6} b_{L,i}(n−1)·e_L(n−i),   (43)
s_Lp(n) = Σ_{i=1}^{2} a_{L,i}(n−1)·x_L(n−i),   (44)
s_L(n) = s_Lp(n) + s_Lz(n),   (45)
e_L(n) = x_L(n) − s_L(n), and   (46)
p_Lt(n) = s_Lz(n) + e_L(n).   (47)
The adaptive quantizer is updated exactly as specified for the G.722 encoder. The adaptation of the zero and pole sections also takes place as in the G.722 encoder, as described in clauses 3.6.3 and 3.6.4 of the G.722 standard.
The low-band ADPCM decoder 1910 is automatically reset after 60 ms of frame loss, but it may be adaptively reset as early as 30 ms into the frame loss. During the re-encoding of the low-band signal, properties of the partially reconstructed signal p_Lt(n) are monitored to control the adaptive reset of the low-band ADPCM decoder 1910. The sign of the p_Lt(n) signal is monitored over the entire loss, so the sign counter is set to zero at the first lost frame and updated as
sgn[p_Lt(n)] = sgn[p_Lt(n−1)] + 1 if p_Lt(n) > 0;  sgn[p_Lt(n−1)] if p_Lt(n) = 0;  sgn[p_Lt(n−1)] − 1 if p_Lt(n) < 0.   (48)
The constant-signal property of p_Lt(n) is monitored on a per-frame basis for lost frames, and so the property counter cnst[·] is set to zero at the beginning of each lost frame. It is updated as
cnst[p_Lt(n)] = cnst[p_Lt(n−1)] + 1 if p_Lt(n) = p_Lt(n−1);  cnst[p_Lt(n−1)] if p_Lt(n) ≠ p_Lt(n−1).   (49)
The sub-band decoder is reset at the end of lost frames 3 through 5 if the following condition is met:
|sgn[p_Lt(n)] / N_lost| > 36  or  cnst[p_Lt(n)] > 40,   (50)
where N_lost is the number of lost frames, i.e., 3, 4 or 5.
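The per-sample monitoring and the reset test apply identically to the low-band signal p_Lt(n) here and to the high-band signal p_H(n) in the next sub-section; a minimal sketch (counter resets at the start of the loss and of each lost frame are left to the caller, as described in the text):

```python
def update_plc_monitor(p, p_prev, sgn_cnt, cnst_cnt):
    """Per-sample update of the sign counter (eq. 48) and the
    constant-signal counter (eq. 49) of the partially reconstructed
    signal during lost frames."""
    if p > 0:
        sgn_cnt += 1
    elif p < 0:
        sgn_cnt -= 1
    if p == p_prev:
        cnst_cnt += 1
    return sgn_cnt, cnst_cnt

def should_reset(sgn_cnt, cnst_cnt, n_lost):
    """Reset condition (eq. 50), checked at the end of lost frames 3-5."""
    return abs(sgn_cnt / n_lost) > 36 or cnst_cnt > 40
```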
c. Re-encoding of the high-band signal
The high-band signal x_H(n) is encoded using a simplified high-band ADPCM encoder. A block diagram of the simplified high-band ADPCM encoder 2100 is shown in FIG. 21. As can be seen in FIG. 21, the adaptive quantizer of the standard high-band ADPCM encoder has been eliminated: the algorithm overwrites the log scale factor on the first received frame with the moving average from before the packet loss, so no log scale factor needs to be produced by the high-band re-encoding. The quantized prediction error of the high-band ADPCM encoder 2100 has likewise been replaced with the unquantized prediction error.
The entities in FIG. 21 are computed according to their equivalents in the G.722 high-band ADPCM sub-band encoder:
s_Hz(n) = Σ_{i=1}^{6} b_{H,i}(n−1)·e_H(n−i),   (51)
s_Hp(n) = Σ_{i=1}^{2} a_{H,i}(n−1)·x_H(n−i),   (52)
s_H(n) = s_Hp(n) + s_Hz(n),   (53)
e_H(n) = x_H(n) − s_H(n), and   (54)
p_H(n) = s_Hz(n) + e_H(n).   (55)
The adaptation of the zero and pole sections takes place as in the G.722 encoder, as described in clauses 3.6.3 and 3.6.4 of the G.722 standard.
Similarly to the low-band re-encoding, the high-band ADPCM decoder 1920 is automatically reset after 60 ms of frame loss, but it may be adaptively reset as early as 30 ms into the frame loss. During the re-encoding of the high-band signal, properties of the partially reconstructed signal p_H(n) are monitored to control the adaptive reset of the high-band ADPCM decoder 1920. The sign of the p_H(n) signal is monitored over the entire loss, so the sign counter is set to zero at the first lost frame and updated as
sgn[p_H(n)] = sgn[p_H(n−1)] + 1 if p_H(n) > 0;  sgn[p_H(n−1)] if p_H(n) = 0;  sgn[p_H(n−1)] − 1 if p_H(n) < 0.   (56)
The constant-signal property of p_H(n) is monitored on a per-frame basis for lost frames, and so the property counter cnst[·] is set to zero at the beginning of each lost frame. It is updated as
cnst[p_H(n)] = cnst[p_H(n−1)] + 1 if p_H(n) = p_H(n−1);  cnst[p_H(n−1)] if p_H(n) ≠ p_H(n−1).   (57)
The sub-band decoder is reset at the end of lost frames 3 through 5 if the following condition is met:
|sgn[p_H(n)] / N_lost| > 36  or  cnst[p_H(n)] > 40.   (58)
5. Monitoring of signal characteristics and their use in the PLC
The following describes the functions of the constraint and control logic 1970 of FIG. 19, which are used to reduce artifacts and distortion at the transition from lost frames to received frames, thereby improving the performance of the decoder/PLC system 300 after a packet loss.
A. The low-band log scale factor
During received frames, characteristics of the low-band log scale factor ∇_L are updated, and on the first received frame after a frame loss these characteristics are used to adaptively set the state of the adaptive quantizer for the scale factor. In this way a measure of the stationarity of the low-band log scale factor is obtained and used to determine the proper reset of that state.
i. Stationarity of the low-band log scale factor
During received frames, a measure of the stationarity of the low-band log scale factor ∇_L is computed and updated. It is based on a first-order moving average ∇_{L,m1} of ∇_L with constant leakage:
∇_{L,m1}(n) = 7/8·∇_{L,m1}(n−1) + 1/8·∇_L(n).   (59)
A measure of the tracking of the first-order moving average is computed as
∇_{L,trck}(n) = 127/128·∇_{L,trck}(n−1) + 1/128·|∇_{L,m1}(n) − ∇_{L,m1}(n−1)|.   (60)
Based on this tracking measure, a second-order moving average ∇_{L,m2} with adaptive leakage is computed according to
∇_{L,m2}(n) = 7/8·∇_{L,m2}(n−1) + 1/8·∇_{L,m1}(n)   if ∇_{L,trck}(n) < 3277
∇_{L,m2}(n) = 3/4·∇_{L,m2}(n−1) + 1/4·∇_{L,m1}(n)   if 3277 ≤ ∇_{L,trck}(n) < 6554
∇_{L,m2}(n) = 1/2·∇_{L,m2}(n−1) + 1/2·∇_{L,m1}(n)   if 6554 ≤ ∇_{L,trck}(n) < 9830
∇_{L,m2}(n) = ∇_{L,m1}(n)   if 9830 ≤ ∇_{L,trck}(n)   (61)
The stationarity of the low-band log scale factor is measured by the following degree of change:
∇_{L,chng}(n) = 127/128·∇_{L,chng}(n−1) + 1/128·256·|∇_{L,m2}(n) − ∇_{L,m2}(n−1)|.   (62)
During lost frames, no updates take place; in other words:
∇_{L,m1}(n) = ∇_{L,m1}(n−1)
∇_{L,trck}(n) = ∇_{L,trck}(n−1)
∇_{L,m2}(n) = ∇_{L,m2}(n−1)   (63)
∇_{L,chng}(n) = ∇_{L,chng}(n−1)
ii. Resetting of the log scale factor of the low-band adaptive quantizer
At the first received frame after a frame loss, the low-band log scale factor is adaptively reset (overridden) according to its stationarity before the frame loss:
∇_L(n−1) ← ∇_{L,m2}(n−1)   if ∇_{L,chng}(n−1) < 6554
∇_L(n−1) ← { ∇_L(n−1)·[∇_{L,chng}(n−1) − 6554] + ∇_{L,m2}(n−1)·[9830 − ∇_{L,chng}(n−1)] } / 3276   if 6554 ≤ ∇_{L,chng}(n−1) ≤ 9830
∇_L(n−1) ← ∇_L(n−1)   if 9830 < ∇_{L,chng}(n−1)   (64)
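A floating-point sketch of the received-frame update (59)-(62) and the adaptive reset (64); the text's constants are fixed-point (Q-format) values, so the thresholds below assume ∇_L is kept in the same scaling:

```python
def update_low_scale_stats(dL, st):
    """Received-frame update of the low-band log-scale-factor statistics,
    eqs. (59)-(62).  st is a dict with running values 'm1', 'trck', 'm2',
    'chng'; dL is the current low-band log scale factor."""
    m1_prev, m2_prev = st['m1'], st['m2']
    st['m1'] = 7/8 * m1_prev + 1/8 * dL                                  # (59)
    st['trck'] = 127/128 * st['trck'] + 1/128 * abs(st['m1'] - m1_prev)  # (60)
    t = st['trck']                                                       # (61)
    if t < 3277:
        st['m2'] = 7/8 * m2_prev + 1/8 * st['m1']
    elif t < 6554:
        st['m2'] = 3/4 * m2_prev + 1/4 * st['m1']
    elif t < 9830:
        st['m2'] = 1/2 * m2_prev + 1/2 * st['m1']
    else:
        st['m2'] = st['m1']
    st['chng'] = 127/128 * st['chng'] + 2 * abs(st['m2'] - m2_prev)      # (62)
    return st

def reset_low_scale_factor(dL_prev, st):
    """Adaptive reset of the low-band log scale factor on the first
    received frame after a loss, eq. (64)."""
    chng, m2 = st['chng'], st['m2']
    if chng < 6554:
        return m2
    if chng <= 9830:
        w = (chng - 6554) / 3276.0            # linear blend, (chng-6554)+(9830-chng)=3276
        return w * dL_prev + (1.0 - w) * m2
    return dL_prev
```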
B. The high-band log scale factor
During received frames, characteristics of the high-band log scale factor ∇_H are updated, and on the first received frame after a frame loss these characteristics are used to set the state of the adaptive-quantizer scale factor. In addition, the characteristics adaptively control the re-convergence of the high-band log scale factor after a frame loss.
i. Moving average and stationarity of the high-band log scale factor
A tracking measure ∇_{H,trck}(n) of the high-band log scale factor is computed according to equation (65).
Based on the tracking measure, a moving average with adaptive leakage is computed as
∇_{H,m}(n) = 255/256·∇_{H,m}(n−1) + 1/256·∇_H(n)   if |∇_{H,trck}(n)| < 1638
∇_{H,m}(n) = 127/128·∇_{H,m}(n−1) + 1/128·∇_H(n)   if 1638 ≤ |∇_{H,trck}(n)| < 3277
∇_{H,m}(n) = 63/64·∇_{H,m}(n−1) + 1/64·∇_H(n)   if 3277 ≤ |∇_{H,trck}(n)| < 4915
∇_{H,m}(n) = 31/32·∇_{H,m}(n−1) + 1/32·∇_H(n)   if 4915 ≤ |∇_{H,trck}(n)|   (66)
This moving average is used to reset the high-band log scale factor on the first received frame, as described in a later sub-section.
The degree of stationarity of the high-band log scale factor is computed from the moving average as
∇_{H,chng}(n) = 127/128·∇_{H,chng}(n−1) + 1/128·256·|∇_{H,m}(n) − ∇_{H,m}(n−1)|.   (67)
This stationarity measure is used to control the re-convergence of ∇_H after a frame loss, as described in a later sub-section.
During lost frames, no updates take place; in other words:
∇_{H,trck}(n) = ∇_{H,trck}(n−1)
∇_{H,m}(n) = ∇_{H,m}(n−1)   (68)
∇_{H,chng}(n) = ∇_{H,chng}(n−1)
ii. Resetting of the log scale factor of the high-band adaptive quantizer
On the first received frame, the high-band log scale factor is reset to the moving average of the received frames before the packet loss:
∇_H(n−1) ← ∇_{H,m}(n−1).   (69)
iii. Convergence of the log scale factor of the high-band adaptive quantizer
The convergence of the high-band log scale factor after a frame loss is controlled by the stationarity measure ∇_{H,chng} from before the frame loss. For stationary cases, an adaptive low-pass filter is applied to ∇_H after the packet loss. The low-pass filtering is applied over a duration of 0 ms, 40 ms or 80 ms, with the degree of low-pass filtering gradually reduced over that period. The duration in samples, N_{LP,∇H}, is determined as follows:
N_{LP,∇H} = 640   if ∇_{H,chng} < 819
N_{LP,∇H} = 320   if ∇_{H,chng} < 1311
N_{LP,∇H} = 0   if ∇_{H,chng} ≥ 1311   (70)
The low-pass filter is given by
∇_{H,LP}(n) = α_LP(n)·∇_{H,LP}(n−1) + (1 − α_LP(n))·∇_H(n),   (71)
where the coefficient is given by
α_LP(n) = 1 − ((n + 1)/(N_{LP,∇H} + 1))²,  n = 0, 1, ..., N_{LP,∇H} − 1.   (72)
Hence the degree of low-pass filtering is reduced sample by sample with time n. During the first N_{LP,∇H} samples, the low-pass filtered log scale factor simply replaces the regular log scale factor.
C, low strap pole segment
During received frame for the subband adpcm decoder upgrades the entity (entity) be referred to as (pole segment) engine sta bility margin, to be used to retrain the pole segment after the LOF.
The engine sta bility margin of i, low strap pole segment
The engine sta bility margin of low strap pole segment is defined as
β L(n)=1-|a L,1(n)|-a L,2(n), (73)
A wherein L, 1(n) and a L, 2(n) be two limit coefficients.The moving average of the engine sta bility margin during the received frame is upgraded according to following formula:
β L,MA(n)=15/16·β L,MA(n-1)+1/16·β L(n) (74)
During lost frames, do not upgrade moving average:
β L,MA(n)=β L,MA(n-1). (75)
ii. Constraint on the low-band pole section
In regular G.722 low-band (and high-band) ADPCM encoding and decoding, a minimum stability margin of β_{L,min} = 1/16 is maintained. During the first 40 ms after a frame loss, an increased minimum stability margin is maintained for the low-band ADPCM decoder; it is a function of both the time since the frame loss and the moving average of the stability margin.
For the first three 10 ms frames, the minimum stability margin
β_{L,min} = min{3/16, β_{L,MA}(n−1)}   (76)
is set at the frame boundary and enforced over the entire frame. At the frame boundary entering the fourth 10 ms frame, the minimum stability margin
β_{L,min} = min{2/16, 1/16 + β_{L,MA}(n−1)/2}   (77)
is enforced, while for all other frames the regular minimum stability margin of β_{L,min} = 1/16 is enforced.
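The constraint itself is stated as a minimum stability margin; how the pole coefficients are modified when the margin is violated is not spelled out in this passage, so the rescaling in the sketch below is purely illustrative. The frame counter convention is also an assumption.

```python
def constrain_low_pole_margin(aL1, aL2, frames_since_loss, beta_MA):
    """Enforce the increased minimum stability margin of eqs. (76)-(77)
    during the first 40 ms (four 10 ms frames) after a loss.
    frames_since_loss = 0, 1, 2, 3, ... counts received frames after the
    loss (assumed convention).  Returns possibly rescaled (aL1, aL2)."""
    if frames_since_loss < 3:
        beta_min = min(3.0 / 16.0, beta_MA)                       # eq. (76)
    elif frames_since_loss == 3:
        beta_min = min(2.0 / 16.0, 1.0 / 16.0 + beta_MA / 2.0)    # eq. (77)
    else:
        beta_min = 1.0 / 16.0                                     # regular margin
    beta = 1.0 - abs(aL1) - aL2                                   # eq. (73)
    if beta >= beta_min:
        return aL1, aL2
    # Illustrative only: shrink both pole coefficients proportionally so the
    # margin becomes exactly beta_min (not the normative correction).
    scale = (1.0 - beta_min) / (abs(aL1) + aL2)
    return aL1 * scale, aL2 * scale
```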
D. High-band partially reconstructed signal and high-band reconstructed signal
During all frames, lost as well as received, high-pass filtered versions of the high-band partially reconstructed signal p_H(n) and the high-band reconstructed signal r_H(n) are maintained:
p_{H,HP}(n) = 0.97·[p_H(n) − p_H(n−1) + p_{H,HP}(n−1)], and   (78)
r_{H,HP}(n) = 0.97·[r_H(n) − r_H(n−1) + r_{H,HP}(n−1)].   (79)
This corresponds to a 3 dB cut-off at roughly 40 Hz and essentially serves to remove DC.
During the first 40 ms after a frame loss, the regular partially reconstructed signal and the regular reconstructed signal are replaced by their respective high-pass filtered versions, for the high-band pole section adaptation and for the high-band reconstructed output, respectively.
6. Time lag computation
The re-phasing and time-warping techniques discussed herein require the number of samples by which the frame-loss concealment waveform x_PLC(j) and the signal in the first received frame are misaligned.
a. Low-complexity estimate of the low sub-band reconstructed signal
The signal used in the first received frame to compute the time lag is obtained by filtering the low sub-band truncated difference signal d_Lt(n) with the pole-zero filter coefficients (a_{Lpwe,i}(159), b_{Lpwe,i}(159)) and the other required state information obtained from STATE_159:
r_Le(n) = Σ_{i=1}^{2} a_{Lpwe,i}(159)·r_Le(n−i) + Σ_{i=1}^{6} b_{Lpwe,i}(159)·d_Lt(n−i) + d_Lt(n),  n = 0, 1, ..., 79.   (80)
This function is performed by module 1820 of FIG. 18.
b. Determining the need for re-phasing and time warping
If the last received frame is unvoiced, as indicated by the figure of merit, the time lag T_L is simply set to zero:
if merit ≤ MLO, then T_L = 0.   (81)
Furthermore, if the first received frame is unvoiced, as indicated by the normalized first autocorrelation coefficient
r(1) = [ Σ_{n=0}^{78} r_Le(n)·r_Le(n+1) ] / [ Σ_{n=0}^{78} r_Le(n)·r_Le(n) ],   (82)
the time lag is likewise set to zero:
if r(1) < 0.125, then T_L = 0.   (83)
Otherwise, the time lag is computed as explained below. The time lag computation is performed by module 1850 of FIG. 18.
c. Computation of the time lag
The computation of the time lag comprises the following steps: (1) generation of the extrapolated signal; (2) coarse time lag search; and (3) refined time lag search. These are described in the following sub-sections.
i. Generation of the extrapolated signal
The time lag represents the misalignment between x_PLC(j) and r_Le(n). To compute this misalignment, x_PLC(j) is extended into the first received frame and a normalized cross-correlation function is maximized. This sub-section describes how x_PLC(j) is extrapolated and specifies the required length of the signal. Assume that x_PLC(j) has been copied into the x_out(j) buffer; because this is a type 5 frame (the first received frame), the correspondence is
x_out(j − 160) = x_PLC(j),  j = 0, 1, ..., 159.   (84)
The correlation search range Δ_TL is given by equation (85), where Δ_TLMAX = 28 and ppfe is the pitch period used for the periodic waveform extrapolation that produced x_PLC(j).
The lag search window size at the 16 kHz sampling rate is given by equation (86). It is convenient to also specify the lag search window size LSW at the 8 kHz sampling rate, given by equation (87).
As given above, the total length of the extrapolated signal that needs to be derived from x_PLC(j) is
L = 2·(LSW + Δ_TL).   (88)
The starting position of the extrapolated signal relative to the first sample of the received frame is
D = 12 − Δ_TL.   (89)
The extrapolated signal es(j) is reconstructed according to the following procedure:
If D < 0:
   es(j) = x_out(D + j),  j = 0, 1, ..., −D−1
   If (L + D ≤ ppfe):
      es(j) = x_out(−ppfe + D + j),  j = −D, −D+1, ..., L−1
   Otherwise:
      es(j) = x_out(−ppfe + D + j),  j = −D, −D+1, ..., ppfe−D−1
      es(j) = es(j − ppfe),  j = ppfe−D, ppfe−D+1, ..., L−1
Otherwise (D ≥ 0), the offset ovs is first computed, and then:
   If (ovs ≥ L):
      es(j) = x_out(−ovs + j),  j = 0, 1, ..., L−1
   Otherwise:
      If (ovs > 0):
         es(j) = x_out(−ovs + j),  j = 0, 1, ..., ovs−1
      If (L − ovs ≤ ppfe):
         es(j) = x_out(−ovs − ppfe + j),  j = ovs, ovs+1, ..., L−1
      Otherwise:
         es(j) = x_out(−ovs − ppfe + j),  j = ovs, ovs+1, ..., ovs+ppfe−1
         es(j) = es(j − ppfe),  j = ovs+ppfe, ovs+ppfe+1, ..., L−1
ii. Coarse time lag search
A rough estimate of the time lag, T_LSUB, is first computed by searching for the peak of the sub-sampled normalized cross-correlation function R_SUB(k):
R_SUB(k) = [ Σ_{i=0}^{LSW/2−1} es(4i − k + Δ_TL)·r_Le(2i) ] / sqrt[ Σ_{i=0}^{LSW/2−1} es²(4i − k + Δ_TL) · Σ_{i=0}^{LSW−1} r_Le²(2i) ],  k = −Δ_TL, −Δ_TL+4, −Δ_TL+8, ..., Δ_TL.   (90)
To avoid the refined search below going out of bounds, T_LSUB is adjusted as follows:
if T_LSUB > Δ_TLMAX − 4, then T_LSUB = Δ_TLMAX − 4;   (91)
if T_LSUB < −Δ_TLMAX + 4, then T_LSUB = −Δ_TLMAX + 4.   (92)
iii. Refined time lag search
The refined time lag T_L is then found by searching for the peak of R(k), given by
R(k) = [ Σ_{i=0}^{LSW−1} es(2i − k + Δ_TL)·r_Le(i) ] / sqrt[ Σ_{i=0}^{LSW−1} es²(2i − k + Δ_TL) · Σ_{i=0}^{LSW−1} r_Le²(i) ],  k = −4+T_LSUB, −2+T_LSUB, ..., 4+T_LSUB.   (93)
Finally, the following conditions are checked:
Σ_{i=0}^{LSW−1} r_Le²(i) = 0,   (94)
or  Σ_{i=0}^{LSW−1} es(2i − T_L + Δ_TL)·r_Le(i) ≤ 0.25·Σ_{i=0}^{LSW−1} r_Le²(i),   (95)
or  (T_L > Δ_TLMAX − 2) or (T_L < −Δ_TLMAX + 2).   (96)
If any of these conditions holds, then T_L = 0.
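A sketch of the coarse-plus-refined lag search of equations (90)-(93); the square root in the denominator of the normalized cross-correlation and the omission of the final validity checks (94)-(96) are assumptions/simplifications of this sketch.

```python
import numpy as np

def normalized_xcorr(es, rLe, k, dTL, LSW, step):
    """R(k) of eqs. (90)/(93): correlation between the extrapolated signal
    es (16 kHz indexing) and the re-decoded low-band signal rLe (8 kHz).
    step=2 gives the sub-sampled coarse form, step=1 the refined form."""
    idx = np.arange(LSW // step)
    e = np.asarray([es[step * 2 * i - k + dTL] for i in idx], dtype=float)
    r = np.asarray([rLe[step * i] for i in idx], dtype=float)
    den = np.sqrt(np.dot(e, e) * np.dot(r, r))
    return float(np.dot(e, r)) / den if den > 0 else 0.0

def time_lag(es, rLe, dTL=28, dTLMAX=28, LSW=80):
    """Coarse search on a 4-sample grid followed by a refined search
    (step 2, +/-4 around the coarse lag), with the clamping of
    eqs. (91)-(92)."""
    coarse = range(-dTL, dTL + 1, 4)
    TLSUB = max(coarse, key=lambda k: normalized_xcorr(es, rLe, k, dTL, LSW, 2))
    TLSUB = min(max(TLSUB, -dTLMAX + 4), dTLMAX - 4)
    fine = range(TLSUB - 4, TLSUB + 5, 2)
    return max(fine, key=lambda k: normalized_xcorr(es, rLe, k, dTL, LSW, 1))
```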
7. Re-phasing
Re-phasing is the process of setting the internal states to a point in time where the frame-loss concealment waveform x_PLC(j) is in phase with the last input signal sample immediately before the first received frame. Re-phasing can be divided into the following steps: (1) storing intermediate G.722 states during the re-encoding of the lost frames; (2) adjusting the re-encoding according to the time lag; and (3) updating the QMF synthesis filter memory. The following sub-sections describe these steps in more detail. Re-phasing is performed by module 1810 of FIG. 18.
a. Storage of intermediate G.722 states during re-encoding
As described elsewhere in this application, the reconstructed signal x_PLC(j) is re-encoded during lost frames to update the G.722 decoder state memory. Let STATE_j denote the G.722 state and PLC state after re-encoding the j-th sample of x_PLC(j). Then, in addition to the G.722 state at the frame boundary that would normally be maintained (i.e., STATE_159), the state STATE_{159−Δ_TLMAX} is also stored. To facilitate re-phasing, the sub-band signals
x_L(n), x_H(n),  n = 69 − Δ_TLMAX/2, ..., 79 + Δ_TLMAX/2
are stored as well.
b. Adjustment of the re-encoding according to the time lag
Depending on the sign of the time lag, the re-encoding is adjusted as follows:
If Δ_TL > 0:
   1. Restore the G.722 state and PLC state to STATE_{159−Δ_TLMAX}.
   2. Re-encode x_L(n), x_H(n), n = 80 − Δ_TLMAX/2, ..., 79 − Δ_TL/2, in the manner described above.
If Δ_TL < 0:
   1. Restore the G.722 state and PLC state to STATE_159.
   2. Re-encode x_L(n), x_H(n), n = 80, ..., 79 + |Δ_TL/2|, in the manner described above.
Note that, in order to facilitate re-encoding x_L(n) and x_H(n) up to n = 79 + |Δ_TL/2|, x_PLC(j) is needed for up to Δ_TLMAX + 182 samples.
c. Updating the QMF synthesis filter memory
On the first received frame, the QMF synthesis filter memory needs to be calculated, because the QMF synthesis filter bank is idle during lost frames (the PLC takes place in the 16 kHz output speech domain). In time, this memory would normally correspond to the last samples of the last lost frame; however, the re-phasing must be taken into account. According to G.722, the QMF synthesis filter memory is given by
x_d(i) = r_L(n−i) − r_H(n−i),  i = 1, 2, ..., 11, and   (97)
x_s(i) = r_L(n−i) + r_H(n−i),  i = 1, 2, ..., 11.   (98)
The first two output samples of the first received frame are then computed as
x_out(j) = 2·Σ_{i=0}^{11} h_{2i}·x_d(i), and   (99)
x_out(j+1) = 2·Σ_{i=0}^{11} h_{2i+1}·x_s(i).   (100)
The filter memory, i.e., x_d(i) and x_s(i), i = 1, 2, ..., 11, is calculated from the last 11 samples of the re-phased input to the simplified sub-band ADPCM encoders when re-encoding x_L(n) and x_H(n) for n = 69 − Δ_TL/2, 69 − Δ_TL/2 + 1, ..., 79 − Δ_TL/2 (that is, up to the last sample before the re-phasing point):
x_d(i) = x_L(80 − Δ_TL/2 − i) − x_H(80 − Δ_TL/2 − i),  i = 1, 2, ..., 11, and   (101)
x_s(i) = x_L(80 − Δ_TL/2 − i) + x_H(80 − Δ_TL/2 − i),  i = 1, 2, ..., 11,   (102)
where x_L(n) and x_H(n) were stored in the state memory during the lost frames.
8. Time warping
Time warping is the process of stretching or shrinking a signal along the time axis. The following describes how x_out(j) is time-warped to improve its alignment with the periodic waveform extrapolation signal x_PLC(j). The algorithm is executed only if T_L ≠ 0. Time warping is performed by module 1860 of FIG. 18.
a. Time lag refinement
The time lag T_L used for time warping is refined by maximizing the cross-correlation in the overlap-add window. The starting position of the overlap-add window in the first received frame, estimated from T_L, is
SP_OLA = max(0, MIN_UNSTBL − T_L),   (103)
where MIN_UNSTBL = 16.
The starting position of the extrapolated signal relative to SP_OLA is
D_ref = SP_OLA − T_L − RSR,   (104)
where RSR = 4 is the refinement search range.
The required length of the extrapolated signal is
L_ref = OLALG + RSR.   (105)
The extrapolated signal es_tw(j) is obtained using the same procedure described in sub-section 6.c.i above, except that LSW = OLALG, L = L_ref and D = D_ref.
The refined lag T_ref is obtained by searching for the peak of
R(k) = [ Σ_{i=0}^{OLALG−1} es_tw(i − k + RSR)·x_out(i + SP_OLA) ] / sqrt[ Σ_{i=0}^{OLALG−1} es_tw²(i − k + RSR) · Σ_{i=0}^{OLALG−1} x_out²(i + SP_OLA) ],  k = −RSR, −RSR+1, ..., RSR.   (106)
The final time lag used for the time warping is then obtained as
T_Lwarp = T_L + T_ref.   (107)
b. Computing the time-warped x_out(j) signal
The signal x_out(j) is time-warped by T_Lwarp samples to form the signal x_warp(j), which is subsequently overlap-added with the waveform extrapolation signal es_ola(j). The timelines 2200, 2220 and 2240 of FIGS. 22A, 22B and 22C illustrate three cases, depending on the value of T_Lwarp. In FIG. 22A, T_Lwarp < 0 and x_out(j) undergoes shrinking, or compression; the first MIN_UNSTBL samples of x_out(j) are not used in the warping process to create x_warp(j), and xstart = MIN_UNSTBL. In FIG. 22B, 0 ≤ T_Lwarp ≤ MIN_UNSTBL, and x_out(j) is stretched by T_Lwarp samples; again, the first MIN_UNSTBL samples of x_out(j) are not used, and xstart = MIN_UNSTBL. In FIG. 22C, T_Lwarp ≥ MIN_UNSTBL, and x_out(j) is again stretched by T_Lwarp samples; however, since an extra T_Lwarp samples are created by the warping process, the first T_Lwarp samples of x_out(j) are not needed in this case, and therefore xstart = T_Lwarp.
In each case, the number of samples per sample add/drop is
spad = (160 − xstart) / |T_Lwarp|.   (108)
The warping is realized via piece-wise single-sample shifts and triangular overlap-adds, starting from x_out[xstart]. To shrink the signal, a sample is periodically dropped; from the point of the dropped sample, the original signal and the signal shifted left (due to the drop) are overlap-added. To stretch the signal, a sample is periodically repeated; from the point of the repeated sample, the original signal and the signal shifted right (due to the repeat) are overlap-added. The length of the overlap-add window, L_olawarp (note: this is different from the OLA region shown in FIGS. 22A, 22B and 22C), depends on the periodicity of the sample add/drop, as follows:
If T_Lwarp < 0:  L_olawarp = (160 − xstart − |T_Lwarp|) / |T_Lwarp|; a corresponding expression is used otherwise. In both cases the result is limited according to L_olawarp = min(8, L_olawarp).
The length of the time-warped input signal x_warp is
L_xwarp = min(160, 160 − MIN_UNSTBL + T_Lwarp).   (110)
c. Computing the waveform extrapolation signal
As shown in FIGS. 22A, 22B and 22C, the warped signal x_warp is overlap-added in the first received frame with the extrapolated signal es_ola(j). The extrapolated signal es_ola(j) can be generated directly in the x_out(j) signal buffer according to the following two steps:
Step 1:
es_ola(j) = x_out(j) = ptfe·x_out(j − ppfe),  j = 0, 1, ..., 160 − L_xwarp + 39.   (111)
Step 2:
x_out(j) = x_out(j)·wi(j) + ring(j)·wo(j),  j = 0, 1, ..., 39,   (112)
where wi(j) and wo(j) are triangular ramp-up and ramp-down overlap-add windows of length 40, and ring(j) is the ringing signal computed in the manner described elsewhere in this application.
d. Overlap-add of the time-warped signal and the waveform extrapolation signal
The extrapolated signal computed in the preceding sub-section is overlap-added with the warped signal x_warp(j) as follows:
x_out(160 − L_xwarp + j) = x_out(160 − L_xwarp + j)·wo(j) + x_warp(j)·wi(j),  j = 0, 1, ..., 39.   (113)
The remainder of x_warp(j) is then simply copied into the signal buffer:
x_out(160 − L_xwarp + j) = x_warp(j),  j = 40, 41, ..., L_xwarp − 1.   (114)
E. Packet loss concealment for a sub-band predictive coder based on sub-band speech waveform extrapolation
The decoder/PLC system 2300 shown in FIG. 23 is an alternative embodiment of the present invention. Most of the techniques developed above for decoder/PLC system 300 can also be used in this second exemplary embodiment. The key difference between decoder/PLC system 2300 and decoder/PLC system 300 is that the speech waveform extrapolation is performed in the sub-band speech signal domain rather than in the full-band speech signal domain.
As shown in FIG. 23, decoder/PLC system 2300 comprises a bit-stream de-multiplexer 2310, a low-band ADPCM decoder 2320, a low-band speech signal synthesizer 2322, a switch 2326, a high-band ADPCM decoder 2330, a high-band speech signal synthesizer 2332, a switch 2336 and a QMF synthesis filter bank 2340. The bit-stream de-multiplexer 2310 is essentially the same as the bit-stream de-multiplexer 210 of FIG. 2, and the QMF synthesis filter bank 2340 is essentially the same as the QMF synthesis filter bank 240 of FIG. 2.
Like decoder/PLC system 300 of FIG. 3, decoder/PLC system 2300 processes frames in a manner that depends on the frame type, using the same frame types as in FIG. 5 described above.
When processing a type 1 frame, decoder/PLC system 2300 performs standard G.722 decoding. In this mode of operation, the modules 2310, 2320, 2330 and 2340 of decoder/PLC system 2300 perform the same functions as the corresponding modules 210, 220, 230 and 240 of the conventional G.722 decoder 200, respectively. Specifically, the bit-stream de-multiplexer 2310 separates the input bit stream into a low-band bit stream and a high-band bit stream. The low-band ADPCM decoder 2320 decodes the low-band bit stream into a decoded low-band speech signal. The switch 2326 is connected to the upper position labeled "Type 1", connecting the decoded low-band speech signal to the QMF synthesis filter bank 2340. The high-band ADPCM decoder 2330 decodes the high-band bit stream into a decoded high-band speech signal. The switch 2336 is likewise connected to the upper position labeled "Type 1", connecting the decoded high-band speech signal to the QMF synthesis filter bank 2340. The QMF synthesis filter bank 2340 then recombines the decoded low-band and high-band speech signals into the full-band output speech signal.
Therefore, when processing a type 1 frame, decoder/PLC system 2300 is equivalent to the decoder 200 of FIG. 2, except that the decoded low-band speech signal is stored in the low-band speech signal synthesizer 2322 and the decoded high-band speech signal is stored in the high-band speech signal synthesizer 2332, for possible use in subsequent lost frames. Other state updates and processing in anticipation of performing PLC operations may also be carried out.
When processing frames of type 2, type 3 and type 4 (lost frames), the decoded speech signal of each sub-band is extrapolated from the stored sub-band speech signals associated with the previous frames, to fill up the waveform gap associated with the current lost frame. This waveform extrapolation is performed by the low-band speech signal synthesizer 2322 and the high-band speech signal synthesizer 2332. There are many prior-art techniques for performing the waveform extrapolation function of modules 2322 and 2332. For example, the techniques described in U.S. patent application Ser. No. 11/234,291 to Chen, filed September 26, 2005, entitled "Packet Loss Concealment for Block-Independent Speech Codecs", may be used, or variations of those techniques, such as the techniques described above in connection with decoder/PLC system 300 of FIG. 3.
When processing a frame of type 2, type 3 or type 4, the switches 2326 and 2336 are in the lower position labeled "Types 2-6". They thus connect the synthesized low-band audio signal and the synthesized high-band audio signal to the QMF synthesis filter bank 2340, which recombines them into the synthesized output speech signal for the current lost frame.
Similarly to decoder/PLC system 300, the first few received frames immediately following a bad frame (frames of type 5 and type 6) require special handling to minimize the speech quality degradation caused by the mismatched G.722 states, and to ensure a smooth transition from the extrapolated speech signal waveform in the last lost frame to the decoded speech signal waveform in the first few good frames following the last bad frame. Therefore, when processing these frames, the switches 2326 and 2336 remain in the lower position labeled "Types 2-6", so that the decoded low-band speech signal from the low-band ADPCM decoder 2320 can be modified by the low-band speech signal synthesizer 2322 before being provided to the QMF synthesis filter bank 2340, and the decoded high-band speech signal from the high-band ADPCM decoder 2330 can be modified by the high-band speech signal synthesizer 2332 before being provided to the QMF synthesis filter bank 2340.
Persons skilled in the relevant art(s) will readily appreciate that most of the techniques described in sub-sections C and D above for the special handling of the first few frames after a packet loss can easily be applied to this exemplary embodiment as well. For example, decoding constraint and control logic (not shown in FIG. 23) may be included in decoder/PLC system 2300 to constrain and control the decoding operations performed by the low-band ADPCM decoder 2320 and the high-band ADPCM decoder 2330 when processing type 5 and type 6 frames, in a manner similar to that described above with reference to decoder/PLC system 300. Likewise, each of the sub-band speech signal synthesizers 2322 and 2332 may be used to perform re-phasing and time-warping techniques such as those described above with reference to decoder/PLC system 300. Because a complete description of these techniques has been provided in earlier sections, a repeated description of their use in the context of decoder/PLC system 2300 is not necessary here.
Compared with decoder/PLC system 300, the main advantage of decoder/PLC system 2300 is its lower complexity. This is because extrapolating the speech signals in the sub-band domain eliminates the need to split the extrapolated full-band speech signal into sub-band speech signals with a QMF analysis filter bank, as is done in the first exemplary embodiment. However, extrapolating the speech signal in the full-band domain also has its advantages, as explained below.
There are some potential problems when the system 2300 of FIG. 23 extrapolates the high-band speech signal. First, if it does not perform periodic waveform extrapolation on the high-band speech signal, the output speech signal will not preserve the periodicity of the high-band speech signal that may be present in some highly periodic voiced signals. On the other hand, even if it does perform periodic waveform extrapolation on the high-band speech signal using the same pitch period used for extrapolating the low-band speech signal (which reduces computation and ensures that the two sub-band speech signals are extrapolated with the same pitch period), another problem remains. When the high-band speech signal is periodically extrapolated, the extrapolated high-band speech signal will be periodic and will have a harmonic structure in its spectrum; in other words, the frequencies of the spectral peaks in the spectrum of the high-band speech signal will be related by integer multiples. However, once the synthesis filter bank 2340 recombines the high-band speech signal with the low-band speech signal, the spectrum of the high-band speech signal will be "translated", or shifted, to higher frequencies, possibly with mirror imaging, depending on the QMF synthesis filter bank used. Therefore, after this mirror imaging and frequency shifting, there is no guarantee that the spectral peaks in the high-band portion of the full-band output speech signal will still lie at frequencies that are integer multiples of the fundamental frequency of the low-band speech signal. This can potentially degrade the output audio quality for highly periodic voiced signals. In contrast, the system 300 of FIG. 3 does not have this problem: because system 300 performs the audio signal extrapolation in the full-band domain, the frequencies of the harmonic peaks in the high band are guaranteed to be integer multiples of the fundamental frequency.
In summary, decoder/PLC system 300 has the advantage that, for voiced signals, full-band extrapolation of the speech signal preserves the harmonic structure of the spectral peaks over the entire speech band. Decoder/PLC system 2300, on the other hand, has the advantage of lower complexity, but it cannot preserve this harmonic structure in the higher sub-band.
F, hardware and software are realized
Complete in order to guarantee, the invention provides the following description of general calculation machine system.The present invention can realize in the combination of hardware or software and hardware.Therefore, the present invention can realize in the environment of computer system or other disposal system.Figure 24 shows an example of this computer system 2400.In the present invention, more than all decodings of in C, D and E joint, describing with PLC operation can on one or more different computer systems 2400, carry out, to realize the whole bag of tricks of the present invention.
Computer system 2400 comprises one or more processors, like processor 2404.Processor 2404 can be specific use or general digital signal processor.Processor 2404 is connected to communication construction 2402 (for example, bus or network).Various softwares are realized describing according to this exemplary computer system.After reading this narration, the those skilled in the art will be readily appreciated that how to use other computer system and/or computer organization to realize the present invention.
Computer system 2400 also comprises primary memory 2406, preferably at random at reservoir (RAM), also can comprise second memory 2420.Second memory 2420 can comprise, for example, hard disk drive 2422 and/or mobile storage driver 2424, representational have floppy disk, tape drive, CD drive or the like.Mobile storage unit 2428 is read and/or write to mobile storage driver 2424 in a well-known manner.There are floppy disk, tape, CD or the like in mobile storage unit 2428, and it carries out read and write by mobile storage driver 2424.Should know that mobile storage unit 2428 comprises computer software and/data storage computing machine are therein used storage medium.
In optional realization, second memory 2420 can comprise other similar device, is used for computer program or other instruction are written into computer system 2400.This device also comprises, for example, and mobile storage unit 2430 and interface 2426.The example of this device comprises programming box (cartridge) and cartridge interface (like what in video-game is provided with, use), removable memory chip (like EPROM or PROM) and associated slots, other makes data be transferred to the mobile storage unit 2430 and the interface 2426 of computer system 2400 from mobile storage unit 2430.
Computer system 2400 also comprises communication interface 2440.Communication interface 2440 is transmitted software and data between computer system 2400 and outer setting.The example of communication interface 2440 comprises modulator-demodular unit, network interface (like Ethernet card), COM1, PCMCIA groove and card etc.The software through communication interface 2440 transmission and the form of data can be that electricity, electromagnetism, light or other can be by the signals of communication interface 2440 receptions.Can these signals be provided to communication interface 2440 through communication port 2442.Communication port 2442 transmits signal, and it can use electric wire or cable, optical fiber, telephone wire, mobile phone connection, RF connects and other communication channel realizes.
As used herein, the terms "computer program medium" and "computer-usable medium" refer generally to media such as removable storage unit 2428, a hard disk installed in hard disk drive 2422, and signals received by communication interface 2440. These computer program products are means for providing software to computer system 2400.
Computer programs (also called computer control logic) are stored in main memory 2406 and/or secondary memory 2420. Computer programs may also be received via communication interface 2440. Such computer programs, when executed, enable computer system 2400 to implement the present invention as discussed herein. In particular, the computer programs, when executed, enable processor 2404 to implement the processes of the present invention, such as any of the methods described herein. Accordingly, such computer programs represent controllers of computer system 2400. Where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 2400 using removable storage drive 2424, interface 2426, or communication interface 2440.
In another embodiment, features of the invention are implemented primarily in hardware using, for example, hardware components such as application-specific integrated circuits (ASICs) and gate arrays. Implementation of a hardware state machine to perform the functions described herein will also be apparent to persons skilled in the relevant art.
G. Conclusion
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. Accordingly, the scope of protection of the present invention should not be limited to any of the above-described embodiments, but should be defined by the claims and their equivalents.

Claims (9)

1. A method for concealing the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system, characterized in that said method comprises:
decoding one or more received frames in said series of frames to generate a full-band output audio signal, wherein said full-band output audio signal comprises a combination of at least a first sub-band decoded audio signal and a second sub-band decoded audio signal;
storing said full-band output audio signal corresponding to said one or more received frames; and
synthesizing a full-band output audio signal corresponding to said lost frame, wherein synthesizing the full-band output audio signal corresponding to said lost frame comprises performing waveform extrapolation based on the stored full-band output audio signal corresponding to said one or more received frames; wherein performing waveform extrapolation based on the stored full-band output audio signal corresponding to said one or more received frames comprises:
calculating a pitch period associated with the stored full-band output audio signal, wherein calculating said pitch period comprises:
down-sampling the stored full-band output audio signal to produce a down-sampled full-band output audio signal;
performing a search based on the down-sampled full-band output audio signal to determine a coarse pitch period; and
performing a search based on the stored full-band output audio signal within a predetermined range around said coarse pitch period to determine a refined pitch period;
performing waveform extrapolation based on the stored full-band output audio signal corresponding to said one or more received frames further comprises:
if a figure of merit is less than a figure-of-merit high threshold, filtering a scaled white Gaussian noise signal through a short-term synthesis filter so that it has the same spectral envelope as that of the full-band output audio signal of the most recent frame, wherein said figure of merit is used to determine the mixing ratio between a periodic extrapolated waveform and the filtered noise waveform during the lost frame;
if the figure of merit is greater than or equal to a figure-of-merit low threshold and less than or equal to the figure-of-merit high threshold, mixing said periodic waveform extrapolation signal and said filtered noise signal to synthesize the full-band output audio signal corresponding to said lost frame; wherein synthesizing the full-band output audio signal associated with said lost frame specifically comprises:
calculating the figure of merit, and determining scale factors Gp and Gr according to the figure of merit; and
determining, based on said scale factors Gp and Gr, the proportions of said periodic waveform extrapolation signal and said filtered noise signal used in the mixing.
2. The method according to claim 1, characterized in that decoding the one or more received frames in said series of frames comprises:
demultiplexing an input bit stream associated with a received frame into at least a first sub-band bit stream and a second sub-band bit stream;
decoding said first sub-band bit stream in a first decoder to produce said first sub-band decoded audio signal; and
decoding said second sub-band bit stream in a second decoder to produce said second sub-band decoded audio signal.
3. The method according to claim 2, characterized in that:
said first decoder is a low-band adaptive differential pulse code modulation decoder; and
said second decoder is a high-band adaptive differential pulse code modulation decoder.
4. The method according to claim 2, characterized in that said method further comprises:
after synthesizing the full-band output audio signal associated with said lost frame, updating the internal states of said first decoder and said second decoder, wherein updating the internal states of said first decoder and said second decoder comprises encoding the full-band output audio signal associated with said lost frame.
5. A system for concealing the effect of a lost frame in a series of frames representing an encoded audio signal in a sub-band predictive coding system, characterized by comprising:
a decoder for decoding one or more received frames in the series of frames representing the encoded audio signal to generate a full-band output audio signal, wherein said full-band output audio signal comprises a combination of at least a first sub-band decoded audio signal and a second sub-band decoded audio signal;
a buffer for storing the full-band output audio signal corresponding to said one or more received frames; and
a full-band audio signal synthesizer for synthesizing a full-band output audio signal corresponding to the lost frame in said series of frames, wherein synthesizing the full-band output audio signal corresponding to said lost frame comprises performing waveform extrapolation based on the stored full-band output audio signal corresponding to said one or more received frames; wherein performing waveform extrapolation based on the stored full-band output audio signal corresponding to said one or more received frames comprises:
calculating a pitch period associated with the stored full-band output audio signal, wherein calculating said pitch period comprises:
down-sampling the stored full-band output audio signal to produce a down-sampled full-band output audio signal;
performing a search based on the down-sampled full-band output audio signal to determine a coarse pitch period; and
performing a search based on the stored full-band output audio signal within a predetermined range around said coarse pitch period to determine a refined pitch period;
said full-band audio signal synthesizer is further configured to, if a figure of merit is less than a figure-of-merit high threshold, filter a scaled white Gaussian noise signal through a short-term synthesis filter so that it has the same spectral envelope as that of the full-band output audio signal of the most recent frame, wherein said figure of merit is used to determine the mixing ratio between a periodic extrapolated waveform and the filtered noise waveform during the lost frame;
said full-band audio signal synthesizer is further configured to, if the figure of merit is greater than or equal to a figure-of-merit low threshold and less than or equal to the figure-of-merit high threshold, mix said periodic waveform extrapolation signal and said filtered noise signal to synthesize the full-band output audio signal corresponding to said lost frame; wherein synthesizing the full-band output audio signal corresponding to said lost frame comprises: calculating the figure of merit, and determining scale factors Gp and Gr according to the figure of merit; and determining, based on said scale factors Gp and Gr, the proportions of said periodic waveform extrapolation signal and said filtered noise signal used in the mixing.
6. The system according to claim 5, characterized by further comprising:
a bit stream demultiplexer for separating an input bit stream associated with a received frame into at least a first sub-band bit stream and a second sub-band bit stream;
wherein said decoder comprises:
a first decoder for decoding said first sub-band bit stream to produce said first sub-band decoded audio signal; and
a second decoder for decoding said second sub-band bit stream to produce said second sub-band decoded audio signal.
7. The system according to claim 6, characterized in that:
said first decoder is a low-band adaptive differential pulse code modulation decoder; and
said second decoder is a high-band adaptive differential pulse code modulation decoder.
8. The system according to claim 6, characterized by further comprising:
sub-band decoder state update logic for updating the internal states of said first decoder and said second decoder after the full-band output audio signal associated with said lost frame has been synthesized, wherein updating the internal states of said first decoder and said second decoder comprises encoding the full-band output audio signal associated with said lost frame.
9. The system according to claim 5, characterized in that said full-band audio signal synthesizer is configured to perform periodic waveform extrapolation based on the stored full-band output audio signal corresponding to said one or more received frames to produce the periodic waveform extrapolation signal.
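The figure-of-merit controlled mixing recited in claims 1 and 5 can be pictured with the short C sketch below. It is a hedged illustration only: the threshold values MERIT_LO and MERIT_HI, the linear mapping from the figure of merit to the scale factors Gp and Gr, and the function name are assumptions chosen for this sketch; the claims require only that Gp and Gr be determined from the figure of merit and that they set the proportions of the periodic waveform extrapolation signal and the filtered noise signal.

/* Illustrative sketch of figure-of-merit controlled mixing of a periodic
 * extrapolation signal and a filtered-noise signal for one lost frame.
 * MERIT_LO, MERIT_HI and the linear mapping of the figure of merit to the
 * scale factors Gp and Gr are example choices, not values from the patent. */
#include <stddef.h>

#define MERIT_LO 1.0
#define MERIT_HI 3.0

void mix_concealment_frame(const double *periodic,  /* periodic waveform extrapolation signal   */
                           const double *noise,     /* noise shaped by short-term synthesis filter */
                           double merit,            /* figure of merit for the lost frame       */
                           double *out, size_t frame_size)
{
    double gp, gr;

    if (merit >= MERIT_HI) {           /* strongly periodic: use the extrapolated waveform only */
        gp = 1.0;
    } else if (merit <= MERIT_LO) {    /* unvoiced-like: use the filtered noise only            */
        gp = 0.0;
    } else {                           /* in between: ramp Gp linearly with the figure of merit */
        gp = (merit - MERIT_LO) / (MERIT_HI - MERIT_LO);
    }
    gr = 1.0 - gp;                     /* Gr complements Gp                                     */

    for (size_t n = 0; n < frame_size; n++)
        out[n] = gp * periodic[n] + gr * noise[n];
}

Choosing Gr = 1 - Gp keeps the overall signal level of the concealment frame roughly comparable to that of the two component signals while moving smoothly from a purely periodic output for strongly voiced frames to a purely noise-like output for unvoiced frames, which is the qualitative behavior that the thresholds in claims 1 and 5 describe.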
CN200780001854XA 2006-08-15 2007-08-15 Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform Expired - Fee Related CN101366079B (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US83762706P 2006-08-15 2006-08-15
US60/837,627 2006-08-15
US84804906P 2006-09-29 2006-09-29
US84805106P 2006-09-29 2006-09-29
US60/848,049 2006-09-29
US60/848,051 2006-09-29
US85346106P 2006-10-23 2006-10-23
US60/853,461 2006-10-23
PCT/US2007/075975 WO2008022176A2 (en) 2006-08-15 2007-08-15 Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform

Publications (2)

Publication Number Publication Date
CN101366079A CN101366079A (en) 2009-02-11
CN101366079B true CN101366079B (en) 2012-02-15

Family

ID=40332816

Family Applications (5)

Application Number Title Priority Date Filing Date
CN2007800015096A Expired - Fee Related CN101361113B (en) 2006-08-15 2007-08-15 Constrained and controlled decoding after packet loss
CN2007800031830A Expired - Fee Related CN101375330B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss
CN200780001854XA Expired - Fee Related CN101366079B (en) 2006-08-15 2007-08-15 Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform
CN2007800014996A Expired - Fee Related CN101361112B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss
CN2007800020499A Active CN101366080B (en) 2006-08-15 2007-08-15 Method and system for updating state of demoder

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN2007800015096A Expired - Fee Related CN101361113B (en) 2006-08-15 2007-08-15 Constrained and controlled decoding after packet loss
CN2007800031830A Expired - Fee Related CN101375330B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN2007800014996A Expired - Fee Related CN101361112B (en) 2006-08-15 2007-08-15 Re-phasing of decoder states after packet loss
CN2007800020499A Active CN101366080B (en) 2006-08-15 2007-08-15 Method and system for updating state of demoder

Country Status (1)

Country Link
CN (5) CN101361113B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101445296B1 (en) * 2010-03-10 2014-09-29 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. Audio signal decoder, audio signal encoder, methods and computer program using a sampling rate dependent time-warp contour encoding
KR20150056770A (en) 2012-09-13 2015-05-27 엘지전자 주식회사 Frame loss recovering method, and audio decoding method and device using same
EP3576087B1 (en) * 2013-02-05 2021-04-07 Telefonaktiebolaget LM Ericsson (publ) Audio frame loss concealment
BR112015031180B1 (en) * 2013-06-21 2022-04-05 Fraunhofer- Gesellschaft Zur Förderung Der Angewandten Forschung E.V Apparatus and method for generating an adaptive spectral shape of comfort noise
CN108364657B (en) * 2013-07-16 2020-10-30 超清编解码有限公司 Method and decoder for processing lost frame
EP2922055A1 (en) 2014-03-19 2015-09-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using individual replacement LPC representations for individual codebook information
EP2922054A1 (en) 2014-03-19 2015-09-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using an adaptive noise estimation
EP2922056A1 (en) 2014-03-19 2015-09-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using power compensation
NO2780522T3 (en) * 2014-05-15 2018-06-09
CN104021792B (en) * 2014-06-10 2016-10-26 中国电子科技集团公司第三十研究所 A kind of voice bag-losing hide method and system thereof
JP6700507B6 (en) * 2014-06-10 2020-07-22 エムキューエー リミテッド Digital encapsulation of audio signals
FR3024582A1 (en) * 2014-07-29 2016-02-05 Orange MANAGING FRAME LOSS IN A FD / LPD TRANSITION CONTEXT
EP2988300A1 (en) 2014-08-18 2016-02-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Switching of sampling rates at audio processing devices
EP3023983B1 (en) * 2014-11-21 2017-10-18 AKG Acoustics GmbH Method of packet loss concealment in ADPCM codec and ADPCM decoder with PLC circuit
CN106898356B (en) * 2017-03-14 2020-04-14 建荣半导体(深圳)有限公司 Packet loss hiding method and device suitable for Bluetooth voice call and Bluetooth voice processing chip
CN107749299B (en) * 2017-09-28 2021-07-09 瑞芯微电子股份有限公司 Multi-audio output method and device
CN110310621A (en) * 2019-05-16 2019-10-08 平安科技(深圳)有限公司 Sing synthetic method, device, equipment and computer readable storage medium
CN110970038B (en) * 2019-11-27 2023-04-18 云知声智能科技股份有限公司 Voice decoding method and device
CN113035205B (en) * 2020-12-28 2022-06-07 阿里巴巴(中国)有限公司 Audio packet loss compensation processing method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2774827B1 (en) * 1998-02-06 2000-04-14 France Telecom METHOD FOR DECODING A BIT STREAM REPRESENTATIVE OF AN AUDIO SIGNAL
US7047190B1 (en) * 1999-04-19 2006-05-16 At&Tcorp. Method and apparatus for performing packet loss or frame erasure concealment
US7711563B2 (en) * 2001-08-17 2010-05-04 Broadcom Corporation Method and system for frame erasure concealment for predictive speech coding based on extrapolation of speech waveform
CA2388439A1 (en) * 2002-05-31 2003-11-30 Voiceage Corporation A method and device for efficient frame erasure concealment in linear predictive based speech codecs
US7830900B2 (en) * 2004-08-30 2010-11-09 Qualcomm Incorporated Method and apparatus for an adaptive de-jitter buffer
SG124307A1 (en) * 2005-01-20 2006-08-30 St Microelectronics Asia Method and system for lost packet concealment in high quality audio streaming applications

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Masahiro Serizawa et al. A packet loss concealment method using pitch waveform repetition and internal state update on the decoded speech for the sub-band ADPCM wideband speech codec. 《Speech Coding, IEEE Workshop Proceedings》. 2002, 68-70. *
Telecommunication Standardization Sector of ITU. 7 kHz audio-coding within 64 kbit/s. 《General Aspects of Digital Transmission Systems Terminal Equipments, ITU-T G.722》. 1993, 1-73. *
Telecommunication Standardization Sector of ITU. 7 kHz audio-coding within 64 kbit/s. 《General Aspects of Digital Transmission Systems Terminal Equipments, ITU-T G.722》. 1993

Also Published As

Publication number Publication date
CN101361112A (en) 2009-02-04
CN101375330B (en) 2012-02-08
CN101361113B (en) 2011-11-30
CN101366080B (en) 2011-10-19
CN101361113A (en) 2009-02-04
CN101366080A (en) 2009-02-11
CN101366079A (en) 2009-02-11
CN101375330A (en) 2009-02-25
CN101361112B (en) 2012-02-15

Similar Documents

Publication Publication Date Title
CN101366079B (en) Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform
CN1957398B (en) Methods and devices for low-frequency emphasis during audio compression based on acelp/tcx
EP2054878B1 (en) Constrained and controlled decoding after packet loss
CN101981615B (en) Concealment of transmission error in a digital signal in a hierarchical decoding structure
US7280960B2 (en) Sub-band voice codec with multi-stage codebooks and redundant coding
US9218817B2 (en) Low-delay sound-encoding alternating between predictive encoding and transform encoding
RU2690754C2 (en) Sampling frequency switching concept in audio signal processing devices
CN1751338B (en) Method and apparatus for speech coding
CN106575505A (en) Frame loss management in an fd/lpd transition context
KR102485835B1 (en) Determining a budget for lpd/fd transition frame encoding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1129764

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1129764

Country of ref document: HK

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120215

Termination date: 20140815

EXPY Termination of patent right or utility model