Embodiment
Referring now to Fig. 1, there is shown a block diagram of a wireless subscriber unit, hereinafter referred to as a mobile station (MS) 100, adapted to support the inventive concepts of the preferred embodiments of the present invention. The MS 100 comprises an antenna 102, preferably coupled to a duplex filter, antenna switch or circulator 104 that provides isolation between the receive and transmit chains within the MS 100.
As known in the art, the receiver chain typically includes a receiver front-end circuit 106 (effectively providing reception, filtering and intermediate-frequency or baseband frequency conversion). The front-end circuit is coupled to a signal processing function 108. An output of the signal processing function is provided to a suitable output device 110, for example a loudspeaker, via an audio processing unit 130.
The audio processing unit 130 includes a speech encoding function 134, which encodes the user's speech into a format suitable for transmission over the transmission medium. The audio processing unit 130 also includes a speech decoding function 132, which decodes received speech into a format suitable for output via the output device (loudspeaker) 110. The audio processing unit 130 is coupled to a memory unit 116 and a timer 118 via a controller 114. In particular, the operation of the audio processing unit 130 is adapted to support the inventive concepts of the preferred embodiments of the present invention: the audio processing unit 130 is adapted to select a replacement speech frame from a number of previously transmitted speech frames. The audio processing unit 130 or the signal processor 108 can also initiate transmission of a reference/pointer signal, which identifies the selected replacement speech frame, on an alternative virtual transmission path alongside the main transmission path. The adaptation of the audio processing unit 130 is described further with reference to Fig. 2.
For completeness, the receiver chain also includes a received signal strength indicator (RSSI) circuit 112 (shown coupled to the receiver front-end 106, although the RSSI circuit 112 could be located elsewhere in the receiver chain). The RSSI circuit is coupled to the controller 114, which maintains overall control of the subscriber unit. The controller 114 is also coupled to the receiver front-end circuit 106 and to the signal processing function 108 (generally realised by a DSP). The controller 114 can therefore receive bit error rate (BER) and frame error rate (FER) data from recovered information. The controller 114 is coupled to the memory device 116, which stores operating regimes such as decoding/encoding functions and the like. The timer 118 is typically coupled to the controller 114 to control the timing of operations (transmission and reception of time-dependent signals) within the MS 100.
In the context of the present invention, the timer 118 dictates the timing of speech signals in the transmit (encoding) path and/or the receive (decoding) path.
The transmit chain essentially includes an input device 120, such as a microphone transducer, coupled in series through the speech encoder 134 to a transmitter/modulation circuit 122. Any signal to be transmitted then passes through a power amplifier 124 and is radiated from the antenna 102. The transmitter/modulation circuit 122 and the power amplifier 124 are operationally responsive to the controller, and the output of the power amplifier is coupled to the duplex filter or circulator 104. The transmitter/modulation circuit 122 and the receiver front-end circuit 106 include frequency up-conversion and down-conversion functions (not shown).
Of course, the various components within the MS 100 can be arranged in any suitable functional topology capable of utilising the inventive concepts of the present invention. Furthermore, the various components within the MS 100 can be realised in discrete or integrated form, the ultimate structure being merely an arbitrary selection.
It is envisaged that the preferred buffering or processing of speech signals can be implemented in software, firmware or hardware, the function preferably being performed by a software processor (or indeed a digital signal processor (DSP)) that carries out the speech processing function.
Referring now to Fig. 2, there is shown a block diagram of a code excited linear prediction (CELP) speech encoder 134 according to the preferred embodiment of the present invention. An acoustic input signal to be analysed is applied to the speech encoder 134 at a microphone 202. The input signal is then applied to a filter 204. The filter 204 generally exhibits band-pass characteristics. However, if the speech bandwidth is already adequate, the filter 204 may comprise a direct wire connection.
As known in the art, the analogue speech signal from the filter 204 is then converted into a sequence of N pulse samples, the amplitude of each pulse sample being represented by a digital code in an analogue-to-digital (A/D) converter 208. The sampling rate is determined by a sample clock (SC), which is generated together with a frame clock (FC).
The digital output of the A/D converter 208, represented as an input speech vector s(n), is applied to a coefficient analyser 210. As known in the art, a new input speech vector s(n) is obtained for each successive frame, that is, for each block of time whose length is determined by the frame clock (FC).
According to the preferred embodiment of the present invention, a set of linear predictive coding (LPC) parameters is produced by the coefficient analyser 210 for each block of speech. The speech coding parameters generated may include the LPC parameters, long-term predictor (LTP) parameters and the excitation gain factor G_2 (together with the best stochastic codebook excitation codeword I). These speech coding parameters are applied to a multiplexer 250 and sent over the channel for use by the speech synthesiser in the decoder. The input speech vector s(n) is also applied to a subtractor 230, whose function is described below.
In the conventional CELP encoder portion of Fig. 2, a codebook search controller 240 selects the best indices and gains from the adaptive codebook in block 216 and the stochastic codebook in block 214, so that the summed excitation vector chosen to represent the input speech samples yields the minimum weighted error. The outputs of the stochastic codebook 214 and the adaptive codebook 216 are input to gain functions 222 and 218 respectively. As known in the art, the gain-adjusted outputs are summed in a summer 220 and then input to an LPC filter 224.
First, the adaptive codebook or long-term predictor component l(n) is computed. It is characterised by a delay and a gain factor G_1.
For each individual stochastic codebook excitation vector u_i(n), a reconstructed speech vector s'_i(n) is generated for comparison with the input speech vector s(n). The gain block 222 scales the excitation vector by the excitation gain factor G_2, and the summing block 220 adds in the adaptive codebook component. Such a gain may be pre-computed by the coefficient analyser 210 and used to analyse all excitation vectors, or it may be optimised jointly with the search for the best excitation codeword I, which is generated by the codebook search controller 240.
The scaled excitation signal G_1 l(n) + G_2 u_i(n) is then filtered by the linear predictive coding filter 224, which constitutes a short-term predictor (STP) filter, to generate the reconstructed speech vector s'_i(n). The reconstructed speech vector s'_i(n) for the i-th excitation code vector is compared with the same block of the input speech vector s(n) by subtracting the two signals in the subtractor 230.
The difference vector e_i(n) represents the difference between the original and the reconstructed blocks of speech. The difference vector is perceptually weighted by a weighting filter 232, using weighting filter parameters (WTP) generated by the coefficient analyser 210. Perceptual weighting accentuates those frequencies at which the error is perceptually more important to the human ear, and attenuates the other frequencies.
An energy calculator function within the codebook search controller 240 computes the energy of the weighted difference vector e'_i(n). The codebook search controller compares the i-th error signal, for the present excitation vector u_i(n), against previous error signals to determine the excitation vector that produces the minimum error. The code of the i-th excitation vector having the minimum error is then output over the channel as the best excitation code I.
A copy of the scaled excitation G_1 l(n) + G_2 u_I(n) is stored in the long-term predictor memory 216 for future use.
Alternatively, the codebook search controller 240 may determine a particular codeword that provides an error signal meeting some predetermined criterion, such as a predefined error threshold.
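The analysis-by-synthesis search described above can be sketched as follows. This is a minimal illustration only, not the encoder of Fig. 2: the function names, the all-pole filter sign convention and the fixed gains are assumptions made for the example, and the perceptual weighting filter 232 is omitted, so the error energy is unweighted.

```python
def lpc_synthesis(excitation, lpc):
    """All-pole short-term filter (assumed convention):
    s[n] = x[n] - sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(s, codebook, adaptive, g1, g2, lpc):
    """Return (I, energy) for the codebook vector giving the least error energy."""
    best_index, best_energy = -1, float("inf")
    for i, u in enumerate(codebook):
        # Scaled excitation G_1 l(n) + G_2 u_i(n)
        excitation = [g1 * l + g2 * ui for l, ui in zip(adaptive, u)]
        s_rec = lpc_synthesis(excitation, lpc)  # reconstructed s'_i(n)
        # Energy of the (unweighted) difference vector e_i(n)
        energy = sum((a - b) ** 2 for a, b in zip(s, s_rec))
        if energy < best_energy:
            best_index, best_energy = i, energy
    return best_index, best_energy
```

In a practical encoder the gains would typically be optimised per vector and the error would be perceptually weighted before its energy is computed.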
A more detailed description of a typical speech coding unit can be found in: A. M. Kondoz, "Digital speech coding for low-bit rate communications systems", John Wiley, 1994.
In the preferred embodiment of the present invention, an error mitigation technique is applied to the speech frames after the multiplexer 250. The invention makes use of an alternative (preferably parallel) virtual transmission path 282, which carries a pointer to a previously encoded speech frame that was sent from the encoder on the main transmission path 281.
In the context of the present invention, the term "virtual" denotes a transmission path, other than the main transmission path supporting the speech communication, that is assumed to run from the encoder to the decoder. The "virtual" transmission path may lie in the same bit stream, in the same time frame or multiframe of a time-division multiplexing scheme, or on a different communication route, as for example in a VoIP system. By using an additional virtual transmission path, ideally one with different error statistics (for example a separate FEC scheme), the reference/pointer is unlikely to suffer the same errors as the speech frame it references.
The notable difference from known encoder arrangements is the second minimisation section that follows the multiplexing operation. This circuit evaluates the buffered speech parameter data and selects the buffered frame closest to the current speech frame.
In an enhanced embodiment, the parallel virtual transmission path uses forward error correction (FEC) protection different from that used for the speech encoder output on the main transmission path. By using a separate FEC path, the speech frames and their references experience different error statistics. This difference between the main transmission path and the parallel virtual transmission path helps to improve robustness to errors.
The multiplexer 250 outputs data packets/frames to a buffer 260, which holds previously multiplexed frames. A de-multiplexer 270 accesses the buffered frames of the multiplexed signal in the buffer 260. The de-multiplexer 270 separates the LPC parameters 272 from the excitation parameters 274. Note that the long-term predictor memory used to generate the excitation must be identical to that of the long-term predictor 216 at the start of the frame.
For each block of multiplexed speech, a set of linear predictive coding (LPC) parameters is thereby available for the current frame and for previous frames. In the preferred embodiment of the invention, each set of quantised LPC parameters and excitation parameters is used to form a reconstructed speech vector s'_j(n) for the j-th previous frame of the buffered data. This vector is compared with the buffered speech vector s(n) by subtracting the two signals in a subtractor 262.
The difference vector e_j(n) represents the difference between the original and the previously buffered blocks of speech. The difference vector is perceptually weighted by an LPC weighting filter 264. As noted above, perceptual weighting accentuates those frequencies at which the error is perceptually more important to the human ear, and attenuates the other frequencies.
An energy calculator function within a codebook search controller 266 computes the energy of the weighted difference vector e'_j(n). The codebook search controller 266 compares the j-th error signal, for the present excitation vector u_j(n), against previous error signals to determine the excitation vector that produces the minimum error. The codebook search controller 266 then selects the "best index of frame data" that provides the minimum weighted error. The encoder then sends to the decoder a "pointer" to the previous frame that was determined to yield the minimum weighted error between itself and the current speech frame on the main transmission path.
In essence, the referenced speech frame (ideally different in time or frame number from the currently transmitted frame) is the frame, within a particular moving window of speech, that most closely resembles (in the perceptually weighted error sense) the frame being encoded. Hence, if the current frame is received in error, the reference (pointer) indicates the best match to that frame for use in the error mitigation step. This reference or pointer is described in more detail below with respect to Fig. 3.
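The selection of the referenced frame can be sketched as a search over the buffer for the candidate with the smallest error energy. The sketch below is illustrative only: it assumes each buffered frame has already been reconstructed into a sample vector, it omits the perceptual weighting of filter 264, and the names `error_energy` and `select_reference` are hypothetical.

```python
def error_energy(current, candidate):
    """Energy of the difference vector between two equal-length frames."""
    return sum((c - p) ** 2 for c, p in zip(current, candidate))


def select_reference(current, buffered):
    """Return how many frames back the closest buffered frame lies.

    buffered[-1] is the most recently transmitted frame, so an offset of 1
    means "the previous frame". Returns None for an empty buffer.
    """
    best_offset, best_energy = None, float("inf")
    for offset in range(1, len(buffered) + 1):
        energy = error_energy(current, buffered[-offset])
        if energy < best_energy:
            best_offset, best_energy = offset, energy
    return best_offset
```

The returned offset plays the role of the pointer sent on the virtual transmission path; expressing it as "frames back" rather than as an absolute frame number is a design choice of this sketch.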
Referring now to Fig. 3, a buffering timing diagram 300 illustrates the preferred process of the present invention. The timing diagram shows frame-0 310 being received at the speech decoder and determined to be in error. The decoder then accesses the alternative virtual transmission path to determine the best frame with which to replace frame-0 310. As shown in Fig. 3, the alternative virtual transmission path contains a pointer to frame-4 320 as the preferred replacement for frame-0 310. By replacing frame-0 310 with frame-4 320, the speech decoding process suffers only a minimal impact on speech quality.
The inventors have recognised and exploited the fact that the preceding several frames are (usually) spoken by the same talker, so that these speech frames exhibit similar pitch and formant positions. It is therefore likely that a previous speech frame similar to the current speech frame can be found.
According to the preferred embodiment of the invention, the minimum perceptual error is found by evaluating a weighted segmental signal-to-noise ratio (SEGSNR), or an average weighted SNR, for each buffered frame, given the parameter set held in memory for each frame. The segments are preferably defined at the subframe level of the speech codec.
This determination is made in the encoder. In the presence of small pitch errors, significantly different SEGSNR values may be obtained, because the source speech and the buffered signal can rapidly move out of phase. Therefore, in an enhanced embodiment of the invention, it is proposed to search around the pitch period of the buffered frame, for example within +/-5%, using sub-sample resolution (typically 1/3 or 1/4 of a sample), and to select the maximum SEGSNR value.
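A segmental SNR of the kind referred to above can be computed as the mean of per-segment SNR values in decibels. The following is a sketch under stated assumptions: it is unweighted, it skips silent segments, it floors the noise energy to avoid a logarithm of zero, and the function name and the floor value are choices made for this example. The pitch-period search at sub-sample resolution is not shown.

```python
import math


def segmental_snr(reference, test, segment_len):
    """Mean per-segment SNR in dB between two equal-length signals."""
    snrs = []
    for start in range(0, len(reference) - segment_len + 1, segment_len):
        ref = reference[start:start + segment_len]
        tst = test[start:start + segment_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - t) ** 2 for r, t in zip(ref, tst))
        if signal == 0.0:
            continue  # skip silent segments
        # Floor the noise energy so identical segments do not divide by zero
        snrs.append(10.0 * math.log10(signal / max(noise, 1e-12)))
    return sum(snrs) / len(snrs) if snrs else float("-inf")
```

With the segment length set to the codec subframe length, the buffered frame giving the largest value would be the candidate for the reference.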
In a further enhancement of the invention, if the referenced frame was itself received in error, the frame that would have been used to mitigate that bad frame becomes the source of the best speech information for the current erroneously received frame, as shown in Fig. 4. Fig. 4 is a timing diagram illustrating how multiple errors are handled. The data of frame-0 410 is known to be in error. The proposed error mitigation process uses the alternative virtual transmission path, which designates frame-4 420 as a suitable replacement. However, frame-4 420 has also been determined to be in error. In that case, its pointer designates the data of frame-6 430 as the frame most similar to the corrupted frame-4 420. Frame-6 430 is therefore used to replace frame-4 420, and is equally suitable for replacing frame-0 410. In this way multi-frame errors can be handled, subject to overcoming the problem of references that point outside the buffered memory.
A chain of references (pointers) may eventually lead out of the memory window. However, if the erroneous values within the window are updated as they are mitigated, removing the need for multiple references, this is no longer a problem.
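The handling of chained references in Fig. 4 can be sketched as following pointers until a correctly received frame is reached or the chain leaves the memory window. The data layout used here (a set of bad positions and a dictionary of backward offsets) and the function name are assumptions made for illustration.

```python
def resolve_replacement(index, bad, pointers, window):
    """Follow references from a bad frame until a good frame is found.

    index:    position of the bad frame in the received sequence
    bad:      set of positions received in error
    pointers: dict mapping position -> number of frames back it references
    window:   number of past frames held in the buffer
    Returns the position of the replacement frame, or None when a reference
    is missing or the chain leaves the memory window.
    """
    oldest = max(index - window, 0)
    current = index
    while current in bad:
        offset = pointers.get(current)
        if offset is None or offset < 1:
            return None
        current -= offset  # strictly decreasing, so the loop terminates
        if current < oldest:
            return None
    return current
```

Taking frame-0 at position 10, frame-4 at position 6 and frame-6 at position 4, the call `resolve_replacement(10, {10, 6}, {10: 4, 6: 2}, 128)` resolves to position 4, mirroring the frame-0 to frame-4 to frame-6 chain of Fig. 4.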
In summary, a reference or pointer is transmitted to the decoder in an alternative bit stream alongside the main bit stream. The reference or pointer identifies the previously transmitted frame that best matches the currently transmitted frame, and is preferably transmitted in a parallel bit stream. If the frame is received in error at the speech decoder, the reference or pointer is used in the frame-substitution error mitigation process. Frame error mitigation is thus enhanced by extending the known previous-frame or next-frame substitution scheme to any of a plurality of frames. The number of frames used in the process is limited only by the buffer/memory size and/or the processing power needed to determine the minimum weighted error frame.
As noted, the buffering/storage of the speech encoder's speech parameters is performed over a number of frames. For example, in the case of the GSM Enhanced Full Rate (EFR) codec (<12 kb/s), three seconds of speech requires only about 5 Kbytes of memory. The more demanding task is identifying the closest matching frame from the 150 possible frames. Therefore, in one embodiment of the invention, the minimum weighted error selection technique described above may be applied to a subset of the parameters, or to parameters derived from the synthesised speech, rather than to all the parameters of a speech encoder frame. In other words, the reference (or pointer) may be determined from the LPC filter parameters (LSF) and the energy of the synthesised speech frame (speech parameters taken from the synthesised speech, which are computed in the codec in any case) rather than from the exact encoder parameters, thereby saving storage and comparison effort.
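The storage and frame-count figures above can be checked with simple arithmetic. The sketch assumes a 20 ms speech frame, which is the usual GSM speech frame duration but is not stated in the text, and takes 12 kb/s as the bit rate; both helper names are hypothetical.

```python
def buffer_bytes(bit_rate_bps, seconds):
    """Bytes needed to hold `seconds` of coded speech at `bit_rate_bps`."""
    return bit_rate_bps * seconds / 8


def frames_in_window(seconds, frame_ms=20):
    """Number of whole frames in the window (frame_ms is an assumed value)."""
    return int(seconds * 1000 // frame_ms)


print(buffer_bytes(12_000, 3))   # 4500.0 bytes, i.e. roughly 5 Kbytes
print(frames_in_window(3))       # 150 frames
```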
In this regard, since a speech frame contains many parameters, the proposed technique can in principle be applied to any number of them. In a CELP encoder, examples of these parameters include:
(i) the line spectral pairs (LSP), representing the LPC parameters;
(ii) the long-term predictor (LTP) lag for subframe-1;
(iii) the LTP gain for subframe-1;
(iv) the codebook index for subframe-1;
(v) the codebook gain for subframe-1;
(vi) the LTP lag for subframe-2;
(vii) the LTP gain for subframe-2;
(viii) the codebook index for subframe-2;
(ix) the codebook gain for subframe-2;
(x) the LTP lag for subframe-3;
(xi) the LTP gain for subframe-3;
(xii) the codebook index for subframe-3;
(xiii) the codebook gain for subframe-3;
(xiv) the LTP lag for subframe-4;
(xv) the LTP gain for subframe-4;
(xvi) the codebook index for subframe-4; or
(xvii) the codebook gain for subframe-4.
It is also within the contemplation of the invention that a pointer may reference only the LSP set of a previous frame, matching the LSPs of the current frame, rather than the entire parameter set. Furthermore, a separate pointer may be used for each of several of the above parameters.
In a wireless communication system, the parallel virtual transmission path is preferably realised by transmitting a block-coded reference word in the unprotected bits of the data payload (7 bits suffice here to support a 128-frame buffer, equivalent to about 2.5 seconds). The word may be encoded with a 15-bit BCH block code (an equivalent rate of 75 bits per second), providing error correction of up to 2 bits.
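The relationship between the 7-bit reference word and the 128-frame buffer can be sketched as below. The mapping of an offset of 1 to 128 frames onto the word values 0 to 127, the 20 ms frame duration and all names are assumptions made for this example; the BCH block coding of the word is not shown.

```python
REF_BITS = 7  # width of the reference word, as stated in the text


def encode_reference(offset):
    """Map an offset of 1..128 frames back onto a 7-bit word (0..127)."""
    if not 1 <= offset <= (1 << REF_BITS):
        raise ValueError("offset outside the buffered window")
    return offset - 1


def decode_reference(word):
    """Recover the frame offset from a 7-bit reference word."""
    return (word & ((1 << REF_BITS) - 1)) + 1


def window_seconds(bits=REF_BITS, frame_ms=20):
    """Duration covered by the reference range (frame_ms is assumed)."""
    return (1 << bits) * frame_ms / 1000


print(window_seconds())  # 2.56 seconds, i.e. about 2.5 seconds
```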
Furthermore, it is envisaged that the alternative virtual transmission path may provide a combination of error correction and error detection. Error detection would be beneficial, since a badly received reference could lead to poor mitigation. If the reference word is badly received, the scheme can default to using the previously received frame. The 75 bits-per-second channel rate reduces the gross bit rate of the GSM full-rate channel only from 22.8 kbit/s to 22.725 kbit/s, which results in an insignificant loss of sensitivity.
In a further embodiment, such as a voice over Internet Protocol (VoIP) communication link, the alternative virtual transmission path can be obtained by sending multiple packet streams. In that case, however, it is desirable that the total traffic is not increased substantially, since this may increase the rate at which frames must be substituted.
A preferred mechanism is to send the reference to a previous frame, as described above, only when the speech is in transition and non-stationary. When the speech is stationary, and prior-art techniques work comparatively well, no reference is sent. In this way the packet network is not unduly loaded, yet most of the performance gain is obtained. The degree of stationarity at which references are sent can be made a variable, which can be adjusted to improve the reproduced quality under packet loss.
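Such a stationarity gate can be sketched as a threshold on the frame-to-frame change in a parameter vector. The distance measure (squared Euclidean distance between, for example, successive LSP sets), the function name and the threshold values are assumptions of this sketch; the text does not specify how stationarity is measured.

```python
def should_send_reference(prev_params, cur_params, threshold):
    """Return True when the speech is judged non-stationary.

    prev_params, cur_params: parameter vectors of consecutive frames
    threshold: tunable variable controlling how often references are sent
    """
    distance = sum((a - b) ** 2 for a, b in zip(prev_params, cur_params))
    return distance > threshold
```

Lowering the threshold sends references more often, trading network load for robustness under packet loss.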
The decoder function is essentially the reverse of the encoder function (without the additional circuitry following the multiplexer) and is therefore not described in detail here. A description of the function of a typical speech decoding unit can be found in: A. M. Kondoz, "Digital speech coding for low-bit rate communications systems", John Wiley, 1994. The decoder follows the standard decoding process until it detects a bad frame. When a bad frame is detected, the decoder evaluates the alternative virtual transmission path to determine the alternative frame indicated by the corresponding reference/pointer. The decoder then retrieves the "similar" frame identified in the reference/pointer transmission. The indicated previous frame is then used in place of the received frame to synthesise the speech.
Advantageously, the inventive concepts described herein can be retrofitted to existing codecs by stealing bits from the established FEC scheme.
It will be understood that the bad-frame error mitigation mechanism described above provides at least the following advantages:
(i) It provides a more accurate frame replacement mechanism, thereby reducing the risk of audible, undesirable artefacts in the recovered speech frames.
(ii) The alternative virtual transmission path can be retrofitted to existing codecs, for example by stealing bits from the established FEC scheme.
(iii) When references to previous frames are sent only while the speech is in transition and non-stationary, the additional data required over existing bad-frame error mitigation techniques is minimised.
(iv) By cross-referencing the data received for a particular frame against the frame referenced in this scheme, erroneously received parameters can be detected.
Although the preferred embodiment discusses application of the invention to a CELP encoder, the inventors envisage that the inventive concepts described herein can be applied to other audio processing units of wireless communication units, such as Universal Mobile Telecommunications System (UMTS) units, Global System for Mobile communications (GSM) units, TErrestrial Trunked RAdio (TETRA) communication units, the Digital Interchange of Information and Signalling standard (DIIS), or voice over Internet Protocol (VoIP) units.
Apparatus of the invention
A speech communication unit includes a speech encoder capable of representing an input speech signal. The speech encoder includes a transmission path for transmitting a plurality of speech frames to a speech decoder. The speech encoder further includes a virtual transmission path for transmitting one or more references to the speech frames transmitted on the transmission path. The one or more references relate to an alternative speech frame, among the plurality of speech frames transmitted on the transmission path, to be used as a replacement frame in the event of a bad frame.
A speech communication unit, for example the above speech communication unit having the speech encoder, includes a speech decoder adapted to receive a plurality of speech frames on a transmission path and to receive one or more alternative speech frame references on a virtual transmission path. The one or more references relate to an alternative speech frame, among the plurality of speech frames received on the transmission path, to be used as a replacement frame in the event of a bad frame.
Method of the invention
A method of mitigating bad-frame errors in a speech communication unit comprises the following steps. A speech encoder in the speech communication unit transmits a plurality of speech frames to a speech decoder on a transmission path. The speech encoder transmits, on a virtual transmission path, one or more references to the speech frames transmitted on the transmission path, wherein the one or more references relate to an alternative speech frame, among the plurality of speech frames transmitted on the transmission path, to be used as a replacement frame in the event of a bad frame.
In this way, when a speech frame is received in error, an improved replacement frame can be selected from the plurality of speech frames.
Thus, a bad-frame error mitigation technique, and an associated speech communication unit and circuit, have been described that substantially alleviate at least some of the above-mentioned disadvantages of known error mitigation techniques.