GB2391440A - Speech communication unit and method for error mitigation of speech frames - Google Patents

Speech communication unit and method for error mitigation of speech frames

Info

Publication number
GB2391440A
GB2391440A GB0217729A GB0217729A GB2391440A GB 2391440 A GB2391440 A GB 2391440A GB 0217729 A GB0217729 A GB 0217729A GB 0217729 A GB0217729 A GB 0217729A GB 2391440 A GB2391440 A GB 2391440A
Authority
GB
United Kingdom
Prior art keywords
speech
frame
transmission path
communication unit
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0217729A
Other versions
GB0217729D0 (en)
GB2391440B (en)
Inventor
Jonathan Alastair Gibbs
Stephen Aftelak
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to GB0217729A priority Critical patent/GB2391440B/en
Publication of GB0217729D0 publication Critical patent/GB0217729D0/en
Priority to PCT/EP2003/005076 priority patent/WO2004015690A1/en
Priority to EP03730037A priority patent/EP1527440A1/en
Priority to CNB038182726A priority patent/CN100349395C/en
Priority to KR1020057001824A priority patent/KR20050027272A/en
Priority to JP2004526664A priority patent/JP2005534984A/en
Priority to AU2003240644A priority patent/AU2003240644A1/en
Publication of GB2391440A publication Critical patent/GB2391440A/en
Application granted granted Critical
Publication of GB2391440B publication Critical patent/GB2391440B/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005Correction of errors induced by the transmission channel, if related to the coding algorithm
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0078Avoidance of errors by organising the transmitted data in a format specifically designed to deal with errors, e.g. location
    • H04L1/0083Formatting with frames or packets; Protocol or part of protocol for error control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L27/00Modulated-carrier systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Detection And Prevention Of Errors In Transmission (AREA)

Abstract

A speech communication unit comprises a speech encoder capable of representing an input speech signal, e.g. from microphone 202. The speech encoder is coupled to a main transmission path (281), for transmitting a number of speech frames to a speech decoder, and a virtual transmission path (282) for transmitting one or more references for a number of speech frames transmitted in the transmission path (281). The one or more references identify an alternative speech frame within the number of speech frames transmitted on the transmission path (281) to be used as a replacement frame when a frame is received including an error. A suitable alternative speech frame is one that exhibits a similar characteristic to the frame to be replaced, based on evaluation of a minimum weighted error. The speech communication unit provides the advantage that a more accurate replacement frame mechanism is provided, thereby reducing the risk of undesirable artefacts being audible in recovered speech frames.

Description

Speech Communication Unit And Method For Error Mitigation Of Speech Frames
Field of the Invention
This invention relates to speech coding and methods for improving the performance of speech codecs in speech communication units. The invention is applicable to, but not limited to, error mitigation in speech coders.
Background of the Invention
Many present day voice communications systems, such as the global system for mobile communications (GSM) cellular telephony standard and the TErrestrial Trunked RAdio (TETRA) system for private mobile radio users, use speech-processing units to encode and decode speech patterns. In such voice communications systems a speech encoder in a transmitting unit converts the analogue speech pattern into a suitable digital format for transmission. A speech decoder in a receiving unit converts a received digital speech signal into an audible analogue speech pattern.
As frequency spectrum for such wireless voice communication systems is a valuable resource, it is desirable to limit the channel bandwidth used by such speech signals, in order to maximise the number of users per frequency band. Hence, a primary objective in the use of speech coding techniques is to reduce the occupied capacity of the speech patterns as much as possible, by use of compression techniques, without losing fidelity.
In the context of voice and data communication systems, a further approach is to provide substantially less protection on speech signals, when compared to comparable data signals. This approach leads to comparably more errors within speech packets than data packets, as well as an increased risk of losing whole speech packets.
In speech decoders, it is common for error mitigation techniques to be used, for example to improve the performance of the speech communication unit in the event of:
(i) Too many bit errors being present within a received speech frame; or
(ii) A data packet (which may include speech information) within an Internet Protocol (IP) based network being lost.
'Bad-frame' mitigation techniques are needed to minimise the audible effect of frames received in error. These techniques reproduce an estimate of the missing speech frame, rather than injecting either silence or noise into the decoded speech. Such techniques typically involve exploiting the statistically static properties of speech.
A single frame in error is usually adequately estimated by replacing it with similar parameters including energy, pitch, spectrum and voicing from the previous frame.
However, speech is not truly stationary, e.g. speech onsets and plosives are very short events. Hence, this simple 'replacement' technique sometimes leads to unnatural, and therefore undesirable, artefacts.
In an ideal world it would be preferable to interpolate the data from either side of a transmission break, i.e. take data following the bad-frame sequence, as well as before, and interpolate therebetween. However, such an approach is unacceptable in voice communication systems as it introduces undesirable delay.
If several bad frames are received then the energy of the speech signals is often reduced to zero after a few frames. Often a 'voicing' parameter is included because it is useful to change what is repeated dependent upon whether the speech is voiced or not. In principle, for voiced speech, it is preferable to just repeat the periodic component. In contrast, for unvoiced speech, it is preferable to generate a similar audio spectrum and similar energy without making it too periodic.
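By way of illustration, the conventional repetition-based strategy described above can be sketched as follows. This is a minimal, hypothetical example: the parameter set, the decay factor and the mute-after count are illustrative assumptions rather than values taken from any particular codec.

```python
from dataclasses import dataclass, replace

@dataclass
class FrameParams:
    energy: float      # frame energy
    pitch: int         # pitch lag in samples
    lsf: list          # spectral envelope, e.g. line spectral frequencies
    voiced: bool       # voicing decision

def conventional_mitigation(prev: FrameParams, consecutive_bad: int,
                            mute_after: int = 4, decay: float = 0.7) -> FrameParams:
    """Estimate a lost frame by repeating the previous frame's parameters,
    attenuating the energy on successive losses and muting after a few frames."""
    if consecutive_bad >= mute_after:
        return replace(prev, energy=0.0)                                  # fade to silence
    return replace(prev, energy=prev.energy * (decay ** consecutive_bad))  # repeat, attenuated
```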
The inventors of the present invention have recognized and appreciated the limitations in using such a simple 'replacement' frame mechanism as a bad-frame mitigation strategy. In particular, they have recognized that only on rare occasions is the replacing frame a truly suitable frame. Furthermore, if a number of frames are received in error, which may frequently occur on a poor quality wireless communication link, then the replacement frame mechanism is even less acceptable.
Hence, a need has arisen for provision of an improved error mitigation technique when using such speech coders, to alleviate at least some of the aforementioned disadvantages.
Summary of the Invention
In a first aspect of the present invention, a speech communication unit is provided, in accordance with Claim 1.
In a second aspect of the present invention, a speech communication unit is provided, in accordance with Claim 13.
In a third aspect of the present invention, a method of performing bad-frame error mitigation in a voice communication unit is provided, in accordance with Claim 15.
In a fourth aspect of the present invention, a speech communication unit is provided, in accordance with Claim 16.
In a fifth aspect of the present invention, a wireless communication system is provided, in accordance with Claim 17.
Further aspects of the present invention are defined in the dependent Claims.
In summary, the present invention aims to provide a communication unit, comprising a speech codec, and a method of performing bad-frame error mitigation that at least alleviate some of the aforementioned disadvantages associated with current bad-frame error mitigation techniques. This is primarily achieved by transmitting speech frames on a transmission path and using a reference/pointer that is transmitted on a virtual transmission path to indicate alternative replacement speech frames to be used by a speech decoder, should a speech frame on the transmission path be received in error. By utilising an additional virtual transmission path, ideally with different error statistics, e.g. a separate FEC scheme, the reference/pointer will not be subject to the same errors as the speech frame it is referencing. Furthermore, a buffering technique is used in the encoder to select an alternative speech frame from a number of previously transmitted speech frames, the selected alternative speech frame being one that exhibits similar characteristics to the speech frame that is to be referenced.
Brief Description of the Drawings
Exemplary embodiments of the present invention will now be described, with reference to the accompanying drawings, in which:
FIG. 1 shows a block diagram of a wireless communication unit containing a speech coder adapted to support the various inventive concepts of a preferred embodiment of the present invention;
FIG. 2 shows a block diagram of a code excited linear predictive speech coder adapted to support the various inventive concepts of a preferred embodiment of the present invention;
FIG. 3 shows a use of a reference mechanism indicated by an alternative virtual transmission path, whereby replacement frames are selected from a number of other frames, in accordance with the preferred embodiments of the present invention; and
FIG. 4 shows an enhanced use of an alternative virtual transmission path, to address multiple errors occurring in the main transmission path, in accordance with the preferred embodiments of the present invention.
Description of Preferred Embodiments
Turning now to FIG. 1, there is shown a block diagram of a wireless subscriber unit, hereinafter referred to as a mobile station (MS) 100, adapted to support the inventive concepts of the preferred embodiments of the present invention. The MS 100 contains an antenna 102 preferably coupled to a duplex filter, antenna switch or circulator 104 that provides isolation between a receiver and a transmitter chain within the MS 100.
As known in the art, the receiver chain typically includes scanning receiver front-end circuitry 106 (effectively providing reception, filtering and intermediate or base-band frequency conversion). The scanning front-end circuit is serially coupled to a signal processing function 108. An output from the signal processing function is provided to a suitable output device 110, such as a speaker, via a speech-processing unit 130.
The speech-processing unit 130 includes a speech encoding function 134 to encode a user's speech into a format suitable for transmitting over the transmission medium. The speech-processing unit 130 also includes a speech decoding function 132 to decode received speech into a format suitable for outputting via the output device (speaker) 110. The speech-processing unit 130 is operably coupled to a memory unit 116, via link 136, and to a timer 118 via a controller 114. The operation of the speech-processing unit 130 has been adapted to support the inventive concepts of the preferred embodiments of the present invention. In particular, the speech-processing unit 130 has been adapted to select a replacement speech frame from a number of previously transmitted speech frames. The speech-processing unit 130, or signal processor 108, then initiates transmission of a reference/pointer signal (indicating the selected replacement speech frame) in an alternative virtual transmission path to the primary transmission path. The adaptation of the speech-processing unit 130 is further described with regard to FIG. 2.
For completeness, the receiver chain also includes received signal strength indicator (RSSI) circuitry 112 (shown coupled to the scanning receiver front-end 106, although the RSSI circuitry 112 could be located elsewhere within the receiver chain). The RSSI circuitry is coupled to a controller 114 for maintaining overall subscriber unit control. The controller 114 is also coupled to the scanning receiver front-end circuitry 106 and the signal processing function 108 (generally realised by a DSP). The controller 114 may therefore receive bit error rate (BER) or frame error rate (FER) data from recovered information. The controller 114 is coupled to the memory device 116 for storing operating regimes, such as decoding/encoding functions and the like. A timer 118 is typically coupled to the controller 114 to control the timing of operations (transmission or reception of time-dependent signals) within the MS 100.
In the context of the present invention, the timer 118 dictates the timing of speech signals, in the transmit (encoding) path and/or the receive (decoding) path.
As regards the transmit chain, this essentially includes an input device 120, such as a microphone transducer, coupled in series via speech encoder 134 to a transmitter/modulation circuit 122. Thereafter, any transmit signal is passed through a power amplifier 124 to be radiated from the antenna 102. The transmitter/modulation circuitry 122 and the power amplifier 124 are operationally responsive to the controller, with an output from the power amplifier coupled to the duplex filter or circulator 104. The transmitter/modulation circuitry 122 and scanning receiver front-end circuitry 106 comprise frequency up-conversion and frequency down-conversion functions (not shown).
Of course, the various components within the MS 100 can be arranged in any suitable functional topology able to utilise the inventive concepts of the present invention. Furthermore, the various components within the MS 100 can be realised in discrete or integrated component form, with an ultimate structure therefore being merely an arbitrary selection.
It is within the contemplation of the invention that the preferred buffering or processing of speech signals can be implemented in software, firmware or hardware, with preferably a software processor (or indeed a digital signal processor (DSP)) performing the speech processing function.
Referring now to FIG. 2, a block diagram of a code excited linear predictive (CELP) speech encoder 200 is shown, according to the preferred embodiment of the present invention. An acoustic input signal to be analysed is applied to speech coder 200 at microphone 202. The input signal is then applied to filter 204.
Filter 204 will generally exhibit band-pass filter characteristics. However, if the speech bandwidth is already adequate, filter 204 may comprise a direct wire connection.
The analogue speech signal from filter 204 is then converted into a sequence of N pulse samples, and the amplitude of each pulse sample is then represented by a digital code in analogue-to-digital (A/D) converter 208, as known in the art. The sampling rate is determined by the sample clock (SC). The sample clock (SC) is generated along with the frame clock (FC) via clock 212.
The digital output of A/D 208, which may be represented as input speech vector s(n), is then applied to coefficient analyser 210. This input speech vector s(n) is repetitively obtained in separate frames, i.e., blocks of time, the length of which is determined by the frame clock (FC), as is known in the art.
For each block of speech, a set of linear predictive coding (LPC) parameters is produced in accordance with a preferred embodiment of the invention by coefficient analyser 210. The generated speech coder parameters may include the following: LPC parameters, long-term predictor (LTP) parameters and the excitation gain factor (G2), along with the best stochastic codebook excitation codeword I. Such speech coding parameters are applied to multiplexer 250 and sent over the channel 252 for use by the speech synthesizer at the decoder. The input speech vector s(n) is also applied to subtracter 230, the function of which is described later.
Within the conventional CELP encoder of FIG. 2, the codebook search controller 240 selects the best indices and gains from the adaptive codebook within block 216 and the stochastic codebook within block 214 in order to produce a minimum weighted error in the summed chosen excitation vector used to represent the input speech sample. The output of the stochastic codebook 214 and the adaptive codebook 216 are input into respective gain functions 222 and 218. The gain-adjusted outputs are then summed in summer 220 and input into the LPC filter 224, as is known in the art.
Firstly, the adaptive codebook or long-term predictor component l(n) is computed. This is characterized by a delay and a gain factor 'G1'.
For each individual stochastic codebook excitation vector ui(n), a reconstructed speech vector s'i(n) is generated for comparison to the input speech vector s(n). Gain block 222 scales the excitation gain factor 'G2' and summing block 220 adds in the adaptive codebook component. Such a gain may be pre-computed by coefficient analyser 210 and used to analyse all excitation vectors, or may be optimised jointly with the search for the best excitation codeword I, generated by codebook search controller 240.
The scaled excitation signal G1·l(n) + G2·ui(n) is then filtered by the linear predictive coding filter 224, which constitutes a short-term predictor (STP) filter, to generate the reconstructed speech vector s'i(n).
The reconstructed speech vector s'i(n) for the i-th excitation code vector is compared to the same block of input speech vector s(n) by subtracting these two signals in subtracter 230.
The difference vector ei(n) represents the difference between the original and the reconstructed blocks of speech. The difference vector is perceptually weighted by weighting filter 232, utilising the weighting filter parameters (WTP) generated by coefficient analyser 210.
Perceptual weighting accentuates those frequencies where the error is perceptually more important to the human ear, and attenuates other frequencies.
An energy calculator function inside the codebook search controller 240 computes the energy of the weighted difference vector e'i(n). The codebook search controller compares the i-th error signal for the present excitation vector ui(n) against previous error signals to determine the excitation vector producing the minimum error. The code of the i-th excitation vector having a minimum error is then output over the channel as the best excitation code I.
A copy of the scaled excitation G1·l(n) + G2·ui(n) is stored within the long-term predictor memory of 216 for future use.
In the alternative, codebook search controller 240 may determine a particular codeword that provides an error signal having some predetermined criteria, such as meeting a predefined error threshold.
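The analysis-by-synthesis search performed by codebook search controller 240 can be outlined as below. This is an illustrative sketch only, assuming NumPy/SciPy are available; the filter coefficients, gains, long-term predictor contribution and codebook contents are placeholders, and a real CELP encoder would operate on fixed sub-frame sizes with quantised gains.

```python
import numpy as np
from scipy.signal import lfilter

def weighted_error_energy(s, excitation, lpc_a, weight_b, weight_a, g1, g2, ltp):
    """Energy of the perceptually weighted difference between the input speech
    vector s(n) and the speech reconstructed from one candidate excitation
    (adaptive component ltp scaled by G1, stochastic component scaled by G2)."""
    scaled = g1 * ltp + g2 * excitation            # G1*l(n) + G2*ui(n)
    s_rec = lfilter([1.0], lpc_a, scaled)          # short-term (LPC) synthesis filter
    e = s - s_rec                                  # difference vector ei(n)
    e_w = lfilter(weight_b, weight_a, e)           # perceptual weighting filter
    return float(np.dot(e_w, e_w))

def search_codebook(s, codebook, lpc_a, weight_b, weight_a, g1, g2, ltp):
    """Return the index I of the stochastic codeword giving the minimum weighted error."""
    errors = [weighted_error_energy(s, u, lpc_a, weight_b, weight_a, g1, g2, ltp)
              for u in codebook]
    return int(np.argmin(errors))
```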
A more detailed description of the functionality of a typical speech encoding unit can be found in "Digital speech coding for low-bit rate communications systems" by A. M. Kondoz, published by John Wiley in 1994.
In the preferred embodiment of the invention, an error mitigation technique has been applied to the speech frames following the multiplexer 250. The invention makes use of an alternative, preferably parallel, virtual transmission path 282 that is used to send a pointer to a previously encoded speech frame sent from the encoder on the main transmission path 281.
In the context of the present invention, the expression 'virtual' is defined as a transmission path that is provided from the encoder to the decoder in addition to the primary transmission path that supports the speech communication. The 'virtual' transmission path may be located within the same bit-stream, or within the same time frame or multi-frame in a time division multiplexed scheme, or via a different communication route, for example in a VoIP system. By utilising an additional virtual transmission path, ideally with different error statistics, e.g. a separate FEC scheme, the reference/pointer will not be subject to the same errors as the speech frame that it is referencing.
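As a rough illustration of how such a back-reference might ride alongside the main speech payload within the same bit-stream, consider the following sketch. The field layout and the single-byte packing are assumptions made for clarity; an actual system would allocate specific bits and, as noted above, would preferably give the reference its own FEC protection.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransmittedFrame:
    """One multiplexed unit: the coded speech frame on the main path (281)
    plus, optionally, a back-reference carried on the virtual path (282)."""
    speech_payload: bytes           # multiplexed CELP parameters for this frame
    replacement_ref: Optional[int]  # frames back to the suggested replacement, or None

def pack(payload: bytes, ref: Optional[int]) -> bytes:
    """Append the reference as a single extra byte (0xFF meaning 'no reference').
    In a real system the two fields would typically carry different FEC."""
    return payload + bytes([0xFF if ref is None else ref & 0x7F])
```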
One notable difference to known encoding arrangements is that there is a second minimisation section following the multiplexing operation. Such circuitry assesses the speech parameter data held in the buffer and selects the frame that is closest to the current speech frame.
In an enhanced embodiment, the parallel virtual transmission path uses different forward error correction (FEC) protection from that used in the main transmission path by the speech coder. In this manner, by using an independent FEC path, the speech data packet suffers from different error statistics. This difference between the main and parallel virtual transmission paths helps improve robustness to errors.
The multiplexer 250 outputs data packets/frames into a buffer 260 that holds previously multiplexed frames. A de-multiplexer 270 accesses the buffered frames of the multiplexed signal held in the buffer 260. In this regard, the de-multiplexer 270 separates the excitation parameters 274 from the LPC parameters 272. Note that the memory of the long-term predictor used to generate the excitation parameters must be the same as the long-term predictor 216 at the start of the frame.
For each block of multiplexed speech, a set of linear predictive coding (LPC) parameters for current frames and previous frames is therefore produced. In the preferred embodiment of the present invention, each set of quantized LPC parameters and excitation parameters forms a reconstructed speech vector s'j(n) for the j-th previous frame of buffered data. These are compared to previously buffered speech vectors s(n) by subtracting these two signals in subtracter 262.
The difference vector ej(n) represents the difference between the original and the previously buffered blocks of speech. The difference vector is perceptually weighted by LPC weighting filter 264. As indicated, perceptual weighting accentuates those frequencies where the error is perceptually more important to the human ear, and attenuates other frequencies.
An energy calculator function inside a codebook search controller 266 computes the energy of the weighted difference vector e'j(n). The codebook search controller 266 compares the j-th error signal for the present candidate frame against previous error signals to determine the buffered frame producing the minimum error. The codebook search controller 266 then selects the 'best index to frame data' to provide the minimum weighted error. The encoder then transmits, to the decoder, a 'pointer' to the previous frame determined as providing the minimum weighted error between itself and the respective speech frame in the main transmission path.
In essence, the speech frame that is referenced (ideally differentially in time or frame number from the current transmitted frame) constitutes the frame, within a certain moving window of speech, that most closely resembles, in a perceptually weighted error sense, the frame that was encoded by the encoder. It therefore represents the best match (pointer) to the current frame for use in the error mitigation procedure if the current frame is received in error. This representation, or pointer, is described in more detail with respect to FIG. 3.
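The 'second minimisation' following the multiplexer can be pictured with the sketch below, which assumes that a perceptually weighted representation of each buffered frame is already available; the pointer returned counts frames back from the current frame. Names and structure are illustrative and not taken from the patent figures.

```python
import numpy as np

def select_replacement_ref(current_weighted, buffered_weighted):
    """Compare the perceptually weighted reconstruction of the current frame
    against each previously buffered frame and return a pointer (frames back)
    to the closest match, in a minimum weighted-error sense."""
    best_ref, best_err = None, np.inf
    for j, past in enumerate(buffered_weighted, start=1):    # j = 1 means the previous frame
        err = float(np.sum((current_weighted - past) ** 2))  # weighted error energy
        if err < best_err:
            best_ref, best_err = j, err
    return best_ref                                          # None if the buffer is empty
```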
Referring now to FIG. 3, a timing diagram 300 is shown illustrating the preferred process of the present invention. The timing diagram illustrates frame 310 as having been received at a speech decoder and determined as being in error. The decoder has then accessed the alternative virtual transmission path to determine the most appropriate frame to replace frame 310. As shown in FIG. 3, the alternative virtual transmission path has included a pointer to frame-4 320 as a preferred replacement of frame 310. By replacing frame 310 with frame-4 320, there is minimal effect on speech quality in the speech decoding process.
The inventors of the present invention have appreciated, and utilised, the fact that the immediately preceding frames were (typically) all spoken by the same talker, i.e. the speech frames will exhibit similar pitch and formant positions. Therefore, it is highly likely that a similar previous speech frame can be found to the current speech frame.
In accordance with the preferred embodiment of the present invention, the minimum perceptual error is found by evaluating the weighted segmental signal-to-noise ratio (SEGSNR), or the average weighted SNR, for each of the buffered frames, given the sets of parameters for each frame within the memory. Preferably, a segment is defined at the speech codec sub-frame level.
SEGSNR = (1 / N_subframes) * Σ_{i=1}^{N_subframes} 10 log10( (Energy of Perceptually Weighted Speech)_i / (Energy of Perceptually Weighted Error)_i )    [1]
This determination is performed in the encoder. In cases where there is a small pitch error, it is envisaged that wildly different SEGSNR values may result. This is because the source speech and the buffered signal can quickly move out of phase. Hence, in an enhanced embodiment of the present invention, it is proposed to search around the pitch period for the buffered frames, say +/-5%, using sub-sample resolution (usually 1/3 or 1/4 samples) and take the highest SEGSNR value.
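Equation [1] and the proposed +/-5% pitch search might be realised along the following lines. This is a sketch under the assumption that the weighted speech and weighted error are available per sub-frame; the step count and the small epsilon guard are arbitrary choices, not values from the text.

```python
import numpy as np

def segsnr(weighted_speech_subframes, weighted_error_subframes):
    """Equation [1]: average over sub-frames of
    10*log10(weighted speech energy / weighted error energy)."""
    ratios = []
    for s_w, e_w in zip(weighted_speech_subframes, weighted_error_subframes):
        num = np.sum(np.asarray(s_w) ** 2)
        den = np.sum(np.asarray(e_w) ** 2) + 1e-12    # guard against a zero error energy
        ratios.append(10.0 * np.log10(num / den))
    return float(np.mean(ratios))

def best_segsnr_over_pitch(candidate_fn, nominal_pitch, span=0.05, steps=8):
    """Search +/- 5% around the nominal pitch period (sub-sample resolution in a
    real coder) and keep the highest SEGSNR. candidate_fn(pitch) is assumed to
    return (weighted_speech_subframes, weighted_error_subframes)."""
    pitches = np.linspace((1 - span) * nominal_pitch, (1 + span) * nominal_pitch, steps)
    return max(segsnr(*candidate_fn(p)) for p in pitches)
```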
In a yet further enhancement of the present invention, if that frame itself was received in error, then a frame that was used to mitigate against that frame's bad reception will itself be the best source of speech information for the current frame received in error, as shown in FIG. 4. Hence, FIG. 4 illustrates a timing diagram indicating how multiple errors are handled. The data from frame 410 is known to be in error. The proposed error mitigation process employs an alternative virtual transmission path that has indicated data frame-4 420 as a suitable replacement. However, data frame-4 420 is determined to be in error. In which case, a pointer indicates data from frame-6 430 as the frame that is most similar to the corrupted frame-4 420. Therefore, frame-6 430 is used to replace frame-4 420 and is suitable for replacing frame 410. In this manner, multiple frame errors can be handled, to overcome the problem of out-of-memory references.
This may result in references (pointers) eventually leading out of what is, effectively, a storage window. However, this does not need to be a problem if the erroneous values within the window are updated, thereby removing the need for multiple references.
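The handling of chained references suggested by FIG. 4 can be summarised in the following sketch; `references` and `frame_ok` are hypothetical bookkeeping structures kept at the decoder, and the hop limit is an illustrative safeguard rather than part of the described scheme.

```python
def resolve_replacement(frame_index, references, frame_ok, max_hops=8):
    """Follow the chain of references: if the suggested replacement was itself
    received in error, use the frame that replaced it, and so on.
    references[i] is the frame index suggested for frame i; frame_ok[i] flags
    whether frame i was received correctly."""
    target = references.get(frame_index)
    hops = 0
    while target is not None and not frame_ok.get(target, False):
        target = references.get(target)      # hop to the frame that mitigated it
        hops += 1
        if hops >= max_hops:                 # give up: fall back to simple repetition
            return None
    return target
```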
In summary, a reference or a pointer is transmitted to the decoder in an alternative bit stream to the primary bit stream. The reference or pointer indicates a previously transmitted frame that best matches the currently transmitted frame. The reference or pointer is preferably transmitted in a parallel bit stream. If the frame is received in error at the speech decoder, the reference or pointer is used in the frame replacement error mitigation process. Hence, frame mitigation has been enhanced by extending the known immediately preceding or immediately succeeding frame replacement mechanism to any frame from a number of frames. In this regard, the number of frames used in the process is only limited by the buffering/storage mechanism and/or the processing power required to determine the minimum weighted error frame.
As indicated, the buffering/storage process of the speech parameters of the speech coder is performed over a number of frames. For example, in the context of a GSM enhanced full rate (EFR) codec, of <12 kb/sec, the storage for three seconds of speech is only 5 Kbytes. The most difficult task is therefore identifying the closest frame match from the one hundred and fifty possible frames.
Hence, in one embodiment of the present invention, the aforementioned minimum weighted error selection technique may be applied to subsets of parameters, or to parameters derived from the synthesized speech, rather than all of the parameters of a speech coder frame. In other words, the LPC filter parameters (LSFs) and the energy of the synthesized speech frame (a speech parameter derived from the synthesised speech, computed in both the encoder and decoder) may be referenced (or pointed to), rather than the precise coder parameters, in order to save on memory and comparison processing.
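A reduced-complexity comparison over such a parameter subset might look like the sketch below; the particular distance measure and the weights are illustrative assumptions, not values given in the text.

```python
import numpy as np

def frame_distance(lsf_a, energy_a, lsf_b, energy_b, w_lsf=1.0, w_energy=0.1):
    """Match on a subset of parameters: LSFs plus the energy of the synthesised
    frame (both available at encoder and decoder), instead of the full coder
    parameter set. Smaller distance means a better replacement candidate."""
    d_lsf = np.sum((np.asarray(lsf_a) - np.asarray(lsf_b)) ** 2)
    d_energy = (energy_a - energy_b) ** 2
    return w_lsf * d_lsf + w_energy * d_energy
```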
In this regard, since a speech frame includes many parameters, the proposed technique can be applied in principle to any number of them. Examples of such parameters, in a CELP coder, include the following:
(i) Line Spectral Pairs (LSPs) that represent the LPC parameters;
(ii) Long-term predictor (LTP) lag for subframe-1;
(iii) LTP Gain for subframe-1;
(iv) Codebook Index for subframe-1;
(v) Codebook Gain for subframe-1;
(vi) Long-term predictor lag for subframe-2;
(vii) LTP Gain for subframe-2;
(viii) Codebook Index for subframe-2;
(ix) Codebook Gain for subframe-2;
(x) Long-term predictor lag for subframe-3;
(xi) LTP Gain for subframe-3;
(xii) Codebook Index for subframe-3;
(xiii) Codebook Gain for subframe-3;
(xiv) Long-term predictor lag for subframe-4;
(xv) LTP Gain for subframe-4;
(xvi) Codebook Index for subframe-4; or
(xvii) Codebook Gain for subframe-4.
It is within the contemplation of the invention that a pointer could be sent referencing the set of LSPs from previous frames to match those of the current frame, rather than the whole set of parameters. Alternatively, it would be possible to have a pointer for each of a number of the above parameters.
In a wireless communication system, the parallel virtual transmission path preferably consists of transmitting a block coded reference word (where seven bits would be sufficient to support a 128-frame buffer, equating to approximately 2.5 seconds) within the unprotected bits of the data payload. This could be encoded with a BCH block code of 15 bits (with an equivalent rate of 75 bits/sec) providing up to 2-bit error correction. Alternatively, it is envisaged that the alternative virtual transmission path may provide a combination of error correction and error detection functions. Error detection would be useful since poor reception of the reference could lead to bad mitigation. In the event of a badly received reference word, the scheme could default to the previous frame repetition. The 75 bits/sec of channel rate would only reduce the gross bit-rate of the GSM full-rate channel from 22.8 Kbits/sec to 22.725 Kbits/sec, which would result in an insignificant loss of sensitivity.
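The bit-budget figures quoted above can be checked with a few lines of arithmetic. The 20 ms frame duration is an assumption (typical of GSM-era codecs), while the 75 bit/s reference rate and the 22.8 kbit/s gross full-rate channel figure follow the text.

```python
# Worked figures for the reference word and its channel overhead.
FRAME_MS = 20                                           # assumed speech frame duration
REF_BITS = 7                                            # addresses a 128-frame buffer
buffer_span_s = (2 ** REF_BITS) * FRAME_MS / 1000.0     # 2.56 s of speech history (~2.5 s)

REF_CHANNEL_BPS = 75                                    # rate quoted for the BCH-coded reference
GROSS_FULL_RATE_BPS = 22_800                            # GSM full-rate gross channel rate
remaining_bps = GROSS_FULL_RATE_BPS - REF_CHANNEL_BPS   # 22_725 bit/s left for the speech frame

print(buffer_span_s, remaining_bps)                     # 2.56 22725
```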
In an alternative embodiment, such as a voice over Internet Protocol (VoIP) communication link, the alternative virtual transmission path may be achieved by sending multiple packet streams. In this context, however, it is desirable that the total traffic does not increase substantially, since this is likely to increase the packet dropping probabilities.
A preferred mechanism would be to send the references to previous frames as described above, only where transitions occur and the speech is non-stationary. When the speech is stationary, and when conventional techniques will work relatively well, the references would not be sent. In this way the packet network is not unduly overloaded, but the majority of the performance gains are achieved. The degree of 'how static a speech signal becomes' can be generated as a variable, which can be adjusted to improve the reproduced quality in the event of a lost packet.
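A simple gate of the kind described, sending references only around transitions, could be sketched as follows; the spectral-change measure and both thresholds are illustrative tuning choices, not specified in the text.

```python
import numpy as np

def should_send_reference(current_lsf, previous_lsf, energy_db_delta,
                          lsf_threshold=0.05, energy_threshold_db=6.0):
    """Send the back-reference only when the speech appears non-stationary,
    i.e. when the spectral envelope or the frame energy changes markedly."""
    spectral_change = float(np.mean(np.abs(np.asarray(current_lsf) -
                                           np.asarray(previous_lsf))))
    return spectral_change > lsf_threshold or abs(energy_db_delta) > energy_threshold_db
```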
The decoder functionality is substantially the reverse of that of the encoder (without the additional circuitry following the multiplexer), and is therefore not described here in detail. A description of the functionality of a typical speech decoding unit can also be found in "Digital speech coding for low-bit rate communications systems" by A. M. Kondoz, published by John Wiley in 1994.
The decoder follows the standard decoding process until it determines a bad frame. When a bad frame is detected, the decoder assesses the alternative virtual transmission path to determine the alternative frame indicated by the respective reference/pointer. The decoder then retrieves the 'similar' frame, as indicated by the reference/pointer transmission. The previously indicated frame is then used to replace the received frame, to synthesise the speech.
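The decoder-side behaviour just described can be condensed into the following sketch; `decode` and `conceal` stand for the standard synthesis and the conventional bad-frame concealment respectively, and all names are illustrative.

```python
def decode_frame(index, frames, refs, frame_ok, decode, conceal):
    """Normal decoding until a bad frame is flagged; then consult the virtual-path
    reference and re-synthesise from the indicated earlier frame, falling back to
    conventional concealment if no usable reference is available."""
    if frame_ok[index]:
        return decode(frames[index])           # standard decoding path
    ref = refs.get(index)                      # pointer received on the virtual path
    if ref is not None and frame_ok.get(ref, False):
        return decode(frames[ref])             # replace with the referenced frame
    return conceal(index)                      # e.g. previous-frame repetition
```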
Advantageously, the inventive concepts herein described may be retrofitted to existing codecs by stealing bits from an already constructed FEC scheme.
It is within the contemplation of the invention that any speech processing circuit would benefit from the inventive concepts described herein.
It will be understood that the bad-frame error mitigation mechanism, as described above, provides at least the following advantages:
(i) A more accurate replacement frame mechanism is provided, thereby reducing the risk of undesirable artefacts being audible in recovered speech frames.
(ii) The alternative virtual transmission path may be retrofitted to existing codecs, for example by stealing bits from an already constructed FEC scheme.
(iii) When references to previous frames are only sent where transitions occur and the speech is non-stationary, then at other times the existing bad-frame error mitigation techniques can be used, thereby minimising any additional data required in the present invention.
(iv) By cross-referencing the data received for a given frame with the frames referenced in this scheme, erroneously received parameters may be detected.
Whilst the preferred embodiment discusses the application of the present invention to a CELP coder, it is envisaged by the inventors that any other speech-processing unit, where transmission errors may occur, can benefit from the inventive concepts contained herein. The inventive concepts described herein find particular use in speech processing units for wireless communication units, such as universal mobile telecommunication system (UMTS) units, global system for mobile communications (GSM) units, TErrestrial Trunked RAdio (TETRA) communication units, Digital Interchange of Information and Signalling standard (DIIS) units, Voice over Internet Protocol (VoIP) units, etc.
Apparatus of the Invention:
A speech communication unit includes a speech encoder capable of representing an input speech signal. The speech encoder includes a transmission path for transmitting a number of speech frames to a speech decoder. The speech encoder further includes a virtual transmission path for transmitting one or more references for a number of speech frames transmitted in the transmission path. The one or more references relate to an alternative speech frame, within the number of speech frames transmitted on the transmission path, to be used as a replacement frame when a frame is received in error.
A speech communication unit, for example the above speech communication unit having a speech encoder, includes a speech decoder adapted to receive a number of speech frames on a transmission path and one or more alternative speech frame references on a virtual transmission path. The one or more references relate to an alternative speech frame within the number of speech frames received on the transmission path to be used as a replacement frame when a frame is received in error.
Method of the Invention:
A method of performing bad-frame error mitigation in a voice communication unit includes the step of transmitting, by a speech encoder in a speech communication unit, a number of speech frames on a transmission path to a speech decoder. The speech encoder transmits, on a virtual transmission path, one or more references for a number of speech frames transmitted in the transmission path. The one or more references relate to an alternative speech frame within the number of speech frames transmitted on the transmission path to be used as a replacement frame when a frame is received in error.
In this manner, an improved replacement frame from a number of speech frames may be selected when a speech frame is received in error.
Whilst specific, and preferred, implementations of the present invention are described above, it is clear that one skilled in the art could readily apply further variations and modifications of such inventive concepts.
Thus, a bad-frame error mitigation technique, and associated speech communication units and circuits, have been described that substantially alleviate at least some of the aforementioned disadvantages of known error mitigation techniques.

Claims (19)

  1. A speech communication unit (100) comprising a speech encoder (134) capable of representing an input speech signal,
    the speech encoder (134) adapted to transmit a number of speech frames to a speech decoder on a transmission path (281),
    the speech encoder (134) adapted to transmit, on a virtual transmission path (282), one or more references for a number of speech frames transmitted on the transmission path (281), wherein the one or more references relate to an alternative speech frame, within the number of speech frames transmitted on the transmission path (281), to be used as a replacement frame when a frame is received in error.
  2. The speech communication unit (100) according to Claim 1, wherein the speech encoder (134) is further characterized by: a multiplexer (250) for multiplexing said number of speech frames; a buffer (260), operably coupled to said multiplexer (250), to store multiplexed speech data; and a processor (130,270), operably coupled to said buffer (260), for characterizing a current speech frame in said buffer (260), for selecting an alternative speech frame that exhibits a similar characteristic to said speech frame, and for transmitting to the decoder, on the virtual transmission path (282), a reference to said alternative speech frame.
  3. The speech communication unit (100) according to Claim 2, wherein said processor includes a de-multiplexer function (270) to access one or more speech frames in the buffer (260) and separates excitation parameters (274) from LPC parameters (272) of the buffered speech frame to select a speech frame exhibiting a similar characteristic.
  4. The speech communication unit (100) according to any preceding Claim, wherein the virtual transmission path (282) is contained within the same bit stream of the transmission path (281) in, for example, a wireless communication system.
  5. The speech communication unit (100) according to any preceding Claim, wherein said transmission path (281) employs a first forward error correction protection scheme and said virtual transmission path (282) employs a second forward error correction protection different from that used in the transmission path (281).
  6. The speech communication unit (100) according to any of preceding Claims 2 to 5, wherein said processor (130, 266, 270) selects an alternative replacement frame to provide a minimum weighted error.
  7. The speech communication unit (100) according to Claim 6, wherein said processor (130, 266, 270) determines a minimum weighted error by evaluating a weighted segmental signal-to-noise ratio (SEGSNR) or average weighted SNR for each of the buffered frames.
  8. The speech communication unit (100) according to Claim 6 or Claim 7, wherein said processor (130, 266, 270) determines a minimum weighted error of a subset of speech coding parameters.
  9. The speech communication unit (100) according to Claim 6, Claim 7 or Claim 8, wherein said processor (130, 266) searches substantially around a pitch period of said buffered speech frames, and selects a frame exhibiting the highest SEGSNR value.
  10. The speech communication unit (100) according to any preceding Claim, wherein said alternative speech frame (320) is referenced to said current speech frame only where transitions occur and speech is non-stationary.
  11. The speech communication unit (100) according to any preceding Claim, wherein said referenced speech frames stored in buffer (300) of both the encoder and decoder represent active speech frames instead of silent or background noise frames.
  12. The speech communication unit (100) according to Claim 11, wherein said referenced speech frames represent distinctive active speech frames.
  13. A speech communication unit (100), for example a speech communication unit according to any preceding Claim, characterized by a speech decoder (132) adapted to receive a number of speech frames on a transmission path (281) and one or more alternative speech frame (320) references on a virtual transmission path (282), wherein the one or more references relate to an alternative speech frame (320), within the number of speech frames received on the transmission path (281), to be used as a replacement frame when a frame is received in error.
  14. The speech communication unit (100) according to Claim 13, wherein, if said alternative speech frame (420) is received in error, then (i) a further frame (430) is selected as the alternative, in place of the said alternative frame (420) received in error, and (ii) the further frame (430) is used in the replacement of both the current speech frame (410), received in error, and the alternative speech frame (420), received in error.
  15. A method of performing bad-frame error mitigation in a voice communication unit (100), the method comprising the step of transmitting, by a speech encoder (134) in a speech communication unit (100), a number of speech frames on a transmission path (281) to a speech decoder, the method further comprising the step of: transmitting, on a virtual transmission path (282), one or more references for one or more of a number of speech frames transmitted in the transmission path (281), wherein the one or more references relate to an alternative speech frame, within the number of speech frames transmitted on the transmission path (281), to be used as a replacement frame when a frame is received in error.
  16. A speech communications unit (100) adapted to perform the method steps according to Claim 15.
  17. A wireless communication system comprising a speech communications unit in accordance with, and adapted to support the use of a transmission path (281) and a virtual transmission path (282) as described in, any of claims 1-14 and 16.
  18. A speech communication unit (100) substantially as hereinbefore described with reference to, and/or as illustrated by, FIG. 1 of the accompanying drawings.
  19. A speech coder (130) substantially as hereinbefore described with reference to, and/or as illustrated by, FIG. 2 of the accompanying drawings.
GB0217729A 2002-07-31 2002-07-31 Speech communication unit and method for error mitigation of speech frames Expired - Lifetime GB2391440B (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
GB0217729A GB2391440B (en) 2002-07-31 2002-07-31 Speech communication unit and method for error mitigation of speech frames
PCT/EP2003/005076 WO2004015690A1 (en) 2002-07-31 2003-05-12 Speech communication unit and method for error mitigation of speech frames
EP03730037A EP1527440A1 (en) 2002-07-31 2003-05-12 Speech communication unit and method for error mitigation of speech frames
CNB038182726A CN100349395C (en) 2002-07-31 2003-05-12 Speech communication unit and method for error mitigation of speech frames
KR1020057001824A KR20050027272A (en) 2002-07-31 2003-05-12 Speech communication unit and method for error mitigation of speech frames
JP2004526664A JP2005534984A (en) 2002-07-31 2003-05-12 Voice communication unit and method for reducing errors in voice frames
AU2003240644A AU2003240644A1 (en) 2002-07-31 2003-05-12 Speech communication unit and method for error mitigation of speech frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0217729A GB2391440B (en) 2002-07-31 2002-07-31 Speech communication unit and method for error mitigation of speech frames

Publications (3)

Publication Number Publication Date
GB0217729D0 GB0217729D0 (en) 2002-09-11
GB2391440A true GB2391440A (en) 2004-02-04
GB2391440B GB2391440B (en) 2005-02-16

Family

ID=9941443

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0217729A Expired - Lifetime GB2391440B (en) 2002-07-31 2002-07-31 Speech communication unit and method for error mitigation of speech frames

Country Status (7)

Country Link
EP (1) EP1527440A1 (en)
JP (1) JP2005534984A (en)
KR (1) KR20050027272A (en)
CN (1) CN100349395C (en)
AU (1) AU2003240644A1 (en)
GB (1) GB2391440B (en)
WO (1) WO2004015690A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8385366B2 (en) 2007-03-20 2013-02-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for transmitting a sequence of data packets and decoder and apparatus for decoding a sequence of data packets

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5314771B2 (en) * 2010-01-08 2013-10-16 日本電信電話株式会社 Encoding method, decoding method, encoding device, decoding device, program, and recording medium
US20150326884A1 (en) * 2014-05-12 2015-11-12 Silicon Image, Inc. Error Detection and Mitigation in Video Channels

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0664536A1 (en) * 1994-01-24 1995-07-26 Nokia Mobile Phones Ltd. Speech code processing
WO1996027183A2 (en) * 1995-02-28 1996-09-06 Nokia Telecommunications Oy Processing speech coding parameters in a telecommunication system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5917835A (en) * 1996-04-12 1999-06-29 Progressive Networks, Inc. Error mitigation and correction in the delivery of on demand audio
US6636829B1 (en) * 1999-09-22 2003-10-21 Mindspeed Technologies, Inc. Speech communication system and method for handling lost frames

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0664536A1 (en) * 1994-01-24 1995-07-26 Nokia Mobile Phones Ltd. Speech code processing
WO1996027183A2 (en) * 1995-02-28 1996-09-06 Nokia Telecommunications Oy Processing speech coding parameters in a telecommunication system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8385366B2 (en) 2007-03-20 2013-02-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for transmitting a sequence of data packets and decoder and apparatus for decoding a sequence of data packets

Also Published As

Publication number Publication date
GB0217729D0 (en) 2002-09-11
KR20050027272A (en) 2005-03-18
WO2004015690A1 (en) 2004-02-19
EP1527440A1 (en) 2005-05-04
JP2005534984A (en) 2005-11-17
CN1672193A (en) 2005-09-21
CN100349395C (en) 2007-11-14
GB2391440B (en) 2005-02-16
AU2003240644A1 (en) 2004-02-25

Similar Documents

Publication Publication Date Title
JP4313570B2 (en) A system for error concealment of speech frames in speech decoding.
US9047863B2 (en) Systems, methods, apparatus, and computer-readable media for criticality threshold control
JP4213243B2 (en) Speech encoding method and apparatus for implementing the method
US8102872B2 (en) Method for discontinuous transmission and accurate reproduction of background noise information
EP0573398B1 (en) C.E.L.P. Vocoder
JP4870313B2 (en) Frame Erasure Compensation Method for Variable Rate Speech Encoder
US7852792B2 (en) Packet based echo cancellation and suppression
JPH07311598A (en) Generation method of linear prediction coefficient signal
JPH07311596A (en) Generation method of linear prediction coefficient signal
KR20020093943A (en) Method and apparatus for predictively quantizing voiced speech
JPH07311597A (en) Composition method of audio signal
JP2011237809A (en) Predictive speech coder using coding scheme patterns to reduce sensitivity to frame errors
US10607624B2 (en) Signal codec device and method in communication system
US20050143984A1 (en) Multirate speech codecs
US7089180B2 (en) Method and device for coding speech in analysis-by-synthesis speech coders
GB2391440A (en) Speech communication unit and method for error mitigation of speech frames
KR101164834B1 (en) Systems and methods for dimming a first packet associated with a first bit rate to a second packet associated with a second bit rate
JPH09172413A (en) Variable rate voice coding system
WO2008076534A2 (en) Code excited linear prediction speech coding

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20110120 AND 20110126

732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20170831 AND 20170906

PE20 Patent expired after termination of 20 years

Expiry date: 20220730