GB2444757A - Code excited linear prediction speech coding and efficient tradeoff between wideband and narrowband speech quality - Google Patents


Info

Publication number
GB2444757A
Authority
GB
United Kingdom
Prior art keywords
signal
speech
filter
error signal
celp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0624860A
Other versions
GB2444757B (en)
GB0624860D0 (en)
Inventor
Jonathan Gibbs
Halil Fikretler
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to GB0624860A priority Critical patent/GB2444757B/en
Publication of GB0624860D0 publication Critical patent/GB0624860D0/en
Priority to PCT/US2007/083608 priority patent/WO2008076534A2/en
Publication of GB2444757A publication Critical patent/GB2444757A/en
Application granted granted Critical
Publication of GB2444757B publication Critical patent/GB2444757B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06 - Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • G10L19/08 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G10L19/26 - Pre-filtering or post-filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A CELP speech coder encodes a speech signal. The coder comprises a prediction filter (218) generating a predicted speech signal in response to an excitation signal. A subtraction unit (216) generates an error signal indicative of the difference between the predicted speech signal and the speech signal, and a perceptual weighting filter filters the error signal to generate a perceptually weighted error signal, which is then filtered in a shaping filter (234) to generate a shaped error signal. The shaping filter (234) modifies the level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval. A codebook search controller (220) selects an encode excitation signal in response to the shaped error signal, and an encoded signal is generated which comprises an indication of the encoded excitation signal. The shaping of the perceptually weighted error signal allows easy and efficient adjustment of the trade-off between wideband and narrowband speech quality.

Description

CODE EXCITED LINEAR PREDICTION SPEECH CODING
Field of the invention
The invention relates to Code Excited Linear Prediction (CELP) speech coding and in particular, but not exclusively, to encoding of narrowband and wideband speech.
Background of the Invention
Many present day voice communications systems, such as the Global System for Mobile communications (GSM) cellular telephony standard and the third generation cellular technology Universal Mobile Telecommunications System (UMTS), use speech-processing units to digitally encode and decode speech signals. In such voice communications systems, a speech encoder in a transmitting unit converts an analogue speech signal into a suitable digital format for transmission. A speech decoder in a receiving unit converts a received digital speech signal into an audible analogue speech signal.
As frequency spectrum for such wireless voice communication systems is a valuable resource, it is desirable to limit the channel bandwidth used by such speech signals, in order to maximise the number of users per frequency band. Hence, a primary objective in the use of speech coding techniques is to reduce the occupied bandwidth of the speech signals as much as possible, by use of compression techniques, without losing fidelity.
A detailed description of the functionality of a typical speech encoding unit can be found in "Digital speech coding for low-bit rate communications systems" by A. M. Kondoz, published by John Wiley in 1994.
In the field of Code Excited Linear Predictive (CELP) speech coders, speech coding techniques are adopted to provide high quality narrowband (300-3300 Hz) and wideband (50-7000 Hz) speech compression at low to medium bit rates (5-24 kb/s) for speech/audio communication units.
In the ITU-T Embedded Variable Bit Rate (EV-VBR) Speech Codec, work has started on an embedded speech coder that should provide high quality wideband speech from an 8 kb/sec. core and, in a series of four extra layers (4 kb/sec., 4 kb/sec., 8 kb/sec. and 8 kb/sec.), provide both quality improvements and error resilience improvements on this basic core codec. The main benefit of an embedded speech codec is that one encoding and decoding algorithm may be employed in many different equipment types and in different configurations, allowing terminals with different capabilities to exchange speech or audio signals, without the need for transcoding. These different capabilities may include input and output audio bandwidth (narrowband (300-3000 Hz), wideband (50-7000 Hz) or superwideband (50-14000 Hz)) and monophonic or stereophonic pickup or rendering.
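The layer structure described above can be sketched numerically. The rates are those quoted in the text; the cumulative sums give the bit rates at which an embedded decoder could truncate the stream. This is a simplified illustration of the embedded principle, not the actual EV-VBR framing:

```python
# Layer rates from the text: an 8 kb/s core plus four enhancement layers.
# An embedded decoder may stop after any layer; the cumulative totals are
# the usable decoding rates. Illustration only, not real EV-VBR framing.
layer_rates_kbps = [8, 4, 4, 8, 8]

decodable_rates = []
total = 0
for rate in layer_rates_kbps:
    total += rate
    decodable_rates.append(total)

print(decodable_rates)  # [8, 12, 16, 24, 32]
```

Any prefix of the layered bitstream is thus itself a valid, lower-rate encoding, which is what removes the need for transcoding between terminals of different capability.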
When an audio signal is presented to an instance of an embedded encoder algorithm, a bit stream will be produced which may be decoded by any decoder to provide an audio output. The many different configurations lead to quality optimization problems associated with balancing the quality in these configurations.
As an example, the quality requirements for the 8 kb/sec. core ITU-T codec are high. For example, it is required that a decoder only capable of narrowband speech synthesis should produce speech of similar quality for both a narrowband and a wideband input to the encoder. Accordingly, the ITU-T Embedded Variable Bit Rate Coder requires a wideband codec core which needs to provide excellent speech quality performance when the encoder is presented with wideband speech and may be decoded at either 16 kHz (wideband) or 8 kHz (narrowband). Thus, the codec core must provide excellent results for narrowband encoding.
An implication of these requirements is that a significant proportion of the available coding resource/bit rate should be focused on the encoding of the narrowband frequencies of a wideband signal. However, this tends to compromise the wideband speech quality performance of the encoder/decoder path, which is clearly undesirable.
Hence, an improved speech encoding would be advantageous and in particular an encoding which allows increased flexibility, improved and/or facilitated trade-off between wideband and narrowband quality, facilitated implementation and/or improved performance would be advantageous.
Summary of the Invention
Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages singly or in any combination.
According to a first aspect of the invention there is provided a Code Excited Linear Prediction, CELP, speech coder comprising: means for receiving a first speech signal; a prediction filter for generating a predicted speech signal in response to an excitation signal; means for generating an error signal indicative of a difference between the predicted speech signal and the received speech signal; a perceptual weighting filter for filtering the error signal to generate a perceptually weighted error signal; a shaping filter for filtering the perceptually weighted error signal to generate a shaped error signal, the shaping filter being arranged to modify a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval; means for selecting an encode excitation signal in response to the shaped error signal; and means for generating an encoded signal for the first speech signal, the encoded signal comprising an indication of the encoded excitation signal.
The invention may allow improved performance and/or facilitated implementation. For example, the invention may allow a practical, easy to implement and/or high performance CELP encoder wherein quality/bit rate trade-offs between narrowband and wideband performance can easily be achieved by adjusting the shaping performed by the shaping filter. An improved quality of a decoded narrowband signal can be achieved from the encoded signal even for a wideband input signal to the encoder.
Specifically, the approach may allow control of perceived decoded speech quality by deliberately adding a difference in weighting between different frequency bands.
Specifically, by increasing a weighting of the lower frequency interval in the perceptually weighted error signal relative to a higher frequency interval an improved quality of a decoded narrowband signal from an encoded wideband signal can be achieved.
The shaped error signal may be generated for a plurality of different excitation signals and the encode excitation signal may be selected as the one resulting in the lowest shaped error signal according to a suitable metric, such as a signal energy measure. The selection of the encode excitation signal may be repeated for each speech frame, e.g. of 20 msec duration. The shaping filter may specifically be a digital filter applied to samples of the perceptually weighted error signal in the speech frame.
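The selection principle described above can be sketched as follows. This is a hedged illustration of the minimum-energy search, with the synthesis, weighting and shaping stages stubbed out behind a single `synthesise` callable; the function names are invented:

```python
# Sketch of the codebook search principle: each candidate excitation is
# synthesised, compared against the target frame, and the candidate with
# the lowest error energy (the metric suggested in the text) is selected.

def error_energy(e):
    """Signal energy of an error vector."""
    return sum(x * x for x in e)

def select_excitation(target, candidates, synthesise):
    """Return the index of the candidate whose synthesised output is
    closest to the target frame, measured by error energy.
    (Perceptual weighting and shaping are omitted for brevity.)"""
    best_i, best_energy = None, float("inf")
    for i, u in enumerate(candidates):
        s_hat = synthesise(u)                       # prediction filter
        e = [t - s for t, s in zip(target, s_hat)]  # subtraction stage
        energy = error_energy(e)
        if energy < best_energy:
            best_i, best_energy = i, energy
    return best_i

# Toy usage: identity "synthesis", three candidate excitations.
target = [1.0, 0.5, -0.25]
cands = [[0.0, 0.0, 0.0], [1.0, 0.5, -0.25], [2.0, 1.0, -0.5]]
print(select_excitation(target, cands, lambda u: u))  # 1 (exact match)
```

In a real coder the search would be repeated per speech frame and the winning index transmitted, as the surrounding text describes.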
The system may in particular provide an efficient, flexible and/or adjustable tuning of the trade-off between narrowband and wideband performance.
According to an optional feature of the invention, the CELP speech coder further comprises means for determining a characteristic of the prediction filter in response to the received signal and wherein the shaping filter is independent of the characteristic.
This may allow improved performance and/or facilitated implementation. The prediction filter may be a filter which is changed during the encoding of a signal whereas the shaping filter is constant for the encoding of a signal.
Also, characteristics of the perceptual weighting filter may be determined in response to characteristics of the prediction filter and/or the received signal. Thus the perceptual weighting filter may be a time varying filter whereas the shaping filter may be static. The coefficients of the perceptual weighting filter may specifically be dependent on the received speech signal whereas the shaping filter may be independent of the received speech signal.
Specifically, new coefficients of the prediction filter and/or the perceptual weighting filter may be determined for each speech/encoding frame whereas the shaping filter is constant in different speech segments.
According to an optional feature of the invention, a gain of the shaping filter is higher in the lower frequency interval than in the higher frequency interval.
The invention may allow improved performance and/or facilitated implementation. In particular, the feature may allow a practical tuning of the coder towards improved narrowband quality.
According to another aspect of the invention, there is provided a communication unit comprising a Code Excited Linear Prediction, CELP, speech coder, the speech coder comprising: means for receiving a first speech signal; a prediction filter for generating a predicted speech signal in response to an excitation signal; means for generating an error signal indicative of a difference between the predicted speech signal and the received speech signal; a perceptual weighting filter for filtering the error signal to generate a perceptually weighted error signal; a shaping filter for filtering the perceptually weighted error signal to generate a shaped error signal, the shaping filter being arranged to modify a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval; means for selecting an encode excitation signal in response to the shaped error signal; and means for generating an encoded signal for the first speech signal, the encoded signal comprising an indication of the encoded excitation signal.
According to another aspect of the invention, there is provided a method of Code Excited Linear Prediction, CELP, speech coding, the method comprising: receiving a first speech signal; a prediction filter generating a predicted speech signal in response to an excitation signal; generating an error signal indicative of a difference between the predicted speech signal and the received speech signal; a perceptual weighting filter filtering the error signal to generate a perceptually weighted error signal; a shaping filter filtering the perceptually weighted error signal to generate a shaped error signal, the shaping filter modifying a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval; selecting an encode excitation signal in response to the shaped error signal; and generating an encoded signal for the first speech signal, the encoded signal comprising an indication of the encoded excitation signal.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Brief Description of the Drawings
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which:
FIG. 1 is an illustration of a speech communication unit in accordance with some embodiments of the invention;
FIG. 2 is an illustration of a speech encoder in accordance with some embodiments of the invention;
FIG. 3 is an illustration of a speech decoder in accordance with the prior art;
FIG. 4 is an illustration of a shaping filter in accordance with some embodiments of the invention;
FIGs. 5, 6 and 7 illustrate examples of gain transfer functions for a shaping filter in accordance with some embodiments of the invention; and
FIG. 8 illustrates an example of a method of Code Excited Linear Prediction, CELP, speech coding in accordance with some embodiments of the invention.
Detailed Description of Some Embodiments of the Invention
The following description focuses on embodiments of the invention applicable to an ITU-T Embedded Variable Bit Rate (EV-VBR) CELP speech encoder and in particular to an encoder capable of generating an encoded signal representing both wideband and narrowband speech data. However, it will be appreciated that the invention is not limited to this application but may be applied to many other speech encoders.
FIG. 1 illustrates an example of a speech communication unit in accordance with some embodiments of the invention. In the specific example, the communication unit is a third generation cellular communication unit, such as a UMTS User Equipment (UE). The speech communication unit 100 contains an antenna 102 preferably coupled to a duplex filter or antenna switch 104 that provides isolation between a receiver chain and a transmitter chain within the speech communication unit 100.
As known in the art, the receiver chain typically includes a receiver front-end circuit 106 (effectively providing reception, filtering and intermediate or base-band frequency conversion). The front-end circuit 106 is serially coupled to a signal processing function 108. An output from the signal processing function 108 is provided to a suitable output device 110, such as a speaker, via speech-processing logic 112.
The speech-processing logic 112 may comprise speech encoding logic 114 to encode a user's speech into a format suitable for transmitting over the transmission medium. The speech-processing logic 112 may also comprise speech decoding logic 116 to decode received speech into a format suitable for outputting via the output device (speaker) 110.
For completeness, the receiver chain also includes received signal strength indicator (RSSI) circuitry 118 (shown coupled to the receiver front-end 106, although the RSSI circuitry 118 could be located elsewhere within the receiver chain). The RSSI circuitry 118 is coupled to a controller 120 for maintaining overall subscriber unit control. The controller 120 is also coupled to the receiver front-end circuitry 106 and the signal processor 108 (typically realised by a digital signal processor (DSP)). The controller 120 is coupled to a memory device 122 for storing operating regimes, such as decoding/encoding functions and the like. A timer 124 is typically coupled to the controller 120 to control the timing of operations (transmission or reception of time-dependent signals) within the speech communication unit 100.
In the context of the present description, the timer 124 dictates the timing of speech signals, in both transmit (encoding) path and receive (decoding) paths.
As regards the transmit chain, this essentially includes an input device 126, such as a microphone transducer, coupled in series via speech encoder 112 to a transmitter/modulation circuit 128. Thereafter, any transmit signal is passed through a power amplifier 130 to be radiated from the antenna 102. The transmitter/modulation circuitry 128 and the power amplifier 130 are operationally responsive to the controller 120, with an output from the power amplifier 130 coupled to the duplex filter or antenna switch 104. The transmitter/modulation circuitry 128 and receiver front-end circuitry 106 comprise frequency up-conversion and frequency down-conversion functions (not shown). Of course, the various components within the speech communication unit 100 can be arranged in any suitable functional topology. Furthermore, the various components within the speech communication unit 100 can be realised in discrete or integrated component form, with an ultimate structure therefore being merely an arbitrary selection.
It is within the contemplation of the invention that the preferred buffering or processing of speech signals can be implemented in software, firmware or hardware, with e.g. a software processor (or indeed a digital signal processor (DSP)), performing the speech processing function.
FIG. 2 is an illustration of a speech encoder in accordance with some embodiments of the invention. Specifically, FIG. 2 illustrates elements of the speech processing logic 112 in more detail. The speech processing logic 112 specifically implements a code excited linear predictive (CELP) speech coder. An acoustic input signal to be analysed is applied to the speech coder at microphone 202. The input signal is then applied to filter 204 to remove high frequency components which would otherwise cause aliasing during sampling.
The analogue speech signal from filter 204 is then converted into a sequence of N pulse samples, and the amplitude of each pulse sample is then represented by a digital code in analogue-to-digital (A/D) converter 208, as known in the art. The sampling rate is determined by a sample clock generated along with a frame clock from a common clock base.
The digital output of A/D converter 208, which may be represented as input speech vector s(n), is then applied to coefficient analyser 210. This input speech vector s(n) is repetitively obtained in separate frames, e.g. blocks of time, the length of which is determined by the frame clock, as is known in the art. A speech frame may typically be 20 msec.
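The framing step just described can be sketched as follows. The helper name and the non-overlapping framing are illustrative assumptions; real coders often use look-ahead and overlapping analysis windows:

```python
# Split a sampled signal into non-overlapping 20 ms frames, as described
# above: 160 samples per frame at 8 kHz narrowband, 320 at 16 kHz wideband.
def split_frames(samples, fs_hz, frame_ms=20):
    n = fs_hz * frame_ms // 1000          # samples per frame
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

frames = split_frames([0.0] * 480, 8000)  # 60 ms of narrowband samples
print(len(frames), len(frames[0]))        # 3 160
```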
For each block of speech, a set of linear predictive coding (LPC) parameters is produced by the coefficient analyser 210. The generated speech coder parameters may include: LPC parameters, Long-Term Predictor (LTP) parameters, and an excitation gain factor G2 (along with the best stochastic codebook excitation codeword I). Such speech coding parameters are applied to multiplexer 212 and sent over the channel 214 for use by the speech synthesizer at the decoder. The input speech vector s(n) is also applied to subtractor logic 216, the function of which is described later.
The CELP coder comprises a prediction filter 218 (also typically referred to as an LPC (Linear Predictive Coder) synthesis filter). For each speech segment, the coefficients for the prediction filter 218 are determined based on the received speech samples. The coefficients are then included in the encoded signal (bitstream) and transmitted to a decoder, thereby allowing the decoder to recreate the prediction filter used by the encoder.
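As an illustration of how the per-frame LPC coefficients can be obtained, the following is a textbook autocorrelation method with the Levinson-Durbin recursion. It is a generic sketch, not necessarily the routine used by any particular codec:

```python
# Derive LPC predictor coefficients for one frame: autocorrelation of the
# samples, then Levinson-Durbin to solve the normal equations.

def autocorr(frame, order):
    """Autocorrelation lags r[0..order] of a frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Return ([a1, ..., ap], residual_energy) such that the predictor
    x[n] ~= sum_k a_k * x[n-k] minimises the squared error."""
    a = [0.0] * (order + 1)
    e = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / e                      # reflection coefficient
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        e *= (1.0 - k * k)               # remaining prediction error
    return a[1:], e

# Toy frame generated by x[n] = 0.5 * x[n-1]; the recursion recovers 0.5.
frame = [0.5 ** i for i in range(50)]
coeffs, err = levinson_durbin(autocorr(frame, 1), 1)
```

In a real encoder these coefficients (suitably quantised) would be the LPC parameters passed to multiplexer 212.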
Furthermore, for each speech frame, the CELP coder determines an excitation signal for the prediction filter 218. The excitation signal is generated such that the resulting signal generated by the prediction filter 218 most closely resembles the received signal. Specifically, the excitation signal is determined by an iterative search of excitation signals with a calculation of an error measure for each available excitation signal and a selection of the excitation signal resulting in the lowest error measure.
In the example, the excitation signal is generated as a combination of one of a number of predetermined excitation signals and one of a number of adaptive excitation signals generated from signals of previous speech frames. The selected excitation signal(s) is (are) indicated by a suitable index or code shared between the encoder and decoder and the index/code is included in the transmitted encoded signal thereby allowing the decoder to recreate the excitation signal while maintaining a low data rate.
Specifically, within the conventional CELP encoder of FIG. 2, a codebook search controller 220 selects the best excitation signal indices and gains from an adaptive codebook 222 and a stochastic codebook 224 such that it produces a minimum weighted error between the generated signal from the prediction filter 218 and the input speech signal. In the example, the adaptive codebook 222 comprises various excitation signals determined from previous speech frames and particularly suitable for representing voiced speech whereas the stochastic codebook 224 comprises a plurality of predetermined excitation signals resembling white noise sequences and being particularly suitable for onsets, unvoiced sections and transitions.
In the CELP encoder, the output of the stochastic codebook 224 and the adaptive codebook 222 are input into respective gain functions 228 and 226. The gain-adjusted outputs are then summed in summer 230 to generate an excitation signal which is input to the prediction filter 218.
Firstly, the adaptive codebook or long-term predictor component l(n) is computed. This is characterised by a delay and a gain factor G1. For each individual stochastic codebook excitation vector ui(n), a reconstructed speech vector s'i(n) is generated for comparison to the input speech vector s(n). Gain block 228 scales the excitation by the gain factor G2, and summing block 230 adds in the adaptive codebook component. Such a gain may be pre-computed by coefficient analyser 210 and used to analyse all excitation vectors, or may be optimised jointly with the search for the best excitation codeword I, generated by codebook search controller 220.
The scaled excitation signal G1·l(n) + G2·ui(n) is then filtered by the prediction filter 218, which constitutes a short-term predictor (STP) filter, to generate the reconstructed speech vector s'i(n). The reconstructed speech vector s'i(n) for the i-th excitation code vector is compared to the same block of the input speech vector s(n) by subtracting these two signals in subtractor 216.
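A minimal sketch of this path: the combined excitation G1·l(n) + G2·ui(n) is passed through an all-pole short-term synthesis filter. The gains, codebook vectors and the single predictor coefficient below are invented toy values, not values from any standard:

```python
# All-pole (synthesis) filtering of the scaled, summed excitation:
# s[n] = x[n] + sum_k a[k] * s[n-1-k], i.e. filtering by 1/A(z).

def synthesis_filter(excitation, a):
    """Run excitation through the all-pole predictor with coefficients a."""
    s = []
    for n, x in enumerate(excitation):
        y = x + sum(a[k] * s[n - 1 - k]
                    for k in range(len(a)) if n - 1 - k >= 0)
        s.append(y)
    return s

g1, g2 = 0.8, 0.4                    # adaptive / stochastic gains (toy)
l = [0.0, 1.0, 0.0, 0.0]             # adaptive codebook vector (toy)
u = [1.0, 0.0, -1.0, 0.0]            # stochastic codebook vector (toy)
x = [g1 * a_ + g2 * b_ for a_, b_ in zip(l, u)]
s_hat = synthesis_filter(x, [0.5])   # first-order predictor, a1 = 0.5
```

The resulting s_hat plays the role of the reconstructed vector s'i(n) that subtractor 216 compares against the input frame.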
An error signal in the form of the difference vector ei(n) represents the difference between the original and the reconstructed blocks of speech. The difference vector is perceptually weighted by a perceptual weighting filter 232 to generate a perceptually weighted error signal. The perceptual weighting filter 232 utilises weighting filter parameters (WTP) generated by coefficient analyser 210.
The perceptual weighting accentuates those frequencies where the error is perceptually more important to the human ear, and attenuates other frequencies. Specifically, it is known that human auditory sensitivity is reduced in frequencies around a strong signal. Accordingly, the coefficient analyser 210 can evaluate the masking threshold for the specific speech signal in the speech frame and determine the characteristics of the perceptual weighting filter 232 in response, for example as a filter having a frequency response complementary to the masking threshold frequency response.
Thus, the perceptual weighting filter 232 provides a perceptual weighting of the generated error signal thereby providing an error signal which more closely reflects how the difference between the generated predicted signal and the actual speech signal will be perceived by the end user.
Furthermore, the perceptual weighting filter 232 is dependent on the received signal/prediction filter and is continually adapted to the speech signal thereby reflecting how the human perception is affected by the signal characteristics.
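The document does not give the weighting filter's exact form. One construction commonly used in CELP coders derives it from the LPC polynomial A(z) by bandwidth expansion, W(z) = A(z/g1)/A(z/g2) with 0 < g2 < g1 <= 1, which deemphasises errors near spectral peaks where masking is strongest. A sketch under that assumption, with invented coefficient values:

```python
# Bandwidth expansion of LPC coefficients: substituting z/gamma for z
# scales the k-th coefficient by gamma**k, which pulls the roots of A(z)
# towards the origin and broadens its resonances.

def bandwidth_expand(a, gamma):
    """Return coefficients of A(z/gamma) given a = [a1, ..., ap]."""
    return [c * gamma ** (k + 1) for k, c in enumerate(a)]

a = [1.6, -0.8]                    # toy LPC coefficients a1, a2
w_num = bandwidth_expand(a, 0.94)  # numerator polynomial  A(z/0.94)
w_den = bandwidth_expand(a, 0.60)  # denominator polynomial A(z/0.60)
```

Because both polynomials are recomputed from each frame's LPC analysis, a filter built this way is time varying, exactly as the text notes for the perceptual weighting filter 232.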
In the system of FIG. 2, the perceptually weighted error signal is not directly used to select the excitation signal but is rather fed to a shaping filter 234 which is cascaded with the perceptual weighting filter 232 and which filters the perceptually weighted error signal to generate a shaped error signal.
The shaping filter 234 shapes the error signal by modifying a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval. Specifically, in the system, the shaping filter 234 is able to deemphasise the perceptually weighted error signal for a frequency region above that of narrowband speech (typically > 3500 Hz) relative to the frequency region for narrowband speech. Thus, the shaping filter 234 may provide an efficient, easy to implement and flexible way of adjusting a quality trade-off between narrowband speech and wideband speech.
Furthermore, whereas the prediction filter 218 and the perceptual weighting filter 232 need to be continually updated and changed to correspond to the speech samples of the current speech frame, the shaping filter 234 is independent of the current speech signal, and of the dynamic characteristics of the prediction filter 218 and the perceptual weighting filter 232. Thus, the shaping filter 234 can be a static filter representing a desired bias and need not be changed dynamically during encoding. Indeed, the shaping filter may be fixed during the design stage and can be independent of the received signal, the prediction filter 218 and/or the perceptual weighting filter 232.
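A static shaping filter of the kind described can be illustrated with a simple first-order low-pass tilt. The transfer function and its coefficient are invented for illustration; the patent's figures (FIGs. 4-7) define the actual filter:

```python
import cmath
import math

# Magnitude response of a fixed first-order shaping filter
# H(z) = (1 - a) / (1 - a * z^-1): gain 1 at DC, falling towards Nyquist,
# so the narrowband region is weighted more heavily than the band above it.

def shaping_gain(freq_hz, fs_hz, a=0.6):
    """Gain of H(z) evaluated on the unit circle at freq_hz."""
    z = cmath.exp(1j * 2 * math.pi * freq_hz / fs_hz)
    return abs((1 - a) / (1 - a / z))

fs = 16000.0                       # wideband sampling rate
low = shaping_gain(1000.0, fs)     # inside the narrowband region
high = shaping_gain(6000.0, fs)    # wideband-only region (> 3500 Hz)
print(low > high)                  # True: lower band is emphasised
```

Because the coefficient a is fixed at design time, retuning the narrowband/wideband trade-off amounts to picking a different static response, with no per-frame computation.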
The function and benefits of the shaping filter 234 will be described in more detail later.
The shaping filter 234 is coupled to the codebook search controller 220, where an energy calculator function computes the energy of the shaped error signal in the form of the filtered weighted difference vector e'i(n). The codebook search controller 220 compares the i-th shaped error signal for the present excitation vector ui(n) against error signals for other possible excitation signals to determine the excitation vector producing the minimum error. The code of the i-th excitation vector having a minimum error is then output over the channel as the best excitation code I. Thus, the codebook search controller 220 selects an encode excitation signal from the possible excitation signals as the encode excitation signal having the lowest error measure. An indication of this excitation signal, in the form of the code/index of the encode excitation signal, is then submitted to the MUX 212, which includes it in the encoded signal transmitted to the decoder.
Furthermore, a copy of the scaled excitation signal G₁l(n)+G₂uᵢ(n) is stored within a Long Term Predictor memory 236 coupled to the adaptive codebook 222 for future use.
It will be appreciated that other selection criteria or measures may be used and that the codebook search controller 220 may determine a particular codeword that provides an error signal having some predetermined criteria, such as meeting a predefined error threshold.
The decoder functionality is substantially the reverse of that of the encoder as illustrated in FIG. 3. Within the conventional CELP decoder of FIG. 3, the best indices and gains sent in the main bitstream are used with a stochastic codebook 302 and adaptive codebook 304 corresponding to the codebooks of the speech encoder.
The output of the stochastic codebook 302 is fed to a gain function 306 and the output of the adaptive codebook 304 is input to a respective gain function 308. The gain-adjusted outputs are then summed in summer 310 and input into a prediction filter 312 which has been set up with the coefficients received in the bitstream. Thus, a speech signal is generated using an excitation signal and prediction filter substantially identical to that selected by the encoder.
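The decoder synthesis step described above can be sketched in a few lines. This is a simplified illustration, assuming a first-order predictor and toy gains; the function names and values are not the codec's:

```python
import numpy as np

def synth_filter(excitation, a):
    """All-pole synthesis 1/A(z): y[n] = x[n] - sum_k a[k]*y[n-k],
    where a = [1, a1, a2, ...] is the LPC polynomial A(z)."""
    y = np.zeros(len(excitation))
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def celp_decode_frame(stoch_vec, adapt_vec, g1, g2, lpc_a):
    """Gain-scale both codebook outputs, sum them (summer 310),
    and pass the excitation through the prediction filter (312)."""
    excitation = g1 * np.asarray(adapt_vec, float) + g2 * np.asarray(stoch_vec, float)
    return synth_filter(excitation, lpc_a), excitation

speech, exc = celp_decode_frame(
    stoch_vec=[1.0, 0.0, 0.0, 0.0],
    adapt_vec=[0.0, 1.0, 0.0, 0.0],
    g1=0.5, g2=1.0,
    lpc_a=[1.0, -0.9],  # illustrative first-order predictor, not real LPC data
)
```

The returned excitation is also what would be written into the adaptive codebook memory 314 for the next frame.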
The generated excitation signal is further stored in a memory 314 coupled to the adaptive codebook 304 for use in subsequent frames.
In the system, the decoder may receive a signal comprising both encoded wideband speech data and narrowband speech data. Specifically, the encoder may generate a scalable signal such that a decoder can generate a narrowband speech signal even if the original signal input to the encoder is a wideband signal. However, a requirement of the ITU-T is that the decoder should generate a signal which has a comparable quality to a narrowband signal generated from an original narrowband signal.
In order to achieve this, the shaping filter 234 allows an adjustment of the emphasis applied to different frequency intervals. Specifically, the response of the shaping filter 234 can be adjusted such that a lower frequency interval corresponding to a bandwidth of the narrowband speech data has a higher gain than a higher frequency interval corresponding to the additional bandwidth of the wideband speech data. Thus, the error contribution from the higher frequencies only present in the wideband speech is attenuated relative to the error contribution of the lower frequencies present in both narrowband and wideband speech.
Thus, increased emphasis is provided to the narrowband error relative to the wideband error resulting in an increased quality for narrowband speech.
FIG. 4 illustrates a specific example wherein the shaping filter 234 corresponds to a low pass filter 401 in parallel with a high pass filter 403. Furthermore, the gain of the high pass filter relative to the low pass filter may be adjusted in a gain element 405. It will be appreciated that the gain element 405 may be located before, after or as part of the high pass filter 403. The gain factor K may be used to adjust the relative emphasis of the narrowband frequencies relative to the wideband frequencies. Thus, a simple adjustment of a single gain parameter can be used to adjust and control the quality trade-off between decoded narrowband and wideband quality.
In the example, the shaping filter and the low pass filter have a substantially unity DC gain (e.g. the DC gain is between 0.9 and 1.1). Furthermore, the shaping filter is kept relatively simple to facilitate implementation and design and to reduce computational requirements.
Specifically, a shaping filter based on a simple low order (e.g. second order or lower) pole-zero low-pass filter with unity DC gain has been developed. The low pass filter has a transfer function substantially proportional to B(z)/A(z), where A(z) and B(z) are polynomials of the form c₁ + c₂·z⁻¹ + c₃·z⁻².
In the example, the polynomials are of second order or less but it will be appreciated that in other embodiments other orders may be used.
The high pass filter has a transfer function substantially proportional to: 1 − B(z)/A(z) = (A(z) − B(z))/A(z). The resulting transfer function of the shaping filter 234 is accordingly substantially proportional to: B(z)/A(z) + K·(A(z) − B(z))/A(z) = ((1 − K)·B(z) + K·A(z))/A(z), where A(z) and B(z) are z-polynomials and K is a gain adjustment parameter.
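The identity behind the parallel structure, B(z)/A(z) + K·(A(z) − B(z))/A(z) = ((1 − K)·B(z) + K·A(z))/A(z), can be checked numerically. The polynomial values below are illustrative assumptions chosen only to exercise the algebra:

```python
import numpy as np

# Illustrative second-order polynomials (assumed values, not the codec's design)
A = np.array([1.0, 0.4676, 0.2323])
B = np.array([0.7029, 0.7349, 0.2584])
K = 0.5

def evaluate(poly, z):
    """Evaluate c0 + c1*z^-1 + c2*z^-2 at a complex point z."""
    return sum(c * z ** (-k) for k, c in enumerate(poly))

# Compare the parallel form with the single-fraction form on the unit circle
max_err = 0.0
for w in np.linspace(0.1, 3.0, 8):
    z = np.exp(1j * w)
    parallel = (evaluate(B, z) + K * (evaluate(A, z) - evaluate(B, z))) / evaluate(A, z)
    combined = evaluate((1 - K) * B + K * A, z) / evaluate(A, z)
    max_err = max(max_err, abs(parallel - combined))
# max_err is at machine-precision level: the two forms are the same filter
```

Since the combined form shares the denominator A(z), the parallel low-pass/high-pass structure costs no more than a single pole-zero filter of the same order.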
The developed filter provides a particularly advantageous filter characteristic and frequency transfer function.
Specifically, the filter can provide a step-like frequency response whereby the frequency interval for the narrowband speech can be emphasised relative to the frequency interval only present in the wideband speech. For example, the filter can be arranged to attenuate frequencies above a second interval frequency relative to frequencies below a first interval frequency. By selecting the first and second interval frequencies to correspond to the frequency boundary between narrowband and wideband signals, an efficient trade-off between the decoded wideband and narrowband speech quality can be adjusted by a simple adjustment of the gain factor K. Specifically, for ITU-T defined speech signals, the frequency bandwidth of a narrowband speech signal is limited to below 3.5 kHz whereas a wideband speech signal extends to 7 kHz. Accordingly, the first interval frequency may be selected to be below 3.5 kHz (e.g. suitable values may be 2.5-3.5 kHz) and the second interval frequency may be selected above 3.5 kHz (e.g. suitable values may be 3.5-6 kHz). For K<1 the specific shaping filter has a gain of less than unity for higher frequencies, and close to unity for lower frequencies. Furthermore, the gain at the high frequencies is approximately given by K. Various versions of this filter are shown for different values of K in Figures 5, 6 and 7.
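The claim that the high-frequency gain tends toward K can be seen with a toy low-pass B(z) = (1 + z⁻¹)²/4, which has a null at the Nyquist frequency, and a trivial A(z) = 1. Both polynomials are illustrative assumptions, not the patent's coefficients:

```python
import numpy as np

B = np.array([0.25, 0.5, 0.25])   # toy low-pass with a zero at Nyquist
A = np.array([1.0, 0.0, 0.0])     # trivial denominator, for clarity only
K = 0.3

def gain(num_poly, den_poly, w):
    """Magnitude response of num_poly/den_poly at normalized frequency w."""
    z = np.exp(1j * w)
    num = sum(c * z ** (-k) for k, c in enumerate(num_poly))
    den = sum(c * z ** (-k) for k, c in enumerate(den_poly))
    return abs(num / den)

shaped = (1 - K) * B + K * A       # ((1-K)B + K*A)/A with A = 1

dc_gain = gain(shaped, A, 0.0)          # unity: narrowband errors pass untouched
nyquist_gain = gain(shaped, A, np.pi)   # exactly K: wideband errors deemphasised
```

With a real B(z) the null is not perfect, so the high-band gain is only approximately K, but the step-like behaviour is the same: one parameter sets the relative weight of the wideband error.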
As can be seen, an effective step-like response is achieved with the emphasis between narrowband and wideband response being adjustable simply by modifying the gain value K. Furthermore, a particularly advantageous quality trade-off adjustment has been achieved by developing a filter which allows the low frequency gain (corresponding to the narrowband bandwidth) to remain above a given value for the entire frequency interval up to a first interval frequency (e.g. above -0.5 dB up to a normalized frequency of 0.5 for FIG. 6) while the high frequency gain remains below a given value for the entire frequency interval from a second interval frequency (e.g. below -3.5 dB from a normalized frequency of 0.8 for FIG. 6). Furthermore, the step-like function allows the higher frequency interval to remain above a given value for the entire frequency interval from the second interval frequency (e.g. above -4 dB above a normalized frequency of 0.8 for FIG. 6). The gain difference between the narrowband and wideband frequency intervals (e.g. >3 dB in the example of FIG. 6) can be designed to be larger than the gain variation within the high frequency interval (e.g. <0.5 dB in the example of FIG. 6).
Indeed, the inclusion of a high pass filtering effect provides a step-like function wherein the minimum gain of the filter falls in a transitional region, such that the gain for higher frequencies may be increasing and higher than the gain in the transitional region.
The filter shape may provide particularly advantageous performance as it provides a filter with a relatively flat frequency response within each band while achieving a relatively quick transition between the narrowband and wideband frequency bands. This reduces the impact of the shaping filter within each band while providing an efficient adjustment between the relative weighting/emphasis of narrowband and wideband errors.
Specifically, by examining FIGs. 5-7 it is clear that by simply varying the value of K, the weighting of the low and high bands may be adjusted thereby providing an efficient design trade-off between decoded narrowband and wideband quality.
Furthermore, complexity of the shaping filter is very low since it is a simple second-order pole-zero filter.
Thus a practical and efficient mechanism is provided that allows a change in the perceptual weighting of the narrowband and wideband parts of the audio bandwidth in order to improve the narrowband reproduction at the expense of the wideband reproduction. It also allows the difference to be easily parameterized with a simple second order pole/zero filter. A particularly advantageous shaping filter suitable for e.g. the ITU-T requirements and parameters has a transfer function substantially proportional or equal to: (0.7029 + 0.7349·z⁻¹ + 0.2584·z⁻²)/(1 + 0.4676·z⁻¹ + 0.2323·z⁻²), where the received speech is assumed to be sampled at a sampling rate of 12.8 kHz, which is the internal sampling rate used by the ITU-T G.722.2, 3GPP-2 VMR and 3GPP AMR-Wideband speech codecs.
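The quoted coefficients can be checked directly: the DC gain is substantially unity while the response at the band edge (6.4 kHz Nyquist for 12.8 kHz sampling) is strongly attenuated. A quick numerical check, using only the coefficients stated in the text:

```python
import numpy as np

b = np.array([0.7029, 0.7349, 0.2584])  # numerator coefficients from the text
a = np.array([1.0, 0.4676, 0.2323])     # denominator coefficients from the text

def magnitude(num_poly, den_poly, w):
    """Magnitude of the transfer function at normalized frequency w (rad/sample)."""
    z = np.exp(1j * w)
    num = sum(c * z ** (-k) for k, c in enumerate(num_poly))
    den = sum(c * z ** (-k) for k, c in enumerate(den_poly))
    return abs(num / den)

dc = magnitude(b, a, 0.0)        # close to 1: substantially unity DC gain
nyq = magnitude(b, a, np.pi)     # well below 1: high-band error deemphasised
```

Evaluating the polynomial sums at z = 1 gives 1.6962/1.6999 ≈ 0.998, confirming the stated unity-DC-gain property, while the Nyquist-frequency gain is roughly 0.3 (about -10 dB).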
This filter provides a particularly advantageous trade-off which in listening experiments has been demonstrated to provide exceedingly high narrowband speech quality while maintaining high wideband speech quality.
It will be appreciated that minor variations to this filter may be acceptable while still maintaining high performance.
For example, evaluations have demonstrated that high quality performance can be achieved for filters with a gain transfer function that deviates by 10% or less from the gain of the above, or by a filter where the coefficient values deviate up to 10% from the coefficients of the above transfer function.
It will be appreciated that the gain adjustment of the high pass filter may be constant or that it may e.g. be modified between different speech frames. For example, the CELP coder may continuously evaluate whether a received speech signal is a narrowband signal or a wideband signal and may change the emphasis accordingly.
FIG. 8 illustrates an example of a method of Code Excited Linear Prediction, CELP, speech coding in accordance with some embodiments of the invention.
In step 801 a first speech signal is received.
In step 803 a prediction filter generates a predicted speech signal in response to an excitation signal.
In step 805 an error signal indicative of a difference between the predicted speech signal and the received speech signal is generated.
In step 807 a perceptual weighting filter filters the error signal to generate a perceptually weighted error signal.
In step 809 a shaping filter filters the perceptually weighted error signal to generate a shaped error signal. The shaping filter is arranged to modify a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval.
Steps 803-809 are repeated for a number of excitation signals (e.g. for all stored excitation signals). In step 811 an encode excitation signal is selected in response to the shaped error signal.
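Steps 803-811 together form an analysis-by-synthesis loop; the following compressed sketch shows the control flow. All filter and codebook details are placeholder assumptions (identity filters and a two-entry codebook), not the method's real components:

```python
import numpy as np

def celp_search(speech, codebook, synth, weight, shape):
    """For each candidate excitation: synthesise (803), form the error (805),
    weight it (807), shape it (809), and keep the lowest-energy index (811)."""
    best_index, best_energy = -1, np.inf
    for i, excitation in enumerate(codebook):
        predicted = synth(excitation)            # step 803: prediction filter
        error = speech - predicted               # step 805: error signal
        weighted = weight(error)                 # step 807: perceptual weighting
        shaped = shape(weighted)                 # step 809: shaping filter
        energy = float(np.sum(shaped ** 2))
        if energy < best_energy:                 # step 811: select minimum
            best_index, best_energy = i, energy
    return best_index

# Toy instantiation: identity filters and a trivial "codebook"
frame = np.array([1.0, 0.0, -1.0])
codebook = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, -1.0])]
idx = celp_search(frame, codebook, synth=lambda x: x,
                  weight=lambda e: e, shape=lambda e: e)
```

The index returned here is what step 813 would embed in the encoded signal as the indication of the encoded excitation signal.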
In step 813 an encoded signal is generated for the speech signal. The encoded signal comprises an indication of the encoded excitation signal.
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors.
However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order.

Claims (18)

1. A Code Excited Linear Prediction, CELP, speech coder comprising: means for receiving a first speech signal; a prediction filter for generating a predicted speech signal in response to an excitation signal; means for generating an error signal indicative of a difference between the predicted speech signal and the received speech signal; a perceptual weighting filter for filtering the error signal to generate a perceptually weighted error signal; a shaping filter for filtering the perceptually weighted error signal to generate a shaped error signal, the shaping filter being arranged to modify a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval; means for selecting an encode excitation signal in response to the shaped error signal; and means for generating an encoded signal for the first speech signal, the encoded signal comprising an indication of the encoded excitation signal.
2. The CELP speech coder of claim 1 further comprising means for determining a characteristic of the prediction filter in response to the received signal and wherein the shaping filter is independent of the characteristic.
3. The CELP speech coder of claim 1 wherein a gain of the shaping filter is higher in the lower frequency interval than in the higher frequency interval.
CMLO4735EV
4. The CELP speech coder of claim I wherein the lower frequency interval comprises frequencies below a first interval frequency and the higher frequency interval comprises frequencies above a second interval frequency and a gain of the shaping filter is above a first threshold for the lower frequency interval and below a second threshold for the higher frequency interval; the first threshold being higher than the second threshold.
5. The CELP speech coder of claim 4 wherein the gain of the shaping filter is above a third threshold in the higher frequency interval and a difference between the second and third threshold is less than a difference between the first and second threshold.
6. The CELP speech coder of claim 4 wherein the first interval frequency is below 3.5 kHz and the second interval frequency is above 3.5 kHz.
7. The CELP speech coder of claim 1 wherein the shaping filter corresponds to a low pass filter in parallel with a gain adjusted high pass filter.
8. The CELP speech coder of claim 7 wherein the low pass filter has a transfer function substantially proportional to: B(z)/A(z) and the high pass filter has a transfer function substantially proportional to: 1 − B(z)/A(z) = (A(z) − B(z))/A(z), where A(z) and B(z) are z-polynomials.
9. The CELP speech coder of claim 8 wherein A(z) and B(z) are maximum second order polynomials.
10. The CELP speech coder of claim 7 wherein a gain adjustment of the high pass filter is modified between different speech frames.
11. The CELP speech coder of claim 1 wherein the shaping filter has a transfer function substantially proportional to ((1 − K)·B(z) + K·A(z))/A(z), where A(z) and B(z) are z-polynomials and K is a high frequency gain adjustment parameter.
12. The CELP speech coder of claim 1 wherein the shaping filter has a transfer function substantially proportional to (0.7029 + 0.7349·z⁻¹ + 0.2584·z⁻²)/(1 + 0.4676·z⁻¹ + 0.2323·z⁻²).
13. The CELP speech coder of claim 1 wherein the shaping filter has a gain transfer function deviating by less than 10% from a gain transfer function of a transfer function proportional to (0.7029 + 0.7349·z⁻¹ + 0.2584·z⁻²)/(1 + 0.4676·z⁻¹ + 0.2323·z⁻²).
14. The CELP speech coder of claim 1 wherein the shaping filter has a transfer function with coefficient values deviating less than 10% from a transfer function proportional to (0.7029 + 0.7349·z⁻¹ + 0.2584·z⁻²)/(1 + 0.4676·z⁻¹ + 0.2323·z⁻²).
15. The CELP speech coder of claim 1 wherein the encoder is arranged to generate an encoded signal comprising wideband speech data and narrowband speech data and the lower frequency interval corresponds to a bandwidth of the narrowband speech data.
16. The CELP speech coder of claim 15 wherein a combination of the lower frequency interval and the higher frequency interval corresponds to a bandwidth of the wideband speech data.
17. A communication unit comprising a Code Excited Linear Prediction, CELP, speech coder, the speech coder comprising: means for receiving a first speech signal; a prediction filter for generating a predicted speech signal in response to an excitation signal; means for generating an error signal indicative of a difference between the predicted speech signal and the received speech signal; a perceptual weighting filter for filtering the error signal to generate a perceptually weighted error signal; a shaping filter for filtering the perceptually weighted error signal to generate a shaped error signal, the shaping filter being arranged to modify a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval; means for selecting an encode excitation signal in response to the shaped error signal; and means for generating an encoded signal for the first speech signal, the encoded signal comprising an indication of the encoded excitation signal.
18. A method of Code Excited Linear Prediction, CELP, speech coding, the method comprising: receiving a first speech signal; a prediction filter generating a predicted speech signal in response to an excitation signal; generating an error signal indicative of a difference between the predicted speech signal and the received speech signal; a perceptual weighting filter filtering the error signal to generate a perceptually weighted error signal; a shaping filter filtering the perceptually weighted error signal to generate a shaped error signal, the shaping filter modifying a signal level of the perceptually weighted error signal in a lower frequency interval relative to a higher frequency interval; selecting an encode excitation signal in response to the shaped error signal; and generating an encoded signal for the first speech signal, the encoded signal comprising an indication of the encoded excitation signal.
GB0624860A 2006-12-13 2006-12-13 Code excited linear prediction speech coding Active GB2444757B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0624860A GB2444757B (en) 2006-12-13 2006-12-13 Code excited linear prediction speech coding
PCT/US2007/083608 WO2008076534A2 (en) 2006-12-13 2007-11-05 Code excited linear prediction speech coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0624860A GB2444757B (en) 2006-12-13 2006-12-13 Code excited linear prediction speech coding

Publications (3)

Publication Number Publication Date
GB0624860D0 GB0624860D0 (en) 2007-01-24
GB2444757A true GB2444757A (en) 2008-06-18
GB2444757B GB2444757B (en) 2009-04-22

Family

ID=37712051

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0624860A Active GB2444757B (en) 2006-12-13 2006-12-13 Code excited linear prediction speech coding

Country Status (2)

Country Link
GB (1) GB2444757B (en)
WO (1) WO2008076534A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103295578B (en) 2012-03-01 2016-05-18 华为技术有限公司 A kind of voice frequency signal processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002058052A1 (en) * 2001-01-19 2002-07-25 Koninklijke Philips Electronics N.V. Wideband signal transmission system
US20040111257A1 (en) * 2002-12-09 2004-06-10 Sung Jong Mo Transcoding apparatus and method between CELP-based codecs using bandwidth extension
US20040153313A1 (en) * 2001-05-11 2004-08-05 Roland Aubauer Method for enlarging the band width of a narrow-band filtered voice signal, especially a voice signal emitted by a telecommunication appliance
WO2004084182A1 (en) * 2003-03-15 2004-09-30 Mindspeed Technologies, Inc. Decomposition of voiced speech for celp speech coding
US20060282263A1 (en) * 2005-04-01 2006-12-14 Vos Koen B Systems, methods, and apparatus for highband time warping

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2252170A1 (en) * 1998-10-27 2000-04-27 Bruno Bessette A method and device for high quality coding of wideband speech and audio signals


Also Published As

Publication number Publication date
WO2008076534A2 (en) 2008-06-26
GB2444757B (en) 2009-04-22
WO2008076534A3 (en) 2008-11-27
GB0624860D0 (en) 2007-01-24

Similar Documents

Publication Publication Date Title
US10446162B2 (en) System, method, and non-transitory computer readable medium storing a program utilizing a postfilter for filtering a prefiltered audio signal in a decoder
RU2763374C2 (en) Method and system using the difference of long-term correlations between the left and right channels for downmixing in the time domain of a stereophonic audio signal into a primary channel and a secondary channel
RU2262748C2 (en) Multi-mode encoding device
KR101344174B1 (en) Audio codec post-filter
JP4335917B2 (en) Fidelity optimized variable frame length coding
JP3653826B2 (en) Speech decoding method and apparatus
US7020605B2 (en) Speech coding system with time-domain noise attenuation
JP4213243B2 (en) Speech encoding method and apparatus for implementing the method
JP5413839B2 (en) Encoding device and decoding device
JP4176349B2 (en) Multi-mode speech encoder
JP2009069856A (en) Method for estimating artificial high band signal in speech codec
JPH08278799A (en) Noise load filtering method
US7813922B2 (en) Audio quantization
WO1994025959A1 (en) Use of an auditory model to improve quality or lower the bit rate of speech synthesis systems
RU2707144C2 (en) Audio encoder and audio signal encoding method
WO1997031367A1 (en) Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
GB2444757A (en) Code excited linear prediction speech coding and efficient tradeoff between wideband and narrowband speech quality
JPH05158495A (en) Voice encoding transmitter
JP5451603B2 (en) Digital audio signal encoding
Bhatt Implementation and overall performance evaluation of CELP based GSM AMR NB coder over ABE
KR20070030035A (en) Apparatus and method for transmitting audio signal
GB2391440A (en) Speech communication unit and method for error mitigation of speech frames
JP2016105168A (en) Method of concealing packet loss in adpcm codec and adpcm decoder with plc circuit
JPH08160996A (en) Voice encoding device
GB2436192A (en) A speech encoded signal and a long term predictor (ltp) logic comprising ltp memory and which quantises a memory state of the ltp logic.

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20110127 AND 20110202

732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20170831 AND 20170906