WO2002023533A2 - System for improved use of pitch enhancement with subcodebooks - Google Patents


Info

Publication number
WO2002023533A2
Authority
WO
WIPO (PCT)
Prior art keywords
speech
rate
pitch
fixed codebook
codebook
Application number
PCT/IB2001/001735
Other languages
French (fr)
Other versions
WO2002023533A3 (en)
Inventor
Yang Gao
Original Assignee
Conexant Systems, Inc.
Application filed by Conexant Systems, Inc.
Priority to AU2001287973A (published as AU2001287973A1)
Publication of WO2002023533A2
Publication of WO2002023533A3

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using predictive techniques
    • G10L19/08 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude for improving intelligibility
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 - Codebooks
    • G10L2019/0004 - Design or structure of the codebook
    • G10L2019/0005 - Multi-stage vector quantisation

Definitions

  • This invention relates to speech communication systems and, more particularly, to systems and methods for digital speech coding.
  • Communication systems include both wireline and wireless radio systems.
  • Wireless communication systems electrically connect with the landline systems and communicate using radio frequency (RF) with mobile communication devices.
  • the radio frequencies available for communication in cellular systems are in the frequency range centered around 900 MHz and in the personal communication services (PCS) frequency range centered around 1900 MHz. Due to increased traffic caused by the expanding popularity of wireless communication devices, such as cellular telephones, it is desirable to reduce the bandwidth of transmissions within the wireless systems.
  • Digital transmission in wireless radio communications is increasingly being applied to both voice and data due to noise immunity, reliability, compactness of equipment and the ability to implement sophisticated signal processing functions using digital techniques.
  • Digital transmission of speech signals involves the steps of: sampling an analog speech waveform with an analog-to-digital converter, speech compression (encoding), transmission, speech decompression (decoding), digital-to-analog conversion, and playback into an earpiece or a loudspeaker.
  • the sampling of the analog speech waveform with the analog-to-digital converter creates a digital signal.
  • the number of bits used in the digital signal to represent the analog speech waveform creates a relatively large bandwidth.
  • a speech signal that is sampled at a rate of 8000 Hz (once every 0.125 ms), where each sample is represented by 16 bits, will result in a bit rate of 128,000 (16x8000) bits per second, or 128 kbps (kilobits per second).
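A minimal sketch of this arithmetic (Python):

```python
# Uncompressed bit rate of PCM speech: sample rate (Hz) x bits per sample.
sample_rate_hz = 8000   # one sample every 0.125 ms
bits_per_sample = 16

bit_rate_bps = sample_rate_hz * bits_per_sample
print(bit_rate_bps)     # 128000 bits per second, i.e. 128 kbps
```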
  • Speech compression reduces the number of bits that represent the speech signal, thus reducing the bandwidth needed for transmission.
  • speech compression may result in degradation of the quality of decompressed speech.
  • a higher bit rate will result in higher quality, while a lower bit rate will result in lower quality.
  • speech compression techniques such as coding techniques, can produce decompressed speech of relatively high quality at relatively low bit rates.
  • coding techniques attempt to represent the perceptually important features of the speech signal, with or without preserving the actual speech waveform.
  • One coding technique used to lower the bit rate involves varying the degree of speech compression (i.e., varying the bit rate) depending on the part of the speech signal being compressed.
  • parts of the speech signal for which adequate perceptual representation is more difficult or more important are coded and transmitted using a higher number of bits
  • parts of the speech signal for which adequate perceptual representation is less difficult or less important are coded with a lower number of bits.
  • the resulting average bit rate for the speech signal may be relatively lower than would be the case for a fixed bit rate that provides decompressed speech of similar quality.
  • a technique uses a pitch enhancement to improve the use of the fixed codebooks in cases where the fixed codebook comprises a plurality of subcodebooks.
  • Code-excited linear prediction (CELP) coding utilizes several predictions to capture redundancy in voiced speech while minimizing data to encode the speech.
  • a first short-term prediction results in an LPC residual
  • a second long-term prediction results in a pitch residual.
  • the pitch residual may be coded using a fixed codebook that includes a plurality of fixed subcodebooks.
  • the disclosed embodiments describe a system for pitch enhancements to improve the use of communication systems employing a plurality of fixed subcodebooks.
  • a pitch enhancement is used in a predictable manner to add pulses to the output from the fixed subcodebooks but without requiring any additional bits to encode this additional information.
  • the pitch lag is calculated in an adaptive codebook portion of the speech encoder/decoder. These additional pulses result in encoded speech that more closely approximates the voiced speech.
  • an adaptive pitch gain and a modifying factor are used to enhance the pulses from the fixed subcodebooks differently for different subcodebooks. These techniques are used in such a manner that no extra bits of data are added to the bitstream that constitutes the output of an encoder or the input to a decoder.
  • the speech coder is capable of selectively activating a series of encoders and decoders of different bitstream rates to maximize the overall quality of a reconstructed speech signal while maintaining the desired average bit rate.
  • FIG. 1 is a graph representing time-domain speech patterns.
  • FIG. 2 is a block diagram of a speech-coding system according to the invention.
  • FIG. 3 is another block diagram of a speech coding system.
  • FIG. 4 is an expanded block diagram of a speech encoding system.
  • FIG. 5 is a block diagram of fixed codebooks.
  • FIG. 6 is an expanded block diagram of the encoding system of FIG. 4.
  • FIG. 7 is a flow chart for searching a fixed codebook.
  • FIG. 8 is a flow chart for searching a fixed codebook.
  • FIG. 9 is a schematic diagram illustrating pitch enhancements.
  • FIG. 10 is a schematic diagram illustrating pitch enhancements.
  • FIG. 11 is a schematic diagram illustrating pitch enhancements.
  • FIG. 12 is a schematic diagram illustrating pitch enhancements.
  • FIG. 13 is a schematic diagram illustrating pitch enhancements.
  • FIG. 14 is a schematic diagram illustrating pitch enhancements.
  • FIG. 15 is a schematic diagram illustrating pitch enhancements.
  • FIG. 16 is a schematic diagram illustrating pitch enhancements.
  • FIG. 17 is another expanded block diagram of the encoding system of FIG. 4.
  • FIG. 18 is an expanded block diagram of the decoding system of FIG. 3.
  • Fig. 1 depicts the waveforms in CELP speech coding.
  • An input speech signal 2 has some measure of predictability or periodicity 4. At least a pitch gain, a pitch lag and a fixed codebook index are calculated from the speech signal 2.
  • the code-excited linear prediction (CELP) coding approach uses two types of predictors, a short-term predictor and a long-term predictor.
  • the short-term predictor is typically applied before the long-term predictor.
  • the short-term predictor is also referred to as linear prediction coding (LPC) or spectral envelope representation, and typically may comprise ten prediction parameters.
  • a first prediction error may be derived from the short-term predictor and is called a short-term or LPC residual 6.
  • the short-term LPC parameters, fixed-codebook indices and gain, as well as an adaptive codebook lag and its gain for the long-term predictor are quantized.
  • the quantization indices, as well as the fixed codebook indices, are sent from the encoder to the decoder.
  • the quality of the speech may be enhanced through a system that uses a plurality of fixed subcodebooks, rather than merely a single fixed subcodebook.
  • Each lag parameter also may be called a pitch lag
  • each long-term predictor gain parameter also may be called an adaptive codebook gain.
  • the lag parameter defines an entry or a vector in the adaptive codebook.
  • the long-term predictor parameters and the fixed codebook entries that best represent the prediction error of the long-term residual are determined.
  • a second prediction error may be derived from the long-term predictor and is called a long-term or pitch residual 8.
  • the long-term residual may be coded using a fixed codebook that includes a plurality of fixed codebook entries or vectors. During coding, one of the entries is multiplied by a fixed codebook gain to represent the long-term residual.
  • Analysis-by-synthesis (ABS), that is, feedback, is employed in CELP coding. In the ABS approach, synthesizing with an inverse prediction filter and applying a perceptual weighting measure determine the best contribution from the fixed codebook and the best long-term predictor parameters.
  • the CELP decoder uses the fixed codebook indices to extract a vector from the fixed codebook or subcodebooks.
  • the vector is multiplied by the fixed-codebook gain to create a fixed codebook contribution.
  • a long-term predictor contribution is added to the fixed codebook contribution to create a synthesized excitation that is referred to as an excitation.
  • the long-term predictor contribution comprises the excitation from the past multiplied by the long-term predictor gain.
  • the long-term predictor contribution alternatively comprises an adaptive codebook contribution or a long-term pitch-filtering characteristic.
  • the synthesized excitation is passed through a short-term synthesis filter, which uses the short-term LPC prediction coefficients quantized by the encoder to generate synthesized speech.
  • the synthesized speech may be passed through a post-filter that reduces the perceptual coding noise.
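As a minimal sketch of this decoder flow (Python with NumPy/SciPy; the function and parameter names are illustrative, and a real CELP decoder adds fractional pitch lags, per-subframe state, and the post-filter):

```python
import numpy as np
from scipy.signal import lfilter

def celp_decode_subframe(fixed_vec, g_c, past_excitation, pitch_lag, g_a, lpc_a):
    """Synthesize one subframe of speech from decoded CELP parameters.

    fixed_vec:       codevector extracted from the fixed codebook or subcodebooks
    g_c, g_a:        fixed codebook gain and long-term predictor (pitch) gain
    past_excitation: previously synthesized excitation (the adaptive codebook)
    pitch_lag:       integer lag selecting the adaptive codebook contribution
    lpc_a:           quantized short-term LPC coefficients [1, a1, ..., a10]
    """
    n = len(fixed_vec)
    start = len(past_excitation) - pitch_lag
    if pitch_lag >= n:
        adaptive = past_excitation[start:start + n]
    else:
        # For lags shorter than the subframe, repeat the last pitch cycle.
        adaptive = np.resize(past_excitation[start:], n)
    # Synthesized excitation: long-term predictor contribution plus
    # fixed codebook contribution.
    excitation = g_a * adaptive + g_c * fixed_vec
    # Short-term synthesis filter 1/A_q(z) turns the excitation into speech.
    speech = lfilter([1.0], lpc_a, excitation)
    return speech, excitation
```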
  • Other codecs and associated coding algorithms may be used, such as a selectable mode vocoder (SMV) system, extended code-excited linear prediction (eX-CELP), and algebraic CELP (ACELP).
  • Fig. 2 is a block diagram of a speech coding system 100 according to one embodiment that uses CELP coding.
  • the speech coding system 100 includes a first communication device 105 operatively connected via a communication medium 110 to a second communication device 115.
  • the speech coding system 100 may be any cellular telephone, radio frequency, or other communication system capable of encoding a speech signal 145 and decoding the encoded signal to create synthesized speech 150.
  • the communications devices 105 and 115 may be cellular telephones, portable radio transceivers, and the like.
  • the communications medium 110 may include systems using any transmission mechanism, including radio waves, infrared, landlines, fiber optics, any other medium capable of transmitting digital signals (wires or cables), or any combination thereof.
  • the communications medium 110 may also include a storage mechanism including a memory device, a storage medium, or other device capable of storing and retrieving digital signals. In use, the communications medium 110 transmits a bitstream of digital data between the first and second communications devices 105 and 115.
  • the first communication device 105 includes an analog-to-digital converter 120, a preprocessor 125, and an encoder 130 connected as shown.
  • the first communication device 105 may have an antenna or other communication medium interface (not shown) for sending and receiving digital signals with the communication medium 110.
  • the first communication device 105 may also have other components known in the art for any communication device, such as a decoder or a digital-to-analog converter.
  • the second communication device 115 includes a decoder 135 and digital-to-analog converter 140 connected as shown. Although not shown, the second communication device 115 may have one or more of a synthesis filter, a postprocessor, and other components. The second communication device 115 also may have an antenna or other communication medium interface (not shown) for sending and receiving digital signals with the communication medium.
  • the preprocessor 125, encoder 130, and decoder 135 comprise processors, digital signal processors (DSP), application specific integrated circuits, or other digital devices for implementing the coding and algorithms discussed herein.
  • the preprocessor 125 and encoder 130 may comprise separate components or the same component
  • the analog-to-digital converter 120 receives a speech signal 145 from a microphone (not shown) or other signal input device.
  • the speech signal may be voiced speech, music, or another analog signal.
  • the analog-to-digital converter 120 digitizes the speech signal, providing the digitized speech signal to the preprocessor 125.
  • the preprocessor 125 passes the digitized signal through a high-pass filter (not shown) preferably with a cutoff frequency of about 60-80 Hz.
  • the preprocessor 125 may perform other processes to improve the digitized signal for encoding, such as noise suppression.
  • the encoder 130 codes the speech using a pitch lag, a pitch gain, a fixed codebook, a fixed codebook gain, LPC parameters and other parameters.
  • the code is transmitted in the communication medium 110.
  • the decoder 135 receives the bitstream from the communication medium 110.
  • the decoder operates to decode the bitstream and generate a synthesized speech signal 150 in the form of a digitized signal.
  • the synthesized speech signal 150 is then converted to an analog signal by the digital-to-analog converter 140.
  • the encoder 130 and the decoder 135 use a speech compression system, commonly called a codec, to reduce the bit rate of the noise-suppressed digitized speech signal.
  • the CELP coding approach is frame-based. Samples of input speech signals (e.g., preprocessed, digitized speech signals) are stored in blocks of samples called frames. To minimize bandwidth use, each frame may be characterized. The frames are processed to create a compressed speech signal in digitized form. The frame characterization is based on the portion of the speech signal 145 contained in the particular frame. For example, frames may be characterized as stationary voiced speech, non-stationary voiced speech, unvoiced speech, onset, background noise, and silence. As will be seen, these classifications may be used to help determine the resources used to encode and decode each particular frame.
  • FIG. 3 shows an embodiment of a speech coding system 10 that may utilize adaptive and fixed codebooks, and in particular, may utilize fixed codebooks that comprise a plurality of fixed subcodebooks for encoding at different rates as a function of the characterization.
  • the encoding system 12 receives a speech signal 18 from a signal input device such as a microphone (not shown).
  • the speech coding system 10 includes four codecs, a full-rate codec 22, a half-rate codec 24, a quarter-rate codec 26 and an eighth-rate codec 28. There may be more or fewer codecs. Each codec has an encoder portion and a decoder portion located within the encoding and decoding systems 12 and 16 respectively.
  • Each codec 22, 24, 26, and 28 may process a portion of the bitstream between the encoding system 12 and the decoding system 16. Desirably, the decoded speech is also post-processed by modules shown in later figures. The post-processed speech may be received by a human ear, a recording device, or another device capable of receiving or using such a signal. Each codec generates a bitstream of a different bandwidth. In one embodiment, the full-rate codec generates about 170 bits per frame, the half-rate codec about 80 bits, the quarter-rate codec about 40 bits, and the eighth-rate codec about 16 bits.
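For a rough sense of the resulting bandwidths, the sketch below converts these approximate per-frame bit counts into bit rates. The 20 ms frame duration is an assumption (consistent with the 160-sample frames at an 8 kHz sampling rate described later in this document), not a figure stated here.

```python
# Hypothetical conversion of per-frame bit counts to bit rates,
# assuming 20 ms frames (160 samples at 8 kHz).
FRAME_SECONDS = 0.020

bits_per_frame = {"full": 170, "half": 80, "quarter": 40, "eighth": 16}

for name, bits in bits_per_frame.items():
    print(f"{name}-rate codec: ~{bits / FRAME_SECONDS / 1000:.1f} kbps")
# full ~8.5 kbps, half ~4.0 kbps, quarter ~2.0 kbps, eighth ~0.8 kbps
```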
  • the speech processing circuitry is constantly changing the codec used to code and decode speech.
  • a mode- line 21 carries a mode-input signal from a communications system.
  • the mode-input signal controls the average rate of the encoding system 12, dictating which of a plurality of codecs is used within the encoding system 12.
  • the full- and half-rate codecs use an eX-CELP (extended CELP) algorithm.
  • the eX-CELP algorithm categorizes frames into different categories using a rate selection and a type classification.
  • the quarter- and eighth-rate codecs are based on a perceptual matching algorithm. Different encoding approaches may be used for different categories of frames with different perceptual matching, different waveform matching, and different bit assignments. In this embodiment, the perceptual matching algorithms of the quarter-rate and eighth-rate codecs do not use waveform matching.
  • the frames may be divided into a plurality of subframes. The subframes may be different in size and number for each codec.
  • the subframes may be different in size for each classification.
  • the CELP approach is used in eX-CELP to choose the adaptive codebook, the fixed codebook, and other parameters used to code the speech.
  • the ABS scheme uses inverse prediction filters and perceptual weighting measures for selecting the codebook entries.
  • Fig. 4 is an expanded block diagram of the encoding system 12 shown in Fig. 3.
  • One embodiment of the encoding system 12 includes a preprocessing module 34, a full- rate encoder 36, a half-rate encoder 38, a quarter-rate encoder 40, and an eighth-rate encoder 42, connected as illustrated.
  • the pre-processing module 34 may be used to process speech on a frame basis to provide filtering, signal enhancement, noise enhancement, and amplification to optimize the signal for subsequent processing.
  • the rate encoders include an initial frame-processing module 44 and an excitation-processing module 54.
  • the initial frame-processing module 44 is divided into a plurality of initial frame processing modules, namely, modules for the full-rate 46, half-rate 48, quarter-rate 50, and an initial eighth-rate frame processing module 52.
  • the full, half, quarter and eighth-rate encoders 36, 38, 40, and 42 comprise the encoding portion of the respective codecs 22, 24, 26, and 28.
  • the initial frame- processing module 44 performs initial frame processing, extracts speech parameters, and determines which rate encoder will encode a particular frame. Module 44 determines a rate selection that activates one of the encoders 36, 38, 40, or 42. The rate selection may be based on the categorization of the frame of the speech signal 18 and the mode of the speech compression system. Activation of one of the rate encoders 36, 38, 40, or 42, correspondingly activates one of the initial frame-processing modules 46, 48, 50, or 52.
  • the initial frame-processing module 44 determines a type classification for each frame that is processed by the full and half rate encoders 36 and 38.
  • the speech signal 18 as represented by one frame is classified as "type 0" or "type 1," depending on the nature and characteristics of the speech signal 18.
  • additional classifications and supporting processing are provided.
  • Type 1 classification includes frames of the speech signal 18 having harmonic and formant structures that do not change rapidly.
  • Type 0 classification includes all other frames.
  • the type classification optimizes encoding by the initial full-rate frame-processing module 46 and the initial half-rate frame-processing module 48.
  • the classification type and rate selection are used to optimize the encoding by the excitation-processing module 54 for the full and half-rate encoders 36 and 38.
  • the excitation-processing module 54 is sub-divided into a full-rate module 56, a half-rate module 58, a quarter-rate module 60, and an eighth-rate module 62.
  • the rate modules 56, 58, 60, and 62 correspond to the rate encoders 36, 38, 40, and 42.
  • the full and half rate modules 56 and 58 in one embodiment both include a plurality of frame processing modules and a plurality of subframe processing modules, but provide substantially different encoding.
  • the term "F" indicates full-rate processing
  • "H" indicates half-rate processing
  • "0" and "1" indicate type 0 and type 1, respectively.
  • the initial frame-processing module 44 includes modules for full-rate frame processing 46 and half-rate frame processing 48. These modules may calculate an open loop pitch 144a for a full-rate frame, or an open loop pitch 176a for a half-rate frame. These components may be used later.
  • the full rate module 56 includes an F type selector module 68 and an F0 subframe-processing module 70.
  • Module 56 also includes modules for F1 processing, including an F1 first frame processing module 72, an F1 subframe processing module 74, and an F1 second frame-processing module 76.
  • the half rate module 58 includes an H type selector module 78, an H0 sub-frame processing module 80, an H1 first frame processing module 82, an H1 sub-frame processing module 84, and an H1 second frame-processing module 86.
  • the selector modules 68 and 78 direct the processing of the speech signals 18 to further optimize the encoding process based on the type classification.
  • selector module 68 directs the speech signal to either the F0 or F1 processing to encode the speech and generate the bitstream.
  • Type 0 classification for a frame activates the processing module to process the frame on a subframe basis.
  • Type 1 processing proceeds on both a frame and subframe basis.
  • a fixed codebook component 146a and a closed loop adaptive codebook component 144b are generated and are used to generate fixed and adaptive codebook gains 148a and 150a.
  • an adaptive gain 148b is derived from the first frame-processing module 72, and a fixed codebook 146b is selected and used to encode the speech with the subframe-processing module 74.
  • a fixed codebook gain 150b is derived from the second frame-processing module 76.
  • Type signal 142 designates the type as either F0 or F1 in the bitstream.
  • selector module 78 directs the frame to either H0 (type 0) or H1 (type 1) processing.
  • in type 0 processing, the H0 subframe processing module 80 generates a fixed codebook component 178a and a closed loop adaptive codebook component 176b, which are used to generate fixed and adaptive codebook gains 180a and 182a.
  • in type 1 processing, an H1 first frame processing module 82, an H1 subframe processing module 84 and an H1 second frame processing module 86 are used.
  • An adaptive gain 180b, a fixed codebook component 178b, and a fixed codebook gain are calculated.
  • Type signal 174 designates the type as either H0 or H1 in the bitstream.
  • adaptive codebooks are then used to code the signal in the full rate and half rate codecs.
  • An adaptive codebook search and selection for the full rate codec uses components 144a and 144b. These components are used to search, test, select and designate the location of a pitch lag from an adaptive codebook.
  • half-rate components 176a and 176b search, test, select and designate the location of the best pitch lag for the half-rate codec. These pitch lags are subsequently used to improve the quality of the encoded and decoded speech through fixed codebooks employing a plurality of fixed subcodebooks.
  • Fig. 5 is a block diagram depicting the structure of fixed codebooks and subcodebooks in one embodiment.
  • the fixed codebook 160 for the F0 codec comprises three different 5-pulse subcodebooks.
  • the fixed codebook for the F1 codec is a single 8-pulse subcodebook 162.
  • for H0, the fixed codebook 178 comprises three subcodebooks: a 2-pulse subcodebook 192, a 3-pulse subcodebook 194, and a third subcodebook 196 containing gaussian noise.
  • for H1, the fixed codebook comprises a 2-pulse subcodebook 193, a 3-pulse subcodebook 195, and a 5-pulse subcodebook 197.
  • Fig. 6 shows the F0 and H0 subframe processing modules 70 and 80, including an adaptive codebook section 362, a fixed codebook section 364, and a gain quantization section 366.
  • the adaptive codebook section 368 receives a pitch track 348 to calculate an area in the adaptive codebook to search for an adaptive codebook vector (v_a) 382 (a pitch lag).
  • the adaptive codebook section 368 also performs a search to determine and store the best lag vector v_a for each subframe.
  • an adaptive gain, g_a 384, is also determined for each subframe.
  • the gain quantization section 366 may include a 2D VQ gain codebook 412, a first multiplier 414, a second multiplier 416, an adder 418, a synthesis filter 420, a perceptual weighting filter 422, a subtractor 424 and a minimization module 426.
  • the gain quantization section 366 makes use of the second resynthesized speech 406 generated in the fixed codebook section, and also generates a third resynthesized speech 438.
  • a fixed codebook vector (v_c) 402 representing the long-term residual for a subframe is selected from the fixed codebook 390.
  • the multiplier 392 multiplies the fixed codebook vector (v_c) 402 by a gain (g_c) 404.
  • the gain (g_c) 404 is unquantized and is a representation of the initial value of the fixed codebook gain.
  • the resulting signal is provided to the synthesis filter 394.
  • the synthesis filter 394 receives the quantized LPC coefficients A_q(z) 342 and, together with the perceptual weighting filter 396, creates a resynthesized speech signal 406.
  • the subtractor 398 subtracts the resynthesized speech signal 406 from the long-term error signal 388 to generate a fixed codebook error signal 408, which is evaluated as a weighted mean square error (WMSE).
  • the minimization module 400 receives the fixed codebook error signal 408.
  • the minimization module 400 uses the fixed codebook error signal 408 to control the selection of vectors for the fixed codebook vector (v_c) 402 from the fixed codebook 390 in order to reduce the error.
  • the minimization module 400 also receives the control information 356 that may include a final characterization for each frame.
  • the final characterization class contained in the control information 356 controls how the minimization module 400 selects vectors for the fixed codebook vector (v_c) 402 from the fixed codebook 390. The process repeats until the search by the second minimization module 400 has selected the best vector for the fixed codebook vector (v_c) 402 from the fixed codebook 390 for each subframe. The best vector for the fixed codebook vector (v_c) 402 minimizes the error in the second resynthesized speech signal 406.
  • the indices identify the best vector for the fixed codebook vector (v_c) 402 and, as previously discussed, may be used to form the fixed codebook components 146a and 178a.

Weighting Factors in Selecting a Fixed Subcodebook and a Codevector
  • Low-bit-rate coding relies on the important concept of perceptual weighting in making speech coding decisions.
  • This special weighting factor is generated by employing certain features of speech, and applied as a criterion value in favoring a specific subcodebook in a codebook featuring a plurality of subcodebooks.
  • One subcodebook may be preferred over the other subcodebooks for some specific speech signal, such as noise-like unvoiced speech.
  • the features used to estimate the weighting factor include, but are not limited to, the noise-to-signal ratio (NSR), sharpness of the speech, the pitch lag, the pitch correlation, as well as other features.
  • the classification system for each frame of speech is also important in defining the features of the speech.
  • the NSR is a traditional distortion criterion that may be calculated as the ratio between an estimate of the background noise energy and the frame energy of a frame.
  • One embodiment of the NSR calculation ensures that only true background noise is included in the ratio by using a modified voice activity decision.
  • previously calculated parameters representing, for example, the spectrum expressed by the reflection coefficients, the pitch correlation R_p, the NSR, the energy of the frame, the energy of the previous frames, the residual sharpness and the sharpness may also be used.
  • Sharpness is defined as the ratio of the average of the absolute values of the samples to the maximum of the absolute values of the samples of speech. It is typically applied to the amplitude of the signals.
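As an illustrative sketch (not the patent's exact procedure), the sharpness and NSR features described above can be computed per frame as follows; the background-noise energy estimate is assumed to come from a separate, voice-activity-controlled tracker:

```python
import numpy as np

def frame_features(frame: np.ndarray, noise_energy_estimate: float):
    """Two of the weighting-factor features described above.

    sharpness: mean absolute sample value over the maximum absolute sample
               value (near 1 for noise-like frames, small for pulse-like frames).
    nsr:       background-noise energy estimate over the frame energy.
    """
    abs_frame = np.abs(frame)
    sharpness = float(abs_frame.mean()) / max(float(abs_frame.max()), 1e-12)
    frame_energy = float(np.dot(frame, frame))
    nsr = noise_energy_estimate / max(frame_energy, 1e-12)
    return sharpness, nsr
```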
  • One embodiment of the target signal for time warping is a synthesis of the current segment derived from the modified weighted speech, represented by s'_w(n), and the pitch track 348, represented by L_p(n). According to the pitch track 348, the target may be given by

    s_t(n) = Σ_i w_s(f(L_p(n)), i) · s'_w(n - I(L_p(n)) + i),  n = 0, 1, ..., N_s - 1,  (Equation 1)

  • where I(L_p(n)) and f(L_p(n)) are the integer and fractional parts of the pitch lag, respectively;
  • w_s(f, i) is the Hamming weighted Sinc window, and N_s is the length of the segment.
  • the weighting function w_e(n) may be a two-piece linear function, which emphasizes the pitch complex and de-emphasizes the "noise" in between pitch complexes.
  • the weighting may be adapted according to a classification, by increasing the emphasis on the pitch complex for segments of higher periodicity.
  • the modified weighted speech for the segment may be reconstructed according to the mappings

    [s_w(n + τ_acc), s_w(n + τ_acc + τ_c + τ_opt)] → [s'_w(n), s'_w(n + τ_c - 1)],  (Equation 2)

    [s_w(n + τ_acc + τ_c + τ_opt), s_w(n + τ_acc + τ_opt + N_s - 1)] → [s'_w(n + τ_c), s'_w(n + N_s - 1)],  (Equation 3)

  • where τ_c is a parameter defining the warping function. In general, τ_c specifies the beginning of the pitch complex.
  • the mapping given by Equation 2 specifies a time warping, and the mapping given by Equation 3 specifies a time shift (no warping).
  • Both may be carried out using a Hamming weighted Sinc window function.
  • the pitch gain and pitch correlation may be estimated on a pitch cycle basis and are defined by Equations 4 and 5, respectively.
  • the pitch gain is estimated in order to minimize the mean squared error between the target s_t(n), defined by Equation 1, and the final modified signal s'_w(n), defined by Equations 2 and 3, and may be given by

    g_p = ( Σ_{n=0..N_s-1} s_t(n) · s'_w(n) ) / ( Σ_{n=0..N_s-1} s'_w(n)^2 ).  (Equation 4)

  • the pitch gain is provided to the excitation-processing module 54 as the unquantized pitch gains.
  • the pitch correlation may be given by

    R_p = ( Σ_{n=0..N_s-1} s_t(n) · s'_w(n) ) / sqrt( Σ_{n=0..N_s-1} s_t(n)^2 · Σ_{n=0..N_s-1} s'_w(n)^2 ).  (Equation 5)

  • Both parameters are available on a pitch cycle basis and may be linearly interpolated.
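A small numerical sketch of Equations 4 and 5 (the array names are illustrative, not from the patent):

```python
import numpy as np

def pitch_gain_and_correlation(target: np.ndarray, modified: np.ndarray):
    """Per-pitch-cycle pitch gain (Equation 4) and pitch correlation (Equation 5).

    target:   s_t(n), the warping target from Equation 1
    modified: s'_w(n), the modified weighted speech from Equations 2 and 3
    """
    num = float(np.dot(target, modified))
    gain = num / max(float(np.dot(modified, modified)), 1e-12)
    corr = num / max(
        float(np.sqrt(np.dot(target, target) * np.dot(modified, modified))),
        1e-12,
    )
    return gain, corr
```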
  • the fixed codebook component 146a for frames of Type 0 classification may represent each of four subframes of the full-rate codec 22 using the three different 5-pulse subcodebooks 160.
  • vectors for the fixed codebook vector (v_c) 402 within the fixed codebook 390 may be determined using the fixed codebook error signal 408.
  • Pitch enhancement may be applied to the 5-pulse codebooks 160 within the fixed codebook 390 in the forward direction or the backward direction during the search.
  • the search is an iterative, controlled complexity search for the best vector from the fixed codebook 160.
  • An initial value for the fixed codebook gain represented by the gain (g c ) 404 may be found simultaneously with the search.
  • Figures 7 and 8 illustrate the procedure used to search for the best indices in the fixed codebook.
  • a fixed codebook has k subcodebooks. More or fewer subcodebooks may be used in other embodiments.
  • the following example first features a single subcodebook containing N pulses. The possible locations of a pulse are defined by a plurality of positions on a track.
  • the encoder processing circuitry corrects each pulse position sequentially, again from the first pulse 639 to the last pulse 641, by considering the influence of all the other pulses.
  • the functionality of the second or subsequent searching turn is repeated, until the last turn is reached 643. Further turns may be utilized if the added complexity is allowed. This procedure is followed until k turns are completed 645 and a value is calculated for the subcodebook.
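The sketch below conveys the flavor of this iterative, controlled-complexity search (a simplified stand-in for the Fig. 7 procedure, not the patent's exact algorithm; pulse signs are omitted): each pulse position is revisited in turn while the others are held fixed, keeping the candidate that minimizes the weighted error.

```python
import numpy as np

def iterative_pulse_search(target, h, tracks, turns=2):
    """Greedy multi-turn pulse-position search (simplified illustration).

    target: perceptually weighted target vector for the subframe
    h:      impulse response of the weighted synthesis filter
    tracks: one list of allowed positions per pulse
    """
    n = len(target)
    positions = [track[0] for track in tracks]  # initial pulse positions

    def error(pos):
        code = np.zeros(n)
        for p in pos:
            code[p] += 1.0                      # unit pulses; signs omitted
        synth = np.convolve(code, h)[:n]        # filtered codevector
        g = np.dot(target, synth) / max(np.dot(synth, synth), 1e-12)
        resid = target - g * synth              # residual at the best gain
        return float(np.dot(resid, resid))

    for _ in range(turns):                      # each "searching turn"
        for i, track in enumerate(tracks):      # re-optimize pulse i,
            positions[i] = min(                 # holding the others fixed
                track,
                key=lambda p: error(positions[:i] + [p] + positions[i + 1:]),
            )
    return positions, error(positions)
```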
  • Fig. 8 is a flow chart for the method described in Fig. 7 to be used for searching a fixed codebook comprising a plurality of subcodebooks.
  • a first turn is begun 651 by searching a first subcodebook 653, and searching the other subcodebooks 655, in the same manner described for Fig. 7, and keeping the best result 657, until the last subcodebook is searched 659.
  • a second turn 661 or subsequent turn 663 may also be used, in an iterative fashion.
  • one of the subcodebooks in the fixed codebook is typically chosen after finishing the first searching turn. Further searching turns are done only with the chosen subcodebook.
  • one of the subcodebooks might be chosen only after the second searching turn or thereafter, should processing resources so permit. Computations of minimal complexity are desirable, especially since, once the enhancements described herein are added, two or three times as many pulses are calculated rather than one.
  • the search for the best vector for the fixed codebook vector (v_c) 402 is completed in each of the three 5-pulse codebooks 160.
  • candidate best vectors for the fixed codebook vector (v_c) 402 have been identified. Selection of which of the candidate best vectors from which of the 5-pulse codebooks 160 will be used may be determined by minimizing the corresponding fixed codebook error signal 408 for each of the three best vectors.
  • the corresponding fixed codebook residual error 408 for each of the three candidate subcodebooks will be referred to as the first, second, and third fixed codebook error signals.
  • minimizing the weighted mean square errors (WMSE) of the first, second and third fixed codebook error signals is mathematically equivalent to maximizing a criterion value, which may first be modified by multiplying by a weighting factor in order to favor selecting one specific subcodebook.
  • the criterion value from the first, second and third fixed codebook error signals may be weighted by the subframe-based weighting measures.
  • the weighting factor may be estimated by using a sharpness measure of the residual signal, a voice-activity detection module, a noise-to-signal ratio (NSR), and a normalized pitch correlation. Other embodiments may use other weighting factor measures. Based on the weighting and on the maximal criterion value, one of the three 5-pulse fixed subcodebooks 160, and the best candidate vector in that subcodebook, may be selected.
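A minimal sketch of the weighted subcodebook selection (the exact weighting formula is not given here, so both inputs are assumed precomputed):

```python
def select_subcodebook(criterion_values, weighting_factors):
    """Pick the subcodebook whose weighted criterion value is maximal.

    criterion_values:  per-subcodebook criterion values from the search
                       (maximizing these is equivalent to minimizing the WMSE)
    weighting_factors: perceptual weights favoring particular subcodebooks,
                       estimated from sharpness, NSR, pitch correlation, etc.
    """
    weighted = [c * w for c, w in zip(criterion_values, weighting_factors)]
    return max(range(len(weighted)), key=weighted.__getitem__)
```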
  • the selected 5-pulse subcodebook 161, 163 or 165 may then be fine searched for a final decision of the best vector for the fixed codebook vector (v_c) 402.
  • the fine search is performed on the vectors in the selected 5-pulse subcodebook that are in the vicinity of the best candidate vector chosen.
  • the indices that identify the best vector (maximal criterion value) from the fixed codebook are included in the bitstream transmitted to the decoder.
  • Encoding the pitch lag generates an adaptive codebook vector (v_a) 382 (lag) and an adaptive codebook gain g_a 384 for each subframe of type 1 processing.
  • the lag is incorporated into the fixed codebook in one embodiment by using the pitch enhancement differently for different subcodebooks, to increase excitation density.
  • the pitch enhancement should be incorporated during the searches in the encoder, and the same pitch enhancement should be applied to the codevector from the fixed codebook in the decoder. For every vector found in the fixed codebook, the density of the codevector may be increased by convolving it with an impulsive response of the pitch enhancement.
  • this impulsive response always has a unit pulse at time 0 and includes additional pulses at +1 pitch lag, -1 pitch lag, +2 pitch lags, -2 pitch lags, and so on.
  • the magnitudes of these additional pitch pulses are determined by a pitch enhancement coefficient, which may be different for different subcodebooks.
  • the pitch enhancement coefficient is calculated according to the pitch gain g_a_m from the previous subframe of the adaptive codebook section, multiplied by a factor that depends on the fixed subcodebook.
  • Table 1. Pitch enhancement coefficients for each subcodebook, clipped to the indicated range (g_a_m is the pitch gain of the previous subframe; g_a is the pitch gain of the current subframe):

                        Type 0 (LTP type)           Type 1 (PP type)
      Subcodebook #1    0.5 ≤ 0.75 · g_a_m ≤ 1.0    0.5 ≤ 0.75 · g_a ≤ 1.0
      Subcodebook #2    0.0 ≤ 0.25 · g_a_m ≤ 0.5    0.0 ≤ 0.50 · g_a ≤ 0.5
      Subcodebook #3    0                           0.0 ≤ 0.50 · g_a ≤ 0.5
  • alternatively, the pitch enhancement coefficient for the whole fixed codebook could be the previous pitch gain g_a_m multiplied by a single factor,
  • with the result limited to a value between 0.0 and 1.0.
  • the above Table 1 may also be used to determine the pitch enhancement coefficients for different subcodebooks.
  • the pitch enhancement coefficient for the first subcodebook may be the pitch gain of the previous subframe, g_a_m, multiplied by 0.75. The result may be limited to values between 0.5 and 1.0.
  • for the second subcodebook, the pitch enhancement coefficient could be limited to values 0.0 ≤ 0.25 · g_a_m ≤ 0.5; the pitch enhancement coefficient could be zero for the third subcodebook.
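A sketch of how these clipped Type 0 coefficients might be computed (the factors and bounds are taken from Table 1; the function name is illustrative):

```python
def pitch_enhancement_coefficient(subcodebook: int, g_a_m: float) -> float:
    """Type 0 pitch enhancement coefficient per Table 1.

    g_a_m: adaptive codebook (pitch) gain of the previous subframe.
    """
    if subcodebook == 1:
        return min(max(0.75 * g_a_m, 0.5), 1.0)  # 0.5 <= 0.75*g_a_m <= 1.0
    if subcodebook == 2:
        return min(max(0.25 * g_a_m, 0.0), 0.5)  # 0.0 <= 0.25*g_a_m <= 0.5
    return 0.0                                   # subcodebook #3: no enhancement
```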
  • speech is processed in frames of 160 samples with four subframes of 40 samples for F0.
  • a pitch lag of 16 samples may be calculated and forwarded as the adaptive codebook contribution.
  • the use of 16 samples is merely a convenience, and pitch lags are usually larger than 16.
  • a fixed codebook in the same speech coder/decoder may be searched and a close match of one of the pulses from the fixed codebook found at sample 6.
  • the fixed codebook generates a pulse at sample 6 and the pitch enhancement generates additional pulses at sample 22 and at sample 38. Because the pitch enhancement coefficient has been calculated according to available information, no additional bits need to be transmitted to capture the extra pulse density.
  • Fig. 9 illustrates a single pulse 902 at about location 6 (samples) generated by a fixed codebook.
  • a pitch enhancement adds pulses 904 and 906 additional to the original pulse 902 from the fixed codebook.
  • the additional pulses occur at intervals 910 of 16 samples (the pitch lag), as shown in Fig. 11. This illustrates a pitch enhancement applied in a "forward" direction.
  • the pitch enhancement may be applied in a "backward" direction.
  • Fig. 12 illustrates a pulse 912 from a fixed codebook at 24 (samples).
  • a pulse 916 is added in a forward direction at 40 (samples), as seen in Fig. 13.
  • a pulse 914 is added in a backward direction at 8 (samples), calculated by subtracting 16 from 24. It has been found that speech coded with these enhancements sounds more natural and more similar to an original spoken voice.
  • the fixed codebook pulses in this embodiment are processed as described and shown in the previous examples.
  • a pitch enhancement coefficient is applied to the pitch pulses that are +1 or -1 pitch lag away from the main pulse.
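The sketch below illustrates the enhancement as a convolution of the fixed codebook vector with an impulsive response that has a unit pulse at time 0 and coefficient-scaled pulses at multiples of the pitch lag (parameter names are illustrative); the final lines reproduce the pulse-at-sample-6 case above.

```python
import numpy as np

def apply_pitch_enhancement(code: np.ndarray, pitch_lag: int,
                            beta: float, backward: bool = True) -> np.ndarray:
    """Add pitch-enhancement pulses at multiples of the pitch lag.

    code:      fixed (sub)codebook vector for the subframe
    pitch_lag: pitch lag from the adaptive codebook, in samples
    beta:      pitch enhancement coefficient; a pulse k lags away from the
               original pulse is scaled by beta**k
    """
    out = code.copy()
    n = len(code)
    for k in range(1, n // pitch_lag + 1):
        scale = beta ** k
        out[k * pitch_lag:] += scale * code[:n - k * pitch_lag]      # forward
        if backward:
            out[:n - k * pitch_lag] += scale * code[k * pitch_lag:]  # backward
    return out

# Example from the text: 40-sample subframe, pulse at sample 6, pitch lag 16.
code = np.zeros(40)
code[6] = 1.0
enhanced = apply_pitch_enhancement(code, pitch_lag=16, beta=0.5, backward=False)
# enhanced now also has pulses at samples 22 and 38, with no extra bits sent.
```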
Type 0 Fixed Codebook Search for the Half-Rate Codec

  • The fixed codebook component 178a for frames of Type 0 classification represents the fixed codebook contribution for each of the two subframes of the half-rate codec 24.
  • the representation may be based on the pulse codebooks 192 and 194 and the gaussian subcodebook 196.
  • the initial target for the fixed codebook gain represented by the gain (g_c) 404 may be determined similarly to the full-rate codec 22.
  • the criterion value may be weighted similarly to the full-rate codec 22, from a perceptual point of view.
  • the weighting may be applied to favor selecting the best vector from the gaussian subcodebook 196 when the input reference signal is noise-like.
  • the weighting helps determine the most suitable fixed subcodebook vector (v c ) 402.
  • the pitch enhancement discussed in the F0 processing applies also to the half-rate H0, which in one embodiment is processed in subframes of 80 samples.
  • the pitch lags are derived in the same manner from the adaptive codebook, as is the pitch gain g_a 384.
  • a pitch gain from the previous subframe, g_a_m, is used.
  • the pitch enhancement coefficient for the first subcodebook 192 is estimated by multiplying the pitch gain of the previous subframe by a factor of 0.75, where the resulting 0.75 · g_a_m is limited to values between 0.5 and 1.0.
  • for the second subcodebook 194, the pitch gain is multiplied by 0.25, with the resulting 0.25 · g_a_m limited to values between 0.0 and 0.25.
  • An example is depicted in Figs. 14-16.
  • 2-subframe processing is used, and in this example, an initial pulse from a subcodebook for the H0 codec is at about sample 44. This is shown in Fig. 14 as 922.
  • Additional pulses introduced by the pitch enhancement are located at ⁇ 1 and ⁇ 2 pitch lags away from the initial pulse, or in this example, at 12, 28, 60 and 76, for a pitch lag of 16. This is depicted in Fig. 15, with pulses at ⁇ 1 pitch lag at 28 and 60, 926 and 928 respectively, and ⁇ 2 pitch lags, at 12 and 76, 924 and 930 respectively.
  • Fig. 16 depicts a pitch enhancement coefficient of 0.5 applied once to the pulses 936 and 938. The coefficient is applied twice (0.5 to the second power, or 0.25) to the pulses 934 and 940.
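Continuing the apply_pitch_enhancement sketch above, the H0 example of Figs. 14-16 (80-sample subframe, initial pulse at sample 44, pitch lag 16, coefficient 0.5, both directions) reproduces exactly these amplitudes:

```python
code = np.zeros(80)
code[44] = 1.0
enhanced = apply_pitch_enhancement(code, pitch_lag=16, beta=0.5, backward=True)
# Pulses: 44 (1.0); 28 and 60 (0.5, one lag away); 12 and 76 (0.25, two lags away).
```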
  • the search for the best vector for the fixed codebook vector (v c ) 402 is based on minimizing the energy of the fixed codebook error signal 408 as previously discussed.
  • the search may first be performed on the 2-pulse subcodebook 192.
  • the 3-pulse codebook 194 may be searched next, in several steps.
  • the current step may determine a starting point for the next step.
  • Backward and forward pitch enhancement may be applied during the search and after the search in both pulse subcodebooks 192 and 194.
  • the gaussian subcodebook 196 may be searched last, using a fast search routine based on two orthogonal basis vectors.
  • the selection of one of the subcodebooks 192, 194 or 196 and the best vector (v_c) 402 from the selected subcodebook may be performed in a manner similar to that used for the full-rate codec 22.
  • the indices that identify the best fixed codebook vector (v c ) 402 within the selected subcodebook are the fixed codebook component 178a in the bitstream.
  • the unquantized initial values of the gains (g_a) 384 and (g_c) 404 may now be finalized based on the adaptive codebook vector (v_a) 382 (lag) and the fixed codebook vector (v_c) 402 previously determined. Determination and joint quantization of the gains occur within the gain quantization section 366.

Fixed Codebook Encoding for Type 1 Frames
  • the F1 and H1 first frame processing modules 72 and 82 include a 3D/4D open loop VQ module 454.
  • the F1 and H1 sub-frame processing modules 74 and 84 include the adaptive codebook 368, the fixed codebook 390, a first multiplier 456, a second multiplier 458, a first synthesis filter 460 and a second synthesis filter 462.
  • the F1 and H1 sub-frame processing modules 74 and 84 also include a first perceptual weighting filter 464, a second perceptual weighting filter 466, a first subtractor 468, a second subtractor 470, a first minimization module 472 and an energy adjustment module 474.
  • the F1 and H1 second frame processing modules 76 and 86 include a third multiplier 476, a fourth multiplier 478, an adder 480, a third synthesis filter 482, a third perceptual weighting filter 484, a third subtractor 486, a buffering module 488, a second minimization module 490 and a 3D/4D VQ gain codebook 492.
  • the processing of frames classified as Type 1 within the excitation-processing module 54 provides processing on both a frame basis and a sub-frame basis.
  • the following discussion refers to the modules within the full rate codec 22.
  • the modules in the half rate codec 24 function similarly unless otherwise noted.
  • Quantization of the adaptive codebook gain by the F1 first frame-processing module 72 generates the adaptive gain component 148b.
  • the F1 subframe processing module 74 and the F1 second frame processing module 76 operate to determine the fixed codebook vector and the corresponding fixed codebook gain, respectively, as previously set forth.
  • the F1 subframe-processing module 74 uses the track tables to generate the fixed codebook component 146b as illustrated in FIG. 2.
  • the F1 second frame processing module 76 quantizes the fixed codebook gain to generate the fixed gain component 150b.
  • the full-rate codec 22 uses 10 bits for the quantization of 4 fixed codebook gains
  • the half-rate codec 24 uses 8 bits for the quantization of the 3 fixed codebook gains.
  • the quantization may be performed using moving average prediction.
  • the 3D/4D open loop VQ module 454 receives the unquantized pitch gains 352 from a pitch pre-processing module (not shown).
  • the 3D/4D open loop VQ module 454 quantizes the unquantized pitch gains 352 to generate a quantized pitch gain (g_a^k) 496 for each subframe, where k is the subframe number.
  • there are four subframes for the full-rate codec 22 and three subframes for the half-rate codec 24, corresponding to four quantized gains (g_a^1, g_a^2, g_a^3, and g_a^4) and three quantized gains (g_a^1, g_a^2, and g_a^3), respectively.
  • the index location of the quantized pitch gain (g_a^k) 496 within the pre-gain quantization table represents the adaptive gain component 148b for the full-rate codec 22 or the adaptive gain component 180b for the half-rate codec 24.
  • the quantized pitch gain (g_a^k) 496 is provided to the F1 subframe-processing module 74 or the H1 subframe-processing module 84.
  • the selected adaptive codebook vector (v_a^k) 498 and the quantized pitch gain (g_a^k) 496 are multiplied by the first multiplier 456.
  • the first multiplier 456 generates a signal that is processed by the first synthesis filter 460 and the first perceptual weighting filter module 464 to provide a first resynthesized speech signal 500.
  • the first synthesis filter 460 receives the quantized LPC coefficients A_q(z) 342 from an LSF quantization module (not shown) as part of the processing.
  • the first subtractor 468 subtracts the first resynthesized speech signal 500 from the modified weighted speech 350 provided by a pitch pre-processing module (not shown) to generate a long-term residual signal 502.
  • the F1 or H1 subframe-processing module 74 or 84 also performs a search for the fixed codebook contribution that is similar to that performed by the F0 and H0 subframe-processing modules 70 and 80.
  • vectors for a fixed codebook vector (v_c^k) 504 that represents the long-term residual for a subframe are selected from the fixed codebook 390.
  • the second multiplier 458 multiplies the fixed codebook vector (v_c^k) 504 by a gain (g_c^k) 506, where k equals the subframe number as previously discussed.
  • the gain (g_c^k) 506 is unquantized and represents the fixed codebook gain for each subframe.
  • the resulting signal is processed by the second synthesis filter 462 and the second perceptual weighting filter 466 to generate a second resynthesized speech signal 508.
  • the second resynthesized speech signal 508 is subtracted from the long-term residual signal 502 by the second subtractor 470 to produce a fixed codebook error signal 510.
  • the fixed codebook error signal 510 is received by the first minimization module 472 along with control information 356.
  • the first minimization module 472 operates in the same manner as the previously discussed second minimization module 400 illustrated in FIG. 6.
  • the search process repeats until the first minimization module 472 has selected a fixed codebook vector (v_c^k) 504 from the fixed codebook 390 for each subframe.
  • the best vector for the fixed codebook vector (v_c^k) 504 minimizes the energy of the fixed codebook error signal 510.
  • the indices identify the best fixed codebook vector (v_c^k) 504, and form the fixed codebook components 146b and 178b.
  • the 8-pulse codebook 162, illustrated in FIG. 5, is used for each of the four subframes for frames of type 1 by the full-rate codec 22.
  • the target for the fixed codebook vector (v_c^k) 504 is the long-term error signal 502, which removes the filtered adaptive codebook contribution from the subframe target. Consistent with the symbol definitions below, it may be expressed as

    t'(n) = t(n) - g_a · ( ê(n) ∗ h(n) ),  where  ê(n) = Σ_i w_s(f(L_p(n)), i) · e(n - I(L_p(n)) + i),

  • in which t'(n) is the target for the fixed codebook search; g_a is the pitch gain; h(n) is the impulse response of the perceptually weighted synthesis filter; e(n) is the past excitation; I(L_p(n)) and f(L_p(n)) are the integer and fractional parts of the pitch lag, respectively; w_s(f, i) is the Hamming weighted Sinc window; and ∗ denotes convolution.
  • pitch enhancement may be applied in the forward, or forward and backward directions.
  • the search procedure minimizes the fixed codebook error signal 510 using an iterative search procedure with controlled complexity to determine the best fixed codebook vector (v_c^k) 504.
  • an initial fixed codebook gain represented by the gain (g_c^k) 506 is determined during the search.
  • the indices identify the best fixed codebook vector (v_c^k) 504 and form the fixed codebook component 146b as previously discussed.
  • the long-term residual is represented by an excitation from a fixed codebook with 13 bits for each of the three subframes for frames classified as Type 1 for the half-rate codec 24.
  • the long-term residual error 502 may be used as a target in a similar manner to the fixed codebook search in the full-rate codec 22. Similar to the fixed-codebook search for the half-rate codec 24 for frames of Type 0, high-frequency noise injection, additional pulses determined by correlation in the previous subframe, and a weak short-term filter may be added to enhance the fixed codebook contribution connected to the second synthesis filter 462. In addition, forward, or forward and backward, pitch enhancement may be applied as well.
  • the adaptive codebook gain 496 calculated above is also used to estimate the pitch enhancement coefficients for the fixed subcodebook.
  • the adaptive codebook gain of the current subframe, g_a, rather than that of the previous subframe, is used.
  • a full search is performed for a 2-pulse subcodebook 193, a 3-pulse subcodebook 195, and a 5-pulse subcodebook 197, as illustrated in FIG. 5.
  • the best fixed codebook vector (v_c^k) 504 that minimizes the fixed codebook error signal 510 is selected to represent the long-term residual for each subframe.
  • an initial fixed codebook gain represented by the gain (g_c^k) 506 may be determined during the search, similar to the full-rate codec 22.
  • the indices identify the vector for the fixed codebook vector (v_c^k) 504 and form the fixed codebook component 178b.
  • the pitch enhancement coefficients for different subcodebooks are also determined using Table 1.
  • the pitch enhancement coefficient for the first subcodebook could be the pitch gain of the current subframe, g_a, limited to a value between 0.5 and 1.0.
  • for the second and third subcodebooks, the pitch enhancement coefficient could be 0.0 ≤ 0.5 · g_a ≤ 0.5.
  • the F1 or H1 subframe-processing modules 74 or 84 operate on a subframe basis.
  • the F1 or H1 second frame-processing modules 76 or 86 operate on a frame basis.
  • parameters determined by the F1 or H1 subframe-processing module 74 or 84 are stored in the buffering module 488 for later use on a frame basis.
  • the parameters stored are the adaptive codebook vector (v_a^k) 498, the fixed codebook vector (v_c^k) 504, a modified target signal 512, and the gains (g_a^k) 496 and (g_c^k) 506 representing the initial adaptive and fixed codebook gains.
  • the fixed codebook gains (g_c^k) 506 are determined by vector quantization (VQ).
  • the fixed codebook gains (g_c^k) 506 replace the unquantized initial fixed codebook gains determined previously.
  • a joint delayed vector quantization (VQ) of the fixed codebook gains for each subframe is performed by the second frame-processing modules 76 and 86.
  • Fig. 17 shows the F1 and H1 subframe processing modules 74 and 84, respectively.
  • Each uses a provided pitch track to identify a pitch vector (v_a^k) 498.
  • a functional block diagram represents the full and half rate decoders 90 and 92 of Fig. 4.
  • One embodiment of the decoding system 16 includes a full-rate decoder 90, a half-rate decoder 92, a quarter-rate decoder 94, and an eighth- rate decoder 96, a synthesis filter module 98, and a post-processing module 100.
  • the decoders are the decoding portion of the full, half, quarter and eighth rate codecs 22, 24, 26, and 28 shown in Fig. 2.
  • the decoders 90, 92, 94, and 96 receive the bitstream as shown in Fig. 2, and transform the bitstream back to different parameters of the speech signal 18.
  • the decoders decode each frame as a function of the rate selection and classification.
  • the rate selection is provided from the encoding system 12 to the decoding system 16 by an external signal in a control channel in a wireless communications system.
  • the synthesis filter 98 assembles the parameters of the speech signal 18 that are decoded by the decoders, thus generating reconstructed speech.
  • the reconstructed speech is passed through the post-processing module 100 to create post-processed synthesized speech 20.
  • Post-processing module 100 can include filtering, signal enhancement, noise modification, amplification, tilt correction, and other similar techniques capable of improving the perceptual quality of the synthesized speech.
  • the decoders 90 and 92 perform inverse mapping of the components of the bit- stream to algorithm parameters.
  • the inverse mapping may be followed by a type classification dependent synthesis within the full and half-rate codecs 22 and 24.
  • the decoding for the quarter-rate codec 26 and the eighth-rate codec 28 is similar to that of the full and half rate codecs. However, the quarter-rate and eighth-rate codecs use vectors of similar yet random numbers and an energy gain, rather than the adaptive codebooks 368 and fixed codebooks 390. The random numbers and an energy gain may be used to reconstruct an excitation energy that represents the excitation of a frame. Excitation modules 120 and 124 may be used respectively to generate portions of the quarter-rate and eighth-rate reconstructed speech. LSFs encoded during the encoding process may be used by LPC reconstruction modules 122 and 126, respectively, for the quarter-rate and eighth-rate reconstructed speech.
  • the adaptive codebook 368 receives information reconstructed by the decoding system 16 from the adaptive codebook components 144 and 176 provided in the bitstream by the encoding system 12.
  • the synthesis filter assembles the parameters of the speech signal 18 that are decoded by the decoders, 90, 92, 94, and 96.
  • the full rate decoder 90 includes an F-type selector 102 and a plurality of excitation reconstruction modules.
  • the excitation reconstruction modules comprise an F0 excitation reconstruction module 104 and an F1 excitation reconstruction module 106.
  • the full rate decoder 90 includes an LPC reconstruction module 107.
  • the LPC reconstruction module 107 comprises an F0 LPC reconstruction module 108 and an F1 LPC reconstruction module 110.
  • the other speech parameters encoded by full rate encoder 36 are reconstructed by the decoder 90 to reconstruct speech.
  • an embodiment of the half-rate decoder 92 includes an H-type selector 112 and a plurality of excitation reconstruction modules.
  • the excitation reconstruction modules comprise an H0 excitation reconstruction module 114 and an H1 excitation reconstruction module 116.
  • the half-rate decoder 92 comprises an H LPC reconstruction module 118. In a manner similar to that of the full rate encoder, the other speech parameters encoded by the half rate encoder 38 are reconstructed by the half rate decoder to reconstruct speech.
  • the F and H type selectors 102 and 112 selectively activate appropriate portions of the full and half rate decoders 90 and 92, respectively.
  • a type 0 classification activates the F0 excitation reconstruction module 104 or the H0 excitation reconstruction module 114.
  • the respective F0 or F1 LPC reconstruction modules are used to reconstruct the speech from the bitstream.
  • the same process used to encode the speech is used in reverse to decode the signals, including the pitch lags, pitch gains, and any additional factors used, such as the coefficients described above (a routing sketch follows this list).
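The dispatch logic summarized in the preceding list can be pictured as a small routing function: the rate selection picks a decoder, and the type bit selects the type 0 or type 1 reconstruction path within the full- and half-rate decoders. The sketch below is purely illustrative; all names are hypothetical and the per-path decoders are left as stubs.

```python
# Hypothetical sketch of the decoder dispatch described above. The rate
# selection chooses a codec; for full/half rate, a type bit selects the
# type-0 or type-1 reconstruction path. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Frame:
    rate: str        # "full", "half", "quarter", "eighth"
    type_bit: int    # 0 or 1 (meaningful only for full/half rate)
    payload: bytes   # encoded parameters for this frame

def decode_frame(frame: Frame):
    if frame.rate == "full":
        # F-type selector 102: route to F0 or F1 reconstruction
        return decode_f0(frame.payload) if frame.type_bit == 0 else decode_f1(frame.payload)
    if frame.rate == "half":
        # H-type selector 112: route to H0 or H1 reconstruction
        return decode_h0(frame.payload) if frame.type_bit == 0 else decode_h1(frame.payload)
    if frame.rate == "quarter":
        return decode_quarter(frame.payload)  # random vectors + energy gain
    return decode_eighth(frame.payload)       # random vectors + energy gain

def decode_f0(p): ...
def decode_f1(p): ...
def decode_h0(p): ...
def decode_h1(p): ...
def decode_quarter(p): ...
def decode_eighth(p): ...
```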

Abstract

A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full-rate and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates, emphasizing different aspects of the speech signal to enhance the overall quality of the synthesized speech. The overall quality of the system is strongly related to the excitation. In order to enhance the excitation, the system contains a fixed codebook comprising several subcodebooks. The invention reveals a way to apply a pitch enhancement efficiently and differently for different subcodebooks without using additional bits. The technique is particularly applicable to selectable mode vocoder (SMV) systems.

Description

"Express Mail" mailing label number F.L51314?.SS4US Date of Deposit September 15, ?.nfffl
SYSTEM FOR IMPROVED USE OF PITCH ENHANCEMENT WITH
SUBCODEBOOKS
INVENTOR: YANG GAO
BACKGROUND OF THE INVENTION
1. Cross Reference to Related Applications.
The following co-pending and commonly assigned U.S. patent applications have been filed on the same day as this application. All of these applications relate to and further describe other aspects of the embodiments disclosed in this application and are incorporated by reference in their entirety.

United States Patent Application Serial Number ________, "SELECTABLE MODE VOCODER SYSTEM," Attorney Reference Number: 98RSS365CIP (10508.4), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "INJECTING HIGH FREQUENCY NOISE INTO PULSE EXCITATION FOR LOW BIT RATE CELP," Attorney Reference Number: 00CXT0065D (10508.5), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SHORT TERM ENHANCEMENT IN CELP SPEECH CODING," Attorney Reference Number: 00CXT0666N (10508.6), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SYSTEM OF DYNAMIC PULSE POSITION TRACKS FOR PULSE-LIKE EXCITATION IN SPEECH CODING," Attorney Reference Number: 00CXT0573N (10508.7), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SPEECH CODING SYSTEM WITH TIME-DOMAIN NOISE ATTENUATION," Attorney Reference Number: 00CXT0554N (10508.8), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SYSTEM FOR AN ADAPTIVE EXCITATION PATTERN FOR SPEECH CODING," Attorney Reference Number: 98RSS366 (10508.9), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SYSTEM FOR ENCODING SPEECH INFORMATION USING AN ADAPTIVE CODEBOOK WITH DIFFERENT RESOLUTION LEVELS," Attorney Reference Number: 00CXT0670N (10508.13), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "CODEBOOK TABLES FOR ENCODING AND DECODING," Attorney Reference Number: 00CXT0669N (10508.14), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "BIT STREAM PROTOCOL FOR TRANSMISSION OF ENCODED VOICE SIGNALS," Attorney Reference Number: 00CXT0668N (10508.15), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SYSTEM FOR FILTERING SPECTRAL CONTENT OF A SIGNAL FOR SPEECH ENCODING," Attorney Reference Number: 00CXT0667N (10508.16), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SYSTEM FOR ENCODING AND DECODING SPEECH SIGNALS," Attorney Reference Number: 00CXT0665N (10508.17), filed on September 15, 2000, and is now United States Patent Number ________.

United States Patent Application Serial Number ________, "SYSTEM FOR SPEECH ENCODING HAVING AN ADAPTIVE FRAME ARRANGEMENT," Attorney Reference Number: 98RSS384CIP (10508.18), filed on September 15, 2000, and is now United States Patent Number ________.
2. Technical Field.
This invention relates to speech communication systems and, more particularly, to systems and methods for digital speech coding.
3. Related Art.
One prevalent mode of human communication involves the use of communication systems. Communication systems include both wireline and wireless radio systems. Wireless communication systems electrically connect with the landline systems and communicate using radio frequency (RF) with mobile communication devices. Currently, the radio frequencies available for communication in cellular systems, for example, are in the frequency range centered around 900 MHz and in the personal communication services (PCS) frequency range centered around 1900 MHz. Due to increased traffic caused by the expanding popularity of wireless communication devices, such as cellular telephones, it is desirable to reduce the bandwidth of transmissions within the wireless systems.
Digital transmission in wireless radio communications is increasingly being applied to both voice and data due to noise immunity, reliability, compactness of equipment and the ability to implement sophisticated signal processing functions using digital techniques. Digital transmission of speech signals involves the steps of: sampling an analog speech waveform with an analog-to-digital converter, speech compression (encoding), transmission, speech decompression (decoding), digital-to-analog conversion, and playback into an earpiece or a loudspeaker. The sampling of the analog speech waveform with the analog-to-digital converter creates a digital signal. However, the number of bits used in the digital signal to represent the analog speech waveform creates a relatively large bandwidth. For example, a speech signal that is sampled at a rate of 8000 Hz (once every 0.125 ms), where each sample is represented by 16 bits, will result in a bit rate of 128,000 (16x8000) bits per second, or 128 kbps (kilobits per second). Speech compression reduces the number of bits that represent the speech signal, thus reducing the bandwidth needed for transmission. However, speech compression may result in degradation of the quality of decompressed speech. In general, a higher bit rate will result in higher quality, while a lower bit rate will result in lower quality. However, speech compression techniques, such as coding techniques, can produce decompressed speech of relatively high quality at relatively low bit rates. In general, coding techniques attempt to represent the perceptually important features of the speech signal, with or without preserving the actual speech waveform.
One coding technique used to lower the bit rate involves varying the degree of speech compression (i.e., varying the bit rate) depending on the part of the speech signal being compressed. Typically, parts of the speech signal for which adequate perceptual representation is more difficult or more important (such as voiced speech, plosives, or voiced onsets) are coded and transmitted using a higher number of bits, while parts of the speech signal for which adequate perceptual representation is less difficult or less important (such as unvoiced, or the silence between words) are coded with a lower number of bits. The resulting average bit rate for the speech signal may be relatively lower than would be the case for a fixed bit rate that provides decompressed speech of similar quality.
These speech compression techniques have resulted in lowering the amount of bandwidth used to transmit a speech signal. However, further reduction in bandwidth is important in a communication system for a large number of users. Accordingly, there is a need for systems and methods of speech coding that are capable of minimizing the average bit rate needed for speech representation, while providing high quality decompressed speech.
SUMMARY
A technique uses a pitch enhancement to improve the use of the fixed codebooks in cases where the fixed codebook comprises a plurality of subcodebooks. Code-excited linear prediction (CELP) coding utilizes several predictions to capture redundancy in voiced speech while minimizing data to encode the speech. A first short-term prediction results in an LPC residual, and a second long-term prediction results in a pitch residual. The pitch residual may be coded using a fixed codebook that includes a plurality of fixed subcodebooks. The disclosed embodiments describe a system for pitch enhancements to improve the use of communication systems employing a plurality of fixed subcodebooks.
A pitch enhancement is used in a predictable manner to add pulses to the output from the fixed subcodebooks but without requiring any additional bits to encode this additional information. The pitch lag is calculated in an adaptive codebook portion of the speech encoder/decoder. These additional pulses result in encoded speech that more closely approximates the voiced speech. In the improvement, an adaptive pitch gain and a modifying factor are used to enhance the pulses from the fixed subcodebooks differently for different subcodebooks. These techniques are used in such a manner that no extra bits of data are added to the bitstream that constitutes the output of an encoder or the input to a decoder.
Accordingly, the speech coder is capable of selectively activating a series of encoders and decoders of different bitstream rates to maximize the overall quality of a reconstructed speech signal while maintaining the desired average bit rate. Other systems, methods, features and advantages of the invention will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
BRIEF DESCRIPTION OF THE FIGURES
The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.

FIG. 1 is a graph representing time-domain speech patterns.
FIG. 2 is a block diagram of a speech-coding system according to the invention.
FIG. 3 is another block diagram of a speech coding system.
FIG. 4 is an expanded block diagram of a speech encoding system.
FIG. 5 is a block diagram of fixed codebooks.
FIG. 6 is an expanded block diagram of the encoding system of FIG. 4.
FIG. 7 is a flow chart for searching a fixed codebook.
FIG. 8 is a flow chart for searching a fixed codebook.
FIG. 9 is a schematic diagram illustrating pitch enhancements.
FIG. 10 is a schematic diagram illustrating pitch enhancements.
FIG. 11 is a schematic diagram illustrating pitch enhancements.
FIG. 12 is a schematic diagram illustrating pitch enhancements.
FIG. 13 is a schematic diagram illustrating pitch enhancements.
FIG. 14 is a schematic diagram illustrating pitch enhancements.
FIG. 15 is a schematic diagram illustrating pitch enhancements.
FIG. 16 is a schematic diagram illustrating pitch enhancements.
FIG. 17 is another expanded block diagram of the encoding system of FIG. 4.
FIG. 18 is an expanded block diagram of the decoding system of FIG. 3.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig. 1 depicts the waveforms in CELP speech coding. An input speech signal 2 has some measure of predictability or periodicity 4. At least a pitch gain, a pitch lag and a fixed codebook index are calculated from the speech signal 2. The code-excited linear prediction (CELP) coding approach uses two types of predictors, a short-term predictor and a long-term predictor. The short-term predictor is typically applied before the long-term predictor. The short-term predictor is also referred to as linear prediction coding (LPC) or spectral envelope representation, and typically may comprise ten prediction parameters.
Using CELP coding, a first prediction error may be derived from the short-term predictor and is called a short-term or LPC residual 6. The short-term LPC parameters, fixed-codebook indices and gain, as well as an adaptive codebook lag and its gain for the long-term predictor are quantized. The quantization indices, as well as the fixed codebook indices, are sent from the encoder to the decoder. The quality of the speech may be enhanced through a system that uses a plurality of fixed subcodebooks, rather than merely a single fixed subcodebook. Each lag parameter also may be called a pitch lag, and each long-term predictor gain parameter also may be called an adaptive codebook gain. The lag parameter defines an entry or a vector in the adaptive codebook. Following the LPC analysis, the long-term predictor parameters and the fixed codebook entries that best represent the prediction error of the long-term residual are determined. A second prediction error may be derived from the long-term predictor and is called a long-term or pitch residual 8. The long-term residual may be coded using a fixed codebook that includes a plurality of fixed codebook entries or vectors. During coding, one of the entries is multiplied by a fixed codebook gain to represent the long- term residual. Analysis-by-synthesis (ABS), that is, feedback, is employed in the CELP coding. In the ABS approach, synthesizing with an inverse prediction filter and applying a perceptual weighting measure determine the best contribution from the fixed codebook and the best long-term predictor parameters.
The CELP decoder uses the fixed codebook indices to extract a vector from the fixed codebook or subcodebooks. The vector is multiplied by the fixed-codebook gain to create a fixed codebook contribution. A long-term predictor contribution is added to the fixed codebook contribution to create a synthesized excitation that is referred to as an excitation. The long-term predictor contribution comprises the excitation from the past multiplied by the long-term predictor gain. The long-term predictor contribution alternatively comprises an adaptive codebook contribution or a long-term pitch-filtering characteristic. The synthesized excitation is passed through a short-term synthesis filter, which uses the short-term LPC prediction coefficients quantized by the encoder to generate synthesized speech. The synthesized speech may be passed through a post-filter that reduces the perceptual coding noise. Other codecs and associated coding algorithms may be used, such as a selectable mode vocoder (SMV) system, extended code excited linear prediction (eX-CELP), and algebraic CELP (A-CELP).
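As a rough illustration of the decoder flow just described, the sketch below forms the synthesized excitation from a fixed codebook vector, its gain, and a long-term predictor contribution, then runs it through a short-term synthesis filter. It is a minimal sketch, not the codec's implementation: it assumes integer pitch lags, a lag no shorter than the subframe, and a common sign convention for the LPC coefficients.

```python
import numpy as np

def lpc_synthesis(excitation, lpc):
    """Short-term synthesis filter 1/A(z), with A(z) = 1 + a1*z^-1 + ... + aM*z^-M
    (a common sign convention; the codec's exact convention is not given here).
    Direct-form recursion with zero initial filter state."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(lpc)):
            if n - k >= 0:
                acc -= lpc[k] * out[n - k]
        out[n] = acc
    return out

def decode_subframe(fixed_vec, g_c, past_exc, lag, g_a, lpc):
    """One subframe of the decoder flow described above: the fixed codebook
    vector scaled by its gain, plus the long-term predictor contribution
    (past excitation delayed by the pitch lag and scaled by the pitch gain),
    passed through the short-term synthesis filter. Integer lags only; the
    lag is assumed to be no shorter than the subframe."""
    n = len(fixed_vec)
    start = len(past_exc) - lag
    adaptive = np.asarray(past_exc[start:start + n], dtype=float)
    excitation = g_c * np.asarray(fixed_vec, dtype=float) + g_a * adaptive
    return lpc_synthesis(excitation, lpc), excitation  # speech, new excitation
```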
Fig. 2 is a block diagram of a speech coding system 100 according to one embodiment that uses CELP coding. The speech coding system 100 includes a first communication device 105 operatively connected via a communication medium 110 to a second communication device 115. The speech coding system 100 may be any cellular telephone, radio frequency, or other communication system capable of encoding a speech signal 145 and decoding the encoded signal to create synthesized speech 150. The communications devices 105 and 115 may be cellular telephones, portable radio transceivers, and the like. The communications medium 110 may include systems using any transmission mechanism, including radio waves, infrared, landlines, fiber optics, any other medium capable of transmitting digital signals (wires or cables), or any combination thereof. The communications medium 110 may also include a storage mechanism including a memory device, a storage medium, or other device capable of storing and retrieving digital signals. In use, the communications medium 110 transmits a bitstream of digital data between the first and second communications devices 105 and 115.
The first communication device 105 includes an analog-to-digital converter 120, a preprocessor 125, and an encoder 130 connected as shown. The first communication device 105 may have an antenna or other communication medium interface (not shown) for sending and receiving digital signals with the communication medium 110. The first communication device 105 may also have other components known in the art for any communication device, such as a decoder or a digital-to-analog converter.
The second communication device 115 includes a decoder 135 and a digital-to-analog converter 140 connected as shown. Although not shown, the second communication device 115 may have one or more of a synthesis filter, a postprocessor, and other components. The second communication device 115 also may have an antenna or other communication medium interface (not shown) for sending and receiving digital signals with the communication medium. The preprocessor 125, encoder 130, and decoder 135 comprise processors, digital signal processors (DSP), application specific integrated circuits, or other digital devices for implementing the coding algorithms discussed herein. The preprocessor 125 and encoder 130 may comprise separate components or the same component.
In use, the analog-to-digital converter 120 receives a speech signal 145 from a microphone (not shown) or other signal input device. The speech signal may be voiced speech, music, or another analog signal. The analog-to-digital converter 120 digitizes the speech signal, providing the digitized speech signal to the preprocessor 125. The preprocessor 125 passes the digitized signal through a high-pass filter (not shown) preferably with a cutoff frequency of about 60-80 Hz. The preprocessor 125 may perform other processes to improve the digitized signal for encoding, such as noise suppression. The encoder 130 codes the speech using a pitch lag, a pitch gain, a fixed codebook, a fixed codebook gain, LPC parameters and other parameters. The code is transmitted in the communication medium 110.
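The 60-80 Hz high-pass stage lends itself to a brief illustration. The sketch below is a minimal stand-in, assuming a second-order Butterworth design and a 70 Hz cutoff; the patent does not specify the filter family or order.

```python
import numpy as np
from scipy.signal import butter, lfilter

def preprocess(speech_8khz, cutoff_hz=70.0, fs=8000):
    """Minimal sketch of the pre-processing high-pass stage: a second-order
    Butterworth high-pass with a cutoff in the 60-80 Hz range mentioned
    above. The filter order and exact cutoff are assumptions."""
    b, a = butter(2, cutoff_hz / (fs / 2), btype="highpass")
    return lfilter(b, a, np.asarray(speech_8khz, dtype=float))
```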
The decoder 135 receives the bitstream from the communication medium 110. The decoder operates to decode the bitstream and generate a synthesized speech signal 150 in the form of a digitized signal. The synthesized speech signal 150 is then converted to an analog signal by the digital-to-analog converter 140. The encoder 130 and the decoder 135 use a speech compression system, commonly called a codec, to reduce the bit rate of the noise-suppressed digitized speech signal. For example, the code excited linear prediction (CELP) coding technique utilizes several prediction techniques to remove redundancy from the speech signal.
The CELP coding approach is frame-based. Samples of input speech signals (e.g., preprocessed, digitized speech signals) are stored in blocks of samples called frames. To minimize bandwidth use, each frame may be characterized. The frames are processed to create a compressed speech signal in digitized form. The frame characterization is based on the portion of the speech signal 145 contained in the particular frame. For example, frames may be characterized as stationary voiced speech, non-stationary voiced speech, unvoiced speech, onset, background noise, and silence. As will be seen, these classifications may be used to help determine the resources used to encode and decode each particular frame.

Fig. 3 shows an embodiment of a speech coding system 10 that may utilize adaptive and fixed codebooks, and in particular, may utilize fixed codebooks that comprise a plurality of fixed subcodebooks for encoding at different rates as a function of the characterization. The encoding system 12 receives a speech signal 18 from a signal input device such as a microphone (not shown). The speech coding system 10 includes four codecs: a full-rate codec 22, a half-rate codec 24, a quarter-rate codec 26 and an eighth-rate codec 28. There may be more or fewer codecs. Each codec has an encoder portion and a decoder portion located within the encoding and decoding systems 12 and 16, respectively. Each codec 22, 24, 26, and 28 may process a portion of the bitstream between the encoding system 12 and the decoding system 16. Desirably, the decoded speech is also post-processed by modules shown in later figures. The post-processed speech may be received by a human ear or by a recording device, or other device capable of receiving or using such a signal.

Each codec generates a bitstream of a different bandwidth. In one embodiment, the full-rate codec generates about 170 bits, the half-rate codec about 80 bits, the quarter-rate codec about 40 bits, and the eighth-rate codec about 16 bits per frame. The speech processing circuitry is constantly changing the codec used to code and decode speech. By processing the frames of the speech signal 18 with the various codecs, an average bit rate is achieved. The average bit rate of the bitstream may be calculated as an average of the codecs used in any particular interval of time. A mode line 21 carries a mode-input signal from a communications system. The mode-input signal controls the average rate of the encoding system 12, dictating which of a plurality of codecs is used within the encoding system 12.
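Because the average bit rate is simply the time average of the per-frame codec choices, it can be computed directly from a sequence of rate selections. The sketch below assumes 20 ms frames (160 samples at 8 kHz) and the approximate per-frame bit counts quoted above; the function and variable names are illustrative.

```python
def average_bit_rate(rate_sequence, frame_ms=20.0):
    """Average bit rate over an interval, given the per-frame rate choices.
    Bits per frame follow the approximate figures in the text; the 20 ms
    frame duration is an assumption (160 samples at 8 kHz)."""
    bits = {"full": 170, "half": 80, "quarter": 40, "eighth": 16}
    total_bits = sum(bits[r] for r in rate_sequence)
    total_seconds = len(rate_sequence) * frame_ms / 1000.0
    return total_bits / total_seconds  # bits per second

# e.g. a mix of half- and eighth-rate frames:
# average_bit_rate(["half", "eighth", "half", "half"])  -> 3200.0 bps
```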
In one embodiment of the speech compression system 10, the full- and half-rate codecs use an eX-CELP (extended CELP) algorithm. The eX-CELP algorithm categorizes frames into different categories using a rate selection and a type classification. The quarter- and eighth-rate codecs are based on a perceptual matching algorithm. Different encoding approaches may be used for different categories of frames, with different perceptual matching, different waveform matching, and different bit assignments. In this embodiment, the perceptual matching algorithms of the quarter-rate and eighth-rate codecs do not use waveform matching. The frames may be divided into a plurality of subframes. The subframes may be different in size and number for each codec. With respect to the eX-CELP algorithm, the subframes may be different in size for each classification. The CELP approach is used in eX-CELP to choose the adaptive codebook, the fixed codebook, and other parameters used to code the speech. The ABS scheme uses inverse prediction filters and perceptual weighting measures for selecting the codebook entries.
Fig. 4 is an expanded block diagram of the encoding system 12 shown in Fig. 3. One embodiment of the encoding system 12 includes a preprocessing module 34, a full-rate encoder 36, a half-rate encoder 38, a quarter-rate encoder 40, and an eighth-rate encoder 42, connected as illustrated. The pre-processing module 34 may be used to process speech on a frame basis to provide filtering, signal enhancement, noise enhancement, and amplification to optimize the signal for subsequent processing. The rate encoders include an initial frame-processing module 44 and an excitation-processing module 54. The initial frame-processing module 44 is divided into a plurality of initial frame processing modules, namely, an initial full-rate frame processing module 46, an initial half-rate frame processing module 48, an initial quarter-rate frame processing module 50, and an initial eighth-rate frame processing module 52. The full, half, quarter and eighth-rate encoders 36, 38, 40, and 42 comprise the encoding portion of the respective codecs 22, 24, 26, and 28. The initial frame-processing module 44 performs initial frame processing, extracts speech parameters, and determines which rate encoder will encode a particular frame. Module 44 determines a rate selection that activates one of the encoders 36, 38, 40, or 42. The rate selection may be based on the categorization of the frame of the speech signal 18 and the mode of the speech compression system. Activation of one of the rate encoders 36, 38, 40, or 42, correspondingly activates one of the initial frame-processing modules 46, 48, 50, or 52.
In addition to the rate selection, the initial frame-processing module 44 also determines a type classification for each frame that is processed by the full and half rate encoders 36 and 38. In one embodiment, the speech signal 18 as represented by one frame is classified as "type 0" or "type 1," depending on the nature and characteristics of the speech signal 18. In an alternative embodiment, additional classifications and supporting processing are provided. Type 1 classification includes frames of the speech signal 18 having harmonic and formant structures that do not change rapidly. Type 0 classification includes all other frames. The type classification optimizes encoding by the initial full-rate frame-processing module 46 and the initial half-rate frame-processing module 48. In addition, the classification type and rate selection are used to optimize the encoding by the excitation-processing module 54 for the full and half-rate encoders 36 and 38.
In one embodiment, the excitation-processing module 54 is sub-divided into a full-rate module 56, a half-rate module 58, a quarter-rate module 60, and an eighth-rate module 62. The rate modules 56, 58, 60, and 62 correspond to the rate encoders 36, 38, 40, and 42. The full and half rate modules 56 and 58 in one embodiment both include a plurality of frame processing modules and a plurality of subframe processing modules, but provide substantially different encoding. The term "F" indicates full rate processing, "H" indicates half-rate processing, and "0" and "1" indicate type 0 and type 1, respectively.
The initial frame-processing module 44 includes modules for full-rate frame processing 46 and half-rate frame processing 48. These modules may calculate an open loop pitch 144a for a full-rate frame, or an open loop pitch 176a for a half-rate frame. These components may be used later.
The full rate module 56 includes an F type selector module 68 and an F0 subframe-processing module 70. Module 56 also includes modules for F1 processing, including an F1 first frame processing module 72, an F1 subframe processing module 74, and an F1 second frame-processing module 76. In a similar manner, the half rate module 58 includes an H type selector module 78, an H0 sub-frame processing module 80, an H1 first frame processing module 82, an H1 sub-frame processing module 84, and an H1 second frame-processing module 86.
The selector modules 68 and 78 direct the processing of the speech signals 18 to further optimize the encoding process based on the type classification. When the frame being processed is classified as full rate, selector module 68 directs the speech signal to either the F0 or F1 processing to encode the speech and generate the bitstream. Type 0 classification for a frame activates the processing module to process the frame on a subframe basis. Type 1 processing proceeds on both a frame and subframe basis. In type 0 processing, a fixed codebook component 146a and a closed loop adaptive codebook component 144b are generated and are used to generate fixed and adaptive codebook gains 148a and 150a. In type 1 processing, an adaptive gain 148b is derived from the first frame-processing module 72, and a fixed codebook 146b is selected and used to encode the speech with the subframe-processing module 74. A fixed codebook gain 150b is derived from the second frame-processing module 76. Type signal 142 designates the type as either F0 or F1 in the bitstream.
If the frame of the speech signal is classified as half-rate, selector module 78 directs the frame to either H0 (type 0) or H1 (type 1) processing. The same classifications are made with respect to type 0 or type 1 processing. In type 0 processing, the H0 subframe processing module 80 generates a fixed codebook component 178a and a closed loop adaptive codebook component 176b, used to generate fixed and adaptive codebook gains 180a and 182a. In type 1 processing, an H1 first frame processing module 82, an H1 subframe processing module 84 and an H1 second frame processing module 86 are used. An adaptive gain 180b, a fixed codebook component 178b, and a fixed codebook gain are calculated. Type signal 174 designates the type as either H0 or H1 in the bitstream.
In a manner known to those skilled in the art, adaptive codebooks are then used to code the signal in the full rate and half rate codecs. An adaptive codebook search and selection for the full rate codec uses components 144a and 144b. These components are used to search, test, select and designate the location of a pitch lag from an adaptive codebook. In a similar manner, half-rate components 176a and 176b search, test, select and designate the location of the best pitch lag for the half-rate codec. These pitch lags are subsequently used to improve the quality of the encoded and decoded speech through fixed codebooks employing a plurality of fixed subcodebooks.

Fig. 5 is a block diagram depicting the structure of fixed codebooks and subcodebooks in one embodiment. The fixed codebook 160 for the F0 codec comprises three (different) subcodebooks, each of them having 5 pulses. The fixed codebook for the F1 codec is a single 8-pulse subcodebook 162. For the half-rate codec, the fixed codebook 178 comprises three subcodebooks for the H0: a 2-pulse subcodebook 192, a 3-pulse subcodebook 194, and a third subcodebook 196 with gaussian noise. In the H1 codec, the fixed codebook comprises a 2-pulse subcodebook 193, a 3-pulse subcodebook 195, and a 5-pulse subcodebook 197.
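The FIG. 5 structure is compact enough to state directly as data. The layout below is purely illustrative, recording only the pulse counts named in the text; sizes and bit allocations are omitted.

```python
# Illustrative layout of the fixed codebooks of FIG. 5 as data. Pulse counts
# follow the text; codebook sizes and bit allocations are not shown.
FIXED_CODEBOOKS = {
    "F0": [{"pulses": 5}, {"pulses": 5}, {"pulses": 5}],       # three 5-pulse subcodebooks 160
    "F1": [{"pulses": 8}],                                     # single 8-pulse subcodebook 162
    "H0": [{"pulses": 2}, {"pulses": 3}, {"gaussian": True}],  # subcodebooks 192, 194, 196
    "H1": [{"pulses": 2}, {"pulses": 3}, {"pulses": 5}],       # subcodebooks 193, 195, 197
}
```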
Fixed Codebook Encoding for Type 0 Frames

Fig. 6 comprises the F0 and H0 subframe processing modules 70 and 80, including an adaptive codebook section 362, a fixed codebook section 364, and a gain quantization section 366. The adaptive codebook section 368 receives a pitch track 348 to calculate an area in the adaptive codebook to search for an adaptive codebook vector (va) 382 (a pitch lag). The adaptive codebook section 368 also performs a search to determine and store the best lag vector va for each subframe, together with an adaptive gain, ga 384. FIG. 6 depicts the fixed codebook section 364, including a fixed codebook 390, a multiplier 392, a synthesis filter 394, a perceptual weighting filter 396, a subtractor 398, and a minimization module 400. The gain quantization section 366 may include a 2D VQ gain codebook 412, a first multiplier 414, a second multiplier 416, an adder 418, a synthesis filter 420, a perceptual weighting filter 422, a subtractor 424 and a minimization module 426. The gain quantization section 366 makes use of the second resynthesized speech 406 generated in the fixed codebook section, and also generates a third resynthesized speech 438.
The fixed codebook 390 provides a fixed codebook vector (vc) 402 representing the long-term residual for a subframe. The multiplier 392 multiplies the fixed codebook vector (vc) 402 by a gain (gc) 404. The gain (gc) 404 is unquantized and is a representation of the initial value of the fixed codebook gain. The resulting signal is provided to the synthesis filter 394. The synthesis filter 394 receives the quantized LPC coefficients Aq(z) 342 and, together with the perceptual weighting filter 396, creates a resynthesized speech signal 406. The subtractor 398 subtracts the resynthesized speech signal 406 from the long-term error signal 388 to generate a fixed codebook error signal 408, which is a weighted mean square error (WMSE).
The minimization module 400 receives the fixed codebook error signal 408. The minimization module 400 uses the fixed codebook error signal 408 to control the selection of vectors for the fixed codebook vector (vc) 402 from the fixed codebook 390 in order to reduce the error. The minimization module 400 also receives the control information 356 that may include a final characterization for each frame.
The final characterization class contained in the control information 356 controls how the minimization module 400 selects vectors for the fixed codebook vector (vc) 402 from the fixed codebook 390. The process repeats until the search by the second minimization module 400 has selected the best vector for the fixed codebook vector (vc) 402 from the fixed codebook 390 for each subframe. The best vector for the fixed codebook vector (vc) 402 minimizes the error in the second resynthesized speech signal 406. The indices identify the best vector for the fixed codebook vector (vc) 402 and, as previously discussed, may be used to form the fixed codebook components 146a and 178a.

Weighting Factors in Selecting a Fixed Subcodebook and a Codevector
Low-bit-rate coding uses the important concept of perceptual weighting in determining how speech is coded. A special weighting factor, different from the factor previously described for the perceptual weighting filter in the closed-loop analysis, is introduced here. This special weighting factor is generated by employing certain features of the speech, and is applied to the criterion value to favor a specific subcodebook within a codebook featuring a plurality of subcodebooks. One subcodebook may be preferred over the other subcodebooks for some specific speech signal, such as noise-like unvoiced speech. The features used to estimate the weighting factor include, but are not limited to, the noise-to-signal ratio (NSR), the sharpness of the speech, the pitch lag, and the pitch correlation, as well as other features. The classification system for each frame of speech is also important in defining the features of the speech.
The NSR is a traditional distortion criterion that may be calculated as the ratio between an estimate of the background noise energy and the frame energy of a frame. One embodiment of the NSR calculation ensures that only true background noise is included in the ratio by using a modified voice activity decision. In addition, previously calculated parameters representing, for example, the spectrum expressed by the reflection coefficients, the pitch correlation Rp, the NSR, the energy of the frame, the energy of the previous frames, the residual sharpness and the sharpness may also be used. Sharpness is defined as the ratio of the average of the absolute values of the samples to the maximum of the absolute values of the samples of speech. It is typically applied to the amplitude of the signals.
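Both features have direct one-line realizations. The sketch below computes sharpness and NSR exactly as defined above; the guard against an all-zero frame is an added assumption.

```python
import numpy as np

def sharpness(x):
    """Sharpness as defined above: the mean absolute sample value divided by
    the maximum absolute sample value. Values near 1 look noise-like; small
    values indicate a few dominant pulses. The epsilon guard is an added
    assumption for silent frames."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.mean(x) / max(np.max(x), 1e-12)

def nsr(background_noise_energy, frame_energy):
    """Noise-to-signal ratio: estimated background noise energy over the
    frame energy. The guard against a zero-energy frame is an assumption."""
    return background_noise_energy / max(frame_energy, 1e-12)
```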
Pitch Correlation

One embodiment of the target signal for time warping is a synthesis of the current segment derived from the modified weighted speech, represented by $s_w'(n)$, and the pitch track 348, represented by $L_p(n)$. According to the pitch track 348, $L_p(n)$, each sample value of the target signal $\hat{s}_w(n)$, $n = 0, \ldots, N_s - 1$, may be obtained by interpolation of the modified weighted speech using a 21st-order Hamming weighted sinc window:

$$\hat{s}_w(n) = \sum_{i=-10}^{10} w_s\big(f(L_p(n)), i\big) \cdot s_w'\big(n - I(L_p(n)) + i\big), \quad n = 0, \ldots, N_s - 1, \qquad \text{(Equation 1)}$$

where $I(L_p(n))$ and $f(L_p(n))$ are the integer and fractional parts of the pitch lag, respectively; $w_s(f, i)$ is the Hamming weighted sinc window; and $N_s$ is the length of the segment. A weighted target is given by $\hat{s}_w^t(n) = w_e(n) \cdot \hat{s}_w(n)$. The weighting function, $w_e(n)$, may be a two-piece linear function, which emphasizes the pitch complex and de-emphasizes the "noise" in between pitch complexes. The weighting may be adapted according to a classification, by increasing the emphasis on the pitch complex for segments of higher periodicity.
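Equation 1 amounts to a 21-tap fractional-delay interpolation. A minimal sketch follows, assuming the Hamming-weighted sinc window is the product of a 21-point Hamming window and sinc(i - f); the codec's exact window normalization is not specified here.

```python
import numpy as np

def interpolate_target(sw_mod, n0, lag_int, lag_frac, taps=10):
    """One sample of Equation 1: interpolate the modified weighted speech at
    a fractional pitch lag with a 21-tap Hamming-weighted sinc window.
    Assumes n0 - lag_int - taps >= 0 so all indexed samples exist; the exact
    window normalization used by the codec is an assumption."""
    window = np.hamming(2 * taps + 1)                    # 21-point Hamming window
    total = 0.0
    for i in range(-taps, taps + 1):
        w_s = window[i + taps] * np.sinc(i - lag_frac)   # w_s(f, i)
        total += w_s * sw_mod[n0 - lag_int + i]
    return total
```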
Signal Warping

The modified weighted speech for the segment may be reconstructed according to the mappings

$$[s_w(n + \tau_{acc}),\; s_w(n + \tau_{acc} + \tau_c + \tau_{opt})] \rightarrow [s_w'(n),\; s_w'(n + \tau_c - 1)] \qquad \text{(Equation 2)}$$

and

$$[s_w(n + \tau_{acc} + \tau_c + \tau_{opt}),\; s_w(n + \tau_{acc} + \tau_{opt} + N_s - 1)] \rightarrow [s_w'(n + \tau_c),\; s_w'(n + N_s - 1)], \qquad \text{(Equation 3)}$$

where $\tau_c$ is a parameter defining the warping function. In general, $\tau_c$ specifies the beginning of the pitch complex. The mapping given by Equation 2 specifies a time warping, and the mapping given by Equation 3 specifies a time shift (no warping). Both may be carried out using a Hamming weighted sinc window function.
Pitch Gain and Pitch Correlation Estimation
The pitch gain and pitch correlation may be estimated on a pitch cycle basis and are defined by Equations 4 and 5, respectively. The pitch gain is estimated in order to minimize the mean squared error between the target $\hat{s}_w(n)$, defined by Equation 1, and the final modified signal $s_w'(n)$, defined by Equations 2 and 3, and may be given by

$$g_a = \frac{\sum_{n=0}^{N_s - 1} \hat{s}_w(n) \cdot s_w'(n)}{\sum_{n=0}^{N_s - 1} s_w'(n) \cdot s_w'(n)}. \qquad \text{(Equation 4)}$$

The pitch gain is provided to the excitation-processing module 54 as the unquantized pitch gains. The pitch correlation may be given by

$$R_p = \frac{\sum_{n=0}^{N_s - 1} \hat{s}_w(n) \cdot s_w'(n)}{\sqrt{\left(\sum_{n=0}^{N_s - 1} \hat{s}_w(n)^2\right)\left(\sum_{n=0}^{N_s - 1} s_w'(n)^2\right)}}. \qquad \text{(Equation 5)}$$
Both parameters are available on a pitch cycle basis and may be linearly interpolated.
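Equations 4 and 5 reduce to a pair of inner products, which the sketch below computes in one pass; the array-based names are illustrative.

```python
import numpy as np

def pitch_gain_and_correlation(target, modified):
    """Equations 4 and 5 as reconstructed above: the pitch gain that
    minimizes the mean squared error between the interpolated target and
    the final modified signal (a least-squares ratio), and the normalized
    pitch correlation."""
    target = np.asarray(target, dtype=float)
    modified = np.asarray(modified, dtype=float)
    num = np.dot(target, modified)
    g_a = num / np.dot(modified, modified)
    r_p = num / np.sqrt(np.dot(target, target) * np.dot(modified, modified))
    return g_a, r_p
```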
Type 0 Fixed Codebook Search for the Full-Rate Codec
The fixed codebook component 146a for frames of Type 0 classification may represent each of four subframes of the full-rate codec 22 using the three different 5-pulse subcodebooks 160. When the search is initiated, vectors for the fixed codebook vector (vc) 402 within the fixed codebook 390 may be determined using the error signal 388, represented by:

$$t'(n) = t(n) - g_a \cdot \big(e(n - L_p^{opt}) * h(n)\big), \qquad \text{(Equation 6)}$$

where $t'(n)$ is a target for a fixed codebook search, $t(n)$ is an original target signal, $g_a$ is an adaptive gain, $e(n)$ is a past excitation used to generate an adaptive codebook contribution, $L_p^{opt}$ is an optimized lag, and $h(n)$ is an impulse response of a perceptually-weighted LPC synthesis filter.
Pitch enhancement may be applied to the 5-pulse codebooks 160 within the fixed codebook 390 in the forward direction or the backward direction during the search. The search is an iterative, controlled complexity search for the best vector from the fixed codebook 160. An initial value for the fixed codebook gain represented by the gain (gc) 404 may be found simultaneously with the search.
Figures 7 and 8 illustrate the procedure used to search for the best indices in the fixed codebook. In one embodiment, a fixed codebook has k subcodebooks. More or fewer subcodebooks may be used in other embodiments. In order to simplify the description of the iterative search procedure, the following example first features a single subcodebook containing N pulses. The possible locations of a pulse are defined by a plurality of positions on a track. In a first searching turn, the encoder processing circuitry searches the pulse positions sequentially from the first pulse 633 (PN = 1) to the next pulse 635, until the last pulse 637 (PN = N). For each pulse after the first, the searching of the current pulse position is conducted by considering the influence from the previously located pulses, that is, by minimizing the energy of the fixed subcodebook error signal 408. In a second searching turn, the encoder processing circuitry corrects each pulse position sequentially, again from the first pulse 639 to the last pulse 641, by considering the influence of all the other pulses. In subsequent turns, the functionality of the second searching turn is repeated, until the last turn is reached 643. Further turns may be utilized if the added complexity is allowed. This procedure is followed until k turns are completed 645 and a value is calculated for the subcodebook.
Fig. 8 is a flow chart for the method described in Fig. 7, as applied to searching a fixed codebook comprising a plurality of subcodebooks. A first turn is begun 651 by searching a first subcodebook 653 and then searching the other subcodebooks 655 in the same manner described for Fig. 7, keeping the best result 657, until the last subcodebook is searched 659. If desired, a second turn 661 or subsequent turns 663 may also be used, in an iterative fashion. In some embodiments, to minimize complexity and shorten the search, one of the subcodebooks in the fixed codebook is typically chosen after finishing the first searching turn, and further searching turns are done only with the chosen subcodebook. In other embodiments, one of the subcodebooks might be chosen only after the second searching turn or thereafter, should processing resources so permit. Computations of minimum complexity are desirable, especially since the enhancements described herein cause two or three times as many pulses to be calculated, rather than one.
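The turn-based search of FIGS. 7 and 8 can be skeletonized as nested loops over turns and pulses. The sketch below is a simplification: the scoring callback stands in for the weighted-error criterion, and during the first turn the not-yet-searched pulses are provisionally held at the first position of their tracks.

```python
def search_subcodebook(tracks, score, turns=2):
    """Skeleton of the iterative pulse search of FIGS. 7 and 8. `tracks` is
    a list of candidate position lists, one per pulse; `score(positions)`
    returns the criterion to maximize (equivalently, minus the weighted
    error energy). All names and the scoring callback are illustrative."""
    positions = [t[0] for t in tracks]            # provisional initial placement
    for _turn in range(turns):
        for p, track in enumerate(tracks):        # pulses searched sequentially
            best = max(track, key=lambda pos:
                       score(positions[:p] + [pos] + positions[p + 1:]))
            positions[p] = best                   # fix this pulse, given the others
    return positions, score(positions)
```

For a codebook with several subcodebooks, the same skeleton would be run per subcodebook during the first turn, keeping the best result, with later turns restricted to the chosen subcodebook as described above.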
In an example embodiment, the search for the best vector for the fixed codebook vector (vc) 402 is completed in each of the three 5-pulse codebooks 160. At the conclusion of the search process within each of the three 5-pulse codebooks 160, candidate best vectors for the fixed codebook vector (vc) 402 have been identified. Selection of which of the candidate best vectors from which of the 5-pulse codebooks 160 will be used may be determined by minimizing the corresponding fixed codebook error signal 408 for each of the three best vectors. For purposes of this discussion, the corresponding fixed codebook residual error 408 for each of the three candidate subcodebooks will be referred to as the first, second, and third fixed codebook error signals.
The minimization of the weighted mean square errors (WMSE) from the first, second and third fixed codebook error signals is mathematically equivalent to maximizing a criterion value, which may first be modified by multiplying it by a weighting factor in order to favor selecting one specific subcodebook. Within the full-rate codec 22 for frames classified as Type 0, the criterion value from the first, second and third fixed codebook error signals may be weighted by the subframe-based weighting measures. The weighting factor may be estimated using a sharpness measure of the residual signal, a voice-activity detection module, a noise-to-signal ratio (NSR), and a normalized pitch correlation. Other embodiments may use other weighting factor measures. Based on the weighting and on the maximal criterion value, one of the three 5-pulse fixed codebooks 160, and the best candidate vector in that subcodebook, may be selected.
The selected 5-pulse codebook 161, 163 or 165 may then be fine searched for a final decision of the best vector for the fixed codebook vector (vc) 402. The fine search is performed on the vectors in the selected 5-pulse codebook that are in the vicinity of the best candidate vector chosen. The indices that identify the best vector (maximal criterion value) from the fixed codebook are included in the bitstream to be transmitted to the decoder.
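Selecting among the subcodebook candidates is then an argmax over weighted criterion values, as sketched below with illustrative names.

```python
def select_subcodebook(candidates, weights):
    """Weighted selection among subcodebook search results, as described
    above: each candidate's criterion value is scaled by a perceptually
    motivated weighting factor, and the largest weighted value wins.
    `candidates` maps subcodebook index -> (criterion_value, best_vector);
    both arguments are illustrative stand-ins."""
    best_k = max(candidates, key=lambda k: weights[k] * candidates[k][0])
    return best_k, candidates[best_k][1]
```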
Encoding the pitch lag generates an adaptive codebook vector 382 (lag) and an adaptive codebook gain ga 384 for each subframe of type 0 processing. The lag is incorporated into the fixed codebook in one embodiment by applying the pitch enhancement differently for different subcodebooks, to increase excitation density. The pitch enhancement should be incorporated during the searches in the encoder, and the same pitch enhancement should be applied to the codevector from the fixed codebook in the decoder. For every vector found in the fixed codebook, the density of the codevector may be increased by convolving it with an impulse response of the pitch enhancement. This impulse response always has a unit pulse at time 0 and includes additional pulses at +1 pitch lag, -1 pitch lag, +2 pitch lags, -2 pitch lags, and so on. The magnitudes of these additional pitch pulses are determined by a pitch enhancement coefficient, which may be different for different subcodebooks. For type 0 processing, the pitch enhancement coefficient is calculated according to the pitch gain, ga_m, from the previous subframe of the adaptive codebook section, multiplied by a factor that depends on the fixed subcodebook.
Examples of typical pitch enhancement coefficients are listed in Table 1. This table is typically used for the half-rate codec, although it could also be employed for the full-rate codec. The benefit from a more flexible pitch enhancement for the full-rate codec is less significant, because the full-rate excitation from a large fixed codebook with a short subframe size is already very rich. The coefficients for Type 1 will be explained below.
                   Type 0 (LTP type)         Type 1 (PP type)
  Subcodebook #1   0.5 < 0.75 ga_m < 1.0     0.5 < 0.75 ga < 1.0
  Subcodebook #2   0.0 < 0.25 ga_m < 0.5     0.0 < 0.50 ga < 0.5
  Subcodebook #3   0                         0.0 < 0.50 ga < 0.5

TABLE 1. Pitch Enhancement Coefficients
In one embodiment for F0 processing, the pitch enhancement coefficient for the whole fixed codebook could be the previous pitch gain ga_m multiplied by a factor of 0.75. The result may be limited to a value between 0.0 and 1.0. The above table may also be used to determine the pitch enhancement coefficients for different subcodebooks. The pitch enhancement coefficient for the first subcodebook may be the pitch gain of the previous subframe, ga_m, multiplied by 0.75, with the result limited to values between 0.5 and 1.0. Similarly, for F0 processing with a second subcodebook, the pitch enhancement coefficient could be limited to values such that 0.0 < 0.25 ga_m ≤ 0.5; the pitch enhancement coefficient could be zero for the third subcodebook.
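For type 0 processing, the coefficient computation thus reduces to a scale-and-clamp per subcodebook. The sketch below follows the Type 0 column of Table 1; treating the tabulated bounds as simple clamping limits is an assumption.

```python
def pitch_enhancement_coefficient(subcodebook, g_a_prev):
    """Type 0 pitch enhancement coefficient following Table 1: the previous
    subframe's pitch gain is scaled per subcodebook and clamped to the
    tabulated range. A minimal sketch; the codec's exact clamping behavior
    at the bounds is assumed."""
    if subcodebook == 1:
        return min(max(0.75 * g_a_prev, 0.5), 1.0)   # 0.5 < 0.75 ga_m < 1.0
    if subcodebook == 2:
        return min(max(0.25 * g_a_prev, 0.0), 0.5)   # 0.0 < 0.25 ga_m < 0.5
    return 0.0                                        # third subcodebook: none
```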
In the example of Fig. 9, speech is processed in frames of 160 samples with four subframes of 40 samples for F0. A pitch lag of 16 samples may be calculated and forwarded by an adaptive codebook contribution. The use of 16 samples is merely a convenience, and pitch lags are usually larger than 16. A fixed codebook in the same speech coder/decoder may be searched and a close match of one of the pulses from the fixed codebook found at sample 6. In this example, the fixed codebook generates a pulse at sample 6 and the pitch enhancement generates additional pulses at sample 22 and at sample 38. Because the pitch enhancement coefficient has been calculated according to available information, no additional bits need to be transmitted to capture the extra pulse density.
Fig. 9 illustrates a single pulse 902 at about location 6 (samples) generated by a fixed codebook. In one embodiment, shown in Fig. 10, a pitch enhancement adds pulses 904 and 906 in addition to the original pulse 902 from the fixed codebook. The additional pulses occur at intervals 910 of 16 samples, as shown in Fig. 11. This illustrates a pitch enhancement applied in a "forward" direction.
In another embodiment, the pitch enhancement may be applied in a "backward" direction. Fig. 12 illustrates a pulse 912 from a fixed codebook at 24 (samples). Using the previous example of a pitch lag of 16 samples, a pulse 916 is added in a forward direction at 40 (samples), as seen in Fig. 13. A pulse 914 is added in a backward direction at 8 (samples), calculated by subtracting 16 from 24. It has been found that speech coded with these enhancements sounds more natural and more similar to an original spoken voice. The fixed codebook pulses in this embodiment are processed as described and shown in the previous examples. In this example, a pitch enhancement coefficient is applied to the pitch pulses that are +1 or -1 pitch lag away from the main pulse.

Type 0 Fixed Codebook Search for the Half-Rate Codec

The fixed codebook component 178a for frames of Type 0 classification represents the fixed codebook contribution for each of the two subframes of the half-rate codec 24. The representation may be based on the pulse subcodebooks 192 and 194 and the gaussian subcodebook 196. The initial target for the fixed codebook gain represented by the gain (gc) 404 may be determined similarly to the full-rate codec 22. In addition, during the search for the fixed codebook vector (vc) 402 within the fixed codebook 390, the criterion value may be weighted similarly to the full-rate codec 22, from a perceptual point of view. In the half-rate codec 24, the weighting may be applied to favor selecting the best vector from the gaussian subcodebook 196 when the input reference signal is noise-like. The weighting helps determine the most suitable fixed subcodebook vector (vc) 402.

The pitch enhancement discussed for F0 processing applies also to the half-rate H0, which in one embodiment is processed in subframes of 80 samples. The pitch lags are derived in the same manner from the adaptive codebook, as is the pitch gain, ga 384. In H0 processing, as in F0 processing, a pitch gain from the previous subframe, ga_m, is used. In one embodiment, the pitch enhancement coefficient for the first subcodebook 192 is estimated by multiplying the pitch gain of the previous subframe by a factor of 0.75, where the resulting 0.75 ga_m is limited to values between 0.5 and 1.0. Similarly, for H0 processing with the second subcodebook, the pitch gain of the previous subframe is multiplied by 0.25, with the resulting 0.25 ga_m limited to values between 0.0 and 0.25.

An example is depicted in Figs. 14-16. For the H0 codec, 2-subframe processing is used, and in this example, an initial pulse from a subcodebook for the H0 codec is at about 44. This is shown in Fig. 14 as 922. Additional pulses introduced by the pitch enhancement are located at ±1 and ±2 pitch lags away from the initial pulse, or in this example, at 12, 28, 60 and 76, for a pitch lag of 16. This is depicted in Fig. 15, with pulses at ±1 pitch lag at 28 and 60 (926 and 928, respectively) and at ±2 pitch lags at 12 and 76 (924 and 930, respectively). Fig. 16 depicts a pitch enhancement coefficient of 0.5 applied once to the pulses 936 and 938. The coefficient is applied twice (0.5 to the second power, or 0.25) to the pulses 934 and 940.
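The enhancement itself can be sketched as spreading each codebook pulse by whole pitch lags, with the coefficient applied once per lag step. In the codec this is realized as a convolution with the enhancement impulse response, identically in the encoder and decoder; the function below is an illustrative stand-in, and the example at the end reproduces the FIGS. 14-16 numbers.

```python
import numpy as np

def apply_pitch_enhancement(code_vec, lag, coeff, max_lags=2):
    """Spread each fixed codebook pulse forward and backward by whole pitch
    lags, scaling by coeff once per lag step, as illustrated in FIGS. 14-16.
    A sketch only; the codec applies the equivalent enhancement impulse
    response during both the encoder search and decoding."""
    out = np.asarray(code_vec, dtype=float).copy()
    for n in np.nonzero(code_vec)[0]:
        for m in range(1, max_lags + 1):
            for pos in (n + m * lag, n - m * lag):
                if 0 <= pos < len(out):
                    out[pos] += (coeff ** m) * code_vec[n]
    return out

# Reproducing the H0 example: a unit pulse at 44 with lag 16 and coeff 0.5
# gains added pulses of 0.5 at 28 and 60 and of 0.25 at 12 and 76.
v = np.zeros(80)
v[44] = 1.0
enhanced = apply_pitch_enhancement(v, lag=16, coeff=0.5)
```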
The search for the best vector for the fixed codebook vector (vc) 402 is based on minimizing the energy of the fixed codebook error signal 408, as previously discussed. The search may first be performed on the 2-pulse subcodebook 192. The 3-pulse codebook 194 may be searched next, in several steps, where the current step may determine a starting point for the next step. Backward and forward pitch enhancement may be applied during the search and after the search in both pulse subcodebooks 192 and 194. The gaussian subcodebook 196 may be searched last, using a fast search routine based on two orthogonal basis vectors. The selection of one of the subcodebooks 192, 194 or 196 and the best vector (vc) 402 from the selected subcodebook may be performed in a manner similar to that used for the full-rate codec 22. The indices that identify the best fixed codebook vector (vc) 402 within the selected subcodebook are the fixed codebook component 178a in the bitstream. The unquantized initial values of the gains (ga) 384 and (gc) 404 may now be finalized based on the vectors for the adaptive codebook vector (va) 382 (lag) and the fixed codebook vector (vc) 402 previously determined. The gains are then jointly determined and quantized within the gain quantization section 366.

Fixed Codebook Encoding for Type 1 Frames
Referring now to Fig. 17, the F1 and H1 first frame processing modules 72 and 82 include a 3D/4D open loop VQ module 454. The F1 and H1 sub-frame processing modules 74 and 84 include the adaptive codebook 368, the fixed codebook 390, a first multiplier 456, a second multiplier 458, a first synthesis filter 460 and a second synthesis filter 462. In addition, the F1 and H1 sub-frame processing modules 74 and 84 include a first perceptual weighting filter 464, a second perceptual weighting filter 466, a first subtractor 468, a second subtractor 470, a first minimization module 472 and an energy adjustment module 474. The F1 and H1 second frame processing modules 76 and 86 include a third multiplier 476, a fourth multiplier 478, an adder 480, a third synthesis filter 482, a third perceptual weighting filter 484, a third subtractor 486, a buffering module 488, a second minimization module 490 and a 3D/4D VQ gain codebook 492.
The processing of frames classified as Type 1 within the excitation-processing module 54 provides processing on both a frame basis and a sub-frame basis. For purposes of brevity, the following discussion refers to the modules within the full rate codec 22. The modules in the half rate codec 24 function similarly unless otherwise noted. Quantization of the adaptive codebook gain by the F1 first frame-processing module 72 generates the adaptive gain component 148b. The F1 subframe processing module 74 and the F1 second frame processing module 76 operate to determine the fixed codebook vector and the corresponding fixed codebook gain, respectively, as previously set forth. The F1 subframe-processing module 74 uses the track tables to generate the fixed codebook component 146b as illustrated in FIG. 2.
The F1 second frame processing module 76 quantizes the fixed codebook gain to generate the fixed gain component 150b. In one embodiment, the full-rate codec 22 uses 10 bits for the quantization of 4 fixed codebook gains, and the half-rate codec 24 uses 8 bits for the quantization of the 3 fixed codebook gains. The quantization may be performed using moving average prediction.

First Frame Processing Module
The 3D/4D open loop VQ module 454 receives the unquantized pitch gains 352 from a pitch pre-processing module (not shown). The 3D/4D open loop VQ module 454 quantizes the unquantized pitch gains 352 to generate a quantized pitch gain (gk a) 496 representing the quantized pitch gains for each subframe, where k is the subframe number. In one embodiment, there are four subframes for the full-rate codec 22 and three subframes for the half-rate codec 24, which correspond to four quantized gains (g1 a, g2 a, g3 a, and g4 a) and three quantized gains (g1 a, g2 a, and g3 a), respectively. The index location of the quantized pitch gain (gk a) 496 within the pre-gain quantization table represents the adaptive gain component 148b for the full-rate codec 22 or the adaptive gain component 180b for the half-rate codec 24. The quantized pitch gain (gk a) 496 is provided to the F1 subframe-processing module 74 or the H1 subframe-processing module 84.

Sub-Frame Processing Module
Sub-Frame Processing Module

The F1 or H1 subframe-processing module 74 or 84 uses the pitch track 348 to identify an adaptive codebook vector (v_a^k) 498 representing the adaptive codebook contribution for each subframe, where k is the subframe number. In one embodiment, there are four subframes for the full-rate codec 22 and three subframes for the half-rate codec 24, corresponding to four vectors (v_a^1, v_a^2, v_a^3, and v_a^4) and three vectors (v_a^1, v_a^2, and v_a^3) for the adaptive codebook contribution, respectively.
The selected adaptive codebook vector (v_a^k) 498 and the quantized pitch gain (g_a^k) 496 are multiplied by the first multiplier 456. The first multiplier 456 generates a signal that is processed by the first synthesis filter 460 and the first perceptual weighting filter 464 to provide a first resynthesized speech signal 500. The first synthesis filter 460 receives the quantized LPC coefficients Aq(z) 342 from an LSF quantization module (not shown) as part of the processing. The first subtractor 468 subtracts the first resynthesized speech signal 500 from the modified weighted speech 350 provided by a pitch pre-processing module (not shown) to generate a long-term residual signal 502.
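That subtraction can be sketched in a few lines; the convolution with the weighted synthesis impulse response is a zero-state simplification, and all argument names are illustrative:

```python
import numpy as np

def long_term_residual(t_mod, va, ga, h_wsyn):
    """Subtract the filtered, scaled adaptive contribution from the
    modified weighted speech to obtain the long-term residual target.

    t_mod  -- modified weighted speech for the subframe
    va     -- adaptive codebook vector for the subframe
    ga     -- quantized pitch gain for the subframe
    h_wsyn -- impulse response of the weighted synthesis filter
    """
    contribution = ga * np.convolve(va, h_wsyn)[:len(t_mod)]
    return t_mod - contribution
```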
The F1 or H1 subframe-processing module 74 or 84 also performs a search for the fixed codebook contribution similar to that performed by the F0 and H0 subframe-processing modules 70 and 80. A fixed codebook vector (v_c^k) 504 that represents the long-term residual for a subframe is selected from the fixed codebook 390. The second multiplier 458 multiplies the fixed codebook vector (v_c^k) 504 by a gain (g_c^k) 506, where k is the subframe number as previously discussed. The gain (g_c^k) 506 is unquantized and represents the fixed codebook gain for each subframe. The resulting signal is processed by the second synthesis filter 462 and the second perceptual weighting filter 466 to generate a second resynthesized speech signal 508. The second resynthesized speech signal 508 is subtracted from the long-term residual signal 502 by the second subtractor 470 to produce a fixed codebook error signal 510.
The fixed codebook error signal 510 is received by the first minimization module 472 along with the control information 356. The first minimization module 472 operates in the same manner as the previously discussed second minimization module 400 illustrated in FIG. 6. The search process repeats until the first minimization module 472 has selected a fixed codebook vector (v_c^k) 504 from the fixed codebook 390 for each subframe, where the best vector is the one that minimizes the energy of the fixed codebook error signal 510. The indices that identify the best fixed codebook vector (v_c^k) 504 form the fixed codebook components 146b and 178b.
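A per-subframe version of that minimization might look like the following sketch, where synthesize again stands in for the cascade of the second synthesis filter 462 and the second perceptual weighting filter 466:

```python
import numpy as np

def select_fixed_vectors(subframe_targets, fixed_codebook, synthesize):
    """Per-subframe selection performed by the first minimization module:
    keep the fixed codebook vector whose filtered resynthesis leaves the
    least error energy against that subframe's target."""
    chosen = []
    for target in subframe_targets:
        best_vc = min(
            fixed_codebook,
            key=lambda vc: float(np.sum((target - synthesize(vc)) ** 2)),
        )
        chosen.append(best_vc)
    return chosen
```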
Type 1 Fixed Codebook Search for Full-Rate Codec

In one embodiment, the 8-pulse codebook 162, illustrated in FIG. 5, is used for each of the four subframes for frames of Type 1 by the full-rate codec 22. The target for the fixed codebook vector (v_c^k) 504 is the long-term residual signal 502. The long-term residual signal 502, represented by t'(n), is determined from the modified weighted speech 350, represented by t(n), with the adaptive codebook contribution from the initial frame-processing module 44 removed according to:

t'(n) = t(n) - g_a \cdot (v_a(n) * h(n)),   (Equation 7)

where

v_a(n) = \sum_{i=-10}^{10} w_s(f(L_p(n)), i) \cdot e(n - I(L_p(n)) + i),

and where t'(n) is the target for the fixed codebook search, g_a is the pitch gain, h(n) is the impulse response of the perceptually weighted synthesis filter, e(n) is the past excitation, I(L_p(n)) is the integer part of the pitch lag, f(L_p(n)) is the fractional part of the pitch lag, and w_s(f, i) is a Hamming-weighted sinc window.
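A direct, unoptimized rendering of Equation 7 is sketched below. It assumes a constant integer and fractional lag across the subframe, and that the lag exceeds the subframe length by at least the interpolation half-width so that only past excitation samples are read (via Python's negative indexing); ws is passed in as a callable for the Hamming-weighted sinc window.

```python
import numpy as np

def fixed_codebook_target(t, e, ga, h, lag_int, lag_frac, ws):
    """Compute t'(n) of Equation 7: remove the adaptive (pitch)
    contribution from the modified weighted speech t(n).

    t        -- modified weighted speech, one subframe of length L
    e        -- past excitation buffer; e[-k] is k samples in the past
    ga       -- pitch gain
    h        -- impulse response of the weighted synthesis filter
    lag_int  -- integer part I(Lp) of the pitch lag (assumed > L + 10)
    lag_frac -- fractional part f(Lp) of the pitch lag
    ws       -- callable ws(f, i): Hamming-weighted sinc window
    """
    L = len(t)
    va = np.zeros(L)
    for n in range(L):
        # Fractional-lag interpolation of the past excitation.
        va[n] = sum(ws(lag_frac, i) * e[n - lag_int + i]
                    for i in range(-10, 11))
    return t - ga * np.convolve(va, h)[:L]
```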
During the search for the fixed codebook vector (v_c^k) 504, pitch enhancement may be applied in the forward, or forward and backward, directions. In addition, the search procedure minimizes the fixed codebook error signal 510 using an iterative search procedure with controlled complexity to determine the best fixed codebook vector (v_c^k) 504. An initial fixed codebook gain, represented by the gain (g_c^k) 506, is determined during the search. The indices identify the best fixed codebook vector (v_c^k) 504 and form the fixed codebook component 146b, as previously discussed.
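The following sketch shows one plausible reading of forward and backward pitch enhancement on a sparse pulse vector: each pulse is echoed one pitch period away, scaled by an enhancement coefficient beta. The exact placement and filtering used by the codec are not reproduced here.

```python
import numpy as np

def pitch_enhance(vc, lag, beta, backward=True):
    """Echo each pulse of a fixed codebook vector one pitch period
    forward (and optionally backward), scaled by beta. Sketch only."""
    out = vc.astype(float).copy()
    L = len(vc)
    for n in range(L):
        if vc[n] != 0.0:
            if n + lag < L:                 # forward enhancement
                out[n + lag] += beta * vc[n]
            if backward and n - lag >= 0:   # backward enhancement
                out[n - lag] += beta * vc[n]
    return out
```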
Fixed Codebook Search for Half-Rate Codec

In one embodiment, the long-term residual is represented by an excitation from a fixed codebook using 13 bits for each of the three subframes for frames classified as Type 1 by the half-rate codec 24. The long-term residual signal 502 may be used as a target in a manner similar to the fixed codebook search in the full-rate codec 22. As in the fixed codebook search of the half-rate codec 24 for frames of Type 0, high-frequency noise injection, additional pulses determined by correlation in the previous subframe, and a weak short-term filter may be added to enhance the fixed codebook contribution connected to the second synthesis filter 462. In addition, forward, or forward and backward, pitch enhancement may also be applied.
For Type 1 processing, the adaptive codebook gain 496 calculated above is also used to estimate the pitch enhancement coefficients for the fixed subcodebooks. However, in one embodiment of Type 1 processing, the adaptive codebook gain of the current subframe, g_a, rather than that of the previous subframe, is used. In one embodiment, a full search is performed for the 2-pulse subcodebook 193, the 3-pulse subcodebook 195, and the 5-pulse subcodebook 197, as illustrated in FIG. 5. The best fixed codebook vector (v_c^k) 504 that minimizes the fixed codebook error signal 510 is selected to represent the long-term residual for each subframe. In addition, an initial fixed codebook gain, represented by the gain (g_c^k) 506, may be determined during the search, similar to the full-rate codec 22. The indices identify the fixed codebook vector (v_c^k) 504 and form the fixed codebook component 178b.
In one embodiment for H1 processing, the pitch enhancement coefficients for the different subcodebooks are also determined using Table 1. The pitch enhancement coefficient for the first subcodebook could be the pitch gain of the current subframe, g_a, limited to a value between 0.5 and 1.0. Similarly, for H1 processing with the second and third subcodebooks, the pitch enhancement coefficient could be 0.5 g_a, limited to a value between 0.0 and 0.5.
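Those two rules reduce to a small clamping function; the zero-based subcodebook indexing below is an assumption for illustration only.

```python
def pitch_enhancement_coefficient(subcodebook, ga):
    """Coefficient choice sketched from the rules above.

    subcodebook -- 0 for the first (2-pulse) subcodebook,
                   1 or 2 for the second and third subcodebooks
    ga          -- pitch gain of the current subframe
    """
    if subcodebook == 0:
        return min(max(ga, 0.5), 1.0)        # clamp ga to [0.5, 1.0]
    return min(max(0.5 * ga, 0.0), 0.5)      # clamp 0.5*ga to [0.0, 0.5]
```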
As previously discussed, the F1 or H1 subframe-processing module 74 or 84 operates on a subframe basis. However, the F1 or H1 second frame-processing module 76 or 86 operates on a frame basis. Accordingly, parameters determined by the F1 or H1 subframe-processing module 74 or 84 are stored in the buffering module 488 for later use on a frame basis. In one embodiment, the parameters stored are the adaptive codebook vector (v_a^k) 498, the fixed codebook vector (v_c^k) 504, a modified target signal 512, and the gains (g_a^k) 496 and (g_c^k) 506 representing the initial adaptive and fixed codebook gains.
Using the buffered vectors and pitch gains, the fixed codebook gains (g_c^k) 506 are determined by vector quantization (VQ) and replace the unquantized initial fixed codebook gains determined previously. To determine the fixed codebook gains, a joint delayed quantization of the fixed codebook gains for each subframe is performed by the second frame-processing modules 76 and 86. As shown in FIG. 17, the F1 and H1 subframe-processing modules 74 and 84 each use the provided pitch track to identify the pitch vector (v_a^k) 498, which, together with the pitch gain, represents the long-term prediction contribution for each subframe, where k is the subframe number. In one embodiment, there are four subframes for the full-rate codec 22 and three subframes for the half-rate codec 24.
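The delayed joint VQ can be sketched as an exhaustive search over the rows of a gain codebook, scoring each candidate against the buffered per-subframe targets and filtered contributions; all argument names are illustrative stand-ins for the buffered quantities described above.

```python
import numpy as np

def joint_fixed_gain_vq(gain_codebook, targets, syn_a, syn_c, ga):
    """Delayed joint VQ of the fixed codebook gains for a whole frame.

    gain_codebook -- (N, K) table of candidate fixed gains per subframe
    targets       -- list of K subframe target signals
    syn_a, syn_c  -- lists of K filtered adaptive / fixed contributions
    ga            -- list of K quantized pitch gains
    """
    best_idx, best_err = 0, np.inf
    for idx, row in enumerate(gain_codebook):
        err = 0.0
        for k in range(len(targets)):
            recon = ga[k] * syn_a[k] + row[k] * syn_c[k]
            err += float(np.sum((targets[k] - recon) ** 2))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, gain_codebook[best_idx]
```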
Decoding System

Referring now to FIG. 18, a functional block diagram represents the full-rate and half-rate decoders 90 and 92 of FIG. 4. One embodiment of the decoding system 16 includes a full-rate decoder 90, a half-rate decoder 92, a quarter-rate decoder 94, an eighth-rate decoder 96, a synthesis filter module 98, and a post-processing module 100. The decoders are the decoding portions of the full, half, quarter, and eighth-rate codecs 22, 24, 26, and 28 shown in FIG. 2.
The decoders 90, 92, 94, and 96 receive the bitstream, as shown in FIG. 2, and transform the bitstream back into the parameters of the speech signal 18. The decoders decode each frame as a function of the rate selection and the classification. The rate selection is provided from the encoding system 12 to the decoding system 16 by an external signal in a control channel in a wireless communication system. The synthesis filter module 98 assembles the parameters of the speech signal 18 that are decoded by the decoders, thus generating reconstructed speech. The reconstructed speech is passed through the post-processing module 100 to create the post-processed synthesized speech 20. The post-processing module 100 can include filtering, signal enhancement, noise modification, amplification, tilt correction, and other similar techniques capable of improving the perceptual quality of the synthesized speech.
The decoders 90 and 92 perform inverse mapping of the components of the bitstream to algorithm parameters. The inverse mapping may be followed by a type-classification-dependent synthesis within the full and half-rate codecs 22 and 24.
The decoding for the quarter-rate codec 26 and the eighth-rate codec 28 is similar to that of the full and half-rate codecs. However, the quarter-rate and eighth-rate codecs use vectors of similar yet random numbers and an energy gain, rather than the adaptive codebook 368 and the fixed codebook 390. The random numbers and the energy gain may be used to reconstruct an excitation energy that represents the excitation of a frame. Excitation modules 120 and 124 may be used to generate portions of the quarter-rate and eighth-rate reconstructed speech, respectively. LSFs encoded during the encoding process may be used by LPC reconstruction modules 122 and 126 for the quarter-rate and eighth-rate reconstructed speech, respectively. Within the full and half-rate decoders 90 and 92, operation of the excitation modules 104, 106, 114, and 116 depends on the type classification provided by the type components 142 and 174, just as during encoding. The adaptive codebook 368 receives information reconstructed by the decoding system 16 from the adaptive codebook components 144 and 176 provided in the bitstream by the encoding system 12.
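A minimal sketch of that low-rate excitation reconstruction follows, assuming a shared seed so that encoder and decoder draw the same random vector; the actual codec's random generator and energy decoding are not reproduced here.

```python
import numpy as np

def reconstruct_noise_excitation(energy_gain, length, seed=0):
    """Rebuild a quarter/eighth-rate excitation from a random vector
    scaled to the decoded energy gain (sketch only)."""
    rng = np.random.default_rng(seed)   # encoder and decoder must agree on seed
    v = rng.standard_normal(length)
    v *= energy_gain / np.sqrt(np.mean(v ** 2))  # scale to the target RMS energy
    return v
```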
Depending on the type classification provided, the synthesis filter module 98 assembles the parameters of the speech signal 18 that are decoded by the decoders 90, 92, 94, and 96.
One embodiment of the full-rate decoder 90 includes an F-type selector 102 and a plurality of excitation reconstruction modules. The excitation reconstruction modules comprise an F0 excitation reconstruction module 104 and an F1 excitation reconstruction module 106. In addition, the full-rate decoder 90 includes an LPC reconstruction module 107. The LPC reconstruction module 107 comprises an F0 LPC reconstruction module 108 and an F1 LPC reconstruction module 110. The other speech parameters encoded by the full-rate encoder 36 are reconstructed by the decoder 90 to reconstruct speech.
Similarly, an embodiment of the half-rate decoder 92 includes an H-type selector 112 and a plurality of excitation reconstruction modules. The excitation reconstruction modules comprise an H0 excitation reconstruction module 114 and an H1 excitation reconstruction module 116. In addition, the half-rate decoder 92 comprises an H LPC reconstruction module 118. In a manner similar to that of the full-rate decoder 90, the other speech parameters encoded by the half-rate encoder 38 are reconstructed by the half-rate decoder to reconstruct speech.
The F and H type selectors 102 and 112 selectively activate appropriate portions of the full and half-rate decoders 90 and 92, respectively. A Type 0 classification activates the F0 excitation reconstruction module 104 or the H0 excitation reconstruction module 114. The respective F0 or F1 LPC reconstruction modules are then used to reconstruct the speech from the bitstream. The same process used to encode the speech is used in reverse to decode the signals, including the pitch lags, pitch gains, and any additional factors used, such as the coefficients described above.

While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible that are within the scope of this invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.

Claims

What is claimed is:

1. A method of pitch enhancement in a speech compression system using adaptive and fixed codebooks, comprising:
calculating a pitch enhancement coefficient;
providing a fixed codebook comprising at least two fixed subcodebooks; and
selecting a fixed subcodebook from among the at least two fixed subcodebooks, where the pitch enhancement coefficient is dependent on the fixed subcodebook selected.