WO2000057401A1 - Computation and quantization of voiced excitation pulse shapes in linear predictive coding of speech - Google Patents


Info

Publication number
WO2000057401A1
WO2000057401A1 PCT/CA2000/000287 CA0000287W WO0057401A1 WO 2000057401 A1 WO2000057401 A1 WO 2000057401A1 CA 0000287 W CA0000287 W CA 0000287W WO 0057401 A1 WO0057401 A1 WO 0057401A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
spectral magnitude
code book
deriving
pulse
Prior art date
Application number
PCT/CA2000/000287
Other languages
French (fr)
Inventor
Mohammad Aamir Husain
Bhaskar Bhattacharya
Original Assignee
Glenayre Electronics, Inc.
Application filed by Glenayre Electronics, Inc.
Priority to AU34110/00A (AU3411000A)
Publication of WO2000057401A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation

Definitions

  • the vector quantizer code book, Csm, (Figure 4, blocks 46, 48) is obtained by generating a very large training set of spectral magnitude vectors, Suq, obtained from a database of different speakers and sentences.
  • the code book, Csm, is obtained by means of the LBG algorithm (see Y. Linde, A. Buzo and R.M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Transactions on Communications, Vol. COM-28, pp. 84-95, January 1980).
  • any spectral magnitude vector can then be encoded by selecting a suitable vector from the code book.
  • Encoding the vector, Suq, (Figure 4, block 40) involves selecting a vector entry from the code book, Csm, that minimizes a specified error criterion.
  • the spectral magnitude index, ism, denotes the vector entry selected from the spectral magnitude code book, Csm.
  • a weighted mean square error criterion is used for the code book search.
  • the weighting function, wsm, used in the search procedure, is defined as follows:
  • the weighting values used in interpolation of the spectral magnitude vector can be obtained in any one of a number of ways well known to persons skilled in the art. The same is true of the weighting function, wsm, used in searching the code book, as described above in the section headed "Encoding Spectral Magnitude Vectors".
  • different mapping techniques can be used in the interpolation/decimation processes described above in relation to Figures 5 and 6.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention facilitates linear predictive coding of voiced speech sounds, with a single voiced excitation code book encompassing a wide range of different pitch periods. Unlike prior art techniques which represent voiced excitation using fixed pulse shapes with random variations added, the invention uses a single code book of representative frequency spectra to closely match the original unquantized pulse shape. In particular, the invention facilitates determination of a pulse shape vector v for a linear predictive speech coder from a voiced residue pulse u, during a sampling instant n characterized by a gain g and a pitch period p. A spectral magnitude vector Suq of dimension dsm is derived (38) to represent the frequency spectral magnitude of the pulse during the sampling instant. A code book Csm (46) containing a plurality of vectors representative of pre-selected spectral magnitude vectors is provided. A vector which provides a minimum error approximation to Suq is selected from the code book (40). ism is the spectral magnitude index, within the code book, of the selected minimum error approximation vector. A quantized spectral magnitude vector S having the spectral magnitude index ism and having dsm elements is then derived (42). A complex frequency spectrum signal X is derived from S and is converted to a complex time-domain representation x. The pulse shape vector v is then derived from the Real components of x.

Description

COMPUTATION AND QUANTIZATION OF VOICED
EXCITATION PULSE SHAPES IN LINEAR
PREDICTIVE CODING OF SPEECH
Technical Field
This invention is directed to linear predictive coding of voiced speech sounds. A single code book containing code words representative of different frequency spectra facilitates reconstruction of speech sounds, irrespective of pitch differences in such sounds.
Background
Linear Predictive Coding (LPC) of speech involves estimating the coefficients of a time varying filter (henceforth called a "synthesis filter") and providing appropriate excitation (input) to that time varying filter. The process is conventionally broken down into two steps known as encoding and decoding.
As shown in Figure 1, in the encoding step, the speech signal s is first filtered by pre-filter 10. The pre-filtered speech signal sp is then analyzed by LPC Analysis block 14 to compute the coefficients of the synthesis filter. Then, an "analysis filter" 12 is formed, using the same coefficients as the synthesis filter but having an inverse structure. The pre-filtered speech signal sp is processed by analysis filter 12 to produce an output signal u called the "residue". Information about the filter coefficients and the residue is passed to the decoder for use in the decoding step.
In the decoding step, a synthesis filter 18 is formed using the coefficients obtained from the encoder. An appropriate excitation signal e is applied to synthesis filter 18 by excitation generator 16, based on the information about the residue obtained from the encoder. Synthesis filter 18 outputs a synthetic speech signal y, which is ideally the closest possible approximation to the original speech signal s.
The present invention pertains to excitation generator 16 and to the way in which information about the residue passes from the encoder to the decoder. Analysis filter 12 and synthesis filter 18 are exact inverses of each other. Therefore, if the residue signal u were applied directly to synthesis filter 18, the decoder would exactly reproduce the pre-filtered speech signal sp. In other words, if the precise residue signal u could be transferred from the encoder to the decoder, then the synthetic speech output signal y would be of very high quality (i.e. as good as the pre-filtered speech signal sp). However, bandwidth restrictions necessitate quantization of the residue signal u, which unavoidably distorts the excitation signal e and the resultant synthetic speech signal y.
Excitation generator 16 incorporates both a "voiced" excitation generator and an "unvoiced" excitation generator. The quantization process exploits structural differences between voiced and unvoiced components of the residue. The voiced residue is quasi-periodic, while the unvoiced residue is like a randomly varying signal. The present invention deals particularly with quantization of the voiced residue, and corresponding generation of voiced excitation in the decoder.
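The inverse relationship between the analysis and synthesis filters can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `analysis_filter` and `synthesis_filter` are hypothetical, the coefficients are made-up values, and the coefficients are held fixed rather than time-varying for simplicity.

```python
def analysis_filter(signal, coeffs):
    """All-zero (FIR) filter: subtract the linear prediction to get the residue."""
    residue = []
    for n, sample in enumerate(signal):
        prediction = sum(a * signal[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residue.append(sample - prediction)
    return residue


def synthesis_filter(excitation, coeffs):
    """All-pole (IIR) filter: add the prediction formed from past outputs."""
    output = []
    for n, e in enumerate(excitation):
        prediction = sum(a * output[n - k]
                         for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        output.append(e + prediction)
    return output


# Illustrative (assumed) predictor coefficients and pre-filtered signal.
coeffs = [1.2, -0.5]
s_p = [0.0, 1.0, 0.5, -0.3, -0.8, 0.1, 0.4]

u = analysis_filter(s_p, coeffs)   # residue
y = synthesis_filter(u, coeffs)    # reconstruction from the unquantized residue
```

Feeding the unquantized residue `u` back through the synthesis filter returns `s_p` up to floating-point rounding; any quantization applied to `u` before synthesis would show up as distortion in `y`.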
The voiced residue can be described in terms of three parameters for quantization purposes: pitch, pu; gain, g; and, the shape of a single cycle, called the pulse shape. Pitch refers to the periodicity of the signal and is equal to the distance between subsequent pulses in the residue signal u. Gain refers to the energy of the signal and is higher for a residue having higher energy. The pulse shape is the actual geometric shape of each pulse (a single cycle) in the voiced residue. A typical voiced residue signal is shown in Figure 2.
Prior art LPC coding techniques have quantized pitch and gain parameters, but have achieved only poor representation of pulse shapes. For example, early LPC coders used single unit impulses to represent pulse shape (Markel, J.D. and Gray, A.H. Jr., "A Linear Prediction Vocoder Simulation Based Upon the Autocorrelation Method", IEEE Trans. ASSP, Vol. 22, 1974, pp. 124-134); the LPC-10 government standard (U.S. Government Federal Standard 1015, 1977) represented each pulse by a fixed shape; and more recently, excitation pulse shapes have been represented as a sum of a fixed shape and random noise (McCree, A.V. and Barnwell III, T.P., "A Mixed Excitation LPC Vocoder Model for Low Bit Rate Speech Coding," IEEE Trans. on Speech and Audio Processing, Vol. 3, No. 4, July 1995, pp. 242-250). Pulse trains constructed from such restricted shapes provide a poor representation of the variations in pulse shapes observed in residual signals output by analysis filter 12, as is evident from the sample residue signal shown in Figure 2.
A common technique known in the art of speech coding is "vector quantization", in which a vector of samples (e.g. a signal segment) is represented as one of a predetermined set of vectors called "code words". All of the code words are assembled to form a table called a "code book". The difficulty in using a standard vector quantization approach is that the pulse shapes required to be represented in LPC based speech coding are not of fixed length, but vary with pitch period. In principle, one could construct a plurality of code books, one for each possible value of pitch period, but this approach requires too many code books. It is impractical in many cases to use multiple code books due to memory limitations of the hardware in which speech encoding and decoding capabilities are preferably provided. For example, large integrated circuit memory chips have relatively high power consumption requirements which cannot be satisfied in small battery powered systems such as voice pagers, which must remain active for months between battery replacements.
This invention provides improved representation of pulse shapes in LPC coding of voiced speech, irrespective of pitch period variations, and requires only a single code book. The dashed line shown in Figure 1 represents the transfer of information about the residue from analysis filter 12 to excitation generator 16. Figure 3 depicts this transfer in greater detail in respect of the aforementioned pitch, gain and pulse shape parameters. However, the present invention focuses only on transfer of an improved pulse shape parameter in LPC coding of voiced speech sounds.
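The vector quantization step described in the Background can be sketched in a few lines of Python. This is an illustration only: the function name `nearest_code_word`, the plain squared-error criterion and the three-entry code book are assumptions, not values from the patent.

```python
def nearest_code_word(vector, code_book):
    """Return (index, code_word) of the entry with the smallest squared error."""
    def squared_error(code_word):
        return sum((x - c) ** 2 for x, c in zip(vector, code_word))

    index = min(range(len(code_book)), key=lambda i: squared_error(code_book[i]))
    return index, code_book[index]


# Toy code book: every code word has the same fixed length as the input vector.
code_book = [
    [0.0, 0.0, 0.0],
    [1.0, 0.5, 0.0],
    [0.2, 0.9, 0.4],
]
index, word = nearest_code_word([0.9, 0.6, 0.1], code_book)
```

Only `index` needs to be sent to the decoder, which holds the same code book. The sketch also shows why a time domain code book is awkward here: every code word must have the same length as the vector being coded, whereas pulse lengths change with the pitch period.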
Summary of Invention
The invention facilitates good quality LPC coding of voiced speech sounds through better quantization of excitation pulse shapes for all possible pitch periods. Unlike prior art techniques which use fixed shape excitation pulses, or excitation pulses formed by adding random noise to a fixed shape, the invention utilizes a novel frequency domain code book with code words representative of signal frequency spectra, to select a pulse shape that closely matches the original pulse shape from the residue signal. In particular, the invention provides a method of determining a pulse shape vector v for a linear predictive speech coder from a voiced residue pulse vuq, during a sampling instant n characterized by a gain g and a pitch period pu. A spectral magnitude vector Suq of dimension dsm is derived to represent the frequency spectral magnitude of the pulse during the sampling instant. A code book Csm, containing a plurality of vectors representative of pre-selected spectral magnitude vectors, is provided. A vector which provides a minimum error approximation to Suq is selected from the code book. ism is the spectral magnitude index, within the code book, of the selected minimum error approximation vector. A quantized spectral magnitude vector S having the spectral magnitude index ism and having dsm elements is then derived. A complex frequency spectrum signal X is derived from the quantized spectral magnitude vector S and the quantized pitch period p. This in turn is converted to a complex time domain representation x. The pulse shape vector v is then derived from the Real components of x.
Brief Description of Drawings
Figure 1 is a block diagram representation of a prior art LPC based speech encoder/decoder.
Figure 2 depicts a typical voiced residue signal waveform and the shapes of individual pulses found in typical voiced residue/excitation signals.
Figure 3 is a block diagram representation of the information pathway over which information respecting the voiced residue is transferred from the encoder to the decoder in the preferred embodiment of the invention.
Figure 4 is a block diagram representation showing further details of the pulse shape encoder and pulse shape decoder blocks depicted in Figure 3.
Figure 5 graphically depicts interpolation of a harmonics vector, in accordance with the invention, to produce a spectral magnitude vector for cases in which the dimension of the harmonics vector is less than the desired dimension of the spectral magnitude vector.
Figure 6 graphically depicts decimation of a harmonics vector, in accordance with the invention, to produce a spectral magnitude vector for cases in which the dimension of the harmonics vector exceeds the desired dimension of the spectral magnitude vector.
Description
Introduction
As previously explained, the pre-filtered signal, sp, (Figure 1) is obtained by passing the original speech signal, s, through a pre-processing filter 10. The residue, u, is obtained by passing the pre-filtered signal, sp, through a time-varying all-zero LPC analysis filter 12. The coefficients applied to filter 12 are obtained by LPC analyzer 14 using techniques which are well known to persons skilled in the art and need not be described here.
If, at any desired sampling (time) instant, n, the original speech signal s is classified as voiced (using techniques which are well known in the art), then a pulse-shape vector vuq is obtained as described below for that particular sampling instant. The energy at any sampling instant, n, is represented by a gain, g, corresponding to the root mean square value of the residue over a window (typically having a length of 80-160 samples) centred at the sampling instant, n. The pitch period at any sampling instant, n, as determined in the speech encoder, is denoted by pu and the quantized pitch at the speech decoder is denoted by p.
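The gain computation described above can be sketched as follows. The function name `rms_gain`, the default window of 160 samples (the upper end of the stated 80-160 range) and the truncation of the window at the signal edges are assumptions made for illustration; the sampling instant n is assumed to lie inside the residue.

```python
import math


def rms_gain(residue, n, window=160):
    """Root mean square of the residue over a window centred at sample n."""
    half = window // 2
    start = max(0, n - half)             # assumed: clip the window at the edges
    end = min(len(residue), n + half)
    segment = residue[start:end]
    return math.sqrt(sum(x * x for x in segment) / len(segment))


# Toy residue alternating between +3 and -3, so its RMS value is 3.
residue = [3.0, -3.0] * 100
g = rms_gain(residue, n=100)
```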
More particularly, as seen in Figure 3, voicing and gain analyzer 20 receives original speech signal s and residue u, and outputs signals representative of pitch period pu, gain g and pulse-shape vector vuq respectively. On the encoder side, pitch encoder 24 processes pitch period pu for further processing by pitch decoder 34 on the decoder side to yield quantized pitch p, which is in turn input to the decoder's voiced excitation generator 22. Pulse shape encoder 28 processes pulse-shape vector vuq for further processing by pulse shape decoder 30 to yield pulse shape vector v for input to voiced excitation generator 22. Gain encoder 26 processes the gain characteristic of the signal output by voicing and gain analyzer 20 for further processing by gain decoders 32, 36 which respectively yield the gain g for input to voiced excitation generator 22 (on the decoder side) and pulse shape encoder 28 (on the encoder side). The operation of pulse shape encoder 28 and pulse shape decoder 30 will now be described in further detail, with reference to Figure 4.
Computation of Spectral Magnitude Vectors
A spectral magnitude vector, Suq, is obtained (Figure 4, block 38) as follows. First, an unquantized time domain pulse shape vector, vuq, is determined as:
$$v_{uq}(j) = \frac{u\left(n - \left\lfloor (p_u - 1)/2 \right\rfloor + j\right)}{10^{g/20}}, \qquad j = 0, \ldots, p_u - 1$$
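As a rough illustration, the pulse extraction can be sketched in Python under the assumption that the definition reads v_uq(j) = u(n - floor((p_u - 1)/2) + j) / 10^(g/20) for j = 0, ..., p_u - 1, that is, one pitch period of residue centred on the sampling instant and normalised by a gain expressed in decibels. The function name `extract_pulse` and the test values are hypothetical.

```python
def extract_pulse(residue, n, pitch_period, gain_db):
    """Cut one pitch period of residue centred near n and normalise by the gain."""
    start = n - (pitch_period - 1) // 2
    if start < 0 or start + pitch_period > len(residue):
        raise ValueError("pulse window falls outside the residue signal")
    scale = 10.0 ** (gain_db / 20.0)     # assumed: g is a decibel value
    return [residue[start + j] / scale for j in range(pitch_period)]


# Toy residue 0, 1, 2, ...; a 20 dB gain corresponds to a scale factor of 10.
residue = [float(i) for i in range(20)]
pulse = extract_pulse(residue, n=10, pitch_period=5, gain_db=20.0)
```

The extracted vector always has exactly `pitch_period` samples, which is the variable length that the later spectral magnitude step maps onto a fixed dimension.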
A complex spectrum signal, Vuq, which is a complex vector of dimension, pu, is then obtained by taking a pu-point Discrete Fourier Transform (DFT) of vuq. A harmonics vector, Huq, of dimension, dh, is then obtained from Vuq. More particularly:
[Equation image imgf000010_0001 not reproduced in this text; it defines the elements of Huq, indexed up to dh.]
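The DFT step can be sketched with the Python standard library. Because the defining equation for Huq is not reproduced in this text, the sketch assumes, purely for illustration, that the harmonics are the magnitudes of DFT bins 1 to dh and that dh is half the pulse length (rounded down); both choices, and the function names `dft` and `harmonic_magnitudes`, are hypothetical.

```python
import cmath


def dft(samples):
    """Direct (O(N^2)) N-point Discrete Fourier Transform of a real sequence."""
    size = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * m / size)
            for m, x in enumerate(samples))
        for k in range(size)
    ]


def harmonic_magnitudes(pulse):
    """Assumed harmonics vector: magnitudes of DFT bins 1 .. len(pulse) // 2."""
    spectrum = dft(pulse)
    num_harmonics = len(pulse) // 2      # assumed value of d_h
    return [abs(spectrum[k]) for k in range(1, num_harmonics + 1)]


# A unit impulse has a flat spectrum, so every magnitude equals 1.
harmonics = harmonic_magnitudes([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

Whatever the exact definition, the length of the harmonics vector follows the pitch period, so it changes from pulse to pulse.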
The spectral magnitude vector, Suq, (of dimension, dsm=64, in the preferred embodiment of the invention), is obtained from the harmonics vector, Huq, by interpolation or decimation. Three cases must be considered, namely those in which dh=dsm, those in which dh<dsm, and those in which dh>dsm. If dh=dsm, then Suq is set equal to Huq. Note that the number of harmonics, dh, is related to pitch, is time variant, and varies with individual speakers, whereas dsm is fixed.
Figure 5 illustrates the interpolation process for the case dh<dsm, for representative values of dh=9 and dsm=14. The two end elements ss1, ss9 of a source sequence of dh elements (upper portion of Figure 5) are initially repositioned (central portion of Figure 5) to coincide with the end elements ts1, ts14 of the desired target sequence (lower portion of Figure 5). The source sequence elements are equi-spaced, as are the target sequence elements, although the spacings are of arbitrary size in each sequence. Then, the source sequence elements between the end points are copied to the nearest element positions in the target sequence. Thus, source sequence elements ss1, ss2, ss3, and ss4 depicted in the central portion of Figure 5 are copied to produce target sequence elements ts1, ts3, ts5, and ts6 respectively, as depicted in the lower portion of Figure 5. Since dh<dsm (i.e. 9 < 14), some empty positions, such as ts2 and ts4, remain in the target sequence. These empty positions are filled by inserting values obtained by interpolation between the closest adjacent target sequence values copied from the source sequence. Thus, the value inserted in empty position ts2 is obtained by interpolation between the previously copied target sequence elements ts1, ts3; and, the value inserted in empty position ts4 is obtained by interpolation between the previously copied target sequence elements ts3, ts5, etc.
Figure 6 illustrates the decimation process for the case dh>dsm, for representative values of dh=25 and dsm=8. The two end elements ss1, ss25 of the source sequence of dh elements (upper portion of Figure 6) are initially repositioned (central portion of Figure 6) to coincide with the end elements ts1, ts8 of the desired target sequence (lower portion of Figure 6). Then, the source sequence elements between the end points are copied to the nearest element positions in the target sequence. Since dh>dsm (i.e. 25 > 8), some target sequence positions (in the case illustrated, all target sequence positions) must receive copies of more than one of the source sequence elements. Thus, source sequence elements ss1, ss2, ss3 and ss4 depicted in the central portion of Figure 6 are all copied to produce target sequence element ts1; source sequence elements ss5, ss6 and ss7 are all copied to produce target sequence element ts2, etc. as depicted in the lower portion of Figure 6. If more than one source sequence element is copied to produce a single target sequence element as aforesaid, the value of the resultant single target sequence element is determined as a weighted average of the source sequence elements in question. For example, source sequence elements ss1, ss2, ss3 and ss4 are weighted to produce target sequence element ts1 as: ts1 = w1·ss1 + w2·ss2 + w3·ss3 + w4·ss4, where w1, w2, w3, w4 are weighting values which can be obtained in any one of a number of ways well known to persons skilled in the art. The interpolation/decimation operation of the preferred embodiment of the invention is expressed in pseudo-code as follows:
If dh < dsm, then
[pseudo-code image imgf000012_0001 not recovered]
end for
[pseudo-code image imgf000012_0002 not recovered]
end for
[pseudo-code image imgf000013_0001 not recovered]
Suq(jk) = weighted average {Huq(k), ..., Huq(k+i-1)}
end for
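The mapping described for Figures 5 and 6 can be sketched as a single Python routine. This is only an illustrative sketch: the name `resample_harmonics` is hypothetical, empty positions are filled by linear interpolation, colliding source elements are averaged with equal weights, and ties in "nearest position" are rounded upward. The text leaves the weights and the interpolation rule open, so each of these is an assumption, and the positions chosen by this rounding rule need not match those drawn in the figures.

```python
def resample_harmonics(h, d_sm):
    """Map a harmonics vector of any length onto d_sm target positions."""
    d_h = len(h)
    if d_h == d_sm:
        return list(h)
    if d_h == 1 or d_sm == 1:
        # Degenerate cases are not discussed in the text; use the mean.
        return [sum(h) / d_h] * d_sm
    # The end elements map to the end positions; every other source
    # element is dropped onto its nearest target position.
    buckets = [[] for _ in range(d_sm)]
    for i, value in enumerate(h):
        position = int(i * (d_sm - 1) / (d_h - 1) + 0.5)
        buckets[position].append(value)
    # Decimation: several sources on one position are averaged
    # (equal weights assumed here).
    target = [sum(b) / len(b) if b else None for b in buckets]
    # Interpolation: fill each gap from its nearest filled neighbours.
    filled = [k for k, value in enumerate(target) if value is not None]
    for left, right in zip(filled, filled[1:]):
        for k in range(left + 1, right):
            t = (k - left) / (right - left)
            target[k] = target[left] + t * (target[right] - target[left])
    return target

# Usage: stretch 2 harmonics to 5 positions, then shrink 9 to 3.
stretched = resample_harmonics([0, 10], 5)
shrunk = resample_harmonics([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
```

Because the first and last source elements always land on the first and last target positions, every empty position has a filled neighbour on each side, so the interpolation loop never runs off the ends.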
Spectral Magnitude Code book Training
The vector quantizer code book, Csm, (Figure 4, blocks 46, 48) is obtained by generating a very large training set of spectral magnitude vectors, Suq, obtained from a database of different speakers and sentences. After the training set vectors are obtained, the code book, Csm, is obtained by means of the LBG algorithm (see Y. Linde, A. Buzo and R.M. Gray, "An algorithm for Vector Quantizer Design", IEEE Transactions on Communications, Vol. COM-28, pp. 84-95, January 1980). Once the code book, Csm, has been obtained, any spectral magnitude vector can then be encoded by selecting a suitable vector from the code book.
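For orientation, the following Python sketch shows a simplified splitting variant of LBG training. It is an illustration under assumptions rather than the training procedure actually used: the function name `train_codebook` and the parameters `eps`, `tol` and `max_iter` are hypothetical, the distortion is plain squared error rather than any weighted measure, and the code book size is assumed to be a power of two.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def train_codebook(training, size, eps=0.01, tol=1e-6, max_iter=50):
    """Grow a code book by repeated splitting and Lloyd refinement."""
    codebook = [centroid(training)]
    while len(codebook) < size:
        # Split every code vector into a slightly perturbed pair.
        codebook = [[c + s for c in vec] for vec in codebook for s in (eps, -eps)]
        previous = float("inf")
        for _ in range(max_iter):
            # Partition the training set by nearest code vector.
            cells = [[] for _ in codebook]
            total = 0.0
            for vector in training:
                errors = [squared_error(vector, c) for c in codebook]
                best = min(range(len(codebook)), key=errors.__getitem__)
                cells[best].append(vector)
                total += errors[best]
            # Move each code vector to the centroid of its cell;
            # an empty cell keeps its previous code vector.
            codebook = [centroid(cell) if cell else old
                        for cell, old in zip(cells, codebook)]
            if previous - total <= tol * max(total, 1e-12):
                break
            previous = total
    return codebook

# Usage: two well separated clusters, code book of size 2.
training_set = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
toy_codebook = train_codebook(training_set, 2)
```

In the preferred embodiment the training vectors would have dimension 64 and the code book 256 entries; the toy data here only demonstrates the mechanics.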
Encoding Spectral Magnitude Vectors
The code book, Csm, consists of M vectors of dimension, dsm = 64. In the preferred embodiment, M = 256. Encoding the vector, Suq, (Figure 4, block 40) involves selecting a vector entry from the code book, Csm, that minimizes a specified error criterion. The spectral magnitude index, ism, denotes the vector entry selected from the spectral magnitude code book, Csm.
A weighted mean square error criterion is used for the code book search. The weighting function, wsm, used in the search procedure, is defined as follows:
[Equation images imgf000014_0001 to imgf000014_0004 not recovered. The surviving fragments indicate a piecewise definition of wsm(j) with a case conditioned on Suq(j) < 0.25, a case conditioned on j ≤ dsm/2, and an "otherwise" case.]
Given the weighting function, wsm, as indicated above, the code book search procedure is as follows:
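emin = [initial value garbled in the source]
for i = 1 to M
ei = sum over j = 1, ..., dsm of (Suq(j) - Csm(i,j))^2 wsm(j)
[remainder of loop body in image imgf000014_0005 not recovered]
end for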
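The search loop can be sketched in Python as follows. This is a minimal sketch, assuming the error for entry i is the weighted sum of squared differences given in claim 4 and that the entry with the smallest error is kept. The function name `search_codebook` is hypothetical, the weights are passed in by the caller because the definition of wsm is not recoverable from the text, and indices here are 0-based whereas the text counts from 1.

```python
def search_codebook(s_uq, codebook, weights):
    """Return (index, error) of the entry with minimum weighted error."""
    best_index, best_error = None, float("inf")
    for i, entry in enumerate(codebook):
        error = sum(w * (s - c) ** 2 for s, c, w in zip(s_uq, entry, weights))
        if error < best_error:
            best_index, best_error = i, error
    return best_index, best_error

# Usage: the same target vector selects different entries depending
# on which element the weights emphasise.
toy_book = [[0.0, 0.0], [2.0, 2.0]]
first_heavy = search_codebook([0.0, 2.0], toy_book, [1.0, 0.1])
second_heavy = search_codebook([0.0, 2.0], toy_book, [0.1, 1.0])
```

Starting the running minimum at infinity plays the role of the initial value in the pseudo-code; with a strict comparison the lowest index wins a tie.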
Decoding Spectral Magnitude Vectors
Given the index, ism, the quantized spectral magnitude vector, S, is obtained (Figure 4, block 42) by copying the (ism)th vector from the code book, Csm, as follows:
$$S(j) = C_{sm}(i_{sm}, j), \quad j = 1, \ldots, d_{sm}$$
Computation of Pulse-shape Vectors
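Pulse-shape vectors are computed (Figure 4, block 44) using the quantized pitch, p, not the unquantized pitch pu. More particularly, the dsm elements of the vector S are used in obtaining the complex spectrum signal, X = {X(j), j = 0, ..., 2dsm-1}, as follows:
$$\mathrm{Re}(X(0)) = 0.0$$
$$\mathrm{Re}(X(j)) = \mathrm{Re}(X(2d_{sm} - j)) = (p/2)\,S(j), \quad j = 1, \ldots, d_{sm}$$
$$\mathrm{Re}(X(d_{sm})) = 2 \cdot \mathrm{Re}(X(d_{sm}))$$
[Equation image imgf000015_0001 not recovered; the corresponding step (d) of claim 6 sets the imaginary parts Im(X(j)) to 0.0]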
Having obtained the complex spectrum signal, X, which is a complex vector of dimension 2dsm, the complex time-domain pulse signal, x, is obtained by taking a 2dsm-point Inverse Fast Fourier Transform (IFFT) of X.
The pulse shape vector, v, is then obtained (Figure 4, block 48) as follows:
[Equation image imgf000015_0002 not recovered]
$$v(j) = \mathrm{Re}(x(j)), \quad j = 0, \ldots, [p/2] - 1$$
$$v(p - j) = \mathrm{Re}(x(2d_{sm} - j)), \quad j = 1, \ldots, p - [p/2]$$
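The decoder-side reconstruction can be sketched in Python. The sketch rests on several assumptions that the text does not settle: the bracket [p/2] is read as floor(p/2), the inverse transform uses the common 1/N scaling, all imaginary parts of X are zero, and a naive inverse DFT stands in for the IFFT. The names `inverse_dft` and `pulse_shape` are hypothetical.

```python
import cmath

def inverse_dft(X):
    """Naive inverse DFT with 1/N scaling (assumed convention)."""
    size = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * m / size) for k in range(size)) / size
        for m in range(size)
    ]

def pulse_shape(S, p):
    """Build a p-sample pulse from a quantized spectral magnitude vector S."""
    d_sm = len(S)
    half = p // 2  # assumption: [p/2] means floor(p/2)
    if p - half > 2 * d_sm:
        raise ValueError("pitch period too long for this spectrum size")
    # Symmetric real spectrum: X(j) = X(2*d_sm - j) = (p/2) * S(j).
    X = [0j] * (2 * d_sm)
    for j in range(1, d_sm + 1):
        X[j] = X[2 * d_sm - j] = complex((p / 2) * S[j - 1])
    X[d_sm] = 2 * X[d_sm]  # the middle bin is doubled
    x = inverse_dft(X)
    # First half of the pulse from the start of x, the rest from its end.
    v = [0.0] * p
    for j in range(half):
        v[j] = x[j].real
    for j in range(1, p - half + 1):
        v[p - j] = x[2 * d_sm - j].real
    return v

# Usage: flat magnitude spectrum with d_sm = 4 and pitch period 6.
v = pulse_shape([1.0, 1.0, 1.0, 1.0], 6)
```

Because X is real and symmetric, x is real up to rounding error, and the resulting pulse is symmetric about its first sample when read circularly.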
As will be apparent to those skilled in the art in the light of the foregoing disclosure, many alterations and modifications are possible in the practice of this invention without departing from the spirit or scope thereof. For example, as noted above, the weighting values used in interpolation of the spectral magnitude vector (Figure 6) can be obtained in any one of a number of ways well known to persons skilled in the art. The same is true of the weighting function, wsm, used in searching the code book, as described above in the section headed "Encoding Spectral Magnitude Vectors". As a further example, different mapping techniques can be used in the interpolation/decimation processes described above in relation to Figures 5 and 6. Thus, instead of mapping the first element of the source sequence to the first element of the target sequence and the last element of the source sequence to the last element of the target sequence (which may not be very accurate, and may not yield good results for larger values of dsm) one could alternatively compute the frequencies corresponding to the first and the last element of the source sequence and map those source sequence elements to the target sequence elements having the nearest corresponding frequencies. This of course means that choice of an appropriate value for dsm is another source of variation on the algorithm. Accordingly, the scope of the invention is to be construed in accordance with the substance defined by the following claims.

Claims

WHAT IS CLAIMED IS:
1. A method of determining a pulse shape vector v for a linear predictive speech coder from a voiced residue pulse vuq, during a sampling instant n characterized by a gain g and a pitch period p, said method characterized by:
(a) deriving a dsm dimension spectral magnitude vector Suq representative of the frequency spectral magnitude of said pulse during said sampling instant;
(b) providing a code book Csm containing a plurality of vectors representative of pre-selected spectral magnitude vectors;
(c) selecting, from said code book, one of said plurality of vectors which provides a minimum error approximation to said spectral magnitude vector Suq, said selected minimum error approximation vector having a spectral magnitude index ism within said code book;
(d) deriving a quantized spectral magnitude vector S having said spectral magnitude index ism and having dsm elements;
(e) deriving a complex frequency spectrum signal X having real and imaginary components for each of said elements;
(f) converting said complex frequency spectrum signal X to a complex time-domain representation x; and,
(g) deriving said pulse shape vector v from the Real components of said complex time-domain representation x.
2. A method as defined in Claim 1, wherein said derivation of said spectral magnitude vector Suq further comprises:
(a) deriving an unquantized time domain pulse shape vector vuq, where:
[Equation image imgf000018_0001 not recovered]
for j = 0, ..., pu-1
(b) deriving a complex spectrum signal Vuq by taking a pu-point Discrete Fourier Transform of said unquantized time domain pulse shape vector vuq,
(c) deriving a harmonics vector Huq, where:
[Equation image imgf000018_0002 not recovered; the surviving fragment shows an index range ending at dh]
(d) interpolating said harmonics vector Huq to form said spectral magnitude vector Suq.
3. A method as defined in Claim 2, wherein said harmonics vector Huq has a dimension dh, and wherein said interpolating further comprises: (a) if dh=dsm, setting Suq equal to Huq, (b) if dh<dsm:
(i) copying a first element of Suq to a corresponding first element position of Huq; (ii) copying a last element of Suq to a corresponding last element position of Huq; (iii) for each intermediate element of Suq between said first element of Suq and said last element of Suq, copying said intermediate element to a closest corresponding intermediate element position of Huq, (iv) for any one of said intermediate element positions of Huq to which no intermediate element of Suq is copied, copying to said one intermediate element position of Huq a value derived by interpolation between a first element of Huq immediately adjacent and on a first side of said one intermediate element position of Huq and a second element of Huq immediately adjacent and on a second side of said one intermediate element position of Huq; (c) if dh>dsm,
(i) copying a first element of Suq to a corresponding first element position of Huq; (ii) copying a last element of Suq equal to a corresponding last element position of Huq; (iii) for each intermediate element of Suq between said first element of Suq and said last element of Suq, copying said intermediate element to a closest corresponding intermediate element position of Huq; and,
(iv) for any one of said intermediate element positions of Huq to which more than one intermediate element of Suq is to be copied, copying to said one intermediate element position of Huq a weighted average of all of said more than one intermediate elements of Suq.
4. A method as defined in Claim 1, wherein said selection of said minimum error approximation vector further comprises: (a) deriving a weighting function wsm, where
[Equation image imgf000020_0001 not recovered: definition of wsm(j)]
(b) deriving an error value ei for each vector in said code book Csm, where:
$$e_i = \sum_{j=1}^{d_{sm}} \left(S_{uq}(j) - C_{sm}(i, j)\right)^2 w_{sm}(j)$$
and i is said index of said vector within said code book Csm, and, (c) selecting that one of said plurality of vectors with index ism within said code book Csm for which e(ism) < e(i) for all i ≠ ism.
5. A method as defined in Claim 1, wherein said derivation of said quantized spectral magnitude vector S further comprises deriving S(j) = Csm(ism, j) for j = 1, ..., dsm.
6. A method as defined in Claim 1, wherein said derivation of said complex spectrum signal X further comprises: (a) setting Re(X(0)) = 0.0; (b) setting Re(X(j)) = Re(X(2dsm-j)) = (p/2)S(j) for j = 1, ..., dsm;
(c) setting Re(X(dsm)) = 2·Re(X(dsm)); and,
(d) setting Im(X(j)) = 0.0 for j = 1, ..., 2dsm-1.
7. A method as defined in Claim 6, wherein said conversion of said complex frequency spectrum signal X to said complex time-domain representation x further comprises deriving an inverse Fourier transform of X.
8. A method as defined in Claim 7, wherein said derivation of said pulse shape vector v from said Real components of said complex time-domain representation x further comprises:
(a) setting v(j) = 0.0 for j = 0, ..., p-1; (b) setting v(j) = Re(x(j)) for j = 0, ..., [p/2] - 1; and,
(c) setting v(p-j) = Re(x(2dsm-j)) for j = 1, ..., p - [p/2].
PCT/CA2000/000287 1999-03-24 2000-03-15 Computation and quantization of voiced excitation pulse shapes in linear predictive coding of speech WO2000057401A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU34110/00A AU3411000A (en) 1999-03-24 2000-03-15 Computation and quantization of voiced excitation pulse shapes in linear predictive coding of speech

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US27568299A 1999-03-24 1999-03-24
US09/275,682 1999-03-24

Publications (1)

Publication Number Publication Date
WO2000057401A1 true WO2000057401A1 (en) 2000-09-28

Family

ID=23053380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2000/000287 WO2000057401A1 (en) 1999-03-24 2000-03-15 Computation and quantization of voiced excitation pulse shapes in linear predictive coding of speech

Country Status (2)

Country Link
AU (1) AU3411000A (en)
WO (1) WO2000057401A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6750210B2 (en) 2000-08-05 2004-06-15 Smithkline Beecham Corporation Formulation containing novel anti-inflammatory androstane derivative
US6759398B2 (en) 2000-08-05 2004-07-06 Smithkline Beecham Corporation Anti-inflammatory androstane derivative
US6777399B2 (en) 2000-08-05 2004-08-17 Smithkline Beecham Corporation Anti-inflammatory androstane derivative compositions
US6777400B2 (en) 2000-08-05 2004-08-17 Smithkline Beecham Corporation Anti-inflammatory androstane derivative compositions
US6787532B2 (en) 2000-08-05 2004-09-07 Smithkline Beecham Corporation Formulation containing anti-inflammatory androstane derivatives
US6858596B2 (en) 2000-08-05 2005-02-22 Smithkline Beecham Corporation Formulation containing anti-inflammatory androstane derivative
US6858593B2 (en) 2000-08-05 2005-02-22 Smithkline Beecham Corporation Anti-inflammatory androstane derivative compositions
US7132532B2 (en) 2000-08-05 2006-11-07 Glaxo Group Limited Compounds useful in the manufacture of an anti-inflammatory androstane derivative
US11721349B2 (en) 2014-04-17 2023-08-08 Voiceage Evs Llc Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN J -H: "A CANDIDATE CODER FOR THE ITU-T'S NEW WIDEBAND SPEECH CODING STANDARD", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP),US,LOS ALAMITOS, IEEE COMP. SOC. PRESS, 1997, pages 1359 - 1362, XP000822708, ISBN: 0-8186-7920-4 *
GERLACH C G: "CELP speech coding with almost no codebook search", ICASSP-94. 1994 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (CAT. NO.94CH3387-8), PROCEEDINGS OF ICASSP '94. IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ADELAIDE, SA, AUSTRALIA, 19-22 APRIL 1, 1994, New York, NY, USA, IEEE, USA, pages II/109 - 12 vol.2, XP002140169, ISBN: 0-7803-1775-0 *
SKOGLUND J: "Analysis and quantization of glottal pulse shapes", SPEECH COMMUNICATION,NL,ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, vol. 24, no. 2, 1 May 1998 (1998-05-01), pages 133 - 152, XP004127156, ISSN: 0167-6393 *

Also Published As

Publication number Publication date
AU3411000A (en) 2000-10-09

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase