US20010001320A1 - Method and device for speech coding - Google Patents

Method and device for speech coding

Info

Publication number
US20010001320A1
Authority
US
United States
Prior art keywords
speech
zero
differ
vector elements
encoding
Prior art date
Legal status
Abandoned
Application number
US09/725,345
Inventor
Stefan Heinen
Wen Xu
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Publication of US20010001320A1

Classifications

    • H04L 1/0057: Block codes (forward error control for detecting or preventing errors in the received information)
    • G10L 19/005: Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L 19/10: Determination or coding of the excitation function, the excitation function being a multipulse excitation
    • G10L 19/18: Vocoders using multiple modes
    • G10L 19/002: Dynamic bit allocation
    • G10L 2019/0008: Algebraic codebooks

Abstract

In the novel speech coding method and device, speech signals are coded by a combination of speech parameters and excitation signals. The speech parameters or excitation signals are described with vectors. The vectors are formed by superposing at least two tracks, wherein at least one track has at least two vector elements different from zero. The algebraic signs of the vector elements that differ from zero are coded independently of one another and independently of the positions of the vector elements that differ from zero.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • 1. This is a continuation of copending International Application PCT/EP99/03766, filed May 31, 1999, which designated the United States.
  • BACKGROUND OF THE INVENTION
  • 2. 1. Field of the Invention
  • 3. The invention lies in the communications field. More specifically, the invention relates to methods and devices for speech coding in which the speech signal is encoded by a combination of speech parameters and excitation signals, in particular on the CELP (coded excited linear predictive) principle.
  • 4. Source signals or source information such as voice, sound, picture, and video signals almost always contain statistical redundancy, that is redundant information. This redundancy can be greatly reduced by source encoding, so that efficient transmission or storage of the source signal is made possible. This reduction in redundancy eliminates, prior to transmission, redundant signal contents that are based on the prior knowledge of, for example, statistical parameters of the signal variation. The bit rate of the source-encoded information is also referred to as the encoding rate or source bit rate. Following transmission, in the source decoding, these component parts are once more added to the signal, so that objectively and/or subjectively there is no ascertainable loss in quality.
  • 5. On the other hand, it is customary in signal transmission for redundancy to be specifically added again by channel encoding, in order to eliminate largely the influencing of the transmission by channel interference. Additional redundant bits enable the receiver or decoder to detect errors and possibly also correct them. The bit rate of the channel-encoded information is also referred to as the gross bit rate.
  • 6. To allow information, in particular speech data, picture data or other useful data, to be transmitted as efficiently as possible by means of the limited transmission capacities of a transmission medium, in particular an air interface, this information to be transmitted is consequently compressed prior to the transmission by a source encoding and protected against channel errors by a channel encoding. Different methods are known for these procedures. For example, in the GSM (Global System for Mobile Communication) system, speech can be encoded by means of a full rate speech codec, a half rate speech codec or an enhanced full rate speech codec (EFR).
  • 7. Within the scope of this description, a method of encoding and/or corresponding decoding, which may also comprise source encoding and/or channel encoding is also referred to as a speech codec.
  • 8. As part of the further development of the European GSM mobile radio standard and the development of new mobile radio systems which are based on a CDMA (code division multiple access) method, such as the UMTS (Universal Mobile Telecommunications System) in the process of being standardized, new methods are being developed for encoded speech transmission, making it possible for the entire data rate, and also the dividing of the data rate between the source encoding and channel encoding, to be set adaptively according to the channel state and network conditions (system load). Instead of the speech codecs described above, having a fixed source bit rate, new speech codecs, able to be operated in different codec modes, are to be used here, the codec modes differing with regard to their source bit rate (encoding rate).
  • 9. The main objects of such AMR (adaptive multirate) speech codecs with variable source bit rate or variable encoding rate are to achieve fixed network quality of the speech under different channel conditions and to ensure optimum distribution of the channel capacity with certain network parameters taken into account.
  • 10. 2. Summary of the Invention
  • 11. The object of the invention is to provide a method and a device for speech coding which overcome the above-noted deficiencies and disadvantages of the prior art devices and methods of this kind, and which make it possible to encode speech signals robustly to resist transmission errors and with relatively little expenditure.
  • 12. With the above and other objects in view there is provided, in accordance with the invention, a speech coding method, which comprises:
  • 13. coding a speech signal by a combination of speech parameters and excitation signals;
  • 14. describing the speech parameters or excitation signals by vectors;
  • 15. forming the vectors by superposing at least two tracks, wherein at least one track has at least two vector elements different from zero; and
  • 16. coding algebraic signs of the vector elements that differ from zero independently of one another and independently of the positions of the vector elements that differ from zero.
  • 17. In accordance with an added feature of the invention, the speech parameters or the excitation signals are formed from the speech signals on the CELP principle.
  • 18. In accordance with an additional feature of the invention, the positions of the vector elements that differ from zero are encoded together to form an index value.
  • 19. With the above and other objects in view there is also provided, in accordance with the invention, a device for speech coding. The device includes a processor unit receiving speech signals and being configured to:
  • 20. encode a speech signal by a combination of speech parameters and excitation signals;
  • 21. describe the speech parameters or excitation signals by vectors;
  • 22. form the vectors by superposing at least two tracks;
  • 23. define at least one track with at least two vector elements that differ from zero; and
  • 24. encode an algebraic sign of the vector elements that differ from zero independently of one another and independently of positions of the vector elements that differ from zero.
  • 25. In accordance with again an added feature of the invention, the processor unit is programmed to obtain the speech parameters and excitation signals from the speech signals on the CELP principle.
  • 26. In accordance with a concomitant feature of the invention, the processor unit is programmed to encode the positions of the vector elements that differ from zero together to form an index value.
  • 27. In other words, the invention is premised on the idea of encoding positions of certain predetermined vector elements which differ from zero and the algebraic signs of these vector elements separately from one another for the encoding of vectors for describing speech parameters or excitation signals.
  • 28. The problem is also solved by devices for speech coding in which a digital signal processor is in each case set up in such a way that positions of certain predetermined vector elements which differ from zero and the algebraic signs of these vector elements are encoded separately from one another for the encoding of vectors for describing speech parameters or excitation signals.
  • 29. The invention relates quite generally to methods for speech coding in which the speech signal is encoded by a combination of speech parameters and excitation signals. The speech parameters include parameters or characteristic variables of a statistical model on which the speech production is based, for example LPC or LTP filter coefficients, and the excitation signals are signals of the exciting processes of this model. These processes may be modeled either statistically or deterministically. Examples of statistical modeling are vocoder methods in which the excitation signals are generated by noise sources. In deterministic modeling, the excitation signals are obtained with the aid of the underlying model from the speech signal and are quantized. Examples of this are RPE/LTP (GSM Full Rate Codec), VSELP (GSM Half Rate Codec) and ACELP (GSM Enhanced Full Rate Codec).
  • 30. Other features which are considered as characteristic for the invention are set forth in the appended claims.
  • 31. Although the invention is illustrated and described herein as embodied in a method and device for speech coding, it is nevertheless not intended to be limited to the details shown, since various modifications and structural changes may be made therein without departing from the spirit of the invention and within the scope and range of equivalents of the claims.
  • 32. The construction and method of operation of the invention, however, together with additional objects and advantages thereof will be best understood from the following description of specific embodiments when read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • 33.FIG. 1 is a block diagram of essential elements in a telecommunications transmission chain;
  • 34.FIG. 2 is a block diagram of an AMR encoder based on the CELP principle; and
  • 35.FIG. 3 is a schematic block diagram of a processor unit.
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • 36. Referring now to the figures of the drawing in detail and first, particularly, to FIG. 1 thereof, there is seen a source Q, which generates source signals qs, which are compressed by a source encoder QE, such as the GSM full rate speech coder, to form symbol sequences comprising symbols. In parametric source encoding methods, the source signals qs generated by the source Q (for example speech) are divided into blocks (for example time frames) and these are processed separately. The source encoder QE generates quantized parameters (for example speech parameters or speech coefficients), which are also referred to hereafter as symbols of a symbol sequence, and which reflect the properties of the source in the current block in a certain way (for example spectrum of the speech in the form of filter coefficients, amplitude factors, excitation vectors). After the quantization, these symbols have a certain symbol value.
  • 37. The symbols of the symbol sequence or the corresponding symbol values are mapped by a binary mapping (allocation specification), which is frequently described as part of the source encoding QE, onto a sequence of binary code words, which in each case have a plurality of bit positions. If these binary code words are, for example, further processed one after the other as a sequence of binary code words, a sequence of source-encoded bit positions which may be embedded in a frame structure is produced. After source encoding carried out in this way, source bits or data bits db with a source bit rate (encoding rate) dependent on the type of source encoding are thus obtained in a structured form in the frame.
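As a minimal illustration of such a binary mapping (a sketch under assumed symbol values and code word lengths, not a mapping prescribed by the patent), quantized symbol values can be written as fixed-length code words and concatenated into the frame's source-encoded bit positions:

```python
def pack_symbols(symbols, bit_widths):
    """Map quantized symbol values onto a sequence of source-encoded bit
    positions (most significant bit first), concatenated as a simple frame."""
    assert len(symbols) == len(bit_widths)
    bits = []
    for value, width in zip(symbols, bit_widths):
        if not 0 <= value < (1 << width):
            raise ValueError("symbol value does not fit its code word")
        bits.extend((value >> shift) & 1 for shift in range(width - 1, -1, -1))
    return bits

# Example: three quantized parameters with 3-, 5- and 4-bit code words -> 12 data bits db.
frame_bits = pack_symbols([5, 19, 7], [3, 5, 4])
```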
  • 38. The term codebook is to be understood here as describing a table with all the quantization representatives. The entries of the table may be both scalar and vectorial quantities.
  • 39. Scalar codebooks can be used for example for the quantization of amplitude factors, since these are generally scalar quantities. Examples of the use of vectorial codebooks are the quantization of LSF (line spectrum frequencies) and the quantization of the stochastic excitation.
  • 40. Referring now to FIG. 2, there is shown a basic representation of a special variant of a source encoder. The exemplary embodiment is a speech coder, namely an AMR encoder based on a CELP (coded excited linear predictive) principle.
  • 41. The CELP principle concerns a method of analysis by synthesis. In this case, a filter structure obtained from the current portion of speech is excited by excitation vectors (code vectors) taken one after the other from a codebook. The output signal of the filter is compared by means of a suitable error criterion with the current portion of speech and the error-minimizing excitation vector is selected. The representation of the filter structure and the position number of the selected excitation vector are transmitted to the receiver.
  • 42. A specific variant of a CELP method uses an algebraic codebook, which is often also referred to as a sparse algebraic code. It is a multipulse codebook which is filled with binary (+/−1) or ternary pulses (0, +/−1). Within the excitation vectors, only a few positions are respectively occupied by pulses. After the selection of the positions, the entire vector is weighted with an amplitude factor. A codebook of this type has several advantages. On the one hand, it does not take up any storage space, since the positions allowed for the pulses are determined by an algebraic computing rule; on the other hand, it can be searched through very efficiently for the best pulse positions on account of the way it is structured.
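A rough sketch of such a sparse multipulse excitation vector (illustrative only; the pulse positions, signs, and gain below are made up, and the patent's algebraic computing rule for the allowed positions is not reproduced here):

```python
def build_excitation(length, pulses, gain):
    """Build a sparse multipulse excitation vector: `pulses` is a list of
    (position, sign) pairs with sign in {+1, -1}; all other samples stay 0.
    The whole vector is finally weighted with the amplitude factor `gain`."""
    vec = [0.0] * length
    for position, sign in pulses:
        vec[position] = float(sign)
    return [gain * x for x in vec]

# Two ternary pulses in a 40-sample subframe, weighted with an amplitude factor of 0.8.
excitation = build_excitation(40, [(5, +1), (23, -1)], gain=0.8)
```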
  • 43. A configurational variant of a conventional CELP encoder is first described below with reference to FIG. 2. A target signal to be approximated is reproduced by searching through two codebooks. A distinction is drawn here between an adaptive codebook (a2), the task of which is the reproduction of the harmonic speech components, and a stochastic codebook (a4), which serves for the synthesis of the speech components which cannot be obtained by prediction. The adaptive codebook (a2) changes according to the speech signal, while the stochastic codebook (a4) is invariant over time. The search for the best excitation code vectors takes place not by searching jointly, i.e. simultaneously, in the codebooks, as would be necessary for an optimum selection of the excitation code vectors, but, for expenditure-related reasons, by initially searching through the adaptive codebook (a2). Once the excitation code vector that is best according to the error criterion has been found, its contribution to the reconstructed target signal is subtracted from the target vector (target signal) and the part of the target signal still to be reconstructed by means of a vector from the stochastic codebook (a4) is obtained. The search in the individual codebooks takes place on the same principle. In both cases, the quotient of the square of the correlation of the filtered excitation code vector with the target vector and the energy of the filtered excitation code vector is computed for all the excitation code vectors. That excitation code vector which maximizes this quotient is regarded as the best excitation code vector, which minimizes the error criterion (a5). The preceding error weighting (a6) weights the error according to the characteristics of human hearing. The position of the found excitation code vector within the excitation codebook is transmitted to the decoder.
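The search criterion can be sketched as follows (a simplified brute-force loop, not the efficient structured search mentioned above; `synth_filter` stands in for the weighted synthesis filtering, which is assumed rather than taken from the patent):

```python
def search_codebook(codebook, target, synth_filter):
    """Analysis-by-synthesis search: return the index of the excitation code
    vector maximizing (correlation with target)^2 / (energy of filtered vector),
    together with the implicitly determined amplitude factor (gain)."""
    best_index, best_quotient, best_gain = -1, float("-inf"), 0.0
    for index, code_vector in enumerate(codebook):
        y = synth_filter(code_vector)                    # filtered excitation code vector
        corr = sum(t * yi for t, yi in zip(target, y))   # correlation with the target vector
        energy = sum(yi * yi for yi in y)                # energy of the filtered vector
        if energy == 0.0:
            continue
        quotient = corr * corr / energy
        if quotient > best_quotient:
            best_index, best_quotient, best_gain = index, quotient, corr / energy
    return best_index, best_gain
```

The by-product `best_gain = corr / energy` corresponds to the implicitly determined amplitude factor discussed in the next paragraph.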
  • 44. The computation of the afore-mentioned quotient has the effect of implicitly determining the correct (codebook) amplitude factor (amplification 1, amplification 2) for each excitation code vector. Once the best candidate has been determined from the two codebooks, the quality-reducing influence of the sequentially performed codebook search can be reduced by a joint optimization of the amplification. This involves re-specifying the original target vector and computing the best amplifications matching the excitation code vectors now selected. These amplifications usually differ slightly from those which were determined during the codebook search.
  • 45. In the case of the CELP principle, each candidate vector can be individually filtered (a3) and compared with the target signal for finding the best excitation code vector.
  • 46. Finally, filter parameters, amplitude factors, and excitation code vectors are converted into binary signals and, embedded in a fixed structure, are transmitted in frames. The filter parameters may be LPC (linear predictive coding) coefficients, LTP (long term prediction) indices, or LTP (long term prediction) amplitude factors.
  • 47. The LPC residual signal or excitation signal is vectorially quantized. To keep the ROM requirement small, an algebraic codebook is used. In other words, the group of quantization representatives is not explicitly present in a table, but instead the various representatives can be determined from an index value with the aid of an algebraic computing rule. This additionally has complexity-related advantages in the codebook search.
  • 48. The algebraic computing rule for the determination of the excitation code vectors from an index value uses a division of the vector space into so-called tracks. Within one track (vector), only components (vector elements) which lie on a track-specific grid can assume values which differ from zero.
  • 49. Positions which differ from zero are referred to as pulse positions. Depending on the codebook, the pulses may have either a binary (−1/+1) or ternary (−1/0/+1) value range. The superposing of individual tracks finally supplies the excitation code vector.
  • 50. This is explained on the basis of the following simple example: an algebraic codebook of dimension 20, formed from two tracks.
  • 51. The symbols are as follows:
  • 52. 0: allowed position in the track, no pulse
  • 53. #: unallowed position in the track
  • 54. +: positive pulse
  • 55. −: negative pulse
    Position:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
    Track1:    0  #  0  #  0  #  0  #  0  #  0  #  +  #  0  #  0  #  0  #
    Track2:    #  0  #  0  #  0  #  0  #  0  #  0  #  0  #  0  #  0  #  0
    Excit:     0  0  0  0  0  0  0  0  0  0  0  0  +  0  0  0  0  0  0  0
  • 56. To encode an entry of this codebook, accordingly one of 10 possible positions and the algebraic sign of the set pulse have to be encoded per track. A more accurate quantization of the excitation signal is achieved with codebooks which have more than one pulse per track, since then the superposing of two pulses is also possible.
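The two-track example above can be reproduced with a short sketch (the grid rule and helper names are illustrative assumptions; the patent's actual algebraic computing rule may differ):

```python
def track_grid(track, dimension=20, num_tracks=2):
    """Track-specific grid: track t may place pulses only at positions
    t, t + num_tracks, t + 2*num_tracks, ... (assumed interleaved grid)."""
    return list(range(track, dimension, num_tracks))

def superpose_tracks(track_pulses, dimension=20):
    """Superpose the tracks: each track contributes its pulses as (position, sign) pairs."""
    excitation = [0] * dimension
    for pulses in track_pulses:
        for position, sign in pulses:
            excitation[position] += sign
    return excitation

print(track_grid(0))                          # [0, 2, ..., 18]: the 10 allowed positions of track 1
print(superpose_tracks([[(12, +1)], []]))     # positive pulse at position 12, track 2 left empty
```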
  • 57. A track (vector) is consequently described by pulses (vector elements which differ from zero). It is possible for the pulses to be described by their position (vector element index) and their algebraic sign. The total set of possible pulse position combinations is described with the aid of an overall index.
  • 58. The code word length required for encoding the excitation vector found is made up of the number of bits required for encoding the pulse positions and algebraic signs. If only one pulse is set per track, not only the number of bits for the encoding of its position but also a further bit for the encoding of its algebraic sign are required.
  • 59. Efficient encoding of the algebraic signs with likewise only one bit for the case in which two pulses are set per track is already used in the GSM-EFR codec. Here, the information of the position sequence is utilized. If two pulses are located in the position grid of the same track, in the case in which both pulses have like algebraic signs, that pulse which assumes the lower position within the grid is encoded first. In the case of different algebraic signs, the pulse which occupies the higher position is encoded first. Only the sign bit of the pulse encoded first is transmitted. The algebraic sign of the second pulse is determined on the decoder side by analysis of the encoding sequence of the pulse positions.
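A toy sketch of this ordering rule (hypothetical helper functions, not the GSM-EFR reference implementation; the degenerate case of opposite-sign pulses at the same position is ignored):

```python
def encode_two_pulses(pos_a, sign_a, pos_b, sign_b):
    """Order the two pulse positions so that the second sign is implied:
    equal signs -> lower position first, different signs -> higher position first.
    Only the sign of the first-encoded pulse is transmitted."""
    if sign_a == sign_b:
        first, second = sorted([(pos_a, sign_a), (pos_b, sign_b)])
    else:
        first, second = sorted([(pos_a, sign_a), (pos_b, sign_b)], reverse=True)
    return first[0], second[0], 0 if first[1] > 0 else 1   # pos1, pos2, sign bit

def decode_two_pulses(pos1, pos2, sign_bit):
    """Recover both algebraic signs from the single sign bit and the position order."""
    sign1 = +1 if sign_bit == 0 else -1
    sign2 = sign1 if pos1 <= pos2 else -sign1
    return (pos1, sign1), (pos2, sign2)

# Round trip: pulses at 3 (+) and 9 (-) are sent as positions (9, 3) plus one sign bit.
print(decode_two_pulses(*encode_two_pulses(3, +1, 9, -1)))
```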
  • 60. This principle is to be illustrated on the basis of a codebook with three pulses per track. In this case, the sign information of the pulses of a track can also be encoded with a single bit, in that the principle of the algebraic sign encoding is extended by deliberate changing of the encoding sequence. This is shown by the following estimate: if one sign bit is transmitted, then with P_T = 3 pulses per track there remain N_VZ = 2^(P_T)/2 = 2^3/2 = 4 possible sign combinations to be conveyed by the encoding sequence.
  • 61. The number N_perm of possible permutations of the pulse positions is N_perm = P_T! = 3! = 6 > N_VZ.
  • 62. As long as the number of possible permutations N_perm is greater than the number of possible sign combinations N_VZ, the sign information of a track can be transmitted by deliberate changing of the encoding sequence of the pulse positions. If, for instance, a codebook which provides four pulses per track were drawn up, the permutation of the encoding sequence alone would be sufficient for the sign encoding; an additional sign bit would not be required.
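A quick numerical check of this estimate (assuming, as above, that one transmitted sign bit halves the number of sign combinations that the position order has to convey):

```python
from math import factorial

# For P_T pulses per track: with one transmitted sign bit, 2**(P_T - 1) sign
# combinations remain to be conveyed by permuting the encoding order; with no
# sign bit at all, 2**P_T combinations would have to be covered.
for p_t in range(1, 6):
    print(p_t, factorial(p_t) >= 2 ** (p_t - 1), factorial(p_t) >= 2 ** p_t)
# P_T = 3: 3! = 6 >= 4 (one sign bit suffices) but 6 < 8;
# P_T = 4: 4! = 24 >= 16, so no sign bit is required at all.
```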
  • 63. This efficient encoding of the sign information of an excitation vector is accomplished, however, at the expense of an increased susceptibility to interference on the transmission channel. In the worst case, the interference of a sign bit of a track causes the reversal of all the algebraic signs in this track. Similarly, the interference of the parameter of a pulse position may affect the algebraic signs of all the pulses of the same track.
  • 64. For this reason, the invention describes a significantly more robust sign encoding. In this case, the overall set of possible pulse positions is addressed by a suitable algebraic method with the aid of a single index. Independently of this, the algebraic signs of the pulses are respectively encoded with a bit.
  • 65. The improved method may be explained by the example of the algebraic codebook for the rate of 9.5 kbit/s. In the case of this codebook, two pulses are set at 14 possible positions. For the encoding scheme with permutation encoding, one bit is required for the first pulse sign and 4 bits are required for each of the two pulse positions, that is a total of 9 bits.
  • 66. The robust encoding method encodes the possible pulse positions independently of the algebraic signs. Since both pulses may also lie at the same position, these are combinations with repetition. It is known from combinatorics that in this case the number of possibilities is (14 + 2 - 1 choose 2) = (15 choose 2) = 105 < 2^7.
  • 67. Since this number is less than 2^7 = 128, seven bits are sufficient for the encoding of the positions. The two algebraic signs are respectively encoded with one bit. In this way, a decoupling of algebraic signs and pulse positions is achieved without increasing the bit rate required for the encoding of the excitation vectors.
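One possible way to realize such a joint position index is an enumerative ranking of the unordered position pair; the ranking below is an illustrative assumption, since the patent does not spell out the exact algebraic computing rule:

```python
def position_index(p1, p2, num_positions=14):
    """Enumerative index of an unordered pair of pulse positions with repetition
    allowed: 0 <= index < C(num_positions + 1, 2) = 105 < 2**7, so 7 bits suffice."""
    a, b = sorted((p1, p2))
    # Pairs are ranked by their smaller position first, then by the larger one.
    return sum(num_positions - k for k in range(a)) + (b - a)

def encode_track(p1, s1, p2, s2):
    """7-bit position index plus one independent sign bit per pulse: 9 bits in total."""
    index = position_index(p1, p2)
    sign_bits = (0 if s1 > 0 else 1, 0 if s2 > 0 else 1)
    return index, sign_bits

print(position_index(13, 13))      # 104, the largest of the 105 indices
print(encode_track(2, +1, 7, -1))  # (index, (0, 1))
```

A single bit error in one of the sign bits now affects only the corresponding pulse, while a corrupted position index leaves the algebraic signs untouched.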
  • 68. In simulations, individual bit positions of the codebook indices were in each case subjected to interference with a 100% error rate and the resulting speech SNR was measured after resynthesis. In these simulations, the more robust encoding reduced the error sensitivity of the algebraic signs by about 3 dB.
  • 69. In a configurational variant of the invention, the different encoding rates mean that different codec modes generally have different frame sizes, and therefore also different structures in which the bit positions serving for describing the filter parameters, amplitude factors or excitation code vectors are embedded.
  • 70. To realize a variable encoding rate, the changing of the encoding rate may be realized by a corresponding changing of the number of bit positions for describing an excitation code vector taken from a stochastic codebook. Switching over of the codec modes consequently leads, as a result of switching over to a stochastic codebook corresponding to the new codec mode, to the selection of excitation code vectors contained in this codebook, for the description of which a correspondingly changed number of bit positions is required. Consequently, there are different stochastic codebooks available according to the number of different codec modes.
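A hedged sketch of such mode-dependent codebook selection (the mode names and the second entry's figures are hypothetical; only the first entry reflects the 9.5 kbit/s example given above, with 7 position bits plus 2 sign bits per track):

```python
from dataclasses import dataclass

@dataclass
class CodebookConfig:
    """Per-mode configuration of the stochastic (algebraic) codebook."""
    pulses_per_track: int
    positions_per_track: int
    position_bits: int   # bits for the joint position index of a track
    sign_bits: int       # one independent bit per pulse

CODEC_MODES = {
    "mode_9.5": CodebookConfig(pulses_per_track=2, positions_per_track=14,
                               position_bits=7, sign_bits=2),
    "mode_low": CodebookConfig(pulses_per_track=1, positions_per_track=10,
                               position_bits=4, sign_bits=1),   # hypothetical low-rate mode
}

def bits_per_track(mode):
    """Number of bit positions needed to describe one track of the excitation code vector."""
    cfg = CODEC_MODES[mode]
    return cfg.position_bits + cfg.sign_bits

print(bits_per_track("mode_9.5"))   # 9
```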
  • 71. Referring now to FIG. 3, there is shown a processor unit PE, which may be contained in particular in a communication device, such as a base station BS or mobile station MS. The unit PE contains a control device STE, which essentially comprises a program-controlled microcontroller, and a processing device VE, which comprises a processor, in particular a digital signal processor, which can both gain writing and reading access to memory chips SPE.
  • 72. The microcontroller controls and monitors all the major elements and functions of a functional unit which includes the processor unit PE. The digital signal processor, part of the digital signal processor or a specific processor is responsible for carrying out the speech coding or speech decoding. The selection of a speech codec may also be performed by the microcontroller or the digital signal processor itself.
  • 73. An input/output interface I/O serves for the input/output of useful or control data, for example to a man-machine interface MMI, which may include a keyboard and/or a display. The individual elements of the processor unit may be connected to one another by a digital bus system BUS.
  • 74. It will be understood by those of skill in the pertinent art that, on the basis of the foregoing description, the invention can also apply to encoding methods other than the CELP encoding method explained in the application.

Claims (6)

We claim:
1. A speech coding method, which comprises:
coding a speech signal by a combination of speech parameters and excitation signals;
describing the speech parameters or excitation signals by vectors;
forming the vectors by superposing at least two tracks, wherein at least one track has at least two vector elements different from zero; and
coding algebraic signs of the vector elements that differ from zero independently of one another and independently of the positions of the vector elements that differ from zero.
2. The method according to claim 1, which comprises forming one of the speech parameters and the excitation signals from the speech signals on the CELP principle.
3. The method according to claim 1, which comprises coding the positions of the vector elements that differ from zero together to form an index value.
4. A device for speech coding, comprising an input for receiving a speech signal, and a processor unit connected to said input and configured to:
code a speech signal by a combination of speech parameters and excitation signals;
describe the speech parameters or excitation signals by vectors;
form the vectors by superposing at least two tracks;
define at least one track with at least two vector elements that differ from zero; and
code an algebraic sign of the vector elements that differ from zero independently of one another and independently of positions of the vector elements that differ from zero.
5. The device according to claim 4, wherein said processor unit is programmed to obtain the speech parameters and excitation signals from the speech signals on the CELP principle.
6. The device according to claim 4, wherein said processor unit is programmed to code the positions of the vector elements that differ from zero together to form an index value.
US09/725,345 1998-05-29 2000-11-29 Method and device for speech coding Abandoned US20010001320A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP98109868 1998-05-29
EP98109868.4 1998-05-29
PCT/EP1999/003766 WO1999063522A1 (en) 1998-05-29 1999-05-31 Method and device for voice encoding

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP1999/003766 Continuation WO1999063522A1 (en) 1998-05-29 1999-05-31 Method and device for voice encoding

Publications (1)

Publication Number Publication Date
US20010001320A1 (en) 2001-05-17

Family

ID=8232031

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/725,345 Abandoned US20010001320A1 (en) 1998-05-29 2000-11-29 Method and device for speech coding
US09/725,347 Expired - Lifetime US6567949B2 (en) 1998-05-29 2000-11-29 Method and configuration for error masking

Family Applications After (1)

Application Number Title Priority Date Filing Date
US09/725,347 Expired - Lifetime US6567949B2 (en) 1998-05-29 2000-11-29 Method and configuration for error masking

Country Status (5)

Country Link
US (2) US20010001320A1 (en)
EP (2) EP1093690B1 (en)
CN (2) CN1134764C (en)
DE (2) DE59913231D1 (en)
WO (3) WO1999063520A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19932943A1 (en) * 1999-07-14 2001-01-18 Siemens Ag Method and device for decoding source signals
US8219940B2 (en) * 2005-07-06 2012-07-10 Semiconductor Insights Inc. Method and apparatus for removing dummy features from a data structure
US7957701B2 (en) * 2007-05-29 2011-06-07 Alcatel-Lucent Usa Inc. Closed-loop multiple-input-multiple-output scheme for wireless communication based on hierarchical feedback
US8207875B2 (en) * 2009-10-28 2012-06-26 Motorola Mobility, Inc. Encoder that optimizes bit allocation for information sub-parts
US9275644B2 (en) * 2012-01-20 2016-03-01 Qualcomm Incorporated Devices for redundant frame coding and decoding
US9700276B2 (en) * 2012-02-28 2017-07-11 Siemens Healthcare Gmbh Robust multi-object tracking using sparse appearance representation and online sparse appearance dictionary update
KR102132522B1 (en) * 2014-02-27 2020-07-09 텔레폰악티에볼라겟엘엠에릭슨(펍) Method and apparatus for pyramid vector quantization indexing and de-indexing of audio/video sample vectors

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5247579A (en) * 1990-12-05 1993-09-21 Digital Voice Systems, Inc. Methods for speech transmission
FI94810C (en) * 1993-10-11 1995-10-25 Nokia Mobile Phones Ltd A method for identifying a poor GSM speech frame
US5970444A (en) * 1997-03-13 1999-10-19 Nippon Telegraph And Telephone Corporation Speech coding method
US6161089A (en) * 1997-03-14 2000-12-12 Digital Voice Systems, Inc. Multi-subframe quantization of spectral parameters

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050065788A1 (en) * 2000-09-22 2005-03-24 Jacek Stachurski Hybrid speech coding and system
US7363219B2 (en) * 2000-09-22 2008-04-22 Texas Instruments Incorporated Hybrid speech coding and system
US20050228653A1 (en) * 2002-11-14 2005-10-13 Toshiyuki Morii Method for encoding sound source of probabilistic code book
US7577566B2 (en) * 2002-11-14 2009-08-18 Panasonic Corporation Method for encoding sound source of probabilistic code book
US8463604B2 (en) 2009-01-06 2013-06-11 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US8433563B2 (en) 2009-01-06 2013-04-30 Skype Predictive speech signal coding
US20100174534A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech coding
US20100174547A1 (en) * 2009-01-06 2010-07-08 Skype Limited Speech coding
US20100174537A1 (en) * 2009-01-06 2010-07-08 Skype Limited Speech coding
US20100174538A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech encoding
US10026411B2 (en) 2009-01-06 2018-07-17 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US8392178B2 (en) 2009-01-06 2013-03-05 Skype Pitch lag vectors for speech encoding
US8396706B2 (en) * 2009-01-06 2013-03-12 Skype Speech coding
US20100174532A1 (en) * 2009-01-06 2010-07-08 Koen Bernard Vos Speech encoding
US9530423B2 (en) 2009-01-06 2016-12-27 Skype Speech encoding by determining a quantization gain based on inverse of a pitch correlation
US20100174541A1 (en) * 2009-01-06 2010-07-08 Skype Limited Quantization
US8639504B2 (en) 2009-01-06 2014-01-28 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US8670981B2 (en) 2009-01-06 2014-03-11 Skype Speech encoding and decoding utilizing line spectral frequency interpolation
US8849658B2 (en) 2009-01-06 2014-09-30 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US9263051B2 (en) 2009-01-06 2016-02-16 Skype Speech coding by quantizing with random-noise signal
US8452606B2 (en) 2009-09-29 2013-05-28 Skype Speech encoding using multiple bit rates
US20110077940A1 (en) * 2009-09-29 2011-03-31 Koen Bernard Vos Speech encoding

Also Published As

Publication number Publication date
EP1080464B1 (en) 2003-01-29
EP1080464A1 (en) 2001-03-07
US20020007473A1 (en) 2002-01-17
CN1312937A (en) 2001-09-12
EP1093690A1 (en) 2001-04-25
US6567949B2 (en) 2003-05-20
WO1999063520A1 (en) 1999-12-09
CN1134764C (en) 2004-01-14
DE59904164D1 (en) 2003-03-06
WO1999063523A1 (en) 1999-12-09
CN1303508A (en) 2001-07-11
EP1093690B1 (en) 2006-03-15
CN1143470C (en) 2004-03-24
DE59913231D1 (en) 2006-05-11
WO1999063522A1 (en) 1999-12-09

Similar Documents

Publication Publication Date Title
USRE49363E1 (en) Variable bit rate LPC filter quantizing and inverse quantizing device and method
US20010001320A1 (en) Method and device for speech coding
US7778827B2 (en) Method and device for gain quantization in variable bit rate wideband speech coding
Salami et al. ITU-T G. 729 Annex A: reduced complexity 8 kb/s CS-ACELP codec for digital simultaneous voice and data
JP3114197B2 (en) Voice parameter coding method
EP0802524B1 (en) Speech coder
US20050065785A1 (en) Indexing pulse positions and signs in algebraic codebooks for coding of wideband signals
CN101494055A (en) Method and device for CDMA wireless systems
JP2002526798A (en) Encoding and decoding of multi-channel signals
US20040024594A1 (en) Fine granularity scalability speech coding for multi-pulses celp-based algorithm
US7302387B2 (en) Modification of fixed codebook search in G.729 Annex E audio coding
Kataoka et al. An 8-bit/s speech coder based on conjugate structure CELP
Kataoka et al. An 8-kb/s conjugate structure CELP (CS-CELP) speech coder
US6704703B2 (en) Recursively excited linear prediction speech coder
Ohmuro et al. Coding of LSP parameters using interframe moving average prediction and multi-stage vector quantization
KR100465316B1 (en) Speech encoder and speech encoding method thereof
Mano et al. Design of a pitch synchronous innovation CELP coder for mobile communications
US20060080090A1 (en) Reusing codebooks in parameter quantization
US6973424B1 (en) Voice coder
Xydeas et al. Theory and Real Time Implementation of a CELP Coder at 4.8 and 6.0 kbits/second Using Ternary Code Excitation
EP1859441B1 (en) Low-complexity code excited linear prediction encoding
AU682505B2 (en) Vector coding process, especially for voice signals
Akamine et al. CELP coding with an adaptive density pulse excitation model
KR100389898B1 (en) Method for quantizing linear spectrum pair coefficient in coding voice
Miki et al. Pitch synchronous innovation code excited linear prediction (PSI‐CELP)

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION