WO2000046795A1 - Speech synthesizer based on variable rate speech coding - Google Patents

Speech synthesizer based on variable rate speech coding

Info

Publication number
WO2000046795A1
WO2000046795A1 PCT/US2000/002900 US0002900W WO0046795A1 WO 2000046795 A1 WO2000046795 A1 WO 2000046795A1 US 0002900 W US0002900 W US 0002900W WO 0046795 A1 WO0046795 A1 WO 0046795A1
Authority
WO
WIPO (PCT)
Prior art keywords
rate
speech
variable
variable rate
parameters
Prior art date
Application number
PCT/US2000/002900
Other languages
French (fr)
Other versions
WO2000046795A9 (en)
Inventor
Chienchung Chang
Original Assignee
Qualcomm Incorporated
Priority date
Filing date
Publication date
Application filed by Qualcomm Incorporated
Priority to EP00914511A (patent EP1159738B1)
Priority to JP2000597796A (patent JP4503853B2)
Priority to DE60027140T (patent DE60027140T2)
Priority to AU35891/00A (patent AU3589100A)
Publication of WO2000046795A1
Publication of WO2000046795A9
Priority to HK02104772.4A (patent HK1042980B)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • the present invention relates to speech synthesis. More particularly, the present invention relates to synthesis of speech encoded by a variable rate vocoder. The invention further relates to use of speech synthesis with wireless communication devices.
  • Electronic speech synthesis is useful in a number of applications. More and more, computers and other electronic equipment are providing the option of voiced prompts as a user interface. For example, speech may be utilized for reading electronic mail messages, for generating spoken prompts in a voice response system, or for providing directions to a driver in a vehicle.
  • TTS text-to-speech
  • The first type of speech synthesizer, text-to-speech (TTS), is grammar based. A TTS based system converts ordinary text into intelligible and natural sounding speech. It is useful for applications needing an automatic conversion of arbitrary input text into intelligible and natural sounding speech output. It is especially useful where large vocabularies and/or dynamically changing data are involved.
  • the TTS system is useful in applications such as providing automatic voice alerts and prompts, proofreading, telephone access to databases, and conversion of electronic mail to voice mail or audio output. Because TTS is flexible and powerful, it offers utility in many applications.
  • However, implementation of a TTS system may require tremendous memory and processing power resources. It may also produce a machine-like tone if the synthesizer does not closely simulate human speech intonation. Accordingly, TTS is not a practical choice for applications with limited memory and processing resources, such as in small portable wireless devices, remotely located communication devices or computers, and the like.
  • a second type of speech synthesizer is Voice Coder (Vocoder) based.
  • a vocoder compresses voiced speech, or audio signals, by extracting parameters that relate to a model of human speech generation. Vocoders have been developed to compress input speech that has been digitally converted to a rate of 64 kilo bits per second (kbps) down to 13 kbps, 8 kbps, or even lower rates.
  • a vocoder based speech synthesizer generates certain parameters of or for the speech to be synthesized. The parameters are stored in some type of memory, preferably flash type, and are decoded upon speech synthesis. Because the parameters of all words to be synthesized need to be stored in memory, vocoder based speech synthesizers are more suitable for applications that do not require large vocabularies. They are especially suitable for systems having limited memory and processing resources.
  • the present invention is an apparatus and method for speech synthesis based on variable rate vocoding.
  • the speech to be synthesized is encoded by a variable rate vocoder.
  • a variable rate vocoder encodes a frame of speech at one of a set of predetermined rates based on the speech activity taking place within the frame of speech.
  • the variable rate vocoder is a code excited linear prediction (CELP) encoder having four bit rates.
  • CELP code excited linear prediction
  • an input speech signal is encoded into speech parameters at one of the four rates using a CELP encoding scheme for the selected rate.
  • the speech parameters are generally provided to a decoder which performs a variable rate decoding scheme corresponding to the variable rate encoding scheme utilized.
  • the decoder produces speech samples, which are provided to a coder-decoder or codec for digital-to-analog conversion.
  • the resulting analog signal generated by the codec is then broadcast through a speaker or other known audio output device as synthesized speech.
  • the speech synthesizer of the present invention is especially suitable for use in wireless communication systems in which variable rate vocoding is already implemented.
  • the existing vocoding resources may be employed for speech synthesis.
  • DSP elements, already present or easily incorporated, can be used in conjunction with a small amount of memory to provide the speech synthesizer function.
  • a speech synthesizer based on variable rate vocoding is able to provide good speech quality without requiring a large amount of memory.
  • the level of compression provided by a variable rate vocoder makes it suitable for applications with limited memory.
  • FIG. 1 is a block diagram of a variable rate vocoder
  • FIG. 2 is a block diagram of the speech synthesizer of the present invention.
  • the present invention provides an apparatus and method for synthesizing speech which is very useful when used with wireless communication equipment.
  • the invention can take advantage of existing signal processing resources in wireless communication equipment or a minimum of additional hardware to synthesize speech in a manner that provides high speech quality and requires a small memory size.
  • the present invention is very useful when employed in conjunction with a variety of known communication devices or systems, and it is described below in relation to a CDMA wireless communication system.
  • It is contemplated that the invention is particularly well suited for specific applications, such as hands-free car kits used to mount and operate wireless devices in vehicles.
  • However, this is not a limitation of the present invention, which can be used with other types of communication devices, including those communicating in wired, wire line, or optical cable type systems, and those using other signal modulation techniques.
  • An exemplary wireless communication system makes use of code division multiple access (CDMA) modulation techniques.
  • CDMA code division multiple access
  • TDMA time division multiple access
  • FDMA frequency division multiple access
  • AM amplitude modulation
  • a speech synthesizer may be implemented in wireless communication devices or equipment for a number of reasons.
  • speech synthesis may be part of a voice recognition system in a wireless telephone or a "hands-free" carkit used to support operation in a vehicle.
  • a speech synthesizer can provide information in audible form when a device user or operator cannot visually observe an output screen or indicators on the device. For example, information can be provided to allow device operation or output when a vehicle driver or machinery operator cannot safely look closely at the communication device.
  • the speech synthesizer would also allow for hands free operation of devices by providing voice prompts for operations to be performed.
  • the speech synthesizer may ask for the name of a person to be called, allowing the device to automatically dial a telephone number, or ask for a command to be implemented, such as dialing, storing, opening mail, terminating a call attempt, or shutting down.
  • the speech synthesizer of the present invention makes use of the vocoder circuitry already present in a number of wireless devices such as wireless telephones and other products used by communication service subscribers to generate voiced speech.
  • the speech synthesizer is based on a variable rate vocoder.
  • a variable rate vocoder uses speech activity to vary its instantaneous data rate.
  • When a user is speaking, the vocoder encoder uses a large number of bits to encode the speech samples.
  • When the user is silent, the vocoder encoder uses fewer bits to encode the background noise.
  • An exemplary embodiment of a variable rate vocoder is described in U.S. Patent No. 5,414,796, entitled "Variable Rate Vocoder," assigned to the assignee of the present invention and incorporated herein by reference.
  • Variable rate vocoders are commonly used in CDMA type communication systems to increase system capacity by decreasing the number of bits generally used by each communication signal.
  • a variable rate vocoder may, for example, be implemented in the CDMA communication system of Patent No. 4,901,307 discussed above.
  • different users communicate using the same bandwidth, but using different code channels.
  • a variable rate vocoder in a CDMA communication system takes advantage of the fact that a user is only speaking actively about 40% of the time on any given channel. By sending fewer bits when a user is silent, the variable rate vocoder allows more users to share the same bandwidth.
  • a schematic block diagram of a typical variable rate vocoder is shown in FIG. 1 and is indicated generally by 100.
  • the vocoder shown in FIG. 1 uses four different data rates, although it should be understood that a different number of data rates may be employed instead, as would be known in the art. In the set of four rates, if the peak rate is 13.2 kbps, then full rate corresponds to 13.2 kbps, 1/2 rate corresponds to approximately 6.2 kbps, 1/4 rate corresponds to approximately 2.7 kbps, and 1/8 rate corresponds to approximately 1.0 kbps. Note that the actual bit rates for rates other than the full rate are approximate because of the use of overhead bits, as is well understood in the art. Referring still to FIG. 1, it can be seen that variable rate vocoder 100 includes an encoder 102 and a decoder 104.
  • Encoder 102 receives speech samples for frames of speech data as an input, for example, as 8-bit PCM samples at a 64 kbps data rate, in either mu-law or a-law format. Encoder 102 encodes these speech samples into speech parameters at one of four data rates, depending on the speech activity. The input speech samples are also provided to rate determination element 106.
  • Rate determination element 106 may implement any of a number of rate decision algorithms.
  • energy thresholds relative to the background noise energy level are used to determine the speech activity, and thereby the rate, at which the input samples are to be encoded. If the energy of the current frame of speech samples is far above the background noise energy, then the rate determination element 106 will determine that the frame is to be encoded at full rate. If the energy of the current frame is close to the background noise energy, then rate determination element 106 will determine that the frame is to be encoded at eighth rate, and so forth, as is known.
  • a first mode measure is the target matching signal to noise ratio (TMSNR) from the previous encoding frame, which provides information on how well the encoding model is performing by comparing a synthesized speech signal with the input speech signal.
  • TMSNR target matching signal to noise ratio
  • a second mode measure is the normalized autocorrelation function (NACF), which measures periodicity in the speech frame.
  • NACF normalized autocorrelation function
  • ZC zero crossings
  • PGD prediction gain differential
  • a fifth measure is the energy differential (ED), which compares the energy in the current frame to an average frame energy.
  • rate determination logic selects an encoding rate for each frame of input speech data.
  • the values of the various mode measures select one of, say, four or more modes in which to operate. That is, the value detected for each mode measure, relative to a threshold or other criterion, determines which encoding rate is selected, based on a preselected pattern or hierarchy. For example, if the value for NACF is less than a pre-selected threshold and ZC is greater than a second pre-selected threshold, one rate could be selected. However, if these conditions are not met but ED is lower than a third threshold, then a quarter rate might be selected.
  • Other rate determination algorithms may be adopted by rate determination element 106.
  • a signal indicating the data rate determined by rate determination element 106 is provided to a switch 108.
  • Switch 108 selects an element for encoding a frame of input speech samples from among a full rate encoding element 110, a half rate encoding element 112, a quarter rate encoding element 114, and an eighth rate encoding element 116, as designated by the data rate signal.
  • the selected encoding element encodes the speech samples to produce a signal of an encoded data packet.
  • Rate determination element 106 also provides a signal indicating the data rate to a switch 118, which selects the same encoding element as switch 108 so that the signal of the encoded data packet generated by the selected encoding element can be provided to an output of the variable rate vocoder.
  • Each of the encoding elements 110, 112, 114, and 116 is configured to encode speech using a predetermined encoding scheme.
  • a linear-prediction-based encoding scheme, such as the Code Excited Linear Predictive (CELP) encoder, is used in a preferred embodiment.
  • A CELP coder is described in the paper "A 4.8 Kbps Code Excited Linear Predictive Coder," by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.
  • Linear-prediction-based encoders compress speech by removing the natural redundancies inherent in speech. Speech typically exhibits short term redundancies resulting from the mechanical action of the lips and tongue, and long term redundancies resulting from the vibration of the vocal cords.
  • Linear predictive schemes model these operations as filters, remove the redundancies, and then model the resulting residual signal as white Gaussian noise.
  • Linear predictive coders therefore achieve a reduced bit rate by transmitting filter coefficients and quantized noise rather than a full bandwidth speech signal.
  • a linear predictive coding scheme that employs variable rates offers further reductions in bit rate without compromising the quality of speech.
  • the full rate encoding element 110 encodes the parameters of the input speech signal using more bits to better preserve the characteristics of the input.
  • eighth rate encoding element 116 encodes the parameters using fewer bits since there is typically little detail or useful information to be captured. Transitions between periods of active speech and periods with no detected speech are encoded by half rate encoding element 112 and quarter rate encoding element 114.
  • decoder 104 receives a signal of the encoded speech parameters as well as a signal indicating the rate used to encode the speech.
  • a rate extraction element 128 receives this input signal and determines the data rate of the speech.
  • a signal of the data rate is also provided to a switch 130, which selects the decoding element from a set of decoding elements to properly decode the input parameters.
  • four decoding elements, full rate decoding element 120, half rate decoding element 122, quarter rate decoding element 124, and eighth rate decoding element 126 are provided for decoding the speech parameters at the four possible rates.
  • the selected decoding element decodes the input parameters based on the data rate to produce a signal of decoded samples, which typically are 64 kbps pulse code modulated (PCM) samples.
  • a signal of the data rate determined by rate extraction element 128 is also provided to a switch 132.
  • Switch 132 selects the same decoding element as switch 130 so that a signal of the decoded samples is provided to an output of the vocoder.
  • In FIG. 2, a block diagram of a speech synthesis system operating according to the principles of the present invention, which incorporates a variable rate vocoder, is shown.
  • the speech synthesis system comprises a variable rate encoder 202 and a speech synthesizer 204.
  • An example of the variable rate encoder 202 is encoder 102 of FIG. 1.
  • Variable rate encoder 202 receives a speech signal as input, and encodes the speech at one of a set of predetermined rates.
  • variable rate encoder 202 is a CELP encoder that generates speech parameters at one of the rates based on the speech activity in the input segment of speech.
  • variable rate decoder is an enhanced variable rate decoder such as described in relation to the IS127 standard.
  • encoding rate decisions are based on "mode measures," as discussed above.
  • The different combinations of criteria used to make rate selections are used to create what is termed "reduced rate mode" or "modes," and referred to more simply as mode 0, mode 1, mode 2, and so forth, as would be understood by those skilled in the art.
  • the present invention can take advantage of such modes for purposes of speech synthesis.
  • the speech received by variable rate encoder 202 may be a word or a phrase from a pre-selected vocabulary that a communication device such as a wireless telephone, carkit, or other communication device is designed to synthesize.
  • the vocabulary would include prompts and alerts to be given to a device user. For example, by extracting and synthesizing five individual vocabulary words: 'call', 'redial', 'program', 'or' and 'exit', the speech synthesizer may be designed to provide the prompts "call, redial, program, or exit" in solicitation of a response from the user.
  • the speech synthesizer may be designed to provide previously stored information, such as in phone books, look-up tables, or databases, to a device user in response to various device inputs, including audio.
  • the speech received by variable rate encoder 202 is encoded, and the encoded parameters are provided to a memory element or circuit 206 of the speech synthesizer 204 for storage.
  • Memory 206 is intended to hold or store the parameters over some time for operation of the desired device. However, it is also generally desirable to have the parameters stored in a manner that makes them updateable or replaceable, such as when the vocabulary needs to be changed for changing conditions or upgrades to device features. Therefore, memory 206 is configured in the form of non-volatile but re-writable memory, which can be accomplished using flash type memory elements, as is well known in the art.
  • variable rate encoder 202 may receive a speech signal input during operation of the communication device. For example, in response to a prompt from the speech synthesizer, the user may provide a spoken response.
  • Variable rate encoder 202 will then encode the user's speech, and the encoded parameters may be provided to flash memory 206 for storage, and/or provided to a voice recognizer (not shown) for voice recognition purposes. In this manner, the parameters are input post manufacture such as immediately upon the device entering useful service or over time, such as by building a personal vocabulary library for each device (vocoder) user, related to that user's requirements.
  • Flash memory 206 should be of a size that is sufficient to store the parameters of the pre-selected vocabulary as well as the parameters of speech anticipated from the user. Thus, the size of flash memory 206 may vary based on the requirements of the specific application. Post manufacture storage may have an advantage of reducing memory requirements where each device user does not require as extensive a vocabulary as compared to what a manufacturer would have to install to cover an entire larger device market.
  • the speech synthesizer can record names or other words, like 'Fred Smith', by detecting the endpoints of the target or desired phrase or speech, removing silence or redundancies, and encoding it. Therefore, speech can be recorded "on-line" and used later to synthesize speech output.
  • variable rate encoder 202 may be configured based on the available memory and the voice quality required. In the system having four rates wherein the full rate is 13 kbps, the average rate will generally be 5.88 kbps based on 40% voice activity. The use of the variable rates will provide high speech quality. If, however, the memory size is limited, variable rate encoder 202 may be configured to operate at, say, a fixed half-rate of approximately 800 bytes per second. Otherwise, the rate may be selected from a subset of the predetermined set of rates instead of the whole set of rates. For example, the reduced rate modes discussed above can be used to select various rates. In one embodiment of the invention, the rates are divided into a set of four modes, labeled as modes 0, 1, 2, and 3.
  • variable rate encoder 202 may switch between different modes of operation (variable rate, all half-rate, a subset of the variable rates, etc.) based on the instantaneous requirements of the application. Because there may be a trade off between voice quality and memory size, the configuration to be adopted will depend on the application being implemented.
  • The speech parameters stored in flash memory 206 will be provided to a variable rate decoder 208 when speech synthesis is desired.
  • the variable rate decoder 208 is configured to decode the parameters generated by corresponding variable rate encoder 202.
  • variable rate decoder 208 will be implemented as part of a digital signal processor (DSP) used within the communication device.
  • DSP digital signal processor
  • Such DSPs are used as or to form the processing elements for signal coding/decoding, combining, CDMA coding, power adjustment, and so forth. Since such elements are typically used in wireless devices, and many other devices in which the invention may find use, advantage can be taken of their presence to very cost effectively implement the present invention.
  • a stand-alone decoder within or using a DSP requires a very small amount of memory (both program and data) to attain speech synthesis capability.
  • the speech synthesizer can be implemented using well known DSP circuits and devices such as commercially available from Analog Devices and Qualcomm Incorporated.
  • the decoded parameters, typically in the form of pulse code modulated (PCM) samples, are then provided to a codec 210.
  • Codec 210 converts the PCM samples from a digital format to an analog signal.
  • the analog signal is provided to a speaker or other known audio output device 212, which projects or broadcasts synthesized speech into the surrounding device environment where it can be heard.
  • a speech synthesizer based on variable rate vocoding is provided by the present invention.
  • the speech synthesizer is especially suitable for use in wireless communication devices that already comprise a variable rate vocoder.
  • An existing variable rate vocoder may be employed by the speech synthesizer, through the use of appropriate changes in program or operational instructions, or using control hardware.
  • the compression achieved may allow a pre-determined vocabulary to be stored in a memory of limited size associated with the wireless device or other equipment with which it interfaces.
  • the trade off between voice quality and memory size may be considered in configuring the variable rate vocoder to provide a speech synthesizer with the desired voice quality and memory size.
  • the present invention can find application in a variety of communication devices and interface equipment.
  • This includes wireless communication devices such as, but not limited to, cellular and satellite telephones, often referred to as user terminals, subscriber units, mobile stations, or simply "users," "mobiles," or "subscribers."
  • other devices are also contemplated, such as message receivers and data transfer devices (e.g., portable computers, personal data assistants, modems, machinery controllers), or interfaces for public telephone networks or dedicated communications channels.
  • the invention can be implemented using separate circuits in the form of dedicated components or application specific integrated circuits (ASIC) to form a speech synthesizer which is installed within a desired device. Alternatively, it can be incorporated within other ASICs and devices by using a small amount of additional memory to work with existing digital signal processing elements.
  • ASIC application specific integrated circuits
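
The energy-threshold rate decision and the switch that routes each frame to a rate-specific encoding element, both described above for FIG. 1, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm of the referenced patents: the threshold multipliers, the function names, and the placeholder encoding elements are all hypothetical and do not come from the source text.

```python
from enum import Enum


class Rate(Enum):
    FULL = "full"
    HALF = "half"
    QUARTER = "quarter"
    EIGHTH = "eighth"


# Hypothetical multipliers of the background noise energy. The text gives no
# numeric thresholds, so these values are illustrative assumptions only.
THRESHOLDS = [(8.0, Rate.FULL), (4.0, Rate.HALF), (2.0, Rate.QUARTER)]


def frame_energy(samples):
    """Mean squared amplitude of one frame of linear PCM samples."""
    return sum(s * s for s in samples) / len(samples)


def determine_rate(samples, noise_energy):
    """Role of rate determination element 106: compare the frame energy with
    thresholds expressed relative to the background noise energy."""
    energy = frame_energy(samples)
    for multiplier, rate in THRESHOLDS:
        if energy > multiplier * noise_energy:
            return rate
    return Rate.EIGHTH  # energy is close to the background noise level


def make_encoding_element(rate):
    """Placeholder for elements 110-116. A real element would run CELP
    analysis at the given rate instead of returning a tagged tuple."""
    def encode(samples):
        return (rate, len(samples))
    return encode


ENCODING_ELEMENTS = {rate: make_encoding_element(rate) for rate in Rate}


def encode_frame(samples, noise_energy):
    """Role of switches 108 and 118: route the frame to the selected element."""
    rate = determine_rate(samples, noise_energy)
    return ENCODING_ELEMENTS[rate](samples)
```

A table-driven threshold list keeps the decision order explicit: the first threshold the frame energy exceeds wins, and anything below the lowest threshold falls through to eighth rate.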

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

An apparatus and method for speech synthesis based on variable rate vocoding is presented. An input speech signal is encoded by a variable rate vocoder (202), and the parameters of the speech signal are stored in memory. To synthesize speech, a variable rate decoder (208) decodes the parameters to produce speech samples. A codec (210) converts the speech samples from a digital signal to an analog signal, which is broadcast through a speaker (212).

Description

SPEECH SYNTHESIZER BASED ON VARIABLE RATE SPEECH CODING
BACKGROUND OF THE INVENTION
I. Field of the Invention
The present invention relates to speech synthesis. More particularly, the present invention relates to synthesis of speech encoded by a variable rate vocoder. The invention further relates to use of speech synthesis with wireless communication devices.
II. Description of the Related Art
Electronic speech synthesis is useful in a number of applications. More and more, computers and other electronic equipment are providing the option of voiced prompts as a user interface. For example, speech may be utilized for reading electronic mail messages, for generating spoken prompts in a voice response system, or for providing directions to a driver in a vehicle.
There are two general types of speech synthesizers or techniques used to generate speech. The first type is referred to as a text-to-speech (TTS) speech synthesizer, and is grammar based. A TTS based system converts ordinary text into intelligible and natural sounding speech. It is useful for applications needing an automatic conversion of arbitrary input text into intelligible and natural sounding speech output. It is especially useful where large vocabularies and/or dynamically changing data are involved. The TTS system is useful in applications such as providing automatic voice alerts and prompts, proofreading, telephone access to databases, and conversion of electronic mail to voice mail or audio output. Because TTS is flexible and powerful, it offers utility in many applications. However, implementation of a TTS system may require tremendous memory and processing power resources. It may also produce a machine-like tone if the synthesizer does not closely simulate human speech intonation. Accordingly, TTS is not a practical choice for applications with limited memory and processing resources, such as in small portable wireless devices, remotely located communication devices or computers, and the like.
A second type of speech synthesizer is Voice Coder (Vocoder) based. A vocoder compresses voiced speech, or audio signals, by extracting parameters that relate to a model of human speech generation. Vocoders have been developed to compress input speech that has been digitally converted to a rate of 64 kilo bits per second (kbps) down to 13 kbps, 8 kbps, or even lower rates. A vocoder based speech synthesizer generates certain parameters of or for the speech to be synthesized. The parameters are stored in some type of memory, preferably flash type, and are decoded upon speech synthesis. Because the parameters of all words to be synthesized need to be stored in memory, vocoder based speech synthesizers are more suitable for applications that do not require large vocabularies. They are especially suitable for systems having limited memory and processing resources.
For vocoder based speech synthesizers, there is a need to optimize memory usage while maintaining acceptable speech quality. For some applications, it may be desirable to maximize the size of the vocabulary for a given size of memory. Furthermore, it may also be desirable to use signal processing resources already available within a given communication system design for accomplishing speech synthesis. A speech synthesizer possessing these and other characteristics is provided by the present invention in the manner described below.
SUMMARY OF THE INVENTION
The present invention is an apparatus and method for speech synthesis based on variable rate vocoding. The speech to be synthesized is encoded by a variable rate vocoder. A variable rate vocoder encodes a frame of speech at one of a set of predetermined rates based on the speech activity taking place within the frame of speech. In one embodiment, the variable rate vocoder is a code excited linear prediction (CELP) encoder having four bit rates. Thus, an input speech signal is encoded into speech parameters at one of the four rates using a CELP encoding scheme for the selected rate. The speech parameters are generally provided to a decoder which performs a variable rate decoding scheme corresponding to the variable rate encoding scheme utilized. The decoder produces speech samples, which are provided to a coder-decoder or codec for digital-to-analog conversion. The resulting analog signal generated by the codec is then broadcast through a speaker or other known audio output device as synthesized speech.
The speech synthesizer of the present invention is especially suitable for use in wireless communication systems in which variable rate vocoding is already implemented. In these systems, the existing vocoding resources may be employed for speech synthesis. Alternatively, DSP elements, already present or easily incorporated, can be used in conjunction with a small amount of memory to provide the speech synthesizer function. In addition, a speech synthesizer based on variable rate vocoding is able to provide good speech quality without requiring a large amount of memory. The level of compression provided by a variable rate vocoder makes it suitable for applications with limited memory.
BRIEF DESCRIPTION OF THE DRAWINGS
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:
FIG. 1 is a block diagram of a variable rate vocoder; and
FIG. 2 is a block diagram of the speech synthesizer of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides an apparatus and method for synthesizing speech which is very useful when used with wireless communication equipment. The invention can take advantage of existing signal processing resources in wireless communication equipment or a minimum of additional hardware to synthesize speech in a manner that provides high speech quality and requires a small memory size.
The present invention is very useful when employed in conjunction with a variety of known communication devices or systems, and it is described below in relation to a CDMA wireless communication system. In addition, it is contemplated that it is particularly well suited for specific applications, such as hands-free car kits used to mount and operate wireless devices in vehicles. However, those skilled in the art will readily understand that this is not a limitation of the present invention, and that it can be used with other types of communication devices including those communicating in wired, wire line, or optical cable type systems, and those using other signal modulation techniques.
An exemplary wireless communication system makes use of code division multiple access (CDMA) modulation techniques. Although other techniques such as time division multiple access (TDMA), frequency division multiple access (FDMA), and amplitude modulation (AM) schemes such as amplitude companded single sideband (ACSSB) are known, CDMA has significant advantages over these other techniques. The use of CDMA techniques in a multiple access communication system is disclosed in U.S. Patent No. 4,901,307, entitled "Spread Spectrum Multiple Access Communication System Using Satellite Or Terrestrial Repeaters," assigned to the assignee of the present invention and incorporated herein by reference.
A speech synthesizer may be implemented in wireless communication devices or equipment for a number of reasons. For example, speech synthesis may be part of a voice recognition system in a wireless telephone or a "hands-free" carkit used to support operation in a vehicle. A speech synthesizer can provide information in audible form when a device user or operator cannot visually observe an output screen or indicators on the device. For example, information can be provided to allow device operation or output when a vehicle driver or machinery operator cannot safely look closely at the communication device. The speech synthesizer would also allow for hands-free operation of devices by providing voice prompts for operations to be performed. For example, the speech synthesizer may ask for the name of a person to be called, allowing the device to automatically dial a telephone number, or ask for a command to be implemented, such as dialing, storing, opening mail, terminating a call attempt, or shutting down. In one embodiment the speech synthesizer of the present invention makes use of the vocoder circuitry already present in a number of wireless devices, such as wireless telephones and other products used by communication service subscribers, to generate voiced speech.
Specifically, the speech synthesizer is based on a variable rate vocoder. A variable rate vocoder uses speech activity to vary its instantaneous data rate. During active speech, the vocoder encoder uses a large number of bits to encode the speech samples. During periods of silence, the vocoder encoder uses fewer bits to encode the background noise. An exemplary embodiment of a variable rate vocoder is described in U.S. Patent No. 5,414,796, entitled "Variable Rate Vocoder," assigned to the assignee of the present invention and incorporated herein by reference.
Variable rate vocoders are commonly used in CDMA type communication systems to increase system capacity by decreasing the number of bits generally used by each communication signal. A variable rate vocoder may, for example, be implemented in the CDMA communication system of U.S. Patent No. 4,901,307 discussed above. In a CDMA communication system, different users communicate using the same bandwidth, but using different code channels. A variable rate vocoder in a CDMA communication system takes advantage of the fact that a user is only speaking actively about 40% of the time on any given channel. By sending fewer bits when a user is silent, the variable rate vocoder allows more users to share the same bandwidth.
A schematic block diagram of a typical variable rate vocoder is shown in FIG. 1 and is indicated generally by 100. The vocoder shown in FIG. 1 uses four different data rates, although it should be understood that a different number of data rates may be employed instead, as would be known in the art. In the set of four rates, if the peak rate is 13.2 kbps, then full rate corresponds to 13.2 kbps, 1/2 rate corresponds to approximately 6.2 kbps, 1/4 rate corresponds to approximately 2.7 kbps, and 1/8 rate corresponds to approximately 1.0 kbps. Note that the actual bit rates for rates other than the full rate are approximate because of the use of overhead bits, as is well understood in the art.
Referring still to FIG. 1, it can be seen that variable rate vocoder 100 includes an encoder 102 and a decoder 104. Encoder 102 receives speech samples for frames of speech data as an input, for example, as 8-bit PCM samples at a 64 kbps data rate, in either mu-law or a-law format. Encoder 102 encodes these speech samples into speech parameters at one of four data rates, depending on the speech activity. The input speech samples are also provided to rate determination element 106.
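As an illustration of the mu-law input format mentioned above, the following Python sketch expands 8-bit mu-law code words into linear PCM values using the commonly published G.711 expansion steps. The function name and constant names are illustrative assumptions, and the figure of 8000 samples per second is simply 64 kbps divided by 8 bits per sample; none of this is taken from the description of encoder 102.

```python
BIAS = 0x84  # bias of 132 that is added during mu-law compression

def mulaw_to_linear(code):
    """Expand one 8-bit mu-law code word to a signed linear PCM value."""
    code = ~code & 0xFF              # mu-law code words are stored bit-inverted
    sign = code & 0x80               # top bit carries the sign
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude

# 64 kbps of 8-bit samples implies this sampling rate (derived, not quoted).
SAMPLES_PER_SECOND = 64_000 // 8
```

A frame of such expanded samples is the kind of input that the rate determination and encoding elements described next would operate on.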
Rate determination element 106 may implement any of a number of rate decision algorithms. In one embodiment, energy thresholds relative to the background noise energy level are used to determine the speech activity, and thereby the rate, at which the input samples are to be encoded. If the energy of the current frame of speech samples is far above the background noise energy, then the rate determination element 106 will determine that the frame is to be encoded at full rate. If the energy of the current frame is close to the background noise energy, then rate determination element 106 will determine that the frame is to be encoded at eighth rate, and so forth, as is known.
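The energy-threshold decision described above can be sketched in Python as follows. This is a minimal illustration only: the threshold ratios, the function names, and the use of mean squared amplitude as the energy measure are assumptions chosen for the example, not values or methods stated for rate determination element 106.

```python
from enum import Enum

class Rate(Enum):
    FULL = "full"
    HALF = "half"
    QUARTER = "quarter"
    EIGHTH = "eighth"

def frame_energy(samples):
    """Mean squared amplitude of one frame of linear PCM samples."""
    return sum(s * s for s in samples) / len(samples)

def select_rate(samples, noise_energy, thresholds=(8.0, 4.0, 2.0)):
    """Choose a rate from the ratio of frame energy to background noise energy.

    thresholds holds hypothetical ratios for (full, half, quarter),
    listed from highest to lowest.
    """
    ratio = frame_energy(samples) / max(noise_energy, 1e-9)
    full_t, half_t, quarter_t = thresholds
    if ratio >= full_t:
        return Rate.FULL      # energy far above the noise floor
    if ratio >= half_t:
        return Rate.HALF
    if ratio >= quarter_t:
        return Rate.QUARTER
    return Rate.EIGHTH        # energy close to the noise floor
```

In a practical encoder the background noise estimate would itself be updated over time; that tracking step is omitted here.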
Another rate determination technique is disclosed in copending U.S. patent application Serial No. 08/286,842, entitled "Method And Apparatus For Performing Reduced Rate Variable Rate Vocoding," assigned to the assignee of the present invention and incorporated herein by reference. This technique provides the set of rate decision criteria referred to as mode measures. A first mode measure is the target matching signal to noise ratio (TMSNR) from the previous encoding frame, which provides information on how well the encoding model is performing by comparing a synthesized speech signal with the input speech signal. A second mode measure is the normalized autocorrelation function (NACF), which measures periodicity in the speech frame. A third mode measure is the zero crossings (ZC) parameter, which measures high frequency content in an input speech frame. A fourth measure, the prediction gain differential (PGD), determines if the encoder is maintaining its prediction efficiency. A fifth measure is the energy differential (ED), which compares the energy in the current frame to an average frame energy.
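Two of the mode measures above, ZC and NACF, can be illustrated with the following Python sketch. The formulas are generic textbook forms (a sign-change count and a peak normalized autocorrelation over a lag range) and are assumptions made for illustration; the exact definitions used in the referenced application are not reproduced here, and the lag range of 20 to 120 samples is hypothetical.

```python
import math

def zero_crossings(frame):
    """Count sign changes between consecutive samples of the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def nacf(frame, min_lag=20, max_lag=120):
    """Peak normalized autocorrelation over a range of candidate lags.

    Values near 1 suggest a strongly periodic frame.
    """
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        x = frame[lag:]                    # frame delayed by `lag`
        y = frame[:len(frame) - lag]       # matching undelayed segment
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        if den > 0:
            best = max(best, num / den)
    return best
```

A frame with a high zero crossing count and a low NACF would typically indicate noise-like content, while a high NACF indicates periodic content.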
Using the mode measures discussed above, rate determination logic selects an encoding rate for each frame of input speech data. The values of the various mode measures select one of, say, four or more modes in which to operate. That is, the values detected for each mode measure relative to a threshold or other criteria determine which encoding rate is selected, based on a preselected pattern or hierarchy. For example, if the value for NACF is less than a pre-selected threshold and ZC is greater than a second pre-selected threshold, one rate could be selected. However, if these conditions are not met but ED is lower than a third threshold, then a quarter rate might be selected. If the value for TMSNR is greater, PGD is less, and NACF is greater than fourth, fifth, and sixth thresholds, respectively, then a half rate might be selected. Various such combinations and thresholds may be employed by those skilled in the art to select encoding rates.
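The example hierarchy in the preceding paragraph can be written as a short rule chain. In the Python sketch below, every threshold value is a hypothetical placeholder. The rate chosen by the first rule is not named in the text, so it is exposed as the parameter `first_rule_rate`, and the rate used when no rule fires (`fallback_rate`) is likewise an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeMeasures:
    tmsnr: float  # target matching signal to noise ratio
    nacf: float   # normalized autocorrelation function
    zc: float     # zero crossings
    pgd: float    # prediction gain differential
    ed: float     # energy differential

@dataclass(frozen=True)
class Thresholds:
    # All values are placeholders chosen for illustration only.
    nacf_low: float = 0.4      # first threshold
    zc_high: float = 60.0      # second threshold
    ed_low: float = -10.0      # third threshold
    tmsnr_high: float = 10.0   # fourth threshold
    pgd_low: float = 1.0       # fifth threshold
    nacf_high: float = 0.7     # sixth threshold

def select_rate(m, t=Thresholds(), first_rule_rate="quarter", fallback_rate="full"):
    """Apply the rules in order; the first rule that matches decides the rate."""
    if m.nacf < t.nacf_low and m.zc > t.zc_high:
        return first_rule_rate   # rate for this rule is not given in the text
    if m.ed < t.ed_low:
        return "quarter"
    if m.tmsnr > t.tmsnr_high and m.pgd < t.pgd_low and m.nacf > t.nacf_high:
        return "half"
    return fallback_rate         # assumed default when no rule applies
```

Ordering the tests this way mirrors the "if these conditions are not met" wording: later rules are only reached when earlier ones fail.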
It should be understood that still other rate determination techniques may be adopted by rate determination element 106.
Referring still to FIG. 1, a signal indicating the data rate determined by rate determination element 106 is provided to a switch 108. Switch 108 selects an element for encoding a frame of input speech samples from among a full rate encoding element 110, a half rate encoding element 112, a quarter rate encoding element 114, and an eighth rate encoding element 116, as designated by the data rate signal. The selected encoding element encodes the speech samples to produce a signal of an encoded data packet. Rate determination element 106 also provides a signal indicating the data rate to a switch 118, which selects the same encoding element as switch 108 so that the signal of the encoded data packet generated by the selected encoding element can be provided to an output of the variable rate vocoder.
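The role of the rate signal in selecting one encoding element, and later the matching decoding element, can be sketched as a table lookup. The Python example below is illustrative only: the stub "codecs" merely keep every Nth sample and repeat it on decoding, as a stand-in for the real encoding elements 110 through 116, and all names are hypothetical.

```python
def make_stub_codec(step):
    """Return an (encode, decode) pair that keeps every `step`-th sample.

    This is a placeholder for a real encoding element, not CELP.
    """
    def encode(samples):
        return list(samples[::step])

    def decode(payload):
        out = []
        for s in payload:
            out.extend([s] * step)   # repeat each kept sample
        return out

    return encode, decode

# One entry per rate, playing the part of the elements chosen by the switches.
CODECS = {
    "full": make_stub_codec(1),
    "half": make_stub_codec(2),
    "quarter": make_stub_codec(4),
    "eighth": make_stub_codec(8),
}

def encode_frame(samples, rate):
    """Route the frame to the encoder chosen by the rate signal."""
    encode, _ = CODECS[rate]
    return {"rate": rate, "payload": encode(samples)}

def decode_packet(packet):
    """Read the rate from the packet and route it to the matching decoder."""
    _, decode = CODECS[packet["rate"]]
    return decode(packet["payload"])
```

Carrying the rate alongside the payload is what lets a decoder pick the element that corresponds to the one used for encoding.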
Each of the encoding elements 110, 112, 114, and 116 is configured to encode speech using a predetermined encoding scheme. A linear-prediction-based encoding scheme, such as the Code Excited Linear Predictive (CELP) encoder, is used in a preferred embodiment. The CELP coder is described in the paper "A 4.8 Kbps Code Excited Linear Predictive Coder," by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988. Linear-prediction-based encoders compress speech by removing the natural redundancies inherent in speech. Speech typically exhibits short term redundancies resulting from the mechanical action of the lips and tongue, and long term redundancies resulting from the vibration of the vocal cords. Linear predictive schemes model these operations as filters, remove the redundancies, and then model the resulting residual signal as white Gaussian noise. Linear predictive coders, therefore, achieve a reduced bit rate by transmitting filter coefficients and quantized noise rather than a full bandwidth speech signal.
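The idea of removing short term redundancy with a prediction filter can be illustrated with the standard autocorrelation method and Levinson-Durbin recursion. The Python sketch below is a textbook illustration only: it covers short term prediction, omits the long term (pitch) filter and the codebook search of a CELP coder, and its function names are assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for prediction-error filter coefficients a, with a[0] = 1.

    Returns (a, err), where err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def residual(x, a):
    """Filter x with the prediction-error filter to obtain the residual."""
    order = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(x))]
```

For a highly predictable input the residual carries much less energy than the signal, which is why coding the filter coefficients plus a coarse description of the residual needs fewer bits than coding the waveform directly.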
A linear predictive coding scheme that employs variable rates offers further reductions in bit rate without compromising the quality of speech. In FIG. 1, the full rate encoding element 110 encodes the parameters of the input speech signal using more bits to better preserve the characteristics of the input. For periods where no speech is detected, eighth rate encoding element 116 encodes the parameters using fewer bits since there is typically little detail or useful information to be captured. Transitions between periods of active speech and periods with no detected speech are encoded by half rate encoding element 112 and quarter rate encoding element 114.
Referring now to the decoding element of the variable rate vocoder, decoder 104 receives a signal of the encoded speech parameters as well as a signal indicating the rate used to encode the speech. A rate extraction element 128 receives this input signal and determines the data rate of the speech. A signal of the data rate is also provided to a switch 130, which selects the decoding element from a set of decoding elements to properly decode the input parameters. In FIG. 1, four decoding elements, full rate decoding element 120, half rate decoding element 122, quarter rate decoding element 124, and eighth rate decoding element 126, are provided for decoding the speech parameters at the four possible rates. The selected decoding element decodes the input parameters based on the data rate to produce a signal of decoded samples, which typically are 64 kbps pulse code modulated (PCM) samples. A signal of the data rate determined by rate extraction element 128 is also provided to a switch 132. Switch 132 selects the same decoding element as switch 130 so that a signal of the decoded samples is provided to an output of the vocoder.
Referring now to FIG. 2, a block diagram of a speech synthesis system operating according to the principles of the present invention, which incorporates a variable rate vocoder, is shown. The speech synthesis system comprises a variable rate encoder 202 and a speech synthesizer 204. An example of the variable rate encoder 202 is encoder 102 of FIG. 1. Variable rate encoder 202 receives a speech signal as input, and encodes the speech at one of a set of predetermined rates. In a preferred embodiment, variable rate encoder 202 is a CELP encoder that generates speech parameters at one of the rates based on the speech activity in the input segment of speech.
The present invention uses a variable rate vocoder as described in U.S. Patent No. 5,414,796, discussed above, which is commercially available, for example, as a 13 kbps vocoder product from Qualcomm Incorporated. In one preferred embodiment, the variable rate decoder is an enhanced variable rate decoder such as described in relation to the IS-127 standard.
In one embodiment of the present invention, encoding rate decisions are based on "mode measures," as discussed above. The different combinations of criteria used to make rate selections are used to create what is termed "reduced rate mode" or "modes," and referred to more simply as mode 0, mode 1, mode 2, and so forth, as would be understood by those skilled in the art. The present invention can take advantage of such modes for purposes of speech synthesis.
The speech received by variable rate encoder 202 may be a word or a phrase from a pre-selected vocabulary that a communication device such as a wireless telephone, carkit, or other communication device is designed to synthesize. The vocabulary would include prompts and alerts to be given to a device user. For example, by extracting and synthesizing five individual vocabulary words: 'call', 'redial', 'program', 'or' and 'exit', the speech synthesizer may be designed to provide the prompts "call, redial, program, or exit" in solicitation of a response from the user. Alternatively, the speech synthesizer may be designed to provide previously stored information, such as in phone books, look-up tables, or databases, to a device user in response to various device inputs, including audio. The speech received by variable rate encoder 202 is encoded, and the encoded parameters are provided to a memory element or circuit 206 of the speech synthesizer 204 for storage.
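The word-by-word prompt assembly described above can be sketched as a small lookup structure. In the Python example below, the class name, method names, and packet representation (a rate label paired with opaque payload bytes) are hypothetical, and the payload bytes are dummies rather than real encoded speech parameters.

```python
class PromptStore:
    """Stand-in for memory element 206: maps each word to its encoded packets."""

    def __init__(self):
        self._words = {}

    def store(self, word, packets):
        """Save the encoded packets for one vocabulary word."""
        self._words[word] = list(packets)

    def prompt(self, words):
        """Concatenate the stored packets of each word, in order."""
        out = []
        for word in words:
            out.extend(self._words[word])   # KeyError if word was never stored
        return out

    def size_bytes(self):
        """Total payload bytes held, useful when sizing the memory."""
        return sum(len(payload)
                   for packets in self._words.values()
                   for _, payload in packets)

store = PromptStore()
for word in ("call", "redial", "program", "or", "exit"):
    # Dummy packets: one full rate and one eighth rate frame per word.
    store.store(word, [("full", b"\x00" * 33), ("eighth", b"\x00" * 3)])

packets = store.prompt(["call", "redial", "program", "or", "exit"])
```

Because each word is stored once, the same five entries could also serve shorter prompts such as "redial or exit" without additional memory.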
Memory 206 is intended to hold or store the parameters over some time for operation of the desired device. However, it is also generally desirable to have the parameters stored in a manner that makes them updateable or replaceable, such as when the vocabulary needs to be changed for changing conditions or upgrades to device features. Therefore, memory 206 is configured in the form of non-volatile but re-writable memory, which can be accomplished using flash type memory elements, as is well known in the art.
As one would recognize, the operation of loading parameters may be performed during manufacture of a communication device for which the invention is to be used. Since the prompts and alerts to be synthesized may be predetermined, these may be encoded during manufacture and stored in flash memory 206 prior to use. The parameters can be changed or replaced during service of the device, or through newly developed over-the-air programming techniques for wireless devices. Alternatively, variable rate encoder 202 may receive a speech signal input during operation of the communication device. For example, in response to a prompt from the speech synthesizer, the user may provide a spoken response. Variable rate encoder 202 will then encode the user's speech, and the encoded parameters may be provided to flash memory 206 for storage, and/or provided to a voice recognizer (not shown) for voice recognition purposes. In this manner, the parameters are input post manufacture, such as immediately upon the device entering useful service or over time, such as by building a personal vocabulary library for each device (vocoder) user, related to that user's requirements.
Flash memory 206 should be of a size that is sufficient to store the parameters of the pre-selected vocabulary as well as the parameters of speech anticipated from the user. Thus, the size of flash memory 206 may vary based on the requirements of the specific application. Post manufacture storage may have an advantage of reducing memory requirements where each device user does not require as extensive a vocabulary as compared to what a manufacturer would have to install to cover an entire larger device market. The speech synthesizer can record names or other words, like 'Fred Smith' by detecting the endpoints of the target or desired phrase or speech, removing silence or redundancies, and encoding it. Therefore, speech can be recorded "on-line" and used later to synthesize speech output.
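The endpoint detection and silence removal step mentioned above can be illustrated with a simple energy-based trim. The Python sketch below is an assumption-laden example: the frame length of 160 samples, the energy threshold, and the function name are all hypothetical, and practical endpoint detectors are usually more elaborate.

```python
def trim_silence(samples, frame_len=160, threshold=1000.0):
    """Keep the span from the first to the last frame whose energy exceeds threshold.

    Energy is the mean squared amplitude of each frame.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    active = [i for i, f in enumerate(frames)
              if sum(s * s for s in f) / len(f) > threshold]
    if not active:
        return []                          # nothing above the threshold
    start = active[0] * frame_len          # first endpoint
    end = (active[-1] + 1) * frame_len     # second endpoint (exclusive)
    return samples[start:end]
```

Only the trimmed span would then be passed to the encoder, so leading and trailing silence does not consume memory.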
It should be noted that variable rate encoder 202 may be configured based on the available memory and the voice quality required. In the system having four rates wherein the full rate is 13 kbps, the average rate will generally be 5.88 kbps based on 40% voice activity. The use of the variable rates will provide high speech quality. If, however, the memory size is limited, variable rate encoder 202 may be configured to operate at, say, a fixed half-rate of approximately 800 bytes per second. Otherwise, the rate may be selected from a subset of the predetermined set of rates instead of the whole set of rates. For example, the reduced rate modes discussed above can be used to select various rates. In one embodiment of the invention, the rates are divided into a set of four modes, labeled as modes 0, 1, 2, and 3. Using fixed rates according to the mode, rates on the order of 1800 bytes per second, 1540 bytes per second, 1400 bytes per second, and 1100 bytes per second, respectively, can be used. The use of such fixed reduced rates allows delivery of very high quality voice given a predefined data rate, approaching land-line quality. These four modes provide the best tradeoff between synthesized speech quality and memory requirement.
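The rate and memory figures above can be checked with a few lines of arithmetic. The Python sketch below assumes that active frames are all coded at the 13.2 kbps full rate and inactive frames at the 1.0 kbps eighth rate, which is one reading that reproduces the 5.88 kbps average; the 64 KiB memory size in the usage lines is purely hypothetical.

```python
RATES_KBPS = {"full": 13.2, "half": 6.2, "quarter": 2.7, "eighth": 1.0}

# Fixed rates quoted for modes 0 through 3, in bytes per second.
MODE_BYTES_PER_SECOND = {0: 1800, 1: 1540, 2: 1400, 3: 1100}

def average_rate_kbps(voice_activity, active="full", idle="eighth"):
    """Average bit rate when a fraction `voice_activity` of frames is active."""
    return (voice_activity * RATES_KBPS[active]
            + (1.0 - voice_activity) * RATES_KBPS[idle])

def bytes_per_second(kbps):
    """Convert kilobits per second to bytes per second."""
    return kbps * 1000.0 / 8.0

def seconds_of_speech(memory_bytes, rate_bytes_per_second):
    """Duration of encoded speech that fits in the given memory."""
    return memory_bytes / rate_bytes_per_second

avg = average_rate_kbps(0.40)                 # 0.4 * 13.2 + 0.6 * 1.0
half_rate_bytes = bytes_per_second(RATES_KBPS["half"])
capacity = {mode: seconds_of_speech(64 * 1024, rate)
            for mode, rate in MODE_BYTES_PER_SECOND.items()}
```

Under these assumptions the average works out to 5.88 kbps, and the half rate of 6.2 kbps corresponds to 775 bytes per second, consistent with the figure of approximately 800 bytes per second.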
Furthermore, variable rate encoder 202 may switch between different modes of operation (variable rate, all half-rate, a subset of the variable rates, etc.) based on the instantaneous requirements of the application. Because there may be a trade off between voice quality and memory size, the configuration to be adopted will depend on the application being implemented.
The speech parameters stored in flash memory 206 will be provided to a variable rate decoder 208 when speech synthesis is desired. The variable rate decoder 208 is configured to decode the parameters generated by corresponding variable rate encoder 202. An example of variable rate decoder 208 is decoder 104 of FIG. 1.
Generally, variable rate decoder 208 will be implemented as part of a digital signal processor (DSP) used within the communication device. Such DSPs are used as or to form the processing elements for signal coding/decoding, combining, CDMA coding, power adjustment, and so forth. Since such elements are typically used in wireless devices, and many other devices in which the invention may find use, advantage can be taken of their presence to very cost effectively implement the present invention.
In order to implement the decoding functionality for the present invention, only a small amount of memory is required in or coupled to a DSP. A stand-alone decoder within or using a DSP requires a very small amount of memory (both program and data) to attain speech synthesis capability. The speech synthesizer can be implemented using well known DSP circuits and devices such as commercially available from Analog Devices and Qualcomm Incorporated.
The decoded parameters, typically in the form of pulse code modulated (PCM) samples, are then provided to a codec 210. Codec 210 converts the PCM samples from a digital format to an analog signal. The analog signal is provided to a speaker or other known audio output device 212, which projects or broadcasts synthesized speech into the surrounding device environment where it can be heard.
Therefore, a speech synthesizer based on variable rate vocoding is provided by the present invention. The speech synthesizer is especially suitable for use in wireless communication devices that already comprise a variable rate vocoder. In other words, an existing variable rate vocoder may be employed by the speech synthesizer, through the use of appropriate changes in program or operational instructions, or using control hardware. In addition, through the use of variable rate vocoding, the compression achieved may allow a pre-determined vocabulary to be stored in a memory of limited size associated with the wireless device or other equipment with which it interfaces. Furthermore, the trade off between voice quality and memory size may be considered in configuring the variable rate vocoder to provide a speech synthesizer with the desired voice quality and memory size.
The present invention can find application in a variety of communication devices and interface equipment. The above example embodiments were discussed in relation to wireless communication devices such as, but not limited to, cellular and satellite telephones, often referred to as user terminals, subscriber units, mobile stations, or simply "users," "mobiles," or "subscribers". In addition, other devices are also contemplated, such as message receivers and data transfer devices (e.g., portable computers, personal data assistants, modems, machinery controllers), or interfaces for public telephone networks or dedicated communications channels.
The invention can be implemented using separate circuits in the form of dedicated components or application specific integrated circuits (ASIC) to form a speech synthesizer which is installed within a desired device. Alternatively, it can be incorporated within other ASICs and devices by using a small amount of additional memory to work with existing digital signal processing elements.
The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
What I claim as my invention is:

Claims

1. An apparatus for synthesizing speech in a wireless communication system, comprising: a memory for storing speech parameters encoded by a variable rate encoder; a variable rate decoder for decoding said speech parameters to generate decoded speech samples; and a digital-to-analog converter for converting said speech samples into an analog signal for broadcast as synthesized speech.
2. The apparatus of claim 1 wherein said variable rate encoder is linear-prediction-based.
3. The apparatus of claim 1 wherein said variable rate decoder is linear-prediction-based.
4. The apparatus of claim 1 wherein said speech parameters are encoded at a set of variable rates comprising a full rate, a half rate, a quarter rate, and an eighth rate.
5. The apparatus of claim 4 wherein said full rate is 13.2 kbps, said half rate is approximately 6.2 kbps, said quarter rate is approximately 2.7 kbps, and said eighth rate is approximately 1.0 kbps.
6. The apparatus of claim 4 wherein said speech parameters are encoded at a rate fixed in response to one or more measured mode criteria.
7. The apparatus of claim 4 wherein said speech parameters are encoded at a rate fixed at said half rate.
8. The apparatus of claim 4 wherein the encoding rate is selected in accordance with the requirements of the quality of voice and the size of said memory.
9. The apparatus of claim 1 wherein said wireless communication system is a CDMA system.
10. The apparatus of claim 1 further comprising a variable rate encoder for encoding speech into said speech parameters.
11. The apparatus of claim 10 wherein said variable rate encoder encodes speech that belongs to a pre-selected vocabulary.
12. The apparatus of claim 10 wherein said variable rate encoder comprises an enhanced variable rate encoder.
13. A method for synthesizing speech in a wireless communication system, comprising the steps of: retrieving speech parameters stored in a memory, said speech parameters having been encoded using a variable rate encoding scheme; decoding said speech parameters using a variable rate decoding scheme to generate decoded speech samples; and converting said speech samples into an analog signal for broadcast as synthesized speech.
14. The method of claim 13 wherein said variable rate encoding scheme is linear-prediction-based.
15. The method of claim 13 wherein said variable rate decoding scheme is linear-prediction-based.
16. The method of claim 13 wherein said speech parameters are encoded at a set of variable rates comprising a full rate, a half rate, a quarter rate, and an eighth rate.
17. The method of claim 16 wherein said full rate is 13.2 kbps, said half rate is approximately 6.2 kbps, said quarter rate is approximately 2.7 kbps, and said eighth rate is approximately 1.0 kbps.
18. The method of claim 16 wherein said speech parameters are encoded at a rate fixed in response to one or more measured mode criteria.
19. The method of claim 16 wherein said speech parameters are encoded at a rate fixed at said half rate.
20. The method of claim 16 wherein the encoding rate is selected in accordance with the requirements of the quality of voice and the size of said memory.
21. The method of claim 13 wherein said wireless communication system comprises a CDMA system.
22. The method of claim 13 further comprising the step of encoding an input speech signal into said speech parameters.
23. The method of claim 22 wherein said step of encoding encodes speech that belongs to a pre-selected vocabulary.
PCT/US2000/002900 1999-02-08 2000-02-04 Speech synthesizer based on variable rate speech coding WO2000046795A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP00914511A EP1159738B1 (en) 1999-02-08 2000-02-04 Speech synthesizer based on variable rate speech coding
JP2000597796A JP4503853B2 (en) 1999-02-08 2000-02-04 Speech synthesizer based on variable rate speech coding
DE60027140T DE60027140T2 (en) 1999-02-08 2000-02-04 LANGUAGE SYNTHETIZER BASED ON LANGUAGE CODING WITH A CHANGING BIT RATE
AU35891/00A AU3589100A (en) 1999-02-08 2000-02-04 Speech synthesizer based on variable rate speech coding
HK02104772.4A HK1042980B (en) 1999-02-08 2002-06-27 Speech synthesizer based on variable rate speech coding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US24660599A 1999-02-08 1999-02-08
US09/246,605 1999-02-08

Publications (2)

Publication Number Publication Date
WO2000046795A1 true WO2000046795A1 (en) 2000-08-10
WO2000046795A9 WO2000046795A9 (en) 2001-10-18

Family

ID=22931374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/002900 WO2000046795A1 (en) 1999-02-08 2000-02-04 Speech synthesizer based on variable rate speech coding

Country Status (10)

Country Link
EP (1) EP1159738B1 (en)
JP (2) JP4503853B2 (en)
KR (1) KR100648872B1 (en)
CN (1) CN1212604C (en)
AT (1) ATE322731T1 (en)
AU (1) AU3589100A (en)
DE (1) DE60027140T2 (en)
ES (1) ES2263459T3 (en)
HK (1) HK1042980B (en)
WO (1) WO2000046795A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4867076B2 (en) * 2001-03-28 2012-02-01 日本電気株式会社 Compression unit creation apparatus for speech synthesis, speech rule synthesis apparatus, and method used therefor
KR100425982B1 (en) * 2001-12-29 2004-04-06 엘지전자 주식회사 Voice Data Rate Changing Method in IMT-2000 Network
KR100651731B1 (en) * 2003-12-26 2006-12-01 한국전자통신연구원 Apparatus and method for variable frame speech encoding/decoding
CN101692685B (en) * 2009-10-29 2012-05-30 中国电信股份有限公司 Method and system for improving acoustics of polyphonic ringtone
JP5677470B2 (en) * 2011-02-03 2015-02-25 パナソニックIpマネジメント株式会社 Voice reading device, voice output device, voice output system, voice reading method and voice output method
CN106952651A (en) * 2017-02-17 2017-07-14 福建星网智慧科技股份有限公司 A kind of voice processing apparatus transmits the method and system of voice
US11404045B2 (en) 2019-08-30 2022-08-02 Samsung Electronics Co., Ltd. Speech synthesis method and apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0762711A2 (en) * 1995-09-12 1997-03-12 Nokia Mobile Phones Ltd. Speech storage in a portable cellular telephone
US5657420A (en) * 1991-06-11 1997-08-12 Qualcomm Incorporated Variable rate vocoder
DE29717372U1 (en) * 1997-09-29 1997-11-27 Siemens AG, 80333 München Integrated circuit for a mobile radio with answering machine function

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0331858B1 (en) * 1988-03-08 1993-08-25 International Business Machines Corporation Multi-rate voice encoding method and device
JP3081300B2 (en) * 1991-10-01 2000-08-28 三洋電機株式会社 Residual driven speech synthesizer
TW271524B (en) * 1994-08-05 1996-03-01 Qualcomm Inc
JPH08263099A (en) * 1995-03-23 1996-10-11 Toshiba Corp Encoder
US6137840A (en) * 1995-03-31 2000-10-24 Qualcomm Incorporated Method and apparatus for performing fast power control in a mobile communication system
US5914950A (en) * 1997-04-08 1999-06-22 Qualcomm Incorporated Method and apparatus for reverse link rate scheduling

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5657420A (en) * 1991-06-11 1997-08-12 Qualcomm Incorporated Variable rate vocoder
EP0762711A2 (en) * 1995-09-12 1997-03-12 Nokia Mobile Phones Ltd. Speech storage in a portable cellular telephone
DE29717372U1 (en) * 1997-09-29 1997-11-27 Siemens AG, 80333 München Integrated circuit for a mobile radio with answering machine function
WO1999017516A1 (en) * 1997-09-29 1999-04-08 Siemens Aktiengesellschaft Integrated circuit for a mobile radio telephone with an answerphone function

Also Published As

Publication number Publication date
WO2000046795A9 (en) 2001-10-18
EP1159738B1 (en) 2006-04-05
KR20020012157A (en) 2002-02-15
HK1042980B (en) 2005-12-23
AU3589100A (en) 2000-08-25
JP4503853B2 (en) 2010-07-14
DE60027140T2 (en) 2007-01-11
CN1347548A (en) 2002-05-01
ATE322731T1 (en) 2006-04-15
KR100648872B1 (en) 2006-11-24
JP2002536693A (en) 2002-10-29
CN1212604C (en) 2005-07-27
HK1042980A1 (en) 2002-08-30
JP2010092059A (en) 2010-04-22
DE60027140D1 (en) 2006-05-18
EP1159738A1 (en) 2001-12-05
ES2263459T3 (en) 2006-12-16

Similar Documents

Publication Publication Date Title
KR100923891B1 (en) Method and apparatus for interoperability between voice transmission systems during speech inactivity
US6615169B1 (en) High frequency enhancement layer coding in wideband speech codec
JP5149217B2 (en) Method and apparatus for reducing undesirable packet generation
US5251261A (en) Device for the digital recording and reproduction of speech signals
KR100574031B1 (en) Speech Synthesis Method and Apparatus and Voice Band Expansion Method and Apparatus
JP2006099124A (en) Automatic voice/speaker recognition on digital radio channel
KR100351484B1 (en) Speech coding apparatus and speech decoding apparatus
JP2010092059A (en) Speech synthesizer based on variable rate speech coding
ES2371455T3 (en) Pre-processing of digital audio data for mobile audio codecs
KR20000053407A (en) Method for transmitting data in wireless speech channels
JP2001242896A (en) Speech coding/decoding apparatus and its method
KR100911278B1 (en) Sound source supply device and sound source supply method
KR100498177B1 (en) Signal quantizer
KR101011320B1 (en) Identification and exclusion of pause frames for speech storage, transmission and playback
JP5199281B2 (en) System and method for dimming a first packet associated with a first bit rate into a second packet associated with a second bit rate
Choudhary et al. Study and performance of AMR codecs for GSM
JP3496618B2 (en) Apparatus and method for speech encoding / decoding including speechless encoding operating at multiple rates
US6728344B1 (en) Efficient compression of VROM messages for telephone answering devices
JP2000078246A (en) Radio telephone system
KR20010038033A (en) Apparatus and Method for generating a receiving ring in a mobile communication system
JPH06120889A (en) Telephone signal transmission method/system in cordless telephone

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 00803589.X; Country of ref document: CN
AK Designated states. Kind code of ref document: A1; Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW
AL Designated countries for regional patents. Kind code of ref document: A1; Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase. Ref document number: 2000914511; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 2000 597796; Country of ref document: JP; Kind code of ref document: A
WWE WIPO information: entry into national phase. Ref document number: 1020017009887; Country of ref document: KR
AK Designated states. Kind code of ref document: C2; Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW
AL Designated countries for regional patents. Kind code of ref document: C2; Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
COP Corrected version of pamphlet. Free format text: PAGES 1/2-2/2, DRAWINGS, REPLACED BY NEW PAGES 1/2-2/2
WWP WIPO information: published in national office. Ref document number: 2000914511; Country of ref document: EP
REG Reference to national code. Ref country code: DE; Ref legal event code: 8642
WWP WIPO information: published in national office. Ref document number: 1020017009887; Country of ref document: KR
WWG WIPO information: grant in national office. Ref document number: 2000914511; Country of ref document: EP
WWG WIPO information: grant in national office. Ref document number: 1020017009887; Country of ref document: KR