CN1212604C - Speech synthesizer based on variable rate speech coding - Google Patents


Info

Publication number
CN1212604C
Authority
CN
China
Prior art keywords
variable
group
speech
ratio
speed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB00803589XA
Other languages
Chinese (zh)
Other versions
CN1347548A (en
Inventor
张承纯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN1347548A publication Critical patent/CN1347548A/en
Application granted granted Critical
Publication of CN1212604C publication Critical patent/CN1212604C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/24Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Abstract

An apparatus and method for speech synthesis based on variable rate vocoding is presented. An input speech signal is encoded by a variable rate vocoder (202), and the parameters of the speech signal are stored in memory. To synthesize speech, a variable rate decoder (208) decodes the parameters to produce speech samples. A codec (210) converts the speech samples from a digital signal to an analog signal, which is broadcast through a speaker (212).

Description

Speech synthesizer based on variable rate speech coding
Technical field
The present invention relates to speech synthesis. In particular, it relates to synthesizing speech that has been encoded by a variable rate vocoder. The invention further relates to the use of speech synthesis in conjunction with wireless communication devices.
Background art
Electronic speech synthesis is useful in many applications. A growing number of computers and other electronic devices offer voice prompts as part of the user interface. For example, speech can be used to read e-mail messages, to generate spoken prompts in a voice response system, or to give directions to the driver of an automobile.
Two types of speech synthesizer, or synthesis technique, are generally used to produce speech. The first is the text-to-speech (TTS) synthesizer, which is grammar based. A TTS-based system converts ordinary text into intelligible, natural-sounding speech. It is useful for applications that must automatically convert arbitrary input text into intelligible, natural-sounding speech output, and it is particularly useful where the vocabulary is large and/or the data changes dynamically. TTS systems are useful in applications such as automatic voice alerts and reminders, proofreading, telephone access to databases, and conversion of e-mail to voice mail or audio output. Because TTS is flexible and powerful, it can be used in a wide variety of applications. However, implementing a TTS system may require substantial storage and processing resources. In addition, if the synthesizer does not closely model the intonation of human speech, the output may have a machine-like tone. Therefore, for applications with limited storage and processing resources (such as small portable wireless devices, telecommunication devices, or computers), TTS is not a practical choice.
The second type of speech synthesizer is based on a speech coder (vocoder). A vocoder compresses a speech or audio signal for transmission by extracting parameters that relate to a model of human speech generation. Vocoders have been developed that compress input speech digitized at 64 kilobits per second (kbps) down to rates of 13 kbps, 8 kbps, or lower. A vocoder-based speech synthesizer generates the parameters for the speech to be synthesized. The parameters are stored in some type of memory, preferably flash memory, and are decoded when speech synthesis is required. Because the parameters of every word to be synthesized must be stored in memory, a vocoder-based synthesizer is better suited to applications that do not need a large vocabulary. Such synthesizers are particularly suitable for systems with limited storage and processing resources.
For a vocoder-based speech synthesizer, it is desirable to optimize memory usage while maintaining acceptable voice quality. For some applications, it is desirable to maximize the vocabulary for a given memory size. It is also desirable to implement speech synthesis using the signal processing resources already available in a given communication system design. The present invention provides a speech synthesizer that addresses these and other needs in the manner described below.
Summary of the invention
The present invention is an apparatus and method for speech synthesis based on variable rate voice coding (vocoding). The speech to be synthesized is encoded by a variable rate vocoder. The variable rate vocoder encodes each speech frame at one of a set of predetermined rates, according to the speech activity occurring in the frame. In one embodiment, the variable rate vocoder is a code excited linear prediction (CELP) vocoder with four bit rates. The input speech signal is thus encoded into speech parameters at one of the four rates, using the CELP encoding scheme for the selected rate. The speech parameters are provided to a decoder, which performs a variable rate decoding scheme corresponding to the variable rate encoding scheme used. The decoder produces speech samples, which are provided to a coder-decoder (codec) for digital-to-analog conversion. The resulting analog signal is then played through a loudspeaker or other known audio output device as synthesized speech.
The speech synthesizer of the present invention is particularly suitable for wireless communication systems in which variable rate vocoding is already performed. In such systems, the existing vocoding resources are used for speech synthesis. Alternatively, a DSP element (already present or easily incorporated) can be used with a small amount of memory to provide the speech synthesizer function. In addition, a speech synthesizer based on variable rate vocoding can provide good voice quality without requiring a large amount of memory. The degree of compression provided by the variable rate vocoder makes it suitable for applications with limited memory.
Brief description of the drawings
The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout, and wherein:
Fig. 1 is a block diagram of a variable rate vocoder; and
Fig. 2 is a block diagram of the speech synthesizer of the present invention.
Detailed description of the embodiments
The present invention provides an apparatus and method for synthesizing speech that is particularly useful in conjunction with wireless communication devices. The invention synthesizes speech in a manner that provides high voice quality and requires little memory, using signal processing resources already present in the wireless communication device or a minimum of additional hardware.
The present invention is useful with a variety of known communication devices and systems, and is described below with reference to a CDMA wireless communication system. It is also particularly suitable for certain applications, such as hands-free car kits used to mount and operate wireless devices in vehicles. However, those skilled in the art will readily understand that this is not a limitation of the invention, which can also be used with other types of communication equipment, including devices that communicate over wire, cable, or fiber-optic systems and devices that use other signal modulation techniques.
An exemplary wireless communication system uses code division multiple access (CDMA) modulation techniques. Although other techniques are known, such as time division multiple access (TDMA), frequency division multiple access (FDMA), and amplitude modulation (AM) schemes such as amplitude companded single sideband, CDMA has significant advantages over these other techniques. The use of CDMA techniques in a multiple access communication system is disclosed in U.S. Patent No. 4,901,307, entitled "Spread Spectrum Multiple Access Communication System Using Satellite Or Terrestrial Repeaters", which is assigned to the assignee of the present invention and incorporated herein by reference.
A speech synthesizer may be implemented in a wireless communication device or equipment for a number of reasons. For example, speech synthesis can be part of a speech recognition system in a wireless telephone, or in a "hands-free" car kit that supports operating the telephone in an automobile. A speech synthesizer can present information in audible form when the device user or operator cannot visually observe a display screen or indicators on the device. For example, it can provide operating or output information when a vehicle driver or machine operator cannot safely look closely at the communication device. A speech synthesizer also permits hands-free operation of the device by providing voice prompts for the operations to be performed. For example, the speech synthesizer may ask for the name of the party to be called, allowing the device to dial the telephone number automatically, or may ask for a command to be issued, such as dial, store, open, stop a call attempt, or power off.
In one embodiment, the speech synthesizer of the present invention uses vocoder circuitry that is already present in many wireless devices, such as wireless telephones and other products used by communication service subscribers to produce voiced speech. In particular, the speech synthesizer is based on a variable rate vocoder. A variable rate vocoder uses speech activity to vary its instantaneous data rate. While speech is present, the vocoder's encoder uses a large number of bits to encode the speech samples. During periods of silence, the encoder uses few bits, or a much smaller number of bits, to encode the background noise. An exemplary embodiment of a variable rate vocoder is described in U.S. Patent No. 5,414,796, entitled "Variable Rate Vocoder", which is assigned to the assignee of the present invention and incorporated herein by reference.
Variable rate vocoders are commonly used in CDMA-type communication systems to increase system capacity by reducing the number of bits used for each communication signal. For example, a variable rate vocoder can be implemented in the CDMA communication system of the above-mentioned U.S. Patent No. 4,901,307. In a CDMA communication system, different users share the same bandwidth but communicate on different code channels. A variable rate vocoder in a CDMA communication system exploits the fact that a user speaks only about 40% of the time on any given channel. By sending fewer bits while a user is silent, the variable rate vocoder allows more users to share the same bandwidth.
Fig. 1 shows a schematic block diagram of a typical variable rate vocoder, designated generally as 100. The vocoder shown in Fig. 1 uses four different data rates, although it should be understood that a different number of data rates can be used, as is known in the art. In this set of four rates, if the maximum rate is 13.2 kbps, then full rate corresponds to 13.2 kbps, half rate to about 6.2 kbps, quarter rate to about 2.7 kbps, and eighth rate to about 1.0 kbps. Note that, as is known in the art, the actual bit rates for the rates other than full rate are approximate because of the use of overhead bits.
Still referring to Fig. 1, the variable rate vocoder 100 comprises an encoder 102 and a decoder 104. The encoder 102 receives as input the speech samples of a frame of speech data, for example 8-bit PCM samples in mu-law or A-law format at a data rate of 64 kbps. The encoder 102 encodes these speech samples into speech parameters at one of the four data rates, according to speech activity. The input speech samples are also provided to a rate determination element 106.
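As a rough illustration of the input format mentioned above, the following Python sketch expands 8-bit mu-law codes to linear PCM using the conventional G.711 expansion and checks the bit-rate arithmetic. The figure of 8000 samples per second is inferred here from 64 kbps divided by 8 bits per sample; it is not stated in the text, and the function names are illustrative only.

```python
def mulaw_to_linear(code):
    """Expand one 8-bit mu-law code (0..255) to a signed linear PCM value.

    Conventional G.711 mu-law expansion: the stored byte is bit-inverted,
    then split into sign (1 bit), exponent (3 bits) and mantissa (4 bits).
    """
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def pcm_bit_rate(samples_per_second, bits_per_sample):
    """Bit rate of an uncompressed PCM stream, in bits per second."""
    return samples_per_second * bits_per_sample


# Assumed sampling rate: 8000 samples/s x 8 bits = 64000 bits/s (64 kbps).
input_rate_bps = pcm_bit_rate(8000, 8)
```

A frame of such codes would be expanded sample by sample before being handed to an encoder that works on linear PCM.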
The rate determination element 106 can implement any of a number of rate determination algorithms. In one embodiment, thresholds relative to the background noise energy level are used to determine speech activity and hence the rate at which the input samples are encoded. If the energy of the current frame of speech samples is well above the background noise energy, the rate determination element 106 determines that the frame is to be encoded at full rate. If the energy of the current frame is close to the background noise energy, the rate determination element 106 determines that the frame is to be encoded at eighth rate, and so on.
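The energy-threshold decision described above might be sketched as follows in Python. This is a minimal sketch, not the rate determination algorithm of the referenced patents: the threshold multipliers, the use of mean squared amplitude as "energy", and all names are assumptions made for illustration.

```python
from enum import Enum


class Rate(Enum):
    FULL = "full"
    HALF = "half"
    QUARTER = "quarter"
    EIGHTH = "eighth"


def frame_energy(samples):
    """Mean squared amplitude of one frame of linear PCM samples."""
    return sum(s * s for s in samples) / len(samples)


def select_rate(samples, noise_energy, ratios=(2.0, 4.0, 8.0)):
    """Pick an encoding rate by comparing frame energy with background noise.

    `ratios` holds three hypothetical thresholds, expressed as multiples of
    the background noise energy. They are illustrative values only.
    """
    t_quarter, t_half, t_full = (r * noise_energy for r in ratios)
    energy = frame_energy(samples)
    if energy >= t_full:
        return Rate.FULL      # well above the noise floor: active speech
    if energy >= t_half:
        return Rate.HALF
    if energy >= t_quarter:
        return Rate.QUARTER
    return Rate.EIGHTH        # close to the noise floor: silence
```

In a real vocoder the background noise estimate would itself be tracked over time; here it is simply passed in as `noise_energy`.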
Another rate determination technique is disclosed in copending U.S. Patent Application Serial No. 08/286,842, entitled "Method And Apparatus For Performing Reduced Rate Variable Rate Vocoding", which is assigned to the assignee of the present invention and incorporated herein by reference. This technique provides a set of rate decision criteria referred to as mode measures. The first mode measure is the target matching signal-to-noise ratio (TMSNR) from the previous encoded frame, which indicates how well the encoding model is performing by comparing the synthesized speech signal with the input speech signal. The second mode measure is the normalized autocorrelation function (NACF), which measures the periodicity in the speech frame. The third mode measure is the zero crossings (ZC) parameter, which measures the high-frequency content in the input speech frame. The fourth measure, the prediction gain differential (PGD), determines whether the encoder is maintaining its prediction efficiency. The fifth measure is the energy differential (ED), which compares the energy in the current frame with the average frame energy.
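Three of these measures are simple enough to sketch directly. The Python below shows simplified textbook forms of ZC, NACF, and ED; the exact definitions in the referenced application may differ, and the lag range, the decibel form of ED, and the function names are assumptions. TMSNR and PGD depend on the encoder's internal state and are not shown.

```python
import math


def zero_crossings(frame):
    """Count sign changes between consecutive samples (high-frequency cue)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def nacf(frame, min_lag=20, max_lag=120):
    """Largest normalized autocorrelation over a range of candidate lags.

    Values near 1 indicate a strongly periodic (voiced) frame. The lag
    range is a hypothetical choice, not taken from the source.
    """
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        x, y = frame[lag:], frame[:-lag]
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        if den > 0:
            best = max(best, num / den)
    return best


def energy_differential_db(frame, average_energy):
    """Frame energy relative to the average frame energy, in decibels."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(energy, 1e-12) / average_energy)
```

For example, a sinusoid with a period of 40 samples gives an NACF close to 1, while a sample-by-sample alternating sequence gives the maximum possible zero-crossing count.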
Using the above mode measures, the rate determination logic selects an encoding rate for each input frame of speech data. The values of the mode measures are used, for example, to select one operating mode from among four or more modes. That is, according to a predetermined scheme or classification, the value of each mode measure is tested against an associated threshold or other criterion to determine which encoding rate to select. For example, if the NACF value is less than a first preselected threshold and ZC is greater than a second preselected threshold, one rate can be selected. If these conditions are not met but ED is below a third threshold, quarter rate can be selected. If TMSNR exceeds a fourth threshold, PGD is below a fifth threshold, and NACF exceeds a sixth threshold, half rate can be selected. Those skilled in the art can employ various such combinations and thresholds to select the encoding rate.
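The decision cascade above can be written out as a short Python sketch. All six threshold values are placeholders, and two points the text leaves open are filled with labeled assumptions: the rate chosen by the first test (`unvoiced_rate`) and the rate used when no test fires (`default_rate`).

```python
from dataclasses import dataclass


@dataclass
class ModeMeasures:
    tmsnr: float  # target matching signal-to-noise ratio
    nacf: float   # normalized autocorrelation function
    zc: int       # zero crossings
    pgd: float    # prediction gain differential
    ed: float     # energy differential


@dataclass
class Thresholds:
    # Hypothetical values; the source names six thresholds but gives none.
    first_nacf: float = 0.3
    second_zc: int = 60
    third_ed: float = -6.0
    fourth_tmsnr: float = 10.0
    fifth_pgd: float = 1.0
    sixth_nacf: float = 0.6


def choose_rate(m, t=Thresholds(), unvoiced_rate="quarter", default_rate="full"):
    """Apply the three tests in the order the description gives them."""
    if m.nacf < t.first_nacf and m.zc > t.second_zc:
        return unvoiced_rate          # assumption: rate not named in the text
    if m.ed < t.third_ed:
        return "quarter"
    if (m.tmsnr > t.fourth_tmsnr and m.pgd < t.fifth_pgd
            and m.nacf > t.sixth_nacf):
        return "half"
    return default_rate               # assumption: fall back to full rate
```

Keeping the thresholds in one structure makes it straightforward to define several threshold sets, one per operating mode.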
It should be understood that the rate determination element 106 can also employ other rate determination techniques.
Still referring to Fig. 1, a signal indicating the data rate determined by the rate determination element 106 is provided to a switch 108. The switch 108 selects, as specified by the data rate signal, one element from among a full rate encoding element 110, a half rate encoding element 112, a quarter rate encoding element 114, and an eighth rate encoding element 116 to encode the frame of input speech samples. The selected encoding element encodes the speech samples to produce a signal of encoded data packets. The rate determination element 106 also provides the data rate signal to a switch 118, which selects the same encoding element as the switch 108, so that the encoded data packets produced by the selected encoding element are provided to the output of the variable rate vocoder.
Each of the encoding elements 110, 112, 114, and 116 is configured to encode speech using a predetermined encoding scheme. In the preferred embodiment, a linear prediction based encoding scheme is used, such as a code excited linear prediction (CELP) coder. A CELP coder is described in the paper "A 4.8 kbps Code Excited Linear Predictive Coder" by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988. A linear prediction based coder compresses speech by removing the natural redundancies inherent in speech. Speech typically exhibits short-term redundancies due to the mechanical action of the lips and tongue, and long-term redundancies due to the vibration of the vocal cords. Linear prediction models these operations as filters, removes the redundancies, and then models the resulting residual signal as white Gaussian noise. A linear predictive coder therefore achieves a reduced bit rate by transmitting the filter coefficients and quantized noise rather than the full-bandwidth speech signal.
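To make the short-term redundancy removal concrete, the Python sketch below estimates linear prediction coefficients with the standard autocorrelation method and Levinson-Durbin recursion, then computes the prediction residual. It illustrates short-term linear prediction only; the long-term (pitch) filter and the codebook search of a CELP coder are not shown, and the function names are illustrative.

```python
def autocorrelation(x, order):
    """Autocorrelation values r[0..order] of the sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order].

    The predictor is x[n] ~ sum(a[k] * x[n - k] for k = 1..order).
    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum(a[k] * x[n - k])."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, start=1)
                   if n - k >= 0)
        out.append(x[n] - pred)
    return out
```

For a signal that decays as 0.9 to the power n, a second-order analysis recovers a first coefficient very close to 0.9 and a second coefficient very close to zero, and the residual carries far less energy than the signal itself.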
A linear predictive coding scheme that uses variable rates reduces the bit rate further without affecting voice quality. In Fig. 1, the full rate encoding element 110 uses more bits to encode the parameters of the input speech signal so as to better preserve the characteristics of the input. The eighth rate encoding element 116 uses fewer bits to encode the parameters, because little detail or useful information is present during periods in which no speech is detected. The half rate encoding element 112 and the quarter rate encoding element 114 encode the transitions between periods of speech and periods in which no speech is detected.
Turning now to the decoding side of the variable rate vocoder, the decoder 104 receives a signal of encoded speech parameters and an indication of the rate at which the speech was encoded. A rate extraction element 128 receives this input signal and determines the data rate of the speech. The data rate signal is provided to a switch 130, which selects from a set of decoding elements the one appropriate for decoding the input parameters. In Fig. 1, four decoding elements (a full rate decoding element 120, a half rate decoding element 122, a quarter rate decoding element 124, and an eighth rate decoding element 126) are provided to decode speech parameters at the four possible rates. The selected decoding element decodes the input parameters according to the data rate to produce a signal of decoded samples, typically 64 kbps pulse code modulation (PCM) samples. The data rate signal determined by the rate extraction element 128 is also provided to a switch 132. The switch 132 selects the same decoding element as the switch 130, so that the decoded samples are provided to the output of the vocoder.
Referring now to Fig. 2, a block diagram is shown of a speech synthesis system that operates in accordance with the present invention and incorporates a variable rate vocoder. The speech synthesis system comprises a variable rate encoder 202 and a speech synthesizer 204. An example of the variable rate encoder 202 is the encoder 102 of Fig. 1. The variable rate encoder 202 receives a speech signal as input and encodes the speech at one of a set of predetermined rates. In the preferred embodiment, the variable rate encoder 202 is a CELP coder that produces speech parameters at a rate determined by the speech activity in the input speech segment.
The present invention may use a variable rate vocoder as described in the above-mentioned U.S. Patent No. 5,414,796. Such a variable rate vocoder is commercially available, for example as the 13 kbps vocoder product manufactured by Qualcomm. In the preferred embodiment, the variable rate decoder is, for example, an enhanced variable rate decoder as described in the IS-127 standard.
In one embodiment of the present invention, the encoding rate decision is based on the "mode measures" described above. Those skilled in the art will appreciate that different combinations of the criteria used to make the rate selection yield so-called "reduced rate modes", or simply "modes", referred to as mode 0, mode 1, mode 2, and so on. The present invention can use these modes for speech synthesis.
The speech received by the variable rate encoder 202 can be words or phrases from a preselected vocabulary that a communication device, such as a wireless telephone, vehicle support equipment, or other communication device, is designed to synthesize. This vocabulary can include the prompts and alerts presented to the device user. For example, by retrieving and synthesizing the five vocabulary words "call", "redial", "program", "or", and "exit", the speech synthesizer can be designed to provide the prompt "call, redial, program, or exit" when requesting a response from the user. Alternatively, the speech synthesizer can be designed to present previously stored information (such as from a phone book, lookup table, or database) to the device user in response to various device inputs, including audio. The speech received by the variable rate encoder 202 is encoded, and the encoded parameters are provided to a memory element or circuit 206 of the speech synthesizer 204 for storage.
The memory 206 holds or stores the parameters for as long as the device requires them for operation. However, it is generally desirable to store the parameters in such a way that they can be updated or replaced, for example when the vocabulary needs to change to suit changed conditions or upgraded device features. The memory 206 is therefore implemented as non-volatile but rewritable memory, which, as is known in the art, can be realized with flash-type memory elements.
As will be appreciated, the parameters can be loaded while the communication device is being manufactured. Because the prompts and alerts to be synthesized can be predetermined, they can be encoded and stored in the flash memory 206 during manufacture, before use. The parameters can be changed or replaced when the device is serviced, or by means of the recently developed over-the-air programming techniques for wireless devices.
Alternatively, the variable rate encoder 202 can receive speech signal input while the communication device is in operation. For example, the user may give a spoken response to a prompt from the speech synthesizer. The variable rate encoder 202 encodes the user's speech and can provide the encoded parameters to the flash memory 206 for storage and/or to a speech recognizer (not shown) for speech recognition. In this way, parameters can be entered by the user of each device (vocoder) after manufacture, either immediately upon or some time after the device enters service, for example to build a vocabulary tailored to that user's needs.
The flash memory 206 should be large enough to store the parameters of the preselected vocabulary and the parameters the user is expected to add. The size of the flash memory 206 can therefore vary with the requirements of the particular application. Post-manufacture storage can have the advantage of reducing memory requirements, because an individual device user does not need as large a vocabulary as a manufacturer would have to install to cover an entire, much larger device market. The speech synthesizer can record a name or other word, such as "Fred Smith", by detecting the endpoints of the target or desired phrase or speech, removing silence or redundancy, and encoding it. Speech can thus be recorded online and used later for synthesized speech output.
It should be noted that the variable rate encoder 202 can be configured according to the available memory and the desired voice quality. In a system with four rates in which full rate is 13 kbps, the average rate at 40% speech activity is typically 5.88 kbps. The use of variable rates provides high voice quality. However, if memory is limited, the variable rate encoder 202 can be configured to operate at a fixed half rate (about 800 bytes per second). Alternatively, rates can be selected from a subset of the predetermined rate set rather than from the whole set. For example, the reduced rate modes described above can be used to select each rate. In one embodiment of the present invention, the rates are divided into a set of four modes, designated modes 0, 1, 2, and 3. With a fixed rate for each mode, rates of about 1800 bytes per second, 1540 bytes per second, 1400 bytes per second, and 1100 bytes per second, respectively, can be used. The use of these fixed reduced rates allows very high quality speech, reaching landline quality, to be delivered at a given predetermined data rate. The four modes provide a good trade-off between synthesized speech quality and the amount of memory required.
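The figures in this paragraph can be checked with a little arithmetic. The Python sketch below assumes a two-state model (full rate while speech is present, eighth rate otherwise) and the rate set quoted earlier in the description (13.2, 6.2, 2.7, and 1.0 kbps); under those assumptions 40% speech activity gives 5.88 kbps. The 64 KiB memory size in the example is hypothetical.

```python
# Rates in kbps, as quoted earlier in the description for a 13.2 kbps peak.
RATES_KBPS = {"full": 13.2, "half": 6.2, "quarter": 2.7, "eighth": 1.0}

# Approximate storage rates per reduced rate mode, in bytes per second.
MODE_BYTES_PER_SECOND = {0: 1800, 1: 1540, 2: 1400, 3: 1100}


def average_rate_kbps(speech_activity, active="full", idle="eighth"):
    """Average bit rate for a two-state model: `active` rate during speech,
    `idle` rate during silence (a simplifying assumption)."""
    return (speech_activity * RATES_KBPS[active]
            + (1.0 - speech_activity) * RATES_KBPS[idle])


def bytes_per_second(kbps):
    """Convert kilobits per second to bytes per second."""
    return kbps * 1000.0 / 8.0


def seconds_of_speech(memory_bytes, storage_bytes_per_second):
    """How many seconds of encoded speech fit in a given memory size."""
    return memory_bytes / storage_bytes_per_second


avg = average_rate_kbps(0.40)                          # about 5.88 kbps
half = bytes_per_second(RATES_KBPS["half"])            # 775, i.e. roughly 800
capacity = {mode: seconds_of_speech(64 * 1024, rate)   # hypothetical 64 KiB
            for mode, rate in MODE_BYTES_PER_SECOND.items()}
```

With these numbers, a 64 KiB memory would hold roughly 36 seconds of speech in mode 0 and close to 60 seconds in mode 3, which is the kind of quality-versus-vocabulary trade-off the paragraph describes.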
In addition, the variable rate encoder 202 can switch between different operating modes (variable rate, all half rate, a subset of the variable rates, and so on) according to the momentary requirements of the application. Because there is a trade-off between voice quality and memory size, the configuration to adopt should depend on the application being implemented.
When speech synthesis is required, the speech parameters stored in the flash memory 206 are provided to a variable rate decoder 208. The variable rate decoder 208 is configured to decode the parameters produced by the corresponding variable rate encoder 202. An example of the variable rate decoder 208 is the decoder 104 of Fig. 1.
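The retrieve-and-decode flow can be outlined in a few lines of Python. This is a structural sketch only: the parameter store is modeled as a dictionary from vocabulary word to a list of (rate, payload) packets, and `decode_frame` stands in for whatever variable rate decoder the device provides. Both the data layout and the names are assumptions, not details from the patent.

```python
def synthesize_prompt(words, store, decode_frame):
    """Build the PCM samples for a prompt from stored vocabulary words.

    words        -- vocabulary words in the order they should be spoken
    store        -- hypothetical mapping: word -> list of (rate, payload)
    decode_frame -- callable (rate, payload) -> list of PCM samples,
                    standing in for the variable rate decoder
    """
    samples = []
    for word in words:
        for rate, payload in store[word]:
            # The stored rate tells the decoder which decoding element to use.
            samples.extend(decode_frame(rate, payload))
    return samples


def demo_decoder(rate, payload):
    """Placeholder decoder for the example: emits one sample per byte."""
    return list(payload)


demo_store = {
    "call": [("full", b"\x01\x02"), ("eighth", b"\x00")],
    "or": [("half", b"\x03")],
    "exit": [("full", b"\x04\x05")],
}
prompt_samples = synthesize_prompt(["call", "or", "exit"], demo_store, demo_decoder)
```

Storing the rate alongside each packet mirrors the rate indication that the decoder of Fig. 1 receives with the encoded parameters.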
Typically, the variable rate decoder 208 is implemented as part of a digital signal processor (DSP) used in the communication device. Such DSPs serve as, or form part of, the processing elements for signal encoding/decoding, combining, CDMA coding, power control, and so on. Because these elements are commonly used in wireless devices and many other devices to which the present invention can be applied, taking advantage of their presence allows the invention to be implemented very economically.
To implement the decoding function of the present invention, only a small amount of memory is needed, either within the DSP or coupled to it. Whether within the DSP or as a stand-alone decoder using a DSP, very little memory (program and data) is required to obtain speech synthesizer capability. The speech synthesizer can be implemented using well-known DSP circuits and devices, such as those available from Analog Devices and Qualcomm Inc.
The decoded parameters are generally provided to a codec 210 in the form of pulse code modulation (PCM) samples. The codec 210 converts the PCM samples from digital form to an analog signal. The analog signal is provided to a loudspeaker or other known audio output device 212, which projects or broadcasts the synthesized speech into the environment around the device, where it can be heard.
The present invention thus provides a speech synthesizer based on variable rate vocoding. The speech synthesizer is particularly suitable for wireless communication devices that already include a variable rate vocoder. That is, with suitable changes to program or operating instructions, or with the use of control hardware, the speech synthesizer can use the existing variable rate vocoder. In addition, the compression achieved by variable rate vocoding allows a predetermined vocabulary to be stored in the relatively limited memory of the wireless device or other equipment with which it is associated. Furthermore, when the variable rate vocoder is configured to give the speech synthesizer the desired voice quality and memory size, the trade-off between voice quality and memory size can be taken into account.
The invention can be used in a variety of communication devices and interface equipment. The example embodiments above are discussed with reference to wireless communication devices such as, but not limited to, cellular and satellite telephones (commonly called user terminals, subscriber units, or mobile stations, or simply "users," "mobiles," or "subscribers"). Other devices are also contemplated, such as message receivers and data transfer devices (for example, portable computers, personal digital assistants, modems, and machine controllers), as are interfaces to the public switched telephone network (PSTN) or to private communication channels.
The invention can be implemented with discrete circuitry in the form of dedicated components or an application-specific integrated circuit (ASIC) to form a speech synthesizer that can be installed in the desired device. Alternatively, working with existing digital signal processing elements, it can be added to other ASICs and devices by using a small amount of additional memory.
The foregoing description of the preferred embodiments is provided to enable any person skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the exercise of inventive faculty. Thus, the invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

1. An apparatus for synthesizing a preselected vocabulary in a wireless communication system, the vocabulary being encoded by a variable-rate encoder at a set of variable rates, characterized in that it comprises:
a memory for storing a set of speech parameters, the set of speech parameters representing the encoded preselected vocabulary;
a variable-rate encoder configured to accept spoken input from a user and to select a subset of the set of speech parameters according to the user's spoken input;
a variable-rate decoder for decoding the subset of speech parameters to produce decoded speech samples; and
a digital-to-analog converter for converting the speech samples into an analog signal for playback as synthesized speech.
2. The apparatus of claim 1, characterized in that the variable-rate encoder is based on linear prediction.
3. The apparatus of claim 1, characterized in that the variable-rate decoder is based on linear prediction.
4. The apparatus of claim 1, characterized in that the variable-rate encoder encodes the set of speech parameters at a set of variable rates, wherein the set of variable rates comprises full rate, half rate, quarter rate and eighth rate.
5. The apparatus of claim 4, characterized in that the variable-rate encoder encodes the set of speech parameters at a set of variable rates, wherein the set of variable rates comprises 13.2 kbps, 6.2 kbps, 2.7 kbps and 1.0 kbps.
6. The apparatus of claim 4, characterized in that the variable-rate encoder encodes the set of speech parameters at a rate that is fixed in response to one or more mode decision criteria.
7. The apparatus of claim 4, characterized in that the variable-rate encoder encodes the set of speech parameters at a rate fixed at the half rate.
8. The apparatus of claim 4, characterized in that the variable-rate encoder selects an encoding rate according to requirements on voice quality and on the size of the memory.
9. The apparatus of claim 1, characterized in that the wireless communication system is a CDMA system.
10. The apparatus of claim 1, characterized in that the variable-rate encoder comprises an enhanced variable-rate encoder.
11. A method for synthesizing a preselected vocabulary in a wireless communication system, the vocabulary being encoded by a variable-rate encoder at a set of variable rates, characterized in that it comprises the steps of:
receiving spoken input from a user;
retrieving a set of speech parameters stored in a memory, the set of speech parameters representing a portion of the encoded preselected vocabulary that corresponds to the user's spoken input;
decoding the set of speech parameters using a variable-rate coding scheme to produce decoded speech samples; and
converting the speech samples into an analog signal for playback as synthesized speech.
12. The method of claim 11, characterized in that the variable-rate coding scheme is based on linear prediction.
13. The method of claim 11, characterized in that the variable-rate decoding scheme is based on linear prediction.
14. The method of claim 11, characterized in that the set of speech parameters is encoded at a set of variable rates, wherein the set of variable rates comprises full rate, half rate, quarter rate and eighth rate.
15. The method of claim 14, characterized in that the full rate is 13.2 kbps, the half rate is about 6.2 kbps, the quarter rate is about 2.7 kbps and the eighth rate is about 1.0 kbps.
16. The method of claim 14, characterized in that the set of speech parameters is encoded at a rate that is fixed in response to one or more mode decision criteria.
17. The method of claim 14, characterized in that the set of speech parameters is encoded at a rate fixed at the half rate.
18. The method of claim 14, characterized in that an encoding rate is selected according to requirements on voice quality and on the size of the memory.
19. The method of claim 11, characterized in that the wireless communication system comprises a CDMA system.
20. The method of claim 11, characterized in that it further comprises the step of encoding the user's spoken input and adding the encoded spoken input to the memory as part of the set of speech parameters.
CNB00803589XA 1999-02-08 2000-02-04 Speech synthesizer based on variable rate speech coding Expired - Fee Related CN1212604C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US24660599A 1999-02-08 1999-02-08
US09/246,605 1999-02-08

Publications (2)

Publication Number Publication Date
CN1347548A CN1347548A (en) 2002-05-01
CN1212604C true CN1212604C (en) 2005-07-27

Family

ID=22931374

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB00803589XA Expired - Fee Related CN1212604C (en) 1999-02-08 2000-02-04 Speech synthesizer based on variable rate speech coding

Country Status (10)

Country Link
EP (1) EP1159738B1 (en)
JP (2) JP4503853B2 (en)
KR (1) KR100648872B1 (en)
CN (1) CN1212604C (en)
AT (1) ATE322731T1 (en)
AU (1) AU3589100A (en)
DE (1) DE60027140T2 (en)
ES (1) ES2263459T3 (en)
HK (1) HK1042980B (en)
WO (1) WO2000046795A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4867076B2 (en) * 2001-03-28 2012-02-01 日本電気株式会社 Compression unit creation apparatus for speech synthesis, speech rule synthesis apparatus, and method used therefor
KR100425982B1 (en) * 2001-12-29 2004-04-06 엘지전자 주식회사 Voice Data Rate Changing Method in IMT-2000 Network
KR100651731B1 (en) * 2003-12-26 2006-12-01 한국전자통신연구원 Apparatus and method for variable frame speech encoding/decoding
CN101692685B (en) * 2009-10-29 2012-05-30 中国电信股份有限公司 Method and system for improving acoustics of polyphonic ringtone
US9472181B2 (en) * 2011-02-03 2016-10-18 Panasonic Intellectual Property Management Co., Ltd. Text-to-speech device, speech output device, speech output system, text-to-speech methods, and speech output method
CN106952651A (en) * 2017-02-17 2017-07-14 福建星网智慧科技股份有限公司 A kind of voice processing apparatus transmits the method and system of voice
WO2021040490A1 (en) * 2019-08-30 2021-03-04 Samsung Electronics Co., Ltd. Speech synthesis method and apparatus

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0331858B1 (en) * 1988-03-08 1993-08-25 International Business Machines Corporation Multi-rate voice encoding method and device
BR9206143A (en) * 1991-06-11 1995-01-03 Qualcomm Inc Vocal end compression processes and for variable rate encoding of input frames, apparatus to compress an acoustic signal into variable rate data, prognostic encoder triggered by variable rate code (CELP) and decoder to decode encoded frames
JP3081300B2 (en) * 1991-10-01 2000-08-28 三洋電機株式会社 Residual driven speech synthesizer
TW271524B (en) * 1994-08-05 1996-03-01 Qualcomm Inc
JPH08263099A (en) * 1995-03-23 1996-10-11 Toshiba Corp Encoder
US6137840A (en) * 1995-03-31 2000-10-24 Qualcomm Incorporated Method and apparatus for performing fast power control in a mobile communication system
US5790957A (en) * 1995-09-12 1998-08-04 Nokia Mobile Phones Ltd. Speech recall in cellular telephone
US5914950A (en) * 1997-04-08 1999-06-22 Qualcomm Incorporated Method and apparatus for reverse link rate scheduling
DE29717372U1 (en) * 1997-09-29 1997-11-27 Siemens Ag Integrated circuit for a mobile radio with answering machine function

Also Published As

Publication number Publication date
JP4503853B2 (en) 2010-07-14
DE60027140D1 (en) 2006-05-18
CN1347548A (en) 2002-05-01
WO2000046795A9 (en) 2001-10-18
HK1042980B (en) 2005-12-23
ATE322731T1 (en) 2006-04-15
AU3589100A (en) 2000-08-25
EP1159738B1 (en) 2006-04-05
KR20020012157A (en) 2002-02-15
JP2010092059A (en) 2010-04-22
EP1159738A1 (en) 2001-12-05
ES2263459T3 (en) 2006-12-16
KR100648872B1 (en) 2006-11-24
HK1042980A1 (en) 2002-08-30
WO2000046795A1 (en) 2000-08-10
DE60027140T2 (en) 2007-01-11
JP2002536693A (en) 2002-10-29

Similar Documents

Publication Publication Date Title
CN1179324C (en) Method and apparatus for improving voice quality of tandemed vocoders
CN1168070C (en) Distributed voice recognition system
KR100574031B1 (en) Speech Synthesis Method and Apparatus and Voice Band Expansion Method and Apparatus
US20020103646A1 (en) Method and apparatus for performing text-to-speech conversion in a client/server environment
US5251261A (en) Device for the digital recording and reproduction of speech signals
CN1375096A (en) Spectral magnetude quantization for a speech coder
US20060235685A1 (en) Framework for voice conversion
CN1200404C (en) Relative pulse position of code-excited linear predict voice coding
JP2010092059A (en) Speech synthesizer based on variable rate speech coding
US5706392A (en) Perceptual speech coder and method
US5666350A (en) Apparatus and method for coding excitation parameters in a very low bit rate voice messaging system
WO2000077774A1 (en) Noise signal encoder and voice signal encoder
AU6533799A (en) Method for transmitting data in wireless speech channels
KR20020088088A (en) Data processing device
JP2001242896A (en) Speech coding/decoding apparatus and its method
US6792402B1 (en) Method and device for defining table of bit allocation in processing audio signals
WO2002021091A1 (en) Noise signal analyzer, noise signal synthesizer, noise signal analyzing method, and noise signal synthesizing method
EP1298647B1 (en) A communication device and a method for transmitting and receiving of natural speech, comprising a speech recognition module coupled to an encoder
EP0850471B1 (en) Very low bit rate voice messaging system using variable rate backward search interpolation processing
JP3183072B2 (en) Audio coding device
US6728344B1 (en) Efficient compression of VROM messages for telephone answering devices
JP3330178B2 (en) Audio encoding device and audio decoding device
KR0152341B1 (en) Output break removing apparatus and method of multimedia
JPH0414813B2 (en)
JP2000244614A (en) Portable radio terminal

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050727

Termination date: 20110204