USRE39336E1 - Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains - Google Patents
- Publication number
- USRE39336E1 (application US10/288,029)
- Authority
- US
- United States
- Prior art keywords
- filter
- demi
- synthesizer
- cross fade
- waveform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G10L13/07—Concatenation rules
Definitions
- the present invention relates generally to speech synthesis and more particularly to a concatenative synthesizer based on a source-filter model in which the source signal and filter parameters are generated by independent cross fade mechanisms.
- Modern day speech synthesis involves many tradeoffs. For limited vocabulary applications, it is usually feasible to store entire words as digital samples to be concatenated into sentences for playback. Given a good prosody algorithm to place the stress on the appropriate words, these systems tend to sound quite natural, because the individual words can be accurate reproductions of actual human speech. However, for larger vocabularies it is not feasible to store complete word samples of actual human speech. Therefore, a number of speech synthesis researchers have been experimenting with breaking speech into smaller units and concatenating those units into words, phrases and ultimately sentences.
- the formant-based speech synthesizer of the invention is based upon a source-filter model that closely ties the source and filter synthesizer components to physical structures within the human vocal tract.
- the source model is based on a best estimate of the source signal produced at the glottis
- the filter model is based on the resonant (formant-producing) structures generally above the glottis. For this reason, we call our synthesis technique “formant-based” synthesis.
- Formant-based synthesis technique
- Our synthesis technique involves identifying and extracting the formants from an actual speech signal (labeled to identify approximate demi-syllable areas) and then using this information to construct demi-syllable segments each represented by a set of filter parameters and a source signal waveform.
- the invention provides a novel cross fade technique to smoothly concatenate consecutive demi-syllable segments.
- our system allows us to perform cross fade in the filter parameter domain while simultaneously but independently performing “cross fade” (parameter interpolation) of the source waveforms in the time domain.
- the filter parameters model vocal tract effects, while the source waveforms model the glottal source.
- the technique has the advantage of restricting prosodic modification to only the glottal source, if desired. This can reduce distortion usually associated with the conventional blending techniques.
- the invention further provides a system whereby interaction between initial and final demi-syllables can be taken into account.
- Demi-syllables represent the presently preferred concatenation unit. Ideally, concatenation units are selected at points of least co-articulatory effect.
- the syllable is a natural unit for this purpose, but choosing the syllable requires a large amount of memory. For systems with limited available memory, the demi-syllable is preferred.
- This interaction information is stored in a waveform database containing not only the source waveform data and filter parameter data, but also the necessary label or marker data and context data used by the system in applying formant modification rules.
- the system operates upon an input phoneme string by first performing unit selection, then building an acoustic string of syllable objects and then rendering those objects by performing the cross fade operations in both source signal and filter parameter domains.
- the resulting outputs are source waveforms and filter parameters that may then be used in a source-filter model to generate synthesized speech.
- the result is a natural sounding speech synthesizer that can be incorporated into many different consumer products.
- while the techniques can be applied to any speech coding application, the invention is well suited for use as a concatenative speech synthesizer in text-to-speech applications.
- This system is designed to work within the current memory and processor constraints found in many consumer applications.
- the synthesizer is designed to fit into a small memory footprint, while providing better sounding synthesis than other synthesizers of larger size.
- FIG. 1 is a block diagram illustrating the basic source-filter model with which the invention may be employed
- FIG. 2 is a diagram of speech synthesizer technology, illustrating the spectrum of possible source-filter combinations, particularly pointing out the domain in which the synthesizer of the present invention resides;
- FIG. 3 is a flowchart diagram illustrating the procedure for constructing waveform databases used in the present invention
- FIGS. 4A and 4B comprise a flowchart diagram illustrating the synthesis process according to the invention.
- FIG. 5 is a waveform diagram illustrating time domain cross fade of source waveform snippets
- FIG. 6 is a block diagram of the presently preferred apparatus useful in practicing the invention.
- FIG. 7 is a flowchart diagram illustrating the process in accordance with the invention.
- speech can be modeled as an initial source component 10 , processed through a subsequent filter component 12 .
- either source or filter, or both can be very simple or very complex.
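To make the source-filter decomposition concrete, here is a minimal numerical sketch (illustrative code, not taken from the patent): a hypothetical impulse-train source is shaped by a single two-pole resonator, the simplest formant-style filter. All names and constants are invented for the example.

```python
import numpy as np

def impulse_train(n_samples, period):
    """Crude stand-in for a glottal source: unit impulses every `period` samples."""
    src = np.zeros(n_samples)
    src[::period] = 1.0
    return src

def resonator(signal, f_formant, bandwidth, fs):
    """Shape a source with one two-pole resonance (a single formant)."""
    r = np.exp(-np.pi * bandwidth / fs)          # pole radius from bandwidth
    theta = 2.0 * np.pi * f_formant / fs         # pole angle from formant frequency
    a1, a2 = 2.0 * r * np.cos(theta), -r * r
    out = np.zeros_like(signal)
    for n in range(len(signal)):
        out[n] = signal[n]
        if n >= 1:
            out[n] += a1 * out[n - 1]
        if n >= 2:
            out[n] += a2 * out[n - 2]
    return out

fs = 8000
source = impulse_train(400, period=80)                         # ~100 Hz pitch
speech = resonator(source, f_formant=500, bandwidth=100, fs=fs)
```

Moving complexity from the filter into the source (or back) traces out the spectrum of synthesizer designs the next paragraphs survey.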
- one extreme is the PCM (Pulse Code Modulation) synthesizer, which pairs a fully detailed sampled source with a very simple filter. In the PCM synthesizer all a priori knowledge is embedded in the source and none in the filter.
- another synthesis method used a simple repeating pulse train as the source and a comparatively complex filter based on LPC (Linear Predictive Coding). Note that neither of these conventional synthesis techniques attempted to model the physical structures within the human vocal tract that are responsible for producing human speech.
- the present invention employs a formant-based synthesis model that closely ties the source and filter synthesizer components to the physical structures within the human vocal tract.
- the synthesizer of the present invention bases the source model on a best estimate of the source signal produced at the glottis.
- the filter model is based on the resonant (formant producing) structures located generally above the glottis. For these reasons, we call our synthesis technique “formant-based”.
- FIG. 2 summarizes various source-filter combinations, showing on the vertical axis a comparative measure of the complexity of the corresponding source or filter component.
- the source and filter components are illustrated as side-by-side vertical axes.
- along the source axis, relative complexity decreases from top to bottom, whereas along the filter axis relative complexity increases from top to bottom.
- Several generally horizontal or diagonal lines connect a point on the source axis with a point on the filter axis to represent a particular type of speech synthesizer.
- the horizontal line 14 connects a fairly complex source with a fairly simple filter to define the TD-PSOLA synthesizer, an example of one type of well-known synthesizer technology in which a PCM source waveform is applied to an identity filter.
- horizontal line 16 connects a relatively simple source with a relatively complex filter to define another known synthesizer, the phase vocoder or harmonic synthesizer.
- This synthesizer in essence uses a simple form of pulse train source waveform and a complex filter designed using spectral analysis techniques such as Fast Fourier Transforms (FFT).
- the classic LPC synthesizer is represented by diagonal line 17, which connects a pulse train source with an LPC filter.
- the Klatt synthesizer 18 is defined by a parametric source applied through a filter comprised of formants and zeros.
- the present invention occupies a location within FIG. 2 illustrated generally by the shaded region 20 .
- the present invention can use a source waveform ranging from a pure glottal source to a glottal source with nasal effects present.
- the filter can be a simple formant filter bank or a somewhat more complex filter having formants and zeros.
- Region 20 corresponds as closely as practical to the natural separation in humans between the glottal voice source and the vocal tract (filter).
- Region 20 thus lies between the pure time domain representation of TD-PSOLA and the pure frequency domain representation of the phase vocoder or harmonic synthesizer.
- the presently preferred implementation of our formant-based synthesizer uses a technique employing a filter and an inverse filter to extract source signal and formant parameters from human speech. The extracted signals and parameters are then used in the source-filter model corresponding to region 20 in FIG. 2 .
- the presently preferred procedure for extracting source and filter parameters from human speech is described later in this specification. The present description will focus on other aspects of the formant-based synthesizer, namely those relating to selection of concatenative units and cross fade.
- the formant-based synthesizer of the invention defines concatenation units representing small pieces of digitized speech that are then concatenated together for playback through a synthesizer source module.
- the cross fade techniques of the invention can be employed with concatenation units of various sizes.
- the syllable is a natural unit for the purpose, but where memory is limited choosing the syllable as the basic concatenation unit may be prohibitive in terms of memory requirements. Accordingly, the present implementation uses the demi-syllable as the basic concatenation unit.
- An important part of the formant-based synthesizer involves performing a cross fade to smoothly join adjacent demi-syllables so that the resulting syllables sound natural and without glitches or distortion. As will be more fully explained below, the present system performs this cross fade in both the time domain and the frequency domain, involving both components of the source-filter model; the source waveforms and the formant filter parameters.
- the preferred embodiment stores source waveform data and filter parameter data in a waveform database.
- the database in its maximal form stores digitized speech waveforms and filter parameter data for at least one example of each demi-syllable found in the natural language (e.g. English).
- the database can be pruned to eliminate redundant speech waveforms. Because adjacent demi-syllables can significantly affect one another, the preferred system stores the data for each different context encountered.
- FIG. 3 shows the presently preferred technique for constructing the waveform database.
- the boxes with double-lined top edges are intended to depict major processing block headings.
- the single-lined boxes beneath these headings represent the individual steps or modules that comprise the major block designated by the heading block.
- data for the waveform database is constructed as at 40 by first compiling a list of demi-syllables and boundary sequences as depicted at step 42 . This is accomplished by generating all possible combinations of demi-syllables (step 44 ) and by then excluding any unused combinations as at 46 .
- Step 44 may be a recursive process whereby all different permutations of initial and final demi-syllables are generated. This exhaustive list of all possible combinations is then pruned to reduce the size of the database. Pruning is accomplished in step 46 by consulting a word dictionary 48 that contains phonetic transcriptions of all words that the synthesizer will pronounce. These phonetic transcriptions are used to weed out any demi-syllable combinations that do not occur in the words the synthesizer will pronounce.
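The generate-then-prune step can be sketched in a few lines; the inventory and dictionary below are tiny hypothetical stand-ins for the full demi-syllable list and the word dictionary 48.

```python
# Hypothetical inventory; a real system enumerates every demi-syllable of English.
initials = ["ha", "ba", "pa"]
finals = ["aw", "at", "an"]

# Step 44: exhaustively generate every initial/final pairing.
all_combinations = {(i, f) for i in initials for f in finals}

# Step 46: prune against a word dictionary of phonetic transcriptions,
# keeping only pairings that actually occur in pronounceable words.
word_dictionary = {"house": [("ha", "aw")], "bat": [("ba", "at")]}
used = {pair for pairs in word_dictionary.values() for pair in pairs}
pruned = all_combinations & used
```

Only the pruned set needs recorded waveform data, which is what keeps the database within a small memory footprint.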
- the preferred embodiment also treats boundaries between syllables, such as those that occur across word boundaries or sentence boundaries. These boundary units (often consonant clusters) are constructed from diphones sampled from the correct context.
- One way to exclude unused boundary unit combinations is to provide a text corpus 50 containing exemplary sentences formed using the words found in word dictionary 48 . These sentences are used to define different word boundary contexts such that boundary unit combinations not found in the text corpus may be excluded at step 46 .
- the sampled waveform data associated with each demi-syllable is recorded and labeled at step 52 .
- This entails applying phonetic markers at the beginning and ending of the relevant portion of each demi-syllable, as indicated at step 54 .
- the relevant parts of the sampled waveform data are extracted and labeled by associating the extracted portions with the corresponding demi-syllable or boundary unit from which the sample was derived.
- the next step involves extracting source and filter data from the labeled waveform data as depicted generally at step 56 .
- Step 56 involves a technique described more fully below in which actual human speech is processed through a filter and its inverse filter using a cost function that helps extract an inherent source signal and filter parameters from each of the labeled waveform data.
- the extracted source and filter data are then stored at step 58 in the waveform database 60 .
- the maximal waveform database 60 thus contains source (waveform) data and filter parameter data for each of the labeled demi-syllables and boundary units. Once the waveform database has been constructed, the synthesizer may now be used.
- the input string may be a phoneme string representing a phrase or sentence, as indicated diagrammatically at 64 .
- the phoneme string may include aligned intonation patterns 66 and syllable duration information 68 .
- the intonation patterns and duration information supply prosody information that the synthesizer may use to selectively alter the pitch and duration of syllables to give a more natural human-like inflection to the phrase or sentence.
- the phoneme string is processed through a series of steps whereby information is extracted from the waveform database 60 and rendered by the cross fade mechanisms.
- unit selection is performed as indicated by the heading block 70 .
- This entails applying context rules as at 72 to determine what data to extract from waveform database 60 .
- the context rules depicted diagrammatically at 74 , specify which demi-syllables or boundary units to extract from the database under certain conditions. For example, if the phoneme string calls for a demi-syllable that is directly represented in the database, then that demi-syllable is selected.
- the context rules take into account the demi-syllables of neighboring sound units in making selections from the waveform database.
- the context rules will specify the closest approximation to the required demi-syllable.
- the context rules are designed to select the demi-syllables that will sound most natural when concatenated. Thus the context rules are based on linguistic principles.
- the context rules will specify the next-most desirable context.
- the rules may choose a segment preceded by a different bilabial, such as /p/.
- the synthesizer builds an acoustic string of syllable objects corresponding to the phoneme string supplied as input.
- This step is indicated generally at 76 and entails constructing source data for the string of demi-syllables as specified during unit selection.
- This source data corresponds to the source component of the source-filter model.
- Filter parameters are also extracted from the database and manipulated to build the acoustic string. The details of filter parameter manipulation are discussed more fully below.
- the presently preferred embodiment defines the string of syllable objects as a linked list of syllables 78 , which in turn, comprises a linked list of demi-syllables 80 .
- the demi-syllables contain waveform snippets 82 obtained from waveform database 60 .
- a series of rendering steps are performed to cross fade the source data in the time domain and independently cross fade the filter parameters in the frequency domain.
- the rendering steps applied in the time domain appear beginning at step 84 .
- the rendering steps applied in the frequency domain appear beginning at step 110 (FIG. 4 B).
- FIG. 5 illustrates the presently preferred technique for performing a cross fade of the source data in the time domain.
- a syllable of duration S is comprised of initial and final demi-syllables of duration A and B.
- the waveform data of demi-syllable A appears at 86 and the waveform data of demi-syllable B appears at 88 .
- These waveform snippets are slid into position (arranged in time) so that both demi-syllables fit within syllable duration S. Note that there is some overlap between demi-syllables A and B.
- the cross fade mechanism of the preferred embodiment performs a linear cross fade in the time domain.
- This mechanism is illustrated diagrammatically at 90 , with the linear cross fade function being represented at 92 .
- at the beginning of the overlap region, demi-syllable A receives full emphasis while demi-syllable B receives zero emphasis.
- at the end of the overlap region, demi-syllable A receives zero emphasis while demi-syllable B receives full emphasis.
- in between, demi-syllable A is gradually reduced in emphasis while demi-syllable B is gradually increased in emphasis.
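A minimal sketch of this time-domain linear cross fade (illustrative code, not the patent's implementation): the two snippets overlap by `len(a) + len(b) - syllable_len` samples, and a linear ramp trades A's emphasis for B's across that region.

```python
import numpy as np

def crossfade_demisyllables(a, b, syllable_len):
    """Arrange snippet `a` at the start and `b` at the end of a syllable of
    `syllable_len` samples, linearly cross fading over their overlap region."""
    overlap = len(a) + len(b) - syllable_len       # samples where A and B coincide
    out = np.zeros(syllable_len)
    out[:len(a) - overlap] = a[:len(a) - overlap]  # A alone: full emphasis
    out[len(a):] = b[overlap:]                     # B alone: full emphasis
    ramp = np.linspace(1.0, 0.0, overlap)          # A's emphasis falls...
    out[len(a) - overlap:len(a)] = (a[len(a) - overlap:] * ramp
                                    + b[:overlap] * (1.0 - ramp))  # ...as B's rises
    return out
```

The ramp here is linear, matching the cross fade function 92 of FIG. 5; other monotonic fade shapes would drop in the same way.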
- a separate cross fade process is performed on the filter parameter data associated with the extracted demi-syllables.
- the procedure begins by applying filter selection rules 98 to obtain filter parameter data from database 60 . If the requested syllable is directly represented in a syllable exception component of database 60 , then filter data corresponding to that syllable is used as at step 100 . Alternatively, if the filter data is not directly represented as a full syllable in the database, then new filter data are generated as at step 102 by applying a cross fade operation upon data from two demi-syllables in the frequency domain.
- the filter data cross fade operation may employ data having a different type of concatenation unit than the waveform cross fade operation.
- the cross fade operation entails selecting a cross fade region across which the filter parameters of successive demi-syllables will be cross faded and by then applying a suitable cross fade function as at 106 .
- the cross fade function is applied in the filter domain and may be a linear function (similar to that illustrated in FIG. 5 ), a sigmoidal function or some other suitable function.
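As a hedged sketch of interpolating filter parameters with either fade shape (the three-formant vectors and the sigmoid steepness of 10 are arbitrary choices for illustration):

```python
import numpy as np

def crossfade_formants(params_a, params_b, n_frames, shape="linear"):
    """Interpolate between two formant parameter vectors over `n_frames` frames.
    `shape` picks the fade function: linear or sigmoidal."""
    t = np.linspace(0.0, 1.0, n_frames)            # weight on the B parameters
    if shape == "sigmoid":
        t = 1.0 / (1.0 + np.exp(-10.0 * (t - 0.5)))
    return (1.0 - t)[:, None] * np.asarray(params_a, float) \
        + t[:, None] * np.asarray(params_b, float)

# e.g. fade F1/F2/F3 (Hz) from one demi-syllable's values to the next's
frames = crossfade_formants([700, 1200, 2600], [300, 2200, 3000], n_frames=5)
```

Because this interpolation happens entirely in the filter parameter domain, it proceeds independently of the time-domain waveform cross fade described above.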
- the filter parameter data are stored at 108 for later use in the source-filter model synthesizer.
- the choice of cross fade region and cross fade function is data dependent.
- the objective of performing cross fade in the frequency domain is to eliminate unwanted glitches or resonances without degrading important diphthongs.
- cross-fade regions must be identified in which the trajectories of the speech units to be joined are as similar as possible. For example, in the construction of the word “house”, demi-syllable filter units for /haw/- and -/aws/ could be concatenated with overlap in the nuclear /a/ region.
- once the source data and filter data have been compiled and rendered according to the preceding steps, they are output as at 110 to the respective source waveform databank 112 and filter parameters databank 114 for use by the source filter model synthesizer 116 to output synthesized speech.
- FIG. 6 illustrates a system according to the invention by which the source waveform may be extracted from a complex unit signal.
- a filter/inverse-filter pair is used in the extraction process.
- filter 110 is defined by its filter model 112 and filter parameters 114 .
- the present invention also employs an inverse filter 116 that corresponds to the inverse of filter 110 .
- Filter 116 would, for example, have the same filter parameters as filter 110 , but would substitute zeros at each location where filter 110 has poles.
- the filter 110 and inverse filter 116 define a reciprocal system in which the effect of inverse filter 116 is negated or reversed by the effect of filter 110 .
- a speech waveform input to inverse filter 116 and subsequently processed by filter 110 results in an output waveform that, in theory, is identical to the input waveform.
- in practice, slight variations in filter tolerance or slight differences between filters 116 and 110 would result in an output waveform that deviates somewhat from an identical match of the input waveform.
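For an all-pole filter 1/A(z), the inverse filter is simply the FIR polynomial A(z), whose zeros sit exactly where the filter has poles. The round trip below (illustrative code, with an arbitrary stable pole pair) recovers the input, as the text describes.

```python
import numpy as np

def inverse_filter(signal, a):
    """Apply A(z), the FIR inverse of an all-pole filter 1/A(z); its zeros sit
    exactly where the all-pole filter has poles."""
    return np.convolve(signal, a)[:len(signal)]

def forward_filter(residual, a):
    """Apply 1/A(z), the all-pole filter that undoes inverse_filter."""
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out

# Round trip: inverse filter, then filter; in theory the input comes back exactly.
a = np.array([1.0, -1.6, 0.9])                   # arbitrary stable pole pair
x = np.random.default_rng(0).standard_normal(50)
residual = inverse_filter(x, a)
reconstructed = forward_filter(residual, a)
```

With mismatched parameters the reconstruction would deviate, which is exactly the deviation the cost function below is designed to measure and minimize.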
- the output residual signal at node 120 is processed by employing a cost function 122 .
- this cost function analyzes the residual signal according to one or more of a plurality of processing functions described more fully below, to produce a cost parameter.
- the cost parameter is then used in subsequent processing steps to adjust filter parameters 114 in an effort to minimize the cost parameter.
- the cost minimizer block 124 diagrammatically represents the process by which filter parameters are selectively adjusted to produce a resulting reduction in the cost parameter. This may be performed iteratively, using an algorithm that incrementally adjusts filter parameters while seeking the minimum cost.
- the resulting residual signal at node 120 may then be used to represent an extracted source signal for subsequent source-filter model synthesis.
- the filter parameters 114 that produced the minimum cost are then used as the filter parameters to define filter 110 for use in subsequent source-filter model synthesis.
- FIG. 7 illustrates the process by which the source signal is extracted, and the filter parameters identified, to achieve a source-filter model synthesis system in accordance with the invention.
- a filter model is defined at step 150 . Any suitable filter model that lends itself to a parameterized representation may be used.
- An initial set of parameters is then supplied at step 152 . Note that the initial set of parameters will be iteratively altered in subsequent processing steps to seek the parameters that correspond to a minimized cost function. Different techniques may be used to avoid a sub-optimal solution corresponding to local minima.
- the initial set of parameters used at step 152 can be selected from a set or matrix of parameters designed to supply several different starting points in order to avoid the local minima. Thus in FIG. 7 note that step 152 may be performed multiple times for different initial sets of parameters.
- the filter model defined at 150 and the initial set of parameters defined at 152 are then used at step 154 to construct a filter (as at 156 ) and an inverse filter (as at 158 ).
- the speech signal is applied to the inverse filter at 160 to extract a residual signal as at 164 .
- the preferred embodiment uses a Hanning window centered on the current pitch epoch and adjusted so that it covers two pitch periods. Other windows are also possible.
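A two-pitch-period Hanning window centered on a pitch epoch can be sketched as follows (hypothetical helper; `epoch` and `period` are in samples and assumed to lie well inside the signal):

```python
import numpy as np

def windowed_frame(signal, epoch, period):
    """Extract a two-pitch-period frame centered on the pitch `epoch`
    (both arguments in samples), shaped by a Hanning window."""
    start, stop = epoch - period, epoch + period
    return signal[start:stop] * np.hanning(stop - start)
```

The window tapers to zero at both ends, so adjacent pitch-synchronous frames can be analyzed (or overlap-added) without edge discontinuities.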
- the residual signal is then processed at 166 to extract data points for use in the arc-length calculation.
- the residual signal may be processed in a number of different ways to extract the data points. As illustrated at 168 , the procedure may branch to one or more of a selected class of processing routines. Examples of such routines are illustrated at 170 . Next the arc-length (or square-length) calculation is performed at 172 . The resultant value serves as a cost parameter.
- the filter parameters are selectively adjusted at step 174 and the procedure is iteratively repeated as depicted at 176 until a minimum cost is achieved.
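The adjust-and-repeat loop can be caricatured as a search over candidate parameter sets scored by the arc length of the residual. The toy signal and the candidate list below are invented for illustration; a real implementation would adjust the parameters incrementally rather than enumerate candidates.

```python
import numpy as np

def inverse_filter(signal, a):
    """Whiten `signal` with the FIR inverse A(z) of an all-pole filter 1/A(z)."""
    return np.convolve(signal, a)[:len(signal)]

def arc_length_cost(residual):
    """Arc length of the residual waveform; a smoother residual costs less."""
    dy = np.diff(residual)
    return float(np.sum(np.sqrt(1.0 + dy * dy)))

def pick_best_params(signal, candidates):
    """Steps 174/176 in miniature: score each candidate parameter set by the
    arc length of its residual and keep the cheapest."""
    return min(candidates, key=lambda a: arc_length_cost(inverse_filter(signal, a)))

# Build a toy "speech" signal from a smooth pulse-train source and a known
# all-pole filter, then recover the true parameters by cost minimization.
a_true = np.array([1.0, -1.6, 0.9])
source = np.zeros(200)
for start in range(0, 200, 40):
    source[start:start + 16] = np.hanning(16)    # smooth glottal-like pulses
speech = np.zeros(200)
for n in range(200):                             # apply 1/A(z) to the source
    speech[n] = source[n]
    if n >= 1:
        speech[n] -= a_true[1] * speech[n - 1]
    if n >= 2:
        speech[n] -= a_true[2] * speech[n - 2]

candidates = [np.array([1.0, 0.0, 0.0]),         # identity: residual = speech
              np.array([1.0, -1.0, 0.5]),        # mismatched pole pair
              a_true]
best = pick_best_params(speech, candidates)
```

The true parameters win because their residual is exactly the smooth source, while any mismatch leaves formant ringing in the residual and inflates its arc length.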
- the extracted residual signal corresponding to that minimum cost is used at step 178 as the source signal.
- the filter parameters associated with the minimum cost are used as the filter parameters (step 180 ) in a source-filter model.
Abstract
Description
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/288,029 USRE39336E1 (en) | 1998-11-25 | 2002-11-05 | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/200,327 US6144939A (en) | 1998-11-25 | 1998-11-25 | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains |
US10/288,029 USRE39336E1 (en) | 1998-11-25 | 2002-11-05 | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/200,327 Reissue US6144939A (en) | 1998-11-25 | 1998-11-25 | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains |
Publications (1)
Publication Number | Publication Date |
---|---|
USRE39336E1 true USRE39336E1 (en) | 2006-10-10 |
Family
ID=22741247
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/200,327 Ceased US6144939A (en) | 1998-11-25 | 1998-11-25 | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains |
US10/288,029 Expired - Lifetime USRE39336E1 (en) | 1998-11-25 | 2002-11-05 | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/200,327 Ceased US6144939A (en) | 1998-11-25 | 1998-11-25 | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains |
Country Status (5)
Country | Link |
---|---|
US (2) | US6144939A (en) |
EP (2) | EP1005017B1 (en) |
JP (1) | JP3408477B2 (en) |
DE (1) | DE69909716T2 (en) |
ES (1) | ES2204071T3 (en) |
US7571104B2 (en) * | 2005-05-26 | 2009-08-04 | Qnx Software Systems (Wavemakers), Inc. | Dynamic real-time cross-fading of voice prompts |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8024193B2 (en) * | 2006-10-10 | 2011-09-20 | Apple Inc. | Methods and apparatus related to pruning for concatenative text-to-speech synthesis |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
CN101281744B (en) | 2007-04-04 | 2011-07-06 | 纽昂斯通讯公司 | Method and apparatus for analyzing and synthesizing voice |
US8321222B2 (en) * | 2007-08-14 | 2012-11-27 | Nuance Communications, Inc. | Synthesis by generation and concatenation of multi-form segments |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US8332215B2 (en) * | 2008-10-31 | 2012-12-11 | Fortemedia, Inc. | Dynamic range control module, speech processing apparatus, and method for amplitude adjustment for a speech signal |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US20120309363A1 (en) | 2011-06-03 | 2012-12-06 | Apple Inc. | Triggering notifications associated with tasks items that represent tasks to perform |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
DE202011111062U1 (en) | 2010-01-25 | 2019-02-19 | Newvaluexchange Ltd. | Device and system for a digital conversation management platform |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
DE112014000709B4 (en) | 2013-02-07 | 2021-12-30 | Apple Inc. | METHOD AND DEVICE FOR OPERATING A VOICE TRIGGER FOR A DIGITAL ASSISTANT |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
EP3008641A1 (en) | 2013-06-09 | 2016-04-20 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
WO2014200731A1 (en) | 2013-06-13 | 2014-12-18 | Apple Inc. | System and method for emergency calls initiated by voice command |
KR101749009B1 (en) | 2013-08-06 | 2017-06-19 | 애플 인크. | Auto-activating smart responses based on activities from remote devices |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
AU2015266863B2 (en) | 2014-05-30 | 2018-03-15 | Apple Inc. | Multi-command single utterance input method |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179309B1 (en) | 2016-06-09 | 2018-04-23 | Apple Inc | Intelligent automated assistant in a home environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | Far-field extension for digital assistant services |
Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62100027A (en) * | 1985-10-28 | 1987-05-09 | Hitachi Ltd | Voice coding system |
JPS62102294A (en) | 1985-10-30 | 1987-05-12 | 株式会社日立製作所 | Voice coding system |
JPS62208099A (en) | 1986-04-24 | 1987-09-12 | ヤマハ株式会社 | Musical sound generator |
JPS63127630A (en) * | 1986-11-18 | 1988-05-31 | Hitachi Ltd | Voice compression processing unit |
US4910781A (en) * | 1987-06-26 | 1990-03-20 | At&T Bell Laboratories | Code excited linear predictive vocoder using virtual searching |
US4912768A (en) * | 1983-10-14 | 1990-03-27 | Texas Instruments Incorporated | Speech encoding process combining written and spoken message codes |
US5060268A (en) * | 1986-02-21 | 1991-10-22 | Hitachi, Ltd. | Speech coding system and method |
EP0504684A2 (en) * | 1991-03-19 | 1992-09-23 | Casio Computer Company Limited | Digital pitch shifter |
JPH06175692A (en) | 1992-12-08 | 1994-06-24 | Meidensha Corp | Data connecting method of voice synthesizer |
US5400434A (en) * | 1990-09-04 | 1995-03-21 | Matsushita Electric Industrial Co., Ltd. | Voice source for synthetic speech system |
JPH07177031A (en) | 1993-12-20 | 1995-07-14 | Fujitsu Ltd | Voice coding control system |
US5536902A (en) * | 1993-04-14 | 1996-07-16 | Yamaha Corporation | Method of and apparatus for analyzing and synthesizing a sound by extracting and controlling a sound parameter |
US5729694A (en) * | 1996-02-06 | 1998-03-17 | The Regents Of The University Of California | Speech coding, reconstruction and recognition using acoustics and electromagnetic waves |
US5845247A (en) * | 1995-09-13 | 1998-12-01 | Matsushita Electric Industrial Co., Ltd. | Reproducing apparatus |
US5970453A (en) * | 1995-01-07 | 1999-10-19 | International Business Machines Corporation | Method and system for synthesizing speech |
US6041300A (en) * | 1997-03-21 | 2000-03-21 | International Business Machines Corporation | System and method of using pre-enrolled speech sub-units for efficient speech synthesis |
WO2000030069A2 (en) * | 1998-11-13 | 2000-05-25 | Lernout & Hauspie Speech Products N.V. | Speech synthesis using concatenation of speech waveforms |
US6119086A (en) * | 1998-04-28 | 2000-09-12 | International Business Machines Corporation | Speech coding via speech recognition and synthesis based on pre-enrolled phonetic tokens |
US6169240B1 (en) * | 1997-01-31 | 2001-01-02 | Yamaha Corporation | Tone generating device and method using a time stretch/compression control technique |
US6266638B1 (en) * | 1999-03-30 | 2001-07-24 | At&T Corp | Voice quality compensation system for speech synthesis based on unit-selection speech database |
US20010056347A1 (en) * | 1999-11-02 | 2001-12-27 | International Business Machines Corporation | Feature-domain concatenative speech synthesis |
US6496801B1 (en) * | 1999-11-02 | 2002-12-17 | Matsushita Electric Industrial Co., Ltd. | Speech synthesis employing concatenated prosodic and acoustic templates for phrases of multiple words |
1998
- 1998-11-25: US application US 09/200,327 granted as US6144939A (not active: ceased)

1999
- 1999-11-22: EP application EP99309293A granted as EP1005017B1 (not active: expired, lifetime)
- 1999-11-22: EP application EP03008984A published as EP1347440A3 (not active: withdrawn)
- 1999-11-22: DE application DE69909716T granted as DE69909716T2 (not active: expired, fee related)
- 1999-11-22: ES application ES99309293T granted as ES2204071T3 (not active: expired, lifetime)
- 1999-11-24: JP application JP33263399A granted as JP3408477B2 (not active: expired, fee related)

2002
- 2002-11-05: US application US 10/288,029 granted as USRE39336E1 (not active: expired, lifetime)
Non-Patent Citations (19)
Title |
---|
"A Diphone Synthesis System Based On Time-Domain Prosodic Modifications Of Speech", Christian Hamon, Eric Moulines, and Francis Charpentier, Centre National d'Etudes des Telecommunications, France, S5.7, p. 238. * |
"A New Method Of Generating Speech Synthesis Units Based On Phonological Knowledge and Clustering Technique", Yuki Yoshida, Shin'ya Nakajima, Kazuo Hakoda and Tomohisa Hirokawa, NTT Human Interface Laboratories, Japan, p. 1712. * |
"A New Text-To-Speech Synthesis System", E. Lewis, University of Bristol, U. K., and M. A. A. Tatham, University of Essex, U. K., Eurospeech, p. 1235. * |
"Automatic Generation Of Synthesis Units For Trainable Text-To-Speech Systems", H. Hon, A. Acero, X. Huang, J. Liu, and M. Plumpe, Microsoft Research, Redmond, Washington. *
"Automatically Clustering Similar Units For Unit Selection In Speech Synthesis", Alan W. Black and Paul Taylor, Centre for Speech Technology Research, University of Edinburgh, U. K. * |
"Combining Concatenation and Formant Synthesis for Improved Intelligibility and Naturalness in Text-to-Speech Systems", Steve Pearson, Frode Holm and Kazue Hata, International Journal Of Speech Technology 1, p. 103, 1997. * |
"Diphone Synthesis Using Unit Selection", Mark Beutnagel, Alistair Conkie, and Ann K. Syrdal, AT&T Labs-Research, New Jersey. * |
"High Quality Text-To-Speech Synthesis: A Comparison Of Four Candidate Algorithms", T. Dutoit, Faculte Polytechnique de Mons, Belgium. *
"High-Quality Speech Synthesis Using Context-Dependent Syllabic Units", Takashi Saito, Yasuhide Hashimoto, and Masaharu Sakamoto, IBM Research, Tokyo Research Laboratory, IBM Japan, Ltd., Japan, p. 381, IEEE 1996. * |
"Residual-Based Speech Modification Algorithms for Text-to-Speech Synthesis", M. Edgington and A. Lowry, BT Laboratories, Martlesham Heath, U.K., p. 1425. *
"Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic plus Stochastic Decomposition", Xavier Serra and Julius Smith III, Computer Music Journal, vol. 14, No. 4, p. 12, Winter 1990. * |
"Speech Synthesis", M. Stella, p. 435. * |
"Text To Speech Synthesizer Using Superposition of Sinusoidal Waves Generated By Synchronized Oscillators", K. Shirai, K. Hashimoto and T. Kobayashi, Department of Electrical Engineering, Waseda University, Japan, p. 39, Eurospeech 1991. * |
"Mandarin Speech Synthesis by the Unit of Coarticulatory Demi-syllable", Chi-Shi Liu, Wern-Jun Wang, Shiow-Min Yu, and Hsiao-Chuan Wang.
"Combinatorial Issues In Text-To-Speech Synthesis", Jan P. H. van Santen, Lucent Technologies, Bell Labs, New Jersey. *
F.M. Gimenez de Los Galanes, M.H. Savoji, J.M. Pardo, "New Algorithm for Spectral Smoothing and Envelope Modification For LP-PSOLA Synthesis", IEEE 1994, pp. 573-576. |
K. Matsui, S. D. Pearson, T. Kami, "Improving Naturalness in Text-to-Speech Synthesis using Natural Glottal Source", International Conference on Acoustics, Speech, and Signal Processing, IEEE 1991, pp. 769-772. |
Koh et al ("A Speech Synthesizer for Mandarin Chinese", IEEE Transactions on Consumer Electronics, Jun. 1990). * |
Matsui et al ("Improving Naturalness in Text-To-Speech Synthesis using Natural Glottal Source", International Conference on Acoustics, Speech, and Signal Processing, Apr. 1991). * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050131680A1 (en) * | 2002-09-13 | 2005-06-16 | International Business Machines Corporation | Speech synthesis using complex spectral modeling |
US8280724B2 (en) * | 2002-09-13 | 2012-10-02 | Nuance Communications, Inc. | Speech synthesis using complex spectral modeling |
US20100131268A1 (en) * | 2008-11-26 | 2010-05-27 | Alcatel-Lucent Usa Inc. | Voice-estimation interface and communication system |
US8559813B2 (en) | 2011-03-31 | 2013-10-15 | Alcatel Lucent | Passband reflectometer |
US8666738B2 (en) | 2011-05-24 | 2014-03-04 | Alcatel Lucent | Biometric-sensor assembly, such as for acoustic reflectometry of the vocal tract |
US20130231928A1 (en) * | 2012-03-02 | 2013-09-05 | Yamaha Corporation | Sound synthesizing apparatus, sound processing apparatus, and sound synthesizing method |
US9640172B2 (en) * | 2012-03-02 | 2017-05-02 | Yamaha Corporation | Sound synthesizing apparatus and method, sound processing apparatus, by arranging plural waveforms on two successive processing periods |
Also Published As
Publication number | Publication date |
---|---|
EP1005017A2 (en) | 2000-05-31 |
EP1347440A3 (en) | 2004-11-17 |
US6144939A (en) | 2000-11-07 |
EP1005017B1 (en) | 2003-07-23 |
JP2000172285A (en) | 2000-06-23 |
JP3408477B2 (en) | 2003-05-19 |
EP1005017A3 (en) | 2000-12-20 |
EP1347440A2 (en) | 2003-09-24 |
ES2204071T3 (en) | 2004-04-16 |
DE69909716D1 (en) | 2003-08-28 |
DE69909716T2 (en) | 2004-08-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE39336E1 (en) | Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains | |
Valbret et al. | Voice transformation using PSOLA technique | |
US5400434A (en) | Voice source for synthetic speech system | |
EP1704558B1 (en) | Corpus-based speech synthesis based on segment recombination | |
US4912768A (en) | Speech encoding process combining written and spoken message codes | |
EP1643486B1 (en) | Method and apparatus for preventing speech comprehension by interactive voice response systems | |
DE19610019C2 (en) | Digital speech synthesis process | |
Huang et al. | Recent improvements on Microsoft's trainable text-to-speech system-Whistler | |
JP3588302B2 (en) | Method of identifying unit overlap region for concatenated speech synthesis and concatenated speech synthesis method | |
JPH031200A (en) | Regulation type voice synthesizing device | |
Moulines et al. | A real-time French text-to-speech system generating high-quality synthetic speech | |
US7912718B1 (en) | Method and system for enhancing a speech database | |
O'Shaughnessy | Modern methods of speech synthesis | |
JP3281266B2 (en) | Speech synthesis method and apparatus | |
Bonafonte Cávez et al. | A bilingual text-to-speech system in Spanish and Catalan
Cadic et al. | Towards Optimal TTS Corpora. | |
US6829577B1 (en) | Generating non-stationary additive noise for addition to synthesized speech | |
Mandal et al. | Epoch synchronous non-overlap-add (ESNOLA) method-based concatenative speech synthesis system for Bangla. | |
JP3281281B2 (en) | Speech synthesis method and apparatus | |
van Rijnsoever | A multilingual text-to-speech system | |
JPH1195796A (en) | Voice synthesizing method | |
Datta et al. | Epoch Synchronous Overlap Add (ESOLA) | |
Pearson et al. | A synthesis method based on concatenation of demisyllables and a residual excited vocal tract model | |
Benbassat et al. | Low bit rate speech coding by concatenation of sound units and prosody coding | |
Christogiannis et al. | Construction of the acoustic inventory for a greek text-to-speech concatenative synthesis system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | CC | Certificate of correction | |
 | FPAY | Fee payment | Year of fee payment: 8 |
 | REMI | Maintenance fee reminder mailed | |
 | FEPP | Fee payment procedure | Free format text: PETITION RELATED TO MAINTENANCE FEES FILED (ORIGINAL EVENT CODE: PMFP); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | LAPS | Lapse for failure to pay maintenance fees | |
 | REIN | Reinstatement after maintenance fee payment confirmed | |
 | FPAY | Fee payment | Year of fee payment: 12 |
 | SULP | Surcharge for late payment | |
 | FEPP | Fee payment procedure | Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PMFG); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
2006-10-10 | PRDP | Patent reinstated due to the acceptance of a late maintenance fee | |
|
AS | Assignment |
Owner name: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: PANASONIC CORPORATION; REEL/FRAME: 033033/0163

Effective date: 2014-05-27