WO1993016465A1 - Process for speech analysis - Google Patents

Process for speech analysis

Publication number
WO1993016465A1
WO1993016465A1 PCT/SE1993/000058 SE9300058W WO9316465A1 WO 1993016465 A1 WO1993016465 A1 WO 1993016465A1 SE 9300058 W SE9300058 W SE 9300058W WO 9316465 A1 WO9316465 A1 WO 9316465A1
Authority
WO
WIPO (PCT)
Prior art keywords
roots
tracks
factor
factors
formants
Prior art date
Application number
PCT/SE1993/000058
Other languages
French (fr)
Inventor
Jaan Kaja
Original Assignee
Televerket
Priority date
Filing date
Publication date
Application filed by Televerket
Priority to EP93904419A (EP0579812B1)
Priority to DE69318223T (DE69318223T2)
Priority to US08/129,077 (US6289305B1)
Priority to AU35778/93A (AU658724B2)
Publication of WO1993016465A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/15Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Investigating Or Analysing Materials By The Use Of Chemical Reactions (AREA)

Abstract

The invention relates to an automatic process for the analysis of continuous speech. The waveshape of the speech is described with the aid of the resonant frequencies, formants, which arise in the speech organ. Suitable frequencies for the formants are determined from an utterance by dividing it into time frames and analyzing it by linear prediction in order to determine roots of the denominator polynomial and thereby frequency values for each frame. The utterance is divided into voiced regions and in each voiced region the centres of vowel sounds are established in order to obtain a number of starting points. Tracks are formed from the starting points by sorting the roots from frame to frame so that old and new roots are linked together. Factors of merit are calculated for the tracks relative to formants and the tracks are distributed to formants in accordance with the factors of merit.

Description

TITLE OF THE INVENTION: PROCESS FOR SPEECH ANALYSIS
FIELD OF THE INVENTION
The present invention relates to a process for speech analysis and more specifically to an automatic process for the analysis of continuous speech. The results of the invention can be used for speech recognition, speech synthesis, etc. It is conventional to describe the waveform of speech using those resonant frequencies, so-called formants, which arise in the speech organ. The present invention presents a process for determining suitable frequencies for the formants from an utterance.
STATE OF THE ART
There already exist known methods for determining formants. One such method uses linear prediction, which provides frequencies included in the utterance at sampled time points. The centre of each vowel is determined using low energy peaks and is set as the starting point. Proceeding from the starting point, the frequencies are allocated to known, previously estimated, intervals for the formants. Subsequently a matching is made to surrounding frames, forwards and backwards, in order to join the formants together over the whole vowel sound. One problem with this known method is that when each time point or frame is determined individually, it is easy for the wrong decision to be made in the allocation of the frequencies to the formants, because additional, incorrect, resonances arise, e.g. in the case of nasal sounds etc. The present invention removes this problem by delaying the decision on the allocation of frequencies to the formants until the whole utterance has been analyzed.
SUMMARY OF THE INVENTION
Thus, the present invention provides a process for speech analysis comprising the recording of an utterance using some suitable device. The utterance is divided into time frames and is analyzed by linear prediction in order to determine the roots for the denominator polynomial and thereby frequency values for each frame. The utterance is divided into voiced regions and in each voiced region the centres of vowel sounds are determined using a number of starting points.
In accordance with the invention, tracks are formed from the starting points by the roots being sorted from frame to frame, so that old and new roots are linked together. Factors of merit are calculated for the tracks relative to the formants and the tracks are distributed to the formants in accordance with the factors of merit. The factors of merit are preferably calculated using energy factors, continuity factors and correlation factors. Further embodiments of the present invention are given in more detail in the subsequent patent claims.
BRIEF DESCRIPTION OF THE FIGURES
The invention will be described in detail below with reference to the following figures, in which: Figure 1 shows an example of a spectrogram of a vowel sound; Figure 2 is a curve of the low frequency energy; and Figure 3 diagrammatically shows the model for analysis using linear prediction.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE INVENTION
The waveshape of speech can be likened to the response of a resonance chamber, the voice pipe, to a series of pulses: quasi-periodic vocal cord pulses during voiced sounds, or sounds produced in association with a constriction during unvoiced sounds. In shaping the voice pipe, resonance arises in various cavities as in an acoustic filter. The resonances are called formants and they appear in the spectrum as energy peaks at the resonant frequencies. In continuous speech the formant frequencies vary with time as the resonant cavities change position.
A spectrogram of a vowel sound, e.g. "A", is shown in Figure 1. It has been possible to produce spectrograms for a long time and linguists have studied them in order to be able to describe how speech is generated. Vowel sounds are usually characterised by the three first, strongest, formants. In Figure 1 the formants are visible as dark bands which correspond to energy peaks from the point of view of frequency. The vowel sounds lie in the low frequency region, while consonants lie in high frequency regions, e.g. the s sound, and have a completely different appearance.
The low frequency energy for the sound in Figure 1 is shown in Figure 2. It is evident that, from the point of view of time, the low frequency energy has a peak in the middle of the vowel sound.
The formants are thus important for describing the sound and are used, inter alia, for speech synthesis and speech recognition. An automatic process for speech analysis therefore has an important technical application.
Linear prediction is a known method for analyzing a spoken utterance. The model for the analysis is shown in Figure 3. One proceeds from a speech signal which is inverse filtered with a transfer function of 1/H(z) so that white noise is obtained. Consequently, the model assumes that the sound source is white noise, while in actual fact it is vocal cord pulses. This signifies an error in the model, but the method is still usable. By calculating the poles of the transfer function, i.e. the roots of the denominator polynomial H(z), which is a polynomial in $z^{-1}$, the frequencies are obtained as roots within the unit circle in the z plane. The frequencies are calculated, for example, every 5 ms, so that the spectrum is divided into frames of 5 ms.
The utterance is recorded by some suitable recording device and is stored on a medium which is suitable for data processing. Since, in the case of formant analysis, the main interest is in the vowel sounds, all the voiced regions in the recorded utterance are determined first of all. All the voiced regions with a minimum time length are ascertained. The unvoiced regions must also have a minimum length. The time length limitation is there in order to avoid possible mistakes in establishing voiced regions. Each voiced region is treated separately. A voiced region can in turn consist of several vowel sounds with interposed voiced consonant sounds, e.g. "mamma". The a's have corresponding peaks in the low frequency energy.
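The linear prediction step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the use of NumPy, the autocorrelation method for estimating the coefficients, and the conversion of root radius to a bandwidth in Hz are not taken from the patent text, which only specifies that the roots of the denominator polynomial are determined for each frame.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate [1, a1, ..., ap] of the denominator polynomial for one frame.

    Hypothetical helper using the autocorrelation method (an assumption).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations: R a = -r[1..order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

def roots_to_frequencies(coeffs, fs):
    """Convert polynomial roots to frequencies and bandwidths in Hz.

    Only roots in the upper half plane are kept (one per conjugate pair).
    The bandwidth formula -ln|r| * fs / pi is a common convention, assumed here.
    """
    roots = np.roots(coeffs)
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]
```

In use, each 5 ms frame of the recorded utterance would be passed to `lpc_coefficients`, and the resulting roots would be kept per frame for the track-forming steps that follow.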
As mentioned earlier, the aim is to set starting points in the centres of the vowel sounds. For this reason, all the low frequency energy peaks which are separated by an energy drop exceeding a particular threshold, usually 3 dB, are identified. A low frequency energy peak of this type is shown in Figure 2. A number of starting points are then obtained, one for each resonant frequency. A number of roots have thus been chosen for the frame which corresponds to the starting point.
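One way to locate such starting points is sketched below in Python. The function name, the per-frame energy input in dB, and the two-state scan are assumptions made for illustration; the text only states that peaks separated by an energy drop exceeding a threshold (usually 3 dB) are identified.

```python
def find_energy_peaks(energy_db, min_drop_db=3.0):
    """Return frame indices of low frequency energy peaks.

    A candidate peak is accepted once the energy has fallen more than
    min_drop_db below it; a new candidate is only started after the energy
    has risen more than min_drop_db above the following valley.
    A trailing maximum that is never followed by such a drop is not reported.
    """
    if len(energy_db) == 0:
        return []
    peaks = []
    cand_idx, cand_val = 0, energy_db[0]
    valley = energy_db[0]
    looking_for_peak = True
    for i, e in enumerate(energy_db):
        if looking_for_peak:
            if e > cand_val:
                cand_idx, cand_val = i, e          # higher candidate peak
            elif e < cand_val - min_drop_db:
                peaks.append(cand_idx)             # drop confirms the peak
                valley = e
                looking_for_peak = False
        else:
            if e < valley:
                valley = e                         # still descending
            elif e > valley + min_drop_db:
                cand_idx, cand_val = i, e          # rising towards next peak
                looking_for_peak = True
    return peaks
```

Each returned index identifies a frame whose roots can serve as the seeds for the tracks of one vowel sound.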
The roots are then treated as follows. The roots at the starting point are arranged so that the roots with a bandwidth above a minimum value are placed first in increasing bandwidth order, followed by remaining roots in decreasing bandwidth order. The bandwidth of the roots is determined by their distance from the unit circle in the z plane. This rearrangement of the roots is not a critical part of the invention, but means that the roots do not have to be rearranged later. At this stage each root is considered as the seed for a "track" of roots which goes to the left and the right.
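The ordering of the seed roots can be illustrated with a short Python sketch. The function name and the `min_bandwidth` parameter are hypothetical, and the bandwidth is taken here simply as the distance 1 - |r| from the unit circle, as the text suggests.

```python
def order_seed_roots(roots, min_bandwidth):
    """Order the roots of the starting frame as described in the text.

    Roots with bandwidth above min_bandwidth come first, in increasing
    bandwidth order; the remaining roots follow in decreasing bandwidth order.
    """
    def bandwidth(r):
        return 1.0 - abs(r)    # distance from the unit circle

    wide = sorted((r for r in roots if bandwidth(r) > min_bandwidth),
                  key=bandwidth)
    narrow = sorted((r for r in roots if bandwidth(r) <= min_bandwidth),
                    key=bandwidth, reverse=True)
    return wide + narrow
```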
The tracks are then extended, first to the left and then to the right, by sorting the roots from frame to frame. The sorting procedure links together old and new roots by
1. going through all new roots and finding the nearest old root;
2. eliminating competing candidates by removing those which are farthest away;
3. going through all zero links and comparing with existing links. If the root which is associated with a zero link fits better than an existing link, these are exchanged.
The above procedure functions when the number of new roots is greater than or equal to the number of old roots. If the latter number is greater, the procedure is essentially the same, but the old roots are examined instead. Proceeding from the middle point of the vowel sound, a number of tracks are obtained.
The above procedure does not minimise the total distance between old and new roots, but retains tracks of roots, which lie close together, from frame to frame. The number of roots can vary from frame to frame, as a result of which "holes" arise in certain tracks. This is allowed to take place and is in fact an important aspect of the algorithm. If holes were not allowed, it would be necessary to decide on the identity of a track. Sometimes additional roots are also obtained which must be sorted in among the holes.
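The three-step linking procedure can be sketched in Python as follows. This is one possible reading of the steps, not a definitive implementation: the function name, the use of the Euclidean distance in the z plane, and the interpretation of a "zero link" as a new root left without a partner are assumptions. The sketch covers the case where there are at least as many new roots as old roots; in the opposite case the roles of the two arguments would be swapped.

```python
import numpy as np

def link_roots(old_roots, new_roots):
    """Link new roots to old roots; returns {new_index: old_index}.

    New roots missing from the result are unlinked (they leave a hole or
    start a new track).
    """
    old = np.asarray(old_roots, dtype=complex)
    new = np.asarray(new_roots, dtype=complex)
    dist = np.abs(new[:, None] - old[None, :])     # shape (n_new, n_old)

    # Step 1: every new root proposes its nearest old root.
    nearest = dist.argmin(axis=1)

    # Step 2: when several new roots compete for one old root,
    # keep only the closest competitor.
    links = {}                                     # old index -> new index
    for n, o in enumerate(nearest):
        o = int(o)
        if o not in links or dist[n, o] < dist[links[o], o]:
            links[o] = n

    # Step 3: an unlinked new root replaces an existing link
    # if it fits that old root better than the current partner.
    linked = set(links.values())
    for n in range(len(new)):
        if n in linked:
            continue
        for o in np.argsort(dist[n]):
            o = int(o)
            if o in links and dist[n, o] < dist[links[o], o]:
                links[o] = n
                break
    return {n: o for o, n in links.items()}
```

As the text notes, this greedy scheme does not minimise the total distance; it only keeps nearby roots together from frame to frame.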
When tracks have thus been formed for roots over the whole utterance, the frequencies of the formants must be determined, i.e. the tracks must be sorted to the formants. Since there can be more tracks than formants, some of the tracks must be discarded. To do this, factors of merit are calculated for each track.
Firstly, two factors of merit are formed for each track, a bandwidth factor and a continuity factor. The bandwidth factor is formed by summing the square of the absolute value of the root for each root in the track. The bandwidth can be calculated as the distance of the root from the unit circle in the z plane. The continuity factor is calculated as 1- the square of the bandwidth for the square of the difference between roots in succession (i.e. $\sum_i \left[1 - |r_i - r_{i-1}|^2\right]$), which is a measure of the distance between neighbouring roots.
Additionally, a further factor of merit, a correlation factor, must be formed for each track in relation to each formant. In this way a vector with a correlation factor is obtained for each track, one for each formant. The correlation factor is calculated as the sum of the dependent probabilities that the particular root belongs to a formant. The vector is then multiplied by the square of the bandwidth factor and the square of the continuity factor in order to form the final "merit vector". The merit vectors are then assembled into a merit matrix.
The allocation of tracks to formants is then carried out by changing the columns around in the merit matrix so that the diagonal element is maximised, with the stipulation that the average frequency of the appertaining tracks lies in ascending order. The first column in the arranged merit matrix thus corresponds to the first formant with the lowest frequency etc.
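The merit calculation and the allocation can be sketched in Python as below. Several points are assumptions made for illustration and are not stated in the text: the continuity factor is taken as the sum of 1 - |r_i - r_{i-1}|^2 over successive roots, holes are represented by `None` and skipped, the correlation factors are supplied by the caller, the merit matrix is stored with one row per track and one column per formant, and "maximising the diagonal" is read as maximising the sum of the selected merits. All function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def bandwidth_factor(track):
    """Sum of |r|^2 over the roots of a track (holes are skipped)."""
    return sum(abs(r) ** 2 for r in track if r is not None)

def continuity_factor(track):
    """Sum of 1 - |r_i - r_(i-1)|^2 over successive roots (assumed form)."""
    roots = [r for r in track if r is not None]
    return sum(1.0 - abs(b - a) ** 2 for a, b in zip(roots, roots[1:]))

def merit_vector(track, correlation):
    """Correlation vector scaled by the squared bandwidth and continuity factors."""
    scale = bandwidth_factor(track) ** 2 * continuity_factor(track) ** 2
    return np.asarray(correlation, dtype=float) * scale

def allocate_tracks(merit, mean_freqs):
    """Choose one track per formant, keeping mean frequencies ascending.

    merit[t, k] is the merit of track t for formant k. Tracks are sorted by
    mean frequency, so every combination preserves the ascending order.
    Returns the chosen track indices, or None if there are too few tracks.
    """
    merit = np.asarray(merit, dtype=float)
    n_formants = merit.shape[1]
    by_freq = [int(t) for t in np.argsort(mean_freqs)]
    best, best_score = None, -np.inf
    for combo in combinations(by_freq, n_formants):
        score = sum(merit[t, k] for k, t in enumerate(combo))
        if score > best_score:
            best, best_score = list(combo), score
    return best
```

The exhaustive search over combinations is only practical because the number of tracks per voiced region is small; tracks not selected are the ones discarded.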
When all the voiced regions have been treated, the tracks are drawn from these into the unvoiced regions. A part of these extensions contains useful information, e.g. the tracks for the formants F2 and F3 from plosives to the following vowels.
The present invention thus provides a process for speech analysis which gives a more global optimisation by delaying the formant allocation until a whole voiced region has been analyzed. If the formants are established for each frame separately, as in the previous technology, there are often errors, since additional/false resonances appear. By linking the tracks together using the method according to the invention, these additional resonances can be controlled. The method according to the invention rearranges the data recorded for the utterance. Thus, it is a non-destructive method insofar as the information is not altered. The extent of protection of the invention is only limited by the subsequent patent claims.

Claims

PATENT CLAIMS
1. Process for speech analysis, comprising recording of an utterance, division of the utterance into time frames and analysis by linear prediction in order to determine roots for the denominator polynomial and thereby frequency values for each frame, division of an utterance into voiced regions, establishing the centres of vowel sounds in each voiced region using a number of starting points, characterised in that tracks are formed from the starting points by sorting the roots from frame to frame so that old and new roots are linked together, that factors of merit are calculated for the tracks relative to the formants, that the tracks are distributed to the formants in accordance with the factors of merit.
2. Process for speech analysis according to Claim 1, characterised in that the tracks of roots are formed by starting from one root and going through all the new roots to find the one at the least distance from the root and linking these roots together.
3. Process for speech analysis according to Claim 2, characterised in that the factors of merit are calculated using bandwidth factors, continuity factors and correlation factors.
4. Process for speech analysis according to Claim 3, characterised in that the bandwidth factor is calculated as the sum of the distance of the roots from the unit circle in the z-plane, that the continuity factor is calculated as the sum of the distance between neighbouring roots, whereby a bandwidth factor and a continuity factor are obtained for each track.
5. Process for speech analysis according to Claim 4, characterised in that the correlation factor is calculated as the sum of the dependent probabilities that the roots belong to a formant, so that for each track a vector with a correlation factor is obtained.
6. Process for speech analysis according to Claim 5, characterised in that a merit matrix is formed by, for each track, multiplying the correlation factor vector by the bandwidth factor and the square of the continuity factor, and by arranging the vectors thus formed in a matrix so that the diagonal elements are maximised, with the stipulation that the average frequencies of the tracks belonging to the vectors are arranged in ascending order.
7. Process for speech analysis according to any one of the preceding claims, characterised in that the tracks are extended into the unvoiced regions.
PCT/SE1993/000058 1992-02-07 1993-01-28 Process for speech analysis WO1993016465A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP93904419A EP0579812B1 (en) 1992-02-07 1993-01-28 Process for speech analysis
DE69318223T DE69318223T2 (en) 1992-02-07 1993-01-28 METHOD FOR VOICE ANALYSIS
US08/129,077 US6289305B1 (en) 1992-02-07 1993-01-28 Method for analyzing speech involving detecting the formants by division into time frames using linear prediction
AU35778/93A AU658724B2 (en) 1992-02-07 1993-01-28 Process for speech analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE9200349-0 1992-02-07
SE9200349A SE468829B (en) 1992-02-07 1992-02-07 PROCEDURES IN SPEECH ANALYSIS FOR DETERMINATION OF APPROPRIATE FORM FREQUENCY

Publications (1)

Publication Number Publication Date
WO1993016465A1 (en) 1993-08-19

Family

ID=20385237

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE1993/000058 WO1993016465A1 (en) 1992-02-07 1993-01-28 Process for speech analysis

Country Status (6)

Country Link
US (1) US6289305B1 (en)
EP (1) EP0579812B1 (en)
AU (1) AU658724B2 (en)
DE (1) DE69318223T2 (en)
SE (1) SE468829B (en)
WO (1) WO1993016465A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9928420D0 (en) * 1999-12-02 2000-01-26 Ibm Interactive voice response system
US20040260540A1 (en) * 2003-06-20 2004-12-23 Tong Zhang System and method for spectrogram analysis of an audio signal
KR100634526B1 (en) 2004-11-24 2006-10-16 삼성전자주식회사 Apparatus and method for tracking formants
WO2008084476A2 (en) * 2007-01-09 2008-07-17 Avraham Shpigel Vowel recognition system and method in speech to text applications

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0275584A1 (en) * 1986-12-12 1988-07-27 Koninklijke Philips Electronics N.V. Method of and device for deriving formant frequencies from a part of a speech signal
US4882758A (en) * 1986-10-23 1989-11-21 Matsushita Electric Industrial Co., Ltd. Method for extracting formant frequencies

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4625286A (en) 1982-05-03 1986-11-25 Texas Instruments Incorporated Time encoding of LPC roots
US4536886A (en) 1982-05-03 1985-08-20 Texas Instruments Incorporated LPC pole encoding using reduced spectral shaping polynomial
US4922539A (en) 1985-06-10 1990-05-01 Texas Instruments Incorporated Method of encoding speech signals involving the extraction of speech formant candidates in real time

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IEEE Transactions on Communications, Volume COM26, No. 3, March 1978, CHONG KWAN UN, "A Low-Rate Digital Formant Vocoder", pp. 344-354, see II. system description. *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001018789A1 (en) * 1999-09-03 2001-03-15 Microsoft Corporation Formant tracking in speech signal with probability models
US6505152B1 (en) 1999-09-03 2003-01-07 Microsoft Corporation Method and apparatus for using formant models in speech systems
US6708154B2 (en) 1999-09-03 2004-03-16 Microsoft Corporation Method and apparatus for using formant models in resonance control for speech systems
GB2447141A (en) * 2007-02-27 2008-09-03 Sepura Plc Speech encoding and decoding in tetra communications systems
GB2447141B (en) * 2007-02-27 2009-06-17 Sepura Plc Speech encoding and decoding in communications systems
US8577672B2 (en) 2007-02-27 2013-11-05 Audax Radio Systems Llp Audible errors detection and prevention for speech decoding, audible errors concealing

Also Published As

Publication number Publication date
SE9200349D0 (en) 1992-02-07
EP0579812B1 (en) 1998-04-29
EP0579812A1 (en) 1994-01-26
AU658724B2 (en) 1995-04-27
US6289305B1 (en) 2001-09-11
SE9200349L (en) 1993-03-22
AU3577893A (en) 1993-09-03
DE69318223T2 (en) 1998-09-17
SE468829B (en) 1993-03-22
DE69318223D1 (en) 1998-06-04

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 1993904419

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1993904419

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 08129077

Country of ref document: US

WWG Wipo information: grant in national office

Ref document number: 1993904419

Country of ref document: EP