EP0713208B1 - Pitch estimation system
- Publication number
- EP0713208B1 (application EP95118142A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- pitch
- lag
- speech
- coding
- subframe
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0011—Long term prediction filters, i.e. pitch estimation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Definitions
- LPC: linear predictive coding
- pitch information is a reliable indicator and representative of sounds for coding purposes.
- Pitch describes a key feature or parameter of a speaker's voice.
- speech estimation models which can effectively estimate the speech pitch data provide for more accurate and precise coded and decoded speech.
- CELP: code excited linear prediction
- codecs: coder/decoders (e.g., MBE codecs)
- Three pitch lag estimation schemes are used in conjunction with the above-mentioned codecs: a time domain approach, a frequency domain approach, and a cepstrum domain approach.
- the precision of the pitch estimation has a direct impact on the speech quality due to the close relationship between pitch lag and speech reproduction.
- speech generation is based on predictions -- long-term pitch prediction and short-term linear prediction.
- Figure 1 shows a speech regeneration block diagram of a typical CELP coder.
- To compress speech data, it is desirable to extract only essential information and avoid transmitting redundancies. Speech can be grouped into short blocks, in which representative parameters can be identified. As indicated in Figure 1, to generate good quality speech, a CELP speech coder must extract LPC parameters 110, pitch lag parameters 112 (including lag and its coefficient), and an optimal innovation code vector 114 with its gain parameter 116 from the input speech to be coded. The coder quantizes the LPC parameters by implementing appropriate coding schemes. The indices of quantization of each parameter comprise the information to be stored or transmitted to the speech decoder. In CELP codecs, determination of pitch prediction parameters (pitch lag and pitch coefficients) is performed in the time domain, while in MBE codecs, pitch parameters are estimated in the frequency domain.
- the CELP encoder determines an appropriate LPC filter 110 for the current speech coding frame (usually about 10-40 ms long).
- np is the LPC prediction order (usually approximately 10)
- y(n) is sampled speech data
- n represents the time index.
- the LPC equations above describe the estimation of the current sample according to the linear combination of the past samples.
- T is the target signal which represents the perceptually filtered input signal
- H is the impulse response matrix of the filter W(z)/A(z).
- P Lag is the pitch prediction contribution having pitch lag "Lag" and prediction coefficient β, which is uniquely defined for a given lag
- C is the codebook contribution associated with index i in the codebook and its corresponding gain γ.
- the pitch of human speech varies from 2 ms - 20 ms.
- the pitch lag corresponds roughly to 20 - 147 samples.
- i takes values between 0 and Nc-1, where Nc is the size of the innovation codebook.
- a one-tap pitch predictor and one innovation codebook are assumed.
- the general form of the pitch predictor is a multi-tap scheme
- the general form of the innovation codebook is a multi-level vector quantization, or utilizes multiple innovation codebooks.
- one-tap pitch predictor indicates that the current speech sample can be predicted by a past speech sample
- the multi-tap predictor means that the current speech sample can be predicted by multiple past speech samples.
- pitch lag estimation may be performed by simply evaluating the possible lag values in the range between L 1 and L 2 samples to cover 2.5 ms - 18.5 ms. The estimated pitch lag is then determined by maximizing Eqn. (1), which in its standard open-loop form selects Lag = arg max_{L1 ≤ n ≤ L2} [Σ_k y(k)·y(k−n)]² / Σ_k y(k−n)². Even though this time domain approach may enable determination of the real pitch lag, for female speech having a high pitch frequency, the lag found by Eqn. (1) may not be the real lag but a multiple of it. To avoid this estimation error, additional processes (e.g., lag smoothing) are necessary, at the cost of undesirable complexity.
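The time-domain search described above can be sketched as follows. This is a minimal illustration assuming a standard normalized-autocorrelation criterion (the exact weighting of Eqn. (1) is not reproduced here); `open_loop_pitch` and its arguments are names chosen for illustration.

```python
import numpy as np

def open_loop_pitch(y, l1=20, l2=147):
    """Evaluate every candidate lag in [l1, l2] and keep the one that
    maximizes a normalized autocorrelation score (an assumed form of
    the open-loop criterion described in the text)."""
    best_lag, best_score = l1, -np.inf
    n = np.arange(l2, len(y))  # use only samples with a full lag history
    for lag in range(l1, l2 + 1):
        num = np.dot(y[n], y[n - lag])
        den = np.dot(y[n - lag], y[n - lag])
        score = num * num / den if den > 0 else 0.0
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag
```

As the text warns, for harmonically rich signals this criterion can also lock onto a multiple of the true lag, which is why lag-smoothing steps are added in practice.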
- the time domain approach requires at least 3 million operations per second (MOPs) to determine the lag using integer lag only.
- MOPs: million operations per second
- the complexity is more likely approximately 4 MOPs.
- DSP MIPs: digital signal processor million instructions per second
- In MBE coders, an important member of the class of sinusoidal coders, coding parameters are extracted and quantized in the frequency domain.
- the MBE speech model is shown in Figures 2-4.
- In the MBE voice encoder/decoder described in Figures 2 and 3, the fundamental frequency (or pitch lag) 210, voiced/unvoiced decision 212, and spectral envelope 214 are extracted from the input speech in the frequency domain.
- the parameters are then quantized and encoded into a bit stream which can be stored or transmitted.
- In the MBE vocoder, to achieve high speech quality, the fundamental frequency must be estimated with high precision.
- the estimation of the fundamental frequency is performed in two stages. First, an initial pitch lag is searched within the range of 21 samples to 114 samples to cover 2.6 - 14.25 ms at the sampling rate of 8000 Hz by minimizing a weighted mean square error equation 310 ( Figure 3) between the input speech 216 and the synthesized speech 218 in the frequency domain.
- the mean square error between the original speech and the synthesized speech is given (in its standard form) by ε = ∫ G(ω) |S(ω) − Ŝ(ω)|² dω, where S(ω) is the original speech spectrum, Ŝ(ω) is the synthesized speech spectrum, and G(ω) is a frequency-dependent weighting function.
- a pitch tracking algorithm 410 is used to update the initial pitch lag estimate 412 by using the pitch information of neighboring frames.
- the motivation for using this approach is based upon the assumption that the fundamental frequency should not change abruptly between neighboring frames.
- the pitch estimates of the two past and two future neighbor frames are used for the pitch tracking.
- the mean-square error (including two past and future frames) is then minimized to find a new pitch lag value for the current frame.
- a pitch lag multiple checking scheme 414 is applied to eliminate pitch lag multiples, thus smoothing the pitch lag.
- pitch lag refinement 416 is employed to increase the precision of the pitch estimate.
- the candidate pitch lag values are formed based on the initial pitch lag estimate (i.e., the new candidate pitch lag values are formed by adding or subtracting some fractional number from the initial pitch lag estimate). Accordingly, a refined pitch lag estimate 418 can be determined among the candidate pitch lags by minimizing the mean square error function.
- In addition to cepstrum domain pitch lag estimation (Figure 5), proposed by A.M. Noll in 1967, other modified methods have been proposed.
- In cepstrum domain pitch lag estimation, approximately 37 ms of speech are sampled 510 so that at least two periods of the maximum possible pitch lag (e.g., 18.5 ms) are covered.
- a 512-point FFT is then applied to the windowed speech frame (at block 512) to obtain the frequency spectrum.
- After taking the logarithm 514 of the amplitude of the frequency spectrum, another 512-point inverse FFT 516 is applied to obtain the cepstrum.
- a weighting function 518 is applied to the cepstrum, and the peak of the cepstrum is detected 520 to determine the pitch lag.
- a tracking algorithm 522 is then implemented to eliminate any pitch multiples.
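The five steps above (window, FFT, log magnitude, inverse FFT, peak picking) can be sketched as below. The window choice and defaults are assumptions, and the weighting function 518 and tracking algorithm 522 are omitted, so this is an illustration of the classic scheme rather than the exact procedure.

```python
import numpy as np

def cepstrum_pitch(y, l1=20, l2=147, nfft=512):
    """Classic cepstrum pitch detector: the lag is the quefrency of the
    largest cepstral peak inside the plausible pitch range [l1, l2]."""
    w = np.hamming(len(y))                  # analysis window over the sampled block
    spec = np.fft.fft(y * w, nfft)          # 512-point FFT (block 512)
    log_mag = np.log(np.abs(spec) + 1e-12)  # log amplitude spectrum (block 514)
    cep = np.fft.ifft(log_mag).real         # real cepstrum (block 516)
    return l1 + int(np.argmax(cep[l1:l2 + 1]))  # peak detection (block 520)
```

Without the tracking step 522, the detector may still return a multiple of the true lag.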
- EP-A-415163 discloses a method and apparatus for determining the lag of a long term filter in a code excited linear prediction speech coder.
- An open loop lag is first determined using an autocorrelation function.
- the open loop lag is then utilized to generate a limited range over which a closed loop search is performed.
- the range of appropriate values includes lags that are harmonically related to the open loop lag, as well as adjacent lags.
- a speech coding apparatus for reproducing and coding input speech, as set forth in claim 1, and a speech coding method for reproducing and coding input speech, as set forth in claim 11, are provided.
- Preferred embodiments of the invention are disclosed in the dependent claims.
- the present invention is directed to a device and method of speech coding using CELP techniques, as well as a variety of other speech coding and recognition systems. Consequently, better results are provided with fewer computational resources, while maintaining the necessary high precision.
- a pitch lag estimation scheme which quickly and efficiently enables the accurate reproduction and regeneration of speech.
- the pitch lag is extracted for a given speech frame and then refined for each subframe. After a minimum number of speech samples have been obtained by sampling speech directly, a Discrete Fourier Transform (DFT) is applied, and the resultant amplitude is squared. A second DFT is then performed. Accordingly, an accurate initial pitch lag for the speech samples within the frame can be determined between the possible minimum value of 20 samples and the maximum lag value of 147 samples at the 8 kHz sampling rate. After obtaining the initial pitch lag estimate, time domain refinement is performed for each subframe to further improve the estimation precision.
- DFT Discrete Fourier Transform
- Figure 1 is a block diagram of a CELP speech model.
- Figure 2 is a block diagram of an MBE speech model.
- Figure 3 is a block diagram of an MBE encoder.
- Figure 4 is a block diagram of pitch lag estimation in an MBE vocoder.
- Figure 5 is block diagram of a cepstrum-based pitch lag detection scheme.
- Figure 6 is an operational flow diagram of pitch lag estimation according to an embodiment of the present invention.
- Figure 7 is a flow diagram of pitch lag estimation according to another embodiment of the present invention.
- Figure 8 is a diagrammatic view of speech coding according to the embodiment of Figure 6.
- N may equal 320 speech samples to accommodate a typical 40 ms speech window at an 8000 Hz sampling rate.
- the value of N is determined by the roughly estimated speech period, wherein at least two periods are generally required to generate the speech spectrum.
- a Hamming window 604 or other window which covers at least two pitch periods is preferably implemented.
- the amplitude of the first N-point DFT (Step 606) is squared (Step 608) to obtain G(f) = |Y(f)|², for f = 0, 1, ..., N-1
- a second N-point DFT is applied to G(f) in Step 610 to obtain C(n) (Eqn. (4)).
- C(n) differs from the conventional cepstrum, in which the logarithm of G(f), rather than G(f) itself, is used in Eqn. (4).
- This difference is generally attributable to complexity concerns. It is desirable to reduce the complexity by eliminating the logarithmic function, which otherwise requires substantially greater computational resources.
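A sketch of this squared-magnitude variant, with names chosen for illustration: replacing log G(f) by G(f) itself makes the second transform equivalent (via the Wiener–Khinchin relation) to an autocorrelation of the windowed speech, which is where the complexity saving comes from. The weighting function W(i) of the full scheme is omitted here.

```python
import numpy as np

def modified_lag(y, l1=20, l2=147, nfft=512):
    """Squared-magnitude variant described in the text: first DFT, square
    the amplitude to get G(f) = |Y(f)|^2 (no logarithm), then a second
    transform yields C(n); the lag is the peak location of C(n)."""
    w = np.hamming(len(y))
    g = np.abs(np.fft.fft(y * w, nfft)) ** 2  # G(f) = |Y(f)|^2 (Steps 606-608)
    c = np.fft.ifft(g).real                   # C(n): second transform (Step 610)
    return l1 + int(np.argmax(c[l1:l2 + 1]))  # peak search (Step 614, unweighted)
```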
- In pitch lag estimation schemes using either the cepstrum or the C(n) function, varying results have been obtained only for unvoiced or transition segments of the speech. For such speech, the definition of pitch is unclear: some hold that there is no pitch in transition speech, while others hold that some prediction can always be designated to minimize the error.
- the pitch lag for the given speech frame can be found in step 614 by solving Lag = arg max_{L1 ≤ n ≤ L2} Σ_{i=−M..M} W(i)·C(n+i), where arg [•] determines the variable n which satisfies the inside optimization function, and L 1 and L 2 are the minimum and maximum possible pitch lag, respectively.
- L 1 and L 2 take values of 20 and 147, respectively, to cover the typical human speech pitch lag range of 2.5 to 18.375 ms; the number of candidate lags from L 1 to L 2, inclusive, is 128, a power of 2.
- W(i) is a weighting function, and 2M+1 represents the window size.
- Although the resultant pitch lag is an averaged value, it has been found to be reliable and accurate.
- the averaging effect is due to the relatively large analysis window size; for a lag of 147 samples, the window size should be at least twice as large as the lag value.
- signals from some speakers, such as female talkers, who typically display a small pitch lag, may contain 4-10 pitch periods within the analysis window. If the pitch lag changes within the window, the proposed estimation only produces an averaged pitch lag. As a result, using such an averaged pitch lag in speech coding could cause severe degradation in speech estimation and regeneration.
- a refined search based on the initial pitch lag estimate is performed in the time domain (Step 618).
- a simple autocorrelation method is performed around the averaged Lag value for the particular coding period, or subframe: Lag_r = arg max_{Lag−m ≤ n ≤ Lag+m} Σ_{i=0..l−1} y(k+i)·y(k+i−n), where arg [•] determines the variable n which satisfies the inside optimization function, k denotes the first sample of the subframe, l represents the refinement window size, and m is the search range.
- a more precise pitch lag can be estimated and applied to the coding of the subframe.
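The per-subframe refinement can be sketched as a small search around the initial estimate. The parameter names (k, l, m) follow the text; the normalized score is an assumed form of the "simple autocorrelation method".

```python
import numpy as np

def refine_lag(y, k, lag0, m=5, l=40):
    """Search lags in [lag0 - m, lag0 + m] over the l-sample refinement
    window starting at sample k, keeping the lag with the best
    normalized-correlation score (assumed criterion)."""
    best, best_score = lag0, -np.inf
    idx = np.arange(k, k + l)
    for lag in range(lag0 - m, lag0 + m + 1):
        num = np.dot(y[idx], y[idx - lag])
        den = np.dot(y[idx - lag], y[idx - lag])
        score = num * num / den if den > 0 else 0.0
        if score > best_score:
            best_score, best = score, lag
    return best
```

Because only 2m+1 lags are evaluated over a short window, this step is far cheaper than the full open-loop search.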
- the window size must be a power of 2. The maximum pitch lag of 147 samples requires a window of at least twice that length (294 samples), which is not a power of 2; to include the maximum pitch lag, a window size of 512 samples is therefore necessary. However, this results in poor pitch lag estimation for female voices, due to the averaging effect discussed above, and requires a large amount of computation. If a window size of 256 samples is used instead, the averaging effect is reduced and the complexity is lower; however, such a window cannot accommodate pitch lags larger than 128 samples.
- FFT Fast Fourier Transform
- an alternative preferred embodiment of the present invention utilizes a 256-point FFT to reduce the complexity, and employs a modified (downsampled) signal to estimate the pitch lag.
- a Hamming window, or other window is then applied to the interpolated data in step 705.
- In step 706, pitch lag estimation is performed over y(i) using a 256-point FFT to generate the amplitude Y(f).
- Steps 708-710 are then carried out similarly to those described with regard to Figure 6.
- G(f) is filtered (step 709) to reduce the high frequency components of G(f) which are not useful for pitch detection.
- time domain refinement is performed over the original speech samples.
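The alternative embodiment's front end might look as follows. The 2:1 decimation factor, the omission of the anti-aliasing and spectral filtering of step 709, and the final ×2 lag scaling are simplifying assumptions for illustration.

```python
import numpy as np

def initial_lag_256(y, l1=20, l2=147, nfft=256):
    """Estimate an initial lag with a 256-point FFT: downsample by 2 so the
    147-sample maximum lag fits the shorter transform, compute G(f) = |Y(f)|^2,
    transform back, pick the peak, and scale the lag to the original rate."""
    y2 = y[::2]                                          # naive 2:1 downsampling (step 704)
    w = np.hamming(min(len(y2), nfft))                   # analysis window (step 705)
    g = np.abs(np.fft.fft(y2[:len(w)] * w, nfft)) ** 2   # steps 706-708
    c = np.fft.ifft(g).real
    lo, hi = l1 // 2, l2 // 2                            # lag bounds in decimated units
    return 2 * (lo + int(np.argmax(c[lo:hi + 1])))       # scale back to 8 kHz units (step 716)
```

The refined search would then run on the original (undecimated) samples, as the text notes.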
- the 40 ms coding frame is divided into eight 5 ms coding subframes 808, as shown in Figure 8.
- Initial pitch lag estimates lag 1 and lag 2 are the lag estimates for the last coding subframe of each pitch subframe in the current coding frame.
- Lag 0 is the refined lag estimate of the second pitch subframe in the previous coding frame.
- the relationship among lag 1 , lag 2 , and lag 0 is shown in Figure 8.
- the initial pitch lags lag 1 and lag 2 are refined first to improve their precision (step 718 in Figure 7), by maximizing an autocorrelation of the form Σ_{j=0..L−1} y(N i +j)·y(N i +j−n) over candidate lags n near lag i , where N i is the index of the starting sample in the pitch subframe for pitch lag i .
- M is selected to be 10
- L is lag i + 10
- i indicates the index of the pitch subframe.
- the pitch lags of the coding subframes can be determined.
- the pitch lags of the coding subframes are estimated by linearly interpolating lag 1 , lag 2 , and lag 0 .
- each lag(i) is further refined (step 722) by the same autocorrelation maximization over the coding subframe, where N i is the index of the starting sample in the coding subframe for pitch lag(i).
- M is chosen to be 3, and L equals 40.
- the linear interpolation of pitch lag is critical in unvoiced segments of speech.
- the pitch lag found by any analysis method tends to be randomly distributed for unvoiced speech.
- Due to the relatively large pitch subframe size, if the lag for each coding subframe were too close to the initially determined pitch subframe lag (found in step (2) above), an artificial periodicity that was not originally in the speech would be added.
- linear interpolation provides a simple solution to problems associated with poor quality unvoiced speech.
- Since the pitch subframe lags tend to be random for unvoiced speech, the interpolated lag for each coding subframe is also randomly distributed, which preserves voice quality.
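The interpolation step can be sketched as below. The grid (four coding subframes between lag0 and lag1, four between lag1 and lag2, matching the eight 5 ms subframes of Figure 8) follows the text; the exact endpoint convention is an assumption.

```python
def interpolate_subframe_lags(lag0, lag1, lag2, n_sub=8):
    """Linearly interpolate per-coding-subframe lags: lag0 is the refined lag
    of the previous frame's second pitch subframe, lag1 and lag2 are the
    current frame's two pitch-subframe estimates."""
    half = n_sub // 2
    first = [lag0 + (lag1 - lag0) * i / half for i in range(1, half + 1)]
    second = [lag1 + (lag2 - lag1) * i / half for i in range(1, half + 1)]
    return first + second
```

For example, interpolate_subframe_lags(60, 64, 72) yields [61.0, 62.0, 63.0, 64.0, 66.0, 68.0, 70.0, 72.0]; each value would then be refined per subframe (step 722).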
Claims (16)
- A speech coding apparatus for reproducing and coding input speech, the speech coding apparatus using linear predictive coding (LPC) parameters and an innovation codebook representing a plurality of vectors which are referenced to excite speech reproduction in order to generate speech, the speech coding apparatus comprising: speech input means (602) for receiving the input speech; a computer for processing the input speech, the computer including: means for separating a current coding frame within the input speech; means for dividing the coding frame into a plurality of pitch subframes (802, 804); means for defining a pitch analysis window (806) having N speech samples, the pitch analysis window extending across the pitch subframes (802, 804); means for estimating an initial pitch lag value (714) for each pitch subframe (802, 804); means for dividing each pitch subframe (802, 804) into multiple coding subframes (808), wherein the initial pitch lag estimate for each pitch subframe (802, 804) represents the lag estimate for the last coding subframe (802) of each pitch subframe (802, 804) within the current coding frame; and means for linearly interpolating (720) the estimated pitch lag values (714) between the pitch subframes (802, 804) to determine a pitch lag estimate for each coding subframe (808); and means for refining (722) the linearly interpolated lag values (720) of each coding subframe; and speech output means for outputting speech reproduced in accordance with the refined pitch lag values (722).
- The apparatus of claim 1, further comprising sampling means for sampling the input speech at a sampling rate R, wherein the N speech samples are determined according to the equation N = R * X, where X is a downsampling value allowing representation with fewer samples.
- The apparatus of claim 2, wherein X = 25 ms, R = 8,000 Hz, and N = 320 samples.
- The apparatus of claim 1, wherein each coding frame is approximately 40 ms in length.
- The system of claim 1, further comprising: means for applying a first Discrete Fourier Transform (DFT) (606) to the samples, the first DFT having an associated amplitude; means for squaring the amplitude (608) of the first DFT (606); and means for applying a second DFT (610) to the squared amplitude (608).
- The system of claim 5, wherein the initial pitch lag value has an associated prediction error, and the means for refining the initial pitch lag value minimizes the associated prediction error.
- The system of claim 5, further comprising: means for estimating initial pitch lag estimates lag1 and lag2 (714), which represent lag estimates, respectively, for the last coding subframe (808) of each pitch subframe within the current coding frame; means for refining (718) the pitch lag estimate lag0 of the second pitch subframe within the previous coding frame; and means for linearly interpolating (720) lag1, lag2, and lag0 to estimate the pitch lag values (714) of the coding subframes (808).
- The system of claim 1, further comprising means for downsampling (704) the speech samples by a downsampling value allowing approximate representation with fewer samples.
- The system of claim 8, wherein the initial pitch lag value is scaled (716) according to the equation: scaled lag = number of speech samples / downsampling value.
- The system of claim 5, wherein the means for refining the initial pitch lag value comprises an autocorrelation.
- A speech coding method for reproducing and coding input speech, the speech coding method using linear predictive coding (LPC) parameters and an innovation codebook representing pseudo-random signals forming a plurality of vectors which are referenced to excite speech reproduction in order to generate speech, the speech coding method comprising the steps of: receiving (602) the input speech; processing the input speech, the processing step including the steps of: determining a speech coding frame within the input speech; subdividing the coding frame into a plurality of pitch subframes (802, 804); defining a pitch analysis window (806) having N speech samples, the pitch analysis window extending across the pitch subframes (802, 804); roughly estimating an initial pitch lag value (714) for each pitch subframe (802, 804); dividing each pitch subframe (802, 804) into multiple coding subframes (808), such that the initial pitch lag estimate for each pitch subframe (802, 804) represents the lag estimate for the last coding subframe (808) of each pitch subframe (802, 804); and linearly interpolating (720) the estimated pitch lag values (714) between the pitch subframes (802, 804) to determine a pitch lag estimate for each coding subframe (808); and refining (722) the linearly interpolated lag values (720); and outputting speech reproduced in accordance with the refined pitch lag values (722).
- The method of claim 11, further comprising the step of sampling the input speech at a sampling rate R, such that the N speech samples are determined according to the equation N = R * X, where X is a downsampling value allowing representation with fewer samples.
- The method of claim 11, further comprising the steps of: applying a first Discrete Fourier Transform (DFT) (606) to the samples, the first DFT having an associated amplitude; squaring the amplitude (608) of the first DFT (606); and applying a second DFT (610) to the squared amplitude (608) of the first DFT (606); wherein the initial pitch lag value has an associated prediction error, and the step of refining the initial pitch lag value uses an autocorrelation to minimize the associated prediction error.
- The method of claim 13, further comprising the steps of: estimating initial pitch lag estimates lag1 and lag2 (714), which represent the lag estimates, respectively, for the last coding subframe (808) of each pitch subframe (802, 804) within the current coding frame; refining (718) the pitch lag estimate lag0 of the second pitch subframe of the previous coding frame; and linearly interpolating (720) lag1, lag2, and lag0 to estimate the pitch lag values (714) of the coding subframes (808).
- The method of claim 11, further comprising the step of downsampling (704) the speech samples by a downsampling value allowing approximate representation with fewer samples.
- The method of claim 11, further comprising the step of scaling (716) the initial pitch lag value according to the equation: scaled lag = number of speech samples / downsampling value.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US342494 | 1989-04-24 | ||
US34249494A | 1994-11-21 | 1994-11-21 |
Publications (3)
Publication Number | Publication Date |
---|---|
EP0713208A2 EP0713208A2 (fr) | 1996-05-22 |
EP0713208A3 EP0713208A3 (fr) | 1997-12-10 |
EP0713208B1 true EP0713208B1 (fr) | 2002-02-20 |
Family
ID=23342074
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP95118142A Expired - Lifetime EP0713208B1 (fr) | 1994-11-21 | 1995-11-17 | Système d'estimation de la fréquence fondamentale |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0713208B1 (fr) |
JP (1) | JPH08211895A (fr) |
DE (1) | DE69525508T2 (fr) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1256000A (zh) * | 1998-01-26 | 2000-06-07 | 松下电器产业株式会社 | 增强音调的方法和装置 |
US6113653A (en) * | 1998-09-11 | 2000-09-05 | Motorola, Inc. | Method and apparatus for coding an information signal using delay contour adjustment |
JP4464488B2 (ja) * | 1999-06-30 | 2010-05-19 | パナソニック株式会社 | 音声復号化装置及び符号誤り補償方法、音声復号化方法 |
EP1619664B1 (fr) * | 2003-04-30 | 2012-01-25 | Panasonic Corporation | Appareil de codage et de décodage de la parole et méthodes pour cela |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5097508A (en) * | 1989-08-31 | 1992-03-17 | Codex Corporation | Digital speech coder having improved long term lag parameter determination |
US5189701A (en) * | 1991-10-25 | 1993-02-23 | Micom Communications Corp. | Voice coder/decoder and methods of coding/decoding |
-
1995
- 1995-11-14 JP JP7295266A patent/JPH08211895A/ja not_active Withdrawn
- 1995-11-17 DE DE69525508T patent/DE69525508T2/de not_active Expired - Lifetime
- 1995-11-17 EP EP95118142A patent/EP0713208B1/fr not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
DE69525508D1 (de) | 2002-03-28 |
DE69525508T2 (de) | 2002-06-20 |
JPH08211895A (ja) | 1996-08-20 |
EP0713208A3 (fr) | 1997-12-10 |
EP0713208A2 (fr) | 1996-05-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5781880A (en) | Pitch lag estimation using frequency-domain lowpass filtering of the linear predictive coding (LPC) residual | |
JP4843124B2 (ja) | 音声信号を符号化及び復号化するためのコーデック及び方法 | |
McCree et al. | A mixed excitation LPC vocoder model for low bit rate speech coding | |
Spanias | Speech coding: A tutorial review | |
US7092881B1 (en) | Parametric speech codec for representing synthetic speech in the presence of background noise | |
US5751903A (en) | Low rate multi-mode CELP codec that encodes line SPECTRAL frequencies utilizing an offset | |
EP0337636B1 (fr) | Dispositif de codage harmonique de la parole | |
JP3481390B2 (ja) | 短期知覚重み付けフィルタを使用する合成分析音声コーダに雑音マスキングレベルを適応する方法 | |
JP3277398B2 (ja) | 有声音判別方法 | |
US6078880A (en) | Speech coding system and method including voicing cut off frequency analyzer | |
JP5978218B2 (ja) | 低ビットレート低遅延の一般オーディオ信号の符号化 | |
EP0336658A2 (fr) | Quantification vectorielle dans un dispositif de codage harmonique de la parole | |
KR20020052191A (ko) | 음성 분류를 이용한 음성의 가변 비트 속도 켈프 코딩 방법 | |
US6138092A (en) | CELP speech synthesizer with epoch-adaptive harmonic generator for pitch harmonics below voicing cutoff frequency | |
JP2002516420A (ja) | 音声コーダ | |
EP1313091B1 (fr) | Procédés et système informatique pour l'analyse, la synthèse et la quantisation de la parole. | |
EP2593937B1 (fr) | Codeur et décodeur audio, et procédés permettant de coder et de décoder un signal audio | |
JP2004510174A (ja) | Celp型音声符号化装置用の利得量子化 | |
EP0713208B1 (fr) | Système d'estimation de la fréquence fondamentale | |
Korse et al. | Entropy Coding of Spectral Envelopes for Speech and Audio Coding Using Distribution Quantization. | |
JPH09508479A (ja) | バースト励起線形予測 | |
Srivastava | Fundamentals of linear prediction | |
JPH0782360B2 (ja) | 音声分析合成方法 | |
Yuan | The weighted sum of the line spectrum pair for noisy speech | |
Rabiner et al. | Use of a Computer Voice‐Response System for Wiring Communications Equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): DE FR GB |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
AK | Designated contracting states |
Kind code of ref document: A3 Designated state(s): DE FR GB |
|
17P | Request for examination filed |
Effective date: 19980529 |
|
17Q | First examination report despatched |
Effective date: 20000306 |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: CONEXANT SYSTEMS, INC. |
|
RIC1 | Information provided on ipc code assigned before grant |
Free format text: 7G 10L 11/04 A, 7G 10L 19/08 B |
|
GRAG | Despatch of communication of intention to grant |
Free format text: ORIGINAL CODE: EPIDOS AGRA |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: IF02 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB |
|
REF | Corresponds to: |
Ref document number: 69525508 Country of ref document: DE Date of ref document: 20020328 |
|
ET | Fr: translation filed | ||
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20021121 |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: 732E |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: TP |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R082 Ref document number: 69525508 Country of ref document: DE Representative=s name: DR. WEITZEL & PARTNER, DE |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: 732E Free format text: REGISTERED BETWEEN 20120705 AND 20120711 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R082 Ref document number: 69525508 Country of ref document: DE Representative=s name: DR. WEITZEL & PARTNER PATENT- UND RECHTSANWAEL, DE Effective date: 20120706 Ref country code: DE Ref legal event code: R081 Ref document number: 69525508 Country of ref document: DE Owner name: WIAV SOLUTIONS LLC, VIENNA, US Free format text: FORMER OWNER: CONEXANT SYSTEMS, INC., NEWPORT BEACH, CALIF., US Effective date: 20120706 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: TP Owner name: WIAV SOLUTIONS LLC, US Effective date: 20121029 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20141111 Year of fee payment: 20 Ref country code: FR Payment date: 20141110 Year of fee payment: 20 Ref country code: GB Payment date: 20141112 Year of fee payment: 20 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R071 Ref document number: 69525508 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: PE20 Expiry date: 20151116 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION Effective date: 20151116 |