EP0421360A2 - Method and apparatus for speech analysis-synthesis - Google Patents

Method and apparatus for speech analysis-synthesis

Info

Publication number
EP0421360A2
EP0421360A2 EP90118888A
Authority
EP
European Patent Office
Prior art keywords
speech
phase
impulse
filter
equalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP90118888A
Other languages
German (de)
English (en)
Other versions
EP0421360A3 (en)
EP0421360B1 (fr)
Inventor
Masaaki Honda
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Publication of EP0421360A2 publication Critical patent/EP0421360A2/fr
Publication of EP0421360A3 publication Critical patent/EP0421360A3/en
Application granted granted Critical
Publication of EP0421360B1 publication Critical patent/EP0421360B1/fr
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters

Definitions

  • the present invention relates to a speech analysis-synthesis method and apparatus in which a linear filter representing the spectral envelope characteristic of a speech is excited by an excitation signal to synthesize a speech signal.
  • linear predictive vocoder and multipulse predictive coding have been proposed for use in speech analysis-synthesis systems of this kind.
  • the linear predictive vocoder is now widely used for speech coding in a low bit rate region below 4.8 kb/s and this system includes a PARCOR system and a line spectrum pair (LSP) system.
  • the linear predictive vocoder is made up of an all-pole filter representing the spectral envelope characteristic of a speech and an excitation signal generating part for generating a signal for exciting the all-pole filter.
  • the excitation signal is a pitch frequency impulse sequence for a voiced sound and a white noise for an unvoiced sound.
  • Excitation parameters are the distinction between voiced and unvoiced sounds, the pitch frequency and the magnitude of the excitation signal. These parameters are extracted as average features of the speech signal in an analysis window of about 30 msec.
  • since speech feature parameters extracted for each analysis window as mentioned above are temporally interpolated to synthesize a speech, features of its waveform cannot be reproduced with sufficient accuracy when the pitch frequency, magnitude and spectrum characteristic of the speech undergo rapid changes.
  • since the excitation signal composed of the pitch frequency impulse sequence and the white noise is insufficient for reproducing features of various speech waveforms, it is difficult to produce a highly natural-sounding synthesized speech. To improve the quality of the synthesized speech in the linear predictive vocoder, it is considered in the art to use excitation which permits more accurate reproduction of features of the speech waveform.
  • the multipulse predictive coding is a method that uses excitation of higher reproducibility than in the conventional vocoder.
  • the excitation signal is expressed using a plurality of impulses and two all-pole filters representing proximity correlation and pitch correlation characteristics of speech are excited by the excitation signal to synthesize the speech.
  • the temporal positions and magnitudes of the impulses are selected such that an error between input original and synthesized speech waveforms is minimized. This is described in detail in B.S. Atal, "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," IEEE Int. Conf. on ASSP, pp 614-617, 1982.
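As a rough illustration of the analysis-by-synthesis search described above (a minimal sketch, not Atal's exact procedure: the perceptual frequency weighting and the pitch-correlation filter are omitted, and a simple greedy loop stands in for the full optimization), one pulse at a time can be placed at the position, with the amplitude, that most reduces the squared error between the synthesized and target waveforms:

```python
import numpy as np

def multipulse_search(target, h, n_pulses):
    """Greedy multipulse search: at each step pick the impulse position and
    amplitude that most reduce the squared error between the waveform
    synthesized through the impulse response h and the target."""
    N = len(target)
    residual = target.astype(float).copy()
    positions, amplitudes = [], []
    for _ in range(n_pulses):
        best_t, best_a, best_gain = 0, 0.0, -1.0
        for t in range(N):
            seg = h[: N - t]                    # response truncated at frame end
            e = float(seg @ seg)                # energy of the shifted response
            if e == 0.0:
                continue
            c = float(residual[t : t + len(seg)] @ seg)
            if c * c / e > best_gain:           # error reduction if placed at t
                best_t, best_a, best_gain = t, c / e, c * c / e
        positions.append(best_t)
        amplitudes.append(best_a)
        seg = h[: N - best_t]
        residual[best_t : best_t + len(seg)] -= best_a * seg
    return positions, amplitudes

# Demo: a target built from two known, non-overlapping pulses is recovered.
h = np.array([1.0, 0.5, 0.25])
exc = np.zeros(20)
exc[3], exc[10] = 2.0, -1.0
target = np.convolve(exc, h)[:20]
positions, amplitudes = multipulse_search(target, h, 2)
```

When pulses overlap, a joint re-optimization of the amplitudes improves on this greedy result, which is one reason the patent instead solves simultaneous equations for all magnitudes at once.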
  • the speech quality can be enhanced by increasing the number of impulses used, but when the bit rate is low, the number of impulses is limited; consequently, reproducibility of the speech waveform is impaired and sufficient speech quality cannot be obtained. It is considered in the art that an amount of information of about 8 kb/s is needed to produce high speech quality.
  • a zero filter is excited by a quasi-periodic impulse sequence derived from a phase-equalized prediction residual of an input speech signal and the resulting output signal from the zero filter is used as an excitation signal for a voiced sound in the speech analysis-synthesis.
  • the coefficients of the zero filter are selected such that an error between a speech waveform synthesized by exciting an all-pole prediction filter by the excitation signal and the phase-equalized input signal is minimized.
  • the zero filter, which is placed under the control of the thus selected coefficients, can synthesize an excitation signal accurately representing features of the prediction residual of the phase-equalized speech, in response to the above-mentioned quasi-periodic impulse sequence.
  • a quasi-periodic impulse sequence having limited fluctuation in its pitch period is produced.
  • by using the quasi-periodic impulse sequence as the above-mentioned impulse sequence, it is possible to further reduce the amount of parameter information representing the impulse sequence.
  • in the conventional vocoder, the pitch period impulse sequence composed of the pitch period and magnitudes obtained for each analysis window is used as the excitation signal, whereas in the present invention the impulse position and magnitude are determined for each pitch period and, if necessary, the zero filter is introduced, with a view to enhancing the reproducibility of the speech waveform.
  • the excitation signal is represented by one impulse per pitch period and by the coefficients of the zero filter set for each fixed frame, so as to reduce the amount of information for the excitation signal.
  • the prior art employs, as a criterion for determining the excitation parameters, an error between the input speech waveform and the synthesized speech waveform
  • the present invention uses an error between the input speech waveform and the phase-equalized speech waveform.
  • by using a waveform matching criterion based on the phase-equalized speech waveform, it is possible to improve matching between the input speech waveform and the speech waveform synthesized from the excitation signal used in the present invention. Since the phase-equalized speech waveform and the synthesized one are similar to each other, the number of excitation parameters can be reduced by determining them while comparing the two speech waveforms.
  • Fig. 1 illustrates in block form the constitution of the speech analysis-synthesis system of the present invention.
  • a sampled digital speech signal s(t) is input via an input terminal 1.
  • a prediction residual signal e(t) of the input speech signal s(t) is obtained by an inverse filter (not shown) which uses the set of prediction coefficients as its filter coefficients.
  • in the phase equalizing-analyzing part 4, coefficients of a phase equalizing filter for rendering the phase characteristic of the speech into a zero phase and reference time points of phase equalization are computed.
  • Fig. 2 shows in detail the constitution of the phase equalizing-analyzing part 4.
  • the speech signal s(t) is applied to an inverse filter 31 to obtain the prediction residual e(t).
  • the prediction residual e(t) is provided to a maximum magnitude position detecting part 32 and a phase equalizing filter 37.
  • a switch control part 33C monitors the decision signal VU fed from the linear predictive analyzing part 2 and normally connects a switch 33 to the output side of a magnitude comparing part 38, but when the current window is of a voiced sound V and the immediately preceding frame is of an unvoiced sound U, the switch 33 is connected to the output side of the maximum magnitude position detecting part 32.
  • the maximum magnitude position detecting part 32 detects and outputs a sample time point t′_p at which the magnitude of the prediction residual e(t) is maximum.
  • phase-equalizing filter coefficients h_t′i(k) have been obtained for the currently determined reference time point t′_i at a coefficient smoothing part 35.
  • the coefficients h_t′i(k) are supplied from the filter coefficient holding part 36 to the phase equalizing filter 37.
  • the prediction residual e(t), which is the output of the inverse filter 31, is phase-equalized by the phase equalizing filter 37 and output therefrom as the phase-equalized prediction residual e_p(t). It is well known that when the input speech signal s(t) is a voiced sound signal, the prediction residual e(t) of the speech signal has a waveform having impulses at the pitch intervals of the voiced sound.
  • the phase equalizing filter 37 produces an effect of emphasizing the magnitudes of impulses of such pitch intervals.
  • the magnitude comparing part 38 compares the level of the phase-equalized prediction residual e_p(t) with a predetermined threshold value, determines, as an impulse position, each sample time point where the sample value exceeds the threshold value, and outputs the impulse position as the next reference time point t′_i+1. An allowable minimum value L_min of the impulse intervals is imposed, so the next reference time point t′_i+1 is searched for only among sample points spaced more than L_min apart from the time point t′_i.
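The detection rule just described can be sketched as follows. This is a simplification: in the patent the filter coefficients are updated between successive picks, whereas this batch version scans a whole residual with one fixed threshold and spacing constraint.

```python
def pick_impulse_positions(ep, threshold, l_min):
    # Sketch of the magnitude comparing part 38: every sample of the
    # phase-equalized residual e_p(t) exceeding the threshold becomes a
    # reference time point, but candidates closer than the allowable
    # minimum impulse interval l_min to the previous pick are skipped.
    positions = []
    last = -l_min  # permits a pick at t = 0
    for t, value in enumerate(ep):
        if value > threshold and t - last >= l_min:
            positions.append(t)
            last = t
    return positions

# Demo: the peak at t = 9 is rejected because it is only 4 samples
# after the pick at t = 5, closer than l_min = 8.
ep = [0.0] * 30
ep[5], ep[9], ep[20] = 0.9, 0.95, 0.8
positions = pick_impulse_positions(ep, 0.5, 8)
```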
  • the phase-equalized residual e_p(t) during the unvoiced sound frame is composed of substantially random components (or white noise) which are considerably lower than the threshold value mentioned above, and the magnitude comparing part 38 does not produce, as an output of the phase equalizing-analyzing part 4, the next reference time point t′_i+1. Rather, the magnitude comparing part 38 determines a dummy reference time point t′_i+1 at, for example, the last sample point of the frame (but not limited thereto) so as to be used for determination of smoothed filter coefficients at the smoothing part 35, as will be explained later.
  • the characteristic of the phase-equalizing filter 37 expressed by Eq. (2) represents such a characteristic that the input signal thereto is passed therethrough intact.
  • the filter coefficients h*(k) thus calculated for the next reference time point t′_i+1 are smoothed by the coefficient smoothing part 35, as will be described later, to obtain smoothed phase equalizing filter coefficients h_t′i+1(k), which are held by the coefficient holding part 36 and supplied as updated coefficients h_t′i(k) to the phase equalizing filter 37.
  • the phase equalizing filter 37, having its coefficients thus updated, phase-equalizes the prediction residual e(t) again, and based on its output, the next impulse position, i.e., a new next reference time point t′_i+1, is determined by the magnitude comparing part 38.
  • a next reference time point t′_i+1 is determined based on the phase-equalized residual e_p(t) output from the phase equalizing filter 37 whose coefficients have been set to h_t′i(k) and, thereafter, new smoothed filter coefficients h_t′i+1(k) are calculated for the reference time point t′_i+1.
  • the prediction residual e(t) including impulses of the pitch frequency is provided, for the first time, to the phase equalizing filter 37 having set therein the filter coefficients given essentially by Eq. (1).
  • the magnitudes of impulses are not emphasized and, consequently, the prediction residual e(t) is output intact from the filter 37.
  • if the magnitudes of impulses of the pitch frequency happen to be smaller than the threshold value, the impulses cannot be detected in the magnitude comparing part 38. That is, the speech is processed as if no impulses were contained in the prediction residual, and consequently, the filter coefficients h*(k) for the impulse positions are not obtained; this is not preferable from the viewpoint of the speech quality in the speech analysis-synthesis.
  • the maximum magnitude position detecting part 32 detects the maximum magnitude position t′_p of the prediction residual e(t) in the voiced sound frame and provides it via the switch 33 to the filter coefficient calculating part 34 and, at the same time, outputs it as a reference time point.
  • the filter coefficient calculating part 34 calculates the filter coefficients h*(k), using the reference time point t′_p in place of t′_i+1 in Eq. (2).
  • the coefficient b is set to a value of about 0.97.
  • h_t-1(k) represents smoothed filter coefficients at an arbitrary sample point (t-1) in the time interval between the current reference time point t′_i and the next reference time point t′_i+1,
  • h_t(k) represents the smoothed filter coefficients at the next sample point. This smoothing takes place for every sample point, from the sample point next to the current reference time point t′_i, for which the smoothed filter coefficients have already been obtained, to the next reference time point t′_i+1, for which the smoothed filter coefficients are to be obtained next.
  • the filter coefficient holding part 36 holds those of the thus sequentially smoothed filter coefficients h_t(k) which were obtained for the last sample point, that is, the next reference time point, namely h_t′i+1(k), and supplies them as updated filter coefficients h_t′i(k) to the phase equalizing filter 37 for further determination of a subsequent next reference time point.
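The smoothing equation itself is not reproduced on this page; only the constant b of about 0.97 and the per-sample iteration are described. A plausible reading, stated here as an assumption, is a first-order recursion h_t(k) = b·h_t-1(k) + (1 - b)·h*(k) applied at every sample point between the two reference time points:

```python
def smooth_coefficients(h_prev, h_star, n_steps, b=0.97):
    # Hypothetical per-sample recursion consistent with the description
    # (the exact smoothing equation is not shown on this page):
    #   h_t(k) = b * h_{t-1}(k) + (1 - b) * h*(k)
    # iterated from the sample after the current reference time point
    # up to the next reference time point.
    h = list(h_prev)
    for _ in range(n_steps):
        h = [b * hk + (1.0 - b) * hs for hk, hs in zip(h, h_star)]
    return h

# Demo: one step moves each coefficient by a fraction (1 - b) toward the
# newly calculated h*(k); after many sample points it converges to h*(k).
h1 = smooth_coefficients([0.0], [1.0], 1)
h_many = smooth_coefficients([0.0], [1.0], 500)
```

With b near 1 the coefficients drift slowly, which is what keeps the phase equalizer from jumping discontinuously between pitch periods.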
  • the phase equalizing filter 37 is supplied with the prediction residual e(t) and calculates the phase-equalized prediction residual e_p(t) by the following equation:
  • the calculation of Eq. (4) needs only to be performed until the next impulse position is detected by the magnitude comparing part 38 after the reference time point t′_i at which the above-said smoothed filter coefficients were obtained.
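Eq. (4) is not reproduced on this page; as a sketch under that caveat, the filtering can be written as a non-causal FIR convolution of the residual with one fixed set of coefficients h(-K..K) (the actual filter is time-varying, its coefficients being smoothed per sample as described above):

```python
def phase_equalize(e, h, K):
    # Sketch of the phase-equalizing filter as a non-causal FIR filter:
    #   e_p(t) = sum_{k=-K..K} h(k) * e(t - k)
    # h is indexed h[0..2K] for taps k = -K..K.
    N = len(e)
    ep = [0.0] * N
    for t in range(N):
        acc = 0.0
        for k in range(-K, K + 1):
            if 0 <= t - k < N:
                acc += h[k + K] * e[t - k]
        ep[t] = acc
    return ep

# Demo: a unit tap at k = 0 passes the residual through intact, the
# behavior the text attributes to the characteristic of Eq. (2); matched
# (time-reversed) coefficients would instead emphasize the pitch impulses.
ep_out = phase_equalize([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], 1)
```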
  • in the magnitude comparing part 38, the magnitude level of the phase-equalized prediction residual e_p(t) is compared with a threshold value, and the sample point where the former exceeds the latter is detected as the next reference time point t′_i+1 in the current frame.
  • when no sample exceeds the threshold, processing is performed by which the time point where the phase-equalized prediction residual e_p(t) has taken the maximum magnitude until then is detected as the next reference time point t′_i+1.
  • steps 5 through 8 are repeatedly performed in the same manner as mentioned above, by which the smoothed filter coefficients h_t′i(k) at all impulse positions in the frame can be obtained.
  • the smoothed filter coefficients h_t(k) obtained in the phase equalizing-analyzing part 4 are used to control the phase equalizing filter 5.
  • the processing expressed by the following equation is performed to obtain a phase-equalized speech signal Sp(t).
  • the voiced sound excitation source comprises an impulse sequence generating part 7 and an all-zero filter (hereinafter referred to simply as zero filter) 10.
  • the impulse sequence generating part 7 generates such a quasi-periodic impulse sequence as shown in Fig. 3, in which the impulse position t_i and the magnitude m_i of each impulse are specified.
  • the temporal position (the impulse position) t_i and the magnitude m_i of each impulse in the quasi-periodic impulse sequence are represented as parameters.
  • the impulse position t_i is produced by an impulse position generating part 6 based on the reference time point t′_i, and the impulse magnitude m_i is controlled by an impulse magnitude calculating part 8.
  • the impulse magnitude at each impulse position t_i generated by the impulse position generating part 6 is selected so that a frequency-weighted mean square error between a synthesized speech waveform Sp′(t), produced by exciting the all-pole filter 18 with the impulse sequence created by the impulse sequence generating part 7, and the input speech waveform Sp(t) phase-equalized by the phase equalizing filter 5 is eventually minimized.
  • Fig. 6 shows the internal construction of the impulse magnitude calculating part 8.
  • the phase-equalized input speech waveform Sp(t) is supplied to a frequency weighting filter processing part 39.
  • γ is a parameter which controls the degree of suppression and is in the range of 0 < γ < 1; the degree of suppression increases as the value of γ decreases. Usually, γ is in the range of 0.7 to 0.9.
  • the frequency weighting filter processing part 39 has such a construction as shown in Fig. 6A.
  • the linear prediction coefficients a_i are provided to a frequency weighting filter coefficient calculating part 39A, in which coefficients γ^i·a_i of a filter having a transfer characteristic A(z/γ) are calculated.
  • a zero input response calculating part 39C uses, as an initial value, a synthesized speech obtained as the output of an all-pole filter 18A (see Fig. 1) of a transfer characteristic 1/A(z/γ) in the preceding frame, and outputs the response obtained when the all-pole filter 18A is excited by a zero input.
  • a target signal calculating part 39D subtracts the output of the zero input response calculating part 39C from the output S′w(t) of the frequency weighting filter 39B to obtain a frequency-weighted signal Sw(t).
  • the output γ^i·a_i of the frequency weighting filter coefficient calculating part 39A is supplied to an impulse response calculating part 40 in Fig. 6, in which an impulse response f(t) of a filter having the transfer characteristic 1/A(z/γ) is calculated.
  • Another correlation calculating part 42 calculates a covariance ψ(i, j) of the impulse response for a set of impulse positions t_i, t_j as follows:
  • An impulse magnitude calculating part 43 obtains impulse magnitudes m_i from φ(i) and ψ(i, j) by solving the following simultaneous equations, which equivalently minimize a mean square error between a synthesized speech waveform obtainable by exciting the all-pole filter 18 with the impulse sequence thus determined and the phase-equalized speech waveform Sp(t).
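The simultaneous equations are the normal equations of a least-squares fit. In the sketch below (frequency weighting is assumed to be already applied to the target, and the patent's own equation numbers are not reproduced), φ(i) is the cross-correlation of the weighted target with the impulse response placed at t_i, and ψ(i, j) the covariance of the placed responses:

```python
import numpy as np

def optimal_impulse_magnitudes(sw, f, positions):
    # phi(i): cross-correlation of the weighted target sw with the impulse
    # response f placed at position t_i; psi(i, j): covariance of the
    # placed responses.  Solving psi * m = phi gives the magnitudes that
    # minimize the mean square error of the synthesized waveform.
    N = len(sw)
    U = np.zeros((len(positions), N))
    for i, t in enumerate(positions):
        seg = f[: N - t]                  # response truncated at frame end
        U[i, t : t + len(seg)] = seg
    phi = U @ sw
    psi = U @ U.T
    return np.linalg.solve(psi, phi)

# Demo: a target built from known magnitudes at known positions is
# recovered exactly by the least-squares solution.
f = np.array([1.0, 0.5])
positions = [2, 6]
sw = np.zeros(10)
for t, m in zip(positions, (1.5, -0.7)):
    sw[t : t + len(f)] += m * f
m_opt = optimal_impulse_magnitudes(sw, f, positions)
```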
  • the impulse magnitudes m_i are quantized by the quantizer 9 in Fig. 1 for each frame. This is carried out by, for example, a scalar quantization or vector quantization method.
  • a vector (a magnitude pattern) using the respective impulse magnitudes m_i as its elements is compared with a plurality of predetermined standard impulse magnitude patterns and is quantized to that one of them which minimizes the distance between the patterns.
  • a measure of the distance between the magnitude patterns corresponds essentially to a mean square error between the speech waveform Sp′(t) synthesized, without using the zero filter, from the standard impulse magnitude pattern selected in the quantizer 9 and the phase-equalized input speech waveform Sp(t). For example, letting the magnitude pattern vector obtained by solving Eq.
  • the quantized value m̂ of the above-mentioned magnitude pattern is expressed by the following equation, as the standard pattern which minimizes the mean square error d(m, m_c) in Eq. (12) among the aforementioned plurality of standard pattern vectors m_ci.
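The exact expression of Eq. (12) is not reproduced on this page; a quadratic form weighted by the impulse-response covariance ψ is assumed below, which is the standard way to make a codebook distance equal the waveform mean square error the text refers to:

```python
import numpy as np

def vq_magnitude_pattern(m, codebook, psi):
    # Pick the standard pattern m_c minimizing the weighted distance
    #   d(m, m_c) = (m - m_c)^T psi (m - m_c).
    # With psi the covariance of the placed impulse responses, this equals
    # the mean square error of the waveform synthesized from m_c.
    dists = []
    for mc in codebook:
        d = m - np.asarray(mc)
        dists.append(float(d @ psi @ d))
    best = int(np.argmin(dists))
    return best, codebook[best]

# Demo with an identity weighting: the pattern nearest to (1, 2) wins.
psi = np.eye(2)
codebook = [np.array([0.0, 0.0]), np.array([1.1, 1.9]), np.array([3.0, 3.0])]
index, m_hat = vq_magnitude_pattern(np.array([1.0, 2.0]), codebook, psi)
```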
  • the zero filter 10 is to provide an input impulse sequence with a feature of the phase-equalized prediction residual waveform, and the coefficients of this filter are produced by a zero filter coefficient calculating part 11.
  • Fig. 7A shows an example of the phase-equalized prediction residual waveform e p (t)
  • Fig. 7B shows an example of an impulse response waveform of the zero filter 10 for an input impulse thereto.
  • the phase-equalized prediction residual e_p(t) has a flat spectral envelope characteristic and a phase close to zero, and hence is impulsive and large in magnitude at impulse positions t_i, t_i+1, ..., but relatively small at other positions.
  • the waveform is substantially symmetric with respect to each impulse position and each midpoint between adjacent impulse positions, respectively.
  • the magnitude at the midpoint is relatively larger than at other positions (except for impulse positions), as will be seen from Fig. 7A, and this tendency increases for a speech of a long pitch period, in particular.
  • the zero filter 10 is set so that its impulse response assumes values at successive q sample points on either side of the impulse position t_i and at successive r sample points on either side of the midpoint between the adjacent impulse positions t_i and t_i+1, as depicted in Fig. 7B.
  • the transfer characteristic of the zero filter 10 is expressed as follows:
  • the filter coefficients v_k are determined such that a frequency-weighted mean square error between the synthesized speech waveform Sp′(t) and the phase-equalized input speech waveform Sp(t) is minimized.
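The tap layout of Fig. 7B can be sketched directly as the construction of the excitation from the impulse sequence. The split of the coefficients into a group around each impulse and a group around each midpoint is taken from the description; the function and parameter names are hypothetical:

```python
def zero_filter_excitation(n, positions, magnitudes, v_main, v_mid):
    # Each impulse m_i at t_i is spread by taps v_main over q samples on
    # either side of t_i, and by taps v_mid over r samples around the
    # midpoint between adjacent impulse positions (the midpoint taps model
    # the secondary peak of the phase-equalized residual).
    q = (len(v_main) - 1) // 2
    r = (len(v_mid) - 1) // 2
    y = [0.0] * n
    for i, (t, m) in enumerate(zip(positions, magnitudes)):
        for k in range(-q, q + 1):
            if 0 <= t + k < n:
                y[t + k] += m * v_main[k + q]
        if i + 1 < len(positions):
            mid = (t + positions[i + 1]) // 2
            for k in range(-r, r + 1):
                if 0 <= mid + k < n:
                    y[mid + k] += m * v_mid[k + r]
    return y

# Demo: identity main taps pass the impulses through at t = 4 and t = 12,
# and a smaller tap appears at their midpoint t = 8.
y = zero_filter_excitation(16, [4, 12], [1.0, 1.0], [0.0, 1.0, 0.0], [0.5])
```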
  • Fig. 8 illustrates the construction of the filter coefficient calculating part 11.
  • a frequency weighting filter processing part 44 and an impulse response calculating part 45 are identical in construction with the frequency weighting filter processing part 39 and the impulse response calculating part 40 in Fig. 6, respectively.
  • a correlation calculating part 47 calculates the cross-covariance φ(i) between the signals Sw(t) and u_i(t), and another correlation calculating part 48 calculates the auto-covariance ψ(i, j) between the signals u_i(t) and u_j(t).
  • a filter coefficient calculating part 49 calculates the coefficients v_i of the zero filter 10 from the above-said cross-correlation φ(i) and covariance ψ(i, j) by solving the following simultaneous equations. These solutions eventually minimize a mean square error between a synthesized speech waveform obtainable by exciting the all-pole filter 18 with the output of the zero filter 10 and the phase-equalized speech waveform Sp(t).
  • the filter coefficients v_i are quantized by a quantizer 12 in Fig. 1. This is performed by use of a scalar quantization or vector quantization technique, for example.
  • a vector (a coefficient pattern) using the filter coefficients v_i as its elements is compared with a plurality of predetermined standard coefficient patterns and is quantized to the standard pattern which minimizes the distance between patterns.
  • the quantized value v̂ of the filter coefficients is obtained by the following equation: where v is a vector using, as its elements, the coefficients v_-q, v_-q+1, ..., v_q+2r+1 obtained by solving Eq. (16), and v_ci is a standard pattern vector of the filter coefficients. Further, Ψ is a matrix using as its elements the covariance ψ(i, j) of the impulse response u_i(t).
  • the speech signal Sp′(t) is synthesized by exciting an all-pole filter featuring the speech spectrum envelope characteristic with a quasi-periodic impulse sequence which is determined by impulse positions based on the phase-equalized residual e_p(t) and by impulse magnitudes determined so that an error of the synthesized speech is minimized.
  • the impulse magnitudes m_i and the coefficients v_i of the zero filter are set to optimum values which minimize the matching error between the synthesized speech waveform Sp′(t) and the phase-equalized speech waveform Sp(t).
  • a random pattern generating part 13 in Fig. 1 has stored therein a plurality of patterns, each composed of a plurality of normal random numbers with mean 0 and variance 1.
  • a gain calculating part 15 calculates, for each random pattern, a gain g_i which makes equal the power of the synthesized speech Sp′(t) produced by the output random pattern and the power of the phase-equalized speech Sp(t), and a scalar-quantized gain ĝ produced by a quantizer 16 is used to control an amplifier 14.
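The power-equalizing gain is a one-line computation. As a sketch (function name assumed; the patent computes this per random pattern in the unvoiced branch):

```python
import math

def excitation_gain(sp, synth):
    # Gain g_i that equalizes the power of the speech synthesized from a
    # random pattern with the power of the phase-equalized speech Sp(t).
    p_target = sum(x * x for x in sp) / len(sp)
    p_synth = sum(x * x for x in synth) / len(synth)
    return math.sqrt(p_target / p_synth)

# Demo: a synthesized signal with half the amplitude needs a gain of 2.
g = excitation_gain([2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, -1.0])
```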
  • a matching error between a synthesized speech waveform Sp′(t), obtained by applying each of all the random patterns to the all-pole filter 18, and the phase-equalized speech Sp(t) is obtained by the waveform matching error calculating part 19.
  • the errors thus obtained are decided by the error deciding part 20 and the random pattern generating part 13 searches for an optimum random pattern which minimizes the waveform matching error.
  • one frame is composed of three successive random patterns. This random pattern sequence is applied as the excitation signal to the all-pole filter 18 via the amplifier 14.
  • the speech signal is represented by the linear prediction coefficients a_i and the voiced/unvoiced sound parameter VU; the voiced sound is represented by the impulse positions t_i, the impulse magnitudes m̂_i and the zero filter coefficients v̂_i, and the unvoiced sound is represented by the random number code pattern (number) c_i and the gain ĝ_i.
  • These speech parameters are coded by a coding part 21 and then transmitted or stored. In a speech synthesizing part the speech parameters are decoded by a decoding part 22.
  • an impulse sequence composed of the impulse positions t_i and the impulse magnitudes m̂_i is produced in an impulse sequence generating part 23 and is applied to a zero filter 24 to create an excitation signal.
  • a random pattern is selectively generated by a random pattern generating part 25 using the random number code (signal) c_i and is applied to an amplifier 26, which is controlled by the gain ĝ_i and in which the pattern is magnitude-controlled to produce an excitation signal.
  • Either one of the excitation signals thus produced is selected by a switch 27, which is controlled by the voiced/unvoiced parameter VU, and the excitation signal thus selected is applied to an all-pole filter 28 to excite it, providing a synthesized speech at its output end 29.
  • the filter coefficients of the zero filter 24 are controlled by v̂_i and the filter coefficients of the all-pole filter 28 are controlled by a_i.
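The final synthesis stage, the all-pole filter 28 driven by the selected excitation, is the standard direct-form recursion for 1/A(z). A minimal sketch (sign convention A(z) = 1 + a_1·z^-1 + ... + a_p·z^-p assumed, since the patent's equations are not reproduced on this page):

```python
def all_pole_synthesis(excitation, a):
    # Direct-form recursion for 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p:
    #   s(t) = excitation(t) - sum_i a_i * s(t - i)
    s = [0.0] * len(excitation)
    for t in range(len(excitation)):
        acc = excitation[t]
        for i, ai in enumerate(a, start=1):
            if t - i >= 0:
                acc -= ai * s[t - i]
        s[t] = acc
    return s

# Demo: a single pole at z = 0.5 (a_1 = -0.5) turns a unit impulse into a
# decaying geometric sequence.
s = all_pole_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5])
```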
  • in a modified form, the impulse excitation source is used in common for voiced and unvoiced sounds in the construction of Fig. 1. That is, the random pattern generating part 13, the amplifier 14, the gain calculating part 15, the quantizer 16 and the switch 17 are omitted, and the output of the zero filter 10 is applied directly to the all-pole filter 18.
  • the bit rate is reduced by 60 bits per second.
  • in another modified form, the zero filter 10 is not included in the impulse excitation source in Fig. 1; that is, the zero filter 10, the zero filter coefficient calculating part 11 and the quantizer 12 are omitted, and the output of the impulse sequence generating part 7 is provided via the switch 17 to the all-pole filter 18. (The zero filter 24 is also omitted accordingly.)
  • the natural-sounding property of the synthesized speech is somewhat degraded for a speech of a male voice of a low pitch frequency, but the removal of the zero filter 10 reduces the scale of the hardware used and saves the 600 bits per second needed for coding the filter coefficients.
  • in a third modified form, processing by the impulse magnitude calculating part 8 and processing by the vector quantizing part 9 in Fig. 1 are integrated for calculating a quantized value of the impulse magnitudes.
  • Fig. 9 shows the construction of this modified form.
  • a frequency weighting filter processing part 50, an impulse response calculating part 51, a correlation calculating part 52 and another correlation calculating part 53 are identical in construction with those in Fig. 6.
  • Figs. 6 and 9 are nearly equal in the amount of data to be processed for obtaining the optimum impulse magnitude, but in Fig. 9 processing for solving the simultaneous equations included in the processing of Fig. 6 is not required and the processor is simple-structured accordingly.
  • in Fig. 6 the maximum value of the impulse magnitude can be scalar-quantized, whereas in Fig. 9 it is premised that the vector quantization method is used.
  • in a further modified form, the impulse position generating part 6 is not provided, and consequently, the processing shown in Fig. 4 is not involved; instead, all the reference time points t′_i provided from the phase equalizing-analyzing part 4 are used as the impulse positions t_i.
  • the throughput for enhancing the quality of the synthesized speech by the use of the zero filter 10 may also be assigned for the reduction of the impulse position information at the expense of the speech quality.
  • the constant J representing the allowed limit of fluctuations in the impulse frequency in the impulse source, the allowed maximum number of impulses per frame, Np, and the allowed minimum value of impulse intervals, L_min, are dependent on the number of bits assigned for coding of the impulse positions.
  • the difference between adjacent impulse intervals, ΔT, is set equal to or smaller than 5 samples;
  • the maximum number of impulses per frame, Np, is set equal to or smaller than 6; and
  • the allowed minimum impulse interval, L_min, is set equal to or greater than 13 samples.
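The constraints above make a compact differential code possible. The patent does not spell out the exact code, so the scheme below is hypothetical: send the first position and the first interval once, then each subsequent interval as its difference from the previous one, which the ΔT constraint keeps within a few bits:

```python
def code_impulse_positions(positions):
    # Hypothetical differential code consistent with the stated constraints:
    # first position, first interval, then interval-to-interval differences
    # (each bounded by |delta| <= 5 samples under the ΔT constraint).
    intervals = [b - a for a, b in zip(positions, positions[1:])]
    deltas = [intervals[0]] + [b - a for a, b in zip(intervals, intervals[1:])]
    return positions[0], deltas

def decode_impulse_positions(first, deltas):
    # Invert the code: accumulate deltas into intervals, intervals into positions.
    positions, interval = [first], 0
    for d in deltas:
        interval += d
        positions.append(positions[-1] + interval)
    return positions

# Demo: intervals 15, 17, 16 become the small differences 2 and -1
# after the first interval, and the round trip is exact.
first, deltas = code_impulse_positions([10, 25, 42, 58])
decoded = decode_impulse_positions(first, deltas)
```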
  • the random pattern vector c_i is composed of 40 samples (5 ms) and is selected from 512 kinds of patterns (9 bits).
  • the gain g_i is scalar-quantized using 6 bits including a sign bit.
  • the speech coded under the above conditions is far more natural sounding than the speech produced by the conventional vocoder, and its quality is close to that of the original speech. Further, the dependence of speech quality on the speaker in the present invention is lower than in the case of the prior art vocoder. It has been ascertained that the quality of the coded speech is clearly higher than in the cases of the conventional multipulse predictive coding and the code excited predictive coding.
  • a spectral envelope error of a speech coded at 4.8 kb/s is about 1 dB.
  • the coding delay of this invention is 45 ms, which is equal to or shorter than that of the conventional low bit rate speech coding schemes.
  • the reproducibility of speech waveform information is higher than in the conventional vocoder and the excitation signal can be expressed with a smaller amount of information than in the conventional multipulse predictive coding.
  • the present invention enhances matching between the synthesized speech waveform and the input speech waveform as compared with the prior art utilizing an error between the input speech itself and the synthesized speech, and hence permits an accurate estimation of the excitation parameters.
  • the zero filter produces the effect of reproducing fine spectral characteristics of the original speech, thereby making the synthesized speech more natural sounding.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
EP90118888A 1989-10-02 1990-10-02 Procédé et dispositif d'analyse par synthèse de la parole Expired - Lifetime EP0421360B1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP257503/89 1989-10-02
JP1257503A JPH0782360B2 (ja) 1989-10-02 1989-10-02 音声分析合成方法

Publications (3)

Publication Number Publication Date
EP0421360A2 true EP0421360A2 (fr) 1991-04-10
EP0421360A3 EP0421360A3 (en) 1991-12-27
EP0421360B1 EP0421360B1 (fr) 1996-01-17

Family

ID=17307200

Family Applications (1)

Application Number Title Priority Date Filing Date
EP90118888A Expired - Lifetime EP0421360B1 (fr) 1989-10-02 1990-10-02 Procédé et dispositif d'analyse par synthèse de la parole

Country Status (4)

Country Link
EP (1) EP0421360B1 (fr)
JP (1) JPH0782360B2 (fr)
CA (1) CA2026640C (fr)
DE (1) DE69024899T2 (fr)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1252679C (zh) 1997-03-12 2006-04-19 三菱电机株式会社 声音编码装置、声音编码译码装置、以及声音编码方法
US7072832B1 (en) 1998-08-24 2006-07-04 Mindspeed Technologies, Inc. System for speech encoding having an adaptive encoding arrangement
JP4999757B2 (ja) * 2008-03-31 2012-08-15 日本電信電話株式会社 音声分析合成装置、音声分析合成方法、コンピュータプログラム、および記録媒体
JP5325130B2 (ja) * 2010-01-25 2013-10-23 日本電信電話株式会社 Lpc分析装置、lpc分析方法、音声分析合成装置、音声分析合成方法及びプログラム


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4850022A (en) * 1984-03-21 1989-07-18 Nippon Telegraph And Telephone Public Corporation Speech signal processing system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ICASSP'86 - IEEE-IECEJ-ASJ INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Tokyo, 7th - 11th April 1986, vol. 3, pages 1701-1704, IEEE, New York, US; T. MORIYA et al.: "Speech coder using phase equalization and vector quantization" *
ICASSP'90 - 1990 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Albuquerque, New Mexico, 3rd - 6th April 1990, vol. 1, pages 213-216, IEEE, New York, US; M. HONDA: "Speech coding using waveform matching based on LPC residual phase equalization" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2741744A1 (fr) * 1995-11-23 1997-05-30 Thomson Csf Procede et dispositif d'evaluation de l'energie du signal de parole par sous bande pour vocodeur bas debits
WO2000011660A1 (fr) * 1998-08-24 2000-03-02 Conexant Systems, Inc. Compensation d'inclinaisons adaptative pour residus vocaux synthetises
US6385573B1 (en) 1998-08-24 2002-05-07 Conexant Systems, Inc. Adaptive tilt compensation for synthesized speech residual
CN108281150A (zh) * 2018-01-29 2018-07-13 上海泰亿格康复医疗科技股份有限公司 一种基于微分声门波模型的语音变调变嗓音方法
CN108281150B (zh) * 2018-01-29 2020-11-17 上海泰亿格康复医疗科技股份有限公司 一种基于微分声门波模型的语音变调变嗓音方法
CN113066476A (zh) * 2019-12-13 2021-07-02 科大讯飞股份有限公司 合成语音处理方法及相关装置
CN113066476B (zh) * 2019-12-13 2024-05-31 科大讯飞股份有限公司 合成语音处理方法及相关装置

Also Published As

Publication number Publication date
DE69024899D1 (de) 1996-02-29
JPH03119398A (ja) 1991-05-21
CA2026640A1 (fr) 1991-04-03
EP0421360A3 (en) 1991-12-27
EP0421360B1 (fr) 1996-01-17
DE69024899T2 (de) 1996-07-04
CA2026640C (fr) 1996-07-09
JPH0782360B2 (ja) 1995-09-06

Similar Documents

Publication Publication Date Title
US5293448A (en) Speech analysis-synthesis method and apparatus therefor
US5305421A (en) Low bit rate speech coding system and compression
McCree et al. A mixed excitation LPC vocoder model for low bit rate speech coding
US6345248B1 (en) Low bit-rate speech coder using adaptive open-loop subframe pitch lag estimation and vector quantization
CA1123955A (fr) Appareil d'analyse et de synthese de la parole
EP0745971A2 (fr) Système d'estimation du pitchlag utilisant codage résiduel selon prédiction
EP0718822A2 (fr) Codec CELP multimode à faible débit utilisant la rétroprédiction
KR20030035522A (ko) 스무딩 필터를 이용한 음성 합성 시스템 및 그 방법
EP0342687B1 (fr) Système de transmission de parole codée comportant des dictionnaires de codes pour la synthése des composantes de faible amplitude
US4701955A (en) Variable frame length vocoder
US4918734A (en) Speech coding system using variable threshold values for noise reduction
KR100497788B1 (ko) Celp 코더내의 여기 코드북을 검색하기 위한 방법 및 장치
US5884251A (en) Voice coding and decoding method and device therefor
JP3687181B2 (ja) 有声音/無声音判定方法及び装置、並びに音声符号化方法
US4720865A (en) Multi-pulse type vocoder
JPH10207498A (ja) マルチモード符号励振線形予測により音声入力を符号化する方法及びその符号器
US8195463B2 (en) Method for the selection of synthesis units
KR100421648B1 (ko) 음성코딩을 위한 적응성 표준
EP0421360B1 (fr) Procédé et dispositif d'analyse par synthèse de la parole
EP0745972B1 (fr) Procédé et dispositif de codage de parole
JP3531780B2 (ja) 音声符号化方法および復号化方法
JP2000235400A (ja) 音響信号符号化装置、復号化装置、これらの方法、及びプログラム記録媒体
EP0713208B1 (fr) Système d'estimation de la fréquence fondamentale
JPH0830299A (ja) 音声符号化装置
JP3552201B2 (ja) 音声符号化方法および装置

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19901002

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE FR GB SE

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): DE FR GB SE

17Q First examination report despatched

Effective date: 19940526

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB SE

REF Corresponds to:

Ref document number: 69024899

Country of ref document: DE

Date of ref document: 19960229

ET Fr: translation filed
REG Reference to a national code

Ref country code: FR

Ref legal event code: CA

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20070926

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20071030

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: SE

Payment date: 20071004

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20070716

Year of fee payment: 18

EUG Se: european patent has lapsed
GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20081002

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20090630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090501

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20081031

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20081002

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20081003