EP0950238B1 - Speech coding and decoding system (Sprachkodier- und dekodiersystem)
- Publication number
- EP0950238B1 (application EP97930643A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- frame
- voiced
- pitch
- speech
- lpc
- Prior art date
- Legal status: Expired - Lifetime
Classifications
- G10L25/93: Discriminating between voiced and unvoiced parts of speech signals
- G10L25/90: Pitch determination of speech signals
- G10L19/10: Determination or coding of the excitation function; determination or coding of the long-term prediction parameters, the excitation function being a multipulse excitation
- G10L2025/937: Signal energy in various frequency bands
- G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
Definitions
- the present invention relates to speech synthesis systems, and in particular to speech coding and synthesis systems which can be used in speech communication systems operating at low bit rates.
- Speech can be represented as a waveform the detailed structure of which represents the characteristics of the vocal tract and vocal excitation of the person producing the speech. If a speech communication system is to be capable of providing an adequate perceived quality, the transmitted information must be capable of representing that detailed structure. Most of the power in voiced speech is at relatively low frequencies, for example below 2 kHz. Accordingly, good quality speech synthesis can be achieved on the basis of speech waveforms that have been low pass filtered to reject higher frequency components. The perceived speech quality is however adversely affected if the frequency range is restricted much below 4 kHz.
- An error measure is calculated in the time domain representing the difference between harmonic and aharmonic speech spectra and that error measure is used to define the degree of voicing of the input frame in terms of a frequency value.
- the parameters used to represent a frame are the pitch period, the magnitude and phase values for each harmonic, and the frequency value. Proposals have been made to operate this system such that phase information is predicted in a coherent way across successive frames.
- In another system known as "multiband excitation coding" (D.W. Griffin and J.S. Lim, "Multiband Excitation Vocoder", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, pp. 1223-1235, 1988, and Digital Voice Systems Inc, "INMARSAT M Voice Codec, Version 3.0", Voice Coding System Description, Module 1, Appendix 1, August 1991) the amplitude and phase functions are determined in a different way from that employed in sinusoidal coding. The emphasis in this system is placed on dividing a spectrum into bands, for example up to twelve bands, and evaluating the voiced/unvoiced nature of each of these bands. Bands that are classified as unvoiced are synthesised using random signals.
- linear interpolation is used to define the required amplitudes.
- the phase function is also defined using linear frequency interpolation but in addition includes a constant displacement which is a random variable and which depends on the number of unvoiced bands present in the short term spectrum of the input signal. The system works in a way to preserve phase continuity between successive frames.
- a weighted summation of signals produced from amplitudes and phases derived for successive frames is formed to produce the synthesised signal.
- both schemes directly model the input speech signal which is DFT analysed, and both systems are at least partially based on the same fundamental relationship for representing speech to be synthesised.
- the systems differ however in terms of the way in which amplitudes and phase are estimated and quantized, the way in which different interpolation methods are used to define the necessary phase relationships, and the way in which "randomness" is introduced in the recovered speech.
- EUSIPCO-94, Edinburgh, Vol. 2, pp. 391-394, September 1994
- the short term magnitude spectrum is divided into two bands and a separate pitch frequency is calculated for each band
- the spectral excitation coding system (V. Cuperman, P. Lupini and B. Bhattacharya, "Spectral Excitation Coding of Speech at 2.4 kb/s", IEEE Proc. ICASSP-95).
- a further type of coding system exists, that is the prototype interpolation coding system. This relies upon the use of pitch period segments or prototypes which are spaced apart in time and reiteration/interpolation techniques to synthesise the signal between two prototypes.
- Such a system was described as early as 1971 (J.S. Severwight, "Interpolation Reiterations Techniques for Efficient Speech Transmission", Ph.D. Thesis, Loughborough University, Department of Electrical Engineering, 1971). More sophisticated systems of the same general class have been described more recently, for example in the paper by W.B. Kleijn, "Continuous Representations in Linear Predictive Coding", Proc. ICASSP-91, pp. 201-204, May 1991. The same author has published a series of related papers.
- the system employs 20msecs coding frames which are classified as voiced or unvoiced. Unvoiced frames are effectively CELP coded. Pitch prototype segments are defined in adjacent voiced frames, in the LPC residual signal, in a way which ensures maximum alignment (correlation) of the prototypes and defines the prototype so that the main pitch excitation pulse is not near to either of the ends of the prototype. A pitch period in a given frame is considered to be a cycle of an artificial periodic signal from which the prototype for the frame is obtained.
- the prototypes which have been appropriately selected from adjacent frames are Fourier transformed and the resulting coefficients are coded using a differential vector quantization scheme.
- the decoded prototype Fourier representations for adjacent frames are used to reconstruct the missing signal waveform between the two prototype segments using linear interpolation.
- the residual signal is obtained which is then presented to an LPC synthesis filter the output of which provides the synthesised voiced speech signal.
- An amount of randomness can be introduced into voiced speech by injecting noise at frequencies greater than 2 kHz, the amplitude of the noise increasing with frequency.
- the periodicity of synthesised voiced speech is controlled during the quantization of prototype parameters in accordance with a long-term signal-to-change ratio measure that reflects the similarity which exists between the prototypes of adjacent frames in the residual excitation signal.
- the known prototype interpolation coding systems rely upon a Fourier Series synthesis equation which involves a linear-with-time-interpolation process.
- the assumption is that the pitch estimates for successive frames are linearly interpolated to provide a pitch function and an associated instantaneous fundamental frequency.
- the instantaneous phase used in the cosine and sine terms of the Fourier series synthesis equation is the integral of the instantaneous harmonic frequencies. This synthesis arrangement allows for the linear evolution of the instantaneous pitch and the non-linear evolution of the instantaneous harmonic frequencies.
- the "slowly evolving" signal is sampled at relatively long intervals of 25msecs, but the parameters are quantized quite accurately on the basis of spectral magnitude information.
- the spectral magnitude of the "rapidly evolving" signal is sampled frequently, every 4msecs, but is quantized less accurately. Phase information is randomised every 2msecs.
- a speech synthesis system in which a speech signal is divided into a series of frames, and each frame is converted into a coded signal including a voiced/unvoiced classification and a pitch estimate, wherein a low pass filtered speech segment centred about a reference sample is defined in each frame, a correlation value is calculated for each of a series of candidate pitch estimates as the maximum of multiple crosscorrelation values obtained from variable length speech segments centred about the reference sample, the correlation values are used to form a correlation function defining peaks, and the locations of the peaks are determined and used to define a pitch estimate.
- the pitch estimate is defined using an iterative process.
- a single reference sample may be used, for example centred with respect to the respective frame, or alternatively multiple pitch estimates may be derived for each frame using different reference samples, the multiple pitch estimates being combined to define a combined pitch estimate for the frame.
- the pitch estimate may be modified by reference to a voiced/unvoiced status and/or pitch estimates of adjacent frames to define a final pitch estimate.
- the correlation function may be clipped using a threshold value, remaining peaks being rejected if they are adjacent to larger peaks. Peaks are initially selected and can be rejected if they are smaller than a following peak by more than a predetermined factor, for example smaller than 0.9 times the following peak.
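- By way of illustration, the following sketch implements the clipping and peak-rejection step just described; the function name and the clipping threshold ratio are assumptions, and only the example rejection factor of 0.9 is taken from the text.

```python
import numpy as np

def pick_pitch_peaks(cr, threshold_ratio=0.5, reject_factor=0.9):
    """Illustrative sketch of the correlation-function clipping and
    peak-rejection step; threshold_ratio is an assumed parameter."""
    cr = np.asarray(cr, dtype=float)
    # Clip the correlation function: ignore values below a threshold.
    clipped = np.where(cr >= threshold_ratio * cr.max(), cr, 0.0)
    # Local maxima: greater than both adjacent values.
    peaks = [d for d in range(1, len(clipped) - 1)
             if clipped[d] > clipped[d - 1] and clipped[d] > clipped[d + 1]]
    # Reject a peak that is smaller than the following peak by more than
    # the predetermined factor (0.9 in the text's example).
    kept = [p for i, p in enumerate(peaks)
            if i == len(peaks) - 1
            or clipped[p] >= reject_factor * clipped[peaks[i + 1]]]
    return kept
```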
- the pitch estimation procedure is based on a least squares error algorithm.
- the algorithm defines the pitch as a number whose multiples best fit the correlation function peak locations. Initial possible pitch values may be limited to integral numbers which are not consecutive, the increment between two successive numbers being proportional to a constant multiplied by the lower of those two numbers.
- a speech synthesis system in which a speech signal is divided into a series of frames, and each frame is converted into a coded signal including pitch segment magnitude spectral information, a voiced/unvoiced classification, and a mixed voiced classification which classifies harmonics in the magnitude spectrum of voiced frames as strongly voiced or weakly voiced, wherein a series of samples centred on the middle of the frame are windowed to form a data array which is Fourier transformed to produce a magnitude spectrum, a threshold value is calculated and used to clip the magnitude spectrum, the clipped data is searched to define peaks, the locations of peaks are determined, constraints are applied to define dominant peaks, and harmonics not associated with a dominant peak are classified as weakly voiced.
- Peaks may be located using a second order polynomial.
- the samples may be Hamming windowed.
- the threshold value may be calculated by identifying the maximum and minimum magnitude spectrum values and defining the threshold as a constant multiplied by the difference between the maximum and minimum values. Peaks may be defined as those values which are greater than the two adjacent values.
- a peak may be rejected from consideration if neighbouring peaks are of a similar magnitude, e.g. more than 80% of its magnitude, or if there are spectral magnitude values of greater magnitude in the same range.
- a harmonic may be considered as not being associated with a dominant peak if the difference between two adjacent peaks is greater than a predetermined threshold value.
- the spectrum may be divided into bands of fixed width and a strongly/weakly voiced classification assigned for each band.
- the frequency range may be divided into two or more bands of variable width, adjacent bands being separated at a frequency selected by reference to the strongly/weakly voiced classification of harmonics.
- the spectrum may be divided into fixed bands, for example fixed bands each of 500Hz, or variable width bands selected in dependence upon the strongly/weakly voiced status of harmonic components of the excitation signal.
- a strongly/weakly voiced classification is then assigned to each band.
- the lowest frequency band (e.g. 0-500Hz) may be automatically classified as strongly voiced, while the highest frequency band (for example 3500Hz to 4000Hz) and other high bands within the current frame (e.g. 3000Hz to 3500Hz) may be automatically classified as weakly voiced.
- the strongly/weakly voiced classification may be determined using a majority decision rule on the strongly/weakly voiced classification of those harmonics which fall within the band in question. If there is no majority, alternate bands may be alternately assigned strongly voiced and weakly voiced classifications.
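- A minimal sketch of this majority-rule band classification is given below; the names, the 1/0 encoding of the strongly/weakly voiced flags, and the tie-handling variable are illustrative assumptions.

```python
import numpy as np

def classify_bands(hv, harmonic_freqs, band_edges):
    """Majority-rule strongly/weakly voiced decision per band.
    hv[j] is 1 for a strongly voiced harmonic, 0 for weakly voiced;
    ties alternate between the two classes, following the text."""
    bhv = []
    alternate = 1
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        votes = [hv[j] for j, f in enumerate(harmonic_freqs) if lo <= f < hi]
        strong = sum(votes)
        weak = len(votes) - strong
        if strong > weak:
            bhv.append(1)
        elif weak > strong:
            bhv.append(0)
        else:                       # no majority: alternate assignment
            bhv.append(alternate)
            alternate ^= 1
    return bhv
```

- For fixed 500Hz bands, band_edges would be [0, 500, 1000, ..., 4000]; for variable width bands the same routine applies with edges chosen from the harmonic classification.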
- a speech synthesis system in which a speech signal is divided into a series of frames, each frame is defined as voiced or unvoiced, each frame is converted into a coded signal including a pitch period value, a frame voiced/unvoiced classification and, for each voiced frame, a mixed voiced spectral band classification which classifies harmonics within spectral bands as either strongly or weakly voiced, and the speech signal is reconstructed by generating an excitation signal in respect of each frame and applying the excitation signal to a filter, wherein for each weakly voiced spectral band, an excitation signal is generated which includes a random component in the form of a function which is dependent upon the respective pitch period value.
- the excitation signal is represented by a function which includes a first harmonic frequency component, the frequency of which is dependent upon the pitch period value appropriate to that frame, and a second random component which is superimposed upon the first component.
- the random component may be introduced by reducing the amplitude of harmonic oscillators assigned the weakly voiced classification, for example by reducing the power of the harmonics by 50%, while disturbing the oscillator frequencies, for example by shifting the oscillators randomly in frequency in the range of 0 to 30 Hz such that the frequency is no longer a multiple of the fundamental frequency, and then adding further random signals.
- the phase of the oscillators producing random signals may be randomised at pitch intervals.
- a speech synthesis system in which a speech signal is represented in part by spectral information in the form of harmonic magnitude values, it is possible to process an input speech signal to produce a series of spectral magnitude values and then to use all of those magnitude values at harmonic locations in subsequent processing steps. In many circumstances however at least some of the magnitude values contain little information which is useful in the recovery of the input speech signal. Accordingly when magnitude values are quantized for transmission to a receiver it is sensible to discard magnitude values which contain little useful information.
- an input speech signal is processed to produce an LPC residual signal which in turn is processed to provide harmonic magnitude values, but only a fixed number of those magnitude values is vector quantized for transmission to a receiver.
- the discarded magnitude values are represented at the receiver as identical constant values.
- a speech synthesis system in which a speech signal is divided into a series of frames, and each voiced frame is converted into a coded signal including a pitch period value, LPC coefficients, and pitch segment spectral magnitude information, wherein the spectral magnitude information is quantized by sampling the LPC short term magnitude spectrum at harmonic frequencies, the locations of the largest spectral samples are determined to identify which of the magnitudes are relatively more important for accurate quantization, and the magnitudes so identified are selected and vector quantized.
- the invention selects only those values which make a significant contribution according to the subjectively important LPC magnitude spectrum, thereby reducing redundancy without compromising quality.
- a pitch segment of P n LPC residual samples is obtained, where P n is the pitch period value of the nth frame, the pitch segment is DFT transformed, the mean value of the resultant spectral magnitudes is calculated, the mean value is quantized and used as a normalisation factor for the selected magnitudes, and the resulting normalised amplitudes are quantized.
- the RMS value of the pitch segment is calculated, the RMS value is quantized and used as a normalisation factor for the selected magnitudes, and the resulting normalised amplitudes are quantized.
- the selected magnitudes are recovered, and each of the other magnitude values is reproduced as a constant value.
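- The selection and normalisation just described can be sketched as follows; na=8, the function name, and the assumption that the LPC envelope samples align with the residual DFT bins are illustrative, and the text allows either the mean or the RMS value as the normalisation factor.

```python
import numpy as np

def select_na_magnitudes(residual_segment, lpc_env_at_harmonics, na=8):
    """Keep only the magnitudes at the na largest LPC short-term
    spectral envelope samples; normalise by the quantized mean."""
    spectrum = np.fft.rfft(residual_segment)       # single-side spectrum
    mags = np.abs(spectrum)
    # lpc_env_at_harmonics is assumed to be sampled at the same bins.
    important = np.argsort(lpc_env_at_harmonics)[-na:]
    mean = mags.mean()                             # normalisation factor
    normalised = mags[important] / mean
    return important, mean, normalised             # values to be quantized
```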
- Interpolation coding systems which employ a pitch-related synthesis formula to recover speech generally encounter the problem of coding a variable length, pitch dependent spectral amplitude vector.
- the quantization scheme referred to above in which only the magnitudes of relatively greater importance are quantized avoids this problem by quantizing only a fixed number of magnitude values and setting the rest of the magnitude values to a constant value. Thus at the receiver a fixed length vector can be recovered.
- Such a solution to the problem may result in a relatively spectrally flat excitation model which has limitations in providing high recovered speech quality.
- Two vector quantization methodologies have been reported which quantize a variable size input vector with a fixed size code vector.
- the input vector is transformed to a fixed size vector which is then conventionally vector quantized.
- An inverse transform of the quantized fixed size vector yields the recovered quantized vector.
- Transformation techniques which have been used include linear interpolation, band limited interpolation, all pole modelling and non-square transformation. This approach however produces an overall distortion which is the summation of the vector quantization noise and a component which is introduced by the transformation process.
- a variable input vector is directly quantized with a fixed size code vector. This approach is based on selecting only a limited number of elements from each codebook vector to form a distortion measure between a codebook vector and an input vector.
- Such a quantization approach avoids the transformation distortion of the alternative technique mentioned above and results in an overall distortion that is equal to the vector quantization noise alone, although this noise is itself significant.
- a speech synthesis system in which a variable size input vector of coefficients to be transmitted to a receiver for the reconstruction of a speech signal is vector quantized using a codebook defined by vectors of fixed size, the codebook vectors of fixed size are obtained from variable size training vectors and an interpolation technique which is an integral part of the codebook generation process, codebook vectors are compared to the variable sized input vector using the interpolation process, and an index associated with the codebook entry with the smallest difference from the comparison is transmitted, the index being used to address a further codebook at the receiver and thereby derive an associated fixed size codebook vector, and the interpolation process being used to recover from the derived fixed sized codebook vector an approximation of the variable sized input vector.
- the invention is applicable in particular to pitch synchronous low bit rate coders of the type described in this document and takes advantage of the underlying principle of such coders which means that the shape of the magnitude spectrum is represented by a relatively small number of equally spaced samples.
- the interpolation process is linear.
- the interpolation process is applied to produce from the codebook vectors a set of vectors of that given dimension.
- a distortion measure is then derived to compare the interpolated set of vectors and the input vector and the codebook vector which yields the minimum distortion is selected.
- the dimension of the input vectors is reduced by taking into account only the harmonic amplitudes within the input bandwidth range, for example 0 to 3.4 kHz.
- the remaining amplitudes i.e. in the region of 3.4kHz to 4 kHz are set to a constant value.
- the constant value is equal to the mean value of the quantized amplitudes.
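- A hedged sketch of the VS/SVQ search described above: each fixed-size codebook vector is linearly interpolated to the dimension of the variable-size input vector before the distortion is measured. All names are assumptions.

```python
import numpy as np

def vs_svq_search(input_vec, codebook):
    """Search a fixed-size codebook against a variable-size input vector,
    with linear interpolation as an integral part of the comparison."""
    n = len(input_vec)
    x = np.linspace(0.0, 1.0, num=n)
    best_index, best_dist = -1, np.inf
    for idx, code in enumerate(codebook):
        # Resample the fixed-size code vector to n points.
        xp = np.linspace(0.0, 1.0, num=len(code))
        candidate = np.interp(x, xp, code)
        dist = np.sum((np.asarray(input_vec) - candidate) ** 2)
        if dist < best_dist:
            best_index, best_dist = idx, dist
    return best_index  # transmitted index; the decoder interpolates
                       # codebook[best_index] back to the input dimension
```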
- Amplitude vectors obtained from adjacent residual frames exhibit significant amounts of redundancy which can be removed by means of backward prediction.
- the backward prediction may be performed on a harmonic basis such that the amplitude value of each harmonic of one frame is predicted from the amplitude value of the same harmonic in the previous frame or frames.
- a fixed linear predictor may be incorporated in the system, together with mean removal and gain shape quantization processes which operate on a resulting error magnitude vector.
- Although the variable sized vector quantization scheme provides advantageous characteristics, and in particular provides for good perceived signal quality at a bit rate of, for example, 2.4Kbits/sec, in some environments a lower bit rate would be highly desirable even at the cost of some quality. It would be possible, for example, to rely upon a single value representation and quantization strategy on the assumption that the magnitude spectrum of the pitch segment in the residual domain has an approximately flat shape. Unfortunately, systems based on this assumption have rather poor decoded speech quality.
- a speech synthesis system in which a speech signal is divided into a series of frames, each frame is converted into a coded signal including an estimated pitch period, an estimate of the energy of a speech segment the duration of which is a function of the estimated pitch period, and LPC filter coefficients defining an LPC spectral envelope, and a speech signal whose power is related to that of the input speech signal is reconstructed by generating an excitation signal using spectral amplitudes which are defined from a modified LPC spectral envelope sampled at the harmonic frequencies defined by the pitch period.
- the excitation spectral envelope is shaped according to the LPC spectral envelope.
- the result is a system which is capable of delivering high quality speech at 1.5Kbits/sec.
- the invention is based on the observation that some of the speech spectrum resonance and anti-resonance information is also present in the residual magnitude spectrum, since LPC inverse filtering cannot produce a residual signal of absolutely flat magnitude spectrum. As a consequence, the LPC residual signal is itself highly intelligible.
- the magnitude values may be obtained by spectrally sampling a modified LPC synthesis filter characteristic at the harmonic locations related to the pitch period.
- the modified LPC synthesis filter may have reduced feedback gain and a frequency response which consists of equalised resonant peaks, the locations of which are close to the LPC synthesis resonant locations.
- the value of the feedback gain may be controlled by the performance of the LPC model such that it is, for example, proportional to the normalised LPC prediction error.
- the energy of the reproduced speech signal may be equal to the energy of the original speech waveform.
- a speech synthesis system in which a speech signal is divided into a series of frames, each frame is converted into a coded signal including LPC filter coefficients and at least one parameter associated with a pitch segment magnitude, and the speech signal is reconstructed by generating two excitation signals in respect of each frame, each pair of excitation signals comprising a first excitation signal generated on the basis of the pitch segment magnitude parameter or parameters of one frame and a second excitation signal generated on the basis of the pitch segment magnitude parameter or parameters of a second frame which follows and is adjacent to the said one frame, applying the first excitation signal to a first LPC filter the characteristics of which are determined by the LPC filter coefficients of the said one frame and applying the second excitation signal to a second LPC filter the characteristics of which are determined by the LPC filter coefficients of the said second frame, and weighting and combining the outputs of the first and second LPC filters to produce one frame of a synthesised speech signal.
- the first and second excitation signals include the same phase function and different phase contributions from the two LPC filters involved in the above double synthesis process. This reduces the degree of pitch periodicity in the recovered signals. This and the combination of the first and second LPC filter outputs ensures an effective smooth evolution of the speech spectral envelope on a sample by sample basis.
- the outputs of the first and second LPC filters are weighted by half a window function such as a Hamming window such that the magnitude of the output of the first filter is decreasing with time and the magnitude of the output of the second filter is increasing with time.
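- As a rough illustration of this weighting, assuming frame-aligned filter outputs of equal length:

```python
import numpy as np

def blend_filter_outputs(speech_prev, speech_curr):
    """Weight the two LPC synthesis filter outputs by the two halves of a
    window (a Hamming window is the text's example) so the first output
    decreases with time while the second increases."""
    n = len(speech_curr)
    w = np.hamming(2 * n)
    w_down, w_up = w[n:], w[:n]      # decreasing / increasing halves
    return speech_prev * w_down + speech_curr * w_up
```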
- a speech coding system which operates on a frame by frame basis, and in which information is transmitted which represents each frame as either voiced or unvoiced and, for each voiced frame, represents that frame by a pitch period value, quantized magnitude spectral information with associated strong/weak voiced harmonic classification, and LPC filter coefficients, the received pitch period value and magnitude spectral information being used to generate residual signals at the receiver which are applied to LPC speech synthesis filters the characteristics of which are determined by the transmitted filter coefficients, wherein each residual signal is synthesised according to a sinusoidal mixed excitation synthesis process, and a recovered speech signal is derived from the combination of the outputs of the LPC Synthesis filters.
- the system operates on an LPC residual signal on a frame by frame basis.
- for voiced speech, the number of harmonics K depends on the pitch frequency of the signal.
- a voiced/unvoiced classification process allows the coding of voiced and unvoiced frames to be handled in different ways.
- Unvoiced frames are modelled in terms of an RMS value and a random time series.
- for voiced frames, a pitch period estimate is obtained and used to define a pitch segment which is centred at the middle of the frame.
- Pitch segments from adjacent frames are DFT transformed and only the resulting pitch segment magnitude information is coded and transmitted.
- pitch segment magnitude samples are classified as strongly or weakly voiced.
- the system transmits for every voiced frame, in addition to the voiced/unvoiced information, the pitch period value, the magnitude spectral information of the pitch segment, the strong/weak voiced classification of the pitch magnitude spectral values, and the LPC filter coefficients.
- a synthesis process that includes interpolation is used to reconstruct the waveform between the middle points of the current (n+1)th and previous nth frames.
- the basic synthesis equation for the residual signal is Res(i) = Σ_{j=1..K} M̂ j cos(phase j (i)), where M̂ j are decoded pitch segment magnitude values and phase j (i) is calculated from the integral of the linearly interpolated instantaneous harmonic frequencies ω j (i). K is the largest value of j for which ω j n (i) ≤ π.
- the initial phase for each harmonic is set to zero. Phase continuity is preserved across the boundaries of successive interpolation intervals.
- the synthesis process is performed twice however, once using the magnitude spectral values MG j n+1 of the pitch segment derived from the current (n+1)th frame and again using the magnitude values MG j n of the pitch segment derived in the previous nth frame.
- the phase function phase j (i) in each case remains the same.
- the resulting residual signals Res n (i) and Res n+1 (i) are used as inputs to corresponding LPC synthesis filters calculated for the nth and (n+1)th speech frames.
- the two LPC synthesised speech waveforms are then weighted by W n+1 (i) and W n (i) to yield the recovered speech signal.
- H n (ω j n (i)) is the frequency response of the nth frame LPC synthesis filter, calculated at the ω j n (i) harmonic frequency function at the ith instant.
- Φ n (ω j n (i)) is the associated phase response of this filter.
- ω j n (i) and phase j n (i) are the frequency and phase functions defined for the sampling instants i, with i covering the middle of the nth frame to the middle of the (n+1)th frame segments.
- K is the largest value of j for which ω j n (i) ≤ π.
- the above speech synthesis process introduces two "phase dispersion" terms, i.e. Φ n (ω j n (i)) and Φ n+1 (ω j n (i)), which effectively reduce the degree of pitch periodicity in the recovered signal.
- this "double synthesis" arrangement followed by an overlap-add process ensures an effective smooth evolution of the speech spectral envelope (LPC) on a sample by sample basis.
- the LPC excitation signal is based on a "mixed" excitation model which allows for the appropriate mixing of periodic and random excitation components in voiced frames on a frequency-band basis. This is achieved by operating the system such that the magnitude spectrum of the residual signal is examined, and applying a peak-picking process, near the ω j resonant frequencies, to detect possible dominant spectral peaks.
- each search is confined to a 50 Hz interval, ω j being located in the middle of such a 50 Hz interval.
- the amplitudes of the NRS random components are set to M̂ j /√(2·NRS) and their initial phases are selected randomly from the [-π, +π] region at pitch period intervals.
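- A hedged sketch of the treatment of one weakly voiced harmonic, combining the 50% power reduction, the 0 to 30 Hz frequency disturbance and the NRS random components described above; nrs, fs, and the placement of the random components inside the harmonic's 50 Hz interval are assumptions, and the amplitude M̂ j /√(2·NRS) is one reading of the garbled expression above.

```python
import numpy as np

rng = np.random.default_rng(0)

def weakly_voiced_harmonic(mg_j, j, phase0, nrs=4, fs=8000):
    """Mixed excitation for one weakly voiced harmonic: halved power,
    random detuning, plus NRS random components with random phases."""
    n = len(phase0)
    i = np.arange(n)
    # Halved-power harmonic, shifted randomly by up to 30 Hz.
    shift = rng.uniform(0.0, 30.0) * 2.0 * np.pi / fs
    sig = (mg_j / np.sqrt(2.0)) * np.cos(j * phase0 + shift * i)
    # NRS random components placed inside the 50 Hz interval around the
    # harmonic, each with amplitude mg_j / sqrt(2 * nrs) and random phase.
    amp = mg_j / np.sqrt(2.0 * nrs)
    for _ in range(nrs):
        offset = rng.uniform(-25.0, 25.0) * 2.0 * np.pi / fs
        sig += amp * np.cos(j * phase0 + offset * i
                            + rng.uniform(-np.pi, np.pi))
    return sig
```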
- the hv j information must be transmitted to be available at the receiver and, in order to reduce the bit rate allocated to hv j , the bandwidth of the input signal is divided into a number of fixed size bands BD k and a "strongly” or “weakly” voiced flag Bhv k is assigned for each band.
- for a strongly voiced band a highly periodic signal is reproduced, whereas for a weakly voiced band a signal which combines both periodic and aperiodic components is required.
- the remaining spectral bands can be strongly or weakly voiced.
- Figure 1 schematically illustrates processes operated by the system encoder. These processes are referred to in Figure 1 as Processes I to VII and these terms are used throughout this document.
- a speech signal is input and Processes I, III, IV, VI and VII produce outputs for transmission.
- each of the k coding frames within the MQA is classified as voiced or unvoiced (V n ) using Process I.
- a pitch estimation part of Process I provides a pitch period value P n only when a coding frame is voiced.
- k/m is an integer and represents the frame dimension of the matrix quantizer employed in Process III.
- the quantized coefficients â are used to derive a residual signal R n (i).
- P n is the pitch period value associated with the nth frame. This segment is centred in the middle of the frame.
- the selected P n samples are DFT transformed (Process V) to yield ⌊(P n +1)/2⌋ spectral magnitude values MG j n , 0 ≤ j < ⌊(P n +1)/2⌋, and ⌊(P n +1)/2⌋ phase values. The phase information is neglected.
- the magnitude information is coded (using Process VI) and transmitted.
- a segment of 20 msecs which is centred in the middle of the nth coding frame, is obtained from the residual signal R n (i).
- Process IV produces quantized Bhv information, which for voiced frames is multiplexed and transmitted to the receiver together with the voiced/unvoiced decision V n , the pitch period P n , the quantized LPC coefficients â of the corresponding LPC frame, and the magnitude values M̂ j n .
- Figure 3 schematically illustrates processes operated by the system decoder.
- the decoder Given the received parameters of the nth coding frame and those of the previous (n-1)th coding frame, the decoder synthesises a speech signal S n (i) that extends from the middle of the (n-1)th frame to the middle of the nth frame.
- This synthesis process involves the generation in parallel of two excitation signals Res n (i) and Res n-1 (i) which are used to drive two independent LPC synthesis filters 1/ A n ( z ) and 1/ A n -1 (z) the coefficients of which are derived from the transmitted quantized coefficients â.
- the process commences by considering the voiced/unvoiced status V k , where k is equal to n or n-1, (see Figure 4).
- V k = 0 corresponds to an unvoiced frame.
- Performance could be increased if the E k value was calculated, quantized and transmitted every 5msecs.
- the Res k (i) excitation signal is defined as the summation of a "harmonic" Res k h (i) component and a "random" Res k r (i) component.
- the top path of the V k = 1 part of the synthesis in Figure 4, which provides the harmonic component of this mixed excitation model, always calculates the instantaneous harmonic frequency function ω j n (i) which is associated with the interpolation interval defined between the middle points of the nth and (n-1)th frames (i.e. this action is independent of the value of k).
- the frequencies f j 1,n and f j 2,n are defined as follows:
- a randomisation parameter is defined so as to ensure that the phase of the cos terms is randomised at pitch period intervals across frame boundaries.
- the resulting Res n (i) and Res n-1 (i) excitation sequences, see Figure 4, are processed by the corresponding 1/ A n ( z ) and 1/ A n -1 ( z ) LPC synthesis filters.
- 1/A n-1 (z) becomes 1/A n (z) (including the memory), and 1/A n (z) becomes 1/A n+1 (z) with the memory of 1/A n (z).
- X̂ n (i) is then filtered via a post filter PF(z) and a high pass filter HP(z) to yield the speech segment S' n (i).
- k 1 n is the first reflection coefficient of the nth coding frame.
- SC is calculated every LPC frame of L samples.
- SC l is associated with the middle of the lth LPC frame as illustrated in Figure 6.
- the filtered samples from the middle of the (l-1)th frame to the middle of the lth frame are then multiplied by SC l (i) to yield the final output of the system, S(i) = SC l (i) · S' l (i), where
- SC l (i) = SC l · W l (i) + SC l-1 · W l-1 (i), 0 ≤ i < L.
- the scaling process introduces an extra half LPC frame delay into the coding-decoding process.
- the above described energy scaling procedure operates on an LPC frame basis in contrast to both the decoding and PF(z), HP(z) filtering procedures which operate on the basis of a frame of M samples.
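- The scaling interpolation SC l (i) = SC l W l (i) + SC l-1 W l-1 (i) can be sketched as follows; triangular windows are an assumption, since the text only requires complementary weighting.

```python
import numpy as np

def apply_energy_scaling(s_prime, sc_prev, sc_curr):
    """Blend the scale factors of adjacent LPC frames with complementary
    windows so the gain evolves smoothly across the frame."""
    L = len(s_prime)
    w_up = np.arange(L) / L        # W_l(i): rises across the frame
    w_down = 1.0 - w_up            # W_{l-1}(i): falls across the frame
    sc = sc_curr * w_up + sc_prev * w_down
    return sc * s_prime
```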
- Process I derives a voiced/unvoiced (V/UV) classification V n for the nth input coding frame and also assigns a pitch estimate P n to the middle sample M n of this frame. This process is illustrated in Figure 7.
- the V/UV and pitch estimation analysis frame is centred at the middle M n+1 of the (n+1)th coding frame with 237 samples on either side.
- the pitch estimation algorithm is illustrated in Figure 8, where P represents the output of the pitch estimation process.
- the 294 input samples are used to calculate a crosscorrelation function CR(d), where d is shown in Figure 9 and 20 ≤ d ≤ 147.
- Figure 9 shows the two speech segments which participate in the calculation of the crosscorrelation function value at "d" delay.
- the crosscorrelation function ρ d is calculated for the segments {x L,d }, {x R,d } as:
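- A minimal sketch of the CR(d) computation, assuming the two d-sample segments sit immediately either side of the reference sample (the exact indexing of Figure 9 is not reproduced here):

```python
import numpy as np

def cr_function(x, ref, d_min=20, d_max=147):
    """Normalised crosscorrelation of two variable-length segments
    centred about the reference sample, for each candidate delay d."""
    x = np.asarray(x, dtype=float)
    cr = {}
    for d in range(d_min, d_max + 1):
        left = x[ref - d:ref]          # d samples before the reference
        right = x[ref:ref + d]         # d samples after the reference
        num = np.dot(left, right)
        den = np.sqrt(np.dot(left, left) * np.dot(right, right)) + 1e-12
        cr[d] = num / den
    return cr
```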
- Figure 12 is a block diagram of the process involving the calculation of the CR function and the selection of its peaks.
- d max n+1 is equal to the value of d for which CR(d) attains its maximum value CR max Mn+1 .
- the algorithm examines the length of the G 0 runs which exist between successive G s segments (i.e. G s and G s+1 ), and when G 0 < 17 the G s segment with the maximum CR L (d) value is kept. This procedure yields CR̃ L (d), which is then examined by the following "peak picking" procedure.
- those CR̃ L (d) values are selected for which: CR̃ L (d) > CR̃ L (d-1) and CR̃ L (d) > CR̃ L (d+1).
- CR(d) and loc(k) are used as inputs to the Modified High Resolution Pitch Estimation (MHRPE) algorithm shown in Figure 8, whose output is P Mn+1 .
- the flowchart of this MHRPE procedure is shown in Figure 13, where P is initialised with 0 and, at the end, the estimated P is the requested P Mn+1 .
- the main pitch estimation procedure is based on a Least Squares Error (LSE) algorithm which is defined as follows: for each possible pitch value j in the range from 21 to 147 with an increment of 0.1 × j, i.e. j ∈ {21, 23, 25, 27, 30, 33, 36, 40, 44, 48, 53, 58, 64, 70, 77, 84, 92, 101, 111, 122, 134} (thus 21 iterations are performed).
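- The LSE search can be sketched as follows; the candidate list is taken from the text, while the least-squares refinement from the assigned multiples is an assumption about what the flowchart of Figure 13 computes.

```python
import numpy as np

# Candidate pitch values from the text: spacing grows as 0.1 * j.
CANDIDATES = [21, 23, 25, 27, 30, 33, 36, 40, 44, 48, 53, 58, 64,
              70, 77, 84, 92, 101, 111, 122, 134]

def mhrpe(peak_locs):
    """Score each candidate by how well its multiples fit the measured
    CR peak locations; return the best least-squares pitch."""
    locs = np.asarray(peak_locs, dtype=float)
    best_p, best_err = 0.0, np.inf
    for j in CANDIDATES:
        mult = np.maximum(1.0, np.round(locs / j))   # nearest multiples
        p = np.dot(locs, mult) / np.dot(mult, mult)  # LSE pitch for that fit
        err = np.mean((locs - mult * p) ** 2)
        if err < best_err:
            best_p, best_err = p, err
    return best_p
```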
- Process I obtains 160 samples centred at the middle of the M n+1 coding frame, removes their mean value, and then calculates R0, R1 and the average R av of the energies of the previous K non-silence coding frames.
- K is fixed to 50 for the first 50 non-silence coding frames, increases from 50 to 100 with the next 50 non-silence coding frames, and then remains constant at the value of 100.
- the V/UV part of Process I calculates the status V Mn+1 of the (n+1)th frame.
- the flowchart of this part of the algorithm is shown in Figure 18 where "V" represents the output V/UV flag of this procedure. Setting the "V” flag to 1 or 0 indicates voiced or unvoiced classification respectively.
- the "CR” parameter denotes the maximum value of the CR function which is calculated in the pitch estimation process.
- a diagrammatic representation of the voiced/unvoiced procedure is given in Figure 19.
- a multipoint pitch estimation algorithm accepts P Mn+1 , P Mn+1+d1 , P Mn+1+d2 , V n-1 , P n-1 , V' n , P' n to provide a preliminary pitch value P pr n +1 .
- the flowchart of this multipoint pitch estimation algorithm is given in Figure 21, where P 1 , P 2 and P o represent the pitch estimates associated with the M n+1+d1 , M n+1 +d2 and M n+1 points respectively, and P denotes the output pitch estimate of the process, that is P n+1 .
- This pitch post processing stage is defined in the flowchart of Figures 23, 24 and 25, the output A of Figure 23 being the input to Figure 24, and the output B of Figure 24 being the input to Figure 25.
- "P n " and "V n " represent the pitch estimate and voicing flag respectively which correspond to the nth coding frame prior to post processing (i.e. P 1 n , V 1 n ), whereas at the end of the procedure "P n " and "V n " represent the final pitch estimate and voicing flag associated with the nth frame (i.e. P n , V n ).
- the LPC analysis process (Process II of Figure 1) can be performed using the Autocorrelation, Stabilised Covariance or Lattice methods.
- the Burg algorithm was used, although simple autocorrelation schemes could be employed without a noticeable effect on the decoded speech quality.
- the LPC coefficients are then transformed to an LSP representation. Typical values for the number of coefficients are 10 to 12 and a 10th order filter has been used.
- LPC analysis processes are well known and described in the literature, for example "Digital Processing of Speech Signals", L.R. Rabiner and R.W. Schafer, Prentice - Hall Inc., Englewood Cliffs, New Jersey, 1978.
- LSP representations are well known, for example from "Line Spectrum Pair and Speech Data Compression", F Soong and B.H. Juang, Proc. ICASSP-84, pp 1.10.1-1.10.4, 1984. Accordingly these processes and representations will not be described further in this document.
- LSP coefficients are used to represent the data. These 10 coefficients could be scalar quantized using 37 bits with the following bit allocation pattern [3,4,4,4,4,4,4,3,3]. This is a relatively simple process, but the resulting bit rate of 1850 bits/second is unnecessarily high.
- the LSP coefficients can be Vector Quantised (VQ) using a Split-VQ technique.
- In the Split-VQ technique an LSP parameter vector of dimension "p" is split into two or more subvectors of lower dimensions, and then each subvector is Vector Quantised separately (when Vector Quantising the subvectors a direct VQ approach is used).
- when K is set to "p" (i.e. when C is partitioned into "p" elements), the Split-VQ becomes equivalent to Scalar Quantisation.
- the two weighting constants are set to 0.2 and 0.15 respectively.
- the weight w s,k (s,t) is derived from the value of the power envelope spectrum of the corresponding speech frame at the l k+s LSP frequency, raised to an exponent equal to 0.15.
- the overall SMQ quantisation process that yields the quantised LSP coefficient vectors l̂ l to l̂ l+N-1 for the l to l+N-1 analysis frames is shown in Figure 26.
- a 5Hz bandwidth expansion is also included in the inverse quantisation process.
- Process IV of Figure 1 is concerned with the mixed voiced classification of harmonics.
- the flowchart of Process IV is given in Figure 27.
- the R n array of 160 samples is Hamming windowed and augmented to form a 512 size array, which is then FFT processed.
- the maximum and minimum values MGR max , MGR min of the resulting 256 spectral magnitude values are determined, and a threshold TH0 is calculated. TH0 is then used to clip the magnitude spectrum.
- the clipped MGR array is searched to define peaks MGR(P) satisfying: MGR(P)>MGR(P+1) and MGR(P)>MGR(P-1)
- for each peak MGR(P) "supported" by the MGR(P+1) and MGR(P-1) values, a second order polynomial is fitted and the maximum point of this curve is accepted as MGR(P) with a location loc(MGR(P)). Further constraints are then imposed on these magnitude peaks. In particular, peaks are rejected when the spacing loc(MGR d (k)) - loc(MGR d (k-1)) between adjacent accepted peaks fails a threshold test.
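- The second order polynomial fit is standard parabolic interpolation through the peak bin and its two supporting neighbours, sketched below:

```python
import numpy as np

def refine_peak(mgr, p):
    """Fit a parabola through bins p-1, p, p+1 of the magnitude spectrum
    and return the refined (fractional) location and peak magnitude."""
    y0, y1, y2 = mgr[p - 1], mgr[p], mgr[p + 1]
    denom = y0 - 2 * y1 + y2
    delta = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    loc = p + delta                       # refined bin location
    mag = y1 - 0.25 * (y0 - y2) * delta   # parabola vertex value
    return loc, mag
```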
- the spectrum is divided into bands of 500Hz each and a strongly voiced/weakly voiced flag Bhv is assigned for each band:
- the Bhv values of the remaining 5 bands are determined using a majority decision rule on the hv j values of the j harmonics which fall within the band under consideration.
- the hv j of a specific harmonic j is equal to the Bhv value of the corresponding band.
- the hv information may be transmitted with 5 bits.
- the 680 Hz to 3400 Hz range is represented by only two variable size bands.
- Figures 29 and 30 represent respectively an original speech waveform obtained for the utterance "Industrial shares were mostly a" and frequency tracks obtained for that utterance.
- the horizontal axis represents time in terms of frames each of 20msec duration.
- Figure 32 shows four waveforms A, B, C and D.
- Waveform A represents the magnitude spectrum of a speech segment and the corresponding LPC spectral envelope (log 10 domain).
- Waveforms B, C and D represent the normalised Short-Term magnitude spectrum of the corresponding residual segment (B), the excitation segment obtained using the binary (voiced/unvoiced) excitation model (C), and the excitation segment obtained using the strongly voiced/weakly voiced/unvoiced hybrid excitation model (D).
- For a real-valued sequence x(i) of P points the DFT may be expressed as:
- the P n point DFT will yield a double-side spectrum.
- the magnitude of all the non DC components must be multiplied by a factor of 2.
- the total number of single side magnitude spectrum values, which are used in the reconstruction process, is equal to ⌊(P n + 1)/2⌋.
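- A small sketch of this single-side magnitude computation (the 1/P scaling convention is an assumption):

```python
import numpy as np

def single_side_magnitudes(segment):
    """P-point DFT of a real sequence gives a double-side spectrum;
    keep floor((P+1)/2) values and double the non-DC magnitudes."""
    p = len(segment)
    spec = np.fft.fft(segment)
    n_keep = (p + 1) // 2                  # floor((P+1)/2) values
    mags = np.abs(spec[:n_keep]) / p       # assumed scaling convention
    mags[1:] *= 2.0                        # double the non-DC components
    return mags
```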
- MSVSAR modified single value spectral amplitude representation
- MSVSAR is based on the observation that some of the speech spectrum resonance and anti-resonance information is also present at the residual magnitude spectrum (G.S. Kang and S.S. Everett, "Improvement of the Excitation Source in the Narrow-Band Linear Prediction Vocoder", IEEE Trans. Acoust., Speech and Signal Proc., Vol. ASSP-33, pp.377-386, 1985).
- LPC inverse filtering cannot produce a residual signal of absolutely flat magnitude spectrum, mainly due to: a) the "cascade representation" of formants by the LPC filter 1/A(z), which results in the magnitudes of the resonant peaks being dependent upon the pole locations of the 1/A(z) all-pole filter, and b) the LPC quantisation noise.
- the LPC residual signal is itself highly intelligible.
- Equation 32 defines a modified LPC synthesis filter with reduced feedback gain, whose frequency response consists of nearly equalised resonant peaks, the locations of which are very close to the LPC synthesis resonant locations. Furthermore, the value of the feedback gain G R is controlled by the performance of the LPC model (i.e. it is proportional to the normalised LPC prediction error). In addition Equation 34 ensures that the energy of the reproduced speech signal is equal to the energy of the original speech waveform. Robustness is increased by computing the speech RMS value over two pitch periods.
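- Since Equation 32 is not reproduced in this text, the following sketch models the reduced feedback gain as a scaling of each coefficient a i by G R raised to the power i (a bandwidth-expansion form), which yields nearly equalised resonant peaks close to the LPC resonant locations; treat it as a plausible reading rather than the patent's formula.

```python
import numpy as np
from scipy.signal import freqz

def msvsar_magnitudes(a, g_r, pitch, n_harmonics):
    """Sample a modified (reduced-feedback-gain) LPC synthesis filter at
    the harmonic frequencies defined by the pitch period."""
    a = np.asarray(a, dtype=float)             # [1, a_1, ..., a_p]
    a_mod = a * (g_r ** np.arange(len(a)))     # assumed reduced-feedback form
    w_harm = 2 * np.pi * np.arange(1, n_harmonics + 1) / pitch
    _, h = freqz([1.0], a_mod, worN=w_harm)    # sample at harmonics
    return np.abs(h)
```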
- the first of the alternative magnitude spectrum representation techniques is referred to below as the "Na amplitude system".
- the basic principle of this MG n j quantisation system is to represent accurately those MG n j values which correspond to the Na largest speech Short Term (ST) spectral envelope values.
- This arrangement for the quantization of g or m extends the dynamic range of the coder to not less than 25 dB.
- the remaining ⌊(P n +1)/2⌋ - Na - 1 MG j n values are set to a constant value A (where A is either "m" or "g").
- the block diagram of the adaptive ⁇ -law quantiser is shown in Figure 34.
- the second of the alternative magnitude spectrum representation techniques is referred to below as the "Variable Size Spectral Vector Quantisation (VS/SVQ)" system.
- Coding systems which employ the general synthesis formula of Equation (1) to recover speech encounter the problem of coding a variable length, pitch dependent spectral amplitude vector MG.
- the "Na-amplitudes" MG j n quantisation schemes described in Figure 33 avoid this problem by Vector Quantising the minimum expected number of spectral amplitudes and by setting the rest of the MG j n amplitudes to a fixed value.
- such a partially spectrally flat excitation model has limitations in providing high recovered speech quality.
- Figure 35 highlights the VS/SVQ process.
- Interpolation (in this case linear) is used on the S i vectors to yield S'' vectors of dimension vs n.
- Equation (38) is used to define M̂ n from S l .
- Amplitude vectors obtained from adjacent residual frames, exhibit significant redundancy, which can be removed by means of backward prediction. Prediction is performed on a harmonic basis i.e. the amplitude value of each harmonic MG j n is predicted from the amplitude value of the same harmonic in previous frames i.e. MG n -1 j .
- a fixed linear predictor b · M̂ n-1 may be incorporated in the VS/SVQ system, and the resulting DPCM structure is shown in Figure 36 (differential VS/SVQ, DVS/SVQ).
- error vectors are formed as the difference between the original spectral amplitudes MG j n and their predicted values M̃ j n , i.e. E j n = MG j n - M̃ j n , where the predicted spectral amplitudes are given as M̃ j n = b · M̂ j n-1 .
- the quantised spectral amplitudes M̂ j n are given as M̂ j n = M̃ j n + Ê j n , where Ê j n denotes the quantised error vector.
- the quantisation of the E j n , 1 ≤ j ≤ vs n , error vector incorporates Mean Removal and Gain Shape Quantisation techniques, using the hierarchical VQ structure of Figure 36.
- a weighted Mean Square Error is used in the VS/SVQ stage of the system.
- W n j is normalised so that:
- the pdf of the mean value of E n is very broad and, as a result, the mean value differs widely from one vector to another.
- This mean value can be regarded as statistically independent of the variation of the shape of the error vector E n and thus, can be quantised separately without paying a substantial penalty in compression efficiency.
- the objective of the Gain-Shape VQ process is to determine the gain value G and the shape vector S so as to minimise the distortion measure:
- a gain optimised VQ search method, similar to techniques used in CELP systems, is employed to find the optimum G and S.
- the shape codebook (CBS) of vectors S i is searched first to find an index I which maximises the quantity: where cbs is the number of codevectors in the CBS.
- the optimum gain value is defined as: and is Optimum Scalar Quantised to Ĝ.
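- The gain-optimised search is the familiar CELP-style criterion, sketched below with illustrative names:

```python
import numpy as np

def gain_shape_search(err, cbs):
    """Pick the shape S_i maximising (err.S_i)^2 / (S_i.S_i), then compute
    the optimum gain for the winning shape (to be scalar quantised)."""
    best_i, best_score = -1, -np.inf
    for i, s in enumerate(cbs):
        c = np.dot(err, s)
        score = c * c / np.dot(s, s)       # gain-optimised match criterion
        if score > best_score:
            best_i, best_score = i, score
    s = cbs[best_i]
    gain = np.dot(err, s) / np.dot(s, s)   # optimum gain for that shape
    return best_i, gain
```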
- a closed-loop joint predictor and VQ design process was employed to design the CBS codebook, the optimum scalar quantisers CBM and CBG of the mean M and gain G values respectively, and also to define the prediction coefficient b of Figure 36. In particular, the following steps take place in the design process.
- The performance of each quantizer (i.e. b k , CBM k , CBG k , CBS k ) has been evaluated using subjective tests and a LogSegSNR distortion measure, which was found to reflect the subjective performance of the system.
- Q i denotes the cluster of Erm k n error vectors which are quantised to the ith codebook shape vector of CBS k,v,m-1 ; cbs represents the total number of shape quantisation levels; J n represents the CBG k,v-1 gain codebook index which encodes the Erm k n error vector; and 1 ≤ j ≤ vs n .
- Process VII calculates the energy of the residual signal.
- the LPC analysis performed in Process II provides the prediction coefficients a i , 1 ≤ i ≤ p, and the reflection coefficients k i , 1 ≤ i ≤ p.
- the Voiced/Unvoiced classification performed in Process I provides the short term autocorrelation coefficient for zero delay of the speech signal (R0) for the frame under consideration.
- the Energy of the residual signal E n value is given as:
- Equation (50) gives a good approximation of the residual signal energy with low computational requirements.
- E n value can be given as:
- E n is then Scalar Quantised using an adaptive μ-law quantiser arrangement similar to the one depicted in Figure 34.
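- Equation (50) itself is not reproduced in this text; the classic LPC identity below (residual energy equals R0 times the product of the (1 - k i squared) terms) is a standard low-cost estimate of exactly this quantity and is offered as a plausible reading, not as the patent's formula.

```python
import numpy as np

def residual_energy(r0, reflection_coeffs):
    """Standard LPC prediction-error energy estimate from R0 and the
    reflection coefficients k_i, 1 <= i <= p."""
    k = np.asarray(reflection_coeffs, dtype=float)
    return r0 * np.prod(1.0 - k ** 2)
```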
Claims (46)
- Speech coding system which operates on a frame basis and in which information is transmitted which represents each frame as either voiced or unvoiced and which, for each voiced frame, represents that frame by a pitch period value, spectral information of quantised magnitudes with an associated strongly/weakly voiced harmonic classification, and LPC filter coefficients, the received pitch period value and spectral information magnitudes being used to generate residual signals in the receiver which are applied to LPC speech synthesis filters whose characteristics are determined by the transmitted filter coefficients, each residual signal being synthesised in accordance with a sinusoidal mixed excitation synthesis process, and a recovered speech signal being derived from the combination of the outputs of the LPC synthesis filters.
- System according to claim 1, wherein a speech signal is divided into a series of frames and each frame is converted into a coded signal including a voiced/unvoiced classification and a pitch estimate, a low-pass filtered speech segment centred on a reference sample being defined in each frame, a correlation value being calculated for each of a series of candidate pitch estimates as the highest of a number of cross-correlation values derived from variable-length speech segments centred on the reference sample, the correlation values being used to form a correlation function defining peaks, and the locations of the peaks being determined and used to define a pitch estimate.
- System according to claim 2, wherein the pitch estimate is defined using an iterative process.
- System according to claim 2 or 3, wherein a single reference sample may be used which is centred with respect to the frame in question.
- System according to claim 2 or 3, wherein several pitch estimates are derived for each frame using different reference samples, the several pitch estimates being combined to define a combined pitch estimate for the frame.
- System according to any one of claims 2 to 5, wherein the pitch estimate is modified with reference to a voiced/unvoiced status and/or the pitch estimates of adjacent frames in order to define a final pitch estimate.
- System according to any one of claims 2 to 6, wherein the correlation function is thresholded using a threshold value, the remaining peaks being rejected if they lie adjacent to larger peaks.
- System according to claim 7, wherein peaks are selected which are larger than either of the two adjacent peaks, and peaks are rejected if they are smaller than a following peak by more than a predetermined factor.
- System according to any one of claims 2 to 8, wherein the pitch estimation procedure is based on a least squares error algorithm.
- System according to claim 9, wherein the pitch estimation algorithm defines the pitch value as a number whose multiples best fit the locations of the correlation function peaks (a least-squares sketch of this step is given after the claims).
- System according to any one of claims 2 to 10, wherein possible pitch values are limited to non-consecutive integers, the increment between two successive numbers being proportional to a constant multiplied by the lower of the two numbers.
- System according to claim 1, wherein a speech signal is divided into a series of frames and each frame is converted into a coded signal including spectral information of pitch segment magnitudes, a voiced/unvoiced classification and a mixed-voiced classification which classifies harmonics in the magnitude spectrum of voiced frames as strongly voiced or weakly voiced, a series of samples centred on the middle of the frame being windowed to form a data array which is Fourier transformed to produce a magnitude spectrum, a threshold being calculated and used to threshold the magnitude spectrum, the thresholded data being searched to define peaks, the locations of peaks being determined, constraints being imposed to define dominant peaks, and harmonics not associated with a dominant peak being classified as weakly voiced.
- System according to claim 12, wherein peaks are located using a second-order polynomial.
- System according to claim 12 or 13, wherein the samples are windowed using a Hamming window.
- System according to claim 12, 13 or 14, wherein the threshold is calculated by identifying the largest and smallest magnitude spectrum values and defining the threshold as a constant multiplied by the difference between the largest and smallest values.
- System according to any one of claims 12 to 15, wherein peaks are defined as those values which are greater than two adjacent values, a peak being disregarded if adjacent peaks have a similar magnitude or if spectral magnitudes of greater magnitude are present in the same region.
- System according to any one of claims 12 to 16, wherein a harmonic is regarded as not being associated with a dominant peak if the difference between two adjacent peaks is greater than a predetermined threshold.
- System according to any one of claims 12 to 17, wherein the spectrum is divided into bands of fixed width and a strongly/weakly voiced classification is assigned to each band.
- System according to any one of claims 12 to 18, wherein the frequency range is divided into two or more bands of variable width, adjacent bands being separated at a frequency selected with reference to the strongly/weakly voiced classification of harmonics.
- System according to claim 18 or 19, wherein the lowest frequency band is regarded as strongly voiced, whereas the highest frequency band is regarded as weakly voiced.
- System according to claim 20, wherein, if a current frame is voiced and the following frame is unvoiced, further bands in the current frame are automatically classified as weakly voiced.
- System according to claim 20 or 21, wherein the strongly/weakly voiced classification is determined using a majority decision rule applied to the strongly/weakly voiced classification of those harmonics which fall within the band in question.
- System according to claim 22, wherein, if there is no majority, alternate bands are assigned strongly voiced and weakly voiced classifications alternately.
- Speech synthesis system in which a speech signal is divided into a series of frames, each frame being defined as voiced or unvoiced, each frame is converted into a coded signal including a pitch period value, a frame voiced/unvoiced classification and, for each voiced frame, a mixed-voiced spectral band classification which classifies harmonics within spectral bands as either strongly or weakly voiced, and the speech signal is reconstructed by generating an excitation signal in respect of each frame and applying the excitation signal to a filter, an excitation signal being generated for each weakly voiced spectral band which has a random component in the form of a function dependent on the respective pitch period value.
- System according to claim 24, wherein the spectrum is divided into bands and a strongly/weakly voiced classification is assigned to each band.
- System according to claim 24 or 25, wherein the random component is introduced by reducing the amplitude of harmonic oscillators associated with the weakly voiced classification, perturbing the oscillator frequencies so that the frequency is no longer a multiple of the fundamental frequency, and then adding further random signals.
- System according to claim 26, wherein the phase of the oscillators is randomised.
- System according to claim 1, wherein a speech signal is divided into a series of frames and each voiced frame is coded into a coded signal including a pitch period value, LPC coefficients and spectral magnitude information of pitch segments, the spectral magnitude information being quantised by sampling the LPC short-term magnitude spectrum at harmonic frequencies, determining the locations of the largest spectral samples in order to identify which of the magnitudes are relatively more important for accurate quantisation, and selecting and vector quantising the magnitudes so identified.
- System according to claim 28, wherein a pitch segment of Pn LPC residual samples is derived, Pn being the pitch period value of the n-th frame, the pitch segment is DFT transformed, the mean value of the resulting spectral magnitudes is calculated, the mean value is quantised and used as a normalisation factor for the selected magnitudes, and the resulting normalised amplitudes are quantised.
- System according to claim 28, wherein the RMS value of the pitch segment is calculated, the RMS value is quantised and used as a normalisation factor for the selected magnitudes, and the resulting normalised amplitudes are quantised.
- System according to any one of claims 28 to 30, wherein, in the receiver, the selected magnitudes are recovered and each of the other magnitude values is reproduced as a constant value.
- System according to claim 1, wherein a variable-size input vector of coefficients, which are to be transmitted to a receiver for the reconstruction of a speech signal, is vector quantised using a codebook defined by fixed-size vectors, the fixed-size codebook vectors being derived from variable-size training vectors using an interpolation technique which is an integral part of the codebook generation process, codebook vectors being compared with the variable-size input vectors using the interpolation process, and an index associated with the codebook entry exhibiting the smallest difference in the comparison being transmitted, the index being used to address a further codebook in the receiver so as to derive an associated fixed-size codebook vector, and the interpolation process being used to recover an approximation of the variable-size input vector from the derived fixed-size codebook vector.
- System according to claim 32, wherein the interpolation process is linear and, for an input vector of a given dimension, the interpolation process is applied to produce from the codebook vectors a set of vectors of that given dimension, a distortion measure is then derived to compare the interpolated set of vectors with the input vector, and the codebook vector yielding the smallest distortion is selected.
- System according to claim 33, wherein the dimension of the vectors is reduced by considering only the harmonic amplitudes within the input bandwidth range.
- System according to claim 34, wherein the remaining amplitudes are set to a constant value.
- System according to claim 35, wherein the constant value is equal to the mean value of the quantised amplitudes.
- System according to any one of claims 32 to 36, wherein redundancy between amplitude vectors derived from adjacent residual frames is removed by means of backward prediction.
- System according to claim 37, wherein the backward prediction is performed on a harmonic basis, so that the amplitude value of each harmonic of a frame is predicted from the amplitude value of the same harmonic in the previous frame or frames.
- System according to claim 1, wherein a speech signal is divided into a series of frames, each frame is converted into a coded signal including an estimated pitch period, an estimate of the energy of a speech segment whose duration is a function of the estimated pitch period, and LPC filter coefficients defining an LPC spectral envelope, and a speech signal with a power related to the power of the input speech signal is reconstructed by generating an excitation signal using spectral amplitudes defined from a modified LPC spectral envelope sampled at harmonic frequencies defined by the pitch period.
- System according to claim 39, wherein the magnitude values are derived by spectrally sampling a modified LPC synthesis filter characteristic at the harmonic locations related to the pitch period.
- System according to claim 40, wherein the modified LPC synthesis filter has a reduced feedback gain and a frequency response consisting of equalised resonance peaks whose locations are close to the LPC synthesis resonance locations.
- System according to claim 41, wherein the value of the feedback gain is controlled by the performance of the LPC model so that it is related to the normalised LPC prediction error.
- System according to any one of claims 39 to 42, wherein the energy of the reproduced speech signal is equal to the energy of the original speech waveform.
- System according to claim 1, wherein a speech signal is divided into a series of frames, each frame is converted into a coded signal including LPC filter coefficients and at least one parameter associated with a pitch segment magnitude, and the speech signal is reconstructed by generating two excitation signals in respect of each frame, each pair of excitation signals comprising a first excitation signal generated on the basis of the pitch segment magnitude parameter or parameters of one frame and a second excitation signal generated on the basis of the pitch segment magnitude parameter or parameters of a second frame which follows and is adjacent to the one frame, the first excitation signal being applied to a first LPC filter whose characteristics are determined by the LPC filter coefficients of the one frame, the second excitation signal being applied to a second LPC filter whose characteristics are determined by the LPC filter coefficients of the second frame, and the outputs of the first and second LPC filters being weighted and combined to produce one frame of a synthesised speech signal.
- System according to claim 44, wherein the first and second excitation signals have the same phase function and different phase contributions from the two LPC filters.
- System according to claim 45, wherein the outputs of the first and second LPC filters are weighted with a half window function, so that the magnitude of the output of the first filter decreases with time and the magnitude of the output of the second filter increases with time.
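As a companion to claims 9 and 10, the least-squares fit of the pitch period to the correlation peak locations can be sketched as follows. The sketch assumes that the i-th retained peak lies near the i-th multiple of the pitch period, which yields the closed-form solution P = Σ(i·loc_i)/Σ(i²); the patent's actual candidate grid and handling of missing or spurious peaks are not reproduced.

```python
import numpy as np

def pitch_from_peaks(peak_locs):
    """Least-squares pitch period: assume the i-th retained correlation
    peak lies near the i-th multiple of the period P and minimise
    sum_i (loc_i - i*P)^2, giving P = sum(i*loc_i) / sum(i^2)."""
    locs = np.asarray(peak_locs, dtype=float)
    i = np.arange(1, len(locs) + 1)
    return float(np.sum(i * locs) / np.sum(i * i))

# peaks near multiples of 70 samples yield an estimate close to 70
print(pitch_from_peaks([69, 141, 209, 281]))   # ~70.07
```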
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9614209 | 1996-07-05 | ||
GBGB9614209.6A GB9614209D0 (en) | 1996-07-05 | 1996-07-05 | Speech synthesis system |
US2181596P | 1996-07-16 | 1996-07-16 | |
US21815P | 1996-07-16 | ||
PCT/GB1997/001831 WO1998001848A1 (en) | 1996-07-05 | 1997-07-07 | Speech synthesis system |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0950238A1 EP0950238A1 (de) | 1999-10-20 |
EP0950238B1 true EP0950238B1 (de) | 2003-09-10 |
Family
ID=26309651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP97930643A Expired - Lifetime EP0950238B1 (de) | 1996-07-05 | 1997-07-07 | Sprachkodier- und dekodiersystem |
Country Status (7)
Country | Link |
---|---|
EP (1) | EP0950238B1 (de) |
JP (1) | JP2000514207A (de) |
AT (1) | ATE249672T1 (de) |
AU (1) | AU3452397A (de) |
CA (1) | CA2259374A1 (de) |
DE (1) | DE69724819D1 (de) |
WO (1) | WO1998001848A1 (de) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2784218B1 (fr) * | 1998-10-06 | 2000-12-08 | Thomson Csf | Procede de codage de la parole a bas debit |
GB2357683A (en) * | 1999-12-24 | 2001-06-27 | Nokia Mobile Phones Ltd | Voiced/unvoiced determination for speech coding |
GB2398981B (en) * | 2003-02-27 | 2005-09-14 | Motorola Inc | Speech communication unit and method for synthesising speech therein |
CN114519996B (zh) * | 2022-04-20 | 2022-07-08 | 北京远鉴信息技术有限公司 | 一种语音合成类型的确定方法、装置、设备以及存储介质 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2670313A1 (fr) * | 1990-12-11 | 1992-06-12 | Thomson Csf | Procede et dispositif pour l'evaluation de la periodicite et du voisement du signal de parole dans les vocodeurs a tres bas debit. |
JP3093113B2 (ja) * | 1994-09-21 | 2000-10-03 | 日本アイ・ビー・エム株式会社 | 音声合成方法及びシステム |
US5978764A (en) * | 1995-03-07 | 1999-11-02 | British Telecommunications Public Limited Company | Speech synthesis |
1997
- 1997-07-07 DE DE69724819T patent/DE69724819D1/de not_active Expired - Lifetime
- 1997-07-07 CA CA002259374A patent/CA2259374A1/en not_active Abandoned
- 1997-07-07 EP EP97930643A patent/EP0950238B1/de not_active Expired - Lifetime
- 1997-07-07 AU AU34523/97A patent/AU3452397A/en not_active Abandoned
- 1997-07-07 AT AT97930643T patent/ATE249672T1/de not_active IP Right Cessation
- 1997-07-07 WO PCT/GB1997/001831 patent/WO1998001848A1/en active IP Right Grant
- 1997-07-07 JP JP10504943A patent/JP2000514207A/ja active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102004007184B3 (de) * | 2004-02-13 | 2005-09-22 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Verfahren und Vorrichtung zum Quantisieren eines Informationssignals |
US7464027B2 (en) | 2004-02-13 | 2008-12-09 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Method and device for quantizing an information signal |
US7716042B2 (en) | 2004-02-13 | 2010-05-11 | Gerald Schuller | Audio coding |
US7729903B2 (en) | 2004-02-13 | 2010-06-01 | Gerald Schuller | Audio coding |
Also Published As
Publication number | Publication date |
---|---|
AU3452397A (en) | 1998-02-02 |
JP2000514207A (ja) | 2000-10-24 |
EP0950238A1 (de) | 1999-10-20 |
WO1998001848A1 (en) | 1998-01-15 |
ATE249672T1 (de) | 2003-09-15 |
DE69724819D1 (de) | 2003-10-16 |
CA2259374A1 (en) | 1998-01-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1576585B1 (de) | Verfahren und vorrichtung zur robusten prädiktiven vektorquantisierung von parametern der linearen prädiktion in variabler bitraten-kodierung | |
RU2389085C2 (ru) | Способы и устройства для введения низкочастотных предыскажений в ходе сжатия звука на основе acelp/tcx | |
US6871176B2 (en) | Phase excited linear prediction encoder | |
US20040002856A1 (en) | Multi-rate frequency domain interpolative speech CODEC system | |
EP0878790A1 (de) | Sprachkodiersystem und Verfahren | |
EP1103955A2 (de) | Hybrider Harmonisch-Transform-Sprachkodierer | |
EP1989703A1 (de) | Vorrichtung und verfahren zum codieren und decodieren eines signals | |
US20050065788A1 (en) | Hybrid speech coding and system | |
US7617096B2 (en) | Robust quantization and inverse quantization using illegal space | |
EP0950238B1 (de) | Sprachkodier- und dekodiersystem | |
US7647223B2 (en) | Robust composite quantization with sub-quantizers and inverse sub-quantizers using illegal space | |
Özaydın et al. | Matrix quantization and mixed excitation based linear predictive speech coding at very low bit rates | |
US20050065782A1 (en) | Hybrid speech coding and system | |
Ju et al. | Complexity reduction in Karhunen-Loeve transform based speech coder for voice transmission | |
Jamrozik et al. | Modified multiband excitation model at 2400 bps | |
US20050065787A1 (en) | Hybrid speech coding and system | |
US20050065786A1 (en) | Hybrid speech coding and system | |
Ahmadi et al. | New techniques for sinusoidal coding of speech at 2400 bps | |
Villette | Sinusoidal speech coding for low and very low bit rate applications | |
So et al. | Multi-frame GMM-based block quantisation of line spectral frequencies | |
EP1293966B1 (de) | Quantisierung mit Subquantisierern unter Verwendung von ungültigen Koden | |
Papanastasiou | LPC-Based Pitch Synchronous Interpolation Speech Coding | |
Balint | Excitation modeling in CELP speech coders [articol] | |
Zhang | Speech transform coding using ranked vector quantization | |
EP1212750A1 (de) | Multimodaler vselp sprachkodierer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 19990205 |
|
AK | Designated contracting states |
Kind code of ref document: A1
Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
|
17Q | First examination report despatched |
Effective date: 20000518 |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
RIC1 | Information provided on ipc code assigned before grant |
Free format text: 7G 10L 11/04 A, 7G 10L 11/06 B |
|
RTI1 | Title (correction) |
Free format text: SPEECH SYNTHESIS SYSTEM |
|
RIC1 | Information provided on ipc code assigned before grant |
Free format text: 7G 10L 19/12 A |
|
RTI1 | Title (correction) |
Free format text: SPEECH CODING AND DECODING SYSTEM |
|
GRAH | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOS IGRA |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1
Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country codes: NL, LI, FR, FI, CH, BE, AT
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20030910
Ref country code: IT
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.
Effective date: 20030910 |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REF | Corresponds to: |
Ref document number: 69724819
Country of ref document: DE
Date of ref document: 20031016
Kind code of ref document: P |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country codes: SE, GR, DK
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20031210 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20031211 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20031217 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20031221 |
|
NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | ||
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country codes: LU, IE
Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES
Effective date: 20040707 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB
Payment date: 20040707
Year of fee payment: 8 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20040731 |
|
26N | No opposition filed |
Effective date: 20040614 |
|
EN | Fr: translation not filed | ||
REG | Reference to a national code |
Ref country code: GB Ref legal event code: 732E |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050707 |
|
REG | Reference to a national code |
Ref country code: HK
Ref legal event code: WD
Ref document number: 1019805
Country of ref document: HK |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20050707 |