GB2130852A - Speech signal reproducing systems - Google Patents
- Publication number
- GB2130852A
- Authority
- GB
- United Kingdom
- Prior art keywords
- pitch
- commencement
- interval
- speech signal
- updated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/09—Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
In an L.P.C. speech synthesizer arrangement in which a reproduced-speech pitch period may be longer than the frame period, the filter coefficients are updated either pitch-synchronously or at least a predetermined time after the commencement of a pitch period, to avoid "thumping" in the reproduced speech.
Description
SPECIFICATION
Speech signal reproducing systems
The present invention relates to speech signal reproducing systems and to speech signal synthesizers for such systems.
Speech signal reproducing systems are presently being developed in which electric signals representing a speech utterance are analysed, in respect of each of a succession of time intervals or frames of, say, 20 milliseconds duration, to derive successive sets of parameters representing a model of the vocal tract of the speaker producing that utterance.
The parameters in respect of each frame may comprise for example the pitch period, a voiced/unvoiced apportionment or mixing parameter, a measure of the total energy of the speech signal during the frame, and a set of twelve coefficients, all these parameters being conveyed by a total of, say, eighty binary digits.
The speech signals may then be reproduced with reasonable fidelity by means of a synthesizer arrangement comprising a recursive filter excited by a series of pulses separated by the pitch period for voiced sounds and by pseudo-random noise for unvoiced sounds, or more generally by a mix of the two, the set of twelve coefficients being used to determine, say, the reflection coefficients of the recursive filter. The synthesizer arrangement is commonly digital, the original and reproduced speech signals being in the form of linear or compression-law coded PCM speech.
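The recursive filter whose reflection coefficients are set from the twelve transmitted coefficients can be pictured as an all-pole lattice. The sketch below is illustrative only: it assumes one common lattice sign convention, and the function name `lattice_synthesize` and its `state` argument are hypothetical. Returning the backward-path memory lets the filter state carry over when the coefficients change, which is the situation the later description is concerned with.

```python
def lattice_synthesize(excitation, k, state=None):
    """All-pole lattice synthesis filter driven by reflection coefficients k[0..p-1].

    `state` holds the backward-path values b_0..b_{p-1} from the previous sample,
    so the filter memory can persist across calls (hypothetical interface).
    """
    p = len(k)
    b = list(state) if state is not None else [0.0] * p
    out = []
    for e in excitation:
        f = e                       # forward value entering stage p
        new_b = [0.0] * (p + 1)
        for i in range(p, 0, -1):   # stages p, p-1, ..., 1
            f = f - k[i - 1] * b[i - 1]          # f_{i-1}[n]
            new_b[i] = k[i - 1] * f + b[i - 1]   # b_i[n]
        new_b[0] = f                             # b_0[n] = f_0[n]
        b = new_b[:p]               # keep b_0..b_{p-1} for the next sample
        out.append(f)               # synthesized output sample
    return out, b


# A single stage with k = 0.5 reduces to y[n] = e[n] - 0.5 * y[n-1].
impulse_response, _ = lattice_synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
```

Splitting the excitation across two calls and passing the returned state gives the same output as one continuous call, so the filter memory is preserved across a call boundary.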
This "parametric" form of coding, known as linear prediction coding, enables speech signals to be represented, with acceptable quality on reproduction, by a series of binary digits generated at bit rates of 4 kbit/s or less. Where better quality reproduction is required and higher bit rates are tolerable, the original speech signals may be analysed in overlapping 20 ms blocks to derive updated parameters, say, every 10 ms.
Since the pitch period of a speech signal may typically vary between 2 and 17 ms, a timing problem exists at the synthesizer, particularly when the parameters are made available for updating every 10 ms, because a single pitch period may then be longer than a frame.
According to one aspect of the present invention in a speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, the pulse coded signals utilised by the pitch reproducing means during a time interval are updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means during that interval are updated either contemporaneously with the updating of the pitch-representing signals or after a predetermined delay time from the commencement of said time interval, whichever is sooner.
According to another aspect of the present invention in a speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, there are provided means to generate a signal to mark a predetermined delay time after the commencement of each pitch period of the reproduced signal, the pulse coded signals utilised by the pitch reproducing means during a time interval being updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means being updated either at the commencement of the respective interval in the absence of a delay time marking signal, contemporaneously with the updating of the pitch-representing signals if said updating occurs in the presence of said delay time marking signal, or at the termination of said delay time marking signal if no pitch updating has yet taken place in that interval.
The pulse coded signals which represent the pitch and vocal tract characterisation of the original speech signal and which are made available at the commencement of each time interval may themselves be updated at different rates. For example the pitch-representing signals may be updated at the commencement of each time interval while the vocal tract characterisation signals may be updated at the commencement of alternate time intervals. The time intervals may be of 10 msec. duration.
A speech signal reproducing system in accordance with the present invention will now be described with reference to the accompanying drawings, of which Figures 1A and 1B and Figures 2A and 2B show timing diagrams illustrating respective modes of operation of the system.
In the established method of analysing speech signals using linear prediction coding (LPC), each 20 ms segment of speech signal, represented, say, by 160 PCM coded amplitude samples, is analysed to derive values for fifteen parameters represented by a total of some 80 binary digits. These LPC parameter values can be conveyed to a synthesizer arrangement as a stream of binary digits at a bit rate of 4 kbit/s.
For better quality speech reproduction the analysis can be carried out on "overlapping" 20 ms segments, with all parameters being updated and transmitted afresh every 10 ms, giving a bit rate of 8 kbit/s.
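The figures quoted above follow from simple arithmetic: 80 bits per parameter set sent every 20 ms gives 4 kbit/s, the same set sent every 10 ms gives 8 kbit/s, and at the 8 kHz sampling rate a 20 ms segment holds 160 samples. A small sketch of that arithmetic (helper names are illustrative only):

```python
SAMPLE_RATE_HZ = 8000  # 8 kHz sampling, as in the description


def bit_rate(bits_per_set, update_interval_ms):
    """Bits per second when one parameter set is sent every update_interval_ms."""
    return bits_per_set * 1000 / update_interval_ms


def samples_in(duration_ms):
    """Number of 8 kHz samples spanned by duration_ms (integer milliseconds)."""
    return SAMPLE_RATE_HZ * duration_ms // 1000


rate_20ms = bit_rate(80, 20)   # non-overlapping 20 ms analysis
rate_10ms = bit_rate(80, 10)   # overlapping analysis, updates every 10 ms
```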
At the synthesizer, twelve of each set of fifteen parameter values are applied as coefficient values in a recursive filter. One of the remaining three values, representing the pitch period in respect of a voiced sound, is arranged to give a unit positive pulse excitation to the input of the filter at the beginning of each pitch period and small negative pulses at the subsequent 8 kHz sample intervals, so that the excitation has zero mean. Thus at the beginning of a pitch period the excitation values in the filter are large, but they subsequently tend to decrease. Moreover, when the twelve coefficients are updated, the excitation values within the filter do not in general correspond with the new filter coefficients, since these values have been generated in accordance with the preceding coefficients, and a number of sample intervals have to pass before the change is completed.
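The zero-mean voiced excitation and the decay of the values circulating in the filter can be illustrated with a short sketch. It assumes the negative pulses are all equal, each -1/(P-1) for a pitch period of P samples (the text only says they are small and make the mean zero), and uses a one-pole recursive filter with coefficient 0.9 as a stand-in for the twelve-coefficient filter; both choices and the function names are hypothetical.

```python
def zero_mean_pitch_excitation(period):
    """One pitch period of voiced excitation: +1 at the first sample, then equal
    small negative pulses so that the period sums to zero (assumed split)."""
    if period < 2:
        raise ValueError("period must be at least 2 samples")
    neg = -1.0 / (period - 1)
    return [1.0] + [neg] * (period - 1)


def all_pole(excitation, a):
    """Direct-form recursive filter: y[n] = e[n] + sum_j a[j] * y[n-1-j]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for j, aj in enumerate(a):
            if n - 1 - j >= 0:
                acc += aj * y[n - 1 - j]
        y.append(acc)
    return y


exc = zero_mean_pitch_excitation(80)   # an 80-sample (10 ms) pitch period at 8 kHz
y = all_pole(exc, [0.9])               # stand-in single-pole filter
late_peak = max(abs(v) for v in y[40:])  # filter activity in the second half of the period
```

The output is largest at the pitch pulse and much smaller later in the period, which is why a coefficient change late in the period is less audible than one made just after the pulse.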
If the excitation values within the filter are large when the coefficients are updated, the changeover will be audible in the resulting synthesized speech; for this reason the updating is best carried out when the excitation values are small, preferably at the end of each pitch period. This method of updating is referred to as pitch-synchronous updating. Where the pitch period exceeds the frame interval, as in the case of a low-pitched voice with parameters being updated every 10 ms, a whole set of parameter values has to be discarded whenever a pitch period extends from one frame interval into the next but one, and the quality of the reproduced speech suffers accordingly.
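The loss described above can be reproduced with a small model. Assume (hypothetically) that pitch boundaries are given as sample indices at 8 kHz, that frame k spans samples [k·L, (k+1)·L), and that purely pitch-synchronous updating loads, at each pitch boundary, the newest parameter set received so far. A set is then used only if some pitch boundary falls inside its own frame; otherwise it is superseded before it is ever loaded.

```python
def discarded_frames(pitch_boundaries, frame_len, n_frames):
    """Frames whose parameter set is never loaded under pure pitch-synchronous
    updating (hypothetical model: each boundary loads set floor(b / frame_len))."""
    used = {b // frame_len for b in pitch_boundaries}
    return [k for k in range(n_frames) if k not in used]


# 17 ms pitch period (136 samples) against 10 ms frames (80 samples).
lost = discarded_frames([0, 136, 272, 408, 544], 80, 7)
```

With a 17 ms pitch period and 10 ms frames, the sets for frames 2 and 4 are never applied in this example.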
To overcome this loss of quality, in the present arrangement the twelve coefficients are updated within each frame interval either pitch-synchronously, if the pitch period is less than a predetermined time, or at least that predetermined time after the commencement of a pitch period, if that pitch period is greater than the predetermined time. In the latter case the twelve coefficients may be updated either at the commencement of a frame interval or at the predetermined time after the commencement of a frame interval. The predetermined time may be, for example, of the order of 5 ms.
The three remaining parameters, which determine the pitch period, gain and voiced/unvoiced mix, are updated pitch-synchronously at the commencement of the first pitch period in a frame interval.
Referring now to Figures 1A and 1B, in a first method of carrying out this conditional updating a counter (not shown) is set at the commencement of each frame to count 8 kHz sample intervals up to a total count x. If a pitch boundary occurs within this count period, the twelve coefficients are updated pitch-synchronously, as shown in Figure 1A, where PA and CA represent the pitch and coefficient values in respect of frame A, and so on.
If a pitch boundary does not occur within the count period x, as shown in respect of frames B, C and E of Figure 1B, the updating of the twelve coefficients takes place at the end of the count period.
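The first method amounts to updating the coefficients at the earlier of the first pitch boundary in the frame and the end of the count x, which matches the "whichever is sooner" wording of Claim 1. The sketch below is a hypothetical model, not the patent's circuit: pitch boundaries are sorted sample indices at 8 kHz, frames are `frame_len` samples long, and the pitch parameters are taken to be loaded at the first pitch boundary inside the frame (`None` if there is none).

```python
import bisect


def method1_schedule(pitch_boundaries, frame_len, n_frames, x):
    """Per frame, return (pitch_update, coefficient_update) sample times.

    Counter model: started at each frame boundary, expires after x samples.
    Coefficients update at the first pitch boundary in the frame or at the
    end of the count, whichever is sooner (hypothetical helper)."""
    schedule = []
    for k in range(n_frames):
        start = k * frame_len
        i = bisect.bisect_left(pitch_boundaries, start)  # first boundary >= start
        in_frame = i < len(pitch_boundaries) and pitch_boundaries[i] < start + frame_len
        p = pitch_boundaries[i] if in_frame else None
        coef = start + x if p is None else min(p, start + x)
        schedule.append((p, coef))
    return schedule


# 17 ms pitch (136 samples), 10 ms frames (80 samples), x = 40 samples (5 ms).
sched = method1_schedule([0, 136, 272, 408, 544], 80, 7, 40)
```

In this example every frame receives its own coefficient update, even the frames that contain no pitch boundary at all.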
In an alternative method, indicated in Figures 2A and 2B, a counter (not shown) is set at each pitch boundary to count 8 kHz sample intervals up to a total count y. If a frame boundary occurs during this count, the twelve coefficients are updated when the counter is reset, either when the next pitch boundary occurs, as shown in Figure 2A, or when the count y is reached, as shown in Figure 2B, frames B and C. If the counter is not running at a frame boundary, as in Figure 2B, frames D and E, the twelve coefficients are updated at the frame boundary, that is, frame-synchronously.
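The alternative method can be modelled the same way, with the counter now started at pitch boundaries. This is a sketch under stated assumptions: pitch boundaries are sorted sample indices at 8 kHz, the counter started at boundary p is taken to be running while the time is less than p + y, and a frame boundary that coincides exactly with a pitch boundary is treated as a pitch-synchronous update (the description does not spell out that case). The function name is hypothetical.

```python
import bisect


def method2_coef_updates(pitch_boundaries, frame_len, n_frames, y):
    """Coefficient update time for each frame under the pitch-started counter.

    If the counter is running at a frame boundary, update when it is reset:
    at the next pitch boundary or after y samples, whichever comes first.
    Otherwise update at the frame boundary itself (hypothetical helper)."""
    updates = []
    for k in range(n_frames):
        f = k * frame_len
        i = bisect.bisect_left(pitch_boundaries, f)       # first boundary >= f
        if i < len(pitch_boundaries) and pitch_boundaries[i] == f:
            updates.append(f)                             # coincident boundaries
            continue
        if i > 0 and f - pitch_boundaries[i - 1] < y:     # counter still running
            p = pitch_boundaries[i - 1]
            q = pitch_boundaries[i] if i < len(pitch_boundaries) else None
            updates.append(p + y if q is None else min(q, p + y))
        else:
            updates.append(f)                             # frame-synchronous
    return updates


long_pitch = method2_coef_updates([0, 136, 272, 408, 544], 80, 7, 40)
short_pitch = method2_coef_updates(list(range(0, 241, 30)), 80, 3, 40)
```

With a 30-sample pitch period the updates fall on pitch boundaries, while with a 136-sample period most updates are frame-synchronous or occur when the count y expires.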
Both of these methods ensure that each set of twelve coefficients is actually used for most of the frame period for which it is intended, the updating being pitch-synchronous where possible and, where this is not possible, taking place when the excitation is below its peak level. In general the counts x and y are expected to be the same, of the order of 40, that is, equivalent to a time period of 5 ms at the 8 kHz sampling rate.
Where the channel capacity between the analyser and synthesizer is limited, it has been found that, compared with transmitting all fifteen parameters every 20 ms, a significant improvement in reproduced speech quality can be obtained by transmitting the pitch, gain and voiced/unvoiced mix parameters every 10 ms while transmitting the twelve coefficients only every 20 ms. The conditional updating methods described above ensure that the synthesizer makes the best possible use of the coefficient values once they are received.
Speech synthesizers to which this updating technique may be applied are described, for example, by Atal and Hanauer in The Journal of the Acoustical Society of America, Volume 50, 1971, pages 637 to 655.
Claims (7)
1. A speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, wherein the pulse coded signals utilised by the pitch reproducing means during a time interval are updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means during that interval are updated either contemporaneously with the updating of the pitch-representing signals or after a predetermined delay time from the commencement of said time interval, whichever is sooner.
2. A speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, wherein there are provided means to generate a signal to mark a predetermined delay time after the commencement of each pitch period of the reproduced signal, the pulse coded signals utilised by the pitch reproducing means during a time interval being updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means being updated either at the commencement of the respective interval in the absence of a delay time marking signal, contemporaneously with the updating of the pitch-representing signals if said updating occurs in the presence of said delay time marking signal, or at the termination of said delay time marking signal if no pitch updating has yet taken place in that interval.
3. A speech signal reproducing system in accordance with Claim 2 wherein the pulse coded signals which represent the pitch and vocal tract characterisation of the original speech signal and which are made available at the commencement of each time interval are themselves updated at different rates.
4. A speech signal reproducing system in accordance with Claim 3 wherein the pitch-representing signals are updated at the commencement of each time interval while the vocal tract characterisation signals may be updated at the commencement of alternate time intervals.
5. A speech signal reproducing system in accordance with any preceding claim wherein the time intervals are of 10 milliseconds duration.
6. A speech signal reproducing system substantially as hereinbefore described with reference to Figures 1A and 1B of the accompanying drawings.
7. A speech signal reproducing system substantially as hereinbefore described with reference to Figures 2A and 2B of the accompanying drawings.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB8330820A GB2130852B (en) | 1982-11-19 | 1983-11-18 | Speech signal reproducing systems |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB8233068 | 1982-11-19 | ||
GB8330820A GB2130852B (en) | 1982-11-19 | 1983-11-18 | Speech signal reproducing systems |
Publications (3)
Publication Number | Publication Date |
---|---|
GB8330820D0 (en) | 1983-12-29 |
GB2130852A (en) | 1984-06-06 |
GB2130852B (en) | 1986-03-12 |
Family
ID=26284448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB8330820A Expired GB2130852B (en) | 1982-11-19 | 1983-11-18 | Speech signal reproducing systems |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2130852B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0016427A2 (en) * | 1979-03-15 | 1980-10-01 | CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. | Multi-channel digital speech synthesizer |
GB2093668A (en) * | 1981-01-29 | 1982-09-02 | Seiko Instr & Electronics | A speech synthesizer |
GB2097636A (en) * | 1981-04-28 | 1982-11-03 | Seiko Instr & Electronics | Speech synthesizer |
1983
- 1983-11-18: GB application GB8330820A filed (GB2130852B); status: not active, expired
Also Published As
Publication number | Publication date |
---|---|
GB2130852B (en) | 1986-03-12 |
GB8330820D0 (en) | 1983-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4709390A (en) | Speech message code modifying arrangement | |
CA1181854A (en) | Digital speech coder | |
US5305421A (en) | Low bit rate speech coding system and compression | |
CA1184657A (en) | Digital speech processing using linear prediction process | |
WO1985004276A1 (en) | Multipulse lpc speech processing arrangement | |
TW326070B (en) | The estimation method of the impulse gain for coding vocoder | |
EP0731348B1 (en) | Voice storage and retrieval system | |
US7869993B2 (en) | Method and a device for source coding | |
EP0634041B1 (en) | Method and apparatus for encoding/decoding of background sounds | |
JPH0636159B2 (en) | Pitch detector | |
KR100291584B1 (en) | Speech waveform compressing method by similarity of fundamental frequency/first formant frequency ratio per pitch interval | |
GB2130852A (en) | Speech signal reproducing systems | |
Holmes | Copy synthesis of female speech using the JSRU parallel formant synthesiser. | |
US4809330A (en) | Encoder capable of removing interaction between adjacent frames | |
JPH087597B2 (en) | Speech coder | |
JP3798433B2 (en) | Method and apparatus for smoothing pitch cycle waveform | |
JPS5888798A (en) | Voice synthesization system | |
Hedelin | Relp-vocoding with uniform and non-uniform down-sampling | |
JPH0411040B2 (en) | | |
JPS62102294A (en) | Voice coding system | |
GB2266213A (en) | Digital signal coding | |
JPS5961891A (en) | Encoding of residual signal | |
EP0138954B1 (en) | Speech pattern processing utilizing speech pattern compression | |
JP2560277B2 (en) | Speech synthesis method | |
JPH05136697A (en) | Voice coding system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PCNP | Patent ceased through non-payment of renewal fee |