GB2130852A - Speech signal reproducing systems - Google Patents

Publication number
GB2130852A
Authority
GB
United Kingdom
Prior art keywords
pitch
commencement
interval
speech signal
updated
Prior art date
Legal status
Granted
Application number
GB8330820A
Other versions
GB2130852B (en)
GB8330820D0 (en)
Inventor
Tad Weng Chong
Angela Druckman
Michael John Shearme
Current Assignee
General Electric Co PLC
Original Assignee
General Electric Co PLC
Priority date
1982-11-19
Filing date
1983-11-18
Publication date
1984-06-06
Application filed by General Electric Co PLC
Priority to GB8330820A
Publication of GB8330820D0
Publication of GB2130852A
Application granted
Publication of GB2130852B
Expired

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/09: Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

In an L.P.C. speech synthesizer arrangement in which a reproduced-speech pitch period may be longer than the frame period, the filter coefficients are arranged to be updated either pitch-synchronously or at least a predetermined time after the commencement of a pitch period, to avoid "thumping" in the reproduced speech.

Description

SPECIFICATION
Speech signal reproducing systems
The present invention relates to speech signal reproducing systems and to speech signal synthesizers for such systems.
Speech signal reproducing systems are presently being developed in which electric signals representing a speech utterance are analysed, in respect of each of a succession of time intervals or frames of, say, 20 milliseconds duration, to derive successive sets of parameters representing a model of the vocal tract of the speaker producing that utterance.
The parameters in respect of each frame may comprise for example the pitch period, a voiced/unvoiced apportionment or mixing parameter, a measure of the total energy of the speech signal during the frame, and a set of twelve coefficients, all these parameters being conveyed by a total of, say, eighty binary digits.
The speech signals may then be fairly reproduced by means of a synthesizer arrangement comprising a recursive filter excited by a series of pulses separated by the pitch period for voiced sounds and by pseudo-random noise for unvoiced sounds, or more generally by a mix of the two, the set of twelve coefficients being utilised to determine, say, the reflection coefficients of the recursive filter. The synthesizer arrangement is commonly a digital arrangement, the initial and reproduced speech signals being of the form of linear or compression-law coded PCM speech.
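As an illustration of a recursive filter controlled by reflection coefficients, the following is a minimal sketch of an all-pole lattice synthesis filter. The function name `lattice_synthesize` and the sign convention are assumptions made for this example; the specification does not give the filter equations, and a practical synthesizer would use twelve coefficients rather than the single coefficient shown in the usage note.

```python
def lattice_synthesize(excitation, reflection):
    """All-pole lattice synthesis filter (illustrative sketch only).

    excitation: list of input samples (pulses and/or noise).
    reflection: reflection coefficients k_1..k_p. The sign convention
                used here is an assumption, not taken from the patent.
    """
    order = len(reflection)
    backward = [0.0] * (order + 1)  # backward errors from the previous sample
    output = []
    for sample in excitation:
        forward = sample
        new_backward = [0.0] * (order + 1)
        # Work from the highest-order stage down to the output.
        for i in range(order, 0, -1):
            forward = forward - reflection[i - 1] * backward[i - 1]
            new_backward[i] = reflection[i - 1] * forward + backward[i - 1]
        new_backward[0] = forward
        backward = new_backward
        output.append(forward)
    return output
```

With a single coefficient of 0.5 and a unit impulse as excitation, the filter reduces to y(n) = e(n) - 0.5 y(n-1), so the first three output samples are 1.0, -0.5 and 0.25. The internal state carried in `backward` is what makes an abrupt change of coefficients audible when that state is large.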
This "parametric" form of coding, known as linear prediction coding, enables speech signals to be represented with acceptable quality on reproduction by a series of binary digits generated at bit rates of 4 Kbits/sec or less. Where better quality reproduction is required and higher bit rates are tolerable, the original speech signals may be analysed in overlapping 20 msecond blocks to derive updated parameters, say, every 10 mseconds.
Since the pitch period of a speech signal typically may vary between 2 and 17 mseconds a timing problem exists at the synthesizer, particularly when the parameters are made available for updating every 10 milliseconds.
According to one aspect of the present invention in a speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, the pulse coded signals utilised by the pitch reproducing means during a time interval are updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means during that interval are updated either contemporaneously with the updating of the pitch-representing signals or after a predetermined delay time from the commencement of said time interval, whichever is sooner.
According to another aspect of the present invention in a speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, there are provided means to generate a signal to mark a predetermined delay time after the commencement of each pitch period of the reproduced signal, the pulse coded signals utilised by the pitch reproducing means during a time interval being updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means being updated either at the commencement of the respective interval in the absence of a delay time marking signal, contemporaneously with the updating of the pitch-representing signals if said updating occurs in the presence of said delay time marking signal, or at the termination of said delay time marking signal if no pitch updating has yet taken place in that interval.
The pulse coded signals which represent the pitch and vocal tract characterisation of the original speech signal and which are made available at the commencement of each time interval may themselves be updated at different rates. For example the pitch-representing signals may be updated at the commencement of each time interval while the vocal tract characterisation signals may be updated at the commencement of alternate time intervals. The time intervals may be of 10 msec. duration.
A speech signal reproducing system in accordance with the present invention will now be described with reference to the accompanying drawings, of which Figures 1A and 1B and Figures 2A and 2B show timing diagrams illustrating respective modes of operation of the system.
In the established method of analysing speech signals using linear prediction coding (LPC) each 20 msecond segment of speech signal represented, say, by 160 PCM coded amplitude samples is analysed to derive values for fifteen parameters represented by a total of some 80 binary digits. These LPC parameter values can be conveyed to a synthesizer arrangement as a stream of binary digits at a bit rate of 4 Kbits/sec.
For better quality speech reproduction the analysis can be carried out on "overlapping" 20 msecond segments with all parameters being updated and transmitted afresh every 10 msecs, giving a bit rate of 8 Kbits/sec.
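The figures quoted above follow from simple arithmetic, sketched below. The constant and function names are chosen for this example only; the 8 kHz sampling rate is the one the description uses for its sample intervals, and 80 bits per parameter set is the approximate total stated in the text.

```python
SAMPLE_RATE_HZ = 8000          # 8 kHz sample intervals, as used in the description
BITS_PER_PARAMETER_SET = 80    # "some 80 binary digits" per set of fifteen parameters


def segment_samples(segment_ms):
    """Number of PCM samples in an analysis segment of the given length."""
    return SAMPLE_RATE_HZ * segment_ms // 1000


def bit_rate(update_interval_ms):
    """Bits per second when one full parameter set is sent per update interval."""
    return BITS_PER_PARAMETER_SET * 1000 // update_interval_ms


print(segment_samples(20))  # samples in a 20 ms segment
print(bit_rate(20))         # one set every 20 ms
print(bit_rate(10))         # one set every 10 ms (overlapping segments)
```

A 20 msecond segment therefore holds 160 samples, and sending 80 bits every 20 or 10 mseconds gives 4000 or 8000 bits per second respectively.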
At the synthesizer twelve of the parameter values of each set of fifteen are applied as coefficient values in a recursive filter. One of the remaining three values, representing the pitch period in respect of a voiced sound, is arranged to give a unit positive pulse excitation to the input of the filter at the beginning of each pitch period and small negative pulses at subsequent 8 KHz sample intervals in order to give a zero mean excitation value. Thus at the beginning of a pitch period the excitation values in the filter are large but subsequently tend to decrease. At the same time, when the twelve coefficients are updated, the excitation values within the filter do not in general correspond with the new filter coefficients since these values have been generated in accordance with the preceding coefficients, and a number of sample intervals have to pass before the change is completed.
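The zero-mean voiced excitation can be sketched as follows. The description says only that small negative pulses follow the unit pulse; the assumption here, made purely for illustration, is that they are of equal size and spread over every remaining sample of the pitch period. The name `voiced_excitation` is likewise hypothetical.

```python
def voiced_excitation(pitch_period_samples):
    """One pitch period of excitation with zero mean (illustrative sketch).

    Assumption: the negative pulses are equal and occupy every sample
    after the unit pulse. The patent does not specify their distribution.
    """
    n = pitch_period_samples
    if n < 2:
        raise ValueError("pitch period must span at least two samples")
    return [1.0] + [-1.0 / (n - 1)] * (n - 1)


period = voiced_excitation(5)
print(period)       # unit pulse followed by four equal negative pulses
print(sum(period))  # the pulses cancel over the period
```

For a five-sample period each negative pulse is -0.25, so the five values sum to zero.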
If the excitation values within the filter are large when the coefficients are updated the changeover will be audible in the resulting synthesized speech, and for this reason the updating is best carried out when the excitation values are small, preferably at the end of each pitch period. With this method of updating, which is referred to as pitch-synchronous updating, where the pitch period exceeds the frame interval, as in the case of a low-pitched voice with parameters being updated every 10 mseconds, a whole set of parameter values has to be discarded whenever a pitch period extends from one frame interval into the next but one, and the quality of the reproduced speech suffers accordingly.
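The loss described above can be made concrete with a small sketch that lists the frames containing no pitch boundary, which are the frames whose parameter sets purely pitch-synchronous updating would never load. The function name, the constant pitch period and the first pitch boundary at sample 0 are all assumptions for this example.

```python
def frames_without_pitch_boundary(pitch_period, frame_length, n_frames):
    """Frames (by index) in which no pitch period begins.

    All arguments are in samples. Assumes a constant pitch period with
    the first pitch boundary at sample 0 (illustrative simplification).
    """
    total = frame_length * n_frames
    frames_with_boundary = {
        boundary // frame_length for boundary in range(0, total, pitch_period)
    }
    return [f for f in range(n_frames) if f not in frames_with_boundary]


# 17 ms pitch period (136 samples at 8 kHz) against 10 ms frames (80 samples).
print(frames_without_pitch_boundary(136, 80, 10))
```

Under these assumptions frames 2, 4, 7 and 9 of the first ten contain no pitch boundary, so four of the ten parameter sets would be discarded.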
In order to overcome this loss of quality, in the present arrangement the twelve coefficients are arranged to be updated within each frame interval either pitch synchronously if the pitch period is less than a predetermined time or at least that predetermined time after the commencement of a pitch period if that pitch period is greater than the predetermined time. In the latter case the twelve coefficients may either be updated at the commencement of a frame interval or at the predetermined time after the commencement of a frame interval. The predetermined time may be, for example, of the order of 5 mseconds.
The three remaining parameters, determining the pitch period, gain and the voiced/unvoiced mix, are updated pitch-synchronously at the commencement of the first pitch period in a frame interval.
Referring now to Figures 1A and 1B, in a first method of carrying out this conditional updating a counter (not shown) is set at the commencement of each frame to count 8 KHz sample intervals up to a total count x. If within this count period a pitch boundary occurs then the twelve coefficients are updated pitch-synchronously, as shown in Figure 1A, where PA and CA represent the pitch and coefficient values in respect of frame A and so on.
If a pitch boundary does not occur within the count period x, as shown in respect of frames B, C and E of Figure 1B, then the updating of the twelve coefficients is arranged to take place at the end of the count period.
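The first method amounts to taking whichever comes sooner: the first pitch boundary in the frame or the end of the count x. A minimal sketch follows, with times expressed as sample indices. The function name and the list-of-boundaries representation are assumptions for illustration; the patent describes a hardware counter rather than a lookup of this kind.

```python
def coefficient_update_time_frame_counter(frame_start, pitch_boundaries, x):
    """Sample index at which the twelve coefficients are updated (first method).

    frame_start:      sample index at which the frame begins.
    pitch_boundaries: sample indices at which pitch periods begin.
    x:                count limit in samples, started at the frame boundary.
    """
    deadline = frame_start + x
    for boundary in sorted(pitch_boundaries):
        if frame_start <= boundary <= deadline:
            return boundary  # pitch-synchronous update within the count
    return deadline          # no pitch boundary in time: update at count x


print(coefficient_update_time_frame_counter(80, [60, 100], 40))  # boundary at 100 falls within the count
print(coefficient_update_time_frame_counter(80, [60, 196], 40))  # no boundary before the count ends at 120
```

In the first call the update is pitch-synchronous at sample 100; in the second it falls back to sample 120, the end of the count period.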
In an alternative method indicated in Figures 2A and 2B a counter (not shown) is set at each pitch boundary to count 8 KHz sample intervals up to a total count y. If a frame boundary occurs during this count then updating of the twelve coefficients takes place when the counter is reset, either when the next pitch boundary occurs, as shown in Figure 2A, or when the count y is reached, as shown in Figure 2B, frames B and C. If at a frame boundary the counter is not running, as in Figure 2B, frames D and E, then the twelve coefficients are updated at the frame boundary, that is, frame-synchronously.
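The alternative method can be sketched in the same style, again with times as sample indices. The function name is hypothetical, and one point is an assumption not settled by the text: a pitch boundary that coincides exactly with the frame boundary is treated here as an immediate pitch-synchronous update.

```python
def coefficient_update_time_pitch_counter(frame_start, pitch_boundaries, y):
    """Sample index at which the twelve coefficients are updated (second method).

    frame_start:      sample index at which the frame begins.
    pitch_boundaries: sample indices at which pitch periods begin.
    y:                count limit in samples, started at each pitch boundary.
    """
    if frame_start in pitch_boundaries:
        return frame_start  # assumption: coincident boundaries update at once
    earlier = [b for b in pitch_boundaries if b < frame_start]
    if not earlier or frame_start - max(earlier) >= y:
        return frame_start  # counter not running: frame-synchronous update
    count_end = max(earlier) + y
    later = [b for b in pitch_boundaries if b > frame_start]
    # Counter running: update when it is reset, either by the next pitch
    # boundary or by reaching the count y, whichever happens first.
    return min([count_end] + later)


print(coefficient_update_time_pitch_counter(80, [60, 90], 40))   # next pitch boundary resets the counter
print(coefficient_update_time_pitch_counter(80, [60, 196], 40))  # count y is reached first
print(coefficient_update_time_pitch_counter(80, [20, 156], 40))  # counter idle at the frame boundary
```

The three calls correspond to the three cases in the text: update at the next pitch boundary (sample 90), update when the count y is reached (sample 100), and frame-synchronous update (sample 80).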
Both of these methods ensure that each set of twelve coefficients is actually used for most of the frame period for which it is intended, with the updating pitch-synchronous where possible but where this is not possible at lower than peak excitation levels. In general the counts x and y are expected to be the same, and of the order of 40, that is, equivalent to a time period of 5 mseconds.
Where the channel capacity between the analyser and synthesizer is limited it has been found that compared with the results obtained by transmitting all fifteen parameters every 20 mseconds a significant improvement in reproduced speech quality can be obtained by transmitting the pitch, gain and voiced/unvoiced mix parameters every 10 mseconds while transmitting the twelve coefficients only every 20 mseconds. The methods described above for conditional updating at the synthesizer ensure that the best possible use is made of the coefficient values once they are received.
Speech synthesizers to which this updating technique may be applied were described, for example, by Atal and Hanauer in The Journal of the Acoustical Society of America, Volume 50, 1971, pages 637 to 655.

Claims (7)

1. A speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, wherein the pulse coded signals utilised by the pitch reproducing means during a time interval are updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means during that interval are updated either contemporaneously with the updating of the pitch-representing signals or after a predetermined delay time from the commencement of said time interval, whichever is sooner.
2. A speech signal reproducing system in which a speech signal synthesizer includes means for substantially reproducing at least the pitch and the vocal tract characterisation of an original speech signal, during each of a succession of regular time intervals or frames of predetermined duration, in dependence upon respective pulse coded signals which represent said pitch and said characterisation and which are made available to said synthesizer at the commencement of each said interval, wherein there are provided means to generate a signal to mark a predetermined delay time after the commencement of each pitch period of the reproduced signal, the pulse coded signals utilised by the pitch reproducing means during a time interval being updated to match those made available in respect of that interval at the first commencement of a pitch period after the commencement of said interval, and the pulse coded signals utilised by the vocal tract characterisation means being updated either at the commencement of the respective interval in the absence of a delay time marking signal, contemporaneously with the updating of the pitch-representing signals if said updating occurs in the presence of said delay time marking signal, or at the termination of said delay time marking signal if no pitch updating has yet taken place in that interval.
3. A speech signal reproducing system in accordance with Claim 2 wherein the pulse coded signals which represent the pitch and vocal tract characterisation of the original speech signal and which are made available at the commencement of each time interval are themselves updated at different rates.
4. A speech signal reproducing system in accordance with Claim 3 wherein the pitch-representing signals are updated at the commencement of each time interval while the vocal tract characterisation signals may be updated at the commencement of alternate time intervals.
5. A speech signal reproducing system in accordance with any preceding claim wherein the time intervals are of 10 milliseconds duration.
6. A speech signal reproducing system substantially as hereinbefore described with reference to Figures 1A and 1B of the accompanying drawings.
7. A speech signal reproducing system substantially as hereinbefore described with reference to Figures 2A and 2B of the accompanying drawings.
GB8330820A 1982-11-19 1983-11-18 Speech signal reproducing systems Expired GB2130852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB8330820A GB2130852B (en) 1982-11-19 1983-11-18 Speech signal reproducing systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB8233068 1982-11-19
GB8330820A GB2130852B (en) 1982-11-19 1983-11-18 Speech signal reproducing systems

Publications (3)

Publication Number Publication Date
GB8330820D0 GB8330820D0 (en) 1983-12-29
GB2130852A true GB2130852A (en) 1984-06-06
GB2130852B GB2130852B (en) 1986-03-12

Family

ID=26284448

Family Applications (1)

Application Number Title Priority Date Filing Date
GB8330820A Expired GB2130852B (en) 1982-11-19 1983-11-18 Speech signal reproducing systems

Country Status (1)

Country Link
GB (1) GB2130852B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0016427A2 (en) * 1979-03-15 1980-10-01 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Multi-channel digital speech synthesizer
GB2093668A (en) * 1981-01-29 1982-09-02 Seiko Instr & Electronics A speech synthesizer
GB2097636A (en) * 1981-04-28 1982-11-03 Seiko Instr & Electronics Speech synthesizer

Also Published As

Publication number Publication date
GB2130852B (en) 1986-03-12
GB8330820D0 (en) 1983-12-29

Similar Documents

Publication Publication Date Title
US4709390A (en) Speech message code modifying arrangement
CA1181854A (en) Digital speech coder
US5305421A (en) Low bit rate speech coding system and compression
CA1184657A (en) Digital speech processing using linear prediction process
WO1985004276A1 (en) Multipulse lpc speech processing arrangement
TW326070B (en) The estimation method of the impulse gain for coding vocoder
EP0731348B1 (en) Voice storage and retrieval system
US7869993B2 (en) Method and a device for source coding
EP0634041B1 (en) Method and apparatus for encoding/decoding of background sounds
JPH0636159B2 (en) Pitch detector
KR100291584B1 (en) Speech waveform compressing method by similarity of fundamental frequency/first formant frequency ratio per pitch interval
GB2130852A (en) Speech signal reproducing systems
Holmes Copy synthesis of female speech using the JSRU parallel formant synthesiser.
US4809330A (en) Encoder capable of removing interaction between adjacent frames
JPH087597B2 (en) Speech coder
JP3798433B2 (en) Method and apparatus for smoothing pitch cycle waveform
JPS5888798A (en) Voice synthesization system
Hedelin Relp-vocoding with uniform and non-uniform down-sampling
JPH0411040B2 (en)
JPS62102294A (en) Voice coding system
GB2266213A (en) Digital signal coding
JPS5961891A (en) Encoding of residual signal
EP0138954B1 (en) Speech pattern processing utilizing speech pattern compression
JP2560277B2 (en) Speech synthesis method
JPH05136697A (en) Voice coding system

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee