GB2125259A - Digital coding of speech

Info

Publication number: GB2125259A
Application number: GB8320964A
Authority: GB
Inventors: John Nicholas Holmes
Original assignee: UK Secretary of State for Defence
Current assignee: UK Secretary of State for Defence
Priority date: 1982-08-04
Filing date: 1983-08-03
Publication date: 1984-02-29
Also published as: GB8320964D0; GB2125259B

Abstract

In an encoder for the digital coding of speech, sounds of the unvoiced fricative type are identified and sections are coded at intervals, together with a code representing the interval between coded sections. In a decoder, signal samples are reconstructed to provide a signal output, and means responsive to a duration code are included to repeat a section of waveform during sounds of the unvoiced fricative type.

Description

SPECIFICATION

Digital coding of speech

This invention relates to digital coding of speech and speech-like signals, and in particular to the digital coding of speech for transmission over a communications channel.

A common requirement for speech signal processing and transmission is that of digital coding, that is, representation of the signal by a series of digital words. Unfortunately the high complexity and information content of speech results in a large number of words being required in order to specify the signal accurately. If the signal is to be stored, a large storage area will be required and, if transmitted, a high data rate over the transmission channel. Because of these difficulties, it is a common objective of workers in the field that the number of words required to specify a speech signal be reduced.

A number of coding algorithms have been suggested with a view to reducing data rate. In one technique (which is often referred to in the art as extremal coding) signal maxima and minima only are identified at the transmitter end of a communication channel. At the receiver end, the waveform between these points is reconstructed by introducing a fixed shape. Data rate is reduced since, in addition to coding the maxima and minima, code representing only the duration and amplitude of the shape need be transmitted.
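By way of illustration only, the extremal coding idea described above might be sketched as follows. The function names and the choice of a half-cosine as the fixed joining shape are assumptions made for this sketch; the specification does not state which shape is used, and no quantizing of times or amplitudes is shown.

```python
import math

def find_extrema(samples):
    """Return (index, amplitude) pairs for local maxima and minima."""
    extrema = []
    for i in range(1, len(samples) - 1):
        prev_d = samples[i] - samples[i - 1]
        next_d = samples[i + 1] - samples[i]
        if prev_d * next_d < 0:  # slope changes sign: a peak or a trough
            extrema.append((i, samples[i]))
    return extrema

def reconstruct(extrema, length):
    """Rebuild a waveform by joining consecutive extrema.

    The half-cosine used here is an assumed 'fixed shape', chosen
    only because it meets each extreme with zero slope.
    """
    out = [0.0] * length
    for (i0, a0), (i1, a1) in zip(extrema, extrema[1:]):
        for n in range(i0, i1 + 1):
            phase = (n - i0) / (i1 - i0)
            out[n] = a0 + (a1 - a0) * (1 - math.cos(math.pi * phase)) / 2
    return out
```

Only the list returned by `find_extrema` would need to be transmitted, so the number of symbols generated per second follows the density of extremes in the signal rather than the sampling rate.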

Although quality is degraded, a viable coding system can be achieved with reduced data rate. In an alternative algorithm (referred to in the art as time-encoded speech), the waveform between the transmitted points (in this case the zero crossings) is reconstructed by a selection from a limited number of possible waveshapes. For a given number of transmitted symbols, improved accuracy is achieved at only the overhead of transmitting an additional selection code.

It will be realised that a feature of these systems is a variable rate of code generation. At times where maxima and minima, or zero crossings are closely spaced (that is during sounds of the unvoiced fricative or similar type) data rate is much higher than average, whereas at some other times data rate is comparatively low. This variable rate does not accord well with conventional communications channels, which are commonly of fixed and limited data rate. Where generation rate can exceed the data rate of the channel a buffer delay is required to even out the variability.

Buffer length is determined by the statistics of the data rate variation, but there is a practical limit on the buffer delay, particularly if two-way speech communications is to be viable. The consequence of limited buffer length is that the channel data rate must be fairly close to the highest possible generation rate, and the expected saving by using the coding algorithms cannot be realised.
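The trade-off between buffer delay and channel rate can be illustrated with a small simulation. This is a sketch under stated assumptions: symbols are counted per fixed frame, the channel removes a constant number of symbols per frame, and queueing delay is taken as backlog divided by channel rate. The function names are hypothetical and the frame counts used in the note below are invented for illustration.

```python
def peak_backlog(generated, channel_per_frame):
    """Largest number of symbols waiting in the transmit buffer.

    generated: symbols produced by the coder in each frame.
    channel_per_frame: symbols the fixed-rate channel removes per frame.
    """
    backlog = 0
    peak = 0
    for g in generated:
        backlog = max(0, backlog + g - channel_per_frame)
        peak = max(peak, backlog)
    return peak

def min_channel_rate(generated, max_delay_frames):
    """Smallest whole-number channel rate (symbols per frame) for which
    the worst backlog can be cleared within max_delay_frames."""
    rate = 1
    while peak_backlog(generated, rate) > rate * max_delay_frames:
        rate += 1
    return rate
```

For an invented sequence of five frames of 10 symbols, three frames of 40 symbols and five more frames of 10 symbols, with a delay limit of two frames, `min_channel_rate` returns 24 symbols per frame. That is well above the mean of about 17 symbols per frame, which is the effect described above: a short delay limit pushes the required channel rate towards the peak generation rate.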

According to one aspect of the present invention a digital encoder for speech includes means for identifying sounds of the unvoiced fricative or similar type, means for coding sections of the sound waveform at intervals and means for coding the duration of sound between such coded sections.

According to another aspect of the present invention a decoder for digitally encoded speech includes means for reconstructing signal samples to provide a signal output, and means responsive to a duration code for repeating a section of waveform during sounds of the unvoiced fricative or similar type.

According to another aspect of the present invention, a decoder for digitally encoded speech includes a means for randomly reversing the polarity and for randomly reversing the timescale of repeated waveform sections for successive repeats, during sounds of the unvoiced fricative or similar type.

According to a further aspect of the present invention speech communications apparatus includes a digital encoder and a decoder for digitally encoded speech as specified above and further includes a transmission channel arranged such that the encoder is at the transmitter end of the channel and the decoder at the receiver end of the channel.

Preferably speech during sounds not of the unvoiced fricative type is encoded according to extremal coding or time encoded speech as described above. In communications apparatus the transmission channel preferably has a constant data rate and buffer delay is included at both transmitter and receiver. The data rate is advantageously chosen such that the buffer delay is short enough to permit two-way speech communications over a reversed transmission path. In a preferred form of the present invention the length of the waveform sections transmitted during sounds of the unvoiced fricative or similar type is short compared with the interval between such waveform sections. During long sounds of the unvoiced fricative type the interval between transmitted waveform sections may advantageously be increased and the duration code chosen to represent the greater length of interval.

In order that features and advantages of the present invention may be appreciated, some examples will now be described.

Those parts of the speech signal that produce the greatest data rate (the unvoiced fricative consonants) are generated by turbulent noise exciting the vocal tract, and are essentially random in character. It has been discovered during the course of making the present invention that the precise pattern of speech signal during such a sound need not be reproduced, provided the short-term statistical properties are adequately maintained. These properties only change fairly slowly as a result of articulatory movements, and so unvoiced fricative sounds require very little information to describe their perceptual properties. This small amount of information is in contrast to the fact that symbols generated during unvoiced fricatives cause most of the transmission load of prior art extremal coding (EC) and time encoded speech (TES) techniques.

In general the spectral envelope of unvoiced sounds can be defined with completely adequate frequency resolution by waveform segments only about 3 ms long. As the normal rates of change of fricative spectra can be defined by spectrum descriptions at least 20 ms apart there is at least a 6:1 redundancy, and in long continuant fricatives this redundancy is even greater. In accordance with the present invention buffer size is reduced by specifying unvoiced fricatives by short sections of waveform with comparatively long intervals between them. The times of occurrence and durations of omissions can be specified to the receiver by a very small proportion of extra coding data. At the receiver a continuous waveform can be reconstructed by replacing the missing sections with repeated sections of waveform already used.
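The section-and-interval scheme described above can be sketched in a few lines. This is a minimal illustration and not the coder used in the experiments: it operates directly on raw samples rather than on extremal or time-encoded symbols, the function names are hypothetical, and the decoder here repeats each section exactly, without any random modification.

```python
def encode_fricative(samples, section_len, interval):
    """Keep one short section per interval.

    Returns a list of (section, gap) pairs, where gap is the duration
    code: the number of samples after the section that are not sent.
    """
    coded = []
    for start in range(0, len(samples), interval):
        section = samples[start:start + section_len]
        block_end = min(start + interval, len(samples))
        gap = block_end - start - len(section)
        coded.append((section, gap))
    return coded

def decode_fricative(coded):
    """Rebuild a continuous waveform, filling each gap by repeating
    the section that preceded it."""
    out = []
    for section, gap in coded:
        out.extend(section)
        for n in range(gap):
            out.append(section[n % len(section)])
    return out
```

With a section length and interval in the ratio 3:20, matching the 3 ms and 20 ms figures above, only 15 of every 100 samples are carried, which corresponds to the redundancy of somewhat more than 6:1 stated in the text.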

Adverse subjective effects of exact waveform repetition may be avoided by introducing some random or pseudo-random action in the repetition process.
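One possible form of such pseudo-random action, following the polarity reversal and timescale reversal mentioned earlier in this specification, might look like the sketch below. The equal probabilities, the independence of the two choices and the function names are assumptions made for illustration.

```python
import random

def vary_repeat(section, rng):
    """Return one repeat of a section, randomly time-reversed
    and/or polarity-reversed (each with assumed probability 0.5)."""
    out = list(section)
    if rng.random() < 0.5:
        out.reverse()              # reverse the timescale
    if rng.random() < 0.5:
        out = [-x for x in out]    # reverse the polarity
    return out

def fill_gap(section, gap, rng):
    """Fill a gap of the given length with varied repeats of a section."""
    out = []
    while len(out) < gap:
        out.extend(vary_repeat(section, rng))
    return out[:gap]
```

A decoder could call `fill_gap(section, gap, random.Random(seed))` wherever an exact repeat would otherwise be inserted. Neither reversal changes the magnitude spectrum of the section, which is in keeping with the requirement that the short-term statistical properties be maintained.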

In general buffers will tend to fill during unvoiced sounds and empty during voiced sounds and silences. The present invention, with its repeat strategy, may therefore be used in conjunction with an EC or TES system when necessary to avoid buffer overflow, and this straightforward use of the present invention brings a significant improvement in terms of data rate and reduced buffer length. In this regime, however, it is possible for a voiced sound to be encountered with the buffer substantially full, and the transmitted data rate will have to be high enough to ensure that some space is cleared in the buffer before the next voiced sound.

In a preferred form of the present invention the repeat strategy is used during all unvoiced fricatives and the interval between transmitted waveform sections is lengthened during long unvoiced sounds, so deliberately emptying the transmission buffer. It is then possible for the average data rate to be lower, without risk of buffer overflow in voiced periods.
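The lengthening of the interval during long unvoiced sounds could be planned along the following lines. The specification does not say how the interval should grow, so the geometric growth, the upper limit and all parameter names here are assumptions for the purpose of a sketch.

```python
def plan_sections(total_len, section_len, base_interval, growth, max_interval):
    """Plan which parts of a long fricative are transmitted.

    Returns (start, gap) pairs: a section beginning at 'start' is sent,
    followed by 'gap' samples that the decoder must fill by repetition.
    The interval grows by 'growth' after each section, up to max_interval.
    """
    plan = []
    start = 0
    interval = base_interval
    while start < total_len:
        block_end = min(start + interval, total_len)
        sent = min(section_len, block_end - start)
        plan.append((start, block_end - start - sent))
        start = block_end
        interval = min(max_interval, int(interval * growth))
    return plan
```

Assuming, for example, 8 kHz sampling (so that 24 samples is 3 ms and 160 samples is 20 ms), `plan_sections(1000, 24, 160, 2, 640)` sends three sections for a 125 ms fricative, where a fixed 20 ms interval would send seven. The fewer symbols entering the buffer during the sound, the more the channel can drain it before the next voiced period.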

There is a wide choice of detailed strategies that conform to the present invention and one such scheme has been simulated using a type of extremal coding. However, as the aim of these experiments was to investigate buffering, rather than extremal coding per se, no deliberate quantizing was introduced for either the times of occurrence or amplitudes of the extremes. Linear pre-filtering was used to prevent the low-frequency components of vowels from obscuring the waveform ripples of the important higher-frequency components; its effect was compensated for by post-filtering at the receiver.

When this system was used at a sufficiently high symbol transmission rate, high quality speech of 4.8 kHz bandwidth was reproduced with only very slight subjective degradation.

For one typical example of the sentences tested the mean symbol generation rate was 4,200 symbols per second, but peak rates as high as 8,500 symbol/s occurred for short periods. A transmission rate of 4,600 symbol/s was needed to avoid buffer overflow for this sentence when the buffer delay was restricted to 200 ms. When used with a buffer scheme in accordance with the present invention it was possible to reduce the symbol transmission rate to only 2,300 symbols per second, for the same total buffer delay. The effect of the signal repetition was only just detectable subjectively when it was associated with random modifications to avoid exact repetition. It will be noted that with this buffer repeat system the transmission rate penalty is small for using an audio bandwidth well above the 3.4 kHz of normal telephony, and so some common consonant confusions can be avoided.
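The figures quoted above can be checked with simple arithmetic. The helper names below are hypothetical, and treating the buffer capacity as the channel rate multiplied by the permitted delay is an assumption about how the 200 ms limit translates into symbols.

```python
def rate_saving(rate_before, rate_after):
    """Fractional reduction in symbol transmission rate."""
    return 1 - rate_after / rate_before

def symbols_within_delay(rate, delay_ms):
    """Symbols a fixed-rate channel carries during the permitted delay,
    which bounds the backlog the buffer may hold."""
    return rate * delay_ms / 1000

saving = rate_saving(4600, 2300)                    # 0.5, i.e. the rate is halved
backlog_before = symbols_within_delay(4600, 200)    # 920.0 symbols
backlog_after = symbols_within_delay(2300, 200)     # 460.0 symbols
```

The reduced rate of 2,300 symbols per second is also below the 4,200 symbols per second mean generation rate quoted for the unmodified coder. This appears consistent with the repeat scheme removing symbols from the stream during fricatives, rather than merely smoothing the same symbols over a longer buffer.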

It will be appreciated that a digital encoder for speech in accordance with the present invention provides an efficient coding strategy at a lower data transmission rate compared with the prior art. Where speech is coded for storage as coded digital words, economy of store is provided.

Claims

1. A digital encoder for speech including means for identifying sounds of the unvoiced fricative or similar type, means for coding sections of the sound waveform at intervals, and means for coding the duration of sound between such coded sections.

2. A decoder for digitally encoded speech including means for reconstructing signal samples to provide a signal output, and means responsive to a duration code for repeating a section of waveform during sounds of the unvoiced fricative or similar type.

3. A decoder as claimed in claim 2 and including means for randomly reversing the polarity and for randomly reversing the timescale of repeated waveform sections for successive repeats during sounds of the unvoiced fricative or similar type.

4. A digital encoder as claimed in claim 1 and including means for encoding sounds not of the unvoiced fricative type according to extremal coding or time encoded speech.

5. Communication apparatus including a digital encoder as claimed in claim 1 or claim 4 and a decoder as claimed in claim 2 or claim 3 and further including a transmission channel arranged such that the encoder is at a transmitter end of the channel and the decoder is at a receiver end of the channel.

6. Communications apparatus as claimed in claim 5 having a constant data rate and including buffer delay at both transmitter end and receiver end of the transmission channel.

7. Communications apparatus as claimed in claim 6 and wherein the data rate is chosen such that the buffer delay is short enough to permit two way speech communications.

8. Communications apparatus as claimed in claim 5, 6 or 7 and wherein the length of the waveform sections transmitted during sounds of the unvoiced fricative or similar type is short compared with the interval between such waveform sections.

9. Communication apparatus as claimed in claim 5, 6, 7 or 8 and including means for increasing the interval between transmitted waveform sections during long sounds of the unvoiced fricative or similar type and for setting the duration code to represent the greater length of interval.

10. A digital encoder substantially as herein described including means for identifying sounds of the unvoiced fricative or similar type.

11. A decoder substantially as herein described including means responsive to a duration code.

12. Communications apparatus substantially as herein described.