GB2125259A - Digital coding of speech - Google Patents

Info

Publication number
GB2125259A
Authority
GB
United Kingdom
Prior art keywords
sounds
speech
sections
waveform
similar type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB8320964A
Other versions
GB8320964D0 (en
GB2125259B (en
Inventor
John Nicholas Holmes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
UK Secretary of State for Defence
Original Assignee
UK Secretary of State for Defence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by UK Secretary of State for Defence
Priority to GB8320964A
Publication of GB8320964D0
Publication of GB2125259A
Application granted
Publication of GB2125259B
Expired

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0018 Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

In an encoder for the digital coding of speech, sounds of the unvoiced fricative type are identified and sections are coded at intervals, together with a code representing the interval between coded sections. In a decoder, signal samples are reconstructed to provide a signal output, and means responsive to a duration code are included to repeat a section of waveform during sounds of the unvoiced fricative type.

Description

SPECIFICATION
Digital coding of speech
This invention relates to digital coding of speech and speech-like signals, and in particular to the digital coding of speech for transmission over a communications channel.
A common requirement for speech signal processing and transmission is that of digital coding, that is, representation of the signal by a series of digital words. Unfortunately the high complexity and information content of speech results in a large number of words being required in order to specify the signal accurately. If the signal is to be stored, a large storage area will be required and, if transmitted, a high data rate over the transmission channel. Because of these difficulties, it is a common objective of workers in the field that the number of words required to specify a speech signal be reduced.
A number of coding algorithms have been suggested with a view to reducing data rate. In one technique (which is often referred to in the art as extremal coding) signal maxima and minima only are identified at the transmitter end of a communication channel. At the receiver end, the waveform between these points is reconstructed by introducing a fixed shape. Data rate is reduced since, in addition to coding the maxima and minima, code representing only the duration and amplitude of the shape need be transmitted.
Although quality is degraded, a viable coding system can be achieved with reduced data rate. In an alternative algorithm (referred to in the art as time-encoded speech), the waveform between the transmitted points (in this case the zero crossings) is reconstructed by a selection from a limited number of possible waveshapes. For a given number of transmitted symbols, improved accuracy is achieved at only the overhead of transmitting an additional selection code.
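The extremal coding idea described above can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and the half-cosine used as the "fixed shape" is an assumption, since the specification does not say which shape is inserted between extremes.

```python
import math

def find_extrema(samples):
    """Return (index, value) pairs for both endpoints and every local maximum or minimum."""
    points = [(0, samples[0])]
    for i in range(1, len(samples) - 1):
        left = samples[i] - samples[i - 1]
        right = samples[i + 1] - samples[i]
        if left * right < 0:  # slope changes sign, so this sample is an extreme
            points.append((i, samples[i]))
    points.append((len(samples) - 1, samples[-1]))
    return points

def reconstruct(points):
    """Join successive extremes with a half-cosine (an assumed fixed shape)."""
    out = []
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        span = i1 - i0
        for k in range(span):
            w = (1 - math.cos(math.pi * k / span)) / 2  # 0 at i0, approaching 1 at i1
            out.append(v0 + (v1 - v0) * w)
    out.append(points[-1][1])
    return out
```

Only the extremes are kept by `find_extrema`, so closely spaced maxima and minima produce many more points per unit time than a slowly varying waveform does. Flat plateaus are not treated as extremes in this sketch.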
It will be realised that a feature of these systems is a variable rate of code generation. At times where maxima and minima, or zero crossings are closely spaced (that is during sounds of the unvoiced fricative or similar type) data rate is much higher than average, whereas at some other times data rate is comparatively low. This variable rate does not accord well with conventional communications channels, which are commonly of fixed and limited data rate. Where generation rate can exceed the data rate of the channel a buffer delay is required to even out the variability.
Buffer length is determined by the statistics of the data rate variation, but there is a practical limit on the buffer delay, particularly if two-way speech communications is to be viable. The consequence of limited buffer length is that the channel data rate must be fairly close to the highest possible generation rate, and the expected saving by using the coding algorithms cannot be realised.
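The buffering problem described above can be made concrete with a small simulation. The sketch below is illustrative only: the function name, the frame-based model and any figures fed to it are assumptions, not part of the specification.

```python
def peak_buffer_delay(generated_per_frame, channel_rate, frame_seconds):
    """Return the worst-case queueing delay in seconds on a fixed-rate channel.

    generated_per_frame: symbols produced by the coder in each frame
    channel_rate: symbols per second that the channel can carry
    frame_seconds: duration of one frame
    """
    drained_per_frame = channel_rate * frame_seconds
    backlog = 0.0
    peak = 0.0
    for generated in generated_per_frame:
        # The buffer grows by what is generated and shrinks by what is sent,
        # but it can never hold fewer than zero symbols.
        backlog = max(0.0, backlog + generated - drained_per_frame)
        peak = max(peak, backlog)
    # A backlog of N symbols takes N / channel_rate seconds to clear.
    return peak / channel_rate
```

Running the function over the same frame counts at several channel rates shows the trade-off: a channel rate near the peak generation rate keeps the delay close to zero, while a rate near the mean lets the backlog grow during bursts.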
According to one aspect of the present invention a digital encoder for speech includes means for identifying sounds of the unvoiced fricative or similar type, means for coding sections of the sound waveform at intervals and means for coding the duration of sound between such coded sections.
According to another aspect of the present invention a decoder for digitally encoded speech includes means for reconstructing signal samples to provide a signal output, and means responsive to a duration code for repeating a section of waveform during sounds of the unvoiced fricative or similar type.
According to another aspect of the present invention, a decoder for digitally encoded speech includes a means for randomly reversing the polarity and for randomly reversing the timescale of repeated waveform sections for successive repeats, during sounds of the unvoiced fricative or similar type.
According to a further aspect of the present invention speech communications apparatus includes a digital encoder and a decoder for digitally encoded speech as specified above and further includes a transmission channel arranged such that the encoder is at the transmitter end of the channel and the decoder at the receiver end of the channel.
Preferably speech during sounds not of the unvoiced fricative type is encoded according to extremal coding or time encoded speech as described above. In communications apparatus the transmission channel preferably has a constant data rate and buffer delay is included at both transmitter and receiver. The data rate is advantageously chosen such that the buffer delay is short enough to permit two-way speech communications over a reversed transmission path. In a preferred form of the present invention the length of the waveform sections transmitted during sounds of the unvoiced fricative or similar type is short compared with the interval between such waveform sections. During long sounds of the unvoiced fricative type the interval between transmitted waveform sections may advantageously be increased and the duration code chosen to represent the greater length of interval.
In order that features and advantages of the present invention may be appreciated, some examples will now be described.
Those parts of the speech signal that produce the greatest data rate (the unvoiced fricative consonants) are generated by turbulent noise exciting the vocal tract, and are essentially random in character. It has been discovered during the course of making the present invention that the precise pattern of speech signal during such a sound need not be reproduced, provided the short-term statistical properties are adequately maintained. These properties only change fairly slowly as a result of articulatory movements, and so unvoiced fricative sounds require very little information to describe their perceptual properties. This small amount of information is in contrast to the fact that symbols generated during unvoiced fricatives cause most of the transmission load of prior art extremal coding (EC) and time encoded speech (TES) techniques.
In general the spectral envelope of unvoiced sounds can be defined with completely adequate frequency resolution by waveform segments only about 3 ms long. As the normal rates of change of fricative spectra can be defined by spectrum descriptions at least 20 ms apart, there is at least a 6:1 redundancy, and in long continuant fricatives this redundancy is even greater. In accordance with the present invention buffer size is reduced by specifying unvoiced fricatives by short sections of waveform with comparatively long intervals between them. The times of occurrence and durations of omissions can be specified to the receiver by a very small proportion of extra coding data. At the receiver a continuous waveform can be reconstructed by replacing the missing sections with repeated sections of waveform already used.
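The encoder side of this idea, keeping a short section of waveform and a code for the length of what is omitted, might be sketched as follows. The function name and the representation of the duration code as a plain sample count are assumptions made for illustration; the specification does not fix a code format.

```python
def encode_fricative(samples, section_len, interval):
    """Keep `section_len` samples from the start of every `interval` samples.

    Returns a list of (section, omitted) pairs, where `omitted` is the number
    of samples skipped after the section (a stand-in for the duration code).
    """
    if not 0 < section_len <= interval:
        raise ValueError("need 0 < section_len <= interval")
    coded = []
    for start in range(0, len(samples), interval):
        section = samples[start:start + section_len]
        block_end = min(start + interval, len(samples))
        omitted = block_end - (start + len(section))
        coded.append((section, omitted))
    return coded
```

At a hypothetical sampling rate of 10 kHz, `section_len=30` and `interval=200` would correspond to the 3 ms sections and 20 ms spacing mentioned above, so fewer than one sample in six is retained.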
Adverse subjective effects of exact waveform repetition may be avoided by introducing some random or pseudo-random action in the repetition process.
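One way the randomised repetition could look at the decoder is sketched below, using the polarity and timescale reversal named in the statements of invention. The function name, the 50% probabilities and the use of Python's `random.Random` as the pseudo-random source are assumptions for illustration.

```python
import random

def fill_gap(section, gap_length, rng=None):
    """Fill `gap_length` samples by repeating `section`, randomly reversed in
    time and/or polarity on each repeat so that no exact repetition is forced."""
    if not section:
        raise ValueError("section must contain at least one sample")
    rng = rng or random.Random()
    out = []
    while len(out) < gap_length:
        repeat = list(section)
        if rng.random() < 0.5:   # assumed probability: reverse the timescale
            repeat.reverse()
        if rng.random() < 0.5:   # assumed probability: reverse the polarity
            repeat = [-s for s in repeat]
        out.extend(repeat)
    return out[:gap_length]
```

Both transformations leave the magnitude spectrum of the section unchanged, which fits the requirement that only the short-term statistical properties need to be preserved.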
In general buffers will tend to fill during unvoiced sounds and empty during voiced sounds and silences. The present invention, with its repeat strategy, may therefore be used in conjunction with an EC or TES system when necessary to avoid buffer overflow, and this straightforward use of the present invention brings a significant improvement in terms of data rate and reduced buffer length. In this regime, however, it is possible for a voiced sound to be encountered with the buffer substantially full, and the transmitted data rate will have to be high enough to ensure that some space is cleared in the buffer before the next voiced sound.
In a preferred form of the present invention the repeat strategy is used during all unvoiced fricatives and the interval between transmitted waveform sections is lengthened during long unvoiced sounds, so deliberately emptying the transmission buffer. It is then possible for the average data rate to be lower, without risk of buffer overflow in voiced periods.
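A rule for lengthening the interval during long unvoiced sounds could take many forms. The sketch below shows one hypothetical rule; the function name, the step-wise growth and every default value (20 ms base, 100 ms threshold, 80 ms ceiling) are assumptions chosen for illustration, not values from the specification.

```python
def interval_for(elapsed_ms, base_ms=20, long_after_ms=100, max_ms=80):
    """Return the spacing in ms between transmitted sections, given how long
    the current unvoiced fricative has lasted so far."""
    if elapsed_ms < long_after_ms:
        return base_ms
    # Add one base interval for each further `long_after_ms` of fricative,
    # capped so that the spectrum description does not become too stale.
    steps = elapsed_ms // long_after_ms
    return min(max_ms, base_ms * (1 + steps))
```

Under this rule a short fricative is sent at the base spacing, while a long continuant sends progressively fewer sections, giving the transmission buffer time to drain before the next voiced sound.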
There is a wide choice of detailed strategies that conform to the present invention and one such scheme has been simulated using a type of extremal coding. However, as the aim of these experiments was to investigate buffering, rather than extremal coding per se, no deliberate quantizing was introduced for either the times of occurrence or amplitudes of the extremes. Linear pre-filtering was used to prevent the low-frequency components of vowels from obscuring the waveform ripples of the important higher-frequency components; its effect was compensated for by post-filtering at the receiver.
When this system was used at a sufficiently high symbol transmission rate, high quality speech of 4.8 kHz bandwidth was reproduced with only very slight subjective degradation.
For one typical example of the sentences tested the mean symbol generation rate was 4,200 symbols per second, but peak rates as high as 8,500 symbol/s occurred for short periods. A transmission rate of 4,600 symbol/s was needed to avoid buffer overflow for this sentence when the buffer delay was restricted to 200 ms. When used with a buffer scheme in accordance with the present invention it was possible to reduce the symbol transmission rate to only 2,300 symbols per second, for the same total buffer delay. The effect of the signal repetition was only just detectable subjectively when it was associated with random modifications to avoid exact repetition. It will be noted that with this buffer repeat system the transmission rate penalty is small for using an audio bandwidth well above the 3.4 kHz of normal telephony, and so some common consonant confusions can be avoided.
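The figures quoted above can be checked with simple arithmetic. The sketch below treats the 200 ms limit as a single queueing delay, which is a simplifying assumption, and the function name is hypothetical.

```python
def max_backlog(rate_symbols_per_s, delay_ms):
    """Largest backlog (in symbols) that can be cleared within `delay_ms`
    on a channel carrying `rate_symbols_per_s`."""
    return rate_symbols_per_s * delay_ms // 1000

conventional = max_backlog(4600, 200)   # backlog allowed without the repeat scheme
with_repeats = max_backlog(2300, 200)   # backlog allowed with the repeat scheme
saving = 1 - 2300 / 4600                # fractional reduction in transmission rate
```

On these numbers the repeat scheme halves the transmission rate for the same delay limit, and the reduced rate of 2,300 symbols per second is below the mean generation rate of 4,200 symbols per second quoted for the unmodified coder.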
It will be appreciated that a digital encoder for speech in accordance with the present invention provides an efficient coding strategy at a lower data transmission rate compared with the prior art. Where speech is coded for storage as coded digital words, economy of store is provided.

Claims (11)

1. A digital encoder for speech including means for identifying sounds of the unvoiced fricative or similar type, means for coding sections of the sound waveform at intervals, and means for coding the duration of sound between such coded sections.
2. A decoder for digitally encoded speech including means for reconstructing signal samples to provide a signal output, and means responsive to a duration code for repeating a section of waveform during sounds of the unvoiced fricative or similar type.
3. A decoder as claimed in claim 2 and including means for randomly reversing the polarity and for randomly reversing the timescale of repeated waveform sections for successive repeats during sounds of the unvoiced fricative or similar type.
4. A digital encoder as claimed in claim 1 and including means for encoding sounds not of the unvoiced fricative type according to extremal coding or time encoded speech.
5. Communication apparatus including a digital encoder as claimed in claim 1 or claim 4 and a decoder as claimed in claim 2 or claim 3 and further including a transmission channel arranged such that the encoder is at a transmitter end of the channel and the decoder is at a receiver end of the channel.
6. Communications apparatus as claimed in claim 5 having a constant data rate and including buffer delay at both transmitter end and receiver end of the transmission channel.
7. Communications apparatus as claimed in claim 6 and wherein the data rate is chosen such that the buffer delay is short enough to permit two way speech communications.
8. Communications apparatus as claimed in claim 5, 6 or 7 and wherein the length of the waveform sections transmitted during sounds of the unvoiced fricative or similar type is short compared with the interval between such waveform sections.
9. Communication apparatus as claimed in claims 5, 6, 7 or 8 and including means for increasing the interval between transmitted waveform sections during long sounds of the unvoiced fricative or similar type and for setting the duration code to represent the greater length of interval.
10. A digital encoder substantially as herein described including means for identifying sounds of the unvoiced fricative or similar type.
11. A decoder substantially as herein described including means responsive to a duration code.
12. Communications apparatus substantially as herein described.

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB8320964A GB2125259B (en) 1982-08-04 1983-08-03 Digital coding of speech

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB8222514 1982-08-04
GB8320964A GB2125259B (en) 1982-08-04 1983-08-03 Digital coding of speech

Publications (3)

Publication Number Publication Date
GB8320964D0 GB8320964D0 (en) 1983-09-07
GB2125259A GB2125259A (en) 1984-02-29
GB2125259B GB2125259B (en) 1986-10-22

Family

ID=26283516

Family Applications (1)

Application Number Title Priority Date Filing Date
GB8320964A Expired GB2125259B (en) 1982-08-04 1983-08-03 Digital coding of speech

Country Status (1)

Country Link
GB (1) GB2125259B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1282641A (en) * 1969-05-14 1972-07-19 Thomas Patterson Speech encoding and decoding
GB1333880A (en) * 1970-05-21 1973-10-17 Phonplex Corp Terminal system for simultaneous multiple use of single channel
GB1433351A (en) * 1972-04-14 1976-04-28 Licentia Gmbh Method and apparatus for speech transfer over a narrow transmission bandwidth by means of delta modulation
GB2004443A (en) * 1977-08-09 1979-03-28 Center Of Scient & Applied Res Voice codification system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2290683A (en) * 1994-06-20 1996-01-03 Studio Audio And Video Limited Editing recorded material
GB2290683B (en) * 1994-06-20 1999-04-21 Studio Audio And Video Limited Method and means for editing recorded material

Legal Events

Date Code Title Description
732 Registration of transactions, instruments or events in the register (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee