GB2338630A

GB2338630A - Voice decoder reduces buzzing

Info

Publication number: GB2338630A
Application number: GB9813255A
Authority: GB
Inventors: Dominic Sai Fan Chan
Original assignee: Motorola Ltd
Current assignee: Motorola Solutions UK Ltd
Priority date: 1998-06-20
Filing date: 1998-06-20
Publication date: 1999-12-22
Anticipated expiration: 2018-06-20
Also published as: GB9813255D0; GB2338630B

Abstract

The speech communications unit includes a speech decoder for receiving an input signal, the speech decoder being operably coupled to a pitch determiner (68) for determining a pitch of a received input signal and to an amplitude correction function (66) for correcting an amplitude of the received input signal based on the pitch determination. The arrangement is a CELP decoder for reducing buzziness, especially in low-pitch male speech, in mobile radio and telephony.

Description

SPEECH DECODER AND METHOD OF OPERATION

Field of the Invention

This invention relates to decoding in communications systems and more particularly to speech decoding in a mobile communications system.

Background of the Invention

Many voice communications systems, such as the TErrestrial Trunked RAdio (TETRA) system for private mobile radio users, use speech processing units to encode and decode speech patterns. In such voice communications systems, the speech encoder converts the analogue speech pattern into a suitable digital format for transmission, and the speech decoder converts a received digital speech signal into an appropriate analogue speech pattern.

As spectrum for such voice communications systems is a valuable resource, it is desirable to limit the channel bandwidth used, so as to maximise the number of users per frequency band. Hence, the primary objective of speech coding techniques is to reduce, by compression, the capacity occupied by speech signals as far as possible without losing speech fidelity.

Speech coding typically uses speech production modelling techniques to compress pulse code modulation (PCM) speech signals into bit-rates that are suitable for different kinds of bandwidth-limited applications, such as speech communication systems or voice storage systems.

The basic speech production model that is commonly used in speech coding algorithms is shown in FIG. 1. The model in FIG. 1 was used in early linear predictive coding (LPC) based vocoders. The LPC filter models the combined effect of the glottal pulse model, the vocal tract and the lip radiation. For voiced speech, the voiced excitation, which consists of a train of pulses separated by the pitch period T, is used as the input signal to the LPC filter. Alternatively, for unvoiced speech, a Gaussian noise source is used as the LPC filter input excitation.
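As a minimal sketch of the FIG. 1 model (not taken from any particular codec), the Python below generates either a pulse train with period T or Gaussian noise and passes it through an all-pole LPC synthesis filter 1/A(z). The filter coefficients, the gain and the 8 kHz sampling rate used in the example are illustrative assumptions, not values from this document.

```python
import random


def excitation(n_samples, voiced, pitch_period=100, gain=1.0, seed=0):
    """FIG. 1 excitation: a pulse train (voiced) or Gaussian noise (unvoiced)."""
    if voiced:
        # One pulse every `pitch_period` samples, i.e. pulses separated by T.
        return [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def lpc_synthesis(exc, a):
    """All-pole LPC filter 1/A(z), with A(z) = 1 + a[0] z^-1 + ... + a[m-1] z^-m."""
    out = []
    for n, x in enumerate(exc):
        y = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]  # feedback from earlier output samples
        out.append(y)
    return out


# Illustrative only: T = 100 samples (80 Hz at an assumed 8 kHz sampling rate)
# and a stable 2nd-order filter with poles at z = 0.5 and z = 0.4.
voiced_speech = lpc_synthesis(excitation(400, voiced=True, pitch_period=100), [-0.9, 0.2])
unvoiced_speech = lpc_synthesis(excitation(400, voiced=False), [-0.9, 0.2])
```

The switch 14 of FIG. 1 corresponds to the `voiced` flag: only one excitation source feeds the filter at a time.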

The advance of speech coding development led to the introduction of the Analysis-by-Synthesis technique used in CELP (Code Excited Linear Prediction) coders such as ACELP (Algebraic Code Excited Linear Prediction). The improved speech production model, i.e. the synthesis model used in the ACELP case, is shown in FIG. 2.

The excitation in the ACELP case is a weighted combination of the outputs of the innovative codebook and the adaptive codebook. Research papers on CELP-based speech coding typically refer to two codebooks: the "innovative" codebook, which is the basic CELP codebook, and the "adaptive" codebook, from which it is distinguished by name. The innovative codebook in ACELP consists of code-vectors, each containing only a small number of pulses and zero values elsewhere. The periodicity of the excitation, which is needed for voiced speech, is derived from the previous frame's total LPC filter input excitation, selected according to the present frame's pitch lag value.
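The two codebook contributions can be sketched as follows. This is a simplified Python illustration under assumed conventions, not the TETRA or any other standardised ACELP codebook: the pulse positions and signs are arbitrary, and practical coders may use fractional pitch lags and structured pulse tracks, which are omitted here.

```python
def adaptive_vector(past_excitation, lag, length):
    """Adaptive-codebook vector v[n] = u[n - lag], where u is the past total excitation.

    If lag < length, samples produced in this sub-frame are reused, so the
    vector repeats with period `lag` (a common CELP convention, assumed here).
    Requires 1 <= lag <= len(past_excitation).
    """
    u = list(past_excitation)
    v = []
    for _ in range(length):
        sample = u[len(u) - lag]  # u[n - lag] for the current position n
        v.append(sample)
        u.append(sample)
    return v


def innovative_vector(positions, signs, length):
    """Sparse innovative code-vector: a few signed unit pulses, zero elsewhere."""
    c = [0.0] * length
    for pos, sign in zip(positions, signs):
        c[pos] += sign
    return c


print(adaptive_vector([1, 2, 3, 4, 5], lag=2, length=5))  # [4, 5, 4, 5, 4]
print(innovative_vector([1, 6], [1, -1], length=8))        # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, -1.0, 0.0]
```

The adaptive vector carries the periodicity (here a period of 2 samples), while the innovative vector contributes only a handful of non-zero pulses.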

It has been observed that low bit-rate speech codecs using a synthesis model based on that shown in FIG. 1 or FIG. 2 introduce buzziness into the decoded speech synthesis output when the pitch frequency is low. The main cause has been identified as the use of a pulse-like signal in the excitation; see "A Mixed Source Model Speech Compression and Synthesis", by J. Makhoul, R. Viswanathan, R. Schwartz and A. W. F. Huggins, J. Acoust. Soc. Amer., Vol. 64, No. 6, pp. 1577-1581, Dec. 1978; "Improvements in the Classical Model for Better Speech Quality", by S. Maitra, IEEE Proc. ICASSP, pp. 23-27, 1980; and "On Reducing the Buzz in LPC Synthesis", by M. R. Sambur, A. E. Rosenberg, L. R. Rabiner and C. A. McGonegal, IEEE Proc. ICASSP, pp. 401-404, 1977.

Over the years, researchers have tried to address the problem of buzzy speech from vocoders. Most of the research has concentrated on using a non-pulse-like waveform for voiced excitation, either by introducing noise into the pulse train or by using an adaptive glottal-pulse-like waveform.

The problem of buzziness is most noticeable in low-pitch speech signals under headphone or handset listening conditions. Reducing the buzziness effect is especially important in public safety communication systems, where the population of male users is large and the pitch of some male speakers may stay permanently within the region where buzziness occurs, making the speech communication system less comfortable to use.

During extensive testing of ACELP speech quality on speech of various speakers, it was observed that some synthetic speech has a "buzzy" or "rattly" quality. This usually occurs for low pitch male speech, where the pitch is typically less than 150 Hz.

When buzziness occurs, the perceptual distortion is significant, and this may be the reason that ACELP has difficulty coding the speech of some particular low-pitch speakers. Reducing this kind of distortion is therefore necessary. Thus, it is desirable to reduce the aforementioned "buzziness" effect observed when using speech codecs in a mobile radio/telephony environment.

Summary of the Invention

According to a first aspect of the invention, a speech communications unit is provided. The speech communications unit includes a speech decoder for receiving an input signal, where the speech decoder is operably coupled to pitch determining means for determining a pitch of a received input signal and amplitude correction means for correcting an amplitude of the received input signal based on the pitch determination.

In this manner, unwanted effects of constant amplitude speech signals, particularly for low-pitch speech signals, are compensated for by adjustment of the gain based on such pitch determination.

In the preferred embodiment of the invention, the speech decoder decodes pulse excitation coded signals. Preferably, the speech communications unit further includes a selector operably coupled to the pitch determining means for selecting signals of a predetermined pitch, where the amplitude correction means corrects an amplitude of the selected signals of predetermined pitch.

Preferably, the amplitude correction means is a comb filter arrangement having amplitude nulls for varying an amplitude performance of the received input signal, the amplitude nulls being arranged to be coincident with odd harmonics of the received input signal. Alternatively, the nulls can be arranged to be coincident with even harmonics. The pitch determining means is preferably a pitch threshold level for determining whether a pitch of the received input signal is below a predetermined threshold, the pitch threshold level being set to approximately 150 Hz such that low-pitch portions of the received input signal are routed to the amplitude correction means.
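Because a CELP decoder already has the pitch lag, in samples, for each frame, the 150 Hz threshold can be applied directly to that lag. A minimal sketch, assuming an 8 kHz sampling rate (the source does not state one); the function names are hypothetical:

```python
SAMPLE_RATE_HZ = 8000  # assumed narrowband sampling rate; not stated in the source


def pitch_hz(lag_samples, fs_hz=SAMPLE_RATE_HZ):
    """Pitch frequency corresponding to a pitch lag given in samples."""
    return fs_hz / lag_samples


def is_low_pitch(lag_samples, threshold_hz=150.0, fs_hz=SAMPLE_RATE_HZ):
    """True if the pitch is below the threshold, i.e. the signal should be
    routed to the amplitude correction stage."""
    return pitch_hz(lag_samples, fs_hz) < threshold_hz


# At 8 kHz, 150 Hz corresponds to a lag of 8000 / 150, about 53.3 samples,
# so lags of 54 samples or more count as low pitch.
print(is_low_pitch(53), is_low_pitch(54))  # False True
```

Longer lags mean lower pitch, so the comparison could equally be written as a lag threshold of about 53.3 samples.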

In a second aspect of the preferred embodiment of the invention, a method of decoding a speech signal is provided. The method includes the steps of receiving an input signal; determining a pitch of a received input signal; and correcting an amplitude of the received input signal based on the pitch determination. Preferably, signals of a predetermined pitch can be selected, on which amplitude correction is performed. In the preferred embodiment of the invention, a user can select when to operate the improved decoding function in order to reduce the buzziness effect of the perceived speech signal.

A preferred embodiment of the invention will now be described, by way of example only, with reference to the drawings.

Brief Description of the Drawings

FIG. 1 shows a functional model of a basic LPC synthesis model.

FIG. 2 shows a functional model of a basic ACELP synthesis model.

FIG. 3 shows a graphical representation of a low-pitch male speech signal.

FIG. 4 shows a post-processing "buzziness" reduction arrangement according to a preferred embodiment of the invention.

FIG. 5 shows a flow chart for operating the "buzziness" reduction arrangement according to the preferred embodiment of the invention.

Detailed Description of the Drawings

Referring first to FIG. 1, a block diagram of a synthesis functional model of a basic LPC codec is shown. A voiced excitation source 10 provides a pulse train signal, of pitch period T, to a voiced gain element 12. The amplified pulse train signal from voiced gain element 12 is then selectively input, via a switch 14, to a Linear Predictive Coding (LPC) filter 16. For unvoiced speech, an unvoiced excitation source 18 provides a Gaussian noise signal to an unvoiced gain element 20, and the amplified Gaussian noise signal from unvoiced gain element 20 is selectively input, via switch 14, to the LPC filter 16. The output from the LPC filter 16 is synthetic speech.

In this manner, either a series of amplified pulses from the voiced excitation source 10 or an amplified noise signal from the unvoiced excitation source 18 is selected and filtered, the resultant signal being representative of synthetic speech.

Referring next to FIG. 2, a block diagram of a synthesis functional model of a basic ACELP codec is shown. An excitation vector from the "Innovative" codebook 30 is chosen and input to the voice gain element 31. Another excitation vector from the "Adaptive" codebook 32 is chosen according to the present frame pitch lag value T and input to the gain element 33. The outputs of gain elements 31 and 33 are input to a summation device 34. The output of the summation device 34 is input to the LPC filter 35, and is also used to update the "Adaptive" codebook for the next frame's speech synthesis. The output from the LPC filter 35 is the synthetic speech.

In this manner, amplified excitation vectors from the "Innovative" codebook 30 are combined with amplified excitation vectors selected from the "Adaptive" codebook 32, which is updated through a feedback path from the summed excitation. The combined signal is then filtered, the resultant signal from the LPC filter being representative of synthetically generated speech. The particular vectors are chosen to best imitate the speech signal being transmitted or received.
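One sub-frame of the FIG. 2 structure can be sketched in Python as below, assuming the two code-vectors have already been selected. The gain names `g_p` and `g_c` and the list-based filter memory are illustrative assumptions rather than details from this document.

```python
def acelp_subframe(v, c, g_p, g_c, a, exc_history, syn_memory):
    """One sub-frame of the FIG. 2 synthesis model.

    v, c         adaptive and innovative code-vectors of equal length
    g_p, g_c     gains applied by elements 33 and 31 respectively
    a            LPC coefficients of A(z) = 1 + a[0] z^-1 + ...
    exc_history  past total excitation; extended in place (adaptive-codebook update)
    syn_memory   past synthesis-filter outputs; extended in place
    """
    u = [g_p * vn + g_c * cn for vn, cn in zip(v, c)]  # summation device 34
    exc_history.extend(u)                               # feedback into codebook 32
    speech = []
    for x in u:                                         # LPC filter 35, 1/A(z)
        y = x - sum(ak * syn_memory[-k]
                    for k, ak in enumerate(a, start=1) if k <= len(syn_memory))
        syn_memory.append(y)
        speech.append(y)
    return speech


history, memory = [], []
out = acelp_subframe([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                     g_p=0.5, g_c=2.0, a=[-0.5],
                     exc_history=history, syn_memory=memory)
print(out)      # [0.5, 2.25, 1.125, 0.5625]
print(history)  # [0.5, 2.0, 0.0, 0.0]
```

Note that the adaptive codebook is refreshed from the excitation `u`, not from the filtered speech, which matches the feedback path from summation device 34.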

Referring now to FIG. 4, a post-processing "buzziness" reduction arrangement, according to a preferred embodiment of the invention, is shown. The post-processing arrangement includes a complementary switch feature having switches 60 and 62. The arrangement includes two paths for routing the decoded speech, selected by the positions of the switches 60 and 62. The first path 64 routes the decoded speech signal via a comb filter arrangement 66, whilst the second path is a direct link, with no adjustment being performed on the decoded speech signal. The switches are controlled by a pitch determining arrangement, for example a function of a digital signal processor 68. When the pitch of the decoded speech signal is determined to be below a particular threshold, for example 150 Hz, the decoded speech signal is routed through the comb filter arrangement 66. In the preferred embodiment of the invention, the comb filter arrangement is tuned to twice the pitch frequency of the input decoded speech signal (a delay of half the pitch period), so that its peaks fall on the even harmonics.

When the pitch of a sub-frame is less than 150 Hz, the harmonic structure of the decoded speech is modified by a comb filter of the form

$$H(z) = G_c\left(1 + \alpha_c z^{-p/2}\right) \qquad (1)$$

where p is the pitch duration in samples of the present sub-frame, and

$$G_c = \frac{1}{1 + \alpha_c} \qquad (2)$$

is the gain normalisation factor that ensures the filter has unity gain at the even harmonic peaks.
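A minimal Python sketch of the comb filter of equation (1), reading it as H(z) = G_c(1 + α_c z^(-p/2)) with G_c = 1/(1 + α_c). The function name and the in-place history list are hypothetical, and truncating p/2 to an integer delay is an assumption (an odd pitch lag would otherwise need a fractional delay).

```python
import math


def comb_postfilter(subframe, p, alpha, history):
    """Apply H(z) = G_c (1 + alpha z^-(p/2)) to one decoded sub-frame.

    p        pitch lag of the sub-frame in samples
    alpha    coefficient alpha_c (0 = no attenuation, 1 = odd harmonics nulled)
    history  previously decoded input samples; extended in place so the delay
             line spans sub-frame boundaries
    The delay p/2 is truncated to an integer (an assumption of this sketch).
    """
    d = max(1, p // 2)
    g_c = 1.0 / (1.0 + alpha)  # eq. (2): unity gain at the even harmonics
    out = []
    for x in subframe:
        delayed = history[-d] if len(history) >= d else 0.0  # x[n - d]
        out.append(g_c * (x + alpha * delayed))
        history.append(x)
    return out


# Steady-state gain on the 1st (odd) and 2nd (even) harmonics of a 100-sample pitch.
for k in (1, 2):
    tone = [math.cos(2 * math.pi * k * n / 100) for n in range(400)]
    y = comb_postfilter(tone, p=100, alpha=0.25, history=[])
    print(k, round(max(abs(s) for s in y[100:]), 3))  # prints "1 0.6" then "2 1.0"
```

The example confirms the intended behaviour: the odd harmonic is reduced while the even harmonic passes unchanged.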

In the preferred embodiment of the invention, this filter attenuates the odd pitch harmonics of the decoded speech signal. The level of attenuation depends on the value of $\alpha_c$: a value of 0 provides no attenuation and a value of 1.0 provides maximum attenuation, thereby effectively doubling the pitch of the speech signal.
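Evaluating equation (1) at the pitch harmonics shows how $\alpha_c$ sets the attenuation. Take the k-th harmonic at $\omega_k = 2\pi k/p$ radians per sample and treat the delay p/2 as exact (an idealisation):

$$
H\!\left(e^{j\omega_k}\right) = G_c\left(1 + \alpha_c e^{-j\omega_k p/2}\right) = \frac{1 + \alpha_c e^{-j\pi k}}{1 + \alpha_c} = \frac{1 + (-1)^k \alpha_c}{1 + \alpha_c}
$$

Even harmonics therefore pass with unity gain, and odd harmonics are scaled by $(1 - \alpha_c)/(1 + \alpha_c)$. For example, $\alpha_c = 0.25$ gives $0.75/1.25 = 0.6$, about -4.4 dB, while $\alpha_c = 1$ removes the odd harmonics entirely, leaving only multiples of twice the original pitch frequency.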

Based on informal subjective listening tests, this filter is able to reduce the "buzzy" distortion slightly and also enhances the "smoothness" of the decoded speech. The attenuation of some low-frequency components is perceptible but not significant. Based on informal listening tests, the optimal setting of $\alpha_c$ was found to be 0.25, with a random jitter within the range ±0.1 for each sub-frame.
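Drawing the coefficient once per sub-frame might look like the sketch below. The source gives only the nominal value and the jitter range, so the uniform distribution, the function names and the fixed seed are assumptions made for illustration.

```python
import random

ALPHA_NOMINAL = 0.25  # optimal setting reported from informal listening tests
ALPHA_JITTER = 0.1    # per-sub-frame jitter range (+/-)


def subframe_alpha(rng):
    """Draw alpha_c for one sub-frame: 0.25 plus a random offset in [-0.1, 0.1].

    The source gives only the range; a uniform distribution is assumed here."""
    return ALPHA_NOMINAL + rng.uniform(-ALPHA_JITTER, ALPHA_JITTER)


def odd_harmonic_gain(alpha):
    """Gain of the comb filter on odd harmonics, (1 - alpha) / (1 + alpha)."""
    return (1.0 - alpha) / (1.0 + alpha)


rng = random.Random(1998)  # fixed seed only to make the example repeatable
for alpha in (subframe_alpha(rng) for _ in range(4)):
    print(f"alpha_c = {alpha:.3f}, odd-harmonic gain = {odd_harmonic_gain(alpha):.3f}")
```

With this range, the odd-harmonic gain varies between about 0.48 ($\alpha_c = 0.35$) and 0.74 ($\alpha_c = 0.15$) from one sub-frame to the next.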

Advantageously, correcting the amplitudes of alternate harmonics of the received input signal disrupts the uniform amplitude of adjacent harmonics, thereby reducing the perceived buzziness of the received signal and enhancing the "smoothness" of any received low-pitch speech signal. The preferred embodiment is implemented as a post-processing filter and can therefore easily be introduced into existing codec arrangements to provide enhanced quality for received low-pitch speech signals.

It is within the contemplation of the invention that alternative arrangements to the specific, preferred embodiment described above would be consistent with the inventive concept described herein. In particular, any amplitude adjustment means could be used instead of the comb filter arrangement described. Furthermore, the selected pitch threshold may well vary depending upon the type of speech signal to be synthetically generated. In addition, the inventive concept described herein is not limited to pulse-excited codec arrangements, nor to codebook-type approaches.

As an alternative to detecting the pitch and comparing it with a threshold level, the amplitude adjustment may be initiated by a user. It is envisaged that a user would initiate the fidelity improvement method of adjusting the amplitude of certain speech harmonics when the perceived speech quality is poor. Such a dynamic, user-controlled technique is beneficial in a mobile communications environment, as it enables the user to decide when the improvement should or could be made, depending upon the user's perception of the received speech.

Referring now to FIG. 3, a graphical representation of a low-pitch male speech signal is shown. The first graph 80 in FIG. 3 shows the amplitude 82 versus frequency 84 of a low-pitch male voice. The low-pitch period (T2) 86 of the voice shows some overlap of the speech content, the constant amplitude of which causes the buzziness effect in pulse-excited codecs. Superimposed upon the low-pitch speech signal is a second graph 88 showing the spectral output from the comb filter of FIG. 4. The comb filter in FIG. 4 is arranged with a "null" pitch period of (2T2), such that alternate harmonics of the low-pitch speech signal are attenuated when the low-pitch speech is fed through the comb filter.

In this manner, consecutive harmonics of the low-pitch signal have different amplitude levels after the comb filtering process. This difference in harmonic levels contributes to a reduction in the buzziness effect in voice transmissions using a pulse-excited codec.

Referring now to FIG. 5, a flow chart for operating the "buzziness" reduction arrangement in the decoding of a speech signal, according to the preferred embodiment of the invention, is shown. The flow chart includes receiving an input speech signal, as shown in step 100, and decoding the speech signal to determine the speech content, as in step 102. The pitch of the received input signal is then determined, as shown in step 104, and a decision is made, in step 106, as to what course of action to take with the received, decoded speech signal. If, in the preferred embodiment of the invention, the pitch of the received input signal is determined to be less than a 150 Hz threshold level, the received, decoded speech signal is input to a post-processing harmonic amplitude adjustment function, as shown in step 108. Alternatively, the user may initiate the post-processing harmonic amplitude adjustment function upon hearing poor-quality low-pitch speech, as shown in step 105. Preferably, the harmonic amplitude adjustment function is a comb filter arrangement. In this case, the amplitude of the received input signal is corrected based on the pitch determination. If the pitch is determined to be greater than the threshold value, no such harmonic amplitude adjustment is performed on the received input signal, as shown in step 110.
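The decision logic of FIG. 5 can be sketched as a small Python function. The amplitude adjustment is passed in as a parameter so that any corrector, such as a comb filter, can be plugged in. The function and parameter names and the 8 kHz sampling rate are illustrative assumptions, and this sketch does not maintain a corrector's delay line during bypassed sub-frames.

```python
def postprocess_subframe(decoded, pitch_lag, correct, user_request=False,
                         fs_hz=8000, threshold_hz=150.0):
    """Decision part of FIG. 5 for one decoded sub-frame.

    decoded       decoded speech samples (output of steps 100-102)
    pitch_lag     pitch lag of the sub-frame in samples
    correct       harmonic amplitude adjustment, called as correct(samples, pitch_lag)
    user_request  True when the user has asked for the adjustment (step 105)
    fs_hz         sampling rate; 8 kHz is an assumption, not from the source
    """
    pitch_hz = fs_hz / pitch_lag                 # step 104: determine pitch
    if pitch_hz < threshold_hz or user_request:  # step 106, or user override (step 105)
        return correct(decoded, pitch_lag)       # step 108: adjust harmonic amplitudes
    return list(decoded)                         # step 110: leave unchanged


def halve(samples, lag):
    """Stand-in corrector used only for this example."""
    return [0.5 * s for s in samples]


print(postprocess_subframe([1.0, -1.0], 100, halve))                    # 80 Hz: [0.5, -0.5]
print(postprocess_subframe([1.0, -1.0], 40, halve))                     # 200 Hz: [1.0, -1.0]
print(postprocess_subframe([1.0, -1.0], 40, halve, user_request=True))  # [0.5, -0.5]
```

Injecting the corrector keeps the pitch decision separate from the filter itself, mirroring the switch-and-path structure of the post-processing arrangement.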

Thus, a method for reducing the buzziness effect in speech codecs is provided.

Claims

1. A speech communications unit comprising a speech decoder for receiving an input signal, the speech decoder being operably coupled to pitch determining means for determining a pitch of a received input signal and amplitude correction means for correcting an amplitude of the received input signal based on the pitch determination.

2. A speech communications unit according to claim 1 wherein the speech decoder decodes pulse excitation coded signals.

3. A speech communications unit according to claims 1 or 2 wherein the speech communications unit further comprises a selector operably coupled to the pitch determining means for selecting signals of a predetermined pitch and the amplitude correction means corrects an amplitude of the selected signals of predetermined pitch.

4. A speech communications unit according to claims 1 or 2 wherein the pitch determining means is performed by the user determining whether the perceived speech signal is of an unacceptable quality level and initiating the amplitude correction means to correct an amplitude of the selected signals when the speech quality is unacceptable.

5. A speech communications unit according to any of the preceding claims wherein the amplitude correction means comprises a comb filter arrangement having amplitude nulls for varying an amplitude performance of the received input signal.

6. A speech communications unit according to claim 5 wherein the amplitude nulls of the comb filter arrangement are arranged to be substantially coincident with alternate harmonics of the received input signal.

7. A speech communications unit according to claim 6 wherein the amplitude nulls are arranged to be substantially coincident with odd alternate harmonics of the received input signal.

8. A speech communications unit according to any of the preceding claims wherein the pitch determining means is a pitch threshold level for determining whether a pitch of the received input signal is below a predetermined threshold.

9. A speech communications unit according to claim 8, wherein the pitch threshold level is approximately 150 Hz such that low pitch portions of the received input signal are introduced into the amplitude correction means.

10. A method of decoding a speech signal comprising the steps of:

receiving an input signal; determining a pitch of a received input signal; and correcting an amplitude of the received input signal based on the pitch determination.

11. A method according to claim 10 further comprising the step of selecting signals of a predetermined pitch on which to perform the amplitude correction.

12. A method according to claims 10 or 11, wherein the step of determining a pitch includes a user determining whether the perceived speech signal is of an unacceptable quality level and initiating the amplitude correction means to correct an amplitude of the selected signals when the speech quality is unacceptable.

13. A method according to claims 10 to 12 wherein the amplitude correction is arranged to be substantially coincident with alternate harmonics of the received input signal.

14. A method according to any of preceding claims 10 to 13 wherein the determining of pitch determines whether the pitch is below a predetermined threshold level for determining whether a portion of the received input signal having a pitch below the threshold level is to have its amplitude corrected.

15. A method for operating a speech decoder as substantially described with reference to, and/or as illustrated by, FIG. 4 of the drawings.