CA1321025C - Speech signal coding/decoding system - Google Patents

Speech signal coding/decoding system

Info

Publication number
CA1321025C
CA1321025C CA000581746A CA581746A CA1321025C CA 1321025 C CA1321025 C CA 1321025C CA 000581746 A CA000581746 A CA 000581746A CA 581746 A CA581746 A CA 581746A CA 1321025 C CA1321025 C CA 1321025C
Authority
CA
Canada
Prior art keywords
filter
term predictive
output
shaping
speech signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA000581746A
Other languages
French (fr)
Inventor
Takahiro Nomura
Yohtaro Yatsuzuka
Shigeru Iizuka
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KDDI Corp
Original Assignee
Kokusai Denshin Denwa KK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kokusai Denshin Denwa KK

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

TITLE OF THE INVENTION
A Speech Signal Coding/Decoding System

ABSTRACT OF THE DISCLOSURE

A digital input speech signal is encoded by an adaptive quantizer (16), which quantizes the predicted residual signal, that is, the difference between the digital input speech signal and the sum of the prediction signals provided by predictors (6, 10) and the shaped quantization noise provided by a noise shaping filter (19). An inverse quantizer (18), to which the encoded speech signal is supplied, is provided for noise shaping and local decoding. The noise shaping filter (19) around the adaptive quantizer makes the spectrum of the quantization noise similar to that of the original digital input speech signal by using the shaping factors.
The shaping factors ($r_{nl}$, $r_{ns}$) set to the noise shaping filter (19) are adaptively switched to weight it depending upon the prediction gain (e.g., the ratio of the input speech signal to the predicted residual signal, or the prediction coefficients). On the decoding side there are an inverse quantizer (36), predictors (42, 43), and a post noise shaping filter (44). The shaping factors set to the post noise shaping filter are similarly switched to weight it depending upon the prediction gain.

Description

TITLE OF THE INVENTION
A Speech Signal Coding/Decoding System

BACKGROUND OF THE INVENTION
The present invention relates to a speech signal coding/decoding system, and in particular to a system which codes or decodes a digital speech signal at a low bit rate.
A communication system with severe limitations on frequency band and/or transmit power, such as digital marine satellite communication or digital business satellite communication using SCPC (single channel per carrier), requires a speech coding/decoding system with a low bit rate, excellent speech quality, and a low error rate.
Conventional coding/decoding systems include the adaptive prediction coding system (APC), which has a predictor that calculates the prediction coefficients for every frame and an adaptive quantizer that codes the predicted residual signal, which is free from correlation between sampled values, and the multi-pulse drive linear prediction coding system (MPEC), which excites an LPC synthesis filter with a plurality of pulse sources.
The prior adaptive prediction coding system (APC) is now described as an example.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, features and attendant advantages of the present invention will be appreciated as the same become better understood by means of the following description and accompanying drawings, wherein:
Fig. 1A is a block diagram of a prior speech signal coder,
Fig. 1B is a block diagram of a prior speech signal decoder,
Fig. 2 is a block diagram of a noise shaping filter for a prior coder,
Fig. 3A is a block diagram of a post noise shaping filter for a prior speech signal decoder,
Fig. 3B is a block diagram of another post noise shaping filter for a prior decoder,
Fig. 4 is a block diagram of a noise shaping filter for a coder according to the present invention, and
Fig. 5 is a block diagram of a post noise shaping filter for a decoder according to the present invention.
Fig. 1A is a block diagram of a prior coder for an adaptive prediction coding system, which is shown in U.K. Patent No. 2150377. A digital input speech signal $S_j$ is fed to the LPC analyzer 2 and the short term predictor 6 through the input terminal 1. The LPC analyzer 2 carries out the short term spectrum analysis for every frame according to the digital input speech signal, and the resultant LPC parameters are coded in the LPC parameter coder 3. The coded LPC parameters are transmitted to a receive side through a multiplex circuit 30. The LPC parameter decoder 4 decodes the output of the LPC parameter coder 3, and the LPC parameter/short term prediction parameter converter 5 provides the short term prediction parameter, which is applied to the short term predictor 6, the noise shaping filter 19, and the local decoding short term predictor 24.
The subtractor 11 subtracts the output of the short term predictor 6 from the digital input speech signal $S_j$ and provides the short term predicted residual signal $\Delta S_j$, which is free from correlation between adjacent samples of the speech signal. The short term predicted residual signal $\Delta S_j$ is fed to the pitch analyzer 7 and the long term predictor 10. The pitch analyzer 7 carries out the pitch analysis according to the short term predicted residual signal $\Delta S_j$ and provides the pitch period and the pitch parameter, which are coded by the pitch parameter coder 8 and transmitted to a receive side through the multiplex circuit 30. The pitch parameter decoder 9 decodes the pitch period and the pitch parameter which are the output of the coder 8, and the output of the decoder 9 is set to the long term predictor 10, the noise shaping filter 19 and the local decoding long term predictor 23.
The subtractor 12 subtracts the output of the long term predictor 10, which uses the pitch period and the pitch parameter, from the short term predicted residual signal $\Delta S_j$, and provides the long term predicted residual signal, which is free from the correlation of repetitive waveforms caused by the pitch of the speech signal and ideally is white noise. The subtractor 17 subtracts the output of the noise shaping filter 19 from the long term predicted residual signal which is the output of the subtractor 12, and provides the final predicted residual signal to the adaptive quantizer 16. The quantizer 16 performs the quantization and the coding of the final predicted residual signal and transmits the coded signal to the receive side through the multiplex circuit 30.
The coded final predicted residual signal which is the output of the quantizer 16 is fed to the inverse quantizer 18 for decoding and inverse quantizing. The output of said inverse quantizer 18 is fed to the subtractor 20 and the adder 21. The subtractor 20 subtracts the final predicted residual signal which is the input of the adaptive quantizer 16 from said quantized final predicted residual signal which is the output of said inverse quantizer 18, and provides the quantization noise, which is fed to the noise shaping filter 19.
In order to update the quantization step size in every sub-frame, the RMS calculation circuit 13 calculates the RMS (root mean square) of said long term predicted residual signal. The RMS coder 14 codes the output of said RMS calculator 13, and stores the coded output level as a reference level along with the adjacent levels derived from it. The output of the RMS coder 14 is decoded in the RMS decoder 15. The step size of the adaptive quantizer 16 is obtained by multiplying the quantized RMS value corresponding to said reference level, taken as the reference RMS value, by the predetermined fundamental step size.
On the other hand, the adder 21 adds the quantized final predicted residual signal which is the output of the inverse quantizer 18 to the output of the local decoding long term predictor 23. The output of said adder 21 is fed to the long term predictor 23 and the adder 22, which also receives the output of the local decoding short term predictor 24. The output of the adder 22 is fed to the local decoding short term predictor 24.
The local decoded digital input speech signal $\hat{S}_j$ is obtained through the above process at the terminal 25.
The subtractor 26 provides the difference between said local decoded digital input speech signal $\hat{S}_j$ and the original digital input speech signal $S_j$. The minimum error power detector 27 calculates the power of the error, which is the output of the subtractor 26, over the sub-frame period. The same operation is carried out for all the stored fundamental step sizes and the adjacent levels. The RMS step size selector 28 selects the coded RMS level and the fundamental step size which provide the minimum error power. The selected step size is coded in the step size coder 29.
The output of the step size coder 29 and the selected coded RMS level are transmitted to the receive side through the multiplexer 30.
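The step size search described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the quantizer is a plain mid-rise uniform quantizer, the error power is measured directly on the residual rather than on the locally decoded speech, and the names `quantize` and `select_step` are hypothetical, not taken from the patent.

```python
import numpy as np

def quantize(residual, step, levels=8):
    """Mid-rise uniform quantizer with `levels` output levels (assumed form)."""
    half = levels // 2
    idx = np.clip(np.floor(residual / step), -half, half - 1)
    return (idx + 0.5) * step

def select_step(residual, rms_candidates, fundamental_steps, levels=8):
    """Try every (RMS level, fundamental step size) pair and keep the pair
    whose step size = RMS * fundamental step gives the minimum error power."""
    best = None
    for rms in rms_candidates:
        for base in fundamental_steps:
            step = rms * base
            err = float(np.mean((residual - quantize(residual, step, levels)) ** 2))
            if best is None or err < best[0]:
                best = (err, rms, base)
    return best  # (error power, selected RMS level, selected fundamental step)

residual = np.array([0.5, -0.5, 1.5, -1.5])
err, rms, base = select_step(residual, [1.0, 2.0], [0.5, 1.0])
```

In the coder of Fig. 1A the error is taken between the locally decoded speech and the input speech, so a faithful version would run the local decoder inside the loop.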
Fig. 1B shows a block diagram of a decoder which is used in a prior adaptive prediction coding system on a receive side.
The input signal at the decoder input terminal 32 is separated in the demultiplexer 33 into the final residual signal (a), an RMS value (b), a step size (c), an LPC parameter (d), and a pitch period/pitch parameter (e). They are fed to the adaptive inverse quantizer 36, the RMS decoder 35, the step size decoder 34, the LPC parameter decoder 38, and the pitch parameter decoder 37, respectively.
The RMS value decoded by the RMS value decoder 35, and the fundamental step size obtained in the step size decoder 34 are set to the adaptive inverse quantizer 36.
The inverse quantizer 36 inverse quantizes the received final predicted residual signal, and provides the quantized final predicted residual signal.
The short term prediction parameter obtained in the LPC parameter decoder 38 and the LPC parameter/short term prediction parameter converter 39 is set to the short term predictor 43, which is one of the synthesis filters, and to the post noise shaping filter 44. Furthermore, the pitch period and the pitch parameter obtained in the pitch parameter decoder 37 are set to the long term predictor 42, which is the other element of the synthesis filters.
The adder 40 adds the output of the adaptive inverse quantizer 36 to the output of the long term predictor 42, and the sum is fed to the long term predictor 42. The adder 41 adds said sum of the adder 40 to the output of the short term predictor 43, and provides the reproduced speech signal. The output of the adder 41 is fed to the short term predictor 43, and the post noise shaping filter 44 which shapes the quantization noise. The output of said adder 41 is further fed to the level adjuster 45, which adjusts the level of the output signal by comparing the level of the input with that of the output of the post noise shaping filter 44.
The noise shaping filter 19 in the coder and the post noise shaping filter 44 in the decoder are now described.
Fig. 2 shows a block diagram of the prior noise shaping filter 19 in the coder. The output of the LPC parameter/short term prediction parameter converter 5 is set to the short term predictor 49, and the pitch parameter and the pitch period which are the outputs of the pitch parameter decoder 9 are set to the long term predictor 47. The quantization noise which is the output of the subtractor 20 is fed to the long term predictor 47. The subtractor 48 provides the difference between the input of the long term predictor 47 (quantization noise) and the output of the long term predictor 47. The output of the subtractor 48 is fed to the short term predictor 49. The adder 50 adds the output of the short term predictor 49 to the output of the long term predictor 47, and the output of the adder 50 is fed to the subtractor 17 as the output of the noise shaping filter 19.
The transfer function F'(z) of the noise shaping filter 19 is as follows.

$$F'(z) = r_{nl} P_l(z) + \left[1 - r_{nl} P_l(z)\right] P_s\!\left(\frac{z}{r_s r_{ns}}\right) \qquad (1)$$

where $P_s(z)$ and $P_l(z)$ are the transfer functions of the short term predictor 6 and the long term predictor 10, respectively, and are given for instance by the equations (2) and (3), respectively, described later. $r_s$ is the leakage, and $r_{nl}$ and $r_{ns}$ are the noise shaping factors of the long term predictor and the short term predictor, respectively, each satisfying $0 \le r_s, r_{nl}, r_{ns} \le 1$. The values of $r_{nl}$ and $r_{ns}$ are fixed in a prior noise shaping filter.
The transfer function $P_s(z)$ of the short term predictor 6 is given below.

$$P_s(z) = \sum_{i=1}^{N_s} a_i z^{-i} \qquad (2)$$

where $a_i$ is the short term prediction parameter and $N_s$ is the number of taps of the short term predictor. The value $a_i$ is calculated in every frame in the LPC analyzer 2 and the LPC parameter/short term prediction parameter converter 5. The value $a_i$ varies adaptively in every frame depending upon the change of the spectrum of the input signal.
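Since $P_s(z/\gamma) = \sum_i a_i \gamma^i z^{-i}$, applying a leakage or shaping factor to the short term predictor amounts to scaling the i-th coefficient by $\gamma^i$. A minimal Python sketch of this weighting follows; the function names and the example coefficients are hypothetical and chosen only for illustration.

```python
def weight_coefficients(a, gamma):
    """Coefficients of P_s(z/gamma): a_i * gamma**i for i = 1..N_s."""
    return [a_i * gamma ** i for i, a_i in enumerate(a, start=1)]

def short_term_predict(a, history):
    """Prediction sum_i a_i * s[n-i]; history[0] holds s[n-1], history[1] holds s[n-2], ..."""
    return sum(a_i * s for a_i, s in zip(a, history))

weighted = weight_coefficients([1.0, -0.5], 0.5)      # two-tap example
prediction = short_term_predict(weighted, [2.0, 4.0])
```

With $\gamma = r_s r_{ns}$ this gives the coefficients used by the short term predictor inside the noise shaping filter of equation (1).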
The transfer function of the long term predictor 10 is defined by a similar equation, and the transfer function $P_l(z)$ in the case of a one tap predictor is as follows.

$$P_l(z) = b_1 z^{-P_p} \qquad (3)$$

where $b_1$ is the pitch parameter and $P_p$ is the pitch period. The values $b_1$ and $P_p$ are calculated in every frame in the pitch analyzer 7, and follow adaptively the change of the periodicity of the input signal.
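The one tap long term predictor of equation (3) simply scales the sample one pitch period earlier. The following Python sketch shows this; the function name and the toy periodic signal are illustrative assumptions, and samples before the start of the buffer are taken as zero.

```python
def long_term_predict(signal, b1, pitch_period):
    """Return b1 * signal[n - pitch_period] for each n (zero before the start)."""
    out = [0.0] * len(signal)
    for n in range(pitch_period, len(signal)):
        out[n] = b1 * signal[n - pitch_period]
    return out

signal = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]          # exactly periodic, period 3
predicted = long_term_predict(signal, 1.0, 3)
residual = [s - p for s, p in zip(signal, predicted)]
```

For a perfectly periodic input and $b_1 = 1$, the residual vanishes after the first pitch period, which is the idealized case of a high long term prediction gain.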
Figs. 3A and 3B show block diagrams of the prior post noise shaping filter 44 in the decoder.
In the prior art, only a short term post noise shaping filter which has the weight of the short term prediction parameter in the equation (2) is used.
Fig. 3A shows a post noise shaping filter composed of merely a pole filter. The short term prediction parameter obtained in the LPC parameter/short term prediction parameter converter 39 is set to the short term predictor 52. The adder 51 adds the reproduced speech signal from the adder 41 to the output of the short term predictor 52, and the sum of the adder 51 is fed to the short term predictor 52 and the level adjuster 45. The transfer function $F'_p(z)$ of the post noise shaping filter including the level adjuster 45 is shown below.

$$F'_p(z) = \frac{G_0}{1 - P_s\!\left(\dfrac{z}{r_s r_{ps}}\right)} \qquad (4)$$

where $G_0$ is a gain control parameter, and $r_{ps}$ is a shaping factor satisfying $0 \le r_{ps} < 1$.
Fig. 3B shows another post noise shaping filter which has a zero filter together with the structure of Fig. 3A.
The short term prediction parameter obtained in the LPC parameter/short term prediction parameter converter 39 is set to the pole filter 54 and the zero filter 55 of the short term predictor. The adder 53 adds the reproduced speech signal from the adder 41 to the output of the pole filter 54, and the sum is fed to the pole filter 54 and the zero filter 55. The subtractor 56 subtracts the output of the zero filter 55 from the output of the adder 53, and the difference is fed to the level adjuster 45.
The transfer function $F'_{po}(z)$ of the post noise shaping filter of Fig. 3B including the level adjuster 45 is shown below.

$$F'_{po}(z) = G_0 \, \frac{1 - P_s\!\left(\dfrac{z}{r_s r_{psz}}\right)}{1 - P_s\!\left(\dfrac{z}{r_s r_{psp}}\right)} \qquad (5)$$

where $G_0$ is a gain control parameter, and $r_{psz}$ and $r_{psp}$ are the shaping factors of the zero and pole filters, respectively, satisfying $0 < r_{psz} \le 1$ and $0 \le r_{psp} < 1$.

The noise shaping filter 19 in a prior coder is based upon a prediction filter which shapes the spectrum of the quantization noise to be similar to that of a speech signal, and masks the noise by the speech signal so that audible speech quality is improved. It is effective in particular in reducing the influence of quantization noise which lies far from the formant frequencies (in the valleys of the spectrum).
However, it should be appreciated that the spectrum of a speech signal fluctuates in time, and thus has features depending upon whether the sound is voiced or non-voiced. A prior noise shaping filter does not depend on the features of the speech signal, and merely applies fixed shaping factors.
Therefore, when the shaping factors are the best for non-voiced sound, the voiced sound is distorted or not clear. On the other hand, when the shaping factors are the best for voiced sound, the filter does not noise-shape satisfactorily for non-voiced speech. Therefore, prior fixed shaping factors cannot provide excellent speech quality for both voiced sound and non-voiced sound.
Further, the post noise shaping filter 44 in a prior decoder consists of only a short term predictor. It emphasizes the speech energy in the vicinity of the formant frequencies (at the peaks of the spectrum), that is, it widens the difference between the level of speech at the peaks and that of noise in the valleys. That is why speech quality is improved by the post noise shaping filter in the frequency domain. A prior post noise shaping filter also applies a fixed weight to the short term prediction filter without considering the features of the spectrum of the speech signal. So, strong noise shaping which is suitable for non-voiced sound would produce undesirable clicks or distortion for voiced sound. On the other hand, noise shaping suitable for voiced sound is not satisfactory for non-voiced sound. Therefore, the post noise shaping filter with fixed shaping factors cannot provide satisfactory speech quality for both voiced sound and non-voiced sound.
Also, on the transmit side, a prior MPEC system has a weighting filter which determines the amplitude and location of an excitation pulse so that the power of the difference between the input speech signal and the reproduced speech signal from a synthesis filter becomes minimum. The weighting filter also has fixed weighting coefficients.
Therefore, for the same reason as above, it is not possible to obtain satisfactory speech quality for both voiced sound and non-voiced sound.
SUMMARY OF THE INVENTION
In accordance with the invention there is a speech coding and decoding system providing a coding side and a decoding side. In the coding side a predictor provides a prediction signal of a digital input speech signal based upon a prediction parameter which is output by a prediction parameter means. A quantizer quantizes a final residual signal input thereto and outputs a coded final residual signal, the final residual signal being a function of the prediction signal, the digital input speech signal, and a shaped quantization noise. An inverse quantizer is provided for inverse quantization of the coded final residual signal of the quantizer, the inverse quantizer outputting a quantized final residual signal. A subtractor provides quantization noise, the quantization noise being a difference between the final residual signal and the quantized final residual signal of the inverse quantizer. A noise shaping filter shapes a spectrum of the quantization noise similar to a spectrum envelope of the digital input speech signal, the shaping of the spectrum based upon first shaping factors, the noise shaping filter outputting the shaped quantization noise. A multiplexer multiplexes the coded final residual signal from the quantizer and other information determined in the coding side for sending to a decoding side, the other information including at least the prediction parameter. The decoding side includes a demultiplexer for separating the coded final residual signal and the other information including the prediction parameter from the coding side. An inverse quantizer is provided for inverse quantization and decoding of the coded final residual signal from the demultiplexer, the inverse quantizer outputting a quantized final predicted residual signal. A synthesis filter reproduces the digital input speech signal by adding the quantized final predicted residual signal of the inverse quantizer and a prediction signal which is based upon the prediction parameter from the demultiplexer. A post noise shaping filter shapes a spectrum of a reproduced digital speech signal using second shaping factors to reduce the effect of the quantization noise on the reproduced digital speech signal. The first and second shaping factors of the noise shaping filter and the post noise shaping filter vary over time with changes in the spectrum envelope in the digital input speech signal wherein the shaping factors for non-voiced sound will be larger than the shaping factors for voiced sound.

DESCRIPTION OF THE PREFERRED EMBODIMENTS
Now, the embodiments of the present invention, in particular a noise shaping filter in a coder and a post noise shaping filter in a decoder, are described.
Fig. 4 shows a block diagram of a noise shaping filter according to the present invention. The shaping factor selector 66 receives the digital input signal from the coder input 1, the short term predicted residual signal from the subtractor 11, and the long term predicted residual signal from the subtractor 12, and evaluates the prediction gain by using those input signals. Then, the selector 66 adaptively weights the short term prediction parameter from the LPC parameter/short term prediction parameter converter 5 and the pitch parameter from the pitch parameter decoder 9 by using the result of the evaluation, and these weighted parameters are set to the short term predictive pole filter 62, the short term predictive zero filter 63, the long term predictive pole filter 58, and the long term predictive zero filter 59. The adder 57 adds the quantization noise from the subtractor 20 and the output of the long term predictive pole filter 58, and the sum is fed to the long term predictive pole filter 58 and the long term predictive zero filter 59. The subtractor 60 subtracts the output of the long term predictive zero filter 59 from the output of the adder 57, and the difference which is the output of the subtractor 60 is fed to the adder 61. The adder 61 adds the output of the subtractor 60 to the output of the short term predictive pole filter 62. The sum which is the output of the adder 61 is fed to the short term predictive pole filter 62 and the short term predictive zero filter 63. The subtractor 64 subtracts the output of the short term predictive zero filter 63 from the output of the adder 61. The subtractor 65 subtracts the output of the subtractor 64 from the quantization noise which is the input of the noise shaping filter 19, and the difference which is the output of the subtractor 65 is fed to the subtractor 17 (Fig. 1A) as the output of the noise shaping filter 19.
The transfer function F(z) of the noise shaping filter of Fig. 4 is shown as follows.

$$F(z) = 1 - \frac{1 - P_s\!\left(\dfrac{z}{r_s}\right)}{1 - P_s\!\left(\dfrac{z}{r_s r_{ns}}\right)} \cdot \frac{1 - P_l(z)}{1 - r_{nl} P_l(z)} \qquad (6)$$

The noise shaping filter 19 is composed of the long term predictive pole filter 58, the long term predictive zero filter 59, the short term predictive pole filter 62 and the short term predictive zero filter 63 so that the equation (6) is satisfied. For instance, the positions of the long term predictive pole filter 58 and the long term predictive zero filter 59, and/or the positions of the short term predictive pole filter 62 and the short term predictive zero filter 63, may be the opposite of those in Fig. 4 as long as the equation (6) is satisfied. Further, separate shaping factor selectors for the long term predictive filters (58, 59) and the short term predictive filters (62, 63) may be installed.
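The signal flow of Fig. 4 can be followed sample by sample. The Python sketch below is an illustration only: it assumes a one tap long term predictor $b_1 z^{-P_p}$, zero initial filter states, and a hypothetical function name; the comments map each line to the numbered blocks of Fig. 4.

```python
def noise_shaping_filter(noise, a, b1, pitch_period, r_s, r_ns, r_nl):
    """Sample-by-sample sketch of the Fig. 4 structure (zero initial state)."""
    pole = [a_i * (r_s * r_ns) ** i for i, a_i in enumerate(a, start=1)]  # filter 62
    zero = [a_i * r_s ** i for i, a_i in enumerate(a, start=1)]           # filter 63
    u, w, out = [], [], []
    for n, e in enumerate(noise):
        u_past = u[n - pitch_period] if n >= pitch_period else 0.0
        u_n = e + r_nl * b1 * u_past                   # adder 57 with pole filter 58
        v_n = u_n - b1 * u_past                        # subtractor 60 with zero filter 59
        w_past = [w[n - i] if n >= i else 0.0 for i in range(1, len(a) + 1)]
        w_n = v_n + sum(c * x for c, x in zip(pole, w_past))   # adder 61
        y_n = w_n - sum(c * x for c, x in zip(zero, w_past))   # subtractor 64
        u.append(u_n)
        w.append(w_n)
        out.append(e - y_n)                            # subtractor 65
    return out

impulse = [1.0, 0.0, 0.0, 0.0]
flat = noise_shaping_filter(impulse, [0.5], 0.5, 2, 1.0, 1.0, 1.0)
full = noise_shaping_filter(impulse, [0.5], 0.5, 2, 1.0, 0.0, 0.0)
```

With both shaping factors equal to 1 the pole and zero filters cancel and the output is zero, while with both factors equal to 0 the impulse response is that of $1 - [1 - P_s(z/r_s)][1 - P_l(z)]$, here $0.5z^{-1} + 0.5z^{-2} - 0.25z^{-3}$.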
Generally speaking, voiced sound has a clear spectrum envelope, and in particular, a nasal sound and a word tail are close to a sinusoidal wave. Therefore, they can be predicted well, that is, the short term prediction gain is high. Further, since voiced sound has a clear pitch structure, the long term (pitch) prediction gain is high, and the quantization noise is small.
On the other hand, non-voiced sound, like a fricative sound, has a spectrum close to random noise, and has no clear pitch structure. So, it cannot be predicted well, that is, the long term prediction gain and the short term prediction gain are low, and the quantization noise is large.
Therefore, the quantization noise must be shaped in a manner adequate to the features of the speech by measuring the prediction gain. For example, the prediction gain may be evaluated by using $S_k/R_k$ and/or $S_k/P_k$, where $S_k$ is the power of the digital input speech signal, $R_k$ is the power of the short term predicted residual signal, and $P_k$ is the power of the long term predicted residual signal. $S_k/R_k$ is the power ratio of the speech signal before short term prediction to the signal after it, and $S_k/P_k$ is the power ratio of the speech signal before total prediction to the signal after it.
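The two ratios can be computed per sub-frame from the three signals that the selector receives. The sketch below assumes that power means the mean square over the sub-frame and guards against an all-zero residual with a small floor `eps`; both choices, and the function names, are assumptions rather than details given in the text.

```python
def power(x):
    """Mean-square power of one sub-frame."""
    return sum(v * v for v in x) / len(x)

def prediction_gains(speech, short_residual, long_residual, eps=1e-12):
    """Return (S_k / R_k, S_k / P_k) for one sub-frame."""
    s_k = power(speech)
    r_k = max(power(short_residual), eps)   # floor avoids division by zero
    p_k = max(power(long_residual), eps)
    return s_k / r_k, s_k / p_k

gains = prediction_gains([2.0, -2.0, 2.0, -2.0],
                         [1.0, -1.0, 1.0, -1.0],
                         [0.5, -0.5, 0.5, -0.5])
```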
The noise shaping works strongly to voiced sound which has a large value for the above ratios (that is, which has high prediction gain), and weakly to non-voiced sound which has a small value for the above ratios (that is, which has low prediction gain). The shaping factor selector 66 in Fig. 4 uses the above ratios of input to output of the predictor as the indicator of the prediction gain. In detail, the selector 66 has the threshold values $S_{th1}$ and $S_{th2}$ for $S_k/P_k$ and $S_k/R_k$, respectively, and the shaping factors $r_{ns}$ and $r_{nl}$ of the short term predictor and the long term predictor, respectively, are switched as follows.

a) When $S_k/P_k > S_{th1}$ or $S_k/R_k > S_{th2}$ is satisfied: $r_{ns} = r^{n}_{th1}$, $r_{nl} = r^{n}_{th3}$
b) When $S_k/P_k \le S_{th1}$ and $S_k/R_k \le S_{th2}$ is satisfied: $r_{ns} = r^{n}_{th2}$, $r_{nl} = r^{n}_{th4}$ (7)

where $0 \le r^{n}_{th1} \le r^{n}_{th2} \le 1$ and $0 \le r^{n}_{th3} \le r^{n}_{th4} \le 1$.
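The switching rule of the shaping factor selector 66 reduces to a single comparison. The Python sketch below is a hedged illustration; the function name and argument layout are assumptions, and the values in the usage line are the ones listed in the numerical example near the end of this description.

```python
def select_coder_factors(sk_over_pk, sk_over_rk, s_th1, s_th2, high_gain, low_gain):
    """Return (r_ns, r_nl).

    high_gain holds the pair used when either ratio exceeds its threshold;
    low_gain holds the pair used otherwise."""
    if sk_over_pk > s_th1 or sk_over_rk > s_th2:
        return high_gain
    return low_gain

voiced_like = select_coder_factors(55.0, 12.0, 40.0, 30.0, (0.2, 0.2), (0.5, 0.5))
noise_like = select_coder_factors(8.0, 5.0, 40.0, 30.0, (0.2, 0.2), (0.5, 0.5))
```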
As an alternative, the LPC parameters $k_i$ (reflection coefficients) which are the output of the LPC parameter decoder 4 may be supplied to the shaping factor selector 66 in Fig. 4 as an indicator of the prediction gain, instead of the ratios of input to output of the predictor.
The prediction gain of voiced sound, nasal sound, and word tail is high, and then $|k_i|$ is close to 1. On the other hand, non-voiced sound like fricative sound has a small prediction gain, and then $|k_i|$ is close to 0. So, the parameter G which defines the prediction gain is determined as follows.

$$G = \prod_{i=1}^{N_s} \left(1 - k_i^2\right) \qquad (8)$$

When the parameter G is close to 0, the prediction gain is high, and when the parameter G is close to 1, the prediction gain is low. Therefore, the noise shaping must work weakly when the parameter G is small, and strongly when the parameter G is large. In an embodiment, a threshold $G_{th1}$ is defined for the parameter G, and the shaping factors $r_{ns}$ and $r_{nl}$ of the short term predictor and the long term predictor are switched as follows.
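As a small numerical illustration, the parameter G can be computed from the reflection coefficients, assuming that equation (8) is the product of $(1 - k_i^2)$ over all coefficients (the normalized residual power of an LPC lattice). The function name and the sample coefficient sets are hypothetical.

```python
import math

def prediction_parameter(k):
    """G = product over i of (1 - k_i**2); near 0 means high prediction gain."""
    return math.prod(1.0 - k_i * k_i for k_i in k)

g_tonal = prediction_parameter([0.99, -0.9])   # strongly predictable frame
g_noisy = prediction_parameter([0.1, -0.05])   # weakly predictable frame
g_half = prediction_parameter([0.5, -0.5])
```

Under this assumption G always lies between 0 and 1, which agrees with the range stated for the thresholds on G.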

a) When $G < G_{th1}$ is satisfied: $r_{ns} = r^{n}_{th5}$, $r_{nl} = r^{n}_{th7}$
b) When $G_{th1} \le G$ is satisfied: $r_{ns} = r^{n}_{th6}$, $r_{nl} = r^{n}_{th8}$ (9)

where $0 \le G_{th1} \le 1$, $0 \le r^{n}_{th5} \le r^{n}_{th6} \le 1$, and $0 \le r^{n}_{th7} \le r^{n}_{th8} \le 1$.
The number of thresholds is not restricted to the above; a plurality of threshold values may be defined, that is, the shaping factors may be switched by dividing the range of the parameter G into smaller ranges.
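Switching with any number of thresholds can be written as a table lookup. The following Python sketch uses the standard library `bisect_right` so that a value equal to a threshold falls into the upper range, matching case b) above; the function name and the three-range table values are purely hypothetical.

```python
from bisect import bisect_right

def switch_factors(g, thresholds, factor_table):
    """Pick the (r_ns, r_nl) pair for the range of G that contains g.

    thresholds must be ascending; factor_table needs one more entry than thresholds."""
    if len(factor_table) != len(thresholds) + 1:
        raise ValueError("factor_table must have len(thresholds) + 1 entries")
    return factor_table[bisect_right(thresholds, g)]

table = [(0.2, 0.2), (0.35, 0.35), (0.5, 0.5)]   # hypothetical values
low = switch_factors(0.05, [0.1, 0.5], table)
edge = switch_factors(0.1, [0.1, 0.5], table)
high = switch_factors(0.9, [0.1, 0.5], table)
```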
Fig. 5 is a block diagram of the post noise shaping filter 44 according to the present invention.
The shaping factor selector 76 for the short term predictor evaluates the prediction gain by using the LPC parameter which is the output of the LPC parameter decoder 38 (Fig. 1B). Then, the short term prediction parameter which is the output of the LPC parameter/short term prediction parameter converter 39 is adaptively weighted according to said evaluation, and these differently weighted short term prediction parameters are set to the short term predictive pole filter 72 and the short term predictive zero filter 73. The shaping factor selector 75 of the long term predictor evaluates the prediction gain by using the pitch parameter which is the output of the pitch parameter decoder 37, and the pitch parameter is weighted adaptively according to the evaluation. These differently weighted pitch parameters are set to the long term predictive pole filter 68 and the long term predictive zero filter 69. The adder 67 adds the reproduced speech signal from the adder 41 (Fig. 1B) to the output of the long term predictive pole filter 68, and the sum is fed to the long term predictive pole filter 68 and the long term predictive zero filter 69.
The adder 70 adds the output of the adder 67 to the output of the long term predictive zero filter 69, the adder 71 adds the output of the adder 70 to the output of the short term predictive pole filter 72, and the output of the adder 71 is fed to the short term predictive pole filter 72 and the short term predictive zero filter 73. The subtractor 74 subtracts the output of the short term predictive zero filter 73 from the output of the adder 71, and the output of the subtractor 74 is fed to the level adjuster 45 (Fig. 1B) as the output of the post noise shaping filter 44.
The transfer function G(z) of the post noise shaping filter 44 including the level adjuster 45 is given below.

$$G(z) = G_0 \, \frac{1 - P_s\!\left(\dfrac{z}{r_s r_{psz}}\right)}{1 - P_s\!\left(\dfrac{z}{r_s r_{psp}}\right)} \cdot \frac{1 + r_{plz} P_l(z)}{1 - r_{plp} P_l(z)} \qquad (10)$$

where $r_{psp}$, $r_{psz}$, $r_{plp}$, and $r_{plz}$ are the shaping factors of the short term predictive pole filter 72, the short term predictive zero filter 73, the long term predictive pole filter 68, and the long term predictive zero filter 69, respectively.
This short term predictor has spectrum characteristics that keep the formant structure of the LPC spectrum, by superimposing on the spectrum the poles of the pole filter and the zeros of the zero filter, which has less weight than the pole filter. Thus, the spectrum characteristics are emphasized in the high frequency formants as compared with the spectrum characteristics of a mere pole filter. The long term predictor has spectrum characteristics that emphasize the pitch component of the spectrum, by locating the poles between the zeros. Thus, the insertion of the short term predictive zero filter 73, the long term predictive zero filter 69 and the adder 70 emphasizes the formant component of speech, in particular the high frequency formant component, and the pitch component. So, clear speech can be obtained.
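The post noise shaping filter of Fig. 5 can be sketched in the same sample-by-sample style. This is an illustration under stated assumptions: a one tap long term predictor, zero initial states, the level adjuster 45 left out (gain fixed at 1), and a hypothetical function name.

```python
def post_noise_shaping_filter(x, a, b1, pitch_period, r_s, r_psp, r_psz, r_plp, r_plz):
    """Sample-by-sample sketch of the Fig. 5 structure without the level adjuster."""
    pole = [a_i * (r_s * r_psp) ** i for i, a_i in enumerate(a, start=1)]  # filter 72
    zero = [a_i * (r_s * r_psz) ** i for i, a_i in enumerate(a, start=1)]  # filter 73
    u, w, y = [], [], []
    for n, x_n in enumerate(x):
        u_past = u[n - pitch_period] if n >= pitch_period else 0.0
        u_n = x_n + r_plp * b1 * u_past                # adder 67 with pole filter 68
        v_n = u_n + r_plz * b1 * u_past                # adder 70 with zero filter 69
        w_past = [w[n - i] if n >= i else 0.0 for i in range(1, len(a) + 1)]
        w_n = v_n + sum(c * s for c, s in zip(pole, w_past))          # adder 71
        y.append(w_n - sum(c * s for c, s in zip(zero, w_past)))      # subtractor 74
        u.append(u_n)
        w.append(w_n)
    return y

impulse = [1.0, 0.0, 0.0, 0.0]
bypass = post_noise_shaping_filter(impulse, [0.5], 0.5, 2, 1.0, 0.0, 0.0, 0.0, 0.0)
zeros_only = post_noise_shaping_filter(impulse, [0.5], 0.5, 2, 1.0, 0.0, 1.0, 0.0, 1.0)
```

Setting all four shaping factors to 0 leaves the signal unchanged, and enabling only the two zero filters gives the impulse response of $(1 - 0.5z^{-1})(1 + 0.5z^{-2})$ for these example coefficients.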
For the same reason as in the case of the noise shaping filter in the coder, the noise shaping must work weakly for voiced sound, where the prediction gain is high, and strongly for non-voiced sound, where the prediction gain is low. For example, in the short term predictor of the post noise shaping filter, which uses the LPC parameter $k_i$ as the spectrum envelope information, when the parameter G of the equation (8) is used as the prediction gain, the values $r_{psp}$ and $r_{psz}$ may be switched by using the thresholds $G_{th2}$ and $G_{th3}$ of the parameter G, as follows.

a) When $G < G_{th2}$: $r_{psp} = r^{p}_{th1}$, $r_{psz} = r^{p}_{th4}$
b) When $G_{th2} \le G < G_{th3}$: $r_{psp} = r^{p}_{th2}$, $r_{psz} = r^{p}_{th5}$
c) When $G_{th3} \le G$: $r_{psp} = r^{p}_{th3}$, $r_{psz} = r^{p}_{th6}$ (11)

where $0 \le G_{th2} \le G_{th3} \le 1$, $0 \le r^{p}_{th1} \le r^{p}_{th2} \le r^{p}_{th3} \le 1$, and $0 \le r^{p}_{th4} \le r^{p}_{th5} \le r^{p}_{th6} \le 1$.
As mentioned above, the switching of the shaping factors of the short term predictive pole filter 72 and the zero filter 73 provides the factors suitable to the current speech spectrum.
The similar consideration is possible for the long term predictors, that is, the use of the above equations is possible. For the sake of the simplicity of the explanation, the example using a one tap filter is 1321~25 described below.
For example, the pitch parameter bl as the prediction gain in tlle range of O<b1<1 indicates the pitch correlation, and when bl is close to 1, the pitch structure becomes clear, and the long term prediction gain becomes large. Therefore, the noise shaping must work weakly to the voiced sound which has a large value of bl, and strongly to the transient sound which has a small value of bl. The threshold bth of bl is defined, and the values rplp and rplz are switched as follows.

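$$
\begin{aligned}
&\text{a) When } b_1 < b_{th}: & r_{plp} &= r_{th2}, & r_{plz} &= r_{th4} \\
&\text{b) When } b_{th} \le b_1: & r_{plp} &= r_{th1}, & r_{plz} &= r_{th3}
\end{aligned}
\tag{12}
$$

where $0 < b_{th} \le 1$, $0 \le r_{th1} \le r_{th2} \le 1$, and $0 \le r_{th3} \le r_{th4} \le 1$.

Similarly, the shaping factors of the long term predictive pole filter 68 and the zero filter 69 are switched to values suitable for the speech spectrum.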
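A one tap long term section with switched factors can be sketched as follows. This is an illustration under stated assumptions, not the embodiment itself: the pole and zero taps are assumed to be rplp*b1 and rplz*b1 at the pitch lag, the signal flow follows the adder arrangement recited for the post noise shaping filter (pole output fed back, zero output added), and the threshold and factor values are the simulation figures given later in this description. The function names and the `lag` argument are hypothetical.

```python
def select_long_term_factors(b1, b_th=0.4):
    """Return (rplp, rplz): larger factors when the pitch correlation b1 is small."""
    if b1 < b_th:
        return 0.62, 0.31
    return 0.35, 0.175

def long_term_post_filter(x, b1, lag):
    """One tap pole-zero section: v(n) = x(n) + gp*v(n-lag), y(n) = v(n) + gz*v(n-lag)."""
    r_plp, r_plz = select_long_term_factors(b1)
    pole_gain = r_plp * b1  # assumed pole tap
    zero_gain = r_plz * b1  # assumed zero tap
    v, y = [], []
    for n, sample in enumerate(x):
        delayed = v[n - lag] if n >= lag else 0.0
        v_n = sample + pole_gain * delayed   # first adder (pole filter feedback)
        v.append(v_n)
        y.append(v_n + zero_gain * delayed)  # second adder (zero filter output added)
    return y

impulse_response = long_term_post_filter([1.0, 0.0, 0.0, 0.0, 0.0], b1=0.5, lag=2)
print(impulse_response)
```

The impulse response is nonzero only at multiples of the lag, which is how the section reinforces the pitch harmonics.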
Fig.5 shows the case of the use of the separate selectors 75 and 76. Of course, the use of a common selector as in the case of Fig.4 is possible in the embodiment of Fig.5.

Finally, numerical examples of the shaping factors used in the simulation of 9.6 kbps APC-MLQ (adaptive predictive coding - maximum likelihood quantization) are shown as follows.

a) When the transfer function of the noise shaping filter in the coder is expressed by the equation (6), and the accuracy of the prediction is indicated by the input-output ratio of the predictor (equation (7)):

If Sk/Pk > 40 or Sk/Rk > 30, then rns = 0.2, rnl = 0.2
If Sk/Pk < 40 and Sk/Rk < 30, then rns = 0.5, rnl = 0.5

b) When the transfer function of the post noise shaping filter in the decoder is indicated by the equation (10), and the short term prediction gain is expressed by the LPC parameter (equation (11)):

G < 0.08: rpsp = 0.25, rpsz = 0.075
0.08 ≤ G < 0.4: rpsp = 0.6, rpsz = 0.18
0.4 ≤ G: rpsp = 0.9, rpsz = 0.27

c) When the pitch parameter (equation (12)) is used as the long term prediction gain in the post noise shaping filter:

b1 < 0.4: rplp = 0.62, rplz = 0.31
0.4 ≤ b1: rplp = 0.35, rplz = 0.175
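Case a) above can be applied frame by frame as in the following sketch. The function name and the sample ratios are hypothetical, and the behaviour when a ratio is exactly 40 or 30 is not stated in the text, so the sketch arbitrarily treats those boundary values as the low-gain case.

```python
def select_coder_factors(sk_over_pk, sk_over_rk):
    """Return (rns, rnl) for one frame from the predictor input-output ratios."""
    if sk_over_pk > 40 or sk_over_rk > 30:
        # High prediction gain: weak noise shaping.
        return 0.2, 0.2
    # Low prediction gain (boundary values included by assumption): strong shaping.
    return 0.5, 0.5

# Hypothetical per-frame (Sk/Pk, Sk/Rk) values, for illustration only.
frames = [(55.0, 12.0), (18.0, 35.0), (9.0, 4.0)]
factors = [select_coder_factors(pk, rk) for pk, rk in frames]
print(factors)
```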


As mentioned above, according to the present invention, the factors of the noise shaping filter in the coder and of the post noise shaping filter in the decoder are adaptively weighted depending on the prediction gain.
Therefore, excellent speech quality can be obtained irrespective of whether the sound is voiced or non-voiced. The present invention is implemented simply by using the ratio of the input to the output of the predictor, the LPC parameter, or the pitch parameter as the indication of the prediction gain.
Further, in order to reduce the effect of the quantization noise, the noise shaping works more powerfully through the use of the noise shaping filter having the shaping factor selector 66, the long term predictive pole filter 58, the zero filter 59, the short term predictive pole filter 62, and the zero filter 63.
Further, clear speech with less quantization noise effect is provided by the use of the post noise shaping filter having the shaping factor selectors 75 and 76, the long term predictive pole filter 68 and zero filter 69, the short term predictive pole filter 72 and zero filter 73, means for adding the input and the output of the long term predictive zero filter 69, and means for subtracting the output from the input of the short term predictive zero filter 73.


The present invention is beneficial, in particular, for high efficiency speech coding/decoding systems with a low bit rate.
From the foregoing, it will now be apparent that a new and improved speech coding/decoding system has been found. It should be understood of course that the embodiments disclosed are merely illustrative and are not intended to limit the scope of the invention. Reference should be made to the appended claims, therefore, rather than the specification as indicating the scope of the invention.

Claims (9)

1. A speech coding/decoding system comprising:
a coding side including a predictor providing a prediction signal of a digital input speech signal based upon a prediction parameter which is output by a prediction parameter means, a quantizer quantizing a final residual signal input thereto and outputting a coded final residual signal, said final residual signal is a function of said prediction signal, said digital input speech signal and a shaped quantization noise, an inverse quantizer for inverse quantization of said coded final residual signal of said quantizer, said inverse quantizer outputting a quantized final residual signal, a subtractor providing quantization noise, said quantization noise is a difference between said final residual signal and said quantized final residual signal of said inverse quantizer, a noise shaping filter shaping a spectrum of said quantization noise similar to a spectrum envelope of the digital input speech signal, said shaping of said spectrum based upon first shaping factors, said noise shaping filter outputting said shaped quantization noise, and a multiplexer for multiplexing said coded final residual signal from said quantizer, and other information determined in said coding side for sending to a decoding side, said other information including at least said prediction parameter;
said decoding side including a demultiplexer for separating said coded final residual signal, and the other information including said prediction parameter from said coding side, an inverse quantizer for inverse quantization and decoding of said coded final residual signal from said demultiplexer, said inverse quantizer outputting a quantized final predicted residual signal, a synthesis filter for reproducing said digital input speech signal by adding said quantized final predicted residual signal of said inverse quantizer and a prediction signal which is based upon said prediction parameter from said demultiplexer, and a post noise shaping filter for shaping a spectrum of a reproduced digital speech signal using second shaping factors to reduce an effect of said quantization noise on said reproduced digital speech signal, wherein the first and second shaping factors of said noise shaping filter and said post noise shaping filter vary over time with changes in the spectrum envelope in the digital input speech signal wherein said shaping factors for non-voiced sound will be larger than said shaping factors for voiced sound.
2. A speech coding/decoding system according to claim 1, wherein said first and second shaping factors vary based on a ratio of the digital input speech signal and a residual signal, which is a difference between said digital input speech signal and the prediction signal output from said predictor.
3. A speech coding/decoding system according to claim 1, wherein said first and second shaping factors vary based upon the prediction parameter which is at least one of a linear predictive coding parameter and a pitch parameter.
4. A speech coding/decoding system according to claim 1, wherein said noise shaping filter comprises:
a short term predictive pole filter and a short term predictive zero filter which shape the spectrum of the quantization noise similar to the spectrum envelope of the digital input speech signal, a long term predictive pole filter and a long term predictive zero filter which shape the spectrum of the quantization noise similar to a harmonic spectrum due to a periodicity of the digital input speech signal, a shaping factor selector for selecting said first shaping factors of said short term predictive pole filter, said short term predictive zero filter, said long term predictive pole filter and said long term predictive zero filter depending upon an evaluated prediction gain, a first adder receiving an output of said subtractor as an input of the noise shaping filter, and an output from said long term predictive pole filter, and providing inputs to said long term predictive zero filter and said long term predictive pole filter, a first subtractor for providing a difference between an output of said first adder and an output of said long term predictive zero filter, a second adder receiving an output from said first subtractor and an input from an output of said short term predictive pole filter, and providing inputs to said short term predictive zero filter and said short term predictive pole filter, a second subtractor for providing a difference between an output of said second adder and an output of said short term predictive zero filter, a third subtractor for providing a difference between an output of said second subtractor and an input of the noise shaping filter to provide an output of the noise shaping filter, said evaluated prediction gain being determined by evaluating said prediction parameter according to said digital input speech signal, and said prediction signal which is a difference between said digital input speech signal and said predicted signal.
5. A speech coding/decoding system according to claim 1, wherein said post noise shaping filter comprises:
a short term predictive pole filter and a short term predictive zero filter which shape the spectrum of the decoded digital speech signal similar to the spectrum envelope of the digital input speech signal, a long term predictive pole filter and a long term predictive zero filter which shape the spectrum of the decoded digital speech signal similar to a harmonic spectrum of the digital input speech signal, shaping factor selectors for selecting said second shaping factors of said short term predictive pole filter, said short term predictive zero filter, said long term predictive pole filter and said long term predictive zero filter depending upon said prediction gain, a first adder receiving an output from said synthesis filter, and an output from said long term predictive pole filter, and providing inputs to said long term predictive zero filter and said long term predictive pole filter, a second adder receiving an output of said first adder, and an output from said long term predictive zero filter, a third adder receiving an output from said second adder, and an output from said short term predictive pole filter, and providing inputs to said short term predictive zero filter and said short term predictive pole filter, and a subtractor for providing a difference between an output of said third adder and an output from said short term predictive zero filter to provide said reproduced digital speech signal.
6. A speech coding system comprising:
a predictor providing a prediction signal of a digital input speech signal based upon a prediction parameter which is output by a prediction parameter means;
a quantizer quantizing a final residual signal input thereto and outputting a coded final residual signal, said final residual signal is a function of said prediction signal, said digital input speech signal, and a shaped quantization noise;
an inverse quantizer for inverse quantization of said coded final residual signal of said quantizer, said inverse quantizer outputting a quantized final residual signal;
a subtractor providing quantization noise, said quantization noise is a difference between said final residual signal and said quantized final residual signal of said inverse quantizer; and a noise shaping filter shaping a spectrum of said quantization noise similar to a spectrum envelope of the digital input speech signal, said shaping of said spectrum based upon shaping factors, wherein the shaping factors of said noise shaping filter vary over time with changes in the spectrum envelope of the digital input speech signal wherein said shaping factors for non-voiced sound will be larger than shaping factors for voiced sound.
7. A speech coding system according to claim 6, wherein said noise shaping filter comprises:
a short term predictive pole filter and a short term predictive zero filter which shape the spectrum of the quantization noise similar to a spectrum envelope of the digital input speech signal, a long term predictive pole filter and a long term predictive zero filter which shape the spectrum of the quantization noise similar to a harmonic spectrum due to a periodicity of the digital input speech signal, and a shaping factor selector for selecting shaping factors of said short term predictive pole filter, said short term predictive zero filter, said long term predictive pole filter and said long term predictive zero filter depending upon an evaluated prediction gain, a first adder receiving an output of said subtractor as an input of the noise shaping filter, and an output from said long term predictive pole filter, and providing inputs to said long term predictive zero filter, and said long term predictive pole filter, a first subtractor for providing a difference between an output of said first adder and an output of said long term predictive zero filter, a second adder receiving an output from said first subtractor and an input from an output of said short term predictive pole filter, and providing inputs to said short term predictive zero filter and said short term predictive pole filter, a second subtractor for providing a difference between an output of said second adder and an output of said short term predictive zero filter, a third subtractor for providing a difference between an output of said second subtractor and an input of the noise shaping filter to provide an output of the noise shaping filter, said evaluated prediction gain being determined by evaluating said prediction parameter according to said digital input speech signal, and said prediction signal which is a difference between said digital input speech signal and said predicted signal.
8. A speech decoding system comprising:
an inverse quantizer for inverse quantization and decoding of a coded final residual signal from a coding side, said inverse quantizer outputting a quantized final predicted residual signal;
a synthesis filter for decoding a digital input speech signal by adding said quantized final predicted residual signal of said inverse quantizer and a prediction signal which is a function of a prediction parameter output by a prediction parameter means; and a post noise shaping filter for shaping a decoded digital speech signal using shaping factors to reduce an effect of said quantization noise on said reproduced digital speech signal, wherein the shaping factors of said post noise shaping filter vary over time with changes in the spectrum envelope of the digital input speech signal wherein said shaping factors for non-voiced sound will be larger than shaping factors for voiced sound.
9. A speech decoding system according to claim 8, wherein said post noise shaping filter comprises:
a short term predictive pole filter and a short term predictive zero filter which shape the spectrum of the decoded digital speech signal similar to the spectrum envelope of the digital input speech signal, a long term predictive pole filter and a long term predictive zero filter which shape the spectrum of the decoded digital speech signal similar to a harmonic spectrum of the digital input speech signal, shaping factor selectors for selecting shaping factors of said short term predictive pole filter, said short term predictive zero filter, said long term predictive pole filter and said long term predictive zero filter depending upon said prediction gain, a first adder receiving an output from said synthesis filter, and an output from said long term predictive pole filter, and providing inputs to said long term predictive zero filter and said long term predictive pole filter, a second adder receiving an output of said first adder, and an output from said long term predictive zero filter, a third adder receiving an output from said second adder, and an output from said short term predictive pole filter, and providing inputs to said short term predictive zero filter and said short term predictive pole filter, and a subtractor for providing a difference between an output of said third adder and an output from said short term predictive zero filter to provide said reproduced digital speech signal.
CA000581746A 1988-04-13 1988-10-31 Speech signal coding/decoding system Expired - Fee Related CA1321025C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP63088922A JP3074680B2 (en) 1988-04-13 1988-04-13 Post-noise shaping filter for speech decoder.
JP88922/88 1988-04-13

Publications (1)

Publication Number Publication Date
CA1321025C true CA1321025C (en) 1993-08-03

Family

ID=13956406

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000581746A Expired - Fee Related CA1321025C (en) 1988-04-13 1988-10-31 Speech signal coding/decoding system

Country Status (2)

Country Link
JP (1) JP3074680B2 (en)
CA (1) CA1321025C (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3274451B2 (en) * 1990-02-23 2002-04-15 株式会社東芝 Adaptive postfilter and adaptive postfiltering method
JP3071800B2 (en) * 1990-02-23 2000-07-31 株式会社東芝 Adaptive post filter
JP3076086B2 (en) * 1991-06-28 2000-08-14 シャープ株式会社 Post filter for speech synthesizer
ATE210347T1 (en) * 1991-08-02 2001-12-15 Sony Corp DIGITAL ENCODER WITH DYNAMIC QUANTIZATION BIT DISTRIBUTION
US5710862A (en) * 1993-06-30 1998-01-20 Motorola, Inc. Method and apparatus for reducing an undesirable characteristic of a spectral estimate of a noise signal between occurrences of voice signals
JP3360423B2 (en) * 1994-06-21 2002-12-24 三菱電機株式会社 Voice enhancement device
US5819213A (en) * 1996-01-31 1998-10-06 Kabushiki Kaisha Toshiba Speech encoding and decoding with pitch filter range unrestricted by codebook range and preselecting, then increasing, search candidates from linear overlap codebooks
JP4735711B2 (en) * 2008-12-17 2011-07-27 ソニー株式会社 Information encoding device
JP2010160496A (en) * 2010-02-15 2010-07-22 Toshiba Corp Signal processing device and signal processing method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0727398B2 (en) * 1985-02-12 1995-03-29 日本電気株式会社 Constant variable perceptual weighting filter
JPS62111300A (en) * 1985-11-08 1987-05-22 松下電器産業株式会社 Voice analysis/synthesization circuit
JPS62111299A (en) * 1985-11-08 1987-05-22 松下電器産業株式会社 Voice signal feature extraction circuit

Also Published As

Publication number Publication date
JPH01261930A (en) 1989-10-18
JP3074680B2 (en) 2000-08-07

Legal Events

Date Code Title Description
MKLA Lapsed