EP0402947A2 - Arrangement and method for encoding speech signal using regular pulse excitation scheme - Google Patents
- Publication number
- EP0402947A2 (application number EP90111360A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- parameters
- parameter
- generating
- discrete
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
- G10L19/113—Regular pulse excitation
Definitions
- the present invention relates generally to an arrangement and method for encoding a discrete-time speech signal using a regular pulse excitation scheme and more specifically to such an arrangement and method for encoding a speech signal at a low bit rate less than 16k-bit per second.
- an a/d (analog-to-digital) converted speech signal is applied via an input terminal 10 to a pre-processing circuit 12 on a frame by frame basis.
- the speech frame applied to the circuit 12 is pre-processed to produce an offset-free signal, which signal is then subjected to a first order pre-emphasis filter.
- An original speech signal has been sampled at a rate of 8 kHz. Since the frame length is 20 ms in this prior art, the one frame consists of 160 signal samples.
- the 160 samples thus obtained are applied to a short term LPC (Linear Predictive Coding) analysis circuit 14 and also to a short term analysis filter 16.
- LPC Linear Predictive Coding
- the 160 samples, applied to the short term LPC analysis circuit 14, are analyzed to determine 8 orders of reflection coefficients which represent a spectrum envelope of each frame.
- the LPC short term analysis circuit 14 further transforms or encodes the reflection coefficients to log area ratios (LAR), which are applied to the short term analysis circuit 16 and a multiplexor 30.
- the short term analysis circuit 16 decodes the LAR into the reflection coefficients and obtains 160 samples of short term residual signals.
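The conversion between reflection coefficients and log area ratios mentioned above can be sketched as follows. The text does not state the formula, so the base-10 logarithm and the sign convention used here are assumptions, and the quantization applied when the LAR values are encoded is omitted. The function names are hypothetical.

```python
import math


def reflection_to_lar(r: float) -> float:
    """Map a reflection coefficient (|r| < 1) to a log area ratio.

    Assumed definition: LAR = log10((1 + r) / (1 - r)).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("reflection coefficient must lie strictly inside (-1, 1)")
    return math.log10((1.0 + r) / (1.0 - r))


def lar_to_reflection(lar: float) -> float:
    """Inverse mapping: recover the reflection coefficient from a log area ratio."""
    t = 10.0 ** lar
    return (t - 1.0) / (t + 1.0)
```

Without quantization the two functions are exact inverses of each other, which is the property the analysis filter relies on when it decodes the LAR values.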
- the term "short term analysis" has the same meaning as the spectrum envelope analysis.
- the short term residual signal, outputted from the filter 16, is applied to a subtractor 18 and a long term analysis circuit 22.
- the long term analysis circuit 22 divides the speech frame into 4 sub-frames (5 ms) each of which consists of 40 samples forming the short term residual signal. Each sub-frame is processed blockwise by the subsequent function blocks.
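The frame and sub-frame sizes quoted above follow directly from the sampling rate and the frame length. A minimal sketch of that arithmetic, using only the figures given in the text (8 kHz, 20 ms, 4 sub-frames); the function names are hypothetical:

```python
SAMPLE_RATE_HZ = 8000       # sampling rate stated in the text
FRAME_MS = 20               # frame length stated in the text
SUBFRAMES_PER_FRAME = 4     # number of sub-frames stated in the text


def frame_layout(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS,
                 subframes=SUBFRAMES_PER_FRAME):
    """Return (samples per frame, samples per sub-frame, sub-frame length in ms)."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    samples_per_subframe = samples_per_frame // subframes
    return samples_per_frame, samples_per_subframe, frame_ms / subframes


def split_into_subframes(frame, subframes=SUBFRAMES_PER_FRAME):
    """Split one frame of samples into equally sized consecutive sub-frames."""
    n = len(frame) // subframes
    return [frame[i * n:(i + 1) * n] for i in range(subframes)]
```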
- the long term analysis circuit 22 produces a long term prediction (LTP) lag and an LTP gain on the basis of the two signals: the short term residual samples applied from the circuit 16 and an output sequence from an adder 26.
- LTP long term prediction
- the term "long term analysis" has the same meaning as pitch analysis, and the LTP lag and the LTP gain respectively correspond to a pitch period and a pitch gain.
- the subtractor 18 outputs a block of 40 long term residual signal samples by subtracting the output of a long term analysis filter 20 from the short term residual signal applied from the filter 16.
- $m_j = p \cdot j + q$ ($0 \le j \le N/p - 1$ and $0 \le q \le p$) (1), where p denotes a predetermined pulse interval, q an RPE grid, and N the number of samples within one sub-frame.
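Equation (1) can be illustrated with a short sketch that lists the pulse positions for a given grid. The values N = 40 and p = 3 used in the usage note are assumptions chosen for illustration, since the text does not state p here, and discarding positions that fall outside the sub-frame is likewise an assumption. The function name is hypothetical.

```python
def pulse_positions(N: int, p: int, q: int) -> list:
    """Positions m_j = p*j + q for 0 <= j <= N/p - 1, with 0 <= q <= p.

    Positions are zero-based sample indices; any position that would fall
    outside the sub-frame (m_j >= N) is discarded as a safety measure.
    """
    if not 0 <= q <= p:
        raise ValueError("grid q must satisfy 0 <= q <= p")
    return [p * j + q for j in range(N // p) if p * j + q < N]
```

With N = 40 and p = 3, each of the grids q = 0, 1, 2, 3 yields 13 positions, the last grid ending at sample 39.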
- the excitation pulse generator 28 decodes the signal applied from the circuit 24 to determine an excitation pulse, which is fed to the adder 26.
- the adder 26 adds the excitation pulse from the circuit 28 and the output sequence of the long term analysis filter 20, and applies the resultant sum to the filter 20 as well as the analysis circuit 22.
- the multiplexor 30 combines the encoded outputs of the blocks 14, 22 and 24, and applies the result to a transmission line coupled to an output terminal 32.
- the above-mentioned prior art has encountered the difficulty of low quality of the reconstructed or reproduced speech. This is because the amplitude of each excitation pulse is determined on the basis of the short term signal residual applied to the subtractor 18. In other words, according to the prior art, the long term signal residual outputted from the subtractor 18 is shifted by an RPE grid and then quantized every predetermined number of samples.
- Another object of the present invention is to provide a method for encoding a discrete-time speech signal at a low bit rate less than 16k-bit per second using a regular pulse excitation scheme.
- a pre-processing circuit is provided to receive a discrete-time speech signal, which is then divided into a plurality of frames.
- a parameter extracting circuit is coupled to the pre-processing circuit and extracts a plurality of parameters therefrom.
- an impulse response calculating circuit is coupled to receive the plurality of parameters from the parameter extracting circuit, and generates an impulse response function signal using the plurality of parameters.
- An autocorrelation function circuit is coupled to receive the impulse response signal and generates an autocorrelation function signal using the signal applied.
- a cross-correlation function calculating circuit generates a cross-correlation function signal using the discrete-time speech signal and the autocorrelation function signal.
- a grid signal generator receives the output of the cross-correlation function calculating circuit, and outputs a grid signal indicative of a location of a first excitation pulse within one frame.
- a pulse amplitude calculating circuit receives the autocorrelation function signal, the cross-correlation function signal and the grid signal, and determines an amplitude sequence of excitation pulses within one frame.
- One aspect of this invention takes the form of an arrangement for encoding a speech signal using a regular pulse excitation scheme, comprising: first means, the first means being supplied with a discrete-time speech signal and dividing the discrete-time speech signal into a plurality of frames; second means, the second means extracting a plurality of parameters from the first means; third means, the third means generating an impulse response function signal using the plurality of parameters; fourth means, the fourth means generating an autocorrelation function signal using the impulse response signal; fifth means, the fifth means generating a cross-correlation function signal using the discrete-time speech signal and the autocorrelation function signal; sixth means, the sixth means generating a grid signal indicative of a location of a first excitation pulse within one frame; and seventh means, the seventh means receiving the autocorrelation function signal, the cross-correlation function signal and the grid signal, the seventh means determining an amplitude sequence of excitation pulses within one frame.
- Another aspect of this invention takes the form of an arrangement for encoding a speech signal using a regular pulse excitation scheme, comprising: first means, the first means being supplied with a discrete-time speech signal and dividing the discrete-time speech signal into a plurality of frames; second means, the second means extracting a first parameter representative of a spectrum envelope from the first means and encoding the first parameter, the second means decoding the encoded first parameter and obtaining the decoded first parameter; third means, the third means extracting second and third parameters from the first means, the second and third parameters being respectively representative of a pitch period and a pitch gain, the third means decoding the coded second and third parameters and obtaining the decoded second and third parameters,fourth means, the fourth means generating an impulse response function signal using the decoded first, second and third parameters; fifth means, the fifth means generating an autocorrelation function signal using the impulse response function signal; sixth means, the sixth means generating a cross-correlation function signal using the discrete-time speech signal and the autocor
- Still another aspect of this invention takes the form of a method for encoding a speech signal using a regular pulse excitation scheme, comprising the steps of: (a) receiving a discrete-time speech signal and dividing the discrete-time speech signal into a plurality of frames; (b) extracting a plurality of parameters from the discrete-time speech signal; (c) generating an impulse response function signal using the plurality of parameters; (d) generating an autocorrelation function signal using the impulse response signal; (e) generating a cross-correlation function signal using the discrete-time speech signal and the autocorrelation function signal; (f) generating a grid signal indicative of a location of a first excitation pulse within one frame; and (g) receiving the autocorrelation function signal, the cross-correlation function signal and the grid signal, and determining an amplitude sequence of excitation pulses within one frame.
- the present invention is characterized by algorithms for calculating an amplitude of each of the excitation pulses. It should be noted that the location of the excitation pulse can be determined in accordance with the prior art disclosed in Paper 1. The above mentioned algorithms will be discussed below.
- the location of a j-th excitation pulse within a frame can be specified by equation (1).
- equation (1) is again shown as equation (3).
- $m_j = p \cdot j + q$ ($0 \le j \le N/p - 1$ and $0 \le q \le p$) (3) The algorithm for obtaining the RPE grid q will be described later.
- Fig. 3 shows a synthesis filter 122 which comprises two digital filters 310 and 320 coupled in series.
- the filter 310 includes an adder 322, a coefficient weighting circuit 324 and a delay 326.
- the filter 320 includes an adder 328, a coefficient weighting circuit 330 and a delay 332.
- the synthesis filter 122 forms part of the arrangement shown in Fig. 2, and will again be referred to later. Consequently, the detailed description of Fig. 3 will be postponed.
- the filter 310 is a long term prediction filter whose output represents a pitch structure, while the filter 320 is a short term prediction filter whose output represents spectrum envelope characteristics.
- the synthesis filter 122 is supplied with the excitation pulse series and outputs a reconstructed signal sequence x′(n) in accordance with the following equation: where ⁇ denotes an LTP gain representative of tap coefficients of the long term filter 310, Md a LTP lag indicative of a pitch period of an incoming speech signal.
- $x_d(n)$ denotes an output signal of the filter 310, $N_p$ a prediction order of the short term prediction filter 320, and $a_i$ ($1 \le i \le N_p$) a prediction coefficient of the filter 320 ($a_i$ corresponds to LAR in Fig. 3).
- ⁇ and Md can be obtained in accordance with the prior art techniques disclosed in Paper 1.
- ⁇ and Md can be determined from the peak amplitude of the autocorrelation function sequence of an input speech signal and the position of said peak, respectively. The algorithms by which this can be achieved are disclosed in the document entitled "Adaptive predictive coding of speech signals" by B.S. Atal et al., pages 1973 to 1986, The Bell System Technical Journal, October 1970 (referred to as Paper 2).
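The peak-picking idea described above can be sketched as follows. This is a simplified illustration and not the algorithm of Paper 2: the lag search range (40 to 120 samples), the normalization of the gain by the signal energy, and the function name are all assumptions.

```python
import numpy as np


def ltp_lag_and_gain(x, min_lag=40, max_lag=120):
    """Estimate an LTP lag and gain from the autocorrelation peak of x.

    The lag is the position of the largest autocorrelation value inside
    [min_lag, max_lag]; the gain is that value divided by the energy of x
    (an assumed normalization).
    """
    x = np.asarray(x, dtype=float)
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return min_lag, 0.0
    best_lag, best_r = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        # r(lag) = sum over n of x[n] * x[n - lag]
        r = float(np.dot(x[lag:], x[:-lag]))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r / energy
```

For a signal consisting of unit pulses spaced 50 samples apart, the autocorrelation has its largest in-range value at lag 50, so the sketch returns that spacing as the lag.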
- the weighted squared error J between the input speech signal x(n) and the reproduced signal x′(n) within one frame can be represented by: where N denotes the number of samples within one frame and w(n) a weighting function.
- Equation (7) is rewritten as follows. where the term x′(n) * w(n) can be modified according to the following equation.
- $X_w'(Z) = X'(Z) \cdot W(Z)$ (11)
- $X'(Z) = H(Z) \cdot D(Z)$ (12) where D(Z) represents the Z-transform of the excitation pulse series given by equation (4), and H(Z) the Z-transform of the impulse response of the synthesis filter 122.
- the following equation can be obtained by partially differentiating equation (16) with respect to g k and setting the result to zero, where g k is the amplitude of the excitation pulse that minimizes equation (16).
- ⁇ xh ( ⁇ ) represents a cross-correlation function sequence computed from x w (n) and h w (n)
- ⁇ hh ( ⁇ ) represents an autocorrelation function sequence of hw(n).
- the amplitude g k of each of the excitation pulses is a function of the location m k of the corresponding excitation pulse. This means that the most desirable amplitude g k at a given pulse position m k can be computed.
- Equation (17) can be modified using equation (21) as follows:
- RPE grid q is calculated using the autocorrelation function obtained by equation (18). That is to say, the RPE grid q can be determined so as to satisfy the following equation. where max(q) indicates the maximum value of the right term when changing the value of q.
- the values that the RPE grid q can assume are 0, 1, 2 and 3 in the prior art disclosed in Paper 1, merely by way of example.
- an amplitude sequence of the excitation signal can be precisely obtained using equation (22), and hence a high quality reproduced voice can be realized.
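The least-squares idea behind the amplitude calculation can be sketched as below. Equations (17) to (23) are not reproduced in this text, so the sketch solves the standard normal equations obtained by setting the derivative of the squared error with respect to each amplitude to zero, and it picks the grid by an exhaustive search over the estimated error reduction. Both choices, the correlation sign conventions, and all function names are assumptions rather than the patent's exact procedure.

```python
import numpy as np


def cross_correlation(x_w, h_w):
    """phi[m] = sum over n of x_w[n] * h_w[n - m], for 0 <= m < N."""
    N = len(x_w)
    return np.array([np.dot(x_w[m:], h_w[:N - m]) for m in range(N)])


def autocorrelation(h_w):
    """r[k] = sum over n of h_w[n] * h_w[n + k], for 0 <= k < N."""
    N = len(h_w)
    return np.array([np.dot(h_w[:N - k], h_w[k:]) for k in range(N)])


def amplitudes_for_grid(phi, r_hh, positions):
    """Solve sum_j g_j * r_hh(|m_j - m_k|) = phi(m_k) for the amplitudes g."""
    m = np.asarray(positions)
    gram = r_hh[np.abs(m[:, None] - m[None, :])]
    rhs = phi[m]
    g = np.linalg.solve(gram, rhs)
    # g . rhs is the estimated reduction of the squared error for this grid.
    return g, float(np.dot(g, rhs))


def select_grid_and_amplitudes(x_w, h_w, p, grids):
    """Try every candidate grid and keep the one with the largest error reduction."""
    x_w = np.asarray(x_w, dtype=float)
    h_w = np.asarray(h_w, dtype=float)   # must have the same length as x_w
    N = len(x_w)
    phi, r_hh = cross_correlation(x_w, h_w), autocorrelation(h_w)
    best = None
    for q in grids:
        positions = [p * j + q for j in range(N // p) if p * j + q < N]
        g, reduction = amplitudes_for_grid(phi, r_hh, positions)
        if best is None or reduction > best[2]:
            best = (q, g, reduction)
    return best[0], best[1]
```

Because every amplitude is obtained jointly from the correlation sequences, the result depends on the pulse positions, which matches the observation in the text that the amplitude of each pulse is a function of its location.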
- an a/d (analog-to-digital) converted speech signal is applied via an input terminal 110 to a pre-processing circuit 112 on a frame by frame basis.
- the pre-processing circuit 112 can be configured in the same manner as the circuit 12 of Fig. 1.
- the speech frame applied to the circuit 112 is pre-processed to produce an offset-free signal, which is then subjected to a first order pre-emphasis filter.
- An original speech signal to be applied to the input terminal 110 has been sampled at a predetermined rate such as 8 kHz.
- the one frame consists of 160 signal samples.
- the samples thus obtained are applied to a short term LPC (Linear Predictive Coding) analysis circuit 114 and also to a long term (pitch) analysis filter 116.
- LPC Linear Predictive Coding
- the reflection coefficients represent a spectrum envelope of each frame.
- An LAR coding circuit 118 is supplied with the LAR(i)s and transforms or encodes them into coded log area ratios (coded-LAR(i)) based on predetermined quantizing levels (quantizing bits), and then applies them to a multiplexor 300. Further, the LAR coding circuit 118 decodes the coded-LAR(i)s, applying the decoded LAR′(i) to an impulse response calculating circuit 120 as well as a synthesis filter 122.
- the long term analysis circuit 116 receives the one frame samples from the pre-processing circuit 112, and calculates the LTP lag Md and the LTP gain ⁇ in accordance with the algorithms disclosed in the above-mentioned Paper 2.
- the Md, ⁇ are fed to a long term (pitch) coding circuit 124, which encodes the Md, ⁇ and applies the coded-Md and coded- ⁇ to the multiplexor 300. Further, the long term coding circuit 124 decodes the coded-Md and the coded- ⁇ into Md′ and ⁇ ′, respectively.
- the decoded LTP lag (Md′) and the decoded LTP gain ( ⁇ ′) are applied to the impulse response calculating circuit 120 and also to the synthesis filter 122.
- the impulse response calculating circuit 120 comprises an impulse generator 400, a long term prediction (LTP) filter 402 and a short term prediction (STP) filter 404, which are coupled in series.
- the LTP filter 402 includes an adder 406, a coefficient weighting circuit 408 and a delay circuit 410.
- the STP filter 404 includes an adder 412, a coefficient weighting circuit 414 and a delay circuit 416.
- the operation of each of the filters 402 and 404 is known to those skilled in the art, and hence detailed descriptions thereof will be omitted.
- the decoded Md′ and ⁇ ′ are applied to the coefficient weighting circuit 408, while the decoded LAR′(i) to the coefficient weighting circuit 414.
- the impulse response calculating circuit 120 determines an impulse response of a predetermined number of samples and applies the output h w (n) to an autocorrelation function calculating circuit 126 and a cross-correlation function calculating circuit 128.
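The series connection of the impulse generator, the LTP filter and the STP filter can be sketched as a pair of recursions. The filter equations are not reproduced in this text, so the recursive forms and sign conventions below are assumptions, the perceptual weighting that turns h(n) into h w (n) is not modelled, and the function name is hypothetical.

```python
def impulse_response(ltp_gain, ltp_lag, a, length):
    """Response of a cascaded LTP and STP filter to a unit impulse.

    Assumed recursions:
      x_d[n] = d[n] + ltp_gain * x_d[n - ltp_lag]        (LTP filter)
      h[n]   = x_d[n] + sum_i a[i-1] * h[n - i]          (STP filter)
    where d[n] is 1 at n = 0 and 0 elsewhere.
    """
    x_d = [0.0] * length
    h = [0.0] * length
    for n in range(length):
        d = 1.0 if n == 0 else 0.0
        x_d[n] = d + (ltp_gain * x_d[n - ltp_lag] if n >= ltp_lag else 0.0)
        acc = x_d[n]
        for i, a_i in enumerate(a, start=1):
            if n >= i:
                acc += a_i * h[n - i]
        h[n] = acc
    return h
```

Setting `ltp_gain` to zero removes the pitch contribution, leaving only the response of the short term filter.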
- the circuit 126 calculates an autocorrelation function R hh (
- a subtractor 134 coupled to the pre-processing circuit 112 and the synthesis filter 122, subtracts the output sequence of the filter 122 from the speech signal sequence x(n), and applies the resultant difference to a weighting circuit 136.
- the synthesis filter 122 has already stored one frame of response signal sequence, which is obtained by using an excitation pulse one frame before the present frame as an excitation signal and thereafter delayed to the present frame by making the excitation signal zero.
- the speech signal sequence of the present frame can be expressed by the sum of a signal sequence obtained by delaying the output signal of the synthesis filter driven by an excitation pulse one frame before to the present frame by making the excitation signal zero, and by the output signal sequence of the synthesis filter driven by the excitation pulse sequence of the present frame.
- the weighting circuit 136 is supplied with the parameter LAR′(i) from the LAR coding circuit 118, and calculates the weighting function w(n) in a manner that the Z conversion value thereof satisfies equation (8). This calculation can be implemented through the use of another frequency weighting scheme.
- the weighting circuit 136 performs a convolution integral of the difference from the subtractor 134 and the function w(n), and applies the output thereof x w (n) to the cross-correlation function circuit 128.
- This circuit 128 is further supplied with the impulse response h w (n), and calculates the cross-correlation function ⁇ xh (-m k ) (where $1 \le m_k \le N$), which is applied to the RPE grid selector 130 and also to the pulse amplitude calculating circuit 132.
- the grid selector 130 determines or selects a grid q, using the cross-correlation function ⁇ xh (-m k ), according to equation (23) and applies the selected grid to the pulse amplitude calculating circuit 132.
- the circuit 132 is synchronously supplied with the above-mentioned three outputs (viz., the autocorrelation function R hh (
- a pulse coding circuit 136 receives the output sequence of the circuit 132, encodes the selected grid q and the amplitude sequence g k of the excitation pulses using normalizing coefficients, and applies the encoded information to the multiplexor 300.
- the normalizing coefficients are also encoded within the pulse coding circuit 136 and applied to the multiplexor 300.
- the circuit 136 further decodes the encoded data (viz., the grid and the amplitude sequence and the normalizing coefficients), applying them to a pulse sequence generator 138.
- the decoded grid and the decoded amplitude sequence are respectively denoted by q′ and g k ′.
- the operation of the pulse coding circuit 136 has been disclosed in the above-mentioned Paper 1.
- the pulse sequence generator 138 outputs an excitation pulse sequence of one frame using g k ′ and m k ′, which pulse sequence has an amplitude g k ′ at a position m k ′.
- the synthesis filter 122 receives the excitation pulse sequence, and also receives the coefficients LAR′(i) and the pitch information (Md′ and ⁇ ′) from the circuits 118 and 124, respectively. It should be noted that the synthesis filter 122 converts LAR′(i) into a prediction parameter $a_i$ ($1 \le i \le N_p$) by means of a well known method. The filter 122 appends one frame of zeros to the excitation signal applied thereto and determines a response signal sequence x′(n) for the resulting two-frame signal.
- the sequence x′(n) can be represented by: This equation is identical to equation (5).
- the excitation signal d(n) represents the output pulse signal generated by the pulse generating circuit 138 when $1 \le n \le N$, while representing a series of all zeros in the case of $N + 1 \le n \le 2N$.
- the subtractor 134 receives x′(n) obtained using equation (24) (wherein $N + 1 \le n \le 2N$).
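The two-frame response described above can be sketched by driving a synthesis filter with one frame of excitation followed by one frame of zeros and keeping the second half of the output. Equation (24) is not reproduced in this text, so the recursion below is an assumed standard form; the sketch also starts from zero filter memory, which is a simplification, and the function names are hypothetical.

```python
def synthesize(d, ltp_gain, ltp_lag, a):
    """Run the excitation d through an assumed cascaded LTP and STP filter."""
    x_d = [0.0] * len(d)
    x = [0.0] * len(d)
    for n in range(len(d)):
        x_d[n] = d[n] + (ltp_gain * x_d[n - ltp_lag] if n >= ltp_lag else 0.0)
        acc = x_d[n]
        for i, a_i in enumerate(a, start=1):
            if n >= i:
                acc += a_i * x[n - i]
        x[n] = acc
    return x


def response_into_next_frame(excitation, ltp_gain, ltp_lag, a):
    """Return x'(n) for the second frame, where the excitation is all zeros."""
    N = len(excitation)
    d = list(excitation) + [0.0] * N   # one frame of pulses, one frame of zeros
    return synthesize(d, ltp_gain, ltp_lag, a)[N:]
```

The returned sequence is the part of the previous frame's response that extends into the present frame, which is what the subtractor removes from the incoming speech before the excitation of the present frame is computed.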
- the multiplexor 300 combines the outputs of the circuits 136, 118 and 124, which are applied to a transmission line via an output terminal 302.
- the arrangement of Fig. 5 differs from that of Fig. 2 in that the former further includes a switch 500, a decision circuit 502, a gate 504 and a section 506.
- This section 506 is arranged in exactly the same manner as the arrangement of a section 508, although the functions of the two sections 506 and 508 are slightly different.
- each of the blocks 120′, 126′, 128′, 130′, 132′ and 136′ in the section 506 bears the same reference numeral as the counterpart in the section 508 but has a prime for the purposes of differentiation.
- the section 508 operates in the same manner as described above and hence further descriptions thereof will be omitted for clarity.
- since the blocks included in the section 506 operate in the same manner as their counterparts in the section 508, their operations will not be described in detail.
- the impulse response calculating circuit 120′ in the section 506 receives the decoded LAR′(i) at the coefficient weighting circuit 414 (Fig. 4), and determines an impulse response of a predetermined number of samples and applies the output h w ′(n) to the autocorrelation function calculating circuit 126′ as well as the cross-correlation function calculating circuit 128′.
- the autocorrelation function calculating circuit 126′ calculates an autocorrelation function R hh ′(
- the weighting circuit 136′ operates in the same manner as the counterpart 136, and applies the output thereof x w (n) to the cross-correlation function calculating circuit 128′.
- This circuit 128′ is further supplied with the impulse response h w ′(n), and calculates the cross-correlation function ⁇ xh ′(-m k ) (where $1 \le m_k \le N$), which is applied to the RPE grid selector 130′ and also to the pulse amplitude calculating circuit 132′.
- the grid selector 130′ determines or selects a grid q′, using the cross-correlation function ⁇ xh ′(-m k ), according to equation (23) and applies the selected grid q′ to the pulse amplitude calculating circuit 132′.
- the circuit 132′ is synchronously supplied with the above-mentioned three outputs (viz., the autocorrelation function R hh ′(
- the decision circuit 502 is coupled to the circuits 132 and 132′ to be supplied with the outputs: the autocorrelation functions R hh (
- the decision circuit 502 determines power or energy J of an error signal between the incoming and reconstructed signals, according to the following equation (25), in connection with each of the two excitation pulse series which are obtained at the sections 508 and 506. Equation (25) can be obtained by substituting equations (15) and (22) into equation (9).
- R xx (0) represents power or energy of the output x w (n) of the weighting circuit 136 (or 136′).
- Equation (26) utilizes an error of the cross-correlation function, which can be obtained by calculating the excitation pulse series.
- the decision circuit 502 compares the two kinds of power or energy: one obtained depending on the parameters from the section 508 (referred to as Jo) and the other obtained depending on the parameters from the section 506 (referred to as Jo′). In the event of Jo′ ≤ Jo, the decision circuit 502 determines that the excitation pulse series obtained through the section 506 is suitable for use relative to that obtained through the section 508. In this case, the decision circuit 502 instructs the switch 500 to relay the output of the section 506 to the pulse coding circuit 136. Further, the decision circuit 502 opens the gate 504 allowing the coded information (coded-LAR(i), coded-Md and coded- ⁇ ) to be applied to the multiplexor 300.
- coded information coded-LAR(i), coded-Md and coded- ⁇
- the gate 504 attaches a predetermined code to the coded-Md and - ⁇ ). Contrarily, in the event of Jo′>Jo, the decision circuit 502 forces the switch 500 to relay the output of the section 508 to the circuit 136, and opens the gate 504 to pass the above-mentioned coded information therethrough.
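The comparison performed by the decision circuit can be sketched as follows. Equations (25) and (26) are not reproduced in this text, so the error energy is computed here directly from the weighted signal and the reconstructed pulse response rather than from the correlation functions. That substitution, and the function names, are assumptions.

```python
import numpy as np


def weighted_error_energy(x_w, h_w, positions, amplitudes):
    """Energy of x_w minus the sum of amplitude-scaled, shifted copies of h_w."""
    x_w = np.asarray(x_w, dtype=float)
    h_w = np.asarray(h_w, dtype=float)
    N = len(x_w)
    y = np.zeros(N)
    for m, g in zip(positions, amplitudes):
        y[m:] += g * h_w[:N - m]
    e = x_w - y
    return float(np.dot(e, e))


def select_section(j_508, j_506):
    """Return "506" when its error energy does not exceed that of section 508."""
    return "506" if j_506 <= j_508 else "508"
```

In use, `j_508` would be computed with the impulse response that includes the pitch parameters and `j_506` with the response derived from the spectrum parameters only.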
- the impulse response calculating circuit 120 can be adapted to calculate the above-mentioned two functions h w (n) and h w ′(n). In this case the circuit 120 generates hw′(n) by making zero the parameters Md′ and ⁇ ′ which are applied to the coefficient weighting circuit 408. It goes without saying that h w (n) is first calculated and thereafter computation of the h w ′(n) is performed or vice versa, which can be applied to the other blocks wherein two kinds of computation are implemented.
- the second embodiment can be modified such that the pitch gain ⁇ is compared with a predetermined threshold. If the pitch gain ⁇ is less than the threshold, then the pitch gain ⁇ is set to zero. This means that the excitation pulses are generated using the spectrum parameters only. It is understood that this modification no longer requires the provision of the decision circuit 502 and the calculations of equations (25) and (26). This variation can reduce the number of operations.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
- The present invention relates generally to an arrangement and method for encoding a discrete-time speech signal using a regular pulse excitation scheme and more specifically to such an arrangement and method for encoding a speech signal at a low bit rate less than 16k-bit per second.
- In order to encode a speech signal with a limited number of calculations at a low bit rate (less than approximately 16k-bit per second), it is a known practice to model the characteristics of the human vocal tract using a digital filter and to represent the excitation signal by a combination of regular pulse sequences. Such a coding scheme is known as a Regular Pulse Excitation - Long Term Prediction - Linear Predictive Coder (hereinafter referred to as RPE-LTP), which has been proposed in the CEPT/CCH/GSM Recommendation 06.10 entitled "GSM Full Rate Speech Transcoding" published by Conference of European Postal and Telecommunications Administrations, September 19, 1988 (hereinafter referred to as Paper 1).
- Before describing the present invention, the regular pulse excitation coding scheme disclosed in
Paper 1 will be described with reference to Fig. 1. - In Fig. 1, an a/d (analog-to-digital) converted speech signal is applied via an
input terminal 10 to apre-processing circuit 12 on a frame by frame basis. The speech frame applied to thecircuit 12 is pre-processed to produce an offset-free signal, which signal is then subjected to a first order pre-emphasis filter. An original speech signal has been sampled at a rate of 8 kHz. Since the frame length is 20 ms in this prior art, the one frame consists of 160 signal samples. The 160 samples thus obtained are applied to a short term LPC (Linear Predictive Coding)analysis circuit 14 and also to a shortterm analysis filter 16. The 160 samples, applied to the short termLPC analysis circuit 14, are analyzed to determine 8 orders of reflection coefficients which represent a spectrum envelope of each frame. The LPC shortterm analysis circuit 14 further transforms or encodes the reflection coefficients to log area ratios (LAR), which are applied to the shortterm analysis circuit 16 and amultiplexor 30. The shortterm analysis circuit 16 decodes the LAR into the reflection coefficients and obtains 160 samples of short term residual signals. In the above, the term "short term analysis" has the same meaning as the spectrum envelope analysis. The short term residual signal, outputted from thefilter 16, is applied to asubtractor 18 and a longterm analysis circuit 22. - For the following operations, the long
term analysis circuit 22 divides the speech frame into 4 sub-frames (5 ms) each of which consists of 40 samples forming the short term residual signal. Each sub-frame is processed blockwise by the subsequent function blocks. - The long
term analysis circuit 22 produces a long term prediction (LTP) lag and an LTP gain on the basis of the two signals: the short term residual samples applied from thecircuit 16 and an output sequence from anadder 26. The term "long term analysis" has the same meaning as pitch analysis, and the LTP lag and the LTP gain respectively correspond to a pitch period and a pitch gain. - The
subtractor 18 outputs a block of 40 long term residual signal samples by subtracting the output of a longterm analysis filter 20 from the short term residual signal applied from thefilter 16. An excitation pulse calculating circuit 24, using the long term residual signal samples from thesubtractor 18, obtains an RPE grid of an excitation pulse sequence and an amplitude sequence of an excitation pulse series, which are encoded and fed to themultiplexor 30 and also to anexcitation pulse generator 28. In connection with an RPE grid, reference should be made toPaper 1. - The position of j-th excitation pulse (mj) within a sub-frame is given by the following equation.
mj = p·j + q (0 ≦ j ≦ N/p - 1 and 0 ≦ q ≦ p) (1)
where p denotes a predetermined pulse interval, q an RPE grid, and N the number of samples within one sub-frame. By expressing the output sequence of the subtractor 18 as xj(i), the RPE grid q of the excitation pulse sequence is obtained from the following equation. - The
excitation pulse generator 28 decodes the signal applied from the circuit 24 to determine an excitation pulse, which is fed to the adder 26. The adder 26 adds the excitation pulse from the circuit 28 and the output sequence of the long term analysis filter 20, and applies the resultant sum to the filter 20 as well as the analysis circuit 22. The long term analysis filter 20, utilizing the LTP lag and the LTP gain, both applied from the circuit 22, filters the output sequence of the adder 26. The output sequence of the filter 20 is fed back to the adder 26 and also applied to the subtractor 18. - The
multiplexor 30 combines the encoded outputs of the blocks output terminal 32. - However, the above-mentioned prior art has encountered the difficulty of low quality of the reconstructed or reproduced speech. This is because the amplitude of each excitation pulse is determined on the basis of the short term signal residual applied to the
subtractor 18. In other words, according to the prior art, the long term signal residual outputted from the subtractor 18 is shifted by an RPE grid and then quantized every predetermined number of samples. - Furthermore, the aforesaid prior art has encountered another problem in that the reproduced speech is degraded by quantizing distortion. This results from the fact that the number of quantizing bits is insufficient at a bit rate on the order of 13 kbps.
- It is an object of the present invention to provide an arrangement for encoding a discrete-time speech signal using a regular pulse excitation scheme.
- It is an object of the present invention to provide an arrangement for encoding a discrete-time speech signal at a low bit rate less than 16k-bit per second through the use of a regular pulse excitation scheme.
- Another object of the present invention is to provide a method for encoding a discrete-time speech signal at a low bit rate less than 16k-bit per second using a regular pulse excitation scheme.
- In brief, a pre-processing circuit is provided to receive a discrete-time speech signal, which is then divided into a plurality of frames. A parameter extracting circuit is coupled to the pre-processing circuit and extracts a plurality of parameters therefrom. An impulse response calculating circuit is coupled to receive the plurality of parameters from the parameter extracting circuit, and generates an impulse response function signal using the plurality of parameters. An autocorrelation function calculating circuit is coupled to receive the impulse response function signal and generates an autocorrelation function signal using the signal applied. A cross-correlation function calculating circuit generates a cross-correlation function signal using the discrete-time speech signal and the autocorrelation function signal. A grid signal generator receives the output of the cross-correlation function calculating circuit, and outputs a grid signal indicative of a location of a first excitation pulse within one frame. A pulse amplitude calculating circuit receives the autocorrelation function signal, the cross-correlation function signal and the grid signal, and determines an amplitude sequence of excitation pulses within one frame.
- One aspect of this invention takes the form of an arrangement for encoding a speech signal using a regular pulse excitation scheme, comprising: first means, the first means being supplied with a discrete-time speech signal and dividing the discrete-time speech signal into a plurality of frames; second means, the second means extracting a plurality of parameters from the first means; third means, the third means generating an impulse response function signal using the plurality of parameters; fourth means, the fourth means generating an autocorrelation function signal using the impulse response signal; fifth means, the fifth means generating a cross-correlation function signal using the discrete-time speech signal and the autocorrelation function signal; sixth means, the sixth means generating a grid signal indicative of a location of a first excitation pulse within one frame; and seventh means, the seventh means receiving the autocorrelation function signal, the cross-correlation function signal and the grid signal, the seventh means determining an amplitude sequence of excitation pulses within one frame.
- Another aspect of this invention takes the form of an arrangement for encoding a speech signal using a regular pulse excitation scheme, comprising: first means, the first means being supplied with a discrete-time speech signal and dividing the discrete-time speech signal into a plurality of frames; second means, the second means extracting a first parameter representative of a spectrum envelope from the first means and encoding the first parameter, the second means decoding the encoded first parameter and obtaining the decoded first parameter; third means, the third means extracting second and third parameters from the first means, the second and third parameters being respectively representative of a pitch period and a pitch gain, the third means decoding the coded second and third parameters and obtaining the decoded second and third parameters; fourth means, the fourth means generating an impulse response function signal using the decoded first, second and third parameters; fifth means, the fifth means generating an autocorrelation function signal using the impulse response function signal; sixth means, the sixth means generating a cross-correlation function signal using the discrete-time speech signal and the autocorrelation function signal; seventh means, the seventh means generating a grid signal indicative of a location of a first excitation pulse within one frame; and eighth means, the eighth means receiving the autocorrelation function signal, the cross-correlation function signal and the grid signal, the eighth means determining an amplitude sequence of excitation pulses within one frame.
- Still another aspect of this invention takes the form of a method for encoding a speech signal using a regular pulse excitation scheme, comprising the steps of: (a) receiving a discrete-time speech signal and dividing the discrete-time speech signal into a plurality of frames; (b) extracting a plurality of parameters from the discrete-time speech signal; (c) generating an impulse response function signal using the plurality of parameters; (d) generating an autocorrelation function signal using the impulse response signal; (e) generating a cross-correlation function signal using the discrete-time speech signal and the autocorrelation function signal; (f) generating a grid signal indicative of a location of a first excitation pulse within one frame; and (g) receiving the autocorrelation function signal, the cross-correlation function signal and the grid signal, and determining an amplitude sequence of excitation pulses within one frame.
- The features and advantages of the present invention will become more clearly appreciated from the following description taken in conjunction with the accompanying drawings in which like elements are denoted by like reference numerals and in which:
- Fig. 1 is a block diagram illustrating a known RPE scheme, the drawing having been referred to in the opening paragraphs of this specification;
- Fig. 2 is a block diagram showing a first embodiment of this invention;
- Figs. 3 and 4 are each a block diagram showing in detail a block in Fig. 2; and
- Fig. 5 is a block diagram showing a second embodiment of this invention.
- The present invention is characterized by algorithms for calculating an amplitude of each of the excitation pulses. It should be noted that the location of the excitation pulse can be determined in accordance with the prior art disclosed in
Paper 1. The above mentioned algorithms will be discussed below. - According to a so-called RPE coding scheme, the location of a j-th excitation pulse within a frame can be specified by equation (1). For the convenience of description, equation (1) is again shown as equation (3).
mj = p·j + q (0 ≦ j ≦ N/p -1 and 0 ≦ q ≦ p) (3)
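As a small illustration of equation (3), the following Python sketch lists the pulse positions on a regular grid. The function name pulse_positions and the example values (N = 40 samples, pulse interval p = 4, grid q = 1) are assumptions chosen for illustration only and are not taken from the specification.

```python
def pulse_positions(n_samples, p, q):
    """Return the positions m_j = p*j + q for j = 0 .. N/p - 1 (equation (3)).

    n_samples (N), p and q are plain integers; the values used in the
    example below are hypothetical.
    """
    if p <= 0:
        raise ValueError("pulse interval p must be positive")
    return [p * j + q for j in range(n_samples // p)]


# Example: 40 samples, one pulse every 4 samples, grid offset 1.
positions = pulse_positions(40, 4, 1)
print(positions)
```

Because the pulses are equally spaced, only the grid q and the amplitudes need to be transmitted; the individual positions follow from equation (3).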
The algorithm for obtaining the RPE grid q will be described later. - An excitation pulse sequence d(n) can be represented by
- Fig. 3 shows a
synthesis filter 122 which comprises two digital filters 310 and 320. The filter 310 includes an adder 322, a coefficient weighting circuit 324 and a delay 326. Similarly, the filter 320 includes an adder 328, a coefficient weighting circuit 330 and a delay 332. The synthesis filter 122 forms part of the arrangement shown in Fig. 2, and will again be referred to later. Consequently, the detailed description of Fig. 3 will be postponed. - The
filter 310 is a long term prediction filter whose output represents a pitch structure, while the filter 320 is a short term prediction filter whose output represents spectrum envelope characteristics. For simplifying the description, it will be assumed that the filter 310 is of a first order type. The synthesis filter 122 is supplied with the excitation pulse series and outputs a reconstructed signal sequence x′(n) in accordance with the following equation: long term filter 310, Md an LTP lag indicative of a pitch period of an incoming speech signal. Further, in equation (5), xd(n) denotes an output signal of the filter 310, Np a prediction order of the short term prediction filter 320, and ai (1≦ i ≦Np) a prediction coefficient of the filter 320 (ai corresponds to LAR in Fig. 3). β and Md can be obtained in accordance with the prior art techniques disclosed in Paper 1. As an alternative, β and Md can be determined from the peak amplitude of the autocorrelation function sequence of an input speech signal and the position of said peak. Algorithms for doing so are disclosed in the document entitled "Adaptive predictive coding of speech signals" by B.S. Atal et al., pages 1973 to 1986, The Bell System Technical Journal, October 1970 (referred to as Paper 2). - By defining the impulse response of the
synthesis filter 122 as h(i) (0 ≦ i ≦ M-1 (M is the number of continuous samples)), the reconstructed signal x′(n) is given by:
x′(n) = d(n) * h(n) (6)
where the symbol * denotes convolution. Further, the weighted square error J between the input speech signal x(n) and the reproduced signal x′(n) within one frame can be represented by: synthesis filter 300 and r is a constant (0≦ r ≦1) which determines the frequency characteristics of W(Z). In more detail, in the event of r=1 then W(Z)=1. This means that the frequency characteristics are flat. On the other hand, when r=0 then W(Z) represents the inverse frequency characteristics of the synthesis filter 122. It follows that the value of r is able to change the characteristics of W(Z). The reason why W(Z) is determined depending upon the frequency characteristics of the synthesis filter 122, as shown in equation (8), is that an auditory masking effect is utilized. In more detail, at a portion where the power of the spectrum of the input speech signal is large (for example in the vicinity of a formant), even if the difference or error between the spectra of the input and reconstructed signals is somewhat large, such error is hardly perceived by the ear. - Algorithms for calculating an excitation pulse series which minimizes the weighted square error J shown in equation (7) will be discussed in the following. Equation (7) is rewritten as follows.
xw′(n) = x′(n) * w(n) (10)
and by performing Z conversion on both sides of equation (10), we obtain:
Xw′(Z) = X′(Z)·W(Z) (11)
Further, X′(Z) can be expressed as follows:
X′(Z) = H(Z)·D(Z) (12)
where D(Z) represents the Z conversion of the excitation pulse series given by equation (4), and H(Z) the Z conversion value of the impulse response of the synthesis filter 122. Substituting equation (12) into equation (11) gives:
Xw′(Z) = H(Z)·D(Z)·W(Z) (13)
By setting Hw(Z)=H(Z)·W(Z) and then implementing an inverse Z conversion on equation (13), we obtain
xw′(n) = d(n) * hw(n) (14)
where hw(n) denotes an inverse Z conversion value of Hw(Z) and indicates the impulse response of a cascade coupled filter comprising a synthesis filter and a weighting circuit. By substituting equation (4) into equation (14), we obtain - The following equation can be obtained by partially differentiating equation (16) with respect to gk and then setting the result to zero, where gk is an amplitude of the excitation pulse for minimizing equation (16).
- As will be understood from equation (17), the amplitude gk of each of the excitation pulses is a function of the location mk of the corresponding excitation pulse. This means that the most desirable amplitude gk at a given pulse position mk can be computed. If an incoming speech signal sequence is assumed stationary, then the covariance function φhh(mi,mk) can be represented by the following equation (20).
φhh(mi,mk) = Rhh(|mi - mk|) (20) - This equation indicates that under the above-mentioned assumption, φhh(mi,mk) is equal to an autocorrelation function Rhh(·) which depends on a delay |mi - mk|.
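The relationship in equation (20) can be illustrated with a short Python sketch. It assumes the conventional definition of the autocorrelation of the weighted impulse response, Rhh(τ) = Σ hw(n)·hw(n+τ), and fills a matrix whose (i, k) entry is Rhh(|mi - mk|). The function names and the sample impulse response are hypothetical and serve only as an example.

```python
def autocorrelation(hw, max_lag):
    """Rhh(lag) = sum_n hw[n] * hw[n + lag] for lag = 0 .. max_lag.

    Assumes the usual finite-length definition; lags beyond the length
    of hw simply give 0.
    """
    return [
        sum(hw[n] * hw[n + lag] for n in range(len(hw) - lag))
        for lag in range(max_lag + 1)
    ]


def covariance_from_autocorrelation(positions, rhh):
    """Matrix with entries Rhh(|m_i - m_k|), as in equation (20)."""
    return [[rhh[abs(mi - mk)] for mk in positions] for mi in positions]


# Hypothetical weighted impulse response and pulse positions.
hw = [1.0, 0.5, 0.25]
positions = [0, 2, 4]
rhh = autocorrelation(hw, max(positions) - min(positions))
phi_hh = covariance_from_autocorrelation(positions, rhh)
```

Since every entry depends only on the distance between two pulse positions, the matrix is symmetric with a constant diagonal, which keeps the amount of computation small.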
- The value of RPE grid q is calculated using the autocorrelation function obtained by equation (18). That is to say, the RPE grid q can be determined so as to satisfy the following equation.
Paper 1 merely by way of example. - According to the present invention, an amplitude sequence of the excitation signal can be precisely obtained using equation (22), and hence a high quality reproduced voice can be realized.
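The amplitude calculation can be sketched in Python under stated assumptions. Minimizing the weighted square error with respect to every gk, together with the stationarity assumption of equation (20), leads to a set of linear equations of the form Σi gi·Rhh(|mi - mk|) = φxh(mk). The sketch below assumes this form, zero-based sample positions, and a cross-correlation defined as Σn xw(n)·hw(n - mk); the function names and the sign convention are illustrative choices, not the notation of the specification.

```python
def autocorr(hw, lag):
    """Rhh(lag) = sum_n hw[n] * hw[n + lag]; zero when lag >= len(hw)."""
    return sum(hw[n] * hw[n + lag] for n in range(len(hw) - lag))


def cross_corr(xw, hw, m):
    """sum_n xw[n] * hw[n - m], restricted to samples inside the frame."""
    return sum(xw[m + i] * hw[i] for i in range(len(hw)) if m + i < len(xw))


def solve_linear(a, b):
    """Solve a*g = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    g = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * g[c] for c in range(r + 1, n))
        g[r] = (aug[r][n] - tail) / aug[r][r]
    return g


def pulse_amplitudes(xw, hw, positions):
    """Amplitudes g_k for pulses at the given positions (assumed model)."""
    a = [[autocorr(hw, abs(mi - mk)) for mi in positions] for mk in positions]
    b = [cross_corr(xw, hw, mk) for mk in positions]
    return solve_linear(a, b)
```

With a weighted signal built from known amplitudes at positions [0, 2, 4] and no truncation at the frame end, pulse_amplitudes returns those amplitudes up to rounding error, which is a convenient sanity check for the assumed equations.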
- A first embodiment of this invention will be discussed with reference to Figs. 2 to 4.
- As previously mentioned in connection with Fig. 1, an a/d (analog-to-digital) converted speech signal is applied via an
input terminal 110 to a pre-processing circuit 112 on a frame by frame basis. The pre-processing circuit 112 can be configured in the same manner as the circuit 12 of Fig. 1. The speech frame applied to the circuit 112 is pre-processed to produce an offset-free signal, which is then subjected to a first order pre-emphasis filter. The original speech signal applied to the input terminal 110 has been sampled at a predetermined rate such as 8 kHz. In the event that the frame length is 20 ms as in the prior art, merely by way of example, one frame consists of 160 signal samples. The samples thus obtained are applied to a short term LPC (Linear Predictive Coding) analysis circuit 114 and also to a long term (pitch) analysis filter 116. - The one frame samples, applied to the short term
LPC analysis circuit 114, are analyzed to determine predetermined orders of reflection coefficients (LAR(i)) (i=1···8) in the same manner as disclosed in Paper 1. The reflection coefficients represent a spectrum envelope of each frame. An LAR coding circuit 118 is supplied with the LAR(i)s and transforms or encodes them into log area ratios (coded-LAR(i)) based on predetermined quantizing levels (quantizing bits), and then applies them to a multiplexor 300. Further, the LAR coding circuit 118 decodes the coded-LAR(i)s, applying the decoded LAR′(i) to an impulse response calculating circuit 120 as well as a synthesis filter 122. - The long
term analysis circuit 116 receives the one frame samples from the pre-processing circuit 112, calculating LTP lag Md and LTP gain β in accordance with the algorithms disclosed in the above-mentioned Paper 2. The Md, β are fed to a long term (pitch) coding circuit 124, which encodes the Md, β and applies the coded-Md and coded-β to the multiplexor 300. Further, the long term coding circuit 124 decodes the coded-Md and the coded-β into Md′ and β′, respectively. The decoded LTP lag (Md′) and the decoded LTP gain (β′) are applied to the impulse response calculating circuit 120 and also to the synthesis filter 122. - As shown in Fig. 4, the impulse
response calculating circuit 120 comprises an impulse generator 400, a long term prediction (LTP) filter 402 and a short term prediction (STP) filter 404, which are coupled in series. The LTP filter 402 includes an adder 406, a coefficient weighting circuit 408 and a delay circuit 410. Similarly, the STP filter 404 includes an adder 412, a coefficient weighting circuit 414 and a delay circuit 416. The operation of each of the filters coefficient weighting circuit 408, while the decoded LAR′(i) to the coefficient weighting circuit 414. - The impulse
response calculating circuit 120 determines an impulse response of a predetermined number of samples and applies the output hw(n) to an autocorrelation function calculating circuit 126 and a cross-correlation function calculating circuit 128. - The
circuit 126 calculates an autocorrelation function Rhh(|mi - mk|) according to equation (21), and applies the result to a pulse amplitude calculating circuit 132. - A
subtractor 134, coupled to the pre-processing circuit 112 and the synthesis filter 122, subtracts the output sequence of the filter 122 from the speech signal sequence x(n), and applies the resultant difference to a weighting circuit 136. The synthesis filter 122 has already stored one frame of response signal sequence, which is obtained by driving the filter with the excitation pulse sequence of the preceding frame and then extending the response into the present frame with the excitation signal set to zero. This is based on the consideration that, if the effective length of the impulse response of the synthesis filter is assumed to be at most about two frames, the speech signal sequence of the present frame can be expressed as the sum of two sequences: the response to the preceding frame's excitation pulses carried over into the present frame with zero excitation, and the output of the synthesis filter driven by the excitation pulse sequence of the present frame. - The
weighting circuit 136 is supplied with the parameter LAR′(i) from the LAR coding circuit 118, and calculates the weighting function w(n) in a manner that the Z conversion value thereof satisfies equation (8). This calculation can alternatively be implemented through the use of another frequency weighting scheme. The weighting circuit 136 performs a convolution of the difference from the subtractor 134 and the function w(n), and applies the output thereof xw(n) to the cross-correlation function circuit 128. This circuit 128 is further supplied with the impulse response hw(n), and calculates the cross-correlation function φxh(-mk) (where 1≦ mk ≦N) which is applied to the RPE grid selector 130 and also to the pulse amplitude calculating circuit 132. - The
grid selector 130 determines or selects a grid q, using the cross-correlation function φxh(-mk), according to equation (23) and applies the selected grid to the pulse amplitude calculating circuit 132. The circuit 132 is synchronously supplied with the above-mentioned three outputs (viz., the autocorrelation function Rhh(|mi - mk|), the cross-correlation function φxh(-mk) and the selected grid q), and determines an amplitude of each of the excitation pulses within one frame. In other words, the circuit 132 determines a so-called amplitude sequence of the excitation pulses in one frame. - A
pulse coding circuit 136 receives the output sequence of the circuit 132 and encodes the selected grid q and the amplitude sequence gk of the excitation pulses using normalizing coefficients, and applies the encoded information to the multiplexor 300. The normalizing coefficients are also encoded within the pulse coding circuit 136 and applied to the multiplexor 300. The circuit 136 further decodes the encoded data (viz., the grid, the amplitude sequence and the normalizing coefficients), applying them to a pulse sequence generator 138. The decoded grid and the decoded amplitude sequence are respectively denoted by q′ and gk′. The operation of the pulse coding circuit 136 has been disclosed in the above-mentioned Paper 1. - The
pulse sequence generator 138 outputs an excitation pulse sequence of one frame using gk′ and mk′, which pulse sequence has an amplitude gk′ at a position mk′. - The
synthesis filter 122 receives the excitation pulse sequence, and also receives the coefficients LAR′(i) and the pitch information (Md′ and β′) from the circuits synthesis filter 122 converts LAR′(i) into a prediction parameter ai (1≦ i ≦Np) by means of a well known method. The filter 122 appends one frame of zeros to the excitation signal applied thereto and determines a response signal sequence x′(n) for the resulting two-frame signal. The sequence x′(n) can be represented by: pulse generating circuit 138 when 1≦ n ≦N, while representing a series of all zeros in the case of (N + 1)≦ n ≦2N. The subtractor 134 receives x′(n) obtained using equation (24) (wherein N + 1≦ n ≦2N). - The
multiplexor 300 combines the outputs of the circuit output terminal 302. - A second embodiment of this invention will be discussed with reference to Figs. 3 to 5. The arrangement of Fig. 5 differs from that of Fig. 2 in that the former arrangement further includes a
switch 500, a decision circuit 502, a gate 504 and a section 506. This section 506 is arranged in exactly the same manner as the arrangement of a section 508, although the functions of the two sections blocks 120′, 126′, 128′, 130′, 132′ and 136′ in the section 506 bears the same reference numeral as the counterpart in the section 508 but has a prime for the purposes of differentiation. The section 508 operates in the same manner as described above and hence further descriptions thereof will be omitted for brevity. Similarly, in the case where the blocks included in the section 506 operate in the same manner as their counterparts in the section 508, the operations thereof may not be described for brevity. - The impulse
response calculating circuit 120′ in the section 506 receives the decoded LAR′(i) at the coefficient weighting circuit 414 (Fig. 4), determines an impulse response of a predetermined number of samples, and applies the output hw′(n) to the autocorrelation function calculating circuit 126′ as well as the cross-correlation function calculating circuit 128′. This means that the circuit 120′ utilizes only the short term prediction filter 404. It should be noted that, as shown in Fig. 5, the line provided for the pitch information (Md′ and β′) is not coupled to the block 120′, so that the long term prediction filter 402 is disabled. - The autocorrelation
function calculating circuit 126′ calculates an autocorrelation function Rhh′(|mi - mk|) according to equation (21), and applies the result to the pulse amplitude calculating circuit 132′. The weighting circuit 136′ operates in the same manner as the counterpart 136, and applies the output thereof xw(n) to the cross-correlation function calculating circuit 128′. This circuit 128′ is further supplied with the impulse response hw′(n), and calculates the cross-correlation function φxh′(-mk) (where 1≦ mk ≦N) which is applied to the RPE grid selector 130′ and also to the pulse amplitude calculating circuit 132′. - The
grid selector 130′ determines or selects a grid q′, using the cross-correlation function φxh′(-mk), according to equation (23) and applies the selected grid q′ to the pulse amplitude calculating circuit 132′. - The
circuit 132′ is synchronously supplied with the above-mentioned three outputs (viz., the autocorrelation function Rhh′(|mi - mk|), the cross-correlation function φxh′(-mk) and the selected grid q′), and determines an amplitude of each of the excitation pulses within one frame. - The
decision circuit 502 is coupled to the circuits decision circuit 502 determines power or energy J of an error signal between the incoming and reconstructed signals, according to the following equation (25), in connection with each of the two excitation pulse series which are obtained at the sections - Alternatively, the error signal energy can approximately be obtained using the following equation (26) instead of equation (25).
J = Σ φxh²(-mi)/Rhh (26)
Equation (26) utilizes an error of the cross-correlation function, which can be obtained by calculating the excitation pulse series. - The
decision circuit 502 compares the two kinds of power or energy: one obtained depending on the parameters from the section 508 (referred to as Jo) and the other obtained depending on the parameters from the section 506 (referred to as Jo′). In the event of Jo′<Jo, the decision circuit 502 determines that the excitation pulse series obtained through the section 506 is more suitable than that obtained through the section 508. In this case, the decision circuit 502 instructs the switch 500 to relay the output of the section 506 to the pulse coding circuit 136. Further, the decision circuit 502 opens the gate 504, allowing the coded information (coded-LAR(i), coded-Md and coded-β) to be applied to the multiplexor 300. In this case, the gate 504 attaches a predetermined code to the coded-Md and coded-β. Contrarily, in the event of Jo′>Jo, the decision circuit 502 forces the switch 500 to relay the output of the section 508 to the circuit 136, and opens the gate 504 to pass the above-mentioned coded information therethrough. - As shown, the two
sections response calculating circuit 120 can be adapted to calculate the above-mentioned two functions hw(n) and hw′(n). In this case the circuit 120 generates hw′(n) by setting to zero the parameters Md′ and β′ which are applied to the coefficient weighting circuit 408. It goes without saying that hw(n) may be calculated first and hw′(n) thereafter, or vice versa; the same applies to the other blocks in which two kinds of computation are implemented. - In the above-mentioned embodiments, various calculations can be carried out on a sub-frame basis as in the prior art.
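The sub-frame arithmetic used in the prior art description (8 kHz sampling, 20 ms frames of 160 samples, four 5 ms sub-frames of 40 samples) can be sketched in Python as follows. The function names are hypothetical and the code is only an illustration of the bookkeeping, not of any particular circuit.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples in one frame, e.g. 8000 Hz * 20 ms = 160."""
    return sample_rate_hz * frame_ms // 1000


def split_into_subframes(frame, n_subframes):
    """Split one frame into equally sized, consecutive sub-frames."""
    if len(frame) % n_subframes != 0:
        raise ValueError("frame length must be a multiple of n_subframes")
    size = len(frame) // n_subframes
    return [frame[i * size:(i + 1) * size] for i in range(n_subframes)]


# One 20 ms frame at 8 kHz, filled with placeholder sample values.
frame = list(range(samples_per_frame(8000, 20)))
subframes = split_into_subframes(frame, 4)
```

Each sub-frame can then be passed through the same grid selection and amplitude calculation that would otherwise be applied to the whole frame.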
- The second embodiment can be modified such that the pitch gain β is compared with a predetermined threshold. If the pitch gain β is less than the threshold then the pitch gain β is rendered zero. This means that the excitation pulses are generated using the spectrum parameters only. It is understood that this modification no longer requires the provision of the
decision circuit 502 and the calculations of equations (25) and (26). This variation can reduce the number of operations. - While the foregoing description describes only two embodiments of the present invention, the various alternatives and modifications possible without departing from the scope of the present invention, which is limited only by the appended claims, will be apparent to those skilled in the art.
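The two ways of deciding whether pitch information is used in the second embodiment, namely the comparison of the error energies Jo and Jo′ and the simpler pitch-gain threshold, can be contrasted in a short Python sketch. The function names, the threshold value and the handling of the case Jo′ = Jo (which the description does not specify) are assumptions made for illustration.

```python
def select_section(jo, jo_prime):
    """Return 506 (spectrum parameters only) when Jo' < Jo, otherwise 508.

    The tie case Jo' == Jo is not specified in the description; this
    sketch arbitrarily falls back to section 508.
    """
    return 506 if jo_prime < jo else 508


def gate_pitch_gain(beta, threshold):
    """Force the pitch gain to zero when it is below the threshold."""
    return 0.0 if beta < threshold else beta


# Hypothetical values: a weak pitch gain is suppressed, so that the
# excitation pulses are computed from the spectrum parameters only.
beta_used = gate_pitch_gain(0.2, 0.3)
```

The threshold variant avoids computing both candidate excitation sequences and their error energies, at the cost of a decision that no longer depends on the actual reconstruction error.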
Claims (8)
first means, said first means being supplied with a discrete-time speech signal and dividing said discrete-time speech signal into a plurality of frames;
second means, said second means extracting a plurality of parameters from said first means;
third means, said third means generating an impulse response function signal using said plurality of parameters;
fourth means, said fourth means generating an autocorrelation function signal using said impulse response signal;
fifth means, said fifth means generating a cross-correlation function signal using said discrete-time speech signal and said autocorrelation function signal;
sixth means, said sixth means generating a grid signal indicative of a location of a first excitation pulse within one frame; and
seventh means, said seventh means receiving said autocorrelation function signal, said cross-correlation function signal and said grid signal, said seventh means determining an amplitude sequence of excitation pulses within one frame.
eighth means, said eighth means extracting a first parameter representative of a spectrum envelope from said first means and encoding the first parameter, said second means decoding the encoded first parameter and obtaining the decoded first parameter; and
ninth means, said ninth means extracting second and third parameters from said first means, said second and third parameters being respectively representative of a pitch period and a pitch gain, said ninth means decoding the coded second and third parameters and obtaining the decoded second and third parameters,
wherein the decoded first, second and third parameters are applied to said third means.
an impulse generator for generating an impulse;
a long term prediction filter, said long term prediction filter receiving said impulse as well as said second and third parameters; and
a short term prediction filter, said short term prediction filter being coupled in series with said long term prediction filter, said short term prediction filter receiving said first parameter and the output of said long term prediction filter.
first means, said first means being supplied with a discrete-time speech signal and dividing said discrete-time speech signal into a plurality of frames;
second means, said second means extracting a first parameter representative of a spectrum envelope from said first means and encoding the first parameter, said second means decoding the encoded first parameter and obtaining the decoded first parameter;
third means, said third means extracting second and third parameters from said first means, said second and third parameters being respectively representative of a pitch period and a pitch gain, said third means decoding the coded second and third parameters and obtaining the decoded second and third parameters;
fourth means, said fourth means generating an impulse response function signal using the decoded first, second and third parameters;
fifth means, said fifth means generating an autocorrelation function signal using said impulse response function signal;
sixth means, said sixth means generating a cross-correlation function signal using said discrete-time speech signal and said autocorrelation function signal;
seventh means, said seventh means generating a grid signal indicative of a location of a first excitation pulse within one frame; and
eighth means, said eighth means receiving said autocorrelation function signal, said cross-correlation function signal and said grid signal, said eighth means determining an amplitude sequence of excitation pulses within one frame.
an impulse generator for generating an impulse;
a long term prediction filter, said long term prediction filter receiving said impulse as well as said second and third parameters; and
a short term prediction filter, said short term prediction filter being coupled in series with said long term prediction filter, said short term prediction filter receiving said first parameter and the output of said long term prediction filter.
extracting a first parameter representative of a spectrum envelope from said discrete-time speech signal and encoding the first parameter, decoding the encoded first parameter and obtaining the decoded first parameter; and
extracting second and third parameters from said discrete-time speech signal wherein said second and third parameters being respectively representative of a pitch period and a pitch gain, and decoding the coded second and third parameters and obtaining the decoded second and third parameters,
wherein the decoded first, second and third parameters correspond to said plurality of parameters in said step (c).
generating an impulse;
receiving said impulse as well as said second and third parameters, and generating an output representative of a pitch structure; and
receiving said parameter and said output representative of a pitch structure, and generating an output representative of spectrum envelope characteristics.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP1150770A JPH0315900A (en) | 1989-06-14 | 1989-06-14 | Audio signal encoding device |
JP150770/89 | 1989-06-14 | ||
JP254458/89 | 1989-09-29 | ||
JP1254458A JP2900431B2 (en) | 1989-09-29 | 1989-09-29 | Audio signal coding device |
Publications (3)
Publication Number | Publication Date |
---|---|
EP0402947A2 (en) | 1990-12-19 |
EP0402947A3 (en) | 1991-10-23 |
EP0402947B1 (en) | 1997-11-26 |
Family
ID=26480256
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19900111360 Expired - Lifetime EP0402947B1 (en) | 1989-06-14 | 1990-06-15 | Arrangement and method for encoding speech signal using regular pulse excitation scheme |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP0402947B1 (en) |
DE (1) | DE69031749T2 (en) |
1990
- 1990-06-15: EP application EP19900111360, patent EP0402947B1 (en), not active, Expired - Lifetime
- 1990-06-15: DE application DE1990631749, patent DE69031749T2 (en), not active, Expired - Fee Related
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0374941A2 (en) * | 1988-12-23 | 1990-06-27 | Nec Corporation | Communication system capable of improving a speech quality by effectively calculating excitation multipulses |
Non-Patent Citations (3)
Title |
---|
ICASSP'85 (IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Tampa, Florida, 26th - 29th March 1985), vol. 1, pages 256-259, IEEE, New York, US; J.P. ADOUL et al.: "Generalization of the multipulse coding for low bit rate coding purposes: the generalized decimation" * |
ICASSP'87 (1987 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Dallas, Texas, 6th - 9th April 1987), vol. 2, pages 968-971, IEEE, New York, US; A. FUKUI et al.: "Implementation of a multi-pulse speech codec with pitch prediction on a single chip floating-point signal processor" * |
ICASSP'88 (1988 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, New York, 11th - 14th April 1988), vol. 1, pages 227-230, IEEE, New York, US; P. VARY et al.: "Speech codec for the European mobile radio system" * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0545403A2 (en) * | 1991-12-03 | 1993-06-09 | Nec Corporation | Speech signal encoding system capable of transmitting a speech signal at a low bit rate |
EP0545403A3 (en) * | 1991-12-03 | 1993-07-07 | Nec Corporation | Speech signal encoding system capable of transmitting a speech signal at a low bit rate |
Also Published As
Publication number | Publication date |
---|---|
EP0402947A3 (en) | 1991-10-23 |
EP0402947B1 (en) | 1997-11-26 |
DE69031749D1 (en) | 1998-01-08 |
DE69031749T2 (en) | 1998-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0409239B1 (en) | Speech coding/decoding method | |
KR100417836B1 (en) | High frequency content recovering method and device for over-sampled synthesized wideband signal | |
EP1202251B1 (en) | Transcoder for prevention of tandem coding of speech | |
DE60011051T2 (en) | CELP TRANS CODING | |
CA2160749C (en) | Speech coding apparatus, speech decoding apparatus, speech coding and decoding method and a phase amplitude characteristic extracting apparatus for carrying out the method | |
US5125030A (en) | Speech signal coding/decoding system based on the type of speech signal | |
US4716592A (en) | Method and apparatus for encoding voice signals | |
US4821324A (en) | Low bit-rate pattern encoding and decoding capable of reducing an information transmission rate | |
US6681204B2 (en) | Apparatus and method for encoding a signal as well as apparatus and method for decoding a signal | |
EP0802524B1 (en) | Speech coder | |
JP2002055699A (en) | Device and method for encoding voice | |
KR100218214B1 (en) | Apparatus for encoding voice and apparatus for encoding and decoding voice | |
US20030142699A1 (en) | Voice code conversion method and apparatus | |
US4945565A (en) | Low bit-rate pattern encoding and decoding with a reduced number of excitation pulses | |
US20040111257A1 (en) | Transcoding apparatus and method between CELP-based codecs using bandwidth extension | |
JP2903533B2 (en) | Audio coding method | |
US6006178A (en) | Speech encoder capable of substantially increasing a codebook size without increasing the number of transmitted bits | |
CA1229681A (en) | Method and apparatus for speech-band signal coding | |
EP0744069B1 (en) | Burst excited linear prediction | |
EP0534442B1 (en) | Vocoder device for encoding and decoding speech signals | |
EP0402947B1 (en) | Arrangement and method for encoding speech signal using regular pulse excitation scheme | |
US5708756A (en) | Low delay, middle bit rate speech coder | |
JP2900431B2 (en) | Audio signal coding device | |
JP3047761B2 (en) | Audio coding device | |
JP3089967B2 (en) | Audio coding device |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 19900710 |
AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): BE DE FR GB NL SE |
PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): BE DE FR GB NL SE |
17Q | First examination report despatched | Effective date: 19940926 |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): BE DE FR GB NL SE |
REF | Corresponds to: | Ref document number: 69031749; Country of ref document: DE; Date of ref document: 19980108 |
ET | Fr: translation filed | |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: BE; Payment date: 20000522; Year of fee payment: 11 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: SE; Payment date: 20000605; Year of fee payment: 11 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20000612; Year of fee payment: 11 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20000614; Year of fee payment: 11. Ref country code: DE; Payment date: 20000614; Year of fee payment: 11 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: NL; Payment date: 20000629; Year of fee payment: 11 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20010615 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20010616 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20010630 |
BERE | Be: lapsed | Owner name: NEC CORP.; Effective date: 20010630 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: NL; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20020101 |
EUG | Se: european patent has lapsed | Ref document number: 90111360.5 |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20010615 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20020228 |
NLV4 | Nl: lapsed or annulled due to non-payment of the annual fee | Effective date: 20020101 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20020403 |