JPH08202396A

JPH08202396A - Voice prediction coding method

Info

Publication number: JPH08202396A
Application number: JP7009615A
Authority: JP
Inventors: Jiyoutarou Ikedo; 丈太朗池戸
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-01-25
Filing date: 1995-01-25
Publication date: 1996-08-09

Abstract

PURPOSE: To select a noise excitation source with a small amount of arithmetic processing and to eliminate the need for a memory that stores a plurality of noise excitation sources. CONSTITUTION: In a speech predictive coding method that encodes speech by reproducing it, a synthesis filter 4, in which quantized filter coefficients calculated from a plurality of samples of the input speech are set, is driven frame by frame with time-series vector components output from two excitation sources: a pitch excitation source 11 representing the pitch component and a noise excitation source 31 representing the noise component. Each frame is further divided into a plurality of subframes, the noise excitation source 31 is represented by placing one pulse in each subframe, and the position and amplitude of each pulse are quantized and encoded for each subframe.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] This invention relates to a speech predictive coding method, and in particular to a speech predictive coding method that encodes speech by reproducing it through a synthesis filter driven with two excitation sources: a pitch excitation source representing the pitch component and a noise excitation source representing the noise component.

[0002]

[Prior Art] VSELP, CS-CELP, and other methods are known as speech predictive coding methods that drive a synthesis filter with two excitation sources. They are disclosed in I. A. Gerson and M. A. Jasiuk: "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbps", Proc. ICASSP '90, pp. 461-464, 1994; in A. Kataoka, T. Moriya and S. Hayashi: "An 8-kbit/s Speech Coder Based on Conjugate Structure CELP", Proc. ICASSP '93, pp. 592-595, 1993; and in other literature.

[0003] A conventional example of the speech predictive coding method is now described with reference to FIG. 4. First, the sequence of sample values of the input speech waveform entering at input terminal 1 is supplied to the filter coefficient determination unit 2, where the filter coefficients are calculated. The calculated filter coefficients are then supplied to the filter coefficient quantization unit 3, where they are quantized, and the quantized filter coefficients are supplied to the synthesis filter 4 to set up the synthesis filter.

[0004] The excitation that drives this synthesis filter 4 consists of two excitation sources. One is the pitch excitation source 11 and the other is the noise excitation source 21. The pitch excitation source 11 consists of a plurality of pitch period components; the selected pitch period component candidate is multiplied by a pitch gain in the pitch gain multiplication unit 12 and output. The noise excitation source 21 consists of a plurality of noise waveform components; the selected noise excitation source candidate is multiplied by a noise gain in the noise gain multiplication unit 22 and output. The synthesis filter 4 is driven by the sum of the output of the pitch gain multiplication unit 12 and the output of the noise gain multiplication unit 22.

[0005] The distortion calculation unit 5 selects the excitation component candidate in each of the two excitation sources so that the distortion, that is, the difference between the input speech supplied through input terminal 1 and the synthesized speech output from the synthesis filter 4, is minimized, and at the same time sets the optimum gain for each selected candidate. The code output unit 6 encodes and outputs the quantized filter coefficients supplied from the filter coefficient quantization unit 3 and the excitation component candidates selected by the distortion calculation unit 5; these codes are transmitted through output terminal 7.

[0006] How the optimum noise excitation source candidate is selected in the conventional example of FIG. 4 is described with reference to FIG. 5, which shows a circuit forming part of the conventional distortion calculation unit 5. To select the optimum candidate, two operations must be carried out once for every noise excitation source candidate: a convolution of the candidate with the impulse response waveform of the synthesis filter, whose duration equals the frame length, and a distortion calculation between the result of that convolution and the waveform obtained by removing the pitch component from the input speech, whose duration also equals the frame length. This requires a large number of arithmetic operations. In addition, a large-capacity memory is needed to store the noise excitation sources, which have a relatively long duration.
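The cost of the conventional search in [0006] can be illustrated with a small sketch. It is not taken from the patent: the function names `synthesize` and `search_codebook`, the zero-state truncated convolution, and the closed-form least-squares gain are all assumptions made for illustration. It shows only that one full convolution and one distortion calculation are repeated for every stored candidate.

```python
def synthesize(excitation, impulse_response):
    """Zero-state synthesis: convolve, keeping the first len(excitation) samples."""
    n = len(excitation)
    out = [0.0] * n
    for i, e in enumerate(excitation):
        for j in range(min(n - i, len(impulse_response))):
            out[i + j] += e * impulse_response[j]
    return out


def search_codebook(target, impulse_response, codebook):
    """Return (index, gain, distortion) of the best stored candidate, or None.

    `target` stands for the input speech with the pitch component removed.
    """
    best = None
    for index, candidate in enumerate(codebook):
        # One frame-length convolution per candidate: the expensive step.
        synth = synthesize(candidate, impulse_response)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        corr = sum(t * s for t, s in zip(target, synth))
        gain = corr / energy  # least-squares gain (illustrative assumption)
        distortion = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or distortion < best[2]:
            best = (index, gain, distortion)
    return best
```

The whole `codebook` must also be held in memory, which is the storage cost the paragraph above refers to.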

[0007] Separately, ACELP has been disclosed as a method that does not use a noise excitation whose candidates are stored in memory, but instead represents the noise excitation source with a plurality of pulses and encodes speech by driving a synthesis filter with two excitation sources (see R. Salami, C. Laflamme and J-P. Adoul: "ACELP Speech Coder at 8 kbit/s with a 10 ms frame: A Candidate for CCITT Standardization", Proc. IEEE Workshop on Speech Coding, pp. 23-24, 1993). This method represents a noise excitation source of 5 msec duration by a train of 4 or 5 pulses of constant amplitude and specifies the noise excitation source solely by the positions of the pulses in the train, so the memory that was needed to store noise excitation source candidates is no longer required. However, because this method synthesizes speech of the same long duration as the input speech in order to determine the position of a single pulse, it requires a large number of arithmetic operations, and determining 4 or 5 pulses requires even more.

[0008]

[Problem to Be Solved by the Invention] This invention provides a speech predictive coding method that drives a synthesis filter with two excitation sources, a pitch excitation source representing the pitch component and a noise excitation source representing the noise component, in which the noise excitation source is selected and set with a small amount of computation and without storing noise excitation source candidates in a memory.

[0009]

[Means for Solving the Problem] In a speech predictive coding method that encodes speech by reproducing it, in which a synthesis filter 4 set with quantized filter coefficients calculated from a plurality of samples of the input speech is driven frame by frame with time-series vector components output from two excitation sources, a pitch excitation source 11 representing the pitch component and a noise excitation source 31 representing the noise component, a method was configured in which the frame is further divided into a plurality of subframes, the noise excitation source 31 is represented by placing one pulse in each subframe, and the position and amplitude of this pulse are quantized and encoded for each subframe.

[0010] A speech predictive coding method was also configured in which a plurality of the per-subframe pulse amplitudes are quantized together. Likewise, a speech predictive coding method was configured in which a plurality of the per-subframe pulse positions are quantized together. Furthermore, a speech predictive coding method was configured in which a plurality of both the per-subframe pulse amplitudes and the per-subframe pulse positions are quantized together.

[0011] Further, in a speech predictive coding method that encodes speech by reproducing it, in which a synthesis filter set with quantized filter coefficients calculated from a plurality of samples of the input speech is driven frame by frame with time-series vector components output from two excitation sources, a pitch excitation source representing the pitch component and a noise excitation source representing the noise component, a method was configured in which the frame is further divided into a plurality of subframes, the noise excitation source is represented by placing one pulse of constant amplitude in each subframe, the position of this pulse is quantized for each subframe, the gain of the noise excitation source is determined and quantized for each frame, and the pulse positions and amplitude are encoded. A speech predictive coding method was also configured in which a plurality of the per-subframe pulse positions are quantized together.

[0012]

[Embodiment] An embodiment of this invention is described with reference to FIG. 1. First, the sequence of sample values of the input speech waveform entering at input terminal 1 is supplied to the filter coefficient determination unit 2, where the filter coefficients are calculated. The calculated filter coefficients are then supplied to the filter coefficient quantization unit 3, where they are quantized, and the quantized filter coefficients are supplied to the synthesis filter 4 to set up the synthesis filter.

[0013] The excitation that drives this synthesis filter 4 consists of two excitation sources. One is the pitch excitation source 11. The pitch excitation source 11 consists of a plurality of pitch period components; the selected pitch period component candidate is multiplied by a pitch gain in the pitch gain multiplication unit 12 and output. The other excitation source is the pulse generation source 31, which generates one pulse for each subframe, a subframe being one of the sections into which the input speech frame is divided.

[0014] The distortion calculation unit 5 selects a pitch period component candidate from among the pitch period components of the pitch excitation source 11 so that the distortion, that is, the difference between the input speech and the synthesized speech, is minimized, and sets the optimum gain for that candidate; at the same time, it sets the position and amplitude of the pulse to be generated by the pulse generation unit 31. The specific way in which the distortion calculation unit 5 sets the pulse position and amplitude is as follows.

[0015] With an input speech sampling frequency of 8.0 kHz, a coding frame length of 10 msec, and 10 subframes per frame, there are 8 positions in each subframe at which a pulse can be placed. With the input speech frame and all parameters other than the pulse positions and pulse amplitudes of the noise excitation source held fixed, the pulse position and pulse amplitude are set for the first through tenth subframes in turn. First, for the first subframe, a pulse is placed at the first of the 8 possible positions, the distortion is calculated while the pulse amplitude is varied, and the result is stored. Next, a pulse is placed at the second of the 8 positions, the distortion is calculated while the pulse amplitude is varied, and the result is stored. The distortion is calculated and stored in the same way for the third through eighth positions. All the stored distortion results for the first through eighth positions and pulse amplitudes are then compared, and the single position and amplitude that give the minimum value are set as the pulse position and pulse amplitude of the first subframe. For the second subframe, likewise, the distortion is calculated for the 8 possible pulse positions and pulse amplitudes under the conditions described above, the results are stored and compared, and the position and amplitude that give the minimum value are taken as the pulse position and pulse amplitude of the second subframe. One pulse position and one pulse amplitude are set in the same way for each of the third through tenth subframes. When one pulse position and one pulse amplitude have been set for every one of the first through tenth subframes, the representation of the noise excitation source is complete.
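The procedure in [0015] can be sketched as a sequential, subframe-by-subframe search. The sketch below is a minimal illustration under stated assumptions, not the patented circuit. The name `search_pulses` is hypothetical. Instead of varying the amplitude step by step, it substitutes the closed-form least-squares amplitude for each position. It measures distortion only within the current subframe. It also assumes that the tail of each chosen pulse is subtracted from the remaining target before the next subframe is searched, which the text does not specify.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 10
NUM_SUBFRAMES = 10
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 80 samples per frame
SUBFRAME_LEN = FRAME_LEN // NUM_SUBFRAMES      # 8 candidate positions per subframe


def search_pulses(target, impulse_response):
    """Return one (position_in_subframe, amplitude) pair per subframe.

    `target` is a frame of FRAME_LEN samples (input speech with the pitch
    contribution removed); `impulse_response` is that of the synthesis filter.
    """
    h = list(impulse_response) + [0.0] * FRAME_LEN  # pad so h[k] is always valid
    residual = list(target)
    pulses = []
    for sf in range(NUM_SUBFRAMES):
        start, end = sf * SUBFRAME_LEN, (sf + 1) * SUBFRAME_LEN
        total = sum(x * x for x in residual[start:end])
        best = None
        for pos in range(start, end):
            resp = h[:end - pos]          # unit-pulse response inside the subframe
            seg = residual[pos:end]
            energy = sum(r * r for r in resp)
            corr = sum(s * r for s, r in zip(seg, resp))
            amp = corr / energy if energy > 0.0 else 0.0
            dist = total - amp * corr     # subframe distortion at the best amplitude
            if best is None or dist < best[2]:
                best = (pos, amp, dist)
        pos, amp, _ = best
        pulses.append((pos - start, amp))
        # Assumption: remove the chosen pulse's full response, including its tail.
        for n in range(pos, FRAME_LEN):
            residual[n] -= amp * h[n - pos]
    return pulses
```

Each frame costs 10 × 8 = 80 short distortion evaluations, and nothing but the impulse response needs to be stored.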

[0016] The code output unit 6 encodes and outputs the quantized filter coefficients supplied from the filter coefficient quantization unit 3, the pitch excitation source candidate selected by the distortion calculation unit 5, and the position and amplitude of each pulse of the noise excitation source; these codes are transmitted through output terminal 7. By using pulses for the noise excitation source, this invention removes the need for the convolution of the synthesis filter's impulse response waveform with the noise excitation source candidates, which imposed a very heavy computational load in the conventional example. This is explained with reference to FIG. 2, which shows a circuit forming part of the distortion calculation unit 5 of this invention. In FIG. 2, instead of performing the convolution, the impulse response waveform of the synthesis filter is loaded into a shift register and read out using the pulse that indicates the position; the result corresponds to the convolution result of the conventional example, so the computational load is reduced. Furthermore, because a pulse is determined for each subframe of duration L', which is short compared with the frame length L, the computational load of the distortion calculation is also reduced.
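The equivalence that [0016] relies on can be checked numerically: convolving a single-pulse excitation with the impulse response gives the same waveform as reading out the impulse response delayed to the pulse position and scaled by the pulse amplitude. The sketch below is only a software analogy of the shift-register readout; the function names are hypothetical and a zero-state filter is assumed.

```python
def convolve(excitation, impulse_response):
    """Reference result: truncated convolution, as in the conventional search."""
    n = len(excitation)
    out = [0.0] * n
    for i, e in enumerate(excitation):
        for j in range(min(n - i, len(impulse_response))):
            out[i + j] += e * impulse_response[j]
    return out


def read_shifted_response(position, amplitude, impulse_response, length):
    """Impulse response delayed by `position` and scaled by `amplitude`."""
    out = [0.0] * length
    for n in range(position, length):
        k = n - position
        if k < len(impulse_response):
            out[n] = amplitude * impulse_response[k]
    return out


# Usage: a single pulse of amplitude -1.5 at position 3 in an 8-sample subframe.
h = [0.9 ** k for k in range(8)]
excitation = [0.0] * 8
excitation[3] = -1.5
by_convolution = convolve(excitation, h)
by_shift = read_shifted_response(3, -1.5, h, 8)
```

The shifted readout needs one multiplication per output sample, whereas the general convolution needs a multiply-accumulate for every pair of excitation and response samples.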

[0017] Here, the efficiency of quantization can be improved by having the code output unit 6 quantize a plurality of the pulse positions and/or pulse amplitudes of the noise excitation source together. Another embodiment of this invention is described with reference to FIG. 3. In FIG. 3, parts corresponding to those in FIG. 1 bear the same reference numerals. The pulses generated by the pulse generation unit 41, which constitutes the noise excitation source, have a constant amplitude, and the pulse generation unit 41 specifies the noise waveform component by the pulse positions alone. The output of the pulse generation unit 41 is multiplied by the noise gain in the noise gain multiplication unit 42 and supplied to the synthesis filter 4 together with the pitch excitation source candidate. The code output unit 6 encodes and outputs the quantized filter coefficients supplied from the filter coefficient quantization unit 3, the selected pitch excitation source candidate, and the position of each pulse and the noise gain of the noise excitation source candidate.
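For the FIG. 3 embodiment, one plausible way to obtain the per-frame noise gain once the pulse positions are fixed is a least-squares fit. The patent does not state how the gain is determined, so this is an assumption, as are the names `unit_pulse_train`, `synthesize`, and `frame_gain` and the choice of +1 as the constant pulse amplitude.

```python
def unit_pulse_train(positions, subframe_len):
    """One pulse of constant amplitude 1.0 per subframe at the given positions."""
    excitation = [0.0] * (len(positions) * subframe_len)
    for sf, pos in enumerate(positions):
        excitation[sf * subframe_len + pos] = 1.0
    return excitation


def synthesize(excitation, impulse_response):
    """Zero-state synthesis: convolve, keeping the first len(excitation) samples."""
    n = len(excitation)
    out = [0.0] * n
    for i, e in enumerate(excitation):
        for j in range(min(n - i, len(impulse_response))):
            out[i + j] += e * impulse_response[j]
    return out


def frame_gain(target, positions, subframe_len, impulse_response):
    """Least-squares gain applied to the whole frame's pulse train (assumption)."""
    synth = synthesize(unit_pulse_train(positions, subframe_len), impulse_response)
    energy = sum(s * s for s in synth)
    corr = sum(t * s for t, s in zip(target, synth))
    return corr / energy if energy > 0.0 else 0.0
```

With 8 positions per subframe, each position fits in 3 bits, so in this arrangement only the positions and a single gain value per frame would need to be encoded.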

[0018] The code output unit 6 can improve quantization efficiency by quantizing a plurality of the pulse positions of the noise excitation source together. According to this invention, selecting the noise excitation source requires synthesizing speech only for the short duration corresponding to a subframe, obtained by dividing the speech frame, so the amount of computation needed to select the noise excitation source is greatly reduced compared with conventional methods. Furthermore, because the noise excitation source is represented entirely by pulses and is encoded in terms of pulse position and amplitude, no memory for storing noise excitation source candidates is needed.

[0019] With an input speech sampling frequency of 8.0 kHz and a coding frame length of 10 msec, compare the amount of arithmetic processing needed to determine the noise excitation source in two cases: the conventional coding method with the code length of the noise excitation source set to 256, and this invention with the speech frame divided into 10 subframes of 1 msec each, giving 10 pulses per frame. The amount of arithmetic processing in this invention is estimated to be reduced to 10% or less of that of the conventional coding method. Furthermore, this invention does not need the memory for storing noise excitation sources that conventional coding methods require.

[0020]

[Effect of the Invention] As described above, this invention represents a noise excitation source of very short duration with a single pulse, so selecting the noise excitation source requires synthesizing only a short duration of speech, and the noise excitation source can be selected with a small amount of arithmetic processing. Furthermore, because the noise excitation source is represented by pulses, it can be specified by pulse position and amplitude, so the memory for storing a plurality of noise excitation sources can be eliminated.

[Brief description of drawings]

FIG. 1 is a block diagram illustrating an embodiment of this invention.

FIG. 2 is a block diagram of a part of the distortion calculation unit of this invention.

FIG. 3 is a block diagram illustrating another embodiment of this invention.

FIG. 4 is a block diagram illustrating a conventional example.

FIG. 5 is a circuit forming part of the distortion calculation unit of the conventional example.

[Explanation of symbols]

1 Input terminal
2 Filter coefficient determination unit
3 Filter coefficient quantization unit
4 Synthesis filter
5 Distortion calculation unit
6 Code output unit
7 Output terminal
11 Pitch excitation source
12 Pitch gain multiplication unit
31 Pulse generation source

Claims

[Claims]

1. A speech predictive coding method that encodes speech by reproducing it, in which a synthesis filter set with quantized filter coefficients calculated from a plurality of samples of input speech is driven frame by frame with time-series vector components output from two excitation sources, namely a pitch excitation source representing a pitch component and a noise excitation source representing a noise component, characterized in that the frame is further divided into a plurality of subframes, the noise excitation source is represented by placing one pulse in each subframe, and the position and amplitude of this pulse are quantized and encoded for each subframe.

2. The speech predictive coding method according to claim 1, wherein a plurality of the per-subframe pulse amplitudes are quantized together.

3. The speech predictive coding method according to claim 1, wherein a plurality of the per-subframe pulse positions are quantized together.

4. The speech predictive coding method according to claim 1, wherein a plurality of pulse amplitudes and pulse positions for each subframe are collectively quantized.

5. A speech predictive coding method that encodes speech by reproducing it, in which a synthesis filter set with quantized filter coefficients calculated from a plurality of samples of input speech is driven frame by frame with time-series vector components output from two excitation sources, namely a pitch excitation source representing a pitch component and a noise excitation source representing a noise component, characterized in that the frame is further divided into a plurality of subframes, the noise excitation source is represented by placing one pulse of constant amplitude in each subframe, the position of this pulse is quantized for each subframe, the gain of the noise excitation source is determined and quantized for each frame, and the pulse positions and amplitude are encoded.

6. The speech predictive coding method according to claim 5, wherein a plurality of the per-subframe pulse positions are quantized together.