JPH0934498A

JPH0934498A - Acoustic signal encoding method

Info

Publication number: JPH0934498A
Application number: JP7185625A
Authority: JP
Inventors: Jiyoutarou Ikedo; 丈太朗池戸; Akitoshi Kataoka; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-07-21
Filing date: 1995-07-21
Publication date: 1997-02-07

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of computation required to select a noise excitation signal. SOLUTION: After a pitch excitation signal p has been selected, a calculation part 31 determines the matrix H, whose elements are the impulse response of the synthesis filter, and R = H^t H. A calculation part 32 determines Ψ from H, R, the signal p and the input speech s. For each of groups 0 to 4, preliminary selection parts 33 to 37 calculate a_j ψ(l_j) from the elements ψ(i) of Ψ for the eight pulse positions l_j and their polarities a_j, and preliminarily select, for example, the three pulse positions with the largest a_j ψ(l_j). A candidate preparing part 38 prepares five-pulse noise excitation signals c by taking one of the three selected positions from each of groups 0 to 4. For each c, Ψc and then the distortion with respect to s are calculated, and the optimum noise excitation signal is selected.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a method of encoding an acoustic signal such as speech or music, in which the acoustic signal is reproduced by driving a synthesis filter with a pitch excitation signal representing the pitch component and a noise excitation signal representing the noise component.

[0002]

[Description of the Related Art] VSELP, CS-CELP and other methods are known as acoustic signal encoding methods that drive a synthesis filter with two excitation signals. They are disclosed in I. A. Gerson and M. A. Jasiuk: "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kb/s", Proc. IEEE ICASSP '90, pp. 461-464 (1990) (Reference 1), in A. Kataoka, T. Moriya and S. Hayashi: "An 8-kbit/s Speech Coder Based on Conjugate Structure CELP", Proc. IEEE ICASSP '93, pp. 592-595 (1993) (Reference 2), and elsewhere.

[0003] A conventional example of this kind of speech coding method is described with reference to FIG. 3. For simplicity, the figure shows only the main signal flow. First, the sequence of sampled values of the input speech waveform supplied from the input terminal 1 is fed to the filter coefficient determination unit 2, where filter coefficients are calculated by linear prediction analysis or the like. The filter coefficients calculated in the filter coefficient determination unit 2 are then supplied to the filter coefficient quantization unit 3 and quantized there. The quantized filter coefficients are supplied to the synthesis filter 4 and set as its filter coefficients.

[0004] The excitation signal that drives the synthesis filter 4 consists of two excitation signals: one is the output of the pitch codebook 11 and the other is the output of the noise codebook 21. The pitch codebook 11 consists of components extracted from the past excitation signal of the synthesis filter 4 at a plurality of pitch periods; the selected pitch period component candidate is multiplied by a pitch gain in the pitch gain multiplication unit 12 and output. The noise codebook 21 consists of a plurality of noise waveform components; the selected noise waveform candidate is multiplied by a noise gain in the noise gain multiplication unit 22 and output. The synthesis filter 4 is driven by the sum, formed in the addition unit 8, of the outputs of the pitch gain multiplication unit 12 and the noise gain multiplication unit 22.
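The excitation path of FIG. 3 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the direct-form all-pole recursion and the sign convention of the filter coefficients are assumptions made for the example.

```python
def excite(pitch_vec, noise_vec, pitch_gain, noise_gain):
    """Addition unit 8: gain-scaled pitch candidate plus gain-scaled noise candidate."""
    return [pitch_gain * p + noise_gain * c
            for p, c in zip(pitch_vec, noise_vec)]


def synthesize(excitation, lpc):
    """Synthesis filter 4 as an all-pole recursion (assumed convention):
    y[n] = x[n] - sum_k lpc[k-1] * y[n-k], starting from a zero state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def distortion(target, synthesized):
    """Squared error between the input frame and the synthesized frame."""
    return sum((t - y) ** 2 for t, y in zip(target, synthesized))


if __name__ == "__main__":
    # Toy frame: one pitch pulse, one noise pulse, first-order filter.
    e = excite([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], pitch_gain=0.8, noise_gain=0.3)
    y = synthesize(e, lpc=[-0.5])
    print(e, y, distortion([1.0, 0.5, 0.5], y))
```

A codebook search in this style would call `synthesize` and `distortion` once per candidate, which is the cost the later paragraphs set out to reduce.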

[0005] The subtraction unit 9 takes the difference between the input speech signal supplied through the input terminal 1 and the synthesized speech signal output from the synthesis filter 4. The distortion calculation unit 5 selects the excitation component candidates in the two codebooks 11 and 21 so that this difference, that is, the distortion, is minimized, and at the same time sets the optimum gain for each excitation component candidate. The code output unit 6 encodes and outputs the quantized filter coefficients supplied from the filter coefficient quantization unit 3, the candidates selected from the codebooks 11 and 21 by the distortion calculation unit 5, and the gains of the gain units 12 and 22. These codes are transmitted or stored via the output terminal 7. Normally the pitch excitation signal is selected first, and the noise waveform excitation signal is selected afterwards.

[0006] ACELP is known as a speech coding method in which the noise waveform component from the noise codebook 21 is represented by a plurality of pulses. It is disclosed in R. Salami, C. Laflamme and J-P. Adoul: "ACELP Speech Coder at 8 kbit/s with 10 ms Frame: A Candidate for CCITT Standardization", Proc. IEEE Workshop on Speech Coding, pp. 23-24 (1993) (Reference 3) and elsewhere. In this method a noise excitation signal 5 ms long is represented by a train of four or five pulses of constant amplitude, and the noise excitation signal is specified by the positions and polarities of these pulses, so that no memory is needed to store noise excitation waveform candidates.
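To make the pulse representation concrete, the sketch below builds one excitation frame from a list of pulse positions and polarities. The function name and the default frame length of 40 samples (5 ms at an assumed 8 kHz sampling rate) are illustrative assumptions, not details of Reference 3.

```python
def pulse_excitation(positions, signs, frame_len=40):
    """Return a frame that is zero except for unit pulses of the given polarity.

    frame_len=40 is an assumption (5 ms at an 8 kHz sampling rate).
    """
    if len(positions) != len(signs):
        raise ValueError("one sign per pulse is required")
    frame = [0] * frame_len
    for pos, sign in zip(positions, signs):
        if sign not in (1, -1):
            raise ValueError("pulse polarity must be +1 or -1")
        frame[pos] += sign
    return frame


if __name__ == "__main__":
    # Five constant-amplitude pulses; only positions and signs need to be coded.
    print(pulse_excitation([2, 11, 18, 24, 35], [1, -1, 1, 1, -1]))
```

Because the frame is fully determined by the two short lists, nothing resembling a stored waveform codebook is required.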

[0007] A conventional method of selecting the pulses that make up the noise excitation signal is described with reference to FIG. 4. A pitch excitation signal selected as in the case of FIG. 3 is supplied from the terminal 13 to the addition unit 8, and the noise excitation creation unit 14 supplies combinations of a plurality of pulses to the addition unit 8 as noise excitation signal candidates. The output of the addition unit 8 is passed through the synthesis filter 4, in which the filter coefficients from the quantization unit 3 of FIG. 3 have been set through the terminal 15, and the difference between the filter output and the input speech signal is supplied to the distortion calculation unit 5. The noise excitation creation unit 14 is controlled so that the power of the error signal is minimized, and the pulse positions and polarities that minimize the error power are encoded and output.

[0008] However, because this method represents the noise excitation waveform by a combination of a plurality of pulses, there is a very large number of noise excitation signal candidates, and determining the optimum candidate requires a large number of operations. That is, in FIG. 4, noise excitation signal generation, synthesis filtering and error power calculation must be performed for every noise excitation signal candidate, which requires a great deal of computation.

[0009] A method is also known in which, when the noise excitation signal is specified, each noise excitation signal candidate is orthogonalized with respect to the previously determined pitch excitation signal before selection. This removes the influence of the previously determined pitch excitation signal from the specification of the noise excitation signal, so that the noise excitation signal can be specified more accurately. The method is disclosed in I. A. Gerson and M. A. Jasiuk: "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kb/s", Proc. IEEE ICASSP '90, pp. 461-464 (1990) (Reference 4) and elsewhere.

[0010] This conventional orthogonalized selection of the noise excitation signal is described with reference to FIG. 5 for the case where the noise excitation signal is selected using a pitch excitation signal selected as in FIG. 4. Filter coefficients are set in the synthesis filters 4a and 4b from the terminal 15, the selected pitch excitation signal is supplied from the terminal 13 to the synthesis filter 4a, and a noise excitation signal consisting of a plurality of pulses selected from the noise codebook 17 is supplied to the synthesis filter 4b. The synthesized signals from the synthesis filters 4a and 4b are orthogonalized in the orthogonalization unit 18 and supplied to the distortion calculation unit 19, and the selection from the noise codebook 17 is made so that the error power with respect to the speech signal from the terminal 1 is minimized.

[0011] Because the orthogonalization unit 18 makes the synthesized noise component uncorrelated with the synthesized pitch component, the distortion calculation unit 19 selects the noise excitation signal that minimizes the error power between this orthogonalized synthesized noise component and the noise component of the input speech. Let p be the selected pitch excitation signal, c a noise excitation signal candidate, and H the matrix whose elements are the impulse response of the synthesis filters 4a and 4b. As shown in Reference 4, the synthesized noise component C_0 in the output of the orthogonalization unit 18 is then given by the following equation.

[0012]

$$C_0 = Hc - \frac{p^t H^t H c}{p^t H^t H p}\, Hp$$

where H^t denotes the transpose of H. The distortion D between this synthesized noise component C_0 and the time-series vector s of the input speech signal is obtained from

$$D = (s - \gamma_c C_0)^t (s - \gamma_c C_0)$$

The noise excitation signal candidate c that minimizes the distortion D is equivalent to the candidate c that maximizes the following expression.

[0013]

$$D_{su} = \frac{(\Psi c)^2}{(c^t R c)(p^t R p) - (p^t R c)^2}$$

$$R = H^t H$$

$$\Psi = (p^t R p)\, s^t H - (s^t H p)\, p^t R$$

Selecting the optimum noise excitation signal therefore amounts to finding the c that maximizes D_su. These relations are shown in Reference 4. The optimum noise excitation signal is selected by calculating D_su for every noise excitation signal c.
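As a numerical cross-check of these expressions, the following sketch computes C_0 = Hc - [(p^t H^t H c) / (p^t H^t H p)] Hp directly and compares the gain-optimized criterion (s^t C_0)^2 / (C_0^t C_0) with D_su. It is an illustration only: the helper names, the toy vectors and the lower-triangular Toeplitz form assumed for H are not taken from the patent or from Reference 4.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def impulse_matrix(h):
    """Assumed form of H: lower-triangular Toeplitz matrix built from the
    impulse response h, so that H x is h convolved with x within the frame."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def orthogonalized(H, p, c):
    """C0 = Hc - [(p^t H^t H c) / (p^t H^t H p)] Hp."""
    Hp, Hc = matvec(H, p), matvec(H, c)
    scale = dot(Hp, Hc) / dot(Hp, Hp)
    return [x - scale * y for x, y in zip(Hc, Hp)]


def d_su(H, s, p, c):
    """D_su = (Psi c)^2 / {(c^t R c)(p^t R p) - (p^t R c)^2}, with R = H^t H."""
    Hp, Hc = matvec(H, p), matvec(H, c)
    pRp, cRc, pRc = dot(Hp, Hp), dot(Hc, Hc), dot(Hp, Hc)
    psi_c = pRp * dot(s, Hc) - dot(s, Hp) * pRc
    return psi_c ** 2 / (cRc * pRp - pRc ** 2)


if __name__ == "__main__":
    H = impulse_matrix([1.0, 0.5, 0.25, 0.125])
    s = [0.3, -1.0, 0.7, 0.2]
    p = [1.0, -0.5, 0.2, 0.0]
    c = [0.0, 1.0, 0.0, -1.0]
    C0 = orthogonalized(H, p, c)
    Hp = matvec(H, p)
    # Error energy removed by the best gain: (s^t C0)^2 / (C0^t C0).
    direct = dot(s, C0) ** 2 / dot(C0, C0)
    print("C0 . Hp      =", dot(C0, Hp))
    print("direct * pRp =", direct * dot(Hp, Hp))
    print("D_su         =", d_su(H, s, p, c))
```

The last two printed values agree up to rounding, because D_su differs from (s^t C_0)^2 / (C_0^t C_0) only by the factor p^t R p, which does not depend on c.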

[0014]

[Problems to Be Solved by the Invention] Suppose, for example, that the duration (frame length) of the noise excitation signal is 5 ms and that each noise excitation signal is represented by five pulses. As shown in FIG. 6, the pulse of each of groups 0 to 4 may occupy only one of eight predetermined positions in the frame; one of the eight pulse positions is taken from each of groups 0 to 4 to form a noise excitation signal consisting of five pulses. Each pulse takes a polarity of either +1 or -1. Because noise excitation signals are created in this way, their number is extremely large, and the D_su expression must be evaluated for each of them, so the amount of computation becomes extremely large and the processing time long.
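The size of this search space can be illustrated with a short enumeration. FIG. 6 is not reproduced in this text, so the interleaved layout below (group g owning samples g, g+5, ..., g+35 of a 40-sample frame) is a hypothetical stand-in; the names are likewise illustrative.

```python
from itertools import product

N_GROUPS = 5
POSITIONS_PER_GROUP = 8

# Hypothetical layout (not taken from FIG. 6): group g owns g, g+5, ..., g+35.
GROUPS = [[g + N_GROUPS * k for k in range(POSITIONS_PER_GROUP)]
          for g in range(N_GROUPS)]


def position_combinations(groups):
    """Iterate over every way of taking one pulse position from each group."""
    return product(*groups)


def count_candidates(groups, free_polarity=False):
    """Number of candidates; with free_polarity each pulse may also be +1 or -1."""
    total = 1
    for g in groups:
        total *= len(g) * (2 if free_polarity else 1)
    return total


if __name__ == "__main__":
    print("position combinations:", count_candidates(GROUPS))
    print("with free polarities: ", count_candidates(GROUPS, free_polarity=True))
```

With eight positions in each of five groups there are 8^5 = 32768 position combinations, and treating every polarity as a free choice multiplies that by 2^5.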

[0015] An object of the present invention is to provide an acoustic signal encoding method that drives a synthesis filter with two excitation signals, a pitch excitation signal representing the pitch component and a noise excitation signal representing the noise component, that uses a noise excitation signal composed of a plurality of pulses, and that can select the noise excitation signal with a small amount of computation.

[0016]

[Means for Solving the Problems] According to the invention of claim 1, a plurality of noise excitation signal candidates are preliminarily selected using a value obtained from the noise excitation signal, the input acoustic signal, the pitch excitation signal and the synthesis filter coefficients; the distortion between the input acoustic signal and each synthesized acoustic signal obtained by driving the synthesis filter with each of the selected candidates is determined; and the one noise excitation signal that minimizes the distortion is selected.

[0017] According to the invention of claim 2, a plurality of candidates for the positions and amplitudes of the pulses constituting the noise excitation signal are preliminarily selected using a value obtained from the noise excitation signal, the input acoustic signal, the pitch excitation signal and the synthesis filter coefficients; the distortion between the input acoustic signal and each synthesized acoustic signal obtained by driving the synthesis filter with each noise excitation signal formed from the selected pulse positions is determined; and the one noise excitation signal that minimizes the distortion is selected.

[0018] More specifically, the method drives a synthesis filter with two excitation signals, a pitch excitation signal representing the pitch component and a noise excitation signal representing the noise component, to reproduce an acoustic signal, and encodes by selecting the pitch excitation signal and noise excitation signal that minimize the distortion with respect to the input acoustic signal. The noise excitation signal is represented by a plurality of pulses, and the noise excitation signal selected is the one that maximizes the result of an operation on a matrix representing the characteristics of the synthesis filter, the pitch excitation signal, the input acoustic signal and the noise excitation signal. In such a method, the invention of claim 1 evaluates the term corresponding to the numerator of that operation, preliminarily selects a predetermined number of noise excitation signals in descending order of the result, and selects, from among the preliminarily selected noise excitation signals, the one for which the result of the full operation is largest.

[0019] In the invention of claim 2, the noise excitation signal of the invention of claim 1 is created by selecting one pulse position from each of a plurality of groups, each group having a plurality of predetermined positions in the frame that its pulse may occupy, and assigning a polarity to each pulse. The preliminary selection evaluates, for the pulse positions in each group, the operation corresponding to the numerator of the expression to be maximized, and selects a predetermined number of positions in each group in descending order of the result. Among the noise excitation signals created by taking one of the selected pulse positions from each group, the one for which the result of the full operation is largest is then found.

[0020]

[Embodiments of the Invention] For simplicity, the sampling frequency of the input speech is taken to be 8000 Hz and the duration (frame length) of the noise excitation signal 5 ms. The noise excitation signal is represented by five pulses, and the 0th to 4th pulses may exist only at the frame positions defined for the 0th to 4th groups shown in FIG. 6. That is, a noise excitation signal is formed by selecting one pulse position from each of the 0th to 4th groups and assigning a polarity to each pulse. The pitch excitation signal is selected, for example, as described for FIG. 3. The invention is characterized by the selection of the noise excitation signal, which seeks the signal that maximizes D_su as described in the related-art section. In the invention of claim 1, as shown in FIG. 1A, Ψ in the numerator of the D_su expression is first calculated from the matrix H whose elements are the impulse response of the synthesis filter, the time-series vector s of the input speech signal, and the pitch excitation signal (vector) p (S₁). Next, the product of this Ψ with every noise excitation signal (vector) c obtainable from FIG. 6 is calculated (S₂). With ψ(i) denoting the i-th element (i = 0, 1, 2, ..., 39) of the vector Ψ, a₀ to a₄ the polarities of the five pulses of the noise excitation signal c, and l₀ to l₄ their positions, Ψc is given by the following expression.

[0021]

$$\Psi c = a_0 \psi(l_0) + a_1 \psi(l_1) + \cdots + a_4 \psi(l_4)$$

This operation is performed for each noise excitation signal. About 30 signals, for example, are preliminarily selected in descending order of Ψc (S₃). D_su is evaluated for each of the preliminarily selected noise excitation signals (S₄), and the noise excitation signal giving the largest D_su is selected (S₅). A signal with a large D_su can be expected to have a large numerator Ψc as well, so provided the number of preselected signals is chosen appropriately, the preliminary selection does not risk dropping the optimum noise excitation signal.
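Steps S₁ to S₅ of FIG. 1A might be sketched as below. This is an illustration under stated assumptions rather than the patent's implementation: the function names are invented, H is passed in as a plain list-of-lists matrix, the candidates are enumerated over both polarities of every pulse, and `heapq.nlargest` stands in for "the largest Ψc first". The demo uses a toy six-sample frame with three groups so that it runs instantly.

```python
import heapq
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def psi_vector(H, s, p):
    """S1: Psi = (p^t R p) s^t H - (s^t H p) p^t R, with R = H^t H."""
    n = len(p)
    Hp = matvec(H, p)
    sH = [sum(s[i] * H[i][j] for i in range(n)) for j in range(n)]
    pR = [sum(Hp[i] * H[i][j] for i in range(n)) for j in range(n)]
    return [dot(Hp, Hp) * a - dot(s, Hp) * b for a, b in zip(sH, pR)]


def d_su(H, p, psi, pulses):
    """D_su for a candidate given as (polarity, position) pairs."""
    c = [0.0] * len(p)
    for a, l in pulses:
        c[l] += a
    Hp, Hc = matvec(H, p), matvec(H, c)
    return dot(psi, c) ** 2 / (dot(Hc, Hc) * dot(Hp, Hp) - dot(Hp, Hc) ** 2)


def all_candidates(groups):
    """Every choice of one (polarity, position) pair per group."""
    return product(*[[(a, l) for l in g for a in (1, -1)] for g in groups])


def select_with_preselection(H, s, p, groups, n_keep=30):
    psi = psi_vector(H, s, p)                                    # S1

    def psi_c(pulses):                                           # S2
        return sum(a * psi[l] for a, l in pulses)

    kept = heapq.nlargest(n_keep, all_candidates(groups), key=psi_c)   # S3
    return max(kept, key=lambda pulses: d_su(H, p, psi, pulses))       # S4, S5


if __name__ == "__main__":
    h = [1.0, 0.6, 0.3, 0.1, 0.05, 0.02]
    H = [[h[i - j] if i >= j else 0.0 for j in range(6)] for i in range(6)]
    s = [0.9, -0.3, 0.4, -1.1, 0.2, 0.7]
    p = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2]
    print(select_with_preselection(H, s, p, [[0, 3], [1, 4], [2, 5]], n_keep=8))
```

Only the `n_keep` survivors reach `d_su`, which needs a matrix-vector product per candidate; the ranking step needs just five additions per candidate.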

[0022] Next, an embodiment of the invention of claim 2 is described. This invention too is characterized by the selection of the noise excitation signal. In this embodiment as well, Ψ is first calculated as shown in FIG. 1B (S₁), in the same way as in FIG. 1A. Next, for each of groups 0 to 4 in FIG. 6, the term a_j ψ(l_j) is calculated from the elements of Ψ for the pulse positions l_j (0, 1, ..., 7) belonging to that group, eight per group in this example, and their polarities a_j (S₂). In each group, one or more pairs a_j, l_j are taken in descending order of a_j ψ(l_j) (S₃). Every noise excitation signal c that can be formed by taking one of these pairs from each group is then created (S₄). This amounts to a preliminary selection from among the noise excitation signals that can be formed from FIG. 6. D_su is evaluated for the preliminarily selected noise excitation signals (S₅), and the noise excitation signal giving the largest result is selected (S₆).
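The per-group procedure of FIG. 1B (S₁ to S₆) might look as follows in code. This is a sketch under assumptions, not the patent's implementation: the function names are invented, H is a plain list-of-lists matrix, and the ranking in each group is read as running over both polarities of every position, so that the kept pairs are those with the largest a_j ψ(l_j).

```python
import heapq
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def psi_vector(H, s, p):
    """S1: Psi = (p^t R p) s^t H - (s^t H p) p^t R, with R = H^t H."""
    n = len(p)
    Hp = matvec(H, p)
    sH = [sum(s[i] * H[i][j] for i in range(n)) for j in range(n)]
    pR = [sum(Hp[i] * H[i][j] for i in range(n)) for j in range(n)]
    return [dot(Hp, Hp) * a - dot(s, Hp) * b for a, b in zip(sH, pR)]


def d_su(H, p, psi, pulses):
    """D_su for a candidate given as (polarity, position) pairs."""
    c = [0.0] * len(p)
    for a, l in pulses:
        c[l] += a
    Hp, Hc = matvec(H, p), matvec(H, c)
    return dot(psi, c) ** 2 / (dot(Hc, Hc) * dot(Hp, Hp) - dot(Hp, Hc) ** 2)


def preselect_group(psi, group, k):
    """S2, S3: rank a * psi(l) over the group's positions and polarities, keep k."""
    pairs = [(a, l) for l in group for a in (1, -1)]
    return heapq.nlargest(k, pairs, key=lambda al: al[0] * psi[al[1]])


def select_groupwise(H, s, p, groups, k=3):
    psi = psi_vector(H, s, p)                                     # S1
    kept = [preselect_group(psi, g, k) for g in groups]           # S2, S3
    candidates = product(*kept)                                   # S4
    return max(candidates, key=lambda c: d_su(H, p, psi, c))      # S5, S6


if __name__ == "__main__":
    h = [1.0, 0.6, 0.3, 0.1, 0.05, 0.02]
    H = [[h[i - j] if i >= j else 0.0 for j in range(6)] for i in range(6)]
    s = [0.9, -0.3, 0.4, -1.1, 0.2, 0.7]
    p = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2]
    print(select_groupwise(H, s, p, [[0, 3], [1, 4], [2, 5]], k=1))
```

Compared with ranking whole candidates, this variant never enumerates the full candidate set: it ranks a handful of pairs per group and evaluates D_su for only k to the power of the number of groups combinations.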

[0023] FIG. 2 shows the embodiment of FIG. 1B in functional form. The R, H calculation unit 31 calculates H and R from the synthesis filter coefficients supplied from the terminal 15, and the Ψ calculation unit 32 calculates Ψ from these, the input speech s from the terminal 1 and the pitch excitation signal p from the terminal 13. The preliminary selection units 33 to 37 calculate the values a_j ψ(l_j) for groups 0 to 4 respectively, and in each group select, for example, three pairs a_j, l_j in descending order of a_j ψ(l_j).

[0024] The noise excitation signal candidate creation unit 38 takes one pair a_j, l_j from those selected by each of the preliminary selection units 33 to 37, forms all such combinations, and uses them as noise excitation signal candidates. The Ψc calculation unit 39 multiplies each candidate by the Ψ obtained in the Ψ calculation unit 32. From each product Ψc, together with R, p and the corresponding c created by the candidate creation unit 38, the distortion calculation unit 41 evaluates D_su, and the optimum excitation signal candidate selection unit 42 selects and outputs the candidate giving the largest result.

[0025] In the invention of claim 2, as in the invention of claim 1, a large Ψc implies that its constituent terms a₀ψ(l₀), a₁ψ(l₁), ..., a₄ψ(l₄) should each be large as well, so the optimum noise excitation signal can still be selected correctly when the pairs with large a_j ψ(l_j) are preliminarily selected from each group.

[0026]

[Effects of the Invention] As described above, according to the invention of claim 1, Ψc is calculated, the candidates c for which it is large are preliminarily selected, and D_su is evaluated only for those candidates, so less computation is needed than when D_su is evaluated for every c. According to the invention of claim 2, a_j ψ(l_j) is calculated, the number of pairs a_j, l_j in each group is reduced, and D_su is evaluated only for the noise excitation signals c formed from them, so less computation is needed than when D_su is evaluated for every c. For example, when three pulse positions are preliminarily selected from each group, the number of noise excitation signals formed from them is 3^5 = 243, far fewer than the 8^5 = 32768 obtained without preliminary selection. The number of computationally expensive distortion calculations (evaluations of D_su) falls accordingly, and the effect is substantial.
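The arithmetic behind this comparison can be tabulated for any number of positions kept per group. The function name is illustrative, and the assumption that every group keeps the same number of positions follows the example in the text.

```python
def candidates_after_preselection(kept_per_group, n_groups=5):
    """Five-pulse candidates left when each group keeps the same number of positions."""
    return kept_per_group ** n_groups


if __name__ == "__main__":
    full = candidates_after_preselection(8)
    for k in range(1, 9):
        n = candidates_after_preselection(k)
        print(f"keep {k} per group: {n:6d} D_su evaluations "
              f"({n / full:.4%} of the exhaustive search)")
```

Keeping three positions per group leaves 243 of the 32768 combinations, roughly a 135-fold reduction in D_su evaluations.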

[0027] FIG. 7 shows the amount of computation required for noise excitation signal selection and the quality of the reproduced speech as the number of pulse positions selected from each group is varied. With one position selected per group, only one noise excitation signal is created, and it becomes the selected noise excitation signal. Selecting eight positions corresponds to the conventional method without preliminary selection, and the amount of computation is then very large. A larger value indicates better reproduced speech quality, so quality would be expected to improve as the number of selected positions increases, but in fact it changed little. This suggests that the preliminary selection is made quite accurately.

[0028] In the above description the invention was applied to the encoding of speech signals, but it can also be applied to the encoding of acoustic signals in general.

[Brief description of drawings]

FIG. 1: A is a flowchart showing the main part of an embodiment of the invention of claim 1, and B is a flowchart showing the main part of an embodiment of the invention of claim 2.

FIG. 2: A block diagram functionally showing the main part of an embodiment of the invention of claim 2.

FIG. 3: A block diagram functionally showing a conventional CELP encoding method.

FIG. 4: A block diagram functionally showing part of a conventional ACELP encoding method.

FIG. 5: A block diagram functionally showing a conventional method of selecting the noise excitation signal after orthogonalizing it with respect to the pitch excitation signal.

FIG. 6: A diagram showing an example of the pulse positions of each group used to create the noise excitation signal.

FIG. 7: A diagram showing the relationship between the number of positions preliminarily selected from each group, the amount of computation, and the reproduced speech quality.

Claims

1. An acoustic signal encoding method in which a synthesis filter, in which quantized filter coefficients obtained from an input acoustic signal are set, is driven frame by frame with time-series vectors consisting of two excitation signals, namely a pitch excitation signal representing a pitch component and a noise excitation signal representing a noise component, to reproduce an acoustic signal, this being used to encode the acoustic signal, the noise excitation signal being represented by a small number of pulses and the position and amplitude of each pulse being encoded, characterized in that: a plurality of candidates for the noise excitation signal are preliminarily selected using a value obtained from the noise excitation signal, the input acoustic signal, the pitch excitation signal and the synthesis filter coefficients; the distortion between the input acoustic signal and each synthesized acoustic signal obtained by driving the synthesis filter with each of the candidates selected here is determined; and the single noise excitation signal that minimizes the distortion is selected.

2. An acoustic signal encoding method in which a synthesis filter, in which quantized filter coefficients obtained from an input acoustic signal are set, is driven frame by frame with time-series vectors consisting of two excitation signals, namely a pitch excitation signal representing a pitch component and a noise excitation signal representing a noise component, to reproduce an acoustic signal, this being used to encode the acoustic signal, the noise excitation signal being represented by a small number of pulses and the position and amplitude of each pulse being encoded, characterized in that: a plurality of candidates for the position and amplitude of each pulse constituting the noise excitation signal are preliminarily selected using a value obtained from the noise excitation signal, the input acoustic signal, the pitch excitation signal and the synthesis filter coefficients; the distortion between the input acoustic signal and each synthesized acoustic signal obtained by driving the synthesis filter with each noise excitation signal formed from the pulse positions selected here is determined; and the single noise excitation signal that minimizes the distortion is selected.