JPH11184499A

JPH11184499A - Voice encoding method and voice decoding method

Info

Publication number: JPH11184499A
Application number: JP9355746A
Authority: JP
Inventors: Kimio Miseki; 公生三関; Katsumi Tsuchiya; 勝美土谷
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-12-24
Filing date: 1997-12-24
Publication date: 1999-07-09
Anticipated expiration: 2017-12-24
Also published as: JP3192999B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech encoding method in which encoding distortion is hardly perceived in low-rate encoding at about 4 kbit/s. SOLUTION: A phase search unit 112 searches for the phase shift amount that a phase adaptation unit 119 uses to cut out a noise vector output from a noise codebook 120. The search makes smaller the error between a waveform synthesized from a pitch pulse train, repeated at the pitch period and supplied by a pulse train setting unit 113, and a waveform synthesized from an adaptive code vector, generated by repeating at the pitch period an excitation signal output from an adaptive codebook 101. Both waveforms are synthesized by pseudo-auditory weighting synthesis filters 114 and 115, whose characteristics are set according to the synthesis filter information of a synthesis filter unit 104, and the phase search unit 112 searches for the phase shift amount that makes the error between these synthesized waveforms smaller.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a low-bit-rate speech encoding/decoding method and, more particularly, to a method of encoding/decoding an excitation signal that performs phase adaptation of a noise excitation signal.

[0002]

2. Description of the Related Art

The CELP method is known as a representative method for low-bit-rate speech coding. CELP represents a speech signal by encoding spectral envelope information and an excitation signal. Phase adaptation of the noise excitation signal has attracted attention as an excitation coding technique that further improves the sound quality of CELP. This phase adaptation method is described in detail in, for example, Mano et al., "A Study of Phase-Adaptive PSI-CELP Speech Coding," IEICE Technical Report SP94-96, pp. 37-44, February 1995 (Reference 1). A speech coding system using the conventional phase adaptation method described in Reference 1 is explained below with reference to FIG. 4.

[0003] In FIG. 4, an input speech signal from a speech input terminal is first analyzed by a linear prediction analysis unit 502, and linear prediction coefficients (LPC coefficients) are extracted. The extracted linear prediction coefficients are quantized by a synthesis filter information quantization unit 503. A code A representing the quantized linear prediction coefficients, that is, the synthesis filter information indicating the characteristics of a synthesis filter unit 504, is output to a multiplexing unit 507. The quantized linear prediction coefficients are used as the filter coefficients of the synthesis filter unit 504.

[0004] An adaptive codebook 501 stores past encoded excitation signals and generates an adaptive code vector by repeating the past excitation signal at a pitch period L. The pitch period L is searched for as follows. The adaptive code vector candidate corresponding to each pitch period candidate is passed through a gain multiplication unit 508 and an addition unit 509 and synthesized into speech by the synthesis filter 504. The distortion of the resulting waveform is computed, via a subtraction unit 522, by an auditory-weighted distortion calculation unit 505 using an auditory-weighted distortion measure, and a code selection unit 506 searches for the pitch period that makes this auditory-weighted distortion of the synthesized waveform smaller. The code representing the pitch period L found in this way is output to the multiplexing unit 507.

[0005] A phase search unit 512 obtains a phase shift amount, as described later, using the adaptive code vector from the adaptive codebook 501 and the pitch period L found by the code selection unit 506, and outputs it to a phase adaptation unit 519.

[0006] A noise codebook 520 stores a plurality of noise vectors and outputs the noise vector corresponding to the noise code C found by the code selection unit 506 to the phase adaptation unit 519. The phase adaptation unit 519 cuts out the noise vector output from the noise codebook 520, starting at a position corresponding to the phase shift amount supplied from the phase search unit 512. A pitch periodization unit 518 gives the cut-out noise vector pitch periodicity and outputs the result as a noise code vector.

[0007] The noise code C is searched for as follows. The noise code vector candidate corresponding to each noise code candidate is passed through the phase adaptation unit 519, the pitch periodization unit 518, a multiplication unit 510, and the addition unit 509 and synthesized into speech by the synthesis filter 504. The distortion of the resulting waveform is computed, via the subtraction unit 522, by the auditory-weighted distortion calculation unit 505 using an auditory-weighted distortion measure, and the code selection unit 506 searches for the noise code that makes this auditory-weighted distortion of the synthesized waveform smaller. The noise code C found in this way is output to the multiplexing unit 507.

[0008] A gain codebook 521 stores candidates for a gain G0 applied to the adaptive code vector and a gain G1 applied to the noise code vector. The gain code is searched for as follows. The adaptive code vector and the noise code vector are multiplied by the candidate gains G0 and G1 in the gain multiplication units 508 and 510 and added together in the addition unit 509 to form an excitation vector candidate, which is synthesized by the synthesis filter 504. The distortion of the resulting waveform is computed, via the subtraction unit 522, by the auditory-weighted distortion calculation unit 505 using an auditory-weighted distortion measure, and the code selection unit 506 searches for the gain code that makes this auditory-weighted distortion of the synthesized waveform smaller. The gain code G found in this way is output to the multiplexing unit 507.

[0009] In the phase search unit 512, a pulse train with pulses placed at intervals of the pitch period L (called a pitch pulse train), supplied from a pulse train setting unit 513, is synthesized into speech by a synthesis filter 514 having the transfer characteristic 1/Aq(z). The adaptive code vector from the adaptive codebook 501 is likewise synthesized into speech by a synthesis filter 515 having the transfer characteristic 1/Aq(z). Using a phase candidate setting unit 516 and an optimum phase selection unit 517, the phase search unit 512 searches for the phase of the pitch pulse train that minimizes the error (waveform distortion) between these two synthesized waveforms, and thereby obtains the phase shift amount used by the phase adaptation unit 519.

[0010] Here, the outputs of the synthesis filters 514 and 515 with the transfer characteristic 1/Aq(z) are synthesized waveforms, in other words signals at the speech waveform level. The conventional method can therefore be said to search for the phase of the pitch pulse train, that is, the phase shift amount for cutting out the noise vector output from the noise codebook 520, by evaluating distortion at the speech waveform level. With this method, however, the phase shift amount that is found is not necessarily the perceptually preferable one. As a result, in low-rate coding at, for example, about 4 kbit/s, the encoding distortion of the excitation signal becomes easy to perceive.

[0011]

[Problems to Be Solved by the Invention]

As described above, the conventional speech encoding/decoding method searches for the phase of the pitch pulse train, that is, the phase shift amount for cutting out the noise vector output from the noise codebook, by evaluating distortion at the speech waveform level. The phase shift amount found in this way is not necessarily perceptually preferable, so the encoding distortion of the excitation signal becomes easy to perceive in low-rate coding at about 4 kbit/s. An object of the present invention is to provide a speech encoding/decoding method in which encoding distortion is hardly perceived even in low-rate coding.

[0012]

[Means for Solving the Problems]

To solve the above problem, the speech encoding/decoding method of the present invention constructs, from the synthesis filter information, a pseudo-auditory weighting synthesis filter that performs auditory weighting in a pseudo manner. The phase shift amount for cutting out the noise vector output from the noise codebook is then searched for at the level of the pseudo-auditory-weighted synthesized waveform produced by this filter.

[0013] Specifically, the speech encoding method of the present invention comprises the steps of:
(a) generating an adaptive code vector by repeating, at a pitch period, an excitation signal output from an adaptive codebook that stores past excitation signals;
(b) a phase shift amount search step of searching for a phase shift amount for cutting out a noise vector output from a noise codebook that stores a plurality of noise vectors, such that the error between a waveform synthesized from a pulse train repeated at the pitch period and a waveform synthesized from the adaptive code vector becomes smaller;
(c) generating a noise code vector by periodizing, at the pitch period, the noise vector cut out according to the phase shift amount;
(d) generating a synthesized speech signal by driving, with the adaptive code vector and the noise code vector, a first synthesis filter whose characteristics are determined according to synthesis filter information obtained from an analysis of the input speech signal;
(e) encoding the synthesis filter information representing the characteristics of the first synthesis filter; and
(f) searching for the pitch period and the noise vector that make the auditory-weighted distortion of the synthesized speech signal smaller.

[0014] In the phase shift amount search step, a second synthesis filter (pseudo-auditory weighting synthesis filter) for synthesizing speech with pseudo auditory weighting is constructed on the basis of the synthesis filter information. The phase shift amount is searched for such that the error between the waveform synthesized from the pulse train by the second synthesis filter and the waveform synthesized from the adaptive code vector by the second synthesis filter becomes smaller.

[0015] The speech decoding method of the present invention corresponding to this speech encoding method comprises the steps of:
(a) generating an adaptive code vector by repeating, at a pitch period, an excitation signal output from an adaptive codebook that stores past excitation signals;
(b) a phase shift amount search step of searching for a phase shift amount for cutting out a noise vector output from a noise codebook that stores a plurality of noise vectors, such that the error between a waveform synthesized from a pulse train repeated at the pitch period and a waveform synthesized from the adaptive code vector becomes smaller;
(c) generating a noise code vector by periodizing, at the pitch period, the noise vector cut out according to the phase shift amount; and
(d) generating a reproduced speech signal by driving, with the adaptive code vector and the noise code vector, a first synthesis filter whose characteristics are determined according to synthesis filter information.

[0016] In the phase shift amount search step of the decoding method, a second synthesis filter (pseudo-auditory weighting synthesis filter) for synthesizing speech with pseudo auditory weighting is constructed from the synthesis filter information. The phase shift amount is searched for such that the error between the waveform synthesized from the pulse train by the second synthesis filter and the waveform synthesized from the adaptive code vector by the second synthesis filter becomes smaller.

[0017] In this way, the speech encoding/decoding method of the present invention searches for the phase shift amount for cutting out the noise vector output from the noise codebook at the level of the waveform synthesized with pseudo auditory weighting by the pseudo-auditory weighting synthesis filter, which makes it possible to find a perceptually preferable phase shift amount. Consequently, encoding/decoding of the excitation signal in which encoding distortion is hardly perceived can be realized even at a low rate of about 4 kbit/s.

[0018] In the speech encoding/decoding method according to the present invention, the impulse response of the pseudo-auditory weighting synthesis filter may be truncated to a predetermined finite number of samples. Doing so greatly reduces the amount of filtering computation required for synthesis by the pseudo-auditory weighting synthesis filter, with almost no loss in phase search performance.
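As a rough illustration of this truncation, the following Python sketch computes the first few samples of a filter's impulse response and then filters by direct convolution with that truncated response. The coefficient lists `b` and `a`, the function names, and the truncation length are assumptions made for the example; the specification does not give them.

```python
def truncated_impulse_response(b, a, num_samples):
    """First `num_samples` samples of the impulse response of B(z)/A(z).

    b and a are coefficient lists in powers of z^-1 (hypothetical inputs).
    """
    h = []
    for n in range(num_samples):
        acc = b[n] if n < len(b) else 0.0
        # Feedback part: subtract a[k] * h[n-k] for the samples already computed.
        acc -= sum(a[k] * h[n - k] for k in range(1, len(a)) if n - k >= 0)
        h.append(acc / a[0])
    return h


def fir_filter(h, x):
    """Convolve x with the truncated response h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


h = truncated_impulse_response([1.0], [1.0, -0.5], 4)
print(h)  # [1.0, 0.5, 0.25, 0.125]
print(fir_filter(h, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```

With the truncated response, the cost per output sample is bounded by the number of retained samples, and the tail of the response beyond that point is simply ignored.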

[0019]

[Embodiments of the Invention]

Embodiments of the present invention are described below with reference to the drawings.

[Speech Coding System]

FIG. 1 is a block diagram showing the configuration of a speech coding system to which a speech encoding method according to an embodiment of the present invention is applied, and shows an example in which phase adaptation is used for encoding the noise excitation in the CELP method. As a model of the speech production process, CELP associates the vocal cord signal with an excitation signal and represents the spectral envelope characteristics of the vocal tract with a synthesis filter. The excitation signal is input to the synthesis filter, and the output of the synthesis filter represents the speech signal. A feature of the present invention is that the excitation signal is encoded so that the error between the speech signal input to the encoding system and the speech signal reproduced by the decoding system, that is, the error of the reproduced speech signal, becomes perceptually small.

[0020] The speech coding system of the present embodiment differs greatly from the conventional speech coding system shown in FIG. 4 in the characteristics of the filters used in the phase search unit. The phase search unit 512 of the conventional system performs the phase search by evaluating distortion at the speech waveform level using the synthesis filters 514 and 515 of FIG. 4. In contrast, in the phase search unit 112 of the present embodiment, as shown in FIG. 1, a pseudo-auditory weighting synthesis filter setting unit 111 sets the transfer characteristic F(z) of pseudo-auditory weighting synthesis filters 114 and 115, and these filters are applied to the adaptive code vector and the pitch pulse train. This makes it possible to find a more suitable phase shift amount that yields less audible distortion than before.

[0021] The present embodiment is described in detail below. First, an input speech signal from a speech input terminal is analyzed by a linear prediction analysis unit 102, and linear prediction coefficients (LPC coefficients) are extracted. The extracted linear prediction coefficients are quantized by a synthesis filter information quantization unit 103. A code A representing the quantized linear prediction coefficients, that is, the synthesis filter information representing the characteristics of a synthesis filter unit 104, is output to a multiplexing unit 107. The quantized linear prediction coefficients are used as filter coefficients.

[0022] An adaptive codebook 101 stores past encoded excitation signals and generates an adaptive code vector by repeating the past excitation signal at a pitch period L. When the pitch period L is searched for, the adaptive code vector candidate corresponding to each pitch period candidate is passed through a gain multiplication unit 108 and an addition unit 109 and synthesized into speech by the synthesis filter 104. The distortion of the resulting synthesized waveform is computed, via a subtraction unit 122, by an auditory-weighted distortion calculation unit 105 using an auditory-weighted distortion measure. A code selection unit 106 then searches for the pitch period that makes this auditory-weighted distortion of the synthesized waveform smaller. The code representing the pitch period L found in this way is output to the multiplexing unit 107.
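The generation of the adaptive code vector can be pictured with a short Python sketch. It assumes that "repeating at the pitch period L" means taking the most recent L samples of the stored excitation and tiling them across the frame; the function name, the frame length, and the sample values are illustrative and not taken from the specification.

```python
def adaptive_code_vector(past_excitation, pitch_period, frame_len):
    """Tile the last `pitch_period` samples of the past excitation over one frame."""
    if not 0 < pitch_period <= len(past_excitation):
        raise ValueError("pitch_period must be in 1..len(past_excitation)")
    segment = past_excitation[-pitch_period:]
    return [segment[n % pitch_period] for n in range(frame_len)]


past = [0.0, 1.0, 2.0, 3.0, 4.0]
print(adaptive_code_vector(past, 3, 8))
# [2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 2.0, 3.0]
```

In a pitch search, each candidate period would produce one such vector, and the candidate whose synthesized waveform has the smallest auditory-weighted distortion would be kept.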

[0023] A noise codebook 120 stores a plurality of noise vectors and outputs the noise vector corresponding to the noise code C found by the code selection unit 106 to a phase adaptation unit 119. The phase adaptation unit 119 cuts out the noise vector output from the noise codebook 120, starting at a position corresponding to the phase shift amount supplied from the phase search unit 112. A pitch periodization unit 118 periodizes the cut-out noise vector at the pitch period L, thereby giving the noise vector pitch periodicity, and outputs the result as a noise code vector.
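The two operations in this paragraph, cutting out at a phase shift and pitch periodization, can be sketched as follows. This is only one plausible reading: wrapping around at the end of the stored vector and periodizing by repeating the first L samples are assumptions of the sketch, and the function names are hypothetical.

```python
def phase_adapt(noise_vector, phase_shift, length):
    """Cut `length` samples out of the stored noise vector, starting at `phase_shift`.

    Wrapping around at the end of the stored vector is an assumption of this sketch.
    """
    n = len(noise_vector)
    return [noise_vector[(phase_shift + i) % n] for i in range(length)]


def pitch_periodize(vector, pitch_period):
    """Give the vector pitch periodicity by repeating its first `pitch_period` samples."""
    if pitch_period >= len(vector):
        return list(vector)
    return [vector[i % pitch_period] for i in range(len(vector))]


codebook_entry = [1, -1, 2, -2, 3, -3, 4, -4]
cut = phase_adapt(codebook_entry, 2, 6)
print(cut)                      # [2, -2, 3, -3, 4, -4]
print(pitch_periodize(cut, 4))  # [2, -2, 3, -3, 2, -2]
```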

[0024] When the noise code C is searched for, the noise code vector candidate corresponding to each noise code candidate is passed through the phase adaptation unit 119, the pitch periodization unit 118, a multiplication unit 110, and the addition unit 109 and synthesized into speech by the synthesis filter 104. The distortion of the resulting synthesized waveform is computed, via the subtraction unit 122, by the auditory-weighted distortion calculation unit 105 using an auditory-weighted distortion measure. The code selection unit 106 then searches for the noise code C that makes this auditory-weighted distortion of the synthesized waveform smaller. The noise code C found in this way is output to the multiplexing unit 107.

[0025] A gain codebook 121 stores candidates for a gain G0 applied to the adaptive code vector output from the adaptive codebook 101 and a gain G1 applied to the noise code vector output from the pitch periodization unit 118.

[0026] When the gain code is searched for, the adaptive code vector and the noise code vector are multiplied by the candidate gains G0 and G1 in the gain multiplication units 108 and 110 and added together in the addition unit 109 to form an excitation vector candidate, which is synthesized by the synthesis filter 104. The distortion of the resulting waveform is computed, via the subtraction unit 122, by the auditory-weighted distortion calculation unit 105 using an auditory-weighted distortion measure. The code selection unit 106 then searches for the gain code that makes this auditory-weighted distortion of the synthesized waveform smaller. The gain code G found in this way is output to the multiplexing unit 107.
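A gain codebook search of this kind can be sketched in Python as an exhaustive comparison of candidate (G0, G1) pairs. The sketch assumes that the target and the two code vectors have already been passed through the synthesis and auditory weighting filters, and that those filters are linear with zero initial state so that the gains can be applied after filtering; in the block diagram of the specification the gains are applied before the synthesis filter. The function name and the codebook contents are hypothetical.

```python
def search_gain(target, adaptive_syn, noise_syn, gain_codebook):
    """Return the index of the (G0, G1) pair giving the smallest squared error.

    All three signals are assumed to be already synthesized and auditory-weighted;
    gain_codebook is a hypothetical list of (G0, G1) pairs.
    """
    best_index, best_err = -1, float("inf")
    for index, (g0, g1) in enumerate(gain_codebook):
        err = sum((t - g0 * a - g1 * c) ** 2
                  for t, a, c in zip(target, adaptive_syn, noise_syn))
        if err < best_err:
            best_index, best_err = index, err
    return best_index


adaptive_syn = [1.0, 0.0, -1.0, 0.5]
noise_syn = [0.0, 1.0, 1.0, -1.0]
target = [2.0 * a + 0.5 * c for a, c in zip(adaptive_syn, noise_syn)]
codebook = [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]
print(search_gain(target, adaptive_syn, noise_syn, codebook))  # 1
```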

[0027] The code of the pitch period L, the noise code C, and the gain code G finally selected by the code selection unit 106 in this way are multiplexed by the multiplexing unit 107 together with the code A representing the synthesis filter information and transmitted to the decoding side as a code stream. The excitation vector generated using the codes L, C, and G is stored in the adaptive codebook 101 in preparation for encoding the excitation signal of the next interval.

[0028] As described above, a speech coding system based on the CELP method uses an auditory-weighted distortion measure to select the codes L, C, and G from which the excitation signal is generated, so that the encoding distortion becomes hard to hear. The auditory weighting characteristic used in encoding is obtained from the unquantized linear prediction coefficients obtained by the linear prediction analysis unit 102. The auditory-weighted distortion computed by the auditory-weighted distortion calculation unit 105 is used only for the code selection performed on the encoding side, so the auditory weighting characteristic does not need to be transmitted to the decoding side.

[0029] Next, the phase search method used in the phase search unit 112, which employs the pseudo-auditory weighting synthesis filter that characterizes the present invention, is described. In the phase search unit 112, a pulse train with pulses placed at intervals of the pitch period L (called a pitch pulse train), supplied from a pulse train setting unit 113, is synthesized by a synthesis filter 114 having the transfer characteristic F(z). The adaptive code vector from the adaptive codebook 101 is likewise synthesized by a synthesis filter 115 having the transfer characteristic F(z). Using a phase candidate setting unit 116 and an optimum phase selection unit 117, the phase search unit 112 searches for the phase of the pitch pulse train that minimizes the error between these two waveforms, and thereby obtains the phase shift amount.
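The phase search described here can be sketched in Python as a loop over candidate phases. The sketch makes several assumptions that the specification does not state: the filter F(z) is passed in as numerator and denominator coefficient lists `b` and `a`, the candidate phases are 0 to L-1, and the error is the squared error after the best non-negative scaling of the pulse-train waveform. All function names are hypothetical.

```python
def iir_filter(b, a, x):
    """Direct-form filter with zero initial state: y = (B(z)/A(z)) x."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def pitch_pulse_train(phase, pitch_period, length):
    """Unit pulses at phase, phase + L, phase + 2L, ..."""
    return [1.0 if n >= phase and (n - phase) % pitch_period == 0 else 0.0
            for n in range(length)]


def search_phase(adaptive_vec, pitch_period, b, a):
    """Return the pulse-train phase whose filtered waveform best matches
    the filtered adaptive code vector (gain-optimized squared error)."""
    target = iir_filter(b, a, adaptive_vec)
    target_energy = sum(t * t for t in target)
    best_phase, best_err = 0, float("inf")
    for phase in range(pitch_period):
        synth = iir_filter(b, a, pitch_pulse_train(phase, pitch_period, len(adaptive_vec)))
        corr = sum(t * s for t, s in zip(target, synth))
        energy = sum(s * s for s in synth)
        matched = corr * corr / energy if corr > 0 and energy > 0 else 0.0
        err = target_energy - matched
        if err < best_err:
            best_phase, best_err = phase, err
    return best_phase


pitch = 5
adaptive_vec = [0.8 * p for p in pitch_pulse_train(2, pitch, 20)]
print(search_phase(adaptive_vec, pitch, [1.0], [1.0, -0.5]))  # 2
```

In the example the adaptive code vector is itself a scaled pulse train at phase 2, so the search recovers that phase; with real excitation the minimum would only be approximate.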

[0030] In this case, the pseudo-auditory weighting synthesis filter setting unit 111 first determines the transfer characteristic of the synthesis filters 114 and 115, which perform auditory weighting in a pseudo manner, using the synthesis filter information from the synthesis filter information quantization unit 103, that is, the quantized linear prediction coefficients. An example of this method is described with reference to FIG. 2.

[0031] FIG. 2(a) shows the concept of the phase search according to the prior art, which uses an ordinary synthesis filter 201 with the transfer characteristic 1/Aq(z) (corresponding to the synthesis filters 514 and 515 in FIG. 4) whose filter coefficients are the quantized linear prediction coefficients αq(i) as they are. FIG. 2(b) shows the concept of the phase search according to the present invention, which uses a pseudo-auditory weighting synthesis filter 202 with the transfer characteristic F(z) (corresponding to the synthesis filters 114 and 115 in FIG. 1). The characteristic W(z) of the auditory weighting filter normally used in the CELP method is expressed as

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \tag{1}$$

where 0 < γ2 < γ1 < 1, and the values of γ2 and γ1 are set according to the implementation of the encoder. As an example, γ2 = 0.5 and γ1 = 0.98 can be used. A(z) is expressed, using the unquantized linear prediction coefficients α(i), as

$$A(z) = 1 + \sum_{i} \alpha(i)\, z^{-i} \tag{2}$$
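Substituting z/γ for z in A(z) multiplies the i-th coefficient by γ^i, which gives a simple way to compute the numerator and denominator of W(z). The Python sketch below illustrates this; the function names and the example coefficients are assumptions, while the default values of γ1 and γ2 are the example values quoted in the text.

```python
def bandwidth_expand(alpha, gamma):
    """Coefficients of A(z/gamma), given A(z) = 1 + sum_i alpha[i-1] * z^-i."""
    return [1.0] + [a * gamma ** (i + 1) for i, a in enumerate(alpha)]


def weighting_filter(alpha, gamma1=0.98, gamma2=0.5):
    """Return (numerator, denominator) coefficient lists of W(z) = A(z/gamma1) / A(z/gamma2)."""
    return bandwidth_expand(alpha, gamma1), bandwidth_expand(alpha, gamma2)


# Hypothetical second-order predictor, chosen so the arithmetic is easy to follow.
print(bandwidth_expand([-1.0, 0.5], 0.5))  # [1.0, -0.5, 0.125]
```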

[0032] The characteristic Wq(z) of the pseudo-auditory weighting filter on which the pseudo-auditory weighting synthesis filter 202 of the present embodiment (the synthesis filters 114 and 115 in FIG. 1) is based is expressed as

$$W_q(z) = \frac{A_q(z/\gamma_1)}{A_q(z/\gamma_2)} \tag{3}$$

where Aq(z) is expressed, using the quantized linear prediction coefficients αq(i), as

$$A_q(z) = 1 + \sum_{i} \alpha_q(i)\, z^{-i} \tag{4}$$

[0033] The transfer characteristic F(z) of the pseudo-auditory weighting synthesis filter 202 used for the phase search in the present embodiment (the synthesis filters 114 and 115 in FIG. 1) is set by combining the characteristic Wq(z) of the pseudo-auditory weighting filter described above with the transfer characteristic 1/Aq(z) of the ordinary synthesis filter. As shown in FIG. 2(b), it is expressed as

$$F(z) = \frac{1}{A_q(z)}\, W_q(z) = \frac{1}{A_q(z)} \cdot \frac{A_q(z/\gamma_1)}{A_q(z/\gamma_2)} \tag{5}$$
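As an illustration of equation (5), the following Python sketch builds the numerator and denominator coefficients of F(z) from the quantized coefficients. The denominator is the product of the polynomials Aq(z) and Aq(z/γ2), computed here by convolution. The function names and the first-order example are assumptions chosen so that the arithmetic can be checked by hand; a practical encoder would use a higher prediction order and the γ values of its own design.

```python
def bandwidth_expand(alpha_q, gamma):
    """Coefficients of Aq(z/gamma), given Aq(z) = 1 + sum_i alpha_q[i-1] * z^-i."""
    return [1.0] + [a * gamma ** (i + 1) for i, a in enumerate(alpha_q)]


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def pseudo_weighting_synthesis_filter(alpha_q, gamma1, gamma2):
    """Return (numerator, denominator) of F(z) = Aq(z/g1) / (Aq(z) * Aq(z/g2))."""
    aq = [1.0] + list(alpha_q)
    numerator = bandwidth_expand(alpha_q, gamma1)
    denominator = poly_mul(aq, bandwidth_expand(alpha_q, gamma2))
    return numerator, denominator


num, den = pseudo_weighting_synthesis_filter([-0.5], gamma1=0.5, gamma2=0.25)
print(num)  # [1.0, -0.25]
print(den)  # [1.0, -0.625, 0.0625]
```

Because only the quantized coefficients enter the computation, a decoder holding the same coefficients could build the same filter.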

[0034] Normally, because of the quantization error of the linear prediction coefficients, in other words the quantization error of the synthesis filter information, the characteristic Wq(z) of the pseudo-auditory weighting filter is not identical to the weighting characteristic W(z) of the auditory weighting filter. However, since the encoder performs encoding so that this quantization error is small, Wq(z) is very close to the actual W(z) in low-rate coding at about 4 kbit/s. The present embodiment is characterized in that it exploits this fact so that the phase search can be performed under a distortion measure that yields less audible distortion than before.

[0035] In addition, since the transfer characteristic F(z) of the pseudo-auditory weighting synthesis filter can be set using the quantized linear prediction coefficients αq(i), the decoding side can reproduce the same phase shift amount as the encoding side by using the same phase search algorithm as the encoding side.

[0036] In the present invention, it goes without saying that a pseudo-auditory weight filter whose characteristic differs from equation (3) is also effective, provided that its characteristic $W_q(z)$ is close to the auditory weighting characteristic $W(z)$ and can be reproduced from the quantized synthesis filter information. For example, setting

$$W_q(z) = \frac{A_q(z)}{A_q(z/\gamma_3)} \tag{6}$$

also reflects the characteristic of the auditory weight filter in a pseudo manner. A value of about 0.7 to 0.9 is appropriate for $\gamma_3$. In this case, the $A_q(z)$ in the numerator and the denominator cancel, so the characteristic $F(z)$ of the pseudo-auditory weight synthesis filter used for the phase search becomes

$$F(z) = \frac{1}{A_q(z)}\, W_q(z) = \frac{1}{A_q(z/\gamma_3)} \tag{7}$$

Compared with using $F(z)$ of equation (5), the pseudo auditory weight characteristic changes somewhat, but it remains sufficient for finding a phase shift amount at which distortion is hard to hear. In addition, because the order of $F(z)$ is lower, the filtering computation itself becomes simpler, which reduces the amount of computation required for the phase search.
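The simplified filter of equation (7) is all-pole, so its impulse response follows from a short recursion. The sketch below is illustrative only: the function name and the example coefficients are hypothetical, and the value 0.8 for γ3 is simply one point inside the 0.7 to 0.9 range mentioned above.

```python
def simplified_filter_impulse_response(aq, gamma3, n):
    """First n samples of the impulse response of F(z) = 1 / Aq(z/gamma3).

    aq holds the coefficients of Aq(z) with aq[0] == 1.
    """
    # Coefficients of Aq(z/gamma3): the i-th coefficient is scaled by gamma3**i.
    den = [c * gamma3 ** i for i, c in enumerate(aq)]
    h = []
    for k in range(n):
        acc = 1.0 if k == 0 else 0.0  # unit impulse input
        for j in range(1, min(k, len(den) - 1) + 1):
            acc -= den[j] * h[k - j]
        h.append(acc)
    return h


# Hypothetical first-order example: Aq(z) = 1 - 0.9 z^-1, gamma3 = 0.8.
h_simple = simplified_filter_impulse_response([1.0, -0.9], 0.8, 3)
```

Compared with equation (5), this form has no numerator polynomial and a denominator of the same order as Aq(z), which is where the saving in filtering operations comes from.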

[0037] Next, an example of a specific phase search method in this embodiment is described. The pulse train setting unit 113 sets a pulse train with pulses at intervals of the pitch period $L$ (a pitch pulse train), and the pseudo-auditory weight synthesis filter 114 with transfer characteristic $F(z)$ synthesizes a waveform from it. The phase candidate setting unit 116 applies a phase candidate $\phi$ to this waveform to shift its phase. The optimum phase selection unit 117 then evaluates the error (waveform distortion) between the shifted waveform and the waveform that the pseudo-auditory weight synthesis filter 115, also with transfer characteristic $F(z)$, synthesizes from the adaptive code vector supplied by the adaptive codebook 101. In this way the perceptually optimum phase shift amount $\phi_{opt}$ is obtained.

[0038] Whereas the prior art uses $1/A_q(z)$ as the characteristic of the filter used for the phase search, the present invention uses the characteristic $F(z)$ of the pseudo-auditory weight synthesis filters 114 and 115. Since only the filter characteristic differs, the conventional technique can be used as is for computing the distortion of the filtered waveform while varying the phase $\phi$. Alternatively, as shown next, the optimum phase can be found by computing the distortion that filtering would produce without actually performing the filtering.

[0039] Let $x$ be the adaptive code vector, $L$ the pitch period corresponding to $x$, and $y_\phi$ the pulse train vector obtained by applying the phase $\phi$ to a pitch pulse train with pulses at intervals of the pitch period $L$. With $N$ the dimension of the vectors, $y_\phi$ can be defined as

$$y_\phi = [\, y_\phi(0),\ y_\phi(1),\ \ldots,\ y_\phi(N-1) \,]^t$$

where $y_\phi(i)$ denotes the $i$-th element of $y_\phi$. The vector $y_0$ corresponding to $\phi = 0$ is

$$y_0 = [\, 1, 0, \ldots, 0, 1, 0, \ldots, 0, 1, 0, \ldots \,]^t$$

with $y_0(0) = 1$; the elements equal to 1 are placed at intervals of the pitch period $L$.
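The pulse train vector can be sketched in a few lines of Python. The function name `pitch_pulse_train` is hypothetical, and the sketch assumes that the phase is a non-negative integer smaller than the pitch period, so that shifting simply moves every pulse to the right and fills the start with zeros.

```python
def pitch_pulse_train(n, pitch, phase):
    """Pulse train vector of dimension n with unit pulses at
    phase, phase + pitch, phase + 2 * pitch, ... (0 <= phase < pitch assumed)."""
    y = [0.0] * n
    for i in range(phase, n, pitch):
        y[i] = 1.0
    return y


# Example with N = 8 and L = 3 (values chosen only for illustration).
y0 = pitch_pulse_train(8, 3, 0)
y1 = pitch_pulse_train(8, 3, 1)
```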

[0040] In contrast, the pulse train vector $y_1$ corresponding to the phase $\phi = 1$ is $y_0$ shifted one sample to the right, so $y_1(0) = 0$, $y_1(1) = 1$, and

$$y_1 = [\, 0, 1, 0, \ldots, 0, 1, 0, \ldots, 0, 1, 0, \ldots \,]^t$$

When the filtering operation of the pseudo-auditory weight synthesis filters 114 and 115 is represented by a matrix $F$, the distortion $E_\phi$ between the adaptive code vector $x$ and the pulse train vector $y_\phi$ at the pseudo-auditory-weighted synthesis level can be written as

$$E_\phi = (Fx - g_\phi F y_\phi)^t\, (Fx - g_\phi F y_\phi) \tag{8}$$

where $g_\phi$ is a gain value. Setting the derivative of $E_\phi$ with respect to $g_\phi$ to zero gives the optimum gain that minimizes $E_\phi$:

$$g_\phi = \frac{x^t F^t F y_\phi}{y_\phi^t F^t F y_\phi} \tag{9}$$

Substituting this into equation (8), the minimum value $(E_\phi)_{\min}$ of $E_\phi$ is

$$(E_\phi)_{\min} = x^t F^t F x - \frac{(x^t F^t F y_\phi)^2}{y_\phi^t F^t F y_\phi} \tag{10}$$

Therefore, the $\phi$ that maximizes the second term on the right-hand side of equation (10), $(x^t F^t F y_\phi)^2 / (y_\phi^t F^t F y_\phi)$, is the optimum phase shift amount $\phi_{opt}$.
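A direct, unoptimized search that maximizes the second term of equation (10) might look like the following sketch. It assumes that the matrix F corresponds to zero-state filtering with a finite impulse response `h`; the function names and the test signal are hypothetical and serve only to exercise the criterion.

```python
def zero_state_filter(h, v):
    """Compute F v: convolution of v with impulse response h, truncated to len(v)."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        for k in range(min(i, len(h) - 1) + 1):
            out[i] += h[k] * v[i - k]
    return out


def pulse_train(n, pitch, phase):
    """Unit pulses at phase, phase + pitch, ... in a vector of dimension n."""
    y = [0.0] * n
    for i in range(phase, n, pitch):
        y[i] = 1.0
    return y


def search_phase_direct(x, h, pitch, candidates):
    """Return (phase, score) maximizing (x^t F^t F y)^2 / (y^t F^t F y)."""
    fx = zero_state_filter(h, x)
    best_phase, best_score = None, float("-inf")
    for phase in candidates:
        fy = zero_state_filter(h, pulse_train(len(x), pitch, phase))
        energy = sum(v * v for v in fy)
        if energy == 0.0:
            continue  # candidate produces no output; skip it
        corr = sum(a * b for a, b in zip(fx, fy))
        score = corr * corr / energy
        if score > best_score:
            best_phase, best_score = phase, score
    return best_phase, best_score


# Toy check: the "adaptive code vector" is itself a pulse train with phase 2.
x = pulse_train(20, 5, 2)
h = [1.0, 0.5, 0.25]
phase_opt, score_opt = search_phase_direct(x, h, 5, range(5))
```

In this toy case the search recovers the phase that was used to build `x`, since the criterion reaches its upper bound when F x and F y are proportional.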

[0041] To reduce the computation of the phase search, the vector $r = F^t F x$ is computed first, and then $r^t y_\phi$ is computed for each phase candidate $\phi$; this reduces the computation of the numerator of the second term on the right-hand side of equation (10). For the denominator of that term, let $d_\phi = F y_\phi$ and $B_\phi = y_\phi^t F^t F y_\phi = d_\phi^t d_\phi$. Then the recursion

$$B_\phi = B_{\phi+1} + \bigl[d_0(N-1-\phi)\bigr]^2 \tag{11}$$

holds, where $d_0(n)$ is the $n$-th element of $d_0 = F y_0$. Using this relation, the denominator of the second term on the right-hand side of equation (10) can be computed with one multiply-accumulate operation per phase candidate, so the amount of computation is reduced to $1/N$.
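The precomputation of r and the recursion of equation (11) can be sketched as follows. The sketch assumes zero-state filtering with a finite impulse response `h` whose first sample is nonzero, and phase candidates 0 to `max_phase` with `max_phase` smaller than the pitch period, so that each candidate is a pure right shift of the zero-phase pulse train. All names and test values are hypothetical.

```python
def zero_state_filter(h, v):
    """Compute F v: convolution of v with impulse response h, truncated to len(v)."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        for k in range(min(i, len(h) - 1) + 1):
            out[i] += h[k] * v[i - k]
    return out


def pulse_train(n, pitch, phase):
    """Unit pulses at phase, phase + pitch, ... in a vector of dimension n."""
    y = [0.0] * n
    for i in range(phase, n, pitch):
        y[i] = 1.0
    return y


def fast_phase_scores(x, h, pitch, max_phase):
    """Second term of eq. (10) for phase = 0..max_phase (max_phase < pitch)."""
    n = len(x)
    fx = zero_state_filter(h, x)
    # r = F^t F x: correlate F x with the impulse response.
    r = [sum(h[m - i] * fx[m] for m in range(i, min(n, i + len(h))))
         for i in range(n)]
    # d0 = F y0 is filtered once; every other d_phi is a shifted copy of it.
    d0 = zero_state_filter(h, pulse_train(n, pitch, 0))
    b = [0.0] * (max_phase + 1)
    b[max_phase] = sum(v * v for v in d0[:n - max_phase])
    for phase in range(max_phase - 1, -1, -1):
        # Equation (11): one multiply-accumulate per candidate.
        b[phase] = b[phase + 1] + d0[n - 1 - phase] ** 2
    scores = []
    for phase in range(max_phase + 1):
        corr = sum(r[i] for i in range(phase, n, pitch))  # r^t y_phi
        scores.append(corr * corr / b[phase])
    return scores


scores = fast_phase_scores(pulse_train(20, 5, 2), [1.0, 0.5, 0.25], 5, 4)
best_phase = max(range(len(scores)), key=lambda p: scores[p])
```

Since each pulse train vector contains only zeros and ones, the numerator reduces to summing the elements of r at the pulse positions, with no further filtering per candidate.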

[0042] As a way to further reduce the computation of the filtering operation $F$ of the pseudo-auditory weight synthesis filters 114 and 115, it is also effective to compute the impulse response $f_n$ of the filters 114 and 115 only up to a predetermined $K$ samples, set $f_n = 0$ for $n > K$, and search for the phase shift amount using the truncated impulse response. The impulse response $f_n$ of these filters normally decays substantially by the time $n$ reaches about 10 samples, so the effect of truncation on the characteristic becomes very small. The impulse response can therefore be truncated at about $K = 8$ to $15$. This greatly reduces the filtering computation required for synthesis by the pseudo-auditory weight synthesis filters 114 and 115 with almost no loss in phase search performance.
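The effect of truncation can be gauged by the fraction of impulse response energy that is discarded. The sketch below uses a hypothetical geometrically decaying response (ratio 0.7) purely as a stand-in; the actual decay depends on the filter coefficients, and the function names are not from this description.

```python
def truncate_impulse_response(f, k):
    """Keep f[0..k] and treat f[n] as zero for n > k."""
    return list(f[:k + 1])


def discarded_energy_ratio(f, k):
    """Fraction of the total energy of f that lies in the samples n > k."""
    total = sum(v * v for v in f)
    tail = sum(v * v for v in f[k + 1:])
    return tail / total


# Hypothetical impulse response: f[n] = 0.7 ** n for 40 samples.
f = [0.7 ** n for n in range(40)]
f_trunc = truncate_impulse_response(f, 10)
ratio = discarded_energy_ratio(f, 10)
```

For this stand-in response, truncating at K = 10 discards well under 0.1 % of the energy, which is consistent in spirit with the small influence described above.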

[0043] [Speech Decoding System] FIG. 3 is a block diagram showing the configuration of a speech decoding system to which the speech decoding method according to this embodiment is applied; its configuration corresponds to the speech encoding system shown in FIG. 1.

[0044] First, the code stream input from the code input terminal is fed to the demultiplexing unit 307, which separates it into the code A of the quantized synthesis filter information, the code of the pitch period L, the noise code C, and the gain code G. Decoding is performed based on these codes.

[0045] The synthesis filter information decoding unit 303 obtains the quantized linear prediction coefficients (LPC coefficients) from the synthesis filter information given by the code A; these are used as the filter coefficients of the synthesis filter unit 304.

[0046] The adaptive codebook 301 stores past encoded excitation signals and generates an adaptive code vector by repeating the past excitation signal at the pitch period L. Like the phase search unit 112 on the encoding side in FIG. 1, the phase search unit 312 comprises a pulse train setting unit 313, pseudo-auditory weight synthesis filters 314 and 315, a phase candidate setting unit 316, and an optimum phase selection unit 317. It obtains the phase shift amount using the adaptive code vector from the adaptive codebook 301 and the pitch period L, and outputs it to the phase adaptation unit 319. Specifically, the pseudo-auditory weight synthesis filter setting unit 311 uses the quantized synthesis filter information from the synthesis filter information decoding unit 303 to determine the characteristics of the pseudo-auditory weight synthesis filters 314 and 315 by the same procedure as on the encoding side, and the phase shift amount is obtained using these characteristics.

[0047] The noise codebook 320 outputs the noise vector corresponding to the noise code C from the demultiplexing unit 307. The phase adaptation unit 319 cuts out the noise vector output from the noise codebook 320 starting at a position corresponding to the phase shift amount given by the phase search unit 312. The pitch periodization unit 318 periodizes the cut-out noise vector with the pitch period L and outputs the result as the noise code vector.
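The cut-out and periodization steps can be pictured with the sketch below. The exact mapping from the phase shift amount to the cut-out position is not specified in this passage, so the sketch simply assumes that the cut starts at the index equal to the phase shift and takes one pitch period of samples; the function name and this convention are assumptions, not the method of the description.

```python
def cut_and_periodize(noise, phase, pitch, n):
    """Cut one pitch period of `noise` starting at index `phase` (assumed
    convention) and repeat it periodically to fill a vector of dimension n."""
    segment = noise[phase:phase + pitch]
    if not segment:
        raise ValueError("phase lies beyond the end of the noise vector")
    return [segment[i % len(segment)] for i in range(n)]


# Illustrative values only: a ramp stands in for a stored noise vector.
noise_vector = [float(v) for v in range(10)]
noise_code_vector = cut_and_periodize(noise_vector, phase=2, pitch=3, n=8)
```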

[0048] The gain codebook 321 stores candidates for the gain G0 applied to the adaptive code vector output from the adaptive codebook 301 and for the gain G1 applied to the noise code vector output from the pitch periodization unit 318, and outputs the gains G0 and G1 corresponding to the gain code G from the demultiplexing unit 307.

[0049] The adaptive code vector and the noise code vector are multiplied by the gains G0 and G1 in the multiplication units 308 and 310, respectively, and then summed in the addition unit 309 to form the excitation vector. The synthesis filter 304 synthesizes this excitation vector to reproduce the speech signal. To further improve subjective quality, the reproduced speech signal is passed through the post filter 330 as appropriate and output as the final speech signal.
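The final reconstruction step amounts to a gain-weighted sum followed by all-pole filtering with 1/Aq(z). The sketch below assumes a zero initial filter state and omits the post filter; the function name and the numeric values are hypothetical.

```python
def decode_frame(adaptive, noise, g0, g1, aq):
    """Form the excitation g0 * adaptive + g1 * noise and pass it through
    the synthesis filter 1/Aq(z) (aq[0] == 1, zero initial state assumed)."""
    excitation = [g0 * a + g1 * c for a, c in zip(adaptive, noise)]
    speech = []
    for n, e in enumerate(excitation):
        acc = e
        for i in range(1, min(n, len(aq) - 1) + 1):
            acc -= aq[i] * speech[n - i]
        speech.append(acc)
    return excitation, speech


# Illustrative values only.
excitation, speech = decode_frame(
    adaptive=[1.0, 0.0, 0.0],
    noise=[0.0, 1.0, 0.0],
    g0=2.0,
    g1=0.5,
    aq=[1.0, -0.5],
)
```

In a real decoder the filter state would carry over from the previous frame rather than being reset to zero as it is here.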

【００５０】[0050]

[Effects of the Invention] As described above, according to the speech encoding/decoding method of the present invention, a pseudo-auditory weight synthesis filter that performs pseudo auditory weighting is constructed from the synthesis filter information, and the phase shift amount for cutting out the noise vector output from the noise codebook is searched for at the level of the pseudo-auditory-weighted synthesized waveform produced by this filter. This makes it possible to encode and decode a high-quality excitation signal whose coding distortion is hard to perceive even at a low rate of about 4 kbit/s.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech encoding system according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram of the filters used for the phase search in the present invention and in the prior art.

FIG. 3 is a block diagram showing the configuration of the speech decoding system according to the same embodiment.

FIG. 4 is a block diagram showing the configuration of a conventional speech encoding system.

[Explanation of symbols]

101: adaptive codebook
102: linear prediction analysis unit
103: synthesis filter information quantization unit
104: synthesis filter unit (first synthesis filter)
105: auditory-weighted distortion calculation unit
106: code selection unit
107: multiplexing unit
108, 110: gain multiplication units
109: addition unit
111: pseudo-auditory weight synthesis filter setting unit
112: phase search unit
113: pulse train setting unit
114, 115: pseudo-auditory weight synthesis filters (second synthesis filter)
116: phase candidate setting unit
117: optimum phase selection unit
118: pitch periodization unit
119: phase adaptation unit
120: noise codebook
121: gain codebook
122: subtraction unit
201: synthesis filter
202: pseudo-auditory weight synthesis filter
301: adaptive codebook
303: synthesis filter information decoding unit
304: synthesis filter unit (first synthesis filter)
307: demultiplexing unit
308, 310: gain multiplication units
309: addition unit
311: pseudo-auditory weight synthesis filter setting unit
312: phase search unit
313: pulse train setting unit
314, 315: pseudo-auditory weight synthesis filters (second synthesis filter)
316: phase candidate setting unit
317: optimum phase selection unit
318: pitch periodization unit
319: phase adaptation unit
320: noise codebook
321: gain codebook
330: post filter

Claims

[Claims]

1. A speech encoding method comprising the steps of: (a) generating an adaptive code vector by repeating, at a pitch period, a driving excitation signal output from an adaptive codebook storing past driving excitation signals; (b) searching for a phase shift amount for cutting out a noise vector output from a noise codebook storing a plurality of noise vectors, such that an error between a waveform synthesized from a pulse train repeated at the pitch period and a waveform synthesized from the adaptive code vector becomes smaller; (c) generating a noise code vector by periodizing, at the pitch period, the noise vector cut out in accordance with the phase shift amount; (d) generating a synthesized speech signal by driving, with the adaptive code vector and the noise code vector, a first synthesis filter whose characteristic is determined in accordance with synthesis filter information obtained from a result of analyzing an input speech signal; (e) encoding the synthesis filter information representing the characteristic of the first synthesis filter; and (f) searching for the pitch period and the noise vector for which the distortion of the synthesized speech signal after auditory weighting becomes smaller, wherein the step of searching for the phase shift amount constructs, from the synthesis filter information, a second synthesis filter that synthesizes speech with pseudo auditory weighting, and searches for the phase shift amount such that an error between a synthesized waveform synthesized from the pulse train by the second synthesis filter and a synthesized waveform synthesized from the adaptive code vector by the second synthesis filter becomes smaller.

2. The speech encoding method according to claim 1, wherein an impulse response of said second synthesis filter is truncated at a predetermined finite number of samples.

3. A speech decoding method comprising the steps of: (a) generating an adaptive code vector by repeating, at a pitch period, a driving excitation signal output from an adaptive codebook storing past driving excitation signals; (b) searching for a phase shift amount for cutting out a noise vector output from a noise codebook storing a plurality of noise vectors, such that an error between a waveform synthesized from a pulse train repeated at the pitch period and a waveform synthesized from the adaptive code vector becomes smaller; (c) generating a noise code vector by periodizing, at the pitch period, the noise vector cut out in accordance with the phase shift amount; and (d) generating a reproduced speech signal by driving, with the adaptive code vector and the noise code vector, a first synthesis filter whose characteristic is determined in accordance with synthesis filter information, wherein the step of searching for the phase shift amount constructs, from the synthesis filter information, a second synthesis filter that synthesizes speech with pseudo auditory weighting, and searches for the phase shift amount such that an error between a synthesized waveform synthesized from the pulse train by the second synthesis filter and a synthesized waveform synthesized from the adaptive code vector by the second synthesis filter becomes smaller.

4. The speech decoding method according to claim 3, wherein an impulse response of said second synthesis filter is truncated at a predetermined finite number of samples.