JP3126761B2

JP3126761B2 - Audio encoding / decoding device

Info

Publication number: JP3126761B2
Application number: JP03236112A
Authority: JP
Inventors: 真哉高橋; 邦男中島
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-09-17
Filing date: 1991-09-17
Publication date: 2001-01-22
Anticipated expiration: 2016-01-22
Also published as: JPH0573096A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech encoding/decoding apparatus that vector-quantizes the excitation (sound source) information of speech, for use when speech is digitally transmitted or stored.

[0002]

[Prior Art] FIG. 3 shows a conventional speech encoding/decoding apparatus that vector-quantizes the excitation information of speech. FIG. 3 is similar to the apparatus described in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates" (ICASSP '85, pp. 937-940, 1985). In the figure, 1 is the path on which the input speech signal, converted into a sequence of discrete sample values, is supplied. 2 is frequency spectrum analysis means that receives the input speech signal from path 1 and obtains linear prediction coefficients. 3 is an excitation vector codebook storing a plurality of excitation vectors, and 4 is a changeover switch. 6 is linear prediction synthesis filter F means that receives the excitation vector selected by changeover switch 4 and the linear prediction coefficients from path 5, and synthesizes a synthesized speech vector. 8 is optimal excitation vector selection F means that receives the input speech vectors cut out of the input speech signal from path 1 and the synthesized speech vectors from path 7. 14 is multiplexing means that receives the encoded linear prediction coefficients from path 5 and, from path 11, the index of the optimal excitation vector and the encoded optimal excitation gain. These elements constitute the encoding apparatus.

[0003] 15 is a transmission line over which the multiplexed encoded speech information is transmitted. 16 is separating means that receives the multiplexed encoded speech information from transmission line 15. 19 is excitation signal decoding means that receives the index of the optimal excitation vector and the encoded optimal excitation gain from path 18, and the excitation vector designated by the transmitted code through path 22. 21 is an excitation vector codebook that is paired with excitation vector codebook 3 in the encoding apparatus and stores a plurality of excitation vectors. 24 is linear prediction synthesis filter means that receives the encoded linear prediction coefficients from path 17 and the decoded excitation signal from path 23, and decodes the speech signal. Its output is delivered through path 25 as the decoded speech signal.

[0004] The operation of the conventional speech encoding/decoding apparatus is described below, beginning with the encoding unit of FIG. 3(a). The input speech signal, converted into a time series of discrete sample values, is supplied from path 1 to frequency spectrum analysis means 2. Frequency spectrum analysis means 2 performs linear prediction analysis on the input speech signal for each frame of a fixed N points and obtains linear prediction coefficients. It outputs these coefficients over path 5 to linear prediction synthesis filter F means 6, and also encodes them and outputs them over path 5 to multiplexing means 14. Excitation vector codebook 3 stores M excitation vectors in advance, and each excitation vector is read out in turn to linear prediction synthesis filter F means 6 via changeover switch 4. The dimension length of the excitation vectors is set to a constant value J. Using the linear prediction coefficients supplied from path 5, linear prediction synthesis filter F means 6 applies linear prediction synthesis filtering to each of the M excitation vectors supplied from changeover switch 4, synthesizes M J-dimensional synthesized speech vectors, and outputs them to path 7.
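As an illustration of the synthesis step in paragraph [0004], the following minimal Python sketch passes each of the M codebook vectors through an all-pole filter. The function names, the sign convention of the coefficients, and the zero initial filter state for every vector are assumptions made for this sketch; the specification does not state them.

```python
def lpc_synthesize(excitation, lpc):
    """All-pole synthesis s[n] = e[n] + sum_k a[k] * s[n-k].

    The sign convention and the zero initial state are assumptions
    of this sketch, not details taken from the specification.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def synthesize_codebook(codebook, lpc):
    """Filter every excitation vector, giving one synthesized vector each."""
    return [lpc_synthesize(vec, lpc) for vec in codebook]


# Toy codebook with M = 2 vectors of dimension J = 4 and one coefficient.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
synth = synthesize_codebook(codebook, lpc=[0.5])
```

With these toy values the first synthesized vector is the decaying impulse response of the filter, and the second is the same response delayed by one sample.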

[0005] Optimal excitation vector selection F means 8 divides the input speech signal in the frame equally into L (= N/J) J-dimensional input speech vectors. It first computes the inter-vector distance D between the first input speech vector in the frame and each of the M synthesized speech vectors supplied from path 7, for example as the squared distance of Equation (1).

[0006]

[Equation 1]

$$D = \sum_{i=1}^{J} \left( S_i - S'_i \right)^2 \qquad (1)$$

[0007] Here, $S_i$ is the input speech vector, $S'_i$ is the synthesized speech vector, and J is the fixed dimension length. Next, the synthesized speech vector that minimizes this inter-vector distance is searched for, and the index of the excitation vector from which it was synthesized (hereinafter called the optimal excitation vector) is output to path 11. This selects the optimal excitation vector and completes the vector quantization of the excitation signal corresponding to the input speech vector. At this point the optimal excitation gain corresponding to the optimal excitation vector is also computed, encoded, and output to path 11. Optimal excitation vector selection F means 8 carries out this selection in turn for all L input speech vectors in the frame, from first to last. FIG. 4 shows an example of the J-dimensional input speech vectors obtained by dividing the N-point input speech signal in a frame equally into L vectors (L = 4), together with the optimal excitation vector selected for each input speech vector. Multiplexing means 14 multiplexes the encoded linear prediction coefficients supplied from path 5 with the optimal excitation vector indices and encoded optimal excitation gains supplied from path 11, and outputs the result to transmission line 15.
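The search described in paragraphs [0005] to [0007] can be sketched as follows. This is a minimal illustration, not the patented implementation: the selection uses the plain squared distance, and the gain is computed afterwards as a least-squares projection, which is an assumption because the specification does not say how the optimal excitation gain is obtained. All names are hypothetical.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def search_codebook(target, synth_vectors):
    """Return (index, gain, distance) for the closest synthesized vector.

    The gain formula <target, y> / <y, y> is an assumption of this sketch.
    """
    best = min(range(len(synth_vectors)),
               key=lambda m: squared_distance(target, synth_vectors[m]))
    y = synth_vectors[best]
    energy = sum(v * v for v in y)
    gain = sum(a * b for a, b in zip(target, y)) / energy if energy > 0 else 0.0
    return best, gain, squared_distance(target, y)


# Toy example: the second synthesized vector matches the target exactly.
target = [1.0, 2.0, 0.0, -1.0]
synth = [[0.0, 0.0, 1.0, 1.0], [1.0, 2.0, 0.0, -1.0], [2.0, 0.0, 0.0, 0.0]]
index, gain, dist = search_codebook(target, synth)
```

In the conventional apparatus this search would be repeated for each of the L input speech vectors of the frame.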

[0008] Next, the decoding unit of FIG. 3(b) is described. Separating means 16 separates the input from transmission line 15 into the encoded linear prediction coefficients and the L optimal excitation vector indices and encoded optimal excitation gains. It outputs the linear prediction coefficients to path 17, and the optimal excitation vector indices and optimal excitation gains to path 18. Excitation signal decoding means 19 decodes the encoded optimal excitation vector index supplied from path 18 and reads the excitation vector designated by that index from excitation vector codebook 21. The contents of excitation vector codebook 21 are identical to those of excitation vector codebook 3 in the encoding apparatus. It also decodes the encoded optimal excitation gain and multiplies the excitation vector just read from excitation vector codebook 21 by it to generate a decoded excitation vector. This processing is carried out for all L optimal excitation vector indices of the frame, from first to last, and the decoded excitation vectors for one frame are output to path 23. Linear prediction synthesis filter means 24 decodes the encoded linear prediction coefficients supplied from path 17 and configures a linear prediction synthesis filter with the decoded coefficients. It then applies linear prediction synthesis filtering to the one frame of decoded excitation vectors supplied from path 23, synthesizes the decoded speech signal for the frame, and outputs it from path 25.

[0009]

[Problem to Be Solved by the Invention] Because the conventional speech encoding/decoding apparatus is configured as described above, it divides the input speech signal in a frame equally to obtain the input speech vectors and vector-quantizes the excitation vector corresponding to each. However, the speech signal in a frame contains portions, such as points where the power or the spectral shape changes, whose excitation signal needs to be vector-quantized more accurately by using a shorter vector dimension length than elsewhere in the frame. A method that divides the input speech signal equally, as the conventional apparatus does, cannot adaptively adjust the vector dimension length of these portions.

[0010] The present invention was made to solve this problem. It adaptively adjusts the vector dimension length used when vector-quantizing the excitation signal in a frame, so that the vector quantization accuracy of the portions that need it can be improved.

[0011]

[Means for Solving the Problem] The speech encoding/decoding apparatus according to the present invention comprises an encoding unit and a decoding unit. The encoding unit cuts out the input speech signal in fixed frames, converts it into a sequence of discrete sample values, obtains and transmits linear prediction coefficients, and, based on a plurality of predetermined excitation vectors in an excitation vector codebook, selects, encodes, and transmits the excitation vector sequence that minimizes the distortion between the discrete sample value vectors in the fixed frame and the synthesized speech vectors based on the selected excitation vectors. The decoding unit decodes the transmitted linear prediction coefficients and, based on the transmitted encoded excitation vector sequence, selects and decodes excitation vectors in turn from the corresponding excitation vector codebook. In this configuration, the encoding unit includes optimal vector dimension length determining means that stores a plurality of vector dimension length arrays, each dividing the frame into different lengths, and outputs from among them the array of vector dimension lengths that minimizes the frame distortion. The linear prediction synthesis filter means, which generates synthesized speech vectors from the excitation vectors and the linear prediction coefficients obtained from the discrete sample values, generates the synthesized speech vectors based on the vector dimension lengths of different lengths. The optimal excitation vector selecting means, which selects the excitation vector that minimizes the distortion, performs the distortion calculation a plurality of times for the frame and produces the encoded output based on the vector dimension length array that the optimal vector dimension length determining means finds to give the minimum distortion. The decoding unit includes vector dimension length designation excitation signal decoding means that decodes the excitation vectors in the frame in turn, based on the transmitted variable-length, minimum-distortion vector dimension lengths.

[0012]

[Operation] In the speech encoding/decoding apparatus of the present invention, the optimal vector dimension length determining means takes one dimension length array at a time, for each set of calculations, from a set of vector dimension length arrays prepared in advance, and sends it to the linear prediction synthesis filter means and the optimal excitation vector selecting means. It also receives the distortion results, and the dimension length array that gave the minimum distortion is transmitted. Using each vector dimension length array, the optimal excitation vector selecting means computes and outputs the distortion in turn, and the index of the excitation vector with minimum distortion is transmitted. In the decoding unit, the transmitted excitation index and the code of the dimension length array determine which excitation vector is used and its vector length.

[0013]

[Embodiments]

Embodiment 1. FIG. 1 is a block diagram of one embodiment of the present invention, described below. In FIG. 1, parts identical or corresponding to those in FIG. 3 bear the same reference numerals, and description of identical parts is omitted. In FIG. 1, 30 is optimal vector dimension length determining means. It stores a plurality of vector dimension length arrays and receives the intra-frame distortion from path 12. It holds one dimension length array for the duration of one set of calculations, which corresponds to one conventional optimal excitation vector selection, and moves to the next array when that set is finished. It also receives the distortion results from the optimal excitation vector selecting means described later and outputs the vector dimension length array that gave the minimum distortion. 36 is linear prediction synthesis filter means that receives the excitation vector selected by changeover switch 4, the linear prediction coefficients from frequency spectrum analysis means 2, and the vector dimension lengths from path 9. It differs from the conventional linear prediction synthesis filter means 6 in that the vector dimension length is variable.

[0014] 38 is optimal excitation vector selecting means that receives the input speech signal from path 1, the synthesized speech signal from path 7, and the vector dimension length array from path 9, computes the distortion between them, stores the results, and outputs them. 14 and 16 are the multiplexing means and the separating means, respectively; they also multiplex and separate the code that identifies the vector dimension length array. 39 is vector length designation excitation signal decoding means that receives the index of the optimal excitation vector and the encoded optimal excitation gain from path 18, the encoded optimal vector dimension length array from path 20, and the excitation vector from path 22, and uses them to decode the excitation vectors in the frame in turn.

[0015] The operation of this embodiment is described below with reference to FIG. 1, beginning with the encoding unit of FIG. 1(a). Optimal vector dimension length determining means 30 prepares K sets of vector dimension length arrays for the frame in advance, as shown in FIG. 2 (FIG. 2 is an example in which a frame of length N is divided in four different proportions, giving K = 4 sets of vector dimension length arrays). It first outputs the vector dimension length array of set 1 through path 9 to linear prediction synthesis filter means 36 and optimal excitation vector selecting means 38. Following the order of the set 1 array supplied from path 9, linear prediction synthesis filter means 36 synthesizes M synthesized speech vectors from the excitation vectors in excitation vector codebook 3, starting from the beginning of the frame (set 1 in FIG. 2 happens to be the conventional equal division with equal dimensions, whereas in set 2 the first and second segments are 3N/16 long and the third and fourth segments are 5N/16 long). It outputs them over path 7 to optimal excitation vector selecting means 38. The dimension length of each excitation vector in excitation vector codebook 3 is set equal to the largest vector dimension length that optimal vector dimension length determining means 30 can specify, and each vector is cut out, for example starting from its first dimension, to the dimension length specified (for the vector dimension lengths of FIG. 2, this maximum is 5N/16).
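The dimension length arrays of paragraph [0015] can be pictured with the short Python sketch below. Only set 1 and set 2 are written out, because the text gives only their proportions; the frame length N = 64 and all names are hypothetical values chosen for illustration.

```python
N = 64  # hypothetical frame length, chosen so that N/16 is an integer

# Vector dimension length arrays described in the text for FIG. 2.
DIMENSION_SETS = [
    [N // 4] * 4,                                          # set 1: equal division
    [3 * N // 16, 3 * N // 16, 5 * N // 16, 5 * N // 16],  # set 2
]


def segment_bounds(lengths):
    """Turn a dimension length array into (start, end) sample ranges."""
    bounds, start = [], 0
    for length in lengths:
        bounds.append((start, start + length))
        start += length
    return bounds


def truncate(codebook_vector, length):
    """Cut a codebook vector down to the specified length, from its first dimension."""
    return codebook_vector[:length]


# Codebook vectors are stored at the largest length any set can ask for.
max_length = max(max(lengths) for lengths in DIMENSION_SETS)
```

Storing every codebook vector at `max_length` and truncating on demand lets one codebook serve all sets, which matches the cut-out rule stated above.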

[0016] Optimal excitation vector selecting means 38 cuts input speech vectors out of the input speech signal in the frame in the order specified by the set 1 vector dimension length array supplied from path 9, and computes the squared distance D between each cut-out input speech vector and the M synthesized speech vectors supplied from path 7 according to Equation (2).

[0017]

[Equation 2]

[0018] Here, $S_i$ is the input speech vector, $S'_i$ is the synthesized speech vector, and $J'$ is the variable dimension length. During this calculation, the index of the excitation vector (the optimal excitation vector) that synthesizes the synthesized speech vector at minimum distance, the optimal excitation gain, and the minimum distance itself are stored. Linear prediction synthesis filter means 36 and optimal excitation vector selecting means 38 carry out this processing in turn from the beginning to the end of the frame, following each vector dimension length in the set 1 array. Optimal excitation vector selecting means 38 sums the minimum distance values obtained and stored for each vector, and takes the sum as the intra-frame distortion for the set 1 vector dimension length array. This is what Equation (2) expresses. It then outputs the intra-frame distortion through path 12 to optimal vector dimension length determining means 30.
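One way to write the intra-frame distortion described in paragraph [0018] is given below. The segment index $l$, the codebook index $m$, the set index $k$, and the symbol $D_{\mathrm{frame}}^{(k)}$ are notation introduced here for illustration and are not taken from the specification; the expression assumes that the per-segment distance is the squared distance over the variable length $J'_l$ and that the frame value is the sum of the per-segment minima.

```latex
D_{\mathrm{frame}}^{(k)}
  = \sum_{l=1}^{L} \; \min_{1 \le m \le M} \;
    \sum_{i=1}^{J'_l} \left( S_{l,i} - S'_{l,m,i} \right)^2 ,
\qquad \sum_{l=1}^{L} J'_l = N
```

Under this reading, set $k$ contributes one scalar per frame, and the K scalars are what the optimal vector dimension length determining means compares.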

[0019] Optimal vector dimension length determining means 30 then outputs the vector dimension length arrays of set 2 through set K to path 9 in turn. Linear prediction synthesis filter means 36 and optimal excitation vector selecting means 38 carry out the same processing for the arrays of set 2 through set K. After that, optimal vector dimension length determining means 30 selects the minimum of the K intra-frame distortions supplied from path 12, one per vector dimension length set, and outputs the index or code of the corresponding vector dimension length array from path 13. Optimal excitation vector selecting means 38 encodes the optimal excitation vector indices and optimal excitation gains that it obtained for the array selected by optimal vector dimension length determining means 30, and outputs them from path 11.
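The loop over the K sets in paragraphs [0015] to [0019] can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the excitation gain is omitted, the filter state is reset to zero at each segment, and all function names and toy values are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis with zero initial state (a simplifying assumption)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def encode_with_set(frame, lengths, codebook, lpc):
    """Search the codebook segment by segment; return (indices, frame distortion)."""
    indices, total, start = [], 0.0, 0
    for length in lengths:
        target = frame[start:start + length]
        dists = [squared_distance(target, synthesize(vec[:length], lpc))
                 for vec in codebook]
        best = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(best)
        total += dists[best]
        start += length
    return indices, total


def choose_dimension_set(frame, dimension_sets, codebook, lpc):
    """Try every dimension length array and keep the one with least distortion."""
    results = [encode_with_set(frame, lengths, codebook, lpc)
               for lengths in dimension_sets]
    k = min(range(len(results)), key=lambda i: results[i][1])
    return k, results[k][0], results[k][1]


# Toy frame of N = 8 samples that splits naturally into 3 + 5 samples.
codebook = [[1.0, 0.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0, 0.0],
            [1.0, 1.0, 0.0, 0.0, 0.0]]
frame = [1.0, 0.5, 0.25, 0.0, 1.0, 0.5, 0.25, 0.125]
set_index, indices, total = choose_dimension_set(
    frame, [[4, 4], [3, 5]], codebook, lpc=[0.5])
```

In this toy case the unequal split reproduces the frame exactly, so it is preferred over the equal split; the cost is that the codebook search runs once per set, which is the increase in calculations the specification mentions.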

[0020] Although this method increases the number of calculations, it determines the vector dimension lengths for vector-quantizing the excitation signal so that the distortion between the input speech signal and the synthesized speech signal over the whole frame is minimized. Next, the operation of the decoding unit of FIG. 1(b) is described. Separating means 16 outputs the index of the vector dimension length set over path 20 to vector length designation excitation signal decoding means 39. Vector length designation excitation signal decoding means 39 decodes each vector dimension length in the frame from the index of the vector dimension length array. It then decodes each excitation vector from the beginning to the end of the frame according to these vector dimension lengths; when a specified dimension length is shorter than the dimension length of the codebook vectors, the vector is cut off at the specified dimension. The decoded output is supplied through path 23, and the encoded linear prediction coefficients through path 17, to linear prediction synthesis filter means 24, which synthesizes the speech and outputs it to path 25.
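The decoder side of paragraph [0020] can be illustrated with the sketch below, which rebuilds one frame of excitation from a set index, the per-segment codebook indices, and already decoded gains. The function name and toy values are hypothetical, and the subsequent synthesis filtering is left out.

```python
def decode_frame_excitation(set_index, indices, gains, dimension_sets, codebook):
    """Rebuild the excitation of one frame from the transmitted parameters.

    Each codebook vector is truncated to the length that the selected
    dimension length array assigns to its segment, then scaled by its gain.
    """
    lengths = dimension_sets[set_index]
    frame = []
    for length, idx, gain in zip(lengths, indices, gains):
        frame.extend(gain * v for v in codebook[idx][:length])
    return frame


# Same table of arrays on both sides, so only the set index is transmitted.
dimension_sets = [[4, 4], [3, 5]]
codebook = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]]
excitation = decode_frame_excitation(1, [0, 1], [2.0, 0.5], dimension_sets, codebook)
```

The decoder needs no distortion calculation: the set index alone fixes where each segment starts and how long it is.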

[0021] Embodiment 2. In Embodiment 1 above, the optimal vector dimension length determining means selects the vector dimension length set so that the distortion between the synthesized speech vectors synthesized from the excitation vectors and the input speech vectors is minimized over the whole frame. Alternatively, the partial power of the input speech signal within the frame may be obtained, for example, and the vector dimension length set selected on the criterion of assigning short vector dimension lengths to portions with high power or large power changes. That is, the frame is divided into three equal parts, the powers P₁, P₂, and P₃ of the parts are obtained, and the set of vector dimension length arrays in FIG. 2 is selected under the following conditions, where a is a constant.

P₁ > P₂ + a and P₁ > P₃ + a → set 2   (3)
P₂ > P₁ + a and P₂ > P₃ + a → set 3   (4)
P₃ > P₁ + a and P₃ > P₂ + a → set 4   (5)
In cases other than (3) to (5) → set 1   (6)
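Conditions (3) to (6) of Embodiment 2 translate directly into code. In the sketch below, the power of a part is taken to be its mean squared sample value, which is an assumption since the specification does not define the power measure; the function names and the toy frame are also hypothetical.

```python
def segment_powers(frame, parts=3):
    """Mean squared value of each equal part; leftover trailing samples are ignored."""
    n = len(frame) // parts
    return [sum(x * x for x in frame[i * n:(i + 1) * n]) / n for i in range(parts)]


def select_set_by_power(p1, p2, p3, a):
    """Apply conditions (3) to (6) and return the set number (1 to 4)."""
    if p1 > p2 + a and p1 > p3 + a:
        return 2
    if p2 > p1 + a and p2 > p3 + a:
        return 3
    if p3 > p1 + a and p3 > p2 + a:
        return 4
    return 1


# Toy frame whose first third is much louder than the rest.
frame = [3.0] * 4 + [1.0] * 4 + [1.0] * 4
powers = segment_powers(frame)
chosen = select_set_by_power(*powers, a=2.0)
```

Unlike Embodiment 1, this rule needs no codebook search per set, so the set is fixed before the excitation search starts.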

[0022] Embodiment 3. In Embodiment 1 above, the optimal vector dimension length determining means selects the vector dimension length set so that the distortion between the synthesized speech vectors synthesized from the excitation vectors and the input speech vectors is minimized over the whole frame. Alternatively, the time variation of the spectral shape of the input speech signal within the frame may be obtained, for example, and the vector dimension length set selected on the criterion of assigning short vector dimension lengths to portions where the variation is large. That is, the spectrum at each sample point in the frame is first obtained and used to compute the spectral distance between two samples. Next, the frame is divided into three parts, the sums of the spectral distance values in the parts are obtained as D₁, D₂, and D₃, and the set of vector dimension length arrays in FIG. 2 is selected under the following conditions, where b is a constant.

D₁ > D₂ + b and D₁ > D₃ + b → set 2   (7)
D₂ > D₁ + b and D₂ > D₃ + b → set 3   (8)
D₃ > D₁ + b and D₃ > D₂ + b → set 4   (9)
In cases other than (7) to (9) → set 1   (10)
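For Embodiment 3, the sketch below shows one possible way to obtain D₁, D₂, and D₃ and apply conditions (7) to (10). The specification does not say how a spectrum is represented or which distance is used, so the feature vectors and the squared Euclidean distance between consecutive features are assumptions, as are all names.

```python
def spectral_change_sums(features, parts=3):
    """Sum the distances between consecutive spectral features in each part."""
    dists = [sum((a - b) ** 2 for a, b in zip(f0, f1))
             for f0, f1 in zip(features, features[1:])]
    n = len(dists) // parts
    return [sum(dists[i * n:(i + 1) * n]) for i in range(parts)]


def select_set_by_margin(d, b):
    """Return part index + 2 when one part exceeds both others by more than b, else 1."""
    for i, di in enumerate(d):
        if all(di > dj + b for j, dj in enumerate(d) if j != i):
            return i + 2
    return 1


# Toy one-dimensional features: the spectrum changes only near the frame end.
features = [[0.0], [0.0], [0.0], [0.0], [0.0], [2.0], [0.0]]
sums = spectral_change_sums(features)
chosen = select_set_by_margin(sums, b=1.0)
```

The margin test is written as a loop over the three parts, but it checks the same pairs of inequalities as conditions (7) to (9), in the same order.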

[0023]

[Effects of the Invention] As described above, according to the present invention, a plurality of vector dimension length arrays are prepared for vector-quantizing the excitation signal, and the apparatus is provided with optimal vector dimension length determining means that selects the array of dimension lengths giving the smaller distortion between the input speech signal and the synthesized speech signal in the frame, together with linear prediction synthesis filter means, optimal excitation vector selecting means, and vector length designation excitation signal decoding means. The vector dimension length is therefore adjusted adaptively, the accuracy of vector quantization can be raised selectively at portions such as points where the power or spectral shape of the input speech signal changes, and a decoded speech signal with little distortion relative to the input speech can be synthesized.

[Brief description of the drawings]

[FIG. 1] A block diagram showing Embodiment 1 of the speech encoding/decoding apparatus of the present invention.

[FIG. 2] An explanatory diagram showing an example of the vector dimension length arrays of the present invention.

[FIG. 3] A block diagram showing a conventional speech encoding/decoding apparatus.

[FIG. 4] An explanatory diagram of the input speech signal vectors and excitation vectors in a conventional speech encoding/decoding apparatus.

[Explanation of symbols]

2: linear prediction analysis means
3: excitation vector codebook
5: path
9: path
12: path
13: path
14: multiplexing means
16: separating means
20: path
21: excitation vector codebook
24: linear prediction synthesis filter means
30: optimal vector dimension length determining means
36: linear prediction synthesis filter means
38: optimal excitation vector selecting means
39: vector length designation excitation signal decoding means

Continued from the front page

(58) Fields searched (Int. Cl.⁷, DB name): G10L 19/08, G10L 19/00

Claims

(57) [Claims]

1. A speech encoding/decoding apparatus comprising: an encoding unit that cuts out an input speech signal in fixed frames, converts it into a sequence of discrete sample values, obtains and transmits linear prediction coefficients, and, based on a plurality of predetermined excitation vectors in an excitation vector codebook, selects, encodes, and transmits the excitation vector sequence that minimizes the distortion between the discrete sample value vectors in the fixed frame and synthesized speech vectors based on the selected excitation vectors; and a decoding unit that decodes the transmitted linear prediction coefficients and, based on the transmitted encoded excitation vector sequence, selects and decodes excitation vectors in turn from a corresponding excitation vector codebook, wherein the encoding unit includes optimal vector dimension length determining means that stores a plurality of vector dimension length arrays, each dividing the frame into different lengths, and outputs from among the plurality of dimension length arrays the array of vector dimension lengths that minimizes the distortion of the frame; linear prediction synthesis filter means, which generates synthesized speech vectors from the excitation vectors and the linear prediction coefficients obtained from the discrete sample values, generates the synthesized speech vectors based on the vector dimension lengths of different lengths; optimal excitation vector selecting means, which selects the excitation vector that minimizes the distortion, performs the distortion calculation a plurality of times for the frame and produces the encoded output based on the vector dimension length array that the optimal vector dimension length determining means finds to give the minimum distortion; and the decoding unit includes vector dimension length designation excitation signal decoding means that decodes the excitation vectors in the frame in turn, based on the transmitted variable-length, minimum-distortion vector dimension lengths.