JPH0580798A

JPH0580798A - Speech encoding and decoding device and sound source generating method

Info

Publication number: JPH0580798A
Application number: JP3245666A
Authority: JP
Inventors: Katsushi Seza; 勝志瀬座; Hirohisa Tazaki; 裕久田崎; Kunio Nakajima; 邦男中島
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-09-25
Filing date: 1991-09-25
Publication date: 1993-04-02
Anticipated expiration: 2017-02-12
Also published as: JP3254696B2

Abstract

PURPOSE: To encode input speech stably and with a small amount of computation using spectrum parameters and sound source parameters, and to make frame-synchronous processing possible. CONSTITUTION: An AR preselection means 6 outputs preselected AR codewords 8, and an MA preselection means 16 outputs preselected MA codewords 18 chosen from an MA codebook 17 according to their distance to the MA 15. A sound source preselection means 9 outputs preselected sound source codewords 11 chosen from a sound source codebook 10 according to the sound source codeword sequence, and a sound source position detecting means 2 detects the sound source positions 3 within a frame. A sound source generating means 12 generates, from the preselected sound source codewords 11, a sound source 13 synchronized with the sound source positions 3, and a synthesizing means 19 synthesizes synthesized speech 20 from the preselected AR codewords 8, the preselected MA codewords 18, and the sound source 13. An optimum codeword selecting means selects, from among the preselected AR codewords 8, preselected MA codewords 18, and preselected sound source codewords 11, the combination of AR codeword, MA codeword, and sound source codeword that minimizes the distance between the synthesized speech 20 and the input speech 1.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech encoding and decoding device used for the digital transmission or storage of speech.

[0002]

[Prior Art] A conventional speech encoding and decoding device that uses a sound source signal of one pitch period (hereinafter abbreviated as "sound source") is described, for example, in Mats Ljungqvist and Hiroya Fujisaki, "Estimation of ARMA parameters of speech using a model of the vocal cord sound source waveform," IEICE Technical Report SP86-49, pp. 39-45, 1986. This conventional device uses AR parameters (hereinafter "AR") and MA parameters (hereinafter "MA") as spectrum parameters, and uses as the sound source a source wave model defined on the differentiated waveform of the glottal source wave.

[0003] FIG. 6 is a block diagram showing the configuration of this conventional speech encoding and decoding device; FIG. 6(a) shows the analysis section and FIG. 6(b) shows the synthesis section. The analysis section shown in FIG. 6(a) is described first. The ARMA analysis means 44 obtains AR 45 and MA 46 from one pitch period of input speech 1 and the sound source 13 generated by the sound source generating means 12, and outputs them to the synthesizing means 19. The synthesizing means 19 generates one pitch period of synthesized speech 20 from the sound source 13, AR 45, and MA 46. The distance calculating means 47 calculates the distance E1 between this synthesized speech 20 and the input speech 1.

[0004] If the distance E1 is less than the threshold E0, the sound source parameters 48, AR 49, and MA 50 are output. If the distance E1 is equal to or greater than the threshold E0, a small perturbation is applied to one of the sound source parameters, and the result is output to the sound source generating means 12 as the sound source parameters 48. The sound source generating means 12 generates the sound source 13 from the sound source parameters 48 and outputs it to the ARMA analysis means 44. This operation is repeated, with the perturbation applied to the sound source parameters made progressively smaller, until the distance E1 falls below the threshold E0.

[0005] Next, the synthesis section shown in FIG. 6(b) is described. The sound source generating means 40 generates a sound source 41 from the sound source parameters 48. The synthesizing means 42 generates synthesized speech 43 using the sound source 41, AR 49, and MA 50.

[0006] FIG. 7 is an explanatory diagram of the source wave model used in the conventional speech encoding and decoding device described above; the horizontal axis is time and the vertical axis is amplitude. This source wave model g(n) represents the differentiated glottal source wave. Its sound source parameters are the variables A, B, C, D, R, F, and W and the pitch period T, and it is defined by Equation (1), where n is time. The variables α and β in Equation (1) are calculated from the sound source parameters by Equation (2).

[0007]

[Equation 1]

[0008]

[Equation 2]

[0009]

[Problems to Be Solved by the Invention] Because the conventional speech encoding and decoding device is configured as described above and solves for the spectrum parameters and the sound source parameters by analysis-by-synthesis (A-b-S) for each parameter separately, it requires a large amount of computation, and the parameters obtained can fall into unstable solutions. In addition, because the processing is synchronized to the pitch period, it is difficult to achieve a fixed bit rate and a low bit rate when encoding the sound source parameters.

[0010] Furthermore, because the conventional source wave model has many parameters, solving for them requires a large amount of computation.

[0011] The present invention was made to solve these problems. Its objects are to reduce the amount of computation needed to solve for the spectrum parameters and the sound source parameters, to stabilize the parameter solution and thereby generate high-quality synthesized speech, and to achieve a fixed, low bit rate by performing frame-synchronous processing.

[0012]

[Means for Solving the Problems] In the speech encoding and decoding device according to the present invention, the encoding section comprises: spectrum analysis means for analyzing input speech and extracting spectrum parameters; a spectrum codebook storing a plurality of sets of spectrum parameters as spectrum codewords; spectrum preselection means for selecting from the spectrum codebook a finite number L of preselected spectrum codewords that are close in distance to the spectrum parameters output by the spectrum analysis means; sound source position detecting means for detecting, from the input speech, the start positions of all one-pitch-period sound sources present in the frame and outputting them as sound source positions; a sound source codebook in which the parameters of the source wave model used as the one-pitch-period sound source are taken as sound source parameters and a plurality of sets of these sound source parameters are stored as sound source codewords; sound source preselection means for selecting from the sound source codebook a finite number M of preselected sound source codewords that are close, in terms of distance between sound source parameters, to the sound source codeword selected in the past; sound source generating means for generating, using the preselected sound source codewords, a sound source synchronized with the sound source positions; synthesizing means for generating synthesized speech from the preselected spectrum codewords and the sound source; and optimum codeword selecting means for selecting, from among the preselected spectrum codewords and the preselected sound source codewords, the combination of spectrum codeword and sound source codeword that minimizes the distance between the synthesized speech output by the synthesizing means and the input speech.

[0013] The decoding section comprises: the same spectrum codebook and sound source codebook as the encoding section; spectrum dequantization means for outputting from the spectrum codebook the spectrum codeword corresponding to an input spectrum codeword number; sound source dequantization means for outputting from the sound source codebook the sound source codeword corresponding to an input sound source codeword number; the same sound source generating means as the encoding section, which generates a sound source signal from the sound source codeword; and the same synthesizing means as the encoding section, which generates synthesized speech from the sound source and the spectrum codeword.

[0014] The sound source generation method according to the present invention uses the variables A, B, C, L_1, and L_2 and the pitch period T to generate a one-pitch-period sound source signal consisting of the waveform g(n) given below, where n is time.

$$
g(n) =
\begin{cases}
An - Bn^{2} & (0 \le n \le L_1) \\
C\,(n - L_2)^{2} & (L_1 < n \le L_2) \\
0 & (L_2 < n \le T)
\end{cases}
$$
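The three-segment waveform g(n) of paragraph [0014] can be sketched in Python as follows. This is only an illustration: the function name `source_pulse` and the sample parameter values are assumptions, and the variables A, B, C, L_1, and L_2 are passed in directly rather than derived from other sound source parameters.

```python
def source_pulse(A, B, C, L1, L2, T):
    """Return the samples g(0), ..., g(T) of one pitch period.

    Segment 1 (0 <= n <= L1):  g(n) = A*n - B*n^2
    Segment 2 (L1 < n <= L2):  g(n) = C*(n - L2)^2
    Segment 3 (L2 < n <= T):   g(n) = 0
    """
    if not (0 <= L1 <= L2 <= T):
        raise ValueError("expected 0 <= L1 <= L2 <= T")
    g = []
    for n in range(T + 1):
        if n <= L1:
            g.append(A * n - B * n * n)
        elif n <= L2:
            g.append(C * (n - L2) ** 2)
        else:
            g.append(0.0)
    return g


# Illustrative values only (not taken from the patent). C is chosen here so
# that the two curved segments meet at n = L1: C*(L1 - L2)^2 = A*L1 - B*L1^2.
g = source_pulse(A=2.0, B=0.1, C=-0.3, L1=30, L2=40, T=80)
```

With these values the first segment crosses zero at n = A/B = 20 and reaches -30 at n = L_1, after which the second segment returns quadratically to zero at n = L_2.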

[0015]

[Operation] In the present invention, the spectrum preselection means preselects from the spectrum codebook a finite number L of spectrum codewords whose distance from the spectrum parameters obtained by the spectrum analysis means is small; the sound source preselection means preselects from the sound source codebook a finite number M of sound source codewords that are close, in terms of distance between sound source parameters, to the sound source codeword selected in the past; and the optimum codeword selecting means selects, from among the preselected spectrum codewords and preselected sound source codewords, the combination of spectrum codeword and sound source codeword that minimizes the distance between the synthesized speech and the input speech, and outputs their respective numbers. Encoding is thereby performed stably and with a small amount of computation. In the decoding section, decoding is performed properly from the selected spectrum codeword number and sound source codeword number. Furthermore, the sound source generation method according to the present invention generates a good one-pitch-period sound source signal with few parameters.

[0016]

[Example]

Example 1. FIG. 1 is a block diagram of the encoding section of a speech encoding and decoding device according to one example of the present invention, and FIG. 2 is a block diagram of the decoding section. The operation is described below. In FIGS. 1 and 2, parts identical to those in FIG. 6 are given the same reference numerals. The encoding section of FIG. 1 is described first.

[0017] The AR analysis means 4 performs AR analysis on the input speech 1 and outputs AR 5. The AR preselection means 6 uses, for example, the squared distance as its distance measure, selects from the AR codebook 7 a finite number L of AR codewords whose parameters are close in distance to AR 5, and outputs them as the preselected AR codewords 8.
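The preselection step of paragraph [0017] amounts to a nearest-neighbor search that keeps the L closest codewords. A minimal sketch, assuming codewords are plain lists of numbers and using the hypothetical name `preselect`:

```python
def preselect(target, codebook, count):
    """Return the indices of the `count` codewords closest to `target`,
    measured by squared Euclidean distance, nearest first."""
    def sq_dist(codeword):
        return sum((t - c) ** 2 for t, c in zip(target, codeword))

    order = sorted(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return order[:count]


# Toy 2-dimensional codebook (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = preselect([1.0, 1.0], codebook, count=2)  # indices 1 and 3
```

Only the surviving candidates are passed on to the more expensive synthesis-based comparison, which is where the reduction in computation comes from.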

[0018] The sound source position detecting means 2 detects, for example, the peak position in each pitch period of the LPC residual signal of the input speech 1, and outputs these positions as the sound source positions 3.
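One simple way to pick a peak per pitch period from a residual signal is sketched below. The patent does not specify the search procedure, so everything here is an assumption: the residual and a rough pitch period (in samples) are taken as given, the search starts from the strongest peak in the frame, and the `slack` window is a hypothetical tuning parameter.

```python
def source_positions(residual, pitch, slack=0.2):
    """Pick one peak per pitch period from an LPC residual.

    residual: list of residual samples for one frame
    pitch:    approximate pitch period in samples (>= 2)
    slack:    half-width of the search window as a fraction of `pitch`
    """
    if pitch < 2:
        raise ValueError("pitch must be at least 2 samples")
    n = len(residual)
    mag = [abs(x) for x in residual]
    # Start from the strongest peak in the frame.
    anchor = max(range(n), key=mag.__getitem__)
    half = max(1, int(pitch * slack))
    positions = [anchor]
    # Walk backwards, then forwards, one pitch period at a time.
    for step in (-pitch, pitch):
        centre = anchor + step
        while 0 <= centre < n:
            lo, hi = max(0, centre - half), min(n, centre + half + 1)
            peak = max(range(lo, hi), key=mag.__getitem__)
            positions.append(peak)
            centre = peak + step
    return sorted(positions)


# Synthetic residual with pulses near multiples of a 30-sample period.
residual = [0.0] * 100
residual[10], residual[40], residual[71] = 1.0, -0.8, 0.9
positions = source_positions(residual, pitch=30)  # [10, 40, 71]
```

Re-centering each search on the previous peak lets the positions follow small period-to-period variations in pitch.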

[0019] The sound source preselection means 9 uses, for example, a weighted squared distance between sound source parameters as its distance measure, selects from the sound source codebook 10 a finite number M of sound source codewords whose distance from the sound source codeword selected in the previous frame is small, and outputs them as the preselected sound source codewords 11. The sound source generating means 12 uses the preselected sound source codewords 11 to generate a sound source synchronized with the sound source positions 3, and outputs it as the sound source 13.
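The two operations in paragraph [0019] can be illustrated with the sketch below. The weights, the function names `preselect_weighted` and `place_pulses`, and the rule that a pulse is cut off where the next one starts are all assumptions made for the example; the patent only states that a weighted squared distance is used and that the sound source is synchronized with the detected positions.

```python
def preselect_weighted(previous, codebook, weights, count):
    """Indices of the `count` codewords closest to the previously selected
    codeword under a weighted squared distance, nearest first."""
    def dist(codeword):
        return sum(w * (p - c) ** 2
                   for w, p, c in zip(weights, previous, codeword))

    order = sorted(range(len(codebook)), key=lambda i: dist(codebook[i]))
    return order[:count]


def place_pulses(pulse, positions, frame_len):
    """Build a frame-long sound source by starting one copy of `pulse`
    at each detected sound source position."""
    frame = [0.0] * frame_len
    for k, start in enumerate(positions):
        # Assumed rule: a pulse ends where the next one starts.
        stop = positions[k + 1] if k + 1 < len(positions) else frame_len
        for i, value in enumerate(pulse):
            n = start + i
            if n >= stop:
                break
            frame[n] = value
    return frame


# Codewords here hold (pitch period, open quotient); values are illustrative.
book = [[100, 0.9], [110, 0.5], [101, 0.52]]
cands = preselect_weighted([100, 0.5], book, weights=[0.01, 100.0], count=2)
frame = place_pulses([1.0, -2.0, 0.5], positions=[0, 2, 6], frame_len=8)
```

The weights let parameters with very different numeric ranges, such as a period in samples and a ratio between 0 and 1, contribute comparably to the distance.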

[0020] The MA calculating means 14 calculates MA 15 using the preselected AR codewords 8 and the sound source 13. The MA preselection means 16 uses, for example, the squared distance between parameters as its distance measure, selects from the MA codebook 17 a finite number N of MA codewords that are close in distance to MA 15, and outputs them as the preselected MA codewords 18.

[0021] The synthesizing means 19 generates synthesized speech 20 from the preselected AR codewords 8, the preselected MA codewords 18, and the sound source 13. The optimum codeword selecting means 21 selects the combination of AR codeword, MA codeword, and sound source codeword for which the distance between the input speech 1 and the synthesized speech 20 is smallest, and outputs the AR codeword number 22, MA codeword number 23, and sound source codeword number 24 of that combination.
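The joint selection in paragraph [0021] is an exhaustive search over the preselected candidates. The sketch below makes several simplifying assumptions that are not in the patent: the synthesizer is a direct-form ARMA (pole-zero) filter with the sign convention shown in the docstring, the squared error is used as the distance, the sound source candidates are already generated frames, and the three candidate lists are treated as independent (in the device, the MA candidates are derived per AR codeword and sound source).

```python
from itertools import product


def arma_filter(excitation, ar, ma):
    """y[n] = x[n] + sum_j ma[j]*x[n-1-j] - sum_i ar[i]*y[n-1-i]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for j, b in enumerate(ma):
            if n - 1 - j >= 0:
                acc += b * excitation[n - 1 - j]
        for i, a in enumerate(ar):
            if n - 1 - i >= 0:
                acc -= a * y[n - 1 - i]
        y.append(acc)
    return y


def best_combination(speech, ar_cands, ma_cands, src_cands):
    """Return (error, ar_index, ma_index, source_index) of the combination
    whose synthesized speech is closest to `speech` in squared error."""
    best = None
    for (ia, ar), (im, ma), (isrc, src) in product(
            enumerate(ar_cands), enumerate(ma_cands), enumerate(src_cands)):
        synth = arma_filter(src, ar, ma)
        err = sum((s - y) ** 2 for s, y in zip(speech, synth))
        if best is None or err < best[0]:
            best = (err, ia, im, isrc)
    return best


# Toy candidates (illustrative values only).
ar_cands = [[-0.5], [0.3]]
ma_cands = [[0.0], [0.4]]
src_cands = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, -1.0, 0.0]]
speech = arma_filter(src_cands[1], ar_cands[0], ma_cands[1])
err, ia, im, isrc = best_combination(speech, ar_cands, ma_cands, src_cands)
```

The cost of this loop is L x N x M syntheses per frame, which is why the preselection stages keep L, N, and M small.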

[0022] FIG. 3 illustrates an example of the operation of the optimum codeword selecting means. First, the combination of AR codeword, MA codeword, and sound source codeword that minimizes the distance E1 between the input speech (solid line) and the synthesized speech (broken line) over distance calculation range a, which also includes several pitch periods before and after, is selected; if the distance E1 is equal to or less than a predetermined threshold E0, this combination is adopted.

[0023] If the distance E1 exceeds the predetermined threshold E0, a span of several pitch periods in which the power of the input speech is high is taken as distance calculation range b (b < a), and the combination of AR codeword, MA codeword, and sound source codeword that minimizes the distance between the input speech and the synthesized speech over this range is selected.
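The two-stage rule of paragraphs [0022] and [0023] can be sketched as follows. For brevity the candidates are assumed to be already synthesized frames, ranges are (start, stop) sample index pairs, and range b is found with a hypothetical helper `loudest_window` that simply takes the fixed-length window of highest energy; the patent does not say how the high-power span is located.

```python
def loudest_window(speech, length):
    """(start, stop) of the `length`-sample window with the highest energy."""
    best_start, best_power = 0, -1.0
    for start in range(len(speech) - length + 1):
        power = sum(x * x for x in speech[start:start + length])
        if power > best_power:
            best_start, best_power = start, power
    return best_start, best_start + length


def select_with_fallback(speech, candidates, range_a, range_b, threshold):
    """Search over range a; if the best error exceeds `threshold` (E0),
    repeat the search over the narrower range b.
    Returns (candidate_index, error, range_label)."""
    def search(rng):
        lo, hi = rng
        errs = [sum((speech[n] - c[n]) ** 2 for n in range(lo, hi))
                for c in candidates]
        idx = min(range(len(errs)), key=errs.__getitem__)
        return idx, errs[idx]

    idx, err = search(range_a)
    if err <= threshold:
        return idx, err, "a"
    idx, err = search(range_b)
    return idx, err, "b"


speech = [0.1, 0.1, 1.0, -1.0, 0.1, 0.1]
candidates = [[0.1, 0.1, 0.6, -0.6, 0.1, 0.1],
              [0.5, 0.5, 1.0, -1.0, 0.5, 0.5]]
range_b = loudest_window(speech, 2)  # (2, 4)
choice = select_with_fallback(speech, candidates, (0, 6), range_b, 0.2)
```

In this toy case the first candidate wins over the full range but misses the threshold, so the search falls back to the high-power span, where the second candidate matches exactly.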

[0024] The AR codebook 7, the sound source codebook 10, and the MA codebook 17 are created by solving for the AR parameters, sound source parameters, and MA parameters of a large amount of training speech by per-parameter A-b-S until stable solutions are reached, and then clustering each set of parameters with, for example, the LBG algorithm.
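For readers unfamiliar with the LBG algorithm mentioned in paragraph [0024], a minimal sketch is given below. It is a generic textbook-style version (centroid splitting followed by nearest-neighbor reassignment), not the training procedure of the patent; the additive split offset `eps`, the fixed iteration count, and the function name `lbg` are assumptions, and the codebook size is assumed to be a power of two.

```python
def lbg(vectors, size, eps=0.01, iters=20):
    """Train a codebook of `size` codewords (a power of two) from a list
    of equal-length training vectors."""
    dim = len(vectors[0])

    def mean(vs):
        return [sum(v[d] for v in vs) / len(vs) for d in range(dim)]

    def nearest(v, book):
        return min(range(len(book)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(v, book[k])))

    book = [mean(vectors)]  # start from the global centroid
    while len(book) < size:
        # Split every codeword into a slightly perturbed pair.
        book = [[c + s * eps for c in cw] for cw in book for s in (1, -1)]
        for _ in range(iters):
            # Assign each training vector to its nearest codeword ...
            cells = [[] for _ in book]
            for v in vectors:
                cells[nearest(v, book)].append(v)
            # ... and move each codeword to the centroid of its cell.
            book = [mean(cell) if cell else cw
                    for cell, cw in zip(cells, book)]
    return book


training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = lbg(training, size=2)
```

Because every codeword is a centroid of parameter sets that were themselves stable solutions, the encoder only ever chooses among parameter values of that kind.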

[0025] Next, the decoding section of FIG. 2 is described. The AR dequantization means 25 obtains from the AR codebook 26 the AR codeword 27 corresponding to the AR codeword number 22.

[0026] The MA dequantization means 30 obtains from the MA codebook 31 the MA codeword 32 corresponding to the MA codeword number 23. The sound source dequantization means 35 obtains from the sound source codebook 36 the sound source codeword 37 corresponding to the sound source codeword number 24.

[0027] FIG. 4 is an explanatory diagram of the method of interpolating the AR codewords, MA codewords, and sound source codewords; in the figure, V, W, X, Y, and Z are synthesis sections of one pitch period each. The AR interpolation means 28 interpolates, for example linearly, between the AR codeword 27 of the current frame and the AR codeword of the previous frame for each of these sections, and outputs the result as the interpolated AR 29.

[0028] The MA interpolation means 33 interpolates, for example linearly, between the MA codeword 32 of the current frame and the MA codeword of the previous frame for each section, and outputs the result as the interpolated MA 34. The sound source interpolation means 38 interpolates, for example linearly, between the sound source codeword 37 of the current frame and the sound source codeword of the previous frame for each section, and outputs the result as the interpolated sound source parameters 39. The sound source generating means 40 generates the sound source 41 from the interpolated sound source parameters 39. The synthesizing means 42 generates the synthesized speech 43 from the sound source 41, the interpolated AR 29, and the interpolated MA 34.
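The per-section linear interpolation used by the AR, MA, and sound source interpolation means can be sketched as below. The weighting schedule is an assumption: the text only says that interpolation is, for example, linear, so this version moves in equal steps from the previous frame's codeword to the current one and reaches the current codeword in the last section.

```python
def interpolate_codewords(prev, curr, sections):
    """Return one interpolated parameter vector per pitch-period section,
    stepping linearly from `prev` (previous frame) to `curr` (current frame)."""
    out = []
    for k in range(1, sections + 1):
        w = k / sections  # weight of the current frame's codeword
        out.append([(1 - w) * p + w * c for p, c in zip(prev, curr)])
    return out


# Five sections, as with V, W, X, Y, Z in FIG. 4 (values are illustrative).
per_section = interpolate_codewords([0.0, 10.0], [1.0, 20.0], sections=5)
```

Since only one codeword set is transmitted per frame while the parameters still vary from pitch period to pitch period, the bit count per frame stays fixed regardless of how many pitch periods the frame contains.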

[0029] Performing frame-synchronous processing in this way, by synthesizing while interpolating between the codewords of adjacent frames, makes a low bit rate and a fixed bit rate possible. The AR codebook 7 and the AR codebook 26, the sound source codebook 10 and the sound source codebook 36, and the MA codebook 17 and the MA codebook 31 are respectively identical.

[0030] FIG. 5 is an explanatory diagram showing an example of the source wave model, for explaining the sound source generation method of the present invention; the vertical axis is the time derivative of the source wave and the horizontal axis is time. Section a is the time from the glottal opening point to the minimum point, section b is the pitch period T minus section a, section c is the time from the minimum point to the zero crossing, and section d is the time from the glottal opening point to the first zero crossing.

[0031] This source wave model is defined on the differentiated waveform of the glottal source wave. The differentiated glottal source wave is calculated from Equation (3) using five sound source parameters: the pitch period T, the amplitude AM, OQ (the proportion of the pitch period occupied by section a), OP (the proportion of section a occupied by section d), and CT (the proportion of section b occupied by section c). In the equation, n is time. A, B, C, and L in Equation (3) are variables defined by Equation (4).

[0032]

[Equation 3]

[0033]

[Equation 4]

[0034] Example 2. In Example 1 above, one set of AR codeword, MA codeword, and sound source codeword is selected per frame, but it is also possible to select a plurality of codewords for each parameter.

[0035] Example 3. In Example 1 above, AR and MA are used as the spectrum parameters, but it is also possible to use AR alone.

[0036] Example 4. In Example 1 above, the synthesizing means generates the synthesized speech from the spectrum parameters and the sound source parameters, but it is also possible to generate the synthesized speech while interpolating the spectrum parameters and sound source parameters and to calculate the distance between that synthesized speech and the input speech.

[0037] Example 5. In the optimum codeword selecting means of Example 1 above, for a frame in which the distance between the synthesized speech and the input speech is large, it is also possible to obtain the parameters of the current frame by interpolating the spectrum parameters and sound source parameters from the preceding and following frames.

[0038] Example 6. In Example 1 above, the pitch period T and the amplitude AM are included in the sound source codeword, but it is also possible to create the sound source codebook by clustering with the pitch period T and amplitude AM excluded from the sound source codeword, and to encode and decode the pitch period and amplitude separately.

[0039]

[Effects of the Invention] As described above, according to the present invention, the combination of spectrum codeword and sound source codeword that minimizes the distance between the input speech and the synthesized speech is selected from among stable codewords obtained in advance, which stabilizes the solution of the spectrum parameters and sound source parameters; and the spectrum codewords and sound source codewords are preselected, which reduces the amount of computation needed to solve for the spectrum parameters and sound source parameters.

[0040] In addition, the encoding section selects one set of spectrum codeword and sound source codeword per frame, and the decoding section synthesizes while interpolating the spectrum codewords and sound source codewords with those of adjacent frames; this frame-synchronous processing makes a low bit rate and a fixed bit rate possible.

[0041] Furthermore, the sound source generation method of the present invention represents a one-pitch-period sound source well with few parameters and reduces the amount of computation needed to solve for the sound source parameters.

[Brief description of drawings]

[FIG. 1] A block diagram of the encoding section of a speech encoding and decoding device according to an example of the present invention.

[FIG. 2] A block diagram of the decoding section of a speech encoding and decoding device according to an example of the present invention.

[FIG. 3] A diagram explaining the operation of the optimum codeword selecting means in an example of the present invention.

[FIG. 4] A diagram explaining the method of interpolating the sound source codewords, AR codewords, and MA codewords in an example of the present invention.

[FIG. 5] A diagram explaining the source wave model used by the sound source generation method of the present invention.

[FIG. 6] A block diagram of a conventional speech encoding and decoding device.

[FIG. 7] A diagram explaining the conventional source wave model.

[Description of Reference Numerals]
1 input speech
2 sound source position detecting means
4 AR analysis means
6 AR preselection means
7 AR codebook
8 preselected AR codewords
9 sound source preselection means
10 sound source codebook
11 preselected sound source codewords
12 sound source generating means
14 MA calculating means
16 MA preselection means
17 MA codebook
18 preselected MA codewords
19 synthesizing means
21 optimum codeword selecting means
25 AR dequantization means
26 AR codebook
28 AR interpolation means
30 MA dequantization means
31 MA codebook
32 MA codeword
33 MA interpolation means
35 sound source dequantization means
36 sound source codebook
38 sound source interpolation means
40 sound source generating means
42 synthesizing means

Claims


1. A speech encoding and decoding device that encodes a speech signal by separating it into spectrum parameters representing its frequency spectrum characteristics and a sound source signal, wherein the encoding section comprises: spectrum analysis means for analyzing input speech and extracting spectrum parameters; a spectrum codebook storing a plurality of sets of spectrum parameters as spectrum codewords; spectrum preselection means for selecting from the spectrum codebook a finite number of preselected spectrum codewords that are close in distance to the spectrum parameters output by the spectrum analysis means; a sound source codebook storing, as sound source codewords, a plurality of sets of sound source parameters each representing a sound source signal of one pitch period; sound source preselection means for selecting from the sound source codebook a finite number of preselected sound source codewords that are close, in terms of distance between sound source parameters, to the sound source codeword selected in the past; one-pitch-period sound source generating means for generating a sound source signal of one pitch period from the preselected sound source codewords selected by the sound source preselection means; synthesizing means for generating synthesized speech from the preselected spectrum codewords and this one-pitch-period sound source signal; and one-pitch-period optimum codeword selecting means for selecting, from among the preselected spectrum codewords and the preselected sound source codewords, the combination of spectrum codeword and sound source codeword that minimizes the distance between the synthesized speech and the input speech, and for outputting the spectrum codeword number and sound source codeword number of the selected combination; and wherein the decoding section comprises: the same spectrum codebook as the encoding section; the same sound source codebook as the encoding section; spectrum dequantization means for outputting from the spectrum codebook the spectrum codeword corresponding to an input spectrum codeword number; sound source dequantization means for outputting from the sound source codebook the sound source codeword corresponding to an input sound source codeword number; the same one-pitch-period sound source generating means as the encoding section, which generates a sound source signal from the sound source codeword; and the same synthesizing means as the encoding section, which generates synthesized speech from the sound source signal output by the one-pitch-period sound source generating means and the spectrum codeword.

2. The speech encoding and decoding device according to claim 1, wherein the encoding section comprises: sound source position detecting means for detecting, from the input speech, the start points of all one-pitch-period sound source signals present in an analysis frame of fixed duration and outputting them as sound source positions; sound source generating means for generating, using the preselected sound source codewords selected by the sound source preselection means, a sound source signal synchronized with the sound source positions output by the sound source position detecting means; synthesizing means for generating synthesized speech from this sound source signal and the preselected spectrum codewords; and optimum codeword selecting means for selecting, from among the preselected spectrum codewords and the preselected sound source codewords, the combination of spectrum codeword and sound source codeword that minimizes the distance between the synthesized speech and the input speech over a range of several pitch periods within three frames comprising the current frame and the frames before and after it; and wherein the decoding section comprises: spectrum interpolation means for interpolating, for each pitch period, between the spectrum codeword of the current frame obtained by the spectrum dequantization means and the spectrum codeword of the previous frame, and outputting the resulting interpolated spectrum parameters; sound source interpolation means for interpolating, for each pitch period, between the sound source codeword of the current frame obtained by the sound source dequantization means and the sound source codeword selected in the previous frame, and outputting the resulting interpolated sound source parameters; and the same sound source generating means as the encoding section, which generates the sound source signal within the frame from the interpolated sound source parameters.

3. A sound source generation method that uses the variables A, B, C, L_1, and L_2 and the pitch period T to generate a one-pitch-period sound source signal consisting of the waveform g(n) given by the following equation, where n is time.

$$
g(n) =
\begin{cases}
An - Bn^{2} & (0 \le n \le L_1) \\
C\,(n - L_2)^{2} & (L_1 < n \le L_2) \\
0 & (L_2 < n \le T)
\end{cases}
$$