JPH11249696A

JPH11249696A - Voice encoding/decoding method

Info

Publication number: JPH11249696A
Application number: JP10047248A
Authority: JP
Inventors: Ko Amada; 皇天田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-02-27
Filing date: 1998-02-27
Publication date: 1999-09-17

Abstract

PROBLEM TO BE SOLVED: To realize a low-rate speech encoding method that sounds natural, at the cost of a slight loss of agreement with the original sound. SOLUTION: When the LPC coefficients and the drive signal obtained by analyzing input speech with an LPC analysis part 111 and a drive signal analysis part 112 are encoded, predicted values of the LPC coefficients and the drive signal are obtained by prediction parts 141 and 142, and an evaluation part 150 compares the analysis value with the predicted value for each of the LPC coefficients and the drive signal. It thereby evaluates the degree of influence on auditory naturalness that would result if the predicted values were used to generate the decoded speech. Based on this evaluation, a bit allocation part 160 allocates more quantization bits to whichever of the quantization parts 121 and 122 (for the LPC coefficients and the drive signal, respectively) has the larger degree of influence, and an LPC coefficient quantization index 1001, a drive signal quantization index 1002, and bit allocation information 1003 are transmitted.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech encoding method and a speech decoding method, and more particularly to a low-rate speech encoding/decoding method used in digital telephones, voice memos, and the like.

[0002]

[Description of the Related Art] In recent years, with the development of communication means such as mobile phones and the Internet, coding techniques that compress speech and music signals into a small amount of information for transmission or storage have been actively studied. Representative techniques include CELP (Code Excited Linear Prediction) in the time domain and subband coding in the frequency domain.

[0003] The CELP scheme, described in M. R. Schroeder and B. S. Atal, "Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. ICASSP, pp. 937-940, 1985 (Reference 1), and W. S. Kleijn, D. J. Krasinski et al., "Improved Speech Quality and Efficient Vector Quantization in SELP," Proc. ICASSP, pp. 155-158, 1988 (Reference 2), among others, is a coding scheme based on linear prediction analysis.

[0004] In the CELP scheme, the input speech signal is decomposed by linear prediction analysis into two parameters: linear prediction coefficients, which represent phonemic information, and a prediction residual signal, which represents pitch and the like. These parameters are then encoded. The linear prediction coefficients are used as the filter coefficients of a synthesis filter implemented as a recursive digital filter. On the decoding side, the original input speech signal is obtained by feeding the prediction residual signal into this synthesis filter.
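The relationship between the residual and the recursive synthesis filter can be illustrated with a minimal sketch. The function names `lpc_residual` and `lpc_synthesize` and the sign convention (residual e[n] = s[n] + Σ a_k·s[n-k]) are assumptions made for this example; the patent does not fix a convention.

```python
from typing import List, Sequence


def lpc_residual(signal: Sequence[float], a: Sequence[float]) -> List[float]:
    """Analysis (inverse) filter: e[n] = s[n] + sum_k a[k-1] * s[n-k]."""
    residual: List[float] = []
    for n, s in enumerate(signal):
        acc = s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * signal[n - k]
        residual.append(acc)
    return residual


def lpc_synthesize(residual: Sequence[float], a: Sequence[float]) -> List[float]:
    """Recursive synthesis filter: s[n] = e[n] - sum_k a[k-1] * s[n-k]."""
    out: List[float] = []
    for n, e in enumerate(residual):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]  # feedback from already-synthesized samples
        out.append(acc)
    return out
```

Feeding the unquantized residual back through the synthesis filter with the same coefficients reproduces the input; in a real codec both the coefficients and the residual are quantized, so the reconstruction is only approximate.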

[0005] In this case, to encode at a low rate, the prediction residual signal must be represented with as little information as possible. The distinguishing feature of CELP is that many kinds of signals called excitation signals, which are candidates for the prediction residual signal, are stored in a codebook. Each excitation signal is passed through the synthesis filter, the resulting synthesized speech signal is evaluated for how close it is to the input speech signal, and the excitation signal that produces the synthesized speech closest to the input is selected. Consequently, as the coding rate is raised and the number of excitation signals increases, the waveform of the decoded speech signal approaches that of the input speech signal, yielding decoded speech close to the input.
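The codebook search described here can be sketched as follows. This is a deliberately simplified illustration: it uses a plain squared-error criterion and omits the gain terms, adaptive codebook, and perceptual weighting that practical CELP coders use. The names `synthesize` and `search_codebook` are hypothetical.

```python
from typing import List, Sequence, Tuple


def synthesize(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """Recursive synthesis filter: s[n] = x[n] - sum_k a[k-1] * s[n-k]."""
    out: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


def search_codebook(
    target: Sequence[float],
    codebook: Sequence[Sequence[float]],
    a: Sequence[float],
) -> Tuple[int, float]:
    """Return (index, squared error) of the excitation whose synthesized
    output is closest to the target speech frame."""
    best_index, best_err = -1, float("inf")
    for i, excitation in enumerate(codebook):
        synth = synthesize(excitation, a)
        err = sum((t - s) ** 2 for t, s in zip(target, synth))
        if err < best_err:
            best_index, best_err = i, err
    return best_index, best_err
```

Only the winning index is transmitted, which is why a larger codebook (more index bits) lets the decoded waveform track the input more closely.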

[0006] Multi-mode CELP is known as one variation of CELP. As described above, CELP decomposes the input speech signal into parameters representing the linear prediction coefficients (filter coefficients) and the prediction residual signal (excitation signal) and encodes them. The properties of input speech are not constant; they change from moment to moment, differing, for example, between voiced and unvoiced segments. Multi-mode CELP therefore prepares several coding schemes that differ in the bit allocation for each parameter and in codebook contents, and improves efficiency by switching among them according to the input speech.

[0007] Apart from multi-mode CELP, the subband coding scheme used in MPEG audio and elsewhere is also known as a coding scheme that changes the bit allocation according to the input. Subband coding transforms the input signal into the frequency domain, divides it into multiple bands, and encodes each band; the larger the power of a band's component, the more coding bits are allocated to it. Because masking effects are also taken into account, allocation is not determined by power alone, but in essence coding efficiency is raised by adaptively allocating bits to an input signal whose properties vary.

[0008] The purpose of adapting the bit allocation in multi-mode CELP and subband coding as described above is to obtain decoded speech that is perceptually closer to the input speech. As a result, these schemes are structured so that they must faithfully reproduce even the changes and fine fluctuations of the parameters contained in the input speech (formants, pitch period, gain, and so on), and this hinders further improvement in coding efficiency (further rate reduction).

[0009] However, when the same sentence is uttered twice, the waveforms differ even though the two utterances sound the same to the human ear. As this fact shows, information such as parameter changes and fine fluctuations contained in the input speech need not necessarily be transmitted from the standpoint of perceptual naturalness. Moreover, even when a perceptual difference can be detected by comparing the two utterances, the difference may not be large enough to matter.

[0010]

[Problem to Be Solved by the Invention] As described above, conventional low-rate speech coding techniques such as multi-mode CELP and subband coding, which adapt the bit allocation to improve coding efficiency, faithfully reproduce even information that is not necessarily needed for perceptual naturalness, such as parameter changes and fine fluctuations in the input speech. This has hindered further improvement in coding efficiency.

[0011] The present invention has been made in view of these circumstances. Its object is to provide a speech encoding/decoding method that does not aim at decoded speech perceptually identical to the input speech, but that yields natural-sounding decoded speech even if a slight perceptual difference is noticeable.

[0012]

[Means for Solving the Problem] To solve the above problem, a speech encoding method according to the present invention encodes a plurality of parameters obtained by analyzing input speech, and is characterized in that: a predicted value of each parameter is obtained from past encoded data; for each parameter, the analysis value is compared with the predicted value to evaluate the degree of influence on perceptual naturalness that would result if the predicted value were used to generate the decoded speech; based on the evaluation result, more coding bits are allocated to the parameter with the greater degree of influence; and the encoded data of each parameter and bit allocation information indicating the allocation of coding bits to each parameter are transmitted.

[0013] A speech decoding method according to the present invention, corresponding to this speech encoding method, is characterized in that it receives, for a plurality of parameters obtained by analyzing input speech, the encoded data of the parameters together with bit allocation information. The bit allocation information indicates an allocation of coding bits determined so that more bits go to the parameter with the greater degree of influence, based on an evaluation of the degree of influence on perceptual naturalness that would result if decoded speech were generated using the predicted values of the parameters. The method decodes the encoded data according to the bit allocation information to generate decoded speech.

[0014] Another speech encoding method according to the present invention encodes a plurality of parameters obtained by analyzing input speech, and is characterized in that: a predicted value of each parameter is obtained from past encoded data; for each parameter, the analysis value is compared with the predicted value to evaluate the degree of influence on perceptual naturalness that would result if the predicted value were used to generate the decoded speech; and, based on the evaluation result, only the parameter with the greater degree of influence is selected and the encoded data of its analysis value is transmitted, together with selection information indicating the type of the selected parameter.

[0015] Another speech decoding method according to the present invention, corresponding to this speech encoding method, is characterized in that it receives, for a plurality of parameters obtained by analyzing input speech, the encoded data of the analysis value of the parameter with the greater degree of influence, selected based on an evaluation of the degree of influence on perceptual naturalness that would result if decoded speech were generated using the predicted values of the parameters, together with selection information indicating the type of the selected parameter. Based on the selection information, the method generates decoded speech by decoding the encoded data of the analysis value for the selected parameter and by using the predicted value as is for any parameter that was not selected.

[0016] Thus, in the present invention, the changes of parameters of different nature contained in the input speech are each predicted. Parameters of a kind that would sound unnatural if their predicted values were used as is to generate the decoded speech are allocated more coding bits, or all of the coding bits. Parameters of a kind that do not sound unnatural even when a difference from the input speech is noticeable are encoded with fewer bits, or are allocated no coding bits at all, in which case the predicted value is used as is to generate the decoded speech. Furthermore, the bit allocation among the encoded data of the parameters is preferably varied from moment to moment according to the properties of the input speech. As a result, the match between the decoded speech and the original input speech is slightly sacrificed, but speech encoding/decoding at a far lower rate than before becomes possible.

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. The speech encoding/decoding method of the present invention is in most cases implemented in software on a computer, but the following description uses block diagrams and presents it as a speech encoding/decoding system. Even from such a description, a person skilled in the art should be able to clearly understand the procedure of the speech encoding/decoding method according to the present invention.

[0018] (First Embodiment) <Encoding side> FIG. 1 shows the configuration of a speech encoding system according to a first embodiment of the present invention. This speech encoding system comprises: an analysis unit 110, consisting of an LPC analysis unit 111 and a drive signal analysis unit 112, which analyzes the input speech signal from an input terminal 100; first and second quantization units 121 and 122, which quantize the LPC coefficients and the drive signal, respectively; first and second delay units 131 and 132, which delay the quantized values of the LPC coefficients and the drive signal, respectively; first and second prediction units 141 and 142, which predict the LPC coefficients and the drive signal, respectively; an evaluation unit 150, which evaluates the degree of influence on perceptual naturalness that would result if the predicted values from the prediction units 141 and 142 were used as is to generate the decoded speech; and a bit allocation unit 160, which allocates the number of quantization bits (coding bits) to the quantization units 121 and 122 based on the evaluation result of the evaluation unit 150.

[0019] Next, the operation of this speech encoding system is described. A speech signal is input to the input terminal 100 one frame at a time. In synchronization with this, the LPC analysis unit 111 performs linear prediction analysis and outputs the analysis value of the LPC coefficients, which correspond to the vocal tract characteristics, and the drive signal analysis unit 112 outputs the analysis value of the drive signal, which corresponds to the vocal cord waveform.

[0020] The first quantization unit 121 uses the predicted value of the LPC coefficients output by the prediction unit 141 to quantize the analysis value of the LPC coefficients from the LPC analysis unit 111 with the number of quantization bits allocated by the bit allocation unit 160. It outputs an LPC coefficient quantization index 1001 and, at the same time, supplies the quantized value of the LPC coefficients to the delay unit 131 for prediction in the next frame.

[0021] Similarly, the second quantization unit 122 uses the predicted value of the drive signal output by the second prediction unit 142 to quantize the analysis value of the drive signal from the drive signal analysis unit 112 with the number of quantization bits allocated by the bit allocation unit 160. It outputs a drive signal quantization index 1002 and, at the same time, supplies the quantized value of the drive signal to the delay unit 132 for prediction in the next frame.

[0022] The evaluation unit 150 first determines, based on the analysis value of the LPC coefficients obtained by the LPC analysis unit 111, the degree of influence on the voice quality of the decoded speech (the influence on perceptual naturalness) that would result if the predicted value of the LPC coefficients obtained by the first prediction unit 141 were used as is as the filter coefficients of the synthesis filter.

[0023] The evaluation unit 150 also determines, based on the analysis value of the drive signal obtained by the drive signal analysis unit 112, the degree of influence on the sound quality of the decoded speech (the influence on perceptual naturalness) that would result if the predicted value of the drive signal obtained by the second prediction unit 142 were used as is as the input to the synthesis filter.

[0024] The evaluation unit 150 then sends information indicating these two degrees of influence to the bit allocation unit 160. Based on this information, the bit allocation unit 160 determines the allocation of quantization bits between the first and second quantization units 121 and 122 so that more quantization bits go to the one with the greater degree of influence. At the same time, the bit allocation unit 160 outputs bit allocation information 1003, which indicates the number of quantization bits allocated to each of the first and second quantization units 121 and 122.
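The patent does not specify how the degree of influence is computed or how it is mapped to bit counts. The following is only a sketch under stated assumptions: `influence` uses a mean squared difference as a stand-in for a perceptual measure, and `allocate_bits` splits a fixed per-frame budget in proportion to the two (non-negative) scores. Both functions and the proportional rule are hypothetical.

```python
from typing import Sequence, Tuple


def influence(analysis: Sequence[float], predicted: Sequence[float]) -> float:
    """Placeholder score: mean squared difference between analysis and
    predicted values. A real evaluation unit would use a perceptual measure."""
    return sum((a - p) ** 2 for a, p in zip(analysis, predicted)) / len(analysis)


def allocate_bits(
    influence_lpc: float, influence_drive: float, total_bits: int
) -> Tuple[int, int]:
    """Split total_bits so the parameter with the greater influence gets more.
    Returns (lpc_bits, drive_bits); the pair plays the role of the bit
    allocation information sent alongside the quantization indices."""
    total = influence_lpc + influence_drive
    if total <= 0.0:
        lpc_bits = total_bits // 2  # both predictions are perfect: split evenly
    else:
        lpc_bits = round(total_bits * influence_lpc / total)
    return lpc_bits, total_bits - lpc_bits
```

Because the decoder receives the allocation explicitly, it does not need to repeat the evaluation; it only needs to know how many bits each index occupies in the current frame.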

[0025] The encoded data output from this speech encoding system consists of the LPC coefficient quantization index 1001, the drive signal quantization index 1002, and the bit allocation information 1003, which are transmitted via a transmission path or recording medium to the speech decoding system described later.

[0026] Next, the method of encoding the LPC coefficients in this embodiment is described. The first delay unit 131 stores the quantized values of the LPC coefficients quantized by the first quantization unit 121 in past frames. The first prediction unit 141 predicts the LPC coefficients of the current frame from the quantized LPC coefficients of the past frames stored in the delay unit 131. If this predicted value were identical to the analysis value obtained by the LPC analysis unit 111 from the input speech signal, no LPC coefficient information would need to be transmitted at all, because the decoding side has the same prediction algorithm and can therefore derive the same current predicted value as the encoding side from the past quantized values.
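The reason the decoder can reproduce the encoder's prediction is that both sides predict only from past quantized values. A minimal sketch, assuming a scalar parameter, a trivial predictor (the previous quantized value), and a uniform quantizer of the prediction error; the class `PredictiveCodec` and these choices are illustrative and not taken from the patent.

```python
class PredictiveCodec:
    """Shared predictor state: the previous quantized value (the 'delay unit')."""

    def __init__(self, step: float) -> None:
        self.step = step
        self.prev = 0.0  # quantized value of the previous frame

    def predict(self) -> float:
        # Assumed predictor: repeat the previous quantized value.
        return self.prev

    def encode(self, value: float) -> int:
        """Quantize the difference from the prediction; return its index."""
        index = round((value - self.predict()) / self.step)
        self.prev = self.predict() + index * self.step  # update with quantized value
        return index

    def decode(self, index: int) -> float:
        """Rebuild the quantized value from the index and the same prediction."""
        self.prev = self.predict() + index * self.step
        return self.prev
```

With separate encoder and decoder instances that start from the same state, the decoder's `prev` tracks the encoder's exactly, and a frame whose value matches the prediction produces index 0, that is, no new information.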

[0027] In practice, the predicted value of the LPC coefficients rarely equals the analysis value; in most cases they differ, so the difference must be quantized and transmitted. The characteristic feature here is that the number of quantization bits assigned at that time is decided by dividing the bits according to the difference in importance, with respect to the input speech signal, between this parameter and the other parameter (in this embodiment, the drive signal, whose predicted value is obtained by the same arrangement as for the LPC coefficients).

[0028] In this embodiment, the bit allocation for each parameter is not varied so as to select the allocation that minimizes distortion relative to the input speech signal, as in conventional multi-mode CELP and the like. Instead, bits are allocated so that the decoded speech sounds natural even if a perceptual difference from the input speech signal is noticeable. For example, LPC coefficient values can vary with the position of the analysis window even in a stationary segment. Because multi-mode CELP encodes so that even such differences are faithfully reproduced, the encoded LPC coefficient data could exhibit an alternating code pattern such as ABAB..., in which two different codes A and B appear in turn.

[0029] In this embodiment, when a code pattern such as ABAB... appears, it is replaced with all code A or all code B, so that the predicted LPC coefficient value continues to be used from frame to frame. There is presumably a slight perceptual difference among the ABAB..., AAAA..., and BBBB... code patterns. However, when the decoded speech generated from each pattern is heard, the difference is not a problematic one. In terms of the amount of code transmitted, the ABAB... pattern requires the difference between code A and code B to be transmitted in every frame. For the AAAA... or BBBB... pattern, in contrast, once the first value has been obtained, the same value is predicted thereafter, so the amount transmitted in subsequent frames can be held to zero, giving a large bit rate reduction. By slightly sacrificing the match between the decoded speech and the original input speech signal in this way, a large reduction in bit rate becomes possible.
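The saving can be made concrete with a small counting sketch. It assumes, purely for illustration, that the predictor repeats the previous frame's code and that a frame costs transmission only when its code differs from that prediction; the function name `frames_needing_update` is hypothetical.

```python
from typing import Iterable


def frames_needing_update(codes: Iterable[str]) -> int:
    """Count frames whose code differs from the prediction (the previous code)."""
    count = 0
    prev = None  # no prediction is available before the first frame
    for code in codes:
        if code != prev:
            count += 1
        prev = code
    return count


alternating = frames_needing_update("ABABABAB")  # every frame differs from the last
simplified = frames_needing_update("AAAAAAAA")   # only the first frame is sent
```

Over eight frames the alternating pattern needs an update in all eight, while the simplified pattern needs one, which is the bit rate reduction the paragraph describes.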

[0030] The above describes quantization of the LPC coefficients, but the same applies to quantization of the drive signal. The key point of this embodiment is that more coding bits are allocated to whichever of the LPC coefficients and the drive signal has the greater perceptual influence. In the transmission of the encoded LPC coefficient data described above, the ABAB... code pattern does yield decoded speech closer to the original speech than the AAAA... or BBBB... patterns, so it is desirable, where possible, to assign many bits and transmit the ABAB... pattern.

[0031] Whether the ABAB... code pattern is transmitted, or simplified to an AAAA... or BBBB... pattern, is determined by the degree of change in the drive signal. When the pitch period, gain, and so on of the drive signal change little, many bits are assigned to encoding the LPC coefficients, the ABAB... pattern is transmitted, and decoded speech more faithful to the original can be generated. Conversely, when the drive signal changes greatly and the perceptual influence is large, many bits are assigned to encoding the drive signal, and the LPC coefficients make do with an AAAA... or BBBB... pattern. In this case the match between the decoded and original speech is somewhat sacrificed, but a major advantage is that perceptual naturalness is preserved.

[0032] LPC coefficients are a representative parameter for vocal tract characteristics and can be expressed in various forms, such as k parameters, LSP, and LPC cepstrum. Vocal tract characteristics can also be represented by parameters other than LPC coefficients. This embodiment uses LPC coefficients as one example of a parameter representing vocal tract characteristics, but other parameters may be used; the same applies to the other embodiments described below.

<Decoding side> FIG. 2 shows the configuration of a speech decoding system corresponding to the speech encoding system of FIG. 1. This speech decoding system comprises: first and second inverse quantization units 221 and 222, which inverse-quantize the LPC coefficient quantization index 1001 and the drive signal quantization index 1002 input to input terminals 201 and 202, respectively; a bit allocation unit 210, which determines the bit allocation for the inverse quantization units 221 and 222 based on the bit allocation information 1003 input to an input terminal 203; first and second delay units 231 and 232, which delay the inverse-quantized LPC coefficients and drive signal, respectively; first and second prediction units 241 and 242, which predict the LPC coefficients and the drive signal, respectively; and a synthesis unit 250, which generates a decoded speech signal from the LPC coefficients and drive signal inverse-quantized by the inverse quantization units 221 and 222.

[0033] More specifically, the synthesis unit 250 is built around an LPC synthesis filter whose filter coefficients are the inverse-quantized LPC coefficients, and it generates the decoded speech signal by feeding the inverse-quantized drive signal into this LPC synthesis filter.

[0034] (Second Embodiment) <Encoding side> FIG. 3 shows a speech encoding system according to a second embodiment of the present invention. In this system, the bit allocation unit 160 of the speech encoding system of the first embodiment shown in FIG. 1 is removed and replaced by a coding parameter selection control unit 170 and a changeover switch 180. The coding parameter selection control unit 170 decides, according to the evaluation result of the evaluation unit 150, which of the quantization indices of the LPC coefficients and the drive signal from the first and second quantization units 121 and 122 is selected as the coding parameter to be transmitted. The changeover switch 180, controlled by the coding parameter selection control unit 170, extracts one of those quantization indices (LPC coefficient / drive signal quantization index) as the coding parameter 1005.

[0035] The coding parameter selection control unit 170 outputs coding parameter selection information 1004, which indicates whether the LPC coefficient quantization index or the drive signal quantization index has been selected as the coding parameter.

[0036] The first embodiment described a method of changing the bit allocation between the encoded data of the LPC coefficients and the drive signal according to the predicted values of the LPC coefficients and the drive signal. This embodiment takes that further and assigns all of the coding bits to either the LPC coefficients or the drive signal. As a result, whether the encoded data of the LPC coefficients or that of the drive signal is transmitted changes from frame to frame.

<Decoding side> FIG. 4 shows a speech decoding system corresponding to the speech encoding system of FIG. 3. In this speech decoding system, the coding parameter (LPC coefficient / drive signal quantization index) input to an input terminal 205 is routed to one of the inverse quantization units 221 and 222 according to the coding parameter selection information 1004 input to an input terminal 204. That is, an LPC coefficient quantization index is input to the inverse quantization unit 221, and a drive signal quantization index is input to the inverse quantization unit 222. The subsequent operation is the same as in FIG. 2.

【００３７】Thus, according to the present embodiment, the decoding side generates decoded speech by using the predicted value as-is for the drive signal in frames where the coded data of the LPC coefficients was transmitted, and by using the predicted value as-is for the LPC coefficients in frames where the coded data of the drive signal was transmitted. The decoded speech therefore matches the original speech less closely than in the first embodiment, but coding at a lower rate becomes possible.
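The per-frame switching of the second embodiment can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `LPC` and `DRIVE` constants standing in for the selection information 1004, and the tie-breaking rule are hypothetical and do not come from the specification.

```python
from typing import Any, Tuple

# Hypothetical encodings of the coding parameter selection information 1004.
LPC, DRIVE = 0, 1


def select_parameter(lpc_impact: float, drive_impact: float) -> int:
    """Encoder side: spend the frame's bits on the parameter whose predicted
    value would hurt auditory naturalness more (ties go to LPC, an assumption)."""
    return LPC if lpc_impact >= drive_impact else DRIVE


def reconstruct_frame(selection: int, decoded: Any,
                      predicted_lpc: Any, predicted_drive: Any) -> Tuple[Any, Any]:
    """Decoder side: use the transmitted value for the selected parameter and
    the locally generated prediction for the other one.

    Returns the pair (lpc, drive) that would be passed to the synthesis stage.
    """
    if selection == LPC:
        return decoded, predicted_drive
    return predicted_lpc, decoded
```

Because only one parameter is transmitted per frame, the selection flag is the only side information the decoder needs in order to route the received index.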

【００３８】(Third Embodiment) <Encoding side> FIG. 5 shows a speech encoding system according to a third embodiment of the present invention. In this speech encoding system, the analysis unit 110 of the first embodiment is replaced by an analysis unit 190 consisting of a pitch period analysis unit 191 and a pitch waveform analysis unit 192, and the LPC analysis unit 111 and an LPC quantization unit 320, which quantizes the analysis value of the LPC coefficients, are provided outside the analysis unit 310.

【００３９】The present embodiment further includes: first and second quantization units 321 and 322, which quantize the pitch period and the pitch waveform, respectively; first and second delay units 331 and 332, which delay the quantized values of the pitch period and the pitch waveform, respectively; first and second prediction units 341 and 342, which predict the pitch period and the pitch waveform, respectively; an evaluation unit 350, which evaluates the degree to which auditory naturalness is affected when the predicted values from the prediction units 341 and 342 are used to generate decoded speech; and a bit allocation unit 360, which allocates quantization bits to the quantization units 321 and 322 based on the evaluation result of the evaluation unit 350.

【００４０】Next, the operation of the speech encoding system of the present embodiment will be described. A speech signal is input to the input terminal 100 one frame at a time; in synchronization with this, linear prediction analysis is performed, and the LPC coefficients and the drive signal are separated and encoded. That is, the LPC analysis unit 111 outputs the analysis value of the LPC coefficients, and in the analysis unit 190 the drive signal is separated into a pitch period and a pitch waveform of one pitch length by the pitch period analysis unit 191 and the pitch waveform analysis unit 192, respectively. The analysis value of the LPC coefficients output from the LPC analysis unit 111 is quantized by the LPC quantization unit 320, and the LPC coefficient quantization index 1001 is transmitted.

【００４１】The first quantization unit 321 uses the predicted value of the pitch period output from the prediction unit 341 to quantize the analysis value of the pitch period from the pitch period analysis unit 191 with the number of quantization bits allocated by the bit allocation unit 360. It outputs the pitch period quantization index 1011 and, at the same time, supplies the quantized value of the pitch period to the delay unit 331 for prediction in the next frame.

【００４２】Similarly, the second quantization unit 322 uses the predicted value of the pitch waveform output from the second prediction unit 342 to quantize the analysis value of the pitch waveform from the pitch waveform analysis unit 192 with the number of quantization bits allocated by the bit allocation unit 360. It outputs the pitch waveform quantization index 1012 and, at the same time, supplies the quantized value of the pitch waveform to the delay unit 332 for prediction in the next frame.

【００４３】The evaluation unit 350 first determines, based on the analysis value of the pitch period obtained by the pitch period analysis unit 191, the degree to which the voice quality of the decoded speech would be affected if the predicted value of the pitch period obtained by the first prediction unit 341 were used as-is to generate the decoded speech. Likewise, based on the analysis value of the pitch waveform obtained by the pitch waveform analysis unit 192, it determines the degree to which the sound quality of the decoded speech would be affected if the predicted value of the pitch waveform obtained by the second prediction unit 342 were used to generate the decoded speech.

【００４４】The evaluation unit 350 then sends information indicating these two degrees of influence to the bit allocation unit 360. Based on this information, the bit allocation unit 360 determines the split of quantization bits between the first and second quantization units 321 and 322 so that more quantization bits go to the one with the larger degree of influence. At the same time, the bit allocation unit 360 outputs bit allocation information 1013, which indicates the number of quantization bits allocated to each of the first and second quantization units 321 and 322.
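One way to turn two degrees of influence into a bit split is sketched below in Python. The proportional rule and the function name are assumptions made for illustration; the specification only requires that the parameter with the larger degree of influence receives more quantization bits.

```python
from typing import Tuple


def allocate_bits(total_bits: int, impact_period: float,
                  impact_waveform: float) -> Tuple[int, int]:
    """Split total_bits between the pitch period and the pitch waveform.

    Proportional splitting is an assumption; the text only requires that the
    parameter with the larger impact receives more bits.
    """
    total_impact = impact_period + impact_waveform
    if total_impact <= 0:
        bits_period = total_bits // 2  # no preference: split evenly
    else:
        bits_period = round(total_bits * impact_period / total_impact)
    bits_period = max(0, min(total_bits, bits_period))
    return bits_period, total_bits - bits_period
```

The returned pair plays the role of the bit allocation information 1013: the decoder needs it to know how many bits each index occupies in the frame.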

【００４５】The coded data output from this speech encoding system consists of the LPC coefficient quantization index 1001, the pitch period quantization index 1011, the pitch waveform quantization index 1012, and the bit allocation information 1013, which are transmitted via a transmission path or a recording medium to the speech decoding system described later.

【００４６】The first embodiment described an example in which the bit allocation of the coded data is varied between the LPC coefficients and the drive signal; in the present embodiment, the bit allocation is varied between the coded data of the pitch period and that of the pitch waveform.

【００４７】First, regarding the pitch period: the pitch period has flat sections with little change and sections with large changes, such as sharp rises and falls, and slight fluctuations exist even in the flat sections. Conventionally, such changes in the pitch period have been encoded faithfully.

【００４８】In contrast, the present embodiment does not necessarily aim to match the pitch period of the input speech signal; instead, it predicts a pitch pattern that sounds natural to the ear. If the prediction is adequate, there is no need to transmit pitch period information, because the decoding side has the same prediction algorithm and can generate the same predicted value as the encoding side from past quantized values. If the prediction is off and causes an audible problem, more coding bits are allocated so that a pitch period closer to that of the input speech signal is transmitted. The same applies to the pitch waveform.
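The reason no pitch information has to be sent when the prediction is adequate can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the linear-extrapolation predictor, the fixed tolerance used as a stand-in for the auditory evaluation, and the use of `None` to mark a frame in which nothing is transmitted for this parameter.

```python
from typing import List, Optional


def predict(history: List[float]) -> float:
    """Hypothetical predictor: linear extrapolation from the last two values."""
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


def encode(pitch_track: List[float], init: List[float],
           tolerance: float) -> List[Optional[float]]:
    """Send a value only when the prediction deviates by more than tolerance."""
    history = list(init)
    stream: List[Optional[float]] = []
    for analysis in pitch_track:
        predicted = predict(history)
        if abs(analysis - predicted) <= tolerance:
            stream.append(None)       # nothing transmitted for this frame
            history.append(predicted)
        else:
            stream.append(analysis)   # stands in for a finely quantized value
            history.append(analysis)
    return stream


def decode(stream: List[Optional[float]], init: List[float]) -> List[float]:
    """Run the same predictor on the same history as the encoder."""
    history = list(init)
    for received in stream:
        history.append(predict(history) if received is None else received)
    return history[len(init):]
```

Both sides update their history with the same reconstructed values, never with the raw analysis values, which is what keeps the two predictors in step.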

【００４９】Furthermore, in the present embodiment, more coding bits are allocated to whichever of the pitch period and the pitch waveform has the greater auditory influence. When the pitch period is nearly constant, more coding bits are allocated to the pitch waveform; conversely, when the pitch period is changing but the pitch waveform is not, more coding bits are allocated to the pitch period. In this way, the match between the pitch period and that of the original sound is sacrificed to some extent, but the drive signal can be encoded at a low rate.

<Decoding side> FIG. 6 shows the configuration of a speech decoding system corresponding to the speech encoding system of FIG. 5. This speech decoding system comprises: inverse quantization units 420, 421, and 422, which inversely quantize the LPC coefficient quantization index 1001, the pitch period quantization index 1011, and the pitch waveform quantization index 1012 input to the input terminals 201, 211, and 212, respectively; a bit allocation unit 410, which determines the bit allocation to the inverse quantization units 421 and 422 based on the bit allocation information 1013 input to the input terminal 213; first and second delay units 431 and 432, which delay the pitch period and the pitch waveform inversely quantized by the inverse quantization units 421 and 422, respectively; first and second prediction units 441 and 442, which predict the pitch period and the pitch waveform, respectively; and a synthesis unit 450, which generates a decoded speech signal from the LPC coefficients, pitch period, and pitch waveform inversely quantized by the inverse quantization units 420, 421, and 422.

【００５０】More specifically, the synthesis unit 450 is built around an LPC synthesis filter whose filter coefficients are the inversely quantized LPC coefficients; it generates the decoded speech signal by feeding this LPC synthesis filter with a drive signal generated from the inversely quantized pitch period and pitch waveform.
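The synthesis step can be sketched in plain Python as below. This is an illustration under assumptions, not the specified implementation: the drive signal is built by repeating one pitch waveform every pitch period with simple overlap-add, the filter uses the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and both function names are hypothetical.

```python
from typing import List, Sequence


def build_drive_signal(pitch_waveform: Sequence[float], pitch_period: int,
                       n_samples: int) -> List[float]:
    """Place one copy of the pitch waveform every pitch_period samples,
    summing where copies overlap (a simple overlap-add, assumed here)."""
    out = [0.0] * n_samples
    for start in range(0, n_samples, pitch_period):
        for i, value in enumerate(pitch_waveform):
            if start + i < n_samples:
                out[start + i] += value
    return out


def lpc_synthesize(drive: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole synthesis filter 1/A(z): y[n] = x[n] - sum_k a_k * y[n-k]."""
    y: List[float] = []
    for n, x in enumerate(drive):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * y[n - k]
        y.append(acc)
    return y
```

For example, with a single-sample pitch waveform, a pitch period of 4, and one coefficient a_1 = -0.5, each pulse in the drive signal produces an exponentially decaying response in the output.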

【００５１】(Fourth Embodiment) FIG. 7 shows a speech encoding system according to a fourth embodiment of the present invention. This speech encoding system has the configuration of the speech encoding system of the third embodiment shown in FIG. 5 with the LPC analysis unit 111 and the LPC quantization unit 320 removed. In the third embodiment, the input speech signal is subjected to linear prediction analysis and separated into LPC coefficients and a drive signal for encoding, whereas in the present embodiment it is encoded without this separation.

【００５２】Specifically, whereas the third embodiment extracts the pitch period and the pitch waveform of the drive signal, in the present embodiment the pitch period analysis unit 311 extracts the pitch period of the input speech signal and the pitch waveform analysis unit 312 cuts out the pitch waveform of the input speech signal.

【００５３】According to the present embodiment, the LPC analysis required in the first to third embodiments becomes unnecessary, so the present invention can also be applied to input speech signals for which LPC analysis is of little benefit. The subsequent embodiments show examples without LPC analysis, but, as with the relationship between the present embodiment and the third embodiment, they are also applicable when LPC analysis is performed.

<Decoding side> FIG. 8 shows the configuration of a speech decoding system corresponding to the speech encoding system of FIG. 7; it is the speech decoding system of the third embodiment shown in FIG. 6 with the LPC inverse quantization unit 420 removed. In this case, the synthesis unit 450 generates the decoded speech signal by joining the inversely quantized pitch waveforms at the inversely quantized pitch period using a suitable technique.

【００５４】(Fifth Embodiment) FIG. 9 shows a speech encoding system according to a fifth embodiment of the present invention. It shows in detail the configurations of the first and second prediction units 341 and 342 and the evaluation unit 350 in the speech encoding system of the fourth embodiment shown in FIG. 7; the other components and their operation are the same as in the fourth embodiment. The prediction unit 341 consists of a comparison unit 3411 and a pitch pattern codebook 3412, and the prediction unit 342 consists of a comparison unit 3421 and a waveform pattern codebook 3422. The evaluation unit 350 consists of two error integration units 3511 and 3512.

【００５５】First, the first prediction unit 341 will be described. The pitch pattern codebook 3412 of the prediction unit 341 stores multiple pairs, each consisting of a pitch pattern obtained by training on a large amount of speech and the subsequent pitch period that appears next after that pattern. Here, a pitch pattern represents the change in the pitch period of speech and is taken to be the pitch periods of the past N frames. During encoding, the comparison unit 3411 compares the past N quantized pitch periods stored in the delay unit 331 with the pitch patterns (N pitch periods each) stored in the pitch pattern codebook 3412, finds the candidate whose shape of pitch period change is closest, and outputs the subsequent pitch period paired with it as the predicted pitch period.
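The codebook lookup performed by the comparison unit can be sketched in Python as follows. The squared-error distance, the data layout of the codebook entries, and the function name are assumptions for illustration; the specification only says that the closest pattern is found and its paired follower is output.

```python
from typing import List, Sequence, Tuple

# One codebook entry: (pitch pattern of N periods, subsequent pitch period).
Entry = Tuple[Sequence[float], float]


def predict_pitch_period(history: Sequence[float],
                         codebook: List[Entry]) -> float:
    """Return the follower paired with the stored pattern closest to the
    last N quantized pitch periods (assumes len(history) >= N)."""
    n = len(codebook[0][0])
    recent = history[-n:]

    def distance(pattern: Sequence[float]) -> float:
        # Squared error between a stored pattern and the recent history.
        return sum((p - r) ** 2 for p, r in zip(pattern, recent))

    _, follower = min(codebook, key=lambda entry: distance(entry[0]))
    return follower
```

With a codebook holding a flat, a rising, and a falling pattern, a rising history such as 101, 104, 111 selects the rising entry and returns its stored follower.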

【００５６】Next, the second prediction unit 342 will be described. The prediction unit 342 operates in the same way as the prediction unit 341, except that the parameter it handles is the pitch waveform instead of the pitch period. That is, the past M pitch waveforms stored in the delay unit 332 are compared with the sets of M pitch waveforms stored in the waveform pattern codebook 3422 to find the closest one, and the subsequent pitch waveform paired with it is output as the predicted pitch waveform.

【００５７】Next, the evaluation unit 350 will be described. The error integration unit 3511 of the evaluation unit 350 takes as input the analysis value of the pitch period obtained by the pitch period analysis unit 311 and the predicted value of the pitch period output from the prediction unit 341, and evaluates the difference between them. It evaluates not only the difference in the pitch period of the current frame but also, going back several frames, how far the overall change in the pitch period has deviated. This deviation in the change is referred to here as the integral value.

【００５８】A temporary deviation in the pitch period poses no problem from the standpoint of auditory naturalness, but a deviation that persists over a long period produces unnatural intonation and impairs naturalness. The error integration unit 3511 therefore judges the difference to be larger the longer the analysis value and the predicted value of the pitch period have deviated from each other, and sends this to the bit allocation unit 360 as error information.

【００５９】The error integration unit 3512 is the same as the error integration unit 3511 except that the information it handles is the pitch waveform rather than the pitch period. That is, it compares the changes over time of the predicted value and the analysis value of the pitch waveform and outputs to the bit allocation unit 360 error information whose value is larger the longer they have continued to deviate.
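A minimal Python sketch of such an error integration unit is given below. The sliding window of absolute differences, the window length, and the class name are assumptions; the specification only requires that a deviation persisting over several frames yields a larger value than a momentary one.

```python
from collections import deque


class ErrorIntegrator:
    """Accumulates the prediction error over the last `window` frames."""

    def __init__(self, window: int = 5) -> None:
        # Oldest errors drop out automatically once the window is full.
        self.errors = deque(maxlen=window)

    def update(self, analysis: float, predicted: float) -> float:
        """Add the current frame's error and return the integrated value."""
        self.errors.append(abs(analysis - predicted))
        return sum(self.errors)
```

A single deviating frame contributes once and then ages out of the window, whereas a deviation that continues frame after frame keeps raising the integrated value, which is the behaviour the bit allocation stage relies on.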

【００６０】The bit allocation unit 360 determines the numbers of quantization bits for the pitch period and the pitch waveform based on the error information from the error integration units 3511 and 3512, and outputs these numbers of bits to the quantization units 321 and 322.

【００６１】The quantization unit 321 determines the pitch period of the current frame, using the allocated number of bits, based on the analysis value of the pitch period obtained from the pitch period analysis unit 311 and the predicted value of the pitch period obtained from the comparison unit 3411. Normally, the pitch period of the current frame is chosen to be closer to the analysis value of the pitch period. The quantization unit outputs an index indicating the chosen value and, at the same time, supplies the quantized value to the delay unit 331, where it is stored for use in the next frame.
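One simple way to quantize with a variable number of bits around a predicted value is sketched below in Python. Uniform quantization of the prediction residual is an assumption chosen for brevity, as are the function names and the `step` parameter; the specification leaves the quantizer design open.

```python
from typing import Tuple


def quantize_with_prediction(analysis: float, predicted: float, bits: int,
                             step: float = 1.0) -> Tuple[int, float]:
    """Quantize the residual (analysis - predicted) with `bits` bits.

    Returns (index, quantized value). With 0 bits the prediction is used
    as-is and the index carries no information.
    """
    if bits == 0:
        return 0, predicted
    levels = 1 << bits
    lowest, highest = -(levels // 2), levels // 2 - 1
    residual = max(lowest, min(highest, round((analysis - predicted) / step)))
    return residual - lowest, predicted + residual * step


def dequantize_with_prediction(index: int, predicted: float, bits: int,
                               step: float = 1.0) -> float:
    """Decoder-side inverse of quantize_with_prediction."""
    if bits == 0:
        return predicted
    lowest = -((1 << bits) // 2)
    return predicted + (index + lowest) * step
```

More bits widen the range of residuals that can be represented, so the quantized value can follow the analysis value more closely; fewer bits pull the result toward the prediction.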

【００６２】The quantization unit 322 quantizes the pitch waveform in the same manner. Quantization often uses a codebook, but this codebook is separate from the pitch pattern codebook 3412 and the waveform pattern codebook 3422; in FIG. 9 it is contained in the quantization unit 322 and is not shown explicitly.

【００６３】Thus, in the present embodiment, evaluating the deviation in the changes of the pitch period and the pitch waveform over a long period enables low-rate coding that yields decoded speech that sounds more natural to the ear. In addition, normalizing the pitch periods is effective for reducing the amount of memory when creating the pitch pattern codebook 3412.

<Decoding side> FIG. 10 shows the configuration of a speech decoding system corresponding to the speech encoding system of FIG. 9; it shows in detail the configurations of the first and second prediction units 441 and 442 in the speech decoding system of the fourth embodiment shown in FIG. 8. That is, the prediction unit 441 consists of a comparison unit 4411 and a pitch pattern codebook 4412, and the prediction unit 442 consists of a comparison unit 4421 and a waveform pattern codebook 4422.

【００６４】Next, the operation of this speech decoding system will be described. The bit allocation unit 410 obtains, from the bit allocation information input to the input terminal 213, the numbers of quantization bits allocated to the quantization units 321 and 322 in FIG. 9, and supplies them to the inverse quantization units 421 and 422, respectively.

【００６５】The pitch period index is input to the input terminal 301. Based on this index and the predicted value of the pitch period output from the first prediction unit 441, the inverse quantization unit 421 decodes the pitch period, which is input to the synthesis unit 450 and, at the same time, to the delay unit 431 in preparation for processing the next frame. Because the prediction unit 441, including its codebook, has exactly the same configuration as the prediction unit 341 of the speech encoding system of FIG. 9, it can obtain the same output as the prediction unit 341 without any side information.

【００６６】The pitch waveform index is input to the input terminal 215. Based on this index and the predicted value of the pitch waveform output from the second prediction unit 442, the inverse quantization unit 422 decodes the pitch waveform, which is input to the synthesis unit 450 and, at the same time, to the delay unit 432 in preparation for processing the next frame.

【００６７】The synthesis unit 450 generates the decoded speech by joining the pitch waveforms at the pitch period using a suitable means.

(Sixth Embodiment) <Encoding side> FIG. 11 shows a speech encoding system according to a sixth embodiment of the present invention. In this speech encoding system, the bit allocation unit 360 of the speech encoding system of the fourth embodiment shown in FIG. 7 is removed and replaced with a coding parameter selection control unit 370 and a changeover switch 380. The coding parameter selection control unit 370 decides, according to the evaluation result of the evaluation unit 350, which of the quantized pitch period and the quantized pitch waveform from the first and second quantization units 321 and 322 to select as the coding parameter. The changeover switch 380, controlled by the coding parameter selection control unit 370, extracts either the pitch period quantization index or the pitch waveform quantization index from the first and second quantization units 321 and 322 as the coding parameter 1015 (pitch period / pitch waveform quantization index). The coding parameter selection control unit 370 outputs coding parameter selection information 1014 indicating which of the pitch period and pitch waveform quantization indices was selected as the coding parameter.

【００６８】For each frame, either the pitch period or the pitch waveform is selected, and all of the available coding bits are assigned to the selected one for encoding. For the one that receives no coding bits, the predicted value is used as-is.

<Decoding side> FIG. 12 shows a speech decoding system corresponding to the speech encoding system of FIG. 11. In this speech decoding system, the coding parameter (pitch period / pitch waveform quantization index) input to the input terminal 215 is routed to one of the inverse quantization units 421 and 422 according to the coding parameter selection information 1014 input to the input terminal 214. That is, a pitch period quantization index is input to the inverse quantization unit 421, and a pitch waveform quantization index is input to the inverse quantization unit 422. The subsequent operation is the same as in FIG. 8.

【００６９】Thus, according to the present embodiment, the predicted value is used for the pitch waveform when generating decoded speech in frames where the coded data of the pitch period was transmitted, and the predicted value is used for the pitch period in frames where the coded data of the pitch waveform was transmitted. The decoded speech therefore matches the original speech less closely than in the fourth embodiment, but coding at a lower rate becomes possible.

【００７０】(Seventh Embodiment) FIG. 13 shows a speech encoding system according to a seventh embodiment of the present invention. The present embodiment combines the first and fourth embodiments and, in addition, provides a pitch shape analysis unit 511 and a gain analysis unit 512 in the analysis unit 510 so that the pitch waveform is analyzed by decomposing it into a shape and a gain. A total of four kinds of parameters are thus extracted from the input speech signal: the pitch period, the pitch shape, the gain, and the LPC coefficients. The subsequent processing is the same as in the preceding embodiments: more bits are allocated to parameters whose predicted values are more likely to sound unnatural.
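With four parameters, the bit split can be generalized as in the Python sketch below. The proportional rule with largest-remainder rounding, the parameter names used as dictionary keys, and the function name are assumptions for illustration; the specification only states that parameters more likely to sound unnatural receive more bits.

```python
from typing import Dict


def allocate_bits_multi(total_bits: int,
                        impacts: Dict[str, float]) -> Dict[str, int]:
    """Distribute total_bits over several parameters in proportion to their
    impact, using largest-remainder rounding so the total is preserved."""
    total = sum(impacts.values())
    if total <= 0:
        # No parameter stands out: treat all impacts as equal.
        impacts = {name: 1.0 for name in impacts}
        total = float(len(impacts))
    shares = {name: total_bits * value / total
              for name, value in impacts.items()}
    bits = {name: int(share) for name, share in shares.items()}
    leftover = total_bits - sum(bits.values())
    by_remainder = sorted(shares, key=lambda name: shares[name] - bits[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        bits[name] += 1
    return bits
```

For instance, impacts of 4, 2, 1, and 1 for the pitch period, pitch shape, gain, and LPC coefficients split 16 bits as 8, 4, 2, and 2.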

【００７１】Classifying the parameters of the speech signal more finely in this way allows bits to be concentrated on the parameters that need them, further improving coding efficiency. A speech decoding system corresponding to the speech encoding system of FIG. 13 is not illustrated, but its configuration is clear from the preceding embodiments, so a detailed description is omitted.

【００７２】[0072]

【発明の効果】[Effects of the Invention] As described above, the present invention makes it possible to realize low-rate speech encoding/decoding that yields decoded speech that sounds natural to the ear, at the cost of some loss of agreement with the original speech.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram showing the configuration of a speech encoding system according to a first embodiment of the present invention.

【図２】FIG. 2 is a block diagram showing the configuration of a speech decoding system according to the first embodiment of the present invention.

【図３】FIG. 3 is a block diagram showing the configuration of a speech encoding system according to a second embodiment of the present invention.

【図４】FIG. 4 is a block diagram showing the configuration of a speech decoding system according to the second embodiment of the present invention.

【図５】FIG. 5 is a block diagram showing the configuration of a speech encoding system according to a third embodiment of the present invention.

【図６】FIG. 6 is a block diagram showing the configuration of a speech decoding system according to the third embodiment of the present invention.

【図７】FIG. 7 is a block diagram showing the configuration of a speech encoding system according to a fourth embodiment of the present invention.

【図８】FIG. 8 is a block diagram showing the configuration of a speech decoding system according to the fourth embodiment of the present invention.

【図９】FIG. 9 is a block diagram showing the configuration of a speech encoding system according to a fifth embodiment of the present invention.

【図１０】FIG. 10 is a block diagram showing the configuration of a speech decoding system according to the fifth embodiment of the present invention.

【図１１】FIG. 11 is a block diagram showing the configuration of a speech encoding system according to a sixth embodiment of the present invention.

【図１２】FIG. 12 is a block diagram showing the configuration of a speech decoding system according to the sixth embodiment of the present invention.

【図１３】FIG. 13 is a block diagram showing the configuration of a speech encoding system according to a seventh embodiment of the present invention.

[Explanation of symbols]

100 … speech signal input terminal
110, 310, 510 … analysis unit
111 … LPC analysis unit
112 … drive signal analysis unit
121, 122, 321, 322, 521, 522 … quantization unit
131, 132, 331, 332, 531, 532 … delay unit
141, 142, 341, 342, 541, 542 … prediction unit
150, 350, 550 … evaluation unit
160, 360, 560 … bit allocation unit
170, 370 … coding parameter selection control unit
180, 380 … changeover switch
190 … analysis unit
191 … pitch period analysis unit
192 … pitch waveform analysis unit
210, 410 … bit allocation unit
221, 222, 421, 422 … inverse quantization unit
231, 232, 431, 432 … delay unit
241, 242, 441, 442 … prediction unit
250, 450 … synthesis unit
260, 460 … changeover switch
311 … pitch period analysis unit
312 … pitch waveform analysis unit
313 … gain analysis unit
320 … LPC quantization unit
511 … pitch shape analysis unit
512 … gain analysis unit
1001 … LPC coefficient quantization index
1002 … drive signal quantization index
1003 … bit allocation information
1004 … coding parameter selection information
1005 … LPC coefficient / drive signal quantization index
1011 … pitch period quantization index
1012 … pitch waveform quantization index
1013, 1018 … bit allocation information
1014 … coding parameter selection information
1015 … pitch period / pitch waveform quantization index
1016 … pitch shape quantization index
1017 … gain quantization index
1020 … decoded speech signal
3411, 3421 … pitch pattern codebook
3412, 3422 … waveform pattern codebook
3511, 3512 … error integration unit
4411, 4421 … pitch pattern codebook
4412, 4422 … waveform pattern codebook

Claims

[Claims]

1. A speech encoding method for encoding a plurality of parameters obtained by analyzing input speech, the method comprising: obtaining a predicted value of each parameter from past encoded data; comparing the analysis value and the predicted value of each parameter to evaluate the degree of influence on auditory naturalness that results when the predicted value is used to generate decoded speech; allocating, based on the evaluation result, more coded bits to a parameter having a greater degree of influence; and transmitting the encoded data of each parameter together with bit allocation information indicating the allocation of coded bits to each parameter.

2. A speech decoding method comprising: receiving encoded data of a plurality of parameters obtained by analyzing input speech, together with bit allocation information indicating an allocation of coded bits that has been determined, based on an evaluation of the degree of influence on auditory naturalness when decoded speech is generated using predicted values of the parameters, so that more bits are allocated to a parameter having a greater degree of influence; and decoding the encoded data in accordance with the bit allocation information to generate decoded speech.

3. A speech encoding method for encoding a plurality of parameters obtained by analyzing input speech, the method comprising: obtaining a predicted value of each parameter from past encoded data; comparing the analysis value and the predicted value of each parameter to evaluate the degree of influence on auditory naturalness that results when the predicted value is used to generate decoded speech; selecting, based on the evaluation result, only the parameter having a greater degree of influence and transmitting encoded data of the analysis value of that parameter; and transmitting selection information indicating the type of the selected parameter.

4. A speech decoding method comprising: receiving encoded data of the analysis value of a parameter that has been selected, from among a plurality of parameters obtained by analyzing input speech, as having a greater degree of influence, based on an evaluation of the degree of influence on auditory naturalness when decoded speech is generated using predicted values of the parameters, together with selection information indicating the type of the selected parameter; and generating decoded speech by, based on the selection information, decoding the encoded data of the analysis value for the selected parameter and using the predicted value as it is for a parameter not selected.
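
The two schemes recited in the claims can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the parameter keys "lpc" and "drive", and in particular the mean-squared-difference measure standing in for the evaluation of auditory naturalness are all assumptions made for illustration, since the claims do not specify how the degree of influence is computed.

```python
def influence(analysis, predicted):
    """Hypothetical stand-in for the perceptual evaluation: mean squared
    difference between a parameter's analysis values and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(analysis, predicted)) / len(analysis)


def allocate_bits(influences, total_bits):
    """Claims 1 and 2 (sketch): share total_bits among the parameters in
    proportion to their influence, so larger influence gets more bits."""
    names = list(influences)
    total = sum(influences.values())
    if total == 0:
        # Every prediction matches its analysis value: split evenly.
        shares = {n: 1.0 / len(names) for n in names}
    else:
        shares = {n: influences[n] / total for n in names}
    alloc = {n: int(total_bits * shares[n]) for n in names}
    # Bits lost to rounding down go to the most influential parameters.
    leftover = total_bits - sum(alloc.values())
    ranked = sorted(names, key=lambda n: influences[n], reverse=True)
    for n in ranked[:leftover]:
        alloc[n] += 1
    return alloc


def select_parameter(influences):
    """Claim 3 (sketch): pick the single parameter with the largest influence;
    only its analysis value would be encoded and transmitted."""
    return max(influences, key=influences.get)


def decode_frame(selected, decoded_value, predictions):
    """Claim 4 (sketch): use the decoded analysis value for the selected
    parameter and the predicted values, unchanged, for all the others."""
    params = dict(predictions)
    params[selected] = decoded_value
    return params


# Illustrative frame: the LPC prediction is exact, the drive prediction is not.
analysis = {"lpc": [1.0, 2.0], "drive": [1.0, 0.0]}
predicted = {"lpc": [1.0, 2.0], "drive": [0.0, 0.0]}
scores = {name: influence(analysis[name], predicted[name]) for name in analysis}

bit_allocation = allocate_bits(scores, total_bits=20)
chosen = select_parameter(scores)
frame = decode_frame(chosen, analysis[chosen], predicted)
```

In this illustrative frame all 20 bits go to the drive signal, because the LPC prediction already matches the analysis value. Under the selection scheme only the drive signal is transmitted, and the decoder reuses the predicted LPC coefficients.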