JPH1020895A

JPH1020895A - Speech encoding device and recording medium

Info

Publication number: JPH1020895A
Application number: JP8171485A
Authority: JP
Inventors: Takuya Kawashima (河嶋拓也)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1996-07-01
Filing date: 1996-07-01
Publication date: 1998-01-23

Abstract

PROBLEM TO BE SOLVED: To reduce the increase in processing volume that occurs when a delayed decision method is applied at all times, while obtaining encoded speech with no audible degradation. SOLUTION: A square error minimizing control means 109 outputs evaluation information for selecting, from each of the codebooks 105, 106, and 107, a candidate sequence that minimizes the error between the synthesized speech from an auditory weighting synthesis filter 108 and the auditorily weighted input speech. On the basis of this evaluation information, together with the speech signal level and the voiced and unvoiced information from a power analyzer 104, a delayed decision control means 110 calculates the degree of necessity of the subsequent delayed decision and controls whether or not to execute it.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech encoding device used in mobile communication, which divides an input speech signal into frames of a fixed length, separates it into vocal tract information and sound source information, and represents the vocal tract information by linear prediction parameters and the sound source information by several kinds of codebooks.

[0002]

[Prior art] A conventional speech encoding device of this type uses the CELP (Code Excited Linear Prediction) scheme, which encodes at bit rates of about 4 to 8 kbps. In the CELP scheme, the input speech signal is divided into speech frames of a predetermined length, each speech frame is analyzed by a linear prediction analyzer to calculate linear prediction coefficients, and a synthesized speech signal is obtained by exciting an auditory weighting synthesis filter, which is built from these linear prediction coefficients, with a sound source signal selected from codebooks. Two codebooks are used: an adaptive codebook, which stores past sound source signals and cuts out a segment according to the pitch period of the input signal, and a noise codebook, which stores sound source signals created in advance by training and from which the most suitable one is taken out as the sound source signal. The linear sum of the outputs of these codebooks is input to the auditory weighting synthesis filter as the sound source signal to obtain the encoded speech.
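As a rough illustration of the CELP structure described above, the following Python sketch builds a sound source signal as the linear sum of an adaptive codebook vector and a noise codebook vector and passes it through an all-pole synthesis filter. The function names, the periodic repetition used when the pitch lag is shorter than the subframe, and the omission of the auditory weighting are assumptions made for this sketch; the patent does not specify them.

```python
def adaptive_codebook_vector(past_excitation, pitch_lag, subframe_len):
    """Cut a segment out of the stored past sound source signal.

    The segment starts pitch_lag samples before the end of the buffer.
    Assumption: if pitch_lag is shorter than the subframe, the segment
    is repeated periodically.
    """
    if not 0 < pitch_lag <= len(past_excitation):
        raise ValueError("pitch_lag must be in 1..len(past_excitation)")
    start = len(past_excitation) - pitch_lag
    return [past_excitation[start + (n % pitch_lag)] for n in range(subframe_len)]


def mix_excitation(adaptive_vec, noise_vec, gain_adaptive, gain_noise):
    """Linear sum of the two codebook contributions."""
    return [gain_adaptive * a + gain_noise * c
            for a, c in zip(adaptive_vec, noise_vec)]


def synthesize(excitation, lpc):
    """All-pole synthesis: y[n] = x[n] - sum_k lpc[k-1] * y[n-k].

    Assumption: zero initial filter state and no auditory weighting.
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    past = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    adaptive = adaptive_codebook_vector(past, pitch_lag=4, subframe_len=6)
    noise = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    print(synthesize(mix_excitation(adaptive, noise, 0.5, 2.0), [-0.5]))
```

A real encoder would keep filter state across subframes and apply the weighting filter; both are left out here to keep the structure visible.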

[0003] This type of speech encoding device is described below with reference to FIG. 3. First, an adaptive codebook candidate is selected from the past sound source signals stored in the adaptive codebook 305, according to the pitch candidates calculated by the pitch analyzer 301. A driving sound source candidate is generated as the linear sum of the selected adaptive codebook candidate and a noise codebook candidate from the noise codebook 306, and the auditory weighting synthesis filter 308 produces synthesized speech from this driving sound source candidate and the linear prediction coefficients calculated from the input signal by the linear prediction analyzer 302. The square error minimizing means 309 selects the codebook candidate sequence so that the error between this synthesized speech and the input speech passed through the auditory weighting filter 303 is minimized. However, calculating the error between the auditorily weighted input speech and the auditorily weighted synthesized speech for every combination of codebook entries requires an enormous amount of computation, so in practice the best entry is determined sequentially, one codebook at a time. In the configuration above, the candidate of the adaptive codebook 305 is determined in the first stage; the candidate of the noise codebook 306 that forms the best combination with that candidate is selected in the second stage; and the gain that minimizes the error is determined from the gain codebook 307 in the final stage. This fixes the codebook candidate sequence. The indices from these codebooks, the linear prediction coefficients, and the input speech power from the power analyzer 304 are then combined by the multiplexer 310 and output as the encoded speech.
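The three-stage sequential search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the weighting synthesis filter is modelled as a matrix `H`, each stage scores an entry with its own optimal gain, the first-stage contribution is subtracted from the target before the second stage, and the gain codebook holds hypothetical `(adaptive_gain, noise_gain)` pairs. All names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, x):
    return [dot(row, x) for row in H]


def best_index(target, filtered):
    """Index of the filtered vector y minimizing min_g |target - g*y|^2.

    With the optimal gain, that error equals |t|^2 - (t.y)^2 / (y.y).
    """
    def err(y):
        energy = dot(y, y)
        match = dot(target, y) ** 2 / energy if energy > 0 else 0.0
        return dot(target, target) - match
    return min(range(len(filtered)), key=lambda j: err(filtered[j]))


def sequential_search(t, H, adaptive_cb, noise_cb, gain_cb):
    # Stage 1: fix the adaptive codebook entry.
    fa = [matvec(H, v) for v in adaptive_cb]
    ia = best_index(t, fa)
    ya = fa[ia]
    energy = dot(ya, ya)
    g = dot(t, ya) / energy if energy > 0 else 0.0

    # Stage 2: search the noise codebook against the residual target.
    residual = [ti - g * yi for ti, yi in zip(t, ya)]
    fn = [matvec(H, v) for v in noise_cb]
    i_noise = best_index(residual, fn)
    yn = fn[i_noise]

    # Stage 3: pick the gain pair that minimizes the full error.
    def full_err(gains):
        ga, gn = gains
        return sum((ti - ga * a - gn * n) ** 2 for ti, a, n in zip(t, ya, yn))

    ig = min(range(len(gain_cb)), key=lambda k: full_err(gain_cb[k]))
    return ia, i_noise, ig, full_err(gain_cb[ig])


if __name__ == "__main__":
    H = [[1.0, 0.0], [0.0, 1.0]]
    cb = [[1.0, 0.0], [0.0, 1.0]]
    gains = [(1.0, 1.0), (2.0, 1.0), (2.0, 0.0)]
    print(sequential_search([2.0, 1.0], H, cb, cb, gains))
```

Each stage commits to one entry before the next stage runs, which keeps the cost near the sum of the codebook sizes instead of their product, at the price of possibly missing the jointly best combination.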

[0004]

[Problem to be solved by the invention] However, such a sequential selection method is not guaranteed to find the optimal combination of codebook entries. For this reason the delayed decision method is applied, as in the PSI-CELP scheme (RCR-27D), the PDC half-rate speech coding scheme: two candidates are kept at the first stage, the adaptive codebook; for each of them, the noise codebook and gain codebook candidates are selected so as to minimize the error; and finally the set of adaptive codebook, noise codebook, and gain codebook candidates with the smallest error is selected. The delayed decision method thus selects, at the final step, the candidate sequence whose combination has the minimum error. If it is applied at all times, however, merely keeping two candidates in the adaptive codebook, whose candidate is normally fixed at the first stage, greatly increases the amount of processing.

[0005] The present invention solves the above conventional problem. Its object is to provide a speech encoding device that decides, for each subframe, whether or not to execute the delayed decision by using the speech level of the speech signal, the evaluation values obtained when code vectors are determined, and the like, and that thereby effectively reduces the amount of processing compared with always performing the delayed decision while still obtaining encoded speech with no audible degradation.

[0006]

[Means for solving the problem] To achieve the above object, the present invention adds to a conventional CELP encoding device a delayed decision control means that calculates the degree of necessity of the subsequent delayed decision from the speech level of the speech signal and from the evaluation values obtained when sound source codebook candidates are determined, and controls whether or not to execute it. This effectively reduces the amount of processing and still yields encoded speech with no audible degradation, even compared with always performing the delayed decision.

[0007]

[Embodiments of the invention] The invention according to claim 1 of the present invention is a speech encoding device that divides an input speech signal into frames of a fixed length, separates it into vocal tract information and sound source information, and represents the vocal tract information by linear prediction parameters and the sound source information by several kinds of codebooks, the device comprising: a pitch analyzer that calculates pitch candidates from the input speech signal; a linear prediction analyzer that calculates linear prediction coefficients from the input speech signal; an auditory weighting filter that applies auditory weighting to the input speech signal; a power analyzer that obtains the speech level of the input speech signal; an adaptive codebook that stores past sound source signals and selects adaptive codebook candidates according to the pitch candidates from the pitch analyzer; a noise codebook that stores sound source signals created in advance by training; a gain codebook that selects the gains of the adaptive codebook candidate and the noise codebook candidate; an auditory weighting synthesis filter that generates synthesized speech from the linear prediction coefficients from the linear prediction analyzer and the linear sum of the codebook candidates; a square error minimizing control means that outputs evaluation information for selecting, from each codebook, the candidate sequence that minimizes the error between the synthesized speech from the auditory weighting synthesis filter and the auditorily weighted input speech from the auditory weighting filter; and a delayed decision control means that selects codebook candidates from the adaptive codebook and the noise codebook on the basis of the output evaluation information and that, from the evaluation information and from the speech signal level and sound/silence information from the power analyzer, calculates the degree of necessity of the subsequent delayed decision and controls whether or not to execute it. Compared with always performing the delayed decision, this device reduces the amount of processing while obtaining synthesized speech with no degradation in sound quality.

[0008] The invention according to claim 2 is a storage medium storing a program for implementing the speech encoding device according to claim 1 in software using a signal processor. For example, by storing the program in a ROM, a magnetic disk, or the like, the speech encoding device of the present invention can be implemented in software on a general-purpose signal processing device such as a personal computer.

[0009] An embodiment of the present invention is described below with reference to FIG. 1 and FIG. 2.

(Embodiment) FIG. 1 shows the configuration of a speech encoding device according to the embodiment of the present invention. It adds to a conventional CELP encoding device a configuration that controls the delayed decision according to the input speech level and the evaluation values obtained when candidates are selected in each codebook. In FIG. 1, 101 is a pitch analyzer that calculates pitch candidates from the input speech; 102 is a linear prediction analyzer that calculates linear prediction coefficients from the input speech; 103 is an auditory weighting filter that applies auditory weighting to the input speech; 104 is a power analyzer that obtains the speech level of the input speech; 105 is an adaptive codebook that stores past sound source signals and selects adaptive codebook candidates according to the pitch candidates from the pitch analyzer 101; 106 is a noise codebook that stores sound source signals created in advance by training; 107 is a gain codebook that selects the gains of the adaptive codebook candidate and the noise codebook candidate; 108 is an auditory weighting synthesis filter that generates synthesized speech from the linear prediction coefficients from the linear prediction analyzer 102 and the linear sum of the codebook candidates; 109 is a square error minimizing control means that outputs evaluation information for selecting, from the codebooks 105, 106, and 107, the candidate sequence that minimizes the error between the synthesized speech from the auditory weighting synthesis filter 108 and the auditorily weighted input speech from the auditory weighting filter 103; 110 is a delayed decision control means that selects codebook candidates from the adaptive codebook 105 and the noise codebook 106 on the basis of the output evaluation information and that, from the evaluation information and from the speech signal level and sound/silence information from the power analyzer 104, calculates the degree of necessity of the subsequent delayed decision and controls whether or not to execute it; and 111 is a multiplexer that combines the codebook candidates selected from the adaptive codebook 105, the noise codebook 106, and the gain codebook 107, the linear prediction coefficients from the linear prediction analyzer 102, and the speech signal level information from the power analyzer 104, and outputs the encoded speech.

[0010] Next, the operation of the speech encoding device configured as above is described with reference to FIG. 2. In FIG. 1, the linear prediction analyzer 102 calculates linear prediction coefficients from the input speech, and the auditory weighting synthesis filter 108 is constructed from these coefficients. Each sound source code vector is passed through the auditory weighting synthesis filter 108 to produce synthesized speech, and the square error minimizing control means 109 selects the code vector for which the difference between this synthesized speech and the target vector, obtained by auditorily weighting the input speech with the auditory weighting filter 103, is smallest. Specifically, the sound source codebook candidate that minimizes the evaluation value of equation (1) is selected.

$$E^2 = \left| t - g_c H c_j \right|^2 \tag{1}$$

Here, $t$ is the target vector obtained by auditorily weighting the input speech with the auditory weighting filter 103, $g_c$ is the gain, $c_j$ is the sound source code vector, $j$ is the sound source code vector index, and $H$ represents the auditory weighting filter. To select code vectors sequentially from multiple codebooks, either the contribution of the code vector selected in the previous stage is subtracted from the target vector $t$, or the code vectors of the codebook being searched are orthogonalized against the code vector selected in the previous stage before selection. In PSI-CELP, when adaptive codebook candidates are selected, the two with the smallest error $E^2$ are always kept; for each of them, noise codebook and gain codebook candidates are selected; and finally the single candidate sequence with the smallest $E^2$ is selected.
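To make equation (1) concrete, the sketch below evaluates E² for every entry of a codebook and keeps the entries with the smallest values, as PSI-CELP is described as doing for the adaptive codebook. It is illustrative only: H is modelled as a plain matrix, the gain is taken as the unquantized least-squares optimum for each entry, and all function names are hypothetical.

```python
def filtered(H, c):
    """Apply the filter matrix H to code vector c."""
    return [sum(h * x for h, x in zip(row, c)) for row in H]


def optimal_gain(t, H, c):
    """Least-squares gain g = (t . Hc) / (Hc . Hc); 0 if Hc has no energy."""
    y = filtered(H, c)
    energy = sum(v * v for v in y)
    return sum(ti * v for ti, v in zip(t, y)) / energy if energy > 0 else 0.0


def evaluation_value(t, H, c, g):
    """E^2 = |t - g * H * c|^2 from equation (1)."""
    y = filtered(H, c)
    return sum((ti - g * v) ** 2 for ti, v in zip(t, y))


def rank_candidates(t, H, codebook, keep=2):
    """Return the `keep` best (E^2, index) pairs, smallest E^2 first."""
    scored = []
    for j, c in enumerate(codebook):
        g = optimal_gain(t, H, c)
        scored.append((evaluation_value(t, H, c, g), j))
    scored.sort()
    return scored[:keep]


if __name__ == "__main__":
    H = [[1.0, 0.0], [0.0, 1.0]]
    codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(rank_candidates([2.0, 1.0], H, codebook))
```

Returning the E² values alongside the indices matters for what follows in the description, because the gap between the first and second candidate is what the delayed decision control inspects.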

[0011] Next, the control procedure in the delayed decision control means 110 is described with reference to FIG. 2. For simplicity, the sound source codebook is assumed to consist of the adaptive codebook 105, the noise codebook 106, and the gain codebook 107, with candidates fixed in this order. At most two candidates of the adaptive codebook 105 are kept, and the noise codebook 106 and gain codebook 107 entries are uniquely determined for each adaptive codebook candidate. First, the speech level is judged in step 201. In general, when the speech level is low the effect on perceived quality is small, so there is no need to insist on high sound quality. Accordingly, with a threshold ε_s, if the speech level is greater than ε_s the procedure goes to step 204, in which two adaptive codebook candidates are kept; otherwise it goes to step 202, in which only one adaptive codebook candidate is kept. When step 202 is taken, one noise codebook candidate is selected in step 203 and one gain codebook candidate in step 208, which determines the sound source codebook candidate sequence.

[0012] Next, the case in which step 204 is taken is described. In step 205, the selection evaluation values calculated when the two remaining adaptive codebook candidates were selected are compared. Here the selection evaluation value is E² of equation (1), and candidates are ranked first, second, and so on in ascending order of this value. If the selection evaluation values of the two candidates differ by no more than a certain amount, it cannot be judged which candidate will finally be selected, so both candidates are kept in step 206. If, on the other hand, the difference between the two selection evaluation values is large, the first candidate is considered to fit sufficiently well, and only the first adaptive codebook candidate is kept in step 207. In the case of step 207, the noise codebook candidate and the gain codebook candidate are uniquely determined in the same way as for step 202. In the case of step 206, a noise codebook candidate and a gain codebook candidate are uniquely determined for each adaptive codebook candidate, and the candidate sequence with the smaller error is finally selected. In this way, the degree of necessity of the subsequent delayed decision is calculated from the speech level of the speech signal and the evaluation values obtained when sound source codebook candidates are determined, and whether or not to execute it is controlled.
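The control flow of steps 201 to 208 can be summarized in a short Python sketch. Several details are assumptions made for illustration: the "certain difference" between evaluation values is taken as an absolute gap compared with a hypothetical `gap_threshold`, the remaining noise and gain codebook search is abstracted as a caller-supplied function `search_rest`, and `ranked` is a list of `(E2, adaptive_index)` pairs sorted by ascending E². None of these names appear in the patent.

```python
def delayed_decision(speech_level, ranked, search_rest,
                     level_threshold, gap_threshold):
    """Decide how many adaptive codebook candidates to carry forward.

    ranked      : list of (E2, adaptive_index), smallest E2 first
    search_rest : function(adaptive_index) -> (noise_index, gain_index, error)
    Returns ((adaptive_index, noise_index, gain_index, error), kept_count).
    """
    keep = 1  # steps 202 and 207: carry only the first candidate
    # Step 201: low-level speech does not justify the extra search.
    if speech_level > level_threshold and len(ranked) >= 2:
        # Step 205: compare the two selection evaluation values.
        if ranked[1][0] - ranked[0][0] <= gap_threshold:
            keep = 2  # step 206: too close to call, keep both

    best = None
    for _, adaptive_index in ranked[:keep]:
        noise_index, gain_index, err = search_rest(adaptive_index)
        if best is None or err < best[3]:
            best = (adaptive_index, noise_index, gain_index, err)
    return best, keep


if __name__ == "__main__":
    # Hypothetical downstream search: candidate 1 ends up better overall.
    final_error = {0: 0.8, 1: 0.3}
    rest = lambda ia: (10 + ia, 20 + ia, final_error[ia])
    close = [(1.0, 0), (1.1, 1)]
    print(delayed_decision(1.0, close, rest, 0.5, 0.2))
    print(delayed_decision(0.1, close, rest, 0.5, 0.2))
```

In the first call the two candidates are close, so both are searched and the second one wins; in the second call the low speech level skips the extra search and the first candidate is used as is.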

[0013] The above description used the speech level and the adaptive codebook candidate selection evaluation value as the criteria for selecting the delayed decision. The same effect can be expected when other speech parameters are used, or when a neural network is trained in advance on speech patterns, delayed decision execution rates, and the like and then used for the control.

[0014]

[Effect of the invention] As described above, by providing a conventional CELP encoding device with a delayed decision control means, the present invention improves speech quality without significantly increasing the amount of processing.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech encoding device according to the embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of the delayed decision control means in the embodiment of the present invention.

FIG. 3 is a block diagram of a conventional CELP speech encoding device.

[Explanation of symbols]

101 Pitch analyzer
102 Linear prediction analyzer
103 Auditory weighting filter
104 Power analyzer
105 Adaptive codebook
106 Noise codebook
107 Gain codebook
108 Auditory weighting synthesis filter
109 Square error minimizing control means
110 Delayed decision control means
111 Multiplexer

Claims

[Claims]

1. A speech encoding device that divides an input speech signal into frames of a fixed length, separates it into vocal tract information and sound source information, and represents the vocal tract information by linear prediction parameters and the sound source information by several kinds of codebooks, the device comprising: a pitch analyzer that calculates pitch candidates from the input speech signal; a linear prediction analyzer that calculates linear prediction coefficients from the input speech signal; an auditory weighting filter that applies auditory weighting to the input speech signal; a power analyzer that obtains the speech level of the input speech signal; an adaptive codebook that stores past sound source signals and selects adaptive codebook candidates according to the pitch candidates from the pitch analyzer; a noise codebook that stores sound source signals created in advance by training; a gain codebook that selects the gains of the adaptive codebook candidate and the noise codebook candidate; an auditory weighting synthesis filter that generates synthesized speech from the linear prediction coefficients from the linear prediction analyzer and the linear sum of the codebook candidates; a square error minimizing control means that outputs evaluation information for selecting, from each codebook, the candidate sequence that minimizes the error between the synthesized speech from the auditory weighting synthesis filter and the auditorily weighted input speech from the auditory weighting filter; and a delayed decision control means that selects codebook candidates from the adaptive codebook and the noise codebook on the basis of the output evaluation information and that, from the evaluation information and from the speech signal level and sound/silence information from the power analyzer, calculates the degree of necessity of the subsequent delayed decision and controls whether or not to execute it.

2. A storage medium storing a program for realizing the speech encoding device according to claim 1 by software using a signal processor.