JPH06195098A - Speech encoding method

Info

Publication number: JPH06195098A
Application number: JP4345902A
Authority: JP
Inventors: Hidetoshi Sekine; 英敏関根; Yoshiaki Asakawa; 吉章淺川; 卓 ▲高▼島; Taku Takashima; Atsuyoshi Ishikawa; 敦義石川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-12-25
Filing date: 1992-12-25
Publication date: 1994-07-15
Anticipated expiration: 2016-11-26
Also published as: JP3232728B2

Abstract

PURPOSE: To provide a speech coding method that yields high-quality synthesized speech even at low bit rates of 4 kbps or less. CONSTITUTION: The input speech is analyzed by an acoustic classification part 31. According to the result of that analysis, a search codebook selection part 34 selects the codebooks to be searched from a pulse excitation (pulse sound source) codebook, made up of a pulse information codebook 33, a pulse generation part 34, and a pulse excitation search part 32, and a noise excitation codebook, made up of a noise information codebook and a noise excitation search part 37. An excitation selection part 35 then selects the excitation to be used according to the search results, so that the input speech is encoded in a manner suited to its acoustic features. For the pulse excitation, the amount of computation per search is reduced so that all possible combinations of pulse excitations can be searched to select the optimum excitation pulses. As a result, the reproduction of the periodic component of speech and the adaptation to acoustic features are improved, and high-quality speech can be obtained with a small amount of processing even at a low bit rate.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech coding method suitable for obtaining high-quality synthesized speech at a low bit rate, and in particular to a speech coding method that can be applied at bit rates of 4 kbps or less with a relatively small amount of processing.

[0002]

[Prior Art] Speech coding schemes that adopt the "analysis-by-synthesis" technique, in which the weighted error between the synthesized speech and the original speech is evaluated and the coding parameters are determined so as to minimize that error, have recently been proposed and have succeeded in obtaining relatively good speech quality even at low bit rates. A representative example is code-excited linear prediction (CELP) coding (for example, M. R. Schroeder and B. S. Atal: "Code-excited linear prediction (CELP)", Proc. ICASSP 85 (1985.3)), which achieves practical speech quality at 4.8 kbps. Many improvements on CELP have also been proposed. For example, vector sum excited linear prediction (VSELP) coding (for example, I. A. Gerson and M. A. Jasiuk: "Vector sum excited linear prediction (VSELP) speech coding at 8 kbps", Proc. ICASSP 90 (1990.4)) is superior in terms of processing load, memory requirements, and robustness to bit errors.

[0003] Meanwhile, the digitization of mobile radio communication is now under way in earnest, and from the standpoint of efficient use of the frequency spectrum there is demand for speech coding schemes with even lower bit rates (4 kbps or less). Simply lowering the bit rate of CELP or VSELP causes large quality degradation, so this approach has its limits. The reason is that the accuracy of long-term prediction by the adaptive codebook search falls and the periodic component is reproduced less faithfully, so the decoded speech sounds noisier. Schemes have therefore been proposed that introduce a pulse excitation (pulse sound source) in addition to the conventional stochastic excitation (noise-like excitation) to improve the reproduction of periodicity.

[0004] Examples of such schemes include the "SPE-CELP" scheme, which uses a single pulse with controlled phase and amplitude for voiced speech and CELP for unvoiced speech (W. Granzow and B. S. Atal: "High-quality digital speech at 4 kb/s", Proc. GLOBECOM 90 (1990.12)), and the "pulse/noise selective CELP" scheme, which switches between periodic pulses and noise (Yoshida and two others: "Application of pulse excitation search to low bit rate CELP coding", IEICE Technical Report SP91-68 (1991.10); or Tanaka and Itakura: "Improvement of speech quality by introducing a pulse excitation in CELP speech coding", IEICE Technical Report EA92-24 (1992.5)).

[0005]

[Problems to Be Solved by the Invention] Compared with conventional schemes, the speech coding schemes using a pulse excitation described above reproduce the periodic component better even at reduced bit rates, but they have the following problems.

[0006] Because the "SPE-CELP" scheme uses only one pulse per pitch period, the position and amplitude of that pulse have an extremely large effect on speech quality. The procedure for determining the pulse position is quite complex, and its robustness to the input speech signal is a problem. It has also been reported that the coded speech can sound buzzy.

[0007] The "pulse/noise selective CELP" scheme, on the other hand, either evaluates the error obtained when the pulse excitation and the noise excitation are each used alone and selects the excitation with the smaller error, or selects the excitation on the basis of a voiced/unvoiced decision on the input speech. Because these methods also use long-term prediction (adaptive codebook search), the pulse excitation mainly serves to complement the long-term prediction vector. In the references above, however, the pulse interval is restricted to the long-term prediction lag or the pitch period, so sufficient speech quality is not obtained.

[0008] In addition, both the "SPE-CELP" scheme and the "pulse/noise selective CELP" scheme switch between a pulse excitation and a noise excitation, so the coded speech exhibits changes in timbre (a sense of discontinuity) caused by the excitation switching.

[0009] Furthermore, in a scheme that uses only pulse components as the excitation, it is difficult to approximate the residual waveform with a pulse excitation in segments where the residual is noise-like, such as fricatives, and the degradation of the decoded speech is conspicuous. Using a pulse excitation alone is therefore problematic.

[0010] The first object of the present invention is to provide a coding scheme whose speech quality degrades little even when the bit rate is lowered. The second object is to achieve the first object with a relatively small amount of processing.

[0011]

[Means for Solving the Problems] To achieve the above objects, in the present invention the speech input to the speech encoder is first divided into frames and subframes. The short-term prediction analysis unit extracts and quantizes spectral parameters (short-term prediction coefficients) for each frame. Next, in preparation for evaluating the perceptually weighted error, the input speech is perceptually weighted. A zero signal is also fed to the weighted synthesis filter to obtain its zero-input response, which is subtracted from the weighted input signal. This removes the influence of the past that depends on the internal state of the synthesis filter. The impulse response of the weighted synthesis filter is also computed in advance.

[0012] Next, the long-term prediction analysis unit determines the optimum long-term prediction lag and gain from the adaptive codebook for each subframe. The weighted long-term prediction vector, multiplied by its gain, is subtracted from the signal obtained by subtracting the zero-input response from the weighted input signal, and the resulting signal is input to the search codebook selection unit as the search target.

[0013] The remaining processing consists of the following steps.
(A) The acoustic classification unit analyzes the input speech frame by frame or subframe by subframe, derives acoustic parameters that represent its acoustic features, and outputs the analysis result to the search codebook selection unit and the excitation selection unit.
(B) Based on the information from the acoustic classification unit, the search codebook selection unit selects the codebooks to be searched from among a plurality of codebooks and feeds the search target signal described above to each selected codebook. Several excitation codebooks with different characteristics, such as a pulse excitation and a noise excitation, are provided, and the codebooks appropriate to the acoustic features of the input speech are selected for the search.
(C) In the pulse excitation search, the pulse interval and first-pulse position are first read from the pulse information codebook, and the pulse generation unit generates a pulse train. The pulse interval is not restricted by the long-term prediction search result or the like; all preset values are used for pulse train generation. Each pulse train is treated as an excitation vector and weighted by convolution with the impulse response of the weighted synthesis filter. The weighted error is evaluated for these weighted vectors in turn, and the pulse excitation vector and gain that minimize the error are determined.
(D) In the noise excitation search, noise information is read from the noise information codebook to form an excitation vector, the weighted error of the weighted version of that vector is evaluated, and the noise excitation vector and gain that minimize the error are determined.
(E) The excitation selection unit selects the excitation codebook to be used on the basis of the search results of the codebooks chosen by the search codebook selection unit, the analysis result of the acoustic classification unit, the search result of the long-term predictor, and so on. It outputs the search result of that codebook as the excitation vector, together with a used-excitation index that indicates which codebook is used.
The gain quantization unit then jointly optimizes and quantizes the gains of the long-term prediction vector and the excitation vector.
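The pulse excitation (pulse sound source) search described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function names, the 80-sample subframe, and the interval and position ranges are hypothetical, and the closed-form gain `<t, y> / <y, y>` is the usual least-squares choice rather than something the text spells out.

```python
def pulse_train(first_pos, interval, length):
    """Unit pulses at first_pos, first_pos + interval, ... inside one subframe."""
    train = [0.0] * length
    for pos in range(first_pos, length, interval):
        train[pos] = 1.0
    return train


def weight(vector, h):
    """Convolve with the impulse response h, keeping len(vector) samples."""
    out = [0.0] * len(vector)
    for n, x in enumerate(vector):
        if x != 0.0:
            for k, hk in enumerate(h[: len(vector) - n]):
                out[n + k] += x * hk
    return out


def search_pulse_codebook(target, h, codebook):
    """Full search over (first_pos, interval) entries.

    For each entry the best gain is <t, y> / <y, y>, which leaves the error
    <t, t> - <t, y>**2 / <y, y>. Returns (index, gain, error) of the minimum.
    """
    tt = sum(t * t for t in target)
    best = None
    for index, (first_pos, interval) in enumerate(codebook):
        y = weight(pulse_train(first_pos, interval, len(target)), h)
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue  # empty train, nothing to match
        ty = sum(t * v for t, v in zip(target, y))
        error = tt - ty * ty / yy
        if best is None or error < best[2]:
            best = (index, ty / yy, error)
    return best


# Hypothetical ranges: intervals of 20..40 samples, first pulse at 0..19.
codebook = [(pos, interval) for interval in range(20, 41) for pos in range(20)]
```

Every entry of `codebook` is tried regardless of the long-term prediction lag, which mirrors the statement that all preset pulse intervals are used. The noise excitation search has the same shape, with stored noise vectors in place of generated pulse trains.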

[0014] The quantization codes of the spectral parameters and gains, the long-term prediction lag, the used-excitation index, and the excitation vector index obtained in this way are transmitted to the decoder as transmission parameters.

[0015] In the decoder, the excitation signal is computed from these transmission parameters and fed to a synthesis filter whose coefficients are the short-term prediction coefficients, which yields the decoded speech.

[0016]

[Operation] The acoustic classification unit of (A) analyzes the input speech, derives parameters that represent its acoustic features, and classifies the speech acoustically on that basis. It then outputs the acoustic parameters of the input speech and the classification result to the search codebook selection unit and the excitation selection unit.

[0017] The search codebook selection unit of (B) selects, and thereby restricts, the codebooks to be searched in accordance with the analysis result of the acoustic classification unit of (A). This reduces the amount of computation required for the codebook search, while selecting codebooks suited to the input speech preserves the quality of the synthesized speech.

[0018] By providing a plurality of excitation codebooks with different characteristics, namely the pulse excitation of (C) and the noise excitation of (D), the coder can handle differences in the acoustic features of the input speech, such as stationary and non-stationary segments. This prevents a loss of accuracy in approximating the excitation signal, improves coding efficiency, and improves the quality of the synthesized speech.

[0019] For the pulse excitation of (C), the ranges of the first-pulse position and the pulse interval of the pulse trains to be searched are fixed in advance, and the entire range is searched regardless of the results of long-term prediction or pitch prediction. This improves the accuracy with which the excitation signal is approximated, compensates for the loss of long-term prediction gain and of periodicity reproduction that accompanies lower bit rates, and improves the quality of the synthesized speech. In the pulse excitation search, the amount of computation can also be reduced by truncating the impulse response of the synthesis filter and setting a minimum pulse interval, which eliminates interaction between adjacent pulses. Furthermore, the pulse information codebook used to generate the pulse trains holds only two items of information per entry, the first-pulse position and the pulse interval, rather than the waveform vectors themselves, so the memory required for the codebook can be reduced.
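One way to read the computational saving described here: if the impulse response is truncated to `len(h)` samples and every pulse interval is at least that long, the filtered pulses never overlap, so the correlation and energy terms of a whole pulse train are plain sums of per-position values that can be tabulated once per subframe. The sketch below illustrates that reading. The table names `d` and `e`, the function names, and the error-handling choice are assumptions for illustration, not taken from the patent.

```python
def precompute_tables(target, h):
    """d[p]: correlation of the target with h placed at p.
    e[p]: energy of h placed at p (h is cut off at the subframe end)."""
    n = len(target)
    d, e = [], []
    for p in range(n):
        span = min(len(h), n - p)
        d.append(sum(target[p + k] * h[k] for k in range(span)))
        e.append(sum(h[k] * h[k] for k in range(span)))
    return d, e


def pulse_terms(d, e, first_pos, interval):
    """<t, y> and <y, y> for one pulse train; valid only if interval >= len(h)."""
    positions = range(first_pos, len(d), interval)
    return sum(d[p] for p in positions), sum(e[p] for p in positions)


def fast_pulse_search(target, h, codebook):
    """Return (index, gain) of the entry that maximizes <t, y>**2 / <y, y>,
    which is the same entry that minimizes the weighted squared error."""
    if any(interval < len(h) for _, interval in codebook):
        raise ValueError("pulse interval shorter than the truncated impulse response")
    d, e = precompute_tables(target, h)
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, (first_pos, interval) in enumerate(codebook):
        ty, yy = pulse_terms(d, e, first_pos, interval)
        if yy > 0.0 and ty * ty / yy > best_score:
            best_index, best_gain, best_score = index, ty / yy, ty * ty / yy
    return best_index, best_gain
```

Each codebook entry here is just a `(first_pos, interval)` pair, two small integers instead of a full subframe of waveform samples, which is the memory saving the paragraph refers to.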

[0020] The excitation selection unit of (E) selects the excitation to be used on the basis of the selection result of the search codebook selection unit, the search results of each codebook, the analysis result of the acoustic classification unit, and so on. This makes it possible to select the optimum excitation and to improve the accuracy with which the excitation signal is approximated.

[0021]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the speech encoding unit of the embodiment, and FIG. 2 is a block diagram of the speech decoding unit.

[0022] Because the present invention is based on the code-excited linear prediction (CELP) speech coding scheme, the principle of CELP is explained first, before the specific embodiment is described. FIG. 3 illustrates the principle by which the CELP encoder determines the excitation signal. In the figure, the long-term prediction vector 110 output by the adaptive codebook 108 represents the periodic component of the excitation, and the code vector 111 output by the stochastic codebook 109 represents the non-periodic (random, noise-like) component. These are multiplied by the gains 112 and 113, respectively, and added, and the resulting weighted sum 114 is used as the excitation signal.
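The weighted sum described in this paragraph can be written out in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the periodic repetition used when the lag is shorter than the subframe is a common CELP convention assumed here, not something this paragraph states.

```python
def long_term_prediction_vector(past_excitation, lag, length):
    """Read `length` samples starting `lag` samples back in the excitation history.

    When lag < length the available segment is repeated periodically
    (an assumption; the text does not say how short lags are handled).
    """
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must lie within the stored excitation history")
    segment = past_excitation[len(past_excitation) - lag:]
    return [segment[n % lag] for n in range(length)]


def build_excitation(ltp_vector, code_vector, gain_ltp, gain_code):
    """Excitation = gain_ltp * long-term prediction vector + gain_code * code vector."""
    return [gain_ltp * a + gain_code * c for a, c in zip(ltp_vector, code_vector)]
```

In terms of FIG. 3, `ltp_vector` plays the role of vector 110, `code_vector` of vector 111, the two gains of 112 and 113, and the return value of the weighted sum 114.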

[0023] The codebook search for the optimum excitation is carried out as follows. Ideally, the excitation would be chosen so that the synthesized speech obtained by passing it through the synthesis filter matches the original (input) speech, but in practice some error (quantization distortion) is unavoidable. The excitation should therefore be chosen to minimize this error. It is known, however, that because of the characteristics of human hearing the size of the error does not necessarily correspond to the subjective quality of the speech. It is therefore common to use an error that is weighted so as to correspond better with auditory perception. Perceptual weighting is described, for example, in B. S. Atal and J. R. Remde: "A new model of LPC excitation for producing natural-sounding speech at low bit rates", Proc. ICASSP 82 (1982.5).

[0024] To evaluate this perceptually weighted error, the excitation signal 114 is input to the weighted synthesis filter 105 to obtain the weighted synthesized speech 116. The input speech 101 is likewise passed through the perceptual weighting filter 104 to obtain the weighted input speech 115, and the difference between it and the weighted synthesized speech 116 gives the weighted error waveform 117. The coefficients of the perceptual weighting filter 104 and the weighted synthesis filter 105 are determined by the LPC parameters 103, which are obtained beforehand by feeding the input speech 101 to the LPC (linear prediction) analysis unit 102.

[0025] In the squared error calculation unit 118, the weighted error waveform 117 is squared and summed over the error evaluation interval to give the weighted squared error 119. Since, as described above, the excitation signal is a weighted sum of the long-term prediction vector and the stochastic code vector, determining the excitation comes down to determining the code vector indices that specify which code vector is selected from each codebook. That is, the weighted squared error 119 is computed while the long-term prediction lag 106 and the code vector index 107 are varied in turn, and the error minimization unit 120 selects the combination that gives the smallest weighted error. This way of determining the excitation is called the "analysis-by-synthesis" method.

[0026] Once the optimum excitation has been determined in this way, the long-term prediction lag 106, the codebook index 107, the gains 112 and 113, and the LPC parameters 103 are multiplexed as transmission parameters in the multiplexing unit 121 to form the transmission data 122. The state of the adaptive codebook 108 is also updated using the excitation signal 114 obtained at this point.

[0027] Carrying out the analysis-by-synthesis method faithfully, that is, optimizing the long-term prediction lag and the stochastic code vector index jointly while evaluating the weighted error every time, would require an enormous amount of processing. In practice, therefore, techniques such as sequential optimization are used.
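To make the difference concrete, the sketch below contrasts the number of error evaluations in a joint search with a sequential one, and shows one common form of sequential optimization: pick the adaptive codebook entry first, subtract its contribution, then search the second codebook against the remainder. All names and the codebook sizes (128 lags, 512 code vectors) are hypothetical, and the candidate vectors are assumed to have been passed through the weighted synthesis filter already.

```python
def best_match(target, candidates):
    """Index and gain of the candidate minimizing |target - gain * candidate|**2."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, y in enumerate(candidates):
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue
        ty = sum(t * v for t, v in zip(target, y))
        if ty * ty / yy > best_score:
            best_index, best_gain, best_score = index, ty / yy, ty * ty / yy
    return best_index, best_gain


def sequential_search(target, adaptive_candidates, stochastic_candidates):
    """Two-stage search: adaptive codebook first, then the second codebook."""
    lag_index, gain_ltp = best_match(target, adaptive_candidates)
    chosen = adaptive_candidates[lag_index]
    remainder = [t - gain_ltp * v for t, v in zip(target, chosen)]
    code_index, gain_code = best_match(remainder, stochastic_candidates)
    return lag_index, gain_ltp, code_index, gain_code


def evaluation_counts(n_lags, n_codes):
    """Error evaluations needed by a joint search versus a sequential one."""
    return {"joint": n_lags * n_codes, "sequential": n_lags + n_codes}
```

With the hypothetical sizes above, `evaluation_counts(128, 512)` gives 65536 evaluations for the joint search against 640 for the sequential one, at the cost of a result that is no longer guaranteed to be the joint optimum.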

[0028] In the decoding unit, the received data 222 is first separated into the individual parameters by the demultiplexing unit 221. The adaptive codebook 208 is looked up using the long-term prediction lag 206 and outputs the long-term prediction vector 210. The stochastic codebook 209 is looked up using the codebook index 207 and outputs the excitation vector 211. The long-term prediction vector 210 and the excitation vector 211 are multiplied by the gains 212 and 213, respectively, and their sum is input to the synthesis filter 230 as the excitation signal 214. The coefficients of the synthesis filter are determined by the LPC parameters 203. The postfilter 231 is not essential, but it is widely used to improve the subjective quality of the synthesized speech, and its output is the output speech 232.

[0029] FIG. 1 is a block diagram of the speech encoding unit of an embodiment of the present invention, and FIG. 2 is a block diagram of the speech decoding unit. An overview of the operation of this embodiment follows.

[0030] The speech encoding unit receives a digital speech signal 11 that has been A/D converted at a predetermined sampling frequency (typically 8 kHz).

[0031] The short-term prediction analysis unit (LPC analysis unit) 12 reads one analysis frame of the speech data 11 and outputs the short-term prediction coefficients 13. The frame length is, for example, about 40 ms (320 samples).

[0032] The short-term prediction coefficients 13 are quantized in the short-term prediction coefficient quantization unit 14. The quantization code is output as a transmission parameter, the short-term prediction coefficient quantization index 18. The quantized values 17 of the short-term prediction coefficients are referenced in the subsequent processing stages.

[0033] The input speech 11 is also weighted by the perceptual weighting filter 19 to give the weighted speech 20. Meanwhile, a signal whose samples are all zero (zero input) 22, one frame in length, is fed to the weighted synthesis filter 21 to obtain the zero-input response 23. Subtracting this from the weighted input speech 20 gives the weighted input speech 24, from which the influence of the past internal state of the weighted synthesis filter has been removed. The impulse response 29 of the weighted synthesis filter is also computed.
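The zero-input response step can be illustrated with a small all-pole recursion. This is a sketch under assumptions: the text here does not give the exact form of the weighted synthesis filter, so a generic recursion `y[n] = x[n] - sum(a[k] * y[n-1-k])` stands in for it, and the function names and coefficient layout are hypothetical.

```python
def all_pole_filter(x, a, state):
    """Run y[n] = x[n] - sum_k a[k] * y[n-1-k].

    `state` holds the past outputs, most recent first, and must have the same
    length as `a` (order >= 1). Returns (outputs, new_state) and leaves the
    caller's state untouched.
    """
    history = list(state)
    outputs = []
    for sample in x:
        y = sample - sum(ak * hk for ak, hk in zip(a, history))
        outputs.append(y)
        history = [y] + history[:-1]
    return outputs, history


def zero_input_response(a, state, length):
    """Filter output for an all-zero input, driven only by the stored state."""
    return all_pole_filter([0.0] * length, a, state)[0]


def impulse_response(a, length):
    """Filter output for a unit impulse, starting from a zero state."""
    return all_pole_filter([1.0] + [0.0] * (length - 1), a, [0.0] * len(a))[0]


def remove_filter_memory(weighted_input, a, state):
    """Subtract the zero-input response from the weighted input signal."""
    zir = zero_input_response(a, state, len(weighted_input))
    return [w - z for w, z in zip(weighted_input, zir)]
```

Because the filter is linear, its output for any excitation equals the zero-state output plus the zero-input response. Subtracting the latter from the target once lets every candidate excitation be evaluated with a zero-state convolution against the impulse response.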

[0034] Long-term prediction analysis is carried out for each subframe by searching the adaptive codebook, so it is referred to below as the adaptive codebook search. The subframe length is, for example, about 10 ms (80 samples). The adaptive codebook search unit 25 extracts the long-term prediction lag, a parameter that represents the periodicity of the speech, and outputs the long-term prediction lag index 30 and the long-term prediction vector 58.
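The frame and subframe sizes quoted in the text follow directly from the 8 kHz sampling frequency. The short sketch below just reproduces that arithmetic and shows a frame being cut into subframes. The constant and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # sampling frequency given in the text
FRAME_MS = 40          # example frame length from the text
SUBFRAME_MS = 10       # example subframe length from the text


def samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in `duration_ms` milliseconds."""
    return sample_rate_hz * duration_ms // 1000


def split_into_subframes(frame, subframe_len):
    """Cut one frame into consecutive subframes of `subframe_len` samples."""
    return [frame[i:i + subframe_len] for i in range(0, len(frame), subframe_len)]
```

With these example values a frame holds 320 samples and splits into four 80-sample subframes, so the adaptive codebook search runs four times per frame while the short-term prediction analysis runs once.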

[0035] The acoustic classification unit 31 analyzes the input speech 11 frame by frame and subframe by subframe, and outputs the acoustic classification parameters 33 to the search codebook selection unit 34 and the excitation selection unit 35.

[0036] The search codebook selection unit 34 selects the codebooks to be searched from among the plurality of codebooks in accordance with the acoustic classification parameters 33 from the acoustic classification unit 31 and with the analysis and coding results up to the previous frame. Here, a subset of a codebook is also treated as a codebook in its own right and can be a search target. Restricting the codebooks to be searched in this way reduces the computation required for the code vector search while maintaining the accuracy with which the excitation signal is approximated.

[0037] When the search codebook selection unit 34 selects the pulse excitation, the pulse generation unit 40 reads the pulse interval and first-pulse position information 39 from the pulse information codebook 38 and generates the pulse train 41 from that information. The pulse excitation search unit 36 treats the pulse train 41 as an excitation vector and weights it by convolution with the impulse response 29 of the weighted synthesis filter. The search target is the signal 24 (the weighted input signal 20 minus the zero-input response 23) from which the gain-scaled weighted long-term prediction vector 28 has been further subtracted, and the optimum pulse excitation vector 46 is searched for against this signal. The index 44 of the pulse information codebook 38 corresponding to the optimum pulse excitation vector 46 is output.

[0038] When the search codebook selection unit 34 selects the noise excitation, the noise excitation search unit 37 reads the noise information 43 from the noise information codebook 42 and generates a noise vector, which it treats as an excitation vector. It computes the same filter coefficients as those of the weighted synthesis filter 21 from the quantized values 17 of the short-term prediction coefficients and weights the vector with them. The optimum noise excitation vector 47 is then searched for against the signal obtained by subtracting the gain-scaled weighted long-term prediction vector 28 from the signal 24 (the weighted input signal 20 minus the zero-input response 23). The index 45 of the noise information codebook 42 corresponding to the optimum noise excitation vector 47 is output.

[0039] The excitation selection unit 35 takes the acoustic classification parameters 33 output by the acoustic classification unit 31, the selection result 32 of the search codebook selection unit 34, and the search results 44, 45, 46, and 47 of the searched codebooks, and outputs the excitation codebook index 32, the excitation code vector 50, and the excitation code vector index 49.

[0040] The gain optimization and quantization unit 51 computes the optimum gains for the long-term prediction vector 58 and the excitation vector 50 and quantizes them. It outputs the resulting quantization code 52.
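Optimizing the two gains together amounts to a two-variable least-squares problem. The sketch below solves the 2x2 normal equations and then picks the entry of a gain codebook that gives the smallest error. It is an illustration under assumptions: the function names and the gain codebook contents are hypothetical, both vectors are assumed to be already weighted, and the text does not say how the quantizer actually chooses its code.

```python
def optimal_gains(target, ltp_vector, code_vector):
    """Gains (g1, g2) minimizing |target - g1 * ltp_vector - g2 * code_vector|**2."""
    aa = sum(a * a for a in ltp_vector)
    cc = sum(c * c for c in code_vector)
    ac = sum(a * c for a, c in zip(ltp_vector, code_vector))
    ta = sum(t * a for t, a in zip(target, ltp_vector))
    tc = sum(t * c for t, c in zip(target, code_vector))
    det = aa * cc - ac * ac
    if det == 0.0:
        raise ValueError("vectors are linearly dependent; gains are not unique")
    return (ta * cc - tc * ac) / det, (tc * aa - ta * ac) / det


def quantize_gains(target, ltp_vector, code_vector, gain_codebook):
    """Index of the (g1, g2) pair in gain_codebook with the smallest squared error."""
    def error(pair):
        g1, g2 = pair
        return sum(
            (t - g1 * a - g2 * c) ** 2
            for t, a, c in zip(target, ltp_vector, code_vector)
        )
    return min(range(len(gain_codebook)), key=lambda i: error(gain_codebook[i]))
```

Searching the gain codebook by error, rather than by distance to the unquantized optimum, is one possible design. It keeps the choice tied to the same weighted error the rest of the encoder minimizes.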

[0041] The quantization codes 18 and 52 of the short-term prediction coefficients and the gains, the long-term prediction lag index 30, the used-excitation index 32, and the excitation information codebook index 49 obtained as described above are transmitted to the speech decoding unit as transmission parameters.

[0042] In the speech decoding unit, the long-term prediction vector 69 is read from the adaptive codebook 68 using the long-term prediction lag index 63. The pulse generation unit 73 uses the excitation codebook index 64 to read the information 71 on the pulse interval and first-pulse position from the pulse information codebook 70 and generates the pulse excitation vector 74. The noise information codebook 75 generates the noise excitation vector 76 using the excitation codebook index 64. The excitation selection unit 70 then switches between the excitations and outputs the excitation vector 77. The gains 79 and 80 are recovered from the gain codebook 78 using the gain codebook index 66. The code vectors 69 and 77 are multiplied by the gains 79 and 80, respectively, and added to produce the excitation signal vector 84.

[0043] The driving sound source 84 is input to the synthesis filter 85 to obtain the synthesized speech 86. The filter coefficients of the synthesis filter 85 are the short-term prediction parameters 82 read from the short-term prediction parameter quantization codebook 81 according to the quantization index 67 of the short-term prediction parameters. Finally, to improve the subjective sound quality, the synthesized speech 86 is input to the adaptive postfilter 87, yielding the final decoded speech 88.

[0044] The decoded speech (a digital signal) is D/A-converted into analog speech and output.

[0045] The outline of this embodiment has been described above. The functions of the main parts are described in detail next.

[0046] The short-term prediction analysis unit (LPC analysis unit) 12 extracts, for each frame, the short-term prediction coefficients 13 representing the spectral envelope of the speech from the speech data 11. The short-term prediction coefficients 13 are most commonly linear prediction coefficients, but they are easily converted into equivalent parameters derived from them, such as partial autocorrelation coefficients (PARCOR coefficients, or reflection coefficients) and line spectrum pairs (LSP parameters).

[0047] The usual method of deriving linear prediction coefficients is the Durbin-Levinson recursion (introduced in Saito and Nakata, "Basics of Speech Information Processing", Ohmsha, 1981). For deriving reflection coefficients, other methods have also been proposed, such as the FLAT algorithm (disclosed in "Digital Car Telephone System Standard RCR STD-27", established by the Radio System Development Center; hereinafter abbreviated as "the RCR standard") and the LeRoux method (described in the above book by Saito and Nakata). The method of converting linear prediction coefficients into LSP parameters is also described in the above book by Saito and Nakata.

[0048] In this embodiment, the linear prediction coefficients 13 are converted into LSP parameters and then vector-quantized by the quantization unit 14 into the quantized value 17 (the code vectors 16 are read one after another from the LSP codebook 15, and the one with the smallest error becomes the quantized value). LSP parameters are known to have better quantization characteristics than directly quantized linear prediction coefficients (the spectral distortion is smaller for the same number of bits). Depending on the number of bits allowed, scalar quantization, multistage vector quantization, vector-scalar quantization, or the like may be used instead. The quantization index 18 is output as a transmission parameter.
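The codebook search described in this paragraph can be sketched as follows. This is only a minimal illustration: it assumes a plain squared-error distance (the text does not say which error measure is used), and the function name `quantize_lsp` and the toy codebook values are hypothetical.

```python
def quantize_lsp(lsp, codebook):
    """Return (index, code_vector) of the codebook entry closest to lsp.

    Assumes a plain squared-error distance; the source text does not
    state which error measure is actually used.
    """
    best_index, best_error = 0, float("inf")
    for index, code_vector in enumerate(codebook):
        error = sum((x - c) ** 2 for x, c in zip(lsp, code_vector))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, codebook[best_index]


# Toy 2-dimensional codebook (hypothetical values, not real LSP data).
toy_codebook = [[0.10, 0.30], [0.20, 0.50], [0.40, 0.80]]
index, quantized = quantize_lsp([0.22, 0.47], toy_codebook)
```

Only `index` would need to be transmitted; the decoder looks up the same entry in its own copy of the codebook.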

[0049] Next, the preprocessing for calculating the perceptually weighted error is described. To calculate the weighted error, the input speech 11 is first weighted by the perceptual weighting filter 19 to obtain the weighted speech 20. The weighting filter 19 is constructed from the quantized values 17 of the short-term prediction coefficients (or equivalent parameters), and its specific form is as follows.

【００５０】[0050]

【数１】 [Equation 1]

[0051] Here, α_i are the filter coefficients (linear prediction coefficients), Np is the filter order, for example Np = 10, and λ is the weighting parameter, normally λ = 0.8.
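Equation 1 is not reproduced in this text. One widely used form of perceptual weighting filter that is consistent with the symbols defined here is shown below. It is offered as a plausible reconstruction, not as the patent's exact expression; the sign convention of α_i in particular is an assumption.

```latex
W(z) = \frac{1 - \sum_{i=1}^{N_p} \alpha_i z^{-i}}
            {1 - \sum_{i=1}^{N_p} \alpha_i \lambda^{i} z^{-i}}
```

With λ = 1 this filter reduces to unity, and with λ < 1 the error is de-emphasized near the formant peaks, which is the usual motivation for this kind of weighting.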

[0052] In general, the output of a synthesis filter is affected by its past state. To reduce the amount of computation, the influence of the past state of the synthesis filter is removed from the weighted speech 20 in advance. That is, data of value 0 with a length equal to the frame length (zero input 22) is input to the weighting synthesis filter 21 to calculate the zero-input response 23, which is subtracted from the weighted speech 20 to obtain the weighted speech 24 free of past influence. The transfer function of the weighting synthesis filter 21 used here is as follows.

【００５３】[0053]

【数２】 [Equation 2]

[0054] The synthesis filter 21 differs from the synthesis filter on the decoding side in that it includes the weighting parameter λ. The impulse response 29 of the weighting synthesis filter 21 is also computed at this time. The quantized values 17 of the linear prediction parameters are used as α in (Equation 2).
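As a rough sketch of how the impulse response 29 could be computed, the code below assumes that (Equation 2), which is not reproduced in this text, is the all-pole filter H(z) = 1 / (1 − Σ α_i λ^i z^(−i)). That form, its sign convention, the function name, and the toy coefficients are all assumptions made for illustration.

```python
def weighted_impulse_response(alpha, lam, length):
    """Impulse response of the assumed all-pole weighting synthesis filter.

    alpha[i - 1] holds alpha_i; the filter is assumed to be
    1 / (1 - sum_i alpha_i * lam**i * z**-i).
    """
    # Bandwidth-expanded coefficients alpha_i * lam**i for i = 1..Np.
    weighted = [a * lam ** (i + 1) for i, a in enumerate(alpha)]
    h = []
    for n in range(length):
        sample = 1.0 if n == 0 else 0.0  # unit impulse at n = 0
        for i, w in enumerate(weighted, start=1):
            if n - i >= 0:
                sample += w * h[n - i]
        h.append(sample)
    return h


# Toy first-order case: alpha_1 = 0.9 and lambda = 0.8 give h(n) = 0.72 ** n.
h = weighted_impulse_response([0.9], 0.8, 5)
```

Because every coefficient is scaled by λ^i with λ < 1, the response decays faster than that of the unweighted filter, which is what later makes truncation acceptable.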

[0055] As explained at the outset, long-term prediction analysis is regarded as a search of the adaptive codebook, and the long-term prediction lag (the adaptive codebook index) is selected by minimizing the perceptually weighted error between the synthesized waveform and the original speech. The case described here is the one in which the adaptive codebook search and the pulse sound source search are performed sequentially. That is, the optimum long-term prediction lag index 30 is determined on the assumption that no pulse sound source is used.

[0056] Next, the adaptive codebook search unit 25 is described. For each long-term prediction lag under search, the code vector 27 read from the adaptive codebook 26 is weighted and synthesized by convolving it with the impulse response 29 of the weighting synthesis filter. The synthesized output obtained in this way (the weighted long-term prediction vector) 28 does not depend on the past state of the synthesis filter and is therefore called the zero-state response. The long-term prediction vector 28 is calculated for each lag in the search range, and its correlation with the weighted speech 24 free of past influence is calculated. The (optimum) long-term prediction vector 58 that gives the maximum correlation is output together with the long-term prediction lag index 30, which is the quantized long-term prediction lag at that point. For details of the long-term prediction analysis method and techniques for reducing the amount of computation, see the RCR standard cited above.
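The lag search described above can be illustrated with the following sketch. It makes two assumptions that the text does not spell out: candidates are scored by the squared correlation divided by the energy of the weighted vector, and lags shorter than the subframe are handled by repeating the past excitation periodically. All names and the toy data are hypothetical.

```python
def convolve_truncated(x, h):
    """Zero-state response: the first len(x) samples of x convolved with h."""
    return [
        sum(x[k] * h[n - k] for k in range(n + 1) if n - k < len(h))
        for n in range(len(x))
    ]


def search_adaptive_codebook(past_excitation, target, h, lags):
    """Return the lag whose weighted code vector best matches the target.

    The score C * C / G (squared correlation over energy) is an assumed
    criterion; the text only speaks of the maximum correlation.
    """
    n_samples = len(target)
    best_lag, best_score = None, -1.0
    for lag in lags:
        # Code vector: excitation `lag` samples back, repeated periodically
        # when the lag is shorter than the subframe (assumption).
        start = len(past_excitation) - lag
        vector = [past_excitation[start + (n % lag)] for n in range(n_samples)]
        weighted = convolve_truncated(vector, h)
        corr = sum(t * w for t, w in zip(target, weighted))
        energy = sum(w * w for w in weighted)
        if energy > 0.0 and corr > 0.0:
            score = corr * corr / energy
            if score > best_score:
                best_lag, best_score = lag, score
    return best_lag


# Toy data: past excitation with period 5, 10-sample subframe, 2-tap response.
past = [1.0, 0.0, 0.0, 0.0, 0.0] * 4
h = [1.0, 0.5]
target = convolve_truncated([1.0, 0.0, 0.0, 0.0, 0.0] * 2, h)
best_lag = search_adaptive_codebook(past, target, h, range(3, 9))
```

In this toy case the search recovers the period of 5 samples, since that lag reproduces the target exactly.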

[0057] Next, the combined use of the pulse sound source and the noise sound source and the generation of the pulse sound source vector are described.

[0058] The present invention is characterized in that a plurality of codebooks are provided in place of the statistical sound source of conventional CELP, including at least one pulse codebook and at least one noise codebook, that these are used in combination, and that the pulse codebook is searched exhaustively, independently of the results of long-term prediction analysis and the like. The pulse and noise sound sources are combined because the residual waveform that the driving sound source tries to approximate differs in character according to the acoustic properties of the input speech. For example, as shown in FIG. 5, a speech waveform can be roughly divided into stationary and non-stationary portions. The residual waveform is what remains after the short-term components are removed from the speech waveform and can be regarded as a long-term periodic component, but its character is considered to differ between the stationary and non-stationary portions. It is difficult to approximate residual waveforms of such differing character with a pulse sound source alone or a noise sound source alone. The sound quality of the coded speech is therefore improved by changing the sound source codebooks to be searched according to the acoustic features of the input speech.

[0059] In the present invention, the sound source to be used is selected by the acoustic classification unit 31, which analyzes and classifies the acoustic features of the input speech; the search codebook selection unit 34, which selects the sound source codebooks to be searched; and the used sound source selection unit 35, which selects the sound source codebook to be used. In this embodiment, the search codebook selection unit 34 selects a plurality of codebooks to be searched and the used sound source selection unit 35 selects one of them as the sound source to be used, but other selection schemes are possible by combining different selection methods in the search codebook selection unit 34 and the used sound source selection unit 35. For selecting the search codebooks, all of the codebooks or several of them may be selected, or a subset of each codebook may be included in the search codebook. The search codebook may also be limited to one in a top-down manner according to the output of the acoustic classification unit or the like; in this case the search codebook selection unit also serves as the used sound source selection unit, and the search codebook selection result automatically becomes the used sound source index. As the criterion for selecting the sound sources to be searched, the selection may be based on the result of acoustic classification performed per frame or per subframe, on the results of coding up to the previous frame, or on a combination of these. For selecting the sound source to be used, the sound source may be chosen bottom-up from the search results of the plurality of codebooks, or chosen by taking the results of the acoustic classification unit or the like into account together with those search results.

[0060] The pulse sound source is basically a segment (of subframe length) taken from a periodic pulse train. However, as shown in FIG. 6, the head pulse position can lie anywhere from the first sample to the last sample of the subframe, regardless of the pulse interval. This is so that the pulse sound source can reproduce speech onset features that the long-term prediction vector cannot cover as the subframe length increases with decreasing bit rate. As with the search range of the long-term prediction lag, the pulse interval should roughly cover the range over which the pitch period of human speech varies. In this embodiment, the minimum pulse interval is Lmin = 20 and the maximum pulse interval is Lmax = 146.

[0061] As shown in FIG. 7, the pulse information codebook 38 stores pulse intervals and head pulse positions. As can be seen from FIG. 6, when the pulse interval is L and the subframe length is N (N = 80 in this embodiment), there is one pulse in the subframe if L ≥ N. If L < N, there is either one pulse or two or more pulses, depending on the head pulse position. The one-pulse case duplicates the L ≥ N case, so the pulse intervals and head pulse positions are arranged in the pulse information codebook so that no pulse train is duplicated. That is, for L < N the head pulse position is restricted to the range in which two or more pulses fall within the subframe, and L ≥ N is represented by L = N with head pulse positions from 0 to N − 1. Since N = 80 and Lmin = 20 in this embodiment, there are 1910 distinct pulse trains; by placing the head pulse position on every second sample, the number of pulse trains becomes 1010, which can be represented with 10 bits. This is intended to reduce the number of transmitted bits, and experiments showed little degradation of the decoded speech 88 and no problem for the performance of the speech coding unit.
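The counts quoted above can be checked with a short enumeration. The text does not say exactly which head positions are thinned out; one scheme that reproduces the stated figure of 1010 applies the two-sample step only to the multi-pulse entries (L < N) and keeps all N positions for the single-pulse entry. That choice is an inference from the numbers, not something the patent states, and the function name is hypothetical.

```python
def build_pulse_codebook(n_sub=80, l_min=20, multi_step=1):
    """Enumerate (pulse interval, head pulse position) pairs without duplicates.

    multi_step is the head-position step applied to the multi-pulse entries
    (interval < n_sub); single-pulse entries keep every position (assumption).
    """
    entries = []
    for interval in range(l_min, n_sub):
        # Head positions that leave at least two pulses inside the subframe.
        for head in range(0, n_sub - interval, multi_step):
            entries.append((interval, head))
    # Every interval >= n_sub yields one pulse; represent them by interval = n_sub.
    for head in range(n_sub):
        entries.append((n_sub, head))
    return entries


full = build_pulse_codebook()
decimated = build_pulse_codebook(multi_step=2)
```

Here `len(full)` is 1830 + 80 = 1910 and `len(decimated)` is 930 + 80 = 1010, which fits within the 1024 values of a 10-bit index.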

[0062] The pulse generation unit 40 generates pulses as shown in FIG. 8 based on the pulse interval and head pulse position information 39 read from the pulse information codebook 38. The amplitude of each pulse is 1, and the amplitude of samples with no pulse is 0.
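Pulse generation from a codebook entry can be sketched as below, a minimal illustration in which the function name and the example entry are hypothetical.

```python
def generate_pulse_train(interval, head, n_sub=80):
    """Return a subframe with unit pulses at head, head + interval, ..."""
    vector = [0.0] * n_sub
    for position in range(head, n_sub, interval):
        vector[position] = 1.0  # pulse amplitude 1; all other samples stay 0
    return vector


pulses = generate_pulse_train(interval=30, head=5)
positions = [n for n, value in enumerate(pulses) if value == 1.0]
```

For this entry the pulses fall at samples 5, 35, and 65 of the 80-sample subframe.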

[0063] The above is the case in which the pulse sound source vector 41 is generated by the pulse information codebook 38 and the pulse generation unit 40, but it is of course also possible to store all the pulse sound source vectors in a codebook. In that case pulse generation can be omitted, but the codebook needs N words per vector, whereas the pulse information codebook 38 needs only two words per vector: the pulse interval and the head pulse position.

[0064] Next, the search for the pulse sound source is described.

[0065] First, let b_L(n) be the optimum long-term prediction vector 58 output by the adaptive codebook search, b'_L(n) its weighted signal (the zero-state response of b_L(n)) 28, and β its gain. Let p(n) be the weighted input speech 24 free of past influence. Then p'(n) is defined by the following equation.

【００６６】[0066]

【数３】 [Equation 3]

[0067] This represents the component obtained by subtracting the contribution of the long-term prediction vector from the ideal synthesized speech, and it is the component that the pulse sound source is meant to cover.

[0068] Let f_i(n) be the generated pulse sound source and f'_i(n) its weighted synthesized speech. Then the error E,

【００６９】[0069]

【数４】 [Equation 4]

[0070] is to be minimized by the choice of f'_i(n). Here, γ_i is the gain and i is the index of the pulse information codebook.

[0071] Partially differentiating (Equation 4) with respect to γ_i and setting the result to 0 gives the γ_i that minimizes the error E as

【００７２】[0072]

【数５】 [Equation 5]

[0073] and E at this point is

【００７４】[0074]

【数６】 [Equation 6]

[0075] Here, the first term on the right-hand side of (Equation 6) is a positive constant independent of f'_i(n), so the problem reduces to finding the f'_i(n), that is, the pulse sound source f_i(n), that maximizes the second term on the right-hand side.
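Equations 3 to 6 are not reproduced in this text. The following reconstruction is inferred from the surrounding definitions: p'(n) is the target with the long-term contribution removed, the error is a squared error with gain γ_i, and the minimized error has a first term that does not depend on f'_i(n). It should be read as a plausible reconstruction rather than the patent's exact typesetting.

```latex
\begin{align}
p'(n) &= p(n) - \beta\, b'_L(n) \tag{3} \\
E &= \sum_{n=0}^{N-1} \bigl( p'(n) - \gamma_i f'_i(n) \bigr)^2 \tag{4} \\
\gamma_i &= \frac{\sum_{n=0}^{N-1} p'(n)\, f'_i(n)}{\sum_{n=0}^{N-1} f'_i(n)^2} \tag{5} \\
E &= \sum_{n=0}^{N-1} p'(n)^2
     - \frac{\Bigl( \sum_{n=0}^{N-1} p'(n)\, f'_i(n) \Bigr)^2}{\sum_{n=0}^{N-1} f'_i(n)^2} \tag{6}
\end{align}
```

Substituting (5) into (4) gives (6) directly, so minimizing E over the codebook amounts to maximizing the squared cross-correlation divided by the power of f'_i(n).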

[0076] The processing above is basically the same as the statistical codebook search in conventional CELP and accounts for a large share of the processing load. The present invention greatly reduces the search load by exploiting the properties of the pulse sound source and using an impulse response truncated to a finite order.

[0077] In general, when speech is synthesized by convolution with an impulse response, truncating the impulse response causes errors. However, as shown in FIG. 9, the impulse response of the weighting synthesis filter given by (Equation 2) decays more steeply than the unweighted impulse response, so the effect of truncation is small. If the truncation order is set to around 20 (2.5 ms), the effect of truncation is negligible in most cases. In the present invention, the truncation order is therefore set to Lmin (20 samples), the minimum pulse interval of the pulse sound source.

[0078] Here, C_i and G_i are defined by the following equations.

【００７９】[0079]

【数７】 [Equation 7]

[0080] C_i is the cross-correlation between p'(n) and f'_i(n), and G_i is the power of f'_i(n), so ordinarily both would have to be recalculated every time f'_i(n) changes (every time the index i is updated). On the other hand, p'(n) (0 ≤ n ≤ N − 1, where N is the number of samples in a subframe) and the impulse response h(n) are constant within a given subframe. Let h'(n) (0 ≤ n ≤ Lmin) be the impulse response truncated at order Lmin, and calculate in advance a_j (0 ≤ j ≤ N − 1) given by the following equation.

【００８１】[0081]

【数８】 [Equation 8]

[0082] As shown in FIG. 10, a_j is the cross-correlation between h'(n) and the corresponding portion of p'(n) as the position of h'(n) is shifted one sample at a time.

[0083] Because h'(n) is truncated at Lmin, the responses to successive pulses never overlap for any pulse sound source under search. Therefore, to obtain C_i in (Equation 7) when, for example, the pulses of the pulse sound source f_i(n) are at positions P1, P2, and P3 as shown in FIG. 11, it suffices to sum a_P1, a_P2, and a_P3 from the precomputed a_j. The impulse response convolution that would otherwise be performed every time f'_i(n) changes is thus replaced by a sum of partial cross-correlations computed once per subframe, which greatly reduces the processing load.
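The shortcut for C_i can be sketched as follows and compared against the direct computation. Equation 8 is not reproduced in this text, so the form used here, a_j = Σ_n p'(j + n) h'(n) with the sum clipped at the end of the subframe, is an assumption based on the description of FIG. 10. The function names and toy data are hypothetical.

```python
def precompute_a(p_prime, h_trunc):
    """a[j] = sum_n p_prime[j + n] * h_trunc[n], clipped at the subframe end."""
    n_sub = len(p_prime)
    return [
        sum(p_prime[j + n] * h_trunc[n]
            for n in range(min(len(h_trunc), n_sub - j)))
        for j in range(n_sub)
    ]


def cross_correlation_fast(a, pulse_positions):
    """C_i as the sum of the precomputed a[j] at the pulse positions."""
    return sum(a[position] for position in pulse_positions)


def cross_correlation_direct(p_prime, h_trunc, pulse_positions):
    """C_i by building the weighted pulse train explicitly and correlating."""
    n_sub = len(p_prime)
    weighted = [0.0] * n_sub
    for position in pulse_positions:
        for n, tap in enumerate(h_trunc):
            if position + n < n_sub:
                weighted[position + n] += tap
    return sum(p * w for p, w in zip(p_prime, weighted))


# Toy data (hypothetical): 12-sample subframe, 3-tap truncated response.
p_prime = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 1.0, -2.0, 0.75, 0.0, 1.25]
h_trunc = [1.0, 0.5, 0.25]
pulse_positions = [1, 5, 10]
a = precompute_a(p_prime, h_trunc)
c_fast = cross_correlation_fast(a, pulse_positions)
c_direct = cross_correlation_direct(p_prime, h_trunc, pulse_positions)
```

The table `a` is built once per subframe; after that, each codebook entry costs only as many additions as it has pulses.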

[0084] The same technique can be applied to G_i in (Equation 7). That is, g_j defined by the following equation is calculated in advance.

【００８５】[0085]

【数９】 [Equation 9]

[0086] As (Equation 9) shows, g_j is constant for 0 ≤ j ≤ N − Lmin, so only g_0 needs to be calculated for that range. As with C_i, G_i can be obtained by summing the g_j corresponding to the pulse positions of f_i(n).
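A sketch of the G_i shortcut and of the resulting codebook search follows. Equation 9 is not reproduced in this text; the code assumes g_j is the energy of the part of h'(n) that still fits in the subframe when a pulse sits at position j, and that pulses are spaced at least as far apart as the truncated response is long so that the per-pulse energies simply add. Names and toy values are hypothetical.

```python
def precompute_g(h_trunc, n_sub):
    """g[j] = energy of the portion of h_trunc that fits from position j."""
    return [
        sum(tap * tap for tap in h_trunc[: n_sub - j])
        for j in range(n_sub)
    ]


def best_pulse_entry(a, g, candidates):
    """Return the index of the candidate (pulse positions) maximizing C*C/G."""
    best_index, best_score = None, -1.0
    for index, positions in enumerate(candidates):
        c = sum(a[p] for p in positions)      # cross-correlation C_i
        power = sum(g[p] for p in positions)  # power G_i (non-overlapping pulses)
        score = c * c / power
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Toy data (hypothetical): 12-sample subframe, 3-tap truncated response.
h_trunc = [1.0, 0.5, 0.25]
g = precompute_g(h_trunc, n_sub=12)
a = [0.0] * 12
a[2], a[4], a[7] = 3.0, 2.0, 1.0
candidates = [[1, 5, 10], [2, 7], [4]]
best = best_pulse_entry(a, g, candidates)
```

In the toy case `g` is constant up to position 9 and shrinks only for the last two positions, mirroring the remark that a single value covers most of the subframe.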

[0087] Once the optimum pulse sound source f_i(n) (the one maximizing C_i²/G_i) has been found in this way, the exact weighted signal f'_i(n) of f_i(n) is calculated using the untruncated impulse response h(n).

[0088] In conventional methods using a pulse codebook (the papers by Yoshida et al. and Tanaka et al. cited above), the pulse interval is the long-term prediction lag or a pitch period obtained by pitch extraction. Consequently, using the pulse sound source degraded the sound quality in portions where the periodicity of the input speech is low. Because the present invention searches all possible combinations of pulse sound sources, the pulse sound source complements the long-term prediction vector even in such portions, and good sound quality is obtained.

[0089] In this embodiment, the noise sound source is a binary-weighted linear combination of a small number of basis vectors, and the noise sound source is searched by searching over the combinations of binary weights. This is the method used in the full-rate speech coding scheme of the RCR standard, and it greatly improves on processing load and memory requirements. For the actual processing procedure, see the RCR standard.

[0090] The final stage of processing in the speech coding unit is gain optimization and quantization. The gain optimization/quantization unit 51 receives the exactly weighted long-term prediction vector b'_L(n) 28 and sound source vector f'_i(n) 50 (both obtained by convolution with the untruncated impulse response), together with the weighted input speech p(n) 24 free of past influence. Denoting the gains anew by β and γ, β and γ are determined so as to minimize the weighted error E of the following equation.

【００９１】[0091]

【数１０】 [Equation 10]

[0092] Specifically, this is done by solving the simultaneous equations obtained by partially differentiating (Equation 10) with respect to β and γ and setting the results to 0.
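The joint gain optimization can be sketched as below. Equation 10 is not reproduced in this text; the code assumes it is the squared error E = Σ (p(n) − β b'_L(n) − γ f'_i(n))², for which setting both partial derivatives to zero gives a 2×2 linear system. The function name and toy vectors are hypothetical.

```python
def optimal_gains(p, b_w, f_w):
    """Solve the 2x2 normal equations for (beta, gamma) by Cramer's rule.

    Assumed error: E = sum((p - beta * b_w - gamma * f_w) ** 2).
    """
    bb = sum(x * x for x in b_w)
    ff = sum(x * x for x in f_w)
    bf = sum(x * y for x, y in zip(b_w, f_w))
    pb = sum(x * y for x, y in zip(p, b_w))
    pf = sum(x * y for x, y in zip(p, f_w))
    det = bb * ff - bf * bf
    if det == 0.0:
        raise ValueError("vectors are linearly dependent; gains are not unique")
    beta = (pb * ff - pf * bf) / det
    gamma = (pf * bb - pb * bf) / det
    return beta, gamma


# Toy check: the target is built as 2.0 * b_w + 0.5 * f_w.
b_w = [1.0, 0.0, 1.0, 0.0]
f_w = [0.0, 1.0, 1.0, 0.0]
p = [2.0, 0.5, 2.5, 0.0]
beta, gamma = optimal_gains(p, b_w, f_w)
```

Because the two vectors are generally correlated, solving for both gains together gives a lower error than reusing the gain found during the adaptive codebook search.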

[0093] The gains can be quantized by, for example, scalar-quantizing β and γ directly, or by converting them to other variables and then scalar-quantizing or vector-quantizing those. In this embodiment, scalar quantization is performed by the latter method.

[0094] Letting the quantized values of β and γ be β_q 53 and γ_q 54, these are multiplied by the unweighted long-term prediction vector 58 and the unweighted sound source vector 50, respectively, to produce the driving sound source 55. The driving sound source 55 is used to update the adaptive codebook 26.

[0095] Next, returning to FIG. 2, the speech decoding unit of this embodiment is described.

[0096] In the demultiplexing unit 62, the received data 61 is demultiplexed into the short-term prediction parameter quantization index 67, the long-term prediction lag index 63, the used sound source index 65, the sound source information codebook index 64, and the gain quantization index 66.

[0097] The first stage of decoding is the decoding of each parameter value. The short-term prediction parameter values 82 are decoded from the short-term prediction parameter quantization codebook 81 based on the short-term prediction parameter index 67. Similarly, the adaptive codebook 68 decodes the long-term prediction vector 69 based on the long-term prediction lag index 63, and the gain codebook 78 decodes the quantized gains 79 and 80 based on the gain quantization index 66. For the pulse sound source, the pulse interval and head pulse position information 72 is read from the pulse information codebook 71 based on the sound source information codebook index 64, and the pulse generation unit 73 decodes the pulse sound source vector (pulse train) 74. For the noise sound source, the noise information codebook 75 outputs, based on the sound source information codebook index 64, the noise sound source vector 76 formed as a binary-weighted linear combination of the basis vectors. The used sound source selection unit 70 then selects the sound source to be used based on the used sound source index 65 and outputs the sound source vector 77.

[0098] The second stage of decoding is the generation of the driving sound source. The long-term prediction vector 69, read from the adaptive codebook 68 according to the long-term prediction lag index 63, and the sound source vector 77 are multiplied by the gains 79 and 80, respectively, and summed to generate the driving sound source 84. The driving sound source 84 is input to the synthesis filter 85 and is also used to update the state of the adaptive codebook 68.
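This stage can be sketched as follows. The fixed-length history buffer used for the adaptive codebook update is an assumption about how the state is kept, and the function names and toy values are hypothetical.

```python
def build_excitation(long_term_vector, source_vector, gain_lt, gain_src):
    """Driving sound source: gain-scaled sum of the two code vectors."""
    return [gain_lt * b + gain_src * f
            for b, f in zip(long_term_vector, source_vector)]


def update_adaptive_codebook(history, excitation):
    """Shift the new excitation into a fixed-length history buffer."""
    return (history + excitation)[-len(history):]


# Toy 4-sample subframe with a 6-sample history (hypothetical sizes).
history = [0.0] * 6
excitation = build_excitation([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], 0.5, 2.0)
history = update_adaptive_codebook(history, excitation)
```

The encoder performs the same update with its quantized gains, which keeps the adaptive codebooks on both sides identical as long as no transmission errors occur.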

[0099] The final stage of decoding is speech synthesis. The synthesis filter 85 takes the decoded short-term prediction parameters 82 as its filter coefficients and the driving sound source 84 as its input, and outputs the digital synthesized speech 86. To further improve the subjective sound quality, the output 86 of the synthesis filter 85 is passed through the postfilter 87 to obtain the final digital synthesized speech 88. This is sent continuously through a buffer memory to the D/A converter and converted into analog synthesized speech.

[0100] The operation of the embodiment of the present invention, from speech input through coding and decoding to speech output, has been described above. The description so far has not specifically mentioned the frame energy (power) of the speech. This is because the frame energy is reflected in the gains of the driving sound source; however, with gain quantization in mind, it is advantageous to normalize by the frame energy in advance so as to limit the dynamic range of the gains. Since the frame energy is easily obtained when the linear prediction parameters are calculated, it is quantized separately and its index is transmitted. In long-term prediction analysis, the quality of the coded speech, particularly of speech with a relatively short period such as female voices, can be improved by using a fractional pitch, in which the adaptive codebook or the long-term prediction evaluation function is interpolated so that long-term prediction is performed at sub-sample resolution. The quality of the coded speech is also improved in long-term prediction by stretching or compressing the waveform of the adaptive codebook or the long-term prediction vector to follow temporal changes in the periodic component. When these processes are used, information indicating whether fractional pitch or waveform stretching is applied must be sent to the decoding unit.

[0101] An example of bit allocation in this case is shown below.

[0102] The sampling frequency is 8 kHz, the frame length is 40 ms (320 samples), and the subframe length is 10 ms (80 samples). The frame energy and the linear prediction parameters are updated every frame, and the other parameters are updated every subframe. Interpolating the frame energy and the linear prediction parameters in subframe units is effective for improving the quality of the synthesized speech. If the linear prediction parameters are quantized by 27-bit multi-stage vector quantization, their quantization index is 27 bits. The frame energy is scalar-quantized with 5 bits. The number of bits transmitted per frame is therefore 32.
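As an illustration of interpolating frame-rate parameters in subframe units, the following Python sketch moves linearly from the previous frame's values to the current frame's values over the four subframes. The linear weights and the function name `interpolate_subframes` are assumptions; the text does not state the interpolation rule or the parameter representation in which it is applied.

```python
def interpolate_subframes(prev, curr, n_sub=4):
    """Return one parameter vector per subframe, moving linearly from the
    previous frame's values (prev) to the current frame's values (curr)."""
    out = []
    for k in range(1, n_sub + 1):
        w = k / n_sub                          # weight of the current frame
        out.append([(1 - w) * p + w * c for p, c in zip(prev, curr)])
    return out


# 40 ms frame, 10 ms subframes -> 4 interpolated vectors per frame.
print(interpolate_subframes([0.0, 2.0], [4.0, 2.0]))
```

Interpolating filter coefficients directly can yield an unstable synthesis filter, so a coder would normally interpolate in a representation where stability is easy to preserve.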

[0103] Among the subframe-unit parameters, the long-term prediction lag index is 7 bits, corresponding to a lag range of 19 samples (421 Hz) to 146 samples (55 Hz). If two types of sound source are used, the sound source switching index is 1 bit. If the codebook size of the pulse and noise information codebooks is 10 bits (1010 code vectors), the code vector index is 10 bits. The gains for the long-term prediction vector and for the statistical code vector are converted into different parameters, then vector-quantized and represented by 8 bits. The number of bits transmitted per subframe is therefore 25. As a result, the total bit rate is 3300 bps.
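The arithmetic behind these figures can be checked with a short Python sketch. All variable names are hypothetical. The subframe fields as itemized (7 + 1 + 10 + 8) add up to 26 rather than 25, so the sketch takes the stated subframe total of 25 bits as given instead of summing the fields. The mapping of lag index 0 to a lag of 19 samples is also an assumption.

```python
FS_HZ = 8000
FRAME_MS, SUBFRAME_MS = 40, 10

# Frame-rate parameters, as listed in the text.
frame_bits = {"lpc_index": 27, "frame_energy": 5}
bits_per_frame = sum(frame_bits.values())

# Subframe total as stated in the text (not re-derived from the fields).
bits_per_subframe = 25

subframes_per_frame = FRAME_MS // SUBFRAME_MS
bits_total = bits_per_frame + subframes_per_frame * bits_per_subframe
bit_rate_bps = bits_total * 1000 // FRAME_MS

# A 7-bit lag index starting at 19 samples spans 128 consecutive lags.
lag_bits, lag_min = 7, 19
lag_max = lag_min + 2 ** lag_bits - 1


def lag_to_hz(lag):
    """Pitch frequency corresponding to a lag given in samples."""
    return FS_HZ / lag


print(bits_per_frame, bit_rate_bps, lag_max)
print(round(lag_to_hz(lag_min)), round(lag_to_hz(lag_max)))
```

The frame-rate part contributes 32 bits per 40 ms and the subframe part 25 bits per 10 ms, which together give the stated total.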

[0104] As described above, in the embodiment of the present invention, the reproducibility of the periodic component is improved even at a reduced bit rate, so that higher quality can be achieved. In addition, searching the sound source codebook using combinations of impulse responses truncated in order reduces the processing amount compared with conventional CELP.
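Why truncation reduces the processing amount can be sketched in Python as follows. When the impulse response of the weighting synthesis filter is truncated to no more than the minimum pulse interval, the responses to successive pulses never overlap, so the filtered pulse train is built by copying rather than by a full convolution. The function names, the unit pulse amplitude and the matching criterion are assumptions for illustration only.

```python
def pulse_positions(first, interval, n):
    """Positions of an equally spaced pulse train inside an n-sample subframe."""
    return list(range(first, n, interval))


def filtered_pulse_train(first, interval, h, n):
    """Filter response to a unit pulse train, with h truncated so that
    len(h) <= interval. Responses do not overlap, so copying suffices."""
    assert len(h) <= interval
    y = [0.0] * n
    for p in pulse_positions(first, interval, n):
        for k, hk in enumerate(h):
            if p + k < n:
                y[p + k] = hk
    return y


def full_search(target, h, intervals):
    """Try every first-pulse position and every interval; return the best pair
    under an assumed squared-correlation-over-energy criterion."""
    n = len(target)
    best = None
    for interval in intervals:
        for first in range(n):
            y = filtered_pulse_train(first, interval, h, n)
            num = sum(t * v for t, v in zip(target, y))
            den = sum(v * v for v in y)
            score = num * num / den if den else 0.0
            if best is None or score > best[0]:
                best = (score, first, interval)
    return best[1], best[2]


h = [1.0, 0.5, 0.25]                      # truncated impulse response (made up)
target = filtered_pulse_train(2, 5, h, 20)
print(full_search(target, h, range(3, 8)))
```

The sketch still recomputes each candidate from scratch for clarity; the point is only that each candidate costs a handful of copies instead of a convolution.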

[0105]

[Effects of the Invention] According to the present invention, the reproducibility of the periodic component, which becomes a problem when the bit rate of the CELP coding method is lowered, is improved, and a noise sound source is used in combination, so that a speech coder with good speech quality can be provided even at bit rates of 4 kbps or less. Furthermore, since the search processing amount for the pulse sound source can be reduced, a speech coder with a low processing amount can be provided.

[Brief description of drawings]

FIG. 1 is a block diagram of the encoding unit according to the first embodiment of the present invention.

FIG. 2 is a block diagram of the decoding unit according to the first embodiment of the present invention.

FIG. 3 is a diagram explaining the principle of a conventional CELP encoder.

FIG. 4 is a diagram explaining the principle of a conventional CELP decoder.

FIG. 5 is a diagram showing the correspondence between a speech waveform and a residual waveform.

FIG. 6 shows an example of a pulse sound source.

FIG. 7 shows the structure of the pulse information codebook.

FIG. 8 is a diagram explaining the principle of pulse sound source vector generation.

FIG. 9 is a comparison of impulse response waveforms with and without weighting.

FIG. 10 is a diagram explaining the partial cross-correlation calculation method.

FIG. 11 is a diagram explaining the simplified convolution operation.

[Explanation of symbols]

12 ... Linear prediction analysis unit; 14 ... Linear prediction parameter quantization unit; 15, 81 ... Linear prediction parameter quantization codebook; 19 ... Auditory weighting filter; 21 ... Weighting synthesis filter; 25 ... Adaptive codebook search unit; 26, 68 ... Adaptive codebook; 31 ... Acoustic classification unit; 34 ... Search codebook search unit; 35, 70 ... Used sound source selection unit; 36 ... Pulse sound source search unit; 40, 73 ... Pulse generation unit; 38, 71 ... Pulse information codebook; 37 ... Noise sound source search unit; 42, 75 ... Noise information codebook; 51 ... Gain optimization / quantization unit; 59, 78 ... Gain codebook; 85 ... Synthesis filter; 87 ... Post filter.

Continuation of the front page: (72) Inventor Atsuyoshi Ishikawa, 5-20-1 Josuihoncho, Kodaira-shi, Tokyo, c/o Hitachi, Ltd. Semiconductor Design and Development Center

Claims

1. A CELP speech coding method in which an input speech signal is divided into frames of a predetermined time length; a spectrum parameter representing the spectrum envelope of the speech signal is obtained and output; each frame is divided into subframes of a predetermined time length; for each subframe, a long-term prediction parameter is obtained from the past sound source so as to minimize the error from the speech signal and is output; and an optimum code vector is selected from a codebook prepared in advance as the driving sound source, the method being characterized in that the codebook comprises a plurality of codebooks, at least one of which represents a noise component and at least one of which represents a pulse component consisting of equally spaced pulses of constant amplitude.

2. The speech coding method according to claim 1, wherein the code vector is selected by choosing, according to the result of an acoustic analysis of the speech signal, a codebook to be searched from among the plurality of codebooks, and selecting the optimum code vector from the chosen codebook.

3. The speech coding method according to claim 2, wherein, in selecting the codebook to be searched, either the whole of the plurality of codebooks or a part of each codebook is set as the search target.

4. The speech coding method according to any one of claims 1 to 3, wherein the codebook of the pulse component stores, as information on a pulse train, the position of the first pulse and the pulse interval, and the code vector of the pulse train is generated from that information.

5. The speech coding method according to claim 1, wherein the optimum code vector of the pulse component is selected by a full search of the codebook.

6. The speech coding method according to claim 1, wherein the pulse interval of the pulse train ranges over values that substantially cover the range of variation of the pitch period of human speech.

7. A speech coding method wherein the position of the first pulse of the pulse train within the subframe can be any point from the first to the last point of the subframe, regardless of the pulse interval.

8. The speech coding method according to any one of claims 1 to 7, wherein the code vector of the pulse component is selected on the basis of a combination of impulse responses of a weighting synthesis filter whose length is truncated to a value equal to or less than the minimum pulse interval of the pulse train.