JPH09185396A - Speech encoding device

Info

Publication number: JPH09185396A
Application number: JP7352199A
Authority: JP
Inventors: 高橋 秀享 (Hideyuki Takahashi)
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1995-12-28
Filing date: 1995-12-28
Publication date: 1997-07-15

Abstract

PROBLEM TO BE SOLVED: To encode a speech signal well even in a noisy environment and obtain a speech signal of high sound quality, by providing a gain adjustment means that increases or decreases the gain of the stochastic codebook, the adaptive codebook, or both, according to the decision result of a speech discrimination means and the analysis result of a pitch periodicity analysis means. SOLUTION: Based on spectral parameters obtained by analyzing the input signal with an LPC analyzer 16, a synthesis filter 8 synthesizes speech from the signals output by the stochastic codebook 4 and the adaptive codebook 1. An error evaluator 14 outputs codes corresponding to the delay of the adaptive codebook 1, the index of the stochastic codebook 4, and the respective gains that minimize the distortion of the synthesized signal. A gain adjuster 15 increases or decreases the gains of the stochastic codebook 4 and the adaptive codebook 1 according to the decision result of a speech discriminator 11 and the analysis result of a pitch periodicity analyzer 10.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a speech encoding device.

[0002]

[Description of the Related Art] A widely used approach to compressing speech signals efficiently is to encode the speech signal using linear prediction parameters, which represent the spectral envelope, and excitation parameters, which correspond to the linear prediction residual signal. Because speech coding schemes based on linear prediction yield relatively high-quality synthesized speech at a low transmission rate, and helped by recent advances in hardware technology, many variants are being actively researched and developed. Among them, CELP (Code Excited Linear Predictive Coding), which uses an adaptive codebook built by repeating past excitation signals, is well known as a scheme that gives good sound quality. CELP is described, for example, in the paper by Kleijn et al., "Improved speech quality and efficient vector quantization in SELP" (ICASSP '88, S4.4, pp. 155-158, 1988).

[0003] FIG. 6 is a block diagram of a conventional code-excited linear prediction speech encoding device with an adaptive codebook. In the figure, the adaptive codebook 51 is connected to the first input terminal of an adder 53 via a multiplier 52. The stochastic codebook 54 is connected to the second input terminal of the adder 53 via a multiplier 55 and a switch 56.

[0004] The output terminal of the adder 53 is connected to the adaptive codebook 51 via a delay circuit 57 and to the first input terminal of a synthesis filter 58.

[0005] A buffer memory 59, connected to an input terminal 66 that receives a digital speech signal, is connected to the second input terminal of the synthesis filter 58 via an LPC analyzer 60 and to the first input terminal of a subtractor 62 via a subframe divider 61. The second input terminal of the subtractor 62 is connected to the output terminal of the synthesis filter 58, and the output terminal of the subtractor 62 is connected to an error evaluator 64 via a perceptual weighting filter 63. The error evaluator 64 is connected to the adaptive codebook 51, the stochastic codebook 54, and the multipliers 52 and 55.

[0006] Further, the LPC analyzer 60 and the error evaluator 64 are connected to a multiplexer 65.

[0007] In this configuration, an original speech signal sampled at, for example, 8 kHz is input from the input terminal 66, and the speech signal for a predetermined frame interval (for example 20 ms, i.e. 160 samples) is stored in the buffer memory 59. The buffer memory 59 sends the original speech signal to the LPC analyzer 60 frame by frame. The LPC analyzer 60 performs linear prediction (LPC) analysis on the original speech signal, extracts the linear prediction parameters α representing the spectral characteristics, and sends them to the synthesis filter 58 and the multiplexer 65. The subframe divider 61 divides the original speech signal of each frame into predetermined subframe intervals (for example 5 ms, i.e. 40 samples). That is, the first through fourth subframe signals are created from the original speech signal of the frame.
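
The framing described above can be illustrated with a minimal Python sketch. It assumes the example figures from the text (8 kHz sampling, 160-sample frames, 40-sample subframes); the function names `split_frames` and `split_subframes` are hypothetical and do not appear in the patent, and dropping a trailing partial frame is a simplification chosen here.

```python
FRAME_LEN = 160     # 20 ms at 8 kHz, the example frame interval in the text
SUBFRAME_LEN = 40   # 5 ms at 8 kHz, the example subframe interval in the text


def split_frames(signal, frame_len=FRAME_LEN):
    """Return consecutive full frames; a trailing partial frame is dropped (assumption)."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def split_subframes(frame, subframe_len=SUBFRAME_LEN):
    """Divide one frame into equal subframes (four subframes for 160/40)."""
    if len(frame) % subframe_len != 0:
        raise ValueError("frame length must be a multiple of the subframe length")
    return [frame[i:i + subframe_len] for i in range(0, len(frame), subframe_len)]
```

With these defaults each frame yields the first through fourth subframes mentioned in the paragraph.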

[0008] The delay L and gain β of the adaptive codebook 51 are determined by the following processing.

[0009] First, the delay circuit 57 applies a delay corresponding to the pitch period to the input signal of the synthesis filter 58 in the preceding subframe, i.e. the excitation signal, to create adaptive code vectors. For example, if the assumed pitch period is 40 to 167 samples, 128 signals delayed by 40 to 167 samples are created as adaptive code vectors and stored in the adaptive codebook 51. The switch 56 is open at this time. Each adaptive code vector from the adaptive codebook 51 is therefore multiplied by a variable gain in the multiplier 52, passes through the adder 53 unchanged, and is input to the synthesis filter 58. The synthesis filter 58 performs synthesis using the linear prediction parameters α and sends the synthesized vector to the subtractor 62. The subtractor 62 computes the difference between the original speech vector and the synthesized vector and sends the resulting error vector to the perceptual weighting filter 63. The perceptual weighting filter 63 weights the error vector according to auditory characteristics and sends it to the error evaluator 64. The error evaluator 64 calculates the mean square of the error vector, searches for the optimum adaptive code vector that minimizes this mean square value, and sends its delay L and gain β to the multiplexer 65. In this way, the delay L and gain β of the adaptive codebook 51 are determined.
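
The adaptive codebook search above can be sketched in Python as follows. This is an illustration under several assumptions that are not stated in the patent: the synthesis filter is modeled as an all-pole filter y[n] = x[n] + Σ a_k y[n-k] with zero initial state, the perceptual weighting filter is omitted, and the "variable gain" is replaced by the closed-form least-squares gain for each candidate. The names `synthesize`, `dot`, and `search_adaptive_codebook` are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis y[n] = x[n] + sum_k lpc[k-1] * y[n-k], zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def search_adaptive_codebook(past_excitation, target, lpc, min_lag=40, max_lag=167):
    """Return (delay L, gain beta, mean squared error) of the best adaptive code vector."""
    n = len(target)
    if min_lag < n or len(past_excitation) < max_lag:
        raise ValueError("sketch requires min_lag >= subframe length and enough history")
    best = None
    for lag in range(min_lag, max_lag + 1):      # 128 candidate delays for 40..167
        start = len(past_excitation) - lag
        candidate = past_excitation[start:start + n]   # past excitation delayed by `lag`
        synth = synthesize(candidate, lpc)
        energy = dot(synth, synth)
        if energy == 0.0:
            continue
        gain = dot(target, synth) / energy       # least-squares gain (assumption)
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth)) / n
        if best is None or err < best[2]:
            best = (lag, gain, err)
    return best
```

The closed-form gain is used only to keep the sketch short; an exhaustive search over quantized gain values, as the text's "variable gain" suggests, would select from a discrete set instead.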

[0010] Next, the index i and gain γ of the stochastic codebook 54 are determined by the following processing.

[0011] The stochastic codebook 54 stores in advance, for example, 512 stochastic signal vectors whose dimension corresponds to the subframe length (i.e. 40 dimensions), each assigned an index. The switch 56 is closed at this time.

[0012] First, the optimum adaptive code vector determined by the above processing is multiplied by the optimum gain β in the multiplier 52 and sent to the adder 53.

[0013] Next, each stochastic code vector from the stochastic codebook 54 is multiplied by a variable gain in the multiplier 55 and input to the adder 53. The adder 53 adds the optimum adaptive code vector, scaled by the optimum gain β, to each stochastic code vector and inputs the sum to the synthesis filter 58. The subsequent processing is the same as the determination of the adaptive codebook 51 parameters (delay L and gain β) described above.

[0014] That is, the synthesis filter 58 performs synthesis using the linear prediction parameters α and sends the synthesized vector to the subtractor 62. The subtractor 62 computes the difference between the original speech vector and the synthesized vector and sends the resulting error vector to the perceptual weighting filter 63. The perceptual weighting filter 63 weights the error vector according to auditory characteristics and sends it to the error evaluator 64. The error evaluator 64 calculates the mean square of the error vector, searches for the stochastic code vector that minimizes this mean square value, and sends its index i and gain γ to the multiplexer 65. In this way, the index i and gain γ of the stochastic codebook 54 are determined.
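
The second search stage can be sketched in the same spirit. The sketch below assumes an all-pole synthesis filter with zero initial state, omits perceptual weighting, and exploits the linearity of the filter to subtract the already-chosen adaptive contribution from the target before searching; the patent instead describes adding the two vectors and then synthesizing, which gives the same error for a linear filter. The function names are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis y[n] = x[n] + sum_k lpc[k-1] * y[n-k], zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def search_stochastic_codebook(codebook, adaptive_vector, beta, target, lpc):
    """Return (index i, gain gamma, mean squared error) of the best stochastic code vector."""
    n = len(target)
    # Contribution of the optimum adaptive code vector scaled by its optimum gain beta.
    adaptive_synth = synthesize([beta * v for v in adaptive_vector], lpc)
    residual = [t - a for t, a in zip(target, adaptive_synth)]
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        gamma = sum(r * s for r, s in zip(residual, synth)) / energy  # least-squares gain
        err = sum((r - gamma * s) ** 2 for r, s in zip(residual, synth)) / n
        if best is None or err < best[2]:
            best = (index, gamma, err)
    return best
```

In the configuration described in the text the codebook would hold 512 vectors of 40 samples; the function itself makes no assumption about those sizes.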

[0015] The multiplexer 65 multiplexes the quantized linear prediction parameters α, the delay L and gain β of the adaptive codebook 51, and the index i and gain γ of the stochastic codebook 54.

[0016] In such a speech encoding device, as described above, the pitch periodicity of voiced sound is produced by delaying past excitation signals. However, because the past excitation signals are themselves originally built from noise sequences, it is difficult to produce the pulse train that corresponds to a voiced excitation. As a result, the reproduced speech contains a large amount of high-frequency noise, particularly in voiced sounds, and sound quality deteriorates. Various proposals have been made to solve this problem; one example is described in detail in "Details to Assist in Implementation of Federal Standard 1016 CELP" (National Communications System, Technical Information Bulletin 92-1, pp. 10-11, 1992).

[0017] FIG. 7 is a block diagram of a decoder corresponding to the code-excited linear prediction speech encoding device shown in FIG. 6. In the figure, the adaptive codebook 70 is connected to the first input terminal of an adder 72 via a multiplier 71. The stochastic codebook 73 is connected to the second input terminal of the adder 72 via a multiplier 74 and a switch 75. The output terminal of the adder 72 is connected to the adaptive codebook 70 via a delay circuit 76 and to the first input terminal of a synthesis filter 77, which has an output terminal 79.

[0018] A demultiplexer 78 is connected to the adaptive codebook 70, the stochastic codebook 73, the multipliers 71 and 74, and the second input terminal of the synthesis filter 77.

[0019] Here, the synthesis filter 77 is assumed to have the same configuration as the synthesis filter 58 shown in FIG. 6.

[0020] In this configuration, the demultiplexer 78 separates the received signal into the linear prediction parameters α, the delay L and gain β of the adaptive codebook 70, and the index i and gain γ of the stochastic codebook 73. It outputs the linear prediction parameters α to the synthesis filter 77, the delay L and gain β to the adaptive codebook 70 and the multiplier 71 respectively, and the index i and gain γ to the stochastic codebook 73 and the multiplier 74 respectively.

[0021] An adaptive code vector is selected from the adaptive codebook 70 based on the delay L output from the demultiplexer 78. The adaptive codebook 70 has the same contents as the adaptive codebook 51 in the encoding device; that is, past excitation signals are input to the adaptive codebook 70 via the delay circuit 76. The multiplier 71 scales the input adaptive code vector by the received gain β and sends it to the adder 72.

[0022] Next, a stochastic code vector is selected from the stochastic codebook 73 based on the index i output from the demultiplexer 78. The stochastic codebook 73 has the same contents as the stochastic codebook 54 in the encoding device. The multiplier 74 scales the input stochastic code vector by the received gain γ and sends it to the adder 72.

[0023] The adder 72 adds the scaled stochastic code vector and the scaled adaptive code vector and sends the sum to the synthesis filter 77 and the delay circuit 76. The synthesis filter 77 performs synthesis using the received linear prediction parameters α as coefficients and outputs the synthesized speech signal from the output terminal 79.
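
The decoder path for one subframe can be sketched as below. This is a simplified illustration, not the patent's implementation: the synthesis filter is assumed to be all-pole with the convention y[n] = x[n] + Σ a_k y[n-k], the parameters are assumed to arrive already dequantized, and the delay is assumed to be at least one subframe long. `decode_subframe` and its argument layout are hypothetical.

```python
def decode_subframe(params, codebook, past_excitation, filter_memory):
    """Rebuild one subframe of speech from (L, beta, i, gamma, lpc).

    past_excitation: earlier excitation samples, newest last (adaptive codebook state).
    filter_memory:   the last len(lpc) synthesis outputs, newest last.
    Returns (speech, updated past_excitation, updated filter_memory).
    """
    lag, beta, index, gamma, lpc = params
    code = codebook[index]
    n = len(code)
    start = len(past_excitation) - lag
    adaptive = past_excitation[start:start + n]          # vector selected by the delay L
    # Adder: scaled adaptive code vector plus scaled stochastic code vector.
    excitation = [beta * a + gamma * c for a, c in zip(adaptive, code)]
    history = list(filter_memory)
    speech = []
    for x in excitation:
        y = x + sum(a * history[-k] for k, a in enumerate(lpc, start=1))
        history.append(y)
        speech.append(y)
    # Delay-circuit path: the new excitation becomes part of the adaptive codebook state.
    new_past = (past_excitation + excitation)[-len(past_excitation):]
    new_memory = history[len(history) - len(lpc):]
    return speech, new_past, new_memory
```

Feeding the returned excitation history and filter memory into the next call mirrors the feedback from the adder to the adaptive codebook through the delay circuit.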

[0024]

[Problems to Be Solved by the Invention] A speech encoding device using linear prediction analysis as described above can achieve high-quality coding at a relatively low bit rate. However, when it is used, for example as a mobile telephone or a voice recorder, in an environment where non-speech signals, i.e. background noise, are unavoidably present, background noise mixes into the speech signal and the sound quality of the encoded signal deteriorates severely.

[0025] The speech encoding device of the present invention was made in view of this problem. Its object is to provide a speech encoding device that can encode a speech signal well and obtain a speech signal of high sound quality even in a noisy environment.

[0026]

[Means for Solving the Problems] To achieve the above object, the speech encoding device according to the first invention comprises: speech discrimination means for determining whether a frame-unit input signal, divided at predetermined frame intervals, is a speech signal or a non-speech signal; linear prediction analysis means for analyzing the input signal and outputting its spectral parameters; subframe division means for further dividing the frame interval of the input signal into predetermined subframe intervals; an adaptive codebook storing in advance a plurality of signals created by delaying past excitation signals; a stochastic codebook storing a plurality of noise signal waveforms of the subframe interval; excitation signal generation means for generating an excitation signal based on signals output from the stochastic codebook, the adaptive codebook, or both; a synthesis filter for synthesizing speech, based on the spectral parameters, using the signals output from the stochastic codebook and the adaptive codebook as the excitation signal; error minimization means for outputting codes corresponding to the adaptive codebook delay, the stochastic codebook index, and the respective gains that minimize the distortion of the synthesized signal relative to the input signal; pitch periodicity analysis means for analyzing the pitch periodicity of the input signal; and gain adjustment means for increasing or decreasing the gain of the stochastic codebook, the adaptive codebook, or both, according to the discrimination result of the speech discrimination means and the analysis result of the pitch periodicity analysis means.

[0027] In the speech encoding device according to the second invention, which is based on the first invention, the excitation signal generation means generates the excitation signal from the signals output from the stochastic codebook and the adaptive codebook when the speech discrimination means determines that the input signal is a speech signal, and generates the excitation signal from the signal output from the stochastic codebook alone when the speech discrimination means determines that the input signal is a non-speech signal.

[0028] In the speech encoding device according to the third invention, which is based on the first invention, when the speech discrimination means determines that the input signal is a non-speech signal, the gain adjustment means attenuates the gain of the stochastic codebook, the adaptive codebook, or both by a predetermined ratio relative to the gain used when the input signal is determined to be a speech signal.

[0029] In the speech encoding device according to the fourth invention, which is based on the first or second invention, the speech discrimination means discriminates speech from non-speech for each frame according to the energy of the input signal, and includes threshold determination means for determining a discrimination threshold according to the frame energy at the start of encoding. If the difference between the current frame energy and the frame energy at the start of encoding is larger than the discrimination threshold determined by the threshold determination means, the frame is judged to be speech; if it is smaller, the frame is judged to be non-speech.

[0030] That is, the speech encoding device according to the first invention uses the speech discrimination means to determine whether a frame-unit input signal, divided at predetermined frame intervals, is a speech signal or a non-speech signal, uses the linear prediction analysis means to analyze the input signal and output its spectral parameters, and uses the subframe division means to divide the frame interval of the input signal into predetermined subframe intervals. The excitation signal generation means generates an excitation signal based on signals output from one or both of the adaptive codebook, which stores in advance a plurality of signals created by delaying past excitation signals, and the stochastic codebook, which stores a plurality of noise signal waveforms of the subframe interval. Based on the spectral parameters, the synthesis filter synthesizes speech using the signals output from the stochastic codebook and the adaptive codebook as the excitation signal, and the error minimization means outputs codes corresponding to the adaptive codebook delay, the stochastic codebook index, and the respective gains that minimize the distortion of the synthesized signal relative to the input signal. The pitch periodicity analysis means analyzes the pitch periodicity of the input signal, and the gain adjustment means increases or decreases the gain of the stochastic codebook, the adaptive codebook, or both, according to the discrimination result of the speech discrimination means and the analysis result of the pitch periodicity analysis means.

[0031] In the speech encoding device according to the second invention, which is based on the first invention, when the speech discrimination means determines that the input signal is a speech signal, the excitation signal generation means generates the excitation signal from the signals output from the stochastic codebook and the adaptive codebook; when the speech discrimination means determines that the input signal is a non-speech signal, the excitation signal is generated from the signal output from the stochastic codebook alone.

[0032] In the speech encoding device according to the third invention, which is based on the first invention, when the speech discrimination means determines that the input signal is a non-speech signal, the gain adjustment means attenuates the gain of the stochastic codebook, the adaptive codebook, or both by a predetermined ratio relative to the gain used when the input signal is determined to be a speech signal.

[0033] In the speech encoding device according to the fourth invention, which is based on the first, second, or third invention, the speech discrimination means discriminates speech from non-speech for each frame according to the energy of the input signal, and the threshold determination means determines the discrimination threshold according to the frame energy at the start of encoding. If the difference between the current frame energy and the frame energy at the start of encoding is larger than the discrimination threshold determined by the threshold determination means, the frame is judged to be speech; if it is smaller, the frame is judged to be non-speech.

[0034]

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the drawings.

[0035] FIG. 1 is a block diagram showing the configuration of a speech encoding device to which the present invention is applied.

[0036] In the figure, the adaptive codebook 1 is connected, via a multiplier 2 and a switch 19, to the first input terminal of an adder 3 serving as the excitation signal generation means, and the stochastic codebook 4 is connected to the second input terminal of the adder 3 via a multiplier 5 and a switch 6. The output terminal of the adder 3 is connected to the first input terminal of a subtractor 12 via a synthesis filter 8 and to the adaptive codebook 1 via a delay circuit 7.

[0037] A buffer memory 9 connected to an input terminal 19 is connected to the synthesis filter 8 via an LPC analyzer 16 serving as the linear prediction analysis means, to the subtractor 12 via a subframe divider 17 serving as the subframe division means, to a gain adjuster 15 via a speech discriminator 11 serving as the speech discrimination means, and also to the gain adjuster 15, which serves as the gain adjustment means, via a pitch periodicity analyzer 10 serving as the pitch periodicity analysis means. The gain adjuster 15 is connected to the multiplier 5. The output terminal of the subtractor 12 is connected, via a perceptual weighting filter 13, to the input terminal of an error evaluator 14 serving as the error minimization means. The output terminal of the error evaluator 14 is connected to the adaptive codebook 1, the stochastic codebook 4, and the multipliers 2 and 5.

[0038] Further, a multiplexer 18 is connected to the speech discriminator 11, the LPC analyzer 16, and the error evaluator 14.

[0039] FIG. 2 shows the configuration of the speech discriminator 11, the speech discrimination means shown in FIG. 1. In the figure, a frame energy analysis circuit 120 is connected to the first input terminal of an adder 121. An initial frame energy analysis circuit 122 is connected to a threshold determination circuit 124, serving as the threshold determination means, and to the second input terminal of the adder 121. The output terminal of the adder 121 and the threshold determination circuit 124 are connected to a discrimination circuit 123.

[0040] In this configuration, an original speech signal sampled at, for example, 8 kHz is input from the input terminal 9, and the speech signal for a predetermined frame interval (for example 20 ms, i.e. 160 samples) is stored in the buffer memory 9. The buffer memory 9 sends the input signal frame by frame to the LPC analyzer 16, the subframe divider 17, the speech discriminator 11, and the pitch periodicity analyzer 10. The LPC analyzer 16 performs linear prediction (LPC) analysis on the input signal, extracts the linear prediction parameters α representing the spectral characteristics, and sends them to the synthesis filter 8 and the multiplexer 18. The subframe divider 17 divides the input signal of each frame into predetermined subframe intervals (for example 5 ms, i.e. 40 samples). Here, the first through fourth subframe signals are created from the input signal of the frame.

[0041] The speech discriminator 11 determines whether the input signal of a frame is speech or non-speech as follows. In the configuration shown in FIG. 2, the frame energy analysis circuit 120 calculates the frame energy E_f [dB] of the input frame signal by the following equation.

[0042]

[Equation 1] Here, s(n) is the input signal at sample n, and N is the frame length.

【００４３】また、初期フレームエネルギー分析回路１
２２は符号化開始時のフレームエネルギーＥ_b［ｄＢ］
を上式により同様に算出する。The initial frame energy analysis circuit 122 likewise calculates the frame energy E_b [dB] at the start of encoding using the above equation.
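
Equation 1 itself is not reproduced in this text, so its exact form cannot be confirmed here. As a hedged sketch only, the following Python function assumes a common definition of frame energy in dB, namely ten times the base-10 logarithm of the mean squared sample value. The `floor` argument is a hypothetical guard against taking the logarithm of zero and is not from the patent.

```python
import math


def frame_energy_db(frame, floor=1e-10):
    """Frame energy in dB under an ASSUMED form of Equation 1:
    10 * log10((1/N) * sum(s(n)^2)), with N the frame length."""
    n = len(frame)
    if n == 0:
        raise ValueError("frame must not be empty")
    mean_power = sum(x * x for x in frame) / n
    # `floor` is a hypothetical safeguard so that silence does not hit log10(0).
    return 10.0 * math.log10(max(mean_power, floor))
```

Under this assumption, E_f would be obtained by applying the function to the current frame and E_b by applying it to the frame at the start of encoding.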

【００４４】閾値決定回路１２４は、例えば図３に示す
ような背景雑音エネルギー［ｄＢ］と閾値［ｄＢ］との
関係を基に、背景雑音エネルギーの大きさに応じて閾値
を決定して判別回路１２３に送出する。また、加算器１
２１ではフレームエネルギーＥ_f［ｄＢ］から初期フレ
ームエネルギーＥ_b［ｄＢ］を減算し、その減算結果を
判別回路１２３に送出する。そして、判別回路１２３は
入力された減算結果と閾値を比較し、減算結果が閾値よ
り大きければフレーム入力信号は音声信号であると判別
し、そうでなければ非音声信号であると判別する。The threshold determination circuit 124 determines a threshold according to the magnitude of the background noise energy, based on a relationship between background noise energy [dB] and threshold [dB] such as that shown in FIG. 3, and sends it to the discrimination circuit 123. The adder 121 subtracts the initial frame energy E_b [dB] from the frame energy E_f [dB] and sends the result to the discrimination circuit 123. The discrimination circuit 123 then compares the subtraction result with the threshold; if the result is larger than the threshold, it determines that the frame input signal is a voice signal, and otherwise that it is a non-voice signal.
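
The decision rule of paragraph [0044] can be sketched as below. The curve of FIG. 3 is not available in this text, so `threshold_db` uses a purely hypothetical clamped linear mapping with made-up breakpoints; only the comparison of E_f minus E_b against the threshold follows the description. The initial frame energy E_b is taken here as the background noise energy, which is an assumption.

```python
def threshold_db(background_db, points=((20.0, 12.0), (60.0, 4.0))):
    """Hypothetical stand-in for FIG. 3: a clamped linear map from
    background noise energy [dB] to threshold [dB]. The breakpoints
    are illustrative assumptions, not values from the patent."""
    (x0, y0), (x1, y1) = points
    if background_db <= x0:
        return y0
    if background_db >= x1:
        return y1
    t = (background_db - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def is_voice(e_f_db, e_b_db):
    """True if the current frame energy exceeds the initial frame
    energy by more than the threshold (adder 121 plus circuit 123)."""
    return (e_f_db - e_b_db) > threshold_db(e_b_db)
```

For instance, with these illustrative breakpoints a frame 20 dB above a 40 dB background would be classed as voice, while one only 5 dB above it would not.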

【００４５】図１に戻って、音声判別器１１において入
力信号が音声信号であると判別されるとスイッチ１９は
閉じられる。また、入力信号が非音声信号であると判別
されるとスイッチ１９は開かれる。このような制御動作
により、音声区間では適応コードブック１と確率コー
ドブック４から出力される信号から駆動音源信号が生成
され、非音声区間では確率コードブック４のみから駆動
音源信号が生成される。これは、非音声区間においては
適応コードブック１は単にもう一つの確率コードブック
としてしか機能しなくなるため、音質の向上にはほとん
ど寄与しないためである。音声区間においては、適応コ
ードブック１の遅延Ｌとゲインβは、前記した従来例と
同様に決定される。また、確率コードブック４のインデ
ックスｉとゲインγも、前記した従来例と同様に決定さ
れる。Returning to FIG. 1, when the voice discriminator 11 determines that the input signal is a voice signal, the switch 19 is closed. When it determines that the input signal is a non-voice signal, the switch 19 is opened. By this control operation, the driving sound source signal is generated from the signals output from the adaptive codebook 1 and the probability codebook 4 in voice sections, and from the probability codebook 4 alone in non-voice sections. This is because, in non-voice sections, the adaptive codebook 1 functions merely as another probability codebook and therefore contributes little to improving sound quality. In voice sections, the delay L and the gain β of the adaptive codebook 1 are determined in the same manner as in the conventional example described above. The index i and the gain γ of the probability codebook 4 are also determined in the same manner as in the conventional example.
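
The effect of switch 19 on the driving sound source signal can be sketched as follows. This is an illustrative Python fragment, not the patent's implementation; the function and argument names are hypothetical, and the codebook search that selects L, β, i, and γ is omitted.

```python
def driving_excitation(adaptive_vec, probability_vec, beta, gamma, voiced):
    """Combine the gain-scaled code vectors for one subframe.
    voiced=True  -> switch 19 closed: both codebooks contribute.
    voiced=False -> switch 19 open: probability codebook only."""
    if len(adaptive_vec) != len(probability_vec):
        raise ValueError("code vectors must have the same length")
    if voiced:
        return [beta * a + gamma * c for a, c in zip(adaptive_vec, probability_vec)]
    return [gamma * c for c in probability_vec]
```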

【００４６】ピッチ周期性分析器１０は、フレーム入力
信号のピッチ周期性を分析する。本実施形態では、例え
ば選択された適応コードベクトルをβ倍した信号（ピッ
チ予測信号）と、入力信号との相互相関を計算する。す
なわち、この相互相関の値が高ければより周期性の高
い、有声音であるといえ、反対に相互相関の値が低けれ
ば、無声音または非音声であるといえる。The pitch periodicity analyzer 10 analyzes the pitch periodicity of the frame input signal. In the present embodiment, for example, it calculates the cross-correlation between the input signal and a signal obtained by multiplying the selected adaptive code vector by β (the pitch prediction signal). A high cross-correlation value indicates a voiced sound with high periodicity; conversely, a low value indicates an unvoiced sound or non-voice.

【００４７】相互相関を計算するにあたって、ここでは
下式で示される入力信号とピッチ予測信号との一般化相
互相関Ｒを用いる。In calculating the cross-correlation, the generalized cross-correlation R between the input signal and the pitch prediction signal expressed by the following equation is used here.

【００４８】[0048]

【数２】ただし、ｓ（ｎ）は前記したようにサンプルｎにおける
入力信号であり、ｐ（ｎ）はサンプルｎにおけるピッチ
予測信号である。[Equation 2] where s(n) is the input signal at sample n, as described above, and p(n) is the pitch prediction signal at sample n.
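
Equation 2 is likewise not reproduced in this text. The sketch below therefore assumes a standard normalized cross-correlation between s(n) and p(n) as a stand-in. The patent's actual definition of R may differ, for example in normalization or in being expressed in dB as FIG. 4 suggests.

```python
import math


def generalized_cross_correlation(s, p):
    """ASSUMED form of R: sum(s*p) / sqrt(sum(s^2) * sum(p^2)).
    Returns 0.0 when either signal has zero energy."""
    if len(s) != len(p):
        raise ValueError("signals must have the same length")
    num = sum(a * b for a, b in zip(s, p))
    den = math.sqrt(sum(a * a for a in s) * sum(b * b for b in p))
    return 0.0 if den == 0.0 else num / den
```

Under this assumption R is close to 1 when the pitch prediction tracks the input well (strongly periodic, voiced speech) and close to 0 when it does not.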

【００４９】ゲイン調整器１５はピッチ周期性分析器１
０の分析結果としての一般化相互相関Ｒの値と、音声判
別器１１の判別結果ｖ／ｕｖ（ｖはｖｏｉｃｅ（音声）
を意味し、ｕｖはｕｎｖｏｉｃｅ（非音声）を意味す
る）に応じて、確率コードブック４と、適応コードブッ
ク１の両方もしくは一方のゲインを増減させる。本実施
形態では、非音声信号であると判別されたときは、図４
に示すような一般化相互相関Ｒ［ｄＢ］と確率コードブ
ックのゲインの倍率との関係に基づいて、音声信号であ
ると判別されたときに対する所定の割合で音声確率コー
ドブック４のゲインγを減衰させるようにする。The gain adjuster 15 increases or decreases the gain of both or one of the probability codebook 4 and the adaptive codebook 1 according to the value of the generalized cross-correlation R obtained as the analysis result of the pitch periodicity analyzer 10 and the discrimination result v/uv of the voice discriminator 11 (v stands for voice and uv for unvoice, that is, non-voice). In the present embodiment, when the signal is determined to be a non-voice signal, the gain γ of the probability codebook 4 is attenuated at a predetermined ratio relative to the case where the signal is determined to be a voice signal, based on a relationship between the generalized cross-correlation R [dB] and the gain multiplication factor of the probability codebook such as that shown in FIG. 4.
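
The non-voice gain attenuation of paragraph [0049] can be sketched as below. The curve of FIG. 4 is not available in this text, so `gain_scale` is a hypothetical clamped linear map whose breakpoints, minimum factor, and direction (lower R gives stronger attenuation) are all assumptions. R is treated as a unitless value in [0, 1] rather than in dB, and the voiced branch simply leaves γ unchanged as a simplification.

```python
def gain_scale(r, r_lo=0.2, r_hi=0.8, min_scale=0.3):
    """Hypothetical stand-in for FIG. 4: multiplication factor for the
    probability codebook gain as a function of R. All constants are
    illustrative assumptions, not values from the patent."""
    if r <= r_lo:
        return min_scale
    if r >= r_hi:
        return 1.0
    t = (r - r_lo) / (r_hi - r_lo)
    return min_scale + t * (1.0 - min_scale)


def adjust_gamma(gamma, r, voiced):
    """Attenuate the probability codebook gain only for non-voice frames."""
    if voiced:
        return gamma
    return gamma * gain_scale(r)
```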

【００５０】このような処理により、音声区間では通常
のピッチ強調処理が行われ、非音声区間では図４に示す
ように確率コードブック４のゲインγが減衰されるの
で、背景雑音を抑制することができる。By such processing, normal pitch enhancement is performed in voice sections, while in non-voice sections the gain γ of the probability codebook 4 is attenuated as shown in FIG. 4, so background noise can be suppressed.

【００５１】マルチプレクサ１８は、量子化された線形
予測パラメータαと、適応コードブック１の遅延Ｌ及び
ゲインβと、確率コードブック４のインデックスｉ及び
ゲインγと、音声判別情報ｖ／ｕｖの各々をマルチプレ
クスして伝送する。The multiplexer 18 multiplexes and transmits the quantized linear prediction parameter α, the delay L and the gain β of the adaptive codebook 1, the index i and the gain γ of the probability codebook 4, and the voice discrimination information v/uv.

【００５２】続いて、上記した音声符号化装置に対応す
る音声復号化装置の復号化動作を図面を参照して詳細に
説明する。Next, the decoding operation of the speech decoding apparatus corresponding to the above speech coding apparatus will be described in detail with reference to the drawings.

【００５３】図５は、図１の音声符号化装置に対応する
音声復号化装置のブロック図である。同図において、適
応コードブック３０は、乗算器３１とスイッチ３２を介
して加算器３３の第１入力端子に接続されている。確率
コードブック３６は、乗算器３７とスイッチ３８とを介
して加算器３３の第２入力端子に接続されている。加算
器３３の出力端子は遅延回路４０を介して適応コードブ
ック３０に接続されるとともに、出力端子３９を有する
合成フィルタ３４の第１入力端子に接続されている。FIG. 5 is a block diagram of a speech decoding apparatus corresponding to the speech encoding apparatus of FIG. 1. In the figure, the adaptive codebook 30 is connected to the first input terminal of the adder 33 via the multiplier 31 and the switch 32. The probability codebook 36 is connected to the second input terminal of the adder 33 via the multiplier 37 and the switch 38. The output terminal of the adder 33 is connected to the adaptive codebook 30 via the delay circuit 40, and is also connected to the first input terminal of the synthesis filter 34, which has the output terminal 39.

【００５４】また、デマルチプレクサ３５は、適応コー
ドブック３０と、確率コードブック３６と、乗算器３
１、３７と、合成フィルタ３４の第２入力端子とに接続
されている。The demultiplexer 35 is connected to the adaptive codebook 30, the probability codebook 36, the multipliers 31 and 37, and the second input terminal of the synthesis filter 34.

【００５５】上記した構成において、デマルチプレクサ
３５は受信した信号を線形予測パラメータαと、適応コ
ードブック３０の遅延Ｌ及びゲインβと、確率コードブ
ック３６のインデックスｉ及びゲインγと、音声判別情
報ｖ／ｕｖとに分解して、分解された線形予測パラメー
タαを合成フィルタ３４に、遅延Ｌとゲインβを各々適
応コードブック３０と乗算器３１に、インデックスｉと
ゲインγを各々確率コードブック３６と乗算器３７に、
音声判別情報ｖ／ｕｖをスイッチ３２に出力する。In the above configuration, the demultiplexer 35 separates the received signal into the linear prediction parameter α, the delay L and the gain β of the adaptive codebook 30, the index i and the gain γ of the probability codebook 36, and the voice discrimination information v/uv. It outputs the linear prediction parameter α to the synthesis filter 34, the delay L and the gain β to the adaptive codebook 30 and the multiplier 31, respectively, the index i and the gain γ to the probability codebook 36 and the multiplier 37, respectively, and the voice discrimination information v/uv to the switch 32.

【００５６】そして、デマルチプレクサ３５から出力さ
れた音声判別情報ｖ／ｕｖに基づいてスイッチ３２の開
閉動作を制御する。すなわち、音声判別情報ｖ／ｕｖが
音声信号であることを示していればスイッチ３２を閉じ
て適応コードブック３０からの情報を使用する。一方、
音声判別情報ｖ／ｕｖが非音声信号であることを示して
いればスイッチ３２を開いて適応コードブック３０を未
使用とする。The opening and closing of the switch 32 is then controlled based on the voice discrimination information v/uv output from the demultiplexer 35. That is, if the voice discrimination information v/uv indicates a voice signal, the switch 32 is closed and the information from the adaptive codebook 30 is used. If, on the other hand, the voice discrimination information v/uv indicates a non-voice signal, the switch 32 is opened and the adaptive codebook 30 is left unused.

【００５７】また、デマルチプレクサ３５から出力され
た適応コードブック３０の遅延Ｌに基づいて適応コード
ブック３０の適応コードベクトルを選択する。ここで適
応コードブック３０は図１に示す音声符号化装置におけ
る適応コードブック１の内容と同じ内容を有する。すな
わち、適応コードブック３０には、遅延回路４０を介し
て過去の駆動音源信号が入力される。乗算器３１は受信
したゲインβにより、入力された適応コードベクトルを
増幅し、加算器３３に送出する。An adaptive code vector of the adaptive codebook 30 is selected based on the delay L of the adaptive codebook 30 output from the demultiplexer 35. Here, the adaptive codebook 30 has the same contents as the adaptive codebook 1 in the speech encoding apparatus shown in FIG. 1. That is, past driving sound source signals are input to the adaptive codebook 30 via the delay circuit 40. The multiplier 31 amplifies the input adaptive code vector by the received gain β and sends it to the adder 33.

【００５８】デマルチプレクサ３５から出力された確率
コードブック３６のインデックスｉに基づいて確率コー
ドブック３６の確率コードベクトルを選択する。ここで
確率コードブック３６は図１に示す音声符号化装置にお
ける確率コードブック４の内容と同じ内容を有する。乗
算器３７は受信したゲインγにより、入力された確率コ
ードベクトルを増幅し、加算器３３に送出する。A probability code vector of the probability codebook 36 is selected based on the index i of the probability codebook 36 output from the demultiplexer 35. Here, the probability codebook 36 has the same contents as the probability codebook 4 in the speech encoding apparatus shown in FIG. 1. The multiplier 37 amplifies the input probability code vector by the received gain γ and sends it to the adder 33.

【００５９】加算器３３は増幅された適応コードベクト
ルと、増幅された確率コードベクトルとを加算して合成
フィルタ３４および遅延回路４０に送出する。合成フィ
ルタ３４は受信した線形予測パラメータαを係数として
合成処理を行い、合成音声信号を出力する。The adder 33 adds the amplified adaptive code vector and the amplified probability code vector and sends them to the synthesis filter 34 and the delay circuit 40. The synthesis filter 34 performs a synthesis process using the received linear prediction parameter α as a coefficient, and outputs a synthesized speech signal.
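
The decoding steps of paragraphs [0055] to [0059] can be sketched for one subframe as follows. This is an illustrative Python outline under stated assumptions: the all-pole filter sign convention (output equals input plus the weighted sum of past outputs), the periodic repetition of the adaptive vector when the delay is shorter than the subframe, and all function names are hypothetical and not taken from the patent.

```python
def adaptive_vector(past_excitation, delay, length):
    """Read `length` samples starting `delay` samples back in the past
    excitation, repeating periodically if delay < length (assumption)."""
    if not 1 <= delay <= len(past_excitation):
        raise ValueError("delay out of range")
    start = len(past_excitation) - delay
    return [past_excitation[start + (n % delay)] for n in range(length)]


def synthesize(excitation, alpha, state):
    """All-pole synthesis, assumed form y[n] = x[n] + sum_k alpha[k] * y[n-1-k].
    `state` holds previous outputs, most recent first, len(state) == len(alpha)."""
    hist = list(state)
    out = []
    for x in excitation:
        y = x + sum(a * h for a, h in zip(alpha, hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out, hist


def decode_subframe(past_excitation, codebook, alpha, state,
                    delay, beta, index, gamma, voiced):
    """Decode one subframe from the demultiplexed parameters."""
    c = codebook[index]                      # probability code vector (index i)
    if voiced:                               # switch 32 closed
        a = adaptive_vector(past_excitation, delay, len(c))
        exc = [beta * ai + gamma * ci for ai, ci in zip(a, c)]
    else:                                    # switch 32 open
        exc = [gamma * ci for ci in c]
    speech, state = synthesize(exc, alpha, state)
    # Delay circuit 40: feed the new excitation back into the adaptive codebook.
    past_excitation = (past_excitation + exc)[-len(past_excitation):]
    return speech, past_excitation, state
```

The returned `past_excitation` and `state` would be passed into the next call, so that the decoder's adaptive codebook stays in step with the encoder's.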

【００６０】上記したように本実施形態では、音声区間
と非音声区間との間でピッチ強調処理におけるゲイン調
整の割合を切り替えている。すなわち、入力信号が音声
信号であると判別されたときには通常のピッチ強調処理
を行なうが、非音声信号であると判別されたときには確
率コードブックのゲインを減衰させている。したがっ
て、雑音環境下においても音声信号を良好に符号化して
高音質の音声信号を得ることができる。また、音声判別
器１１内の閾値決定回路１２４によって符号化開始時の
フレームエネルギーに応じて判別閾値を決定するので、
背景雑音の大きさに適応して閾値が決定されることにな
り、より精度よく音声／非音声の判別ができる。As described above, this embodiment switches the ratio of gain adjustment in the pitch enhancement processing between voice sections and non-voice sections. That is, normal pitch enhancement processing is performed when the input signal is determined to be a voice signal, whereas the gain of the probability codebook is attenuated when it is determined to be a non-voice signal. Therefore, a speech signal can be encoded well even in a noisy environment, and a high-quality speech signal can be obtained. Furthermore, because the threshold determination circuit 124 in the voice discriminator 11 determines the discrimination threshold according to the frame energy at the start of encoding, the threshold adapts to the level of the background noise, and voice and non-voice can be discriminated more accurately.

【００６１】[0061]

【発明の効果】請求項１、２、３に記載の発明によれ
ば、雑音環境下においても音声信号を良好に符号化して
高音質の音声信号を得ることができる効果を奏する。According to the inventions of claims 1, 2, and 3, a speech signal can be encoded well even in a noisy environment, and a high-quality speech signal can be obtained.

【００６２】また、請求項４記載の発明によれば、請求
項１記載の発明の効果に加えて、より精度よく音声／非
音声の判別ができる効果を奏する。According to the invention described in claim 4, in addition to the effect of the invention described in claim 1, there is an effect that the voice / non-voice can be discriminated more accurately.

[Brief description of the drawings]

【図１】本発明が適用される音声符号化装置の構成を示
すブロック図である。FIG. 1 is a block diagram showing a configuration of a speech coding apparatus to which the present invention is applied.

【図２】図１に示す音声判別器の構成を示す図である。FIG. 2 is a diagram showing the configuration of the voice discriminator shown in FIG. 1.

【図３】背景雑音エネルギーと閾値との関係を示す図で
ある。FIG. 3 is a diagram showing a relationship between background noise energy and a threshold value.

【図４】一般化相互相関と確率コードブックのゲインの
倍率との関係を示す図である。FIG. 4 is a diagram showing a relationship between a generalized cross-correlation and a gain factor of a probability codebook.

【図５】図１に示す音声符号化装置に対応する音声復号
化装置の構成を示す図である。FIG. 5 is a diagram showing the configuration of a speech decoding apparatus corresponding to the speech encoding apparatus shown in FIG. 1.

【図６】従来の音声符号化装置のブロック図である。FIG. 6 is a block diagram of a conventional audio encoding device.

【図７】図６に示す音声符号化装置に対応する音声復号
化装置の構成を示す図である。FIG. 7 is a diagram showing the configuration of a speech decoding apparatus corresponding to the speech encoding apparatus shown in FIG. 6.

[Explanation of symbols]

１…適応コードブック、２、５…乗算器、３…加算器、
４…確率コードブック、６、１９…スイッチ、７…遅延
回路、８…合成フィルタ、９…バッファメモリ、１０…
ピッチ周期性分析器、１１…音声判別器、１２…減算
器、１３…聴感重み付けフィルタ、１４…誤差評価器、
１５…ゲイン調整器、１６…ＬＰＣ分析器、１７…サブ
フレーム分割器、１８…マルチプレクサ。1: adaptive codebook; 2, 5: multipliers; 3: adder; 4: probability codebook; 6, 19: switches; 7: delay circuit; 8: synthesis filter; 9: buffer memory; 10: pitch periodicity analyzer; 11: voice discriminator; 12: subtractor; 13: perceptual weighting filter; 14: error evaluator; 15: gain adjuster; 16: LPC analyzer; 17: subframe divider; 18: multiplexer.

Claims

[Claims]

1. A speech coding apparatus comprising: voice discriminating means for discriminating whether an input signal, divided into frame units of a predetermined frame interval, is a voice signal or a non-voice signal; linear prediction analysis means for analyzing the input signal and outputting its spectrum parameters; subframe division means for further dividing the frame interval of the input signal into predetermined subframe intervals; an adaptive codebook in which a plurality of signals created by delaying past sound source signals are stored in advance; a probability codebook in which a plurality of noise signal waveforms of the subframe interval are stored; driving sound source signal generation means for generating a driving sound source signal based on a signal output from either or both of the probability codebook and the adaptive codebook; a synthesis filter for synthesizing speech, based on the spectrum parameters, using the signals output from the probability codebook and the adaptive codebook as the driving sound source signal; error minimizing means for outputting codes corresponding to the adaptive codebook delay, the probability codebook index, and the respective gains that minimize distortion of the synthesized signal with respect to the input signal; pitch periodicity analyzing means for analyzing the pitch periodicity of the input signal; and gain adjusting means for increasing or decreasing the gain of both or one of the probability codebook and the adaptive codebook according to the discrimination result of the voice discriminating means and the analysis result of the pitch periodicity analyzing means.

2. The speech coding apparatus according to claim 1, wherein the driving sound source signal generation means generates the driving sound source signal from the signals output from the probability codebook and the adaptive codebook when the voice discriminating means determines that the input signal is a voice signal, and generates the driving sound source signal from the signal output from the probability codebook alone when the voice discriminating means determines that the input signal is a non-voice signal.

3. The speech coding apparatus according to claim 1, wherein, when the voice discriminating means determines that the input signal is a non-voice signal, the gain adjusting means attenuates the gain of both or one of the probability codebook and the adaptive codebook at a predetermined ratio relative to the case where the input signal is determined to be a voice signal.

4. The speech coding apparatus according to claim 1, 2, or 3, wherein the voice discriminating means discriminates voice from non-voice according to the energy level of the input signal for each frame and comprises threshold determining means for determining a discrimination threshold according to the frame energy at the start of encoding, and wherein the input signal is determined to be voice if the difference between the current frame energy and the frame energy at the start of encoding is larger than the discrimination threshold determined by the threshold determining means, and non-voice if it is smaller.