JPH08272394A

JPH08272394A - Voice encoding device

Info

Publication number: JPH08272394A
Application number: JP7074094A
Authority: JP
Inventors: Noriyuki Otsuka; 則幸大塚
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1995-03-30
Filing date: 1995-03-30
Publication date: 1996-10-18

Abstract

PURPOSE: To provide a voice encoding device that achieves good speech quality even in poor, noisy environments. CONSTITUTION: A pre-filter 104, which forms the front stage of the voice encoding device, consists of a segmentation means 200, a formant emphasis means 201, and a gain control means 202. A voice signal supplied from an input terminal 100 to the segmentation means 200 and the formant emphasis means 201 is classified as speech or noise by the segmentation means 200, and the resulting speech/noise decision information is passed to the gain control means 202. Meanwhile, the formant emphasis means 201 emphasizes the formants of the voice signal in every subframe using the dequantized LPC coefficients and supplies it to the gain control means 202, which controls its gain based on the speech/noise decision information and then supplies it to a perceptual weighting filter 105.

Description

Detailed Description of the Invention

[0001]

Field of Industrial Application: The present invention relates to a noise removal device for a speech coding system to which the "A-b-S" (Analysis-by-Synthesis) method is applied, such as CELP (Code-Excited Linear Predictive Coding) and MPC (Multi-Pulse Excited Linear Predictive Coding).

[0002]

Description of the Related Art: In recent years, there has been a need to digitize and encode speech signals so that the memory and other resources of communication equipment can be used efficiently. In particular, hybrid coding techniques based on "analysis by synthesis", that is, the A-b-S method, such as the MPC and CELP schemes mentioned above, are being actively studied.

[0003] CELP, a representative technique of this kind, is described in M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): High-quality speech at very low bit rates," Proc. ICASSP, pp. 937-940, 1985. As that paper shows, the input signal is subjected to linear predictive analysis to compute linear prediction (LPC, Linear Predictive Coding) coefficients. A synthesized signal is produced from an excitation composed of white noise from a codebook and the pitch extracted by a long-term predictor, and perceptual weighting is applied to it. The index that minimizes the squared error between this perceptually weighted synthesized signal and the likewise perceptually weighted input signal is transferred to the output side together with the pitch and the LPC coefficients. In other words, the technique obtains the index, pitch, and LPC coefficients at the smallest distance from the input signal: it selects the excitation whose perceptually weighted synthesized signal most closely approximates the perceptually weighted input signal. This approach is abbreviated as the "A-b-S method".
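The A-b-S selection rule above can be illustrated with an exhaustive codebook search. This is a minimal sketch, not the patent's implementation, assuming a small white-noise codebook, an all-pole synthesis filter 1/A(z) with A(z) = 1 - sum a_i z^-i, zero filter state, and an optimal gain computed per candidate; perceptual weighting and the long-term (pitch) predictor are omitted, and the names `synthesize` and `search_codebook` are hypothetical.

```python
import random

def synthesize(excitation, a):
    """All-pole synthesis y[n] = e[n] + sum_i a[i] * y[n - i], zero initial state.
    Assumed convention: A(z) = 1 - sum_i a_i z^-i."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, squared_error) minimizing ||target - g * synth(code)||^2."""
    best = None
    for idx, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot be scaled to fit the target
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares optimal gain for this candidate
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    sub_len, size = 40, 64  # hypothetical: 40-sample subframe, 64 codevectors
    codebook = [[rng.gauss(0.0, 1.0) for _ in range(sub_len)] for _ in range(size)]
    a = [0.9, -0.2]  # hypothetical, stable 2nd-order predictor
    target = [1.5 * v for v in synthesize(codebook[17], a)]
    idx, gain, err = search_codebook(target, codebook, a)
    print(idx, round(gain, 3), err < 1e-9)
```

Because the target here is built from codevector 17 scaled by 1.5, the search recovers that index and gain with near-zero error. A real coder would search the perceptually weighted domain and usually restrict or structure the codebook to keep the search cheap.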

[0004] With this method, a high-quality speech signal is reproduced when, for example, the input signal contains only speech. In real operating environments, however, the input signal does not necessarily contain speech alone. In a car, for instance, it also contains engine noise and road noise, and indoors it usually contains background noise such as that generated by air conditioners and other electrical appliances. Consequently, encoding the raw input signal directly degrades the quality of the speech itself.

[0005] Various methods have therefore been proposed for suppressing such noise components in the input signal. For example, as taught by Suda, Ikeda, and Ikedo in "PSI-CELP error control and signal processing technology" (NTT R&D vol.No.4, pp. 373-380, 1994), a "Kalman filter" can be used as preprocessing to suppress the noise component of the input signal. However, the Kalman filter requires complex arithmetic to perform its function, so real-time processing demands dedicated large-scale hardware.

[0006] Accordingly, an object of the present invention is to solve the above problem and provide a speech coding device that achieves good speech quality even in poor, noisy environments.

[0007]

Means for Solving the Problem: To overcome the above problem and achieve this object, the present invention takes the following approach. It emphasizes the phoneme-characterizing components (formants) contained in the input speech signal (hereinafter "formant emphasis"), divides the signal into segments of speech or noise (hereinafter "segmentation"), and performs encoding on the signal obtained by suppressing the gain of the noise (hereinafter the "reference signal").

[0008] That is, the present invention is a speech coding device that performs encoding by "analysis by synthesis", i.e. the A-b-S method, against a reference signal in which the noise gain has been suppressed. The device comprises means for emphasizing the formants of the reference signal, segmentation means for dividing the signal into speech portions and noise portions, and means for suppressing the level of the noise portions obtained by the segmentation means.

[0009]

Operation: With the above means, the present invention operates as follows. For each subframe, the input signal is segmented into speech and noise and is formant-emphasized; the amplitude of the noise portions is then suppressed, and the resulting noise-suppressed signal becomes the reference signal used in the A-b-S method.

[0010] Further, in the present invention, linear prediction coefficients are extracted from the input signal for each frame, and formant emphasis and filtering are applied in each subframe based on those coefficients. The input signal is segmented into speech and noise, the amplitude of the noise portions is suppressed, and the noise-suppressed signal becomes the reference signal used in the A-b-S method.

[0011] As a result, noise can be suppressed subframe by subframe in real time with simpler processing than in the prior art. In addition, because the same linear prediction coefficients are applied throughout the processing, the hardware configuration of the device itself is simplified.

[0012]

Embodiments

(First Embodiment) FIG. 1 shows the configuration of a first embodiment of the speech coding device according to the present invention. The input to this device is a speech signal sampled at, for example, 8 kHz and divided into frames of about 20 to 30 ms, supplied from terminal 100. The input signal branches to an LPC analyzer 101, which performs linear prediction (LPC) on it, and to a pre-filter 104, which preprocesses it.

[0013] As illustrated, the LPC analyzer 101 is connected to an LPC quantizer 102, which quantizes the LPC coefficients obtained from the input signal. The LPC quantizer 102 is in turn connected to an LPC dequantizer 103, which dequantizes the quantized LPC coefficients, and to a multiplexer 119, which performs multiplexing as the output stage of the device.

[0014] The pre-filter 104, for its part, is connected to a perceptual weighting filter 105, which applies perceptual weighting, and to the LPC dequantizer 103. The perceptual weighting filter 105 is further connected to a zero-input filter 106, which outputs the reference signal.

[0015] The LPC dequantizer 103 is connected not only to the pre-filter 104 but also to the perceptual weighting filter 105, the zero-input filter 106, and a weighted synthesis filter 107, which synthesizes weighted synthetic speech.

[0016] Each component of the device functions as follows. First, the LPC analyzer 101 performs LPC analysis of roughly order 10 to 12 on each frame and outputs the resulting linear prediction coefficients (LPC coefficients) to the LPC quantizer 102.
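The patent does not say how the LPC analyzer 101 computes its coefficients. A common choice is the autocorrelation method solved with the Levinson-Durbin recursion; the sketch below assumes that method, a Hamming window, and the prediction convention x[n] ≈ sum a_i x[n-i]. The function names are hypothetical.

```python
import math

def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..order."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order], where x[n] is predicted as
    sum_i a[i] * x[n - i]. Returns (coefficients, final prediction error)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: remaining coefficients stay zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection (PARCOR) coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_analysis(frame, order=10):
    """Hamming-windowed autocorrelation LPC analysis of one frame (window is an assumption)."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(xw, order), order)
```

For example, the autocorrelation sequence [1.0, 0.5, 0.25, 0.125] of a first-order process yields coefficients [0.5, 0.0, 0.0] at order 3, since the higher-order terms add no predictive power. Order 10 matches the lower end of the 10 to 12 range given above.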

[0017] The LPC quantizer 102 applies a predetermined quantization to the LPC coefficients received from the LPC analyzer 101, reducing them to the number of bits needed for transfer to the output side, and supplies the result as the LPC coefficient index ci to the multiplexer 119 and the LPC dequantizer 103. The LPC dequantizer 103 dequantizes the LPC coefficient index ci, and the dequantized LPC coefficients are supplied to the pre-filter 104, the perceptual weighting filter 105, the zero-input filter 106, and the weighted synthesis filter 107. At the same time, the speech signal from input terminal 100 is pre-filtered by the pre-filter 104 in subframes, each one quarter of a frame. The LPC coefficients given to the pre-filter 104 are the same as those used by the perceptual weighting filter 105, the zero-input filter 106, and the weighted synthesis filter 107.
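The frame and subframe sizes follow directly from the figures in [0012] and [0017]: at 8 kHz, a 20 ms frame holds 160 samples and each quarter-frame subframe holds 40 (240 and 60 for a 30 ms frame). A minimal sketch of that bookkeeping, with hypothetical helper names:

```python
def frame_sizes(sample_rate_hz=8000, frame_ms=20, subframes_per_frame=4):
    """Return (samples per frame, samples per subframe)."""
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len % subframes_per_frame:
        raise ValueError("frame length must divide evenly into subframes")
    return frame_len, frame_len // subframes_per_frame

def split_subframes(signal, frame_len, sub_len):
    """Yield (frame_index, subframe_index, samples); a trailing partial frame is dropped."""
    for f in range(len(signal) // frame_len):
        frame = signal[f * frame_len:(f + 1) * frame_len]
        for s in range(frame_len // sub_len):
            yield f, s, frame[s * sub_len:(s + 1) * sub_len]
```

Here `frame_sizes()` gives (160, 40) and `frame_sizes(frame_ms=30)` gives (240, 60). LPC analysis would run once per frame, while the pre-filter and the filters sharing its coefficients run once per subframe.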

[0018] Using the same subframe as the processing unit for every filter simplifies the configuration of the device itself. The detailed configuration and operation of the pre-filter 104 are now described with reference to FIG. 3.

[0019] As illustrated, the pre-filter 104 consists of a segmentation means 200, a formant emphasis means 201, and a gain control means 202. The segmentation means 200 and the formant emphasis means 201 are each connected to the input terminal 100. The segmentation means 200 is connected to the gain control means 202, which in turn is connected to the perceptual weighting filter 105. The formant emphasis means 201 is also connected to the gain control means 202.

[0020] The pre-filter 104 supplies the speech signal from input terminal 100 to both the segmentation means 200 and the formant emphasis means 201. The segmentation means 200 classifies the signal as speech or noise and passes this decision information to the gain control means 202. At the same time, the formant emphasis means 201 emphasizes the formants of the speech signal in each subframe using the dequantized LPC coefficients and supplies the result to the gain control means 202, which controls its gain based on the speech/noise decision information and then supplies it to the perceptual weighting filter 105.

[0021] Returning to the block diagram of FIG. 1, the speech signal produced as described above is perceptually weighted by the perceptual weighting filter 105, passed through the zero-input filter 106, and then input to a subtractor 120 as the reference signal.

[0022] Meanwhile, synthetic speech is generated as follows. First, an evaluator 108 gives an appropriate initial index value to an adaptive codebook 109, which outputs the code corresponding to that index to a multiplier 112. In parallel, an initial index value is given from a control means (not shown) via terminal 117 to an adaptive gain codebook 114, which outputs the corresponding gain to the multiplier 112. The multiplier 112 multiplies the code from the adaptive codebook by the gain from the adaptive gain codebook, and the product is input via an adder 116 to the weighted synthesis filter 107. The weighted synthesis filter 107 synthesizes weighted synthetic speech from this signal; the subtractor 120 takes the difference between it and the reference signal output by the zero-input filter 106, and this difference is input to the evaluator 108 as an error signal.

[0023] While updating the index given to the adaptive codebook 109, the evaluator 108 finds the adaptive codebook index ia, together with the gain and its index iga, that minimize the squared error computed from this error signal. This completes the adaptive codebook search.

[0024] Next, with the adaptive codebook index ia and the gain corresponding to the adaptive gain codebook index iga found in the adaptive codebook search held fixed, the stochastic codebook 110 and the stochastic gain codebook 115 are searched in the same way as the adaptive codebook. The adder 116 adds the excitation built from the index and gain found in the adaptive codebook search to the excitation built from the code of the stochastic codebook 110 and the gain of the stochastic gain codebook 115, producing a new excitation signal. This signal is synthesized by the weighted synthesis filter 107, and the stochastic codebook index is and stochastic gain codebook index igs that minimize the squared error computed from the error signal between the synthetic speech and the input signal are determined. The adaptive codebook index ia, adaptive gain codebook index iga, stochastic codebook index is, stochastic gain codebook index igs, and LPC coefficient index ci obtained in this way are multiplexed by the multiplexer 119 and output to a recording medium or communication channel (not shown). In addition, the excitation signal formed through the adder 116 from the indices is, igs, iga, and ia is input to a shift register 111 and delayed by one subframe (one quarter of a frame); this "past" excitation, one subframe old, is input to the adaptive codebook 109 and forms its codes.
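The shift-register update of the adaptive codebook can be sketched as a buffer of past excitation that advances by one subframe after each search. This is an illustrative sketch only: the class name, the buffer length `max_lag`, and the restriction that lags are at least one subframe long are assumptions (practical coders also handle shorter lags by repeating the segment).

```python
class AdaptiveCodebook:
    """Hypothetical past-excitation buffer feeding the adaptive codebook."""

    def __init__(self, max_lag, sub_len):
        if max_lag < sub_len:
            raise ValueError("max_lag must be at least one subframe")
        self.sub_len = sub_len
        self.history = [0.0] * max_lag  # oldest sample first, newest last

    def codevector(self, lag):
        """Return the sub_len samples that start `lag` samples in the past."""
        if not self.sub_len <= lag <= len(self.history):
            raise ValueError("lag outside the supported range")
        start = len(self.history) - lag
        return self.history[start:start + self.sub_len]

    def update(self, excitation):
        """Shift by one subframe: drop the oldest sub_len samples, append the new excitation."""
        if len(excitation) != self.sub_len:
            raise ValueError("excitation must be exactly one subframe long")
        self.history = self.history[self.sub_len:] + list(excitation)
```

After `update([1, 2, 3, 4])` on a buffer with `max_lag=8` and `sub_len=4`, `codevector(4)` returns the excitation just written, which is the one-subframe-old "past" excitation that the adaptive codebook searches over on the next subframe.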

[0025] The operation of the pre-filter 104 is now described in detail. The input signal is supplied subframe by subframe, and the segmentation means 200 classifies it as speech or noise and turns its output, the speech flag, on or off accordingly. For example, the speech flag is defined to be on when the input signal is speech and off when it is noise. Next, the formant emphasis means 201 emphasizes the formants of the input signal according to Equation 1 below.

[0026]

[Equation 1]

[0027] In Equation 1, α_i is the i-th order LPC coefficient, N is the order of the LPC coefficients, η is the weighting coefficient of the zero filter, and ν is the weighting coefficient of the pole filter. These two coefficients satisfy 0.0 < η < ν < 1.0; a suitable example is η = 0.2 and ν = 0.7.
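Equation 1 itself is not reproduced here. A common pole-zero formant filter consistent with the definitions in [0027] is H(z) = (1 - sum_{i=1..N} η^i α_i z^-i) / (1 - sum_{i=1..N} ν^i α_i z^-i), assuming the convention A(z) = 1 - sum α_i z^-i. The sketch below implements that assumed form, not necessarily the patent's exact equation, and carries filter state across subframes; the function name and state layout are hypothetical.

```python
def formant_emphasis(x, alpha, eta=0.2, nu=0.7, state=None):
    """Assumed form: y[n] = x[n] - sum_i eta^i a_i x[n-i] + sum_i nu^i a_i y[n-i].

    alpha: LPC coefficients a_1..a_N. state: (past_x, past_y), newest sample first.
    Returns (output samples, new state) so the next subframe continues seamlessly."""
    n_coef = len(alpha)
    zero_taps = [(eta ** (i + 1)) * alpha[i] for i in range(n_coef)]  # A(z/eta)
    pole_taps = [(nu ** (i + 1)) * alpha[i] for i in range(n_coef)]   # A(z/nu)
    if state is None:
        past_x, past_y = [0.0] * n_coef, [0.0] * n_coef
    else:
        past_x, past_y = list(state[0]), list(state[1])
    y = []
    for xn in x:
        yn = (xn
              - sum(b * v for b, v in zip(zero_taps, past_x))
              + sum(b * v for b, v in zip(pole_taps, past_y)))
        past_x = ([xn] + past_x)[:n_coef]
        past_y = ([yn] + past_y)[:n_coef]
        y.append(yn)
    return y, (past_x, past_y)
```

Because η < ν, the pole part dominates near the LPC resonances, so formant peaks are raised relative to the valleys between them. With α = [0.5], an impulse input produces 1.0, 0.25, 0.0875, and so on.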

[0028] This filter emphasizes the formants in the frequency domain and conversely suppresses the anti-formant regions, improving the intelligibility of speech on which noise is superimposed. Next, the steps of the gain control process performed by the gain control means are described with reference to the flowchart of FIG. 5.

[0029] The gain control means 202 suppresses the amplitude of the signal in sections containing background noise. Before and after each noise section, it provides "fade-in/fade-out" regions in which the amplitude is suppressed stepwise (described below).

[0030] First, it is determined whether the speech flag is on or off (S302). If the speech flag is off, the gain value is checked, that is, it is determined whether the gain equals the lower limit of the silence suppression coefficient (for example, 0.5) (S303).

[0031] If the gain equals this lower limit of 0.5, that gain is multiplied by the formant-emphasized input speech (S304). Otherwise, the gain is reduced by a fixed step and the reduced gain is multiplied by the formant-emphasized input speech (S305).

[0032] If, for example, the gain is lowered linearly over 10 subframes from the unattenuated state (1.0) to the lower limit of the noise suppression coefficient (0.5), the fixed step is 0.05 per subframe. This processing is called "fade-out".

[0033] If the speech flag is on, the gain value is checked in the same way as when it is off (S304). If the gain equals the upper limit of the speech-portion suppression coefficient, i.e. 1.0, the formant-emphasized input speech is output unchanged.

[0034] If the gain is not yet at the upper limit of 1.0, it is raised by a fixed step and the raised gain is multiplied by the formant-emphasized input speech (S307). This processing is called "fade-in".

[0035] Likewise, if the gain is raised linearly over 10 subframes from the lower limit of 0.5 to the upper limit of 1.0, as in the fade-out described above, the step can be set to 0.05 per subframe.
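The fade-out/fade-in rule of [0030] to [0035] can be sketched as a per-subframe gain that steps by 0.05 and is clamped to the range [0.5, 1.0]. This is a minimal sketch assuming linear steps as in the example; it merges the flowchart's "check, then adjust" into a single clamp, which gives the same gains when the step divides the range evenly. The function name and parameters are hypothetical.

```python
def gain_control(subframes, speech_flags, floor=0.5, ceiling=1.0, step=0.05, gain=1.0):
    """Scale each formant-emphasized subframe by a linearly ramped gain.

    Noise subframes lower the gain by `step` until it reaches `floor` (fade-out);
    speech subframes raise it by `step` until it reaches `ceiling` (fade-in).
    Returns (scaled_subframes, gains)."""
    scaled, gains = [], []
    for samples, is_speech in zip(subframes, speech_flags):
        if is_speech:
            gain = min(ceiling, gain + step)  # fade-in, or hold at the upper limit
        else:
            gain = max(floor, gain - step)    # fade-out, or hold at the lower limit
        gain = round(gain, 9)  # stop floating-point drift so the limits are hit exactly
        gains.append(gain)
        scaled.append([gain * s for s in samples])
    return scaled, gains
```

Feeding 12 noise subframes, one subframe misjudged as speech, and one more noise subframe gives gains that reach 0.5 at the 10th subframe, rise only to 0.55 for the misjudged subframe, and return to 0.5, which is the impulsive-noise behaviour described in [0037].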

[0036] In the above example, fade-in and fade-out change the gain linearly over the same number of subframes. However, fade-in and fade-out may span different numbers of subframes, and the change need not be linear.

[0037] Changing the gain over several subframes in this way smooths transitions of the input signal from speech to noise and from noise to speech. Moreover, if impulsive noise during a continuing noise section causes the segmentation means 200 to misjudge a single subframe as speech, the gain only rises by 0.05 from 0.5, so the signal is multiplied by 0.55. The impulsive noise in the reference signal is therefore 0.55 times the actual impulsive noise, and impulsive noise is reduced as a result.

[0038] By using the input signal processed as described above as a new input signal, noise can be suppressed during encoding. In the above configuration, the pre-filter 104 is independent of the subsequent coding stages, so it can be switched on or off; when it is switched off, the device functions as a conventional encoder. By providing switching means for this purpose, the pre-filter can be turned off when the noise is too small to affect speech quality, saving the corresponding power consumption.

[0039] Segmentation of the input signal into speech portions and background-noise portions uses a threshold. The following describes a method that adapts this threshold so that segmentation remains appropriate even when the background noise changes over time.

[0040] The threshold is derived from the energy or the sum of amplitudes of the input signal in each subframe. The energy is the sum of the squared sample values over all samples in the subframe, and the sum of amplitudes is the sum of the absolute sample values over all samples in the subframe.

[0041] Next, the processing when the energy of the input signal in each subframe is used is described. First, an example of the basic segmentation algorithm is shown.
(1) A fixed energy value Eth is set as the threshold between speech and noise (the upper limit of the threshold).
(2) The average energy over 10 subframes is taken as the initial threshold Eth1.
(3) Shifting by one subframe at a time, the average energy over the latest 10 subframes is computed as the provisional threshold Eth2.
(4) If the ratio of the provisional threshold Eth2 to the initial threshold Eth1 is Cth or less, the subframe is judged to be noise, and Eth1 is replaced by Eth2.
(5) If the ratio of Eth2 to Eth1 exceeds Cth, the subframe is judged to be speech (that is, a speech onset is detected).
(6) If subframes in which Eth2 is below Eth and above Eth1 accumulate to 5, speech is judged, and Eth1 is replaced by Eth2. (This state is called the "speech state".)
(7) If Eth2 is at or below Eth1 for 5 consecutive subframes, noise is judged, and Eth1 is replaced by Eth2. (This state is called the "noise state".)
Next, the steps of the segmentation processing performed by the segmentation means are described with reference to the flowchart of FIG. 6.

[0042] Before this sequence of processing is executed, an initialization flag indicating its start is turned on. The initialization flag is set, for example, to turn on when the device's power is switched on or off (that is, at power-on).

[0043] The energy eng of the input signal is then calculated for each subframe (S400). The processing corresponding to (1) above determines Eth: a value at which speech and noise can be clearly separated in terms of per-subframe input energy is found and taken as the threshold Eth. Eth is the maximum value of the speech/noise threshold; any subframe whose energy exceeds it is unconditionally judged to be speech.

[0044] The processing corresponding to (2) above obtains the initial threshold Eth1. First, the initialization flag is checked (S401). If it is on, the per-subframe input energy eng is compared with the speech/noise threshold Eth (S402). If eng is greater than Eth, the speech flag is set, that is, turned on (S407). The input energy eng is then stored, for example for 10 subframes, in an energy buffer within the segmentation means, ordered from oldest to newest (S403).

[0045] This energy buffer works like a shift register and always holds the energies of the latest 10 subframes. If eng is at or below Eth, the speech flag is turned off (S404), and it is checked whether the energy buffer has been filled with 10 subframes of energy (S405). If the buffer is not yet full, the routine exits with the speech flag off, that is, it branches to the end.

[0046] When the buffer is full, the average of the 10 stored subframe energies is computed and held as the initial threshold Eth1 (S406), and at the same time the initialization flag is turned off.
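The initialization phase (S400 to S407) can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the energy definition (sum of squared samples), the class and method names, and the assumption that only subframes at or below Eth are written to the buffer at S403 are all assumptions made for this example.

```python
from collections import deque

BUFFER_LEN = 10  # number of subframe energies averaged, as in the example above


def subframe_energy(samples):
    # Assumed definition of eng: sum of squared samples in the subframe.
    return sum(s * s for s in samples)


class InitPhase:
    """Hypothetical sketch of steps S400-S407 while the initialization flag is on."""

    def __init__(self, eth):
        self.eth = eth                           # fixed speech/noise threshold Eth
        self.buffer = deque(maxlen=BUFFER_LEN)   # shift-register-like energy buffer
        self.init_flag = True
        self.eth1 = None                         # initial threshold Eth1, set once the buffer fills

    def process(self, samples):
        """Return the speech flag for one subframe."""
        eng = subframe_energy(samples)           # S400
        if eng > self.eth:                       # S402
            return True                          # S407: unconditionally speech
        self.buffer.append(eng)                  # S403 (assumed: only sub-Eth energies stored)
        if len(self.buffer) < BUFFER_LEN:        # S405: buffer not yet full
            return False                         # S404: speech flag off, exit
        self.eth1 = sum(self.buffer) / BUFFER_LEN  # S406: average becomes Eth1
        self.init_flag = False                   # initialization finished
        return False
```

Once `init_flag` drops to `False`, the per-subframe update described in (3) to (7) takes over.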

[0047] The averaging is needed to suppress, as far as possible, the variation in energy between subframes. Although this example averages 10 subframes, using a power of 2 instead, such as 4, 8, or 16, is better suited to reducing the amount of computation, because in fixed-point arithmetic the division then reduces to a single right shift.
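To illustrate the right-shift remark, here is a minimal sketch assuming non-negative integer (fixed-point) energies; the function name `mean_pow2` is hypothetical.

```python
def mean_pow2(energies):
    """Mean of 2**k non-negative integer energies, dividing by a right shift."""
    n = len(energies)
    if n == 0 or n & (n - 1):          # n must be a power of two
        raise ValueError("buffer length must be a power of 2")
    shift = n.bit_length() - 1         # log2(n)
    return sum(energies) >> shift      # equals floor(sum / n) for non-negative sums
```

For example, `mean_pow2([3, 5, 7, 9, 11, 13, 15, 17])` sums to 80 and shifts right by 3, giving 10. With a 10-entry buffer a true division (or a multiply by a fixed-point reciprocal) would be needed instead.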

[0048] The process corresponding to (3) above obtains the provisional threshold Eth2. First, the oldest energy in the energy buffer of the segmentation means is discarded and the newly calculated energy is written in, updating the buffer (S408). The average is then computed in the same way as when the initialization flag is on, and held as the provisional threshold Eth2 (S409).

[0049] In the process corresponding to (4) above, as in the initialization operation, the subframe energy eng obtained here is first compared with the fixed speech/noise threshold Eth (S410). If eng is larger than Eth, the speech flag is turned on; at the same time, a speech state counter, which counts subframes in the speech state, is set to 5, and a noise state counter, which counts subframes in the noise state, is set to 5 (S407).

[0050] As described later, the speech state counter and the noise state counter are each decremented (that is, reduced by one) while in the corresponding state. If eng is less than or equal to Eth, the provisional threshold Eth2 is compared with the initial threshold Eth1 (S411). If Eth2 is larger than Eth1, that is, if the signal energy is tending to increase, the noise state counter is set to 5 (S412); the ratio of Eth2 to Eth1 is then computed and compared with a constant Cth (S413). If the ratio is less than or equal to Cth, the increase is regarded as subframe-to-subframe fluctuation within a noise section: the speech flag is turned off (S414) and Eth1 is replaced by Eth2 (S415).

[0051] Through this series of steps, even when the background noise of the environment in which the device is used increases over time, the threshold is changed so as to follow the background noise, provided its rate of change stays at or below a certain rate, namely Cth. This relies on the principle that background noise rises far more gradually than speech does at its onset.

[0052] In the process corresponding to (5) above, if the ratio exceeds Cth, the subframe is judged to be speech and the speech flag is turned on (S416). This step detects the onset (rising edge) of the input signal.

[0053] In the process corresponding to (6) above, the speech state counter is checked for 0 (S417). If it is not 0, the counter is decremented (S420). If it is 0, the initial threshold Eth1 is replaced by the provisional threshold Eth2 (S418) and the speech state counter is reset to 5.

[0054] This processing makes it easier to detect the end of a signal whose onset has been detected. Once the onset has been detected, Eth1 is no longer updated by the noise-tracking step; instead, whenever the speech state has continued for five or more subframes, Eth1 is rewritten while the speech flag stays on, which sets a new initial threshold Eth1 for detecting the end of the signal.

[0055] In the process corresponding to (7) above, if Eth2 is less than or equal to Eth1, that is, if the signal energy is tending to decrease, the subframe is judged to be noise and the speech flag is turned off (S421). The noise state counter is then checked for 0 (S422); if it is not 0, the counter is decremented (S425). If it is 0, Eth1 is replaced by Eth2 (S423) and the noise state counter is set to 5 again.

[0056] Like steps S417 to S420, this processing makes the end of a detected signal easier to find. When the noise state has continued for five subframes, the speech flag is turned off and Eth1 is rewritten, setting a new initial threshold Eth1 in preparation for the next onset.
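Steps (3) to (7) can be combined into a single per-subframe update. The Python sketch below rests on several assumptions: the buffer already holds N energies from initialization, the class and attribute names are hypothetical, and the S4xx comments map the branches to the text above as read here. The ratio test Eth2/Eth1 ≤ Cth is written as Eth2 ≤ Cth·Eth1 to avoid dividing by a zero Eth1. It illustrates the described logic and is not the patent's code.

```python
from collections import deque

COUNT_RESET = 5  # reload value for both state counters, as in the text


class Segmenter:
    """Hypothetical per-subframe speech/noise decision after initialization (S408-S425)."""

    def __init__(self, eth, cth, eth1, history):
        if not history:
            raise ValueError("need a non-empty energy history from initialization")
        self.eth = eth                      # fixed speech/noise threshold Eth
        self.cth = cth                      # ratio limit Cth for Eth2 / Eth1
        self.eth1 = eth1                    # initial threshold Eth1
        self.buffer = deque(history, maxlen=len(history))  # latest N subframe energies
        self.speech_count = COUNT_RESET     # speech state counter
        self.noise_count = COUNT_RESET      # noise state counter

    def step(self, eng):
        """Consume one subframe energy and return the speech flag."""
        self.buffer.append(eng)                       # S408: oldest energy drops out
        eth2 = sum(self.buffer) / len(self.buffer)    # S409: provisional threshold Eth2
        if eng > self.eth:                            # S410: unconditionally speech
            self.speech_count = COUNT_RESET           # S407: reload both counters
            self.noise_count = COUNT_RESET
            return True
        if eth2 > self.eth1:                          # S411: energy is rising
            self.noise_count = COUNT_RESET            # S412
            if eth2 <= self.cth * self.eth1:          # S413: slow rise, treat as noise drift
                self.eth1 = eth2                      # S414-S415: follow the noise floor
                return False
            if self.speech_count == 0:                # S416-S417: onset, speech held long enough
                self.eth1 = eth2                      # S418: rebase Eth1, flag stays on
                self.speech_count = COUNT_RESET
            else:
                self.speech_count -= 1                # S420
            return True
        if self.noise_count == 0:                     # S421-S422: energy falling, noise
            self.eth1 = eth2                          # S423: rebase Eth1 for the next onset
            self.noise_count = COUNT_RESET
        else:
            self.noise_count -= 1                     # S425
        return False
```

A slow drift in energy moves Eth1 along with the noise floor, while a jump of more than Cth relative to Eth1 raises the speech flag.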

[0057] This pre-filter can also be used on its own as a noise suppression filter, for example as a so-called post-filter after decoding. The operation of the segmentation means described above can also be applied to speech processing other than speech coding, such as speech recognition.

[0058] (Second Embodiment) FIG. 2 is a functional block diagram of a second embodiment of the speech coding apparatus of the present invention.

[0059] This embodiment is characterized in that the formant emphasis means (see FIG. 4) performs the formant emphasis processing and the perceptual weighting processing simultaneously. As in the first embodiment, a speech signal sampled at 8 kHz and divided into frames of about 20 to 30 ms is input from the input terminal 100 to the LPC analyzer 101 and the pre-filter 104'.

[0060] The LPC analyzer 101 performs LPC analysis of about order 10 to 12 on each frame and outputs the resulting LPC coefficients to the LPC quantizer 102. As shown in the figure, the LPC analyzer 101 is connected to the LPC quantizer 102, which quantizes the LPC coefficients obtained from the input signal. The LPC quantizer 102 is in turn connected to the LPC dequantizer 103, which dequantizes the quantized LPC coefficients, and to the multiplexer 119, which performs multiplexing as the output stage of the apparatus.

[0061] The pre-filter 104' is connected to the LPC dequantizer 103 and to the zero-input filter 106, which outputs the reference signal. The LPC dequantizer 103 is connected not only to the pre-filter 104' and the zero-input filter 106 but also to the weighted synthesis filter 107, which synthesizes weighted synthetic speech.

[0062] The LPC quantizer 102 quantizes the LPC coefficients received from the LPC analyzer 101 (which analyzes the signal from the input terminal 100), reducing them to the number of bits needed for transmission, and supplies the result as the LPC coefficient index ci to the multiplexer 119 and the LPC dequantizer 103. The LPC dequantizer 103 dequantizes the index ci, and the dequantized LPC coefficients are supplied to the pre-filter 104', the zero-input filter 106, and the weighted synthesis filter 107.

[0063] In parallel with the processing above, the speech signal from the input terminal 100 is pre-filtered by the pre-filter 104' in units of subframes, each one quarter of a frame. In each subframe the factor responsible for phonetic quality, the so-called formant, is emphasized, and at the same time a perceptually weighted input signal is produced.

[0064] One structural difference from the first embodiment is that the pre-filter 104' is connected directly to the zero-input filter 106, rather than through the perceptual weighting filter 105, so the output of the pre-filter 104' is fed straight into the zero-input filter 106.

[0065] The detailed structure and operation of the pre-filter 104' of the second embodiment are described with reference to FIG. 4. As shown, the pre-filter 104' consists of the segmentation means 200, the formant emphasis means 201', and the gain control means 202. The segmentation means 200 and the formant emphasis means 201' are each connected to the input terminal 100. The segmentation means 200 is connected to the gain control means 202, which in turn is connected to the zero-input filter 106. The formant emphasis means 201' feeds the gain control means 202, whose output goes directly to the zero-input filter 106.

[0066] In the pre-filter 104', the speech signal from the terminal 100 is supplied to the segmentation means 200 and to the formant emphasis means 201'. The segmentation means 200 classifies the signal into speech and noise and passes the decision information to the gain control means 202. Within the formant emphasis means 201', formant emphasis and perceptual weighting are performed together: both use the same LPC coefficients and the same subframe as the processing unit, so they are carried out at the same time. In other words, the processing of the formant emphasis means 201' in the second embodiment is achieved by passing the input signal through the filter function given by Equation 2 below.

[0067]

[Equation 2]

[0068] In Equation 2, α_i is the i-th order LPC coefficient, η is the weighting coefficient of the zero filter, and ν' is the weighting coefficient of the pole filter; together they are set to values that combine perceptual weighting with formant emphasis. The two coefficients satisfy 0.0 < η < ν' < 1.0. A suitable example is η = 0.2 and ν' = 0.64.

[0069] Ordinary perceptual weighting uses values of about η = 1 and ν = 0.8, whereas formant emphasis for noise suppression works well with values of about η = 0.2 and ν = 0.8. To obtain both effects at once, the formant emphasis means 201' is set with values such as those given above. The input signal, with its formants emphasized and perceptual weighting applied in each subframe, is supplied to the gain control means 202, where its gain is controlled on the basis of the speech/noise decision information, and is then supplied to the zero-input filter 106.

[0070] The speech/noise decision in the segmentation means 200 is based on a preset threshold. Thresholds are set in advance to values suited to representative environmental conditions and stored in a storage means (not shown). During encoding, the user of the apparatus selects the environmental condition at the time of use with a manual switch, and the threshold best suited to that condition is read from the storage means and used for segmentation.

[0071] Alternatively, the background noise level in the input speech signal may be measured in conjunction with an operation switch on the device, and the predetermined threshold corresponding to that level selected automatically.
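The threshold selection in [0070] and [0071] amounts to a lookup table keyed by environment or by measured noise level. The sketch below assumes the noise level is the mean subframe energy and that the band boundaries and stored thresholds are supplied by the caller. The function names and the numbers in the usage line are placeholders, not values from this document.

```python
import bisect


def measure_noise_level(subframes):
    """Mean subframe energy (sum of squares); an assumed background-noise measure."""
    energies = [sum(s * s for s in sf) for sf in subframes]
    return sum(energies) / len(energies)


def select_threshold(noise_level, level_bounds, thresholds):
    """Return the stored threshold for the band that contains noise_level.

    level_bounds is sorted ascending; thresholds has one more entry than
    level_bounds (band i covers level_bounds[i-1] <= level < level_bounds[i]).
    """
    if len(thresholds) != len(level_bounds) + 1:
        raise ValueError("need exactly one more threshold than band boundaries")
    return thresholds[bisect.bisect_right(level_bounds, noise_level)]
```

For instance, `select_threshold(measure_noise_level(quiet_subframes), [100.0, 1000.0], [t_quiet, t_office, t_street])` would pick one of three stored thresholds; the manual-switch variant in [0070] simply indexes the same table by the selected condition.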

[0072] Referring again to FIG. 2, the speech signal pre-filtered by the pre-filter 104' as described above is input to the zero-input filter 106; once its zero-state response has been obtained, it is input to the subtractor 120 as the reference signal. The synthesized signal is generated in the same way as described in the first embodiment.

[0073] In this way, the second embodiment merges the two functions that were handled by separate filters in the first embodiment, performing formant emphasis and perceptual weighting together in the pre-filter 104'. This effectively removes the computation of one filter and so reduces the amount of computation for the overall processing.
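A rough operation count shows the size of this saving. Assuming each pole-zero stage of order p is implemented in direct form, it costs about 2p multiply-accumulates per sample (p for the zeros, p for the poles). With p = 10, within the order 10 to 12 range given above, and 8 kHz sampling, removing one stage saves approximately

$$
2p \cdot f_s = 2 \times 10 \times 8000 = 1.6 \times 10^{5}\ \text{multiply-accumulates per second.}
$$

This is an order-of-magnitude estimate only; the actual figure depends on the filter structure and arithmetic used.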

[0074] (Modified Embodiment) The first and second embodiments of the present invention have been described above. As one modification, the segmentation means 200 and the gain control means 202 of the second embodiment may instead operate in the same way as in the first embodiment.

[0075] In the segmentation algorithm of the first embodiment the threshold Eth is set to a fixed value, but it may instead be varied relative to the signal level.

[0076] Various modifications of the speech coding apparatus of the present invention are possible within the scope of its gist. This specification includes the following inventions.

[0077] (1) A speech coding apparatus that performs coding by the analysis-by-synthesis method (hereinafter abbreviated "A-b-S") against a reference signal, which is a signal whose noise gain has been suppressed, the apparatus comprising: means for emphasizing the formants of the reference signal; segmentation means for performing segmentation processing that divides speech parts and noise parts into a plurality of segments; and means for suppressing the level of the noise parts obtained by the segmentation means.
Action 1: The input signal is segmented into speech and noise for each subframe and formant-emphasized, the amplitude of the noise parts is suppressed, and the resulting noise-suppressed signal is used as the reference signal in the A-b-S method.
Effect 1: Noise can be suppressed subframe by subframe in real time using simple processing.

[0078] (2) The speech coding apparatus according to (1), wherein the filter coefficients used by the means for emphasizing the formants of the reference signal are the same as the filter coefficients of the weighting filter used in the A-b-S method.
Action 2: Linear prediction coefficients are extracted from the input signal for each frame; formant emphasis filtering is applied in each subframe on the basis of those coefficients; the input signal is segmented into speech and noise; the amplitude of the noise parts is suppressed; and the noise-suppressed signal is used as the reference signal in the A-b-S method.
Effect 2: Because the same linear prediction coefficients (LPC) are used, the hardware configuration of the apparatus is simple.

[0079] (3) A speech coding apparatus that performs coding by the A-b-S method against a reference signal, which is a signal whose noise gain has been suppressed, the apparatus comprising: means for performing the processing of the means for emphasizing the formants of the reference signal and of perceptual weighting filter means as one integrated process; segmentation means for performing segmentation processing that divides speech parts and noise parts into a plurality of segments; and means for suppressing the level of the noise parts obtained by the segmentation means.
Action 3: Linear prediction coefficients are extracted from the input signal for each frame; a single process combining formant emphasis filtering and perceptual weighting filtering is applied in each subframe on the basis of those coefficients; the input signal is segmented into speech and noise; the amplitude of the noise parts is suppressed; and the noise-suppressed signal is used as the reference signal in the A-b-S method.
Effect 3: The processing is simplified further, while effects equivalent to those described in (1) and (2) can be expected.

[0080] (4) The speech coding apparatus according to any one of (1) to (3), wherein fade-in and fade-out sections, in which the amplitude of the background noise is suppressed step by step, are set as predetermined regions before and after a section judged to be speech by the segmentation means.
Action 4: When the amplitude of the noise parts of the input signal is suppressed, the gain is increased or decreased gradually in subframe units, smoothing the junction with the speech parts; the resulting signal is used as the reference signal.
Effect 4: Noise in the speech parts becomes less noticeable, and impulsive background noise is also suppressed.
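The fade-in/fade-out gain of item (4) can be sketched as a per-subframe gain that ramps linearly between a noise floor and unity around each speech section. The linear ramp, the `floor` and `fade` parameters, and the function names are assumptions. Applying a fade-in before a speech section also assumes the speech flags are known ahead of time (for example, by delaying the output), which this text does not specify.

```python
def subframe_gains(speech_flags, floor=0.25, fade=3):
    """Per-subframe gains: 1.0 in speech, ramping linearly to `floor` over `fade` subframes."""
    speech_idx = [i for i, f in enumerate(speech_flags) if f]
    gains = []
    for i in range(len(speech_flags)):
        if not speech_idx:
            gains.append(floor)                      # no speech at all: fully suppressed
            continue
        d = min(abs(i - j) for j in speech_idx)      # distance to nearest speech subframe
        if d > fade:
            gains.append(floor)                      # deep inside a noise section
        else:
            gains.append(1.0 - (1.0 - floor) * d / (fade + 1))  # fade-in / fade-out ramp
    return gains


def apply_gains(subframes, gains):
    """Scale each subframe's samples by its gain."""
    return [[g * s for s in sf] for sf, g in zip(subframes, gains)]
```

Because the gain steps down over several subframes instead of switching abruptly, the noise level does not jump audibly at the boundaries of a speech section.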

[0081] (5) The speech coding apparatus according to any one of (1) to (4), wherein the segmentation means operates by applying a predetermined threshold.
Action 5: When the energy or amplitude of the input signal is at or below a threshold, the section is judged to be noise; when it exceeds the threshold, it is judged to be speech.
Effect 5: Speech parts can be detected with simple processing.

[0082] (6) The speech coding apparatus according to (5), wherein the segmentation means comprises means for updating the threshold applied in the segmentation processing when the energy or amplitude of the reference signal is at or below a first threshold having a fixed value and its rate of change is at or below a certain level.
Action 6: If the energy or amplitude of the input signal is at or below a fixed threshold and its rate of change stays within a certain range, the threshold is changed; when the rate of change exceeds that range, an onset of speech is detected.
Effect 6: The onset of speech can be detected even when the background noise level varies over time.

[0083] (7) The speech coding apparatus according to (5), wherein the segmentation means comprises means for updating the threshold applied in the segmentation processing when the energy or amplitude of the reference signal remains at or below a first threshold having a fixed value over a plurality of frames.
Action 7: When the energy or amplitude of the input signal stays below a fixed threshold for a predetermined number of consecutive subframes, the threshold is changed and the section is judged to be noise.
Effect 7: The end (falling edge) of speech can be detected even when the background noise level varies over time.

[0084] (8) The speech coding apparatus according to (6) or (7), wherein the segmentation means performs its processing using the means for updating the threshold.
Action 8: If the energy or amplitude of the input signal is at or below a fixed threshold and its rate of change stays within a certain range, the threshold is changed, and when the rate of change exceeds that range, an onset of speech is detected; in addition, when the energy or amplitude stays below the fixed threshold for a predetermined number of consecutive subframes, the threshold is changed and the section is judged to be noise.
Effect 8: Speech parts can be detected even when the background noise level varies over time.

[0085]

[Effects of the Invention] As described above, the speech coding apparatus of the present invention provides the following effects. Noise in the input signal can be suppressed subframe by subframe in real time, using simpler computation than conventional approaches. Noise in the speech parts of the input signal is made less noticeable, and impulsive background noise is also suppressed. As a result, good speech quality can be obtained even in noisy, adverse environments.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the apparatus of the present invention according to the first embodiment.

FIG. 2 is a block diagram showing the configuration of the apparatus of the present invention according to the second embodiment.

FIG. 3 is a block diagram showing the configuration of the pre-filter according to the first embodiment.

FIG. 4 is a block diagram showing the configuration of the pre-filter according to the second embodiment.

FIG. 5 is a flowchart showing the gain control routine (of the pre-filter) according to the first embodiment.

FIG. 6 is a flowchart showing the segmentation routine (of the segmentation means) according to the first and second embodiments.

[Explanation of symbols]

100 ... input terminal, 101 ... LPC analyzer, 102 ... LPC quantizer, 103 ... LPC dequantizer, 104 ... pre-filter, 105 ... perceptual (auditory) weighting filter, 106 ... zero-input filter, 107 ... weighted synthesis filter, 108 ... evaluator, 109 ... adaptive codebook, 110 ... stochastic codebook, 111 ... shift register, 114 ... adaptive gain codebook, 115 ... stochastic gain codebook, 119 ... multiplexer, 200 ... segmentation means, 201 ... formant emphasis means, 202 ... gain control means, S301 to S307 ... gain control routine, S400 to S426 ... segmentation routine.

Claims

[Claims]

1. A speech coding apparatus that performs coding by the analysis-by-synthesis method against a reference signal, which is a signal whose noise gain has been suppressed, the apparatus comprising: means for emphasizing the formants of the reference signal; segmentation means for performing segmentation processing that divides a speech part and a noise part into a plurality of segments; and means for suppressing the level of the noise part obtained by the segmentation means.

2. A speech coding apparatus that performs coding by the analysis-by-synthesis method against a reference signal, which is a signal whose noise gain has been suppressed, the apparatus comprising: means for performing the processing of the means for emphasizing the formants of the reference signal and of perceptual weighting filter means as one integrated process; segmentation means for performing segmentation processing that divides a speech part and a noise part into a plurality of segments; and means for suppressing the level of the noise part obtained by the segmentation means.

3. The speech coding apparatus according to claim 1, wherein fade-in and fade-out sections, in which the amplitude of the background noise is suppressed step by step, are set as predetermined regions before and after a section determined to be a speech part by the segmentation means.