JPH07334189A

JPH07334189A - Sound information analysis device

Info

Publication number: JPH07334189A
Application number: JP6131569A
Authority: JP
Inventors: Minako Oota; 美奈子太田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-06-14
Filing date: 1994-06-14
Publication date: 1995-12-22

Abstract

PURPOSE: To remove dynamically changing noise mixed into an input sound signal and to improve the S/N ratio of the sound signal, by extracting the noise characteristic from frame data that contains no sound other than noise and removing the noise with this continually updated characteristic. CONSTITUTION: The device comprises a sound/silence decision part 500 and an analysis part 600. The analysis part 600 comprises an FFT part 660 used to obtain the pitch period from the logarithm of the power spectrum, a pitch extraction part 670 that extracts the characteristic of the sound and its repetition period (pitch information), a noise process part 720 that extracts the noise characteristic, a silence frame setting part 730 that sets the data to be output for silence, and a table 710. The noise process part 720 stores the noise information obtained during silence in the table 710. During sound, a noise removal part 680 uses the frequency characteristic values stored in the table 710 to remove noise from the spectrum sequence and to raise the S/N ratio of the sound.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice information analysis device that analyzes voice information, and more particularly to one that improves the S/N ratio against noise that changes dynamically in a voice signal.

[0002]

[Prior Art] A conventional technique for transmitting an input voice signal well is disclosed in Japanese Patent Application No. 5-138626.

[0003] The technique of Japanese Patent Application No. 5-138626 uses a noise table for noise removal: information extracted during silence is treated as noise information, and during voice the noise table is referenced to remove the noise, so that a good voice signal is transmitted.

[0004]

[Problems to Be Solved by the Invention] An object of the present invention is to provide a high-quality voice information analysis device that, building on the above device, is independent of the intended use and operating environment, performs voice information analysis and noise removal free of hardware constraints, effectively removes dynamic noise mixed into the input voice signal, and improves the S/N ratio of the voice signal so that good voice is transmitted.

[0005]

[Means for Solving the Problems] To achieve the above object, the present invention provides a voice information analysis method that outputs the result of analyzing the voice represented by frame data, the frame data being voice sampling data accumulated for a fixed period, the method comprising: a step of determining whether the voice represented by each frame of data contains voice other than noise; a step of executing silent-system processing when it is determined that no voice other than noise is contained, in which the noise information contained in the voice represented by the frame data is extracted from the frame data and stored, and information representing a previously prepared analysis result for silence is output as the analysis result for that frame data; means for extracting voice feature information from pitch information and storing it; and a step of executing voiced-system processing when it is determined that voice other than noise is contained, in which the noise feature stored when no voice other than noise was found is removed from the voice represented by the frame data, the voice signal is emphasized on the basis of the voice feature information obtained from the pitch information, and the analysis result is output.

[0006]

[Operation] According to the voice information analysis method of the present invention, it is determined whether the voice represented by each frame of data contains voice other than noise. When it is determined that no voice other than noise is contained, silent-system processing is executed: the feature of the noise contained in the voice represented by the frame data is extracted from the frame data and stored, and information representing a previously prepared analysis result for silence is output as the analysis result for that frame data.

[0007] On the other hand, when it is determined that voice other than noise is contained, voiced-system processing is executed: the noise feature stored by the most recent silent-system processing is removed from the voice represented by the frame data, the frequencies at the pitch that characterizes the voice and at its integer multiples are emphasized, and the analysis result of the frame data with improved S/N ratio is output.
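
The two branches described in [0006] and [0007] can be sketched as a single dispatch function. This is only an illustration under stated assumptions: the names `process_frame`, `SILENT_RESULT`, `state`, `pitch_bin`, and `gain` are hypothetical, and the clamped subtraction and multiplicative gain are placeholders for whatever removal and emphasis arithmetic an implementation actually uses.

```python
# Hypothetical stand-in for the analysis result prepared in advance for silence.
SILENT_RESULT = "SILENT"


def process_frame(state, spectrum, has_voice, pitch_bin=None, gain=1.5):
    """Route one frame to the silent-system or voiced-system branch.

    `state` is a dict holding the most recent noise feature under "noise".
    The subtraction and the multiplicative `gain` are placeholders, not the
    arithmetic of the patent itself.
    """
    if not has_voice:
        # Silent system: refresh the stored noise feature and
        # emit the result prepared in advance for silence.
        state["noise"] = list(spectrum)
        return SILENT_RESULT

    # Voiced system: remove the noise feature stored by the last silent frame.
    noise = state.get("noise") or [0.0] * len(spectrum)
    cleaned = [max(s - n, 0.0) for s, n in zip(spectrum, noise)]

    # Emphasize the pitch bin and its integer multiples.
    if pitch_bin:
        for k in range(pitch_bin, len(cleaned), pitch_bin):
            cleaned[k] *= gain
    return cleaned
```

Because each frame takes exactly one branch, the noise feature is always the one captured by the most recent silent frame.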

[0008] Accordingly, because the latest noise feature is always being extracted, noise is removed during periods that contain voice other than noise by using this latest extracted noise feature. Furthermore, because the latest pitch information is always obtained and the voice feature information stored, the device responds immediately to a change of speaker during use and can emphasize the voice features of the current speaker.

[0009] Moreover, because voiced-system processing and silent-system processing never occur at the same time, the execution load of these processes is small, and they can be realized on a single processor without restricting the voice information analysis functions.

[0010]

[Embodiments] Several embodiments of the present invention are described below.

[0011] FIG. 4 shows the configuration of a communication system to which the voice information analysis device according to the present invention is applied.

[0012] In the figure, 1000 is the transmitting device and 2000 is the receiving device.

[0013] The transmitting device 1000 transmits to the receiving device 2000 the level information 701 and the pitch information 671 obtained by compression-encoding the voice signal with a method based on voice analysis.

[0014] The transmitting device 1000 has a voice input unit 200 serving as the voice input means, an A/D converter 300 that performs analog-to-digital (A/D) conversion on the input voice to produce voice sampling data, a buffer memory 400 that stores this voice sampling data sequentially, and a voiced/silent determination unit 500 that determines whether a frame is voiced or silent.

[0015] Once the buffer memory 400 has stored a fixed period (10 to 30 milliseconds) of data, it sends that data to the voiced/silent determination unit 500 as fixed-period voice sampling data (frame data) 401.
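
The framing performed by the buffer memory can be sketched as follows. The function name `split_into_frames`, the 8000 Hz sampling rate used in the usage note, and the choice to leave an incomplete trailing frame unprocessed are assumptions made for illustration; the text specifies only the 10 to 30 millisecond frame duration.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20):
    """Cut a sample sequence into consecutive fixed-duration frames.

    Samples left over after the last complete frame are not returned; in a
    streaming buffer they would simply wait for more input.
    """
    if not 10 <= frame_ms <= 30:
        raise ValueError("frame duration is expected to be 10 to 30 ms")
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("sample rate too low for the requested frame duration")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
```

At an assumed 8000 Hz sampling rate, a 20 ms frame holds 160 samples, so 400 buffered samples yield two complete frames.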

[0016] The transmitting device 1000 further has an analysis unit 600 that creates a normalized waveform sequence 702, level information 701, and pitch information 671 from the frame data 401, a vector quantization unit 800 that converts the normalized waveform sequence 702 into a vector code 801, and a transmission unit 900 that sends out this information.

[0017] As a first embodiment, FIG. 1 shows a block diagram of the voiced/silent determination unit 500 and the analysis unit 600 according to claim 1.

[0018] The voiced/silent discriminator 511 output from the voiced/silent determination unit 500 divides processing into a voiced processing system and a silent processing system.

[0019] The analysis unit 600 has: an FFT data setting unit 610 that sets up the data for the FFT; an FFT unit 620 that obtains the frequency characteristic of the frame data 401 by the FFT; a power spectrum conversion unit 630 that outputs the power spectrum, that is, the squared absolute value of each resulting complex number; an axis conversion unit 640 that converts the vertical axis from power to amplitude; a noise removal unit 680 that removes noise using the table 710; an inverse FFT unit 690 that performs the inverse FFT; a normalization unit 700 that scales the inverse FFT result so that its maximum value is "1"; an FFT unit 660 used to obtain the pitch period from the logarithm of the power spectrum (cepstrum analysis); a pitch extraction unit 670 that extracts the feature (pitch height) of the voice and its repetition period (pitch information); a noise processing unit 720 that extracts the noise feature; a silence frame setting unit 730 that sets the data to be output for silence; and the table 710.
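
Three of the simpler blocks in this chain (power spectrum conversion, axis conversion, and normalization) can be sketched as below. The function names are hypothetical, and treating the scale factor returned by `normalize` as a candidate for the level information 701 is an assumption; the text does not spell out how the level information is derived.

```python
import math


def power_spectrum(bins):
    """Power spectrum conversion: squared absolute value of each complex FFT bin."""
    return [abs(c) ** 2 for c in bins]


def to_amplitude(power):
    """Axis conversion: take the vertical axis from power back to amplitude."""
    return [math.sqrt(p) for p in power]


def normalize(waveform):
    """Scale a waveform so that its largest absolute value becomes 1.

    Returns (level, normalized_waveform). An all-zero waveform is returned
    unchanged with level 0.0.
    """
    level = max((abs(v) for v in waveform), default=0.0)
    if level == 0.0:
        return 0.0, list(waveform)
    return level, [v / level for v in waveform]
```

For example, a bin of `3+4j` has power 25 and amplitude 5, and `normalize([0.5, -2.0, 1.0])` returns the level 2.0 together with `[0.25, -1.0, 0.5]`.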

[0020] The FFT (Fast Fourier Transform) units 620 and 660 implement a faster version of the DFT (Discrete Fourier Transform), a technique that reproduces the original waveform in terms of frequency and amplitude from the sampled values of the signal.
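
The relationship between the two transforms can be illustrated with a direct DFT and a recursive radix-2 FFT that produce the same bins. This is a generic textbook sketch, not the implementation used in units 620 and 660; it assumes the frame length is a power of two.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform, O(N^2) operations."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT, O(N log N) operations."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # lower half of the spectrum
        out[k + n // 2] = even[k] - twiddle  # upper half reuses the same product
    return out
```

Both functions return the same spectrum up to rounding error; the FFT reaches it by splitting the input into even-indexed and odd-indexed halves and reusing their transforms.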

[0021] The FFT signal processing technique is described in detail in Section 6.3, "Fast Fourier Transform" (p. 106 onward), of "Introduction to Signal Processing" by Yoshifumi Amemiya and Yukio Sato (Ohmsha).

[0022] Methods by which the pitch extraction unit 670 can obtain the pitch period include the parallel processing method, the data reduction method, the cepstrum method, and the period histogram method ("Digital Signal Processing" by Sadaoki Furui). Because the features of the input voice signal appear at integer multiples of the pitch (the fundamental frequency), the pitch extraction unit 670 stores information generated from the obtained pitch period in the table 710.

[0023] One example of a scheme that uses the table 710 to remove noise and improve the S/N ratio is to store "weights" as the kind of information held in the table 710.

[0024] A "weight" is in practice a numerical value from 0.0 to 1.0.

[0025] The table 710 is initialized to "1.0".

[0026] In one such scheme, the table 710 holds values weighted from 0.0 to 1.0 with respect to the power spectrum sequence (hereinafter, power spectrum), and the power spectrum sequence during voice (hereinafter, voice power spectrum) is multiplied by them. The table 710 stores the result of assigning weights from 0.0 to 1.0 to the noise power spectrum in descending order of absolute value. That is, a frequency at which noise appears prominently is multiplied by a weight of 1.0 or less, which moves the corresponding value of the voice power spectrum downward from its original value, whereas a frequency at which no noise spectrum appears is given a weight of 1.0 so that its power spectrum value is left unchanged. As a result, a power spectrum sequence from which the noise has been removed is obtained.
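
A minimal sketch of such a weight table follows. The text fixes only the range of the weights (0.0 to 1.0), the initial value 1.0, and the rule that stronger noise receives a smaller weight; the linear mapping used in `build_noise_weights` is an assumption chosen for simplicity, and all function names are hypothetical.

```python
def new_table(num_bins):
    """A table initialized to 1.0 leaves every frequency bin untouched."""
    return [1.0] * num_bins


def build_noise_weights(noise_power):
    """Map a noise power spectrum to weights in 0.0..1.0.

    Assumed linear rule: the strongest noise bin gets weight 0.0 and a bin
    with no noise keeps weight 1.0.
    """
    peak = max(noise_power, default=0.0)
    if peak <= 0.0:
        return new_table(len(noise_power))
    return [1.0 - p / peak for p in noise_power]


def apply_table(voice_power, table):
    """Multiply the voice power spectrum by the stored weights, bin by bin."""
    return [p * w for p, w in zip(voice_power, table)]
```

With a noise spectrum of `[0.0, 2.0, 4.0]` the table becomes `[1.0, 0.5, 0.0]`, so a flat voice spectrum of `[10.0, 10.0, 10.0]` is reduced to `[10.0, 5.0, 0.0]`.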

[0027] The noise processing unit 720 stores the noise information obtained during silence by the above scheme.

[0028] Furthermore, for the table 710 prepared in advance by the noise processing unit 720, the pitch extraction unit 670 adds a value in the increasing direction (provisional addend: 0.0 to 1.0) to the information stored in the table 710 (which already holds values from 0.0 to 1.0). Consequently, the entries of the table 710 that correspond to the pitch frequency, that is, to integer multiples of the fundamental frequency, take values from 1.0 to 2.0, so those frequencies are emphasized relative to their actual level and the reproduced voice signal is enhanced. The value added to the table 710 must be the same for every frequency that is an integer multiple of the fundamental frequency; otherwise the original voice information would be impaired and the voice itself altered.
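
The addition step can be sketched as follows, assuming the fundamental frequency has already been converted to a bin index. The name `emphasize_harmonics`, the parameter `fundamental_bin`, and the default `boost` of 0.5 are hypothetical; only the 0.0 to 1.0 range of the addend and the requirement that it be identical for every harmonic come from the text.

```python
def emphasize_harmonics(table, fundamental_bin, boost=0.5):
    """Add the same boost to every table entry at a multiple of the fundamental.

    Returns a new table; the input table is left unmodified.
    """
    if fundamental_bin <= 0:
        raise ValueError("fundamental_bin must be a positive bin index")
    if not 0.0 <= boost <= 1.0:
        raise ValueError("boost is expected to lie in 0.0..1.0")
    out = list(table)
    for k in range(fundamental_bin, len(out), fundamental_bin):
        out[k] += boost  # identical addend at every harmonic
    return out
```

Starting from a table of seven entries equal to 1.0 and a fundamental at bin 3, the entries at bins 3 and 6 become 1.5 while the others stay at 1.0.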

[0029] By using a single table for both the noise information and the voice signal feature information in this way, table space is saved.

[0030] Using the frequency characteristic values stored in the table 710, the noise removal unit 680 removes noise from the spectrum sequence during voice and raises the S/N ratio of the voice.

[0031] Next, the second embodiment is described.

[0032] FIG. 2 shows a block diagram of the voiced/silent determination unit 500 and the analysis unit 600 according to claim 1.

[0033] The second embodiment differs from the first embodiment only in the configuration of the voiced/silent determination unit 500 and the analysis unit 600.

[0034] The voiced/silent determination unit 500 has: an FFT data setting unit 610 that sets up the frame data for FFT processing; an FFT unit 620 that performs the FFT on the frame data; a power spectrum conversion unit 630 that computes the sum of squares of each resulting complex number (its squared real and imaginary parts); a voice power determination unit 510 that takes the sum over the frame data, compares it with a threshold, and outputs the voiced/silent discriminator 511; a voiced/silent determination unit 650 that evaluates the voiced/silent discriminator; an FFT unit 660 that applies FFT processing to the power spectrum; and a pitch extraction unit 670 that extracts pitch information from the cepstrum obtained from the logarithm of the power spectrum, determines voiced or silent from the fact that the pitch period is not stable during silence ("Digital Signal Processing" by Sadaoki Furui, pp. 57-59, Section 4.9, "Pitch Extraction"), and outputs a voiced/silent discriminator.
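
The threshold comparison made by the voice power determination unit 510 might look like the sketch below. The text says only that a sum over the frame data is compared with a threshold; using the mean of the squared samples as the power measure, and the names `frame_power`, `is_voiced_by_power`, and `threshold`, are assumptions.

```python
def frame_power(frame):
    """Mean squared sample value of one frame (an assumed power measure)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def is_voiced_by_power(frame, threshold):
    """Return True when the frame power exceeds the threshold."""
    return frame_power(frame) > threshold
```

A suitable threshold depends on the input level and the noise floor, so it would have to be tuned for the target environment.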

[0035] For the voiced/silent determination, the pitch information on the time axis is converted into a pitch interval on the frequency axis, and the frame is judged voiced if the power spectrum has a local maximum at every pitch interval and silent otherwise.
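
One way to read this test is sketched below: given the pitch interval as a bin spacing, check that the power spectrum has a local maximum at each of the first few multiples. The limit `max_harmonics`, the strict comparison with the two neighboring bins, and the function names are assumptions made for illustration.

```python
def is_local_max(power, k):
    """True when bin k is strictly larger than both neighboring bins."""
    return 0 < k < len(power) - 1 and power[k - 1] < power[k] > power[k + 1]


def has_harmonic_peaks(power, pitch_bin, max_harmonics=3):
    """Judge a frame voiced when every checked harmonic bin is a local maximum."""
    if pitch_bin < 2:
        return False  # harmonics would be adjacent bins and could not all be peaks
    bins = [
        h * pitch_bin
        for h in range(1, max_harmonics + 1)
        if h * pitch_bin < len(power) - 1
    ]
    return bool(bins) and all(is_local_max(power, k) for k in bins)
```

If the frame length is N samples and the pitch period is P samples, the corresponding bin spacing is N / P; rounding it to an integer bin is a further simplification of this sketch.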

[0036] When the cepstrum method is used to obtain the pitch period (fundamental frequency), it is obtained by taking the Fourier transform of the logarithm of the power spectrum, which separates the spectral envelope from the fine structure. The obtained pitch period is stored in the table 710.
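
The cepstrum method can be sketched with a direct transform in plain Python. The search range (`min_lag`, `max_lag`), the small constant `eps` that keeps the logarithm finite, and the function names are assumptions; a practical implementation would use an FFT instead of the O(N^2) sums shown here.

```python
import cmath
import math


def dft_bin(x, k):
    """Single bin k of the discrete Fourier transform of x."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))


def cepstrum_pitch_period(frame, min_lag, max_lag, eps=1e-12):
    """Estimate the pitch period in samples from the cepstrum peak.

    The log power spectrum is transformed again, and the lag (quefrency)
    with the largest magnitude inside [min_lag, max_lag] is returned.
    """
    n = len(frame)
    log_power = [math.log(abs(dft_bin(frame, k)) ** 2 + eps) for k in range(n)]
    return max(range(min_lag, max_lag + 1), key=lambda q: abs(dft_bin(log_power, q)))
```

For a 256-sample frame containing one pulse every 32 samples, searching lags 20 to 60 returns 32; the harmonics of that frame then fall on bins that are multiples of 256 / 32 = 8.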

[0037] The analysis unit 600 has the axis conversion unit 640, the noise removal unit 680, the inverse FFT unit 690, the normalization unit 700, the voiced/silent unit 740, the noise processing unit 720, and the silence data setting unit 730. When the silent processing system is selected, all of the obtained information is stored in the table 710 as noise data; when the voiced processing system is selected, the noise removal unit 680 performs its processing using the table 710.

[0038] The voiced/silent discriminator 511 output from the voiced/silent determination unit 500 divides processing into a voiced processing system and a silent processing system.

[0039] The analysis unit 600 has an axis conversion unit 640 that converts the vertical axis from power to amplitude, a noise removal unit 680 that removes noise, an inverse FFT unit 690 that performs the inverse FFT, a normalization unit 700 that scales the inverse FFT result so that its maximum value is "1", a noise processing unit 720 that extracts the noise feature, a silence data setting unit 730 that sets the data to be output for silence, and the table 710.

[0040] Thus the first and second embodiments differ in configuration but have the same number of processing steps (blocks). The second embodiment selects between the voiced and silent processing systems more strictly than the first, in the same processing time. The objects of removing noise and improving the S/N ratio can therefore be achieved.

[0041] Next, the third embodiment is described.

[0042] The third embodiment differs from the first and second embodiments only in the configuration of the tables used.

[0043] FIG. 3 shows a block diagram of the analysis unit 600 according to claim 1.

[0044] In the third embodiment, the noise information and the voice feature information are managed in separate tables.

[0045] The noise processing unit 720 stores the spectrum of the frequencies identified as noise in the table 710, while the pitch extraction unit 670 stores the corresponding frequencies in the table 750. The noise removal unit 680 then subtracts the content stored in the table 710 and afterwards applies the table 750 to achieve the object.
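
The two-table variant can be sketched as a subtraction followed by a per-bin gain. Clamping the difference at zero and treating the table 750 as multiplicative gains (1.0 where nothing is emphasized) are assumptions; the text states only that the content of the table 710 is subtracted and that the table 750 is then used.

```python
def remove_and_emphasize(spectrum, noise_table, voice_table):
    """Subtract the stored noise spectrum, then apply the voice feature gains.

    `noise_table` plays the role of table 710 and `voice_table` that of
    table 750 in this sketch.
    """
    if not (len(spectrum) == len(noise_table) == len(voice_table)):
        raise ValueError("spectrum and both tables must have the same length")
    return [
        max(s - n, 0.0) * g  # clamp so over-subtraction never goes negative
        for s, n, g in zip(spectrum, noise_table, voice_table)
    ]
```

Keeping the tables separate lets the noise estimate and the emphasis gains be updated independently, at the cost of the extra table space that the single-table scheme saves.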

[0046] A scheme in which two or more tables are used to store different kinds of information separately is thus also conceivable.

[0047] As described above, according to these embodiments, the constantly changing ambient noise is continually extracted and its features removed, and the features of the input voice signal are continually extracted from the pitch information and emphasized, so that the device copes with any situation, improves the S/N ratio accurately, and performs high-quality voice signal processing.

[0048] The processing of each unit in the first, second, and third embodiments can be realized as a program running on a processor. In that case, only the silent-system processing need be performed during silence and only the voiced-system processing during voice. In addition, by reusing the pitch information calculated for the voiced/silent determination or for normalization, no new processing or device is required. The processor load is therefore small, and the processing can be realized as a program running on a single processor.

[0049] Although the above embodiments have been described using application to a communication system as an example, the voice information analysis device according to the first, second, and third embodiments can also be applied to a variety of other devices, such as a device that performs voice recognition or similar processing using the analysis results of the analysis unit 600.

[0050]

[Effects of the Invention] The present invention not only removes noise accurately in any situation by continually extracting the constantly changing ambient noise and removing its features, but also, by always using the latest pitch information (fundamental frequency) to emphasize the voice frequencies, copes with a change of speaker and improves the S/N ratio.

[0051] Furthermore, by increasing the number of conditions used to select between voiced-system and silent-system processing, the voice information table is reliably created from voice information, which enhances the effect of the voiced-system processing.

[0052] In addition, at least one voice information table suffices, and by varying the number of tables an effect suited to the intended use can be realized without changing the hardware.

[Brief description of drawings]

[FIG. 1] A block diagram showing the voiced/silent determination unit and the analysis unit of Embodiment 1 of the present invention.

[FIG. 2] A block diagram showing the voiced/silent determination unit and the analysis unit of Embodiment 2 of the present invention.

[FIG. 3] A block diagram showing the voiced/silent determination unit and the analysis unit of Embodiment 3 of the present invention.

[FIG. 4] A block diagram showing the voice information device of the present invention.

[Explanation of symbols]

1000: voice information analysis device according to the present invention (transmitting side), 200: voice input device, 300: A/D converter, 400: buffer memory, 401: frame data, 500: voiced/silent determination unit, 510: voice power determination unit, 511: voiced/silent discriminator, 600: analysis unit, 610: FFT data setting unit, 620: FFT unit, 630: power spectrum conversion unit, 631: power spectrum sequence, 640: axis conversion unit, 670: pitch extraction unit, 671: pitch information, 680: noise removal unit, 690: inverse FFT unit, 700: normalization unit, 701: level information, 702: normalized waveform sequence, 710: noise table, 720: noise processing unit, 730: silence frame setting unit, 740: voiced/silent unit, 750: voice information table, 650: voiced/silent unit, 660: FFT unit, 800: vector quantization unit, 801: vector code, 900: transmission unit, 2000: voice information analysis device according to claim 1 (receiving side), 2100: reception unit, 2200: vector inverse quantization unit, 2300: synthesis unit, 2400: buffer memory, 2500: D/A conversion unit, 2600: voice output device.

Continuation of front page: (51) Int. Cl.⁶: H04B 15/00

Claims

[Claims]

1. A voice information analysis device that outputs the result of analyzing the voice represented by frame data, the frame data being voice sampling data accumulated for a fixed period, the device comprising: a step of determining whether the voice represented by each frame of data contains voice other than noise; a step of executing silent-system processing when it is determined that no voice other than noise is contained, in which the feature of the noise contained in the voice represented by the frame data is extracted from the frame data and stored in a table, and information representing a previously prepared analysis result for silence is output as the analysis result for the voice represented by the frame data; means for storing voice pitch information in a table when voice other than noise is contained; and a step of, when it is determined that voice other than noise is contained, emphasizing the feature of the voice signal obtained from the pitch information, removing from the voice represented by the frame data the noise feature that was stored in the table when no voice other than noise was found, analyzing the voice represented by the frame data from which the noise feature has been removed, and outputting the result of the analysis.

2. A voice compression encoding device comprising: the voice information analysis device according to claim 1; and means for quantizing the waveform information output by the voice information analysis device and outputting the quantized data.

3. A communication terminal device comprising: the voice information analysis device according to claim 1 or the voice compression encoding device according to claim 2; and means for transmitting via a wired or wireless transmission path.

4. A communication system comprising one or a plurality of the communication terminal devices according to claim 3.