JPH06348293A - Voice information analyzing device

Info

Publication number: JPH06348293A
Application number: JP5138626A
Authority: JP
Inventors: Minako Oota; 美奈子太田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-06-10
Filing date: 1993-06-10
Publication date: 1994-12-22

Abstract

PURPOSE: To efficiently remove noise whose characteristics change dynamically in a voice information analyzing device. CONSTITUTION: When a sound/silence determination section 720 determines that a frame is silent, a noise processing section 690 obtains noise features from the power spectrum of the input voice output by an axis transformation section 640 and stores them in a noise table 710, and a silence frame setting section 700 outputs data signifying silence as a normalized waveform series 682. When the frame is determined to contain sound, a noise eliminating section 660 removes noise from the power spectrum sequence output by the section 640 using the noise features stored in the table 710. An inverse FFT section 670 performs an inverse FFT on the noise-removed power spectrum. A normalizing section 680 normalizes the output of the section 670 using pitch information received from a pitch extracting section 650 and outputs the result as the normalized waveform series 682. Thus, by continually extracting and removing the constantly changing environmental noise features mixed into the input signal, noise is removed reliably under any circumstances.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a voice information analysis apparatus that analyzes voice information, and more particularly to a technique for removing dynamically changing noise components from a voice signal.

[0002]

[Prior Art] Known conventional techniques for removing noise from an input voice signal include the technique described in JP-A-2-278298 and the technique described in JP-A-1-75593.

[0003] The technique described in JP-A-2-278298 subdivides a filter and removes specific frequencies in a fixed manner; the resulting sound quality depends on which frequencies are removed. The technique described in JP-A-1-75593 uses a noise-removal neural network that learns and extracts noise features in advance and then uses them to remove noise; its sound quality depends on the choice of noise mixed into the data (voice + noise) used to learn and extract the noise features.

[0004]

[Problems to Be Solved by the Invention] With the technique of JP-A-2-278298, which removes specific frequencies using noise-removal filters, the frequencies to be removed are fixed. It is therefore unsuitable for removing noise whose characteristics change dynamically as the speaker or the background moves or changes.

[0005] Likewise, the technique of JP-A-1-75593, in which noise features are extracted and learned in advance, is unsuitable for removing noise whose characteristics change dynamically as the speaker or the background moves or changes, because a neural network by its nature takes a long time to learn noise.

[0006] Furthermore, when either technique is applied to remove noise from a voice signal that is to undergo voice information analysis, the noise removal must be performed as a preprocessing step separate from the voice analysis itself. When voice information analysis is performed in real time, this can overload the processor, so that the analysis functions must be restricted or the individual processes must be run on different processors.

[0007] An object of the present invention is therefore to provide a voice information analysis apparatus that can efficiently remove noise whose characteristics change dynamically from a voice signal subject to voice information analysis.

[0008]

[Means for Solving the Problems] To achieve the above object, the present invention provides a voice information analysis method that outputs the result of analyzing the voice represented by frame data, each frame being voice sampling data accumulated over a fixed time, the method comprising: a step of determining whether the voice represented by each frame of data contains voice other than noise; a step of executing, when it is determined that no voice other than noise is contained, silence processing that extracts from the frame data the features of the noise contained in the voice it represents, stores them, and outputs, as the analysis result for that frame, information representing a previously prepared analysis result for silence; and a step of executing, when it is determined that voice other than noise is contained, voiced processing that removes the noise features stored by the most recent silence processing from the voice represented by the frame data, analyzes the voice represented by the noise-removed frame data, and outputs the analysis result.

[0009]

[Operation] According to the voice information analysis method of the present invention, it is determined whether the voice represented by each frame of data contains voice other than noise. When it is determined that no voice other than noise is contained, silence processing is executed: the features of the noise contained in the voice represented by the frame data are extracted from the frame data and stored, and information representing a previously prepared analysis result for silence is output as the analysis result for that frame. When it is determined that voice other than noise is contained, voiced processing is executed: the noise features stored by the most recent silence processing are removed from the voice represented by the frame data, the voice represented by the noise-removed frame data is analyzed, and the analysis result is output.

[0010] Therefore, the latest noise features are always extracted during periods that contain no voice other than noise, that is, periods that can be treated as silence and for which it suffices to output a previously determined silence analysis result. During periods that do contain voice other than noise, noise can then be removed using the most recently extracted noise features. Moreover, because the voiced processing and the silence processing never run at the same time, the processing load is small, and the method can be implemented on a single processor without restricting the voice information analysis functions.
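The control flow of paragraphs [0008] to [0010] can be sketched as a per-frame dispatch loop. This is a minimal illustration, not the patented implementation: the function name `analyze_stream` and every callback parameter are hypothetical stand-ins for the determination, noise extraction, noise removal, and analysis stages, and passing a voiced frame through unchanged before any noise has been stored is an assumption.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


def analyze_stream(
    frames: Sequence[Frame],
    has_speech: Callable[[Frame], bool],
    extract_noise: Callable[[Frame], List[float]],
    remove_noise: Callable[[Frame, List[float]], List[float]],
    analyze: Callable[[Sequence[float]], object],
    silence_result: object,
) -> List[object]:
    """Run exactly one of the two processing paths for each frame."""
    noise: Optional[List[float]] = None  # most recently stored noise features
    results: List[object] = []
    for frame in frames:
        if not has_speech(frame):
            # Silence processing: refresh the stored noise features and
            # emit the previously prepared "silence" analysis result.
            noise = extract_noise(frame)
            results.append(silence_result)
        else:
            # Voiced processing: remove the latest stored noise features,
            # then analyze. With no noise stored yet the frame is used as is
            # (an assumption; the specification does not cover this case).
            cleaned = list(frame) if noise is None else remove_noise(frame, noise)
            results.append(analyze(cleaned))
    return results
```

Because each frame takes only one branch, the cost per frame is that of a single path, which is the point made about processor load.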

[0011]

[Embodiments] An embodiment of the present invention will be described below.

[0012] First, a first embodiment will be described.

[0013] FIG. 5 shows the configuration of a communication system to which the voice information analysis apparatus according to the present invention is applied.

[0014] In the figure, 2000 is a transmitting device and 1000 is a receiving device.

[0015] The transmitting device 2000 compresses and encodes a voice signal by a method based on voice analysis and transmits the resulting level information, quantized data, and pitch information to the receiving device 1000. The receiving device 1000 decodes the voice from the received information and outputs it.

[0016] The transmitting device 2000 includes a transmitting unit 900, a vector quantization unit 800, and a voice information analysis apparatus 100. The receiving device 1000 includes a receiving unit 1100, a vector inverse quantization unit 1100, a synthesis unit 1200, a D/A conversion unit 1300, a buffer memory 1500, and a voice output device 1400.

[0017] In the transmitting device 2000, the voice information analysis apparatus 100 analyzes the input voice and sends the resulting level information 681 and pitch information 651 to the transmitting unit 900, and the normalized waveform series 682 to the vector quantization unit 800. The vector quantization unit 800 converts the received normalized waveform series 682 into vector codes and sends the resulting quantized data 801 to the transmitting unit 900. The transmitting unit 900 receives the level information 681, the pitch information 651, and the quantized data 801 and transmits them to the receiving device 1000 over a wired or wireless link.

[0018] In the receiving device 1000, the receiving unit 1000 receives the information sent from the transmitting device 2000 and outputs level information 681', quantized data 801', and pitch information 651'. The vector inverse quantization unit 1100 inversely quantizes the quantized data 801'. The synthesis unit 1200 synthesizes a waveform by superimposing the waveform 682' output by the vector inverse quantization unit 1100 at every repetition period indicated by the pitch information 651', and stores the result in the buffer memory 1500. The D/A conversion unit 1300 performs digital-to-analog (D/A) conversion on the output of the buffer memory 1500. The voice output device 1400 outputs the voice obtained by the D/A conversion unit 1300.
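The superposition performed by the synthesis unit 1200 can be illustrated with a simple overlap-add loop. This is a sketch under assumptions: the function name `overlap_add` is hypothetical, and scaling the normalized waveform by the received level is an assumed use of the level information, which the specification does not spell out for the receiver.

```python
from typing import List, Sequence


def overlap_add(
    waveform: Sequence[float], period: int, level: float, frame_length: int
) -> List[float]:
    """Place one copy of `waveform` at every pitch period and sum the overlaps."""
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    out = [0.0] * frame_length
    for start in range(0, frame_length, period):
        for i, value in enumerate(waveform):
            if start + i < frame_length:
                # Assumed: the normalized waveform is rescaled by the level.
                out[start + i] += level * value
    return out
```

For example, `overlap_add([1.0, 0.5, 0.25], period=2, level=2.0, frame_length=6)` overlaps the tail of each copy with the head of the next one.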

[0019] The voice information analysis apparatus 100 is described in detail below.

[0020] As shown in FIG. 5, the voice information analysis apparatus 100 has a voice input unit 200 serving as the voice input means, an A/D converter 300 that performs analog-to-digital (A/D) conversion of the input voice into voice sampling data, and a buffer memory 400 that stores the voice sampling data sequentially. The voice sampling data stored in the buffer memory 400 is sent to the sound/silence determination unit 500 in fixed-time blocks (10 to 30 milliseconds) as fixed-time voice sampling data (frame data) 401. The voice information analysis apparatus 100 also has the sound/silence determination unit 500, which determines sound or silence, and an analysis unit 600, which creates the normalized waveform series 682, the level information 681, and the pitch information 651 from the frame data 401.

[0021] First, the sound/silence determination unit 500 consists of a voice power determination unit 510, as shown in FIG. 1. The voice power determination unit 510 obtains the sum (power) of the elements of the frame data 401 and compares it with a threshold (a fixed value). It determines silence if the power is below the threshold and sound if it is above, and outputs a sound/silence determiner 511 indicating the result.
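The fixed-threshold decision of the voice power determination unit 510 can be sketched as follows. The names `frame_power` and `is_voiced` are hypothetical. The specification only speaks of the sum of the frame elements; summing the squared samples is an assumption made here so that positive and negative samples do not cancel, and treating a power exactly equal to the threshold as silence is likewise an assumption.

```python
from typing import Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Frame power, taken here (as an assumption) to be the sum of squared samples."""
    return sum(sample * sample for sample in frame)


def is_voiced(frame: Sequence[float], threshold: float) -> bool:
    """Return True (sound) when the frame power exceeds the threshold."""
    return frame_power(frame) > threshold
```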

[0022] If a fixed threshold does not give a sufficient effect, the threshold may be made variable according to Equation 1, where ω is a weight satisfying 0 < ω < 1.

[0023]

(next-frame threshold) = ω × (previous-frame threshold) + (1 − ω) × (current silent power value) ... (Equation 1)

That is, when a frame is determined to be silent, the power value that was determined to be silent and the threshold used to judge the previous frame data are suitably weighted, and their sum is used as the threshold for judging the next frame data.
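Equation 1 is an exponentially weighted update of the threshold. A minimal sketch, in which the function name `update_threshold` and its parameter names are illustrative rather than taken from the specification:

```python
def update_threshold(prev_threshold: float, silent_power: float, omega: float) -> float:
    """Equation 1: blend the previous threshold with the power of a silent frame."""
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must satisfy 0 < omega < 1")
    return omega * prev_threshold + (1.0 - omega) * silent_power
```

With ω close to 1 the threshold follows the background power slowly; with ω close to 0 it tracks the most recent silent frame almost immediately. The update would be applied only after frames judged silent, as the paragraph states.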

[0024] The voice power determination unit 510 may also determine sound or silence as follows. The power of the frame data under determination is compared with the power of the preceding frame data. If the difference exceeds a predetermined value, the sound/silence state is judged to have changed from the previous frame; if the difference is smaller than the predetermined value, the state is judged not to have changed. The sound/silence determination for the frame under determination is then derived from the stored determination result for the preceding frame data. Alternatively, when the difference exceeds the predetermined value, the power of the frame data under determination may additionally be compared with the threshold to determine sound or silence.
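Both variants of the difference-based determination can be written as small state-transition functions. This is a sketch only; the function and parameter names are hypothetical, and `delta` stands for the "predetermined value" of the paragraph.

```python
def state_by_difference(
    prev_voiced: bool, prev_power: float, power: float, delta: float
) -> bool:
    """Flip the previous sound/silence state when the power jumps by more than delta."""
    changed = abs(power - prev_power) > delta
    return (not prev_voiced) if changed else prev_voiced


def state_by_difference_and_threshold(
    prev_voiced: bool, prev_power: float, power: float, delta: float, threshold: float
) -> bool:
    """Variant: on a large jump, decide by comparing the power with the threshold."""
    if abs(power - prev_power) > delta:
        return power > threshold
    return prev_voiced
```

The second variant avoids flipping the state on a large jump that stays on the same side of the threshold.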

[0025] Next, as shown in FIG. 2, the analysis unit 600 has an FFT unit 620 that obtains the frequency characteristics of the frame data 401 by FFT (Fast Fourier Transform), and an FFT data setting unit 610 that sets data in the FFT unit 620. It also has a power spectrum conversion unit 630 that outputs the squared absolute values of the complex numbers produced by the FFT unit 620, that is, the power spectrum, and an axis conversion unit 640 that converts the vertical axis of the power spectrum from a power spectrum axis to an amplitude axis. The FFT is a faster implementation of the DFT (Discrete Fourier Transform), a technique for reproducing the original waveform from the sampled values of a signal in terms of frequency and amplitude. Signal processing by FFT is described in detail in, for example, "Introduction to Signal Processing" by Yoshifumi Amamiya and Yukio Sato, Ohmsha, p. 106 onward, section 6.3 "Fast Fourier Transform".
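The chain from frame data to power spectrum to amplitude axis can be sketched in plain Python. The recursive radix-2 `fft` below is a textbook formulation, not the one used in the apparatus, and it requires a power-of-two frame length. Interpreting the axis conversion of unit 640 as a square root of the power values is an assumption; the function names are illustrative.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Squared absolute value of each FFT bin."""
    return [abs(c) ** 2 for c in fft(frame)]


def amplitude_spectrum(power: Sequence[float]) -> List[float]:
    """Assumed axis conversion: amplitude as the square root of power."""
    return [math.sqrt(p) for p in power]
```

For an eight-sample frame holding one cycle of a cosine, the power concentrates in bins 1 and 7, which is a quick sanity check of the transform.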

[0026] The analysis unit 600 also has a voiced processing system consisting of an FFT unit 730, a pitch extraction unit 650, a noise removal unit 660, an inverse FFT unit 670, and a normalization unit 680; a silence processing system consisting of a noise processing unit 690 and a silence data setting unit 700; a noise table 710; and a sound/silence unit 720.

[0027] The sound/silence unit 720 examines the sound/silence determiner 511 output by the sound/silence determination unit 500, and causes the voiced processing system to operate in the case of sound and the silence processing system to operate in the case of silence.

[0028] First, the operation of the silence processing system when the sound/silence unit 720 determines silence will be described.

[0029] In this case, the noise processing unit 690 obtains the noise features from the output of the axis conversion unit 640 and stores them in the noise table 710. For example, it stores the axis-converted frequency characteristic information from the axis conversion unit, namely the power spectrum sequence, in the noise table 710. The silence frame setting unit 700 outputs data 682b, which represents silence as a normalized waveform series, to the vector quantization unit 800 as the normalized waveform series 682.

[0030] Next, the operation of the voiced processing system when the sound/silence unit 720 determines sound will be described.

[0031] In this case, the FFT unit 730 is used to obtain the pitch period from the logarithm of the power spectrum (the cepstrum). The pitch extraction unit 650 extracts the voice feature (pitch height) and the repetition period (pitch information) from the output of the FFT unit 730. The noise removal unit 660 uses the noise features stored in the noise table 710 to remove noise from the power spectrum sequence that the axis conversion unit 640 outputs during sound. The inverse FFT unit 670 applies an inverse FFT to the noise-removed power spectrum. The normalization unit 680 normalizes the output of the inverse FFT unit 670 using the pitch information received from the pitch extraction unit 650, scaling the inverse FFT result so that its maximum value is 1, and outputs a normalized waveform series 682a representing the waveform within one pitch period as the normalized waveform series 682. It also outputs the level of the output of the inverse FFT unit 670 as the level information 681.
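Cepstrum-based pitch extraction and the normalization step can be sketched as below. This is an illustrative reading, not the apparatus itself: the names `cepstral_pitch_period` and `normalize_pitch_waveform`, the search bounds `min_period` and `max_period`, the small `eps` added before the logarithm, and the choice of the first `period` samples as the waveform within one pitch are all assumptions. The frame length must be a power of two and `max_period` must be smaller than it.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + twiddle, even[k] - twiddle
    return out


def cepstral_pitch_period(
    frame: Sequence[float], min_period: int, max_period: int, eps: float = 1e-12
) -> int:
    """Pitch period in samples: the strongest cepstral peak in the search range."""
    log_power = [math.log(abs(c) ** 2 + eps) for c in fft(frame)]
    # Second transform (the role assigned to FFT unit 730 in the text).
    cepstrum = [abs(c) for c in fft(log_power)]
    return max(range(min_period, max_period + 1), key=lambda q: cepstrum[q])


def normalize_pitch_waveform(
    waveform: Sequence[float], period: int
) -> Tuple[float, List[float]]:
    """Return (level, one pitch period scaled so that its peak magnitude is 1)."""
    segment = list(waveform[:period])
    level = max((abs(v) for v in segment), default=0.0)
    if level == 0.0:
        return 0.0, segment
    return level, [v / level for v in segment]
```

The returned level corresponds to the level information 681 and the scaled segment to the normalized waveform series 682a.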

[0032] The noise removal unit 660 removes noise as follows, for example. According to the contents of the noise table 710, the noise removal unit 660 creates a noise mask table that stores a weight from 0.0 to 1.0 for each frequency, and multiplies the power spectrum sequence obtained during sound by the corresponding weights. The weights are assigned over the range 0.0 to 1.0 so that the larger the absolute value of the silent-period power spectrum (the noise power spectrum) stored in the noise table 710 at a frequency, the smaller the weight. In other words, frequencies at which noise is prominent are multiplied by a value no greater than 1.0, which moves the corresponding value of the voiced power spectrum below its original value, while frequencies at which no noise spectrum appears are given a weight of 1.0, which leaves the power spectrum value at that frequency unchanged. The result is a power spectrum sequence from which the noise has been removed.
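A noise mask of this kind can be sketched as follows. The specification fixes only the range of the weights and their ordering (more noise, smaller weight; no noise, weight 1.0); the linear mapping used here, `1 - noise / max(noise)`, is one assumed way to satisfy those constraints, and the function names are hypothetical.

```python
from typing import List, Sequence


def build_noise_mask(noise_spectrum: Sequence[float]) -> List[float]:
    """Per-frequency weights in [0.0, 1.0]; larger noise gives a smaller weight."""
    peak = max(noise_spectrum, default=0.0)
    if peak <= 0.0:
        # No noise observed: leave every frequency untouched.
        return [1.0] * len(noise_spectrum)
    return [1.0 - value / peak for value in noise_spectrum]


def apply_noise_mask(spectrum: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Multiply the voiced spectrum by the mask, frequency by frequency."""
    if len(spectrum) != len(mask):
        raise ValueError("spectrum and mask must have the same length")
    return [s * w for s, w in zip(spectrum, mask)]
```

The mask would be rebuilt whenever the noise table is refreshed during a silent frame, so the weights follow the changing background.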

[0033] A second embodiment of the present invention will be described below.

[0034] The second embodiment differs from the first embodiment only in the configurations of the sound/silence determination unit 500 and the analysis unit 600.

[0035] As shown in FIG. 3, the sound/silence determination unit 500 of the second embodiment has: an FFT data setting unit 610 that sets data for FFT processing of the frame data; an FFT unit 620 that performs the FFT on the frame data; a power spectrum conversion unit 630 that obtains the sum of squares of the resulting complex values; a voice power determination unit 510 that takes the sum of the frame data, compares it with a threshold, and outputs a sound/silence determiner 721; a sound/silence determination unit 720 that examines that determiner; an FFT unit 730 that performs FFT processing on the power spectrum; and a pitch extraction unit 650 that extracts pitch information from the logarithm of the power spectrum (the cepstrum), determines the sound state by exploiting the fact that the pitch period is not stable during silence (see "Digital Signal Processing" by Sadaoki Furui, pp. 57-59, section 4.9 "Pitch Extraction"), and outputs a sound/silence determiner.

[0036] The voice power determination unit 510 determines sound or silence from the power of each frame of data, in the same way as the voice power determination unit 510 of the first embodiment. With that method, however, sound or silence is judged from power alone, so a silent state may be mistaken for sound. In the second embodiment, therefore, when the voice power determination unit 510 determines sound, the sound/silence determination unit 720 starts the FFT unit 730 and the pitch extraction unit 650, and a further sound/silence determination is made using the pitch period.

[0037] That is, when the voice power determination unit 510 determines sound, the FFT unit 730 performs FFT processing on the power spectrum. From this output, the pitch extraction unit 650 extracts pitch information from the logarithm of the power spectrum (the cepstrum), determines the sound/silence state by exploiting the fact that the pitch period is not stable during silence (see "Digital Signal Processing" by Sadaoki Furui, pp. 57-59, section 4.9 "Pitch Extraction"), and outputs a sound/silence determiner. Alternatively, the pitch extraction unit 650 may make the determination by converting the pitch information on the time axis into a pitch period on the frequency axis and judging the frame to be sound if the power spectrum has a local maximum at every pitch period, and silence otherwise.
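The alternative check at the end of paragraph [0037] can be sketched as a search for local maxima at regular intervals in the power spectrum. Everything beyond the basic idea is an assumption here: the function names, the conversion of a period of `period_samples` samples into a spacing of `len(power) / period_samples` bins, the `tolerance` of one bin around each expected position, and restricting the search to the lower half of the spectrum.

```python
from typing import Sequence


def is_local_max(power: Sequence[float], j: int) -> bool:
    """True when bin j is strictly greater than both neighbours."""
    return power[j] > power[j - 1] and power[j] > power[j + 1]


def has_harmonic_peaks(
    power: Sequence[float], period_samples: int, tolerance: int = 1
) -> bool:
    """True when a local maximum is found near every multiple of the pitch spacing."""
    n = len(power)
    spacing = n / period_samples  # pitch interval expressed in frequency bins
    half = n // 2
    harmonic, checked = 1, 0
    while True:
        center = round(harmonic * spacing)
        if center + tolerance >= half:
            break
        low = max(center - tolerance, 1)
        if not any(is_local_max(power, j) for j in range(low, center + tolerance + 1)):
            return False  # a harmonic position without a peak: treat as silence
        checked += 1
        harmonic += 1
    return checked > 0
```

A flat, noise-like spectrum fails at the first expected position, while a spectrum with evenly spaced peaks passes.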

[0038] Next, as shown in FIG. 4, the analysis unit 600 of the second embodiment has an axis conversion unit 640 that converts the vertical axis from a power spectrum axis to an amplitude axis, a noise removal unit 660 that removes noise, an inverse FFT unit 670 that performs the inverse FFT, a normalization unit 680 that scales the inverse FFT result so that its maximum value is 1, a sound/silence unit 920, a noise processing unit 690 that extracts noise features, a silence data setting unit 700 that sets the data for outputting silence, and a noise table 710. Each unit operates in the same way as the corresponding unit of the first embodiment. In the second embodiment, however, the noise removal unit 660, the inverse FFT unit 670, and the normalization unit 680 make up the voiced processing system. The sound/silence unit 920 causes this voiced processing system to operate only when the sound/silence determiner output by the pitch extraction unit 650 indicates sound. The silence processing system, consisting of the noise processing unit 690 and the silence data setting unit 700, operates when at least one of the sound/silence unit 920 and the sound/silence unit 720 of the sound/silence determination unit 500 determines silence.
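The two-stage decision of the second embodiment reduces to a short-circuit test: the pitch-based check runs only when the power-based check reports sound, and the voiced path is taken only when both agree. A minimal sketch with hypothetical names (`select_processing`, `pitch_check`):

```python
from typing import Callable


def select_processing(power_voiced: bool, pitch_check: Callable[[], bool]) -> str:
    """Return "voiced" only when both the power test and the pitch test report sound."""
    # `and` short-circuits, so the (costlier) pitch check is skipped
    # whenever the power test has already reported silence.
    if power_voiced and pitch_check():
        return "voiced"
    return "silence"
```

Passing the pitch test as a callable keeps the second FFT and the pitch extraction from running on frames that the power test has already rejected.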

[0039] As described above, according to these embodiments, the constantly changing ambient noise is continually extracted and its features are removed, so that noise can be removed accurately in any situation.

[0040] The processing performed by each part of the sound/silence determination unit 500 and the analysis unit 600 in the first and second embodiments can be implemented as a program running on a processor. In that case, only the silence processing system needs to run during silence and only the voiced processing system during sound, and the preprocessing for the two systems is shared. Compared with the conventional techniques, in which noise removal is performed as a preprocessing step before voice analysis, the processor load is therefore smaller, and the processing can be implemented as a program running on a single processor.

[0041] Although the above embodiments have been described using application to a communication system as an example, the voice information analysis apparatus of the first and second embodiments can also be applied to a variety of other devices, such as devices that perform voice recognition or similar processing using the analysis results of the analysis unit 600.

[0042]

[Effects of the Invention] As described above, the present invention provides a voice analysis apparatus that can efficiently remove noise whose characteristics change dynamically from a voice signal subject to voice analysis.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the sound/silence determination unit according to the first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the analysis unit according to the first embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration of the sound/silence determination unit according to the second embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of the analysis unit according to the second embodiment of the present invention.

FIG. 5 is a block diagram showing the configuration of a communication system according to an embodiment of the present invention.

[Explanation of symbols]

100 Voice information analysis apparatus
200 Voice input device
300 A/D converter
400 Buffer memory
401 Frame data
500 Sound/silence determination unit
511 Sound/silence determiner
600 Analysis unit
610 FFT data setting unit
620 FFT unit
630 Power spectrum conversion unit
631 Power spectrum sequence
640 Axis conversion unit
650 Pitch extraction unit
651 Pitch information
660 Noise removal unit
670 Inverse FFT unit
680 Normalization unit
681 Level information
682 Normalized waveform series
690 Noise processing unit
700 Silence frame setting unit
710 Noise table
720 Sound/silence unit
800 Vector quantization unit
801 Vector code
900 Transmitting unit
1100 Vector inverse quantization unit
1200 Synthesis unit
1300 D/A conversion unit
1400 Voice output device

Claims

[Claims]

1. A voice information analysis method for outputting the result of analyzing the voice represented by frame data, each frame data being a collection of voice sampling data for a fixed period of time, the method comprising: a step of determining whether the voice represented by each frame data includes voice other than noise; a step of executing, when it is determined that no voice other than noise is included, silence processing that extracts from the frame data the features of the noise contained in the voice represented by the frame data, stores those features, and outputs information prepared in advance that represents the analysis result of a silent voice as the analysis result of the voice represented by the frame data; and a step of executing, when it is determined that voice other than noise is included, voiced processing that removes the noise features stored in the preceding silence processing from the voice represented by the frame data, analyzes the voice from which the noise features have been removed, and outputs the analysis result.
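
The control flow of the method in claim 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the amplitude threshold, the choice of mean absolute amplitude as the stored noise feature, the time-domain shrinkage used to remove it, and the names `FrameAnalyzer` and `SILENT_RESULT`. The cleaned samples stand in for the analysis result.

```python
import math
from typing import List, Union

# Hypothetical stand-in for the analysis result of a silent voice prepared in advance.
SILENT_RESULT = "SILENT"


class FrameAnalyzer:
    """Toy per-frame analyzer following the silence/voiced split of the method."""

    def __init__(self, threshold: float = 0.1) -> None:
        self.threshold = threshold  # assumed amplitude threshold for "voice present"
        self.noise_level = 0.0      # stored noise feature (assumed: mean |sample|)

    def process(self, frame: List[float]) -> Union[str, List[float]]:
        peak = max(abs(s) for s in frame)
        if peak <= self.threshold:
            # Silence processing: extract and store the noise feature,
            # then output the silent result prepared in advance.
            self.noise_level = sum(abs(s) for s in frame) / len(frame)
            return SILENT_RESULT
        # Voiced processing: remove the stored noise feature from each sample
        # (shrink its magnitude by the noise level, never below zero).
        # Returning the cleaned samples stands in for the real analysis step.
        return [
            math.copysign(max(abs(s) - self.noise_level, 0.0), s)
            for s in frame
        ]
```

Because the noise feature is refreshed on every silent frame, the value removed from a voiced frame always comes from the most recent silence, which is how the method tracks noise that changes over time.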

2. A voice information analysis apparatus for outputting the result of analyzing the voice represented by frame data, each frame data being a collection of voice sampling data for a fixed period of time, the apparatus comprising: a noise table; means for determining, based on the amplitude of the voice represented by the frame data, whether the voice represented by each frame data includes voice other than noise; means for, when it is determined that no voice other than noise is included, extracting from the frame data the features of the noise contained in the voice represented by the frame data, storing those features in the noise table, and outputting information prepared in advance that represents the analysis result of a silent voice as the analysis result of the voice represented by the frame data; and means for, when it is determined that voice other than noise is included, removing the noise features stored in the noise table from the voice represented by the frame data, analyzing the voice from which the noise features have been removed, and outputting the analysis result.

3. A voice information analysis apparatus that analyzes the voice represented by frame data, each frame data being a collection of voice sampling data for a fixed period of time, and outputs pitch information representing the pitch of the voice represented by each frame data, waveform information representing the voice waveform within the pitch, and amplitude information representing the amplitude of the voice represented by the frame data, the apparatus comprising: means for obtaining the power spectrum of the voice represented by each frame data; means for extracting, from the obtained power spectrum, the pitch information of the voice represented by the frame data and outputting it; a noise table; means for determining, based on the amplitude of the voice represented by the frame data, whether the voice represented by each frame data includes voice other than noise; noise processing means for, when it is determined that no voice other than noise is included, extracting the features of the noise contained in the voice represented by the frame data from the power spectrum of that voice and storing them in the noise table; means for, when it is determined that no voice other than noise is included, outputting waveform information prepared in advance that represents a silent waveform as the analysis result of the voice represented by the frame data; means for, when it is determined that voice other than noise is included, removing the noise features stored in the noise table from the power spectrum of the voice represented by the frame data; and means for, when it is determined that voice other than noise is included, extracting the waveform information and the amplitude information from the power spectrum from which the noise features have been removed, using the pitch information obtained for the corresponding frame data, and outputting the extracted information.
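
The power-spectrum path of claim 3 can be sketched as follows. The claim does not say how the noise features are represented or removed, so this sketch assumes the noise table simply holds the per-bin power of the most recent silent frame and that removal is a per-bin subtraction floored at zero (ordinary spectral subtraction, swapped in here for illustration). The function names are hypothetical, and a direct DFT is used instead of an FFT to keep the example short.

```python
import cmath
from typing import List


def power_spectrum(frame: List[float]) -> List[float]:
    """Power spectrum |X[k]|^2 of a frame, computed with a direct O(n^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def update_noise_table(table: List[float], spectrum: List[float]) -> None:
    """Assumed update rule: overwrite the table with the latest silent frame's spectrum."""
    table[:] = spectrum


def remove_noise(spectrum: List[float], table: List[float]) -> List[float]:
    """Subtract the stored noise power from each bin, never going below zero."""
    return [max(p - q, 0.0) for p, q in zip(spectrum, table)]
```

In use, `update_noise_table` would be called for frames judged to contain only noise and `remove_noise` for frames judged to contain voice, before the waveform and amplitude information are extracted.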

4. A voice information analysis apparatus that analyzes the voice represented by frame data, each frame data being a collection of voice sampling data for a fixed period of time, and outputs pitch information representing the pitch of the voice represented by each frame data, waveform information representing the voice waveform within the pitch, and amplitude information representing the amplitude of the voice represented by the frame data, the apparatus comprising: means for obtaining the power spectrum of the voice represented by each frame data; means for extracting, from the obtained power spectrum, the pitch information of the voice represented by the frame data and outputting it; a noise table; means for determining, based on the pitch information obtained for the frame data and the amplitude of the voice represented by the frame data, whether the voice represented by each frame data includes voice other than noise; noise processing means for, when it is determined that no voice other than noise is included, extracting the features of the noise contained in the voice represented by the frame data from the obtained power spectrum and storing them in the noise table; means for, when it is determined that no voice other than noise is included, outputting waveform information prepared in advance that represents a silent waveform as the analysis result of the voice represented by the frame data; means for, when it is determined that voice other than noise is included, removing the noise features stored in the noise table from the power spectrum of the voice represented by the frame data; and means for, when it is determined that voice other than noise is included, extracting the waveform information and the amplitude information from the power spectrum from which the noise features have been removed, using the pitch information obtained for the frame data, and outputting the extracted information.
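
Claim 4 differs from claim 3 in that the voice/noise determination uses the pitch information as well as the amplitude. The claim does not state how the two are combined, so the sketch below assumes a frame counts as voice only when its peak amplitude exceeds a threshold and a pitch period is found. Pitch is estimated here by normalized time-domain autocorrelation, which is an illustrative substitute for the power-spectrum-based pitch extraction in the claim; all names and thresholds are hypothetical.

```python
import math
from typing import List, Optional


def estimate_pitch_lag(
    frame: List[float], min_lag: int = 2, min_strength: float = 0.5
) -> Optional[int]:
    """Return the lag (in samples) with the strongest normalized autocorrelation,
    or None if no lag reaches `min_strength` (no clear periodicity)."""
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return None
    best_lag, best = None, min_strength
    for lag in range(min_lag, len(frame) // 2 + 1):
        r = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag)) / energy
        if r > best:
            best_lag, best = lag, r
    return best_lag


def contains_voice(frame: List[float], amp_threshold: float = 0.1) -> bool:
    """Assumed rule: voice is present only if the frame is loud enough AND periodic."""
    peak = max(abs(s) for s in frame)
    return peak > amp_threshold and estimate_pitch_lag(frame) is not None


# Usage: a periodic tone counts as voice, an isolated loud click does not.
tone = [0.5 * math.sin(2 * math.pi * t / 8) for t in range(64)]
click = [1.0] + [0.0] * 63
```

Under this rule the click is rejected even though it is loud, because it has no pitch; an amplitude-only test as in claim 3 would have treated it as voice.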

5. A voice compression encoding device comprising: the voice information analysis apparatus according to claim 3 or 4; and means for quantizing the waveform information output from the voice information analysis apparatus and outputting the quantized data.
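
The quantizing means of claim 5 corresponds to the vector quantization unit 800 and vector code 801 in the list of symbols. A minimal sketch of vector quantization, assuming a nearest-neighbor search over a fixed codebook with squared Euclidean distance, is shown below; the codebook contents and the function name `quantize` are hypothetical.

```python
from typing import Sequence


def quantize(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index (vector code) of the codebook entry closest to `vector`."""

    def dist2(entry: Sequence[float]) -> float:
        # Squared Euclidean distance between the input and one codebook entry.
        return sum((a - b) ** 2 for a, b in zip(vector, entry))

    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))
```

Only the returned index needs to be transmitted; a receiver holding the same codebook can look up the corresponding entry to approximate the waveform.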

6. A communication terminal device comprising: means for inputting voice; means for sampling the input voice and outputting sampled data; buffer means for accumulating the sampled data and outputting frame data in which the sampled data for a predetermined time is accumulated; and means for transmitting, through a wired or wireless transmission path, the pitch information, the waveform information, and the amplitude information output from the voice compression encoding device.
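
The buffer means of claim 6 groups sampled data into fixed-length frames. A minimal sketch, assuming a frame is emitted as soon as `frame_len` samples have accumulated and that a trailing partial frame is discarded (the claim does not say how partial frames are handled), could look like this; the name `frames` is hypothetical.

```python
from typing import Iterable, Iterator, List


def frames(samples: Iterable[float], frame_len: int) -> Iterator[List[float]]:
    """Accumulate samples and yield one frame each time `frame_len` samples are buffered."""
    buffer: List[float] = []
    for s in samples:
        buffer.append(s)
        if len(buffer) == frame_len:
            yield buffer
            buffer = []  # start a fresh frame; a trailing partial frame is dropped
```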