JPH02253298A - Voice pass filter

Info

Publication number: JPH02253298A
Application number: JP1075593A
Authority: JP
Inventors: Shin Kamiya; 伸神谷
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1989-03-28
Filing date: 1989-03-28
Publication date: 1990-10-12

Abstract

PURPOSE: To provide a voice pass filter that can select and remove a noise signal regardless of the kind of noise superimposed on a voice signal, by providing a noise-removal neural network that selects and removes the acoustic parameters of the noise signal from the acoustic parameters of the input voice signal on which the noise signal is superimposed. CONSTITUTION: Sampled values of the acoustic signal A, which consists of the voice signal S with the noise signal N superimposed, are input via a delay section 20 to the input layer of the noise-removal neural network 5. Meanwhile, mainly environmental noise N is picked up by a microphone 3 installed some distance from the speaker's mouth and is A/D-converted at a sampling frequency of 12 kHz by an A/D converter 4. The sampled values N(t) of the noise signal N are input via a delay section 21 to the input layer of the noise-removal neural network 5. The noise-removal neural network 5 then separates the voice signal S and the noise signal N in the acoustic signal A, removes the noise signal N, and outputs the voice signal S as the output sound.

Description

DETAILED DESCRIPTION OF THE INVENTION <Industrial Application Field> The present invention relates to a voice pass filter that receives an acoustic signal in which a noise signal is superimposed on a voice signal, removes the noise signal, and outputs only the voice signal.

<Prior Art> Spectral subtraction is a conventional method for separating an acoustic signal, in which a noise signal is superimposed on a voice signal, into the voice signal and the noise signal. It suppresses the noise signal and separates the voice signal by subtracting the spectrum of a silent interval (that is, environmental noise only) from the spectrum of the voice signal on which the noise signal is superimposed. Another method estimates the voice signal component by comparing acoustic signals input simultaneously from multiple microphones, and separates the voice signal on that basis.
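The subtraction step of spectral subtraction can be sketched as follows. This is only a minimal illustration, assuming the magnitude spectra of the noisy frame and of a silent (noise-only) interval have already been computed; the function name `spectral_subtract` and the `floor` parameter are hypothetical and do not come from the patent.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.0):
    """Subtract a noise magnitude spectrum from a noisy magnitude spectrum.

    Both arguments are lists of per-bin magnitudes of equal length.
    Bins that would become negative are clamped to `floor`
    (the clamping is an assumption of this sketch).
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    return [max(a - n, floor) for a, n in zip(noisy_mag, noise_mag)]
```

Because the noise spectrum is estimated from a silent interval, the estimate is only valid while the noise stays similar to what was observed in that interval.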

Furthermore, attempts have recently been made to use neural networks to remove noise signals and separate the voice signal from an input acoustic signal (for example, Ogiyama and Itakura, "Detection of speech intervals using neural networks," Proceedings of the Acoustical Society of Japan, 2-P-4, October 1988).

The neural network used for this speech interval detection is a three-layer perceptron-type neural network. Its input layer receives the power of 19 frames and the linear prediction cepstrum up to the 12th order for 3 frames. The intermediate layer has 5 units for power and 10 units for the spectrum. The output layer has a single unit, and the category to be identified is the speech interval.

The training data consist of the power and linear prediction cepstrum of word-sequence utterances by a specific speaker, white noise, and printer noise. In speech interval identification tests using the neural network trained on these data, a good accuracy rate was obtained for acoustic signals similar to the training data, such as word-sequence utterances by other specific speakers and printer sounds.

<Problems to be Solved by the Invention> However, in the speech interval detection using the above neural network, training is carried out by inputting a single piece of training data per training pass, so the trained neural network is structured to identify whether the input acoustic signal is a voice signal or a noise signal. Consequently, its discrimination ability degrades considerably for an acoustic signal in which a noise signal whose characteristics differ from those of the training data is superimposed on a voice signal.

Accordingly, an object of the present invention is to provide a voice pass filter that can select and remove the noise signal from an acoustic signal in which the noise signal is superimposed on a voice signal, regardless of the type of noise signal superimposed.

<Means for Solving the Problems> To achieve the above object, the voice pass filter of the present invention comprises: a first acoustic analysis section that acoustically analyzes a voice signal on which a noise signal is superimposed to generate acoustic parameters; a second acoustic analysis section that acoustically analyzes the noise signal to generate acoustic parameters; and a noise-removal neural network that receives the acoustic parameters of the noise-superimposed voice signal output from the first acoustic analysis section and the acoustic parameters of the noise signal output from the second acoustic analysis section, generates by itself, through learning, a rule for selecting and removing the acoustic parameters of the noise signal from the acoustic parameters of the noise-superimposed voice signal, and, in accordance with that rule, selects and removes the acoustic parameters of the noise signal from the acoustic parameters of the input noise-superimposed voice signal.

<Operation> The voice signal on which the noise signal is superimposed is input to the first acoustic analysis section and acoustically analyzed to generate acoustic parameters, while the noise signal is input to the second acoustic analysis section and acoustically analyzed to generate acoustic parameters. Both sets of acoustic parameters, output from the first and second acoustic analysis sections, are then input to the noise-removal neural network. Following the rule it has generated by itself through learning, the noise-removal neural network removes the acoustic parameters of the noise signal from the acoustic parameters of the input noise-superimposed voice signal and outputs the result. Thus only the voice signal is selected from the noise-superimposed voice signal and output.

<Embodiment> The present invention is described in detail below with reference to the illustrated embodiment.

FIG. 1 is a block diagram of the voice pass filter of the present invention. This voice pass filter has a noise-removal neural network 5, whose structure is described in detail later; it receives two acoustic signals and outputs only the voice signal.

In this embodiment, a signal consisting mainly of voice is called the voice signal S, a signal consisting mainly of noise is called the noise signal N, and the voice signal S with the noise signal N superimposed on it is called the acoustic signal A.

The speaker's voice, with environmental noise superimposed, is input from a close-talking microphone 1 placed at the speaker's mouth and is A/D-converted by an A/D converter 2 at a sampling frequency of 12 kHz.

The sampled value A(t) (t: sample number) of the acoustic signal A, in which the noise signal N is superimposed on the voice signal S, is then input to the input layer of the noise-removal neural network 5 via a delay section 20, which is described in detail later. Meanwhile, mainly environmental noise N is input from a microphone 3 installed some distance from the speaker's mouth and is A/D-converted by an A/D converter 4 at a sampling frequency of 12 kHz. The sampled value N(t) of the noise signal N is input to the input layer of the noise-removal neural network 5 via a delay section 21. The noise-removal neural network 5 then separates the voice signal S and the noise signal N in the acoustic signal A, removes the noise signal N, and outputs the voice signal S as the output sound.

FIG. 2 is a schematic diagram of the structure of the noise-removal neural network 5. This neural network is a three-layer perceptron-type neural network with a three-layer structure consisting, from the bottom of the figure upward, of an input layer 6, an intermediate layer 7, and an output layer 8. The input layer 6 has 2T units, the intermediate layer 7 has N units, and the output layer 8 has T units.

The 2T units of the input layer 6 are divided into T groups of two units each. One unit of each group (units 9, 10, ..., 11) receives the sampled values A(t) of the acoustic signal A, in which the noise signal N is superimposed on the voice signal S, from the A/D converter 2. The other unit of each group (units 12, 13, ..., 14) receives the sampled values N(t) of the noise signal N from the A/D converter 4. Of these T groups, the group of units 9 and 12 receives the t-th sampled values, the group of units 10 and 13 receives the (t-1)-th sampled values, and so on, until the group of units 11 and 14 receives the (t-T+1)-th sampled values. In other words, the input layer 6 of the noise-removal neural network 5 receives the acoustic signal A and the noise signal N from the (t-T+1)-th through the t-th sampled values.

In this case, A(t-T+1) to A(t) and N(t-T+1) to N(t) are input to the input layer 6 of the noise-removal neural network 5 as follows. The output signal A(t) from the A/D converter 2 is input directly to unit 9, and the output signal N(t) from the A/D converter 4 is input directly to unit 12. Unit 10 receives the output signal A(t) from the A/D converter 2 delayed by one sampling period by the delay element 22 of the delay section 20, and unit 13 receives the output signal N(t) from the A/D converter 4 delayed by one sampling period by the delay element 24 of the delay section 21. Likewise, unit 11 receives the output signal A(t) from the A/D converter 2 delayed by (T-1) sampling periods by the delay element 23, and unit 14 receives the output signal N(t) from the A/D converter 4 delayed by (T-1) sampling periods by the delay element 25.
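The delay sections 20 and 21 behave like tapped delay lines. The following is a minimal software sketch of that idea, assuming the history before the first sample is zero; the class name `TappedDelayLine` and the helper `input_layer_values` are hypothetical and only illustrate the ordering of values presented to units 9, 10, ..., 11 and 12, 13, ..., 14.

```python
from collections import deque


class TappedDelayLine:
    """Keeps the T most recent samples, newest first."""

    def __init__(self, taps):
        # Assumption of this sketch: history before the first sample is zero.
        self.buf = deque([0.0] * taps, maxlen=taps)

    def push(self, sample):
        # appendleft on a full deque drops the oldest sample at the right end.
        self.buf.appendleft(sample)
        return list(self.buf)  # [x(t), x(t-1), ..., x(t-T+1)]


def input_layer_values(a_line, n_line, a_sample, n_sample):
    """Values for units 9, 10, ..., 11 followed by units 12, 13, ..., 14."""
    return a_line.push(a_sample) + n_line.push(n_sample)
```

Each call to `input_layer_values` corresponds to one sampling period: the newest A(t) and N(t) enter undelayed, and every older value shifts one tap further along.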

The t-th sampled value S(t) of the voice signal S is assigned to unit 15 of the output layer 8, the (t-1)-th sampled value S(t-1) of the voice signal S is assigned to unit 16, and so on, until the (t-T+1)-th sampled value S(t-T+1) of the voice signal S is assigned to unit 17. Each unit of the input layer 6 is connected to all units of the intermediate layer 7, and each unit of the intermediate layer 7 is connected to all units of the output layer 8. Units within the same layer, however, are not connected to one another.

The noise-removal neural network 5 is trained by error backpropagation as follows. Two kinds of training data are prepared. The first training data are T sampled values A'(t-T+1) to A'(t) of an acoustic signal A' in which a noise signal N' is superimposed on voice signals S' of many speakers. The second training data are T sampled values N'(t-T+1) to N'(t) of the noise signal N'. As teacher data, T sampled values S'(t-T+1) to S'(t) of the voice signals S' of the same speakers, without the noise signal N' superimposed, are prepared.
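One way to assemble a single training example from these three signals is sketched below. It assumes that "superimposed" means sample-wise addition and that the clean speech and noise are available as equal-rate sample lists; the function name `make_training_example` is hypothetical.

```python
def make_training_example(speech, noise, t, T):
    """Build (input values, teacher values) for the window ending at index t.

    speech, noise: lists of samples S' and N' (same sampling rate).
    Returns inputs ordered as A'(t), ..., A'(t-T+1), N'(t), ..., N'(t-T+1)
    and teacher values S'(t), ..., S'(t-T+1).
    """
    if t - T + 1 < 0 or t >= min(len(speech), len(noise)):
        raise ValueError("window does not fit inside the signals")
    indices = range(t, t - T, -1)  # newest sample first
    # Assumption: the acoustic signal is the sum of speech and noise.
    acoustic = [speech[i] + noise[i] for i in indices]
    noise_window = [noise[i] for i in indices]
    teacher = [speech[i] for i in indices]
    return acoustic + noise_window, teacher
```

Sliding `t` over a recording yields one example per sample position, each with 2T input values and T teacher values.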

During training, when the first training data are input to units 9, 10, ..., 11 of the input layer 6 and the second training data are input to units 12, 13, ..., 14 of the input layer 6, the teacher data are supplied to the output layer 8 as follows. Unit 15, to which the t-th sampled value S(t) of the voice signal S is assigned, receives the t-th sampled value of the teacher data, T(t) = S'(t); unit 16, to which the sampled value S(t-1) is assigned, receives the (t-1)-th sampled value of the teacher data, T(t-1) = S'(t-1); and so on, until unit 17, to which the sampled value S(t-T+1) is assigned, receives the (t-T+1)-th sampled value of the teacher data, T(t-T+1) = S'(t-T+1). The noise-removal neural network 5 then readjusts its weights so that the output values from units 15, 16, ..., 17 of the output layer 8 match the teacher data, thereby determining the network structure.
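A single weight update of this kind can be sketched as follows. The patent names error backpropagation but does not specify activation functions, the error measure, or the learning rate, so the tanh hidden units, linear output units, squared error, and `lr` value below are all assumptions of this sketch, as are the function names.

```python
import math
import random


def make_net(n_in, n_hidden, n_out, seed=0):
    """Random weights for a three-layer perceptron; last entry of each row is a bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w1, w2


def forward(net, x):
    """Return (hidden activations, outputs) for input vector x."""
    w1, w2 = net
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    y = [sum(w * v for w, v in zip(row, h)) + row[-1] for row in w2]
    return h, y


def train_step(net, x, teacher, lr=0.05):
    """One backpropagation update; returns the squared error before the update."""
    w1, w2 = net
    h, y = forward(net, x)
    err = [yi - ti for yi, ti in zip(y, teacher)]
    # Hidden-layer error terms, computed with the output weights before they change.
    delta_h = [
        (1.0 - h[j] ** 2) * sum(err[k] * w2[k][j] for k in range(len(w2)))
        for j in range(len(h))
    ]
    for k, row in enumerate(w2):
        for j in range(len(h)):
            row[j] -= lr * err[k] * h[j]
        row[-1] -= lr * err[k]
    for j, row in enumerate(w1):
        for i in range(len(x)):
            row[i] -= lr * delta_h[j] * x[i]
        row[-1] -= lr * delta_h[j]
    return 0.5 * sum(e * e for e in err)
```

For a window length T, the network would be created with `make_net(2 * T, n_hidden, T)`, and `train_step` would be called repeatedly over many (input, teacher) pairs.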

In other words, the training of the noise-removal neural network 5 seeks a rule that, given a pattern (c), formed by superimposing a pattern (b) belonging to another category on a pattern (a) belonging to one category, together with the pattern (b) itself, selects and removes pattern (b) from the pattern (c) that contains it. The selection ability of a neural network trained in this way therefore does not depend on the content of the pattern (b) superimposed on pattern (a). In the noise-removal neural network 5, pattern (a) is the voice signal S and pattern (b) is the noise signal N superimposed on the voice signal S, so the trained noise-removal neural network 5 can select and remove the noise signal N from the acoustic signal A regardless of the type of noise signal N.

The noise-removal neural network 5 trained as described above selects the noise signal N from the input acoustic signal A as follows. The sampled value A(t) of the acoustic signal A, in which the environmental noise signal N is superimposed on the voice signal S, is output from the A/D converter 2 and delayed by (T-1) to 0 sampling periods by the delay section 20, and the resulting sampled values A(t-T+1) to A(t) are input to units 9, 10, ..., 11 of the input layer 6 of the noise-removal neural network 5. Meanwhile, the sampled value N(t) of the environmental noise signal N output from the A/D converter 4 is delayed by (T-1) to 0 sampling periods by the delay section 21, and the resulting sampled values N(t-T+1) to N(t) are input to units 12, 13, ..., 14. The noise-removal neural network 5 then selects and removes the sampled values N(t-T+1) to N(t) of the environmental noise signal N from the sampled values A(t-T+1) to A(t) of the acoustic signal A. Unit 15 of the output layer 8 outputs a value corresponding to the sampled value S(t) of the voice signal S, unit 16 outputs a value corresponding to the sampled value S(t-1), and so on, until unit 17 outputs a value corresponding to the sampled value S(t-T+1).

Therefore, if the output signal from unit 15 of the output layer 8 of the noise-removal neural network 5 is taken as the output signal O(t) of this voice pass filter, then when the sampled values A(t-T+1) to A(t) of the acoustic signal A and the sampled values N(t-T+1) to N(t) of the environmental noise signal N are input to the input layer 6 of the noise-removal neural network 5, unit 15 of the output layer 8 outputs the output signal O(t) corresponding to the voice signal S(t). When one sampling period has elapsed and the sampled values input to the input layer 6 change to A(t-T+2) to A(t+1) for the acoustic signal A and to N(t-T+2) to N(t+1) for the environmental noise signal N, unit 15 of the output layer 8 outputs the output signal O(t+1) corresponding to the voice signal S(t+1).
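The sample-by-sample operation described above, where the input window advances by one sampling period and only the output of unit 15 is kept, can be sketched as follows. The `network` argument stands for any trained function mapping 2T input values to T output values; `ideal_subtractor` is a hypothetical stand-in used only to make the sketch runnable and is not the trained network of the patent.

```python
def window(signal, t, T):
    """Return [x(t), x(t-1), ..., x(t-T+1)], zero-padded before the first sample."""
    return [signal[t - k] if t - k >= 0 else 0.0 for k in range(T)]


def run_filter(network, acoustic, noise, T):
    """Slide the 2T-value input window one sample at a time; keep unit 15's output."""
    outputs = []
    for t in range(min(len(acoustic), len(noise))):
        x = window(acoustic, t, T) + window(noise, t, T)
        y = network(x)        # T output values for S(t), ..., S(t-T+1)
        outputs.append(y[0])  # first output plays the role of unit 15: O(t)
    return outputs


def ideal_subtractor(x):
    """Hypothetical stand-in network: subtracts each noise value from its pair."""
    T = len(x) // 2
    return [x[k] - x[T + k] for k in range(T)]
```

With `ideal_subtractor` and purely additive noise, `run_filter` returns the clean samples exactly; a trained network would instead return its estimate of S(t) at each step.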

That is, when the sampled value A(t) of the acoustic signal A, in which the noise signal N is superimposed on the voice signal S, and the sampled value N(t) of the noise signal N are input, this voice pass filter selects and removes the sampled value N(t) of the noise signal N from the sampled value A(t) of the acoustic signal A and outputs the output signal O(t) corresponding to the sampled value S(t) of the voice signal S.

In this way, the voice pass filter of this embodiment obtains the sampled value A(t) of the acoustic signal A, in which the noise signal N is superimposed on the voice signal S, with the A/D converter 2, and obtains the sampled value N(t) of the noise signal N with the A/D converter 4. Based on the sampled value A(t) of the acoustic signal A and the sampled value N(t) of the noise signal N, the noise-removal neural network 5 selects and removes the sampled value N(t) of the noise signal N from the sampled value A(t) of the acoustic signal A and outputs the output signal O(t) corresponding to the voice signal S(t).

Therefore, if the noise-removal neural network 5 has been trained correctly with appropriate training data, the voice signal S can be selected from the voice signal S on which the noise signal N is superimposed and output, regardless of the type of noise signal N superimposed on the voice signal S.

In the above embodiment, a three-layer perceptron-type neural network is used as the noise-removal neural network 5, but a perceptron-type neural network with four or more layers may also be used.

In the above embodiment, the first training data (the acoustic signal A in which the noise signal N is superimposed on the voice signals S of many speakers) and the second training data (the superimposed noise signal N) that are input when training the noise-removal neural network 5 are sampled at the same time. In practice, however, the noise signal input to the close-talking microphone 1 and the noise signal input to the microphone 3 differ in phase, so the input timing of the second training data may be delayed by a predetermined time relative to the input timing of the first training data.

Doing so can be expected to further improve the selection ability of the noise-removal neural network 5.

In the above embodiment, sampled values A/D-converted by the A/D converters 2 and 4 are used as the input signals to, and the output signals from, the noise-removal neural network 5. The present invention is not limited to this, however; feature parameters such as autocorrelation coefficient sequences or spectral value sequences obtained by acoustic analysis of the noise-superimposed voice signal and of the noise signal may be used instead.
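As one illustration of such a feature parameter, an autocorrelation coefficient sequence for a single analysis frame could be computed as sketched below. The patent does not specify the frame length, the number of lags, or any normalization, so the unnormalized form and the function name `autocorrelation` are assumptions.

```python
def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame of samples."""
    if max_lag >= len(frame):
        raise ValueError("max_lag must be smaller than the frame length")
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]
```

In such a variant, the sequence computed from the noise-superimposed voice signal and the sequence computed from the noise signal would take the place of the raw sampled values at the network input.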

<Effects of the Invention> As is clear from the above, the voice pass filter of the present invention comprises a first acoustic analysis section, a second acoustic analysis section, and a noise-removal neural network. The first acoustic analysis section acoustically analyzes the voice signal on which the noise signal is superimposed to generate acoustic parameters, while the second acoustic analysis section acoustically analyzes the noise signal to generate acoustic parameters. The noise-removal neural network then removes the acoustic parameters of the noise signal, output from the second acoustic analysis section, from the acoustic parameters of the noise-superimposed voice signal, output from the first acoustic analysis section, in accordance with the rule it has generated by itself through learning, and outputs the result. Consequently, only the noise signal can be selected and removed from the voice signal on which it is superimposed, regardless of the type of noise signal superimposed on the voice signal.

[Brief explanation of drawings]

FIG. 1 is a block diagram of an embodiment of the voice pass filter of the present invention, and FIG. 2 is a schematic diagram of the structure of the noise-removal neural network in FIG. 1.

1: close-talking microphone; 2, 4: A/D converters; 3: microphone; 5: noise-removal neural network; 6: input layer; 7: intermediate layer; 8: output layer; 20, 21: delay sections.

Claims

(1) A voice pass filter characterized by comprising: a first acoustic analysis section that acoustically analyzes a voice signal on which a noise signal is superimposed to generate acoustic parameters; a second acoustic analysis section that acoustically analyzes the noise signal to generate acoustic parameters; and a noise-removal neural network that receives the acoustic parameters of the noise-superimposed voice signal output from the first acoustic analysis section and the acoustic parameters of the noise signal output from the second acoustic analysis section, generates by itself, through learning, a rule for selecting and removing the acoustic parameters of the noise signal from the acoustic parameters of the noise-superimposed voice signal, and, in accordance with that rule, selects and removes the acoustic parameters of the noise signal from the acoustic parameters of the input noise-superimposed voice signal.