JP3394998B2

JP3394998B2 - Noise removal device for voice input system

Info

Publication number: JP3394998B2
Application number: JP35421992A
Authority: JP
Inventors: 晴剛安田; 泉木下
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1992-12-15
Filing date: 1992-12-15
Publication date: 2003-04-07
Anticipated expiration: 2018-04-07
Also published as: JPH06186997A

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a noise removal device for a voice input system.

[0002]

[Prior Art] Devices have conventionally been proposed that use two microphones, one for voice input and one for reference input, and remove noise by subtracting the feature amount of the sound input through the reference input microphone (mainly noise) from the feature amount of the sound input at the same time through the voice input microphone (voice accompanied by noise).

[0003] As shown in FIG. 10, a noise removal device comprising an adaptive filter 100 and a voice section detection unit 101 is also known (see Japanese Patent Laid-Open No. 2-278298, or Proceedings of the Acoustical Society of Japan, September 1990 (Heisei 2), 1-8-5). This device obtains the power spectra X(f) and N(f) of the sounds input from two microphones 104 and 105, for voice input and reference input respectively, and estimates from them the unknown voice spectrum S(f). In doing so, the adaptive filter 100 generates a correction coefficient k(f), a coefficient that corrects the gain difference between the two inputs, and noise is removed by subtracting the product of k(f) and N(f) from X(f). The correction coefficient k(f) is not fixed; it is updated according to the successively changing levels of the two feature amounts.

[0004] When the voice input system is a personal computer, a workstation, or the like, the voice recognition board 102 is inserted into an expansion bus of the computer, as shown in FIG. 11, and the software is supplied on a floppy disk. In this case, the two microphones 104 and 105 are led out of the housing of the personal computer by cords 106 and 107, respectively. The voice input microphone 104 is placed in front of the user, and the reference input microphone 105 is placed at a position where it receives noise equally with the microphone 104.

[0005]

[Problems to Be Solved by the Invention] However, it is difficult to place the reference input microphone 105 at a position where it receives noise equally with the microphone 104, and doing so is a very troublesome task for the user. Fixing the reference input microphone 105 to the back of the computer would spare the user the effort of placing it, but the noise of the heat dissipation fan motor would then enter the microphone 105 excessively and lower the accuracy of voice recognition. That is, when the reference microphone 105 of the conventional noise removal device is provided near the heat dissipation fan motor, the fan motor sound enters the microphone 105 excessively and hardly enters the microphone 104, producing a marked imbalance in noise input level between the two microphones 104 and 105. Even with the gain adjustment described above, too much would be subtracted from the power spectrum of the voice, and the feature pattern of the voice could become inaccurate.

[0006] The heat dissipation fan motor is a constant source of noise for as long as the personal computer is operating, and because its sound intensity and feature amount are both constant, it is a stationary, fixed-pattern noise source. By contrast, a telephone or a facsimile machine, for example, rings abruptly and emits a sound of fixed frequency at a fixed period, so it can be called a sudden, fixed-pattern noise source.

[0007] In view of the above circumstances, an object of the present invention is to provide a noise removal device for a voice input system that can reliably remove noise from a stationary, fixed-pattern noise source, in particular where there is a marked imbalance between the amounts input to the two microphones and where such noise, if input to the voice microphone, would severely degrade voice recognition.

[0008]

[Means for Solving the Problems] To solve the above problems, the noise removal device for a voice input system of the present invention comprises: voice input means; a voice feature extraction unit that extracts the feature amount of the input voice; a voice recognition dictionary in which the feature amounts of a plurality of voices are registered in advance; reference sound input means separate from the voice input; a reference sound feature extraction unit that extracts the feature amount of the input reference sound; a correction coefficient generation unit that compares the two feature amounts to generate a correction coefficient; a correction unit that corrects the voice feature amount taking the correction coefficient and the reference sound feature amount into account; and a recognition unit that recognizes voice by calculating the similarity between the corrected voice feature amount and the feature amount of each voice in the voice recognition dictionary. The device is characterized by further comprising a pre-stage correction unit that obtains the average amounts of the two feature amounts, corrects the reference sound feature amount based on the difference between the two average amounts, and inputs the corrected reference sound feature amount to the correction coefficient generation unit.

[0009] The noise removal device for a voice input system according to claim 1 is further characterized by comprising means for storing the difference amount generated by the pre-stage correction unit.

[0010] In the noise removal device for a voice input system according to claim 1 or 2, the reference sound input means is provided on a circuit board constituting the noise removal device.

[0011]

[0012]

[0013]

[Operation] With the above configuration, even if noise enters the reference sound input means excessively and hardly enters the voice input means, so that the noise input levels of the two means differ markedly, the imbalance is eliminated by the pre-stage correction unit, and the reference sound feature amount with the imbalance removed is input to the correction coefficient generation unit. The correction coefficient generation unit therefore generates a correction coefficient only for gain adjustment against ordinary noise imbalance; its error is small, which contributes to more accurate voice recognition in the recognition unit.

[0014] In addition, the pre-stage correction unit can be activated, for example, when the personal computer is set up, and the difference amount generated while it is active can be stored; at voice recognition time, the reference sound feature amount corrected on the basis of that stored difference amount can then be input to the correction coefficient generation unit. In other words, stationary, fixed-pattern noise can be stored in advance and used for the correction.

[0015] Furthermore, because the reference sound input means is provided on the circuit board constituting the noise removal device, the user is spared the effort of placing the reference sound input means. Even if a heat dissipation fan motor is present near the circuit board, the marked imbalance in noise input is eliminated as described above, so no problem arises.

[0016]

[0017]

[Embodiments]

(Embodiment 1) The present invention is described below with reference to the drawings showing its embodiment. FIG. 1 is a block diagram showing the noise removal device of the voice input system. In the figure, 1 is a voice input microphone, 2 is a reference sound input microphone, 3 and 4 are feature extraction units, 5 and 6 are changeover switches, 7 and 8 are accumulation registers, 9 and 10 are averaging registers, 11 and 12 are subtractors, 13 is an adaptive filter (correction coefficient generation unit), 14 is a subtractor, 15 is a voice section detection unit, and 16 is a recognition unit. This noise removal device is incorporated in a voice recognition board 20 as shown in FIG. 2.

[0018] As shown in FIGS. 2 to 4, the voice input microphone 1 is led out of the housing of the personal computer 21 through a cord 1a, whereas the reference sound input microphone 2 is either fixed to the bracket of the voice recognition board 20 or attached detachably so that the microphone position can be changed. When the board 20 is installed in the personal computer 21, the microphone 2 is located on the rear side of the personal computer 21. A heat dissipation fan motor 22 is installed on the rear side of the personal computer 21.

[0019] The feature extraction units 3 and 4 extract voice features based on, for example, the well-known frequency spectrum method, and each comprises a plurality of band-pass filters with mutually different pass bands, an A/D converter, and the like.

[0020] The changeover switches 5 and 6 select whether the feature amounts extracted by the feature extraction units 3 and 4 are output to the terminal a side or to the terminal b side. A control unit (not shown) may operate the switches so that, for example, the terminal b side is selected when the personal computer 21 is set up and the switches change automatically to the terminal a side a short time later, or so that the terminal b side is selected each time the personal computer 21 is powered on and the terminal a side is selected a short time later. While the switches are connected to the terminal b side, it is desirable to avoid making any sound so that, as far as possible, only the sound of the heat dissipation fan motor 22 is input.

[0021] The accumulation registers 7 and 8 accumulate the level of each frequency component of the feature amounts extracted by the feature extraction units 3 and 4, and the averaging registers 9 and 10 average the accumulated level of each frequency component. The accumulation registers 7 and 8 and the averaging registers 9 and 10 thus yield the average amounts of the feature amounts. The averaging registers 9 and 10 can hold these average amounts for as long as power is supplied to them.
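The accumulate-then-average step can be sketched in software as follows. This is only an illustration under stated assumptions: the function name `average_feature` and the representation of a feature amount as a list of per-band levels (one list per frame) are hypothetical and do not come from the patent, which describes hardware registers.

```python
from typing import List, Sequence


def average_feature(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-band levels over frames (hypothetical helper).

    `frames` holds one list of band levels per analysis frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n_bands = len(frames[0])
    # Plays the role of the accumulation registers 7 and 8.
    accumulated = [0.0] * n_bands
    for frame in frames:
        if len(frame) != n_bands:
            raise ValueError("all frames must have the same number of bands")
        for band, level in enumerate(frame):
            accumulated[band] += level
    # Plays the role of the averaging registers 9 and 10.
    return [total / len(frames) for total in accumulated]
```

For example, two frames `[1.0, 2.0]` and `[3.0, 4.0]` average to `[2.0, 3.0]`.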

[0022] The subtractor 11 subtracts the average amount of the voice feature amount from the average amount of the reference sound feature amount to generate the difference amount F(f) between the two averages. That is, it generates the difference between the loud sound of the heat dissipation fan motor 22 input from the reference microphone 2 and the comparatively quiet sound of the same fan motor 22 input from the voice microphone 1.
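As a rough software analogue of the subtractor 11, the difference amount F(f) can be computed band by band. The function name `difference_amount` and the list-of-levels representation are assumptions made for this sketch; the numeric values are invented for illustration only.

```python
from typing import List, Sequence


def difference_amount(avg_reference: Sequence[float],
                      avg_voice: Sequence[float]) -> List[float]:
    """F(f) = average reference level - average voice-microphone level."""
    if len(avg_reference) != len(avg_voice):
        raise ValueError("both averages must cover the same bands")
    return [ref - voice for ref, voice in zip(avg_reference, avg_voice)]


# Illustrative values: the fan is loud at the reference microphone
# and quiet at the voice microphone.
F = difference_amount([40.0, 30.0], [4.0, 3.0])
```

With these made-up averages, `F` is `[36.0, 27.0]`.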

[0023] The subtractor 12 subtracts the difference amount generated by the subtractor 11 from the reference sound feature amount generated by the feature extraction unit 4. As a result, even if the noise input levels of the two microphones 1 and 2 differ markedly, the difference is eliminated. That is, the reference sound feature amount is corrected according to the level difference before it is input to the adaptive filter 13. This correction could be left to the adaptive filter 13, but the adaptive filter 13 updates its gain-adjustment correction coefficient according to the successively changing levels of the two feature amounts, and no updating is needed for stationary, fixed-pattern noise such as that of the heat dissipation fan motor 22. Removing such noise ahead of the adaptive filter 13 therefore improves the accuracy of the correction coefficient.
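The pre-stage correction performed by the subtractor 12 can be sketched as a per-band subtraction applied to each reference frame before it reaches the adaptive filter. The name `pre_stage_correct` is hypothetical, and the sketch deliberately does not clamp negative results, because the text does not say how such values are handled.

```python
from typing import List, Sequence


def pre_stage_correct(reference_frame: Sequence[float],
                      difference: Sequence[float]) -> List[float]:
    """Subtract the stored difference amount F(f) from one reference frame."""
    if len(reference_frame) != len(difference):
        raise ValueError("frame and difference must cover the same bands")
    # No clamping is applied here; handling of negative levels is left open.
    return [n - f for n, f in zip(reference_frame, difference)]
```

For instance, a reference frame `[42.0, 31.0]` corrected with a difference amount `[36.0, 27.0]` becomes `[6.0, 4.0]`, which is close to the level the voice microphone would see for the same fan noise.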

[0024] As described above, the adaptive filter 13 updates the correction coefficient for gain adjustment according to the successively changing levels of the two feature amounts. The correction coefficient k(f) is calculated as k(f) = (X(f) + c) / (N(f) + c). When the inputs are small, the error in k(f) becomes large and the noise component cannot be estimated properly, so a constant c > 0 is introduced to prevent this.
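The formula for k(f) translates directly into code. In the sketch below, the function name `correction_coefficient` and the default value of `c` are assumptions; the patent only requires c > 0.

```python
from typing import List, Sequence


def correction_coefficient(x: Sequence[float], n: Sequence[float],
                           c: float = 1.0) -> List[float]:
    """k(f) = (X(f) + c) / (N(f) + c), evaluated per band."""
    if c <= 0:
        raise ValueError("c must be positive")
    if len(x) != len(n):
        raise ValueError("X and N must cover the same bands")
    return [(xf + c) / (nf + c) for xf, nf in zip(x, n)]
```

With `c = 1.0`, the inputs X = `[3.0, 0.0]` and N = `[1.0, 0.0]` give `[2.0, 1.0]`. The second band shows the purpose of c: without it the ratio would be 0/0, whereas with it the coefficient settles at 1.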

[0025] The subtractor 14 generates the feature amount S(f), which is calculated as S(f) = X(f) - k(f)·N(f).
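A minimal sketch of this subtraction, assuming the same per-band list representation as above (the function name `subtract_noise` is hypothetical):

```python
from typing import List, Sequence


def subtract_noise(x: Sequence[float], n: Sequence[float],
                   k: Sequence[float]) -> List[float]:
    """S(f) = X(f) - k(f) * N(f), evaluated per band."""
    if not (len(x) == len(n) == len(k)):
        raise ValueError("X, N and k must cover the same bands")
    return [xf - kf * nf for xf, nf, kf in zip(x, n, k)]
```

For example, X = `[10.0, 5.0]`, N = `[2.0, 4.0]` and k = `[2.0, 1.0]` yield S = `[6.0, 1.0]`.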

[0026] The voice section detection unit 15 detects voice sections and outputs a discrimination signal to the adaptive filter 13. On receiving this signal, the adaptive filter 13 generates the correction coefficient during non-voice sections. More precisely, even within a voice section, when one band contains no voice component while another band does, the noise estimate can be updated in the band without a voice component, which improves noise removal performance against time-varying (non-stationary) noise.
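The band-wise gating described here might look like the following sketch. Several things are assumptions rather than details from the patent: the class name `AdaptiveGain`, the initial coefficient of 1.0, the per-band boolean flags standing in for the discrimination signal, and the choice to overwrite k(f) directly instead of smoothing it over time.

```python
from typing import List, Sequence


class AdaptiveGain:
    """Holds k(f) and updates it only in bands without a voice component."""

    def __init__(self, n_bands: int, c: float = 1.0) -> None:
        if c <= 0:
            raise ValueError("c must be positive")
        self.c = c
        self.k = [1.0] * n_bands  # assumed starting value

    def update(self, x: Sequence[float], n: Sequence[float],
               voice_in_band: Sequence[bool]) -> List[float]:
        for band, has_voice in enumerate(voice_in_band):
            if not has_voice:
                # Only noise is present in this band, so the ratio of the
                # two inputs reflects the gain difference.
                self.k[band] = (x[band] + self.c) / (n[band] + self.c)
        return list(self.k)


gain = AdaptiveGain(n_bands=2, c=1.0)
first = gain.update([3.0, 7.0], [1.0, 3.0], [False, False])
second = gain.update([9.0, 5.0], [1.0, 1.0], [True, False])
```

In this run `first` is `[2.0, 2.0]`, and `second` is `[2.0, 3.0]`: the first band keeps its earlier coefficient because voice is present there, while the second band is refreshed.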

[0027] The recognition unit 16 has a voice recognition dictionary in which the feature amounts of a plurality of voices are registered in advance, and recognizes voice by calculating the similarity between the corrected voice feature amount and the feature amount of each voice in the dictionary.
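The patent does not specify the similarity measure, so the sketch below substitutes a simple one as an assumption: negative Euclidean distance between a single feature vector and one template per word. The function name `recognize` and the dictionary layout are likewise hypothetical.

```python
import math
from typing import Dict, Sequence


def recognize(feature: Sequence[float],
              dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary entry most similar to the corrected feature.

    Similarity is taken here as negative Euclidean distance (an assumption).
    """
    if not dictionary:
        raise ValueError("the dictionary must contain at least one entry")
    best_word = ""
    best_score = -math.inf
    for word, template in dictionary.items():
        score = -math.dist(feature, template)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

For example, `recognize([1.0, 0.0], {"yes": [1.0, 0.1], "no": [0.0, 1.0]})` selects `"yes"`.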

[0028] With the above configuration, even if noise enters the reference sound input microphone 2 excessively and hardly enters the voice input microphone 1, so that the noise input levels of the two microphones differ markedly, the imbalance is eliminated by the accumulation registers 7 and 8, the averaging registers 9 and 10, and the subtractors 11 and 12, which together constitute the pre-stage correction unit. The reference sound feature amount with the imbalance removed is input to the adaptive filter 13, which is the correction coefficient generation unit. The correction coefficient generated by the adaptive filter 13 therefore serves only for gain adjustment against ordinary noise imbalance; its error is small, and the accuracy of voice recognition in the recognition unit 16 can be increased.

[0029] Moreover, because the reference sound microphone 2 is provided on the voice recognition board 20, the user is spared the effort of placing the microphone 2. Even if the heat dissipation fan motor 22 is present near the board 20, the problems caused by a marked imbalance in noise input are eliminated by the pre-stage correction unit as described above, so no problem arises.

[0030] Stationary, fixed-pattern noise such as that of the heat dissipation fan motor 22 can also be stored in advance and used for the correction. For example, the pre-stage correction unit may be run for a short time when the personal computer is set up, and the difference amount generated during that time stored in a non-volatile memory; at voice recognition time, the reference sound feature amount is corrected on the basis of the stored difference amount, and the corrected reference sound feature amount is input to the adaptive filter 13. With a non-volatile memory the stored contents are retained even when power is cut, so the difference amount needs to be generated only once. Even if the pre-stage correction unit is implemented in software, the difference amount is generated only once, so the load on the CPU is small.
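In a software implementation, the non-volatile memory could be approximated by a file on disk. The sketch below is one possible arrangement, not the patent's: the JSON format and the function names `save_difference` and `load_difference` are assumptions chosen for illustration.

```python
import json
from pathlib import Path
from typing import List, Sequence, Union


def save_difference(path: Union[str, Path],
                    difference: Sequence[float]) -> None:
    """Persist the difference amount F(f) measured once at setup time."""
    Path(path).write_text(json.dumps(list(difference)))


def load_difference(path: Union[str, Path]) -> List[float]:
    """Reload the stored difference amount at recognition time."""
    return [float(value) for value in json.loads(Path(path).read_text())]
```

The difference amount would be saved once after the setup-time measurement and then loaded on each start, so the averaging does not have to be repeated.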

[0031] In this embodiment the reference sound microphone 2 is placed beside the heat dissipation fan motor 22, but stationary, fixed-pattern noise can likewise be removed when the reference sound microphone 2 is placed beside, for example, an air conditioner or other office automation equipment.

[0032] (Reference Example) The noise removal device for a voice input system of this reference example has a configuration suited to sudden, fixed-pattern noise. FIG. 5 is a block diagram of the device. In the figure, 31 is a voice input microphone, 32 is an amplifier, 33 is a feature extraction unit consisting of a 15-channel band-pass filter bank, 34 is a feature amount correction unit that corrects the voice feature amount, 35 is a recognition unit, 36 is a telephone, 37 is a signal source detection unit, and 38 is a signal source dictionary.

[0033] The signal source dictionary 38 holds, registered in advance, the feature amounts of sounds emitted by signal sound sources such as the telephone 36 or a facsimile machine (not shown). Specifically, for the telephone 36 it stores the frequency components f₁ to f₁₅ of the ringing tone, a noise coefficient k_n for each frequency, the ringing time t₁, and the non-ringing time t₂. The noise coefficients k_n are determined, taking the microphone position and the telephone position into account, so that the level of each frequency of the ringing tone synthesized for correction matches the level of that frequency when the actual ringing tone is input through the microphone 31.
It is determined in consideration of the microphone position and the phone position.

[0034] The signal source detection unit 37 detects whether the telephone 36, the signal sound source, has started ringing and whether the ringing has stopped. It can detect the start of ringing by, for example, detecting the ringing signal directly from the telephone, or by means of a band-pass filter matched to the frequency components of the ringing tone together with a sound detector. It can detect the end of ringing by, for example, taking an off-hook signal from the telephone 36, or by means of a judging means that decides on the basis of the ringing time t₁ and non-ringing time t₂ stored in the signal source dictionary 38 together with timer information.

[0035] When a signal sound is detected, the signal source detection unit 37 can also select the signal source dictionary matching that sound from the signal source dictionary unit 38. For example, the signal source can be identified from the frequency of the signal sound, and the dictionary selected according to the result. The unit then synthesizes a ringing tone from the noise coefficients of the selected dictionary and the H/L signal described below, and supplies this synthesized ringing tone to the correction unit 34.

[0036] FIG. 6 is a timing chart showing an example of the relationship between the voice signal and the detection signals. A(t) is the sound input from the voice input microphone 31 (it includes the ringing tone, which is noise). B(t) indicates the signal sound detection section. C(t) is a High/Low signal generated from the ringing time t₁ and the non-ringing time t₂, and is synchronized with the ringing and non-ringing sections of the actual ringing tone.
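The High/Low signal C(t) can be sketched as a periodic gate built from the ringing and non-ringing times. The sketch assumes times are expressed in whole frames and that the frame at which ringing was first detected is known; the function name `ring_gate` and its parameters are hypothetical.

```python
from typing import List


def ring_gate(n_frames: int, ring_frames: int, silent_frames: int,
              start_frame: int = 0) -> List[bool]:
    """Return True (High) for frames inside a ringing section.

    ring_frames and silent_frames stand for t1 and t2 in frame units.
    """
    period = ring_frames + silent_frames
    if ring_frames < 0 or silent_frames < 0 or period == 0:
        raise ValueError("t1 and t2 must be non-negative and not both zero")
    gate = []
    for frame in range(n_frames):
        if frame < start_frame:
            gate.append(False)  # ringing has not been detected yet
        else:
            gate.append((frame - start_frame) % period < ring_frames)
    return gate
```

For example, `ring_gate(7, 2, 1, start_frame=1)` gives `[False, True, True, False, True, True, False]`: two ringing frames followed by one silent frame, repeating from the detection frame onward.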

[0037] FIG. 7 shows the feature amount Tv(t) of A(t) frame by frame, and FIG. 8 shows the feature amount Tn(t) of the synthesized ringing tone frame by frame. The values "0", "12", "35", "20", and "6" in the figure are the noise coefficients k_n for the respective frequencies (channels). Each value matches the corresponding value in FIG. 7 exactly, but this is only for convenience of explanation; in practice some deviation occurs. C in the figure marks the ringing sections of the signal sound as H and the non-ringing sections as L, and corresponds to C(t) above. In other words, the feature amount read from the signal source dictionary 38 is output to the correction unit 34 only during the H sections.

[0038] FIG. 9 shows, frame by frame, the feature amount Tx(t) of the voice signal after correction by the correction unit 34. In the figure the values "12", "35", "20", and "6" at the respective frequencies, which are the feature amounts of the ringing tone, have been removed, showing that the correction was performed accurately.

[0039] With the above configuration, even sudden, fixed-pattern noise such as a telephone ringing tone can be reliably removed, which improves the accuracy of voice recognition and also makes it possible to create a voice dictionary unaffected by noise. In addition, compared with a two-microphone structure, that is, one having a feature extraction unit for the reference sound, cost can be reduced because that feature extraction unit is unnecessary. The above configuration can of course also be added to the two-microphone structure shown in FIG. 1.

[0040] If the feature extraction unit 4, the accumulation registers 7 and 8, the averaging registers 9 and 10, and the subtractor 11 of FIG. 1 are provided, the noise coefficients stored in the signal source dictionary 38 can be changed.

[0041]

[Effects of the Invention] As described above, the present invention can increase the accuracy of voice recognition even for a stationary, fixed-pattern noise source, and in particular even when there is a marked imbalance between the input levels at the two microphones. The reference microphone can therefore be fixed near the heat dissipation fan motor, which spares the user the effort of deciding where to place it.

[Brief description of drawings]

FIG. 1 is a block diagram showing the noise removal device of the voice input system of the present invention.

FIG. 2 is a plan view showing a board on which the noise removal device of the voice input system of the present invention is mounted.

FIG. 3 is a rear view of a personal computer that is the voice input system of the present invention.

FIG. 4 is a perspective view showing how the board of FIG. 2 is installed in a personal computer.

FIG. 5 is a block diagram showing the noise removal device of the voice input system of a reference example of the present invention.

FIG. 6 is a timing chart showing an example of the relationship between the voice signal and the detection signals in the reference example of the present invention.

FIG. 7 is an explanatory diagram showing, frame by frame, the feature amount of the sound input from the voice input microphone (voice and noise mixed together) in the reference example of the present invention.

FIG. 8 is an explanatory diagram showing, frame by frame, the feature amount of the synthesized ringing tone in the reference example of the present invention.

FIG. 9 is an explanatory diagram showing, frame by frame, the corrected feature amount in the reference example of the present invention.

FIG. 10 is a block diagram showing a conventional noise removal device for a voice input system.

FIG. 11 is a plan view showing a board on which the conventional noise removal device for a voice input system is mounted.

[Explanation of symbols]

1 Voice input microphone
2 Reference sound input microphone
3, 4 Feature extraction units
5, 6 Changeover switches
7, 8 Accumulation registers
9, 10 Averaging registers
11, 12 Subtractors
13 Adaptive filter
14 Subtractor
15 Voice section detection unit
16 Recognition unit
31 Voice input microphone
33 Feature extraction unit
34 Feature amount correction unit
35 Recognition unit
36 Telephone
37 Signal source detection unit
38 Signal source dictionary

Continuation of the front page

(56) References
JP-A-58-62700 (JP, A)
JP-A-3-269498 (JP, A)
Japanese Patent No. 2836271 (JP, B2)
Japanese Patent No. 3251980 (JP, B2)
K. Krosghel, K. Linhard, Combined methods for adaptive noise cancellation, Signal Processing IV: Theories and Applications, 1988, Vol. 1, pp. 411-414

(58) Fields searched (Int. Cl.7, DB name)
G10L 15/20
G10L 21/02
JICST file (JOIS)

Claims

(57) [Claims]

1. A noise removal device for a voice input system, comprising: voice input means; a voice feature extraction unit that extracts a feature amount of an input voice; a voice recognition dictionary in which feature amounts of a plurality of voices are registered in advance; reference sound input means separate from the voice input; a reference sound feature extraction unit that extracts a feature amount of an input reference sound; a correction coefficient generation unit that compares the two feature amounts to generate a correction coefficient; a correction unit that corrects the voice feature amount taking the correction coefficient and the reference sound feature amount into account; and a recognition unit that recognizes voice by calculating the similarity between the corrected voice feature amount and the feature amount of each voice in the voice recognition dictionary, wherein the device further comprises a pre-stage correction unit that obtains average amounts of the two feature amounts, corrects the reference sound feature amount based on the difference between the two average amounts, and inputs the corrected reference sound feature amount to the correction coefficient generation unit.

2. The noise removal device for a voice input system according to claim 1, further comprising means for storing the difference amount generated by the pre-stage correction unit.

3. The noise removal device for a voice input system according to claim 1, wherein the reference sound input means is provided on a circuit board constituting the noise removal device.