JPH1185185A

JPH1185185A - Voice recognition system and storage medium with voice recognition control program

Info

Publication number: JPH1185185A
Application number: JP9241083A
Authority: JP
Inventors: Kazuhiko Shudo; 和彦首藤
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1997-09-05
Filing date: 1997-09-05
Publication date: 1999-03-30
Anticipated expiration: 2017-09-05
Also published as: JP3510458B2

Abstract

PROBLEM TO BE SOLVED: To improve the speech recognition rate with a small amount of processing in an acoustic environment containing a plurality of different kinds of noise signals. SOLUTION: A cancel circuit 31 removes a digital signal 211, obtained by capturing the car audio signal in a noise input circuit 21, from a speech signal 101 captured by a voice input circuit 10; the result is recognized in a recognition circuit 40 by the hidden Markov model (HMM) method, yielding a recognized word S1 and a recognition probability P1. Similarly, a cancel circuit 32 removes a digital signal 221, the noise from outside the car captured by a noise input circuit 22, from the signal 101; the result is recognized in the circuit 40 by the HMM method, yielding a recognized word S2 and a recognition probability P2. A cancel circuit 33 likewise removes a digital signal 231, the engine sound captured by a noise input circuit 23, from the signal 101; the result is recognized in the circuit 40 by the HMM method, yielding a recognized word S3 and a recognition probability P3. A comparison circuit 50 then selects the recognized word with the highest of the recognition probabilities P1 to P3.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition system and to a recording medium on which a speech recognition control program is recorded. For example, it relates to a system that, when noise signals from a plurality of noise sources are mixed into a main acoustic signal, removes the noise signals from the main acoustic signal and then performs speech recognition.

[0002]

[Prior Art] In recent years, attempts have been made to control the operation of products such as car navigation systems by the user's voice, using speech recognition. However, the inside of an automobile is a very noisy environment for a speech recognition device because of sound from the car audio system and noise from outside the vehicle. Applying a speech recognition device there as it is yields a recognition rate too low to be practical.

[0003] Conventionally, therefore, an adaptive noise canceller has been used to reduce noise signals, such as car audio signals, superimposed on the acoustic signal captured with a microphone or the like, and speech recognition has then been performed on the result.

[0004]

[Problem to Be Solved by the Invention] However, the noise is not limited to car audio signals. There are many different kinds of noise signals, such as noise from outside the vehicle, wind noise, engine sound, and tire running noise. Handling noise made up of several kinds of noise requires an adaptive noise canceller with many inputs. A multi-input adaptive noise canceller, however, has been regarded as impractical because its algorithm is complex, its computational load is enormous, and its processing time is long.

[0005] For these reasons, there is a demand for a speech recognition system, and for a recording medium on which a speech recognition control program is recorded, that can improve speech recognition accuracy with a realistic (small) amount of processing in an acoustic environment containing a plurality of different kinds of noise signals, such as noise from outside the automobile and acoustic signals from the car audio system.

[0006]

[Means for Solving the Problem] According to the present invention, a speech recognition system that includes main acoustic signal capturing means, which captures the main sound and outputs a main acoustic signal and which may also capture noise from at least two noise sources, includes at least two noise signal capturing means and at least two adaptive noise canceling means for removing the noise signals from the at least two noise sources. The system comprises: first noise signal capturing means for capturing noise from one noise source and outputting a noise capture signal; second noise signal capturing means for capturing noise from another noise source and outputting a noise capture signal; first adaptive noise canceling means for removing, from the main acoustic signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main acoustic signal; second adaptive noise canceling means for removing, from the main acoustic signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main acoustic signal; and speech recognition means that recognizes the output main acoustic signal of the first adaptive noise canceling means using a statistical acoustic model to obtain a recognized word and the likelihood of that recognition, likewise recognizes the output main acoustic signal of the second adaptive noise canceling means using the statistical acoustic model to obtain a recognized word and the likelihood of that recognition, and outputs, from these results, the recognized word with the higher recognition likelihood as the recognition result.

[0007] With this configuration, the noise capture signal obtained by capturing noise from one noise source can be removed from the main acoustic signal by the first adaptive noise canceling means, and the noise capture signal obtained by capturing noise from another noise source can be removed by the second adaptive noise canceling means. Speech recognition is performed on each of the resulting main acoustic signals, and the word with the higher recognition likelihood is taken as the recognition result. Consequently, even when noise signals from a plurality of noise sources are contained in the main acoustic signal, speech recognition can be performed with a simple configuration and without complicated processing. Because recognition uses a statistical acoustic model and the recognized word with the higher recognition likelihood is selected, recognition accuracy is also improved.

[0008] The configuration above uses two adaptive noise canceling means because it assumes at least two kinds of noise signals from noise sources. When there are three or more kinds, providing three or more noise signal capturing means and three or more adaptive noise canceling means allows speech recognition to be performed by the same processing.

[0009] According to the present invention, there is also provided a speech recognition system that includes main acoustic signal capturing means, which captures the main sound and outputs a main acoustic signal and which may also capture noise from at least two noise sources, and that includes at least two noise signal capturing means and at least two adaptive noise canceling means for removing the noise signals from the at least two noise sources. The system comprises: first noise signal capturing means for capturing noise from one noise source and outputting a noise capture signal; second noise signal capturing means for capturing noise from another noise source and outputting a noise capture signal; first adaptive noise canceling means for removing, from the main acoustic signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main acoustic signal; second adaptive noise canceling means for removing, from the main acoustic signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main acoustic signal; and speech recognition means that obtains the signal quality of the output main acoustic signal of the first adaptive noise canceling means and of the output main acoustic signal of the second adaptive noise canceling means, selects the output main acoustic signal with the better signal quality, and performs speech recognition on the selected signal.

[0010] With this configuration, the signal quality of the output main acoustic signals of the first and second adaptive noise canceling means can be judged from, for example, the signal-to-noise ratio or the amount of signal distortion. Because speech recognition is performed only on the output main acoustic signal with good signal quality, recognition accuracy is improved, and because the amount of recognition computation is reduced, the time needed for recognition is shortened.

[0011] According to the present invention, there is also provided a recording medium on which a speech recognition control program is recorded. The program causes a computer to remove noise capture signals from a main acoustic signal and to perform speech recognition on the result. The main acoustic signal is captured by main acoustic signal capturing means that captures the main sound, outputs the main acoustic signal, and may also capture noise from at least two noise sources. The noise capture signals are captured by at least two noise signal capturing means, namely first noise signal capturing means that captures noise from one noise source and outputs a noise capture signal, and second noise signal capturing means that captures noise from another noise source and outputs a noise capture signal. The program includes at least two adaptive noise canceling steps for removing the at least two noise capture signals from the main acoustic signal, and comprises: a first adaptive noise canceling step of removing, from the main acoustic signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main acoustic signal; a second adaptive noise canceling step of removing, from the main acoustic signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main acoustic signal; and a speech recognition step of recognizing the output main acoustic signal of the first adaptive noise canceling step with a statistical acoustic model to obtain a recognized word and the likelihood of that recognition, likewise recognizing the output main acoustic signal of the second adaptive noise canceling step with the statistical acoustic model to obtain a recognized word and the likelihood of that recognition, and outputting, from these results, the recognized word with the higher recognition likelihood as the recognition result.

[0012] With this configuration, speech recognition can be performed with a simple configuration and without complicated program processing, and because recognition uses a statistical acoustic model and the recognized word with the higher recognition likelihood is selected, recognition accuracy is also improved. Storing the program in ROM, flash memory, or a magnetic disk device also makes it possible to reduce the size of the speech recognition system.

[0013]

[Embodiments of the Invention] Preferred embodiments of the present invention are described below with reference to the drawings. In this embodiment, a speech recognition system combined with adaptive noise cancellers to reduce the noise signals contained in a speech signal receives a plurality of noise signals and has an adaptive noise canceller unit corresponding to each noise input unit. It also has means by which these adaptive noise canceller units and a speech recognition processing unit, based on for example the hidden Markov model (HMM) method, a statistical acoustic model, cooperate to achieve a high recognition rate.

[0014] As one means by which the adaptive noise canceller units and the speech recognition processing unit cooperate to achieve a high recognition rate, speech recognition is performed separately on the output of each adaptive noise canceller unit, and the recognition result with the highest likelihood is selected from the resulting recognition results.

[0015] As another such means, a signal-to-noise ratio (S/N ratio) is obtained for the output signal of each adaptive noise canceller unit by a heuristic (simplified) method, the adaptive noise canceller unit with the largest S/N ratio is selected, speech recognition is performed on the output signal of that unit, and its result is taken as the recognition result.

[0016] As the heuristic method of obtaining the S/N ratio of an adaptive noise canceller unit's output signal, the output signal is divided in time into utterance intervals and non-utterance intervals, and the average amplitude is obtained for each kind of interval. The ratio AS/AN of the average amplitude AS in the utterance intervals to the average amplitude AN in the non-utterance intervals is taken as the S/N ratio.
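
The amplitude-ratio estimate described in the preceding paragraph, and the channel selection that relies on it, might be sketched as follows. This is only an illustration under stated assumptions: the function names `snr_estimate` and `select_channel` are hypothetical, and the per-sample voicing mask is assumed to be supplied from elsewhere, since the text does not say how utterance intervals are detected.

```python
def snr_estimate(samples, voiced):
    """Return AS/AN for one canceller output.

    samples: sequence of numeric samples.
    voiced:  sequence of bools, True inside utterance intervals
             (assumed given; the detection method is not specified here).
    """
    speech = [abs(x) for x, v in zip(samples, voiced) if v]
    noise = [abs(x) for x, v in zip(samples, voiced) if not v]
    if not speech or not noise:
        raise ValueError("need at least one voiced and one unvoiced sample")
    a_s = sum(speech) / len(speech)   # average amplitude AS
    a_n = sum(noise) / len(noise)     # average amplitude AN
    if a_n == 0:
        return float("inf")           # no residual noise at all
    return a_s / a_n


def select_channel(outputs, voiced):
    """Return the index of the canceller output with the largest AS/AN."""
    ratios = [snr_estimate(out, voiced) for out in outputs]
    return max(range(len(outputs)), key=lambda i: ratios[i])
```

Only the selected output then needs to be passed to the recognizer, which is where the saving in recognition computation comes from.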

[0017] FIG. 1 is a functional block diagram of a speech recognition system 60 installed in an automobile. In FIG. 1, the speech recognition system 60 includes: a speech signal input circuit 10 that captures the speech to be recognized, converts the captured speech signal into a digital signal 101, and supplies it to adaptive noise canceling circuits 31 to 33; a noise signal input circuit 21, which is a circuit for capturing a specific noise signal used to remove the noise mixed into the captured speech signal 101, and which captures the car audio signal inside the automobile, converts it into a digital signal 211, and supplies it to the adaptive noise canceling circuit 31; a noise signal input circuit 22 that captures noise from outside the automobile, for example with a microphone, converts the captured exterior noise signal into a digital signal 221, and supplies it to the adaptive noise canceling circuit 32; and a noise signal input circuit 23 that captures the engine sound of the automobile, converts the captured engine sound signal into a digital signal 231, and supplies it to the adaptive noise canceling circuit 33.

[0018] The speech recognition system 60 further includes: a voiced/silent detection control circuit 30 that detects, from the captured speech signal 101 supplied by the speech signal input circuit 10, whether the current interval is voiced or silent and, in response, issues during silent intervals a coefficient update command signal 301 that causes the filter coefficients (weighting coefficients) of the adaptive noise canceling circuits 31, 32, and 33 to be updated; the adaptive noise canceling circuit 31, which uses a digital filter to remove from the captured speech signal 101 the digital signal 211 obtained by capturing the car audio signal and supplies the resulting speech signal 311 to a speech recognition circuit 40; the adaptive noise canceling circuit 32, which uses a digital filter to remove from the captured speech signal 101 the digital signal 221 of the captured exterior noise signal and supplies the resulting speech signal 321 to the speech recognition circuit 40; and the adaptive noise canceling circuit 33, which uses a digital filter to remove from the captured speech signal 101 the digital signal 231 of the captured engine sound signal and supplies the resulting speech signal 331 to the speech recognition circuit 40.

[0019] The speech recognition system 60 further includes the speech recognition circuit 40 and a speech recognition probability comparison circuit 50. The speech recognition circuit 40 performs speech recognition by the hidden Markov model method separately on the noise-reduced speech signals 311, 321, and 331, obtains a recognition result for each together with its likelihood expressed as a probability, and supplies the speech recognition probability 41 for the speech signal 311, the speech recognition probability 42 for the speech signal 321, and the speech recognition probability 43 for the speech signal 331 to the speech recognition probability comparison circuit 50. The speech recognition probability comparison circuit 50 outputs, as a recognition result 51, the word having the highest of the speech recognition probabilities 41, 42, and 43.

[0020] The noise signal input circuit 21 may instead be configured to take in the line-output electrical signal of the car audio system directly as the sound output by that system, and to supply it to the adaptive noise canceling circuit 31 as the digital signal 211 of the captured car audio signal.

[0021] FIG. 2 is a diagram for explaining the operation of the speech recognition system 60. First, the adaptive noise canceling circuit 31 removes, from the speech signal 101 captured by the speech signal input circuit 10, the digital signal 211 obtained by capturing the car audio signal in the noise signal input circuit 21 (step S10). The resulting speech signal 311 is recognized by the speech recognition circuit 40 using the hidden Markov model method (step S40), yielding a speech recognition symbol (recognized word) S1 and a speech recognition probability P1.

[0022] Likewise, the adaptive noise canceling circuit 32 removes, from the speech signal 101 captured by the speech signal input circuit 10, the digital signal 221 of the exterior noise signal captured by the noise signal input circuit 22 (step S20). The resulting speech signal 321 is recognized by the speech recognition circuit 40 using the hidden Markov model method (step S50), yielding a speech recognition symbol S2 and a speech recognition probability P2. Furthermore, the adaptive noise canceling circuit 33 removes, from the speech signal 101 captured by the speech signal input circuit 10, the digital signal 231 of the engine sound signal captured by the noise signal input circuit 23 (step S30). The resulting speech signal 331 is recognized by the speech recognition circuit 40 using the hidden Markov model method (step S60), yielding a speech recognition symbol S3 and a speech recognition probability P3.

[0023] Next, the speech recognition probability comparison circuit 50 selects and outputs, from among the speech recognition symbols S1, S2, and S3, the one whose speech recognition probability P1, P2, or P3 is highest. In a car navigation system, the selected speech recognition symbol is used as a command to control the system.
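
The comparison step amounts to taking the maximum over the per-channel results. A minimal sketch, assuming each channel's result is represented as a (symbol, probability) pair; the function name `select_recognition` and the example words used with it are hypothetical:

```python
def select_recognition(results):
    """Pick the (symbol, probability) pair with the highest probability.

    results: list of (symbol, probability) pairs, one per canceller output,
             e.g. [(S1, P1), (S2, P2), (S3, P3)].
    """
    if not results:
        raise ValueError("no recognition results to compare")
    return max(results, key=lambda pair: pair[1])
```

For example, `select_recognition([("radio", 0.2), ("navi", 0.7), ("radio", 0.4)])` returns `("navi", 0.7)`. When the recognizer reports log-probabilities rather than probabilities, the same comparison applies unchanged, because the logarithm preserves ordering.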

[0024] To carry out the operation of the speech recognition system 60 shown in FIG. 2 in practice, the processing of the adaptive noise canceling circuits 31 to 33, the speech recognition circuit 40, the speech recognition probability comparison circuit 50, and so on can be implemented as program processing. To implement this processing in a program and still build it compactly into a car navigation system, the program is preferably stored in flash memory or ROM. It may also be stored on a magnetic disk device for backup.

[0025] FIG. 3 is a diagram of an example configuration of the adaptive noise canceling circuits 31, 32, and 33, each of which can be realized with the same circuit configuration. In FIG. 3, each adaptive noise canceling circuit consists of an adaptive digital filter circuit 312 and a subtraction circuit 313. When the adaptive digital filter circuit 312 receives one of the digital signal 211 of the captured car audio signal, the digital signal 221 of the exterior noise signal, or the digital signal 231 of the engine sound signal, it generates pseudo noise 3121 for removing the noise contained in the speech signal 101 captured by the speech signal input circuit 10, and supplies it to the subtraction circuit 313.

[0026] The subtraction circuit 313 subtracts the pseudo noise 3121 obtained by the adaptive digital filter circuit 312 from the speech signal 101 captured by the speech signal input circuit 10. It outputs the speech signal 311 from which the car audio signal has been removed, the speech signal 321 from which the digital signal 221 of the exterior noise signal has been removed, or the speech signal 331 from which the digital signal 231 of the engine sound signal has been removed. These speech signals 311, 321, and 331 are fed back to the adaptive digital filter circuit 312, which updates its filter coefficients so as to further reduce the residual noise component.

[0027] FIG. 4 is a diagram of an example configuration of the adaptive digital filter circuit 312 in the adaptive noise canceling circuits 31, 32, and 33 of FIG. 3. In FIG. 4, the adaptive digital filter circuit 312 of each adaptive noise canceling circuit consists of: delay circuits 3122 to 3125 that delay the digital signal 211 of the captured car audio signal, the digital signal 221 of the exterior noise signal, or the digital signal 231 of the engine sound signal; a filter coefficient update circuit 3130 that updates the filter coefficients based on the speech signal 311, 321, or 331; multipliers 3126 to 3129 that multiply the delayed signals by the filter coefficients; and an adder 3131 that sums the multiplication results.

[0028] While the coefficient update command signal 301 is supplied from the voiced/silent detection control circuit 30, the filter coefficient update circuit 3130 updates the coefficients so as to reduce the noise component remaining in the speech signal 311 (car audio signal removed), the speech signal 321 (digital signal 221 of the exterior noise signal removed), or the speech signal 331 (digital signal 231 of the engine sound signal removed), and supplies them to the multipliers 3126, 3127, 3128, and 3129. In voiced intervals, coefficient updating is stopped. The digital signal 211, 221, or 231 is passed through the delay circuits 3122 to 3125, the delayed signals are multiplied by the filter coefficients in the multipliers 3126 to 3129, the products are summed by the adder 3131, and the sum 3121 is output as the pseudo noise signal.
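
The structure of FIGS. 3 and 4 (a delay line, per-tap multipliers, an adder producing the pseudo noise, a subtractor, and a coefficient update that runs only while the update command is active) can be sketched in software as below. The text does not name the coefficient update rule, so this sketch assumes a normalized LMS update; the function name `cancel_noise`, the step size `mu`, the regularizer `eps`, and the choice to include the undelayed reference sample as the first tap are all illustrative assumptions.

```python
def cancel_noise(primary, reference, update, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `primary`.

    primary:   samples of the captured speech signal (speech plus noise).
    reference: samples from one noise signal input circuit.
    update:    bools, True where coefficient updating is commanded
               (silent intervals), False in voiced intervals.
    Returns the list of noise-reduced output samples.
    """
    w = [0.0] * taps      # filter coefficients
    line = [0.0] * taps   # delay line, newest sample first
    out = []
    for d, x, upd in zip(primary, reference, update):
        line = [x] + line[:-1]
        pseudo_noise = sum(wi * xi for wi, xi in zip(w, line))  # adder output
        e = d - pseudo_noise                                    # subtractor output
        if upd:
            # Assumed normalized-LMS step: move coefficients to shrink the residual.
            norm = eps + sum(xi * xi for xi in line)
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, line)]
        out.append(e)
    return out
```

Freezing the coefficients during voiced intervals keeps the filter from adapting toward the speech itself, which would otherwise be treated as residual noise to be removed.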

[0029] Although FIG. 4 shows a non-recursive digital filter, a recursive digital filter may be used instead. A multi-stage combination of recursive and non-recursive digital filters may also be used.

[0030] FIG. 5 is a diagram of an example configuration of the speech recognition circuit 40. In FIG. 5, the speech recognition circuit 40 consists of an LPC analysis circuit 401, an HMM speech dictionary circuit 402, and a Viterbi matching circuit 403.

[0031] The LPC analysis circuit 401 divides the speech waveform of each input, such as the audio signal 311 from which the car audio signal has been removed, the audio signal 321 from which the digital signal 221 of the noise outside the vehicle has been removed, and the audio signal 331 from which the digital signal 231 of the engine sound has been removed, into short sections (frames, each about 10 msec to 30 msec long, for example) and extracts feature parameters for each frame.
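
As a rough illustration of the framing step, the sketch below cuts a sample sequence into fixed-length frames. The function name, the 20 msec default, and the use of non-overlapping frames are assumptions made for brevity; the text only states that frames are roughly 10 msec to 30 msec long.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20):
    """Cut a signal into consecutive, non-overlapping frames of frame_ms each.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```

At a hypothetical 8 kHz sampling rate, a 20 msec frame holds 160 samples, and feature extraction would then run once per frame.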

[0032] This speech analysis uses LPC (linear predictive coding) analysis, which is widely used as an efficient method suited to the characteristics of speech, and computes the LPC cepstrum from the LPC coefficients. The LPC cepstrum is the inverse Fourier transform of the logarithmic spectrum; it has properties close to human auditory characteristics and represents speech efficiently with a relatively small number of parameters. In addition, the feature parameters include the delta cepstrum, which is the change of the cepstrum over time and expresses the dynamic behavior of the spectrum, the logarithmic power, which expresses the intensity of the speech, and the delta logarithmic power, which is its change over time. The results of this LPC analysis are supplied to the Viterbi matching circuit 403.
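
One common way to obtain cepstral coefficients directly from LPC coefficients is a short recursion, sketched below. The sign convention is an assumption: the all-pole model is taken as 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the gain term is ignored. The delta feature is shown as a plain frame-to-frame difference, which is also an assumption; practical systems often use a regression over several frames.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole model 1/A(z).

    a = [a_1, ..., a_p] with A(z) = 1 + sum_k a_k z^-k (assumed convention).
    Recursion: c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k}
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= p else 0.0
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += k * c[k - 1] * a[n - k - 1]
        c.append(-a_n - acc / n)
    return c


def delta(frames):
    """Frame-to-frame difference of a feature sequence (simple delta feature)."""
    return [[cur_v - prev_v for prev_v, cur_v in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]
```

For a single coefficient a_1 = 0.5 the recursion gives -0.5, 0.125, and -0.125/3, which matches the series expansion of -ln(1 + 0.5 z^-1).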

[0033] Using the HMM speech dictionary circuit 402, the Viterbi matching circuit 403 applies the Viterbi algorithm to compare the unknown input speech with HMMs that represent phonemes and words, and obtains their similarity. That is, it obtains the speech recognition probability P that the time series C of speech feature vectors is generated by each word model M, and outputs the word corresponding to the model giving the maximum recognition probability as the speech recognition result.
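
The matching step can be illustrated with a small Viterbi scorer. To keep the sketch short it assumes discrete observation symbols (for example, vector-quantized features), whereas the text works with feature vectors; the model layout `(start, trans, emit)` and both function names are hypothetical.

```python
import math


def viterbi_log_prob(obs, start, trans, emit):
    """Log probability of the single best state path through a discrete HMM."""
    def safe_log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start)
    score = [safe_log(start[s]) + safe_log(emit[s][obs[0]])
             for s in range(n_states)]
    for o in obs[1:]:
        # Best predecessor for each state, then the emission for this symbol.
        score = [max(score[r] + safe_log(trans[r][s]) for r in range(n_states))
                 + safe_log(emit[s][o])
                 for s in range(n_states)]
    return max(score)


def recognize(obs, word_models):
    """Return (word, log probability) for the best-scoring word model."""
    scored = {word: viterbi_log_prob(obs, *model)
              for word, model in word_models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

The returned score plays the role of the recognition probability P: each word model M is scored against the same observation sequence, and the word whose model scores highest is reported.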

[0034] The embodiment above performs speech recognition by the hidden Markov model method, but speech recognition may instead be performed by dynamic programming.
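
Dynamic programming matching is commonly realized as dynamic time warping (DTW) against stored templates. The sketch below is one such realization, offered as an assumption rather than as the method of the patent; it uses scalar features per frame for brevity, and the function name is hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two scalar feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A smaller distance means a higher similarity, so the recognized word would be the template with the smallest distance to the input.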

[0035] As a simple example, consider canceling two noise signals: the car audio signal and the noise signal from outside the vehicle. In this case, two factors hinder recognition of the speech signal from the microphone: the noise signal from the car audio system and the noise signal from outside the vehicle. Of these, consider for example the case where the noise signal from the car audio system is large and the noise signal from outside the vehicle is small.

[0036] In the adaptive noise cancellation circuit 31, which removes the noise signal of the car audio system, adaptive noise cancellation is effective: it reduces the car audio noise in the microphone input and outputs an audio signal close to pure speech. As a result, the speech recognition circuit 40 that receives this signal can recognize the correct speech as the word S1, and the likelihood P1 (speech recognition rate) also takes a high value.

[0037] In the adaptive noise cancellation circuit 32, which removes the noise signal from outside the vehicle, the outside noise, which is the smaller part of the noise, can be reduced to some extent, but the larger noise component, the noise signal from the car audio system, is not reduced. The output signal of the adaptive noise cancellation circuit 32 therefore still contains the noise signal from the car audio system. Because the speech recognition circuit 40 receives the signal with the car audio noise still mixed in, it tends to output an incorrect recognized word S2, and the speech recognition probability P2 is also low.

[0038] Consequently, the speech recognition probability comparison circuit 50 selects and outputs the recognized word S1, whose speech recognition probability P1 is higher than the speech recognition probability P2.

[0039] Conversely, when the noise signal from outside the vehicle is large and the noise signal from the car audio system is small, the recognized word S2, whose speech recognition probability P2 is higher than P1, is selected and output. In this way, selecting the recognition result for the adaptive noise cancellation circuit output that maximizes the likelihood identifies, among the multiple noise sources, the noise signal that most degrades speech recognition; by recognizing the audio signal from which that most harmful noise signal has been removed, the speech recognition circuit obtains the most probable recognition result.
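
The flow of the first embodiment (cancel each noise reference separately, recognize every output, keep the most probable word) can be sketched as follows. The canceller and recognizer are passed in as callables, so both are placeholders; the function name and the `(word, probability)` result shape are assumptions for illustration.

```python
def recognize_with_best_canceller(mic, noise_refs, cancel, recognize):
    """Run one canceller per noise reference and keep the most probable word.

    cancel(mic, ref)  -> cleaned signal            (hypothetical callable)
    recognize(signal) -> (word, probability)       (hypothetical callable)
    """
    results = [recognize(cancel(mic, ref)) for ref in noise_refs]
    # Comparison step: pick the (word, probability) pair with the highest probability.
    return max(results, key=lambda r: r[1])
```

Note that the recognizer runs once per noise reference, which is the cost the second embodiment sets out to reduce.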

[0040] As described above, noise is removed individually for each of the multiple noise sources using adaptive noise cancellation circuits, and the most probable recognized word among the resulting speech recognition results is selected. Because this configuration identifies the noise signal that most degrades speech recognition among the multiple noise sources and removes it, speech recognition accuracy can be improved in a noisy acoustic environment. Moreover, the computation is not complicated and contains no element that greatly increases the processing load, so the system is easy to implement.

[0041] The speech recognition system 60 of the first embodiment performs speech recognition under noise with a simple configuration, but because it runs speech recognition on every output of the adaptive noise cancellation circuits 31, 32, and 33 corresponding to the individual noise signals, the amount of computation can grow as the number of noise signals handled increases. In the second embodiment, therefore, before speech recognition is performed, the system checks which adaptive noise cancellation circuit output is of good quality and performs speech recognition only on that output. This yields a good speech recognition system with less computation.

[0042] FIG. 6 is a functional configuration diagram of the speech recognition system 70 of the second embodiment. In FIG. 6, the speech recognition system 70 comprises: the speech signal input circuit 10; the noise signal input circuits 21 to 23; a voiced/silent detection control circuit 30, which detects voiced and silent sections in the audio signal 101 from the speech signal input circuit 10, outputs a coefficient update command signal 301 to the adaptive noise cancellation circuits 31 to 33, and supplies a voiced/silent section detection signal 302 to an adaptive noise cancellation signal selection circuit 80; the adaptive noise cancellation signal selection circuit 80, which obtains the S/N ratio of each of the output signals 311, 321, and 331 of the adaptive noise cancellation circuits 31 to 33 and selects the output signal with the highest S/N ratio; and a speech recognition circuit 90, which performs speech recognition on the adaptive noise cancellation output signal with the high S/N ratio by the hidden Markov model method or by dynamic programming and outputs a recognition result 901.

[0043] FIG. 7 illustrates the operation of the speech recognition system of the second embodiment shown in FIG. 6. In FIG. 7, the adaptive noise cancellation circuit 31 first removes the digital signal 211, in which the noise signal input circuit 21 has captured the car audio signal (step S10), and the resulting audio signal 311 is supplied to the adaptive noise cancellation signal selection circuit 80.

[0044] Likewise, the adaptive noise cancellation circuit 32 removes, from the audio signal 101 captured by the speech signal input circuit 10, the digital signal 221 of the outside-vehicle noise captured by the noise signal input circuit 22 (step S20), and the resulting audio signal 321 is supplied to the adaptive noise cancellation signal selection circuit 80. The adaptive noise cancellation circuit 33 similarly removes, from the audio signal 101 captured by the speech signal input circuit 10, the digital signal 231 of the engine sound captured by the noise signal input circuit 23 (step S30), and the resulting audio signal 331 is supplied to the adaptive noise cancellation signal selection circuit 80.

[0045] The adaptive noise cancellation signal selection circuit 80 obtains the S/N ratio of the output signal 311 of the adaptive noise cancellation circuit 31 and takes this value as S/N1 (step S80). It then obtains the S/N ratio of the output signal 321 of the adaptive noise cancellation circuit 32 as S/N2 (step S90) and the S/N ratio of the output signal 331 of the adaptive noise cancellation circuit 33 as S/N3 (step S100). It selects the output signal 801 with the largest of these S/N ratios (step S110) and supplies it to the speech recognition circuit 90, which outputs the speech recognition result 901 (step S120).
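
Steps S80 to S120 amount to scoring each canceller output, taking the best one, and running the recognizer once. A minimal sketch, with the S/N estimator and the recognizer injected as hypothetical callables:

```python
def recognize_best_quality(outputs, snr_of, recognize):
    """Select the canceller output with the highest S/N and recognize only that one.

    outputs   : list of cleaned signals, one per adaptive noise canceller
    snr_of    : callable returning an S/N estimate for a signal (hypothetical)
    recognize : callable returning a recognition result (hypothetical)
    """
    snrs = [snr_of(sig) for sig in outputs]                  # steps S80 to S100
    best = max(range(len(outputs)), key=snrs.__getitem__)    # step S110
    return best, recognize(outputs[best])                    # step S120
```

Compared with the first embodiment, the recognizer is called once regardless of how many noise references there are; only the cheaper quality estimate scales with their number.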

[0046] To carry out the operation of the speech recognition system 70 of FIG. 7 in practice, the processing of the adaptive noise cancellation circuits 31 to 33, the adaptive noise cancellation signal selection circuit 80, the speech recognition circuit 90, and so on can be implemented as program processing. To run this processing as a program and also build it compactly into a car navigation system, the program is preferably stored in flash memory or ROM. It may also be stored on a magnetic disk device as a backup.

[0047] FIG. 8 illustrates how the S/N ratio described above is obtained. In FIG. 8, each of the output signals 311, 321, and 331 of the adaptive noise cancellation circuits 31 to 33 is divided, based on the voiced/silent section detection signal 302 from the voiced/silent detection control circuit 30, into a voiced section in which the user is judged to be speaking (utterance section 82) and the remainder, a silent section in which the user is judged not to be speaking (non-utterance section 81). The average level AN of the signal amplitude over the non-utterance section 81 and the average level AS of the signal amplitude over the utterance section 82 are obtained, and the ratio AN/AS is taken as the S/N ratio.
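
The level comparison can be sketched as below. Two assumptions are made and should be read as such: the average level is computed as the mean absolute amplitude, and the ratio is oriented as AS divided by AN (speech level over noise level), so that a larger value means a cleaner signal, which is what selecting the largest S/N requires. The text itself writes the ratio as AN/AS. The function names are hypothetical.

```python
def mean_abs(samples):
    """Average absolute amplitude of a list of samples (0.0 for an empty list)."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def snr_from_sections(signal, voiced):
    """Estimate S/N as speech-section level AS over non-speech-section level AN.

    voiced[i] is True where the user is judged to be speaking.
    The AS/AN orientation is an assumption; see the note above.
    """
    a_s = mean_abs([s for s, v in zip(signal, voiced) if v])
    a_n = mean_abs([s for s, v in zip(signal, voiced) if not v])
    if a_n == 0.0:
        return float("inf")  # no measurable residual noise
    return a_s / a_n
```

Because the estimate uses only averages over sections already marked by the voiced/silent detector, it is far cheaper than running a recognizer on each output.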

[0048] As described above, the speech recognition system of the second embodiment applies adaptive noise cancellation for each noise signal, obtains the S/N ratio of each adaptive noise cancellation output signal, and performs speech recognition on the output signal with the highest S/N ratio, so speech recognition accuracy can be improved even in a noisy environment while keeping the amount of computation small. To judge the quality of the adaptive noise cancellation output signals, the amount of signal distortion (for example, harmonic distortion) may also be measured and the signal quality judged by its magnitude.

【００４９】[0049]

[Effects of the Invention] As described above, when noise signals from multiple noise sources are mixed into the main sound signal, the present invention performs speech recognition, using a statistical acoustic model, on the main sound signal output by one adaptive noise canceling means that removes the noise signal of one noise source from the main sound signal, obtaining a recognized word and the likelihood of that recognition; it likewise performs speech recognition, using the statistical acoustic model, on the main sound signal output by another adaptive noise canceling means that removes the noise signal of another noise source, obtaining a recognized word and its likelihood; and from these recognition results it outputs the recognized word with the higher likelihood as the recognition result. Consequently, in an acoustic environment containing multiple noise signals of different kinds, such as noise from outside a vehicle and sound from a car audio system, recognition accuracy can be improved with a small amount of processing.

[Brief description of the drawings]

FIG. 1 is a functional configuration diagram of a speech recognition system installed in an automobile in a first embodiment of the present invention.

FIG. 2 is a diagram for explaining the operation of the speech recognition system of the embodiment shown in FIG. 1.

FIG. 3 is a functional configuration diagram of the adaptive noise cancellation circuit of the speech recognition system of the same embodiment.

FIG. 4 is a functional configuration diagram of the adaptive digital filter circuit of the adaptive noise cancellation circuit of the embodiment shown in FIG. 3.

FIG. 5 is a functional configuration diagram of the speech recognition circuit of the speech recognition system of the embodiment shown in FIG. 1.

FIG. 6 is a functional configuration diagram of a speech recognition system according to a second embodiment.

FIG. 7 is a diagram for explaining the operation of the speech recognition system of the embodiment shown in FIG. 6.

FIG. 8 is an explanatory diagram of how the S/N ratio is obtained in the adaptive noise cancellation signal selection circuit of the speech recognition system shown in FIG. 6.

[Explanation of symbols]

10 Speech signal input circuit
21 to 23 Noise signal input circuits
31 to 33 Adaptive noise cancellation circuits
40 Speech recognition circuit
50 Speech recognition probability comparison circuit

Claims

[Claims]

1. A speech recognition system that comprises main sound signal capturing means for capturing a main sound and outputting a main sound signal and that can capture noise from at least two noise sources, the system comprising at least two noise signal capturing means and at least two adaptive noise canceling means for removing noise signals from the at least two noise sources, and comprising: first noise signal capturing means for capturing noise from one noise source and outputting a noise capture signal; second noise signal capturing means for capturing noise from another noise source and outputting a noise capture signal; first adaptive noise canceling means for removing, from the main sound signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main sound signal; second adaptive noise canceling means for removing, from the main sound signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main sound signal; and speech recognition means for performing speech recognition, using a statistical acoustic model, on the main sound signal output from the first adaptive noise canceling means to obtain a recognized word and a likelihood of that recognition, likewise performing speech recognition, using the statistical acoustic model, on the main sound signal output from the second adaptive noise canceling means to obtain a recognized word and a likelihood of that recognition, and outputting, as the recognition result, the recognized word having the higher likelihood among these recognition results.

2. The speech recognition system according to claim 1, wherein the speech recognition is performed using a hidden Markov model method as the statistical acoustic model.

3. A speech recognition system that comprises main sound signal capturing means for capturing a main sound and outputting a main sound signal and that can capture noise from at least two noise sources, the system comprising at least two noise signal capturing means and at least two adaptive noise canceling means for removing noise signals from the at least two noise sources, and comprising: first noise signal capturing means for capturing noise from one noise source and outputting a noise capture signal; second noise signal capturing means for capturing noise from another noise source and outputting a noise capture signal; first adaptive noise canceling means for removing, from the main sound signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main sound signal; second adaptive noise canceling means for removing, from the main sound signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main sound signal; and speech recognition means for performing speech recognition, using dynamic programming, on the main sound signal output from the first adaptive noise canceling means to obtain a similarity and the corresponding word, likewise performing speech recognition, using dynamic programming, on the main sound signal output from the second adaptive noise canceling means to obtain a similarity and the corresponding word, and outputting, as the recognition result, the recognized word having the higher similarity among these recognition results.

4. A speech recognition system that comprises main sound signal capturing means for capturing a main sound and outputting a main sound signal and that can capture noise from at least two noise sources, the system comprising at least two noise signal capturing means and at least two adaptive noise canceling means for removing noise signals from the at least two noise sources, and comprising: first noise signal capturing means for capturing noise from one noise source and outputting a noise capture signal; second noise signal capturing means for capturing noise from another noise source and outputting a noise capture signal; first adaptive noise canceling means for removing, from the main sound signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main sound signal; second adaptive noise canceling means for removing, from the main sound signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main sound signal; and speech recognition means for determining the signal quality of the main sound signal output from the first adaptive noise canceling means and of the main sound signal output from the second adaptive noise canceling means, selecting the output main sound signal having good signal quality, and performing speech recognition on the selected output main sound signal.

5. The speech recognition system according to claim 4, wherein the signal quality is determined by obtaining a signal-to-noise ratio and/or a signal distortion amount of each output main sound signal.

6. The speech recognition system according to claim 4, wherein said speech recognition means performs said speech recognition by a statistical acoustic model method or a dynamic programming method.

7. A recording medium on which is recorded a speech recognition control program that causes a computer to remove, from a main sound signal captured by main sound signal capturing means that captures a main sound and outputs the main sound signal where noise from at least two noise sources can be captured, the noise capture signals captured by at least two noise signal capturing means provided for capturing noise signals from the at least two noise sources, namely first noise signal capturing means that captures noise from one noise source and outputs a noise capture signal and second noise signal capturing means that captures noise from another noise source and outputs a noise capture signal, and to perform speech recognition on the resulting main sound signal, wherein the speech recognition control program includes at least two adaptive noise canceling steps for removing the at least two noise capture signals from the main sound signal, and comprises: a first adaptive noise canceling step of removing, from the main sound signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main sound signal; a second adaptive noise canceling step of removing, from the main sound signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main sound signal; and a speech recognition step of performing speech recognition, using a statistical acoustic model, on the main sound signal output by the first adaptive noise canceling step to obtain a recognized word and a likelihood of that recognition, likewise performing speech recognition, using the statistical acoustic model, on the main sound signal output by the second adaptive noise canceling step to obtain a recognized word and a likelihood of that recognition, and outputting, as the recognition result, the recognized word having the higher likelihood among these recognition results.

8. A recording medium on which is recorded a speech recognition control program that causes a computer to remove, from a main sound signal captured by main sound signal capturing means that captures a main sound and outputs the main sound signal where noise from at least two noise sources can be captured, the noise capture signals captured by at least two noise signal capturing means provided for capturing noise signals from the at least two noise sources, namely first noise signal capturing means that captures noise from one noise source and outputs a noise capture signal and second noise signal capturing means that captures noise from another noise source and outputs a noise capture signal, and to perform speech recognition on the resulting main sound signal, wherein the speech recognition control program includes at least two adaptive noise canceling steps for removing the at least two noise capture signals from the main sound signal, and comprises: a first adaptive noise canceling step of removing, from the main sound signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main sound signal; a second adaptive noise canceling step of removing, from the main sound signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main sound signal; and a speech recognition step of performing speech recognition, using dynamic programming, on the main sound signal output by the first adaptive noise canceling step to obtain a similarity and the corresponding word, likewise performing speech recognition, using dynamic programming, on the main sound signal output by the second adaptive noise canceling step to obtain a similarity and the corresponding word, and outputting, as the recognition result, the recognized word having the higher similarity among these recognition results.

9. A recording medium on which is recorded a speech recognition control program that causes a computer to remove, from a main sound signal captured by main sound signal capturing means that captures a main sound and outputs the main sound signal where noise from at least two noise sources can be captured, the noise capture signals captured by at least two noise signal capturing means provided for capturing noise signals from the at least two noise sources, namely first noise signal capturing means that captures noise from one noise source and outputs a noise capture signal and second noise signal capturing means that captures noise from another noise source and outputs a noise capture signal, and to perform speech recognition on the resulting main sound signal, wherein the speech recognition control program includes at least two adaptive noise canceling steps for removing the at least two noise capture signals from the main sound signal, and comprises: a first adaptive noise canceling step of removing, from the main sound signal, the noise capture signal captured by the first noise signal capturing means and outputting the resulting main sound signal; a second adaptive noise canceling step of removing, from the main sound signal, the noise capture signal captured by the second noise signal capturing means and outputting the resulting main sound signal; and a speech recognition step of determining the signal quality of the main sound signal output by the first adaptive noise canceling step and of the main sound signal output by the second adaptive noise canceling step, selecting the output main sound signal having good signal quality, and performing speech recognition on the selected output main sound signal.