JP2000148184A

JP2000148184A - Speech recognizing device

Info

Publication number: JP2000148184A
Application number: JP10316204A
Authority: JP
Inventors: Hiroya Murao; 浩也村尾
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1998-11-06
Filing date: 1998-11-06
Publication date: 2000-05-26

Abstract

PROBLEM TO BE SOLVED: To improve the recognition ratio in a speech recognizing device. SOLUTION: An image information analysis part 26 analyses the image data obtained from an image information input part 25 to detect a position of a speaker within an image. The position of the speaker within the image can be determined by extracting the face image of the speaker and tracking it. A speech input control part 21 controls the directive characteristic, input characteristic and microphone direction of the speech information input part 20 having a microphone.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech recognition device for controlling and operating equipment based on voice information input from a voice information input device such as a microphone, and in particular to a device that performs more accurate speech recognition by also using video information from an imaging device such as a camera.

[0002]

[Prior art] FIG. 4 is a block diagram showing the configuration of a conventional speech recognition device. In the figure, reference numeral 10 denotes a voice information input unit composed of a microphone or the like; 11 denotes a voice feature vector extraction unit that extracts voice features from the voice information input from the voice information input unit 10; 12 denotes a voice recognition unit that recognizes speech based on the voice feature information obtained from the voice feature vector extraction unit; and 13 denotes a recognition result display unit for displaying the result recognized by the voice recognition unit 12.

[0003] Speech uttered by a speaker is converted into an electric signal by the voice information input unit 10 and input to the voice feature vector extraction unit 11. The voice feature vector extraction unit 11 A/D-converts the input electric signal at each unit time, analyzes it by a known speech analysis technique such as frequency analysis (FFT analysis), and converts it into a sequence of voice feature vectors.

[0004] Speech analysis techniques include analysis using a bank of 16 band-pass filters with different passband frequencies, and methods based on the FFT algorithm. The voice feature vectors thus obtained are subjected to normalization, dimensionality reduction, and similar processing in the voice recognition unit 12 and converted into a speech pattern, after which a recognition result is obtained by a known speech recognition technique.
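
The analysis step described in paragraphs [0003] and [0004] can be illustrated with a minimal sketch. It is not the patent's implementation: the frame length, hop size, sampling rate, equal-width band pooling, and all function names are assumptions made for illustration, and a naive DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def band_energies(spectrum, n_bands=16):
    """Pool spectrum bins into n_bands equal-width bands and take the log energy."""
    bins = spectrum[1:]  # drop the DC bin
    width = len(bins) // n_bands
    return [math.log(sum(bins[b * width:(b + 1) * width]) + 1e-12)
            for b in range(n_bands)]

def feature_vectors(samples, frame_len=64, hop=32, n_bands=16):
    """Convert a sampled signal into a sequence of n_bands-dimensional vectors."""
    return [band_energies(power_spectrum(f), n_bands)
            for f in frame_signal(samples, frame_len, hop)]

if __name__ == "__main__":
    rate = 8000  # hypothetical sampling rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
    vectors = feature_vectors(tone)
    print(len(vectors), "frames of", len(vectors[0]), "band energies")
```

Pooling DFT bins into 16 bands only loosely mirrors the 16-filter bank mentioned in the text; a real filter bank would normally use overlapping, non-uniform bands.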

[0005] Speech recognition techniques include pattern matching, which compares the input with registered speech patterns using the DTW (Dynamic Time Warping) algorithm or the like; class classification by a neural network that has learned the distribution of a large number of speech patterns; and statistical methods based on the HMM (Hidden Markov Model), a statistical probability model.
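
As an illustration of the pattern matching approach named in paragraph [0005], the following is a minimal sketch of textbook DTW with a nearest-template decision. The Euclidean local distance, the unconstrained warping path, and the names `dtw_distance` and `recognize` are assumptions for illustration; the patent does not specify these details.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of equal-dimension vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # local Euclidean distance
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(pattern, templates):
    """Return the label of the registered template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(pattern, templates[label]))

if __name__ == "__main__":
    templates = {"up": [[0], [1], [2], [3]], "down": [[3], [2], [1], [0]]}
    slow_up = [[0], [0], [1], [2], [2], [3]]  # same shape as "up", spoken more slowly
    print(recognize(slow_up, templates))
```

The time-stretched input still matches its template because DTW allows one template frame to align with several input frames.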

[0006]

[Problem to be solved by the invention] However, although the speech recognition device configured as above achieves high recognition performance for speech patterns uttered in an environment close to that in which the reference speech patterns were collected, the speech pattern changes when the utterance environment changes, for example through a change in the positional relationship between the speaker and the input device (microphone) or the presence or absence of ambient noise, and the recognition rate falls. One conceivable countermeasure is to assume various utterance environments in advance and collect speech patterns at various voice levels and under various ambient noise conditions, but it is impossible to cover every utterance environment, and it is therefore extremely difficult to obtain sufficient speech recognition performance.

[0007] On the other hand, as in Japanese Unexamined Patent Publication No. H7-28490 (G10L 3/00), a device is known that aims to improve recognition accuracy by registering image information of the speaker (such as the shape of the mouth during utterance) in advance as an image database, using an imaging device such as a camera, and, at recognition time, collating the image information input from the camera with the images in the database and using the result as auxiliary data alongside the voice information. However, when the environment in which the images were registered differs from the environment at recognition time (for example, when the positional relationship between the speaker and the camera differs), good speech recognition still cannot be achieved.

[0008]

[Means for solving the problem] To solve the above problem, the present invention is characterized by comprising image information input means for capturing image information, image information analysis means for analyzing the image information input from the image information input means, voice information input means for capturing voice information, and input characteristic changing means for changing the voice input characteristic of the voice information input means based on information from the image information analysis means.

[0009] The input characteristic changing means changes the sensitivity characteristic of the voice information input means.

[0010] The input characteristic changing means changes the directional characteristic of the voice information input means.

[0011] The input characteristic changing means changes the direction of the voice information input means.

[0012] The voice information input means includes a microphone array composed of a plurality of microphones arranged in series.

[0013] The voice information input means includes a parametric microphone.

[0014] The image information analysis means analyzes information on the position of the speaker based on the image information input from the image information input means.

[0015] The image information analysis means extracts an image of a specific part of the speaker from the image information input from the image information input means and tracks it to analyze information on the position of the speaker.

[0016]

[Embodiments of the invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the speech recognition device of the present invention. In the figure, reference numeral 20 denotes a voice information input unit, such as a microphone array, configured so that its directional characteristic, sensitivity characteristic, and the like can be varied; 21 denotes a voice input control unit that adjusts the directional characteristic, sensitivity characteristic, and the like of the voice information input unit 20; 22 denotes a voice feature vector extraction unit that A/D-converts the voice signal input from the voice information input unit 20 under the control of the voice input control unit 21, performs frequency analysis, and converts the signal into a sequence of voice feature vectors; 23 denotes a voice recognition unit that performs speech recognition using the voice feature vectors obtained from the voice feature vector extraction unit 22; and 24 denotes a recognition result display unit. The various speech recognition techniques are as already described. Further, 25 denotes an image information input unit composed of an imaging device such as a camera, and 26 denotes an image information analysis unit that analyzes the image information input from the image information input unit 25.

[0017] The microphone used in the voice information input unit 20 is now described in detail. FIG. 2 shows an example in which a microphone array is applied to the voice information input unit 20. As shown in the figure, a plurality of microphones 30a are arranged in a line inside the microphone array unit 30 and receive the speech from the speaker. Because the distance between the speaker and each microphone 30a in the microphone array unit 30 differs, the speech received by each microphone 30a differs in both amplitude and propagation time according to that distance.

[0018] The speech received by each microphone in the microphone array unit 30 having this linear arrangement is adjusted in amplitude by the respective weighting amplifier 31, so that the side lobes in the directional characteristic of the microphone array are minimized (that is, the directivity is made as sharp as possible).
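
The effect of amplitude weighting on side lobes can be illustrated with a small numerical sketch. The patent does not say which weights the weighting amplifiers 31 apply; the Hamming taper, the eight-element array, the half-wavelength spacing, and the function names below are all assumptions chosen only to show the principle.

```python
import cmath
import math

def array_response(weights, u, spacing_in_wavelengths=0.5):
    """Normalized magnitude response of a line array, u = sin(angle from broadside)."""
    total = sum(w * cmath.exp(2j * math.pi * spacing_in_wavelengths * n * u)
                for n, w in enumerate(weights))
    return abs(total) / sum(weights)

def hamming_weights(count):
    """Hamming taper, used here as one hypothetical choice of amplifier weights."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (count - 1))
            for n in range(count)]

def peak_sidelobe_db(weights, steps=2000):
    """Highest response outside the main lobe, in dB relative to the main-lobe peak."""
    resp = [array_response(weights, i / steps) for i in range(steps + 1)]
    i = 0
    # Walk down the main lobe until the response stops falling (first null).
    while i < steps and resp[i + 1] <= resp[i]:
        i += 1
    return 20 * math.log10(max(resp[i:]) + 1e-12)

if __name__ == "__main__":
    print("uniform weights:", round(peak_sidelobe_db([1.0] * 8), 1), "dB")
    print("tapered weights:", round(peak_sidelobe_db(hamming_weights(8)), 1), "dB")
```

In this sketch the tapered weights give a markedly lower peak side lobe than uniform weights, at the cost of a wider main lobe, so the choice of weights is a trade-off in practice.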

[0019] In addition, a delay circuit 32 is provided for each microphone. By using the delay circuits 32 to compensate for the differences in propagation time at the microphones 30a, so that the focal position of the microphone array unit 30 coincides with the position of the speaker, still higher noise immunity can be obtained. The direction control mechanism 35 also makes it possible to change the orientation of the microphone array unit 30 relative to the speaker.
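
The focusing described in paragraph [0019] amounts to delaying each channel so that sound from the speaker's position lines up across all microphones. The sketch below computes such delays under assumptions the patent does not state: free-field propagation, a sound speed of 343 m/s, two-dimensional coordinates in meters, and the hypothetical function name `focusing_delays`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def focusing_delays(mic_positions, speaker_position):
    """Delay in seconds to add to each microphone so all channels align on the speaker."""
    distances = [math.dist(mic, speaker_position) for mic in mic_positions]
    farthest = max(distances)
    # Nearer microphones receive the speech earlier, so they are delayed the most.
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]

if __name__ == "__main__":
    mics = [(-0.15, 0.0), (-0.05, 0.0), (0.05, 0.0), (0.15, 0.0)]
    speaker = (0.0, 1.0)  # one meter in front of the array center
    for mic, delay in zip(mics, focusing_delays(mics, speaker)):
        print(mic, f"{delay * 1e6:.1f} microseconds")
```

Recomputing the delays whenever the tracked speaker position changes keeps the focal position on the speaker without moving the array.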

[0020] The outputs from the microphones are summed by the microphone amplifier 33, converted into a digital signal by the A/D conversion circuit 34 in the next stage, and then output. The signal thus obtained is sent to the voice feature vector extraction unit 22 of FIG. 1.

[0021] Besides such a microphone array, a parametric microphone (superdirective microphone) such as that reported in the Journal of the Acoustical Society of Japan, Vol. 51, No. 5 (1995), pp. 400-406, can also be used. A parametric microphone exploits the nonlinearity of sound waves: a large-amplitude ultrasonic wave is generated as a probe wave, and within its propagation space (the ultrasonic sound field) it interacts nonlinearly with the incoming sound wave. As a result, the probe wave is amplitude-modulated in space (nonlinear distortion occurs), and this modulation is used to pick up the sound.

[0022] That is, the longer the region in which the probe wave and the incident sound wave intersect, the greater the modulation (nonlinear distortion), and this determines the sound-receiving directivity (directional characteristic) of the parametric microphone. Accordingly, a sound wave traveling in the same direction as the probe wave produces the greatest modulation (nonlinear distortion). Furthermore, because ultrasonic waves have sharp directivity, the intersection region is confined; this region acts equivalently to a microphone array and forms a virtual microphone with a longitudinal array structure. The parametric microphone can therefore receive only sound arriving from one particular direction.

[0023] FIG. 3 shows a specific example in which a parametric microphone is applied to the voice information input unit 20. The parametric microphone unit 40 is composed of a probe wave generation unit 41 and a sound receiving unit 42. The probe wave generation unit 41 is supplied with a probe wave signal from the probe wave generation circuit 43 and generates an ultrasonic wave (around 40 kHz). The input speech enters the sound field (inside the parametric microphone unit 40) traveling in the same direction as the probe wave. The speech and the probe wave interact nonlinearly inside the parametric microphone unit 40, and the probe wave undergoes amplitude modulation (nonlinear distortion).

[0024] If the propagation direction of the probe wave and that of the speech do not coincide, the amplitude modulation (nonlinear distortion) described above is small and the speech signal is difficult to extract, so the microphone has a very narrow directivity (noise immunity). Because this amplitude modulation (nonlinear distortion) occurs where the propagation path of the probe wave and that of the speech intersect, virtual microphone elements appear to form a longitudinal array.

[0025] The probe wave from the probe wave generation unit 41 is received by the sound receiving unit 42, only the speech is extracted by the demodulation circuit 44, and after amplification by the microphone amplifier 45 an output signal is obtained through the A/D conversion circuit 46. This signal is sent to the voice feature vector extraction unit 22 of FIG. 1. The direction of the parametric microphone unit 40 can be changed by the direction control mechanism 47. In addition, the array length (the distance between the probe wave generation unit 41 and the sound receiving unit 42) can be changed by an array length changing mechanism.

[0026] Next, the operation of the speech recognition device of the present invention is described. In FIG. 1, the image information analysis unit 26 analyzes the image data obtained from the image information input unit 25 and detects the position of the speaker in the image. The position of the speaker in the image can be determined, for example, by extracting the speaker's face image and tracking it. The voice input control unit 21 controls the directional characteristic, input characteristic, and direction of the voice information input unit 20 based on the speaker position data sent from the image information analysis unit 26.
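
One way the speaker position in the image could be turned into a steering direction is sketched below. The patent does not describe this conversion; the pinhole camera model, the known horizontal field of view, the assumption that camera and microphone share the same axis, and the name `face_azimuth_deg` are all illustrative assumptions.

```python
import math

def face_azimuth_deg(face_center_x, image_width, horizontal_fov_deg):
    """Horizontal angle of the tracked face relative to the camera axis (pinhole model)."""
    # Focal length in pixels, derived from the assumed horizontal field of view.
    focal_px = (image_width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    offset_px = face_center_x - image_width / 2
    return math.degrees(math.atan2(offset_px, focal_px))

if __name__ == "__main__":
    # Hypothetical 640-pixel-wide frame from a camera with a 60-degree field of view.
    for x in (320, 480, 640):
        print(x, round(face_azimuth_deg(x, 640, 60.0), 1), "degrees")
```

The resulting angle could then be passed to whatever sets the microphone direction or delays; estimating distance would need extra information such as the apparent face size.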

[0027] When a microphone array is used as in FIG. 2, the following kinds of control are possible: controlling the delay times of the delay circuits 32 so that the focal position always coincides with the position of the speaker; changing the gain of the microphone amplifier 33 according to changes in the distance between the speaker and the microphone array unit 30; and driving the direction control mechanism 35 to adjust the orientation of the microphone array unit 30 according to changes in the positional relationship between the speaker and the microphone array unit 30.
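
The gain adjustment in paragraph [0027] can be illustrated with a simple rule. The patent gives no formula; the sketch assumes free-field propagation, where sound pressure falls in inverse proportion to distance, and uses a hypothetical reference distance of 1 m and the hypothetical name `gain_compensation_db`.

```python
import math

def gain_compensation_db(distance_m, reference_distance_m=1.0):
    """Extra amplifier gain in dB that offsets the 1/r level drop relative to the reference."""
    return 20.0 * math.log10(distance_m / reference_distance_m)

if __name__ == "__main__":
    for distance in (0.5, 1.0, 2.0, 4.0):
        print(distance, "m:", round(gain_compensation_db(distance), 1), "dB")
```

Under this assumption each doubling of the speaker distance calls for roughly 6 dB more gain; in a reverberant room the required change would be smaller.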

[0028] When a parametric microphone is used as in FIG. 3, the sensitivity characteristic is determined by the product of the array length and the angular frequency of the probe wave. The receiving sensitivity can therefore be changed by changing either the array length or the frequency of the probe wave (the frequency of the probe wave is directly proportional to its angular frequency). In the embodiment, the probe wave generation circuit 43 is controlled to change the frequency of the probe wave.

[0029] The directional characteristic is determined by the product of the array length and the wavenumber of the signal wave, so the directional characteristic can be changed by changing the value of this product (the higher the value, the narrower the directivity). In the embodiment, the array length changing mechanism 48 is driven to change the array length.
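
The two products named in paragraphs [0028] and [0029] can be computed directly. The sketch below only evaluates those products; it does not model the actual sensitivity or beam pattern. The sound speed of 343 m/s, the example lengths and frequencies, and the function names are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def sensitivity_product(array_length_m, probe_freq_hz):
    """Array length times probe-wave angular frequency (omega = 2*pi*f)."""
    return array_length_m * 2 * math.pi * probe_freq_hz

def directivity_product(array_length_m, signal_freq_hz):
    """Array length times signal wavenumber (k = 2*pi*f / c)."""
    return array_length_m * 2 * math.pi * signal_freq_hz / SPEED_OF_SOUND

if __name__ == "__main__":
    for length in (0.25, 0.5, 1.0):
        print(length, "m:",
              round(sensitivity_product(length, 40_000.0)),
              round(directivity_product(length, 1_000.0), 2))
```

Because the array length appears in both products, lengthening the array narrows the directivity and also raises the sensitivity product, whereas changing the probe frequency affects only the latter.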

[0030] The gain of the microphone amplifier 45 can also be changed according to changes in the distance between the speaker and the parametric microphone unit 40.

[0031] Furthermore, the direction control mechanism 47 can point the parametric microphone unit 40 toward the speaker.

[0032] With this control, changes in the speaker's position are handled by adjusting the directional characteristic or the orientation of the microphone toward the speaker, and variations in the distance between the speaker and the microphone are handled by changing the sensitivity characteristic of the microphone. Speech of stable, good quality can thus be obtained, and the recognition rate can be improved.

[0033] At the same time, controlling the directional characteristic of the microphone relatively suppresses the level of noise arriving from directions other than that of the speaker, so the influence of ambient noise is reduced and the recognition rate can be improved.

[0034] The signal captured in this way by the voice information input unit shown in FIG. 2 or FIG. 3 is analyzed in the voice feature vector extraction unit 22, as in the conventional speech recognition device, and converted into a sequence of voice feature vectors. The converted voice feature vectors are subjected to normalization and similar processing in the voice recognition unit 23 and converted into a speech pattern, after which a recognition result is obtained by a known speech recognition technique such as those already described.

[0035] The sensitivity characteristic, the directional characteristic, and the direction of the microphone may all be changed, or the device may be configured to change only one of them. For example, a speech recognition device installed where the position of the speaker relative to the microphone is constrained to some extent may change only the sensitivity characteristic.

[0036]

[Effect of the invention] As described in detail above, according to the present invention, sound from the direction of the speaker is always captured accurately, the influence of changes in the speaker's position, ambient noise, and the like is reduced, and the speech uttered by the speaker can be captured with good quality, so the recognition rate can be improved.

[Brief description of the drawings]

FIG. 1 is a circuit block diagram showing the configuration of the speech recognition device of the present invention.

FIG. 2 is a circuit block diagram showing an example in which a microphone array is applied to the voice information input unit 20 in the speech recognition device of the present invention.

FIG. 3 is a circuit block diagram showing an example in which a parametric microphone is applied to the voice information input unit 20 in the speech recognition device of the present invention.

FIG. 4 is a circuit block diagram showing the configuration of a conventional speech recognition device.

[Explanation of symbols]

20: voice information input unit
21: voice input control unit
22: voice feature vector extraction unit
23: voice recognition unit
24: recognition result display unit
25: image information input unit
26: image information analysis unit
30: microphone array unit
31: weighting amplifier
32: delay circuit
33: microphone amplifier
34: A/D conversion circuit
35: direction control mechanism
40: parametric microphone unit
41: probe wave generation unit
42: sound receiving unit
43: probe wave generation circuit
44: demodulation circuit
45: microphone amplifier
46: A/D conversion circuit
47: direction control mechanism
48: array length changing mechanism

Claims

[Claims]

1. A speech recognition device comprising: image information input means for capturing image information; image information analysis means for analyzing the image information input from the image information input means; voice information input means for capturing voice information; and input characteristic changing means for changing a voice input characteristic of the voice information input means based on information from the image information analysis means.

2. The speech recognition apparatus according to claim 1, wherein said input characteristic changing means changes a sensitivity characteristic of said voice information input means.

3. The speech recognition device according to claim 1, wherein said input characteristic changing means changes a directional characteristic of said voice information input means.

4. The speech recognition device according to the above, wherein said input characteristic changing means changes the direction of said voice information input means.

5. The speech recognition device according to claim 1, wherein said voice information input means includes a microphone array composed of a plurality of microphones arranged in series.

6. The speech recognition apparatus according to claim 1, wherein said speech information input means includes a parametric microphone.

7. The speech recognition device according to any one of the preceding claims, wherein said image information analyzing means analyzes information on the position of a speaker based on image information input from said image information input means.

8. The speech recognition device according to claim 7, wherein said image information analysis means extracts an image of a specific part of a speaker from the image information input from said image information input means, and tracks the image to analyze information on the position of the speaker.