JPH08314489A

JPH08314489A - Voice recognition device

Info

Publication number: JPH08314489A
Application number: JP7116874A
Authority: JP
Inventors: Naoyuki Habasaki; 直行幅崎; Yasuo Tomooka; 靖夫友岡
Original assignee: NEC Corp; NEC Robotics Engineering Ltd
Current assignee: NEC Corp; NEC Robotics Engineering Ltd
Priority date: 1995-05-16
Filing date: 1995-05-16
Publication date: 1996-11-29

Abstract

PURPOSE: To enable a voice recognition device to recognize utterances input simultaneously to a plurality of microphones. CONSTITUTION: The voice recognition device comprises microphones 1-1 to 1-N, which convert input voices into voice signals; input voice analysis parts 2-1 to 2-N, which analyze the voice signals from the microphones 1-1 to 1-N and convert them into feature vector sequences; feature vector sequence storage parts 3-1 to 3-N, which store the feature vector sequences from the input voice analysis parts 2-1 to 2-N; and a feature vector sequence monitoring and switching part 4, which monitors the storage parts 3-1 to 3-N and sequentially selects and switches to a storage part holding a valid feature vector sequence. The device further comprises a recognition part 6, which calculates the similarity between the valid feature vector sequence from each storage part and standard patterns stored in advance in a recognition dictionary part 5, and takes the standard pattern with the maximum similarity as the recognition result for the input voice.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a voice recognition device, and more particularly to a voice recognition device that derives a recognition result for an input voice by pattern matching between an input voice pattern and a recognition dictionary.

[0002]

[Prior Art] Systems have long been in common use in which events arising in a work process where multiple workers perform similar tasks, at an industrial site or the like, are input by voice into a computer or similar equipment for information processing. In such a system, the content to be input by voice is represented in advance by simple words; the word uttered by each worker is recognized by a voice recognition device provided for that worker, and the recognition result is sent to a host computer. The host computer executes predetermined information processing based on the received recognition result, sends answers to the workers as appropriate, and thereby advances the series of tasks. The host computer may send an answer to a worker either by replying by voice through a response device or by displaying the answer on a display device such as a CRT.

[0003] Referring to FIG. 2, which shows the voice recognition system of this first prior art, the voice uttered by each worker is converted into an electric signal by the corresponding one of microphones 10-1 to 10-N and input to the corresponding one of voice recognition devices 11-1 to 11-N. The recognition results from the voice recognition devices 11-1 to 11-N are sent to a host computer 12. The host computer 12 identifies which voice recognition device each result came from, performs the processing corresponding to each result, and sends the processing result as an answer to the corresponding one of CRTs 13-1 to 13-N, where it is displayed on the screen. Thus, for example, when a voice is input from the microphone 10-1, the answer is displayed on the CRT 13-1, and the worker proceeds with the work after viewing this answer.

[0004] However, depending on the work handled by a system using this first prior-art voice recognition system, events occur infrequently, and even when multiple voice recognition devices are installed, the probability of simultaneous voice input is extremely low. Consequently, there were cases in which two or more voice recognition devices almost never operated at the same time. In such cases, the utilization rate of the voice recognition devices is extremely low, and installing multiple expensive voice recognition devices is very uneconomical.

[0005] Japanese Unexamined Patent Publication No. Hei 1-216398 discloses a second prior-art voice recognition system that alleviates this problem of the first prior-art voice recognition system.

[0006] Described with reference to FIG. 3, this second prior-art voice recognition system comprises: microphones 20-1 to 20-N, installed in a number corresponding to the number of voice input persons, each of which converts the uttered voice of its voice input person into an electric voice signal; voice output period detection devices 21-1 to 21-N, each of which detects the voice output period of the voice signal input from the corresponding one of the microphones 20-1 to 20-N and supplies the detection result to a host computer 24 described later; a logical sum (OR) circuit 22, which takes the logical sum of the voice signals output from the microphones 20-1 to 20-N and supplies it to a voice recognition device 23 described later; the voice recognition device 23, which recognizes the voice signal supplied from the OR circuit 22 (for example, by converting it into a corresponding code) and supplies the result to the host computer 24; the host computer 24, which performs predetermined processing based on the voice recognition result input from the voice recognition device 23 and sends the processing result to the corresponding one of CRTs 25-1 to 25-N described later for display on its screen; and the CRTs 25-1 to 25-N, which display the responses from the host computer 24.

[0007] Next, the operation of this second prior-art voice recognition system is described. First, the case in which voice is input from only one channel is described. Suppose that an event occurs in the work process and a worker utters the word for this event into the microphone 20-1. The microphone 20-1 converts the collected voice into an electric voice signal and outputs it to the voice recognition device 23 via the OR circuit 22. The voice recognition device 23 performs predetermined recognition processing on the input voice signal, converts it into a code or the like that a computer can interpret, and outputs this code to the host computer 24 as the recognition result for the input voice. Meanwhile, the voice signal output from the microphone 20-1 is also input to the voice output period detection device 21-1, which detects the voice output period, that is, the period during which the voice is uttered.

[0008] The output signals of the voice output period detection devices 21-1 to 21-N are input to the host computer 24. By identifying which voice output period detection device a signal came from, the host computer 24 learns the voice input channel, that is, in the above case, that the voice was input from the microphone 20-1. The host computer 24 therefore performs processing based on the recognition result from the voice recognition device 23, sends the processing result as an answer to the CRT 25-1 corresponding to the voice input channel, and causes a response message such as the next work instruction to be displayed on the screen of the CRT 25-1.

[0009] In this way, while a certain channel is performing voice input, the host computer 24 can determine the number of that channel from the voice output period detection signal output by the corresponding voice output period detection device. It therefore sends message A, indicating that the voice recognition device 23 is in use, to the CRTs of the other channels and has it displayed on their screens. Message A is, for example, "The voice recognition device is busy. Please wait before making a voice input." When the input processing for that channel subsequently ends, the host computer 24 sends to the CRTs of all channels, for example, message B: "The voice recognition device is ready. Voice input is possible."

[0010] Next, when multiple channels perform voice input simultaneously, the output of the OR circuit 22 is a superposition of multiple voices, and this superposed voice is input to the voice recognition device 23. Even if the voice recognition device 23 attempts to recognize these voices, the recognition result is therefore unreliable. In this case, however, temporally overlapping voice output period detection signals are input to the host computer 24 from the voice output period detection devices of multiple channels. The host computer 24 thereby recognizes that voice was input from multiple channels simultaneously, discards the recognition result input from the voice recognition device 23, and assigns a priority to each of the channels that input voice simultaneously. The host computer 24 then sends, for example, message C, "Please utter the previous voice input again," to the CRT of the highest-priority channel for display. At the same time, it sends the same message A as above, "The voice recognition device is busy. Please wait before making a voice input," to the CRTs of the channels with other priorities. When voice input processing for the highest-priority channel ends, the host computer 24 sends message C to the channel with the second-highest priority and, at the same time, sends message A to the channels with other priorities. By repeating this operation, the host computer 24 has all the channels whose voice inputs overlapped perform voice input in a time-division manner, obtains valid recognition results from the voice recognition device 23, and advances the voice input processing in sequence. The messages on the usage state of the voice recognition device 23 are displayed, for example, in a corner of the CRT screen, while instructions such as work contents are displayed prominently in the center of the screen.
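The collision handling that paragraph [0010] attributes to the host computer 24 can be illustrated with a minimal Python sketch. The names `VoicePeriod`, `overlapping_channels`, and `retry_schedule` are hypothetical, and two points are assumptions rather than statements of the publication: a lower channel number is treated as a higher priority, and in each round only the channels still waiting receive message A.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VoicePeriod:
    """One detected voice output period (hypothetical record type)."""
    channel: int   # channel number; lower number = higher priority (assumption)
    start: float   # time at which the utterance began
    end: float     # time at which the utterance ended


def overlapping_channels(periods: List[VoicePeriod]) -> List[int]:
    """Return, in priority order, the channels whose periods overlap another channel's."""
    hit = set()
    for i, a in enumerate(periods):
        for b in periods[i + 1:]:
            # Two intervals overlap when each starts before the other ends.
            if a.channel != b.channel and a.start < b.end and b.start < a.end:
                hit.update((a.channel, b.channel))
    return sorted(hit)


def retry_schedule(periods: List[VoicePeriod]) -> List[Dict[int, str]]:
    """Build one round per colliding channel.

    In each round the channel being served receives message "C" (utter again)
    and the channels still waiting receive message "A" (device busy).
    """
    waiting = overlapping_channels(periods)
    rounds = []
    while waiting:
        served, waiting = waiting[0], waiting[1:]
        messages = {served: "C"}
        messages.update({ch: "A" for ch in waiting})
        rounds.append(messages)
    return rounds
```

Each round in this sketch costs one repeated utterance, which is the inefficiency raised in paragraph [0012].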

[0011] According to this second prior-art voice recognition system, when voices are input from multiple channels simultaneously, priorities are determined and the voice from each channel is input in a time-division manner, so only one voice recognition device 23 needs to be provided.

[0012]

[Problem to Be Solved by the Invention] In a conventional voice recognition device based on this second prior-art voice recognition system, when voices are uttered into multiple microphones simultaneously, priorities must be assigned to the voice input persons and they must be asked to speak again. The voice input persons' work efficiency therefore suffers from waiting for input and from having to repeat utterances.

[0013]

[Means for Solving the Problem] A voice recognition device according to the present invention comprises: a plurality of voice analysis storage units, each having input voice analysis means that analyzes a voice word, one utterance at a time, input via a microphone and converts it into a feature vector sequence, and feature vector storage means that stores the feature vector sequence from the input voice analysis means; feature vector sequence monitoring and switching means that monitors the feature vector storage means of each of the plurality of voice analysis storage units and sequentially selects and switches to the output of a feature vector storage means in which a valid feature vector sequence is stored; and recognition means that calculates the similarity between the valid feature vector sequence, input via the feature vector sequence monitoring and switching means from the feature vector storage means of the plurality of voice analysis storage units, and standard patterns of voice words stored in advance as a recognition dictionary, and takes the voice word of the standard pattern with the maximum similarity as the recognition result for the input voice.

[0014] Further, the voice recognition device according to the present invention is characterized in that, when the input voice analysis means finishes analyzing the voice word of one utterance, it appends to the feature vector sequence additional information indicating that the analysis of that utterance's voice word has finished.

[0015] Further, the voice recognition device according to the present invention is characterized in that the feature vector sequence is determined to be valid at the time when each of the plurality of input voice analysis means finishes analyzing the voice word of one utterance.

[0016] Furthermore, the voice recognition device according to the present invention is characterized in that the feature vector sequence is determined to be valid at the time when each of the plurality of feature vector storage means has finished storing in memory the feature vector sequence of the voice word of one utterance.

[0017]

[Embodiment] Next, the present invention is described with reference to the drawings. Referring to FIG. 1, which shows one embodiment of the present invention, the voice recognition device comprises: microphones 1-1 to 1-N, installed in a number corresponding to the number of voice input persons, each of which converts the uttered voice from its voice input person into an electric voice signal; input voice analysis units 2-1 to 2-N, which analyze the voice signals input from the microphones 1-1 to 1-N, one uttered voice word at a time, and convert them into feature vector sequences; feature vector sequence storage units 3-1 to 3-N, which store the feature vector sequences input from the input voice analysis units 2-1 to 2-N in units of one uttered voice word; a feature vector sequence monitoring and switching unit 4, which sequentially selects and switches to a storage unit holding a valid feature vector sequence among the feature vector sequences stored in the feature vector sequence storage units 3-1 to 3-N; a recognition dictionary unit 5, which stores standard voice patterns of a plurality of voice words; and a recognition unit 6, which performs pattern matching between the valid feature vector sequence from the feature vector sequence monitoring and switching unit 4 and the standard voice patterns from the recognition dictionary unit 5 and takes the voice word with the maximum pattern similarity as the recognition result for the input voice.

[0018] Next, the operation is described. First, the case in which voice is input from only one microphone is described. When a user X utters the word to be recognized into the microphone 1-1, the microphone 1-1 converts the uttered voice into an electric voice signal and outputs it to the input voice analysis unit 2-1. The input voice analysis unit 2-1 analyzes the input voice signal in units of one uttered voice word, converts it into a feature vector sequence, and outputs it to the feature vector sequence storage unit 3-1. The feature vector sequence storage unit 3-1 stores the converted feature vector sequence in memory in units of one uttered voice word. The feature vector sequence monitoring and switching unit 4 constantly monitors each of the feature vector sequence storage units 3-1 to 3-N; when, for example, the feature vector sequence stored in the feature vector sequence storage unit 3-1 becomes valid, it selects that output and switches so as to connect it to the recognition unit 6. When the recognition unit 6 is connected to the output of the feature vector sequence storage unit 3-1 via the feature vector sequence monitoring and switching unit 4, it calculates the similarity between the feature vector sequence stored in that memory and the standard patterns of voice words stored in advance in the recognition dictionary unit 5, and takes the voice word of the standard pattern with the maximum pattern similarity as the recognition result for the input voice.

[0019] Next, the case in which voices are input from multiple microphones simultaneously is described. The voices input simultaneously from the microphones 1-1 to 1-N are converted into respective feature vector sequences by the input voice analysis units 2-1 to 2-N and stored in the memories of the feature vector sequence storage units 3-1 to 3-N in units of one uttered voice word. The feature vector sequence monitoring and switching unit 4 constantly monitors each of the feature vector sequence storage units 3-1 to 3-N and, according to a predetermined priority order, sequentially selects and switches to the storage units holding valid feature vector sequences so as to connect them to the recognition unit 6. When the recognition unit 6 is connected to any of the feature vector sequence storage units 3-1 to 3-N via the feature vector sequence monitoring and switching unit 4, it performs the same recognition as in the single-microphone case described above and, when the recognition processing ends, outputs a signal notifying the end of recognition to the feature vector sequence monitoring and switching unit 4. On receiving from the recognition unit 6 the notification that the first valid feature vector sequence has been recognized, the feature vector sequence monitoring and switching unit 4 selects and switches, according to the next priority, so that the storage unit holding the next valid feature vector sequence is connected to the recognition unit 6. The same operation is repeated until the voice inputs from all the microphones have been recognized.
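The select-and-switch cycle of paragraph [0019] can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed circuit: `StorageUnit` and `monitor_and_switch` are hypothetical names, a lower list index is assumed to mean a higher predetermined priority, and the return of the `recognize` callback stands in for the recognition-end notification sent from the recognition unit 6.

```python
from typing import Any, Callable, List, Optional, Tuple

FeatureSeq = List[List[float]]  # one feature vector per analysis frame


class StorageUnit:
    """Stands in for one feature vector sequence storage unit 3-k (hypothetical)."""

    def __init__(self) -> None:
        self.sequence: Optional[FeatureSeq] = None  # None: no valid sequence stored

    def store(self, seq: FeatureSeq) -> None:
        """Store the complete feature vector sequence of one utterance."""
        self.sequence = seq

    def take(self) -> FeatureSeq:
        """Hand the stored sequence to the recognizer and clear the unit."""
        seq, self.sequence = self.sequence, None
        return seq


def monitor_and_switch(
    units: List[StorageUnit],
    recognize: Callable[[FeatureSeq], Any],
) -> List[Tuple[int, Any]]:
    """Serve every unit that holds a valid sequence, highest priority first.

    The units are re-scanned after each recognition completes, mirroring the
    switch to the next valid unit on the recognition-end notification.
    """
    results = []
    while True:
        ready = [k for k, unit in enumerate(units) if unit.sequence is not None]
        if not ready:
            return results
        k = ready[0]  # lowest index = highest priority (assumption)
        results.append((k, recognize(units[k].take())))
```

Because each utterance is buffered as a feature vector sequence before recognition, simultaneous speakers are served one after another from memory and none of them needs to speak again.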

[0020] Thus, when the feature vector sequence monitoring and switching unit 4, in the course of monitoring the feature vector sequence storage units 3-1 to 3-N, determines that the feature vector sequences of two or more storage units are valid at the same time, it selects the storage unit with the higher priority, as predetermined within the system, and switches so as to connect it to the recognition unit 6.

[0021] Here, the criterion by which the feature vector sequence monitoring and switching unit 4 determines which of the feature vector sequence storage units 3-1 to 3-N has become valid is, for example, the point at which it is confirmed that an input voice analysis unit has finished analyzing the voice word of one utterance and that the feature vector sequence of that analyzed voice word has been stored in the memory of the feature vector sequence storage unit.

[0022] As a method of confirmation, for example, when an input voice analysis unit finishes analyzing the voice word of one utterance, it appends to the feature vector sequence additional information indicating that analysis of that utterance's voice word has finished, and supplies the sequence to the feature vector sequence storage unit, which stores it in memory. From the additional information of the feature vector sequences stored in each of the feature vector sequence storage units 3-1 to 3-N, the feature vector sequence monitoring and switching unit 4 determines whether the feature vector sequence of each uttered voice word is valid or invalid, and selects and switches so that the storage unit whose feature vector sequence was determined to be valid earliest in time is connected to the recognition unit 6. When the feature vector sequences of multiple storage units are determined to be valid at almost the same time, the connection between the relevant storage units and the recognition unit 6 is switched according to the predetermined priority order, as described above. Consequently, unlike the second prior-art voice recognition system, it does not happen that, when multiple voices overlap on the time axis, only one voice can be recognized while the others are discarded and their speakers are asked to utter again.
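The validity check of paragraph [0022] can be sketched in Python as follows. The sketch is illustrative only: `StoredSequence`, `mark_end`, `next_unit`, and the `tolerance` parameter are hypothetical, the completion flag plays the role of the appended additional information, and a lower index is assumed to mean a higher predetermined priority.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoredSequence:
    """Contents of one storage unit 3-k (hypothetical record type)."""
    vectors: List[List[float]] = field(default_factory=list)
    complete: bool = False                 # the appended "analysis finished" information
    completed_at: Optional[float] = None   # time at which the flag was set

    def append(self, vector: List[float]) -> None:
        """Add one frame while the utterance is still being analyzed."""
        self.vectors.append(vector)

    def mark_end(self, now: float) -> None:
        """Called when analysis of one utterance finishes."""
        self.complete = True
        self.completed_at = now


def next_unit(stored: List[StoredSequence], tolerance: float = 0.0) -> Optional[int]:
    """Return the index of the unit to connect to the recognizer next, or None.

    The earliest completed sequence wins. Sequences completed within
    `tolerance` of the earliest are treated as simultaneous, and the lowest
    index (highest priority, by assumption) is chosen among them.
    """
    valid = [(s.completed_at, k) for k, s in enumerate(stored) if s.complete]
    if not valid:
        return None
    earliest = min(t for t, _ in valid)
    return min(k for t, k in valid if t - earliest <= tolerance)
```

A sequence that is still being written has no flag set and is therefore never handed to the recognizer half-finished.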

[0023] Next, the recognition unit 6 calculates the similarity between the valid feature vector sequence from each of the feature vector sequence storage units 3-1 to 3-N, connected via the feature vector sequence monitoring and switching unit 4, and the standard patterns stored in advance in the recognition dictionary unit 5. The similarity is calculated by a pattern matching method that uses dynamic programming to absorb variation along the time axis and obtains a distance value between the feature vector sequence (input spectrum time series) and a standard pattern (standard spectrum sequence). A standard spectrum sequence is stored in the recognition dictionary for each word to be recognized; distance values are calculated for all words, and the word whose standard spectrum sequence gives the minimum distance value is taken as the recognition result, being the word of the standard pattern with the maximum similarity. This method of calculating similarity is well known in the field of pattern recognition, so further description is omitted.
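The dynamic-programming matching of paragraph [0023] is commonly realized as dynamic time warping (DTW). The Python sketch below is one minimal variant under stated assumptions: Euclidean distance between frames and unconstrained symmetric steps are assumed, since the publication does not specify the local distance or path constraints, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq: Sequence[Vector], ref: Sequence[Vector]) -> float:
    """Accumulated distance along the best time alignment of seq and ref."""
    n, m = len(seq), len(ref)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq[i - 1], ref[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(seq: Sequence[Vector], dictionary: Dict[str, Sequence[Vector]]) -> str:
    """Return the dictionary word whose standard pattern has the minimum distance."""
    return min(dictionary, key=lambda word: dtw_distance(seq, dictionary[word]))
```

Minimum distance corresponds to maximum similarity, so `recognize` selects the same word that paragraph [0023] describes as the recognition result.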

[0024]

[Effect of the Invention] As described above, according to the present invention, even when the utterances input simultaneously to a plurality of microphones overlap on the time axis, all the input voices can be recognized by a single voice recognition device, without resorting to recognizing only one voice, discarding the others, and notifying the speakers of the discarded voices of a request to utter again.

[Brief description of drawings]

FIG. 1 is a block diagram showing a voice recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing a first conventional example.

FIG. 3 is a block diagram showing a second conventional example.

[Explanation of symbols]

1-1 to 1-N: Microphones
2-1 to 2-N: Input voice analysis units
3-1 to 3-N: Feature vector sequence storage units
4: Feature vector sequence monitoring and switching unit
5: Recognition dictionary unit
6: Recognition unit

Continuation of the front page: (72) Inventor Yasuo Tomooka, c/o NEC Corporation, 5-7-1 Shiba, Minato-ku, Tokyo

Claims

[Claims]

1. A voice recognition device comprising: a plurality of voice analysis storage units, each having input voice analysis means that analyzes a voice word, one utterance at a time, input via a microphone and converts it into a feature vector sequence, and feature vector storage means that stores the feature vector sequence from the input voice analysis means; feature vector sequence monitoring and switching means that monitors the feature vector storage means of each of the plurality of voice analysis storage units and sequentially selects and switches to the output of a feature vector storage means in which a valid feature vector sequence is stored; and recognition means that calculates the similarity between the valid feature vector sequence, input via the feature vector sequence monitoring and switching means from the feature vector storage means of the plurality of voice analysis storage units, and standard patterns of voice words stored in advance as a recognition dictionary, and takes the voice word of the standard pattern with the maximum similarity as the recognition result for the input voice.

2. The voice recognition device according to claim 1, wherein, when the input voice analysis means finishes analyzing the voice word of one utterance, it appends to the feature vector sequence additional information indicating that the analysis of that utterance's voice word has finished.

3. The voice recognition device, wherein the feature vector sequence is determined to be valid at the time when each of the plurality of input voice analysis means finishes analyzing the voice word of one utterance.

4. The voice recognition device according to claim 1, wherein the feature vector sequence is determined to be valid at the time when each of the plurality of feature vector storage means has finished storing in memory the feature vector sequence of the voice word of one utterance.