JPH01216398A

JPH01216398A - Speech recognizing system

Info

Publication number: JPH01216398A
Application number: JP63040626A
Authority: JP
Inventors: Kensuke Uehara; 上原　堅助; Yasuyuki Masai; 康之正井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-02-25
Filing date: 1988-02-25
Publication date: 1989-08-30

Abstract

PURPOSE: To allow multiple speakers to input speech through a single speech recognition device by assigning priorities when speech arrives simultaneously on several channels and accepting the speech from each channel in a time-division manner. CONSTITUTION: An interface circuit takes the logical OR of the speech input from the voice input devices 1₁ to 1ₙ of the channels and feeds it to a speech recognition device 2, while voice output period detection devices 5₁ to 5ₙ detect the voice output periods of the voice input devices 1₁ to 1ₙ. A voice input channel determining means identifies the input channel from the detection signals. When speech is input simultaneously on multiple channels, a voice input control means outputs, to the relevant response devices 4₁ to 4ₙ, a message instructing the speakers to input their speech to the speech recognition device 2 in order, starting from a prescribed channel, so that the simultaneously input speech is time-divided and fed to the speech recognition device 2. In this way, multiple speakers can input speech to a computer 3 or the like through a single speech recognition device 2.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Field of Application) The present invention relates to a speech recognition system in which multiple speech inputs are recognized and processed by a speaker-independent word recognition device.

(Prior Art) Systems in which events arising in the course of work are input by voice into a computer or the like and processed have become common at industrial sites and similar settings where multiple workers perform similar tasks. In such systems, the content to be input by voice is represented in advance by simple words. The spoken word input by each worker is recognized by a speech recognition device provided for that worker, and the recognition result is sent to a host computer. The host computer processes information based on the recognition results it receives, sends replies to the workers as appropriate, and thereby advances the series of tasks. The host computer can reply to a worker by voice through a response device, or by showing the reply on a display device such as a CRT.

FIG. 6 is a block diagram showing an example of a conventional voice input response system. The speech uttered by each worker is input from microphones 1₁ to 1ₙ to the corresponding speech recognition devices 2₁ to 2ₙ. The results of recognition by the speech recognition devices 2₁ to 2ₙ are sent to the host computer 3. The host computer 3 identifies which speech recognition device each recognition result came from, performs the processing corresponding to the result, and displays the outcome as a reply on the corresponding CRT. Thus, when speech is input from the microphone 1₁, the reply is displayed on the CRT 4₁, and the worker proceeds with the work after reading it.

However, depending on the work handled by such a conventional system, events may occur only infrequently. Even with multiple speech recognition devices installed, the probability of simultaneous voice input is then extremely low, and two or more devices are almost never in operation at the same time. In such cases the utilization of the speech recognition devices is extremely low, and installing several expensive devices is very uneconomical.

(Problems to be Solved by the Invention) In the conventional speech recognition system described above, as many speech recognition devices as there are speakers are required even when voice input is infrequent. As a result, each device is poorly utilized, which is highly uneconomical. The present invention eliminates this drawback, and its object is to provide a speech recognition system in which multiple speakers can input speech to a computer or the like through a single speech recognition device.

[Structure of the Invention] (Means for Solving the Problems) The present invention applies to a system that has multiple channels, each consisting of a voice input device and a response device, and a computer that has a speech recognition device recognize the speech input from each channel and sends a response based on the recognition result to the response device of the channel on which the speech was input. In the invention, a single speech recognition device is used, and the system comprises: an interface circuit that takes the logical OR of the speech input from the voice input devices of the channels and feeds it to the speech recognition device; a plurality of voice output period detection devices, each corresponding one-to-one to a voice input device and detecting the voice output period of that device; a voice input channel determining means that identifies the input channel from the detection signals supplied by the voice output period detection devices and determines whether speech has been input on multiple channels simultaneously; and a voice input control means that, when speech has been input on multiple channels simultaneously, outputs to the relevant response devices a message instructing the speakers to input their speech to the speech recognition device in order, starting from a prescribed channel, so that the simultaneously input speech is time-divided and fed to the speech recognition device.

(Operation) In the speech recognition system of the present invention, the interface circuit takes the logical OR of the speech input from the voice input devices of the channels and feeds it to the speech recognition device. At the same time, each voice output period detection device detects the voice output period of its corresponding voice input device and reports it to the voice input channel determining means of the computer.

The voice input channel determining means identifies the input channel from the detection signals supplied by the voice output period detection devices, determines whether speech has been input on multiple channels simultaneously, and reports the result to the voice input control means. When speech has been input on multiple channels simultaneously, the voice input control means outputs to the relevant response devices a message instructing the speakers to input their speech to the speech recognition device in order, starting from a prescribed channel, so that the simultaneously input speech is time-divided and fed to the speech recognition device.

(Embodiment) An embodiment of the present invention will now be described with reference to the drawings, in which parts identical to those of the conventional example carry the same reference numerals. FIG. 1 is a block diagram showing an embodiment of a voice input response system to which the speech recognition system of the present invention is applied.

Reference numerals 1₁ to 1ₙ denote microphones, installed one per speaker, that convert each speaker's voice into a speech signal. Numeral 2 denotes a speech recognition device that recognizes the input speech signal (converting it into a corresponding code or the like) and outputs the result to the host computer 3. Numeral 3 denotes the host computer, which performs processing based on the recognition result received from the speech recognition device and displays the result on the corresponding CRT. Numerals 4₁ to 4ₙ denote CRTs that display the responses of the host computer 3 to voice input. Numerals 5₁ to 5ₙ denote voice output period detection devices, each of which detects the output period of the speech input from its corresponding microphone and supplies the detection result to the host computer 3. Numeral 6 denotes an OR circuit that takes the logical OR of the speech signals output from the microphones 1₁ to 1ₙ and feeds it to the speech recognition device 2.

Next, the operation of this embodiment will be described. First, consider the case where voice input arrives on only one channel. Suppose that an event occurs during the work and a worker utters the word corresponding to that event into the microphone 1₁. The microphone 1₁ converts the captured sound into a speech signal and outputs it to the speech recognition device 2 via the OR circuit 6. The speech recognition device 2 recognizes the input speech signal, converts it into a code or the like that the computer can interpret, and outputs this code to the host computer 3 as the recognition result. Meanwhile, the speech signal output from the microphone 1₁ is also input to the voice output period detection device 5₁, which detects the voice output period.

FIG. 2 is a block diagram showing a detailed example of the voice output period detection device. The speech signal 100 input from the microphone 1₁, shown in FIG. 3(A), is half-wave rectified by a rectifier circuit 51 to give the signal 200 shown in FIG. 3(B). Passing the signal 200 through a peak hold circuit 52 extracts the envelope information 300 of the speech waveform shown in FIG. 3(C). The envelope information 300 is compared with a reference voltage V in a comparator circuit 53, which outputs a logic "1" signal 500 during intervals in which its input voltage exceeds the reference voltage V. If the reference voltage V is chosen appropriately, high-level speech intervals can be distinguished from low-level noise intervals, and intervals in which the input voltage exceeds V can be regarded as speech intervals. The output 500 of the comparator circuit 53 therefore appears as in (B) of FIGS. 4 and 5, with "1" for speech intervals and "0" for noise intervals. The output 500 is ANDed with a clock 400 in an AND circuit 54 and then input to a retriggerable one-shot multivibrator 55. When a trigger pulse is input, the retriggerable one-shot multivibrator 55 generates a pulse 600 at its output lasting for a duration determined by the time constant CR. If a further trigger pulse arrives within that duration, the output remains at "1".
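The front stage of the detection device (rectifier, peak hold, comparator) can be illustrated in software. The following is a minimal sketch under stated assumptions, not the patent's circuit: the sample values, the per-sample `decay` factor that stands in for the peak hold circuit's discharge, and the threshold `v_ref` are all hypothetical.

```python
def half_wave_rectify(samples):
    """Keep positive half-cycles and zero the negative ones (signal 200)."""
    return [s if s > 0.0 else 0.0 for s in samples]


def peak_hold(rectified, decay=0.9):
    """Track peaks and let the held value decay between them (envelope 300).

    `decay` is an assumed per-sample factor, not a value from the source.
    """
    envelope, held = [], 0.0
    for s in rectified:
        held = max(s, held * decay)
        envelope.append(held)
    return envelope


def compare(envelope, v_ref):
    """Comparator: 1 where the envelope exceeds v_ref, else 0 (signal 500)."""
    return [1 if e > v_ref else 0 for e in envelope]


def detect_speech(samples, v_ref=0.3, decay=0.9):
    """Chain the three stages: rectify, peak hold, compare."""
    return compare(peak_hold(half_wave_rectify(samples), decay), v_ref)
```

With low-level noise followed by a louder burst, the comparator output stays at 0 during the noise and switches to 1 once the envelope crosses the threshold.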

When the input trigger pulses stop, the output returns to "0" once that duration has elapsed after the last trigger pulse. Accordingly, if the input signal 500 and the clock 400 are as shown in (B) and (C) of FIGS. 4 and 5, the output 600 of the retriggerable one-shot multivibrator 55 is as shown in (D) of FIGS. 4 and 5. That is, even if the input speech contains a dip that falls below the reference voltage V, as shown in FIG. 5(A), the dip has no effect on the output provided it is shorter than that duration. The voice output period detection devices 5₁ to 5ₙ can therefore detect the interval of speech stably even when the input speech fluctuates widely. This assumes that the period of the clock 400 is sufficiently shorter than the output pulse width. The output signal 600 of the voice output period detection device 5₁ is input to the host computer 3. By identifying which voice output period detection device supplied the output signal 600, the host computer 3 learns the input channel, in this case that speech was input from the microphone 1₁. The host computer 3 then performs processing based on the recognition result from the speech recognition device 2, sends the result as a reply to the CRT 4₁ of the input channel, and has the CRT 4₁ display a response message such as the next work instruction.
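The way the retriggerable one-shot multivibrator bridges short dips can be modelled tick by tick. This is a sketch under assumptions: time is quantized to one entry per clock period, and `hold_ticks` is a hypothetical stand-in for the duration set by the time constant CR.

```python
def retriggerable_one_shot(comparator_out, hold_ticks):
    """Model output 600 from comparator output 500, one entry per clock tick.

    Each tick where the comparator is 1 acts as a trigger pulse (the AND of
    signal 500 with clock 400). The output stays 1 for `hold_ticks` ticks
    counted from the most recent trigger, so dips shorter than that are
    bridged.
    """
    out, remaining = [], 0
    for level in comparator_out:
        if level:
            remaining = hold_ticks  # retrigger: restart the timing interval
        out.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
    return out
```

In this model a two-tick dip with `hold_ticks=3` leaves the output continuously at 1, whereas a dip of three or more ticks lets the output fall to 0 before the next trigger.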

While a channel is inputting speech in this way, the host computer 3 can determine the number of that channel from the voice output period detection signal 600 output by the corresponding voice output period detection device. It therefore sends a message A, indicating that the speech recognition device 2 is in use, to the CRTs of the other channels and has them display it.
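The single-channel case can be sketched as a small routing function. This is illustrative only: `handle` is a hypothetical callback that turns a recognized code into a reply, the message wording is a paraphrase, and the sketch collapses into one step what the host computer does at two moments (busy notice during input, reply after processing).

```python
BUSY_MESSAGE = "The speech recognition device is busy. Please wait before speaking."


def active_channels(detection_signals):
    """Return indices of channels whose detection signal 600 is currently 1."""
    return [ch for ch, level in enumerate(detection_signals) if level]


def route_single_input(detection_signals, recognition_result, handle):
    """Route one recognition result, assuming exactly one channel is active.

    Returns {channel index: text to show on that channel's CRT}. The speaking
    channel gets the reply; every other channel gets the busy notice.
    """
    active = active_channels(detection_signals)
    if len(active) != 1:
        raise ValueError("expected exactly one active channel")
    speaker = active[0]
    display = {ch: BUSY_MESSAGE for ch in range(len(detection_signals))}
    display[speaker] = handle(recognition_result)
    return display
```

Because the OR circuit merges all microphones into one signal, the detection signals are the only information that tells the host which CRT should receive the reply.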

Message A may read, for example, "The speech recognition device is busy. Please wait a moment before speaking." When the host computer 3 later finishes processing the input from that channel, it sends a message B such as the following to the CRTs of all channels: "The speech recognition device is ready. Voice input is possible."

Next, when multiple channels input speech simultaneously, the output of the OR circuit 6 is a superposition of several voices, and this superimposed speech is input to the speech recognition device 2. Even if the speech recognition device 2 attempts to recognize it, the recognition result is not reliable.

In this case, however, temporally overlapping voice output period detection signals 600 are input to the host computer 3 from the voice output period detection devices of multiple channels. The host computer 3 thereby recognizes that speech was input on multiple channels simultaneously, discards the recognition result supplied by the speech recognition device 2, and assigns a priority order to the channels that spoke simultaneously.
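Detecting simultaneous input amounts to checking whether the detected speech periods of different channels intersect. The sketch below is one possible way to express that check. The `(start, end)` representation of a detection period and the rule of ranking collided channels by ascending channel number are assumptions; the source states only that a priority order is decided.

```python
def overlapping_channels(periods):
    """Return sorted channels whose detected speech periods overlap another's.

    `periods` maps channel number -> (start, end) of its detection signal 600,
    in arbitrary time units with start < end.
    """
    involved = set()
    items = sorted(periods.items())
    for i, (ch_a, (start_a, end_a)) in enumerate(items):
        for ch_b, (start_b, end_b) in items[i + 1:]:
            if start_a < end_b and start_b < end_a:  # intervals intersect
                involved.update((ch_a, ch_b))
    return sorted(involved)


def arbitrate(periods, recognition_result):
    """Discard the result on a collision and return a priority order.

    Returns (result, []) when no periods overlap, or (None, order) when they
    do, where `order` lists the collided channels by ascending number
    (an assumed priority rule).
    """
    collided = overlapping_channels(periods)
    if collided:
        return None, collided  # the mixed speech is unreliable, so drop it
    return recognition_result, []
```

For example, periods of (0.0, 1.0) on channel 1 and (0.5, 1.5) on channel 2 intersect, so the recognition result is dropped and both channels are queued for re-input.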

Next, the host computer 3 sends the following message C to the CRT of the highest-priority channel and has it displayed: "Please repeat what you just said."

At the same time, the host computer 3 sends the same message A as before to the CRTs of the channels with other priorities, namely "The speech recognition device is busy. Please wait a moment before speaking." When voice input on the highest-priority channel is complete, the host computer 3 sends message C to the channel with the second-highest priority and message A to the channels with other priorities. The host computer 3 repeats this operation so that all channels whose input overlapped input their speech in a time-division manner, obtains valid recognition results from the speech recognition device 2, and proceeds through the voice input processing in turn. The messages about the usage status of the speech recognition device 2 may be displayed in a corner of the CRT screen, with instructions such as work details displayed large in the center of the screen.
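The time-division re-input procedure can be sketched as a generator that produces, for each turn, the message every collided channel should display. The message texts are paraphrased, and the function name and data shapes are hypothetical; waiting for each speaker to finish before advancing to the next turn is left to the caller.

```python
MESSAGE_A = "The speech recognition device is busy. Please wait before speaking."
MESSAGE_C = "Please repeat what you just said."


def time_division_prompts(priority_order):
    """Yield, per turn, the message each collided channel should display.

    `priority_order` lists the channels that spoke simultaneously, highest
    priority first. In each turn one channel is prompted with message C and
    all other collided channels are shown message A.
    """
    for speaker in priority_order:
        prompts = {ch: MESSAGE_A for ch in priority_order}
        prompts[speaker] = MESSAGE_C
        yield prompts
```

A caller would take one turn from the generator, show the prompts, wait for the prompted channel's detection signal to rise and fall, read the recognition result, and then move on to the next turn.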

According to this embodiment, when speech is input on multiple channels simultaneously, a priority order is decided and the speech from each channel is input in a time-division manner. A single speech recognition device 2 therefore suffices, and a multi-channel voice input system can be built at very low cost.

Furthermore, where voice input occurs infrequently, speech is rarely input simultaneously, so a practical voice input processing speed can be maintained.

[Effects of the Invention] As described above, the speech recognition system of the present invention has the effect that multiple speakers can input speech to a computer or the like through a single speech recognition device.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of a voice input response system to which the speech recognition system of the present invention is applied. FIG. 2 is a block diagram showing a detailed example of the voice output period detection device shown in FIG. 1. FIG. 3 is an operating waveform diagram of the front stage of the voice output period detection device shown in FIG. 2. FIGS. 4 and 5 are operating waveform diagrams of the rear stage of the voice output period detection device shown in FIG. 2. FIG. 6 is a block diagram showing an example of a conventional voice input response system.

1₁ to 1ₙ: microphones
2: speech recognition device
3: host computer
4₁ to 4ₙ: CRTs
5₁ to 5ₙ: voice output period detection devices
6: OR circuit

Agent: Patent attorney 則近 憲佑

Claims

[Claims]

In a system having a plurality of channels, each consisting of a voice input device and a response device, and a computer that has a speech recognition device recognize the speech input from each channel and sends a response based on the recognition result to the response device of the channel on which the speech was input, a speech recognition system characterized in that a single speech recognition device is used, and comprising: an interface circuit that takes the logical OR of the speech input from the voice input devices of the channels and feeds it to the speech recognition device; a plurality of voice output period detection devices, each corresponding one-to-one to a voice input device and detecting the voice output period of that device; a voice input channel determining means that identifies the input channel from the detection signals supplied by the voice output period detection devices and determines whether speech has been input on multiple channels simultaneously; and a voice input control means that, when speech has been input on multiple channels simultaneously, outputs to the relevant response devices a message instructing the speakers to input their speech to the speech recognition device in order, starting from a prescribed channel, so that the simultaneously input speech is time-divided and fed to the speech recognition device.