JP2000106603A

JP2000106603A - Device and method for voice recognition and response, and voice guidance method

Info

Publication number: JP2000106603A
Application number: JP10274547A
Authority: JP
Inventors: Toshiyuki Matsuda; 俊幸松田; Hitoshi Sato; 均佐藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-09-29
Filing date: 1998-09-29
Publication date: 2000-04-11

Abstract

PROBLEM TO BE SOLVED: To allow a single voice recognition unit to recognize voices received over a plurality of lines. Voice guidance is output to each calling terminal, and the voice that each terminal inputs in response is recognized. The output time of the guidance sent to each calling terminal is controlled according to the state of the dialogue with the other calling terminals, and voices received from the plurality of calling terminals are recognized in a time-division manner. SOLUTION: When a call arrives on telephone line 1, the voice recognition response device outputs guidance voice 1501 to the caller. It likewise outputs guidance voice 1506 to the caller on telephone line 2. For telephone line 2, the device predicts when the caller on telephone line 1 will speak and selects the line 2 guidance voice so that the two callers' utterances do not overlap. If the caller on line 2 nevertheless starts speaking after hearing only part of the guidance, the control unit temporarily stores that caller's voice in a memory, and recognition processing 2 (1513) is performed after recognition processing 1 (1512) ends.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice recognition response device, a voice recognition response method, and a voice guidance method, used in wired and wireless communications, that transmit voice guidance to a remote party connected via a communication line and recognize the voice received from that party.

[0002]

2. Description of the Related Art

One conventional technique for recognizing voice received over a telephone line and providing a guidance service is described in "Large-Scale Extension Telephone Reception System" (Naito et al., Proceedings of the Acoustical Society of Japan, 3-p-27, pp. 215-216, March 1995). In this prior art, one extension number guidance system is connected to each telephone line. The extension number guidance system has a voice recognition unit that recognizes the received voice and a discourse management unit that manages the progress of the dialogue between the system and the caller and outputs voice guidance. Therefore, to serve multiple lines with this conventional extension number guidance system, as many extension number guidance systems as lines must be prepared, so the installation also contains as many voice recognition units and discourse management units as there are lines.

[0003] One purpose of such systems is to provide, without human operators, services such as information guidance and automatic switching by a private (in-company) exchange (hereinafter PBX: Private Branch eXchange). In unattended information guidance and automatic switching services, the dialogue takes the form of the caller answering questions posed by the system's voice guidance device, so that callers are not confused about how to operate the service, and the service the caller requests is then provided. The dialogue between the caller and the guidance voice output by the voice guidance device in such systems is known to show the following characteristic tendencies.

[0004] While the voice guidance device is asking a question, the caller listens without speaking.

[0005] The time the voice guidance device spends asking a question is far longer than the time the caller spends answering.

[0006] The voice recognition unit of the above system operates only while the caller is speaking, and waits or stops while the voice guidance device is asking the caller a question. The voice recognition unit is therefore idle (waiting or stopped) most of the time.

[0007]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the above prior art, the voice recognition unit in the extension telephone reception system is waiting or stopped most of the time, so its processing efficiency is very low. In addition, because an extension telephone reception system is provided for each line, serving many lines requires as many systems as there are lines. When many lines are served, the equipment therefore becomes large and installation costs rise.

[0008] The present invention has been made to solve the above problems. Its object is to provide a voice recognition response device, a voice recognition response method, and a voice guidance method that allow a single voice recognition unit to recognize voices received from a plurality of telephone lines, so that a small number of voice recognition units can be used efficiently.

[0009]

MEANS FOR SOLVING THE PROBLEMS

To solve the above problems, the present invention provides an interactive voice recognition response device that outputs voice guidance to a plurality of calling terminals connected via a communication line and recognizes the voice input from each calling terminal in response to that guidance. The device comprises control means for controlling the output time of the voice guidance sent to each calling terminal according to the state of the dialogue with the other calling terminals, and voice recognition means for recognizing, in a time-division manner, the voices received from the plurality of calling terminals.

[0010] More specifically, in the above configuration, the output time of the voice guidance is controlled by changing the wording of the guidance, so that the periods in which voice is received from the respective calling terminals do not overlap.

[0011] To solve the above problems, the present invention also provides a voice recognition response device that outputs voice guidance to a plurality of calling terminals connected via a communication line and recognizes the voice input from each calling terminal in response to the guidance. The device comprises: voice recognition means for recognizing voice received from a calling terminal; a memory that, when voice arrives from another calling terminal while the voice recognition means is performing recognition, stores the voice received from that other terminal; and memory reading means for reading the voice stored in the memory and sending it to the voice recognition means once the recognition processing in the voice recognition means has finished.

[0012]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention are described below with reference to the drawings. First, the configuration of the voice recognition response device of the present invention, and an example configuration of a system that includes it, are described with reference to FIGS. 1 to 3. FIG. 1 shows the configuration of the voice recognition response device of the present invention. FIG. 2 shows the configuration of its voice input/output unit. FIG. 3 shows the configuration of a system that uses the device to provide an automatic extension switching service. In the system shown in FIG. 3, when a caller calls the switchboard that provides the automatic extension switching service from telephone 301 or 302, the call is connected to voice recognition response device 307 via public network 303 and PBX 304. The voice recognition response device 307 outputs guidance voice to the caller. When the caller, following that guidance, says the department or the name of the person to be connected to, the voice recognition response device 307 recognizes the department or name, searches data prepared in advance within the device for the extension number corresponding to the recognition result, and notifies the control unit 308 of PBX 304 of that extension number. PBX 304 then transfers the caller's call to the extension received from the voice recognition response device 307. One way to perform this transfer is for the voice recognition response device 307 to automatically carry out the operation that a person normally performs by hand when transferring a call in progress: hooking (a hook flash) followed by dialing the extension number.

[0013] Next, the detailed configuration of the voice recognition response device 307 is described with reference to FIG. 1, which shows the configuration when two lines are connected to one voice recognition response device. The device includes: line interfaces 101 and 102 (hereinafter, line I/Fs) for connecting a plurality of telephone lines; voice input/output units 103 and 104 that process the voice input to and output from each line I/F; an input selection unit 105 that selects among the voice signals input from the voice input/output units; an output selection unit 106 that selects one of the voice input/output units as the output destination for voice; a memory 108 that stores input voice and a memory control unit 114 that controls reading from and writing to the memory 108; a voice recognition unit 107 that recognizes the voice selected by the input selection unit 105 or the voice stored in the memory 108; a voice data unit 109 that stores guidance voices and a voice data control unit 115 that controls the output of guidance voices from the voice data unit; a monitoring unit 110 that monitors the processing status of the voice recognition unit 107 and the voice guidance output status; and a control unit 111 that controls each unit and also controls the reading of recognition data from a recognition data unit 113, which stores the recognition data used by the voice recognition unit 107, and of prediction data from a prediction data unit 112, which stores data for predicting when callers will speak. Although FIG. 1 shows two lines, more lines can be accommodated by increasing the number of voice recognition units 107 and memories 108 according to the number of lines. If the combined number of voice recognition units and memories equals the number of accommodated lines, every line can be served even if calls arrive on all lines at the same time. For example, 10 lines can be accommodated with five voice recognition units and five memories, or with three voice recognition units and seven memories. When the number of accommodated lines is increased, line I/Fs and voice input/output units are added to match the number of telephone lines.
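The sizing rule above (voice recognition units plus memories at least equal to the number of lines) can be checked with a small allocation sketch. It assumes, as a simplification the text does not state, that each memory buffers one caller's utterance at a time and that a simultaneously speaking line is served if it gets either a free recognition unit or a free memory. The function name and labels are hypothetical.

```python
from typing import Dict, List, Optional

def allocate(speaking_lines: List[int], recognizers: int,
             memories: int) -> Optional[Dict[int, str]]:
    """Assign each simultaneously speaking line to a recognizer or a memory.

    Hypothetical sketch: the first lines get voice recognition units, the
    rest are buffered in memories until a unit frees up. Returns None when
    some line can be neither recognized nor buffered.
    """
    if recognizers < 1:
        return None  # buffered voice still needs a unit to drain into
    if len(speaking_lines) > recognizers + memories:
        return None
    assignment: Dict[int, str] = {}
    for k, line in enumerate(speaking_lines):
        if k < recognizers:
            assignment[line] = f"recognizer{k + 1}"
        else:
            assignment[line] = f"memory{k - recognizers + 1}"
    return assignment

# Worst case from the text: calls arrive on all 10 lines at once.
all_lines = list(range(1, 11))
print(allocate(all_lines, recognizers=5, memories=5) is not None)  # True
print(allocate(all_lines, recognizers=3, memories=7) is not None)  # True
print(allocate(all_lines, recognizers=3, memories=6))              # None
```

Both configurations named in the text pass, and removing one memory from the 3 + 7 case leaves one line unserved.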

[0014] Next, the operation of each unit of the voice recognition response device of the present invention is described. Line I/F 101 and line I/F 102 connect telephone line 1 and telephone line 2, respectively, to the voice recognition response device. Specifically, each line I/F automatically answers an incoming call on its telephone line, notifies the control unit 111 of the incoming call, and connects or disconnects the line on instruction from the control unit 111. On instruction from the control unit 111, it also performs operations such as hooking and dialing.

[0015] The voice input/output units 103 and 104 convert the analog signals received from line I/Fs 101 and 102 into digital signals and send them to the input selection unit 105. At the same time, on instruction from the control unit 111, they detect voice in the converted digital signal and report the detection status to the control unit 111. The voice input/output units 103 and 104 also convert the guidance voice received from the output selection unit 106 from digital to analog and send it to line I/Fs 101 and 102.

[0016] On instruction from the control unit 111, the input selection unit 105 sends the digital voice signals from the voice input/output units 103 and 104 to either the voice recognition unit 107 or the memory 108. It also sends to the voice recognition unit 107 the digital voice signals stored in the memory 108 and read out by the memory control unit 114. It then reports the transmission status to the control unit 111.

[0017] The output selection unit 106 takes in the guidance voice that the voice data control unit 115 has read from the voice data unit 109 as designated by the control unit 111, selects the output destination, and sends the guidance voice to voice input/output unit 103 or 104.

[0018] The voice recognition unit 107 implements a speaker-independent word recognition method. Recognition proceeds as follows: the control unit 111 fetches the required recognition dictionary data from the recognition data unit 113 and loads it into the voice recognition unit 107; then, on instruction from the control unit 111, the voice recognition unit 107 recognizes the digital voice signal input from the input selection unit 105. When this processing is complete, the voice recognition unit 107 reports the recognition result to the control unit 111.

[0019] The memory 108 temporarily holds digital voice signals from the voice input/output units 103 and 104. Specifically, if a digital voice signal arrives from voice input/output unit 104 while the voice recognition unit 107 is recognizing the digital voice signal from voice input/output unit 103, the voice recognition unit 107 is busy and cannot process the signal from unit 104. The input selection unit 105 therefore selects the memory 108 and temporarily stores the digital voice signal from unit 104 there.
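The busy/buffer rule in [0011] and [0019] behaves like a small first-in, first-out scheduler in front of one recognition unit. The sketch below is a hypothetical illustration, not the patent's implementation: a `deque` stands in for memory 108, and recognition is stubbed out so that the "result" is simply the utterance text.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Job = Tuple[int, str]  # (line number, utterance)

class TimeDivisionRecognizer:
    """Hypothetical sketch: one recognition unit shared by several lines.

    An utterance that arrives while the unit is busy is queued in a FIFO
    (standing in for memory 108) and recognized after the current job ends.
    Recognition itself is stubbed out: the "result" is the utterance text.
    """

    def __init__(self) -> None:
        self.current: Optional[Job] = None   # job the unit is working on
        self.buffer: Deque[Job] = deque()    # utterances waiting in memory
        self.results: List[Job] = []

    def on_utterance(self, line: int, utterance: str) -> str:
        """Route new speech to the unit if it is free, otherwise to the buffer."""
        if self.current is None:
            self.current = (line, utterance)
            return "recognizing"
        self.buffer.append((line, utterance))
        return "buffered"

    def on_recognition_done(self) -> None:
        """Record the finished job, then start the oldest buffered one."""
        if self.current is None:
            return
        self.results.append(self.current)  # stub: a real unit returns words here
        self.current = self.buffer.popleft() if self.buffer else None

r = TimeDivisionRecognizer()
print(r.on_utterance(1, "sales"))       # recognizing
print(r.on_utterance(2, "accounting"))  # buffered
r.on_recognition_done()                 # line 1 done; line 2 read from buffer
r.on_recognition_done()                 # line 2 done
print(r.results)                        # [(1, 'sales'), (2, 'accounting')]
```

A FIFO keeps the waiting callers in arrival order. The patent only says that stored voice is read out after the current recognition ends, so ordering among several buffered lines is an assumption here.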

[0020] The voice data unit 109 stores various guidance voices for guiding callers. For each question it stores several guidance voices of different lengths.

[0021] The monitoring unit 110 monitors the operating states of the voice recognition unit 107 and the memory 108 and reports them to the control unit 111.

[0022] The control unit 111 controls line I/Fs 101 and 102, voice input/output units 103 and 104, the input selection unit 105, the output selection unit 106, the voice recognition unit 107, and the monitoring unit 110. While predicting when the callers on the various lines will speak, the control unit 111 chooses among the guidance voices stored in the voice data unit 109 that have the same content but different lengths. This staggers the utterance timing of the callers on the different lines so that their recognition processing does not overlap. The database used for prediction is held in the prediction data unit 112, and the recognition target dictionary data required for voice recognition is held in the recognition data unit 113.

[0023] FIG. 2 is a functional block diagram of the voice input/output units 103 and 104 of the voice recognition response device of the present invention. The voice input/output unit 103 consists of an analog/digital converter (A/D) 201, a digital/analog converter (D/A) 202, an echo canceller 203, and a voice detection unit 204. The A/D 201 converts the analog signal from the line I/F into a digital signal, and the D/A 202 converts the digital signal from the output selection unit 106 into an analog signal. When the line I/F is connected to a digital line, the A/D 201 and the D/A 202 are omitted.

[0024] The echo canceller 203 prevents the voice detection unit 204 from malfunctioning when the guidance voice from the output selection unit 106 is reflected back over the telephone line as an echo and re-enters the voice input/output unit 103.
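The text does not say how the echo canceller 203 works. A common technique for line echo cancellation is a normalized least-mean-squares (NLMS) adaptive filter. It learns the echo path from the outgoing guidance signal and subtracts the predicted echo from the incoming signal. The sketch below shows that generic technique, not the patent's design, and the echo path, tap count, and step size are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

def nlms_cancel(far: Sequence[float], mic: Sequence[float], taps: int = 8,
                mu: float = 0.5, eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Generic NLMS echo canceller; returns (residual signal, filter weights).

    far: outgoing guidance samples (the reference signal)
    mic: incoming line samples = echo of `far` (plus caller speech, if any)
    """
    w = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual: List[float] = []
    for x_n, d_n in zip(far, mic):
        history = [x_n] + history[:-1]
        y_hat = sum(wi * xi for wi, xi in zip(w, history))  # predicted echo
        e = d_n - y_hat                                      # echo removed
        norm = eps + sum(xi * xi for xi in history)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        residual.append(e)
    return residual, w

# Assumed echo path: the line reflects a delayed, attenuated copy of the guidance.
echo_path = [0.0, 0.5, -0.3, 0.1]
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far))]
residual, w = nlms_cancel(far, mic)
tail_energy = sum(e * e for e in residual[-200:])
echo_energy = sum(m * m for m in mic[-200:])
print(tail_energy / echo_energy < 1e-6)  # True once the filter has converged
```

With no caller speech and no noise, the weights converge to the assumed echo path. In a real line, voice detection would then run on `residual` rather than on the raw incoming signal.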

[0025] The voice detection unit 204 detects voice in the digital signal from the line I/F and notifies the control unit 111 of the detection.
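The detection method used by the voice detection unit 204 is not specified. As an assumed illustration only, a simple energy-based detector splits the digital signal into frames and reports voice onset once several consecutive frames exceed an energy threshold. The frame length (20 ms at an assumed 8 kHz sampling rate), threshold, and run length are hypothetical values.

```python
import math
from typing import List, Sequence

def detect_voice(samples: Sequence[float], frame_len: int = 160,
                 threshold: float = 0.01, min_frames: int = 3) -> List[int]:
    """Return the frame indices at which voice onset is reported.

    A frame is active when its mean energy exceeds `threshold`. Onset is
    reported after `min_frames` consecutive active frames, which suppresses
    short clicks and noise bursts.
    """
    onsets: List[int] = []
    run = 0
    for f in range(len(samples) // frame_len):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            run += 1
            if run == min_frames:
                onsets.append(f)
        else:
            run = 0
    return onsets

silence = [0.0] * 1600  # 10 silent frames
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(1600)]
print(detect_voice(silence + tone))  # [12]
```

The tone starts at frame 10, and onset is reported at frame 12, the third consecutive active frame. That delay is the price of rejecting one-frame bursts.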

[0026] Next, the operation of the present invention is described. First, the prediction operation that characterizes the present invention is described with reference to FIGS. 4 and 5. In this operation, the guidance voice is chosen while predicting when each caller will speak, so that recognition of the voices of callers on the multiple lines connected to the voice recognition response device does not overlap.

[0027] FIG. 4 shows an example of the data held in the prediction data unit of the voice recognition response device of the present invention. As shown in FIG. 4, the data consists of an ID number assigned to each guidance type, the file names of the several guidance voices prepared for each guidance type, and output time data giving the length of each guidance voice file for use in the prediction processing.
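The FIG. 4 layout (ID, per-variant file names, output times) maps naturally onto a dictionary keyed by guidance ID. Because the figure's actual contents are not reproduced in the text, the IDs, file names, and durations below are invented placeholders. The helper reproduces the t(ID, i) notation used in the following paragraphs.

```python
from typing import Dict, List, Tuple

# Hypothetical contents of prediction data unit 112 in the FIG. 4 layout:
# guidance ID -> (file name, output time in seconds) for each variant of
# the same question, worded at a different length.
PREDICTION_DATA: Dict[int, List[Tuple[str, float]]] = {
    1: [("ask_dept_short.wav", 3.0), ("ask_dept_mid.wav", 5.0),
        ("ask_dept_long.wav", 7.5)],
    2: [("ask_name_short.wav", 2.5), ("ask_name_mid.wav", 4.0),
        ("ask_name_long.wav", 6.0)],
}

def output_time(guidance_id: int, i: int) -> float:
    """t(ID, i): output time of the i-th variant (1-based, as in the text)."""
    return PREDICTION_DATA[guidance_id][i - 1][1]

def output_times(guidance_id: int) -> List[float]:
    """All variant output times t(ID, 1) ... t(ID, n) for one guidance type."""
    return [t for _, t in PREDICTION_DATA[guidance_id]]

print(output_time(1, 2))  # 5.0
print(output_times(2))    # [2.5, 4.0, 6.0]
```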

[0028] A concrete prediction operation using this data is described with reference to FIG. 5, which shows the relationship between the guidance output state of another line and the prediction result for the line being predicted, together with a flowchart of the prediction processing. The prediction processing is performed by the control unit 111. FIG. 5(a) shows the relationship between the guidance output state of the other line and the prediction result for the target line, and FIG. 5(b) shows the flowchart of the prediction processing. In FIG. 5(a), at the current time T, the predicted start time of the utterance of the caller on the other line is T1 + t1, and the predicted end time is T1 + t1 + t2. Here t1 is the guidance output time, which can be obtained from the prediction data unit 112 shown in FIG. 4, and t2 is a predetermined duration of the caller's utterance, roughly 2 to 4 seconds. The output time giving the length of the i-th guidance voice of the guidance type with a given ID is written t(ID, i). In the embodiment of FIG. 4, three guidance voices are prepared for each guidance ID, so i = 1 to 3. The control unit 111 can obtain the value of t(ID, i) from the prediction data unit 112. If the target line starts outputting the i-th guidance voice of the guidance type with a given ID at the current time T, the predicted start time of its caller's utterance is T + t(ID, i) and the predicted end time is T + t(ID, i) + t2. FIG. 5(a) shows a case in which the callers on the other line and on the target line speak at different times, so their recognition processing does not overlap and a single voice recognition unit can recognize both.
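The predicted utterance times above can be collected into two intervals. The notation W is introduced here only for readability, and T1 is read as the time at which the other line's guidance started (the text implies this but does not state it). The numbers in the last two lines are hypothetical and illustrate the kind of case shown in FIG. 5(a):

```latex
\begin{aligned}
W_{\text{other}} &= [\,T_1 + t_1,\; T_1 + t_1 + t_2\,] \\
W_{\text{target}}(i) &= [\,T + t(\mathrm{ID}, i),\; T + t(\mathrm{ID}, i) + t_2\,] \\
T_1 = 0,\ t_1 = 6,\ t_2 = 3 &\;\Rightarrow\; W_{\text{other}} = [6,\, 9]\ \text{s} \\
T = 2,\ t(\mathrm{ID}, i) = 8 &\;\Rightarrow\; W_{\text{target}}(i) = [10,\, 13]\ \text{s}
\end{aligned}
```

Because 9 < 10, the target caller is predicted to start speaking after the other caller has finished, so one recognition unit can handle both utterances in turn.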

[0029] Next, the prediction processing is described with reference to FIG. 5(b). First, the output times of the files prepared for the ID of the guidance type to be output are obtained from the prediction data unit 112. In this embodiment three guidance voices are prepared, so the three output times t(ID, 1), t(ID, 2), and t(ID, 3) are obtained (step 501).

[0030] Next, from the three candidates, the best guidance voice that satisfies condition 1 with respect to every line other than the target line is selected. Condition 1 has two parts. T1 + t1 > T + t(ID, i) + t2 holds when the target line's guidance output and its caller's utterance both finish before the other line's guidance ends. T1 + t1 + t2 < T + t(ID, i) holds when the target line's caller starts speaking after the other line's caller has finished. If either part is satisfied, the target line's utterance time does not overlap that of the other line (step 502). FIG. 5(a) is an example in which T1 + t1 + t2 < T + t(ID, i) holds, that is, the caller on the target line starts speaking after the caller on the other line has finished.
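Condition 1 is an interval-disjointness test applied against every other line. A minimal sketch, assuming each other line is described by its (T1, t1) pair and that t2 is shared by all lines. The function names are illustrative, and because the text asks for the "best" candidate without defining it at this step, the sketch returns every index that qualifies.

```python
from typing import List, Sequence, Tuple

def satisfies_condition1(T: float, t_i: float, t2: float,
                         others: Sequence[Tuple[float, float]]) -> bool:
    """True if variant i keeps the target caller clear of every other line.

    others: (T1, t1) for each other line, whose caller is predicted to
    speak during [T1 + t1, T1 + t1 + t2].
    """
    start, end = T + t_i, T + t_i + t2
    for T1, t1 in others:
        ends_before = T1 + t1 > end           # T1 + t1 > T + t(ID, i) + t2
        starts_after = T1 + t1 + t2 < start   # T1 + t1 + t2 < T + t(ID, i)
        if not (ends_before or starts_after):
            return False
    return True

def candidates_condition1(T: float, times: Sequence[float], t2: float,
                          others: Sequence[Tuple[float, float]]) -> List[int]:
    """Step 502: 1-based indices i whose t(ID, i) satisfies condition 1."""
    return [i for i, t_i in enumerate(times, start=1)
            if satisfies_condition1(T, t_i, t2, others)]

# Other line: guidance began at T1 = 0 s and lasts t1 = 6 s; t2 = 3 s.
print(candidates_condition1(T=2.0, times=[3.0, 5.0, 8.0], t2=3.0,
                            others=[(0.0, 6.0)]))  # [3]
```

Only the longest variant pushes the target caller's window, [10, 13] s, past the other caller's window, [6, 9] s. An empty result sends the process to the fallback in step 504.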

[0031] Next, it is determined whether a guidance voice satisfying condition 1 was found in step 502 (step 503). If one was found, the process ends (step 505).

[0032] If no guidance voice can be found that keeps the callers' utterance timings from overlapping, a guidance voice satisfying condition 2 is sought instead, in order to select the guidance whose utterance time overlaps as little as possible. Tb(i) is the overlap at the beginning of the target line's utterance time, and MIN(Tb(1), Tb(2), Tb(3)) selects the candidate with the smallest beginning overlap. Ta(i) is the overlap at the end of the target line's utterance time, and MIN(Ta(1), Ta(2), Ta(3)) selects the guidance voice with the smallest end overlap. By finding the guidance voice that satisfies condition 2 with respect to every line other than the target line, the guidance voice whose utterance time overlaps least with the other lines is obtained (step 504). Once the guidance voice has been determined, the process ends (step 505).
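The text does not spell out how the beginning overlap Tb(i) and the end overlap Ta(i) are combined across lines. The sketch below makes a simplifying assumption. For each variant it computes the total time its predicted utterance window overlaps the other lines' windows, which covers both beginning and end overlaps, and picks the variant with the smallest total. Treat it as one possible reading of step 504, not the patent's exact rule.

```python
from typing import Sequence, Tuple

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def least_overlap_variant(T: float, times: Sequence[float], t2: float,
                          others: Sequence[Tuple[float, float]]) -> int:
    """Step 504 (simplified): 1-based i minimizing total overlap with other lines."""
    def total(t_i: float) -> float:
        start, end = T + t_i, T + t_i + t2
        return sum(overlap(start, end, T1 + t1, T1 + t1 + t2)
                   for T1, t1 in others)
    # min() keeps the lowest index on ties.
    best_i, _ = min(enumerate(times, start=1), key=lambda p: total(p[1]))
    return best_i

# Other callers predicted at [6, 9] s and [10, 13] s; no variant is overlap-free.
print(least_overlap_variant(T=2.0, times=[5.0, 2.5, 7.5], t2=3.0,
                            others=[(0.0, 6.0), (4.0, 6.0)]))  # 2
```

The variants' windows are [7, 10], [4.5, 7.5], and [9.5, 12.5] s, with total overlaps of 2, 1.5, and 2.5 s, so the second variant is chosen.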

[0033] Next, the voice recognition operation is described. FIG. 6 shows the data held in the recognition data unit 113 of the voice recognition response device according to this embodiment. The recognition data unit 113 holds data, called dictionary data, used in the voice recognition processing. The dictionary data consists of data 601 listing department names and data 602 and 603 associating the names of the people in each department with their extension numbers. When a call arrives on a line and line I/F 101 and voice input/output unit 103 have been activated, the control unit 111 transfers the recognition target dictionary data from the recognition data unit 113 to the voice recognition unit 107. For the automatic switching service, the dictionary data 601 listing department names is transferred first. After this transfer is complete, the voice recognition unit 107 is started. In a first recognition process, the caller's utterance is recognized and the requested department is determined using the department-name dictionary data. Next, based on the recognized department name, the recognition target dictionary data 602 or 603, which associates the names of the people in that department with their extension numbers, is transferred from the recognition data unit 113 to the voice recognition unit 107. Then, in a second recognition process, the caller's utterance is recognized to obtain the extension number of the person the caller wants to reach.
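The two-stage flow can be sketched with text standing in for audio. The first stage matches against the department dictionary (like data 601). The second loads only that department's name-to-extension dictionary (like 602 or 603), which keeps each vocabulary small. Here the standard-library `difflib.get_close_matches` substitutes for the speaker-independent word recognizer, and all department names, person names, and extension numbers are invented placeholders.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical dictionary data in the shape of FIG. 6.
DEPARTMENTS: List[str] = ["sales", "accounting"]            # like data 601
DIRECTORY: Dict[str, Dict[str, str]] = {                     # like 602, 603
    "sales": {"tanaka": "2101", "suzuki": "2102"},
    "accounting": {"yamada": "3301", "kato": "3302"},
}

def recognize(utterance: str, vocabulary: List[str]) -> Optional[str]:
    """Stand-in recognizer: closest vocabulary word, or None if nothing is close."""
    matches = difflib.get_close_matches(utterance, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None

def lookup_extension(dept_utterance: str, name_utterance: str) -> Optional[str]:
    # First recognition process: department dictionary only.
    dept = recognize(dept_utterance, DEPARTMENTS)
    if dept is None:
        return None
    # Second recognition process: only that department's names are loaded.
    names = DIRECTORY[dept]
    name = recognize(name_utterance, list(names))
    return names[name] if name is not None else None

print(lookup_extension("sales", "suzki"))  # 2102
```

The resulting extension number is what device 307 would pass to the PBX control unit 308 for the transfer described in [0012].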

[0034] FIG. 7 shows the guidance voice data stored in the voice data unit 109 of the voice recognition response device according to the present embodiment. As shown in FIG. 7, the voice data unit stores the ID number, file name, and guidance content of each guidance voice. The file names are the same as those stored in the prediction data unit 112 shown in FIG. 4.

[0035] Having described the prediction and recognition processes, the overall processing performed by the control unit 111 in the automatic extension switching service will now be described. FIG. 8 shows the processing of the control unit 111 of the voice recognition response device according to the present embodiment. The control unit 111 repeats the processing shown in FIG. 8 periodically at a predetermined interval, for example every 10 ms to 20 ms.

First, for telephone line 1 (CH1), the control unit 111 checks the line status of the line I/F 101 (step 801). If the line is connected (a call is in progress), it checks the operating states of three components: the voice detection unit 204 of the voice input/output unit 103, the input selection unit 105, and the output selection unit 106 (step 802). It then predicts the course of the dialogue from the usage status of each line and performs guidance output processing (step 803). If telephone line 1 is not connected but is waiting for a connection, the flow moves on to the next telephone line.

When the line status of all lines has been checked in this way, the control unit obtains the state of each voice recognition unit and of the memory from the monitoring unit 110. If the memory holds temporarily stored data, the input selection unit 105 connects an idle voice recognition unit to the memory, and recognition processing is performed (step 807). The control unit 111 repeats steps 800 to 808 periodically to check the status of all telephone lines.

The steps shown in FIG. 8 are described below with reference to FIGS. 9 to 12: the line I/F check processing (step 801), the voice input/output check, recognition, and memory processing (step 802), and the guidance selection and output processing (step 803). The dialogue prediction processing was described with reference to FIG. 5.
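The periodic loop of FIG. 8 can be outlined as a skeleton. This is a minimal sketch, not the patent's code. The class and method names are hypothetical, and each hook only records which flowchart box would run. The point it illustrates is the ordering: every line is checked first, and buffered audio is serviced (step 807) only after all lines have been visited.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Line:
    channel: int
    connected: bool = False  # True while a call is in progress


@dataclass
class ControlCycle:
    """Skeleton of the FIG. 8 loop; each hook stands for one box of the flowchart."""
    lines: list
    trace: list = field(default_factory=list)

    def check_line_if(self, line):        # step 801 (detailed in FIG. 9)
        self.trace.append((801, line.channel))

    def check_voice_io(self, line):       # step 802 (detailed in FIG. 10)
        self.trace.append((802, line.channel))

    def output_guidance(self, line):      # step 803 (detailed in FIG. 11)
        self.trace.append((803, line.channel))

    def recognize_from_memory(self):      # step 807 (detailed in FIG. 12)
        self.trace.append((807, None))

    def run_once(self):
        for line in self.lines:
            self.check_line_if(line)
            if not line.connected:        # waiting for a call: go to the next line
                continue
            self.check_voice_io(line)
            self.output_guidance(line)
        self.recognize_from_memory()      # only after every line has been checked


def run(cycle, period_s=0.02, cycles=3):
    """Repeat the cycle at a fixed period (the text gives 10 ms to 20 ms as an example)."""
    for _ in range(cycles):
        started = time.monotonic()
        cycle.run_once()
        time.sleep(max(0.0, period_s - (time.monotonic() - started)))


ctl = ControlCycle([Line(1, connected=True), Line(2)])
ctl.run_once()
print(ctl.trace)  # [(801, 1), (802, 1), (803, 1), (801, 2), (807, None)]
```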

[0036] First, the line I/F check processing will be described. FIG. 9 shows the detailed control of the line I/F by the control unit 111 of the voice recognition response device according to the present embodiment.

First, the control unit 111 checks the line state recorded for a given line I/F at the check performed one cycle earlier (step 901). If that line I/F was connected to the telephone line one cycle earlier, the control unit checks the current line state (step 902). This determines whether the line is still connected (a call is in progress) or whether the call has ended and the line has been disconnected. If the line has been disconnected, the control unit 111 stops voice input/output processing (step 903), disconnects the line (step 904), and ends the processing (step 908). If the check in step 902 shows that the call is still in progress from the previous cycle, the line is treated as connected and the processing ends (step 909).

[0037] If, in step 901, the line I/F was not connected to the telephone line one cycle earlier, the current line state is checked (step 905). If there is an incoming call, the line is connected to the device (step 906) and voice input/output is started (step 907). Connection of the line I/F unit and the check for waiting connections are performed as described above.
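The line I/F check of FIG. 9 amounts to a two-state transition per line. The sketch below makes that concrete, assuming two hypothetical input flags: one saying whether the call is still up, and one saying whether a call is ringing. How a real line interface reports these conditions is not specified here. The returned state is what would be recorded for the check in the next cycle (step 901).

```python
from enum import Enum


class LineState(Enum):
    WAITING = "waiting"      # not connected; waiting for an incoming call
    CONNECTED = "connected"  # call in progress


def check_line_if(previous, still_connected, ringing, actions):
    """One FIG. 9 pass for a single line I/F.

    previous:        state recorded one cycle earlier (step 901)
    still_connected: hypothetical flag, True if the call has not been cleared (step 902)
    ringing:         hypothetical flag, True if a call is arriving (step 905)
    actions:         list collecting the actions taken, for illustration
    Returns the state to record for the next cycle.
    """
    if previous is LineState.CONNECTED:
        if still_connected:
            return LineState.CONNECTED          # step 909: still in the call
        actions.append("stop voice I/O")        # step 903
        actions.append("disconnect line")       # step 904
        return LineState.WAITING                # step 908
    if ringing:
        actions.append("connect line")          # step 906
        actions.append("start voice I/O")       # step 907
        return LineState.CONNECTED
    return LineState.WAITING                    # no call yet
```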

[0038] Next, the voice input/output check processing, the speech recognition processing, and the memory processing will be described. FIG. 10 shows the detailed control for starting input to a voice recognition unit, or buffering to the memory, according to the present embodiment. The processing for telephone line 1 is described here as an example.

[0039] First, the control unit checks whether the caller's voice was being input over the telephone line at the previous cycle's voice input/output check, recognition, and memory processing (step 1001).

If no voice was being input, it checks whether the voice detection unit 204 of the voice input/output unit 103 has now detected the caller's voice (step 1002). If voice has been detected, it searches for a currently idle voice recognition unit 107 (step 1003). If no voice is detected in step 1002, the processing ends (step 1013).

If voice has been detected and an idle voice recognition unit has been found (step 1004), the input selection unit 105, on instruction from the control unit 111, connects the voice input/output unit 103 to the selected voice recognition unit 107 (step 1005). The digital voice signal input from the line is then sent to the voice recognition unit 107, and the voice recognition unit 107 is notified to start recognition processing (step 1007).

If no idle voice recognition unit is found in step 1004, the digital voice signal input from the line must be stored temporarily in the memory 108. The input selection unit 105 therefore connects the voice input/output unit 103 to the memory 108 (step 1006) and instructs it to start buffering (step 1008).

[0040] If voice was being input one cycle earlier (step 1001), the control unit checks whether the voice detection unit 204 of the voice input/output unit 103 has detected the end of that voice (step 1009). If the end of the voice has not yet been detected, the processing ends (step 1013).

Once the end is detected, the control unit checks whether the voice is being buffered in the memory (step 1010). If it is, buffering to the memory 108 is stopped, since the end of the voice has been reached, and the voice input/output unit 103 is disconnected from the memory 108 (step 1011). If the voice was not being buffered, the voice recognition unit 107 is stopped and the voice input/output unit 103 is disconnected from it (step 1012).
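The routing decision of FIG. 10 can be sketched for one line per cycle. When speech starts, it goes to the first idle recognition unit if one exists; otherwise it is buffered. When speech ends, the unit is released or the buffered utterance is queued. This is an illustrative sketch under stated assumptions. The `speech_started`/`speech_ended` flags are hypothetical stand-ins for the outputs of voice detection unit 204. The `memory_queue` list is an assumed way of handing buffered speech to the FIG. 12 processing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineInput:
    channel: int
    receiving: bool = False           # speech was being input in the previous cycle
    unit: Optional[int] = None        # recognition unit fed by this line, if any
    buffering: bool = False           # True while speech is written to memory 108


def route_speech(line, speech_started, speech_ended, busy_units, n_units, memory_queue):
    """One FIG. 10 pass for one line.

    speech_started / speech_ended: hypothetical outputs of voice detection unit 204.
    busy_units:   set of recognition-unit indices in use (updated in place).
    memory_queue: list of channels whose buffered speech awaits FIG. 12 processing.
    """
    if not line.receiving:                                          # step 1001
        if not speech_started:                                      # step 1002
            return                                                  # step 1013
        free = [u for u in range(n_units) if u not in busy_units]   # step 1003
        if free:                                                    # step 1004
            line.unit = free[0]                                     # step 1005
            busy_units.add(line.unit)                               # step 1007
        else:
            line.buffering = True                                   # steps 1006, 1008
        line.receiving = True
        return
    if not speech_ended:                                            # step 1009
        return                                                      # step 1013
    if line.buffering:                                              # step 1010
        line.buffering = False                                      # step 1011
        memory_queue.append(line.channel)
    else:
        busy_units.discard(line.unit)                               # step 1012
        line.unit = None
    line.receiving = False
```

With a single recognition unit, the second line to start speaking is buffered instead of being rejected. This is the behaviour that FIG. 15 relies on.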

[0041] Next, the guidance output processing will be described. FIG. 11 shows the detailed control of guidance output in the voice recognition response device according to the present embodiment.

First, the dialogue state of each telephone line is checked (step 1101), and the control unit checks whether guidance or a hold tone is being output (step 1102). If neither is being output, it next checks whether recognition processing is in progress (step 1103); if so, the processing ends (step 1110).

If recognition processing is not in progress in step 1103, the dialogue states of the other lines are checked (step 1104). The utterance timing of the caller on the own line is then predicted, and the guidance voice that gives the smoothest dialogue is selected (step 1105). The control unit determines whether such a guidance voice was obtained (step 1106) and, if one was selected, outputs it (step 1107).

If no suitable guidance voice can be selected, recognition will be delayed because recognition processes overlap. In that case a hold tone is output to pause the dialogue temporarily (step 1109). When recognition processing is complete, the hold tone is stopped and the processing ends (step 1110).

[0042] On the other hand, if guidance or a hold tone is being output in step 1102, the control unit checks which of the two is being output (step 1108). If it is the hold tone, the processing from step 1104 onward is performed. If guidance rather than the hold tone is being output, the processing ends (step 1110).
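The branching of FIG. 11 reduces to a small decision function. In the sketch below, the guidance-selection steps 1104 to 1106 are abstracted into a hypothetical `pick_guidance` callable, which returns a guidance ID or None. Two simplifications are assumed here and are not stated in the text. First, a newly selected guidance replaces a hold tone that is playing. Second, the hold tone continues for as long as nothing suitable can be selected.

```python
def guidance_action(playing, recognizing, pick_guidance):
    """Decide what one line should play in this cycle (FIG. 11 sketch).

    playing:       None, "guidance" or "hold" (what the line is outputting now)
    recognizing:   True while this line's speech is being recognized
    pick_guidance: hypothetical callable standing for steps 1104-1106; returns a
                   guidance ID that fits the other lines' dialogue, or None
    """
    if playing == "guidance":                  # steps 1102, 1108: let it finish
        return "continue"
    if playing is None and recognizing:        # step 1103
        return "continue"
    choice = pick_guidance()                   # steps 1104-1106
    if choice is not None:
        return "play " + choice                # step 1107 (replaces any hold tone)
    return "hold"                              # step 1109: keep or start the hold tone
```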

[0043] Next, the case where no voice recognition unit was idle will be described. Here, the data stored in the memory is read out and recognized. FIG. 12 shows the detailed control of recognition processing for digital voice signals stored in the memory of the voice recognition response device.

First, the control unit searches for an idle voice recognition unit (step 1201) and then checks the memory storage state (step 1202). It checks whether the memory holds stored data waiting to be processed and whether an idle voice recognition unit is available (step 1203). If both conditions are met, the input selection unit connects the memory to the voice recognition unit (step 1206), and recognition processing starts (step 1207).

The control unit then checks whether the voice recognition processing has finished. If it has, the voice recognition unit that finished is stopped (step 1208) and disconnected from the memory (step 1209). The recognition result is sent from the stopped voice recognition unit to the control unit 111, and the processing ends (step 1210).

[0044] If, in step 1203, the memory holds no data waiting to be processed, the control unit searches for voice recognition units that are currently recognizing data stored in the memory (step 1204). It then checks whether their processing has finished (step 1205). If it has, the processing from step 1208 onward is performed; if not, the processing ends (step 1210).
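The FIG. 12 service pass can be sketched with a FIFO of buffered utterances and a small table of recognition units. This is a minimal sketch under stated assumptions. First, a recognition job is assumed to take a fixed number of control cycles (`ticks`). Second, the `recognize` callable is a hypothetical stand-in for the recognizer. Third, the "live" marker denotes a unit fed directly by a line (FIG. 10) that this pass must not touch.

```python
from collections import deque


def service_memory(queue, units, results, recognize, ticks=2):
    """One FIG. 12 pass.

    queue:     deque of (channel, audio) waiting in memory 108
    units:     dict unit_id -> None (idle), "live" (fed directly by a line, FIG. 10)
               or a dict describing a job on buffered audio
    results:   list receiving (channel, result), standing in for control unit 111
    recognize: hypothetical recognizer, audio -> result
    ticks:     assumed number of cycles one recognition takes
    """
    # Steps 1201-1203, 1206-1207: give waiting data to idle units.
    for uid, job in units.items():
        if job is None and queue:
            channel, audio = queue.popleft()
            units[uid] = {"channel": channel, "audio": audio, "left": ticks}
    # Steps 1204-1205, 1208-1210: advance jobs and report finished ones.
    for uid, job in units.items():
        if not isinstance(job, dict):
            continue
        job["left"] -= 1
        if job["left"] == 0:
            results.append((job["channel"], recognize(job["audio"])))
            units[uid] = None
```

Replacing the value for an existing key while iterating over `units.items()` is safe in Python, because the dictionary's size does not change.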

[0045] In the processing of the control unit 111 shown in FIGS. 9 to 12, several decisions are made by referring to the recognition unit management table and the line state management table shown in FIG. 13:

- obtaining the line state one cycle earlier (FIGS. 9 and 10);
- searching for an unused voice recognition unit (FIG. 10);
- checking the dialogue state of the lines (FIG. 11);
- searching for an idle voice recognition unit (FIG. 12).

[0046] FIG. 13 shows a configuration example of the recognition unit management table and the line state management table. In this example, the recognition unit management table in FIG. 13(a) holds the operating state and connected line of each recognition unit. The line state management table in FIG. 13(b) holds the line state and dialogue state of each line. The line state indicates the current processing state, and the dialogue state indicates how far the dialogue with the caller has progressed.

Taking the automatic exchange service described so far as an example, the dialogue stages are distinguished as follows:

- Dialogue step 0: outputting a greeting guidance such as "This is ..." immediately after the call arrives.
- Dialogue step 1: asking for the department name and recognizing it.
- Dialogue step 2: asking for the person's name and recognizing it.
- Dialogue step 3: outputting the connection guidance and performing the connection.

The operating state, connected line, line state, and dialogue state could also be expressed by predetermined codes.
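The two tables of FIG. 13 map naturally onto small record types. The sketch below is illustrative only. The dialogue step numbers follow the text. The field names, the "waiting"/"connected" line-state codes and the sample rows are assumptions, since the patent leaves the concrete codes open.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class DialogueStep(IntEnum):
    GREETING = 0      # "This is ..." greeting guidance
    DEPARTMENT = 1    # ask for and recognize the department name
    NAME = 2          # ask for and recognize the person's name
    CONNECT = 3       # connection guidance and transfer


@dataclass
class RecognizerRow:              # one row of table (a)
    busy: bool = False            # operating state
    line: Optional[int] = None    # connected line, if any


@dataclass
class LineRow:                    # one row of table (b)
    state: str = "waiting"        # hypothetical codes: "waiting", "connected"
    step: DialogueStep = DialogueStep.GREETING


# Hypothetical snapshot: unit 1 is recognizing line 1's department name.
recognizer_table = {0: RecognizerRow(), 1: RecognizerRow(busy=True, line=1)}
line_table = {
    1: LineRow("connected", DialogueStep.DEPARTMENT),
    2: LineRow("connected", DialogueStep.GREETING),
}


def idle_recognizers(table):
    """The lookup used when searching for an idle recognition unit (FIGS. 10, 12)."""
    return [uid for uid, row in table.items() if not row.busy]


print(idle_recognizers(recognizer_table))  # [0]
```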

[0047] Next, specific examples of dialogue in an automatic exchange service using the voice recognition response device of the present invention will be described with reference to FIGS. 14 and 15. FIG. 14 shows a specific example of dialogue between callers and the voice recognition response device according to the present embodiment.

In FIG. 14, after a call arrives on telephone line 1, the voice recognition response device first outputs guidance voice 1401 to the caller. When a call also arrives on telephone line 2, it outputs guidance voice 1406 to that caller. At this point, the device predicts the utterance timing of the caller on telephone line 1 and selects the guidance voice for telephone line 2 so that the two utterance timings do not overlap. The selection follows the procedure described with reference to FIG. 5.

In this case, guidance voice G1b shown in FIG. 7 is selected from the voice data unit 109 as guidance voice 1401 for telephone line 1. Guidance voice G1c of FIG. 7 is selected as guidance voice 1406 for telephone line 2. If the callers speak as predicted, the voice recognition unit processes the two utterances without overlap, as shown in FIG. 14: "Design Department" 1404 from the user on telephone line 1 and "Materials Department" 1408 from the user on telephone line 2.
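The selection idea can be sketched as a small optimization: pick, among variants of the same guidance, the one whose predicted reply window overlaps the other lines' windows least. This is a sketch of that idea only, not the FIG. 5 procedure, which is not reproduced in this section. Several values are hypothetical: the guidance lengths for G1a to G1c (FIG. 7's actual timing is not given in the text), the assumed reply length, and the assumed recognition time. Minimizing overlap follows claims 2 and 10. Breaking ties in favour of the shorter guidance is an added assumption.

```python
# Hypothetical variants of the same guidance with different lengths (seconds).
GUIDANCE_SECONDS = {"G1a": 3.0, "G1b": 4.0, "G1c": 6.5}
UTTERANCE_S = 1.5      # assumed length of the caller's reply
RECOGNITION_S = 1.0    # assumed recognition time after the reply ends


def busy_window(start, guidance_s):
    """Predicted interval during which this caller's reply occupies the recognizer."""
    begin = start + guidance_s
    return begin, begin + UTTERANCE_S + RECOGNITION_S


def overlap(a, b):
    """Length of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def select_guidance(now, other_windows):
    """Pick the variant whose predicted window overlaps the other lines least;
    ties go to the shorter guidance."""
    def cost(gid):
        window = busy_window(now, GUIDANCE_SECONDS[gid])
        return (sum(overlap(window, o) for o in other_windows), GUIDANCE_SECONDS[gid])
    return min(GUIDANCE_SECONDS, key=cost)


# Line 1 is answered at t = 0 s with no other traffic; line 2 is answered at t = 1 s.
first = select_guidance(0.0, [])
w1 = busy_window(0.0, GUIDANCE_SECONDS[first])
print(first, w1)                    # G1a (3.0, 5.5)
print(select_guidance(1.0, [w1]))   # G1c: its reply window starts after line 1's ends
```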

[0048] FIG. 15 shows another example of dialogue with the voice recognition response device according to the present embodiment. In this example, calls arrive simultaneously on two telephone lines and guidance voices of different lengths are output. It shows the operation when the voice recognition processes nevertheless overlap.

[0049] After a call arrives on telephone line 1, the voice recognition response device outputs guidance voice 1501 to the caller. It likewise outputs guidance voice 1506 to the caller on telephone line 2. For telephone line 2, the device predicts the utterance timing of the caller on telephone line 1 and selects the guidance voice so that the utterance timings do not overlap.

However, the caller on telephone line 2 barges in after hearing only part of the guidance voice. As a result, that caller produces utterance 1510 while the voice recognition unit 107 is still recognizing utterance 1504 from the caller on telephone line 1. Because the voice recognition unit is busy with recognition process 1 (1512), the control unit 111 has the voice of the caller on telephone line 2 stored temporarily in the memory 108 (1516). After recognition process 1 (1512) in the voice recognition unit 107 finishes, the data stored in the memory 108 is read out and recognition process 2 (1513) is performed.

Meanwhile, a hold tone such as "pi-pi..." (1507) is played to the caller on telephone line 2 from the end of the utterance until recognition processing is complete. After recognition process 2 (1513) finishes, the dialogue proceeds to the next step. In this way, even while the voice recognition unit 107 is in use, the voice is saved temporarily in the memory and read out for processing once the current recognition finishes. This allows the dialogue to proceed smoothly.
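The FIG. 15 situation is time-division use of one recognition unit with a first-in, first-out memory buffer. A minimal scheduling sketch under stated assumptions follows. Each utterance is described by a hypothetical (channel, arrival time, recognition time) tuple. Recognition starts at arrival if the unit is free and otherwise waits, buffered, until the unit becomes free. The numbers in the example are invented for illustration.

```python
def time_division(utterances):
    """Schedule utterances on one recognition unit, buffering any overlap (FIG. 15 sketch).

    utterances: list of (channel, arrival_s, recognition_s) tuples.
    Returns a list of (channel, start_s, end_s, waited_s); waited_s > 0 means the
    speech was held in memory until the unit became free.
    """
    schedule = []
    free_at = 0.0
    for channel, arrival, duration in sorted(utterances, key=lambda u: u[1]):
        start = max(arrival, free_at)   # buffered in memory 108 while the unit is busy
        free_at = start + duration
        schedule.append((channel, start, free_at, start - arrival))
    return schedule


# Line 2 barges in at 5.0 s while line 1's speech (from 4.0 s) is still being recognized.
for row in time_division([(1, 4.0, 2.0), (2, 5.0, 1.5)]):
    print(row)
# (1, 4.0, 6.0, 0.0)
# (2, 6.0, 7.5, 1.0)
```

The waiting time of the second caller is the interval the device must cover with the hold tone (1507).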

【００５０】[0050]

[Effects of the Invention] According to the present invention, a single voice recognition unit can recognize voices received from a plurality of telephone lines. This makes it possible to provide a voice recognition response device, a voice recognition response method, and a voice guidance method that use a small number of voice recognition units efficiently.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of the voice recognition response device of the present invention.

FIG. 2 is a diagram showing the configuration of the voice input/output unit of the voice recognition response device.

FIG. 3 is a diagram showing the configuration of an automatic extension switching service system using the voice recognition response device.

FIG. 4 is a diagram showing a configuration example of the data stored in the prediction data unit of the voice recognition response device.

FIG. 5 is a diagram showing, for the voice recognition response device of the present invention, the relationship between the guidance output state of another line and the prediction result for the line being predicted, together with a flowchart of the prediction processing.

FIG. 6 is a diagram showing a configuration example of the data stored in the recognition data unit of the voice recognition response device.

FIG. 7 is a diagram showing a configuration example of the guidance voice data stored in the voice data unit of the voice recognition response device.

FIG. 8 is a flowchart showing the processing of the control unit of the voice recognition response device.

FIG. 9 is a flowchart showing the line I/F check processing among the processes of the control unit of the voice recognition response device.

FIG. 10 is a flowchart showing the voice recognition processing and the buffering to memory among the processes of the control unit of the voice recognition response device.

FIG. 11 is a flowchart showing the guidance output processing among the processes of the control unit of the voice recognition response device.

FIG. 12 is a flowchart showing the voice recognition processing of data stored in the memory among the processes of the control unit of the voice recognition response device.

FIG. 13 is a diagram showing a configuration example of the recognition unit management table and the line state management table in the voice recognition response device of the present invention.

FIG. 14 is a diagram showing a specific example of dialogue between callers and the voice recognition response device according to an embodiment of the present invention.

FIG. 15 is a diagram showing another specific example of dialogue between callers and the voice recognition response device according to an embodiment of the present invention.

[Explanation of symbols]

101, 102: line interface; 103, 104: voice input/output; 105: input selection; 106: output selection; 107: voice recognition; 108: memory; 109: voice data; 110: monitoring; 111: control; 112: prediction data; 113: recognition data; 114: memory control; 115: voice data control; 201: analog/digital conversion; 202: digital/analog conversion; 203: echo canceller; 204: voice detection; 301, 302: telephone; 303: public network; 304: PBX; 305, 306: extension telephone; 307: voice recognition response device; 308: control unit; 601, 602, 603: dictionary data.

Continued from the front page. F-terms (reference): 5K015 AA06 AA07 AD02 AD05 GA04 GA07; 5K024 AA76 BB01 BB03 BB04 CC04 DD01 DD02 EE09 FF06

Claims


1. An interactive voice recognition response device that outputs voice guidance to a plurality of calling terminals connected via a communication line and recognizes voice input from each calling terminal in response to the voice guidance, the device comprising: control means for controlling the output time of the voice guidance output to each calling terminal according to the state of dialogue with the other calling terminals; and voice recognition means for recognizing, in time division, the voices received from the plurality of calling terminals.

2. The voice recognition response device according to claim 1, wherein the control means controls the output time of the voice guidance by changing the output time length of the voice guidance so as to minimize the time during which the voices received from the respective calling terminals overlap.

3. The voice recognition response device according to claim 1, wherein the voice guidance comprises a plurality of steps for collecting information from the operator of each calling terminal through dialogue, and a plurality of voice guidances with different output times is prepared for each step; and the control means predicts the utterance period of the voice input from another calling terminal from the required output time of the voice guidance being output to that other calling terminal, and selects and outputs the optimal voice guidance from among the plurality of voice guidances for the step.

4. A voice recognition response device that outputs voice guidance to a plurality of calling terminals connected via a communication line and recognizes voice input from each calling terminal in response to the voice guidance, the device comprising: voice recognition means for recognizing the voice received from a calling terminal; a memory for storing voice received from another calling terminal when that voice arrives while the voice recognition means is performing recognition processing; and memory reading means for reading out the voice stored in the memory and sending it to the voice recognition means after the recognition processing in the voice recognition means is complete.

5. An interactive voice response device that outputs voice guidance to a plurality of calling terminals connected via a communication line and receives voice input from each calling terminal in response to the voice guidance, the device comprising control means for controlling the output time of the voice guidance output to each calling terminal according to the state of dialogue with the other calling terminals.

6. The voice response device according to claim 5, wherein the control means controls the output time of the voice guidance by changing the output time length of the voice guidance so as to minimize the time during which the voices received from the respective calling terminals overlap.

7. The voice response device according to claim 5, wherein the voice guidance comprises a plurality of steps for collecting information from the operator of each calling terminal through dialogue, and a plurality of voice guidances with different output times is prepared for each step; and the control means predicts the utterance period of the voice input from another calling terminal from the required output time of the voice guidance being output to that other calling terminal, and selects and outputs the optimal voice guidance for the calling terminal from among the plurality of voice guidances for the step.

8. A voice response device that outputs voice guidance to a plurality of calling terminals connected via a communication line and receives voice input from each calling terminal in response to the voice guidance, the device comprising: reception processing means for performing reception processing on the voice received from a calling terminal; a memory for storing voice received from another calling terminal when that voice arrives while the reception processing means is performing reception processing; and memory reading means for reading out the voice stored in the memory and sending it to the reception processing means after the reception processing in the reception processing means is complete.

9. An interactive voice recognition response method for outputting voice guidance to a plurality of callers connected via a communication line and recognizing the voices input from the respective callers in response to the voice guidance, the method comprising, when outputting voice guidance to a first caller: predicting the time at which voice from a second caller will be received, based on the required output time of the voice guidance being output to the second caller; and selecting the voice guidance to be output to the first caller based on the prediction result.

10. The voice recognition response method according to claim 9, wherein the voice guidance output to the first caller is selected so as to minimize the time during which the voice received from the first caller overlaps the voice received from the second caller.

11. The voice recognition response method according to claim 9, wherein the voice guidance comprises a plurality of steps for collecting information through dialogue with each caller, and a plurality of voice guidances with different required output times is prepared as the guidance for each step; and the reception period of the voice input from the second caller is predicted from the required output time of the voice guidance being output to the second caller, and the optimal voice guidance to be output to the first caller is selected and output from among the plurality of voice guidances for the step.

12. A voice recognition response method for outputting voice guidance to a plurality of callers connected via a communication line and recognizing voice input from each caller in response to the voice guidance, the method comprising: when voice is received from a second caller while the voice received from a first caller is being recognized, temporarily storing the voice received from the second caller; and, after recognition of the voice of the first caller is complete, reading out the stored voice of the second caller and performing recognition processing on it.

13. An interactive voice response method for outputting voice guidance to a plurality of callers connected via a communication line and receiving voice input from each caller in response to the voice guidance, the method comprising, when outputting voice guidance to a first caller: predicting the time at which voice from a second caller will be received, based on the required output time of the voice guidance being output to the second caller; and selecting the voice guidance to be output to the first caller based on the prediction result.

14. The voice response method according to claim 13, wherein the voice guidance output to the first caller is selected so as to minimize the time during which the voice received from the first caller overlaps the voice received from the second caller.

15. The voice response method according to claim 13, wherein the voice guidance comprises a plurality of steps for collecting information through dialogue with each caller, a plurality of voice guidances having different required output times are prepared as the guidance for each step, and the reception period of the voice input from the second caller is estimated from the required output time of the voice guidance being output to the second caller, whereby an optimal voice guidance to be output to the first caller is selected from the plurality of voice guidances of each step and output.

16. A voice response method for outputting voice guidance to a plurality of callers connected via a communication line and receiving voice input from each caller in response to the voice guidance, the method comprising: when a voice is received from a second caller while the voice received from a first caller is being received, temporarily storing the voice received from the second caller; and, after the reception processing of the voice of the first caller is completed, reading out the stored voice of the second caller and performing reception processing on it.

17. A voice guidance method for outputting voice guidance to a caller connected via a communication line, characterized in that a plurality of voice guidances having different required output times are prepared for the same guidance content, and a voice guidance having the optimum output time is selected from the plurality of voice guidances and output to the caller.

18. The voice guidance method according to claim 17, wherein the plurality of voice guidances having different required output times express the same guidance content in different terms.

19. The voice guidance method according to claim 17, wherein, when another caller is connected via a communication line, the guidance voice to be output is selected based on the status of the guidance being output to said other caller.
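Claims 17 to 19 can be pictured as a table holding several wordings of the same guidance content, each with its own output time, plus a rule for choosing one. The sketch below is illustrative only: the table contents, the `"ask_account_number"` key, and the functions `pick_by_time` and `target_from_other_line` are hypothetical, and "optimum" is assumed here to mean the output time closest to a target. For claim 19, the target is assumed to be the other line's remaining guidance time plus its caller's expected reply length, so that this caller's reply follows the other caller's.

```python
# Hypothetical guidance table: the same content phrased in different terms,
# each wording paired with its required output time in seconds.
GUIDANCE_TABLE = {
    "ask_account_number": [
        ("Account number?", 1.2),
        ("Please say your account number.", 2.4),
        ("Please say your account number, one digit at a time.", 3.8),
    ],
}


def pick_by_time(content, target_seconds, table=GUIDANCE_TABLE):
    """Return the (wording, duration) pair whose duration is closest to the target."""
    return min(table[content], key=lambda v: abs(v[1] - target_seconds))


def target_from_other_line(other_remaining, other_utterance_len):
    """Hypothetical policy: end this guidance as the other caller is expected to stop speaking."""
    return other_remaining + other_utterance_len
```

If the other line's guidance has 1.5 s left and a 2.0 s reply is assumed there, the target is 3.5 s and the longest wording (3.8 s) is selected; with no other line to consider, a target of 2.0 s would instead select the 2.4 s wording.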