JP2010079103A

JP2010079103A - Voice interactive apparatus, program for the same, and voice interactive processing method

Info

Publication number: JP2010079103A
Application number: JP2008249280A
Authority: JP
Inventors: Masashi Takechi; 雅司武市; Hiroaki Matsuba; 弘明松場
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2008-09-26
Filing date: 2008-09-26
Publication date: 2010-04-08

Abstract

PROBLEM TO BE SOLVED: To provide a voice interactive apparatus that lets a user easily grasp how to perform voice input even if the user is not familiar with entering information by voice. SOLUTION: The voice interactive apparatus includes: a voice recognizing means for recognizing input contents based on the voice inputted into a voice input section; a scenario storing means for storing an interactive scenario in which a voice guidance is associated with an input item that should subsequently be inputted to the voice input section in response to the voice guidance; a voice guidance outputting means for outputting the voice guidance according to the interactive scenario from a voice output section; and a display processing means for displaying, on a display section and according to the interactive scenario, the input item to be inputted to the voice input section in response to the voice guidance when the voice guidance is outputted from the voice output section. COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a voice interactive apparatus, a program therefor, and a voice interactive processing method, and more specifically to a voice interactive apparatus, a program therefor, and a voice interactive processing method that conduct a spoken dialogue with a user and provide the information and services the user requests.

In recent years, many voice interactive apparatuses have been proposed that provide the information and services a user requests by conducting a spoken dialogue with the user. An apparatus of this type presents the items the user should enter (input items) by outputting voice guidance, and carries on the dialogue by recognizing the voice the user utters in response to that guidance (see, for example, Patent Document 1).
Patent Document 1: JP-A-11-212594

However, with conventional voice interactive apparatuses, a user who is not used to entering information by voice may not know how to respond by voice to the voice guidance.

Accordingly, an object of the present invention is to provide a voice interactive apparatus, a program therefor, and a voice interactive processing method that allow a user to grasp how to perform voice input even when the user is not accustomed to entering information by voice.

To achieve this object, the invention according to claim 1 is a voice interactive apparatus comprising: scenario storage means for storing a dialogue scenario that associates voice guidance with the input items to be entered into a voice input unit in response to that voice guidance; voice guidance output means for outputting voice guidance according to the dialogue scenario from a voice output unit; display processing means for displaying, on a display unit in a predetermined display format and according to the dialogue scenario, the input items to be entered into the voice input unit in response to the voice guidance when that voice guidance is output from the voice output unit; and voice recognition means for recognizing the input content based on the voice input to the voice input unit.

The invention according to claim 2 is the voice interactive apparatus according to claim 1, further comprising: candidate display means for displaying, on the display unit, a plurality of input content candidates recognized by the voice recognition means; selection means for selecting one candidate from the plurality of input content candidates displayed by the candidate display means; and input processing means for determining the candidate selected by the selection means as the input content.

The invention according to claim 3 is the voice interactive apparatus according to claim 1 or 2, wherein the display processing means displays an input example for an input item displayed on the display unit when a predetermined period has elapsed since that input item was designated.

The invention according to claim 4 is the voice interactive apparatus according to any one of claims 1 to 3, wherein the display processing means changes the display format of an input item displayed on the display unit when voice is input to the voice input unit while that input item is designated.

The invention according to claim 5 is the voice interactive apparatus according to any one of claims 1 to 3, wherein the display processing means changes the display format of an input item displayed on the display unit when voice recognition processing is started while that input item is designated.

The invention according to claim 6 is the voice interactive apparatus according to any one of claims 1 to 5, wherein the display processing means displays the input items on the display unit when a predetermined period has elapsed after the voice guidance output means outputs the voice guidance.

The invention according to claim 7 is the voice interactive apparatus according to any one of claims 1 to 6, wherein, when displaying optional input items on the display unit in addition to the mandatory input items to be entered into the voice input unit in response to the voice guidance, the display processing means uses different display formats for the mandatory input items and the optional input items.

The invention according to claim 8 is the voice interactive apparatus according to any one of claims 1 to 7, wherein, when the input content for an input item is a predetermined input content, the display processing means displays on the display unit an input item subordinate to the input item already displayed.

The invention according to claim 9 is the voice interactive apparatus according to any one of claims 1 to 8, wherein the voice recognition means has a plurality of voice recognition dictionaries, selects the voice recognition dictionary corresponding to the input item displayed on the display unit, and recognizes the voice input to the voice input unit using that dictionary.

The invention according to claim 10 is the voice interactive apparatus according to any one of claims 1 to 9, wherein, when there are a plurality of input items to be entered into the voice input unit in response to the voice guidance, the display processing means displays each of these input items on the display unit in a predetermined display format, and wherein the voice recognition means has a plurality of voice recognition dictionaries, selects the voice recognition dictionary corresponding to the input content of an input item already recognized among the plurality of input items, and uses the selected dictionary to recognize the voice input to the voice input unit for an input item not yet recognized among the plurality of input items.

The invention according to claim 11 is a program that causes a computer to function as each of the means of the voice interactive apparatus according to any one of claims 1 to 10.

The invention according to claim 12 is a voice interactive processing method comprising the steps of: outputting, from a voice output unit, voice guidance according to a dialogue scenario stored in a storage unit; displaying, on a display unit in a predetermined display format and according to the recognized input content and the dialogue scenario, the input items to be entered into a voice input unit in response to the voice guidance when that voice guidance is output from the voice output unit; and recognizing the input content based on the voice input to the voice input unit.

According to the present invention, when voice guidance is output from the voice output unit, the input items to be entered into the voice input unit in response to that guidance are displayed on the display unit in a predetermined display format. Even a user who is not used to entering information by voice can therefore easily grasp what to say and how to say it.

[1. Overview of the voice interactive apparatus]
An overview of the voice interactive apparatus according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is an explanatory diagram showing the schematic configuration of the voice interactive apparatus, FIG. 2 is an external view of the voice interactive apparatus of this embodiment, and FIG. 3 is an explanatory diagram of the voice interactive processing method.

The voice interactive apparatus can provide the information and services a user requests by conducting a spoken dialogue with the user. As shown in FIG. 1, it comprises: scenario storage means for storing a dialogue scenario that associates voice guidance with the input items to be entered into the voice input unit in response to that voice guidance; voice guidance output means for outputting voice guidance according to the dialogue scenario from the voice output unit; display processing means for displaying, on the display unit in a predetermined display format and according to the dialogue scenario, the input items to be entered into the voice input unit in response to the voice guidance when that voice guidance is output from the voice output unit; and voice recognition means for recognizing the input content based on the voice input to the voice input unit.

Here, the dialogue scenario in this embodiment is a structure in which a plurality of preset question items, a plurality of user answer items anticipated for each question item, and new question items or confirmation items for each answer item are organized into logical sequences covering a variety of situations.
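One way to picture such a dialogue scenario is as a table of steps, each pairing a guidance line with the input items it asks for and the step that follows. The sketch below is only an illustration under assumed names: `ScenarioStep`, `RECEPTION_SCENARIO`, the step identifiers, and the "destination" step are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScenarioStep:
    """One question in the scenario (hypothetical structure)."""
    guidance: str                      # voice guidance spoken to the user
    input_items: List[str]             # items the user is expected to speak
    # Maps an answer (or "*" for any answer) to the id of the next step.
    next_step: Dict[str, str] = field(default_factory=dict)


# Hypothetical visitor reception scenario with two steps.
RECEPTION_SCENARIO: Dict[str, ScenarioStep] = {
    "greeting": ScenarioStep(
        guidance="Welcome. First, please tell me your name and affiliation.",
        input_items=["company", "name"],
        next_step={"*": "destination"},
    ),
    "destination": ScenarioStep(
        guidance="Whom would you like to visit?",
        input_items=["department", "employee"],
    ),
}


def input_items_for(step_id: str) -> List[str]:
    """Return the input items to display while a step's guidance is spoken."""
    return RECEPTION_SCENARIO[step_id].input_items
```

Keeping the guidance and its input items in the same record is what lets the display follow the spoken guidance step by step.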

In accordance with this dialogue scenario, voice guidance is output from the voice output unit, and the input items to be entered into the voice input unit in response to that guidance are displayed on the display unit in a display format that is easy for the user to understand.

Because the input content requested by the voice guidance is displayed on the display unit as input items, the user can easily grasp what to say and how to say it.

Such a voice interactive apparatus can be installed, for example, at the reception desk of a company and used as an automatic reception apparatus that, through a spoken dialogue with a visitor to the company, connects the visitor to the desired department or employee. In this case, the dialogue scenario is a visitor reception scenario in which the voice guidance and the expected answers are organized into a logical sequence leading up to connecting the visitor to a given department or a specific employee.

As shown in FIG. 2, the voice interactive apparatus of this embodiment, applied as an automatic reception apparatus, is installed on a reception counter 10. The front face of its housing 1 carries a touch panel display 2 serving as the display unit, a microphone 3 serving as the voice input unit, a speaker 4 serving as the voice output unit, and an infrared sensor 5 that detects the presence of a user.

The housing 1 contains an information processing apparatus, such as a computer, having a control unit, a storage unit (FIG. 1), and an input/output unit, together with other equipment as needed. The information processing apparatus, principally its control unit, performs the functions of the voice guidance output means, the display processing means, and the voice recognition means, while the storage unit, which consists of a hard disk or the like of the information processing apparatus, performs the function of the scenario storage means. The storage unit also stores various dictionaries associated with the dialogue scenario.

The storage unit also stores a voice interaction program that causes the information processing apparatus to function as each of the means described above (see FIG. 13), and the control unit executes the voice interactive processing of the apparatus in accordance with this program. The voice interaction program is recorded on any of various storage media, such as a CD, a DVD, a flexible disk, or a flash memory, and is read from the medium and stored in the storage unit.

In this way, in accordance with the voice interaction program, the information processing apparatus functioning as the voice recognition means outputs voice guidance from the speaker 4, collates the voice signal input by the user through the microphone 3 against the various dictionaries to generate character string data corresponding to the user's utterance, and, based on this data and the dialogue scenario, outputs further voice guidance and so on, thereby advancing the spoken dialogue with the user.

That is, as shown in FIG. 3, the voice interactive processing method performed by the voice interactive apparatus of this embodiment using this program comprises: a procedure S1 of outputting, from the voice output unit, voice guidance according to the dialogue scenario stored in the storage unit; a procedure S2 of displaying, on the display unit in a predetermined display format and according to the recognized input content and the dialogue scenario, the input items to be entered into the voice input unit in response to the voice guidance when that voice guidance is output from the voice output unit; and a procedure S3 of recognizing the input content based on the voice input to the voice input unit.

For example, when a visitor is detected, voice guidance such as "Welcome. Please state your business while checking the guidance on the screen. First, please tell me your name and affiliation." is output from the speaker 4. The input items the visitor should enter by voice in response to the guidance are then displayed on the touch panel display 2. As the display mode for the input items, a section for entering the visitor's own company name, a section for entering the visitor's name, and the like are displayed in a predetermined display format. The visitor, for their part, speaks while looking at the displayed company name input section and name input section and confirming what they should say. The voice interactive apparatus recognizes the utterance (voice) input through the microphone 3. The output of voice guidance and display of input items, the voice input by the visitor, and the voice recognition by the apparatus then proceed in accordance with the dialogue scenario (the visitor reception scenario).

As described above, the voice interactive apparatus of this embodiment does not advance the dialogue through sound alone, that is, through the apparatus's voice guidance and the user's utterances. It also presents a display that corresponds to the voice guidance, so that even a user who is not used to entering information by voice can easily grasp how to perform voice input. Specifically, when voice guidance is output from the speaker 4, the input items to be entered into the microphone 3 in response to that guidance are displayed on the touch panel display 2 in a predetermined display format, and the user can speak while looking at this display.

The user responds to the voice guidance by voice input. While speaking, however, the user has no way of knowing whether the information is actually being received by the voice interactive apparatus, and may become uneasy.

The display processing means therefore changes the display format of an input item displayed on the display unit when voice is input to the voice input unit while that input item is designated. Here, "an input item displayed on the display unit is designated" refers, for example, to the state in which the user is touching the input item displayed on the touch panel display 2 and the control unit is detecting the touch signal.

That is, when the user speaks while touching a displayed input item with a finger and the voice interactive apparatus accepts this voice input, the apparatus changes the display format to notify the user that the voice input has been accepted. An input item may be designated by a single brief touch, but it is preferable to require the user to keep touching the displayed input item while speaking, so that the control unit can easily recognize when the voice input starts and ends. Keeping the touch held during voice input is also desirable because it makes the user clearly conscious of the act of voice input. The following description therefore assumes that the touch on a displayed input item is maintained throughout the voice input.
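The touch-and-hold condition can be read as a simple gate on the audio stream: frames count as voice input only while the item is being touched, and the release marks the end of the utterance. The following is a minimal sketch of that reading; the function name and the `(touched, frame)` event format are assumptions made for illustration, not details from the specification.

```python
from typing import Iterable, List, Tuple


def frames_while_touched(events: Iterable[Tuple[bool, str]]) -> List[str]:
    """Collect audio frames captured while the input item is held.

    `events` is a hypothetical stream of (touched, frame) pairs sampled at a
    fixed rate. Capture starts at the first touched sample and stops at the
    first release after that, so the touch delimits the utterance.
    """
    captured: List[str] = []
    started = False
    for touched, frame in events:
        if touched:
            started = True
            captured.append(frame)
        elif started:
            break  # release after a touch: end of voice input
    return captured
```

Under this sketch, audio arriving before the touch or after the release is ignored, which is one way the control unit could determine the start and end timing without analyzing the signal itself.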

Accordingly, when entering the content of an input item by voice in response to the voice guidance, the user need only touch the input item displayed on the touch panel display 2 to confirm visually what information to speak and to confirm that the spoken information is being accepted by the apparatus, and can therefore use the voice interactive apparatus with confidence. The change in display format may be a change in shape, color, size, or the like and is not particularly limited. It is more preferable, however, for the change to follow the volume of the user's voice, since the user can then also see how loudly they are speaking.

Furthermore, the display processing means can also change the display format of an input item displayed on the display unit when voice recognition processing is started while that input item is designated.

In other words, when a displayed input item is touched with a finger and voice is input, and the voice interactive apparatus has accepted this voice input and has also started voice recognition processing so that the input is being acquired as information, the display format of the input item is changed further to notify the user that the voice input has been accepted. The change in display format at this point is preferably different from the change made when the voice input was accepted.

The processing executed by the display processing means thus includes (1) processing that, when the user speaks while touching an item with a finger and the voice interactive apparatus accepts this voice input, changes the display format to notify the user that the voice input has been accepted, and (2) processing that changes the display format of an input item displayed on the display unit when voice recognition processing is started while that input item is designated. Either one of these may be implemented, or both.

It is also preferable for the display processing means to display the input items on the display unit when a predetermined period has elapsed after the voice guidance output means outputs the voice guidance. This allows the visitor to concentrate on listening to the voice guidance first, without being distracted by the touch panel display 2, and only afterwards to turn their eyes to the touch panel display 2.

[2. Overview of the operation of the voice interactive apparatus]

An overview of the operation of the voice interactive apparatus described above is given below for the case in which a visitor to a company uses the apparatus. FIG. 4 is an explanatory diagram showing the flow of operation of the voice interactive apparatus of this embodiment, FIG. 5 is an explanatory diagram showing an outline of the voice input processing, FIGS. 6 to 9, 11, and 12 are explanatory diagrams of the visitor reception screen displayed on the touch panel display 2, and FIG. 10 is an explanatory diagram showing the procedure of the display format change processing according to the input level of the voice input.

As shown in FIG. 4, when a visitor stands in front of the voice interactive apparatus functioning as an automatic reception apparatus, the apparatus detects the visitor's arrival with the infrared sensor 5 (FIG. 2) (step S101).
Having detected the visitor, the voice interactive apparatus refers to the visitor reception scenario, described in detail later, and acquires a predetermined spoken line and a list of input items (step S102).

Next, the apparatus displays the section areas corresponding to the input items on the touch panel display 2 (step S103) and speaks the voice guidance (step S104).

For example, as shown in FIG. 6, the touch panel display 2 shows a character image representing a receptionist, together with the "company name input section" area and the "name input section" area in which the visitor should enter their own details by voice. Each area is displayed as a circle containing the text "Company" or "Name (surname)", respectively. The speaker 4 outputs voice guidance such as "Welcome. First, please tell me your name and affiliation.", as if the receptionist were speaking.

The apparatus then executes voice input processing based on the visitor's designation of a section area and voice input (step S105), and afterwards determines whether all the voice content called for by the visitor reception scenario has been recognized (step S106). If so, the visitor reception processing ends. If not, the processing of steps S102 to S105 is repeated.
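The flow of steps S102 to S106 can be sketched as a loop over scenario steps. The code below is only an illustrative outline: `run_reception` and its callback parameters are hypothetical names, the scenario is reduced to a list of dictionaries, and visitor detection (step S101) is assumed to have already happened.

```python
from typing import Callable, Dict, List


def run_reception(
    scenario: List[dict],
    show_items: Callable[[List[str]], None],
    speak: Callable[[str], None],
    voice_input: Callable[[List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Walk through the scenario until every step has been recognized."""
    recognized: Dict[str, str] = {}
    for step in scenario:                               # S102: get line and items
        show_items(step["items"])                       # S103: display sections
        speak(step["guidance"])                         # S104: speak guidance
        recognized.update(voice_input(step["items"]))   # S105: voice input
    return recognized                                   # S106: all steps done


# Usage with stand-in callbacks that record what would be shown and spoken.
shown, spoken = [], []
result = run_reception(
    [{"guidance": "Please tell me your name and affiliation.",
      "items": ["company", "name"]}],
    show_items=shown.append,
    speak=spoken.append,
    voice_input=lambda items: {item: "(recognized text)" for item in items},
)
```

Passing the display, speech, and recognition functions in as parameters keeps the loop itself independent of any particular hardware or recognizer.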

The voice input processing of step S105 is performed by the procedure shown in FIG. 5. A visitor who hears the voice guidance "First, please tell me your name and affiliation." and wants to enter a company name first touches the "company name input section" area on the touch panel display 2 with a finger.

The voice interactive apparatus detects the visitor's touch (step S201) and loads the voice recognition dictionary corresponding to the input item (step S202). A plurality of such dictionaries are stored in the storage unit in association with the visitor reception scenario.

That is, the information processing apparatus functioning as the voice recognition means has a plurality of voice recognition dictionaries, selects the voice recognition dictionary corresponding to the input item displayed on the touch panel display 2, and recognizes the voice input to the microphone 3.

For example, if the visitor enters their name first and their company name second, then when the visitor touches the "name input section" area to enter the name, the control unit selects and loads a nationwide name dictionary. When the "company name input section" area is touched next, the control unit selects and loads a nationwide company dictionary. Voice recognition is performed with the respective dictionary in each case.

When there are a plurality of input items to be entered into the voice input unit in response to the voice guidance, as with the "company name input section" and "name input section" areas above, the control unit of the information processing apparatus, functioning as the display processing means, displays each of these input items on the display unit in a predetermined display format. In this case the control unit can also select the voice recognition dictionary corresponding to the input content of an input item already recognized among the plurality of input items, and use the selected dictionary to recognize the voice input to the voice input unit for an input item not yet recognized.

That is, if the company name is entered first and the name second, the control unit selects and loads not the nationwide name dictionary but the employee name dictionary of that company, as the voice recognition dictionary corresponding to the input content of the already recognized input item (the company name). This narrows the range of the dictionary search and improves both the accuracy and the speed with which the visitor's name is recognized.
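The dictionary selection described above amounts to choosing a vocabulary from the touched input item and from whatever has already been recognized. The sketch below illustrates that choice under assumed data: the dictionary contents, the company name "example industries", and the function `select_dictionary` are all hypothetical, and the fallback to the nationwide dictionary for an unknown company is an assumption rather than something the specification states.

```python
from typing import Dict, Set

# Hypothetical dictionary contents, for illustration only.
NATIONAL_NAMES: Set[str] = {"sato", "suzuki", "takahashi", "tanaka"}
NATIONAL_COMPANIES: Set[str] = {"example industries", "sample trading"}
EMPLOYEES_BY_COMPANY: Dict[str, Set[str]] = {
    "example industries": {"sato", "tanaka"},
}


def select_dictionary(item: str, recognized: Dict[str, str]) -> Set[str]:
    """Pick the recognition vocabulary for the touched input item.

    If the company is already recognized, the name is matched against that
    company's employee list instead of the nationwide name dictionary.
    """
    if item == "company":
        return NATIONAL_COMPANIES
    if item == "name":
        company = recognized.get("company")
        # Assumption: fall back to the nationwide dictionary when the company
        # is unknown or has no employee list.
        return EMPLOYEES_BY_COMPANY.get(company, NATIONAL_NAMES)
    raise KeyError(f"no dictionary for input item: {item}")
```

With the company recognized first, the name search in this example covers two entries instead of four, which mirrors the narrowing effect described in the text.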

Next, the voice interaction apparatus changes the display area corresponding to the input item to a display indicating the input waiting state (step S203) and transitions the system state to the input waiting state (step S204).

If the “company name input section” area has been designated, then, as shown in FIG. 7, the circle of the “company name input section” area is drawn with multiple rings so that it can be distinguished from the “name input section” area, which remains a single circle.

Meanwhile, the visitor sees the touched “company name input section” area change as shown in FIG. 7 and thereby confirms that the apparatus is waiting for input. Aware that the company name is the item to speak now, the visitor can, while still touching the area, say with confidence, for example, “This is ○△ Industries.”

When the voice interaction apparatus accepts the voice input and starts recognizing the input content, it executes the display format change process (step S206).

For example, as shown in FIG. 8, the color and its density are changed in three levels (first to third colors) according to the input level of the voice input. This display lets the visitor confirm that his or her voice is being accepted.

The display change shown in FIG. 8, that is, the process of changing the display format according to the input level of the voice input, is performed by the procedure shown in FIG. 10.

As shown in FIG. 10, the control unit of the voice interaction apparatus first acquires the voice input level (step S300). It then reads the allowable range data for the voice input level stored in advance in the storage unit (step S301) and determines whether the input level exceeds the allowable range (step S302). If it does, the process moves to step S303, and the display area (for example, the “company name input section” area) is set to the first color (for example, light blue) (FIG. 8(a)).

If it is determined in step S302 that the input level does not exceed the allowable range, it is determined whether the input level falls below the allowable range (step S304). If it does, the process moves to step S305, and the display area (for example, the “company name input section” area) is set to the second color (for example, green of normal density) (FIG. 8(b)).

If it is determined in step S304 that the input level does not fall below the allowable range, the process moves to step S306, and the display area (for example, the “company name input section” area) is set to the third color (for example, high-density red) (FIG. 8(c)).

After step S303, S305, or S306, it is determined whether the visitor's voice input has ended (step S307). If it has, the display format change process ends; if it has not, the process returns to step S300.
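The loop of steps S300 to S307 can be sketched as follows. The sketch assumes that the allowable range is a pair of numeric bounds and that input levels arrive as a sequence of samples whose end stands for the end of voice input; the description specifies neither. The function names and color strings are hypothetical, and the color assignments simply follow the order given in the text.

```python
FIRST_COLOR = "light blue"        # set in step S303
SECOND_COLOR = "green"            # set in step S305
THIRD_COLOR = "high-density red"  # set in step S306


def color_for_level(level, allowed_min, allowed_max):
    """Map one input level to a display color, following S302 to S306."""
    if level > allowed_max:       # S302: level exceeds the allowable range
        return FIRST_COLOR        # S303
    if level < allowed_min:       # S304: level falls below the allowable range
        return SECOND_COLOR       # S305
    return THIRD_COLOR            # S306


def run_display_format_change(levels, allowed_min, allowed_max):
    """Return the color chosen for each sampled level.

    Each loop iteration corresponds to one pass through S300 to S306;
    running out of samples stands in for the end-of-input check (S307).
    """
    return [color_for_level(level, allowed_min, allowed_max) for level in levels]
```

For instance, with an assumed range of 30 to 70, the samples `[90, 10, 50]` would map to the first, second, and third colors in that order.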

Also, in the display format change process of step S206, when speech recognition starts for the visitor's utterance “This is ○△ Industries.”, the display mode (display format) is changed, for example by light blinking radially within the “company name input section” area, as shown in FIG. 9. This display lets the visitor confirm that the spoken name is being recognized.

Returning to FIG. 5, when the control unit of the voice interaction apparatus finishes the display format change process (step S206), the visitor can confirm from the change in display format shown in FIG. 9 that the apparatus has started recognizing the content of the utterance. The visitor therefore releases the finger from the touched “company name input section” area to end the utterance.

The control unit of the voice interaction apparatus recognizes the input speech using the dictionary loaded in step S202 (step S208) and displays the recognition results around the “company name input section” area in order of likelihood (step S209). For example, as shown in FIG. 11, multiple recognition result candidates are displayed in section areas of a predetermined shape, with sizes that differ in order of likelihood, so as to surround the “company name input section” area drawn as a circle. Here the candidates use circular section areas like the “company name input section” area, but rectangles or other shapes may be used.
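A minimal sketch of how the candidates might be ordered and sized for display follows. The description states only that the candidate areas differ in size in order of likelihood; the choice that more likely candidates are drawn larger, the linear radius scale, and the names `layout_candidates`, `max_radius`, and `min_radius` are assumptions made for illustration.

```python
def layout_candidates(hypotheses, max_radius=60, min_radius=30):
    """Rank recognition hypotheses and assign each a circle radius.

    hypotheses: list of (text, likelihood) pairs from the recognizer.
    Returns a list of dicts sorted from most to least likely. The most
    likely candidate gets max_radius and the least likely gets min_radius
    (assumed sizing rule). A single candidate simply gets max_radius.
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    count = len(ranked)
    step = (max_radius - min_radius) / (count - 1) if count > 1 else 0
    return [
        {"text": text, "likelihood": likelihood, "radius": max_radius - index * step}
        for index, (text, likelihood) in enumerate(ranked)
    ]
```

Each returned entry could then be drawn as a touchable circle around the input area, so that touching a circle selects the corresponding candidate.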

Meanwhile, the visitor checks the recognition result candidates displayed around the “company name input section” area and, if one of them correctly reflects the voice input, touches the part of the touch panel display 2 where that result is displayed. An NG button (not shown) is also displayed on the touch panel display 2; if the visitor judges that the recognition results do not reflect the voice input, the visitor touches the NG button and redoes the voice input.

Note that the recognition performed in step S208 may yield a single candidate rather than several. In that case, only one section area of the predetermined shape used for recognition result candidates is displayed, and it functions as a confirmation button. An OK button (not shown) may also be displayed together with the NG button so that these can be used as confirmation buttons.

In this way, the control unit of the voice interaction apparatus functions as the speech recognition means and also as candidate display means that displays multiple input content candidates recognized by the speech recognition means on the touch panel display 2, selection means that selects one of the displayed candidates, and input processing means that determines the selected candidate as the input content.

After step S209, when the control unit of the voice interaction apparatus detects a touch signal from the touch panel display 2 indicating the recognition result the visitor has chosen as correct, it displays that result inside the “company name input section” area and draws the circle of the area with a thick line, as shown in FIG. 12, to notify the visitor that the voice input has been confirmed.

Thus, the voice interaction apparatus according to this embodiment includes, as its interface, the touch panel display 2 in addition to the voice input unit and the voice output unit, so that the user can also grasp the content of the voice guidance visually. This allows the dialogue to proceed more smoothly.

The voice interaction apparatus according to this embodiment is described below more specifically with reference to the drawings. As before, the description takes as an example the process in which the apparatus receives a visitor to a company and arranges a meeting with the employee the visitor wishes to see.

[3. Specific Configuration of the Voice Interaction Processing Apparatus]
FIG. 13 is a block diagram showing the electrical configuration of the voice interaction apparatus according to this embodiment. FIG. 14 is a visitor reception scenario table showing, in tabular form, a first visitor reception scenario, which is an example of the dialogue scenario file stored in the storage unit. FIG. 15 is a visit reservation table showing, in tabular form, the visit reservation data stored in the storage unit in association with the visitor reception scenario. FIG. 16 is an explanatory diagram showing the flow of the voice dialogue that proceeds according to the visitor reception scenario. FIG. 17 is an explanatory diagram of a second visitor reception scenario table. FIG. 18 is an explanatory diagram of the visitor reception screen displayed on the touch panel display 2.

As shown in FIG. 13, and as described earlier with reference to FIG. 1, the voice interaction apparatus includes the touch panel display 2 (display unit), the microphone 3 (voice input unit), the speaker 4 (voice output unit), and the infrared sensor 5, together with an information processing apparatus 6 connected to them via an input/output unit. The information processing apparatus 6 includes a control unit made up of a CPU 61, a ROM 62, a RAM 63, and the like, and a hard disk device 7 (hereinafter “HDD 7”) serving as the storage unit. Although omitted from the figure, the information processing apparatus 6 also includes a display control circuit that controls the touch panel display 2 as display processing means, an audio output circuit that controls audio output from the speaker 4, and so on.

The HDD 7 stores a system program for controlling the entire voice interaction apparatus, a voice interaction program for performing voice interaction processing, the dialogue scenario file, speech recognition dictionaries, speech data for utterances, and the like. The voice interaction program consists of a dialogue control program, a voice input program, an input level determination program, a speech recognition program, and so on. A speech synthesis program may be used in place of the speech data for utterances.

The information processing apparatus that performs main control, including the HDD 7 constituting the storage unit, need not be housed in the housing 1; it may be provided, for example, in a separately installed workstation or server. In that case, the system may be configured so that the device with the housing 1 shown in FIG. 2 is used as a terminal device connected wirelessly or by wire to the workstation or server.

The dialogue scenario file stores multiple dialogue scenarios for various situations in table form. In the voice interaction apparatus of this embodiment, the first visitor reception scenario table shown in FIG. 14 or the second visitor reception scenario table shown in FIG. 17 is referenced first.

As shown in FIG. 14, the first visitor reception scenario table has multiple ID columns, arranged in time series, for acquiring the information needed to determine the specific matters required to handle the visitor's business. Each ID column is associated with a title, a line to be uttered as voice guidance, and the input items that the visitor, as the user, should input by voice in response to the voice guidance. Each input item is in turn associated with an item name, a recognition dictionary (speech recognition dictionary) for recognizing the input item, a dependency check for determining whether the item is a dependent item derived from another input item, and a required check identifying whether the input item is required or optional.

Specifically, ID1 to ID4 are set in the first visitor reception scenario table. The title of the ID1 column is “Visitor identification 1”, and the line output as voice guidance to identify the visitor is “Hello. Please tell me your company name and your name.” The item names of the input items that the visitor inputs by voice are “company name” and “name (family name)”. In ID1, the dependency check for both “company name” and “name (family name)” carries a flag indicating that the item is not dependent, and the required check for both carries a flag indicating that the item is required.

As recognition dictionaries, the “company name dictionary” is set for “company name” and the “family name dictionary” for “name (family name)”. That is, the information processing apparatus, functioning as the speech recognition means, has multiple speech recognition dictionaries and selects the dictionary appropriate to the input item displayed on the touch panel display 2 to recognize the speech input via the microphone 3.

The title of the ID2 column is “Department identification”, and the voice guidance line for identifying the department is “Please also tell me your department name.” The item name of the input item is “department name”, and the “department name dictionary” is set as the recognition dictionary. Because the ID2 input item is a dependent item derived from the ID1 input item “company name”, its dependency check carries a flag indicating dependency. Because the item is also required in order to identify the department, its required check carries a flag indicating that it is required.

The title of the ID3 column is “Visitor identification 2”. Because it is a dependent item used when the visitor cannot be identified by “Visitor identification 1”, its dependency check carries a flag indicating dependency. The line output as voice guidance is “I'm sorry, but may I also have your given name?”, and the item name of the input item that the visitor inputs by voice is “name (given name)”. The “name dictionary” is set as the recognition dictionary.

The title of the ID4 column is “Guidance identification”. It concludes the dialogue scenario, and no voice input from the visitor is expected, so only lines are set in ID4. Two lines are set: “I will connect you to the person in charge now.” and “No appointment is registered.”
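The first visitor reception scenario table described above could be represented in memory roughly as follows. The class and field names (`InputItem`, `ScenarioStep`, `FIRST_SCENARIO`, `dictionaries_for`) are hypothetical, the English strings are translations of the guidance lines, and the required flag for ID3 is left as `None` because the description does not state it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputItem:
    name: str                 # item name
    dictionary: str           # recognition dictionary for this item
    dependent: bool           # dependency check flag
    required: Optional[bool]  # required check flag; None where not stated


@dataclass
class ScenarioStep:
    step_id: int              # ID column number
    title: str
    lines: List[str]          # guidance lines spoken by the apparatus
    items: List[InputItem] = field(default_factory=list)


FIRST_SCENARIO = [
    ScenarioStep(1, "Visitor identification 1",
                 ["Hello. Please tell me your company name and your name."],
                 [InputItem("company name", "company name dictionary", False, True),
                  InputItem("name (family name)", "family name dictionary", False, True)]),
    ScenarioStep(2, "Department identification",
                 ["Please also tell me your department name."],
                 [InputItem("department name", "department name dictionary", True, True)]),
    ScenarioStep(3, "Visitor identification 2",
                 ["I'm sorry, but may I also have your given name?"],
                 [InputItem("name (given name)", "name dictionary", True, None)]),
    ScenarioStep(4, "Guidance identification",
                 ["I will connect you to the person in charge now.",
                  "No appointment is registered."]),  # no input items expected
]


def dictionaries_for(step):
    """List the recognition dictionaries needed for one scenario step."""
    return [item.dictionary for item in step.items]
```

With this layout, the apparatus could look up which dictionaries to load for a step before prompting, for example `dictionaries_for(FIRST_SCENARIO[0])` for ID1.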

The visit reservation table shown in FIG. 15 is stored in association with the visitor reception scenario table. As illustrated, it compiles data on visitors who have made visit reservations under the items “visitor company name”, “visitor department”, “visitor name”, “scheduled visit date and time”, “name of person in charge”, and “telephone number of person in charge”.

[4. Progress of Voice Dialogue by the Voice Interaction Processing Apparatus]
How the voice dialogue proceeds according to the first visitor reception scenario is described with reference to FIG. 16. When the infrared sensor 5 detects a visitor, the voice interaction apparatus refers to the first visitor reception scenario and, as illustrated, first executes the utterance process and the display process for the display areas corresponding to the input items (the “company name input section” area and the “name input section” area) (step S400).

After the visitor performs voice input by the procedure described above (see FIG. 5), the apparatus determines from the result whether the visitor's company is a group company (step S401). This determination is based on the company name dictionary used to recognize the visitor's voice input: the company name data in the dictionary carries an identifier indicating whether each company is a group company.

If the company is determined to be a group company, the voice interaction apparatus outputs the voice guidance “Please also tell me your department.” in step S402, based on ID2 of the visitor reception scenario table (FIG. 14). At this time, as shown in FIG. 18, the department name (affiliation), an input item dependent on the company name, is displayed on the touch panel display 2 connected by a line to the “company name input section” area to show that the two are related.

That is, when the input content for an input item is a predetermined input content, the information processing apparatus of the voice interaction apparatus can display on the touch panel display 2 an input item dependent on an input item that is already displayed.

Following the voice guidance “Please also tell me your department.”, when the visitor, again by the procedure shown in FIG. 5, speaks the department while touching the department name (affiliation) part of the touch panel display 2, the information processing apparatus recognizes the speech, generates visitor data, and refers to the visitor reservation data (step S403). It then determines whether the generated visitor data matches an entry in the visitor reservation data (step S404). The visitor reservation data is the data of the visit reservation table shown in FIG. 15.

If the visitor reservation table contains no data for the visitor identified by speech recognition, the visitor is judged to have no reservation, and the process moves to step S405. In this step, the information processing apparatus outputs a voice message such as “No appointment is registered.” Then, in step S406, it hands the dialogue over to another dialogue scenario, for example a business confirmation scenario, and ends the process.

Thereafter, the business confirmation scenario is read and the business confirmation process proceeds. In this scenario, for example, the voice guidance “No appointment is registered. May I ask the purpose of your visit?” is output, and the input items to be entered in response to this guidance (for example, meeting, delivery, sales, other) are displayed on the touch panel display 2.

On the other hand, if the visitor data does not match any of the visitor reservation data in step S404, it is determined whether the visitor reservation data contains data with the same family name as the visitor data (step S407).

If there is no data with the same family name, the process moves to step S410. If there is, the voice guidance “I'm sorry, but may I also have your given name?” is output in step S408. At this time, the name (given name), an input item dependent on the name (family name), is displayed on the touch panel display 2 in the same manner as the department name (affiliation) shown in FIG. 18, which is connected by a line to the “company name input section” area.

When the visitor, again by the procedure shown in FIG. 5, speaks the given name while touching the name (given name) part of the touch panel display 2, the information processing apparatus recognizes the speech, regenerates the visitor data, and refers to the visitor reservation data. It then determines whether the regenerated visitor data matches an entry in the visitor reservation data (step S409).

If there is no matching name data, the process moves to step S405. If there is, the voice message “I will connect you to the person in charge now.” is output in step S410, and the voice interaction process for visitor reception according to the visitor reception scenario ends.

Because the voice interaction apparatus according to this embodiment is used as an automatic reception apparatus, it is configured with predetermined communication means capable of communicating with employees. For example, if the communication means is a telephone, then after step S410 the apparatus can refer to the person-in-charge telephone number in the visit reservation table, connect to the person in charge, and announce the visitor by voice. Alternatively, it can perform communication control so that the visitor can talk directly with the person in charge (the employee) via the microphone 3 and the speaker 4 of the apparatus. In other words, the visitor reception process of the automatic reception apparatus using voice interaction processing includes the process of contacting the person in charge.

In the voice interaction process described above, which refers to the first visitor reception scenario table, the visitor is finally handed over to a person in charge by consulting the visit reservation table (FIG. 15) and selecting the person in charge associated with the expected visitor.

In this case, the visit reservation table must contain data indicating the person in charge. To allow visitor reception by voice dialogue even when the visit reservation table has no person-in-charge data, the second visitor reception scenario table, which is another example of the dialogue scenario file, may be used.

The second visitor reception scenario table differs from the first visitor reception scenario table in that it has the visitor input the name of the person in charge by voice.

That is, as shown in FIG. 17, ID1 to ID3 are set in the second visitor reception scenario table. The title of the ID1 column is “Visitor identification”, and it has the same line and input items (item name, recognition dictionary, dependency check, required check) as “Visitor identification 1” of ID1 in the first visitor reception scenario table shown in FIG. 14.

The ID2 column has the same title, “Department identification”, as the ID2 column of the first visitor reception scenario table, and the same line and input items (item name, recognition dictionary, dependency check, required check).

The title of the ID3 column is “Person-in-charge identification”, and the line output as voice guidance to identify the visitor is “Please tell me the name of the person in charge.” The item names of the input items that the visitor inputs by voice are “name of person in charge” and “department name”. In the dependency check of ID3, both “name of person in charge” and “department name” carry a flag indicating that the item is not dependent. In the required check, “name of person in charge” carries a flag indicating that it is required, while “department name” carries a flag indicating that it is optional.

As recognition dictionaries, the “person-in-charge name dictionary” is set for “name of person in charge” and the “department name dictionary” for “department name”.

[5. Specific Voice Interaction Processing by the Voice Interaction Processing Apparatus]
Visitor reception processing by voice interaction according to the second visitor reception scenario table is described below with reference to FIGS. 19 to 25. FIG. 19 is a flowchart showing an example of the voice interaction processing that proceeds according to the visitor reception scenario, FIGS. 20 and 21 are flowcharts showing subroutines of that processing, and FIGS. 22 to 25 are explanatory diagrams of screens displayed on the touch panel display. The following processing is executed by an automatic reception apparatus to which the voice interaction apparatus with the electrical configuration shown in FIG. 13 is applied. The flow starts from the point at which power is already on, the system program has started, the voice interaction program has been read, all initial settings are complete, and a visitor is standing in front of the apparatus.

As shown in FIG. 19, when the infrared sensor 5 detects a visitor (step S500), the CPU 61 in the apparatus reads the second visitor reception scenario table (FIG. 17) from the dialogue scenario file and loads the visitor identification (ID1) data (step S501).

A dialogue with the visitor is then executed according to this visitor identification (ID1) data (step S502). Step S502 identifies the visitor's company name and name. The dialogue execution process performed in step S502 is described in detail later with reference to the subroutines shown in FIGS. 20 and 21.

When the company name has been recognized through visitor identification, the CPU 61 determines whether the company is one of its own group companies (step S503). The CPU 61 makes this determination by referring to the company name dictionary specified in the second visitor reception scenario table (see the description of step S401 in FIG. 16).

If the company is a group company, the CPU 61 loads the department identification (ID2) data from the second visitor reception scenario table (step S504) and conducts a dialogue with the visitor according to the department identification (ID2) (step S505). Step S505 is the same kind of processing as step S502 and is also described in detail later.

Next, the CPU 61 refers to the visit reservation data (FIG. 15) in the visit reservation table (step S506) and determines whether the visitor data generated by recognizing the visitor's voice input is present in the visit reservation data (step S507).

If the visitor data is found in the visit reservation data, the CPU 61 proceeds to step S510. If it is not found, the CPU 61 loads the person-in-charge identification (ID3) data from the second visitor reception scenario table (step S508), conducts a dialogue with the visitor according to the person-in-charge identification (ID3) (step S509), and then proceeds to step S510. Step S509 is likewise the same kind of processing as step S502 and is described in detail later.

In step S510, the CPU 61 obtains the telephone number of the person in charge from the visit reservation table. It then contacts the person in charge by telephone (step S511), announces the visitor's arrival by voice, and ends the visitor reception process. Alternatively, in step S511, as described earlier, the CPU 61 may perform communication control so that the visitor can talk directly with the person in charge (an employee) through the microphone 3 and the speaker 4 of the voice interaction device (automatic reception device), and end the visitor reception process when the call ends.
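The top-level flow of steps S500 to S511 can be sketched in Python as follows. This is a minimal illustration, not the implementation described in this document: `Reservation`, `run_reception`, and the injected callables `run_dialog`, `is_group_company`, and `lookup_phone` are hypothetical names. Two behaviours are assumptions that the text does not state: a visitor from a non-group company goes straight to the reservation check, and a reservation is matched on company name and visitor name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Reservation:
    """One row of a hypothetical visit reservation table."""
    company: str
    name: str
    contact_phone: str


def run_reception(
    run_dialog: Callable[[str], Dict[str, str]],
    is_group_company: Callable[[str], bool],
    reservations: List[Reservation],
    lookup_phone: Callable[[Dict[str, str]], str],
) -> str:
    """Return the phone number to call (S510); placing the call (S511) is left to the caller."""
    visitor = run_dialog("ID1")                      # S501-S502: company name and name
    if is_group_company(visitor["company"]):         # S503
        visitor.update(run_dialog("ID2"))            # S504-S505: department
    # S506-S507: assumed matching rule (company name and visitor name).
    match = next(
        (r for r in reservations
         if r.company == visitor["company"] and r.name == visitor["name"]),
        None,
    )
    if match is not None:
        return match.contact_phone                   # S510
    visitor.update(run_dialog("ID3"))                # S508-S509: person in charge
    return lookup_phone(visitor)                     # S510 (assumed lookup by person in charge)
```

Passing the dialogue and lookup steps in as callables keeps the branching of FIG. 19 visible on its own and lets the flow be exercised without a microphone or display.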

The dialogue execution process of steps S502, S505, and S509 is now described in detail with reference to FIGS. 20 and 21. Because steps S502, S505, and S509 all follow basically the same processing flow, the description below is based on step S502, with steps S505 and S509 explained where appropriate.

As shown in FIG. 20, the CPU 61 first executes the input screen display process and the spoken-line playback process according to the visitor identification (ID1) in the second visitor reception scenario table (step S600).

Step S600 is executed by the subroutine shown in FIG. 21. First, the CPU 61 obtains the list of input items from the corresponding data in the dialogue scenario table (step S700). For the data loaded in step S501 of FIG. 19, it obtains the items of the visitor identification (ID1) in the second visitor reception scenario table; for step S504, those of the department identification (ID2); and for step S508, those of the person-in-charge identification (ID3).

Next, the CPU 61 determines whether any of the input items of the obtained ID1 (or ID2 or ID3) is subordinate to another input item (step S701). If there is a subordinate item, the process proceeds to step S704. If there is none, the CPU 61 initializes the screen in step S702, deleting any input areas already displayed, displays the confirmation button described earlier in the overview of voice input processing (FIG. 5) (step S703), and then proceeds to step S704.

The CPU 61 then starts the input-area processing shown in steps S704 to S711, which is repeated for every input item of the obtained ID1 (or ID2 or ID3): the company name and name (surname) for ID1, the department name only for ID2, and the person-in-charge name and department name for ID3.

First, the CPU 61 checks whether the input item is required (step S705). If it is required, the input area for the item is displayed at the size used for required items (step S706); if it is optional, the input area is drawn at the size used for optional items (step S707).

As shown in FIG. 17, the input items of ID1, namely the company name and name (surname), and the department name of ID2 are all required, so step S706 is performed in the dialogue execution process that follows ID1 in this visitor reception process.

In ID3, on the other hand, the department name is optional while the person-in-charge name is required. The CPU 61 therefore displays, as shown in FIG. 22, a "department name input section" that is smaller than the required "person-in-charge name input section".

In this way, when the CPU 61 of the information processing device displays on the touch panel display 2 an optional input item in addition to the required input items to be entered through the voice input unit in response to the voice guidance, it changes the display format between required and optional items. With this display control, even when several input items are shown on the touch panel display 2, the visitor can tell at a glance which items must be entered by voice, which makes the voice interaction device easier to use.
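The required/optional sizing of steps S705 to S707 amounts to choosing a display size from a per-item flag. The sketch below illustrates that idea only; `InputItem`, `area_size`, `layout`, and the pixel sizes are assumptions introduced here, since the document gives no concrete dimensions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed sizes in pixels (width, height); the document specifies no dimensions.
REQUIRED_SIZE: Tuple[int, int] = (320, 120)
OPTIONAL_SIZE: Tuple[int, int] = (200, 70)


@dataclass(frozen=True)
class InputItem:
    name: str
    required: bool  # result of the required check (S705)


def area_size(item: InputItem) -> Tuple[int, int]:
    """Pick the larger area for required items (S706), the smaller for optional ones (S707)."""
    return REQUIRED_SIZE if item.required else OPTIONAL_SIZE


def layout(items: List[InputItem]) -> List[Tuple[str, Tuple[int, int]]]:
    """Return (item name, size) pairs in display order."""
    return [(item.name, area_size(item)) for item in items]
```

For the ID3 items, `layout([InputItem("person in charge", True), InputItem("department", False)])` gives the person-in-charge area the required size and the department area the smaller optional size, matching the arrangement described for FIG. 22.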

Next, the CPU 61 determines whether the input item of ID1 (or ID2 or ID3) is subordinate to another input item (step S708) and, if so, identifies the input item it is subordinate to (step S709).

If the item is subordinate to another, as the department name of ID2 is to the company name of ID1, the CPU 61 draws a line indicating the relationship between the two input areas (between the "company name input section" and the "department name input section"), as shown for example in FIG. 18 (step S710). If the item is not subordinate, this input-area processing ends (step S711) and the process proceeds to step S712.

As shown in FIG. 17, the input items of ID1 and ID3, namely the company name and name (surname) and the person-in-charge name and department name, are not subordinate to any other item, so steps S709 and S710 are not performed in the dialogue execution process that follows ID1 in this visitor reception process.
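The subordinate-item handling of steps S708 to S710 can be pictured as deriving (parent, child) pairs from the scenario data and drawing one line per pair. The following is a small sketch under that reading; `relation_lines` and the `depends_on` mapping are hypothetical, and the actual table layout of FIG. 17 is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple


def relation_lines(depends_on: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs for which a connecting line would be drawn (S710).

    `depends_on` maps each input item to the item it is subordinate to,
    or to None when it is independent (S708 answers "no").
    """
    return [
        (parent, child)
        for child, parent in depends_on.items()
        if parent is not None                      # S708: is the item subordinate?
    ]                                              # S709: parent identifies the other item


# Assumed example: the department name of ID2 hangs off the company name of ID1.
items = {"company": None, "name": None, "department": "company"}
lines = relation_lines(items)  # one line: company -> department
```

Items that map to `None` contribute nothing, which corresponds to skipping steps S709 and S710 for independent items.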

In step S712, the CPU 61 obtains the spoken line set for the visitor identification (ID1) in the second visitor reception scenario table, namely "Hello. Please tell me your company name and your name."

The CPU 61 then outputs the obtained line from the speaker 4 as voice guidance (step S713) and proceeds to step S601 in FIG. 20.

In step S601, the visitor makes a voice input. This follows the flow described in the overview of voice input processing (see FIG. 5).

That is, when the visitor touches the area of a given input item on the touch panel display 2, the CPU 61 loads the speech recognition dictionary corresponding to that item, changes the item's display area to indicate that input is awaited (see FIG. 7), and moves the system into the input-waiting state. In that state the CPU 61 accepts voice input (see FIG. 8) and, when recognition of the input begins, executes the display format change process (see FIG. 9). It then recognizes the input speech using the dictionary loaded earlier and, if there are several recognition results, displays them in order of likelihood (see FIG. 11).
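The last part of this flow, restricting recognition to the dictionary of the touched item and listing several results by likelihood, can be sketched as below. The recognizer itself is out of scope: `hypotheses` stands in for whatever scored results a recognizer would return, and `DICTIONARIES`, `rank_candidates`, and the sample vocabulary are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-item vocabularies standing in for the speech recognition dictionaries.
DICTIONARIES: Dict[str, Set[str]] = {
    "company": {"Alpha Industries", "Beta Sales", "Gamma Trading"},
    "name": {"Sato", "Suzuki", "Tanaka"},
}


def rank_candidates(
    item: str,
    hypotheses: List[Tuple[str, float]],
    max_results: int = 3,
) -> List[str]:
    """Keep hypotheses found in the item's dictionary, most likely first."""
    vocabulary = DICTIONARIES[item]                # dictionary loaded when the item is touched
    in_vocabulary = [(text, score) for text, score in hypotheses if text in vocabulary]
    in_vocabulary.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in in_vocabulary[:max_results]]
```

A hypothesis outside the active dictionary is dropped even when its score is high, which is one simple way to model recognition that is limited to the vocabulary of the selected item.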

When the visitor checks the recognition result displayed on the touch panel display 2 and presses, for example, the OK button (step S602), the CPU 61 determines whether all required input items of ID1 (or ID2 or ID3) have been entered (step S603). If any required item has not been entered, the CPU 61 outputs an utterance prompting for it, such as "The XX item has not been entered." (step S604), and returns to step S601. If it determines in step S603 that all required items have been entered, the CPU 61 ends the processing of step S502, S505, or S509 and proceeds to step S503, S506, or S510, respectively.

The line data for the utterance in step S604 is stored in the HDD 7 as speech data for utterance, associated with each ID of each dialogue scenario.
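The loop formed by steps S601 to S604 can be sketched as follows. The function names `missing_required` and `collect_inputs`, the `(name, required)` item tuples, and the English prompt text are assumptions for illustration; input capture and speech output are passed in as callables so the loop can run without hardware.

```python
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, bool]  # (item name, required?)


def missing_required(items: List[Item], entered: Dict[str, str]) -> List[str]:
    """Names of required items that have no value yet (the check of S603)."""
    return [name for name, required in items if required and not entered.get(name)]


def collect_inputs(
    items: List[Item],
    read_confirmed_input: Callable[[], Dict[str, str]],
    speak: Callable[[str], None],
) -> Dict[str, str]:
    """Repeat input until every required item is filled."""
    entered: Dict[str, str] = {}
    while True:
        entered.update(read_confirmed_input())       # S601-S602: input confirmed with OK
        missing = missing_required(items, entered)   # S603
        if not missing:
            return entered                           # back to the caller (S503, S506, or S510)
        speak(f"The {missing[0]} item has not been entered.")  # S604
```

Optional items never appear in the missing list, so the loop ends as soon as the required ones are filled.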

As described above, when visitor reception is handled with the voice interaction device, even a user who is not used to entering information by voice can easily understand what to say and how to say it, so anyone can use the device with confidence.

In the embodiment described above, the device may also display an input example between steps S600 and S601 shown in FIG. 20. Specifically, if a predetermined time elapses after the voice guidance is output in step S713 of FIG. 21 while the visitor is touching the display area of the item to be entered on the touch panel display 2 but has not yet started speaking (see FIG. 5), the information processing device, functioning as the display processing means, can display an example of the input for that item on the touch panel display 2.

In other words, the voice interaction device can be configured to display an input example for an input item when a predetermined period has elapsed since that item, displayed on the touch panel display 2, was designated.

As for the display form, when a company name is to be entered, for example, input examples such as "○△ Industries", "×× Sales", and "△× Trading" are listed beside the "company name input section", as shown in FIG. 23. If the user (visitor) hesitates over, say, whether the legal designation of the company (such as "Co., Ltd.") must also be spoken, the examples appear close to the area being touched, which prevents voice input from taking needlessly long.
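The timing rule for showing input examples can be expressed as a pure function of timestamps, as in the sketch below. The five-second delay, the names `HINT_DELAY_SECONDS`, `INPUT_EXAMPLES`, and `examples_to_show`, and the example strings are assumptions; the document only speaks of "a predetermined time".

```python
from typing import Dict, List

HINT_DELAY_SECONDS = 5.0  # assumed value for the "predetermined time"

# Placeholder examples per input item (hypothetical data).
INPUT_EXAMPLES: Dict[str, List[str]] = {
    "company": ["○△ Industries", "×× Sales", "△× Trading"],
}


def examples_to_show(item: str, touched_at: float, now: float, speech_started: bool) -> List[str]:
    """Return the examples to list beside the touched area, or an empty list.

    Examples appear only when the item has been held for the delay
    without the visitor starting to speak.
    """
    if speech_started:
        return []
    if now - touched_at < HINT_DELAY_SECONDS:
        return []
    return INPUT_EXAMPLES.get(item, [])
```

Taking `touched_at` and `now` as arguments, rather than reading a clock inside the function, keeps the rule easy to check at specific instants.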

Also, when an item is subordinate to an input item, as in step S402 of FIG. 16, the input section areas are displayed connected by a line to show that they are related, as shown in FIG. 18. When, for example, there are three or more input section areas on the touch panel display 2, the display can instead show, as if by animation, which input section area the subordinate item's area (the "department input section" in FIG. 24) was derived from, as shown in FIG. 24.

As the voice dialogue proceeds, the item the user (visitor) is to enter by voice may also be a simple answer such as "Yes" or "No", as shown in FIG. 25.

In such a case, the voice interaction device may accept the input when the user (visitor) merely touches the input section area for the item, without any voice input. That is, simple or sensitive input is completed by touch on the touch panel display 2 alone, without requiring the user to speak.

This lets the user use the voice interaction device with confidence even when the input is something they would not want others to overhear.

A visitor may also arrive unannounced, without an appointment. If the first or second visitor reception scenario described above is followed in that case, the first scenario ends, as described with reference to FIG. 16, with the voice output "No appointment is registered." (see step S405) and hands over to a scenario such as the business confirmation scenario; the second scenario (see FIGS. 19 to 21) requires the visitor to first enter their name, affiliation, and so on by voice. Either path may take time.

It is therefore advisable, for example, to prepare a no-appointment reception scenario in advance and to display a "No appointment" button area on the touch panel display 2. When the "No appointment" button area is touched, the no-appointment reception scenario is read and a line such as "Please tell me which employee you would like to contact." is output from the speaker 4 as voice guidance. The screen displayed on the touch panel display 2 at that time may take the form shown in FIG. 26.

FIG. 26 is an explanatory diagram of the screen displayed on the touch panel display 2 in this case. As illustrated, the department name and the employee's name are required items and are displayed in input section areas of a predetermined size, whereas optional input items such as the group name are displayed in smaller input section areas. Here too, when an optional input item is displayed on the touch panel display 2 in addition to the required input items to be entered through the voice input unit in response to the voice guidance, the display follows the control that changes the display format between required and optional items.

Although not illustrated, the table for the no-appointment reception scenario sets department name, name (surname), and group as the item names of the input items. In the required-item check, the department name and name carry a flag indicating that they are required, and the group carries a flag indicating that it is optional.
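Since the no-appointment scenario table is not illustrated, one possible shape for it is sketched below, together with the button handler that would read it. Everything here is hypothetical: the `SCENARIOS` registry, the `"no_appointment"` key, the `on_button_touched` function, and the English prompt text are assumptions based only on the item names and flags listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical scenario registry; each item is (item name, required flag).
SCENARIOS: Dict[str, dict] = {
    "no_appointment": {
        "prompt": "Please tell me which employee you would like to contact.",
        "items": [("department", True), ("name", True), ("group", False)],
    },
}


def on_button_touched(button_id: str) -> Tuple[str, List[str], List[str]]:
    """Read the scenario tied to a touched button.

    Returns the voice guidance line, the required item names, and the
    optional item names, in table order.
    """
    scenario = SCENARIOS[button_id]
    required = [name for name, flag in scenario["items"] if flag]
    optional = [name for name, flag in scenario["items"] if not flag]
    return scenario["prompt"], required, optional
```

Keeping the prompt and the item flags in one record mirrors the pairing of voice guidance and input items that the dialogue scenario is described as storing.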

In the embodiment described above, the voice interaction device detects the user (visitor) with the infrared sensor 5, but any device or means capable of detecting a user may be used. For example, it may be a switch that the visitor operates to output a visit signal to the control unit.

The following voice interaction devices can be realized from the embodiment described above.
(1) A voice interaction device comprising: scenario storage means (for example, the HDD 7) that stores a dialogue scenario associating voice guidance with the input items (for example, the item names shown in FIGS. 14 and 17) to be entered through a voice input unit (for example, the microphone 3) in response to that voice guidance; voice guidance output means (for example, the CPU 61 and the voice output circuit of the information processing device 6) that outputs voice guidance according to the dialogue scenario from a voice output unit (for example, the speaker 4); display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) that, when the voice guidance is output from the voice output unit, displays the input items to be entered through the voice input unit in response to that voice guidance on a display unit (for example, the touch panel display 2) in a predetermined display format (for example, FIGS. 6 to 9, 11, 12, 18, and 22 to 26) according to the dialogue scenario; and voice recognition means (for example, the CPU 61 of the information processing device 6, the voice dialogue program, the dialogue scenario file, the speech recognition dictionaries, and the speech data for utterance) that recognizes input content based on the voice input to the voice input unit.

(2) The voice interaction device of (1) above, wherein the display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) changes the display format of an input item (for example, the item names shown in FIGS. 14 and 17) displayed on the display unit (for example, the touch panel display 2) when voice is input to the voice input unit (for example, the microphone 3) while that input item is designated (for example, the process of FIG. 10; see FIGS. 7 and 8).

(3) The voice interaction device of (1) above, wherein the display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) changes the display format of an input item (for example, the item names shown in FIGS. 14 and 17) displayed on the display unit (for example, the touch panel display 2) when the voice recognition process starts while that input item is designated (see, for example, FIG. 9).

(4) The voice interaction device of any of (1) to (3) above, wherein the display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) displays the input items on the display unit (for example, the touch panel display 2) when a predetermined period has elapsed after the voice guidance output means (for example, the CPU 61 and the voice output circuit of the information processing device 6) outputs the voice guidance.

(5) The voice interaction device of any of (1) to (4) above, further comprising: candidate display means (for example, the CPU 61 and the display control circuit of the information processing device 6) that displays on the display unit a plurality of input content candidates recognized by the voice recognition means (for example, the CPU 61 of the information processing device 6, the voice dialogue program, the dialogue scenario file, the speech recognition dictionaries, and the speech data for utterance); selection means (for example, a confirmation button such as an OK button) for selecting one of the candidates displayed by the candidate display means; and input processing means (for example, the CPU 61 of the information processing device 6) that determines the candidate selected by the selection means as the input content.

(6) The voice interaction device of any of (1) to (5) above, wherein, when displaying on the display unit an optional input item (for example, the department name shown in FIG. 22) in addition to a required input item (for example, the person-in-charge name shown in FIG. 22) to be entered through the voice input unit (for example, the microphone 3) in response to the voice guidance, the display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) changes the display format between the required input item and the optional input item.

(7) The voice interaction device of any of (1) to (6) above, wherein the display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) displays an input example (for example, "○△ Industries", "×× Sales", and "△× Trading" shown in FIG. 23) for an input item (for example, the item names shown in FIGS. 14 and 17) displayed on the display unit (for example, the touch panel display 2) when a predetermined period has elapsed since that input item was designated.

(8) The voice interaction device of any of (1) to (7) above, wherein, when the input content for an input item (for example, the item names shown in FIGS. 14 and 17) is a predetermined input content, the display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) displays on the display unit (for example, the touch panel display 2) an input item (for example, the department in FIG. 18) that is subordinate to the input item already displayed.

(9) The voice interaction device of any of (1) to (8) above, wherein the voice recognition means (for example, the CPU 61 of the information processing device 6, the voice dialogue program, the dialogue scenario file, the speech recognition dictionaries, and the speech data for utterance) has a plurality of speech recognition dictionaries (for example, a name dictionary, a company dictionary, and an employee name dictionary), selects the speech recognition dictionary that corresponds to the input item (for example, the item names shown in FIGS. 14 and 17) displayed on the display unit (for example, the touch panel display 2), and recognizes the voice input to the voice input unit (for example, the microphone 3).

(10) The voice interaction device of any of (1) to (9) above, wherein, when there are a plurality of input items (for example, the item names shown in FIGS. 14 and 17) to be entered through the voice input unit (for example, the microphone 3) in response to the voice guidance, the display processing means (for example, the CPU 61 and the display control circuit of the information processing device 6) displays each of them on the display unit (for example, the touch panel display 2) in a predetermined display format, and the voice recognition means (for example, the CPU 61 of the information processing device 6, the voice dialogue program, the dialogue scenario file, the speech recognition dictionaries, and the speech data for utterance) has a plurality of speech recognition dictionaries (for example, a name dictionary, a company dictionary, and an employee name dictionary), selects a speech recognition dictionary according to the input content of an input item already recognized among the plurality of input items, and uses the selected dictionary to recognize the voice input to the voice input unit for an input item not yet recognized.
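The dictionary selection described in item (10), where the content already recognized for one item narrows the dictionary used for the next, can be illustrated with the following sketch. The company and department names, the `DEPARTMENTS_BY_COMPANY` mapping, the `department_dictionary` function, and the fallback to a merged dictionary when no company is known are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical data: departments known for each group company.
DEPARTMENTS_BY_COMPANY: Dict[str, List[str]] = {
    "Alpha Industries": ["Sales", "Development"],
    "Beta Sales": ["Planning"],
}


def department_dictionary(recognized: Dict[str, str]) -> List[str]:
    """Choose the department vocabulary from what has already been recognized.

    If the company is known, only its departments are used; otherwise the
    sketch falls back to the merged list of all departments (an assumption).
    """
    company = recognized.get("company")
    if company in DEPARTMENTS_BY_COMPANY:
        return DEPARTMENTS_BY_COMPANY[company]
    return sorted({dept for depts in DEPARTMENTS_BY_COMPANY.values() for dept in depts})
```

Narrowing the vocabulary this way reduces the number of words the recognizer must distinguish for the item that has not yet been recognized.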

The embodiment described above also yields a program that causes a computer to function as each means of the voice interaction device described in any of (1) to (10) above.

Furthermore, the embodiment described above yields a voice dialogue processing method comprising: procedure S1 of outputting, from a voice output unit (for example, the speaker 4), voice guidance according to a dialogue scenario stored in a storage unit (for example, the HDD 7); procedure S2 of displaying, when the voice guidance is output from the voice output unit (for example, the speaker 4), the input items (for example, the item names shown in FIGS. 14 and 17) to be entered through a voice input unit (for example, the microphone 3) in response to that voice guidance on a display unit (for example, the touch panel display 2) in a predetermined display format, according to the recognized input content and the dialogue scenario; and procedure S3 of recognizing input content based on the voice input to the voice input unit.

The present invention has been described through an embodiment applied mainly to an automatic reception device installed in a company or the like, but it is not limited to that embodiment and can be used widely as a voice interaction device that provides the information and services a user requests by conversing with the user by voice.

FIG. 1 is a schematic diagram of the voice dialogue processing method according to this embodiment.
FIG. 2 is a configuration diagram of the visitor reception device in this embodiment.
FIG. 3 is a diagram showing an example of the visit handling database stored in the visitor reception device in this embodiment.
FIG. 4 is an explanatory diagram showing the flow of operation of the voice interaction device according to this embodiment.
FIG. 5 is an explanatory diagram showing an overview of voice input processing.
FIG. 6 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 7 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 8 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 9 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 10 is an explanatory diagram showing the procedure of the display format change process according to the input level of the voice input.
FIG. 11 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 12 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 13 is a block diagram showing the electrical configuration of the voice interaction device according to this embodiment.
FIG. 14 is a visitor reception scenario table that tabulates the first visitor reception scenario, an example of the dialogue scenario file stored in the storage unit.
FIG. 15 is a visit reservation table that tabulates the visit reservation data stored in the storage unit in association with the visitor reception scenario.
FIG. 16 is an explanatory diagram showing the flow of the voice dialogue that proceeds according to the visitor reception scenario.
FIG. 17 is an explanatory diagram of the second visitor reception scenario table.
FIG. 18 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 19 is a flowchart showing an example of the voice dialogue process that proceeds according to the visitor reception scenario.
FIG. 20 is a flowchart showing a subroutine of the voice dialogue process.
FIG. 21 is a flowchart showing a subroutine of the voice dialogue process.
FIG. 22 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 23 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 24 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 25 is an explanatory diagram of a screen displayed on the touch panel display.
FIG. 26 is an explanatory diagram of a screen displayed on the touch panel display in another embodiment.

Explanation of symbols

1 Housing
2 Touch panel display
3 Microphone
4 Speaker
5 Infrared sensor
6 Information processing device
7 Hard disk device
10 Reception counter

Claims

A voice interactive apparatus comprising:
scenario storage means for storing a dialogue scenario in which voice guidance is associated with input items to be input to a voice input unit in response to the voice guidance;
voice guidance output means for outputting the voice guidance from a voice output unit according to the dialogue scenario;
display processing means for displaying, on a display unit in a predetermined display format according to the dialogue scenario, the input items to be input to the voice input unit in response to the voice guidance when the voice guidance is output from the voice output unit; and
voice recognition means for recognizing input content based on voice input to the voice input unit.
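
As an illustration only, the cooperation of the means recited in the claim above can be sketched in Python. The names `ScenarioStep` and `run_step` and the three callback parameters are hypothetical and do not appear in the specification; speech output, display, and recognition are replaced by plain callables so that the sketch runs without audio hardware.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScenarioStep:
    """One entry of a dialogue scenario: guidance plus the items it asks for."""
    guidance: str            # text spoken by the voice output unit
    input_items: List[str]   # items the user is expected to say next


def run_step(step: ScenarioStep,
             speak: Callable[[str], None],
             show: Callable[[List[str]], None],
             recognize: Callable[[], str]) -> str:
    """Output the guidance, display the expected input items, then recognize speech."""
    speak(step.guidance)
    show(step.input_items)
    return recognize()


# Usage with stand-in callables (assumed sample data, not from the specification).
log: List[str] = []
step = ScenarioStep("Please say your name and company.", ["name", "company"])
result = run_step(step,
                  speak=lambda text: log.append("SAY " + text),
                  show=lambda items: log.append("SHOW " + ", ".join(items)),
                  recognize=lambda: "Taro Yamada, Example Corp.")
```

Keeping the guidance text and its input items in one record is one simple way to make sure the display always matches what the guidance has just asked for.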

The voice interactive apparatus according to claim 1, further comprising:
candidate display means for displaying, on the display unit, a plurality of input content candidates recognized by the voice recognition means;
selection means for selecting any one candidate from the plurality of input content candidates displayed by the candidate display means; and
input processing means for determining the candidate selected by the selection means as the input content.
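
A minimal sketch of the candidate display and selection recited above, assuming the recognizer returns a list of candidate strings. The function names `display_candidates` and `confirm_selection` and the sample names are hypothetical.

```python
from typing import List, Optional


def display_candidates(candidates: List[str]) -> List[str]:
    """Format recognition candidates as numbered lines for the display unit."""
    return [f"{i}. {text}" for i, text in enumerate(candidates, start=1)]


def confirm_selection(candidates: List[str], choice: int) -> Optional[str]:
    """Return the candidate picked by its 1-based number, or None if out of range."""
    if 1 <= choice <= len(candidates):
        return candidates[choice - 1]
    return None


candidates = ["Sato", "Saito", "Kato"]   # made-up recognizer output
lines = display_candidates(candidates)
confirmed = confirm_selection(candidates, 2)
```

Only the confirmed candidate is treated as input content; an out-of-range choice leaves the input undetermined.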

The voice interactive apparatus according to claim 1, wherein the display processing means displays an input example for the input item when a predetermined period has elapsed after the input item displayed on the display unit is designated.

The voice interactive apparatus according to any one of the above claims, wherein the display processing means changes the display format of the input item when voice is input to the voice input unit while the input item displayed on the display unit is designated.

The voice interactive apparatus described above, wherein the display processing means changes the display format of the input item when the voice recognition process is started while the input item displayed on the display unit is designated.

The voice interactive apparatus according to any one of the above claims, wherein the display processing means displays the input item on the display unit when a predetermined period has elapsed after the voice guidance is output by the voice guidance output means.

The voice interactive apparatus according to claim 1, wherein, when displaying on the display unit optional input items in addition to the essential input items to be input to the voice input unit in response to the voice guidance, the display processing means uses different display formats for the essential input items and the optional input items.

The voice interactive apparatus according to any one of the above claims up to claim 7, wherein the display processing means displays an input item subordinate to an input item already displayed on the display unit when the input content for that input item is a predetermined input content.

The voice interactive apparatus according to claim 1, wherein the voice recognition means has a plurality of voice recognition dictionaries, selects the voice recognition dictionary corresponding to the input item displayed on the display unit, and recognizes the voice input to the voice input unit.

The voice interactive apparatus according to claim 1, wherein, when there are a plurality of input items to be input to the voice input unit in response to the voice guidance, the display processing means displays the plurality of input items on the display unit in a predetermined display format, and
the voice recognition means has a plurality of voice recognition dictionaries, selects the voice recognition dictionary corresponding to the input content of an input item already recognized among the plurality of input items, and uses the selected voice recognition dictionary to recognize the voice input to the voice input unit for the input items not yet recognized.
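
The dictionary selection recited in the two claims above can be sketched as follows. This is an assumption-laden illustration: the dictionary contents, the `department` and `person` input items, and the functions `select_dictionary` and `recognize` are all hypothetical, and the recognizer is reduced to a vocabulary lookup.

```python
from typing import Dict, List, Optional

# Hypothetical dictionaries: one vocabulary per displayed input item.
ITEM_DICTIONARIES: Dict[str, List[str]] = {
    "department": ["Sales", "Development"],
    "person": ["Sato", "Suzuki", "Tanaka"],
}

# Hypothetical narrowed dictionaries keyed by already recognized input content.
PERSON_BY_DEPARTMENT: Dict[str, List[str]] = {
    "Sales": ["Sato", "Suzuki"],
    "Development": ["Tanaka"],
}


def select_dictionary(item: str, recognized: Dict[str, str]) -> List[str]:
    """Pick the vocabulary for an item, narrowed by earlier input when possible."""
    department = recognized.get("department")
    if item == "person" and department in PERSON_BY_DEPARTMENT:
        return PERSON_BY_DEPARTMENT[department]
    return ITEM_DICTIONARIES[item]


def recognize(utterance: str, vocabulary: List[str]) -> Optional[str]:
    """Toy recognizer: accept the utterance only if it is in the vocabulary."""
    return utterance if utterance in vocabulary else None


narrowed = select_dictionary("person", {"department": "Development"})
accepted = recognize("Tanaka", narrowed)
rejected = recognize("Sato", narrowed)
```

Restricting the vocabulary in this way shrinks the search space for the items not yet recognized, which is the usual motivation for switching dictionaries per input item.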

A program for causing a computer to function as each means of the voice interactive apparatus according to any one of claims 1 to 10.

A voice dialogue processing method comprising:
outputting voice guidance from a voice output unit according to a dialogue scenario stored in a storage unit;
when the voice guidance is output from the voice output unit, displaying, on a display unit in a predetermined display format according to the dialogue scenario, input items to be input to a voice input unit in response to the voice guidance; and
recognizing input content based on voice input to the voice input unit.