JP2006003743A

JP2006003743A - Speech dialogue method and device

Info

Publication number: JP2006003743A
Application number: JP2004181589A
Authority: JP
Inventors: Tetsuo Amakasu; 哲郎甘粕; Junichi Hirasawa; 純一平澤; Shunichiro Yamamoto; 俊一郎山本; Akihiro Fuku; 昭弘富久
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-06-18
Filing date: 2004-06-18
Publication date: 2006-01-05
Anticipated expiration: 2024-06-18
Also published as: JP4249665B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech dialogue method and device that determine whether input information is valid or invalid at the time of resumption, when a dialogue of a speech dialogue system is interrupted and later resumed.

SOLUTION: The speech dialogue device comprises a dialogue history generating device 11, which generates a dialogue history consisting of the input information entered before the point at which the dialogue was interrupted, the control program position of the dialogue scenario being executed at that point, and the time information at that point; and a dialogue history obtaining device 4, which retrieves the dialogue history generated by the dialogue history generating device 11 and judges whether the time information within the input information of the retrieved dialogue history is still valid by comparing it with the time information at the point of dialogue resumption.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a speech dialogue method and device for the case where dialogue processing with a user is paused and the dialogue processing is later restored (resumed).

In a speech dialogue system, in which a user's speech input to a computer leads to a spoken response produced by information processing on the computer, the dialogue conventionally begins with an input operation by the user. The computer (system) side then advances the dialogue step by step, using spoken responses to prompt the user for speech input or for a selection, until the user obtains the required information. In general, once such a dialogue between the user and the computer (system) has begun, it either proceeds to its end or, if it has to be cut off partway, is started over from the beginning. No system in actual operation appears to have a control program that interrupts a dialogue partway (or pauses the system) and later resumes it from the point of interruption.

On the other hand, as a technical proposal, Patent Document 1 discloses a speech dialogue system in which the position being executed in a dialogue is marked and saved during the dialogue. When the saved mark is later input, the system jumps to the marked position, so the intervening dialogue can be skipped without starting over from the beginning. The information entered during the dialogue is also saved and can be recalled, which spares the user the trouble of entering it again.
Patent Documents 2 and 3 disclose dialogue systems that manage information about the current topic (the focus of the topic) as a stack. When, partway through a speech dialogue, the user utters a new question on a topic different from the one the system had been pursuing, the system generates focus information containing the goal of answering the new question and pushes it onto the stack, on top of the focus information of the topic so far. When the new topic is completed and its goal achieved, the system pops the completed topic's focus information from the stack, so that the focus information of the original topic can be referenced again. This makes it possible to respond to the temporary suspension of a topic and the return to it while keeping the system's utterances consistent.
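
The push and pop handling of topic focus described for Patent Documents 2 and 3 can be pictured with a minimal Python sketch. The names `Focus`, `FocusStack`, `topic`, and `goal`, and the example topics, are illustrative assumptions and are not taken from those documents.

```python
from dataclasses import dataclass, field


@dataclass
class Focus:
    """Focus information for one topic (hypothetical structure)."""
    topic: str
    goal: str


@dataclass
class FocusStack:
    """Stack of topic focuses; the top entry is the current topic."""
    _items: list = field(default_factory=list)

    def push(self, focus: Focus) -> None:
        # A new question on another topic interrupts the current one.
        self._items.append(focus)

    def pop(self) -> Focus:
        # Called when the goal of the current topic has been achieved.
        return self._items.pop()

    def current(self) -> Focus:
        return self._items[-1]


stack = FocusStack()
stack.push(Focus("weather", "report the forecast for the requested day"))
stack.push(Focus("train times", "answer the user's new question"))
finished = stack.pop()      # the interrupting topic is done
resumed = stack.current()   # the original topic is visible again
```

After the pop, `resumed` refers to the original topic again, which is the behaviour the cited documents rely on to keep the system's utterances consistent.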

According to Patent Documents 1 to 3 above, when a dialogue being executed in a speech dialogue system is temporarily interrupted, the interruption position can be marked and stored together with the input information, and the stored information can be restored when the interrupted dialogue is resumed.
Consider this in the concrete case of a weather forecast service in which the user specifies a desired date and region. Suppose that on March 22 a user wants to know the weather for "March 23" in "eastern Kanagawa Prefecture", and that the user (U) and the system (S) exchange input speech and output speech (a dialogue) in the dialogue system.
Suppose further that, when the system S outputs speech confirming "the weather on March 23 in eastern Kanagawa Prefecture", the user U inputs "suspend", and then inputs "resume" on March 24. The system S resumes from the confirmation that was pending at the time of interruption. That is, the system repeats its utterance from the time of interruption, such as "Then, is the weather for March 23 in eastern Kanagawa Prefecture what you want?" However, March 23 is already in the past and its weather is meaningless to the user, so the user U naturally answers "No". The system therefore utters a request for correction, such as "Then, how should March 23 in eastern Kanagawa Prefecture be corrected?", and the user U inputs an utterance specifying a date, for example another date such as "March 25".
Cited patent documents: JP 2003-223187 A; JP 2001-142870 A; JP 2001-356797 A

As the weather forecast example above shows, the purpose of an ordinary weather forecast is to learn the forecast for a future date and time. In the dialogue example above, after the interruption, on March 24, the date information "March 23" in the input information entered before the interruption is already invalid.
In a conventional dialogue system, however, the input information entered before the interruption is simply repeated after resumption, and the system cannot detect that it contains date information that is no longer valid, that is, that invalid input information exists. As a result, after resumption the user is left to check that the earlier input contains invalid date information, which places a burden on the user.

In addition, a conventional dialogue system cannot tell which of the "region" and "date" items in the input information entered before the interruption is invalid (the date, in this example), so it presents both the "region" and the "date" and asks the user to correct the information. To recognize the correction, the system must therefore be able to recognize corrections to both the "region" and the "date". This requires a speech recognition grammar and dictionary that cover a wide range of utterances, and the wider the recognition target, the lower the recognition accuracy.

Furthermore, in the prior art, as the example shows, the dialogue at the time of interruption is repeated as is, so that, for instance, the word "Then" is reused unchanged. The wording of the system response takes no account of the time that elapsed between interruption and resumption, and the response strikes the user as abrupt and unnatural.
In view of the above, the problem is that, when a dialogue in a speech dialogue system is interrupted and later resumed, the system does not determine whether the time information in the input information is valid or invalid, even though time information is where the validity of input information changes most conspicuously at resumption. The derivative problems described above arise from the lack of this determination.

The present invention was made to solve the above problems. Its object is to provide a speech dialogue method and device that determine, when a dialogue in a speech dialogue system is interrupted and later resumed, whether the input information is valid or invalid at the time of resumption.

To achieve the above object, the present invention is characterized in that a speech dialogue that processes speech input and outputs response speech comprises: a dialogue history generation process that, when an interruption of the dialogue is input, generates a dialogue history consisting of the input information entered up to the point of interruption, the control program position of the dialogue scenario being executed at that point, and the time information at that point; and a dialogue history acquisition process that, when a resumption of the dialogue is input, retrieves the dialogue history obtained by the dialogue history generation process and checks the time information within the input information of the retrieved dialogue history against the time information at the point of resumption, to determine whether the time information within the input information is still valid or is invalid.

According to the present invention, a speech dialogue system can, when an interrupted dialogue is resumed, automatically determine whether time information such as a date or time is valid or invalid in accordance with the purpose of the dialogue scenario.

An embodiment of the present invention will now be described with reference to the drawings.
FIG. 1 shows a schematic block diagram of the overall configuration of a speech dialogue device according to an embodiment of the present invention. In this device, the dialogue control device 1 follows the dialogue scenario advanced by executing the dialogue control program 2, and performs the control computations for the whole device while updating the dialogue control memory 5 on the basis of inputs from the utterance understanding device 3 and the dialogue history acquisition device 4, which serve as its input devices. The dialogue control device 1 controls the following: the speech recognition device 6, an input device that receives the user's speech and recognizes it using the recognition dictionary 7 and the recognition grammar 8; the utterance understanding device 3, which recognizes words and phrases in the output of the speech recognition device 6 and produces an utterance understanding result; the speech output device 9, which outputs the response speech generated by the system; the dialogue history generation device 11, which, when the dialogue scenario is interrupted, retrieves from the dialogue control memory 5 the input information and dialogue control program position to be stored in the dialogue history storage device 10; and the dialogue history acquisition device 4, which retrieves the dialogue history information (input information, dialogue control program position, and so on) stored in the dialogue history storage device 10 by the dialogue history generation device 11, and performs the determination expressed by the elapsed flag and the calculation of the interruption elapsed time information, both described later. A clock 12 for obtaining the current time information is connected to the dialogue history generation device 11 and the dialogue history acquisition device 4.

In the configuration of FIG. 1, the speech recognition device 6, under control signals from the dialogue control device 1, recognizes the user's speech and outputs a word string. It uses the recognition dictionary 7, which stores the words in the sentences to be recognized together with their pronunciations, and the recognition grammar 8, which describes how words are ordered in the sentences to be recognized.
The utterance understanding device 3, under control signals from the dialogue control device 1, extracts from the word string output by the speech recognition device 6 the words that are meaningful to the speech dialogue device, or identifies the phrases that are meaningful to it, and outputs them as the utterance understanding result. This result is the input information passed from the user to the dialogue control device 1. If the speech recognition device 6 recognizes only isolated words, its recognition result can be used directly as the input information to the dialogue control device 1, in which case the utterance understanding device 3 may be omitted.

The speech output device 9, under control signals from the dialogue control device 1, synthesizes speech from a character string or from information representing the meaning (concept) of the response specified by the dialogue control device 1, using a speech synthesizer (not shown) or the like, and outputs it as the system response. In this case, speech data recorded or synthesized in advance and specified by the dialogue control device 1 can also be played back and output to the user as the system response.
As a speech dialogue system, the dialogue control device 1 takes in the input information from the speech recognition device 6 and the utterance understanding device 3, exchanges data with the dialogue control memory 5 as the dialogue scenario proceeds, composes a response sentence by computation, and outputs the response speech from the speech output device 9.

When a dialogue is interrupted and later resumed, the processing is carried out by software, mainly in the dialogue history generation device 11 and the dialogue history acquisition device 4. Specifically, under control signals from the dialogue control device 1, the dialogue history generation device 11 retrieves the input information stored in the dialogue control memory 5 and the dialogue control program position (dialogue scenario position), which is the current execution position in the dialogue scenario. It takes the current time information obtained from the clock 12 at that moment as the time information, and saves the combination in the dialogue history storage device 10 as dialogue history information.

FIG. 5 illustrates the processing software of the dialogue history generation device 11. In the flowchart, the input information and dialogue scenario position stored in the dialogue control memory 5 are acquired (step 51), the current time information is obtained from the clock 12 as the time information (step 52), and the input information, dialogue scenario position, and time information are combined to generate the dialogue history information (step 53).
The dialogue history storage device 10 stores at least the dialogue history information produced by the dialogue history generation device 11, and retains it without erasure even if the dialogue control device 1 is interrupted or powered off. It can be implemented with a nonvolatile memory such as flash memory, a hard disk drive, or a relational database device that is separate from the speech dialogue device on which the dialogue control device 1 runs and is connected to it over a network or the like.
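
Steps 51 to 53 and the requirement that the history survive a power-off can be sketched in Python as follows. This is only an illustration under stated assumptions: the JSON file stands in for the dialogue history storage device 10, the field names (`user_id`, `inputs`, `scenario_position`, `saved_at`) and the helpers `generate_history` and `save_history` are hypothetical, and the year 2004 is assumed because the example in the text gives only month and day.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def generate_history(control_memory: dict, now: datetime) -> dict:
    """Steps 51 to 53: bundle inputs, scenario position, and time."""
    return {
        "user_id": control_memory["user_id"],             # assumed field
        "inputs": dict(control_memory["inputs"]),         # step 51
        "scenario_position": control_memory["position"],  # step 51
        "saved_at": now.isoformat(),                      # step 52
    }                                                     # step 53


def save_history(path: Path, record: dict) -> None:
    """Append one record to a JSON file that survives power-off."""
    if path.exists():
        records = json.loads(path.read_text(encoding="utf-8"))
    else:
        records = []
    records.append(record)
    path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")


memory = {
    "user_id": "user-001",
    "inputs": {"region": "eastern Kanagawa", "date": "2004-03-23"},
    "position": "input_confirmation_1",
}
store = Path(tempfile.mkdtemp()) / "dialogue_history.json"
save_history(store, generate_history(memory, datetime(2004, 3, 22, 10, 0)))
reloaded = json.loads(store.read_text(encoding="utf-8"))
```

Passing the clock reading in as `now` rather than calling the system clock inside the function keeps the sketch deterministic; a real device would read clock 12 at that point.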

Under control signals from the dialogue control device 1, the dialogue history acquisition device 4 acquires the relevant dialogue history information from among the multiple items of dialogue history information in the dialogue history storage device 10, and extracts from it the input information, the dialogue scenario position, and the time information. It also obtains the time of this retrieval from the clock 12 as time information, and calculates the time elapsed from the time information in the dialogue history information to the retrieval time, yielding the interruption elapsed time information. Besides the time information recorded when the dialogue history information was generated and the time information at retrieval, the input information itself may contain time information indicating a point in time, such as a date or a time of day. In that case the device compares the time information in the input information with the time information at retrieval, and for each item of input information calculates an elapsed flag that is true (invalid) if, for the intended use of the dialogue system, the time information in the input information has become invalid, and false (valid) if it is still valid. The dialogue history acquisition device 4 outputs to the dialogue control device 1, as dialogue history acquisition information, the input information, dialogue scenario position, and time information from the dialogue history information, together with the elapsed flags, the time information at retrieval, and the interruption elapsed time information. The processing software of the dialogue history acquisition device 4 is illustrated by the flowchart in FIG. 6, which is explained later together with the operation. The process of calculating the elapsed flag for input information is illustrated in FIG. 7: depending on the intended use of the dialogue system, the dialogue control device 1 specifies the required time level and a flag indicating whether time information equal to the current time is to be judged valid or invalid, and the elapsed flag is calculated from the time information in the input information and the time information at retrieval. The clock 12, on receiving a request, returns to the requester the current time information indicating the time at which the request was received.
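
The two calculations described above, the interruption elapsed time and the per-input flag that is true when the time information is no longer valid, might be sketched as below. FIG. 7 is not reproduced in this text, so the details are assumptions: the time level is modelled as `"day"` or `"hour"`, the same-time setting as the boolean `same_time_valid`, and all function names are hypothetical. The year 2004 is assumed for the March dates of the example.

```python
from datetime import datetime, timedelta


def truncate(t: datetime, level: str) -> datetime:
    """Drop everything finer than the requested time level."""
    if level == "day":
        return t.replace(hour=0, minute=0, second=0, microsecond=0)
    if level == "hour":
        return t.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported time level: {level}")


def elapsed_flag(input_time: datetime, retrieved_at: datetime,
                 level: str = "day", same_time_valid: bool = True) -> bool:
    """Return True when the time in the input is no longer valid."""
    then = truncate(input_time, level)
    now = truncate(retrieved_at, level)
    if then == now:
        # Same day (or hour): validity depends on the scenario's setting.
        return not same_time_valid
    return then < now


def interruption_elapsed(saved_at: datetime, retrieved_at: datetime) -> timedelta:
    """Time between saving the history and retrieving it."""
    return retrieved_at - saved_at


saved_at = datetime(2004, 3, 22, 10, 0)      # interruption
retrieved_at = datetime(2004, 3, 24, 9, 30)  # resumption
requested_day = datetime(2004, 3, 23)        # date spoken by the user

flag = elapsed_flag(requested_day, retrieved_at)
gap = interruption_elapsed(saved_at, retrieved_at)
```

With these values `flag` is true, because March 23 precedes the resumption day at day level, so only the date would need to be asked again and the region could be kept.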

While the dialogue control device 1 is running, the dialogue control memory 5 holds each item of input information and the execution position of the dialogue control program (dialogue scenario position). In addition, when the dialogue control device 1 receives dialogue history acquisition information from the dialogue history acquisition device 4, the input information with its elapsed flags, the dialogue scenario position, and the interruption elapsed time information contained in it are stored in the dialogue control memory 5.
The dialogue control program describes, on the basis of the contents of the dialogue control memory 5 and the utterance understanding results output by the utterance understanding device 3, the control and instructions that the dialogue control device 1 issues to the speech recognition device 6, the utterance understanding device 3, the speech output device 9, the dialogue history generation device 11, the dialogue history acquisition device 4, and the dialogue control memory 5. It is read and executed by the dialogue control device 1.

Next, the overall operation of the speech dialogue system of this embodiment is described along the dialogue scenario, with reference to FIG. 2 and the subsequent figures. In this embodiment, the user's speech input and the system's response output are received, processed, and output according to the dialogue control program. The description here focuses on interrupting and resuming (restoring) a dialogue. When no interruption or resumption occurs, an ordinary dialogue takes place in which speech input produces speech output.
As a premise of this description, multiple users are distinguished by ID information, for example a telephone number in the case of a telephone, or an address and password in the case of a network. An interrupted dialogue is therefore matched to a resumed one by, for example, checking that the ID information agrees. Furthermore, because the time information in the input information is compared with the time information at resumption to judge its validity in light of the intended use of the dialogue system, the judgment is not limited to input entered at the moment of interruption. The validity of time information in input entered at any point before the interruption can also be judged. The description below, however, first covers the case in which the input information at the time of interruption contains time information.

FIG. 2 shows a general outline flowchart of a dialogue scenario. When the user makes a first utterance and the user's ID is matched, the system outputs a start response (step 21). It is then determined whether the user's utterance requests resumption (restoration) (step 22); if so, the resumption processing described later is performed (step 23). If the utterance at step 22 instead starts the dialogue scenario from the beginning, or once the resumption processing of step 23 has finished, processing proceeds unit scenario by unit scenario along the normal dialogue scenario (step 24). It is then determined whether all unit scenarios have been processed (step 25), and if any remain, processing moves on to the next unit scenario (step 26).
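
The control flow of FIG. 2 can be traced with a short Python sketch. It is a simplification under stated assumptions: the function `run_scenario`, the literal utterance `"resume"`, and the idea that resumption processing returns the index of the unit scenario to continue from are all hypothetical, and each unit scenario is reduced to a plain callable.

```python
def run_scenario(first_utterance, unit_scenarios, resume):
    """Trace the step numbers of FIG. 2 for one session."""
    trace = [21]                        # step 21: start response
    trace.append(22)                    # step 22: resumption requested?
    start = 0
    if first_utterance == "resume":
        trace.append(23)                # step 23: resumption processing
        start = resume()                # index of the unit to continue from
    for index in range(start, len(unit_scenarios)):
        trace.append(24)                # step 24: run one unit scenario
        unit_scenarios[index]()
        trace.append(25)                # step 25: all units processed?
        if index + 1 < len(unit_scenarios):
            trace.append(26)            # step 26: move to the next unit
    return trace


done = []
units = [lambda: done.append("region"),
         lambda: done.append("date"),
         lambda: done.append("confirm")]

fresh = run_scenario("weather please", units, resume=lambda: 0)
resumed = run_scenario("resume", units, resume=lambda: 2)
```

In the resumed session only the last unit scenario runs, which mirrors the intent that earlier parts of the dialogue are not repeated.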

Next, the speech dialogue system is described more concretely, with a weather forecast service in mind. FIG. 3 is a concrete flowchart of a dialogue scenario that acquires input information from the user's speech and identifies a region and a date. Here the resumption decision is shown as part of the region acquisition step, and the unit scenario processing of FIG. 2 (step 24) consists of region acquisition 1, date acquisition 1, input confirmation 1, and re-input 1. In FIG. 3, each processing unit drawn as a rectangle is a unit scenario process, and is recorded in the dialogue control memory 5 as the dialogue control program position (dialogue scenario position). However, "interruption process 1" and "interruption process 2" are not recorded in the dialogue control memory 5 as dialogue scenario positions.

In FIG. 3, the system first asks for the region as region acquisition 1 (step 31). After the region has been spoken, it repeats the region back and asks for the date as date acquisition 1 (step 32). After the date has been spoken, it repeats back both the region and the date and asks for confirmation as input confirmation 1 (step 33). If the user rejects the input, it asks what should be corrected as re-input 1 (step 34), and returns to input confirmation 1 once the correction has been spoken. When input confirmation 1 is affirmed, it outputs the corresponding weather forecast as result output 1 (step 35).
"Interruption process 1" in FIG. 3 is invoked, as shown by the broken lines in FIG. 3, when an utterance commanding interruption of the dialogue is input from the user at the dialogue scenario position of any of the unit scenario processes "region acquisition 1", "date acquisition 1", "input confirmation 1", and "re-input 1". In this case, the speech output device 9 outputs a message to the effect that the current input information will be saved and the dialogue suspended (step 36).
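
The transitions of FIG. 3 described above can be written as a small state machine. The sketch below is illustrative only: the position identifiers, the literal utterances `"suspend"` and `"yes"`, and the helper names `next_position` and `walk` are assumptions, and real input would be an utterance understanding result rather than a raw string.

```python
SUSPENDABLE = {"region_acquisition_1", "date_acquisition_1",
               "input_confirmation_1", "re_input_1"}


def next_position(position: str, utterance: str) -> str:
    """Return the next scenario position of FIG. 3 for one utterance."""
    if utterance == "suspend" and position in SUSPENDABLE:
        return "interruption_process_1"             # step 36
    if position == "region_acquisition_1":          # step 31
        return "date_acquisition_1"
    if position == "date_acquisition_1":            # step 32
        return "input_confirmation_1"
    if position == "input_confirmation_1":          # step 33
        return "result_output_1" if utterance == "yes" else "re_input_1"
    if position == "re_input_1":                    # step 34
        return "input_confirmation_1"
    raise ValueError(f"no transition from {position}")


def walk(utterances, start="region_acquisition_1"):
    """Apply utterances in order and return the visited positions."""
    path = [start]
    for utterance in utterances:
        path.append(next_position(path[-1], utterance))
    return path


corrected = walk(["eastern Kanagawa", "March 23", "no", "March 25", "yes"])
suspended = walk(["eastern Kanagawa", "March 23", "suspend"])
```

The position reached just before `interruption_process_1` is the one that would be recorded as the dialogue scenario position, since the interruption processes themselves are not recorded.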

"Interruption process 2" in FIG. 3 instructs the dialogue history generation device 11 to generate dialogue history information as of the moment the relevant dialogue scenario position was being executed. That is, the dialogue history information is generated from the input information and dialogue scenario position (dialogue control program position) held in the dialogue control memory 5 at that moment, and the time information from the clock 12 (step 37).
"Resumption process 1" in FIG. 3 is invoked (as shown by the chain line in FIG. 3) when, at the "region acquisition 1" stage (step 31) immediately after the speech dialogue device starts, the utterance understanding result indicates that the user has asked for the previous dialogue to be restored. The dialogue control device 1 then instructs the dialogue history acquisition device 4 to output the dialogue history acquisition information for the previous dialogue, and loads the information in the resulting output into the dialogue control memory 5.

“Interruption process 2” causes the dialogue history information to be saved from the dialogue history generation device 11 to the dialogue history storage device 10, and in “resume process 1” the dialogue history acquisition device 4 acquires the dialogue history acquisition information from the dialogue history storage device 10. When the dialogue history acquisition device 4 receives an instruction from the dialogue control device 1 to acquire dialogue history acquisition information, it searches the dialogue history information in the dialogue history storage device 10 for the record matching the instruction and acquires it, as shown in FIG. 6 (step 61). The search can use the time information from the clock 12 contained in each piece of dialogue history information as an index. Because the record sought here is the one from the time of interruption, it suffices to select the record with the most recent time information.
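The save-and-retrieve behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the class names `DialogueHistory` and `HistoryStore`, the field names, and the in-memory list are hypothetical and are not taken from the specification, which does not define a storage format.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DialogueHistory:
    """One saved record: input information, scenario position, save time."""
    inputs: dict             # e.g. {"region": "...", "date": "..."}
    scenario_position: str   # e.g. "input confirmation 1"
    saved_at: datetime       # time information from the clock


@dataclass
class HistoryStore:
    """Hypothetical stand-in for the dialogue history storage device 10."""
    records: list = field(default_factory=list)

    def save(self, record: DialogueHistory) -> None:
        # Corresponds to saving the record generated in interruption process 2.
        self.records.append(record)

    def latest(self):
        """Return the record with the most recent time information, or None."""
        if not self.records:
            return None
        return max(self.records, key=lambda r: r.saved_at)
```

Using the save time as the search key mirrors the statement that the most recent time information identifies the record from the interruption.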

Next, the dialogue history acquisition device 4 sends a request to the clock 12 and obtains the time information for the current time, to be compared with the time information in the acquired dialogue history information (step 62). It then obtains the interruption elapsed time amount information, which is the amount of time elapsed between the time information at interruption in the dialogue history information and the current-time information obtained at acquisition (step 63). The interruption elapsed time amount information is not merely an elapsed value obtained by subtracting numbers for each time unit (so many days, hours, and minutes); differences in calendar day, month, and year are also calculated. For example, if the time information in the dialogue history information represents “1:05 p.m. on March 26, 2004” and the current time obtained is “1:05 p.m. on April 10, 2004”, information indicating “year unit: no change; month unit: 1 month changed; day unit: 15 days changed; hour unit: 360 hours; minute unit: 21,600 minutes” is generated. In other words, calendar-change information is attached at each time level: how many times midnight (the date boundary), the end of a month (the month boundary), and the end of a year (the year boundary) have been crossed.
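The calculation in step 63 can be illustrated with a short sketch. The function name `interruption_elapsed` and the dictionary keys are assumptions made for this example; the specification names the quantities but not a data layout.

```python
from datetime import datetime


def interruption_elapsed(saved: datetime, now: datetime) -> dict:
    """Return calendar-boundary crossings plus raw elapsed hours and minutes."""
    total_minutes = int((now - saved).total_seconds() // 60)
    return {
        # Number of year-end boundaries crossed.
        "years_changed": now.year - saved.year,
        # Number of month-end boundaries crossed.
        "months_changed": (now.year - saved.year) * 12 + (now.month - saved.month),
        # Number of midnights crossed.
        "days_changed": (now.date() - saved.date()).days,
        # Plain elapsed amounts.
        "hours": total_minutes // 60,
        "minutes": total_minutes,
    }


info = interruption_elapsed(datetime(2004, 3, 26, 13, 5), datetime(2004, 4, 10, 13, 5))
# info matches the example in the text: 0 years, 1 month, 15 days, 360 hours, 21600 minutes
```

Keeping the boundary counts separate from the raw amounts is what later allows a resumption on the next calendar day to be described as “yesterday” even when fewer than 24 hours have passed.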

Further, when the input information in the acquired dialogue history information itself contains time information representing a date or time (step 64), the dialogue history acquisition device 4 compares that time information with the current-time information obtained from the clock 12 and determines, by setting the elapsed flag to true or false, whether the time information in the input information is valid or invalid for the purpose of the voice dialogue device (step 65). This determination is made by the method shown in FIG. 7. If the time information in the input information is invalid, the elapsed flag for that input information is set to true; if it is valid, the elapsed flag is set to false. Whether the input information contains time information (so that the elapsed flag has been set to true or false) or contains none, the dialogue history acquisition device 4 outputs, as dialogue history acquisition information, the dialogue history information with the elapsed flag and the interruption elapsed time amount information added (step 66).

The method of determining the elapsed flag is now described with reference to FIG. 7. The flowchart shown in FIG. 7 is used to determine whether the time information in the input information is valid or invalid. In FIG. 7, the saved time information is the time information within the input information and is denoted U1; the current time information is the time information for the current time in the dialogue history acquisition device 4 and is denoted U2. The minimum time unit used in the conditional branches is the smallest time unit required for the purpose of use and is denoted Up. The same-time validity flag, denoted Flg, specifies whether the same time is treated as valid or invalid; true means valid. The minimum time unit and the same-time validity flag are specified in advance in the dialogue scenario according to the purpose of the voice dialogue device.

For example, in the dialogue scenario of FIG. 3, the minimum time unit is the day, because the date obtained by “date acquisition” in the input information represents a day. Suppose that, when a dialogue is executed, the time information in the “date” input information in the dialogue history is “March 23, 2004”. If the date portion of the current time that the dialogue history acquisition device 4 obtains from the clock 12 during “resume process 1” is “March 24, 2004”, using a past date is unnatural for the purpose of a weather forecast inquiry. The elapsed flag for a “date” input that now falls on yesterday or earlier should therefore be judged true (elapsed). If, on the other hand, the date of the current time at which the dialogue is restored and the date in the saved input information are both “March 23, 2004”, checking the weather forecast for that day is natural, so the elapsed flag should be false. That is, at the date level, which is the minimum unit of time information handled by this voice dialogue device, the input should be judged as not yet elapsed while the date is the same.

In such a case, the determination described above is obtained by instructing the elapsed flag determination of FIG. 7 to run with the minimum time unit set to “day” and the same-time validity flag set to true (valid). With this mechanism, whether the time information in the input information is valid or invalid is determined in the dialogue history acquisition device 4 not simply by whether a certain date or time has passed, but according to the purpose of the voice dialogue device (dialogue system).
The flowchart of FIG. 7 is as follows. In step 71, for the comparison between the saved time information U1 (March 23, 2004) and the current time information U2 (March 24, 2004), the time level is set, starting from the year, which is the largest time level used in this information, and proceeding to month, day, hour, and minute. In step 72, it is determined whether the current time information U2 is greater than the saved time information U1 (that is, whether time has passed). If there is no difference at that time level, the process moves to step 73, which determines whether this time level is the minimum time unit. In the weather forecast example, where the year and month are the same, steps 71, 72, and 73 are repeated until the “day” level is reached; there the current date is later than the saved date, so the process moves to step 74 and the elapsed flag becomes true (invalid). If step 72 is not satisfied even at the day level, step 73 finds that the minimum time unit has been reached and the process moves to step 75, which determines whether the same-time validity flag is true. If it is true, the elapsed flag is set to false in step 76 and the saved time information is valid; that is, a weather forecast for the same day is valid. If the same-time validity flag is false in step 75, the process moves to step 77, where the saved time information and the current time information are compared in their entirety and the elapsed flag is set to true or false according to the result.

Another example of determining the elapsed flag follows. Suppose the voice dialogue device is a reservation system for tickets to performances such as films and plays, the time information in the “desired performance start time” input information saved in the dialogue history information is “18:00 on March 23, 2004”, and the current time information obtained from the clock 12 is “18:10 on March 23, 2004”. A performance that has already started cannot be reserved, so the elapsed flag should be set to true in step 74. That is, once the dialogue is resumed after exactly 18:00 has passed, the input is judged to have elapsed. This determination is obtained by instructing the elapsed flag determination of FIG. 7 to run with the minimum time unit set to “hour” and the same-time validity flag of step 75 set to false.
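The determination of FIG. 7 and the two examples above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `elapsed_flag`, the `LEVELS` list, and the argument names are hypothetical. Two behaviours are assumptions because the text does not spell them out: a saved time that is still in the future at some level is treated as valid, and the full comparison of step 77 treats an exactly equal time as elapsed.

```python
from datetime import datetime

# Time levels from largest to smallest (step 71).
LEVELS = ["year", "month", "day", "hour", "minute"]


def elapsed_flag(saved: datetime, now: datetime, min_unit: str, same_time_valid: bool) -> bool:
    """Return True when the saved time information (U1) is no longer valid."""
    # Compare level by level down to the minimum time unit (Up).
    for level in LEVELS[: LEVELS.index(min_unit) + 1]:
        u1, u2 = getattr(saved, level), getattr(now, level)
        if u2 > u1:        # step 72: time has passed at this level
            return True    # step 74: elapsed (invalid)
        if u2 < u1:        # assumption: saved time is still ahead, so valid
            return False
    # No difference down to the minimum unit (step 73 satisfied).
    if same_time_valid:    # step 75
        return False       # step 76: same day or hour is still valid
    # Step 77: compare the whole timestamps (assumption: equal counts as elapsed).
    return now >= saved


# Weather forecast: minimum unit "day", same-time validity flag true.
weather_next_day = elapsed_flag(datetime(2004, 3, 23), datetime(2004, 3, 24, 9, 0), "day", True)
weather_same_day = elapsed_flag(datetime(2004, 3, 23), datetime(2004, 3, 23, 20, 0), "day", True)

# Ticket reservation: minimum unit "hour", same-time validity flag false.
ticket_started = elapsed_flag(datetime(2004, 3, 23, 18, 0), datetime(2004, 3, 23, 18, 10), "hour", False)
```

Passing the minimum unit and the flag as parameters reflects the statement that both are specified in the dialogue scenario according to the purpose of the device.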

In this way, through the processing shown in FIGS. 6 and 7, the dialogue history acquisition device 4 outputs dialogue history acquisition information containing the dialogue scenario position, the input information, the elapsed flag, and the interruption elapsed time amount information, and passes it to the dialogue control device 1. These values are then written to the dialogue control memory 5. The saved dialogue scenario position, however, is stored in a location separate from the one holding the currently executing dialogue scenario position, so that it can be referred to in later processing.
The processing in the dialogue history acquisition device 4 shown in FIGS. 6 and 7 belongs to the resume processing that follows the interruption processing, during which the dialogue history generation device 11 generates the history. The resume processing that leads from FIG. 3 to FIG. 4 is described next.

In “resume process 1” in FIG. 4, the dialogue history acquisition information is loaded from the dialogue history acquisition device 4 into the dialogue control memory 5 (step 41). In “resume process 2”, the voice output device 9 is instructed to play a response that expresses the interruption interval, based on the interruption elapsed time amount information, that is, on how long the dialogue was interrupted. The expression is selected by consulting the first table of language expressions shown in FIG. 8, which is held in the dialogue scenario or in the dialogue control memory 5, using the interruption elapsed time amount information in the dialogue control memory 5 as the condition (step 42).
As an example of using FIG. 8, if the dialogue is resumed two days later and “2 days” have elapsed in the “day unit”, item 5 of FIG. 8 applies, so the Japanese expression “ototoi” (the day before yesterday) is used. If the dialogue is resumed one month later and “1 month” has elapsed in the “month unit”, the Japanese expression “sengetsu” (last month) is used (item 10 of FIG. 8). Using such idiomatic expressions not only serves the purpose of reporting the length of the interruption but also produces a response that is natural, concise, and not mechanical to the user. A dialogue system can thus be built that, when a dialogue is restored, gives a natural and concise response expressing the interval between interruption and resumption in idiomatic terms.
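The lookup against the first table can be sketched as below. Only the two rows mentioned in the text (items 5 and 10 of FIG. 8) are reproduced; the remaining contents of FIG. 8 are not given here, so the row order, the `default` fallback, the key names, and the function name `interval_expression` are all assumptions for illustration.

```python
# Each row: (unit key, amount, expression). Only these two rows come from the text.
EXPRESSIONS = [
    ("days_changed", 2, "the day before yesterday"),  # FIG. 8, item 5 ("ototoi")
    ("months_changed", 1, "last month"),              # FIG. 8, item 10 ("sengetsu")
]


def interval_expression(elapsed: dict, default: str = "earlier") -> str:
    """Pick an idiomatic expression for the interruption interval."""
    for unit, amount, expression in EXPRESSIONS:
        if elapsed.get(unit) == amount:
            return expression
    # Hypothetical fallback when no table row matches.
    return default


phrase = interval_expression({"days_changed": 2, "months_changed": 0})
```

Checking the day row before the month row is a design choice made here so that a two-day gap spanning a month boundary is still described at the day level; the actual priority in FIG. 8 may differ.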

Next, if the input information in the dialogue history acquisition information contains “region” and “date”, the voice output device 9 is instructed to output a response that includes wording reporting those items as the content of the earlier dialogue. However, even if the saved input information contains an entry for “date”, the date is not read out in the response when the elapsed flag for that entry is true (invalid). To decide this response output, a second table of saved information such as that in FIG. 9 is provided in the dialogue scenario or in the dialogue control memory 5, and the saved information in this second table is selected using the interruption elapsed time amount information, the input information, and the elapsed flag in the dialogue control memory 5 as conditions (step 42).
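The rule that an invalid “date” is left out of the read-back can be sketched as follows. The function name `resume_summary` and the dictionary layout are hypothetical; the contents of the second table in FIG. 9 are not reproduced here, so this shows only the filtering rule stated in the text.

```python
def resume_summary(inputs: dict, elapsed_flags: dict) -> list:
    """Return the (name, value) pairs that may be read back on resumption."""
    # An item is read back only when its elapsed flag is false or absent.
    return [
        (name, value)
        for name, value in inputs.items()
        if not elapsed_flags.get(name, False)
    ]


saved_inputs = {"region": "eastern Kanagawa Prefecture", "date": "March 23"}
to_read = resume_summary(saved_inputs, {"date": True})
```

With the elapsed flag for “date” set to true, only the region remains, which corresponds to the resumption response in which the system mentions the region and asks for the date again.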

“Resume process 3” in FIG. 4 refers to the dialogue scenario position obtained from the dialogue history acquisition device 4, determines whether it is “region acquisition 1”, and if so jumps to “region acquisition 1” (step 43).
“Resume process 4” in FIG. 4 refers to the dialogue scenario position obtained from the dialogue history acquisition device 4, determines whether it is “date acquisition 1”, and if so jumps to “date acquisition 1” (step 44).
Further, “resume process 5” in FIG. 4 refers to the dialogue scenario position obtained from the dialogue history acquisition device 4, determines whether it is “re-input 1”, and if so jumps to “re-input 1” (step 45).

Furthermore, “resume process 6” in FIG. 4 refers to the dialogue scenario position obtained from the dialogue history acquisition device 4, determines whether it is “input confirmation 1” and whether the elapsed flag for “date” is false (valid), and jumps to “input confirmation 1” if both conditions are met (step 46).
In “date acquisition 2” in FIG. 4, response processing is performed to have the user re-enter a “date” whose elapsed flag is true, that is, invalid (step 47). First, the voice output device 9 is instructed to output a prompt to re-enter the date, the speech recognition device 6 is instructed to perform recognition using a grammar and dictionary for recognizing date utterances in addition to commands such as interruption, and the system waits for input from the user. When voice input is made and date information is output from the utterance understanding device 3, that information is recorded in the dialogue control memory 5 as the “date” input information.
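The chain of branches in resume processes 3 to 6 can be sketched as a single routing function. The function name `resume_target` is hypothetical, and sending every unmatched case to “date acquisition 2” is an assumption; the text describes that destination only for a saved position of “input confirmation 1” with an elapsed date.

```python
def resume_target(saved_position: str, date_elapsed: bool) -> str:
    """Return the scenario position to jump to when a dialogue is resumed."""
    if saved_position == "region acquisition 1":   # resume process 3 (step 43)
        return saved_position
    if saved_position == "date acquisition 1":     # resume process 4 (step 44)
        return saved_position
    if saved_position == "re-input 1":             # resume process 5 (step 45)
        return saved_position
    if saved_position == "input confirmation 1" and not date_elapsed:
        return saved_position                      # resume process 6 (step 46)
    # Otherwise the date must be asked again (step 47).
    return "date acquisition 2"


target = resume_target("input confirmation 1", date_elapsed=True)
```

In this sketch a dialogue saved at “input confirmation 1” whose date has since become invalid is routed to “date acquisition 2” rather than back to the confirmation prompt.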

If the elapsed flag for the “date” input information is true during “resume process 6”, the date must always be entered again. If the process jumped to “input confirmation 1” without taking this into account, the user would deny the confirmation and “re-input 1” would be performed. In re-input 1, however, the speech recognition device uses a grammar and dictionary that accept utterances re-entering the region as well as the date, so a date utterance could be misrecognized as a region utterance.
As described here, however, the series of branches from “resume process 6” to “date acquisition 2” uses the elapsed flag determined when the dialogue history information was acquired, and the speech recognition device 6 is driven with a grammar and dictionary set to accept only re-entry of the date, which enables highly accurate recognition. That is, by making only the date-and-time information judged invalid the target of speech recognition, a dialogue system with higher recognition accuracy can be built. This makes it possible to build a voice dialogue system that places little cognitive burden on the user.

In this case, moreover, the validity of the “date” information is detected automatically by the system rather than judged by the user, which reduces the burden on the user.
In “region acquisition 1” in FIG. 4, the voice output device 9 is instructed to output a prompt to enter the region, the speech recognition device 6 is instructed to perform recognition using a grammar and dictionary for recognizing region utterances in addition to commands such as interruption and resumption, and the system waits for input from the user. When voice input is made and region information is output from the utterance understanding device 3, it is recorded in the dialogue control memory 5 as the “region information” in the user input information.

In “date acquisition 1” in FIG. 4, the voice output device 9 is next instructed to output a prompt to enter the date, the speech recognition device 6 is instructed to perform recognition using a grammar and dictionary for recognizing date utterances in addition to commands such as interruption, and the system waits for input from the user. When voice input is made and date information is output from the utterance understanding device 3, that information is recorded in the dialogue control memory 5 as the “date” input information.
In “input confirmation 1” in FIG. 4, the voice output device 9 is instructed to repeat back the “region” and “date” input information in the dialogue control memory 5 and to prompt the user to affirm or deny whether the weather should be reported for this information; the speech recognition device 6 is instructed to perform recognition using a grammar and dictionary for recognizing affirmative or negative utterances in addition to commands such as interruption, and the system waits for input from the user. When voice input is made, “result output 1” is executed if the output from the utterance understanding device 3 is input information indicating affirmation, and “re-input 1” is executed if it indicates denial.

In “re-input 1” in FIG. 4, in response to the user's denial in “input confirmation 1”, the voice output device 9 is instructed to output a prompt to enter the correction, the speech recognition device 6 is instructed to perform recognition using a grammar and dictionary for recognizing region and date utterances in addition to commands such as interruption, and the system waits for input from the user.
In “result output 1” in FIG. 3, in response to the user's affirmation in “input confirmation 1”, the voice output device 9 is instructed to output the weather forecast corresponding to the “region” and “date” recorded in the dialogue control memory 5.
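The per-state grammar selection described for the scenario positions above can be summarised as a small configuration sketch. The table name `GRAMMARS`, the category labels such as `"yes_no"`, and the helper `accepted` are hypothetical; the specification says only which kinds of utterance each position recognizes, and “commands such as interruption” is reduced here to a single `"interrupt"` label.

```python
# Commands accepted at every position (simplified to one label).
COMMON = {"interrupt"}

# Utterance categories each scenario position listens for.
GRAMMARS = {
    "region acquisition 1": COMMON | {"resume", "region"},
    "date acquisition 1": COMMON | {"date"},
    "date acquisition 2": COMMON | {"date"},
    "input confirmation 1": COMMON | {"yes_no"},
    "re-input 1": COMMON | {"region", "date"},
}


def accepted(position: str, utterance_type: str) -> bool:
    """Return True when the position's grammar covers this utterance type."""
    return utterance_type in GRAMMARS[position]


region_in_reinput = accepted("re-input 1", "region")
region_in_date2 = accepted("date acquisition 2", "region")
```

The contrast between the last two values reflects the reasoning given earlier: “re-input 1” also listens for region utterances, whereas “date acquisition 2” listens only for dates, which narrows what the recognizer can confuse.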

The control flow described above enables a dialogue such as the following example.
Dialogue example (below, U is the user and S is the system)
(Utterances 1 to 6 below are assumed to take place on March 22)
Utterance 1: S: “Please tell me the region you would like to know about.” (response and recognition by region acquisition 1)
Utterance 2: U: “Eastern Kanagawa Prefecture.”
Utterance 3: S: “Then for which date would you like to know about eastern Kanagawa Prefecture?”
(response and recognition by date acquisition 1)
Utterance 4: U: “March 23.”
Utterance 5: S: “Then, shall I give you the weather for March 23 in eastern Kanagawa Prefecture?” (response and recognition by input confirmation 1)
Utterance 6: U: “I would like to suspend this dialogue.”
Utterance 7: S: “Then I will save this dialogue and suspend it.” (response by interruption process 1; interruption process 2 is then performed)
(Utterances 8 to 14 below are assumed to take place on March 24)
Utterance 8: S: “Please tell me the region you would like to know about.” (response and recognition by region acquisition 1)
Utterance 9: U: “Resume the earlier dialogue.”
Utterance 10: S: “The day before yesterday, you were checking the weather for eastern Kanagawa Prefecture. Please tell me again the date you would like to check.” (after resume process 1, the response by resume process 2 is given; then, after the determinations in resume processes 3, 4, 5, and 6, the response by date acquisition 2 is given)
Utterance 11: U: “The day I want to check is March 25.”
Utterance 12: S: “Then, shall I give you the weather for March 25 in eastern Kanagawa Prefecture?”
(response and recognition by input confirmation 1)
Utterance 13: U: “Yes.”
Utterance 14: S: “The weather for March 25 in eastern Kanagawa Prefecture is sunny.” (processing by result output 1)
(End of dialogue example)
The embodiment has been described above with input information that carries time information as of the interruption in mind, but the present invention can also determine, at resumption, whether input information from any point before the interruption is valid or invalid. In that case, based on the input information that is stored in the dialogue control memory 5 together with the dialogue scenario position at each unit scenario process, the dialogue history generation device 11 generates dialogue history information upon interruption and saves it in the dialogue history storage device 10. At resumption, a scenario position is selected from the dialogue history information, and the corresponding input information is loaded into the dialogue history acquisition device and output to the dialogue control device 1. The time information within that input information is then compared with the time information at resumption to determine, through the elapsed flag, whether it is valid or invalid. The interruption elapsed time amount information between interruption and resumption is also calculated, and the language expression or saved information is selected from the tables of FIGS. 8 and 9. To handle the validity of input information from an arbitrary point, in interruption process 2 at step 37 of FIG. 2 the dialogue history generation device 11 is instructed to generate dialogue history information that includes the earlier arbitrary points, and in step 61 of FIG. 6 the dialogue history information for an arbitrary point in the dialogue history storage device is searched for and acquired.

The embodiments so far have been described concretely in terms of weather forecasts and performance ticket reservations. Considering that, in ticket reservation for example, the more options there are (seat type, number of seats, and so on), the more often the dialogue is likely to be interrupted, determining the validity of the time information within the input information becomes all the more effective as the dialogue involves more options.
In this embodiment a speech synthesizer is used for output, but other methods, such as displaying text on a screen, may also be used.

In this embodiment the dialogue history information is saved after an interruption request is received from the user, but it may be saved at any point during the dialogue.

FIG. 1 is a block diagram showing an embodiment of the present invention.
FIG. 2 is a flowchart outlining the dialogue scenario according to the present invention.
FIG. 3 is a flowchart of the dialogue scenario.
FIG. 4 is a flowchart of the resume processing that follows FIG. 3.
FIG. 5 is a processing flowchart of the dialogue history generation device.
FIG. 6 is a processing flowchart of the dialogue history acquisition device.
FIG. 7 is a processing flowchart of the elapsed flag determination method.
FIG. 8 is a relationship diagram of the language expressions indicating the interruption elapsed time amount.
FIG. 9 is a relationship diagram of the saved information used in the resume processing.

Claims

A voice interaction method that processes voice input and outputs a voice response, comprising:
a dialogue history generation process that, when an interruption of the dialogue is input, generates a dialogue history consisting of the input information entered up to the point of interruption, the control program position of the dialogue scenario being executed at that point, and the time information at that point; and
a dialogue history acquisition process that, when a resumption of the dialogue is input, retrieves the dialogue history obtained in the dialogue history generation process, compares the time information within the input information in the retrieved dialogue history with the time information at the point of resumption, and determines whether the time information within the input information is still valid or is invalid.

2. The voice interaction method according to claim 1,
wherein the dialogue history acquisition process further includes an operation that obtains the elapsed time from interruption to resumption from the time information at the point of interruption and the time information at the point of resumption,
the method further comprising a dialogue control process that outputs a response using a natural language expression corresponding to the elapsed time from the point of interruption to the point of resumption obtained in the dialogue history acquisition process and, in the case of the invalid determination, outputs a response based on the time information at the point of resumption that corresponds to the time information in the input information.

3. A voice interaction method that processes voice input and outputs a response voice, comprising:
a dialogue history generation process that, when a dialogue interruption is input, generates a dialogue history consisting of the input information entered up to an arbitrary point during the dialogue, the control program position of the dialogue scenario being executed at that point, and the time information at that point; and
a dialogue history acquisition process that, when a dialogue resumption is input, retrieves the dialogue history obtained in the dialogue history generation process and determines whether the time information in the input information is still valid or is invalid by checking the time information in the input information in the retrieved dialogue history against the time information at the point of resumption.

4. The voice interaction method according to claim 3,
wherein the dialogue history acquisition process further includes an operation that obtains the elapsed time from the arbitrary point to the point of resumption from the time information at the arbitrary point and the time information at the point of resumption,
the method further comprising a dialogue control process that outputs a response using a natural language expression corresponding to the elapsed time from the arbitrary point to the point of resumption obtained in the dialogue history acquisition process and, in the case of the invalid determination, outputs a response based on the time information at the point of resumption that corresponds to the time information in the input information.

5. The voice interaction method according to claim 1, wherein whether the time information is still valid or is invalid is determined according to the purpose for which the dialogue history is used.

6. The voice interaction method according to claim 1, wherein the natural language expression is retrieved from a table of idiomatic expressions according to the elapsed time.

7. The voice interaction method according to claim 1, wherein, for the natural language expression, stored information corresponding to the determination result for the time information is retrieved from a table.

8. A voice interaction device that processes voice input and outputs a response voice, comprising:
a dialogue history generation device that generates a dialogue history including the input information entered up to the point of interruption of the dialogue, the control program position of the dialogue scenario being executed at that point, and the time information at that point; and
a dialogue history acquisition device that retrieves the dialogue history obtained by the dialogue history generation device and determines whether the time information in the input information is still valid or is invalid by checking the time information in the input information in the retrieved dialogue history against the time information at the point of resumption of the dialogue.

9. The voice interaction device according to claim 8,
wherein the dialogue history acquisition device further performs an operation that obtains the elapsed time from interruption to resumption from the time information at the point of interruption and the time information at the point of resumption,
the voice interaction device further comprising a dialogue control device that outputs a response using a natural language expression corresponding to the elapsed time from the point of interruption to the point of resumption obtained by the dialogue history acquisition device and, in the case of the invalid determination, outputs a response based on the time information at the point of resumption that corresponds to the time information in the input information.

10. A voice interaction device that processes voice input and outputs a response voice, comprising:
a dialogue history generation device that generates a dialogue history including the input information entered up to an arbitrary point during the dialogue, the control program position of the dialogue scenario being executed at that point, and the time information at that point; and
a dialogue history acquisition device that retrieves the dialogue history obtained by the dialogue history generation device and determines whether the time information in the input information is still valid or is invalid by checking the time information in the input information in the retrieved dialogue history against the time information at the point of resumption of the dialogue.

11. The voice interaction device according to claim 10,
wherein the dialogue history acquisition device further performs an operation that obtains the elapsed time from the arbitrary point to the point of resumption from the time information at the arbitrary point and the time information at the point of resumption,
the voice interaction device further comprising a dialogue control device that outputs a response using a natural language expression corresponding to the elapsed time from the arbitrary point to the point of resumption obtained by the dialogue history acquisition device and, in the case of the invalid determination, outputs a response based on the time information at the point of resumption that corresponds to the time information in the input information.