JP2008003517A

JP2008003517A - Speech interaction system, speech interaction method, and program

Info

Publication number: JP2008003517A
Application number: JP2006175877A
Authority: JP
Inventors: Masateru Arakawa; 正輝荒川
Original assignee: NEC System Technologies Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2006-06-26
Filing date: 2006-06-26
Publication date: 2008-01-10
Anticipated expiration: 2026-06-26
Also published as: JP4491438B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech interaction system that can correctly determine whether the speech response being output is important to the user when the user provides speech input during output of that response.

SOLUTION: The speech interaction system includes a memory section and a control section. The memory section stores: recognition words, which are words to be recognized in input speech; speech responses to be output in correspondence with the input speech; priorities of the recognition words and the speech responses; an input history of the recognition words; and an output history of the speech responses. When input speech is received during output of a speech response, the control section identifies the recognition word by removing the speech response superimposed on the input speech, and corrects the priorities of the speech response and the recognition word using the input history and the output history. It then compares the corrected priority of the speech response with that of the recognition word. When the priority of the speech response is higher, output is maintained; when the priority of the recognition word is higher, the speech response is interrupted.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a voice dialogue apparatus that conducts a dialogue with a user by voice, to a voice dialogue method, and to a program for causing a computer to execute the method.

A conventional voice dialogue apparatus realizes a dialogue with a user by outputting a voice response from a speaker according to information obtained by recognizing the user's (speaker's) input voice from a microphone. Some voice dialogue apparatuses have a purge-in function that keeps the microphone from accepting the voice response the apparatus itself is outputting. So that the apparatus does not recognize its own in-progress voice response as input voice from the user, the purge-in function removes the voice response signal from the input voice for the portion in which the voice signal of the user's voice and the voice response signal of the response being output enter the microphone superimposed on each other.

Conventional voice dialogue apparatuses fall broadly into two types: those that interrupt output of the voice response when the user's input voice is received partway through the output, and those that give priority to output of the voice response while recording the input voice.

An apparatus that interrupts output of the voice response has the problem that unexpected input voice caused by echoes and noise of various kinds interrupts the output against the user's intention.

An apparatus that prioritizes output of the voice response, on the other hand, has the problem that when the voice response is very long it keeps outputting the response even after receiving the user's input voice, making the user wait a long time.

Various measures have been devised to address these problems and improve convenience for the user. For example, a voice dialogue system has been disclosed that automatically changes the content of the voice response according to the user's proficiency in using the voice dialogue apparatus (see Patent Document 1). A voice dialogue system that filters the input voice has also been disclosed; the filtering is intended to keep unexpected input voice, caused by echoes and noise arising on the input side, from making the system interrupt a voice response through misrecognition or from making its voice recognition malfunction (see Patent Document 2).
Patent Document 1: JP 2004-333543 A
Patent Document 2: JP 2004-144791 A

Among conventional voice dialogue apparatuses, one that prioritizes output of the voice response can, by using the purge-in function, accept voice uttered by the user even while a voice response is being output. The voice dialogue apparatus disclosed in Patent Document 1, however, can run into the following problem. When it judges that the current user's proficiency with the system is low, it outputs voice responses with detailed content at a slow speed. If that user's proficiency is in fact high, the user can predict the content of the response being output and wants the output to stop, but must wait until the output finishes.

A voice dialogue apparatus that can interrupt output of the voice response, meanwhile, interrupts the output whenever it receives input voice, without considering the importance of the voice response. As a result, important voice responses are not fully conveyed to the user; examples are voice guidance for users using the system for the first time, and alerts and warnings meant for all users regardless of proficiency.

The present invention has been made to solve these problems. Its object is to provide a voice dialogue apparatus, a voice dialogue method, and a program for causing a computer to execute the method that, when there is voice input from the user during output of a voice response, can determine more accurately whether the response being output is important to the user.

To achieve the above object, the voice dialogue apparatus of the present invention comprises:
a storage unit that stores recognized words, which are words recognized as input voice; information on voice responses to be output in correspondence with the input voice; information on the priorities of the recognized words and the voice responses; an input history including information on the number of times each recognized word has been input; and an output history including information on the number of times each voice response has been output and the number of times it was interrupted before output completed; and
a control unit that, on receiving the input voice during output of the voice response, removes the voice response superimposed on the input voice to identify the recognized word, corrects the priorities of the voice response and the recognized word stored in the storage unit using the information in the input history and the output history, compares the corrected priorities of the voice response and the recognized word, maintains the output if the voice response has the higher priority, and interrupts output of the voice response if the recognized word has the higher priority.

According to the present invention, when input voice is received during output of a voice response, the priorities are corrected in light of the output history of the voice response and the input history of the recognized word in the input voice, and whether to interrupt the output is then determined. Because the decision to maintain or interrupt the voice response takes the history into account, the apparatus can respond to a person in a way better suited to the actual situation.
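The maintain-or-interrupt decision can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the invention: the function name `decide_output` and its string results are hypothetical, and because the text does not say what happens when the two priorities are equal, the sketch assumes output is maintained in that case.

```python
def decide_output(response_priority: float, word_priority: float) -> str:
    """Compare corrected priorities and decide what to do with the current output.

    Returns "interrupt" when the recognized word outranks the voice response,
    otherwise "maintain". Treating a tie as "maintain" is an assumption.
    """
    if word_priority > response_priority:
        return "interrupt"
    return "maintain"
```

Both arguments are meant to be the priorities after correction by the input and output histories, not the stored reference values.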

In the voice dialogue apparatus of the present invention described above, the input history may include time-series information, which is a history of the times at which each recognized word was input, and the control unit, when correcting the priority of the recognized word being received, may refer to the time-series information and correct the priority to a higher value the shorter the time elapsed since input of that recognized word was last recorded.

In this case, the shorter the time elapsed since the recognized word being input was last input, the more important that word is considered to be to the user, and the more likely the recognized word is to take priority over the voice response currently being output. The voice response the user needs can therefore be output sooner.
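One way to realize "the shorter the elapsed time, the higher the priority" is a boost that decays with elapsed time. The sketch below is only an assumption about how such a correction might look: the text does not give a formula, and the function name `recency_boost`, the linear decay, and the default values of `window_s` and `max_boost` are all hypothetical.

```python
def recency_boost(base_priority: float, elapsed_s: float,
                  window_s: float = 60.0, max_boost: float = 2.0) -> float:
    """Raise a recognized word's priority more the more recently it was last input."""
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    # Freshness is 1.0 immediately after the last input and falls linearly
    # to 0.0 once window_s seconds have passed (illustrative choice).
    freshness = max(0.0, 1.0 - elapsed_s / window_s)
    return base_priority + max_boost * freshness
```

Any monotonically decreasing function of the elapsed time would satisfy the description equally well; the linear form is chosen here only because it is easy to read.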

In the voice dialogue apparatus of the present invention described above, the output history may include time-series information, which is a history of the times at which each voice response was output, and the control unit, when correcting the priority of the voice response being output, may refer to the time-series information and correct the priority to a lower value the shorter the time elapsed since output of that voice response was last recorded.

In this case, the shorter the time elapsed since the voice response being output was last output, the more likely the user is to remember its content, and the less likely the voice response is to take priority over the recognized word currently being input. A voice response the user does not need can therefore be interrupted.

In the voice dialogue apparatus of the present invention described above, the storage unit may store user-specific history information including an identifier that differs for each user, the recognized words input by that user, and information on the voice responses output in correspondence with those recognized words, and the control unit, when an identifier is input, may identify in the user-specific history information the identifier that matches the input identifier and correct the priorities of the recognized word being received and the voice response being output according to the identified user-specific history information.

In this case, the content of the voice response and the decision on whether to interrupt it differ from user to user, so the apparatus can respond in a way better suited to each user.
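A store of user-specific history information keyed by user identifier might be organized as in the following sketch. The class names `UserHistory` and `UserHistoryStore`, the method names, and the use of per-word and per-response counters are assumptions made for illustration; the text only states which kinds of information are held per user.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserHistory:
    """History kept for one user, counted per hypothetical word/response ID."""
    word_inputs: Dict[str, int] = field(default_factory=dict)
    response_outputs: Dict[str, int] = field(default_factory=dict)


class UserHistoryStore:
    """Maps each user identifier to that user's own history."""

    def __init__(self) -> None:
        self._by_user: Dict[str, UserHistory] = {}

    def record(self, user_id: str, word_id: str, response_id: str) -> None:
        # Create the user's history on first use, then count the exchange.
        history = self._by_user.setdefault(user_id, UserHistory())
        history.word_inputs[word_id] = history.word_inputs.get(word_id, 0) + 1
        history.response_outputs[response_id] = (
            history.response_outputs.get(response_id, 0) + 1
        )

    def lookup(self, user_id: str) -> Optional[UserHistory]:
        """Return the history matching the input identifier, or None if unknown."""
        return self._by_user.get(user_id)
```

The history returned by `lookup` would then feed the priority correction in place of a history shared by all users.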

In the present invention, when input voice is received during output of a voice response, the priorities are determined in light of the output history of the voice response and the input history of the input voice. Whether to interrupt the voice response can therefore be judged in a way better suited to the situation at the time the input voice is received. As a result, the user gets a feeling closer to conversing with a person than with conventional apparatuses.

The configuration of the voice dialogue apparatus of this embodiment is described with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of the voice dialogue apparatus of this embodiment.

As shown in FIG. 1, the voice dialogue apparatus of this embodiment has a dialogue storage unit 3 for storing information including the terms used to recognize voice input from the input means 5 and the voice responses to them; a dialogue history storage unit 4 for storing the input history of input voice and the output history of voice responses; and a control unit 7 that causes the output means 6 to output a voice response corresponding to the input voice. Although not shown in the figure, the control unit 7 has a CPU (Central Processing Unit) that executes predetermined processing according to a program, and a memory for storing the program.

As shown in FIG. 1, the control unit 7 has a voice input means 1, which includes a voice response removal means 11 and a voice recognition means 12, and a voice dialogue means 2, which includes an interrupt determination means 21, a priority determination means 23, and a dialogue control means 22. These means are configured virtually within the voice dialogue apparatus by the CPU executing the program.

Next, the dialogue storage unit 3 is described. The dialogue storage unit 3 stores a recognized word list, which describes the recognized words (the words that the voice recognition means 12 recognizes from input voice) and their associated information, and a voice response list, which describes the words used for voice responses.

FIG. 2 is a table showing an example of the recognized word list. As shown in FIG. 2, the list describes, for each recognized word, a recognized word ID that identifies the word, the priority of the word, and information on the voice responses corresponding to the word.

FIG. 3 is a table showing an example of the voice response list. As shown in FIG. 3, the list describes, for each voice response content, a voice response ID that identifies the response, the content of the response, and the priority of the response. For example, FIGS. 2 and 3 show that the voice responses corresponding to the recognized word "おはよう" ("good morning") are "おはよう" and the more polite "おはようございます". As for priorities, the recognized word "おはよう" has priority 3, and the voice responses "おはよう" and "おはようございます" each have priority 2.
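The two lists can be pictured as small lookup tables. The following Python sketch models them with dataclasses and fills in the "おはよう" example from the text. The type names, the function `responses_for`, and in particular the IDs `W1`, `R1`, and `R2` are placeholders, since the IDs of these entries in FIGS. 2 and 3 are not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RecognizedWord:
    word_id: str
    text: str
    priority: int
    response_ids: Tuple[str, ...]  # voice responses that may answer this word


@dataclass(frozen=True)
class VoiceResponse:
    response_id: str
    text: str
    priority: int


# Placeholder IDs; only the texts and priorities come from the description.
WORDS: Dict[str, RecognizedWord] = {
    "W1": RecognizedWord("W1", "おはよう", 3, ("R1", "R2")),
}
RESPONSES: Dict[str, VoiceResponse] = {
    "R1": VoiceResponse("R1", "おはよう", 2),
    "R2": VoiceResponse("R2", "おはようございます", 2),
}


def responses_for(word_text: str) -> List[VoiceResponse]:
    """Return the voice responses registered for a recognized word, if any."""
    for word in WORDS.values():
        if word.text == word_text:
            return [RESPONSES[rid] for rid in word.response_ids]
    return []
```

Keeping the priority on each entry, rather than in a separate table, mirrors the description that each list row carries its own reference priority.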

Next, the dialogue history storage unit 4 is described. The dialogue history storage unit 4 stores the history of recognized words that have been input and the history of voice responses that have been output.

FIG. 4 is a table showing an example of the input history of recognized words. As shown in FIG. 4, the total number of inputs is stored for each recognized word ID.

FIG. 5 is a table showing an example of the output history of voice responses. As shown in FIG. 5, the total number of outputs and the total number of interruptions (the number of times output was interrupted before it completed) are stored for each voice response ID. For example, FIGS. 2 and 4 show that the recognized word "おはよう" has been input 98 times in total, and FIGS. 3 and 5 show that the voice response "おはようございます" has been output 43 times in total and interrupted 11 times.
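The counters in FIGS. 4 and 5 could be kept as in the sketch below. The class `DialogueHistory` and its method names are hypothetical, and the sketch assumes that an interrupted output still counts toward the total number of outputs, which is how the ratio of interruptions to outputs is read here.

```python
from collections import Counter


class DialogueHistory:
    """Per-ID counters for recognized word inputs and voice response outputs."""

    def __init__(self) -> None:
        self.word_inputs: Counter = Counter()
        self.response_outputs: Counter = Counter()
        self.response_interruptions: Counter = Counter()

    def record_input(self, word_id: str) -> None:
        self.word_inputs[word_id] += 1

    def record_output(self, response_id: str, interrupted: bool = False) -> None:
        # Assumption: an interrupted output is also counted as an output.
        self.response_outputs[response_id] += 1
        if interrupted:
            self.response_interruptions[response_id] += 1

    def interruption_ratio(self, response_id: str) -> float:
        """Share of outputs that were cut short; 0.0 if never output."""
        total = self.response_outputs[response_id]
        if total == 0:
            return 0.0
        return self.response_interruptions[response_id] / total
```

With the figures quoted above for "おはようございます" (43 outputs, 11 interruptions), `interruption_ratio` evaluates to 11/43, or roughly 0.26.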

Next, the voice input means 1 of the control unit 7 is described.

The voice response removal means 11 has a purge-in function that, when a voice signal is input from the input means 5, removes the voice response signal output from the voice dialogue means 2. It outputs the voice signal from which the voice response signal has been removed by the purge-in function to the voice recognition means 12. The implementation of the purge-in function used in the voice response removal means 11 is not particularly limited and may be the same as in conventional apparatuses. To improve the voice recognition rate, however, it is desirable that the function remove various kinds of noise as well as the voice response signal.
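Because the text leaves the purge-in implementation open, the following is only a toy sketch of the underlying idea: the apparatus knows the signal it is playing, so it can subtract an estimate of that signal from what the microphone picks up. The function `remove_voice_response` is hypothetical and rests on strong simplifying assumptions (the two signals are time-aligned, the response reaches the microphone scaled by a single gain, and there is no reverberation). Practical systems typically use adaptive filters instead.

```python
from typing import List, Sequence


def remove_voice_response(mic: Sequence[float],
                          response: Sequence[float]) -> List[float]:
    """Subtract a scaled copy of the known response signal from the mic signal."""
    if len(mic) != len(response):
        raise ValueError("signals must have the same length")
    energy = sum(r * r for r in response)
    if energy == 0.0:
        return list(mic)  # nothing is being played back
    # Least-squares estimate of how strongly the response leaks into the mic.
    gain = sum(m * r for m, r in zip(mic, response)) / energy
    return [m - gain * r for m, r in zip(mic, response)]
```

The gain estimate is exact only when the user's voice is uncorrelated with the response signal; otherwise part of the user's voice is removed as well.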

The voice recognition means 12 recognizes the voice signal input from the voice response removal means 11 and outputs the recognition result to the voice dialogue means 2. Each time it recognizes an utterance from the user, it also records the recognized word and its associated information in the dialogue history storage unit 4. The voice recognition method used by the voice recognition means 12 is not particularly limited and may be the same as in conventional apparatuses.

Next, the voice dialogue means 2 is described.

When the interrupt determination means 21 receives a recognized word from the voice recognition means 12 while the dialogue control means 22 is outputting a voice response, it compares the priority of the recognized word with that of the voice response being output. If several recognized words are received during output of the voice response, it performs the comparison each time. If the comparison shows that the recognized word has the higher priority, it outputs the recognized word to the dialogue control means 22. If instead the voice response being output has the higher priority, it waits for the dialogue control means 22 to report that output of the voice response has completed, and then outputs to the dialogue control means 22 the highest-priority recognized word among those received during the output. For each comparison, it outputs the recognized word and the voice response under consideration to the priority determination means 23 and receives their respective priorities from the priority determination means 23. If a recognized word is input from the voice recognition means 12 while the dialogue control means 22 is not outputting a voice response, it outputs that recognized word to the dialogue control means 22 as is.
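The behavior of the interrupt determination means 21 can be sketched as a small state machine. The class `InterruptJudge` and its methods are hypothetical names, and several details are assumptions: a single callable `priority_of` stands in for the priority determination means 23 and is applied to both word IDs and response IDs, ties keep the response, and words held back are discarded when an interruption occurs.

```python
from typing import Callable, List, Optional


class InterruptJudge:
    """Decides whether a recognized word is forwarded now or held until later."""

    def __init__(self, priority_of: Callable[[str], float]) -> None:
        self._priority_of = priority_of  # stand-in for priority determination
        self._current_response: Optional[str] = None
        self._pending: List[str] = []

    def start_response(self, response_id: str) -> None:
        """Called when the dialogue controller begins outputting a response."""
        self._current_response = response_id
        self._pending.clear()

    def on_word(self, word_id: str) -> Optional[str]:
        """Return the word to forward immediately, or None if it must wait."""
        if self._current_response is None:
            return word_id  # nothing is being output: pass the word through
        if self._priority_of(word_id) > self._priority_of(self._current_response):
            self._current_response = None
            self._pending.clear()  # assumption: held words are dropped
            return word_id  # the word wins: the response will be interrupted
        self._pending.append(word_id)
        return None

    def on_response_complete(self) -> Optional[str]:
        """Return the highest-priority word received during the output, if any."""
        self._current_response = None
        if not self._pending:
            return None
        best = max(self._pending, key=self._priority_of)
        self._pending.clear()
        return best
```

The comparison is repeated for every word that arrives during the output, which matches the statement that the priorities are compared each time.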

The priority determination means 23 receives the recognized word and the voice response being output from the interrupt determination means 21, determines the priority of each, and outputs the priorities to the interrupt determination means 21. In determining these priorities, it uses the history information stored in the dialogue history storage unit 4 to apply the corrections described below to the reference priorities. The reference priorities here are the priorities of the recognized words and voice responses stored in the dialogue storage unit 3.

When the total number of inputs is large, as for recognized word ID "300" in FIG. 4, the recognized word is considered to be one the user likes or needs. The priority determination means 23 therefore raises the priority above its reference value.

When the user inputs voice several times during output of a voice response, the user is considered to be expecting the next response to start soon. The priority determination means 23 therefore raises the priority of the recognized words recognized during the voice response output above their reference values.

When the same recognized word is input several times, the shorter the time elapsed since the last recorded input, the more the user is considered to like or need that word. The priority determination means 23 therefore raises the priority above its reference value when the time elapsed since the last recorded input is shorter than a predetermined time. In this case, time-series information such as that shown in FIG. 6, which is a history of the times at which recognized words were input, is stored in the dialogue history storage unit 4. The stored time-series information may also be analyzed to pick out the user's utterance pattern from among several utterance patterns, and the priority corrected accordingly.

When the total number of outputs is large, as for voice response ID "101" in FIG. 5, the voice response is likely to remain in the user's memory, so the user can easily infer the rest of the content just from hearing the beginning of the response. The priority determination means 23 therefore lowers the priority of that voice response below its reference value.

When the ratio of interruptions to total outputs is high, as for voice response ID "301" in FIG. 5, the user has judged that there is no need to hear the whole response. The priority determination means 23 therefore lowers the priority below its reference value.

When the total number of outputs is 0, as for voice response ID "203" in FIG. 5, the voice response is unknown to the user, so it is desirable to output it reliably to the end. The priority determination means 23 therefore raises the priority above its reference value.

When the interval between the time a voice response about to be output was last output and the current time is shorter than a predetermined time, the response is likely to remain in the user's memory even if its total number of outputs is small. The priority determination means 23 therefore lowers the priority below its reference value. Conversely, when the interval is longer than the predetermined time, it raises the priority above its reference value. In either case, time-series information such as that shown in FIG. 7, which is a history of the times at which voice responses were output, is stored in the dialogue history storage unit 4.
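The correction rules described for the priority determination means 23 can be gathered into two functions, one for recognized words and one for voice responses. This sketch is an interpretation, not the disclosed method: the text gives only the direction of each correction, so the step size, every threshold, the additive way the rules are combined, and all names below are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative constants; the description specifies no concrete values.
STEP = 1
MANY_INPUTS = 50
MANY_OUTPUTS = 100
HIGH_INTERRUPT_RATIO = 0.5
RECENT_SECONDS = 60.0


@dataclass
class WordStats:
    total_inputs: int
    seconds_since_last_input: Optional[float]  # None if never input before
    inputs_during_current_response: int = 1


@dataclass
class ResponseStats:
    total_outputs: int
    total_interruptions: int
    seconds_since_last_output: Optional[float]  # None if never output before


def corrected_word_priority(base: int, s: WordStats) -> int:
    p = base
    if s.total_inputs >= MANY_INPUTS:
        p += STEP  # frequently used word: raise
    if s.inputs_during_current_response > 1:
        p += STEP  # user spoke repeatedly during the output: raise
    if (s.seconds_since_last_input is not None
            and s.seconds_since_last_input < RECENT_SECONDS):
        p += STEP  # same word input again shortly after: raise
    return p


def corrected_response_priority(base: int, s: ResponseStats) -> int:
    if s.total_outputs == 0:
        return base + STEP  # never heard before: raise so it plays to the end
    p = base
    if s.total_outputs >= MANY_OUTPUTS:
        p -= STEP  # heard many times: lower
    if s.total_interruptions / s.total_outputs >= HIGH_INTERRUPT_RATIO:
        p -= STEP  # often cut short by the user: lower
    if s.seconds_since_last_output is not None:
        if s.seconds_since_last_output < RECENT_SECONDS:
            p -= STEP  # heard very recently: lower
        elif s.seconds_since_last_output > RECENT_SECONDS:
            p += STEP  # not heard for a while: raise
    return p
```

Under these assumed constants, a word with reference priority 3 and 98 inputs, last input an hour ago, is corrected to 4, while a response with reference priority 2, 43 outputs, and 11 interruptions, last output an hour ago, is corrected to 3, so the word would interrupt the response.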

When users differ in their proficiency with the system, the priorities of the recognized word and the voice response may be corrected according to proficiency. In this case, proficiency may be determined by the same methods as in conventional apparatuses.

When a recognized word is input from the interrupt determination means 21, the dialogue control means 22 outputs a voice response to that word through the output means 6. To do so, it searches the dialogue storage unit 3 for the recognized word, then finds and reads out the corresponding voice response. If a recognized word is input from the interrupt determination means 21 while the dialogue control means 22 is outputting a voice response, it interrupts the response being output and then outputs the voice response corresponding to that recognized word by the method just described. At this time, it outputs the signal of the newly output voice response to the voice response removal means 11. Each time the dialogue control means 22 outputs a voice response, it records the voice response and its associated information in the dialogue history storage unit 4.

The dialogue control means 22 may also output commands and execute tasks in addition to outputting voice responses. Outputting a command here means, in a system that can use voice as an interface such as a car navigation system, outputting the function that corresponds to an input from the remote control or a button operation included in that system. Executing a task means, in the same kind of system, outputting in response to voice input not only such commands but also voice responses, including sound effects that indicate the input was recognized. Commands, and tasks that combine several commands, are each assigned a priority and stored in the dialogue storage unit 3 shown in FIG. 1. Thus, when voice input arrives while a command is being output or a task is being executed, output according to priority becomes possible, which improves convenience for the user.

When several users use the voice dialogue apparatus of the present invention, the priorities of the recognized words and voice responses may be corrected for each user. For example, user-specific history information may be used, consisting of an identifier that differs for each user, the recognized words input by that user, and information on the voice responses output in correspondence with those words. When a user inputs an identifier, the matching identifier is identified in the user-specific history information, and the priorities of the recognized word being received and the voice response being output are corrected according to that user-specific history information.

In this case, the content of the voice response and the decision on whether to interrupt it differ from user to user, so the apparatus can respond in a way better suited to each user. The user-specific history information is stored in the dialogue history storage unit 4. The means of identifying the user may also be a method other than the user entering the identifier.

Next, the operation of the voice dialogue apparatus of this embodiment is described with reference to FIGS. 1 and 8. The operation described below is performed each time voice input from the user is recognized.

FIG. 8 is a flowchart showing the operation procedure of the voice dialogue apparatus of this embodiment.

When the voice input means 1 recognizes a voice input from the user (step 101), it stores the recognized word in the dialogue history storage unit 4 (step 102), so that the recognized word is recorded in the input history. The voice dialogue means 2 then checks whether a voice response to the user is being output (step 103). If no voice response is being output, it outputs the voice response corresponding to the recognized word (step 104), stores that voice response in the dialogue history storage unit 4 (step 105), and ends the operation. The voice response is thereby recorded in the output history.

Conversely, if a voice response is being output, the voice dialogue means 2 compares the priority of the voice response being output with the priority of the input recognized word to determine which is higher (step 106).

If the priority of the recognized word is higher than that of the voice response being output, the voice response being output is interrupted (step 107), and the voice response corresponding to the recognized word is output in step 104. The voice response interrupted in step 107 and the voice response corresponding to the recognized word are then stored in the dialogue history storage unit 4 (step 105), and the operation ends. The former is recorded in the history as an interruption and the latter as an output.

If, in step 106, the priority of the recognized word is lower than that of the voice response being output, the voice dialogue means 2 holds the recognized word and defers the operation corresponding to it until the voice response being output has finished (step 108). While waiting, the voice dialogue means 2 can still accept new recognized words.

When a new recognized word is input (step 109), its input history is stored in the dialogue history storage unit 4 (step 110), and the process returns to step 106 to compare the priority of that recognized word with the priority of the voice response being output. If the priority of the recognized word is higher, steps 107, 104, and 105 described above are performed and the operation ends. If it is lower, the voice dialogue means 2 repeats the sequence of steps 109, 110, 106, and 108 each time a recognized word is accepted. After the voice response being output has finished, the voice dialogue means 2 performs steps 104 and 105 for the recognized word input in step 109, that is, the operation deferred in step 108. If a plurality of recognized words were input in step 109, the same processing is performed for the one with the highest priority.
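The flow of FIG. 8 can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: `DialogueState`, `on_recognized_word`, and `on_response_finished` are hypothetical names, the priorities passed in are assumed to be already corrected, a tie is treated as "keep the current output" (the text only covers the higher and lower cases), and deferred words are left untouched when an interruption occurs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogueState:
    current_response: Optional[str] = None              # voice response being output, if any
    pending_words: list = field(default_factory=list)   # words deferred in step 108
    input_history: list = field(default_factory=list)
    output_history: list = field(default_factory=list)  # (response, "output" or "interrupted")


def _start(state, response):
    # Steps 104-105: output the response and record it in the output history.
    state.current_response = response
    state.output_history.append((response, "output"))


def on_recognized_word(state, word, word_priority, response_priority, response_for):
    """Handle one recognized word; returns "respond", "interrupt", or "wait"."""
    state.input_history.append(word)                     # step 102 (or step 110)
    if state.current_response is None:                   # step 103: nothing is being output
        _start(state, response_for[word])
        return "respond"
    if word_priority[word] > response_priority[state.current_response]:    # step 106
        state.output_history.append((state.current_response, "interrupted"))  # step 107
        _start(state, response_for[word])
        return "interrupt"
    state.pending_words.append(word)                     # step 108: defer until output ends
    return "wait"


def on_response_finished(state, word_priority, response_for):
    """After the output ends, answer the highest-priority deferred word, if any."""
    state.current_response = None
    if not state.pending_words:
        return None
    best = max(state.pending_words, key=lambda w: word_priority[w])
    state.pending_words.clear()
    _start(state, response_for[best])
    return best
```

Driving the two functions from a recognizer callback and an "output finished" callback would reproduce the loop of steps 106 to 110 without blocking while a response plays.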

Next, the operation of the priority determination means 23 within the procedure of FIG. 8 is described in detail.

FIG. 9 is a flowchart showing the operation of outputting the priority of the recognized word in step 106.

The priority determination means 23 reads the reference priority of the recognized word stored in the dialogue storage unit 3 (step 201). It then reads the input history of the recognized word stored in the dialogue history storage unit 4 (step 202) and determines whether an input history exists for the recognized word (step 203). If there is none, the priority determination means 23 does not correct the priority. If there is one, it corrects the priority according to the input history (step 204). Finally, it determines the priority of the recognized word (step 205) and ends the operation.

FIG. 10 is a flowchart showing the operation of outputting the priority of the voice response in step 106.

The priority determination means 23 reads the reference priority of the voice response stored in the dialogue storage unit 3 (step 301). It then reads the output history of the voice response stored in the dialogue history storage unit 4 (step 302) and determines whether an output history exists for the voice response (step 303). If there is none, the priority determination means 23 does not correct the priority. If there is one, it corrects the priority according to the output history (step 304). Finally, it determines the priority of the voice response (step 305) and ends the operation.

Next, the priority correction performed by the priority determination means 23 in step 204 of FIG. 9 is described.

FIG. 11 is a flowchart showing the operation of correcting the priority of the recognized word in step 204.

The priority determination means 23 determines whether the total input count of the recognized word is greater than the average input count (step 401). The average input count here is the sum of the input counts of all recognized words divided by the total number of recognized words. If the total input count is greater than the average input count in step 401, the reference priority is corrected by adding +1 (step 402). If the total input count is equal to or less than the average input count in step 401, the reference priority is corrected by adding -1 (step 403). This completes the operation of correcting the priority of the recognized word.
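The correction of steps 203 and 401 to 403 can be sketched as below. The function names and the `input_counts` mapping are hypothetical; the mapping is assumed to hold one entry per word in the recognition word list, and a missing entry or a count of zero is treated as "no input history".

```python
def average_input_count(input_counts):
    """Sum of all input counts divided by the number of recognized words."""
    return sum(input_counts.values()) / len(input_counts)


def correct_word_priority(word, base_priority, input_counts):
    """Return the corrected priority of a recognized word."""
    count = input_counts.get(word, 0)
    if count == 0:
        # Step 203: no input history, so the reference priority is used as is.
        return base_priority
    if count > average_input_count(input_counts):
        # Step 402: input more often than average, so raise the priority.
        return base_priority + 1
    # Step 403: input count equal to or below the average, so lower the priority.
    return base_priority - 1


counts = {"good morning": 60, "hello": 40, "dance": 50}  # average is 50
print(correct_word_priority("good morning", 3, counts))  # 4
print(correct_word_priority("dance", 3, counts))         # 2
```

Note that a word whose count exactly equals the average falls into the -1 branch, as stated for step 403.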

Next, the priority correction performed by the priority determination means 23 in step 304 of FIG. 10 is described.

FIG. 12 is a flowchart showing the operation of correcting the priority of the voice response in step 304.

The priority determination means 23 reads the interruption probability of the voice response (step 501). The interruption probability here is the total number of interruptions divided by the total number of outputs.

Next, it determines whether the interruption probability is 50% or less (step 502). If the interruption probability is 50% or less, the reference priority is corrected by adding +1 (step 503). If the interruption probability is greater than 50%, the reference priority is corrected by adding -1 (step 504). This completes the operation of correcting the priority of the voice response.
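Steps 303 and 501 to 504 can be sketched as follows. The interruption probability is taken here as interruptions divided by outputs, which is the reading under which a 50% threshold is meaningful; the function and parameter names are hypothetical.

```python
def interruption_probability(total_outputs, total_interruptions):
    """Fraction of outputs that were interrupted before completion."""
    return total_interruptions / total_outputs


def correct_response_priority(base_priority, total_outputs, total_interruptions):
    """Return the corrected priority of a voice response."""
    if total_outputs == 0:
        # Step 303: no output history, so the reference priority is used as is.
        return base_priority
    if interruption_probability(total_outputs, total_interruptions) <= 0.5:
        # Step 503: rarely interrupted, so raise the priority.
        return base_priority + 1
    # Step 504: interrupted more than half the time, so lower the priority.
    return base_priority - 1


print(correct_response_priority(4, 10, 6))  # 3
print(correct_response_priority(4, 10, 5))  # 5
```

A response that users keep cutting off thus becomes easier to interrupt, while one that is usually heard to the end becomes harder to interrupt.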

As described above, when the voice dialogue apparatus of this embodiment accepts an input voice while a voice response is being output, it corrects the priorities in light of the output history of the voice response and the input history of the recognized word in the input voice, and then decides whether to interrupt the output of the voice response. Because the decision to continue or interrupt the voice response takes the history into account, the apparatus can respond to a person in a way better suited to the actual situation. As a result, the user gets a feeling closer to conversing with a person than with conventional apparatus.

As one form of this embodiment, the input/output control method of the voice dialogue apparatus of the present invention may be applied to a program to be executed by a computer.

Also as one form of this embodiment, a program for executing the input/output control of the voice dialogue apparatus described above may be applied to a car navigation system that outputs the commands and executes the tasks described above, or to a communication robot. An example of a communication robot to which this program is applied is described in Example 1.

In this example, the voice dialogue apparatus of the embodiment described above is applied to a robot that converses with people. The configuration of the voice dialogue apparatus of this example is the same as that shown in FIG. 1 to FIG. 5, so a detailed description is omitted. The average input count over all recognized words is 50.

Next, the operation of the robot of this example is described.

When the robot recognizes the recognized word "Hello" (こんにちは) while it is not outputting a voice response, the dialogue control means 22 refers to the recognition word list and the voice response list in the dialogue storage unit 3, selects one of the corresponding voice responses "Hello" (ハロー), "Hello" (こんにちは), and "Long time no see, how have you been?" (久しぶり元気だった？) at random, and outputs it.

When the robot accepts the recognized word "Good morning" while outputting the voice response "I can dance. Try saying 'dance'" shown in FIG. 3, it calculates the priorities of the recognized word and the voice response and compares them to decide whether the interruption is allowed. For the recognized word "Good morning", the reference priority in the recognition word list of the dialogue storage unit 3 is 3, and the input history in the dialogue history storage unit 4 shows an input count of 50 or more, so the priority is the reference value plus the correction value +1, that is, 4. For the voice response "I can dance. Try saying 'dance'", the reference priority in the voice response list of the dialogue storage unit 3 is 4, and the output history in the dialogue history storage unit 4 shows an interruption probability above 50%, so the priority is the reference value plus the correction value -1, that is, 3. Because the priority of the recognized word exceeds that of the voice response, the robot interrupts the utterance of "I can dance. Try saying 'dance'" and outputs the voice response corresponding to the recognized word "Good morning".
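The arithmetic of this example can be checked in a few lines. `should_interrupt` is a hypothetical helper; the numbers are the reference priorities and correction values given in the example.

```python
def should_interrupt(word_priority, response_priority):
    """Interrupt only when the recognized word outranks the current response."""
    return word_priority > response_priority


word_priority = 3 + 1      # "Good morning": reference 3, correction +1 from the input history
response_priority = 4 - 1  # dance announcement: reference 4, correction -1 from the output history

print(word_priority, response_priority)                    # 4 3
print(should_interrupt(word_priority, response_priority))  # True
```

Without the history-based corrections the reference priorities alone (3 against 4) would have kept the announcement playing, which is the point of the example.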

In this way, when the robot is greeted by a person while outputting a voice response, it interrupts the voice response and returns the greeting. As a result, a person interacting with the robot gets a feeling similar to actually conversing with another person.

FIG. 1 is a block diagram showing a configuration example of the voice dialogue apparatus of this embodiment.
FIG. 2 is a table showing an example of the recognition word list.
FIG. 3 is a table showing an example of the voice response list.
FIG. 4 is a table showing an example of the input history of recognized words.
FIG. 5 is a table showing an example of the output history of voice responses.
FIG. 6 is a table showing the history of times at which recognized words were input.
FIG. 7 is a table showing the history of times at which voice responses were output.
FIG. 8 is a flowchart showing the operation procedure of the voice dialogue apparatus of this embodiment.
FIG. 9 is a flowchart showing the operation of outputting the priority of a recognized word.
FIG. 10 is a flowchart showing the operation of outputting the priority of a voice response.
FIG. 11 is a flowchart showing the operation of correcting the priority of a recognized word.
FIG. 12 is a flowchart showing the operation of correcting the priority of a voice response.

Explanation of symbols

1 Voice input means
2 Voice dialogue means
3 Dialogue storage unit
4 Dialogue history storage unit
5 Input means
6 Output means
7 Control unit
11 Voice response removal means
12 Voice recognition means
21 Interruption determination means
22 Dialogue control means
23 Priority determination means

Claims

1. A voice dialogue apparatus comprising:
a storage unit that stores recognized words, which are words recognized from input voice, information on voice responses to be output in response to the input voice, information on the priorities of the recognized words and the voice responses, an input history including information on the number of times each recognized word has been input, and an output history including information on the number of times each voice response has been output and the number of times it has been interrupted before its output was completed; and
a control unit that, when input voice is accepted while a voice response is being output, identifies the recognized word by removing the voice response superimposed on the input voice, corrects the priorities of the voice response and the recognized word stored in the storage unit using the information in the input history and the output history, compares the corrected priorities of the voice response and the recognized word, maintains the output if the priority of the voice response is higher, and interrupts the output of the voice response if the priority of the recognized word is higher.

2. The voice dialogue apparatus according to claim 1, wherein
the input history includes time-series information, which is information on the history of the times at which each recognized word was input, and
the control unit, when correcting the priority of the recognized word being accepted, refers to the time-series information and corrects the priority of the recognized word to a higher value the shorter the time elapsed since the input of that recognized word was last recorded.

3. The voice dialogue apparatus according to claim 1 or 2, wherein
the output history includes time-series information, which is information on the history of the times at which each voice response was output, and
the control unit, when correcting the priority of the voice response being output, refers to the time-series information and corrects the priority of the voice response to a lower value the shorter the time elapsed since the output of that voice response was last recorded.

4. The voice dialogue apparatus according to claim 1, wherein
the storage unit stores user-specific history information including an identifier that differs for each user, the recognized words input by the user, and information on the voice responses output in response to those recognized words, and
the control unit, when an identifier is input, locates the identifier matching the input identifier in the user-specific history information and corrects the priorities of the recognized word being accepted and the voice response being output according to the located user-specific history information.

5. A voice dialogue method performed by an information processing apparatus that outputs a voice response to input voice, the method comprising:
storing, in a storage unit, recognized words, which are words recognized from input voice, information on voice responses to be output in response to the input voice, information on the priorities of the recognized words and the voice responses, an input history including information on the number of times each recognized word has been input, and an output history including information on the number of times each voice response has been output and the number of times it has been interrupted before its output was completed;
when input voice is accepted while a voice response is being output, identifying the recognized word by removing the voice response superimposed on the input voice;
correcting the priorities, stored in the storage unit, of the voice response being output and the identified recognized word using the information in the input history and the output history;
comparing the corrected priorities of the voice response and the recognized word; and
as a result of the comparison, maintaining the output if the priority of the voice response is higher, and interrupting the output of the voice response if the priority of the recognized word is higher.

6. The voice dialogue method according to claim 5, wherein
the input history includes time-series information, which is information on the history of the times at which each recognized word was input, and
when the priority of the recognized word being accepted is corrected, the time-series information is referred to and the priority of the recognized word is corrected to a higher value the shorter the time elapsed since that recognized word was last input.

7. The voice dialogue method according to claim 5, wherein
the output history includes time-series information, which is information on the history of the times at which each voice response was output, and
when the priority of the voice response being output is corrected, the time-series information is referred to and the priority of the voice response is corrected to a lower value the shorter the time elapsed since that voice response was last output.

8. The voice dialogue method according to any one of claims 5 to 7, wherein
user-specific history information including an identifier that differs for each user, the recognized words input by the user, and information on the voice responses output in response to those recognized words is stored in the storage unit,
when an identifier is input, the identifier matching the input identifier is located in the user-specific history information, and
the priorities of the recognized word being accepted and the voice response being output are corrected according to the located user-specific history information.

9. A program for causing a computer to output a voice response to input voice, the program causing the computer to execute processing comprising:
storing, in a storage unit, recognized words, which are words recognized from input voice, information on voice responses to be output in response to the input voice, information on the priorities of the recognized words and the voice responses, an input history including information on the number of times each recognized word has been input, and an output history including information on the number of times each voice response has been output and the number of times it has been interrupted before its output was completed;
when input voice is accepted while a voice response is being output, identifying the recognized word by removing the voice response superimposed on the input voice;
correcting the priorities, stored in the storage unit, of the voice response being output and the identified recognized word using the information in the input history and the output history;
comparing the corrected priorities of the voice response and the recognized word; and
as a result of the comparison, maintaining the output if the priority of the voice response is higher, and interrupting the output of the voice response if the priority of the recognized word is higher.