JP2013037097A

JP2013037097A - Voice processor

Info

Publication number: JP2013037097A
Application number: JP2011171621A
Authority: JP
Inventors: Hironobu Yanagida (柳田 広宣)
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2011-08-05
Filing date: 2011-08-05
Publication date: 2013-02-21

Abstract

PROBLEM TO BE SOLVED: To provide a voice processor capable of waking a user who dozes off while using the voice processor to execute an interaction scenario.
SOLUTION: A voice processor includes a voice outputting section, a voice recognizing section, a first controller, and a first storage section that stores an interaction scenario and voice output data. The first controller causes the voice outputting section to utter a voice using the voice output data in accordance with the interaction scenario, and advances the interaction scenario on the basis of a voice recognition result from the voice recognizing section. In a scene of the interaction scenario where a response is awaited, the first controller causes the voice outputting section to utter a warning voice when no voice recognition result has been transmitted to the first controller after a predetermined time period has elapsed.

Description

The present invention relates to a voice processing device, and more particularly to a voice processing system that conducts dialogue with a user.

Devices known as conversation bots or chat bots have long been used to converse with people. A conversation bot simulates conversation with a person but does not understand the content of the conversation, and for this reason such bots are also called "artificial incompetence". For example, the talking parrot described in Patent Document 1, which speaks to the user by simply repeating the user's voice, is also a conversation bot. Conversation bots have a long history, originating with the ELIZA system developed by Joseph Weizenbaum in 1966. ELIZA poses as a therapist, converting the patient's words into questions and echoing them back. Since then, such conversation bot systems have come to be used not only for therapy of this kind but also as conversation partners for elderly people and for preventing drowsiness while driving an automobile.

As described above, a conversation bot utters the same words that the user utters, so it has the drawback that users easily tire of it. To address this, Patent Document 2 describes identifying the conversation characteristics, topics, and the like that the user appears to prefer and conducting dialogue processing accordingly, thereby relieving the boredom of a user such as a driver and preventing the user from dozing.

Patent Document 1: Japanese Patent Laid-Open No. 11-9487
Patent Document 2: Japanese Patent Laid-Open No. 2011-125900 (JP2011-125900A)

However, conversing with characteristics and topics that the user finds agreeable may conversely lower the user's sense of alertness, and in some cases the user may be expected to doze off.

The present invention has been made to solve at least one of the problems described above, and can be realized as the following application examples or embodiments.

[Application Example 1]
A voice processing device according to this application example includes a voice output unit, a voice recognition unit, a first control unit, and a first storage unit in which a dialogue scenario and voice output data are stored. The first control unit causes the voice output unit to utter a voice using the voice output data in accordance with the dialogue scenario, and advances the dialogue scenario based on a voice recognition result from the voice recognition unit. In a response-waiting scene of the dialogue scenario, when the voice recognition result has not been transmitted to the first control unit after a predetermined time has elapsed, the first control unit causes the voice output unit to utter a voice indicating a warning.

According to this configuration, when the voice recognition result is not transmitted to the first control unit within the predetermined time in a response-waiting scene of the dialogue scenario, the first control unit causes the voice output unit to utter a warning voice. This makes it possible to alert a person who has dozed off, or is about to doze off, while using the voice processing device.

[Application Example 2]
In the voice processing device according to the above application example, it is preferable that the first control unit has a measurement counter for measuring the predetermined time, and that the count value for measuring the predetermined time is set for each response-waiting scene.

According to this configuration, the first control unit has a measurement counter for measuring the predetermined time, and the count value is set for each response-waiting scene, so the waiting time can be matched to the scene. Response-waiting scenes in a dialogue scenario vary: in some the user can answer immediately, while in others the user needs to think or check something. If, for example, the waiting time were simply set to the user's average response time, then when a dialogue requiring thought follows a dialogue that can be answered immediately, the waiting time might elapse while the user is still thinking. Setting the waiting time of each response-waiting scene with the thinking time for that scene taken into account provides the user with a more natural and comfortable dialogue environment.

[Application Example 3]
In the voice processing device according to the above application example, it is preferable that, in the first response-waiting scene of the dialogue scenario, the maximum value that the measurement counter can measure is set as the count value for the predetermined time.

According to this configuration, setting the predetermined time in the first response-waiting scene of the dialogue scenario to the maximum time that the measurement counter can measure reduces the likelihood that the predetermined time elapses in that scene. In the first response-waiting scene after the dialogue scenario starts, the user is less likely than in other scenes to be dozing or bored, and can be expected to respond at a time that suits the user. Therefore, even if the predetermined time is set to the maximum value measurable by the measurement counter, the counter is unlikely to count up, the setting can be judged to have no substantial effect on the progress of the dialogue scenario, and unnecessary warnings can be avoided.

[Application Example 4]
In the voice processing device according to the above application example, it is preferable that the predetermined time is changed based on the time the first control unit took to recognize the voice recognition result in the response-waiting scene.

According to this configuration, the predetermined time is changed based on the time taken from the transition to the response-waiting scene until the voice recognition result is transmitted, so the predetermined time can be brought to a length appropriate for the user. As noted above, the appropriate predetermined time differs from one response-waiting scene to another. A dialogue scenario contains a plurality of response-waiting scenes, and these can be classified into several types, for example scenes in which the user can answer immediately and scenes in which the user must think before answering. By changing the count value of the predetermined time for each type of response-waiting scene based on the time the first control unit took to recognize the voice recognition result, the behavior of the device can be better suited to the user.
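The per-type adjustment described in Application Example 4 could be sketched as follows. The publication does not give an update formula, so the exponential averaging, the `weight` and `margin` parameters, the class name `AdaptiveWait`, and the scene-type labels are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveWait:
    """Keeps one waiting time per type of response-waiting scene (hypothetical sketch)."""

    initial_wait_s: dict          # starting waiting time per scene type, in seconds
    weight: float = 0.5           # assumed smoothing weight for new observations
    margin: float = 2.0           # assumed safety factor applied to the average
    _avg: dict = field(default_factory=dict, init=False)

    def wait_for(self, kind: str) -> float:
        """Waiting time to use for the next scene of this type."""
        if kind in self._avg:
            return self.margin * self._avg[kind]
        return self.initial_wait_s[kind]

    def record(self, kind: str, response_time_s: float) -> None:
        """Update the average with the time the user actually took to respond."""
        previous = self._avg.get(kind, response_time_s)
        self._avg[kind] = (1 - self.weight) * previous + self.weight * response_time_s


waits = AdaptiveWait({"immediate": 5.0, "thinking": 20.0})
waits.record("immediate", 2.0)   # user answered after 2 s
waits.record("immediate", 4.0)   # user answered after 4 s
```

Keeping a separate average per scene type means that fast answers in "immediate" scenes do not shorten the waiting time of "thinking" scenes, which is the situation the text warns about.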

[Application Example 5]
In the voice processing device according to the above application example, it is preferable that the first control unit stores a history of the voice recognition results in the first storage unit in association with the dialogue scenario.

According to this configuration, the first control unit stores the voice recognition results in the first storage unit in association with the dialogue scenario, so the voice recognition results can be used as a history of the dialogue scenario. When the user is a patient, the history can also be used in later treatment.

[Application Example 6]
In the voice processing device according to the above application example, it is preferable that a plurality of dialogue scenarios with different contents exist and that the person conducting the dialogue can select which dialogue scenario to use.

According to this configuration, preparing a plurality of dialogue scenarios and allowing the user to select which one to use avoids the user repeatedly using the same dialogue scenario. The plurality of dialogue scenarios may be stored in the first storage unit from the beginning, or may be newly stored in the first storage unit by way of an external storage device, a network, or the like. In either case, having a plurality of dialogue scenarios keeps the user from tiring of the device.

FIG. 1 is a schematic block diagram of the voice processing device.
FIG. 2 is a part of the flowchart of processing in the first embodiment.
FIG. 3 is a part of the flowchart of processing in the first embodiment.
FIG. 4 is an example chart of a voice scenario.
FIG. 5 is an example chart of a voice scenario.
FIG. 6 is a schematic block diagram of one form of the voice processing device.

Embodiments of the present invention are described below with reference to the drawings. For convenience, the drawings used in describing the embodiments omit or simplify parts that are not needed for the description. In the following description, binary data is written with a trailing b and hexadecimal data with a trailing h.

(First embodiment)
FIG. 1 shows a schematic block diagram of the voice processing device 10 of this embodiment. The voice processing device 10 has the function of a dialogue bot and is connected to a microphone 21 and a speaker 22. The user converses with the voice processing device 10 by listening to the voice output from the speaker 22 and speaking into the microphone 21. In this embodiment, the microphone 21 and the speaker 22 are assumed to be properly interfaced with the voice processing device 10.

The voice processing device 10 includes a voice recognition unit 11, a first control unit 12, a first storage unit 13, and a voice output unit 14. The first control unit 12 has a measurement counter 31. The first storage unit 13 stores dialogue scenarios, voice data for synthesizing the voice output from the speaker 22, data needed for voice recognition (voice feature data), and the like. A plurality of dialogue scenarios are assumed to be prepared.

The voice recognition unit 11 processes the voice captured by the voice processing device 10 and converts it into a predetermined symbol string. Although not shown, the voice recognition unit 11 includes an AD converter that converts the analog signal from the microphone 21 into a digital signal. The voice recognition unit 11 extracts meaningful phrases from the symbol string of the digitized voice and passes them to the first control unit 12. The voice feature data stored in the first storage unit 13 is used to extract the meaningful phrases.

The voice output unit 14 synthesizes the voice data defined in the dialogue scenario in accordance with instructions from the first control unit 12 and outputs the voice through the speaker 22.

The first control unit 12 controls the voice recognition unit 11 and the voice output unit 14 based on the dialogue scenario stored in the first storage unit 13. When the voice processing device 10 is powered on by operating means (not shown) and the user instructs the start of a dialogue, the voice processing device 10 refers to the dialogue scenario in the first storage unit 13 and advances the dialogue scenario by having the voice recognition unit 11 and the voice output unit 14 perform the necessary processing.

FIG. 2 shows part of the processing in the voice processing device 10 as flowchart 100. When the user powers on the voice processing device 10, a power-on reset is executed as a hardware operation. The necessary initial settings are then made in the voice processing device 10 (S001). The initial settings configure the operation modes and the like that the voice processing device 10 needs to operate properly, placing the first control unit 12, the voice recognition unit 11, and the voice output unit 14 in a state suitable for performing their respective functions.

Next, the user interface (UI) is started so that the user can operate the voice processing device 10 (S002). The voice processing device 10 then waits for an instruction from the user (S003). In this embodiment, for convenience of explanation, user operations are limited to ending the processing of the voice processing device 10 or selecting a dialogue scenario. The initial settings and the UI startup may be performed by a component of the voice processing device 10 that is not shown in FIG. 1, or by the first control unit 12.

When the user performs an operation, it is first determined whether the operation is an end instruction (S004). If it is, predetermined end processing is performed and the processing of the voice processing device 10 ends. If it is not, the user has selected a dialogue scenario, so scenario selection processing is performed (S005). The first control unit 12 then reads the data for the relevant scene of the selected dialogue scenario (here, the start scene) and performs the processing needed to advance the dialogue scenario (S006, S007, S009, and so on). If the scene is one in which voice is output, voice output processing is executed (S008), it is determined whether the scenario has ended (S015), and if it has not, processing returns to scenario progress processing (S006) to execute the next scene. If the scene is one in which voice is input, processing proceeds to voice recognition processing (S010 to S014); otherwise, it is determined whether the scenario has ended (S015), and if it has not, processing returns to scenario progress processing (S006). When the dialogue scenario is determined to have ended (S015), processing returns to the state of waiting for a user instruction (S003).
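The scene dispatch in S006 to S015 can be illustrated with a minimal sketch. It assumes a simplified, strictly linear scenario represented as a list of `(kind, payload)` tuples; the names `run_scenario`, `speak`, and `listen` and the kind labels are hypothetical and do not come from the publication.

```python
def run_scenario(scenes, speak, listen):
    """Advance through a scenario, dispatching each scene by its kind.

    scenes: list of (kind, payload) tuples; kind is "output", "input", or anything else
    speak:  callable standing in for voice output processing (S008)
    listen: callable standing in for voice recognition processing (S010 to S014)
    """
    for kind, payload in scenes:      # S006: move to the next scene
        if kind == "output":          # S007: is this a voice output scene?
            speak(payload)            # S008
        elif kind == "input":         # S009: is this a voice input scene?
            listen(payload)           # S010 to S014
        # S015: the loop ends when no scenes remain


log = []
run_scenario(
    [("output", "Hello"), ("input", ["yes", "no"]), ("output", "Goodbye")],
    speak=lambda phrase: log.append(("say", phrase)),
    listen=lambda phrases: log.append(("hear", tuple(phrases))),
)
```

After the loop finishes, control would return to waiting for a user instruction (S003); that outer loop is omitted here.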

Voice recognition processing in the first control unit 12 (S010 to S014) is executed as follows. First, the first control unit 12 turns on the voice recognition unit 11, sets a predetermined value in the measurement counter 31, which monitors the response-waiting time, and starts the measurement counter 31 (S010). Next, interrupts from the voice recognition unit 11 and the measurement counter 31 are enabled (S011). The first control unit 12 then monitors the processing flag, which is turned on in the interrupt processing of the voice recognition unit 11 (S012), and the count-up flag, which is turned on in the interrupt processing of the measurement counter 31 (S013).

If the processing flag is on, the user has responded and voice recognition processing has completed normally. In this case, it is determined whether the scenario has ended (S015), and if it has not, processing returns to scenario progress processing (S006). If the count-up flag is on, the user did not respond within the time set as the response-waiting time. In this case, warning processing (S014) is performed to prompt the user to respond. Because the interrupt processing of the measurement counter 31 ends with interrupts disabled, interrupts are then enabled again (S011) and the monitoring described above (S012, S013) is repeated.

In the normal progress of a dialogue scenario, the interrupt monitoring state (S012, S013) is exited when the processing flag is determined to be on (S012). This indicates that the user responded and that the content of the response was recognized before the predetermined time elapsed. In this case, processing proceeds to S015 of flowchart 100 to advance the dialogue scenario.

The abnormal exit from the interrupt monitoring state (S012, S013) occurs when no voice recognition result for a user response is obtained even after the predetermined time measured by the measurement counter 31 has elapsed. This is the case in which the count-up flag has been turned on in the interrupt processing of the measurement counter 31. The flag is determined to be on in S013 of flowchart 100, and processing proceeds to warning processing (S014), which is described later. Because warning processing (S014) is performed with interrupts masked, processing then proceeds to enabling interrupts (S011), returns to the state of waiting for the user's response, and repeats the monitoring described above (S012, S013). As long as no voice recognition result for a user response is obtained, the monitoring and the warning processing are repeated.
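The monitoring loop of S010 to S014 can be simulated in software as a rough sketch. Here the two interrupts are replaced by polling once per simulated clock tick, the counter is modeled as a down-counter, and `max_ticks` is added only so the simulation terminates; the function name and all parameters are hypothetical.

```python
from typing import Optional, Tuple


def run_response_wait(counter_value: int,
                      reply_tick: Optional[int],
                      max_ticks: int = 1000) -> Tuple[int, Optional[int]]:
    """Simulate waiting for a reply; return (warnings issued, tick of the reply or None)."""
    remaining = counter_value             # S010: load and start the measurement counter
    warnings = 0
    for tick in range(1, max_ticks + 1):
        processing_flag = reply_tick is not None and tick == reply_tick
        if processing_flag:               # S012: a reply was recognized
            return warnings, tick         # continue to S015
        remaining -= 1
        count_up_flag = remaining == 0
        if count_up_flag:                 # S013: the waiting time has elapsed
            warnings += 1                 # S014: utter the warning (S402)
            remaining = counter_value     # S403: set the count value again
    return warnings, None                 # simulation bound reached without a reply
```

With a count value of 10 ticks and a reply arriving at tick 25, the sketch issues warnings at ticks 10 and 20 and then exits through the normal path when the reply is recognized.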

An interrupt from the voice recognition unit 11 occurs when the voice recognition unit 11 recognizes a predetermined phrase that it is to recognize. When this interrupt occurs, the first control unit 12 determines that the user has responded and that the response was recognized normally.

FIG. 3(a) shows a schematic flowchart of the interrupt processing of the voice recognition unit 11. On entering the interrupt processing, interrupts are first masked to prevent nested interrupts (S101). Next, the first control unit 12 checks the voice recognition result of the voice recognition unit 11 (S102) and saves it in a predetermined area of the first storage unit 13 (S103). The recognition result is stored in association with the scene number of the dialogue scenario (described later), and after the dialogue scenario ends it can be used as a history by recombining the dialogue scenario with the recognition results. The processing flag indicating the end of voice recognition processing is then turned on (S104), and the interrupt processing of the voice recognition unit 11 ends. As described above, when the processing flag is determined to be on in S012 of flowchart 100, processing leaves the interrupt monitoring loop formed by S012 and S013 and proceeds to the determination of whether the dialogue scenario has ended (S015).

Next, the interrupt processing of the measurement counter 31 is described. An interrupt from the measurement counter 31 occurs when the measurement counter 31 counts up. This interrupt means that, in a response-waiting scene, the user has not responded even after the predetermined time has elapsed, which suggests that the user is dozing. FIG. 3(b) shows a schematic flowchart of the interrupt processing of the measurement counter 31. On entering this interrupt processing, interrupts are first masked to prevent nested interrupts (S201). Next, the count-up flag is turned on (S202), and the interrupt processing ends. As described above, S013 of flowchart 100 monitors whether the count-up flag is on. If it is on, processing proceeds to warning processing (S014).

FIG. 3(d) shows a schematic flowchart of the warning processing. The first control unit 12 shifts the scene of the dialogue scenario to the warning scene (S401), which is described later. Next, it instructs the voice output unit 14 to play the warning voice set for the warning scene (S402), sets the count value in the measurement counter 31 again (S403), and ends the warning processing.

When the first control unit 12 instructs the voice recognition unit 11 to start operating (S010), the voice recognition unit 11 begins analyzing the sound input from the microphone 21. FIG. 3(c) shows a schematic flowchart of the processing of the voice recognition unit 11. Sound from the microphone 21 is converted into digital data by the AD converter of the voice recognition unit 11, and the digital data is processed within the voice recognition unit 11 as a predetermined symbol string. The symbol string is acquired as recognition data (S301), and the recognition data is analyzed by comparing its features with the voice feature data stored in the first storage unit 13 (S302). Next, it is determined whether the recognition data contains an authentication phrase corresponding to the scene of the dialogue scenario (S303). If it does not, acquisition of recognition data continues (S301). If an authentication phrase is extracted, it is held in a predetermined register or the like (S304), an interrupt signal is generated (S305), and the voice recognition processing ends. In the interrupt processing of the voice recognition unit 11 (FIG. 3(a)), the first control unit 12 acquires the authentication phrase held in S304 of the voice recognition processing (FIG. 3(c)) as the recognition result (S102, S103). Authentication phrases are described later.
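The loop of S301 to S305 can be illustrated with a simplified sketch. It assumes that the symbol strings are already plain text, that each authentication phrase is a single word, and that matching is a whole-word comparison; the publication does not specify the matching method, and the function name `recognize` is hypothetical.

```python
from typing import Iterable, Optional


def recognize(symbol_strings: Iterable[str],
              authentication_phrases: Iterable[str]) -> Optional[str]:
    """Return the first authentication phrase found in the incoming recognition data."""
    phrases = list(authentication_phrases)
    for recognition_data in symbol_strings:       # S301: acquire recognition data
        words = recognition_data.lower().split()  # S302: analyze (here, split into words)
        for phrase in phrases:                    # S303: does it contain a scene phrase?
            if phrase.lower() in words:
                return phrase                     # S304: hold the phrase; S305 would raise the interrupt
    return None                                   # input ended without a match


result = recognize(["um", "well yes please"], ["yes", "no"])
```

In the device, a return value other than `None` corresponds to the point where the interrupt signal is generated and the first control unit 12 picks up the held phrase in S102 and S103.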

Next, the dialogue scenario is described. A dialogue scenario can be regarded as pieces of scene information, each describing what is executed in a scene, linked in the order in which the conversation progresses. As an example, FIGS. 4 and 5 show one form of dialogue scenario. This embodiment defines three types of scene information. The first is the scene information (hereinafter, first scene information) for a scene in which the voice processing device 10 speaks to the user (hereinafter, first scene). In FIGS. 4 and 5, the first scene information is indicated by DS001, DS003, and DS101. The second is the scene information (hereinafter, second scene information) for a scene in which the user responds, that is, a scene in which the voice processing device 10 performs voice recognition (hereinafter, second scene). In FIGS. 4 and 5, the second scene information is indicated by DS002. The third is the scene information (hereinafter, third scene information) for the scene that follows once the predetermined time has elapsed while the voice processing device 10 is waiting for the user's response (hereinafter, third scene). In FIG. 5, the third scene information is indicated by DS002W. The warning scene mentioned above is the third scene.
Each type of scene information is described below. The specific data format of each type of scene information is not addressed here.

The first scene information includes a scene number (scene No.), a voice output flag, a voice recognition flag, and a voice phrase. Scene Nos. in the first and second scene information are assigned by a common rule and consist of a letter indicating the type of dialogue scenario followed by consecutive numbers. In this embodiment, for convenience, a scene No. consists of one letter and four digits. Basically, the scenes of a dialogue scenario proceed in the order in which the number increases by 1 under the same letter. That is, after the scene with scene No. A0100 is executed, the scene with scene No. A0101 is basically executed next.
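The default progression rule (same letter, number plus 1) can be illustrated with a short sketch. The function name `next_scene_no` and the error handling are assumptions made for illustration; the description only fixes the one-letter, four-digit layout.

```python
import re

# One letter followed by exactly four digits, as in "A0100".
SCENE_NO = re.compile(r"^([A-Z])(\d{4})$")


def next_scene_no(scene_no: str) -> str:
    """Return the scene No. that follows by default (same letter, number + 1)."""
    match = SCENE_NO.match(scene_no)
    if match is None:
        raise ValueError(f"not a scene No.: {scene_no!r}")
    letter, digits = match.groups()
    number = int(digits) + 1
    if number > 9999:
        raise ValueError("scene number would exceed four digits")
    return f"{letter}{number:04d}"
```

Branches selected by an authentication phrase override this default.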

The voice output flag is a flag that instructs voice output. In this embodiment, it is defined as the 2-bit code “01b”. Input voice is analyzed by the voice recognition unit 11. The voice phrase defines the voice to be output.

The second scene information includes a scene No., a voice output flag, a voice recognition flag, a counter setting value, and authentication phrase / scene No. information. The scene No., voice output flag, and voice recognition flag are defined as in the first scene information. The counter setting value is the count value used to measure the response waiting time and is set in the measurement counter 31. When instructed to start counting (S010), the measurement counter 31 counts on a predetermined clock. The count may either increment or decrement, and the counter setting value may be determined in view of the frequency of that clock and how the measurement counter 31 is used. Count-up of the measurement counter 31 may be detected by a carry or borrow, or a comparison register may be provided and count-up defined as a match with its value. As described above, the count-up flag is turned on in the interrupt process triggered by count-up of the measurement counter 31, and when the flag is found to be on in S013 of flowchart 100, the warning process (S014) is executed.
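As a rough illustration of how a counter setting value relates to the clock, and of the comparison-register variant of count-up detection, consider the sketch below. The names `counter_set_value` and `CompareCounter` are hypothetical, and any concrete clock frequency is an assumption, since the description gives none.

```python
def counter_set_value(wait_seconds: float, clock_hz: float) -> int:
    """Number of clock ticks that corresponds to the desired waiting time."""
    return round(wait_seconds * clock_hz)


class CompareCounter:
    """Incrementing counter whose count-up is a match with a compare register."""

    def __init__(self, compare_value: int) -> None:
        self.value = 0
        self.compare_value = compare_value
        self.count_up_flag = False

    def tick(self) -> None:
        """Advance by one clock; set the flag when the compare value is reached."""
        if self.count_up_flag:
            return                      # already counted up; nothing more to do
        self.value += 1
        if self.value == self.compare_value:
            self.count_up_flag = True   # what the count-up interrupt would set
```

A decrementing counter that signals count-up by a borrow would serve equally well; only the direction of the count and the detection condition change.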

The authentication phrase / scene No. information lists the authentication phrases that are treated as valid in the scene, together with the scene No. of the scene to be executed next when each phrase is recognized. For example, DS002 in FIG. 4 defines that when “yes” (はい) is recognized as the voice recognition result of the voice recognition unit 11, the scene No. of the next scene is A0102. Similarly, DS002 defines that when the voice recognition result of the voice recognition unit 11 is “no” (だめ), the scene No. of the next scene is B0100.

The third scene information includes a scene No., a voice output flag, a voice recognition flag, a counter setting value, a voice phrase, and authentication phrase / scene No. information. The third scene information thus carries both the items of the first scene information and the items of the second scene information. Except for the scene No., the items are defined as described for the first and second scene information. The scene No. of the third scene information is defined as the scene No. of the response waiting scene with W appended. As described above, the third scene information is the scene used in the warning process (S014). After a warning is issued to the user, the user's reply must be recognized just as in the immediately preceding second scene. Because the scene both utters a voice and recognizes a reply, it carries the items of both the first and the second scene information. The authentication phrase / scene No. information is the same as in the immediately preceding second scene.
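Since the description leaves the concrete data format open, the following is only one possible representation, sketched to show how a third scene can be derived from the second scene it belongs to. The class `SceneInfo`, the function `make_warning_scene`, the use of 0 for an unset flag, and the counter values are all assumptions; the flag codes “01b” and “10b” and the scene Nos. come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

VOICE_OUTPUT = 0b01        # "01b" in the description
VOICE_RECOGNITION = 0b10   # "10b" in the description


@dataclass(frozen=True)
class SceneInfo:
    scene_no: str
    voice_output_flag: int = 0            # assumption: 0 means "not set"
    voice_recognition_flag: int = 0
    voice_phrase: Optional[str] = None
    counter_set_value: Optional[int] = None
    next_scene_by_phrase: Dict[str, str] = field(default_factory=dict)


def make_warning_scene(waiting: SceneInfo, warning_phrase: str,
                       counter_set_value: int) -> SceneInfo:
    """Build the third scene: voice output plus the waiting scene's recognition items."""
    return SceneInfo(
        scene_no=waiting.scene_no + "W",
        voice_output_flag=VOICE_OUTPUT,
        voice_recognition_flag=VOICE_RECOGNITION,
        voice_phrase=warning_phrase,
        counter_set_value=counter_set_value,
        next_scene_by_phrase=dict(waiting.next_scene_by_phrase),
    )


# DS002 as described for FIG. 4; the counter value 1000 is a placeholder.
ds002 = SceneInfo("A0101", voice_recognition_flag=VOICE_RECOGNITION,
                  counter_set_value=1000,
                  next_scene_by_phrase={"はい": "A0102", "だめ": "B0100"})
ds002w = make_warning_scene(ds002, "起きてるぅ〜", 1000)
```

Copying the phrase table from the waiting scene reflects the statement that the authentication phrase / scene No. information is the same as in the preceding second scene.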

The processing flow of the dialogue scenario from DS001 in FIG. 4 onward is as follows.

First, in S006 of FIG. 2, the first control unit 12 reads the information of DS001 in FIG. 4 and interprets the scene information. Because the voice output flag is set (S007 in FIG. 2), the first control unit 12 instructs the voice output unit 14 to output the voice data “Let's talk about the old days” (昔の話をしましょう) given in the voice phrase (part of S008 in FIG. 2). The voice output unit 14 outputs the voice data as instructed by the first control unit 12 (part of S008 in FIG. 2).

Because the dialogue scenario contains scene information DS002 following DS001 (S015 in FIG. 2), the first control unit 12 reads the information of DS002 and interprets the scene information (S006 in FIG. 2). DS002 has no voice output flag set (S007 in FIG. 2) but has the voice recognition flag set (S009 in FIG. 2), so the first control unit 12 starts the voice recognition unit 11, sets the counter setting value in the measurement counter 31, and starts the measurement counter (S010 in FIG. 2). The first control unit 12 then enables interrupts from the voice recognition unit 11 and the measurement counter 31 (S011 in FIG. 2) and waits, checking the flags that are set by the interrupt processes (S012 and S013 in FIG. 2). This waiting state is also part of the execution of the second scene (scene No. A0101).

Suppose that, in the second scene (scene No. A0101), the user replies “I don't want to talk about the old days” (昔の話はいやだな) before the measurement counter 31 counts up. The voice recognition unit 11 analyzes the recognition data of the user's reply (S301 and S302 in FIG. 3(c)) and checks whether it contains any authentication phrase defined in DS002. If it does (S303 in FIG. 3(c)), the matching authentication phrase is extracted from the recognition data (S304 in FIG. 3(c)) and held in a predetermined location accessible to the first control unit 12. For this reply, the phrase “no” (いや) is extracted. The voice recognition unit 11 then generates an interrupt and ends its processing.

The interrupt from the voice recognition unit 11 causes the voice recognition interrupt process (FIG. 3(a)) to run, and the processing flag is turned on (S104 in FIG. 3(a)). When the processing flag is found to be on (S012 in FIG. 2), it is checked whether the scene just executed is the final scene (S015 in FIG. 2), and processing moves on to the next scene (S006 in FIG. 2). In this dialogue scenario, the phrase recognized in the scene with scene No. A0101 was “no” (いや), so the next scene is the one with scene No. B0100, as described in DS002 of FIG. 4. The first control unit 12 therefore executes the first scene shown as DS101 in FIG. 4.

Next, the case is described in which, in the second scene (scene No. A0101), the count-up interrupt of the measurement counter 31 occurs before the voice recognition unit 11 extracts an authentication phrase. In this case, the count-up flag is turned on in the measurement counter interrupt process (FIG. 3(b)) (S202 in FIG. 3(b)). Because the count-up flag is on (S013 in FIG. 2), the warning process (S014 in FIG. 2) is executed. Since this is the warning process for the second scene with scene No. A0101, the scene No. of the third scene to be executed is A0101W (DS002W in FIG. 5).

Because the voice output flag of DS002W is “01b”, the first control unit 12 instructs the voice output unit 14 to output the voice phrase “Are you awake?” (起きてるぅ〜) defined in DS002W (S402 in FIG. 3(d)). The first control unit 12 then sets the counter setting value defined in DS002W in the measurement counter 31 (S403 in FIG. 3(d)) and ends the warning process. Because the scene with scene No. A0101W is understood as an extension of the scene with scene No. A0101, its voice recognition flag is set to “10b”.
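The interplay of the waiting state (S012, S013) and the warning process (S014, S402, S403) can be simulated in a few lines. This is a simplified sketch, not the flowchart itself: the function `wait_for_reply` is hypothetical, interrupts are replaced by one list entry per clock tick, and repeating the same warning on every further timeout is an assumption, because the description does not say what happens if the third scene also times out.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def wait_for_reply(
    phrases: Dict[str, str],          # authentication phrase -> next scene No.
    counter_set_value: int,           # ticks to wait in the second scene
    warning_phrase: str,              # voice phrase of the third scene
    warning_set_value: int,           # ticks to wait after a warning
    heard: Iterable[Optional[str]],   # per tick: recognized phrase or None
) -> Tuple[Optional[str], List[str]]:
    """Return (next scene No., warnings spoken) for one response waiting scene."""
    spoken: List[str] = []
    remaining = counter_set_value
    for phrase in heard:
        if phrase in phrases:               # S012: processing flag is on
            return phrases[phrase], spoken
        remaining -= 1
        if remaining == 0:                  # S013: count-up flag is on
            spoken.append(warning_phrase)   # S014 / S402: utter the warning
            remaining = warning_set_value   # S403: reload the counter
    return None, spoken                     # input ended without a reply
```

With a waiting time of three ticks and a reply on the fifth tick, the sketch issues one warning and then follows the branch for the recognized phrase.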

If there is no reply even after the predetermined time has elapsed, the warning process is executed as described above and the voice phrase defined in the third scene information is output, calling the user's attention. As a result, if the user is dozing or about to doze off, the device can be expected to wake the user.

Further, by making the predetermined time in the first response waiting scene of a dialogue scenario the maximum time that the measurement counter can measure, the likelihood that the predetermined time elapses in that scene can be reduced. In the first response waiting scene after a dialogue scenario starts, the user is less likely than in other scenes to be dozing or bored, and can be expected to respond at a time that suits them. Therefore, even if the predetermined time is set to the maximum value measurable by the measurement counter, the counter is unlikely to count up, so the setting can be judged to have no substantial effect on the progress of the dialogue scenario, and an unwarranted warning can be avoided.
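How long "the maximum time that the measurement counter can measure" is depends on the counter width and the clock, neither of which the description specifies. As a hedged illustration, assuming an unsigned counter of `counter_bits` bits clocked at `clock_hz` (both hypothetical parameters):

```python
def max_wait_seconds(counter_bits: int, clock_hz: float) -> float:
    """Longest waiting time an unsigned counter of the given width can measure."""
    max_count = (1 << counter_bits) - 1   # largest value the counter can hold
    return max_count / clock_hz
```

For the first response waiting scene, the counter setting value would then simply be that largest count.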

The time needed to answer the same question from the voice processing device 10 varies from person to person. Therefore, by changing the value set in the measurement counter 31 as the dialogue scenario progresses, the time until timeout can be adapted to the user. As described above, the response waiting time is set for each second scene. If the second scenes are classified into several types, then for a given type, the average of the counter setting value used in a second scene of that type and the time actually required for voice recognition in that scene can be used as the counter setting value for the next second scene of that type. This makes the response waiting time for second scenes of that type better suited to the user.
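The averaging rule can be written down directly. In this sketch the function name `updated_set_value` is hypothetical, the time required for recognition is assumed to be expressed in the same clock ticks as the counter setting value, and integer division is an arbitrary rounding choice.

```python
def updated_set_value(current_set_value: int, measured_ticks: int) -> int:
    """Average of the setting used and the time the reply actually took."""
    return (current_set_value + measured_ticks) // 2
```

Applied repeatedly, the setting drifts toward the user's typical response time: starting from 1000 ticks, two replies that each take 400 ticks give 700 and then 550.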

The embodiment of the present invention has been described above, but the present invention is not limited to that embodiment. For example, scene information of a dialogue scenario can be defined in various ways. The voice processing device 10 may also be configured as a microcomputer (voice processing device 50), as shown in FIG. 6. The voice processing device 50 includes a CPU unit, a reset control unit (Reset), a clock generation unit (Clock), a system bus 60, a work memory unit, a timer unit 51, a peripheral unit 52, a storage unit 53, an analog-to-digital converter (A/D conversion), and a digital-to-analog converter (D/A conversion). The present invention is not limited to the application examples and embodiment described above and can be applied widely without departing from its spirit.

DESCRIPTION OF SYMBOLS: 10 ... voice processing device, 11 ... voice recognition unit, 12 ... first control unit, 13 ... first storage unit, 14 ... voice output unit, 21 ... microphone, 22 ... speaker, 31 ... measurement counter, 50 ... voice processing device, 51 ... timer unit, 52 ... peripheral unit, 53 ... storage unit, 60 ... system bus, 100 ... flowchart.

Claims

1. A voice processing device comprising:
a voice output unit;
a voice recognition unit;
a first control unit; and
a first storage unit storing a dialogue scenario and voice output data,
wherein the first control unit causes the voice output unit to utter a voice using the voice output data based on the dialogue scenario, and advances the dialogue scenario based on a voice recognition result from the voice recognition unit, and
wherein, in a response waiting scene of the dialogue scenario, when the voice recognition result is not transmitted to the first control unit even after a predetermined time has elapsed, the first control unit causes the voice output unit to utter a voice indicating a warning.

2. The voice processing device according to claim 1, wherein the first control unit has a measurement counter for measuring the predetermined time, and a count value for measuring the predetermined time is set for each response waiting scene.

3. The voice processing device according to claim 1, wherein, in the first response waiting scene of the dialogue scenario, the maximum value that the measurement counter can measure is set as the count value for the predetermined time.

4. The voice processing device according to claim 1, wherein the predetermined time is changed based on the time required for the first control unit to recognize the voice recognition result in the response waiting scene.

5. The voice processing device according to claim 1, wherein the first control unit stores a history of the voice recognition result in the first storage unit in association with the dialogue scenario.

6. The voice processing device according to claim 1, wherein a plurality of dialogue scenarios with different contents exist, and the person conversing can select which dialogue scenario to use.