JP2003036094A

JP2003036094A - Device for speech dialogue and method for processing speech dialogue

Info

Publication number: JP2003036094A
Application number: JP2001221080A
Authority: JP
Inventors: Masaki Matsudaira; 正樹松平; Mayumi Harada; 真弓原田; Shinji Hayakawa; 慎司早川
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2001-07-23
Filing date: 2001-07-23
Publication date: 2003-02-07

Abstract

PROBLEM TO BE SOLVED: To perform speech dialogue processing efficiently and with high reliability by making a branch judgment on the recognition result for input speech and determining the post-processing accordingly. SOLUTION: In a speech dialogue device 100 provided with a speech dialogue processing part 105 connected to a user's telephone set 101 through a communication line 103, a plurality of candidates for determining the speech input from the telephone set 101, together with a likelihood for each candidate, are determined as the recognition result during speech dialogue processing. A branch judgment is then made by comparing the likelihood with a reference value set in advance for that purpose, and this judgment determines how the recognition result is processed.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a telephone-based voice dialogue device and a voice dialogue processing method. According to the invention, voice dialogue processing is performed in a device that has a voice recognition function, which recognizes a user's voice and converts it into data, and a voice recording function, which records the user's voice.

[0002]

2. Description of the Related Art An example of a conventional voice dialogue device is disclosed in Japanese Patent Laid-Open No. 10-70613. According to this document, the user provides input to the voice dialogue device from a telephone by voice or other means. In the voice dialogue device, the information input from the telephone is divided into items for which voice recognition is considered easy and items for which it is considered difficult.

[0003] For items for which voice recognition is considered easy, the voice dialogue device performs voice recognition processing. For items for which voice recognition is considered difficult, on the other hand, the voice dialogue device first records the voice, and the recording is then converted into data offline by a person (called "transcription" in the above document). Specifically, this offline, manual conversion into data proceeds as follows.

[0004] First, a guidance message is sent from the voice dialogue device to the user's telephone. Following this guidance message, the user speaks an address, family name, given name, and so on. The user's speech is input to the voice dialogue device via the telephone, and the input speech is first recorded in the voice dialogue device.

[0005] The voice dialogue device then performs voice recognition on the recorded speech. Based on the recognized speech, a plurality of candidates are determined for the address, name, and so on. These candidates are displayed on, for example, the screen of a computer terminal.

[0006] An operator (a person) performs the transcription work on the candidates displayed on the screen of the computer terminal or the like. Information such as the address and name is then fixed from among the displayed candidates.

[0007] According to the above document, the transcription work is carried out as follows. The recorded user's voice is played back by suitable means. While listening to the playback, the operator (a person) checks the data against the candidates displayed on the screen of the computer terminal or the like.

[0008]

Problems to Be Solved by the Invention In the conventional voice dialogue device, the operator performed the transcription work on all speech input for items for which voice recognition is considered difficult, even when the voice recognition device had recognized that speech correctly.

[0009] Furthermore, when a user made a voice input for an item for which voice recognition is considered easy and the voice recognition device failed to recognize it the first time, the user had to repeat the input until the device recognized it.

[0010] The conventional voice dialogue device had the problems described above. There has therefore been a demand for a voice dialogue device that performs voice dialogue processing, such as recognition processing, on input speech both efficiently and reliably.

[0011]

Means for Solving the Problems The inventors of the present application therefore carried out repeated studies and deliberation in order to solve the problems described above. They concluded that efficient and highly reliable voice dialogue processing can be achieved by judging whether the user's input serves as a branch condition for the next question and deciding the post-processing accordingly, and thereby arrived at the present invention.

[0012] The voice dialogue device of the invention comprises a voice dialogue processing unit connected to the user's telephone via a communication line, and performs voice dialogue processing according to the following method.

[0013] First, the voice dialogue processing unit conducts a voice dialogue with the user's telephone and receives the user's voice from the telephone. Next, the voice dialogue processing unit determines, as the recognition result, a plurality of candidates for determining the received voice and a likelihood for each candidate. The voice dialogue processing unit then compares the likelihood with a reference value and makes a branch judgment on the recognition result. The branch judgment determines how the recognition result is processed. In other words, the branch judgment in this invention is the judgment of what processing should be applied to the recognition result. The reference value is set in advance in the voice dialogue processing unit and is used for the branch judgment.
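
The comparison described in this paragraph can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the `Candidate` type, the outcome names `"accept"` and `"fallback"`, and the rule of testing only the best candidate against a single reference value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    text: str          # one hypothesis for what the user said
    likelihood: float  # recognizer score for that hypothesis


def branch_judgment(
    candidates: List[Candidate], reference_value: float
) -> Tuple[str, Optional[Candidate]]:
    """Compare the best likelihood with a preset reference value.

    The outcome names are hypothetical labels for "how to process
    the recognition result"; the source does not name them here.
    """
    if not candidates:
        return "fallback", None
    best = max(candidates, key=lambda c: c.likelihood)
    if best.likelihood >= reference_value:
        return "accept", best
    return "fallback", best


recognition_result = [Candidate("Tokyo", 0.91), Candidate("Kyoto", 0.42)]
decision, best = branch_judgment(recognition_result, reference_value=0.80)
print(decision, best.text)
```

Raising `reference_value` makes the sketch route more results to the fallback path, which corresponds to trading efficiency for reliability.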

[0014] As described above, according to the voice dialogue device and voice dialogue processing method of the invention, a branch judgment is made on the recognition result for the input speech, and this judgment determines how the recognition result is processed. Consequently, when the voice dialogue device obtains a correct recognition result for the input speech, the manual work can be omitted.

[0015] Furthermore, according to the voice dialogue device and voice dialogue processing method of the invention, making the branch judgment allows suitable processing to be determined even when the input speech is not recognized correctly. As a result, the burden on the user, such as re-input, can be reduced compared with the conventional voice dialogue device.

[0016] That is, according to the voice dialogue device and voice dialogue processing method of the invention, an accurate branch judgment is made on the recognition result based on the comparison between the likelihood and the reference value. The invention can therefore realize efficient and highly reliable voice dialogue processing.

[0017]

BEST MODE FOR CARRYING OUT THE INVENTION An embodiment of the voice dialogue device of the invention is described below with reference to the drawings. The drawings used in the following description are only schematic, drawn to the extent needed to understand the invention, and the invention is therefore not limited to the illustrated examples. In the drawings, like components are denoted by the same reference numerals, and duplicate descriptions may be omitted.

[0018] [Structure of Embodiment] The configuration of the voice dialogue device 100 according to the embodiment of the invention is described with reference to FIG. 1, which shows that configuration.

[0019] The voice dialogue device 100 comprises a voice dialogue processing unit 105 connected to the user's telephone 101 via a communication line 103.

[0020] For the voice input from the telephone 101, the voice dialogue processing unit 105 determines, as the recognition result, a plurality of candidates for determining that voice and a likelihood for each candidate. The voice dialogue processing unit 105 then compares the likelihood with the reference value and makes a branch judgment on the recognition result. This branch judgment determines how the recognition result is processed.

[0021] More precisely, the plurality of candidates in the recognition result serve to determine the characteristics of the input voice. They are determined, for example, on the basis of physical features such as the frequency spectrum extracted from the input voice.

[0022] The voice dialogue device 100 also comprises a storage unit 107, in which voice processing information 137 used for voice dialogue processing is stored in advance.

[0023] The voice processing information 137 includes a first information group and a second information group. The first information group has first information and second information, which are used to determine the recognition result. The second information group has third information, which is used for the branch judgment. The first to third information contained in the two groups are described in detail later.

[0024] Next, the configuration of the voice dialogue processing unit 105 is described.

[0025] The voice dialogue processing unit 105 comprises a line processing unit 109 connected to the telephone 101 via the communication line 103, a voice dialogue control unit 111, and a voice processing unit 113 that determines the recognition result.

[0026] The voice dialogue control unit 111 exchanges voice and other required information with the line processing unit 109, the storage unit 107, and the voice processing unit 113, and has the function of outputting the result of the branch judgment. The voice dialogue control unit 111 is provided with a basic processing unit 115, a branch judgment unit 117, and a memory 119 for storing the reference value.

[0027] The basic processing unit 115 reads the first and second information from the storage unit 107 into the memory 119. Using the first information, the basic processing unit 115 then conducts a voice dialogue with the telephone 101 via the line processing unit 109 and receives the voice from the telephone 101.

[0028] The voice processing unit 113 receives the voice from the telephone 101 and the second information from the basic processing unit 115. The voice processing unit 113 then determines the recognition result using the second information and sends this recognition result to the basic processing unit 115.

[0029] The branch judgment unit 117 receives the recognition result from the basic processing unit 115. It then reads the third information from the storage unit 107 and the reference value from the memory 119. Next, the branch judgment unit 117 makes the branch judgment based on the comparison between the likelihood and the reference value and on the third information.

[0030] In this embodiment, the reference value is preferably set with respect to the likelihood in the recognition result. Most suitably, four reference values are set: a first threshold, a second threshold, a first threshold difference, and a second threshold difference.
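
The text names the four reference values here but defers how they are compared with the likelihood to a later section. The sketch below is therefore only one plausible reading, stated as an assumption: each threshold is tested against the best likelihood, and each threshold difference against the gap between the two best likelihoods. The outcome names `"accept"`, `"confirm"`, and `"reinput"` and the tiered rule itself are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ReferenceValues:
    first_threshold: float        # names follow the source text
    second_threshold: float
    first_threshold_diff: float
    second_threshold_diff: float


def judge(likelihoods: Sequence[float], ref: ReferenceValues) -> str:
    """Hypothetical tiered judgment; NOT the rule given by the patent."""
    ranked = sorted(likelihoods, reverse=True)
    if not ranked:
        return "reinput"
    top = ranked[0]
    # Assumed meaning of "threshold difference": gap between the top two
    # candidates. A lone candidate is treated as having an unbounded gap.
    margin = top - ranked[1] if len(ranked) > 1 else float("inf")
    if top >= ref.first_threshold and margin >= ref.first_threshold_diff:
        return "accept"
    if top >= ref.second_threshold and margin >= ref.second_threshold_diff:
        return "confirm"
    return "reinput"


ref = ReferenceValues(0.8, 0.5, 0.2, 0.1)
print(judge([0.9, 0.3], ref))
print(judge([0.6, 0.3], ref))
print(judge([0.9, 0.85], ref))
```

Keeping the four values in one immutable object makes it easy to swap in a different set when the recognizer changes, since the text notes that suitable values depend on the recognizer's configuration.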

[0031] The voice processing unit 113 preferably consists of a voice recognition unit 121 and a voice recording unit 123.

[0032] The voice recognition unit 121 has the function of receiving the voice from the telephone 101, performing voice recognition processing on it, and outputting the recognition result. The likelihood in the recognition result is therefore a value that depends on the device configuration of the voice recognition unit 121. The reference value set for the likelihood likewise depends on the device configuration of the voice recognition unit 121.

[0033] The voice recording unit 123, on the other hand, has the function of receiving the voice from the telephone 101, recording it, and creating a voice recording file.

[0034] As described above, in the device configuration of the voice dialogue processing unit 105 of this embodiment, the voice recognition unit 121 performs voice recognition on the voice input to the voice dialogue device and determines the recognition result. By comparing the likelihood in this recognition result with the reference value, the branch judgment unit 117 can evaluate the recognition performance of the voice recognition. The comparison between the likelihood and the reference value in this embodiment is described in detail later.

[0035] Each component of the voice dialogue processing unit 105 can have any device configuration suited to voice dialogue processing. In that case, the reference value can be set to a desired value that matches the device configuration of the voice recognition unit 121.

[0036] The voice dialogue device 100 of this embodiment is most suitably a computer device made up of a plurality of hardware components. Furthermore, each component of the voice dialogue processing unit 105 is desirably implemented as hardware provided with a program.

[0037] Next, the configuration of the storage unit 107 is described. The storage unit 107 stores the voice processing information 137 and is provided with a database unit 135.

[0038] The voice processing information 137 comprises a plurality of files. These files include a voice dialogue sequence 125, voice messages 127, a confirmation recognition grammar 129, a confirmation voice message 131, and a re-input message 133.

[0039] The voice dialogue sequence 125 and the voice messages 127 are preferably created by the organization that uses the voice dialogue device 100 to provide a voice dialogue service. The voice messages 127 comprise a plurality of files containing recorded spoken messages. The voice dialogue sequence 125 is described in detail later.

[0040] The confirmation recognition grammar 129 is used to recognize the voice when "yes" or "no" is input. That is, the confirmation recognition grammar 129 is desirably a file that describes a grammar for recognizing an input voice of "yes" or "no".

[0041] The confirmation voice message 131 is preferably a file containing a recorded voice message that asks the user to confirm the recognition result.

[0042] The re-input message 133 is most suitably a file containing a recorded voice message that prompts the user of the telephone 101 to repeat the input.

[0043] The database unit 135 stores the data needed for the service provided by the voice dialogue device 100, as well as recognition results and the like.

[0044] The files 125, 127, 129, 131, and 133 of the storage unit 107 are, in practice, preferably stored on a computer disk. The files 125, 127, 129, 131, and 133 and the database unit 135 may also be stored in the same storage device. The voice dialogue device 100 of this embodiment is preferably a computer device made up of the hardware that constitutes the voice dialogue processing unit 105 described above and a storage device holding the files and the database unit.

[0045] With the configuration of the storage unit 107 described above, the voice dialogue sequence 125 and the voice messages 127 can take any suitable file structure. This embodiment can therefore offer users a variety of voice dialogue services, and can also perform voice dialogue processing suited to the content of each service.

[0046] The files stored in the storage unit 107 are preferably updated as needed.

[0047] Next, the voice dialogue sequence 125 is described with reference to FIG. 2, which shows its configuration. In this embodiment, the voice dialogue sequence 125 desirably has a plurality of dialogue cells 201 and a plurality of global variables 215.

[0048] Each dialogue cell 201 is the smallest unit of the series of dialogues carried out by the voice dialogue device 100. During voice dialogue processing, the voice dialogue control unit 111 processes the dialogue cells one after another, starting from the initial dialogue cell. The initial dialogue cell is the dialogue cell that the voice dialogue control unit 111 processes first when voice dialogue processing starts.

[0049] A dialogue cell has a first information group 219 and a second information group 217. As already explained, the first information group 219 contains the first information and the second information, and the second information group 217 contains the third information. The third information is desirably a dialogue cell mode 203 that determines how the dialogue cell is processed. In that case, the branch judgment by the branch judgment unit 117 is suitably made on the basis of the comparison between the likelihood and the reference value and of the dialogue cell mode 203. The dialogue cell mode 203 is described in detail later.

[0050] The first information group 219 preferably contains a voice file name 211 as the first information and a recognition grammar 213 as the second information.

[0051] The voice file name 211 specifies, from among the voice messages 127, the file name of the voice message to be sent to the telephone 101. As already explained, the voice messages 127 comprise a plurality of files containing recorded spoken messages. Of these files, the voice message in the file specified by the voice file name 211 is sent to the telephone 101.

[0052] In this embodiment, the user answers by voice the voice message sent to the telephone 101. The user's voice is recognized by the voice dialogue device 100 using the recognition grammar 213.

[0053] In addition to the dialogue cell mode 203, the second information group 217 most suitably contains a pre-processing program 205, a post-processing program 207, and a next-cell pointer 209.

[0054] The pre-processing program 205 describes the pre-processing required for the voice dialogue. Pre-processing includes, for example, acquiring the time and acquiring the incoming-call account from the user's telephone 101.

[0055] The post-processing program 207 describes the post-processing required for the voice dialogue, such as processing of the voice recognition result.

[0056] The next-cell pointer 209 specifies the dialogue cell to proceed to after all processing in one dialogue cell has finished.
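
The dialogue cell layout described in the preceding paragraphs can be pictured as a small record type. The sketch below is only an illustration under stated assumptions: the field names, the representation of the pre- and post-processing programs as Python callables, the use of a cell name as the next-cell pointer, and the `run_sequence` helper are all hypothetical, and recognition is replaced by canned answers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Globals = Dict[str, str]  # stands in for the shared global variables 215


@dataclass
class DialogueCell:
    # Second information group 217
    mode: int                                     # dialogue cell mode 203
    pre_process: Callable[[Globals], None]        # pre-processing program 205
    post_process: Callable[[Globals, str], None]  # post-processing program 207
    next_cell: Optional[str]                      # next-cell pointer 209
    # First information group 219
    voice_file_name: str                          # voice file name 211
    recognition_grammar: str                      # recognition grammar 213


def store_in(key: str) -> Callable[[Globals, str], None]:
    """Build a post-processing step that saves the answer under `key`."""
    def post(shared: Globals, answer: str) -> None:
        shared[key] = answer
    return post


def run_sequence(cells: Dict[str, DialogueCell], initial: str,
                 answers: Dict[str, str], shared: Globals) -> List[str]:
    """Follow next-cell pointers from the initial cell; return prompts played."""
    played: List[str] = []
    name: Optional[str] = initial
    while name is not None:
        cell = cells[name]
        cell.pre_process(shared)
        played.append(cell.voice_file_name)        # message sent to the phone
        cell.post_process(shared, answers[name])   # canned "recognized" answer
        name = cell.next_cell
    return played


def no_op(shared: Globals) -> None:
    """Pre-processing step that does nothing."""


cells = {
    "address": DialogueCell(0, no_op, store_in("address"), "name",
                            "ask_address.wav", "address.grammar"),
    "name": DialogueCell(0, no_op, store_in("name"), None,
                         "ask_name.wav", "name.grammar"),
}
shared: Globals = {}
played = run_sequence(cells, "address",
                      {"address": "Tokyo", "name": "Harada"}, shared)
print(played, shared)
```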

[0057] The dialogue cell mode 203 is preferably composed of a plurality of flags. In each dialogue cell, the dialogue cell mode 203 depends on the pre-processing program 205, the post-processing program 207, and the next-cell pointer 209.

[0058] Specifically, the flags that make up the dialogue cell mode 203 desirably indicate: (a) whether the dialogue cell includes voice recognition processing; (b) whether the dialogue cell branches to a next dialogue cell; (c) whether the post-processing program 207 assigns the recognition result to a global variable 215; (d) whether the post-processing program 207 registers the recognition result in the database unit 135; and (e) whether the post-processing program 207 increments, for the recognition result, a counter value in a global variable 215 or in the database unit 135.
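
A set of independent yes/no flags like items (a) to (e) maps naturally onto a bit-flag type. The following is a minimal sketch using Python's standard `enum.Flag`; the member names are invented for illustration, and the source does not say how the flags are actually encoded.

```python
from enum import Flag, auto


class CellMode(Flag):
    """Hypothetical encoding of the dialogue cell mode 203 flags."""
    RECOGNIZE = auto()          # (a) cell includes voice recognition
    BRANCH = auto()             # (b) cell branches to a next dialogue cell
    ASSIGN_GLOBAL = auto()      # (c) post-processing assigns to a global variable
    REGISTER_DB = auto()        # (d) post-processing registers in the database
    INCREMENT_COUNTER = auto()  # (e) post-processing increments a counter


# A cell that recognizes speech and stores the result in a global variable.
mode = CellMode.RECOGNIZE | CellMode.ASSIGN_GLOBAL

print(CellMode.RECOGNIZE in mode)
print(CellMode.BRANCH in mode)
```

Combining members with `|` and testing them with `in` keeps each flag independent, which matches the description of the mode as a sequence of separate flags.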

[0059] The processing that increments a counter value in a global variable 215 or in the database unit 135 is explained here. Suppose, for example, that the voice dialogue device 100 provides a voice dialogue service that conducts some kind of multiple-choice questionnaire. Answers to the questionnaire are then input one after another to the voice dialogue device 100 from the user's telephone 101. In a multiple-choice questionnaire, for example, the item selected by the user is input to the voice dialogue device 100, and the selected items are counted, that is, tallied, in the global variable 215 or the database unit 135.
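
The tallying in the questionnaire example amounts to incrementing one counter per selected item. A minimal sketch, assuming the recognized selections arrive as plain strings and using `collections.Counter` in place of the global variable or database counter (the `tally` helper and the item labels are hypothetical):

```python
from collections import Counter
from typing import Iterable, Optional


def tally(selections: Iterable[str],
          counts: Optional[Counter] = None) -> Counter:
    """Add each recognized selection to a running per-item count."""
    counts = Counter() if counts is None else counts
    counts.update(selections)
    return counts


# Selections recognized over several calls (illustrative data only).
counts = tally(["A", "B", "A", "C", "A"])
counts = tally(["B"], counts)
print(counts["A"], counts["B"], counts["C"])
```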

[0060] Next, the global variables 215 are described. A global variable 215 is a variable shared by the dialogue cells 201. Suppose, for example, that the voice dialogue device 100 provides a reservation service for travel or the like, and that this service conducts a dialogue asking the user for information such as address, name, and reservation date. The dialogue cells 201 then include a dialogue cell that prompts for the address, one that prompts for the name, one that prompts for the reservation date, and so on. As the user's spoken answer for each dialogue cell is input to the voice dialogue device 100, it is assigned to the corresponding global variable. Because the information assigned to the global variables is used in common by the dialogue cells, accurate values must be assigned.

[0061] [Operation of Embodiment] Next, the operation of each component in the embodiment of the invention is described with reference to FIGS. 1 and 2 and to the flowcharts shown in FIGS. 3 to 10. In the flowcharts of FIGS. 3 to 10, the operations performed by the voice dialogue control unit 111, the line processing unit 109, the voice recognition unit 121, and the voice recording unit 123 are described separately as the dialogue control process, the line control process, the voice recognition process, and the voice recording process, respectively. The operations performed by the user's telephone 101 are described under a process simply named the user's telephone. In the figures, each processing step is denoted by the symbol S followed by a number.

[0062] 1. Voice Dialogue Processing The voice dialogue processing performed in this embodiment is described with reference to the flowchart shown in FIG. 3.

[0063] First, in the configuration of this embodiment shown in FIG. 1, the service provider starts the voice dialogue control unit 111 of the voice dialogue processing unit 105 by a conventionally known start-up method. Once started, the voice dialogue control unit 111 begins voice dialogue processing.

[0064] (S301) The voice dialogue control unit 111 starts the line processing unit 109. Specifically, within the voice dialogue control unit 111, the basic processing unit 115 starts the line processing unit 109.

【００６５】（Ｓ３０２）音声対話制御部１１１におい
て、基本処理部１１５は、回線処理部１０９を介して、
回線状態の変化待ちとなる。回線状態の変化待ちとは、
即ち、電話機１０１からの着信待ちの状態である。(S302) In the voice dialogue control section 111, the basic processing section 115 receives the line processing section 109,
Waiting for the line status to change. Waiting for a change in line status
That is, it is in a state of waiting for an incoming call from the telephone 101.

【００６６】（Ｓ３０３）一方、起動された回線処理部
１０９は、通信回線１０３の状態を監視する動作を開始
する。(S303) On the other hand, the activated line processing unit 109 starts the operation of monitoring the state of the communication line 103.

【００６７】(S304) Thereafter, the user places a call from the telephone 101 to the voice dialogue device 100. The call from the telephone 101 is received by the line processing unit 109, which then sends an incoming-call notification to the voice dialogue control unit 111. Here, the incoming-call notification is a notification that a call from the telephone 101 has arrived.

【００６８】(S305) On receiving the incoming-call notification, the voice dialogue control unit 111 processes the initial dialogue cell of the voice dialogue sequence 125.

【００６９】Specifically, in the voice dialogue control unit 111, the incoming-call notification is received by the basic processing unit 115. In response, the basic processing unit 115 reads the information of the initial dialogue cell of the voice dialogue sequence 125 from the storage unit 107 into the memory 119, and processes the initial dialogue cell based on the information read into the memory 119.

【００７０】(S306) When all processing in the initial dialogue cell has finished, the voice dialogue control unit 111 determines whether a next cell pointer 209 is present. The presence or absence of the next cell pointer 209 decides whether the series of dialogues in the voice dialogue processing ends. Specifically, in the voice dialogue control unit 111, the basic processing unit 115 determines whether the next cell pointer 209 is present.

【００７１】If no next cell pointer 209 exists in the initial dialogue cell, the basic processing unit 115 regards all dialogues of the voice dialogue processing as finished and performs the processing of S307 described below.

【００７２】If, on the other hand, a next cell pointer 209 exists in the initial dialogue cell, the operation of the voice dialogue control unit 111 returns to S305. Specifically, the basic processing unit 115 reads the next cell pointer 209 into the memory 119 and then processes the dialogue cell designated by the next cell pointer 209. The processing of each dialogue cell is described later.

【００７３】When the dialogue cells 201 constituting the voice dialogue sequence 125 each have an ID, the next cell pointer 209 designates the ID of a dialogue cell. A dialogue cell that has no next cell pointer 209 is referred to as the last dialogue cell.

【００７４】(S307) When processing of the last dialogue cell is finished, the voice dialogue control unit 111 notifies the line processing unit 109 that the line is to be disconnected, and returns to waiting for an incoming call. Specifically, in the voice dialogue control unit 111, the basic processing unit 115 notifies the line processing unit 109 that the line is to be disconnected, and then again waits for an incoming call. This notification is referred to as the disconnection notification.

【００７５】On receiving the disconnection notification, the line processing unit 109 disconnects the line to the user's telephone 101 and again waits for a change in the line state. This ends the voice dialogue processing for one call by one user.
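The per-call loop of S305 to S307 can be illustrated with a minimal Python sketch. It is not taken from the specification: the class `DialogueCell`, the function `run_call`, and the cell IDs are hypothetical names, and the sketch assumes that each cell stores at most one next cell pointer and that the chain of pointers eventually reaches a cell without one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DialogueCell:
    """Hypothetical stand-in for one dialogue cell 201."""
    cell_id: str
    process: Callable[[], None]          # the cell's own processing (S305)
    next_cell: Optional[str] = None      # next cell pointer 209; None marks the last cell


def run_call(cells: Dict[str, DialogueCell], initial_id: str) -> List[str]:
    """Follow next cell pointers from the initial cell until a cell has none."""
    visited: List[str] = []
    current = cells[initial_id]
    while True:
        current.process()                # S305: process the current dialogue cell
        visited.append(current.cell_id)
        if current.next_cell is None:    # S306: no pointer, so the dialogue ends
            break
        current = cells[current.next_cell]
    # S307 would follow here: send the disconnection notification and
    # return to waiting for the next incoming call.
    return visited
```

The function returns the IDs of the cells it visited only so that the traversal order is easy to inspect; the device described above does not need such a list.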

【００７６】2. Processing of each dialogue cell
Next, the processing of S305 in the flowchart of FIG. 3, that is, the processing of each dialogue cell, is described with reference to the flowchart shown in FIG. 4.

【００７７】(S401) First, the voice dialogue control unit 111 executes the preprocessing program 205 of the dialogue cell. Executing the preprocessing program 205 carries out the preprocessing described in that program.

【００７８】Specifically, in the voice dialogue control unit 111, the basic processing unit 115 reads the preprocessing program 205 into the memory 119 and executes it.

【００７９】(S402) Next, the voice dialogue control unit 111 transmits the voice message designated by the voice file name 211 of the dialogue cell to the user's telephone 101 via the line processing unit 109.

【００８０】Specifically, in the voice dialogue control unit 111, the basic processing unit 115 reads the voice file name 211 of the dialogue cell from the storage unit 107 into the memory 119. The basic processing unit 115 then reads the message designated by that voice file name 211 from among the voice messages 127 into the memory 119, and transmits it to the user's telephone 101 via the line processing unit 109.

【００８１】(S403) The telephone 101 receives the message from the line processing unit 109 and plays it. For example, on receiving the message "This is XX. Please tell us your age, answering in a form such as '20s' or '30s'." from the line processing unit 109, the telephone 101 plays it.

【００８２】(S404) The user then answers the message played by the telephone 101 by voice. The telephone 101 transmits the user's voice to the voice dialogue device 100.

【００８３】(S405) The line processing unit 109 receives the user's voice from the telephone 101 via the communication line 103 and forwards it to the voice dialogue control unit 111. The voice dialogue control unit 111 then performs input processing. Note that the user's answer and the input processing by the voice dialogue control unit 111 may be absent, for example when a voice message such as a commercial is merely transmitted and played. Input processing is described later.

【００８４】(S406) After the input processing ends, the voice dialogue control unit 111 executes the post-processing program 207 of the dialogue cell and ends the processing of the dialogue cell.

【００８５】Specifically, in the voice dialogue control unit 111, the basic processing unit 115 reads the post-processing program 207 of the dialogue cell from the storage unit 107 into the memory 119 and executes it. Executing the post-processing program 207 carries out the post-processing described in that program.
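The order of steps S401 to S406 within one dialogue cell can be sketched as follows. This is an illustrative Python outline only: `CellContents`, `process_cell`, `play_message`, and `run_input` are hypothetical names, and the `expects_answer` flag is an assumption used to model play-only cells such as the commercial mentioned in S405.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CellContents:
    """Hypothetical view of the parts of a dialogue cell used in S401 to S406."""
    voice_file_name: str                 # voice file name 211
    preprocess: Callable[[], None]       # preprocessing program 205
    postprocess: Callable[[], None]      # post-processing program 207
    expects_answer: bool = True          # False for play-only cells (assumption)


def process_cell(
    cell: CellContents,
    play_message: Callable[[str], None],
    run_input: Callable[[], Optional[str]],
) -> Optional[str]:
    """Run one dialogue cell in the order described for FIG. 4."""
    cell.preprocess()                            # S401
    play_message(cell.voice_file_name)           # S402, S403
    answer = run_input() if cell.expects_answer else None   # S404, S405
    cell.postprocess()                           # S406
    return answer
```

Passing the playback and input steps in as callables keeps the sketch independent of any particular telephony or recognition interface.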

【００８６】3. Input processing
The processing of S405 in the flowchart of FIG. 4 described above, that is, the input processing, is described with reference to the flowcharts shown in FIGS. 5 and 6. First, the processing from S501 to S506 is described with reference to the flowchart shown in FIG. 5.

【００８７】(S501) First, the voice dialogue control unit 111 passes the recognition grammar 213 of the dialogue cell to the voice recognition unit 121 in the voice processing unit 113 and notifies it to start voice recognition. The voice recognition unit 121 starts voice recognition processing using the recognition grammar 213.

【００８８】Specifically, in the voice dialogue control unit 111, the basic processing unit 115 reads the recognition grammar 213 of the dialogue cell from the storage unit 107 into the memory 119 and sends it to the voice recognition unit 121.

【００８９】(S502) In addition, in the voice dialogue control unit 111, the basic processing unit 115 notifies the voice recording unit 123 to start voice recording. The voice recording unit 123 starts voice recording processing.

【００９０】(S503) The voice dialogue control unit 111 then enters a state of waiting for results from the voice recognition unit 121 and the voice recording unit 123, that is, waiting for them to send the results of voice processing. Here, voice processing covers both the voice recognition performed by the voice recognition unit 121 and the voice recording performed by the voice recording unit 123.

【００９１】(S504) Next, referring to the flowchart of FIG. 4, in S403 the telephone 101 plays the message received from the voice dialogue device 100. The user answers this message by voice, and the telephone 101 transmits the user's voice to the voice dialogue device 100.

【００９２】(S505) The voice recognition unit 121 receives the voice of the user's answer from the voice dialogue control unit 111. Specifically, in the voice dialogue control unit 111, the basic processing unit 115 receives the user's voice from the line processing unit 109 and sends it to the voice recognition unit 121.

【００９３】Next, the voice recognition unit 121 performs voice recognition by a conventionally known method in order to determine the received voice. The recognition result is obtained as a plurality of candidates together with a likelihood for each candidate. The voice recognition unit 121 then ends the voice recognition processing and sends the recognition result to the voice dialogue control unit 111.

【００９４】For example, when the user answers "sanjuu-dai" (30s) by voice, the recognition result for the user's voice is as follows: first candidate "30s", likelihood 0.987; second candidate "40s", likelihood 0.765; third candidate "teens", likelihood 0.543.

【００９５】In this embodiment, the first candidate is the one with the highest likelihood. The second candidate, third candidate, and so on follow in descending order of likelihood.
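A recognition result of this shape, a list of candidates ordered by descending likelihood, can be represented as in the Python sketch below. The type `Candidate` and the helpers `rank` and `top_two_margin` are hypothetical names introduced for illustration; the values are those of the example in the text.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str
    likelihood: float


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates so that index 0 is the first candidate (highest likelihood)."""
    return sorted(candidates, key=lambda c: c.likelihood, reverse=True)


def top_two_margin(ranked: List[Candidate]) -> float:
    """Likelihood difference between the first and second candidates."""
    return ranked[0].likelihood - ranked[1].likelihood


# Example values from the text, deliberately given out of order.
result = rank([
    Candidate("40s", 0.765),
    Candidate("teens", 0.543),
    Candidate("30s", 0.987),
])
```

The margin between the top two candidates is computed here because the branch determination described later compares it with the threshold differences.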

【００９６】(S506) The voice recording unit 123 receives the voice of the user's answer from the voice dialogue control unit 111. The specific procedure in the voice dialogue control unit 111 is the same as that described for S505, so the duplicate description is omitted here.

【００９７】Next, the voice recording unit 123 records the received voice and creates a voice recording file for it. The voice recording unit 123 then ends the voice recording processing and notifies the voice dialogue control unit 111 accordingly.

【００９８】In the voice dialogue device 100 of this embodiment, the processing of S501 and S502, and the processing of S505 and S506, are preferably performed in parallel.
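One way to picture the parallel execution of recognition and recording is the Python sketch below, which uses a thread pool from the standard library. The specification does not say how the parallelism is realized, so this is only an assumed arrangement; `recognize`, `record`, `process_input`, and the file name pattern are hypothetical stand-ins that return fixed data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def recognize(audio: bytes) -> List[Tuple[str, float]]:
    """Stand-in for the voice recognition unit 121; returns fixed example data."""
    return [("30s", 0.987), ("40s", 0.765), ("teens", 0.543)]


def record(audio: bytes) -> str:
    """Stand-in for the voice recording unit 123; returns a made-up file name."""
    return f"recording_{len(audio)}.raw"


def process_input(audio: bytes) -> Tuple[List[Tuple[str, float]], str]:
    """Start recognition and recording together (S501, S502), then wait for both (S503)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        recognition = pool.submit(recognize, audio)   # S505
        recording = pool.submit(record, audio)        # S506
        return recognition.result(), recording.result()
```

Waiting on both futures mirrors the result waiting state of S503, in which the control unit proceeds only after both units have reported.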

【００９９】Next, the processing from S507 onward in the input processing is described with reference to the flowchart shown in FIG. 6.

【０１００】(S507) In the voice dialogue control unit 111, the basic processing unit 115 receives the recognition result from the voice recognition unit 121 and the notification that voice recording has ended from the voice recording unit 123. The basic processing unit 115 then sends the recognition result to the branch determination unit 117.

【０１０１】On receiving the recognition result, the branch determination unit 117 reads the reference values from the memory 119. As already described, four reference values are set: a first threshold, a first threshold difference, a second threshold, and a second threshold difference. Among the candidates in the recognition result, the branch determination unit 117 compares the likelihood of the first candidate with the thresholds, and the difference between the likelihoods of the first and second candidates with the threshold differences.

【０１０２】The comparison of the likelihood difference between the first and second candidates with each threshold difference is preferably based on the following reasoning: the larger the difference between the likelihoods of the first and second candidates, the more reliable the first candidate is judged to be as the result that determines the input voice.

【０１０３】In this step, the branch determination unit 117 also reads the dialogue cell mode 203 of the dialogue cell from the storage unit 107 into the memory 119.

【０１０４】The branch determination unit 117 then makes the branch determination based on the comparison of the likelihoods with the reference values described above, or on the dialogue cell mode 203.

【０１０５】The branch determination by the branch determination unit 117 preferably decides, for the recognition result, whether to (a) use it as is, (b) perform confirmation processing, (c) perform re-input processing, (d) save only the voice recording file, or (e) discard the recognition result. The branch determination is described in detail later.

【０１０６】Next, the processing of S508 to S512, performed as a result of determinations (a) to (e), is described.

【０１０７】(S508) When the branch determination unit 117 decides by (a) to use the recognition result as is, the voice dialogue control unit 111 uses the first candidate as is, and the input processing ends.

【０１０８】(S509) When the branch determination unit 117 decides by (b) to perform confirmation processing, the voice dialogue control unit 111 performs confirmation processing, which is described in detail later.

【０１０９】(S510) When the branch determination unit 117 decides by (c) to perform re-input processing, the voice dialogue control unit 111 performs re-input processing, which is described in detail later.

【０１１０】(S511) When the branch determination unit 117 decides by (d) to save the voice recording file, the basic processing unit 115 in the voice dialogue control unit 111 substitutes a pointer to the voice recording file for the recognition result, and saves the voice recording file in the database unit 135.

【０１１１】The pointer to the voice recording file is, for example, the file name of the voice recording file created in S506 of the flowchart of FIG. 5. Substituting this pointer means, for example, that the basic processing unit 115 in the voice dialogue control unit 111 discards the recognition result, uses the file name of the voice recording file in its place, and ends the input processing.

【０１１２】The voice recording file saved in the database unit 135 is preferably retrieved as needed for verification. This verification is carried out by any desired means, such as the transcription by listening already described, or voice recognition processing by the voice recognition unit 121.

【０１１３】(S512) When the branch determination unit 117 decides by (e) to discard the recognition result, the voice dialogue control unit 111 ends the input processing with no recognition result.
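The handling of determinations (a) to (e) in S508 to S512 amounts to a five-way dispatch, sketched below in Python. The function `finish_input` and its callable parameters are hypothetical; in particular, the assumption that confirmation and re-input each hand back a final value (or nothing) is made only for the sake of the sketch, since those procedures are described separately.

```python
from typing import Callable, Optional


def finish_input(
    decision: str,
    first_candidate: str,
    recording_file: str,
    confirm: Callable[[str], Optional[str]],
    reinput: Callable[[], Optional[str]],
    save_recording: Callable[[str], None],
) -> Optional[str]:
    """Return the value that stands in for the user's input after S508 to S512."""
    if decision == "a":                  # S508: use the first candidate as is
        return first_candidate
    if decision == "b":                  # S509: confirmation processing
        return confirm(first_candidate)
    if decision == "c":                  # S510: re-input processing
        return reinput()
    if decision == "d":                  # S511: save the file, use its name instead
        save_recording(recording_file)
        return recording_file
    if decision == "e":                  # S512: end with no recognition result
        return None
    raise ValueError(f"unknown determination: {decision!r}")
```

For determination (d) the returned value is the file name of the recording, matching the description of the pointer being used in place of the recognition result.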

【０１１４】4. Branch determination
The processing of S507 in the flowchart of FIG. 6 described above is explained with reference to FIG. 11 and the flowcharts shown in FIGS. 7 and 8.

【０１１５】The processing of S507 is the processing performed in the branch determination. FIG. 11 divides the branch determination procedure into the six cases (A) to (F) and shows, for each case, which of the determinations (a) to (e) described above is made. In FIGS. 11, 7, and 8, the likelihoods of the first and second candidates are denoted likelihood (1) and likelihood (2), respectively. In FIG. 11, the likelihood difference is the difference between the likelihood of the first candidate and that of the second candidate.

【０１１６】First, the processing from S601 to S607 in the flowchart shown in FIG. 7 is described.

【０１１７】The branch determination unit 117 receives the recognition result from the basic processing unit 115 and reads the reference values from the memory 119. It further reads the dialogue cell mode 203 from the storage unit 107 into the memory 119, and then starts the branch determination. The flag strings (イ) to (ホ) that make up the dialogue cell mode 203 are as already described.

【０１１８】(S601), (S602) When the likelihood of the first candidate is greater than the first threshold in S601 and the likelihood difference between the first and second candidates is greater than the first threshold difference in S602, the branch determination unit 117 makes determination (a), to use the first candidate as is.

【０１１９】Referring to FIG. 11, the case where determination (a) is made in S601 and S602 corresponds to case (A).

【０１２０】(S603) Next, in cases other than (A), when the flag string in the dialogue cell mode 203 is (イ) and ((ロ) or (ハ)), the branch determination unit 117 performs the processing of S604.

【０１２１】Here, the flag string being (イ) and ((ロ) or (ハ)) means that it indicates either that there is a branch to the next dialogue cell according to the voice recognition processing, or that the post-processing program 207 assigns the recognition result to the global variable 215.

【０１２２】(S604) The branch determination unit 117 compares the likelihood of the first candidate with the first threshold. If the likelihood of the first candidate is greater than the first threshold, the branch determination unit 117 proceeds to S605.

【０１２３】(S605) The branch determination unit 117 compares the likelihood difference between the first and second candidates with the second threshold difference. If the likelihood difference is greater than the second threshold difference, the branch determination unit 117 makes determination (b), to perform confirmation processing on the first candidate.

【０１２４】If, in S604, the likelihood of the first candidate is not greater than the first threshold, processing moves to S606.

【０１２５】(S606) The branch determination unit 117 compares the likelihood of the first candidate with the second threshold. If the likelihood of the first candidate is greater than the second threshold, it proceeds to S607.

【０１２６】(S607) The branch determination unit 117 compares the likelihood difference between the first and second candidates with the first threshold difference. If the likelihood difference is greater than the first threshold difference, the branch determination unit 117 makes determination (b), to perform confirmation processing on the first candidate.

【０１２７】Referring to FIG. 11, the case where determination (b) is made in S603 to S607 corresponds to case (B).

【０１２８】Referring to the flowchart shown in FIG. 7, in cases other than (B), the branch determination unit 117 makes determination (c), to perform re-input processing for the first candidate. In FIG. 11, this corresponds to case (C).

【０１２９】Next, the processing from S608 to S612 is described with reference to the flowchart shown in FIG. 8.

【０１３０】(S608) In cases other than (C), when the flag string in the dialogue cell mode 203 is (イ) and (ホ), the branch determination unit 117 performs the processing of S609. The flag string being (イ) and (ホ) means that it indicates that the post-processing program 207 increments the counter value of the global variable 215 corresponding to the recognition result.

【０１３１】(S609) The branch determination unit 117 compares the likelihood of the first candidate with the first threshold. If the likelihood of the first candidate is greater than the first threshold, it performs the processing of S610.

【０１３２】(S610) The branch determination unit 117 compares the likelihood difference between the first and second candidates with the second threshold difference. If the likelihood difference is greater than the second threshold difference, the branch determination unit 117 makes determination (a), to use the first candidate as is.

【０１３３】If, in S609, the likelihood of the first candidate is not greater than the first threshold, processing moves to S611.

【０１３４】(S611) The branch determination unit 117 compares the likelihood of the first candidate with the second threshold. If the likelihood of the first candidate is greater than the second threshold, it proceeds to S612.

【０１３５】(S612) The branch determination unit 117 compares the likelihood difference between the first and second candidates with the first threshold difference. If the likelihood difference is greater than the first threshold difference, the branch determination unit 117 makes determination (a), to use the first candidate as is.

【０１３６】Referring to FIG. 11, the case where determination (a) is made in S608 to S612 corresponds to case (D).

【０１３７】Referring to the flowchart shown in FIG. 8, in cases other than (D), the branch determination unit 117 makes determination (e), to discard the recognition result and proceed to the next processing. In FIG. 11, this corresponds to case (E).

【０１３８】Further, in FIG. 8, in cases other than (E), the branch determination unit 117 makes determination (d), to discard the recognition result, substitute the voice recording file for it, and save the file in the database unit. In FIG. 11, this is classified as case (F).
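The flow of S601 to S612 can be condensed into the Python sketch below. It reflects one reading of the text rather than the figures themselves, and several points are assumptions: the two boolean parameters stand for the flag conditions tested in S603 and S608; a failed difference check in S605 or S610 is taken to fall straight through to the "other" case; and cases (E) and (F) are distinguished by whether the S608 flag condition holds. All names (`Decision`, `Thresholds`, `decide_branch`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    USE_AS_IS = "a"
    CONFIRM = "b"
    REINPUT = "c"
    SAVE_RECORDING = "d"
    DISCARD = "e"


@dataclass(frozen=True)
class Thresholds:
    first: float         # first threshold
    second: float        # second threshold
    first_diff: float    # first threshold difference
    second_diff: float   # second threshold difference


def _two_tier_check(l1: float, diff: float, th: Thresholds) -> bool:
    """Shared shape of S604 to S607 and S609 to S612."""
    if l1 > th.first:                    # S604 / S609
        return diff > th.second_diff     # S605 / S610
    if l1 > th.second:                   # S606 / S611
        return diff > th.first_diff      # S607 / S612
    return False


def decide_branch(l1: float, l2: float, th: Thresholds,
                  result_drives_flow: bool, result_is_counted: bool) -> Decision:
    """l1, l2: likelihood (1) and likelihood (2)."""
    diff = l1 - l2
    if l1 > th.first and diff > th.first_diff:        # S601, S602: case (A)
        return Decision.USE_AS_IS
    if result_drives_flow:                            # S603: (イ) and ((ロ) or (ハ))
        if _two_tier_check(l1, diff, th):
            return Decision.CONFIRM                   # case (B)
        return Decision.REINPUT                       # case (C)
    if result_is_counted:                             # S608: (イ) and (ホ)
        if _two_tier_check(l1, diff, th):
            return Decision.USE_AS_IS                 # case (D)
        return Decision.DISCARD                       # case (E)
    return Decision.SAVE_RECORDING                    # case (F), assumed fallback
```

Note that a first candidate above the first threshold needs only the smaller second threshold difference as margin, while one between the two thresholds needs the larger first threshold difference, which matches the pairing of comparisons in S605 and S607.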

【０１３９】As an example of the branch determination procedure of the branch determination unit 117 described above with reference to FIGS. 11, 7, and 8, consider the case where the recognition result is: first candidate "30s", likelihood 0.987; second candidate "40s", likelihood 0.765; third candidate "teens", likelihood 0.543.

【０１４０】Suppose the reference values stored in the memory 119 are set as follows: first threshold 0.800, second threshold 0.600, first threshold difference 0.200, and second threshold difference 0.100.

[0141] In this case, the likelihood of the first candidate is 0.987, and the difference in likelihood between the first and second candidates is 0.222. The likelihood of the first candidate is therefore larger than the first threshold of 0.800, and the difference in likelihood between the first and second candidates is larger than the first threshold difference of 0.200. Accordingly, in the processes of S601 and S602 in the flowchart shown in FIG. 7, the branch determination unit 117 makes determination (a). This corresponds to case (A), already described.
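The arithmetic of this worked example can be checked with a short sketch. It covers only the case (A) test (S601 and S602) using the reference values given above; the names `ReferenceValues` and `is_case_a` are hypothetical and do not appear in the specification, and the remaining cases (B) to (F), which also depend on the dialogue cell mode, are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Candidate = Tuple[str, float]  # (recognized text, likelihood)


@dataclass(frozen=True)
class ReferenceValues:
    """Reference values from the example; the class itself is an assumption."""
    first_threshold: float = 0.800
    second_threshold: float = 0.600
    first_threshold_diff: float = 0.200
    second_threshold_diff: float = 0.100


def likelihood_diff(candidates: List[Candidate]) -> float:
    """Difference in likelihood between the first and second candidates."""
    return candidates[0][1] - candidates[1][1]


def is_case_a(candidates: List[Candidate],
              ref: ReferenceValues = ReferenceValues()) -> bool:
    """True when the first candidate would be used as is (determination (a)).

    Assumes candidates are sorted by descending likelihood.
    """
    first_likelihood = candidates[0][1]
    return (first_likelihood > ref.first_threshold
            and likelihood_diff(candidates) > ref.first_threshold_diff)


result = [("30s", 0.987), ("40s", 0.765), ("10s", 0.543)]
print(round(likelihood_diff(result), 3), is_case_a(result))
```

With the example values, both comparisons succeed, so the sketch agrees with the determination (a) described in the text.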

[0142] The description here makes the branch determination by comparing the likelihood of the first candidate, and the difference in likelihood between the first and second candidates, with the thresholds and threshold differences. However, the likelihoods of the other candidates and the differences between them may also be compared with the thresholds and threshold differences.

[0143] 5. Confirmation Input Process
Next, the process of S509 in the flowchart shown in FIG. 6, that is, the confirmation input process, is described with reference to the flowchart shown in FIG. 9.

[0144] As already described, when the branch determination unit 117 determines in the process of S509 in the flowchart shown in FIG. 6 that confirmation processing is to be performed, the voice interaction control unit 111 starts the confirmation input process.

[0145] (S701) The basic processing unit 115 reads the confirmation voice message 131 from the storage unit 107 into the memory 119. The basic processing unit 115 then appends the confirmation voice message 131 to the first candidate of the recognition result, and sends the resulting message to the user's telephone 101 via the line processing unit 109.

[0146] The confirmation voice message appended to the first candidate asks the user whether the first candidate of the recognition result correctly identifies the input voice. For example, if the first candidate of the recognition result is "30s", the message is something like "Is 30s correct? Please answer yes or no." That is, in the confirmation process of this embodiment, the user answers either "yes" or "no".

[0147] (S702) The telephone 101 receives the message from the line processing unit 109 and plays it back.

[0148] (S703) In the voice interaction control unit 111, the basic processing unit 115 reads the confirmation recognition grammar 129, which recognizes only "yes" or "no", from the storage unit 107 into the memory 119. The basic processing unit 115 then sends the confirmation recognition grammar 129 to the voice recognition unit 121 and notifies it to start voice recognition. The voice recognition unit 121 starts voice recognition processing using the confirmation recognition grammar 129.

[0149] (S704) The voice interaction control unit 111 then enters a result waiting state, in which it waits for the recognition result to be sent from the voice recognition unit 121.

[0150] (S705) The user answers by voice to the message played back on the telephone 101 in S702. The telephone 101 sends the user's voice to the voice interaction device 100.

[0151] (S706) The line processing unit 109 receives the user's voice from the telephone 101. The basic processing unit 115 then receives the user's voice from the line processing unit 109 and sends it to the voice recognition unit 121.

[0152] On receiving the user's voice, the voice recognition unit 121 performs voice recognition on it. The voice recognition procedure is the same as that already described, except that it uses the confirmation recognition grammar 129. The determined recognition result is then reported from the voice recognition unit 121 to the basic processing unit 115. This report of the determined recognition result is called a result notification.

[0153] For example, if the user's spoken answer is "yes", the recognition result determined by the voice recognition unit 121 is something like: first candidate "yes", likelihood of the first candidate 0.990.

[0154] (S707) On receiving the recognition result, the basic processing unit 115 sends it to the branch determination unit 117. If the first candidate of this recognition result is "no", the branch determination unit 117 determines that the process of S708, that is, the re-input process, is to be performed. The process of S708 is described in detail later.

[0155] If, on the other hand, the first candidate of the recognition result is "yes", the branch determination unit 117 determines that the entire confirmation input process is to end. In this case, the voice interaction control unit 111 uses the first candidate of the original recognition result as is and processes it.
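The confirmation input process of S701 to S708 can be summarized in a minimal sketch. The function names, the `ask_yes_no` callback (standing in for message playback, voice reception, and recognition with the confirmation recognition grammar 129), and the returned labels are all assumptions made for illustration; the specification does not define a programming interface.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, float]  # (recognized text, likelihood)


def confirmation_prompt(first_candidate: str) -> str:
    """S701: append the confirmation message to the first candidate."""
    return f"Is {first_candidate} correct? Please answer yes or no."


def confirm_first_candidate(
    first_candidate: str,
    ask_yes_no: Callable[[str], List[Candidate]],
) -> Tuple[str, Optional[str]]:
    """Return ("accept", candidate) or ("reinput", None).

    ask_yes_no is a hypothetical callback covering S702 to S706: it plays
    the prompt and returns a recognition result restricted to "yes"/"no".
    """
    answer = ask_yes_no(confirmation_prompt(first_candidate))
    top_text = answer[0][0]  # first candidate of the yes/no recognition
    if top_text == "no":     # S707: "no" leads to the re-input process (S708)
        return ("reinput", None)
    if top_text == "yes":    # "yes": use the original first candidate as is
        return ("accept", first_candidate)
    # Assumption: the grammar only yields "yes" or "no", so anything else
    # is treated as an error in this sketch.
    raise ValueError(f"unexpected confirmation answer: {top_text!r}")


print(confirm_first_candidate("30s", lambda prompt: [("yes", 0.990)]))
```

Passing the recognizer as a callback keeps the sketch independent of any particular telephony or speech recognition library.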

[0156] 6. Re-input Process
The process of S708 in the flowchart shown in FIG. 9, that is, the re-input process, is now described with reference to the flowchart shown in FIG. 10. The process of S510 in the flowchart of FIG. 6 is the same as the re-input process described here and follows the same procedure.

[0157] (S801) First, in the voice interaction control unit 111, the basic processing unit 115 reads the re-input message 133 from the storage unit 107 into the memory 119. The basic processing unit 115 then sends the re-input message to the user's telephone 101 via the line processing unit 109.

[0158] The re-input message prompts the user to input the voice again at the telephone 101, for example, "Please say your answer once more."

[0159] (S802) The telephone 101 receives the message from the line processing unit 109 and plays it back.

[0160] (S803) The user answers by voice to the re-input message played back on the telephone 101. The telephone 101 sends the user's voice to the voice interaction device 100.

[0161] (S804) The line processing unit 109 receives the user's voice from the telephone 101. The basic processing unit 115 then receives the user's voice from the line processing unit 109. After that, the voice interaction control unit 111 performs the input process on the user's voice, following the same procedure as already described.

[0162] After the input process ends, the operation of this embodiment proceeds to the processing from S406 onward, as already described with the flowchart shown in FIG. 4. Description of the overlapping content is omitted here.
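The re-input process of S801 to S804 reduces to three steps, sketched below under stated assumptions: `send_message`, `receive_voice`, and `input_process` are hypothetical callbacks standing in for the line processing unit, the telephone, and the input process already described. None of these names come from the specification.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (recognized text, likelihood)

# Example wording of the re-input message 133.
REINPUT_MESSAGE = "Please say your answer once more."


def reinput(
    send_message: Callable[[str], None],
    receive_voice: Callable[[], bytes],
    input_process: Callable[[bytes], List[Candidate]],
) -> List[Candidate]:
    """Prompt the user again and rerun the input process on the new voice."""
    send_message(REINPUT_MESSAGE)   # S801, S802: send and play the message
    voice = receive_voice()         # S803, S804: receive the user's voice
    return input_process(voice)     # same input process as described earlier


sent: List[str] = []
new_result = reinput(sent.append, lambda: b"audio", lambda v: [("30s", 0.9)])
print(sent, new_result)
```

The caller would then continue with the processing from S406 onward using the returned recognition result.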

[0163] As described above, in the voice interaction processing of this embodiment, the processing of each dialogue cell 201 in the voice dialogue sequence 125 is performed repeatedly. In processing a dialogue cell, the branch determination unit 117 makes a branch determination on the input voice based on the dialogue cell mode 203 and on a comparison of the likelihoods in the recognition result with the reference values. This branch determination selects processing suited to the recognition result. That is, according to this embodiment, the most suitable voice interaction processing is performed in each dialogue cell.

[0164] Therefore, for recognition results that conventionally required manual transcription, this embodiment can in some cases eliminate the manual work. Furthermore, even when voice recognition fails for any of various reasons, suitable processing is determined by the branch determination and then performed.

[0165] In addition, by comparing the likelihoods in the recognition result with the reference values, the branch determination unit 117 can evaluate the recognition performance of the voice recognition. In other words, the branch determination unit 117 can be said to make an accurate branch determination on the recognition result that takes into account the device configuration of the voice processing unit 113.

[0166]

Effects of the Invention
As described above, according to the voice interaction device and the voice interaction processing method of the present invention, the branch determination unit in the voice interaction control unit makes a branch determination on the recognition result for the input voice and determines the most suitable processing. In doing so, the branch determination unit can evaluate the recognition performance of the voice recognition by comparing the reference values with the likelihoods. That is, the branch determination unit makes an accurate branch determination on the recognition result from the voice processing unit. Therefore, the voice interaction device and the voice interaction processing method of the present invention can realize efficient and highly reliable voice interaction processing.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a voice interaction device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a voice dialogue sequence.

FIG. 3 is a flowchart showing the voice interaction processing in this embodiment.

FIG. 4 is a flowchart showing the processing of each dialogue cell according to this embodiment.

FIG. 5 is a flowchart showing the input process according to this embodiment.

FIG. 6 is a flowchart showing the input process according to this embodiment.

FIG. 7 is a flowchart showing the processing in the branch determination according to this embodiment.

FIG. 8 is a flowchart showing the processing in the branch determination according to this embodiment.

FIG. 9 is a flowchart showing the confirmation process according to this embodiment.

FIG. 10 is a flowchart showing the re-input process according to this embodiment.

FIG. 11 is a diagram for explaining the branch determination in the branch determination unit according to this embodiment.

[Explanation of symbols]
100: Voice interaction device
101: User's telephone
103: Communication line
105: Voice interaction processing unit
107: Storage unit
109: Line processing unit
111: Voice interaction control unit
113: Voice processing unit
115: Basic processing unit
117: Branch determination unit
119: Memory
121: Voice recognition unit
123: Voice recording unit
125: Voice dialogue sequence
127: Voice message
129: Confirmation recognition grammar
131: Confirmation voice message
133: Re-input message
135: Database unit
137: Voice processing information
201: Dialogue cell
203: Dialogue cell mode (third information)
205: Preprocessing program
207: Post-processing program
209: Next cell pointer
211: Voice file name (first information)
213: Recognition grammar (second information)
215: Global variable
217: Second information group
219: First information group

Continued from front page: (72) Inventor: Shinji Hayakawa, c/o Oki Electric Industry Co., Ltd., 1-7-12 Toranomon, Minato-ku, Tokyo. F-terms (reference): 5D015 KK02 KK04 LL10; 5D045 AB04 AB26; 5K027 BB01 HH20

Claims


1. A voice interaction device that is connected to a user's telephone via a communication line, receives voice from the telephone, and performs voice interaction processing, the device comprising a voice interaction processing unit that determines, as a recognition result, a plurality of candidates for identifying the voice input from the telephone together with a likelihood for each candidate, makes a branch determination by comparing the likelihood with a reference value set in advance for use in the determination, and determines by the branch determination how the recognition result is to be processed.

2. The voice interaction device according to claim 1, further comprising a storage unit in which voice processing information is stored in advance, the voice processing information having a first information group including first information and second information used for determining the recognition result, and a second information group including third information used for the branch determination.

3. The voice interaction device according to claim 2, wherein the voice interaction processing unit comprises a line processing unit connected to the telephone, a basic processing unit, a branch determination unit, a memory that stores the reference value, and a voice processing unit that determines the recognition result; the basic processing unit reads the first information and the second information from the storage unit into the memory, conducts a voice interaction with the telephone via the line processing unit using the first information, and receives voice from the telephone; the voice processing unit receives the voice from the telephone and the second information from the basic processing unit, determines the recognition result using the second information, and transmits the recognition result to the basic processing unit; and the branch determination unit receives the recognition result from the basic processing unit, reads the third information from the storage unit and the reference value from the memory, compares the likelihood with the reference value, and makes the branch determination based on the third information.

4. The voice interaction device according to claim 3, wherein the voice processing unit comprises a voice recognition unit that determines the recognition result, and a voice recording unit that performs voice recording processing on the voice from the telephone to create a voice recording file.

5. The voice interaction device according to claim 4, wherein the voice processing information stored in advance in the storage unit includes a voice dialogue sequence having a plurality of dialogue cells; each dialogue cell has the first information group and the second information group, the second information group including, as the third information, a dialogue cell mode that determines the processing of the dialogue cell; and the branch determination is made by comparing the likelihood with the reference value and based on the dialogue cell mode.

6. The voice interaction device according to claim 5, wherein the voice processing information includes voice messages comprising a plurality of files in which voice messages are recorded; the first information group includes a voice file name, as the first information, that specifies from among the plurality of files the file used when conducting the voice interaction with the telephone, and a recognition grammar, as the second information, used when determining the recognition result; the second information group includes, in addition to the dialogue cell mode, a preprocessing program describing preprocessing required for the voice interaction processing, a post-processing program describing post-processing required for the voice interaction processing, and a next cell pointer designating the dialogue cell to proceed to after the processing in the dialogue cell is completed; and the dialogue cell mode depends on the preprocessing program, the post-processing program, and the next cell pointer.

7. The voice interaction device according to claim 6, wherein the voice dialogue sequence has a global variable set in correspondence with the dialogue cell, and the dialogue cell mode is a flag string having a plurality of flags that indicate (a) whether the dialogue cell includes voice recognition processing, (b) whether the dialogue cell branches to a next dialogue cell, (c) whether the post-processing program assigns the recognition result to the global variable, (d) whether the post-processing program registers the recognition result in a database unit provided in the storage unit, and (e) whether the post-processing program increments a counter value in the global variable or the database unit according to the recognition result.

8. The voice interaction device according to claim 7, wherein the branch determination unit determines, as the result of the branch determination, whether the recognition result is used as is, is discarded, is subjected to confirmation processing, is subjected to re-input processing, or is replaced by the voice recording file, which is stored in the database unit.

9. The voice interaction device according to claim 8, wherein the reference value used when determining whether the recognition result is used as is, is discarded, is subjected to confirmation processing, is subjected to re-input processing, or is replaced by the voice recording file stored in the database unit is set as four reference values: a first threshold, a second threshold, a first threshold difference, and a second threshold difference.

10. The voice interaction device according to claim 9, wherein, in the branch determination, the branch determination unit, with respect to the likelihood of a first candidate among the plurality of candidates and the difference in likelihood between the first candidate and a second candidate:
(A) determines that the first candidate is used as is when the likelihood of the first candidate is larger than the first threshold and the difference in likelihood is larger than the first threshold difference;
(B) in cases other than (A), when the flag string of the dialogue cell mode indicates that the dialogue cell branches to a next dialogue cell according to the voice recognition processing, or that the post-processing program assigns the recognition result to the global variable, determines that confirmation processing is performed on the first candidate if the likelihood of the first candidate is larger than the first threshold and the difference in likelihood is larger than the second threshold difference, or if the likelihood of the first candidate is larger than the second threshold and the difference in likelihood is larger than the first threshold difference;
(C) in cases other than (B), determines that re-input processing is performed for the first candidate;
(D) in cases other than (C), when the flag string of the dialogue cell mode indicates that the post-processing program increments the counter value of the global variable according to the recognition result, determines that the first candidate is used as is if the likelihood of the first candidate is larger than the first threshold and the difference in likelihood is larger than the second threshold difference, or if the likelihood of the first candidate is larger than the second threshold and the difference in likelihood is larger than the first threshold difference;
(E) in cases other than (D), determines that the recognition result is discarded and that processing proceeds to the next process; and
(F) in cases other than (E), determines that the recognition result is discarded and that the voice recording file is substituted for the recognition result and stored in the database unit.

11. A voice interaction processing method for performing voice interaction processing in a voice interaction device having a voice interaction processing unit connected to a user's telephone via a communication line, the method comprising: determining, as a recognition result, a plurality of candidates for identifying the voice input from the telephone together with a likelihood for each candidate; making a branch determination by comparing the likelihood with a reference value set in advance for use in the determination; and determining by the branch determination how the recognition result is to be processed.

12. The voice interaction processing method according to claim 11, wherein voice processing information having a first information group including first and second information and a second information group including third information is stored in advance in a storage unit; the first and second information are read from the storage unit and the recognition result is determined using the first and second information; and the third information is read from the storage unit and the branch determination is made using the third information.

13. The voice interaction processing method according to claim 12, wherein the first and second information are read from the storage unit into a memory; using the read first information, a voice interaction is conducted between a basic processing unit and the telephone via a line processing unit, and voice is received from the telephone via the line processing unit; the recognition result is determined using the second information read into the memory; the third information is read from the storage unit and the reference value stored in advance in the memory is read; the likelihood is compared with the reference value; and the branch determination is made based on the third information.

14. The voice interaction processing method according to claim 13, wherein a voice recording unit receives the voice from the telephone and performs voice recording processing to create a voice recording file.

15. The voice interaction processing method according to claim 14, wherein the voice processing information stored in advance in the storage unit includes a voice dialogue sequence having a plurality of dialogue cells; each dialogue cell has the first information group and the second information group, the second information group including, as the third information, a dialogue cell mode that determines the processing of the dialogue cell; and the branch determination is made by comparing the likelihood with the reference value and based on the dialogue cell mode.

16. The voice interaction processing method according to claim 15, wherein the voice processing information includes voice messages comprising a plurality of files in which voice messages are recorded; the first information group includes a voice file name, as the first information, that specifies the file used when conducting the voice interaction between the basic processing unit and the telephone, and a recognition grammar, as the second information, used when determining the recognition result; the second information group includes, in addition to the dialogue cell mode, a preprocessing program describing preprocessing required for the voice interaction processing, a post-processing program describing post-processing required for the voice interaction processing, and a next cell pointer designating the dialogue cell to proceed to after the processing in the dialogue cell is completed; and the dialogue cell mode depends on the preprocessing program, the post-processing program, and the next cell pointer.

17. The voice interaction processing method according to claim 16, wherein the voice dialogue sequence has a global variable set in correspondence with the dialogue cell, and the dialogue cell mode is a flag string having a plurality of flags that indicate (a) whether the dialogue cell includes voice recognition processing, (b) whether the dialogue cell branches to a next dialogue cell, (c) whether the post-processing program assigns the recognition result to the global variable, (d) whether the post-processing program registers the recognition result in a database unit provided in the storage unit, and (e) whether the post-processing program increments a counter value in the global variable or the database unit according to the recognition result.

18. The voice interaction processing method according to claim 17, wherein the reference value is set as four reference values, namely a first threshold, a second threshold, a first threshold difference, and a second threshold difference, and the branch determination, with respect to the likelihood of a first candidate among the plurality of candidates and the difference in likelihood between the first candidate and a second candidate:
(A) determines processing in which the first candidate is used as is when the likelihood of the first candidate is larger than the first threshold and the difference in likelihood is larger than the first threshold difference;
(B) in cases other than (A), when the flag string of the dialogue cell mode indicates that the dialogue cell branches to a next dialogue cell according to the voice recognition processing, or that the post-processing program assigns the recognition result to the global variable, determines that confirmation processing is performed on the first candidate if the likelihood of the first candidate is larger than the first threshold and the difference in likelihood is larger than the second threshold difference, or if the likelihood of the first candidate is larger than the second threshold and the difference in likelihood is larger than the first threshold difference;
(C) in cases other than (B), determines that re-input processing is performed for the first candidate;
(D) in cases other than (C), when the flag string of the dialogue cell mode indicates that the post-processing program increments the counter value of the global variable according to the recognition result, determines processing in which the first candidate is used as is if the likelihood of the first candidate is larger than the first threshold and the difference in likelihood is larger than the second threshold difference, or if the likelihood of the first candidate is larger than the second threshold and the difference in likelihood is larger than the first threshold difference;
(E) in cases other than (D), determines that the recognition result is discarded and that processing proceeds to the next process; and
(F) in cases other than (E), determines processing in which the recognition result is discarded and the voice recording file is substituted for the recognition result and stored in the database unit.