JP2006236037A

JP2006236037A - Voice interaction content creation method, device, program and recording medium

Info

Publication number: JP2006236037A
Application number: JP2005050358A
Authority: JP
Inventors: Tetsuo Amakasu; 哲郎甘粕; Noboru Miyazaki; 昇宮崎; Akihiro Fuku; 昭弘富久; Teruo Hagino; 輝雄萩野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-02-25
Filing date: 2005-02-25
Publication date: 2006-09-07

Abstract

PROBLEM TO BE SOLVED: To simplify the creation of the various kinds of content used in a voice dialogue device.
SOLUTION: The voice dialogue content creation method prepares (a) a scenario template that defines the dialogue order and image display order suited to the application to be developed, (b) a plurality of slot names, each representing an input item defined by that application, together with slot IDs attached to the slot names, and (c) a plurality of dialogue sentence templates, each with slot IDs embedded in advance, which prompt the dialogue partner for input and guide appropriate input once the slot names are substituted at the positions of the embedded slot IDs. The slot names are substituted according to the slot IDs embedded in the dialogue sentence templates, each resulting dialogue sentence is substituted into the scenario template, and a scenario file and prompt voice files are generated.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a voice dialogue content creation method, device, and program, and to a recording medium on which the program is recorded. They are used to develop programs for a voice dialogue system that recognizes a user's speech, understands its content, performs the necessary processing, and responds by presenting the result as speech or on a screen.

A device has been proposed that displays an anthropomorphic animated agent character on the screen, recognizes speech that the user utters as if talking to the agent, and advances the dialogue while conveying its response in two ways: in words, by playing back pre-recorded or synthesized speech or by displaying text on the screen, and in nuance, by animating the agent's gestures (Patent Document 1).
Creating a voice dialogue system requires detailed descriptions of the operation of many different components, including speech recognition, speech synthesis, audio playback, speech understanding, dialogue understanding, and dialogue control.

This detailed description work (programming) has normally required specialized knowledge: in addition to the design information of the voice dialogue system being built, the developer had to understand the technical characteristics of the component technologies used in each part, such as speech recognition and speech synthesis. It was also necessary to describe processing that coordinates the parts, such as playing back audio while recognizing speech. These requirements made designing and describing a voice dialogue system a very complex task.
Because of this complexity, the design and description of voice dialogue system programs (hereinafter, voice dialogue scenarios) has conventionally been simplified as shown in Patent Document 2: portions that are frequently used in voice dialogue scenarios are turned into reusable parts, which are then connected in sequence according to the order of the dialogue (the dialogue flow).

In addition, speech recognition requires a grammar, a language model, a dictionary, and so on for accepting the speech that users direct at the dialogue system and outputting it as a recognition result. For this step, Patent Documents 3 and 4 propose a technique that builds a high-coverage language model, called a class statistical language model, and a dictionary from a small corpus (digitized speech and language data) and performs speech recognition with them.
In recent years, multimodal systems have also been proposed that go beyond the voice modality of speech input and output. They draw the response to a speech input on the screen, and they respond by voice to an item, such as a link on the drawn screen, that is selected with a mouse or similar device. Such a system can be made easier for users to understand by showing on the screen the state partway through the dialogue and its result, for example whether each item needed for the system to return a search result is "entered" or "not entered". Document files such as images and hypertext are prepared for this purpose. Methods have also been proposed for reducing the cost of document creation by building a device that dynamically generates these documents, for example one that receives information about the items already entered and displays candidates for the items (slots) not yet entered.
Patent Document 1: JP 2004-295837 A
Patent Document 2: JP 2004-310628 A
Patent Document 3: JP 2004-69858 A
Patent Document 4: JP 2004-053745 A

For voice dialogue scenario development, the development burden can be reduced by connecting partial voice dialogue scenarios, as in Patent Document 2. However, designing the flow of a voice dialogue scenario requires a very high level of expertise, and it has been difficult to produce many voice dialogue scenarios in a short period. This is because, in a voice dialogue, careful design is generally needed to decide the order in which the input items required to output a result should be entered, and what wording (prompts) the system should use in its responses to guide the user through that procedure.

After speech recognition, utterance understanding must also be performed, that is, extracting from the utterance the keywords to be assigned to the items (slots) that must be obtained from the user to achieve the purpose of the dialogue. When a keyword is registered in the dictionary at the stage of creating a class language model, it is necessary to provide (1) the written form of the word, (2) its reading, and (3) information on which class symbol of the class language model the word is assigned to (class information), and additionally (4) the destination slot that represents its meaning in utterance understanding. In other words, four kinds of information had to be registered, which was cumbersome.

Furthermore, when preparing a device that generates documents for dynamically displaying a screen during the dialogue, or at the stage of presenting information based on the result of the dialogue, it is usually necessary to create a device separate from the dialogue processing device (for example, a CGI program on a Web server). Creating a program other than the voice dialogue scenario is harder for an inexperienced voice dialogue scenario author than simply creating documents. On the other hand, statically preparing documents for every input state, instead of generating them dynamically, has been very laborious.

As described above, multimodal voice dialogue scenarios have the following problems, and reducing the associated work cost and difficulty has been a challenge.
(1) Determining the flow (storyline) of the voice dialogue scenario itself requires specialized knowledge and is difficult.
(2) It is difficult to compose response (prompt) content suited to each state during the dialogue.
(3) Registering the words that serve as keywords is time-consuming.
(4) It is difficult to create the documents and programs for displaying results on the screen.

The present invention has been made in view of the above problems.
Regarding (1), the developer no longer turns partial voice dialogue scenarios into parts and connects them while working out a flow suited to the application (such as search or online shopping) and the domain (a topic such as transportation, retail, or securities). Instead, the entire flow of an application is prepared in advance as a template, which makes flow design unnecessary. An example is an application that asks for input of two items and returns the result derived from those two pieces of information (displaying stores from a "location" and an "industry", or displaying transfer guidance for the route between a "departure station" and a "destination station"). The developer is then asked to prepare the response text (displayed on the screen in the agent's speech balloon or as a caption) and the response audio to suit the domain of the application being created.

Regarding (2), which is the problem that arises when preparing responses suited to the domain, response text templates are prepared in which the part indicating a slot name is left blank, one for each situation in the scenario flow prepared above (for example, one slot has been entered and the other has not yet been entered). Means are provided that, given the slot names, embed them in the blanks of the templates to generate the prompt sentences dynamically, embed the prompt sentences in the scenario template, and, if necessary, create audio files either with a speech synthesizer or by recording a person reading the text aloud with a recording device.

Regarding (3), slot/class replacement means is provided that reads the name of the slot into which a keyword is to be entered directly as its class.
Regarding (4), means is provided for automatically creating a list that enumerates all combinations of the keywords that can fill the slots of the voice dialogue application being created (including combinations in which some slots contain no keyword). For combinations in which a slot contains no keyword, means is provided for automatically creating a document file to present to the user, which displays on the screen the values already in the slots and a list of the keywords that can go into the slots that are still empty. The file name of each created document is associated with the corresponding keyword combination and added to the list.
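The document generation described for (4) could look roughly like the following sketch, which builds one page for a partially filled combination: it shows the values already entered and lists the candidate keywords for each empty slot. The function name, the HTML layout, the slot IDs, and the sample station names are assumptions for illustration; the source does not define the document format.

```python
from html import escape


def render_intermediate_page(slot_names, keywords, filled):
    """Build a simple HTML page for a partially filled slot combination.

    slot_names: slot ID -> display name
    keywords:   slot ID -> list of candidate keywords
    filled:     slot ID -> entered keyword, or None if still empty
    """
    rows = []
    for slot_id, name in slot_names.items():
        value = filled.get(slot_id)
        if value is not None:
            # Slot already entered: show its value.
            rows.append(f"<li>{escape(name)}: {escape(value)}</li>")
        else:
            # Slot still empty: list the keywords the user may say.
            options = ", ".join(escape(k) for k in keywords[slot_id])
            rows.append(f"<li>{escape(name)} (not entered): {options}</li>")
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"


# Hypothetical two-slot application (station names are sample data).
slot_names = {"slot1": "departure station", "slot2": "destination station"}
keywords = {"slot1": ["Tokyo", "Ueno"], "slot2": ["Osaka", "Kyoto"]}
page = render_intermediate_page(
    slot_names, keywords, {"slot1": "Tokyo", "slot2": None}
)
```

Running the function once per incomplete combination, and saving each page under a generated file name, would give the set of documents that the list refers to.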

In addition, the environment in which the dialogue scenario is executed, that is, the voice dialogue device, is provided with result display execution means and screen display means. While the dialogue scenario runs, each time the user provides input, these means receive the keyword information currently entered in the slots, refer to the list of correspondences between keywords and documents, retrieve the path and file name of the document associated with the entered keyword combination, and read and display that file.
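A minimal sketch of this run-time lookup is shown below, assuming the list is held as a mapping from the tuple of slot values to a document path, with None standing for a slot that has not been entered. The data layout, paths, and function names are hypothetical and are not taken from the source.

```python
from typing import Optional

# Hypothetical result display document list: each key is the tuple of
# keywords currently in (slot1, slot2); None marks a slot not yet entered.
DOCUMENT_LIST = {
    (None, None): "docs/intermediate_0.html",
    ("Tokyo", None): "docs/intermediate_1.html",
    (None, "Osaka"): "docs/intermediate_2.html",
    ("Tokyo", "Osaka"): "docs/result_tokyo_osaka.html",
}


def lookup_document(slot_values: tuple) -> Optional[str]:
    """Return the document path for the current slot state, if listed."""
    return DOCUMENT_LIST.get(slot_values)


def on_user_input(slot_values: tuple) -> str:
    """Called after each user input; picks the document to display."""
    path = lookup_document(slot_values)
    if path is None:
        raise KeyError(f"no document registered for {slot_values!r}")
    # A real device would hand the path to its screen display means here.
    return path


shown = on_user_input(("Tokyo", None))
```

Because every combination is listed ahead of time, the run-time side only needs a table lookup and no separate document-generating program.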

The present invention provides the following effects.
1. The developers of an application do not have to develop the voice dialogue scenario flow themselves, a task that requires specialized knowledge of voice dialogue scenarios. Even non-experts can therefore easily develop voice dialogue content for a voice dialogue application with an appropriate flow.
2. Likewise, the system response prompts, whose composition required specialized knowledge of voice dialogue scenarios, no longer have to be written from scratch for each application because they are generated automatically from templates. This makes it easy for non-experts to develop voice dialogue content that conducts an appropriate dialogue.
3. Intermediate result document files are generated automatically, which reduces the amount of development work for multimodal voice dialogue content.
4. Class information, which is needed to create the class language model, no longer has to be assigned to keywords, which reduces the amount of development work for voice dialogue content.
5. Simply by preparing a list of slot names and a list of word information, all the elements needed to run multimodal voice dialogue content can be created semi-automatically, either in full or as templates, so the amount of development work is far smaller than before.

The voice dialogue content creation device according to the present invention can be built entirely in hardware. It is realized more simply, however, by installing the voice dialogue content creation program proposed in the present invention on a computer and having the computer's CPU (central processing unit) interpret the program so that the computer functions as the voice dialogue content creation device. This embodiment is the best mode.
When a computer is made to function as the voice dialogue content creation device according to the present invention, the following are constructed on the computer: a scenario template storing a scenario skeleton that defines the dialogue order and image display order suited to the application to be developed; a slot name list storage unit storing a plurality of slot names, which represent the input items defined by that application, and the slot IDs attached to those slot names; a prompt template storing a plurality of dialogue sentence templates in which slot IDs are embedded in advance and which, once the slot names are substituted at the embedded positions, prompt the dialogue partner for input and guide appropriate input; prompt text generation means that substitutes the slot names according to the slot IDs embedded in the dialogue sentence templates stored in the prompt template and generates a prompt sentence list; scenario generation means that substitutes each dialogue sentence with its slot names embedded into the scenario template and generates a scenario file; and prompt voice generation means that generates prompt voice files from the dialogue sentences, with slot names embedded, in the prompt sentence list.

In the present invention, the computer functioning as this voice dialogue content creation device is further provided with: an intermediate result document installation location information storage unit; a word information list storage unit; result display document list generation means that generates, from the intermediate result document installation location information and the word information list stored in these units, a result display document list used to display screens showing the result of the dialogue; and intermediate result document file generation means that creates, from the slot name list stored in the slot name list storage unit and the word information list, intermediate result document files that show the situation partway through the dialogue.

FIG. 1 shows an embodiment of the voice dialogue content creation device according to the present invention. In the figure, reference numeral 100 denotes the voice dialogue content creation device. Input information 10 prepared in advance is fed to the voice dialogue content creation device 100, which generates voice dialogue content 20 from it.
Here the input information 10 consists of a slot name list 11, intermediate result document installation location information 12, a word information list 13, an additional example sentence corpus file 14, and so on. The generated voice dialogue content 20 consists of a scenario file 21, prompt voice files 22, a result display document list 23, intermediate result document files 24, a keyword list 25, a class language model 26, a recognition dictionary 27, and so on.

The voice dialogue content creation device 100 comprises a scenario template group 101 storing many scenario templates with various prepared storylines, a prompt template group 102 storing many dialogue sentence templates, scenario generation means 103, prompt text generation means 104, prompt voice generation means 105, result display document list generation means 106, intermediate result document file generation means 107, slot/class replacement means 108, class language model and recognition dictionary generation means 109, an example sentence corpus 110, and so on.
To develop a voice dialogue application with this embodiment, the developer first selects from the scenario template group 101 a scenario template (FIG. 2; explained later), which is a skeleton of a voice dialogue scenario, according to the type of application to be developed (hereinafter, the development application). This embodiment is described below using the example of creating an application for professional baseball statistics in which the user enters two items, "league" and "results", by voice, and a screen corresponding to the input is displayed. The league slot accepts the keywords "Pa League" and "Se League" (the Pacific and Central Leagues), and the results slot accepts the keywords "standings" and "batting average". In this case, the developer selects a scenario template that has two slots (positions for the items entered through the dialogue) and describes a flow in which these two slots are filled.

A name is decided for each slot handled by the selected scenario template, and the slot name list 11 (FIG. 1) is prepared. The names decided here are used as wording in the responses when the development application asks the user for input during the dialogue. FIG. 4 shows an example of the slot name list 11, in which 11A denotes the slot names and 11B the slot IDs. Each slot name 11A is written as a pair with a slot ID 11B, an ID symbol assigned to identify the slot.
Next, the slot name list 11 is input to the prompt text generation means 104 to create a prompt sentence list 104A (FIG. 1), which lists the prompt texts that serve as the system's response sentences in each scene of the dialogue. To generate the prompt sentence list, the prompt text generation means 104 uses this input together with the prompt template group 102 that corresponds to the scenario template group 101.

FIG. 5 shows an example of the prompt sentence list stored in the prompt template group 102. The prompt template group 102 lists, as text, templates of the response sentences with which the system responds in each scene of the dialogue. Each listed sentence is called a prompt template text.
In each template text, the part where a slot name should appear is left blank and is given a slot ID, written as <condition 1> or <condition 2>.
Each prompt template text is also associated with a prompt ID, which indicates the place in the scenario template group 101 where it is to be inserted.

FIG. 6 shows the processing procedure of the prompt text generation means 104. Step SP6₁ reads the slot name list, and step SP6₂ reads the prompt template. Step SP6₃ inserts the slot name into each blank in a prompt template text according to its slot ID to produce a prompt sentence, and repeats this for every text in the prompt template. Step SP6₄ outputs the result as a prompt list file. The processing up to this point generates the prompt sentence list 104A illustrated in FIG. 7.
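The four steps of FIG. 6 can be sketched as follows. This is only an illustration under stated assumptions: the slot IDs are written as condition1 and condition2 without spaces, the blank syntax is an angle-bracketed slot ID, and the prompt wording, prompt IDs, and tab-separated output format are all hypothetical, since the contents of FIGS. 4, 5, and 7 are not reproduced here.

```python
import re

# Slot name list (cf. FIG. 4): slot ID -> slot name. IDs are assumed.
slot_names = {"condition1": "league", "condition2": "results"}

# Prompt templates (cf. FIG. 5): prompt ID -> text with slot-ID blanks.
# The wording and the "<...>" blank syntax are illustrative assumptions.
prompt_templates = {
    "prompt001": "Please tell me the <condition1> and the <condition2>.",
    "prompt002": "Which <condition2> would you like to see?",
}

BLANK = re.compile(r"<(\w+)>")


def fill_prompt(template: str, names: dict) -> str:
    """Step SP6₃: replace each slot-ID blank with the matching slot name."""
    return BLANK.sub(lambda m: names[m.group(1)], template)


def generate_prompt_list(templates: dict, names: dict) -> dict:
    """Repeat the substitution for every prompt template text."""
    return {pid: fill_prompt(text, names) for pid, text in templates.items()}


def write_prompt_list(prompts: dict, path: str) -> None:
    """Step SP6₄: write the prompt list as 'ID<TAB>text' lines."""
    with open(path, "w", encoding="utf-8") as f:
        for pid, text in prompts.items():
            f.write(f"{pid}\t{text}\n")


prompt_list = generate_prompt_list(prompt_templates, slot_names)
```

Looking the slot name up by ID, rather than by position, means a template can mention the slots in any order or mention the same slot twice.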

Next, the prompt sentence list 104A is input to the scenario generation means 103, which outputs the scenario file 21. FIG. 8 shows the processing procedure of the scenario generation means 103. In step SP8₃ of the scenario generation process, the scenario template 101A is scanned, and whenever a blank associated with a prompt ID is found, the prompt sentence for that prompt ID is taken from the prompt sentence list 104A and inserted into the blank. This is repeated for every blank. From the scenario template 101A shown in FIG. 2 and the prompt sentence list 104A of FIG. 7, a scenario file 21 like the example in FIG. 3 is generated.
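The scan-and-insert step could be sketched like this. The scenario language, the play/balloon/listen calls, and the comment-style marker pairs are assumptions made for illustration; the markers are modeled only loosely on the prompt001 marker mentioned later in the description, whose exact syntax is not recoverable from this text.

```python
import re

# Hypothetical scenario template fragment: each blank is an empty
# marker pair named after a prompt ID.
scenario_template = (
    'play("RestPose")\n'
    'balloon("<!--prompt001--><!--/prompt001-->")\n'
    'listen()\n'
    'balloon("<!--prompt002--><!--/prompt002-->")\n'
)

prompt_list = {
    "prompt001": "Please tell me the league and the results.",
    "prompt002": "Which results would you like to see?",
}

# Matches an opening marker, anything between, and the closing marker
# that carries the same prompt ID.
MARKER = re.compile(r"<!--(\w+)-->.*?<!--/\1-->", re.DOTALL)


def generate_scenario(template: str, prompts: dict) -> str:
    """Scan the template and insert the prompt sentence for each prompt ID."""
    def insert(match):
        pid = match.group(1)
        return f"<!--{pid}-->{prompts[pid]}<!--/{pid}-->"
    return MARKER.sub(insert, template)


scenario_file = generate_scenario(scenario_template, prompt_list)
```

Keeping the markers in the output is a design choice of this sketch: it lets the same file be regenerated later with a different prompt sentence list.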

The scenario template 101A, part of which is shown in FIG. 2, is now described. The scenario template 101A is the file that serves as the skeleton of the scenario file 21 ultimately generated by this method.
The scenario template 101A is characterized in that almost all of the system-side control needed to carry a dialogue for a particular application through to completion is written in advance in the programming language used to describe scenario files. It differs from a complete scenario file in that the places that use a verbal expression as a response are left blank, with each blank associated with a prompt ID. These are the places that specify the text to be displayed or synthesized, for example when characters are displayed (as when a speech balloon is drawn on the screen while audio is played back in synchronization with the agent character's motion, and the text of the audio is shown inside the balloon) or when speech is synthesized.

Each such prompt ID corresponds to a prompt ID in the prompt template 102A used by the prompt text generation means 104 described above. In FIG. 2, the third line, L3, plays back the agent character's animated motion "RestPose" and displays the content of a speech balloon at the same time. The balloon content is blank there and is associated with the ID prompt001 (the location marked <!-Prompt001-><!-/Prompt001->).
Furthermore, the developer may choose, for reasons such as processing capacity at run time, to play response prompt audio in the development application from pre-recorded files rather than by real-time speech synthesis during the dialogue. In that case, the prompt voice generation procedure shown in FIG. 9 is used to create the prompt voice files.

FIG. 9-A shows the case in which steps SP9₁ and SP9₂ are executed and the audio files are generated automatically using speech synthesis. Because the processing time available for synthesis is not constrained as it is when speech is synthesized in real time during the dialogue, higher-quality synthesized speech can be produced. FIG. 9-B shows the case in which steps SP9₃ and SP9₄ are executed and a person's reading of the prompt sentence list 104A is recorded in advance, sentence by sentence. In this case, it is sufficient to prepare equipment that can display each prompt text of the prompt sentence list 104A on a screen, pick up the voice with a microphone or similar device, and record the waveform as a file. If a brief description of the dialogue situation is displayed along with the prompt text, the speaker can see what prosody to use when uttering the text. This description can likewise be written in the prompt template 102A in association with the prompt ID.
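For the recording case of FIG. 9-B, the text and situation description shown to the speaker could be assembled as in the sketch below. It covers only the preparation of the recording cues, not audio capture. The prompt wording, the situation notes, and the one-WAV-file-per-prompt-ID naming are assumptions for illustration.

```python
# Hypothetical prompt list and per-prompt situation notes.
prompt_list = {
    "prompt001": "Please tell me the league and the results.",
    "prompt002": "Which results would you like to see?",
}
situation_notes = {
    "prompt001": "Opening of the dialogue; no slot entered yet.",
    "prompt002": "League already entered; asking for the remaining slot.",
}


def build_recording_script(prompts: dict, notes: dict, audio_dir: str = "audio") -> list:
    """Return one recording cue per prompt, in prompt-ID order."""
    cues = []
    for pid in sorted(prompts):
        cues.append({
            "prompt_id": pid,
            "text": prompts[pid],                # shown to the speaker
            "note": notes.get(pid, ""),          # hint about the dialogue situation
            "output": f"{audio_dir}/{pid}.wav",  # assumed naming convention
        })
    return cues


cues = build_recording_script(prompt_list, situation_notes)
for cue in cues:
    print(f'[{cue["prompt_id"]}] {cue["text"]}')
    print(f'    ({cue["note"]}) -> {cue["output"]}')
```

Naming each audio file after its prompt ID keeps the link between a recording and the blank it fills in the scenario, whichever of the two FIG. 9 procedures produced the audio.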

Next, a word information list 13 is prepared (FIG. 10). The word information list 13 is a list of the words (keywords) that serve as input values for the slots in the application under development. Each entry groups a word's written form, its reading for use in speech recognition, and the ID of the slot it is input to.
The word information list 13 is first input to the result display document list generation means 106 and processed by the result display document list generation procedure shown as steps SP11₁ to SP11₆ in FIG. 11. This generates a template of the result display document list, which becomes the result display document list template 106A (FIG. 12). The result display document list template 106A is a list indicating which document the application displays on the screen for each combination of keywords held in the slots. The result display document list generation means 106 automatically creates a list of combinations of these keywords and the intermediate result document installation position information and writes it to a file. By using a Web browser or the like, as specified by the intermediate result document installation position information 12, for screen display, documents on a storage device connected to the personal computer running the application, or on a server connected to the network, can be loaded and displayed.
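As a rough illustration of the word information list described above, the following Python sketch models each entry as a (written form, reading, slot ID) triple and groups the keywords by slot. The class name `WordEntry`, the function `keywords_by_slot`, the slot IDs, and the sample words are all hypothetical; the patent does not specify a file format or any of these names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    surface: str  # written form of the keyword
    reading: str  # reading used for speech recognition
    slot_id: str  # ID of the slot that accepts this keyword


# Hypothetical sample data; the real list is supplied by the content creator.
WORD_INFO_LIST = [
    WordEntry("Tokyo", "toukyou", "slot1"),
    WordEntry("Osaka", "oosaka", "slot1"),
    WordEntry("hotel", "hoteru", "slot2"),
]


def keywords_by_slot(entries):
    """Group written forms by slot ID, preserving list order."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry.slot_id, []).append(entry.surface)
    return grouped


print(keywords_by_slot(WORD_INFO_LIST))
```

Preserving the order of appearance matters in this sketch because the description later numbers each slot's keywords in the order in which they appear in the word information list.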

When keywords have been input to all slots, the screen displays documents that are scattered across servers on the network. For this reason, the result display document list generation means 106 writes into the result display document list template 106A the address of the server holding each document, the location of the file in the storage device (path information), and the file name, in a form such as a URL. To do this, the content creator prepares in advance a table that maps each keyword combination in which all slots are filled to the URL of the document to display, and supplies it as input to the result display document list generation means 106. Based on that table, the result display document list generation means 106 can write the URLs immediately after step SP11₄.

On the other hand, for the state partway through a dialogue in which some slots do not yet hold values, the content displayed on the screen to show which slots are still empty and which keyword has been input to which slot is produced as intermediate result document files by the intermediate result document file generation means 107. The storage location of each file in the application's storage device and its file name are then written (steps SP11₄ to SP11₅ in FIG. 11). The storage location is prepared as intermediate result document installation position information 12, which gives the network address and path information of the server that stores the intermediate result document files. The result display document list generation means 106 may generate, and append alongside each keyword combination, complete path information that combines the intermediate result document installation position information 12 with the regular file name of each intermediate result document file generated by the intermediate result document file generation means 107 described in the next paragraph. Furthermore, if the URL or the like of the document to display can be generated mechanically for final results as well as intermediate results, it may be generated and assigned automatically.
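The choice between a creator-supplied URL and a generated path can be sketched as follows. This is only an illustration under assumptions: the function name `document_location`, the base address, the URL table, and the sample keywords are hypothetical, and the intermediate file name is passed in as an argument rather than generated here.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for the intermediate result document installation
# position information 12. The trailing slash matters for urljoin.
BASE_LOCATION = "http://localhost/interim/"

# Hypothetical table prepared by the content creator for fully filled slots.
FINAL_URLS = {("Tokyo", "hotel"): "http://example.com/tokyo-hotels.html"}


def document_location(combo, interim_file_name, base_location, final_urls):
    """Return where the document for one keyword combination is found.

    combo holds one keyword per slot, with None for an empty slot.
    """
    if all(keyword is not None for keyword in combo):
        # Every slot is filled: use the URL supplied by the content creator.
        return final_urls[combo]
    # At least one slot is empty: combine the installation position
    # information with the intermediate result document file name.
    return urljoin(base_location, interim_file_name)


print(document_location(("Tokyo", None), "C-1-.html", BASE_LOCATION, FINAL_URLS))
print(document_location(("Tokyo", "hotel"), None, BASE_LOCATION, FINAL_URLS))
```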

FIG. 13 shows the processing procedure of the intermediate result document file generation means 107. In step SP13₁, the means reads the correspondence between slot IDs and slot names from the slot name list 11. In step SP13₃, it generates, from the combinations of keywords that can fill the slots, those in which one or more slots have no keyword. In step SP13₄, it generates for each combination a document file (intermediate result document file) that shows, for the current state of each slot, the keyword already input or the list of candidate keywords, as illustrated in FIG. 14. In step SP13₅, each generated file is saved under a file name assigned according to a fixed rule. One such rule is to number the keywords of each slot in the order in which they appear in the word information list and to express the name as the set of keyword numbers input to the slots. For example, line L1 of the result display document list template example in FIG. 12 has the file name “C-1-.html”. This name identifies the intermediate result document file displayed when the condition 1 slot holds keyword number 1 and no keyword has been input to the condition 2 slot.

In other words, a number following a hyphen means that the keyword with that number has been input, and the absence of a number means that the corresponding slot has no value. “C--2.html” on line L8 of FIG. 12 indicates the screen content for the state in which the second keyword has been input to the condition 2 slot and the condition 1 slot is still empty. In the embodiment of FIG. 12 the intermediate result document files are written in HTML, but any file format that the screen display means can display (for example, an image file format) may be used. In the HTML case, a CSS (Cascading Style Sheets) file may also be linked, so that the background image and the typeface, size, and color of the text used when an intermediate result document is displayed can be changed easily by preparing them in advance as a CSS file.
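The file naming rule can be illustrated with a short Python sketch. It assumes that the leading “C” is a fixed prefix and that every slot contributes one hyphen followed by its keyword number (or nothing when the slot is empty), which reproduces the two names quoted from FIG. 12. The function names and the keyword counts in the example are hypothetical.

```python
from itertools import product


def intermediate_file_name(numbers, prefix="C", ext=".html"):
    """Build a file name from per-slot keyword numbers.

    numbers holds one entry per slot: a 1-based keyword number,
    or None when the slot has no value yet.
    """
    fields = "".join("-" + ("" if n is None else str(n)) for n in numbers)
    return prefix + fields + ext


def partial_combinations(keyword_counts):
    """Yield every slot state in which at least one slot is empty."""
    choices = [[None] + list(range(1, count + 1)) for count in keyword_counts]
    for combo in product(*choices):
        if any(n is None for n in combo):
            yield combo


# Two slots with two keywords each (hypothetical counts).
names = [intermediate_file_name(c) for c in partial_combinations([2, 2])]
print(names)
```

With two keywords in each of two slots, nine slot states exist in total, and five of them leave at least one slot empty, so five intermediate result document files would be named under this assumed rule.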

Next, the word information list 13 is input to the slot/class replacement means 108 to create the keyword list 25, and the class language model and recognition dictionary generation means 109 then generates the class language model 26 and the recognition dictionary 27.
In class language model generation, the example sentence corpus is morphologically analyzed and split into word sequences, and any word in a sequence that should be treated as a class member is replaced with its class. This requires a table indicating which word belongs to which class. In this embodiment, the ID of the slot that a keyword is input to is used as that keyword's word class. FIG. 15 shows an example of the keyword list 25 generated by the slot/class replacement means, and FIG. 16 shows an example of the slot/class replacement procedure as steps SP16₁ to SP16₃.
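The slot/class replacement can be sketched in Python as below. The sketch assumes that morphological analysis has already produced a token list, that the keyword list is a plain word-to-class mapping, and that the class token is simply the slot ID string. The function names and sample data are hypothetical and are not taken from FIG. 15 or FIG. 16.

```python
def build_keyword_list(word_info):
    """Map each keyword to its class, using the input slot ID as the class.

    word_info is an iterable of (written form, reading, slot ID) tuples.
    """
    return {surface: slot_id for surface, _reading, slot_id in word_info}


def replace_with_classes(tokens, keyword_list):
    """Replace every token found in the keyword list with its class name."""
    return [keyword_list.get(token, token) for token in tokens]


# Hypothetical word information and an already tokenized example sentence.
word_info = [
    ("Tokyo", "toukyou", "slot1"),
    ("hotel", "hoteru", "slot2"),
]
keyword_list = build_keyword_list(word_info)
sentence = ["I", "want", "a", "hotel", "in", "Tokyo"]
print(replace_with_classes(sentence, keyword_list))
```

Because the class names are the slot IDs themselves, the same table can later be read in the opposite direction to decide which slot a recognized keyword belongs to.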

The example sentence corpus is a collection of example utterances made up of general expressions that do not depend on the topic (domain) targeted by each application. To improve the speech recognition rate in the application, however, it is desirable to have example utterances that use domain-dependent expressions available when the language model is created. In this embodiment, an additional example sentence corpus 14, a collection of domain-dependent example utterances gathered by the developer, is taken in at the class language model creation stage. Each example sentence is morphologically analyzed, and the keyword portions are then replaced with class names using the keyword list 25.
The class language model and recognition dictionary generation means 109 generates the class language model 26 and the recognition dictionary 27 from the example sentence corpus prepared in advance and the additional example sentence corpus 14. This means is described in detail in Patent Documents 3 and 4, cited in the description of the prior art. The example sentence corpus used by this means is prepared to match the scenario template name selected by the developer. Each example sentence in it has been morphologically analyzed in advance, and its class portions have been replaced with classes that have the same names as the slot IDs.

FIG. 17 shows an embodiment of the dialogue scenario execution device, denoted 200 in the figure. The dialogue scenario execution device 200 comprises a microphone 201, a speaker 202, speech recognition means 203, utterance understanding means 204, dialogue scenario execution means 205, voice playback means 206, result display execution means 208, and screen display means 209.
The dialogue scenario execution means 205 interprets and executes the content of the scenario file 21 generated by the voice dialogue content creation device 100. That is, it issues instructions to each component according to the content of the scenario file 21. When speech is to be played, it instructs the voice playback means 206 to play the prompt voice 22 and instructs that the response text be displayed in a speech balloon or the like together with the motion of the agent animation. It also instructs the speech recognition means 203 to perform speech recognition using the class language model 26 and the recognition dictionary 27.

The utterance understanding means 204 is instructed to receive the recognized word sequence from the speech recognition means 203, extract the keywords from it, determine which slot each keyword should be input to by reading the class names in the keyword list 25 as slot IDs, and output each keyword as the input value for the corresponding slot.
The slot input state is then sent to the result display execution means 208, which retrieves from the result display document list 23 the information on the file to display as the screen content for that input state and instructs the screen display means 209 to display it.
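The flow from recognized words to the displayed document can be sketched as follows. This is a minimal illustration under assumptions: the keyword list is a word-to-slot-ID mapping, the result display document list is a dictionary keyed by the per-slot keywords (with `None` for an empty slot), and all names, keywords, and the example URL are hypothetical.

```python
def understand(recognized_words, keyword_list, slots):
    """Return a new slot state with recognized keywords filled in.

    The class name stored for each keyword is read as the target slot ID.
    """
    updated = dict(slots)
    for word in recognized_words:
        slot_id = keyword_list.get(word)
        if slot_id is not None:
            updated[slot_id] = word
    return updated


def document_for(slots, slot_order, document_list):
    """Look up the document to display for the current slot input state."""
    key = tuple(slots.get(slot_id) for slot_id in slot_order)
    return document_list[key]


KEYWORD_LIST = {"Tokyo": "slot1", "Osaka": "slot1", "hotel": "slot2"}
DOCUMENT_LIST = {
    (None, None): "C--.html",
    ("Tokyo", None): "C-1-.html",
    ("Tokyo", "hotel"): "http://example.com/tokyo-hotels.html",
}
SLOT_ORDER = ["slot1", "slot2"]

empty = {"slot1": None, "slot2": None}
first = understand(["to", "Tokyo", "please"], KEYWORD_LIST, empty)
second = understand(["a", "hotel"], KEYWORD_LIST, first)
print(document_for(first, SLOT_ORDER, DOCUMENT_LIST))
print(document_for(second, SLOT_ORDER, DOCUMENT_LIST))
```

In this sketch a partially filled state resolves to an intermediate result document file name, and the fully filled state resolves to an external URL, mirroring the two display cases in the description.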

For an intermediate result document file, the screen display means 209 reads and displays the file stored in the storage area indicated by the intermediate result document installation position. When all slots hold values, it reads the result document from an external server or the like via the network 300 or similar and displays the resulting information.
The voice dialogue content creation device 100 and the dialogue scenario execution device 200 described above are realized by installing on a computer a voice dialogue content creation program and a voice dialogue program written in a programming language, and having the CPU (central processing unit) of the computer interpret and execute them.

The voice dialogue content creation program and the voice dialogue program are each recorded on a computer-readable recording medium and are installed on a computer from the recording medium or over a communication line.

The voice dialogue content creation method, device, and program according to the present invention are used wherever voice dialogue content is created.

FIG. 1 is a block diagram for explaining an embodiment of the voice dialogue content creation device of the present invention.
FIG. 2 is a diagram illustrating part of a scenario template used in the present invention.
FIG. 3 is a diagram showing an example of a scenario file generated by the voice dialogue content creation device of the present invention.
FIG. 4 is a diagram showing an example of the slot name list used in the present invention.
FIG. 5 is a diagram showing an example of the prompt template group used in the present invention.
FIG. 6 is a flowchart for explaining an example of the prompt text generation procedure used in the present invention.
FIG. 7 is a diagram for explaining the prompt sentence list generated by the prompt text generation procedure shown in FIG. 6.
FIG. 8 is a flowchart for explaining the scenario generation procedure used in the present invention.
FIG. 9 is a flowchart for explaining the prompt voice generation procedure used in the present invention.
FIG. 10 is a diagram for explaining the word information list used in the present invention.
FIG. 11 is a flowchart for explaining the result display document list generation procedure used in the present invention.
FIG. 12 is a diagram for explaining the result display document list template used in the present invention.
FIG. 13 is a flowchart for explaining the intermediate result document file generation procedure used in the present invention.
FIG. 14 is a diagram for explaining an intermediate result document file used in the present invention and its display result.
FIG. 15 is a diagram for explaining an example of the keyword list used in the present invention.
FIG. 16 is a flowchart for explaining the slot/class replacement procedure used in the present invention.
FIG. 17 is a block diagram for explaining an example of a dialogue scenario execution device that carries out a voice dialogue using the content generated by the voice dialogue content creation device of the present invention.

Explanation of symbols

10: Input information
11: Slot name list
12: Intermediate result document installation position information
13: Word information list
14: Additional example sentence corpus file
20: Voice dialogue content
21: Scenario file
22: Prompt voice file
23: Result display document list
24: Intermediate result document file
25: Keyword list
26: Class language model
27: Recognition dictionary
100: Voice dialogue content creation device
101: Scenario template group
102: Prompt template group
103: Scenario generation means
104: Prompt text generation means
104A: Prompt sentence list
105: Prompt voice generation means
106: Result display document list generation means
107: Intermediate result document file generation means
108: Slot/class replacement means
109: Class language model and recognition dictionary generation means
110: Example sentence corpus

Claims

1. A voice dialogue content creation method comprising: preparing a scenario template for creating a scenario that defines the order of interaction and the display order of images suited to the application to be developed;
a plurality of slot names each representing an input item name determined by the application to be developed, and slot IDs attached to these slot names; and
a prompt template for creating a plurality of dialogue sentences, in each of which a slot ID is embedded in advance and each of which, once the slot name is substituted at the position of the embedded slot ID, prompts the dialogue partner for input and guides appropriate input;
and substituting the slot names according to the slot IDs embedded in the prompt template, substituting each dialogue sentence in which a slot name is embedded into the scenario template, and generating a scenario file and a prompt voice file.

2. The voice dialogue content creation method according to claim 1, wherein, in addition to creating the scenario file and the prompt voice file, a result display document list and an intermediate result document file are created using a list of words used in the application to be developed, storage location information of example data representing the result of the dialogue, and storage location information of example sentence data representing the situation partway through the dialogue.

3. A voice dialogue content creation device comprising:
a scenario template storage unit that stores a scenario template defining the order of interaction and the display order of images suited to the application to be developed;
a slot name list storage unit that stores a plurality of slot names representing input items determined by the application to be developed, and slot IDs attached to these slot names;
a prompt template storage unit that stores a plurality of dialogue sentence templates in which slot IDs are embedded in advance and which, once the slot names are substituted at the positions of the embedded slot IDs, prompt the dialogue partner for input and guide appropriate input;
prompt text creation means for generating a prompt sentence list by substituting the slot names according to the slot IDs embedded in the dialogue sentence templates stored in the prompt template storage unit;
scenario generation means for generating a scenario file by substituting each dialogue sentence in which a slot name is embedded into the scenario template; and
prompt voice generation means for generating a prompt voice file from each dialogue sentence in the prompt sentence list in which a slot name is embedded.

4. The voice dialogue content creation device according to claim 3, further comprising: an intermediate result document installation position information storage unit; a word information list storage unit; and result display document list generating means for generating a result display document list for displaying a screen representing the result of the dialogue, from the intermediate result document installation position information stored in the intermediate result document installation position information storage unit and the word information list stored in the word information list storage unit;
and intermediate result document file generating means for creating an intermediate result document file representing the situation partway through the dialogue, from the slot name list stored in the slot name list storage unit and the word information list.

5. A voice dialogue content creation program that is written in a computer-readable programming language and causes a computer to function as the voice dialogue content creation device according to claim 3.

6. A computer-readable recording medium on which the voice dialogue content creation program according to claim 5 is recorded.