JP2004233852A

JP2004233852A - System and method for supporting preparation of speech response application

Info

Publication number: JP2004233852A
Application number: JP2003024488A
Authority: JP
Inventors: Tomonori Iketani; 智則池谷
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-01-31
Filing date: 2003-01-31
Publication date: 2004-08-19

Abstract

PROBLEM TO BE SOLVED: To provide a system and a method for supporting the preparation of a speech response application that can construct a speech response application by generating an interaction script even when no recognition grammar is available.

SOLUTION: The system checks whether a secondary script including a recognition grammar is available. When the secondary script has not been generated, a temporary secondary script for prompting the user to speak is generated from a primary script, then interpreted and executed; the user's speech is input as a speech signal and temporarily stored, at least a language code and a speech recognition engine are selected, the speech signal is analyzed to extract a keyword, a recognition grammar is generated from the keyword, and a secondary script incorporating the recognition grammar is generated. When the secondary script has already been generated, the secondary script is interpreted and executed.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、エンドユーザと対話を進めてタスクを実行する対話システムに関する。特に、対話と実行タスクを記述したスクリプト言語とそのスクリプト言語によって記述された対話スクリプトを解釈・実行するインタプリタ、およびインタプリタとユーザを仲介する実装プラットホーム、および対話スクリプトに付随して必要なキーワード抽出用文法生成装置に関する。
【０００２】
【従来の技術】
昨今のコンピュータ技術の急速な伸展に伴って、音声による対話を活用したアプリケーションが多々開発されるようになってきている。これらの音声対話システムにおいては、ユーザによる発話とシステムによる合成音声等による発話を交互に繰り返しながら、ユーザ発話の内容に応じて階層的な分岐を行いながらユーザから必要となる情報を収集し、十分な情報が得られた時点において何らかのタスクを実行する。
【０００３】
これらの応答シナリオを記述するスクリプト言語としては、ＶｏｉｃｅＸＭＬフォーラム（ＨＹＰＥＲＬＩＮＫｈｔｔｐ：／／ｗｗｗ．ｖｏｉｃｅｘｍｌ．ｏｒｇ／ｈｔｔｐ：／／ｗｗｗ．ｖｏｉｃｅｘｍｌ．ｏｒｇ／）によって策定されたＶｏｉｃｅＸＭＬ１．０が主流であり、商用ベースで既にリリースされている。現在では、後継バージョンとしてＶｏｉｃｅＸＭＬ２．０がワールドワイドウェブコンソーシアム（Ｗ３Ｃ；ｈｔｔｐ：／／ｗｗｗ．ｗ３．ｏｒｇ／）において策定段階に入っている。
【０００４】
これ以外にも、音声対話サービスを提供するシステムインテグレータが独自に策定したスクリプト言語も存在する。例えば、富士通株式会社からは、「ＶｏｉｃｅＳｃｒｉｐｔ（Ｒ）」という対話記述用スクリプト言語がリリースされている。
【０００５】
これらの音声対話システムを正常に稼動させるには、入力されるユーザによる発話の内容をより精度良く認識することが最も重要な課題となる。認識精度を高めるためには、（特許文献１）のように認識辞書の語彙を自動的に増強する方法や、（特許文献２）のようにキーワードを抽出してから当該キーワードに対応した認識辞書を生成する方法等が考えられている。
【０００６】
さらに、スクリプト言語を利用する場合には、ユーザによる発話の内容をより正確に認識するために、人間の発する言葉から必要なコマンドを抽出するために必要な認識文法（グラマー）を必要とする。この場合、ユーザの発話内容に合致する認識文法が想定されている場合には、当該認識文法に従ってユーザの発話内容を解析して、含まれているキーワードの抽出を行う。したがって、音声対話システムを運用するためには、対話を記述する上述したようなスクリプト言語で記述された対話スクリプトだけでは足りず、ユーザの発話内容を一次認識するための認識文法についても記述しておく必要がある。
【０００７】
認識文法の書式についてはＪａｖａＳｐｅｅｃｈＧｒａｍｍａｒＦｏｒｍａｔ（ＪＳＧＦ）、ワールドワイドウェブコンソーシアムによって策定段階となっているＳｐｅｅｃｈＲｅｃｏｇｎｉｔｉｏｎＧｒａｍｍａｒＳｐｅｃｉｆｉｃａｔｉｏｎＶｅｒｓｉｏｎ１．０（ＳＲＧＳ）において検討されている２つの書式ＡｕｇｍｅｎｔｅｄＢＮＦｓｙｎｔａｘ（ＡＢＮＦ）とＧｒＸＭＬ等が代表的である。また、その他にも、以前から音声認識アプリケーションを市場に提供してきた音声認識プロバイダ各社が独自に策定した書式も存在している。
【０００８】
ＶｏｉｃｅＸＭＬやＳＲＧＳといったスクリプト言語は、ウェブページ記述用スクリプトであるＨＴＭＬ言語をベースとして拡張されたＸＭＬ言語から派生した言語であるが、主にデータを修飾する用途であったＨＴＭＬ言語に比べると、プログラミング言語としての色彩が非常に強くなっている。各言語仕様によって定義されたタグを利用してスクリプトを記述するにはテキストエディタを利用する方法もあるが、Ｃ言語、あるいはＣ^＋＋やＪａｖａ（Ｒ）といったプログラミング言語にエディット環境やデバッガを統合した統合開発環境があるように、簡易にスクリプトを記述するための専用開発環境を用意する対話システムベンダーもある。
【０００９】
【特許文献１】
特開２００２−１４６９３号公報
【００１０】
【特許文献２】
特開平１１−２０２８９０号公報
【００１１】
【発明が解決しようとする課題】
しかし、上述したような方法では、書式の相違する対話スクリプトと認識文法を準備しておく必要があるが、それぞれ単独では何ら対話アプリケーションを構成できるものではない。すなわち、対話スクリプトおのおのに対応する認識文法を準備する必要があり、対話アプリケーションの作成者は、認識文法の生成を同時に行う必要があり、作成負荷が過大となっているという問題点があった。
【００１２】
また、一つ一つの対話スクリプトに対応するすべての認識文法を事前に準備することは、その認識文法の多様性によって現実的には困難であり、また記憶容量の物理的な制約によって、すべての認識文法を事前に登録しておくことも困難である。
【００１３】
さらに、認識文法が存在しない場合には、ユーザの発話内容を認識することができず、システムによる応答が見当違いの応答になってしまうことから、音声応答アプリケーションとして成立しないという問題点もあった。
【００１４】
本発明は、上記問題点を解決するために、認識文法がない場合であっても、対話スクリプトを生成することによって音声応答アプリケーションを構築することができる音声応答アプリケーション作成支援システム及び方法を提供することを目的とする。
【００１５】
【課題を解決するための手段】
上記目的を達成するために本発明にかかる音声応答アプリケーション作成支援システムは、ユーザ発話と自動的に対応する自動応答アプリケーションの作成を支援する自動応答アプリケーション作成支援システムであって、認識文法を含んだ二次スクリプトの存在の有無を確認し、有無に応じて一次スクリプトもしくは二次スクリプトを選択して取得し、一次スクリプトもしくは二次スクリプトを解釈して実行するインタプリタであるスクリプト解釈部と、ユーザによる発話を音声信号として入力する音声入力部と、入力された音声信号を一時記憶する一時記憶部と、入力された音声信号に基づいて、少なくとも言語コード及び音声認識エンジンを選択するプラットホーム制御部と、一次記憶されている音声信号を解析してキーワードを抽出するユーザ入力解析部と、一次スクリプト及び音声信号から抽出されたキーワードに基づいて、認識文法を生成し、認識文法を組み込んだ二次スクリプトを生成する二次スクリプト生成部とを含み、二次スクリプトが生成されていない場合には、一次スクリプトに基づいて、ユーザによる発話を促すための仮の二次スクリプトを生成し、スクリプト解釈部において解釈して実行し、二次スクリプトが生成されている場合には、二次スクリプトを前記スクリプト解釈部において解釈して実行することを特徴とする。
【００１６】
かかる構成により、認識文法がない場合であっても、ユーザによる発話の内容に応じた適切な認識文法を生成することができ、音声応答アプリケーションとして確実に対話を構成することが可能となる。
【００１７】
また、本発明にかかる音声応答アプリケーション作成支援システムは、認識文法を保存する認識文法記憶部をさらに含むことが好ましい。ユーザの発話内容に応じて認識文法を随時更新・蓄積することができるからである。
【００１８】
また、本発明にかかる音声応答アプリケーション作成支援システムは、二次スクリプトが生成されているか否かを判定し、生成されていない場合には仮の二次スクリプトを、生成されている場合には二次スクリプトを、それぞれスクリプト解釈部へ渡すリソースフェッチャーをさらに含むことが好ましい。
【００１９】
また、本発明は、上記のような音声応答アプリケーション作成支援システムの機能をコンピュータの処理ステップとして実行するソフトウェアを特徴とするものであり、具体的には、ユーザ発話と自動的に対応する自動応答アプリケーションの作成を支援する自動応答アプリケーション作成支援方法であって、認識文法を含んだ二次スクリプトの存在の有無を確認し、有無に応じて一次スクリプトもしくは二次スクリプトを選択して取得し、一次スクリプトもしくは二次スクリプトを解釈して実行する工程と、ユーザによる発話を音声信号として入力する工程と、入力された音声信号を一時記憶する工程と、入力された音声信号に基づいて、少なくとも言語コード及び音声認識エンジンを選択する工程と、一次記憶されている音声信号を解析してキーワードを抽出する工程と、一次スクリプト及び音声信号から抽出されたキーワードに基づいて、認識文法を生成し、認識文法を組み込んだ二次スクリプトを生成する工程とを含み、二次スクリプトが生成されていない場合には、一次スクリプトに基づいて、ユーザによる発話を促すための仮の二次スクリプトを生成して解釈して実行し、二次スクリプトが生成されている場合には、二次スクリプトを解釈して実行する自動応答アプリケーション作成支援方法並びにそのような工程を具現化するコンピュータ実行可能なプログラムであることを特徴とする。
【００２０】
かかる構成により、コンピュータ上へ当該プログラムをロードさせ実行することで、認識文法がない場合であっても、ユーザによる発話の内容に応じた適切な認識文法を生成することができ、音声応答アプリケーションとして確実に対話を構成することが可能となる自動応答アプリケーション作成支援システムを実現することが可能となる。
【００２１】
【発明の実施の形態】
以下、本発明の実施の形態にかかる音声応答アプリケーション作成支援システムについて、図面を参照しながら説明する。図１は本発明の実施の形態にかかる音声応答アプリケーション作成支援システムの構成図である。
【００２２】
図１において、まず音声出力部１２から合成音声等により出力されているシステムによる音声出力により促されたユーザによる発話を、音声入力部１１において音声信号として入力する。そして、プラットホーム制御部１３において、言語属性の切り替えや、認識エンジンの切り替えを行う。すなわち、音声入力部１１から入力された音声信号に応じて、言語コードを切り替えたり、適切な認識エンジンを選択する作業を行う。
【００２３】
プラットホーム制御部１３では、音声入力部１１から入力された音声信号そのものを、一時記憶部１４に記憶する。記憶された音声信号を用いて、認識文法を生成するためのキーワードを抽出するためである。
【００２４】
また、スクリプト解釈部１５では、音声対話アプリケーションにおいて用意されているスクリプトを解釈して実行する。もちろん、解釈されたスクリプトに対して、プラットホーム制御部１３において、解釈されたスクリプトの内容に応じて言語コードを切り替えたり適切な合成音声を選択することによって生成された合成音声が、音声対話アプリケーションの出力として音声出力部１２から出力される。
【００２５】
次に、ユーザ入力解析部１６では、一時記憶部１４に記憶されているユーザにより発声された音声信号を形態素解析等して、必要なキーワードを分析することになる。
【００２６】
そして、二次スクリプト生成部１７では、記述されたスクリプト及び音声信号から得られたキーワードに基づいて、認識文法を生成し、認識文法を組み込むように当初から準備されている一次スクリプトを更新することで、二次スクリプトを生成する。また、認識文法が存在しない場合には、仮の二次スクリプトを生成する。生成された二次スクリプトあるいは仮の二次スクリプトがスクリプト解釈部１５によって解釈され、実行されることによって、ユーザは音声対話を行うことができる。ここで生成された認識文法は、認識文法記憶部１９に記憶される。認識文法記憶部１９へ生成された認識文法を記憶しておくことで、次に音声入力がなされた時点においては、当該認識文法を参照することが可能となる。
【００２７】
また、認識文法の有無の確認は、二次スクリプト生成部１７において二次スクリプトの生成時に判断するものであっても良いし、認識文法記憶部１９に対応する認識文法が記憶されているか否かを確認するものであっても良い。
【００２８】
なお、リソースフェッチャー２０は、スクリプト解釈部１５で解釈して実行するスクリプトを、アプリケーション作成時に記述されている一次スクリプトに基づいて生成される仮の二次スクリプトと、新たに生成された二次スクリプトとの間で切り替える。
【００２９】
また、キーワードについては、ユーザ入力解析部１６において解析されることによって抽出されたキーワードに限定されるものではなく、あらかじめドキュメントデータベース１８に記憶させておいたキーワードを使用するものであっても良い。
【００３０】
次に、かかる構成を有する本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける処理の流れについて説明する。図２に本発明の実施の形態にかかる音声応答アプリケーション作成支援システムを実現するプログラムの処理の流れ図を示す。
【００３１】
図２において、まずプラットホーム制御部１３からスクリプト解釈部１５に対して電話の着信等の対話開始依頼がなされると（ステップＳ２０１）、スクリプト解釈部１５は認識文法を含んだ二次スクリプトの有無を確認する（ステップＳ２０２）。
【００３２】
なお、認識文法の有無の確認方法は特に限定されるものではなく、二次スクリプト生成部１７において二次スクリプトの生成時に判断するものであっても良いし、認識文法記憶部１９に対応する認識文法が記憶されているか否かを確認するものであっても良い。
【００３３】
認識文法を含んだ二次スクリプトが既に生成されている場合には（ステップＳ２０２：Ｙｅｓ）、当該二次スクリプトの取り出しをリソースフェッチャー２０に対して依頼し（ステップＳ２０３）、スクリプト解釈部１５において当該二次スクリプトが解釈されて実行され（ステップＳ２０４）、合成音声等を用いたシステムによる音声出力によりユーザに対する発話依頼を行う（ステップＳ２０５）。
【００３４】
認識文法を含んだ二次スクリプトが生成されていない場合には（ステップＳ２０２：Ｎｏ）、リソースフェッチャー２０に対して一次スクリプトの取り出しを依頼する（ステップＳ２０６）。リソースフェッチャー２０がその旨を二次スクリプト生成部１７へ伝えると、二次スクリプト生成部１７は一次スクリプトをドキュメントデータベース１８から取り出すとともに、形式的な仮の二次スクリプトを生成する（ステップＳ２０７）。仮の二次スクリプトには、この時点では認識文法が含まれていない。
【００３５】
生成された仮の二次スクリプトがスクリプト解釈部１５へ渡されたら、当該スクリプトが解釈されて実行され（ステップＳ２０８）、合成音声等を用いたシステムによる音声出力によりユーザに対する発話依頼を行う（ステップＳ２０９）。
【００３６】
次に、発話依頼により促されたユーザによる発話を音声信号として入力する（ステップＳ２１０）。そして、入力された音声信号に基づいて、言語コードを切り替えたり、適切な認識エンジンを選択するとともに（ステップＳ２１１）、入力された音声信号そのものを一時記憶する（ステップＳ２１２）。
【００３７】
そして、一時記憶されているユーザにより発声された音声信号を形態素解析等して、必要なキーワードを分析する（ステップＳ２１３）。さらに、記述された一次スクリプト及び音声信号から抽出されたキーワードに基づいて、認識文法を生成し、一次スクリプトを更新することによって、認識文法が組み込まれた二次スクリプトを生成する（ステップＳ２１４）。
【００３８】
このような構成とすることによって、認識文法が存在しない場合であっても、音声信号に基づいて必要な認識文法を生成することができることから、音声応答アプリケーションを正常に実行させることが可能となる。
【００３９】
次に、具体的にどのようなスクリプト処理が行われるのかについて、具体例を示しながら説明する。
【００４０】
まず、ユーザからの電話を着呼するか、あるいは音声応答アプリケーションの始動動作を行うことによって、プラットホーム制御部１３からインタプリタであるスクリプト解釈部１５に対して応答依頼がなされる。
【００４１】
そして、スクリプト解釈部１５は、音声応答アプリケーションの初期動作スクリプトとして一次スクリプトを指定し、リソースフェッチャー２０に対して一次スクリプトの取得を指示する。一次スクリプトは、例えば図３に示すようなＶｏｉｃｅＸＭＬで記述されたスクリプトである場合を想定する。
【００４２】
ところが、図３に示す一次スクリプトの記述からも明らかなように、当該一次スクリプトの記述内容では、どんなユーザの発話を受け付けるのか、認識するための情報としての認識文法が記述されていないため、意図した対話が成立せず、スクリプト自体の正しさも確認することができない。したがって、音声応答アプリケーションの作成者は、記述された一次スクリプトの動作を確かめるために、認識文法を別途用意する必要が生じることになる。
【００４３】
そこで、リソースフェッチャー２０は、一次スクリプトを二次スクリプト生成部１７に引渡し、二次スクリプト生成部１７においては、図４に示すような仮の二次スクリプトが生成される。ここで仮の二次スクリプトとは、認識文法が組み込まれる前の状態の二次スクリプトを意味しており、ユーザの発話による音声信号をどのタイミングで取得するのか等について記述されているものと定義する。
【００４４】
図４に示す仮の二次スクリプトは、下線部により示されているように、一次スクリプトで使用されている定数データがそのまま流用されている。また、それ以外の部分については、一次スクリプトに記述された構文から生成されたテンプレートによって生成される。
【００４５】
当該テンプレートは以下の処理によって生成される。まず、ユーザによる音声入力を促して、実際にスロットを埋める動作をするタグを用意し、当該タグを生の音声信号を収集するタグに置き換える。図４においては、ユーザによる音声入力を促してスロットを埋める動作をするタグとしては＜ｉｎｉｔｉａｌ＞及び＜ｆｉｅｌｄ＞が用意されており、一方、生の音声信号を収集するタグとしては＜ｒｅｃｏｒｄ＞が用意されている。図３と図４を対比することで、＜ｉｎｉｔｉａｌ＞タグ及び＜ｆｉｅｌｄ＞タグが、それぞれ一対一対応で＜ｒｅｃｏｒｄ＞タグに置換されていることがわかる。
【００４６】
一方、仮の二次スクリプトにのみ記述されているタグである＜ｂｌｏｃｋ＞、＜ｆｉｌｌｅｄ＞、及び＜ｓｕｂｍｉｔ＞については、一次スクリプトで定義されている定数値をデータとして定型的に生成される。なお、＜ｓｕｂｍｉｔ＞タグはタグ内に実行するべき処理内容も記述されており、当該タグを実行することによって、ユーザによる音声入力がユーザ入力解析部１６へと渡される。
【００４７】
すなわち、ＶｏｉｃｅＸＭＬにおける＜ｆｉｅｌｄ＞タグに代表されるようなユーザによる入力を制限してキーワードを取得するためのタグを、生の音声信号を収集し（＜ｒｅｃｏｒｄ＞タグ）、収集された音声信号の中からキーワードを取得する（＜ｓｕｂｍｉｔ＞タグを用いてユーザ入力解析部１６へと渡してキーワード抽出する）という一連の動作に置換する点に特徴を有している。
【００４８】
なお、図３に示す一次スクリプトにおいても＜ｓｕｂｍｉｔ＞タグが存在しているが、これは＜ｆｉｅｌｄ＞タグに基づいて次のタスクへと遷移するために記述しているものであり、仮の二次スクリプトにおける＜ｓｕｂｍｉｔ＞タグとは用途が異なっている。
【００４９】
また、本実施例においては、かかる置換処理をテンプレートを生成することによって行っているが、事前にスクリプト変換テーブルを設けておき、当該スクリプト変換テーブルを参照することによってタグを置換する方法であっても良い。
【００５０】
次に、スクリプト解釈部１５は、生成された仮の二次スクリプトを読み込んで、解釈して実行することで、図５に示すような対話をユーザと行う。図５においては、ユーザが発話した音声信号が一時記憶部１４に一時記憶されるファイル名を括弧内に表示している。
【００５１】
そして、図５に示す対話が終了し、ユーザが要求された全ての音声信号を入力し一時記憶された後、スクリプト解釈部１５はユーザが発話した一時記憶部１４に記憶されている音声信号及び対応する一次スクリプトを、いわゆるサーブレットのような形態をとるユーザ入力解析部１６へと引き渡す。
【００５２】
なお、本実施例においては、仮の二次スクリプトを生成するための発話例（図５）に基づいて、音声信号をすべて収集した後に認識文法の生成処理を行っているが、特にこれに限定されるものではなく、ユーザによる音声入力があるごとに認識文法の生成処理を行うものであっても良い。
【００５３】
次に、ユーザ入力解析部１６は、例えば連続音声認識モジュールや形態素解析モジュールをサブモジュールとして有しており、ユーザが発話した音声信号の内容からキーワードとなる語句を切り出す。例えば、図５に示す対話に登場しているファイル名ｔｍｐ１．ｗａｖについては、図６に示すように解析される。
そして、解析された内容は、二次スクリプト生成部１７へと送られ、二次スクリプト生成部１７では、タスクに必要なキーワードと不要なキーワードを確定して、認識文法を生成することになる。二次スクリプト生成部１７で生成される認識文法の一例として、ＸＭＬ形式で記述されたものを図７に示す。
【００５４】
図７においては、ユーザ発話から抽出された品詞のうち、名詞のみをキーワードとして受け付けるようにしている。また、図７に示すように、本実施例においては名詞が二つ現れているのは、同時に受け取った一次スクリプトにおいて対応している＜ｉｎｉｔｉａｌ＞タグに対する応答例であることから、それぞれ＜ｆｉｅｌｄ＞タグに対応する発話であると想定しているからである。なお本実施例においては、各キーワードと＜ｆｉｅｌｄ＞タグとの関連付けを自動で行っているが、さらに補助スクリプトを用意することで、それぞれのキーワードがどちらの＜ｆｉｅｌｄ＞タグに対応するかをユーザに問い合わせるようにしても良い。
【００５５】
そして、二次スクリプト生成部１７は、一次スクリプトを更新することによって、図８に示すように認識文法を埋め込んだ二次スクリプトを生成する。図８における下線部では、図７に示す認識文法ファイルを認識文法記憶部１９に記憶する際のファイル名を明示することによって、当該音声応答アプリケーション実行時に用いるべき認識文法が明確になる。
【００５６】
このようにすることで、本来認識文法が準備されていなかったスクリプトについても、最適な認識文法を付与することができ、音声応答アプリケーションとして実行させることが可能となる。
【００５７】
次に、一次スクリプトが図９に示すような形で与えられている場合について説明する。図９では、下線部に示すように、受け付けることのできるキーワードの代表値として“ドリンク”及び“食べ物”が、一次スクリプトの記述時点において埋め込まれている。もちろん、当該キーワードリストは別のファイルとして保存しておき、参照するものであっても良い。以下の処理はキーワードの抽出処理以外、同様の処理となる。
【００５８】
すなわち、あらかじめキーワードリストを生成しておくことによって、キーワード抽出処理が不要あるいは簡易的な処理で十分となることから、全体の処理負荷を軽減することが可能となる。
【００５９】
また、図１０に示す仮の二次スクリプトのように、キーワードである“ｄｒｉｎｋ”及び“ｎｕｍｂｅｒ”に相当する発話された音声信号を、一次スクリプトに記述されているタスク処理プログラム名とともにユーザ入力解析部１６に渡すよう記述することも考えられる。本実施例では、タスク処理プログラム名は変数名ＴＡＳＫ＿ＰＲＯＣＥＳＳＯＲに格納している。
【００６０】
このようにスクリプトを構成することにより、認識文法を組み込んだ二次スクリプトを生成することなく、直接、音声応答アプリケーションとして実行することも可能となる。
【００６１】
以上のように本実施の形態によれば、認識文法がない場合であっても、ユーザによる発話の内容に応じた適切な認識文法を生成することができ、音声応答アプリケーションとして確実に対話を構成することが可能となる。
【００６２】
なお、本発明の実施の形態にかかる音声応答アプリケーション作成支援システムを実現するプログラムは、図１１に示すように、ＣＤ−ＲＯＭ１１２−１やフレキシブルディスク１１２−２等の可搬型記録媒体１１２だけでなく、通信回線の先に備えられた他の記憶装置１１１や、コンピュータ１１３のハードディスクやＲＡＭ等の記録媒体１１４のいずれに記憶されるものであっても良く、プログラム実行時には、プログラムはローディングされ、主メモリ上で実行される。
【００６３】
また、本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにより生成された認識文法に関する情報等についても、図１１に示すように、ＣＤ−ＲＯＭ１１２−１やフレキシブルディスク１１２−２等の可搬型記録媒体１１２だけでなく、通信回線の先に備えられた他の記憶装置１１１や、コンピュータ１１３のハードディスクやＲＡＭ等の記録媒体１１４のいずれに記憶されるものであっても良く、例えば本発明にかかる音声応答アプリケーション作成支援システムを利用する際にコンピュータ１１３により読み取られる。
【００６４】
【発明の効果】
以上のように本発明にかかる音声応答アプリケーション作成支援システムによれば、認識文法がない場合であっても、ユーザによる発話の内容に応じた適切な認識文法を生成することができ、音声応答アプリケーションとして確実に対話を構成することが可能となる。
【図面の簡単な説明】
【図１】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムの構成図
【図２】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムの処理の流れ図
【図３】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける一次スクリプトの例示図
【図４】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける仮の二次スクリプトの例示図
【図５】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける音声応答の例示図
【図６】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける形態素解析の例示図
【図７】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける認識文法の例示図
【図８】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける認識文法を組み込んだ二次スクリプトの例示図
【図９】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける一次スクリプトの他の例示図
【図１０】本発明の実施の形態にかかる音声応答アプリケーション作成支援システムにおける仮の二次スクリプトの他の例示図
【図１１】コンピュータ環境の例示図
【符号の説明】
１１音声入力部
１２音声出力部
１３プラットホーム制御部
１４一時記憶部
１５スクリプト解釈部
１６ユーザ入力解析部
１７二次スクリプト生成部
１８ドキュメントデータベース
１９認識文法記憶部
２０リソースフェッチャー
１１１回線先の記憶装置
１１２ＣＤ−ＲＯＭやフレキシブルディスク等の可搬型記録媒体
１１２−１ＣＤ−ＲＯＭ
１１２−２フレキシブルディスク
１１３コンピュータ
１１４コンピュータ上のＲＡＭ／ハードディスク等の記録媒体
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a dialogue system that carries out a task by conducting a dialogue with an end user. In particular, it relates to a script language that describes the dialogue and the tasks to be executed, an interpreter that interprets and executes dialogue scripts written in that script language, an implementation platform that mediates between the interpreter and the user, and a device that generates the keyword-extraction grammar required to accompany a dialogue script.
[0002]
[Prior art]
With the rapid development of computer technology in recent years, many applications that make use of spoken dialogue have been developed. In these spoken dialogue systems, user utterances and system utterances (synthesized speech or the like) alternate. The system collects the necessary information from the user while branching hierarchically according to the content of the user's utterances, and executes some task once sufficient information has been obtained.
[0003]
As a script language for describing these response scenarios, VoiceXML 1.0, formulated by the VoiceXML Forum (http://www.voicexml.org/), is the mainstream and has already been released on a commercial basis. At present, VoiceXML 2.0 is being drafted as the successor version at the World Wide Web Consortium (W3C; http://www.w3.org/).
[0004]
In addition, there are script languages formulated independently by system integrators that provide voice dialogue services. For example, Fujitsu Limited has released a script language for describing dialogues called "VoiceScript (R)".
[0005]
For these voice dialogue systems to operate normally, the most important issue is to recognize the content of the user's input utterances with higher accuracy. To increase recognition accuracy, methods have been proposed such as automatically augmenting the vocabulary of the recognition dictionary, as in Patent Document 1, or extracting a keyword and then generating a recognition dictionary corresponding to that keyword, as in Patent Document 2.
[0006]
Furthermore, when a script language is used, a recognition grammar is required to extract the necessary commands from the words a person utters, so that the content of the user's utterance can be recognized more accurately. If a recognition grammar that matches the user's utterance is available, the utterance is analyzed according to that grammar and the keywords it contains are extracted. Therefore, to operate a voice dialogue system, a dialogue script written in a script language such as those described above is not sufficient on its own; a recognition grammar for the primary recognition of the user's utterance must also be written.
[0007]
Typical formats for recognition grammars include the Java Speech Grammar Format (JSGF) and the two formats under consideration in the Speech Recognition Grammar Specification Version 1.0 (SRGS), which is being drafted by the World Wide Web Consortium: the Augmented BNF syntax (ABNF) and GrXML. There are also formats developed independently by speech recognition providers that have long supplied speech recognition applications to the market.
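To make the difference between these formats concrete, the following Python sketch renders one small keyword list as a JSGF grammar and as an SRGS ABNF grammar. It is an illustration only and is not taken from the specification: the function names, the rule name `drink`, and the keywords are assumptions chosen for the example.

```python
def to_jsgf(name, words):
    """Render a one-rule JSGF grammar that accepts any one of `words`."""
    alternatives = " | ".join(words)
    return f"#JSGF V1.0;\ngrammar {name};\npublic <{name}> = {alternatives};\n"


def to_abnf(name, words, lang="en-US"):
    """Render the same rule in the ABNF form of SRGS."""
    alternatives = " | ".join(words)
    return (
        f"#ABNF 1.0;\nlanguage {lang};\nroot ${name};\n"
        f"${name} = {alternatives};\n"
    )


if __name__ == "__main__":
    print(to_jsgf("drink", ["coffee", "tea"]))
    print(to_abnf("drink", ["coffee", "tea"]))
```

Both functions express the same single rule; only the surface syntax differs, which is why a dialogue author must know which format the chosen recognition engine expects.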
[0008]
Script languages such as VoiceXML and SRGS are derived from XML, which was developed as an extension based on HTML, the script language for describing web pages. Compared with HTML, which was used mainly to mark up data, they have a much stronger character as programming languages. Scripts that use the tags defined by each language specification can be written with a text editor, but just as there are integrated development environments that combine an editing environment and a debugger for programming languages such as C, C++, and Java (R), some dialogue system vendors provide a dedicated development environment for writing scripts easily.
[0009]
[Patent Document 1]
JP 2002-14693 A
[0010]
[Patent Document 2]
JP-A-11-202890
[0011]
[Problems to be solved by the invention]
However, the approach described above requires preparing both a dialogue script and a recognition grammar, which differ in format, and neither can constitute a dialogue application on its own. That is, a recognition grammar must be prepared for each dialogue script, so the creator of the dialogue application must also produce the recognition grammars at the same time, which makes the authoring burden excessive.
[0012]
In addition, preparing in advance every recognition grammar corresponding to each individual dialogue script is practically difficult because of the diversity of recognition grammars, and registering all recognition grammars in advance is also difficult because of physical limits on storage capacity.
[0013]
Furthermore, when no recognition grammar exists, the content of the user's utterance cannot be recognized and the system's response becomes irrelevant, so the system does not function as a voice response application.
[0014]
To solve the above problems, an object of the present invention is to provide a voice response application creation support system and method that can construct a voice response application by generating a dialogue script even when no recognition grammar exists.
[0015]
[Means for Solving the Problems]
To achieve the above object, a voice response application creation support system according to the present invention is an automatic response application creation support system that supports the creation of an automatic response application that responds automatically to user utterances. The system includes: a script interpreting unit, which is an interpreter that checks whether a secondary script including a recognition grammar exists, selects and acquires either a primary script or the secondary script depending on the result, and interprets and executes the acquired script; a voice input unit that inputs an utterance by a user as a voice signal; a temporary storage unit that temporarily stores the input voice signal; a platform control unit that selects at least a language code and a voice recognition engine based on the input voice signal; a user input analysis unit that analyzes the temporarily stored voice signal and extracts keywords; and a secondary script generation unit that generates a recognition grammar based on the primary script and the keywords extracted from the voice signal, and generates a secondary script incorporating the recognition grammar. When no secondary script has been generated, a temporary secondary script for prompting the user to speak is generated based on the primary script and is interpreted and executed by the script interpreting unit; when a secondary script has been generated, the secondary script is interpreted and executed by the script interpreting unit.
[0016]
With this configuration, even when there is no recognition grammar, an appropriate recognition grammar can be generated according to the content of the user's utterance, and a dialogue can be reliably established as a voice response application.
[0017]
Preferably, the voice response application creation support system according to the present invention further includes a recognition grammar storage unit for storing the recognition grammar. This is because the recognition grammar can be updated and accumulated at any time according to the content of the user's utterance.
[0018]
Preferably, the voice response application creation support system according to the present invention further includes a resource fetcher that determines whether a secondary script has been generated and passes the temporary secondary script to the script interpreting unit if it has not, or the secondary script if it has.
[0019]
The present invention is also characterized by software that executes the functions of the voice response application creation support system described above as processing steps of a computer. Specifically, it is an automatic response application creation support method that supports the creation of an automatic response application that responds automatically to user utterances, the method including the steps of: checking whether a secondary script including a recognition grammar exists, selecting and acquiring either a primary script or the secondary script depending on the result, and interpreting and executing the acquired script; inputting an utterance by a user as a voice signal; temporarily storing the input voice signal; selecting at least a language code and a voice recognition engine based on the input voice signal; analyzing the temporarily stored voice signal and extracting keywords; and generating a recognition grammar based on the primary script and the keywords extracted from the voice signal, and generating a secondary script incorporating the recognition grammar. When no secondary script has been generated, a temporary secondary script for prompting the user to speak is generated based on the primary script and is interpreted and executed; when a secondary script has been generated, the secondary script is interpreted and executed. The invention is likewise characterized by a computer-executable program that embodies these steps.
[0020]
With this configuration, by loading the program onto a computer and executing it, it is possible to realize an automatic response application creation support system that can generate an appropriate recognition grammar according to the content of the user's utterance even when there is no recognition grammar, and that can reliably establish a dialogue as a voice response application.
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a voice response application creation support system according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a configuration diagram of a voice response application creation support system according to an embodiment of the present invention.
[0022]
In FIG. 1, an utterance by the user, prompted by system speech output (synthesized speech or the like) from the voice output unit 12, is first input as a voice signal at the voice input unit 11. The platform control unit 13 then switches the language attribute and the recognition engine. That is, according to the voice signal input from the voice input unit 11, it switches the language code and selects an appropriate recognition engine.
[0023]
The platform control unit 13 stores the voice signal itself, as input from the voice input unit 11, in the temporary storage unit 14. The stored voice signal is used to extract the keywords from which the recognition grammar is generated.
[0024]
The script interpreting unit 15 interprets and executes the script prepared in the voice dialogue application. For the interpreted script, the platform control unit 13 switches the language code and selects an appropriate synthesized voice according to the content of the script, and the synthesized speech generated in this way is output from the voice output unit 12 as the output of the voice dialogue application.
[0025]
Next, the user input analysis unit 16 performs morphological analysis or the like on the user's voice signal stored in the temporary storage unit 14 and identifies the necessary keywords.
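The keyword selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: speech recognition and morphological analysis are assumed to have already produced a list of (surface form, part of speech) pairs, the tag name "noun" is a placeholder, and the noun-only rule follows the embodiment's description of FIG. 7, which accepts only nouns as keywords.

```python
def extract_keywords(tagged_tokens):
    """Keep only the tokens tagged as nouns, in utterance order.

    `tagged_tokens` is a hypothetical analyzer output: an iterable of
    (surface, part_of_speech) pairs.
    """
    return [surface for surface, pos in tagged_tokens if pos == "noun"]


if __name__ == "__main__":
    tokens = [("coffee", "noun"), ("and", "conjunction"), ("cake", "noun")]
    print(extract_keywords(tokens))
```

A real analyzer would supply its own tag set, so the comparison against "noun" would need to be adapted to that tag set.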
[0026]
The secondary script generation unit 17 then generates a recognition grammar based on the written script and the keywords obtained from the voice signal, and generates a secondary script by updating the primary script, which is prepared from the outset so that a recognition grammar can be incorporated into it. If no recognition grammar exists, it generates a temporary secondary script. The user can conduct a voice dialogue once the generated secondary script or temporary secondary script is interpreted and executed by the script interpreting unit 15. The generated recognition grammar is stored in the recognition grammar storage unit 19, so that it can be referred to the next time voice input is received.
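As a rough sketch of how a recognition grammar might be built from extracted keywords and kept for later use, the Python code below emits a small XML grammar using the SRGS element names `grammar`, `rule`, `one-of`, and `item`. The function names, the in-memory dictionary standing in for the recognition grammar storage unit 19, and the choice of one rule per grammar are assumptions made for illustration; the grammar actually shown in FIG. 7 may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the recognition grammar storage unit 19.
grammar_store = {}


def build_grammar(rule_id, keywords, lang="ja-JP"):
    """Return an SRGS-style XML grammar accepting any one of `keywords`."""
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xml:lang": lang,
        "root": rule_id,
        "xmlns": "http://www.w3.org/2001/06/grammar",
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    # dict.fromkeys removes duplicates while keeping first-seen order.
    for word in dict.fromkeys(keywords):
        ET.SubElement(one_of, "item").text = word
    return ET.tostring(grammar, encoding="unicode")


def store_grammar(name, keywords):
    """Generate a grammar and keep it under `name` for the next dialogue."""
    grammar_store[name] = build_grammar(name, keywords)
    return grammar_store[name]


if __name__ == "__main__":
    print(store_grammar("drink", ["coffee", "tea", "coffee"]))
```

Keeping the generated grammar under a stable name is what allows a later secondary script to refer to it instead of regenerating it on every call.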
[0027]
The presence or absence of a recognition grammar may be determined by the secondary script generation unit 17 when the secondary script is generated, or by checking whether a corresponding recognition grammar is stored in the recognition grammar storage unit 19.
[0028]
The resource fetcher 20 switches the script to be interpreted and executed by the script interpreting unit 15 between the temporary secondary script, which is generated based on the primary script written when the application was created, and the newly generated secondary script.
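The switching behaviour of the resource fetcher can be sketched as a simple lookup with a fallback. The class below is an assumption-laden illustration in Python: the class and attribute names, the use of dictionaries keyed by an application identifier, and the `make_temporary` callback standing in for the secondary script generation unit 17 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ResourceFetcher:
    # Callback that turns a primary script into a temporary secondary script.
    make_temporary: Callable[[str], str]
    # Primary scripts written when the application was created.
    primary_scripts: Dict[str, str]
    # Secondary scripts generated so far (empty at first).
    secondary_scripts: Dict[str, str] = field(default_factory=dict)

    def fetch(self, app_id: str) -> str:
        """Return the secondary script if one exists, else a temporary one."""
        if app_id in self.secondary_scripts:
            return self.secondary_scripts[app_id]
        return self.make_temporary(self.primary_scripts[app_id])


if __name__ == "__main__":
    fetcher = ResourceFetcher(lambda s: "TEMP:" + s, {"order": "PRIMARY"})
    print(fetcher.fetch("order"))
    fetcher.secondary_scripts["order"] = "SECONDARY"
    print(fetcher.fetch("order"))
```

With this arrangement the interpreter always asks the fetcher for "the script to run" and does not need to know which of the two kinds it received.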
[0029]
Further, the keywords are not limited to those extracted through analysis by the user input analysis unit 16; keywords stored in advance in the document database 18 may also be used.
[0030]
Next, the flow of processing in the voice response application creation support system according to the embodiment of the present invention having such a configuration will be described. FIG. 2 shows a flowchart of a program for realizing the voice response application creation support system according to the embodiment of the present invention.
[0031]
In FIG. 2, when the platform control unit 13 issues a dialogue start request, such as an incoming telephone call, to the script interpreting unit 15 (step S201), the script interpreting unit 15 checks whether a secondary script including a recognition grammar exists (step S202).
[0032]
The method of checking for the recognition grammar is not particularly limited: the check may be made by the secondary script generation unit 17 when the secondary script is generated, or by checking whether a corresponding recognition grammar is stored in the recognition grammar storage unit 19.
[0033]
If a secondary script including the recognition grammar has already been generated (step S202: Yes), the script interpreting unit 15 requests the resource fetcher 20 to retrieve the secondary script (step S203), interprets and executes it (step S204), and asks the user to speak by means of system speech output such as synthesized speech (step S205).
[0034]
If a secondary script including the recognition grammar has not been generated (step S202: No), the resource fetcher 20 is requested to retrieve the primary script (step S206). When the resource fetcher 20 notifies the secondary script generation unit 17 of this request, the secondary script generation unit 17 retrieves the primary script from the document database 18 and generates a formal temporary secondary script (step S207). At this point, the temporary secondary script contains no recognition grammar.
[0035]
When the generated temporary secondary script is passed to the script interpreting unit 15, it is interpreted and executed (step S208), and the user is asked to speak by means of system speech output such as synthesized speech (step S209).
[0036]
Next, the utterance of the user prompted by the utterance request is input as a voice signal (step S210). Then, based on the input voice signal, the language code is switched, an appropriate recognition engine is selected (step S211), and the input voice signal itself is temporarily stored (step S212).
[0037]
Then, morphological analysis or the like is performed on the temporarily stored voice signal uttered by the user to identify the necessary keywords (step S213). A recognition grammar is then generated based on the written primary script and the keywords extracted from the voice signal, and the primary script is updated to generate a secondary script incorporating the recognition grammar (step S214).
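The flow of steps S201 to S214 can be summarized in a short Python sketch. This is a simplified model, not the embodiment's implementation: scripts are represented as plain dictionaries rather than VoiceXML, the recording and keyword extraction functions are supplied by the caller, and recognition engine selection (step S211) is only recorded as a step label. All names are assumptions.

```python
def run_dialogue(app_id, scripts, grammars, record_utterance, extract_keywords):
    """Walk once through the flow of FIG. 2 and return the steps visited."""
    steps = ["S201", "S202"]  # start request, then check for a secondary script
    if app_id in scripts["secondary"]:
        # Secondary script exists: fetch it, interpret it, prompt the user.
        steps += ["S203", "S204", "S205"]
        return steps

    # No secondary script yet: fetch the primary script and build a
    # temporary secondary script that has prompts but no grammar.
    steps += ["S206", "S207"]
    primary = scripts["primary"][app_id]
    temporary = {"prompts": primary["prompts"], "grammar": None}

    # Interpret the temporary script and collect the raw utterances.
    steps += ["S208", "S209"]
    audio = [record_utterance(prompt) for prompt in temporary["prompts"]]
    steps += ["S210", "S211", "S212"]

    # Extract keywords, then generate the grammar and the secondary script.
    keywords = [k for clip in audio for k in extract_keywords(clip)]
    steps.append("S213")
    grammars[app_id] = sorted(set(keywords))
    scripts["secondary"][app_id] = {**primary, "grammar": app_id}
    steps.append("S214")
    return steps


if __name__ == "__main__":
    scripts = {"primary": {"order": {"prompts": ["What would you like?"]}},
               "secondary": {}}
    grammars = {}
    print(run_dialogue("order", scripts, grammars,
                       lambda prompt: "coffee two", lambda clip: clip.split()))
    print(run_dialogue("order", scripts, grammars,
                       lambda prompt: "", lambda clip: []))
```

The first call takes the "No" branch of step S202 and leaves a secondary script behind; the second call finds that script and takes the short "Yes" branch.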
[0038]
With this configuration, even when no recognition grammar exists, the necessary recognition grammar can be generated from the voice signal, so the voice response application can be executed normally.
[0039]
Next, specific script processing will be described with reference to specific examples.
[0040]
First, when a telephone call from the user is received or the voice response application is started, the platform control unit 13 issues a response request to the script interpreting unit 15, which is the interpreter.
[0041]
Then, the script interpreting unit 15 specifies the primary script as the initial operation script of the voice response application and instructs the resource fetcher 20 to acquire it. Here the primary script is assumed to be written in VoiceXML, as shown in FIG. 3, for example.
[0042]
However, as is clear from the primary script shown in FIG. 3, the script contains no recognition grammar, that is, no information for recognizing which user utterances are to be accepted. The intended dialogue therefore cannot take place, and the correctness of the script itself cannot be checked. Consequently, the creator of the voice response application would need to prepare a recognition grammar separately in order to check the operation of the written primary script.
[0043]
Therefore, the resource fetcher 20 passes the primary script to the secondary script generation unit 17, which generates a temporary secondary script as shown in FIG. 4. Here, a temporary secondary script means a secondary script in the state before the recognition grammar is incorporated, and is defined as a script that describes, among other things, the timing at which the voice signal of the user's utterance is acquired.
[0044]
In the temporary secondary script shown in FIG. 4, the constant data used in the primary script is reused as is, as indicated by the underlined portions. The remaining parts are generated from a template derived from the syntax written in the primary script.
[0045]
The template is generated by the following processing. First, the tags that prompt the user for voice input and actually fill slots are identified, and each is replaced with a tag that collects a raw voice signal. In FIG. 4, <initial> and <field> are the tags that prompt the user for voice input and fill slots, while <record> is the tag that collects a raw voice signal. Comparing FIG. 3 with FIG. 4 shows that each <initial> tag and each <field> tag is replaced one-to-one with a <record> tag.
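The one-to-one tag replacement described above can be sketched in Python with the standard `xml.etree.ElementTree` module. The sample primary script is invented for illustration (FIG. 3 is not reproduced in the text), namespaces are omitted for brevity, and the function and constant names are assumptions. Only the tag names <initial>, <field>, and <record> come from the description.

```python
import xml.etree.ElementTree as ET

# Tags that prompt the user and fill a slot in the primary script.
SLOT_TAGS = {"initial", "field"}

# Hypothetical primary script; element content is invented for the example.
PRIMARY = """<vxml version="2.0">
  <form id="order">
    <initial name="start"><prompt>What would you like?</prompt></initial>
    <field name="drink"><prompt>Which drink?</prompt></field>
    <field name="number"><prompt>How many?</prompt></field>
  </form>
</vxml>"""


def to_temporary_script(primary_vxml):
    """Rename every <initial>/<field> element to <record>, keeping the rest."""
    root = ET.fromstring(primary_vxml)
    for elem in root.iter():
        if elem.tag in SLOT_TAGS:
            elem.tag = "record"  # attributes and child prompts are kept as is
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_temporary_script(PRIMARY))
```

Because only the element name changes, the prompts and the `name` attributes of the primary script carry over unchanged, which corresponds to the reuse of constant data noted for FIG. 4.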
[0046]
On the other hand, tags <block>, <filled>, and <submit>, which are described only in the temporary secondary script, are generated in a fixed manner using constant values defined in the primary script as data. The <submit> tag also describes the content of processing to be executed in the tag. By executing the tag, a voice input by the user is passed to the user input analysis unit 16.
[0047]
That is, the present embodiment is characterized in that a tag such as the VoiceXML <field> tag, which obtains a keyword by restricting the user's input, is replaced with a series of operations: collecting a raw audio signal (<record> tag) and acquiring a keyword from within the collected audio signal (passing it to the user input analysis unit 16 with a <submit> tag to extract the keyword).
[0048]
Note that a <submit> tag also exists in the primary script shown in FIG. 3, but there it is written to transition to the next task based on the <field> tags. Its usage therefore differs from that of the <submit> tag in the temporary secondary script.
[0049]
Further, in the present embodiment, this replacement process is performed by generating a template. Alternatively, a script conversion table may be prepared in advance and the tags replaced by referring to that table.
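The table-driven variant of the tag replacement can be sketched as follows. This is a minimal illustration in Python, not the implementation of the embodiment: the conversion table, the sample primary script, the function name make_temporary_secondary, and the analyzer URL are all assumptions introduced here, and the sample script is not the content of FIG. 3. Retargeting an existing <submit> is also a simplification chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion table: slot-filling tag -> raw-audio collection tag.
CONVERSION_TABLE = {"initial": "record", "field": "record"}

# Illustrative primary script (an assumption, not the content of FIG. 3).
PRIMARY = """<vxml version="1.0">
  <form id="order">
    <initial name="start"><prompt>May I take your order?</prompt></initial>
    <field name="drink"><prompt>What would you like to drink?</prompt></field>
    <block><submit next="order.cgi"/></block>
  </form>
</vxml>"""


def make_temporary_secondary(primary_xml, analyzer_url):
    """Replace slot-filling tags with <record> and route audio to the analyzer."""
    root = ET.fromstring(primary_xml)
    for form in list(root.iter("form")):
        recorded = []
        for child in list(form):
            if child.tag in CONVERSION_TABLE:
                # One-to-one replacement; prompts and names are kept as they are.
                child.tag = CONVERSION_TABLE[child.tag]
                if child.get("name"):
                    recorded.append(child.get("name"))
        submits = list(form.iter("submit"))
        if not submits:
            # No <submit> in the primary script: add one inside a new <block>.
            block = ET.SubElement(form, "block")
            submits = [ET.SubElement(block, "submit")]
        for submit in submits:
            # Send the recorded audio variables to the (hypothetical) analyzer.
            submit.set("next", analyzer_url)
            submit.set("namelist", " ".join(recorded))
    return ET.tostring(root, encoding="unicode")


print(make_temporary_secondary(PRIMARY, "http://localhost/analyze"))
```

Keeping the mapping in a table means that supporting another slot-filling tag only requires a new table entry, not a change to the traversal logic.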
[0050]
Next, the script interpreting unit 15 reads, interprets, and executes the generated temporary secondary script to perform a dialogue as shown in FIG. 5 with the user. In FIG. 5, the name of the file in which the audio signal spoken by the user is temporarily stored in the temporary storage unit 13 is displayed in parentheses.
[0051]
Then, after the dialogue shown in FIG. 5 is completed and all of the requested voice signals have been input by the user and temporarily stored, the script interpreting unit 15 passes the user's voice signals stored in the temporary storage unit 13, together with the corresponding primary script, to the user input analysis unit 16, which takes the form of a so-called servlet.
[0052]
In the present embodiment, the recognition grammar generation process is performed after all voice signals have been collected, based on an utterance example (FIG. 5) for generating a temporary secondary script. Instead, the recognition grammar may be generated each time a user inputs a voice.
[0053]
Next, the user input analysis unit 16 includes, for example, a continuous speech recognition module and a morphological analysis module as submodules, and cuts out the phrases that serve as keywords from the content of the voice signal uttered by the user. For example, the file named tmp1.wav is analyzed as shown in FIG. 6.
Then, the analyzed contents are sent to the secondary script generation unit 17, and the secondary script generation unit 17 determines keywords necessary for the task and unnecessary keywords, and generates a recognition grammar. FIG. 7 shows an example of a recognition grammar generated in the secondary script generation unit 17 in an XML format.
[0054]
In FIG. 7, among the parts of speech extracted from the user's utterances, only nouns are accepted as keywords. Also, as shown in FIG. 7, two nouns appear in the present embodiment because the utterance is assumed to be the response to the corresponding <initial> tag in the primary script received at the same time. In this embodiment, each keyword is associated with a <field> tag automatically. Alternatively, an auxiliary script may be provided that asks the user which <field> tag each keyword corresponds to.
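The noun-only filtering and grammar generation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the token list stands in for the output of the morphological analysis module and is invented for the example, the function names are hypothetical, and the XML element names are loosely modeled on the XML form of the W3C Speech Recognition Grammar Specification rather than on the actual format of FIG. 7.

```python
import xml.etree.ElementTree as ET

# Hypothetical morphological analysis result: (surface form, part of speech).
TOKENS = [
    ("a", "article"), ("coffee", "noun"), ("and", "conjunction"),
    ("a", "article"), ("sandwich", "noun"), ("please", "interjection"),
]


def extract_keywords(tokens):
    """Keep only nouns, in order of first appearance, without duplicates."""
    seen = set()
    keywords = []
    for surface, pos in tokens:
        if pos == "noun" and surface not in seen:
            seen.add(surface)
            keywords.append(surface)
    return keywords


def build_grammar(rule_id, keywords):
    """Serialize the keywords as a flat list of alternatives in one rule."""
    grammar = ET.Element("grammar", {"root": rule_id})
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for keyword in keywords:
        ET.SubElement(one_of, "item").text = keyword
    return ET.tostring(grammar, encoding="unicode")


keywords = extract_keywords(TOKENS)
grammar_xml = build_grammar("order", keywords)
print(grammar_xml)
```

A different part-of-speech policy, for example accepting numerals as well, would only change the condition in extract_keywords.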
[0055]
Then, the secondary script generation unit 17 generates a secondary script in which the recognition grammar is embedded as shown in FIG. 8 by updating the primary script. In the underlined part in FIG. 8, by specifying the file name when the recognition grammar file shown in FIG. 7 is stored in the recognition grammar storage unit 19, the recognition grammar to be used when executing the voice response application becomes clear.
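Updating the primary script so that it refers to the stored grammar file can be sketched as follows. This is an illustration only: the sample script, the function name embed_grammar, and the grammar file name are assumptions made for the example, and the use of a <grammar> element with a src attribute inside each <field> is one plausible way to express the reference, not necessarily the form shown in FIG. 8.

```python
import xml.etree.ElementTree as ET

# Illustrative primary script without any recognition grammar (an assumption).
PRIMARY = """<vxml version="1.0">
  <form id="order">
    <field name="drink"><prompt>What would you like to drink?</prompt></field>
    <field name="number"><prompt>How many?</prompt></field>
  </form>
</vxml>"""


def embed_grammar(primary_xml, grammar_files):
    """Insert a grammar reference into every field that has a stored grammar."""
    root = ET.fromstring(primary_xml)
    for field in list(root.iter("field")):
        src = grammar_files.get(field.get("name"))
        if src is not None:
            # Place the reference first so it precedes the prompt.
            field.insert(0, ET.Element("grammar", {"src": src}))
    return ET.tostring(root, encoding="unicode")


# Hypothetical file name under which the generated grammar was stored.
secondary = embed_grammar(PRIMARY, {"drink": "drink_grammar.xml"})
print(secondary)
```

Fields without a stored grammar are left unchanged, so the same function can be applied again as further grammars are generated.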
[0056]
By doing so, an optimal recognition grammar can be given to a script for which a recognition grammar was not originally prepared, and the script can be executed as a voice response application.
[0057]
Next, a case where the primary script is given in the form shown in FIG. 9 will be described. In FIG. 9, as indicated by the underlined portions, "drink" and "food" are embedded as representative values of the acceptable keywords at the time the primary script is written. Of course, the keyword list may instead be stored in a separate file and referred to. The subsequent processing is the same as described above, except for the keyword extraction processing.
[0058]
That is, by generating a keyword list in advance, keyword extraction processing is unnecessary or simple processing is sufficient, so that the overall processing load can be reduced.
[0059]
Further, as in the temporary secondary script shown in FIG. 10, it is also conceivable to write the script so that the uttered voice signals corresponding to the keywords "dlink" and "number" are passed to the user input analysis unit 16 together with the task processing program name described in the primary script. In this embodiment, the task processing program name is stored in the variable TASK_PROCESSOR.
[0060]
By configuring the script in this way, it is also possible to directly execute as a voice response application without generating a secondary script incorporating a recognition grammar.
[0061]
As described above, according to the present embodiment, even when no recognition grammar exists, an appropriate recognition grammar can be generated according to the content of the user's utterances, and a dialogue can be reliably constructed as a voice response application.
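The overall control flow of the embodiment can be summarized in the following sketch. It is a simplified illustration, not the implementation: the function run_application and its callback parameters are hypothetical stand-ins for the units described above (temporary secondary script generation, script interpretation, user input analysis, and secondary script generation), and the file-based check is an assumption about how the presence of a secondary script might be tested.

```python
from pathlib import Path


def run_application(secondary_path, primary, make_temporary, execute,
                    analyze, build_secondary):
    """Run the secondary script if it exists; otherwise generate it first."""
    if secondary_path.exists():
        # A secondary script with a recognition grammar is available.
        execute(secondary_path.read_text())
        return "executed"
    # No secondary script yet: collect raw utterances with a temporary one.
    temporary = make_temporary(primary)
    recordings = execute(temporary)
    # Extract keywords and store a secondary script that embeds the grammar.
    keywords = analyze(recordings)
    secondary_path.write_text(build_secondary(primary, keywords))
    return "generated"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "secondary.vxml"
        stubs = dict(
            make_temporary=lambda script: "temporary:" + script,
            execute=lambda script: ["tmp1.wav"],
            analyze=lambda recordings: ["coffee"],
            build_secondary=lambda script, kws: script + "+" + ",".join(kws),
        )
        print(run_application(path, "primary", **stubs))
        print(run_application(path, "primary", **stubs))
```

Passing the units in as callbacks keeps the branching logic separate from speech recognition and script interpretation, which the sketch deliberately leaves as stubs.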
[0062]
As shown in FIG. 11, the program that realizes the voice response application creation support system according to the embodiment of the present invention may be stored not only on a portable recording medium 112 such as a CD-ROM 112-1 or a flexible disk 112-2, but also in another storage device 111 provided at the far end of a communication line, or on a recording medium 114 such as the hard disk or RAM of the computer 113. When the program is executed, it is loaded and runs in main memory.
[0063]
Also, as shown in FIG. 11, the information on the recognition grammar generated by the voice response application creation support system according to the embodiment of the present invention may be stored not only on a portable recording medium 112 such as a CD-ROM 112-1 or a flexible disk 112-2, but also in another storage device 111 provided at the far end of a communication line, or on a recording medium 114 such as the hard disk or RAM of the computer 113. It is read by the computer 113 when the voice response application creation support system is used.
[0064]
[Effect of the Invention]
As described above, according to the voice response application creation support system of the present invention, even when no recognition grammar exists, an appropriate recognition grammar can be generated according to the content of the user's utterances, and as a result the dialogue can be reliably constructed.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a voice response application creation support system according to an embodiment of the present invention;
FIG. 2 is a flowchart of processing of a voice response application creation support system according to the embodiment of the present invention;
FIG. 3 is an exemplary diagram of a primary script in the voice response application creation support system according to the embodiment of the present invention;
FIG. 4 is an exemplary diagram of a temporary secondary script in the voice response application creation support system according to the embodiment of the present invention;
FIG. 5 is an exemplary diagram of a voice response in the voice response application creation support system according to the embodiment of the present invention;
FIG. 6 is an exemplary diagram of morphological analysis in the voice response application creation support system according to the embodiment of the present invention;
FIG. 7 is an exemplary diagram of a recognition grammar in the voice response application creation support system according to the embodiment of the present invention;
FIG. 8 is an exemplary diagram of a secondary script incorporating a recognition grammar in the voice response application creation support system according to the embodiment of the present invention;
FIG. 9 is another exemplary diagram of the primary script in the voice response application creation support system according to the embodiment of the present invention;
FIG. 10 is another exemplary diagram of a temporary secondary script in the voice response application creation support system according to the embodiment of the present invention;
FIG. 11 is an exemplary diagram of a computer environment.
[Explanation of symbols]
11 Voice input section
12 Audio output unit
13 Platform control unit
14 Temporary storage
15 Script interpreter
16 User input analysis unit
17 Secondary script generator
18 Document Database
19 Recognition grammar storage
20 Resource Fetcher
111 Destination storage device
112 Portable recording media such as CD-ROM and flexible disk
112-1 CD-ROM
112-2 Flexible disk
113 Computer
114 Recording media such as RAM / hard disk on computer

Claims

An automatic response application creation support system that supports creation of an automatic response application that responds automatically to user utterances, the system comprising:
a script interpreting unit that checks for the presence or absence of a secondary script containing a recognition grammar, selects and obtains a primary script or the secondary script according to that presence or absence, and interprets and executes the primary script or the secondary script with an interpreter;
an audio input unit that inputs an utterance by the user as an audio signal;
a temporary storage unit that temporarily stores the input audio signal;
a platform control unit that selects at least a language code and a voice recognition engine based on the input voice signal;
a user input analysis unit that analyzes the temporarily stored voice signal and extracts a keyword; and
a secondary script generation unit that generates a recognition grammar based on the keyword extracted from the primary script and the audio signal, and generates a secondary script incorporating the recognition grammar,
wherein, if the secondary script has not been generated, a temporary secondary script that prompts the user to speak is generated based on the primary script and is interpreted and executed by the script interpreting unit, and
if the secondary script has been generated, the script interpreting unit interprets and executes the secondary script.

2. The system according to claim 1, further comprising a recognition grammar storage unit for storing the recognition grammar.

3. The automatic response application creation support system according to claim 1, further comprising a resource fetcher that determines whether or not the secondary script has been generated and, if the secondary script has not been generated, passes the temporary secondary script to the script interpreting unit.

An automatic response application creation support method that supports creation of an automatic response application that responds automatically to user utterances, the method comprising the steps of:
checking for the presence or absence of a secondary script including a recognition grammar, selecting and acquiring a primary script or the secondary script according to that presence or absence, and interpreting and executing the primary script or the secondary script;
inputting an utterance by the user as an audio signal;
temporarily storing the input audio signal;
selecting at least a language code and a speech recognition engine based on the input speech signal;
analyzing the temporarily stored voice signal to extract a keyword; and
generating a recognition grammar based on the keyword extracted from the primary script and the audio signal, and generating a secondary script incorporating the recognition grammar,
wherein, if the secondary script has not been generated, a temporary secondary script that prompts the user to speak is generated based on the primary script and is interpreted and executed, and
if the secondary script has been generated, the secondary script is interpreted and executed.

A computer-executable program embodying an automatic response application creation support method that supports creation of an automatic response application that responds automatically to user utterances, the program comprising the steps of:
checking for the presence or absence of a secondary script including a recognition grammar, selecting and acquiring a primary script or the secondary script according to that presence or absence, and interpreting and executing the primary script or the secondary script;
inputting an utterance by the user as an audio signal;
temporarily storing the input audio signal;
selecting at least a language code and a speech recognition engine based on the input speech signal;
analyzing the temporarily stored voice signal to extract a keyword; and
generating a recognition grammar based on the keyword extracted from the primary script and the audio signal, and generating a secondary script incorporating the recognition grammar,
wherein, if the secondary script has not been generated, a temporary secondary script that prompts the user to speak is generated based on the primary script and is interpreted and executed, and
if the secondary script has been generated, the secondary script is interpreted and executed.