JP3925326B2

JP3925326B2 - Terminal communication system, linkage server, voice dialogue server, voice dialogue processing method, and voice dialogue processing program

Info

Publication number: JP3925326B2
Application number: JP2002186649A
Authority: JP
Inventors: 淳野口
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-06-26
Filing date: 2002-06-26
Publication date: 2007-06-06
Anticipated expiration: 2022-06-26
Also published as: JP2004029456A

Description

【０００１】
【発明の属する技術分野】
本発明は、例えば、音声対話処理によって入力された情報を、Ｗｅｂページの表示内容に早期に反映させることができるとともに、音声対話処理に用いられていた通信回線が切断された場合であってもＷｅｂページの表示内容に反映させることができる端末通信システム、連携サーバ、音声対話サーバ、音声対話処理方法、および音声対話処理プログラムに関する。
【０００２】
【従来の技術】
従来から、インターネットなどのデータ通信ネットワークに接続されているＷＷＷ（World Wide Web）サーバによるＷｅｂページの表示を利用した表示サービスと、一般公衆電話回線網などの音通信ネットワークに接続されている音声対話サーバによる音声対話機能を用いた音声サービスとを連携させたシステムが利用されている。
【０００３】
表示サービスと音声サービスとを連携させたシステムには、例えば特開２００２−２６８２４１に開示されている無線携帯端末通信システムがある。この無線携帯端末通信システムは、ブラウザ機能および通話機能を備えた携帯電話端末と、表示サービスを実行するコンテンツサーバと音声サービスを実行する音声対話サーバとを含むセンタとで構成される。この無線携帯端末通信システムによれば、センタは、コンテンツサーバによる表示サービスによって携帯電話端末が備える表示装置に表示されている情報入力領域への情報入力を、音声対話処理による音声入力によって受け付ける連携サービスを提供している。具体的には、音声対話処理にて入力した音声情報にもとづく文字列情報を音声認識処理によって取得し、携帯電話端末が備える表示装置に表示されている情報入力領域に、音声認識結果を示す文字列情報にもとづく表示を行うための処理が実行される。
【０００４】
【発明が解決しようとする課題】
音声対話処理における音声認識処理では、携帯電話端末のユーザの発声方法に癖があったりすることから、携帯電話端末から入力した音や音声がユーザが意図する内容として常に正確に認識されているとは限らない。このため、一般に、音声対話処理では、音声認識結果を示す音声を発声するための処理を行うなどすることで、音声認識結果が適正であるか否かを携帯電話端末のユーザに対して問い合わせるための確認処理が実行される。この確認処理が実行されると、確認処理によってユーザから適正であることの確認がとれた音声認識結果を示す文字列情報が、音声対話結果情報として取り扱うことに確定され、確定された音声対話結果情報が音声対話サーバからコンテンツサーバに送信される。そして、コンテンツサーバは、受信した音声対話結果情報にもとづく連携情報を生成し、携帯電話端末に送信する。すると、携帯電話端末の表示装置の表示画面に、連携情報にもとづいて、ユーザによって発声された音声の内容が反映された表示がなされるようになる。
【０００５】
ところが、音声対話処理における確認処理が完了する前に、例えば携帯電話端末を携帯しているユーザが音声対話処理中に電波の届かない場所に移動してしまった場合などの何らかの原因によって、音声対話処理に用いられていた通信回線が切断してしまい、音声認識結果を示す文字列情報が音声対話結果情報に確定される前に音声対話処理が中途終了してしまった場合には、未確定の音声対話結果情報は破棄され、その後の情報入力領域への表示などの処理がなされない。このように、ユーザが音声入力を行ったあとであっても、確認処理が終了する前に音声対話処理が中途終了してしまった場合には、ユーザによって発声された音声が情報入力領域の入力情報として反映されず、音声対話処理を最初からやり直さなければならない。上記のように、音声対話処理が完了する前に音声対話処理に用いられていた通信回線が切断してしまうと、既に音声入力された情報があっても、その情報が反映されることなく音声対話処理が中途終了してしまうという問題があった。
【０００６】
また、音声対話処理が完了したあと、確定された音声対話結果情報が音声対話サーバからコンテンツサーバに送信されるが、データ通信ネットワーク上で多くのデータが伝送され通信回線が混雑している場合には、音声対話結果情報の伝送期間が長くなってしまい、音声対話結果情報がコンテンツサーバに取得される時期が遅延することになってしまう。コンテンツサーバが音声対話結果情報を取得する時期が遅れると、携帯電話端末の表示装置の表示画面に連携情報にもとづく表示を行うことができる時期が遅延してしまうので、音通信を終えたあと直ぐに音声対話処理結果を反映させた表示を行うことができない。このように、音声対話処理が完了したあと直ぐに音声対話処理結果を反映させた表示を行うことができず、迅速に音声対話処理の結果を携帯電話端末の表示内容などに反映させることができないという問題があった。
【０００７】
本発明は上述した問題を解消し、音声対話処理の処理結果をより早期に表示内容などに反映させることができるようにするとともに、音声対話処理が完了していなくても音声対話処理にて音声入力された情報を表示内容などに反映させることができるようにすることを目的とする。
【０００８】
【課題を解決するための手段】
上記の問題を解決するために、本発明の端末通信システムは、音声通信機能およびパケット通信機能を有する端末装置（例えば無線携帯端末２０）と、端末装置との間で音声通話を行う音声制御部（例えば音声対話制御部３２）と、音声制御部で受信した端末装置からの音声信号を認識し認識結果を出力する音声認識部（例えば音声認識部３４）と、音声通話の回線情報を監視し音声通話の中断を検出する回線情報検出部（例えば音声通信情報検出部３１）と、音声通話による音声対話終了時もしくは回線情報検出部にて音声通話（例えば音声対話処理）の中断が検出されたときに、音声認識部にて得られた認識結果もしくは当該認識結果にもとづく情報（例えば音声対話結果データ）をパケット通信により端末装置に送信するパケット制御部（例えばコンテンツ制御部４２）とを有するセンタとを備えたことを特徴とする。
【０００９】
上記の構成としたことで、音声通話が中断して音声対話処理が中途終了してしまっても、音声対話結果データとしての音声認識部にて得られた認識結果もしくは当該認識結果にもとづく情報を、端末装置に提供することができるようになる。
【００１４】
また、本発明の端末通信システム（例えば無線携帯端末システム１０）は、通話機能およびデータ通信機能を有する端末装置（例えば無線携帯端末２０）と、端末装置との間で通信ネットワーク（例えば一般公衆回線網）を介して音声対話処理を行うとともに音声対話処理の結果にもとづく連携結果情報を通信ネットワーク（例えばインターネット５０）を介して端末装置に提供する連携サーバ（例えば音声対話サーバ３０およびコンテンツサーバ４０を含むセンタ）とを備えた端末通信システムであって、連携サーバは、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識する音声認識部（例えば音声認識部３４）と、音声認識部による音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行う音声対話処理部（例えば音声対話制御部３２）と、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出する回線切断検出部（例えば音声通信情報検出部３１）と、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報（例えば、音声認識結果が適正であることの確認が取られたことによって、今後の処理に使用されることが確定された音声対話結果データ）にもとづく連携結果情報（例えば、音声対話結果データ自体の他、音声対話結果データにもとづいて生成された情報、音声対話結果データにもとづいて抽出された情報など、音声対話結果データにもとづく情報を含む）を、通信ネットワークを利用したデータ通信によって端末装置に提供する連携結果情報提供部（例えばコンテンツ制御部４２およびインターネット通信部４１）とを含み、音声対話処理部は、回線切断検出部によって音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識処理部による音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報（例えば仮決定状態の音声対話結果データ）を確定音声対話結果情報とすることに決定することを特徴とする。
【００１５】
上記の構成としたことで、音声対話処理が完了する前に通信回線が切断してしまっても、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。よって、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく連携結果情報を提供することができる。
【００１６】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、確認がとれた場合に音データの音声認識結果を確定音声対話結果情報とすることに決定するように構成されていてもよい。
【００１７】
上記の構成としたことで、通信回線が切断しなかった場合には、音声対話処理の終了を示す報知を行う前に、確認がとれた音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができ、通信回線が切断した場合には、確認がとれる前の音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。
【００１８】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、一単位の音データの音声認識結果についてそれぞれ確認がとれた場合に、当該一単位の音データの音声認識結果をそれぞれ確定音声対話結果情報とすることに決定する構成とされていてもよい。
【００１９】
上記の構成としたことで、音声対話処理の終了を示す報知を行う前に、複数の音データの音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができる。
【００２２】
連携サーバが、端末装置との間で通信ネットワークを介して音もしくは音声による音通信を行う音声対話サーバと、Ｗｅｂページを用いて情報の提供や収集を行うコンテンツサーバとを含み、音声対話サーバとコンテンツサーバを用いて確定音声対話結果情報にもとづく連携結果情報を端末装置に提供するように構成されていてもよい。
【００２３】
上記の構成としたことで、音声対話処理などを実行するサーバとＷｅｂページを用いた情報の提供などを実行するサーバとが別個に備えられているシステムにおいて、音声対話処理が完了する前に通信回線が切断してしまっても、音声対話結果にもとづく連携結果情報を端末装置に提供することができる。よって、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく連携結果情報を提供することができる。
【００２４】
端末装置と連携サーバとで行われるデータ通信は、パケット通信により行われるように構成されていてもよい。
【００２５】
上記の構成としたことで、音声対話処理の終了を示す報知を行う前にパケット通信によって音声対話結果情報が送信されるので、パケット通信が行われる通信ネットワークが混雑していても、遅延することなく連携結果情報を端末装置に提供することができる。
【００２６】
連携結果情報は、確定音声対話結果情報が反映されたＷｅｂページデータ、または確定音声対話結果情報にもとづいて選択された選択データである構成とされていてもよい。
【００２７】
上記の構成としたことで、確定音声対話結果情報が反映されたＷｅｂページデータにもとづくＷｅｂページを端末装置に表示させることができる。また、確定音声対話結果情報にもとづいて選択された選択データ（例えば着信時に着信音として再生される着信メロディなどの表示されないデータ）を端末装置に提供することができる。
【００２８】
また、本発明の連携サーバは、通話機能およびデータ通信機能を有する端末装置との間で通信ネットワークを介して音声対話処理を行うとともに音声対話処理の結果にもとづく連携結果情報を通信ネットワークを介して端末装置に提供する連携サーバであって、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識する音声認識部と、音声認識部による音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行う音声対話処理部と、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出する回線切断検出部と、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報にもとづく連携結果情報を、通信ネットワークを利用したデータ通信によって端末装置に提供する連携結果情報提供部とを含み、音声対話処理部は、回線切断検出部によって音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識処理部による音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定することを特徴とするものである。
【００２９】
上記の構成としたことで、音声対話処理が完了する前に通信回線が切断してしまっても、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。よって、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく連携結果情報を提供することができる。
【００３０】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、確認がとれた場合に音データの音声認識結果を確定音声対話結果情報とすることに決定するように構成されていてもよい。
【００３１】
上記の構成としたことで、通信回線が切断しなかった場合には、音声対話処理の終了を示す報知を行う前に、確認がとれた音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができ、通信回線が切断した場合には、確認がとれる前の音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。
【００３２】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、一単位の音データの音声認識結果についてそれぞれ確認がとれた場合に、当該一単位の音データの音声認識結果をそれぞれ確定音声対話結果情報とすることに決定する構成とされていてもよい。
【００３３】
上記の構成としたことで、音声対話処理の終了を示す報知を行う前に、複数の音データの音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができる。
【００３４】
また、本発明の音声対話サーバは、通話機能およびデータ通信機能を有する端末装置との間で通信ネットワークを介して音もしくは音声による音通信を行うとともに、Ｗｅｂページを用いて情報の提供や収集を行うコンテンツサーバに対して音声対話処理の結果にもとづく連携結果情報を端末装置に提供することを依頼する音声対話サーバであって、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識する音声認識部と、音声認識部による音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行う音声対話処理部と、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出する回線切断検出部と、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報をコンテンツサーバに送信することで、コンテンツサーバに対して確定音声対話結果情報にもとづく連携結果情報の提供依頼を行う確定音声対話結果情報送信部とを含み、音声対話処理部は、回線切断検出部によって音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識処理部による音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定することを特徴とするものである。
【００３５】
上記の構成としたことで、音声対話処理が完了する前に通信回線が切断してしまっても、確定音声対話結果情報をコンテンツサーバに送信することができる。よって、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。また、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【００３６】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、確認がとれた場合に音データの音声認識結果を確定音声対話結果情報とすることに決定するように構成されていてもよい。
【００３７】
上記の構成としたことで、通信回線が切断しなかった場合には、音声対話処理の終了を示す報知を行う前に、確認がとれた音声認識結果にもとづく確定音声対話結果情報をコンテンツサーバに送信することができ、通信回線が切断した場合には、確認がとれる前の音声対話結果にもとづく確定音声対話結果情報をコンテンツサーバに送信することができるようになる。
【００３８】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、一単位の音データの音声認識結果についてそれぞれ確認がとれた場合に、当該一単位の音データの音声認識結果をそれぞれ確定音声対話結果情報とすることに決定する構成とされていてもよい。
【００３９】
上記の構成としたことで、音声対話処理の終了を示す報知を行う前に、複数の音データの音声認識結果にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【００４０】
また、本発明の音声対話処理方法は、通話機能およびデータ通信機能を有する端末装置との間で通信ネットワークを介して音もしくは音声による音通信を行うとともに、Ｗｅｂページを用いて情報の提供や収集を行うコンテンツサーバに対して音声対話処理の結果にもとづく連携結果情報を端末装置に提供することを依頼するための音声対話処理方法であって、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識するステップと、音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行うステップと、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出するステップと、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報をコンテンツサーバに送信することで、コンテンツサーバに対して確定音声対話結果情報にもとづく連携結果情報の提供依頼を行うステップとを含み、音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定することを特徴とするものである。
【００４１】
上記の構成としたことで、音声対話処理が完了する前に通信回線が切断してしまっても、確定音声対話結果情報をコンテンツサーバに送信することができる。よって、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。また、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【００４２】
さらに、本発明の音声対話処理プログラムは、通話機能およびデータ通信機能を有する端末装置との間で通信ネットワークを介して音もしくは音声による音通信を行うとともに、Ｗｅｂページを用いて情報の提供や収集を行うコンテンツサーバに対して音声対話処理の結果にもとづく連携結果情報を端末装置に提供することを依頼するための音声対話処理プログラムであって、コンピュータに、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識するステップと、音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行うステップと、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出するステップと、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報をコンテンツサーバに送信することで、コンテンツサーバに対して確定音声対話結果情報にもとづく連携結果情報の提供依頼を行うステップとを実行させ、音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定させることを特徴とするものである。
【００４３】
上記の構成としたことで、音声対話処理が完了する前に通信回線が切断してしまっても、確定音声対話結果情報をコンテンツサーバに送信することができる。よって、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。また、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【００４４】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。
図１は、本発明の一実施形態である端末通信システム１０の構成の例を示すブロック図である。端末通信システム１０は、無線携帯端末２０と、音声対話サーバ３０と、コンテンツサーバ４０とを含む。無線携帯端末２０は、無線基地局５１を介してインターネット５０に接続され、無線基地局６１を介して一般公衆回線網６０に接続される。また、音声対話サーバ３０およびコンテンツサーバ４０は、それぞれ、インターネット５０に接続される。さらに、音声対話サーバ３０は、一般公衆回線網６０に接続される。なお、以下の説明において、インターネット５０と一般公衆回線網６０とを含めて通信ネットワークということがある。
【００４５】
無線携帯端末２０は、例えばＰＤＣ（Personal Digital Cellular）規格に準拠したディジタル携帯電話などの携帯電話端末によって構成される。無線携帯端末２０は、一般公衆回線網６０を介して接続先との間で音声通話を行うための通話機能を有するとともに、自己が備える例えばＬＣＤ（Liquid Crystal Display）などの表示装置にＷｅｂページを表示したり、自己が備える入力装置を用いてＷｅｂページ上で文字入力や情報選択を行うためのブラウザ機能を有している。無線携帯端末２０が有するブラウザ機能には、インターネット５０上にWebサイトを開設しているWWW（World Wide Web）サーバとの間で各種のデータを送受するデータ通信機能が含まれるものとする。なお、本例では、無線携帯端末２０が有するデータ通信機能によって、パケット通信によるデータ通信が実行される。無線携帯端末２０は、インターネット５０への接続や、インターネット５０を利用した情報の送受などを行うことができる環境（例えばブラウザなどのソフトウェアや、ハードウェアなどにおける環境）を備えている。
【００４６】
音声対話サーバ３０は、音声通信情報検出部３１と、音声対話制御部３２と、音声対話情報記憶部３３と、音声認識部３４と、音声ガイダンス生成部３５と、インターネット通信部３６とを含み、一般公衆回線網６０を介して入力した音データが示す音や音声を認識する音声認識機能と、発声しようとする言葉を示す文字情報にもとづいて音声合成して音声データの出力を行う音声合成機能とを有する。音声対話サーバ３０は、音声認識機能と音声合成機能とを用いて、音声による情報の伝達や情報の取得を行う音声対話処理を実行する。
【００４７】
音声通信情報検出部３１は、音声対話サーバ３０での音声通話に用いられる通信回線の使用状態を示す回線情報を監視し、通信回線が使用状態から切断された状態に変化したことを検出して、その検出結果を音声対話制御部３２に通知する。具体的には、例えば、音声通信情報検出部３１は、回線情報として通信回線の使用状態を示す信号を監視し、その信号が回線使用状態を示すレベルから回線切断状態を示すレベルに変化したことを検出したときに、通信回線が使用状態から切断状態に変化したことを示す回線切断検出信号を音声対話制御部３２に対して出力する。
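The level-change detection described above for the voice communication information detection unit 31 can be sketched as a simple edge detector. This is only an illustration under stated assumptions: the names `LineState` and `watch_line` are hypothetical, and the line state is assumed to arrive as a sequence of sampled values rather than as a hardware signal.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class LineState(Enum):
    IN_USE = "in_use"
    DISCONNECTED = "disconnected"


def watch_line(samples: Iterable[LineState],
               on_disconnect: Callable[[], None]) -> int:
    """Call on_disconnect once for each IN_USE -> DISCONNECTED transition.

    Returns the number of transitions seen. A line that is already
    disconnected at the first sample does not trigger the callback.
    """
    transitions = 0
    previous: Optional[LineState] = None
    for state in samples:
        if previous is LineState.IN_USE and state is LineState.DISCONNECTED:
            transitions += 1
            on_disconnect()  # plays the role of the line disconnection detection signal
        previous = state
    return transitions
```

Reacting to the transition rather than to the level means the callback fires once per disconnection, not once per sample taken while the line stays down.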
【００４８】
音声対話制御部３２は、音声対話サーバ３０内の各部を制御する機能を有する。例えば、音声対話制御部３２は、音声対話情報記憶部３３に記憶されている後述する音声対話処理プログラムに従って、音声認識部３４や音声ガイダンス生成部３５などを制御し、音声認識処理や音声出力処理を実行させることで、音声対話処理を実行させる。また、例えば、音声対話制御部３２は、音声対話処理プログラムに従って、インターネット通信部３６などを制御し、音声対話処理によって得られた音声対話結果データをコンテンツサーバ４０に向けて出力させる。
【００４９】
音声対話情報記憶部３３には、音声対話処理の処理内容を指定した音声対話処理プログラム、音声認識処理や音声合成処理で用いられる辞書データ、音声出力を行う際に使用される音声データが格納された音声ファイルなど、音声対話処理を実行するために用いられる各種の情報があらかじめ記憶される。なお、音声対話処理プログラムは、例えば、voiceXML（eXtensible Markup Language）などの音声対話処理の処理内容を指定するための音声対話処理用言語によって作成されたプログラムである。
【００５０】
音声認識部３４は、音声対話制御部３２の指示に従って、一般公衆回線網６０を介して入力した音データが示す音や音声を認識する音声認識処理を実行し、音声認識結果を音声対話制御部３２に対して送信する処理を実行する。
【００５１】
音声ガイダンス生成部３５は、音声対話制御部３２の指示に従って、音声合成機能もしくはあらかじめ用意された音声ファイルなどを用いて、音声対話処理にて発せられるガイダンスなどを示す音声データが含まれた音声ファイルを生成する。また、音声ガイダンス生成部３５は、生成した音声ファイルを音声対話制御部３２に送信する処理を実行する。
【００５２】
インターネット通信部３６は、インターネット５０に向けて情報を送信する処理や、インターネット５０からの情報を受信する処理を実行する。この例では、インターネット通信部３６は、音声対話制御部３２からの音声対話処理の結果を示す音声対話結果データをコンテンツサーバ４０に向けて送信する処理などを実行する。
【００５３】
コンテンツサーバ４０は、インターネット通信部４１と、コンテンツ制御部４２と、コンテンツ情報記憶部４３とを含む。コンテンツサーバ４０は、例えばＷＷＷサーバなどの情報処理装置により構成される。コンテンツサーバ４０は、例えばC-HTML(Compact HyperText Markup Language)などのマークアップ言語により作成されたＷｅｂページデータを管理し、Ｗｅｂページデータにもとづいて表示されるＷｅｂページを用いて、各種のコンテンツ（アプリケーションを作成する素材を意味するだけでなく、アプリケーションやサービスを含む概念である）の提供や情報の取得を行う機能を有している。Ｗｅｂページには、例えば、商品の受注を行うためのものや、特定の情報を検索するためのものや、アンケートの回収を行うためのものなどがある。
【００５４】
インターネット通信部４１は、コンテンツ制御部４２の制御に従って、インターネット５０に向けて情報を送信する処理や、インターネット５０からの情報を受信する処理を実行する。
【００５５】
コンテンツ制御部４２は、コンテンツ情報記憶部４３の記憶内容に従って、インターネット上にコンテンツサーバ４０が開設しているＷｅｂサイトの制御を行う。
【００５６】
コンテンツ情報記憶部４３は、各種のＷｅｂページデータなど、コンテンツサーバ４０が開設しているＷｅｂサイトの運営に必要な情報が格納されている。なお、本例では、コンテンツ情報記憶部４３に、本システムにユーザ登録されている各無線携帯端末についてのユーザ登録情報や後述する履歴情報なども格納される。
【００５７】
次に、本例の無線携帯端末システム１０の動作について図面を参照して説明する。図２は、本例の無線携帯端末システム１０における表示・音声連携処理および処理タイミングの一例を示すタイミングチャートである。
【００５８】
本例では、無線携帯端末２０は、本システム１０を管理するシステム管理者に対して、ユーザ登録を済ましているものとする。この例では、システム管理者は、音声対話サーバ３０およびコンテンツサーバ４０の双方を管理し、音声対話サーバ３０とコンテンツサーバ４０とを連携させたサービスを提供する。ユーザ登録の際に、本例では、無線携帯端末の電話番号を示す電話番号データ、無線携帯端末を管理しているユーザ名などの無線携帯端末に関する各種の情報がコンテンツ情報記憶部４３に登録される。
【００５９】
また、本例では、ユーザ登録済の各無線携帯端末について、音声対話サーバ３０やコンテンツサーバ４０からサービスを受けたときのサービス内容を示す履歴情報がコンテンツ情報記憶部４３に登録される。履歴情報は、具体的には、例えば、音声対話サーバ３０との間で実行された音声対話処理の結果を示す音声対話結果情報や、コンテンツサーバ４０から取得したＷｅｂページデータや無線携帯端末による入力情報などの情報である。履歴情報は、該当する無線携帯端末の電話番号を示す電話番号データに対応付けされた状態でコンテンツ情報記憶部４３に格納される。従って、音声対話サーバ３０およびコンテンツサーバ４０は、電話番号データを確認することで、どの無線携帯端末によって、どのＷｅｂページデータが取得されてどのような入力がなされたのかや、どのような音声対話処理結果が得られたのかなどを特定することができる。なお、ユーザ登録の際に登録された情報や履歴情報などが格納されるデータベースは、コンテンツサーバ４０に限らず、例えば音声対話サーバ３０が備えるようにしても、あるいは所定のデータベースサーバが備えることにしてもよく、システム１０内のどこに設置されていてもよい。
【００６０】
表示・音声連携処理において、先ず、無線携帯端末２０は、ユーザの操作に応じて、インターネット５０を介してコンテンツサーバ４０にアクセスする（ステップＳ１０１）。例えば、コンテンツサーバ４０が提供しているＷｅｂページのＵＲＬ（Uniform Resource Locator）を指定することでアクセスする。
【００６１】
無線携帯端末２０からのアクセスがあり、Ｗｅｂページを表示するためのＷｅｂページデータの取得要求があった場合には、コンテンツサーバ４０は、取得要求に応じて、無線携帯端末２０に向けてＷｅｂページデータをインターネット５０を介して送信する（ステップＳ１０２）。なお、この例では、送信されるＷｅｂページデータには、音声対話サーバ３０の電話番号を示す電話番号データが含まれている。
【００６２】
この例では、Ｗｅｂページデータには、音声対話サーバ３０との音声対話によって情報入力を行う処理を選択するための音声対話選択領域をＷｅｂページ上に表示するための音声対話選択領域表示データと、音声対話サーバ３０に向けて発呼するための電話番号を示す電話番号データとが、互いに関連付けされた状態で含まれている。すなわち、音声対話選択領域表示データと電話番号データとが、マークアップ言語によってＷｅｂページデータ内に表記されている。また、Ｗｅｂページデータ内に、マークアップ言語によって、音声対話選択領域表示データが示す音声対話選択領域が選択されると、電話番号データが示す電話番号を用いて発呼を行うように指示する記述がなされている。すなわち、Ｗｅｂページデータに、無線携帯端末２０においてｐｈｏｎｅ−ｔｏ機能（音声対話選択領域が選択されたことに応じて特定の相手に発呼する機能）が実現されるようにするための記述がなされている。
【００６３】
無線携帯端末２０は、Ｗｅｂページデータを受信すると、ブラウザ機能によって、受信したＷｅｂページデータにもとづくＷｅｂページを自己が備える表示装置に表示する（ステップＳ１０３）。
【００６４】
図３は、無線携帯端末２０に表示されるＷｅｂページの表示状態の例を示す説明図である。ここでは、コンテンツサーバ４０が、電車の駅に関する各種の情報（例えば、その駅の時刻表、駅周辺の地図や案内、駅構内の地図や案内など）を紹介するサービスを提供している場合を例に説明する。図３には、電車の駅に関する各種の情報を取得するためのＷｅｂページの表示状態の例が示されている。図３に示すように、Ｗｅｂページには、ガイダンスを表示するガイダンス表示領域７１と、駅名を入力するための入力領域７２と、音声対話によって情報入力を行うときに選択される音声対話選択領域７３と、入力領域７２に入力された駅についての情報検索を指示するときに選択される検索指示選択領域７４とが設けられている。
【００６５】
無線携帯端末２０の操作によってユーザによりＷｅｂページにおいて音声対話選択領域７３が選択されると、無線携帯端末２０のブラウザ機能は、通話機能を呼び出し（ステップＳ１０４）、音声対話選択領域７３を表示させるための音声対話選択領域表示データに関連付けされている電話番号データが示す電話番号を用いて発呼することを指示する。呼び出された通話機能は、ブラウザ機能からの指示に従って、Ｗｅｂページデータ内に設定されている電話番号データが示す電話番号を用いて、音声対話サーバ３０に向けて発呼を行う（ステップＳ１０５）。なお、ステップＳ１０５では、音声対話サーバ３０に対して、無線携帯端末２０の電話番号が通知される。
【００６６】
音声対話サーバ３０の音声対話制御部３２は、無線携帯端末２０からの発呼に応じて一般公衆回線網６０における通信回線を接続状態（通話状態）とし、無線携帯端末２０からの発信者番号通知によって特定される電話番号を示す電話番号データにもとづいて、音声対話処理の実行内容を決定する（ステップＳ１０６）。
【００６７】
ここで、ステップＳ１０６での音声対話処理の内容の決定処理について詳しく説明する。この例では、各無線携帯端末についての履歴情報などをコンテンツサーバ４０が管理しているので、音声対話制御部３２は、先ず、インターネット通信部３６を制御して無線携帯端末２０から受けた電話番号データをコンテンツサーバ４０に送信する。電話番号データを受信すると、コンテンツサーバ４０のコンテンツ制御部４２は、コンテンツ情報記憶部４３の格納情報の中から、無線携帯端末２０の電話番号データに対応付けられている履歴情報（例えば、最近追加された数バイト分のデータなど、履歴情報の一部であってもよい）を探索し、探索した履歴情報の中から無線携帯端末２０に最後に送信されたＷｅｂページデータを特定する。この特定したＷｅｂページデータにもとづいて、無線携帯端末２０がどのＷｅｂページを経由して音声対話サーバ３０に向けて発呼を行ったかを確認することができる。コンテンツ制御部４２は、特定したＷｅｂページデータから、無線携帯端末２０を用いてユーザがどのようなサービスを音声対話によって受けようとしていたかを確認し、インターネット通信部４１を制御して、その確認結果を音声対話サーバ３０に送信する。確認結果を受けると、音声対話制御部３２は、受信した確認結果を示す情報にもとづいて、実行する音声対話処理の内容を決定する。例えば、図３に示したＷｅｂページを経由して音声対話サーバ３０に向けて発呼を行ったことが特定された場合には、駅名の入力を音声対話によって行うための音声対話処理を実行することに決定する。このようにして、ステップＳ１０６にて音声対話処理の実行内容が決定される。
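The lookup performed in step S106 can be sketched as two small functions, one for each server. The data layout, the sample phone number, the page paths, and the dialogue names below are all hypothetical. The text only states that history information is keyed by phone number and that the last Web page sent determines the dialogue to run.

```python
from typing import Dict, List, Optional

# Hypothetical history store: phone number -> pages sent, oldest first.
HISTORY: Dict[str, List[str]] = {
    "09012345678": ["/top", "/station/search"],
}

# Hypothetical mapping from the last page served to a dialogue script.
DIALOGUE_FOR_PAGE: Dict[str, str] = {
    "/station/search": "ask_station_name",
}


def last_page_sent(phone_number: str) -> Optional[str]:
    """Content-server side: most recent page sent to this caller, if any."""
    pages = HISTORY.get(phone_number, [])
    return pages[-1] if pages else None


def choose_dialogue(phone_number: str) -> Optional[str]:
    """Voice-dialogue-server side: pick the dialogue for an incoming call."""
    page = last_page_sent(phone_number)
    if page is None:
        return None  # unknown caller or no history
    return DIALOGUE_FOR_PAGE.get(page)
```

In the arrangement described in the text, the first function would run on the content server 40 and its answer would travel back over the Internet 50. Here both run in one process for brevity.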
【００６８】
なお、各無線携帯端末についての履歴情報などをデータベースサーバが管理する構成とされている場合には、ステップＳ１０６にて、音声対話サーバ３０がデータベースサーバにアクセスすることで、無線携帯端末２０を用いてどのようなサービスを音声によって受けようとしていたかを確認するようにすればよい。
【００６９】
音声対話処理の実行内容を決定すると、音声対話サーバ３０の音声対話制御部３２は、決定した内容の音声対話処理を実行する。図４は、音声対話処理でやり取りされる対話内容の例を示す説明図である。図５は、音声対話サーバ３０が実行する音声対話処理の例を示すフローチャートである。
【００７０】
ここでは、音声対話サーバ３０と、無線携帯端末２０を使用するユーザとの間で、図４に示す内容の音声対話がなされるものとして説明する。音声対話処理において、音声対話制御部３２は、先ず、音声ガイダンス生成部３５にガイダンスを発声するための音声ファイルを生成させ、図４に示すような「駅名を発声してください。」との音声ガイダンスを出力させるための音声データを、一般公衆回線網６０を介して無線携帯端末２０に向けて出力する（ステップＳ２０１）。無線携帯端末２０は、受信した音声データにもとづいて、自己が備えるスピーカから「駅名を発声してください。」との音声を出力する。
【００７１】
次いで、「駅名を発声してください。」という音声ガイダンスに従ってユーザによって発声された音声を示す音声データが、無線携帯端末２０から一般公衆回線網６０を介して入力すると、音声対話制御部３２は、音声認識部３４を制御して、入力した音声データにもとづく音声認識処理を実行させる（ステップＳ２０２）。ステップＳ２０２では、音声認識部３４によって、駅名を示す音声を音声認識するためにあらかじめ作成されて音声対話情報記憶部３３に記憶されている辞書データである「駅名.dic」を用いて音声認識処理が実行される。ステップＳ２０２の音声認識処理にて辞書データ「駅名.dic」を使用することは、音声対話処理を実行する際に使用される音声対話処理プログラム内に表記されている。この例では、ステップＳ２０２にて、入力した音声データが「新宿」を示すものであるという音声認識結果が得られる。すなわち、音声認識結果を示す音声認識結果データとして、「新宿」を示す文字列データが得られる。本例では、音声対話制御部３２は、ステップＳ２０２にて取得した音声認識結果データを、音声対話結果を示す音声対話結果データとして仮決定する（ステップＳ２０３）。なお、仮決定された音声対話結果データは、音声対話情報記憶部３３に設けられている仮決定データ格納領域に格納される。
【００７２】
また、音声認識結果が得られると、音声対話制御部３２は、音声認識結果が適正なものかどうかを確認するなどのために、音声ガイダンス生成部３５に入力確認のためのガイダンスを発声するための音声ファイルを生成させ、ここでは「新宿でよろしいですか」との音声ガイダンスを出力させるための音声データを、一般公衆回線網６０を介して無線携帯端末２０に向けて出力する（ステップＳ２０４）。無線携帯端末２０は、受信した音声データにもとづいて、自己が備えるスピーカから「新宿でよろしいですか」との音声を出力する。
【００７３】
次いで、「新宿でよろしいですか」という音声ガイダンスに従ってユーザによって発声された音声を示す音声データが、無線携帯端末２０から一般公衆回線網６０を介して入力すると、音声対話制御部３２は、音声認識部３４を制御して、入力した音声データにもとづく音声認識処理を実行させる（ステップＳ２０５）。ステップＳ２０５では、「はい」、「いいえ」、「ＹＥＳ」、「ＮＯ」などの応答を示す音声を音声認識するためにあらかじめ作成されて音声対話情報記憶部３３に記憶されている辞書データである「yesno.dic」を用いて音声認識処理が実行される。ステップＳ２０５の音声認識処理にて辞書データ「yesno.dic」を使用することは、音声対話処理を実行する際に使用される音声対話処理プログラム内に表記されている。この例では、ステップＳ２０５にて、入力した音声データが「はい」を示すものであるという音声認識結果が得られる。すなわち、音声認識結果を示す音声認識結果データとして、「はい」を示す文字列データが得られる。
【００７４】
「はい」などの肯定的な回答を示す文字列データが得られた場合には（ステップＳ２０６）、音声対話制御部３２は、ステップＳ２０３にて仮決定データ格納領域に格納されている仮決定状態の音声対話結果データを、音声対話結果データとして確定させる。そして、音声対話制御部３２は、インターネット通信部３６を制御して、確定した音声対話結果データと、無線携帯端末２０の電話番号データとを、インターネット５０を介してコンテンツサーバ４０に向けて出力する（ステップＳ２０７、ステップＳ１０７）。なお、ステップＳ２０６にて「いいえ」などの否定的な回答を示す文字列データが得られていた場合には、ステップＳ２０１以降の処理を再度実行する。
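Steps S201 through S207 amount to a prompt, recognize, confirm, commit loop. The sketch below is a simplified illustration, not the patent's implementation: the recognizers are passed in as plain callables, `send` stands in for the transmission to the content server 40, and the `max_attempts` limit is an added assumption (the text simply returns to step S201 after a negative answer).

```python
from typing import Callable, Optional

AFFIRMATIVE = {"はい", "YES"}


def run_dialogue(recognize_station: Callable[[], str],
                 recognize_yes_no: Callable[[], str],
                 send: Callable[[str, str], None],
                 phone_number: str,
                 max_attempts: int = 3) -> Optional[str]:
    """Return the committed station name, or None if never confirmed."""
    for _ in range(max_attempts):
        # S201-S203: prompt, recognize, and hold the result provisionally.
        provisional = recognize_station()
        # S204-S206: read the result back and recognize the yes/no reply.
        if recognize_yes_no() in AFFIRMATIVE:
            # S207: commit and forward together with the caller's number.
            send(provisional, phone_number)
            return provisional
        # Negative reply: discard the provisional value and ask again.
    return None
```

Note that the send happens inside the loop, before any closing announcement would be played, which is the ordering the description relies on.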
【００７５】
音声対話結果データを送信すると、音声対話制御部３２は、音声対話処理を終了することを報知するために、音声ガイダンス生成部３５に処理の終了を報知するためのガイダンスを発声するための音声ファイルを生成させ、「了解しました。終了いたします。」との音声ガイダンスを出力させるための音声データを、インターネット通信部３６を制御して一般公衆回線網６０を介して無線携帯端末２０に向けて出力させる（ステップＳ２０８）。無線携帯端末２０は、受信した音声データにもとづいて、自己が備えるスピーカから「了解しました。終了いたします。」との音声を出力する。そして、音声対話サーバ３０は、通信回線を切断して通話状態を終了させ、音声対話処理を終了させる。
【００７６】
ステップＳ１０７にて送信された音声対話処理結果データおよび電話番号データを受信すると、コンテンツサーバ４０は、コンテンツ情報記憶部４３に格納されている受信した電話番号データと同一の電話番号データに対応付けして、受信した音声対話結果データを保存する（ステップＳ１０８）。
【００７７】
音声対話処理が終了すると、無線携帯端末２０の通話機能は、ブラウザ機能を呼び出す（ステップＳ１０９）。呼び出された無線携帯端末２０のブラウザ機能は、コンテンツサーバ４０に対して、無線携帯端末２０の表示装置に表示されている表示情報の更新を要求する（ステップＳ１１０）。コンテンツサーバ４０は、更新要求に応じて、無線携帯端末２０についての電話番号データに対応付けされている音声対話結果データをコンテンツ情報記憶部４３から読み出して、音声対話処理の結果を反映させたＷｅｂページデータを作成する（ステップＳ１１１）。そして、音声対話処理の結果を反映させたＷｅｂページデータを送信する（ステップＳ１１２）。
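On the content server side, steps S108 and S111 reduce to storing the result under the caller's phone number and later rendering it into the page. The following is a minimal sketch under assumptions: the in-memory dictionary, the field name `station`, and the single-element page are illustrative only, and a real C-HTML page would contain far more markup.

```python
import html
from typing import Dict

# Hypothetical store: phone number -> last committed dialogue result.
RESULTS: Dict[str, str] = {}


def store_result(phone_number: str, result: str) -> None:
    """S108: keep the received result under the caller's phone number."""
    RESULTS[phone_number] = result


def render_input_field(phone_number: str) -> str:
    """S111: build the input field pre-filled with the stored result."""
    value = html.escape(RESULTS.get(phone_number, ""), quote=True)
    return f'<input type="text" name="station" value="{value}">'
```

Escaping the recognized text before embedding it is a precaution added here, since the value originates outside the content server.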
【００７８】
Ｗｅｂページデータを受信すると、無線携帯端末２０のブラウザ機能によって、受信したＷｅｂページデータにもとづくＷｅｂページが表示される（ステップＳ１１３）。Ｗｅｂページの更新後の表示内容は、例えば図６に示すように、音声対話処理によって入力された情報の内容が反映された状態となっている。すなわち、図６には、音声対話処理にて音声入力された「新宿」なる駅名が、入力領域７２に文字入力された更新後のＷｅｂページの表示状態が示されている。なお、この状態で検索指示選択領域７４が押下されると、コンテンツサーバ４０にて「新宿」駅についての情報検索が実行される。
【００７９】
上述したように、音声対話サーバ３０が、音声対話処理の終了を報知するための処理を実行する前に、音声対話結果データをコンテンツサーバ４０に向けて送信する構成としたことで、音声対話結果データをコンテンツサーバ４０に送る通信処理を早期に開始することができる。よって、インターネット５０上のパケット通信におけるトラフィック（伝送されているデータ量）が多く通信回線が混雑していてデータ伝送時間が長くなってしまうようなときであっても、コンテンツサーバ４０での音声対話結果データの受信を早期に完了させることができる。従って、音声対話処理の終了後に無線携帯端末２０のブラウザ機能によって表示情報の更新要求がなされるときまでに、コンテンツサーバ４０での音声対話結果データの受信処理が完了した状態とすることができる。
【００８０】
音声対話処理の終了を報知するための処理（ステップＳ２０８）は、相当の期間（本例であれば３〜４秒程度の期間）を要するので、その間に、コンテンツサーバ４０での音声対話結果データの受信処理が完了した状態とすることができるのである。従って、音声対話処理の終了後直ちに無線携帯端末に表示されているＷｅｂページを更新させ、音声入力の結果を表示に反映させることができるようになる。
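The timing argument above can be made concrete with a small calculation. This is a deliberately simplified model under stated assumptions: transmission is assumed to start exactly when the closing announcement starts, all other delays are ignored, and the function name and parameters are hypothetical.

```python
def wait_after_call_end(transfer_s: float,
                        farewell_s: float,
                        send_before_farewell: bool) -> float:
    """Seconds the terminal still waits for the result once the call ends.

    transfer_s: time for the result to reach the content server.
    farewell_s: duration of the closing announcement (step S208).
    """
    if send_before_farewell:
        # The transfer overlaps with the announcement.
        return max(0.0, transfer_s - farewell_s)
    # The transfer only starts after the announcement has finished.
    return transfer_s
```

With a 3.5 second announcement, a transfer that takes 2 seconds is fully hidden when sent first, whereas sending afterwards leaves the full 2 seconds visible to the user.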
【００８１】
次に、音声対話処理の実行中に一般公衆回線網６０の通信回線が切断された場合の処理について説明する。ここでは、音声対話サーバ３０と無線携帯端末２０を使用するユーザとの間で図７に示す内容の音声対話がなされるものとして説明する。すなわち、音声認識結果の確認などのためになされるガイダンスの出力中に通信回線が切断された場合を例に説明する。
【００８２】
図８は、音声対話サーバ３０が実行する回線監視処理の例を示すフローチャートである。図９は、音声対話処理の実行中に一般公衆回線網６０の音声対話処理に用いられている通信回線が切断された場合の表示・音声連携処理および処理タイミングの例を示すタイミングチャートである。
【００８３】
回線監視処理は、例えば、音声対話処理が開始されたときに開始する。回線監視処理において、音声対話サーバ３０の音声通信情報検出部３１は、音声対話処理にて使用している通信回線の接続状態を監視する（ステップＳ３０１）。監視している通信回線が使用状態から切断された状態に変化したことを検出すると（ステップＳ３０１のＹ）、音声通信情報検出部３１は、通信回線が使用状態から切断状態に変化したことを示す回線切断検出信号を音声対話制御部３２に対して出力する。
【００８４】
通信回線の切断は、この例では、無線携帯端末２０を携帯しているユーザが電車などの移動体に乗っていてトンネルなどの無線通信インフラが整備されていない場所に移動したときなどに発生する電波障害が起きたとき、無線携帯端末２０のユーザが自発的に通信回線を切断する操作を行ったとき、あるいは音声対話処理が終了したときに検出される。
【００８５】
回線切断検出信号を受信すると、音声対話制御部３２は、音声対話情報記憶部３３の仮決定データ格納領域に未送信の音声対話結果データが格納されていた場合には（ステップＳ３０２のＹ）、その音声対話結果データをコンテンツサーバ４０に向けて出力するとともに、無線携帯端末２０の電話番号データもコンテンツサーバ４０に向けて出力する（ステップＳ３０３）。すなわち、ユーザからの入力情報として仮決定されている音声対話結果データを、音声対話結果データとして確定させ、確定させた音声対話結果データを送信する。
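The disconnect handling in steps S302 and S303 can be sketched as a session object that keeps at most one unsent provisional result. The class and method names are hypothetical, and `send` stands in for the transmission of the result and phone number to the content server 40.

```python
from typing import Callable, Optional


class DialogueSession:
    """Holds the provisional result for one call (illustrative only)."""

    def __init__(self, phone_number: str,
                 send: Callable[[str, str], None]) -> None:
        self.phone_number = phone_number
        self.send = send
        self.provisional: Optional[str] = None  # unsent, unconfirmed result

    def hold(self, recognized: str) -> None:
        """S203: keep a recognition result as provisional."""
        self.provisional = recognized

    def commit(self) -> bool:
        """Finalize and send the provisional result if one is pending."""
        if self.provisional is None:
            return False
        self.send(self.provisional, self.phone_number)
        self.provisional = None  # nothing left unsent
        return True

    def on_line_disconnected(self) -> bool:
        """S302-S303: a pending provisional result is committed as-is."""
        return self.commit()
```

Because `commit` clears the pending value, the disconnection that follows a normally completed dialogue finds nothing to send and the result is not transmitted twice.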
【００８６】
従って、音声対話処理にて、音声認識結果データが音声対話結果データとして仮決定されたあと、その音声対話結果データとすることに確定されてコンテンツサーバ４０に送信される前に、通信回線が切断されて音声対話処理が中途終了した場合には、仮決定状態の音声対話結果データが今後使用する音声対話結果データとして確定され、確定させた音声対話結果データがコンテンツサーバ４０に出力される。すなわち、この例では、図５に示した音声対話処理にて、ステップＳ２０３を終えたあとステップＳ２０７を終える前に通信回線が切断された場合には、音声対話処理が完了していなくても、音声対話結果データが出力される。
【００８７】
例えば、図９に示すように、音声対話処理において、音声対話制御部３２が「新宿でよろしいですか」との音声ガイダンスを出力させるための音声データの出力処理を行っている途中で（ステップＳ２０４参照）通信回線が切断した場合には、音声対話制御部３２によって、音声対話情報記憶部３３の仮決定データ格納領域に格納されている「新宿」を示す仮決定状態の音声対話結果データを用いて今後の制御を行うことが確定され、確定した音声対話結果データとして無線携帯端末２０の電話番号データとともにコンテンツサーバ４０に向けて出力される（ステップＳ１０７ａ）。
【００８８】
上述したように、音声対話処理に用いられている通信回線の接続状態を監視し、その通信回線が切断したときに仮決定状態の音声対話結果データが存在していた場合に、その仮決定状態の音声対話結果データを使用して今後の処理を実行することに確定させ、確定させた音声対話結果データをコンテンツサーバ４０に向けて出力する構成としたので、音声対話処理が中途終了した場合であっても音声対話結果データを出力することができる。
【００８９】
よって、電波障害によって音声対話処理が継続できなくなってしまったときであっても、音声認識処理が行われて音声対話結果データが仮決定されていれば、無線携帯端末２０の表示画面に音声入力された情報の内容を反映させることができるようになる。従って、ユーザは、再度の音声対話処理を行う必要がなくなる。
【００９０】
また、音声対話結果データを仮決定したあと、例えば音声認識結果の確認のためのガイダンスの報知を終える前に、ユーザによって通信回線が切断された場合であっても、音声対話結果データを出力することができる。従って、ユーザは、音声入力により駅名を入力したあと、音声認識結果の確認のためのガイダンスを聞くことなく通信回線を自発的に切断するようにすれば、確認依頼に対する応答（「はい」、「いいえ」などの応答）を行う必要がなく、無線携帯端末２０の表示画面に音声入力した情報の内容を迅速に反映させることができるようになる。よって、ユーザにとっては、情報入力などを行うための音声対話処理を短時間で終了させることができる。
【００９１】
なお、上述した実施の形態では、駅名という１つの情報を得るために音声対話処理が実行されていたが、複数の情報（例えば、住所、氏名などの情報）を得るための音声対話処理が実行されるようにしてもよい。この場合、例えば、複数の音声対話結果データを取得して確定させるための音声対話を実行したあと、コンテンツサーバ４０に確定した複数の音声対話結果データをまとめて送信し、その後に音声対話処理の終了を示すガイダンスを行うようにすればよい。複数の音声対話結果データがまとめて送信されてきた場合には、コンテンツサーバ４０は、複数の音声対話結果データのそれぞれが反映されたＷｅｂページデータを作成して無線携帯端末２０に送信するようにすればよい。
【００９２】
複数の情報を得るための音声対話処理が実行される場合、音声対話結果データに確定するための処理（確認処理）は、複数の仮決定状態の音声対話結果データの全てを取得したあとにその全てについてまとめて行うようにしてもよく、複数の仮決定状態の音声対話結果データの所定の一単位（例えば３つを一単位とする場合には３つ分）が取得される毎にその一単位についてまとめて行うようにしてもよく、仮決定状態の音声対話結果データが取得される毎に順次行うようにしてもよい。また、上記のように、複数の情報を得るための音声対話処理が実行される場合には、複数の情報の一部についての音声入力が終了している状態で通信回線が切断して音声対話処理が中途終了した場合に、音声認識結果が得られている一部の情報については確定した音声対話結果データとしてコンテンツサーバ４０に送信され、無線携帯端末２０の表示情報に反映されるようにすればよい。そして、その後の音声対話処理において、無線携帯端末２０の表示情報に反映されていない情報についてのみ、音声入力するための処理が実行されるようにすればよい。
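Two pieces of bookkeeping follow from the multi-item case described above: grouping provisional results into confirmation units, and working out which items still need voice input after an interrupted call. The sketch below is illustrative; the function names and the field names used in the usage example are hypothetical.

```python
from typing import Dict, List


def confirmation_units(pending: List[str], unit_size: int) -> List[List[str]]:
    """Group provisional items so each unit is confirmed together.

    unit_size=1 confirms every item in turn; unit_size=len(pending)
    confirms everything at once.
    """
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    return [pending[i:i + unit_size] for i in range(0, len(pending), unit_size)]


def remaining_fields(required: List[str], reflected: Dict[str, str]) -> List[str]:
    """Fields not yet reflected on the page, in their original order."""
    return [name for name in required if name not in reflected]
```

For example, if `address` was already reflected before the line dropped, `remaining_fields(["address", "name"], {"address": "東京都"})` leaves only `name` to be asked in the follow-up dialogue.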
【００９３】
また、複数の情報（例えば、住所、氏名などの情報）を得るための音声対話処理が実行される場合に、音声認識結果の確認処理を行うことなく、取得した音声認識結果データをそのまま音声対話結果データとして確定させ、コンテンツサーバ４０に順次送信する構成としてもよい。この場合、例えば図１０に示すように、音声認識結果が得られる毎に、音声対話結果データがコンテンツサーバ４０に向けて送信される（ステップＳ１０７ａ、ステップＳ１０７ｂ）。コンテンツサーバ４０は、音声対話結果データを受信する毎に上述したステップＳ１０８と同様にして保存する（ステップＳ１０８ａ、ステップＳ１０８ｂ）。このように構成すれば、コンテンツサーバ４０が早期に音声対話結果データを受信しておくことができ、音声対話処理の完了後直ちにＷｅｂページデータを更新するための処理を実行することができるようになる。なお、コンテンツサーバ４０が、音声対話結果データを受信する毎に、音声対話結果データを反映させたＷｅｂページデータを作成して無線携帯端末２０に送信する構成としてもよい。この場合、無線携帯端末２０は、音声対話処理中に音声入力を終える毎にブラウザ機能を呼び出して表示しているＷｅｂページを更新し、通話機能を呼び出して音声対話処理の続きを実行するようにすればよい。
【００９４】
また、上述した実施の形態では、無線携帯端末２０が携帯電話端末であるものとして説明していたが、ブラウザ機能と通話機能とをともに備えるものであれば、PDA(Personal Digital Assistants)やパーソナルコンピュータなどの他の端末装置であってもよい。また、無線通信を行う無線携帯端末２０を例にしたが、有線による通信を行う端末であっても本発明を適用することができる。
【００９５】
また、上述した実施の形態では、データ通信をインターネット５０を利用して行う構成としていたが、ＬＡＮなどの他の通信ネットワークによって行う構成とされていてもよい。また、データ通信をパケット通信によって行う構成としていたが、他の通信方法であってもよい。
【００９６】
また、上述した実施の形態において、音声対話サーバ３０とコンテンツサーバ４０とが一つのサーバ（連携サーバ、センタ）として運営されていてもよい。この場合、音声対話サーバ３０とコンテンツサーバ４０とが専用回線などによって接続されるようにしてもよい。
【００９７】
また、上述した実施の形態では、表示画面の情報入力を音声対話処理によって行う構成としていたが、電話の着信音として再生される着信メロディなどの表示されない情報を提供するために音声対話処理を実行する構成としてもよい。この場合、音声対話処理によってユーザが音声入力した情報によって特定されるタイトルの着信メロディを無線携帯端末２０に提供するようにすればよい。
【００９８】
また、上述した実施の形態では、コンテンツサーバ４０が、無線携帯端末２０からのＷｅｂページ取得要求（ステップＳ１１０のＷｅｂページ更新要求）に応じて、音声対話結果データを取得して（ステップＳ１１１）、音声対話結果データが示す音声対話結果を反映させたＷｅｂページデータを送信する構成（ステップＳ１１２）としていたが、音声対話結果データを保存したあと（ステップＳ１０８）に、無線携帯端末２０からのＷｅｂページ取得要求の有無に関わらず、音声対話結果データが示す音声対話結果を反映させたＷｅｂページデータを無線携帯端末２０に送信する構成としてもよい。このように構成すれば、Ｗｅｂページ取得要求を無線携帯端末２０に行わせることなく、音声対話処理の結果を、Ｗｅｂページに反映させることができる。
【００９９】
また、上述した各実施の形態では、Ｗｅｂページデータを生成するための表示用言語として、携帯電話端末のブラウザでWebページの表示などを行うために広く用いられているC-HTML(Compact HTML)を例にしていたが、HTML、HDML(Handheld Device Markup Language)、WML(Wireless Markup Language)などの他のマークアップ言語を用いるようにしてもよい。
【０１００】
さらに、上述した各実施の形態では、音声対話サーバ３０と無線携帯端末２０とが一般公衆回線網６０に接続され、一般公衆回線網６０を介して音声通話を行う構成としていたが、VoIP(Voice over Internet Protocol)等のＩＰネットワークに接続して音声通信を行う構成としてもよい。また、上述した各実施の形態では、音声対話処理が音声によって行われる構成としていたが、無線携帯端末２０を管理するユーザがＤＴＭＦ（Dual Tone Multi Frequency）信号によって音声入力を行い、音声対話サーバ３０が音声認識処理にて取得したＤＴＭＦ信号に対応するキーを表す文字を取得するようにしてもよい。
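For the DTMF variant mentioned above, turning a detected tone into the character of the corresponding key is a table lookup. The sketch uses the standard DTMF frequency grid (low group 697, 770, 852, 941 Hz; high group 1209, 1336, 1477, 1633 Hz). It assumes the two frequencies have already been detected exactly, which a real recognizer would have to do from the audio signal. The function names are hypothetical.

```python
from typing import Iterable, Tuple

LOW_HZ = (697, 770, 852, 941)        # one frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)   # one frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_key(low_hz: int, high_hz: int) -> str:
    """Map a (low, high) frequency pair to its key character.

    Raises ValueError if either frequency is not in the DTMF grid.
    """
    row = LOW_HZ.index(low_hz)
    col = HIGH_HZ.index(high_hz)
    return KEYPAD[row][col]


def decode(tones: Iterable[Tuple[int, int]]) -> str:
    """Decode a sequence of detected tone pairs into a key string."""
    return "".join(dtmf_key(low, high) for low, high in tones)
```

The decoded string can then be treated like any other recognition result, for example held provisionally and confirmed before being sent on.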
【０１０１】
さらに、上述した各実施の形態では、音声対話サーバ３０は、上述した各種の処理を実行するための音声対話処理プログラムにもとづいて動作を行っている。例えば、この音声対話処理プログラムは、音声対話サーバ３０に、無線携帯端末２０から通信ネットワーク６０を介して受信した音もしくは音声を示す音データを認識するステップと、音声認識結果を利用して、無線携帯端末２０との間で通信ネットワーク６０を利用した音通信によって音声対話処理を行うステップと、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出するステップと、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報をコンテンツサーバ４０に送信することで、コンテンツサーバ４０に対して確定音声対話結果情報にもとづく連携結果情報の提供依頼を行うステップとを実行させ、音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定させる処理を実行させるプログラムである。なお、コンテンツサーバ４０も、上述した各種の処理を実行するためのデータ処理プログラムにもとづいて動作を行っている。
【０１０２】
【発明の効果】
以上のように、本発明の端末通信システムによれば、音声通信機能およびパケット通信機能を有する端末装置と、端末装置との間で音声通話を行う音声制御部と、音声制御部で受信した端末装置からの音声信号を認識し認識結果を出力する音声認識部と、音声通話の回線情報を監視し音声通話の中断を検出する回線情報検出部と、音声通話による音声対話終了時もしくは回線情報検出部にて音声通話の中断が検出されたときに、音声認識部にて得られた認識結果もしくは当該認識結果にもとづく情報をパケット通信により端末装置に送信するパケット制御部とを有するセンタとを備えたことを特徴とするので、音声通話が中断して音声対話処理が中途終了してしまっても、音声対話結果データとしての音声認識部にて得られた認識結果もしくは当該認識結果にもとづく情報を、端末装置に提供することができるようになる。
【０１０５】
また、本発明の端末通信システムによれば、連携サーバは、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識する音声認識部と、音声認識部による音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行う音声対話処理部と、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出する回線切断検出部と、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報にもとづく連携結果情報を、通信ネットワークを利用したデータ通信によって端末装置に提供する連携結果情報提供部とを含み、音声対話処理部は、回線切断検出部によって音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識処理部による音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定することを特徴とするので、音声対話処理が完了する前に通信回線が切断してしまっても、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。よって、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく連携結果情報を提供することができる。
【０１０６】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、確認がとれた場合に音データの音声認識結果を確定音声対話結果情報とすることに決定するように構成されている場合には、通信回線が切断しなかった場合には、音声対話処理の終了を示す報知を行う前に、確認がとれた音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができ、通信回線が切断した場合には、確認がとれる前の音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。
【０１０７】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、一単位の音データの音声認識結果についてそれぞれ確認がとれた場合に、当該一単位の音データの音声認識結果をそれぞれ確定音声対話結果情報とすることに決定する構成とされている場合には、音声対話処理の終了を示す報知を行う前に、複数の音データの音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができる。
【０１０９】
連携サーバが、端末装置との間で通信ネットワークを介して音もしくは音声による音通信を行う音声対話サーバと、Ｗｅｂページを用いて情報の提供や収集を行うコンテンツサーバとを含み、音声対話サーバとコンテンツサーバを用いて確定音声対話結果情報にもとづく連携結果情報を端末装置に提供するように構成されている場合には、音声対話処理などを実行するサーバとＷｅｂページを用いた情報の提供などを実行するサーバとが別個に備えられているシステムにおいて、音声対話処理が完了する前に通信回線が切断してしまっても、音声対話結果にもとづく連携結果情報を端末装置に提供することができる。よって、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく連携結果情報を提供することができる。
【０１１０】
端末装置と連携サーバとで行われるデータ通信は、パケット通信により行われるように構成されている場合には、音声対話処理の終了を示す報知を行う前にパケット通信によって音声対話結果情報が送信されるので、パケット通信が行われる通信ネットワークが混雑していても、遅延することなく連携結果情報を端末装置に提供することができる。
【０１１１】
連携結果情報が、確定音声対話結果情報が反映されたＷｅｂページデータ、または確定音声対話結果情報にもとづいて選択された選択データである構成とされている場合には、確定音声対話結果情報が反映されたＷｅｂページデータにもとづくＷｅｂページを端末装置に表示させることができ、あるいは確定音声対話結果情報にもとづいて選択された選択データを端末装置に提供することができる。
【０１１２】
また、本発明の連携サーバによれば、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識する音声認識部と、音声認識部による音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行う音声対話処理部と、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出する回線切断検出部と、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報にもとづく連携結果情報を、通信ネットワークを利用したデータ通信によって端末装置に提供する連携結果情報提供部とを含み、音声対話処理部は、回線切断検出部によって音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識処理部による音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定することを特徴とするので、音声対話処理が完了する前に通信回線が切断してしまっても、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。よって、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく連携結果情報を提供することができる。
【０１１３】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、確認がとれた場合に音データの音声認識結果を確定音声対話結果情報とすることに決定するように構成されている場合には、通信回線が切断しなかった場合には、音声対話処理の終了を示す報知を行う前に、確認がとれた音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができ、通信回線が切断した場合には、確認がとれる前の音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。
【０１１４】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、一単位の音データの音声認識結果についてそれぞれ確認がとれた場合に、当該一単位の音データの音声認識結果をそれぞれ確定音声対話結果情報とすることに決定する構成とされている場合には、音声対話処理の終了を示す報知を行う前に、複数の音データの音声認識結果にもとづく連携結果情報を迅速に端末装置に提供することができる。
【０１１５】
また、本発明の音声対話サーバによれば、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識する音声認識部と、音声認識部による音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行う音声対話処理部と、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出する回線切断検出部と、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報をコンテンツサーバに送信することで、コンテンツサーバに対して確定音声対話結果情報にもとづく連携結果情報の提供依頼を行う確定音声対話結果情報送信部とを含み、音声対話処理部は、回線切断検出部によって音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識処理部による音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定することを特徴とするので、音声対話処理が完了する前に通信回線が切断してしまっても、確定音声対話結果情報をコンテンツサーバに送信することができる。よって、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。また、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【０１１６】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、確認がとれた場合に音データの音声認識結果を確定音声対話結果情報とすることに決定するように構成されている場合には、通信回線が切断しなかった場合には、音声対話処理の終了を示す報知を行う前に、確認がとれた音声認識結果にもとづく確定音声対話結果情報をコンテンツサーバに送信することができ、通信回線が切断した場合には、確認がとれる前の音声対話結果にもとづく確定音声対話結果情報をコンテンツサーバに送信することができるようになる。
【０１１７】
音声対話処理部が、音声認識部による端末装置からの音データの音声認識結果が適正であるか否かを当該端末装置との音声対話処理によって確認し、一単位の音データの音声認識結果についてそれぞれ確認がとれた場合に、当該一単位の音データの音声認識結果をそれぞれ確定音声対話結果情報とすることに決定する構成とされている場合には、音声対話処理の終了を示す報知を行う前に、複数の音データの音声認識結果にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【０１１８】
また、本発明の音声対話処理方法によれば、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識するステップと、音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行うステップと、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出するステップと、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報をコンテンツサーバに送信することで、コンテンツサーバに対して確定音声対話結果情報にもとづく連携結果情報の提供依頼を行うステップとを含み、音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定することを特徴とするので、音声対話処理が完了する前に通信回線が切断してしまっても、確定音声対話結果情報をコンテンツサーバに送信することができる。よって、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。また、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【０１１９】
さらに、本発明の音声対話処理プログラムによれば、コンピュータに、端末装置から通信ネットワークを介して受信した音もしくは音声を示す音データを認識するステップと、音声認識結果を利用して、端末装置との間で通信ネットワークを利用した音通信によって音声対話処理を行うステップと、音声対話処理に用いられる通信回線の接続状態を監視し、当該通信回線が切断したことを検出するステップと、音声対話処理にて得られた音声認識結果情報のうち音声対話結果情報として取り扱うことが確定された確定音声対話結果情報をコンテンツサーバに送信することで、コンテンツサーバに対して確定音声対話結果情報にもとづく連携結果情報の提供依頼を行うステップとを実行させ、音声対話処理において確定音声対話結果情報が得られる前に通信回線が切断されたことが検出された場合には、既に得られている音声認識結果情報のうち、確定音声対話結果情報とされていない音声認識結果情報を確定音声対話結果情報とすることに決定させることを特徴とするので、音声対話処理が完了する前に通信回線が切断してしまっても、確定音声対話結果情報をコンテンツサーバに送信することができる。よって、音声対話結果にもとづく連携結果情報を端末装置に提供することができるようになる。また、音声対話処理が完了する前に通信回線が切断してしまっても、音もしくは音声入力されて既に音声認識されている情報があれば、その音声認識結果情報にもとづく確定音声対話結果情報をコンテンツサーバに送信することができる。
【図面の簡単な説明】
【図１】本発明の端末通信システムの一実施の形態における無線携帯端末システムの構成の例を示すブロック図である。
【図２】表示・音声連携処理および処理タイミングの一例を示すタイミングチャートである。
【図３】Ｗｅｂページの表示状態の例を示す説明図である。
【図４】音声対話の内容の例を示す説明図である。
【図５】音声対話処理の例を示すフローチャートである。
【図６】更新後のＷｅｂページの表示状態の例を示す説明図である。
【図７】音声対話の内容の他の例を示す説明図である。
【図８】回線監視処理の例を示すフローチャートである。
【図９】表示・音声連携処理および処理タイミングの他の例を示すタイミングチャートである。
【図１０】表示・音声連携処理および処理タイミングのさらに他の例を示すタイミングチャートである。
【符号の説明】
１０無線携帯端末システム
２０無線携帯端末
３０音声対話サーバ
３１音声通信情報検出部
３２音声対話制御部
３３音声対話情報記憶部
３４音声認識部
３５音声ガイダンス生成部
３６インターネット通信部
４０コンテンツサーバ
４１インターネット通信部
４２コンテンツ制御部
４３コンテンツ情報記憶部
５０インターネット
６０一般公衆回線網
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a terminal communication system, a linkage server, a voice dialogue server, a voice dialogue processing method, and a voice dialogue processing program that can, for example, promptly reflect information input by voice dialogue processing in the display content of a Web page, and can reflect that information in the display content of the Web page even when the communication line used for the voice dialogue processing is disconnected.
[0002]
[Prior art]
Conventionally, systems have been used that link a display service, which displays Web pages by means of a WWW (World Wide Web) server connected to a data communication network such as the Internet, with a voice service, which uses the voice dialogue function of a voice dialogue server connected to a sound communication network such as a general public telephone network.
[0003]
One system that links a display service and a voice service is the wireless portable terminal communication system disclosed in, for example, JP-A-2002-268241. This wireless portable terminal communication system consists of a mobile phone terminal having a browser function and a call function, and a center that includes a content server that executes the display service and a voice dialogue server that executes the voice service. In this system, the center provides a linked service in which input to an information input area, displayed on the display device of the mobile phone terminal by the content server's display service, is accepted as voice input through voice dialogue processing. Specifically, character string information based on the voice information input in the voice dialogue processing is obtained by voice recognition processing, and processing is executed to show, in the information input area displayed on the display device of the mobile phone terminal, a display based on the character string information indicating the voice recognition result.
[0004]
[Problems to be solved by the invention]
In the voice recognition performed during voice dialogue processing, sounds and voices input from the mobile phone terminal are not always recognized accurately as the content the user intended, for example because the user has idiosyncratic speech habits. For this reason, voice dialogue processing generally includes a confirmation process that asks the user of the mobile phone terminal whether the voice recognition result is appropriate, for example by speaking the recognition result back to the user. Once the user has confirmed through this process that the recognition result is appropriate, the character string information indicating that result is fixed as voice dialogue result information, and the confirmed voice dialogue result information is transmitted from the voice dialogue server to the content server. The content server then generates cooperation information based on the received voice dialogue result information and transmits it to the mobile phone terminal. As a result, the display screen of the mobile phone terminal's display device shows, based on the cooperation information, a display that reflects the content of the voice uttered by the user.
[0005]
However, the communication line used for the voice dialogue processing may be disconnected before the confirmation process is completed for some reason, for example because the user carrying the mobile phone terminal moves out of radio range during the voice dialogue processing. If the voice dialogue processing ends prematurely in this way before the character string information indicating the voice recognition result has been fixed as voice dialogue result information, the unconfirmed voice dialogue result information is discarded, and subsequent processing such as display in the information input area is not performed. Thus, even after the user has made a voice input, if the voice dialogue processing ends prematurely before the confirmation process is completed, the voice uttered by the user is not reflected as input to the information input area, and the voice dialogue processing must be redone from the beginning. In short, there has been a problem that, if the communication line used for the voice dialogue processing is disconnected before the processing is completed, the processing ends prematurely without reflecting information that has already been input by voice.
[0006]
In addition, the confirmed voice dialogue result information is transmitted from the voice dialogue server to the content server after the voice dialogue processing is completed. When a large amount of data is being transmitted on the data communication network and the communication line is congested, this transmission takes longer, and the content server acquires the voice dialogue result information later. That in turn delays the point at which a display based on the cooperation information can appear on the display screen of the mobile phone terminal's display device, so a display reflecting the result of the voice dialogue processing cannot be shown immediately after the processing is completed. There has thus been a problem that the result of the voice dialogue processing cannot be reflected promptly in the display content of the mobile phone terminal.
[0007]
An object of the present invention is to solve the above problems by enabling the result of voice dialogue processing to be reflected in display content and the like at an earlier stage, and by enabling information input by voice through the voice dialogue processing to be reflected in display content and the like even if the voice dialogue processing has not been completed.
[0008]
[Means for Solving the Problems]
In order to solve the above problems, a terminal communication system according to the present invention comprises a terminal device (for example, the wireless portable terminal 20) having a voice call function and a packet communication function, and a center that includes: a voice control unit (for example, the voice dialogue control unit 32) that conducts a voice call with the terminal device; a voice recognition unit (for example, the voice recognition unit 34) that recognizes the voice signal from the terminal device received by the voice control unit and outputs a recognition result; a line information detection unit (for example, the voice communication information detection unit 31) that monitors line information of the voice call and detects interruption of the voice call; and a packet control unit (for example, the content control unit 42) that, when the voice dialogue over the voice call ends or when the line information detection unit detects interruption of the voice call (for example, of the voice dialogue processing), transmits the recognition result obtained by the voice recognition unit, or information based on the recognition result (for example, voice dialogue result data), to the terminal device by packet communication.
[0009]
With the above configuration, even if the voice call is interrupted and the voice dialogue processing ends prematurely, the recognition result obtained by the voice recognition unit, or information based on the recognition result, can be provided to the terminal device as voice dialogue result data.
[0014]
The terminal communication system of the present invention (for example, the wireless portable terminal system 10) includes a terminal device (for example, the wireless portable terminal 20) having a call function and a data communication function, and a cooperation server (for example, a center including the voice dialogue server 30 and the content server 40) that performs voice dialogue processing with the terminal device via a communication network (for example, the general public line network 60) and provides cooperation result information based on the result of the voice dialogue processing to the terminal device via a communication network (for example, the Internet 50). The cooperation server includes: a voice recognition unit (for example, the voice recognition unit 34) that recognizes sound data indicating sound or voice received from the terminal device via the communication network; a voice dialogue processing unit (for example, the voice dialogue control unit 32) that uses the recognition result of the voice recognition unit to perform voice dialogue processing with the terminal device by sound communication over the communication network; a line disconnection detection unit (for example, the voice communication information detection unit 31) that monitors the connection state of the communication line used for the voice dialogue processing and detects that the communication line has been disconnected; and a cooperation result information providing unit (for example, the content control unit 42 and the Internet communication unit 41) that provides cooperation result information to the terminal device by data communication over the communication network. The cooperation result information (for example, the voice dialogue result data itself, or information based on it such as information generated or extracted on the basis of the voice dialogue result data) is based on confirmed voice dialogue result information, that is, the voice recognition result information obtained in the voice dialogue processing that has been fixed for handling as voice dialogue result information (for example, voice dialogue result data that has been fixed for use in subsequent processing). When the line disconnection detection unit detects that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice dialogue processing unit determines that voice recognition result information that has already been obtained by the voice recognition unit but has not yet been made confirmed voice dialogue result information (for example, voice dialogue result data in a provisionally determined state) is to be treated as confirmed voice dialogue result information.
[0015]
With the above configuration, even if the communication line is disconnected before the voice dialogue processing is completed, cooperation result information based on the voice dialogue result can be provided to the terminal device. Accordingly, even if the communication line is disconnected before the voice dialogue processing is completed, if there is information that was input by sound or voice and has already been recognized, cooperation result information based on that voice recognition result information can be provided.
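The promotion of provisional recognition results on line disconnection can be illustrated with a minimal Python sketch. It is not taken from the specification: the class names, the field names, and the dictionary payload format are all hypothetical, and only the decision rule (results not yet confirmed are treated as confirmed once disconnection is detected) follows the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROVISIONAL = "provisional"  # recognized, but not yet confirmed by the user
    CONFIRMED = "confirmed"      # fixed as voice dialogue result information


@dataclass
class RecognitionResult:
    field_name: str              # hypothetical name of a Web-page input field
    text: str                    # character string produced by voice recognition
    status: Status = Status.PROVISIONAL


@dataclass
class DialogueSession:
    results: list = field(default_factory=list)

    def add_recognized(self, field_name, text):
        """Store a new recognition result in the provisional state."""
        self.results.append(RecognitionResult(field_name, text))

    def confirm(self, field_name):
        """The user confirmed the result during the dialogue."""
        for r in self.results:
            if r.field_name == field_name:
                r.status = Status.CONFIRMED

    def on_line_disconnected(self):
        """Promote every provisional result to confirmed and return them."""
        promoted = [r for r in self.results if r.status is Status.PROVISIONAL]
        for r in promoted:
            r.status = Status.CONFIRMED
        return promoted

    def confirmed_payload(self):
        """Data that would be handed to the content server (assumed format)."""
        return {r.field_name: r.text
                for r in self.results if r.status is Status.CONFIRMED}
```

In this sketch, a session holding one confirmed and one provisional result reports only the confirmed one until `on_line_disconnected()` is called, after which both appear in `confirmed_payload()`.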
[0016]
The voice dialogue processing unit may be configured to confirm, through voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for sound data from the terminal device is appropriate and, when confirmation is obtained, to determine that the recognition result is to be treated as confirmed voice dialogue result information.
[0017]
With the above configuration, when the communication line has not been disconnected, the cooperation result information based on the confirmed voice recognition result is promptly transmitted to the terminal device before the notification indicating the end of the voice conversation processing is performed. When the communication line is disconnected, the cooperation result information based on the voice conversation result before confirmation can be provided to the terminal device.
[0018]
The voice dialogue processing unit may be configured to confirm, through voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for sound data from the terminal device is appropriate and, each time confirmation is obtained for the recognition result of one unit of sound data, to determine that the recognition result of that unit of sound data is to be treated as confirmed voice dialogue result information.
[0019]
With the above configuration, it is possible to promptly provide the terminal device with cooperation result information based on the voice recognition results of a plurality of sound data before performing notification indicating the end of the voice conversation processing.
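As a rough illustration of per-unit confirmation, the following Python sketch forwards each unit of recognized input as soon as the user confirms it, so that every transmission happens before the end-of-dialogue notification. The function name, the tuple layout, and the `send` and `announce_end` callbacks are assumptions made for this example and do not appear in the specification.

```python
def run_dialogue(units, send, announce_end):
    """Forward each confirmed unit immediately, then announce the end.

    units        -- iterable of (name, recognized_text, user_confirmed) tuples
    send         -- callable standing in for transmission to the content server
    announce_end -- callable standing in for the end-of-dialogue notification
    Returns the number of units that were sent.
    """
    sent = 0
    for name, text, user_confirmed in units:
        if user_confirmed:
            # Confirmed for this one unit: send now rather than waiting
            # for the whole dialogue to finish.
            send({name: text})
            sent += 1
    announce_end()
    return sent
```

Calling it with two confirmed units and one rejected unit results in two `send` calls followed by a single `announce_end` call.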
[0022]
The cooperation server may include a voice dialogue server that performs sound communication by sound or voice with the terminal device via a communication network, and a content server that provides and collects information using Web pages, and may be configured to use the content server to provide the terminal device with cooperation result information based on the confirmed voice dialogue result information.
[0023]
With the above configuration, in a system in which the server that executes voice dialogue processing and the server that provides information using Web pages are provided separately, cooperation result information based on the voice dialogue result can be provided to the terminal device even if the communication line is disconnected before the voice dialogue processing is completed. Accordingly, even if the communication line is disconnected before the voice dialogue processing is completed, if there is information that was input by sound or voice and has already been recognized, cooperation result information based on that voice recognition result information can be provided.
[0024]
Data communication performed between the terminal device and the cooperation server may be configured to be performed by packet communication.
[0025]
With the above configuration, the voice dialogue result information is transmitted by packet communication before the notification indicating the end of the voice dialogue processing is given, so that even if the communication network carrying the packet communication is congested, the cooperation result information can be provided to the terminal device without undue delay.
[0026]
The cooperation result information may be configured to be Web page data reflecting the confirmed voice conversation result information or selection data selected based on the confirmed voice conversation result information.
[0027]
With the above configuration, a Web page based on Web page data reflecting the confirmed voice dialogue result information can be displayed on the terminal device. In addition, selection data selected on the basis of the confirmed voice dialogue result information (for example, data not accompanied by any display, such as a ringtone melody played as the ringing tone when a call is received) can be provided to the terminal device.
[0028]
The cooperation server of the present invention performs voice dialogue processing, via a communication network, with a terminal device having a call function and a data communication function, and provides cooperation result information based on the result of the voice dialogue processing to the terminal device via the communication network. The cooperation server includes: a voice recognition unit that recognizes sound data indicating sound or voice received from the terminal device via the communication network; a voice dialogue processing unit that uses the recognition result of the voice recognition unit to perform voice dialogue processing with the terminal device by sound communication over the communication network; a line disconnection detection unit that monitors the connection state of the communication line used for the voice dialogue processing and detects that the communication line has been disconnected; and a cooperation result information providing unit that provides the terminal device, by data communication over the communication network, with cooperation result information based on confirmed voice dialogue result information, that is, the voice recognition result information obtained in the voice dialogue processing that has been fixed for handling as voice dialogue result information. When the line disconnection detection unit detects that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice dialogue processing unit determines that voice recognition result information that has already been obtained by the voice recognition unit but has not yet been made confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information.
[0029]
With the above configuration, even if the communication line is disconnected before the voice dialogue processing is completed, cooperation result information based on the voice dialogue result can be provided to the terminal device. Accordingly, even if the communication line is disconnected before the voice dialogue processing is completed, if there is information that was input by sound or voice and has already been recognized, cooperation result information based on that voice recognition result information can be provided.
[0030]
The voice dialogue processing unit may be configured to confirm, through voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for sound data from the terminal device is appropriate and, when confirmation is obtained, to determine that the recognition result is to be treated as confirmed voice dialogue result information.
[0031]
With the above configuration, when the communication line has not been disconnected, the cooperation result information based on the confirmed voice recognition result is promptly transmitted to the terminal device before the notification indicating the end of the voice conversation processing is performed. When the communication line is disconnected, the cooperation result information based on the voice conversation result before confirmation can be provided to the terminal device.
[0032]
The voice dialogue processing unit may be configured to confirm, through voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for sound data from the terminal device is appropriate and, each time confirmation is obtained for the recognition result of one unit of sound data, to determine that the recognition result of that unit of sound data is to be treated as confirmed voice dialogue result information.
[0033]
With the above configuration, it is possible to promptly provide the terminal device with cooperation result information based on the voice recognition results of a plurality of sound data before performing notification indicating the end of the voice conversation processing.
[0034]
The voice dialogue server of the present invention performs sound communication by sound or voice, via a communication network, with a terminal device having a call function and a data communication function, and requests a content server, which provides and collects information using Web pages, to provide the terminal device with cooperation result information based on the result of voice dialogue processing. The voice dialogue server includes: a voice recognition unit that recognizes sound data indicating sound or voice received from the terminal device via the communication network; a voice dialogue processing unit that uses the recognition result of the voice recognition unit to perform voice dialogue processing with the terminal device by sound communication over the communication network; a line disconnection detection unit that monitors the connection state of the communication line used for the voice dialogue processing and detects that the communication line has been disconnected; and a request unit that requests the content server to provide cooperation result information based on confirmed voice dialogue result information by transmitting to the content server the confirmed voice dialogue result information, that is, the voice recognition result information obtained in the voice dialogue processing that has been fixed for handling as voice dialogue result information. When the line disconnection detection unit detects that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice dialogue processing unit determines that voice recognition result information that has already been obtained by the voice recognition unit but has not yet been made confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information.
[0035]
With the above configuration, even if the communication line is disconnected before the voice dialogue processing is completed, confirmed voice dialogue result information can be transmitted to the content server, so that cooperation result information based on the voice dialogue result can be provided to the terminal device. Moreover, even if the communication line is disconnected before the voice dialogue processing is completed, if there is information that was input by sound or voice and has already been recognized, confirmed voice dialogue result information based on that voice recognition result information can be transmitted to the content server.
[0036]
The voice dialogue processing unit may be configured to confirm, through voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for sound data from the terminal device is appropriate and, when confirmation is obtained, to determine that the recognition result is to be treated as confirmed voice dialogue result information.
[0037]
With the above configuration, when the communication line has not been disconnected, the confirmed voice conversation result information based on the confirmed voice recognition result is sent to the content server before the notification indicating the end of the voice conversation processing is given. When the communication line is disconnected, the confirmed voice conversation result information based on the voice conversation result before confirmation can be transmitted to the content server.
[0038]
The voice dialogue processing unit may be configured to confirm, through voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for sound data from the terminal device is appropriate and, each time confirmation is obtained for the recognition result of one unit of sound data, to determine that the recognition result of that unit of sound data is to be treated as confirmed voice dialogue result information.
[0039]
With the above configuration, before performing notification indicating the end of the voice conversation processing, it is possible to transmit the confirmed voice conversation result information based on the voice recognition results of the plurality of sound data to the content server.
[0040]
The voice dialogue processing method of the present invention performs sound communication by sound or voice, via a communication network, with a terminal device having a call function and a data communication function, and requests a content server, which provides and collects information using Web pages, to provide the terminal device with cooperation result information based on the result of voice dialogue processing. The method includes: a step of recognizing sound data indicating sound or voice received from the terminal device via the communication network; a step of performing voice dialogue processing with the terminal device by sound communication over the communication network, using the voice recognition result; a step of monitoring the connection state of the communication line used for the voice dialogue processing and detecting that the communication line has been disconnected; and a step of requesting the content server to provide cooperation result information based on confirmed voice dialogue result information by transmitting to the content server the confirmed voice dialogue result information, that is, the voice recognition result information obtained in the voice dialogue processing that has been fixed for handling as voice dialogue result information. When disconnection of the communication line is detected before confirmed voice dialogue result information is obtained in the voice dialogue processing, voice recognition result information that has already been obtained but has not yet been made confirmed voice dialogue result information is determined to be confirmed voice dialogue result information.
[0041]
With the above configuration, even if the communication line is disconnected before the voice dialogue processing is completed, confirmed voice dialogue result information can be transmitted to the content server, so that cooperation result information based on the voice dialogue result can be provided to the terminal device. Moreover, even if the communication line is disconnected before the voice dialogue processing is completed, if there is information that was input by sound or voice and has already been recognized, confirmed voice dialogue result information based on that voice recognition result information can be transmitted to the content server.
[0042]
Furthermore, the voice dialogue processing program of the present invention is a program for performing sound communication by sound or voice, via a communication network, with a terminal device having a call function and a data communication function, and for requesting a content server, which provides and collects information using Web pages, to provide the terminal device with cooperation result information based on the result of voice dialogue processing. The program causes a computer to execute: a step of recognizing sound data indicating sound or voice received from the terminal device via the communication network; a step of performing voice dialogue processing with the terminal device by sound communication over the communication network, using the voice recognition result; a step of monitoring the connection state of the communication line used for the voice dialogue processing and detecting that the communication line has been disconnected; and a step of requesting the content server to provide cooperation result information based on confirmed voice dialogue result information by transmitting to the content server the confirmed voice dialogue result information, that is, the voice recognition result information obtained in the voice dialogue processing that has been fixed for handling as voice dialogue result information. When disconnection of the communication line is detected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the program causes the computer to determine that voice recognition result information that has already been obtained but has not yet been made confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information.
[0043]
With the above configuration, even if the communication line is disconnected before the voice dialogue processing is completed, confirmed voice dialogue result information can be transmitted to the content server, so that cooperation result information based on the voice dialogue result can be provided to the terminal device. Moreover, even if the communication line is disconnected before the voice dialogue processing is completed, if there is information that was input by sound or voice and has already been recognized, confirmed voice dialogue result information based on that voice recognition result information can be transmitted to the content server.
[0044]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing an example of the configuration of a terminal communication system 10 according to an embodiment of the present invention. The terminal communication system 10 includes a wireless portable terminal 20, a voice interaction server 30, and a content server 40. The wireless portable terminal 20 is connected to the Internet 50 via a wireless base station 51 and connected to the general public network 60 via a wireless base station 61. The voice interaction server 30 and the content server 40 are each connected to the Internet 50. Furthermore, the voice conversation server 30 is connected to a general public line network 60. In the following description, the Internet 50 and the general public network 60 may be referred to as a communication network.
[0045]
The wireless portable terminal 20 is, for example, a mobile phone terminal such as a digital mobile phone conforming to the PDC (Personal Digital Cellular) standard. The wireless portable terminal 20 has a call function for making voice calls with a connection destination via the general public line network 60, and a browser function for displaying Web pages on its own display device, such as an LCD (Liquid Crystal Display), and for entering characters and selecting information on a Web page using its own input device. The browser function of the wireless portable terminal 20 includes a data communication function for transmitting and receiving various data to and from a WWW (World Wide Web) server that hosts a website on the Internet 50. In this example, the data communication function of the wireless portable terminal 20 performs data communication by packet communication. The wireless portable terminal 20 is equipped with an environment (for example, software such as a browser, and hardware) that allows it to connect to the Internet 50 and to send and receive information over the Internet 50.
[0046]
The voice dialogue server 30 includes a voice communication information detection unit 31, a voice dialogue control unit 32, a voice dialogue information storage unit 33, a voice recognition unit 34, a voice guidance generation unit 35, and an Internet communication unit 36. It has a voice recognition function for recognizing the sound or voice indicated by sound data input via the general public line network 60, and a voice synthesis function for synthesizing voice from character information indicating the words to be uttered and outputting the resulting voice data. Using the voice recognition function and the voice synthesis function, the voice dialogue server 30 executes voice dialogue processing, in which information is conveyed and obtained by voice.
[0047]
The voice communication information detection unit 31 monitors line information indicating the use state of the communication line used for a voice call in the voice dialogue server 30, detects that the communication line has changed from the in-use state to the disconnected state, and notifies the voice dialogue control unit 32 of the detection result. Specifically, for example, the voice communication information detection unit 31 monitors, as the line information, a signal indicating the use state of the communication line. When it detects that the signal has changed from the level indicating the in-use state to the level indicating the disconnected state, it outputs to the voice dialogue control unit 32 a line disconnection detection signal indicating that the communication line has changed from the in-use state to the disconnected state.
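The level-transition detection described for the voice communication information detection unit 31 can be sketched in Python as follows. This is only an illustrative model under assumed conventions: the numeric levels `IN_USE` and `DISCONNECTED`, the `LineMonitor` class, and the callback are hypothetical, and the specification does not say how the line signal is sampled.

```python
IN_USE = 1        # assumed signal level while the line is in use
DISCONNECTED = 0  # assumed signal level after the line is disconnected


class LineMonitor:
    """Reports a disconnection only on the in-use -> disconnected transition."""

    def __init__(self, on_disconnect):
        self._previous = None              # no sample observed yet
        self._on_disconnect = on_disconnect

    def feed(self, level):
        """Process one sample of the line signal; return True if it fired."""
        fired = self._previous == IN_USE and level == DISCONNECTED
        self._previous = level
        if fired:
            # Stands in for the line disconnection detection signal sent
            # to the voice dialogue control unit 32.
            self._on_disconnect()
        return fired
```

Because the monitor compares each sample with the previous one, a line that stays disconnected does not trigger repeated notifications; only the change of state does.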
[0048]
The voice dialogue control unit 32 has a function of controlling each unit in the voice dialogue server 30. For example, the voice dialogue control unit 32 controls the voice recognition unit 34 and the voice guidance generation unit 35 according to a voice dialogue processing program (described later) stored in the voice dialogue information storage unit 33, causing them to perform voice recognition processing and voice output processing, and thereby executes voice dialogue processing. The voice dialogue control unit 32 also controls the Internet communication unit 36 and other units according to the voice dialogue processing program, and outputs the voice dialogue result data obtained by the voice dialogue processing to the content server 40.
[0049]
The voice dialogue information storage unit 33 stores in advance the various kinds of information used to execute voice dialogue processing, such as the voice dialogue processing program that specifies the content of the voice dialogue processing, the dictionary data used in voice recognition processing and voice synthesis processing, and the voice files containing the voice data used for voice output. The voice dialogue processing program is written in a voice dialogue processing language for specifying the content of voice dialogue processing, such as VoiceXML (Voice eXtensible Markup Language).
[0050]
In accordance with instructions from the voice dialogue control unit 32, the voice recognition unit 34 executes voice recognition processing for recognizing the sound or voice indicated by the sound data input via the general public line network 60, and transmits the voice recognition result to the voice dialogue control unit 32.
[0051]
In accordance with instructions from the voice dialogue control unit 32, the voice guidance generation unit 35 uses the voice synthesis function or a voice file prepared in advance to generate a voice file containing voice data for the guidance and other messages uttered in voice dialogue processing. The voice guidance generation unit 35 also transmits the generated voice file to the voice dialogue control unit 32.
[0052]
The Internet communication unit 36 executes processing for transmitting information toward the Internet 50 and processing for receiving information from the Internet 50. In this example, the Internet communication unit 36 executes a process of transmitting voice dialogue result data indicating the result of the voice dialogue process from the voice dialogue control unit 32 to the content server 40.
[0053]
The content server 40 includes an Internet communication unit 41, a content control unit 42, and a content information storage unit 43. The content server 40 is, for example, an information processing device such as a WWW server. The content server 40 manages Web page data written in a markup language such as C-HTML (Compact HyperText Markup Language), and has a function of providing various contents (a concept that covers not only the materials used to build applications but also the applications and services themselves) and of acquiring information by means of the Web pages displayed from that Web page data. Examples of such Web pages include pages for ordering products, searching for specific information, and collecting questionnaire responses.
[0054]
The Internet communication unit 41 executes processing for transmitting information to the Internet 50 and processing for receiving information from the Internet 50 under the control of the content control unit 42.
[0055]
The content control unit 42 controls a Web site established by the content server 40 on the Internet according to the content stored in the content information storage unit 43.
[0056]
The content information storage unit 43 stores information necessary for the operation of the Web site established by the content server 40, such as various Web page data. In this example, the content information storage unit 43 also stores user registration information for each wireless mobile terminal registered as a user in this system, history information to be described later, and the like.
[0057]
Next, the operation of the wireless portable terminal system 10 of this example will be described with reference to the drawings. FIG. 2 is a timing chart showing an example of display / voice cooperation processing and processing timing in the wireless portable terminal system 10 of this example.
[0058]
In this example, it is assumed that the wireless portable terminal 20 has completed user registration with the system administrator who manages the system 10. The system administrator manages both the voice dialogue server 30 and the content server 40 and provides a service in which the voice dialogue server 30 and the content server 40 are linked. At the time of user registration, various information about the wireless portable terminal, such as telephone number data indicating its telephone number and the name of the user who manages it, is registered in the content information storage unit 43.
[0059]
In this example, for each wireless portable terminal that has completed user registration, history information indicating the details of the services received from the voice dialogue server 30 or the content server 40 is also registered in the content information storage unit 43. Specifically, the history information includes, for example, voice dialogue result information indicating the result of voice dialogue processing executed with the voice dialogue server 30, Web page data acquired from the content server 40, and information input from the wireless portable terminal. The history information is stored in the content information storage unit 43 in association with the telephone number data indicating the telephone number of the corresponding wireless portable terminal. Therefore, by checking the telephone number data, the voice dialogue server 30 and the content server 40 can identify which wireless portable terminal acquired which Web page data, what input it made, and what voice dialogue processing result was obtained. Note that the database storing the information registered at user registration, the history information, and the like need not reside in the content server 40. It may be provided, for example, in the voice dialogue server 30 or in a dedicated database server, and may be installed anywhere in the system 10.
[0060]
In the display / voice cooperation processing, the wireless portable terminal 20 first accesses the content server 40 via the Internet 50 in accordance with a user operation (step S101), for example by specifying the URL (Uniform Resource Locator) of a Web page provided by the content server 40.
[0061]
When the content server 40 is accessed by the wireless portable terminal 20 and receives an acquisition request for the Web page data used to display a Web page, it transmits the Web page data to the wireless portable terminal 20 via the Internet 50 in response to the request (step S102). In this example, the transmitted Web page data includes telephone number data indicating the telephone number of the voice dialogue server 30.
[0062]
In this example, the Web page data includes two items associated with each other: voice dialogue selection area display data, for displaying on the Web page a voice dialogue selection area used to select information input by voice dialogue with the voice dialogue server 30, and telephone number data indicating the telephone number for calling the voice dialogue server 30. That is, the voice dialogue selection area display data and the telephone number data are written in the Web page data in the markup language. The Web page data also contains a description, in the markup language, instructing the terminal to make a call using the telephone number indicated by the telephone number data when the voice dialogue selection area indicated by the display data is selected. In other words, the Web page data is written so that a phone-to function (a function that calls a specific party in response to selection of the voice dialogue selection area) is realized in the wireless portable terminal 20.
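As a rough illustration of how the display data and the telephone number data might be tied together in markup, the following Python sketch renders a selectable link. The use of a `tel:` link in the `href` attribute, the function name `render_voice_dialogue_link`, the label text, and the telephone number are all assumptions made for illustration; the embodiment states only that the markup instructs the terminal to place a call when the area is selected.

```python
from html import escape


def render_voice_dialogue_link(label: str, phone_number: str) -> str:
    """Return markup for a selectable area associated with a phone number.

    Assumption: the terminal's browser places a call when a link with a
    "tel:" target is selected (the phone-to behaviour described above).
    """
    href = "tel:" + escape(phone_number, quote=True)
    return f'<a href="{href}">{escape(label)}</a>'


# Hypothetical label and telephone number of the voice dialogue server.
link = render_voice_dialogue_link("Input by voice", "0312345678")
```

Here `link` holds the fragment that a content server could embed in the Web page data next to the input area.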
[0063]
When receiving the web page data, the wireless portable terminal 20 displays the web page based on the received web page data on the display device provided therein by the browser function (step S103).
[0064]
FIG. 3 is an explanatory diagram illustrating an example of the display state of a Web page displayed on the wireless portable terminal 20. Here, the description takes as an example a case where the content server 40 provides a service that presents various information about a train station (for example, the station timetable, maps and guidance for the area around the station, and maps and guidance for the station interior). FIG. 3 shows an example of the display state of a Web page for acquiring such information. As shown in FIG. 3, the Web page includes a guidance display area 71 for displaying guidance, an input area 72 for entering a station name, a voice dialogue selection area 73 that is selected when information is to be input by voice dialogue, and a search instruction selection area 74 that is selected to instruct an information search for the station entered in the input area 72.
[0065]
When the user operates the wireless portable terminal 20 and selects the voice dialogue selection area 73 on the Web page, the browser function of the wireless portable terminal 20 calls the call function (step S104) and instructs it to make a call using the telephone number indicated by the telephone number data associated with the voice dialogue selection area display data for the voice dialogue selection area 73. In accordance with the instruction from the browser function, the called call function places a call to the voice dialogue server 30 using the telephone number indicated by the telephone number data set in the Web page data (step S105). In step S105, the telephone number of the wireless portable terminal 20 is notified to the voice dialogue server 30.
[0066]
In response to the call from the wireless portable terminal 20, the voice dialogue control unit 32 of the voice dialogue server 30 places the communication line in the general public line network 60 in a connected state (call state), and determines the execution content of the voice dialogue processing based on the telephone number data indicating the telephone number identified by the caller number notification from the wireless portable terminal 20 (step S106).
[0067]
The process of determining the content of the voice dialogue processing in step S106 is now described in detail. In this example, because the content server 40 manages the history information and the like for each wireless portable terminal, the voice dialogue control unit 32 first controls the Internet communication unit 36 to transmit the telephone number data received from the wireless portable terminal 20 to the content server 40. On receiving the telephone number data, the content control unit 42 of the content server 40 searches the information stored in the content information storage unit 43 for the history information associated with the telephone number data of the wireless portable terminal 20 (the search may cover only part of the history information, for example the most recently added several bytes of data), and identifies from the retrieved history information the Web page data that was last transmitted to the wireless portable terminal 20. The identified Web page data makes it possible to confirm through which Web page the wireless portable terminal 20 placed the call to the voice dialogue server 30. From the identified Web page data, the content control unit 42 confirms what service the user of the wireless portable terminal 20 was trying to receive by voice dialogue, and controls the Internet communication unit 41 to transmit the confirmation result to the voice dialogue server 30. On receiving the confirmation result, the voice dialogue control unit 32 determines the content of the voice dialogue processing to be executed based on the received information. For example, when it is identified that the call to the voice dialogue server 30 was placed via the Web page shown in FIG. 3, the voice dialogue control unit 32 decides to execute voice dialogue processing for inputting a station name by voice dialogue.
In this way, the execution content of the voice dialogue processing is determined in step S106.
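The lookup in step S106 can be sketched in Python as follows. The record layout, the page identifier `station_search`, the script name `ask_station_name`, and the function names are hypothetical; the sketch only illustrates finding the page last sent to a caller number and mapping it to a dialogue to run.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from a Web page identifier to a dialogue script name.
DIALOGUE_FOR_PAGE: Dict[str, str] = {"station_search": "ask_station_name"}


def last_page_for(phone_number: str, history: Dict[str, List[dict]]) -> Optional[str]:
    """Content server side: find the page most recently sent to this terminal."""
    for record in reversed(history.get(phone_number, [])):
        if record["kind"] == "web_page":
            return record["page_id"]
    return None


def decide_dialogue(phone_number: str, history: Dict[str, List[dict]]) -> Optional[str]:
    """Voice dialogue server side: choose the dialogue from the confirmed page."""
    page_id = last_page_for(phone_number, history)
    return DIALOGUE_FOR_PAGE.get(page_id) if page_id is not None else None


# History keyed by the caller's telephone number (sample data, not from the source).
history = {
    "09012345678": [
        {"kind": "web_page", "page_id": "station_search"},
        {"kind": "input", "value": ""},
    ]
}
script = decide_dialogue("09012345678", history)
```

Keying everything on the telephone number is what lets the two servers agree on which terminal and which page a call belongs to without sharing any other session identifier.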
[0068]
When a database server manages the history information and the like for each wireless portable terminal, the voice dialogue server 30 may access the database server in step S106 to confirm what service the user of the wireless portable terminal 20 is trying to receive by voice dialogue.
[0069]
When the execution content of the voice dialogue process is determined, the voice dialogue control unit 32 of the voice dialogue server 30 executes the voice dialogue process of the decided content. FIG. 4 is an explanatory diagram showing an example of conversation contents exchanged in the voice conversation processing. FIG. 5 is a flowchart showing an example of a voice dialogue process executed by the voice dialogue server 30.
[0070]
Here, the description assumes that a voice dialogue with the content shown in FIG. 4 takes place between the voice dialogue server 30 and the user of the wireless portable terminal 20. In the voice dialogue processing, the voice dialogue control unit 32 first causes the voice guidance generation unit 35 to generate a voice file for uttering the guidance, and outputs voice data for the voice guidance “Please say the station name”, as shown in FIG. 4, to the wireless portable terminal 20 via the general public line network 60 (step S201). Based on the received voice data, the wireless portable terminal 20 outputs the voice “Please say the station name” from its speaker.
[0071]
Next, when voice data indicating the voice uttered by the user in response to the voice guidance “Please say the station name” is input from the wireless portable terminal 20 via the general public line network 60, the voice dialogue control unit 32 controls the voice recognition unit 34 to execute voice recognition processing on the input voice data (step S202). In step S202, the voice recognition unit 34 executes the voice recognition processing using “station name.dic”, dictionary data created in advance and stored in the voice dialogue information storage unit 33 for recognizing voices that indicate station names. The use of the dictionary data “station name.dic” in the voice recognition processing of step S202 is specified in the voice dialogue processing program used to execute the voice dialogue processing. In this example, step S202 yields the voice recognition result that the input voice data indicates “Shinjuku”. That is, character string data indicating “Shinjuku” is obtained as the voice recognition result data. The voice dialogue control unit 32 then provisionally determines the voice recognition result data acquired in step S202 as the voice dialogue result data indicating the result of the voice dialogue (step S203). The provisionally determined voice dialogue result data is stored in a provisional determination data storage area provided in the voice dialogue information storage unit 33.
[0072]
When the voice recognition result has been obtained, the voice dialogue control unit 32, in order to check whether the result is correct, causes the voice guidance generation unit 35 to generate a voice file for uttering input confirmation guidance, and outputs voice data for the voice guidance “Is Shinjuku correct?” to the wireless portable terminal 20 via the general public line network 60 (step S204). Based on the received voice data, the wireless portable terminal 20 outputs the voice “Is Shinjuku correct?” from its speaker.
[0073]
Next, when voice data indicating the voice uttered by the user in response to the voice guidance “Is Shinjuku correct?” is input from the wireless portable terminal 20 via the general public line network 60, the voice dialogue control unit 32 controls the voice recognition unit 34 to execute voice recognition processing on the input voice data (step S205). In step S205, the voice recognition processing is executed using “yesno.dic”, dictionary data created in advance and stored in the voice dialogue information storage unit 33 for recognizing voices that indicate responses such as “Yes”, “No”, “YES”, and “NO”. The use of the dictionary data “yesno.dic” in the voice recognition processing of step S205 is specified in the voice dialogue processing program used to execute the voice dialogue processing. In this example, step S205 yields the voice recognition result that the input voice data indicates “Yes”. That is, character string data indicating “Yes” is obtained as the voice recognition result data.
[0074]
When character string data indicating an affirmative answer such as “Yes” is obtained (step S206), the voice dialogue control unit 32 confirms the provisionally determined voice dialogue result data, which was stored in the provisional determination data storage area in step S203, as the voice dialogue result data. The voice dialogue control unit 32 then controls the Internet communication unit 36 to output the confirmed voice dialogue result data and the telephone number data of the wireless portable terminal 20 to the content server 40 via the Internet 50 (step S207, step S107). If character string data indicating a negative answer such as “No” is obtained in step S206, the processing from step S201 onward is executed again.
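The flow of steps S201 to S207 can be summarized in a minimal Python sketch. The `recognize` and `send_result` callables stand in for the voice recognition unit and the Internet communication unit, and the `max_attempts` limit is an assumption added so that the sketch terminates; the embodiment simply repeats from step S201 on a negative answer.

```python
from typing import Callable, Optional

AFFIRMATIVE = {"yes"}  # simplified stand-in for the yes/no dictionary


def run_station_dialogue(
    recognize: Callable[[str], str],
    send_result: Callable[[str, str], None],
    caller_number: str,
    max_attempts: int = 3,  # assumption: not part of the described flow
) -> Optional[str]:
    """Ask for a station name, confirm it, and send the confirmed result."""
    for _ in range(max_attempts):
        station = recognize("Please say the station name")  # S201-S202
        provisional = station                                # S203
        answer = recognize(f"Is {provisional} correct?")     # S204-S205
        if answer.strip().lower() in AFFIRMATIVE:            # S206
            send_result(caller_number, provisional)          # S207 / S107
            return provisional
        # Negative answer: repeat from S201.
    return None


# Scripted user: a misrecognized first attempt, then a confirmed second one.
answers = iter(["Shibuya", "No", "Shinjuku", "Yes"])
sent = []
result = run_station_dialogue(
    lambda prompt: next(answers),
    lambda number, value: sent.append((number, value)),
    "09012345678",
)
```

Note that the result is sent only after the affirmative answer, which is exactly why a disconnection between S203 and S207 needs the separate handling described later.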
[0075]
When the voice dialogue result data has been transmitted, the voice dialogue control unit 32, in order to announce the end of the voice dialogue processing, causes the voice guidance generation unit 35 to generate a voice file for uttering guidance that announces the end of processing, and, by controlling the Internet communication unit 36, outputs voice data for the voice guidance “I understand. I will end.” to the wireless portable terminal 20 via the general public line network 60 (step S208). Based on the received voice data, the wireless portable terminal 20 outputs the voice “I understand. I will end.” from its speaker. The voice dialogue server 30 then disconnects the communication line, ends the call state, and ends the voice dialogue processing.
[0076]
On receiving the voice dialogue result data and the telephone number data transmitted in step S107, the content server 40 stores the received voice dialogue result data in association with the telephone number data, held in the content information storage unit 43, that matches the received telephone number data (step S108).
[0077]
When the voice dialogue processing is completed, the call function of the wireless portable terminal 20 calls the browser function (step S109). The called browser function requests the content server 40 to update the display information shown on the display device of the wireless portable terminal 20 (step S110). In response to the update request, the content server 40 reads from the content information storage unit 43 the voice dialogue result data associated with the telephone number data of the wireless portable terminal 20, and creates Web page data that reflects the result of the voice dialogue processing (step S111). The content server 40 then transmits the Web page data reflecting the result of the voice dialogue processing (step S112).
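Steps S108 and S111 amount to storing a value under a telephone number and later filling it into the page. A minimal Python sketch, in which the class name `ContentServer`, the field name `station`, and the markup are assumptions made for illustration:

```python
from html import escape
from typing import Dict


class ContentServer:
    """Toy stand-in for the content server's store-and-reflect behaviour."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def store_result(self, phone_number: str, result: str) -> None:
        # S108: keep the dialogue result under the caller's telephone number.
        self._results[phone_number] = result

    def render_input_area(self, phone_number: str) -> str:
        # S111: reflect the stored result, if any, in the input area markup.
        value = self._results.get(phone_number, "")
        return f'<input type="text" name="station" value="{escape(value, quote=True)}">'


server = ContentServer()
before = server.render_input_area("09012345678")
server.store_result("09012345678", "Shinjuku")
after = server.render_input_area("09012345678")
```

A terminal that requests the page before any result has arrived simply receives the empty input area, which corresponds to the display state before voice input.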
[0078]
When the web page data is received, the web page based on the received web page data is displayed by the browser function of the wireless portable terminal 20 (step S113). As shown in FIG. 6, for example, the display content after the update of the Web page is in a state in which the content of information input by the voice interaction process is reflected. That is, FIG. 6 shows a display state of the updated Web page in which the station name “Shinjuku” inputted by voice in the voice dialogue processing is entered in the input area 72. When the search instruction selection area 74 is pressed in this state, the content server 40 executes information search for “Shinjuku” station.
[0079]
As described above, the voice dialogue server 30 is configured to transmit the voice dialogue result data to the content server 40 before executing the process that announces the end of the voice dialogue processing, so the communication process for sending that data to the content server 40 can start early. Consequently, even when traffic (the amount of data transmitted) in packet communication on the Internet 50 is heavy, the communication line is congested, and data transmission takes longer, the content server 40 can finish receiving the voice dialogue result data early. The content server 40 can therefore complete reception of the voice dialogue result data by the time the browser function of the wireless portable terminal 20 issues the display information update request after the voice dialogue processing ends.
[0080]
Because the process that announces the end of the voice dialogue processing (step S208) takes a considerable time (about 3 to 4 seconds in this example), the content server 40 can complete reception of the voice dialogue result data in the meantime. Accordingly, the Web page displayed on the wireless portable terminal can be updated immediately after the voice dialogue processing ends, and the result of the voice input can be reflected on the display.
[0081]
Next, a process when the communication line of the general public line network 60 is disconnected during the execution of the voice conversation process will be described. Here, a description will be given on the assumption that the voice dialogue having the contents shown in FIG. 7 is performed between the voice dialogue server 30 and the user using the wireless portable terminal 20. That is, a case will be described as an example where the communication line is disconnected during the output of guidance for confirming the voice recognition result.
[0082]
FIG. 8 is a flowchart showing an example of line monitoring processing executed by the voice interaction server 30. FIG. 9 is a flowchart showing an example of the case where the communication line used for the voice dialogue process of the general public line network 60 is disconnected during the voice dialogue process.
[0083]
The line monitoring process is started, for example, when the voice dialogue process is started. In the line monitoring process, the voice communication information detection unit 31 of the voice dialogue server 30 monitors the connection state of the communication line used in the voice dialogue process (step S301). When it is detected that the monitored communication line has changed from the use state to the disconnected state (Y in step S301), the voice communication information detection unit 31 indicates that the communication line has changed from the use state to the disconnected state. A line disconnection detection signal is output to the voice interaction control unit 32.
[0084]
In this example, disconnection of the communication line is detected, for example, when a radio wave failure occurs because the user carrying the wireless portable terminal 20 is riding a moving body such as a train and moves to a place, such as a tunnel, where no wireless communication infrastructure is in place; when the user of the wireless portable terminal 20 deliberately disconnects the communication line; or when the voice dialogue processing is completed.
[0085]
On receiving the line disconnection detection signal, the voice dialogue control unit 32 checks whether unsent voice dialogue result data is stored in the provisional determination data storage area of the voice dialogue information storage unit 33. If it is (Y in step S302), the voice dialogue control unit 32 outputs that voice dialogue result data, together with the telephone number data of the wireless portable terminal 20, to the content server 40 (step S303). That is, the voice dialogue result data that had been provisionally determined as the user's input information is confirmed as the voice dialogue result data, and the confirmed data is transmitted.
[0086]
Therefore, if the communication line is disconnected and the voice dialogue processing ends prematurely after the voice recognition result data has been provisionally determined as the voice dialogue result data but before it has been confirmed and transmitted to the content server 40, the provisionally determined voice dialogue result data is confirmed as the voice dialogue result data to be used from then on, and the confirmed data is output to the content server 40. That is, in this example, if the communication line is disconnected after step S203 and before step S207 in the voice dialogue processing shown in FIG. 5, the voice dialogue result data is output even though the voice dialogue processing has not been completed.
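The handling of a disconnection between steps S203 and S207 can be sketched in Python as follows. The class name `DialogueSession` and its method names are hypothetical. The point illustrated is that both the normal confirmation path and the disconnection path send the provisionally held value, and that a value already sent is not sent again when the line later drops.

```python
from typing import Callable, List, Optional, Tuple


class DialogueSession:
    """Holds one provisionally determined result for one caller."""

    def __init__(self, caller_number: str, send: Callable[[str, str], None]) -> None:
        self.caller_number = caller_number
        self._send = send
        self.provisional: Optional[str] = None  # provisional determination area

    def set_provisional(self, value: str) -> None:
        self.provisional = value                 # S203

    def confirm(self) -> None:
        self._flush()                            # S206-S207 (user said "Yes")

    def on_line_disconnected(self) -> None:
        self._flush()                            # S302-S303

    def _flush(self) -> None:
        # Send only if an unsent provisional result exists (Y in S302).
        if self.provisional is not None:
            self._send(self.caller_number, self.provisional)
            self.provisional = None


sent: List[Tuple[str, str]] = []
session = DialogueSession("09012345678", lambda n, v: sent.append((n, v)))
session.set_provisional("Shinjuku")
session.on_line_disconnected()   # line drops during the confirmation guidance
session.on_line_disconnected()   # a repeated notification sends nothing more
```

Clearing the provisional value after sending is one simple way to realize the "unsent" check, since the line is also disconnected at the normal end of the dialogue.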
[0087]
For example, as shown in FIG. 9, suppose the communication line is disconnected while the voice dialogue control unit 32 is outputting the voice data for the voice guidance “Is Shinjuku correct?” in the voice dialogue processing (see step S204). In that case, the voice dialogue control unit 32 decides that subsequent control will use the provisionally determined voice dialogue result data indicating “Shinjuku” stored in the provisional determination data storage area of the voice dialogue information storage unit 33, and outputs it as confirmed voice dialogue result data to the content server 40 together with the telephone number data of the wireless portable terminal 20 (step S107a).
[0088]
As described above, the connection state of the communication line used for the voice dialogue processing is monitored, and if provisionally determined voice dialogue result data exists when the communication line is disconnected, it is decided that subsequent processing will use that data, and the confirmed voice dialogue result data is output to the content server 40. As a result, the voice dialogue result data can be output even when the voice dialogue processing ends prematurely.
[0089]
Therefore, even when the voice dialogue processing cannot continue because of a radio wave failure or the like, the content of the voice input can be reflected on the display screen of the wireless portable terminal 20, provided that voice recognition processing has been performed and the voice dialogue result data has been provisionally determined. The user therefore does not need to repeat the voice dialogue processing.
[0090]
In addition, once the voice dialogue result data has been provisionally determined, it can be output even when the user disconnects the communication line, for example before the guidance for confirming the voice recognition result has finished. Therefore, a user who inputs the station name by voice and then deliberately disconnects the communication line without listening to the confirmation guidance does not need to respond to the confirmation request (with “Yes”, “No”, or the like), and the content of the voice input is quickly reflected on the display screen of the wireless portable terminal 20. For the user, the voice dialogue processing for inputting information can thus be completed in a short time.
[0091]
In the above-described embodiment, the voice dialogue processing is executed to obtain a single piece of information, the station name. However, the voice dialogue processing may be executed to obtain a plurality of pieces of information (for example, an address and a name). In this case, for example, a voice dialogue for acquiring and confirming a plurality of voice dialogue result data may be executed, the confirmed voice dialogue result data may then be transmitted to the content server 40, and the guidance announcing the end of the voice dialogue processing may be given after that. When a plurality of voice dialogue result data are transmitted together, the content server 40 may create Web page data reflecting each of them and transmit it to the wireless portable terminal 20.
[0092]
When voice dialogue processing for obtaining a plurality of pieces of information is executed, the processing for confirming the voice dialogue result data (confirmation processing) may be performed in any of the following ways: all at once, after all of the provisionally determined voice dialogue result data have been acquired; in batches, each time a predetermined unit of provisionally determined voice dialogue result data (for example, three items if the unit is three) has been acquired; or one at a time, each time provisionally determined voice dialogue result data is acquired. Also, if the communication line is disconnected and the voice dialogue processing ends prematurely after voice input has finished for only some of the plurality of pieces of information, the information for which a voice recognition result has been obtained may be transmitted to the content server 40 as confirmed voice dialogue result data and reflected in the display information of the wireless portable terminal 20. In subsequent voice dialogue processing, voice input then needs to be handled only for the information not yet reflected in the display information of the wireless portable terminal 20.
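For the multi-item case, the partial-result behaviour can be illustrated with a short Python sketch. The field names, the sample address, and the function names are hypothetical; the sketch shows which items would be sent when the line drops partway through, and which items a later dialogue would still need to ask for.

```python
from typing import Dict, List, Optional


def results_to_send_on_disconnect(provisional: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Keep only the items for which a recognition result was obtained."""
    return {field: value for field, value in provisional.items() if value is not None}


def fields_still_needed(required: List[str], reflected: Dict[str, str]) -> List[str]:
    """Items a subsequent dialogue must still collect, in the original order."""
    return [field for field in required if field not in reflected]


required = ["address", "name"]
# The line dropped after the address was recognized but before the name was spoken.
provisional = {"address": "Shinjuku-ku, Tokyo", "name": None}
sent = results_to_send_on_disconnect(provisional)
todo = fields_still_needed(required, sent)
```

With this split, the follow-up dialogue can skip straight to the missing items instead of starting over.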
[0093]
In addition, when voice dialogue processing for obtaining a plurality of pieces of information (for example, an address and a name) is executed, the confirmation processing for the voice recognition result may be omitted, and the acquired voice recognition result data may be confirmed as voice dialogue result data as it is and transmitted to the content server 40 sequentially. In this case, for example, as shown in FIG. 10, voice dialogue result data is transmitted to the content server 40 each time a voice recognition result is obtained (step S107a, step S107b). Each time the content server 40 receives voice dialogue result data, it stores the data in the same manner as in step S108 described above (steps S108a and S108b). With this configuration, the content server 40 can receive the voice dialogue result data at an early stage and can execute the process for updating the Web page data immediately after the voice dialogue processing ends. The content server 40 may also be configured to generate Web page data reflecting the voice dialogue result data and transmit it to the wireless portable terminal 20 each time it receives voice dialogue result data. In this case, each time a voice input finishes during the voice dialogue processing, the wireless portable terminal 20 may call the browser function to update the displayed Web page and then call the call function to continue the voice dialogue processing.
[0094]
In the above-described embodiment, the wireless portable terminal 20 has been described as a mobile telephone terminal. However, any other terminal device that has both a browser function and a call function, such as a PDA (Personal Digital Assistant) or a personal computer, may be used as the wireless portable terminal 20. Moreover, although the wireless portable terminal 20, which performs wireless communication, has been taken as an example, the present invention can also be applied to a terminal that performs wired communication.
[0095]
In the above-described embodiment, the data communication is performed using the Internet 50. However, the data communication may be performed using another communication network such as a LAN. In addition, although the data communication is performed by packet communication, other communication methods may be used.
[0096]
In the embodiment described above, the voice dialogue server 30 and the content server 40 may be operated as one server (cooperation server, center). In this case, the voice dialogue server 30 and the content server 40 may be connected by a dedicated line or the like.
[0097]
In the above-described embodiment, voice dialogue processing is used to input information on the display screen. However, voice dialogue processing may instead be executed to provide information that is not displayed, such as a ringtone melody played when a telephone call arrives. In this case, it suffices to provide the wireless portable terminal 20 with the ringtone melody whose title is specified by the information the user inputs through the voice dialogue processing.
[0098]
In the above-described embodiment, the content server 40 acquires the voice dialogue result data (step S111) in response to a Web page acquisition request from the wireless portable terminal 20 (the Web page update request in step S110) and transmits Web page data reflecting the voice dialogue result indicated by that data (step S112). Alternatively, after the voice dialogue result data has been saved (step S108), the content server 40 may transmit the Web page data reflecting the voice dialogue result to the wireless portable terminal 20 regardless of whether a Web page acquisition request has been received from the wireless portable terminal 20. With this configuration, the result of the voice dialogue processing can be reflected in the Web page without the wireless portable terminal 20 having to issue a Web page acquisition request.
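The difference between the request-driven delivery of the embodiment and the request-free variation can be sketched as below. The class `PageServer`, its `push` callback, and the HTML layout are assumptions made for illustration; the actual transport to the terminal is not modeled.

```python
import html
from typing import Callable, Dict, Optional


class PageServer:
    """Hypothetical content-server sketch supporting both delivery styles."""

    def __init__(self, push: Optional[Callable[[str], None]] = None) -> None:
        self._results: Dict[str, str] = {}
        self._push = push  # if set, pages are sent without waiting for a request

    def save_result(self, results: Dict[str, str]) -> None:
        # Step S108: save the voice dialogue result data.
        self._results.update(results)
        if self._push is not None:
            # Variation: transmit the updated page regardless of any request.
            self._push(self._render())

    def handle_page_request(self) -> str:
        # Steps S110 to S112: the terminal requests the page and receives it.
        return self._render()

    def _render(self) -> str:
        rows = "".join(
            f"<p>{html.escape(k)}: {html.escape(v)}</p>"
            for k, v in self._results.items()
        )
        return f"<html><body>{rows}</body></html>"
```

Constructing `PageServer()` without a callback gives the request-driven behaviour; passing a callback gives the variation in which the page is sent as soon as the result is saved.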
[0099]
Further, in each of the above-described embodiments, C-HTML (Compact HTML), which is widely used for displaying Web pages in mobile phone browsers, is used as the display language for generating Web page data. However, other markup languages such as HTML, HDML (Handheld Device Markup Language), and WML (Wireless Markup Language) may be used.
[0100]
Further, in each of the above-described embodiments, the voice dialogue server 30 and the wireless portable terminal 20 are connected to the general public network 60 and make a voice call via the general public network 60. However, they may instead be connected to an IP network and perform voice communication using, for example, VoIP (Voice over Internet Protocol). In each of the above-described embodiments, input in the voice dialogue processing is made by voice. However, the user of the wireless portable terminal 20 may make the input using DTMF (Dual Tone Multi Frequency) signals, and the voice dialogue server 30 may acquire, in the voice recognition processing, the character representing the key that corresponds to each received DTMF signal.
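As an illustration of acquiring the character for a DTMF signal, the sketch below maps a detected pair of tone frequencies to its key using the standard DTMF frequency grid. The function name `dtmf_key` and the 20 Hz matching tolerance are assumptions chosen for this example, and the detection of the two frequencies from the audio itself is outside its scope.

```python
from typing import Optional, Sequence

# Standard DTMF grid: one low-group (row) and one high-group (column) tone per key.
LOW_HZ = (697, 770, 852, 941)
HIGH_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def _nearest_index(value: float, candidates: Sequence[int],
                   tolerance_hz: float) -> Optional[int]:
    """Index of the candidate closest to value, or None if outside the tolerance."""
    best = min(range(len(candidates)), key=lambda i: abs(candidates[i] - value))
    return best if abs(candidates[best] - value) <= tolerance_hz else None


def dtmf_key(low_hz: float, high_hz: float,
             tolerance_hz: float = 20.0) -> Optional[str]:
    """Return the key character for a detected tone pair, or None if no match.

    tolerance_hz is an arbitrary illustrative value, not a standard figure.
    """
    row = _nearest_index(low_hz, LOW_HZ, tolerance_hz)
    col = _nearest_index(high_hz, HIGH_HZ, tolerance_hz)
    if row is None or col is None:
        return None
    return KEYPAD[row][col]
```

The returned character can then be handled in the same way as a recognized word, so the remainder of the dialogue processing does not need to distinguish between the two kinds of input.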
[0101]
Further, in each of the above-described embodiments, the voice dialogue server 30 operates based on a voice dialogue processing program for executing the various processes described above. For example, the voice dialogue processing program causes the voice dialogue server 30 to execute: a step of recognizing sound data indicating sound or voice received from the wireless portable terminal 20 via the communication network 60, and performing voice dialogue processing with the wireless portable terminal 20 by sound communication over the communication network 60 using the voice recognition result; a step of monitoring the connection state of the communication line used for the voice dialogue processing and detecting that the communication line has been disconnected; and a step of transmitting to the content server 40, from among the voice recognition result information obtained by the voice dialogue processing, the confirmed voice dialogue result information that has been confirmed to be handled as voice dialogue result information, thereby requesting the content server 40 to provide cooperation result information based on the confirmed voice dialogue result information. When it is detected that the communication line has been disconnected before confirmed voice dialogue result information is obtained, the program further causes the voice dialogue server 30 to determine that, among the voice recognition result information already obtained, the voice recognition result information not yet designated as confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information. The content server 40 likewise operates based on a data processing program for executing the various processes described above.
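The handling of unconfirmed results on line disconnection can be sketched as a small state holder. The class `DialogueSession`, its method names, and the callback `send_to_content_server` are hypothetical; line monitoring itself is represented only by the call to `on_line_disconnected`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DialogueSession:
    """Hypothetical per-call state kept by the voice dialogue server."""

    send_to_content_server: Callable[[Dict[str, str]], None]
    provisional: Dict[str, str] = field(default_factory=dict)
    confirmed: Dict[str, str] = field(default_factory=dict)

    def on_recognized(self, name: str, text: str) -> None:
        # A recognition result is held as provisional until the user confirms it.
        self.provisional[name] = text

    def on_user_confirmed(self, name: str) -> None:
        # Confirmation obtained: the result becomes confirmed result information.
        self.confirmed[name] = self.provisional.pop(name)

    def on_dialogue_end(self) -> None:
        # Normal completion: only confirmed results are transmitted.
        self._flush()

    def on_line_disconnected(self) -> None:
        # Disconnection before confirmation: treat the results already
        # recognized but not yet confirmed as confirmed, then transmit them.
        self.confirmed.update(self.provisional)
        self.provisional.clear()
        self._flush()

    def _flush(self) -> None:
        if self.confirmed:
            self.send_to_content_server(dict(self.confirmed))
```

In this sketch a disconnection and a normal end share the same transmission path; the only difference is whether provisional results are promoted first.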
[0102]
[Effects of the invention]
As described above, the terminal communication system of the present invention comprises a terminal device having a voice communication function and a packet communication function; a voice control unit that performs a voice call with the terminal device; a voice recognition unit that recognizes a voice signal received from the terminal device by the voice control unit and outputs a recognition result; a line information detection unit that monitors line information of the voice call and detects interruption of the voice call; and a packet control unit that transmits the recognition result obtained by the voice recognition unit, or information based on the recognition result, to the terminal device by packet communication at the end of the voice dialogue by the voice call or when the line information detection unit detects interruption of the voice call. Therefore, even if the voice call is interrupted and the voice dialogue processing is terminated, the recognition result already obtained by the voice recognition unit can be used as voice dialogue result data, and information based on that recognition result can be provided to the terminal device.
[0105]
According to the terminal communication system of the present invention, the cooperation server includes: a voice recognition unit that recognizes sound data indicating sound or voice received from the terminal device via the communication network; a voice dialogue processing unit that performs voice dialogue processing with the terminal device by sound communication over the communication network using the voice recognition result from the voice recognition unit; a line disconnection detection unit that monitors the connection state of the communication line used for the voice dialogue processing and detects that the communication line has been disconnected; and a cooperation result information providing unit that provides the terminal device, by data communication over the communication network, with cooperation result information based on confirmed voice dialogue result information, that is, the voice recognition result information obtained by the voice dialogue processing that has been confirmed to be handled as voice dialogue result information. When the line disconnection detection unit detects that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice dialogue processing unit determines that, among the voice recognition result information already obtained by the voice recognition unit, the voice recognition result information not yet designated as confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information. Therefore, even if the communication line is disconnected before the voice dialogue processing is completed, cooperation result information based on the voice dialogue result can be provided to the terminal device. In other words, as long as sound or voice has been input and some information has already been voice-recognized, cooperation result information based on that voice recognition result information can be provided.
[0106]
When the voice dialogue processing unit is configured to confirm, through the voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for the sound data from the terminal device is appropriate, and to determine the voice recognition result to be confirmed voice dialogue result information once that confirmation is obtained, the following is achieved. If the communication line is not disconnected, cooperation result information based on the confirmed voice recognition result can be provided to the terminal device promptly, before the notification indicating the end of the voice dialogue processing. If the communication line is disconnected, cooperation result information based on the voice dialogue result prior to confirmation can be provided to the terminal device.
[0107]
When the voice dialogue processing unit is configured to confirm, through the voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for the sound data from the terminal device is appropriate, and, each time confirmation is obtained for the voice recognition result of one unit of sound data, to determine that unit's voice recognition result to be confirmed voice dialogue result information, cooperation result information based on the voice recognition results of a plurality of sound data can be provided to the terminal device promptly, before the notification indicating the end of the voice dialogue processing.
[0109]
When the cooperation server includes a voice dialogue server that performs sound communication by sound or voice with the terminal device over a communication network and a content server that provides and collects information using Web pages, and is configured so that cooperation result information based on the confirmed voice dialogue result information is provided to the terminal device using the content server, then even in a system in which the server that executes the voice dialogue processing and the server that provides information using Web pages are provided separately, cooperation result information based on the voice dialogue result can be provided to the terminal device even if the communication line is disconnected before the voice dialogue processing is completed. In other words, as long as sound or voice has been input and some information has already been voice-recognized, cooperation result information based on that voice recognition result information can be provided.
[0110]
If the data communication performed between the terminal device and the cooperation server is configured to be performed by packet communication, the voice conversation result information is transmitted by packet communication before notification indicating the end of the voice conversation processing. Therefore, even if a communication network in which packet communication is performed is congested, cooperation result information can be provided to the terminal device without delay.
[0111]
When the cooperation result information is configured to be Web page data reflecting the confirmed voice dialogue result information, or selection data selected based on the confirmed voice dialogue result information, a Web page based on the Web page data reflecting the confirmed voice dialogue result information can be displayed on the terminal device, or the selection data selected based on the confirmed voice dialogue result information can be provided to the terminal device.
[0112]
Further, the cooperation server of the present invention includes: a voice recognition unit that recognizes sound data indicating sound or voice received from the terminal device via the communication network; a voice dialogue processing unit that performs voice dialogue processing with the terminal device by sound communication over the communication network using the voice recognition result from the voice recognition unit; a line disconnection detection unit that monitors the connection state of the communication line used for the voice dialogue processing and detects that the communication line has been disconnected; and a cooperation result information providing unit that provides the terminal device, by data communication over the communication network, with cooperation result information based on confirmed voice dialogue result information, that is, the voice recognition result information obtained by the voice dialogue processing that has been confirmed to be handled as voice dialogue result information. When the line disconnection detection unit detects that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice dialogue processing unit determines that, among the voice recognition result information already obtained, the voice recognition result information not yet designated as confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information. Therefore, even if the communication line is disconnected before the voice dialogue processing is completed, cooperation result information based on the voice dialogue result can be provided to the terminal device. In other words, as long as sound or voice has been input and some information has already been voice-recognized, cooperation result information based on that voice recognition result information can be provided.
[0113]
When the voice dialogue processing unit is configured to confirm, through the voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for the sound data from the terminal device is appropriate, and to determine the voice recognition result to be confirmed voice dialogue result information once that confirmation is obtained, the following is achieved. If the communication line is not disconnected, cooperation result information based on the confirmed voice recognition result can be provided to the terminal device promptly, before the notification indicating the end of the voice dialogue processing. If the communication line is disconnected, cooperation result information based on the voice dialogue result prior to confirmation can be provided to the terminal device.
[0114]
When the voice dialogue processing unit is configured to confirm, through the voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for the sound data from the terminal device is appropriate, and, each time confirmation is obtained for the voice recognition result of one unit of sound data, to determine that unit's voice recognition result to be confirmed voice dialogue result information, cooperation result information based on the voice recognition results of a plurality of sound data can be provided to the terminal device promptly, before the notification indicating the end of the voice dialogue processing.
[0115]
Further, the voice dialogue server of the present invention includes: a voice recognition unit that recognizes sound data indicating sound or voice received from the terminal device via the communication network; a voice dialogue processing unit that performs voice dialogue processing with the terminal device by sound communication over the communication network using the voice recognition result from the voice recognition unit; a line disconnection detection unit that monitors the connection state of the communication line used for the voice dialogue processing and detects that the communication line has been disconnected; and a confirmed voice dialogue result information transmission unit that transmits to the content server, from among the voice recognition result information obtained by the voice dialogue processing, the confirmed voice dialogue result information that has been confirmed to be handled as voice dialogue result information, thereby requesting the content server to provide cooperation result information based on the confirmed voice dialogue result information. When it is detected that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice dialogue processing unit determines that, among the voice recognition result information already obtained by the voice recognition unit, the voice recognition result information not yet designated as confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information. Therefore, even if the communication line is disconnected before the voice dialogue processing is completed, the confirmed voice dialogue result information can be transmitted to the content server, and cooperation result information based on the voice dialogue result can consequently be provided to the terminal device. In other words, as long as sound or voice has been input and some information has already been voice-recognized, confirmed voice dialogue result information based on that voice recognition result information can be transmitted to the content server.
[0116]
When the voice dialogue processing unit is configured to confirm, through the voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for the sound data from the terminal device is appropriate, and to determine the voice recognition result to be confirmed voice dialogue result information once that confirmation is obtained, the following is achieved. If the communication line is not disconnected, confirmed voice dialogue result information based on the confirmed voice recognition result can be transmitted to the content server before the notification indicating the end of the voice dialogue processing. If the communication line is disconnected, confirmed voice dialogue result information based on the voice dialogue result prior to confirmation can be transmitted to the content server.
[0117]
When the voice dialogue processing unit is configured to confirm, through the voice dialogue processing with the terminal device, whether the voice recognition unit's recognition result for the sound data from the terminal device is appropriate, and, each time confirmation is obtained for the voice recognition result of one unit of sound data, to determine that unit's voice recognition result to be confirmed voice dialogue result information, confirmed voice dialogue result information based on the voice recognition results of a plurality of sound data can be transmitted to the content server before the notification indicating the end of the voice dialogue processing.
[0118]
In addition, the voice dialogue processing method of the present invention includes: a step of recognizing sound data indicating sound or voice received from a terminal device via a communication network, and performing voice dialogue processing with the terminal device by sound communication over the communication network using the voice recognition result; a step of monitoring the connection state of the communication line used for the voice dialogue processing and detecting that the communication line has been disconnected; and a step of transmitting to the content server, from among the voice recognition result information obtained by the voice dialogue processing, the confirmed voice dialogue result information that has been confirmed to be handled as voice dialogue result information, thereby requesting the content server to provide cooperation result information based on the confirmed voice dialogue result information. When it is detected that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice recognition result information that has already been obtained but not yet designated as confirmed voice dialogue result information is determined to be confirmed voice dialogue result information. Therefore, even if the communication line is disconnected before the voice dialogue processing is completed, the confirmed voice dialogue result information can be transmitted to the content server, and cooperation result information based on the voice dialogue result can consequently be provided to the terminal device. In other words, as long as sound or voice has been input and some information has already been voice-recognized, confirmed voice dialogue result information based on that voice recognition result information can be transmitted to the content server.
[0119]
Furthermore, the voice dialogue processing program of the present invention causes a computer to execute: a step of recognizing sound data indicating sound or voice received from a terminal device via a communication network, and performing voice dialogue processing with the terminal device by sound communication over the communication network using the voice recognition result; a step of monitoring the connection state of the communication line used for the voice dialogue processing and detecting that the communication line has been disconnected; and a step of transmitting to the content server, from among the voice recognition result information obtained by the voice dialogue processing, the confirmed voice dialogue result information that has been confirmed to be handled as voice dialogue result information, thereby requesting the content server to provide cooperation result information based on the confirmed voice dialogue result information. When it is detected that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the program causes the computer to determine that the voice recognition result information that has already been obtained but not yet designated as confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information. Therefore, even if the communication line is disconnected before the voice dialogue processing is completed, the confirmed voice dialogue result information can be transmitted to the content server, and cooperation result information based on the voice dialogue result can consequently be provided to the terminal device. In other words, as long as sound or voice has been input and some information has already been voice-recognized, confirmed voice dialogue result information based on that voice recognition result information can be transmitted to the content server.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of the configuration of a wireless portable terminal system in an embodiment of a terminal communication system of the present invention.
FIG. 2 is a timing chart showing an example of display / voice cooperation processing and processing timing.
FIG. 3 is an explanatory diagram illustrating an example of a display state of a Web page.
FIG. 4 is an explanatory diagram showing an example of the contents of a voice dialogue.
FIG. 5 is a flowchart illustrating an example of a voice interaction process.
FIG. 6 is an explanatory diagram illustrating an example of a display state of a Web page after update.
FIG. 7 is an explanatory diagram showing another example of the contents of a voice dialogue.
FIG. 8 is a flowchart illustrating an example of line monitoring processing.
FIG. 9 is a timing chart showing another example of display / voice cooperation processing and processing timing.
FIG. 10 is a timing chart showing still another example of display / voice cooperation processing and processing timing.
[Explanation of symbols]
10 Wireless portable terminal system
20 Wireless portable terminal
30 Voice dialogue server
31 Voice communication information detector
32 Voice dialogue control unit
33 Voice dialogue information storage unit
34 Voice recognition unit
35 Voice guidance generator
36 Internet communication unit
40 Content server
41 Internet communication unit
42 Content control unit
43 Content information storage unit
50 Internet
60 General public network

Claims

A terminal communication system comprising:
a terminal device having a voice communication function and a packet communication function; and
a center having: a voice control unit that performs a voice call with the terminal device; a voice recognition unit that recognizes a voice signal received from the terminal device by the voice control unit and outputs a recognition result; a line information detection unit that monitors line information of the voice call and detects interruption of the voice call; and a packet control unit that, at the end of the voice dialogue by the voice call or when the line information detection unit detects interruption of the voice call, transmits the recognition result obtained by the voice recognition unit, or information based on the recognition result, to the terminal device by packet communication.

A terminal communication system including a terminal device having a call function and a data communication function, and a cooperation server that performs voice dialogue processing with the terminal device via a communication network and provides cooperation result information based on the result of the voice dialogue processing to the terminal device via the communication network, wherein
the cooperation server includes:
A voice recognition unit that recognizes sound data indicating sound or voice received from the terminal device via a communication network;
A voice dialogue processing unit that performs voice dialogue processing by sound communication using a communication network with the terminal device using a voice recognition result by the voice recognition unit;
A line disconnection detecting unit that monitors a connection state of a communication line used for the voice conversation processing and detects that the communication line is disconnected;
A cooperation result information providing unit that provides the terminal device, by data communication using a communication network, with cooperation result information based on confirmed voice dialogue result information, which is the voice recognition result information obtained by the voice dialogue processing that has been confirmed to be handled as voice dialogue result information,
and wherein, when the line disconnection detection unit detects that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice dialogue processing unit determines that, among the voice recognition result information already obtained by the voice recognition unit, the voice recognition result information not yet designated as confirmed voice dialogue result information is to be treated as confirmed voice dialogue result information.

The terminal communication system according to claim 2, wherein the voice dialogue processing unit confirms, through voice dialogue processing with the terminal device, whether the voice recognition result of the sound data from the terminal device by the voice recognition unit is appropriate and, when the confirmation is obtained, determines the voice recognition result to be confirmed voice dialogue result information.

The terminal communication system according to claim 2, wherein the voice dialogue processing unit confirms, through voice dialogue processing with the terminal device, whether the voice recognition result of the sound data from the terminal device by the voice recognition unit is appropriate and, each time confirmation is obtained for the voice recognition result of one unit of sound data, determines the voice recognition result of that unit of sound data to be confirmed voice dialogue result information.

The terminal communication system according to any one of claims 2 to 4, wherein the cooperation server includes a voice dialogue server that performs sound communication by sound or voice with the terminal device via a communication network, and a content server that provides and collects information using Web pages, and the voice dialogue server provides the terminal device with cooperation result information based on the confirmed voice dialogue result information using the content server.

The terminal communication system according to any one of claims 2 to 5, wherein data communication performed between the terminal device and the cooperation server is performed by packet communication.

The terminal communication system according to any one of claims 2 to 6, wherein the cooperation result information is Web page data reflecting the confirmed voice dialogue result information, or selection data selected based on the confirmed voice dialogue result information.

A cooperation server that performs voice dialogue processing with a terminal device having a call function and a data communication function via a communication network, and provides cooperation result information based on the result of the voice dialogue processing to the terminal device via the communication network, the cooperation server comprising:
A voice recognition unit that recognizes sound data indicating sound or voice received from the terminal device via a communication network;
A voice dialogue processing unit that performs voice dialogue processing by sound communication using a communication network with the terminal device using a voice recognition result by the voice recognition unit;
A line disconnection detecting unit that monitors a connection state of a communication line used for the voice conversation processing and detects that the communication line is disconnected;
Coordination result information based on confirmed voice conversation result information that is confirmed to be handled as voice conversation result information among the voice recognition result information obtained by the voice conversation processing is transmitted to the terminal device by data communication using a communication network. Including the cooperation result information providing section to provide,
If the communication line disconnection unit detects that the communication line is disconnected before the confirmed voice dialog result information is obtained in the voice dialog process by the line disconnection detection unit, A linkage server characterized in that, among the speech recognition result information by the processing unit, it is determined that speech recognition result information that is not defined as confirmed speech conversation result information is set as confirmed speech conversation result information.

The linkage server according to claim 8, wherein the voice dialogue processing unit confirms, through voice dialogue processing with the terminal device, whether the voice recognition result produced by the voice recognition unit for the sound data from the terminal device is appropriate, and, when confirmation is obtained, determines the voice recognition result to be confirmed voice dialogue result information.

The linkage server according to claim 8, wherein the voice dialogue processing unit confirms, through voice dialogue processing with the terminal device, whether the voice recognition result produced by the voice recognition unit for the sound data from the terminal device is appropriate, and, each time confirmation is obtained for the voice recognition result of one unit of sound data, determines the voice recognition result of that unit of sound data to be confirmed voice dialogue result information.
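The per-unit confirmation described above can be pictured with a short sketch. It is an assumption-laden illustration: the function `confirm_per_unit` is hypothetical, a missing reply stands in for a dialogue that ended early, and a rejected unit is simply left unconfirmed rather than re-asked.

```python
def confirm_per_unit(units, replies):
    """Confirm each recognized unit as soon as its own reply arrives.

    units   -- recognized texts, one per unit of sound data (hypothetical)
    replies -- user replies in order; True means "yes, that is correct".
               Fewer replies than units models a dialogue that ended early.
    """
    confirmed, unconfirmed = [], []
    reply_iter = iter(replies)
    for unit in units:
        reply = next(reply_iter, None)  # None: no reply was received
        if reply is True:
            confirmed.append(unit)      # confirmed unit by unit
        else:
            unconfirmed.append(unit)    # rejected or never asked
    return confirmed, unconfirmed


confirmed, unconfirmed = confirm_per_unit(["Tokyo", "Osaka", "10:00"], [True, True])
print(confirmed, unconfirmed)
```

Confirming unit by unit means that only the units not yet confirmed remain open if the dialogue is interrupted.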

A voice dialogue server that performs voice communication by sound or voice with a terminal device having a call function and a data communication function via a communication network, and that requests a content server, which provides and collects information using Web pages, to provide the terminal device with linkage result information based on the result of voice dialogue processing, the voice dialogue server comprising:
a voice recognition unit that recognizes sound data representing sound or voice received from the terminal device via the communication network;
a voice dialogue processing unit that uses the voice recognition result from the voice recognition unit to perform voice dialogue processing with the terminal device by sound communication over the communication network;
a line disconnection detection unit that monitors the connection state of the communication line used for the voice dialogue processing and detects that the communication line has been disconnected; and
a confirmed voice dialogue result information transmission unit that transmits, to the content server, confirmed voice dialogue result information, that is, the part of the voice recognition result information obtained by the voice dialogue processing that has been confirmed for handling as voice dialogue result information, thereby requesting the content server to provide linkage result information based on the confirmed voice dialogue result information,
wherein, when the line disconnection detection unit detects that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, the voice recognition result information that has not yet been confirmed as confirmed voice dialogue result information is determined to be confirmed voice dialogue result information.

The voice dialogue server according to claim 11, wherein the voice dialogue processing unit confirms, through voice dialogue processing with the terminal device, whether the voice recognition result produced by the voice recognition unit for the sound data from the terminal device is appropriate, and, when confirmation is obtained, determines the voice recognition result to be confirmed voice dialogue result information.

The voice dialogue server according to claim 11, wherein the voice dialogue processing unit confirms, through voice dialogue processing with the terminal device, whether the voice recognition result produced by the voice recognition unit for the sound data from the terminal device is appropriate, and, each time confirmation is obtained for the voice recognition result of one unit of sound data, determines the voice recognition result of that unit of sound data to be confirmed voice dialogue result information.

A voice dialogue processing method for performing voice communication by sound or voice with a terminal device having a call function and a data communication function via a communication network, and for requesting a content server, which provides and collects information using Web pages, to provide the terminal device with linkage result information based on the result of voice dialogue processing, the method comprising:
recognizing sound data representing sound or voice received from the terminal device via the communication network;
performing voice dialogue processing with the terminal device by sound communication over the communication network, using the voice recognition result;
monitoring the connection state of the communication line used for the voice dialogue processing and detecting that the communication line has been disconnected;
transmitting, to the content server, confirmed voice dialogue result information, that is, the part of the voice recognition result information obtained by the voice dialogue processing that has been confirmed for handling as voice dialogue result information, thereby requesting the content server to provide linkage result information based on the confirmed voice dialogue result information; and
when it is detected that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, determining that, among the voice recognition result information already obtained, the voice recognition result information that has not yet been confirmed as confirmed voice dialogue result information is to be confirmed voice dialogue result information.
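The flow of the method steps above can be sketched as a small event loop. This is a hedged sketch under stated assumptions: the event names (`"recognized"`, `"confirmed"`, `"disconnected"`), the function `run_voice_dialogue`, the `send` callback standing in for transmission to the content server, and the `forced` flag are all hypothetical and do not come from the patent text.

```python
def run_voice_dialogue(events, send):
    """Process dialogue events and forward confirmed results via send().

    events -- iterable of (kind, value) tuples (hypothetical event model)
    send   -- callable standing in for transmission to the content server
    """
    latest = None  # most recent recognition result not yet confirmed
    for kind, value in events:
        if kind == "recognized":
            latest = value
        elif kind == "confirmed" and latest is not None:
            # Normal path: the user confirmed the recognition result.
            send({"result": latest, "forced": False})
            latest = None
        elif kind == "disconnected":
            # Line dropped: treat the unconfirmed result as confirmed.
            if latest is not None:
                send({"result": latest, "forced": True})
            return


sent = []
run_voice_dialogue(
    [("recognized", "Tokyo"), ("confirmed", None),
     ("recognized", "Osaka"), ("disconnected", None)],
    sent.append,
)
print(sent)
```

The `forced` flag is only a convenience of this sketch for distinguishing results confirmed by the user from results promoted because of a disconnection.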

A voice dialogue processing program for performing voice communication by sound or voice with a terminal device having a call function and a data communication function via a communication network, and for requesting a content server, which provides and collects information using Web pages, to provide the terminal device with linkage result information based on the result of voice dialogue processing,
the program causing a computer to execute:
recognizing sound data representing sound or voice received from the terminal device via the communication network;
performing voice dialogue processing with the terminal device by sound communication over the communication network, using the voice recognition result;
monitoring the connection state of the communication line used for the voice dialogue processing and detecting that the communication line has been disconnected;
transmitting, to the content server, confirmed voice dialogue result information, that is, the part of the voice recognition result information obtained by the voice dialogue processing that has been confirmed for handling as voice dialogue result information, thereby requesting the content server to provide linkage result information based on the confirmed voice dialogue result information; and
when it is detected that the communication line has been disconnected before confirmed voice dialogue result information is obtained in the voice dialogue processing, determining that, among the voice recognition result information already obtained, the voice recognition result information that has not yet been confirmed as confirmed voice dialogue result information is to be confirmed voice dialogue result information.