JP2005519363A - Simultaneous multimodal communication system and method

Publication number: JP2005519363A
Application number: JP2003571826A
Authority: JP
Inventors: ジョンソン、グレッグ; バラスリヤ、セナカ; フェランズ、ジェームズ; ヤンケ、ジェローム; ピアース、レイヌ; クカ、デイビッド; ガラジダラ、ディラニ
Original assignee: Motorola Inc
Current assignee: Motorola Solutions Inc
Priority date: 2002-02-27
Filing date: 2003-02-06
Publication date: 2005-06-30
Also published as: EP1679622A3; WO2003073198A2; EP1679622A2; US20030167172A1; US6807529B2; CN101291336A; KR20040089677A; BR0307274A; WO2003073198A3; EP1481334A2; CN1639707A; AU2003209037A8; EP1481334A4; KR100643107B1; AU2003209037A1

Abstract

A multimodal network element (14) facilitates simultaneous multimodal communication sessions through different user agent programs (30, 34) on one or more devices (12, 16). For example, a user agent program communicating in a voice mode, such as a voice browser (34) in a voice gateway (16) that includes a speech engine and call/session termination, is synchronized with another user agent program operating in a different mode, such as a graphical browser (30) on a mobile device (12). The plurality of user agent programs (30, 34) are operatively coupled with a content server (18) during a session to enable simultaneous multimodal interaction.

Description

The present invention relates generally to communication systems and methods, and more particularly to multimodal communication systems and methods.

An emerging area of technology involving communication devices such as handheld devices, mobile phones, laptop computers, PDAs, Internet appliances, non-mobile devices, and other suitable devices is the application of multimodal interaction for access to information and services. A communication device typically hosts at least one user agent program, such as a browser or other suitable software that can operate as a user interface. The user agent program can respond to fetch requests (entered by a user through the user agent program or received from another device or software application), receive the fetched information, navigate through content servers over internal or external connections, and present information to the user. The user agent program may be a graphical browser, a voice browser, or any other suitable user agent program recognized by one of ordinary skill in the art. Such user agent programs include, but are not limited to, J2ME applications, Netscape (trademark), Internet Explorer (trademark), Java applications, WAP browsers, Instant Messaging, multimedia interfaces, Windows CE (trademark), or other suitable software implementations.

With multimodal technology, a user can access information such as voice, data, video, audio, or other information, and services such as e-mail, weather information, bank transactions, and news, in one mode via a user agent program and receive information in a different mode. More specifically, the user may submit an information fetch request in one or more modes, for example by speaking a fetch request into a microphone, and may then receive the fetched information in the same mode (that is, voice) or in a different mode, for example through a graphical browser that presents the returned information visually on a display screen. Within the communication device, the user agent program works in a manner similar to a standard Web browser or other suitable software program resident on a device connected to a network or on another terminal device.

Accordingly, multimodal communication systems have been proposed that allow a user to use one or more input and output interfaces to facilitate communication in multiple modes during a session. The user agent programs may be located on different devices. For example, a network element such as a voice gateway may include a voice browser, while a handheld device may include a graphical browser such as a WAP browser or other suitable text-based user agent program. With multimodal capabilities, a user can therefore enter input in one mode and receive the returned information in a different mode.

Systems have been proposed that attempt to accept user input in two different modes, such as entering some information in a voice mode and other information through a tactile or graphical interface. One such proposal uses a serial asynchronous approach that requires the user to enter voice input first and then send a short message after the voice input is complete. A user of such a system may have to switch modes manually during the same session, which can make the approach cumbersome.

Another proposed system uses a single user agent program and markup language tags in existing HTML pages, so that a user can, for example, navigate to a Web page by voice instead of typing a search word and can then enter text information on the same HTML page. For example, a user may speak the word "city" and type in an address to obtain visual map information from a content server. However, such proposals typically require the multimode input to be entered in different modes through the same user agent program on one device (that is, through the same browser). The voice and text information is therefore typically entered in the same HTML form and processed through the same user agent program. In other words, the proposal requires a single user agent program operating on a single device.

Consequently, on less complex devices, such as mobile devices with limited processing capability and storage capacity, a complex browser can degrade device performance. Such systems also cannot facilitate simultaneous multimodal input of information through different user agent programs. Moreover, it may be desirable to provide simultaneous multimodal input over multiple devices and to distribute processing among different applications or different devices.

Other proposals use a multimodal gateway and a multimodal proxy, in which the multimodal proxy fetches content and outputs it to a user agent program (for example, a browser) in the communication device and to a voice browser (for example, in a network element), so that the system can provide both voice and text output for a device. However, such approaches do not appear to let the user enter information simultaneously in different modes through different applications, because the proposal again appears to be a single user agent approach that requires the fetched information of the different modes to be output to a single user agent program or browser.

Accordingly, a need exists for an improved simultaneous multimodal communication apparatus and method.

The present invention is illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which like reference numerals indicate similar elements.
Briefly, a multimodal network element facilitates simultaneous multimodal communication sessions through different user agent programs on one or more devices. For example, a user agent program communicating in a voice mode, such as a voice browser in a voice gateway that includes a speech engine and call/session termination, is synchronized with another user agent program operating in a different mode, such as a graphical browser on a mobile device. The plurality of user agent programs are operatively coupled with a content server during a session to enable simultaneous multimodal interaction.

The multimodal network element obtains mode-specific instructions for a plurality of user agent programs that operate in different modes with respect to each other, for example by obtaining different markup language forms associated with the different modes, such as an HTML form associated with a text mode and a voiceXML form associated with a voice mode. During a session, the multimodal network element synchronizes output from the plurality of user agent programs for a user based on the obtained mode-specific instructions. For example, a voice browser is synchronized to output audio on one device while a graphical browser is synchronized to output a display on a screen of the same or a different device at the same time, so that the user can enter input through one or more of the user agent programs. Where the user enters input information through the plurality of user agent programs operating in different modes, the method and apparatus fuses, or links, the received simultaneous multimodal input information that was entered by the user and sent from the plurality of user agent programs in response to a request for simultaneous different multimodal information. Simultaneous multimodal input is thereby facilitated through different user agent programs, so that multiple devices, or multiple user agent programs on one device, can be used during a simultaneous multimodal session. The multimodal network element designates different proxies to communicate with each of the different user agent programs that are set in the different modes.
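The fusing, or linking, of input received from several user agent programs can be pictured with a small sketch. This is only an illustration under stated assumptions: the `ModalInput` structure, the `fuse_inputs` function, and the rule that a later input overrides an earlier one for the same field are all hypothetical and are not taken from the patent text, which does not specify a conflict rule.

```python
from dataclasses import dataclass


@dataclass
class ModalInput:
    """Input received from one user agent program (hypothetical structure)."""
    agent_id: str   # e.g. "voice-browser-34"
    mode: str       # e.g. "voice" or "text"
    fields: dict    # field name -> value entered in that mode


def fuse_inputs(inputs):
    """Link inputs from several modes into one record keyed by field name.

    Assumption: if two modes supply the same field, the later item in the
    list wins. Returns the fused record and the mode each field came from.
    """
    fused = {}
    sources = {}
    for item in inputs:
        for name, value in item.fields.items():
            fused[name] = value
            sources[name] = item.mode
    return fused, sources


voice = ModalInput("voice-browser-34", "voice", {"city": "Chicago"})
text = ModalInput("graphical-browser-30", "text", {"departure_time": "09:30"})
fused, sources = fuse_inputs([voice, text])
```

Here the city arrives by voice and the departure time by text, and the fused record carries both fields so that a single request can be built from them.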

FIG. 1 illustrates one example of a multimodal communication system 10 in accordance with one embodiment of the invention. In this example, the multimodal communication system 10 includes a communication device 12, a multimodal fusion server 14, a voice gateway 16, and a content source such as a Web server 18. The communication device 12 may be, for example, an Internet appliance, PDA, mobile phone, cable set-top box, telematics unit, laptop computer, desktop computer, or any other mobile or non-mobile device. Depending on the type of communication desired, the communication device 12 may also be in operative communication with a wireless local area or wide area network 20, a WAP/data gateway 22, a short messaging service center (SMSC/paging network) 24, or any other suitable network. Likewise, the multimodal fusion server 14 may be in communication with any suitable devices, network elements, or networks, including the Internet, intranets, a multimedia server (MMS) 26, an instant messaging server (IMS) 28, or any other suitable network. Accordingly, the communication device 12 is in operative communication with the appropriate networks via communication links 21, 23, and 25. Similarly, the multimodal fusion server 14 may be suitably linked to the various networks via conventional communication links, designated 27. In this example, the voice gateway 16 may provide conventional voice gateway functionality including, but not limited to, a speech recognition engine, a handwriting recognition engine, a facial recognition engine, session control, user-provided algorithms, and an operation and maintenance controller, as desired. In this example, the communication device 12 includes a user agent program 30, such as a visual browser (for example, a graphical browser) in the form of a WAP browser, gesture recognition, tactile recognition, or any other suitable browser, together with telephone circuitry, shown as telephone circuitry 32, that includes a microphone and a speaker. Any other suitable configuration may also be used.

The voice gateway 16 includes another user agent program 34, such as a voice browser, that outputs audio information in a form suitable for output by the speaker of the telephone circuitry 32. It will be recognized, however, that the speaker could be located on a device other than the communication device 12, such as a pager or other PDA, so that audio is output on one device while a visual browser, via the user agent program 30, is provided on yet another device. It will also be recognized that although the user agent program 34 resides in the voice gateway 16, it may instead be included in the communication device 12 (shown as voice browser 36) or in any other suitable device. To support simultaneous multimodal communication as described herein, the plurality of user agent programs, namely user agent program 30 and user agent program 34, operate in different modes with respect to each other in a given session. The user may therefore predefine the mode of each user agent program by signing up for the disclosed service and presetting mode preferences in a mode preference database 36 that is accessible via the Web server 18 or another server (including the multimodal fusion server (MFS) 14). In addition, the user may, if desired, select or change the mode of a given user agent program during a session, as known in the art.

The simultaneous multimodal synchronization coordinator 42 may include buffer memory for temporarily storing, during a session, the mode-specific instructions for one of the plurality of user agent programs, in order to compensate for communication delays associated with the mode-specific instructions for another user agent program. For example, if necessary, the synchronization coordinator 42 can account for system delays or other delays by waiting before it outputs the mode-specific instructions to the proxies, so that they are rendered at the same time on the different user agent programs.
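One way to picture the buffering described above is a small holding area that releases instructions only once every expected mode has delivered its part. The following is a minimal sketch, not the patent's implementation: the `SyncBuffer` class, its method names, and the placeholder markup strings are hypothetical.

```python
class SyncBuffer:
    """Holds mode-specific instructions until every expected mode has arrived.

    Hypothetical helper: the patent only states that a buffer memory may be
    used to compensate for delays, not how it is organized.
    """

    def __init__(self, expected_modes):
        self.expected = set(expected_modes)
        self.pending = {}

    def submit(self, mode, instruction):
        """Store one instruction; return all of them once complete, else None."""
        if mode not in self.expected:
            raise ValueError(f"unexpected mode: {mode}")
        self.pending[mode] = instruction
        if set(self.pending) == self.expected:
            ready, self.pending = self.pending, {}
            return ready
        return None


buf = SyncBuffer(["text", "voice"])
first = buf.submit("text", "<html>...</html>")    # held: voice form not here yet
second = buf.submit("voice", "<vxml>...</vxml>")  # both present: released together
```

The text form is held until the slower voice form arrives, and both are then handed on together so the proxies can render them at the same time.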

If desired, the user agent program 30 may also provide an input interface that lets the user mute certain modes. For example, if a device or user agent program supports operation in multiple modes, the user may indicate that a mode should be muted for a particular duration. If the user's output mode is voice but the user is in a loud environment, the user may, for example, mute the output to the voice browser. The multimode mute data received from the user may be stored by the multimodal fusion server 14, for example in memory 602 (see FIG. 5), to indicate which modes are to be muted for a given session. The synchronization coordinator 42 may then refrain from obtaining mode-specific instructions for the modes identified as muted.
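The effect of mute data on fetching can be sketched as a simple filter applied before instructions are requested. The function name `modes_to_fetch` and the list-based representation of mute data are assumptions made for illustration only.

```python
def modes_to_fetch(session_modes, muted_modes):
    """Return the session modes whose instructions should still be fetched.

    Hypothetical helper: muted modes are skipped, and the original order of
    the remaining modes is preserved.
    """
    muted = set(muted_modes)
    return [mode for mode in session_modes if mode not in muted]


# The user is in a loud environment and has muted voice output.
active = modes_to_fetch(["text", "voice"], muted_modes=["voice"])
unmuted = modes_to_fetch(["text", "voice"], muted_modes=[])
```

With voice muted, only the text-mode instructions would be requested for the session.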

The information fetcher 46 obtains mode-specific instructions 69 from the multimodal application 54 for the plurality of user agent programs 30 and 34. The mode-specific instructions 68, 70 are sent to the user agent programs 30 and 34. In this embodiment, the multimodal application 54 includes data that identifies the mode-specific instructions associated with the different user agent programs, and therefore with the different modes, as described below. The simultaneous multimodal synchronization coordinator 42 is operatively coupled to the information fetcher 46 to receive the mode-specific instructions. It is also operatively coupled to the plurality of proxies 38a-38n to designate the proxies needed for a given session.

Where the different user agent programs 30 and 34 are on different devices, the method includes sending the request for simultaneous multimodal input information 68, 70 by sending a first mode-based markup language form to one device and a second mode-based markup language form to one or more other devices, thereby requesting that the user enter information simultaneously in different modes through the different devices during the same session. These markup language-based forms were obtained as the mode-specific instructions 68, 70.

The multimodal session controller 40 is used for detecting incoming sessions, answering sessions, modifying session parameters, terminating sessions, and exchanging session and media information with a session control algorithm on the device. The multimodal session controller 40 may be the primary session termination point for the session if desired. Alternatively, it may be a secondary session termination point if, for example, the user wishes to establish a session with another gateway, such as the voice gateway, which in turn establishes a session with the multimodal session controller 40.

The synchronization coordinator 42 sends output synchronization messages 47 and 49, which include the requests for simultaneous multimodal input information, to the respective proxies 38a and 38n to synchronize their output to the respective user agent programs. The proxies 38a and 38n send the synchronization coordinator 42 input synchronization messages 51 and 53 that contain the received multimodal input information 72 and 74.

The simultaneous multimodal synchronization coordinator 42 sends and receives the synchronization messages 47, 49, 51, and 53 through the proxies, or through the user agent programs themselves if they have the capability. When the proxies 38a and 38n receive the multimodal input information 72 and 74 from the different user agent programs, they send the input synchronization messages 51 and 53 containing that information to the synchronization coordinator 42, which forwards the received information to the multimodal fusion engine 44. In addition, if the user agent program 34 sends a synchronization message to the synchronization coordinator 42, the coordinator sends that message on to the other user agent program 30 in the session. The synchronization coordinator 42 may also perform message transformation and synchronization message filtering to make the synchronization system more efficient. It may maintain a list of the user agent programs currently in use in a given session, so that it can keep track of which ones need to be notified when synchronization is necessary.
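The bookkeeping described above, keeping a per-session list of user agent programs and passing a synchronization message from one of them to the others, might look like the sketch below. The `SessionRegistry` class, its methods, and the agent identifiers are hypothetical and only illustrate the idea.

```python
class SessionRegistry:
    """Tracks which user agent programs take part in each session.

    Hypothetical structure: the patent says only that the coordinator may
    maintain a list of current user agent programs per session.
    """

    def __init__(self):
        self.agents = {}  # session id -> list of agent ids, in join order

    def join(self, session_id, agent_id):
        """Record that an agent is taking part in a session."""
        self.agents.setdefault(session_id, []).append(agent_id)

    def recipients(self, session_id, sender_id):
        """Agents that must be notified of a message sent by sender_id."""
        return [a for a in self.agents.get(session_id, []) if a != sender_id]


registry = SessionRegistry()
registry.join("s1", "graphical-browser-30")
registry.join("s1", "voice-browser-34")

# A synchronization message from the voice browser is relayed to the others.
targets = registry.recipients("s1", "voice-browser-34")
```

A message from the voice browser is relayed only to the graphical browser, and a session with no registered agents yields no recipients.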

The multimodal fusion server 14 includes a plurality of multimodal proxies 38a-38n, a multimodal session controller 40, a simultaneous multimodal synchronization coordinator 42, a multimodal fusion engine 44, an information (for example, mode-specific instruction) fetcher 46, and a voiceXML interpreter 50. At least the multimodal session controller 40, the simultaneous multimodal synchronization coordinator 42, the multimodal fusion engine 44, the information fetcher 46, and the multimodal markup language (for example, voiceXML) interpreter 50 may be implemented as software modules executing on one or more processing devices. In that case, memory stores executable instructions that, when read by the one or more processing devices, cause them to carry out the functions described herein for each software module. The multimodal fusion server 14 therefore includes processing devices, which may include, but are not limited to, digital signal processors, microcomputers, microprocessors, state machines, or any other suitable processing devices. The memory may be ROM, RAM, distributed memory, flash memory, or any other suitable memory that can store states or other data that, when executed by a processing device, cause the one or more processing devices to operate as described herein. Alternatively, the functions of the software modules may be implemented in hardware or in any suitable combination of hardware, software, and firmware, as desired.

The multimodal markup language interpreter 50 may be a state machine or other suitable hardware, software, firmware, or any suitable combination thereof, which, among other things, executes the markup language provided by the multimodal application 54.

FIG. 2 illustrates one example of a method for carrying out multimodal communication, performed by the multimodal fusion server 14. It will be recognized, however, that any of the steps described herein may be performed in any suitable order and by any suitable device or devices. For the current multimodal session, the user agent program 30 (for example, a WAP browser) sends a request 52 to the Web server 18 to request content from a simultaneous multimodal application 54 accessible by the Web server 18. This may be done, for example, by typing in a URL, clicking on an icon, or using any other conventional mechanism. As shown by dashed line 52, each of the user agent programs 30 and 34 may also send user mode information to the markup interpreter 50. The Web server 18, which serves as the content server, obtains the multimodal preferences 55 of the communication device 12 from the mode preference database 36, which was populated earlier through a user subscription process for the simultaneous multimodal service. The Web server 18 then informs the multimodal fusion server 14 through a notification 56 that contains the user preferences from the database 36, indicating, for example, which user agent programs are being used in the simultaneous multimodal communication and which mode each user agent program is set in. In this example, user agent program 30 is set in a text mode and user agent program 34 is set in a voice mode. The simultaneous multimodal synchronization coordinator 42 then determines which of the plurality of multimodal proxies 38a-38n is to be used for each of the user agent programs 30 and 34 during the session. Accordingly, it designates multimodal proxy 38a as a text proxy to communicate with user agent program 30, which is set in the text mode. Similarly, it designates proxy 38n as the multimodal proxy that communicates voice information for user agent program 34, which is operating in the voice mode. The information fetcher, shown as Web page fetcher 46, obtains the mode-specific instructions, such as markup language forms or other data, from the Web server 18 associated with the simultaneous multimodal application 54.
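The designation of proxies from the mode preferences can be sketched as a lookup from each user agent program's configured mode to the proxy that handles that mode. The function `designate_proxies` and the identifier strings are hypothetical stand-ins for user agent programs 30 and 34 and proxies 38a and 38n. The patent does not describe the selection logic at this level of detail.

```python
def designate_proxies(preferences, proxies_by_mode):
    """Map each user agent program to the proxy that handles its mode.

    preferences:     agent id -> configured mode (as in notification 56)
    proxies_by_mode: mode -> proxy id
    Raises KeyError if no proxy exists for a configured mode.
    """
    assignment = {}
    for agent_id, mode in preferences.items():
        if mode not in proxies_by_mode:
            raise KeyError(f"no proxy available for mode {mode!r}")
        assignment[agent_id] = proxies_by_mode[mode]
    return assignment


preferences = {"user-agent-30": "text", "user-agent-34": "voice"}
proxies = {"text": "proxy-38a", "voice": "proxy-38n"}
assignment = designate_proxies(preferences, proxies)
```

This mirrors the example in the text, where the text-mode agent is paired with proxy 38a and the voice-mode agent with proxy 38n.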

For example, if the multimodal application 54 asks the user to enter information in both the voice mode and the text mode, the information fetcher 46 obtains the associated HTML markup language form to be output to the user agent program 30 and the associated voiceXML form to be output to the user agent program 34, via the request 66. These mode-specific instructions are then rendered as output by the user agent programs (for example, output to a screen or through a speaker). During the session, the simultaneous multimodal synchronization coordinator 42 synchronizes the output from the plurality of user agent programs 30 and 34 based on the mode-specific instructions. For example, the simultaneous multimodal synchronization coordinator 42 sends the appropriate markup language forms representing the different modes to the respective user agent programs 30 and 34 at the appropriate times, so that the voice is rendered on the communication device 12 at the same time as the text is output on the screen via the user agent program 30. For example, the multimodal application 54 can give the user instructions in the form of voice instructions, via the user agent program 34, about the information expected to be entered via the text browser, while at the same time waiting for text input from the user agent program 30. For example, the multimodal application 54 may require voice output of the phrase "please enter your desired destination city followed by your desired departure time" while at the same time displaying, through the user agent program 30, fields on the display of the communication device: one field designated "C" for city and, on the next line, one designated "D" for destination. In this example, the multimodal application is not requesting simultaneous multimodal input from the user; it is requesting input through only one mode, namely the text mode. The other mode is being used to deliver user instructions.

Alternatively, if the multimodal application 54 requests that the user enter input information through multiple user agent programs, the multimodal fusion engine 44 fuses the user input that is entered simultaneously in the different multimodal user agent programs during the session. For example, when the user utters the phrase "directions from here to there" while clicking on two positions on a displayed map, the voice browser, that is, the user agent program 34, writes "here" into the start position field and "there" into the destination position field as received input information 74, while the graphical browser, that is, the user agent program 30, writes the geographical position (for example, latitude/longitude) of the first click point on the map into the start position field and the geographical position (for example, latitude/longitude) of the second click point on the map into the destination position field. The multimodal fusion engine 44 obtains this information, fuses the input information entered by the user through the multiple user agent programs operating in different modes, and determines that the word "here" corresponds to the geographical position of the first click point and that the word "there" corresponds to the geographical position of the second click point. In this way, the multimodal fusion engine 44 has a complete set of information for the user's command. The multimodal fusion engine 44 may wish to send the fused information 60 back to the user agent programs 30 and 34 so that they have the complete information associated with the simultaneous multimodal communication. At this point, the user agent program 30 can submit this information to the content server 18 to obtain the desired information.
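The "here"/"there" example can be sketched in Python as follows. This is an illustrative assumption about how the two sets of field values might be reconciled, not the fusion engine's actual logic; the function name `fuse_deictic_input`, the slot names, and the coordinates are all hypothetical.

```python
def fuse_deictic_input(voice_slots, click_slots, deictic_words=("here", "there")):
    """Replace spoken placeholder words with the clicked position for the same slot.

    voice_slots: slot name -> word written by the voice browser
    click_slots: slot name -> (latitude, longitude) written by the graphical browser
    """
    fused = {}
    for slot, spoken in voice_slots.items():
        if spoken in deictic_words and slot in click_slots:
            # The spoken word only points at something; the click supplies the value.
            fused[slot] = click_slots[slot]
        else:
            fused[slot] = spoken
    return fused


# Hypothetical values: the user says "directions from here to there"
# while clicking two points on the map.
voice = {"start": "here", "destination": "there"}
clicks = {"start": (41.8781, -87.6298), "destination": (42.0451, -87.6877)}
fused = fuse_deictic_input(voice, clicks)
```

After fusion, both slots hold coordinates, which is the "complete set of information" that could then be submitted to the content server.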

As shown in block 200, for a session, the method includes obtaining mode-specific instructions 68, 70 for a plurality of user agent programs that operate in different modes with respect to one another, for example by obtaining a different type of markup language specific to the mode of each of the user agent programs. As shown in block 202, the method includes synchronizing output, such as that of the user agent programs, during the session based on the mode-specific instructions so as to facilitate simultaneous multimodal operation for the user. The rendering of the markup language forms is thereby synchronized, so that the output from the plurality of user agent programs is rendered simultaneously in different modes. As shown in block 203, the simultaneous multimodal synchronization coordinator 42 determines whether the mode-specific instruction sets 68, 70 for the different user agent programs 30 and 34 request simultaneous entry of information by the user in different modes through the different user agent programs. If not, as shown in block 205, the simultaneous multimodal synchronization coordinator 42 forwards the input information received from only one user agent program to the destination server or Web server 18.

However, as shown in block 204, if the mode-specific instruction sets 68, 70 for the different user agent programs 30 and 34 request simultaneous user input in different modes, the method includes fusing the received simultaneous multimodal input information that the user enters and that is sent back by the user agent programs 30 and 34, to produce a fused multimodal response 60 associated with the different user agent programs operating in different modes. As shown in block 206, the method includes forwarding the fused multimodal response 60 back to the currently executing application 61 in the markup language interpreter 50. The currently executing application 61 (see FIG. 5) is the markup language from the application 54 that is executing as part of the interpreter 50.

The detailed operation of the multimodal communication system 10 will now be described with reference to FIGS. 1 and 3. As shown in block 300, the communication device 12 sends a request 52 for Web content or other information via the user agent program 30. As shown in block 302, the content server 18 obtains the multimodal preference data 55 from the mode preference database 36 for the identified user, in order to obtain the device preferences and mode preferences for the session. As shown in block 304, the method includes the content server notifying the multimodal fusion server 14 which user agent applications are operating on which devices, and in which modes, for the given simultaneous multimodal communication session.

As described above and as shown in block 306, the simultaneous multimodal synchronization coordinator 42 is set up to determine the respective proxy for each of the different modes based on the mode preference information 55 from the mode preference database 36. As shown in block 308, the method includes, if desired, receiving a user mode designation for each user agent program via the multimodal session controller 40. For example, the user can change a desired mode so that it differs from the preset mode preferences 55 stored in the mode preference database 36. This can be done using conventional session messaging. If the user has changed the desired mode for a particular user agent program, for example because the desired user agent program is on a different device, different mode-specific instructions, such as a different markup language form, may be required. If the user mode designation has changed, the information fetcher 46 fetches and requests the appropriate mode-specific instructions based on the mode selected for the user agent application.

Thereafter, as shown in block 310, the information fetcher 46 fetches the mode-specific instructions from the content server 18, shown as fetch request 66, for each user agent program and therefore for each mode. The multimodal fusion server 14 thus obtains, via the information fetcher 46, markup language representing the different modes, so that each of the user agent programs 30 and 34 can output information in its respective mode based on that markup language. It will be understood, however, that the multimodal fusion server 14 can obtain any suitable mode-specific instructions, not just information based on a markup language.

When the mode-specific instructions have been fetched from the content server 18 for each user agent program and no CMMT is associated with the mode-specific instructions 68, 70, the received mode-specific instructions 69 can be sent to the transcoder 608 (see FIG. 5). The transcoder 608 transcodes the received mode-specific instructions into a base markup language form that can be interpreted by the interpreter 50, creating a base markup language form that has data identifying the mode-specific instructions for a different mode 610. The transcoder thus transcodes the mode-specific instructions so that they include data identifying the mode-specific instructions for another user agent program operating in a different mode. For example, if the interpreter 50 uses a base markup language such as voiceXML, and one set of mode-specific instructions from the application 54 is in voiceXML form while the other is in HTML form, the transcoder 608 embeds in the voiceXML form a CMMT that identifies the URL from which the HTML form can be obtained, or that contains the actual HTML form itself. In addition, if none of the mode-specific instructions is in the base markup language, one set of mode-specific instructions is translated into the base markup language, and the other sets of mode-specific instructions are then referenced by CMMTs.
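As a rough illustration of the embedding step, the following Python sketch inserts a CMMT element into a voiceXML form using the standard `xml.etree.ElementTree` module. The function name `embed_cmmt`, the sample base document, and the choice to place the tag in a leading `<block>` are assumptions made for this sketch; the specification does not prescribe how the transcoder 608 is implemented.

```python
import xml.etree.ElementTree as ET


def embed_cmmt(base_vxml, mode, src):
    """Return the base voiceXML form with a <cmmt> reference to another mode's form."""
    root = ET.fromstring(base_vxml)
    form = root.find("form")
    if form is None:
        raise ValueError("base document has no <form> element")
    # Wrap the tag in a <block> placed first in the form (an assumed placement).
    block = ET.Element("block")
    ET.SubElement(block, "cmmt", {"mode": mode, "src": src})
    form.insert(0, block)
    return ET.tostring(root, encoding="unicode")


# Hypothetical base form for the voice mode, plus the URL of the HTML form.
base = (
    '<vxml version="2.0"><form>'
    '<field name="from_city">Where from?</field>'
    "</form></vxml>"
)
transcoded = embed_cmmt(base, "html", "./itinerary.html")
```

Embedding a URL rather than the HTML itself keeps the base form small; the text above allows either option.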

Alternatively, the multimodal application 54 can supply the necessary CMMT information to facilitate synchronization of the output of the plurality of user agent programs during a simultaneous multimodal session. An example of the mode-specific instructions for each user agent program is shown below as a markup language form. The markup language form is supplied by the multimodal application 54 and is used by the multimodal fusion server 14 to carry out the simultaneous multimodal communication session. The multimodal voiceXML interpreter 50 assumes that the multimodal application 54 uses voiceXML as its base language. To facilitate synchronization of output by the plurality of user agent programs on behalf of the user, the multimodal application 54 can be written to include, or index to, a simultaneous (concurrent) multimodal tag (CMMT), such as an extension in a voiceXML form or an index to an HTML form. The CMMT identifies a mode and points to, or contains, the information, such as the actual HTML form, to be output by one of the user agent programs in the identified mode. The CMMT also serves as multimodal synchronization data, in that its presence indicates that the different mode-specific instructions need to be synchronized across the different user agent programs.

For example, if voiceXML is the base language of the multimodal application 54, the CMMT may indicate a text mode. In this example, the CMMT may contain a URL for the HTML text to be output by the user agent program, or may contain the HTML itself as part of the CMMT. The CMMT may have the properties of an attribute extension of a markup language. The multimodal voiceXML interpreter 50 fetches the mode-specific instructions using the information fetcher 46 and parses (in this example, executes) the mode-specific instructions fetched from the multimodal application to detect the CMMT. Once the CMMT is detected, the multimodal voiceXML interpreter 50 interprets it and, if necessary, obtains any other mode-specific instructions, such as the HTML for the text mode.

For example, the CMMT can indicate where to obtain text information for the graphical browser. Table 1 below shows an example of mode-specific instructions, in the form of a voiceXML form, for a simultaneous multimodal itinerary application in which the voice browser outputs speech asking "where from" and "where to" while the graphical browser displays "from city" and "to city". The user is expected to enter the received simultaneous multimodal information through the different browsers in the fields labeled "from city" and "to city".

Table 1
<vxml version="2.0">
  <form>
    <block>
      <!-- Indicates that the non-voice mode is html (text) and that the source information is located at the url itinerary.html -->
      <cmmt mode="html" src="./itinerary.html"/>
    </block>
    <field name="from_city">
      <!-- Expected text piece of information that the graphical browser also tries to collect -->
      <!-- For voice: lists the possible answers for the speech recognition engine -->
      <grammar src="./city.xml"/>
      <!-- Prompt spoken by the voice browser -->
      Where from?
    </field>
    <field name="to_city">
      <!-- Text is expected -->
      <grammar src="./city.xml"/>
      <!-- Spoken by the voice browser -->
      Where to?
    </field>
  </form>
</vxml>

Thus, the markup language form above is written in a base markup language that represents the mode-specific instructions for at least one user agent program, and the CMMT is an extension that designates the mode-specific instructions for another user agent program operating in a different mode.
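To make the detection step concrete, here is a minimal Python sketch of how an interpreter might locate the CMMT in a form like Table 1 and read the voice prompts. It uses the standard `xml.etree.ElementTree` parser; the helper names `find_cmmt` and `field_prompts` are hypothetical, and the sketch only parses the document rather than executing it as a real voiceXML interpreter would.

```python
import xml.etree.ElementTree as ET

# A form modeled on Table 1 (annotations omitted).
VXML = """<vxml version="2.0">
  <form>
    <block>
      <cmmt mode="html" src="./itinerary.html"/>
    </block>
    <field name="from_city">
      <grammar src="./city.xml"/>
      Where from?
    </field>
    <field name="to_city">
      <grammar src="./city.xml"/>
      Where to?
    </field>
  </form>
</vxml>"""


def find_cmmt(document):
    """Return (mode, src) for every <cmmt> tag found in the document."""
    root = ET.fromstring(document)
    return [(tag.get("mode"), tag.get("src")) for tag in root.iter("cmmt")]


def field_prompts(document):
    """Return field name -> prompt text spoken by the voice browser."""
    root = ET.fromstring(document)
    prompts = {}
    for field in root.iter("field"):
        prompts[field.get("name")] = "".join(field.itertext()).strip()
    return prompts


cmmt_tags = find_cmmt(VXML)
prompts = field_prompts(VXML)
```

Each `(mode, src)` pair tells the interpreter which other mode needs instructions and where they could be fetched, while the field names give the slots that both browsers are expected to fill.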

As shown in block 311, if the user has changed a preference, the method includes resetting the proxies to be consistent with the change. As shown in block 312, the multimodal fusion server 14 determines whether a reception wait point has been reached. If it has, the next state is entered, as shown in block 314, and the process is complete. If it has not, the method includes synchronizing the mode-specific instructions for the different user agent programs. In this example, the multimodal voiceXML interpreter 50 outputs HTML for the user agent program 30 and voiceXML for the user agent program 34 to the simultaneous multimodal synchronization coordinator 42, which synchronizes the output by the plurality of user agent programs. This can be done, for example, based on the occurrence of a reception wait point, as described above. This is shown in block 316.

As shown in block 318, the method includes sending, for example by the simultaneous multimodal synchronization coordinator 42, the synchronized mode-specific instructions 68, 70 to the corresponding proxies 38a and 38n to request user input information from the user in different modes during the same session. The synchronized requests 68 and 70 are sent to each of the user agent programs 30 and 34. For example, the requests for different simultaneous mode input information, corresponding to the multiple input modes associated with the different user agent programs, are shown as the synchronized requests containing the mode-specific instructions 68 and 70. These can be, for example, synchronized markup language forms.

Once the user agent programs 30 and 34 have rendered the mode-specific instructions simultaneously, the method includes determining whether user input has been received within a timeout period, as shown in block 320, or whether another event has occurred. For example, the multimodal fusion engine 44 can wait for a period of time and then determine whether the multimodal input information entered by the user has been properly received from the plurality of user agent programs for fusion. This waiting period can differ depending on the mode setting of each user agent program. For example, if the user is expected to enter both voice and text information simultaneously but the multimodal fusion engine does not receive the information for fusion within a given period, an error is assumed to have occurred. In addition, the multimodal fusion engine 44 may allow more time for voice information to be returned than for text information, because voice information takes relatively longer to process via the voice gateway 16.
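The per-mode waiting period can be illustrated with a small Python sketch. It models arrival times as plain numbers instead of waiting on a real clock, and the function name `check_arrivals`, the timeout values, and the return convention are assumptions made purely for illustration.

```python
def check_arrivals(arrivals, timeouts):
    """Decide whether all expected inputs arrived within their per-mode timeout.

    arrivals: mode -> seconds after the request at which input arrived, or None
    timeouts: mode -> maximum seconds to wait for that mode
    Returns ("fuse", []) when everything arrived in time, otherwise
    ("error", [modes that were missing or late]).
    """
    late = [
        mode
        for mode, limit in timeouts.items()
        if arrivals.get(mode) is None or arrivals[mode] > limit
    ]
    return ("error", late) if late else ("fuse", [])


# Hypothetical limits: voice is given longer because it passes through the voice gateway.
timeouts = {"text": 5.0, "voice": 8.0}
on_time = check_arrivals({"text": 2.0, "voice": 6.5}, timeouts)
missing_voice = check_arrivals({"text": 2.0, "voice": None}, timeouts)
```

A voice input arriving after 6.5 seconds is still accepted here even though the same delay would be too long for text, which mirrors the longer allowance for voice described above.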

In this example, the user is asked to enter text via the user agent program 30 and, at the same time, to speak into the microphone to provide voice information to the user agent program 34. The received simultaneous multimodal input information 72 and 74 is received from the user agent programs 30 and 34 and passed to the respective proxies over suitable communication links. Note that the communication, indicated by reference numeral 76, between the user agent program 34 and the microphone and speaker of the device 12 is carried in PCM format or another suitable format and, in this example, is not in a mode-specific instruction format that can be output by a user agent program.

If the user enters information using the text browser and the voice browser simultaneously, so that the multimodal fusion engine 44 receives the simultaneous multimodal input information sent from the plurality of user agent programs, the multimodal fusion engine 44 fuses the input information 72 and 74 received from the user, as shown in block 322.

FIG. 4 shows an example of the operation of the multimodal fusion engine 44. For purposes of illustration, the event "no input" means that the user entered nothing in the mode in question. The event "no match" means that something was entered but it was not an expected value. A result is a set of slot (or field) name and corresponding value pairs from what the user successfully entered. For example, a successful input may be "City=Chicago", "State=Illinois" and "Street=first street", together with a confidence weighting factor in the range of, for example, 0% to 100%. As described above, whether the multimodal fusion engine 44 fuses information can depend on the amount of time between the receipt, or expected receipt, of slot name (for example, variable) and value pairs, or on the receipt of other events. The method assumes that confidence levels are assigned to the received information; for example, the synchronization coordinator weights confidence based on the mode and the arrival time of the information. For example, where the same slot data can be entered in different modes during the same session (for example, the street name of an address is both spoken and typed), typed data is assumed to be more accurate than spoken data. The synchronization coordinator combines the received multimodal input information sent from one of the plurality of user agent programs in response to the request for different simultaneous multimodal information, based on the time received and on the confidence values of the individual results received.

As shown in block 400, the method includes determining whether there was an event or a result in a non-voice mode. If so, as shown in block 402, the method includes determining whether there were any events in any mode other than "no input" and "no match" events. If so, the method includes returning the first such event received to the interpreter 50, as shown in block 404. However, if there were no events from the user agent programs other than "no input" and "no match", the method includes, as shown in block 406, for each mode that sent two or more results to the multimodal fusion engine, combining that mode's results in order of the time received. This can be useful when the user re-enters input for the same slot: a later value for a given slot name overwrites an earlier value. The multimodal fusion engine adjusts the result confidence weight of the mode based on the confidence weights of the individual results that make it up. The final outcome is one answer for each slot name for each mode. The method includes, as shown in block 408, taking the results from block 406 and combining them into one combined result for all modes. The method includes starting with the least confident result and proceeding to the most confident result. Each slot name in the fused result receives the slot value belonging to the most confident input result that contains a definition of that slot.
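The per-mode step of block 406 can be sketched in Python as below. The record layout (dictionaries with `slot`, `value`, `timestamp`, and `confidence` keys) and the function name `collapse_mode` are assumptions for this sketch. The overall confidence is taken as the lowest confidence among the surviving entries, which is one plausible reading of "adjusts the result confidence weight"; the specification does not fix a formula at this point.

```python
def collapse_mode(entries):
    """Collapse one mode's entries to a single answer per slot.

    entries: list of dicts with keys "slot", "value", "timestamp", "confidence".
    Returns {"slots": slot -> value, "overall_confidence": float}.
    """
    if not entries:
        raise ValueError("a mode must have at least one entry to collapse")
    latest = {}
    # Process in order of arrival so that a later value overwrites an earlier one.
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        latest[entry["slot"]] = entry
    # Assumed rule: the mode is only as reliable as its least reliable slot.
    overall = min(e["confidence"] for e in latest.values())
    return {
        "slots": {slot: e["value"] for slot, e in latest.items()},
        "overall_confidence": overall,
    }


# Hypothetical text-mode input where the user retyped the street name.
text_entries = [
    {"slot": "STREETNAME", "value": "Michigan", "timestamp": 0, "confidence": 1.0},
    {"slot": "STREETNAME", "value": "LaSalle", "timestamp": 1, "confidence": 1.0},
]
text_result = collapse_mode(text_entries)
```

Because Python's sort is stable, entries that share a timestamp keep their original order, so the last one listed wins in that case.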

As shown in block 410, the method includes determining whether there is now a combined result, that is, whether a user agent program sent a result to the multimodal fusion engine 44. If so, the method includes returning the combined result to the content server 18, as shown in block 412. If not, as shown in block 414, there are zero or more "no input" or "no match" events. The method includes determining whether there are any "no match" events. If so, the method includes returning the "no match" event, as shown in block 416. However, if there are no "no match" events, the method includes returning the "no input" event to the interpreter 50, as shown in block 418.

Returning to block 400, if there was no event or result from a non-voice mode, the method includes determining whether the voice mode returned a result, that is, whether the user agent program 34 produced the received information 74. This is shown in block 420. If so, as shown in block 422, the method includes returning the voice response for the received input information to the multimodal application 54. However, if the voice browser (for example, the user agent program) did not output information, the method includes determining whether the voice mode returned an event, as shown in block 424. If so, the event is reported 73 to the multimodal application 54, as shown in block 426. If no voice mode event was produced, the method includes returning a "no input" event, as shown in block 428.
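The decision flow of blocks 400 through 428 can be summarized in a short Python sketch. The data layout, the function name `decide`, and the sample event names "help" and "hangup" are hypothetical, and the merge for blocks 406 and 408 is deliberately simplified to a plain dictionary update with no confidence ordering.

```python
NO_INPUT = "no input"
NO_MATCH = "no match"


def decide(modes):
    """Return ("event", name) or ("result", slots) following the FIG. 4 walkthrough.

    modes maps a mode name to {"events": [(arrival_time, name), ...],
    "result": dict of slot values or None}. The key "voice" is the voice
    mode; every other key counts as a non-voice mode.
    """
    voice = modes.get("voice", {"events": [], "result": None})
    non_voice = [m for name, m in modes.items() if name != "voice"]

    # Block 400: was there any event or result in a non-voice mode?
    if any(m["events"] or m["result"] for m in non_voice):
        all_events = sorted(e for m in modes.values() for e in m["events"])
        # Blocks 402/404: the first event other than "no input"/"no match" wins.
        for _, name in all_events:
            if name not in (NO_INPUT, NO_MATCH):
                return ("event", name)
        # Blocks 406/408, simplified: merge whatever slot values the modes returned.
        combined = {}
        for m in modes.values():
            combined.update(m["result"] or {})
        # Blocks 410/412: a combined result exists, so return it.
        if combined:
            return ("result", combined)
        # Blocks 414-418: prefer "no match" over "no input".
        names = [name for _, name in all_events]
        return ("event", NO_MATCH if NO_MATCH in names else NO_INPUT)

    # Blocks 420/422: only the voice mode produced a result.
    if voice["result"]:
        return ("result", voice["result"])
    # Blocks 424/426: report the earliest voice-mode event if there is one.
    if voice["events"]:
        return ("event", min(voice["events"])[1])
    # Block 428: nothing arrived at all.
    return ("event", NO_INPUT)
```

For instance, a typed result combined with a voice "no input" event yields the typed result, while two modes that report only "no match" and "no input" yield "no match".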

Table 2 below shows an example of the method of FIG. 4 applied to hypothetical data.

Table 2
VoiceModeCollectedData
STREETNAME = Michigan
TIMESTAMP = 0
CONFIDENCELEVEL = .85
NUMBER = 112
TIMESTAMP = 0
CONFIDENCELEVEL = .99

TextModeCollectedData
STREETNAME = Michigan
TIMESTAMP = 0
CONFIDENCELEVEL = 1.0
STREETNAME = LaSalle
TIMESTAMP = 1
CONFIDENCELEVEL = 1.0

For example, if no result from a non-voice mode is received at block 400, the method proceeds to block 402. If no events at all are received at block 402, the method proceeds to block 406. At block 406, the fusion engine collapses TextModeCollectedData to one response per slot. VoiceModeCollectedData is left as it is.

VoiceModeCollectedData
STREETNAME = Michigan
TIMESTAMP = 0
CONFIDENCELEVEL = .85
NUMBER = 112
TIMESTAMP = 0
CONFIDENCELEVEL = .99
OVERALLCONFIDENCE = .85

The voice mode data is left as it is. However, an overall confidence value of .85 is assigned, because .85 is the lowest confidence in the result set.

TextModeCollectedData
STREETNAME = Michigan
TIMESTAMP = 0
CONFIDENCELEVEL = 1.0
STREETNAME = LaSalle
TIMESTAMP = 1
CONFIDENCELEVEL = 1.0

Because LaSalle was written to the slot at a later timestamp, text mode removes Michigan from the collected data. The final result is shown below. Because 1.0 is the lowest confidence level in the result set, an overall confidence level of 1.0 is assigned.

TextModeCollectedData
STREETNAME = LaSalle
TIMESTAMP = 1
CONFIDENCELEVEL = 1.0
OVERALLCONFIDENCE = 1.0
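The block 406 processing walked through above can be sketched as follows: within one mode, keep only the entry with the latest timestamp for each slot, then take the lowest remaining confidence as the overall confidence. The names `Entry` and `collapse_mode` are hypothetical, and the handling of equal timestamps is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    slot: str          # e.g. "STREETNAME"
    value: str         # e.g. "LaSalle"
    timestamp: int
    confidence: float


def collapse_mode(entries: List[Entry]) -> Tuple[Dict[str, Entry], float]:
    """Keep one entry per slot (latest timestamp) and compute overall confidence.

    Assumes `entries` is non-empty; min() raises ValueError otherwise.
    """
    latest: Dict[str, Entry] = {}
    for e in entries:
        kept = latest.get(e.slot)
        # Assumption: on equal timestamps the later-listed entry wins.
        if kept is None or e.timestamp >= kept.timestamp:
            latest[e.slot] = e
    # Overall confidence is the lowest confidence left in the result set.
    overall = min(e.confidence for e in latest.values())
    return latest, overall


# The hypothetical data of Table 2.
voice = [Entry("STREETNAME", "Michigan", 0, 0.85), Entry("NUMBER", "112", 0, 0.99)]
text = [Entry("STREETNAME", "Michigan", 0, 1.0), Entry("STREETNAME", "LaSalle", 1, 1.0)]

voice_slots, voice_overall = collapse_mode(voice)
text_slots, text_overall = collapse_mode(text)
```

With the Table 2 data, the voice mode keeps both slots with an overall confidence of 0.85, while the text mode keeps only LaSalle with an overall confidence of 1.0.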

The data transmitted to block 408 is shown below.

VoiceModeCollectedData
STREETNAME = Michigan
TIMESTAMP = 0
CONFIDENCELEVEL = .85
NUMBER = 112
TIMESTAMP = 0
CONFIDENCELEVEL = .99
OVERALLCONFIDENCE = .85

TextModeCollectedData
STREETNAME = LaSalle
TIMESTAMP = 1
CONFIDENCELEVEL = 1.0
OVERALLCONFIDENCE = 1.0

At block 408, the two modes are effectively fused into a single return result.
First, the entire result with the lowest overall confidence level is taken and placed into the final result (FinalResult) structure.

FinalResult

STREETNAME = Michigan
CONFIDENCELEVEL = .85
NUMBER = 112
CONFIDENCELEVEL = .99

Next, the elements of the result with the next lowest confidence replace the corresponding elements in the final result.

FinalResult

STREETNAME = LaSalle
CONFIDENCELEVEL = 1.0
NUMBER = 112
CONFIDENCELEVEL = .99

This final result is the fusion of the two modes and is sent to the interpreter, which decides what to do next: either fetch more information from the Web, or determine that more information is needed from the user and re-prompt based on the current state.
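The fusion at block 408 can be sketched as an overlay: start from the mode result with the lowest overall confidence, then let the results with successively higher overall confidence overwrite matching slots. The type aliases and the function name `fuse` are hypothetical, and the order in which modes with equal overall confidence are applied is an assumption the text does not settle.

```python
from typing import Dict, List, Tuple

# A slot maps to (value, confidence); a mode result pairs its slots with an
# overall confidence. These shapes are assumptions made for this sketch.
Slots = Dict[str, Tuple[str, float]]
ModeResult = Tuple[Slots, float]


def fuse(results: List[ModeResult]) -> Slots:
    """Fuse per-mode results into one FinalResult-like mapping."""
    final: Slots = {}
    # Process modes from lowest to highest overall confidence, so that
    # slots from more confident modes overwrite less confident ones.
    for slots, _overall in sorted(results, key=lambda r: r[1]):
        final.update(slots)
    return final


voice: ModeResult = ({"STREETNAME": ("Michigan", 0.85), "NUMBER": ("112", 0.99)}, 0.85)
text: ModeResult = ({"STREETNAME": ("LaSalle", 1.0)}, 1.0)

final_result = fuse([text, voice])
```

Applied to the example data, the street name comes from the text mode (LaSalle, 1.0) and the number from the voice mode (112, .99), which matches the FinalResult listing above.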

FIG. 5 illustrates another embodiment of the multimodal fusion server 14, which includes a simultaneous multimodal session persistence controller 600 and a simultaneous multimodal session status memory 602 coupled to the simultaneous multimodal session persistence controller 600. The simultaneous multimodal session persistence controller 600 may be a software module running on a suitable processing device, or may be any suitable hardware, software, firmware, or suitable combination thereof. The simultaneous multimodal session persistence controller 600 maintains, on a per-user basis and during non-session conditions, simultaneous multimodal session status information 604 in the form of a database or other suitable data structure. The simultaneous multimodal session status information 604 is status information for a plurality of user agent programs configured for different simultaneous mode communication during a session. The simultaneous multimodal session persistence controller 600 re-establishes a simultaneous multimodal session that has previously ended, in response to accessing the simultaneous multimodal session status information 604. The multimodal session controller 40 notifies the simultaneous multimodal session persistence controller 600 when a user has joined a session. The multimodal session controller 40 also communicates with the simultaneous multimodal synchronization coordinator to synchronize with any offline devices, or with any user agent programs needed to re-establish the simultaneous multimodal session.

The simultaneous multimodal session persistence controller 600 stores, for example, proxy ID data 906, such as a URL indicating the proxy used for a given mode during a previous simultaneous multimodal communication session. If desired, the simultaneous multimodal session status memory 602 may also store information indicating which fields or slots were filled by user input during a previous simultaneous multimodal communication session, along with the contents of those fields or slots. In addition, the simultaneous multimodal session status memory 602 may include a current interaction state 606 for the simultaneous multimodal communication session. The state includes where the interpreter 50 is in its execution of the running application. The information about which fields the user has filled in may take the form of the fused input information 60.

As shown, the Web server 18 may provide mode-specific instructions for each mode type. In this example, text is provided in HTML form, voice is provided in voiceXML form, and voice is also provided in WML form. The simultaneous multimodal synchronization coordinator 42 outputs the appropriate form to the appropriate proxy. As shown, the voiceXML form is output through the proxy 38a designated for the voice browser, while the HTML form is output to the proxy 38n for the graphical browser.
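The routing just described can be pictured as a lookup from markup form to proxy. The sketch below is illustrative only: the table `PROXY_FOR_FORM`, the function `route_forms`, and the string labels are hypothetical. The text states only that the voiceXML form goes through proxy 38a and the HTML form goes to proxy 38n.

```python
from typing import Dict, List, Tuple

# Hypothetical table: which proxy handles which markup form.
PROXY_FOR_FORM: Dict[str, str] = {
    "voiceXML": "proxy 38a",  # designated for the voice browser
    "HTML": "proxy 38n",      # for the graphical browser
}


def route_forms(forms: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each mode-specific document with the proxy for its form."""
    routed = []
    for form, document in forms.items():
        proxy = PROXY_FOR_FORM.get(form)
        if proxy is None:
            # This sketch knows no proxy for the form, so it is skipped.
            continue
        routed.append((proxy, document))
    return routed


routed = route_forms({"voiceXML": "<vxml>...</vxml>", "HTML": "<html>...</html>"})
```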

Session persistence can be used when a session terminates abnormally and the user later wants to return to the same interaction state. It may also be beneficial where the modes use transport mechanisms with different delay characteristics, which cause a lag between input and output in the different modes and make it necessary to store information temporarily to compensate for the time delay.

As shown in FIGS. 6-7, the simultaneous multimodal session persistence controller 600 maintains multimodal session status information for a plurality of user agent programs of a given user during a given session, where the user agent programs are configured for different simultaneous mode communication during the session. This is shown in block 700. As shown in block 702, the method includes re-establishing a previous simultaneous multimodal session in response to accessing the multimodal session status information 604. More specifically, as shown in block 704, during a simultaneous multimodal session the simultaneous multimodal session persistence controller 600 stores the per-user multimodal session status information 604 in the memory 602. As shown in block 706, the simultaneous multimodal session persistence controller 600 detects, from the session controller, that a user has joined a session and searches the memory for the user ID to determine whether the user was involved in a previous simultaneous multimodal session. Accordingly, as shown in block 708, the method includes accessing the multimodal session status information 604 stored in the memory 602 based on detecting that the user has joined the session.

As shown in block 710, the method includes determining whether the session exists in the memory 602. If not, the session is designated as a new session, and a new entry is created and populated with the data needed to record the new session in the memory 602. This is shown in block 712. As shown in block 714, if the session does exist, for example if the session ID is present in the memory 602, the method may include querying the memory 602 as to whether the user has an existing application running and, if so, asking whether the user wants to re-establish communication with that application. If the user so desires, the method includes retrieving the URL of the last fetched information from the memory 602. This is shown in block 716 (FIG. 7). As shown in block 718, the appropriate proxy 38a-38n is given the appropriate URL retrieved in block 716. As shown in block 720, the method includes sending a request to the appropriate user agent program, via the proxy, based on the user agent state information 606 stored in the memory 602.
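The decision flow of blocks 710 through 716 can be sketched as follows. This is a simplified illustration: the dictionary `session_memory`, the record fields `last_url` and `app_running`, and the function `on_user_joined` are all hypothetical, and the proxy hand-off of blocks 718 and 720 is omitted.

```python
from typing import Dict, Optional

# In-memory stand-in for the session status memory, keyed by user ID.
# The record fields are assumptions chosen for this sketch.
session_memory: Dict[str, dict] = {}


def on_user_joined(user_id: str, wants_resume: bool) -> Optional[str]:
    """Return the URL to resume from, or None when nothing is resumed."""
    record = session_memory.get(user_id)
    if record is None:
        # Blocks 710/712: no stored session, so record a new one.
        session_memory[user_id] = {"last_url": None, "app_running": False}
        return None
    # Block 714: a session exists; resume only if an application was
    # running and the user asks to re-establish communication with it.
    if record["app_running"] and wants_resume:
        # Block 716: retrieve the URL of the last fetched information.
        return record["last_url"]
    return None


first = on_user_joined("alice", wants_resume=True)   # new session is recorded
session_memory["alice"].update(last_url="http://example.com/form", app_running=True)
second = on_user_joined("alice", wants_resume=True)  # previous session is found
```

Here the first call creates the entry and returns nothing to resume, and the second call returns the stored URL, which would then be handed to the appropriate proxy.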

FIG. 8 is a diagram illustrating one example of the contents of the simultaneous multimodal session status memory 602. As shown, a user ID 900 designates a particular user, and a session ID 902 may be associated with the user ID when the user has multiple sessions stored in the memory 602. In addition, a user agent program ID 904 indicates, for example, a device ID identifying the device that is running the particular user agent program. The program ID may be a user program identifier, a URL, or another address. The proxy ID data 906 indicates the multimodal proxy used during a previous simultaneous multimodal communication. As such, a user may end a session and later continue where the user left off.
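One possible in-memory layout for the fields named above is sketched below. The class names and the nesting (users contain sessions, sessions contain one entry per user agent program) are assumptions made for illustration; the text names the fields 900 through 906 but does not prescribe how they are organized.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentEntry:
    user_agent_program_id: str  # 904: e.g. a device ID, program identifier, or URL
    proxy_id: str               # 906: e.g. URL of the proxy used for this mode


@dataclass
class SessionRecord:
    session_id: str             # 902
    agents: List[AgentEntry] = field(default_factory=list)


@dataclass
class UserRecord:
    user_id: str                # 900
    sessions: Dict[str, SessionRecord] = field(default_factory=dict)


# Hypothetical example values.
user = UserRecord("user-1")
user.sessions["s1"] = SessionRecord(
    "s1", [AgentEntry("handset-12", "http://proxy.example/voice")]
)
```

Keeping sessions in a dictionary keyed by session ID reflects the statement that one user may have several stored sessions.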

Maintaining the device ID 904 allows the system, among other things, to keep track of which devices are in use during a simultaneous multimodal session, so that a user can readily switch devices during simultaneous multimodal communication.

Accordingly, multiple inputs entered in different modes through separate user agent programs distributed over one or more devices (or contained in the same device) are fused in a unified and consistent manner. In addition, a mechanism is provided to synchronize both the rendering by the user agent programs and the information input by the user through those user agent programs. Furthermore, the disclosed multimodal fusion server can be coupled to existing devices and gateways to provide simultaneous multimodal communication sessions.

It should be understood that the implementation of other variations and modifications of the invention in its various aspects will be apparent to those of ordinary skill in the art, and that the invention is not limited to the specific embodiments described. For example, although the methods are described with respect to certain steps, it will be understood that the steps may be carried out in any suitable order as desired. It is therefore contemplated that the present invention covers any and all modifications, variations, or equivalents that fall within the spirit and scope of the basic underlying principles disclosed and claimed herein.

FIG. 1 is a block diagram illustrating an example of a multimodal communication system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an example of a method of multimodal communication according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an example of a method of multimodal communication according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an example of a method for fusing received simultaneous multimodal input information according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating an example of a multimodal network element according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an example of a method for maintaining multimodal session persistence according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a portion of the flowchart shown in FIG. 6.
FIG. 8 is a block diagram illustrating an example of the memory contents of a simultaneous multimodal session status according to an embodiment of the present invention.

Claims

A method of multimodal communication, comprising:
obtaining mode-specific instructions for a plurality of user agent programs operating in different modes relative to each other; and
synchronizing, during a session, the output from the plurality of user agent programs based on the mode-specific instructions.

The method of claim 1, wherein synchronizing the output from the plurality of user agent programs comprises transmitting, for use by the plurality of user agent programs, at least a portion of markup language representing different modes, thereby causing each user agent program to output information in a different mode based on the at least a portion of the markup language.

The method of claim 2, wherein one of the plurality of user agent programs includes a graphical browser and another of the plurality of user agent programs includes a voice browser, and wherein the output from the plurality of user agent programs includes user input entered simultaneously in different modes through the plurality of user agent programs.

The method of claim 2, wherein transmitting at least a portion of markup language representing different modes for use by the plurality of user agent programs comprises transmitting markup language forms associated with the different modes to a plurality of different devices, each of the devices operating one of the plurality of user agent programs.

The method of claim 2, wherein transmitting at least a portion of markup language representing different modes for use by the plurality of user agent programs comprises transmitting markup language forms associated with the different modes to the same device, the device operating the plurality of user agent programs in different modes.

The method of claim 1, comprising determining a proxy to communicate with each of the plurality of user agent programs during a session.

The method of claim 2, wherein obtaining the mode-specific instructions for the plurality of user agent programs comprises communicating with an application that provides a different markup language form for each of the plurality of user agent programs, each form representing a different mode.

The method of claim 1, wherein obtaining the mode-specific instructions for the plurality of user agent programs operating in different modes comprises fetching a markup language form, written in a base markup language, that represents the mode-specific instructions for at least one user agent program, wherein the markup language form includes data identifying mode-specific instructions for another user agent program operating in a different mode.

The method of claim 1, wherein synchronizing the output from the plurality of user agent programs based on the mode-specific instructions comprises analyzing the mode-specific instructions fetched from a multimodal application to detect a simultaneous multimodal tag (CMMT) and, if one is detected, obtaining mode-specific instructions for at least one user agent program based on the CMMT.

A multimodal network element, comprising:
an information fetcher operative to obtain mode-specific instructions for a plurality of user agent programs operating in different modes relative to each other during the same session; and
a simultaneous multimodal synchronization coordinator, operatively coupled to the information fetcher, operative to synchronize, during the session, output from the plurality of user agent programs based on the mode-specific instructions.

The multimodal network element of claim 10, wherein the simultaneous multimodal synchronization coordinator determines a proxy for communicating with each of the plurality of user agent programs during a session.

A method of multimodal communication, comprising:
sending a request for simultaneous multimodal input information corresponding to a plurality of input modes associated with a plurality of user agent programs operating during the same session; and
fusing received simultaneous multimodal input information sent from the plurality of user agent programs in response to the request for different simultaneous multimodal information.

The method of claim 12, comprising:
determining, before sending the request for simultaneous multimodal input information, a proxy for each different mode associated with each application associated with a different mode in a given session; and
synchronizing the request for different simultaneous multimodal input information to the plurality of user agent programs using the proxy determined for each different mode.

The method of claim 12, comprising waiting for a period of time to determine whether the multimodal input information has been properly received for fusion.

The method of claim 14, comprising waiting for different periods of time depending on the mode of each user agent program.

The method of claim 12, wherein sending the request for simultaneous multimodal input information corresponding to a plurality of input modes comprises sending mode-specific instructions for a plurality of user agent programs operating in different modes relative to each other, to request simultaneous information input in different modes.

The method of claim 12, wherein sending the request for simultaneous multimodal input information corresponding to the plurality of input modes comprises sending a first mode-based markup language form to a device and sending a second mode-based markup language form to one or more devices, to request simultaneous information input in different modes.

The method of claim 12, comprising combining the received multimodal input information, sent from one of the plurality of user agent programs in response to the request for different simultaneous multimodal information, based on the time received and on the confidence values of the individual results received.

The method of claim 12, comprising transcoding the mode specific instructions to include data identifying the mode specific instructions for other user agent programs operating in different modes.

A multimodal network element, comprising:
a plurality of proxies, each sending a request for simultaneous multimodal input information corresponding to a plurality of input modes associated with a plurality of user agent programs operating during the same session; and
a multimodal fusion engine operative to respond to received simultaneous multimodal input information sent from the plurality of user agent programs in response to the request for different simultaneous multimodal information, and operative to fuse the different multimodal input information sent from the plurality of user agent programs to provide simultaneous multimodal communication from different user agent programs during the same session.

The multimodal network element of claim 20, wherein the request for different simultaneous multimodal information includes mode-specific instructions for the plurality of user agent programs operating in different modes relative to each other, to request simultaneous information input in different modes, the multimodal network element comprising:
an information fetcher operative to obtain the mode-specific instructions for the plurality of user agent programs operating in different modes relative to each other during the same session; and
a simultaneous multimodal synchronization coordinator, operatively coupled to the information fetcher and the plurality of proxies, operative to synchronize, during a session, the received simultaneous multimodal input information output from the plurality of user agent programs.

The multimodal network element of claim 21, comprising:
a simultaneous multimodal session persistence controller, operatively coupled to the plurality of user agent programs, that maintains, during a non-session state and on a per-user basis, simultaneous multimodal session status information of the plurality of user agent programs configured for different simultaneous mode communication during a session, and that re-establishes the simultaneous multimodal session in response to accessing the simultaneous multimodal session status information; and
a memory, operatively coupled to the simultaneous multimodal session persistence controller, containing the simultaneous multimodal session status information.

The multimodal network element of claim 20, comprising a markup language interpreter and a transcoder operatively coupled to the markup language interpreter, wherein the transcoder transcodes mode-specific instructions to include data identifying mode-specific instructions for another user agent program operating in a different mode.