JP2001211437A

JP2001211437A - Multimedia CTI system

Info

Publication number: JP2001211437A
Application number: JP2000020156A
Authority: JP
Inventors: Mitsunari Uozumi; 光成魚住
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-01-28
Filing date: 2000-01-28
Publication date: 2001-08-03

Abstract

PROBLEM TO BE SOLVED: To provide a multimedia CTI system that offers automatic-response scenarios which improve the speaker's understanding of guidance and the like by using audio and image data together over a telephone line. SOLUTION: A voice recognition section 7 converts a voice instruction received from a video phone 1 via a virtual termination section 6 into a form that a reply scenario section 12 can interpret, and outputs it to the reply scenario section 12. An image reception section 9 converts an image instruction received from the video phone 1 via the virtual termination section 6 into a form that the reply scenario section 12 can interpret, and outputs it to the reply scenario section 12. The reply scenario section 12 reads a reply and image data from a database (not shown) according to the instruction contents from the voice recognition section 7. A voice synthesis section 8 converts the reply from the reply scenario section 12 into voice, and an image transmission section 10 converts the image data from the reply scenario section 12 into an image. The virtual termination section 6 transmits the voice reply received from the voice synthesis section 8, or the image received from the image transmission section 10, to the video phone 1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a multimedia CTI (Computer Telephony Integration) system that, in order to communicate with people at home who own a TV, a telephone, or the like, connects their telephone and television to a computer over a telephone line, thereby achieving more advanced unmanned operation, more efficient semi-unmanned operation, or more efficient manned processing.

[0002]

2. Description of the Related Art

FIG. 3 is a block diagram of a conventional automatic telephone voice response system of the kind generally shown in magazines, catalogs, and the like. In the figure, 31 is a telephone; 32 is an automatic telephone voice response apparatus that controls the conversation of an incoming or outgoing call according to a response scenario; 33 is a telephone network; 7 is a voice recognition unit; 8 is a voice synthesis unit; and 12 is a response scenario unit.

[0003] Next, the operation of the conventional example shown in FIG. 3 will be described. When an incoming or outgoing call is established between the telephone 31 and the automatic telephone voice response apparatus 32, the response scenario unit 12 instructs the voice synthesis unit 8 to utter a standard message prepared in advance, and voice guidance is thereby conveyed to the speaker. When the speaker utters an instruction in accordance with this guidance, the instruction is conveyed by voice to the automatic telephone voice response apparatus 32. When the voice recognition unit 7 receives the speaker's voice instruction, the response scenario unit 12 branches the scenario according to the content of the instruction. In general, the voice recognition performed by the voice recognition unit 7 includes recognition of telephone push buttons and recognition of the speaker's utterances. The voice synthesis performed by the voice synthesis unit 8 includes edit synthesis, which joins recorded voice segments together, and rule synthesis, which synthesizes speech units from text according to fixed rules.

[0004] Because such a conventional automatic response system realizes its scenarios solely through the exchange of voice over the telephone, it can only realize scenarios within the range that a person can hear and remember. For example, when the automatic response apparatus reads out options by voice alone, about three options is the limit; with five or more, most listeners have forgotten the first ones. The complexity of the scenario therefore could not be increased.

[0005] One way to solve this is to use voice together with images or text. FIG. 4 is a block diagram showing an example of a conventional voice dialogue system that combines voice with images or text, as disclosed in Japanese Patent Application Laid-Open No. Hei 5-216618.

[0006] In FIG. 4, the voice dialogue system is installed in a shop such as a food store. When a speaker arrives at the shop and stands in front of the system, the system automatically detects the speaker by means of a switch under a mat, a surveillance camera, or the like, and the response generation/output unit 43 outputs voice guidance. When the speaker, following this guidance, inputs an instruction by voice through a microphone (not shown) or the like, the voice understanding unit 41, which is directly connected to the microphone, understands the meaning of the input voice. Based on the result of this understanding, the dialogue management unit 42 determines the semantic content of the response. Based on the determined response content, the response generation/output unit 43 generates voice response data and screen display data, which are output through the display 44 and the speaker 45.

[0007] However, this system is intended for communicating with people who come to the shop, and no communication is possible unless the speaker visits the shop. In particular, it cannot handle cases where one wants to communicate by telephone or the like with people at home who own a TV, a telephone, or the like (including people with disabilities who cannot go out), or with people in a remote facility equipped with a TV phone.

[0008]

PROBLEMS TO BE SOLVED BY THE INVENTION

As described above, the conventional automatic response apparatus realizes its scenarios solely through the exchange of voice over the telephone, and therefore can only realize scenarios within the range that a person can hear and remember.

[0009] Further, the conventional example disclosed in Japanese Patent Application Laid-Open No. Hei 5-216618 cannot be used to communicate with people at home who own a TV phone or the like.

[0010] SUMMARY OF THE INVENTION The present invention has been made to solve the above problems, and its object is to obtain a multimedia CTI system that provides automatic-response scenarios which improve the speaker's understanding of guidance and the like by using voice, images, and so on together over a telephone line.

[0011]

MEANS FOR SOLVING THE PROBLEMS

A multimedia CTI system according to a first aspect of the invention comprises a TV phone having a voice communication function and an image communication function, and an automatic response apparatus that is connected to the TV phone by a telephone line, sends guidance to the TV phone by voice, and, in response to a voice or image instruction from the TV phone, responds using voice, images, or both.

[0012] A multimedia CTI system according to a second aspect of the invention comprises a TV (television) phone system consisting of at least one device having a voice communication function and an image communication function, such as a TV phone, a WebTV, or a WCDMA (Wideband Code Division Multiple Access) terminal, and an automatic response apparatus connected to the TV phone system via a telephone line. The automatic response apparatus comprises: a virtual termination unit that is connected to the TV phone system and exchanges voice or images with it; a response scenario unit that holds a database; and a voice recognition unit, a voice synthesis unit, an image reception unit, and an image transmission unit, each connected to both the virtual termination unit and the response scenario unit. The virtual termination unit forwards information received from the TV phone system to the voice recognition unit if it is a voice instruction, and to the image reception unit if it is an image instruction. The voice recognition unit converts the voice instruction from the virtual termination unit into a form that the response scenario unit can interpret and outputs it to the response scenario unit. The image reception unit converts the image instruction from the virtual termination unit into a form that the response scenario unit can interpret and outputs it to the response scenario unit. The response scenario unit reads an answer and screen data from the database according to the instruction from the voice recognition unit. The voice synthesis unit converts the answer from the response scenario unit into voice and outputs it to the virtual termination unit, and the image transmission unit converts the screen data from the response scenario unit into an image and outputs it to the virtual termination unit. The virtual termination unit transmits the voice answer received from the voice synthesis unit, or the image received from the image transmission unit, to the TV phone system.
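The routing described in this aspect can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the `Reply` type, and the contents of `SCENARIO_DB` are assumptions made for illustration, and recognition is reduced to simple string normalisation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    text: str    # answer to be spoken by the voice synthesis unit
    screen: str  # screen data to be rendered by the image transmission unit


# Hypothetical scenario database: interpreted instruction -> answer and screen data.
SCENARIO_DB = {
    "1": Reply("A helper will be called.", "helper_schedule_screen"),
    "2": Reply("Your medication will be checked.", "medication_screen"),
}


def recognize_voice(payload: str) -> str:
    """Stand-in for the voice recognition unit: normalise a digit or utterance."""
    return payload.strip().lower()


def receive_image(payload: str) -> str:
    """Stand-in for the image reception unit: turn an image instruction into a key."""
    return payload.strip().lower()


def virtual_termination(kind: str, payload: str) -> Optional[Reply]:
    """Route by media type, then let the scenario lookup choose the reply."""
    if kind == "voice":
        instruction = recognize_voice(payload)
    elif kind == "image":
        instruction = receive_image(payload)
    else:
        raise ValueError(f"unsupported media type: {kind}")
    # Returns None when the scenario has no entry for the instruction.
    return SCENARIO_DB.get(instruction)
```

In this sketch a caller would pass `reply.text` to speech synthesis and `reply.screen` to image transmission; both paths return to the TV phone through the virtual termination unit.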

[0013] In a multimedia CTI system according to a third aspect of the invention, the automatic response apparatus has a data control function connected to remote sensors via a data network such as the Internet, and notifies the relevant TV phone, by voice and image, of information from the remote sensors associated with that TV phone.

[0014] A multimedia CTI system according to a fourth aspect of the invention comprises: a TV (television) phone system consisting of at least one device having a voice communication function and an image communication function, such as a TV phone, a WebTV, or a WCDMA (Wideband Code Division Multiple Access) terminal; an automatic response apparatus connected to the TV phone system via a telephone line; and at least one sensor provided in correspondence with a TV phone and connected to the automatic response apparatus via a data network such as the Internet. The automatic response apparatus comprises: a virtual termination unit that is connected to the TV phone system and exchanges voice or images with it; a response scenario unit that holds a database; a voice recognition unit, a voice synthesis unit, an image reception unit, and an image transmission unit, each connected to both the virtual termination unit and the response scenario unit; and a data control unit connected to both the data network and the response scenario unit. The virtual termination unit forwards information received from the TV phone system to the voice recognition unit if it is a voice instruction, and to the image reception unit if it is an image instruction. The voice recognition unit converts the voice instruction from the virtual termination unit into a form that the response scenario unit can interpret and outputs it to the response scenario unit. The image reception unit converts the image instruction from the virtual termination unit into a form that the response scenario unit can interpret and outputs it to the response scenario unit. The response scenario unit reads an answer and screen data from the database according to the instruction from the voice recognition unit. The voice synthesis unit converts the answer from the response scenario unit into voice and outputs it to the virtual termination unit, and the image transmission unit converts the screen data from the response scenario unit into an image and outputs it to the virtual termination unit. The virtual termination unit transmits the voice answer received from the voice synthesis unit, or the image received from the image transmission unit, to the TV phone system. When it has been identified which of the one or more TV phones is in use, the data control unit collects data from the sensor associated with that TV phone, and this data is converted into voice or an image by the voice synthesis unit or the image reception unit and then transmitted to that TV phone.

[0015] In a multimedia CTI system according to a fifth aspect of the invention, the virtual termination unit receives compressed image data from the TV phone, reconstructs the original image signal, and outputs it to the image reception unit; when it receives an image from the image transmission unit, it compresses the image and transmits the compressed image data to the TV phone.

[0016] A multimedia CTI system according to a sixth aspect of the invention uses, as the telephone line, a telephone line conforming to H320, H323, H324, or the like.

[0017] A multimedia CTI system according to a seventh aspect of the invention comprises: a telephone provided on the speaker side; a TV provided on the speaker side; a receiving device, such as a set-top box, that is connected to the TV and, in order to display formatted data as an image on the TV screen, combines a frame (a form whose data has not yet been filled in) with individual-user data (the data to be filled into that form) and outputs the result to the TV; and an automatic response apparatus that is connected to the telephone by a telephone line and to the receiving device by a broadcast network. The automatic response apparatus stores frames in a built-in database, receives a frame number and individual-user data from the telephone, reads the corresponding frame from the database, and transmits the read frame and the individual-user data received from the telephone to the receiving device via the broadcast network.
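The split between a frame and the individual-user data can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not specify a frame format, so a frame is modelled here as a text template, and `FRAME_DB`, `handle_phone_request`, and `receiver_compose` are hypothetical names.

```python
# Hypothetical frame database held by the automatic response apparatus.
# A frame is modelled as a text template with named blanks.
FRAME_DB = {
    7: "Helper {helper} will visit on {date}.",
}


def handle_phone_request(frame_no: int, user_data: dict) -> tuple:
    """Look up the frame by number and pair it with the caller's data.

    The returned pair stands in for what is sent over the broadcast network.
    """
    frame = FRAME_DB[frame_no]  # raises KeyError for an unknown frame number
    return frame, user_data


def receiver_compose(frame: str, user_data: dict) -> str:
    """Receiving device (for example a set-top box): fill the form with the data."""
    return frame.format(**user_data)


frame, data = handle_phone_request(7, {"helper": "Sato", "date": "3 August"})
screen_text = receiver_compose(frame, data)
```

Keeping the blank form separate from the per-user values means the same frame can serve many callers, with only the small individual-user data differing between them.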

[0018] A multimedia CTI system according to an eighth aspect of the invention comprises: at least one telephone; an automatic response apparatus connected to the telephone via a telephone line; a TV connected to the automatic response apparatus via a broadcast network; and a receiving device, such as a set-top box, that is connected to the TV and connected to the automatic response apparatus via the broadcast network. The automatic response apparatus comprises: a response scenario unit that holds a database; a voice recognition unit and a voice synthesis unit, each connected to both the telephone and the response scenario unit; and a frame transmission control unit and a data transmission control unit, each connected to both the receiving device and the response scenario unit. The voice recognition unit receives from the telephone, as voice instructions, the number of a frame (a form whose data has not yet been filled in) and individual-user data (the data to be filled into that form), converts them into instructions in a form that the response scenario unit can interpret, and outputs them to the response scenario unit. In response to instructions from the voice recognition unit, the response scenario unit either reads an answer from the database, or reads the frame specified from the telephone from the database and outputs it together with the received individual-user data. The voice synthesis unit converts text guidance messages from the response scenario unit into voice and outputs them to the virtual termination unit. The frame transmission control unit converts the frame read out by the response scenario unit into an image and sends it to the receiving device, and the data transmission control unit converts the individual-user data from the response scenario unit into an image and sends it to the receiving device. The receiving device multiplexes the frame from the frame transmission control unit with the individual-user data from the data transmission control unit and outputs the result to the monitor of the TV, so that the formatted data is displayed as an image.

[0019] In a multimedia CTI system according to a ninth aspect of the invention, the receiving device includes a frame cache that stores frames. The receiving device stores frames from the frame transmission control unit in the frame cache, combines a frame read from the frame cache with the received data, and outputs the result to the monitor of the TV for display.
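A frame cache of this kind might look like the following Python sketch. It makes assumptions the text does not state: frames are keyed by frame number, a frame is a text template, and a cached frame is reused for later data. The class and method names are hypothetical.

```python
from typing import Dict, Optional


class Receiver:
    """Toy model of a receiving device that keeps frames in a cache."""

    def __init__(self) -> None:
        # Frame cache: frame number -> frame (a template with named blanks).
        self.frame_cache: Dict[int, str] = {}

    def on_frame(self, frame_no: int, frame: str) -> None:
        """Store a frame received from the frame transmission control unit."""
        self.frame_cache[frame_no] = frame

    def on_data(self, frame_no: int, user_data: dict) -> Optional[str]:
        """Combine received data with the cached frame for display.

        Returns None if the frame has not been received yet.
        """
        frame = self.frame_cache.get(frame_no)
        if frame is None:
            return None
        return frame.format(**user_data)
```

Under these assumptions, once a frame is cached only the individual-user data has to arrive for each new screen, which is one plausible reason for placing the cache in the receiving device.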

[0020] In a multimedia CTI system according to a tenth aspect of the invention, the automatic response apparatus performs scenario control that takes voice, images, and data as input.

[0021] In a multimedia CTI system according to an eleventh aspect of the invention, the automatic response apparatus performs scenario control that produces voice, images, and data as output.

[0022] In a multimedia CTI system according to a twelfth aspect of the invention, the automatic response apparatus is configured so that part of the scenario can be handled by a human operator.

[0023]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiment 1. Before describing Embodiment 1, how the CTI system according to the invention is used will be explained using the example of calling a helper in home medical care. The invention applies when a patient lives far from a medical institution, for example in a remote area, cannot readily get to a town hospital, and requests a home visit. In this case, a multimedia CTI system having an automatic voice-and-image response apparatus is installed at a medical institution such as a hospital (hereinafter, the helper station), and the helper station is connected by telephone line to a TV phone installed at the clinic in the town where the patient lives.

[0024] When the patient (hereinafter, the home resident) goes to the clinic and calls the helper station using the TV phone, the call arrives at the helper station and a telephone call is set up between the home resident and the helper station. The helper station then sends guidance, by voice and on screen, over the telephone line to the home resident's TV phone, for example: "Your phone has been connected to the helper station. Press 1 to call a helper (such as your regular doctor), 2 to check medication, or 3 for an emergency call." The TV phone presents this guidance to the home resident as voice and as an on-screen image.

[0025] Having confirmed the guidance by voice and on screen, the home resident presses push button 1, for example, to call a helper. A push-tone signal corresponding to 1 is then sent from the TV phone over the telephone line to the helper station. At the helper station, the multimedia CTI system receives this push-tone signal, and the automatic response apparatus returns a response that includes a name to the TV phone. The TV phone presents, for example, "Helper ○○○○ will visit on ○○ (month) ○○" by voice and image. A photograph of the helper's face is also shown on the screen. Because the patient sees this photograph and can confirm the visit date and time by eye as well as by ear, the patient gains considerable peace of mind.
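The push-button menu in this example can be sketched in Python as follows. The menu texts for buttons 1 to 3 follow the guidance quoted above, but the function name, the return structure, and the replies for buttons 2 and 3 are assumptions for illustration; the text only spells out the reply for button 1.

```python
GUIDANCE = (
    "Your phone has been connected to the helper station. "
    "Press 1 to call a helper, 2 to check medication, 3 for an emergency call."
)


def respond(digit: str, helper_name: str = "(name)", visit_date: str = "(date)") -> dict:
    """Return the voice text, screen text, and photo flag for a pressed button."""
    if digit == "1":
        message = f"Helper {helper_name} will visit on {visit_date}."
        # The helper's photograph is shown together with the visit details.
        return {"voice": message, "screen": message, "show_photo": True}
    if digit == "2":
        message = "Medication check selected."  # assumed wording
        return {"voice": message, "screen": message, "show_photo": False}
    if digit == "3":
        message = "Emergency call selected."  # assumed wording
        return {"voice": message, "screen": message, "show_photo": False}
    # Unknown input: repeat the guidance by voice and on screen.
    return {"voice": GUIDANCE, "screen": GUIDANCE, "show_photo": False}
```

Sending the same message by voice and on screen mirrors the point of the example: the resident can check the details by eye as well as by ear.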

[0026] The above has shown how the multimedia CTI system according to the invention, which uses voice and images, is put to use. Next, the invention itself will be described. FIG. 1 is a block diagram showing Embodiment 1 of the multimedia CTI system according to the invention, applied to a TV phone system. In FIG. 1, 1 is a telephone having an image communication function; 2 is a telephone network; 3 is a multimedia CTI device for TV phones; 4 is a sensor; 5 is a data network such as the Internet; 6 is a virtual termination unit that virtually terminates the image input/output protocol of the TV phone; 7 is a voice recognition unit that recognizes push buttons and spoken utterances; 8 is a voice synthesis unit that performs edit synthesis and rule synthesis; 9 is an image reception unit that receives images sent from the telephone; 10 is an image transmission unit that sends images to the telephone; 11 is a data control unit that exchanges data with the sensor; and 12 is a response scenario unit that holds a voice and image database (not shown) and executes scenarios using the voice recognition unit 7, the voice synthesis unit 8, the image reception unit 9, the image transmission unit 10, and the data control unit 11 as its inputs and outputs.

[0027] Next, the operation of Embodiment 1 will be described. First, when a call from the telephone 1 arrives at the multimedia CTI device 3 for TV phones, the virtual termination unit 6 answers the call and the response scenario unit 12 is started. The virtual termination unit 6 passes voice from the telephone 1 to the voice recognition unit 7, and sends voice from the voice synthesis unit 8 to the telephone 1 via the telephone network 2. Because images carry a very large amount of information, when the virtual termination unit 6 exchanges images with the telephone 1 it compresses them in accordance with standards such as H320, H323, and H324 by sending only difference data. The virtual termination unit 6 stores image data. On receiving compressed image data (difference data) from the telephone 1, it reconstructs the image data from the stored image data and the received difference data and outputs the result to the image reception unit. Likewise, on receiving image data from the image transmission unit, it creates difference data from the stored image data and the received image data, and transmits only this difference data to the telephone 1. Image data can thus be transferred efficiently.
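The idea of keeping a stored image and exchanging only difference data can be illustrated with a toy Python sketch. This is not how the video codecs used with those standards actually work; it is a simplified per-pixel difference, with images modelled as flat lists of integers and with hypothetical function names.

```python
from typing import Dict, List


def make_diff(previous: List[int], current: List[int]) -> Dict[int, int]:
    """Sender side: return {pixel index: new value} for pixels that changed."""
    if len(previous) != len(current):
        raise ValueError("images must have the same number of pixels")
    return {i: c for i, (p, c) in enumerate(zip(previous, current)) if p != c}


def apply_diff(previous: List[int], diff: Dict[int, int]) -> List[int]:
    """Receiver side: rebuild the current image from the stored one and the diff."""
    restored = list(previous)  # copy so the stored image is not modified in place
    for index, value in diff.items():
        restored[index] = value
    return restored


stored = [0, 0, 5, 5]   # image both sides already hold
new = [0, 1, 5, 7]      # next image to be sent
diff = make_diff(stored, new)
rebuilt = apply_diff(stored, diff)
```

Here only two of the four pixels are transmitted. After each exchange both sides would replace their stored image with the rebuilt one so that the next difference is computed against the same reference.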

[0028] The data control unit 11 exchanges data with the sensor 4 via the data network 5 such as the Internet. The response scenario unit 12 makes decisions based on recognition results from the voice recognition unit 7, images from the image reception unit 9, and sensor data from the data control unit 11, and outputs the results to the voice synthesis unit 8, the image transmission unit 10, and the data control unit 11. For example, in a monitoring system such as that of a security company, when emergency information from a customer reaches the monitoring center through a TV phone, the operator at the monitoring center first identifies the customer from the telephone conversation, the telephone number, or the like. Once the customer is identified, the operator checks the status of the sensors nearest the customer, immediately grasps the abnormal situation, arranges to dispatch an emergency responder, and notifies the customer accordingly. In this case, the response scenario unit identifies the customer and, once the customer is identified, checks the sensor status via the data control unit, determines the nature of the emergency, and performs control so that the appropriate measures are taken.
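The monitoring example can be sketched in Python as a lookup from caller number to customer to sensor status. All tables, identifiers, status strings, and action names below are hypothetical; the text only says that the customer is identified, the sensor status is checked, and appropriate measures are taken.

```python
from typing import Dict, List

# Hypothetical tables: caller number -> customer, customer -> sensors, sensor -> status.
CUSTOMERS: Dict[str, str] = {"0312345678": "customer-001"}
SENSORS: Dict[str, List[str]] = {"customer-001": ["door-sensor", "smoke-sensor"]}
SENSOR_STATUS: Dict[str, str] = {"door-sensor": "normal", "smoke-sensor": "alarm"}


def read_sensor(sensor_id: str) -> str:
    """Stand-in for the data control unit querying a sensor over the data network."""
    return SENSOR_STATUS.get(sensor_id, "unknown")


def handle_emergency(caller_number: str) -> dict:
    """Identify the customer, check the related sensors, and choose an action."""
    customer = CUSTOMERS.get(caller_number)
    if customer is None:
        # The customer could not be identified from the number alone.
        return {"customer": None, "abnormal": [], "action": "ask_caller_to_identify"}
    statuses = {s: read_sensor(s) for s in SENSORS.get(customer, [])}
    abnormal = [s for s, status in statuses.items() if status != "normal"]
    action = "dispatch_responder" if abnormal else "report_no_abnormality"
    return {"customer": customer, "abnormal": abnormal, "action": action}
```

In a fuller system the chosen action would be passed back to the voice synthesis and image transmission units so that the customer is told what is being done.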

[0029] When the virtual termination unit 6 recognizes that the call from the telephone 1 has been disconnected, it terminates the response scenario unit 12. In some cases the response scenario unit 12 is started first and instructs the virtual termination unit 6 to place a call to the telephone 1. In its typical operation, for example when a call arrives, the response scenario unit 12 first uses the voice synthesis unit 8 to play a narration such as "This is the XX service...". The voice recognition unit 7 then receives a voice instruction from the telephone 1, such as "XX reservation", and converts it into a format the response scenario unit can interpret. Based on this instruction, the response scenario unit 12 searches a database or the like and outputs the result to the voice synthesis unit 8, which speaks it, and at the same time to the image transmission unit 10, which displays it.
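The typical call flow in this paragraph (greeting, recognition, database lookup, simultaneous voice and image output) might be sketched as below. The function names, the text-based stand-in for speech recognition, and the fallback message are assumptions for illustration only.

```python
GREETING = "This is the XX service."


def recognize(utterance: str) -> str:
    """Stand-in for voice recognition unit 7: map speech text to a command key."""
    return utterance.strip().lower()


def handle_call(utterances, database):
    """Run one call and return what would be spoken and what would be shown."""
    spoken = [GREETING]  # the narration played first through voice synthesis
    shown = []           # content handed to image transmission
    for utterance in utterances:
        command = recognize(utterance)
        # Look up the instruction; the fallback text is a hypothetical default.
        result = database.get(command, "Request not understood.")
        # The same result goes to both output paths at the same time.
        spoken.append(result)
        shown.append(result)
    return spoken, shown
```

For example, `handle_call(["XX Reservation"], {"xx reservation": "Reservation menu"})` yields the greeting followed by the menu text on the voice path, and the menu text alone on the image path.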

[0030] In addition, when the location of the telephone 1 can be confirmed during the call or from the caller number, the response scenario unit 12 may use the data control unit 11 to collect data concerning the speaker at the telephone 1 from the relevant sensor 4, convert it into voice and images through the voice synthesis unit 8 and the image transmission unit 10, and notify the telephone 1. The response scenario unit 12 also records images from the image reception unit 9 in the database. When the image reception unit 9 performs image recognition as more advanced processing, its output is combined with the recognition function of the voice recognition unit 7 and used to recognize instructions that the speaker at the telephone 1 gives by image and voice.

[0031] As described above, a response scenario that includes image reception and data control can be constructed, so a system using more advanced scenarios than conventional automatic response can be built.

[0032] Embodiment 2. Embodiment 1 above is directed to TV phones. This embodiment provides a response scenario that includes images even when an ordinary telephone without a TV function is used together with a TV equipped with a set-top box. FIG. 2 is a configuration diagram showing Embodiment 2 of the multimedia CTI according to the present invention, and shows the cooperative operation of the telephone and the TV in such a case. In the figure, 21 is a telephone, 22 is a multimedia CTI device for TV cooperation, 23 is a telephone network, 24 is a set-top box connected to the TV, 25 is a TV, 26 is a frame cache that stores frames, 27 is a broadcast network such as cable television or CS broadcasting, 28 is a frame transmission control unit that sends out frames, and 29 is a data transmission control unit that sends out data for individual users.

[0033] Next, the operation of Embodiment 2 is described with reference to FIG. 2. The frame transmission control unit 28 continuously sends frames, which are the overall layouts of screens, to the frame cache 26 in the set-top box 24, using spare bandwidth of the broadcast network 27. When data for an individual user is designated, the set-top box displays a screen on the TV 25 based on the frame and the data sent with the designation. The blank fields in the frame are filled with the data, producing a display specific to that user. When the response scenario unit 12 is started by an incoming call from the telephone 21, a scenario using the voice recognition unit 7 and the voice synthesis unit 8 is executed as in Embodiment 1. In addition, the response scenario unit 12 determines, from the content selected in the scenario, which frame and which data should be transmitted, and instructs the data transmission control unit 29 accordingly. The data transmission control unit 29 delivers the data to the set-top box 24 through the broadcast network 27.
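The way the set-top box fills the blank fields of a cached frame with individual-user data can be illustrated with a small sketch. The specification does not define a frame format; representing a frame as a text template with `$name`-style placeholders, and the class and method names, are assumptions made for this example.

```python
from string import Template


class SetTopBox:
    """Hypothetical model of set-top box 24 with its frame cache 26."""

    def __init__(self):
        self.frame_cache = {}  # frame id -> layout text with $placeholders

    def receive_frame(self, frame_id, layout):
        """Store a frame broadcast by the frame transmission control unit."""
        self.frame_cache[frame_id] = layout

    def render(self, frame_id, user_data):
        """Fill the blanks of a cached frame with individual-user data.

        Returns None when the frame is not in the cache yet.
        """
        layout = self.frame_cache.get(frame_id)
        if layout is None:
            return None
        # safe_substitute leaves any unfilled placeholder untouched.
        return Template(layout).safe_substitute(user_data)
```

With a cached frame `"Name: $name / Date: $date"`, rendering with `{"name": "Sato", "date": "2000-01-28"}` produces the user-specific screen text, while an uncached frame id returns `None`, which corresponds to the cache-miss case handled in the next paragraph.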

[0034] The set-top box 24 displays a screen on the TV 25 according to this data, so the user can confirm on the screen the content selected by telephone. When the frame to be used is likely to be missing from the frame cache, that is, when the frame has not been sent out since the call arrived or shortly before it, the response scenario unit 12 may instruct the frame transmission control unit 28 to send that frame with priority.
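One possible reading of the priority rule above is a round-robin broadcast queue in which a frame is moved to the front if it has not been sent since the call arrived (or within a small margin before it). The queue structure, the `margin` parameter, and all names below are assumptions for illustration, not details given in the specification.

```python
from collections import deque


class FrameSender:
    """Hypothetical model of frame transmission control unit 28."""

    def __init__(self, frame_ids):
        self.queue = deque(frame_ids)  # round-robin broadcast order
        self.last_sent = {}            # frame id -> time of last broadcast

    def send_next(self, now):
        """Broadcast the frame at the head of the queue and requeue it."""
        frame_id = self.queue.popleft()
        self.queue.append(frame_id)
        self.last_sent[frame_id] = now
        return frame_id

    def prioritize_if_stale(self, frame_id, call_start, margin=0.0):
        """Move a known frame to the front if it was not sent near the call.

        Returns True when the frame was moved to the front of the queue.
        """
        sent = self.last_sent.get(frame_id)
        if sent is not None and sent >= call_start - margin:
            return False  # sent recently enough; likely already cached
        self.queue.remove(frame_id)  # assumes frame_id is in the queue
        self.queue.appendleft(frame_id)
        return True
```

The response scenario unit would call `prioritize_if_stale` with the frame it selected and the arrival time of the call; a frame that has been broadcast since then is left in its normal position.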

[0035] Embodiment 3. The embodiments above are directed to TV phones, or to a TV used together with a telephone. When a mobile phone or similar device has a means of accessing data, as an i-mode terminal does, the mobile phone may be treated in the same way as the sensor 4 of Embodiment 1.

[0036] The response scenario unit 12 can also transfer the call to a staffed telephone so that a person takes over. For example, when the speaker is elderly and cannot cope with the automatic answering device, the call can be switched to an operator. While talking with the automatic answering device, the speaker requests connection to an operator by pressing a push button or by saying something like "I want to be connected to an operator". When the response scenario of the automatic answering device detects this request, it sends the confirmation message "Do you want to connect to an operator?" to the voice output device and the image output device. The voice output device outputs the message as voice, and the image output device outputs it as an image signal. These signals are output to the telephone and the TV screen. If the speaker answers "Yes" by voice, the response scenario places a call to the operator's telephone and sets up the connection. From then on, the speaker and the operator talk directly. At this point, the data acquired by the response scenario can also be transferred to the PC used by the operator. When the conversation is over, saying "end dialogue" switches the call from the operator back to the automatic answering device, and the original response scenario resumes.
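The handover between the automatic scenario and a human operator described here behaves like a small state machine. The sketch below models it with three states; the state names, the keyword matching on the word "operator", and the exact trigger phrases are assumptions chosen for the example.

```python
AUTO, CONFIRMING, OPERATOR = "auto", "confirming", "operator"


def next_state(state: str, utterance: str) -> str:
    """Return the next dialogue state given the speaker's utterance."""
    text = utterance.strip().lower()
    if state == AUTO and "operator" in text:
        # Request detected: ask "Do you want to connect to an operator?"
        return CONFIRMING
    if state == CONFIRMING:
        # Only an explicit "yes" sets up the call to the operator.
        return OPERATOR if text == "yes" else AUTO
    if state == OPERATOR and text == "end dialogue":
        # The call switches back to the original response scenario.
        return AUTO
    return state
```

Starting in `AUTO`, the utterances "I want to be connected to an operator", "Yes", and "end dialogue" move the call through `CONFIRMING` and `OPERATOR` and back to `AUTO`. A push-button request could be mapped onto the same transition.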

[0037]

[Effects of the Invention] As described above, according to the first or second invention, a TV phone and an automatic voice and image answering device that supports the TV phone over a telephone line make it possible to build response scenarios that combine telephone voice and images, so the speaker's understanding can be improved compared with using voice alone.

[0038] According to the third and fourth inventions, information from sensors is collected and voice and images are sent over the telephone line to the relevant TV phone, conveying more information to the speaker at the TV phone, so the speaker's understanding can be improved further.

[0039] According to the fifth or sixth invention, the virtual termination unit, in accordance with the H.320, H.323, and H.324 standards, receives compressed image data from the TV phone, reconstructs the original image signal, and outputs it to the image reception unit; when it receives an image from the image transmission unit, it compresses the image and transmits the compressed image data to the TV phone. Image data can therefore be transferred efficiently.

[0040] According to the seventh or eighth invention, the automatic voice and image answering device reads the frame corresponding to the frame number received from the telephone out of its built-in database and outputs it, together with the individual-user data from the telephone, over the broadcast network to a TV equipped with a receiving device. Response scenarios that combine telephone voice and images can thus be built, so even with a conventional telephone the speaker's understanding can be improved compared with using voice alone.

[0041] According to the ninth invention, the cache memory of the receiving device stores frames, so output to the TV can be made faster.

[0042] According to the tenth invention, the automatic answering device performs scenario control that takes voice, images, and data as input, so the speaker's understanding can be improved compared with using voice alone.

[0043] According to the eleventh invention, the automatic answering device performs scenario control that produces voice, images, and data as output, so the speaker's understanding can be improved compared with using voice alone.

[0044] According to the twelfth invention, the automatic answering device is configured so that part of a scenario can be handled by a person, so even people who cannot interact with the automatic answering device can carry out the dialogue.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing Embodiment 1 of the multimedia CTI system according to the present invention.

FIG. 2 is a configuration diagram showing Embodiment 2 of the multimedia CTI system according to the present invention.

FIG. 3 is a configuration diagram of a conventional automatic telephone voice response system.

FIG. 4 is a configuration diagram of another conventional automatic telephone voice response system.

[Explanation of symbols]

1: telephone, 2: telephone network, 3: multimedia CTI device for TV phone, 4: sensor, 5: data network, 6: virtual termination unit, 7: voice recognition unit, 8: voice synthesis unit, 9: image reception unit, 10: image transmission unit, 11: data control unit, 12: response scenario unit, 31: telephone, 32: automatic telephone voice response device, 33: telephone network, 41: speech understanding unit, 42: dialogue management unit, 43: response generation and output unit, 44: display unit, 45: speaker.

Continued from the front page
(51) Int.Cl.⁷ / Identification symbol / FI / Theme code (reference): H04M 3/50 H04M 11/06 5K101 3/527 G10L 3/00 Q 9A001 11/06 551A 551G
F-terms (reference):
5C064 AA01 AB03 AC02 AC05 AC06 AC13 AC16 AD01 AD08 AD14
5D015 AA02 AA05 BB01 KK01 LL05 LL12
5D045 AB04 AB26
5K015 AA00 AA06 AA07 AB00 AB01 AF00 GA04 GA07
5K024 AA00 AA76 BB01 BB02 CC00 CC01 CC10 DD01 EE00 EE09 FF04 FF06
5K101 KK04 KK14 KK16 KK17 LL03 MM07 NN06 NN07 NN08 NN13 NN18 TT06
9A001 BB04 BB06 EE02 HH17 HH18 HH20 HH30 JJ24 JJ25

Claims

[Claims]

1. A multimedia CTI (Computer Telephony Integration) system comprising: a TV phone having a voice communication function and an image communication function; and an automatic answering device that is connected to the TV phone by a telephone line, sends voice guidance to the TV phone, and responds using one or both of voice and image in response to a voice or image instruction from the TV phone.

2. A multimedia CTI system comprising: a TV (television) telephone system comprising at least one TV phone, or Web TV, WCDMA (Wideband Code Division Multiple Access), or the like, having a voice communication function and an image communication function; and an automatic answering device connected to the TV telephone system via a telephone line, wherein the automatic answering device comprises: a virtual termination unit that is connected to the TV telephone system and transmits and receives voice or images to and from the TV telephone system; a response scenario unit holding a database; a voice recognition unit connected to both the virtual termination unit and the response scenario unit; a voice synthesis unit connected to both the virtual termination unit and the response scenario unit; an image reception unit connected to both the virtual termination unit and the response scenario unit; and an image transmission unit connected to both the virtual termination unit and the response scenario unit, and wherein: the virtual termination unit transmits information received from the TV telephone system to the voice recognition unit if the information is a voice instruction, and to the image reception unit if the information is an image instruction; the voice recognition unit converts the voice instruction from the virtual termination unit into a format that the response scenario unit can interpret and outputs it to the response scenario unit; the image reception unit converts the image instruction from the virtual termination unit into a format that the response scenario unit can interpret and outputs it to the response scenario unit; the response scenario unit reads a reply and screen data from the database according to the instruction from the voice recognition unit; the voice synthesis unit converts the reply from the response scenario unit into voice and outputs it to the virtual termination unit; the image transmission unit converts the screen data from the response scenario unit into an image and outputs it to the virtual termination unit; and the virtual termination unit transmits the voice reply input from the voice synthesis unit or the image input from the image transmission unit to the TV telephone system.

3. The multimedia CTI system according to any one of claims 1 to 3, wherein the automatic answering device has a data control function connected to a remote sensor via a data network such as the Internet, and notifies the relevant TV phone, by voice and image, of information from the remote sensor related to that TV phone.

4. A multimedia CTI system comprising: a TV (television) telephone system comprising at least one TV phone, or Web TV, WCDMA (Wideband Code Division Multiple Access), or the like, having a voice communication function and an image communication function; an automatic answering device connected to the TV telephone system via a telephone line; and at least one sensor that is associated with a TV phone and connected to the automatic answering device via a data network such as the Internet, wherein the automatic answering device comprises: a virtual termination unit that is connected to the TV telephone system and transmits and receives voice or images to and from the TV telephone system; a response scenario unit holding a database; a voice recognition unit connected to both the virtual termination unit and the response scenario unit; a voice synthesis unit connected to both the virtual termination unit and the response scenario unit; an image reception unit connected to both the virtual termination unit and the response scenario unit; an image transmission unit connected to both the virtual termination unit and the response scenario unit; and a data control unit connected to both the data network and the response scenario unit, and wherein: the virtual termination unit transmits information received from the TV telephone system to the voice recognition unit if the information is a voice instruction, and to the image reception unit if the information is an image instruction; the voice recognition unit converts the voice instruction from the virtual termination unit into a format that the response scenario unit can interpret and outputs it to the response scenario unit; the image reception unit converts the image instruction from the virtual termination unit into a format that the response scenario unit can interpret and outputs it to the response scenario unit; the response scenario unit reads a reply and screen data from the database according to the instruction from the voice recognition unit; the voice synthesis unit converts the reply from the response scenario unit into voice and outputs it to the virtual termination unit; the image transmission unit converts the screen data from the response scenario unit into an image and outputs it to the virtual termination unit; the virtual termination unit transmits the voice reply input from the voice synthesis unit or the image input from the image transmission unit to the TV telephone system; and, when any one of the one or more TV phones has been identified, the data control unit collects data from the sensor associated with that TV phone, and the data is converted into voice or image data by the voice synthesis unit or the image reception unit and then transmitted to the TV phone.

5. The multimedia CTI system according to any one of claims 1 to 5, wherein the virtual termination unit receives compressed image data from the TV phone, reconstructs the original image signal, and outputs it to the image reception unit, and, when an image is input from the image transmission unit, compresses the image and transmits the compressed image data to the TV phone.

6. The multimedia CTI system according to claim 1, wherein a telephone line such as H.320, H.323, or H.324 is used.

7. A multimedia CTI system comprising: a telephone provided on the speaker side; a TV provided on the speaker side; a receiving device, such as a set-top box, that is connected to the TV and that, in order to display formatted data on the screen of the TV, combines a frame, which is a format in which data has not yet been filled in, with individual-user data, which is the data to be filled into the format, and outputs the combined data to the TV; and an automatic answering device that is connected to the broadcast network, holds the frames in a built-in database, receives a frame number and individual-user data from the telephone, reads the corresponding frame from the database, and transmits the read frame and the individual-user data received from the telephone to the receiving device via the broadcast network.

8. A multimedia CTI system comprising: at least one telephone; an automatic answering device connected to the telephone via a telephone line; a TV connected to the automatic answering device via a broadcast network; and a receiving device, such as a set-top box, connected to the TV and connected to the automatic answering device via the broadcast network, wherein the automatic answering device comprises: a response scenario unit having a database; a voice recognition unit connected to both the telephone and the response scenario unit; a voice synthesis unit connected to both the telephone and the response scenario unit; a frame transmission control unit connected to both the receiving device and the response scenario unit; and a data transmission control unit connected to both the receiving device and the response scenario unit, and wherein: the voice recognition unit receives from the telephone, by voice instruction, the number of a frame, which is a format in which data has not yet been filled in, and individual-user data, which is the data to be filled into the format, converts them into an instruction in a format that the response scenario unit can interpret, and outputs the instruction to the response scenario unit; the response scenario unit, according to the instruction from the voice recognition unit, reads a reply from the database, reads the frame designated from the telephone from the database, and outputs it together with the received individual-user data; the voice synthesis unit outputs guidance from the response scenario unit as voice; the frame transmission control unit converts the frame read by the response scenario unit into an image and transmits it to the receiving device; the data transmission control unit converts the individual-user data from the response scenario unit into an image and transmits it to the receiving device; and the receiving device multiplexes the frame from the frame transmission control unit and the individual-user data from the data transmission control unit into formatted data and outputs it to the monitor of the TV for display.

9. The multimedia CTI system according to claim 8, wherein the receiving device includes a frame cache for storing frames, stores frames from the frame transmission control unit in the frame cache, combines a frame read from the frame cache with the received data, and outputs the result to the monitor of the TV for display.

10. The multimedia CTI system according to claim 1, wherein the automatic answering device performs scenario control that takes voice, images, and data as input.

11. The multimedia CTI system according to claim 1, wherein the automatic answering device performs scenario control that produces voice, images, and data as output.

12. A multimedia CTI system, wherein the automatic answering device is configured so that a part of a scenario can be handled by a human operator.