JP2003108362A - Communication supporting device and system thereof

Info

Publication number: JP2003108362A
Application number: JP2002150863A
Authority: JP
Inventors: Takashi Nishiyama; 高史西山; Hiroshi Hoshino; 洋星野; Takeyuki Suzuki; 健之鈴木
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2001-07-23
Filing date: 2002-05-24
Publication date: 2003-04-11

Abstract

PROBLEM TO BE SOLVED: To provide a communication support device that prevents a decline in frequency of use by eliminating monotony from the dialogue and keeping the user interested in it. SOLUTION: Dialogue processing means 20 converses with a user in natural language via a voice input unit 11 and a voice output unit 12. In the dialogue processing means 20, the character assumed as the respondent who replies to the user can be selected from a plurality of types, and an overall control unit 10 selects an appropriate character according to the dialogue environment. The characters are distinguished as different respondents by voice tone and speaking style.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a communication support device.

[0002]

[Prior Art] In general, elderly people have few opportunities to go out and seldom converse with others, so family members often serve as their conversation partners. Because declining physical strength also makes accidents in the home common among the elderly, family members need to watch over them constantly so that accidents can be prevented or, if one occurs, dealt with promptly. Under these circumstances, a family living with an elderly person finds it difficult to leave that person at home and go out; shopping and taking lessons are often out of reach, to say nothing of travel.

[0003] Meanwhile, robots known as pet robots, which behave as if they had a will of their own, have recently been commercialized; one example is the dog-type pet robot sold under the name "AIBO" (a product name of Sony). Because this robot monitors and learns from the tone of a person's voice, facial expressions, the way it is touched, and the like, its responses to stimuli are not fixed as they are in conventional machines but change with the situation. It can thus behave as if it were growing and give the illusion of communicating with a person, which gives it a kind of soothing function. However, this pet robot imitates the movements of a dog and is oriented toward entertainment; it is not intended to support communication with the elderly.

[0004] As a pet robot whose main purpose is communication with the elderly, on the other hand, Matsushita Electric Industrial Co., Ltd. has proposed cat-type and bear-type robots. To contribute to a comfortable life for the elderly, this pet robot can carry on simple everyday conversation and behave like a pet, and it also has a function that allows the usage of the pet robot to be grasped indirectly from a remote location. The pet robot can therefore serve as a conversation partner that provides mental care for an elderly person, and the safety of an elderly person living alone can be checked remotely. The pet robot is connected, via a communication path (one that combines digital communication system technology and mobile communication technology with a two-way CATV facility), to center equipment installed in a welfare service support center. To check the safety of an elderly person living alone, a message to be spoken by the pet robot is sent from the center equipment over the communication path to the pet robot, which conveys the message to the elderly person by voice. The elderly person's response is returned to the center equipment, where an inference device infers the person's living situation and thereby confirms his or her safety. For example, the inference device infers that an abnormality may have occurred when the elderly person does not respond to the pet robot's utterance.

[0005]

[Problems to Be Solved by the Invention] The former pet robot described above has no dialogue function and therefore cannot sufficiently relieve loneliness. The latter pet robot is connected via a communication path to center equipment that is a public facility, so, to avoid infringing privacy, it is not provided with a function for transferring the user's terminal 2b to the center equipment.

[0006] The present inventors therefore previously proposed a system (Japanese Patent Application No. 2000-393678) that relieves the user's loneliness by providing a dialogue function that simulates conversation with a person, and that allows the user to be watched for abnormalities by making an image of the user transferable to a mobile terminal to the extent that privacy is not infringed. With current dialogue technology, however, only mechanical responses are possible and the dialogue tends to become monotonous. The novelty wears off after a few uses, the frequency of use falls, and the mental care function can no longer be fully delivered.

[0007] The present invention has been made in view of the above circumstances, and its object is to provide a communication support device that prevents a decline in frequency of use by eliminating monotony from the dialogue and keeping the user interested in it.

[0008]

[Means for Solving the Problems] The invention of claim 1 comprises: dialogue processing means, to which a voice input unit and a voice output unit are connected, which conducts a spoken dialogue in natural language with a user and in which the character assumed as the respondent who replies to the user is selectable from a plurality of types; and an overall control unit having character selection means that selects an appropriate character from among the characters of the dialogue processing means according to the dialogue environment. With this configuration, selecting among a plurality of character types eliminates monotony from the dialogue, keeps the user interested in it over time, and so suppresses a decline in frequency of use.

[0009] The invention of claim 2 is the invention of claim 1 in which the attributes of the character include voice tone. Merely adopting a comparatively simple configuration that controls voice tone therefore gives the impression of conversing with several people and relieves the monotony of the dialogue.

[0010] The invention of claim 3 is the invention of claim 1 in which a drive source is added that drives a robot having movable parts that express facial expressions, and the character selection means selects a character corresponding to the expression represented by the motion of the movable parts. With this configuration, the character is expressed through motion, so the motion attracts interest and reduces the monotony of dialogue alone.

[0011] The invention of claim 4 is the invention of claim 1 provided with an operation unit for selecting the character, so a character can be selected as the user wishes; letting the user choose a favorite character helps sustain interest in the dialogue.

[0012] The invention of claim 5 is the invention of claim 1 having a function of calling out to the user through the dialogue processing means to prompt the user to speak, in which the character selection means selects the character according to the content of the user's response. With this configuration, the character is selected according to what the user says, so character selection is itself interactive, which prevents the monotony that would result from a fixed selection rule.

[0013] The invention of claim 6 is the invention of claim 1 in which user information input means for entering user information about the user is added, and the character selection means selects the character according to the user information. With this configuration, the character is selected on the basis of user information supplied by the user, so a character suited to the user is selected.

[0014] The invention of claim 7 is the invention of claim 6 in which the user information input means stores the user information and is carried by the user, and a signal receiving unit is provided that receives the user information from the user information input means and passes it to the overall control unit. With this configuration, user information is supplied from user information input means carried by the user, so even user information whose content changes from time to time, such as the user's biometric information, can be entered easily.

[0015] The invention of claim 8 is the invention of claim 7 in which the transmission path between the user information input means and the signal receiving unit is a wireless transmission path, so user information can be entered even at a distance from the user.

[0016] The invention of claim 9 is the invention of claim 7 in which at least part of the transmission path between the user information input means and the signal receiving unit is the human body, so user information can be entered simply by touching the user.

[0017] The invention of claim 10 is the invention of any of claims 6 to 9 in which the user information includes the user's age and sex as intrinsic information, so a character can be selected according to the user's age and sex.

[0018] The invention of claim 11 is the invention of any of claims 6 to 9 in which the user information includes the user's biometric information, so the user's state of health can be managed.

[0019] The invention of claim 12 is the invention of any of claims 6 to 9 in which the user information includes identification information for identifying the user, so the user's identity can be verified.

[0020] The invention of claim 13 is the invention of any of claims 6 to 9 in which the user information includes system setting information needed when connecting to an external device over a communication path. The system setting information required to connect to the external device can therefore be set easily, and even if this information is lost it can readily be restored from the user information input means.

[0021] The invention of claim 14 is the invention of any of claims 1 to 13 in which the character selection means can select, in addition to the character assumed as the respondent, a character that gives backchannel responses (aizuchi) during the dialogue. With this configuration, backchannel responses during the dialogue encourage the user to speak, which in turn reduces the decline in frequency of use.

[0022] The invention of claim 15 is the invention of any of claims 1 to 14 in which, when a plurality of recognition candidates are obtained as the recognition result for the user's voice input, the character selection means assigns a different character to each candidate, and the dialogue processing means has a function of asking the user questions to confirm the candidates and of fixing the content of the user's voice input upon receiving an affirmative response. With this configuration, the speech recognition rate can be raised by using an acoustic model suited to the user's voice; moreover, because the user answers questions, interest in the dialogue is sustained and the decline in frequency of use is reduced.

[0023] The invention of claim 16 is the invention of claim 15 in which: the dialogue processing means is provided with a speech recognition unit that recognizes what the user says; the speech recognition unit has a plurality of acoustic models against which the feature sequence of the speech is matched, each acoustic model being assigned a character; a character suited to the user's voice is extracted from the results of confirming the recognition candidates; and the acoustic model corresponding to the extracted character is used in the speech recognition unit. With this configuration, the acoustic model suited to the user can be narrowed down from the plurality of acoustic models through dialogue with the user, so an appropriate acoustic model is selected automatically and the recognition rate for the user's speech rises as a result.
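The confirmation and narrowing described for claims 15 and 16 might be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Candidate`, `confirm_candidates`, and `best_character`, the wording of the question, and the use of a simple confirmation count to pick the character are hypothetical and do not come from the specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str       # recognized text produced by one acoustic model
    character: str  # character assigned to that acoustic model (hypothetical labels)


def confirm_candidates(candidates, ask):
    """Ask about each candidate in turn, speaking as its assigned character.

    `ask(character, question)` is a caller-supplied function that returns True
    when the user answers affirmatively. Returns the confirmed candidate or None.
    """
    for cand in candidates:
        if ask(cand.character, f'Did you say "{cand.text}"?'):
            return cand
    return None


def best_character(confirmed_history):
    """Pick the character whose candidates were confirmed most often (assumed rule)."""
    counts = Counter(c.character for c in confirmed_history)
    return counts.most_common(1)[0][0] if counts else None
```

The acoustic model tied to the character returned by `best_character` would then be the one used for later recognition; how confirmations are actually weighed is not stated in the text, so the counting rule is an assumption.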

[0024] The invention of claim 17 comprises the communication support device according to any one of claims 1 to 16 and a voice database in which voices with a plurality of contents are registered for a plurality of the characters. The dialogue processing means includes a response sentence generation unit that generates the content of the speech with which to respond to the user, and a voice search unit that searches the voice database for a voice that belongs to the character selected by the character selection means and whose content matches the content generated by the response sentence generation unit; when the voice search unit extracts a matching voice from the voice database, that voice is used in the dialogue with the user. With this configuration, the voice database makes it possible to set members of the user's family as characters; when the user is, for example, an elderly person living alone, conducting the dialogue in a family member's voice can give the user a sense of security.
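The lookup performed by the voice search unit in claim 17 can be illustrated with a small sketch. The dictionary layout, the function name `pick_output`, and the fallback to synthesized speech when no recording matches are assumptions made for illustration; the paragraph only states that a matching voice is used when one is found.

```python
def pick_output(voice_db, character, sentence):
    """Return a recorded clip for (character, sentence) if one is registered.

    `voice_db` maps (character, sentence) pairs to recorded clips.
    When nothing matches, fall back to synthesizing the sentence
    (the fallback is an assumption, not stated in the source).
    """
    clip = voice_db.get((character, sentence))
    if clip is not None:
        return ("recorded", clip)
    return ("synthesized", sentence)


# Hypothetical database content: one recording by the user's daughter.
voice_db = {("daughter", "good morning"): "daughter_good_morning.wav"}
```

For example, `pick_output(voice_db, "daughter", "good morning")` yields the registered recording, while a sentence that was never recorded is passed on for synthesis.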

[0025] The invention of claim 18 is the invention of claim 17 in which the voice database is accompanied by a voice communication unit that enables communication with a voice terminal capable of voice input, and a registration processing unit that registers voices input from the voice terminal through the voice communication unit. With this configuration, any voice can be registered in the voice database through a voice terminal, so, for example, the user's family can easily register voices in the voice database.

[0026] The invention of claim 19 is the invention of claim 18 in which the voice terminal is a telephone terminal. With this configuration, a telephone terminal serves as the voice terminal, so voices can be registered in the voice database conveniently from a mobile phone or a fixed-line telephone.

[0027] The invention of claim 20 is the invention of claim 18 or claim 19 in which the registration processing unit has a function of instructing the voice terminal as to the content of the voice to be registered in the voice database. With this configuration, the registration processing unit indicates the content to be registered, so voices can be registered easily by following the instructions; in addition, because the content used in dialogue by the communication support device is indicated in advance, the voices stored in the voice database are more likely to be usable in dialogue.

[0028] The invention of claim 21 is the invention of claim 18 or claim 19 in which the voice terminal has a display unit that displays images, and the registration processing unit has a function of displaying on the display unit, in menu form, the content of the voice to be registered in the voice database. With this configuration, the registration processing unit indicates the content to be registered in menu form, so the instructions are easy to understand and voices can be registered easily by following them; in addition, because the content used in dialogue by the communication support device is indicated in advance, the voices stored in the voice database are more likely to be usable in dialogue.

[0029] The invention of claim 22 is the invention of any of claims 17 to 21 in which: the communication support device has the voice input unit, the voice output unit, and the dialogue processing means in a user terminal that converses with the user; the voice database is provided in a server capable of data communication with the user terminal over a communication path; when a character is selected by the character selection means, the voice search unit retrieves the voices for that character from the voice database in a single batch; and the dialogue processing means uses a voice in the dialogue with the user when a voice whose content matches the content generated by the response sentence generation unit is among the voices retrieved. With this configuration, the voice database is located on the server, so it can be reliably protected and maintained under the supervision of the server's administrator, who is a specialist.
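The batch retrieval in claim 22 amounts to fetching every clip for the selected character once and then answering later lookups locally. A minimal sketch, assuming hypothetical class names `VoiceServer` and `UserTerminal` and an in-memory stand-in for the communication path:

```python
class VoiceServer:
    """Stand-in for the server-side voice database (hypothetical interface)."""

    def __init__(self, clips):
        self._clips = clips   # {(character, sentence): clip}
        self.requests = 0     # number of round trips, for illustration only

    def fetch_character(self, character):
        """Return all clips registered for one character in a single batch."""
        self.requests += 1
        return {s: c for (ch, s), c in self._clips.items() if ch == character}


class UserTerminal:
    """Stand-in for the user terminal that holds the dialogue processing means."""

    def __init__(self, server):
        self._server = server
        self._local = {}

    def select_character(self, character):
        # One batch transfer when the character is selected.
        self._local = self._server.fetch_character(character)

    def clip_for(self, sentence):
        # Later lookups use only the locally held batch; None means no match.
        return self._local.get(sentence)
```

With this arrangement the server is contacted once per character selection rather than once per response sentence.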

[0030] The invention of claim 23 is the invention of any of claims 17 to 21 in which the voice database is provided at the same location as the communication support device. With this configuration no server is needed, which simplifies the system configuration.

[0031] The invention of claim 24 is the invention of any of claims 17 to 21 in which the communication support device has the voice input unit and the voice output unit in a user terminal that converses with the user and has the dialogue processing means in a server capable of data communication with the user terminal over a communication path, and the voice database is provided in the server. With this configuration, both the voice database and the dialogue processing means are located on the server, so they can be reliably protected and maintained under the supervision of the server's administrator, who is a specialist.

[0032] The invention of claim 25 is the invention of claim 22 or claim 24 in which the server includes a dialogue record storage unit in which the history of the dialogues the communication support device has conducted with the user is registered, and a billing unit that performs billing when the content of the dialogue record storage unit is read from a terminal connected to the server over a communication path. With this configuration, the dialogue history is registered in the dialogue record storage unit and can be read over the communication path, so whether the user's dialogues took place normally can be learned from outside. For example, when the user is an elderly person living alone, family members can confirm that person's safety by reading the dialogue record even when they cannot talk with the person directly. Moreover, because a fee is charged when the dialogue record storage unit is read, its content is not read unnecessarily, which suppresses growth in server traffic and helps cover the cost of operating the server.
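The combination of dialogue record storage and per-read billing in claim 25 can be pictured with the following sketch. The class name `DialogueRecordServer`, the flat `fee_per_read` value, and the record layout are hypothetical; the specification does not say how fees are computed.

```python
class DialogueRecordServer:
    """Stores dialogue history and charges an account each time it is read."""

    def __init__(self, fee_per_read=10):
        self.fee_per_read = fee_per_read  # assumed flat fee per read
        self._records = []                # (timestamp, speaker, text) tuples
        self._charges = {}                # account -> total amount due

    def log(self, timestamp, speaker, text):
        """Register one utterance of the dialogue in the record store."""
        self._records.append((timestamp, speaker, text))

    def read(self, account):
        """Return a copy of the history and bill the reading account."""
        self._charges[account] = self._charges.get(account, 0) + self.fee_per_read
        return list(self._records)

    def amount_due(self, account):
        return self._charges.get(account, 0)
```

A family member who calls `read` twice would be billed twice, which matches the stated aim of discouraging unnecessary reads.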

[0033]

[Embodiments of the Invention] (First Embodiment) As shown in FIG. 1, this embodiment is an example in which a communication support device 1, placed in a home where an elderly person living alone or a sick person (hereinafter, the user) resides, can exchange data with a mobile terminal 2a or a terminal 2b consisting of a personal computer via a communication path 3 that includes a network such as the Internet.

[0034] The communication support device 1 has a spoken dialogue function, an imaging function for photographing its surroundings, and a communication function for transmitting data including images and audio. In this embodiment the communication support device 1 is described as a robot with a dog-like appearance, but it may have any appearance as long as the robot can express facial expressions. The mobile terminal 2a is assumed to be a mobile phone with an image display function, and an image/audio server 4 that temporarily stores image data and audio data is assumed to be connected to the communication path.

[0035] To conduct spoken dialogue, the communication support device 1 has a voice input unit 11, consisting of a microphone with an amplifier circuit that serves as the ears, and a voice output unit 12, consisting of a speaker with an amplifier circuit that serves as the mouth. To photograph its surroundings, it also has an imaging unit 13 consisting of a small TV camera (a CCD or CMOS image sensor combined with a light-receiving optical system and an image signal amplifier circuit) that serves as the eyes.

[0036] The voice input unit 11 and the voice output unit 12 are connected to dialogue processing means 20, which enables communication with the user in natural language through them. The imaging unit 13 and the voice input unit 11 are also connected to an AV signal processing unit 14, where the image signal captured by the imaging unit 13 and the audio signal obtained from the voice input unit 11 are converted to digital signals and compressed. The image data and audio data compressed in the AV signal processing unit 14 are sent to the communication path 3 via communication means 15, which transfers them to the image/audio server 4 using a predetermined communication protocol. The operation of each part of the communication support device 1 is managed by the overall control unit 10.

【００３７】さらに、コミュニケーション支援装置１に
は、表情を表現するために首、耳、腕、口などに相当す
る部位が可動であって、可動部分は駆動手段３０に設け
た駆動部３１によって動きが付与される。駆動部３１は
直流モータとギヤとからなり、コミュニケーション支援
装置１の筐体において各可動部分の近傍に収納される。
駆動手段３０には、駆動部３１に設けた直流モータを制
御する駆動部制御部３２および直流モータの回転角度お
よび回転向きを検出するエンコーダなどの駆動部制御用
センサ部３３も設けられる。駆動部制御部３２では統括
制御部１０からの指示と駆動部制御用センサ部３３での
検出値とを用いて駆動部３１をフィードバック制御す
る。また、コミュニケーション支援装置１は、周囲の状
況を検出するセンサ部１７を備える。センサ部１７とし
ては、使用者がコミュニケーション支援装置１に触れた
ことを検出する接触センサ、使用者の接近を検出する測
距センサ、周囲の明るさを検出する照度センサが設けら
れる。Furthermore, to convey expression, the communication support device 1 has movable parts corresponding to a neck, ears, arms, a mouth, and so on, and each movable part is moved by a drive unit 31 provided in the drive means 30. The drive unit 31 consists of a DC motor and gears and is housed in the casing of the communication support device 1 near the corresponding movable part.
The drive means 30 also includes a drive unit control unit 32 that controls the DC motor of the drive unit 31, and a drive-control sensor unit 33, such as an encoder, that detects the rotation angle and rotation direction of the DC motor. The drive unit control unit 32 feedback-controls the drive unit 31 using instructions from the overall control unit 10 and the values detected by the drive-control sensor unit 33. The communication support device 1 also includes a sensor unit 17 that detects the surrounding conditions. The sensor unit 17 comprises a contact sensor that detects that the user has touched the communication support device 1, a distance sensor that detects the approach of the user, and an illuminance sensor that detects the ambient brightness.
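
The feedback control of the drive unit described above can be illustrated with a minimal sketch. The patent does not state the control law; a clipped proportional controller and a toy motor model are assumed here, and the names `step_toward`, `simulate`, `gain`, and `deg_per_cmd` are hypothetical.

```python
def step_toward(target_deg, measured_deg, gain=0.5, max_cmd=1.0):
    """Return a motor command in [-max_cmd, max_cmd] from the angle error.

    target_deg stands in for the instruction from the overall control unit,
    measured_deg for the encoder reading (both assumptions for illustration).
    """
    error = target_deg - measured_deg
    cmd = gain * error
    return max(-max_cmd, min(max_cmd, cmd))


def simulate(target_deg, start_deg=0.0, steps=50, deg_per_cmd=2.0):
    """Toy motor model: each step turns the joint by deg_per_cmd * command."""
    angle = start_deg
    for _ in range(steps):
        cmd = step_toward(target_deg, angle)
        angle += deg_per_cmd * cmd
    return angle
```

With these assumed constants the joint moves at full speed while the error is large and settles on the target once the error is small.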

【００３８】統括制御部１０は、センサ部１７から得ら
れる周囲の状況（コミュニケーション支援装置１への接
触ないし接近、周囲の明るさの変化など）と、ＡＶ信号
処理部１４から得られる入力音の特徴量（使用者の発話
音圧レベルなど）、撮像された画像の特徴量（背景画像
との差による人と人工物との区別など）と、対話処理手
段２０における音声認識に基づく使用者からの情報（キ
ーワードの分析による発話者の意図推定など）と、統括
制御部１０において把握しているコミュニケーション支
援装置１の内部状態とにより、後述するキャラクタを選
択するほか、あらかじめ設定されている制御を行うよう
に制御コマンドおよび付随する制御データを駆動部制御
部３２、ＡＶ信号処理部１４、対話処理手段２０、通信
手段１５、ディスプレイ部１６、音声入力部１１、音声
出力部１２、撮像部１３を制御する。The overall control unit 10 selects a character (described later) based on the following: the surrounding conditions obtained from the sensor unit 17 (contact with or approach to the communication support device 1, changes in ambient brightness, etc.); features of the input sound obtained from the AV signal processing unit 14 (such as the sound pressure level of the user's speech) and features of the captured image (such as distinguishing a person from an artifact by the difference from the background image); information from the user based on speech recognition in the dialogue processing means 20 (such as the speaker's intention estimated by keyword analysis); and the internal state of the communication support device 1 as tracked by the overall control unit 10. In addition, it controls the drive unit control unit 32, the AV signal processing unit 14, the dialogue processing means 20, the communication means 15, the display unit 16, the voice input unit 11, the voice output unit 12, and the imaging unit 13 by issuing control commands and accompanying control data so that preset control is carried out.

【００３９】すなわち、統括制御部１０では、駆動部制
御部３２に対しては駆動部３１を配置している部位、動
作の向き・位置・時間を指示することによって、コミュ
ニケーション支援装置１の各部位に所望の動作を行わせ
る。また、ＡＶ信号処理部１４に対しては、撮像部１３
からの画像信号の取り込みのオン・オフを指示し、また
撮像部１３の電源のオン・オフも指示する。さらに、対
話制御部２３に対しては選択したキャラクタに対応する
内容および声色を発話させるように指示し、音声出力部
１２に対しては選択されたキャラクタに応じて増幅率を
制御する。ここに、音声入力部１１に入力される使用者
の発話による音圧レベルは定まっていないから、ＡＶ信
号処理部１４に入力される音声信号が適正なレベルにな
るように統括制御部１０では音声入力部１１に対しても
増幅率を指示する。このほか、通信手段１５に対する通
信のオン・オフを制御する指示、ディスプレイ部１６に
対する電源のオン・オフの指示も統括制御部１０が行
う。Specifically, the overall control unit 10 instructs the drive unit control unit 32 which part (where a drive unit 31 is located) to move and the direction, position, and duration of the motion, so that each part of the communication support device 1 performs the desired motion. It instructs the AV signal processing unit 14 to turn the capture of the image signal from the imaging unit 13 on or off, and also to turn the power of the imaging unit 13 on or off. It further instructs the dialogue control unit 23 to speak with the content and voice quality corresponding to the selected character, and controls the amplification factor of the voice output unit 12 according to the selected character. Because the sound pressure level of the user's speech entering the voice input unit 11 is not fixed, the overall control unit 10 also instructs the voice input unit 11 on its amplification factor so that the audio signal fed to the AV signal processing unit 14 is at an appropriate level. The overall control unit 10 likewise issues the instructions that turn communication by the communication means 15 on or off and that turn the power of the display unit 16 on or off.
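
The input-gain adjustment mentioned above might look like the following sketch. The patent only says that the amplification factor is set so the signal reaches an appropriate level; the RMS measure, the target level, and the gain limits below are assumptions, and `rms` and `next_gain` are hypothetical names.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def next_gain(samples, gain, target_rms=0.25, min_gain=0.1, max_gain=20.0):
    """Return the gain that would bring this block to target_rms.

    samples are the raw (pre-amplification) values; gain is the current
    amplification factor. Silence leaves the gain unchanged.
    """
    level = rms(samples) * gain
    if level == 0.0:
        return gain
    new_gain = gain * target_rms / level
    return max(min_gain, min(max_gain, new_gain))
```

A quiet speaker therefore gets a larger amplification factor and a loud one a smaller factor, within the assumed limits.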

【００４０】画像音声サーバ４では画像データおよび音
声データを一時的に蓄積する。画像音声サーバ４に蓄積
された画像データおよび音声データは、携帯端末２ａあ
るいは端末２ｂから通信路３を介して画像音声サーバ４
にアクセスすることによって閲覧することができる。こ
こにおいて、携帯端末２ａおよび端末２ｂはインタネッ
ト接続サービスを通して画像音声サーバ４にアクセスで
きるものとする。The image/audio server 4 temporarily stores the image data and audio data. The stored data can be viewed by accessing the image/audio server 4 from the mobile terminal 2a or the terminal 2b via the communication path 3. Here, it is assumed that the mobile terminal 2a and the terminal 2b can access the image/audio server 4 through an Internet connection service.

【００４１】画像音声サーバ４は、画像データのサイズ
や色調を携帯端末２ａの画面に表示可能なフォーマット
に変換する変換手段４ａを備える。つまり、携帯端末２
ａによって画像音声サーバ４に蓄積された画像データを
閲覧する場合は、画像音声サーバ４に蓄積された画像が
携帯端末２ａの画面に表示可能なフォーマットに変換さ
れ、コミュニケーション支援装置１で撮影された画像を
携帯端末２ａによって確認することが可能になる。な
お、画像データには撮影日時を添付しておくことが望ま
しく、使用者の家族が遠方に居ても携帯端末２ａあるい
は端末２ｂを用いて、撮影日時における使用者の状況を
確認することができる。The image/audio server 4 includes a conversion means 4a that converts the size and color tone of the image data into a format that can be displayed on the screen of the mobile terminal 2a. That is, when the image data stored in the image/audio server 4 is viewed from the mobile terminal 2a, the stored image is converted into a format displayable on the screen of the mobile terminal 2a, so that an image captured by the communication support device 1 can be checked on the mobile terminal 2a. It is desirable to attach the capture date and time to the image data; even when the user's family is far away, they can then use the mobile terminal 2a or the terminal 2b to check the user's situation at the time of capture.
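
As a rough illustration of what a conversion such as that of the conversion means 4a involves, the sketch below computes a reduced image size and a reduced color depth for a small screen. The concrete screen size and bit depth are not given in the patent; `fit_within` and `reduce_depth` are hypothetical helpers.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the target screen, keeping aspect ratio.

    Images that already fit are left at their original size.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, int(width * scale)), max(1, int(height * scale))


def reduce_depth(rgb, bits):
    """Keep only the top `bits` bits of each 8-bit colour channel."""
    shift = 8 - bits
    return tuple(channel >> shift for channel in rgb)
```

For example, a 640x480 image shown on an assumed 120x160 screen would be scaled with `fit_within(640, 480, 120, 160)`.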

【００４２】一方、使用者の家族が使用者にメッセージ
を送る場合には、携帯端末２ａあるいは端末２ｂを用い
て画像データと音声データとの少なくとも一方を画像音
声サーバ４に送出して一時的に蓄積させる。ここに、画
像データおよび音声データはデータ圧縮された形で保存
される。コミュニケーション支援装置１では画像音声サ
ーバ４にアクセスしたときに、コミュニケーション支援
装置１に宛てた画像データあるいは音声データが存在す
れば、通信路３を介してこれを取得し、対話処理手段２
０により音声出力部１２から「家族からお便り来ている
よ」などのメッセージを発話する。ここで、取得した画
像データないし音声データはＡＶ信号処理部１４におい
てデータ伸長されるとともにアナログ信号に変換され、
画像データに対応する画像信号はディスプレイ部１６に
表示され、音声データに対応する音声信号は音声出力部
１２から出力される。この機能によって、使用者は家族
が遠方に居てもコミュニケーション支援装置１を用いて
家族の顔や声を見聞きすることが可能になる。Conversely, when the user's family sends a message to the user, they use the mobile terminal 2a or the terminal 2b to send image data, audio data, or both to the image/audio server 4, where it is temporarily stored. The image data and audio data are stored in compressed form. When the communication support device 1 accesses the image/audio server 4 and finds image data or audio data addressed to it, it retrieves the data via the communication path 3, and the dialogue processing means 20 has the voice output unit 12 speak a message such as "You have a message from your family." The retrieved image data or audio data is decompressed by the AV signal processing unit 14 and converted into an analog signal; the image signal corresponding to the image data is displayed on the display unit 16, and the audio signal corresponding to the audio data is output from the voice output unit 12. With this function, the user can see the faces and hear the voices of family members through the communication support device 1 even when they are far away.

【００４３】なお、画像音声サーバ４に蓄積された画像
データや音声データが他人に閲覧されることがないよう
に、画像音声サーバ４に蓄積された画像データおよび音
声データヘのアクセス権は、携帯端末２ａ、端末２ｂ、
コミュニケーション支援装置１からの識別情報に応じて
制限されている。To prevent the image data and audio data stored in the image/audio server 4 from being viewed by others, access rights to that data are restricted according to identification information supplied by the mobile terminal 2a, the terminal 2b, and the communication support device 1.
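
A minimal sketch of such an identification-based restriction follows. The patent does not describe the form of the identification information or how it is checked; the table layout and all identifiers (`ALLOWED_CLIENTS`, `household-001`, `may_access`) are hypothetical.

```python
# Hypothetical table: which client IDs may read the data of which household.
ALLOWED_CLIENTS = {
    "household-001": {"mobile-2a", "terminal-2b", "device-1"},
}


def may_access(owner_id, client_id, allowed=ALLOWED_CLIENTS):
    """Return True only if client_id is registered for owner_id's stored data."""
    return client_id in allowed.get(owner_id, set())
```

Unknown owners and unregistered clients are both refused.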

【００４４】ところで、コミュニケーション支援装置１
において、撮像部１３で撮影するタイミングによっては
使用者のプライバシを損なう可能性がある。そこで、使
用者が自分の意思で撮影するタイミングを決定できるよ
うに構成してある。ただし、コミュニケーション支援装
置１の使用者は高齢者や病人であることが多いから、撮
像のタイミングを手操作で指示する構成を採用すると面
倒になって操作しなくなる可能性が高いから、使用者の
特定の動きによって撮影のタイミングを指示することが
望ましい。一例として、本実施形態では、撮像部１３が
ロボットの目の部位に設けられていることを利用し、使
用者の瞼の開閉によって撮影を指示するようにしてあ
る。すなわち、使用者はコミュニケーション支援装置１
の目に相当する部分を見て瞼を閉じた後に瞼を開くと、
音声入力部１１での音声の取り込みおよび撮像部１３で
の撮影が開始され、図示しない記憶手段によって音声デ
ータおよび画像データが格納される。その後、瞼を一定
時間以上閉じると音声および画像の取り込みが停止す
る。なお、通信路における通信速度を考慮すると、音声
および画像の取り込みの制限時間を規定しておくことが
望ましい。つまり、瞼を閉じない場合でも制限時間にな
れば自動的に音声および画像の取り込みを停止させる。
この制御は統括制御部１０により行われる。この構成に
よって、停止を指示することなくコミュニケーション支
援装置１から離れた場合でも音声および画像の取り込み
を自動的に停止させることができる。しかも、音声およ
び画像の取り込みのタイミングは使用者が意図して指示
するからプライバシを損なうことがない。In the communication support device 1, the user's privacy could be compromised depending on when the imaging unit 13 captures images. The device is therefore configured so that the user decides when to capture. However, users of the communication support device 1 are often elderly or ill, and a design that requires the capture timing to be given by manual operation is likely to be found tedious and go unused, so it is desirable to signal the capture timing by a specific movement of the user. As one example, this embodiment exploits the fact that the imaging unit 13 is located at the robot's eyes and lets the user trigger capture by closing and opening his or her eyelids. That is, when the user looks at the part corresponding to the eyes of the communication support device 1, closes the eyelids, and then opens them, audio capture by the voice input unit 11 and image capture by the imaging unit 13 begin, and the audio data and image data are stored in a storage means (not shown). When the user subsequently keeps the eyelids closed for a certain time or longer, audio and image capture stops. Considering the communication speed of the communication path, it is desirable to set a time limit on audio and image capture; that is, capture stops automatically when the time limit is reached even if the eyelids are not closed. This control is performed by the overall control unit 10. With this configuration, audio and image capture stops automatically even if the user walks away from the communication support device 1 without giving a stop instruction. Moreover, because the user deliberately signals when audio and image capture takes place, privacy is not compromised.
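
The start and stop conditions above can be sketched as a small state machine. It assumes an eyelid detector (not specified in the patent) that reports whether the user's eyes are open at each time step; the thresholds, the re-arm step after stopping, and the class name `CaptureController` are all illustrative assumptions.

```python
class CaptureController:
    """Start capture on a close-then-open blink; stop on a long close or a time limit."""

    def __init__(self, stop_closed_s=2.0, limit_s=30.0):
        self.stop_closed_s = stop_closed_s  # assumed "certain time" with eyes closed
        self.limit_s = limit_s              # assumed capture time limit
        self.state = "idle"                 # idle -> blink -> capturing -> rearm -> idle
        self._started_at = None
        self._closed_since = None

    def update(self, t, eyes_open):
        """Feed one observation (time in seconds, eyelid state); return True while capturing."""
        if self.state == "rearm":
            # After a stop, wait until the eyes are seen open again.
            if eyes_open:
                self.state = "idle"
        elif self.state == "idle":
            if not eyes_open:
                self.state = "blink"
        elif self.state == "blink":
            if eyes_open:
                self.state = "capturing"
                self._started_at = t
                self._closed_since = None
        else:  # capturing
            if eyes_open:
                self._closed_since = None
            elif self._closed_since is None:
                self._closed_since = t
            closed_long = (self._closed_since is not None
                           and t - self._closed_since >= self.stop_closed_s)
            timed_out = t - self._started_at >= self.limit_s
            if closed_long or timed_out:
                self.state = "rearm"
        return self.state == "capturing"
```

The re-arm state is a design choice added here so that opening the eyes after a deliberate stop does not immediately restart capture.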

【００４５】ところで、対話処理手段２０は、基本的に
は音声入力部１１を通して入力された使用者の音声を認
識する音声認識部２１と、音声出力部１２から出力する
ための合成音声を生成する音声合成部２２と、音声認識
部２１において認識した音声の意味を解析し、応答する
内容を音声合成部２２に指示する対話制御部２３とを備
える。さらに、具体的に説明すると、図２に示すよう
に、音声認識部２１は音響処理部２１ａと言語復号部２
１ｂとから構成されており、音声入力部１１を通して入
力された音声信号は、音響処理部２１ａにおいてＡ／Ｄ
変換が施された後にＦＦＴなどの手法によってスペクト
ル分析が施される。スペクトル分析された音声波形は時
系列データである特徴量系列に符号化される。言語復号
部２１ｂでは、あらかじめ用意された単語系列が発話さ
れたときの音声波形の特徴量系列の出現確率を示す音響
モデルと、単語系列の出現確率のデータベースである言
語モデルにより、音響処理部２１ａの出力である音声の
特徴量系列から単語系列を推定する。すなわち、音声認
識部２１では統計的音声認識技術を用いる。The dialogue processing means 20 basically comprises a speech recognition unit 21 that recognizes the user's speech input through the voice input unit 11, a speech synthesis unit 22 that generates synthesized speech to be output from the voice output unit 12, and a dialogue control unit 23 that analyzes the meaning of the speech recognized by the speech recognition unit 21 and instructs the speech synthesis unit 22 on the content of the response. More specifically, as shown in FIG. 2, the speech recognition unit 21 consists of an acoustic processing unit 21a and a language decoding unit 21b. The audio signal input through the voice input unit 11 undergoes A/D conversion in the acoustic processing unit 21a and is then subjected to spectral analysis by a technique such as the FFT. The spectrally analyzed speech waveform is encoded into a feature sequence, which is time-series data. The language decoding unit 21b estimates a word sequence from the speech feature sequence output by the acoustic processing unit 21a, using an acoustic model, prepared in advance, that gives the probability of a feature sequence of the speech waveform occurring when a given word sequence is uttered, and a language model, which is a database of word-sequence occurrence probabilities. In other words, the speech recognition unit 21 uses statistical speech recognition.
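
In the usual formulation of statistical speech recognition, the decoder picks the word sequence W that maximizes P(X|W) P(W), where X is the feature sequence, P(X|W) comes from the acoustic model, and P(W) from the language model. The toy sketch below applies that rule to a fixed candidate list in the log domain; the scores are made up, and a real decoder searches a far larger space. `decode` is a hypothetical name.

```python
import math


def decode(acoustic_loglik, language_logprob):
    """Pick the word sequence W maximising log P(X|W) + log P(W).

    acoustic_loglik: dict mapping candidate word sequence -> log P(X|W)
    language_logprob: dict mapping candidate word sequence -> log P(W)
    Candidates missing from the language model are treated as impossible.
    """
    return max(
        acoustic_loglik,
        key=lambda w: acoustic_loglik[w] + language_logprob.get(w, -math.inf),
    )
```

Here a candidate with a slightly worse acoustic score can still win if the language model considers it much more likely.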

【００４６】一方、対話制御部２３は、言語復号部２１
ｂからの単語系列が入力される対話管理部２３ａを備え
る。対話管理部２３ａは推論エンジンを備え、入力され
た単語系列を対話ルールベース２３ｃに格納されたルー
ル（すなわち、知識）と照合する自然言語処理技術を用
いて、単語系列の構文解析とキーワード抽出とを行い、
その結果に基づいて意味解析および文脈解析を行う。す
なわち、使用者による発話の意味や使用者の意図を分析
して所定の形式で表し、発話の意味や使用者の意図の分
析結果に応じて応答内容を生成して所定の形式で発話内
容生成部２３ｂに引き渡す。ここで、所定の形式とは意
味表現の形式のことである。発話内容生成部２３ｂで
は、対話管理部２３ａから渡された応答内容に応じて単
語系列を生成し音声合成部２２に引き渡す。音声合成部
２２では単語系列から音声合成を行ったり、あらかじめ
登録されている音声を単語系列に従って編集して音声信
号を生成する。この音声信号が音声出力部１２に与えら
れることにより合成音声が外部に送出される。The dialogue control unit 23, for its part, includes a dialogue management unit 23a that receives the word sequence from the language decoding unit 21b. The dialogue management unit 23a has an inference engine and uses natural language processing that matches the input word sequence against the rules (that is, knowledge) stored in the dialogue rule base 23c to parse the word sequence and extract keywords, and then performs semantic analysis and context analysis based on the results. That is, it analyzes the meaning of the user's utterance and the user's intention and expresses them in a predetermined format, generates response content according to that analysis, and passes the response content in the predetermined format to the utterance content generation unit 23b. Here, the predetermined format is a semantic representation format. The utterance content generation unit 23b generates a word sequence according to the response content passed from the dialogue management unit 23a and passes it to the speech synthesis unit 22. The speech synthesis unit 22 generates an audio signal either by synthesizing speech from the word sequence or by editing pre-registered speech according to the word sequence. Supplying this audio signal to the voice output unit 12 outputs the synthesized speech.
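
The flow from word sequence to semantic representation to response can be sketched with a tiny keyword rule base. This is only an illustration of the idea; the patent does not disclose the rule format or the inference engine, and `RULES`, `interpret`, and `respond` are hypothetical.

```python
# Hypothetical rule base: (required keywords, intent, response template).
RULES = [
    ({"wake", "at"}, "set_alarm", "Shall I wake you at {time}?"),
    ({"do", "something"}, "entertain", "Shall I perform rakugo?"),
]


def interpret(words, rules=RULES):
    """Return (semantic frame, response template) for the first matching rule."""
    present = set(words)
    for keywords, intent, template in rules:
        if keywords <= present:
            frame = {"intent": intent}
            if "at" in words and words.index("at") + 1 < len(words):
                # Treat the word after "at" as the time slot.
                frame["time"] = words[words.index("at") + 1]
            return frame, template
    return {"intent": "unknown"}, "Could you say that again?"


def respond(words):
    """Generate the response word sequence (as a string) for an input word list."""
    frame, template = interpret(words)
    return template.format(time=frame.get("time", "that time"))
```

The frame dictionary plays the role of the semantic representation handed from the dialogue management step to the utterance generation step.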

【００４７】音声出力部１２から出力する音声は自然な
音声に近づけることが望ましいから、以下の技術を用い
る。すなわち、「徳田功、池田徹、宮野尚哉、合原一
幸、「サロゲート法に基づく音声知覚心理実験」、電子
情報通信学会技術研究報告、Ｎｏ．ＮＬＰ９９−１５０
〜，ｐｐ．１７−２１，２０００」において、「音圧変
動を保持している母音のほうがより自然に聞こえる」と
いうことが報告されているから、音声信号にカオス的変
動を付加する機能を音声合成部２２に持たせることによ
り自然な音声に近づけている。Because it is desirable for the speech output from the voice output unit 12 to sound close to natural speech, the following technique is used. Isao Tokuda, Tohru Ikeda, Naoya Miyano, and Kazuyuki Aihara, "Speech perception psychological experiment based on surrogate method," IEICE Technical Report, No. NLP99-150~, pp. 17-21, 2000, reports that "vowels that retain sound pressure fluctuation sound more natural." The speech synthesis unit 22 is therefore given a function that adds chaotic fluctuation to the audio signal, bringing the output closer to natural speech.

【００４８】ところで、本実施形態では、対話処理手段
２０において複数種類のキャラクタが設定してあり、統
括制御部１０は対話処理手段２０に設定されたキャラク
タを選択するキャラクタ選択手段を備える。ここで、キ
ャラクタは使用者に応答する応答者として想定されてお
り、キャラクタが異なれば声色のほか口調や言葉使いア
クセントなどが変化する。つまり、キャラクタの属性に
は、声色、口調、言葉使い、アクセントなどが含まれ
る。したがって、対話処理手段２０において落語を行う
ものとすれば、登場人物ごとのキャラクタが設定される
ことになる。また、高齢者に対して、「息子」、
「娘」、「孫」に相当するキャラクタを設定しておくこ
ともできる。In this embodiment, a plurality of character types are defined in the dialogue processing means 20, and the overall control unit 10 includes a character selection means that selects among them. A character here is the assumed respondent who replies to the user; different characters differ in voice quality as well as tone, wording, accent, and so on. In other words, a character's attributes include voice quality, tone, wording, and accent. Accordingly, if the dialogue processing means 20 is to perform rakugo (traditional Japanese comic storytelling), a character is defined for each person appearing in the story. For an elderly user, characters corresponding to a "son," "daughter," or "grandchild" can also be defined.

【００４９】ここで、キャラクタの選択例として、コミ
ュニケーション支援装置１により落語を実行させる例に
ついて説明する。すなわち、本実施形態では統括制御部
１０において落語の実行が可能なようにプログラミング
されているものとする。落語の開始にあたっては、使用
者からコミュニケーション支援装置１に対して、目的を
とくに指定せずにコミュニケーション支援装置１に何で
もよいから実行するように指示する。たとえば、「何か
やってよ」と話しかける。上述のように本実施形態のコ
ミュニケーション支援装置１では落語を演じることが可
能であるから、コミュニケーション支援装置１は「落語
やりましょうか」などと応答することになる。ここで、
使用者が「お願い」などと肯定の応答をすれば、対話処
理手段２０では統括制御部１０に対して落語の開始が指
示されたことを伝達する。統括制御部１０では、落語の
開始が指示されると、落語の演題の内容に応じてあらか
じめ設定されている動きの手順に基づいて駆動部制御部
３２に首や腕に相当する部位の動作を指示し、対話処理
手段２０に対して演題中のキャラクタに合わせた声色お
よび発話内容を指示するとともに、音声出力部１２に対
して発話内容に応じて増幅率を調節する指示を与える。
このような指示によって、対話処理手段２０では音声合
成部２２においてそれぞれのキャラクタの声色を生成す
る。いま、落語の演題中において会話する２人の登場人
物（つまり、キャラクタ）があるとすれば、一方の登場
人物の発話ではロボットの頭部が左に向き、他方の登場
人物の発話ではロボットの頭部を右に向くように首部分
を制御する。また、発話内容に応じて他の部位の動作も
適宜に指示することによってロボットの動作に表情を付
けるようにする。また、各登場人物ごとに声色を異なら
せることによって、キャラクタを差別化する。As an example of character selection, consider having the communication support device 1 perform rakugo. In this embodiment, the overall control unit 10 is assumed to be programmed so that rakugo can be performed. To start, the user asks the communication support device 1 to do something, anything, without specifying a purpose, for example by saying "Do something." Because the communication support device 1 of this embodiment can perform rakugo, it replies with something like "Shall I perform rakugo?" If the user gives an affirmative answer such as "Please," the dialogue processing means 20 notifies the overall control unit 10 that the start of rakugo has been requested. The overall control unit 10 then instructs the drive unit control unit 32 to move the parts corresponding to the neck and arms according to a motion sequence preset for the content of the story, instructs the dialogue processing means 20 on the voice quality and utterance content matching each character in the story, and instructs the voice output unit 12 to adjust its amplification factor according to the utterance content. Following these instructions, the speech synthesis unit 22 in the dialogue processing means 20 generates the voice of each character. If the story contains two persons (that is, characters) in conversation, the neck is controlled so that the robot's head turns to the left while one of them speaks and to the right while the other speaks. The motion of other parts is also instructed as appropriate to the utterance content so that the robot's movements are expressive. The characters are further differentiated by giving each a different voice.
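
The two-character performance described above can be sketched as a script that is expanded into neck and speech commands. The character table, the command tuples, and the function name `perform` are hypothetical; the patent specifies only that each character has its own voice and head direction.

```python
# Hypothetical character table for a two-person rakugo story.
CHARACTERS = {
    "A": {"voice": "low", "head": "left"},
    "B": {"voice": "high", "head": "right"},
}


def perform(script, characters=CHARACTERS):
    """Expand (speaker, line) pairs into neck and speech commands, in order."""
    commands = []
    for speaker, line in script:
        character = characters[speaker]
        # Turn the head first, then speak in that character's voice.
        commands.append(("neck", character["head"]))
        commands.append(("speak", character["voice"], line))
    return commands
```

For a script such as `[("A", "Hello there."), ("B", "Well, hello.")]`, the head turns left before the first line and right before the second.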

【００５０】上述の例では、コミュニケーション支援装
置１が落語の内容に応じてキャラクタを自動的に設定し
ているが、使用者の意思によってキャラクタを選択する
ことも可能になっている。たとえば、コミュニケーショ
ン支援装置１の胸の部位に、「息子」、「娘」、「孫」
と表記した複数個の選択釦を設け、所望の選択釦を押操
作することによって、コミュニケーション支援装置１が
選択されたキャラクタの声色で発話するようにすること
ができる。この場合、キャラクタごとに声色を変えるの
はもちろんのこと、キャラクタに応じて発話内容（言葉
使い）や口調を異ならせるように制御する。たとえば、
音声入力部１１から入力された使用者の音声から使用者
に元気がなく気分がふさいでいると判断したときに、
「息子」が選択されていると叱咤激励する発話内容で強
い口調で発話し、「孫」が選択されていると同情する発
話内容で優しい口調で発話するのである。In the example above, the communication support device 1 sets the character automatically according to the content of the rakugo story, but the character can also be selected by the user. For example, several selection buttons labeled "son," "daughter," and "grandchild" can be provided on the chest of the communication support device 1, so that pressing the desired button makes the device speak in the voice of the selected character. In this case, not only the voice but also the utterance content (wording) and tone are varied according to the character. For example, when the device judges from the user's voice input through the voice input unit 11 that the user is listless and feeling down, it speaks sternly encouraging words in a firm tone if "son" is selected, and sympathetic words in a gentle tone if "grandchild" is selected.

【００５１】ところで、対話処理手段２０において音声
認識部２１だけでは使用者の発話内容を一意に認識でき
ない場合がある。このような場合には、音声認識部２１
では発話内容について複数の認識候補を生成し、各認識
候補ごとに確信度を出力する。確信度に有意の差が生じ
たときには確信度の高いほうを認識内容とすればよい
が、確信度に有意の差が生じないときには、以下のよう
な処理を行う。In the dialogue processing means 20, the speech recognition unit 21 alone sometimes cannot recognize the user's utterance unambiguously. In such a case, the speech recognition unit 21 generates several recognition candidates for the utterance and outputs a confidence score for each. When the confidence scores differ significantly, the candidate with the higher score can be taken as the recognition result; when they do not, the following processing is performed.

【００５２】いま、使用者との対話中に使用者が「５時
に起こして」と言ったときに、音声認識部２１におい
て、「５時に起こして」と「工事に起こして」との２つ
の認識候補が得られ、両者の認識候補の確信度に有意の
差が生じなかったとする。このような場合には、統括制
御部１０では一方の認識候補に割り当てるキャラクタを
選択して「５時に起こしてでよろしいですか」と発話さ
せ、他方の認識候補に割り当てる別のキャラクタを選択
して「工事に起こしてでよろしいですか」と発話させ
る。このように、認識候補ごとに異なるキャラクタを選
択して発話させることによって、使用者は認識候補の確
認がなされていることを明確に意識することができ、い
ずれかの認識候補に関する質問に肯定の応答を行えば、
その認識候補が確定することになる。このような技術を
用いることにより、複数個の認識候補について確信度に
有意の差が生じない場合でも、対話によって確認するこ
とになるから誤認識の可能性を低減することができる。Suppose that, during a dialogue, the user says "Wake me at 5 o'clock" ("goji ni okoshite") and the speech recognition unit 21 obtains two recognition candidates, "wake me at 5 o'clock" and the similar-sounding "wake me at construction" ("kōji ni okoshite"), with no significant difference between their confidence scores. In this case, the overall control unit 10 selects a character assigned to one candidate and has it ask, "Is 'wake me at 5 o'clock' correct?", then selects a different character assigned to the other candidate and has it ask, "Is 'wake me at construction' correct?" Because a different character speaks for each recognition candidate, the user is made clearly aware that the candidates are being confirmed, and the candidate whose question the user answers affirmatively is adopted. With this technique, even when several recognition candidates show no significant difference in confidence, the result is confirmed through dialogue, which reduces the likelihood of misrecognition.
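
The confirmation procedure can be sketched as follows. What counts as a "significant difference" is not defined in the patent, so a fixed confidence margin is assumed; `resolve`, `margin`, and the `ask` callback are hypothetical. This sketch also stops at the first affirmative answer, which is one possible reading of the text.

```python
def resolve(candidates, characters, ask, margin=0.1):
    """Choose a recognition result, asking the user when confidences are close.

    candidates: list of (text, confidence) pairs
    characters: character names, one per candidate, used to voice each question
    ask: callback ask(character, text) -> bool, True if the user confirms
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0]  # clear winner, no confirmation needed
    for (text, _), character in zip(ranked, characters):
        if ask(character, text):
            return text
    return None  # nothing confirmed
```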

【００５３】上述したように音声認識部２１ではあらか
じめ用意された単語系列が発話されたときの音声波形の
特徴量系列の出現確率を示す音響モデルを用いて単語系
列を推定している。したがって、音響モデルが若年者の
音声波形により作成されたものであるときに、使用者が
若年者であれば認識率も高くなるが、高齢者であれば認
識率は低下する傾向にある。つまり、想定される使用者
に応じた人により作成した音響モデルを用いるほうが認
識率が高くなると言える。ここで、すべての年代・性別
の音声による平均的な音響モデルを用いればよいとも考
えられるが、偏差が大きくなるからかえって認識率の低
下につながる。As described above, the speech recognition unit 21 estimates the word sequence using an acoustic model, prepared in advance, that gives the probability of a feature sequence of the speech waveform occurring when a given word sequence is uttered. Consequently, if the acoustic model was built from the speech waveforms of young speakers, the recognition rate is high when the user is young but tends to fall when the user is elderly. In other words, recognition is more accurate with an acoustic model built from speakers who match the expected user. One might consider using an average acoustic model built from the speech of all ages and both sexes, but the larger variance would instead lower the recognition rate.

【００５４】そこで、若年者による音響モデルや高齢者
による音響モデルのように音響モデルを複数用意し、各
音響モデルにそれぞれ異なるキャラクタを割り当てると
ともに、上述した複数個の認識候補に対する確認の応答
を行うことによって使用者の音声に適合した音響モデル
を抽出する。このようにして音響モデルを選択した情報
によって統合制御部１０を学習させることによって誤認
識を低減させることが可能になる。Therefore, several acoustic models are prepared, such as one built from young speakers and one built from elderly speakers, a different character is assigned to each acoustic model, and the confirmation dialogue over multiple recognition candidates described above is used to identify the acoustic model that fits the user's voice. Having the overall control unit 10 learn from the information on which acoustic model was selected in this way makes it possible to reduce misrecognition.
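
A very simple form of this learning is to count, per acoustic model, how often the user confirmed a candidate that the model produced, and to prefer the model with the most confirmations. The patent does not describe the learning method; the counting scheme and the class `ModelSelector` are assumptions.

```python
from collections import Counter


class ModelSelector:
    """Track which acoustic model's candidates the user confirms most often."""

    def __init__(self, models):
        self.models = list(models)
        self.confirmed = Counter()

    def record(self, model):
        """Call when the user confirms a candidate produced by `model`."""
        self.confirmed[model] += 1

    def preferred(self):
        """Model confirmed most often so far; ties fall back to listing order."""
        return max(self.models, key=lambda m: self.confirmed[m])
```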

【００５５】（第２の実施の形態）第１の実施の形態で
は音声入力部１１から入力された声の調子などによって
使用者を識別しているが、本実施形態は図３に示すよう
に、使用者の年齢・性別を識別するための使用者情報を
使用者情報入力手段４０から与えるようにした例を示
す。(Second Embodiment) In the first embodiment, the user is identified from characteristics such as the tone of the voice input through the voice input unit 11. This embodiment, as shown in FIG. 3, is an example in which user information for identifying the user's age and sex is supplied from a user information input means 40.

【００５６】使用者情報入力手段４０は、使用者情報を
格納したメモリ４１を備え、コミュニケーション支援装
置１の本体部であるロボットに設けた信号受信部１８に
対してデータ伝送を可能とする信号送信部４２を備え
る。本実施形態では、信号送信部４２と信号受信部１８
との間の伝送路の少なくとも一部に人体を用いており、
人体を通して微小電流を流すことによって信号送信部４
２から信号受信部１８へのデータ伝送が可能になってい
る。使用者情報入力手段４０は人体の手首（たとえば腕
輪型）や指（たとえば指輪型）などに取付可能であっ
て、指先などでロボットの所定部位に触れることによっ
て信号送信部４２と信号受信部１８との間の伝送路が形
成される。The user information input means 40 includes a memory 41 that stores the user information and a signal transmission unit 42 that can transmit data to a signal reception unit 18 provided in the robot, which is the main body of the communication support device 1. In this embodiment, the human body forms at least part of the transmission path between the signal transmission unit 42 and the signal reception unit 18; data is transmitted from the signal transmission unit 42 to the signal reception unit 18 by passing a minute current through the body. The user information input means 40 can be worn on the wrist (for example, as a bracelet) or a finger (for example, as a ring), and the transmission path between the signal transmission unit 42 and the signal reception unit 18 is formed when the user touches a predetermined part of the robot with a fingertip or the like.

【００５７】メモリ４１には、使用者情報として、使用
者の年齢・性別のような使用者の固有情報、使用者の体
温・脈拍数・血圧のような生体情報、使用者を識別する
ための識別情報、後述するシステム設定情報のいずれか
１つまたは複数の組合せが記録可能になっている。した
がって、使用者情報入力手段４０を装着した使用者がコ
ミュニケーション支援装置１の本体部としてのロボット
の所定部位に触れると、メモリ４１に格納された使用者
情報が信号受信部１８に転送され、統括制御部１０に使
用者情報が引き渡されるのである。このように、コミュ
ニケーション支援装置１の本体部であるロボットに触れ
るだけで使用者情報を統括制御部１０に引き渡すことが
できるから、機器の操作が不慣れな高齢者でも使用でき
る。The memory 41 can record, as user information, any one or a combination of: information specific to the user, such as age and sex; biometric information, such as the user's body temperature, pulse rate, and blood pressure; identification information for identifying the user; and system setting information, described later. Thus, when a user wearing the user information input means 40 touches a predetermined part of the robot that is the main body of the communication support device 1, the user information stored in the memory 41 is transferred to the signal reception unit 18 and handed over to the overall control unit 10. Because the user information can be passed to the overall control unit 10 simply by touching the robot, the device can be used even by elderly people who are not accustomed to operating equipment.

【００５８】使用者情報入力手段４０に設けたメモリ４
１に格納された使用者情報が使用者の年齢・性別のよう
な固有情報の場合は、統括制御部１０では当該年齢層の
同性（または異性）であるキャラクタを選択し、使用者
に対する呼びかけの言葉も固有情報に基づいて決定す
る。たとえば、使用者が男性の高齢者であれば、使用者
を「おじいちゃん」と呼び、女性の高齢者であれば「お
ばあちゃん」と呼ぶ。When the user information stored in the memory 41 of the user information input means 40 is specific information such as the user's age and sex, the overall control unit 10 selects a character of the same sex (or the opposite sex) in that age group, and also chooses how to address the user based on that information. For example, an elderly male user is addressed as "Grandpa" ("ojiichan") and an elderly female user as "Grandma" ("obaachan").

【００５９】使用者情報が体温・心拍数・血圧のような
生体情報の場合は、生体情報が正常範囲から逸脱してい
るときに、統括制御部１０では医者のキャラクタを選択
し、少し威厳のある声で対処についてアドバイスを発話
する。ここで、キャラクタはディスプレイ部１６にも表
示される。When the user information is biometric information such as body temperature, heart rate, and blood pressure, and the biometric information falls outside the normal range, the overall control unit 10 selects a doctor character, which gives advice on what to do in a slightly authoritative voice. The character is also shown on the display unit 16.
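
The range check that triggers the doctor character can be sketched as below. The patent gives no numeric ranges; the values in `NORMAL_RANGES` are placeholders for illustration only and are not medical guidance, and the function names and the default character are hypothetical.

```python
# Placeholder ranges for illustration only (not medical guidance).
NORMAL_RANGES = {
    "temperature_c": (35.5, 37.5),
    "pulse_bpm": (50, 100),
    "systolic_mmhg": (90, 140),
}


def out_of_range(readings, ranges=NORMAL_RANGES):
    """Return the sorted names of readings that fall outside their range."""
    return sorted(
        name
        for name, value in readings.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    )


def select_character(readings, default="grandchild"):
    """Pick the doctor character if any reading is abnormal, else the default."""
    return "doctor" if out_of_range(readings) else default
```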

【００６０】使用者情報が識別情報の場合は、統括制御
部１０で識別情報が照合され、識別情報が合致したか否
かによってディスプレイ部１６に表示するキャラクタを
選択する。たとえば、識別情報が合致すれば、にこやか
な笑顔のキャラクタが表示され音声出力部１２からは識
別情報が合致したことを意味する発話がなされ、識別情
報が合致しない場合には、閻魔大王のようなキャラクタ
が表示され音声出力部１２からは識別情報が誤っている
ことを意味する発話がなされる。このように簡単な操作
でセキュリティ機能を実現し、かつ楽しさも実現するこ
とできる。When the user information is identification information, the overall control unit 10 verifies it and selects the character shown on the display unit 16 according to whether it matches. For example, if the identification information matches, a cheerfully smiling character is displayed and the voice output unit 12 announces that the identification succeeded; if it does not match, a character resembling Enma Daio (the king of the underworld) is displayed and the voice output unit 12 announces that the identification information is incorrect. In this way a security function is provided through a simple operation, while also making the interaction enjoyable.

【００６１】ところで、コミュニケーション支援装置１
はインタネットを含む通信路３を介して画像音声サーバ
４に接続されているから、画像音声サーバ４との接続に
必要なシステム設定情報も使用者情報として用いられ
る。システム設定情報は、プロバイダの電話番号、ユー
ザ名、パスワード、サーバのＩＰアドレスなどを含み、
コミュニケーション支援装置１を提供するメーカにおい
て使用者情報入力手段４０のメモリ４１に登録される。
したがって、使用者はコミュニケーション支援装置１の
本体部であるロボットに触れるだけでインタネットに接
続することができ、何らかの原因で通信手段１５に設定
したシステム設定情報が失われたときにも、ロボットに
触れるだけでシステム設定情報の再設定を容易に行うこ
とができる。この場合にも、通信路３への接続が成功し
たときにはディスプレイ部１６に笑顔のキャラクタを表
示する。他の構成および動作は第１の実施の形態と同様
である。Because the communication support device 1 is connected to the image/audio server 4 via the communication path 3, which includes the Internet, the system setting information needed to connect to the image/audio server 4 is also handled as user information. The system setting information includes the provider's telephone number, the user name, the password, the server's IP address, and so on, and is registered in the memory 41 of the user information input means 40 by the manufacturer that supplies the communication support device 1. The user can therefore connect to the Internet simply by touching the robot that is the main body of the communication support device 1, and even if the system setting information held in the communication means 15 is lost for some reason, it can easily be restored just by touching the robot. In this case too, a smiling character is shown on the display unit 16 when the connection to the communication path 3 succeeds. The other configurations and operations are the same as in the first embodiment.

【００６２】本実施形態では使用者情報入力手段４０と
信号受信部１８との間の伝送路の少なくとも一部に人体
を用いる例を示したが、伝送路は赤外線などを伝送媒体
とするワイヤレス伝送路であってもよい。In this embodiment, an example has been shown in which the human body is used for at least part of the transmission path between the user information input means 40 and the signal receiving unit 18, but the transmission path may instead be a wireless transmission path that uses infrared rays or the like as the transmission medium.

【００６３】（第３の実施の形態）本実施形態は、統括
制御部１０において、使用者と対話する応答者として想
定したキャラクタとは別に、対話中に相づちを打つキャ
ラクタを選択可能としたものである。つまり、応答者と
してのキャラクタと使用者との間の対話中に、「うー
ん」という発話とともに軽く首を上下させる動作を行っ
たり、「なるほど、それはいい。」などの発話とともに
手を叩く動作を行ったりするなど、適宜のタイミングで
相づちを打つとともに、肯定的な態度を表現するような
キャラクタを選択するのである。他の構成および動作は
第１の実施の形態と同様である。(Third Embodiment) In this embodiment, the overall control unit 10 can select, in addition to the character assumed as the responder who converses with the user, a character that gives back-channel responses (aizuchi) during the dialogue. That is, during the dialogue between the responder character and the user, a character is selected that interjects at appropriate moments, for example by nodding slightly while uttering "Hmm" or by clapping its hands while uttering "I see, that's good," and thereby expresses a positive attitude. Other configurations and operations are the same as in the first embodiment.

【００６４】なお、本実施形態では、１台のロボットに
よって複数のキャラクタを選択させているが、複数台の
ロボットを用い、使用者と対話中のロボットを除くロボ
ットが相づちを打つようにしてもよい。In this embodiment, a single robot is made to select among a plurality of characters, but a plurality of robots may be used so that the robots other than the one conversing with the user give the back-channel responses.

【００６５】（第４の実施の形態）本実施形態では、図
４に示すように、コミュニケーション支援装置１におけ
る対話制御部２３および音声合成部２２の構成を第１の
実施の形態とは異なる構成とし、さらに対話制御部２３
が音声データベース４３に格納された音声を用いる点が
異なる。本実施形態では、音声データベース４３をサー
バである画像音声サーバ４に設けているが、画像音声サ
ーバ４とは別に設けたサーバに音声データベース４３を
設けてもよく、また後述するように音声データベース４
３は必ずしもサーバに設ける必要もない。(Fourth Embodiment) In this embodiment, as shown in FIG. 4, the dialogue control unit 23 and the voice synthesis unit 22 of the communication support device 1 are configured differently from those of the first embodiment, and the embodiment further differs in that the dialogue control unit 23 uses voices stored in a voice database 43. Here the voice database 43 is provided in the image/audio server 4, which is a server, but it may be provided in a server separate from the image/audio server 4 and, as described later, it need not be provided in a server at all.

【００６６】図４にはコミュニケーション支援装置１の
要部のみを示してあり、上述した他の実施の形態と同機
能を有する構成については省略してある。本実施形態の
対話制御部２３は、音声認識部２１から出力される単語
系列を対話ルールベース２３ｃに格納されたルールに照
合することによって使用者に応答する音声の内容をテキ
ストデータで生成する応答文生成部２３ｄと、応答文生
成部２３ｄで生成された応答内容を音声データベース４
３に照合する音声検索部２３ｅとを備える。ここでは単
純化のために、対話ルールベース２３ｃに格納されるル
ールを表１のように、使用者の発話内容を条件部とし応
答する内容を応答部としたｉｆ−ｔｈｅｎ型のルールと
して示す。ただし、実際には第１の実施の形態において
説明したように、対話ルールベース２３ｃには構文解析
や意味解析のための各種ルールが含まれる。なお、表１
の条件部および応答部の内容を片仮名で表記しているの
はテキストデータの音韻であることを意味している。FIG. 4 shows only the main parts of the communication support device 1; components having the same functions as in the other embodiments described above are omitted. The dialogue control unit 23 of this embodiment includes a response sentence generation unit 23d, which generates the content of the voice response to the user as text data by collating the word sequence output from the voice recognition unit 21 with the rules stored in the dialogue rule base 23c, and a voice search unit 23e, which collates the response content generated by the response sentence generation unit 23d with the voice database 43. For simplicity, the rules stored in the dialogue rule base 23c are shown in Table 1 as if-then rules whose condition part is the content of the user's utterance and whose response part is the content of the reply. In practice, however, as described in the first embodiment, the dialogue rule base 23c also contains various rules for syntactic and semantic analysis. The contents of the condition and response parts in Table 1 are written in katakana to indicate that they are the phonemes of the text data.

【００６７】[0067]

【表１】 [Table 1]

【００６８】応答文生成部２３ｄでは、図５に示すよう
に、音声認識部２１での認識結果である単語系列から抽
出したテキストデータを表１のようなルールの条件部に
照合し（Ｓ１）、テキストデータが条件部に一致するル
ールが存在するときには（Ｓ２）、そのルールの応答部
を応答すべき音声の内容のテキストデータとして出力す
る（Ｓ３）。また、テキストデータが条件部に一致する
ルールが存在しない場合には、使用者の発話内容が理解
できなかったことを意味するテキストデータを出力する
（Ｓ４）。In the response sentence generation unit 23d, as shown in FIG. 5, the text data extracted from the word sequence that is the recognition result of the voice recognition unit 21 is collated with the condition parts of rules such as those in Table 1 (S1). If a rule whose condition part matches the text data exists (S2), the response part of that rule is output as the text data of the voice content to be returned (S3). If no such rule exists, text data indicating that the user's utterance could not be understood is output (S4).
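The flow of steps S1 to S4 can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the rule entries, the fallback text `NOT_UNDERSTOOD`, and the function name `generate_response` are hypothetical, and the contents of Table 1 are not reproduced here.

```python
from typing import Mapping

# Hypothetical if-then rules: condition part -> response part, both written
# as katakana phoneme text. The entries are illustrative, not those of Table 1.
DIALOGUE_RULES: Mapping[str, str] = {
    "オハヨウ": "オハヨウ",
    "コンニチワ": "コンニチワ",
}

# Hypothetical text meaning that the utterance could not be understood.
NOT_UNDERSTOOD = "ワカリマセン"


def generate_response(recognized_text: str,
                      rules: Mapping[str, str] = DIALOGUE_RULES) -> str:
    """Return the response text for the recognized utterance (S1 to S4)."""
    response = rules.get(recognized_text)  # S1: collate with the condition parts
    if response is not None:               # S2: a rule matched
        return response                    # S3: output its response part
    return NOT_UNDERSTOOD                  # S4: output the fallback text
```

An exact-match dictionary lookup is used here purely for brevity; the rule base described in the first embodiment would also involve syntactic and semantic analysis.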

【００６９】ところで、音声データベース４３は、複数
種類のキャラクタについて複数の内容の音声が登録され
るデータベースであって、音声データベース４３に音声
が登録されるキャラクタとしては基本的には使用者の息
子や孫のような家族を想定している。すなわち、音声デ
ータベース４３には通常は合成音ではなく肉声が登録さ
れる。音声データベース４３に格納されるデータは肉声
そのものではなくデジタル信号に変換して所定のフォー
マット（たとえば、ＷＡＶＥ形式）とした音声ファイル
であり、音声データベース４３の各レコードは音声ファ
イルのほか音声の内容を識別する情報、使用者や家族を
識別するための使用者情報も含む。また、音声の内容とし
ては日常のあいさつや呼びかけの文言が複数種類登録さ
れる。表２に音声データベース４３に登録されたデータ
の例を示す。表２において「テキスト」の欄は検索キー
になるテキストデータであり、「音声」の欄は登録され
た音声ファイルの内容を示す。The voice database 43 is a database in which voices with a plurality of contents are registered for a plurality of characters; the characters whose voices are registered are basically assumed to be members of the user's family, such as a son or grandchild. That is, recorded human voices rather than synthesized speech are normally registered in the voice database 43. The stored data is not the raw voice itself but a voice file obtained by converting the voice into a digital signal in a predetermined format (for example, WAVE format), and each record of the voice database 43 contains, in addition to the voice file, information identifying the content of the voice and user information identifying the user and the family member. Several kinds of everyday greetings and calls are registered as voice contents. Table 2 shows an example of the data registered in the voice database 43. In Table 2, the "Text" column holds the text data used as the search key, and the "Voice" column shows the content of the registered voice file.

【００７０】[0070]

【表２】 [Table 2]

【００７１】音声検索部２３ｅは、図６に示すように、
応答文生成部２３ｄにおいて応答する音声の内容である
テキストデータが引き渡されたときに通信路３を介して
音声データベース４３のデータを検索する機能を有し
（Ｓ１）、統括制御部１０に設けたキャラクタ選択手段
により選択されたキャラクタ（つまり、使用者の家族）
に一致する音声であって、かつ応答文生成部２３ｄで生
成した音声の内容に該当する内容の音声が音声データベ
ース４３に登録されているか否かを検索条件として音声
データベース４３の内容を検索する（Ｓ２）。音声デー
タベース４３では、音声検索部２３ｅにより指示された
検索条件を満たすレコードが存在するときには通信路３
を介して音声検索部２３ｅに音声ファイルを転送し、検
索条件を満たすレコードが存在しないときには音声検索
部２３ｅに対して音声ファイルが存在しない旨の通知を
行う。As shown in FIG. 6, the voice search unit 23e has a function of searching the voice database 43 via the communication path 3 when it receives from the response sentence generation unit 23d the text data that is the content of the voice response (S1). It searches the voice database 43 using as the search condition whether a voice is registered that belongs to the character selected by the character selection means provided in the overall control unit 10 (that is, a member of the user's family) and whose content corresponds to the content generated by the response sentence generation unit 23d (S2). When a record satisfying the search condition specified by the voice search unit 23e exists, the voice database 43 transfers the voice file to the voice search unit 23e via the communication path 3; when no such record exists, it notifies the voice search unit 23e that no voice file exists.

【００７２】本実施形態の音声合成部２２は、応答文生
成部２３ｄにおいて生成した音声の内容に従ってテキス
ト合成を行った合成音を出力するテキスト合成部２２ａ
と、音声検索部２３ｅが音声データベース４３から検索
した音声ファイルを再生する音声再生部２２ｂとを備え
る。したがって、音声検索部２３ｅが音声データベース
４３に与えた検索条件を満たす音声ファイルが存在しな
いときには応答文生成部２３ｄから出力された音声の内
容に対応する合成音をテキスト合成部２２ａにおいて生
成し（Ｓ４）、音声データベース４３から検索条件を満
たす音声ファイルが転送されたときには音声再生部２２
ｂにおいてデジタル信号をアナログ信号に変換した音声
を音声出力部１２から出力する（Ｓ３）。The voice synthesis unit 22 of this embodiment includes a text synthesis unit 22a, which outputs synthesized speech produced by text-to-speech synthesis in accordance with the content generated by the response sentence generation unit 23d, and a voice reproduction unit 22b, which plays back the voice file retrieved from the voice database 43 by the voice search unit 23e. Accordingly, when no voice file satisfies the search condition given to the voice database 43 by the voice search unit 23e, the text synthesis unit 22a generates synthesized speech corresponding to the content output from the response sentence generation unit 23d (S4); when a voice file satisfying the search condition is transferred from the voice database 43, the voice reproduction unit 22b converts the digital signal into an analog signal and the resulting voice is output from the voice output unit 12 (S3).

【００７３】たとえば、音声検索部２３ｅから音声デー
タベース４３に対して「コンニチワ」というテキストデ
ータの照合が要求されたとすると、表２によって「こん
にちは」という音声ファイルを抽出することができるか
ら、音声合成部２２における音声再生部２２ｂを通して
音声出力部１２から音声が出力される。For example, if the voice search unit 23e requests the voice database 43 to collate the text data "コンニチワ" ("konnichiwa"), the voice file "こんにちは" ("Hello") can be extracted according to Table 2, and the voice is output from the voice output unit 12 through the voice reproduction unit 22b of the voice synthesis unit 22.
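The search of FIG. 6 and the choice between playback and synthesis might be sketched as below. This is a hedged illustration only: the `VoiceRecord` fields, the file name in the usage note, and the function names `search_voice` and `choose_output` are assumptions introduced for this example, and an in-memory list stands in for the voice database 43 and the communication path 3.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class VoiceRecord:
    character: str   # user information identifying the family member
    text_key: str    # search key, e.g. "コンニチワ"
    voice_file: str  # registered voice file (the file name is hypothetical)


def search_voice(records: Iterable[VoiceRecord], character: str,
                 text_key: str) -> Optional[VoiceRecord]:
    """S1/S2: find a record matching both the character and the text."""
    for record in records:
        if record.character == character and record.text_key == text_key:
            return record
    return None  # corresponds to the "no voice file exists" notification


def choose_output(records: Iterable[VoiceRecord], character: str,
                  text_key: str) -> Tuple[str, str]:
    """Decide between playback (S3) and text synthesis (S4)."""
    record = search_voice(records, character, text_key)
    if record is not None:
        return ("playback", record.voice_file)  # S3: voice reproduction unit 22b
    return ("synthesis", text_key)              # S4: text synthesis unit 22a
```

With a single record `VoiceRecord("son", "コンニチワ", "son_konnichiwa.wav")`, a request for "コンニチワ" in the son's voice is played back, while any other text falls through to synthesis.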

【００７４】以上の説明から明らかなように、本実施形
態のコミュニケーション支援装置１は画像音声サーバ４
に対する端末であって使用者が用いる使用者端末として
機能する。しかも、上述のように家族の音声を音声デー
タベース４３から検索する機能を有しているから家族の
エージェントとみなすことができ、いわゆるエージェン
ト型端末として機能する。As is clear from the above description, the communication support device 1 of this embodiment is a terminal of the image/audio server 4 and functions as the user terminal used by the user. Moreover, because it has the function of retrieving family members' voices from the voice database 43 as described above, it can be regarded as an agent of the family and functions as a so-called agent-type terminal.

【００７５】ところで、上述の動作を実現するには、音
声データベース４３に音声を登録する必要がある。本実
施形態においては、画像音声サーバ４に通信路５を介し
て接続可能な音声端末２ｃを通して音声データベース４
３に音声を登録可能になっている。通信路５はどのよう
な形態でもよく、通信路３と共用することも可能であ
る。音声端末２ｃは、通信路５に接続可能であればパー
ソナルコンピュータなどを用いることができるが、携帯
電話や固定電話のような電話端末を音声端末２ｃとして
用いることにより、音声データベース４３に音声を簡便
に登録することが可能になる。つまり、電話端末には当
然ながら音声を入力するためのマイクロホンが内蔵され
ており、しかも携帯電話はもちろんのこと固定電話にお
いても表示部を備えるものが普及してきているから、音
声の登録手順に関するガイダンスや登録する音声の内容
を指示するメッセージを画像音声サーバ４から表示部に
表示することによって、音声データベース４３に適正な
内容の音声を登録することが可能になる。To realize the operation described above, voices must be registered in the voice database 43. In this embodiment, voices can be registered in the voice database 43 through a voice terminal 2c that can be connected to the image/audio server 4 via a communication path 5. The communication path 5 may take any form and may be shared with the communication path 3. A personal computer or the like can be used as the voice terminal 2c as long as it can be connected to the communication path 5, but using a telephone terminal such as a mobile phone or a fixed-line telephone as the voice terminal 2c makes it easy to register voices in the voice database 43. That is, a telephone terminal naturally has a built-in microphone for voice input, and fixed-line telephones as well as mobile phones equipped with a display unit have become widespread; therefore, by having the image/audio server 4 display on the display unit guidance on the registration procedure and messages indicating the content of the voice to be registered, voices with the proper content can be registered in the voice database 43.

【００７６】ここに、画像音声サーバ４においては、音
声端末２ｃとの間で通信可能とする音声通信部４４と、
音声通信部４４を通して音声端末２ｃから入力される音
声を音声データベース４３に登録する登録処理部４５と
が音声データベース４３に付設されている。いま、音声
端末２ｃとして携帯電話を用い、図７に示すように携帯
電話の表示部２ｄに各種メッセージが表示されるととも
に、表示部２ｄに表示された釦を携帯電話のキー操作に
よって操作できるものとする。音声端末（携帯電話）２
ｃから音声データベース４３に音声を登録する際には音
声端末２ｃと音声通信部４４とを通信可能にすると、図
７（ａ）のように、表２に示した「音声」の欄に記載し
た内容のメッセージ（図示例では「こんにちは」）が登
録手順の指示（図示例では「とおっしゃってくださ
い」）とともに表示される。表示部２ｄの画面下部には
「録音開始」釦Ｂ１が表示されるから、音声を登録しよ
うとする者は、携帯電話である音声端末２ｃのキー操作
により「録音開始」釦Ｂ１をオンにする操作を行い、指
示された内容の音声を発話する。In the image/audio server 4, a voice communication unit 44, which enables communication with the voice terminal 2c, and a registration processing unit 45, which registers in the voice database 43 the voice input from the voice terminal 2c through the voice communication unit 44, are attached to the voice database 43. Assume now that a mobile phone is used as the voice terminal 2c, that various messages are displayed on the display unit 2d of the mobile phone as shown in FIG. 7, and that the buttons shown on the display unit 2d can be operated with the keys of the mobile phone. To register a voice in the voice database 43 from the voice terminal (mobile phone) 2c, communication is established between the voice terminal 2c and the voice communication unit 44, whereupon, as shown in FIG. 7(a), a message with the content listed in the "Voice" column of Table 2 ("Hello" in the illustrated example) is displayed together with an instruction for the registration procedure ("Please say" in the illustrated example). Because a "Start recording" button B1 is displayed at the bottom of the screen of the display unit 2d, the person registering the voice turns on the "Start recording" button B1 by key operation of the voice terminal 2c, which is a mobile phone, and speaks the indicated content.

【００７７】「録音開始」釦Ｂ１をオンにすると、図７
（ｂ）のように、表示部２ｄの画面の中央部に「録音
中」と表示されるとともに、「録音開始」釦Ｂ１に代わ
って「録音終了」釦Ｂ２が表示される。指示された内容
の音声を発話し終えた後に「録音終了」釦Ｂ２をオンに
する操作を行えば、発話した音声が画像音声サーバ４に
一時的に記憶された状態になる。「録音終了」釦Ｂ２の
操作後には、図７（ｃ）のように、表示部２ｄの画面の
中央部に、指示された内容のメッセージ（図示例では
「こんにちは」）が登録の確認のメッセージ（図示例で
は「を登録して宜しいですか」）とともに表示される。
さらに、確認のメッセージに応答するための釦として
は、「再生」釦Ｂ３、「キャンセル」釦Ｂ４、「ＯＫ」
釦Ｂ５が表示される。「再生」釦Ｂ３を操作すれば発話
した音声の確認をすることができ、「キャンセル」釦Ｂ
４を操作すれば画像音声サーバ４に一時的に記憶されて
いる音声が消去される。また、画像音声サーバ４に一時
的に記憶した音声を音声データベース４３に登録すると
きには、「ＯＫ」釦Ｂ５をオンにするように操作する。
なお、これらの一連の処理は登録処理部４５により行わ
れる。When the "Start recording" button B1 is turned on, "Recording" is displayed in the center of the screen of the display unit 2d as shown in FIG. 7(b), and an "End recording" button B2 is displayed in place of the "Start recording" button B1. When the "End recording" button B2 is turned on after the indicated content has been spoken, the spoken voice is temporarily stored in the image/audio server 4. After the "End recording" button B2 has been operated, the indicated message ("Hello" in the illustrated example) is displayed in the center of the screen of the display unit 2d together with a message confirming the registration ("Is it all right to register this?" in the illustrated example), as shown in FIG. 7(c). In addition, a "Play" button B3, a "Cancel" button B4, and an "OK" button B5 are displayed as buttons for responding to the confirmation message. Operating the "Play" button B3 lets the speaker check the recorded voice, and operating the "Cancel" button B4 erases the voice temporarily stored in the image/audio server 4. To register the temporarily stored voice in the voice database 43, the "OK" button B5 is turned on. This series of processes is performed by the registration processing unit 45.
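The registration sequence of FIG. 7(a) to 7(c) can be viewed as a small state machine, sketched below. This is a simplified illustration, not the registration processing unit 45 itself: the class name `RegistrationSession`, the screen names, and the use of a dictionary in place of the voice database 43 are hypothetical, and the screen shown after "Cancel" is an assumption because the text does not specify it.

```python
from typing import Dict, Optional


class RegistrationSession:
    """One pass through the screens of FIG. 7(a) to 7(c)."""

    def __init__(self, prompt_text: str, database: Dict[str, bytes]):
        self.prompt_text = prompt_text        # content the speaker is asked to say
        self.database = database              # stands in for the voice database 43
        self.pending: Optional[bytes] = None  # temporarily stored recording
        self.screen = "prompt"                # FIG. 7(a)

    def start_recording(self) -> None:        # button B1
        self._require("prompt")
        self.screen = "recording"             # FIG. 7(b)

    def end_recording(self, audio: bytes) -> None:  # button B2
        self._require("recording")
        self.pending = audio                  # held temporarily on the server
        self.screen = "confirm"               # FIG. 7(c)

    def play(self) -> bytes:                  # button B3
        self._require("confirm")
        return self.pending

    def cancel(self) -> None:                 # button B4
        self._require("confirm")
        self.pending = None                   # erase the temporary recording
        self.screen = "prompt"                # assumption: return to the prompt

    def ok(self) -> None:                     # button B5
        self._require("confirm")
        self.database[self.prompt_text] = self.pending
        self.pending = None
        self.screen = "done"

    def _require(self, screen: str) -> None:
        if self.screen != screen:
            raise RuntimeError(f"not allowed on screen {self.screen!r}")
```

Keeping the recording in `pending` until "OK" is pressed mirrors the temporary storage described above, so a cancelled take never reaches the database.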

【００７８】上述の手順によって音声データベース４３
に対して携帯電話を用いて音声を登録することができ
る。なお、音声データベース４３のレコードには、音声
ファイルのほか、音声の内容を識別する情報および使用
者や家族を識別する使用者情報が必要である。音声の内
容を識別する情報には、音声の登録の際の指示内容を用
いればよい。また、使用者や家族を識別する使用者情報
はあらかじめ携帯電話の電話番号に対応付けてデータフ
ァイルとして用意しておき、携帯電話から画像音声サー
バ４にアクセスがあったときに携帯電話の電話番号をデ
ータファイルに照合することにより電話番号に対応する
使用者や家族を識別する使用者情報を抽出する。ここで
は、携帯電話を音声端末２ｃに用いる例を示している
が、音声を入力する機能と画像音声サーバ４との間で通
信する機能とを備えていれば音声端末２ｃの形態は問わ
ない。したがって、上述したデータファイルによって使
用者や家族を識別する使用者情報に対応付ける情報は電
話番号ではなくてもよく、たとえば電子メールアドレス
を用いたりパスワードを用いたりすることも可能であ
る。By the above procedure, a voice can be registered in the voice database 43 using a mobile phone. A record of the voice database 43 requires, in addition to the voice file, information identifying the content of the voice and user information identifying the user and the family member. The instruction content presented at registration may be used as the information identifying the content of the voice. The user information identifying the user and the family member is prepared in advance as a data file associated with the telephone number of the mobile phone; when the image/audio server 4 is accessed from the mobile phone, the telephone number is collated with the data file to extract the user information corresponding to that number. Although a mobile phone is used as the voice terminal 2c in this example, the voice terminal 2c may take any form as long as it has a function of inputting voice and a function of communicating with the image/audio server 4. The information associated with the user information in the data file therefore need not be a telephone number; an e-mail address or a password, for example, may be used instead.
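How the three parts of a record could be assembled at registration time is sketched below. The directory contents, the key format, the field names, and the function name `build_record` are all hypothetical and chosen only for illustration; the access key could equally be an e-mail address or a password, as noted above.

```python
from typing import Dict, Mapping

# Hypothetical data file: access key (here a telephone number) -> user information.
CALLER_DIRECTORY: Mapping[str, str] = {
    "090-0000-0000": "user-A/son",
}


def build_record(access_key: str, prompt_text: str, voice_file: bytes,
                 directory: Mapping[str, str] = CALLER_DIRECTORY) -> Dict[str, object]:
    """Assemble one voice database record from a registration."""
    if access_key not in directory:
        raise KeyError(f"unknown access key: {access_key}")
    return {
        "user_info": directory[access_key],  # extracted by collating the key
        "content_id": prompt_text,           # the instruction shown at registration
        "voice_file": voice_file,
    }
```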

【００７９】上述した例では登録する音声の内容の指示
が表示部２ｄに１個ずつ表示されているが、音声の内容
は通常は多種類が必要であるから、表２の左欄に示した
テキストデータを複数個ずつ区分して表示部２ｄに一括
して表示するようにしてもよい。つまり、複数種類の音
声の内容をメニュー形式で表示部２ｄに一覧表示し、音
声を順次登録するか、あるいは複数種類の内容から所望
の内容を選択させて登録するようにしてもよい。具体
的には、順次登録するか選択して登録するかの別を指示
する釦を表示部２ｄの画面に設け、順次登録する場合に
は１つの内容を登録するたびに次の登録に移行させるた
めの釦を設けておけばよい。また、選択して登録する場
合には、複数の選択肢に番号を付与しておき所望の内容
を番号で選択させればよい。以後の動作は音声の内容を
１個ずつ登録する場合に準じる。さらに、上述の例では
音声端末２ｃが画像を表示する表示部２ｄを備えている
が、登録手順の指示は画像ではなく音声によって行うこ
とも可能である。In the example above, the instructions for the voice contents to be registered are displayed on the display unit 2d one at a time, but because many kinds of voice content are normally required, the text data shown in the left column of Table 2 may be divided into groups and displayed together on the display unit 2d. That is, a plurality of voice contents may be listed on the display unit 2d in menu form, and the voices may either be registered in sequence or the desired content may be selected from the list and registered. Specifically, a button for choosing between sequential registration and selective registration is provided on the screen of the display unit 2d; for sequential registration, a button for moving on to the next registration each time one content has been registered is provided. For selective registration, numbers are assigned to the options and the desired content is selected by number. The subsequent operation is the same as when the voice contents are registered one at a time. Furthermore, although the voice terminal 2c in the above example has a display unit 2d for displaying images, the instructions for the registration procedure may be given by voice instead of images.

【００８０】（第５の実施の形態）第４の実施の形態で
は、コミュニケーション支援装置１が、音声データベー
ス４３に格納した使用者の家族などの音声を用いて使用
者の発話に対する応答を行うから、使用者は家族との会
話を模擬することができる。一方、使用者の家族は使用
者の安否に関する情報を得ることができれば安心するこ
とができる。そこで、本実施形態では、図８に示すよう
に、音声データベース４３を設けたサーバ（たとえば、
画像音声サーバ４）に使用者がコミュニケーション支援
装置１との対話を行ったときの履歴を登録する対話記録
格納部４６を設けてある。つまり、使用者がコミュニケ
ーション支援装置１との対話を行うと、音声データベー
ス４３にアクセスするから、対話の内容の履歴を対話記
録格納部４６に格納しておき、これを読み出せば使用者
の安否や使用者の生活状態を知る目安を得ることができ
る。(Fifth Embodiment) In the fourth embodiment, the communication support device 1 responds to the user's utterances using the voices of the user's family members and others stored in the voice database 43, so the user can experience a simulated conversation with the family. The user's family, for their part, can feel reassured if they can obtain information about the user's well-being. In this embodiment, therefore, as shown in FIG. 8, a dialogue record storage unit 46, in which the history of the user's dialogues with the communication support device 1 is registered, is provided in the server that holds the voice database 43 (for example, the image/audio server 4). That is, because the voice database 43 is accessed whenever the user converses with the communication support device 1, the history of the dialogue content is stored in the dialogue record storage unit 46, and reading it out gives an indication of the user's safety and living conditions.

【００８１】対話記録格納部４６へのデータの記録は音
声データベース４３へのアクセス時に行うようにすれば
よく、また場合によっては定期的に行うようにしてもよ
い。一方、対話記録格納部４６からのデータを読み出す
には、対話記録格納部４６を設けたサーバ（たとえば、
画像音声サーバ４）に通信路５を介して接続される携帯
電話または固定電話あるいはスピーカを備えるパーソナ
ルコンピュータなどの端末２ｅを用いればよい。第４の
実施の形態において説明した音声端末２ｃと同様に端末
２ｅとサーバとの間の通信路５は通信路３と同じであっ
てもよい。また、端末２ｅとして携帯電話を用いるので
あれば、音声端末２ｃと兼用することができる。ここに
おいて、対話記録格納部４６の記録内容を端末２ｅで読
み出すときに課金するようにすれば、サーバ（たとえ
ば、画像音声サーバ４）の運営費用を捻出することが可
能になる。課金のためには、端末２ｅから対話記録格納
部４６へのアクセスを検出する課金処理部４７を設けて
おけばよい。課金処理部４７による課金の体系について
は適宜に設定することが可能である。Data may be recorded in the dialogue record storage unit 46 each time the voice database 43 is accessed or, depending on the case, periodically. To read data from the dialogue record storage unit 46, a terminal 2e such as a mobile phone, a fixed-line telephone, or a personal computer equipped with a speaker, connected via the communication path 5 to the server holding the dialogue record storage unit 46 (for example, the image/audio server 4), may be used. As with the voice terminal 2c described in the fourth embodiment, the communication path 5 between the terminal 2e and the server may be the same as the communication path 3. If a mobile phone is used as the terminal 2e, it can also serve as the voice terminal 2c. If a fee is charged when the contents of the dialogue record storage unit 46 are read out by the terminal 2e, the operating costs of the server (for example, the image/audio server 4) can be covered. For this purpose, a billing processing unit 47 that detects access from the terminal 2e to the dialogue record storage unit 46 may be provided. The fee structure applied by the billing processing unit 47 can be set as appropriate.
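The combination of dialogue logging and per-access billing could look roughly like the following sketch. It assumes a flat fee per read, which is only one possible fee structure since the text leaves that open; the class name `DialogueRecordStore`, the fee value, and the terminal identifiers are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class DialogueRecordStore:
    """Stands in for the dialogue record storage unit 46 with billing unit 47."""

    def __init__(self, fee_per_read: int = 10):  # fee value is hypothetical
        self._fee = fee_per_read
        self._history: List[Tuple[str, str]] = []
        self._charges: DefaultDict[str, int] = defaultdict(int)

    def record(self, timestamp: str, content: str) -> None:
        """Store one dialogue entry, e.g. when the voice database is accessed."""
        self._history.append((timestamp, content))

    def read(self, terminal_id: str) -> List[Tuple[str, str]]:
        """Return the history and charge the reading terminal for the access."""
        self._charges[terminal_id] += self._fee
        return list(self._history)

    def charges(self, terminal_id: str) -> int:
        """Return the total fee accumulated by the given terminal."""
        return self._charges[terminal_id]
```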

【００８２】上述の構成では音声入力部１１から使用者
の音声が入力されると応答文生成部２３ｄにおいて応答
する音声の内容を生成し、この内容を音声データベース
４３に照合して１個の音声を抽出する構成を採用してい
るが、キャラクタ選択手段によりキャラクタが選択され
ると、選択されたキャラクタに該当する音声を音声デー
タベース４３から一括して取り出して記憶しておき、そ
の後、応答文生成部２３ｄで生成した音声の内容に該当
する内容の音声が音声データベース４３から取り出した
音声に含まれるときにはその音声を使用者に対する応答
に用いる構成としてもよい。このように、音声データベ
ース４３から複数種類の内容の音声を一括して抽出して
おけば、通信路３のトラフィックに影響されずに、使用
者の発話から応答までの時間を比較的短くすることがで
きる。In the configuration described above, when the user's voice is input from the voice input unit 11, the response sentence generation unit 23d generates the content of the voice response, and this content is collated with the voice database 43 to extract a single voice. Alternatively, when a character is selected by the character selection means, the voices corresponding to the selected character may be retrieved from the voice database 43 in a batch and stored; thereafter, when a voice whose content corresponds to the content generated by the response sentence generation unit 23d is included among the retrieved voices, that voice is used for the response to the user. If voices with a plurality of contents are extracted from the voice database 43 in a batch in this way, the time from the user's utterance to the response can be kept relatively short regardless of the traffic on the communication path 3.
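The batch retrieval variant amounts to a terminal-side cache that is filled once per character selection. The sketch below is illustrative only: `CharacterVoiceCache` and the `fetch_all` callback are hypothetical names, and `fetch_all` stands in for whatever bulk query the server would offer over the communication path 3.

```python
from typing import Callable, Dict, Mapping, Optional


class CharacterVoiceCache:
    """Holds the voices of the selected character on the terminal side."""

    def __init__(self, fetch_all: Callable[[str], Mapping[str, bytes]]):
        # fetch_all is a hypothetical bulk query: character -> {text key: voice}
        self._fetch_all = fetch_all
        self._voices: Dict[str, bytes] = {}

    def select_character(self, character: str) -> None:
        """Retrieve every voice of the character in one batch and store it."""
        self._voices = dict(self._fetch_all(character))

    def lookup(self, text_key: str) -> Optional[bytes]:
        """Return the stored voice, or None so the caller can synthesize."""
        return self._voices.get(text_key)
```

Because `lookup` never touches the network, the response latency after an utterance no longer depends on traffic on the communication path; the cost is paid once, when the character is selected.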

【００８３】なお、第４の実施の形態および第５の実施
の形態において、コミュニケーション支援装置１を使用
者宅に設置する使用者端末と同義で用いたが、使用者端
末がコミュニケーション支援装置１と同義であることは
必須ではない。したがって、上述の例のように音声デー
タベース４３を使用者端末の遠方に設置する構成のほ
か、音声データベース４３を使用者宅に設置したり、コ
ミュニケーション支援装置１に組み込んだりしてもよ
い。つまり、音声データベース４３をコミュニケーショ
ン支援装置１と同所に設けることによって、別途のサー
バ（たとえば、画像音声サーバ４）を用いることなくコ
ミュニケーション支援システムを構築することが可能で
ある。また、コミュニケーション支援装置１の構成のう
ち音声入力部１１と音声出力部１２とを使用者端末に設
け、コミュニケーション支援装置１のうちの対話処理手
段２０を使用者端末の遠方に設置したサーバ（たとえ
ば、画像音声サーバ４）に音声データベース４３ととも
に設けるようにしてもよい。In the fourth and fifth embodiments, the communication support device 1 has been treated as synonymous with the user terminal installed in the user's home, but the user terminal need not be identical to the communication support device 1. Accordingly, besides the configuration in which the voice database 43 is installed remotely from the user terminal as in the examples above, the voice database 43 may be installed in the user's home or incorporated in the communication support device 1. That is, by providing the voice database 43 at the same location as the communication support device 1, a communication support system can be built without a separate server (for example, the image/audio server 4). Alternatively, of the components of the communication support device 1, the voice input unit 11 and the voice output unit 12 may be provided in the user terminal, while the dialogue processing means 20 is provided, together with the voice database 43, in a server (for example, the image/audio server 4) installed remotely from the user terminal.

【００８４】[0084]

【発明の効果】請求項１の発明は、音声入力部と音声出
力部とが接続され自然言語を用いた音声による対話を使
用者との間で行うとともに使用者に応答する応答者とし
て想定したキャラクタが複数種類から選択可能である対
話処理手段と、対話環境に応じて対話処理手段における
前記キャラクタから適宜のキャラクタを選択するキャラ
クタ選択手段を有した統括制御部とを備えるものであ
り、複数種類のキャラクタが選択されることによって対
話の単調さを解消することができ、対話に対して持続的
に興味を持たせることが可能になって使用頻度の低下を
抑制することができる。[Effects of the Invention] The invention of claim 1 comprises dialogue processing means, to which a voice input unit and a voice output unit are connected, which conducts spoken natural-language dialogue with a user and in which the character assumed as the responder replying to the user can be selected from a plurality of types, and an overall control unit having character selection means for selecting an appropriate character from among those characters according to the dialogue environment. Because a plurality of types of characters are selected, the monotony of the dialogue is eliminated, the user's interest in the dialogue can be sustained, and a decline in the frequency of use can be suppressed.

【００８５】請求項２の発明は、請求項１の発明におい
て、前記キャラクタの属性として声色を含むので、声色
を制御する程度の比較的簡単な構成を採用するだけで、
あたかも複数人と対話しているような印象を与えて対話
の単調さを緩和することが可能になる。In the invention of claim 2, the attributes of the character in the invention of claim 1 include tone of voice, so merely adopting a relatively simple configuration that controls the tone of voice gives the impression of conversing with several people and makes it possible to relieve the monotony of the dialogue.

【００８６】請求項３の発明は、請求項１の発明におい
て、表情を表現する可動部分を有したロボットを駆動す
る駆動源が付加され、前記キャラクタ選択手段が可動部
分の動作により表現される表情に応じたキャラクタを選
択するものであり、動作によってキャラクタを表現する
から、動作によって興味をひくことにより対話のみによ
る単調さを軽減することが可能になる。In the invention of claim 3, a drive source for driving a robot having movable parts that express facial expressions is added to the invention of claim 1, and the character selection means selects a character according to the expression conveyed by the motion of the movable parts. Because the character is expressed through motion, the motion attracts the user's interest and reduces the monotony of dialogue alone.

【００８７】請求項４の発明は、請求項１の発明におい
て、前記キャラクタを選択する操作部を備えるので、使
用者の希望に応じてキャラクタを選択することができ、
使用者の好きなキャラクタを選択させることにより対話
に対する興味を持続させることが可能になる。In the invention of claim 4, an operation unit for selecting the character is provided in the invention of claim 1, so the character can be selected according to the user's wishes, and letting the user choose a favorite character makes it possible to sustain interest in the dialogue.

【００８８】請求項５の発明は、請求項１の発明におい
て、前記対話処理手段を通して前記使用者に発話を促す
ように呼びかける機能を有し、前記キャラクタ選択手段
が使用者の応答した内容に応じて前記キャラクタを選択
するので、使用者の発話内容に応じてキャラクタを選択
するから、キャラクタの選択が対話的に行われることに
なり、キャラクタの選択が定式化されることによる単調
さを防止することができる。In the invention of claim 5, the invention of claim 1 has a function of calling out to the user through the dialogue processing means to prompt the user to speak, and the character selection means selects the character according to the content of the user's response. Because the character is chosen according to what the user says, character selection is performed interactively, and the monotony that would result from a fixed selection pattern is prevented.

【００８９】請求項６の発明は、請求項１の発明におい
て、前記使用者に関する使用者情報を入力する使用者情
報入力手段が付加され、前記キャラクタ選択手段が前記
使用者情報に応じて前記キャラクタを選択するので、使
用者が与える使用者情報に基づいてキャラクタが選択さ
れるから、使用者に適合するキャラクタが選択されるこ
とになる。In the invention of claim 6, user information input means for inputting user information about the user is added to the invention of claim 1, and the character selection means selects the character according to the user information. Because the character is selected on the basis of user information supplied by the user, a character suited to the user is selected.

【００９０】請求項７の発明は、請求項６の発明におい
て、前記使用者情報入力手段が前記使用者情報を格納す
るとともに使用者に所持され、前記使用者情報入力手段か
ら前記使用者情報を受信して前記統括制御部に引き渡す
信号受信部を備えるので、使用者情報を使用者が所持す
る使用者情報入力手段から与えるから、使用者の生体情
報のような内容が時々変動する使用者情報でも容易に入
力することができる。In the invention of claim 7, the user information input means in the invention of claim 6 stores the user information and is carried by the user, and a signal receiving unit is provided that receives the user information from the user information input means and passes it to the overall control unit. Because the user information is supplied from the user information input means carried by the user, even user information whose content changes from time to time, such as the user's biometric information, can be input easily.

【００９１】請求項８の発明は、請求項７の発明におい
て、前記使用者情報入力手段と前記信号受信部との間の
伝送路がワイヤレス伝送路であるので、使用者から離れ
ていても使用者情報の入力が可能になる。In the invention of claim 8, the transmission path between the user information input means and the signal receiving unit in the invention of claim 7 is a wireless transmission path, so user information can be input even at a distance from the user.

【００９２】請求項９の発明は、請求項７の発明におい
て、前記使用者情報入力手段と前記信号受信部との間の
伝送路の少なくとも一部が人体であるので、使用者に触
れるだけで使用者情報を入力することが可能になる。In the invention of claim 9, at least part of the transmission path between the user information input means and the signal receiving unit in the invention of claim 7 is the human body, so user information can be input merely through contact with the user.

【００９３】請求項１０の発明は、請求項６ないし請求
項９の発明において、前記使用者情報が使用者の年齢・
性別を固有情報として含むので、使用者の年齢や性別に
応じたキャラクタの選択が可能になる。In the invention of claim 10, the user information in the inventions of claims 6 to 9 includes the user's age and sex as unique information, so a character can be selected according to the user's age and sex.

【００９４】請求項１１の発明は、請求項６ないし請求
項９の発明において、前記使用者情報が使用者の生体情
報を含むので、使用者の健康状態の管理が可能になる。In the invention of claim 11, the user information in the inventions of claims 6 to 9 includes the user's biometric information, so the user's state of health can be managed.

【００９５】請求項１２の発明は、請求項６ないし請求
項９の発明において、前記使用者情報が使用者を特定す
るための識別情報を含むので、使用者の本人確認が可能
になる。In the invention of claim 12, the user information in the inventions of claims 6 to 9 includes identification information for identifying the user, so the user's identity can be verified.

【００９６】請求項１３の発明は、請求項６ないし請求
項９の発明において、前記使用者情報が通信路を介して
接続された外部装置との接続時に必要となるシステム設
定情報を含むので、通信路を通して外部装置と接続する
のに必要なシステム設定情報を簡単に設定することがで
き、しかもこの種のシステム設定情報が消失した場合で
も使用者情報入力手段によって容易に再設定することが
可能になる。In the invention of claim 13, the user information in the inventions of claims 6 to 9 includes system setting information required when connecting to an external device via a communication path, so the system setting information needed to connect to the external device through the communication path can be set easily, and even if this kind of system setting information is lost, it can easily be set again by means of the user information input means.

[0097] According to the invention of claim 14, in the invention of any one of claims 1 to 13, the character selection means can select, together with the character assumed as the responder, a character that gives back-channel responses during the dialogue. The back-channel responses given during the dialogue prompt the user to speak, and as a result the decline in usage frequency can be mitigated.

[0098] According to the invention of claim 15, in the invention of any one of claims 1 to 14, when a plurality of recognition candidates are obtained as the recognition result for the user's voice input, the character selection means assigns a different character to each recognition candidate, and the dialogue processing means has a function of asking the user a question to confirm the recognition candidates and, upon receiving an affirmative response from the user, finalizing the content of the user's voice input. Using an acoustic model suited to the user's voice raises the speech recognition rate, and because the user responds to questions, interest in the dialogue is sustained and the decline in the user's usage frequency can be mitigated.
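
The confirmation step described above can be sketched as follows. This is only an illustration under stated assumptions: the names `Candidate`, `assign_characters`, `confirm`, and the `ask` callback are hypothetical and do not come from the patent, and asking about the candidates one at a time in order is an assumed strategy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    text: str       # one recognition candidate for the user's utterance
    character: str  # character assigned to voice the confirmation question


def assign_characters(candidates: Sequence[str],
                      characters: Sequence[str]) -> List[Candidate]:
    """Assign a different character to each recognition candidate."""
    if len(candidates) > len(characters):
        raise ValueError("need at least one distinct character per candidate")
    return [Candidate(text, ch) for text, ch in zip(candidates, characters)]


def confirm(candidates: Sequence[Candidate],
            ask: Callable[[str, str], bool]) -> Optional[Candidate]:
    """Ask about each candidate in turn (assumed order: as given).

    `ask(character, question)` is a hypothetical hook that speaks the
    question in the given character's voice and returns True when the
    user answers affirmatively. The first affirmed candidate finalizes
    the content of the voice input.
    """
    for cand in candidates:
        if ask(cand.character, f'Did you say "{cand.text}"?'):
            return cand
    return None  # nothing affirmed; the caller may ask the user to repeat
```

The returned `Candidate` carries both the finalized text and the character whose question the user affirmed, which is the information the acoustic-model selection of claim 16 relies on.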

[0099] According to the invention of claim 16, in the invention of claim 15, the dialogue processing means is provided with a speech recognition unit that recognizes the content of the user's utterance. The speech recognition unit is provided with a plurality of acoustic models against which the feature sequence of the speech is matched, and a character is assigned to each acoustic model. A character suited to the user's voice is extracted from the results of confirming the recognition candidates, and the acoustic model corresponding to the extracted character is used in the speech recognition unit. Because the acoustic models can be narrowed down through dialogue with the user to the one that suits the user, an appropriate acoustic model is selected automatically, and as a result the recognition rate for the user's voice increases.
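
One way to picture this narrowing is a tally of which character's candidate the user affirmed. The patent does not specify the extraction rule, so the majority count and the `min_confirmations` threshold below are assumptions, as are the function and model names.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def pick_acoustic_model(confirmed_characters: Iterable[str],
                        model_for_character: Dict[str, str],
                        min_confirmations: int = 2) -> Optional[str]:
    """Return the acoustic model of the most often confirmed character.

    `confirmed_characters` lists, per confirmation dialogue, the character
    whose recognition candidate the user affirmed. The threshold is a
    hypothetical guard against switching models on a single confirmation.
    """
    counts = Counter(confirmed_characters)
    if not counts:
        return None
    character, hits = counts.most_common(1)[0]
    if hits < min_confirmations:
        return None  # not enough evidence yet; keep the current model
    return model_for_character[character]
```

Under this assumed rule, the speech recognition unit would keep its current acoustic model until one character has been affirmed often enough, then switch to that character's model.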

[0100] The inventions of claims 17 to 25 relate to a communication support system that uses the communication support device according to the inventions of claims 1 to 16.

[0101] The invention of claim 17 comprises the communication support device according to any one of claims 1 to 16 and a voice database in which voices of a plurality of contents are registered for a plurality of the characters. The dialogue processing means includes a response sentence generation unit that generates the content of the voice with which to respond to the user, and a voice search unit that searches the voice database for a voice that corresponds to the character selected by the character selection means and whose content corresponds to the content generated by the response sentence generation unit. When the voice search unit extracts a matching voice from the voice database, that voice is used in the dialogue with the user. Providing the voice database makes it possible to set members of the user's family as characters. When the user is, for example, an elderly person living alone, being able to converse using the family's voices gives the user a sense of security, and because the dialogue is less monotonous than with mechanical speech, the user's interest can be held and the dialogue continued.
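
The lookup performed by the voice search unit can be sketched as a table keyed by character and content. The exact-match key, the `VoiceDB` type, and the fallback to a `synthesize` callback when no recording matches are assumptions made for illustration; this paragraph only states what happens when a matching voice is found.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical layout: (character, content) -> recorded audio bytes.
VoiceDB = Dict[Tuple[str, str], bytes]


def find_voice(db: VoiceDB, character: str, content: str) -> Optional[bytes]:
    """Search for a recording that matches both the character and the content."""
    return db.get((character, content))


def respond(db: VoiceDB, character: str, content: str,
            synthesize: Callable[[str], bytes]) -> Tuple[str, bytes]:
    """Use the recorded voice when one is found; otherwise fall back
    (assumed behavior) to synthesized speech for the same content."""
    recorded = find_voice(db, character, content)
    if recorded is not None:
        return ("recorded", recorded)
    return ("synthesized", synthesize(content))
```

With a family member registered as a character, `respond` would play that person's recording whenever the generated response content has a matching entry.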

[0102] According to the invention of claim 18, in the invention of claim 17, the voice database is accompanied by a voice communication unit that enables communication with a voice terminal capable of voice input, and a registration processing unit that registers voices input from the voice terminal through the voice communication unit. Because any voice can be registered in the voice database through the voice terminal, the user's family, for example, can easily register voices in the voice database.

[0103] According to the invention of claim 19, in the invention of claim 18, the voice terminal is a telephone terminal, so voices can be registered in the voice database conveniently using a mobile phone or a fixed-line telephone.

[0104] According to the invention of claim 20, in the invention of claim 18 or 19, the registration processing unit has a function of instructing the voice terminal as to the content of the voice to be registered in the voice database. Because the content of the voice to be registered is indicated by the registration processing unit, voices can be registered easily by following the instructions. Moreover, by indicating in advance the content of the voices that the communication support device uses in dialogue, the likelihood that voices stored in the voice database can be used in dialogue increases.

[0105] According to the invention of claim 21, in the invention of claim 18 or 19, the voice terminal includes a display unit that displays images, and the registration processing unit has a function of displaying on the display unit, in menu form, the content of the voices to be registered in the voice database from the voice terminal. Because the content of the voices to be registered is presented by the registration processing unit in menu form, the instructions are easy to understand, and voices can be registered easily by following them. Moreover, by indicating in advance the content of the voices that the communication support device uses in dialogue, the likelihood that voices stored in the voice database can be used in dialogue increases.

[0106] According to the invention of claim 22, in the invention of any one of claims 17 to 21, the communication support device has the voice input unit, the voice output unit, and the dialogue processing means in a user terminal that converses with the user, and the voice database is provided in a server capable of data communication with the user terminal over a communication path. When a character is selected by the character selection means, the voice search unit retrieves the voices for that character from the voice database in a single batch. When the voices retrieved from the voice database include a voice whose content corresponds to the content generated by the response sentence generation unit, the dialogue processing means uses that voice in the dialogue with the user. Because the voice database is provided in the server, having the server administrator, a specialist, monitor the voice database ensures that it is reliably protected and maintained.
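
The batch retrieval in this arrangement resembles a per-character cache held on the user terminal. The sketch below assumes a hypothetical `fetch_all(character)` call standing in for the request to the server; the class name and the rule of refetching only when the character changes are likewise assumptions.

```python
from typing import Callable, Dict, Optional


class TerminalVoiceCache:
    """Holds, on the user terminal, all voices of the selected character."""

    def __init__(self, fetch_all: Callable[[str], Dict[str, bytes]]):
        # fetch_all(character) is a hypothetical server request returning
        # {content: audio} for every voice registered for that character.
        self._fetch_all = fetch_all
        self._character: Optional[str] = None
        self._voices: Dict[str, bytes] = {}

    def select_character(self, character: str) -> None:
        """Fetch the character's voices in one batch when it is selected."""
        if character != self._character:
            self._voices = dict(self._fetch_all(character))
            self._character = character

    def lookup(self, content: str) -> Optional[bytes]:
        """Look up a response locally, without contacting the server."""
        return self._voices.get(content)
```

A design consequence of fetching in one batch is that each response during the dialogue is resolved locally, so the communication path is used once per character selection rather than once per utterance.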

[0107] According to the invention of claim 23, in the invention of any one of claims 17 to 21, the voice database is provided at the same location as the communication support device. Since no server is required, the system configuration is simplified.

[0108] According to the invention of claim 24, in the invention of any one of claims 17 to 21, the communication support device has the voice input unit and the voice output unit in a user terminal that converses with the user, and has the dialogue processing means in a server capable of data communication with the user terminal over a communication path, and the voice database is provided in the server. Because both the voice database and the dialogue processing means are provided in the server, having the server administrator, a specialist, monitor them ensures that the voice database and the dialogue processing means are reliably protected and maintained.

[0109] According to the invention of claim 25, in the invention of claim 22 or 24, the server includes a dialogue record storage unit that registers the history of dialogues the communication support device has conducted with the user, and a billing processing unit that performs billing when the contents of the dialogue record storage unit are read from a terminal connected to the server over a communication path. Because the dialogue history is registered in the dialogue record storage unit and can be read over the communication path, it becomes possible to know from outside whether the user's dialogues were conducted normally. For example, when the user is an elderly person living alone, family members can confirm that person's safety by reading the dialogue records even when they cannot talk with the person directly. Moreover, because a fee is charged when the dialogue record storage unit is read, its contents are not read unnecessarily, which suppresses increases in server traffic and helps cover the server's operating costs.
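
The read-triggered billing can be sketched as a server object that charges the reading terminal each time the dialogue records are read. The class name, the flat `fee_per_read`, and the identifiers are hypothetical; the patent states only that billing occurs when the records are read.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogueRecordServer:
    fee_per_read: int = 10  # hypothetical flat fee per read, in arbitrary units
    records: Dict[str, List[str]] = field(default_factory=dict)
    charges: Dict[str, int] = field(default_factory=dict)

    def register(self, user_id: str, entry: str) -> None:
        """Dialogue record storage: append one entry to the user's history."""
        self.records.setdefault(user_id, []).append(entry)

    def read(self, reader_id: str, user_id: str) -> List[str]:
        """Billing processing: charge the reader, then return the history."""
        self.charges[reader_id] = self.charges.get(reader_id, 0) + self.fee_per_read
        return list(self.records.get(user_id, []))
```

Registering entries is free in this sketch; only reads accumulate charges, which mirrors the stated aim of discouraging unnecessary reads.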

[Brief description of drawings]

FIG. 1 is a block diagram showing a first embodiment of the present invention.

FIG. 2 is a block diagram showing the dialogue processing means used in the same embodiment.

FIG. 3 is a block diagram of the main part of a second embodiment of the present invention.

FIG. 4 is a block diagram showing a fourth embodiment of the present invention.

FIG. 5 is a diagram explaining the operation of the same embodiment.

FIG. 6 is a diagram explaining the operation of the same embodiment.

FIG. 7 is a diagram explaining the operation of the same embodiment.

FIG. 8 is a block diagram showing a fifth embodiment of the present invention.

[Explanation of symbols]

1 Communication support device
2a Mobile terminal
2b Terminal
2c Voice terminal
2d Display unit
2e Terminal
3 Communication path
4 Image and voice server
5 Communication path
10 General control unit
11 Voice input unit
12 Voice output unit
20 Dialogue processing means
21 Speech recognition unit
23d Response sentence generation unit
23e Voice search unit
31 Drive source
40 User information input means
43 Voice database
44 Voice communication unit
45 Registration processing unit
46 Dialogue record storage unit
47 Billing processing unit

Continuation of front page
(51) Int.Cl.⁷ / Identification code / FI / Theme code (reference): G10L 3/00 561E 561D
(72) Inventor: Takeyuki Suzuki, c/o Matsushita Electric Works, Ltd., 1048 Oaza Kadoma, Kadoma-shi, Osaka
F-term (reference): 5D015 KK02 LL06 5D045 AB11 AB30

Claims

[Claims]

1. A communication support device comprising: dialogue processing means to which a voice input unit and a voice output unit are connected, which conducts spoken dialogue with a user in natural language, and in which a character assumed as the responder who responds to the user is selectable from a plurality of types; and a general control unit having character selection means for selecting an appropriate character from among the characters in the dialogue processing means according to the dialogue environment.

2. The communication support device according to claim 1, wherein the attributes of the character include tone of voice.

3. The communication support device according to claim 1, further comprising a drive source that drives a robot having a movable part for conveying expressions, wherein the character selection means selects the character according to the expression conveyed by the motion of the movable part.

4. The communication support device according to claim 1, further comprising an operation unit for selecting the character.

5. The communication support device according to claim 1, having a function of calling out to the user through the dialogue processing means to prompt the user to speak, wherein the character selection means selects the character according to the content of the user's response.

6. The communication support device, further comprising user information input means for inputting user information about the user, wherein the character selection means selects the character according to the user information.

7. The communication support device according to claim 6, wherein the user information input means stores the user information and is carried by the user, the device further comprising a signal receiving unit that receives the user information from the user information input means and passes it to the general control unit.

8. The communication support device according to claim 7, wherein the transmission path between the user information input means and the signal receiving unit is a wireless transmission path.

9. The communication support device according to claim 7, wherein at least part of the transmission path between the user information input means and the signal receiving unit is a human body.

10. The communication support device according to any one of claims 6 to 9, wherein the user information includes the age and sex of the user as unique information.

11. The communication support device according to any one of claims 6 to 9, wherein the user information includes biometric information of the user.

12. The communication support device according to any one of claims 6 to 9, wherein the user information includes identification information for identifying the user.

13. The communication support device according to any one of claims 6 to 9, wherein the user information includes system setting information required when connecting to an external device connected via a communication path.

14. The communication support device according to any one of claims 1 to 13, wherein the character selection means is capable of selecting, together with the character assumed as the responder, a character that gives back-channel responses during the dialogue.

15. The communication support device according to any one of claims 1 to 14, wherein, when a plurality of recognition candidates are obtained as the recognition result for a voice input by the user, the character selection means assigns a different character to each recognition candidate, and the dialogue processing means has a function of asking the user a question to confirm the recognition candidates and, upon receiving an affirmative response from the user, finalizing the content of the voice input by the user.

16. The communication support device according to claim 15, wherein the dialogue processing means is provided with a speech recognition unit that recognizes the content of the user's utterance, the speech recognition unit is provided with a plurality of acoustic models against which the feature sequence of the speech is matched, a character is assigned to each acoustic model, a character suited to the user's voice is extracted from the results of confirming the recognition candidates, and the acoustic model corresponding to the extracted character is used in the speech recognition unit.

17. A communication support system comprising the communication support device according to any one of claims 1 to 16 and a voice database in which voices of a plurality of contents are registered for a plurality of the characters, wherein the dialogue processing means includes a response sentence generation unit that generates the content of the voice with which to respond to the user, and a voice search unit that searches the voice database for a voice that corresponds to the character selected by the character selection means and whose content corresponds to the content generated by the response sentence generation unit, and wherein, when the voice search unit extracts a matching voice from the voice database, that voice is used in the dialogue with the user.

18. The communication support system according to claim 17, wherein the voice database is accompanied by a voice communication unit that enables communication with a voice terminal capable of voice input, and a registration processing unit that registers voices input from the voice terminal through the voice communication unit.

19. The communication support system according to claim 18, wherein the voice terminal is a telephone terminal.

20. The communication support system according to claim 18 or 19, wherein the registration processing unit has a function of instructing the voice terminal as to the content of the voice to be registered in the voice database.

21. The communication support system according to claim 18 or 19, wherein the voice terminal includes a display unit that displays images, and the registration processing unit has a function of displaying on the display unit, in menu form, the content of the voices to be registered in the voice database from the voice terminal.

22. The communication support system according to any one of claims 17 to 21, wherein the communication support device has the voice input unit, the voice output unit, and the dialogue processing means in a user terminal that converses with the user, the voice database is provided in a server capable of data communication with the user terminal via a communication path, the voice search unit retrieves the voices for a character from the voice database in a single batch when that character is selected by the character selection means, and the dialogue processing means, when the voices retrieved from the voice database include a voice whose content corresponds to the content generated by the response sentence generation unit, uses that voice in the dialogue with the user.

23. The communication support system according to any one of claims 17 to 21, wherein the voice database is provided at the same location as the communication support device.

24. The communication support system according to any one of claims 17 to 21, wherein the communication support device has the voice input unit and the voice output unit in a user terminal that converses with the user and has the dialogue processing means in a server capable of data communication with the user terminal via a communication path, and the voice database is provided in the server.

25. The communication support system according to claim 22 or 24, wherein the server comprises a dialogue record storage unit that registers the history of dialogues the communication support device has conducted with the user, and a billing processing unit that performs billing when the contents of the dialogue record storage unit are read from a terminal connected to the server via a communication path.