JP2002261966A

JP2002261966A - Communication support system and photographing equipment

Info

Publication number: JP2002261966A
Application number: JP2001222210A
Authority: JP
Inventors: Hiroshi Hoshino; 洋星野; Takeyuki Suzuki; 健之鈴木; Takashi Nishiyama; 高史西山
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2000-09-08
Filing date: 2001-07-23
Publication date: 2002-09-13

Abstract

PROBLEM TO BE SOLVED: To provide mental care for loneliness by simulating human conversation, and to allow a person to be watched over for signs of trouble by transmitting an image to a mobile terminal to an extent that does not invade privacy. SOLUTION: A robot 1 comprises a conversation processing means 21, to which a microphone 11 and a speaker 12 are connected and which enables voice conversation with the other party in natural language; an image processing means 22, to which a TV camera 13 for photographing the other party is connected and which outputs image data obtained by digitizing the image photographed by the TV camera 13; and a communication means 23 for sending the image data onto a network. The image data is transferred to an image server 4 via a communication line 3. A mobile terminal 2 that is permitted to access the image server 4 can access it and display the image data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a communication support system and a photographing device.

[0002]

[Prior Art] In general, elderly people have few opportunities to go out and few conversations with others, so family members are often their conversation partners. In addition, declining physical strength makes accidents in the home common among the elderly, so family members also need to watch over them constantly in order to prevent accidents or to respond quickly when one occurs. Under these circumstances, a family living with an elderly person finds it difficult to leave that person at home and go out; travel is out of the question, and often even shopping or attending lessons is hardly possible.

[0003] Meanwhile, robots of a kind called pet robots, which behave as if they had a will of their own, have recently been commercialized; for example, a dog-shaped pet robot has been commercialized under the name "AIBO" (a trade name of Sony Corporation). By monitoring and learning from a person's tone of voice, facial expression, manner of touch, and the like, this robot does not respond to stimuli in a fixed way as conventional machines do; instead, its response to a stimulus changes according to the situation. It can therefore behave as if it were growing and give the illusion of communicating with a person, which gives it a kind of soothing function. However, this pet robot imitates the movements of a dog and is oriented toward entertainment; it is not intended to support communication with the elderly.

[0004] On the other hand, as a pet robot whose main purpose is communication with the elderly, Matsushita Electric Industrial Co., Ltd. has proposed a cat-shaped or bear-shaped robot. To contribute to a comfortable life for the elderly, this pet robot can carry on simple everyday conversation and behave like a pet, and it also has a function that lets the usage status of the pet robot be grasped indirectly from a remote location. The pet robot can therefore act as a conversation partner that provides mental care for an elderly person, and it allows the safety of an elderly person living alone to be confirmed from a remote location. The pet robot is connected, via a communication line (a two-way CATV facility combined with digital communication system technology and mobile communication technology), to center equipment installed at a welfare service support center. To confirm the safety of an elderly person living alone, a message for the pet robot to utter is transmitted from the center equipment to the pet robot over the communication line, and the pet robot speaks the message to the elderly person. The elderly person's response is returned to the center equipment, and an inference device in the center equipment infers the person's living situation, thereby confirming the person's safety. For example, when the elderly person does not respond to the pet robot's utterance, the inference device infers that something may be wrong.

[0005]

[Problems to Be Solved by the Invention] The pet robots described above have a function of reacting like living creatures. The former pet robot has no dialogue function, but the latter does, so it is considered capable of easing the loneliness of the elderly. However, because the latter pet robot is connected via a communication line to a welfare service support center, which is a public facility, it has no function for transferring video of the elderly person's daily life, so that the privacy of that person's life is not invaded. Consequently, even when an abnormality is inferred, what kind of abnormality has occurred cannot be confirmed without going to the site, and the response to a situation that demands prompt action, such as illness or injury, may be delayed.

[0006] If family members who go out or travel want to check on an elderly person left at home, they could use a device capable of transferring images, a videophone being a typical example. However, such a device merely allows video to be transferred during a call; outside of calls it cannot act as a conversation partner for the elderly person or provide mental care. The same circumstances apply not only to households with elderly members but also to households with sick members.

[0007] The present invention has been made in view of the above circumstances. Its object is to provide a communication support system and a photographing device that make mental care for a person's loneliness possible by simulating conversation with a person, and that allow the interlocutor to be watched for abnormalities by transferring the interlocutor's image to a portable terminal to an extent that does not invade privacy.

[0008]

[Means for Solving the Problems] The invention of claim 1 comprises: dialogue processing means, to which a voice input device and a voice output device are connected and which conducts spoken dialogue with an interlocutor in natural language; video processing means, to which a video input device that images the interlocutor is connected and which outputs video data obtained by digitizing the video captured by the video input device; communication means that sends the video data to a network; a portable terminal that is permitted to view the interlocutor's video data through the network; and conversion means that converts the video data into a format displayable on the portable terminal. With this configuration, the dialogue processing means simulates conversation with a person, so the interlocutor receives mental care for feelings of loneliness. Moreover, because the interlocutor's video data can be viewed only on a permitted portable terminal, using the system so that only the interlocutor's family members carry such a terminal makes it possible to transfer the interlocutor's image to the portable terminal, to the extent that the privacy of others is not infringed, and to watch the interlocutor for abnormalities. As a result, even a household with an elderly or sick person can watch over that person from a distance with the portable terminal, and family members can go out with peace of mind.
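As a rough illustration of the access restriction and format conversion described for claim 1, the following Python sketch checks a terminal identifier against an allow-list and downsamples a frame to a terminal's display size. The specification does not say how permission is checked or which display format is used, so the identifier `family-phone-01`, the `Frame` representation (rows of grayscale values) and the nearest-neighbour resize are all assumptions made for illustration.

```python
from typing import List

Frame = List[List[int]]  # rows of grayscale pixel values (illustrative representation)

# Hypothetical allow-list of terminals permitted to view the interlocutor's video.
PERMITTED_TERMINALS = {"family-phone-01"}


def convert_for_terminal(frame: Frame, out_w: int, out_h: int) -> Frame:
    """Nearest-neighbour resize to the terminal's display size (assumed conversion)."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def serve_frame(terminal_id: str, frame: Frame, out_w: int, out_h: int) -> Frame:
    """Return the converted frame only to a permitted terminal."""
    if terminal_id not in PERMITTED_TERMINALS:
        raise PermissionError(f"terminal {terminal_id!r} may not view this video")
    return convert_for_terminal(frame, out_w, out_h)
```

Checking permission before converting keeps the sketch from doing any work on behalf of a terminal that is not allowed to see the result.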

[0009] The invention of claim 2 is the invention of claim 1, wherein the conversion means is provided in a video server that stores video data transferred over the network, and the video server selects only the video data that the holder of the portable terminal is allowed to view and transfers it to the portable terminal. With this configuration, providing a video server that stores the interlocutor's video data makes it possible to go back and view past video data.
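The selection performed by the video server of claim 2 might look roughly like the Python sketch below: stored items are filtered by what the terminal holder is allowed to see and returned in capture order so that past video can be reviewed. The permission model (a mapping from terminal identifier to viewable subjects) and the names `StoredVideo` and `VideoServer` are hypothetical; the specification does not describe the server's data structures.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class StoredVideo:
    subject: str            # whose video this is (hypothetical label)
    captured_at: datetime   # capture time, used to order past video
    data: bytes


@dataclass
class VideoServer:
    # terminal id -> subjects that terminal's holder may view (assumed model)
    permissions: Dict[str, Set[str]] = field(default_factory=dict)
    archive: List[StoredVideo] = field(default_factory=list)

    def store(self, video: StoredVideo) -> None:
        """Accumulate video data received over the network."""
        self.archive.append(video)

    def videos_for(self, terminal_id: str) -> List[StoredVideo]:
        """Select only the videos the holder may see, oldest first."""
        allowed = self.permissions.get(terminal_id, set())
        selected = [v for v in self.archive if v.subject in allowed]
        return sorted(selected, key=lambda v: v.captured_at)
```

A terminal that has no entry in `permissions` simply receives an empty list.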

[0010] The invention of claim 3 is the invention of claim 1, wherein the conversion means is provided in one of the video processing means and the communication means. With this configuration, the interlocutor's video data can be viewed from the portable terminal without providing a video server, so the system configuration is simple.

[0011] The invention of claim 4 is the invention of claims 1 to 3, wherein the video processing means causes the interlocutor's video data to be sent to the network through the communication means in response to an instruction from the portable terminal. With this configuration, because transmission of the video data is instructed from the portable terminal, the holder of the portable terminal can see video from the moment of the request.

[0012] The invention of claim 5 is the invention of claims 1 to 4, wherein the video processing means operates in response to a command given by the interlocutor through the dialogue processing means. With this configuration, video is captured of the interlocutor's own volition, so that, for example, an interlocutor who feels that something is wrong can send video data to alert the holder of the portable terminal.

[0013] The invention of claim 6 is the invention of claims 1 to 5, comprising informing means that informs the instructed side whether the video processing means is operating on an instruction from the interlocutor or on an instruction from the portable terminal. With this configuration, the video processing means can be controlled both by the interlocutor and from the portable terminal without the two sets of instructions being confused.

[0014] The invention of claim 7 is the invention of claims 1 to 6, wherein a contact sensor that detects contact by the interlocutor is added, and the dialogue processing means is activated when the contact sensor detects contact by the interlocutor. With this configuration, the dialogue processing means is activated when the interlocutor touches the contact sensor in order to begin a conversation, which prevents the dialogue processing means from malfunctioning because of ambient noise or the voices of people other than the interlocutor.

[0015] The invention of claim 8 is the invention of claims 1 to 7, comprising means for indicating to the interlocutor that video of the interlocutor is being captured by the video input means. With this configuration, the interlocutor can tell when he or she is being photographed and can therefore respond, for example by moving to a place out of view when not wishing to be photographed for reasons of privacy.

[0016] The invention of claim 9 is the invention of claims 1 to 8, wherein the dialogue processing means has a function of asking the interlocutor a question to confirm the interlocutor's voice input, and finalizes the voice input on receiving an affirmative response from the interlocutor. With this configuration, the interlocutor's voice input is confirmed by repeating it back, so the speech uttered by the interlocutor can be understood reliably.

[0017] The invention of claim 10 is the invention of claim 9, wherein the dialogue processing means learns the speech uttered by the interlocutor, using the interlocutor's affirmative response as a teacher signal. With this configuration, learning the interlocutor's speech through dialogue with the interlocutor raises the recognition rate.
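The read-back confirmation of claim 9 and the teacher-signal learning of claim 10 can be sketched together as follows. This is only one possible reading: the set of affirmative replies, the `ask` callback standing in for the speaker and microphone, and the idea of storing affirmed examples in a dictionary for later training are assumptions, not details from the specification.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

AFFIRMATIVE = {"yes", "that's right"}  # assumed set of affirmative replies


class ConfirmingDialogue:
    def __init__(self) -> None:
        # confirmed text -> raw recognizer inputs that the speaker affirmed
        self.training_examples: Dict[str, List[str]] = defaultdict(list)

    def confirm(self, raw_input: str, recognized: str,
                ask: Callable[[str], str]) -> Optional[str]:
        """Read the recognized text back; accept it only on an affirmative reply."""
        reply = ask(f'Did you say "{recognized}"?')
        if reply.strip().lower() in AFFIRMATIVE:
            # The affirmative reply acts as the teacher signal for this example.
            self.training_examples[recognized].append(raw_input)
            return recognized
        return None  # not confirmed: the input is discarded, nothing is learned
```

Only affirmed pairs are kept, so a misrecognition that the interlocutor rejects never becomes a training example.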

[0018] The invention of claim 11 is the invention of claims 1 to 10, wherein means is provided for transferring video data viewed on the portable terminal to another terminal on the network. With this configuration, transferring the video data to another terminal allows, for example, a holder of the portable terminal who notices that something is wrong with the interlocutor to send the video data to a doctor and receive advice on how to respond.

[0019] The invention of claim 12 is the invention of claims 1 to 11, comprising notification means that notifies the interlocutor that viewing of the video data has been requested from the portable terminal. With this configuration, in a household with an elderly or sick person as the interlocutor, when a family member uses the portable terminal to check on that person from outside the home, the notification means informs the elderly or sick person that the family member has checked, so that person knows the family has looked in and gains a sense of security.

[0020] The invention of claim 13 is the invention of claim 12, wherein the notification means has a function of announcing who is viewing the video data by using identification information of the portable terminal. With this configuration, the interlocutor can know who checked the video data and can feel even more reassured.

[0021] The invention of claim 14 is the invention of claim 13, wherein the notification means comprises an indicator unit assigned to each viewer of the video data. With this configuration, the viewer can be identified simply by looking at the indicator unit, without relying on text data or the like, so the interlocutor can tell intuitively not only that the video data has been checked but also who viewed it.
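The notification of claims 12 to 14 could be sketched as a lookup from terminal identification to a viewer and that viewer's indicator. The terminal identifiers, viewer names, lamp colours and the spoken message below are invented for illustration; the specification only says that identification information is used and that an indicator unit is assigned to each viewer.

```python
from typing import Dict, List

# Hypothetical mapping from terminal identification to viewer and indicator lamp.
VIEWERS: Dict[str, Dict[str, str]] = {
    "terminal-A": {"name": "daughter", "lamp": "red"},
    "terminal-B": {"name": "son", "lamp": "green"},
}


class Notifier:
    def __init__(self) -> None:
        self.lit_lamps: List[str] = []  # indicators that have been lit
        self.spoken: List[str] = []     # messages that would be spoken

    def on_view_request(self, terminal_id: str) -> None:
        """Tell the interlocutor who asked to view the video."""
        viewer = VIEWERS.get(terminal_id)
        if viewer is None:
            return  # unknown terminal: nothing to announce in this sketch
        self.lit_lamps.append(viewer["lamp"])
        self.spoken.append(f"Your {viewer['name']} just checked on you.")
```

Lighting a per-viewer lamp corresponds to the indicator unit of claim 14, which lets the interlocutor identify the viewer without reading any text.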

[0022] The invention of claim 15 comprises: dialogue processing means, to which a voice input device and a voice output device are connected and which conducts spoken dialogue with an interlocutor in natural language; video processing means, to which a video input device that images the interlocutor is connected and which outputs video data obtained by digitizing the video captured by the video input device; communication means that sends the video data to a network; and conversion means that converts the video data into a format displayable on a portable terminal that is permitted to view the interlocutor's video data through the network. With this configuration, the dialogue processing means simulates conversation with a person, so the interlocutor receives mental care for feelings of loneliness. Moreover, because the interlocutor's video data can be viewed only on a permitted portable terminal, using the device so that only the interlocutor's family members carry such a terminal makes it possible to transfer the interlocutor's image to the portable terminal, to the extent that the privacy of others is not infringed, and to watch the interlocutor for abnormalities. As a result, even a household with an elderly or sick person can watch over that person from a distance with the portable terminal, and family members can go out with peace of mind.

[0023] The invention of claim 16 is the invention of claim 15, to which are added clock means that keeps the current date and time, and superimpose processing means that composites the date and time kept by the clock means when the interlocutor is imaged into the video data as characters and causes the result to be sent from the communication means. With this configuration, each item of video data carries the date and time at which the video was captured, so when there is a large gap between the capture time and the time at which the video data is viewed on the portable terminal, the viewer can check the date and time attached to the video data and respond, for example by requesting the latest video.

[0024] The invention of claim 17 is the invention of claim 16, to which is added storage means that sequentially records and accumulates the video data sent to the network through the communication means and the success or failure of each communication, in association with the date and time kept by the clock means, and that retains the recorded contents without a power supply. With this configuration, even when video data is not sent from the communication means and cannot be checked on the portable terminal, the interlocutor's state of health and the like can be checked at a later date from the video data accumulated in the storage means. In addition, because the success or failure of each communication is stored in the storage means as a transmission history, the presence or absence of a fault in the communication means can easily be determined during maintenance.
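A transmission history of the kind described for claim 17 might be kept as in the following Python sketch, where each send attempt is appended to a log together with its date and time and its outcome. An ordinary file stands in for the storage that retains its contents without power, and the JSON-lines layout and field names are assumptions chosen for illustration.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import List


def record_transmission(log_path: Path, when: datetime,
                        video_file: str, succeeded: bool) -> None:
    """Append one dated entry; a file stands in for the non-volatile storage."""
    entry = {"time": when.isoformat(), "video": video_file, "ok": succeeded}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def failed_transmissions(log_path: Path) -> List[dict]:
    """Return the entries whose transmission failed, for use during maintenance."""
    with log_path.open(encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if not e["ok"]]
```

Appending one line per attempt keeps earlier records intact, which matches the sequential recording the claim describes.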

[0025] The invention of claim 18 is the invention of claims 15 to 17, wherein a posture sensor that detects the tilt of the video input device is added, and the video processing means corrects the video captured by the video input device on the basis of the tilt detected by the posture sensor so that the video is upright. With this configuration, the video is corrected on the basis of the information from the posture sensor so as to remove its tilt. Therefore, even when the video input device is tilted, for example when the photographing device is placed somewhere that is not level or when the interlocutor has picked it up, video equivalent to that obtained with the photographing device placed on a horizontal surface can be obtained.
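The geometric core of the tilt correction in claim 18 can be sketched as a rotation of image coordinates by the negative of the measured tilt. The sketch assumes that the posture sensor reports the angle, in degrees, by which the picture content appears rotated counter-clockwise, that the image centre is the origin, and that the y-axis points up; resampling of pixel values is omitted. None of these conventions comes from the specification.

```python
import math
from typing import Tuple


def upright_point(x: float, y: float, tilt_deg: float) -> Tuple[float, float]:
    """Rotate an image point by -tilt_deg about the image centre (the origin)."""
    t = math.radians(-tilt_deg)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))
```

For example, with a reported tilt of 90 degrees, a point observed at (0, 1) maps back to approximately (1, 0), which is where it would lie in an upright picture under the stated conventions.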

[0026] The invention of claim 19 is the invention of claims 15 to 18, to which are added swing driving means that can change the direction of the field of view of the video input device, and pattern processing means in which a template defined so that the interlocutor is imaged near the center of the video input device's field of view can be registered, and which compares the video captured by the video input device with the template and controls the swing driving means so that the similarity increases. With this configuration, the direction of the field of view of the video input device is controlled so that the similarity between the captured video and the template increases; therefore, if a suitable template has been set, the orientation of the video input device can be adjusted so that the interlocutor is captured near the center of its field of view.
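The control loop of claim 19 can be illustrated with a one-dimensional simulation: the camera sees a window of a scene, and the pan position is stepped in whichever direction makes the window more similar to the template. The similarity measure (negative sum of absolute differences), the simulated scene and the one-step hill climb are all assumptions; the specification does not name a matching method or a control law.

```python
from typing import List, Sequence


def similarity(window: Sequence[int], template: Sequence[int]) -> int:
    """Negative sum of absolute differences: larger means more similar (assumed)."""
    return -sum(abs(w - t) for w, t in zip(window, template))


def view(scene: Sequence[int], pan: int, width: int) -> List[int]:
    """Simulated camera: the slice of the scene visible at a given pan position."""
    return list(scene[pan:pan + width])


def track(scene: Sequence[int], template: Sequence[int], pan: int) -> int:
    """Step the pan position while doing so increases similarity to the template."""
    width = len(template)
    max_pan = len(scene) - width

    def score(p: int) -> int:
        return similarity(view(scene, p, width), template)

    while True:
        candidates = [p for p in (pan - 1, pan, pan + 1) if 0 <= p <= max_pan]
        best = max(candidates, key=score)
        if score(best) <= score(pan):
            return pan  # no neighbouring position is better: stop here
        pan = best
```

A loop of this kind only works when the subject is already partly in view; if the window shows nothing resembling the template, the similarity is flat and the loop stops immediately, which is consistent with the text's condition that a suitable template be set.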

[0027] The invention of claim 20 is the invention of claims 15 to 18, to which are added a second voice input device whose direction of directivity is aligned with the direction of the field of view of the video input device, swing driving means that can change the direction of the field of view of the video input device and the direction of directivity of the second voice input device simultaneously, and sound level processing means that controls the swing driving means in the direction in which the sound level of the output of the second voice input device increases. With this configuration, the direction in which the interlocutor is present can be estimated from the sound level, so adjusting the field of view of the video input device toward the direction of increasing sound level raises the likelihood of capturing the interlocutor within the field of view.

[0028] The invention of claim 21 is the invention of claims 15 to 18, to which are added a plurality of third voice input devices arranged so that the phase difference between their outputs is zero for a sound source located in the direction of the field of view of the video input device, and phase difference detecting means that controls swing driving means in the direction that reduces the phase difference between the third voice input devices. With this configuration, adjusting the field of view of the video input device so as to reduce the phase difference between the plurality of voice input devices makes it possible to capture the interlocutor within the field of view of the video input device.
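The phase-difference steering of claim 21 can be sketched with the standard far-field relation for two microphones a distance d apart: a source at angle θ from the forward axis arrives with delay d·sin θ / c, giving a phase difference of 2πf·d·sin θ / c at frequency f. The specification gives no formula or control law; the two-microphone arrangement, the spacing, the frequency and the proportional gain below are assumptions chosen so that the phase difference stays below π and no wrapping occurs.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def phase_difference(bearing_rad: float, mic_spacing_m: float,
                     freq_hz: float) -> float:
    """Phase difference (radians) between two microphones for a far-field source."""
    delay = mic_spacing_m * math.sin(bearing_rad) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay


def steer(source_rad: float, head_rad: float, mic_spacing_m: float = 0.1,
          freq_hz: float = 500.0, gain: float = 0.5, steps: int = 50) -> float:
    """Turn the head step by step in the direction that shrinks the phase difference."""
    for _ in range(steps):
        # Bearing of the source relative to the current head direction.
        dphi = phase_difference(source_rad - head_rad, mic_spacing_m, freq_hz)
        head_rad += gain * dphi  # proportional correction (assumed control law)
    return head_rad
```

When the head points at the source the relative bearing is zero, the phase difference vanishes, and the head stops turning, which is the condition the claim uses to bring the interlocutor into the field of view.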

[0029] The invention of claim 22 is the invention of claims 15 to 21, wherein a half mirror is added, placed in front of the video input device and orthogonal to the optical axis of the video input device. With this configuration, when the interlocutor's own figure is reflected in the half mirror, the interlocutor is directly in front of the video input device, so the interlocutor can, of his or her own volition, reliably be imaged by the video input device.

[0030] The invention of claim 23 is the invention of claims 15 to 21, wherein video output means is added to which an external display device can be connected and which allows the video data captured by the video input device to be displayed on the display device. With this configuration, the interlocutor can check on the spot, on the display device connected to the video output means, the video being captured by the video input device. The interlocutor can thus check not only how he or she appears but also the surroundings, and can confirm whether suitable video is being captured.

[0031] The invention of claim 24 is the invention of claims 15 to 23, having an outer shell that allows the interlocutor to hold the device in his or her arms, to which are added a heartbeat sensor that can detect the interlocutor's heart sounds when the device is held, and a heartbeat recording device that records the heart sounds detected by the heartbeat sensor. With this configuration, the interlocutor's state of health can be checked not only from the video but also from the heart sounds, and combining the information obtained about the interlocutor allows the state of health to be known more accurately. Moreover, because the heartbeat sensor is configured to pick up heart sounds when the interlocutor holds the device, the heart sounds can be collected without extra effort.

[0032] The invention of claim 25 is the invention of claims 15 to 24, comprising notification means that notifies the interlocutor that viewing of the video data has been requested from the portable terminal. With this configuration, in a household with an elderly or sick person as the interlocutor, when a family member uses the portable terminal to check on that person from outside the home, the notification means informs the elderly or sick person that the family member has checked, so that person knows the family has looked in and gains a sense of security.

[0033] The invention of claim 26 is the invention of claim 25, wherein the notification means has a function of announcing who is viewing the video data by using identification information of the portable terminal. With this configuration, the interlocutor can know who checked the video data and can feel even more reassured.

[0034] The invention of claim 27 is the invention of claim 26, wherein the notification means comprises an indicator unit assigned to each viewer of the video data. With this configuration, the viewer can be identified simply by looking at the indicator unit, without relying on text data or the like, so the interlocutor can tell intuitively not only that the video data has been checked but also who viewed it.

【００３５】[0035]

DESCRIPTION OF THE PREFERRED EMBODIMENTS (First Embodiment) First, the overall configuration of the system used in the present invention will be described. As shown in FIG. 1, the present embodiment uses a robot 1 placed in a home where an elderly or sick person lives (hereinafter simply called the "interlocutor", meaning the person who converses with the robot 1), and a portable terminal 2 connected to the robot 1 via a communication line 3 that includes a network such as the Internet. The robot 1 has a voice dialogue function, a function of imaging its surroundings, and a function of communicating data including video. In the present embodiment the robot 1 is described as having a dog-like appearance, but imitating the movement of an animal is not an essential requirement; as long as the functions described above are provided, the external shape and movement are of no particular concern. In the present embodiment, a mobile phone having an image display function is assumed as the portable terminal 2. A video server 4 that temporarily stores video captured by the robot 1 is also connected to the communication line 3.

[0036] To conduct voice dialogue, the robot 1 includes a microphone 11 (voice input device) serving as ears and a speaker 12 (voice output device) serving as a mouth; to image its surroundings, it includes a small TV camera 13 (video input device) serving as eyes. The microphone 11 and the speaker 12 are connected to dialogue processing means 21, which enables natural-language dialogue through the microphone 11 and the speaker 12. The TV camera 13 is connected to video processing means 22, which adjusts the brightness and contrast of the video captured by the TV camera 13 and digitizes the video signal. The dialogue processing means 21 and the video processing means 22 constitute a signal processing unit 20 whose main component is a microcomputer.

[0037] Communication means 23 is connected downstream of the video processing means 22. The communication means 23 compresses the video signal into a data format that can be sent over the communication line 3 and sends it out onto the communication line 3. The compressed video signal (hereinafter called video data) is temporarily stored in the video server 4 connected to the communication line 3. The video server 4 also functions as conversion means that converts the size and color tone of the video data into a format displayable on the screen of the portable terminal 2. Accordingly, when the video server 4 is accessed from the portable terminal 2 through a dedicated Internet connection service (such as the Internet connection services offered for mobile phones), the images stored in the video server 4 can be displayed on the screen of the portable terminal 2. Because the video server 4 has converted the video data into a displayable format, video captured by the robot 1 can be checked on the portable terminal 2. The video data may be still images, but moving images are preferable. However, the memory available for storing video in the portable terminal 2 is generally limited, so it is difficult to download video data to the portable terminal 2 and play it back afterward; given current communication speeds on the communication line 3 and current video compression technology, the result is moving images of roughly videophone quality.

[0038] Notifying means 28 is connected to the communication means 23. When an image captured by the robot 1 and stored in the video server 4 is viewed from the portable terminal 2, the notifying means 28 informs the interlocutor that the image has been viewed, using audible means (a speaker) or visual means (a lamp). That is, when a family member or the like of the interlocutor views the images stored in the video server 4 using the portable terminal 2, the robot 1 is notified of the viewing and the notifying means 28 in turn notifies the interlocutor. The interlocutor thereby gains the reassurance that a family member has checked on his or her situation, which relieves anxiety.

[0039] The timing at which video data is stored in the video server 4 is determined by an instruction from the portable terminal 2. That is, in response to an instruction from the portable terminal 2, the robot 1 is requested via the video server 4 to store video data, and the image stored in the video server 4 is then transferred to the portable terminal 2. However, the interlocutor may not be near the robot 1 at the moment the portable terminal 2 requests the transfer, and the viewer may also wish to see video data from before the time of the request. The TV camera 13 may therefore capture images periodically during a period designated in advance, with the video data stored in the video server 4. When the timing at which video data captured by the TV camera 13 is stored in the video server 4 is decoupled in this way from the timing at which the portable terminal 2 reads video data from the video server 4, the video data can be viewed after waiting only for the time needed to transfer it from the video server 4 to the portable terminal 2 following the instruction, so the response time until the video can be seen is shortened. In this case it is desirable to attach the imaging time to each item of video data.
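
The decoupling of storage timing from read timing can be illustrated with a minimal sketch. It assumes a simple in-memory store; the class name `FrameStore` and its methods are hypothetical and do not appear in the specification. Frames are deposited with their imaging time, and a request from the terminal returns what is already stored rather than triggering a new capture.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class FrameStore:
    """Hypothetical stand-in for the storage on video server 4."""
    times: list = field(default_factory=list)   # imaging times, ascending
    frames: list = field(default_factory=list)  # frame payloads, same order

    def store(self, captured_at: float, frame: bytes) -> None:
        # Periodic imaging is assumed to deliver frames in capture order.
        if self.times and captured_at < self.times[-1]:
            raise ValueError("frames must be stored in capture order")
        self.times.append(captured_at)
        self.frames.append(frame)

    def latest(self):
        # Most recent stored frame together with its imaging time.
        if not self.times:
            return None
        return self.times[-1], self.frames[-1]

    def at_or_before(self, t: float):
        # Newest frame captured at or before time t (viewing past video).
        i = bisect_right(self.times, t)
        if i == 0:
            return None
        return self.times[i - 1], self.frames[i - 1]
```

Because every frame carries its imaging time, the terminal can show when the picture was taken, which matters once the displayed frame is no longer guaranteed to be live.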

[0040] To prevent images stored in the video server 4 from being viewed by outsiders, access to each item of video data stored in the video server 4 is restricted according to the identification information from each portable terminal 2. For example, when a mobile phone is used as the portable terminal 2, the telephone number of the mobile phone is used as the identification information, and the video server 4 transfers to the mobile phone only the video data associated with that telephone number. The identification information is not limited to a telephone number; a personal identification number entered after accessing the video server 4 may be used, or a combination of several pieces of information may be used.
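
As a rough illustration of this access restriction, the sketch below keys the permitted video data on the telephone number and optionally requires a personal identification number as well. The names `Credentials` and `AccessTable`, and the sample numbers in the usage note, are assumptions made for the example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    phone_number: str
    pin: str = ""  # optional second item of identification information


class AccessTable:
    """Hypothetical table on video server 4: identification info -> video IDs."""

    def __init__(self):
        self._allowed = {}  # phone number -> (expected PIN, set of video IDs)

    def register(self, phone_number, pin, video_ids):
        self._allowed[phone_number] = (pin, set(video_ids))

    def visible_videos(self, cred):
        entry = self._allowed.get(cred.phone_number)
        if entry is None:
            return set()            # unknown terminal: nothing is visible
        expected_pin, video_ids = entry
        if expected_pin and cred.pin != expected_pin:
            return set()            # PIN required but wrong or missing
        return set(video_ids)       # only the data tied to this terminal
```

For instance, after `table.register("090-0000-0001", "1234", ["robot-A"])`, a request carrying that number and PIN sees only `robot-A`, and any other terminal sees nothing.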

[0041] As described above, the identification information from the portable terminal 2 is checked when the video data stored in the video server 4 is accessed; for this purpose the video server 4 is provided with a database that associates identification information with personal data such as a name. When identification information is input from the portable terminal 2, the database transfers the corresponding personal data to the robot 1, and the personal data is reported to the interlocutor through the notifying means 28. For this purpose the notifying means 28 either announces the personal data by voice or displays it on a separately provided small display device. With this configuration the interlocutor can confirm who has accessed the video server 4, which gives the interlocutor a sense of security.
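
The lookup from identification information to a viewer name might be sketched as follows. The class `ViewerDirectory`, the function `notification_text`, and the wording of the messages are all hypothetical; the specification only states that personal data such as a name is reported by voice or on a display.

```python
class ViewerDirectory:
    """Hypothetical database on video server 4: identification info -> name."""

    def __init__(self, names):
        self._names = dict(names)

    def lookup(self, identification):
        return self._names.get(identification)


def notification_text(directory, identification):
    # Text the robot could speak or display once stored video has been viewed.
    name = directory.lookup(identification)
    if name is None:
        return "Someone viewed your video."   # fallback for unregistered IDs
    return f"{name} viewed your video."
```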

[0042] In the present embodiment, a desired item can be selected from the video obtained through the portable terminal 2, attached to an e-mail, and forwarded to any suitable destination. That is, the video server 4 assigns an index to the video data designated from the portable terminal 2 so that this video data can be handled as an attachment to an e-mail composed on the portable terminal 2. Consequently, if the video obtained through the portable terminal 2 shows something abnormal about the interlocutor, an e-mail with the video data attached can be sent to the family doctor, making it possible to take appropriate action such as obtaining the doctor's advice.

[0043] As described above, the video that can be played back on the portable terminal 2 through the video server 4 is captured by the TV camera 13, so the timing of capture may at times be inconvenient for the interlocutor. For example, being imaged while changing clothes is undesirable from a privacy standpoint, even when the viewer is a relative. The robot 1 is therefore provided with display means 27 for informing the interlocutor that the TV camera 13 is capturing. The display means 27 may be a light-emitting diode, which is lit while the TV camera 13 is capturing to inform the interlocutor.

[0044] The dialogue processing means 21 has the configuration shown in FIG. 2 and enables simple voice dialogue. Speech input through the microphone 11 is fed to a speech recognition unit 31. Given the environment in which the robot 1 is used, ambient noise is often picked up by the microphone 11 as well, so the speech recognition unit 31 extracts feature parameters of the input from the microphone 11, separates speech from noise on the basis of the continuity and frequency distribution of the sound, and also performs speaker recognition. Speaker recognition is performed to avoid mistaking sound from a television or radio for the interlocutor's voice. The information needed for speaker recognition and noise separation is referred to here as the acoustic model M1. The speech recognition unit 31 also performs word recognition by matching against standard patterns, converts the speech into text data TXT1, and outputs it. The standard-pattern information used for word recognition is referred to here as the language model M2.

[0045] The text data TXT1 is input to a dialogue management unit 32. The dialogue management unit 32 is a kind of inference engine: using natural language processing that matches the text data TXT1 against knowledge stored in a knowledge base IB, it parses the text data TXT1 and extracts keywords, then performs semantic and contextual analysis based on the results. That is, it analyzes the meaning of the interlocutor's utterance and the interlocutor's intention, expresses them in a predetermined form, generates response content according to that analysis, and passes the response content in the predetermined form to an utterance content generation unit 33. The predetermined form here is a semantic representation. The utterance content generation unit 33 generates text data TXT2 according to the response content received from the dialogue management unit 32. The text data TXT2 is passed to a speech generation unit 34, which outputs it as speech through the speaker 12, either by text-to-speech synthesis or by editing pre-registered speech according to the text. The speech output from the speaker 12 should be as close to natural speech as possible.

[0046] As described below, the dialogue management unit 32 can generate response content for "shiritori" (a Japanese word-chain game) and for "schedule management". The interlocutor tells the robot 1 by voice, through the microphone 11, which kind of response content to generate.

[0047] To play shiritori, the interlocutor gives the robot 1 the spoken command "I want to play shiritori" (「しり取りがしたい」). After it is agreed who goes first and who goes second, the game starts. In shiritori the interlocutor's utterances are in principle single words, so word recognition alone is sufficient and neither parsing nor contextual analysis is needed. The processing procedure is shown in FIG. 3. Assume the interlocutor goes first. When text data TXT1 (a word) is input from the speech recognition unit 31 (S1), the dialogue management unit 32 determines whether the word has already been input since the game started (S2). If the word has not yet been used, it is stored in a suitable storage device and its final syllable is extracted (S3). Next, a word whose first syllable is the same as the extracted final syllable is retrieved from the dictionary in the knowledge base IB (S4). The retrieved word is checked against the storage device to determine whether it has already been used since the game started (S5); if it has, steps S4 and S5 are repeated to retrieve a different word. If the order of the words in the dictionary never changed, words would be retrieved in the same order every time and the game would lose its interest, so a retrieval method that searches the words at random is desirable. When a suitable word has been retrieved, it is stored in the storage device, the utterance content generation unit 33 is instructed to utter it, and the unit waits for the next word from the speech recognition unit 31 (S6). If, in step S2, the text data TXT1 from the speech recognition unit 31 is found in the storage device and judged to be a word already used since the game started, the dialogue management unit 32 instructs the utterance content generation unit 33 to generate the text data TXT2 "You said that before" (「前に言いましたよ」) (S7), which prompts the interlocutor to say a different word. To end the game, the interlocutor gives the robot 1 the spoken command "Let's stop now" (「もう止める」). When the game stops, the contents of the storage device are erased.
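
Steps S1 to S7 can be sketched in a few lines. This is a minimal sketch under simplifying assumptions, not the implementation of FIG. 3: the class name `Shiritori` is hypothetical, words are non-empty kana strings, the final kana character stands in for the final syllable (small kana and long-vowel marks are ignored), and the S4/S5 retry loop is replaced by filtering the unused candidates first and then picking one at random.

```python
import random


class Shiritori:
    """Robot side of the game, following S1 to S7 (interlocutor goes first)."""

    def __init__(self, dictionary, rng=None):
        self.dictionary = list(dictionary)  # stand-in for the dictionary in knowledge base IB
        self.used = set()                   # the "storage device": words used this game
        self.rng = rng or random.Random()

    def reply(self, word):
        # S1/S2: has the incoming word already been used in this game?
        if word in self.used:
            return "前に言いましたよ"         # S7: "You said that before"
        self.used.add(word)                 # S3: store the word ...
        tail = word[-1]                     # ... and take its final kana
        # S4/S5: unused dictionary words that start with that kana
        candidates = [w for w in self.dictionary
                      if w[0] == tail and w not in self.used]
        if not candidates:
            return None                     # no word left: the robot cannot answer
        answer = self.rng.choice(candidates)  # random pick so games differ
        self.used.add(answer)               # S6: store the answer, then utter it
        return answer

    def stop(self):
        self.used.clear()                   # memory is erased when the game ends
```

With a dictionary containing ごりら, a call such as `Shiritori(["ごりら"]).reply("りんご")` returns ごりら, and repeating りんご afterwards returns the S7 message.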

[0048] The example above shows a simplified procedure; it does not include the rule that a player who says a word ending in "n" (「ん」) loses, or the rule that a player who responds too slowly loses, but processing that implements these rules is desirable. For example, it is desirable to include processing that checks the ending of each word used by the interlocutor and by the robot 1. In that case the robot 1 also checks the ending of a word retrieved from the dictionary after adopting it. Furthermore, in the processing described above the robot 1 can use every word registered in the dictionary and is therefore unlikely to lose to the interlocutor, so it is desirable to add processing such as randomly varying the response time of the robot 1 so that it loses from time to time.

[0049] An example of a game of shiritori follows, in which the interlocutor says "ringo" (リンゴ, apple) and the robot 1 answers "gorira" (ゴリラ, gorilla). The interlocutor's speech is recognized by the speech recognition unit 31, which generates the text data TXT1 "ringo" and passes it to the dialogue management unit 32. The dialogue management unit 32 then instructs the utterance content generation unit 33 to confirm the content of the text data TXT1. Specifically, the confirmation phrase "(text data TXT1), is that right?" (「（テキストデータＴＸＴ１）ですね」) is registered in the utterance content generation unit 33 as a fixed template, and the dialogue management unit 32 instructs the utterance content generation unit 33 to use it. In this example (text data TXT1) = (ringo), so the utterance content generation unit 33 generates the text data TXT2 "Ringo, is that right?", which is spoken to the interlocutor through the speech generation unit 34 and the speaker 12. If the interlocutor answers affirmatively (for example "hai", "yes" or "un"), it is confirmed that the interlocutor said "ringo". In other words, the robot 1 repeats the interlocutor's utterance back to confirm the word. If the speech recognition unit 31 is given a learning function, repeating each word back in this way allows it to learn the feature parameters of the interlocutor's pronunciation, raising the recognition rate of both speaker recognition and word recognition.

[0050] When the interlocutor answers affirmatively, the procedure of FIG. 3 is carried out: the dialogue management unit 32 extracts "go" (ゴ) from the end of the word the interlocutor said and retrieves a word beginning with "go" from the dictionary in the knowledge base IB. In this example "gorira" is retrieved, and the dialogue management unit 32 instructs the utterance content generation unit 33 to generate the text data TXT2 "gorira". At this point the words "ringo" and "gorira" are both registered in the storage device. The robot 1 thus outputs the word "gorira" as speech from the speaker 12. The same processing is repeated until either the interlocutor or the robot 1 loses, or the interlocutor gives the instruction to stop.

[0051] Next, "schedule management" will be described. Schedule management is activated by determining whether the text data TXT1 passed from the speech recognition unit 31 to the dialogue management unit 32 contains a schedule-related keyword. For example, when text data TXT1 containing a concept such as "get up" (「起きる」) or "take medicine" (「薬を飲む」) is input, the dialogue management unit 32 recognizes it as schedule-related information. The alarm-clock function is used here as an example. To implement this function the robot 1 is provided with clock means 51 (see FIG. 6), not shown in FIG. 1, that keeps the current date and time. Suppose the interlocutor says to the robot 1 "Wake me at five o'clock" (「５時に起こして」, "goji ni okoshite"). By parsing and contextual analysis using the knowledge base IB, the dialogue management unit 32 understands "goji ni" (at five o'clock) as a time and "okoshite" (wake me) as an instruction to set the alarm. That is, the knowledge base IB holds the knowledge that the "(number)" in "(number) ji ni" (「（数字）時に」) denotes a time, and the alarm is set by reference to such knowledge.
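
The rule that the number in 「（数字）時に」 denotes a time can be approximated with a regular expression. The sketch below is a simplification that stands in for the knowledge-base lookup, which the specification does not detail; the function name `parse_alarm_request` and the keyword tuple `WAKE_KEYWORDS` are assumptions made for the example.

```python
import re

TIME_PATTERN = re.compile(r"(\d{1,2})時に")  # "(number)時に" denotes a time
WAKE_KEYWORDS = ("起こして",)                 # assumed keywords for the alarm function


def parse_alarm_request(text):
    """Return the hour for the alarm, or None if text is not an alarm request."""
    if not any(keyword in text for keyword in WAKE_KEYWORDS):
        return None
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    hour = int(match.group(1))  # int() also accepts full-width digits such as "５"
    return hour if 0 <= hour <= 23 else None
```

In Python 3 the `\d` class matches full-width digits in `str` patterns, so the utterance is handled whether the recognizer emits 5 or ５.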

[0052] Word recognition in the speech recognition unit 31 selects the candidate with the highest similarity when several candidate words are found, and it does not analyze syntax or context, so words are sometimes misrecognized. For example, "goji ni" (「５時に」, at five o'clock) may be misrecognized as the similar-sounding "kōji ni" (「工事に」, "to the construction work"). In that case the text data TXT1 input to the dialogue management unit 32 is "kōji ni okoshite". Because the dialogue management unit 32 analyzes the meaning of the input text data TXT1 from its syntax and context, matching against the knowledge base IB leads it to judge that "okoshite" cannot follow "kōji ni" and that the meaning cannot be analyzed. In such a case it judges that the speech recognition unit 31 has misrecognized the input. It then extracts candidates that can precede "okoshite" and are phonetically similar to "kōji ni", and candidates that can follow "kōji ni" and are phonetically similar to "okoshite", and generates tentative text data from a candidate that yields a valid meaning. Suppose "kōji ni" is replaced by "goji ni" to give the tentative text data "goji ni okoshite" (wake me at five o'clock). Since text data generated in this way is only tentative, it is verified by asking the interlocutor. That is, as in shiritori, the utterance content generation unit 33 is instructed to generate text data TXT2 of a question that confirms the tentative text data. For example, since the tentative text data is "Wake me at five o'clock", the utterance content generation unit 33 generates the question "I'll set the alarm for five o'clock, all right?" (「５時に目覚ましをセットしますよ」), and the text data TXT2 is used to confirm with the interlocutor by voice. If the interlocutor answers affirmatively, the tentative text data is taken to be correct and the alarm is set for five o'clock. If the answer is negative, the robot 1 outputs speech such as "Please say it again slowly". Through such dialogue the interlocutor can give the robot 1 simple schedule management instructions by voice.
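
The repair step can be sketched as a search for a phonetically similar phrase that is allowed to combine with the neighbouring word. Everything in this sketch is an assumption made for illustration: the tiny `LEXICON`, its romanised readings, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.6 threshold. The specification does not say how phonetic similarity is computed.

```python
from difflib import SequenceMatcher

# Hypothetical lexicon: phrase -> (romanised reading, may it precede "起こして"?)
LEXICON = {
    "工事に": ("koujini", False),
    "５時に": ("gojini", True),
    "７時に": ("shichijini", True),
}


def propose_correction(phrase, min_similarity=0.6):
    """Return a phrase that can precede "起こして" and sounds like `phrase`."""
    reading, combinable = LEXICON[phrase]
    if combinable:
        return phrase                       # nothing to repair
    best, best_score = None, min_similarity
    for other, (other_reading, ok) in LEXICON.items():
        if not ok:
            continue                        # candidate would not parse either
        score = SequenceMatcher(None, reading, other_reading).ratio()
        if score >= best_score:
            best, best_score = other, score
    return best                             # tentative only, or None if no match


def confirm(tentative, answer_is_yes):
    # The tentative text is adopted only after an affirmative spoken reply.
    return tentative if answer_is_yes else None
```

Keeping the proposal and the confirmation separate mirrors the text: the substituted phrase is never acted on until the interlocutor has answered the confirmation question.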

[0053] The processing described here, in which words that cannot be combined under the syntactic or semantic analysis of the dialogue management unit 32 are taken as a sign of possible misrecognition, tentative text data is generated, and a question is generated and confirmed by voice, is performed not only in schedule management but also in ordinary dialogue. In effect, a kind of repeat-back is carried out. Repeating back and confirming also allows the robot 1 to learn using the interlocutor's answer as a teacher signal, which raises the recognition rate for the interlocutor's speech.

[0054] In the present embodiment the function of conversion means, which converts the video data into a format displayable on the portable terminal 2, is provided in the video server 4, but this function may instead be provided in either the video processing means 22 or the communication means 23 of the robot 1. Nor does the communication means 23 have to be built into the robot 1; the communication means 23 connected to the communication line 3 may be linked to the robot 1 by a wireless communication path formed within the home.

[0055] (Second Embodiment) In the first embodiment the robot 1, although dog-shaped, did not move like a dog. In the present embodiment the robot 1 has suitable joints and, as shown in FIG. 4, can perform dog-like motions by controlling a robot drive unit 24, made up of a plurality of actuators, to bend, extend or rotate the joints. The robot drive unit 24 is connected to the signal processing unit 20 through robot control means 25 and is controlled according to instructions from the signal processing unit 20. The robot control means 25 converts instructions from the signal processing unit 20 into operations of the robot drive unit 24. Specifically, the speech recognition unit 31 of the dialogue processing means 21 is made able to extract the vocabulary (words) of commands to the robot; when the dialogue management unit 32 recognizes that an utterance is a command to the robot 1, it passes the result of analyzing the command to the robot control means 25. The robot control means 25 determines the operation amount of each actuator of the robot drive unit 24 according to that result. For example, for movement of the robot 1, commands such as "turn around to the right" (「回れ右」), "turn around to the left" (「回れ左」), "a little to the right" (「少し右」) and "a little to the left" (「少し左」) can be recognized, and vague words such as "a little" (「少し」) are also handled.
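
One way to handle a vague word such as 「少し」 is to map each modifier to a default magnitude and each direction word to a sign. The sketch below does this for turn commands; the angles (180 degrees for 「回れ」, 15 degrees for 「少し」) and the sign convention are assumed values, since the specification gives no numbers, and the function name `turn_angle` is hypothetical.

```python
# Assumed values: the specification does not state any angles.
DIRECTION_SIGN = {"右": -1.0, "左": 1.0}        # right = clockwise (negative)
MAGNITUDE_DEG = {"回れ": 180.0, "少し": 15.0}    # "少し" (a little) gets a small default


def turn_angle(command):
    """Map a spoken command such as "回れ右" or "少し左" to a signed angle in degrees."""
    for modifier, magnitude in MAGNITUDE_DEG.items():
        if command.startswith(modifier):
            direction = command[len(modifier):]
            if direction in DIRECTION_SIGN:
                return DIRECTION_SIGN[direction] * magnitude
    return None  # not a movement command
```

The resulting angle would then be translated by the robot control means into operation amounts for the individual actuators, a step this sketch leaves out.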

【００５６】さらに、本実施形態では、ロボット１を移
動させることなくＴＶカメラ１３の光軸の向きや焦点距
離を変化させることが可能にしてある。つまり、ＴＶカ
メラ１３を左右に振る機能とＴＶカメラ１３に設けたズ
ームレンズを駆動する機能とを有したカメラ制御手段２
６が設けられている。カメラ制御手段２６に対する指示
は信号処理部２０を通して行われる。信号処理部２０で
は、対話者の音声による指示を、カメラ制御手段２６に
与えることによって、ＴＶカメラ１３の視野を制御す
る。たとえば、対話者の音声によってＴＶカメラ１３の
視野を制御する場合には、ロボット１の移動を指示する
場合と同様に、対話処理手段２１において、「ズームイ
ン」、「ズームアウト」、「少し右」、「少し左」など
の命令を認識可能としておき、これらの命令に応じてＴ
Ｖカメラ１３の焦点距離や向きを制御するのである。な
お、本発明におけるＴＶカメラ１３は主として対話者を
撮像することを目的としているから、映像処理手段２２
においてはＴＶカメラ１３からの映像信号に基づいて人
の顔の位置を認識し、ズームインにおいては人の胸から
上が視野内に入る程度を最大とし、ズームアウトにおい
ては人の全身が視野内に入る程度を最小とするのが望ま
しい。また、「少し右」、「少し左」という命令に対し
ては、人の左右が視野内に入る程度が望ましい。Further, in this embodiment, the direction of the optical axis and the focal length of the TV camera 13 can be changed without moving the robot 1. That is, camera control means 26 is provided, which has a function of panning the TV camera 13 left and right and a function of driving a zoom lens provided on the TV camera 13. Instructions to the camera control means 26 are given through the signal processing unit 20. The signal processing unit 20 controls the field of view of the TV camera 13 by passing the interlocutor's spoken instructions to the camera control means 26. For example, when the field of view of the TV camera 13 is controlled by the interlocutor's voice, the dialog processing means 21 is made able to recognize commands such as "zoom in", "zoom out", "a little to the right", and "a little to the left", just as when instructing the robot 1 to move, and the focal length and direction of the TV camera 13 are controlled according to these commands. Since the TV camera 13 in the present invention is intended mainly to image the interlocutor, it is desirable that the video processing means 22 recognize the position of the person's face from the video signal of the TV camera 13, that zoom-in be limited to the point where the person from the chest up fills the field of view, and that zoom-out be limited to the point where the person's whole body fits within the field of view. For the commands "a little to the right" and "a little to the left", it is desirable to limit the movement so that the person's left and right sides stay within the field of view.

【００５７】ロボット１の移動に関する命令およびＴＶ
カメラ１の制御に関する命令は、対話者の音声によって
指示する以外にも、携帯端末２から通信回線３を通して
与えることもできる。ただし、通信回線３を通して外部
からロボット１に命令を与えるから、識別情報による認
証作業が必要であって、たとえば暗証番号を用いること
によって特定の人以外がロボット１に命令を与えられな
いようにしてある。携帯端末２からロボット１に対して
は、番号の組み合わせによって命令を与えるようにする
ことが可能であるが、対話処理手段２１を用いて携帯端
末２からの音声を受け付けるようにすれば、マイクロホ
ン１１を用いて命令を与える技術をそのまま利用するこ
とができる。Commands for moving the robot 1 and for controlling the TV camera 13 can be given not only by the interlocutor's voice but also from the mobile terminal 2 through the communication line 3. However, because such commands reach the robot 1 from outside through the communication line 3, authentication based on identification information is required; for example, a personal identification number is used so that no one other than specific persons can give commands to the robot 1. Commands can be given from the mobile terminal 2 to the robot 1 as combinations of numbers, but if the dialog processing means 21 is used to accept voice from the mobile terminal 2, the technique of giving commands through the microphone 11 can be used as is.
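
The personal identification number check could be sketched as follows. This is a sketch under assumptions: the description only says that a personal identification number is used, so the salted hash, the function names `pin_digest` and `is_authorized`, and the sample values are all hypothetical. A production system would more likely use a deliberately slow key-derivation function and limit the number of attempts.

```python
import hashlib
import hmac


def pin_digest(pin, salt):
    """Hash a PIN with a salt so the plain PIN need not be stored."""
    return hashlib.sha256(salt + pin.encode("utf-8")).hexdigest()


def is_authorized(entered_pin, salt, stored_digest):
    """Compare the digest of the entered PIN with the stored one in constant time."""
    return hmac.compare_digest(pin_digest(entered_pin, salt), stored_digest)
```

Only a caller for whom `is_authorized` returns true would have its movement or camera commands passed on to the robot.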

【００５８】ところで、ロボット１に対してマイクロホ
ン１１から命令を与える構成と携帯端末２から命令を与
える構成とを併用すると、ロボット１が互いに矛盾する
命令をほぼ同時に受けることがある。この場合、ロボッ
ト１が命令を順に処理したとしても、後に処理された命
令に対応した動作を行うことになるから、先に処理され
た命令を与えた側から言えば、ロボット１が意図に反し
た動作をしたことになる。そこで、ロボット１に命令を
与えるに際して、一方の命令が実行されると当面（命令
に対する一通りの動作が完了する程度の時間）は他方の
命令を受け付けないようにし、他方の命令を受け付けな
い期間においては当該他方に対して上記一方の命令の実
行中であることを報知するようにしてある。たとえば、
ロボット１と対話する対話者が音声による命令を与えよ
うとするときに携帯端末２からの命令を実行中であれ
ば、「携帯端末からの操作中」である旨の音声メッセー
ジをスピーカ１２を通して出力する。逆に、携帯端末２
を操作する対話者が命令を与えようとするときにロボッ
ト１と対話する対話者が音声によって与えた命令を実行
中であれば、「対話による操作中」である旨のメッセー
ジを携帯端末２の画面に表示する。携帯端末２からロボ
ット１に対して音声による命令を与える場合であれば、
「対話による操作中」である旨の音声メッセージを携帯
端末２を通して出力してもよい。このように、対話者と
携帯端末２とのうちの一方からの指示によってロボット
１が動作しているときに、命令の被指示側である他方に
対して命令が受け付けられない旨の報知を行うように、
信号処理部２０には図示しない報知手段が設けられる。When the configuration for giving commands to the robot 1 through the microphone 11 is combined with the configuration for giving commands from the mobile terminal 2, the robot 1 may receive mutually contradictory commands almost simultaneously. In this case, even if the robot 1 processes the commands in order, it ends up performing the operation corresponding to the command processed later, so from the viewpoint of the side that gave the command processed first, the robot 1 has acted contrary to its intention. Therefore, once a command from one side is executed, commands from the other side are not accepted for the time being (roughly the time needed to complete the sequence of operations for the command), and during that period the other side is notified that the first side's command is being executed. For example, if the interlocutor talking with the robot 1 tries to give a voice command while a command from the mobile terminal 2 is being executed, a voice message stating that "operation from the mobile terminal is in progress" is output through the speaker 12. Conversely, if the person operating the mobile terminal 2 tries to give a command while a command given by voice by the interlocutor talking with the robot 1 is being executed, a message stating that "operation by dialogue is in progress" is displayed on the screen of the mobile terminal 2. If the mobile terminal 2 gives commands to the robot 1 by voice, the voice message "operation by dialogue is in progress" may instead be output through the mobile terminal 2. Thus, the signal processing unit 20 is provided with notification means (not shown) which, while the robot 1 is operating on an instruction from either the interlocutor or the mobile terminal 2, notifies the other side that its commands cannot be accepted.
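
The exclusion between the two command sources can be sketched as a small arbiter that remembers which side currently holds the robot and for how long. The class name `CommandArbiter`, the source labels "voice" and "mobile", and the five-second hold time are assumptions made for illustration; the description only says that the other side is locked out for about the time one operation takes.

```python
import time


class CommandArbiter:
    """Accept commands from one source at a time ("voice" or "mobile")."""

    def __init__(self, hold_seconds=5.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds  # assumed time to finish one operation
        self.clock = clock
        self.active_source = None
        self.busy_until = 0.0

    def submit(self, source, command):
        """Return ("accepted", command) or ("rejected", notice for the caller)."""
        now = self.clock()
        if self.active_source not in (None, source) and now < self.busy_until:
            # The notice names the side whose command is being executed.
            notice = {
                "mobile": "operation from the mobile terminal is in progress",
                "voice": "operation by dialogue is in progress",
            }[self.active_source]
            return ("rejected", notice)
        self.active_source = source
        self.busy_until = now + self.hold_seconds
        return ("accepted", command)
```

The rejected notice would be spoken through the speaker 12 when the rejected side is the interlocutor, and shown on the screen (or spoken) at the mobile terminal 2 otherwise. Passing the clock in as a parameter keeps the sketch easy to test.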

【００５９】ところで、本実施形態では、対話者からの
情報を得るインタフェースとしてマイクロホン１１とＴ
Ｖカメラ１３とのほかに接触センサ１４を備えている。
接触センサ１４はロボット１の表面付近の適宜箇所に配
置され、接触センサ１４の出力はマイクロホン１１やＴ
Ｖカメラ１３の出力と同様に信号処理部２０に入力され
る。しかして、接触センサ１４により対話者がロボット
１に触れたことが検出されると、対話処理手段２１を起
動するようにしてある。つまり、対話処理手段２１を常
時動作させていると、周囲の雑音に応答してロボット１
が動作することがあるから、対話処理手段２１は対話者
が発話している期間にのみ起動するのが望ましいのであ
るが、対話者がいつ発話するかは予測できない。周囲の
雑音よりも大きい音声がマイクロホン１１に入力される
と対話処理手段２１を起動する技術も考えられるが、こ
のような技術を採用すると語頭が切断され、発話内容を
誤認識する可能性が高くなる。また、対話処理手段２１
において話者認識の技術を用いても、音声のみで話者を
完全に特定することは現状の技術では難しいものである
から、周囲の雑音を話者と誤認する可能性もある。In this embodiment, a contact sensor 14 is provided, in addition to the microphone 11 and the TV camera 13, as an interface for obtaining information from the interlocutor. The contact sensor 14 is placed at an appropriate location near the surface of the robot 1, and its output is input to the signal processing unit 20 in the same way as the outputs of the microphone 11 and the TV camera 13. When the contact sensor 14 detects that the interlocutor has touched the robot 1, the dialog processing means 21 is activated. If the dialog processing means 21 were kept running at all times, the robot 1 might act in response to ambient noise, so it is desirable to activate the dialog processing means 21 only while the interlocutor is speaking; however, it cannot be predicted when the interlocutor will speak. Activating the dialog processing means 21 when a sound louder than the ambient noise is input to the microphone 11 is conceivable, but with such a technique the beginning of a word is cut off and the utterance is more likely to be misrecognized. Moreover, even if speaker recognition is used in the dialog processing means 21, identifying a speaker completely from voice alone is difficult with current technology, so ambient noise may still be mistaken for the speaker.

【００６０】そこで、本実施形態においては、対話者が
ロボット１に触れたことが接触センサ１４に検出される
と対話処理手段２１を起動するのであって、対話者以外
の音声にロボット１が反応する可能性をほぼ回避できる
ことになる。Therefore, in this embodiment, the dialog processing means 21 is activated when the contact sensor 14 detects that the interlocutor has touched the robot 1, which largely avoids the possibility of the robot 1 reacting to sounds other than the interlocutor's voice.

【００６１】本実施形態では、第１の実施の形態と同様
に、携帯端末２から映像サーバ４に蓄積された映像が閲
覧されたときに個人データが転送される。ただし、通話
手段２８には個人データにロボット１の動きを対応付け
るテーブルが設けてあり、ロボット１が図５のような犬
型であるとすれば、頭部１ａや尾部１ｂに各個人データ
を対応付ける。つまり、個人データに応じて、頭部１ａ
を振る動作や尾部１ｂを振る動作に対応付けてある。こ
のことによって、対話者は映像サーバ４にアクセスした
のが誰かを直感的に知ることが可能になる。なお、個人
データを識別するための報知部として、上述のようにロ
ボット１の各部の動作を用いるほか、個人データに色を
対応付けた表示灯を報知部として用いることも可能であ
る。In this embodiment, as in the first embodiment, personal data is transferred when video stored in the video server 4 is viewed from the mobile terminal 2. Here, however, the communication means 28 is provided with a table that associates personal data with movements of the robot 1; if the robot 1 is dog-shaped as in FIG. 5, each item of personal data is associated with the head 1a or the tail 1b. That is, depending on the personal data, the robot shakes its head 1a or wags its tail 1b. This lets the interlocutor know intuitively who has accessed the video server 4. Besides using the movements of parts of the robot 1 as a notification unit for identifying personal data, as described above, indicator lamps whose colors are associated with the personal data may also be used as the notification unit.
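
The table that associates personal data with a movement amounts to a lookup, sketched below. The keys, the motion names, and the function `motion_for_viewer` are hypothetical; the description states only that each item of personal data is associated with the head 1a or the tail 1b.

```python
# Hypothetical entries; the description only says that personal data is
# associated with movements of the head 1a or the tail 1b.
MOTION_TABLE = {
    "daughter": "shake_head",
    "son": "wag_tail",
}


def motion_for_viewer(personal_data, default="none"):
    """Look up the motion that signals who accessed the video server."""
    return MOTION_TABLE.get(personal_data, default)
```

The same table shape would serve the indicator-lamp variant by storing a color instead of a motion name.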

【００６２】ところで、本実施形態におけるロボット１
は、インタネットへの接続が前提とされており、通信手
段２３が、電子メール処理部４１とインタネット検索処
理部４２とを備えている。インターネットを通して電子
メールを交換したり情報を検索したりする処理は、「し
り取り」や「スケジュール管理」と同様に、対話者がマ
イクロホン１１を通して音声によってロボット１に指示
することができる。The robot 1 in this embodiment is premised on a connection to the Internet, and the communication means 23 includes an e-mail processing unit 41 and an Internet search processing unit 42. As with "shiritori" (a word-chain game) and "schedule management", the interlocutor can instruct the robot 1 by voice through the microphone 11 to exchange e-mail or search for information over the Internet.

【００６３】いま、携帯端末２との間で電子メールを交
換する場合について例示する。ここに、送信先のメール
アドレスは電子メール処理部４１に設定されているもの
とする。ロボット１と対話する対話者から携帯端末２に
電子メールを送信する場合は、対話者がロボット１に対
して「電子メールを××に送信したい」と音声によって
伝えることで、電子メール処理部４１が起動し、電子メ
ールの本文を入力できるようになる。ここで、対話者が
音声によって電子メールの本文を作成し、「電子メール
を送信して」というように音声によってロボット１に命
令を与えると、ロボット１が本文を読み上げて内容の確
認を行い、対話者が承諾すると、作成された電子メール
がインタネットプロバイダのメールサーバに蓄積され
る。メールサーバに蓄積された電子メールは、通常の電
子メールと同様であるから、携帯端末２によって読み出
すことができる。An example of exchanging e-mail with the mobile terminal 2 is described next. It is assumed that the destination mail address is set in the e-mail processing unit 41. To send an e-mail from the interlocutor talking with the robot 1 to the mobile terminal 2, the interlocutor tells the robot 1 by voice "I want to send an e-mail to XX", whereupon the e-mail processing unit 41 starts and the body of the e-mail can be entered. When the interlocutor then composes the body by voice and gives the robot 1 a spoken command such as "send the e-mail", the robot 1 reads the body aloud for confirmation, and once the interlocutor approves, the e-mail is stored on the mail server of the Internet provider. An e-mail stored on the mail server is the same as any ordinary e-mail, so it can be read with the mobile terminal 2.

【００６４】また、ロボット１と対話する対話者に宛て
た電子メールをメールサーバから読み出すためにロボッ
ト１はメールサーバに定期的にアクセスしており、また
対話者が音声によって電子メールの読み出しをロボット
１に指示した場合にも電子メールの読み出しが可能にな
る。電子メール処理部４１においてメールサーバから電
子メールが読み出されると、電子メールの本文はテキス
トデータＴＸＴ２として音声発生部３４に入力される。
したがって、電子メールの本文が読み上げられ、音声に
よって対話者に伝えられることになる。The robot 1 accesses the mail server periodically to retrieve e-mail addressed to the interlocutor talking with the robot 1, and e-mail can also be retrieved when the interlocutor instructs the robot 1 by voice to do so. When the e-mail processing unit 41 retrieves an e-mail from the mail server, the body of the e-mail is input to the voice generation unit 34 as text data TXT2. The body of the e-mail is therefore read aloud and conveyed to the interlocutor by voice.

【００６５】インタネット検索処理部４２はキーワード
検索に用いられる。インタネット検索処理部４２は、あ
らかじめ設定された検索用のホームページに接続する機
能を有し、そのホームページは検索エンジンを持ち、検
索結果をテキストデータとして転送できるようにしてあ
る。つまり、ロボット１に専用の検索用ホームページが
設けてある。この検索用ホームページは、基本的には電
話帳の機能を有するものであり、対話者が希望する地区
で希望する業種の業者を検索する機能を有している。た
とえば、対話者が「宅配業者を探して」と音声によって
ロボット１に伝えると、対話管理部３２においてインタ
ネットでの検索が要求されていると認識して、インタネ
ット検索処理部４２を通して宅配業者を検索する。検索
結果はテキストデータＴＸＴ２として音声発生部３４に
与えられ、検索された宅配業者名が音声によって順に読
み上げられる。ここで、検索結果の数が多いときにはイ
ンタネット検索処理部４２が絞り込みを促すように発話
内容生成部３３に指示し、検索結果が適当数になれば、
検索結果が音声によって対話者に伝達される。さらに、
対話者が希望する宅配業者を選んで音声によってロボッ
ト１に伝えると、その宅配業者の電話番号がホームペー
ジから読み出されて音声により対話者に伝達される。The Internet search processing unit 42 is used for keyword searches. It has a function of connecting to a preset search home page; this home page has a search engine and can transfer search results as text data. In other words, a search home page dedicated to the robot 1 is provided. This search home page basically functions as a telephone directory and can search for businesses of a desired type in a district the interlocutor specifies. For example, when the interlocutor tells the robot 1 by voice "find a courier", the dialog management unit 32 recognizes that an Internet search is being requested and searches for couriers through the Internet search processing unit 42. The search results are given to the voice generation unit 34 as text data TXT2, and the names of the couriers found are read aloud in order. When there are many search results, the Internet search processing unit 42 instructs the utterance content generation unit 33 to prompt the interlocutor to narrow the search, and once the number of results is suitable, the results are conveyed to the interlocutor by voice. Further, when the interlocutor selects a courier and tells the robot 1 by voice, the telephone number of that courier is read from the home page and conveyed to the interlocutor by voice.

【００６６】上述した各実施形態においては、対話者が
ロボット１に指示を与えることによってロボット１が動
作するようにしているが、従来構成として説明したペッ
トロボットでは、自律エージェントによってロボット１
が自動的に起動して対話者に話しかける機能を持ち、ま
た対話者の表情や身振りなどの情報も用いて音声認識を
行うマルチモーダル対話の機能も備えているものであっ
て、本発明においてもこれらの機能を付加することによ
って、対話者とロボット１との間でよりスムーズな対話
が可能になる。他の構成および動作は第１の実施の形態
と同様である。In each of the embodiments described above, the robot 1 operates when the interlocutor gives it an instruction. The pet robot described as the conventional configuration, however, has a function by which an autonomous agent starts the robot 1 automatically and speaks to the interlocutor, and also a multimodal dialogue function that performs voice recognition using information such as the interlocutor's facial expressions and gestures. Adding these functions to the present invention as well enables smoother dialogue between the interlocutor and the robot 1. The other configurations and operations are the same as in the first embodiment.

【００６７】（第３の実施の形態）本実施形態は、第１
の実施の形態において、より利便性を高めるように機能
を付加したものである。本実施形態において付加する機
能は第２の実施の形態においても適用可能である。(Third Embodiment) This embodiment adds functions to the first embodiment to improve convenience. The functions added in this embodiment are also applicable to the second embodiment.

【００６８】本実施形態では、図６に示すように、ＴＶ
カメラ１３により撮像された映像に撮像された日時を付
加するために、現在の日時を計時する時計手段５１と、
時計手段５１により計時された日時をＴＶカメラ１３で
撮像された映像に文字としてスーパインポーズ表示する
ためのスーパインポーズ処理手段５２とを備える。すな
わち、ＴＶカメラ１３により図７（ａ）のような映像が
撮像され、時計手段５１により計時されている撮像日時
が２０００年１１月１５日の１５時２３分であったとす
ると、図７（ｂ）のように、ＴＶカメラ１３で撮像され
た映像の上部あるいは下部に、「２０００／１１／１５
１５：２３」などの形式で映像の撮像日時を示す文字
が合成される。このような日時付きの映像が映像サーバ
４に転送されることによって、携帯端末２の画面にもＴ
Ｖカメラ１３で映像が撮像された日時が表示されること
になる。つまり、対話者がＴＶカメラ１３で撮像された
時点から携帯端末２の画面に映像が表示されて対話者の
様子が確認されるまでには時間差があるから、携帯端末
２の画面に映像の撮像された日時を表示することによっ
て、少なくともその日時での対話者の様子を保証するこ
とが可能になる。なお、スーパインポーズ処理手段５２
は、マイコンよりなる映像処理手段２２に適宜プログラ
ムを与えて実現してもよいが、専用の集積回路によって
構成すれば映像処理手段２２における負荷が低減され
る。In this embodiment, as shown in FIG. 6, in order to add the capture date and time to the video captured by the TV camera 13, the robot includes clock means 51 that keeps the current date and time, and superimpose processing means 52 that superimposes the date and time kept by the clock means 51 as characters on the video captured by the TV camera 13. That is, if the TV camera 13 captures a video like that in FIG. 7(a) and the capture time kept by the clock means 51 is 15:23 on November 15, 2000, characters indicating the capture date and time are composited at the top or bottom of the video in a format such as "2000/11/15 15:23", as in FIG. 7(b). Because video carrying the date and time in this way is transferred to the video server 4, the date and time at which the TV camera 13 captured the video is also displayed on the screen of the mobile terminal 2. There is a time lag between the moment the interlocutor is imaged by the TV camera 13 and the moment the video is displayed on the mobile terminal 2 and the interlocutor's condition is checked, so displaying the capture date and time on the screen of the mobile terminal 2 makes it possible to vouch for the interlocutor's condition at least as of that date and time. The superimpose processing means 52 may be implemented by giving a suitable program to the video processing means 22, which consists of a microcomputer, but implementing it as a dedicated integrated circuit reduces the load on the video processing means 22.
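
The text to be superimposed can be produced with a single format string, as the sketch below suggests. The function name `overlay_text` is an assumption, and compositing the characters onto the image itself is left out because it depends on the imaging hardware or library used.

```python
from datetime import datetime


def overlay_text(captured_at):
    """Format a capture time in the style of the example "2000/11/15 15:23"."""
    return captured_at.strftime("%Y/%m/%d %H:%M")
```

For the capture time in the example above, `overlay_text(datetime(2000, 11, 15, 15, 23))` yields the string shown in FIG. 7(b).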

【００６９】ところで、映像サーバ４における記憶容量
は非常に大きいものではあるが、ロボット１の台数が多
くなれば映像データのデータ量が膨大になるから、映像
サーバ４ではＴＶカメラ１３で撮像されたすべての映像
データを蓄積するのではなく、携帯端末２で閲覧された
映像データは原則として消去することになる。一方、対
話者に何らかの異常が生じた場合であって異常の発生時
点から携帯端末２で映像が閲覧されるまでの間に大きな
時間差があったような場合には、異常の発生前から発生
後に跨る期間の映像データの履歴が要求されることがあ
る。上述のように映像サーバ４に蓄積された映像データ
は逐次消去されるから、このような要求に十分に対応す
ることができない場合がある。また、通信手段２３を通
してロボット１から映像サーバ４に映像を転送するもの
であるから、通信回線３の状態や映像サーバ４の状態に
よっては、映像サーバ４に映像データが蓄積されない場
合も考えられる。Although the storage capacity of the video server 4 is very large, the volume of video data becomes enormous as the number of robots 1 grows. The video server 4 therefore does not keep all the video data captured by the TV camera 13; as a rule, video data that has been viewed on the mobile terminal 2 is deleted. On the other hand, if something abnormal happens to the interlocutor and a long time passes between the occurrence of the abnormality and the viewing of the video on the mobile terminal 2, a history of the video data covering the period from before to after the abnormality may be requested. Because the video data stored in the video server 4 is deleted successively as described above, such a request cannot always be met adequately. In addition, since video is transferred from the robot 1 to the video server 4 through the communication means 23, video data may fail to be stored in the video server 4 depending on the state of the communication line 3 or of the video server 4.

【００７０】そこで、本実施形態では通信手段２３を通
して映像サーバ４に転送する映像データを蓄積するため
の記憶手段５３をロボット１に付加してある。記憶手段
５３は電源が遮断された状態でも記憶内容を保持するこ
とができるものを用いている。この種の記憶手段５３と
しては、フレキシブルディスク、ハードディスク、ＣＤ
−Ｒ、不揮発性メモリなどの各種のものを用いることが
できる。記憶手段５３に格納される情報は、各通信手段
２３から送出される映像データと、各映像データの送信
の日時および送信の成功・失敗のデータとが含まれる。
たとえば、送信する映像データ毎に個別の名称を付し、
「送信日時、映像データの名称、送信の成功・失敗」と
いうような形式のデータを映像データとともに蓄積すれ
ばよい。また、映像データに付ける個別の名称は送信日
時を利用して設定すればよい。たとえば、送信日時が２
０００年１１月１３日８時１３分であるとすれば、映像
データの名称を「２０００１１１３０８１１ａ」などと
すればよい。このように、通信手段２３から送出する映
像データと送信の履歴とをロボット１に蓄積することに
よって、必要に応じて映像データを過去に遡って閲覧す
ることが可能になる。ここに、記憶手段５３に蓄積され
た映像データおよび送信履歴については、記憶手段５３
から記録媒体を外すことによって別のパーソナルコンピ
ュータなどによって読み出すことが可能である。Therefore, in this embodiment, storage means 53 for storing the video data transferred to the video server 4 through the communication means 23 is added to the robot 1. The storage means 53 is of a type that retains its contents even when the power is cut off. Various devices can be used as this kind of storage means 53, such as a flexible disk, a hard disk, a CD-R, or nonvolatile memory. The information stored in the storage means 53 includes the video data sent out from the communication means 23, the date and time at which each item of video data was transmitted, and whether each transmission succeeded or failed. For example, each item of video data to be transmitted is given its own name, and a record in a format such as "transmission date and time, name of video data, success/failure of transmission" is stored together with the video data. The name given to each item of video data may be derived from the transmission date and time. For example, if the transmission date and time is 8:13 on November 13, 2000, the video data may be named "200011130811a" or the like. By storing in the robot 1 the video data sent from the communication means 23 together with the transmission history in this way, the video data can be viewed retroactively when necessary. The video data and transmission history stored in the storage means 53 can be read on a separate personal computer or the like by removing the recording medium from the storage means 53.
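
A transmission history record of this kind could be generated as sketched below. The function names, the comma-separated layout, and the meaning of the trailing letter (assumed here to distinguish items sent within the same minute) are illustrative assumptions; the description fixes only the three fields and the idea of deriving the name from the transmission time.

```python
from datetime import datetime


def video_name(sent_at, suffix="a"):
    """Build a name from the transmission time: YYYYMMDDhhmm plus a suffix."""
    return sent_at.strftime("%Y%m%d%H%M") + suffix


def log_record(sent_at, succeeded):
    """Return one "date and time, name, success/failure" history line."""
    status = "success" if succeeded else "failure"
    return f"{sent_at:%Y/%m/%d %H:%M},{video_name(sent_at)},{status}"
```

Appending each record to a file on the nonvolatile medium keeps the history readable on another computer after the medium is removed.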

【００７１】上述の説明は対話者がＴＶカメラ１３で確
実に撮像されていることを前提にしているが、実際には
ロボット１が傾いている場合や、対話者がＴＶカメラ１
３の視野の中心付近に存在しない場合もある。そこで、
以下では対話者をＴＶカメラ１３で適正に撮像する技術
について説明する。The above description assumes that the interlocutor is reliably imaged by the TV camera 13, but in practice the robot 1 may be tilted, or the interlocutor may not be near the center of the field of view of the TV camera 13. Techniques for properly imaging the interlocutor with the TV camera 13 are therefore described below.

【００７２】ロボット１が傾いて設置されている場合や
対話者がロボット１を抱いているとすると、ＴＶカメラ
１３により撮像された映像が傾くことがある。そこで、
本実施形態においては、ロボット１の傾きを検出するた
めの姿勢センサ５４を付加してある。姿勢センサ５４と
してはジャイロセンサを用いることができる。姿勢セン
サ５４によって検出されたロボット１の傾きは映像処理
手段２２に与えられ、映像処理手段２２において映像が
正立するように映像の傾きが補正される。ここに、映像
が正立するとは重力方向に直交する面が映像の下縁に平
行になる状態を意味する。また、映像の左右の傾きを補
正すればよく、上下の傾きについては必ずしも補正しな
くてもよいが、上下左右の傾きをともに補正すればＴＶ
カメラ１３を水平面上に設置している状態と同様の映像
を得ることが可能になる。たとえば、姿勢センサ５４に
より検出されたロボット１の傾きによって映像が左回り
に３０度回転していると判断されると、映像処理手段２
２においては映像を画面の中心を中心として右回りに３
０度回転させる補正を行い、補正後の映像を用いて以後
の処理を行うのである。If the robot 1 is installed at an angle, or if the interlocutor is holding the robot 1, the video captured by the TV camera 13 may be tilted. In this embodiment, therefore, an attitude sensor 54 for detecting the tilt of the robot 1 is added. A gyro sensor can be used as the attitude sensor 54. The tilt of the robot 1 detected by the attitude sensor 54 is given to the video processing means 22, which corrects the tilt of the video so that the video is upright. Here, the video being upright means that a plane orthogonal to the direction of gravity is parallel to the lower edge of the video. It suffices to correct the left-right tilt of the video, and the up-down tilt need not necessarily be corrected; however, if both are corrected, a video equivalent to one taken with the TV camera 13 placed on a horizontal surface can be obtained. For example, if it is determined from the tilt of the robot 1 detected by the attitude sensor 54 that the video is rotated 30 degrees counterclockwise, the video processing means 22 applies a correction that rotates the video 30 degrees clockwise about the center of the screen, and the corrected video is used in subsequent processing.
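
The tilt correction is a rotation about the screen center by the negative of the measured tilt. The sketch below applies it to a single point in a y-up coordinate system; the function names are assumptions, and a real implementation would resample every pixel (and flip the sign convention if image rows run downward).

```python
import math


def rotate_about_center(x, y, cx, cy, angle_ccw_deg):
    """Rotate (x, y) about (cx, cy); positive angles are counterclockwise
    in a y-up coordinate system."""
    a = math.radians(angle_ccw_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))


def correct_tilt(x, y, cx, cy, measured_tilt_ccw_deg):
    """Undo a measured counterclockwise tilt by rotating the same amount clockwise."""
    return rotate_about_center(x, y, cx, cy, -measured_tilt_ccw_deg)
```

With a measured tilt of 30 degrees counterclockwise, `correct_tilt` rotates each point 30 degrees clockwise, which matches the example in the paragraph above.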

【００７３】また、ＴＶカメラ１３の視野内に対話者を
捕捉するために、ＴＶカメラ１３の向きを可変とする首
振り駆動手段５５と、ＴＶカメラ１３により撮像すべき
所要の映像をあらかじめテンプレートとして登録してお
きＴＶカメラ１３により撮像している映像をテンプレー
トと照合したときの類似度が大きくなるように首振り駆
動手段５５を制御するパターン処理手段５６とを設けて
ある。首振り駆動手段５５はＴＶカメラ１３の光軸の向
きを変化させるように設けられ、たとえばＴＶカメラ１
３をロボット１の頭部に配置しているとすれば首振り駆
動手段５５によりロボット１の頭部の位置ないし向きを
移動可能とすればよい。また、ロボット１はとくに移動
させずにロボット１の本体に対してＴＶカメラ１３のみ
が移動可能となるように首振り駆動手段５６を構成して
もよい。To capture the interlocutor within the field of view of the TV camera 13, the robot is provided with swing drive means 55 that makes the direction of the TV camera 13 variable, and pattern processing means 56 in which the video that the TV camera 13 should capture is registered in advance as a template and which controls the swing drive means 55 so as to increase the similarity between the video being captured by the TV camera 13 and the template. The swing drive means 55 is provided so as to change the direction of the optical axis of the TV camera 13; for example, if the TV camera 13 is placed in the head of the robot 1, the swing drive means 55 may move the position or orientation of the head of the robot 1. Alternatively, the swing drive means 55 may be configured so that only the TV camera 13 moves relative to the body of the robot 1, without the robot 1 itself moving.

【００７４】一方、パターン処理手段５６はＴＶカメラ
１３により撮像された映像をテンプレートとして登録さ
れている映像と照合し、得られている画像の範囲で類似
度が最大になる方向に首振り駆動手段５５を駆動してＴ
Ｖカメラ１３の光軸の向きを変更する。たとえば、テン
プレートとして図８（ａ）のような映像を登録している
ときには、図８（ｂ）や図８（ｄ）のような映像よりも
図８（ｃ）の映像のほうが類似度が大きいから、首振り
駆動手段５５では図８（ｃ）のような映像が得られるよ
うにＴＶカメラ１３の向きを変更する。このようにパタ
ーン処理手段５６に、対話者の適切なテンプレートを登
録しておけば、首振り駆動手段５５を適正に制御して対
話者をＴＶカメラ１３でつねに捕捉することが可能にな
る。なお、対話者は同じ姿勢を保つわけではないから、
形状や移動速度などに関する特徴抽出を行い、人か否か
を判断する知識をテンプレートと併せて用いることによ
って首振り駆動手段５５を制御するようにしてもよい。The pattern processing means 56 compares the video captured by the TV camera 13 with the video registered as the template, and drives the swing drive means 55 to turn the optical axis of the TV camera 13 toward the direction in which the similarity is greatest within the range of images obtained. For example, when a video like that in FIG. 8(a) is registered as the template, the video of FIG. 8(c) is more similar to it than the videos of FIG. 8(b) and FIG. 8(d), so the swing drive means 55 changes the direction of the TV camera 13 so that a video like that in FIG. 8(c) is obtained. If a suitable template of the interlocutor is registered in the pattern processing means 56 in this way, the swing drive means 55 can be controlled properly and the interlocutor can always be captured by the TV camera 13. Since the interlocutor does not hold the same posture, the swing drive means 55 may also be controlled by extracting features such as shape and speed of movement and using, together with the template, knowledge for judging whether the object is a person.
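
The selection of the camera direction by template similarity could be sketched as follows. The similarity measure used here (one minus the mean absolute pixel difference) is an assumption chosen for simplicity, as the description does not name a measure, and `frames_by_angle` is a hypothetical mapping from candidate pan angles to the frames seen at those angles.

```python
def similarity(frame, template):
    """Similarity in [0, 1]: 1 minus the mean absolute pixel difference.

    Both images are equal-sized lists of rows with values in 0..255.
    """
    total = 0
    count = 0
    for frame_row, template_row in zip(frame, template):
        for p, q in zip(frame_row, template_row):
            total += abs(p - q)
            count += 1
    return 1.0 - total / (255.0 * count)


def best_pan_angle(frames_by_angle, template):
    """Return the pan angle whose frame is most similar to the template."""
    return max(frames_by_angle,
               key=lambda angle: similarity(frames_by_angle[angle], template))
```

In terms of FIG. 8, the frames of FIG. 8(b), (c), and (d) would be the candidates, and the angle that produced the frame of FIG. 8(c) would be returned.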

【００７５】上述したパターン処理手段５６ではＴＶカ
メラ１３により撮像した映像内に対話者が存在しなけれ
ば、首振り駆動手段５５をどのように駆動すべきか判断
することができない。そこで、マイクロホン１１として
指向性の強いものを用いて音レベルによっても対話者の
存在する方向を知ることを可能としてある。つまり、本
実施形態ではマイクロホン１１が第２の音声入力装置と
して機能する。また、マイクロホン１１における指向性
が最大になる向きはＴＶカメラ１３の光軸の向きと平行
に設定され、首振り駆動手段５５はマイクロホン１１と
ＴＶカメラ１３との位置関係を固定したままでマイクロ
ホン１１およびＴＶカメラ１３の向きを変更するように
構成してある。マイクロホン１１から入力された音声の
音レベルは音レベル処理手段５７により検出され、パタ
ーン処理手段５６において類似度が所定値以下であると
きには、音レベル処理手段５７で検出した音レベルが大
きくなる向きに首振り駆動手段５５を駆動するように構
成されている。要するに、首振り駆動手段５５によって
マイクロホン１１の向きを変更している間に、図９
（ａ）（ｃ）に示すような音レベルの比較的小さい状態
から図９（ｂ）のように音レベルが比較的大きい状態に
変化したとすれば、図９（ｂ）の状態でマイクロホン１
１の指向性が最大になる向きに対話者が存在していると
みなすことができる。The pattern processing means 56 described above cannot determine how to drive the swing drive means 55 unless the interlocutor is present in the video captured by the TV camera 13. Therefore, a highly directional microphone is used as the microphone 11 so that the direction in which the interlocutor is present can also be found from the sound level. That is, in this embodiment the microphone 11 functions as the second voice input device. The direction of maximum directivity of the microphone 11 is set parallel to the optical axis of the TV camera 13, and the swing drive means 55 is configured to change the orientation of the microphone 11 and the TV camera 13 while keeping their positional relationship fixed. The sound level of the voice input from the microphone 11 is detected by sound level processing means 57, and when the similarity in the pattern processing means 56 is at or below a predetermined value, the swing drive means 55 is driven in the direction in which the sound level detected by the sound level processing means 57 increases. In short, if, while the swing drive means 55 is changing the orientation of the microphone 11, the sound level changes from a relatively low state as in FIG. 9(a) and FIG. 9(c) to a relatively high state as in FIG. 9(b), the interlocutor can be regarded as being in the direction of maximum directivity of the microphone 11 in the state of FIG. 9(b).

【００７６】たとえば、対話者の音声がマイクロホン１
１を通して入力されると、対話者の音声が入力されてい
る間に首振り駆動手段５５によってマイクロホン１１の
向きが変更され、マイクロホン１１に入力される音声の
音レベルがほぼ最大になる向きにマイクロホン１１が向
けられる。このとき、ＴＶカメラ１３の光軸も同じ向き
を向いており、マイクロホン１１に入力される音声の音
レベルが大きくなる向きは対話者が存在する向きである
可能性が高いから、結果的にＴＶカメラ１３の視野内に
対話者を捕捉できる可能性が高くなるのである。なお、
マイクロホン１１の指向性を利用して対話者を捕捉する
技術を用いる場合に、パターン処理手段５６を省略する
ことも可能である。また、対話者を捕捉するために用い
る指向性の強いマイクロホンは、対話者との対話に用い
るマイクロホン１１とは別に設けることも可能である。For example, when the interlocutor's voice is input through the microphone 11, the swing drive means 55 changes the orientation of the microphone 11 while the voice is being input, and the microphone 11 is turned to the direction in which the sound level of the input voice is nearly maximal. At this time the optical axis of the TV camera 13 points the same way, and the direction in which the sound level input to the microphone 11 increases is likely to be the direction in which the interlocutor is present, so the interlocutor is consequently more likely to be captured within the field of view of the TV camera 13. When the technique of capturing the interlocutor using the directivity of the microphone 11 is used, the pattern processing means 56 may be omitted. The highly directional microphone used to capture the interlocutor may also be provided separately from the microphone 11 used for dialogue with the interlocutor.
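
The fallback from template matching to sound level could be sketched as below. The function names and the dictionary of levels per pan angle are assumptions; the description specifies only that the sound level is used when the similarity is at or below a predetermined value.

```python
def loudest_direction(levels_by_angle):
    """Return the pan angle at which the measured sound level is highest."""
    return max(levels_by_angle, key=levels_by_angle.get)


def choose_direction(best_similarity, threshold, angle_from_template, levels_by_angle):
    """Use the template result when similarity exceeds the threshold,
    otherwise fall back to the direction of the loudest sound."""
    if best_similarity > threshold:
        return angle_from_template
    return loudest_direction(levels_by_angle)
```

Because the microphone and the camera keep a fixed positional relationship, turning to the returned angle points the optical axis in the same direction as the peak of the microphone's directivity.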

【００７７】さらに、図１０に示すように、ＴＶカメラ
１３の前方においてＴＶカメラ１３の光軸Ａｘに直交さ
せた形でハーフミラー５８を配置してもよい。このよう
にＴＶカメラ１３の前方にハーフミラー５８を配置して
おけば、ハーフミラー５８に対話者自身の姿を映した状
態とすることによって、ＴＶカメラ１３の視野内に対話
者が確実に入ることになり、ＴＶカメラ１３の視野内に
対話者が捕捉されていることを保証することができる。
要するに、対話者の姿はハーフミラー５８に反射されて
ハーフミラー５８に映り、ハーフミラー５８を透過した
光によってＴＶカメラ１３で対話者を撮像することがで
きるのである。Further, as shown in FIG. 10, a half mirror 58 may be placed in front of the TV camera 13, orthogonal to the optical axis Ax of the TV camera 13. With the half mirror 58 placed in front of the TV camera 13 in this way, the interlocutor, by standing where their own figure is reflected in the half mirror 58, reliably enters the field of view of the TV camera 13, which ensures that the interlocutor is captured within that field of view. In short, the interlocutor's figure is reflected by the half mirror 58 and appears in it, while the TV camera 13 images the interlocutor using the light transmitted through the half mirror 58.

【００７８】対話者がＴＶカメラ１３の視野内に存在す
ることを対話者自身で確認するために、映像処理手段２
２で扱う映像信号を通常のテレビ受像機に出力可能とす
る映像出力手段５９を付加してもよい。この場合には、
映像出力手段５９にテレビ受像機を接続しておけば、対
話者がＴＶカメラ１３により撮像されていることを対話
者自身がテレビ受像機の画面によって確認することがで
きる。また、対話者は自身の姿のみではなく、室内の様
子など周辺の様子もテレビ受像機の画面によってその場
で認識することが可能になる。つまり、テレビ受像機を
映像確認用のディスプレイ装置として用いることが可能
になる。また、映像出力手段５９は必ずしもテレビ受像
機の画面に映像の表示を可能とするものでなくてもよ
く、ＴＶカメラ１３で撮像している映像が表示可能なデ
ィスプレイ装置を備えるものであれば、パーソナルコン
ピュータのディスプレイ装置などに映像を出力するよう
にすることも可能である。ディスプレイ装置に出力され
た映像が対話者にとって不満である場合に備えて対話者
によって当該映像を消去可能とする機能をロボット１に
設けておくことが望ましい。この機能については音声で
ロボット１に命令できるようにしておくのが望ましい。So that the interlocutor can confirm for themselves that they are within the field of view of the TV camera 13, video output means 59 that allows the video signal handled by the video processing means 22 to be output to an ordinary television receiver may be added. In this case, if a television receiver is connected to the video output means 59, the interlocutor can confirm on the television screen that they are being imaged by the TV camera 13. The interlocutor can also see on the spot, on the television screen, not only their own figure but also the surroundings, such as the state of the room. That is, the television receiver can be used as a display device for checking the video. The video output means 59 need not output to a television screen; as long as there is a display device that can show the video being captured by the TV camera 13, the video may be output to, for example, the display device of a personal computer. In case the interlocutor is dissatisfied with the video output to the display device, it is desirable to provide the robot 1 with a function that allows the interlocutor to delete that video. It is desirable that this function can be invoked by a voice command to the robot 1.

【００７９】上述したように、ロボット１は犬型の外観
を呈するものであり、いわゆるペットロボットとして用
いることができる。そこで、対話者がロボット１を抱け
るように、ぬいぐるみのような外殻としかつ軽量に形成
しておくことによって、必要に応じて対話者にロボット
１を抱かせることが可能になる。対話者にロボット１を
抱かせれば、ロボット１が対話者に接触することによ
り、音声や映像だけではなく接触感覚によっても対話者
に関する情報を得ることが可能になる。そこで、本実施
形態では心拍音を検出可能とする心拍センサとしての心
拍用マイクロホン６０をロボット１の胸部ないし腹部に
配置してあり、対話者がロボット１を抱いたときに心拍
用マイクロホン６０により心拍音を集音するようにして
ある。As described above, the robot 1 has a dog-like appearance and can be used as a so-called pet robot. Therefore, by giving the robot 1 a stuffed-toy-like outer shell and making it lightweight so that the interlocutor can hold it, the interlocutor can be made to hold the robot 1 as needed. When the interlocutor holds the robot 1, the robot 1 is in contact with the interlocutor, so information about the interlocutor can be obtained not only from audio and video but also through contact. In the present embodiment, therefore, a heartbeat microphone 60 serving as a heartbeat sensor capable of detecting heartbeat sounds is arranged on the chest or abdomen of the robot 1, and the heartbeat microphone 60 collects the heartbeat sound when the interlocutor holds the robot 1.

【００８０】心拍用マイクロホン６０の出力は、たとえ
ば図１１のような波形になるから、心拍用マイクロホン
６０の出力についてサンプリングを行い、波形に関する
特徴抽出を行うことによって心拍数や心拍波形の異常の
有無などを検出することが可能になる。ただし、心拍用
マイクロホン６０は、必ずしも心拍音のみを検出するわ
けではなく、心拍用マイクロホン６０の出力には雑音成
分が多く含まれるから、適宜のフィルタリングが必要に
なる。The output of the heartbeat microphone 60 has a waveform such as that shown in FIG. 11, so by sampling the output and extracting features of the waveform, the heart rate, the presence or absence of abnormalities in the heartbeat waveform, and the like can be detected. However, the heartbeat microphone 60 does not pick up only the heartbeat sound, and its output contains many noise components, so appropriate filtering is necessary.
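The sampling, filtering, and feature extraction described for the heartbeat microphone 60 can be sketched as follows. This is only an illustrative sketch, not the method of the embodiment: the function names, the moving-average filter, the threshold ratio, and the refractory interval are all assumptions chosen for the example.

```python
def moving_average(samples, width):
    """Smooth a sequence with a centered moving average (a crude low-pass filter)."""
    half = width // 2
    smoothed = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def estimate_heart_rate(samples, sample_rate_hz, threshold_ratio=0.5, refractory_s=0.3):
    """Estimate beats per minute; return None if fewer than two beats are found.

    All parameters are hypothetical tuning values, not taken from the patent.
    """
    if not samples:
        return None
    # Rectify and smooth to get an amplitude envelope that suppresses fast noise.
    window = max(1, int(0.05 * sample_rate_hz))
    envelope = moving_average([abs(s) for s in samples], window)
    threshold = threshold_ratio * max(envelope)
    refractory = int(refractory_s * sample_rate_hz)  # minimum spacing between beats
    beats = []
    for i in range(1, len(envelope) - 1):
        is_peak = (envelope[i] >= threshold
                   and envelope[i] > envelope[i - 1]
                   and envelope[i] >= envelope[i + 1])
        # Accept a peak only if it is far enough from the previous beat.
        if is_peak and (not beats or i - beats[-1] >= refractory):
            beats.append(i)
    if len(beats) < 2:
        return None
    mean_interval_s = (beats[-1] - beats[0]) / (len(beats) - 1) / sample_rate_hz
    return 60.0 / mean_interval_s
```

The refractory interval keeps one heartbeat from being counted twice when the smoothed envelope has several local maxima; a real device would likely need a band-pass filter matched to the microphone.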

【００８１】心拍用マイクロホン６０の出力に基づいて
得られた情報は、たとえば心拍数を表す文字を映像内に
スーパーインポーズ表示して映像サーバ４に転送され
る。あるいはまた、心拍用マイクロホン６０で集音した
心拍音を圧縮し、映像データに添付して映像サーバ４に
転送したり記憶手段５３に記憶させるようにしてもよ
い。これによって、対話者の健康状態の目安を得ること
が可能になる。記憶手段５３に心拍音を記録させる場合
には記憶手段５３が心拍記録装置として兼用されること
になる。また、記憶手段５３とは別に心拍音を記録する
心拍記録装置を設けてもよい。なお、対話者がロボット
１を抱かなければ心拍音を集音することができないか
ら、ロボット１からの音声メッセージによってロボット
１を抱くように指示させるのが望ましい。このような音
声メッセージは所定時刻毎に発生させるのが望ましく、
たとえば起床時刻の直後において日々の心拍音を検出す
るようにすれば健康状態の管理に役立つことになる。Information obtained from the output of the heartbeat microphone 60 is transferred to the video server 4, for example by superimposing characters representing the heart rate on the video. Alternatively, the heartbeat sound collected by the heartbeat microphone 60 may be compressed and attached to the video data, and then transferred to the video server 4 or stored in the storage means 53. This provides an indication of the interlocutor's state of health. When the heartbeat sound is recorded in the storage means 53, the storage means 53 doubles as a heartbeat recording device. A heartbeat recording device that records the heartbeat sound may also be provided separately from the storage means 53. Since the heartbeat sound cannot be collected unless the interlocutor holds the robot 1, it is desirable for the robot 1 to issue a voice message asking the interlocutor to hold it. Such a voice message is desirably issued at predetermined times; for example, detecting the heartbeat sound each day immediately after the wake-up time helps in managing the interlocutor's health.

【００８２】（第４の実施の形態）第３の実施の形態に
おいては、対話者をＴＶカメラ１３の視野内に捕捉する
ためにマイクロホン１１として指向性を有するものを用
い、１個のマイクロホン１１で検出される音声の音レベ
ルが大きくなる向きに首振り駆動手段５５を制御してい
たが、本実施形態では図１２に示すように、２個のマイ
クロホン１１ａ，１１ｂを用い両マイクロホン１１ａ，
１１ｂで検出される位相差に基づいて首振り駆動手段５
５を制御するように構成してある。つまり、マイクロホ
ン１１ａ，１１ｂは第３の音声入力装置として機能す
る。２個のマイクロホン１１ａ，１１ｂはロボット１の
左右の耳に対応させて配置してあり、音源である対話者
と各マイクロホン１１ａ，１１ｂとの距離が等しいとき
に、図１３（ｂ）のように両マイクロホン１１ａ，１１
ｂの出力の位相差がゼロになるようにしてある。また、
両マイクロホン１１ａ，１１ｂの出力の位相差がゼロに
なる位置に対話者が存在するときにＴＶカメラ１３の視
野の中央に対話者が位置するように、両マイクロホン１
１ａ，１１ｂとＴＶカメラ１３との位置関係を設定して
ある。(Fourth Embodiment) In the third embodiment, a directional microphone is used as the microphone 11 in order to capture the interlocutor within the field of view of the TV camera 13, and the swinging drive means 55 is controlled in the direction that increases the sound level detected by the single microphone 11. In the present embodiment, as shown in FIG. 12, two microphones 11a and 11b are used, and the swinging drive means 55 is controlled based on the phase difference detected between the two microphones 11a and 11b. That is, the microphones 11a and 11b function as the third voice input device. The two microphones 11a and 11b are arranged at positions corresponding to the left and right ears of the robot 1, such that the phase difference between their outputs is zero, as shown in FIG. 13(b), when the interlocutor, who is the sound source, is equidistant from the two microphones. In addition, the positional relationship between the microphones 11a and 11b and the TV camera 13 is set so that the interlocutor is located at the center of the field of view of the TV camera 13 when the interlocutor is at a position where the phase difference between the outputs of the microphones 11a and 11b is zero.

【００８３】したがって、対話者がＴＶカメラ１３の視
野に対して左側に位置するときには、図１３（ａ）のよ
うに左側のマイクロホン１１ａの出力の位相が右側のマ
イクロホン１１ｂの出力の位相よりも進むことになり、
逆に対話者がＴＶカメラ１３の視野に対して右側に存在
するときには、図１３（ｃ）のように右側のマイクロホ
ン１１ａの出力の位相が左側のマイクロホン１１ｂの出
力の位相よりも進むことになる。そこで、両マイクロホ
ン１１ａ，１１ｂの出力の位相差を検出する位相差検出
手段６１を設け、位相差が小さくなる向きに首振り駆動
手段５５を制御することによって、対話者がＴＶカメラ
１３の視野の中央付近に位置するようにＴＶカメラ１３
の向きを調節することが可能になるのである。Therefore, when the interlocutor is to the left of the field of view of the TV camera 13, the phase of the output of the left microphone 11a leads that of the right microphone 11b, as shown in FIG. 13(a). Conversely, when the interlocutor is to the right of the field of view of the TV camera 13, the phase of the output of the right microphone 11b leads that of the left microphone 11a, as shown in FIG. 13(c). A phase difference detecting means 61 that detects the phase difference between the outputs of the microphones 11a and 11b is therefore provided, and by controlling the swinging drive means 55 in the direction that reduces the phase difference, the orientation of the TV camera 13 can be adjusted so that the interlocutor is located near the center of its field of view.
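One way to picture the phase-difference control described above is to estimate the arrival-time difference between the two microphone signals by cross-correlation and turn toward the microphone whose signal leads. The sketch below is an assumption-laden illustration: `best_lag`, `pan_command`, the lag search range, and the dead band are hypothetical names and values, and the patent does not state how the phase difference detecting means 61 computes the difference.

```python
def best_lag(left, right, max_lag):
    """Return the lag (in samples) of `right` relative to `left` that maximizes
    their cross-correlation. A positive lag means `right` is delayed, i.e. the
    left signal leads."""
    n = len(left)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def pan_command(left, right, max_lag=20, dead_band=0):
    """Decide which way to swing the camera so that the lag shrinks toward zero."""
    lag = best_lag(left, right, max_lag)
    if lag > dead_band:
        return "left"   # left microphone leads: the sound source is to the left
    if lag < -dead_band:
        return "right"  # right microphone leads: the sound source is to the right
    return "hold"       # no measurable difference: the source is roughly centered
```

Repeating this decision after each small swing drives the measured lag toward zero, which corresponds to the interlocutor being near the center of the field of view.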

【００８４】さらに、本実施形態では２個のマイクロホ
ン１１ａ，１１ｂの出力の位相差から音源が存在する向
きを求めているが、３個以上のマイクロホンの位相差を
用いると音源の存在位置をさらに正確に把握することが
可能になる。本実施形態の他の構成および動作は上述し
た他の実施の形態と同様である。Furthermore, although in the present embodiment the direction of the sound source is obtained from the phase difference between the outputs of the two microphones 11a and 11b, using the phase differences among three or more microphones makes it possible to determine the position of the sound source more accurately. The other configurations and operations of this embodiment are the same as those of the other embodiments described above.

【００８５】[0085]

【発明の効果】請求項１の発明は、音声入力装置と音声
出力装置とが接続され自然言語を用いた音声による対話
を対話者との間で行う対話処理手段と、対話者を撮像す
る映像入力装置が接続され映像入力装置で撮像された映
像をデジタル化した映像データを出力する映像処理手段
と、映像データをネットワークに送出する通信手段と、
ネットワークを通して対話者の映像データの閲覧が許可
されている携帯端末と、映像データを携帯端末に表示可
能なフォーマットに変換する変換手段とを備えるもので
あり、対話処理手段を用いて人との対話を模擬するから
対話者は孤独感に対する精神的なケアがなされる。ま
た、対話者の映像データの閲覧は許可されている携帯端
末のみで可能であるから、携帯端末を対話者の家族のみ
が携帯できるようにするように使用することによって、
他人のプライバシを侵害しない範囲で対話者の画像を携
帯端末に転送して対話者の異常の有無を看視可能とする
ことができる。その結果、高齢者や病人を抱える家庭に
おいても高齢者や病人の様子を携帯端末により遠方から
看視することができ、安心して外出することができるよ
うになる。The invention of claim 1 comprises: dialogue processing means, to which a voice input device and a voice output device are connected, for conducting a spoken natural-language dialogue with an interlocutor; video processing means, to which a video input device for imaging the interlocutor is connected, for outputting video data obtained by digitizing the video captured by the video input device; communication means for sending the video data to a network; a mobile terminal permitted to view the interlocutor's video data through the network; and conversion means for converting the video data into a format that can be displayed on the mobile terminal. Because a conversation with a person is simulated using the dialogue processing means, the interlocutor receives mental care for loneliness. Moreover, since the interlocutor's video data can be viewed only on a permitted mobile terminal, by ensuring that only the interlocutor's family carries such a terminal, images of the interlocutor can be transferred to the mobile terminal to the extent that privacy is not invaded, and the interlocutor can be watched for anything abnormal. As a result, even a household caring for an elderly or sick person can watch over that person from a distance using the mobile terminal, and family members can go out with peace of mind.

【００８６】請求項２の発明は、請求項１の発明におい
て、前記変換手段がネットワークを介して転送された映
像データを蓄積する映像サーバに設けられ、映像サーバ
では携帯端末の所持者に許容される映像データのみを選
別して携帯端末に転送するものであり、対話者の映像デ
ータを蓄積する映像サーバを設けたことによって、過去
に遡って映像データを閲覧することが可能になる。In the invention of claim 2, which builds on the invention of claim 1, the conversion means is provided in a video server that stores the video data transferred via the network, and the video server selects only the video data permitted to the holder of the mobile terminal and transfers it to the mobile terminal. Providing a video server that stores the interlocutor's video data makes it possible to view past video data retroactively.

【００８７】請求項３の発明は、請求項１の発明におい
て、前記変換手段が映像処理手段と通信手段との一方に
設けられているものであり、映像サーバを設けることな
く対話者の映像データを携帯端末から閲覧することがで
き、システム構成が簡単である。In the invention of claim 3, which builds on the invention of claim 1, the conversion means is provided in one of the video processing means and the communication means. The interlocutor's video data can therefore be viewed from the mobile terminal without providing a video server, and the system configuration is simple.

【００８８】請求項４の発明は、請求項１ないし請求項
３の発明において、前記映像処理手段が前記携帯端末か
らの指示により対話者の映像データを通信手段を通して
ネットワークに送出させるものであり、携帯端末から映
像データの送出を指示するから、携帯端末の所持者が要
求したときの映像を見ることができる。In the invention of claim 4, which builds on the invention of any of claims 1 to 3, the video processing means sends the interlocutor's video data to the network through the communication means in response to an instruction from the mobile terminal. Because the transmission of video data is instructed from the mobile terminal, the holder of the mobile terminal can see the video as it is at the time of the request.

【００８９】請求項５の発明は、請求項１ないし請求項
４の発明において、前記映像処理手段が前記対話処理手
段を通して対話者が指示した命令に応答して動作するも
のであり、対話者が自身の意思で映像を撮影させるか
ら、たとえば対話者が異常を感じたときに映像データを
送って携帯端末の所持者に知らせることが可能になる。In the invention of claim 5, which builds on the invention of any of claims 1 to 4, the video processing means operates in response to a command given by the interlocutor through the dialogue processing means. Because the interlocutor can have video captured of his or her own volition, the interlocutor can, for example, send video data to alert the holder of the mobile terminal when he or she feels that something is wrong.

【００９０】請求項６の発明は、請求項１ないし請求項
５の発明において、前記映像処理手段が対話者からの指
示により動作しているか前記携帯端末からの指示によっ
て動作しているかを被指示側に報知する報知手段を備え
るものであり、対話者からと携帯端末からとのどちらか
らも映像処理手段の制御を可能としながらも、両者の指
示が混乱することがない。In the invention of claim 6, which builds on the invention of any of claims 1 to 5, notification means is provided that informs the instructed side whether the video processing means is operating on an instruction from the interlocutor or on an instruction from the mobile terminal. The video processing means can thus be controlled both by the interlocutor and from the mobile terminal without the two sets of instructions becoming confused.

【００９１】請求項７の発明は、請求項１ないし請求項
６の発明において、対話者の接触を検出する接触センサ
が付加され、接触センサにより対話者の接触が検出され
ると前記対話処理手段が起動するものであり、対話者が
対話しようとするときに接触センサに接触することによ
って対話処理手段が起動されるから、周囲の雑音や対話
者以外の他人の音声によって対話処理手段が誤動作する
のを防止することができる。In the invention of claim 7, which builds on the invention of any of claims 1 to 6, a contact sensor that detects contact by the interlocutor is added, and the dialogue processing means is activated when the contact sensor detects such contact. Because the dialogue processing means is activated when the interlocutor touches the contact sensor in order to start a dialogue, the dialogue processing means can be prevented from malfunctioning due to ambient noise or the voices of people other than the interlocutor.

【００９２】請求項８の発明は、請求項１ないし請求項
７の発明において、前記映像入力手段によって対話者の
映像が撮影されていることを対話者に示す手段を備える
ものであり、対話者がいつ撮影されているかを知ること
ができるから、プライバシの都合で撮影されたくないと
きには撮影されない場所に移動するなどの対処が可能に
なる。The invention of claim 8, which builds on the invention of any of claims 1 to 7, comprises means for indicating to the interlocutor that video of the interlocutor is being captured by the video input device. Because the interlocutor can tell when he or she is being filmed, the interlocutor can respond, for example by moving to a place that is out of view, when he or she does not want to be filmed for privacy reasons.

【００９３】請求項９の発明は、請求項１ないし請求項
８の発明において、前記対話処理手段が、対話者による
音声入力を確認するように対話者に質問する機能を有
し、対話者による肯定の応答を受けると対話者による音
声入力を確定するものであり、復唱によって対話者の音
声入力を確認するから、対話者の発生した音声を確実に
理解することが可能になる。In the invention of claim 9, which builds on the invention of any of claims 1 to 8, the dialogue processing means has a function of asking the interlocutor a question to confirm the interlocutor's voice input, and finalizes the voice input upon receiving an affirmative response from the interlocutor. Because the voice input is confirmed by repeating it back, the speech uttered by the interlocutor can be understood reliably.
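The confirm-by-repetition behavior described for claim 9 can be sketched as a small function. This is an illustration under stated assumptions: `confirm_by_repetition`, the `ask` callback, the English question wording, and the set of affirmative replies are all hypothetical and are not taken from the patent.

```python
# Replies treated as affirmative; a hypothetical set chosen for the example.
AFFIRMATIVE = {"yes", "yeah", "that's right"}


def confirm_by_repetition(recognized_text, ask):
    """Repeat the recognized text back as a question and finalize it only on
    an affirmative reply.

    `ask` is a hypothetical callback that speaks a question and returns the
    recognized reply as a string. Returns the confirmed text, or None if the
    interlocutor did not confirm it.
    """
    reply = ask(f'Did you say "{recognized_text}"?')
    if reply.strip().lower() in AFFIRMATIVE:
        return recognized_text
    return None
```

A confirmed pair of utterance and recognized text is also the kind of labeled example that could serve as the teacher signal mentioned for claim 10.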

【００９４】請求項１０の発明は、請求項９の発明にお
いて、前記対話処理手段が、対話者による肯定の応答を
教師信号として対話者の発生する音声を学習するもので
あり、対話者との対話によって対話者の発生する音声を
学習することで認識率が高くなる。In the invention of claim 10, which builds on the invention of claim 9, the dialogue processing means learns the speech uttered by the interlocutor, using the interlocutor's affirmative response as a teacher signal. Learning the interlocutor's speech through dialogue with the interlocutor raises the recognition rate.

【００９５】請求項１１の発明は、請求項１ないし請求
項１０の発明において、前記携帯端末で閲覧した映像デ
ータを前記ネットワーク上の他の端末に転送する手段を
設けたものであり、映像データを他の端末に転送するこ
とにより、たとえば携帯端末の所持者が対話者の異常に
気づいたときに映像データを医者に送ることで対処のア
ドバイスを受けるなどの使用が可能になる。In the invention of claim 11, which builds on the invention of any of claims 1 to 10, means is provided for transferring the video data viewed on the mobile terminal to another terminal on the network. Transferring the video data to another terminal allows uses such as, for example, sending the video data to a doctor to obtain advice on how to respond when the holder of the mobile terminal notices something abnormal about the interlocutor.

【００９６】請求項１２の発明は、請求項１ないし請求
項１１の発明において、前記携帯端末から映像データの
閲覧が要求されたことを前記対話者に通知する通知手段
を備えるので、対話者である高齢者や病人を抱える家庭
においても家族が携帯端末を用いて外出先から高齢者や
病人の様子を看視したときに、高齢者や病人に対しては
通知手段によって家族が確認したことが通知されるか
ら、高齢者や病人にとっては家族が確認してくれたこと
を知ることができ安心感が得られる。The invention of claim 12, which builds on the invention of any of claims 1 to 11, comprises notification means for notifying the interlocutor that viewing of the video data has been requested from the mobile terminal. Thus, in a household caring for an elderly or sick person who is the interlocutor, when a family member uses the mobile terminal to check on that person while away from home, the notification means informs the elderly or sick person that the family member has checked, which gives that person a sense of security.

【００９７】請求項１３の発明は、請求項１２の発明に
おいて、前記通知手段が前記携帯端末の識別情報を用い
て映像データの閲覧者を通知する機能を備えるので、映
像データを誰が確認したかを知ることができるから、よ
り安心することができる。In the invention of claim 13, which builds on the invention of claim 12, the notification means has a function of reporting who viewed the video data by using the identification information of the mobile terminal. Because the interlocutor can know who checked the video data, the interlocutor can feel even more at ease.

【００９８】請求項１４の発明は、請求項１３の発明に
おいて、前記通知手段が映像データの閲覧者ごとに割り
当てた報知部を備えるので、文字データなどによらずに
報知部を視認すれば閲覧者を識別することができるか
ら、閲覧者が映像データを確認したことを知るだけでな
く誰が閲覧したかも直感的に知ることができる。In the invention of claim 14, which builds on the invention of claim 13, the notification means includes a notifying unit assigned to each viewer of the video data. Because the viewer can be identified simply by looking at the notifying unit, without relying on text data or the like, the interlocutor can tell intuitively not only that the video data was checked but also who viewed it.

【００９９】請求項１５の発明は、音声入力装置と音声
出力装置とが接続され自然言語を用いた音声による対話
を対話者との間で行う対話処理手段と、対話者を撮像す
る映像入力装置が接続され映像入力装置で撮像された映
像をデジタル化した映像データを出力する映像処理手段
と、映像データをネットワークに送出する通信手段と、
ネットワークを通して対話者の映像データの閲覧が許可
されている携帯端末に表示可能なフォーマットとなるよ
うに映像データを変換する変換手段とを備えるものであ
り、対話処理手段を用いて人との対話を模擬するから対
話者は孤独感に対する精神的なケアがなされる。また、
対話者の映像データの閲覧は許可されている携帯端末の
みで可能であるから、携帯端末を対話者の家族のみが携
帯できるようにするように使用することによって、他人
のプライバシを侵害しない範囲で対話者の画像を携帯端
末に転送して対話者の異常の有無を看視可能とすること
ができる。その結果、高齢者や病人を抱える家庭におい
ても高齢者や病人の様子を携帯端末により遠方から看視
することができ、安心して外出することができるように
なる。The invention of claim 15 comprises: dialogue processing means, to which a voice input device and a voice output device are connected, for conducting a spoken natural-language dialogue with an interlocutor; video processing means, to which a video input device for imaging the interlocutor is connected, for outputting video data obtained by digitizing the video captured by the video input device; communication means for sending the video data to a network; and conversion means for converting the video data into a format that can be displayed on a mobile terminal permitted to view the interlocutor's video data through the network. Because a conversation with a person is simulated using the dialogue processing means, the interlocutor receives mental care for loneliness. Moreover, since the interlocutor's video data can be viewed only on a permitted mobile terminal, by ensuring that only the interlocutor's family carries such a terminal, images of the interlocutor can be transferred to the mobile terminal to the extent that privacy is not invaded, and the interlocutor can be watched for anything abnormal. As a result, even a household caring for an elderly or sick person can watch over that person from a distance using the mobile terminal, and family members can go out with peace of mind.

【０１００】請求項１６の発明は、請求項１５の発明に
おいて、現在日時を計時する時計手段と、対話者が撮像
されたときに時計手段で計時されている日時を文字とし
て映像データに合成して通信手段から送出させるスーパ
インポーズ処理手段とが付加されたものであり、各映像
データに映像の撮像された日時が添付されているから、
映像が撮像された日時と映像データを携帯端末によって
閲覧する日時とに大きな時間差があるようなときに、閲
覧者は映像データに添付されている日時を確認すること
によって、最新の映像を求めるなどの対処が可能にな
る。In the invention of claim 16, which builds on the invention of claim 15, there are added clock means for keeping the current date and time, and superimpose processing means for compositing onto the video data, as characters, the date and time kept by the clock means when the interlocutor was imaged, and having the result sent from the communication means. Because each item of video data carries the date and time at which the video was captured, when there is a large gap between the time of capture and the time at which the video data is viewed on the mobile terminal, the viewer can notice this by checking the attached date and time and respond, for example by requesting the latest video.

【０１０１】請求項１７の発明は、請求項１６の発明に
おいて、前記通信手段を通してネットワークに送出され
る映像データおよび通信の成功・失敗の結果を前記時計
手段で計時されている日時に対応付けて順次記録して蓄
積するとともに無給電で記録内容を保持する記憶手段が
付加されているものであり、通信手段から映像データが
送出されず携帯端末によって映像データを確認すること
ができないような場合にも、記憶手段に蓄積された映像
データによって対話者の健康状態などを後日確認するこ
とができ、また通信の成功・失敗の結果を送信履歴とし
て記憶手段に格納しているから、メンテナンス時に通信
手段の異常の有無などを容易に知ることができる。In the invention of claim 17, which builds on the invention of claim 16, storage means is added that sequentially records and accumulates the video data sent to the network through the communication means and the success or failure of each communication, in association with the date and time kept by the clock means, and that retains the recorded contents without a power supply. Even when video data is not sent from the communication means and so cannot be checked on the mobile terminal, the interlocutor's state of health and the like can be checked at a later date from the video data accumulated in the storage means. In addition, because the success or failure of each communication is stored in the storage means as a transmission history, the presence or absence of an abnormality in the communication means can easily be determined during maintenance.

【０１０２】請求項１８の発明は、請求項１５ないし請
求項１７の発明において、前記映像入力装置の傾きを検
出する姿勢センサが付加され、前記映像処理手段では映
像入力装置により撮像された映像を正立させるように姿
勢センサにより検出された傾きに基づいて映像を補正す
るものであり、姿勢センサにより得られている情報に基
づいて映像の傾きを除去するように補正するから、映像
入力装置が傾いているような場合、たとえば水平ではな
い場所に撮影装置が配置されている場合や対話者が撮影
装置を持ち上げているような場合においても、撮影装置
が水平面上に配置されている状態と同様の映像を得るこ
とができる。In the invention of claim 18, which builds on the invention of any of claims 15 to 17, an attitude sensor that detects the tilt of the video input device is added, and the video processing means corrects the video, based on the tilt detected by the attitude sensor, so that the video captured by the video input device is upright. Because the video is corrected to remove its tilt based on the information from the attitude sensor, even when the video input device is tilted, for example when the photographing device is placed somewhere that is not level or when the interlocutor has picked it up, video equivalent to that obtained with the photographing device placed on a horizontal surface can be obtained.
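The tilt correction described for claim 18 amounts to rotating the captured image back by the angle reported by the attitude sensor. The following is a minimal sketch under stated assumptions: the image is a small list-of-rows grid, `correct_tilt` and its sign convention (a positive angle means the captured content appears rotated clockwise on screen) are hypothetical, and nearest-neighbour sampling is used purely for brevity.

```python
import math


def correct_tilt(image, tilt_deg, fill=0):
    """Return an upright copy of `image`, undoing a clockwise tilt of `tilt_deg`.

    `image` is a list of rows of pixel values. Each output pixel is filled from
    the captured pixel found by rotating its position about the image center
    (inverse mapping with nearest-neighbour sampling).
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(tilt_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Where this upright pixel ended up in the tilted capture
            # (x to the right, y downward, clockwise rotation).
            sx = round(cx + cos_a * dx - sin_a * dy)
            sy = round(cy + sin_a * dx + cos_a * dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```

Pixels whose source falls outside the captured frame are left at the `fill` value, so a practical implementation would probably crop or scale the corrected image.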

【０１０３】請求項１９の発明は、請求項１５ないし請
求項１８の発明において、前記映像入力装置の視野の方
向を変更可能とする首振り駆動手段と、前記映像入力装
置の中央付近に対話者を撮像するように規定したテンプ
レートが登録可能であって、映像入力装置で撮像された
映像とテンプレートとを照合して類似度が大きくなるよ
うに首振り駆動手段を制御するパターン処理手段とが付
加されているものであり、映像入力装置により撮像され
た映像とテンプレートとの類似度が大きくなるように映
像入力装置の視野の向きが制御されるから、適切なテン
プレートが設定されていれば対話者を映像入力装置の視
野の中央付近に捕捉するように映像入力装置の向きを調
節することが可能になる。In the invention of claim 19, which builds on the invention of any of claims 15 to 18, there are added swinging drive means that can change the direction of the field of view of the video input device, and pattern processing means in which a template defined so that the interlocutor is imaged near the center of the video input device's view can be registered, and which compares the video captured by the video input device with the template and controls the swinging drive means so that the similarity increases. Because the direction of the field of view is controlled so that the similarity between the captured video and the template increases, if an appropriate template is set, the orientation of the video input device can be adjusted so that the interlocutor is captured near the center of its field of view.
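The template comparison described for claim 19 can be illustrated with a simple similarity score and a one-step choice of swing direction. This sketch rests on assumptions: the similarity measure (one minus the normalized mean absolute difference), the names `similarity` and `choose_pan_step`, the `capture_at` callback, and the step size are all hypothetical, since the patent does not specify how the pattern processing means measures similarity.

```python
def similarity(frame, template):
    """Similarity in [0, 1] between two equal-sized grayscale images (values 0..255).

    Computed as 1 minus the mean absolute pixel difference divided by 255,
    so identical images score 1.0.
    """
    total, count = 0, 0
    for frame_row, template_row in zip(frame, template):
        for f, t in zip(frame_row, template_row):
            total += abs(f - t)
            count += 1
    return 1.0 - total / (255.0 * count)


def choose_pan_step(capture_at, current_angle, template, step_deg=5):
    """Compare the view at the current angle and one step to either side, and
    return the angle whose view is most similar to the template.

    `capture_at` is a hypothetical callback returning the frame seen at an angle.
    """
    candidates = [current_angle, current_angle - step_deg, current_angle + step_deg]
    return max(candidates, key=lambda angle: similarity(capture_at(angle), template))
```

Calling `choose_pan_step` repeatedly moves the camera one step at a time until no neighbouring direction matches the template better than the current one.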

【０１０４】請求項２０の発明は、請求項１５ないし請
求項１８の発明において、前記映像入力装置の視野の方
向を指向性を有する方向に一致させた第２の音声入力装
置と、前記映像入力装置の視野の方向と第２の音声入力
装置が指向性を有する方向とを同時に変更可能とする首
振り駆動手段と、前記第２の音声入力装置の出力の音レ
ベルが大きくなる向きに首振り駆動手段を制御する音レ
ベル処理手段とが付加されているものであり、音レベル
によって対話者の存在する方向を推定することができる
から、映像入力装置の視野を音レベルの大きくなる向き
に調節することによって、映像入力装置の視野内に対話
者を捕捉できる可能性が高くなる。In the invention of claim 20, which builds on the invention of any of claims 15 to 18, there are added a second voice input device whose direction of directivity coincides with the direction of the field of view of the video input device, swinging drive means that can change the direction of the field of view of the video input device and the direction of directivity of the second voice input device at the same time, and sound level processing means that controls the swinging drive means in the direction that increases the sound level of the output of the second voice input device. Because the direction in which the interlocutor is present can be estimated from the sound level, adjusting the field of view of the video input device toward the direction of increasing sound level raises the likelihood that the interlocutor is captured within the field of view.
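Controlling the swing "in the direction that increases the sound level" is essentially a hill-climbing search over the swing angle. The sketch below assumes a hypothetical `measure_level` callback that returns the sound level of the directional microphone at a given angle; the function names, step size, and angle limits are illustrative assumptions rather than details from the patent.

```python
import math


def rms_level(samples):
    """Root-mean-square level of a block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def track_loudest_direction(measure_level, angle, step_deg=10,
                            min_deg=-90, max_deg=90, max_steps=36):
    """Swing step by step toward the direction in which the sound level rises.

    `measure_level(angle)` is a hypothetical callback that points the
    microphone at `angle` and returns the measured level there. The search
    stops when neither neighbouring angle is louder than the current one.
    """
    level = measure_level(angle)
    for _ in range(max_steps):
        moved = False
        for candidate in (angle - step_deg, angle + step_deg):
            if min_deg <= candidate <= max_deg:
                candidate_level = measure_level(candidate)
                if candidate_level > level:
                    angle, level, moved = candidate, candidate_level, True
                    break
        if not moved:
            break
    return angle
```

Because the search only follows local increases, it can settle on a loud noise source instead of the interlocutor, which is consistent with the text's cautious wording that the approach raises the likelihood of capture rather than guaranteeing it.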

【０１０５】請求項２１の発明は、請求項１５ないし請
求項１８の発明において、前記映像入力装置の視野の方
向に存在する音源に対して出力の位相差がゼロになるよ
うに配置した複数個の第３の音声入力装置と、第３の音
声入力装置の位相差を小さくする向きに首振り駆動手段
を制御する位相差検出手段とを付加したものであり、複
数個の音声入力装置の位相差を小さくするように映像入
力装置の視野を調節することによって、映像入力装置の
視野内に対話者を捕捉することが可能になる。In the invention of claim 21, which builds on the invention of any of claims 15 to 18, there are added a plurality of third voice input devices arranged so that the phase difference between their outputs is zero for a sound source located in the direction of the field of view of the video input device, and phase difference detecting means that controls the swinging drive means in the direction that reduces the phase difference between the third voice input devices. By adjusting the field of view of the video input device so as to reduce the phase difference between the plurality of voice input devices, the interlocutor can be captured within the field of view.

【０１０６】請求項２２の発明は、請求項１５ないし請
求項２１の発明において、前記映像入力装置の前方に前
記映像入力装置の光軸方向に直交させた形で配置したハ
ーフミラーが付加されているものであり、対話者がハー
フミラーに自身の姿を映した状態では映像入力装置の正
面に対話者が存在することになるから、対話者の意思に
よって対話者を映像入力装置で確実に撮像することが可
能になる。In the invention of claim 22, which builds on the invention of any of claims 15 to 21, a half mirror is added in front of the video input device, arranged orthogonal to the optical axis of the video input device. When the interlocutor sees his or her own reflection in the half mirror, the interlocutor is directly in front of the video input device, so the interlocutor can deliberately ensure that he or she is imaged by the video input device.

【０１０７】請求項２３の発明は、請求項１５ないし請
求項２１の発明において、外部のディスプレイ装置が接
続可能であって前記映像入力装置により撮像された映像
データをディスプレイ装置に表示可能とする映像出力手
段が付加されているものであり、対話者は、映像入力装
置により撮像されている映像を映像出力手段に接続した
ディスプレイ装置によってその場で確認できるから、対
話者は自身の映り具合を確認することができるだけでな
く周囲の様子も確認して適切な映像が撮像されているか
否かを確認することができる。In the invention of claim 23, which builds on the invention of any of claims 15 to 21, video output means is added to which an external display device can be connected and which allows the video data captured by the video input device to be shown on the display device. Because the interlocutor can check the video being captured by the video input device on the spot, on the display device connected to the video output means, the interlocutor can check not only how he or she appears but also the surroundings, and so confirm whether appropriate video is being captured.

【０１０８】請求項２４の発明は、請求項１５ないし請
求項２３の発明において、前記対話者が抱えることを可
能とする外殻を有し、前記対話者が抱えたときに対話者
の心拍音を検出可能な心拍センサと、心拍センサにより
検出された心拍音を記録する心拍記録装置とが付加され
ているものであり、映像によって対話者の健康状態を知
るだけではなく、心拍音によっても対話者の健康状態を
確認することが可能になり、対話者に関して得られる情
報を複合することによって対話者の健康状態をより正確
に知ることが可能になる。しかも、心拍センサとして対
話者が抱えたときに心拍音を集音する構成を採用してい
るから、手間をかけることなく心拍音の集音が可能にな
る。The invention of claim 24, which builds on the invention of any of claims 15 to 23, has an outer shell that the interlocutor can hold, and has added to it a heartbeat sensor capable of detecting the interlocutor's heartbeat sound when held by the interlocutor and a heartbeat recording device that records the heartbeat sound detected by the heartbeat sensor. The interlocutor's state of health can thus be checked not only from the video but also from the heartbeat sound, and combining the information obtained about the interlocutor makes it possible to know the interlocutor's state of health more accurately. Moreover, because the heartbeat sensor collects the heartbeat sound when the interlocutor holds the device, the heartbeat sound can be collected without extra effort.

【０１０９】請求項２５の発明は、請求項１５ないし請
求項２４の発明において、前記携帯端末から映像データ
の閲覧が要求されたことを前記対話者に通知する通知手
段を備えるので、話者である高齢者や病人を抱える家庭
においても家族が携帯端末を用いて外出先から高齢者や
病人の様子を看視したときに、高齢者や病人に対しては
通知手段によって家族が確認したことが通知されるか
ら、高齢者や病人にとっては家族が確認してくれたこと
を知ることができ安心感が得られる。The invention of claim 25, which builds on the invention of any of claims 15 to 24, comprises notification means for notifying the interlocutor that viewing of the video data has been requested from the mobile terminal. Thus, in a household caring for an elderly or sick person who is the interlocutor, when a family member uses the mobile terminal to check on that person while away from home, the notification means informs the elderly or sick person that the family member has checked, which gives that person a sense of security.

【０１１０】請求項２６の発明は、請求項２５の発明に
おいて、前記通知手段が前記携帯端末の識別情報を用い
て映像データの閲覧者を通知する機能を備えるので、映
像データを誰が確認したかを知ることができるから、よ
り安心することができる。In the invention of claim 26, which builds on the invention of claim 25, the notification means has a function of reporting who viewed the video data by using the identification information of the mobile terminal. Because the interlocutor can know who checked the video data, the interlocutor can feel even more at ease.

【０１１１】請求項２７の発明は、請求項２６の発明に
おいて、前記通知手段が映像データの閲覧者ごとに割り
当てた報知部を備えるので、文字データなどによらずに
報知部を視認すれば閲覧者を識別することができるか
ら、閲覧者が映像データを確認したことを知るだけでな
く誰が閲覧したかも直感的に知ることができる。In the invention of claim 27, which builds on the invention of claim 26, the notification means includes a notifying unit assigned to each viewer of the video data. Because the viewer can be identified simply by looking at the notifying unit, without relying on text data or the like, the interlocutor can tell intuitively not only that the video data was checked but also who viewed it.

[Brief description of the drawings]

【図１】本発明の第１の実施の形態を示すブロック図で
ある。FIG. 1 is a block diagram showing a first embodiment of the present invention.

【図２】同上に用いる対話処理手段を示すブロック図で
ある。FIG. 2 is a block diagram showing a dialog processing means used in the embodiment.

【図３】同上の対話処理手段の動作例を示す動作説明図
である。FIG. 3 is an operation explanatory diagram showing an operation example of the dialogue processing means of the above.

【図４】本発明の第２の実施の形態を示すブロック図で
ある。FIG. 4 is a block diagram showing a second embodiment of the present invention.

【図５】同上に用いるロボットの外観を示す斜視図であ
る。FIG. 5 is a perspective view showing the appearance of the robot used in the above.

【図６】本発明の第３の実施の形態を示すブロック図で
ある。FIG. 6 is a block diagram showing a third embodiment of the present invention.

【図７】同上の動作説明図である。FIG. 7 is an operation explanatory diagram of the above.

【図８】同上の動作説明図である。FIG. 8 is an operation explanatory view of the above.

【図９】同上の動作説明図である。FIG. 9 is an operation explanatory view of the above.

【図１０】同上の動作説明図である。FIG. 10 is an operation explanatory view of the above.

【図１１】同上の要部構成図である。FIG. 11 is a configuration diagram of a main part of the above.

【図１２】本発明の第４の実施の形態を示すブロック図
である。FIG. 12 is a block diagram showing a fourth embodiment of the present invention.

【図１３】同上の動作説明図である。FIG. 13 is an explanatory diagram of the operation of the above.

[Explanation of symbols]

1 ロボット (Robot)
2 携帯端末 (Mobile terminal)
3 通信回線 (Communication line)
4 映像サーバ (Video server)
11 マイクロホン (Microphone)
11a, 11b マイクロホン (Microphones)
12 スピーカ (Speaker)
13 ＴＶカメラ (TV camera)
14 接触センサ (Contact sensor)
20 信号処理部 (Signal processing unit)
21 対話処理手段 (Dialogue processing means)
22 映像処理手段 (Video processing means)
23 通信手段 (Communication means)
27 表示手段 (Display means)
28 通知手段 (Notification means)
51 時計手段 (Clock means)
52 スーパインポーズ処理手段 (Superimpose processing means)
53 記憶手段 (Storage means)
54 姿勢センサ (Attitude sensor)
55 首振り駆動手段 (Swinging drive means)
56 パターン処理手段 (Pattern processing means)
57 音レベル処理手段 (Sound level processing means)
58 ハーフミラー (Half mirror)
59 映像出力手段 (Video output means)
60 心拍用マイクロホン (Heartbeat microphone)
61 位相差検出手段 (Phase difference detecting means)

Continuation of front page
(51) Int.Cl.7, identification symbol, FI, theme code (reference): G10L 15/22 G10L 3/00 551H 5D045 H04N 7/15 571U 5K101 7/18 R
(72) Inventor: Takashi Nishiyama (西山高史), c/o Matsushita Electric Works, Ltd., 1048 Oaza Kadoma, Kadoma-shi, Osaka
F-terms (reference):
5C054 AA02 CC02 CD03 CE04 CG01 CH01 CH05 DA01 DA07 EA01 EA03 EA05 EA07 EH01 FA04 FC12 FF02 GA04 GB04 GD03 HA00 HA04
5C064 AA06 AB04 AC02 AC06 AC12 AC16 AC22 AD08
5C086 AA22 BA01 CA09 CA28 CB26 CB36 DA08 DA14 DA33 EA45 FA06 FA18
5C087 AA02 AA08 AA10 AA24 AA25 AA37 AA44 BB12 BB20 BB46 BB65 BB74 DD03 DD24 EE05 FF01 FF02 FF04 FF16 FF19 FF23 GG02 GG20 GG67
5D015 KK01 KK02 KK04 LL00
5D045 AB11
5K101 KK11 KK13 KK19 LL12 MM07 NN06 NN08 NN13 NN18 NN21 RR12

Claims

[Claims]

1. A communication support system comprising: dialogue processing means, to which an audio input device and an audio output device are connected, for conducting a voice dialogue with an interlocutor using natural language; video processing means, to which a video input device for imaging the interlocutor is connected, for outputting video data obtained by digitizing the video captured by the video input device; communication means for sending the video data to a network; a portable terminal permitted to view the video data of the interlocutor through the network; and conversion means for converting the video data into a format that can be displayed on the portable terminal.

2. The communication support system according to claim 1, wherein the conversion means is provided in a video server that stores the video data transferred via the network, and the video server selects only the video data permitted to the holder of the portable terminal and transfers the selected video data to the portable terminal.

3. The communication support system according to claim 1, wherein the conversion means is provided in either the video processing means or the communication means.

4. The communication support system according to claim 1, wherein the video processing means sends the video data of the interlocutor to the network through the communication means in accordance with an instruction from the portable terminal.

5. The communication support system according to claim 1, wherein the video processing means operates in response to a command given by the interlocutor through the dialogue processing means.

6. The communication support system according to claim 5, further comprising notifying means for notifying the instructed side whether the video processing means is operating according to an instruction from the interlocutor or according to an instruction from the portable terminal.

7. The communication support system according to any one of claims 1 to 6, further comprising a contact sensor for detecting contact by the interlocutor, wherein the dialogue processing means is activated when the contact sensor detects contact by the interlocutor.

8. The communication support system according to claim 1, further comprising means for indicating to the interlocutor that video of the interlocutor has been captured by the video input device.

9. The communication support system according to any one of claims 1 to 8, wherein the dialogue processing means has a function of asking the interlocutor to confirm the voice input made by the interlocutor, and accepts the voice input as confirmed upon receiving a positive response from the interlocutor.

10. The communication support system according to claim 9, wherein the dialogue processing means learns the voice uttered by the interlocutor, using the positive response from the interlocutor as a teacher signal.

11. The communication support system according to claim 1, further comprising means for transferring the video data viewed on the portable terminal to another terminal on the network.

12. The communication support system according to claim 1, further comprising notifying means for notifying the interlocutor that a request to view the video data has been received from the portable terminal.

13. The communication support system according to claim 12, wherein the notifying means has a function of announcing the viewer of the video data using identification information of the portable terminal.

14. The communication support system according to claim 13, wherein the notifying means includes a notifying unit assigned to each viewer of the video data.

15. A photographing apparatus comprising: dialogue processing means, to which an audio input device and an audio output device are connected, for conducting a voice dialogue with an interlocutor using natural language; video processing means, to which a video input device for imaging the interlocutor is connected, for outputting video data obtained by digitizing the video captured by the video input device; communication means for sending the video data to a network; and conversion means for converting the video data into a format that can be displayed on a portable terminal permitted to view the video data of the interlocutor through the network.

16. The photographing apparatus according to claim 15, further comprising: clock means for measuring the current date and time; and superimposing processing means for compositing the date and time measured by the clock means at the moment the interlocutor is imaged into the video data as text, and for having the communication means send the resulting video data.

17. The photographing apparatus according to claim 16, further comprising storage means that sequentially records the video data sent to the network through the communication means and the success or failure of each communication, in association with the date and time measured by the clock means, and that retains the recorded contents without a power supply.

18. The photographing apparatus according to claim 15, further comprising a posture sensor for detecting a tilt of the video input device, wherein the video processing means corrects the video captured by the video input device, based on the tilt detected by the posture sensor, so that the video is upright.

19. The photographing apparatus according to claim 15, further comprising: swing drive means capable of changing the direction of the field of view of the video input device; and pattern processing means that holds a template defining an image in which the interlocutor appears near the center of the field of view of the video input device, compares the video captured by the video input device with the template, and controls the swing drive means so as to increase the degree of similarity.

20. The photographing apparatus according to any one of claims 15 to 18, further comprising: a second audio input device that is directional, the direction of its directivity coinciding with the direction of the field of view of the video input device; swing drive means capable of changing the direction of the field of view of the video input device together with the direction of directivity of the second audio input device; and sound level processing means for controlling the swing drive means so as to increase the sound level of the output of the second audio input device.

21. The photographing apparatus according to claim 15, further comprising: a plurality of third audio input devices arranged so that the phase difference between their outputs is zero for a sound source located in the direction of the field of view of the video input device; and phase difference detecting means for controlling the swing drive means in the direction that reduces the phase difference.

22. The photographing apparatus according to claim 15, further comprising a half mirror disposed in front of the video input device so as to be orthogonal to the optical axis direction of the video input device.

23. The photographing apparatus according to claim 15, further comprising video output means that is connectable to an external display device and enables the video data captured by the video input device to be displayed on the display device.

24. The photographing apparatus according to any one of claims 15 to 23, having an outer shell that can be held by the interlocutor, and further comprising: a heartbeat sensor capable of detecting the heartbeat sound of the interlocutor while the interlocutor is holding the apparatus; and heartbeat recording means for recording the heart rate based on the heartbeat sound detected by the heartbeat sensor.

25. The photographing apparatus according to claim 15, further comprising notifying means for notifying the interlocutor that a request to view the video data has been received from the portable terminal.

26. The photographing apparatus according to claim 25, wherein the notifying means has a function of announcing the viewer of the video data using identification information of the portable terminal.

27. The photographing apparatus according to claim 26, wherein the notifying means includes a notifying unit assigned to each viewer of the video data.
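
As an illustration only, and not part of the claims, the sound level processing recited in claim 20 can be sketched in Python as a simple hill-climbing loop: the swing drive is stepped toward whichever neighbouring direction gives a higher output level from the directional audio input device. The function names, the 5-degree step, the ±90-degree travel limits, and the cosine-shaped microphone response are all assumptions introduced for this sketch; the source specifies none of them.

```python
import math
from typing import Callable


def track_sound_source(
    read_level: Callable[[float], float],
    start_angle: float = 0.0,
    step: float = 5.0,
    min_angle: float = -90.0,
    max_angle: float = 90.0,
    max_iterations: int = 100,
) -> float:
    """Hill-climb the pan angle (degrees) toward the loudest direction.

    read_level(angle) is a hypothetical callback that points the swing
    drive at `angle` and returns the sound level of the directional
    audio input device at that angle.
    """
    angle = start_angle
    level = read_level(angle)
    for _ in range(max_iterations):
        # Probe one step to each side, staying inside the travel limits.
        candidates = [
            a for a in (angle - step, angle + step)
            if min_angle <= a <= max_angle
        ]
        if not candidates:
            break
        probes = {a: read_level(a) for a in candidates}
        best = max(probes, key=probes.get)
        if probes[best] <= level:
            break  # neither neighbour is louder, so stop here
        angle, level = best, probes[best]
    return angle


def simulated_level(source_angle: float) -> Callable[[float], float]:
    """Toy directional response (assumed): loudest when facing the source."""
    def read(angle: float) -> float:
        return math.cos(math.radians(angle - source_angle))
    return read


if __name__ == "__main__":
    # With the simulated source at 30 degrees, the loop settles on 30.0.
    print(track_sound_source(simulated_level(30.0)))
```

A fixed step keeps the sketch short; a real device would presumably also need to smooth the measured level over time so that the drive does not chase momentary noise, which is outside what the claims describe.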