JP2010128237A

JP2010128237A - Speech interactive system

Info

Publication number: JP2010128237A
Application number: JP2008303596A
Authority: JP
Inventors: Takao Hayashi; 孝郎林
Original assignee: Individual
Current assignee: Individual
Priority date: 2008-11-28
Filing date: 2008-11-28
Publication date: 2010-06-10

Abstract

PROBLEM TO BE SOLVED: When a speech interactive system is built into an interactive object, the object becomes too large to carry and may be seriously damaged if it is dropped or submerged in water.

SOLUTION: The system comprises an interactive object 11 and a server 13 connected to the interactive object 11 by wire or wirelessly. A speech recognition board 55 and an interactive processing section 71 are arranged in the server 13. Where a predetermined device such as a game machine, or a predetermined operation body, is to be operated, a moving section is provided separately from the interactive object 11 and operates the device by wire or wirelessly.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a voice interaction system and is effective when applied to, for example, a mobile phone, a personal computer, a camera, or a game machine.

In recent years, industry, government, and academia have worked together to develop and commercialize voice interaction devices.

However, a conventional voice interaction device is often built integrally into the object being spoken to (the interactee), as in a voice interaction robot, so whenever it is carried around there is a risk that it will fail after being dropped or submerged in water. Such devices are also very expensive, so repairing one that has failed in such an accident costs a great deal.

Although conventional voice interaction devices are becoming smaller, incorporating one into a mobile phone, for example, makes the phone very large and inconvenient in practice. As a result, voice interaction devices have been developed but in some cases never actually incorporated into an interactee. It has likewise been difficult to reduce the size of a robot that incorporates a voice interaction device.

Moreover, merely combining a conventional voice interaction device with an interactee yields nothing more than an interactee that can hold a voice conversation. It cannot provide a voice interaction system with higher added value and richer functions, nor can it realize a more advanced user interface.

In view of the above, a first object of the present invention is to provide a voice interaction system that suffers less damage when an accident occurs.

A second object of the present invention is to provide a voice interaction system that can be made compact.

A third object of the present invention is to provide a highly functional voice interaction system with high added value.

A fourth object of the present invention is to provide an interactee capable of voice interaction, or a voice interaction system, that can realize an advanced user interface.

To achieve the above objects, the invention according to claim 1 comprises:
an interactee having voice conversion means that converts a person's voice into a voice signal and sound generation means that converts a predetermined sound generation signal into vibration to produce sound; and
a server computer provided separately from the interactee and connected to the interactee by wire or wirelessly,
wherein the server computer has voice recognition means that processes the voice signal converted by the voice conversion means to recognize the person's voice, and dialogue control means that determines the voice response corresponding to the recognized voice and outputs the predetermined sound generation signal.

With this configuration, the interactee, which has the voice conversion means and the sound generation means, is connected by wire or wirelessly to the separately provided server computer, which has the voice recognition means and the dialogue control means, so that a person can hold a voice conversation with the interactee.

Because the voice recognition means and the dialogue control means are provided in the server computer, these expensive components do not fail even if the interactee is dropped or submerged in a puddle. Furthermore, when the interactee and the server computer are connected wirelessly, the interactee can be moved without being limited by the length of a cable, as it would be with a wired connection.
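
The division of roles in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names `ServerComputer` and `Interactee`, the reply table, and the use of UTF-8 text as a stand-in for real audio signals are all assumptions made for illustration, and the "link" is an ordinary function call rather than a wired or wireless connection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServerComputer:
    """Holds the costly parts: voice recognition and dialogue control."""
    replies: Dict[str, str]  # hypothetical reply table

    def recognize(self, voice_signal: bytes) -> str:
        # Stand-in for voice recognition: the "signal" is UTF-8 text here.
        return voice_signal.decode("utf-8").strip().lower()

    def dialogue_control(self, recognized: str) -> bytes:
        # Choose the response and return it as a sound generation signal
        # (again plain bytes in this sketch).
        reply = self.replies.get(recognized, "sorry, i did not catch that")
        return reply.encode("utf-8")

    def handle(self, voice_signal: bytes) -> bytes:
        return self.dialogue_control(self.recognize(voice_signal))


@dataclass
class Interactee:
    """Carries only the voice conversion and sound generation means."""
    link: Callable[[bytes], bytes]  # stands in for the wired/wireless link
    spoken: List[str] = field(default_factory=list)

    def hear(self, utterance: str) -> None:
        voice_signal = utterance.encode("utf-8")          # voice conversion
        sound_signal = self.link(voice_signal)            # round trip to server
        self.spoken.append(sound_signal.decode("utf-8"))  # sound generation


server = ServerComputer(replies={"hello": "hello, nice to meet you"})
toy = Interactee(link=server.handle)
toy.hear("Hello")
print(toy.spoken[-1])
```

In this arrangement the interactee holds no recognition or dialogue logic, so replacing a damaged interactee would leave the server-side components untouched, which is the point the paragraph above makes.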

The invention according to claim 2 comprises:
an interactee having sound generation means that converts a predetermined sound generation signal into vibration to produce sound;
a server computer provided separately from the interactee and connected to the interactee by wire or wirelessly; and
voice conversion means that is provided separately from the interactee and the server computer, is connected to either of them by wire or wirelessly, and converts a person's voice into a voice signal,
wherein the server computer has voice recognition means that processes the voice signal converted by the voice conversion means to recognize the person's voice, and dialogue control means that determines the voice response corresponding to the recognized voice and outputs the predetermined sound generation signal.

With this configuration, the voice conversion means is provided separately from the interactee and the server computer, so a person can input voice to the voice conversion means without approaching the interactee. In addition, because the voice conversion device is not carried around, the voice conversion means does not fail even if the interactee is dropped or submerged in a puddle.

The invention according to claim 3 comprises:
an interactee having voice conversion means that converts a person's voice into a voice signal and sound generation means that converts a predetermined sound generation signal into vibration to produce sound; and
a server computer provided separately from the interactee and connected to the interactee by wire or wirelessly,
wherein, of word recognition means that processes the voice signal converted by the voice conversion means to recognize a person's words and dialogue control means that determines the words corresponding to the recognized words and outputs the predetermined sound generation signal, one is provided in the interactee and the other is provided in the server computer.

With this configuration, the interactee carrying the voice recognition means and the server computer carrying the dialogue control means are connected by wire or wirelessly, so that a person can hold a voice conversation with the interactee.

Because the dialogue control means is provided in the server computer, this expensive component does not fail even if the interactee is dropped or submerged in a puddle. Furthermore, when the interactee and the server computer are connected wirelessly, the interactee can be moved without being limited by the length of a cable, as it would be with a wired connection. In this way, the effects described above are obtained even when one of the voice recognition means and the dialogue control means is mounted on the interactee and the other on the server computer.
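
One way to picture the split of claim 3 is the variant in which recognition runs on the interactee and dialogue control runs on the server computer, so that recognized words rather than a raw voice signal cross the link. The following Python sketch assumes that variant. The names `RecognizingInteractee` and `make_dialogue_control`, the reply table, and the use of UTF-8 text in place of audio are hypothetical and not part of the patent.

```python
from typing import Callable, Dict, List


def make_dialogue_control(replies: Dict[str, str]) -> Callable[[str], str]:
    """Server side: map recognized words to the reply to be sounded."""
    def dialogue_control(words: str) -> str:
        return replies.get(words, "please say that again")
    return dialogue_control


class RecognizingInteractee:
    """Interactee that recognizes words locally and asks the server for a reply."""

    def __init__(self, ask_server: Callable[[str], str]) -> None:
        self.ask_server = ask_server          # stands in for the link
        self.sent_over_link: List[str] = []   # what actually crossed the link

    def recognize(self, voice_signal: bytes) -> str:
        # Stand-in for recognition; the "signal" is UTF-8 text.
        return voice_signal.decode("utf-8").strip().lower()

    def converse(self, voice_signal: bytes) -> str:
        words = self.recognize(voice_signal)
        self.sent_over_link.append(words)
        return self.ask_server(words)


server_side = make_dialogue_control({"good morning": "good morning to you"})
robot = RecognizingInteractee(ask_server=server_side)
reply = robot.converse(b"  Good Morning ")
print(reply)
```

The opposite placement, with recognition on the server and dialogue control on the interactee, would swap which function is called across the link.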

The invention according to claim 4 comprises:
an interactee having sound generation means that converts a predetermined sound generation signal into vibration to produce sound;
a server computer provided separately from the interactee and connected to the interactee by wire or wirelessly; and
voice conversion means that is provided separately from the interactee and the server computer, is connected to either of them by wire or wirelessly, and converts a person's voice into a voice signal,
wherein, of voice recognition means that processes the voice signal converted by the voice conversion means to recognize the person's voice and dialogue control means that determines the voice response corresponding to the recognized voice and outputs the predetermined sound generation signal, one is provided in the interactee and the other is provided in the server computer.

With this configuration, the voice conversion means is provided separately from the interactee and the server computer, so a person can input voice to the voice conversion means without approaching the interactee. In addition, because the voice conversion device is not carried around, the voice conversion means does not fail even if the interactee is dropped or submerged in a puddle.

Note that, according to any one of claims 1 to 4, the interactee can be made smaller and lighter, and therefore easier to carry, than when the voice conversion means, the sound generation means, the voice recognition means, and the dialogue control means are all mounted on the interactee.

The invention according to claim 5 is the voice interaction system according to any one of claims 1 to 4, wherein
a sound information storage unit capable of storing predetermined sound information is further mounted on either the interactee or the server computer,
the predetermined sound information is stored in the sound information storage unit, and
the predetermined sound information is read from the sound information storage unit and output from the sound generation means in any of the following cases: when a person requests the predetermined sound information via the voice conversion means, when a person permits the predetermined sound information via the voice conversion means, or when the interactee produces sound of its own accord using the predetermined sound information.

With this configuration, a person can do more than simply hold a voice conversation with the interactee. The result is a highly functional voice interaction system in which the predetermined sound information can be obtained when a person requests it, when a person permits it via the voice conversion means, or when the interactee produces sound of its own accord using it. It also provides an advanced user interface in which, when a person requests the predetermined sound information or the interactee produces sound of its own accord, the sound information is read out and output from the sound generation means. Furthermore, when the sound information storage unit is mounted on the server computer, the sound information stored in it is not damaged even if the interactee is dropped or submerged in a puddle.
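
The three conditions under which the stored sound (pronunciation) information is read out can be sketched as follows. This is an illustrative Python fragment only: the `Trigger` enumeration, the function `sound_if_triggered`, and the dictionary used as the storage unit are hypothetical names, and a list stands in for the sound generation means.

```python
from enum import Enum, auto
from typing import Dict, List, Optional


class Trigger(Enum):
    REQUESTED = auto()       # the person asked for the sound information
    PERMITTED = auto()       # the person agreed when the interactee offered it
    SELF_INITIATED = auto()  # the interactee decides to sound it on its own


def sound_if_triggered(
    storage: Dict[str, bytes],
    name: str,
    trigger: Optional[Trigger],
    speaker: List[bytes],
) -> bool:
    """Read `name` from storage and send it to the speaker when a trigger applies."""
    if trigger is None or name not in storage:
        return False
    speaker.append(storage[name])  # stand-in for the sound generation means
    return True


storage = {"lullaby": b"\x01\x02\x03"}
speaker: List[bytes] = []
played = sound_if_triggered(storage, "lullaby", Trigger.REQUESTED, speaker)
skipped = sound_if_triggered(storage, "lullaby", None, speaker)
print(played, skipped, len(speaker))
```

Whether `storage` lives on the interactee or on the server computer does not change this logic. Only the location of the dictionary would differ.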

The invention according to claim 6 is the invention of claim 5, wherein the sound information storage unit is configured to be connectable to the Internet, and
the sound information can be downloaded from a predetermined storage location on the Internet.

With this configuration, a highly functional voice interaction system that can download predetermined sound information from the Internet can be provided. Moreover, because the predetermined sound information can be downloaded from the Internet, it can be restored immediately even if the copy stored in the sound information storage unit is damaged.
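
The restore-by-download behaviour described above might look like the following Python sketch. The patent does not say how damage is detected. The SHA-256 digest check used here is an assumption, as are the class name `SoundInfoStorage`, the `corrupt` helper, and the in-memory `REMOTE` dictionary that stands in for the storage location on the Internet (no network access is performed).

```python
import hashlib
from typing import Callable, Dict


class SoundInfoStorage:
    """Local store that re-downloads an entry when its stored copy is damaged."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                # hypothetical download function
        self._data: Dict[str, bytes] = {}
        self._digest: Dict[str, str] = {}
        self.downloads = 0

    def download(self, name: str) -> None:
        payload = self._fetch(name)
        self._data[name] = payload
        self._digest[name] = hashlib.sha256(payload).hexdigest()
        self.downloads += 1

    def corrupt(self, name: str) -> None:
        """Simulate damage to the stored copy (demonstration only)."""
        self._data[name] = b"damaged"

    def read(self, name: str) -> bytes:
        payload = self._data.get(name)
        intact = (
            payload is not None
            and hashlib.sha256(payload).hexdigest() == self._digest.get(name)
        )
        if not intact:
            self.download(name)  # restore from the remote location
        return self._data[name]


REMOTE = {"chime": b"ding-dong"}  # stand-in for the Internet storage location
storage = SoundInfoStorage(fetch=REMOTE.__getitem__)
first = storage.read("chime")
storage.corrupt("chime")
restored = storage.read("chime")
print(restored == first, storage.downloads)
```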

As in the invention according to claim 7, in any one of claims 1 to 6 the interactee may comprise:
one or more movable parts;
a motor that moves each of the one or more movable parts;
a drive unit that drives each motor; and
a controller that outputs to the drive unit a command signal directing the operation of the movable parts.

With this configuration, the controller outputs to the drive unit a command signal directing the operation of the movable parts, and the motors are driven on the basis of this command signal, so that the movable parts can be moved. As described above, the voice interaction system may be one in which the interactee has the movable parts, motors, drive unit, and controller.

The invention according to claim 8 is the invention of any one of claims 1 to 6, wherein the interactee comprises:
one or more movable parts;
a motor that moves each of the one or more movable parts; and
a drive unit that drives each motor,
and the server computer has a controller that outputs an operation command signal to the drive unit.

With this configuration, the controller provided in the server computer outputs a command signal directing the operation of the movable parts to the drive unit provided in the interactee, and the motors are driven on the basis of this command signal, so that the movable parts can be moved.

Because the controller is provided in the server computer, this expensive component does not fail even if the interactee is dropped or submerged in a puddle.
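
The command path of claim 8, from a controller on the server computer to a drive unit and motors on the interactee, can be sketched in Python as below. The `Command` structure, the part names, and the angle-based motor model are hypothetical. The patent does not specify the form of the command signal, and the link is again an ordinary function call.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Command:
    """Hypothetical command signal: which movable part, and where to move it."""
    part: str
    target_angle: float


@dataclass
class Motor:
    angle: float = 0.0


class DriveUnit:
    """On the interactee: drives the motor of the part named in a command."""

    def __init__(self, motors: Dict[str, Motor]) -> None:
        self.motors = motors

    def receive(self, command: Command) -> None:
        self.motors[command.part].angle = command.target_angle


class Controller:
    """On the server computer: decides movements and emits command signals."""

    def __init__(self, send: Callable[[Command], None]) -> None:
        self.send = send  # stands in for the wired/wireless link

    def nod(self) -> None:
        # Tilt the head forward, then return it to the rest position.
        for angle in (20.0, 0.0):
            self.send(Command("head", angle))


drive_unit = DriveUnit({"head": Motor(), "right_arm": Motor()})
controller = Controller(send=drive_unit.receive)
controller.send(Command("right_arm", 45.0))
controller.nod()
print(drive_unit.motors["right_arm"].angle, drive_unit.motors["head"].angle)
```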

The invention according to claim 9 is the invention of any one of claims 1 to 6, wherein the interactee comprises:
one or more movable parts; and
a motor that moves each of the one or more movable parts,
and the server computer has a drive unit that drives each motor and a controller that outputs an operation command signal to the drive unit.

With this configuration, the controller provided in the server computer outputs a command signal directing the operation of the movable parts to the drive unit, and the motors provided in the interactee are driven on the basis of this command signal, so that the movable parts can be moved.

Because the drive unit and the controller are provided in the server computer, these expensive components do not fail even if the interactee is dropped or submerged in a puddle.

The invention according to claim 10 is the invention of any one of claims 1 to 6, further comprising a movable unit that is provided separately from the interactee and the server computer and is connected to at least one of them by wire or wirelessly so as to move,
wherein the movable unit comprises:
one or more movable parts;
a motor that moves each of the one or more movable parts;
a drive unit that drives each motor; and
a controller that outputs to the drive unit a command signal directing the operation of the movable parts.

With this configuration, the movable unit is provided separately from the interactee and the server computer and can move while connected to the interactee by wire or wirelessly.

Because the movable parts, motors, drive unit, and controller are provided separately from the interactee, these expensive components do not fail even if the interactee is dropped or submerged in a puddle.

The invention according to claim 11 is the invention of any one of claims 1 to 6, further comprising a movable unit that is provided separately from the interactee and the server computer and is connected to at least one of them by wire or wirelessly,
wherein the movable unit comprises:
one or more movable parts;
a motor that drives each of the one or more movable parts; and
a drive unit that drives each motor,
and either the interactee or the server computer has a controller that outputs an operation command signal to the drive unit.

With this configuration, the controller provided in either the interactee or the server computer outputs a command signal directing the operation of the movable parts to the drive unit, and the motors provided in the movable unit are driven on the basis of this command signal, so that the movable parts can be moved.

Because the movable parts, motors, and drive unit are provided separately from the interactee, these expensive components do not fail even if the interactee is dropped or submerged in a puddle.

The invention according to claim 12 is the invention of any one of claims 1 to 6, further comprising a movable unit that is provided separately from the interactee and is connected to at least one of the interactee and the server computer by wire or wirelessly,
wherein the movable unit comprises:
one or more movable parts; and
a motor that moves each of the one or more movable parts,
a drive unit that drives each motor is provided in either the interactee or the server computer, and
a controller that outputs an operation command signal to the drive unit is provided in either the interactee or the server computer.

With this configuration, at least the movable parts and motors are provided in the movable unit, so these expensive components do not fail even if the interactee is dropped or submerged in a puddle.

Note that, according to any one of claims 8 to 12, the interactee can be made smaller and lighter, and therefore easier to carry, than when the movable parts, motors, drive unit, and controller are all provided in the interactee.
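
Claims 7 to 12 differ only in where the movable parts, motors, drive unit, and controller are located. The Python sketch below records those placements as a lookup table and lists, for a given claim, the components that are never on the interactee and so are not exposed when the interactee alone is dropped. The table contents follow the claims as worded above, while the names `PLACEMENT` and `always_off_interactee` are hypothetical.

```python
from typing import Dict, FrozenSet, List

INTERACTEE = "interactee"
SERVER = "server computer"
UNIT = "movable unit"


def at(*hosts: str) -> FrozenSet[str]:
    """Allowed locations for a component under a given claim."""
    return frozenset(hosts)


# claim number -> component -> locations the claim allows
PLACEMENT: Dict[int, Dict[str, FrozenSet[str]]] = {
    7: {"movable part": at(INTERACTEE), "motor": at(INTERACTEE),
        "drive unit": at(INTERACTEE), "controller": at(INTERACTEE)},
    8: {"movable part": at(INTERACTEE), "motor": at(INTERACTEE),
        "drive unit": at(INTERACTEE), "controller": at(SERVER)},
    9: {"movable part": at(INTERACTEE), "motor": at(INTERACTEE),
        "drive unit": at(SERVER), "controller": at(SERVER)},
    10: {"movable part": at(UNIT), "motor": at(UNIT),
         "drive unit": at(UNIT), "controller": at(UNIT)},
    11: {"movable part": at(UNIT), "motor": at(UNIT),
         "drive unit": at(UNIT), "controller": at(INTERACTEE, SERVER)},
    12: {"movable part": at(UNIT), "motor": at(UNIT),
         "drive unit": at(INTERACTEE, SERVER),
         "controller": at(INTERACTEE, SERVER)},
}


def always_off_interactee(claim: int) -> List[str]:
    """Components that the claim never places on the interactee."""
    return sorted(
        name for name, hosts in PLACEMENT[claim].items()
        if INTERACTEE not in hosts
    )


for claim in sorted(PLACEMENT):
    print(claim, always_off_interactee(claim))
```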

As in the invention according to claim 13, in any one of claims 10 to 12 the interactee and the movable unit may be configured so that they can be attached to each other.

With this configuration, the movable unit can be attached to the interactee, so two configurations can be used as needed: one in which the interactee is separate from the movable unit, and one in which the interactee is integrated with it.

The invention according to claim 14 is the voice interaction system according to any one of claims 1 to 9, wherein
image display means that displays a predetermined image is further provided either integrally with or separately from the interactee,
an image information storage unit in which predetermined image information is stored in advance is mounted on either the interactee or the server computer, and
the predetermined image information is read from the image information storage unit and displayed on the image display means in any of the following cases: when a person requests the predetermined image information via the voice conversion means, when a person permits the predetermined image information via the voice conversion means, or when the interactee displays the predetermined image of its own accord using the predetermined image information.

With this configuration, when the image display means is provided separately from the interactee, the image display means is not damaged even if the interactee is dropped or submerged in a puddle. Likewise, when the image information storage unit is mounted on the server computer, the image information in it is not damaged even if the interactee is dropped or submerged in a puddle. The image information storage unit may instead be mounted on the interactee, and the image display means may be provided integrally with the interactee.

The invention according to claim 15 is the voice interaction system according to any one of claims 10 to 13, wherein
image display means that displays a predetermined image is further provided on either the interactee or the movable unit and is connected to at least one of the interactee, the server computer, and the movable unit by wire or wirelessly,
an image information storage unit in which predetermined image information is stored in advance is mounted on any of the interactee, the server computer, and the movable unit, and
the predetermined image information is read from the image information storage unit and displayed on the image display means in any of the following cases: when a person requests the predetermined image information via the voice conversion means, when a person permits the predetermined image information via the voice conversion means, or when the interactee displays the predetermined image of its own accord using the predetermined image information.

With this configuration, when the image display means is mounted on either the server computer or the movable unit, the expensive image display means is not damaged even if the interactee is dropped or submerged in a puddle. Likewise, when the image information storage unit is mounted on either the server computer or the movable unit, the image information in it is not damaged even if the interactee is dropped or submerged in a puddle. The image display means may instead be provided on the interactee, and the image information storage unit may also be provided on the interactee.

The invention according to claim 16 is the voice interaction system according to any one of claims 10 to 13, wherein
image display means that displays a predetermined image is further provided separately from both the interactee and the movable unit and is connected to at least one of the interactee, the server computer, and the movable unit by wire or wirelessly,
an image information storage unit in which predetermined image information is stored in advance is mounted on any of the interactee, the server computer, and the movable unit, and
the predetermined image information is read from the image information storage unit and displayed on the image display means in any of the following cases: when a person requests the predetermined image information via the voice conversion means, when a person permits the predetermined image information via the voice conversion means, or when the interactee displays the predetermined image of its own accord using the predetermined image information.

With this configuration, the image display means that displays the predetermined image is provided separately from both the interactee and the movable unit, so the image information in the image information storage unit is not damaged even if the interactee is dropped or submerged in a puddle.

Note that, according to any one of claims 14, 15, and 16, a highly functional voice interaction system can be provided in which the predetermined image information can be obtained when a person requests it via the voice conversion means, when a person permits it via the voice conversion means, or when the interactee displays the predetermined image of its own accord using it. In each of those cases, the predetermined image information is obtained from the image information storage unit and displayed on the image display means, which realizes an advanced user interface.

The invention according to claim 17 is characterized in that, in any one of claims 14 to 16, the image information storage unit is configured to be connectable to the Internet, and
the image information can be downloaded from a predetermined storage location on the Internet.

According to this, a highly functional voice dialogue system that can download predetermined image information from the Internet can be provided. Moreover, because the predetermined image information can be downloaded from the Internet, it can be restored immediately even if the copy stored in the image information storage unit is damaged.
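The recovery behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the `ImageInfoStore` class, the `fetch` callable that stands in for the download from the predetermined storage location, and the SHA-256 damage check are all hypothetical; the description does not specify a protocol, a checksum, or an API.

```python
import hashlib
from typing import Callable, Dict


class ImageInfoStore:
    """Toy stand-in for the image information storage unit (hypothetical API)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # assumed download function
        self._data: Dict[str, bytes] = {}   # locally stored image information
        self._digest: Dict[str, str] = {}   # checksum recorded at download time

    def download(self, name: str) -> None:
        """Fetch the image information and remember its checksum."""
        blob = self._fetch(name)
        self._data[name] = blob
        self._digest[name] = hashlib.sha256(blob).hexdigest()

    def is_damaged(self, name: str) -> bool:
        """True when the local copy is missing or no longer matches its checksum."""
        blob = self._data.get(name)
        if blob is None:
            return True
        return hashlib.sha256(blob).hexdigest() != self._digest.get(name)

    def read(self, name: str) -> bytes:
        """Return the image information, restoring it by download if needed."""
        if self.is_damaged(name):
            self.download(name)
        return self._data[name]
```

In this sketch the display side only ever calls `read`, so a damaged local copy is replaced transparently before it is shown.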

As in the invention according to claim 18, in any one of claims 1 to 9 and claim 14, imaging means capable of imaging a predetermined object including a person may be configured either integrally with or separately from the interactee, and
image recognition means for recognizing the predetermined object from the imaging data captured by the imaging means may be mounted on either the interactee or the server computer.

According to this, a predetermined object can be recognized from the imaging data captured by the imaging means. As stated above, the imaging means may be configured either integrally with or separately from the interactee. In addition, a highly functional voice dialogue system equipped with imaging means can be provided.

The invention according to claim 19 is characterized in that, in any one of claims 10 to 13, claim 15, and claim 16, imaging means capable of imaging a predetermined object including a person is provided on either the interactee or the movable unit and is connected, by wire or wirelessly, to at least one of the interactee, the server computer, and the movable unit, and
image recognition means for recognizing the predetermined object from the imaging data captured by the imaging means is mounted on at least one of the interactee, the server computer, and the movable unit.

According to this, a predetermined object including a person can be recognized from the imaging data captured by the imaging means. When the imaging means is provided on the movable unit, the expensive imaging means is not damaged even if the interactee is dropped or submerged in a puddle. Likewise, when the image recognition means is mounted on the server computer or the movable unit, the expensive image recognition means is not damaged. Note that the imaging means may instead be provided on the interactee, and the image recognition means may be mounted on the interactee.

The invention according to claim 20 is characterized in that, in any one of claims 10 to 13, claim 15, and claim 16, imaging means capable of imaging a predetermined object including a person is provided separately from both the interactee and the movable unit and is connected, by wire or wirelessly, to at least one of the interactee, the server computer, and the movable unit, and
image recognition means for recognizing the predetermined object from the imaging data captured by the imaging means is mounted on at least one of the interactee, the server computer, and the movable unit.

According to this, a predetermined object including a person can be recognized from the imaging data captured by the imaging means. As stated above, the imaging means may be configured either integrally with or separately from the interactee. In addition, a highly functional voice dialogue system equipped with imaging means can be provided.

The invention according to claim 21 is characterized in that, in any one of claims 7 to 20, the controller outputs a command signal to the drive unit so that the movable part performs a predetermined communication motion in at least one of the following cases: when conversing with a person and when giving a predetermined explanation.

According to this, compared with a voice dialogue system that has no movable part, an advanced voice dialogue system can be provided that performs communication motions and speaks with a sense of presence. Likewise, compared with a voice dialogue system that has no movable part, an advanced user interface that performs communication motions and speaks with a sense of presence can be realized.

The invention according to claim 22 is characterized in that, in any one of claims 10 to 13 and claim 15, the movable part is arranged at a position from which it operates a predetermined device, and
the controller outputs a command signal to the drive unit so as to operate the predetermined device in the following cases: when the person's voice is a command to operate the predetermined device, when the person's voice is permission to operate the predetermined device, when the predetermined device is operated through predetermined operation input means, and when an automatic execution program that operates the predetermined device is executed.

According to this, an advanced voice dialogue system can be provided in which the controller outputs a command signal to the drive unit and the movable part operates a predetermined device.

The invention according to claim 23 is characterized in that, in any one of claims 18 to 20, the imaging means images a predetermined object including a person, and the controller outputs a command signal to the drive unit so as to perform a predetermined communication motion toward the person, based on the result of the image recognition means recognizing the predetermined object.

According to this, an advanced voice dialogue system can be provided that images a predetermined object including a person, performs a communication motion based on the result of the image recognition means recognizing the predetermined object, and speaks with a sense of presence. An advanced user interface with the same behaviour can also be realized.

The invention according to claim 24 is characterized in that, in any one of claims 18 to 20, the imaging means images a predetermined object including a person, and, based on the result of the image recognition means recognizing the predetermined object, at least one item is selected from a plurality of pronunciation data and is spoken to the person via the pronunciation means.

According to this, an advanced voice dialogue system can be provided that recognizes a predetermined object including a person by means of the imaging means and the image recognition means and conducts a voice dialogue with the person. An advanced user interface that recognizes a predetermined object including a person by means of the image recognition means and conducts a voice dialogue with the person can also be realized.
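The selection step of claim 24 can be illustrated with a small lookup. This is a minimal sketch, not the method of the description: the table contents, the label strings, the fallback sentence, and the rule of cycling through candidates by dialogue turn are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical table: recognized object label -> candidate pronunciation data.
PRONUNCIATION_DATA: Dict[str, List[str]] = {
    "person": ["Hello.", "How are you today?"],
    "cup": ["That is a cup."],
}
FALLBACK = "I am not sure what I am looking at."


def select_pronunciation(recognized_label: str, turn: int = 0) -> str:
    """Pick one utterance for the recognized object, cycling by dialogue turn."""
    candidates = PRONUNCIATION_DATA.get(recognized_label)
    if not candidates:
        return FALLBACK  # nothing stored for this recognition result
    return candidates[turn % len(candidates)]
```

The returned string would then be passed to whatever produces the pronunciation signal for the speaker.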

The invention according to claim 25 is characterized in that, in any one of claims 18 to 20, the imaging means images the operation means of a predetermined device, and
when the person's voice is a command to operate the predetermined device, when the person's voice is permission to operate the predetermined device, when the predetermined device is operated through predetermined operation input means, or when an automatic execution program that operates the predetermined device is executed, the controller outputs a command signal to the drive unit so that, based on the result of the image recognition means recognizing the position of the operation means, the movable part and the interactee move to the operating position of the operation means and operate the predetermined device.

According to this, an advanced voice dialogue system can be provided in which the movable part and the interactee move to the operating position of the operation means and operate a predetermined device. In addition, an advanced user interface can be realized in which the movable part and the interactee move to the operating position of the operation means and operate the predetermined device when the person's voice is a command to operate the predetermined device, when the person's voice is permission to operate the predetermined device, or when the predetermined device is operated through predetermined operation input means.

The invention according to claim 26 is characterized in that, in any one of claims 18 to 20, the imaging means is configured to image the progress of a table game and the image recognition means is configured to recognize the progress of the table game from the image,
action determination means is provided that determines the next action of the movable part from the progress recognized by the image recognition means, and
the controller outputs a command signal to the drive unit so that the movable part executes the next action determined by the action determination means.

According to this, an advanced voice dialogue system can be provided in which the imaging means and the image recognition means image and recognize the progress of a table game, the action determination means determines the next action of the movable part, and the movable part executes the next action so determined.
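To make the role of the action determination means concrete, the sketch below decides a next move from a recognized game state. The description does not name a game or a strategy; tic-tac-toe, the nine-cell list representation, and the simple "win, else block, else first free cell" rule are assumptions chosen only to keep the example short.

```python
from typing import List, Optional

# All eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def completing_cell(board: List[str], mark: str) -> Optional[int]:
    """Return the empty cell that completes a line for `mark`, if any."""
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(mark) == 2 and marks.count(" ") == 1:
            return line[marks.index(" ")]
    return None


def decide_next_action(board: List[str], me: str = "O",
                       opponent: str = "X") -> Optional[int]:
    """Choose the cell to play next: win, else block, else first free cell."""
    for mark in (me, opponent):
        cell = completing_cell(board, mark)
        if cell is not None:
            return cell
    free = [i for i, c in enumerate(board) if c == " "]
    return free[0] if free else None
```

Here `board` plays the part of the progress recognized by the image recognition means, and the returned cell index is what a controller would translate into a command signal for the movable part.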

The invention according to claim 27 is characterized in that, in any one of claims 18 to 20, the controller outputs a command signal to the drive unit to move the movable part and search for a predetermined object including a person.

According to this, an advanced voice dialogue system that searches for a predetermined object can be provided.

The invention according to claim 28 is characterized in that, in any one of claims 18 to 20, a tracking program by which the imaging means tracks the predetermined object recognized by the image recognition means is installed on either the interactee or the server computer, and
the controller outputs a command signal to the drive unit and moves the movable part so that the imaging means tracks the predetermined object including a person.

According to this, because the movable part can be moved so that the imaging means tracks a predetermined object including a person, an advanced voice dialogue system can be provided that continues to track and recognize the object even when it moves.
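One simple way to realize such tracking is proportional control on the horizontal offset of the recognized object from the image centre. The sketch below is an assumption, not the tracking program of the description: the function name, the gain, the dead band, the step limit, and the sign convention are all hypothetical.

```python
def pan_command(target_x: float, image_width: int,
                gain: float = 0.1, deadband_px: float = 10.0,
                max_step_deg: float = 5.0) -> float:
    """Return a signed turning step in degrees that recentres the target.

    Positive values mean "turn toward larger x" (an assumed sign convention).
    """
    error_px = target_x - image_width / 2.0
    if abs(error_px) <= deadband_px:
        return 0.0                      # target is close enough to the centre
    step = gain * error_px              # proportional control
    # Clamp so that a single command never exceeds the assumed step limit.
    return max(-max_step_deg, min(max_step_deg, step))
```

Called once per captured frame, the result would be sent as the command signal that turns the part carrying the imaging means.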

The invention according to claim 29 is characterized in that, in the voice dialogue system of any one of claims 1 to 28, actuation signal output means is further mounted on at least one of the interactee and the server computer, the actuation signal output means outputting an actuation signal to the actuating means of an actuated body that has actuating means operated by the actuation signal, and
the actuating means and the actuation signal output means are connected either wirelessly or by wire.

According to this, an advanced voice dialogue system can be provided that operates the actuating means of the actuated body directly, by the actuation signal output from the actuation signal output means, without using the movable unit.

The invention according to claim 30 is characterized in that, in any one of claims 1 to 30, the interactee is a mobile phone.

According to this, an advanced voice dialogue system in which a person and a mobile phone conduct a voice dialogue can be provided. In addition, the operations and effects described after each of the means of claims 1 to 30 above are obtained with "interactee" replaced by "mobile phone".

The invention according to claim 31 is characterized in that, in any one of claims 1 to 30, the interactee is a computer.

According to this, an advanced voice dialogue system in which a person and a computer conduct a voice dialogue can be provided. In addition, the operations and effects described after each of the means of claims 1 to 30 above are obtained with "interactee" replaced by "computer".

The invention according to claim 32 is characterized in that, in any one of claims 1 to 30, the interactee is a game machine.

According to this, an advanced voice dialogue system in which a person and a game machine conduct a voice dialogue can be provided. In addition, the operations and effects described after each of the means of claims 1 to 30 above are obtained with "interactee" replaced by "game machine".

The invention according to claim 33 is characterized in that, in any one of claims 1 to 30, the interactee is a camera.

According to this, an advanced voice dialogue system in which a person and a camera conduct a voice dialogue can be provided. In addition, the operations and effects described after each of the means of claims 1 to 30 above are obtained with "interactee" replaced by "camera".

The invention according to claim 34 is characterized in that, in any one of claims 1 to 30, the interactee is a robot body.

According to this, an advanced voice dialogue system in which a person and a robot body conduct a voice dialogue can be provided. In addition, the operations and effects described after each of the means of claims 1 to 30 above are obtained with "interactee" replaced by "robot body".

The invention according to claim 35 is characterized in that, in any one of claims 1 to 30, the interactee is any one of a doll, a stuffed toy, and a toy.

According to this, an advanced voice dialogue system in which a person conducts a voice dialogue with any one of a doll, a stuffed toy, and a toy can be provided. In addition, the operations and effects described after each of the means of claims 1 to 30 above are obtained with "interactee" replaced by "doll", "stuffed toy", or "toy". Furthermore, because the interactee is a doll, a stuffed toy, or a toy, the person readily feels at ease with it.

(First embodiment)
First, the terms used in the following description are explained. A person's voice is sound uttered by a person. Pronunciation is sound emitted by the voice dialogue system toward a person.

A specific description follows. FIG. 1 is an external view of the voice dialogue system, FIG. 2 is a block diagram of the voice dialogue system, and FIG. 3 is a front sectional view of the movable unit. As shown in FIG. 1, the voice dialogue system 100 of the first embodiment includes an interactee 11, a server 13, and a movable unit 15. The server 13 constitutes the server computer of the present invention.

The interactee 11 is intended to be the conversation partner as seen from the person. As shown in FIG. 2, it includes a drive unit 17, an upper arm motor 19, a lower arm motor 21, a hand motor 23, a traveling motor 25, a turning motor 27, an upper arm part 29, a lower arm part 31, a hand 33, a traveling part 35, a turning part 37, a microphone 39, a voice output board 41, a speaker 43, a CCD camera 45, command signal reception/demodulation means 47, voice signal modulation/transmission means 49, pronunciation signal reception/demodulation means 51, imaging signal modulation/transmission means 53, a small-scale control device (not shown), and a power supply (not shown).

The drive unit 17 drives the upper arm motor 19, the lower arm motor 21, the hand motor 23, the traveling motor 25, and the turning motor 27 exactly as instructed by the command signals from the controller 59, which is described later.

The upper arm motor 19, the lower arm motor 21, the hand motor 23, the traveling motor 25, and the turning motor 27 move the upper arm part 29, the lower arm part 31, the hand 33, the traveling part 35, and the turning part 37, respectively.

The upper arm part 29, the lower arm part 31, the hand 33, the traveling part 35, and the turning part 37 are attached to the drive shafts (not shown) of the upper arm motor 19, the lower arm motor 21, the hand motor 23, the traveling motor 25, and the turning motor 27, respectively, and move when those drive shafts are driven. These parts can perform predetermined communication motions (gestures and hand movements) in at least one of the following cases: when conversing with a person and when giving a predetermined explanation. A communication motion is performed when the controller 59 outputs to the drive unit 17 a command signal for that motion. The predetermined communication motion is determined by the action determination unit 73, described later. The upper arm part 29, the lower arm part 31, the hand 33, the traveling part 35, and the turning part 37 are collectively referred to as the movable part, and they constitute the movable part of the present invention.

The upper arm part 29, the lower arm part 31, the hand 33, the traveling part 35, and the turning part 37 can also cooperate to operate a predetermined device 200. The predetermined device is operated when the controller 59 outputs to the drive unit 17 a command signal for operating the predetermined device 200. In doing so, the CCD camera 45, described later, images the operation means 200a of the predetermined device 200, the image recognition means recognizes the image, and the controller 59 then outputs the command signal to the drive unit 17 based on a program for operating the predetermined device 200. If the device can be operated without the CCD camera 45 and the image recognition means, they need not be used.

The microphone 39 converts a person's voice into a voice signal and outputs it. The microphone 39 constitutes the voice conversion means of the present invention.

The voice output board 41 converts the pronunciation signal received and demodulated by the pronunciation signal reception/demodulation means 51 into a predetermined voltage and outputs it.

The speaker 43 converts the voltage output from the voice output board 41 into sound. The speaker 43 constitutes the pronunciation means of the present invention.

The CCD camera 45 images the surroundings of the interactee 11 and consists of a CCD image sensor 45a and a signal processing unit 45b, both of which are mounted on the turning part 37. When the turning motor 27 is driven, the turning part 37 turns and the surroundings of the interactee 11 are imaged. In the first embodiment the signal processing unit 45b is mounted on the interactee 11, but it may instead be mounted on the server 13. The CCD image sensor 45a constitutes the imaging means of the present invention.

The CCD image sensor 45a uses an optical system such as a lens to focus light from a predetermined object including a person onto the light-receiving plane of the image sensor, photoelectrically converts the brightness of the resulting image into amounts of electric charge, and reads the charges out sequentially as an electrical signal. In this way it images the surroundings of the interactee 11 and converts the image into an electrical signal.

The signal processing unit 45b processes the electrical signal converted by the CCD image sensor 45a into a predetermined imaging signal. The recognition signal produced by the signal processing unit 45b is modulated into radio waves, light waves, or ultrasonic waves by the imaging signal modulation/transmission means 53 and is demodulated into a predetermined recognition signal by the recognition signal reception/demodulation means 67 provided in the server 13. It is then sent to the image recognition means of the server 13.

The image recognition means extracts feature points of a predetermined object including a person from the imaging signal obtained by imaging the surroundings of the interactee 11, and performs recognition. The CPU on the CPU board 57 controls the dialogue processing unit 71 and the action determination unit 73 based on the recognition result. An image captured by the CCD image sensor 45a and processed by the signal processing unit 45b can be displayed on the image monitor 79a, described later.

The signal processing unit 45b may instead be provided on the server 13 side. In that case, the imaging data captured by the CCD image sensor 45a may be modulated into radio waves, light waves, or ultrasonic waves by the imaging signal modulation/transmission means 53, demodulated into predetermined imaging data by the imaging signal reception/demodulation means 67 provided in the server 13, and sent to the signal processing unit 45b.

The command signal reception/demodulation means 47 receives the radio waves, light waves, or ultrasonic waves transmitted from the command signal modulation/transmission means 61 mounted on the server 13 and demodulates them into a predetermined command signal.

The voice signal modulation/transmission means 49 modulates the voice signal converted by the microphone 39 into radio waves, light waves, or ultrasonic waves and transmits it to the voice signal reception/demodulation means 63 mounted on the server 13.

The pronunciation signal reception/demodulation means 51 receives the radio waves, light waves, or ultrasonic waves transmitted from the pronunciation signal modulation/transmission means 65 mounted on the server 13 and demodulates them into a predetermined pronunciation signal.

Next, the server 13 is described. The server 13 carries a speech recognition board 55, a CPU board 57, a controller 59, command signal modulation/transmission means 61, voice signal reception/demodulation means 63, pronunciation signal modulation/transmission means 65, imaging signal reception/demodulation means 67, and image signal modulation/transmission means 69, and is supplied with electricity from a power supply (not shown).

As shown in FIG. 2, the speech recognition board 55 includes an acoustic analysis unit, which analyzes the other party's voice input from the microphone 39 and extracts its acoustic features. A speech recognition engine then recognizes the speech by comparing the acoustic features extracted by the acoustic analysis unit against an acoustic model, which is a statistical model of the distribution of speech feature patterns in units of phonemes, and outputs the result to the dialogue processing unit 71 of the CPU board 57. In the first embodiment, a language model that defines the connection relationships between words is provided in addition to the acoustic model, so that continuous words and sentences containing prefixes and conjunctions can be recognized. The speech recognition board 55 constitutes the speech recognition means of the present invention.
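The comparison between extracted acoustic features and a statistical acoustic model can be illustrated in a few lines. This is a toy sketch under strong assumptions: the description says only that the model is a statistical model of feature distributions per phoneme, so the diagonal-covariance Gaussian form, the two-dimensional features, and the function names are hypothetical, and the language model stage is omitted entirely.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical acoustic model: phoneme -> (mean vector, variance vector).
AcousticModel = Dict[str, Tuple[List[float], List[float]]]


def log_likelihood(feature: List[float], mean: List[float],
                   var: List[float]) -> float:
    """Log density of `feature` under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(feature, mean, var)
    )


def recognize_phoneme(feature: List[float], model: AcousticModel) -> str:
    """Return the phoneme whose statistical model best explains the feature."""
    return max(model, key=lambda p: log_likelihood(feature, *model[p]))
```

A practical recognizer scores whole sequences of feature vectors and combines the acoustic scores with language model scores; the single-vector case above only shows what "comparing against a statistical model" means.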

ＣＰＵボード５７には、ＣＰＵの他にＲＡＭおよびＲＯＭからなるメモリが搭載されており、上記メモリに対話処理プログラム、動作決定プログラム、発音情報、画像情報が記憶されている。なお、以下の説明では、対話処理プログラムおよび上記対話処理プログラムが記憶される所定のメモリ領域を対話処理部７１、動作決定プログラムおよび上記動作決定プログラムが記憶される所定のメモリ領域を動作決定部７３、発音情報および発音情報が記憶される所定のメモリ領域を発音情報記憶部７５、画像情報および画像情報が記憶される所定のメモリ領域を画像情報記憶部７７と称するものとする。 In addition to the CPU, the CPU board 57 carries a memory composed of a RAM and a ROM, and the memory stores a dialogue processing program, an operation determination program, pronunciation information, and image information. In the following description, the dialogue processing program together with the memory area in which it is stored is referred to as the dialogue processing unit 71; the operation determination program together with the memory area in which it is stored is referred to as the operation determination unit 73; the pronunciation information together with the memory area in which it is stored is referred to as the pronunciation information storage unit 75; and the image information together with the memory area in which it is stored is referred to as the image information storage unit 77.

コントローラ５９は、上述した上腕部２９、下腕部３１、ハンド３３、走行部３５、旋回部３７が動作決定部７３によって決定された動作となるように、駆動部１７に動作の指令信号を出す。 The controller 59 issues operation command signals to the drive unit 17 so that the above-described upper arm unit 29, lower arm unit 31, hand 33, traveling unit 35, and turning unit 37 perform the operations determined by the operation determination unit 73.

指令信号変調送信手段６１は、コントローラ５９から送信された動作信号を、電波、光波、超音波のいずれかに変調し、指令信号受信復調手段４７に送信をする。 The command signal modulation / transmission means 61 modulates the operation signal transmitted from the controller 59 into one of radio waves, light waves, and ultrasonic waves, and transmits it to the command signal reception / demodulation means 47.

音声信号受信復調手段６３は、音声信号変調送信手段４９によって電波、光波、超音波のいずれかに変調された音声信号を受信し、所定の音声信号に復調する。 The voice signal reception / demodulation means 63 receives the voice signal modulated into radio waves, light waves, or ultrasonic waves by the voice signal modulation / transmission means 49, and demodulates it into a predetermined voice signal.

発音信号変調送信手段６５は、発音信号を、電波、光波、超音波のいずれかに変調し、発音信号受信復調手段５１に送信する。 The sound signal modulation / transmission means 65 modulates the sound signal into radio waves, light waves, or ultrasonic waves, and transmits it to the sound signal reception / demodulation means 51.

画像信号変調送信手段６９は、ＣＣＤカメラ４５の信号処理部４５ｂから出力された画像信号を電波、光波、超音波のいずれかに変調し、画像情報受信復調手段８１に送信する。 The image signal modulation / transmission means 69 modulates the image signal output from the signal processing unit 45 b of the CCD camera 45 into any one of radio waves, light waves, and ultrasonic waves, and transmits it to the image information reception / demodulation means 81.

対話処理部７１は、音声認識ボード５５により認識された音声に基づいて、相手に対して応答する音声を決定する。上記対話処理部７１で決定された音声は、発音信号変調送信手段６５、発音信号受信復調手段５１を経由し、音声出力ボード４１で所定の電圧に変換され、スピーカ４３で発音される。なお、上記対話処理部７１は音声対話システム１００自らが発音する機能も有している。また、対話処理部７１では、上記ＣＣＤカメラ４５で人を認識した際、対話の際、あるいは被対話体１１自ら発音する際に、動作を決定する。上記対話処理部７１は、本発明の対話制御手段を構成する。 The dialogue processing unit 71 determines the voice with which to respond to the other party based on the speech recognized by the voice recognition board 55. The voice determined by the dialogue processing unit 71 passes through the sound signal modulation / transmission means 65 and the sound signal reception / demodulation means 51, is converted into a predetermined voltage by the sound output board 41, and is sounded from the speaker 43. The dialogue processing unit 71 also has a function by which the voice dialogue system 100 speaks on its own initiative. In addition, the dialogue processing unit 71 determines an operation when a person is recognized by the CCD camera 45, during a dialogue, or when the interactee 11 speaks on its own initiative. The dialogue processing unit 71 constitutes the dialogue control means of the present invention.

また、ＣＰＵボード５７は、所定の音声情報を記憶する発音情報記憶部７５を備えており、人がマイク３９を介して所定の発音情報を要求した場合、人がマイク３９を介して所定の発音情報を許可した場合、所定の発音情報を用いて被対話体１１が自ら発音する場合のいずれかに、発音情報記憶部７５から所定の発音情報を読み出して、スピーカ４３から発音する。 The CPU board 57 also includes the pronunciation information storage unit 75, which stores predetermined voice information. In any of the following cases, the predetermined pronunciation information is read from the pronunciation information storage unit 75 and sounded from the speaker 43: when a person requests the predetermined pronunciation information via the microphone 39, when a person permits the predetermined pronunciation information via the microphone 39, or when the interactee 11 speaks on its own initiative using the predetermined pronunciation information.
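The three read-out conditions above amount to a gate in front of the pronunciation information storage unit 75. The following is a minimal sketch of that gate, assuming a hypothetical `Trigger` enumeration and `PronunciationStore` class; the specification does not define how the conditions are represented or how stored items are keyed.

```python
from enum import Enum, auto
from typing import Dict, Optional


class Trigger(Enum):
    """The three cases in which stored pronunciation information is read out."""
    USER_REQUEST = auto()     # the person asks for it via the microphone
    USER_PERMISSION = auto()  # the person permits it via the microphone
    SELF_INITIATED = auto()   # the interactee speaks on its own initiative


class PronunciationStore:
    """Stands in for the pronunciation information storage unit 75 (illustrative only)."""

    def __init__(self, items: Dict[str, bytes]) -> None:
        self._items = dict(items)

    def read(self, key: str, trigger: Optional[Trigger]) -> Optional[bytes]:
        # Without one of the three triggers, nothing is read out.
        if trigger is None:
            return None
        # Returns None when the requested item is not stored.
        return self._items.get(key)


if __name__ == "__main__":
    store = PronunciationStore({"chime": b"\x01\x02"})
    print(store.read("chime", Trigger.USER_REQUEST))
    print(store.read("chime", None))
```

The same gate could sit in front of the image information storage unit 77, since the specification lists the equivalent three conditions for image information.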

また、上記被対話体１１および上記サーバ１３とは別体に、画像表示装置７９が設けられている。画像表示装置７９は、画像を表示する画像モニタ７９ａと、画像信号変調送信手段６９から送信された電波、光波、超音波のいずれかを受信して所定の画像情報に復調する画像情報受信復調手段８１とが設けられている。上記画像モニタ７９ａは、本発明の画像表示手段を構成する。 Further, an image display device 79 is provided separately from the interactee 11 and the server 13. The image display device 79 includes an image monitor 79a for displaying an image, and an image information receiving / demodulating unit that receives any one of radio waves, light waves, and ultrasonic waves transmitted from the image signal modulation / transmitting unit 69 and demodulates them into predetermined image information. 81 is provided. The image monitor 79a constitutes the image display means of the present invention.

上記発音情報とは、発音により人に伝達する情報であって、音声の他に、音楽、音を含む。また、画像情報とは、人に対して表示する情報であって、静止画像、動画像、文字、光のうち、少なくとも１つで構成される。 The pronunciation information is information transmitted to a person by pronunciation, and includes music and sound in addition to voice. The image information is information displayed to a person, and is configured by at least one of a still image, a moving image, characters, and light.

なお、発音情報記憶部７５、画像情報記憶部７７は、ＣＰＵボード５７の外側に配置してもよく、被対話体１１に配置してもよい。 Note that the pronunciation information storage unit 75 and the image information storage unit 77 may be arranged outside the CPU board 57 or may be arranged in the interactee 11.

また、画像モニタ７９ａには、人の眉毛、目、口を真似て表情を表示するようにしてもよい。上記表情とは、例えば、普通の表情、笑った表情、泣いた表情、怒った表情などで、対話処理部７１で決定された対話内容に基づいて、図示しない表情決定部により表情を決定する。 The image monitor 79a may display facial expressions that imitate human eyebrows, eyes, and a mouth. The facial expressions are, for example, a neutral expression, a laughing expression, a crying expression, and an angry expression, and the expression is determined by a facial expression determination unit (not shown) based on the dialogue content determined by the dialogue processing unit 71.

次に、可動ユニット１５について説明する。可動ユニット１５は、被対話体１１、サーバ１３と別体に構成されており、駆動部８３、ソレノイド８５、プッシャ８７、指令信号受信復調手段８９を備えており、図示しない電源から電気が供給されている。 Next, the movable unit 15 will be described. The movable unit 15 is configured separately from the interactee 11 and the server 13, includes a drive unit 83, a solenoid 85, a pusher 87, and a command signal reception / demodulation means 89, and is supplied with electricity from a power source (not shown).

駆動部８３は、図２に示すように、指令信号受信復調手段８９で受信復調された動作の指令信号を受信すると、図２に示すソレノイド８５に通電し、プッシャ８７を可動する。上記プッシャ８７は、可動した際に所定の装置２００の操作手段２００ａをオン／オフする位置に配置される。 As shown in FIG. 2, when receiving the command signal of the operation received and demodulated by the command signal receiving / demodulating means 89, the drive unit 83 energizes the solenoid 85 shown in FIG. 2 and moves the pusher 87. The pusher 87 is disposed at a position for turning on / off the operation means 200a of the predetermined device 200 when the pusher 87 is moved.
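The behavior of the movable unit 15 described above (a demodulated command arrives, the solenoid is energized, and the pusher presses the on/off control of the target device) can be sketched as a small state model. The class names `TargetDevice` and `MovableUnit` and the `"PUSH"` command string are hypothetical; the specification does not define a command format.

```python
from dataclasses import dataclass


@dataclass
class TargetDevice:
    """Stands in for the predetermined device 200 with its on/off operation means 200a."""
    powered: bool = False

    def press_switch(self) -> None:
        # Each press toggles the device between on and off.
        self.powered = not self.powered


@dataclass
class MovableUnit:
    """Stands in for the movable unit 15 (drive unit 83, solenoid 85, pusher 87)."""
    device: TargetDevice
    presses: int = 0

    def on_command(self, command: str) -> bool:
        """Handle a demodulated command; return True if the pusher was actuated."""
        if command != "PUSH":  # hypothetical command name
            return False
        # Energizing the solenoid drives the pusher into the switch.
        self.presses += 1
        self.device.press_switch()
        return True


if __name__ == "__main__":
    game_machine = TargetDevice()
    unit = MovableUnit(game_machine)
    unit.on_command("PUSH")
    print(game_machine.powered)
```

Keeping the actuator in its own unit, as the specification does, means the interactee carries no mechanism for operating the device, and the unit can stay fixed next to the switch it operates.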

なお、可動ユニット１５は、上記構成に限るものではなく、種々の形態が考えられる。例えば、複数の可動部と、複数の可動部をそれぞれ可動するモータが搭載されていてもよく、上記複数の可動部、上記モータに加え、モータを駆動する駆動部が搭載されていてもよい。 The movable unit 15 is not limited to the above configuration, and various forms are conceivable. For example, a plurality of movable parts and a motor that moves each of the plurality of movable parts may be mounted, and in addition to the plurality of movable parts and the motor, a drive unit that drives the motor may be mounted.

ここで、音声対話システム１００の対話動作について説明する。人が被対話体１１に発声すると、周囲音とともに、その音声が被対話体１１に搭載されたマイク３９で音声信号に変換される。そして、変換された音声信号が、音声信号変調送信手段４９、音声信号受信復調手段６３を経由して音声認識ボード５５に送信される。上記音声認識ボード５５では、マイク３９から入力された相手の音声を分析し、音響的特徴を抽出、音声認識エンジンで上記音響分析部で抽出された音響的特徴と、音素を単位とした音声特徴量パターンの分布の統計モデルである音響モデルとの比較照合を行うことで音声を認識し、その結果をＣＰＵボード５７に出力する。 Here, the dialogue operation of the voice dialogue system 100 will be described. When a person speaks to the interactee 11, the voice, together with ambient sound, is converted into a voice signal by the microphone 39 mounted on the interactee 11. The converted voice signal is then transmitted to the voice recognition board 55 via the voice signal modulation / transmission means 49 and the voice signal reception / demodulation means 63. The voice recognition board 55 analyzes the voice of the other party input from the microphone 39, extracts acoustic features, recognizes the speech by having the speech recognition engine match the acoustic features extracted by the acoustic analysis unit against the acoustic model, which is a statistical model of the distribution of speech feature patterns in units of phonemes, and outputs the result to the CPU board 57.

その際、被対話体１１は、旋回用モータ２７および上記旋回用モータ２７に搭載されたＣＣＤカメラ４５が旋回して人を捜すように、コントローラ５９が駆動部１７に動作の指令信号を出力する。そして、ＣＣＤカメラ４５が被対話体１１の周囲を撮像し、ＣＣＤイメージセンサ４５ａによって変換された電気信号から人を含む所定の対象物の特徴点を抽出して認識を行う。そして、上記人が移動すると、人を追跡するように旋回用モータ２７および上記旋回用モータ２７に搭載されたＣＣＤカメラ４５が旋回する。 At that time, the controller 59 outputs an operation command signal to the drive unit 17 so that, in the interactee 11, the turning motor 27 and the CCD camera 45 mounted on the turning motor 27 turn to search for the person. The CCD camera 45 then images the surroundings of the interactee 11, and feature points of a predetermined object, including a person, are extracted from the electric signal converted by the CCD image sensor 45a and used for recognition. When the person moves, the turning motor 27 and the CCD camera 45 mounted on it turn so as to track the person.

次に、対話処理部７１は、音声認識ボード５５により認識された音声に基づいて、相手に対して応答する音声を決定する。上記対話処理部７１で決定された音声は、発音信号変調送信手段６５、発音信号受信復調手段５１を経由し、音声出力ボード４１で所定の電圧に変換され、スピーカ４３で発音される。その際、コミュニケーション動作をするように設定されている場合には、上記対話処理部７１で決定された音声の内容に応じて、人に対してコミュニケーション動作をするように、コントローラ５９が駆動部１７に動作の指令信号を出力する。 Next, the dialogue processing unit 71 determines the voice with which to respond to the other party based on the speech recognized by the voice recognition board 55. The voice determined by the dialogue processing unit 71 passes through the sound signal modulation / transmission means 65 and the sound signal reception / demodulation means 51, is converted into a predetermined voltage by the sound output board 41, and is sounded from the speaker 43. At that time, if the system is set to perform communication operations, the controller 59 outputs an operation command signal to the drive unit 17 so that a communication operation directed at the person is performed according to the content of the voice determined by the dialogue processing unit 71.
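The round trip described above (microphone, modulation, server-side recognition and dialogue processing, modulation back, speaker) can be traced in a short sketch. Everything here is a stand-in chosen for illustration: `FRAME_TAG`, `modulate`, `demodulate`, `Server`, and `Interactee` are hypothetical names, the "modulation" is a byte prefix rather than a radio, light, or ultrasonic link, and recognition is replaced by a trivial decoder.

```python
from dataclasses import dataclass, field
from typing import Callable, List

FRAME_TAG = b"MOD:"  # hypothetical marker standing in for real modulation


def modulate(payload: bytes) -> bytes:
    """Wrap a signal for transmission (stand-in for means 49 and 65)."""
    return FRAME_TAG + payload


def demodulate(frame: bytes) -> bytes:
    """Unwrap a received signal (stand-in for means 63 and 51)."""
    if not frame.startswith(FRAME_TAG):
        raise ValueError("not a modulated frame")
    return frame[len(FRAME_TAG):]


@dataclass
class Server:
    """Stands in for the server 13: recognition (board 55) and dialogue processing (unit 71)."""
    recognize: Callable[[bytes], str]
    respond: Callable[[str], str]

    def handle(self, frame: bytes) -> bytes:
        voice_signal = demodulate(frame)         # reception / demodulation
        text = self.recognize(voice_signal)      # speech recognition
        reply = self.respond(text)               # response decision
        return modulate(reply.encode("utf-8"))   # modulation / transmission


@dataclass
class Interactee:
    """Stands in for the interactee 11, which carries only the microphone and speaker."""
    spoken: List[str] = field(default_factory=list)

    def capture(self, utterance: str) -> bytes:
        # Microphone output, modulated for transmission to the server.
        return modulate(utterance.encode("utf-8"))

    def play(self, frame: bytes) -> None:
        # Demodulated sound signal, recorded here instead of driving a speaker.
        self.spoken.append(demodulate(frame).decode("utf-8"))


if __name__ == "__main__":
    server = Server(
        recognize=lambda signal: signal.decode("utf-8").lower(),
        respond=lambda text: "hello" if "hello" in text else "pardon?",
    )
    robot = Interactee()
    robot.play(server.handle(robot.capture("Hello")))
    print(robot.spoken)
```

The sketch makes the division of labor visible: the `Interactee` class holds no recognition or dialogue logic, which corresponds to the specification's point that the costly components stay on the server side.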

また、人が被対話体１１に発声する内容が、人がマイク３９を介して所定の発音情報を要求した場合、人がマイク３９を介して所定の発音情報を許可した場合、所定の発音情報を用いて被対話体１１が自ら発音する場合のいずれかには、発音情報記憶部７５から所定の発音情報を読み出して、発音手段から発音する。 Further, when what the person says to the interactee 11 is a request for predetermined pronunciation information via the microphone 39 or permission for predetermined pronunciation information via the microphone 39, or when the interactee 11 speaks on its own initiative using predetermined pronunciation information, the predetermined pronunciation information is read from the pronunciation information storage unit 75 and sounded from the sounding means.

また、人が被対話体１１に発声する内容が、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて画像モニタ７９ａに画像情報を表示する場合のいずれかには、画像情報記憶部７７から所定の画像情報を読み出して、画像モニタ７９ａに表示する。 Further, when what the person says to the interactee 11 is a request for predetermined image information via the microphone 39 or permission for predetermined image information via the microphone 39, or when image information is to be displayed on the image monitor 79a using predetermined image information, the predetermined image information is read from the image information storage unit 77 and displayed on the image monitor 79a.

また、人が被対話体１１に発声する内容が、人の音声が所定の装置２００を操作する命令である場合、人の音声が所定の装置２００を操作する許可である場合、所定の操作入力手段により所定の装置２００を操作する場合、所定の装置２００を操作する自動実行プログラムが実行される場合、ＣＣＤカメラ４５が、所定の装置２００の操作手段２００ａを撮像し、画像認識手段が操作手段２００ａの位置を認識する。そして、走行部３５が操作手段２００ａを操作する位置に可動し、上腕部２９、下腕部３１、ハンド３３、走行部３５が所定の装置２００を操作するように、コントローラ５９が駆動部１７に指令信号を出力する。 Further, when the person's voice is a command to operate the predetermined device 200, when the person's voice is permission to operate the predetermined device 200, when the predetermined device 200 is operated through a predetermined operation input means, or when an automatic execution program that operates the predetermined device 200 is executed, the CCD camera 45 images the operation means 200a of the predetermined device 200, and the image recognition means recognizes the position of the operation means 200a. The controller 59 then outputs command signals to the drive unit 17 so that the traveling unit 35 moves to a position from which the operation means 200a can be operated and the upper arm unit 29, the lower arm unit 31, the hand 33, and the traveling unit 35 operate the predetermined device 200.

また、人と音声対話をする際に、画像モニタ７９ａに、人の眉毛、目、口を真似て表情を表示するよう設定されている場合には、対話処理部７１で決定された対話内容に基づいて、図示しない表情決定部で表情を決定し、画像モニタ７９ａ用に、普通の表情、笑った表情、泣いた表情、怒った表情等などを表示する。 Further, when a voice dialogue is performed with a person, if the image monitor 79a is set to display a facial expression by imitating a person's eyebrows, eyes, and mouth, the conversation content determined by the dialogue processing unit 71 is displayed. Based on this, a facial expression determination unit (not shown) determines the facial expression, and displays an ordinary facial expression, a laughing facial expression, a crying facial expression, an angry facial expression, and the like for the image monitor 79a.

上記構成によれば、マイク３９、スピーカ４３を備えた被対話体１１と、上記被対話体１１と別体に構成され、音声認識ボード５５、対話処理部７１を備えたサーバ用コンピュータとの間が有線及び無線のいずれかで接続されて、人が被対話体１１と音声対話を行うことができる。また、可動部の動作を司令する指令信号を、サーバ用コンピュータに備えられたコントローラから、被対話体に備えられた駆動部に出力し、この指令信号に基づいてモータを駆動することで、可動部を可動することができる。 According to the above configuration, the interactee 11, which includes the microphone 39 and the speaker 43, is connected either by wire or wirelessly to the server computer, which is configured separately from the interactee 11 and includes the voice recognition board 55 and the dialogue processing unit 71, so that a person can hold a voice dialogue with the interactee 11. In addition, a command signal that directs the operation of the movable parts is output from the controller provided in the server computer to the drive unit provided in the interactee, and the motors are driven based on this command signal, so that the movable parts can be moved.

また、上記構成によれば、音声認識ボード５５、対話処理部７１がサーバ１３に備えられるので、マイク３９、スピーカ４３、音声認識ボード５５、対話処理部７１のすべてが被対話体１１に搭載される場合に比べると、被対話体１１を小さく、軽くすることができ、被対話体１１の持ち運びを容易にすることができる。 Further, according to the above configuration, since the voice recognition board 55 and the dialogue processing unit 71 are provided in the server 13, the interactee 11 can be made smaller and lighter than when the microphone 39, the speaker 43, the voice recognition board 55, and the dialogue processing unit 71 are all mounted on the interactee 11, and the interactee 11 becomes easier to carry.

また、音声認識ボード５５、対話処理部７１がサーバ１３に備えられるので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な音声認識ボード５５、対話処理部７１が故障することがない。 Further, since the voice recognition board 55 and the dialogue processing unit 71 are provided in the server 13, even when the interacted body 11 is dropped or submerged in a puddle, the expensive voice recognition board 55 and the dialogue processing unit 71 are provided. There is no failure.

また、上記構成によれば、コントローラ５９がサーバ１３に備えられるので、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７、コントローラ５９のすべてが被対話体１１に備えられる場合に比べると、被対話体１１を小さく、軽くすることができ、被対話体１１の持ち運びを容易にすることができる。 Further, according to the above configuration, since the controller 59 is provided in the server 13, the interactee 11 can be made smaller and lighter than when the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, the drive unit 17, and the controller 59 are all provided in the interactee 11, and the interactee 11 becomes easier to carry.

また、コントローラ５９がサーバ１３に備えられるので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価なコントローラ５９が故障することがない。 In addition, since the controller 59 is provided in the server 13, the expensive controller 59 does not break down even when the interactee 11 is dropped or submerged in a puddle.

また、上記構成によれば、発音情報記憶部７５、画像情報記憶部７７がサーバ１３に備えられるので、発音情報記憶部７５、画像情報記憶部７７が被対話体１１に備えられる場合に比べると、被対話体１１を小さく、軽くすることができ、被対話体１１の持ち運びを容易にすることができる。 Further, according to the above configuration, since the pronunciation information storage unit 75 and the image information storage unit 77 are provided in the server 13, compared with the case where the pronunciation information storage unit 75 and the image information storage unit 77 are provided in the interacting body 11. The interacting body 11 can be made small and light, and the interacting body 11 can be easily carried.

また、発音情報記憶部７５、画像情報記憶部７７がサーバ１３に備えられるので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な発音情報記憶部７５、画像情報記憶部７７が故障することがない。 Further, since the pronunciation information storage unit 75 and the image information storage unit 77 are provided in the server 13, even when the interacted body 11 is dropped or submerged in a puddle, the expensive pronunciation information storage unit 75 and the image information are stored. The storage unit 77 does not fail.

また、上記構成によれば、画像表示装置７９が、被対話体１１およびサーバ１３のいずれとも別体で構成されているので、画像表示装置７９が被対話体１１に備えられる場合に比べると、被対話体１１を小さく、軽くすることができ、被対話体１１の持ち運びを容易にすることができる。 Further, according to the above configuration, since the image display device 79 is configured separately from both the interactee 11 and the server 13, compared to the case where the image display 79 is provided in the interactee 11. The interacting body 11 can be made small and light, and the interacting body 11 can be easily carried.

また、画像表示装置７９が、被対話体１１およびサーバ１３のいずれとも別体で構成されているので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な画像表示装置７９が故障することがない。 Further, since the image display device 79 is configured separately from both the interactee 11 and the server 13, the expensive image display device 79 does not fail even if the interactee 11 is dropped or submerged in a puddle.

さらに、被対話体１１とサーバ１３とが無線で接続されているので、有線の長さに制約されることなく、被対話体１１を移動することができる。 Furthermore, since the interactee 11 and the server 13 are connected wirelessly, the interactee 11 can be moved without being restricted by the length of a wire.

また、上記構成によれば、人が被対話体１１と単に音声対話するだけではなく、人が所定の発音情報を要求した場合、人がマイク３９を介して所定の発音情報を許可した場合、所定の発音情報を用いて被対話体１１が自ら発音する場合のいずれかに、所定の発音情報を得ることができる高機能な音声対話システムを提供することができる。また、人が所定の発音情報を要求した場合、所定の発音情報を用いて被対話体が自ら発音する場合に、所定の発音情報を読み出して、スピーカ４３から発音する高度なユーザインターフェースを提供できる。さらに、発音情報記憶部がサーバ１３に搭載されているので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、発音情報記憶部７５に記憶された発音情報を損傷させることがない。 Further, according to the above configuration, it is possible to provide a highly functional voice dialogue system in which a person does not merely hold a voice dialogue with the interactee 11, but can also obtain predetermined pronunciation information when the person requests it, when the person permits it via the microphone 39, or when the interactee 11 speaks on its own initiative using it. In addition, when a person requests predetermined pronunciation information, or when the interactee speaks on its own initiative using predetermined pronunciation information, an advanced user interface can be provided that reads out the predetermined pronunciation information and sounds it from the speaker 43. Furthermore, since the pronunciation information storage unit is mounted on the server 13, the pronunciation information stored in the pronunciation information storage unit 75 is not damaged even if the interactee 11 is dropped or submerged in a puddle.

また、上記構成によれば、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を得ることができる高機能な音声対話システムを提供できる。また、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体１１が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を画像情報記憶部から得て、画像モニタ７９ａに表示する高度なユーザインターフェースを実現することができる。 Further, according to the above configuration, it is possible to provide a highly functional voice dialogue system in which predetermined image information can be obtained when a person requests it via the microphone 39, when a person permits it via the microphone 39, or when the interactee displays a predetermined image on its own initiative using it. In addition, in any of these three cases, an advanced user interface can be realized that obtains the predetermined image information from the image information storage unit and displays it on the image monitor 79a.

なお、画像情報記憶部は、被対話体１１に搭載されていてもよく、画像モニタ７９ａが被対話体１１と一体に設けられていてもよい。 The image information storage unit may be mounted on the interactee 11, and the image monitor 79 a may be provided integrally with the interactee 11.

また、上記構成によれば、ＣＣＤカメラ４５が人を含む所定の対象物を撮像し、画像認識手段が所定の対象物を認識した結果に基づいてコミュニケーション動作をし、臨場感を持って発音する高度な音声対話システムを提供できる。また、人を含む所定の対象物を撮像し、画像認識手段が所定の対象物を認識した結果に基づいてコミュニケーション動作をし、臨場感を持って発音する高度なユーザインターフェースを実現することができる。 Further, according to the above configuration, it is possible to provide an advanced voice dialogue system in which the CCD camera 45 images a predetermined object including a person, a communication operation is performed based on the result of the image recognition means recognizing the predetermined object, and speech is produced with a sense of presence. In addition, an advanced user interface can be realized that images a predetermined object including a person, performs a communication operation based on the result of the image recognition means recognizing the predetermined object, and produces speech with a sense of presence.

また、上記構成によれば、ＣＣＤイメージセンサ４５ａ、画像認識手段により人を含む所定の対象物を認識して、人と音声対話をする高度な音声対話システムを提供できる。また、ＣＣＤイメージセンサ４５ａ、画像認識手段により人を含む所定の対象物を認識して、人と音声対話をする高度なユーザインターフェースを実現することができる。 Further, according to the above configuration, it is possible to provide an advanced voice dialogue system that recognizes a predetermined object including a person by the CCD image sensor 45a and the image recognition means and performs voice dialogue with the person. Further, it is possible to realize an advanced user interface for recognizing a predetermined object including a person by the CCD image sensor 45a and the image recognizing means and having a voice conversation with the person.

また、上記構成によれば、コントローラ５９が駆動部に指令信号を出力して、可動ユニット１５が所定の装置２００を操作する高度な音声対話システムを提供できる。 Further, according to the above configuration, it is possible to provide an advanced voice dialogue system in which the controller 59 outputs a command signal to the drive unit and the movable unit 15 operates the predetermined device 200.

また、上記構成によれば、人の音声が所定の装置２００を操作する命令である場合、人の音声が所定の装置を操作する許可である場合、所定の操作入力手段により所定の装置を操作する場合に、各可動部２９、３１、３３、３５、３７および被対話体１１が、所定の装置２００の操作位置に可動し、所定の装置２００を操作する高度な音声対話システムを提供できる。また、人の音声が所定の装置を操作する命令である場合、人の音声が所定の装置を操作する許可である場合、所定の操作入力手段により所定の装置を操作する場合に、各可動部２９、３１、３３、３５、３７および被対話体１１が、所定の装置２００の操作位置に可動し、所定の装置２００を操作する高度なユーザインターフェースを実現することができる。 Further, according to the above configuration, when the human voice is an instruction to operate the predetermined device 200, when the human voice is permission to operate the predetermined device, the predetermined device is operated by the predetermined operation input means. In this case, it is possible to provide an advanced voice dialogue system in which the movable units 29, 31, 33, 35, and 37 and the interactee 11 are moved to the operation position of the predetermined device 200 and operate the predetermined device 200. Further, when the human voice is an instruction to operate a predetermined device, when the human voice is permission to operate the predetermined device, or when operating the predetermined device by a predetermined operation input means, each movable part 29, 31, 33, 35, 37 and the interacting body 11 can be moved to the operation position of the predetermined device 200, and an advanced user interface for operating the predetermined device 200 can be realized.

（第２実施形態） (Second Embodiment)
第１実施形態では、音声対話システム１００が、被対話体１１、サーバ１３、可動ユニット１５が設けられたが、音声対話システム１００が、被対話体１１、サーバ１３のみで構成されていてもよい。 In the first embodiment, the voice dialogue system 100 is provided with the interactee 11, the server 13, and the movable unit 15. However, the voice dialogue system 100 may be configured by only the interactee 11 and the server 13.

（第３実施形態） (Third Embodiment)
第１実施形態では、被対話体１１に各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が備えられたが、被対話体１１に各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が備えられていなくともよい。また、音声対話システム１００が、被対話体１１、サーバ１３のみで構成されていてもよい。また、画像表示装置７９は、被対話体１１と別体に構成されていてもよい。 In the first embodiment, the interactee 11 is provided with the movable parts 29, 31, 33, 35, and 37, the motors 19, 21, 23, 25, and 27, and the drive unit 17. However, the interactee 11 need not be provided with the movable parts 29, 31, 33, 35, and 37, the motors 19, 21, 23, 25, and 27, or the drive unit 17. Further, the voice dialogue system 100 may be configured by only the interactee 11 and the server 13. Further, the image display device 79 may be configured separately from the interactee 11.

上記構成によれば、被対話体１１に、駆動部１７、各モータ１９、２１、２３、２５、２７、各可動部２９、３１、３３、３５、３７、マイク３９、音声出力ボード４１、スピーカ４３、ＣＣＤカメラ４５、ＣＰＵボード５７、コントローラ５９、発音情報記憶部７５、画像情報記憶部７７、画像表示装置７９のうち、マイク３９、スピーカ４３が備えられるようにした場合、図４、図５に示すように、被対話体１１を飛躍的に小型化することができる。しかも、無線通信が可能であるので、例えば家庭内の限定された領域で使用するだけでなく、家庭を遠く離れた領域に、被対話体１１のみ移動させて使用することができる。 According to the above configuration, when, of the drive unit 17, the motors 19, 21, 23, 25, 27, the movable parts 29, 31, 33, 35, 37, the microphone 39, the audio output board 41, the speaker 43, the CCD camera 45, the CPU board 57, the controller 59, the pronunciation information storage unit 75, the image information storage unit 77, and the image display device 79, only the microphone 39 and the speaker 43 are provided in the interactee 11, the interactee 11 can be made dramatically smaller, as shown in FIGS. 4 and 5. Moreover, since wireless communication is possible, the interactee 11 can be used not only within a limited area such as the home, but can also be carried by itself to an area far from the home and used there.

また、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、最悪、マイク３９、スピーカ４３のみが故障するのみで、損傷を最小限にすることができる。 Further, even when the interactee 11 is dropped or submerged in a puddle, the worst case is that only the microphone 39 and the speaker 43 are broken, and the damage can be minimized.

（第４実施形態） (Fourth Embodiment)
第１実施形態では、被対話体１１に走行部３５が備えられたが、被対話体１１に走行部３５が備えられなくともよい。また、被対話体１１に走行部３５が備えられたが、走行部３５の替わりに、歩行手段が備えられていてもよい。 In the first embodiment, the traveling unit 35 is provided in the interactee 11, but the traveling unit 35 need not be provided in the interactee 11. Moreover, although the traveling unit 35 is provided in the interactee 11, a walking means may be provided instead of the traveling unit 35.

（第５実施形態） (Fifth Embodiment)
第１実施形態では、音声認識ボード５５、対話処理部７１の両方がサーバ１３に搭載されたが、図６に示すように、音声認識ボード５５が被対話体１１に搭載され、対話処理部７１がサーバ１３に搭載されてもよい。 In the first embodiment, both the voice recognition board 55 and the dialogue processing unit 71 are mounted on the server 13. However, as shown in FIG. 6, the voice recognition board 55 may be mounted on the interactee 11 and the dialogue processing unit 71 may be mounted on the server 13.

上記構成によれば、対話処理部７１がサーバ１３に備えられるので、被対話体１１を落下させた場合に、あるいは水たまりに水没させた場合に、高価な対話処理部７１が故障することがない。 According to the above configuration, since the dialogue processing unit 71 is provided in the server 13, the expensive dialogue processing unit 71 does not break down when the object 11 is dropped or submerged in a puddle. .

また、上記構成によれば、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７、コントローラ５９のすべてが被対話体１１に備えられる場合に比べると、被対話体１１を小さく、軽くすることができ、被対話体１１の持ち運びを容易にすることができる。 Further, according to the above configuration, the movable body 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, the drive unit 17, and the controller 59 are all provided in the interacting body 11. Compared to, the interactee 11 can be made smaller and lighter, and the interactee 11 can be easily carried.

（第６実施形態） (Sixth Embodiment)
第５実施形態では、音声認識ボード５５が被対話体１１に搭載され、対話処理部７１がサーバ１３に搭載されたが、対話処理部７１が被対話体１１に搭載され、音声認識ボード５５がサーバ１３に搭載されてもよい。 In the fifth embodiment, the voice recognition board 55 is mounted on the interactee 11 and the dialogue processing unit 71 is mounted on the server 13. However, the dialogue processing unit 71 may be mounted on the interactee 11 and the voice recognition board 55 may be mounted on the server 13.

上記構成によれば、音声認識ボード５５がサーバ１３に備えられるので、被対話体１１を落下させた場合に、あるいは水たまりに水没させた場合に、高価な音声認識ボード５５が故障することがない。 According to the above configuration, since the voice recognition board 55 is provided in the server 13, the expensive voice recognition board 55 does not break down when the interacting body 11 is dropped or submerged in a puddle. .

（第７実施形態） (Seventh Embodiment)
マイク３９を被対話体１１と別体に構成し、図示しないヘッドマイクに搭載するようにしてもよい。上記ヘッドマイクは、マイクを人の口元に配置する装置であり、人が被対話体１１に近づかなくとも、音声をマイク３９に入力することができる。 The microphone 39 may be configured separately from the interactee 11 and mounted on a head microphone (not shown). The head microphone is a device that places a microphone near the person's mouth, and allows voice to be input to the microphone 39 even if the person does not approach the interactee 11.

上記構成によれば、人が被対話体１１に近づかなくとも、音声をマイク３９に入力することができ、これにより、音声の認識率を向上させることができる。一般に、音声認識ボード５５で人の音声を認識する場合、周囲音、雑音等により、人の音声の認識率が低下することが知られている。このためマイク３９を複数個配置する、あるいは音響分析部の手前にノイズ除去フィルタを配置する、などして音声の認識率を向上させる方法が考えられている。第７実施形態は、上記の他に、音声の認識率を向上させるようにしたものである。 According to the above configuration, voice can be input to the microphone 39 even if the person does not come close to the interactee 11, which improves the speech recognition rate. In general, when the voice recognition board 55 recognizes a human voice, the recognition rate is known to drop because of ambient sound, noise, and the like. For this reason, methods of improving the recognition rate have been considered, such as arranging a plurality of microphones 39 or placing a noise removal filter in front of the acoustic analysis unit. The seventh embodiment improves the speech recognition rate by a means other than these.

また、第３実施形態において、マイク３９を被対話体１１と別体に構成し、図示しないヘッドマイクに搭載するようにした場合、第３実施形態よりさらに被対話体１１を小さく、軽くすることができ、被対話体１１の持ち運びを容易にすることができる。また、ヘッドマイクを使用することにより、音声認識ボード５５に音声信号が入力される際の雑音を小さくすることができる。 Further, in the third embodiment, when the microphone 39 is configured separately from the interactee 11 and mounted on a head microphone (not shown), the interactee 11 is made smaller and lighter than the third embodiment. It is possible to carry the interactee 11 easily. Further, by using the head microphone, it is possible to reduce noise when a voice signal is input to the voice recognition board 55.

（第８実施形態） (Eighth Embodiment)
第１実施形態では、発音情報記憶部７５がサーバ１３に備えられたが、発音情報記憶部７５が被対話体１１に備えられてもよい。また、第１実施形態では、画像情報記憶部７７がサーバ１３に備えられたが、画像情報記憶部７７が被対話体１１に備えられてもよい。 In the first embodiment, the pronunciation information storage unit 75 is provided in the server 13, but the pronunciation information storage unit 75 may be provided in the interactee 11. Further, in the first embodiment, the image information storage unit 77 is provided in the server 13, but the image information storage unit 77 may be provided in the interactee 11.

（第９実施形態）
上述した発音情報は、インターネット上の所定の記憶場所からダウンロードするようにしてもよい。 (Ninth embodiment)
The pronunciation information described above may be downloaded from a predetermined storage location on the Internet.

上記構成によれば、発音情報記憶部７５がインターネットに接続自在に構成されており、所定の発音情報がインターネット上の所定の記憶場所からダウンロード自在であるので、所定の発音情報をインターネット上からダウンロードできる高機能な音声対話システムを提供できる。また、所定の発音情報をインターネット上からダウンロードできるので、発音情報記憶部７５に記憶された所定の発音情報が損傷しても、直ぐに所定の発音情報を復旧することができる。 According to the above configuration, the pronunciation information storage unit 75 is configured to be connectable to the Internet, and predetermined pronunciation information can be downloaded from a predetermined storage location on the Internet. A highly functional voice interaction system that can download predetermined pronunciation information from the Internet can therefore be provided. Further, because the predetermined pronunciation information can be downloaded from the Internet, it can be restored immediately even if the copy stored in the pronunciation information storage unit 75 is damaged.
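The restore-on-damage behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `PronunciationStore`, the SHA-256 damage check, and the injected `fetch` callable (standing in for a download from the predetermined storage location) are all hypothetical and do not appear in the specification.

```python
import hashlib
from typing import Callable, Dict


class PronunciationStore:
    """Hypothetical stand-in for the pronunciation information storage unit 75."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # fetch is an assumed callable that downloads one item by key.
        self._fetch = fetch
        self._data: Dict[str, bytes] = {}
        self._digests: Dict[str, str] = {}

    def download(self, key: str) -> None:
        """Download an item and remember its digest for later damage checks."""
        payload = self._fetch(key)
        self._data[key] = payload
        self._digests[key] = hashlib.sha256(payload).hexdigest()

    def is_damaged(self, key: str) -> bool:
        """Treat a missing item or a digest mismatch as damage (an assumption)."""
        payload = self._data.get(key)
        if payload is None:
            return True
        return hashlib.sha256(payload).hexdigest() != self._digests.get(key)

    def get(self, key: str) -> bytes:
        """Return the item, downloading it again first if the local copy is damaged."""
        if self.is_damaged(key):
            self.download(key)
        return self._data[key]
```

In this sketch the caller never needs to know whether the local copy was intact; a damaged entry is simply replaced on the next read.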

（第１０実施形態）
上述した画像情報は、インターネット上の所定の記憶場所からダウンロードするようにしてもよい。 (10th Embodiment)
The image information described above may be downloaded from a predetermined storage location on the Internet.

上記構成によれば、所定の画像情報をインターネット上からダウンロードできる高機能な音声対話システムを提供できる。また、所定の画像情報をインターネット上からダウンロードできるので、画像情報記憶部７７に記憶された所定の画像情報が損傷しても、直ぐに所定の画像情報を復旧することができる。 According to the above configuration, it is possible to provide a highly functional voice interaction system that can download predetermined image information from the Internet. Further, since the predetermined image information can be downloaded from the Internet, even if the predetermined image information stored in the image information storage unit 77 is damaged, the predetermined image information can be restored immediately.

（第１１実施形態）
第１実施形態では、被対話体１１が、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７を備えており、サーバ１３がコントローラ５９を備えていたが、被対話体１１が、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７、コントローラ５９の全てを備えていてもよい。 (Eleventh embodiment)
In the first embodiment, the interactee 11 includes the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, and the drive unit 17, and the server 13 includes the controller 59. Alternatively, the interactee 11 may include all of the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, the drive unit 17, and the controller 59.

（第１２実施形態）
第１実施形態では、被対話体１１が、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７を備えており、サーバ１３がコントローラ５９を備えていたが、被対話体１１が、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７を備えており、サーバ１３が駆動部１７、コントローラ５９を備えていてもよい。 (Twelfth embodiment)
In the first embodiment, the interactee 11 includes the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, and the drive unit 17, and the server 13 includes the controller 59. Alternatively, the interactee 11 may include the movable parts 29, 31, 33, 35, 37 and the motors 19, 21, 23, 25, 27, and the server 13 may include the drive unit 17 and the controller 59.

上記構成によれば、可動部１１の動作を司令する指令信号を、サーバ１３に備えられたコントローラ５９から駆動部１７に出力し、この指令信号に基づいて、被対話体１１に備えられた各モータ１９、２１、２３、２５、２７を駆動することで、各可動部２９、３１、３３、３５、３７を可動することができる。 According to the above configuration, a command signal commanding the operation of the movable parts is output from the controller 59 provided in the server 13 to the drive unit 17, and the motors 19, 21, 23, 25, 27 provided in the interactee 11 are driven based on this command signal, so that the movable parts 29, 31, 33, 35, 37 can be moved.

駆動部１７、コントローラ５９がサーバ１３に備えられるので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な駆動部１７、コントローラ５９が故障することがない。 Since the drive unit 17 and the controller 59 are provided in the server 13, the expensive drive unit 17 and the controller 59 do not break down even when the interactee 11 is dropped or submerged in a puddle.

また、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７、コントローラ５９のすべてが被対話体１１に備えられる場合に比べると、被対話体１１を小さく、軽くすることができ、被対話体１１の持ち運びを容易にすることができる。 Further, compared with the case where the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, the drive unit 17, and the controller 59 are all provided in the interactee 11, the interactee 11 can be made smaller and lighter, which makes it easier to carry.
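The command path of the twelfth embodiment (controller 59 to drive unit 17 to the motors) can be sketched as below. This is a minimal illustration, not the specification's implementation: the `Command` message format, the class names, and the idea of accumulating a position per motor are assumptions. The pairing of each movable part with a motor follows the names in the list of symbols (for example, upper arm part 29 with upper arm motor 19).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Motor assumed to drive each movable part, paired by name in the list of symbols.
MOTOR_FOR_PART: Dict[int, int] = {29: 19, 31: 21, 33: 23, 35: 25, 37: 27}


@dataclass(frozen=True)
class Command:
    """Hypothetical command signal: which movable part to move and by how much."""
    part: int
    amount: float


@dataclass
class DriveUnit:
    """Stand-in for drive unit 17: turns a command into motor motion."""
    positions: Dict[int, float] = field(
        default_factory=lambda: {motor: 0.0 for motor in MOTOR_FOR_PART.values()}
    )

    def apply(self, command: Command) -> int:
        motor = MOTOR_FOR_PART[command.part]
        self.positions[motor] += command.amount
        return motor  # reference numeral of the motor that was driven


class Controller:
    """Stand-in for controller 59; send models the wired or wireless link."""

    def __init__(self, send: Callable[[Command], int]) -> None:
        self._send = send

    def move(self, part: int, amount: float) -> int:
        if part not in MOTOR_FOR_PART:
            raise ValueError(f"unknown movable part: {part}")
        return self._send(Command(part, amount))
```

Because the controller only depends on the `send` callable, the same sketch covers the controller sitting in the server or in the interactee; only the transport behind `send` would differ.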

（第１３実施形態）
第１実施形態では、被対話体１１に各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が備えられ、サーバ１３にコントローラ５９が備えられたが、これに替わり、可動ユニット１５が各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７、コントローラ５９の全てを備えていてもよい。 (13th Embodiment)
In the first embodiment, the interactee 11 is provided with the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, and the drive unit 17, and the server 13 is provided with the controller 59. Instead, the movable unit 15 may include all of the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, the drive unit 17, and the controller 59.

上記構成によれば、可動ユニット１５が被対話体１１およびサーバ１３とは別体に設けられ、被対話体と有線及び無線のいずれかで接続されて可動することができる。 According to the above configuration, the movable unit 15 is provided separately from the interactee 11 and the server 13, and can be moved by being connected to the interactee either by wire or wirelessly.

各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７、コントローラ５９が被対話体１１と別体に設けられているので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７、コントローラ５９が故障することがない。 Since the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, the drive unit 17, and the controller 59 are provided separately from the interactee 11, these expensive components do not break down even when the interactee 11 is dropped or submerged in a puddle.

（第１４実施形態）
第１実施形態では、被対話体１１に各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が備えられ、サーバ１３にコントローラ５９が備えられたが、これに替わり、可動ユニット１５に各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が備えられていてもよい。上記の場合、コントローラ５９が、被対話体１１およびサーバ１３のいずれかに備えられていてもよい。 (14th Embodiment)
In the first embodiment, the interactee 11 is provided with the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, and the drive unit 17, and the server 13 is provided with the controller 59. Instead, the movable unit 15 may be provided with the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, and the drive unit 17. In that case, the controller 59 may be provided in either the interactee 11 or the server 13.

上記構成によれば、各可動部２９、３１、３３、３５、３７の動作を司令する指令信号を、被対話体１１およびサーバ１３のいずれかに備えられたコントローラ５９から駆動部に出力し、この指令信号に基づいて、可動ユニット１５に備えられた各モータ１９、２１、２３、２５、２７を駆動することで、各可動部２９、３１、３３、３５、３７を可動することができる。 According to the above configuration, a command signal for commanding the operation of each of the movable units 29, 31, 33, 35, and 37 is output from the controller 59 provided in either the interactee 11 or the server 13 to the drive unit. By driving the motors 19, 21, 23, 25, 27 provided in the movable unit 15 based on the command signal, the movable parts 29, 31, 33, 35, 37 can be moved.

各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が被対話体１１と別体に備えられているので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が故障することがない。 Since the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, and the drive unit 17 are provided separately from the interactee 11, these expensive components do not break down even when the interactee 11 is dropped or submerged in a puddle.

（第１５実施形態）
第１実施形態では、被対話体１１に各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７、駆動部１７が備えられ、サーバ１３にコントローラ５９が備えられたが、これに替わり、可動ユニット１５が、各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７を備えており、駆動部１７が、被対話体１１およびサーバ１３のいずれかに備えられており、コントローラ５９が、被対話体１１およびサーバ１３のいずれかに備えられていてもよい。 (Fifteenth embodiment)
In the first embodiment, the interactee 11 is provided with the movable parts 29, 31, 33, 35, 37, the motors 19, 21, 23, 25, 27, and the drive unit 17, and the server 13 is provided with the controller 59. Instead, the movable unit 15 may include the movable parts 29, 31, 33, 35, 37 and the motors 19, 21, 23, 25, 27, the drive unit 17 may be provided in either the interactee 11 or the server 13, and the controller 59 may be provided in either the interactee 11 or the server 13.

上記構成によれば、少なくとも各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７に備えられているので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な各可動部２９、３１、３３、３５、３７、各モータ１９、２１、２３、２５、２７が故障することがない。 According to the above configuration, at least the movable parts 29, 31, 33, 35, 37 and the motors 19, 21, 23, 25, 27 are provided in the movable unit 15, so these expensive components do not break down even when the interactee 11 is dropped or submerged in a puddle.

（第１６実施形態）
被対話体１１と可動ユニット１５が取付自在に構成されていてもよい。 (Sixteenth embodiment)
The interactee 11 and the movable unit 15 may be configured to be freely attachable.

これによれば、可動ユニット１５を被対話体１１に取り付けることができるので、被対話体１１が可動ユニット１５と別体に構成される場合と、被対話体１１が可動ユニット１５と一体に構成される場合の２つの構成を使い分けて使用することができる。 According to this, since the movable unit 15 can be attached to the interactee 11, the system can be used selectively in two configurations: one in which the interactee 11 is separate from the movable unit 15, and one in which the interactee 11 is integrated with the movable unit 15.

（第１７実施形態）
第１実施形態では、所定の画像を表示する画像モニタ７９ａがサーバ１３に設けられたが、所定の画像を表示する画像モニタ７９ａが被対話体１１と一体に設けられていてもよい。 (17th Embodiment)
In the first embodiment, the image monitor 79 a that displays a predetermined image is provided in the server 13, but the image monitor 79 a that displays a predetermined image may be provided integrally with the interacting body 11.

（第１８実施形態）
第１実施形態では、画像情報記憶部７７がサーバ１３に搭載されたが、画像情報記憶部７７が被対話体１１に搭載されてもよく、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体１１が自ら所定の画像を表示する場合のいずれかに、画像情報記憶部７７から所定の画像情報を読み出して、画像モニタ７９ａに表示するようにしてもよい。 (Eighteenth embodiment)
In the first embodiment, the image information storage unit 77 is mounted on the server 13, but it may instead be mounted on the interactee 11. Predetermined image information may then be read from the image information storage unit 77 and displayed on the image monitor 79a in any of the following cases: when a person requests the predetermined image information via the microphone 39, when a person permits the predetermined image information via the microphone 39, or when the interactee 11 displays a predetermined image on its own initiative using the predetermined image information.

上記構成によれば、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を得ることができる高機能な音声対話システムを提供できる。また、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を画像情報記憶部７７から得て、画像モニタ７９ａに表示する高度なユーザインターフェースを実現することができる。 According to the above configuration, it is possible to provide a highly functional voice interaction system that can obtain predetermined image information in any of the following cases: when a person requests the predetermined image information via the microphone 39, when a person permits the predetermined image information via the microphone 39, or when the interactee displays a predetermined image on its own initiative using the predetermined image information. In any of these cases, an advanced user interface can also be realized that obtains the predetermined image information from the image information storage unit 77 and displays it on the image monitor 79a.
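The three display conditions above (request, permission, or the interactee acting on its own initiative) can be sketched as a small gate in front of the image monitor. The names `Trigger` and `display_image`, and the use of a dictionary and a list to stand in for the image information storage unit 77 and the image monitor 79a, are assumptions made only for illustration.

```python
from enum import Enum, auto
from typing import Dict, List, Optional


class Trigger(Enum):
    """The three cases named in the text in which an image is displayed."""
    REQUESTED = auto()   # a person requested the image via the microphone
    PERMITTED = auto()   # a person permitted the image via the microphone
    AUTONOMOUS = auto()  # the interactee displays the image on its own initiative


def display_image(
    trigger: Optional[Trigger],
    image_id: str,
    storage: Dict[str, bytes],
    monitor: List[bytes],
) -> bool:
    """Read image_id from storage and show it on the monitor if a trigger applies.

    storage stands in for the image information storage unit 77 and monitor for
    the image monitor 79a. Returns True only if something was displayed.
    """
    if trigger is None:
        return False  # none of the three cases applies
    image = storage.get(image_id)
    if image is None:
        return False  # nothing stored under this identifier
    monitor.append(image)
    return True
```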

（第１９実施形態）
第１実施形態では、画像モニタ７９ａが被対話体１１、サーバ１３、可動ユニット１５のいずれとも別体に設けられたが、画像モニタ７９ａが被対話体１１および可動ユニット１５のいずれかに設けられて、被対話体１１、サーバ１３、可動ユニット１５の少なくとも１つに有線及び無線のいずれかで接続され、所定の画像情報が予め記憶された画像情報記憶部７７が、被対話体１１、サーバ１３、可動ユニット１５のいずれかに搭載されており、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体１１が自ら所定の画像を表示する場合のいずれかに、画像情報記憶部７７から所定の画像情報を読み出して、画像モニタ７９ａに表示してもよい。 (Nineteenth embodiment)
In the first embodiment, the image monitor 79a is provided separately from all of the interactee 11, the server 13, and the movable unit 15. Alternatively, the image monitor 79a may be provided in either the interactee 11 or the movable unit 15 and connected to at least one of the interactee 11, the server 13, and the movable unit 15 by wire or wirelessly, and the image information storage unit 77, in which predetermined image information is stored in advance, may be mounted on any of the interactee 11, the server 13, and the movable unit 15. Predetermined image information may then be read from the image information storage unit 77 and displayed on the image monitor 79a in any of the following cases: when a person requests the predetermined image information via the microphone 39, when a person permits the predetermined image information via the microphone 39, or when the interactee 11 displays a predetermined image on its own initiative using the predetermined image information.

上記構成によれば、画像モニタ７９ａがサーバ１３、可動ユニット１５のいずれかに搭載されている場合には、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価な画像モニタ７９ａを損傷させることがない。また、画像情報記憶部７７がサーバ１３、可動ユニット１５のいずれかに搭載されている場合には、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、画像情報記憶部７７の画像情報を損傷させることがない。なお、画像モニタ７９ａが被対話体１１に設けられていてもよく、画像情報記憶部７７が被対話体１１に設けられていてもよい。 According to the above configuration, when the image monitor 79a is mounted on either the server 13 or the movable unit 15, an expensive image is obtained even when the interactee 11 is dropped or submerged in a puddle. The monitor 79a is not damaged. Further, when the image information storage unit 77 is mounted on either the server 13 or the movable unit 15, the image information storage unit 77 even when the interactee 11 is dropped or submerged in a puddle. The image information is not damaged. Note that the image monitor 79 a may be provided in the interactee 11, and the image information storage unit 77 may be provided in the interactee 11.

また、上記構成によれば、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体１１が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を得ることができる高機能な音声対話システムを提供できる。また、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を画像情報記憶部７７から得て、画像モニタ７９ａに表示する高度なユーザインターフェースを実現することができる。 Further, according to the above configuration, it is possible to provide a highly functional voice interaction system that can obtain predetermined image information in any of the following cases: when a person requests the predetermined image information via the microphone 39, when a person permits the predetermined image information via the microphone 39, or when the interactee 11 displays a predetermined image on its own initiative using the predetermined image information. In any of these cases, an advanced user interface can also be realized that obtains the predetermined image information from the image information storage unit 77 and displays it on the image monitor 79a.

（第１８実施形態）
第１実施形態では、画像情報記憶部７７が、サーバ１３に搭載されていたが、被対話体１１、可動ユニット１５のいずれかに搭載され、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体１１が自ら所定の画像を表示する場合のいずれかに、画像情報記憶部７７から所定の画像情報を読み出して、画像モニタ７９ａに表示してもよい。 (Eighteenth embodiment)
In the first embodiment, the image information storage unit 77 is mounted on the server 13, but it may instead be mounted on either the interactee 11 or the movable unit 15. Predetermined image information may then be read from the image information storage unit 77 and displayed on the image monitor 79a in any of the following cases: when a person requests the predetermined image information via the microphone 39, when a person permits the predetermined image information via the microphone 39, or when the interactee 11 displays a predetermined image on its own initiative using the predetermined image information.

上記構成によれば、所定の画像を表示する画像モニタ７９ａが被対話体１１および可動ユニット１５のいずれとも別体に設けられているので、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、画像情報記憶部７７の画像情報を損傷させることがない。 According to the above configuration, the image monitor 79a for displaying a predetermined image is provided separately from both the interacting body 11 and the movable unit 15, so that the interacting body 11 is dropped or submerged in a puddle. Even when the image information is stored, the image information in the image information storage unit 77 is not damaged.

また、上記構成によれば、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を得ることができる高機能な音声対話システムを提供できる。また、人がマイク３９を介して所定の画像情報を要求した場合、人がマイク３９を介して所定の画像情報を許可した場合、所定の画像情報を用いて被対話体が自ら所定の画像を表示する場合のいずれかに、所定の画像情報を画像情報記憶部７７から得て、画像モニタ７９ａに表示する高度なユーザインターフェースを実現することができる。 Further, according to the above configuration, it is possible to provide a highly functional voice interaction system that can obtain predetermined image information in any of the following cases: when a person requests the predetermined image information via the microphone 39, when a person permits the predetermined image information via the microphone 39, or when the interactee displays a predetermined image on its own initiative using the predetermined image information. In any of these cases, an advanced user interface can also be realized that obtains the predetermined image information from the image information storage unit 77 and displays it on the image monitor 79a.

（第１９実施形態）
上述した画像情報記憶部７７がインターネットに接続自在に構成されており、画像情報がインターネット上の所定の記憶場所からダウンロード自在であってもよい。 (Nineteenth embodiment)
The image information storage unit 77 described above may be configured to be connectable to the Internet, and the image information may be downloaded from a predetermined storage location on the Internet.

（第２０実施形態）
第１実施形態では、ＣＣＤイメージセンサ４５ａが被対話体１１と一体に構成されていたが、ＣＣＤイメージセンサ４５ａが被対話体１１と別体に構成され、ＣＣＤイメージセンサ４５ａにより撮像された撮像データから所定の対象物を認識する画像認識手段が被対話体１１およびサーバ１３のいずれかに搭載されていてもよい。 (20th embodiment)
In the first embodiment, the CCD image sensor 45a is configured integrally with the interactee 11. Alternatively, the CCD image sensor 45a may be configured separately from the interactee 11, and image recognition means for recognizing a predetermined object from the imaging data captured by the CCD image sensor 45a may be mounted on either the interactee 11 or the server 13.

これによれば、ＣＣＤイメージセンサ４５ａにより撮像された撮像データから所定の対象物を認識することができる。また、ＣＣＤイメージセンサ４５ａを備えた高機能な音声対話システムを提供できる。 According to this, it is possible to recognize a predetermined object from the imaging data captured by the CCD image sensor 45a. In addition, it is possible to provide a highly functional voice dialogue system including the CCD image sensor 45a.

（第２１実施形態）
人を含む所定の対象物を撮像自在なＣＣＤカメラ４５ａが可動ユニット１５に設けられていてもよい。 (21st Embodiment)
The movable unit 15 may be provided with a CCD camera 45a capable of imaging a predetermined object including a person.

上記構成によれば、被対話体１１を落下させた場合、あるいは水たまりに水没させた場合でも、高価なＣＣＤカメラ４５ａを損傷させることがない。 According to the above configuration, the expensive CCD camera 45a is not damaged even if the interactee 11 is dropped or submerged in a puddle.

（第２２実施形態）
ＣＣＤカメラ４５ａにより撮像された撮像データから所定の対象物を認識する画像モニタ７９ａが被対話体１１、可動ユニット１５のいずれか１つに搭載されていてもよい。 (Twenty-second embodiment)
An image monitor 79a for recognizing a predetermined object from image data captured by the CCD camera 45a may be mounted on any one of the interacting body 11 and the movable unit 15.

上記構成によれば、画像モニタ７９ａが可動ユニット１５に搭載されている場合には、高価な画像モニタ７９ａを損傷させることがない。なお、上記のように、画像モニタ７９ａが被対話体１１に搭載されていてもよい。 According to the above configuration, when the image monitor 79a is mounted on the movable unit 15, the expensive image monitor 79a is not damaged. As described above, the image monitor 79a may be mounted on the interactee 11.

（第２３実施形態）
人を含む所定の対象物を撮像自在なＣＣＤカメラ４５ａが被対話体１１および可動ユニット１５のいずれとも別体に設けられて、被対話体１１、サーバ１３、可動ユニット１５の少なくとも１つに有線及び無線のいずれかで接続され、ＣＣＤカメラ４５ａにより撮像された撮像データから所定の対象物を認識する画像モニタ７９ａが被対話体１１、サーバ１３、可動ユニット１５の少なくとも１つに搭載されていてもよい。 (23rd Embodiment)
A CCD camera 45a capable of imaging a predetermined object including a person is provided separately from both the interacting body 11 and the movable unit 15, and wired to at least one of the interacting body 11, the server 13, and the movable unit 15. And an image monitor 79a that is connected either wirelessly or for recognizing a predetermined object from image data captured by the CCD camera 45a is mounted on at least one of the interacting body 11, the server 13, and the movable unit 15. Also good.

（第２４実施形態）
ＣＣＤカメラ４５ａが、テーブルゲームの進行状況を撮像し、画像モニタ７９ａが、テーブルゲームの進行状況を画像認識するように構成されており、画像認識手段により認識された進行状況から各可動部２９、３１、３３、３５、３７の次の動作を決定する動作決定手段を備えており、各可動部２９、３１、３３、３５、３７が、動作決定手段により決定された次の動作を実行するように、コントローラ５９が駆動部に指令信号を出力してもよい。 (24th Embodiment)
The CCD camera 45a may capture the progress of a table game, and the image monitor 79a may be configured to recognize the progress of the table game from the captured image. Operation determining means may be provided for determining the next operation of each of the movable parts 29, 31, 33, 35, 37 from the progress recognized by the image recognition means, and the controller 59 may output a command signal to the drive unit so that each of the movable parts 29, 31, 33, 35, 37 executes the next operation determined by the operation determining means.

上記構成によれば、ＣＣＤカメラ４５ａ、画像モニタ７９ａによりテーブルゲームの進行状況を撮像、画像認識し、動作決定部７３により各可動部２９、３１、３３、３５、３７の次の動作を決定し、可動部が、動作決定部７３により決定された次の動作を実行して、ゲームを進行する高度な音声対話システムを提供できる。 According to the above configuration, it is possible to provide an advanced voice interaction system in which the CCD camera 45a and the image monitor 79a capture and recognize the progress of the table game, the operation determination unit 73 determines the next operation of each of the movable parts 29, 31, 33, 35, 37, and the movable parts execute that operation to advance the game.
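The capture, recognize, decide, execute cycle described above can be sketched as one turn of a loop. Everything concrete here is a toy assumption: the board is a short string in which `.` marks an empty cell, the recognizer simply parses that string, and the decision rule picks the first empty cell. The specification does not state how recognition or move selection is actually performed.

```python
from typing import Callable, List, Optional

Board = List[Optional[str]]


def recognize(frame: str) -> Board:
    """Toy stand-in for image recognition: '.' is an empty cell, anything else a piece."""
    return [None if cell == "." else cell for cell in frame]


def decide_next_move(board: Board) -> Optional[int]:
    """Toy stand-in for the operation determination unit 73: pick the first empty cell."""
    for index, cell in enumerate(board):
        if cell is None:
            return index
    return None  # board is full, nothing to do


def play_turn(
    capture: Callable[[], str],
    execute: Callable[[int], None],
) -> Optional[int]:
    """One cycle: capture the game state, recognize it, decide, and command a move.

    capture stands in for the CCD camera 45a, and execute for the controller 59
    commanding the movable parts through the drive unit.
    """
    board = recognize(capture())
    move = decide_next_move(board)
    if move is not None:
        execute(move)
    return move
```

Passing `capture` and `execute` as callables keeps the sketch independent of where the camera and the movable parts are physically mounted.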

（第２５実施形態）
各可動部２９、３１、３３、３５、３７のいずれかにＣＣＤイメイジセンサ４５ａを搭載し、コントローラ５９から駆動部１７に指令信号を出力して各可動部２９、３１、３３、３５、３７を可動させ、人を含む所定の対象物を探し出してもよい。 (25th Embodiment)
The CCD image sensor 45a may be mounted on any one of the movable parts 29, 31, 33, 35, 37, and a command signal may be output from the controller 59 to the drive unit 17 to move the movable parts 29, 31, 33, 35, 37 and search for a predetermined object, including a person.

上記構成によれば、人を含む所定の対象物を探し出す高度な音声対話システムを提供できる。 According to the above configuration, it is possible to provide an advanced voice interactive system that searches for a predetermined object including a person.

（第２６実施形態）
第１実施形態乃至第２５実施形態のいずれか１つの音声対話システムにおいて、さらに作動信号によって作動する作動手段を具えた作動体の作動手段に、作動信号を出力する作動信号出力手段が被対話体１１およびサーバ１３の少なくとも１つに搭載されており、作動手段と作動信号出力手段との間が無線および有線のいずれか１つにより接続されていてもよい。 (26th Embodiment)
In the voice interaction system according to any one of the first to twenty-fifth embodiments, operation signal output means may further be mounted on at least one of the interactee 11 and the server 13. The operation signal output means outputs an operation signal to the operation means of an operating body, which is operated by that operation signal, and the operation means and the operation signal output means may be connected either wirelessly or by wire.

上記構成によれば、可動ユニット１５または各可動部２９、３１、３３、３５、３７を用いずに、作動信号出力手段から出力された作動信号により、直接、作動体の作動手段を作動させる高度な音声対話システムを提供できる。 According to the above configuration, it is possible to provide an advanced voice interaction system that directly operates the operation means of the operating body with the operation signal output from the operation signal output means, without using the movable unit 15 or the movable parts 29, 31, 33, 35, 37.

（第２７実施形態）
図７に示すように、上記被対話体１１が携帯電話で構成されていてもよい。 (27th Embodiment)
As shown in FIG. 7, the interactee 11 may be a mobile phone.

上記構成によれば、人と、携帯電話とが音声対話を行う高度な音声対話システムを提供できる。また、上述した第１実施形態乃至第２５実施形態のいずれか１つで説明した作用、効果の「被対話体」を「携帯電話」に置き換えた作用、効果を得ることができる。なお、通話時に、相手側のサーバが音声応答を行うものはあるが、本発明の音声対話システムとは区別するものとする。 According to the above configuration, it is possible to provide an advanced voice interaction system in which a person and a mobile phone perform voice dialogue. In addition, the operations and effects described in any one of the first to twenty-fifth embodiments can be obtained with “interactee” replaced by “mobile phone”. Note that some systems have the other party's server give a voice response during a call, but these are to be distinguished from the voice interaction system of the present invention.

（第２８実施形態）
図８に示すように、上記被対話体１１がコンピュータで構成されていてもよい。 (Twenty-eighth embodiment)
As shown in FIG. 8, the interactee 11 may be composed of a computer.

上記構成によれば、人と、コンピュータとが音声対話を行う高度な音声対話システムを提供できる。また、上述した第１実施形態乃至第２５実施形態のいずれか１つで説明した作用、効果の「被対話体」を「コンピュータ」に置き換えた作用、効果を得ることができる。 According to the above configuration, it is possible to provide an advanced voice interaction system in which a person and a computer have a voice conversation. In addition, it is possible to obtain the operation and effect obtained by replacing the “interacted object” of the operation and effect described in any one of the first to 25th embodiments with “computer”.

（第２９実施形態）
上記被対話体１１がゲーム機で構成されていてもよい。 (Twenty-ninth embodiment)
The interactee 11 may be a game machine.

上記構成によれば、人と、ゲーム機とが音声対話を行う高度な音声対話システムを提供できる。また、上述した第１実施形態乃至第２５実施形態のいずれか１つで説明した作用、効果の「被対話体」を「ゲーム機」に置き換えた作用、効果を得ることができる。 According to the above configuration, it is possible to provide an advanced voice dialogue system in which a person and a game machine perform voice dialogue. In addition, it is possible to obtain the operation and effect obtained by replacing the “interacted body” of the operation and effect described in any one of the first to 25th embodiments described above with a “game machine”.

（第３０実施形態）
上記被対話体１１がカメラで構成されていてもよい。 (Thirtieth embodiment)
The interactee 11 may be composed of a camera.

上記構成によれば、人と、カメラとが音声対話を行う高度な音声対話システムを提供できる。また、上述した第１実施形態乃至第２５実施形態のいずれか１つで説明した作用、効果の「被対話体」を「カメラ」に置き換えた作用、効果を得ることができる。 According to the above configuration, it is possible to provide an advanced voice dialogue system in which a person and a camera perform voice dialogue. In addition, it is possible to obtain the operation and effect obtained by replacing the “interactive body” of the operation and effect described in any one of the first to 25th embodiments described above with “camera”.

（第３１実施形態）
上記被対話体１１がロボットの機体で構成されていてもよい。 (Thirty-first embodiment)
The interactee 11 may be a robot body.

上記構成によれば、人と、ロボットの機体とが音声対話を行う高度な音声対話システムを提供できる。また、上述した第１実施形態乃至第２５実施形態のいずれか１つで説明した作用、効果の「被対話体」を「ロボットの機体」に置き換えた作用、効果を得ることができる。 According to the above configuration, it is possible to provide an advanced voice dialogue system in which a person and a robot body perform voice dialogue. In addition, it is possible to obtain the operation and effect obtained by replacing the “interacted body” of the operation and effect described in any one of the first to 25th embodiments with the “robot body”.

（第３２実施形態）
被対話体１１が人形、ぬいぐるみ、玩具のいずれか１つで構成されていてもよい。 (Thirty-second embodiment)
The interactee 11 may be composed of any one of a doll, a stuffed toy, and a toy.

上記構成によれば、人と、人形、ぬいぐるみ、玩具のいずれか１つとが音声対話を行う高度な音声対話システムを提供できる。また、上述した請求項１乃至請求項３０のいずれか１つの手段の後に説明した作用、効果の「被対話体」を「人形」、「ぬいぐるみ」、「玩具」のいずれかに置き換えた効果を得ることができる。さらに、被対話体１１が人形、ぬいぐるみ、玩具のいずれか１つで構成されているので、親しみがわきやすい。 According to the above configuration, it is possible to provide an advanced voice interaction system in which a person and any one of a doll, a stuffed toy, and a toy perform voice dialogue. In addition, the operations and effects described after the means of any one of claims 1 to 30 can be obtained with “interactee” replaced by “doll”, “stuffed toy”, or “toy”. Furthermore, since the interactee 11 is a doll, a stuffed toy, or a toy, people readily feel an affinity for it.

（その他の実施形態）
上述した可動部の構成は、第１実施のものに限らない。例えば、所定の装置２００の操作手段２００ａの操作方法に適宜適合するものであってもよく、所定のゲーム機を操作する操作方法に適宜適合するものであってもよい。また、１つ以上の可動部のそれぞれが、顔部、目部、口部、頭部、腕部、脚部、尻部のいずれかで構成されていてもよい。また、上述した可動部は、上腕部２９、下腕部３１のみでもよく、旋回部３７に替えて、所定の歩行装置であってもよい。 (Other embodiments)
The configuration of the movable part described above is not limited to that of the first embodiment. For example, it may be appropriately adapted to the operation method of the operation means 200a of the predetermined device 200, or may be appropriately adapted to the operation method of operating a predetermined game machine. Each of the one or more movable parts may be configured by any one of a face part, an eye part, a mouth part, a head part, an arm part, a leg part, and a hip part. Further, the above-described movable part may be only the upper arm part 29 and the lower arm part 31, or may be a predetermined walking device instead of the turning part 37.

また、発音情報記憶部７５が可動ユニット１５に備えられてもよい。また、画像情報記憶部７７が可動ユニット１５に備えられてもよい。 Further, the pronunciation information storage unit 75 may be provided in the movable unit 15. Further, the image information storage unit 77 may be provided in the movable unit 15.

また、音声認識ボード５５に替えて音声対話用プログラムを用いて音声対話の処理をしてもよい。 Further, instead of the voice recognition board 55, a voice dialogue process may be performed using a voice dialogue program.

また、サーバ１３がインターネット回線、電話回線、家庭用ＬＡＮを含むローカルネットワーク回線に接続されていてもよい。また、被対話体１１がインターネット回線、電話回線、家庭用ＬＡＮを含むローカルネットワーク回線に接続されていてもよい。また、上記インターネット回線、電話回線、家庭用ＬＡＮを含むローカルネットワーク回線に、被対話体１１と、サーバ１３とを中継するアクセスポイント、中継自在なコンピュータ、電話のいずれかが接続されており、上記被対話体１１が上記アクセスポイント、中継自在なコンピュータ、電話のいずれかを中継点として上記サーバ１３に接続されてもよい。 The server 13 may be connected to a local network line including an Internet line, a telephone line, and a home LAN. Further, the interactee 11 may be connected to a local network line including an Internet line, a telephone line, and a home LAN. In addition, any of an access point, a relayable computer, and a telephone that relays between the interactee 11 and the server 13 is connected to the local network line including the Internet line, the telephone line, and the home LAN. The interactee 11 may be connected to the server 13 using any one of the access point, the relayable computer, and the telephone as a relay point.

また、被対話体１１およびサーバ１３のいずれかに、被対話体１１が発音する際の感情パラメータを記憶する感情パラメータ記憶部が備えられており、スピーカ４３から発音する際にパラメータを参照し、顔の表情および口形状のうち、パラメータに応じた顔の表情および口形状を選択し、画像表示部に表示するようにしてもよい。上記構成によれば、人と対話を行う場合、所定の説明を行う場合、顔部、目部、口部、頭部、腕部、脚部、尻部のいずれかを可動させて、臨場感を持って発音する高度な音声対話システムを提供できる。また、顔部、目部、口部、頭部、腕部、脚部、尻部のいずれかを可動させて、臨場感を持って発音する高度なユーザインターフェースを実現することができる。なお、上述した音声認識ボード５５、ＣＰＵボード５７、画像認識手段等からなる制御回路の構成は種々あり、特許請求の範囲を満足するものであれば、これに限るものではない。 In addition, either the interactee 11 or the server 13 may be provided with an emotion parameter storage unit that stores an emotion parameter used when the interactee 11 produces sound. When sound is produced from the speaker 43, the parameter may be referred to, and the facial expression and mouth shape corresponding to the parameter may be selected from the available facial expressions and mouth shapes and displayed on the image display unit. According to the above configuration, it is possible to provide an advanced voice interaction system that, when conversing with a person or giving a predetermined explanation, moves any of the face, eyes, mouth, head, arms, legs, and hips and produces sound with a sense of presence. An advanced user interface that moves any of the face, eyes, mouth, head, arms, legs, and hips and produces sound with a sense of presence can also be realized. Note that the control circuit composed of the voice recognition board 55, the CPU board 57, the image recognition means, and the like described above can take various configurations, and is not limited to the one described as long as it satisfies the claims.
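The emotion parameter lookup described above, in which a facial expression and mouth shape are selected according to a stored parameter, can be sketched as a simple table lookup. The emotion labels, the expression and mouth-shape names, and the neutral fallback are all hypothetical; the specification does not list concrete parameter values.

```python
from typing import Dict, NamedTuple


class Face(NamedTuple):
    """What the image display unit would show while sound is produced."""
    expression: str
    mouth_shape: str


# Hypothetical contents of the emotion parameter storage unit.
FACES: Dict[str, Face] = {
    "joy": Face("smile", "wide"),
    "sadness": Face("frown", "narrow"),
}
NEUTRAL = Face("neutral", "closed")  # assumed fallback for unknown parameters


def select_face(emotion: str) -> Face:
    """Select the facial expression and mouth shape for an emotion parameter."""
    return FACES.get(emotion, NEUTRAL)
```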

本発明の第１実施形態による音声対話システムの外観図である。 FIG. 1 is an external view of the voice interaction system according to the first embodiment of the present invention.
本発明の第１実施形態による音声対話システムのブロック図である。 FIG. 2 is a block diagram of the voice interaction system according to the first embodiment of the present invention.
本発明の第１実施形態による可動ユニットの正面断面図を示す。 FIG. 3 is a front sectional view of the movable unit according to the first embodiment of the present invention.
本発明の第３実施形態による音声対話システムの外観図である。 FIG. 4 is an external view of the voice interaction system according to the third embodiment of the present invention.
本発明の第３実施形態による音声対話システムのブロック図である。 FIG. 5 is a block diagram of the voice interaction system according to the third embodiment of the present invention.
本発明の第５実施形態による音声対話システムのブロック図である。 FIG. 6 is a block diagram of the voice interaction system according to the fifth embodiment of the present invention.
本発明の第２７実施形態による音声対話システムの外観図である。 FIG. 7 is an external view of the voice interaction system according to the twenty-seventh embodiment of the present invention.
本発明の第２８実施形態による音声対話システムの外観図である。 FIG. 8 is an external view of the voice interaction system according to the twenty-eighth embodiment of the present invention.

Explanation of symbols

100 ... Voice interaction system
200 ... Predetermined device
200a ... Operation means
11 ... Interactee
13 ... Server (server computer)
15 ... Movable unit
16 ... Speaker (voice output means, dialogue means)
17 ... Drive unit
19 ... Upper arm motor
21 ... Lower arm motor
23 ... Hand motor
25 ... Traveling motor
27 ... Turning motor
29 ... Upper arm (movable part)
31 ... Lower arm (movable part)
33 ... Hand (movable part)
35 ... Traveling part (movable part)
37 ... Turning part (movable part)
39 ... Microphone
41 ... Voice output board (voice recognition means)
43 ... Speaker
45 ... CCD camera
45a ... CCD image sensor (imaging means)
45b ... Signal processing unit
47 ... Command signal reception demodulation means
49 ... Voice signal modulation transmission means
51 ... Pronunciation signal reception demodulation means
53 ... Imaging signal modulation transmission means
55 ... Voice recognition board
57 ... CPU board
59 ... Controller
61 ... Command signal modulation transmission means
63 ... Voice signal reception demodulation means
65 ... Pronunciation signal modulation transmission means
67 ... Imaging signal reception demodulation means
69 ... Image signal modulation transmission means
71 ... Dialogue processing unit (dialogue processing means)
73 ... Operation determination unit
75 ... Pronunciation information storage unit
77 ... Image information storage unit
79 ... Image display device
79a ... Image monitor (image display means)
81 ... Image information reception demodulation means
83 ... Drive unit
85 ... Solenoid
87 ... Pusher
89 ... Command signal reception demodulation means

Claims

A spoken dialogue system comprising:
an interactee provided with voice conversion means for converting a person's voice into a voice signal and sound generation means for producing sound by converting a predetermined pronunciation signal into vibration; and
a server computer provided separately from the interactee and connected to the interactee either by wire or wirelessly,
wherein the server computer comprises voice recognition means for processing the voice signal converted by the voice conversion means and recognizing the person's voice, and dialogue control means for determining a spoken reply corresponding to the voice recognized by the voice recognition means and outputting the predetermined pronunciation signal.
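For illustration only, and not as part of the claims, the division of roles between the interactee and the server computer can be sketched in Python. Everything named here is hypothetical: the `Server` and `Interactee` classes, the `REPLIES` table, and the use of UTF-8 text as a stand-in for the voice signal and the pronunciation signal. Real speech recognition and a wired or wireless link are replaced by direct method calls.

```python
class Server:
    """Stands in for the server computer (voice recognition + dialogue control)."""

    # Hypothetical dialogue table; the source does not specify one.
    REPLIES = {"hello": "hello, how are you?"}

    def recognize(self, voice_signal: bytes) -> str:
        # Placeholder for speech recognition: the "signal" is just UTF-8 text.
        return voice_signal.decode("utf-8").strip().lower()

    def respond(self, voice_signal: bytes) -> bytes:
        # Dialogue control: choose a reply and return it as a "pronunciation signal".
        words = self.recognize(voice_signal)
        reply = self.REPLIES.get(words, "sorry, I did not understand")
        return reply.encode("utf-8")


class Interactee:
    """Stands in for the interactee (voice conversion + sound generation only)."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.spoken = []  # record of what the sound generation means produced

    def hear(self, utterance: str) -> str:
        voice_signal = utterance.encode("utf-8")                  # voice conversion
        pronunciation_signal = self.server.respond(voice_signal)  # sent to the server
        text = pronunciation_signal.decode("utf-8")
        self.spoken.append(text)                                  # sound generation
        return text
```

The point of the sketch is that the interactee holds no recognition or dialogue logic, so the comparatively costly processing sits in the separate server.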

A spoken dialogue system comprising:
an interactee provided with sound generation means for producing sound by converting a predetermined pronunciation signal into vibration;
a server computer provided separately from the interactee and connected to the interactee either by wire or wirelessly; and
voice conversion means that is provided separately from the interactee and the server computer, is connected to either the interactee or the server computer either by wire or wirelessly, and converts a person's voice into a voice signal,
wherein the server computer comprises voice recognition means for processing the voice signal converted by the voice conversion means and recognizing the person's voice, and dialogue control means for determining a spoken reply corresponding to the voice recognized by the voice recognition means and outputting the predetermined pronunciation signal.

A spoken dialogue system comprising:
an interactee provided with voice conversion means for converting a person's voice into a voice signal and sound generation means for producing sound by converting a predetermined pronunciation signal into vibration; and
a server computer provided separately from the interactee and connected to the interactee either by wire or wirelessly,
wherein one of word recognition means, which processes the voice signal converted by the voice conversion means and recognizes the person's words, and dialogue control means, which determines words corresponding to the words recognized by the word recognition means and outputs the predetermined pronunciation signal, is provided in the interactee, and the other is provided in the server computer.

A spoken dialogue system comprising:
an interactee provided with sound generation means for producing sound by converting a predetermined pronunciation signal into vibration;
a server computer provided separately from the interactee and connected to the interactee either by wire or wirelessly; and
voice conversion means that is provided separately from the interactee and the server computer, is connected to either the interactee or the server computer either by wire or wirelessly, and converts a person's voice into a voice signal,
wherein one of voice recognition means, which processes the voice signal converted by the voice conversion means and recognizes the person's voice, and dialogue control means, which determines a spoken reply corresponding to the voice recognized by the voice recognition means and outputs the predetermined pronunciation signal, is provided in the interactee, and the other is provided in the server computer.

The spoken dialogue system according to claim 1, further comprising a pronunciation information storage unit that is mounted on either the interactee or the server computer and is capable of storing predetermined pronunciation information,
wherein the predetermined pronunciation information is stored in the pronunciation information storage unit, and
the predetermined pronunciation information is read out from the pronunciation information storage unit and pronounced by the sound generation means in any of the following cases: when the person requests the predetermined pronunciation information via the voice conversion means, when the person permits the predetermined pronunciation information via the voice conversion means, and when the interactee pronounces the predetermined pronunciation information on its own initiative.

The spoken dialogue system according to claim 5, wherein the pronunciation information storage unit is configured to be connectable to the Internet, and
the pronunciation information can be downloaded from a predetermined storage location on the Internet.

The spoken dialogue system according to claim 1, wherein the interactee comprises:
one or more movable parts;
motors for moving the respective movable parts;
drive units for driving the respective motors; and
a controller that outputs, to the drive units, a command signal commanding operation of the movable parts.

The spoken dialogue system according to claim 1, wherein the interactee comprises:
one or more movable parts;
motors for moving the respective movable parts; and
drive units for driving the respective motors,
and wherein the server computer comprises a controller that outputs an operation command signal to the drive units.

The spoken dialogue system according to claim 1, wherein the interactee comprises:
one or more movable parts; and
motors for moving the respective movable parts,
and wherein the server computer comprises drive units that drive the respective motors and a controller that outputs an operation command signal to the drive units.

The spoken dialogue system according to claim 1, further comprising a movable unit that is provided separately from the interactee and the server computer, is connected to at least one of the interactee and the server computer either by wire or wirelessly, and is movable,
wherein the movable unit comprises:
one or more movable parts;
motors for moving the respective movable parts;
drive units for driving the respective motors; and
a controller that outputs, to the drive units, a command signal commanding operation of the movable parts.

The spoken dialogue system according to claim 1, further comprising a movable unit that is provided separately from the interactee and the server computer and is connected to at least one of the interactee and the server computer either by wire or wirelessly,
wherein the movable unit comprises:
one or more movable parts;
motors for moving the respective movable parts; and
drive units for driving the respective motors,
and wherein one of the interactee and the server computer comprises a controller that outputs an operation command signal to the drive units.

The spoken dialogue system according to claim 1, further comprising a movable unit that is provided separately from the interactee and is connected to at least one of the interactee and the server computer either by wire or wirelessly,
wherein the movable unit comprises:
one or more movable parts; and
motors for moving the respective movable parts,
and wherein drive units for driving the respective motors are provided in either the interactee or the server computer, and
a controller that outputs an operation command signal to the drive units is provided in either the interactee or the server computer.

The spoken dialogue system according to any one of claims 10 to 12, wherein the interactee and the movable unit are configured to be freely attachable.

The spoken dialogue system according to any one of claims 1 to 9, wherein image display means for displaying a predetermined image is provided either integrally with or separately from the interactee,
an image information storage unit in which predetermined image information is stored in advance is mounted on either the interactee or the server computer, and
the predetermined image information is read from the image information storage unit and displayed on the image display means in any of the following cases: when the person requests the predetermined image information via the voice conversion means, when the person permits the predetermined image information via the voice conversion means, and when the interactee displays the predetermined image on its own initiative using the predetermined image information.

The spoken dialogue system according to claim 10, wherein image display means for displaying a predetermined image is further provided in either the interactee or the movable unit and is connected to at least one of the interactee, the server computer, and the movable unit either by wire or wirelessly,
an image information storage unit in which predetermined image information is stored in advance is mounted on any of the interactee, the server computer, and the movable unit, and
the predetermined image information is read from the image information storage unit and displayed on the image display means in any of the following cases: when the person requests the predetermined image information via the voice conversion means, when the person permits the predetermined image information via the voice conversion means, and when the interactee displays the predetermined image on its own initiative using the predetermined image information.

The spoken dialogue system according to claim 10, further comprising image display means for displaying a predetermined image, the image display means being provided separately from both the interactee and the movable unit and connected to at least one of the interactee, the server computer, and the movable unit either by wire or wirelessly,
wherein an image information storage unit in which predetermined image information is stored in advance is mounted on any of the interactee, the server computer, and the movable unit, and
the predetermined image information is read from the image information storage unit and displayed on the image display means in any of the following cases: when the person requests the predetermined image information via the voice conversion means, when the person permits the predetermined image information via the voice conversion means, and when the interactee displays the predetermined image on its own initiative using the predetermined image information.

The spoken dialogue system according to any one of claims 14 to 16, wherein the image information storage unit is configured to be connectable to the Internet, and
the image information can be downloaded from a predetermined storage location on the Internet.

The spoken dialogue system according to claim 14, wherein imaging means capable of imaging a predetermined object including the person is provided either integrally with or separately from the interactee, and
image recognition means for recognizing the predetermined object from the image data captured by the imaging means is mounted on either the interactee or the server computer.

The spoken dialogue system according to any one of claims 10 to 13, 15 and 16, wherein imaging means capable of imaging a predetermined object including the person is provided in either the interactee or the movable unit and is connected to at least one of the interactee, the server computer, and the movable unit either by wire or wirelessly, and
image recognition means for recognizing the predetermined object from the image data captured by the imaging means is mounted on at least one of the interactee, the server computer, and the movable unit.

The spoken dialogue system according to any one of claims 10 to 13, 15 and 16, wherein imaging means capable of imaging a predetermined object including the person is provided separately from both the interactee and the movable unit and is connected to at least one of the interactee, the server computer, and the movable unit either by wire or wirelessly, and
image recognition means for recognizing the predetermined object from the image data captured by the imaging means is mounted on at least one of the interactee, the server computer, and the movable unit.

The spoken dialogue system according to any one of claims 7 to 20, wherein the controller outputs the command signal to the drive unit so that the movable part performs a predetermined communication operation in at least one of a case where a dialogue with the person is carried out and a case where a predetermined explanation is given.

The spoken dialogue system according to claim 13 or 15, wherein the movable part is arranged at a position for operating a predetermined device, and
the controller outputs the command signal to the drive unit so that the predetermined device is operated in any of the following cases: when the person's voice is an instruction to operate the predetermined device, when the person's voice is permission to operate the predetermined device, when the predetermined device is operated by predetermined operation input means, and when an automatic execution program for operating the predetermined device is executed.

The spoken dialogue system according to claim 18, wherein the imaging means images a predetermined object including the person, and the controller outputs the command signal to the drive unit so that a predetermined communication operation is performed with the person based on the result of the image recognition means recognizing the predetermined object.

The spoken dialogue system according to any one of claims 18 to 20, wherein the imaging means images a predetermined object including the person, at least one item of the pronunciation data is selected based on the result of the image recognition means recognizing the predetermined object, and the selected item is pronounced to the person via the sound generation means.

The spoken dialogue system according to claim 18, wherein the imaging means images the operation means of the predetermined device, and
in any of the following cases: when the person's voice is an instruction to operate the predetermined device, when the person's voice is permission to operate the predetermined device, when the predetermined device is operated by predetermined operation input means, and when an automatic execution program for operating the predetermined device is executed, the controller outputs the command signal to the drive unit so that, based on the result of the image recognition means recognizing the position of the operation means, the movable part and the interactee move to the operating position of the operation means and operate the predetermined device.

The spoken dialogue system according to claim 18, wherein the imaging means is configured to image the progress of a table game, and the image recognition means is configured to recognize the imaged progress of the table game,
operation determining means is provided for determining a next operation of the movable part from the progress recognized by the image recognition means, and
the controller outputs the command signal to the drive unit so that the movable part executes the next operation determined by the operation determining means.
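For illustration only, and not as part of the claims, the chain from recognized game progress to a command signal can be sketched in Python. The board encoding, the "first empty square" rule, and the dictionary used as a command signal are all assumptions of this sketch; the source does not specify the game, the decision rule, or the signal format.

```python
from typing import Dict, List, Optional, Tuple

# Recognized progress of the game: "" marks an empty square (hypothetical encoding).
Board = List[List[str]]


def determine_next_operation(board: Board) -> Optional[Tuple[int, int]]:
    """Operation determining means: pick the first empty square, row by row.

    A real system would apply the rules of the game; this rule is a placeholder.
    """
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == "":
                return (r, c)
    return None  # no empty square left


def command_signal(target: Optional[Tuple[int, int]]) -> Dict[str, object]:
    """Controller: turn the chosen square into a command for the drive unit."""
    if target is None:
        return {"action": "idle"}
    row, col = target
    return {"action": "place_piece", "row": row, "col": col}
```

A usage example under the same assumptions: `command_signal(determine_next_operation([["X", "O"], ["", "X"]]))` yields a command to place a piece at row 1, column 0.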

The spoken dialogue system according to claim 18, wherein the controller outputs the command signal to the drive unit to move the movable part so as to search for a predetermined object including the person.

The spoken dialogue system according to any one of the above claims, wherein a tracking program for tracking the predetermined object recognized by the image recognition means is installed in either the interactee or the server computer, and
the controller outputs the command signal to the drive unit to move the movable part so that the imaging means tracks the predetermined object including the person.
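For illustration only, and not as part of the claims, one simple way a tracking program could keep a recognized object in view is proportional steering toward the object's horizontal position in the image. The function name, the gain, the dead zone, and the normalized command range are assumptions of this sketch; the source does not describe the tracking method.

```python
def turn_command(object_x: float, frame_width: float,
                 gain: float = 1.0, dead_zone: float = 0.05) -> float:
    """Return a turning command in [-1, 1]; positive means turn right.

    object_x is the horizontal pixel position of the recognized object.
    """
    if frame_width <= 0:
        raise ValueError("frame_width must be positive")
    half = frame_width / 2
    # Normalized offset from the image center: -1 (left edge) to +1 (right edge).
    offset = (object_x - half) / half
    if abs(offset) < dead_zone:
        return 0.0  # close enough to the center: do not move
    # Proportional command, clamped to the valid range.
    return max(-1.0, min(1.0, gain * offset))
```

In this sketch the controller would send the returned value to the drive unit of the turning motor on every frame, so the camera keeps turning until the object sits near the image center.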

The spoken dialogue system according to any one of claims 1 to 28, wherein operation signal output means, which outputs an operation signal to operation means of an operation body, the operation means being operated by the operation signal, is mounted on at least one of the interactee and the server computer, and
the operation means and the operation signal output means are connected either wirelessly or by wire.

The spoken dialogue system according to any one of claims 1 to 30, wherein the interactee is configured by a mobile phone.

The spoken dialogue system according to any one of claims 1 to 30, wherein the interactee is configured by a computer.

The spoken dialogue system according to any one of claims 1 to 30, wherein the interactee is configured by a game machine.

The spoken dialogue system according to any one of claims 1 to 30, wherein the interactee is configured by a camera.

The spoken dialogue system according to any one of claims 1 to 30, wherein the interactee is configured by a robot.

The spoken dialogue system according to any one of claims 1 to 30, wherein the interactee is configured by any one of a doll, a stuffed toy, and a toy.