JP2002120177A

JP2002120177A - Robot control device, robot control method and recording medium

Info

Publication number: JP2002120177A
Application number: JP2000310987A
Authority: JP
Inventors: Kazuo Ishii; 和夫石井; Hideki Noma; 英樹野間; Jun Hiroi; 順広井; Wataru Onoki; 渡小野木
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-10-11
Filing date: 2000-10-11
Publication date: 2002-04-23

Abstract

PROBLEM TO BE SOLVED: To perform suitable naming. SOLUTION: The voice input after a robot takes an action for expressing 'Please give a name' is recognized as a name, whereby a name is suitably given to the robot.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a robot control device, a robot control method, and a recording medium, and in particular to a robot control device, robot control method, and recording medium suitable for use in, for example, a robot that acts on the basis of speech recognition results from a speech recognition device.

[0002]

PRIOR ART

In recent years, robots that recognize speech uttered by a user and perform certain gestures based on the recognition result have been commercialized, for example as toys. In this specification, "robot" includes stuffed-toy types.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

Such a robot can be given a name and can be made to perform a predetermined action when that name is called.

[0004] However, when a robot is named by having it recognize speech representing the name, the robot sometimes mistook other sound, such as ambient noise occurring before or after the user uttered the name, for the name.

[0005] Also, if a word that is acoustically similar to a word registered in advance for speech recognition was registered as the name, the robot sometimes misrecognized the called name as the similar word. Conversely, when the similar word was uttered, the robot sometimes mistakenly recognized that its name had been called.

[0006] In other words, when a robot is named through speech recognition in this way, there has been a problem that the name cannot be assigned appropriately or recognized accurately.

[0007] The present invention has been made in view of these circumstances, and makes it possible, when a name is given through speech recognition, to assign the name appropriately and to have it recognized accurately.

[0008]

MEANS FOR SOLVING THE PROBLEMS

The robot control device of the present invention comprises: behavior control means for controlling the behavior of a robot so that the robot performs an action indicating that it wants to be given a name; detection means for detecting an optimal phoneme sequence from speech input after the robot's behavior has been controlled by the behavior control means; and registration means for registering the optimal phoneme sequence as a name.

[0009] Transition means for sequentially transitioning the growth state of the robot to predetermined states may further be provided. In that case, the behavior control means may control the behavior of the robot so that, when the transition means has transitioned the growth state of the robot to a predetermined state, the robot performs an action indicating that it wants to be given a name.

[0010] Acquisition means for acquiring a command, input by the user, that instructs the start of name registration processing may further be provided. In that case, the behavior control means may control the behavior of the robot so that, when the acquisition means has acquired the command, the robot performs an action indicating that it wants to be given a name.

[0011] First storage means for storing acoustic models representing acoustic features, and second storage means for storing a word dictionary in which words for speech recognition are registered, may further be provided. In that case, the detection means may detect, as the optimal phoneme sequence, the word model with the highest score for observing the feature parameters of the speech, from among word models formed of acoustic models connected in correspondence with those feature parameters. The registration means may detect, from among word models formed of acoustic models connected in correspondence with the phonemes of words already registered in the word dictionary, the word model with the highest score for observing the feature parameters, and may register the optimal phoneme sequence as the name when the difference between the score of the detected word model and the score of the optimal phoneme sequence is larger than a predetermined threshold.
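The registration test described in paragraph [0011] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `should_register_name`, the use of log-likelihood-style scores, the sign convention of the difference, and the handling of an empty dictionary are all assumptions made for the example.

```python
from typing import Dict


def should_register_name(optimal_score: float,
                         dictionary_scores: Dict[str, float],
                         threshold: float) -> bool:
    """Decide whether the optimal phoneme sequence may be registered as a name.

    optimal_score: score of the optimal phoneme sequence (hypothetical scale).
    dictionary_scores: score of each already-registered word for the same speech.
    threshold: the predetermined threshold from paragraph [0011].
    """
    if not dictionary_scores:
        # Assumption: with no registered words there is nothing to confuse with.
        return True
    # Best-scoring word among those already in the word dictionary.
    best_word_score = max(dictionary_scores.values())
    # Register only if the candidate name is clearly distinct from that word.
    return (optimal_score - best_word_score) > threshold
```

A small score difference means an already registered word explains the speech almost as well as the candidate name, so registering it would invite the confusions described in paragraph [0005].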

[0012] The robot control method of the present invention comprises: a behavior control step of controlling the behavior of a robot so that the robot performs an action indicating that it wants to be given a name; a detection step of detecting an optimal phoneme sequence from speech input after the robot's behavior has been controlled by the behavior control step; and a registration step of registering the optimal phoneme sequence as a name.

[0013] The program of the recording medium of the present invention comprises: behavior control means for controlling the behavior of a robot so that the robot performs an action indicating that it wants to be given a name; detection means for detecting an optimal phoneme sequence from speech input after the robot's behavior has been controlled by the behavior control means; and registration means for registering the optimal phoneme sequence as a name.

[0014] In the robot control device and method and the program of the recording medium of the present invention, the behavior of the robot is controlled so that the robot performs an action indicating that it wants to be given a name, an optimal phoneme sequence is detected from speech input after the robot's behavior has been controlled, and the optimal phoneme sequence is registered as a name.

[0015]

EMBODIMENTS OF THE INVENTION

FIG. 1 shows an example of the external configuration of an embodiment of a robot to which the present invention is applied, and FIG. 2 shows an example of its electrical configuration.

[0016] In this embodiment, the robot has the shape of a four-legged animal such as a dog. Leg units 3A, 3B, 3C, and 3D are connected at the front and rear on the left and right sides of a body unit 2, and a head unit 4 and a tail unit 5 are connected to the front end and the rear end of the body unit 2, respectively.

[0017] The tail unit 5 extends from a base part 5B provided on the upper surface of the body unit 2 so that it can bend or swing with two degrees of freedom.

[0018] The body unit 2 houses a controller 10 that controls the entire robot, a battery 11 that serves as the robot's power source, and an internal sensor unit 14 consisting of a battery sensor 12 and a heat sensor 13.

[0019] The head unit 4 is provided, at respective predetermined positions, with a microphone 15 corresponding to the "ears", a CCD (Charge Coupled Device) camera 16 corresponding to the "eyes", a touch sensor 17 corresponding to the sense of touch, and a speaker 18 corresponding to the "mouth". In addition, an LED (Light Emitting Diode) 19 is provided at the position of the "eyes". A lower jaw part 4A corresponding to the lower jaw of the mouth is attached to the head unit 4 so as to be movable with one degree of freedom, and moving the lower jaw part 4A opens and closes the robot's mouth.

[0020] As shown in FIG. 2, actuators 3AA_1 to 3AA_K, 3BA_1 to 3BA_K, 3CA_1 to 3CA_K, 3DA_1 to 3DA_K, 4A_1 to 4A_L, 5A_1, and 5A_2 are provided at the joints of the leg units 3A to 3D, the connections between each of the leg units 3A to 3D and the body unit 2, the connection between the head unit 4 and the body unit 2, the connection between the head unit 4 and the lower jaw part 4A, the connection between the tail unit 5 and the body unit 2, and so on.

[0021] The microphone 15 in the head unit 4 picks up ambient sound, including utterances from the user, and sends the resulting audio signal to the controller 10. The CCD camera 16 captures images of the surroundings and sends the resulting image signal to the controller 10.

[0022] The touch sensor 17 is provided, for example, on the top of the head unit 4. It detects the pressure applied by physical actions from the user such as "stroking" or "hitting", and sends the detection result to the controller 10 as a pressure detection signal.

[0023] The battery sensor 12 in the body unit 2 detects the remaining charge of the battery 11 and sends the detection result to the controller 10 as a remaining battery level detection signal. The heat sensor 13 detects heat inside the robot and sends the detection result to the controller 10 as a heat detection signal.

[0024] The controller 10 incorporates a CPU (Central Processing Unit) 10A, a memory 10B, and the like, and performs various kinds of processing by having the CPU 10A execute a control program stored in the memory 10B.

[0025] That is, the controller 10 determines the surrounding situation, and whether there has been a command or an approach from the user, based on the audio signal, image signal, pressure detection signal, remaining battery level detection signal, and heat detection signal supplied from the microphone 15, the CCD camera 16, the touch sensor 17, the battery sensor 12, and the heat sensor 13.

[0026] Based on this determination and other information, the controller 10 then decides the next action and, according to that decision, drives the necessary ones among the actuators 3AA_1 to 3AA_K, 3BA_1 to 3BA_K, 3CA_1 to 3CA_K, 3DA_1 to 3DA_K, 4A_1 to 4A_L, 5A_1, and 5A_2. This makes the head unit 4 swing up, down, left, and right, or opens and closes the lower jaw part 4A. It also moves the tail unit 5, or drives the leg units 3A to 3D so that the robot performs actions such as walking.

[0027] As necessary, the controller 10 also generates synthesized sound, or echo back speech as described later, and supplies it to the speaker 18 for output, and turns the LED 19 provided at the position of the robot's "eyes" on or off or makes it blink.

[0028] In this way, the robot acts autonomously based on its surroundings and other conditions.

[0029] Next, FIG. 3 shows an example of the functional configuration of the controller 10 of FIG. 2. The functional configuration shown in FIG. 3 is realized by the CPU 10A executing the control program stored in the memory 10B.

[0030] A sensor input processing unit 50 recognizes specific external states, specific approaches from the user, instructions from the user, and so on, based on the audio signal, image signal, pressure detection signal, and other signals supplied from the microphone 15, the CCD camera 16, the touch sensor 17, and so on. It notifies a model storage unit 51 and an action determination mechanism unit 52 of state recognition information representing the recognition result.

[0031] That is, the sensor input processing unit 50 has a speech recognition unit 50A, which performs speech recognition on the audio signal supplied from the microphone 15. The speech recognition unit 50A notifies the model storage unit 51 and the action determination mechanism unit 52 of its speech recognition result as state recognition information: for example, commands such as "walk", "lie down", and "chase the ball", a name registered as described later, and so on.

[0032] The sensor input processing unit 50 also has an image recognition unit 50B, which performs image recognition processing using the image signal supplied from the CCD camera 16. When that processing detects, for example, "a red round object" or "a plane perpendicular to the ground and at least a predetermined height", the image recognition unit 50B notifies the model storage unit 51 and the action determination mechanism unit 52 of an image recognition result such as "there is a ball" or "there is a wall" as state recognition information.

[0033] The sensor input processing unit 50 further has a pressure processing unit 50C, which processes the pressure detection signal supplied from the touch sensor 17. When that processing detects pressure that is at or above a predetermined threshold and of short duration, the pressure processing unit 50C recognizes that "the head has been touched" and notifies the model storage unit 51 and the action determination mechanism unit 52 of the recognition result as state recognition information.
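The rule in paragraph [0033] can be illustrated with a short sketch. The function name, the numeric defaults, and the label string are hypothetical; the specification only states that pressure at or above a threshold and of short duration is recognized as the head being touched.

```python
from typing import Optional


def classify_touch(pressure: float,
                   duration_s: float,
                   pressure_threshold: float = 0.5,
                   max_duration_s: float = 0.3) -> Optional[str]:
    """Return a state-recognition label for a pressure event, or None.

    pressure_threshold and max_duration_s are illustrative values only.
    """
    # Strong enough and brief enough: recognized as a touch on the head.
    if pressure >= pressure_threshold and duration_s < max_duration_s:
        return "head_touched"
    # Anything else is left unclassified in this sketch.
    return None
```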

[0034] The model storage unit 51 stores and manages an emotion model, an instinct model, and a growth model, which represent the robot's emotional, instinctive, and growth states, respectively.

[0035] The emotion model represents the state (degree) of each emotion, such as "joy", "sadness", "anger", and "fun", as a value within a predetermined range, and changes those values based on the state recognition information from the sensor input processing unit 50, the passage of time, and so on.

[0036] The instinct model represents the state (degree) of each instinctive desire, such as "appetite", "desire for sleep", and "desire for exercise", as a value within a predetermined range, and changes those values based on the state recognition information from the sensor input processing unit 50, the passage of time, and so on.

[0037] The growth model is composed of, for example, an automaton as shown in FIG. 4. In this automaton, growth states are represented by nodes (states) NODE_0 to NODE_G, and growth, that is, a transition of the growth state, is represented by an arc ARC_{g+1} denoting the transition from the node NODE_g corresponding to one growth state to the node NODE_{g+1} corresponding to the next growth state (g = 0, 1, ..., G-1).

[0038] In this embodiment, the growth state transitions from left to right in FIG. 4. Accordingly, in FIG. 4, the leftmost node NODE_0 represents the state of a just-born "newborn", the second node from the left, NODE_1, represents the "infant" state, and the third node from the left, NODE_2, represents the "child" state. Likewise, nodes further to the right represent more advanced growth states, and the rightmost node NODE_G represents the "elderly" state.

[0039] The arc ARC_{g+1}, which represents the transition from a node NODE_g to the node NODE_{g+1} on its right, is assigned a condition (input) P_{tg+1} for that transition to occur, and node transitions (growth) are decided based on this condition. That is, the condition P_{tg+1} set on arc ARC_{g+1} specifies what is required of the outputs of the CCD camera 16, the microphone 15, and the touch sensor 17, the passage of time, and so on, for the transition to occur. When the condition P_{tg+1} is satisfied, the transition from node NODE_g to the node NODE_{g+1} on its right occurs, and the robot grows.
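The growth automaton of paragraphs [0037] to [0039] can be sketched as a linear chain of states with one guard condition per arc. The class name `GrowthModel`, the dictionary-based observation format, and the example conditions are assumptions for illustration; the specification does not define concrete conditions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Observation = Dict[str, float]  # hypothetical summary of sensor outputs and elapsed time


@dataclass
class GrowthModel:
    stages: List[str]                                 # NODE_0 .. NODE_G
    conditions: List[Callable[[Observation], bool]]   # conditions[g] guards ARC_{g+1}
    current: int = 0                                  # index g of the current node

    def step(self, obs: Observation) -> str:
        """Advance one node to the right if the guard on the outgoing arc holds."""
        has_next = self.current < len(self.stages) - 1
        if has_next and self.conditions[self.current](obs):
            self.current += 1
        return self.stages[self.current]


# Example with made-up conditions: elapsed days, then number of strokes received.
model = GrowthModel(
    stages=["newborn", "infant", "child"],
    conditions=[
        lambda o: o.get("days", 0) >= 7,
        lambda o: o.get("strokes", 0) >= 10,
    ],
)
```

Because each node has only one outgoing arc, growth is monotonic: the state never moves back to the left, matching the left-to-right reading of FIG. 4.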

[0040] The model storage unit 51 sends the emotional, instinctive, and growth states represented by the values of the emotion model, instinct model, and growth model, as described above, to the action determination mechanism unit 52 as state information.

[0041] In addition to state recognition information from the sensor input processing unit 50, the model storage unit 51 is supplied by the action determination mechanism unit 52 with action information indicating the robot's current or past actions, for example, "walked for a long time". Even when the same state recognition information is given, the model storage unit 51 generates different state information (the emotional, instinctive, and growth states represented by the values of the emotion model, instinct model, and growth model) depending on the robot's actions indicated by the action information.

[0042] The action determination mechanism unit 52 decides the next action based on the state recognition information from the sensor input processing unit 50, the state information from the model storage unit 51, the passage of time, and so on, and sends the content of the decided action to a posture transition mechanism unit 53 as action command information.

[0043] That is, the action determination mechanism unit 52 manages, as an action model defining the robot's actions, a finite automaton in which the actions the robot can take are associated with states. It transitions the state of this finite automaton based on the state recognition information from the sensor input processing unit 50, the values of the emotion model, instinct model, or growth model in the model storage unit 51, the passage of time, and so on, and decides the action corresponding to the state after the transition as the action to take next.

[0044] The action determination mechanism unit 52 transitions the state when it detects a predetermined trigger. That is, it transitions the state when, for example, the time spent executing the action corresponding to the current state reaches a predetermined duration, when specific state recognition information is received, or when the value of the emotional, instinctive, or growth state indicated by the state information supplied from the model storage unit 51 falls to or below, or rises to or above, a predetermined threshold.

[0045] As described above, the action determination mechanism unit 52 transitions the state of the action model based not only on the state recognition information from the sensor input processing unit 50 but also on the values of the emotion model, instinct model, and growth model in the model storage unit 51. Therefore, even when the same state recognition information is input, the destination of the state transition differs depending on the values of the emotion model, instinct model, and growth model (the state information).

[0046] As a result, when, for example, the state information indicates "not angry" and "not hungry" and the state recognition information indicates "a palm has been held out in front of the eyes", the action determination mechanism unit 52 generates action command information for the action of "giving a paw" in response to the palm being held out, and sends it to the posture transition mechanism unit 53.

[0047] When, for example, the state information indicates "not angry" and "hungry" and the state recognition information indicates "a palm has been held out in front of the eyes", the action determination mechanism unit 52 generates action command information for an action such as "licking the palm" in response to the palm being held out, and sends it to the posture transition mechanism unit 53.

[0048] When, for example, the state information indicates "angry" and the state recognition information indicates "a palm has been held out in front of the eyes", the action determination mechanism unit 52 generates action command information for an action such as "turning its head away", regardless of whether the state information indicates "hungry" or "not hungry", and sends it to the posture transition mechanism unit 53.

[0049] The action determination mechanism unit 52 can also determine, based on the emotional, instinctive, and growth states indicated by the state information supplied from the model storage unit 51, parameters of the action corresponding to the destination state, such as walking speed or the magnitude and speed of limb movements. In that case, action command information including those parameters is sent to the posture transition mechanism unit 53.

[0050] As described above, in addition to action command information for moving the robot's head, limbs, and so on, the action determination mechanism unit 52 also generates action command information for making the robot speak. The latter is supplied to a speech synthesis unit 55 and includes, among other things, text corresponding to the synthesized sound that the speech synthesis unit 55 is to generate. On receiving action command information from the action determination mechanism unit 52, the speech synthesis unit 55 generates synthesized sound based on the text included in it and supplies the sound to the speaker 18 via an output control unit 57 for output. The speaker 18 thus outputs, for example, the robot's cries, various requests to the user such as "I'm hungry", responses to the user's calls such as "What?", and other speech.

[0051] Based on the action command information supplied from the action determination mechanism unit 52, the posture transition mechanism unit 53 generates posture transition information for transitioning the robot's posture from the current posture to the next posture, and sends it to a control mechanism unit 54.

[0052] The postures to which the robot can transition next from its current posture are determined by the physical form of the robot, such as the shape and weight of its body, hands, and feet and the way its parts are connected, and by the mechanisms of the actuators 3AA_1 to 5A_1 and 5A_2, such as the directions and angles in which the joints bend.

[0053] Some next postures can be reached directly from the current posture, and others cannot. For example, a four-legged robot that is sprawled out with its limbs flung wide can transition directly to a prone posture but not directly to a standing posture. Standing requires a two-stage movement: first drawing the limbs in close to the body into a prone posture, and then standing up. There are also postures that cannot be executed safely. For example, a four-legged robot standing on its four legs easily falls over if it tries to raise both front legs in a "banzai" pose.

[0054] For this reason, the posture transition mechanism unit 53 registers in advance the postures that can be reached directly. When the action command information supplied from the action determination mechanism unit 52 indicates a posture that can be reached directly, it sends that action command information as is to the control mechanism unit 54 as posture transition information. When the action command information indicates a posture that cannot be reached directly, the posture transition mechanism unit 53 generates posture transition information that first transitions to another reachable posture and then to the target posture, and sends it to the control mechanism unit 54. This prevents the robot from forcibly attempting a posture it cannot transition to, and from falling over.
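One way to realize the behavior described in paragraphs [0053] and [0054] is to treat the registered direct transitions as a directed graph and search it for a route to the target posture. The sketch below uses breadth-first search; the posture names, the `DIRECT_TRANSITIONS` table, and the choice of search algorithm are assumptions, not details given in the specification.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical table of directly reachable postures, registered in advance.
DIRECT_TRANSITIONS: Dict[str, List[str]] = {
    "sprawled": ["prone"],
    "prone": ["sprawled", "standing"],
    "standing": ["prone"],
}


def plan_posture_path(current: str, target: str,
                      direct: Dict[str, List[str]] = DIRECT_TRANSITIONS
                      ) -> Optional[List[str]]:
    """Return the shortest sequence of postures from current to target, or None."""
    if current == target:
        return [current]
    queue = deque([[current]])
    visited = {current}
    while queue:
        path = queue.popleft()
        for nxt in direct.get(path[-1], []):
            if nxt in visited:
                continue
            if nxt == target:
                return path + [nxt]
            visited.add(nxt)
            queue.append(path + [nxt])
    # No registered route: the posture is unreachable (or unsafe) in this table.
    return None
```

With this table, a request to stand while sprawled yields the two-stage route through the prone posture, and an unregistered posture such as the "banzai" pose yields no route at all.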

[0055] In accordance with the posture transition information from the posture transition mechanism unit 53, the control mechanism unit 54 generates control signals for driving the actuators 3AA_1 to 5A_1 and 5A_2 and sends them to those actuators. The actuators 3AA_1 to 5A_1 and 5A_2 are driven according to the control signals, and the robot acts autonomously.

【００５６】エコーバック部５６は、マイク１５から与
えられ、音声認識部５０Ａで音声認識される音声信号を
監視しており、その音声信号を復唱するような音声（以
下、適宜、エコーバック音声という）を生成して出力す
る。このエコーバック音声は、出力制御部５７を介し
て、スピーカ１８に供給されて出力される。The echo back unit 56 monitors the voice signal that is supplied from the microphone 15 and recognized by the voice recognition unit 50A, and generates and outputs a sound that echoes that voice signal (hereinafter referred to as the echo back sound where appropriate). The echo back sound is supplied to the speaker 18 via the output control unit 57 and output.

【００５７】出力制御部５７には、音声合成部５５から
の合成音のディジタルデータと、エコーバック部５６か
らのエコーバック音声のディジタルデータとが供給され
るようになっており、それらのディジタルデータを、ア
ナログの音声信号にＤ／Ａ変換し、スピーカ１８に供給
して出力させる。また、出力制御部５７は、音声合成部
５５からの合成音と、エコーバック部５６からのエコー
バック音声の、スピーカ１８への出力が競合した場合
に、その競合を調整する。即ち、エコーバック部５６か
らのエコーバック音声の出力は、行動決定機構部５２の
制御にしたがって音声合成部５５が行う合成音の出力と
は独立に行われるようになっており、エコーバック音声
の出力と合成音の出力とは競合する場合がある。そこ
で、出力制御部５７は、その競合の調停を行う。The output control unit 57 is supplied with digital data of the synthesized sound from the voice synthesis unit 55 and digital data of the echo back sound from the echo back unit 56. It D/A-converts these digital data into analog audio signals and supplies them to the speaker 18 for output. When the output of the synthesized sound from the voice synthesis unit 55 and the output of the echo back sound from the echo back unit 56 to the speaker 18 conflict, the output control unit 57 also arbitrates the conflict. That is, the echo back unit 56 outputs the echo back sound independently of the synthesized sound that the voice synthesis unit 55 outputs under the control of the action determination mechanism unit 52, so the two outputs may conflict. The output control unit 57 therefore arbitrates between them.

【００５８】名前登録部５８は、センサ入力処理部５０
（音声認識部５０Ａ、圧力処理部５０Ｃ）、モデル記憶
部５１、行動決定機構部５２、エコーバック部５６、お
よびＬＥＤ１９を制御して、後述する名前登録処理を実
行する。The name registration unit 58 controls the sensor input processing unit 50 (the voice recognition unit 50A and the pressure processing unit 50C), the model storage unit 51, the action determination mechanism unit 52, the echo back unit 56, and the LED 19 to execute the name registration process described later.

【００５９】次に、図５は、図３の音声認識部５０Ａの
構成例を示している。Next, FIG. 5 shows an example of the configuration of the voice recognition unit 50A of FIG. 3.

【００６０】マイク１５からの音声信号は、ＡＤ(Analo
g Digital)変換部２１に供給される。ＡＤ変換部２１で
は、マイク１５からのアナログ信号である音声信号がサ
ンプリング、量子化され、ディジタル信号である音声デ
ータにＡ／Ｄ変換される。この音声データは、特徴抽出
部２２および音声区間検出部２７に供給される。The voice signal from the microphone 15 is supplied to the AD (Analog Digital) converter 21. The AD converter 21 samples and quantizes the voice signal, which is an analog signal from the microphone 15, and A/D-converts it into audio data, which is a digital signal. This audio data is supplied to the feature extraction unit 22 and the voice section detection unit 27.

【００６１】特徴抽出部２２は、そこに入力される音声
データについて、適当なフレームごとに、例えば、ＭＦ
ＣＣ(Mel Frequency Cepstrum Coefficient)分析を行
い、その分析結果を、特徴パラメータ（特徴ベクトル）
として、マッチング部２３に出力する。なお、特徴抽出
部２２では、その他、例えば、線形予測係数、ケプスト
ラム係数、線スペクトル対、所定の周波数帯域ごとのパ
ワー（フィルタバンクの出力）等を、特徴パラメータと
して抽出することが可能である。The feature extraction unit 22 performs, for example, MFCC (Mel Frequency Cepstrum Coefficient) analysis on the audio data input to it for each appropriate frame, and outputs the analysis result to the matching unit 23 as feature parameters (a feature vector). The feature extraction unit 22 can also extract, for example, linear prediction coefficients, cepstrum coefficients, line spectrum pairs, or the power in each predetermined frequency band (the output of a filter bank) as feature parameters.

【００６２】マッチング部２３は、特徴抽出部２２から
の特徴パラメータを用いて、音響モデル記憶部２４、辞
書記憶部２５、および文法記憶部２６を必要に応じて参
照しながら、マイク１５に入力された音声（入力音声）
を、例えば、連続分布ＨＭＭ(Hidden Markov Model)法
に基づいて音声認識する。Using the feature parameters from the feature extraction unit 22, and referring as necessary to the acoustic model storage unit 24, the dictionary storage unit 25, and the grammar storage unit 26, the matching unit 23 recognizes the speech input to the microphone 15 (the input speech) based on, for example, the continuous-distribution HMM (Hidden Markov Model) method.

【００６３】即ち、音響モデル記憶部２４は、音声認識
する音声の言語における個々の音素や音節などの音響的
な特徴を表す音響モデルを記憶している。ここでは、連
続分布ＨＭＭ法に基づいて音声認識を行うので、音響モ
デルとしては、ＨＭＭ(Hidden Markov Model)が用いら
れる。That is, the acoustic model storage unit 24 stores acoustic models representing acoustic features such as individual phonemes and syllables in the language of the speech to be recognized. Here, since speech recognition is performed based on the continuous distribution HMM method, HMM (Hidden Markov Model) is used as an acoustic model.

【００６４】辞書記憶部２５は、図６に示すように、認
識対象の各単語について、その発音に関する情報（音韻
情報）が記述された単語辞書を記憶している。文法記憶
部２６は、辞書記憶部２５の単語辞書に登録されている
各単語が、どのように連鎖する（つながる）かを記述し
た文法規則を記憶している。ここで、文法規則として
は、例えば、文脈自由文法（ＣＦＧ）や、統計的な単語
連鎖確率（Ｎ−ｇｒａｍ）などに基づく規則を用いるこ
とができる。As shown in FIG. 6, the dictionary storage unit 25 stores a word dictionary in which information on pronunciation (phonological information) is described for each word to be recognized. The grammar storage unit 26 stores grammar rules describing how each word registered in the word dictionary of the dictionary storage unit 25 is linked (connected). Here, as the grammar rule, for example, a rule based on a context-free grammar (CFG), a statistical word chain probability (N-gram), or the like can be used.

【００６５】マッチング部２３は、辞書記憶部２５の単
語辞書を参照することにより、音響モデル記憶部２４に
記憶されている音響モデルを接続することで、単語の音
響モデル（単語モデル）を構成する。By referring to the word dictionary in the dictionary storage unit 25, the matching unit 23 connects the acoustic models stored in the acoustic model storage unit 24 to form an acoustic model of each word (a word model).
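
As a rough illustration of how a word model can be assembled from phoneme-level acoustic models via the word dictionary, the sketch below concatenates per-phoneme state lists in dictionary order. The function names, the three-states-per-phoneme layout, and the example dictionary entry are assumptions made for illustration; the specification does not give these details.

```python
def phoneme_model(phoneme, n_states=3):
    """Stand-in for one phoneme HMM: a left-to-right list of state labels."""
    return [f"{phoneme}{i}" for i in range(n_states)]


def build_word_model(word, dictionary):
    """Concatenate phoneme models in the order given by the word's dictionary entry."""
    states = []
    for phoneme in dictionary[word]:
        states.extend(phoneme_model(phoneme))
    return states


# Hypothetical dictionary entry mapping a word to its phoneme sequence.
example_dictionary = {"ote": ["o", "t", "e"]}
example_word_model = build_word_model("ote", example_dictionary)
```

A real recognizer would attach transition probabilities and output distributions to each state; only the connection order is shown here.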

【００６６】さらに、マッチング部２３は、幾つかの単
語モデルを、文法記憶部２６に記憶された文法規則を参
照することにより接続し、そのようにして接続された単
語モデルを用いて、特徴パラメータに基づき、連続分布
ＨＭＭ法によって、マイク１５に入力された音声を認識
する。即ち、マッチング部２３は、特徴抽出部２２が出
力する時系列の特徴パラメータが観測されるスコア（尤
度）が最も高い単語モデルの系列を検出し、その単語モ
デルの系列に対応する単語列の音韻情報（読み）を、音
声の認識結果として出力する。Further, the matching unit 23 connects several word models by referring to the grammar rules stored in the grammar storage unit 26 and, using the word models connected in this way, recognizes the speech input to the microphone 15 by the continuous-distribution HMM method based on the feature parameters. That is, the matching unit 23 detects the sequence of word models with the highest score (likelihood) of the time series of feature parameters output by the feature extraction unit 22 being observed, and outputs the phoneme information (reading) of the word string corresponding to that sequence of word models as the speech recognition result.

【００６７】以上のようにして出力される、マイク１５
に入力された音声の認識結果は、状態認識情報として、
モデル記憶部５１および行動決定機構部５２に出力され
る。The recognition result of the speech input to the microphone 15, output as described above, is supplied as state recognition information to the model storage unit 51 and the action determination mechanism unit 52.

【００６８】音声区間検出部２７は、ＡＤ変換部２１か
らの音声データについて、特徴抽出部２２がＭＦＣＣ分
析を行うのと同様のフレームごとに、例えば、パワーを
算出している。さらに、音声区間検出部２７は、各フレ
ームのパワーを、所定の閾値と比較し、その閾値以上の
パワーを有するフレームで構成される区間を、ユーザの
音声が入力されている音声区間として検出する。そし
て、音声区間検出部２７は、検出した音声区間を、特徴
抽出部２２とマッチング部２３に供給しており、特徴抽
出部２２とマッチング部２３は、音声区間のみを対象に
処理を行う。The voice section detection unit 27 calculates, for example, the power of the audio data from the AD converter 21 for each frame, using the same frames as those on which the feature extraction unit 22 performs MFCC analysis. The voice section detection unit 27 then compares the power of each frame with a predetermined threshold and detects a section made up of frames whose power is equal to or greater than the threshold as a voice section in which the user's speech is being input. The voice section detection unit 27 supplies the detected voice section to the feature extraction unit 22 and the matching unit 23, which perform their processing only on the voice section.
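
The power-threshold detection described above can be sketched as follows. This is a minimal illustration, assuming non-overlapping frames and mean squared amplitude as the power measure; the function names and the frame layout are not taken from the specification.

```python
def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def detect_voice_sections(samples, frame_len, threshold):
    """Return (first_frame, last_frame) pairs for runs of frames with power >= threshold."""
    sections = []
    start = None
    powers = frame_power(samples, frame_len)
    for i, power in enumerate(powers):
        if power >= threshold and start is None:
            start = i                        # a voice section begins
        elif power < threshold and start is not None:
            sections.append((start, i - 1))  # the voice section ended on the previous frame
            start = None
    if start is not None:
        sections.append((start, len(powers) - 1))
    return sections
```

Downstream stages would then process only the frames inside the returned index ranges.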

【００６９】図７は、図３のエコーバック部５６の構成
例を示している。FIG. 7 shows an example of the configuration of the echo back unit 56 of FIG. 3.

【００７０】マイク１５からの音声信号は、ＡＤ変換部
４１に供給される。ＡＤ変換部４１では、マイク１５か
らのアナログ信号である音声信号がサンプリング、量子
化され、ディジタル信号である音声データにＡ／Ｄ変換
される。この音声データは、韻律分析部４２および音声
区間検出部４６に供給される。The voice signal from the microphone 15 is supplied to the AD converter 41. The AD converter 41 samples and quantizes the voice signal, which is an analog signal from the microphone 15, and A/D-converts it into audio data, which is a digital signal. This audio data is supplied to the prosody analysis unit 42 and the voice section detection unit 46.

【００７１】韻律分析部４２は、そこに入力される音声
データを、適当なフレームごとに音響分析することによ
り、例えば、ピッチ周波数やパワー等といった音声デー
タの韻律情報を抽出する。この韻律情報は、音生成部４
３に供給される。The prosody analysis unit 42 acoustically analyzes the audio data input to it for each appropriate frame, thereby extracting prosody information of the audio data such as pitch frequency and power. This prosody information is supplied to the sound generation unit 43.

【００７２】音生成部４３は、韻律分析部４２からの韻
律情報に基づいて、韻律を制御したエコーバック音声を
生成する。The sound generation unit 43 generates an echo-back sound whose prosody is controlled based on the prosody information from the prosody analysis unit 42.

【００７３】即ち、音生成部４３は、韻律分析部４２か
らの韻律情報と同一の韻律を有する、音韻のない音声
（以下、適宜、無音韻音声という）を、例えば、サイン
(sin)波を重畳することにより生成し、エコーバック音
声として、出力部４４に供給する。That is, the sound generation unit 43 generates, for example by superimposing sine waves, a sound without phonemes (hereinafter referred to as phoneme-free sound where appropriate) that has the same prosody as the prosody information from the prosody analysis unit 42, and supplies it to the output unit 44 as the echo back sound.
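
One way to picture the generation of a phoneme-free sound from prosody alone is to superimpose a few sine waves whose fundamental follows the per-frame pitch and whose amplitude follows the per-frame power. The sketch below is only an illustration under assumptions: the function name, the equal-amplitude harmonics, and the parameter `n_harmonics` are hypothetical and are not the method of the references cited in this specification.

```python
import math


def synthesize_echo_back(pitches_hz, powers, frame_len, sample_rate, n_harmonics=3):
    """Generate samples whose pitch and mean power follow per-frame prosody values."""
    samples = []
    phase = 0.0  # carried across frames so the waveform stays continuous
    for f0, power in zip(pitches_hz, powers):
        # With n equal-amplitude harmonics, mean power is n * amp**2 / 2.
        amp = math.sqrt(2.0 * power / n_harmonics)
        step = 2.0 * math.pi * f0 / sample_rate
        for _ in range(frame_len):
            value = sum(math.sin(k * phase) for k in range(1, n_harmonics + 1))
            samples.append(amp * value)
            phase += step
    return samples
```

Because only pitch and power are used, the result carries the intonation of the input but none of its phonemes.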

【００７４】なお、韻律情報としての、例えば、ピッチ
周波数とパワーから音声データを生成する方法について
は、例えば、鈴木、石井、竹内、「非分節音による反響
的な模倣とその心理的影響」、情報処理学会論文誌、vo
l.41,No.5,pp1328-1337,May,2000や、特開2000-181896
号公報等に、その詳細が記載されている。Methods for generating audio data from prosody information such as pitch frequency and power are described in detail in, for example, Suzuki, Ishii, and Takeuchi, "Echoic imitation by non-segmental sounds and its psychological effects", Transactions of the Information Processing Society of Japan, vol. 41, no. 5, pp. 1328-1337, May 2000, and in Japanese Unexamined Patent Application Publication No. 2000-181896.

【００７５】出力部４４は、音生成部４３からのエコー
バック音声のデータを、メモリ４５に記憶させるととも
に、出力制御部５７（図３）に出力する。The output unit 44 stores the echo back sound data from the sound generation unit 43 in the memory 45 and outputs the data to the output control unit 57 (FIG. 3).

【００７６】音声区間検出部４６は、ＡＤ変換部４１か
らの音声データについて、図５の音声区間検出部２７に
おける場合と同様の処理を行うことにより、音声区間を
検出し、韻律分析部４２と音生成部４３に供給する。こ
れにより、韻律分析部４２と音生成部４３では、音声区
間のみを対象に処理が行われる。The voice section detection unit 46 detects voice sections by performing, on the audio data from the AD converter 41, the same processing as the voice section detection unit 27 of FIG. 5, and supplies them to the prosody analysis unit 42 and the sound generation unit 43. As a result, the prosody analysis unit 42 and the sound generation unit 43 perform their processing only on voice sections.

【００７７】なお、図７のＡＤ変換部４１または音声区
間検出部４６と、図５のＡＤ変換部２１または音声区間
検出部２７とは、それぞれ兼用することが可能である。Note that the AD converter 41 and the voice section detection unit 46 of FIG. 7 can each be shared with the AD converter 21 and the voice section detection unit 27 of FIG. 5, respectively.

【００７８】以上のように構成されるエコーバック部５
６では、例えば、図８のフローチャートにしたがったエ
コーバック処理が行われる。In the echo back unit 56 configured as described above, an echo back process is performed, for example, in accordance with the flowchart of FIG. 8.

【００７９】即ち、まず最初に、ステップＳ１１におい
て、音声区間検出部４６が、ＡＤ変換部４１の出力に基
づいて、音声区間であるかどうかを判定し、音声区間で
ないと判定した場合、処理を終了し、再度、ステップＳ
１１からのエコーバック処理を再開する。That is, first, in step S11, the voice section detection unit 46 determines, based on the output of the AD converter 41, whether the current input is a voice section. If it determines that it is not a voice section, the process ends, and the echo back process is restarted from step S11.

【００８０】また、ステップＳ１１において、音声区間
であると判定された場合、ステップＳ１２に進み、韻律
分析部４２は、ＡＤ変換部４１の出力、即ち、マイク１
５に入力されたユーザの音声を音響分析することによ
り、その韻律情報を取得し、音生成部４３に供給する。If it is determined in step S11 that the input is a voice section, the process proceeds to step S12, where the prosody analysis unit 42 acoustically analyzes the output of the AD converter 41, that is, the user's speech input to the microphone 15, to obtain its prosody information, and supplies it to the sound generation unit 43.

【００８１】音生成部４３は、ステップＳ１３におい
て、韻律分析部４２からの韻律情報と同一の韻律を有す
る無音韻音声を生成し、エコーバック音声として、出力
部４４に供給する。In step S13, the sound generation unit 43 generates a phoneme-free sound having the same prosody as the prosody information from the prosody analysis unit 42, and supplies it to the output unit 44 as the echo back sound.

【００８２】出力部４４は、ステップＳ１４において、
音生成部４３からのエコーバック音声のデータを、メモ
リ４５に記憶させ、ステップＳ１５に進み、そのエコー
バック音声を、出力制御部５７（図３）に出力して、処
理を終了する。In step S14, the output unit 44 stores the echo back sound data from the sound generation unit 43 in the memory 45. The process then proceeds to step S15, where the output unit 44 outputs the echo back sound to the output control unit 57 (FIG. 3), and the process ends.

【００８３】これにより、エコーバック音声は、出力制
御部５７を介して、スピーカ１８に供給されて出力され
る。Thus, the echo back sound is supplied to the speaker 18 via the output control unit 57 and output.

【００８４】従って、この場合、スピーカ１８からは、
ユーザが発した音声から、その音韻を無くしたものが、
エコーバック音声として出力される。In this case, therefore, the speaker 18 outputs, as the echo back sound, the user's speech with its phonemes removed.

【００８５】このエコーバック音声は、音声認識部５０
Ａにおいて音声認識の対象とされるユーザの音声を復唱
するようなものであり、このようなエコーバック音声が
出力される結果、ユーザは、エコーバック音声を聴くこ
とにより、ロボットにおいて、自身の音声が受け付けら
れたことを認識することができる。従って、ロボット
が、ユーザからの音声に対する応答として、何の行動も
起こさない場合（音声認識部５０Ａにおいて、ユーザの
音声が正しく認識されている場合と、誤って認識されて
いる場合の両方を含む）であっても、ユーザにおいて、
ロボットが故障しているといったような勘違いをするこ
と等を防止することができる。The echo back sound in effect repeats the user's speech that is subject to voice recognition in the voice recognition unit 50A. Because such an echo back sound is output, the user can tell, by hearing it, that the robot has accepted his or her speech. Therefore, even when the robot takes no action in response to the user's speech (whether the voice recognition unit 50A has recognized the speech correctly or incorrectly), the user can be kept from mistakenly concluding, for example, that the robot is broken.

【００８６】さらに、エコーバック音声は、ユーザが発
した音声そのものではなく、その音声の音韻をなくした
ものであるため、ユーザには、ロボットが、ユーザの音
声を理解し、自身の声で復唱しているかのように聞こえ
る。従って、ロボットにおいて、ユーザの音声を、単に
録音して再生しているのではなく、理解しているかのよ
うな印象を、ユーザに与えることができる。Furthermore, because the echo back sound is not the user's speech itself but that speech with its phonemes removed, it sounds to the user as if the robot has understood the speech and is repeating it in its own voice. The robot can therefore give the user the impression that it understands the user's speech rather than simply recording and playing it back.

【００８７】なお、ここでは、音生成部４３において、
サイン波を重畳することによって、エコーバック音声を
生成するようにしたが、その他、例えば、エコーバック
音声は、ロボットの鳴き声となるような複雑な波形を用
意しておき、その波形をつなぎ合わせることによって生
成することが可能である。さらに、エコーバック音声と
しては、例えば、ユーザの音声を構成する音素を認識
し、その音素列によって構成される音韻を有するような
ものを生成することが可能である。また、エコーバック
音声は、例えば、ユーザの音声について、ケプストラム
係数を得て、そのケプストラム係数をタップ係数とする
ディジタルフィルタによって生成すること等も可能であ
る。Here, the sound generation unit 43 generates the echo back sound by superimposing sine waves. Alternatively, the echo back sound can be generated by, for example, preparing complex waveforms that resemble the robot's cry and splicing them together. It is also possible to recognize the phonemes that make up the user's speech and generate an echo back sound whose phonemes are given by that phoneme sequence. The echo back sound can also be generated by, for example, obtaining cepstrum coefficients for the user's speech and using a digital filter whose tap coefficients are those cepstrum coefficients.

【００８８】但し、エコーバック音声が、ユーザの音声
に似過ぎると、ロボットにおいて、ユーザの音声を、単
に録音して再生しているかのような、いわば興ざめした
印象を、ユーザに与えかねないので、エコーバック音声
は、ユーザの音声に、あまり似たものにしない方が望ま
しい。However, if the echo back sound resembles the user's speech too closely, the robot may give the user the off-putting impression that it is simply recording and playing back the user's speech. It is therefore desirable that the echo back sound not resemble the user's speech too closely.

【００８９】また、上述の場合には、音生成部４３にお
いて、ユーザの音声の韻律と同一の韻律を有するエコー
バック音声を生成するようにしたが、音生成部４３に
は、ユーザの音声の韻律に多少の加工を加えた韻律を有
するエコーバック音声を生成させることも可能である。In the case described above, the sound generation unit 43 generates an echo back sound having the same prosody as the user's speech, but the sound generation unit 43 can also be made to generate an echo back sound whose prosody is a slightly modified version of the prosody of the user's speech.

【００９０】次に、図３の名前登録部５８が行う名前登
録処理の手順を、図９のフローチャートを参照して説明
する。名前登録処理は、ステップＳ２１において、ロボ
ットが、名前を必要とする状態にまで成長したと判定さ
れ、ステップＳ２２で、その旨が、名前登録部５８に通
知されたときに開始される。Next, the procedure of the name registration process performed by the name registration unit 58 of FIG. 3 will be described with reference to the flowchart of FIG. The name registration process is started when it is determined in step S21 that the robot has grown to a state requiring a name, and the fact is notified to the name registration unit 58 in step S22.

【００９１】ステップＳ２１の処理を具体的に説明する
と、行動決定機構部５２は、モデル記憶部５１から送出
された、状態情報としての成長モデルが、名前を必要と
する成長の状態を表しているか否かを判定する。To describe the processing of step S21 concretely, the action determination mechanism unit 52 determines whether the growth model sent from the model storage unit 51 as state information represents a growth state that requires a name.

【００９２】例えば、人間や犬などは、ある程度成長す
れば、自分の名前を認識することができる。そこで、こ
の例の場合、成長モデルにおける「幼児」（図４）を、
名前を認識することができる状態（名前を必要とする状
態）とし、モデル記憶部５１からの成長モデルが、「幼
児」を表しているとき、ロボットは、名前を必要とする
状態にまで成長したとものとする。For example, humans and dogs become able to recognize their own names once they have grown to a certain extent. In this example, therefore, the "infant" stage of the growth model (FIG. 4) is treated as the state in which a name can be recognized (the state that requires a name), and when the growth model from the model storage unit 51 represents "infant", the robot is regarded as having grown to a state that requires a name.

【００９３】ステップＳ２１で、ロボットが、名前を必
要とする状態にまで成長していないと判定された場合
（成長モデルが、「幼児」より成長していない状態を表
している場合）、行動決定機構部５２は、処理を終了
し、再度、ステップＳ２１から処理を再開する。If it is determined in step S21 that the robot has not grown to a state that requires a name (that is, the growth model represents a state less grown than "infant"), the action determination mechanism unit 52 ends the process and restarts it from step S21.

【００９４】ステップＳ２１で、ロボットが名前を必要
とする状態にまで成長したと判定された場合（成長モデ
ルが、「幼児」を表している場合）、ステップＳ２２に
進み、行動決定機構部５２は、その旨を、名前登録部５
８に通知する。If it is determined in step S21 that the robot has grown to a state that requires a name (that is, the growth model represents "infant"), the process proceeds to step S22, where the action determination mechanism unit 52 notifies the name registration unit 58 to that effect.

【００９５】このようにして、ロボットが名前を必要と
する状態にまで成長した旨が、名前登録部５８に通知さ
れると、ステップＳ２３において、名前登録部５８は、
センサ入力処理部５０の音声認識部５０Ａを制御して、
マイク１５から与えられる音声信号についての音声認識
を停止させる。これにより、後述するステップＳ２９，
３８で音声認識が再開されるまで、音声認識は行われな
い。なお、このとき、名前登録部５８は、エコーバック
部５６を制御して、エコーバック処理（図８）の実行を
禁止する。When the name registration unit 58 is notified in this way that the robot has grown to a state that requires a name, in step S23 the name registration unit 58 controls the voice recognition unit 50A of the sensor input processing unit 50 to stop voice recognition of the voice signal supplied from the microphone 15. As a result, no voice recognition is performed until it is resumed in step S29 or S38, described later. At this time, the name registration unit 58 also controls the echo back unit 56 to prohibit execution of the echo back process (FIG. 8).

【００９６】ステップＳ２４において、名前登録部５８
は、行動決定機構部５２を制御して、「頭を触ってほし
い」ことを表す行動指令情報を、姿勢遷移機構部５３に
出力させる。これにより、姿勢遷移機構部５３は、行動
決定機構部５２からの行動指令情報に基づいて、ロボッ
トの姿勢を、「頭を触ってほしい」ことを表す行動にお
ける各姿勢に遷移させるための姿勢遷移情報を生成し、
制御機構部５４に送出する。制御機構部５４は、姿勢遷
移機構部５３からの姿勢遷移情報に従って、アクチュエ
ータ３ＡＡ₁乃至５Ａ₁および５Ａ₂を駆動するための制
御信号を生成し、これを、アクチュエータ３ＡＡ₁乃至
５Ａ₁および５Ａ₂に送出する。In step S24, the name registration unit 58 controls the action determination mechanism unit 52 so that it outputs, to the posture transition mechanism unit 53, action command information for an action expressing "I want you to touch my head". Based on the action command information from the action determination mechanism unit 52, the posture transition mechanism unit 53 generates posture transition information for moving the robot through each posture of the action expressing "I want you to touch my head", and sends it to the control mechanism unit 54. In accordance with the posture transition information from the posture transition mechanism unit 53, the control mechanism unit 54 generates control signals for driving the actuators 3AA₁ to 5A₁ and 5A₂ and sends them to the actuators 3AA₁ to 5A₁ and 5A₂.

【００９７】アクチュエータ３ＡＡ₁乃至５Ａ₁および５
Ａ₂は、制御信号にしたがって駆動し、ロボットは、例
えば、図１０に示すように、「頭を触ってほしい」こと
を表す行動を起こす（ロボットが、自分の手で、頭を数
回叩く）。The actuators 3AA₁ to 5A₁ and 5A₂ are driven in accordance with the control signals, and the robot performs an action expressing "I want you to touch my head", for example as shown in FIG. 10 (the robot taps its head several times with its own hand).

【００９８】次に、ステップＳ２５において、名前登録
部５８は、コントローラ１０に内蔵されるタイマーＴを
リセットしてスタートさせる。Next, in step S25, the name registration unit 58 resets and starts the timer T built in the controller 10.

【００９９】ステップＳ２６において、名前登録部５８
は、センサ入力処理部５０の圧力処理部５０Ｃと通信す
ることで、圧力処理部５０Ｃが「頭が触られた」と認識
したか否か、すなわち、ユーザがロボットの頭部を触っ
たか否かを判定する。In step S26, the name registration unit 58 communicates with the pressure processing unit 50C of the sensor input processing unit 50 to determine whether the pressure processing unit 50C has recognized that "the head was touched", that is, whether the user has touched the robot's head.

【０１００】ステップＳ２６で、ユーザがロボットの頭
部を触ってないと判定された場合、ステップＳ２７に進
み、名前登録部５８は、ステップＳ２５でスタートした
タイマーＴの値が１０以上であるか否か（１０秒経過し
たか否か）を判定し、１０秒経過していないと判定した
場合、ステップＳ２６に戻り、それ以降の処理を実行す
る。If it is determined in step S26 that the user has not touched the head of the robot, the flow advances to step S27, and the name registration unit 58 determines whether the value of the timer T started in step S25 is 10 or more. Is determined (whether or not 10 seconds have elapsed). If it is determined that 10 seconds have not elapsed, the process returns to step S26, and the subsequent processing is executed.
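
The loop formed by steps S25 to S27 (start a timer, poll for the touch, give up after 10 seconds) can be sketched as a generic wait-with-timeout helper. The function name, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions added for illustration; the specification only describes the timer T and the 10-second limit.

```python
import time


def wait_for_event(event_occurred, timeout_s=10.0, poll_interval_s=0.05,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll event_occurred() until it returns True or timeout_s elapses."""
    start = clock()  # corresponds to resetting and starting timer T (step S25)
    while True:
        if event_occurred():              # e.g. "the head was touched" (step S26)
            return True
        if clock() - start >= timeout_s:  # timer T has reached the limit (step S27)
            return False
        sleep(poll_interval_s)
```

The same helper would fit the later wait for speech input in steps S30 to S32, with a different `event_occurred` callable.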

【０１０１】一方、ステップＳ２６で、頭部が触られた
と判定された場合、ステップＳ２８に進み、名前登録部
５８は、行動決定機構部５２を制御して、「名前を付け
てほしい」ことを表す行動指令情報を、姿勢遷移機構部
５３に出力させる。これにより、姿勢遷移機構部５３
は、行動決定機構部５２からの行動指令情報に基づい
て、ロボットの姿勢を、「名前を付けてほしい」ことを
表す行動における各姿勢に遷移させるための姿勢遷移情
報を生成し、制御機構部５４に送出する。制御機構部５
４は、姿勢遷移機構部５３からの姿勢遷移情報に従っ
て、アクチュエータ３ＡＡ₁乃至５Ａ₁および５Ａ₂を駆
動するための制御信号を生成し、これを、アクチュエー
タ３ＡＡ₁乃至５Ａ₁および５Ａ₂に送出する。On the other hand, if it is determined in step S26 that the head has been touched, the process proceeds to step S28, where the name registration unit 58 controls the action determination mechanism unit 52 so that it outputs, to the posture transition mechanism unit 53, action command information for an action expressing "I want you to give me a name". Based on the action command information from the action determination mechanism unit 52, the posture transition mechanism unit 53 generates posture transition information for moving the robot through each posture of the action expressing "I want you to give me a name", and sends it to the control mechanism unit 54. In accordance with the posture transition information from the posture transition mechanism unit 53, the control mechanism unit 54 generates control signals for driving the actuators 3AA₁ to 5A₁ and 5A₂ and sends them to the actuators 3AA₁ to 5A₁ and 5A₂.

【０１０２】アクチュエータ３ＡＡ₁乃至５Ａ₁および５
Ａ₂は、制御信号にしたがって駆動し、ロボットは、例
えば、図１１に示すように、「名前を付けてほしい」こ
とを表す行動を起こす（ロボットが、自分の耳を、上の
向け、それを左右に数回振る）。The actuators 3AA₁ to 5A₁ and 5A₂ are driven in accordance with the control signals, and the robot performs an action expressing "I want you to give me a name", for example as shown in FIG. 11 (the robot points its ears upward and wags them left and right several times).

【０１０３】次に、ステップＳ２９において、名前登録
部５８は、センサ入力処理部５０の音声認識部５０Ａを
制御して、ステップＳ２３で停止させた、音声認識を再
開させる。なお、このとき、名前登録部５８は、エコー
バック部５６を制御して、図８に示した処理のうち、ス
テップＳ１１乃至ステップＳ１４までの処理の実行を許
可する。これにより、入力された音声により生成された
エコーバック音声のデータが、メモリ４５に記憶される
が（ステップＳ１１乃至ステップＳ１４の処理は実行さ
れるが）、そのエコーバック音声は、スピーカ１８から
出力されない（ステップＳ１５の処理は実行されな
い）。Next, in step S29, the name registration unit 58 controls the voice recognition unit 50A of the sensor input processing unit 50 to resume the voice recognition that was stopped in step S23. At this time, the name registration unit 58 also controls the echo back unit 56 to permit execution of only steps S11 to S14 of the process shown in FIG. 8. As a result, the echo back sound data generated from the input speech is stored in the memory 45 (steps S11 to S14 are executed), but the echo back sound is not output from the speaker 18 (step S15 is not executed).

【０１０４】ステップＳ３０において、名前登録部５８
は、タイマーＴをリセットしてスタートさせる。In step S30, the name registration unit 58 resets and starts the timer T.

【０１０５】次に、ステップＳ３１において、名前登録
部５８は、センサ入力処理部５０の音声認識部５０Ａと
通信することで、音声認識部５０Ａに音声信号が入力さ
れたか否かを判定する。Next, in step S31, the name registration unit 58 communicates with the voice recognition unit 50A of the sensor input processing unit 50 to determine whether a voice signal has been input to the voice recognition unit 50A.

【０１０６】ステップＳ３１で、音声信号が入力されな
いと判定された場合、ステップＳ３２に進み、名前登録
部５８は、ステップＳ３０でスタートしたタイマーＴの
値が１０であるか（１０秒経過したか否か）を判定し、
１０秒経過していないと判定した場合、ステップＳ３１
に戻り、それ以降の処理を実行する。If it is determined in step S31 that no voice signal has been input, the process proceeds to step S32, where the name registration unit 58 determines whether the value of the timer T started in step S30 is 10 (that is, whether 10 seconds have elapsed). If it determines that 10 seconds have not elapsed, the process returns to step S31 and the subsequent processing is executed.

【０１０７】ステップＳ３１で、音声信号が入力された
と判定された場合、ステップＳ３３に進み、名前登録部
５８は、音声認識部５０Ａと通信して、ステップＳ３１
で入力された音声（名前）が、ロボットの名前として適
当なものであるか否かを確認する。ここでの処理の詳細
は、図１２のフローチャートに示されている。If it is determined in step S31 that a voice signal has been input, the process proceeds to step S33, where the name registration unit 58 communicates with the voice recognition unit 50A to check whether the speech (name) input in step S31 is suitable as a name for the robot. The details of this processing are shown in the flowchart of FIG. 12.

【０１０８】ステップＳ５１において、音声認識部５０
Ａのマッチング部２３（図５）は、特徴抽出部２２から
供給された特徴パラメータ（ステップＳ３１で入力され
た音声信号がＡＤ変換部２１でサンプリングされて得ら
れた音声データから、特徴抽出部２２により抽出された
特徴パラメータ）に対応して、音響モデル記憶部２４の
音響モデルを接続し、単語の音響モデル（単語モデル）
を生成する、そしてマッチング部２３は、生成した音響
モデルのうち、特徴パラメータが観測されるスコア（尤
度）が最も高い単語モデル（最適音素列）を検出する。In step S51, the matching unit 23 (FIG. 5) of the voice recognition unit 50A connects acoustic models from the acoustic model storage unit 24 in accordance with the feature parameters supplied from the feature extraction unit 22 (the feature parameters that the feature extraction unit 22 extracted from the audio data obtained when the AD converter 21 sampled the voice signal input in step S31), thereby generating acoustic models of words (word models). The matching unit 23 then detects, among the generated acoustic models, the word model with the highest score (likelihood) of the feature parameters being observed (the optimal phoneme sequence).

【０１０９】次に、ステップＳ５２において、マッチン
グ部２３は、辞書記憶部２５の単語辞書を参照すること
により、音響モデル記憶部２４に記憶されている音響モ
デルを接続し、単語の音響モデル（単語モデル）を生成
する。そしてマッチング部２３は、ここで生成した音響
モデルのうち、特徴パラメータが観測されるスコアが最
も高い単語モデル（最適登録音素列）を検出する。Next, in step S52, the matching unit 23 refers to the word dictionary in the dictionary storage unit 25 and connects the acoustic models stored in the acoustic model storage unit 24 to generate acoustic models of words (word models). The matching unit 23 then detects, among the acoustic models generated here, the word model with the highest score of the feature parameters being observed (the optimal registered phoneme sequence).

【０１１０】ステップＳ５３において、マッチング部２
３は、ステップＳ５１で検出した最適音素列のスコアと
ステップＳ５３で検出した最適登録音素列のスコアとの
差を算出し（この例の場合、最適音素列のスコアから、
最適登録音素列のスコアを減算し）、ステップＳ５４に
おいて、算出結果（減算結果）が、所定の閾値より大き
いか否かを判定し、大きいと判定した場合、ステップＳ
５５に進む。In step S53, the matching unit 23 calculates the difference between the score of the optimal phoneme sequence detected in step S51 and the score of the optimal registered phoneme sequence detected in step S52 (in this example, it subtracts the score of the optimal registered phoneme sequence from the score of the optimal phoneme sequence). In step S54, it determines whether the calculation result (the subtraction result) is greater than a predetermined threshold, and if it determines that it is greater, the process proceeds to step S55.

【０１１１】ステップＳ５５において、マッチング部２
３は、ステップＳ５１で検出した最適音素列は、名前と
して適当であることを、名前登録部５８に通知する。こ
の場合、入力された音声で表される名前と音響上類似す
る単語が、音声認識部５０Ａの辞書記憶部２５の単語辞
書には登録されていないので、検出された最適音素列
は、名前として適当である。In step S55, the matching unit 23 notifies the name registration unit 58 that the optimal phoneme sequence detected in step S51 is suitable as a name. In this case, no word acoustically similar to the name expressed by the input speech is registered in the word dictionary of the dictionary storage unit 25 of the voice recognition unit 50A, so the detected optimal phoneme sequence is suitable as a name.

【０１１２】一方、ステップＳ５４で、ステップＳ５３
での算出結果（減算結果）が、所定の閾値以下であると
判定された場合、ステップＳ５６に進み、マッチング部
２３は、ステップＳ５１で検出した最適音素列は、名前
として適当でないことを（不適当であることを）、名前
登録部５８に通知する。この場合、入力された音声で表
される名前と音響上類似する単語が、音声認識部５０Ａ
の辞書記憶部２５の単語辞書には登録されているので、
検出された最適音素列は、名前として適当でない。On the other hand, if it is determined in step S54 that the calculation result (the subtraction result) of step S53 is equal to or less than the predetermined threshold, the process proceeds to step S56, where the matching unit 23 notifies the name registration unit 58 that the optimal phoneme sequence detected in step S51 is not suitable as a name. In this case, a word acoustically similar to the name expressed by the input speech is registered in the word dictionary of the dictionary storage unit 25 of the voice recognition unit 50A, so the detected optimal phoneme sequence is not suitable as a name.
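
The decision in steps S53 to S56 reduces to comparing a score difference against a threshold. The following minimal sketch assumes the scores are log-likelihood-like values where larger is better; the function name, the dictionary of per-word scores, and the handling of an empty dictionary are illustrative assumptions, not details from the specification.

```python
def is_suitable_name(free_score, registered_scores, threshold):
    """Return True when no registered word scores close to the free phoneme sequence.

    free_score: score of the optimal phoneme sequence (step S51).
    registered_scores: mapping of registered word -> score (candidates of step S52).
    threshold: the predetermined threshold of step S54.
    """
    if not registered_scores:
        # Assumption: with nothing registered, nothing can be acoustically similar.
        return True
    best_registered = max(registered_scores.values())  # optimal registered phoneme sequence
    difference = free_score - best_registered           # step S53
    return difference > threshold                       # step S54: S55 if True, S56 otherwise
```

A small difference means some registered word explains the input almost as well as the unconstrained phoneme sequence, which is why the candidate is then rejected as too similar to an existing word.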

【０１１３】ステップＳ５５またはステップＳ５６での
処理の後、確認処理は終了し、図９のステップＳ３４に
進む。After the processing in step S55 or S56, the confirmation processing ends, and the flow advances to step S34 in FIG.

【０１１４】ステップＳ３４において、名前登録部５８
は、ステップＳ３３での確認結果に基づいて（図１２の
ステップＳ５５またはステップＳ５６で、音声認識部５
０Ａからの通知内容に基づいて）、ステップＳ３１で入
力された音声で表される単語（ステップＳ５１で検出さ
れた最適音素列）を、名前として登録できるか否かを判
定する。In step S34, the name registration unit 58 determines, based on the confirmation result of step S33 (that is, on the content of the notification from the voice recognition unit 50A in step S55 or step S56 of FIG. 12), whether the word expressed by the speech input in step S31 (the optimal phoneme sequence detected in step S51) can be registered as a name.

【０１１５】図１２のステップＳ５４で、名前として適
当であることが通知された場合、名前登録部５８は、ス
テップＳ３４で、名前として登録できると判定し、ステ
ップＳ３５に進み、音声認識部５０Ａを制御して、ステ
ップＳ５１で検出された最適音素列を、名前として、辞
書記憶部２５の単語辞書に登録させる。If the name registration unit 58 was notified in step S55 of FIG. 12 that the sequence is suitable as a name, it determines in step S34 that the sequence can be registered as a name. The process then proceeds to step S35, where the name registration unit 58 controls the voice recognition unit 50A to register the optimal phoneme sequence detected in step S51 as a name in the word dictionary of the dictionary storage unit 25.

【０１１６】次に、ステップＳ３６において、名前登録
部５８は、エコーバック部５６を制御して、メモリ４５
に記憶されているエコーバック音声のデータを、出力制
御部５７に出力させる（図８のステップＳ１５の処理の
実行を許可する）。これにより、登録された名前のエコ
ーバック音声が、出力制御部５７を介して、スピーカ１
８に供給されて出力される。Next, in step S36, the name registration unit 58 controls the echo back unit 56 so that the echo back sound data stored in the memory 45 is output to the output control unit 57 (that is, it permits execution of step S15 of FIG. 8). As a result, the echo back sound of the registered name is supplied to the speaker 18 via the output control unit 57 and output.

【０１１７】エコーバック部５６は、ステップＳ２９
で、図８のステップＳ１１乃至ステップＳ１４での処理
の実行が許可されているので、ステップＳ３１で音声
（名前）が入力されたとき、エコーバック部５６のメモ
リ４５には、そのエコーバック音声のデータが記憶され
ている。Because the echo back unit 56 was permitted in step S29 to execute steps S11 to S14 of FIG. 8, the echo back sound data for the speech (name) input in step S31 is already stored in the memory 45 of the echo back unit 56.
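
The three ways the name registration unit 58 drives the echo back unit 56 (prohibited in step S23, store-only in step S29, and deferred output in step S36) can be modeled as a small gate in front of the output. This is a sketch under stated assumptions: the class and mode names are hypothetical, and `emit` merely stands in for handing data to the output control unit 57.

```python
from enum import Enum


class EchoMode(Enum):
    DISABLED = 0    # step S23: echo back process prohibited
    STORE_ONLY = 1  # step S29: steps S11 to S14 permitted, step S15 withheld
    FULL = 2        # normal operation: steps S11 to S15


class EchoBackGate:
    def __init__(self, emit):
        self.emit = emit      # stand-in for output to the output control unit 57
        self.mode = EchoMode.FULL
        self.memory = None    # stand-in for the memory 45

    def on_echo_generated(self, echo_data):
        """Handle a newly generated echo back sound according to the current mode."""
        if self.mode is EchoMode.DISABLED:
            return
        self.memory = echo_data       # step S14: store in memory
        if self.mode is EchoMode.FULL:
            self.emit(echo_data)      # step S15: output immediately

    def flush(self):
        """Output the stored echo back sound (the deferred step S15, as in step S36)."""
        if self.memory is not None:
            self.emit(self.memory)
```

Deferring the output this way means the robot echoes the name only after it has been accepted for registration.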

【０１１８】一方、名前登録部５８は、図１２のステッ
プＳ５５で、名前として適切でないことが通知された場
合、ステップＳ３４で、名前として登録できないと判定
し、ステップＳ３７に進み、行動決定機構部５２を制御
して、「発話された名前は登録されなかった」ことを表
す行動の行動指令情報を、姿勢遷移機構部５３に出力さ
せる。これにより、姿勢遷移機構部５３は、行動決定機
構部５２からの行動指令情報に基づいて、ロボットの姿
勢を、「発話された名前が登録されなかった」ことを表
す行動における各姿勢に遷移されるための姿勢遷移情報
を生成し、制御機構部５４に送出する。制御機構部５４
は、姿勢遷移機構部５３からの姿勢遷移情報に従って、
アクチュエータ３ＡＡ₁乃至５Ａ₁および５Ａ₂を駆動す
るための制御信号を生成し、これを、アクチュエータ３
ＡＡ₁乃至５Ａ₁および５Ａ₂に送出する。これにより、
アクチュエータ３ＡＡ₁乃至５Ａ₁および５Ａ₂は、制御
信号にしたがって駆動し、ロボットは、「発話された名
前が登録されなかった」ことを表す行動を起こす。On the other hand, if the name registration unit 58 was notified in step S56 of FIG. 12 that the sequence is not suitable as a name, it determines in step S34 that the sequence cannot be registered as a name. The process then proceeds to step S37, where the name registration unit 58 controls the action determination mechanism unit 52 so that it outputs, to the posture transition mechanism unit 53, action command information for an action expressing "the spoken name was not registered". Based on the action command information from the action determination mechanism unit 52, the posture transition mechanism unit 53 generates posture transition information for moving the robot through each posture of the action expressing "the spoken name was not registered", and sends it to the control mechanism unit 54. In accordance with the posture transition information from the posture transition mechanism unit 53, the control mechanism unit 54 generates control signals for driving the actuators 3AA₁ to 5A₁ and 5A₂ and sends them to the actuators 3AA₁ to 5A₁ and 5A₂. The actuators 3AA₁ to 5A₁ and 5A₂ are thereby driven in accordance with the control signals, and the robot performs the action expressing "the spoken name was not registered".

【０１１９】ステップＳ２７で、１０秒経過したと判定
されたとき、すなわち、「頭を触ってほしい」ことを表
す行動が行われてから１０秒以内に頭が触られなかった
ときステップＳ３８に進み、名前登録部５８は、センサ
入力処理部５０の音声認識部５０Ａを制御して、ステッ
プＳ２３で停止させた、音声認識を再開される。なお、
このとき、名前登録部５８は、エコーバック部５６を制
御して、エコーバック処理（図８）の実行を許可する。When it is determined in step S27 that 10 seconds have elapsed, that is, when the head has not been touched within 10 seconds after the action indicating "I want you to touch my head" was performed, the process proceeds to step S38, where the name registration unit 58 controls the voice recognition unit 50A of the sensor input processing unit 50 to restart the voice recognition that was stopped in step S23. At this time, the name registration unit 58 also controls the echo back unit 56 to permit execution of the echo back process (FIG. 8).

【０１２０】ステップＳ３２で、１０秒間経過したと判
定された場合、すなわち、「名前を付けてほしい」こと
を表す行動が行われてから１０秒以内に音声が入力され
なかったとき、またはステップＳ３８で、音声認識が再
開されたとき、ステップＳ３９に進み、名前登録部５８
は、行動決定機構部５２を制御して、「名前が入力され
なかった」ことを表す行動の行動指令情報を、姿勢遷移
機構部５３に出力させる。これにより、姿勢遷移機構部
５３は、行動決定機構部５２からの行動指令情報に基づ
いて、ロボットの姿勢を、「名前が入力されなかった」
ことを表す行動における各姿勢に遷移されるための姿勢
遷移情報を生成し、制御機構部５４に送出する。制御機
構部５４は、姿勢遷移機構部５３からの姿勢遷移情報に
従って、アクチュエータ３ＡＡ₁乃至５Ａ₁および５Ａ₂
を駆動するための制御信号を生成し、これを、アクチュ
エータ３ＡＡ₁乃至５Ａ₁および５Ａ₂に送出する。これ
により、アクチュエータ３ＡＡ₁乃至５Ａ₁および５Ａ₂
は、制御信号にしたがって駆動し、ロボットは、「名前
が入力されなかった」ことを表す行動を起こす。When it is determined in step S32 that 10 seconds have elapsed, that is, when no voice has been input within 10 seconds after the action indicating "I want you to give a name" was performed, or when voice recognition has been restarted in step S38, the process proceeds to step S39, where the name registration unit 58 controls the action determination mechanism unit 52 to output, to the posture transition mechanism unit 53, action command information for an action indicating that "no name was input". Based on the action command information from the action determination mechanism unit 52, the posture transition mechanism unit 53 generates posture transition information for moving the robot through each posture of that action, and sends it to the control mechanism unit 54. The control mechanism unit 54 generates control signals for driving the actuators 3AA₁ to 5A₁ and 5A₂ in accordance with the posture transition information from the posture transition mechanism unit 53, and sends them to the actuators 3AA₁ to 5A₁ and 5A₂. The actuators 3AA₁ to 5A₁ and 5A₂ are thereby driven according to the control signals, and the robot performs the action indicating that "no name was input".

【０１２１】ステップＳ３６で、エコーバック音声が出
力されたとき、ステップＳ３７で、「発話された名前が
登録されなかった」ことを表す行動が行われたとき、ま
たはステップＳ３９で、「名前が入力されなかった」こ
とを表す行動が行われたとき、処理は終了する。The process ends when the echo-back voice has been output in step S36, when the action indicating that "the spoken name was not registered" has been performed in step S37, or when the action indicating that "no name was input" has been performed in step S39.
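
The three ways in which the process of FIG. 9 can end (steps S36, S37, and S39) can be summarized in a short sketch. This is only an illustration under stated assumptions, not the implementation of the embodiment: the `Outcome` enum, the `final_action` function, and the action strings are hypothetical names introduced for this example.

```python
from enum import Enum, auto
from typing import Tuple


class Outcome(Enum):
    """Hypothetical labels for how a name registration attempt ended."""
    REGISTERED = auto()   # step S34 judged that the voice can be registered
    REJECTED = auto()     # step S55 reported the voice as not appropriate
    NO_INPUT = auto()     # 10-second timeout in step S27 or step S32


def final_action(outcome: Outcome) -> Tuple[str, str]:
    """Return (step, action) for the last step executed before the process ends."""
    if outcome is Outcome.REGISTERED:
        return ("S36", "output the echo-back voice of the registered name")
    if outcome is Outcome.REJECTED:
        return ("S37", "act out 'the spoken name was not registered'")
    return ("S39", "act out 'no name was input'")
```

In every branch the user receives feedback, either as sound or as a gesture, before the process ends.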

【０１２２】以上のように、ロボットが、「名前を付け
てほしい」ことを表す行動を起こした後に入力される音
声を、名前として認識するようにしたので、ロボット
は、このとき入力された音声を、名前として適切に認識
することができる。As described above, the voice input after the robot has performed the action indicating "I want you to give a name" is recognized as the name, so the robot can appropriately recognize the voice input at that time as the name.

【０１２３】なお、以上においては、ロボットが、名前
を必要とする状態にまで成長したとき、名前登録処理を
開始する場合を例として説明したが、図１３のフローチ
ャートに示すように、ステップＳ６１で、センサ入力処
理部５０の音声認識部５０Ａが、「名前登録」の指令を
認識し、ステップＳ６２で、その旨を、名前登録部５８
に通知したときにおいて、名前登録処理が開始されるよ
うにすることもできる。すなわち、この例の場合、ユー
ザは、「名前登録」と発話することで、ロボットに名前
を付けることができる。In the above description, the name registration process starts when the robot has grown to a state in which it requires a name. However, as shown in the flowchart of FIG. 13, the name registration process may instead be started when the voice recognition unit 50A of the sensor input processing unit 50 recognizes a "name registration" command in step S61 and notifies the name registration unit 58 to that effect in step S62. In this example, the user can give the robot a name by saying "name registration".

【０１２４】ステップＳ６３乃至ステップＳ７９におい
ては、図９のステップＳ２３乃至ステップＳ３９におけ
る場合と同様の処理が行われるので、その説明は省略す
る。In steps S63 to S79, the same processes as those in steps S23 to S39 in FIG. 9 are performed, and thus description thereof will be omitted.

【０１２５】次に、図１４のフローチャートを参照し
て、他の名前登録処理の手順を説明する。この場合、名
前登録処理は、図９の例の場合と同様に、ステップＳ９
１において、ロボットが、名前を必要とする状態にまで
成長したと判定され、ステップＳ９２で、その旨が、名
前登録部５８に通知されたときに開始される。Next, the procedure of another name registration process will be described with reference to the flowchart of FIG. 14. In this case, as in the example of FIG. 9, the name registration process starts when it is determined in step S91 that the robot has grown to a state that requires a name and, in step S92, the name registration unit 58 is notified to that effect.

【０１２６】ロボットが名前を必要とする状態にまで成
長した旨が、名前登録部５８に通知されると、ステップ
Ｓ９３において、名前登録部５８は、センサ入力処理部
５０の音声認識部５０Ａを制御して、マイク１５から与
えられる音声信号についての音声認識を停止させる。こ
れにより、後述するステップＳ９８，１０９で音声認識
が再開されるまで、音声認識は行われない。なお、この
とき、名前登録部５８は、エコーバック部５６を制御し
て、エコーバック処理（図８）の実行を禁止する。When the name registration unit 58 is notified that the robot has grown to a state requiring a name, the name registration unit 58 controls the voice recognition unit 50A of the sensor input processing unit 50 in step S93. Then, the voice recognition of the voice signal given from the microphone 15 is stopped. As a result, voice recognition is not performed until voice recognition is restarted in steps S98 and S109 described below. At this time, the name registration unit 58 controls the echo back unit 56 to prohibit the execution of the echo back process (FIG. 8).

【０１２７】ステップＳ９４において、名前登録部５８
は、行動決定機構部５２を制御して、「名前を付けてほ
しい」ことを表す行動指令情報を、姿勢遷移機構部５３
に出力させる。これにより、姿勢遷移機構部５３は、行
動決定機構部５２からの行動指令情報に基づいて、ロボ
ットの姿勢を、「名前を付けてほしい」ことを表す行動
における各姿勢に遷移させるための姿勢遷移情報を生成
し、制御機構部５４に送出する。制御機構部５４は、姿
勢遷移機構部５３からの姿勢遷移情報に従って、アクチ
ュエータ３ＡＡ₁乃至５Ａ₁および５Ａ₂を駆動するため
の制御信号を生成し、これを、アクチュエータ３ＡＡ₁
乃至５Ａ₁および５Ａ₂に送出する。In step S94, the name registration unit 58 controls the action determination mechanism unit 52 to output, to the posture transition mechanism unit 53, action command information for an action indicating "I want you to give a name". Based on the action command information from the action determination mechanism unit 52, the posture transition mechanism unit 53 generates posture transition information for moving the robot through each posture of the action indicating "I want you to give a name", and sends it to the control mechanism unit 54. The control mechanism unit 54 generates control signals for driving the actuators 3AA₁ to 5A₁ and 5A₂ in accordance with the posture transition information from the posture transition mechanism unit 53, and sends them to the actuators 3AA₁ to 5A₁ and 5A₂.

【０１２８】アクチュエータ３ＡＡ₁乃至５Ａ₁および５
Ａ₂は、制御信号にしたがって駆動し、ロボットは、例
えば、図１１に示すように、「名前を付けてほしい」こ
とを表す行動を起こす（ロボットが、自分の耳を、上の
向け、それを左右に数回振る）。The actuators 3AA₁ to 5A₁ and 5A₂ are driven according to the control signals, and the robot performs an action indicating "I want you to give a name", for example, as shown in FIG. 11 (the robot points its ears upward and shakes them from side to side several times).

【０１２９】次に、ステップＳ９５において、名前登録
部５８は、コントローラ１０に内蔵されるタイマーＴを
リセットしてスタートさせる。Next, in step S95, the name registration unit 58 resets and starts a timer T built in the controller 10.

【０１３０】ステップＳ９６において、名前登録部５８
は、センサ入力処理部５０の圧力処理部５０Ｃと通信す
ることで、ユーザがロボットの頭部を触ったか否かを判
定する。In step S96, the name registration unit 58 determines whether or not the user has touched the head of the robot by communicating with the pressure processing unit 50C of the sensor input processing unit 50.

【０１３１】ステップＳ９６で、ユーザがロボットの頭
部を触ってないと判定された場合、ステップＳ９７に進
み、名前登録部５８は、ステップＳ９５でスタートした
タイマーＴの値が１０以上であるか否か（１０秒経過し
たか否か）を判定し、１０秒経過していないと判定した
場合、ステップＳ９６に戻り、それ以降の処理を実行す
る。If it is determined in step S96 that the user has not touched the head of the robot, the process proceeds to step S97, where the name registration unit 58 determines whether the value of the timer T started in step S95 is 10 or more (that is, whether 10 seconds have elapsed). If it determines that 10 seconds have not elapsed, the process returns to step S96 and the subsequent processing is repeated.
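
The loop formed by steps S95 to S97 is a poll-with-timeout pattern, which might be sketched as follows. This is a minimal sketch under stated assumptions: `wait_for`, its parameters, and the polling interval are hypothetical and do not appear in the embodiment, which only specifies a timer T and a 10-second limit.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             timeout_s: float = 10.0,
             clock: Callable[[], float] = time.monotonic,
             poll_interval_s: float = 0.05,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `condition` until it is true or `timeout_s` seconds have elapsed.

    Returns True if the condition was met, False on timeout.
    """
    start = clock()                        # S95: reset and start timer T
    while True:
        if condition():                    # S96: has the head been touched?
            return True
        if clock() - start >= timeout_s:   # S97: is the timer value 10 or more?
            return False
        sleep(poll_interval_s)             # assumed pause between polls
```

The `clock` and `sleep` parameters are injectable so that the loop can be exercised with a simulated clock instead of waiting in real time. The same helper would also fit the voice-input wait of steps S100 to S102.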

【０１３２】一方、ステップＳ９６で、頭部が触られた
と判定された場合、ステップＳ９８に進み、名前登録部
５８は、センサ入力処理部５０の音声認識部５０Ａを制
御して、ステップＳ９３で停止させた、音声認識を再開
させる。なお、このとき、名前登録部５８は、エコーバ
ック部５６を制御して、図８に示した処理のうち、ステ
ップＳ１１乃至ステップＳ１４までの処理の実行を許可
する。これにより、入力された音声により生成されたエ
コーバック音声のデータが、メモリ４５に記憶される。On the other hand, if it is determined in step S96 that the head has been touched, the process proceeds to step S98, where the name registration unit 58 controls the voice recognition unit 50A of the sensor input processing unit 50 to restart the voice recognition that was stopped in step S93. At this time, the name registration unit 58 also controls the echo back unit 56 to permit execution of steps S11 to S14 of the process shown in FIG. 8. As a result, the echo-back voice data generated from the input voice is stored in the memory 45.

【０１３３】次に、ステップＳ９９において、名前登録
部５８は、ＬＥＤ１９を点灯させる。Next, in step S99, the name registration section 58 turns on the LED 19.

【０１３４】ステップＳ１００において、名前登録部５
８は、タイマーＴをリセットしてスタートさせる。In step S100, the name registration unit 58 resets and starts the timer T.

【０１３５】次に、ステップＳ１０１において、名前登
録部５８は、センサ入力処理部５０の音声認識部５０Ａ
と通信することで、音声認識部５０Ａに音声信号が入力
されたか否かを判定する。Next, in step S101, the name registration unit 58 determines whether or not a voice signal has been input to the voice recognition unit 50A by communicating with the voice recognition unit 50A of the sensor input processing unit 50.

【０１３６】ステップＳ１０１で、音声信号が入力され
ないと判定された場合、ステップＳ１０２に進み、名前
登録部５８は、ステップＳ１００でスタートしたタイマ
ーＴの値が１０であるか（１０秒経過したか否か）を判
定し、１０秒経過していないと判定した場合、ステップ
Ｓ１０１に戻り、それ以降の処理を実行する。If it is determined in step S101 that a voice signal is not input, the process proceeds to step S102, and the name registration unit 58 determines whether the value of the timer T started in step S100 is 10 (whether 10 seconds have elapsed). If it is determined that 10 seconds have not elapsed, the process returns to step S101, and the subsequent processing is executed.

【０１３７】ステップＳ１０１で、音声信号が入力され
たと判定された場合、ステップＳ１０３に進み、名前登
録部５８は、ステップＳ９９で点灯させたＬＥＤ１９を
消灯させる。If it is determined in step S101 that an audio signal has been input, the process proceeds to step S103, where the name registration unit 58 turns off the LED 19 turned on in step S99.

【０１３８】ステップＳ１０４乃至ステップＳ１０９、
およびステップＳ１１１においては、図９のステップＳ
３３乃至ステップＳ３９における場合と同様の処理が実
行されるので、その説明は省略する。In steps S104 to S109 and step S111, the same processing as in steps S33 to S39 of FIG. 9 is performed, so the description thereof is omitted.

【０１３９】ステップＳ１０２で、１０秒経過したと判
定された場合、ステップＳ１１０に進み、名前登録部５
８は、ステップＳ９９で点灯させたＬＥＤ１９を消灯さ
せる。If it is determined in step S102 that 10 seconds have elapsed, the process proceeds to step S110, where the name registration unit 58 turns off the LED 19 that was turned on in step S99.
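
The overall order of events in the FIG. 14 variant, in which the LED 19 is lit only while the robot waits for the name, can be traced with a short sketch. This is an illustration under stated assumptions rather than the embodiment itself: `run_fig14_flow` and its two callbacks are hypothetical, the timeout branch from step S97 to step S109 is inferred from the correspondence with steps S38 and S39 of FIG. 9, and the transition from step S110 to step S111 is assumed by analogy with FIG. 9.

```python
from typing import Callable, List


def run_fig14_flow(head_touched: Callable[[], bool],
                   voice_heard: Callable[[], bool]) -> List[str]:
    """Return the ordered list of steps taken in one pass of the FIG. 14 flow.

    `head_touched` and `voice_heard` are hypothetical stand-ins that report
    whether the corresponding event arrived within its 10-second window.
    """
    log = ["S93: stop voice recognition, prohibit echo back",
           "S94: act out 'I want you to give a name'"]
    if not head_touched():                # S95 to S97 timed out
        log.append("S109: restart voice recognition")
        log.append("S111: act out 'no name was input'")
        return log
    log.append("S98: restart voice recognition, permit S11 to S14")
    log.append("S99: LED 19 on")          # lit while waiting for the name
    if not voice_heard():                 # S100 to S102 timed out
        log.append("S110: LED 19 off")
        log.append("S111: act out 'no name was input'")  # assumed successor
        return log
    log.append("S103: LED 19 off")
    log.append("S104 onward: evaluate the utterance as a name")
    return log
```

For example, `run_fig14_flow(lambda: True, lambda: False)` yields a trace in which the LED is turned on in step S99 and off again in step S110, so the LED is never left lit once the listening window has closed.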

【０１４０】この例の場合においても、図１５のフロー
チャートに示すように、ステップＳ１２１で、センサ入
力処理部５０の音声認識部５０Ａが、「名前登録」の指
令を認識し、ステップＳ１２２において、その旨が、名
前登録部５８に通知されたとき、名前登録処理が開始さ
れるようにすることもできる。In this example as well, as shown in the flowchart of FIG. 15, the name registration process may be started when the voice recognition unit 50A of the sensor input processing unit 50 recognizes a "name registration" command in step S121 and the name registration unit 58 is notified to that effect in step S122.

【０１４１】ステップＳ１２３乃至ステップＳ１４１に
おいては、図１４のステップＳ９３乃至ステップＳ１１
１における場合と同様の処理が行われるので、その説明
は省略する。In steps S123 to S141, the same processing as in steps S93 to S111 of FIG. 14 is performed, so the description thereof is omitted.

【０１４２】なお、以上においては、名前が登録された
後、名前を表す音声から生成されたエコーバック音声を
出力する場合を例として説明したが、予めメモリ４５に
記憶させた所定の音声を出力するようにすることもでき
る。In the above description, the echo-back voice generated from the voice representing the name is output after the name is registered; alternatively, a predetermined sound stored in advance in the memory 45 may be output.

【０１４３】また、登録された音声（名前）から生成さ
れたエコーバック音声のデータをメモリ４５が保持する
ようにして、ユーザが、「名前はなんですか」と発話
し、ロボットがそれを認識したとき、メモリ４５に保持
されている名前のエコーバック音声が出力されるように
することもできる。It is also possible to have the memory 45 retain the echo-back voice data generated from the registered voice (name), so that when the user says "What is your name?" and the robot recognizes the utterance, the echo-back voice of the name held in the memory 45 is output.
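
The name-query behavior described above might look like the following sketch. All names here (`EchoBackMemory`, `respond`, the byte-string audio representation, and the exact query text) are hypothetical stand-ins for illustration; the embodiment only states that the memory 45 holds the echo-back voice data and that it is output when the question is recognized. Returning `None` when no name has been registered is likewise an assumption, since the text does not cover that case.

```python
from typing import Optional


class EchoBackMemory:
    """Hypothetical stand-in for the memory 45: keeps the echo-back voice data."""

    def __init__(self) -> None:
        self._name_audio: Optional[bytes] = None

    def store(self, audio: bytes) -> None:
        # Keep the echo-back voice generated from the registered name.
        self._name_audio = audio

    def load(self) -> Optional[bytes]:
        return self._name_audio


def respond(recognized_text: str, memory: EchoBackMemory) -> Optional[bytes]:
    """Return the audio to output for a recognized utterance, or None."""
    if recognized_text == "What is your name?":
        # Output the retained echo-back voice of the name, if any.
        return memory.load()
    return None
```

Because the stored data is the echo-back voice derived from the user's own utterance, no speech synthesis of the name is needed to answer the question.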

【０１４４】以上、本発明を、エンターテイメント用の
ロボット（疑似ペットとしてのロボット）に適用した場
合について説明したが、本発明は、これに限らず、例え
ば、産業用のロボット等の各種のロボットに広く適用す
ることが可能である。また、本発明は、現実世界のロボ
ットだけでなく、例えば、液晶ディスプレイ等の表示装
置に表示される仮想的なロボットにも適用可能である。The case where the present invention is applied to an entertainment robot (robot as a pseudo pet) has been described above. However, the present invention is not limited to this, and may be applied to various robots such as industrial robots. It can be widely applied. In addition, the present invention is applicable not only to a robot in the real world but also to a virtual robot displayed on a display device such as a liquid crystal display.

【０１４５】さらに、本実施の形態においては、上述し
た一連の処理を、ＣＰＵ１０Ａにプログラムを実行させ
ることにより行うようにしたが、一連の処理は、それ専
用のハードウェアによって行うことも可能である。Furthermore, in the present embodiment, the series of processes described above is performed by causing the CPU 10A to execute a program, but the series of processes may also be performed by dedicated hardware.

【０１４６】なお、プログラムは、あらかじめメモリ１
０Ｂ（図２）に記憶させておく他、フロッピー（登録商
標）ディスク、CD-ROM(Compact Disc Read Only Memor
y)，MO(Magneto optical)ディスク，DVD(Digital Versa
tile Disc)、磁気ディスク、半導体メモリなどのリムー
バブル記録媒体に、一時的あるいは永続的に格納（記
録）しておくことができる。そして、このようなリムー
バブル記録媒体を、いわゆるパッケージソフトウエアと
して提供し、ロボット（メモリ１０Ｂ）にインストール
するようにすることができる。The program may be stored in advance in the memory 10B (FIG. 2), or it may be stored (recorded) temporarily or permanently on a removable recording medium such as a floppy (registered trademark) disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto-Optical) disc, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium can be provided as so-called packaged software and installed in the robot (memory 10B).

【０１４７】また、プログラムは、ダウンロードサイト
から、ディジタル衛星放送用の人工衛星を介して、無線
で転送したり、LAN(Local Area Network)、インターネ
ットといったネットワークを介して、有線で転送し、メ
モリ１０Ｂにインストールすることができる。The program can also be transferred from a download site wirelessly via an artificial satellite for digital satellite broadcasting, or by wire via a network such as a LAN (Local Area Network) or the Internet, and installed in the memory 10B.

【０１４８】この場合、プログラムがバージョンアップ
されたとき等に、そのバージョンアップされたプログラ
ムを、メモリ１０Ｂに、容易にインストールすることが
できる。In this case, when the program is upgraded, the upgraded program can be easily installed in the memory 10B.

【０１４９】ここで、本明細書において、ＣＰＵ１０Ａ
に各種の処理を行わせるためのプログラムを記述する処
理ステップは、必ずしもフローチャートとして記載され
た順序に沿って時系列に処理する必要はなく、並列的あ
るいは個別に実行される処理（例えば、並列処理あるい
はオブジェクトによる処理）も含むものである。In this specification, the processing steps describing the program that causes the CPU 10A to perform various processes need not necessarily be executed in time series in the order described in the flowcharts; they also include processes executed in parallel or individually (for example, parallel processing or object-based processing).

【０１５０】また、プログラムは、１のＣＰＵにより処
理されるものであっても良いし、複数のＣＰＵによって
分散処理されるものであっても良い。Further, the program may be processed by one CPU or may be processed by a plurality of CPUs in a distributed manner.

【０１５１】[0151]

【発明の効果】本発明のロボット制御装置および方法、
並びに記録媒体のプログラムによれば、名前を付けてほ
しいことを表す行動をロボットが起こすように、ロボッ
トの行動を制御し、ロボットの行動が制御された後に入
力された音声から、最適音素列を検出し、最適音素列
を、名前として登録するようにしたので、名前を適切に
登録することができる。According to the robot control device and method and the program of the recording medium of the present invention, the behavior of the robot is controlled so that the robot performs an action indicating that it wants to be given a name, an optimal phoneme string is detected from the voice input after the behavior of the robot has been controlled, and the optimal phoneme string is registered as the name. The name can therefore be registered appropriately.
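
The detect-then-register step summarized above, combined with the score comparison recited in claim 4, can be sketched as follows. This is one hedged reading, not the embodiment's implementation: `name_to_register`, the score values, and the use of an absolute difference are assumptions made for this example, and the handling of an empty dictionary is not taken from the text.

```python
from typing import Dict, Optional


def name_to_register(optimal_phonemes: str,
                     optimal_score: float,
                     dictionary_scores: Dict[str, float],
                     threshold: float) -> Optional[str]:
    """Return the phoneme string to register as the name, or None if rejected.

    `dictionary_scores` maps each word already in the word dictionary to the
    score of its word model for the same utterance (higher means a better match).
    """
    if not dictionary_scores:
        # Assumption: with an empty dictionary nothing can clash with the name.
        return optimal_phonemes
    # Best-scoring word model among the words already registered.
    best_word_score = max(dictionary_scores.values())
    # Register only when the two scores differ by more than the threshold,
    # i.e. the utterance is not well explained by any existing word.
    if abs(optimal_score - best_word_score) > threshold:
        return optimal_phonemes
    return None
```

Under this reading, an utterance that closely matches a word already in the dictionary (for example an existing command) is rejected as a name, which corresponds to the "not appropriate as a name" notification of step S55.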

[Brief description of the drawings]

【図１】本発明を適用したロボットの一実施の形態の外
観構成例を示す斜視図である。FIG. 1 is a perspective view illustrating an external configuration example of a robot according to an embodiment of the present invention.

【図２】ロボットの内部構成例を示すブロック図であ
る。FIG. 2 is a block diagram illustrating an example of an internal configuration of a robot.

【図３】コントローラ１０の機能的構成例を示すブロッ
ク図である。FIG. 3 is a block diagram illustrating a functional configuration example of the controller 10.

【図４】成長モデルを示す図である。FIG. 4 is a diagram showing a growth model.

【図５】音声認識部５０Ａの構成例を示すブロック図で
ある。FIG. 5 is a block diagram illustrating a configuration example of a voice recognition unit 50A.

【図６】単語辞書を示す図である。FIG. 6 is a diagram showing a word dictionary.

【図７】エコーバック部５６の構成例を示すブロック図
である。FIG. 7 is a block diagram illustrating a configuration example of an echo back unit 56.

【図８】エコーバック部５６によるエコーバック処理を
説明するフローチャートである。FIG. 8 is a flowchart illustrating the echo back process performed by the echo back unit 56.

【図９】名前登録処理を説明するフローチャートであ
る。FIG. 9 is a flowchart illustrating a name registration process.

【図１０】ロボットの行動を説明する図である。FIG. 10 is a diagram illustrating the behavior of the robot.

【図１１】ロボットの他の行動を説明する図である。FIG. 11 is a diagram illustrating another action of the robot.

【図１２】図９のステップＳ３３の処理の詳細を説明す
るフローチャートである。FIG. 12 is a flowchart illustrating details of the process in step S33 of FIG. 9.

【図１３】他の名前登録処理を説明するフローチャート
である。FIG. 13 is a flowchart illustrating another name registration process.

【図１４】他の名前登録処理を説明するフローチャート
である。FIG. 14 is a flowchart illustrating another name registration process.

【図１５】他の名前登録処理を説明するフローチャート
である。FIG. 15 is a flowchart illustrating another name registration process.

[Explanation of symbols]

１頭部ユニット，４Ａ下顎部，１０コントロ
ーラ，１０ＡＣＰＵ，１０Ｂメモリ，１５
マイク，１６ＣＣＤカメラ，１７タッチセン
サ，１８スピーカ，１９ＬＥＤ，２１ＡＤ
変換部，２２特徴抽出部，２３マッチング部，
２４音響モデル記憶部，２５辞書記憶部，２６
文法記憶部，２７音声区間検出部，３１テキ
スト生成部，３２規則合成部，３４辞書記憶
部，３５生成用文法記憶部，３６音素片記憶部，
４１ＡＤ変換部，４２韻律分析部，４３音
生成部，４４出力部，４５メモリ，４６音
声区間検出部，５０センサ入力処理部，５０Ａ
音声認識部，５０Ｂ画像認識部，５０Ｃ圧力処
理部，５１モデル記憶部，５２行動決定機構
部，５３姿勢遷移機構部，５４制御機構部，
５５音声合成部，５６エコーバック部，５７出
力制御部，５８名前登録部1 head unit, 4A lower jaw, 10 controller, 10A CPU, 10B memory, 15 microphone, 16 CCD camera, 17 touch sensor, 18 speaker, 19 LED, 21 AD conversion unit, 22 feature extraction unit, 23 matching unit, 24 acoustic model storage unit, 25 dictionary storage unit, 26 grammar storage unit, 27 voice section detection unit, 31 text generation unit, 32 rule synthesis unit, 34 dictionary storage unit, 35 generation grammar storage unit, 36 phoneme segment storage unit, 41 AD conversion unit, 42 prosody analysis unit, 43 sound generation unit, 44 output unit, 45 memory, 46 voice section detection unit, 50 sensor input processing unit, 50A voice recognition unit, 50B image recognition unit, 50C pressure processing unit, 51 model storage unit, 52 action determination mechanism unit, 53 posture transition mechanism unit, 54 control mechanism unit, 55 voice synthesis unit, 56 echo back unit, 57 output control unit, 58 name registration unit

Continued from the front page
(51) Int.Cl.⁷, identification symbol, FI, theme code (reference): G10L 3/00 551H
(72) Inventor: Jun Hiroi, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Wataru Onoki, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
F-terms (reference): 2C150 CA02 DA05 DA24 DA26 DA27 DA28 DF03 DF04 DF33 ED42 ED52 EF07 EF16 EF23 EF29 EF33 EF36; 3F059 AA00 BA00 BB06 DA05 DC00 FC00; 3F060 AA00 BA10 CA14; 5D015 GG03 KK02 LL10 LL11

Claims

[Claims]

1. A robot control device for controlling a robot that acts based on at least a voice recognition result, the robot control device comprising: behavior control means for controlling the behavior of the robot so that the robot performs an action indicating that it wants to be given a name; detection means for detecting an optimal phoneme string from a voice input after the behavior of the robot has been controlled by the behavior control means; and registration means for registering the optimal phoneme string as the name.

2. The robot control device according to claim 1, further comprising transition means for sequentially transitioning a growth state of the robot to predetermined states, wherein the behavior control means controls the behavior of the robot so that the robot performs the action indicating that it wants to be given the name when the growth state of the robot has been transitioned to a predetermined state by the transition means.

3. The robot control device according to claim 1, further comprising acquisition means for acquiring a command, input by a user, that instructs the start of a name registration process, wherein the behavior control means controls the behavior of the robot so that the robot performs the action indicating that it wants to be given the name when the command has been acquired by the acquisition means.

4. The robot control device according to claim 1, further comprising: first storage means for storing acoustic models representing acoustic features; and second storage means for storing a word dictionary in which words subject to voice recognition are registered, wherein the detection means detects, as the optimal phoneme string, the word model that, among word models formed by connecting the acoustic models in correspondence with the feature parameters of the voice, has the highest score at which the feature parameters are observed, and wherein the registration means detects, among word models formed by connecting the acoustic models in correspondence with the phonemes of words already registered in the word dictionary, the word model having the highest score at which the feature parameters are observed, and registers the optimal phoneme string as the name when the difference between the score of the detected word model and the score of the optimal phoneme string is larger than a predetermined threshold value.

5. A robot control method for a robot control device that controls a robot acting based on at least a voice recognition result, the method comprising: a behavior control step of controlling the behavior of the robot so that the robot performs an action indicating that it wants to be given a name; a detection step of detecting an optimal phoneme string from a voice input after the behavior of the robot has been controlled in the behavior control step; and a registration step of registering the optimal phoneme string as the name.

6. A recording medium on which a computer-readable program is recorded, the program being for a robot control device that controls a robot acting based on at least a voice recognition result, the program comprising: behavior control means for controlling the behavior of the robot so that the robot performs an action indicating that it wants to be given a name; detection means for detecting an optimal phoneme string from a voice input after the behavior of the robot has been controlled by the behavior control means; and registration means for registering the optimal phoneme string as the name.