JP4529091B2

JP4529091B2 - Learning apparatus, learning method, and robot apparatus

Info

Publication number: JP4529091B2
Application number: JP2006210319A
Authority: JP
Inventors: Kazumi Aoyama (青山一美); Hideki Shimomura (下村秀樹)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-08-01
Filing date: 2006-08-01
Publication date: 2010-08-25
Anticipated expiration: 2023-02-19
Also published as: JP2007021719A

Description

The present invention relates to a learning apparatus, a learning method, and a robot apparatus, and is suitably applied to, for example, an entertainment robot.

Conventionally, recognizers such as the fingerprint and voiceprint recognizers used in security systems have a “learning mode” that makes it explicit to the user that learning is taking place whenever the recognizer learns a new category.

When fingerprints or voiceprints are memorized for security purposes as described above, the purpose of the sensing information (use for security) is clear, so it is preferable for such a learning mode to be explicit, not least to inform users that their own information is currently being registered.

However, for an entertainment robot that uses sensing information to recognize who the user is during an interaction, it is important to be able to recognize the user within a natural interaction.

For such an entertainment robot, explicitly showing users that it is learning their faces, for example by having the robot say “Please hold still while I memorize your face,” has the problem of obstructing the natural interaction with the user that is the robot's original purpose.

Meanwhile, for an entertainment robot that learns users' names to interact naturally, some means is needed so that, once users tell it their names, it can memorize the features associated with each name (the sensing information linked to that name) in a single attempt wherever possible.

However, in previously proposed entertainment robots the criterion for judging whether learning succeeded is fixed: for example, learning is deemed to have failed if sufficient data is not obtained within a set time. In a dynamic environment, learning therefore fails frequently, the sensing information is slow to become linked to the name, and the robot asks for the name again and again, an interaction that users find tiresome.

Furthermore, in previously proposed entertainment robots, when learning had to end before sufficient training data had been acquired, that learning session was treated as a failure and all the data obtained during it was discarded. Because the partial results could not be reused, efficient learning was difficult.

Therefore, if an entertainment robot could recognize users within natural interaction and keep learning failures to a minimum, its entertainment value could be improved further.

The present invention was made in view of the above, and its object is to propose a learning apparatus, a learning method, and a robot apparatus capable of improving entertainment value.

To solve these problems, the present invention provides a learning apparatus comprising: a storage unit that stores users' names and features in association with each other; a user detection unit that detects a user based on images obtained from a camera; a recognition unit that, when the user detection unit detects a user and before the user is asked for a name, detects the user's features using at least one of the images obtained from the camera and the audio obtained from a microphone, and recognizes the user's name by comparing the detected features with the features already stored in the storage unit; a dialogue unit that acquires the name of the user detected by the user detection unit by asking the user for it in the course of a dialogue; and a control unit that, upon determining that the user is a new user because the name acquired by the dialogue unit and the features detected by the recognition unit are not stored in the storage unit, causes the recognition unit to learn the user's features using at least one of the images and the audio, and stores the features obtained as a result of that learning in the storage unit in association with the name acquired by the dialogue unit. The recognition unit determines a degree of learning achievement for the features based on at least one of the number of images and the duration of audio that were obtained and used in detecting the features. The dialogue unit uses this degree of learning achievement to judge whether the recognition unit's learning of the features is insufficient and, when it judges that learning is insufficient, executes processing to prolong the dialogue with the user. As a result, this learning apparatus can learn a user's name through ordinary dialogue, without the user being aware that learning is taking place.
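The achievement check described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not Sony's implementation: the target counts (`TARGET_FACE_IMAGES`, `TARGET_VOICE_SECONDS`), the rule of taking the smaller of the two ratios, and the filler question are all hypothetical, since the text only says that the degree is judged from the number of images and/or the duration of audio used for feature detection.

```python
from dataclasses import dataclass

# Hypothetical targets: the text does not state how many face images or how
# many seconds of speech count as "enough" for learning.
TARGET_FACE_IMAGES = 30
TARGET_VOICE_SECONDS = 5.0


@dataclass
class LearningProgress:
    face_images: int = 0        # face images used so far for feature detection
    voice_seconds: float = 0.0  # seconds of speech used so far for feature detection


def achievement(progress: LearningProgress) -> float:
    """Learning-achievement degree in [0, 1].

    Assumption: each modality's ratio is capped at 1.0 and the overall degree
    is the smaller of the two, so both face and voice must be sufficient.
    """
    face_ratio = min(progress.face_images / TARGET_FACE_IMAGES, 1.0)
    voice_ratio = min(progress.voice_seconds / TARGET_VOICE_SECONDS, 1.0)
    return min(face_ratio, voice_ratio)


def next_utterance(progress: LearningProgress, threshold: float = 1.0) -> str:
    """Dialogue-unit decision: prolong the conversation while learning is insufficient."""
    if achievement(progress) < threshold:
        # Hypothetical filler question that keeps the user talking and in view.
        return "By the way, what did you do today?"
    return "Nice to meet you!"


progress = LearningProgress(face_images=15, voice_seconds=5.0)
print(achievement(progress))     # 0.5
print(next_utterance(progress))  # By the way, what did you do today?
```

Taking the minimum keeps the conversation going until both modalities are covered; a design that relies on only one modality (the text says "at least one") would compute a single ratio instead.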

The present invention also provides a learning method comprising: a first step of, when a user is detected based on images obtained from a camera and before the user is asked for a name, detecting the user's features using at least one of the images obtained from the camera and the audio obtained from a microphone, and recognizing the user's name by comparing the detected features with the features already stored in a storage unit that stores users' names and features in association with each other; a second step of acquiring the name of the detected user by asking the user for it in the course of a dialogue; and a third step of, upon determining that the user is a new user because the name acquired in the second step and the features detected in the first step are not stored in the storage unit, learning the user's features using at least one of the images and the audio, and storing the features obtained as a result of that learning in the storage unit in association with the name acquired in the second step. In the third step, a degree of learning achievement for the features is determined based on at least one of the number of images and the duration of audio that were obtained and used in detecting the features; this degree is used to judge whether learning of the features is insufficient and, when learning is judged insufficient, processing to prolong the dialogue with the user is executed. As a result, this learning method makes it possible to learn a user's name through ordinary dialogue, without the user being aware that learning is taking place.
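The three steps of the method can be sketched as a single session function. Everything named here is hypothetical: `recognize` stands in for the recognition step using a plain Euclidean nearest-neighbour test with an assumed tolerance, `ask_name` and `learn_features` are callbacks standing in for the dialogue and the learning stage, and a dict stands in for the storage unit.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Tuple[float, ...]


def recognize(features: Sequence[float], memory: Dict[str, Features],
              tol: float = 0.5) -> Optional[str]:
    """Step 1: nearest stored user within `tol` (Euclidean distance), else None."""
    best_name: Optional[str] = None
    best_dist = tol
    for name, stored in memory.items():
        dist = math.dist(features, stored)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


def name_learning_session(features: Features, memory: Dict[str, Features],
                          ask_name: Callable[[], str],
                          learn_features: Callable[[], Features]) -> Tuple[str, bool]:
    """Run steps 1-3 once; return (name, whether the user was treated as new)."""
    recognized = recognize(features, memory)            # step 1, before asking
    name = ask_name()                                   # step 2, via dialogue
    is_new = recognized is None and name not in memory  # step 3 condition
    if is_new:
        memory[name] = learn_features()                 # learn, then associate with name
    return name, is_new


memory: Dict[str, Features] = {"Taro": (0.0, 0.0)}
print(name_learning_session((5.0, 5.0), memory, lambda: "Hanako", lambda: (5.0, 5.1)))
# ('Hanako', True)
print(name_learning_session((0.1, 0.0), memory, lambda: "Taro", lambda: (9.9, 9.9)))
# ('Taro', False)
```

A real recognizer would also have to handle the case where the recognized name and the name the user gives disagree; that case is outside what this section describes, so the sketch leaves it alone.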

The present invention further provides a robot apparatus comprising: a storage unit that stores users' names and features in association with each other; a user detection unit that detects a user based on images obtained from a camera; a recognition unit that, when the user detection unit detects a user and before the user is asked for a name, detects the user's features using at least one of the images obtained from the camera and the audio obtained from a microphone, and recognizes the user's name by comparing the detected features with the features already stored in the storage unit; a dialogue unit that acquires the name of the user detected by the user detection unit by asking the user for it in the course of a dialogue; and a control unit that, upon determining that the user is a new user because the name acquired by the dialogue unit and the features detected by the recognition unit are not stored in the storage unit, causes the recognition unit to learn the user's features using at least one of the images and the audio, and stores the features obtained as a result of that learning in the storage unit in association with the name acquired by the dialogue unit. The recognition unit determines a degree of learning achievement for the features based on at least one of the number of images and the duration of audio that were obtained and used in detecting the features. The dialogue unit uses this degree of learning achievement to judge whether the recognition unit's learning of the features is insufficient and, when it judges that learning is insufficient, executes processing to prolong the dialogue with the user. As a result, this robot apparatus can learn a user's name through ordinary dialogue, without the user being aware that learning is taking place.

According to the present invention, a learning apparatus is provided with: a storage unit that stores users' names and features in association with each other; a user detection unit that detects a user based on images obtained from a camera; a recognition unit that, when the user detection unit detects a user and before the user is asked for a name, detects the user's features using at least one of the images obtained from the camera and the audio obtained from a microphone, and recognizes the user's name by comparing the detected features with the features already stored in the storage unit; a dialogue unit that acquires the name of the user detected by the user detection unit by asking the user for it in the course of a dialogue; and a control unit that, upon determining that the user is a new user because the name acquired by the dialogue unit and the features detected by the recognition unit are not stored in the storage unit, causes the recognition unit to learn the user's features using at least one of the images and the audio, and stores the features obtained as a result of that learning in the storage unit in association with the name acquired by the dialogue unit. The recognition unit determines a degree of learning achievement for the features based on at least one of the number of images and the duration of audio that were obtained and used in detecting the features, and the dialogue unit uses this degree to judge whether the recognition unit's learning of the features is insufficient and, when it judges that learning is insufficient, executes processing to prolong the dialogue with the user. As a result, this learning apparatus can learn a user's name through ordinary dialogue, without the user being aware that learning is taking place, and a learning apparatus capable of improving entertainment value can thus be realized.

According to the present invention, a learning method is also provided with: a first step of, when a user is detected based on images obtained from a camera and before the user is asked for a name, detecting the user's features using at least one of the images obtained from the camera and the audio obtained from a microphone, and recognizing the user's name by comparing the detected features with the features already stored in a storage unit that stores users' names and features in association with each other; a second step of acquiring the name of the detected user by asking the user for it in the course of a dialogue; and a third step of, upon determining that the user is a new user because the name acquired in the second step and the features detected in the first step are not stored in the storage unit, learning the user's features using at least one of the images and the audio, and storing the features obtained as a result of that learning in the storage unit in association with the name acquired in the second step. In the third step, a degree of learning achievement for the features is determined based on at least one of the number of images and the duration of audio that were obtained and used in detecting the features; this degree is used to judge whether learning of the features is insufficient and, when learning is judged insufficient, processing to prolong the dialogue with the user is executed. As a result, this learning method makes it possible to learn a user's name through ordinary dialogue, without the user being aware that learning is taking place, and a learning method capable of improving entertainment value can thus be realized.

Further, according to the present invention, a robot apparatus is provided with: a storage unit that stores users' names and features in association with each other; a user detection unit that detects a user based on images obtained from a camera; a recognition unit that, when the user detection unit detects a user and before the user is asked for a name, detects the user's features using at least one of the images obtained from the camera and the audio obtained from a microphone, and recognizes the user's name by comparing the detected features with the features already stored in the storage unit; a dialogue unit that acquires the name of the user detected by the user detection unit by asking the user for it in the course of a dialogue; and a control unit that, upon determining that the user is a new user because the name acquired by the dialogue unit and the features detected by the recognition unit are not stored in the storage unit, causes the recognition unit to learn the user's features using at least one of the images and the audio, and stores the features obtained as a result of that learning in the storage unit in association with the name acquired by the dialogue unit. The recognition unit determines a degree of learning achievement for the features based on at least one of the number of images and the duration of audio that were obtained and used in detecting the features, and the dialogue unit uses this degree to judge whether the recognition unit's learning of the features is insufficient and, when it judges that learning is insufficient, executes processing to prolong the dialogue with the user. As a result, this robot apparatus can learn a user's name through ordinary dialogue, without the user being aware that learning is taking place, and a robot apparatus capable of improving entertainment value can thus be realized.

An embodiment of the present invention will now be described in detail with reference to the drawings.

(1) Configuration of the Robot According to This Embodiment
In FIGS. 1 and 2, reference numeral 1 denotes, as a whole, a bipedal walking robot according to this embodiment. A head unit 3 is disposed on top of a body unit 2, arm units 4A and 4B of identical configuration are disposed on the upper left and right of the body unit 2, and leg units 5A and 5B of identical configuration are attached at predetermined positions on the lower left and right of the body unit 2.

In the body unit 2, a frame 10 forming the upper trunk and a waist base 11 forming the lower trunk are connected via a waist joint mechanism 12. By driving the actuators A₁ and A₂ of the waist joint mechanism 12, which is fixed to the waist base 11 of the lower trunk, the upper trunk can be rotated independently about the mutually orthogonal roll axis 13 and pitch axis 14 shown in FIG. 3.

The head unit 3 is attached via a neck joint mechanism 16 to the center of the upper surface of a shoulder base 15 fixed to the upper end of the frame 10. By driving the actuators A₃ and A₄ of the neck joint mechanism 16, the head unit 3 can be rotated independently about the mutually orthogonal pitch axis 17 and yaw axis 18 shown in FIG. 3.

The arm units 4A and 4B are attached to the left and right of the shoulder base 15 via respective shoulder joint mechanisms 19. By driving the actuators A₅ and A₆ of the corresponding shoulder joint mechanism 19, each arm unit can be rotated independently about the mutually orthogonal pitch axis 20 and roll axis 21 shown in FIG. 3.

Each arm unit 4A, 4B is formed by connecting an actuator A₈, which forms the forearm, via an elbow joint mechanism 22 to the output shaft of an actuator A₇, which forms the upper arm, and attaching a hand 23 to the tip of the forearm.

In each arm unit 4A, 4B, driving the actuator A₇ rotates the forearm about the yaw axis 24 shown in FIG. 3, and driving the actuator A₈ rotates the forearm about the pitch axis 25 shown in FIG. 3.

Each leg unit 5A, 5B, on the other hand, is attached to the waist base 11 of the lower trunk via a hip joint mechanism 26. By driving the actuators A₉ to A₁₁ of the corresponding hip joint mechanism 26, each leg unit can be rotated independently about the mutually orthogonal yaw axis 27, roll axis 28 and pitch axis 29 shown in FIG. 3.

Each leg unit 5A, 5B is formed by connecting a frame 32, which forms the lower leg, via a knee joint mechanism 31 to the lower end of a frame 30, which forms the thigh, and connecting a foot 34 to the lower end of the frame 32 via an ankle joint mechanism 33.

Thus, in each leg unit 5A, 5B, driving the actuator A₁₂ of the knee joint mechanism 31 rotates the lower leg about the pitch axis 35 shown in FIG. 3, and driving the actuators A₁₃ and A₁₄ of the ankle joint mechanism 33 rotates the foot 34 independently about the mutually orthogonal pitch axis 36 and roll axis 37 shown in FIG. 3.
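The actuator layout described above can be summarized as a table. In this sketch, the pairing of each actuator with an axis follows the order in which the text lists them (for example A₁ with roll axis 13 and A₂ with pitch axis 14), which is an assumption. The total of 24 joint axes is derived by counting, not a figure stated in the patent, and assumes each arm and leg has its own copy of the labelled actuators.

```python
from typing import Dict, List, Tuple

Joint = Tuple[str, str, str]  # (joint mechanism, actuator, rotation axis)

TRUNK: List[Joint] = [
    ("waist joint 12", "A1", "roll axis 13"),
    ("waist joint 12", "A2", "pitch axis 14"),
]
HEAD: List[Joint] = [
    ("neck joint 16", "A3", "pitch axis 17"),
    ("neck joint 16", "A4", "yaw axis 18"),
]
ARM: List[Joint] = [  # same configuration for arm units 4A and 4B
    ("shoulder joint 19", "A5", "pitch axis 20"),
    ("shoulder joint 19", "A6", "roll axis 21"),
    ("upper arm", "A7", "yaw axis 24"),
    ("elbow joint 22", "A8", "pitch axis 25"),
]
LEG: List[Joint] = [  # same configuration for leg units 5A and 5B
    ("hip joint 26", "A9", "yaw axis 27"),
    ("hip joint 26", "A10", "roll axis 28"),
    ("hip joint 26", "A11", "pitch axis 29"),
    ("knee joint 31", "A12", "pitch axis 35"),
    ("ankle joint 33", "A13", "pitch axis 36"),
    ("ankle joint 33", "A14", "roll axis 37"),
]


def body() -> Dict[str, List[Joint]]:
    """Whole-body layout; arm and leg tables are reused for both sides."""
    return {"trunk": TRUNK, "head": HEAD,
            "arm 4A": ARM, "arm 4B": ARM,
            "leg 5A": LEG, "leg 5B": LEG}


def degrees_of_freedom() -> int:
    """Count rotation axes over all units (one actuator per axis)."""
    return sum(len(joints) for joints in body().values())


print(degrees_of_freedom())  # 24
```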

On the back of the waist base 11, which forms the lower trunk of the body unit 2, a control unit 42 is disposed, as shown in FIG. 4. It consists of a box housing a main control unit 40 that controls the operation of the robot 1 as a whole, peripheral circuits 41 such as a power supply circuit and a communication circuit, a battery 45 (FIG. 5), and so on.

The control unit 42 is connected to sub-control units 43A to 43D disposed in the respective constituent units (the body unit 2, the head unit 3, the arm units 4A and 4B, and the leg units 5A and 5B), and can supply the required power supply voltage to, and communicate with, these sub-control units 43A to 43D.

Each of the sub-control units 43A to 43D is connected to the actuators A₁ to A₁₄ in its corresponding constituent unit and can drive them to the states specified by the various control commands given by the main control unit 40.

Further, as shown in FIG. 5, the head unit 3 is provided at predetermined positions with various external sensors, such as a CCD (Charge Coupled Device) camera 50 functioning as the “eyes” of the robot 1 and a microphone 51 functioning as its “ears”, and with a speaker 52 functioning as its “mouth”. Touch sensors 53 serving as external sensors are disposed on the bottom surfaces of the hands 23 and the feet 34 and elsewhere. Inside the control unit 42, internal sensors including a battery sensor 54 and an acceleration sensor 55 are disposed.

The CCD camera 50 images the surroundings and sends the resulting image signal S1A to the main control unit 40, while the microphone 51 collects various external sounds and sends the resulting audio signal S1B to the main control unit 40.
The touch sensors 53 detect physical actions by the user and physical contact with external objects, and send the detection results to the main control unit 40 as pressure detection signals S1C.

The battery sensor 54 detects the remaining energy of the battery 45 at a predetermined cycle and sends the result to the main control unit 40 as a remaining-battery detection signal S2A, while the acceleration sensor 55 detects acceleration along three axes (x, y and z) at a predetermined cycle and sends the result to the main control unit 40 as an acceleration detection signal S2B.

The main control unit 40 judges the conditions around and inside the robot 1, contact with external objects, and so on, based on the external sensor outputs (the image signal S1A, audio signal S1B, pressure detection signals S1C, etc., supplied by the CCD camera 50, microphone 51, touch sensors 53, etc.) and the internal sensor outputs (the remaining-battery detection signal S2A, acceleration detection signal S2B, etc., supplied by the battery sensor 54, acceleration sensor 55, etc.).
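The sensor signals named in this section can be collected into a small lookup table that makes the external/internal split explicit. The signal names and sources come from the text; the `Kind`, `Signal` and `signals_of` names are hypothetical.

```python
from enum import Enum
from typing import Dict, List, NamedTuple


class Kind(Enum):
    EXTERNAL = "external"
    INTERNAL = "internal"


class Signal(NamedTuple):
    source: str
    kind: Kind


# Signals delivered to the main control unit 40, as listed in the text.
SIGNALS: Dict[str, Signal] = {
    "S1A": Signal("CCD camera 50 (image)", Kind.EXTERNAL),
    "S1B": Signal("microphone 51 (audio)", Kind.EXTERNAL),
    "S1C": Signal("touch sensors 53 (pressure)", Kind.EXTERNAL),
    "S2A": Signal("battery sensor 54 (remaining energy)", Kind.INTERNAL),
    "S2B": Signal("acceleration sensor 55 (x, y, z)", Kind.INTERNAL),
}


def signals_of(kind: Kind) -> List[str]:
    """Names of all signals of one kind, sorted."""
    return sorted(name for name, sig in SIGNALS.items() if sig.kind is kind)


print(signals_of(Kind.EXTERNAL))  # ['S1A', 'S1B', 'S1C']
print(signals_of(Kind.INTERNAL))  # ['S2A', 'S2B']
```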

The main control unit 40 then decides on its next action based on these judgments, a control program stored in advance in an internal memory 40A, and various control parameters stored in an external memory 56 loaded at that time, and sends control commands based on that decision to the relevant sub-control units 43A to 43D. Under the control of those sub-control units, the corresponding actuators A₁ to A₁₄ are driven according to the commands, so that the robot 1 performs actions such as swinging the head unit 3 up, down, left and right, raising the arm units 4A and 4B, or walking.

The main control unit 40 also recognizes what the user has said by performing speech recognition on the audio signal S1B, and outputs synthesized speech for conversing with the user by supplying an audio signal S3 corresponding to that recognition to the speaker 52.

In this way, the robot 1 can act autonomously based on its surrounding and internal conditions, and can also converse with the user.

(2) Processing of the Main Control Unit 40 Relating to the Name Learning Function
(2-1) Configuration of the Main Control Unit 40 Relating to the Name Learning Function
Next, the name learning function installed in the robot 1 will be described.

The robot 1 is equipped with a name learning function. It acquires a user's name through dialogue with the user and stores that name in association with data on the acoustic features of the user's voice and the morphological features of the user's face, detected at that time from the outputs of the microphone 51 and the CCD camera 50. Based on this stored data, it recognizes when a new user appears and acquires and stores that new user's name, voice features and face features in the same way, thereby building up its knowledge of users' names. In the following, a user whose name, voice acoustic features and face morphological features have been stored in association with one another is called a “known user”, and a user for whom this has not yet been done is called a “new user”.
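The distinction between known and new users can be sketched with a minimal associative store. The class and method names are hypothetical and the feature vectors are placeholders; the only rule taken from the text is that a user counts as known once the name, voice features and face features are all stored together.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Vector = Tuple[float, ...]


@dataclass
class UserRecord:
    voice: Optional[Vector] = None  # acoustic features of the user's voice
    face: Optional[Vector] = None   # morphological features of the user's face


@dataclass
class AssociativeMemory:
    users: Dict[str, UserRecord] = field(default_factory=dict)

    def store(self, name: str, voice: Optional[Vector] = None,
              face: Optional[Vector] = None) -> None:
        """Associate whatever features are available with `name`."""
        record = self.users.setdefault(name, UserRecord())
        if voice is not None:
            record.voice = voice
        if face is not None:
            record.face = face

    def is_known(self, name: str) -> bool:
        """A user is "known" only once name, voice and face are all stored."""
        record = self.users.get(name)
        return record is not None and record.voice is not None and record.face is not None


memory = AssociativeMemory()
memory.store("Taro", voice=(0.2, 0.7), face=(1.3, 0.4))
memory.store("Hanako", voice=(0.9, 0.1))  # face not learned yet
print(memory.is_known("Taro"), memory.is_known("Hanako"))  # True False
```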

The name learning function is implemented by various processes in the main control unit 40.

Functionally, the processing of the main control unit 40 related to the name learning function can be divided, as shown in FIG. 6, into: a speech recognition unit 60 that recognizes words spoken by the user; a speaker recognition unit 61 that detects the acoustic features of the user's voice and identifies the user based on them; a face recognition unit 62 that detects the morphological features of the user's face and identifies the user based on them; a dialogue control unit 63 that handles the various controls for learning the user's name, including control of the dialogue with the user; an associative storage unit 65 that manages the associations among known users' names, voice acoustic features, and facial morphological features; and a speech synthesis unit 64 that, under the control of the dialogue control unit 63, generates audio signals S3 for the various dialogues and sends them to the speaker 54 (FIG. 5).

The speech recognition unit 60 executes predetermined speech recognition processing on the audio signal S1B from the microphone 51 (FIG. 5) to recognize, word by word, the words contained in that signal, and sends the recognized words to the dialogue control unit 63 as character string data D1.

The speaker recognition unit 61 has a function of storing audio data obtained from the audio signal S1B of the microphone 51 in, for example, the internal memory 40A (FIG. 5), and a function of detecting the acoustic features of the user's voice, from either the stored audio data or audio data obtained from the audio signal S1B supplied by the microphone 51 in real time, by predetermined signal processing using, for example, the method described in “Segregation of Speakers for Recognition and Speaker Identification” (CH2977-7/91/0000-0873 S1.00 1991 IEEE).

The speaker recognition unit 61 then compares the detected acoustic feature data in turn with the stored acoustic feature data of every known user. If the detected features match those of a known user, it notifies the dialogue control unit 63 of the identifier unique to those acoustic features (hereinafter called the SID); if they match no known user, it notifies the dialogue control unit 63 of SID = -1, meaning that recognition failed.
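The matching step can be sketched as follows. This is a minimal illustration, not the patent's method: it assumes each user's stored features are a fixed-length vector and treats a cosine similarity at or above a hypothetical `threshold` as a “match”, returning the best-scoring ID rather than the first match found; the source does not specify the comparison criterion. The same logic would apply to the FID returned by the face recognition unit 62.

```python
import math

UNRECOGNIZED = -1  # the "recognition failed" ID described in the text


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def identify(features, known, threshold=0.9):
    """Return the ID whose stored features best match `features`, or -1.

    `known` maps an ID (SID or FID) to a stored feature vector.
    `threshold` is a hypothetical acceptance level (an assumption).
    """
    best_id, best_score = UNRECOGNIZED, threshold
    for known_id, template in known.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_id, best_score = known_id, score
    return best_id
```

With no stored users, or no template above the threshold, the function falls through to -1, mirroring the “recognition failed” notification.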

When the dialogue control unit 63 determines that the user is a new user, the speaker recognition unit 61, in response to new-learning start and end commands from the dialogue control unit 63, detects the acoustic features of that user's voice during that interval using the stored or real-time audio data, stores the detected feature data in association with a new unique SID, and notifies the dialogue control unit 63 of that SID.

When the dialogue control unit 63 later issues start and end commands for additional learning for that user, the speaker recognition unit 61 performs additional learning, collecting further acoustic feature data of that user's voice from the stored or real-time audio data.

The speaker recognition unit 61 also has a reply function: when the dialogue control unit 63 designates a user and asks about the learning achievement level for that user, the unit answers. Here, the learning achievement level means how much of the data used to recognize that user (in this case, acoustic feature data) has been collected. In the speaker recognition unit 61 it is determined from the value of a function whose parameter is the duration of the speech used to collect that user's acoustic feature data.

In this embodiment, three learning achievement levels are defined as numerical values: level “A” (sufficiently learned), at which the data is practically sufficient for recognition; level “B” (somewhat uncertain), at which the data can be used for recognition but additional learning is advisable; and level “C” (insufficient), at which the data is too inadequate to be used for recognition and additional learning should be performed at the next opportunity.

Accordingly, when the dialogue control unit 63 designates a user and asks about that user's learning achievement level, the speaker recognition unit 61 determines which of “A” to “C” applies from the value of the function of the speech duration used to collect that user's acoustic feature data, and notifies the dialogue control unit 63 of the result.
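A minimal sketch of the grading step, assuming a saturating function of speech duration and two cut-off values. The function `achievement_score`, the time constant `tau`, and the thresholds `a_min` and `b_min` are all hypothetical; the text says only that the level is derived from some function of the speech duration.

```python
import math


def achievement_score(seconds: float, tau: float = 10.0) -> float:
    """Hypothetical saturating score in [0, 1) from seconds of speech used."""
    return 1.0 - math.exp(-seconds / tau)


def achievement_level(score: float, a_min: float = 0.9, b_min: float = 0.6) -> str:
    """Map a score to the three levels; the cut-offs are assumptions."""
    if score >= a_min:
        return "A"  # practically sufficient for recognition
    if score >= b_min:
        return "B"  # usable, but additional learning is advisable
    return "C"      # not used for recognition; learn more next time
```

A saturating function is one plausible choice because each extra second of speech adds less new information once plenty has been collected. The face recognition unit 62 could reuse `achievement_level` with a score computed from the number of face images instead of seconds of speech.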

The face recognition unit 62 has a function of storing learning image data obtained from the image signal S1A of the CCD camera 50 (FIG. 5) in, for example, the internal memory 40A (FIG. 5), and a function of detecting, by predetermined signal processing, the morphological features of the user's face contained in an image, using either the stored image data or image data obtained from the image signal S1A supplied by the CCD camera 50 in real time.

The face recognition unit 62 then compares the detected morphological feature data in turn with the stored facial feature data of every known user. If the detected features match those of a known user, it notifies the dialogue control unit 63 of the identifier unique to those features (hereinafter called the FID); if they match no known user, it notifies the dialogue control unit 63 of FID = -1, meaning that recognition failed.

When the dialogue control unit 63 determines that the user is a new user, the face recognition unit 62, based on new-learning start and end commands from the dialogue control unit 63, detects the morphological features of the user's face during that interval using the stored or real-time image data, stores the detected feature data in association with a new unique FID, and notifies the dialogue control unit 63 of that FID.

When the dialogue control unit 63 later issues start and end commands for additional learning for that user, the face recognition unit 62 performs additional learning, collecting further morphological feature data of that user's face from the stored or real-time image data.

Like the speaker recognition unit 61, the face recognition unit 62 has a reply function that answers when the dialogue control unit 63 designates a user and asks about the learning achievement level for that user. In this embodiment, the learning achievement level in the face recognition unit 62 is determined from the value of a function whose parameter is the number of face images, derived from the image signal S1A, that were used to collect the user's facial morphological feature data.

Accordingly, when the dialogue control unit 63 designates a user and asks about that user's learning achievement level, the face recognition unit 62 determines from that value which of “A” to “C” applies and notifies the dialogue control unit 63 of the result as the learning achievement level.

The speech synthesis unit 64 converts character string data D2 supplied from the dialogue control unit 63 into an audio signal S3 and sends it to the speaker 54 (FIG. 5), which outputs speech based on the audio signal S3.

The associative storage unit 65 is an object consisting of, for example, the internal memory 40A (FIG. 5) and software. Under the control of the dialogue control unit 63 it stores, as shown in FIG. 7, each known user's name, the SID associated with that user's voice acoustic feature data held by the speaker recognition unit 61, and the FID associated with that user's facial morphological feature data held by the face recognition unit 62.

The associative storage unit 65 stores the name, SID, and FID belonging to the same user in association with one another, so that for a known user any one of these items (name, SID, or FID) can be used to look up the others.

Under the control of the dialogue control unit 63, the associative storage unit 65 also stores, in association with each known user's SID, the learning achievement level of the speaker recognition unit 61 for that user, and likewise, in association with each known user's FID, the learning achievement level of the face recognition unit 62 for that user.
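The associations held by the associative storage unit 65 can be pictured as one record per known user. The sketch below is a hypothetical in-memory layout (the class and field names are illustrative, not from the source), showing lookup by any one of name, SID, or FID, with each recognizer's learning achievement level stored alongside the corresponding ID.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserRecord:
    name: str
    sid: int                 # key into the speaker recognizer's voice data
    fid: int                 # key into the face recognizer's face data
    voice_level: str = "C"   # speaker recognition unit's achievement level
    face_level: str = "C"    # face recognition unit's achievement level


class AssociativeMemory:
    """Hypothetical store linking name, SID and FID for each known user."""

    def __init__(self) -> None:
        self._records: List[UserRecord] = []

    def add(self, record: UserRecord) -> None:
        self._records.append(record)

    def find(self, *, name: Optional[str] = None, sid: Optional[int] = None,
             fid: Optional[int] = None) -> Optional[UserRecord]:
        """Look a user up by any one identifier; returns None if unknown."""
        for r in self._records:
            if name is not None and r.name == name:
                return r
            if sid is not None and r.sid == sid:
                return r
            if fid is not None and r.fid == fid:
                return r
        return None
```

Because stored IDs are never -1, a lookup with an “unrecognized” SID or FID simply returns `None`.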

In this embodiment, the associative storage unit 65 lowers each stored learning achievement level of the speaker recognition unit 61 and the face recognition unit 62 by one step every time a fixed period (for example, several days) elapses after registration or the last update (for example, “A” is lowered to “B”, and “B” to “C”). This is hereinafter called time decay of the learning achievement level.

This is because users' faces and voices change over time, so it is desirable to refresh, at regular intervals, the voice acoustic feature data and facial morphological feature data that the speaker recognition unit 61 and the face recognition unit 62 use to recognize each user.
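Time decay can be computed from the number of whole decay periods elapsed since registration or the last update. In this sketch the three-day `period` is an assumed stand-in for “several days”, and “C” is treated as the floor, since the text defines no level below it.

```python
from datetime import datetime, timedelta

LEVELS = ["A", "B", "C"]  # best to worst


def decayed_level(level, last_update, now, period=timedelta(days=3)):
    """Lower `level` one step per whole `period` elapsed since `last_update`."""
    steps = int((now - last_update) / period)   # whole periods elapsed
    index = min(LEVELS.index(level) + max(steps, 0), len(LEVELS) - 1)
    return LEVELS[index]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    print(decayed_level("A", t0, t0 + timedelta(days=7)))  # prints C
```

Computing the level from elapsed time on demand gives the same result as applying one downgrade per period, without needing a background timer.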

The dialogue control unit 63 has a dialogue control function: based on the character string data D1 from the speech recognition unit 60, a program, and so on, it successively supplies the necessary character string data D2 to the speech synthesis unit 64, thereby giving the user appropriate responses and asking appropriate questions.

Based on the user's name obtained through these responses and questions, and on the recognition results of the speaker recognition unit 61 and the face recognition unit 62 for that user at that time, the dialogue control unit 63 determines whether the user is a new user by referring to information such as each known user's name, SID, and FID stored in the associative storage unit 65.

When the dialogue control unit 63 determines that the user is a new user, it issues new-learning start and end commands to the speaker recognition unit 61 and the face recognition unit 62, causing them to collect and store (that is, learn) the new user's voice acoustic feature data and facial morphological feature data.

Thereafter, at predetermined timings, the dialogue control unit 63 asks the speaker recognition unit 61 and the face recognition unit 62 for their learning achievement levels for that user. If either replies “C”, the dialogue control unit 63 issues additional-learning start and end commands to that speaker recognition unit 61 and/or face recognition unit 62 so that it performs additional learning, and meanwhile performs control to prolong the dialogue with the user.

In contrast, when the dialogue control unit 63 determines that the user is a known user, it checks the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for that user stored in the associative storage unit 65. It notifies whichever of these units has level “B” or “C” of that level, and causes that speaker recognition unit 61 and/or face recognition unit 62 to perform additional learning by issuing additional-learning start and end commands.

After the additional learning by the speaker recognition unit 61 and/or the face recognition unit 62 is finished, the dialogue control unit 63 asks that unit for its current learning achievement level for the user and updates the corresponding level stored in the associative storage unit 65 based on the reply.

(2-2) Specific processing of dialogue control unit 63 related to the name learning function
Next, the specific processing performed by the dialogue control unit 63 for the name learning function is described.

Based on a control program stored in the internal memory 40A, the dialogue control unit 63 executes various processes for successively learning the names of new people according to the name learning procedure RT1 shown in FIG. 8.

Specifically, when the face recognition unit 62 recognizes the user's face from the image signal S1A of the CCD camera 50 and supplies an FID, the dialogue control unit 63 starts the name learning procedure RT1 at step SP0. In the following step SP1, it controls the speaker recognition unit 61 to start storing audio data based on the audio signal S1B from the microphone 51 (FIG. 5), and controls the face recognition unit 62 to start storing image data based on the image signal S1A from the CCD camera 50.

The dialogue control unit 63 then proceeds to step SP2 and determines the user's name. Specifically, it checks whether the user's name can be retrieved, using the FID obtained earlier, from the names, SIDs, and FIDs of the known users stored in association with one another in the associative storage unit 65. If it can, the dialogue control unit 63 sends the corresponding character string data D2 to the speech synthesis unit 64 to output speech such as “You're XX, aren't you?”, which checks whether the user's name matches the name retrieved from the FID (the name standing in for XX).

If the dialogue control unit 63 recognizes, from the character string data D1 of the speech recognition unit 60, an affirmative reply to this question such as “Yes, that's right.”, it fixes the user's name as “XX”.

If, on the other hand, it recognizes from the character string data D1 a negative reply such as “No, that's not right.”, it sends the corresponding character string data D2 to the speech synthesis unit 64 to output speech that asks for the name, such as “Please tell me your name.”, as shown for example in FIG. 9.

When the dialogue control unit 63 obtains the speech recognition result (that is, the name) of the user's reply to this question, such as “I'm XX.”, and then recognizes from the character string data D1 of the speech recognition unit 60 an affirmative reply to a confirmation such as “So your name is XX?”, it fixes the user's name as “XX”.
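The step SP2 dialogue can be sketched as a small function driven by `say` and `listen` callbacks. Everything here is hypothetical: the FID-to-name dictionary, the `is_affirmative` keyword check, treating the whole reply as the name, the retry limit, and asking for the name when the FID yields no stored name (a case this passage does not spell out).

```python
def is_affirmative(reply):
    """Very rough stand-in for interpreting the recognized reply."""
    return reply.strip().lower().startswith("yes")


def confirm_name(fid, names_by_fid, say, listen, max_attempts=3):
    """Sketch of step SP2: confirm the name retrieved via the FID, else ask."""
    name = names_by_fid.get(fid)  # an FID of -1 (unrecognized) finds nothing
    if name is not None:
        say(f"You're {name}, aren't you?")
        if is_affirmative(listen()):
            return name
    for _ in range(max_attempts):
        say("Please tell me your name.")
        name = listen().strip()           # assume the reply is just the name
        say(f"So your name is {name}?")
        if is_affirmative(listen()):
            return name
    return None                           # give up; not covered by the text
```

Passing the speech output and recognition as callbacks keeps the dialogue logic testable with scripted replies, independent of the speech synthesis unit 64 and speech recognition unit 60.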

Having fixed the user's name in this way, the dialogue control unit 63 proceeds to step SP3 and controls the speaker recognition unit 61 to stop storing audio data based on the audio signal S1B from the microphone 51, and controls the face recognition unit 62 to stop storing image data based on the image signal S1A from the CCD camera 50.

The dialogue control unit 63 then proceeds to step SP4 and, based on each known user's name and the corresponding SID and FID stored in the associative storage unit 65, determines whether an SID and FID associated with the user's name fixed in step SP2 exist.

A negative result in step SP4 means that the user is a new user for whom the speaker recognition unit 61 has collected no voice acoustic feature data at all and the face recognition unit 62 has collected no facial morphological feature data at all.

In this case, the dialogue control unit 63 proceeds to step SP8 and sends the speaker recognition unit 61 and the face recognition unit 62 a command to start new learning using the audio data and image data, respectively, stored between steps SP1 and SP3. As a result, the speaker recognition unit 61 and the face recognition unit 62 start new learning, using that data to newly collect and store the user's voice acoustic feature data and facial morphological feature data, respectively.

A positive result in step SP4, on the other hand, means that the user is a known user for whom the speaker recognition unit 61 and the face recognition unit 62 have already collected voice acoustic feature data and facial morphological feature data, respectively.

In this case, the dialogue control unit 63 proceeds to step SP5 and checks both the learning achievement level of the speaker recognition unit 61 for the user, stored in the associative storage unit 65 in association with the user's SID, and the learning achievement level of the face recognition unit 62 for the user, stored in association with the user's FID.

If this check shows that both levels are “A”, it can be concluded that the speaker recognition unit 61 and the face recognition unit 62 have already collected enough voice acoustic feature data and facial morphological feature data, respectively, to recognize the user.

In this case, the dialogue control unit 63 proceeds to step SP6 and commands the speaker recognition unit 61 and the face recognition unit 62 to discard the audio data and image data stored between steps SP1 and SP3.

The dialogue control unit 63 then proceeds to step SP14, updates the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for the user stored in the associative storage unit 65 to “A” again, and proceeds to step SP15 to end the name learning procedure RT1. The robot 1 then carries out various interactions with the user, such as conversation and dancing, independently of any learning about that user.

If instead the check in step SP5 shows that the learning achievement level of one or both of the speaker recognition unit 61 and the face recognition unit 62 for the user is “B” or “C”, it can be concluded that the speaker recognition unit 61 and/or the face recognition unit 62 has not yet collected enough voice acoustic feature data or facial morphological feature data to recognize the user.

In this case, the dialogue control unit 63 proceeds to step SP7 and notifies whichever of the speaker recognition unit 61 and the face recognition unit 62 has level “B” or “C” of that level. It then proceeds to step SP8 and sends the notified unit(s) (that is, those whose level for the user is still “B” or “C”) a command to start additional learning using the audio data or image data stored between steps SP1 and SP3.
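Steps SP4 to SP8 decide what happens to the audio and image data buffered between steps SP1 and SP3. The sketch below encodes that decision with hypothetical names; calling it with no levels stands for a new user (a negative result in SP4). The passage does not say what becomes of a level “A” unit's buffer when only the other unit needs learning, so the sketch simply marks that unit `"none"`.

```python
def plan_buffered_learning(voice_level=None, face_level=None):
    """Decide, per recognizer, what to do with the data buffered in SP1-SP3.

    Both levels None means a new user (negative result in SP4).
    """
    if voice_level is None and face_level is None:
        return {"speaker": "new", "face": "new"}            # SP8: new learning
    levels = {"speaker": voice_level, "face": face_level}
    if all(level == "A" for level in levels.values()):
        return {"speaker": "discard", "face": "discard"}    # SP6
    # SP7-SP8: additional learning only for units still at "B" or "C".
    return {unit: ("additional" if level in ("B", "C") else "none")
            for unit, level in levels.items()}
```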

As a result, the speaker recognition unit 61 and/or the face recognition unit 62 starts additional learning with that audio or image data, taking the learning achievement level notified in step SP7 as its starting point and continuing to collect the user's voice acoustic feature data or facial morphological feature data from its current, still insufficient, state.

When the speaker recognition unit 61 and/or the face recognition unit 62 subsequently reports that learning using the audio or image data stored between steps SP1 and SP3 has finished, the dialogue control unit 63 proceeds to step SP9, asks the unit(s) that performed the learning for their learning achievement level for the user, and determines whether every reply is either “A” or “B”.

A positive result in step SP9 means that every unit that performed learning has collected and stored enough of the user's voice acoustic feature data or facial morphological feature data to recognize the user (that is, has learned sufficiently).

In this case, the dialogue control unit 63 proceeds to step SP14 and updates the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for the user stored in the associative storage unit 65: a unit that did not perform learning receives the level confirmed in step SP5, and a unit that did perform learning receives the level obtained in step SP9. It then proceeds to step SP15 and ends the name learning procedure RT1. The robot 1 then carries out various interactions with the user, such as conversation and dancing, independently of any learning about that user.
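The check made in steps SP9 and SP12 and the level update made in step SP14 reduce to two small rules, sketched here with hypothetical dictionaries keyed by recognizer (`"speaker"`, `"face"`).

```python
def sufficiently_learned(replies):
    """SP9/SP12 check: positive only if every queried unit answers "A" or "B"."""
    return all(level in ("A", "B") for level in replies.values())


def levels_after_sp14(sp5_levels, sp9_levels):
    """SP14: units that learned take their SP9 reply; others keep the SP5 level."""
    updated = dict(sp5_levels)
    updated.update(sp9_levels)
    return updated
```

For a new user, both units learn, so the SP9 replies supply both levels and `sp5_levels` can simply be empty.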

A negative result in step SP9, in contrast, means that at least one of the units that performed learning has not yet collected and stored enough of the user's voice acoustic feature data or facial morphological feature data to recognize the user (that is, has not yet learned sufficiently).

In this case, the dialogue control unit 63 proceeds to step SP10 and sends that speaker recognition unit 61 and/or face recognition unit 62 a command to start additional learning using audio data obtained from the audio signal S1B supplied in real time by the microphone 51, or image data obtained from the image signal S1A supplied in real time by the CCD camera 50.

As a result, the speaker recognition unit 61 and/or the face recognition unit 62 starts additional learning, taking the learning achievement level reported to the dialogue control unit 63 in step SP9 as its starting point and using that audio or image data to collect and store further voice acoustic feature data or facial morphological feature data for the user.

The dialogue control unit 63 then proceeds to step SP11 and executes processing to prolong the dialogue with the user. Specifically, so that the user does not realize that the robot 1 is learning about him or her, the dialogue control unit 63 sends the speech synthesis unit 64 character string data D2 with which the robot 1 actively addresses the user or offers topics, such as "Will you be my friend?", "Thank you! Then can I ask you about yourself, [name]?" and "What is your favorite food, [name]?", as shown for example in FIG. 10. Each string is selected according to the speech recognition unit 60's recognition result for what the user has said.

Then, at a predetermined timing, the dialogue control unit 63 proceeds to step SP12. It asks the speaker recognition unit 61 and/or face recognition unit 62 to which it sent the additional-learning start command for its learning achievement level for that user, and judges whether every reply is either "A" or "B".

If the dialogue control unit 63 obtains a negative result in step SP12, it returns to step SP11 and repeats the loop SP11-SP12-SP11 until it obtains a positive result in step SP12.

Eventually, once both the speaker recognition unit 61 and the face recognition unit 62 have collected and stored enough acoustic feature data of the user's voice or morphological feature data of the user's face to recognize the user thereafter, the dialogue control unit 63 obtains a positive result in step SP12. It then proceeds to step SP13 and sends the speaker recognition unit 61 and/or face recognition unit 62 performing additional learning a command to end that additional learning.
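A minimal sketch of the SP11-SP12 loop follows. It is an illustration under stated assumptions, not the embodiment's code: the callables `query_levels` and `say_next_line` are hypothetical stand-ins for querying the recognition units and sending string data D2 to the speech synthesis unit 64, and the `max_turns` cap is an addition of this sketch.

```python
from typing import Callable, Dict

# Per the text, a level of "A" or "B" means the unit can recognize the user.
SUFFICIENT = {"A", "B"}


def prolong_until_learned(
    query_levels: Callable[[], Dict[str, str]],
    say_next_line: Callable[[], None],
    max_turns: int = 50,
) -> Dict[str, str]:
    """Alternate SP11 (keep talking) and SP12 (poll levels) until every
    unit doing additional learning reports "A" or "B".

    max_turns is a hypothetical safeguard; in the text the loop ends only on
    a positive SP12 result, or when error handling (RT2) takes over.
    """
    for _ in range(max_turns):
        say_next_line()                # SP11: stretch the conversation
        levels = query_levels()        # SP12: ask each learning unit
        if all(level in SUFFICIENT for level in levels.values()):
            return levels              # positive result: go on to SP13
    raise TimeoutError("learning did not finish within max_turns")
```

Polling after each utterance mirrors the text, where SP12 is reached only after SP11 has produced another line of dialogue.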

The dialogue control unit 63 then proceeds to step SP14 and updates the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for that user, as stored in the associative storage unit 65. A unit that did not perform additional learning in steps SP10 to SP13 is updated to the level confirmed in step SP5 or step SP9, and a unit that did is updated to the level obtained in step SP12. The dialogue control unit 63 then proceeds to step SP15 and ends the name learning procedure RT1. Thereafter, the robot 1 engages in various interactions with the user, such as conversation and dancing, independently of the learning for that user.
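The SP14 update rule amounts to a small merge. The sketch below uses hypothetical names: `confirmed` holds the levels checked at SP5 or SP9, `after_additional` holds the SP12 replies, and `learned_units` names the units that ran SP10 to SP13.

```python
from typing import Dict, Set


def update_stored_levels(
    confirmed: Dict[str, str],
    after_additional: Dict[str, str],
    learned_units: Set[str],
) -> Dict[str, str]:
    """Return the levels to write back to the associative storage at SP14.

    Units that ran additional learning take the level reported at SP12;
    the others keep the level confirmed at SP5 or SP9.
    """
    return {
        unit: after_additional[unit] if unit in learned_units else level
        for unit, level in confirmed.items()
    }
```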

In this way, under the control of the dialogue control unit 63, the robot 1 can perform new learning for a new user and additional learning for a known user, and can thereby successively learn each new user's name in association with the acoustic feature data of the user's voice and the morphological feature data of the user's face.

(2-3) Error handling during name learning
Next, the processing performed when learning for a user must be terminated partway during name learning according to the name learning procedure RT1, for example because the user being learned has walked away, is described.

Learning for a user may have to be terminated partway in the following five situations:
(1) In step SP2 of the name learning procedure RT1, learning must be terminated before the user's name has been determined.
(2) Between steps SP1 and SP3 of the name learning procedure RT1, learning must be terminated after the speaker recognition unit 61 or the face recognition unit 62 has started storing voice data or image data but before it has finished.
(3) Between steps SP4 and SP7 of the name learning procedure RT1, learning must be terminated before the speaker recognition unit 61 or the face recognition unit 62 has started learning with the stored voice data or image data.
(4) Between steps SP8 and SP9 of the name learning procedure RT1, learning must be terminated while the speaker recognition unit 61 or the face recognition unit 62 is performing new or additional learning with the stored voice data or image data.
(5) Between steps SP10 and SP13 of the name learning procedure RT1, learning must be terminated while the speaker recognition unit 61 or the face recognition unit 62 is performing additional learning with the audio signal S1B or image signal S1A obtained in real time.
Hereinafter, these are referred to as unlearnable patterns (1) to (5).
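The five patterns depend only on whether the name was fixed and on what the recognition units were doing when learning stopped. A hypothetical classifier might look like this; the `UnitState` names are assumptions of this sketch, not terms from the embodiment.

```python
from enum import Enum


class UnitState(Enum):
    """Hypothetical labels for what the recognition units are doing."""
    STORING = "storing"             # storing learning data (SP1-SP3)
    STORED = "stored"               # data stored, learning not begun (SP4-SP7)
    LEARNING_STORED = "learning"    # learning from stored data (SP8-SP9)
    LEARNING_REALTIME = "realtime"  # additional learning on live input (SP10-SP13)


def unlearnable_pattern(name_confirmed: bool, state: UnitState) -> int:
    """Return which of the unlearnable patterns (1)-(5) applies."""
    if not name_confirmed:
        return 1  # interrupted before the name was fixed in SP2
    return {
        UnitState.STORING: 2,
        UnitState.STORED: 3,
        UnitState.LEARNING_STORED: 4,
        UnitState.LEARNING_REALTIME: 5,
    }[state]
```

Checking the name first follows the order used by the error handling procedure, which asks whether the name is determined before it asks the units for their state.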

In these cases, one could treat the learning for that user as having failed and have the speaker recognition unit 61 and the face recognition unit 62 discard the learning voice data or image data stored so far, or the acoustic feature data of the user's voice and morphological feature data of the user's face collected so far. However, that would waste the learning data and feature data already collected.

Therefore, in unlearnable pattern (1), the robot 1 discards the learning voice data and image data stored up to that point. In unlearnable patterns (2) and (3), by contrast, the robot 1 has the required speaker recognition unit 61 and/or face recognition unit 62 learn from the voice data and image data stored up to that point. If this learning raises the learning achievement level to "A" or "B", the learning is treated as valid; if the level is still "C", it is treated as invalid.

In unlearnable patterns (4) and (5), the robot 1 judges by the learning achievement level of the speaker recognition unit 61 or face recognition unit 62 at the moment learning had to be terminated: if the level is "A" or "B", the learning is treated as valid, and if it is "C", the learning is treated as invalid.
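The validity rule for interrupted sessions reduces to a few lines. In this sketch (function name and signature hypothetical), `level` is the level after learning from the buffered data for patterns (2) and (3), and the level at the moment of interruption for patterns (4) and (5).

```python
from typing import Optional


def interrupted_learning_is_valid(pattern: int, level: Optional[str]) -> bool:
    """Return True if an interrupted session's learning should be kept."""
    if pattern == 1:
        return False  # buffered data is discarded outright
    if pattern not in (2, 3, 4, 5):
        raise ValueError(f"unknown unlearnable pattern: {pattern}")
    return level in ("A", "B")  # "C" means the learning is treated as invalid
```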

This error handling is performed under the control of the dialogue control unit 63 according to the error handling procedure RT2 shown in FIG. 11.

In practice, when a predetermined condition arises during execution of the name learning procedure RT1 under which learning for the user can no longer continue, such as the face recognition unit 62 reporting an error that it can no longer detect the user to be recognized, the dialogue control unit 63 terminates the name learning procedure RT1 and starts the error handling procedure RT2 at step SP20. In the following step SP21, it judges whether the name of the user targeted so far has been determined.

A negative result in step SP21 means that the error occurred before the user's name was determined in step SP2 of the name learning procedure RT1, so learning had to be terminated at that point (unlearnable pattern (1)). In this case, the dialogue control unit 63 proceeds to step SP22, commands the speaker recognition unit 61 and the face recognition unit 62 to discard the learning voice data or image data stored since the name learning procedure RT1 began, and then proceeds to step SP34 to end the error handling procedure RT2.

If, on the other hand, the dialogue control unit 63 obtains a positive result in step SP21, it proceeds to step SP23, commands the speaker recognition unit 61 and the face recognition unit 62 to report their states, and confirms their current states from the responses.

If in step SP23 the dialogue control unit 63 confirms that the speaker recognition unit 61 and the face recognition unit 62 have started storing voice data or image data but have not yet finished (unlearnable pattern (2)), it proceeds to step SP24 and commands them to finish storing the learning voice data or image data. It then processes steps SP25 to SP29 in the same way as steps SP4 to SP8 of the name learning procedure RT1 described above.

After processing steps SP25 to SP29, the dialogue control unit 63 proceeds to step SP30 and obtains the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for that user by querying those units. It then proceeds to step SP31 and judges whether either of the obtained levels is "C".

A negative result in step SP31 means that both the speaker recognition unit 61 and the face recognition unit 62 have been able to collect enough acoustic feature data of the user's voice or morphological feature data of the user's face to recognize the user.

In this case, the dialogue control unit 63 proceeds to step SP33. If the user is a new user, it has the associative storage unit 65 store, in association with one another as described above, the SID and FID newly issued by the speaker recognition unit 61 and the face recognition unit 62 respectively, the user's name determined in step SP2 of the name learning procedure RT1, and the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for the user obtained in step SP30 of the error handling procedure RT2.

If the user is a known user, the dialogue control unit 63 instead updates the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for that user stored in the associative storage unit 65 to the current levels obtained in step SP30. The dialogue control unit 63 then proceeds to step SP34 and ends the error handling procedure RT2.

A positive result in step SP31, by contrast, means that one or both of the speaker recognition unit 61 and the face recognition unit 62 have not yet finished collecting enough acoustic feature data of the user's voice or morphological feature data of the user's face to recognize the user.

In this case, the dialogue control unit 63 proceeds to step SP32 and instructs the speaker recognition unit 61 and the face recognition unit 62 not to use the voice acoustic feature data or facial morphological feature data associated with that SID or FID in subsequent speaker recognition or face recognition processing. As a result, the speaker recognition unit 61 and the face recognition unit 62 do not use this feature data for speaker recognition or face recognition until later additional learning raises the learning achievement level for that user to "B" or "A".

The dialogue control unit 63 then proceeds to step SP33. If the user is a new user, it has the associative storage unit 65 store, in association with one another as described above, the SID and FID newly issued by the speaker recognition unit 61 and the face recognition unit 62 respectively, the user's name determined in step SP2 of the name learning procedure RT1, and the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for the user obtained in step SP30 of the error handling procedure RT2.
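Steps SP31 to SP33 can be sketched as follows. The dictionary `memory`, keyed by user name, is a hypothetical stand-in for the associative storage unit 65, and the function only reports which units SP32 would notify rather than notifying them.

```python
from typing import Dict, List


def finish_error_handling(
    memory: Dict[str, dict],
    name: str,
    sid: int,
    fid: int,
    levels: Dict[str, str],
) -> List[str]:
    """Sketch of SP31-SP33 of RT2; returns the units told not to use their data."""
    # SP31: the result is positive if any unit is still at "C".
    suppressed = sorted(unit for unit, level in levels.items() if level == "C")
    # SP32 would notify these units; this sketch only reports them.
    # SP33: register a new user, or refresh the stored levels of a known one.
    entry = memory.get(name)
    if entry is None:
        memory[name] = {"sid": sid, "fid": fid, "levels": dict(levels)}
    else:
        entry["levels"] = dict(levels)
    return suppressed
```

The name, IDs and levels are stored even when a level is "C", which is what later lets the robot recognize that it has met the user before.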

Consequently, in the robot 1, even if the learning achievement level of the speaker recognition unit 61 and/or the face recognition unit 62 for a user is "C", the user's name obtained at that time, the corresponding SID and FID, and the learning achievement levels of the speaker recognition unit 61 and/or the face recognition unit 62 for that user are stored in association with one another in the associative storage unit 65 (steps SP31 to SP33). The next time the robot 1 recognizes a user with that name, it can therefore say that it has met the user before, as shown for example in FIG. 12.

Moreover, even when the dialogue control unit 63 instructs the speaker recognition unit 61 and/or the face recognition unit 62 in step SP32 not to use the voice acoustic feature data or facial morphological feature data collected so far, the next additional learning for that user starts partway through, on the premise that this collected data exists, as described above for steps SP7 and SP8 of the name learning procedure RT1. The speaker recognition unit 61 and the face recognition unit 62 can therefore learn efficiently.
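The effect of SP32 and the later resumption can be illustrated with a hypothetical per-ID feature profile. How a unit actually grades its own learning is not specified in the text; the sample-count thresholds below are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical grading rule: minimum sample counts for "A" and "B".
LEVEL_THRESHOLDS = (("A", 30), ("B", 15))


@dataclass
class FeatureProfile:
    """Features collected for one SID or FID (hypothetical structure)."""
    samples: List[list] = field(default_factory=list)
    suppressed: bool = False  # set by SP32 while the level is still "C"

    @property
    def level(self) -> str:
        count = len(self.samples)
        for grade, minimum in LEVEL_THRESHOLDS:
            if count >= minimum:
                return grade
        return "C"

    @property
    def usable_for_recognition(self) -> bool:
        return not self.suppressed

    def add_samples(self, new_samples: List[list]) -> None:
        """Additional learning resumes from the samples already kept."""
        self.samples.extend(new_samples)
        if self.level in ("A", "B"):
            self.suppressed = False  # usable again once "B" or "A" is reached
```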

If, on the other hand, the dialogue control unit 63 confirms in step SP23 that the speaker recognition unit 61 and the face recognition unit 62 have finished storing the voice data or image data but have not yet started learning with it (unlearnable pattern (3)), it proceeds to step SP25 and judges, based on the user's name determined in step SP2 of the name learning procedure RT1, whether the associative storage unit 65 holds an SID and FID associated with that name. It then processes steps SP26 to SP34 as described above.

If the dialogue control unit 63 confirms in step SP23 that the speaker recognition unit 61 and the face recognition unit 62 are currently learning with the stored voice data or image data (unlearnable pattern (4)), it proceeds to step SP30, obtains their learning achievement levels for that user by querying them, and then processes steps SP31 to SP34 as described above.

Further, if the dialogue control unit 63 confirms in step SP23 that the speaker recognition unit 61 and the face recognition unit 62 are performing additional learning with voice data based on the audio signal S1B supplied in real time by the microphone 51 or image data based on the image signal S1A supplied in real time by the CCD camera 50 (unlearnable pattern (5)), it proceeds to step SP35 and sends the speaker recognition unit 61 and/or the face recognition unit 62 a command to end the additional learning.

The dialogue control unit 63 then proceeds to step SP30, obtains the learning achievement levels of the speaker recognition unit 61 and the face recognition unit 62 for that user by querying them, and then processes steps SP31 to SP34 as described above.
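The branching of RT2 can be traced step by step with the sketch below. The state names passed in are hypothetical labels for the states reported at SP23, and SP25 to SP29 are kept as one block because they mirror SP4 to SP8 of RT1, whose internal branching this sketch does not model.

```python
from typing import List

# Steps run between SP23 and SP30 for each (hypothetically named) unit state.
STEPS_BEFORE_SP30 = {
    "storing": ["SP24", "SP25-SP29"],  # unlearnable pattern (2)
    "stored": ["SP25-SP29"],           # unlearnable pattern (3)
    "learning": [],                    # unlearnable pattern (4)
    "realtime": ["SP35"],              # unlearnable pattern (5)
}


def rt2_path(name_confirmed: bool, state: str, any_level_c: bool) -> List[str]:
    """Return the sequence of RT2 steps taken for a given interruption."""
    steps = ["SP20", "SP21"]
    if not name_confirmed:             # unlearnable pattern (1)
        return steps + ["SP22", "SP34"]
    steps += ["SP23", *STEPS_BEFORE_SP30[state], "SP30", "SP31"]
    if any_level_c:                    # positive SP31 result
        steps.append("SP32")
    return steps + ["SP33", "SP34"]
```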

In this way, even when learning for a user must be terminated partway, the robot 1 can still learn that user under the control of the dialogue control unit 63, by using the learning voice data or image data, or the acoustic feature data of the user's voice or morphological feature data of the user's face, collected up to that point.

(3) Operation and effects of the present embodiment
With the above configuration, the robot 1 acquires a new user's name through dialogue with that user and stores the name in association with the acoustic feature data of the user's voice and the morphological feature data of the user's face, detected from the outputs of the microphone 51 (FIG. 5) and the CCD camera 50 (FIG. 5). Based on this stored data, it recognizes the appearance of a further new user whose name it has not yet acquired, and acquires and stores that user's name, voice acoustic features and facial morphological features in the same way, thereby learning users' names.

The robot 1 can therefore learn a new user's name, voice acoustic features and facial morphological features naturally through dialogue, as people ordinarily do, without making the user aware that it is learning.

Further, when performing such learning, the speaker recognition unit 61 and the face recognition unit 62 of the robot 1 store in advance, before the user's name has been determined, learning voice data for learning the acoustic features of the target user's voice and/or learning image data for learning the morphological features of the user's face, and then learn using this voice and image data. Even if learning for the user must be terminated partway, the robot 1 may therefore still be able to learn the user, which makes learning correspondingly more efficient.

Furthermore, even when learning for a user must be terminated partway, the robot 1 retains the results of learning so far, namely the acoustic feature data of the user's voice and the morphological feature data of the user's face, and starts the next learning for that user from that intermediate state, so that it can learn efficiently.

Furthermore, if learning with the pre-stored voice data and face image data of the target user is not sufficient, the robot 1 prolongs the dialogue with the user and continues learning. It therefore often completes learning for the user within a single dialogue, which effectively prevents interactions that annoy the user, such as asking the same user for his or her name again and again.

According to the above configuration, a new user's name is acquired through dialogue with that user and stored in association with the acoustic features of the user's voice and the morphological features of the user's face, detected from the outputs of the microphone 51 and the CCD camera 50. Based on this stored data, the appearance of a further new user whose name has not been acquired is recognized, and that user's name, voice acoustic features and facial morphological features are acquired and stored in the same way. A robot can thus be realized that learns a new user's name, voice acoustic features and facial morphological features naturally through dialogue, as people ordinarily do, without making the user aware that it is learning, and whose entertainment value is thereby greatly improved.

(4) Other embodiments
In the embodiment described above, the present invention is applied to the bipedal robot 1 configured as shown in FIG. 1. However, the present invention is not limited to this and can be widely applied to robot apparatuses of various other forms and to various apparatuses other than robot apparatuses.

In the embodiment described above, the learning target is a person (user). However, the present invention is not limited to this and can also be applied when an object other than a person is the target of name learning.

In this connection, in the embodiment described above, a person is recognized separately from the acoustic features of the person's voice and the morphological features of the person's face, and whether the person is new is judged from these recognition results. However, the present invention is not limited to this. Instead of or in addition to these, the person may be recognized separately from several other kinds of features that can biologically identify an individual, such as body shape or smell, and whether the person is new may be judged from these recognition results. When the name learning target is an object other than a person, the object may be recognized separately from several kinds of features that can identify it, such as color, shape, pattern and size, and whether it is a new object may be judged from these recognition results. In these cases, it suffices to provide a plurality of recognition means, each of which detects a different predetermined feature of the object and recognizes the target object based on the detection result and pre-stored data on the corresponding feature of known objects.

Further, in the embodiment described above, the learning achievement level has three grades, "A" to "C". However, the present invention is not limited to this, and two grades or four or more grades may be used.
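A scale with any number of grades can be parameterized as below. Both the numeric score being graded and the cutoff values are hypothetical; the embodiment does not say how a level is computed.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_grader(labels: Sequence[str], cutoffs: Sequence[float]) -> Callable[[float], str]:
    """Build a grader for a scale of len(labels) grades.

    labels run from worst to best; cutoffs are the ascending minimum scores
    for every grade above the worst one.
    """
    if len(labels) < 2 or len(cutoffs) != len(labels) - 1:
        raise ValueError("need at least two labels and exactly one fewer cutoff")
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be ascending")

    def grade(score: float) -> str:
        return labels[bisect_right(cutoffs, score)]

    return grade
```

For example, `make_grader(["C", "B", "A"], [15, 30])` gives a three-grade scale like the embodiment's, and `make_grader(["C", "A"], [20])` gives two grades.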

Further, in the embodiment described above, the dialogue control unit 63 simply prolongs the dialogue during the additional learning in steps SP10 to SP13 of the name learning procedure RT1. However, the present invention is not limited to this. When prolonging the dialogue with the user, the dialogue control unit 63 may generate a dialogue that makes it easy for the recognition means whose learning for that user is insufficient (the speaker recognition unit 61 or the face recognition unit 62) to learn, which allows additional learning to be performed more efficiently.

In the case of the embodiment, for example, when the speaker recognition unit 61 is performing additional learning, the dialogue may be prolonged with exchanges that make the user speak as much as possible. When the face recognition unit 62 is performing additional learning, the dialogue may be prolonged with exchanges that make the user move his or her face, such as "Would you turn to the right?", so that face images of the user can be obtained from as many directions as possible.
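A hypothetical prompt selector for this variation prefers prompts that help whichever unit is still at "C". Only "Would you turn to the right?" and the favorite-food question come from the text; the other prompts and the fallback rule are assumptions of this sketch.

```python
from typing import Dict

# Hypothetical prompt lists, grouped by the recognizer they help.
PROMPTS = {
    "speaker": ["What is your favorite food?", "What did you do today?"],
    "face": ["Would you turn to the right?", "Could you turn to the left?"],
}


def next_prompt(levels: Dict[str, str], turn: int) -> str:
    """Pick the line for this turn of the prolonged dialogue.

    Units still at "C" get prompts that produce the data they need: speech
    for the speaker recognizer, varied head poses for the face recognizer.
    With no unit at "C", the sketch falls back to ordinary conversation.
    """
    needy = [unit for unit in ("speaker", "face") if levels.get(unit) == "C"]
    pool = [p for unit in needy for p in PROMPTS[unit]] or PROMPTS["speaker"]
    return pool[turn % len(pool)]
```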

Further, in the embodiment described above, the associative storage unit 65 attenuates each stored learning achievement level over time, for example every few days. However, the present invention is not limited to this; the attenuation interval may be other than a few days, and the dialogue control unit 63 may manage and perform this attenuation.
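One possible reading of time attenuation is sketched below, under the assumption (not stated in the text) that a level drops by one grade per elapsed interval and never falls below "C". The three-day default stands in for "every few days".

```python
from datetime import datetime, timedelta

GRADES = ["A", "B", "C"]  # best to worst


def decayed_level(
    level: str,
    learned_at: datetime,
    now: datetime,
    interval: timedelta = timedelta(days=3),
) -> str:
    """Lower a stored learning achievement level by one grade per elapsed interval."""
    elapsed_steps = max(0, (now - learned_at) // interval)  # timedelta // timedelta -> int
    index = min(GRADES.index(level) + elapsed_steps, len(GRADES) - 1)
    return GRADES[index]
```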

Furthermore, the above embodiment describes the case where two means are implemented by the same single functional module, the dialogue control unit 63: the dialogue means, which acquires the name of a target object from the user through dialogue with the user, and the control means, which, upon determining that the target object is new based on the name acquired by the dialogue means, the recognition results of each recognition means for that object, and the association information stored in the storage means, has the necessary recognition means learn the corresponding features of the object and newly stores association information for the object in the storage means. However, the present invention is not limited to this, and these means may be implemented as separate modules.
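To illustrate splitting the dialogue means and the control means into separate modules, here is a hypothetical sketch. It assumes that a recognition result is reduced to the name of a matched stored user (or `None` when nothing matches) and that the association information is a plain dictionary from names to feature vectors; in the embodiment these roles belong to the recognition units and the associative memory unit 65, whose actual interfaces are not given. All class and method names are illustrative.

```python
from typing import Callable, Dict, List, Optional


class DialogueModule:
    """Hypothetical dialogue means: obtains the target's name by asking the user."""

    def __init__(self, ask: Callable[[str], str]):
        # In a robot, `ask` would wrap speech synthesis and speech recognition.
        self.ask = ask

    def acquire_name(self) -> str:
        return self.ask("What is your name?").strip()


class LearningController:
    """Hypothetical control means, kept in a module separate from the dialogue means."""

    def __init__(self, store: Dict[str, List[float]]):
        self.store = store  # association information: name -> learned features

    def is_new(self, name: str, recognized_name: Optional[str]) -> bool:
        # New when the name is not stored and the recognizers matched no stored entry.
        return name not in self.store and recognized_name is None

    def handle(self, name: str, recognized_name: Optional[str],
               features: List[float]) -> bool:
        """Store association info for a new target; return True if it was new."""
        if self.is_new(name, recognized_name):
            self.store[name] = features  # stand-in for having the recognizers learn
            return True
        return False
```

Keeping the two modules apart means the dialogue side can be replaced (for example, by a text interface for testing) without touching the learning logic.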

Furthermore, the above embodiment describes the case where the storage means, which stores association information associating the names of known users with the recognition results of each recognition unit (the speech recognition unit 60, the speaker recognition unit 61 and the face recognition unit 62) for those users, is composed of the internal memory 40A and software. However, the present invention is not limited to this; for example, the functional part of the storage means that stores the association information may be replaced with some other rewritable storage means besides the internal memory 40A, such as a compact disc.
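A hypothetical sketch of a swappable storage backend: the association information is accessed only through a small `load`/`save` interface, so an in-memory store standing in for the internal memory 40A can be replaced by a file-backed store on some other rewritable medium. The JSON format and all class and function names are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List, Protocol


class AssociationStore(Protocol):
    """Minimal interface the rest of the system relies on."""

    def load(self) -> Dict[str, List[float]]: ...
    def save(self, data: Dict[str, List[float]]) -> None: ...


class MemoryStore:
    """Stand-in for the internal memory 40A."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def load(self) -> Dict[str, List[float]]:
        return dict(self._data)

    def save(self, data: Dict[str, List[float]]) -> None:
        self._data = dict(data)


class JsonFileStore:
    """Hypothetical replacement: any rewritable medium reachable as a file path."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def load(self) -> Dict[str, List[float]]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, data: Dict[str, List[float]]) -> None:
        # ensure_ascii=False keeps Japanese names readable in the file.
        self.path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")


def remember(store: AssociationStore, name: str, features: List[float]) -> None:
    """Add or update one user's association info, whatever the backend is."""
    data = store.load()
    data[name] = features
    store.save(data)
```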

The present invention can be applied to various apparatuses having a learning function, such as entertainment robots.

Perspective view illustrating the external configuration of the robot according to the present embodiment.
Perspective view illustrating the external configuration of the robot according to the present embodiment.
Conceptual diagram illustrating the external configuration of the robot according to the present embodiment.
Block diagram illustrating the internal configuration of the robot according to the present embodiment.
Block diagram illustrating the internal configuration of the robot according to the present embodiment.
Block diagram illustrating the functions of the main control unit related to the name learning function.
Conceptual diagram illustrating how various kinds of information are associated in the associative memory unit.
Flowchart showing the name learning processing procedure.
Schematic diagram showing an example of dialogue between the robot and a user.
Schematic diagram showing an example of dialogue between the robot and a user.
Flowchart showing the error processing procedure.
Schematic diagram showing an example of dialogue between the robot and a user.

Explanation of symbols

1: robot; 40: main control unit; 50: CCD camera; 51: microphone; 52: speaker; 60: speech recognition unit; 61: speaker recognition unit; 62: face recognition unit; 63: dialogue control unit; 64: speech synthesis unit; 65: associative memory unit; S1A: image signal; S1B, S3: audio signals; D1, D2: character string data; RT1: name learning processing procedure; RT2: error processing procedure.

Claims

A learning apparatus comprising:
a storage unit that stores the name and features of a user in association with each other;
a user detection unit that detects a user based on an image obtained from a camera;
a recognition unit that, when the user is detected by the user detection unit and before the user is asked for a name, detects the features of the user using at least one of the image obtained from the camera and sound obtained from a microphone, and recognizes the name of the user by comparing the detected features of the user with the features already stored in the storage unit;
a dialogue unit that acquires the name of the user detected by the user detection unit by asking the user for the name through dialogue with the user; and
a control unit that, upon determining that the user is a new user because the name of the user acquired by the dialogue unit and the features of the user detected by the recognition unit are not stored in the storage unit, causes the recognition unit to learn the features of the user using at least one of the image and the sound, and stores the features of the user obtained as a result of the learning in the storage unit in association with the name of the user acquired by the dialogue unit;
wherein the recognition unit determines a learning achievement level of the features based on at least one of the number of images and the duration of the sound that were obtained when the features were detected and used to detect them, and
the dialogue unit uses the learning achievement level determined by the recognition unit to judge whether the recognition unit's learning of the features is insufficient and, upon judging that learning is insufficient, executes processing for prolonging the dialogue with the user.

The learning apparatus according to claim 1, wherein
the recognition unit learns morphological features of the user using the image and learns acoustic features of the user using the sound, and
the dialogue unit, when prolonging the dialogue with the user, if the recognition unit has insufficiently learned one of the morphological features and the acoustic features, executes processing for generating a dialogue that facilitates learning of the insufficiently learned one.

A learning method comprising:
a first step of, when a user is detected based on an image obtained from a camera and before the user is asked for a name, detecting the features of the user using at least one of the image obtained from the camera and sound obtained from a microphone, and recognizing the name of the user by comparing the detected features of the user with the features already stored in a storage unit that stores the names and features of users in association with each other;
a second step of acquiring the name of the user by asking the detected user for the name through dialogue with the user; and
a third step of, upon determining that the user is a new user because the name of the user acquired in the second step and the features of the user detected in the first step are not stored in the storage unit, learning the features of the user using at least one of the image and the sound, and storing the features of the user obtained as a result of the learning in the storage unit in association with the name of the user acquired in the second step;
wherein, in the third step, a learning achievement level of the features is determined based on at least one of the number of images and the duration of the sound that were obtained when the features were detected and used to detect them, the determined learning achievement level is used to judge whether learning of the features is insufficient, and, upon judging that learning is insufficient, processing for prolonging the dialogue with the user is executed.

A robot apparatus comprising:
a storage unit that stores the name and features of a user in association with each other;
a user detection unit that detects a user based on an image obtained from a camera;
a recognition unit that, when the user is detected by the user detection unit and before the user is asked for a name, detects the features of the user using at least one of the image obtained from the camera and sound obtained from a microphone, and recognizes the name of the user by comparing the detected features of the user with the features already stored in the storage unit;
a dialogue unit that acquires the name of the user detected by the user detection unit by asking the user for the name through dialogue with the user; and
a control unit that, upon determining that the user is a new user because the name of the user acquired by the dialogue unit and the features of the user detected by the recognition unit are not stored in the storage unit, causes the recognition unit to learn the features of the user using at least one of the image and the sound, and stores the features of the user obtained as a result of the learning in the storage unit in association with the name of the user acquired by the dialogue unit;
wherein the recognition unit determines a learning achievement level of the features based on at least one of the number of images and the duration of the sound that were obtained when the features were detected and used to detect them, and
the dialogue unit uses the learning achievement level determined by the recognition unit to judge whether the recognition unit's learning of the features is insufficient and, upon judging that learning is insufficient, executes processing for prolonging the dialogue with the user.
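The learning-achievement logic in the claims above can be sketched as follows. The claims only state that the level is based on the number of images and/or the duration of the sound; this sketch assumes hypothetical targets (`TARGET_IMAGES`, `TARGET_SECONDS`), a linear mapping capped at 1.0, and uses both modalities. The loop keeps the dialogue going one turn at a time while either level is below the threshold, up to a hypothetical turn limit.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical targets; the claims specify no numbers.
TARGET_IMAGES = 20      # face images wanted for a well-learned face
TARGET_SECONDS = 10.0   # seconds of speech wanted for a well-learned voice
THRESHOLD = 1.0         # achievement level regarded as sufficient


@dataclass
class Evidence:
    num_images: int = 0          # images used to detect the features
    speech_seconds: float = 0.0  # duration of sound used to detect the features


def face_achievement(ev: Evidence) -> float:
    """Achievement level from the number of images, capped at 1.0."""
    return min(1.0, ev.num_images / TARGET_IMAGES)


def voice_achievement(ev: Evidence) -> float:
    """Achievement level from the speech duration, capped at 1.0."""
    return min(1.0, ev.speech_seconds / TARGET_SECONDS)


def should_prolong(ev: Evidence) -> bool:
    # Learning is judged insufficient if either modality is below the threshold.
    return min(face_achievement(ev), voice_achievement(ev)) < THRESHOLD


def learn_with_dialogue(turns: Iterable[Tuple[int, float]],
                        max_turns: int = 10) -> Tuple[Evidence, int]:
    """Consume (images, seconds) gathered per dialogue turn until learning suffices."""
    ev = Evidence()
    used = 0
    for images, seconds in turns:
        ev.num_images += images
        ev.speech_seconds += seconds
        used += 1
        if not should_prolong(ev) or used >= max_turns:
            break  # enough evidence, or give up to avoid an endless dialogue
    return ev, used
```

For example, if each turn yields 8 images and 4 seconds of speech, three turns are needed before both levels reach 1.0.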