JP3919726B2

JP3919726B2 - Learning apparatus and method

Info

Publication number: JP3919726B2
Application number: JP2003345071A
Authority: JP
Inventors: Kenichi Maeda
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-10-02
Filing date: 2003-10-02
Publication date: 2007-05-30
Anticipated expiration: 2023-10-02
Also published as: JP2005110726A

Description

The present invention relates to a learning device that can be built into toys such as robots and dolls.

Conventional toy robots are designed mainly to realize mechanical movement. As a result, they merely repeat simple movement patterns and are used only as toys for small children.

Because adults and the elderly will come to outnumber children, there is a need for robots that adults and the elderly can also enjoy as toys. However, making a robot behave intelligently or perform complex actions artificially still awaits future technological development.

Robots that identify the user's face have also been proposed (see, for example, Patent Document 1).
Patent Document 1: JP 2002-157596 A

However, even when a face identification device is built into a robot as described above, the prior art does not disclose how to make effective use of the identified information.

Accordingly, an object of the present invention is to provide a learning device, built into a robot, doll, toy, or the like, that adults and the elderly can also enjoy, by using face identification information to realize low-level intelligent behavior modeled on a baby or an anime protagonist.

The present invention is a learning device built into a robot, a doll, or a toy, comprising: object recognition means for recognizing an object from an image input from image input means; speech recognition means for recognizing, in a registration mode for registering the object, a noun from speech input from a speech input device; registration storage means for storing, in the registration mode, speech data including a symbol string for the noun recognized by the speech recognition means in association with image data for the object recognized by the object recognition means; and speech synthesis means for synthesizing speech, in a recognition mode for recognizing whether an object is a registered object, when the object recognized by the object recognition means matches an object stored in the registration storage means, based on the speech data stored in association with the stored object and on speech segments of a segment dictionary stored in advance. The object recognition means recognizes a human face. The speech recognition means recognizes a keyword belonging to at least one of demonstrative pronouns, particles, and auxiliary verbs, and recognizes as a noun the character string that remains after the recognized keyword is removed from the input speech. The recognized speech data further includes pitch information representing intonation. The speech synthesis means synthesizes speech from the character string of the noun, the pitch information, and the segments of the segment dictionary, and the segment dictionary is composed of segments matched to the appearance of the robot, doll, or toy that incorporates the learning device.

The invention according to claim 2 is the learning device according to claim 1, wherein the object recognition means is face recognition means for recognizing a human face.

The learning device of the present invention operates as follows.

In the registration mode, a predetermined object such as a face is recognized from an image input from the image input means, and a noun is recognized from speech input from the speech input device. Speech data including a symbol string for the recognized noun is then stored in association with image data for the recognized object.

In the recognition mode, when a recognized object matches a stored object, speech is synthesized based on the speech data stored in association with that object and on speech segments of a segment dictionary stored in advance.

When a keyword, that is, a word belonging to a class such as demonstrative pronouns, particles, or auxiliary verbs, is recognized, the keyword is removed from the symbol string to be registered, and only the part that is likely to be the noun is extracted based on its position relative to the keyword.

Reproducing the intonation from the pitch information makes the output sound as if a real person were speaking.

Likewise, the output can sound as if the robot, doll, or toy itself were really speaking.

According to the present invention, realizing low-level intelligent behavior makes it possible to provide a learning device that adults and the elderly can also enjoy. A product incorporating the invention can go beyond the scope of a toy-like robot, which is of great practical benefit.
For example, the device imitates the way a baby comes to remember people and objects, and can be expected to comfort adults whose children have grown up and become independent, adults without children, and lonely elderly people.

Alternatively, by imitating the way an anime protagonist comes to remember things such as the name of its owner, the device can be expected to appeal to children, the conventional users, as well.

An embodiment of the present invention will be described below with reference to FIGS. 1 to 11.

(1) Configuration of Baby Robot 100
This embodiment is a typical embodiment of the present invention, modeled on a baby learning its mother's face.

FIG. 4 shows the external appearance of the baby robot 100 incorporating the learning device 10 of the present embodiment. The baby robot has a built-in microphone 102 and a camera 104 such as a video camera or digital camera. The camera 104 and the microphone 102 need not be located at the positions of the eyes and ears as in a real baby, so they may be placed inconspicuously in the hair or in part of the clothing. Preferably, they are made to resemble buttons on the clothing.

(2) Configuration of Learning Device 10
FIG. 1 is a block diagram showing the learning device 10 of the present embodiment.

The learning device 10 comprises a speech recognition unit 12, an object recognition unit 14, a registration storage unit 16, a speech synthesis unit 18, and a mode changeover switch 20.

The learning device 10 simulates a simple learning process by combining speech recognition with object recognition: it registers the face that the baby robot 100 is looking at in association with the speech it is hearing.

First, as shown in the flowchart of FIG. 2, in the "registration mode" the user shows his or her face to the baby robot 100 while telling it his or her name. In the most typical example, a woman shows her face to the baby robot 100 and says a name such as "Mama." The baby robot 100 recognizes the speech, converts it into a symbol string such as a character string, and stores it; at the same time, it learns the face through object recognition and registers the resulting face dictionary in association with the symbol string.

Next, as shown in the flowchart of FIG. 3, in the "recognition mode", when the baby robot 100 again sees a face it was shown before, it utters in its own voice the symbols of the speech it heard together with that face. In the most typical example, it says "Mama" in a baby-like voice.

In this way, the baby robot 100 can mimic the behavior of learning a person's face and name and saying the learned name upon seeing the same face again.
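The two modes described above can be sketched as a small state holder. This is only an illustrative model: the class name `LearningDevice`, the method `observe`, and the use of a plain equality test in place of real face matching are assumptions made for the example and do not appear in the source.

```python
from dataclasses import dataclass, field


@dataclass
class LearningDevice:
    """Toy model of the registration and recognition modes.

    Face matching is replaced by an equality test on a hypothetical
    face key; the real device compares image subspaces instead.
    """
    mode: str = "registration"                    # position of the mode switch
    entries: list = field(default_factory=list)   # (face_key, name) pairs

    def observe(self, face_key, heard_name=None):
        """Register a (face, name) pair, or return the name learned for a face."""
        if self.mode == "registration":
            if heard_name is not None:
                self.entries.append((face_key, heard_name))
            return None
        # Recognition mode: say the name stored with a matching face, if any.
        for stored_face, name in self.entries:
            if stored_face == face_key:
                return name
        return None
```

For example, after `observe("face-A", "mama")` in registration mode and a switch to recognition mode, `observe("face-A")` returns the stored name, while an unseen face returns `None`.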

The configuration of each of the units 12 to 16 is described below. The functions of the units 12 to 16 are implemented by a program stored in a computer.

(2-1) Mode Changeover Switch 20
The mode changeover switch 20 switches the learning device 10 between the registration mode and the recognition mode, and is provided, for example, on the back of the baby robot 100.

(2-2) Speech Recognition Unit 12
(2-2-1) Speech recognition unit 12 of the first embodiment
The speech recognition unit 12 operates in the registration mode. It comprises the microphone 102, an amplifier that amplifies the signal from the microphone 102 to an appropriate level, a filter that cuts unnecessary frequency bands, an A/D converter that digitizes the amplified analog signal, and a recognition unit that recognizes the digitized signal using a known algorithm such as an HMM.

In the registration mode, the name uttered by the user is stored in the speech dictionary of the registration storage unit 16.

(2-2-2) Speech recognition unit 12 of the second embodiment
The speech recognition unit 12 of the first embodiment described above suffices as long as the user follows the convention of uttering only the name of the target.

In general, however, an utterance such as "Mama desu yo" ("It's Mama") or "Kore ga inu desu yo" ("This is a dog") contains unnecessary words such as "desu yo" or "kore ga" in addition to the name of the target. To handle this, a keyword recognition unit 13 is added to the speech recognition unit 12, as shown in FIG. 5.

When the keyword recognition unit 13 recognizes a keyword, that is, a word belonging to a class such as demonstrative pronouns, particles, or auxiliary verbs, it removes the keyword from the symbol string to be registered and extracts only the part that is likely to be the noun, based on its position relative to the keyword. The speech data of the symbol string of the name, with the keyword removed, is then stored in the speech dictionary of the registration storage unit 16.

FIG. 6 is a flowchart of the operation for the case where the utterance ends with the keyword "desu yo."

A leading keyword such as "kore ga" ("this is") or "watashi ga" ("I am") can be handled in almost the same way. In that case the keyword comes at the beginning, so the parts of the flowchart that read "before" must be changed to "after."
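The keyword handling described for the keyword recognition unit 13 can be sketched as simple prefix and suffix stripping on the recognized symbol string. The keyword lists contain only the examples given in the text, and the function name `extract_noun` is hypothetical; the flowchart of FIG. 6 is not reproduced here, so this is an approximation of its effect rather than its exact steps.

```python
# Keywords taken from the examples in the text; a real device would
# presumably hold a larger list (assumption).
LEADING_KEYWORDS = ("これが", "私が")   # "kore ga", "watashi ga"
TRAILING_KEYWORDS = ("ですよ",)         # "desu yo"


def extract_noun(symbols: str) -> str:
    """Return the part of a recognized symbol string that is likely the noun.

    A leading keyword means the noun lies after it; a trailing keyword
    means the noun lies before it. A string that consists of nothing but
    a keyword is left untouched.
    """
    for kw in LEADING_KEYWORDS:
        if symbols.startswith(kw) and len(symbols) > len(kw):
            symbols = symbols[len(kw):]
            break
    for kw in TRAILING_KEYWORDS:
        if symbols.endswith(kw) and len(symbols) > len(kw):
            symbols = symbols[:-len(kw)]
            break
    return symbols
```

With these lists, "ママですよ" yields "ママ" and "これが犬ですよ" yields "犬", while a bare name passes through unchanged.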

(2-3) Configuration of Object Recognition Unit 14 and Registration Storage Unit 16
Next, the configuration of the object recognition unit 14 and the registration storage unit 16 will be described.

The object recognition unit 14 and the registration storage unit 16 are described separately for the registration mode and the recognition mode of face recognition.

(2-3-1) Registration mode
FIG. 8 is a flowchart of the registration mode for face recognition.

In the registration mode, M′ face images showing a person's face are captured, and M principal components, where M is smaller than M′, are extracted and stored as image data in the image dictionary of the registration storage unit 16. A well-known principal component analysis method can be applied for this purpose.

Then, as shown in FIG. 11, the registration storage unit 16 stores the image dictionary and the speech dictionary in association with each other. Here, the speech dictionary entry for an utterance and the image dictionary entry for the face recognized at the same time are associated with each other using the time of recognition as the reference.

When a name identical to a previously registered name is registered, the two names and their image data may be registered entirely independently, or the earlier entry may be replaced.

Alternatively, previously registered image data may be updated with new data. For this purpose, the correlation matrix used in the principal component analysis should be recorded together with the image data. FIG. 10 is a flowchart of this process.

In FIG. 10, f is a vector formed by arranging the density value of each mesh cell when the input image is represented as a mesh. The symbol <·, ·> denotes the Schatten product, an operation that forms a matrix by multiplying a vector by its transpose. K and K′ are weighted sums of the results of this operation; each is a kind of correlation matrix called a characteristic kernel. Only one K is shown here, but in general there is one K for each registered category. K′ is the kernel for the category being newly registered, and after registration it is treated in the same way as K.
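As a rough illustration of how a characteristic kernel could be updated with a new image vector f, the sketch below forms the product of f with its transpose and blends it into the stored matrix. The exact weighting used in FIG. 10 is not given in the text, so the blending rule K ← (1 − w)·K + w·<f, f>, the parameter `weight`, and the function names are all assumptions.

```python
def schatten_product(f):
    """Matrix formed by multiplying the vector f by its transpose."""
    return [[a * b for b in f] for a in f]


def update_kernel(K, f, weight=0.1):
    """Blend a new image vector into a stored correlation matrix.

    Assumed rule: K <- (1 - weight) * K + weight * <f, f>.
    The source only says the products are added with weights.
    """
    ff = schatten_product(f)
    n = len(f)
    return [
        [(1.0 - weight) * K[i][j] + weight * ff[i][j] for j in range(n)]
        for i in range(n)
    ]
```

Keeping K alongside the extracted principal components is what allows a later update without retaining the original images: the principal components can be recomputed from the updated K.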

(2-3-2) Recognition mode
In the recognition mode, N′ face images are captured, and N principal components, where N is smaller than N′, are extracted to form an N-dimensional subspace. The angle between this subspace and the M-dimensional subspace of the image data in the registration storage unit 16 is then calculated.

The subspace can be constructed using the same principal component analysis as in the registration mode. The angle between subspaces is measured using a method called the mutual subspace method.

Specifically, let the M principal components be $\{\phi_m\}$ and the N principal components be $\{\psi_n\}$, and let $X = (x_{ij})$, where

$$x_{ij} = \sum_{m=1}^{M} (\psi_i, \phi_m)(\phi_m, \psi_j)$$

or

$$x_{ij} = \sum_{n=1}^{N} (\phi_i, \psi_n)(\psi_n, \phi_j).$$

It is known that the smallest angle can be calculated from the largest eigenvalue of the matrix $X$. Letting the angle be $\theta_1$, its relationship to the largest eigenvalue $\lambda_1$ is $\lambda_1 = \cos^2 \theta_1$. This technique is described in Patent Document 2 (JP H11-265452 A; Kenichi Maeda, Osamu Yamaguchi, Kazuhiro Fukui: "Object Recognition Device and Object Recognition Method").
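The angle calculation can be sketched in a few lines, assuming the usual mutual subspace formulation in which x_ij = Σ_m (ψ_i, φ_m)(φ_m, ψ_j) for orthonormal bases {φ_m} and {ψ_n}. The function names are hypothetical, and the largest eigenvalue is obtained by plain power iteration, which is a choice made for this example rather than anything stated in the source.

```python
import math


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def msm_matrix(phi, psi):
    """N x N matrix X with x_ij = sum_m (psi_i, phi_m)(phi_m, psi_j)."""
    return [
        [sum(dot(pi, pm) * dot(pm, pj) for pm in phi) for pj in psi]
        for pi in psi
    ]


def largest_eigenvalue(X, iters=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix.

    Uses power iteration from an all-ones start vector; this can fail
    to converge if that vector is orthogonal to the dominant eigenvector.
    """
    v = [1.0] * len(X)
    lam = 0.0
    for _ in range(iters):
        w = [dot(row, v) for row in X]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            return 0.0          # the subspaces are orthogonal
        v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in X])
    return lam


def smallest_angle(phi, psi):
    """Smallest angle theta_1 between the subspaces, from lambda_1 = cos^2(theta_1)."""
    lam = min(1.0, max(0.0, largest_eigenvalue(msm_matrix(phi, psi))))
    return math.acos(math.sqrt(lam))
```

A face would then be accepted as registered when the eigenvalue (or equivalently the angle) passes some threshold; the source does not state the threshold, so none is shown.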

(2-4) Configuration of Speech Synthesis Unit 18
In the recognition mode, when the image recognition result matches an entry stored in the registration storage unit 16, the speech data associated with that entry is retrieved from the speech dictionary, and speech is synthesized using the symbol string of that speech data.

Speech synthesis generates speech from a symbol string. In addition to the symbol string, it requires speech segments (the sound material from which individual phonemes are synthesized) and intonation information.

Speech segments registered in advance are used. For example, segments collected from a baby's voice make it possible to synthesize speech in a baby's voice. Thus, when imitating a baby as described above, segments collected in advance from a baby are used, whereas for an anime protagonist, segments are collected from speech uttered by the voice actor.

However, if the symbol string is passed to speech synthesis exactly as registered, the result is flat, truly robot-like speech with no intonation. To solve this problem, it is effective to have the speech recognition unit 12 learn the intonation as well. This can be done by extracting the pitch of the speech and storing it as pitch information in association with the symbols. The speech synthesis unit 18 then reproduces the intonation from this pitch information, so that the output sounds as if the person were really speaking. FIG. 7 is a flowchart of this registration process.
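One way to picture how the stored symbols, pitch information, and segment dictionary come together is the sketch below, which pairs each symbol's segment with the pitch recorded for it. The data layout (one pitch value in Hz per symbol, segments identified by file name) and the function name `build_synthesis_plan` are assumptions for illustration; the source does not specify how pitch is represented.

```python
def build_synthesis_plan(symbols, pitches, segment_dict):
    """Pair each symbol's speech segment with the pitch stored for it.

    symbols      -- registered symbol string, split into units (assumed)
    pitches      -- one pitch value per symbol, e.g. in Hz (assumed)
    segment_dict -- maps a symbol to a pre-recorded segment identifier
    """
    if len(symbols) != len(pitches):
        raise ValueError("each symbol needs exactly one pitch value")
    plan = []
    for symbol, pitch in zip(symbols, pitches):
        # KeyError here means the segment dictionary lacks this symbol.
        plan.append((segment_dict[symbol], pitch))
    return plan
```

Swapping in a different `segment_dict` (baby voice, voice actor) changes the voice, while the pitch values carry over the intonation of the person who registered the name.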

(Modifications)
The present invention is not limited to the above embodiment, and various modifications are possible without departing from its gist.

For example, a robot modeled on a cartoon protagonist may learn its owner's name.

In the above embodiment the user's face is recognized, but an object such as a ball may be recognized instead. For example, a dog-shaped robot can be made to learn a ball or a bone.

The present invention is intended to be built into robots, human or baby dolls, animal dolls, dolls and robots of anime or cartoon characters, and other toys.

FIG. 1 is a block diagram of the learning device of one embodiment of the present invention.
FIG. 2 is a flowchart of the registration mode of the learning device of the embodiment.
FIG. 3 is a flowchart of the recognition mode of the learning device of the embodiment.
FIG. 4 is an external view of the baby robot of the embodiment.
FIG. 5 is a block diagram of the learning device having the speech recognition unit of the second embodiment.
FIG. 6 is a flowchart of speech recognition with the keyword recognition unit 13.
FIG. 7 is a flowchart of learning that includes intonation.
FIG. 8 is a flowchart of face recognition in the registration mode.
FIG. 9 is a flowchart of face recognition in the recognition mode.
FIG. 10 is a flowchart of update registration.
FIG. 11 is a diagram showing the storage state of the registration storage unit 16.

Explanation of symbols

10 Learning device
12 Speech recognition unit
14 Object recognition unit
16 Registration storage unit
18 Speech synthesis unit
100 Baby robot
102 Microphone
104 Camera

Claims

A learning device built into a robot, doll, or toy,
Object recognition means for recognizing an object from an image input from the image input means;
In a registration mode for registering the object, speech recognition means for recognizing a noun from speech input from a speech input device;
In the registration mode, a registration storage unit that stores voice data including a symbol string related to a noun recognized by the voice recognition unit and image data related to the object recognized by the object recognition unit in association with each other;
In the recognition mode for recognizing whether or not the object is a registered object, when the object recognized by the object recognizing unit matches the object stored in the registered storage unit, corresponding to the stored object Speech synthesis means for synthesizing speech based on speech data stored and speech segments of a segment dictionary stored in advance;
the learning device comprising the foregoing means, wherein:
the object recognition means recognizes a human face,
the speech recognition means recognizes a keyword belonging to at least one of a demonstrative pronoun, a particle, and an auxiliary verb in the input speech, and recognizes as a noun a character string obtained by excluding the recognized keyword from the input speech, the recognized speech data further including pitch information representing intonation, and
the speech synthesis means synthesizes speech from the character string of the noun, the pitch information, and the segments of the segment dictionary, the segment dictionary being composed of segments matched to the appearance of the robot, doll, or toy incorporating the learning device.

A learning method in a learning device built in a robot, a doll, or a toy,
An object recognition step for recognizing an object from an image input from an image input means;
In a registration mode for registering the object, a speech recognition step for recognizing a noun from speech input from a speech input device;
In the registration mode, a registration storage step of storing voice data including a symbol string related to the noun recognized in the voice recognition step and image data relating to the object recognized in the object recognition step in association with each other;
In the recognition mode for recognizing whether or not the object is a registered object, when the object recognized in the object recognition step matches the object stored in the registration storage step, corresponding to the stored object A speech synthesis step of synthesizing speech based on speech data stored in advance and speech segments of a segment dictionary stored in advance;
the learning method comprising the foregoing steps, wherein:
in the object recognition step, a human face is recognized,
in the speech recognition step, a keyword belonging to at least one of a demonstrative pronoun, a particle, and an auxiliary verb is recognized in the input speech, and a character string obtained by excluding the recognized keyword from the input speech is recognized as a noun, the recognized speech data further including pitch information representing intonation, and
in the speech synthesis step, speech is synthesized from the character string of the noun, the pitch information, and the segments of the segment dictionary, the segment dictionary being composed of segments matched to the appearance of the robot, doll, or toy incorporating the learning device.

A program that causes a computer built in a robot, doll, or toy to function as a learning device,
An object recognition function for recognizing an object from an image input from an image input means;
In a registration mode for registering the object, a speech recognition function for recognizing a noun from speech input from a speech input device;
In the registration mode, a registration storage function for storing voice data including a symbol string relating to a noun recognized by the voice recognition function and image data relating to an object recognized by the object recognition function in association with each other;
In the recognition mode for recognizing whether or not the object is a registered object, when the object recognized by the object recognition function matches the object stored by the registration storage function, corresponding to the stored object A speech synthesis function for synthesizing speech based on speech data stored in advance and speech segments in a segment dictionary stored in advance;
the program causing the computer to realize the foregoing functions, wherein:
in the object recognition function, a human face is recognized,
in the speech recognition function, a keyword belonging to at least one of a demonstrative pronoun, a particle, and an auxiliary verb is recognized in the input speech, and a character string obtained by excluding the recognized keyword from the input speech is recognized as a noun, the recognized speech data further including pitch information representing intonation, and
in the speech synthesis function, speech is synthesized from the character string of the noun, the pitch information, and the segments of the segment dictionary, the segment dictionary being composed of segments matched to the appearance of the robot, doll, or toy incorporating the computer.