JP2012010856A

JP2012010856A - State recognition device, listening interaction continuing system, state recognition program and state recognition method

Info

Publication number: JP2012010856A
Application number: JP2010149041A
Authority: JP
Inventors: Hirotake Yamazoe; 大丈山添; Yuichi Kamiyama; 祐一神山; Tomoko Yonezawa; 朋子米澤; Shinji Abe; 伸治安部
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2010-06-30
Filing date: 2010-06-30
Publication date: 2012-01-19

Abstract

PROBLEM TO BE SOLVED: To provide a novel state recognition device, listening interaction continuing system, state recognition program and state recognition method. SOLUTION: A PC 10 included in the listening interaction continuing system 100 acquires a user's action data from images picked up by an abdominal camera 14 of a robot 12 and by a monitor camera 22, and from voices collected by microphones 20. An individualized SVM in which the position of the boundary (hyperplane) is adjusted is constructed on the basis of individual learning samples sampled from the action data of a certain user and general learning samples for constructing the SVM. When a recognition sample sampled from that user's actions is input into the individualized SVM, the concentration state of that user is recognized. Accordingly, the PC 10 can easily and correctly recognize the concentration state of that user.

Description

The present invention relates to a state recognition device, a listening dialogue sustaining system, a state recognition program, and a state recognition method, and more particularly to, for example, a state recognition device, a listening dialogue sustaining system, a state recognition program, and a state recognition method that recognize the state of a user.

The state monitoring device disclosed in Patent Document 1 uses a plurality of cameras and microphones to detect changes in a person's posture and face orientation and the opening and closing of the eyes and mouth. Based on the detected information, the state monitoring device can determine states such as a stationary state in which the person is watching television, a conversation state in which the person is talking, and a sleep state in which the person is asleep. According to the determined state, the state monitoring device controls the power supply of various electrical appliances installed around the person or plays a message to the person.
Patent Document 1: JP-A-2005-199078 [A61B 5/11, A61B 5/107, G06T 1/00, G06T 7/20]

In recent years, many recognition systems that recognize a human state, such as the state monitoring device of Patent Document 1, have been developed and are beginning to be used in various fields.

However, a recognition system developed to recognize the states of many different people may fail to correctly recognize the state of a person whose posture changes or mouth movements are extremely few. One could build a recognition system specialized for that person, but building a personal recognition system requires a large amount of learning data, so this is not a realistic solution. Moreover, once a personal recognition system has been built, other people cannot use it at all, and the versatility of the recognition system is lost.

Therefore, a main object of the present invention is to provide a novel state recognition device, listening dialogue sustaining system, state recognition program, and state recognition method.

Another object of the present invention is to provide a state recognition device, a listening dialogue sustaining system, a state recognition program, and a state recognition method that can easily and correctly recognize the state of a specific user.

The present invention adopts the following configurations to solve the above problems. The reference numerals and supplementary explanations in parentheses indicate correspondence with the embodiments described later to aid understanding of the present invention, and do not limit the present invention in any way.

A first invention is a state recognition device comprising acquisition means for acquiring a user's behavior and recognition means that has a recognition criterion and recognizes a user state from the user's behavior acquired by the acquisition means, the state recognition device being characterized by further comprising storage means for storing a plurality of learning samples and adjustment means for adjusting the recognition criterion based on the plurality of learning samples.

In the first invention, the state recognition device (10: a reference numeral indicating the corresponding part in the embodiments; the same applies hereinafter) includes acquisition means (26, S153, S155) that acquires the user's behavior from cameras (14, 22), a microphone (20) and the like, and recognition means (26, S195, S213) that has a recognition criterion and recognizes the user state (concentration state) from the user's behavior acquired by the acquisition means. Storage means (30) stores a plurality of learning samples sampled from the behavior of the user to be recognized. Adjustment means (26, S171, S173) adjusts the recognition criterion of the recognition means based on the plurality of learning samples sampled from the behavior of the user to be recognized.

According to the first invention, the recognition criterion is adjusted using learning samples sampled from the behavior of a specific user, so that user's state can be recognized easily and correctly.

A second invention is dependent on the first invention, wherein the recognition criterion is a boundary in an SVM, the adjustment means includes weight adjustment means for adjusting the weights of the plurality of learning samples, and the position of the boundary changes when the weights of the plurality of learning samples are adjusted by the weight adjustment means.

In the second invention, the recognition criterion is a boundary (a boundary line or hyperplane) in an SVM. The weight adjustment means (26, S171) adjusts the weight of each parameter value included in the learning samples. When the weights of the plurality of learning samples have been adjusted by the weight adjustment means and the SVM is reconstructed, the position of the boundary changes.

According to the second invention, adjusting the weight of each parameter value included in the learning samples makes it possible to move the SVM boundary to a position suited to the individual's behavior.
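For orientation, one common way in which per-sample weights move an SVM boundary is the weighted soft-margin formulation below. This is a textbook form offered as an assumption, not the formulation stated in this publication; the symbols $c_i$ (weight of learning sample $i$), $C$, $\xi_i$, $\mathbf{w}$ and $b$ are introduced here purely for illustration.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;
\frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} c_i\,\xi_i
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i,
\quad \xi_i \ge 0 \quad (i = 1,\dots,N)
```

Under this assumed form, raising $c_i$ for samples taken from the target user makes margin violations on those samples more costly, so re-solving the problem shifts the boundary $\mathbf{w}^{\top}\mathbf{x} + b = 0$ toward a position that fits that user's samples.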

A third invention is dependent on the second invention, and further comprises provisional recognition means for provisionally recognizing the user state after the weights have been adjusted by the weight adjustment means, and recording means for recording the degree of recognition obtained by the provisional recognition means, wherein the weight adjustment means repeats the adjustment of the weights of the plurality of learning samples until the difference between the previous degree of recognition recorded by the recording means and the current degree of recognition becomes equal to or less than a predetermined value.

In the third invention, the provisional recognition means (26, S175) provisionally recognizes the user state after the weight adjustment means has adjusted the weights of the parameters included in the learning samples. The recording means (26, S177) records, for example, the number of recognition samples recognized as the active state as the degree of recognition. The weight adjustment means then repeats the adjustment of the weights of the plurality of learning samples until, for example, the difference between the previous degree of recognition and the current degree of recognition becomes equal to or less than the predetermined value.

According to the third invention, the recognition accuracy of the user state improves in proportion to the operating time of the state recognition device.
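The repeat-until-stable behavior of the third invention can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `adjust_until_stable`, the two callbacks, `tolerance` (standing in for the "predetermined value") and the `max_rounds` safety cap are all hypothetical.

```python
from typing import Callable, List, Optional


def adjust_until_stable(
    adjust_weights: Callable[[], None],
    provisional_degree: Callable[[], int],
    tolerance: int = 0,
    max_rounds: int = 100,
) -> List[int]:
    """Repeat weight adjustment until the degree of recognition stops changing.

    adjust_weights:     hypothetical callback that adjusts the sample weights
                        and rebuilds the classifier (cf. S171, S173).
    provisional_degree: hypothetical callback that runs provisional recognition
                        and returns the degree of recognition (cf. S175).
    Returns the recorded history of recognition degrees (cf. S177).
    """
    history: List[int] = []
    previous: Optional[int] = None
    for _ in range(max_rounds):  # safety cap, an assumption of this sketch
        adjust_weights()
        current = provisional_degree()
        history.append(current)
        # Stop once the change from the previous round is small enough.
        if previous is not None and abs(current - previous) <= tolerance:
            break
        previous = current
    return history
```

The first round never terminates the loop, because there is no previous degree of recognition to compare against.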

A fourth invention is a state recognition device that recognizes a user state of a user, comprising: storage means for storing a plurality of learning samples; acquisition means for acquiring the user's behavior; creation means for creating, based on the plurality of learning samples, a weight-adjusted recognition sample from the user's behavior acquired by the acquisition means; and recognition means that has a recognition criterion and recognizes the user state of the user from the recognition sample created by the creation means.

In the fourth invention, the state recognition device (10) recognizes the user state of a user. The storage means (30) stores a plurality of learning samples sampled from the behaviors of a plurality of users. The acquisition means (26, S153, S155) acquires the user's behavior from cameras (14, 22), a microphone (20) and the like. The creation means (26, S313) creates a weight-adjusted recognition sample from the user's behavior, based on the plurality of learning samples sampled from the behaviors of the plurality of users. The recognition means (26, S315) has a recognition criterion and recognizes the user state (concentration state) of the user from the recognition sample created by the creation means.

According to the fourth invention, only the weights of the recognition sample need to be adjusted, so the state of a specific user can be recognized easily and correctly without changing the recognition criterion.
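The idea of the fourth invention, adjusting the recognition sample rather than the boundary, can be sketched as follows. The scaling rule used here (multiplying each value by the ratio of the multi-user mean to this user's mean) is an assumption chosen for illustration; this passage does not specify how the weights are computed. The helper names and the linear boundary `w`, `b` are likewise hypothetical.

```python
from typing import List, Sequence


def mean_per_feature(samples: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of equal-length samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def generalize_sample(
    sample: Sequence[float],
    user_mean: Sequence[float],
    general_mean: Sequence[float],
) -> List[float]:
    """Rescale one user's recognition sample toward the multi-user range.

    Assumed rule: value * (general mean / user mean) per feature.
    Features the user never exhibits (mean 0) are left unchanged.
    """
    return [
        x * g / u if u > 0 else x
        for x, u, g in zip(sample, user_mean, general_mean)
    ]


def recognize(sample: Sequence[float], w: Sequence[float], b: float) -> bool:
    """Fixed linear boundary: True when the sample lies on the positive side."""
    return sum(wi * xi for wi, xi in zip(w, sample)) + b >= 0


# A quiet user whose raw behavior counts fall below a boundary built for everyone.
general_mean = mean_per_feature([[8, 18], [12, 22]])  # learning samples from many users
user_mean = mean_per_feature([[1, 3], [3, 5]])        # this user's own history
adjusted = generalize_sample([3, 2], user_mean, general_mean)
```

In this toy setup the boundary itself is never retrained; only the incoming sample is rescaled before it is tested against the boundary.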

A fifth invention is dependent on any one of the first to fourth inventions, and further comprises elemental behavior determination means for determining the presence or absence of each of a plurality of elemental behaviors of the user, wherein the user's behavior includes composite behaviors that combine the presence or absence of the plurality of elemental behaviors.

In the fifth invention, the elemental behavior determination means (26, S13, S37, S45, S67, S75, S97, S105) determines the presence or absence of each of a plurality of elemental behaviors of the user based on input from the cameras and the microphone. The user's behavior includes composite behaviors that combine the presence or absence of the plurality of elemental behaviors.

A sixth invention is dependent on the fifth invention, wherein the plurality of elemental behaviors include the user's utterance, the user's gaze, the user's forward-leaning posture, and the user's nodding.

According to the fifth and sixth inventions, the composite behaviors needed for learning and recognition can be obtained while keeping the number of types of elemental behavior to be acquired small. This reduces the load on the state recognition device, which records the user's elemental behaviors.
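As a rough illustration of how a few elemental behaviors yield many composite behaviors, the sketch below encodes each time frame's presence/absence flags as a bit pattern and counts how often each pattern occurs. The bit encoding, the names in `ELEMENTS`, and the per-frame dictionary format are assumptions made for this example, not details taken from the publication.

```python
from collections import Counter
from typing import Dict, List, Sequence

# The four elemental behaviors named in the sixth invention (identifiers are illustrative).
ELEMENTS = ("utterance", "gaze", "forward_lean", "nod")


def composite_id(flags: Dict[str, bool], elements: Sequence[str] = ELEMENTS) -> int:
    """Encode presence/absence of each elemental behavior as one integer (bit i = elements[i])."""
    code = 0
    for bit, name in enumerate(elements):
        if flags.get(name, False):
            code |= 1 << bit
    return code


def composite_frequencies(
    frames: Sequence[Dict[str, bool]],
    elements: Sequence[str] = ELEMENTS,
) -> List[float]:
    """Relative frequency of every composite behavior over a sequence of frames."""
    n_composites = 2 ** len(elements)  # 4 elemental behaviors -> 16 composites
    if not frames:
        return [0.0] * n_composites
    counts = Counter(composite_id(frame, elements) for frame in frames)
    return [counts[i] / len(frames) for i in range(n_composites)]
```

Four recorded flags are enough to distinguish 16 composite behaviors; passing a five-name tuple that also includes the partner's utterance, as in the seventh invention, would give 32.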

A seventh invention is a listening dialogue sustaining system having the state recognition device according to claim 6, further comprising partner utterance determination means for determining the presence or absence of an utterance by the dialogue partner, wherein the plurality of elemental behaviors of one speaker further include the utterance of the other dialogue partner.

In the seventh invention, the listening dialogue sustaining system (100) has the state recognition device (10) according to claim 6 and is a system for sustaining a dialogue with a dialogue partner. The partner utterance determination means (26, S123) of the listening dialogue sustaining system determines the presence or absence of an utterance by the dialogue partner. The plurality of elemental behaviors of one speaker further include the utterance of the other dialogue partner.

According to the seventh invention, recognition accuracy can be improved by using the presence or absence of the partner's utterance when recognizing the user's concentration state.

An eighth invention is a state recognition program that causes a processor (26) of a state recognition device comprising storage means (30) for storing a plurality of learning samples to function as: acquisition means (26, S153, S155) for acquiring a user's behavior; recognition means (26, S213) that has a recognition criterion and recognizes a user state from the user's behavior acquired by the acquisition means; and adjustment means (26, S171, S173) for adjusting the recognition criterion based on the plurality of learning samples.

In the eighth invention, as in the first invention, the recognition criterion is adjusted using learning samples sampled from the behavior of a specific user, so that user's state can be recognized easily and correctly.

A ninth invention is a state recognition program that causes a processor (26) of a state recognition device (10) comprising storage means (30) for storing a plurality of learning samples to function as: acquisition means (S153, S155) for acquiring a user's behavior; creation means (S313) for creating, based on the plurality of learning samples, a weight-adjusted recognition sample from the user's behavior acquired by the acquisition means; and recognition means (S315) that has a recognition criterion and recognizes the user state of the user from the weight-adjusted recognition sample created by the creation means.

In the ninth invention, as in the fourth invention, only the weights of the recognition sample need to be adjusted, so the state of a specific user can be recognized easily and correctly without changing the recognition criterion.

A tenth invention is a state recognition method for a state recognition device (10) comprising acquisition means (S153, S155) for acquiring a user's behavior, recognition means (26, S213) that has a recognition criterion and recognizes a user state from the user's behavior acquired by the acquisition means, and storage means (30) for storing a plurality of learning samples, the method being characterized by: acquiring the user's behavior by the acquisition means (S153, S155); adjusting the recognition criterion based on the plurality of learning samples (S171, S173); and recognizing the user state from the user's behavior acquired by the acquisition means, using the recognition means whose recognition criterion has been adjusted.

In the tenth invention, as in the first invention, the recognition criterion is adjusted using learning samples sampled from the behavior of a specific user, so that user's state can be recognized easily and correctly.

An eleventh invention is a state recognition method for a state recognition device (10) comprising storage means (30) for storing a plurality of learning samples, the method comprising: acquiring a user's behavior (S153, S155); creating, based on the plurality of learning samples, a weight-adjusted recognition sample from the acquired behavior of the user (S313); and recognizing the user state of the user from the weight-adjusted recognition sample using a recognition criterion (S315).

In the eleventh invention, as in the fourth invention, only the weights of the recognition sample need to be adjusted, so the state of a specific user can be recognized easily and correctly without changing the recognition criterion.

According to the present invention, the recognition criterion is adjusted using learning samples sampled from the behavior of a specific user, so that user's state can be recognized easily and correctly.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments given with reference to the drawings.

FIG. 1 is an illustrative view showing an outline of a listening dialogue sustaining system according to an embodiment of the present invention.
FIG. 2 is an illustrative view showing an example of the positional relationship among the monitor camera, the monitor, the robot, and the user shown in FIG. 1, and of the imaging ranges of the monitor camera and the abdominal camera.
FIG. 3 is a block diagram showing an example of the electrical configuration of the PC shown in FIG. 1.
FIG. 4 is an illustrative view showing the appearance of the robot shown in FIG. 1 as seen from the front.
FIG. 5 is a block diagram showing an example of the electrical configuration of the robot shown in FIG. 1.
FIG. 6 is a block diagram showing an example of the electrical configuration of the server shown in FIG. 1.
FIG. 7 is an illustrative view showing an example of the behavior table stored in the memory shown in FIG. 3.
FIG. 8 is an illustrative view showing an example of image data captured by the monitor camera and the abdominal camera shown in FIG. 1.
FIG. 9 is an illustrative view showing an example of a time-series graph based on the behavior table stored in the memory shown in FIG. 3.
FIG. 10 is an illustrative view showing an example of composite behaviors obtained from the time-series data shown in FIG. 9.
FIG. 11 is an illustrative view showing an example of the behavior frequencies of the composite behaviors shown in FIG. 10.
FIG. 12 is an illustrative view showing an example of the SVM constructed by the PC shown in FIG. 1.
FIG. 13 is an illustrative view showing an example of the occurrence frequencies of elemental behaviors obtained from the behavior table stored in the memory shown in FIG. 3.
FIG. 14 is an illustrative view showing a list of the behavior frequencies of composite behaviors calculated from the occurrence frequencies of elemental behaviors shown in FIG. 13.
FIG. 15 is an illustrative view showing an example of the numerical values used to personalize the SVM shown in FIG. 12.
FIG. 16 is an illustrative view showing an example of the boundary line that changes when the SVM shown in FIG. 12 is personalized.
FIG. 17 is an illustrative view showing an example of the memory map of the memory of the PC shown in FIG. 3.
FIG. 18 is an illustrative view showing an example of the data storage area shown in FIG. 17.
FIG. 19 is an illustrative view showing an example of the configuration of the behavior determination program shown in FIG. 17.
FIG. 20 is a flowchart showing the image/sound acquisition process of the processor of the PC shown in FIG. 3.
FIG. 21 is a flowchart showing the utterance determination process of the processor of the PC shown in FIG. 3.
FIG. 22 is a flowchart showing the gaze direction determination process of the processor of the PC shown in FIG. 3.
FIG. 23 is a flowchart showing the forward-leaning posture determination process of the processor of the PC shown in FIG. 3.
FIG. 24 is a flowchart showing the nod determination process of the processor of the PC shown in FIG. 3.
FIG. 25 is a flowchart showing the partner utterance determination process of the processor of the PC shown in FIG. 3.
FIG. 26 is a flowchart showing the synchronization process of the processor of the PC shown in FIG. 3.
FIG. 27 is a flowchart showing the sample accumulation process of the processor of the PC shown in FIG. 3.
FIG. 28 is a flowchart showing the learning process of the processor of the PC shown in FIG. 3.
FIG. 29 is a flowchart showing the recognition process of the processor of the PC shown in FIG. 3.
FIG. 30 is a flowchart showing the concentration recognition process of the processor of the PC shown in FIG. 3.
FIG. 31 is a flowchart showing the robot control process of the processor of the PC shown in FIG. 3.
FIG. 32 is an illustrative view showing an example in which a recognition sample sampled from the behavior table stored in the memory shown in FIG. 3 has been generalized.
FIG. 33 is a flowchart showing the learning process of the second embodiment of the processor of the PC shown in FIG. 3.
FIG. 34 is a flowchart showing the concentration recognition process of the second embodiment of the processor of the PC shown in FIG. 3.
FIG. 35 is an illustrative view showing an outline of a listening dialogue sustaining system according to another embodiment.
FIG. 36 is an illustrative view showing the positional relationship between two users who use the listening dialogue sustaining system of the other embodiment shown in FIG. 35.

<First embodiment>
Referring to FIG. 1, the listening dialogue sustaining system 100 of this embodiment is used for dialogue between a user A who has a mild brain disorder, such as a dementia patient, and a user B. To this end, the listening dialogue sustaining system 100 includes: a PC 10a, a stuffed-toy robot (hereinafter simply "robot") 12a including an abdominal camera 14a, a monitor 16a, a speaker 18a, a microphone 20a and a monitor camera 22a, installed in room 1 where user A is located; a PC 10b, a robot 12b including an abdominal camera 14b, a monitor 16b, a speaker 18b, a microphone 20b and a monitor camera 22b, installed in room 2 (a remote location) where user B is located; and a server 24 connected to a network 200. In this specification, when corresponding devices and persons in room 1 and room 2 are described without distinction, the letter appended to the reference numeral is omitted.

The robot 12 performs listening motions and utterances based on control signals from the PC 10. The abdominal camera 14 provided on the abdomen of the robot 12 photographs the user and outputs the images to the PC 10 via the robot 12. The PC 10 outputs control signals to the robot 12 and receives the images captured by the abdominal camera 14 and the monitor camera 22 and the sound collected by the microphone 20. The PC 10 then determines the user's behavior based on the input images and sound, and thereby recognizes the user's state (user state). The determined user behavior and user state are transmitted to the server 24 via the network 200. Because the PC 10 recognizes the user's state, it is also called a state recognition device.

The PC 10, the monitor 16, the speaker 18, the microphone 20, and the monitor camera 22 also function as a videophone. For example, the PC 10a receives the image and voice of user B transmitted from the PC 10b on user B's side. The monitor 16a therefore displays the image of user B, and the speaker 18 outputs the voice of user B. In addition, the microphone 20 collects the voice of user A and outputs it to the PC 10, and the monitor camera 22 captures an image of user A and outputs it to the PC 10. The PC 10 then transmits the image and voice of user A to the PC 10b via the network 200. For this reason, the listening dialogue sustaining system 100 is also called a videophone system.

When the server 24 receives the behavior and state data of user A and user B transmitted from the PC 10a and the PC 10b, it stores the data in a database (DB). When the PC 10 requests behavior and state data, the data are transmitted to the PC 10 in response to that request.

In other embodiments, the robot 12 and the PC 10 may be connected wirelessly instead of by wire. The connection of the PC 10 and the server 24 to the network 200 may be either wired or wireless.

FIG. 2 shows the embodiment of FIG. 1 as seen from the side. As can be seen from FIG. 2, the monitor camera 22 is placed on top of the monitor 16, and the robot 12 and the monitor 16 are placed on a desk. The user is photographed by the abdominal camera 14 and the monitor camera 22 while facing the monitor 16 and the monitor camera 22 on the desk. Furthermore, because the robot 12 is positioned between the user and the monitor 16, the monitor camera 22 photographs the robot 12 and the user at the same time. This arrangement allows the robot 12 to perform a pseudo-listening motion toward user A or toward the monitor 16a on which user B is displayed.

Note that the robot 12 need not be placed on the desk, as long as it is at a position where it is photographed by the monitor camera 22 and can itself photograph the user.

FIG. 3 is a block diagram showing the electrical configuration of the PC 10. The PC 10 incorporates a processor 26, also called a microcomputer or CPU. The processor 26 is connected via a bus 28 to a memory 30, an audio input/output board 32, an I/O 34, and a communication LAN board 36. The processor 26 also incorporates an RTC (Real Time Clock) 26a that outputs date and time information.

The memory 30, which functions as storage means, incorporates a ROM, a RAM, and an HDD (none shown). The ROM mainly stores a program for realizing the telephone function and the programs represented by the flowcharts described later (FIGS. 20 to 31). The RAM mainly holds buffers that temporarily store images captured by the abdominal camera 14 and the monitor camera 22, audio collected by the microphone 20, and the like. The HDD mainly stores the results of determining the user's actions, the results of recognizing the user's state, and the like.

Voice data of the partner user is supplied from the processor 26 to the speaker 18 via the audio input/output board 32, and the speaker 18 outputs sound according to that data. The voice of the partner user collected by the microphone 20 is taken into the processor 26 via the audio input/output board 32.

The I/O 34 consists of digital ports, each of which can be controlled for input or output. Control signals are output from the output ports to the robot 12, and image signals are output to the monitor 16. Video signals output from the robot 12 and the monitor camera 22 are supplied to the input ports.

The communication LAN board 36 is composed of, for example, a DSP (Digital Signal Processor), and passes transmission data supplied from the processor 26 to the wireless communication device 38. The wireless communication device 38 transmits the transmission data via the network 200 to external computers (the server 24 and the partner's PC 10). The communication LAN board 36 also receives data via the wireless communication device 38 and passes the received data to the processor 26.

For example, the transmission data may be commands, image data, and audio data required for videophone operation, or the results of determining the user's actions and recognizing the user's state. The received data may be the partner's image data and audio data obtained through the videophone, or the results of determining the partner user's actions and recognizing the partner user's state.

FIG. 4 shows the external appearance of the robot 12. The robot 12 includes a head 42 and a body 44 that supports it. A left arm 46L and a right arm 46R are provided on the left and right of the upper part of the body 44 (corresponding to human shoulders), and the abdominal camera 14 is provided on the abdomen of the body 44. A camera using a solid-state image sensor such as a CCD or CMOS sensor can be used as the abdominal camera 14. A mouth 48 is arranged on the front of the head 42, and an eyeball 50 is provided above the mouth 48. An ear 52 is attached to the upper side of the head 42.

The head 42 is supported by the body 44 so that it can pan and tilt, and the eyeball 50 is also movably held. The body 44 can tilt to the left and right about the waist. In addition, a speaker 72 (FIG. 5) is built into the mouth 48, and a microphone 74 (FIG. 5) is built into the ear 52.

If a microphone 74 is built into each of the two ears 52, the pair functions as a stereo microphone, which makes it possible to locate the source of sound input to the stereo microphone when necessary. The appearance of the robot 12 is not limited to a bear; it may be another animal or a humanoid.

FIG. 5 is a block diagram showing the electrical configuration of the robot 12. Like the PC 10, the robot 12 incorporates a processor 54. The processor 54 is connected via a bus 56, which is an example of a communication path, to a memory 58, a motor control board 60, an audio input/output board 70, a sensor input/output board 76, and an I/O 78.

The memory 58 incorporates a ROM and a RAM (neither shown). The ROM mainly stores in advance the programs that cause the robot 12 to perform listening motions and to speak, the audio data output from the speaker 72 when it speaks, and the like. The RAM is used as temporary storage and as working memory.

The motor control board 60 is composed of, for example, a DSP, and controls the axis motors of the arms and head of the robot 12 shown in FIG. 4. Specifically, the motor control board 60 receives control data from the processor 54 and adjusts the rotation angles of the three motors 62R (shown collectively as the "right arm motor" in FIG. 5) that control the angles about the X, Y, and Z axes so that the right arm 46R (FIG. 4) can move back and forth and left and right. The motor control board 60 likewise adjusts the rotation angles of the three motors 62L of the left arm 46L (shown collectively as the "left arm motor" in FIG. 5). It also adjusts the rotation angles of the three motors 64 (shown collectively as the "head motor" in FIG. 5) that control the pan angle and tilt angle of the head 42. In addition, the motor control board 60 controls an eyeball motor 66 that moves the eyeball 50 and a waist motor 68 that tilts the body 44.

To simplify control, each of the motors described above is a stepping motor or pulse motor, but DC motors may be used instead.

Synthesized voice data is supplied from the processor 54 to the speaker 72 via the audio input/output board 70, and the speaker 72 outputs sound or voice according to that data. Sound collected by the microphone 74 is taken into the processor 54 via the audio input/output board 70.

Like the motor control board 60, the sensor input/output board 76 is composed of a DSP; it takes in signals from the abdominal camera 14 and supplies them to the processor 54. The video signal from the abdominal camera 14 is subjected to predetermined processing in the sensor input/output board 76 as necessary before being input to the processor 54.

Like the I/O 34 of the PC 10, the I/O 78 consists of digital ports, each of which can be controlled for input or output. Video signals are output from the output ports and supplied to the PC 10, while control signals output from the PC 10 are supplied to the input ports.

FIG. 6 is a block diagram showing the electrical configuration of the server 24. The server 24 incorporates a processor 80 similar to the processors 26 and 54. The processor 80 is connected via a bus 82 to a memory 84, a first user information DB 86, a second user information DB 88, and a communication LAN board 90.

The memory 84 incorporates a ROM and a RAM (neither shown). The ROM mainly stores in advance programs for data communication between the server 24 and the PCs 10a, 10b, and so on. The RAM is used as temporary storage and as working memory.

The first user information DB 86 is a database for accumulating the action data and state data of the user A transmitted from the PC 10a. The second user information DB 88 is a database for accumulating the action data and state data of the user B transmitted from the PC 10b. The first user information DB 86 and the second user information DB 88 are built on storage media such as HDDs or SSDs.

Like the communication LAN board 36 of the PC 10, the communication LAN board 90 is composed of, for example, a DSP, and passes transmission data supplied from the processor 80 to the wireless communication device 92. The wireless communication device 92 transmits the transmission data via the network 200 to external computers (the PCs 10a and 10b). The communication LAN board 90 also receives data via the wireless communication device 92 and passes the received data to the processor 80.

For example, when the received data is the action data of the user A transmitted from the PC 10a, the processor 80 stores that action data in the first user information DB 86. When a request from the PC 10b to acquire the action data of the user A is supplied to the processor 80 as received data, the processor 80 passes the action data of the user A to the communication LAN board 90 as transmission data.

FIG. 7 shows the action table stored in the memory 30 of the PC 10. In this table, the results of determining the user's actions are converted into action data, and that action data is recorded at fixed intervals (for example, every second).

Referring to FIG. 7, the action table consists of the following columns, from left to right: "time", "speech", "gaze direction", "forward-leaning posture", "nod", and "partner's speech". Each item of action data consists of the determination results recorded in these columns in synchronization with the "time" column.
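One row of the action table could be modeled as in the following minimal sketch. The class name `ActionRecord`, the field names, and the lowercase string values are hypothetical choices made for illustration; the source specifies only the column headings and the values each column can take.

```python
from dataclasses import dataclass

# Hypothetical value set for the "gaze direction" and "forward-leaning posture" columns.
TARGETS = ("monitor", "robot", "other")


@dataclass(frozen=True)
class ActionRecord:
    """One row of the action table (field names are illustrative only)."""
    time: str             # timestamp from the RTC, e.g. "10:00:30"
    speech: bool          # True = "present", False = "absent"
    gaze: str             # one of TARGETS
    lean: str             # one of TARGETS
    nod: bool             # True = "present", False = "absent"
    partner_speech: bool  # True = "present", False = "absent"

    def __post_init__(self):
        # Reject values outside the three targets named in the table description.
        if self.gaze not in TARGETS or self.lean not in TARGETS:
            raise ValueError("gaze and lean must be 'monitor', 'robot' or 'other'")


record = ActionRecord("10:00:30", True, "monitor", "monitor", False, False)
```

A list of such records, one per second, would then stand in for the whole table.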

The value recorded in the "time" column is the date and time information output by the RTC 26a; for example, "10:00:30" represents 10 hours, 00 minutes, 30 seconds.

The "speech" column records the result of determining whether the user is speaking. If "present" is recorded in the "speech" column, the user is speaking; if "absent" is recorded, the user is not speaking. Whether speech is "present" or "absent" is determined from the audio level of the audio data collected by the microphone 20. For example, speech is determined to be "present" if the audio level is at or above a predetermined value and "absent" if it is below that value.
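The threshold rule above can be sketched as follows. The source does not say how the audio level is measured, so the RMS measure in `rms_level` is an assumption, and both function names are hypothetical.

```python
import math


def rms_level(samples):
    """Root-mean-square level of a block of audio samples (assumed level measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def judge_speech(level, threshold):
    """Return "present" when the level is at or above the threshold, else "absent"."""
    return "present" if level >= threshold else "absent"
```

For instance, `judge_speech(rms_level(block), threshold)` would be evaluated once per recording interval, with `threshold` tuned to the microphone and room.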

The "gaze direction" column records the object at which the user is gazing. "Monitor" indicates that the user is gazing at the monitor 16, "robot" indicates that the user is gazing at the robot 12, and "other" indicates that the user is gazing at something other than the robot 12 or the monitor 16. These values are determined by recognizing the user's face in the images captured by the abdominal camera 14 or the monitor camera 22.

FIG. 8A shows an example in which face recognition by the monitor camera 22 succeeds, FIG. 8B shows an example in which face recognition by the abdominal camera 14 succeeds, and FIG. 8C shows a state in which face recognition fails with both cameras.

Referring to FIG. 8A, the left image is from the abdominal camera 14 and the right image is from the monitor camera 22; both were captured at the same time. Here the user is gazing at the monitor camera 22. In the image from the monitor camera 22 the user's face appears frontally, so face recognition succeeds. In the image from the abdominal camera 14 the user's face appears tilted, so face recognition fails. The user's gaze direction is therefore determined to be "monitor".

Next, referring to FIG. 8B, the two images were, as in FIG. 8A, captured at the same time. Here the user is gazing at the abdominal camera 14, so face recognition succeeds in the image from the abdominal camera 14. In the image from the monitor camera 22, on the other hand, the user's face is partly cut off by the edge of the frame, so face recognition fails. The user's gaze direction is therefore determined to be "robot".

Finally, referring to FIG. 8C, the two images were, as in FIGS. 8A and 8B, captured at the same time. Here the user is looking down, so face recognition fails with both the abdominal camera 14 and the monitor camera 22. The user's gaze direction is therefore determined to be "other".
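The three cases of FIGS. 8A to 8C amount to a small decision rule, sketched below. The function name is hypothetical, and the source does not describe what happens when both cameras recognize a face; giving the monitor camera priority in that case is an assumption made only to keep the sketch total.

```python
def judge_gaze(monitor_face_found, abdomen_face_found):
    """Map the two cameras' face-recognition results to a gaze target."""
    if monitor_face_found:
        # FIG. 8A case; also used when both succeed (assumed priority).
        return "monitor"
    if abdomen_face_found:
        # FIG. 8B case: only the robot's abdominal camera sees a frontal face.
        return "robot"
    # FIG. 8C case: recognition failed with both cameras.
    return "other"
```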

Returning to FIG. 7, the "forward-leaning posture" column records the direction in which the user is leaning forward. As in the "gaze direction" column, "monitor", "robot", or "other" is recorded. These values are determined from the result of recognizing the user's face region in the images captured by the abdominal camera 14 or the monitor camera 22.

For example, in the images shown in FIG. 8A, the face region captured by the monitor camera 22 is at or above a threshold, so the forward-leaning posture is determined to be "monitor". In the images shown in FIG. 8B, the face region captured by the abdominal camera 14 is at or above the threshold, so the forward-leaning posture is determined to be "robot". In the images shown in FIG. 8C, face recognition fails with both the abdominal camera 14 and the monitor camera 22, so the face region is below the threshold in both images. The forward-leaning posture is therefore determined to be "other".
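A minimal sketch of this face-region rule follows. The function name, the use of `None` for a failed recognition, the single shared threshold, and the monitor-first priority when both regions are large are all assumptions; the source states only that a region at or above a threshold selects the corresponding camera's target.

```python
def judge_lean(monitor_face_area, abdomen_face_area, threshold):
    """Judge the forward-leaning direction from the detected face-region sizes.

    Each area is a size in pixels, or None when face recognition failed,
    which is treated as "below the threshold".
    """
    def large_enough(area):
        return area is not None and area >= threshold

    if large_enough(monitor_face_area):
        return "monitor"
    if large_enough(abdomen_face_area):
        return "robot"
    return "other"
```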

Returning to FIG. 7, the "nod" column records whether the user nodded. If "present" is recorded in the "nod" column, the user nodded; if "absent" is recorded, the user did not nod. Whether the user nodded is determined from the change in face position in the images captured by the abdominal camera 14 and the monitor camera 22. For example, the face position (the centroid of the face region) is recognized and recorded at fixed intervals. When the "nod" determination is made, a nod is determined to be "present" if the distance between the previous face position and the current face position is at or above a certain distance, and "absent" if that distance is below the certain distance. Alternatively, the user's nod may be detected from changes in acceleration by attaching a head acceleration sensor to the user.
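The displacement test can be sketched as below. The function name and the handling of a missing face position are assumptions; the source specifies only the comparison of the distance between consecutive face positions against a fixed distance.

```python
import math


def judge_nod(previous_position, current_position, min_distance):
    """Return "present" if the face centroid moved at least min_distance.

    Positions are (x, y) centroids of the face region, or None when no face
    was recognized; a missing position is treated as "absent" (assumption).
    """
    if previous_position is None or current_position is None:
        return "absent"
    moved = math.dist(previous_position, current_position)
    return "present" if moved >= min_distance else "absent"
```

Note that a plain distance test also fires on sideways head movement; the source describes only the distance comparison, so no direction check is added here.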

The "partner's speech" column records whether the partner user is speaking. If "present" is recorded in this column, the partner user is speaking; if "absent" is recorded, the partner user is not speaking. The value is recorded on the basis of the partner user's action data stored on the server 24. For example, if the action table shown in FIG. 7 corresponds to the user A, the action data corresponding to the user B is read from the server 24 and the presence or absence of the partner's speech is recorded.

The face recognition, face region recognition, and face position recognition processes described above use Haar-like features, but other features may be used. The action table shown in FIG. 7 is stored not only in the memory 30 of the PC 10 but also in the first user information DB 86 and the second user information DB 88 of the server 24.

In the PC 10, the user state is recognized every predetermined time (for example, every 10 seconds) from the action data recorded at fixed intervals. The user state includes "speech in the dialogue", "object of interest", and "concentration on the dialogue". These user states are recognized by a speech recognition process, an interest target recognition process, and a concentration recognition process, respectively.

The speech recognition process recognizes whether the user is in the "Talk" state, speaking as the talker, or in the "Listen" state, listening to what the partner user says. In the speech recognition process, the speech state (talk/listen) is recognized on the basis of the user's speaking time.
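As a rough illustration, the talk/listen decision could be made from the share of the recognition window in which the user's "speech" column is "present". The source says only that the user's speaking time is used; the ratio test, the default value of `min_ratio`, and the function name are assumptions.

```python
def recognize_speech_state(speech_flags, min_ratio=0.5):
    """Classify one recognition window as "talk" or "listen".

    speech_flags holds one boolean per recorded interval (True = speech present).
    min_ratio is a hypothetical threshold on the fraction of time spent speaking.
    """
    if not speech_flags:
        return "listen"
    talk_ratio = sum(speech_flags) / len(speech_flags)
    return "talk" if talk_ratio >= min_ratio else "listen"
```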

The interest target recognition process recognizes whether the user's interest lies in the "robot", the "monitor", or "other". In this process, the object in which the user is interested (robot, monitor, or other) is recognized on the basis of the user's gaze direction within the user's action data for the predetermined time.
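One simple way to turn a window of gaze-direction entries into a single object of interest is a majority vote, sketched below. The source does not state the aggregation rule, so the vote, the tie behavior, and the function name are assumptions.

```python
from collections import Counter


def recognize_interest(gaze_column):
    """Return the gaze target seen most often in the window (majority vote).

    gaze_column is a list of "monitor" / "robot" / "other" entries for one
    recognition window. On a tie, the target that appeared first wins,
    which is simply how Counter.most_common orders equal counts.
    """
    if not gaze_column:
        return "other"
    return Counter(gaze_column).most_common(1)[0][0]
```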

The concentration recognition process recognizes whether the user is in the "Active" state, actively participating in and concentrating on the dialogue, or in the "Passive" state, not actively participating in and not concentrating on the dialogue. In this embodiment, the concentration recognition process recognizes the concentration state (active/passive) by means of an SVM.

An SVM is a type of recognition technique (recognition model) that is constructed by learning a plurality of learning samples labeled in advance. In this embodiment, the learning samples are created in advance by the system administrator and stored in the memory 30. To create a learning sample, action data for a predetermined time is first cut out of the action table. Next, a feature quantity is extracted from the cut-out action data, and an "active" or "passive" label is assigned by hand. The learning sample is completed by associating the label with the extracted feature quantity. The SVM is constructed by learning the learning samples created in this way.

The learning samples are described with reference to FIGS. 9 to 11. FIG. 9 is an illustrative view representing the action table as a time-series graph. Referring to FIG. 9, the horizontal axis of the graph indicates time and the vertical axis indicates actions. The actions on the vertical axis correspond to the "speech", "gaze direction", "forward-leaning posture", "nod", and "partner's speech" columns of the action table. In the time-series graph, each action is expressed in one of two states: "occurring" or "not occurring". For example, "speech" is expressed as the user "speaking" or "not speaking". "Gaze direction" is split into "monitor", "robot", and "other", each expressed as "gazing" or "not gazing". The items shown in FIG. 9, namely "speech", "gaze (monitor)", "gaze (robot)", "gaze (other)", "posture (monitor)", "posture (robot)", "posture (other)", "nod", and "partner's speech", are the element actions that make up the extracted feature quantity.

For example, referring to FIGS. 10A and 10B, focusing on the three element actions "speech", "gaze (monitor)", and "nod" in FIG. 9 yields eight combinations (composite actions): "speech only", "gaze only", "nod only", "speech and gaze", "speech and nod", "gaze and nod", "speech, gaze, and nod", and "none". In this embodiment, the frequency with which these composite actions occur is the feature quantity of a learning sample. The frequency with which each composite action occurs is calculated on the basis of the sampling window described below.
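The eight composite actions are simply every on/off combination of the three element actions, which can be enumerated as in this short sketch. The function name and the representation of a composite action as a tuple of the element actions that are "on" are illustrative choices.

```python
from itertools import product


def composite_actions(elements):
    """Return every on/off combination of the given element actions.

    Each composite action is a tuple of the element actions that are "on";
    the empty tuple corresponds to "none".
    """
    combos = []
    for flags in product((False, True), repeat=len(elements)):
        combos.append(tuple(e for e, on in zip(elements, flags) if on))
    return combos


patterns = composite_actions(["speech", "gaze (monitor)", "nod"])
```

With m element actions the list has 2^m entries, so the nine element actions of FIG. 9 would give 512 composite actions.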

FIG. 11A is an illustrative view showing labeled action data, FIG. 11B is an illustrative view showing sampled action data, and FIG. 11C is an illustrative view showing the frequency with which the composite actions occur. Referring to FIG. 11A, the system administrator attaches "active" and "passive" labels to the time-series graph shown in FIG. 10A. Learning samples are then sampled from the action data labeled "active" and "passive" according to the window width of the sampling window.

For example, with a sampling window having a window width of 10 seconds, a first learning sample taken from the action data labeled "active" and a second learning sample taken from the action data labeled "passive" can be represented as in FIG. 11B. Calculating the frequency with which each composite action occurs (the occurrence frequency) for the first and second learning samples gives the graph of FIG. 11C.

Referring to FIGS. 11B and 11C, if "gaze only" occurred for 2.5 seconds in the first learning sample, the occurrence frequency of "gaze only" is 0.25 (= 2.5 / 10). If "speech and gaze" occurred for 5.8 seconds, the occurrence frequency of "speech and gaze" is 0.58. If "speech, gaze, and nod" occurred for 1.7 seconds, the occurrence frequency of "speech, gaze, and nod" is 0.17. If "speech only", "nod only", "speech and nod", "gaze and nod", and "none" each occurred for 0 seconds, the occurrence frequency of each of those composite actions is 0.
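The worked figures above can be reproduced with the sketch below, which counts how often each composite action appears among equally spaced samples in one window. The function name and the 0.1-second sample spacing (chosen so that 2.5 s, 5.8 s, and 1.7 s are whole numbers of samples) are assumptions for illustration; the action table itself is described as being recorded about once per second.

```python
from collections import Counter


def occurrence_frequencies(window_samples):
    """Occurrence frequency of each composite action in one sampling window.

    window_samples holds one composite action per equally spaced sample, so the
    share of samples equals (time the action occurred) / (window width).
    """
    total = len(window_samples)
    if total == 0:
        return {}
    counts = Counter(window_samples)
    return {pattern: n / total for pattern, n in counts.items()}


GAZE_ONLY = frozenset({"gaze"})
SPEECH_GAZE = frozenset({"speech", "gaze"})
SPEECH_GAZE_NOD = frozenset({"speech", "gaze", "nod"})

# A 10-second window sampled every 0.1 s (assumed spacing): 25 + 58 + 17 samples.
window = [GAZE_ONLY] * 25 + [SPEECH_GAZE] * 58 + [SPEECH_GAZE_NOD] * 17
freqs = occurrence_frequencies(window)
```

Composite actions that never occur are absent from the returned dictionary and can be read as frequency 0.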

Here, let $M$ be the number of types of composite action, $pat$ a composite action, $d$ the occurrence frequency of a composite action, $t_{pat}$ the time for which the composite action occurred, and $t_s$ the window width. The feature vector (action frequency) $D$ representing the feature quantity can then be expressed as in Equation 1. If the number of types of element action is $m$, the number of composite actions is $M = 2^m$.

[Equation 1]
$$D = [d_1, d_2, \ldots, d_M], \qquad d_{pat} = \frac{t_{pat}}{t_s} \quad (pat = 1, \ldots, M)$$

Further, when learning samples are acquired from $K$ users ($k = 1, \ldots, K$), the $N_k$ feature vectors obtained from the action data of user $k$ can be written $D^{(k)}_1, \ldots, D^{(k)}_{N_k}$. The action frequency $\bar{D}^{(k)}$ for user $k$ is then given by Equation 2, and the action frequency $\bar{D}$ of all the learning samples learned by the SVM is given by Equation 3.

Note that where "Nk" appears as a subscript, for convenience of notation "k" is not written as a subscript of "N" outside the equations; the two are written together as a single subscript. Likewise, outside the equations an overbar is written as a superscript.

Because the occurrence frequencies of the composite actions differ from user to user, each occurrence frequency is normalized. To normalize, each occurrence frequency $d^{(k)}_i$ of user $k$ is multiplied by the ratio of the corresponding occurrence frequency $\bar{d}_i$ in the action frequency $\bar{D}$ of all the learning samples to the corresponding occurrence frequency $\bar{d}^{(k)}_i$ in the action frequency $\bar{D}^{(k)}$ of user $k$; that is, the normalized value is $d^{(k)}_i \cdot \bar{d}_i / \bar{d}^{(k)}_i$. The action frequency $\hat{D}^{(k)}$ of user $k$, composed of the normalized occurrence frequencies $\hat{d}^{(k)}_i$, is obtained according to Equation 4. A normalized learning sample is called a general learning sample.
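The normalization can be sketched in plain Python as follows. The sketch assumes that the per-user and overall action frequencies are arithmetic means of the feature vectors, which the surviving text does not confirm; the function names and the handling of a zero user mean are likewise assumptions.

```python
def mean_vector(vectors):
    """Element-wise arithmetic mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def normalize_samples(samples_by_user):
    """Scale each user's feature vectors by (overall mean) / (that user's mean).

    samples_by_user maps a user id to a list of feature vectors. Where a user's
    mean for a composite action is 0, every sample value is 0 as well, so the
    ratio is set to 0 to avoid dividing by zero (assumed handling).
    """
    all_vectors = [v for vectors in samples_by_user.values() for v in vectors]
    overall_mean = mean_vector(all_vectors)
    normalized = {}
    for user, vectors in samples_by_user.items():
        user_mean = mean_vector(vectors)
        ratio = [g / u if u > 0 else 0.0 for g, u in zip(overall_mean, user_mean)]
        normalized[user] = [[d * r for d, r in zip(v, ratio)] for v in vectors]
    return normalized


samples = {
    "A": [[0.2, 0.8], [0.4, 0.6]],
    "B": [[0.6, 0.4], [0.8, 0.2]],
}
general_samples = normalize_samples(samples)
```

After this scaling every user's mean vector coincides with the overall mean, which is what removes the per-user bias before the SVM is trained.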

Note that although the symbol "^" is placed above "D" in the equations, for convenience of notation it is written as a superscript of "D" outside the equations.

The general learning samples, with normalized action frequency $\hat{D}^{(k)}$, are then learned by the SVM. The trained SVM sets a boundary (hyperplane) that separates the learned general learning samples. A general SVM that has learned all of the general learning samples can be represented as in FIG. 12.

Referring to FIG. 12, the general learning samples learned by the SVM are arranged on a plane. Among them, learning samples labeled "active" are shown as circles, and learning samples labeled "passive" are shown as triangles. The boundary is set at a position that separates the general learning samples labeled "active" from those labeled "passive". When recognition is performed and a recognition sample is input to this SVM, the concentration state is recognized on the basis of the boundary: a recognition sample that falls to the left of the boundary is recognized as the passive state, and one that falls to the right is recognized as the active state.
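Deciding which side of the boundary a recognition sample falls on can be sketched as a signed score, as below. The sketch assumes a linear boundary with already trained parameters; the source does not state the kernel, and the names `weights` and `bias` and the example values are hypothetical.

```python
def classify_concentration(sample, weights, bias):
    """Return "active" or "passive" according to the side of a linear boundary.

    The boundary is the set of points where sum(w * x) + bias == 0. A score
    at or above zero is read as the "active" side (assumed convention).
    """
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    return "active" if score >= 0 else "passive"


# Hypothetical trained parameters for a two-dimensional feature vector.
weights, bias = [1.0, 0.0], -0.5
```

With these example values the boundary is the vertical line at 0.5 on the first axis, so samples to its right are classified as active, matching the left/right reading of FIG. 12.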

ここで、一般ＳＶＭによってユーザの集中状態を認識した場合、ユーザによっては特定の要素行動の発生頻度が少ないことがあるため、そのユーザがアクティブ状態であったとしても、パッシブ状態と認識されることがある。そこで、第１実施例では、認識対象ユーザの行動データを利用し、一般ＳＶＭを個人化して、ユーザ状態を認識する。 Here, when the user's concentration state is recognized by the general SVM, the occurrence frequency of a specific element action may be low for some users, so that such a user may be recognized as being in the passive state even when actually in the active state. Therefore, in the first embodiment, the action data of the recognition target user is used to personalize the general SVM, and the user state is recognized with the personalized SVM.

一般ＳＶＭを個人化する手法としては、まず、認識対象のユーザの行動データから個人学習サンプルをサンプリングし、要素行動の発生頻度を求める。次に、各要素行動の発生頻度から複合行動の発生頻度を推定すると共に、一般学習サンプルにおける各要素行動から複合行動の発生頻度も推定する。さらに、個人学習サンプルから推定された複合行動の発生頻度および一般学習サンプルから推定された複合行動の発生頻度に基づいて一般学習サンプルを個人化する。そして、個人化された一般学習サンプルを学習したＳＶＭを再構築することで、一般ＳＶＭを個人化することができる。以下、これらの処理について具体的に説明する。 As a method for personalizing a general SVM, first, an individual learning sample is sampled from the behavior data of a user to be recognized, and the occurrence frequency of elemental behavior is obtained. Next, the occurrence frequency of the complex action is estimated from the occurrence frequency of each element action, and the occurrence frequency of the complex action is also estimated from each element action in the general learning sample. Furthermore, the general learning sample is personalized based on the occurrence frequency of the complex behavior estimated from the individual learning sample and the occurrence frequency of the complex behavior estimated from the general learning sample. The general SVM can be personalized by reconstructing the SVM learned from the personalized general learning sample. Hereinafter, these processes will be specifically described.

まず、状態を認識するユーザの行動データから、個人学習サンプルをサンプリングする。次に、個人学習サンプルから、各要素行動の発生頻度を算出する。 First, an individual learning sample is sampled from the action data of the user whose state is to be recognized. Next, the occurrence frequency of each element action is calculated from the individual learning sample.

たとえば、図１３（Ａ），（Ｂ）を参照して、ウィンドウ幅が１０秒の場合、個人学習サンプルで「発話している」と記録された総時間が７．５秒であれば、「発話」の行動頻度は「０．７５（＝７．５／１０）」となる。同様に、「モニタを注視している」と記録された総時間が３．３秒であれば「注視」の行動頻度は「０．３３」となり、「頷いている」と記録された総時間が１．７秒であれば「頷き」の行動頻度は「０．１７（＝１．７／１０）」となる。 For example, referring to FIGS. 13(A) and 13(B), when the window width is 10 seconds and the total time recorded as “speaking” in the individual learning sample is 7.5 seconds, the action frequency of “utterance” is “0.75 (= 7.5 / 10)”. Similarly, if the total time recorded as “gazing at the monitor” is 3.3 seconds, the action frequency of “gaze” is “0.33”, and if the total time recorded as “nodding” is 1.7 seconds, the action frequency of “nod” is “0.17 (= 1.7 / 10)”.

ここで、要素行動をＰｒｉ、要素行動が発生した時間（記録された総時間）をｔ_ｐｒｉと表す場合に、要素行動の発生頻度ｐは数５のように示すことができる。 Here, when the element action is represented as Pri and the time (total recorded time) when the element action occurs is represented as t _pri , the occurrence frequency p of the element action can be expressed as shown in Equation 5.

［数５］ [Equation 5]
ｐ＝ｔ_ｐｒｉ／ｔ_ｓ $p = t_{pri} / t_{s}$

そして、複合行動が要素行動の組み合わせであることに着目し、認識対象ユーザの行動頻度を、各要素行動の期待値から推定する。ここで、個人学習サンプルから推定された行動頻度をＱｐとしたとき、推定行動頻度Ｑｐは数６に従う数式に基づいて算出される。 Then, focusing on the fact that a composite action is a combination of element actions, the action frequency of the recognition target user is estimated from the expected value of each element action. Here, when the action frequency estimated from the individual learning sample is denoted by Qp, the estimated action frequency Qp is calculated based on the formula according to Equation 6.

たとえば、図１４を参照して、「発話と注視と頷き」の行動頻度は、各要素行動の期待値、つまり「０．０４２（＝０．７５×０．３３×０．１７）」となる。また、「発話のみ」の行動頻度は、「発話」の行動頻度と「注視」および「頷き」が起きない期待値、つまり「０．４１７（＝０．７５×０．６７×０．８３）」となる。そして、複合行動毎に算出された期待値（行動頻度）から推定行動頻度Ｑｐが構成される。 For example, referring to FIG. 14, the action frequency of “utterance, gaze, and nod” is the expected value of the element actions, that is, “0.042 (= 0.75 × 0.33 × 0.17)”. The action frequency of “utterance only” is the product of the action frequency of “utterance” and the expected values that “gaze” and “nod” do not occur, that is, “0.417 (= 0.75 × 0.67 × 0.83)”. The estimated action frequency Qp is composed of the expected values (action frequencies) calculated for each composite action.
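The calculation of Equation 5 and the estimation of the composite action frequencies can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the code assumes, as the products in the worked example suggest, that the element actions are treated as occurring independently of one another.

```python
from itertools import product
from typing import Dict, FrozenSet


def element_frequency(recorded_seconds: float, window_seconds: float) -> float:
    """Equation 5: p = t_pri / t_s (time recorded divided by window width)."""
    return recorded_seconds / window_seconds


def estimate_composite_frequencies(p: Dict[str, float]) -> Dict[FrozenSet[str], float]:
    """Estimate the frequency of every combination of element actions.

    Each composite action is keyed by the set of element actions that occur.
    An element that occurs contributes p, one that does not contributes 1 - p.
    """
    names = list(p)
    estimated = {}
    for flags in product([True, False], repeat=len(names)):
        value = 1.0
        for name, occurs in zip(names, flags):
            value *= p[name] if occurs else 1.0 - p[name]
        estimated[frozenset(n for n, occurs in zip(names, flags) if occurs)] = value
    return estimated


p = {
    "utterance": element_frequency(7.5, 10.0),
    "gaze": element_frequency(3.3, 10.0),
    "nod": element_frequency(1.7, 10.0),
}
qp = estimate_composite_frequencies(p)
print(round(qp[frozenset({"utterance", "gaze", "nod"})], 3))
print(round(qp[frozenset({"utterance"})], 3))
```

With three element actions the result contains eight composite actions, including the empty set that corresponds to “none”, and the eight values sum to 1.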

また、一般学習サンプルから推定された行動頻度をＱｎとしたとき、推定行動頻度Ｑｎは数７に従う式に基づいて算出される。 Further, assuming that the behavior frequency estimated from the general learning sample is Qn, the estimated behavior frequency Qn is calculated based on an equation according to Equation 7.

そして、個人化された一般学習サンプルの行動頻度Ｄｐ^＾は、数８に従う数式に基づいて算出される。 And the action frequency Dp ^{^} of the personalized general learning sample is calculated based on the mathematical formula according to Equation 8.

図１５（Ａ）は一般学習サンプルから推定された推定行動頻度Ｑｎを構成する、複数の期待値（発生頻度）を示すテーブルであり、図１５（Ｂ）は個人学習サンプルから推定された推定行動頻度Ｑｐを構成する複数の期待値（発生頻度）を示すテーブルである。また、図１５（Ｃ）は一般学習サンプルの行動頻度Ｄ^＾を構成する複数の発生頻度を示すテーブルである。そして、図１５（Ｄ）は個人化された一般学習サンプルの行動頻度Ｄｐ^＾を構成する複数の発生頻度を示すテーブルである。 FIG. 15(A) is a table showing the plurality of expected values (occurrence frequencies) constituting the estimated action frequency Qn estimated from the general learning samples, and FIG. 15(B) is a table showing the plurality of expected values (occurrence frequencies) constituting the estimated action frequency Qp estimated from the individual learning sample. FIG. 15(C) is a table showing the plurality of occurrence frequencies constituting the action frequency D ^{^} of a general learning sample. FIG. 15(D) is a table showing the plurality of occurrence frequencies constituting the action frequency Dp ^{^} of the personalized general learning sample.

たとえば、図１５（Ａ）−図１５（Ｄ）を参照して、複合行動の「発話のみ」に着目すると、行動頻度Ｄｐ^＾の「発話のみ」の発生頻度「０．５００」は、行動頻度Ｑｎの発生頻度「０．２０８」に対する行動頻度Ｑｐの発生頻度「０．４１７」の割合と、一般学習サンプルの行動頻度の発生頻度「０．２５０」との積（０．４１７／０．２０８×０．２５０）となる。そして、図１５（Ｄ）のテーブルにおいて、「注視のみ」、「頷きのみ」、「発話と注視」、「発話と頷き」、「注視と頷き」、「発話と注視と頷き」および「全部なし」の発生頻度も、上記したの「発話のみ」の発生頻度と同様に算出された結果である。そして、図１５（Ｄ）で示される複数の発生頻度によって、個人化された一般学習サンプルの行動頻度Ｄ^＾ｐが構成される。 For example, referring to FIGS. 15(A) to 15(D) and focusing on the composite action “utterance only”, the occurrence frequency “0.500” of “utterance only” in the action frequency Dp ^{^} is the product of the ratio of the occurrence frequency “0.417” in the action frequency Qp to the occurrence frequency “0.208” in the action frequency Qn and the occurrence frequency “0.250” in the action frequency of the general learning sample (0.417 / 0.208 × 0.250). In the table of FIG. 15(D), the occurrence frequencies of “gaze only”, “nod only”, “utterance and gaze”, “utterance and nod”, “gaze and nod”, “utterance, gaze, and nod”, and “none” are calculated in the same manner as the occurrence frequency of “utterance only” described above. The action frequency Dp ^{^} of the personalized general learning sample is composed of the plurality of occurrence frequencies shown in FIG. 15(D).
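The reweighting described in this example can be sketched as follows. The function name `personalize` and the dictionary layout are hypothetical, and leaving a value unchanged when the corresponding Qn entry is zero is an assumption made here to avoid division by zero; the description does not state how that case is handled.

```python
from typing import Dict


def personalize(d_general: Dict[str, float],
                qp: Dict[str, float],
                qn: Dict[str, float]) -> Dict[str, float]:
    """Scale each occurrence frequency of a general learning sample by Qp / Qn."""
    personalized = {}
    for action, frequency in d_general.items():
        if qn[action] == 0.0:
            # Assumption: keep the original value when the ratio is undefined.
            personalized[action] = frequency
        else:
            personalized[action] = frequency * qp[action] / qn[action]
    return personalized


dp = personalize({"utterance only": 0.250},
                 {"utterance only": 0.417},
                 {"utterance only": 0.208})
print(dp["utterance only"])  # close to the 0.500 given in the example
```

Because the inputs in the example are rounded to three digits, the computed value agrees with “0.500” only approximately.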

さらに、全ての一般学習サンプルが個人化されると、ＳＶＭが再構築される。つまり、個人化された一般学習サンプルを学習した、個人化ＳＶＭが新たに構築される。そして、集中認識処理では、その個人化ＳＶＭによってユーザの集中状態が認識されるため、ユーザの集中状態が正しく認識される。 Furthermore, when all the general learning samples have been personalized, the SVM is reconstructed. In other words, a personalized SVM that has learned the personalized general learning samples is newly constructed. In the concentration recognition process, the user's concentration state is recognized by this personalized SVM, so that the user's concentration state is recognized correctly.

たとえば、図１６（Ａ）を参照して、「注視」の発生頻度が低いユーザの場合、そのユーザからサンプリングした認識サンプルは、一般ＳＶＭによって認識された場合、会話に集中していたとしてもパッシブ（非集中）状態と誤認識されることがある。ところが、一般学習サンプルの重みを、上記ユーザの行動データからサンプリングされた個人学習サンプルを利用して調整した場合、個人化ＳＶＭの境界線は、一般ＳＶＭの境界線に対して異なる位置に設定される。そのため、上記ユーザの認識サンプルは、アクティブ状態と認識されるようになる。 For example, referring to FIG. 16(A), for a user whose occurrence frequency of “gaze” is low, a recognition sample sampled from that user may be misrecognized by the general SVM as the passive (non-concentrated) state even if the user was concentrating on the conversation. However, when the weights of the general learning samples are adjusted using the individual learning samples sampled from that user's action data, the boundary line of the personalized SVM is set at a position different from that of the general SVM. As a result, the recognition sample of that user comes to be recognized as the active state.

このように、一般学習サンプルの行動頻度Ｄ^＾ｐを構成する発生頻度の重みを調整することで、ＳＶＭの境界線を個人の行動に適した位置に変化させることができる。 In this way, by adjusting the weight of the occurrence frequency constituting the action frequency D ^{^} p of the general learning sample, the boundary line of the SVM can be changed to a position suitable for the individual action.

なお、上記説明では、説明の簡単のために３つの要素行動だけで説明したが、実際には９つの要素行動を利用して学習や認識を行う。 In the above description, only three elemental actions have been described for the sake of simplicity, but in practice, learning and recognition are performed using nine elemental actions.

また、本実施例では、個人学習サンプルを所定時間毎に蓄積し、蓄積した個人学習サンプルによって個人化ＳＶＭを逐次更新することで、認識の精度を高める。具体的には、まず、個人化ＳＶＭを更新する毎に、蓄積された全ての個人学習サンプルを認識する。次に、個人学習サンプルに集中状態の認識結果をラベル付けし、アクティブ状態と認識された個人学習サンプルの数を、個人化ＳＶＭの認識度として記録する。そして、個人化ＳＶＭを更新する度に、その認識度における前回と今回との差が閾値（所定値）以下となるまで、個人化ＳＶＭの更新を繰り返す。 In the present embodiment, individual learning samples are accumulated every predetermined time, and the personalized SVM is sequentially updated with the accumulated individual learning samples, thereby improving the recognition accuracy. Specifically, each time the personalized SVM is updated, all the accumulated individual learning samples are first recognized. Next, each individual learning sample is labeled with the recognition result of the concentration state, and the number of individual learning samples recognized as the active state is recorded as the recognition degree of the personalized SVM. The update of the personalized SVM is then repeated until the difference between the previous recognition degree and the current recognition degree becomes equal to or less than a threshold value (predetermined value).
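The update loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `train` and `is_active` are hypothetical callbacks standing in for the construction of the personalized SVM and for the recognition of one sample, and `batches` stands in for the individual learning samples that arrive every predetermined time.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Sample = Sequence[float]


def update_until_stable(batches: Iterable[List[Sample]],
                        train: Callable[[List[Sample]], object],
                        is_active: Callable[[object, Sample], bool],
                        threshold: int) -> Tuple[object, List[int]]:
    """Rebuild the model per batch until the recognition degree settles.

    The recognition degree is the number of accumulated samples that the
    current model recognizes as active. The loop stops when the difference
    between two consecutive degrees is at or below `threshold`.
    """
    accumulated: List[Sample] = []
    history: List[int] = []
    model = None
    for batch in batches:
        accumulated.extend(batch)
        model = train(accumulated)
        degree = sum(1 for sample in accumulated if is_active(model, sample))
        history.append(degree)
        if len(history) >= 2 and abs(history[-1] - history[-2]) <= threshold:
            break
    return model, history


# Toy stand-ins: the "model" is a fixed cut-off on the first feature.
model, history = update_until_stable(
    batches=[[[0.9], [0.1]], [[0.8]], [[0.2]], [[0.7]]],
    train=lambda samples: 0.5,
    is_active=lambda cutoff, sample: sample[0] >= cutoff,
    threshold=0,
)
print(history)
```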

したがって、集中状態の認識精度は、ユーザが傾聴対話持続システム１００を利用する時間に比例して向上する。また、蓄積する個人化学習サンプルを集中状態の認識に利用することで、ユーザの集中状態を認識しつつ、集中状態の認識精度も向上させることができる。さらに、個人化ＳＶＭの認識精度が一定の精度となれば、プロセッサ２６は学習処理を終了させることができる。 Therefore, the recognition accuracy of the concentration state is improved in proportion to the time for which the user uses the listening dialog sustaining system 100. Further, by using the stored personalized learning samples for the recognition of the concentration state, it is possible to improve the recognition accuracy of the concentration state while recognizing the user's concentration state. Furthermore, if the recognition accuracy of the personalized SVM becomes a certain accuracy, the processor 26 can end the learning process.

なお、傾聴対話持続システム１００では、認識されたユーザの状態や、相手ユーザの状態データに基づいて、２人の対話が持続するように、ロボット１２が動作する。そして、ロボット１２は、ユーザＡとユーザＢとの対話に対して、「疑似傾聴動作」、「発話制御動作」および「注意の引きつけの動作」の３種類の動作を行い、対話を持続させる。 In the listening dialogue sustaining system 100, the robot 12 operates so that the dialogue between the two people is continued based on the recognized user status and the status data of the other user. Then, the robot 12 performs three types of operations, “pseudo-listening operation”, “speech control operation”, and “attention attracting operation”, for the dialogue between the user A and the user B, and continues the dialogue.

たとえば、疑似傾聴動作とは、ユーザＡとユーザＢとが積極的に対話している場合には、どちらか一方の発話を傾聴しているかのように振る舞う動作のことである。さらに、発話制御動作とは、どちらかのユーザが一方的に話している場合に、２人の発話のバランスを取るため、ユーザを見ることで発話を抑制したり、ユーザに話しかけたりすることで発話を促進したりする動作のことである。そして、注意の誘導や引きつけの動作は、ユーザが対話に対して集中していない場合に、ユーザに話しかけることでユーザの注意を引きつける動作のことである。 For example, the pseudo-listening operation is an operation in which, when the user A and the user B are actively conversing, the robot behaves as if it were listening to the utterance of one of them. The utterance control operation is an operation in which, when one of the users is speaking one-sidedly, the robot balances the utterances of the two users by looking at a user to suppress utterance or by talking to a user to promote utterance. The operation of guiding or attracting attention is an operation in which, when a user is not concentrating on the dialogue, the robot attracts the user's attention by talking to the user.

図１７は図２に示すＰＣ１０におけるメモリ３０のメモリマップ３００の一例を示す図解図である。図１７に示すようにメモリ３０はプログラム記憶領域３０２およびデータ記憶領域３０４を含む。プログラム記憶領域３０２には、ＰＣ１０を動作させるためのプログラムとして、データ通信プログラム３１２、行動判定プログラム３１４、サンプル蓄積プログラム３１６、学習プログラム３１８、認識プログラム３２０、集中認識プログラム３２２およびロボット制御プログラム３２４などが記憶される。 FIG. 17 is an illustrative view showing one example of a memory map 300 of the memory 30 in the PC 10 shown in FIG. 2. As shown in FIG. 17, the memory 30 includes a program storage area 302 and a data storage area 304. The program storage area 302 stores, as programs for operating the PC 10, a data communication program 312, an action determination program 314, a sample accumulation program 316, a learning program 318, a recognition program 320, a concentration recognition program 322, a robot control program 324, and the like.

データ通信プログラム３１２は、サーバ２４とデータ通信を行うためのプログラムである。行動判定プログラム３１４は、ユーザの行動を判定するためのプログラムである。サンプル蓄積プログラム３１６は、ユーザ行動から個人学習サンプルを所定時間毎に蓄積するためのプログラムである。なお、蓄積された個人学習サンプルは、認識サンプルとして読み出されることもある。 The data communication program 312 is a program for performing data communication with the server 24. The behavior determination program 314 is a program for determining a user's behavior. The sample accumulation program 316 is a program for accumulating individual learning samples from user behavior every predetermined time. The stored personal learning sample may be read as a recognition sample.

学習プログラム３１８は、個人化ＳＶＭを構築するためのプログラムである。認識プログラム３２０は、認識サンプルを読み出し、その認識サンプルから集中状態、発話状態および興味対象を認識するためのプログラムである。集中認識プログラム３２２は、学習プログラムによって構築された個人化ＳＶＭによってユーザの集中状態を認識するためのプログラムである。ロボット制御プログラム３２４は、ロボット１２の動作を決定するためのプログラムである。 The learning program 318 is a program for constructing the personalized SVM. The recognition program 320 is a program for reading a recognition sample and recognizing the concentration state, the utterance state, and the object of interest from the recognition sample. The concentration recognition program 322 is a program for recognizing the user's concentration state with the personalized SVM constructed by the learning program. The robot control program 324 is a program for determining the operation of the robot 12.

なお、図示は省略するが、ＰＣ１０を動作させるためのプログラムには、テレビ電話機能を実現するためのプログラムなどが含まれる。 Although not shown, the program for operating the PC 10 includes a program for realizing a videophone function.

また、図１８を参照して、データ記憶領域３０４には、時刻バッファ３３０、モニタカメラバッファ３３２、腹部カメラバッファ３３４、音声バッファ３３６、判定結果バッファ３３８、顔位置バッファ３４０，データ通信バッファ３４２およびＳＶＭ認識バッファ３４４が設けられる。また、データ記憶領域３０４には、一般学習データ３４６、行動テーブルデータ３４８、個人学習データ３５０および状態データ３５２が記憶されると共に、集中フラグ３５４および所定時間カウンタ３５６がさらに設けられる。 Referring to FIG. 18, data storage area 304 includes time buffer 330, monitor camera buffer 332, abdominal camera buffer 334, audio buffer 336, determination result buffer 338, face position buffer 340, data communication buffer 342, and SVM. A recognition buffer 344 is provided. The data storage area 304 stores general learning data 346, action table data 348, personal learning data 350, and state data 352, and a concentration flag 354 and a predetermined time counter 356 are further provided.

時刻バッファ３３０は、ＲＴＣ２６ａが出力する日時情報が一時的に記憶されるバッファである。モニタカメラバッファ３３２は、モニタカメラ２２によって撮影された画像が一時的に記憶されるバッファである。腹部カメラバッファ３３４は、腹部カメラ１４によって撮影された画像が一時的に記憶されるバッファである。音声バッファ３３６は、マイク２０によって集音された音声が一時的に記憶されるバッファである。 The time buffer 330 is a buffer in which date / time information output from the RTC 26a is temporarily stored. The monitor camera buffer 332 is a buffer in which an image taken by the monitor camera 22 is temporarily stored. The abdominal camera buffer 334 is a buffer in which an image taken by the abdominal camera 14 is temporarily stored. The audio buffer 336 is a buffer in which audio collected by the microphone 20 is temporarily stored.

判定結果バッファ３３８は、ユーザの発話の有無を判定する発話判定、ユーザの注視方向を判定する注視方向判定、ユーザの前傾姿勢の方向判定する前傾姿勢判定、ユーザの頷きの有無を判定する頷き判定および相手ユーザの発話の有無を判定する相手の発話判定それぞれの判定結果を一時的に記憶するためのバッファである。顔位置バッファ３４０は、撮影されたユーザの画像において、ユーザの顔の位置が記憶されるバッファである。また、顔位置バッファ３４０に記憶された位置データに基づいて、ユーザの頷きが判定される。 The determination result buffer 338 is a buffer for temporarily storing the results of the utterance determination for determining the presence or absence of the user's utterance, the gaze direction determination for determining the user's gaze direction, the forward tilt posture determination for determining the direction of the user's forward tilt posture, the nod determination for determining the presence or absence of the user's nod, and the partner's utterance determination for determining the presence or absence of the partner user's utterance. The face position buffer 340 is a buffer in which the position of the user's face in the captured image of the user is stored. The user's nod is determined based on the position data stored in the face position buffer 340.

データ通信バッファ３４２は、サーバ２４とのデータ通信によって得られた相手の行動データや、状態データなどが一時的に記憶されるバッファである。ＳＶＭ認識バッファ３４４は、一般ＳＶＭおよび個人化ＳＶＭの認識度が一時的に記憶されるバッファである。 The data communication buffer 342 is a buffer that temporarily stores the other party's action data, state data, and the like obtained by data communication with the server 24. The SVM recognition buffer 344 is a buffer that temporarily stores the recognition degrees of the general SVM and the personalized SVM.

一般学習データ３４６は、複数の一般学習サンプルから構成されるデータである。行動テーブルデータ３４８は、図７に示す行動テーブルであり、一定時間毎に最新の行動データが追記される。個人学習データ３５０は、複数の個人学習サンプルから構成されるデータであり、所定時間毎に最新の個人学習サンプルが追加される。状態データ３５２は、ユーザの集中状態、発話状態および興味対象が認識された結果を示すデータである。 The general learning data 346 is data composed of a plurality of general learning samples. The behavior table data 348 is the behavior table shown in FIG. 7, and the latest behavior data is added every certain time. The personal learning data 350 is data composed of a plurality of personal learning samples, and the latest personal learning sample is added every predetermined time. The state data 352 is data indicating the result of recognizing the user's concentration state, speech state, and object of interest.

集中フラグ３５４は、集中状態の認識結果を示すフラグである。たとえば集中フラグ３５４は１ビットのレジスタで構成される。集中フラグ３５４がオン（成立）されると、レジスタにはデータ値「１」が設定される。一方、集中フラグ３５４がオフ（不成立）されると、レジスタにはデータ値「０」が設定される。また、集中フラグ３５４は、アクティブ状態と認識されるとオンになり、パッシブ状態と認識されるとオフになる。 The concentration flag 354 is a flag indicating the recognition result of the concentration state. For example, the concentration flag 354 is composed of a 1-bit register. When the concentration flag 354 is turned on (established), a data value “1” is set in the register. On the other hand, when the concentration flag 354 is turned off (not established), a data value “0” is set in the register. The concentration flag 354 is turned on when recognized as an active state and turned off when recognized as a passive state.

所定時間カウンタ３５６は、所定時間を計測するためのカウンタであり、初期化されるとカウントを開始する。また、所定時間カウンタ３５６は所定時間タイマとも呼ばれ、所定時間タイマによって時間を計測する処理が実行されると、所定時間カウンタ３５６は初期化される。たとえば、所定時間カウンタ３５６は、ＰＣ１０の電源がオンにされると初期化され、個人学習サンプルがサンプリングされる毎にリセットされる。 The predetermined time counter 356 is a counter for measuring a predetermined time, and starts counting when initialized. The predetermined time counter 356 is also called a predetermined time timer, and is initialized when processing for measuring time with the predetermined time timer is executed. For example, the predetermined time counter 356 is initialized when the PC 10 is turned on, and is reset every time an individual learning sample is sampled.

なお、図示は省略するが、データ記憶領域３０４には、様々な計算の結果を一時的に格納するバッファなどが設けられると共に、ＰＣ１０の動作に必要な他のカウンタやフラグなども設けられる。 Although not shown, the data storage area 304 is provided with a buffer for temporarily storing various calculation results and other counters and flags necessary for the operation of the PC 10.

図１９は行動判定プログラム３１４のサブルーチンに対応するプログラムを示す図解図である。図１９を参照して、行動判定プログラム３１４は、画像／音声取得プログラム３１４ａ、発話判定プログラム３１４ｂ、注視方向判定プログラム３１４ｃ、前傾姿勢判定プログラム３１４ｄ、頷き判定プログラム３１４ｅ、相手の発話判定プログラム３１４ｆおよび同期プログラム３１４ｇから構成される。 FIG. 19 is an illustrative view showing the programs corresponding to the subroutines of the action determination program 314. Referring to FIG. 19, the action determination program 314 includes an image/sound acquisition program 314a, an utterance determination program 314b, a gaze direction determination program 314c, a forward tilt posture determination program 314d, a nod determination program 314e, a partner's utterance determination program 314f, and a synchronization program 314g.

画像／音声取得プログラム３１４ａは、モニタカメラ２２および腹部カメラ１４によって撮影された画像と、マイク２０によって集音された音声とをバッファに取り込むためのプログラムである。発話判定プログラム３１４ｂは、ユーザが発話しているか否かを判定するためのプログラムである。注視方向判定プログラム３１４ｃは、ユーザが注視している方向を判定するためのプログラムである。前傾姿勢判定プログラム３１４ｄは、ユーザが前傾姿勢を取っている方向を判定するためのプログラムである。頷き判定プログラム３１４ｅは、ユーザが頷いたか否かを判定するためのプログラムである。相手の発話判定プログラム３１４ｆは、相手ユーザが発話しているか否かを判定するためのプログラムである。同期プログラム３１４ｇは、発話判定結果、注視方向判定結果、前傾姿勢判定結果、頷き判定結果および相手ユーザの発話判定結果を同期して、行動データとするためのプログラムである。 The image/sound acquisition program 314a is a program for loading the images captured by the monitor camera 22 and the abdominal camera 14 and the sound collected by the microphone 20 into the buffers. The utterance determination program 314b is a program for determining whether or not the user is speaking. The gaze direction determination program 314c is a program for determining the direction in which the user is gazing. The forward tilt posture determination program 314d is a program for determining the direction in which the user is leaning forward. The nod determination program 314e is a program for determining whether or not the user has nodded. The partner's utterance determination program 314f is a program for determining whether or not the partner user is speaking. The synchronization program 314g is a program for synchronizing the utterance determination result, the gaze direction determination result, the forward tilt posture determination result, the nod determination result, and the partner user's utterance determination result to form the action data.

以下、ＰＣ１０によって実行される第１実施例のフロー図について説明する。また、図２０−図２６のフロー図は行動判定プログラム３１４を構成する各プログラムの処理を示す。さらに、図２７のフロー図はサンプル蓄積プログラム３１６による処理を示し、図２８のフロー図は学習プログラムによる処理を示し、図２９のフロー図は認識プログラム３２０による処理を示し、図３０のフロー図は集中認識プログラムによる処理を示し、図３１のフロー図はロボット制御プログラムによる処理を示す。 Hereinafter, the flowcharts of the first embodiment executed by the PC 10 will be described. The flowcharts of FIGS. 20 to 26 show the processing of each program constituting the action determination program 314. Further, the flowchart of FIG. 27 shows the processing by the sample accumulation program 316, the flowchart of FIG. 28 shows the processing by the learning program, the flowchart of FIG. 29 shows the processing by the recognition program 320, the flowchart of FIG. 30 shows the processing by the concentration recognition program, and the flowchart of FIG. 31 shows the processing by the robot control program.

図２０は画像／音声取得プログラム３１４ａの処理を示すフロー図である。たとえば、ＰＣ１０のプロセッサ２６は、ユーザによってＰＣ１０の電源がオンにされると、ステップＳ１で腹部カメラ１４による画像データを取得する。つまり、ロボット１２から入力される映像信号から画像データを取得し、腹部カメラバッファ３３４に一旦記憶させる。続いて、プロセッサ２６は、ステップＳ３で、モニタカメラ２２による画像データを取得する。つまり、モニタカメラ２２から入力される映像信号から画像データを取得し、モニタカメラバッファ３３２に一旦記憶させる。続いて、プロセッサ２６は、ステップＳ５で、音声データを取得し、ステップＳ１に戻る。つまり、プロセッサ２６は、マイク２０によって集音された音声から音声データを抽出し、音声バッファ３３６に一旦記憶させる。 FIG. 20 is a flowchart showing the processing of the image / sound acquisition program 314a. For example, when the power of the PC 10 is turned on by the user, the processor 26 of the PC 10 acquires image data from the abdominal camera 14 in step S1. That is, image data is acquired from the video signal input from the robot 12 and temporarily stored in the abdominal camera buffer 334. Subsequently, the processor 26 acquires image data from the monitor camera 22 in step S3. That is, image data is acquired from the video signal input from the monitor camera 22 and temporarily stored in the monitor camera buffer 332. Subsequently, in step S5, the processor 26 acquires audio data and returns to step S1. That is, the processor 26 extracts audio data from the audio collected by the microphone 20 and temporarily stores it in the audio buffer 336.

図２１は発話判定プログラム３１４ｂの処理を示すフロー図である。プロセッサ２６は、ステップＳ１１で、音声データが取得されたか否かを判断する。つまり、プロセッサ２６は、音声バッファ３３６に新たな音声データが記憶されたか否かを判断する。ステップＳ１１で“ＮＯ”であれば、つまり音声データが取得されていなければ、プロセッサ２６はステップＳ１１の処理を繰り返し実行する。一方、ステップＳ１１で“ＹＥＳ”であれば、つまり音声データが取得されていれば、プロセッサ２６はステップＳ１３で、音量が閾値以上であるか否かを判断する。つまり、プロセッサ２６は、音声バッファ３３６に記憶される音声データの音声レベルが閾値以上であるか否かを判定する。 FIG. 21 is a flowchart showing the processing of the speech determination program 314b. In step S11, the processor 26 determines whether audio data has been acquired. That is, the processor 26 determines whether or not new audio data is stored in the audio buffer 336. If “NO” in the step S11, that is, if the audio data is not acquired, the processor 26 repeatedly executes the process of the step S11. On the other hand, if “YES” in the step S11, that is, if audio data is acquired, the processor 26 determines whether or not the sound volume is equal to or higher than a threshold value in a step S13. That is, the processor 26 determines whether or not the sound level of the sound data stored in the sound buffer 336 is equal to or higher than the threshold value.

ステップＳ１３で“ＹＥＳ”であれば、つまり音声レベルが閾値以上であれば、プロセッサ２６はステップＳ１５で、「発話有り」と判定する。一方、ステップＳ１３で“ＮＯ”であれば、つまり音声レベルが閾値未満であれば、プロセッサ２６はステップＳ１７で、「発話無し」と判定する。 If “YES” in the step S13, that is, if the sound level is equal to or higher than the threshold value, the processor 26 determines that “there is an utterance” in a step S15. On the other hand, if “NO” in the step S13, that is, if the sound level is lower than the threshold value, the processor 26 determines that “no utterance” in a step S17.

続いて、プロセッサ２６は、ステップＳ１９で、現在時刻を取得する。つまり、プロセッサ２６は、時刻バッファ３３０に記憶される日時情報を取得する。続いて、プロセッサ２６は、ステップＳ２１で、発話の判定結果に現在時刻を対応付ける。つまり、プロセッサ２６は、複数の判定結果を同期させるために、現在時刻を対応付ける。そして、現在時刻が対応付けられた発話判定結果は、判定結果バッファ３３８に一時的に記憶される。 Subsequently, the processor 26 acquires the current time in step S19. That is, the processor 26 acquires the date and time information stored in the time buffer 330. Subsequently, in step S21, the processor 26 associates the current time with the utterance determination result. That is, the processor 26 attaches the current time so that the plurality of determination results can be synchronized later. The utterance determination result associated with the current time is temporarily stored in the determination result buffer 338.
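Steps S13 to S21 can be sketched as follows. The function name `judge_utterance` is hypothetical, and measuring the sound level as the peak absolute amplitude of the buffered samples is an assumption; the description only states that a sound level is compared with a threshold.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple


def judge_utterance(samples: Sequence[float],
                    threshold: float,
                    now: Optional[datetime] = None) -> Tuple[datetime, bool]:
    """Return (timestamp, utterance present) for one block of audio samples."""
    # Assumed level measure: peak absolute amplitude (0.0 for an empty block).
    level = max((abs(s) for s in samples), default=0.0)
    has_utterance = level >= threshold           # steps S13, S15, S17
    timestamp = now if now is not None else datetime.now()  # step S19
    return timestamp, has_utterance              # step S21: result with time


t = datetime(2010, 6, 30, 12, 0, 0)
print(judge_utterance([0.0, 0.4, -0.6], 0.5, t))
print(judge_utterance([0.1, -0.2], 0.5, t))
```

Returning the timestamp together with the result keeps each determination self-describing, which is what allows results from the separate determination processes to be aligned afterwards.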

なお、図２２−図２５に示す他の判定処理でも、ステップＳ１９と同様に日時情報を取得し、ステップＳ２１と同様に日時情報を対応付ける処理が存在するが、処理内容は全て同じであるため、他のフロー図では詳細な説明は省略する。 In the other determination processes shown in FIG. 22 to FIG. 25, there is a process for acquiring date / time information as in step S19 and associating the date / time information as in step S21. Detailed description is omitted in other flowcharts.

図２２は注視方向判定プログラム３１４ｃの処理を示すフロー図である。プロセッサ２６は、ステップＳ３１で、画像が取得されたか否かを判断する。つまり、プロセッサ２６は、モニタカメラバッファ３３２および腹部カメラバッファ３３４に新たな画像データが記憶されたか否かを判断する。ステップＳ３１で“ＮＯ”であれば、つまり画像データが取得されていなければ、プロセッサ２６はステップＳ３１の処理を繰り返し実行する。一方、ステップＳ３１で“ＹＥＳ”であれば、つまり画像データが取得されていれば、プロセッサ２６はステップＳ３３で、モニタカメラ２２の画像を取得する。つまり、プロセッサ２６は、モニタカメラバッファ３３２から画像データを取得する。 FIG. 22 is a flowchart showing the processing of the gaze direction determination program 314c. In step S31, the processor 26 determines whether an image has been acquired. That is, the processor 26 determines whether new image data is stored in the monitor camera buffer 332 and the abdominal camera buffer 334. If “NO” in the step S31, that is, if the image data is not acquired, the processor 26 repeatedly executes the process of the step S31. On the other hand, if “YES” in the step S31, that is, if image data is acquired, the processor 26 acquires an image of the monitor camera 22 in a step S33. That is, the processor 26 acquires image data from the monitor camera buffer 332.

続いて、プロセッサ２６は、ステップＳ３５で、顔認識処理を実行する。つまり、プロセッサ２６は、モニタカメラバッファ３３２に記憶される画像データに対して所定の顔認識処理を実行する。続いて、プロセッサ２６は、ステップＳ３７で、顔を認識できたか否かを判断する。つまり、プロセッサ２６は、ステップＳ３５の処理が成功したか否かを判断する。ステップＳ３７で“ＹＥＳ”であれば、つまりモニタカメラ２２によって撮影された画像において顔の認識に成功していれば、プロセッサ２６はステップＳ３９で、注視方向を「モニタ」に設定する。 Subsequently, the processor 26 executes face recognition processing in step S35. That is, the processor 26 executes predetermined face recognition processing on the image data stored in the monitor camera buffer 332. Subsequently, the processor 26 determines in step S37 whether or not a face has been recognized. That is, the processor 26 determines whether or not the processing of step S35 has succeeded. If “YES” in the step S37, that is, if the face has been successfully recognized in the image captured by the monitor camera 22, the processor 26 sets the gaze direction to “monitor” in a step S39.

また、ステップＳ３７で“ＮＯ”であれば、つまりモニタカメラ２２によって撮影された画像において顔の認識に失敗していれば、プロセッサ２６はステップＳ４１で、腹部カメラの画像を取得する。つまり、プロセッサ２６は、腹部カメラバッファ３３４から画像データを取得する。続いて、プロセッサ２６は、ステップＳ４３で、腹部カメラバッファ３３４に記憶される画像データに対して、ステップＳ３５と同様、顔認識処理を実行する。 If “NO” in the step S37, that is, if the face recognition has failed in the image taken by the monitor camera 22, the processor 26 acquires an image of the abdominal camera in a step S41. That is, the processor 26 acquires image data from the abdominal camera buffer 334. Subsequently, the processor 26 performs face recognition processing on the image data stored in the abdominal camera buffer 334 in step S43, as in step S35.

続いて、プロセッサ２６は、ステップＳ４５で顔が認識されたか否かを判断する。つまり、プロセッサ２６は、ステップＳ４３の処理で顔の認識が成功したか否かを判断する。ステップＳ４５で“ＹＥＳ”であれば、つまり腹部カメラ１４によって撮影された画像において顔の認識に成功していれば、プロセッサ２６はステップＳ４７で注視方向を「ロボット」に設定する。 Subsequently, the processor 26 determines whether or not a face is recognized in step S45. That is, the processor 26 determines whether or not the face recognition is successful in the process of step S43. If “YES” in the step S45, that is, if the face has been successfully recognized in the image taken by the abdominal camera 14, the processor 26 sets the gaze direction to “robot” in a step S47.

一方、ステップＳ４５で“ＮＯ”であれば、つまりモニタカメラ２２と腹部カメラ１４との両方のカメラでユーザの顔の認識が失敗していれば、プロセッサ２６はステップＳ４９で、注視方向を「その他」に設定する。 On the other hand, if “NO” in the step S45, that is, if recognition of the user's face has failed with both the monitor camera 22 and the abdominal camera 14, the processor 26 sets the gaze direction to “other” in a step S49.

続いて、プロセッサ２６は、ステップＳ５１で現在時刻を取得し、ステップＳ５３で注視方向の判定結果に現在時刻を対応付ける。そして、プロセッサ２６は、ステップＳ５３の処理が終了するとステップＳ３１に戻る。なお、ステップＳ５３において、現在時刻が対応付けられた注視方向の判定結果は、判定結果バッファ３３８に一旦記憶される。 Subsequently, the processor 26 acquires the current time in step S51, and associates the current time with the gaze direction determination result in step S53. When the processing of step S53 ends, the processor 26 returns to step S31. In step S53, the gaze direction determination result associated with the current time is temporarily stored in the determination result buffer 338.
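The order of the checks in the gaze direction determination (steps S35 to S49) can be sketched as follows. The function name `judge_gaze_direction` and the `detect_face` callback are hypothetical; the callback stands in for the predetermined face recognition processing, which is not specified in this description.

```python
from typing import Any, Callable


def judge_gaze_direction(monitor_image: Any,
                         abdominal_image: Any,
                         detect_face: Callable[[Any], bool]) -> str:
    """Return "monitor", "robot", or "other" from the two camera images."""
    if detect_face(monitor_image):      # steps S35 and S37
        return "monitor"                # step S39
    if detect_face(abdominal_image):    # steps S43 and S45
        return "robot"                  # step S47
    return "other"                      # step S49


# Toy detector for illustration: an "image" is just a label here.
detect = lambda image: image == "face"
print(judge_gaze_direction("face", "none", detect))
print(judge_gaze_direction("none", "face", detect))
print(judge_gaze_direction("none", "none", detect))
```

Note that the monitor camera image is checked first, so a face visible to both cameras is reported as “monitor”.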

なお、他の実施例では、モニタカメラ２２または腹部カメラ１４のどちらか一方の画像だけで注視方向の判定を行ってもよい。また、図２３および図２４に示す他の判定処理でも、ステップＳ３１，Ｓ３３，Ｓ４１と同様の処理が存在するが、処理内容は全て同じであるため、他のフロー図では詳細な説明は省略する。 In another embodiment, the gaze direction may be determined using the image from only one of the monitor camera 22 and the abdominal camera 14. The other determination processes shown in FIGS. 23 and 24 also include processing similar to steps S31, S33, and S41, but since the processing contents are all the same, detailed description is omitted for the other flowcharts.

図２３は前傾姿勢判定プログラム３１４ｄの処理を示すフロー図である。プロセッサ２６は、ステップＳ６１で、画像が取得されたかを判断する。ステップＳ６１で“ＮＯ”であれば、つまり画像が取得されていなければ、ステップＳ６１の処理は繰り返し実行される。一方、ステップＳ６１で“ＹＥＳ”であれば、つまり画像が取得されていれば、プロセッサ２６はステップＳ６３で、モニタカメラ２２の画像を取得する。 FIG. 23 is a flowchart showing the processing of the forward tilt posture determination program 314d. In step S61, the processor 26 determines whether an image has been acquired. If “NO” in the step S61, that is, if an image is not acquired, the process of the step S61 is repeatedly executed. On the other hand, if “YES” in the step S61, that is, if an image is acquired, the processor 26 acquires an image of the monitor camera 22 in a step S63.

続いて、プロセッサ２６は、ステップＳ６５で顔領域認識処理を実行する。つまり、プロセッサ２６は、モニタカメラバッファ３３２に格納されている画像データから顔領域を認識する。続いて、プロセッサ２６は、ステップＳ６７で、顔領域が閾値（たとえば、画像データの面積の５割を示す値）以上か否かを判断する。つまり、プロセッサ２６は、認識された顔領域の面積が閾値以上であるか否かを判断する。ステップＳ６７で“ＹＥＳ”であれば、たとえば、図８（Ａ）に示すように、ユーザの顔が撮影されていれば、プロセッサ２６はステップＳ６９で、前傾姿勢を「モニタ」に設定する。 Subsequently, the processor 26 executes face area recognition processing in step S65. That is, the processor 26 recognizes the face area from the image data stored in the monitor camera buffer 332. Subsequently, in step S67, the processor 26 determines whether or not the face area is equal to or greater than a threshold value (for example, a value indicating 50% of the area of the image data). That is, the processor 26 determines whether or not the area of the recognized face area is equal to or greater than a threshold value. If “YES” in the step S67, for example, as shown in FIG. 8A, if the user's face is photographed, the processor 26 sets the forward tilt posture to “monitor” in a step S69.

If “NO” in step S67, that is, if the area of the face area is less than the threshold value or recognition of the face area has failed, the processor 26 acquires the image of the abdominal camera 14 in step S71. Subsequently, the processor 26 executes face area recognition processing in step S73. That is, the processor 26 recognizes the face area from the image data stored in the abdominal camera buffer 334. In step S75, the processor 26 then determines whether the face area is equal to or greater than the threshold value. That is, the processor 26 determines whether the user's face occupies an area equal to or greater than the threshold value in the image data captured by the abdominal camera 14. If “YES” in step S75, for example if the user's face is photographed as shown in FIG. 8(B), the processor 26 sets the forward-leaning posture to “robot” in step S77.

If “NO” in step S75, for example if the user's face is photographed as shown in FIG. 8(C), the processor 26 sets the forward-leaning posture to “other” in step S79.
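The two-camera cascade of steps S65 to S79 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the use of a face-area ratio between 0 and 1, and the convention that `None` means face recognition failed are all assumptions made for the example. Only the 50% threshold and the three labels come from the description.

```python
from typing import Optional

# Assumed representation: the face area divided by the image area (0.0 to 1.0).
FACE_AREA_THRESHOLD = 0.5  # "50% of the area of the image data" in the text


def determine_forward_lean(monitor_face_ratio: Optional[float],
                           abdomen_face_ratio: Optional[float],
                           threshold: float = FACE_AREA_THRESHOLD) -> str:
    """Return "monitor", "robot", or "other" (hypothetical helper).

    None stands for a failed face recognition, which the description
    treats the same way as a face area below the threshold.
    """
    # Steps S65 to S69: the monitor camera image is checked first.
    if monitor_face_ratio is not None and monitor_face_ratio >= threshold:
        return "monitor"
    # Steps S71 to S77: fall back to the abdominal camera image.
    if abdomen_face_ratio is not None and abdomen_face_ratio >= threshold:
        return "robot"
    # Step S79: neither camera sees a sufficiently large face.
    return "other"
```

Checking the monitor camera first mirrors the order of the flowchart, so a face that is large in both images is labelled “monitor”.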

Subsequently, the processor 26 acquires the current time in step S81 and associates the current time with the forward-leaning posture determination result in step S83. When step S83 ends, the processor 26 returns to step S61. In step S83, the forward-leaning posture determination result associated with the current time is temporarily stored in the determination result buffer 338.

In other embodiments, the forward-leaning posture may be determined from the image of only one of the monitor camera 22 and the abdominal camera 14.

FIG. 24 is a flowchart showing the processing of the nod determination program 314e. In step S91, the processor 26 determines whether an image has been acquired. If “NO” in step S91, that is, if no image has been acquired, the processor 26 repeats step S91. If “YES” in step S91, that is, if an image has been acquired, the processor 26 acquires the image of the monitor camera 22 in step S93.

Subsequently, the processor 26 executes face position recognition processing in step S95. That is, the processor 26 recognizes the position of the center of gravity of the face area in the image data stored in the monitor camera buffer 332. In step S97, the processor 26 then determines whether the face position has changed. That is, the processor 26 acquires the previous face position corresponding to the monitor camera 22 from the face position buffer 340, compares it with the current face position recognized in step S95, and determines whether the user's face position has changed. If “YES” in step S97, that is, if the user's face position has changed in the image captured by the monitor camera 22, the processor 26 determines in step S99 that a nod is “present”.

If “NO” in step S97, that is, if the user's face position has not changed in the image of the monitor camera 22, the processor 26 acquires the image of the abdominal camera 14 in step S101. Subsequently, the processor 26 executes face position recognition processing in step S103. That is, the processor 26 recognizes the position of the center of gravity of the face area in the image data stored in the abdominal camera buffer 334. In step S105, the processor 26 then determines whether the face position has changed. That is, the processor 26 acquires the previous face position corresponding to the abdominal camera 14 from the face position buffer 340 and, as in step S97, determines whether the user's face position has changed. If “YES” in step S105, that is, if the user's face position has changed in the image of the abdominal camera 14, the processor 26 determines in step S99 that a nod is “present”. If “NO” in step S105, that is, if no change in the user's face position is detected in the image of the abdominal camera 14 either, the processor 26 determines in step S107 that the user's nod is “absent”.

Subsequently, the processor 26 stores the current face position in step S109. That is, the processor 26 stores the user's face positions recognized in steps S95 and S103 in the face position buffer 340.
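The nod determination of FIG. 24 compares the current face centroid with the previous one for each camera. A minimal sketch follows, assuming centroids are given as (x, y) pixel tuples and that `None` means no face was found. The class name, the `tolerance` parameter, and the choice to pass both centroids in at once are illustrative assumptions; the description only states that a change in face position counts as a nod and that the abdominal camera is consulted when the monitor camera shows no change.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


class NodDetector:
    """Hypothetical helper mirroring steps S95 to S109."""

    def __init__(self, tolerance: float = 0.0) -> None:
        # Plays the role of the face position buffer 340 (one entry per camera).
        self.previous: Dict[str, Point] = {}
        # Assumed noise margin in pixels; the description names no value.
        self.tolerance = tolerance

    def _moved(self, camera: str, current: Optional[Point]) -> bool:
        previous = self.previous.get(camera)
        if previous is None or current is None:
            return False  # nothing to compare against
        return (abs(current[0] - previous[0]) > self.tolerance
                or abs(current[1] - previous[1]) > self.tolerance)

    def update(self, monitor_pos: Optional[Point],
               abdomen_pos: Optional[Point]) -> bool:
        """Return True when a nod is judged "present"."""
        # S97 first; S105 is evaluated only if the monitor camera saw no change.
        nod = (self._moved("monitor", monitor_pos)
               or self._moved("abdomen", abdomen_pos))
        # S109: remember the current positions for the next comparison.
        if monitor_pos is not None:
            self.previous["monitor"] = monitor_pos
        if abdomen_pos is not None:
            self.previous["abdomen"] = abdomen_pos
        return nod
```

With `tolerance=0.0` any change in the centroid counts, which is the literal reading of the text; a real system would probably need a margin to suppress jitter.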

Subsequently, the processor 26 acquires the current time in step S111 and associates the current time with the nod determination result in step S113. When step S113 ends, the processor 26 returns to step S91. In step S113, the nod determination result associated with the current time is temporarily stored in the determination result buffer 338.

Note that the processor 26 that executes steps S13, S37, S45, S67, S75, S97, and S105 functions as an element action determination unit.

FIG. 25 is a flowchart showing the processing of the partner utterance determination program 314f. In step S121, the processor 26 acquires the partner's action data. That is, the processor 26 establishes data communication with the server 24 and stores the partner's action data in the data communication buffer 342. In step S123, the processor 26 then determines whether the partner has spoken. That is, the processor 26 determines whether “present” is recorded in the utterance column of the partner's action data stored in the data communication buffer 342. If “YES” in step S123, that is, if the utterance is recorded as “present” in the partner's action data, the processor 26 determines in step S125 that the partner's utterance is “present”. If “NO” in step S123, that is, if the utterance is recorded as “absent” in the partner's action data, the processor 26 determines in step S127 that the partner's utterance is “absent”. Note that the processor 26 that executes step S123 functions as a partner utterance determination unit.

Subsequently, the processor 26 acquires the current time in step S129 and associates the current time with the utterance determination result in step S131. When step S131 ends, the processor 26 returns to step S121. In step S131, the determination result for the partner's utterance associated with the current time is temporarily stored in the determination result buffer 338.

FIG. 26 is a flowchart showing the processing of the synchronization program 314g. In step S141, the processor 26 determines whether each determination has been completed. For example, the processor 26 determines whether the determination results for the utterance, the gaze direction, the forward-leaning posture, the nod, and the partner's utterance are stored in the determination result buffer 338. If “NO” in step S141, that is, if the determinations have not all been completed, the processor 26 repeats step S141. If “YES” in step S141, that is, if each determination has been completed, the processor 26 synchronizes the determination results and the face recognition result based on time in step S143. That is, the processor 26 synchronizes them based on the time associated with each determination result.

Subsequently, in step S145, the processor 26 treats the synchronized determination results as action data and records them in the action table. That is, the processor 26 records the determination results in a new row of the action table shown in FIG. 7. In step S147, the processor 26 then transmits the current action data to the server 24, that is, the action data corresponding to the row newly added to the action table. When step S147 ends, the processor 26 returns to step S141.
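The description does not specify how the time-stamped determination results are aligned in step S143. One possible reading, shown here purely as an assumption, is to bucket the results into fixed-length time slots and to emit an action-table row only for slots in which all five determinations are present. The names `KINDS`, `synchronize`, and `slot` are hypothetical.

```python
from typing import Dict, List, Tuple

# The five determinations named for the determination result buffer 338.
KINDS = ("utterance", "gaze", "forward_lean", "nod", "partner_utterance")


def synchronize(results: List[Tuple[str, float, str]],
                slot: float = 1.0) -> List[Dict[str, object]]:
    """Group (kind, timestamp, value) results into complete rows.

    Each result is assigned to the slot containing its timestamp; a row is
    produced only when every kind in KINDS has a value in that slot.
    """
    slots: Dict[int, Dict[str, str]] = {}
    for kind, timestamp, value in results:
        slots.setdefault(int(timestamp // slot), {})[kind] = value

    rows: List[Dict[str, object]] = []
    for index in sorted(slots):
        entry = slots[index]
        if all(kind in entry for kind in KINDS):
            # One row of the action table: slot start time plus the values.
            rows.append({"time": index * slot, **entry})
    return rows
```

Incomplete slots are simply skipped in this sketch, which corresponds to waiting in step S141 until every determination has finished.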

In this way, the processes of FIGS. 20 to 26 are executed in parallel at fixed time intervals, so that the user's actions are determined and transmitted to the server 24.

FIG. 27 is a flowchart showing the processing of the sample accumulation program 316. In step S151, the processor 26 determines whether the predetermined time timer has expired. That is, the processor 26 determines, based on the value of the predetermined time counter 356, whether a predetermined time has elapsed since the previous personal learning sample was accumulated. Subsequently, the processor 26 acquires the action data for the predetermined time in step S153. That is, the processor 26 reads the action data for the predetermined time from the action table data 348 based on the time column of the action table.

Subsequently, the processor 26 calculates the action frequency of the composite actions in step S155. For example, the processor 26 calculates the occurrence frequency of each element action from the action data that was read, and then calculates the expected value of the calculated occurrence frequencies as the action frequency of the composite action. In step S157, the processor 26 adds (accumulates) the calculated action frequency to the personal learning data 350 as a personal learning sample. For example, the processor 26 stores the action frequency calculated as shown in FIG. 14 in the RAM 30 as a personal learning sample constituting the personal learning data 350. In step S159, the processor 26 then resets the predetermined time timer and returns to step S151. That is, the processor 26 initializes the predetermined time counter 356.
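The first half of step S155, computing the occurrence frequency of element actions over one window of action data, might look like the sketch below. The row layout (a dictionary per action-table row with the values “present” and “absent”) and the function name are assumptions. The subsequent combination into composite action frequencies follows FIG. 14 and the numbered equations, which are not reproduced here, so it is deliberately left out.

```python
from typing import Dict, List, Sequence


def element_frequencies(rows: List[Dict[str, str]],
                        actions: Sequence[str] = ("utterance", "nod")
                        ) -> Dict[str, float]:
    """Fraction of rows in the window where each element action is "present".

    `rows` stands for the action data read for the predetermined time
    (step S153); the column names are illustrative.
    """
    if not rows:
        # An empty window yields zero frequencies rather than a division error.
        return {action: 0.0 for action in actions}
    return {
        action: sum(1 for row in rows if row.get(action) == "present") / len(rows)
        for action in actions
    }
```

For a window of four rows in which an utterance is present twice and a nod once, this returns 0.5 and 0.25 respectively.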

FIG. 28 is a flowchart showing the processing of the learning program 318. In step S161, the processor 26 acquires the general learning data 346. That is, the processor 26 reads the general learning data 346 from the RAM 30. Subsequently, the processor 26 constructs a general SVM in step S163. That is, the processor 26 constructs a general SVM as shown in FIG. 16(A) from the plurality of general learning samples constituting the general learning data 346 that was read. In step S167, the processor 26 then recognizes, with the general SVM, all the personal learning samples constituting the personal learning data 350. That is, in order to improve the recognition accuracy of the personalized SVM, the personal learning samples are first recognized by the general SVM. Subsequently, the processor 26 records the recognition degree of the general SVM in step S169. That is, the processor 26 stores, in the SVM recognition buffer 344, the number of personal learning samples recognized as the active state among the plurality of personal learning samples constituting the personal learning data 350, as the recognition degree of the general SVM.

Subsequently, in step S171, the processor 26 adjusts the weights of the general learning samples using the personal learning data 350. That is, the processor 26 calculates the expected values of the composite actions from the personal learning samples and the general learning samples based on Equations 6 and 7, and adjusts the weight of each general learning sample constituting the general learning data 346 based on Equation 8. In step S173, the processor 26 then constructs a personalized SVM. That is, the processor 26 constructs a personalized SVM as shown in FIG. 16(B) from the general learning samples whose weights have been adjusted (personalized general learning samples). Note that the processor 26 that executes steps S171 and S173 functions as an adjustment unit, and the processor 26 that executes step S171 functions as a weight adjustment unit.

Subsequently, in step S175, the processor 26 recognizes, with the personalized SVM, all the personal learning samples constituting the personal learning data 350. That is, the processor 26 recognizes the personal learning samples with the personalized SVM in order to improve the recognition accuracy of the personalized SVM. In step S177, the processor 26 then stores the recognition degree of the personalized SVM. That is, the processor 26 stores, in the SVM recognition buffer 344, the number of personal learning samples recognized as the active state among the plurality of personal learning samples constituting the personal learning data 350, as the recognition degree of the personalized SVM. Note that the processor 26 that executes step S175 functions as a provisional recognition unit, and the processor 26 that executes step S177 functions as a recording unit.

Subsequently, in step S179, the processor 26 determines whether the difference between the previous recognition degree and the current recognition degree is equal to or less than a threshold value. For example, the processor 26 calculates the difference between the recognition degree of the general SVM and the recognition degree of the personalized SVM stored in the SVM recognition buffer 344, and determines whether that difference is 10% or less of the total number of personal learning samples. When steps S171 to S181 are being executed for the second or a later time, the determination in step S179 is made from the previous and current recognition degrees of the personalized SVM. If “YES” in step S179, that is, if the difference in recognition degree is equal to or less than the threshold value, the processor 26 ends the learning process.

If “NO” in step S179, that is, if the difference in recognition degree is larger than the threshold value, the processor 26 determines in step S181 whether a personal learning sample has been added. That is, the processor 26 determines whether a new personal learning sample has been added to the personal learning data 350. If “NO” in step S181, for example if the predetermined time has not elapsed and no personal learning sample has been added, the processor 26 repeats step S181. If “YES” in step S181, that is, if a personal learning sample has been added, the processor 26 returns to step S171. In step S171, the weights of the general learning samples are then adjusted using the personal learning data 350 that includes the added personal learning sample.
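The control flow of steps S169 to S181 is an iterate-until-stable loop: rebuild the personalized SVM, count the personal learning samples recognized as active, and stop once that count changes by no more than 10% of the number of samples. The sketch below shows only this loop. The SVM construction and the weight adjustment by Equations 6 to 8 are hidden behind caller-supplied functions, and all names as well as the `max_rounds` safety limit are assumptions that do not appear in the description.

```python
from typing import Callable


def personalize(general_active_count: int,
                rebuild_and_count: Callable[[], int],
                total_samples: Callable[[], int],
                wait_for_new_sample: Callable[[], None],
                ratio: float = 0.1,
                max_rounds: int = 100) -> int:
    """Run the convergence loop and return the number of rounds used.

    general_active_count: recognition degree of the general SVM (step S169).
    rebuild_and_count:    performs steps S171 to S177 and returns the
                          recognition degree of the new personalized SVM.
    total_samples:        current number of personal learning samples.
    wait_for_new_sample:  blocks until a sample is added (step S181).
    """
    previous = general_active_count
    for round_number in range(1, max_rounds + 1):
        current = rebuild_and_count()
        # Step S179: stop when the change is within 10% of the sample count.
        if abs(current - previous) <= ratio * total_samples():
            return round_number
        previous = current
        wait_for_new_sample()
    raise RuntimeError("recognition degree did not stabilize")
```

In the first round the comparison is between the general SVM and the first personalized SVM; from the second round on it is between successive personalized SVMs, as the description states.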

In other embodiments, steps S163 to S169 and steps S175 to S181 may be omitted, and the personalized SVM may be constructed using only the personal learning samples that have already been accumulated. For example, if the recognition target user is a user who provided the personal learning samples used to create the general learning samples, many personal learning samples have already been accumulated, so the processing can be simplified in this way.

FIG. 29 is a flowchart showing the processing of the recognition program 320. For example, when the user turns on the power of the PC 10, the processor 26 of the PC 10 determines in step S191, as in step S181, whether a personal learning sample has been added. If “NO” in step S191, that is, if no personal learning sample has been added, the processor 26 repeats step S191. If “YES” in step S191, that is, if a personal learning sample has been added, the processor 26 acquires the added personal learning sample as a recognition sample in step S193. That is, the processor 26 reads the most recently added personal learning sample from the personal learning data 350 as a recognition sample in order to recognize the concentration state.

Subsequently, the processor 26 executes the concentration recognition process in step S195. Step S195 is described later with reference to the flowchart shown in FIG. 30, so a detailed description is omitted here. The processor 26 then executes the utterance recognition process in step S197 and the interest target recognition process in step S199. As described above, through the utterance recognition process the processor 26 recognizes either a talk state, in which the user is speaking as the speaker, or a listen state, in which the user is listening to the partner. Through the interest target recognition process, the processor 26 recognizes the user's interest target (robot, monitor, or other).

Subsequently, the processor 26 stores the recognition results as the state data 352 in step S201. For example, if the result of concentration recognition is the active state, the result of utterance recognition is talk, and the result of interest target recognition is “robot”, the state data 352 is stored in the RAM 30 as “active talk robot”. In step S203, the processor 26 then transmits the state data 352 to the server 24 and returns to step S191.

FIG. 30 is a flowchart showing the processing of the concentration recognition program 322. When step S195 is executed, the processor 26 calculates, in step S211, the action frequency of the composite actions from the acquired recognition sample. That is, as shown in FIG. 14, the processor 26 calculates the action frequency of the composite actions from the occurrence frequencies of the element actions in the recognition sample. Subsequently, the processor 26 recognizes the user's concentration state in step S213. That is, the processor 26 inputs the action frequency calculated in step S211 into the personalized SVM and recognizes the user's concentration state. Note that the processor 26 that executes step S213 functions as a recognition unit.

Subsequently, in step S215, the processor 26 determines whether the recognition result is the active state. If “YES” in step S215, that is, if the recognition result of the personalized SVM is the active state, the processor 26 sets the active state in step S217 and ends the concentration recognition process. That is, the processor 26 sets the concentration flag 354 to on. If “NO” in step S215, that is, if the recognition result of the personalized SVM is the passive state, the processor 26 sets the passive state in step S219 and ends the concentration recognition process. That is, the processor 26 sets the concentration flag 354 to off.

When the concentration recognition process ends, the processor 26 returns to the recognition process and executes step S197.

FIG. 31 is a flowchart showing the processing of the robot control program 324. For example, when a call using the videophone function is started, the processor 26 of the PC 10 determines in step S231 whether an end operation has been performed. For example, the processor 26 determines whether an operation to end the call using the videophone function has been performed. If “YES” in step S231, that is, if an end operation has been performed, the processor 26 ends the robot control process. If “NO” in step S231, that is, if no end operation has been performed, the processor 26 refers to the state data 352 in step S233.

Subsequently, in step S235, the processor 26 determines whether the state is the monitor state. That is, the processor 26 determines whether the state data 352 contains “monitor”, which indicates that the user's interest target is the monitor 16. If “NO” in step S235, that is, if the user's interest target is not the monitor 16, the processor 26 proceeds to step S245. If “YES” in step S235, that is, if the user's interest target is the monitor 16, the processor 26 determines in step S237 whether the state is active. That is, the processor 26 determines whether the state data 352 contains “active”, which indicates that the user is in the active state. If “NO” in step S237, that is, if the user is in the passive state, the processor 26 proceeds to step S247.

If “YES” in step S237, that is, if the user is in the active state, the processor 26 determines in step S239 whether the state is talk. That is, the processor 26 determines whether the state data 352 contains “talk”, which indicates that the user is in the talk state. If “YES” in step S239, that is, if the user is in the talk state, the state data 352 is “active talk monitor”, so the processor 26 executes the active talk process in step S241 and returns to step S231. When this active talk process is executed, the robot 12 performs a pseudo-listening operation and an utterance control operation.

If “NO” in step S239, that is, if the user is in the listen state, the state data 352 is “active listen monitor”, so the processor 26 executes the active listen process in step S243 and returns to step S231. When this active listen process is executed, the robot 12 performs a pseudo-listening operation and an operation that prompts the user to speak.

If the user's interest target is something other than the monitor 16, the processor 26 determines in step S245 whether the state is the robot state. That is, the processor 26 determines whether the state data 352 contains “robot”, which indicates that the user's interest target is the robot 12. If “YES” in step S245, that is, if the user's interest target is the robot 12, the processor 26 executes the inactive process in step S247 and returns to step S231. When this inactive process is executed, the robot 12 performs an operation that attracts the user's attention and an operation that prompts the user to speak.

If “NO” in step S245, that is, if the user's interest target is neither the robot 12 nor the monitor 16, the processor 26 executes the other process in step S249 and returns to step S231. When the other process is executed, the robot 12 performs an operation that attracts attention and an operation that speaks to the user.
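The branching of FIG. 31 (steps S235 to S249) amounts to a small decision table over the three components of the state data 352. The sketch below encodes that table as written in the description. The `UserState` type, the lowercase string values, and the returned process names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserState:
    """Hypothetical structured form of the state data 352."""
    concentration: str  # "active" or "passive"
    speech: str         # "talk" or "listen"
    interest: str       # "monitor", "robot", or "other"


def select_robot_process(state: UserState) -> str:
    """Return the name of the process the robot control program would run."""
    if state.interest == "monitor":             # step S235
        if state.concentration != "active":     # step S237
            return "inactive"                   # step S247
        if state.speech == "talk":              # step S239
            return "active_talk"
        return "active_listen"                  # step S243
    if state.interest == "robot":               # step S245
        return "inactive"                       # step S247
    return "other"                              # step S249
```

For example, `select_robot_process(UserState("active", "listen", "monitor"))` selects the active listen process, in which the robot performs the pseudo-listening operation and prompts the user to speak.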

As described above, in the first embodiment the SVM is personalized based on the user's actions, so the user's concentration state is recognized correctly.

＜Second embodiment＞
In the second embodiment, instead of personalizing the SVM, the recognition sample is generalized so that the concentration state of the recognition target user is recognized correctly. In the second embodiment, the recognition sample is generalized based on the formula shown in Equation 4, which is used to normalize (generalize) the user's occurrence frequencies.

Note that the listening dialogue continuing system 100 of the second embodiment is the same as that of the first embodiment, so overlapping descriptions, such as the electrical configuration of the PC 10 and the robot 12 and the memory map of the PC 10, are omitted.

For example, referring to FIG. 32(A), as in FIG. 16(A), a user whose “gaze” occurrence frequency is low is erroneously recognized as being in the passive state even when concentrating on the conversation, because the recognition sample is placed on the passive side. When the user's recognition sample is generalized based on the formula shown in Equation 4, the generalized recognition sample is placed on the right side (active side) of the boundary line, so that user comes to be recognized as being in the active state.

Thus, because only the weights of the recognition sample need to be adjusted, the concentration state of a specific user can be recognized easily and correctly without changing the recognition criterion.
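The effect of generalizing a recognition sample against a fixed boundary can be illustrated with a minimal Python sketch. Equation 4 is not reproduced in this section, so the scaling rule used here (multiplying each feature by the ratio of a reference occurrence frequency to the user's own occurrence frequency) is only an assumed stand-in for it. The function names, the two-feature layout, the numeric values and the linear boundary are likewise hypothetical.

```python
def generalize_sample(sample, user_freq, reference_freq):
    """Rescale each feature of a recognition sample.

    ASSUMPTION: this ratio-based scaling is a stand-in for Equation 4,
    which is not shown in this section.
    """
    return [
        x * ref / usr if usr > 0 else x
        for x, usr, ref in zip(sample, user_freq, reference_freq)
    ]


def decision_value(sample, weights, bias):
    """Signed score against a fixed linear boundary (positive = active side)."""
    return sum(w * x for w, x in zip(weights, sample)) + bias


# Hypothetical features: [gaze, utterance] occurrence frequencies in one window.
weights, bias = [1.0, 1.0], -1.0   # fixed boundary; it is never changed
sample = [0.2, 0.5]                # a user who rarely gazes at the partner
user_freq = [0.2, 0.5]             # this user's typical frequencies
reference_freq = [0.6, 0.5]        # typical frequencies in the general samples

raw = decision_value(sample, weights, bias)
generalized = decision_value(
    generalize_sample(sample, user_freq, reference_freq), weights, bias
)
print(raw < 0, generalized > 0)
```

In this sketch the raw sample falls on the passive side of the boundary, while the rescaled sample falls on the active side, even though the boundary itself is untouched.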

The flowcharts of the present invention executed by the PC 10 of the second embodiment are described below. Since the flowcharts shown in FIGS. 20 to 27, 29 and 31 are the same as in the first embodiment, their detailed description is omitted. The flowchart of FIG. 33 shows the processing of the learning program 318 of the second embodiment, and the flowchart of FIG. 34 shows the processing of the concentration recognition program 322 of the second embodiment.

FIG. 33 is a flowchart showing the processing of the learning program 318 of the second embodiment. The processor 26 acquires general learning data in step S301 and constructs a general SVM in step S303. When the process of constructing the general SVM ends, the processor 26 ends the learning process. In other words, the learning process of the second embodiment consists only of constructing the general SVM.

FIG. 34 is a flowchart showing the processing of the concentration recognition program 322 of the second embodiment. In step S311, the processor 26 calculates the behavior frequencies of the composite actions from the recognition sample acquired in the concentration process. Subsequently, in step S313, the processor 26 creates a recognition sample with adjusted weights. That is, the processor 26 adjusts the weights of the recognition sample based on the behavior frequencies calculated in step S311 and the formula shown in Equation 4. The recognition sample whose weights have been adjusted is referred to as a "generalized recognition sample". Subsequently, in step S315, the processor 26 recognizes the user's concentration state based on the weight-adjusted recognition sample. That is, the processor 26 recognizes the user's concentration state by having the general SVM constructed in the learning process classify the generalized recognition sample. Note that the processor 26 that executes the process of step S315 functions as a creation unit, and the processor 26 that executes the process of step S317 functions as a recognition unit.

Subsequently, in step S317, the processor 26 determines whether or not the recognition result is the active state. That is, the processor 26 determines whether the recognition result of the general SVM is the active state. If "YES" in step S317, that is, if the recognition result is the active state, the processor 26 sets the active state in step S319. That is, in step S319, the concentration flag 354 is set to on. On the other hand, if "NO" in step S317, that is, if the recognition result is the passive state, the processor 26 sets the passive state in step S321. That is, the concentration flag 354 is set to off.

In step S323, the processor 26 labels the recognition sample with the recognition result, adds it to the personal learning samples, and ends the concentration recognition process. On ending the concentration recognition process, the processor 26 returns to the recognition process, which is the higher-level routine.
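Steps S315 to S323 can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `recognize_concentration`, the callable `decision_function` standing in for the trained general SVM, the list used for the personal learning samples and the example boundary are all hypothetical. The text does not say whether the stored sample is the original or the generalized one; the sketch stores the original.

```python
ACTIVE, PASSIVE = "active", "passive"


def recognize_concentration(recognition_sample, generalized_sample,
                            decision_function, personal_samples):
    """Sketch of steps S315 to S323; returns the concentration flag."""
    # S315: classify the generalized recognition sample with the general
    # classifier (here any callable returning a signed score).
    score = decision_function(generalized_sample)

    # S317 to S321: the flag is on for the active state, off for passive.
    concentration_flag = score > 0
    label = ACTIVE if concentration_flag else PASSIVE

    # S323: label the recognition sample with the result and add it to the
    # personal learning samples (ASSUMPTION: the original sample is stored).
    personal_samples.append((list(recognition_sample), label))
    return concentration_flag


def general_boundary(sample):
    """Hypothetical linear boundary standing in for the general SVM."""
    return sample[0] + sample[1] - 1.0


personal_samples = []
flag = recognize_concentration(
    [0.2, 0.5], [0.6, 0.5], general_boundary, personal_samples
)
print(flag, personal_samples)
```

Keeping the classifier behind a plain callable means the same flow works with any recognizer that returns a signed score, which matches the later remark that methods other than an SVM may be used.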

Here, since the labeled personal learning sample is added to the personal learning data 350 in step S232, the second embodiment can recalculate the general learning samples using the labeled personal learning samples. That is, the number of general learning samples learned by the general SVM can be increased, which improves the recognition accuracy.

According to these embodiments, the PC 10 included in the listening dialogue sustaining system 100 acquires the user's behavior data from images captured by the abdomen camera 14 of the robot 12 and the monitor camera 22 and from voices collected by the microphone 20. In the first embodiment, a personalized SVM in which the position of the boundary line (hyperplane) is adjusted is constructed based on personal learning samples sampled from the behavior data of a specific user and general learning samples for constructing an SVM. In the second embodiment, the recognition sample is generalized based on personal learning samples sampled from the behavior data of a specific user and general learning samples for constructing the general SVM.

In each embodiment, personal learning samples are sampled from the behavior of a specific user, and the state of that user is recognized using the personal learning samples and the existing general learning samples. The PC 10 can therefore recognize the concentration state of a specific user easily and correctly.

Further, in these embodiments, the composite actions required for learning and recognition can be obtained while keeping the number of types of elemental actions to be acquired small. This reduces the load on the PC 10, which records the user's elemental actions. Furthermore, recognizing the user's concentration state using the presence or absence of the partner's utterance improves the recognition accuracy.

Note that the listening dialogue sustaining system 100 does not necessarily presuppose remote dialogue as in FIG. 1, and can also be used to estimate the states of persons in the same space (room). Referring to FIG. 35, when two users are in the same room, the PC 10 is connected directly to the server 24 without going through the network 200. Referring to FIG. 36, when the two users sit facing each other in the same room, each can see the other's face directly, so the monitor 16 displays nothing. In another embodiment, the PC 10 may fulfill the function of the server 24, so that a single PC 10 recognizes the concentration states of both user A and user B. That is, the behavior data of each user is stored in the single PC 10. However, a robot 12, a speaker 18, a microphone 20 and a monitor camera 22 are installed near each user.

In another embodiment, not only the concentration state but also the utterance state, the object of interest and the like may be recognized by the SVM.

In this embodiment, an SVM is adopted as the method for recognizing the concentration state, but other recognition methods such as the nearest neighbor method may be adopted in other embodiments.

The composite actions may also include control results of the robot 12, gestures of the user's hand and the like. When the control results of the robot 12 are used, the operation history data of the robot 12 is referred to. Gestures of the user's hand are recognized using a process that detects the user's hand.

In addition to the abdomen camera 14 and the monitor camera 22, a third camera that captures the user's face from a position other than that of the monitor 16 or the robot 12 may be installed.

The user's state may also be recognized by the server 24 instead of the PC 10. In this case, the images captured by the abdomen camera 14 and the monitor camera 22 and the voices collected by the microphone 20 are transmitted directly to the server 24.

The user's images and voices may also be transmitted to and received by the PC 10 via a telephone network or the like, without using the PC 10 and the network 200. The PC 10, the monitor 16, the speaker 18, the microphone 20 and the monitor camera 22 may also be incorporated in the same housing. In this case, a videophone call is started with the robot 12 connected to the videophone.

The data of the various programs stored in the memory 30 may be stored on the HDD of a data distribution server and distributed to the PC 10 via communication. A storage medium such as an optical disc on which the data of these programs is stored may also be sold or distributed.

The specific numerical values given in this specification, such as the window width time, the fixed time and the predetermined time threshold, are merely examples and can be changed as appropriate according to needs such as product specifications.

10a, 10b ... PC
12a, 12b ... Robot
14a, 14b ... Abdomen camera
16a, 16b ... Monitor
20a, 20b ... Microphone
22a, 22b ... Monitor camera
24 ... Server
26 ... Processor
30 ... Memory
36 ... Communication LAN board
38 ... Wireless communication device
100 ... Listening dialogue sustaining system
200 ... Network

Claims

A state recognition apparatus comprising an acquisition unit that acquires a user's behavior, and a recognition unit that has a recognition criterion and recognizes a user state from the user's behavior acquired by the acquisition unit,
the state recognition apparatus further comprising: a storage unit that stores a plurality of learning samples; and an adjustment unit that adjusts the recognition criterion based on the plurality of learning samples.

The state recognition apparatus according to claim 1, wherein
the recognition criterion is a boundary in an SVM,
the adjustment unit includes a weight adjustment unit that adjusts the weights of the plurality of learning samples, and
the position of the boundary changes when the weights of the plurality of learning samples are adjusted by the weight adjustment unit.

The state recognition apparatus according to claim 2, further comprising: a provisional recognition unit that provisionally recognizes the user state after the weights are adjusted by the weight adjustment unit; and a recording unit that records the degree of recognition by the provisional recognition unit,
wherein the weight adjustment unit repeats the adjustment of the weights of the plurality of learning samples until the difference between the previous degree of recognition recorded by the recording unit and the current degree of recognition is equal to or less than a predetermined value.

A state recognition device for recognizing a user state of a user, comprising:
a storage unit that stores a plurality of learning samples;
an acquisition unit that acquires the user's behavior;
a creation unit that creates, based on the plurality of learning samples, a weight-adjusted recognition sample from the user's behavior acquired by the acquisition unit; and
a recognition unit that has a recognition criterion and recognizes the user state of the user from the recognition sample created by the creation unit.

The state recognition apparatus according to claim 1, further comprising an elemental action determination unit that determines the presence or absence of each of a plurality of elemental actions of the user,
wherein the user's behavior includes a composite action that combines the presence or absence of the plurality of elemental actions.

The state recognition apparatus according to claim 5, wherein the plurality of elemental actions include the user's utterance, the user's gaze, the user's forward leaning posture, and the user's whisper.

A listening dialogue sustaining system comprising the state recognition device according to claim 6,
further comprising a partner utterance determination unit that determines the presence or absence of an utterance of the conversation partner,
wherein the plurality of elemental actions of one speaker further include the utterance of the other speaker, who is the conversation partner.

A state recognition program causing a processor of a state recognition device, which comprises a storage unit that stores a plurality of learning samples, to function as:
an acquisition unit that acquires a user's behavior;
a recognition unit that has a recognition criterion and recognizes a user state from the user's behavior acquired by the acquisition unit; and
an adjustment unit that adjusts the recognition criterion based on the plurality of learning samples.

A state recognition program causing a processor of a state recognition device, which comprises a storage unit that stores a plurality of learning samples, to function as:
an acquisition unit that acquires a user's behavior;
a creation unit that creates, based on the plurality of learning samples, a weight-adjusted recognition sample from the user's behavior acquired by the acquisition unit; and
a recognition unit that has a recognition criterion and recognizes the user state of the user from the weight-adjusted recognition sample created by the creation unit.

A state recognition method for a state recognition device comprising an acquisition unit that acquires a user's behavior, a recognition unit that has a recognition criterion and recognizes a user state from the user's behavior acquired by the acquisition unit, and a storage unit that stores a plurality of learning samples, the method comprising:
acquiring the user's behavior by the acquisition unit;
adjusting the recognition criterion based on the plurality of learning samples; and
recognizing the user state from the user's behavior acquired by the acquisition unit, by the recognition unit whose recognition criterion has been adjusted.

A state recognition method for a state recognition device comprising a storage unit that stores a plurality of learning samples, the method comprising:
acquiring a user's behavior;
creating, based on the plurality of learning samples, a weight-adjusted recognition sample from the acquired user's behavior; and
recognizing the user state of the user from the weight-adjusted recognition sample, using a recognition criterion.