JP2015191471A - Emotion information estimation device, method, and program

Info

Publication number: JP2015191471A
Application number: JP2014068535A
Authority: JP
Inventors: 建鋒徐; Kenho Jo
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-03-28
Filing date: 2014-03-28
Publication date: 2015-11-02
Anticipated expiration: 2034-03-28
Also published as: JP6172755B2

Abstract

PROBLEM TO BE SOLVED: To estimate, with high accuracy, the emotion information of a person or the like from the action data of that person or the like. SOLUTION: A motion analysis part 1 divides action data into action fragments on a temporal axis and classifies each action fragment to identify which of a plurality of action kinds it corresponds to. A feature amount calculation part 2 calculates a feature amount for each action fragment. A fragment emotion estimation part 3 estimates, on the basis of the calculated feature amount, which of a plurality of pieces of emotion information each action fragment corresponds to. An emotion integration part 4 identifies which of the plurality of pieces of emotion information the action data corresponds to by integrating the emotion information estimated for each action fragment over the whole of the action data. In this integration, weight information for the action kind identified for each action fragment is used.

Description

The present invention relates to an emotion information estimation device, method, and program for estimating, from behavior data of a person or the like, the emotion information of the person or the like performing the behavior.

In recent years, techniques have been developed, for example in Patent Documents 1 to 3, that recognize and estimate a user's emotion from multiple modalities such as speech signals, facial expressions, and behavior (or gestures). In many cases, however, not all of the multimodal information is available. In public spaces in particular, given the shooting angle and distance of surveillance cameras, speech and facial information are difficult to capture; they also contain a large amount of personal information, so they may not be obtainable at all.

Techniques have therefore also been developed to estimate emotion from human behavior data that contains only the motion of the body's joints. For example, Patent Document 4 achieves more accurate emotion recognition by extracting feature amounts from time-series data of whole-body joints, reflecting the characteristics of the motion in greater detail and then applying dimensionality reduction.

A general method for estimating emotion information from human behavior data consisting only of joint motion information comprises the following four procedures, as described, for example, in Non-Patent Document 1.

[Procedure 1] Acquire time-series joint data (human behavior) from a sensor.
[Procedure 2] Divide the behavior data on the time axis into a plurality of small actions.
[Procedure 3] Extract a feature amount for each action, applying dimensionality reduction when necessary.
[Procedure 4] Estimate the emotion information.

Specifically, in Non-Patent Document 2, for example, data of a human activity is divided on the time axis into a plurality of small actions, and the actions are classified by semantic content into a plurality of motion types (sub-activities). A feature amount is calculated for each small action, and emotion information is estimated for each motion type (sub-activity) from the calculated feature amounts. The emotion information of the motion types is then put to a vote, and the emotion information receiving the most votes is output as the estimation result for the activity.
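The equal-weight voting step described above can be sketched as follows. This is only an illustration: the function name `majority_vote` and the emotion labels are hypothetical and do not come from Non-Patent Document 2.

```python
from collections import Counter

def majority_vote(estimates):
    """Return the emotion label that receives the most votes.

    Every estimate counts once, regardless of which motion type produced it.
    Ties go to the label seen first (Counter keeps insertion order).
    """
    counts = Counter(estimates)
    return counts.most_common(1)[0][0]

# Hypothetical per-unit estimates for one activity.
estimates = ["happy", "sad", "sad", "happy", "sad"]
result = majority_vote(estimates)
```

Because each vote has the same weight, an estimate from an uninformative part of the movement influences the result as much as one from an expressive part.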

Patent Document 1: JP 2007-41988 A
Patent Document 2: JP 2010-66844 A
Patent Document 3: JP 2010-134937 A
Patent Document 4: JP 2009-037410 A

Non-Patent Document 1: Karg, M.; Samadani, A.; Gorbet, R.; Kuhnlenz, K.; Hoey, J.; Kulic, D., "Body Movements for Affective Expression: A Survey of Automatic Recognition and Generation," Affective Computing, IEEE Transactions on, in press.
Non-Patent Document 2: Daniel Bernhardt, Peter Robinson, "Detecting affect from non-stylised body motions," Affective Computing and Intelligent Interaction, Lecture Notes in Computer Science Volume 4738, 2007, pp 59-70.

In the above prior art, however, every vote counts equally regardless of the motion type, which can degrade emotion estimation performance. Motion data such as gestures is generally classified into a plurality of motion types (sub-activities), but each motion type has its own role, and the motion types differ in how readily they express emotion.

For example, in a gesture with three motion types, namely a preparation phase, a main phase, and an ending phase, the main phase carries the most emotion information. Even if the estimates for the preparation and ending phases differ from that for the main phase, the main phase is the more reliable source of information for emotion estimation. In the prior art, however, the preparation, main, and ending phases each vote with equal weight, so the higher reliability of the main phase is not reflected in the emotion estimation result.

In view of the above problems of the prior art, an object of the present invention is to provide an emotion information estimation device, method, and program capable of estimating emotion with high accuracy.

To achieve the above object, the present invention is an emotion information estimation device comprising: a motion analysis unit that divides behavior data into motion segments on the time axis and classifies each motion segment to identify which of a plurality of motion types it corresponds to; a feature amount calculation unit that calculates a feature amount from each motion segment; a segment emotion estimation unit that estimates, on the basis of the calculated feature amount, which of a plurality of pieces of emotion information each motion segment corresponds to; and an emotion integration unit that identifies which of the plurality of pieces of emotion information the behavior data corresponds to by integrating the emotion information estimated for each motion segment over the entire behavior data, wherein the emotion integration unit performs the integration using weight information for the motion type identified for each motion segment.

The present invention is also an emotion information estimation method comprising: a motion analysis step of dividing behavior data into motion segments on the time axis and classifying each motion segment to identify which of a plurality of motion types it corresponds to; a feature amount calculation step of calculating a feature amount from each motion segment; a segment emotion estimation step of estimating, on the basis of the calculated feature amount, which of a plurality of pieces of emotion information each motion segment corresponds to; and an emotion integration step of identifying which of the plurality of pieces of emotion information the behavior data corresponds to by integrating the emotion information estimated for each motion segment over the entire behavior data, wherein the emotion integration step performs the integration using weight information for the motion type identified for each motion segment.

Furthermore, the present invention is an emotion information estimation program that causes a computer to function as the above emotion information estimation device.

According to the present invention, the emotion integration process integrates the emotion information estimated for each motion segment over the entire behavior data, using weight information for the motion type identified for each motion segment, and thereby identifies which of the plurality of pieces of emotion information the behavior data corresponds to. Reflecting the weight information of each motion type thus makes it possible to estimate emotion information with high accuracy.

FIG. 1 is a functional block diagram of the emotion information estimation device.
FIG. 2 is an example of a bone structure expressed as a tree structure.
FIG. 3 is a functional block diagram of the motion analysis unit.
FIG. 4 is a diagram conceptually illustrating the data processed by each part of the motion analysis unit.
FIG. 5 is an example of a graph of the calculated speed.
FIG. 6 is a graph of the number of motion segments obtained when the threshold is varied in the example of FIG. 5.
FIG. 7 is a table showing an example of the number of motion types for each kind of behavior data, registered as prior knowledge.
FIG. 8 is a functional block diagram of the segment emotion estimation unit.
FIG. 9 is a functional block diagram of the emotion integration unit.
FIG. 10 is an example of the distribution of emotion information determined for each motion type when there are four kinds of emotion information and four motion types.

FIG. 1 is a functional block diagram of an emotion information estimation device according to one embodiment. The emotion information estimation device 10 includes a motion analysis unit 1, a feature amount calculation unit 2, a segment emotion estimation unit 3, and an emotion integration unit 4. It receives behavior data as input and outputs, as the emotion estimation result, the emotion corresponding to that behavior data. An outline of each unit follows.

The motion analysis unit 1 divides the behavior data along the time series into a plurality of motion segments, labels each motion segment with the motion type, among a plurality of motion types, to which it corresponds, and passes the result to the feature amount calculation unit 2.

The feature amount calculation unit 2 calculates, for each motion segment obtained by the division, a feature amount of the behavior represented by that segment's data, and passes the result to the segment emotion estimation unit 3.

The segment emotion estimation unit 3 uses a classifier built in advance to estimate the emotion information of each motion segment for which a feature amount has been calculated, and passes the result to the emotion integration unit 4. The classifier is trained in advance, by a method described later, to take a feature amount as input and output emotion information.

The emotion integration unit 4 integrates the emotion information estimated for each motion segment over the whole of the input behavior data, using weight information whose values are defined for each motion type by a method described later, and thereby obtains the final output: the estimated emotion information corresponding to the behavior data.
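As a rough illustration of integration with per-motion-type weights, the sketch below sums the weight of each segment's motion type into the score of the emotion estimated for that segment. The function name `weighted_vote`, the phase names, and the weight values are all hypothetical; the way the weights are actually derived is described later in the source and is not reproduced here.

```python
from collections import defaultdict

def weighted_vote(segments, type_weights):
    """Integrate per-segment emotion estimates using motion-type weights.

    segments: list of (motion_type, estimated_emotion) pairs, one per segment.
    type_weights: weight for each motion type (made-up values in this sketch).
    """
    scores = defaultdict(float)
    for motion_type, emotion in segments:
        scores[emotion] += type_weights[motion_type]
    # The emotion with the largest accumulated weight is the overall estimate.
    return max(scores, key=scores.get)

segments = [("preparation", "neutral"), ("main", "angry"), ("ending", "neutral")]
assumed_weights = {"preparation": 0.2, "main": 0.6, "ending": 0.2}  # assumption
uniform_weights = {"preparation": 1.0, "main": 1.0, "ending": 1.0}

weighted_result = weighted_vote(segments, assumed_weights)
uniform_result = weighted_vote(segments, uniform_weights)
```

With uniform weights the two "neutral" votes win; with the assumed weights the single main-phase estimate outweighs them, which is the effect the weight information is meant to achieve.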

The details of each unit are described below, beginning with the behavior data that serves as input.

Behavior data is data on the pose (and center-of-gravity position) of a person or the like as it changes over time. At any single time instant, the behavior data can be defined, on the basis of the human skeleton, in terms of bones and their connection points (joints): one joint is taken as the root, and the structure of bones connected successively from the root via the joints is defined as a tree.

FIG. 2 shows an example of a bone structure expressed as such a tree. In FIG. 2, joint 100 is the waist and is defined as the root. Joint 101 is the left elbow, joint 102 the left wrist, joint 103 the right elbow, joint 104 the right wrist, joint 105 the left knee, joint 106 the left ankle, joint 107 the right knee, and joint 108 the right ankle.
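One simple way to hold such a tree is a parent map keyed by joint number. The sketch below uses the joint numbers listed for FIG. 2; the parent assignments (elbows and knees attached directly to the root) are an assumption, since the figure itself is not reproduced here, and the names `PARENT`, `JOINT_NAME`, and `path_to_root` are illustrative.

```python
# Joint numbers and body parts as listed for FIG. 2.
JOINT_NAME = {
    100: "waist (root)",
    101: "left elbow", 102: "left wrist",
    103: "right elbow", 104: "right wrist",
    105: "left knee", 106: "left ankle",
    107: "right knee", 108: "right ankle",
}

# ASSUMPTION: each limb hangs directly off the root; the real figure may
# contain intermediate joints (for example shoulders or hips).
PARENT = {
    100: None,
    101: 100, 102: 101,
    103: 100, 104: 103,
    105: 100, 106: 105,
    107: 100, 108: 107,
}

def path_to_root(joint):
    """Return the chain of joints from the given joint up to the root."""
    path = []
    while joint is not None:
        path.append(joint)
        joint = PARENT[joint]
    return path

left_wrist_chain = path_to_root(102)
```

The chain from a joint to the root is what a position calculation walks along when it composes the per-joint transforms.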

With this structure, behavior data can be expressed by the angle information, position information, velocity information, acceleration information, and so on of each joint. Angle information is used as the example here.

Angle information data represents a series of human movements as a sequence of poses. It comprises basic pose data, which represents the person's basic (neutral) pose, and frame data, which represents each pose of the actual movement. The basic pose data holds information such as the position of the root and of each joint in the basic pose and the length of each bone; it specifies the basic pose. The frame data represents, for each joint, the amount of movement from the basic pose; angle information, for example, can be used as this amount. Each frame of the frame data specifies a pose obtained by applying the movement amounts to the basic pose, and the sequence of poses specified by successive frames specifies the person's series of movements.

Human motion data can be created by motion capture from camera footage of a person's movement, or by hand using keyframe animation. Although the description above refers to human movement, behavior data can be obtained in the same way for animals, robots, and the like, provided a joint structure is given.

FIG. 3 is a functional block diagram of the motion analysis unit 1. The motion analysis unit 1 includes a physical quantity conversion unit 11, a division unit 12, a normalization unit 13, and a classification unit 14. FIG. 4 conceptually illustrates the data processed by each of these units. An outline of each unit follows.

The physical quantity conversion unit 11 converts the time-series behavior data into time-series frame data (frame data in the form of joint relative positions, as described later) and passes it to the division unit 12. In FIG. 4, data A1 shown in (1) is a conceptual example of the frame data. If the behavior data is already in the frame data format that the physical quantity conversion unit 11 would produce, the physical quantity conversion unit 11 may be omitted.

The division unit 12 divides the frame data into motion segments, each corresponding to an individual short movement, arranged in order along the time series, and passes them to the normalization unit 13. In the example of FIG. 4, the frame data A1 in (1) is divided into 16 motion segments D1 to D16, as shown in (2).

The motion segments obtained by the division generally differ in duration. The normalization unit 13 normalizes the time axis of each segment so that all segments have the same duration, and passes the normalized motion segments to the classification unit 14. This normalization is preprocessing that makes the processing in the classification unit 14 possible.

In the example of FIG. 4, the 16 motion segments D1 to D16 in (2) are normalized into the normalized motion segments N1 to N16 shown in (3). Strictly speaking, normalization stretches or compresses each segment, so the lengths along the horizontal (time) axis differ between (2) and (3); because FIG. 4 is a conceptual example, this stretching is not depicted.

The classification unit 14 determines which of the plurality of motion types each normalized motion segment corresponds to, and passes the classification result to the feature amount calculation unit 2 of FIG. 1.

In the example of FIG. 4, classifying the (normalized) motion segments N1 to N16 of (3) yields the result shown in (4): motion segments N1 to N4 are classified into motion type S1, N5 to N8 into motion type S2, N9 to N12 into motion type S3, and N13 to N16 into motion type S4. In the example of (4), every motion type S1 to S4 consists of consecutive motion segments. Although this is likely in general, it is not always the case: motion segments classified into the same motion type may be separated on the time axis. For example, instead of the classification in (4) of FIG. 4, N1 to N3, N8, and N10 might be classified into motion type S1.

The details of units 11 to 14 are described below.

The physical quantity conversion unit 11 converts the input behavior data into frame data in the form of joint relative positions by calculating where each joint is located relative to the root. The specific calculation is as follows.

The physical quantity conversion unit 11 calculates joint positions from the basic pose data and frame data of the behavior data described above with reference to FIG. 2 and elsewhere. The basic pose data holds the information that specifies the basic pose, such as the position of the root and of each joint in the basic pose and the length of each bone. The frame data holds, for each joint, the amount of movement from the basic pose. Here, angle information is used as the movement amount. In this case, the position (x, y, z coordinates) p^k(t) of the k-th joint at time t is calculated by equations (1) and (2). The time t is the time of the frame data; in the following description, t is treated simply as a frame index, taking the values t = 0, 1, 2, ..., T-1, where T is the number of frames in the behavior data.

Here, the 0th joint (i = 0) is the root. R_axis^{i-1,i}(t) is the coordinate rotation matrix between the i-th joint and its parent joint (the (i-1)-th joint) and is included in the basic pose data. A local coordinate system is defined for each joint, and the coordinate rotation matrix expresses the correspondence between the local coordinate systems of joints in a parent-child relationship. R^i(t) is the rotation matrix of the i-th joint in the i-th joint's local coordinate system; it is the angle information included in the frame data. T^i(t) is the transition matrix between the i-th joint and its parent joint and is included in the basic pose data. The transition matrix represents the length of the bone between the i-th joint and its parent joint.
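Equations (1) and (2) are not reproduced in this text, so the following is only a generic forward-kinematics sketch built from the three ingredients named above (a coordinate rotation between parent and child frames, a per-joint rotation, and a bone offset). The composition order, the use of a 3-vector bone offset in place of a transition matrix, and all function names are assumptions and may differ from the source's equations.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def joint_position(chain, root_position):
    """Position of the last joint in a chain that starts below the root.

    chain: list of (axis_rotation, joint_rotation, bone_offset) tuples, one per
    joint on the path from the root's child down to the target joint.
    bone_offset is the bone vector expressed in the parent's local frame.
    """
    world_rotation = IDENTITY  # orientation accumulated down the chain
    position = list(root_position)
    for axis_rotation, joint_rotation, bone_offset in chain:
        # Step along the bone in the parent's current orientation.
        step = matvec(world_rotation, bone_offset)
        position = [p + s for p, s in zip(position, step)]
        # Then update the orientation with this joint's own rotations.
        world_rotation = matmul(matmul(world_rotation, axis_rotation), joint_rotation)
    return position

# Two unit-length bones along x; the first joint is bent 90 degrees about z.
chain = [(IDENTITY, rot_z(math.pi / 2), [1.0, 0.0, 0.0]),
         (IDENTITY, IDENTITY, [1.0, 0.0, 0.0])]
tip = joint_position(chain, [0.0, 0.0, 0.0])
```

In this toy chain the first joint sits one unit along x, and the bend at that joint turns the second bone toward y, so the tip ends up at approximately (1, 1, 0).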

Next, the physical quantity conversion unit 11 calculates the position p'^k(t) of the k-th joint relative to the root (the joint relative position) at time t by equation (3).

Here, p^root(t) is the position p^0(t) of the root (the 0th joint) at time t.

In this way, the physical quantity conversion unit 11 defines, as the data representing the frame at each time t, the positions x(t) of the joints relative to the root as in equation (4), and passes this to the division unit 12 as the final converted frame data. K is the number of joints excluding the root.
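Equations (3) and (4) are not reproduced in this text. Read from the surrounding definitions, the relative position is the joint position minus the root position, and x(t) collects the K relative positions of one frame. The sketch below follows that reading; the flat concatenation order inside x(t) and the function name `relative_frame` are assumptions.

```python
def relative_frame(joint_positions):
    """Build x(t) for one frame from absolute joint positions.

    joint_positions: list of (x, y, z) tuples; index 0 is the root.
    Returns a flat list of length 3*K holding p'^1(t), ..., p'^K(t),
    where p'^k(t) = p^k(t) - p^root(t).
    """
    root = joint_positions[0]
    x = []
    for p in joint_positions[1:]:
        x.extend(c - r for c, r in zip(p, root))
    return x

# Root at (1, 1, 1) and two other joints (K = 2).
frame = relative_frame([(1, 1, 1), (2, 1, 1), (1, 3, 1)])
```

Subtracting the root removes the person's overall translation, so the later speed computation reflects limb movement rather than movement through the room.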

On the basis of the converted frame data, the division unit 12 identifies the times at which motion boundaries occur and divides the frame data at those boundaries into a plurality of motion segments. A motion boundary is judged to exist where the speed crosses a threshold TH. That is, two kinds of instant are judged to be motion boundaries: (1) the instant at which the speed, increasing, moves from below TH to above TH, and (2) conversely, the instant at which the speed, decreasing, moves from above TH to below TH.

The speed v(t) is calculated by equation (5) as the absolute value of the derivative of the relative position x(t) of equation (4). FIG. 5 shows an example of a graph of the calculated speed.
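The speed computation and the threshold-crossing rule can be sketched as below. Equation (5) is not reproduced in this text, so the forward difference and the Euclidean norm used for v(t) are assumptions, as are the function names and the choice of which frame index a crossing is assigned to.

```python
import math

def speeds(frames):
    """v(t) as the norm of the difference between consecutive frames x(t)."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]

def boundaries(v, th):
    """Indices t at which the speed crosses th between t-1 and t.

    Both directions count: rising from below th to above it, and falling
    from above th to below it.
    """
    return [t for t in range(1, len(v)) if (v[t - 1] < th) != (v[t] < th)]

def split(frames, cuts):
    """Cut a frame sequence into consecutive segments at the given indices."""
    edges = [0] + list(cuts) + [len(frames)]
    return [frames[a:b] for a, b in zip(edges, edges[1:])]

# A toy speed curve: slow, fast, then slow again.
v = [0.1, 0.2, 0.9, 1.0, 0.3, 0.1]
cuts = boundaries(v, 0.5)
segments = split(list(range(7)), cuts)
```

Here the speed rises through 0.5 once and falls through it once, so the seven frames are cut into three segments.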

The threshold TH can be set as follows. When TH is varied in small steps from a minimum value TH_low to a maximum value TH_high, the number of motion segments obtained changes with each value of TH. The threshold that yields the largest number of segments is used as the optimum threshold TH_opt.

FIG. 6 is a graph of the number of motion segments obtained as the threshold TH is varied, for the speed graph of FIG. 5. In the example of FIG. 6, the number of motion segments reaches its maximum of 55 at TH = 0.8, so TH = 0.8 is adopted as the optimum threshold.
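The threshold sweep can be sketched as a simple grid search. The step count, the tie-breaking rule (the lowest threshold wins), and the names `count_segments` and `best_threshold` are assumptions; the speed values are made up and are not the data of FIG. 5.

```python
def count_segments(v, th):
    """Number of segments produced by cutting at every crossing of th."""
    crossings = sum(1 for a, b in zip(v, v[1:]) if (a < th) != (b < th))
    return crossings + 1

def best_threshold(v, th_low, th_high, steps):
    """Sweep th from th_low to th_high and keep the one giving most segments."""
    best_th, best_n = th_low, -1
    for i in range(steps + 1):
        th = th_low + (th_high - th_low) * i / steps
        n = count_segments(v, th)
        if n > best_n:  # strict comparison keeps the lowest threshold on ties
            best_th, best_n = th, n
    return best_th, best_n

# Made-up speed curve with peaks of different heights.
v = [0.15, 0.95, 0.25, 0.85, 0.15, 0.35, 0.15]
th_opt, n_segments = best_threshold(v, 0.0, 1.0, 10)
```

A threshold that is too low or too high sits below or above most of the curve and is rarely crossed, which is why the segment count peaks at an intermediate value.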

The division unit 12 passes the motion segments obtained by dividing at the optimum threshold TH_opt to the normalization unit 13 as its final result.

The normalization unit 13 normalizes the time axis by interpolating each motion segment to a fixed number of frames (for example, 25 frames), separately for the relative position of each joint, and passes the normalized motion segments to the classification unit 14. Cubic spline interpolation, for example, is a suitable interpolation method.
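A minimal sketch of this resampling for a single coordinate track is given below, using a natural cubic spline on unit-spaced frames. The source only names cubic spline interpolation; the natural boundary condition (zero second derivative at both ends), the function names, and the requirement of at least two input frames are assumptions.

```python
def natural_cubic_spline(ys):
    """Return s(x) interpolating the points (i, ys[i]) on unit-spaced knots."""
    n = len(ys) - 1
    if n < 1:
        raise ValueError("need at least two samples")
    # Right-hand side of the tridiagonal system for the coefficients c.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - 2.0 * ys[i] + ys[i - 1])
    # Forward sweep (natural boundary: c[0] = c[n] = 0).
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        pivot = 4.0 - mu[i - 1]
        mu[i] = 1.0 / pivot
        z[i] = (alpha[i] - z[i - 1]) / pivot
    # Back substitution for the per-interval coefficients b, c, d.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) - (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / 3.0

    def s(x):
        j = min(int(x), n - 1)  # interval index; the last knot uses the last interval
        dx = x - j
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))
    return s

def resample(ys, num_frames=25):
    """Resample one coordinate track of a motion segment to num_frames samples."""
    s = natural_cubic_spline(ys)
    n = len(ys) - 1
    return [s(m * n / (num_frames - 1)) for m in range(num_frames)]

# One coordinate of one joint over a 5-frame segment, stretched to 25 frames.
track = resample([0.0, 1.0, 4.0, 9.0, 16.0], 25)
```

Applying `resample` to every coordinate of every joint gives all segments the same length, so each can be flattened into a vector of identical dimension for classification.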

The classification unit 14 treats each normalized motion segment as a single vector made up of all the joint relative position data of all its frames, classifies each motion segment into one of the motion types, and passes the result to the feature amount calculation unit 2. The classification can be carried out in any of the following embodiments.

In one embodiment, the k-means method is used to classify each segment into a motion type. With k-means, however, the number of motion types, which serves as the number of clusters, must be given as a fixed value. In general, the number of motion types contained in a behavior varies with the kind of human behavior. The number of motion types for each kind of behavior data may therefore be registered in advance as prior knowledge, as in the table of FIG. 7, and the number of motion types set from the registered information.

In the example of FIG. 7, three kinds of behavior are given, "knocking on a door", "walking", and "sitting on a chair", and the numbers of motion types making up these behaviors are given as prior knowledge as 4, 4, and 3, respectively.
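The lookup of k from prior knowledge followed by k-means can be sketched as follows. The table values are those given for FIG. 7; the dictionary keys, the deterministic initialization, the fixed iteration count, and the function name `kmeans` are assumptions made to keep the example small and reproducible.

```python
import math

# Number of motion types per kind of behavior, as in the FIG. 7 example.
MOTION_TYPE_COUNT = {"door knock": 4, "walking": 4, "sitting on a chair": 3}

def kmeans(vectors, k, iters=50):
    """Plain k-means; returns one cluster label per input vector.

    ASSUMPTION: initial centers are taken from evenly spaced input vectors
    instead of a random draw, so the result is deterministic.
    """
    centers = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: the nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                  for v in vectors]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

k_for_sitting = MOTION_TYPE_COUNT["sitting on a chair"]

# Toy 2-D vectors standing in for flattened motion segments.
toy = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1), (5, 5.1)]
toy_labels = kmeans(toy, 2)
```

In the actual setting each vector would hold every joint coordinate of every frame of a normalized segment, and the resulting cluster label would serve as the segment's motion type.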

The behavior data input to the present invention preferably consists of a single kind of behavior, as in the example of FIG. 7. For example, given behavior data containing all three kinds in FIG. 7, such as "knocking on a door, then walking and sitting on a chair", the data is preferably split in advance into its "knocking on a door", "walking", and "sitting on a chair" parts, each of which is then used as input. When the number of clusters k is determined from the prior knowledge, information identifying the kind of behavior in the behavior data must also be supplied as input to the emotion information estimation device 10, in addition to the prior knowledge of FIG. 7, so that the prior knowledge can be applied.

In another embodiment, when no such prior knowledge is given, the number k of motion types into which segments are classified may be determined by the rule of thumb of equation (6), and k-means then applied. Here, n is the number of motion segments produced by the division unit 12.
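Equation (6) is not reproduced in this text. A widely cited rule of thumb for the number of clusters is k ≈ sqrt(n / 2); the sketch below assumes that form, and both the formula and the rounding are assumptions rather than a confirmed reading of equation (6).

```python
import math

def rule_of_thumb_k(n):
    """Cluster count from the number of motion segments n.

    ASSUMPTION: k is taken as sqrt(n / 2), rounded to the nearest integer
    and never less than 1.
    """
    return max(1, round(math.sqrt(n / 2)))

k_for_55 = rule_of_thumb_k(55)
```

Under this assumption, the 55 segments of the FIG. 6 example would give k = 5.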

In yet another embodiment, when no such prior knowledge is given, the segments may be classified with a hierarchical cluster tree instead of k-means. A hierarchical cluster tree can classify the segments naturally into well-separated clusters even when the number of types k is not given as prior knowledge. The inconsistency coefficient of a link in the cluster tree indicates an abrupt change in similarity between motion segments, so a natural classification can be obtained by referring to this coefficient.

The motion segments obtained by the motion analysis unit 1 are normalized as described above, so the motion segments handled by the subsequent units 2, 3, and 4 are also normalized. To simplify the description, the qualifier "normalized" is omitted below for the motion segments handled by units 2, 3, and 4, and likewise for the motion segments handled by the learning motion analysis unit 31 and subsequent units in FIG. 8, described later.

The feature amount calculation unit 2 calculates, for each motion segment obtained from the motion analysis unit 1, a feature amount representing the characteristics of its movement, and passes the result to the segment emotion estimation unit 3. Feature amounts can be defined in many ways; for example, the four types given in equations (7-1) to (7-4) below can be used. For the "specific joint h," any combination of one or more joints can be used in each equation. The feature amount finally passed to the segment emotion estimation unit 3 may combine all four types or any subset of them. Other types of feature amounts may also be added. In equations (7-1) to (7-4), N is the number of frames belonging to the motion segment (that is, its temporal length).

FIG. 8 is a functional block diagram of the segment emotion estimation unit 3. The segment emotion estimation unit 3 includes an estimation unit 30, a learning motion analysis unit 31, a learning feature amount calculation unit 32, and a learning unit 33.

The estimation unit 30 applies a classifier built by prior learning to the feature amount of each motion segment calculated by the feature amount calculation unit 2, estimates as emotion information which of a predetermined set of emotions each motion segment corresponds to, and passes the emotion information for each motion segment to the emotion integration unit 4.

The units 31, 32, and 33 build the classifier by prior learning so that the estimation unit 30 can perform this estimation; their details are as follows. A support vector machine, for example, can be used as the classifier.

The learning motion analysis unit 31 and the learning feature amount calculation unit 32 perform the same processing as the motion analysis unit 1 and the feature amount calculation unit 2, respectively, so their description is omitted. The data they process, however, is learning behavior data prepared in advance for many people (when the behavior data concerns humans). Each item of learning behavior data is labeled, manually or otherwise, with emotion information indicating which of the plurality of emotion types it corresponds to; these are the same types that the estimation unit 30 estimates.

Thus, for each item of learning behavior data, the learning motion analysis unit 31 divides the data into normalized motion segments and identifies which motion type each segment corresponds to, the learning feature amount calculation unit 32 calculates a feature amount for each segment, and the learning unit 33 receives these results. Because the motion analysis unit 1 and the learning motion analysis unit 31 perform the same processing, they use a common value for the number k of motion types.

The learning unit 33 builds the classifier by learning from training data that pairs the feature amount of each of the many motion segments obtained as above with the emotion information given as a label to the behavior data from which the segment came. If the classifier is a well-known (multi-class) support vector machine, its parameters are determined by the learning. This classifier enables the estimation unit 30 to estimate emotion information as described above.
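
The pairing of segment features with behavior-level labels can be sketched as follows. This is an illustrative sketch only: the function name `build_training_pairs`, the input layout, and the example values are hypothetical and not taken from the patent.

```python
from typing import List, Sequence, Tuple

# One labeled behavior: (feature vectors of its motion segments, emotion label).
LabeledBehavior = Tuple[Sequence[Sequence[float]], str]


def build_training_pairs(
    labeled_behaviors: Sequence[LabeledBehavior],
) -> Tuple[List[List[float]], List[str]]:
    """Flatten labeled behaviors into per-segment training samples."""
    features: List[List[float]] = []
    labels: List[str] = []
    for segment_features, emotion_label in labeled_behaviors:
        for vector in segment_features:
            features.append(list(vector))
            # Every segment inherits the label of the behavior it came from.
            labels.append(emotion_label)
    return features, labels


# Hypothetical example: two behaviors, three segments in total.
X, y = build_training_pairs([
    ([[0.1, 0.2], [0.3, 0.4]], "joy"),
    ([[0.5, 0.6]], "anger"),
])
print(len(X), y)  # prints: 3 ['joy', 'joy', 'anger']
```

The resulting `X` and `y` could then be handed to any multi-class classifier; for instance, scikit-learn's `sklearn.svm.SVC().fit(X, y)` would be one possible choice of support vector machine.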

FIG. 9 is a functional block diagram of the emotion integration unit 4. The emotion integration unit 4 includes an integration unit 40 and a weight estimation unit 41.

Using the weight information estimated by the weight estimation unit 41 (described later), the integration unit 40 integrates the emotion information estimated for each motion segment by the segment emotion estimation unit 3 over all motion segments of the behavior data input to the emotion information estimation device 10, and thereby determines which emotion information the behavior data corresponds to.

Weight information is used in this integration. The weight information is a weight w(i) for each motion type i (i = 1, 2, ..., k, where k is the number of classes used by the classification unit 14) identified for the segments whose emotion information is being integrated. Among the series of motion segments obtained by dividing the behavior data in the motion analysis unit 1, let N(i,j) be the number of segments whose emotion information estimated by the segment emotion estimation unit 3 is emotion j (j = 1, 2, ..., M, where M is the number of labels assigned in the learning data) and whose motion type classified by the motion analysis unit 1 is motion type i, and let L(i,j) be the likelihood calculated statistically from N(i,j). The emotion j = j_{[estimation result]} that the integration unit 40 outputs as the final result for the behavior data is then given by equation (8) below. Various known methods can be used to calculate the likelihood L(i,j) from the count N(i,j).

Thus, in the present invention, votes for emotion j are cast taking into account the weight w(i) of each motion type i, and the emotion with the maximum likelihood is selected, so emotion can be estimated accurately for the behavior data.
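
Equation (8) is not reproduced in this text, so the following is only a sketch of weighted voting under stated assumptions: the likelihood L(i,j) is taken to be the relative frequency N(i,j) / Σ_j N(i,j) within motion type i, and the result is the emotion j that maximizes Σ_i w(i) L(i,j). Both choices, and the function name `integrate_emotions`, are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def integrate_emotions(
    segments: Sequence[Tuple[int, str]],
    weights: Dict[int, float],
) -> str:
    """Weighted vote over (motion_type, estimated_emotion) pairs."""
    if not segments:
        raise ValueError("at least one motion segment is required")
    counts = Counter(segments)                       # N(i, j)
    per_type = Counter(i for i, _ in segments)       # sum over j of N(i, j)
    scores: Dict[str, float] = {}
    for (i, j), n_ij in counts.items():
        likelihood = n_ij / per_type[i]              # assumed form of L(i, j)
        scores[j] = scores.get(j, 0.0) + weights.get(i, 0.0) * likelihood
    return max(scores, key=lambda j: scores[j])


# Hypothetical segments; the weights reuse the 0.3, 0.4, 0.2, 0.1 example.
segments = [(1, "joy"), (1, "joy"), (2, "anger"), (2, "anger"),
            (2, "joy"), (3, "anger"), (4, "sorrow")]
weights = {1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1}
print(integrate_emotions(segments, weights))  # prints: anger
```

In this example an unweighted majority vote would tie between "joy" and "anger" (three segments each); the weights break the tie in favor of "anger" (score about 0.467 versus 0.433).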

The weight estimation unit 41 estimates the weight information w(i) that the integration unit 40 refers to when performing the integration. The first to fourth embodiments below are possible for this estimation.

In the first embodiment, the weight w(i) is set for each motion type i (i = 1, 2, ..., k) classified by the motion analysis unit 1, based on prior knowledge or user input. For example, weights of 0.3, 0.4, 0.2, and 0.1 are set for four motion types, respectively.

In the second embodiment, the weight w(i) is calculated by statistically processing, as follows, the emotion information estimated by the segment emotion estimation unit 3 for all motion segments contained in the input behavior data. FIG. 10 shows an example of the distribution of emotion information j for each motion type i when there are four types of emotion information (for example, "joy," "anger," "sorrow," and "pleasure") and four motion types (for example, "preparation stage," "first-half main stage," "second-half main stage," and "ending stage"). From such a distribution, the entropy h(i) is calculated by equation (9) below.

Here, p_i(j) is the probability of emotion j in motion type i, determined from the distribution described above. The weight w(i) for motion type i is then calculated by equation (10) below.
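
Equations (9) and (10) are not reproduced in this text. The sketch below computes the standard Shannon entropy h(i) = -Σ_j p_i(j) log p_i(j) from per-type emotion counts, and then derives weights under an explicit assumption: motion types with lower entropy (more decisive emotion distributions) receive larger weights, proportional to log M - h(i) and normalized to sum to 1. This mapping, the function names, and the counts are hypothetical and may differ from equation (10).

```python
import math
from typing import Dict, Sequence


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (natural log) of an emotion count distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def entropy_weights(distribution: Dict[int, Sequence[int]]) -> Dict[int, float]:
    """Map motion type i -> w(i), assuming w(i) proportional to log M - h(i)."""
    num_emotions = len(next(iter(distribution.values())))
    h_max = math.log(num_emotions)
    raw = {i: max(0.0, h_max - entropy(c)) for i, c in distribution.items()}
    total = sum(raw.values())
    if total == 0.0:
        # All types are maximally uncertain: fall back to uniform weights.
        return {i: 1.0 / len(raw) for i in raw}
    return {i: r / total for i, r in raw.items()}


# Hypothetical counts of four emotions for four motion types.
dist = {1: [10, 0, 0, 0], 2: [5, 5, 0, 0], 3: [3, 3, 2, 2], 4: [4, 4, 4, 4]}
for motion_type, w in entropy_weights(dist).items():
    print(motion_type, round(w, 3))
```

Here motion type 1, whose segments all share one emotion, receives the largest weight, while type 4, whose emotions are uniformly spread, receives a weight of essentially zero.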

In the third embodiment, the same calculation as in the second embodiment is performed not on the behavior data that is input to the emotion information estimation device 10 for emotion estimation, but on the plural items of behavior data used as learning data when the learning unit 33 built the classifier. As additional processing, the estimation unit 30 performs its estimation on the series of motion segments, each with its feature amount, obtained by dividing each item of learning behavior data. This yields emotion information for the motion segments in the learning data, from which the distribution to be processed is obtained.

That is, in the additional processing of the third embodiment, the estimation unit 30 applies the classifier to the motion segments of the very learning data from which that classifier was built.

In the fourth embodiment, the calculation is based on the learning data as in the third embodiment. After the estimation unit 30 has estimated emotion information for the motion segments in the learning data as in the third embodiment, the following processing is added. The estimated emotion information for the learning data is compared with the emotion information given in advance to the learning data as correct labels, and the accuracy rate c(i) is obtained for each motion type i. The accuracy rate is then normalized by equation (11) below to obtain the weight w(i).
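
Equation (11) is not reproduced in this text. As a sketch, the snippet below computes the per-type accuracy rate c(i) and assumes the normalization w(i) = c(i) / Σ_i c(i), so that the weights sum to 1; this form, the function name `accuracy_weights`, and the sample data are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def accuracy_weights(
    samples: Sequence[Tuple[int, str, str]],
) -> Dict[int, float]:
    """samples: (motion_type, estimated_emotion, correct_label) per segment."""
    total: Counter = Counter()
    correct: Counter = Counter()
    for motion_type, estimated, label in samples:
        total[motion_type] += 1
        correct[motion_type] += int(estimated == label)
    rates = {i: correct[i] / total[i] for i in total}      # c(i)
    norm = sum(rates.values())
    if norm == 0.0:
        # No type was ever correct: fall back to uniform weights.
        return {i: 1.0 / len(rates) for i in rates}
    return {i: c / norm for i, c in rates.items()}         # assumed w(i)


# Hypothetical results on learning data for three motion types.
samples = [(1, "joy", "joy"), (1, "anger", "anger"),
           (2, "joy", "joy"), (2, "joy", "sorrow"),
           (3, "anger", "joy")]
print(accuracy_weights(samples))
```

With these samples the accuracy rates are 1.0, 0.5, and 0.0, so the weights come out as roughly 0.667, 0.333, and 0.0.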

As described above, according to the present invention, the use of the weights w(i) makes it possible to estimate emotion information from behavior data with high accuracy. The present invention can also be provided as a program that causes a computer composed of well-known hardware such as a CPU (central processing unit), memory, and input/output interfaces to function as each unit (or all units) of the emotion information estimation device 10, and as a method of operating the emotion information estimation device 10. When a computer functions as the units of the emotion information estimation device 10, the CPU operates according to predetermined instructions that implement the function of each unit, and the data to be referenced is stored in the memory.

10 ... Emotion information estimation device, 1 ... Motion analysis unit, 2 ... Feature amount calculation unit, 3 ... Segment emotion estimation unit, 4 ... Emotion integration unit

Claims

An emotion information estimation device comprising:
a motion analysis unit that divides behavior data into motion segments along the time axis and classifies each motion segment to identify which of a plurality of motion types it corresponds to;
a feature amount calculation unit that calculates a feature amount from each motion segment;
a segment emotion estimation unit that estimates, based on the calculated feature amount, which of a plurality of pieces of emotion information each motion segment corresponds to; and
an emotion integration unit that identifies which of the plurality of pieces of emotion information the behavior data corresponds to by integrating the emotion information estimated for each motion segment over the entire behavior data,
wherein the emotion integration unit performs the integration using weight information for the motion type identified for each motion segment.

The emotion information estimation device according to claim 1, wherein the weight information is set in advance by assigning a predetermined weight value to each motion type.

The emotion information estimation device, wherein the emotion integration unit obtains a distribution of the emotion information of the motion segments for each motion type in the behavior data, and determines the weight information for each motion type based on the distribution.

The emotion information estimation device according to claim 3, wherein the emotion integration unit obtains, from the distribution, the probability of each piece of emotion information in each motion type, obtains the entropy of each motion type from the probabilities, and obtains the weight information for each motion type based on the entropy.

The emotion information estimation device according to claim 1, wherein the segment emotion estimation unit estimates, based on the calculated feature amount, which of the plurality of pieces of emotion information each motion segment in the behavior data corresponds to, by using a classifier constructed by learning from feature amounts calculated from a series of motion segments obtained from learning behavior data and from emotion information given in advance as labels to the learning behavior data.

The emotion information estimation device according to claim 5, wherein the emotion integration unit identifies which of the plurality of motion types each of the series of motion segments obtained from the learning behavior data corresponds to, obtains, for each motion type, a distribution of the emotion information obtained by applying the classifier to that series of motion segments, and obtains the weight information for each motion type based on the distribution.

The emotion information estimation device according to claim 6, wherein the emotion integration unit compares, for each motion type, the emotion information of the motion segments obtained by applying the classifier to the learning behavior data with the emotion information given in advance as labels to the learning behavior data, thereby obtains an accuracy rate of emotion estimation for each motion type, and obtains the weight information for each motion type based on the accuracy rate.

An emotion information estimation method comprising:
a motion analysis stage of dividing behavior data into motion segments along the time axis and classifying each motion segment to identify which of a plurality of motion types it corresponds to;
a feature amount calculation stage of calculating a feature amount from each motion segment;
a segment emotion estimation stage of estimating, based on the calculated feature amount, which of a plurality of pieces of emotion information each motion segment corresponds to; and
an emotion integration stage of identifying which of the plurality of pieces of emotion information the behavior data corresponds to by integrating the emotion information estimated for each motion segment over the entire behavior data,
wherein the emotion integration stage performs the integration using weight information for the motion type identified for each motion segment.

An emotion information estimation program for causing a computer to function as the emotion information estimation device according to any one of claims 1 to 7.