JP2023182085A - Behavior recognition device, learning device, and behavior recognition method

Info

Publication number: JP2023182085A
Application number: JP2022095486A
Authority: JP
Inventors: 敦根尾; Atsushi Neo
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2022-06-14
Filing date: 2022-06-14
Publication date: 2023-12-26
Also published as: WO2023243130A1

Abstract

PROBLEM TO BE SOLVED: To reduce storage capacity and speed up behavior recognition when accurately recognizing multiple kinds of behaviors of a recognition object whose shape is partially missing.

SOLUTION: A behavior recognition device can access a behavior classification model trained by using a behavior of a learning object and a component group about the learning object, the component group being obtained from a shape of the learning object by component analysis that generates statistical components by multivariate analysis. The device detects the shape of the recognition object from analysis object data, generates missing position information showing the position of a missing place in the shape of the recognition object, interpolates the missing place from non-missing information, and updates the interpolated non-missing information as the shape of the recognition object. By component analysis, it generates, on the basis of the interpolated shape of the recognition object, the same number of component groups about the recognition object as there are component groups about the learning object, and it outputs a recognition result showing a behavior of the recognition object by inputting the generated component group of the recognition object and the missing position information to the behavior classification model.

SELECTED DRAWING: Figure 1

Description

本発明は、行動認識装置、学習装置、および行動認識方法に関する。 The present invention relates to a behavior recognition device, a learning device, and a behavior recognition method.

As background art in this technical field, Patent Document 1 discloses a behavior recognition device that recognizes multiple types of behaviors of a recognition target with high accuracy. This behavior recognition device can access a group of behavior classification models, each trained for a different component group, using the behavior of a learning target and a component group obtained from the shape of the learning target by component analysis that generates statistical components through multivariate analysis. The device detects the shape of the recognition target from the data to be analyzed; generates, by component analysis, one or more components and the contribution rate of each component based on the shape of the recognition target; determines ordinal numbers indicating the dimensions of the one or more components based on the cumulative contribution rate obtained from the contribution rates; selects, from the behavior classification model group, a specific behavior classification model trained on the same component group as a specific component group that includes one or more components of the determined ordinal numbers; and outputs a recognition result indicating the behavior of the recognition target by inputting the specific component group to the specific behavior classification model.

Patent Document 1: Japanese Patent Application Publication No. 2022-43974

However, Patent Document 1 described above is a technology that recognizes behavior by selecting a specific behavior classification model from a group of behavior classification models; it does not consider recognizing behavior by making full use of a single behavior classification model.

本発明は、形状の一部が欠損した認識対象の複数種類の行動を高精度に認識する場合に、記憶容量の低減および行動認識の高速化を図ることを目的とする。 An object of the present invention is to reduce memory capacity and speed up behavior recognition when recognizing multiple types of behaviors of recognition targets whose shapes are partially missing with high accuracy.

A behavior recognition device that is one aspect of the invention disclosed in this application includes a processor that executes a program and a storage device that stores the program. The device can access a behavior classification model trained using the behavior of a learning target and a component group related to the learning target, the component group being obtained from the shape of the learning target by component analysis that generates statistical components through multivariate analysis. The processor executes: a detection process of detecting the shape of a recognition target from data to be analyzed; a missing position information generation process of generating missing position information indicating the position of a missing part in the shape of the recognition target detected by the detection process; an interpolation process of interpolating the missing part from non-missing information, which is the part of the shape of the recognition target other than the missing part, and updating the interpolated non-missing information as the shape of the recognition target; a component analysis process of generating, by the component analysis and based on the shape of the recognition target interpolated by the interpolation process, the same number of component groups related to the recognition target as component groups related to the learning target; and a behavior recognition process of outputting a recognition result indicating the behavior of the recognition target by inputting, to the behavior classification model, the component group related to the recognition target generated by the component analysis process and the missing position information.

A learning device that is one aspect of the invention disclosed in this application includes a processor that executes a program and a storage device that stores the program. The processor executes: an acquisition process of acquiring teacher data including the shape and behavior of a learning target; a loss process of removing part of the shape of the learning target acquired by the acquisition process; a missing position information generation process of generating missing position information indicating the position of the missing part removed from the shape of the learning target by the loss process; an interpolation process of interpolating from non-missing information, which is the part of the shape of the learning target other than the missing part removed by the loss process, and updating the interpolated non-missing information as the shape of the learning target; a component analysis process of generating a component group related to the learning target, based on the shape of the learning target interpolated by the interpolation process, by component analysis that generates statistical components through multivariate analysis; and a behavior learning process of learning the behavior of the learning target based on the component group related to the learning target generated by the component analysis process, the behavior of the learning target, and the missing position information, thereby generating a behavior classification model that classifies the behavior of the learning target.

According to a representative embodiment of the present invention, it is possible to reduce storage capacity and speed up behavior recognition when recognizing, with high accuracy, multiple types of behaviors of a recognition target whose shape is partially missing. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is an explanatory diagram showing an example of a system configuration of a behavior recognition system according to the first embodiment.
FIG. 2 is a block diagram showing an example of the hardware configuration of a computer.
FIG. 3 is an explanatory diagram showing an example of learning data.
FIG. 4 is a block diagram showing an example of the functional configuration of the behavior recognition system according to the first embodiment.
FIG. 5 is a block diagram showing a detailed functional configuration example of the skeleton information processing unit.
FIG. 6 is an explanatory diagram showing a detailed method for calculating joint angles executed by the joint angle calculation unit.
FIG. 7 is an explanatory diagram showing an example of a detailed method for calculating the amount of movement between frames executed by the movement amount calculation unit.
FIG. 8 is an explanatory diagram showing a detailed method of normalizing skeleton information executed by the normalization unit.
FIG. 9 is an explanatory diagram showing a detailed example of the teacher signal held by the teacher signal DB.
FIG. 10 is an explanatory diagram showing an example in which principal components generated by the principal component analysis unit, using a teacher signal as input data, are plotted in a principal component space.
FIG. 11 is an explanatory diagram showing data that the principal component analysis unit and the missing position information generation unit output to the behavior learning unit.
FIG. 12 is an explanatory diagram showing a detailed method by which the behavior learning unit learns behaviors and the behavior recognition unit classifies behaviors.
FIG. 13 is a graph showing changes in the cumulative contribution rate used by the principal component analysis unit when determining the number of dimensions.
FIG. 14 is a flowchart showing a detailed processing procedure example of learning processing by the server (learning device) according to the first embodiment.
FIG. 15 is a flowchart showing a detailed processing procedure example of skeleton information processing according to the first embodiment.
FIG. 16 is a flowchart showing an example of a behavior recognition processing procedure by a client (behavior recognition device) according to the first embodiment.
FIG. 17 is a block diagram showing an example of the functional configuration of the behavior recognition system according to the second embodiment.
FIG. 18 is an explanatory diagram showing a detailed example of the teacher signal held by the teacher signal DB according to the third embodiment.
FIG. 19 is an explanatory diagram showing data that the principal component analysis unit and the missing position information generation unit according to the third embodiment output to the behavior learning unit.
FIG. 20 is a block diagram showing an example of the functional configuration of the behavior recognition system according to the third embodiment.
FIG. 21 is a flowchart showing an example of a behavior recognition processing procedure by a client (behavior recognition device) according to the third embodiment.
FIG. 22 is a block diagram showing an example of the functional configuration of the skeleton information processing unit according to the fourth embodiment.
FIG. 23 is a flowchart showing a detailed processing procedure example of the skeleton information processing unit according to the fourth embodiment.
FIG. 24 is a block diagram showing an example of the functional configuration of the behavior recognition system according to the fifth embodiment.
FIG. 25 is a block diagram showing an example of the functional configuration of the behavior recognition system according to the sixth embodiment.
FIG. 26 is an explanatory diagram showing a decision tree, which is the underlying method by which the behavior learning unit and the behavior recognition unit classify behaviors.
FIG. 27 is an explanatory diagram showing a detailed method for developing classification using a decision tree.
FIG. 28 is an explanatory diagram showing ensemble learning and the method used by the behavior learning unit and the behavior recognition unit to classify behaviors.
FIG. 29 is a block diagram showing an example of the functional configuration of the behavior recognition system according to the seventh embodiment.
FIG. 30 is a flowchart showing a detailed processing procedure example of learning processing by the server (learning device) according to the seventh embodiment.

Embodiments of the present invention will be described below with reference to the drawings. In all the figures used to explain the embodiments, the same members are in principle given the same reference numerals, and repeated explanations of them are omitted. It goes without saying that, in the following embodiments, the constituent elements (including element steps and the like) are not necessarily essential, except where explicitly stated or where they are clearly essential in principle. It also goes without saying that the expressions "consists of A," "is composed of A," "has A," and "includes A" do not exclude other elements, except where it is explicitly stated that only that element is included. Similarly, in the following embodiments, a reference to the shape, positional relationship, or the like of a constituent element includes shapes and the like that are substantially approximate or similar to it, except where explicitly stated or where this is clearly not the case in principle.

In this specification and the like, expressions such as "first," "second," and "third" are used to identify constituent elements and do not necessarily limit their number, order, or content. Numbers for identifying constituent elements are used per context, and a number used in one context does not necessarily indicate the same configuration in another context. Furthermore, a constituent element identified by one number is not precluded from also serving the function of a constituent element identified by another number.

図面等において示す各構成の位置、大きさ、形状、範囲などは、発明の理解を容易にするため、実際の位置、大きさ、形状、範囲などを表していない場合がある。このため、本発明は、必ずしも、図面等に開示された位置、大きさ、形状、範囲などに限定されない。 The position, size, shape, range, etc. of each component shown in the drawings etc. may not represent the actual position, size, shape, range, etc. in order to facilitate understanding of the invention. Therefore, the present invention is not necessarily limited to the position, size, shape, range, etc. disclosed in the drawings or the like.

<Behavior recognition system>
FIG. 1 is an explanatory diagram showing an example of a system configuration of a behavior recognition system according to the first embodiment. The behavior recognition system 100 includes a server 101 and one or more clients 102. The server 101 and the clients 102 are communicably connected via a network 105 such as the Internet, a LAN (Local Area Network), or a WAN (Wide Area Network). The server 101 is a computer that manages the clients 102. The client 102 is a computer that is connected to the sensor 103 and acquires data from the sensor 103.

センサ１０３は、解析環境から解析対象データを検出する。センサ１０３は、たとえば、静止画または動画を撮像するカメラである。また、センサ１０３は、音声や匂いを検出してもよい。教師信号ＤＢ１０４は、学習データ（人の骨格情報）と行動情報（たとえば、「立つ」、「倒れる」といった人の姿勢や動作）との組み合わせを教師信号として保持するデータベースである。教師信号ＤＢ１０４は、サーバ１０１に記憶されていてもよく、サーバ１０１またはクライアント１０２とネットワーク１０５を介して通信可能なコンピュータに接続されていてもよい。 The sensor 103 detects data to be analyzed from the analysis environment. The sensor 103 is, for example, a camera that captures still images or moving images. Further, the sensor 103 may detect sound or smell. The teacher signal DB 104 is a database that holds a combination of learning data (person's skeletal information) and behavior information (for example, a person's posture or motion such as "standing" or "falling down") as a teacher signal. The teacher signal DB 104 may be stored in the server 101 or may be connected to a computer that can communicate with the server 101 or the client 102 via the network 105.

The behavior recognition system 100 has a learning function that uses the teacher signal DB 104 and a behavior recognition function that uses a behavior classification model obtained by the learning function. A behavior classification model is a learning model for classifying the behavior of recognition targets such as people and animals. As long as the learning function and the behavior recognition function are implemented somewhere in the behavior recognition system 100, each may be implemented in either the server 101 or the client 102. For example, the server 101 may implement the learning function, and the client 102 may implement the behavior recognition function. Alternatively, the server 101 may implement both the learning function and the behavior recognition function, and the client 102 may transmit data from the sensor 103 to the server 101 and receive the behavior recognition results produced by the behavior recognition function from the server 101.

Alternatively, the client 102 may implement the learning function and the behavior recognition function, and the server 101 may manage the behavior classification models and behavior recognition results from the client 102. A computer that implements the learning function is referred to as a learning device, and a computer that implements at least the behavior recognition function, of the learning function and the behavior recognition function, is referred to as a behavior recognition device. Although FIG. 1 shows a client-server type behavior recognition system 100 as an example, a stand-alone behavior recognition device may be used instead. In the first embodiment, for convenience of explanation, the behavior recognition system 100 is described using an example in which the server 101 implements the learning function (learning device) and the client 102 implements the behavior recognition function (behavior recognition device).

<Example of computer hardware configuration>
FIG. 2 is a block diagram showing an example of the hardware configuration of a computer (server 101, client 102). The computer 200 includes a processor 201, a storage device 202, an input device 203, an output device 204, and a communication interface (communication IF) 205. The processor 201, storage device 202, input device 203, output device 204, and communication IF 205 are connected by a bus 206. The processor 201 controls the computer 200. The storage device 202 serves as a work area for the processor 201. The storage device 202 is also a non-transitory or transitory recording medium that stores various programs and data. Examples of the storage device 202 include ROM (Read Only Memory), RAM (Random Access Memory), HDD (Hard Disk Drive), and flash memory. The input device 203 inputs data. Examples of the input device 203 include a keyboard, mouse, touch panel, numeric keypad, and scanner. The output device 204 outputs data. Examples of the output device 204 include a display, a printer, and a speaker. The communication IF 205 connects to the network 105 and transmits and receives data.

<Learning data>
FIG. 3 is an explanatory diagram showing an example of learning data. The learning data 380 consists of skeleton information 320 and joint angles 370 for each subject. The skeleton information 320 is detected based on the data to be analyzed acquired from the sensor 103. The joint angles 370 are calculated based on the skeleton information 320. The learning data 380 for one subject is composed, for example, of combinations of the skeleton information 320 and joint angles 370 obtained from each of a plurality of time-series frames in which that subject appears.

The skeleton information 320 has, for each of a plurality of (18 in this example) skeleton points 300 to 317, a name 321, an x-coordinate value 322 on the x-axis, and a y-coordinate value 323 on the y-axis perpendicular to the x-axis. The joint angle 370 likewise has a name 371 for each of the plurality of (18 in this example) skeleton points 300 to 317. In the name 371, ∠a-b-c (where a, b, and c are names 321 of skeleton points) denotes the joint angle 370 at skeleton point b formed by line segment ab and line segment bc. The skeleton information 320 may also include, for example, finger joints, and the joint angles 370 may include joint angles other than those listed.

なお、図３では、骨格点３００～３１７の座標値を２次元の位置情報（ｘ座標値およびｙ座標値の組み合わせ）としたが、３次元の位置情報としてもよい。具体的には、たとえば、ｘ軸およびｙ軸に直交するｚ軸（たとえば、奥行き方向）におけるｚ座標値が追加されてもよい。 Note that in FIG. 3, the coordinate values of the skeleton points 300 to 317 are two-dimensional positional information (a combination of x-coordinate values and y-coordinate values), but they may be three-dimensional positional information. Specifically, for example, a z-coordinate value in the z-axis (eg, depth direction) perpendicular to the x-axis and the y-axis may be added.
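As an illustration of the ∠a-b-c notation, the following is a minimal Python sketch of one way to compute the angle at skeleton point b from 2D coordinates. The function name `joint_angle`, the use of degrees, and the choice of an unsigned angle in the range 0 to 180 are assumptions made for this example; the method actually used by the joint angle calculation unit is the one described with reference to FIG. 6.

```python
import math


def joint_angle(a, b, c):
    """Return the unsigned angle (degrees) at point b between segments ab and bc.

    a, b, c are (x, y) tuples. Returns None when either segment has zero
    length, because the angle is undefined in that case (hypothetical policy).
    """
    v1 = (a[0] - b[0], a[1] - b[1])  # vector from b to a
    v2 = (c[0] - b[0], c[1] - b[1])  # vector from b to c
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        return None
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))
```

With 3D position information, the same dot-product form would apply with a z term added to each vector.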

<Functional configuration example of behavior recognition system 100>
FIG. 4 is a block diagram showing an example of the functional configuration of the behavior recognition system 100 according to the first embodiment. The server 101 includes a teacher signal acquisition unit 401, a loss generation unit 402, a loss control unit 421, a missing position information generation unit 422, a missing information interpolation unit 423, a skeleton information processing unit 403, a principal component analysis unit 404, and a behavior learning unit 406. The client 102 includes a skeleton detection unit 451, a missing position information generation unit 462, a missing information interpolation unit 461, a skeleton information processing unit 453, a principal component analysis unit 454, and a behavior recognition unit 457.

これらは、具体的には、たとえば、図２に示した記憶デバイス２０２に記憶されたプログラムをプロセッサ２０１に実行させることにより実現される。まず、サーバ１０１側の機能的構成例について説明する。 Specifically, these are realized, for example, by causing the processor 201 to execute a program stored in the storage device 202 shown in FIG. 2. First, an example of the functional configuration of the server 101 will be described.

The teacher signal acquisition unit 401 acquires, from the teacher signal DB 104, one or more teacher signals to be used for learning, and outputs the selected teacher signals to the loss generation unit 402. The skeleton information 320 in a teacher signal is non-missing information in which no skeleton point is missing.

The loss control unit 421 selects the skeleton points to be removed according to random numbers or a predetermined number, and outputs the selection as loss occurrence information to the loss generation unit 402 and the missing position information generation unit 422. The number of skeleton points to be removed may be one, more than one, or zero, meaning that none is removed. Here, the predetermined number is, for example, a number indicating a skeleton position that is likely to go undetected when the skeleton detection unit 451 (described later) detects the skeleton information 320 of a person appearing in the data to be analyzed acquired from the sensor 103.

The loss generation unit 402 removes skeleton points from the skeleton information 320 in the teacher signal acquired from the teacher signal acquisition unit 401, according to the loss occurrence information. The loss generation unit 402 then updates the skeleton information 320 in the teacher signal with the skeleton information 320 after the loss (including the case where zero skeleton points are removed). To improve noise robustness, when removing skeleton points the loss generation unit 402 may also add noise that shifts the positions of the skeleton points that are not removed, and update the skeleton information 320 accordingly. The one or more removed pieces of skeleton information 320 are referred to as "missing information."

The loss generation unit 402 then outputs, to the missing information interpolation unit 423, the teacher signal including the missing information, that is, the name 321 and the position information (x-coordinate value 322, y-coordinate value 323) of each removed skeleton point.
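The loss control and loss generation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `choose_missing` and `apply_loss`, the dictionary representation of the skeleton information, and the encoding of the missing position information as one 0/1 flag per skeleton point are all hypothetical choices made for this example.

```python
import random


def choose_missing(names, max_missing, rng):
    """Pick between 0 and max_missing skeleton point names at random.

    Plays the role of the loss occurrence information in this sketch.
    """
    k = rng.randint(0, max_missing)  # 0 means no skeleton point is removed
    return set(rng.sample(names, k))


def apply_loss(skeleton, missing):
    """Remove the chosen skeleton points from a skeleton.

    skeleton: dict mapping point name -> (x, y).
    Returns (lossy skeleton with None for removed points,
             list of 0/1 flags, 1 where the point was removed).
    """
    lossy = {n: (None if n in missing else xy) for n, xy in skeleton.items()}
    mask = [1 if n in missing else 0 for n in skeleton]
    return lossy, mask
```

For example, `apply_loss({"nose": (1.0, 2.0), "neck": (3.0, 4.0)}, {"neck"})` returns a skeleton in which only `"neck"` is `None`, together with the flags `[0, 1]`. Passing a seeded `random.Random` instance as `rng` keeps the choice of removed points reproducible.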

欠損情報補間部４２３は、欠損情報を、欠損していない骨格情報３２０から補間する。欠損していない単数または複数の骨格情報３２０を非欠損情報と表記する。 The missing information interpolation unit 423 interpolates missing information from the non-missing skeleton information 320. One or more pieces of skeleton information 320 that are not missing are referred to as non-missing information.

Specifically, for example, the missing information interpolation unit 423 may interpolate the missing information from those skeleton points in the non-missing information that were connected to the missing skeleton point, or that lie close to the missing skeleton point.

Alternatively, the missing information interpolation unit 423 may substitute predetermined position information for the missing information. The missing information interpolation unit 423 may also interpolate the missing information of skeleton information 320 that has been determined to include missing information by using the skeleton information 320 of another, previously acquired frame. The interpolation method for missing information is thus not limited. The missing information interpolation unit 423 outputs the interpolated skeleton information 320 to the skeleton information processing unit 403.
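Two of the interpolation options above (using connected skeleton points, and falling back to a predetermined position) can be sketched as follows. The function name `interpolate_missing`, the `edges` adjacency dictionary, the averaging of connected points, and the default position `(0.0, 0.0)` are assumptions made for this illustration; the text does not fix a particular interpolation formula.

```python
def interpolate_missing(skeleton, edges, default=(0.0, 0.0)):
    """Fill in missing skeleton points.

    skeleton: dict mapping point name -> (x, y), or None if missing.
    edges: dict mapping point name -> names of connected points (hypothetical).
    A missing point becomes the mean of its connected non-missing points,
    or `default` when no connected point is available.
    """
    filled = dict(skeleton)
    for name, xy in skeleton.items():
        if xy is not None:
            continue  # non-missing information is kept as-is
        neighbors = [
            skeleton[m] for m in edges.get(name, ()) if skeleton.get(m) is not None
        ]
        if neighbors:
            filled[name] = (
                sum(p[0] for p in neighbors) / len(neighbors),
                sum(p[1] for p in neighbors) / len(neighbors),
            )
        else:
            filled[name] = default  # predetermined position information
    return filled
```

Because the result always contains a coordinate pair for every skeleton point, the downstream processing receives input of the same size whether or not points were missing.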

Based on the loss occurrence information from the loss control unit 421, the missing position information generation unit 422 outputs the position information of the removed skeleton points to the behavior learning unit 406 as missing position information.

骨格情報処理部４０３は、欠損情報補間部４２３による補間後の骨格情報３２０を処理する。具体的には、たとえば、骨格情報処理部４０３は、取得した補間後の教師信号の内、骨格情報３２０から関節角度３７０とフレーム間の移動量とを算出する。また、骨格情報処理部４０３は、骨格情報３２０に対して絶対的な位置情報を除外し、骨格情報３２０の大きさが一定となる正規化を実行する。そして、骨格情報処理部４０３は、関節角度３７０と、フレーム間の移動量と、正規化した骨格情報３２０と、を主成分分析部４０４に出力する。 The skeleton information processing unit 403 processes the skeleton information 320 after interpolation by the missing information interpolation unit 423. Specifically, for example, the skeletal information processing unit 403 calculates the joint angle 370 and the amount of movement between frames from the skeletal information 320 of the acquired interpolated teacher signal. Furthermore, the skeleton information processing unit 403 performs normalization on the skeleton information 320, excluding absolute position information so that the size of the skeleton information 320 is constant. Then, the skeletal information processing unit 403 outputs the joint angle 370, the amount of movement between frames, and the normalized skeletal information 320 to the principal component analysis unit 404.

図５は、骨格情報処理部４０３，４５３の詳細な機能的構成例を示すブロック図である。骨格情報処理部４０３，４５３は、関節角度算出部５０１と、移動量算出部５０２と、正規化部５０３と、を有する。 FIG. 5 is a block diagram showing a detailed functional configuration example of the skeleton information processing units 403 and 453. The skeletal information processing units 403 and 453 include a joint angle calculation unit 501, a movement amount calculation unit 502, and a normalization unit 503.

関節角度算出部５０１は、取得した教師信号の内、補間後の骨格情報３２０から関節角度３７０を算出し、移動量算出部５０２と正規化部５０３を介して主成分分析部４０４に出力する。 The joint angle calculation unit 501 calculates a joint angle 370 from the interpolated skeletal information 320 of the acquired teacher signal, and outputs it to the principal component analysis unit 404 via the movement amount calculation unit 502 and the normalization unit 503.

移動量算出部５０２は、取得した教師信号の内、補間後の骨格情報３２０からフレーム間の移動量を算出し、正規化部５０３を介して主成分分析部４０４に出力する。 The movement amount calculation unit 502 calculates the interframe movement amount from the interpolated skeleton information 320 of the acquired teacher signal, and outputs it to the principal component analysis unit 404 via the normalization unit 503.

正規化部５０３は、取得した教師信号の内、補間後の骨格情報３２０に対して絶対的な位置情報を除外し、補間後の骨格情報３２０の大きさが一定となる正規化を実行して主成分分析部４０４に出力する。 The normalization unit 503 excludes absolute position information from the interpolated skeletal information 320 from the acquired teacher signal, and performs normalization such that the size of the interpolated skeletal information 320 is constant. It is output to the principal component analysis section 404.

図４に戻り、主成分分析部４０４は、骨格情報処理部４０３から取得した教師信号の内、正規化した骨格情報３２０と、関節角度３７０と、フレーム間の移動量と、を入力データとして、主成分分析を実行して単数または複数の主成分を生成し、行動学習部４０６に出力する。なお、骨格情報３２０、関節角度３７０、およびフレーム間の移動量のうち、少なくとも正規化した骨格情報３２０が入力データであればよい。 Returning to FIG. 4, the principal component analysis unit 404 uses as input data the normalized skeletal information 320, the joint angles 370, and the amount of movement between frames among the teacher signals acquired from the skeletal information processing unit 403. A principal component analysis is performed to generate one or more principal components, and the generated principal components are output to the behavior learning unit 406. Note that among the skeletal information 320, the joint angles 370, and the amount of movement between frames, at least the normalized skeletal information 320 may be input data.

主成分分析では下記式（１）に示す通り、入力データｘ_ｉに係数ｗ_ｉｊを各々乗算し、加算することで主成分ｙ_ｉを生成する。主成分分析の一般式を下記式（２）に示す。係数ｗ_ｉｊは、下記式（３）に示す通り、ｙ_ｉの分散をＶ(ｙ_ｉ)として定義した場合、分散Ｖ（ｙ_ｉ）が最大となるように定める。 In principal component analysis, as shown in equation (1) below, each input data item x_i is multiplied by a coefficient w_ij and the products are summed to generate a principal component y_i. The general formula for principal component analysis is shown in equation (2) below. As shown in equation (3) below, with the variance of y_i defined as V(y_i), the coefficients w_ij are determined so that the variance V(y_i) is maximized.

ただし、係数ｗ_ｉｊに制約を持たせない場合、分散Ｖ（ｙ_ｉ）の絶対量は無限に大きく取ることができ、係数ｗ_ｉｊは一意に決定することができないため、下記式（４）の制約を付すことが望ましい。また、情報の重複を無くすため、新たに生成する主成分ｙ_ｋとこれまでに生成した主成分ｙ_ｋの共分散は０となる下記式（５）の制約を付すことが望ましい。 However, if no constraint is placed on the coefficients w_ij, the variance V(y_i) can be made arbitrarily large and the coefficients w_ij cannot be determined uniquely, so it is desirable to impose the constraint of equation (4) below. In addition, to eliminate duplication of information, it is desirable to impose the constraint of equation (5) below, under which the covariance between a newly generated principal component y_k and each previously generated principal component is 0.

ただし、制約として付す上記式（４）と上記式（５）は、これに限らず別の制約条件を付したり、または制約を外したりして係数ｗ_ｉｊを算出しても問題ない。こうして生成した新たな主成分ｙ_ｊの分散Ｖ（ｙ_ｊ）について下記式（６）に示す通りλ_ｊとして別途定義した場合、下記式（７）に示す通り入力データｘ_ｊの分散Ｖ（ｘ_ｊ）の合計とλ_ｊの合計は等しい。 However, the constraints of equations (4) and (5) above are not limiting; the coefficients w_ij may be calculated with different constraints or with the constraints removed. When the variance V(y_j) of a new principal component y_j generated in this way is separately defined as λ_j as shown in equation (6) below, the sum of the variances V(x_j) of the input data x_j equals the sum of λ_j, as shown in equation (7) below.

ここでｐは入力データｘ_ｊの数とする。新たに生成した主成分ｙ_ｊの分散Ｖ（ｙ_ｊ）は高い方が元の情報をより多く反映しており、分散値が高い主成分から順に第１、第２、…、第ｍ主成分という。新たに生成した変数ｙ_ｊの分散と元のデータの分散の比を寄与率といい、下記式（８）で示される。また、第１主成分の寄与率から分散値の降順（主成分の序数ｍの昇順）に寄与率を加算した結果を累積寄与率といい、下記式（９）で示される。 Here, p is the number of input data items x_j. The higher the variance V(y_j) of a newly generated principal component y_j, the more of the original information it reflects; the principal components are called the first, second, ..., m-th principal components in descending order of variance. The ratio of the variance of a newly generated variable y_j to the variance of the original data is called the contribution rate and is given by equation (8) below. The result of summing the contribution rates, starting from that of the first principal component, in descending order of variance (ascending order of the ordinal number m of the principal components) is called the cumulative contribution rate and is given by equation (9) below.
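Equations (1) to (9) appear only as figures in the publication and are not reproduced in this text. For orientation, a standard formulation of principal component analysis that is consistent with the description above might read as follows. The index conventions and the symbols c_j and C_m are assumptions; this is a sketch, not the patent's own notation.

```latex
% Assumed notation: x_1..x_p are the input data, w_{ij} the coefficients,
% y_j the j-th principal component, \lambda_j its variance.
\begin{align}
  y_j &= \sum_{i=1}^{p} w_{ij}\, x_i
      && \text{(linear combination, cf. (1), (2))} \\
  (w_{1j},\dots,w_{pj}) &= \arg\max_{w} V(y_j)
      && \text{(variance maximization, cf. (3))} \\
  \sum_{i=1}^{p} w_{ij}^{2} &= 1
      && \text{(normalization constraint, cf. (4))} \\
  \operatorname{Cov}(y_k, y_j) &= 0 \quad (j < k)
      && \text{(uncorrelated components, cf. (5))} \\
  \lambda_j &= V(y_j)
      && \text{(cf. (6))} \\
  \sum_{j=1}^{p} V(x_j) &= \sum_{j=1}^{p} \lambda_j
      && \text{(cf. (7))} \\
  c_j &= \frac{\lambda_j}{\sum_{l=1}^{p} \lambda_l}
      && \text{(contribution rate, cf. (8))} \\
  C_m &= \sum_{j=1}^{m} c_j
      && \text{(cumulative contribution rate, cf. (9))}
\end{align}
```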

寄与率と累積寄与率は、新たに生成した主成分ｙ_ｊや生成した複数の主成分が元のデータの情報量をどの程度表しているかといった尺度となり、主成分と共に生成される。なお、多変量解析で統計的な成分を生成する成分分析の一例として、主成分分析を適用したが、主成分分析の替わりに、同じく成分分析の一例である独立成分分析を実行してもよい。 The contribution rate and the cumulative contribution rate serve as measures of how much of the information in the original data is represented by a newly generated principal component y_j or by a set of generated principal components, and they are generated together with the principal components. Although principal component analysis was applied here as an example of component analysis that generates statistical components by multivariate analysis, independent component analysis, which is also an example of component analysis, may be performed instead.

独立成分分析の場合、主成分は独立成分となる。この独立成分が入力データｘ_ｉにどのくらい影響を与えているのかを示す指標として、寄与率を用いてもよい。独立成分分析では、独立成分ごとの独立成分分析における混合係数行列の２乗和が、各独立成分の強度となる。 In the case of independent component analysis, the independent components take the place of the principal components. The contribution rate may be used as an index of how much each independent component influences the input data x_i. In independent component analysis, the sum of squares of the mixing coefficient matrix entries for each independent component is the strength of that independent component.

独立成分の強度は独立成分の入力データｘ_ｉにおける分散を示す。すなわち、独立成分分析によって得られた独立成分はいずれも分散が１に統一されるため、混合係数の２乗和をとれば入力データｘ_ｉの分散になる。そして、独立成分の強度を、全独立成分の強度の総和で割った値を、その独立変数の寄与率とすればよい。 The strength of an independent component indicates the variance it accounts for in the input data x_i. That is, since every independent component obtained by independent component analysis has its variance normalized to 1, the sum of squares of the mixing coefficients gives the variance in the input data x_i. The value obtained by dividing the strength of an independent component by the sum of the strengths of all independent components may then be used as the contribution rate of that independent component.
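The contribution rate for independent component analysis described above can be sketched as a short computation. The orientation of the mixing matrix (rows for input variables, columns for independent components) and the function name are assumptions made for this illustration.

```python
def ica_contribution_rates(mixing):
    """Contribution rate of each independent component.

    mixing[i][j] is assumed to be the mixing coefficient of independent
    component j in input variable i (rows: inputs, columns: components).
    """
    n_components = len(mixing[0])
    # Strength of component j: sum of squares of its mixing coefficients.
    strengths = [sum(row[j] ** 2 for row in mixing) for j in range(n_components)]
    total = sum(strengths)
    # Contribution rate: strength divided by the sum of all strengths.
    return [s / total for s in strengths]
```

With a mixing matrix `[[3, 0], [0, 1]]`, the strengths are 9 and 1, so the first component accounts for nine tenths of the total.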

主成分分析部４０４は、１以上の成分の各々の次元を示す序数ｋを制御する。具体的には、たとえば、主成分分析部４０４は、生成した主成分の内、行動学習部４０６で学習に用いる主成分を分散値の高い順に何次元まで使用するかを決定し、第１主成分から、決定した次元ｋ（ｋは１以上の整数）を序数とする第ｋ主成分までの主成分を、分散値の高い順に行動学習部４０６に出力する。具体的には、たとえば、主成分分析部４０４は、累積寄与率のしきい値以上でかつ当該しきい値に最も近い主成分の次元ｋを採用する。もしくは、主成分分析部４０４は、累積寄与率のしきい値以下でかつ当該しきい値に最も近い主成分の次元ｋを採用する。 The principal component analysis unit 404 controls an ordinal number k indicating the dimension of each of one or more components. Specifically, for example, the principal component analysis unit 404 determines how many dimensions of the principal components to be used for learning in the behavioral learning unit 406 among the generated principal components, in descending order of variance value, and determines the first principal component. The principal components from the component to the k-th principal component whose ordinal number is the determined dimension k (k is an integer greater than or equal to 1) are output to the behavior learning unit 406 in descending order of variance value. Specifically, for example, the principal component analysis unit 404 employs the dimension k of the principal component that is equal to or greater than the cumulative contribution rate threshold and is closest to the threshold. Alternatively, the principal component analysis unit 404 adopts the dimension k of the principal component that is less than or equal to the cumulative contribution rate threshold and closest to the threshold.
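The two threshold rules for choosing the dimension k can be sketched as follows, given the variances of the generated principal components. The function names are hypothetical, and the fallback to k = 1 in the second rule is an assumption based on the statement that k is an integer of 1 or more.

```python
from itertools import accumulate


def cumulative_contribution(variances):
    """Cumulative contribution rates in descending order of variance."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    return [c / total for c in accumulate(ordered)]


def choose_k_at_least(variances, threshold):
    """Smallest k whose cumulative contribution rate is >= threshold."""
    cum = cumulative_contribution(variances)
    for k, c in enumerate(cum, start=1):
        if c >= threshold:
            return k
    return len(cum)


def choose_k_at_most(variances, threshold):
    """Largest k whose cumulative contribution rate is <= threshold."""
    cum = cumulative_contribution(variances)
    k = 0
    for idx, c in enumerate(cum, start=1):
        if c <= threshold:
            k = idx
    # Assumption: use at least the first principal component.
    return max(k, 1)
```

For variances `[5, 3, 1, 1]` the cumulative contribution rates are 0.5, 0.8, 0.9, and 1.0, so a threshold of 0.85 yields k = 3 under the first rule and k = 2 under the second.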

行動学習部４０６は、主成分分析部４０４から取得した主成分と、欠損位置情報生成部４２２から取得した欠損位置情報と、教師信号ＤＢ１０４から取得した教師信号内の行動情報とを、関連付けて学習する。具体的には、たとえば、行動学習部４０６は、主成分分析部４０４から取得した第１主成分から第ｋ主成分までの主成分群と欠損位置情報とを連結したデータを説明変数とし、教師信号ＤＢ１０４から取得した教師信号内の行動情報を目的変数として、機械学習により、行動分類モデルを生成する。行動学習部４０６は、学習の結果生成した行動分類モデルを、行動認識部４５７に出力する。 The behavior learning unit 406 learns by associating the principal components acquired from the principal component analysis unit 404, the missing position information acquired from the missing position information generation unit 422, and the behavior information in the teacher signal acquired from the teacher signal DB 104. Specifically, for example, the behavior learning unit 406 generates a behavior classification model by machine learning, using as explanatory variables the data obtained by concatenating the principal component group from the first to the k-th principal component acquired from the principal component analysis unit 404 with the missing position information, and using as the objective variable the behavior information in the teacher signal acquired from the teacher signal DB 104. The behavior learning unit 406 outputs the behavior classification model generated as a result of learning to the behavior recognition unit 457.

つぎに、クライアント１０２側の機能的構成例について説明する。骨格検出部４５１は、センサ１０３から取得した解析対象データに映る人の骨格情報３２０を検出し、欠損情報補間部４６１と欠損位置情報生成部４６２とに出力する。骨格情報３２０の検出には機械学習により生成した人の骨格情報３２０を推定可能なＮＮ（ｎｅｕｒａｌｎｅｔｗｏｒｋ）を用いてもよいし、検出したい人の骨格点にマーカーを付与して、画像に映るマーカー位置から骨格情報３２０を検出してもよく、骨格情報３２０を検出する方法は限定されない。骨格情報３２０のうちｘ座標値３２２およびｙ座標値３２３が検出されずに不定値となった骨格点が欠損情報であり、骨格情報３２０のうち欠損情報を除く骨格点が非欠損情報である。たとえば、骨格点３００～３１７のうち骨格点３１０のｘ座標値３２２およびｙ座標値３２３のみが不定値であれば、骨格点３１０が欠損情報であり、骨格点３００～３０９、３１１～３１７が非欠損情報である。 Next, an example of the functional configuration on the client 102 side will be described. The skeleton detection unit 451 detects the skeleton information 320 of a person appearing in the analysis target data acquired from the sensor 103 and outputs it to the missing information interpolation unit 461 and the missing position information generation unit 462. The skeleton information 320 may be detected using a neural network (NN), generated by machine learning, that can estimate a person's skeleton information 320, or by attaching markers to the skeleton points of the person to be detected and detecting the skeleton information 320 from the marker positions in the image; the method of detecting the skeleton information 320 is not limited. Skeleton points in the skeleton information 320 whose x-coordinate value 322 and y-coordinate value 323 were not detected and are therefore undefined are missing information, and the skeleton points other than the missing information are non-missing information. For example, if among skeleton points 300 to 317 only the x-coordinate value 322 and y-coordinate value 323 of skeleton point 310 are undefined, skeleton point 310 is missing information and skeleton points 300 to 309 and 311 to 317 are non-missing information.

欠損情報補間部４６１は、欠損情報補間部４２３と同様の機能を有する。欠損情報補間部４６１は、骨格検出部４５１からの骨格情報３２０に対して、欠損情報補間部４２３と同様の処理を実行して、欠損情報（上記の例では骨格点３１０）を、非欠損情報（上記の例では骨格点３００～３０９、３１１～３１７）から補間し、補間した骨格情報３２０を骨格情報処理部４５３に出力する。 The missing information interpolation unit 461 has the same function as the missing information interpolation unit 423. The missing information interpolation unit 461 performs the same processing as the missing information interpolation unit 423 on the skeleton information 320 from the skeleton detection unit 451, and converts missing information (skeletal point 310 in the above example) into non-missing information. Interpolation is performed from the skeleton points 300 to 309 and 311 to 317 in the above example, and the interpolated skeleton information 320 is output to the skeleton information processing section 453.

欠損位置情報生成部４６２は、欠損位置情報生成部４２２と同様の機能を有する。欠損位置情報生成部４６２は、骨格検出部４５１からの骨格情報３２０に対して、欠損位置情報生成部４２２と同様の処理を実行して、骨格検出部４５１で検出した骨格情報３２０の内、オクルージョンなどにより取得できない骨格点があるか否かを判断し、取得できなかった骨格点があれば、その骨格点を欠損箇所としてその位置を示す欠損位置情報を生成し、行動認識部４５７に出力する。 The missing position information generation unit 462 has the same function as the missing position information generation unit 422. It performs the same processing as the missing position information generation unit 422 on the skeleton information 320 from the skeleton detection unit 451, determines whether the skeleton information 320 detected by the skeleton detection unit 451 contains any skeleton point that could not be acquired due to occlusion or the like, and, if so, treats that skeleton point as a missing location, generates missing position information indicating its position, and outputs it to the behavior recognition unit 457.

骨格情報処理部４５３は、骨格情報処理部４０３と同様の機能を有する。骨格情報処理部４５３は、骨格検出部４５１で検出した骨格情報３２０に対して骨格情報処理部４０３と同様の処理を実行して、関節角度３７０と、フレーム間の移動量と、正規化した骨格情報３２０と、を主成分分析部４５４に出力する。 The skeletal information processing unit 453 has the same functions as the skeletal information processing unit 403. The skeleton information processing unit 453 executes the same processing as the skeleton information processing unit 403 on the skeleton information 320 detected by the skeleton detection unit 451, and calculates the joint angle 370, the amount of movement between frames, and the normalized skeleton. The information 320 is output to the principal component analysis section 454.

主成分分析部４５４は、主成分分析部４０４と同様の機能を有する。主成分分析部４５４は、骨格情報処理部４５３からの出力データに対して主成分分析部４０４と同様の処理を実行して、単数または複数の主成分を生成し、行動認識部４５７に出力する。 The principal component analysis unit 454 has the same function as the principal component analysis unit 404. It performs the same processing as the principal component analysis unit 404 on the output data from the skeleton information processing unit 453 to generate one or more principal components and outputs them to the behavior recognition unit 457.

主成分分析部４５４は、各々の寄与率から得られる累積寄与率に基づいて、１以上の成分の各々の次元を示す序数ｋを決定する。具体的には、たとえば、主成分分析部４５４は、取得した寄与率および累積寄与率から、取得した主成分の内、分散の高い順に何次元までの主成分を行動認識部４５７に出力するかを示す次元数ｋを決定する。次元数ｋとは、主成分の次元を示す序数ｋである。たとえば、第１主成分であれば、次元数（序数）ｋ＝１であり、第２主成分であれば、次元数（序数）ｋ＝２である。主成分分析部４５４は、分散の高い順に第１主成分から第ｋ主成分までの主成分群を行動認識部４５７に出力する。 The principal component analysis unit 454 determines an ordinal number k indicating the dimension of each of one or more components based on the cumulative contribution rate obtained from the individual contribution rates. Specifically, for example, from the acquired contribution rates and cumulative contribution rate, the principal component analysis unit 454 determines the number of dimensions k, which indicates up to which dimension, in descending order of variance, the acquired principal components are output to the behavior recognition unit 457. The number of dimensions k is the ordinal number k indicating the dimension of a principal component. For example, k = 1 for the first principal component and k = 2 for the second principal component. The principal component analysis unit 454 outputs the principal component group from the first to the k-th principal component, in descending order of variance, to the behavior recognition unit 457.

行動認識部４５７は、行動学習部４０６で生成した行動分類モデルと、第１主成分から第ｋ主成分までの主成分群と、欠損位置情報と、に基づいて、センサ１０３から取得した解析対象データに映る人の行動を認識する。具体的には、たとえば、行動認識部４５７は、解析対象データから得られた主成分群（第１主成分～第ｋ主成分）と欠損位置情報とを、選択した行動分類モデルに入力することにより、解析対象データに映る人の行動を示す予測値を認識結果として出力する。 The behavior recognition unit 457 recognizes the behavior of the person appearing in the analysis target data acquired from the sensor 103, based on the behavior classification model generated by the behavior learning unit 406, the principal component group from the first to the k-th principal component, and the missing position information. Specifically, for example, the behavior recognition unit 457 inputs the principal component group (first to k-th principal components) obtained from the analysis target data and the missing position information into the selected behavior classification model, and thereby outputs, as the recognition result, a predicted value indicating the behavior of the person appearing in the analysis target data.

＜関節角度算出の例＞ <Example of joint angle calculation>
図６は、関節角度算出部５０１が実行する関節角度３７０の詳細な算出方法を示す説明図である。関節角度算出部５０１は、連結する３点の骨格点６００～６０２において関節角度θを算出する。骨格点６００～６０２の骨格情報６２０について、原点６３０を基準とする位置ベクトルＯ、Ａ、Ｂのように各々定義する。関節角度算出部５０１は、骨格点６００を原点とする相対ベクトルを下記式（１０），（１１）に示す通り算出し、算出したベクトルから下記式（１２）が成立し、下記式（１３）に示す通り逆余弦を算出することで関節角度θを算出する。 FIG. 6 is an explanatory diagram showing a detailed method of calculating the joint angle 370 executed by the joint angle calculation unit 501. The joint angle calculation unit 501 calculates the joint angle θ at three connected skeleton points 600 to 602. The skeleton information 620 of skeleton points 600 to 602 is defined as position vectors O, A, and B relative to the origin 630. The joint angle calculation unit 501 calculates relative vectors with skeleton point 600 as the origin as shown in equations (10) and (11) below; equation (12) below holds for the calculated vectors, and the joint angle θ is obtained by taking the arc cosine as shown in equation (13) below.
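The joint angle calculation described above can be sketched in a few lines. Equations (10) to (13) are not reproduced in this text, so the code assumes the usual dot-product relation between the two relative vectors; the function name and the tuple representation of points are hypothetical.

```python
import math


def joint_angle(o, a, b):
    """Joint angle (radians) at point o formed by connected points a and b.

    o, a, b are (x, y) position vectors relative to a common origin.
    """
    # Relative vectors with o as the origin.
    ax, ay = a[0] - o[0], a[1] - o[1]
    bx, by = b[0] - o[0], b[1] - o[1]
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0:
        raise ValueError("a and b must differ from o")
    # cos(theta) from the dot product, clamped against rounding error.
    cos_theta = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
    # The joint angle is the arc cosine.
    return math.acos(cos_theta)
```

For example, `joint_angle((0, 0), (1, 0), (0, 1))` gives a right angle, and shifting all three points by the same offset leaves the result unchanged.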

＜フレーム間の移動量算出の例＞ <Example of calculating the amount of movement between frames>
図７は、移動量算出部５０２が実行するフレーム間の移動量の詳細な算出方法の例を示す説明図である。移動量算出部５０２は、フレーム間の移動量の算出において、同一被写体についての第Ｎフレーム目の骨格情報７０１と第Ｎ－Ｍフレーム目の骨格情報７０２とを用いる。Ｎ、Ｍは１以上の整数であり、Ｎ＞Ｍである。Ｍの値は任意に設定可能である。下記式（１４）～（１６）に示す通り、移動量算出部５０２は、各フレーム間で示される同一人物の同一骨格点３００～３１７の距離を各々算出する。１８個の骨格点３００～３１７のフレーム間の移動量が、当該人物についてのフレーム間の移動量となる。 FIG. 7 is an explanatory diagram showing an example of a detailed method of calculating the amount of movement between frames executed by the movement amount calculation unit 502. In calculating the amount of movement between frames, the movement amount calculation unit 502 uses the skeleton information 701 of the N-th frame and the skeleton information 702 of the (N−M)-th frame for the same subject. N and M are integers of 1 or more, with N > M. The value of M can be set arbitrarily. As shown in equations (14) to (16) below, the movement amount calculation unit 502 calculates, for each of the skeleton points 300 to 317, the distance between the positions of the same skeleton point of the same person in the two frames. The inter-frame movement amounts of the 18 skeleton points 300 to 317 constitute the inter-frame movement amount for that person.

ただ、移動量算出部５０２が実行するフレーム間の移動量はこれに限定されるものではなく、下記式（１７）に示す通り、移動量算出部５０２は、各フレーム間で示される同一人物の同一骨格点３００～３１７の距離を各々算出し、全１８個の骨格点３００～３１７のフレーム間の移動量を合算した値を、当該人物についてのフレーム間の移動量としてもよい。 However, the inter-frame movement amount calculated by the movement amount calculation unit 502 is not limited to this. As shown in equation (17) below, the movement amount calculation unit 502 may calculate the distance for each of the same skeleton points 300 to 317 of the same person between the frames and use the sum of the inter-frame movement amounts of all 18 skeleton points 300 to 317 as the inter-frame movement amount for that person.

また、移動量算出部５０２は、第ｎフレームの骨格情報７０１と第ｎ－ｍフレームの骨格情報７０２の内、重心となる重心骨格情報７１１と重心骨格情報７１２を用いてもよい。具体的には、たとえば、移動量算出部５０２は、下記式（１８）～（１９）に示す通り、人物ごとに重心を算出し、下記式（２０）に示す通り、算出した重心に対して、当該人物についてのフレーム間の移動量を算出してもよい。 The movement amount calculation unit 502 may also use the center-of-gravity skeleton information 711 and 712, which represent the centers of gravity of the skeleton information 701 of the n-th frame and the skeleton information 702 of the (n−m)-th frame, respectively. Specifically, for example, the movement amount calculation unit 502 may calculate the center of gravity for each person as shown in equations (18) and (19) below and calculate the inter-frame movement amount for that person from the calculated centers of gravity as shown in equation (20) below.
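The three variants of the inter-frame movement amount described above (per skeleton point, summed over all points, and center of gravity) can be sketched as follows. Equations (14) to (20) are not reproduced in this text, so Euclidean distance and an unweighted mean for the center of gravity are assumptions, as are the function names.

```python
import math


def per_point_movement(current, previous):
    """Distance moved by each skeleton point between two frames.

    current, previous: lists of (x, y) for the same skeleton points of the
    same person in the N-th and (N-M)-th frames.
    """
    return [math.hypot(cx - px, cy - py)
            for (cx, cy), (px, py) in zip(current, previous)]


def total_movement(current, previous):
    """Sum of the per-point movement amounts."""
    return sum(per_point_movement(current, previous))


def centroid(points):
    """Center of gravity, assumed here to be the unweighted mean."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_movement(current, previous):
    """Distance moved by the center of gravity between two frames."""
    (cx, cy), (px, py) = centroid(current), centroid(previous)
    return math.hypot(cx - px, cy - py)
```

With two points of which only one moves from (0, 0) to (3, 4), the per-point amounts are 5 and 0, the summed amount is 5, and the center of gravity moves by 2.5.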

＜正規化の例＞ <Example of normalization>
図８は、正規化部５０３が実行する骨格情報３２０の正規化の詳細な手法を示す説明図である。まず、正規化部５０３は、（ａ）すべてまたは一部の骨格情報３２０から重心を算出し、（ｂ）重心を原点とする相対座標に変換する。その後、正規化部５０３は、（ｃ）１８個の骨格点３００～３１７を囲う最小の長方形の対角線の長さＬで、（ｄ）骨格情報３２０の各骨格点の位置情報を割る。（ｄ）で得られた骨格情報３２０を教師信号とした場合、割り算後の骨格点３００～３１７の位置情報も組み込まれることとなる。 FIG. 8 is an explanatory diagram showing a detailed method of normalizing the skeleton information 320 executed by the normalization unit 503. First, the normalization unit 503 (a) calculates the center of gravity from all or part of the skeleton information 320 and (b) converts the skeleton information into relative coordinates with the center of gravity as the origin. The normalization unit 503 then (c) obtains the length L of the diagonal of the smallest rectangle enclosing the 18 skeleton points 300 to 317 and (d) divides the position information of each skeleton point in the skeleton information 320 by L. When the skeleton information 320 obtained in (d) is used as a teacher signal, the position information of skeleton points 300 to 317 after the division is also incorporated.
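Steps (a) to (d) above can be sketched as follows. The sketch assumes that the center of gravity is the unweighted mean of all points and that the enclosing rectangle is axis-aligned; neither detail is stated in the description, and the function name is hypothetical.

```python
import math


def normalize_skeleton(points):
    """Remove absolute position and size from a list of (x, y) points."""
    n = len(points)
    # (a) Center of gravity (assumed: mean of all points).
    gx = sum(x for x, _ in points) / n
    gy = sum(y for _, y in points) / n
    # (b) Relative coordinates with the center of gravity as the origin.
    rel = [(x - gx, y - gy) for x, y in points]
    # (c) Diagonal length L of the smallest axis-aligned enclosing rectangle.
    xs = [x for x, _ in rel]
    ys = [y for _, y in rel]
    diag = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    if diag == 0:
        raise ValueError("all points coincide")
    # (d) Divide each position by L.
    return [(x / diag, y / diag) for x, y in rel]
```

Under these assumptions, a skeleton and a translated, uniformly scaled copy of it normalize to the same coordinates, which is the invariance the normalization is meant to provide.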

たとえば、正規化部５０３が実行されないと「１８０ｃｍの人が地点Ａで座る」といった行動について骨格検出および行動分類のための学習が実行されると、「地点Ａ以外では座らない」、「１８０ｃｍ以外の人は座らない」といった判定が下される可能性がある。こうした限定を除外し、行動分類に汎用性を持たせるため、画像内の絶対的な位置情報と、骨格の大きさに関する情報について除去するため、正規化部５０３が骨格情報３２０の正規化を実行する。 For example, if normalization by the normalization unit 503 is not performed and learning for skeleton detection and behavior classification is carried out on a behavior such as "a 180 cm tall person sits at point A", the model may come to judge that "people do not sit anywhere other than point A" or that "people who are not 180 cm tall do not sit". To exclude such restrictions and make the behavior classification general, the normalization unit 503 normalizes the skeleton information 320 so as to remove the absolute position information in the image and the information on the size of the skeleton.

＜教師信号ＤＢ１０４が保持する教師信号＞ <Teacher signal held by the teacher signal DB 104>
図９は、教師信号ＤＢ１０４が保持する教師信号の詳細な例を示す説明図である。解析対象データとなる（ａ）画像９００に映る人において、（ｂ）骨格情報３２０Ａと、関節角度３７０（不図示）と、骨格情報３２０Ａに関連付けられる（ｃ）行動情報９０１（「立つ」）と、の組み合わせが、教師信号となる。同様に、解析対象データとなる（ａ）画像９１０に映る人において、（ｂ）骨格情報３２０Ｂと、関節角度３７０（不図示）と、骨格情報３２０Ｂに関連付けられる（ｃ）行動情報９１１（「倒れる」）と、の組み合わせが、教師信号となる。 FIG. 9 is an explanatory diagram showing a detailed example of the teacher signals held by the teacher signal DB 104. For the person appearing in (a) image 900, which is analysis target data, the combination of (b) skeleton information 320A, joint angles 370 (not shown), and (c) behavior information 901 ("standing") associated with skeleton information 320A constitutes a teacher signal. Similarly, for the person appearing in (a) image 910, which is analysis target data, the combination of (b) skeleton information 320B, joint angles 370 (not shown), and (c) behavior information 911 ("falling down") associated with skeleton information 320B constitutes a teacher signal.

＜主成分分析部４０４による次元削減＞ <Dimension reduction by principal component analysis unit 404>
図１０は、教師信号を入力データとして主成分分析部４０４が生成した主成分を、主成分空間上にプロットした例を示す説明図である。凡例は教師信号に含まれる行動情報１０００～１００４を示す。 FIG. 10 is an explanatory diagram showing an example in which the principal components generated by the principal component analysis unit 404 using the teacher signals as input data are plotted in the principal component space. The legend indicates the behavior information 1000 to 1004 included in the teacher signals.

図１０において、（ａ）はＸ軸に第１主成分を、Ｙ軸に第２主成分をとり、第２主成分までの情報を２次元平面上にプロットした例を示す。（ｂ）はＸ軸に第１主成分を、Ｙ軸に第２主成分をとり、Ｚ軸に第３主成分をとり、第３主成分までの情報を３次元空間上にプロットした例を示す。 In FIG. 10, (a) shows an example in which the first principal component is plotted on the X axis and the second principal component is plotted on the Y axis, and information up to the second principal component is plotted on a two-dimensional plane. (b) is an example in which the first principal component is plotted on the X-axis, the second principal component is plotted on the Y-axis, and the third principal component is plotted on the Z-axis, and the information up to the third principal component is plotted on a three-dimensional space. show.

（ａ）において、立つ１０００と、座る１００１と、倒れる１００４は、第２主成分までの２次元平面上でも分離可能な様子が伺えるが、歩く１００２と、しゃがむ１００３は第２主成分までの２次元平面上では分離困難な様子が伺える。ここで、（ｂ）において、第３主成分までを含めた３次元空間上で、歩く１００２としゃがむ１００３をプロットした場合、分離の可能性が拡大する場合がある。 In (a), standing 1000, sitting 1001, and falling down 1004 appear separable even on the two-dimensional plane spanned by the first two principal components, whereas walking 1002 and crouching 1003 appear difficult to separate on that plane. In (b), when walking 1002 and crouching 1003 are plotted in the three-dimensional space that includes the third principal component, the possibility of separating them may increase.

このため、主成分分析部４０４が生成した主成分を多く用いれば高精度な行動分類の可能性がある。ただし、主成分の次元を示す序数ｋを多くすると計算量は増加するため、精度と計算量からどこまでの主成分を考慮し、どのくらいの次元の空間で行動を表すかを判断する必要がある。 Therefore, if a large number of principal components generated by the principal component analysis unit 404 are used, highly accurate behavior classification is possible. However, as the ordinal number k, which indicates the dimension of the principal component, increases, the amount of calculation increases, so it is necessary to consider the extent of the principal component based on the accuracy and amount of calculation, and determine in what dimensional space the behavior should be represented.

したがって、主成分分析部４０４は、行動学習部４０６で学習に用いる主成分の最大序数を変化させ、第１主成分～最大序数の主成分までの主成分群を行動学習部４０６に出力する。具体的には、たとえば、上述した行動分類の要求精度（たとえば、最低限必要な主成分の次元を示す序数）または／および許容計算量をあらかじめ設定しておき、主成分分析部４０４が、行動学習部４０６で学習に用いる主成分の最大序数を変化させ、要求精度または／および許容計算量を最大限充足する序数を決定する。 Therefore, the principal component analysis unit 404 changes the maximum ordinal number of the principal components used for learning by the behavior learning unit 406, and outputs a principal component group from the first principal component to the principal component with the maximum ordinal number to the behavior learning unit 406. Specifically, for example, the required accuracy of the behavior classification described above (for example, an ordinal number indicating the minimum required principal component dimension) and/or the allowable amount of calculation are set in advance, and the principal component analysis unit 404 The learning unit 406 changes the maximum ordinal number of the principal component used for learning, and determines the ordinal number that satisfies the required accuracy and/or allowable amount of calculation to the maximum extent.

たとえば、要求精度が次元を示す序数「３」（第３主成分）という条件の場合、主成分分析部４０４は、最大序数を「３」に決定し、第１主成分～第３主成分までの主成分群を行動学習部４０６に出力する。 For example, if the required accuracy is the ordinal number "3" indicating the dimension (third principal component), the principal component analysis unit 404 determines the maximum ordinal number to be "3" and The principal component group of is output to the behavior learning unit 406.

また、許容計算量が条件に設定されている場合、主成分分析部４０４は、第１主成分から昇順に計算量を順次取得し、最大序数を、許容計算量をはじめて超えたときの序数（たとえば、「５」）より１つ少ない序数（たとえば、「４」）に決定し、第１主成分から最大序数ｋ＝４の第４主成分までの主成分群を行動学習部４０６に出力する。 When an allowable amount of calculation is set as a condition, the principal component analysis unit 404 obtains the amount of calculation sequentially in ascending order from the first principal component, sets the maximum ordinal number to one less (for example, "4") than the ordinal number at which the allowable amount of calculation is first exceeded (for example, "5"), and outputs the principal component group from the first principal component to the fourth principal component, with maximum ordinal number k = 4, to the behavior learning unit 406.

また、要求精度が次元を示す序数「３」（第３主成分）以上という条件で、かつ、許容計算量が条件に設定されている場合、第３主成分までの累積計算量が許容計算量以下であれば、主成分分析部４０４は、最大序数を「３」から「４」に変化させる。そして、第４主成分までの累積計算量が許容計算量を超えれば、主成分分析部４０４は、最大序数ｋを「３」に決定し、第１主成分～第３主成分までの主成分群を行動学習部４０６に出力する。 In addition, if the required accuracy is the ordinal number "3" indicating the dimension (third principal component) or more, and the allowable calculation amount is set as a condition, the cumulative calculation amount up to the third principal component is the allowable calculation amount. If it is below, the principal component analysis unit 404 changes the maximum ordinal number from "3" to "4". Then, if the cumulative amount of calculation up to the fourth principal component exceeds the allowable amount of calculation, the principal component analysis unit 404 determines the maximum ordinal number k to be "3" and The group is output to the behavior learning unit 406.

一方、第３主成分までの累積計算量が許容計算量を超えれば、主成分分析部４０４は、最大序数を「３」から「２」に変化させる。そして、第２主成分までの累積計算量が許容計算量以下であれば、主成分分析部４０４は、最大序数ｋを「２」に決定し、第１主成分～第２主成分までの主成分群を行動学習部４０６に出力する。 On the other hand, if the cumulative amount of calculation up to the third principal component exceeds the allowable amount of calculation, the principal component analysis unit 404 changes the maximum ordinal number from "3" to "2". Then, if the cumulative amount of calculation up to the second principal component is less than the allowable amount of calculation, the principal component analysis unit 404 determines the maximum ordinal number k to be "2" and The component group is output to the behavior learning unit 406.

なお、行動学習部４０６に出力する主成分群は、第１主成分から昇順に限定する必要はない。たとえば、主成分分析部４０４は、予め定めた主成分群を特定の数だけ取り出してもよい。また、主成分分析部４０４は、特定の主成分群を除外した上で行動学習部４０６に出力する主成分群を決定してもよい。このように、行動学習部４０６に出力する主成分群は第１主成分から昇順の主成分群に限定されない。 Note that the principal component group output to the behavior learning unit 406 does not need to be limited to ascending order starting from the first principal component. For example, the principal component analysis unit 404 may extract a specific number of predetermined principal component groups. Further, the principal component analysis unit 404 may decide the principal component group to be output to the behavior learning unit 406 after excluding a specific principal component group. In this way, the principal component groups output to the behavior learning unit 406 are not limited to principal component groups in ascending order from the first principal component.

また、この場合においても、許容計算量が条件に設定されている場合、主成分分析部４０４は、上述した第１主成分からの昇順に限定していない主成分群について、序数の昇順に計算量を順次取得し、許容計算量をはじめて超えたときの序数より１つ前の序数までの主成分群を行動学習部４０６に出力する。たとえば、主成分群が第２主成分、第３主成分、第５主成分からなる場合、第２主成分では許容計算量を超えず、第２主成分および第３主成分でも許容計算量を超えず、第２主成分、第３主成分、および第５主成分ではじめて許容計算量を超えた場合、主成分分析部４０４は、第２主成分から第５主成分の１つ前の第３主成分までを、行動学習部４０６に出力する主成分群に決定してもよい。 In this case as well, when an allowable amount of calculation is set as a condition, the principal component analysis unit 404 obtains the amount of calculation sequentially, in ascending order of ordinal number, for the principal component group that is not limited to ascending order from the first principal component, and outputs to the behavior learning unit 406 the principal components up to the ordinal number immediately preceding the one at which the allowable amount of calculation is first exceeded. For example, when the principal component group consists of the second, third, and fifth principal components, and the allowable amount of calculation is not exceeded with the second principal component alone or with the second and third principal components, but is first exceeded with the second, third, and fifth principal components, the principal component analysis unit 404 may determine the second and third principal components, that is, up to the one immediately preceding the fifth, as the principal component group to be output to the behavior learning unit 406.
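The selection under an allowable amount of calculation can be sketched as follows. The description does not say how the amount of calculation is measured, so the per-component cost table `cost` and the function name are purely hypothetical; only the rule of stopping just before the budget is first exceeded comes from the text.

```python
def select_within_budget(ordinals, cost, budget):
    """Select principal components, in ascending ordinal order, until the
    cumulative cost would first exceed the allowable amount.

    ordinals: ordinal numbers of the candidate principal components
    cost    : dict mapping ordinal -> cost of adding that component
              (hypothetical measure of the amount of calculation)
    budget  : allowable amount of calculation
    """
    selected = []
    total = 0.0
    for k in sorted(ordinals):
        total += cost[k]
        if total > budget:
            # Budget first exceeded here: keep only the earlier components.
            break
        selected.append(k)
    return selected
```

With candidates 1 to 5 each costing 10 and a budget of 45, the budget is first exceeded at the fifth component, so components 1 to 4 are kept. With candidates 2, 3, and 5 costing 10, 10, and 15 and a budget of 30, components 2 and 3 are kept.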

<Data output to behavior learning unit 406>
FIG. 11 is an explanatory diagram showing the data that the principal component analysis unit 404 and the missing position information generation unit 422 output to the behavior learning unit 406. The principal component analysis unit 404 performs principal component analysis on the interpolated skeleton information 320, the joint angles 370, and the amount of movement between frames, and generates one or more principal components 1101.

The missing position information generation unit 422 generates the missing position information 1100. The missing position information 1100 is expressed, for example, as flag variables, one per skeleton point in the skeleton information 320, each indicating whether that point is missing. For example, when the skeleton information 320 consists of 18 skeleton points, 18 flag variables are prepared; for each skeleton point, the flag is set to "1" if its coordinate information is present and to "0" if it is missing.

Alternatively, the missing position information 1100 may be generated as a multi-valued variable whose value indicates which skeleton point is missing; its data format is not limited. The behavior learning unit 406 thus receives data 1102, which combines the principal components 1101 generated by the principal component analysis unit 404 with the missing position information 1100 generated by the missing position information generation unit 422. The data that the principal component analysis unit 454 and the missing position information generation unit 462 output to the behavior recognition unit 457 has the same form as the data 1102.
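As a rough illustration of the flag format and of the combined data 1102, the sketch below represents a missing skeleton point as `None` and concatenates the principal components with the 18 flags. The helper names, the `None` convention, and the sample values are assumptions made for this example only.

```python
NUM_POINTS = 18  # skeleton information 320 with 18 skeleton points, as in the text


def missing_flags(skeleton):
    """Return 1 where coordinates are present and 0 where the point is missing."""
    return [0 if point is None else 1 for point in skeleton]


def build_data_1102(principal_components, flags):
    """Concatenate principal components 1101 and missing position information 1100."""
    return list(principal_components) + list(flags)


# Hypothetical skeleton: points 4 and 7 were not detected.
skeleton = [(float(i), float(i)) for i in range(NUM_POINTS)]
skeleton[4] = None
skeleton[7] = None

flags = missing_flags(skeleton)
row = build_data_1102([0.42, -1.3], flags)  # two illustrative principal components
print(len(row))  # 20
```

Each row then carries the reduced-dimension features followed by one presence flag per skeleton point.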

The behavior learning unit 406 performs learning using the data 1102, generates a behavior classification model, and outputs it to the behavior recognition unit 457.

FIG. 12 is an explanatory diagram showing in detail how the behavior learning unit 406 learns behaviors and how the behavior recognition unit 457 classifies them. In FIG. 12, (a) shows the distribution of behavior plot points over the first two variables after dimension reduction, and (b) shows the behavior classification result when the missing position information 1100 is taken into account. To classify the behaviors into regions of the principal component space, the behavior learning unit 406 generates a boundary line 1210 or a boundary plane 1220. Any method, such as the k-means method, a support vector machine, a decision tree, or a random forest, may be used to learn and classify behaviors; the learning method is not limited.

During learning by the behavior learning unit 406, the missing position information 1100 is plotted in the principal component space and attached to each piece of behavior information. The points plotted for the behaviors (standing, sitting, walking, falling, raising a hand, crouching) are plot points 1200 to 1205. For simplicity, plot points 1200 to 1203, for behaviors with no missing skeleton points, are drawn as white points, and plot points 1204 and 1205, for behaviors with missing skeleton points, are drawn filled in.

Plot point 1204, a behavior with missing skeleton points, is labeled with the behavior "raising a hand", and plot point 1205, also a behavior with missing skeleton points, is labeled with the behavior "crouching". Plot points drawn with the same shape, (1200, 1204) and (1201, 1205), yield similar numerical values when the interpolated skeleton information 320 is subjected to principal component analysis.

The skeleton points of skeleton information 320 that contains missing points are interpolated by the missing information interpolation unit 423. However, interpolated skeleton points may contain larger errors than detected ones, so classifying these behaviors in the same dimensions as behaviors without missing points can degrade classification accuracy. Distinguishing whether each piece of skeleton information 320 was interpolated or detected therefore helps improve accuracy. That is, appending the missing position information 1100 to principal components that contain interpolation error adds information in another dimension, and the boundary plane 1220 for classification can be placed in that new dimension, which improves behavior classification accuracy.
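The effect of the extra dimension can be seen in a toy example. The sketch below uses a 1-nearest-neighbour rule, which is not one of the methods the text lists and is chosen here only because it fits in a few lines. The coordinates are invented, and the single flag uses the 1-bit convention of FIG. 12 (1 = some point was interpolated), not the per-point presence flags.

```python
def nearest_label(train, query):
    """1-nearest-neighbour by squared Euclidean distance (illustrative classifier only)."""
    best = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return best[1]


# With (PC1, PC2) alone, an interpolated "raise hand" sample coincides with
# "stand", so no boundary in that plane can separate the two.
pcs_only = [((1.0, 1.0), "stand"), ((1.0, 1.0), "raise hand")]

# Appending a 1-bit flag (0 = fully detected, 1 = interpolated) separates them.
with_flag = [((1.0, 1.0, 0.0), "stand"), ((1.0, 1.0, 1.0), "raise hand")]

print(nearest_label(with_flag, (1.0, 1.0, 1.0)))  # raise hand
print(nearest_label(with_flag, (1.0, 1.0, 0.0)))  # stand
```

The pair mirrors plot points 1200 and 1204, which have similar principal components but differ in whether skeleton points were missing.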

In FIG. 12, the missing position information 1100 is represented by whether plot points 1204 and 1205 are filled in, so it is treated as 1-bit information. However, when the missing position information 1100 is expressed, for example, by as many variables as there are skeleton points, it is treated as multi-valued information. In that case, each behavior is plotted in a higher-dimensional space, and the missing position information 1100 enables finer-grained behavior classification.

Note that each behavior is held in the teacher signal DB 104 as a combination of learning data (a person's skeleton information 320 and joint angles 370) and a behavior, and it is not changed by the processing of the missing control unit 421 and the missing position information generation unit 422. The missing control unit 421 and the missing generation unit 402 deliberately create missing points, and the missing position information generation unit 422 generates the missing position information 1100 for them, so that behaviors can be classified correctly along the new axis given by the missing position information 1100 even when the principal components contain errors from interpolating the missing points.

The behavior recognition unit 457 recognizes behaviors using the behavior classification model generated through learning by the behavior learning unit 406. Specifically, for example, when newly input skeleton information 320 contains missing information, the client 102 interpolates it with the missing information interpolation unit 461. The behavior recognition unit 457 applies principal component analysis to the interpolated skeleton information 320 and inputs the newly generated principal components and the missing position information 1100 to the behavior classification model. The behavior recognition unit 457 thereby determines, according to the boundary line 1210 or boundary plane 1220 set by the behavior classification model, which region the newly input skeleton information 320 belongs to, and recognizes the behavior according to that region.

FIG. 13 is a graph showing the cumulative contribution rate that the principal component analysis unit 404 uses when determining the number of dimensions. The cumulative contribution rate is a measure of how much of the information in the original data is represented by the newly generated principal components. Consequently, if adding more principal components, and thus more dimensions for behavior classification, hardly changes the cumulative contribution rate, little improvement in accuracy can be expected.

The principal component analysis unit 404 therefore determines the number of dimensions by using only as many principal components as are needed to exceed a predetermined threshold of the cumulative contribution rate. For example, when the threshold is "0.8", the condition is satisfied with components up to the second principal component, so the number of dimensions k is "2", and the first and second principal components are output to the behavior learning unit 406.
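The threshold rule can be sketched as below. The contribution rates are invented for illustration, the function name `dims_for_threshold` is hypothetical, and treating a rate equal to the threshold as satisfying the condition is an assumption of this sketch.

```python
def dims_for_threshold(contribution_rates, threshold):
    """Smallest k whose cumulative contribution rate reaches the threshold.

    `contribution_rates` is ordered from the first principal component onward.
    """
    cumulative = 0.0
    for k, rate in enumerate(contribution_rates, start=1):
        cumulative += rate
        if cumulative >= threshold:   # assumed: equality also satisfies the condition
            return k
    return len(contribution_rates)    # threshold never reached: use all components


rates = [0.55, 0.30, 0.10, 0.05]      # hypothetical contribution rates
k = dims_for_threshold(rates, 0.8)    # 0.55 + 0.30 = 0.85 reaches 0.8
print(k)  # 2
```

With these numbers the first two components suffice, matching the example in which k is "2".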

Note that the principal component group output to the behavior learning unit 406 need not consist of components taken in ascending order from the first principal component. For example, the principal component analysis unit 404 may determine the combination of principal component ordinals k whose cumulative contribution rate is the largest without exceeding a predetermined threshold. The principal component analysis unit 404 may also select such a combination of ordinals k from the principal component group applied to the behavior classification model. In this way, the principal component group output to the behavior learning unit 406 is not limited to a group in ascending order from the first principal component.
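One way to read the combination rule is as a small subset search. The sketch below enumerates every combination, which is practical only for a handful of components; the text does not state how the combination is found, so the exhaustive search, the function name `best_combination`, and the sample rates are assumptions.

```python
from itertools import combinations


def best_combination(rates, threshold):
    """Find the ordinals whose summed contribution rate is largest
    without exceeding the threshold (exhaustive search)."""
    best, best_total = (), 0.0
    ordinals = sorted(rates)
    for size in range(1, len(ordinals) + 1):
        for combo in combinations(ordinals, size):
            total = sum(rates[k] for k in combo)
            if total <= threshold and total > best_total:
                best, best_total = combo, total
    return list(best), best_total


# Hypothetical contribution rates keyed by principal component ordinal.
rates = {1: 0.55, 2: 0.30, 3: 0.10, 4: 0.05}
combo, total = best_combination(rates, 0.8)
print(combo)  # [1, 3, 4]
```

Here the first and second components together would exceed 0.8, so the search settles on the first, third, and fourth, a group that is not in consecutive ascending order.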

Note that the principal component analysis unit 454 selects the one or more principal components that it outputs to the behavior recognition unit 457 in the same way as the principal component analysis unit 404.

<Learning process>
FIG. 14 is a flowchart showing a detailed example of the learning procedure performed by the server 101 (learning device) according to Embodiment 1. Using the teacher signal acquisition unit 401, the server 101 acquires one or more teacher signals to be used for learning from among the teacher signals in the teacher signal DB 104 (step S1400).

In the server 101, the missing control unit 421 sets the skeleton points to be removed according to a random number or a predetermined number. The missing generation unit 402 then removes those skeleton points from the skeleton information 320 in the teacher signal acquired in step S1400 and updates the teacher signal with the resulting skeleton information 320 (step S1401).

Using the missing position information generation unit 422, the server 101 generates, as missing position information, the position information of the skeleton points that the missing control unit 421 set to be removed (step S1410).

Using the missing information interpolation unit 423, the server 101 interpolates the missing information (the one or more missing pieces of skeleton information 320) from the non-missing information (step S1411). A teacher signal that has been processed by the missing information interpolation unit 423 is referred to as an updated teacher signal.
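Steps S1401, S1410, and S1411 can be pictured with the sketch below. The interpolation rule used here (filling a missing point with the mean of the remaining points) is only a placeholder assumption, since this passage does not specify how interpolation from non-missing information is done; the function names and the seeded random choice are likewise hypothetical.

```python
import random


def drop_points(skeleton, indices):
    """S1401 / S1410: remove the chosen points and record presence flags."""
    dropped = [None if i in indices else p for i, p in enumerate(skeleton)]
    flags = [0 if p is None else 1 for p in dropped]
    return dropped, flags


def interpolate_missing(skeleton):
    """S1411: fill each missing point with the mean of the non-missing points
    (placeholder rule, not necessarily the method of the embodiment)."""
    present = [p for p in skeleton if p is not None]
    mean = tuple(sum(axis) / len(present) for axis in zip(*present))
    return [mean if p is None else p for p in skeleton]


rng = random.Random(0)                      # seeded so the example is repeatable
skeleton = [(float(i), 2.0 * i) for i in range(18)]
to_drop = set(rng.sample(range(18), 3))     # points chosen by a random number
dropped, flags = drop_points(skeleton, to_drop)
filled = interpolate_missing(dropped)       # the "updated teacher signal" skeleton
```

The flags are produced before interpolation, so the record of which points were removed survives even after the gaps are filled.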

Using the skeleton information processing unit 403, the server 101 executes skeleton information processing for each updated teacher signal (step S1402). Specifically, for example, the server 101 executes the processing of the joint angle calculation unit 501, the movement amount calculation unit 502, and the normalization unit 503.

FIG. 15 is a flowchart showing a detailed example of the skeleton information processing procedure according to Embodiment 1. Using the joint angle calculation unit 501, the server 101 calculates, for each updated teacher signal, the joint angles 370 from the skeleton information 320 in that signal (step S1501). Next, using the movement amount calculation unit 502, the server 101 calculates, for each updated teacher signal, the amount of movement between frames from the skeleton information 320 in that signal (step S1502).

Then, using the normalization unit 503, the server 101 removes the absolute position information from the skeleton information 320 of each updated teacher signal and normalizes the skeleton information 320 to a constant size (step S1503). As a result, the joint angles 370, the amount of movement between frames, and the normalized skeleton information 320 are obtained for each updated teacher signal. The process then moves to step S1403 in FIG. 14.
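The three operations of FIG. 15 can be sketched for 2-D points as follows. The concrete formulas (angle between two segments at a joint, Euclidean displacement per point, centring on the centroid and scaling by the largest radius) are assumptions chosen for illustration; the passage names the operations but does not give their formulas.

```python
import math


def joint_angle(a, b, c):
    """S1501: angle in radians at joint b between segments b->a and b->c.
    Assumes a, b, c are distinct points."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def movement(prev_points, curr_points):
    """S1502: per-point displacement between two consecutive frames."""
    return [math.hypot(c[0] - p[0], c[1] - p[1])
            for p, c in zip(prev_points, curr_points)]


def normalize(points):
    """S1503: drop absolute position and scale the skeleton to a constant size."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centred = [(x - cx, y - cy) for x, y in points]
    scale = max(math.hypot(x, y) for x, y in centred) or 1.0
    return [(x / scale, y / scale) for x, y in centred]


frame0 = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]          # hypothetical three-point skeleton
frame1 = [(x + 10.0, y + 5.0) for x, y in frame0]      # same pose, shifted in the image
angle = joint_angle(frame0[0], frame0[1], frame0[2])   # right angle at the middle point
moves = movement(frame0, frame1)
norm0, norm1 = normalize(frame0), normalize(frame1)
```

Because normalization removes the absolute position, `norm0` and `norm1` describe the same pose even though the raw coordinates differ.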

Returning to FIG. 14, the server 101 uses the principal component analysis unit 404 to perform principal component analysis on the normalized skeleton information 320, the joint angles 370, and the amount of movement between frames, and generates one or more principal components. Of the generated principal components, it decides how many dimensions to use for learning, in descending order of variance, and selects the principal components up to the decided k-th dimension (the first to k-th principal components) in descending order of variance (step S1403).

In step S1407, if some of the skeleton information 320 has not yet been removed in step S1401 (step S1407: No), the process returns to step S1401, and the server 101 removes skeleton points that have not been removed so far (step S1401).

On the other hand, if all of the skeleton information 320 has been removed (step S1407: Yes), the process proceeds to step S1408. The decision in step S1407 is not limited to this: the server 101 may decide whether to return to step S1401 or proceed to step S1408 according to a predetermined number of repetitions. Alternatively, the skeleton points to be removed may be determined in advance, and the server 101 may decide whether to return to step S1401 or proceed to step S1408 depending on whether all of those predetermined skeleton points have been removed.

In step S1408, if any teacher signal has not yet been selected in step S1400 (step S1408: No), the server 101 selects a teacher signal that has not been selected so far (step S1400). On the other hand, if all teacher signals have been selected (step S1408: Yes), the process proceeds to step S1405. The decision in step S1408 is not limited to this: the server 101 may decide whether to return to step S1400 or proceed to step S1405 according to a predetermined number of repetitions.

Note that steps S1407 and S1408 cause the processing to repeat, and the principal component analysis of step S1403 is performed once per repetition. The result of each run is appended to those already held, and the principal component analysis unit 404 holds all of the values. When all repetitions are complete, all values held in the principal component analysis unit 404 are output to the behavior learning unit 406.

Using the behavior learning unit 406, the server 101 performs learning with the principal components input from the principal component analysis unit 404 and the missing position information input from the missing position information generation unit 422 as explanatory variables and the behavior information in the updated teacher signals as the objective variable, and generates a behavior classification model as the result of the learning (step S1405).

<Behavior recognition process>
FIG. 16 is a flowchart showing an example of the behavior recognition procedure performed by the client 102 (behavior recognition device) according to Embodiment 1. Using the skeleton detection unit 451, the client 102 detects the skeleton information 320 of a person appearing in the analysis target data acquired from the sensor 103 (step S1600). Next, using the missing position information generation unit 462, the client 102 generates, as missing position information, the position information of the skeleton points in the detected skeleton information 320 that could not be detected because of occlusion or the like (step S1601).

Using the missing information interpolation unit 461, the client 102 interpolates the missing information from the non-missing information (step S1610).

Next, using the skeleton information processing unit 453, the client 102 executes skeleton information processing on the skeleton information 320 whose missing information was interpolated in step S1610, in the same way as in step S1402 (step S1602). Specifically, for example, the client 102 executes the processing of the joint angle calculation unit 501, the movement amount calculation unit 502, and the normalization unit 503 (steps S1501 to S1503), as shown in FIG. 15.

Next, using the principal component analysis unit 454, the client 102 performs principal component analysis on the skeleton information 320 normalized in step S1602, the joint angles 370, and the amount of movement between frames, and generates one or more principal components. Based on the ordinal k determined in step S1403, the principal component analysis unit 454 selects the first to k-th of the generated principal components as those to be used for behavior recognition in step S1606 (step S1603).

Next, using the behavior recognition unit 457, the client 102 recognizes the behavior of the person appearing in the analysis target data acquired from the sensor 103, based on the behavior classification model learned in step S1405, the principal components generated in step S1603, and the missing position information generated in step S1601 (step S1606). The client 102 may transmit the recognition result of step S1606 to the server 101, and may also use the recognition result to control a device connected to the client 102.

For example, when the analysis environment in which the sensor 103 is installed is a factory, the behavior recognition system 100 can use the recognition results for monitoring workers' tasks in the factory, inspecting products for defects, and the like. When the analysis environment is a train, the behavior recognition system 100 can use the recognition results for monitoring passengers, monitoring on-board equipment, detecting disasters such as fires, and the like.

As described above, according to Embodiment 1, multiple types of behavior of a recognition target can be recognized with high accuracy. In addition, there is no need to generate multiple behavior classification models in order to recognize behavior accurately from a partially missing shape. This shortens the learning period and reduces the storage area needed for the behavior classification model. In particular, even when some of the skeleton points 300 to 317 are missing because of occlusion or the like, multiple types of behavior can be recognized with high accuracy according to the missing skeleton points without increasing the number of behavior classification models.

Embodiment 2 will be described focusing on its differences from Embodiment 1. Points in common with Embodiment 1 are given the same reference numerals, and their description is omitted.

FIG. 17 is a block diagram showing an example of the functional configuration of the behavior recognition system 100 according to Embodiment 2. In Embodiment 2, the teacher signal DB 104 holds the position information of the skeleton points to be removed by the missing control unit 421 and outputs that position information to the missing control unit 421.

How likely the skeleton detection unit 451 is to miss a skeleton point when detecting the skeleton information 320 of a person in the analysis target data acquired from the sensor 103 depends on the skeleton point and on the characteristics of the analysis target data. For example, skeleton points at the extremities of the body, such as the wrists and ankles, are more likely to be missed. Likewise, when the analysis target data is captured from a high angle, skeleton points of the lower body are more likely to be hidden by the upper body and missed, and when a passage is captured from the side at a horizontal angle, skeleton points on the right or left half of the body are more likely to be missed.

When such characteristics are known in advance, behavior recognition accuracy can be expected to improve by generating those particular missing patterns and training on them intensively. For this purpose, the teacher signal DB 104 holds the position information of the skeleton points to be removed by the missing control unit 421. This position information is referred to as missing suggestion information. The missing control unit 421 acquires the missing suggestion information from the teacher signal DB 104 and outputs it to the missing generation unit 402. The missing generation unit 402 removes skeleton points according to that position information. This makes accurate behavior recognition possible even in situations where a specific missing pattern is known in advance to be likely.
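One possible way to use such information is to bias the choice of removed points toward those known to be missed often. In the sketch below the missing suggestion information is encoded as per-point weights, which is an assumption of this example (the text only says it is position information of the points to be removed); the function name `choose_points_to_drop`, the skeleton-point indices, and the weight values are hypothetical.

```python
import random


def choose_points_to_drop(weights, count, rng):
    """Draw `count` distinct skeleton-point indices, favouring larger weights."""
    remaining = dict(weights)
    chosen = []
    for _ in range(min(count, len(remaining))):
        indices = list(remaining)
        pick = rng.choices(indices, weights=[remaining[i] for i in indices], k=1)[0]
        chosen.append(pick)
        del remaining[pick]          # sample without replacement
    return chosen


# Hypothetical weighting: extremities are five times as likely to be removed.
weights = {i: 1.0 for i in range(18)}
for i in (4, 7, 10, 13):             # assumed indices for wrists and ankles
    weights[i] = 5.0

rng = random.Random(0)
picked = choose_points_to_drop(weights, 3, rng)
```

Over many training iterations, this concentrates the artificially generated missing patterns on the points that are expected to be missed in the deployed environment.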

As described above, according to Embodiment 2, when a specific missing pattern is known in advance to be likely, that information is held in the teacher signal DB 104 and used to deliberately generate the missing pattern during learning, which enables accurate behavior recognition.

Embodiment 3 will be described focusing on its differences from Embodiments 1 and 2. Points in common with Embodiments 1 and 2 are given the same reference numerals, and their description is omitted.

FIG. 18 is an explanatory diagram showing a detailed example of the teacher signals held in the teacher signal DB 104 according to Embodiment 3. In Embodiment 3, the teacher signal DB 104 holds environment information 1801 and 1811 for use as explanatory variables in learning by the behavior learning unit 406. The environment information 1801 and 1811 describes the environment in which the analysis target data is acquired, for example the angle at which the sensor 103 that acquires the data is installed, or the shapes of objects around the sensor 103. Information on the time period in which the analysis target data was acquired, or variables defined by the developer, may also be added; the environment information 1801 and 1811 is not limited to these.

FIG. 19 is an explanatory diagram showing the data that the principal component analysis unit 404 and the missing position information generation unit 422 output to the behavior learning unit 406 according to Embodiment 3. The environment information 1900 (1801, 1811) held in the teacher signal DB 104 is passed along with the skeleton information 320 as far as the principal component analysis unit 404. The environment information 1900 is not subjected to principal component analysis; it is output to the behavior learning unit 406 as an explanatory variable together with the principal components 1101 generated by the principal component analysis unit 404 and the missing position information 1100.
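A minimal sketch of the resulting explanatory-variable row is shown below. The numeric encoding of the environment information (a sensor tilt angle and an hour of day) is invented for illustration, as is the function name `build_explanatory_variables`.

```python
def build_explanatory_variables(principal_components, missing_flags, environment):
    """Concatenate principal components 1101, missing position information 1100,
    and environment information 1900. The environment values bypass the
    principal component analysis and are appended unchanged."""
    return list(principal_components) + list(missing_flags) + list(environment)


principal_components = [0.42, -1.3]   # illustrative output of the analysis
missing_flags = [1] * 18              # all 18 skeleton points present
missing_flags[4] = 0                  # except point 4
environment = [35.0, 14.0]            # hypothetical: sensor tilt in degrees, hour of day

row = build_explanatory_variables(principal_components, missing_flags, environment)
print(len(row))  # 22
```

Only the skeleton-derived features are reduced in dimension; the environment values enter the classifier as additional dimensions alongside the flags.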

FIG. 20 is a block diagram showing an example of the functional configuration of the behavior recognition system 100 according to Embodiment 3. In Embodiment 3, the teacher signal acquisition unit 401 is replaced by a teacher signal acquisition unit 2000, and an environment information detection unit 2001 is added.

The teacher signal acquisition unit 2000 acquires, from the teacher signals in the teacher signal DB 104 that include environment information, one or more teacher signals to be used for learning, and outputs the selected teacher signals to the missing generation unit 402. The environment information 1900 is input to the behavior learning unit 406 and used to generate the behavior classification model.

The environment information detection unit 2001 detects the environment information 1900 defined in the teacher signal DB 104 from the analysis target data acquired from the sensor 103 and outputs it to the skeleton detection unit 451. The environment information 1900 may depend on factors such as the installation position of the sensor 103. In that case, instead of detecting it from the analysis target data, the environment information 1900 at the time the sensor 103 was installed may be stored in the environment information detection unit 2001 in advance and output to the skeleton detection unit 451; the method of detecting the environment information 1900 is not limited.

<Behavior recognition processing>
FIG. 21 is a flowchart showing an example of the behavior recognition processing procedure performed by the client 102 (behavior recognition device) according to the third embodiment. In the third embodiment, environmental information detection processing (step S2100) is executed before the skeleton detection (step S1600) of FIG. 16. Using the environmental information detection unit 2001, the client 102 detects the environmental information 1900 defined in the teacher signal DB 104 from the analysis target data acquired from the sensor 103 (step S2100). Steps S1600 to S1606 are then executed. The environmental information 1900 is used as an explanatory variable in behavior recognition by the behavior recognition unit 457 (step S1606).

Thus, according to the third embodiment, the teacher signal DB 104 holds the environmental information 1900, and adding it as an explanatory variable during behavior learning supplies the environmental information 1900 as information of a further dimension alongside the principal components and the missing position information, so that classification boundaries are set in the new dimension. The accuracy of behavior classification can therefore be improved.

実施例４を、実施例１～実施例３との相違点を中心に説明する。なお、実施例１～実施例３と共通する点については、同一符号を付し、その説明を省略する。 Example 4 will be explained focusing on the differences from Examples 1 to 3. Note that the same points as in Examples 1 to 3 are given the same reference numerals, and the explanation thereof will be omitted.

FIG. 22 is a block diagram showing an example of the functional configuration of the skeleton information processing unit according to the fourth embodiment. In the fourth embodiment, the skeleton information processing units 403 and 453 each include a mutual information normalization unit 2204. The mutual information normalization unit 2204 normalizes the value ranges of the skeleton information 320, the joint angles 370, and the inter-frame movement amounts that are output to the principal component analysis unit 404 so that they fall within a fixed range.

骨格情報３２０およびフレーム間の移動量の値域は、解析対象データの解像度に依存する。一方、関節角度３７０の値域は、０から２π、または０度から３６０度の範囲となる。主成分分析の実行対象となるデータについて、値域に大きな違いがある場合、元のデータの主成分に対する影響にデータ種毎の偏りが生じる場合がある。 The range of the skeleton information 320 and the amount of movement between frames depends on the resolution of the data to be analyzed. On the other hand, the range of the joint angle 370 is from 0 to 2π, or from 0 degrees to 360 degrees. If there is a large difference in the range of the data to be subjected to principal component analysis, the influence on the principal components of the original data may be biased depending on the data type.

To eliminate this bias, the mutual information normalization unit 2204 normalizes the data subjected to principal component analysis so that their value ranges fall within a fixed range. For example, the mutual information normalization unit 2204 converts the skeleton information 320 according to formulas (21) and (22) below and the inter-frame movement amount according to formula (23) below, unifying the value ranges of the original data to 0 to 2π.

However, the normalization method executed by the mutual information normalization unit 2204 is not limited to this. For example, the mutual information normalization unit 2204 may instead normalize the value range of the joint angle 370 to a fixed range according to the resolution of the data subjected to principal component analysis.
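Formulas (21) to (23) are not reproduced in this text, so the following is only a sketch of one way the value ranges could be unified to 0 to 2π. It assumes linear scaling by the frame width and height for coordinates and by the frame diagonal for inter-frame movement; these scaling choices and all function names are assumptions, not the formulas of the specification.

```python
import math

TWO_PI = 2.0 * math.pi


def scale_to_two_pi(value, max_value):
    """Linearly map a value in [0, max_value] onto [0, 2*pi]."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return TWO_PI * value / max_value


def normalize_joint(x, y, width, height):
    """Scale pixel coordinates of a skeleton point by the frame resolution (assumed)."""
    return scale_to_two_pi(x, width), scale_to_two_pi(y, height)


def normalize_movement(dx, dy, width, height):
    """Scale the magnitude of an inter-frame displacement by the frame diagonal (assumed)."""
    diagonal = math.hypot(width, height)
    return scale_to_two_pi(math.hypot(dx, dy), diagonal)


# Hypothetical 1920x1080 frame: the frame center maps to (pi, pi).
center = normalize_joint(960, 540, 1920, 1080)
```

After this step the coordinates, the movement amounts, and the joint angles (already in 0 to 2π) share one value range, which is the condition the paragraph above aims for.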

FIG. 23 is a flowchart showing a detailed example of the processing procedure of the skeleton information processing unit according to the fourth embodiment. In the fourth embodiment, in the skeleton information processing (steps S1402, S1602), the client 102 executes mutual information normalization (step S2304) after normalization (step S1503). Mutual information normalization (step S2304) normalizes the possible value ranges of the skeleton information 320 normalized by the normalization unit, the joint angles 370, and the inter-frame movement amounts to a fixed range.

Thus, according to the fourth embodiment, unifying the possible value ranges of the original data subjected to principal component analysis (skeleton information 320, joint angles 370, and inter-frame movement amounts) eliminates the bias in influence on the principal components caused by specific data with a wide value range, so that multiple types of behavior can be discriminated with high accuracy.

実施例５を、実施例１～実施例４との相違点を中心に説明する。なお、実施例１～実施例４と共通する点については、同一符号を付し、その説明を省略する。 Example 5 will be explained focusing on the differences from Examples 1 to 4. Note that the same reference numerals are given to the same points as in Examples 1 to 4, and the explanation thereof will be omitted.

FIG. 24 is a block diagram showing an example of the functional configuration of the behavior recognition system 100 according to the fifth embodiment. In the fifth embodiment, the principal component analysis units 404 and 454 are replaced with dimension reduction units 2400 and 2401. Dimension reduction is processing that reduces the number of original variables or original dimensions while preserving as much of the original information as possible, and is a concept that encompasses component analyses such as the principal component analysis and independent component analysis of Examples 1 to 4.

The dimension reduction unit 2400 takes, from the teacher signals acquired from the skeleton information processing unit 403, the normalized skeleton information 320, the joint angles 370, and the inter-frame movement amounts as input data, executes dimension reduction to generate one or more variables, and outputs them to the behavior learning unit 406.

Dimension reduction methods that the dimension reduction unit 2400 may use include SNE (Stochastic Neighbor Embedding), t-SNE (t-Distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection), Isomap, LLE (Locally Linear Embedding), Laplacian eigenmaps, LargeVis, and diffusion maps. The dimension reduction unit 2400 may also reduce dimensions by combining t-SNE or UMAP with principal component analysis or independent component analysis. Each dimension reduction method, and dimension reduction performed by combining methods, is described below.

ＳＮＥの処理を、下記式（２４）～（２８）を用いて説明する。 The SNE processing will be explained using the following equations (24) to (28).

Let the similarity between two x-coordinate values 322 (input data), x_i and x_j, be the conditional probability p_{j|i} that x_j is selected as a neighbor given x_i. The conditional probability p_{j|i} is given by equation (24) above. Here, x_j is assumed to be selected according to a normal distribution centered on x_i. Next, the similarity between two y-coordinate values 323 (principal components) after dimension reduction, y_i and y_j, is defined in the same way as the similarity between x_i and x_j before dimension reduction, as the conditional probability q_{j|i} given by equation (25) above. However, to simplify the equation, the variance of the coordinate values after dimension reduction is fixed at 1/√2.

If the post-reduction y is generated so that the distance relationships before and after dimension reduction are maintained, the dimension can be reduced while preserving as much information as possible. To reduce the dimension while suppressing loss of information, the dimension reduction unit 2400 performs processing so that p_{j|i} = q_{j|i}. For this purpose, dimension reduction uses the KL divergence, a measure of how closely two probability distributions resemble each other.

Equation (26) above applies the probability distributions before and after dimension reduction to the KL divergence, which serves as the loss function. The dimension reduction unit 2400 minimizes the loss function of equation (26) by stochastic gradient descent. The gradient is given by equation (27) above, obtained by differentiating the loss function with respect to y_i, and is used to vary y_i. The update rule for this variation is given by equation (28) above.

In this way, the update of equation (28) above is repeated while y_i is varied, and the y_i that minimizes equation (27) above is obtained, thereby performing dimension reduction and obtaining new variables. Unlike principal component analysis, however, SNE by its nature yields two or three dimensions (variables) after reduction. Therefore, when dimension reduction by SNE is performed, a predetermined number of dimensions (variables) is output to the behavior learning unit 406.
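Equations (24) to (28) are not reproduced in this text. For orientation, the commonly published SNE formulation that matches the description above is given here; the notation of the specification may differ. The bandwidth σ_i, the learning rate η, and the momentum term α(t) are symbols of this standard formulation and are assumptions with respect to the specification.

```latex
\begin{align}
p_{j|i} &= \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)} \\
q_{j|i} &= \frac{\exp\!\left(-\lVert y_i - y_j\rVert^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert y_i - y_k\rVert^2\right)} \\
C &= \sum_i \mathrm{KL}(P_i \,\Vert\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} \\
\frac{\partial C}{\partial y_i} &= 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)(y_i - y_j) \\
y_i^{(t)} &= y_i^{(t-1)} - \eta \frac{\partial C}{\partial y_i} + \alpha(t)\left(y_i^{(t-1)} - y_i^{(t-2)}\right)
\end{align}
```

The five lines correspond, in order, to the roles the text assigns to equations (24) to (28): the two conditional probabilities, the KL loss, its gradient, and the update rule.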

With SNE, however, the loss function is difficult to minimize, and the attempt to preserve equidistance during dimension reduction causes the skeleton points specified by the x-coordinate values 322 and y-coordinate values 323 to crowd together. t-SNE is a method that addresses this problem.

ｔ－ＳＮＥの処理を下記式（２９）～（３３）を用いて説明する。 The t-SNE processing will be explained using the following equations (29) to (33).

To simplify minimization of the loss function, the loss function is made symmetric. In the symmetrization, the distance between x_i and x_j is expressed by the joint probability distribution p_ij, as shown in equation (29) above. p_{j|i} is the same as in equation (24) above and is given by equation (30) above. The distance between y_i and y_j after dimension reduction is expressed by the joint probability distribution q_ij shown in equation (31) above.

The distances between points after dimension reduction are assumed to follow a Student's t-distribution. Compared with the normal distribution, the Student's t-distribution assigns a higher probability to values far from the mean, and this property allows the distances between data after dimension reduction to include long distances.

In t-SNE, the dimension reduction unit 2400 performs dimension reduction by minimizing the loss function shown in equation (32) above, using p_ij and q_ij obtained from equations (29) to (31) above. As with SNE, the dimension reduction unit 2400 minimizes the loss function by the stochastic gradient descent shown in equation (33) above.

In this way, the dimension reduction unit 2400 performs dimension reduction and obtains new variables by obtaining the y_i that minimizes equation (33) above. As with SNE, t-SNE by its nature yields two or three dimensions (variables) after reduction. Therefore, when dimension reduction by t-SNE is performed, a predetermined number of dimensions (variables) is output to the behavior learning unit 406.
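The quantities used by t-SNE can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the implementation of the specification: it uses one fixed Gaussian bandwidth `sigma` for every point (practical t-SNE tunes a bandwidth per point), the symmetrization p_ij = (p_{j|i} + p_{i|j}) / 2n from the published t-SNE formulation, and hypothetical sample points. The gradient step itself is omitted.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def conditional_p(points, sigma=1.0):
    """p[i][j] = p_{j|i}, using a Gaussian kernel with a fixed bandwidth (assumption)."""
    n = len(points)
    p = []
    for i in range(n):
        weights = [0.0 if i == j else
                   math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
                   for j in range(n)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def joint_p(points, sigma=1.0):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = len(points)
    c = conditional_p(points, sigma)
    return [[(c[i][j] + c[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]


def joint_q(embedding):
    """q_ij from a Student's t kernel with one degree of freedom."""
    n = len(embedding)
    weights = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
                for j in range(n)] for i in range(n)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


def kl_divergence(p, q):
    """KL(P || Q) summed over all pairs with p_ij > 0."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0.0)


high_dim = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [3.0, 3.0, 3.0]]  # hypothetical input
low_dim = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]                        # hypothetical embedding
loss = kl_divergence(joint_p(high_dim), joint_q(low_dim))
```

A full t-SNE run would repeatedly move the points of `low_dim` along the negative gradient of `loss` until it stops decreasing.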

t-SNE preserves the local structure of the high-dimensional data before dimension reduction while capturing as much of the global structure as possible, so it can reduce dimensions accurately. However, its computation time increases with the number of dimensions before reduction. UMAP is a method that addresses this computation time problem. The UMAP processing is described using equations (34) to (36) below.

とり得る値の全体Ａの中で、高次元の集合Ｘ（上記式（３４））がある。Ａの中から任意のデータを取り出した際に、それが集合Ｘに含まれる度合いを０から１の範囲で出力するメンバーシップ関数をμとする。上記式（１）に示す入力Ｘに対して、上記式（２）に示すＹを用意する。ＹはＸに比較して低い次元の空間に存在するｍ（＜ｐ）個の点の集合であり、次元削減後のデータの集合である。そしてＹのメンバーシップ関数をνとして、次元削減部２４００は、上記式（３６）が最小となるようなＹを定めることで次元削減を行ない、新たな変数を得る。 Among all possible values A, there is a high-dimensional set X (formula (34) above). Let μ be a membership function that, when arbitrary data is extracted from A, outputs the degree to which it is included in the set X in a range of 0 to 1. For input X shown in equation (1) above, Y shown in equation (2) above is prepared. Y is a set of m (<p) points existing in a space with a lower dimension than X, and is a set of data after dimension reduction. Then, by setting the membership function of Y to ν, the dimension reduction unit 2400 performs dimension reduction by determining Y such that the above equation (36) is minimized, and obtains a new variable.
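Equation (36) is not reproduced in this text. In the published UMAP formulation, the quantity minimized over Y is the cross-entropy between the two fuzzy sets (A, μ) and (A, ν); assuming equation (36) has this form, it can be written as follows.

```latex
C\big((A,\mu),(A,\nu)\big)
  = \sum_{a \in A} \left[
      \mu(a)\,\log\frac{\mu(a)}{\nu(a)}
      + \big(1-\mu(a)\big)\,\log\frac{1-\mu(a)}{1-\nu(a)}
    \right]
```

The first term penalizes placing points far apart in Y that belong together in X, and the second term penalizes placing points close together in Y that are separate in X.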

When dimension reduction by UMAP is performed, the dimension reduction unit 2400 may output a predetermined number of dimensions (variables) to the behavior learning unit 406, as with SNE and t-SNE, or may output, as the required number of dimensions, the number of dimensions (variables) at which the membership function ν after dimension reduction is at or above a predetermined value range.

The Isomap processing is described next. For any given data point, the dimension reduction unit 2400 calculates the shortest distances to neighboring data, expresses the calculated distances as a geodesic distance matrix, and applies multidimensional scaling (MDS) to it, thereby performing dimension reduction and obtaining new variables. When dimension reduction by Isomap is performed, the dimension reduction unit 2400 outputs a predetermined number of dimensions (variables) to the behavior learning unit 406.

ＬＬＥについて下記式（３７）～（４３）を用いて説明する。 LLE will be explained using the following formulas (37) to (43).

A point x_i is approximated by a linear combination of its neighboring points, as in equation (37) above. Minimizing equation (39) above under the constraint of equation (38) above determines the approximation of x_i before dimension reduction. Next, for y_i after dimension reduction, the dimension reduction unit 2400 minimizes equation (40) above so that the linear neighborhood relationships of x_i are preserved as far as possible after dimension reduction. The solution is obtained, as in equation (42) above, by extracting the eigenvectors of equation (41) above that correspond to the second smallest through the (d+1)-th smallest eigenvalues, v_1 to v_d, and the dimension reduction unit 2400 obtains y_i after dimension reduction as in equation (43) above.
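Equations (37) to (43) are not reproduced in this text. The standard LLE formulation that follows the same sequence of steps is shown here for reference; the weight matrix W, the neighborhood N(i), and the matrix M are symbols of that standard formulation and may not match the notation of the specification.

```latex
\begin{align}
x_i &\approx \sum_{j \in N(i)} W_{ij}\, x_j \\
\sum_{j} W_{ij} &= 1 \\
\varepsilon(W) &= \sum_i \Big\lVert x_i - \sum_j W_{ij}\, x_j \Big\rVert^2 \\
\Phi(Y) &= \sum_i \Big\lVert y_i - \sum_j W_{ij}\, y_j \Big\rVert^2 \\
M &= (I - W)^{\mathsf{T}} (I - W) \\
V &= [\, v_1, \dots, v_d \,] \\
y_i &= \big( v_1(i), \dots, v_d(i) \big)^{\mathsf{T}}
\end{align}
```

Here v_1, ..., v_d are the eigenvectors of M for its second smallest through (d+1)-th smallest eigenvalues; the eigenvector of the smallest eigenvalue is constant and is discarded.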

When dimension reduction by LLE is performed, the dimension reduction unit 2400 outputs a predetermined number of dimensions (variables) to the behavior learning unit 406, and the behavior learning unit 406 determines the number of variables to use according to that predetermined number of dimensions.

ラプラシアン固有マップの処理を下記式（４４）～（４９）を用いて説明する。 The processing of the Laplacian eigenmap will be explained using the following equations (44) to (49).

Each edge x_i x_j of the neighborhood graph formed by the data before dimension reduction is assigned a weight according to equation (44) or equation (45) above. The graph Laplacian of equation (46) above is introduced for the assigned weights, and the eigenvectors of the graph Laplacian (equation (47) above) that correspond to the second smallest through the (d+1)-th smallest eigenvalues, v_1 to v_d, are extracted as in equation (48) above. The dimension reduction unit 2400 then obtains the value y_i after dimension reduction as in equation (49) above.
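The first two steps, assigning edge weights and forming the graph Laplacian, can be sketched as follows. Equations (44) to (46) are not reproduced in this text, so the sketch assumes the two weightings of the standard Laplacian eigenmaps formulation (a heat kernel with parameter `t`, or a constant weight of 1) and the unnormalized Laplacian L = D - W. The sample points, the edge list, and the function names are hypothetical, and the eigen-decomposition step is omitted.

```python
import math


def edge_weights(points, edges, t=None):
    """Symmetric weight matrix W on the neighborhood graph.

    t=None assigns weight 1 to every edge; otherwise the heat kernel
    exp(-||x_i - x_j||^2 / t) is used (both are assumed weightings).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        w[i][j] = w[j][i] = 1.0 if t is None else math.exp(-d2 / t)
    return w


def graph_laplacian(w):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    degree = [sum(row) for row in w]
    return [[(degree[i] if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (3.0, 1.0)]  # hypothetical data
edges = [(0, 1), (1, 2), (2, 3)]                           # hypothetical neighborhood graph
laplacian = graph_laplacian(edge_weights(points, edges, t=2.0))
```

Each row of `laplacian` sums to zero, which is why the eigenvector of the smallest eigenvalue is constant and the embedding starts from the second smallest.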

ラプラシアン固有マップによる次元削減を実施の際には、次元削減部２４００は、予め定めた次元数（変数）を行動学習部４０６に出力する。 When performing dimension reduction using the Laplacian eigenmap, the dimension reduction unit 2400 outputs a predetermined number of dimensions (variables) to the behavior learning unit 406.

ＬａｒｇｅＶｉｓの処理について説明する。ＬａｒｇｅＶｉｓはｔ－ＳＮＥの計算時間を改善した手法である。ｔ－ＳＮＥではデータ点同士の距離を求めるため、データ数に応じて計算時間が増大していた。ＬａｒｇｅＶｉｓでは、次元削減部２４００は、近傍のデータからＫ－ＮＮグラフを用いてデータを領域ごとに分け、領域ごとに分けられたデータモデル毎にｔ－ＳＮＥと同様の手法で次元削減を行う。 The processing of LargeVis will be explained. LargeVis is a method that improves the calculation time of t-SNE. In t-SNE, the distance between data points is determined, so the calculation time increases depending on the number of data. In LargeVis, the dimension reduction unit 2400 divides data into regions using a K-NN graph from neighboring data, and performs dimension reduction for each data model divided into regions using a method similar to t-SNE.

ＬａｒｇｅＶｉｓによる次元削減を実施の際には、次元削減部２４００は、予め定めた次元数（変数）を行動学習部４０６に出力する。 When implementing dimension reduction using LargeVis, the dimension reduction unit 2400 outputs a predetermined number of dimensions (variables) to the behavior learning unit 406.

拡散マップについて下記式（５０）～（５５）を用いて説明する。 The diffusion map will be explained using equations (50) to (55) below.

A weight W_ij is assigned to each edge x_i x_j of the neighborhood graph formed by x_i before dimension reduction and its neighbors x_j, and the weights are normalized to form the N×N transition probability matrix P shown in equation (50) above. Let p_t(x_i, x_j) denote the probability that a random walk on the graph represented by P, starting from x_i, reaches x_j after t steps. By the properties of the transition matrix, p_t(x_i, x_j) converges to the stationary distribution φ_0(x_j) as t→∞. The diffusion distance between the points x_i and x_j is then defined by equation (51) above. Let the eigenvalues of the transition probability matrix P be given by equation (52) above and the eigenvectors by equation (53) above. Equation (54) above then holds. Since the absolute value of λ_i is at most 1, the dimension reduction unit 2400 takes the eigenvectors up to a suitable dimension d(t) smaller than N, performs dimension reduction as in equation (55) above, and obtains new variables.
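The construction of P, the t-step probabilities, and the diffusion distance can be illustrated with plain lists. Equations (50) and (51) are not reproduced in this text; the sketch assumes row normalization of a symmetric W, the stationary distribution φ_0(x_k) proportional to the degree of x_k, and the diffusion distance of the standard diffusion map formulation. The weight matrix and the function names are hypothetical.

```python
def transition_matrix(w):
    """Row-normalize the weight matrix W into the transition probability matrix P."""
    return [[v / sum(row) for v in row] for row in w]


def mat_mul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def t_step_probabilities(p, t):
    """P^t for t >= 1: entry [i][j] is p_t(x_i, x_j)."""
    result = p
    for _ in range(t - 1):
        result = mat_mul(result, p)
    return result


def stationary_distribution(w):
    """phi_0 for a symmetric W: degree of each node divided by the total degree."""
    total = sum(map(sum, w))
    return [sum(row) / total for row in w]


def diffusion_distance_sq(w, i, j, t):
    """Squared diffusion distance between x_i and x_j after t steps."""
    pt = t_step_probabilities(transition_matrix(w), t)
    phi0 = stationary_distribution(w)
    return sum((pt[i][k] - pt[j][k]) ** 2 / phi0[k] for k in range(len(w)))


weights = [[0.0, 2.0, 1.0],
           [2.0, 0.0, 1.0],
           [1.0, 1.0, 0.0]]  # hypothetical symmetric weights W_ij
d01 = diffusion_distance_sq(weights, 0, 1, t=2)
```

In the full method, the same distance is approximated by the leading eigenvectors of P scaled by powers of their eigenvalues, which is what permits truncation to d(t) dimensions.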

拡散マップによる次元削減を実施の際には、予め定めた次元数（変数）を行動学習部４０６に出力する。 When performing dimension reduction using a diffusion map, a predetermined number of dimensions (variables) is output to the behavior learning unit 406.

The dimension reduction unit 2400 may combine the methods described so far, such as principal component analysis, independent component analysis, t-SNE, UMAP, Isomap, LLE, Laplacian eigenmaps, LargeVis, and diffusion maps. For example, for high-dimensional data with 36 dimensions (36 variables), the dimension reduction unit 2400 may use principal component analysis to reduce the data to 10 dimensions and then use UMAP to reduce it to 2 dimensions; the combination of methods used for dimension reduction is not limited. Combining methods in this way can be expected to yield combined benefits in performance and computation time.

These dimension reduction methods are not limited to those described in Example 5. For example, high-dimensional information may simply be added, subtracted, multiplied, or divided, or convolved according to predetermined coefficients. Any method that, like those described in Example 5, generates lower-dimensional data or a smaller number of variables from high-dimensional data or many variables may be used as the dimension reduction method.

The dimension reduction unit 2401 has the same function as the dimension reduction unit 2400. It executes the same processing as the dimension reduction unit 2400 on the output data from the skeleton information processing unit 453 and generates one or more new variables, fewer in number than before dimension reduction. The dimension reduction unit 2401 also outputs, to the behavior recognition unit 457, the new variables determined from the contribution rates and cumulative contribution rate generated together with the principal components.

Thus, according to the fifth embodiment, changing the dimension reduction method allows dimension reduction to be performed more effectively, or in less computation time, in accordance with the data acquired from the skeleton information processing unit 403, so that complex behaviors can be discriminated with high accuracy.

実施例６を、実施例１～実施例５との相違点を中心に説明する。なお、実施例１～実施例５と共通する点については、同一符号を付し、その説明を省略する。 Example 6 will be explained focusing on the differences from Examples 1 to 5. Note that the same reference numerals are given to the points common to Examples 1 to 5, and the explanation thereof will be omitted.

FIG. 25 is a block diagram showing an example of the functional configuration of the behavior recognition system 100 according to the sixth embodiment. In the sixth embodiment, the behavior learning unit 406 and the behavior recognition unit 457 are replaced with a behavior learning unit 2500 and a behavior recognition unit 2501. The detailed method by which the behavior learning unit 2500 and the behavior recognition unit 2501 classify behaviors is described with reference to FIGS. 26 to 28.

FIG. 26 is an explanatory diagram showing a decision tree, the method underlying behavior classification by the behavior learning unit 2500 and the behavior recognition unit 2501. The behavior classification method using a decision tree is described below. In the decision tree, (a) a boundary line 2610 is generated for the behaviors in the variable space newly generated by dimension reduction, using behaviors (plot points 1200 to 1203) that are variables whose behavior types are given in advance.

The method of generating (a) the boundary line 2610 is as follows. The decision tree classifies behaviors in stages so that the impurity of the variable group 2621, input from the population 2620 containing the behaviors (plot points 1200 to 1203), is minimized. In the first stage, the variable group 2621 is split along the second variable axis into variable groups 2622 and 2623; in the second stage, the variable groups 2622 and 2623 are split along the first variable axis into variable groups 2624 to 2627. (a) The boundary line 2610 is generated using the discriminants obtained in the course of this impurity-minimizing classification. The axis along which behaviors are classified at each stage is not limited, nor is the number of times classification is performed along each axis limited to a prescribed number such as once.
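A single stage of this impurity-minimizing split can be sketched as follows. The specification does not state which impurity measure is used, so Gini impurity is an assumption here, as are the function names, the sample points, and the behavior labels.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples, labels, axis):
    """Return (threshold, weighted impurity) of the best split along one variable axis."""
    values = sorted({s[axis] for s in samples})
    best = (None, gini(labels))  # no split: impurity of the whole group
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [l for s, l in zip(samples, labels) if s[axis] <= threshold]
        right = [l for s, l in zip(samples, labels) if s[axis] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical points in a two-variable space after dimension reduction.
samples = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.2), (0.9, 0.1)]
labels = ["walk", "walk", "sit", "sit"]
threshold, impurity = best_split(samples, labels, axis=1)  # split along the second variable axis
```

Applying `best_split` again to each resulting group along another axis gives the second stage, and the chain of thresholds plays the role of the discriminants from which the boundary line is drawn.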

図２７は、決定木による分類の詳細な展開方法を示す説明図である。決定木には、レベル（深さ）ごとに決定木を成長させるレベルワイズ２７００と、リーフ（分岐後のデータ群）ごとに決定木を成長させるリーフワイズ２７０１と、がある。決定木のような分類器を重ねて学習することをアンサンブル学習という。 FIG. 27 is an explanatory diagram showing a detailed method for developing classification using a decision tree. Decision trees include level-wise 2700, which grows a decision tree for each level (depth), and leaf-wise 2701, which grows a decision tree for each leaf (data group after branching). Learning by stacking classifiers such as decision trees is called ensemble learning.

FIG. 28 is an explanatory diagram showing ensemble learning and the method that the behavior learning unit 2500 and the behavior recognition unit 2501 use to classify behaviors. Ensemble learning includes bagging 2801, which uses classification trees such as decision trees in parallel, and boosting 2802, which carries over the previous result and successively updates the learning result. The random forest of the first embodiment applies bagging 2801 to decision trees, whereas the behavior learning unit 2500 and the behavior recognition unit 2501 of the sixth embodiment use a classification method based on boosting 2802.

行動学習部２５００が行動を学習し、行動認識部２５０１が行動を分類するにあたっては、各決定木をレベルワイズにより成長させ、複数の決定木を重ねるブースティングにより入力された変数を分類してもよいし、各決定木をリーフワイズにより成長させ、複数の決定木を重ねるブースティングにより入力された変数を分類してもよい。 When the behavior learning unit 2500 learns the behavior and the behavior recognition unit 2501 classifies the behavior, it is possible to grow each decision tree levelwise and classify the input variables by boosting multiple decision trees. Alternatively, each decision tree may be grown leafwise, and input variables may be classified by boosting, which overlaps a plurality of decision trees.
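The idea of boosting, in which each tree takes over the result of the previous ones and corrects what remains, can be shown with depth-1 trees on a one-dimensional toy problem. This is a minimal sketch assuming squared-error gradient boosting; the specification does not fix the loss, the tree depth, or the learning rate, and the data and function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: pick the threshold with the lowest squared error."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not right:
            continue  # a threshold at the maximum leaves nothing on the right
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, left_mean, right_mean)
    if best is None:  # all inputs identical: fall back to a constant stump
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1:]


def boost(xs, ys, rounds=200, learning_rate=0.5):
    """Each round fits a stump to the residuals left by all previous rounds."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((thr, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= thr else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contributions of all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= thr else rm for thr, lm, rm in stumps)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical single variable
ys = [0, 0, 1, 1, 0, 0]              # hypothetical behavior labels (1 = target behavior)
model = boost(xs, ys)
predicted = [round(predict(model, x)) for x in xs]
```

No single stump can separate the middle block of labels in this example, but the accumulated stumps can, which is the benefit of stacking decision trees that the text describes.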

なお、各決定木をレベルワイズにより成長させ、複数の決定木を重ねるブースティングを行動分類手法として採用する際にはソフトウェアライブラリｘｇｂｏｏｓｔを用いて実装してもよい。また一方で、各決定木をリーフワイズにより成長させ、複数の決定木を重ねるブースティングを行動分類手法として採用する際にはソフトウェアライブラリＬｉｇｈｔＧＢＭを用いて実装してもよい。ただし、実装手法はこれらに限定されない。 Note that when employing boosting, in which each decision tree is grown levelwise and multiple decision trees are stacked, as a behavior classification method, it may be implemented using the software library xgboost. On the other hand, when employing boosting, in which each decision tree is grown leafwise and multiple decision trees are stacked, as a behavior classification method, it may be implemented using the software library LightGBM. However, implementation methods are not limited to these.
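As a hedged illustration of how the two growth strategies map onto the named libraries, the fragments below list parameters that exist in the scikit-learn style interfaces `xgboost.XGBClassifier` and `lightgbm.LGBMClassifier`. The dictionary names and all values are assumptions chosen for illustration, not settings given in the specification, and the libraries themselves are not imported here.

```python
# Level-wise growth: XGBoost expands every node of a depth before going deeper.
XGBOOST_LEVEL_WISE = {
    "tree_method": "hist",       # histogram-based tree construction
    "grow_policy": "depthwise",  # split the nodes closest to the root first
    "max_depth": 6,              # tree size is bounded by depth
    "n_estimators": 100,         # number of boosted trees
}

# Leaf-wise growth: LightGBM always splits the leaf with the largest loss reduction.
LIGHTGBM_LEAF_WISE = {
    "num_leaves": 31,            # tree size is bounded by the number of leaves
    "max_depth": -1,             # no explicit depth limit
    "n_estimators": 100,         # number of boosted trees
}
```

Either dictionary could be passed as keyword arguments to the corresponding classifier, for example `XGBClassifier(**XGBOOST_LEVEL_WISE)`, after which training and prediction follow the usual `fit` and `predict` calls.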

このように、実施例６によれば、行動分類手法にブースティングを用いて、複数の決定木を重ねることにより、複雑な行動を高精度に判別することができる。 In this way, according to the sixth embodiment, by using boosting as the behavior classification method and overlapping a plurality of decision trees, complex behaviors can be determined with high accuracy.

実施例７を、実施例１～実施例６との相違点を中心に説明する。なお、実施例１～実施例６と共通する点については、同一符号を付し、その説明を省略する。 Example 7 will be explained focusing on the differences from Examples 1 to 6. Note that the same reference numerals are given to the points common to Examples 1 to 6, and the explanation thereof will be omitted.

FIG. 29 is a block diagram showing an example of the functional configuration of the behavior recognition system 100 according to the seventh embodiment. In the seventh embodiment, the dimension reduction unit 2400, the behavior learning unit 406, the dimension reduction unit 2401, and the behavior recognition unit 457 are replaced with a dimension reduction unit 2900, a behavior learning unit 2901, a dimension reduction unit 2903, and a behavior recognition unit 2904.

次元削減部２９００は、予め定めた次元数に従って、実施例１～実施例６のいずれかの手法で次元削減を行い、次元削減後の変数を行動学習部２９０１に出力する。 The dimensionality reduction unit 2900 performs dimensionality reduction using any of the methods of Examples 1 to 6 according to a predetermined number of dimensions, and outputs the variable after dimensionality reduction to the behavior learning unit 2901.

行動学習部２９０１は、取得した次元削減後の変数と共に、与えられた行動の種類から機械学習により、行動分類のための境界線を生成し、行動分類モデルを生成する。この際、生成した行動分類モデルに対して、どのくらいの精度で行動を予測できるかという行動分類精度を算出する。 The behavior learning unit 2901 generates a boundary line for behavior classification by machine learning from the given behavior type together with the obtained variable after dimension reduction, and generates a behavior classification model. At this time, the behavior classification accuracy, which indicates how accurately the behavior can be predicted, is calculated for the generated behavior classification model.

The behavior learning unit 2901 may calculate the behavior classification accuracy using the same variables that were used to generate the behavior classification model. Alternatively, it may withhold some of the variables acquired from the dimension reduction unit 2900 from model generation and calculate the behavior classification accuracy using the withheld variables. The method of calculating the behavior classification accuracy is not limited to these. If the calculated behavior classification accuracy is higher than a predetermined accuracy, the behavior learning unit 2901 outputs the generated behavior classification model to the behavior recognition unit 2904. At this time, the behavior learning unit 2901 also outputs, to the dimension reduction unit 2900, the number of dimensions it acquired and the fact that the behavior classification accuracy passed.

On the other hand, if the calculated behavior classification accuracy is lower than the predetermined accuracy, the behavior learning unit 2901 outputs to the dimension reduction unit 2900 that the behavior classification accuracy failed. However, if behavior classification models have been generated for every settable number of dimensions (variables) and the behavior classification accuracy failed for all of them, the behavior learning unit 2901 outputs, to the behavior recognition unit 2904, the behavior classification model with the highest behavior classification accuracy among those generated so far, and outputs the number of dimensions (variables) used for that model, together with all-learning-complete information, to the behavior learning unit 2901. Alternatively, without setting an accuracy for the pass/fail judgment, the behavior learning unit 2901 may perform learning for every settable number of dimensions, calculate the behavior classification accuracy for each, select a behavior classification model according to the calculated accuracies, and judge the selected model as passed.

According to the pass/fail information and the all-learning-complete information acquired from the behavior learning unit 2901, the dimension reduction unit 2900 outputs the acquired number-of-dimensions information to the dimension reduction unit 2903 when it has acquired a pass or the all-learning-complete information. When the result is a fail, it changes the number of dimensions used for dimension reduction, performs dimension reduction again, and outputs the generated variables to the behavior learning unit 2901.
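As a non-limiting illustration, the exchange between the dimension reduction unit 2900 and the behavior learning unit 2901 can be sketched as a search over candidate numbers of dimensions. The function name `search_dimension` and the callables `reduce_to` and `train_and_score` are hypothetical names introduced only for this sketch; the specification does not prescribe them, nor the order in which candidate dimensions are tried.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def search_dimension(
    candidate_dims: Iterable[int],
    reduce_to: Callable[[int], Any],
    train_and_score: Callable[[Any], Tuple[Any, float]],
    target_accuracy: Optional[float] = None,
) -> Tuple[int, Any, float, bool]:
    """Return (k, model, accuracy, passed).

    reduce_to(k) stands in for the dimension reduction unit and
    train_and_score(features) for the behavior learning unit; both are
    assumptions of this sketch.
    """
    best = None
    for k in candidate_dims:
        features = reduce_to(k)                      # reduce to k dimensions
        model, accuracy = train_and_score(features)  # train and evaluate
        # Pass: accuracy is higher than the predetermined accuracy.
        if target_accuracy is not None and accuracy > target_accuracy:
            return k, model, accuracy, True
        # Fail (or no threshold): remember the most accurate model so far.
        if best is None or accuracy > best[2]:
            best = (k, model, accuracy)
    if best is None:
        raise ValueError("candidate_dims is empty")
    # All candidates were tried. Without a threshold the best model is
    # treated as passed; with a threshold it is the all-failed fallback.
    return best[0], best[1], best[2], target_accuracy is None
```

When a threshold is supplied, the search stops at the first number of dimensions that passes. Otherwise every candidate is evaluated and the most accurate model is returned, mirroring the all-learning-complete case.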

The dimension reduction unit 2903 performs dimension reduction on the data acquired from the skeleton information processing unit 453, according to the number of dimensions k (variables) acquired from the dimension reduction unit 2900 and using the dimension reduction methods of Examples 1 to 6, and outputs the generated variables to the behavior recognition unit 2904.

The behavior recognition unit 2904 performs behavior recognition on the variables input from the dimension reduction unit 2903, using the behavior classification model that the behavior learning unit 2901 judged as passed.

The behavior classification accuracy calculated by the behavior learning unit 2901 may be treated as the contribution rate described in Example 1. For example, the acquired dimension-reduced variables are associated with the behavior classification accuracy calculated from them, and the calculated behavior classification accuracy is regarded as the contribution rate, with respect to the original information, of the dimension-reduced variables used in the calculation. The dimension reduction unit 2900 then decides, according to this substitute contribution rate, which of the dimension-reduced variables to use for control.

<Learning process>
FIG. 30 is a flowchart showing a detailed example of the learning process performed by the server 101 (learning device) according to Example 7. The server 101 determines the number of dimensions k using the dimension reduction unit 2900. When dimension reduction is performed for the first time, a predetermined number of dimensions k is chosen; from the second time onward, a number of dimensions k that has not yet been chosen is selected. The dimension reduction unit 2900 performs dimension reduction according to the determined number of dimensions k and generates new variables (k dimensions after reduction) (S3001).

In step S3002, the server 101 makes a pass/fail judgment on the behavior classification accuracy acquired from the behavior learning unit 2901. If the result is a pass, the learning process ends; if it is a fail, the process returns to step S3001.

Thus, according to Example 7, changing the number of dimensions to match the target behavior classification accuracy and repeating dimension reduction makes it possible to discriminate complex behaviors with high accuracy.

The behavior recognition device and the learning device of Examples 1 to 7 described above can also be configured as in [1] to [14] below.

[1] A behavior recognition device (client 102) having a processor 201 that executes a program and a storage device 202 that stores the program can access a behavior classification model trained using a component group related to a learning target, obtained from the shape of the learning target (skeleton information 320) by component analysis that generates statistical components by multivariate analysis (principal component analysis or independent component analysis), and the behavior of the learning target. The processor 201 executes: a detection process of detecting the shape of a recognition target (skeleton information 320) from analysis target data obtained from the sensor 103; a missing position information generation process of generating missing position information indicating the position of a missing part in the skeleton information 320 detected by the detection process; an interpolation process of interpolating the missing part from non-missing information (the parts of the shape other than the missing part) and updating the interpolated non-missing information as the shape of the recognition target; a component analysis process of generating, by the component analysis and based on the shape of the recognition target interpolated by the interpolation process, the same number of components related to the recognition target as there are in the component group related to the learning target; and a behavior recognition process of outputting a recognition result indicating the behavior of the recognition target by inputting, into the behavior classification model, the component group related to the recognition target generated by the component analysis process and the missing position information.
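The processing chain of [1] can be illustrated with the following minimal Python sketch, offered under stated assumptions rather than as the implementation of the examples. It assumes 2D joints, encodes the missing position information as a 0/1 flag per joint, fills a missing joint with the centroid of the detected joints (one simple interpolation choice among many), and projects onto component axes that were fitted beforehand on the learning target. All function names, and the `mean` and `components` parameters, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Joint = Optional[Point]  # None marks a joint the detector failed to find


def missing_position_info(skeleton: Sequence[Joint]) -> List[int]:
    """1 for each missing joint, 0 for each detected joint."""
    return [1 if joint is None else 0 for joint in skeleton]


def interpolate_missing(skeleton: Sequence[Joint]) -> List[Point]:
    """Fill missing joints from the non-missing ones (centroid: an assumption)."""
    present = [joint for joint in skeleton if joint is not None]
    if not present:
        raise ValueError("at least one joint must be detected")
    cx = sum(x for x, _ in present) / len(present)
    cy = sum(y for _, y in present) / len(present)
    return [joint if joint is not None else (cx, cy) for joint in skeleton]


def project(skeleton: Sequence[Point],
            mean: Sequence[float],
            components: Sequence[Sequence[float]]) -> List[float]:
    """Project the flattened skeleton onto pre-fitted component axes."""
    flat = [value for joint in skeleton for value in joint]
    centered = [v - m for v, m in zip(flat, mean)]
    return [sum(w * c for w, c in zip(axis, centered)) for axis in components]


def build_model_input(skeleton: Sequence[Joint],
                      mean: Sequence[float],
                      components: Sequence[Sequence[float]]) -> List[float]:
    """Component scores followed by the missing position flags."""
    mask = missing_position_info(skeleton)
    filled = interpolate_missing(skeleton)
    scores = project(filled, mean, components)
    return scores + [float(flag) for flag in mask]
```

Because the number of component axes is fixed at training time, the recognition side always produces the same number of components as the learning side, so a single behavior classification model suffices.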

Compared with selecting a specific behavior classification model from multiple behavior classification models by component analysis and then executing the behavior recognition process, this reduces the usage of the storage device 202 and shortens the processing time until the recognition result is output. It also makes it possible to recognize, with high accuracy, multiple types of behaviors of a recognition target whose shape is partially missing.

[2] In the behavior recognition device of [1] above, in the behavior recognition process, the processor outputs a recognition result indicating the behavior of the recognition target by inputting, into the behavior classification model, environment information indicating the environment from which the analysis target data was acquired, the component group related to the recognition target, the behavior of the learning target, and the missing position information.

Because a behavior classification model that takes environment information into account is prepared, the behavior of the recognition target can be recognized with high accuracy according to the environment of the recognition target.

[3] In the behavior recognition device of [1] above, the behavior classification model has been trained using the behavior of the learning target and a component group, in ascending order from a first variable, related to the learning target and obtained from the shape of the learning target by dimension reduction that generates statistical components by multivariate analysis (principal component analysis, independent component analysis, SNE (Stochastic Neighbor Embedding), t-SNE (t-Distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection), Isomap, LLE (Locally Linear Embedding), Laplacian eigenmaps, LargeVis, or diffusion maps). In the generation process, the processor generates, by the dimension reduction and based on the shape of the recognition target interpolated by the interpolation process, the same number of components in ascending order from the first variable related to the recognition target as there are in the component group in ascending order from the first variable related to the learning target.

Compared with selecting a specific behavior classification model from multiple behavior classification models by dimension reduction and then executing the behavior recognition process, this reduces the usage of the storage device 202 and shortens the processing time until the recognition result is output.

[4] In the behavior recognition device of [1] above, each behavior classification model of the behavior classification model group has been trained using the behavior of the learning target and a component group related to the learning target, obtained from the shape of the learning target and the angles of multiple vertices constituting that shape (joint angles 370). The processor 201 executes a calculation process of calculating, based on the shape of the recognition target, the angles of the multiple vertices constituting the shape of the recognition target (joint angles 370). In the component analysis process, the processor 201 generates the component group related to the recognition target based on the shape of the recognition target and the vertex angles of the recognition target calculated by the calculation process.

This makes it possible to recognize multiple types of behaviors of the recognition target with high accuracy according to changes in shape caused by the vertex angles.
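One way to picture the calculation process of [4] is the angle formed at a joint by its two neighboring joints. The sketch below assumes 2D coordinates and returns degrees; the function name `joint_angle` and the choice of the arccosine of the normalized dot product are illustrative assumptions, not a formula taken from the specification.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at vertex b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("neighboring joints must differ from the vertex")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))
```

For instance, an elbow angle would be obtained by passing the shoulder, elbow, and wrist positions as `a`, `b`, and `c`.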

[5] In the behavior recognition device of [1] above, each behavior classification model of the behavior classification model group has been trained using the behavior of the learning target and a component group related to the learning target, obtained from the shape of the learning target and the movement amount of the learning target. The processor 201 executes a calculation process of calculating the movement amount of the recognition target based on multiple shapes of the recognition target at different times. In the component analysis process, the processor 201 generates the component group related to the recognition target based on the shape of the recognition target and the movement amount of the recognition target calculated by the calculation process.

This makes it possible to recognize multiple types of behaviors of the recognition target with high accuracy according to changes in shape over time caused by movement.
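As a hedged illustration of the movement amount in [5], the sketch below takes the shapes at two different times and returns the Euclidean displacement of each joint. Treating the movement amount as a per-joint distance between two consecutive frames is an assumption of this sketch, and `movement_amounts` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def movement_amounts(previous: Sequence[Point],
                     current: Sequence[Point]) -> List[float]:
    """Per-joint Euclidean displacement between two time points."""
    if len(previous) != len(current):
        raise ValueError("both shapes must have the same number of joints")
    return [
        math.hypot(cx - px, cy - py)
        for (px, py), (cx, cy) in zip(previous, current)
    ]
```

The resulting values can be appended to the joint coordinates before the component analysis, so that the components reflect motion as well as posture.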

[6] In the behavior recognition device of [1] above, the processor 201 executes a first normalization process of normalizing the size of the shape of the recognition target. In the component analysis process, the processor 201 generates the component group related to the recognition target based on the shape of the recognition target after the first normalization by the first normalization process.

This improves the generality of behavior classification and thereby suppresses misrecognition.
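A simple way to picture the first normalization process of [6] is to rescale each skeleton so that its larger bounding-box side becomes 1. The sketch below makes that assumption, and also shifts the skeleton so that the bounding box starts at the origin, which goes slightly beyond pure size normalization. The name `normalize_size` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_size(skeleton: Sequence[Point]) -> List[Point]:
    """Scale the skeleton so its larger bounding-box side has length 1."""
    xs = [x for x, _ in skeleton]
    ys = [y for _, y in skeleton]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y)
    if scale == 0:
        raise ValueError("skeleton has zero extent")
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in skeleton]
```

With this kind of normalization, the same posture captured close to or far from the sensor maps to the same coordinates, which is one way the classification can generalize across subject size.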

[7] In the behavior recognition device of [1] above, the processor 201 executes a second normalization process of normalizing the value ranges that the shape and the vertex angles of the recognition target can take. In the component analysis process, the processor 201 generates the component group related to the recognition target based on the shape and the vertex angles (joint angles 370) of the recognition target after the second normalization by the second normalization process.

This suppresses bias between the value ranges of different data types, namely shape and angle, and improves the accuracy of behavior recognition.
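The second normalization process of [7] can be pictured as min-max scaling that maps each data type from its own possible range onto a common interval. In the sketch below, the ranges used in the usage note (pixel coordinates from 0 to 640 and angles from 0 to 180 degrees) are illustrative assumptions, and `rescale` is a hypothetical name.

```python
from typing import List, Sequence


def rescale(values: Sequence[float], low: float, high: float) -> List[float]:
    """Map values from the range [low, high] onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return [(value - low) / (high - low) for value in values]
```

For example, `rescale(x_coordinates, 0.0, 640.0)` and `rescale(joint_angles, 0.0, 180.0)` place coordinates and angles on the same scale, so that neither data type dominates the component analysis merely because its raw numbers are larger.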

[8] In a learning device having a processor 201 that executes a program and a storage device 202 that stores the program, the processor 201 executes: an acquisition process of acquiring teacher data including the shape and behavior of a learning target; a deletion process of making part of the shape of the learning target acquired by the acquisition process missing; a missing position information generation process of generating missing position information indicating the position of the missing part removed from the shape of the learning target by the deletion process; an interpolation process of interpolating from non-missing information, which is the part of the shape of the learning target other than the missing part removed by the deletion process, and updating the interpolated non-missing information as the shape of the learning target; a component analysis process of generating a component group related to the learning target, based on the shape of the learning target interpolated by the interpolation process, by component analysis that generates statistical components by multivariate analysis (principal component analysis or independent component analysis); and a behavior learning process of learning the behavior of the learning target based on the component group related to the learning target generated by the component analysis process, the behavior of the learning target, and the missing position information, and generating a behavior classification model that classifies the behavior of the learning target.

This eliminates the need for the behavior recognition device to select a specific behavior classification model from among multiple behavior classification models.
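The deletion process and the missing position information generation process of [8] can be sketched as follows. The sketch assumes that a missing joint is represented by `None` and that joints are dropped either at explicitly given indices or at random with a fixed probability. The name `drop_joints` and the parameters `indices`, `drop_prob`, and `rng` are hypothetical and are not taken from the specification.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Joint = Optional[Point]


def drop_joints(
    skeleton: Sequence[Point],
    indices: Optional[Sequence[int]] = None,
    drop_prob: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Joint], List[int]]:
    """Return (damaged skeleton, missing position flags).

    If indices is given, exactly those joints are made missing; otherwise
    each joint is made missing independently with probability drop_prob.
    """
    rng = rng or random.Random()
    if indices is None:
        indices = [i for i in range(len(skeleton)) if rng.random() < drop_prob]
    dropped = set(indices)
    damaged = [None if i in dropped else joint for i, joint in enumerate(skeleton)]
    flags = [1 if i in dropped else 0 for i in range(len(skeleton))]
    return damaged, flags
```

Passing `indices` corresponds to making a specific part missing, for example joints that are often occluded. The returned flags are recorded at the moment of deletion, so they agree with the damaged shape by construction.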

[9] In the learning device of [8] above, the processor 201 executes a calculation process of calculating, based on the shape of the learning target, the angles of multiple vertices constituting the shape of the learning target (joint angles 370). In the component analysis process, the processor 201 generates the component group related to the learning target based on the shape of the learning target and the vertex angles of the learning target calculated by the calculation process.

Because a behavior classification model that reflects changes in shape caused by vertex angles can be prepared, multiple types of behaviors can be recognized with high accuracy according to changes in the shape of the recognition target caused by its vertex angles.

[10] In the learning device of [8] above, the processor 201 executes a calculation process of calculating the movement amount of the learning target based on multiple shapes of the learning target at different times. In the component analysis process, the processor 201 generates the component group related to the learning target based on the shape of the learning target and the movement amount of the learning target calculated by the calculation process.

Because a behavior classification model that reflects changes in shape over time caused by movement can be prepared, multiple types of behaviors can be recognized with high accuracy according to such changes.

[11] In the learning device of [8] above, the processor 201 executes a first normalization process of normalizing the size of the shape of the learning target. In the component analysis process, the processor 201 generates the component group related to the learning target based on the shape of the learning target after the first normalization by the first normalization process.

This improves the generality of behavior classification learning and thereby suppresses erroneous learning.

[12] In the learning device of [9] above, the processor 201 executes a second normalization process of normalizing the value ranges that the shape and the vertex angles of the learning target can take. In the component analysis process, the processor 201 generates the component group related to the learning target based on the shape and the vertex angles of the learning target after the second normalization by the second normalization process.

This suppresses bias between the value ranges of different data types, namely shape and angle, and improves the accuracy of behavior classification learning.

[13] In the learning device of [8] above, in the deletion process, the processor makes a specific part of the shape of the learning target missing.

This makes it possible to learn shapes of the learning target in which parts that are prone to being missing are absent, and improves the accuracy of behavior classification learning.

[14] In the learning device of [8] above, when acquiring the shape of the learning target in the acquisition process, the processor 201 also acquires environment information from the time the shape of the learning target was acquired, and generates the behavior classification model based on the component group, the behavior of the learning target, the missing position information, and the environment information.

This makes it possible to generate a behavior classification model that reflects both the shape of the learning target and the environment information, and improves the accuracy of behavior classification learning suited to diverse environments.

The present invention is not limited to the examples described above, and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the examples above are described in detail to explain the present invention clearly, and the present invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one example may be replaced with the configuration of another example, and the configuration of another example may be added to the configuration of one example. For part of the configuration of each example, other configurations may be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as an integrated circuit, or may be realized in software by having the processor 201 interpret and execute a program that realizes each function.

Information such as programs, tables, and files that realize each function can be stored in a storage device such as a memory, hard disk, or SSD (Solid State Drive), or on a recording medium such as an IC (Integrated Circuit) card, SD card, or DVD (Digital Versatile Disc).

The control lines and information lines shown are those considered necessary for the explanation, and do not necessarily represent all the control lines and information lines required for implementation. In practice, almost all of the components may be considered to be interconnected.

100 Behavior recognition system
101 Server
102 Client
103 Sensor
104 Teacher signal DB
201 Processor
202 Storage device
320 Skeleton information
401, 2000 Teacher signal acquisition unit
402 Missing data generation unit
421 Missing data control unit
403, 453 Skeleton information processing unit
404, 454 Principal component analysis unit
406, 2500, 2901 Behavior learning unit
422, 462 Missing position information generation unit
423, 461 Missing information interpolation unit
451 Skeleton detection unit
455 Dimension number determination unit
457, 2501, 2904 Behavior recognition unit
501 Joint angle calculation unit
502 Movement amount calculation unit
503 Normalization unit
2001 Environment information detection unit
2204 Mutual information normalization unit
2400, 2401, 2900, 2903 Dimension reduction unit

Claims

A behavior recognition device comprising a processor that executes a program and a storage device that stores the program, wherein
the behavior recognition device can access a behavior classification model trained using the behavior of a learning target and a component group related to the learning target, obtained from the shape of the learning target by component analysis that generates statistical components by multivariate analysis, and
the processor executes:
a detection process that detects the shape of a recognition target from analysis target data;
a missing position information generation process that generates missing position information indicating the position of a missing part in the shape of the recognition target detected by the detection process;
an interpolation process that interpolates the missing part from non-missing information, which is the part other than the missing part in the shape of the recognition target including the missing part, and updates the interpolated non-missing information as the shape of the recognition target;
a component analysis process that generates, by the component analysis and based on the shape of the recognition target interpolated by the interpolation process, the same number of components related to the recognition target as there are in the component group related to the learning target; and
a behavior recognition process that outputs a recognition result indicating the behavior of the recognition target by inputting, to the behavior classification model, the component group related to the recognition target generated by the component analysis process and the missing position information.

The behavior recognition device according to claim 1, wherein
in the behavior recognition process, the processor outputs a recognition result indicating the behavior of the recognition target by inputting, to the behavior classification model, environment information indicating the environment from which the analysis target data was obtained, the component group related to the recognition target, the behavior of the learning target, and the missing position information.

The behavior recognition device according to claim 1, wherein
the behavior classification model is trained using the behavior of the learning target and a component group, in ascending order from a first variable, related to the learning target and obtained from the shape of the learning target by dimension reduction that generates statistical components by multivariate analysis, and
in the component analysis process, the processor generates, by the dimension reduction and based on the shape of the recognition target interpolated by the interpolation process, the same number of components in ascending order from the first variable related to the recognition target as there are in the component group in ascending order from the first variable related to the learning target.

The behavior recognition device according to claim 1, wherein
the behavior classification model is trained using the behavior of the learning target and a component group related to the learning target, obtained from the shape of the learning target and the angles of a plurality of vertices constituting the shape,
the processor executes a calculation process that calculates, based on the shape of the recognition target, the angles of a plurality of vertices constituting the shape of the recognition target, and
in the component analysis process, the processor generates the component group related to the recognition target based on the shape of the recognition target and the vertex angles of the recognition target calculated by the calculation process.

The behavior recognition device according to claim 1,
The behavior classification model is trained using a component group related to the learning target obtained from the shape of the learning target and the amount of movement of the learning target, and the behavior of the learning target,
The processor executes a calculation process that calculates a movement amount of the recognition target based on a plurality of shapes of the recognition target at different times,
In the component analysis process, the processor generates a component group related to the recognition target based on the shape of the recognition target and the movement amount of the recognition target calculated by the calculation process.
A behavior recognition device characterized by the above.
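The claim does not define how the movement amount is measured. The following sketch assumes one possible definition, the displacement of the shape's centroid between two times. The function names `centroid` and `movement_amount` and the sample frames are hypothetical.

```python
import math

def centroid(shape):
    """Mean position of the keypoints in a shape."""
    n = len(shape)
    return (sum(p[0] for p in shape) / n, sum(p[1] for p in shape) / n)

def movement_amount(shape_t0, shape_t1):
    """Euclidean distance between the centroids of two shapes (assumed definition)."""
    c0 = centroid(shape_t0)
    c1 = centroid(shape_t1)
    return math.hypot(c1[0] - c0[0], c1[1] - c0[1])

# Hypothetical shapes at two times: the second is the first shifted by (3, 4).
frame_a = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
frame_b = [(x + 3.0, y + 4.0) for x, y in frame_a]
moved = movement_amount(frame_a, frame_b)
```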

The behavior recognition device according to claim 1,
The processor executes a first normalization process that normalizes the size of the shape of the recognition target,
In the component analysis process, the processor generates a component group related to the recognition target based on the shape of the recognition target after normalization by the first normalization process.
A behavior recognition device characterized by the above.
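One way to picture the first normalization is to rescale every shape so that its bounding box has a fixed size, which removes the effect of distance from the camera. The sketch below assumes this bounding-box scaling. The claim does not specify the method, and the name `normalize_size` and the sample shapes are hypothetical.

```python
def normalize_size(shape):
    """Translate the shape to the origin and scale its longer bounding-box side to 1."""
    xs = [p[0] for p in shape]
    ys = [p[1] for p in shape]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y)
    if scale == 0.0:
        return [(0.0, 0.0) for _ in shape]  # all keypoints coincide
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in shape]

# Hypothetical: the same pose seen close to and far from the camera.
near = [(10.0, 20.0), (30.0, 20.0), (20.0, 60.0)]
far = [(x / 4, y / 4) for x, y in near]
normalized_near = normalize_size(near)
normalized_far = normalize_size(far)
```

Under this assumption both inputs map to the same normalized shape, so the component analysis sees the pose rather than the apparent size.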

The behavior recognition device according to claim 1,
The processor executes a second normalization process to normalize the range of possible values of the shape of the recognition target and the angle of the vertex,
In the component analysis process, the processor generates a component group related to the recognition target based on the shape of the recognition target and the angle of the vertex after the second normalization by the second normalization process.
A behavior recognition device characterized by the above.
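The second normalization brings coordinates and angles, which have different value ranges, onto a common scale. The sketch below assumes min-max scaling to [0, 1], with coordinates already in [0, 1] and angles in [0, 180] degrees. These ranges and the names `min_max` and `normalize_ranges` are assumptions, not taken from the claim.

```python
def min_max(value, low, high):
    """Map value from the range [low, high] to [0, 1]."""
    return (value - low) / (high - low)

def normalize_ranges(coords, angles_deg,
                     coord_range=(0.0, 1.0), angle_range=(0.0, 180.0)):
    """Return one flat feature list with coordinates and angles on the same scale."""
    flat_coords = [min_max(v, *coord_range) for point in coords for v in point]
    scaled_angles = [min_max(a, *angle_range) for a in angles_deg]
    return flat_coords + scaled_angles

# Hypothetical input: two keypoints and two vertex angles.
features = normalize_ranges([(0.0, 0.0), (0.5, 1.0)], [90.0, 45.0])
```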

A learning device comprising a processor that executes a program and a storage device that stores the program,
The processor executes:
an acquisition process that acquires training data including the shape and behavior of the learning target;
a deletion process that deletes a part of the shape of the learning target acquired by the acquisition process;
a missing position information generation process that generates missing position information indicating the position of the missing part deleted from the shape of the learning target by the deletion process;
an interpolation process that interpolates the missing part from non-missing information, which is the part of the shape of the learning target other than the missing part deleted by the deletion process, and updates the non-missing information after interpolation as the shape of the learning target;
a component analysis process that generates a component group related to the learning target, based on the shape of the learning target interpolated by the interpolation process, by component analysis that generates statistical components by multivariate analysis; and
a behavior learning process that learns the behavior of the learning target based on the component group related to the learning target generated by the component analysis process, the behavior of the learning target, and the missing position information, and generates a behavior classification model for classifying the behavior of the learning target.
A learning device characterized by executing the above processes.
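The deletion, missing position information generation, and interpolation steps of this claim can be sketched as follows. The sketch assumes the shape is a list of 2D keypoints, that the missing position information is a 0/1 flag per keypoint, and that a missing keypoint is filled with the mean of the remaining ones. The claim fixes none of these details, and all function names are hypothetical.

```python
def delete_keypoints(shape, indices):
    """Deletion process: mark the chosen keypoints as missing (None)."""
    return [None if i in indices else p for i, p in enumerate(shape)]

def missing_position_info(shape):
    """Missing position information: 1 for a missing keypoint, 0 otherwise."""
    return [1 if p is None else 0 for p in shape]

def interpolate(shape):
    """Interpolation process: fill missing keypoints from the non-missing ones."""
    present = [p for p in shape if p is not None]
    if not present:
        raise ValueError("cannot interpolate a shape with no remaining keypoints")
    mean = (sum(p[0] for p in present) / len(present),
            sum(p[1] for p in present) / len(present))
    return [mean if p is None else p for p in shape]

# Hypothetical four-keypoint shape with keypoint 2 deleted.
skeleton = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (1.0, 1.0)]
damaged = delete_keypoints(skeleton, {2})
mask = missing_position_info(damaged)
filled = interpolate(damaged)
```

The interpolated shape goes to the component analysis, while the mask is kept so that the model can learn which positions were reconstructed rather than observed.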

The learning device according to claim 8,
The processor executes a calculation process to calculate angles of a plurality of vertices forming the shape of the learning target based on the shape of the learning target,
In the component analysis process, the processor generates a component group regarding the learning target based on the shape of the learning target and the angle of the vertex of the learning target calculated by the calculation process.
A learning device characterized by the above.

The learning device according to claim 8,
The processor executes a calculation process that calculates a movement amount of the learning target based on a plurality of shapes of the learning target at different times,
In the component analysis process, the processor generates a component group related to the learning target based on the shape of the learning target and the movement amount of the learning target calculated by the calculation process.
A learning device characterized by the above.

The learning device according to claim 8,
The processor executes a first normalization process that normalizes the size of the shape of the learning target,
In the component analysis process, the processor generates a component group related to the learning target based on the shape of the learning target after the first normalization by the first normalization process.
A learning device characterized by the above.

The learning device according to claim 9,
The processor executes a second normalization process that normalizes the range of possible values of the shape of the learning target and the angle of the vertex,
In the component analysis process, the processor generates a component group related to the learning target based on the shape of the learning target and the angle of the vertex after the second normalization by the second normalization process.
A learning device characterized by the above.

The learning device according to claim 8,
In the deletion process, the processor deletes a specific part of the shape of the learning target.
A learning device characterized by the above.

The learning device according to claim 8,
In the acquisition process, the processor acquires environmental information when the shape of the learning target is acquired;
In the behavior learning process, the processor learns the behavior of the learning target based on the component group related to the learning target, the behavior of the learning target, the missing position information, and the environmental information, and generates a behavior classification model for classifying the behavior of the learning target.
A learning device characterized by the above.
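To illustrate how environmental information might enter the learning input alongside the component group and the missing position information, the sketch below appends a one-hot encoding to the feature row. The environment categories, the one-hot encoding, and the names `encode_environment` and `training_row` are all assumptions. The claim does not state what the environmental information contains or how it is encoded.

```python
ENVIRONMENTS = ["indoor", "outdoor", "low_light"]  # hypothetical categories

def encode_environment(name):
    """One-hot encode an environment label from the hypothetical category list."""
    if name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {name}")
    return [1.0 if name == env else 0.0 for env in ENVIRONMENTS]

def training_row(components, missing_mask, environment):
    """Concatenate component group, missing position flags, and environment encoding."""
    return (list(components)
            + [float(m) for m in missing_mask]
            + encode_environment(environment))

# Hypothetical sample: two components, three keypoint flags, one environment label.
row = training_row([0.4, -0.2], [0, 1, 0], "outdoor")
```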

A behavior recognition method executed by a behavior recognition device having a processor that executes a program and a storage device that stores the program,
The behavior recognition device can access a behavior classification model trained using a component group related to the learning target, which is obtained from the shape of the learning target by component analysis that generates statistical components by multivariate analysis, and the behavior of the learning target,
The processor executes:
a detection process that detects the shape of the recognition target from analysis target data;
a missing position information generation process that generates missing position information indicating the position of a missing part in the shape of the recognition target detected by the detection process;
an interpolation process that interpolates the missing part from non-missing information, which is the part other than the missing part in the shape of the recognition target including the missing part, and updates the non-missing information after interpolation as the shape of the recognition target;
a component analysis process that generates, through the component analysis and based on the shape of the recognition target interpolated by the interpolation process, a component group related to the recognition target equal in number to the component group related to the learning target; and
a behavior recognition process that outputs a recognition result indicating the behavior of the recognition target by inputting the component group related to the recognition target generated by the component analysis process and the missing position information to the behavior classification model.
A behavior recognition method characterized by executing the above processes.
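The final step of the method, feeding the component group and the missing position information to the behavior classification model, can be sketched as below. A nearest-prototype rule stands in for the trained model purely for illustration. The claim does not specify the model type, and the names `build_input` and `classify`, the labels, and the prototype values are hypothetical.

```python
import math

def build_input(components, missing_mask):
    """Concatenate the component group and the missing position flags into one vector."""
    return list(components) + [float(m) for m in missing_mask]

def classify(x, prototypes):
    """Stand-in for the behavior classification model: nearest prototype wins."""
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))

# Hypothetical prototypes: two components followed by two missing-position flags.
prototypes = {
    "walking": [0.8, -0.1, 0.0, 0.0],
    "crouching": [-0.6, 0.4, 0.0, 1.0],
}

x = build_input([-0.5, 0.3], [0, 1])
result = classify(x, prototypes)
```

Including the flags in the input lets the model treat a reconstructed keypoint differently from an observed one, even when both yield similar component values.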