JP2023179239A

JP2023179239A - Information processing program, information processing method, and information processing apparatus

Info

Publication number: JP2023179239A
Application number: JP2022092433A
Authority: JP
Inventors: 博昭藤本; Hiroaki Fujimoto
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2022-06-07
Filing date: 2022-06-07
Publication date: 2023-12-19

Abstract

PROBLEM TO BE SOLVED: To provide an information processing program, an information processing method, and an information processing apparatus capable of improving the accuracy of a model that infers skeletal data of a person.

SOLUTION: In a skeleton recognition system 30, an information processing apparatus 100 first inputs into a learning model a plurality of pieces of image data, each obtained by cutting out the area of a person from one of a plurality of pieces of training image data. Based on the result, it infers a plurality of pieces of skeletal data of the persons included in that image data. The apparatus then determines whether the image data corresponding to abnormal skeletal data is image data in which the area of the person is abnormal. If it is, the apparatus:
- identifies, from the plurality of pieces of training image data, similar training image data in which the person has joint position characteristics similar to those of the person identified from the abnormal training image data;
- adjusts the area of the person identified from the similar training image data; and
- trains the learning model based on image data obtained by cutting out the adjusted area of the person from the similar training image data.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing program and the like.

The need for skeleton recognition is growing in fields such as sports (for example, gymnastics), healthcare, and entertainment. In addition, advances in deep learning are improving the accuracy of image-based two-dimensional (2D) and three-dimensional (3D) skeleton recognition.

FIG. 16 is a diagram for explaining a conventional skeleton recognition system. The conventional skeleton recognition system 1, for example, executes a learning-phase process and an inference-phase process.

In the learning phase, the joint position learning unit 10 of the skeleton recognition system 1 trains the joint position estimation model 11 on the training data set 5. The training data set 5 contains pairs of image data and a correct label. Each piece of image data in the training data set 5 is obtained by cutting out, with a bounding box, a human region included in a camera image frame. Each correct label indicates the ground-truth joint positions of the person.

In the inference phase, the human detection unit 13 of the skeleton recognition system 1 detects a human region in an image frame captured by the camera 12. It then generates image data 6 by cutting the human region out of the image frame with a bounding box. The joint position estimation unit 14 of the skeleton recognition system 1 inputs the image data 6 to the trained joint position estimation model 11 to obtain the joint position estimation result 7 of the person.

For example, the human detection unit 13 shown in FIG. 16 uses a machine learning model to detect persons. If that machine learning model is insufficiently trained, the human detection unit 13 may fail to detect the human region accurately. Furthermore, if the joint position estimation unit 14 inputs image data in which the human region has not been detected accurately to the joint position estimation model 11, the joint positions of the person may not be estimated accurately.

FIG. 17 is a diagram showing examples of joint position estimation results for different image data. In FIG. 17, inputting image data 6a to the joint position estimation model 11 yields joint position estimation result 7a. Image data 6a is normal image data in which the human region is cut out with a bounding box, so joint position estimation result 7a is a normal estimation result.

Inputting image data 6b to the joint position estimation model 11 yields joint position estimation result 7b. Image data 6b is image data in which the left leg is cut off, so joint position estimation result 7b is not a normal estimation result. For example, because the left foot is cut off, the joint positions of the right foot and those of the left foot overlap in joint position estimation result 7b.

Inputting image data 6c to the joint position estimation model 11 yields joint position estimation result 7c. The cut-out area of image data 6c is larger than the human region, so the image contains gymnastics equipment in addition to the person. As a result, joint position estimation result 7c is not a normal estimation result. For example, because image data 6c includes gymnastics equipment, one of the person's joint positions in joint position estimation result 7c (the left ankle) is placed on the equipment.

Next, an example of a conventional 3D skeleton recognition system using multiple cameras is described. FIG. 18 is a diagram showing an example of a 3D skeleton recognition system. As shown in FIG. 18, the 3D skeleton recognition system 2 includes:
- cameras 20a, 20b, 20c, and 20d;
- human detection units 21a, 21b, 21c, and 21d;
- joint position estimation units 22a, 22b, 22c, and 22d; and
- a 3D joint estimation unit 23.

The cameras 20a to 20d capture images of a person from different directions. Each camera outputs its captured image frames to the corresponding human detection unit 21a, 21b, 21c, or 21d. In the following description, the cameras 20a to 20d are collectively referred to as the camera 20 when no distinction is needed.

The human detection units 21a to 21d use trained machine learning models to detect human regions in the image frames captured by the camera 20. They output image data in which each detected human region is cut out with a bounding box. In the example shown in FIG. 18, the human detection units 21a, 21b, 21c, and 21d output image data 8a, 8b, 8c, and 8d based on the image frames of the cameras 20a, 20b, 20c, and 20d, respectively.

In the example shown in FIG. 18, image data 8a and 8d are normal image data. In contrast, image data 8b is image data in which the left ankle is cut off, and image data 8c is image data in which the right ankle is cut off.

The joint position estimation units 22a to 22d input image data 8a to 8d, respectively, to the trained joint position estimation model 11 described with reference to FIG. 16. This generates heat maps 9a, 9b, 9c, and 9d, which indicate the position of each joint of the person.

For example, each of the heat maps 9a to 9d includes a joint heat map for the right ankle, a joint heat map for the left ankle, and joint heat maps for the other joints. A joint heat map associates each coordinate with a likelihood. The more probable a coordinate is as the position of the corresponding joint, the larger its likelihood.
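The relation between a joint heat map and a joint position can be made concrete with a short example. The following is a minimal Python sketch that reads the most probable coordinate of each joint from its heat map. It assumes that a heat map is a 2D list indexed as `heatmap[y][x]` holding likelihoods. The function names and joint keys are hypothetical and do not come from the source.

```python
from typing import Dict, List, Tuple

# Assumed layout: heatmap[y][x] holds the likelihood that the joint lies at (x, y).
Heatmap = List[List[float]]


def joint_from_heatmap(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, likelihood) of the cell with the highest likelihood."""
    if not heatmap or not heatmap[0]:
        raise ValueError("heat map must be non-empty")
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, likelihood in enumerate(row):
            if likelihood > best[2]:
                best = (x, y, likelihood)
    return best


def joints_from_heatmaps(heatmaps: Dict[str, Heatmap]) -> Dict[str, Tuple[int, int, float]]:
    """Apply joint_from_heatmap to every joint heat map (e.g. 'right_ankle')."""
    return {joint: joint_from_heatmap(hm) for joint, hm in heatmaps.items()}


right_ankle = [
    [0.0, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.3, 0.0],
]
print(joints_from_heatmaps({"right_ankle": right_ankle}))  # {'right_ankle': (1, 1, 0.9)}
```

When several cells tie for the maximum, this sketch keeps the first one in row-major order. A misrecognized heat map such as 9b still has a maximum, so the peak alone does not reveal that the position is wrong.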

Heat maps 9a and 9d are estimated from the normal image data 8a and 8d, and the joint positions indicated in each of their joint heat maps are appropriate.

Heat map 9b is estimated from image data 8b, in which the left ankle is cut off, and the joint position indicated in its left-ankle joint heat map is misrecognized. Heat map 9c is estimated from image data 8c, in which the right ankle is cut off, and the joint position indicated in its right-ankle joint heat map is misrecognized.

The 3D joint estimation unit 23 generates 3D skeleton data 24 based on the heat maps 9a to 9d. The 3D skeleton data 24 contains the three-dimensional joint positions of the person. When joint positions are misrecognized, as in heat maps 9b and 9c, the three-dimensional joint positions in the 3D skeleton data 24 become disturbed.

The 3D skeleton recognition system 2 repeats the above processing each time image frames are input from the camera 20 in time series, thereby generating a time series of 3D skeleton data. Time-series 3D skeleton data is used when scoring gymnastics competitions.

Conventional techniques for suppressing disturbances in the three-dimensional joint positions of the 3D skeleton data 24 shown in FIG. 18 include Prior Art 1 and Prior Art 2.

In Prior Art 1, image data of various sizes is prepared as the training data set used to train the machine learning models of the human detection units 21a to 21d. For example, Prior Art 1 does not keep the bounding box at its normal size. Instead, it intentionally changes the size at random, which artificially generates image data with cut-off body parts, overly large image data that includes objects other than the human region, and the like. The machine learning models are then trained on such image data. In this way, the image data output from the human detection units 21a to 21d properly contains the human region, and disturbances in the three-dimensional joint positions of the 3D skeleton data 24 are suppressed.
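The random bounding-box resizing of Prior Art 1 can be illustrated with a short sketch in Python. It is a minimal example, not the prior art's actual procedure. The box convention `(x_min, y_min, x_max, y_max)` with exclusive maxima, the scale and shift ranges, and the name `jitter_box` are all assumptions made for the example.

```python
import random
from typing import Tuple

# Box convention (assumed): (x_min, y_min, x_max, y_max) in pixels, max exclusive.
Box = Tuple[int, int, int, int]


def jitter_box(box: Box, img_w: int, img_h: int, rng: random.Random,
               max_scale: float = 0.3, max_shift: float = 0.2) -> Box:
    """Randomly rescale and shift a bounding box, clipped to the image.

    Shrinking or shifting can cut off body parts; enlarging can pull in
    surrounding objects such as equipment.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    cx = (x0 + x1) / 2 + rng.uniform(-max_shift, max_shift) * w
    cy = (y0 + y1) / 2 + rng.uniform(-max_shift, max_shift) * h
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    half_w, half_h = w * scale / 2, h * scale / 2
    # Clip to the image and keep the box at least one pixel wide and tall.
    nx0 = min(max(0, round(cx - half_w)), img_w - 1)
    ny0 = min(max(0, round(cy - half_h)), img_h - 1)
    nx1 = max(min(img_w, round(cx + half_w)), nx0 + 1)
    ny1 = max(min(img_h, round(cy + half_h)), ny0 + 1)
    return (nx0, ny0, nx1, ny1)


rng = random.Random(0)
normal_box = (100, 50, 200, 350)
augmented = [jitter_box(normal_box, 640, 480, rng) for _ in range(5)]
print(augmented)
```

Because the perturbation is random, the generated crops need not resemble the detector's real failure cases. The text below identifies this as the weakness of Prior Art 1.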

Prior Art 2 works as follows:
1. It identifies, among the 3D skeleton data output from the 3D joint estimation unit 23, the 3D skeleton data whose three-dimensional joint positions are disturbed.
2. It extracts the image data from which that 3D skeleton data was generated, that is, image data whose bounding box size is abnormal. For example, if the 3D skeleton data 24 in FIG. 18 is disturbed, image data 8b and 8c are extracted.
3. It retrains the joint position estimation model 11 on the extracted image data 8b and 8c, thereby suppressing disturbances in the three-dimensional joint positions of the 3D skeleton data 24.

Patent literature: JP 2021-056922 A; JP 2021-174059 A

In Prior Art 1, the randomly generated image data does not necessarily resemble the image data produced by actual human-region detection errors. As a result, the machine learning model cannot be trained appropriately and a sufficient effect cannot be obtained. Moreover, retraining a machine learning model on image data that does not correspond to actual human-region detection errors may even reduce the model's accuracy.

Prior Art 2 uses only the image data corresponding to skeleton data whose three-dimensional coordinates are actually disturbed. Such image data alone rarely provides enough training data for retraining.

There is therefore a need to secure training data that is useful for improving the accuracy of a model that infers human skeleton data, and to train the model with it.

In one aspect, an object of the present invention is to provide an information processing program, an information processing method, and an information processing apparatus capable of improving the accuracy of a model that infers skeleton data of a person.

In a first proposal, a computer is caused to execute the following processing:
1. The computer inputs into a learning model a plurality of pieces of image data, each obtained by cutting out a person area from one of a plurality of pieces of training image data. Based on the result, it infers a plurality of pieces of skeleton data of the persons included in that image data.
2. The computer detects abnormal skeleton data based on the plurality of pieces of skeleton data.
3. The computer determines whether the image data corresponding to the abnormal skeleton data is image data whose person area is abnormal. The determination is based on two person areas: the one identified from the abnormal training image data (the piece of training image data that corresponds to the abnormal skeleton data) and the one in the image data corresponding to the abnormal skeleton data.
4. When that image data is abnormal image data, the computer identifies, from the plurality of pieces of training image data, similar training image data. In the similar training image data, the person has joint position characteristics similar to those of the person identified from the abnormal training image data.
5. The computer adjusts the person area identified from the similar training image data based on the person area identified from the abnormal training image data.
6. The computer trains the learning model based on image data obtained by cutting out the adjusted person area from the similar training image data.

The accuracy of a model that infers human skeleton data can be improved.

FIG. 1 is a diagram showing an example of a skeleton recognition system according to the embodiment.
FIG. 2 is a diagram for explaining the processing of the information processing apparatus according to the embodiment.
FIG. 3 is a diagram for explaining the processing of the human detection unit.
FIG. 4 is a diagram showing an example of segmentation.
FIG. 5 is a diagram for explaining the processing of the abnormal image data detection unit.
FIG. 6 is a diagram (1) for explaining the processing of the similar posture detection unit.
FIG. 7 is a diagram (2) for explaining the processing of the similar posture detection unit.
FIG. 8 is a diagram for explaining the processing of the training image data generation unit.
FIG. 9 is a functional block diagram showing the configuration of the information processing apparatus according to the embodiment.
FIG. 10 is a flowchart showing the processing procedure of the information processing apparatus according to the embodiment.
FIG. 11 is a flowchart showing the procedure of the abnormal skeleton data detection processing.
FIG. 12 is a flowchart showing the procedure of the abnormal image data detection processing.
FIG. 13 is a flowchart showing the procedure of the similar posture detection processing.
FIG. 14 is a flowchart showing the procedure of the training image data generation processing.
FIG. 15 is a diagram showing an example of the hardware configuration of a computer that implements the same functions as the information processing apparatus of the embodiment.
FIG. 16 is a diagram for explaining a conventional skeleton recognition system.
FIG. 17 is a diagram showing examples of joint position estimation results for different image data.
FIG. 18 is a diagram showing an example of a 3D skeleton recognition system.

Embodiments of the information processing program, information processing method, and information processing apparatus disclosed in the present application are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.

FIG. 1 is a diagram showing an example of a skeleton recognition system according to the embodiment. As shown in FIG. 1, the skeleton recognition system 30 includes cameras 31a, 31b, 31c, and 31d and an information processing apparatus 100. The cameras 31a to 31d are connected to the information processing apparatus 100.

The cameras 31a to 31d are installed at different positions and capture RGB (Red Green Blue) images of an athlete. They transmit the captured image data to the information processing apparatus 100. The data of an image captured by the cameras 31a to 31d is referred to as an "image frame". The cameras 31a to 31d transmit a time series of image frames to the information processing apparatus 100, and each image frame is assigned a frame number in ascending order. In the following description, the cameras 31a to 31d are collectively referred to as the "camera 31" where appropriate.

The information processing apparatus 100 trains the skeleton inference model 40 in advance on pairs of training image data and correct labels stored in the training data set 50. For example, the skeleton inference model 40 takes as input image data obtained by cutting out a human region with a bounding box, and outputs 3D skeleton data.

The information processing apparatus 100 infers the athlete's skeleton data based on the image frames received from the camera 31 and the trained skeleton inference model 40.

Next, the processing performed when the information processing apparatus 100 retrains the skeleton inference model 40 is described. For example, the information processing apparatus 100 cuts out human regions from a plurality of pieces of training image data stored in the training data set 50. It inputs the resulting image data to the skeleton inference model 40 and infers a plurality of pieces of skeleton data.

The information processing apparatus 100 identifies abnormal skeleton data among the plurality of pieces of skeleton data. It then determines whether the image data from which the abnormal skeleton data was inferred is abnormal image data. If it is, the apparatus identifies other training image data containing person features similar to those in the training image data corresponding to the abnormal image data. The apparatus then adjusts the human region of the identified training image data and uses image data cut out from the adjusted human region during retraining.

This makes it possible to retrain the skeleton inference model 40 on other image data whose person features are similar to those of the image data produced by an actual human-region detection error. The retraining can also use this other image data in addition to the image data corresponding to skeleton data whose three-dimensional coordinates are actually disturbed. In other words, the skeleton inference model 40 can be retrained to estimate human skeleton data accurately.

FIG. 2 is a diagram for explaining the processing of the information processing apparatus according to the embodiment. As shown in FIG. 2, the information processing apparatus 100 includes a human detection unit 151, a skeleton inference unit 152, and a segmentation unit 153. The information processing apparatus 100 also includes an abnormal skeleton data detection unit 154, an abnormal image data detection unit 155, a similar posture detection unit 156, a training image data generation unit 157, and a machine learning execution unit 158.

The human detection unit 151 detects a human region in the training image data 60 and generates image data 61 in which the human region is cut out with a bounding box. FIG. 3 is a diagram for explaining the processing of the human detection unit. The human detection unit 151 uses a trained machine learning model 151a, such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), or R-CNN (Region-based Convolutional Neural Networks).

As shown in FIG. 3, the human detection unit 151 inputs the training image data 60 to the machine learning model 70 to detect a human region A1. It then outputs image data 61 in which the detected human region A1 is cut out with a bounding box. For example, the training image data 60 is assigned a frame number, and the image data 61 is assigned the same frame number as the training image data 60.
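Carrying the frame number through the cut-out step lets later units match skeleton data 62 and segmentation data 63 back to the image data 61 they came from. Below is a minimal Python sketch of that cut-out. It assumes an image is a list of pixel rows and that a box is `(x_min, y_min, x_max, y_max)` with exclusive maxima. The `ImageData` and `cut_out` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed representation: an image is a list of pixel rows; a box is
# (x_min, y_min, x_max, y_max) with exclusive max coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class ImageData:
    frame_number: int
    pixels: List[List[int]]


def cut_out(frame: ImageData, box: Box) -> ImageData:
    """Cut the bounding-box region out of a frame, keeping its frame number."""
    x0, y0, x1, y1 = box
    rows = [row[x0:x1] for row in frame.pixels[y0:y1]]
    return ImageData(frame_number=frame.frame_number, pixels=rows)


training_image = ImageData(frame_number=7, pixels=[[1, 2, 3], [4, 5, 6], [7, 8, 9]])
crop = cut_out(training_image, (1, 0, 3, 2))
print(crop)  # ImageData(frame_number=7, pixels=[[2, 3], [5, 6]])
```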

The human detection unit 151 repeats the above processing on a plurality of pieces of training image data 60 to produce a plurality of pieces of image data 61. It outputs these to the skeleton inference unit 152 and the abnormal image data detection unit 155.

Returning to FIG. 2, the skeleton inference unit 152 infers skeleton data 62 by inputting the image data 61 to the trained skeleton inference model 40. The skeleton data 62 contains three-dimensional joint position data for each joint of the person. The skeleton data 62 is assigned the same frame number as the image data 61 from which it was inferred, so the two always share a frame number.

The skeleton inference unit 152 repeats the above processing on a plurality of pieces of image data 61 to generate a plurality of pieces of skeleton data 62. It outputs these to the abnormal skeleton data detection unit 154.

The segmentation unit 153 performs segmentation on the training image data 60. This generates segmentation data 63 in which each body part of the person included in the training image data 60 is extracted. For example, the segmentation unit 153 performs segmentation using BodyPix or a similar tool. The segmentation data 63 is assigned the same frame number as the training image data 60.

FIG. 4 is a diagram showing an example of segmentation. In the example shown in FIG. 4, segmentation data 63 is obtained by performing segmentation on the training image data 60. For example, a person h1 in the training image data 60 is divided into segments p1 to p15 in the segmentation data 63, and each of the segments p1 to p15 is assigned a body part of the person. When the training image data 60 contains multiple persons, each person is divided into multiple segments.
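One way to read a person area from segmentation data such as 63 is to take the tight box around every pixel labeled with a body part. The Python sketch below assumes that segmentation data is a 2D label map for a single person, in which 0 marks the background and the values 1 to 15 stand for segments p1 to p15. This representation and the function name are illustrative assumptions, not the source's specification.

```python
from typing import List, Optional, Tuple

# Assumed label map: labels[y][x] == 0 is background, 1..15 are body-part segments.
LabelMap = List[List[int]]


def person_box_from_segmentation(labels: LabelMap,
                                 background: int = 0) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max), max exclusive, around all body-part pixels."""
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label != background:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no person pixels in this frame
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


labels = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 2, 3, 0],
    [0, 0, 0, 0, 0],
]
print(person_box_from_segmentation(labels))  # (1, 1, 4, 3)
```

Handling several persons would require per-person instance labels. This sketch does not cover that case.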

The segmentation unit 153 repeats the above processing on a plurality of pieces of training image data 60 to generate a plurality of pieces of segmentation data 63. It outputs these to the abnormal image data detection unit 155, the similar posture detection unit 156, and the training image data generation unit 157.

The abnormal skeleton data detection unit 154 detects abnormal skeleton data among the plurality of pieces of skeleton data 62. It bases the detection on three criteria: the distances between joint positions (bone lengths), the joint angles, and the movement distances of joint positions between consecutive frames of skeleton data. Skeleton data detected as abnormal in this way is hereinafter referred to as "abnormal skeleton data".

An example of how the abnormal skeleton data detection unit 154 detects abnormal skeleton data based on bone lengths is described. The abnormal skeleton data detection unit 154 selects, from the joint positions included in the target skeleton data, a first joint position and a second joint position adjacent to it. It then calculates the distance between the two. It repeats this processing while changing the first joint position, thereby calculating the distance between each pair of adjacent joint positions (the bone lengths).

The abnormal skeleton data detection unit 154 detects the target skeleton data as abnormal skeleton data when the average of the distances between joint positions calculated from it is equal to or greater than a threshold Th1.
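The bone-length check can be sketched in Python as follows. The sketch assumes that skeleton data is a dict mapping joint names to 3D positions and that the adjacency between joints is given as a list of joint-name pairs. `LEG_BONES`, the joint names, and the units are hypothetical. The test itself is the one stated above: flag the skeleton when the mean bone length is at least Th1.

```python
import math
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]
Skeleton = Dict[str, Point3]  # joint name -> 3D joint position

# Hypothetical adjacency: pairs of adjacent joints, i.e. bones.
LEG_BONES: List[Tuple[str, str]] = [("hip", "knee"), ("knee", "ankle")]


def bone_lengths(skeleton: Skeleton, bones: List[Tuple[str, str]]) -> List[float]:
    """Distance between each pair of adjacent joint positions."""
    return [math.dist(skeleton[a], skeleton[b]) for a, b in bones]


def is_abnormal_by_bone_length(skeleton: Skeleton, bones: List[Tuple[str, str]],
                               th1: float) -> bool:
    """Flag the skeleton when the mean bone length is >= Th1."""
    lengths = bone_lengths(skeleton, bones)
    return sum(lengths) / len(lengths) >= th1


skeleton = {"hip": (0.0, 0.0, 0.0), "knee": (0.0, 3.0, 0.0), "ankle": (0.0, 7.0, 0.0)}
print(bone_lengths(skeleton, LEG_BONES))                         # [3.0, 4.0]
print(is_abnormal_by_bone_length(skeleton, LEG_BONES, th1=5.0))  # False
```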

An example of how the abnormal skeleton data detection unit 154 detects abnormal skeleton data based on joint angles is described. The abnormal skeleton data detection unit 154 selects, from the joints included in the target skeleton data, a first joint and a second joint adjacent to it. It then calculates the joint angle between the first joint and the second joint. When this angle is equal to or greater than a threshold Th2 corresponding to the maximum joint range of motion of the human body, the unit detects the target skeleton data as abnormal skeleton data. It repeats this processing while changing the first joint.
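The source does not define exactly how the joint angle is measured. The Python sketch below therefore adopts one common convention as an assumption. The angle at a joint is the bend between the bone leading into it and the bone leading out of it, where 0 degrees means straight. Each joint gets its own Th2 limit. The joint names, triplets, and the 150-degree knee limit are illustrative values only, not physiological data from the source.

```python
import math
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]


def bend_angle_deg(parent: Point3, joint: Point3, child: Point3) -> float:
    """Angle in degrees between bones parent->joint and joint->child (0 = straight)."""
    u = [j - p for p, j in zip(parent, joint)]
    v = [c - j for j, c in zip(joint, child)]
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("bones must have non-zero length")
    cos_theta = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def is_abnormal_by_angle(skeleton: Dict[str, Point3],
                         triplets: List[Tuple[str, str, str]],
                         th2: Dict[str, float]) -> bool:
    """Flag the skeleton if any joint bends by at least its Th2 limit."""
    for parent, joint, child in triplets:
        angle = bend_angle_deg(skeleton[parent], skeleton[joint], skeleton[child])
        if angle >= th2[joint]:
            return True
    return False


skeleton = {"hip": (0.0, 0.0, 0.0), "knee": (0.0, 1.0, 0.0), "ankle": (1.0, 1.0, 0.0)}
print(round(bend_angle_deg(skeleton["hip"], skeleton["knee"], skeleton["ankle"]), 6))  # 90.0
print(is_abnormal_by_angle(skeleton, [("hip", "knee", "ankle")], {"knee": 150.0}))    # False
```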

An example of how the abnormal skeletal data detection unit 154 detects abnormal skeletal data based on the movement distance of joint positions between consecutive skeletal data will be described. The abnormal skeletal data detection unit 154 selects the position of a first joint in the skeletal data of frame number N (first joint position (N)) and the position of the same joint in the skeletal data of frame number N+1 (first joint position (N+1)), where N is a natural number. The unit calculates the distance (movement distance) between first joint position (N) and first joint position (N+1), and detects the target skeletal data as abnormal skeletal data when the movement distance is equal to or greater than a threshold Th3. The unit repeats this process while changing the first joint.
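A minimal sketch of the movement-distance check over a sequence of frames follows. It assumes that `frames[n][j]` holds the position of joint j in frame n, and that the later frame (N+1) of an offending pair is the one reported as abnormal; the text leaves which frame is the "target" open, so that choice is an assumption, as is the function name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def abnormal_frames_by_motion(frames: Sequence[Sequence[Point]], th3: float) -> List[int]:
    """Return indices of frames in which some joint moved Th3 or more
    since the previous frame (the later frame of each pair is reported)."""
    flagged = []
    for n in range(len(frames) - 1):
        moves = (math.dist(p, q) for p, q in zip(frames[n], frames[n + 1]))
        if any(d >= th3 for d in moves):
            flagged.append(n + 1)
    return flagged


# One joint that drifts slightly and then jumps by 4.9 units.
frames = [[(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
print(abnormal_frames_by_motion(frames, th3=1.0))  # [2]
```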

By performing the above processes on multiple pieces of skeletal data, the abnormal skeletal data detection unit 154 detects abnormal skeletal data and outputs the frame numbers corresponding to the abnormal skeletal data to the abnormal image data detection unit 155.

The abnormal image data detection unit 155 obtains the image data 61 and the segmentation data 63 corresponding to the frame number of the abnormal skeletal data. In the following description of the abnormal image data detection unit 155, these are referred to simply as image data 61 and segmentation data 63.

Based on the image data 61 and the segmentation data 63, the abnormal image data detection unit 155 determines whether the image data 61 is abnormal. Image data judged abnormal in this way is hereinafter referred to as "abnormal image data."

FIG. 5 is a diagram for explaining the processing of the abnormal image data detection unit. The processing of the abnormal image data detection unit 155 is described separately for case 1 and case 2.

Case 1 will be described. The abnormal image data detection unit 155 detects a circumscribed BBOX 75a of the human region formed by the multiple body parts in the segmentation data 63. The unit also sets the rectangle 75b of the image data 61 on the segmentation data 63. Because the image data 61 corresponds to the BBOX of the human region detected from the training image data 60, the unit may set the rectangle 75b on the segmentation data 63 using the position information of that BBOX.
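Detecting the circumscribed BBOX of the human region can be sketched as below, assuming the segmentation data is a row-major grid of part labels in which 0 means background. The label convention and the function name are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def circumscribed_bbox(mask: Sequence[Sequence[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest (x0, y0, x1, y1) box, inclusive, containing every body-part pixel.

    mask[y][x] holds a part label; 0 is assumed to mean background.
    """
    xs = [x for row in mask for x, label in enumerate(row) if label != 0]
    ys = [y for y, row in enumerate(mask) if any(label != 0 for label in row)]
    if not xs:
        return None  # no person in this segmentation
    return min(xs), min(ys), max(xs), max(ys)


mask = [
    [0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 0, 5, 0],
    [0, 0, 0, 0, 0],
]
print(circumscribed_bbox(mask))  # (1, 1, 3, 2)
```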

The abnormal image data detection unit 155 compares the circumscribed BBOX 75a with the rectangle 75b and determines that the image data 61 is normal image data if both a first normal condition and a second normal condition are satisfied. The first normal condition is that no side of the circumscribed BBOX 75a intersects a side of the rectangle 75b. The second normal condition is that the distance between each side of the circumscribed BBOX 75a and the corresponding side of the rectangle 75b is less than a threshold Th4.

In case 1, comparing the circumscribed BBOX 75a with the rectangle 75b shows that both the first and second normal conditions are satisfied, so the abnormal image data detection unit 155 determines that the image data 61 is normal image data.

Case 2 will be described. The abnormal image data detection unit 155 detects the circumscribed BBOX 75a of the human region formed by the multiple body parts in the segmentation data 63, and sets the rectangle 75c of the image data 61 on the segmentation data 63.

When the abnormal image data detection unit 155 compares the circumscribed BBOX 75a with the rectangle 75c, sides of the circumscribed BBOX 75a intersect sides of the rectangle 75c, so the first normal condition is not satisfied. The unit therefore determines that the image data 61 is abnormal image data.

Note that even when a side of the circumscribed BBOX 75a intersects a side of the rectangle of the image data 61, the abnormal image data detection unit 155 may determine that the first normal condition is satisfied if the distance between the corresponding sides is less than a threshold Th5.
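The two normal conditions, including the optional Th5 relaxation, can be sketched as follows. This assumes both boxes are (x0, y0, x1, y1) in the same pixel coordinates, reads "sides do not intersect" as the person's circumscribed BBOX lying inside the crop rectangle, and measures the distance between corresponding sides as a signed coordinate difference. These interpretations and all names are assumptions, not the patent's definitive procedure.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def side_gaps(bbox: Box, rect: Box) -> List[float]:
    """Signed gap per side (left, top, right, bottom) between person box and crop.

    Positive: the crop extends beyond the person on that side.
    Negative: the person sticks out of the crop, i.e. the sides intersect.
    """
    bx0, by0, bx1, by1 = bbox
    rx0, ry0, rx1, ry1 = rect
    return [bx0 - rx0, by0 - ry0, rx1 - bx1, ry1 - by1]


def is_normal_crop(bbox: Box, rect: Box, th4: float, th5: float = 0.0) -> bool:
    gaps = side_gaps(bbox, rect)
    # First normal condition: no side intersects, except that a protrusion
    # shorter than Th5 is tolerated (th5=0 disables the relaxation).
    cond1 = all(g >= 0 or -g < th5 for g in gaps)
    # Second normal condition: every side of the crop lies within Th4 of the
    # corresponding side of the person's box.
    cond2 = all(abs(g) < th4 for g in gaps)
    return cond1 and cond2


person = (10, 10, 50, 90)
print(is_normal_crop(person, (5, 5, 55, 95), th4=10))   # True: like case 1
print(is_normal_crop(person, (20, 5, 55, 95), th4=10))  # False: left side cut off, like case 2
```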

When the abnormal image data detection unit 155 determines through the above processing that the image data 61 is abnormal image data, it outputs the frame number of the abnormal image data (image data 61) to the similar posture detection unit 156.

Returning to the description of FIG. 2. The similar posture detection unit 156 acquires the segmentation data 63 corresponding to the frame number of the abnormal image data (image data 61), hereinafter referred to as reference data. The similar posture detection unit 156 compares the reference data with the segmentation data 63 and detects segmentation data 63 whose person posture and camera angle are similar to those of the reference data.

FIGS. 6 and 7 are diagrams for explaining the processing of the similar posture detection unit. FIG. 6 will be described first. The similar posture detection unit 156 identifies joint positions based on the boundaries between the body parts in the reference data 76, and generates reference vector information 76a by repeatedly calculating the partial vector between two adjacent joint positions. In the example shown in FIG. 6, the person has 15 joints, defined as J0 to J14, and the position (joint coordinates) of joint Ji is written Ji(x, y). The pixel at the upper-left corner of the reference data 76 is taken as (0, 0), where x is the pixel position in the width direction and y is the pixel position in the height direction.

For example, the similar posture detection unit 156 calculates the partial vectors as follows: V0 = J1 - J0, V1 = J2 - J1, V2 = J3 - J1, V3 = J4 - J3, V4 = J5 - J4, V5 = J6 - J1, V6 = J7 - J6, and V7 = J8 - J7.

Likewise, the similar posture detection unit 156 calculates V8 = J9 - J0, V9 = J10 - J9, V10 = J11 - J10, V11 = J12 - J0, V12 = J13 - J12, and V13 = J14 - J13.

By executing the above processing, the similar posture detection unit 156 calculates the partial vectors V0 to V13 and generates the reference vector information 76a.
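The partial-vector definitions V0 to V13 can be encoded as a table of (end, start) joint indices. The sketch below assumes the 15 joint coordinates Ji(x, y) have already been extracted from the part boundaries (that extraction step is not shown); the constant and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (end, start) joint indices for V0..V13, transcribed from the formulas
# V_i = J_end - J_start given in the description.
PARTIAL_VECTOR_PAIRS = [
    (1, 0), (2, 1), (3, 1), (4, 3), (5, 4), (6, 1), (7, 6),
    (8, 7), (9, 0), (10, 9), (11, 10), (12, 0), (13, 12), (14, 13),
]


def partial_vectors(joints: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Compute V0..V13 from the 15 joint coordinates J0..J14 (pixel units)."""
    if len(joints) != 15:
        raise ValueError("expected joints J0..J14")
    return [(joints[e][0] - joints[s][0], joints[e][1] - joints[s][1])
            for e, s in PARTIAL_VECTOR_PAIRS]
```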

The similar posture detection unit 156 normalizes each partial vector based on equation (1); the normalized partial vector is denoted V'_i. Thanks to this normalization, persons in the same posture yield partial vectors of the same magnitude and direction regardless of whether the person appears far from or close to the camera in the image.
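Equation (1) itself is not reproduced in this text. As one plausible form, the sketch below assumes each partial vector is scaled to unit length, V'_i = V_i / |V_i|, which makes the result independent of the person's distance from the camera. Note that this particular choice also discards relative limb lengths, so the patent's actual equation may differ.

```python
import math
from typing import Sequence, Tuple


def normalize(v: Sequence[float]) -> Tuple[float, ...]:
    """Scale a partial vector to unit length (assumed form of equation (1))."""
    n = math.hypot(*v)
    if n == 0:
        return tuple(0.0 for _ in v)  # coincident joints: leave as a zero vector
    return tuple(c / n for c in v)


# The same limb seen near (long) and far (short) maps to the same V'_i.
print(normalize((30.0, 40.0)), normalize((3.0, 4.0)))
```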

Turning to FIG. 7. By executing the processing described with reference to FIG. 6, the similar posture detection unit 156 generates the reference vector information 76a from the reference data 76. By performing the same processing on the segmentation data 63, it calculates the partial vectors of the segmentation data 63 and generates vector information 77.

The similar posture detection unit 156 determines whether the reference vector information 76a and the vector information 77 are similar based on the result of comparing their corresponding partial vectors. For example, for each of i = 0 to 13, the unit determines whether the difference between the partial vector V'_i of the reference vector information 76a and the partial vector V'_i of the vector information 77 is less than a threshold Th6. The unit determines that the reference vector information 76a and the vector information 77 are similar when no partial vector has a difference equal to or greater than Th6, that is, when the difference is less than Th6 for every i.
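The similarity test can be sketched as below. It assumes the "difference" between two normalized partial vectors is their Euclidean distance, and that the two pieces of vector information are similar only when every one of the 14 differences is below Th6. The distance measure and the function name are assumptions.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def is_similar(ref: Sequence[Vec], cand: Sequence[Vec], th6: float) -> bool:
    """True when every normalized partial vector V'_i of the candidate differs
    from the reference one by less than Th6 (Euclidean distance)."""
    if len(ref) != len(cand):
        raise ValueError("vector information must have the same number of parts")
    return all(math.dist(r, c) < th6 for r, c in zip(ref, cand))
```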

When the similar posture detection unit 156 determines that the reference vector information 76a and the vector information 77 are similar, it identifies the frame number of the segmentation data 63 from which the vector information 77 was generated. In the following description, the frame number of the segmentation data 63 from which the reference vector information 76a was generated is referred to as the reference frame number, and the frame number of the segmentation data 63 from which vector information 77 similar to the reference vector information 76a was generated is referred to as a similar frame number. The similar posture detection unit 156 outputs the reference frame number and the similar frame number to the training image data generation unit 157.

The similar posture detection unit 156 identifies similar frame numbers by repeating the above processing for the multiple pieces of segmentation data 63, and outputs the reference frame number and the similar frame numbers to the training image data generation unit 157.
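Repeating the comparison over all frames to collect (reference frame number, similar frame number) pairs might look like the following sketch. `vectors_by_frame` is a hypothetical mapping from frame number to normalized partial vectors, the Euclidean "difference" is assumed, and skipping the reference frame itself is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def find_similar_frames(ref_frame: int,
                        vectors_by_frame: Dict[int, Sequence[Vec]],
                        th6: float) -> List[Tuple[int, int]]:
    """Return (reference frame number, similar frame number) pairs."""
    ref = vectors_by_frame[ref_frame]
    pairs = []
    for frame, vecs in vectors_by_frame.items():
        if frame == ref_frame:
            continue  # do not pair the abnormal frame with itself
        if all(math.dist(r, c) < th6 for r, c in zip(ref, vecs)):
            pairs.append((ref_frame, frame))
    return pairs


vectors = {0: [(1.0, 0.0)], 1: [(0.99, 0.0)], 2: [(0.0, 1.0)]}
print(find_similar_frames(0, vectors, th6=0.1))  # [(0, 1)]
```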

Returning to the description of FIG. 2. The training image data generation unit 157 generates image data 64 based on the reference frame number and the similar frame numbers.

FIG. 8 is a diagram for explaining the processing of the training image data generation unit. The training image data generation unit 157 acquires the segmentation data corresponding to the reference frame number (the reference data) and the segmentation data corresponding to a similar frame number (hereinafter, similar segmentation data 80).

The training image data generation unit 157 detects a circumscribed BBOX 77a of the human region formed by the multiple body parts in the similar segmentation data 80, and a circumscribed BBOX 77b of the human region formed by the multiple body parts in the reference data. It then generates a BBOX 77c by adjusting the aspect ratio of the circumscribed BBOX 77a to match that of the circumscribed BBOX 77b.
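The text does not say how the aspect ratio is matched. A minimal sketch, assuming the adjusted box keeps the center of BBOX 77a and only enlarges its shorter dimension (so the person stays inside), is shown below; the strategy and function name are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def match_aspect(box: Box, ref_box: Box) -> Box:
    """Return a box with ref_box's width/height ratio, centered on box.

    The shorter dimension is enlarged (never shrunk), so the person in box
    stays inside the result.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    target = (ref_box[2] - ref_box[0]) / (ref_box[3] - ref_box[1])
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    if w / h < target:
        w = h * target  # too narrow: widen
    else:
        h = w / target  # too wide (or already equal): heighten
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# BBOX 77a is 10x40 (ratio 0.25); BBOX 77b is 20x40 (ratio 0.5).
print(match_aspect((0, 0, 10, 40), (0, 0, 20, 40)))  # (-5.0, 0.0, 15.0, 40.0)
```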

The training image data generation unit 157 acquires the training image data 60 corresponding to the similar frame number and generates image data 64 by cropping the training image data 60 with the BBOX 77c. By performing this processing for each pair of a reference frame number and a similar frame number, the unit generates multiple pieces of image data 64. It then generates a training data set by assigning, as the correct label of each piece of image data 64, the correct label of the training image data 60 corresponding to the similar frame number.
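Cropping with BBOX 77c and attaching the source image's correct label can be sketched as follows. The image is assumed to be a row-major grid, the box uses exclusive right/bottom edges, and parts of the box outside the image are clipped; clipping behavior is not specified in the text and is an assumption, as are the function names.

```python
import math
from typing import Any, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), x1/y1 exclusive


def crop(image: Sequence[Sequence[Any]], box: Box) -> List[List[Any]]:
    """Cut the box out of a row-major image, clipping it to the image bounds."""
    height, width = len(image), len(image[0])
    x0 = max(0, math.floor(box[0]))
    y0 = max(0, math.floor(box[1]))
    x1 = min(width, math.ceil(box[2]))
    y1 = min(height, math.ceil(box[3]))
    return [list(row[x0:x1]) for row in image[y0:y1]]


def make_training_pair(image, box, label):
    """Pair the crop (image data 64) with the label of the source training image."""
    return crop(image, box), label


image = [[y * 4 + x for x in range(4)] for y in range(4)]
print(crop(image, (-1.0, 1.0, 2.0, 3.0)))  # [[4, 5], [8, 9]]
```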

The machine learning execution unit 158 retrains the skeletal inference model 40 using the training data set generated by the training image data generation unit 157 together with the training image data 60 (the image data 61 cropped with the bounding box, and its correct label). For example, the machine learning execution unit 158 trains the parameters of the skeletal inference model 40 so that the output obtained when image data is input to the model approaches the correct label.

Next, a configuration example of the information processing apparatus 100 according to this embodiment will be described. FIG. 9 is a functional block diagram showing the configuration of the information processing apparatus according to this embodiment. As shown in FIG. 9, the information processing apparatus 100 includes a communication unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.

The communication unit 110 receives image frames from the camera 31. The communication unit 110 may also perform data communication with an external device to receive the training data set 50 and the like.

The input unit 120 is implemented with input devices such as a keyboard and a mouse, and inputs various kinds of information to the control unit 150 in response to input operations by an operator.

The display unit 130 is implemented by a display device such as a liquid crystal display.

The storage unit 140 holds the skeletal inference model 40 and the training data set 50. Although not shown, the storage unit 140 also stores various data used by the control unit 150. The storage unit 140 is implemented by, for example, a semiconductor memory element such as flash memory, or a storage device such as a hard disk or an optical disk.

The skeletal inference model 40 takes as input image data in which the human region has been cropped with a bounding box, and outputs 3D skeletal data. The skeletal inference model 40 is, for example, a neural network (NN).

The training data set 50 stores multiple pairs of training image data and correct labels, and is used to train the skeletal inference model 40.

The technique recognition table 141 associates time-series changes in the joint positions included in the skeletal data with technique types. The technique recognition table 141 also associates combinations of technique types with scores. A score is calculated as the sum of a D (Difficulty) score and an E (Execution) score. For example, the D score is calculated based on the difficulty of the techniques, and the E score is calculated by deducting points according to how well the techniques are executed.
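The D + E structure of the score can be illustrated with a deliberately simplified sketch. It assumes the D score is the sum of per-technique difficulty values and that the E score starts from 10.0 points with deductions subtracted, as in artistic gymnastics; the patent states neither detail, real scoring rules are more involved, and the table contents and names below are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical difficulty values per technique type; the real mapping would
# come from the technique recognition table 141 and the applicable rules.
DIFFICULTY: Dict[str, float] = {"technique_a": 0.3, "technique_b": 0.5}


def performance_score(techniques: Iterable[str], deductions: Iterable[float],
                      e_start: float = 10.0) -> float:
    """Score = D score + E score (simplified)."""
    d_score = sum(DIFFICULTY[t] for t in techniques)
    e_score = max(0.0, e_start - sum(deductions))  # point-deduction method
    return d_score + e_score


# D = 0.3 + 0.5 = 0.8, E = 10.0 - 0.4 = 9.6, so the total is about 10.4.
print(performance_score(["technique_a", "technique_b"], [0.3, 0.1]))
```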

The control unit 150 includes a person detection unit 151, a skeleton inference unit 152, a segmentation unit 153, an abnormal skeletal data detection unit 154, an abnormal image data detection unit 155, a similar posture detection unit 156, a training image data generation unit 157, a machine learning execution unit 158, and a technique recognition unit 159. The control unit 150 is implemented by a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). The control unit 150 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

Each of the processing units 151 to 159 of the control unit 150 performs different processing in the inference phase and in the learning phase.

First, the processing of the control unit 150 in the inference phase will be described. In the inference phase, the person detection unit 151, the skeleton inference unit 152, and the technique recognition unit 159 perform processing.

The person detection unit 151 receives image frames transmitted from the camera 31, detects a human region in each image frame, and generates image data by cropping the human region with a bounding box. The person detection unit 151 outputs the generated image data to the skeleton inference unit 152, and repeats this processing for the time-series image frames.

The skeleton inference unit 152 infers the athlete's skeletal data by inputting the image data acquired from the person detection unit 151 into the trained skeletal inference model 40, and outputs the inferred skeletal data to the technique recognition unit 159. The skeleton inference unit 152 repeats this processing for the time-series image data.

The technique recognition unit 159 identifies time-series changes in each joint position based on the time-series skeletal data, and identifies the technique type by comparing these changes with the technique recognition table 141. The technique recognition unit 159 also calculates the score of the athlete's performance by comparing the combination of technique types with the technique recognition table 141.

The technique recognition unit 159 generates screen information based on the performance score and the skeletal data from the start to the end of the performance, and outputs the generated screen information to the display unit 130 for display.

Next, the processing of the control unit 150 in the learning phase will be described. In the learning phase, the person detection unit 151, the skeleton inference unit 152, the segmentation unit 153, the abnormal skeletal data detection unit 154, the abnormal image data detection unit 155, the similar posture detection unit 156, the training image data generation unit 157, and the machine learning execution unit 158 perform processing. The learning-phase processing of the control unit 150 corresponds to the processing described with reference to FIG. 2.

The person detection unit 151 acquires the training image data 60 stored in the training data set 50, detects a human region in the training image data 60, and generates image data 61 by cropping the human region with a bounding box. The person detection unit 151 outputs the multiple pieces of image data 61 to the skeleton inference unit 152 and the abnormal image data detection unit 155. The rest of the processing of the person detection unit 151 in the learning phase is the same as that described with reference to FIG. 2.

The skeleton inference unit 152 infers skeletal data 62 by inputting the image data 61 into the trained skeletal inference model 40, and outputs the multiple pieces of skeletal data 62 to the abnormal skeletal data detection unit 154. The rest of the processing of the skeleton inference unit 152 is the same as that described with reference to FIG. 2.

The segmentation unit 153 acquires the training image data 60 stored in the training data set 50 and performs segmentation on it, generating segmentation data 63 in which each body part of the person in the training image data 60 is extracted. The segmentation unit 153 outputs the multiple pieces of segmentation data 63 to the abnormal image data detection unit 155, the similar posture detection unit 156, and the training image data generation unit 157. The rest of the processing of the segmentation unit 153 is the same as that described with reference to FIG. 2.

The abnormal skeletal data detection unit 154 detects abnormal skeletal data among the multiple pieces of skeletal data 62 based on the distances between joint positions (bone lengths), the joint angles, and the movement distances of joint positions between consecutive skeletal data. The abnormal skeletal data detection unit 154 outputs the frame numbers corresponding to the abnormal skeletal data to the abnormal image data detection unit 155. The rest of the processing of the abnormal skeletal data detection unit 154 is the same as that described with reference to FIG. 2.

The abnormal image data detection unit 155 acquires the image data 61 and the segmentation data 63 corresponding to the frame number of the abnormal skeletal data, and determines whether the image data 61 is abnormal image data. The abnormal image data detection unit 155 outputs the frame number of the abnormal image data (image data 61) to the similar posture detection unit 156. The rest of the processing of the abnormal image data detection unit 155 is the same as that described with reference to FIG. 2.

The similar posture detection unit 156 acquires the segmentation data 63 corresponding to the frame number of the abnormal image data (image data 61), hereinafter referred to as reference data. It compares the reference data with the segmentation data 63 and detects segmentation data 63 whose person posture and camera angle are similar to those of the reference data. The similar posture detection unit 156 outputs the reference frame number of the reference data and the similar frame numbers of the detected segmentation data 63 to the training image data generation unit 157. The rest of the processing of the similar posture detection unit 156 is the same as that described with reference to FIG. 2.

The training image data generation unit 157 generates image data 64 based on the reference frame number and the similar frame numbers, and generates a training data set by assigning, as the correct label of each piece of image data 64, the correct label of the training image data 60 corresponding to the similar frame number. The training image data generation unit 157 adds the data of the generated training data set to the training data set 50. The rest of the processing of the training image data generation unit 157 is the same as that described with reference to FIG. 2.

The machine learning execution unit 158 retrains the skeletal inference model 40 using the training data set 50. For example, the machine learning execution unit 158 trains the parameters of the skeletal inference model 40 so that the output obtained when image data is input to the model approaches the correct label.

Next, an example of the processing procedure of the information processing apparatus 100 according to this embodiment will be described. FIG. 10 is a flowchart showing the processing procedure of the information processing apparatus according to this embodiment. As shown in FIG. 10, the control unit 150 of the information processing apparatus 100 acquires training image data from the training data set 50 (step S101).

The person detection unit 151 of the information processing apparatus 100 performs person detection on the training image data to generate image data (step S102). The skeleton inference unit 152 inputs the image data into the skeletal inference model 40 to infer skeletal data (step S103). The segmentation unit 153 performs segmentation on the training image data to generate segmentation data (step S104).

The abnormal skeletal data detection unit 154 of the information processing apparatus 100 executes abnormal skeletal data detection processing (step S105). The abnormal image data detection unit 155 of the information processing apparatus 100 executes abnormal image data detection processing (step S106).

The similar posture detection unit 156 of the information processing apparatus 100 executes similar posture detection processing (step S107). The training image data generation unit 157 of the information processing apparatus 100 executes training image data generation processing (step S108).

The training image data generation unit 157 registers the generated training image data and the correct labels in the training data set 50 (step S109). The machine learning execution unit 158 of the information processing apparatus 100 trains the skeletal inference model 40 using the training data set 50 (step S110).

When the processing is to be continued (step S111, Yes), the information processing apparatus 100 returns to step S101. When the processing is not to be continued (step S111, No), the information processing apparatus 100 ends the processing.
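The flow of FIG. 10 can be sketched as a single retraining round. In this sketch, every callable (is_abnormal_skeleton, is_abnormal_image, find_similar_frames, make_sample, train) is a hypothetical placeholder for units 152 to 158. Person detection, skeleton inference and segmentation (steps S101 to S104) are assumed to be folded into those callables, and the caller is assumed to repeat the round for as long as step S111 decides to continue.

```python
from typing import Callable, Dict, List


def retraining_round(
    frames: List[int],
    is_abnormal_skeleton: Callable[[int], bool],
    is_abnormal_image: Callable[[int], bool],
    find_similar_frames: Callable[[int], List[int]],
    make_sample: Callable[[int, int], Dict],
    dataset: List[Dict],
    train: Callable[[List[Dict]], None],
) -> List[Dict]:
    """One pass of steps S105 to S110; returns the samples added this round."""
    # S105: frames whose inferred skeletal data is abnormal.
    abnormal_skeletons = [f for f in frames if is_abnormal_skeleton(f)]
    # S106: of those, frames whose person region (crop) is also abnormal.
    reference_frames = [f for f in abnormal_skeletons if is_abnormal_image(f)]
    added: List[Dict] = []
    for ref in reference_frames:
        # S107: frames whose posture is similar to the reference frame.
        for sim in find_similar_frames(ref):
            # S108: new sample cut from the similar frame, framed like the reference.
            added.append(make_sample(ref, sim))
    dataset.extend(added)  # S109: register the new samples
    train(dataset)         # S110: retrain on the whole training data set
    return added
```

What the sketch makes explicit is the filtering order: only frames flagged by both S105 and S106 act as reference frames, and each of their similar frames contributes one new sample.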

Next, an example of the procedure of the abnormal skeletal data detection processing shown in step S105 of FIG. 10 will be described. FIG. 11 is a flowchart showing the procedure of the abnormal skeletal data detection processing. The abnormal skeletal data detection unit 154 of the information processing apparatus 100 acquires skeletal data (step S201).

The abnormal skeletal data detection unit 154 calculates bone lengths based on the skeletal data (step S202). If a bone length is equal to or greater than the threshold Th1 (step S203, Yes), the abnormal skeletal data detection unit 154 proceeds to step S209.

On the other hand, if the bone length is less than the threshold Th1 (step S203, No), the abnormal skeletal data detection unit 154 calculates joint angles based on the skeletal data (step S204). If a joint angle is equal to or greater than the threshold Th2 (step S205, Yes), the abnormal skeletal data detection unit 154 proceeds to step S209.

If the joint angle is less than the threshold Th2 (step S205, No), the abnormal skeletal data detection unit 154 calculates movement distances based on the skeletal data (step S206). If the movement distance is less than the threshold Th2 (step S207, No), the abnormal skeletal data detection unit 154 determines that the skeletal data is normal (step S208).

On the other hand, if the movement distance is equal to or greater than the threshold Th2 (step S207, Yes), the abnormal skeletal data detection unit 154 determines that the skeletal data is abnormal (step S209). The abnormal skeletal data detection unit 154 then outputs the frame number of the abnormal skeletal data (step S210).
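A minimal sketch of the checks in FIG. 11 follows. It applies the flowchart literally: a bone length of at least the first threshold, a joint angle of at least the second threshold, or a movement distance of at least a threshold between consecutive frames makes the frame abnormal. The parameter names th_length, th_angle and th_move, the bone and angle lists, and the definition of a joint angle as the bend between two adjacent bones (0 degrees for a straight limb) are assumptions for illustration. The text above also labels the movement threshold Th2; the sketch keeps it as a separate parameter.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Skeleton = Dict[str, Point]  # joint name -> 3-D position


def bend_angle_deg(a: Point, v: Point, c: Point) -> float:
    """Angle between bone a->v and bone v->c; 0 degrees means a straight limb."""
    u = [v[i] - a[i] for i in range(3)]
    w = [c[i] - v[i] for i in range(3)]
    nu = math.sqrt(sum(x * x for x in u))
    nw = math.sqrt(sum(x * x for x in w))
    if nu == 0.0 or nw == 0.0:
        return 0.0  # zero-length bone: treated as straight in this sketch
    cos = sum(u[i] * w[i] for i in range(3)) / (nu * nw)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def is_abnormal(skel: Skeleton, prev: Optional[Skeleton],
                bones: Sequence[Tuple[str, str]],
                angles: Sequence[Tuple[str, str, str]],
                th_length: float, th_angle: float, th_move: float) -> bool:
    # S202-S203: any bone at least th_length long.
    if any(math.dist(skel[p], skel[q]) >= th_length for p, q in bones):
        return True
    # S204-S205: any joint bent by at least th_angle degrees.
    if any(bend_angle_deg(skel[a], skel[v], skel[c]) >= th_angle
           for a, v, c in angles):
        return True
    # S206-S207: any joint moved at least th_move since the previous frame.
    if prev is not None and any(math.dist(skel[j], prev[j]) >= th_move
                                for j in skel):
        return True
    return False  # S208: normal


def abnormal_frames(skeletons: List[Skeleton],
                    bones: Sequence[Tuple[str, str]],
                    angles: Sequence[Tuple[str, str, str]],
                    th_length: float, th_angle: float,
                    th_move: float) -> List[int]:
    """S210: frame numbers (list indices here) of abnormal skeletal data."""
    return [i for i, s in enumerate(skeletons)
            if is_abnormal(s, skeletons[i - 1] if i > 0 else None,
                           bones, angles, th_length, th_angle, th_move)]
```

Because movement is measured against the immediately preceding frame, a single spiking frame can also cause the following frame to be flagged when the joints jump back.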

Next, an example of the procedure of the abnormal image data detection processing shown in step S106 of FIG. 10 will be described. FIG. 12 is a flowchart showing the procedure of the abnormal image data detection processing. As shown in FIG. 12, the abnormal image data detection unit 155 of the information processing apparatus 100 acquires the image data corresponding to the frame number of the abnormal skeletal data, together with the corresponding segmentation data (step S301).

The abnormal image data detection unit 155 detects the circumscribed bounding box (BBOX) of the person region in the segmentation data (step S302). The abnormal image data detection unit 155 compares the rectangle of the image data with the circumscribed BBOX (step S303).
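The circumscribed BBOX of step S302 can be computed directly from the segmentation labels. A minimal sketch, assuming the segmentation data is a 2-D label mask indexed as mask[y][x] in which 0 means background and any other value is a body-part label; the function name circumscribed_bbox is hypothetical.

```python
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels


def circumscribed_bbox(mask: List[List[int]]) -> Optional[BBox]:
    """Smallest box enclosing every non-background pixel of a label mask.

    Returns None when the mask contains no person pixels.
    """
    xs = [x for row in mask for x, label in enumerate(row) if label != 0]
    ys = [y for y, row in enumerate(mask) for label in row if label != 0]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)
```

Because every body-part label counts as part of the person, the box encloses all detected parts at once, which matches the idea of a rectangle circumscribing the plurality of body parts.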

If the relationship between the rectangle of the image data and the circumscribed BBOX satisfies the first normal condition (step S304, Yes), the abnormal image data detection unit 155 proceeds to step S306. On the other hand, if the first normal condition is not satisfied (step S304, No), the abnormal image data detection unit 155 determines that the image data is abnormal (cut off) (step S305) and proceeds to step S308.

If the relationship between the rectangle of the image data and the circumscribed BBOX satisfies the second normal condition (step S306, Yes), the abnormal image data detection unit 155 determines that the image data is normal (step S309).

On the other hand, if the second normal condition is not satisfied (step S306, No), the abnormal image data detection unit 155 determines that the image data is abnormal (too large) (step S307). The abnormal image data detection unit 155 then outputs the frame number of the abnormal image data (step S308).
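The flowchart of FIG. 12 reduces to two ordered checks. The first and second normal conditions are not spelled out in this part of the description, so the sketch below substitutes hypothetical ones. Under the first, the crop rectangle must fully contain the segmentation BBOX (otherwise the person is cut off). Under the second, the BBOX must cover at least a fraction min_fill of the crop area (otherwise the crop is treated as too large). Both conditions and the 0.5 default are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def classify_crop(crop: Box, person: Box, min_fill: float = 0.5) -> str:
    """Classify the person crop against the segmentation-based person box."""
    # Hypothetical first normal condition: the crop contains the whole person box.
    contains = (crop[0] <= person[0] and crop[1] <= person[1]
                and person[2] <= crop[2] and person[3] <= crop[3])
    if not contains:
        return "cut_off"    # S305
    # Hypothetical second normal condition: the person fills enough of the crop.
    if area(person) < min_fill * area(crop):
        return "too_large"  # S307
    return "normal"         # S309
```

The order matters: a crop that cuts the person off is reported as cut off even if it is also much larger than the person in the other direction.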

Next, an example of the procedure of the similar posture detection processing shown in step S107 of FIG. 10 will be described. FIG. 13 is a flowchart showing the procedure of the similar posture detection processing. The similar posture detection unit 156 of the information processing apparatus 100 acquires the segmentation data (reference data) corresponding to the frame number of the abnormal image data (step S401). The similar posture detection unit 156 calculates each partial vector of the reference data and normalizes it (step S402).

The similar posture detection unit 156 acquires segmentation data that has not yet been selected (step S403). The similar posture detection unit 156 calculates each partial vector of the acquired segmentation data and normalizes it (step S404).

The similar posture detection unit 156 compares each partial vector of the reference data with the corresponding partial vector of the segmentation data (step S405). If, for at least one partial vector, the difference in angle or size is equal to or greater than the threshold Th6 (step S406, No), the similar posture detection unit 156 determines that the segmentation data is a dissimilar frame (step S407) and proceeds to step S410.

On the other hand, if the differences in angle and size are less than the threshold Th6 for all partial vectors (step S406, Yes), the similar posture detection unit 156 determines that the segmentation data is a similar frame (step S408). The similar posture detection unit 156 outputs the similar frame number of the similar frame (step S409).

If not all of the segmentation data has been selected yet (step S410, No), the similar posture detection unit 156 returns to step S403. When all of the segmentation data has been selected (step S410, Yes), the similar posture detection unit 156 ends the processing.
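The loop of FIG. 13 can be sketched as follows. The sketch assumes that 2-D joint positions have already been derived from each frame's segmentation data, that the partial vectors run between hypothetical joint pairs such as neck to hip, and that normalization means rescaling each frame's vectors to a mean length of 1 so that people of different apparent sizes can be compared. The single threshold Th6 is split into th_angle (degrees) and th_size (normalized length) because the two differences have different units; that split is also an assumption.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[float, float]
Joints = Dict[str, Vec]  # joint name -> (x, y), assumed derived from segmentation


def partial_vectors(joints: Joints,
                    pairs: Sequence[Tuple[str, str]]) -> Optional[List[Vec]]:
    """Vectors between joint pairs, rescaled so that their mean length is 1."""
    vecs = [(joints[e][0] - joints[s][0], joints[e][1] - joints[s][1])
            for s, e in pairs]
    mean_len = sum(math.hypot(x, y) for x, y in vecs) / len(vecs)
    if mean_len == 0.0:
        return None  # all joints coincide: no usable posture
    return [(x / mean_len, y / mean_len) for x, y in vecs]


def angle_diff_deg(u: Vec, v: Vec) -> float:
    """Smallest absolute angle between the directions of u and v, in degrees."""
    d = abs(math.degrees(math.atan2(u[1], u[0]) - math.atan2(v[1], v[0]))) % 360.0
    return min(d, 360.0 - d)


def similar_frames(ref_frame: int, frames: Dict[int, Joints],
                   pairs: Sequence[Tuple[str, str]],
                   th_angle: float, th_size: float) -> List[int]:
    """Frame numbers whose every partial vector is close to the reference's."""
    ref = partial_vectors(frames[ref_frame], pairs)  # S401-S402
    if ref is None:
        return []
    result: List[int] = []
    for frame_no, joints in frames.items():  # S403, repeated until S410
        if frame_no == ref_frame:
            continue
        cand = partial_vectors(joints, pairs)  # S404
        if cand is None:
            continue
        # S405-S406: every vector must be close in both direction and size.
        if all(angle_diff_deg(r, c) < th_angle
               and abs(math.hypot(*r) - math.hypot(*c)) < th_size
               for r, c in zip(ref, cand)):
            result.append(frame_no)  # S408-S409
    return result
```

Scaling by the mean vector length makes the comparison insensitive to how large the person appears in the frame, and working with differences between joints makes it insensitive to where the person stands.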

Next, an example of the procedure of the training image data generation processing shown in step S108 of FIG. 10 will be described. FIG. 14 is a flowchart showing the procedure of the training image data generation processing. As shown in FIG. 14, the training image data generation unit 157 of the information processing apparatus 100 acquires the similar segmentation data corresponding to a similar frame number (step S501).

The training image data generation unit 157 acquires the segmentation data (reference data) corresponding to the reference frame number (step S502). The training image data generation unit 157 detects the circumscribed BBOX of the similar segmentation data (step S503).

The training image data generation unit 157 detects the circumscribed BBOX of the reference data (step S504). The training image data generation unit 157 adjusts the circumscribed BBOX of the similar segmentation data based on the aspect ratio of the circumscribed BBOX of the reference data (step S505).

The training image data generation unit 157 generates image data by cropping the training image data with the adjusted circumscribed BBOX (step S506).
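Steps S503 to S506 can be sketched as below. How the BBOX is adjusted to the reference aspect ratio is not detailed here, so the sketch assumes two things. First, the similar frame's box is only ever enlarged around its centre until its width-to-height ratio equals that of the reference box. Second, the crop is clamped to the image borders, which can change the final aspect ratio near an edge. Boxes are (x_min, y_min, x_max, y_max), the image is a row-major 2-D list, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def aspect_ratio(box: Box) -> float:
    """Width divided by height of a box."""
    return (box[2] - box[0]) / (box[3] - box[1])


def adjust_to_aspect(box: Box, target: float) -> Box:
    """Enlarge box around its centre until width / height equals target.

    Enlarging only (never shrinking) is an assumption of this sketch, so the
    person inside the original box is not cut off by the adjustment.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    if w / h < target:
        w = h * target  # too narrow: widen
    else:
        h = w / target  # too wide or equal: heighten
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut box out of image[y][x], clamping the box to the image borders."""
    height, width = len(image), len(image[0])
    x0 = max(0, math.floor(box[0]))
    y0 = max(0, math.floor(box[1]))
    x1 = min(width, math.ceil(box[2]))
    y1 = min(height, math.ceil(box[3]))
    return [row[x0:x1] for row in image[y0:y1]]
```

A typical call would be crop(image, adjust_to_aspect(similar_box, aspect_ratio(reference_box))), giving a crop of the similar frame framed in the same proportions as the reference frame.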

Next, the effects of the information processing apparatus 100 according to this embodiment will be described. The information processing apparatus 100 identifies abnormal skeletal data among a plurality of pieces of skeletal data and determines whether the image data from which the abnormal skeletal data was inferred is abnormal image data. If the image data is abnormal, the information processing apparatus 100 identifies other training image data having person features similar to those in the training image data corresponding to the abnormal image data. The information processing apparatus 100 then adjusts the person region of the identified training image data and uses the image data obtained by cropping the adjusted person region during retraining.

The information processing apparatus 100 detects abnormal skeletal data based on the distances between joint positions and the joint angles included in the skeletal data, and on the movement distance of the same joint position across consecutive pieces of skeletal data. This makes it possible to detect skeletal data in which the three-dimensional coordinates are actually disturbed.

The information processing apparatus 100 performs segmentation on the training image data to identify a plurality of body parts of the person, and identifies the circumscribed rectangle that encloses those body parts. This makes it possible to identify, separately from the person region identified by the person detection unit 151, a circumscribed rectangle of the person region based on the segmentation result.

The information processing apparatus 100 detects abnormal image data based on the image data 61 and the circumscribed rectangle derived from the segmentation data. This makes it possible to identify image data that are candidates for use in retraining.

The information processing apparatus 100 detects image data whose vectors, based on the joint positions of the person, are similar to the vectors based on the joint positions of the person included in the abnormal image data. This makes it possible to detect image data that are similar to the abnormal image data and can serve as candidates for retraining.

Next, an example of the hardware configuration of a computer that implements the same functions as the information processing apparatus 100 described in the above embodiment will be described. FIG. 15 is a diagram illustrating an example of the hardware configuration of a computer that implements the same functions as the information processing apparatus of the embodiment.

As shown in FIG. 15, the computer 200 includes a CPU 201 that executes various kinds of arithmetic processing, an input device 202 that receives data input from a user, and a display 203. The computer 200 also includes a communication device 204 that exchanges data with external devices and the like via a wired or wireless network, and an interface device 205. The computer 200 further includes a RAM 206 that temporarily stores various kinds of information, and a hard disk device 207. The devices 201 to 207 are connected to a bus 208.

The hard disk device 207 stores a person detection program 207a, a skeleton inference program 207b, a segmentation program 207c, an abnormal skeletal data detection program 207d, and an abnormal image data detection program 207e. The hard disk device 207 also stores a similar posture detection program 207f, a training image data generation program 207g, a machine learning execution program 207h, and a technique recognition program 207i. The CPU 201 reads out each of the programs 207a to 207i and loads it into the RAM 206.

The person detection program 207a functions as a person detection process 206a. The skeleton inference program 207b functions as a skeleton inference process 206b. The segmentation program 207c functions as a segmentation process 206c. The abnormal skeletal data detection program 207d functions as an abnormal skeletal data detection process 206d. The abnormal image data detection program 207e functions as an abnormal image data detection process 206e. The similar posture detection program 207f functions as a similar posture detection process 206f. The training image data generation program 207g functions as a training image data generation process 206g. The machine learning execution program 207h functions as a machine learning execution process 206h. The technique recognition program 207i functions as a technique recognition process 206i.

The processing of the person detection process 206a corresponds to the processing of the person detection unit 151. The processing of the skeleton inference process 206b corresponds to the processing of the skeleton inference unit 152. The processing of the segmentation process 206c corresponds to the processing of the segmentation unit 153. The processing of the abnormal skeletal data detection process 206d corresponds to the processing of the abnormal skeletal data detection unit 154. The processing of the abnormal image data detection process 206e corresponds to the processing of the abnormal image data detection unit 155. The processing of the similar posture detection process 206f corresponds to the processing of the similar posture detection unit 156. The processing of the training image data generation process 206g corresponds to the processing of the training image data generation unit 157. The processing of the machine learning execution process 206h corresponds to the processing of the machine learning execution unit 158. The processing of the technique recognition process 206i corresponds to the processing of the technique recognition unit 159.

Note that the programs 207a to 207i do not necessarily have to be stored in the hard disk device 207 from the beginning. For example, each program may be stored in a "portable physical medium" such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card that is inserted into the computer 200, and the computer 200 may then read and execute each of the programs 207a to 207i.

The following appendices are further disclosed with respect to the embodiments including the examples described above.

(Appendix 1) An information processing program that causes a computer to execute a process comprising:
inferring a plurality of pieces of skeletal data of a person included in a plurality of pieces of image data, based on a result of inputting, into a learning model, the plurality of pieces of image data in which a region of the person is cut out from a plurality of pieces of training image data;
detecting abnormal skeletal data based on the plurality of pieces of skeletal data;
determining whether the image data corresponding to the abnormal skeletal data is image data in which the region of the person is abnormal, based on the region of the person identified from abnormal training image data, which is the training image data corresponding to the abnormal skeletal data among the plurality of pieces of training image data, and on the region of the person in the image data corresponding to the abnormal skeletal data;
identifying, when the image data corresponding to the abnormal skeletal data is abnormal image data, similar training image data from the plurality of pieces of training image data, the similar training image data having joint position features of the person similar to the joint position features of the person identified from the abnormal training image data;
adjusting the region of the person identified from the similar training image data, based on the region of the person identified from the abnormal training image data; and
training the learning model based on image data obtained by cutting out the adjusted region of the person from the similar training image data.

(Appendix 2) The information processing program according to Appendix 1, wherein the process of detecting the abnormal skeletal data detects the abnormal skeletal data based on the distances between joint positions included in the skeletal data, the joint angles, and the movement distance of the same joint position across consecutive pieces of skeletal data.

(Appendix 3) The information processing program according to Appendix 1, further causing the computer to execute a process of performing segmentation on the plurality of pieces of training image data to identify a plurality of body parts of the person, and identifying a circumscribed rectangle that circumscribes the plurality of body parts as the region of the person.

(Appendix 4) The information processing program according to Appendix 3, wherein the process of determining whether the image data is abnormal identifies the abnormal image data based on a result of comparing the circumscribed rectangle obtained from the segmentation result of the abnormal training image data with the region of the person in the image data corresponding to the abnormal skeletal data.

(Appendix 5) The information processing program according to Appendix 4, wherein the process of identifying the similar training image data identifies, based on the segmentation result, a first joint position of the person included in the abnormal training image data and a second joint position of the person included in comparison-target training image data, and identifies the comparison-target training image data as the similar training image data when a vector based on the first joint position and a vector based on the second joint position are similar.

(Appendix 6) An information processing method in which a computer executes a process comprising:
inferring a plurality of pieces of skeletal data of a person included in a plurality of pieces of image data, based on a result of inputting, into a learning model, the plurality of pieces of image data in which a region of the person is cut out from a plurality of pieces of training image data;
detecting abnormal skeletal data based on the plurality of pieces of skeletal data;
determining whether the image data corresponding to the abnormal skeletal data is image data in which the region of the person is abnormal, based on the region of the person identified from abnormal training image data, which is the training image data corresponding to the abnormal skeletal data among the plurality of pieces of training image data, and on the region of the person in the image data corresponding to the abnormal skeletal data;
identifying, when the image data corresponding to the abnormal skeletal data is abnormal image data, similar training image data from the plurality of pieces of training image data, the similar training image data having joint position features of the person similar to the joint position features of the person identified from the abnormal training image data;
adjusting the region of the person identified from the similar training image data, based on the region of the person identified from the abnormal training image data; and
training the learning model based on image data obtained by cutting out the adjusted region of the person from the similar training image data.

(Appendix 7) The information processing method according to Appendix 6, wherein the process of detecting the abnormal skeletal data detects the abnormal skeletal data based on the distances between joint positions included in the skeletal data, the joint angles, and the movement distance of the same joint position across consecutive pieces of skeletal data.

(Appendix 8) The information processing method according to Appendix 6, further causing the computer to execute a process of performing segmentation on the plurality of pieces of training image data to identify a plurality of body parts of the person, and identifying a circumscribed rectangle that circumscribes the plurality of body parts as the region of the person.

(Appendix 9) The information processing method according to Appendix 8, wherein the process of determining whether the image data is abnormal identifies the abnormal image data based on a result of comparing the circumscribed rectangle obtained from the segmentation result of the abnormal training image data with the region of the person in the image data corresponding to the abnormal skeletal data.

(Appendix 10) The information processing method according to Appendix 9, wherein the process of identifying the similar training image data identifies, based on the segmentation result, a first joint position of the person included in the abnormal training image data and a second joint position of the person included in comparison-target training image data, and identifies the comparison-target training image data as the similar training image data when a vector based on the first joint position and a vector based on the second joint position are similar.

(Appendix 11) An information processing apparatus comprising a control unit that executes a process comprising:
inferring a plurality of pieces of skeletal data of a person included in a plurality of pieces of image data, based on a result of inputting, into a learning model, the plurality of pieces of image data in which a region of the person is cut out from a plurality of pieces of training image data;
detecting abnormal skeletal data based on the plurality of pieces of skeletal data;
determining whether the image data corresponding to the abnormal skeletal data is image data in which the region of the person is abnormal, based on the region of the person identified from abnormal training image data, which is the training image data corresponding to the abnormal skeletal data among the plurality of pieces of training image data, and on the region of the person in the image data corresponding to the abnormal skeletal data;
identifying, when the image data corresponding to the abnormal skeletal data is abnormal image data, similar training image data from the plurality of pieces of training image data, the similar training image data having joint position features of the person similar to the joint position features of the person identified from the abnormal training image data;
adjusting the region of the person identified from the similar training image data, based on the region of the person identified from the abnormal training image data; and
training the learning model based on image data obtained by cutting out the adjusted region of the person from the similar training image data.

(Appendix 12) The information processing apparatus according to Appendix 11, wherein the process of detecting the abnormal skeletal data executed by the control unit detects the abnormal skeletal data based on the distances between joint positions included in the skeletal data, the joint angles, and the movement distance of the same joint position across consecutive pieces of skeletal data.

(Appendix 13) The information processing apparatus according to Appendix 11, wherein the control unit further executes a process of performing segmentation on the plurality of pieces of training image data to identify a plurality of body parts of the person, and identifying a circumscribed rectangle that circumscribes the plurality of body parts as the region of the person.

(Appendix 14) The information processing apparatus according to Appendix 13, wherein the process of determining whether the image data is abnormal, executed by the control unit, identifies the abnormal image data based on a result of comparing the circumscribed rectangle obtained from the segmentation result of the abnormal training image data with the region of the person in the image data corresponding to the abnormal skeletal data.

(Appendix 15) The information processing apparatus according to Appendix 14, wherein the process of identifying the similar training image data, executed by the control unit, identifies, based on the segmentation result, a first joint position of the person included in the abnormal training image data and a second joint position of the person included in comparison-target training image data, and identifies the comparison-target training image data as the similar training image data when a vector based on the first joint position and a vector based on the second joint position are similar.

100 Information processing apparatus
110 Communication unit
120 Input unit
130 Display unit
140 Storage unit
141 Technique recognition table
150 Control unit
151 Person detection unit
152 Skeleton inference unit
153 Segmentation unit
154 Abnormal skeletal data detection unit
155 Abnormal image data detection unit
156 Similar posture detection unit
157 Training image data generation unit
158 Machine learning execution unit
159 Technique recognition unit

Claims

An information processing program that causes a computer to execute a process comprising:
inferring a plurality of pieces of skeletal data of a person included in a plurality of pieces of image data, based on a result of inputting into a learning model the plurality of pieces of image data, in which a region of the person is cut out from a plurality of pieces of training image data;
detecting abnormal skeletal data based on the plurality of pieces of skeletal data;
determining whether the image data corresponding to the abnormal skeletal data is abnormal image data, in which the region of the person is abnormal, based on the region of the person identified from abnormal training image data, which is the training image data corresponding to the abnormal skeletal data among the plurality of pieces of training image data, and on the region of the person in the image data corresponding to the abnormal skeletal data;
when the image data corresponding to the abnormal skeletal data is the abnormal image data, identifying, from the plurality of pieces of training image data, similar training image data having joint-position characteristics of a person that are similar to the joint-position characteristics of the person identified from the abnormal training image data;
adjusting the region of the person identified from the similar training image data, based on the region of the person identified from the abnormal training image data; and
training the learning model based on image data obtained by cutting out the adjusted region of the person from the similar training image data.
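The claim does not specify how the region of the person in the similar training image data is adjusted. One plausible reading, used here only as an assumption, is that the defect seen in the abnormal crop (for example, a cut-off lower body) is reproduced on the similar image so that the model learns from comparable crops. The sketch expresses each edge offset of the abnormal person region relative to its segmentation rectangle, applies the same relative offsets to the similar image's rectangle, and crops the image. The function names `transfer_region` and `crop` are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def transfer_region(abn_seg: Rect, abn_region: Rect, sim_seg: Rect) -> Rect:
    """Apply the abnormal crop's relative edge offsets to another person's box."""
    aw, ah = abn_seg[2] - abn_seg[0], abn_seg[3] - abn_seg[1]
    sw, sh = sim_seg[2] - sim_seg[0], sim_seg[3] - sim_seg[1]
    if aw <= 0 or ah <= 0:
        raise ValueError("segmentation rectangle must have positive size")
    # Offset of each edge, normalized by the abnormal person's width or height.
    rel = (
        (abn_region[0] - abn_seg[0]) / aw,
        (abn_region[1] - abn_seg[1]) / ah,
        (abn_region[2] - abn_seg[2]) / aw,
        (abn_region[3] - abn_seg[3]) / ah,
    )
    return (
        round(sim_seg[0] + rel[0] * sw),
        round(sim_seg[1] + rel[1] * sh),
        round(sim_seg[2] + rel[2] * sw),
        round(sim_seg[3] + rel[3] * sh),
    )


def crop(image: Sequence[Sequence[int]], rect: Rect) -> List[List[int]]:
    """Cut the rectangle out of a row-major image, clipped to the image bounds."""
    h = len(image)
    w = len(image[0]) if h else 0
    x1, y1 = max(0, rect[0]), max(0, rect[1])
    x2, y2 = min(w, rect[2]), min(h, rect[3])
    return [list(row[x1:x2]) for row in image[y1:y2]]
```

For instance, if the abnormal crop kept only the upper half of its segmentation rectangle, the transferred region on a similar image likewise keeps only the upper half of that person, and the resulting crop would then be added to the training data.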

The information processing program according to claim 1, wherein the process of detecting the abnormal skeletal data detects the abnormal skeletal data based on the distances between joint positions and the joint angles included in the skeletal data, and on the movement distance of the same joint position across consecutive pieces of skeletal data.

The information processing program according to claim 1, wherein the computer further executes a process of performing segmentation on the plurality of pieces of training image data to identify a plurality of parts of the person, and identifying a circumscribed rectangle that circumscribes the plurality of parts as the region of the person.

The information processing program according to claim 3, wherein the process of determining whether the image data is the abnormal image data identifies the abnormal image data based on a result of comparing the circumscribed rectangle obtained from the segmentation result of the abnormal training image data with the region of the person in the image data corresponding to the abnormal skeletal data.

The information processing program according to claim 4, wherein the process of identifying the similar training image data identifies, based on the segmentation result, a first joint position of the person included in the abnormal training image data and a second joint position of the person included in comparison-target training image data, and, when a vector based on the first joint position and a vector based on the second joint position are similar, identifies the comparison-target training image data as the similar training image data.

An information processing method in which a computer executes a process comprising:
inferring a plurality of pieces of skeletal data of a person included in a plurality of pieces of image data, based on a result of inputting into a learning model the plurality of pieces of image data, in which a region of the person is cut out from a plurality of pieces of training image data;
detecting abnormal skeletal data based on the plurality of pieces of skeletal data;
determining whether the image data corresponding to the abnormal skeletal data is abnormal image data, in which the region of the person is abnormal, based on the region of the person identified from abnormal training image data, which is the training image data corresponding to the abnormal skeletal data among the plurality of pieces of training image data, and on the region of the person in the image data corresponding to the abnormal skeletal data;
when the image data corresponding to the abnormal skeletal data is the abnormal image data, identifying, from the plurality of pieces of training image data, similar training image data having joint-position characteristics of a person that are similar to the joint-position characteristics of the person identified from the abnormal training image data;
adjusting the region of the person identified from the similar training image data, based on the region of the person identified from the abnormal training image data; and
training the learning model based on image data obtained by cutting out the adjusted region of the person from the similar training image data.

An information processing apparatus comprising a control unit that executes a process comprising:
inferring a plurality of pieces of skeletal data of a person included in a plurality of pieces of image data, based on a result of inputting into a learning model the plurality of pieces of image data, in which a region of the person is cut out from a plurality of pieces of training image data;
detecting abnormal skeletal data based on the plurality of pieces of skeletal data;
determining whether the image data corresponding to the abnormal skeletal data is abnormal image data, in which the region of the person is abnormal, based on the region of the person identified from abnormal training image data, which is the training image data corresponding to the abnormal skeletal data among the plurality of pieces of training image data, and on the region of the person in the image data corresponding to the abnormal skeletal data;
when the image data corresponding to the abnormal skeletal data is the abnormal image data, identifying, from the plurality of pieces of training image data, similar training image data having joint-position characteristics of a person that are similar to the joint-position characteristics of the person identified from the abnormal training image data;
adjusting the region of the person identified from the similar training image data, based on the region of the person identified from the abnormal training image data; and
training the learning model based on image data obtained by cutting out the adjusted region of the person from the similar training image data.