JP2020201772A

JP2020201772A - Attitude analysis program and attitude analyzer

Info

Publication number: JP2020201772A
Application number: JP2019108981A
Authority: JP
Inventors: Masayuki Kawamata (川俣 昌之); Tsutomu Matsuno (松野 強)
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2019-06-11
Filing date: 2019-06-11
Publication date: 2020-12-17
Anticipated expiration: 2039-06-11
Also published as: JP6825041B2

Abstract

To analyze the attitude of a worker at low cost with high accuracy.

SOLUTION: An analyzer 10 comprises: a skeleton extraction unit 11 for acquiring, by image recognition using image data 18 as input, skeleton data 19 that includes feature point data that indicates the joint positions of a person reflected in the image data 18; a storage unit of an attitude model 22b associated with an attitude label for each skeleton data 19; an attitude estimation unit 13b for determining, on the basis of the skeleton data 19 acquired by the skeleton extraction unit 11, the attitude of the person reflected in the image data 18 from the attitude label predetermined for the attitude model 22b; and an output unit 14 for marking the feature point data of the skeleton data 19 acquired by the skeleton extraction unit 11 on the basis of the result of determination by the attitude estimation unit 13b and displaying the resultant data.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a posture analysis program and a posture analyzer.

Measuring human working time is essential at manufacturing sites that involve manual work, as well as in product production planning and so-called on-site improvement. For example, at a production site that involves assembly work, production plans are drawn up based on the standard time (ST) required for the various assembly tasks. In work improvement, the goal is to reduce deviation from the standard work.

Working time is generally measured by using, as a trigger, some operation in which a person indicates the start and end of the work, such as operating a PC, reading a barcode, or pressing a button. Alternatively, working time may be measured by extracting data from equipment indirectly involved in the work, such as a drill being switched ON/OFF, a switch being turned ON/OFF, or a current value indicating that a device is operating.
However, these measuring means lead to additional equipment and a greater burden on workers. Because they are procedures outside the workers' actual duties, they are often not carried out in practice, and the data often cannot be captured accurately.

For this reason, it has become common to record a worker's activity with a video camera and analyze it manually, and computer systems or programs are used for that purpose. However, analyzing and recording the situation of a specific worker from long video recordings takes a long time and places a heavy burden on the analyst, so reducing that burden by automating the video analysis is desired.
Patent Documents 1 and 2 therefore propose methods that analyze work automatically by computer image recognition instead of visual analysis by an analyst.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-16226
Patent Document 2: International Publication No. 2019/003355

To improve the accuracy of image recognition, it is common to prepare a large amount of supervised learning data in advance and to generate a highly accurate model by machine learning on that data. However, at a manufacturing site where many workers are employed, preparing learning data for each individual worker is a heavy burden.
The method of Patent Document 1 therefore extracts the positions of the head and hands of the worker shown in the image as feature amounts and identifies the work from those feature amounts, which provides a general-purpose method that does not depend on the physique, gender, and so on of individual workers.

On the other hand, tracing only the positions of the worker's head and hands may not be enough to narrow down the work content. For example, even when the hands are close to the floor, the work content can be identified more accurately by analyzing the worker's intention in detail, such as whether the worker is merely crouching or is lifting a load from the floor.
However, conventional automatic recognition such as that of Patent Documents 1 and 2 has not proposed such a detailed recognition model.

The main object of the present invention is therefore to analyze the posture of a worker at low cost and with high accuracy.

To solve the above problem, the posture analysis program of the present invention has the following features.
The posture analysis program causes a computer to function as:
a skeleton extraction unit that acquires, by image recognition using image data as input, skeleton data including feature point data indicating the joint positions of a person appearing in the image data;
a storage unit of a posture model in which a posture label is associated with each item of skeleton data; and
a posture estimation unit that determines, based on the skeleton data acquired by the skeleton extraction unit, the posture of the person appearing in the image data from the posture labels predetermined in the posture model.
Other means will be described later.

According to the present invention, the posture of a worker can be analyzed at low cost and with high accuracy.

The following drawings all relate to one embodiment of the present invention.
FIG. 1 is a configuration diagram of the work analysis system.
FIG. 2 is a sequence diagram showing the operation of the work analysis system.
FIG. 3 is a diagram showing an example of image data and skeleton data.
FIG. 4 is a table showing the feature point data constituting the skeleton data of FIG. 3.
FIG. 5 is a configuration diagram showing the processing units related to areas.
FIG. 6 is a configuration diagram showing the processing units related to posture.
FIG. 7 is a configuration diagram showing the processing units related to the background.
FIG. 8 is a configuration diagram showing the processing units related to procedures and the output unit that outputs their processing results.
FIG. 9 is a flowchart showing model definition by the background definition unit.
FIG. 10 is a diagram showing image data that is the target of model definition.
FIG. 11 is a diagram showing an example of an area model generated from the image data of FIG. 10.
FIG. 12 is a diagram showing an example of a background model generated from the image data of FIG. 10.
FIG. 13 is a flowchart showing model definition by the posture learning unit.
FIG. 14 is a GUI screen in the learning process of the posture estimation unit of FIG. 13.
FIG. 15 is a diagram showing the posture model generated as a result of the learning process of the posture estimation unit of FIG. 13.
FIG. 16 is a diagram showing a procedure model, which is the learning result of the procedure learning unit.
FIG. 17 is a flowchart showing the main processing of the analysis unit.
FIG. 18 is a flowchart showing the subroutine processing of the area estimation unit.
FIG. 19 is a diagram showing both hands recognized in the "parts picking area" as a processing result of FIG. 18.
FIG. 20 is a diagram showing both hands recognized in the "finished product storage area" as a processing result of FIG. 18.
FIG. 21 is a flowchart showing the subroutine processing of the posture estimation unit.
FIG. 22 is a diagram showing the image data used in the processing of FIG. 21.
FIG. 23 is a diagram of posture data showing the inference labels (posture labels) for the image data of FIG. 22.
FIG. 24 is a flowchart showing the subroutine processing of the background estimation unit.
FIG. 25 is a diagram showing the state in which the screwdriver is recognized as unused, as a processing result of FIG. 24.
FIG. 26 is a diagram showing the state in which the screwdriver is recognized as in use, as a processing result of FIG. 24.
FIG. 27 is a diagram showing an example of the procedure data output by the procedure estimation unit.
FIG. 28 is a screen displaying the procedure data of FIG. 27 in Gantt chart format.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a configuration diagram of the work analysis system. The following describes an example in which this work analysis system is introduced at an assembly site in the manufacturing industry and applied to analyzing the work of a worker assembling a small personal computer.
The work analysis system is built around the analyzer 10 and includes a video camera 31, a video recorder 32, an input/output device 33, a monitor 41, a storage device 42, and an application device 43.
The devices of the work analysis system are connected to one another by a network such as Ethernet (registered trademark), by USB, or by any other suitable hardware interface. Each device of the work analysis system may be configured as a standalone device, or may be realized by executing software on a computer system such as the analyzer 10.

The video camera 31 shoots the worker as its subject. The video recorder 32 stores the video shot by the video camera 31. The input/output device 33 includes a graphic display and a mouse; it displays information to users such as workers and accepts their instructions.
The monitor 41, the storage device 42, and the application device 43 are each output destinations for the analysis results of the analyzer 10 (see FIG. 8 for details).

The analyzer 10 is a computer system such as an on-premises server or a cloud server. The analyzer 10 is configured as a computer having a CPU (Central Processing Unit), memory, storage means (a storage unit) such as a hard disk, and a network interface. In this computer, the CPU executes a program (also called an application, or "app" for short) loaded into memory, which operates a control unit (control means) made up of the individual processing units.
By executing the program on the computer system, the analyzer 10 forms a skeleton extraction unit 11, a model generation unit 12, an analysis unit 13, and an output unit 14. These processing units access data (model data 22 and estimation result data 23) stored in non-volatile memory such as a hard disk.

The skeleton extraction unit 11 extracts skeleton data 19 from the image data 18 input from the video camera 31 or the video recorder 32.
The model generation unit 12 takes learning image data 18a (image data 18) and learning skeleton data 19a (skeleton data 19) as input, generates model data 22, and stores it in non-volatile memory. The model data 22 includes definition data explicitly defined by the user and learned data, which is the result of learning with label data input by the user. The model generation unit 12 basically needs to create the model data 22 only once for the work to be analyzed, but it may update (improve) model data 22 that has already been created in order to improve accuracy.
The analysis unit 13 takes analysis image data 18b (image data 18) and analysis skeleton data 19b (skeleton data 19) as input and obtains estimation result data 23 by inference processing using the model data 22.
The output unit 14 outputs the estimation result data 23 to external devices (the monitor 41, the storage device 42, and the application device 43).

FIG. 2 is a sequence diagram showing the operation of the work analysis system.
In the machine learning stage (for example, deep learning), image data 18 is input to the analyzer 10. This is either image data 18 acquired directly from the video camera 31 (S101), or image data 18 acquired from the video camera 31 (S102) and recorded by the video recorder 32 as a recorded image 32D (S103).
In response to a learning instruction (S111) received from the user via the input/output device 33, the analyzer 10 executes the learning process (S112) and outputs the result as model data 22.

In the analysis stage, image data 18 is likewise input to the analyzer 10. This is either image data 18 acquired directly from the video camera 31 (S121), or image data 18 acquired from the video camera 31 (S122) and recorded by the video recorder 32 as a recorded image 32E (S123).
In response to an analysis instruction (S131) received from the user via the input/output device 33, the analyzer 10 executes an analysis process (S132) based on the model data 22 and outputs the result as estimation result data 23. The analyzer 10 may execute the analysis process (S132) in real time on the image data 18 as it is acquired (S121). The analyzer 10 may also execute the analysis process (S132) automatically, without the user issuing an analysis instruction (S131).
The output unit 14 of the analyzer 10 then outputs the estimation result data 23 to the application device 43 or another destination through the output process (S141).

FIG. 3 is a diagram showing an example of image data 18 and skeleton data 19.
In a moving image in which people appear, one item of image data 18 is generated per person and per image frame.
The skeleton data 19 is the result of the skeleton extraction unit 11 extracting a person's skeleton information from the image data 18. In the skeleton data 19, a number is assigned to each feature point (joint point, etc.) of the person (numbers 0 to 9 in the figure). The skeleton extraction unit 11 can use a known skeleton information acquisition technique such as OpenPose (URL = https://github.com/CMU-Perceptual-Computing-Lab/openpose).

FIG. 4 is a table showing the feature point data constituting the skeleton data 19 of FIG. 3.
For each feature point number, the table holds the name of the feature point (nose, etc.) and its (x, y) coordinates. Separate numbers are assigned to feature points such as the person's neck, left shoulder, and left elbow. The names and coordinates of the feature points are the result of the skeleton extraction unit 11 recognizing each joint point in the image data 18 by image recognition.
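The feature point table can be pictured as a small keyed data structure. The following is a minimal sketch, not the format used by the patent or by OpenPose: the `FeaturePoint` class, the `build_skeleton` helper, and all coordinate values are hypothetical. Numbers 4 (right wrist) and 7 (left wrist) follow the description of FIG. 11; assigning 0 to the nose and 1 to the neck is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class FeaturePoint:
    """One row of the feature point table: number, name, (x, y) coordinates."""
    number: int
    name: str
    x: float
    y: float


def build_skeleton(rows: Iterable[Tuple[int, str, float, float]]) -> Dict[int, FeaturePoint]:
    """Index the feature points of one person in one frame by their number."""
    return {number: FeaturePoint(number, name, x, y) for number, name, x, y in rows}


# Hypothetical sample rows; the coordinates are made up for illustration.
sample_rows = [
    (0, "nose", 320.0, 110.0),
    (1, "neck", 320.0, 170.0),
    (4, "right wrist", 250.0, 330.0),
    (7, "left wrist", 390.0, 335.0),
]
skeleton = build_skeleton(sample_rows)
print(skeleton[4].name, skeleton[4].x, skeleton[4].y)
```

Keying by feature point number makes later lookups such as "is feature point 4 inside this area" a single dictionary access.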

The model generation unit 12 and the analysis unit 13 are described in detail below with reference to FIGS. 5 to 8. The analyzer 10 obtains the final analysis result (4) based on the intermediate analysis results (1) to (3) below.
(1) "Area" analysis determines whether the worker's body, as indicated by the skeleton data 19, is inside an area defined in advance within the image data 18 (see FIG. 5 for details).
(2) "Posture" analysis determines what posture the worker's body, as indicated by the skeleton data 19, is in (see FIG. 6 for details).
(3) "Background" analysis determines the state inside a background area defined in advance within the image data 18 (see FIG. 7 for details).
(4) "Procedure" analysis determines, from the combination of the "area, posture, and background" analysis results, which procedure of the assembly work the worker in the image data 18 is performing (see FIG. 8 for details).
The output unit 14 may output the final analysis result (4), or may output at least one of the intermediate analysis results (1) to (3).

FIG. 5 is a configuration diagram showing the processing units related to areas.
The area definition unit 12a of the model generation unit 12 has the user define an area on the image data 18 as polygon (quadrilateral) coordinate data via the input/output device 33, and saves that definition data as the area model 22a of the model data 22.
The area estimation unit 13a of the analysis unit 13 uses the saved area model 22a and the skeleton data 19 to analyze whether the feature points of the skeleton data 19 are inside the defined area, and outputs the analysis result (person work state) as the area data 23a of the estimation result data 23.

The area definition unit 12a may or may not use machine learning such as deep learning. Machine learning is highly accurate and versatile, but it requires a huge amount of image data 18 and the learning takes considerable effort. Also, because skill in deep learning is required, it is not something that the staff in charge of production management at a manufacturing site can readily use. Having the user define the area model 22a directly, instead of using machine learning, therefore reduces the burden on the staff at the manufacturing site.

FIG. 6 is a configuration diagram showing the processing units related to posture.
The posture learning unit 12b of the model generation unit 12 displays the skeleton data 19 extracted by the skeleton extraction unit 11 and accepts a correct-answer label (posture label) from the user who views that display. The posture learning unit 12b learns the combinations of skeleton data 19 and posture labels and saves the learning result as the posture model 22b of the model data 22.
The posture estimation unit 13b of the analysis unit 13 uses the saved posture model 22b and the skeleton data 19 to analyze the posture of the person in the skeleton data 19, and outputs the analysis result (person work state) as the posture data 23b of the estimation result data 23.
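The learn-then-estimate flow for posture can be sketched with a deliberately simple stand-in. The text does not specify the learning algorithm for the posture model 22b, so this sketch substitutes a nearest-neighbor lookup on feature point coordinates taken relative to a reference point. The `PostureModel` class, the `to_vector` helper, the posture labels, and all coordinates are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Skeleton = Dict[int, Tuple[float, float]]  # feature point number -> (x, y)


def to_vector(skeleton: Skeleton, numbers: List[int]) -> List[float]:
    """Flatten the chosen feature points into offsets from the first one,
    so the vector does not depend on where the person stands in the frame."""
    origin_x, origin_y = skeleton[numbers[0]]
    vector: List[float] = []
    for number in numbers:
        x, y = skeleton[number]
        vector.extend([x - origin_x, y - origin_y])
    return vector


class PostureModel:
    """Nearest-neighbor stand-in for a learned posture model (an assumption)."""

    def __init__(self, numbers: List[int]) -> None:
        self.numbers = numbers
        self.samples: List[Tuple[List[float], str]] = []

    def learn(self, skeleton: Skeleton, posture_label: str) -> None:
        """Store one (skeleton data, posture label) pair supplied by the user."""
        self.samples.append((to_vector(skeleton, self.numbers), posture_label))

    def estimate(self, skeleton: Skeleton) -> str:
        """Return the label of the closest stored skeleton."""
        if not self.samples:
            raise ValueError("no labeled skeleton data has been learned")
        vector = to_vector(skeleton, self.numbers)
        return min(self.samples, key=lambda s: math.dist(s[0], vector))[1]


model = PostureModel(numbers=[1, 4, 7])  # neck, right wrist, left wrist
model.learn({1: (100, 100), 4: (60, 200), 7: (140, 200)}, "hands lowered")
model.learn({1: (100, 100), 4: (60, 120), 7: (140, 120)}, "hands raised")
print(model.estimate({1: (300, 50), 4: (262, 148), 7: (338, 152)}))
```

A production system would more likely train a classifier on many labeled frames; the point here is only the split between a labeling and learning step and a later inference step.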

FIG. 7 is a configuration diagram showing the processing units related to the background.
The background definition unit 12c of the model generation unit 12 has the user define a background area on the image data 18 as polygon (quadrilateral) coordinate data via the input/output device 33, and accepts a correct-answer label (background label) from the user who views the image content inside that background area. The background definition unit 12c learns the combinations of background area and background label and saves the learning result as the background model 22c of the model data 22.
The background estimation unit 13c of the analysis unit 13 uses the saved background model 22c and the image data 18 to analyze the image content inside the background area of the image data 18, and outputs the analysis result (equipment work state) as the background data 23c of the estimation result data 23.

FIG. 8 is a configuration diagram showing the processing units related to procedures and the output unit 14 that outputs their processing results.
The model data 22, which includes the area model 22a, posture model 22b, background model 22c, and procedure model 22d, and the estimation result data 23, which includes the area data 23a, posture data 23b, background data 23c, and procedure data 23d, are each stored in the storage unit 20 of the analyzer 10.
The procedure estimation unit 13d combines the intermediate analysis results in the estimation result data 23 (area data 23a, posture data 23b, and background data 23c) to determine the worker's final procedure data 23d (work state). Even if one of the three kinds of intermediate analysis results is estimated incorrectly, the final accuracy improves because the remaining two are estimated correctly.

To determine the procedure data 23d, the procedure estimation unit 13d needs the procedure model 22d, which is model data 22 for obtaining the procedure data 23d from a combination of the intermediate model data 22 (the area model 22a, posture model 22b, and background model 22c).
The procedure learning unit 12d therefore displays the combination of intermediate analysis results for "area, posture, and background" and accepts a correct-answer label (procedure label) from the user who views that display. The procedure learning unit 12d learns the combinations of intermediate analysis results and procedure labels and saves the learning result as the procedure model 22d of the model data 22. Combining learning and inference based on machine learning methods in this way allows efficient analysis in a shorter time.
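One way to picture how three intermediate results yield a procedure label, and why one wrong intermediate result need not spoil the outcome, is a best-match lookup. This is a sketch under stated assumptions rather than the learning method of the patent: the `ProcedureModel` class, the rule of requiring at least two matching fields, and every label string are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Observation(NamedTuple):
    """Intermediate analysis results for one frame."""
    area: str
    posture: str
    background: str


class ProcedureModel:
    """Stores labeled combinations and answers with the best-matching one."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Observation, str]] = []

    def learn(self, observation: Observation, procedure_label: str) -> None:
        """Record one combination together with the user's procedure label."""
        self.entries.append((observation, procedure_label))

    def estimate(self, observation: Observation, min_matches: int = 2) -> Optional[str]:
        """Pick the entry agreeing on the most fields; None if too few agree."""
        best_label, best_score = None, 0
        for known, label in self.entries:
            score = sum(a == b for a, b in zip(known, observation))
            if score > best_score:
                best_label, best_score = label, score
        return best_label if best_score >= min_matches else None


procedure_model = ProcedureModel()
procedure_model.learn(
    Observation("parts picking area", "reaching right", "screwdriver unused"),
    "take part",
)
procedure_model.learn(
    Observation("none", "working at desk", "screwdriver in use"),
    "tighten screws",
)

# The background result is wrong here, but area and posture still agree.
noisy = Observation("none", "working at desk", "screwdriver unused")
print(procedure_model.estimate(noisy))
```

With `min_matches=2`, a frame in which only one of the three intermediate results matches any learned combination yields `None` rather than a guess.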

The output calculation unit 14p of the output unit 14 receives notification of the estimation result data 23 and has the processing exemplified below executed so that the data takes the form required by each output destination.
- The HTML output unit 14a converts the estimation result data 23 into HTML format (for browser display) and outputs it to the monitor 41.
- The CSV output unit 14b converts the estimation result data 23 into a CSV format file and outputs it to the storage device 42.
- The socket communication unit 14c outputs the estimation result data 23 to the application device 43 by socket communication.
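Two of these conversions can be sketched with the Python standard library. The record layout is an assumption: the field names `frame` and `procedure`, the sample rows, and the function names `to_csv` and `to_html` are hypothetical, and the socket output is omitted.

```python
import csv
import html
import io
from typing import Dict, List, Union

Row = Dict[str, Union[int, str]]
FIELDS = ["frame", "procedure"]  # hypothetical columns of the estimation result


def to_csv(rows: List[Row]) -> str:
    """Serialize estimation results as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_html(rows: List[Row]) -> str:
    """Render estimation results as a simple HTML table for browser display."""
    header = "".join(f"<th>{name}</th>" for name in FIELDS)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row[name]))}</td>" for name in FIELDS)
        + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


results: List[Row] = [
    {"frame": 1, "procedure": "take part"},
    {"frame": 2, "procedure": "tighten screws"},
]
print(to_csv(results))
print(to_html(results))
```

Escaping the cell text with `html.escape` keeps label strings that contain characters such as `<` from breaking the table markup.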

Examples of the model generation unit 12 are described below with reference to FIGS. 9 to 16.
FIG. 9 is a flowchart showing model definition by the background definition unit 12c.
In S301, the background definition unit 12c acquires the image data 18 of a frame selected through the GUI (Graphical User Interface).
In S302, the background definition unit 12c accepts the input of a background label for the selected frame.
In S303, the background definition unit 12c crops the image data of the background area, which is the part of the selected frame's image data 18 defined by the polygon (quadrilateral) coordinate data.
In S304, the background definition unit 12c holds the combination of the image data from S303 and the background label from S302 as the learning data for the selected frame.

In S305, the background definition unit 12c returns the processing to S301 if an unprocessed frame remains.
In S306, the background definition unit 12c executes machine learning with the learning data held in S304 as input. Known techniques such as neural networks (including deep learning) and ensemble learning can be used for the machine learning.
In S307, the background definition unit 12c saves the learning result of S306 as the background model 22c.
Through the processing of S301 to S307 above, the background definition unit 12c defines the background model 22c from the image data 18.
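The S301 to S307 loop can be sketched as follows. Two simplifications are assumptions and are not taken from the patent: the background area is an axis-aligned rectangle rather than a general quadrilateral, and the machine learning of S306 is replaced by a stand-in that stores the mean brightness per background label. Images are plain nested lists of grayscale values, and all function names and labels are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # grayscale pixels, image[row][column]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """S303: cut the background area out of one frame."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def mean_brightness(patch: Image) -> float:
    """Average pixel value of a cropped patch."""
    pixels = [value for row in patch for value in row]
    return sum(pixels) / len(pixels)


def define_background_model(frames: List[Image], labels: List[str], box: Box) -> Dict[str, float]:
    # S301 to S305: pair each selected frame's cropped area with its label.
    learning_data = [(crop(frame, box), label) for frame, label in zip(frames, labels)]
    # S306 (stand-in for machine learning): average brightness per label.
    totals: Dict[str, Tuple[float, int]] = {}
    for patch, label in learning_data:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + mean_brightness(patch), count + 1)
    # S307: the returned mapping plays the role of the saved background model.
    return {label: total / count for label, (total, count) in totals.items()}


def estimate_background(model: Dict[str, float], frame: Image, box: Box) -> str:
    """Choose the label whose learned brightness is closest to this frame's."""
    value = mean_brightness(crop(frame, box))
    return min(model, key=lambda label: abs(model[label] - value))


bright = [[200] * 4 for _ in range(4)]
dark = [[20] * 4 for _ in range(4)]
area_box = (1, 1, 3, 3)
background_model = define_background_model(
    [bright, dark], ["screwdriver unused", "screwdriver in use"], area_box
)
print(estimate_background(background_model, [[180] * 4 for _ in range(4)], area_box))
```

Mean brightness is far too crude for real footage; it only marks the place where a trained image classifier would sit.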

FIG. 10 is a diagram showing image data 18 that is the target of model definition.
The area definition unit 12a uses the GUI of the input/output device 33 to have the user define areas 101 and 102 on the image data 18 for the area model 22a. For example, the parts picking area is placed to the right of the worker's chair (area 101), and the finished product storage area is placed to the left (area 102).
The background definition unit 12c uses the GUI of the input/output device 33 to have the user define background area 103 on the image data 18 for the background model 22c. For example, the screwdriver area, where the screwdriver is kept, is placed to the right of the worker's chair (area 103).

FIG. 11 is a diagram showing an example of the area model 22a generated from the image data 18 of FIG. 10.
The area model 22a associates, for each area entered in FIG. 10, an area label, feature point numbers, a determination logic, and polygon (quadrilateral) coordinate data. For example, the first row of the area model 22a defines the "parts picking area": when both (AND) of the listed feature points of the worker's skeleton data 19 (4 is the right wrist, 7 is the left wrist) lie inside the polygon (quadrilateral) coordinate data (four vertex coordinates, corresponding to the area 101 in FIG. 10), the worker is recognized as having picked up a part of the personal computer being assembled.
In the determination logic, "AND" means that all of the listed feature points must satisfy the condition (for example, both hands), and "OR" means that any one of them suffices (for example, one hand). That is, when both wrists of the worker enter the parts picking area, the area determination "both hands enter on the right side" is made.
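The AND/OR determination described above can be illustrated with a short sketch. It assumes that a skeleton is a mapping from feature point number to (x, y) coordinates and uses a standard ray-casting point-in-polygon test. The `AreaRecord` type, the function names, and the vertex coordinates are hypothetical, and the source does not state which containment test the embodiment uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class AreaRecord:
    """One row of an area model (hypothetical field names)."""
    label: str
    feature_points: List[int]   # e.g. 4 = right wrist, 7 = left wrist
    logic: str                  # "AND" or "OR"
    polygon: List[Point]        # vertex coordinates in order


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test; points exactly on an edge may fall on either side."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):          # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def area_matches(record: AreaRecord, skeleton: Dict[int, Point]) -> bool:
    """Apply the record's AND/OR logic to the listed feature points."""
    hits = [n in skeleton and point_in_polygon(skeleton[n], record.polygon)
            for n in record.feature_points]
    return all(hits) if record.logic == "AND" else any(hits)


# Hypothetical coordinates for area 101.
parts_area = AreaRecord("parts picking area", [4, 7], "AND",
                        [(400, 200), (600, 200), (600, 400), (400, 400)])
both_hands_in = {4: (450, 300), 7: (550, 320)}
one_hand_in = {4: (450, 300), 7: (100, 320)}
```

With these values, `area_matches(parts_area, both_hands_in)` holds while `area_matches(parts_area, one_hand_in)` does not. Changing `logic` to "OR" would make the one-handed case match as well.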

FIG. 12 is a diagram showing an example of the background model 22c generated from the image data 18 of FIG. 10.
The background model 22c associates a definition name, the polygon (quadrilateral) coordinate data input in S303, and the background label input in S302.
For example, the user associates the background label "unused (screwdriver present)" with the polygon (quadrilateral) coordinate data of the background area 103 in the state where the screwdriver is in the screwdriver area. Although not illustrated, the user likewise associates the background label "in use (screwdriver absent)" with the polygon (quadrilateral) coordinate data of the background area 103 in the state where the screwdriver is not in the screwdriver area. That is, even when the polygon (quadrilateral) coordinate data indicates the same position in the image data 18, different background labels are associated with the image data 18 in which the screwdriver is present and the image data 18 in which it is absent.

FIG. 13 is a flowchart showing model definition by the posture learning unit 12b.
In S311, the posture learning unit 12b acquires the image data 18 of a frame selected using the GUI (described later with FIG. 14).
In S312, the posture learning unit 12b acquires the skeleton data 19 of the frame selected in S311. The posture learning unit 12b then displays the image data 18 and the skeleton data 19 and prompts the user to provide input for the posture model regarding the displayed content.
In S313, when no posture label (correct-answer label) has been assigned, the posture learning unit 12b returns the processing to S311 so that another frame can be selected.

In S314, the posture learning unit 12b holds the combination of the skeleton data 19 of S312 and the posture label of S313 as learning data for the selected frame.
In S315, when an unprocessed frame remains, the posture learning unit 12b returns the processing to S311.
In S316, the posture learning unit 12b executes machine learning with the learning data of S314 as input, in the same manner as S306.
In S317, the posture learning unit 12b saves the learning result of S316 as the posture model 22b (described later with FIG. 15).
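As a rough illustration of this learning flow, the sketch below trains a nearest-centroid classifier on labeled skeleton data. The source only says that known machine learning techniques can be used, so the nearest-centroid method, the flattened coordinate encoding, and the names `train_posture_model` and `infer_posture` are all assumptions chosen to keep the example small.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Optional, Sequence, Tuple

Skeleton = Sequence[float]   # assumed encoding: flattened, normalized joint coordinates


def train_posture_model(samples: List[Tuple[Skeleton, Optional[str]]]) -> Dict[str, List[float]]:
    """Average the skeletons of each posture label (a stand-in for S316)."""
    grouped = defaultdict(list)
    for skeleton, label in samples:
        if label is None:        # unlabeled frames are skipped, as in S313
            continue
        grouped[label].append(skeleton)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()}


def infer_posture(model: Dict[str, List[float]], skeleton: Skeleton) -> str:
    """Return the label whose centroid is closest to the given skeleton."""
    return min(model, key=lambda label: dist(model[label], skeleton))


# Hypothetical learning data: (skeleton, correct-answer label) pairs from S314.
learning_data = [
    ([0.9, 0.5, 0.8, 0.5], "take out from the right"),
    ([0.8, 0.5, 0.9, 0.6], "take out from the right"),
    ([0.1, 0.5, 0.2, 0.5], "put on the left"),
    ([0.5, 0.5, 0.5, 0.5], "assemble"),
    ([0.3, 0.3, 0.3, 0.3], None),
]
posture_model = train_posture_model(learning_data)   # saved as the model in S317
```

A real system would more likely use a neural network or an ensemble method, as the description of S306 suggests. The point here is only the shape of the data: skeleton features in, posture label out.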

FIG. 14 is a diagram of the GUI screen used in the learning process of the posture estimation unit 13b of FIG. 13.
The user assigns correct-answer labels using the GUI of the input/output device 33. First, while viewing the learning image in the image display field 111, the user selects an image with the frame advance buttons or the slider in the image selection field 112.
The user then indicates whether the selected image is "take out from the right", "assemble", "put on the left", or "other" by pressing the corresponding button in the correct-answer label input field 113.

FIG. 15 is a diagram showing the posture model 22b generated as a result of the learning process of the posture estimation unit 13b of FIG. 13.
The posture estimation unit 13b associates the frame number input from the GUI of FIG. 14, the correct-answer label, and the skeleton data 19 of the person detected in the frame with one another as the posture model 22b, which is the result of machine learning. The posture model 22b is used, for example, to learn the working posture of driving a screw and to estimate whether or not the worker is driving a screw.
FIG. 15 shows the posture model 22b for determining "take out from the right", "assemble", "put on the left", or "other" from the posture of the person in the image displayed in the image display field 111 of FIG. 14.

FIG. 16 is a diagram showing the procedure model 22d, which is the learning result of the procedure learning unit 12d.
The procedure model 22d takes a combination of the area model 22a, the posture model 22b, and the background model 22c as input models and outputs the worker's procedure estimated from them. For example, assembly work consists of procedures such as the following.
- The parts picking procedure is the procedure of picking up the part to be assembled from the worker's right side.
- The assembly procedure is the procedure of tightening screws with a screwdriver.
- The parts storage procedure is the procedure of placing the assembled part on the worker's left side.
For example, with the area model 22a alone, it is possible to tell how the person's hand moved, but not what the hand grasped.
However, when the area model 22a and the background model 22c are used together, the absence of the screwdriver from the screwdriver area within the background area makes it clear that the person grasped the screwdriver. Furthermore, by also using the posture model 22b, once it is known that the hand reached toward the screwdriver, it can be determined from the elbow angle or the like whether the screwdriver was picked up or put down.
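The idea of combining the three kinds of results can be sketched with hand-written rules. The procedure model 22d in the source is a learning result, so the fixed rules, their priority order, and the function name `estimate_procedure` below are purely illustrative assumptions.

```python
from typing import Optional


def estimate_procedure(area_label: Optional[str],
                       posture_label: Optional[str],
                       background_label: Optional[str]) -> str:
    """Combine area, posture, and background results into one procedure (assumed rules)."""
    # The screwdriver missing from its storage area implies it is being used.
    if background_label == "in use":
        return "assembly"
    if area_label == "parts picking area" or posture_label == "take out from the right":
        return "parts picking"
    if area_label == "finished product storage area" or posture_label == "put on the left":
        return "parts storage"
    return "other"
```

For instance, `estimate_procedure("parts picking area", None, "unused")` yields "parts picking", whereas the same area label with the background label "in use" yields "assembly", because the background result takes priority in this sketch.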

Hereinafter, an example of the analysis unit 13 will be described with reference to FIGS. 17 to 28.
FIG. 17 is a flowchart showing the main processing of the analysis unit 13.
In S11, the analysis unit 13 acquires the model data 22.
In S12, the analysis unit 13 acquires the image data 18 to be analyzed.
In S13, the analysis unit 13 causes the skeleton extraction unit 11 to extract the skeleton data 19 from the image data 18 of S12.
When the area model 22a exists (S21, Yes), the analysis unit 13 causes the area estimation unit 13a to execute the estimation process for the area data 23a (S22; see FIG. 18 for details).
When the posture model 22b exists (S23, Yes), the analysis unit 13 causes the posture estimation unit 13b to execute the estimation process for the posture data 23b (S24; see FIG. 21 for details).

In S25, when an unprocessed person remains in the image data 18 of S12, the analysis unit 13 returns the processing to S21.
When the background model 22c exists (S26, Yes), the analysis unit 13 causes the background estimation unit 13c to execute the estimation process for the background data 23c (S27; see FIG. 24 for details).
In S31, when an unprocessed frame remains, the analysis unit 13 returns the processing to S12.
In S32, the analysis unit 13 causes the procedure estimation unit 13d to estimate the work procedure from the analysis results of S22, S24, and S27.
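The control flow of FIG. 17 can be outlined as nested loops over frames and persons, with each estimator invoked only when its model exists. In the sketch below the estimators are passed in as plain callables, and the dictionary keys, the `analyze` function, and the stub values are hypothetical. The final procedure estimation of S32 is left out.

```python
from typing import Any, Callable, Dict, List, Sequence


def analyze(frames: Sequence[Any],
            extract_skeletons: Callable[[Any], List[Any]],
            models: Dict[str, Any],
            estimators: Dict[str, Callable[[Any, Any], str]]) -> List[Dict[str, Any]]:
    """Run the per-frame, per-person analysis loop (S12 to S31)."""
    results = []
    for frame_no, image in enumerate(frames):        # S12, repeated via S31
        per_person = []
        for skeleton in extract_skeletons(image):    # S13, then S21 to S25 per person
            entry = {}
            if "area" in models:                     # S21 -> S22
                entry["area"] = estimators["area"](models["area"], skeleton)
            if "posture" in models:                  # S23 -> S24
                entry["posture"] = estimators["posture"](models["posture"], skeleton)
            per_person.append(entry)
        background = None
        if "background" in models:                   # S26 -> S27
            background = estimators["background"](models["background"], image)
        results.append({"frame": frame_no, "people": per_person, "background": background})
    return results


# Stub inputs for illustration only; no posture model has been defined here.
demo_results = analyze(
    frames=["img0", "img1"],
    extract_skeletons=lambda image: [{"wrist": 1}] if image == "img0" else [],
    models={"area": "area-model", "background": "background-model"},
    estimators={
        "area": lambda model, skeleton: "parts picking area",
        "posture": lambda model, skeleton: "assemble",
        "background": lambda model, image: "unused",
    },
)
```

Because the `models` dictionary has no posture entry in this example, the posture estimator is never called, which mirrors the "No" branch of S23.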

FIG. 18 is a flowchart showing the subroutine processing of the area estimation unit 13a.
In S221, the area estimation unit 13a acquires, for each image frame, the skeleton data 19 of a person detected in that frame.
In S222, the area estimation unit 13a acquires one record (one area) from the area model 22a.
When the coordinates of the numbered feature points constituting the skeleton data 19 of S221 lie within the area acquired in S222 (S223, Yes), the area estimation unit 13a holds the area label of the record acquired in S222 (S224).

In S225, when an unprocessed record remains in the area model 22a, the area estimation unit 13a returns the processing to S222.
In S226, the area estimation unit 13a outputs all the results held in S224 as the area data 23a.
In S227, when an unprocessed frame remains, the area estimation unit 13a returns the processing to S221.

FIG. 19 is a diagram showing the result of the processing of FIG. 18 when both hands are recognized in the "parts picking area".
Because the requirement of the first record (parts picking area) of the area model 22a of FIG. 11 is satisfied for the area 101 on the image data 18 of FIG. 10 (that is, both hands enter on the right side), the area estimation unit 13a estimates the area data 23a including the area label "take a part from the right".

FIG. 20 is a diagram showing the result of the processing of FIG. 18 when both hands are recognized in the "finished product storage area".
Because the requirement of the second record (finished product storage area) of the area model 22a of FIG. 11 is satisfied for the area 102 on the image data 18 of FIG. 10 (that is, both hands enter on the left side), the area estimation unit 13a estimates the area data 23a including the area label "store the part on the left side".

FIG. 21 is a flowchart showing the subroutine processing of the posture estimation unit 13b.
In S241, the posture estimation unit 13b acquires, for each image frame of the image data 18, the skeleton data 19 of a person detected in that frame.
In S242, the posture estimation unit 13b performs machine-learning inference using the posture model 22b with the acquired skeleton data 19 as input. As a result, the posture label corresponding to the skeleton data 19 is output.
When the posture label of S242 differs from the actual posture (an inference error) (S243, Yes), the posture learning unit 12b may accept the correct posture label from the user. The posture learning unit 12b may then correct (relearn) the posture model 22b using the combination of the accepted posture label and the acquired skeleton data 19 as new learning data (S244).

In S245, the posture estimation unit 13b holds the output posture label as the inference result.
In S246, when an unprocessed person remains in the image frame of S241, the posture estimation unit 13b returns the processing to S243.
In S247, the posture estimation unit 13b outputs, as the posture data 23b, the inference results held in S245 for all persons in the image frame.
In S248, when an unprocessed frame remains, the posture estimation unit 13b returns the processing to S241.

FIG. 22 is a diagram showing the image data 18 used in the processing of FIG. 21. The output unit 14 displays the image data 18 in time series from left to right and annotates each image with its frame number (f10 = frame 10, f30 = frame 30, ...). In the image data 18 of each frame, lines indicating the skeleton data 19 recognized by the skeleton extraction unit 11 are also superimposed on the image of the person.

FIG. 23 is a diagram of the posture data 23b showing the inferred labels (posture labels) for the image data 18 of FIG. 22.
The posture estimation unit 13b analyzes, from the image data 18 acquired from the video camera 31 or the like, the postures that make up the behavior of the person in the image, and outputs the analysis result as the posture data 23b. Each entry of the posture data 23b carries a frame number indicating the detection time.
The output posture data 23b can be used, for example, to detect working postures related to the procedure of assembly work at an assembly site in the manufacturing industry, or working postures that impose a heavy physical burden and therefore affect work safety at a manufacturing site.

FIG. 24 is a flowchart showing the subroutine processing of the background estimation unit 13c.
In S271, the background estimation unit 13c acquires the image data 18 of each image frame.
In S272, the background estimation unit 13c acquires one record (one background area) from the background model 22c.
In S273, the background estimation unit 13c cuts out, from the image data 18 of S271, the image at the position of the background area of S272.
In S274, the background estimation unit 13c executes machine-learning inference using the background model 22c with the image data cut out in S273 as input.
In S275, the background estimation unit 13c holds the background label obtained as the inference result of S274.
In S276, when an unprocessed record remains in the background model 22c, the background estimation unit 13c returns the processing to S272.
In S277, when an unprocessed frame remains, the background estimation unit 13c returns the processing to S271.
In S278, the background estimation unit 13c outputs the inference results for all background labels.

FIG. 25 is a diagram showing the result of the processing of FIG. 24 when the screwdriver is recognized as unused.
The output unit 14 superimposes the background label "unused" on the person action image as information about the background image related to the person's action (reference numeral 103e). In addition, as indicated by reference numeral 110, the output unit 14 superimposes, for each frame of the image data 18, the skeleton data 19 recognized by the skeleton extraction unit 11 on the image of the person. The output unit 14 also marks and displays the feature point data (joint points) constituting the skeleton data 19 (circles in the figure).

FIG. 26 is a diagram showing the result of the processing of FIG. 24 when the screwdriver is recognized as in use. The output unit 14 superimposes the background label "in use" on the person action image as information about the background image related to the person's action (reference numeral 103f).

FIG. 27 is a diagram showing an example of the procedure data 23d output by the procedure estimation unit 13d.
As described with reference to FIG. 8, the procedure estimation unit 13d combines the analysis results of the area data 23a, the posture data 23b, and the background data 23c to determine the worker's procedure data 23d (working state) for each frame number. For example, by distinguishing in the background data 23c between the state in which the screwdriver is unused (FIG. 25) and the state in which it is in use (FIG. 26), it can be determined whether or not the output procedure is "assembling".

FIG. 28 is a screen view displaying the procedure data 23d of FIG. 27 in Gantt chart format.
Because each frame number in the procedure data 23d corresponds to a specific time, the procedure estimation unit 13d can obtain a time-series work procedure (output procedure) from the procedure data 23d. The output unit 14 therefore displays the time-series work procedure in Gantt chart format, which shows the user the time required for each work procedure in an easily understood form.
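The conversion from per-frame procedure labels to Gantt chart bars can be sketched as follows. The sketch assumes a constant frame rate and contiguous frame numbers within each run. The function name `procedure_intervals` and the `fps` parameter are hypothetical, since the source does not specify how frame numbers map to time.

```python
from itertools import groupby
from typing import List, Tuple


def procedure_intervals(frame_labels: List[Tuple[int, str]],
                        fps: float) -> List[Tuple[str, float, float, float]]:
    """Merge consecutive frames with the same label into
    (label, start_seconds, end_seconds, duration_seconds) tuples."""
    intervals = []
    ordered = sorted(frame_labels)   # sort by frame number
    for label, group in groupby(ordered, key=lambda item: item[1]):
        frame_numbers = [frame for frame, _ in group]
        start = frame_numbers[0]
        end = frame_numbers[-1] + 1  # exclusive: the last frame lasts one frame period
        intervals.append((label, start / fps, end / fps, (end - start) / fps))
    return intervals


# Hypothetical procedure data: (frame number, procedure label).
procedure_data = [
    (0, "parts picking"), (1, "parts picking"),
    (2, "assembly"), (3, "assembly"), (4, "assembly"),
    (5, "parts storage"),
]
bars = procedure_intervals(procedure_data, fps=10.0)
```

Each tuple in `bars` corresponds to one bar of the Gantt chart, and the last element of each tuple is the time required for that work procedure.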

In the present embodiment described above, the analyzer 10 detects a person appearing in the image data 18 by image recognition using deep learning or the like, and acquires the skeleton data 19 of that person. The analyzer 10 then machine-learns the acquired skeleton data 19 together with correct-answer labels input in advance as the posture model 22b, which allows it to estimate the posture of a worker at a manufacturing site and to identify the worker's work procedure from that posture.
In addition, the analyzer 10 improves the accuracy of identifying the work procedure by also using the area model 22a, which is used to determine the positional relationship with the skeleton data 19, and the background model 22c, which indicates the situation in which the person is placed. As a result, a user can, by a simple method and without knowledge of deep learning or image recognition, analyze the worker's posture data 23b from the image data 18 capturing the worker and identify the worker's procedure data 23d.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment has been described in detail to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to configurations having all of the elements described.
It is also possible to replace part of the configuration of one embodiment with the configuration of another embodiment, and to add the configuration of another embodiment to the configuration of one embodiment.
For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted. Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as an integrated circuit.
Each of the configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing a program that realizes each function.

Information such as programs, tables, and files that realize each function can be stored in memory, in a recording device such as a hard disk or an SSD (Solid State Drive), or on a recording medium such as an IC (Integrated Circuit) card, an SD card, or a DVD (Digital Versatile Disc).
The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all configurations may be considered to be interconnected.

10 Analyzer
11 Skeleton extraction unit
12 Model generation unit
12a Area definition unit
12b Posture learning unit
12c Background definition unit
12d Procedure learning unit
13 Analysis unit
13a Area estimation unit
13b Posture estimation unit
13c Background estimation unit
13d Procedure estimation unit
14 Output unit
18 Image data
19 Skeleton data
20 Storage unit
22 Model data
22a Area model
22b Posture model
22c Background model
22d Procedure model
23 Estimation result data
23a Area data
23b Posture data
23c Background data
23d Procedure data
31 Video camera
32 Video recorder
33 Input/output device
41 Monitor
42 Storage device
43 Application device

Claims

A posture analysis program that causes a computer to function as:
a skeleton extraction unit that acquires, by image recognition with image data as input, skeleton data including feature point data indicating the joint positions of a person appearing in the image data;
a storage unit for a posture model in which a posture label is associated with each piece of the skeleton data; and
a posture estimation unit that determines, based on the skeleton data acquired by the skeleton extraction unit, the posture of the person appearing in the image data from the posture labels predetermined in the posture model.

The posture analysis program according to claim 1, further causing the computer to function as an output unit that marks and displays the feature point data of the skeleton data acquired by the skeleton extraction unit, based on the result of determination by the posture estimation unit.

The posture analysis program according to claim 1, further causing the computer to function as an output unit that displays, in time series for each piece of the image data, the skeleton data acquired by the skeleton extraction unit, based on the result of determination by the posture estimation unit.

The posture analysis program according to claim 1, further causing the computer to function as a posture learning unit that machine-learns the posture model using, as training data, the posture label that is a correct-answer label input for each piece of the skeleton data,
wherein the posture estimation unit outputs the posture label by machine-learning inference using the posture model with the skeleton data as input.

The posture analysis program according to claim 4, further causing the computer to function as:
a background definition unit that machine-learns a background model using, as training data, a background label that is a correct-answer label input for each piece of image data of the background appearing in a background area defined as a part of the image data; and
a background estimation unit that outputs the background label by machine-learning inference using the background model with the image data of the background as input.

The posture analysis program according to claim 5, further causing the computer to function as:
an area definition unit that defines an area model in which the coordinates of a feature point region defined as a part of the image data, the feature point data to be tested for whether it lies within the feature point region, and an area label are associated with one another; and
an area estimation unit that outputs the corresponding area label when the feature point data of the skeleton data acquired by the skeleton extraction unit lies within the feature point region of the area model.

The posture analysis program according to claim 5, wherein the computer functions as an output unit that superimposes and displays the background label output by the background estimation unit in association with the background area of the image data.

The posture analysis program according to claim 4, wherein, when the posture label output by the posture estimation unit is corrected, the posture learning unit relearns the posture model based on the corrected posture label.

The posture analysis program according to claim 6, further causing the computer to function as:
a procedure estimation unit that identifies, from at least one of the posture label, the background label, and the area label, the work procedure being performed by the person appearing in the image data, based on a procedure model for identifying the procedure of the work performed by that person; and
an output unit that outputs the time required for each identified work procedure to a display unit.

A posture analyzer comprising:
a skeleton extraction unit that acquires, by image recognition with image data as input, skeleton data including feature point data indicating the joint positions of a person appearing in the image data;
a storage unit for a posture model in which a posture label is associated with each piece of the skeleton data; and
a posture estimation unit that determines, based on the skeleton data acquired by the skeleton extraction unit, the posture of the person appearing in the image data from the posture labels predetermined in the posture model.