JP2019121045A

JP2019121045A - Posture estimation system, behavior estimation system, and posture estimation program

Info

Publication number: JP2019121045A
Application number: JP2017253937A
Authority: JP
Inventors: Shuji Ichitani (一谷 修司)
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2019-07-22

Abstract

PROBLEM TO BE SOLVED: To provide a posture estimation system capable of accurately estimating the posture of a person in a captured image, regardless of that posture, while limiting the amount of training data used for machine learning.

SOLUTION: A posture estimation system, which estimates the posture of a subject based on a captured image, includes: a human region detection unit that detects, as a human region, a region of the captured image that includes an image of the subject; a head region detection unit that detects, as a head region, a region of the captured image that includes an image of the subject's head; an image rotation unit that, for captured images in each of which one of a plurality of postures of the subject is captured, rotates each captured image so that the direction of the head region relative to a predetermined point in the human region is unified to a predetermined direction; and a posture estimation unit that estimates joint points of the subject as the posture by machine learning based on the rotated captured image.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a posture estimation system, a behavior estimation system, and a posture estimation program.

Life expectancy in Japan has increased markedly owing to the higher living standards, better sanitation, and improved medical care that accompanied the rapid postwar economic growth. Combined with a falling birth rate, this has produced an aging society with a high proportion of elderly people. In such a society, the number of people who need nursing care because of illness, injury, or aging is expected to rise.

People in need of care are relatively likely to be injured by falling while walking or by falling out of bed in facilities such as hospitals and nursing homes. Systems that detect the state of such persons are therefore being developed so that caregivers can reach them immediately when such an event occurs.

Patent Document 1 below discloses the following technique. A captured image taken with a fisheye-lens camera and containing an image of a standing person is rotated so that the person's head is at the top and the legs are at the bottom, thereby unifying the person's orientation. The presence of the person is then detected using a neural-network object detection method selected according to the person's position in the rotated image. This makes it possible to detect a standing person with high accuracy in fisheye-lens camera images, in which distortion varies with position in the image.

Patent Document 1: International Publication No. 2013/001941

However, although the technique disclosed in Patent Document 1 can detect the presence of a standing person in a captured image with high accuracy, it cannot do so for a person in any other posture. Moreover, even though it can detect the presence of a standing person, it cannot estimate a person's posture with high accuracy regardless of what that posture is.

The present invention has been made to solve these problems. Its object is to provide a posture estimation system, a behavior estimation system, and a posture estimation program that can estimate a person's posture with high accuracy, regardless of the person's posture in the captured image, while limiting the amount of training data used for machine learning.

The above object of the present invention is achieved by the following means.

(1) A posture estimation system that estimates the posture of a subject based on a captured image, comprising: a human region detection unit that detects, as a human region, a region of the captured image that includes an image of the subject; a head region detection unit that detects, as a head region, a region of the captured image that includes an image of the subject's head; an image rotation unit that, for captured images in each of which one of a plurality of postures of the subject is captured, rotates each captured image so that the direction of the head region relative to a predetermined point in the human region is unified to a predetermined direction; and a posture estimation unit that estimates joint points of the subject as the posture by machine learning based on the rotated captured image.

(2) The posture estimation system according to (1) above, wherein the human region is a region within the contour of the image of the subject or a region within a rectangle that circumscribes the contour of the image of the subject, and the head region is a region within the contour of the image of the subject's head or a region within a rectangle that circumscribes the contour of the image of the head.

(3) The posture estimation system according to (1) or (2) above, wherein the predetermined direction is the vertically upward direction in the captured image.

(4) The posture estimation system according to any one of (1) to (3) above, wherein the direction of the head region is the direction from the centroid of the human region, which is the predetermined point, toward the centroid of the head region.

(5) The posture estimation system according to any one of (1) to (4) above, wherein the posture estimation unit estimates the posture of the subject using DeepPose.

(6) The posture estimation system according to any one of (1) to (5) above, further comprising: a distance calculation unit that calculates the distance from the center of the captured image to the human region; an approximate ellipse calculation unit that, when the distance is equal to or less than a predetermined threshold, calculates an approximate ellipse of the human region in the captured image rotated by the image rotation unit; and an image re-rotation unit that rotates the rotated captured image again so that the major axis of the approximate ellipse is horizontal in the captured image, wherein the posture estimation unit estimates the posture of the subject based on the captured image rotated by the image re-rotation unit.

(7) The posture estimation system according to (6) above, wherein the posture estimation unit selects the dictionary used in the machine learning based on the distance calculated by the distance calculation unit.

(8) The posture estimation system according to (6) above, further comprising a head size calculation unit that calculates the size of the head region in the captured image as a head size, wherein the posture estimation unit selects the dictionary used in the machine learning based on the head size.

(9) The posture estimation system according to any one of (1) to (6) above, further comprising a human body/head size ratio calculation unit that calculates the ratio of the area of the human region to the area of the head region as a human body/head size ratio, wherein the posture estimation unit selects the dictionary used in the machine learning based on the human body/head size ratio.

(10) The posture estimation system according to (2) above, wherein the human region is the region within the rectangle that circumscribes the contour of the image of the subject, the system further comprising an aspect ratio calculation unit that calculates an aspect ratio, which is the ratio of the length of the vertical side of the rectangle to the length of its horizontal side, wherein the posture estimation unit selects the dictionary used in the machine learning based on the aspect ratio.

(11) The posture estimation system according to any one of (1) to (5) above, further comprising: a distance calculation unit that calculates the distance from the center of the captured image to the human region; a head size calculation unit that calculates the size of the head region in the captured image as a head size; a human body/head size ratio calculation unit that calculates the ratio of the area of the head region to the area of the human region as a human body/head size ratio; and an aspect ratio calculation unit that, when the human region is the region within the rectangle that circumscribes the contour of the image of the subject, calculates an aspect ratio, which is the ratio of the length of the vertical side of the human region to the length of its horizontal side, wherein the posture estimation unit selects the dictionary used in the machine learning based on at least one of the calculated distance, head size, human body/head size ratio, and aspect ratio.

(12) A behavior estimation system comprising: the posture estimation system according to any one of (1) to (11) above; a pre-rotation posture calculation unit that calculates the posture corresponding to the captured image before rotation by reverse-rotating the posture estimated as the joint points corresponding to the rotated captured image; and a behavior estimation unit that estimates the behavior of the subject based on changes among a plurality of the postures calculated by the pre-rotation posture calculation unit.

(13) A posture estimation program that causes a computer to execute: a procedure (a) of detecting, as a human region, a region of a captured image that includes an image of a subject; a procedure (b) of detecting, as a head region, a region of the captured image that includes an image of the subject's head; a procedure (c) of rotating the captured images, in each of which one of a plurality of postures of the subject is captured, so that the direction of the head region relative to a predetermined point in the human region is unified to a predetermined direction; and a procedure (d) of estimating joint positions of the subject by machine learning based on the rotated captured image.

The direction of a person's head in a captured image is unified to a predetermined direction by rotating the captured image, and the person's joint points are estimated as the posture by machine learning based on the rotated image. This makes it possible to estimate the person's posture with high accuracy using a small amount of training data, regardless of the person's posture in the captured image.

FIG. 1 is a diagram showing a schematic configuration of the posture estimation system.
FIG. 2 is a block diagram showing the configuration of the posture estimation device.
FIG. 3 is a block diagram showing the functions of the control unit of the posture estimation device.
FIG. 4 is an explanatory diagram illustrating rotation of a captured image to unify the head direction to the upward direction.
FIG. 5 is an explanatory diagram showing captured images before and after rotation.
FIG. 6 is an explanatory diagram illustrating recalculation of the human rectangle in a rotated captured image.
FIG. 7 is an explanatory diagram illustrating recalculation of the human rectangle in a rotated captured image using the human silhouette.
FIG. 8 is an explanatory diagram showing captured images before and after the second rotation.
FIG. 9 is a diagram showing a list of the dictionaries, one for each classified model, used for machine learning.
FIG. 10 is an explanatory diagram showing captured images for a large head size and a small head size when the target distance is small.
FIG. 11 is an explanatory diagram showing captured images of the subject in a standing posture and in a sitting posture when the target distance is large.
FIG. 12 is an explanatory diagram showing cases in which the human rectangle aspect ratio of a standing subject is small and large when the target distance is large.
FIG. 13 is an explanatory diagram showing a captured image that is the input image for machine learning and the estimated posture that is output.
FIG. 14 is a flowchart showing the operation of the behavior estimation system.
FIG. 15 is a flowchart showing the operation of the behavior estimation system.

Hereinafter, a posture estimation system, a behavior estimation system, and a posture estimation program according to an embodiment of the present invention will be described with reference to the drawings. In the drawings, identical elements are given identical reference numerals, and redundant description is omitted. The dimensional ratios in the drawings are exaggerated for convenience of description and may differ from the actual ratios.

FIG. 1 is a diagram showing a schematic configuration of the posture estimation system.

The posture estimation system 10 includes a posture estimation device 100, an imaging device 200, and a communication network 300. The posture estimation device 100 is connected to the imaging device 200 via the communication network 300 so that the two can communicate with each other. Note that the posture estimation system 10 may consist of the posture estimation device 100 alone. The posture estimation system 10 also constitutes a behavior estimation system.

The posture estimation device 100 estimates the posture of a person, the subject 400, contained in a captured image received from the imaging device 200 by calculating the joint points of the subject 400 by machine learning. The posture estimation device 100 may use, for example, deep learning as the machine learning. Deep learning includes, for example, DeepPose, CNNs (convolutional neural networks), and ResNet. Machine learning methods other than deep learning, such as SVMs (support vector machines) and random forests, may also be used.

The imaging device 200 is, for example, a near-infrared camera. Installed at a predetermined position, it captures an imaging region viewed from above with that position as the viewpoint. Specifically, the imaging device 200 may capture the imaging region by irradiating it with near-infrared light from an LED (light-emitting diode) and receiving, with a CMOS (complementary metal-oxide-semiconductor) sensor, the near-infrared light reflected by objects in the region. The captured image may be a monochrome image in which each pixel value represents near-infrared reflectance. The predetermined position may be, for example, the ceiling of the room of the subject 400. The imaging region may be, for example, a three-dimensional region that includes the entire floor of the room. The imaging device 200 may capture the imaging region as a video consisting of a plurality of captured images at a frame rate of, for example, 15 fps to 30 fps.

The communication network 300 may use a network interface conforming to a wired communication standard such as Ethernet (registered trademark). It may instead use a network interface conforming to a wireless communication standard such as Bluetooth (registered trademark) or IEEE 802.11.

FIG. 2 is a block diagram showing the configuration of the posture estimation device. The posture estimation device 100 includes a control unit 110, a storage unit 120, a display unit 130, an input unit 140, and a communication unit 150. These components are connected to one another via a bus 160. The posture estimation device 100 may be implemented as a computer.

The control unit 110 is a CPU (central processing unit) that controls each part of the posture estimation device 100 and performs arithmetic processing in accordance with a program. The operation of the control unit 110 is described later.

The storage unit 120 may consist of RAM (random access memory), ROM (read-only memory), and flash memory. The RAM temporarily stores programs and data as a work area for the control unit 110. The ROM stores various programs and data in advance. The flash memory stores various programs, including the operating system, and various data.

The display unit 130 is, for example, a liquid crystal display, and displays various kinds of information.

The input unit 140 consists of, for example, a touch panel and various keys, and is used for various operations and inputs.

The communication unit 150 is an interface for communicating with external devices. Communication may use an interface conforming to a standard such as Ethernet (registered trademark), SATA, PCI Express, USB, or IEEE 1394. A wireless communication interface such as Bluetooth (registered trademark), IEEE 802.11, or 4G may also be used. The communication unit 150 receives captured images from the imaging device 200. The communication unit 150 may also transmit the estimated posture and behavior of a person to a mobile terminal (not shown), which is an external device.

The operation of the control unit 110 will now be described.

FIG. 3 is a block diagram showing the functions of the control unit of the posture estimation device. The control unit 110 includes a human region detection unit 111, a head region detection unit 112, an image rotation unit 113, a dictionary selection unit 114, a posture estimation unit 115, an image reverse rotation unit 116, and a behavior estimation unit 117. The image rotation unit 113 serves as the distance calculation unit, the approximate ellipse calculation unit, and the image re-rotation unit. The dictionary selection unit 114 serves as the head size calculation unit, the human body/head size ratio calculation unit, and the aspect ratio calculation unit. The image reverse rotation unit 116 serves as the pre-rotation posture calculation unit.

The human region detection unit 111 detects, as a human region, a region that includes the image of the subject 400 in an image captured by the imaging device 200. The human region is either the region within the contour of the image of the subject 400 in the captured image or the region within a rectangle that circumscribes that contour. The region within the contour may be detected as the silhouette of the image of the subject 400 (hereinafter, the "human silhouette"). It may also be detected as the set of pixels that make up the image of the subject 400 in the captured image. The region within the circumscribing rectangle may be detected as the rectangle that circumscribes the contour of the image of the subject 400 (hereinafter, the "human rectangle"). To simplify the description, the human region is assumed below to be detected as a human silhouette or as the interior of a human rectangle.

A human silhouette can be detected, for example, by temporal differencing, in which images captured at successive times are subtracted from each other and the range of pixels with relatively large differences is extracted. It may also be detected by background subtraction, in which a background image is subtracted from the captured image. A human rectangle can be detected, for example, by calculating the rectangle that circumscribes the human silhouette.
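
The differencing and circumscribing-rectangle steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the human region detection unit 111: the function names `silhouette_mask` and `bounding_rect`, the fixed `threshold` parameter, and the representation of an image as a list of pixel rows are all assumptions introduced for this example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]            # grayscale image as rows of pixel values (assumed format)
Mask = List[List[bool]]            # True where a pixel belongs to the silhouette
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive pixel indices


def silhouette_mask(frame: Image, reference: Image, threshold: int) -> Mask:
    """Mark pixels whose absolute difference from the reference exceeds threshold.

    Passing the previous frame as `reference` gives temporal differencing;
    passing a background image gives background subtraction.
    """
    return [
        [abs(p - q) > threshold for p, q in zip(row, ref_row)]
        for row, ref_row in zip(frame, reference)
    ]


def bounding_rect(mask: Mask) -> Optional[Rect]:
    """Return the axis-aligned rectangle circumscribing the mask, or None if it is empty."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

A real system would likely add noise filtering and connected-component selection before taking the rectangle; those steps are omitted here because the text does not specify them.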

The head region detection unit 112 detects, as a head region, a region that includes the image of the head of the subject 400. The head region is either the region within the contour of the image of the head of the subject 400 in the captured image or the region within a rectangle that circumscribes that contour. The region within the contour may be detected as the silhouette of the image of the head of the subject 400 in the captured image (hereinafter, the "head silhouette"). It may also be detected as the set of pixels that make up the image of the head of the subject 400 in the captured image. The region within the circumscribing rectangle may be detected as the rectangle that circumscribes the contour of the image of the head of the subject 400 in the captured image (hereinafter, the "head rectangle"). To simplify the description, the head region is assumed below to be detected as a head silhouette or a head rectangle.

A head silhouette can be detected, for example, by extracting an approximately circular range with relatively high near-infrared reflectance. A head rectangle can be detected, for example, by calculating the rectangle that circumscribes the head silhouette.

The image rotation unit 113 calculates the head direction from the human rectangle or human silhouette detected by the human region detection unit 111 and the head rectangle or head silhouette detected by the head region detection unit 112. The head direction is the direction of the head region relative to a predetermined point in the human region in the captured image. To simplify the description, the head direction is assumed below to be calculated from the human rectangle and the head rectangle. The predetermined point may be, for example, the centroid of the human rectangle.

The image rotation unit 113 unifies the calculated head direction to a predetermined direction by rotating the captured image. It does so regardless of the posture of the subject 400 in the captured image. That is, whether the subject 400 is standing, half-crouching, sitting, lying down, or in any other posture, the head direction is unified to the predetermined direction by rotating the captured image. The head direction may be, for example, the direction from the centroid of the human rectangle toward the centroid of the head rectangle. The predetermined direction may be the vertically upward direction in the captured image (the direction parallel to the vertical sides of the rectangular captured image and pointing toward its upper side).
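
As a rough illustration of the head direction calculation, the sketch below measures the direction from the centroid of the human rectangle to the centroid of the head rectangle as an angle from the vertically upward direction. The function names, the `(left, top, right, bottom)` rectangle format, and the convention that counterclockwise angles are positive are assumptions made for this example, not details taken from the embodiment.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels (assumed format)


def rect_centroid(rect: Rect) -> Tuple[float, float]:
    """Centroid of an axis-aligned rectangle."""
    left, top, right, bottom = rect
    return ((left + right) / 2.0, (top + bottom) / 2.0)


def head_direction_deg(human_rect: Rect, head_rect: Rect) -> float:
    """Angle of the head direction measured from vertical-up.

    Image coordinates are assumed: x grows to the right, y grows downward.
    Positive angles are counterclockwise as the image is viewed.
    """
    px, py = rect_centroid(human_rect)
    hx, hy = rect_centroid(head_rect)
    dx, dy = hx - px, hy - py
    return math.degrees(math.atan2(-dx, -dy))


def required_rotation_deg(human_rect: Rect, head_rect: Rect) -> float:
    """Rotation that would bring the head direction to vertical-up."""
    return -head_direction_deg(human_rect, head_rect)
```

For example, a head rectangle lying directly to the left of the human rectangle's centroid gives a head direction of +90 degrees under this convention, so the required rotation is -90 degrees.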

Rotating the captured images so that the head direction is unified to a predetermined direction reduces the variation in posture of the subject 400 in the captured images on which machine-learning posture estimation is performed. As a result, even when the posture of the subject 400 differs between captured images, the amount of training data needed for machine learning can be kept small, and the loss of posture estimation accuracy that would otherwise result from a growing number of posture variations can be prevented.

FIG. 4 is an explanatory diagram illustrating rotation of a captured image to unify the head direction to the upward direction.

In the example of FIG. 4, the captured image is rotated so that the head direction becomes the vertically upward direction in the captured image. Before rotation, the head direction is offset from the vertically upward direction by +100 degrees (+ denotes the counterclockwise direction). The head direction can therefore be made vertically upward by rotating the captured image by the required rotation angle of -100 degrees.
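
The +100 degree example can be checked numerically. The sketch below rotates coordinates rather than pixels, which is enough to confirm the sign convention; rotating an actual image would normally be done with an image processing library, which is outside the scope of this sketch. The helper names, the rotation center, and the 50-pixel offset of the head centroid are hypothetical values chosen for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rotate_point(point: Point, center: Point, angle_deg: float) -> Point:
    """Rotate an image point about center.

    Image coordinates are assumed (x right, y down); positive angles are
    counterclockwise as the image is viewed.
    """
    theta = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    x = dx * math.cos(theta) + dy * math.sin(theta)
    y = -dx * math.sin(theta) + dy * math.cos(theta)
    return (center[0] + x, center[1] + y)


def direction_from_up_deg(center: Point, head: Point) -> float:
    """Angle of the vector center -> head, measured counterclockwise from vertical-up."""
    return math.degrees(math.atan2(-(head[0] - center[0]), -(head[1] - center[1])))


center = (100.0, 100.0)                      # hypothetical centroid of the human rectangle
a = math.radians(100.0)
# Hypothetical head centroid, 50 px from the center at +100 degrees from vertical-up.
head = (center[0] - 50.0 * math.sin(a), center[1] - 50.0 * math.cos(a))

before = direction_from_up_deg(center, head)          # head direction before rotation
rotated_head = rotate_point(head, center, -before)    # apply the required rotation angle
after = direction_from_up_deg(center, rotated_head)   # head direction after rotation
```

With these values, `before` is +100 degrees up to rounding, and after rotating by the required angle the head centroid lies directly above the center, so `after` is zero up to rounding.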

FIG. 5 is an explanatory diagram showing captured images before and after rotation. In FIG. 5, the captured images are represented by simple line drawings so that the posture of the subject 400 is shown clearly (the same applies to FIGS. 6, 8, and 10 to 13 below). The upper row of FIG. 5 shows the captured images before rotation, and the lower row shows them after rotation.

In the photographed images before rotation, the target person 400 is captured in various postures, including different ones, and the head direction differs from image to image. In the photographed images after rotation, the head direction of the target person 400 is unified to the vertically upward direction.

FIG. 6 is an explanatory diagram illustrating recalculation of the human rectangle in the photographed image after rotation. The upper row of FIG. 6 shows the photographed images before rotation, and the lower row shows the photographed images after rotation. In each image in the upper row, the human rectangle calculated before rotation is drawn as a gray line. In each image in the lower row, the human rectangle calculated before rotation is drawn as a gray line, and the human rectangle recalculated after rotation is drawn as a black line.

As the human rectangles drawn in gray in the upper row of FIG. 6 and in black in the lower row show, a human rectangle can be calculated so that its vertical sides are parallel to the vertical direction of the photographed image and its horizontal sides are parallel to the horizontal direction of the photographed image (the direction parallel to the horizontal sides of the rectangular photographed image). Because the head direction is vertically upward in the photographed image after rotation, recalculating the human rectangle in the rotated image brings the size of the human rectangle closer to the size of the image of the target person 400 in the photographed image. This improves the accuracy of the human rectangle, so that a more appropriate dictionary can be selected when the dictionary selection unit 114, described later, selects a dictionary using the size of the human rectangle. Here, a dictionary is data for configuring a trained machine learning model, for example data defining the weights of a deep learning neural network.

FIG. 7 is an explanatory diagram illustrating recalculation of the human rectangle in the photographed image after rotation using a human silhouette. The upper row of FIG. 7 shows the human silhouettes in the photographed images before rotation, and the lower row shows the human silhouettes in the photographed images after rotation. In each silhouette in the upper row, the human rectangle calculated before rotation is drawn as a gray line. In each silhouette in the lower row, the human rectangle calculated before rotation is drawn as a gray line, and the human rectangle recalculated after rotation is drawn as a black line.

As shown in FIG. 7, the human rectangle can be calculated from the human silhouette as the rectangle circumscribing the silhouette. Therefore, the human silhouette may be calculated before rotation, the calculated silhouette may be rotated by the required rotation angle, and the human rectangle may be recalculated for the rotated silhouette.
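The silhouette-based recalculation can be sketched as follows. This is only an illustration under stated assumptions: the silhouette is represented as a list of (x, y) pixel coordinates with y increasing downward, rotation is about a fixed center without changing the canvas, and the helper names `rotate_points` and `circumscribed_rect` are hypothetical.

```python
import math

def rotate_points(points, angle_deg, center):
    """Rotate (x, y) pixel coordinates about center.

    Positive angles are counterclockwise as seen on the image
    (y increases downward).
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    cx, cy = center
    return [(cx + (x - cx) * c + (y - cy) * s,
             cy - (x - cx) * s + (y - cy) * c) for x, y in points]

def circumscribed_rect(points):
    """Axis-aligned rectangle (left, top, width, height) enclosing the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

# A thin silhouette lying diagonally, with the head toward the upper left.
silhouette = [(float(i), float(i)) for i in range(11)]
before = circumscribed_rect(silhouette)
# The head direction is 45 degrees counterclockwise from vertical,
# so the required rotation angle is -45 degrees.
after = circumscribed_rect(rotate_points(silhouette, -45.0, (5.0, 5.0)))
```

Before rotation the rectangle is a 10 x 10 square that is mostly empty; after rotation it collapses to a narrow upright rectangle that follows the silhouette closely, which is the effect FIG. 6 and FIG. 7 describe.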

The image rotation unit 113 calculates the distance from the center of the photographed image to the human region (hereinafter referred to as the "target distance"). The target distance can be calculated as the distance between the coordinates of the centroid of the human region, that is, the human rectangle or the human silhouette, and the coordinates of the center of the photographed image. When the target distance is equal to or less than a predetermined threshold, the image rotation unit 113 calculates an approximate ellipse of the human silhouette in the photographed image after rotation. The predetermined threshold can be set experimentally from the viewpoint of posture estimation accuracy. The image rotation unit 113 then rotates the rotated photographed image again so that the major axis of the calculated approximate ellipse becomes horizontal. In other words, after rotating the photographed image to unify the head direction to the vertically upward direction, the image rotation unit 113 rotates the image a second time, so that the major axis of the approximate ellipse becomes horizontal, only when the target distance is equal to or less than the predetermined threshold.
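The target distance and the second rotation can be sketched as below. The description does not specify how the approximate ellipse is obtained; this sketch assumes it is fitted from the second-order central moments of the silhouette pixels, which is one common choice and not necessarily the method used in the embodiment. All function names and the threshold parameter are hypothetical.

```python
import math

def centroid(points):
    """Centroid of a list of (x, y) silhouette pixel coordinates."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def target_distance(points, image_center):
    """Distance from the image center to the centroid of the human region."""
    return math.dist(centroid(points), image_center)

def major_axis_angle_deg(points):
    """Orientation of the major axis of the moment-based ellipse.

    The angle is measured from the +x axis toward +y. Because y increases
    downward in image coordinates, it appears clockwise on screen, so
    rotating the image counterclockwise by this angle makes the major
    axis horizontal.
    """
    cx, cy = centroid(points)
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return math.degrees(0.5 * math.atan2(2.0 * mu11, mu20 - mu02))

def second_rotation_deg(points, image_center, threshold):
    """Counterclockwise rotation to apply again, or 0.0 if not needed."""
    if target_distance(points, image_center) > threshold:
        return 0.0  # far from the image center: keep the first rotation only
    return major_axis_angle_deg(points)
```

For a silhouette whose pixels lie along the line from (0, 0) to (10, 10), `major_axis_angle_deg` returns about 45 degrees, and the second rotation is applied only when the centroid lies within `threshold` of the image center.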

FIG. 8 is an explanatory diagram showing photographed images before and after the second rotation. The upper row of FIG. 8 shows the photographed images before the second rotation, and the lower row shows the photographed images after the second rotation.

In the photographed images before the second rotation, the target person 400 is captured standing near the camera, and the direction of the major axis of the approximate ellipse of the human silhouette differs from image to image. In the photographed images after the second rotation, the major axis of the approximate ellipse is unified to the horizontal direction.

The dictionary selection unit 114 selects the dictionary to be used in machine learning for estimating the posture of the target person 400 from the photographed image after rotation (or after the second rotation when the image has been rotated again; the same applies below). Specifically, dictionaries are stored in the storage unit 120 for each of five classified models, and the dictionary selection unit 114 selects from them the dictionary corresponding to the applicable model.

FIG. 9 is a diagram showing a list of the dictionaries used for machine learning, one for each classified model.

Dictionary 1 corresponds to the directly-below/standing model. A photographed image is classified into this model when the target person 400 is standing directly below the camera of the imaging device 200. Specifically, a photographed image is classified into the directly-below/standing model when the target distance is small and the size of the head region (hereinafter referred to as the "head size") is large. This is because, when the target person 400 is directly below the camera and the head size is large, the head of the target person 400 is close to the camera, so the target person 400 is relatively likely to be standing directly below the camera. Dictionary 1 is selected when the photographed image is classified into the directly-below/standing model.

Dictionary 2 corresponds to the directly-below/sitting-or-lying model. A photographed image is classified into this model when the target person 400 is sitting or lying directly below the camera of the imaging device 200. Specifically, a photographed image is classified into the directly-below/sitting-or-lying model when the target distance is small and the head size is small. This is because, when the target person 400 is directly below the camera and the head size is small, the head of the target person 400 is far from the camera, so the target person 400 is relatively likely to be sitting or lying directly below the camera. Dictionary 2 is selected when the photographed image is classified into the directly-below/sitting-or-lying model.

FIG. 10 is an explanatory diagram showing photographed images with a large head size and with a small head size when the target distance is small. The left diagram is a photographed image in which the target distance is small and the head size is large. The right diagram is a photographed image in which the target distance is small and the head size is small. In each photographed image, the head rectangle is drawn as a solid line.

As in the left diagram of FIG. 10, when the target distance is small and the head size is large, dictionary 1 is selected. In that example, the target person 400 is photographed standing directly below the camera. As in the right diagram, when the target distance is small and the head size is small, dictionary 2 is selected. In that example, the target person 400 is photographed lying directly below the camera.

Dictionary 3 corresponds to the distant/oblique-standing model. A photographed image is classified into this model when the target person 400 is far from the camera and standing at an oblique orientation to the camera. Specifically, a photographed image is classified into the distant/oblique-standing model when the target distance is large, the human body/head size ratio (the ratio of the human body size, which is the size of the human region, to the head size) is large, and the human rectangle aspect ratio (the ratio of the vertical side of the human rectangle to its horizontal side) is large. This is because a large human body/head size ratio indicates that the target person 400 is standing, and a large rectangle aspect ratio indicates that the target person 400 is oriented obliquely, so the target person 400 is relatively likely to be standing obliquely to the camera at a distance from it. Dictionary 3 is selected when the photographed image is classified into the distant/oblique-standing model.

Dictionary 4 corresponds to the distant/front-standing model. A photographed image is classified into this model when the target person 400 is far from the camera and standing with the front (or the back; the same applies below) toward the camera. Specifically, a photographed image is classified into the distant/front-standing model when the target distance is large, the human body/head size ratio is large, and the human rectangle aspect ratio is small. This is because a large human body/head size ratio indicates that the target person 400 is standing, and a small rectangle aspect ratio indicates that the target person 400 is facing the front, so the target person 400 is relatively likely to be standing facing the camera at a distance from it. Dictionary 4 is selected when the photographed image is classified into the distant/front-standing model.

Dictionary 5 corresponds to the distant/sitting-or-lying model. A photographed image is classified into this model when the target person 400 is sitting or lying far from the camera. Specifically, a photographed image is classified into the distant/sitting-or-lying model when the target distance is large and the human body/head size ratio is small. This is because a small human body/head size ratio indicates that the target person 400 is sitting or lying, so the target person 400 is relatively likely to be sitting or lying at a distance from the camera. Dictionary 5 is selected when the photographed image is classified into the distant/sitting-or-lying model.

FIG. 11 is an explanatory diagram showing photographed images of a standing target person and a sitting target person when the target distance is large. FIG. 12 is an explanatory diagram showing cases in which the human rectangle aspect ratio of a standing target person is small and large when the target distance is large.

The upper diagram of FIG. 11 is a photographed image in which the target distance is large and the human body/head size ratio is large. The lower diagram is a photographed image in which the target distance is large and the human body/head size ratio is small. To the right of each photographed image, the relationship between the human rectangle and the head rectangle is drawn in solid lines.

The upper diagram of FIG. 12 is a photographed image in which the target distance is large, the human body/head size ratio is large, and the human rectangle aspect ratio is small. The lower diagram is a photographed image in which the target distance is large, the human body/head size ratio is large, and the human rectangle aspect ratio is large. To the right of each photographed image, the human rectangle is drawn as a solid line.

As in the upper diagram of FIG. 11, a large target distance and a large human body/head size ratio correspond to a photographed image of the target person 400 standing, taken from a distance. In that diagram, the rectangle aspect ratio is also small. Therefore, dictionary 4 is selected. As in the lower diagram, a large target distance and a small human body/head size ratio correspond to a photographed image of the target person 400 sitting, taken from a distance. Therefore, dictionary 5 is selected.

As in the upper diagram of FIG. 12, a large target distance, a large human body/head size ratio, and a small human rectangle aspect ratio correspond to a photographed image of the standing target person 400 taken from the front at a distance. Therefore, dictionary 4 is selected. As in the lower diagram, a large target distance, a large human body/head size ratio, and a large human rectangle aspect ratio correspond to a photographed image of the standing target person 400 taken obliquely at a distance. Therefore, dictionary 3 is selected.

The dictionary selection unit 114 can calculate the head size as the number of pixels in the head silhouette.

The dictionary selection unit 114 can calculate the human body/head size ratio by dividing the number of pixels in the human rectangle or the human silhouette by the number of pixels in the head rectangle or the head silhouette. To simplify the description below, the human body/head size ratio is assumed to be calculated by dividing the number of pixels in the human rectangle by the number of pixels in the head rectangle.

The dictionary selection unit 114 can calculate the rectangle aspect ratio by dividing the number of pixels corresponding to the length of the vertical side of the human rectangle by the number of pixels corresponding to the length of its horizontal side.
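The three quantities used for dictionary selection can be sketched as follows. The representation is an assumption made for illustration: rectangles are (left, top, width, height) tuples in pixels and the head silhouette is a two-dimensional mask of 0/1 values. The function names are hypothetical.

```python
def head_size(head_mask):
    """Head size as the number of pixels in the head silhouette."""
    return sum(1 for row in head_mask for value in row if value)

def body_head_ratio(person_rect, head_rect):
    """Pixels in the human rectangle divided by pixels in the head rectangle."""
    _, _, person_w, person_h = person_rect
    _, _, head_w, head_h = head_rect
    return (person_w * person_h) / (head_w * head_h)

def rect_aspect_ratio(person_rect):
    """Vertical side of the human rectangle divided by its horizontal side."""
    _, _, person_w, person_h = person_rect
    return person_h / person_w

# A standing person seen from a distance: tall rectangle, small head.
person_rect = (10, 5, 40, 120)
head_rect = (20, 5, 20, 20)
ratio = body_head_ratio(person_rect, head_rect)   # 4800 / 400
aspect = rect_aspect_ratio(person_rect)           # 120 / 40
```

With these example rectangles the human body/head size ratio is 12 and the rectangle aspect ratio is 3.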

The posture estimation unit 115 estimates the posture of the target person 400, as joint points, by machine learning based on the photographed image rotated by the image rotation unit 113. DeepPose is preferably used for the machine learning.

FIG. 13 is an explanatory diagram showing the photographed images input to machine learning and the estimated postures that are output. The upper row of FIG. 13 shows the rotated photographed images that serve as input images, and the lower row shows the postures estimated as joint points. In the photographed images in the upper row, the joint points of the estimated posture are superimposed.

As shown in FIG. 13, the head direction in the photographed images input to the posture estimation unit 115 is unified to the vertically upward direction. Consequently, photographed images of, for example, standing, lying, and fallen postures are rotated by the image rotation unit 113 so that their head directions are all vertically upward, and they are therefore relatively likely to be classified into the same model (for example, the distant/front-standing model). Joint points can thus be estimated for multiple postures of the target person 400 using a common dictionary, which reduces the amount of learning data needed for the machine learning performed in advance.

The image reverse rotation unit 116 reverse-rotates the posture estimated as joint points to calculate the posture corresponding to the photographed image before rotation. That is, the image reverse rotation unit 116 rotates the posture estimated as joint points in the rotated photographed image in the direction opposite to the rotation applied by the image rotation unit 113, by the same rotation amount. The rotation amount can be acquired from the image rotation unit 113. As described above, photographed images of, for example, standing, lying, and fallen postures are rotated by the image rotation unit 113 so that the head direction is unified to the vertically upward direction, so the postures estimated as joint points by the posture estimation unit 115 also have the head direction pointing vertically upward. The image reverse rotation unit 116 therefore reverse-rotates the posture estimated as joint points to calculate the posture corresponding to the photographed image before rotation. Using this posture for the estimation of the behavior of the target person 400 by the behavior estimation unit 117, described later, improves the accuracy of the behavior estimation.
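Reverse rotation of the estimated joint points can be sketched as below. The sketch assumes that the image was rotated about a fixed center without resizing the canvas, that joint points are (x, y) pixel coordinates with y increasing downward, and that positive angles are counterclockwise on screen. The helper names are hypothetical; when a second rotation about the same center was also applied, the sum of both angles would be passed as the applied angle.

```python
import math

def rotate_points(points, angle_deg, center):
    """Rotate (x, y) pixel coordinates about center.

    Positive angles are counterclockwise as seen on the image
    (y increases downward).
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    cx, cy = center
    return [(cx + (x - cx) * c + (y - cy) * s,
             cy - (x - cx) * s + (y - cy) * c) for x, y in points]

def reverse_rotate_joints(joints, applied_angle_deg, center):
    """Map joint points estimated on the rotated image back to the original.

    applied_angle_deg is the rotation amount that was applied to the image;
    the joints are rotated by the same amount in the opposite direction.
    """
    return rotate_points(joints, -applied_angle_deg, center)

# Joint points in the original image, the -100 degree rotation of FIG. 4,
# and the reverse rotation that restores the original coordinates.
center = (50.0, 50.0)
original = [(10.0, 20.0), (30.0, 40.0), (55.0, 70.0)]
rotated = rotate_points(original, -100.0, center)
restored = reverse_rotate_joints(rotated, -100.0, center)
```

Here `restored` matches `original` up to floating-point error, so the joint points line up with the photographed image before rotation.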

A plurality of postures of the target person 400, expressed as joint points and reverse-rotated by the image reverse rotation unit 116, are input to the behavior estimation unit 117. The behavior estimation unit 117 estimates the behavior of the target person 400 based on the change between these joint-point postures. For example, the behavior estimation unit 117 can estimate the behavior of the target person 400 based on the difference between joint-point postures. This difference can be the difference between the joint-point postures estimated for adjacent frames of a video composed of a plurality of photographed images. For example, a fall can be inferred when the difference between joint-point postures exceeds a predetermined threshold and then almost disappears.
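The fall criterion described above can be sketched as follows. The description does not define how the posture difference is measured; this sketch assumes the mean Euclidean distance between corresponding joint points in adjacent frames. The two thresholds and the number of still frames are hypothetical parameters that would have to be tuned experimentally.

```python
import math

def pose_difference(pose_a, pose_b):
    """Mean distance between corresponding joint points of two postures."""
    return sum(math.dist(p, q) for p, q in zip(pose_a, pose_b)) / len(pose_a)

def detect_fall(poses, move_threshold, still_threshold, still_frames):
    """Return True if a large posture change is followed by near stillness.

    poses is a list of per-frame postures, each a list of (x, y) joint points.
    """
    diffs = [pose_difference(poses[i], poses[i + 1])
             for i in range(len(poses) - 1)]
    for i, diff in enumerate(diffs):
        if diff > move_threshold:
            following = diffs[i + 1:i + 1 + still_frames]
            # Require a full window of almost unchanged postures afterwards.
            if (len(following) == still_frames
                    and all(d < still_threshold for d in following)):
                return True
    return False

standing = [(0.0, 0.0), (0.0, 10.0)]
fallen = [(0.0, 50.0), (10.0, 50.0)]
fall_sequence = [standing, standing, fallen, fallen, fallen, fallen]
walking_sequence = [[(float(t), 0.0), (float(t), 10.0)] for t in range(6)]
```

With `move_threshold=20`, `still_threshold=1`, and `still_frames=3`, `detect_fall` returns `True` for `fall_sequence` and `False` for `walking_sequence`, where the posture keeps changing by a small amount.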

FIGS. 14 and 15 are flowcharts showing the operation of the behavior estimation system. The flowcharts can be executed by the control unit 110 of the posture estimation device 100 in accordance with a program.

The control unit 110 detects a human region, as a human rectangle or a human silhouette, from the photographed image received from the imaging device 200 (S101). While no human region is detected (S102: NO), the control unit 110 continues human region detection.

When a human region is detected (S102: YES), the control unit 110 detects a head region, as a head rectangle or a head silhouette, using the head region detection unit 112 (S103).

The control unit 110 rotates the photographed image so that the head direction becomes the vertically upward direction in the photographed image (S104).

When the target distance is equal to or less than a predetermined threshold (S105: YES), the control unit 110 rotates the photographed image again so that the major axis of the approximate ellipse of the human silhouette, which is the human region, becomes horizontal in the photographed image (S106). When the head size is equal to or greater than a predetermined threshold (S107: YES), the control unit 110 selects dictionary 1, which corresponds to the directly-below/standing model (S108). This threshold can be set experimentally from the viewpoint of posture estimation accuracy. When the head size is less than the predetermined threshold (S107: NO), the control unit 110 selects dictionary 2, which corresponds to the directly-below/sitting-or-lying model (S109).

When the target distance exceeds the predetermined threshold (S105: NO), the control unit 110 determines whether the human body/head size ratio is equal to or greater than a predetermined threshold (S110). This threshold can be set experimentally from the viewpoint of posture estimation accuracy. When the human body/head size ratio is equal to or greater than the predetermined threshold (S110: YES), the control unit 110 determines whether the aspect ratio is equal to or greater than a predetermined threshold (S111).

When the aspect ratio is equal to or greater than the predetermined threshold (S111: YES), the control unit 110 selects dictionary 3, which corresponds to the distant/oblique-standing model (S112). When the aspect ratio is less than the predetermined threshold (S111: NO), the control unit 110 selects dictionary 4, which corresponds to the distant/front-standing model (S113).

When the human body/head size ratio is less than the predetermined threshold (S110: NO), the control unit 110 selects dictionary 5, which corresponds to the distant/sitting-or-lying model (S114).
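The branching in steps S105 to S114 can be summarized in a short sketch. The structure follows the flowchart as described; the `Thresholds` class, the function name, and the numeric threshold values are hypothetical, since the description states only that the thresholds are set experimentally.

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    """Experimentally tuned thresholds (the values used below are placeholders)."""
    distance: float
    head_size: float
    body_head_ratio: float
    aspect_ratio: float

def select_dictionary(target_distance, head_size, body_head_ratio,
                      aspect_ratio, th):
    """Return the dictionary number (1 to 5) following steps S105 to S114."""
    if target_distance <= th.distance:            # S105: YES
        if head_size >= th.head_size:             # S107: YES
            return 1                              # directly below, standing
        return 2                                  # directly below, sitting or lying
    if body_head_ratio >= th.body_head_ratio:     # S110: YES
        if aspect_ratio >= th.aspect_ratio:       # S111: YES
            return 3                              # distant, standing obliquely
        return 4                                  # distant, standing facing front
    return 5                                      # distant, sitting or lying

th = Thresholds(distance=100.0, head_size=900.0,
                body_head_ratio=8.0, aspect_ratio=2.5)
choice = select_dictionary(target_distance=300.0, head_size=400.0,
                           body_head_ratio=12.0, aspect_ratio=3.0, th=th)
```

With these placeholder thresholds, the example input is far from the image center, has a large human body/head size ratio, and has a large aspect ratio, so `choice` is 3. Note that the head size is consulted only on the near branch, and the ratio and aspect ratio only on the far branch.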

The control unit 110 estimates the posture of the target person 400 by calculating joint points for the photographed image by machine learning using the selected dictionary (S115).

The control unit 110 reverse-rotates the joint-point posture to calculate, as joint points, the posture of the target person 400 corresponding to the photographed image before rotation (S116).

The control unit 110 estimates the behavior of the target person 400 based on the joint-point postures after reverse rotation (S117).

The present embodiment provides the following effects.

The direction of the target person's head in the photographed image is unified to a predetermined direction by rotating the photographed image, and the joint points of the target person are estimated as the posture by machine learning based on the rotated photographed image. As a result, the posture of the target person can be estimated with high accuracy from a small amount of learning data, regardless of the posture of the target person in the photographed image.

Furthermore, the human region is detected as the region inside the outline of the person's image or the region inside a rectangle circumscribing that outline, and the head region is detected as the region inside the outline of the image of the target person's head or the region inside a rectangle circumscribing that outline. The direction of the head region relative to a predetermined point in the human region in the photographed image is taken as the head direction. As a result, the posture of the target person can be estimated more accurately.

Furthermore, the predetermined direction to which the head direction is unified is the vertically upward direction in the photographed image. Because this makes it easy to increase the learning data, the posture of the target person can be estimated easily and with high accuracy.

Furthermore, the direction of the head region is the direction from the centroid of the human region, which serves as the predetermined point, toward the centroid of the head region. As a result, the head direction can be unified appropriately by rotating the photographed image.

Furthermore, the posture of the target person is estimated as joint points using DeepPose. As a result, highly reliable posture estimation can be achieved.

Furthermore, when the distance from the center of the photographed image to the human region is equal to or less than a predetermined threshold, an approximate ellipse of the human region is calculated, the photographed image is rotated again so that the major axis of the approximate ellipse becomes horizontal in the photographed image, and the posture is estimated based on the photographed image after the second rotation. As a result, the posture of the target person can be estimated with high accuracy even when the head direction is unclear because the person is photographed from a distance.

Furthermore, the dictionary used for machine learning is selected based on the distance from the center of the photographed image to the human region. Using different dictionaries for images photographed from a short distance and images photographed from a long distance improves the accuracy of estimating the target person's posture.

Furthermore, the dictionary used for machine learning is selected based on the head size. Using different dictionaries for images in which the target person is likely to be standing and for all other images, particularly among images photographed from a short distance, improves the accuracy of estimating the target person's posture.

Furthermore, the dictionary used for machine learning is selected based on the human body/head size ratio. Using different dictionaries for images in which the target person is likely to be standing and for all other images, particularly among images photographed from a long distance, improves the accuracy of estimating the target person's posture.

Furthermore, the dictionary used for machine learning is selected based on the aspect ratio of the human rectangle. Particularly for images of a standing subject captured at long range, using different dictionaries for images in which the subject was likely captured obliquely and images in which the subject was likely captured from the front or back improves the accuracy of the subject's posture estimation.

Furthermore, the dictionary used for machine learning is selected based on at least one of the distance from the center of the captured image to the human region, the head size, the human body / head size ratio, and the aspect ratio. Selecting a more suitable dictionary in this way enables more accurate posture estimation with less learning data.
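One way the four quantities could be combined is as a small decision tree, sketched below in Python. The tree follows the roles described above: head size separates close-range images, the human body / head size ratio separates long-range images, and the aspect ratio further separates long-range standing images. The threshold values, the direction of each comparison, the assumption that a small distance means close range, and all names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionFeatures:
    distance: float         # image center to human region, in pixels
    head_size: float        # size of the head region, e.g. area in pixels
    body_head_ratio: float  # human-region area / head-region area
    aspect_ratio: float     # vertical side / horizontal side of the rectangle


@dataclass(frozen=True)
class Thresholds:
    distance: float
    head_size: float
    body_head_ratio: float
    aspect_ratio: float


def select_model(f, t):
    """Return a hypothetical model label; one dictionary is kept per label."""
    if f.distance <= t.distance:
        # Assumed close range: a large head suggests a standing subject.
        return "near_standing" if f.head_size >= t.head_size else "near_other"
    if f.body_head_ratio < t.body_head_ratio:
        # Assumed long range, subject probably not standing.
        return "far_other"
    # Assumed long range, standing: split by rectangle aspect ratio.
    if f.aspect_ratio >= t.aspect_ratio:
        return "far_standing_front_or_back"
    return "far_standing_oblique"
```

Dropping the inner comparisons reduces the tree to a distance-only split, so the same structure can serve fewer models.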

Furthermore, the posture corresponding to the captured image before rotation is calculated by reversely rotating the posture estimated as joint points for the captured image after rotation, and the behavior of the subject is estimated based on changes among the plurality of calculated postures. This makes it possible to estimate the behavior of the subject with high accuracy using a small amount of learning data.
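The reverse rotation can be sketched as applying the negated rotation angle to the estimated joint points. The sketch below assumes that the image was rotated by a known angle about a known center and that joint points are (x, y) pairs; the function names are illustrative.

```python
import math


def rotate_points(points, angle_deg, center):
    """Rotate (x, y) points by angle_deg degrees about center."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    return [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in points
    ]


def to_pre_rotation_frame(joints_rotated, angle_deg, center):
    """Map joint points estimated on the rotated image back to the original image.

    angle_deg and center must be the values used for the forward rotation.
    """
    return rotate_points(joints_rotated, -angle_deg, center)
```

Because every posture is mapped back to the coordinate frame of the original image, postures from successive frames can be compared directly even when each frame was rotated by a different angle.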

The present invention is not limited to the embodiments described above.

For example, the head direction may be unified, by rotating the captured image, to a predetermined direction other than vertically upward in the captured image. For example, the predetermined direction may be the horizontal direction of the captured image.
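The rotation angle for an arbitrary predetermined direction can be sketched as the difference between the target direction and the current head direction, where the head direction is taken from the center of gravity of the human region to the center of gravity of the head region. The sketch below is a minimal Python illustration under that definition; the function names and the use of image coordinates (x to the right, y downward) are assumptions.

```python
import math


def angle_to_target(person_centroid, head_centroid, target=(0.0, -1.0)):
    """Rotation angle (radians) that turns the head direction onto `target`.

    Angles are expressed in image coordinates (x right, y down), so the
    default target (0, -1) is vertically upward in the image.
    """
    dx = head_centroid[0] - person_centroid[0]
    dy = head_centroid[1] - person_centroid[1]
    if dx == 0 and dy == 0:
        raise ValueError("head direction is undefined")
    delta = math.atan2(target[1], target[0]) - math.atan2(dy, dx)
    # Normalize to [-pi, pi] so the smaller of the two rotations is used.
    return math.atan2(math.sin(delta), math.cos(delta))


def rotate_point(point, angle, center):
    """Rotate one (x, y) point by `angle` radians about `center`."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * cos_a - dy * sin_a,
            center[1] + dx * sin_a + dy * cos_a)
```

Passing `target=(1.0, 0.0)` unifies the head direction to the horizontal direction instead of the vertically upward direction.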

In addition, although captured images are classified into five models in order to select the dictionary used for machine learning, the number of models may be fewer than five. For example, images may be classified into two models based only on the magnitude of the target distance, and a dictionary corresponding to each model may be selected.

In addition, part or all of the processing executed by the program in the embodiments may instead be implemented in hardware such as circuits.

10 posture estimation system,
100 posture estimation device,
200 imaging device,
300 communication network.

Claims

A posture estimation system for estimating a posture of a subject based on a captured image, comprising:
a human region detection unit that detects a region including an image of the subject in the captured image as a human region;
a head region detection unit that detects a region including an image of the head of the subject in the captured image as a head region;
an image rotation unit that unifies, to a predetermined direction, the direction of the head region with respect to a predetermined point in the human region across captured images in which a plurality of postures of the subject are respectively captured, by rotating each captured image; and
a posture estimation unit that estimates joint points of the subject as the posture by machine learning based on the captured image after rotation.

The posture estimation system according to claim 1, wherein the human region is a region within the contour of the image of the subject or a region within a rectangle circumscribing that contour, and the head region is a region within the contour of the image of the head of the subject or a region within a rectangle circumscribing that contour.

The posture estimation system according to claim 1, wherein the predetermined direction is a vertically upward direction in the photographed image.

The posture estimation system according to any one of claims 1 to 3, wherein the direction of the head region is the direction from the center of gravity of the human region, which is the predetermined point, to the center of gravity of the head region.

The posture estimation system according to any one of claims 1 to 4, wherein the posture estimation unit estimates the posture of the subject by Deep Pose.

The posture estimation system according to any one of claims 1 to 5, further comprising:
a distance calculation unit that calculates a distance from the center of the captured image to the human region;
an approximate ellipse calculation unit that calculates an approximate ellipse of the human region in the captured image after rotation by the image rotation unit when the distance is equal to or less than a predetermined threshold; and
an image re-rotation unit that further rotates the captured image after rotation so that the direction of the major axis of the approximate ellipse is horizontal in the captured image,
wherein the posture estimation unit estimates the posture of the subject based on the captured image after rotation by the image re-rotation unit.

The posture estimation system according to claim 6, wherein the posture estimation unit selects a dictionary used in the machine learning based on the distance calculated by the distance calculation unit.

The posture estimation system according to claim 6, further comprising a head size calculation unit that calculates the size of the head region in the captured image as a head size,
wherein the posture estimation unit selects a dictionary used for the machine learning based on the head size.

The posture estimation system according to any one of claims 1 to 6, further comprising a human body / head size ratio calculation unit that calculates the ratio of the area of the human region to the area of the head region as a human body / head size ratio,
wherein the posture estimation unit selects a dictionary used in the machine learning based on the human body / head size ratio.

The posture estimation system according to claim 2, wherein the human region is a region within a rectangle circumscribing the contour of the image of the subject,
the system further comprises an aspect ratio calculation unit that calculates an aspect ratio that is the ratio of the length of the vertical side of the rectangle to the length of the horizontal side of the rectangle, and
the posture estimation unit selects a dictionary used in the machine learning based on the aspect ratio.

The posture estimation system according to any one of claims 1 to 5, further comprising:
a distance calculation unit that calculates a distance from the center of the captured image to the human region;
a head size calculation unit that calculates the size of the head region in the captured image as a head size;
a human body / head size ratio calculation unit that calculates a ratio of an area of the head region to an area of the human region as a human body / head size ratio; and
an aspect ratio calculation unit that calculates, when the human region is a region within a rectangle circumscribing the contour of the image of the subject, an aspect ratio that is the ratio of the length of the vertical side of the human region to the length of its horizontal side,
wherein the posture estimation unit selects a dictionary used in the machine learning based on at least one of the calculated distance, the head size, the human body / head size ratio, and the aspect ratio.

An action estimation system comprising:
the posture estimation system according to any one of claims 1 to 11;
a pre-rotation posture calculation unit that calculates the posture corresponding to the captured image before rotation by reversely rotating the posture estimated as the joint points corresponding to the captured image after rotation; and
an action estimation unit that estimates the action of the subject based on changes between the plurality of postures calculated by the pre-rotation posture calculation unit.

A posture estimation program that causes a computer to execute:
a procedure (a) of detecting a region including an image of a subject in a captured image as a human region;
a procedure (b) of detecting a region including an image of the head of the subject in the captured image as a head region;
a procedure (c) of unifying, to a predetermined direction, the direction of the head region with respect to a predetermined point in the human region across captured images in which a plurality of postures of the subject are respectively captured, by rotating each captured image; and
a procedure (d) of estimating joint points of the subject as a posture by machine learning based on the captured image after rotation.