JP2022095994A

JP2022095994A - Image processing system, image processing program, and image processing method

Info

Publication number: JP2022095994A
Application number: JP2022072168A
Authority: JP
Inventors: 智也岡▲崎▼; Tomoya Okazaki; 希武田中; Nozomu Tanaka; 直樹池田; Naoki Ikeda
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2019-05-29
Filing date: 2022-04-26
Publication date: 2022-06-28
Anticipated expiration: 2040-04-06
Also published as: JP7067672B2; WO2020241057A1; JP7347577B2; JPWO2020241057A1

Abstract

PROBLEM TO BE SOLVED: To provide an image processing system capable of estimating the posture of a person with high accuracy based on a photographed image even when the installation height of a photographing device varies.

SOLUTION: An image processing system includes: an image acquisition part for acquiring an image of an entire predetermined photographing area captured by a photographing device installed at an installation position overlooking the photographing area; a person region detection part for detecting a person region from the image; an information acquisition part for acquiring information on the height of the installation position above a predetermined position; and a posture estimation part for estimating the posture of a person by machine learning based on the person region and the information on the height.

SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing system, an image processing program, and an image processing method.

In Japan, life expectancy has risen markedly owing to the higher living standards, better sanitation, and improved medical care that accompanied the rapid economic growth after the war. Combined with a declining birth rate, this has produced an aging society with a high proportion of elderly people. In such a society, the number of people who need nursing care or similar support because of illness, injury, or aging is expected to increase.

In facilities such as hospitals and welfare facilities for the elderly, people who need care may fall over while walking or fall out of bed and be injured. Systems that detect the state of such a person from captured images are therefore being developed so that staff such as caregivers and nurses can rush to the person immediately when this happens. To detect the person's state with such a system, the posture and the like of the person to be detected must be detected from the captured image with high accuracy.

However, particularly in an omnidirectional image captured with a wide-angle camera such as a fisheye-lens camera, the distortion characteristics of the lens cause the appearance of a person in the image to change with the person's position, even for the same person.

Techniques for detecting the posture of a person from an image such as an omnidirectional image include those disclosed in Patent Documents 1 and 2 below.

Patent Document 1 discloses a technique in which a person's personal ID is registered in association with the size of that person's head in the image when standing upright, the size of the image of the person's head is detected in an image, and the person's posture is determined from the detected size of the head image. Patent Document 2 discloses a technique in which a person region containing a person is detected from an image, and the person's posture is estimated from the image of the person region by a neural network or the like trained in advance on teacher data that pairs person-region images with postures.

Japanese Unexamined Patent Publication No. 2015-158952
Japanese Unexamined Patent Publication No. 2018-206321

However, because the technique of Patent Document 1 merely determines a person's posture from the size of the head image, it cannot estimate the posture with high accuracy. With the technique of Patent Document 2, the accuracy of image-based posture estimation falls when the height of the photographing device changes, for example because the photographing device is moved to a different installation location.

The present invention has been made to solve these problems. That is, an object of the present invention is to provide an image processing system, an image processing program, and an image processing method that can estimate the posture of a person with high accuracy from a captured image even if the installation height of the photographing device varies.

The above problems are solved by the following means.

(1) An image processing system having: an image acquisition unit that acquires an image of an entire predetermined photographing area captured by a photographing device installed at an installation position overlooking the photographing area; a person region detection unit that detects a person region from the image; an information acquisition unit that acquires information on the height of the installation position above a predetermined position; and a posture estimation unit that estimates the posture of a person by machine learning based on the person region and the height information.

(2) The image processing system according to (1) above, further having a correction unit that, based on the height information, corrects the person region to match the case where the height above the predetermined position is a preset reference height, wherein the posture estimation unit has been trained in advance using, as teacher data, combinations of the person region detected from the image captured by the photographing device installed at the reference height and the posture of the person, and estimates the posture of the person based on the person region corrected by the correction unit.

(3) The image processing system according to (1) above, further having: a feature point estimation unit that estimates feature points relating to the human body from the person region; and a correction unit that, based on the height information, corrects the feature points to match the case where the height above the predetermined position is a preset reference height, wherein the posture estimation unit has been trained in advance using, as teacher data, combinations of the feature points estimated from the person region detected from the image captured by the photographing device installed at the reference height and the posture of the person, and estimates the posture of the person based on the feature points corrected by the correction unit.

(4) An image processing program for causing a computer to execute processing having: a procedure (a) of acquiring an image of an entire predetermined photographing area captured by a photographing device installed at an installation position overlooking the photographing area; a procedure (b) of detecting a person region from the image; a procedure (c) of acquiring information on the height of the installation position above a predetermined position; and a procedure (d) of estimating the posture of a person by machine learning based on the person region and the height information.

(5) The image processing program according to (4) above, wherein the processing further has a procedure (e) of correcting, based on the height information, the person region to match the case where the height above the predetermined position is a preset reference height, and wherein, in the procedure (d), the posture of the person is estimated based on the person region corrected in the procedure (e) by a posture estimation unit trained in advance using, as teacher data, combinations of the person region detected from the image captured by the photographing device installed at the reference height and the posture of the person.

(6) The image processing program according to (4) above, wherein the processing further has a procedure (f) of estimating feature points relating to the human body from the person region, and a procedure (g) of correcting, based on the height information, the feature points to match the case where the height above the predetermined position is a preset reference height, and wherein, in the procedure (d), the posture of the person is estimated based on the feature points corrected in the procedure (g) by a posture estimation unit trained in advance using, as teacher data, combinations of the feature points estimated from the person region detected from the image captured by the photographing device installed at the reference height and the posture of the person.

(7) An image processing method performed by an image processing system, the method having: a step (a) of acquiring an image of an entire predetermined photographing area captured by a photographing device installed at an installation position overlooking the photographing area; a step (b) of detecting a person region from the image; a step (c) of acquiring information on the height of the installation position above a predetermined position; and a step (d) of estimating the posture of a person by machine learning based on the person region and the height information.

(8) The image processing method according to (7) above, further having a step (e) of correcting, based on the height information, the person region to match the case where the height above the predetermined position is a preset reference height, wherein, in the step (d), the posture of the person is estimated based on the person region corrected in the step (e) by a posture estimation unit trained in advance using, as teacher data, combinations of the person region detected from the image captured by the photographing device installed at the reference height and the posture of the person.

(9) The image processing method according to (7) above, further having a step (f) of estimating feature points relating to the human body from the person region, and a step (g) of correcting, based on the height information, the feature points to match the case where the height above the predetermined position is a preset reference height, wherein, in the step (d), the posture of the person is estimated based on the feature points corrected in the step (g) by a posture estimation unit trained in advance using, as teacher data, combinations of the feature points estimated from the person region detected from the image captured by the photographing device installed at the reference height and the posture of the person.

A person region is detected from an image of an entire predetermined photographing area captured by a photographing device installed at an installation position overlooking the photographing area, and the posture of the person is estimated by machine learning based on the person region and information on the height at which the photographing device is installed. As a result, the posture of the person can be estimated with high accuracy from the captured image even if the installation height of the photographing device varies.

A diagram showing the schematic configuration of an image recognition system including an image recognition device according to an embodiment.
A block diagram showing the hardware configuration of the image recognition device.
A block diagram showing the functions of the control unit of the image recognition device.
A diagram showing a person region detected in an image.
An explanatory diagram showing feature points.
An explanatory diagram for explaining correction of the feature points by the correction unit.
An explanatory diagram for explaining the feature points before and after correction on an image.
A flowchart showing the operation of the image recognition device.
A block diagram showing the functions of the control unit of the image recognition device.
A flowchart showing the operation of the image recognition device.
A block diagram showing the functions of the control unit of the image recognition device.
A flowchart showing the operation of the image recognition device.

Hereinafter, an image processing system, an image processing program, and an image processing method according to embodiments of the present invention will be described with reference to the drawings. In the drawings, identical elements are given identical reference numerals, and duplicate description is omitted. The dimensional ratios in the drawings are exaggerated for convenience of explanation and may differ from the actual ratios.

(First Embodiment)
FIG. 1 is a diagram showing the schematic configuration of an image recognition system 10 including an image recognition device 100 according to the embodiment.

The image recognition system 10 includes an image recognition device 100, a photographing device 200, a communication network 300, and a mobile terminal 400. The image recognition device 100 is connected to the photographing device 200 and the mobile terminal 400 through the communication network 300 so that they can communicate with one another.

The image recognition device 100 receives an image captured by the photographing device 200 (hereinafter also simply "image 600"; see FIG. 4 and elsewhere) from the photographing device 200, and detects, in the image 600, a region containing the subject 500, who is a person, as a person region 610. The image recognition device 100 can detect the person region 610 by detecting regions of the image 600 in which objects exist and estimating the category of the object contained in each detected region. A region in which an object exists is detected as a rectangle (candidate rectangle) containing the object in the image 600. The image recognition device 100 detects the person region 610 by selecting, from the detected candidate rectangles, those whose object category is estimated to be a person. The image recognition device 100 detects the posture or behavior of the subject 500 based on the person region 610. Postures include standing, half-crouching, sitting, lying, squatting, and sitting on the floor. Behaviors include getting up, leaving the bed, falling over, and falling off. As described later, the person region 610 is corrected, based on the height of the installation position of the photographing device 200, to match the case where that height is a preset reference height. An event relating to the subject 500 can also be detected from the estimated posture or behavior. An event is a change in the state of the subject 500 recognized by the image recognition device 100 or the like, and is an occurrence about which the staff 80 should be alerted (notified), such as getting up, leaving the bed, falling over, or falling off. When the image recognition device 100 detects an event, it transmits an event notification reporting the content of the event to the mobile terminal 400. The image recognition device 100 can detect the subject 500 as the person region 610 in the image 600 by means of a neural network (hereinafter "NN"). Known methods for detecting a target object with an NN include, for example, Faster R-CNN, Fast R-CNN, and R-CNN. The image recognition device 100 is constituted by a computer, and may be constituted by, for example, a server.
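As an illustration of the event notification described above, the following is a minimal sketch, not the implementation of the image recognition device 100. The behavior labels, the `EventNotification` type, and the function name are hypothetical names introduced here; the only thing taken from the text is that the four example behaviors (getting up, leaving the bed, falling over, and falling off) are events about which staff should be notified.

```python
from dataclasses import dataclass
from typing import Optional

# Behaviors the text gives as examples of events that warrant notifying staff.
# The string labels are hypothetical identifiers chosen for this sketch.
NOTIFIABLE_BEHAVIORS = {"getting_up", "leaving_bed", "falling_over", "falling_off"}


@dataclass
class EventNotification:
    """Hypothetical payload sent to the mobile terminal."""
    subject_id: int
    event: str


def make_event_notification(subject_id: int, behavior: str) -> Optional[EventNotification]:
    """Return a notification if the estimated behavior is a notifiable event, else None."""
    if behavior in NOTIFIABLE_BEHAVIORS:
        return EventNotification(subject_id, behavior)
    return None
```

In this sketch a behavior such as sitting produces no notification, which mirrors the distinction the text draws between postures in general and events that should be reported.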

The photographing device 200 is constituted by, for example, a near-infrared camera, is installed at an installation position overlooking a predetermined photographing area, and photographs the entire photographing area. The installation position of the photographing device 200 is, for example, the ceiling of the living room of the subject 500. The predetermined photographing area is, for example, a three-dimensional region including the entire floor surface of the living room. In the following description, the photographing device 200 is assumed to be installed on the ceiling of the living room of the subject 500. The photographing device 200 can photograph the photographing area by irradiating it with near-infrared light from an LED (Light Emitting Diode) and receiving, with a CMOS (Complementary Metal Oxide Semiconductor) sensor, the near-infrared light reflected by objects in the photographing area. The image 600 may be a monochrome image in which each pixel value is a near-infrared reflectance.

The photographing device 200 can photograph the photographing area as a moving image with a frame rate of, for example, 15 fps to 30 fps. The term image 600 covers both moving images and still images. The photographing device 200 transmits the image 600 to the image recognition device 100 and the like.

The photographing device 200 may be constituted by a sensor box having a computer. A sensor box is a device equipped with a near-infrared camera, a body motion sensor, and the like. In this case, the image 600 is transmitted from the sensor box to the image recognition device 100. The sensor box may provide some or all of the functions of the image recognition device 100. The body motion sensor is a Doppler-shift sensor that transmits microwaves toward the bed, receives the reflected microwaves, and detects the Doppler shift caused by body motion (for example, respiratory motion) of the subject 500.

The communication network 300 may use a network interface conforming to a wired communication standard such as Ethernet (registered trademark). The communication network 300 may also use a network interface conforming to a wireless communication standard such as Bluetooth (registered trademark) or IEEE 802.11. The communication network 300 is provided with an access point 310, which connects the mobile terminal 400 to the image recognition device 100 and the photographing device 200 over a wireless communication network so that they can communicate.

The mobile terminal 400 receives the event notification from the image recognition device 100 and displays its content. In addition to detection results for getting up, leaving the bed, falling over, and falling off, the event notification may include detection results such as abnormal micro body motion. The mobile terminal 400 can receive the image 600 from the photographing device 200 or the image recognition device 100 and display it. The mobile terminal 400 is constituted by, for example, a smartphone.

FIG. 2 is a block diagram showing the hardware configuration of the image recognition device 100. The image recognition device 100 includes a control unit 110, a storage unit 120, a display unit 130, an input unit 140, and a communication unit 150. These components are connected to one another via a bus 160.

The control unit 110 is constituted by a CPU (Central Processing Unit), and controls each unit of the image recognition device 100 and performs arithmetic processing in accordance with a program. The functions of the control unit 110 are described in detail later.

The storage unit 120 may be constituted by a RAM (Random Access Memory), a ROM (Read Only Memory), and an SSD (Solid State Drive). The RAM temporarily stores programs and data as a work area for the control unit 110. The ROM stores various programs and data in advance. The SSD stores various programs, including the operating system, and various data.

The display unit 130 is, for example, a liquid crystal display, and displays various kinds of information.

The input unit 140 is constituted by, for example, a touch panel and various keys, and is used for various operations and inputs.

The communication unit 150 is an interface for communicating with external devices. For communication, interfaces conforming to standards such as Ethernet (registered trademark), SATA, PCI Express, USB, and IEEE 1394 may be used. Wireless communication interfaces such as Bluetooth (registered trademark), IEEE 802.11, and 4G may also be used. The communication unit 150 receives the image 600 from the photographing device 200 and transmits the event notification to the mobile terminal 400.

The functions of the control unit 110 are described in detail below.

FIG. 3 is a block diagram showing the functions of the control unit 110 of the image recognition device 100. The control unit 110 includes an image acquisition unit 111, a person region detection unit 112, a feature point estimation unit 113, a height information acquisition unit 114, a correction unit 115, and a posture estimation unit 116. The height information acquisition unit 114 constitutes the information acquisition unit.

The image acquisition unit 111 acquires the image 600 received from the photographing device 200 by the communication unit 150.

The person region detection unit 112 is implemented as an NN. The NN reflects person region detection parameters obtained through training to detect person regions. The person region detection unit 112 generates, by convolution operations on the image 600, a feature map in which pixel features have been extracted. From the feature map, the person region detection unit 112 detects regions of the image 600 in which objects exist as candidate rectangles. The person region detection unit 112 can detect the candidate rectangles by a known NN-based technique such as Faster R-CNN. For each candidate rectangle, the person region detection unit 112 calculates a confidence score for each of a set of predetermined categories, one of which is person. The confidence score is the likelihood of each predetermined category. The person region detection unit 112 can calculate the confidence scores by a known NN-based technique such as Faster R-CNN. The predetermined categories may be, for example, person, chair, and equipment. The person region detection unit 112 detects, as a person region 610, each candidate rectangle for which the category with the highest confidence score is person. For any one candidate rectangle, the confidence scores calculated for the predetermined categories sum to 1.
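The selection rule described above (keep a candidate rectangle when the highest-scoring category is person) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the detector itself: the `CandidateRect` type, the function name, and the hard-coded scores are hypothetical, and the candidate rectangles and confidence scores are assumed to have already been produced by a detector such as Faster R-CNN.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in image pixels


@dataclass
class CandidateRect:
    """Hypothetical container for one candidate rectangle and its scores."""
    box: Box
    scores: Dict[str, float]  # per-category confidence scores, summing to 1


def select_person_regions(candidates: List[CandidateRect],
                          person_label: str = "person") -> List[Box]:
    """Keep the rectangles whose highest-scoring category is the person category."""
    regions = []
    for candidate in candidates:
        best_category = max(candidate.scores, key=candidate.scores.get)
        if best_category == person_label:
            regions.append(candidate.box)
    return regions


# Example scores for the three example categories named in the text.
candidates = [
    CandidateRect((10, 20, 60, 120), {"person": 0.7, "chair": 0.2, "equipment": 0.1}),
    CandidateRect((200, 40, 260, 90), {"person": 0.1, "chair": 0.8, "equipment": 0.1}),
]
person_regions = select_person_regions(candidates)
```

Only the first rectangle is kept here, because person is its highest-scoring category; the second is classified as a chair and is discarded.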

The person region detection unit 112 has been trained in advance to estimate the person region 610 from the image 600, using teacher data that pairs each image 600 with the person region 610 set as the correct answer for that image. The person region detection parameters described above are thereby reflected in the person region detection unit 112.

FIG. 4 is a diagram showing a person region 610 detected in the image 600.

In the example of FIG. 4, the person region 610 is detected as a rectangular region surrounding the subject 500, who is a person. In this case, the person region 610 can be output together with the image 600 as the coordinates, in the image 600, of two points forming either pair of diagonally opposite vertices of the rectangle of the person region 610.
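Because either pair of diagonally opposite vertices identifies the same rectangle, a consumer of this output may want to normalize the two points into one canonical form. The helper below is a minimal sketch of that idea; the function name and the (x_min, y_min, x_max, y_max) convention are assumptions made for this example and are not specified in the text.

```python
from typing import Tuple

Point = Tuple[float, float]


def rect_from_opposite_vertices(p: Point, q: Point) -> Tuple[float, float, float, float]:
    """Convert two diagonally opposite vertices into (x_min, y_min, x_max, y_max)."""
    (x1, y1), (x2, y2) = p, q
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
```

For instance, the vertex pairs (10, 20), (60, 120) and (60, 20), (10, 120) lie on the two diagonals of the same rectangle, and both normalize to (10, 20, 60, 120).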

The feature point estimation unit 113 is implemented as an NN. The NN reflects feature point estimation parameters obtained through training to estimate feature points 620 relating to the human body (hereinafter also simply "feature points 620"). The feature point estimation unit 113 estimates the feature points 620 based on the person region 610. The feature points 620 may include joint points 621 and the diagonally opposite vertices 622 of a head rectangle 623. The head rectangle 623 is a rectangular region containing the person's head. The feature point estimation unit 113 can estimate the feature points 620 by a known NN-based technique such as DeepPose. DeepPose is described in detail in the literature (Alexander Toshev, et al., “DeepPose: Human Pose Estimation via Deep Neural Networks”, in CVPR, 2014).

図５は、特徴点６２０を示す説明図である。図５には、人物領域６１０（より詳しくは、人物領域６１０の枠）も併せて示されている。 FIG. 5 is an explanatory diagram showing a feature point 620. FIG. 5 also shows a person area 610 (more specifically, a frame of the person area 610).

図５の例において、白い丸はそれぞれ関節点６２１を示し、グレーの丸はそれぞれ頭部矩形６２３の対頂点６２２を示している。 In the example of FIG. 5, the white circles indicate the joint points 621, and the gray circles indicate the opposite vertices 622 of the head rectangle 623, respectively.

特徴点推定部１１３は、人物領域６１０と、当該人物領域６１０に対する正解として設定された特徴点６２０との組合せの教師データを用いて、人物領域６１０から特徴点６２０を推定するための学習が予めされている。これにより、特徴点推定部１１３には、上述した特徴点推定パラメーターが反映されている。 The feature point estimation unit 113 is trained in advance to estimate the feature points 620 from the person area 610, using teacher data that pairs each person area 610 with the feature points 620 set as the correct answer for that person area 610. As a result, the feature point estimation parameters described above are reflected in the feature point estimation unit 113.

高さ情報取得部１１４は、例えば、ユーザーにより入力部１４０において入力された、撮影装置２００の設置位置の、所定位置からの高さの情報を取得する。以下、撮影装置２００の設置位置の所定位置からの高さを「撮影装置２００の高さ」と、所定位置からの高さの情報を単に「高さ情報」ともそれぞれ称する。所定位置は任意かつ一定の位置であり、例えば、床面の位置とし得る。 The height information acquisition unit 114 acquires, for example, the height information of the installation position of the photographing apparatus 200, which is input by the user in the input unit 140, from a predetermined position. Hereinafter, the height of the installation position of the photographing device 200 from a predetermined position is also referred to as “height of the photographing device 200”, and the information of the height from the predetermined position is also referred to simply as “height information”. The predetermined position is an arbitrary and constant position, and may be, for example, a position on the floor surface.

補正部１１５は、撮影装置２００の高さが予め設定された基準高さである場合に合わせて、高さ情報に基づいて、特徴点６２０（より詳細には、画像６００における特徴点６２０の座標）を補正する。 Based on the height information, the correction unit 115 corrects the feature points 620 (more specifically, the coordinates of the feature points 620 in the image 600) to match the case where the height of the photographing device 200 is a preset reference height.

図６は、補正部１１５による特徴点６２０の補正について説明するための説明図である。 FIG. 6 is an explanatory diagram for explaining the correction of the feature point 620 by the correction unit 115.

図６において、撮影装置２００Ａは、基準高さである、床面から２４０ｃｍの設置位置に設置されている。撮影装置２００Ｂは、基準高さより高い、床面から２７０ｃｍの設置位置に設置されている。撮影装置２００Ａと撮影装置２００Ｂとで、床面上で固定された同じ対象者５００を撮影すると、同じ対象者５００を撮影しているにもかかわらず、画像６００上の対象者５００の大きさが異なる。具体的には、床面から２７０ｃｍの高さの撮影装置２００Ｂにより撮影された画像６００上の対象者５００の方が、床面から２４０ｃｍの高さの撮影装置２００Ａにより撮影された画像６００上の対象者５００よりも小さくなる。このような、撮影装置２００の高さが変化することによる画像６００上の対象者５００の大きさの変動は、後述する、姿勢推定部１１６による、特徴点６２０に基づく対象者５００の姿勢の推定精度を低下させ得る。このような姿勢の推定精度の低下は、様々な高さの撮影装置２００で撮影された画像６００から推定された特徴点６２０と、当該特徴点６２０に対する正解として設定された姿勢との組合せの教師データを用いて姿勢推定部１１６を学習させることで防止できる。しかし、姿勢の推定精度を維持するために必要な教師データの量が増大する。そこで、撮影装置２００の高さが変動しても、必要な教師データの量を増大させずに、高精度な姿勢推定を実現するために、特徴点６２０を補正する。具体的には、特徴点６２０を、基準高さの撮影装置２００により撮影された場合に合わせて補正する。すなわち、各特徴点６２０相互の距離および位置の相対的関係が、撮影装置２００が基準高さに設置された場合の各特徴点６２０相互の距離および位置の相対的関係となるように、特徴点６２０を補正する。 In FIG. 6, the photographing device 200A is installed at the reference height, 240 cm above the floor surface. The photographing device 200B is installed 270 cm above the floor surface, higher than the reference height. When the photographing devices 200A and 200B photograph the same subject 500 at a fixed position on the floor surface, the size of the subject 500 on the image 600 differs even though the subject is the same. Specifically, the subject 500 appears smaller on the image 600 captured by the photographing device 200B, 270 cm above the floor, than on the image 600 captured by the photographing device 200A, 240 cm above the floor. Such variation in the size of the subject 500 on the image 600, caused by a change in the height of the photographing device 200, can reduce the accuracy with which the posture estimation unit 116, described later, estimates the posture of the subject 500 based on the feature points 620. This loss of accuracy can be prevented by training the posture estimation unit 116 with teacher data that combines feature points 620 estimated from images 600 captured by photographing devices 200 at various heights with the postures set as the correct answers for those feature points 620. However, doing so increases the amount of teacher data required to maintain posture estimation accuracy. Therefore, the feature points 620 are corrected so that highly accurate posture estimation is achieved without increasing the amount of required teacher data, even when the height of the photographing device 200 varies. Specifically, the feature points 620 are corrected to match the case where the image is captured by a photographing device 200 at the reference height. That is, the feature points 620 are corrected so that the relative distances and positions among the feature points 620 become those that would be obtained if the photographing device 200 were installed at the reference height.

図６に示すように、床面に対する基準画像平面の高さは、αｃｍである。従って、基準画像平面と撮影装置２００Ａとの距離は（２４０－α）ｃｍであり、基準画像平面と撮影装置２００Ｂとの距離は（２７０－α）ｃｍである。そうすると、基準画像平面上のものが画像６００に映る長さの、撮影装置２００Ａによる画像６００と、撮影装置２００Ｂによる画像６００との比は、（２４０－α）と（２７０－α）との比になる。そこで、撮影装置２００Ｂにより撮影された画像６００から推定された特徴点６２０の、画像６００上の画像６００の中心からの距離Ｌが、下記式による補正後の距離Ｌ’となるように、当該特徴点６２０の、当該画像６００上の座標を補正する。基準画像平面の高さは、姿勢推定精度の観点から実験により適当な一定の値に設定し得る。 As shown in FIG. 6, the height of the reference image plane above the floor surface is α cm. Accordingly, the distance between the reference image plane and the photographing device 200A is (240-α) cm, and the distance between the reference image plane and the photographing device 200B is (270-α) cm. The ratio between the length at which an object on the reference image plane appears in the image 600 from the photographing device 200A and the length at which it appears in the image 600 from the photographing device 200B is therefore given by the ratio between (240-α) and (270-α). Accordingly, the coordinates on the image 600 of each feature point 620 estimated from the image 600 captured by the photographing device 200B are corrected so that its distance L from the center of the image 600 becomes the corrected distance L' given by the following equation. The height of the reference image plane can be set experimentally to a suitable constant value from the viewpoint of posture estimation accuracy.

Ｌ’＝Ｌ×（２７０－α）／（２４０－α） L' = L × (270-α) / (240-α)

図７は、補正前後の特徴点６２０を画像６００上で説明するための説明図である。 FIG. 7 is an explanatory diagram illustrating the feature points 620 before and after the correction on the image 600.

図７において、画像６００の中心が黒い点で示されており、特徴点６２０が白抜きの丸で示されている。画像６００の中心との距離がＬである特徴点６２０が補正前の特徴点である。画像６００の中心との距離がＬ’である特徴点６２０が補正後の特徴点である。図７に示すように、特徴点６２０は、画像６００の中心に対する特徴点６２０の方向は変えずに、画像６００の中心との距離を上記式により変更することで、その座標が補正される。 In FIG. 7, the center of the image 600 is indicated by a black dot, and the feature points 620 are indicated by white circles. The feature point 620 whose distance from the center of the image 600 is L is the feature point before correction. The feature point 620 whose distance from the center of the image 600 is L' is the feature point after correction. As shown in FIG. 7, the coordinates of the feature point 620 are corrected by changing its distance from the center of the image 600 according to the above equation, without changing its direction with respect to the center of the image 600.
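The correction described above (same direction from the image center, distance rescaled by the equation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, the equation is generalized from the 270 cm and 240 cm example to an arbitrary camera height h and reference height h_ref as L' = L × (h-α) / (h_ref-α), and the value α = 90 cm in the usage lines is purely illustrative because the specification leaves α to be set experimentally.

```python
import math


def correct_feature_point(point, center, camera_height_cm,
                          reference_height_cm, plane_height_cm):
    """Rescale a feature point's distance from the image center.

    Applies L' = L * (h - alpha) / (h_ref - alpha), keeping the direction
    from the image center unchanged. Generalizing the 270/240 example to
    arbitrary heights is an assumption of this sketch.
    """
    if (camera_height_cm <= plane_height_cm
            or reference_height_cm <= plane_height_cm):
        raise ValueError("camera heights must exceed the reference image plane height")
    scale = ((camera_height_cm - plane_height_cm)
             / (reference_height_cm - plane_height_cm))
    cx, cy = center
    dx, dy = point[0] - cx, point[1] - cy
    # Scaling both offsets by the same factor changes only the distance.
    return (cx + dx * scale, cy + dy * scale)


# Illustrative values only: alpha = 90 cm is an assumed plane height.
center = (320.0, 240.0)
before = (400.0, 300.0)                      # L = 100 pixels from the center
after = correct_feature_point(before, center, 270.0, 240.0, 90.0)
distance_after = math.hypot(after[0] - center[0], after[1] - center[1])
```

With these illustrative numbers the scale factor is 180/150 = 1.2, so the point moves from 100 to about 120 pixels from the center, which enlarges the skeleton captured by the higher camera to the size expected at the reference height.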

姿勢推定部１１６は、ＮＮにより構成される。ＮＮには、人物の姿勢を推定するための学習により得られた姿勢推定パラメーターが反映されている。姿勢推定部１１６は、補正後の特徴点６２０に基づいて、対象者５００の姿勢を推定する。 The posture estimation unit 116 is composed of NN. The posture estimation parameters obtained by learning for estimating the posture of a person are reflected in the NN. The posture estimation unit 116 estimates the posture of the subject 500 based on the corrected feature point 620.

姿勢推定部１１６は、特徴点６２０と、当該特徴点６２０に対する正解として設定された姿勢との組合せを教師データとして、特徴点６２０から姿勢を推定するための学習が予めされている。これにより、姿勢推定部１１６には、上述した姿勢推定パラメーターが反映されている。教師データとして用いられる特徴点６２０等は、基準高さに設置された撮影装置２００により撮影された画像６００から検出されたもののみでよい。すなわち、姿勢推定しようとする画像６００が撮影された撮影装置２００が基準高さ以外の高さに設置されていても、当該基準高さと異なる高さに設置された撮影装置２００により撮影された画像６００から検出された特徴点６２０等の教師データを新たに用意する必要はない。 The posture estimation unit 116 is preliminarily learned to estimate the posture from the feature point 620 by using the combination of the feature point 620 and the posture set as the correct answer for the feature point 620 as teacher data. As a result, the posture estimation unit 116 reflects the above-mentioned posture estimation parameters. The feature points 620 and the like used as the teacher data may be only those detected from the image 600 taken by the photographing apparatus 200 installed at the reference height. That is, even if the photographing device 200 in which the image 600 for which the posture is to be estimated is captured is installed at a height other than the reference height, the image captured by the photographing device 200 installed at a height different from the reference height. It is not necessary to newly prepare teacher data such as feature points 620 detected from 600.

画像認識装置１００の動作について説明する。 The operation of the image recognition device 100 will be described.

図８は、画像認識装置１００の動作を示すフローチャートである。本フローチャートは、記憶部１２０に記憶されたプログラムに従い、制御部１１０により実行される。 FIG. 8 is a flowchart showing the operation of the image recognition device 100. This flowchart is executed by the control unit 110 according to the program stored in the storage unit 120.

画像取得部１１１は、撮影装置２００から画像６００を、通信部１５０を介して受信することで取得する（Ｓ１０１）。 The image acquisition unit 111 acquires an image 600 from the photographing device 200 via the communication unit 150 (S101).

人物領域検出部１１２は、画像６００から人物領域６１０を検出する（Ｓ１０２）。 The person area detection unit 112 detects the person area 610 from the image 600 (S102).

特徴点推定部１１３は、人物領域６１０から特徴点６２０を推定する（Ｓ１０３）。 The feature point estimation unit 113 estimates the feature point 620 from the person area 610 (S103).

補正部１１５は、撮影装置２００の高さ情報に基づいて、撮影装置２００が基準高さに設置されたと仮定された場合に合わせて、特徴点６２０を補正する（Ｓ１０４）。 The correction unit 115 corrects the feature point 620 based on the height information of the photographing apparatus 200, in accordance with the case where the photographing apparatus 200 is assumed to be installed at the reference height (S104).

姿勢推定部１１６は、補正後の特徴点６２０に基づいて、対象者５００の姿勢を推定する（Ｓ１０５）。 The posture estimation unit 116 estimates the posture of the subject 500 based on the corrected feature point 620 (S105).
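The flow of steps S102 to S105 can be sketched as a simple chain of stages. This is only an outline: the function name and the four callables are hypothetical placeholders for the trained NNs and the correction unit, and image acquisition (S101) is assumed to have been done by the caller.

```python
def estimate_posture(image, camera_height_cm, detect_person, estimate_points,
                     correct_points, classify_posture):
    """Run the first-embodiment flow on an already acquired image (S101).

    Each callable is a stand-in for a unit of the control unit 110;
    their signatures are assumptions made for this sketch.
    """
    region = detect_person(image)                         # S102: person area 610
    points = estimate_points(image, region)               # S103: feature points 620
    corrected = correct_points(points, camera_height_cm)  # S104: height correction
    return classify_posture(corrected)                    # S105: posture


# Toy stand-ins that only demonstrate the data flow, not real models.
result = estimate_posture(
    image="frame",
    camera_height_cm=270.0,
    detect_person=lambda img: (40, 80, 120, 200),
    estimate_points=lambda img, region: [(60.0, 100.0), (80.0, 150.0)],
    correct_points=lambda pts, h: [(x * 1.2, y * 1.2) for x, y in pts],
    classify_posture=lambda pts: "posture-from-%d-points" % len(pts),
)
```

Passing the stages in as callables keeps the ordering of the flowchart explicit while leaving the model implementations open.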

本実施形態は以下の効果を奏する。 This embodiment has the following effects.

所定の撮影領域を俯瞰する設置位置に設置された撮影装置で撮影された撮影領域全体の画像から人物領域を検出し、人物領域から特徴点を推定し、撮影装置の高さが基準高さである場合に合わせて特徴点を補正し、補正後の特徴点に基づいて、機械学習により人物の姿勢を推定する。これにより、撮影装置の設置高さが変動しても、撮影された画像に基づいて高精度に人物の姿勢を推定できるとともに、撮影装置の設置高さの変化に起因する、機械学習による姿勢推定精度の低下を防止するために必要な、機械学習に用いる教師データを削減できる。 A person area is detected from an image of the entire photographing area captured by a photographing device installed at an installation position overlooking a predetermined photographing area, feature points are estimated from the person area, the feature points are corrected to match the case where the height of the photographing device is the reference height, and the posture of the person is estimated by machine learning based on the corrected feature points. As a result, even if the installation height of the photographing device varies, the posture of the person can be estimated with high accuracy from the captured image, and the teacher data needed for machine learning to prevent a loss of posture estimation accuracy caused by changes in the installation height can be reduced.

（第２実施形態） (Second Embodiment)
本発明の第２実施形態について説明する。本実施形態と第１実施形態とで異なる点は次の点である。第１実施形態は、人物領域６１０から推定された特徴点６２０を高さ情報に基づいて補正し、補正後の特徴点６２０に基づいて姿勢を推定する。一方、本実施形態は、人物領域６１０を高さ情報に基づいて補正し、補正後の人物領域６１０に基づいて姿勢を推定する。その他の点については、本実施形態は第１実施形態と同様であるため、重複する説明は省略または簡略化する。 A second embodiment of the present invention will be described. The present embodiment differs from the first embodiment in the following respect. In the first embodiment, the feature points 620 estimated from the person area 610 are corrected based on the height information, and the posture is estimated based on the corrected feature points 620. In the present embodiment, by contrast, the person area 610 is corrected based on the height information, and the posture is estimated based on the corrected person area 610. In other respects the present embodiment is the same as the first embodiment, so duplicate description is omitted or simplified.

図９は、画像認識装置１００の制御部１１０の機能を示すブロック図である。制御部１１０は、画像取得部１１１、人物領域検出部１１２、高さ情報取得部１１４、補正部１１５、および姿勢推定部１１６を含む。 FIG. 9 is a block diagram showing the function of the control unit 110 of the image recognition device 100. The control unit 110 includes an image acquisition unit 111, a person area detection unit 112, a height information acquisition unit 114, a correction unit 115, and a posture estimation unit 116.

補正部１１５は、撮影装置２００の高さが基準高さとされた場合に合わせて、高さ情報に基づいて人物領域６１０を補正する。具体的には、第１実施形態において行った各特徴点６２０の座標の補正を、人物領域６１０の各画素について行う。これにより、人物領域の画素の各座標が補正されることで、補正後の人物領域６１０が算出される。なお、人物領域６１０が補正されることにより、人物領域６１０の画素密度が変化し得るが、補正後の人物領域６１０について、補正前の人物領域６１０の画素密度に戻す公知の変換がなされ得る。 The correction unit 115 corrects the person area 610 based on the height information in accordance with the case where the height of the photographing apparatus 200 is set as the reference height. Specifically, the correction of the coordinates of each feature point 620 performed in the first embodiment is performed for each pixel of the person region 610. As a result, each coordinate of the pixel of the person area is corrected, and the corrected person area 610 is calculated. Although the pixel density of the person area 610 may change due to the correction of the person area 610, a known conversion may be performed to return the corrected person area 610 to the pixel density of the person area 610 before correction.
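The per-pixel correction and the subsequent return to the original pixel density can be sketched as a single resampling step. Note the substitution: instead of moving every source pixel and then applying a separate density conversion, the sketch below uses inverse mapping with nearest-neighbor sampling, which is one common way to obtain an output on the original pixel grid. The function name, the list-of-rows grayscale image format, and the `fill` value are assumptions, and `scale` stands for the factor of the first embodiment's equation.

```python
def rescale_about_center(image, scale, center=None, fill=0):
    """Scale pixel positions about a center, resampling onto the same grid.

    image  : list of rows of pixel values (hypothetical grayscale format)
    scale  : correction factor, e.g. (h - alpha) / (h_ref - alpha)
    center : (cx, cy) of the full image 600; defaults to this array's center,
             which is only correct if the array covers the whole frame
    """
    height, width = len(image), len(image[0])
    cx, cy = center if center is not None else ((width - 1) / 2, (height - 1) / 2)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: find the source pixel that lands on (x, y).
            sx = round(cx + (x - cx) / scale)
            sy = round(cy + (y - cy) / scale)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


# Tiny 5x5 example: one bright pixel at the center, one marker at (1, 1).
src = [[0] * 5 for _ in range(5)]
src[2][2] = 9
src[1][1] = 5
enlarged = rescale_about_center(src, 2.0)
```

Because every output pixel is filled from exactly one source pixel, the result has no holes and keeps the pixel density of the input, at the cost of blockiness that a smoother interpolation would reduce.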

姿勢推定部１１６は、補正後の人物領域６１０に基づいて、対象者５００の姿勢を推定する。 The posture estimation unit 116 estimates the posture of the subject 500 based on the corrected person area 610.

姿勢推定部１１６は、人物領域６１０と、当該人物領域６１０に対する正解として設定された姿勢との組合せを教師データとして、人物領域６１０から姿勢を推定するための学習が予めされる。教師データとして用いられる人物領域６１０等は、基準高さに設置された撮影装置２００により撮影された画像６００から検出されたもののみでよい。すなわち、姿勢推定しようとする画像６００が撮影された撮影装置２００が基準高さ以外の高さに設置されていても、当該基準高さと異なる高さに設置された撮影装置２００により撮影された画像６００から検出された人物領域６１０等の教師データを新たに用意する必要はない。 The posture estimation unit 116 is preliminarily learned to estimate the posture from the person area 610 by using the combination of the person area 610 and the posture set as the correct answer for the person area 610 as teacher data. The person area 610 or the like used as the teacher data may be only those detected from the image 600 taken by the photographing device 200 installed at the reference height. That is, even if the photographing device 200 in which the image 600 for which the posture is to be estimated is taken is installed at a height other than the reference height, the image taken by the photographing device 200 installed at a height different from the reference height. It is not necessary to newly prepare teacher data such as the person area 610 detected from 600.

図１０は、画像認識装置１００の動作を示すフローチャートである。本フローチャートは、記憶部１２０に記憶されたプログラムに従い、制御部１１０により実行される。 FIG. 10 is a flowchart showing the operation of the image recognition device 100. This flowchart is executed by the control unit 110 according to the program stored in the storage unit 120.

画像取得部１１１は、撮影装置２００から画像６００を、通信部１５０を介して受信することで取得する（Ｓ２０１）。 The image acquisition unit 111 acquires an image 600 from the photographing device 200 via the communication unit 150 (S201).

人物領域検出部１１２は、画像６００から対象者５００を含む人物領域６１０を検出する（Ｓ２０２）。 The person area detection unit 112 detects the person area 610 including the target person 500 from the image 600 (S202).

補正部１１５は、撮影装置２００が基準高さに設置された場合に合わせて、撮影装置２００の高さ情報に基づいて、人物領域６１０を補正する（Ｓ２０３）。 The correction unit 115 corrects the person area 610 based on the height information of the photographing device 200 according to the case where the photographing device 200 is installed at the reference height (S203).

姿勢推定部１１６は、補正後の人物領域６１０に基づいて、対象者５００の姿勢を推定する（Ｓ２０４）。 The posture estimation unit 116 estimates the posture of the subject 500 based on the corrected person area 610 (S204).

所定の撮影領域を俯瞰する設置位置に設置された撮影装置で撮影された撮影領域全体の画像から人物領域を検出し、撮影装置の高さが基準高さである場合に合わせて人物領域を補正し、補正後の人物領域に基づいて、機械学習により人物の姿勢を推定する。これにより、撮影装置の設置高さが変動しても、撮影された画像に基づいて高精度に人物の姿勢を推定できるとともに、撮影装置の設置高さの変化に起因する、機械学習による姿勢推定精度の低下を防止するために必要な、機械学習に用いる教師データを削減できる。 The person area is detected from the image of the entire shooting area taken by the shooting device installed at the installation position that overlooks the predetermined shooting area, and the person area is corrected according to the case where the height of the shooting device is the reference height. Then, the posture of the person is estimated by machine learning based on the corrected person area. As a result, even if the installation height of the imaging device fluctuates, the posture of the person can be estimated with high accuracy based on the captured image, and the posture estimation by machine learning due to the change in the installation height of the imaging device. It is possible to reduce the teacher data used for machine learning, which is necessary to prevent the decrease in accuracy.

（第３実施形態） (Third Embodiment)
本発明の第３実施形態について説明する。本実施形態と第１実施形態とで異なる点は次の点である。第１実施形態は、人物領域６１０から推定された特徴点６２０を高さ情報に基づいて補正し、補正後の特徴点６２０に基づいて姿勢を推定する。一方、本実施形態は、特徴点６２０等の補正はせずに、人物領域６１０と、高さ情報とに基づいて、機械学習により姿勢を推定する。その他の点については、本実施形態は第１実施形態と同様であるため、重複する説明は省略または簡略化する。 A third embodiment of the present invention will be described. The present embodiment differs from the first embodiment in the following respect. In the first embodiment, the feature points 620 estimated from the person area 610 are corrected based on the height information, and the posture is estimated based on the corrected feature points 620. In the present embodiment, by contrast, the posture is estimated by machine learning based on the person area 610 and the height information, without correcting the feature points 620 or the like. In other respects the present embodiment is the same as the first embodiment, so duplicate description is omitted or simplified.

図１１は、画像認識装置１００の制御部１１０の機能を示すブロック図である。制御部１１０は、画像取得部１１１、人物領域検出部１１２、高さ情報取得部１１４、および姿勢推定部１１６を含む。 FIG. 11 is a block diagram showing the function of the control unit 110 of the image recognition device 100. The control unit 110 includes an image acquisition unit 111, a person area detection unit 112, a height information acquisition unit 114, and a posture estimation unit 116.

姿勢推定部１１６は、画像６００から検出された人物領域６１０と、高さ情報とに基づいて、対象者５００の姿勢を推定する。 The posture estimation unit 116 estimates the posture of the subject 500 based on the person area 610 detected from the image 600 and the height information.

姿勢推定部１１６は、人物領域６１０および高さ情報と、当該人物領域６１０および高さ情報の入力に対する正解として設定された姿勢との組合せを教師データとして、人物領域６１０および高さ情報から姿勢を推定するための学習が予めされる。教師データとして用いられる人物領域６１０および高さ情報と、当該人物領域６１０および高さ情報の入力に対する正解として設定された姿勢との組合せは、複数の高さに設置された撮影装置２００により撮影された画像６００に基づくものを用いる。すなわち、教師データとして、様々な高さに設置された撮影装置２００により撮影された画像６００に基づいて得られた、人物領域６１０および高さ情報の入力と、当該人物領域６１０および高さ情報の入力に対する正解として設定された姿勢の正解ラベルとの組合せを用いる。 The posture estimation unit 116 uses the combination of the person area 610 and the height information and the posture set as the correct answer for the input of the person area 610 and the height information as teacher data, and obtains the posture from the person area 610 and the height information. Learning for estimation is performed in advance. The combination of the person area 610 and the height information used as the teacher data and the posture set as the correct answer for the input of the person area 610 and the height information is photographed by the photographing apparatus 200 installed at a plurality of heights. The one based on the image 600 is used. That is, as teacher data, input of the person area 610 and height information obtained based on the image 600 taken by the image pickup device 200 installed at various heights, and the person area 610 and the height information. Use the combination with the correct answer label of the posture set as the correct answer for the input.
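The teacher data of this embodiment pairs an input (person area plus height information) with a posture label. One way to picture such a sample, and how the height might be appended to the model input, is sketched below. The class and function names, the flattened pixel representation, the normalization range of 200 to 300 cm, and the label strings are all illustrative assumptions; the specification does not state how the height is encoded.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingSample:
    """One teacher-data item: (person area, height) as input, posture as label."""
    region_pixels: Sequence[float]   # flattened person area 610 (assumed format)
    camera_height_cm: float          # height information for the capturing camera
    posture_label: str               # correct-answer posture (illustrative strings)


def to_model_input(sample: TrainingSample,
                   height_range_cm: Tuple[float, float] = (200.0, 300.0)) -> List[float]:
    """Append the normalized camera height to the flattened person area."""
    low, high = height_range_cm
    height_norm = (sample.camera_height_cm - low) / (high - low)
    return list(sample.region_pixels) + [height_norm]


# Samples from cameras at several heights, as this embodiment requires.
samples = [
    TrainingSample([0.1, 0.2], 240.0, "standing"),
    TrainingSample([0.1, 0.2], 250.0, "standing"),
    TrainingSample([0.3, 0.4], 270.0, "lying"),
]
inputs = [to_model_input(s) for s in samples]
labels = [s.posture_label for s in samples]
```

Feeding the height as an explicit input lets a single model learn how apparent size depends on installation height, which is why this embodiment needs teacher data from multiple heights rather than only the reference height.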

図１２は、画像認識装置１００の動作を示すフローチャートである。本フローチャートは、記憶部１２０に記憶されたプログラムに従い、制御部１１０により実行される。 FIG. 12 is a flowchart showing the operation of the image recognition device 100. This flowchart is executed by the control unit 110 according to the program stored in the storage unit 120.

画像取得部１１１は、撮影装置２００から画像６００を、通信部１５０を介して受信することで取得する（Ｓ３０１）。 The image acquisition unit 111 acquires an image 600 from the photographing device 200 via the communication unit 150 (S301).

人物領域検出部１１２は、画像６００から対象者５００を含む人物領域６１０を検出する（Ｓ３０２）。 The person area detection unit 112 detects the person area 610 including the target person 500 from the image 600 (S302).

姿勢推定部１１６は、人物領域６１０および高さ情報に基づいて、対象者５００の姿勢を推定する（Ｓ３０３）。 The posture estimation unit 116 estimates the posture of the subject 500 based on the person area 610 and the height information (S303).

以上に説明した画像認識システム１０の構成は、上述の実施形態の特徴を説明するにあたって主要構成を説明したのであって、上述の構成に限られず、特許請求の範囲内において、種々改変することができる。また、一般的な画像認識システムが備える構成を排除するものではない。 The configuration of the image recognition system 10 described above is the main configuration used to explain the features of the above embodiments; it is not limited to that configuration and can be modified in various ways within the scope of the claims. Nor does it exclude configurations provided in a general image recognition system.

例えば、上述の実施形態においては、基準高さと基準画像平面の高さは別々に設定している。しかし、基準高さと基準画像平面の高さは同じであってもよい。 For example, in the above-described embodiment, the reference height and the height of the reference image plane are set separately. However, the reference height and the height of the reference image plane may be the same.

また、画像認識装置１００が有する機能を、センサーボックスにより構成される撮影装置２００、または携帯端末４００が備えるようにしてもよい。 Further, the function of the image recognition device 100 may be provided in the photographing device 200 configured by the sensor box or the mobile terminal 400.

また、画像認識装置１００、撮影装置２００、および携帯端末４００は、それぞれ複数の装置により構成されてもよく、いずれか複数の装置が単一の装置として構成されてもよい。 Further, the image recognition device 100, the photographing device 200, and the mobile terminal 400 may each be configured by a plurality of devices, or any plurality of the devices may be configured as a single device.

また、上述したフローチャートは、一部のステップを省略してもよく、他のステップが追加されてもよい。また各ステップの一部は同時に実行されてもよく、一つのステップが複数のステップに分割されて実行されてもよい。 Further, in the above-mentioned flowchart, some steps may be omitted or other steps may be added. Further, a part of each step may be executed at the same time, or one step may be divided into a plurality of steps and executed.

また、上述した画像認識システム１０における各種処理を行う手段および方法は、専用のハードウェア回路、またはプログラムされたコンピューターのいずれによっても実現することが可能である。上記プログラムは、例えば、ＵＳＢメモリやＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）－ＲＯＭ等のコンピューター読み取り可能な記録媒体によって提供されてもよいし、インターネット等のネットワークを介してオンラインで提供されてもよい。この場合、コンピューター読み取り可能な記録媒体に記録されたプログラムは、通常、ハードディスク等の記憶部に転送され記憶される。また、上記プログラムは、単独のアプリケーションソフトとして提供されてもよいし、一機能としてその検出部等の装置のソフトウエアに組み込まれてもよい。 The means and methods for performing the various processes in the image recognition system 10 described above can be realized by either a dedicated hardware circuit or a programmed computer. The program may be provided on a computer-readable recording medium such as a USB memory or a DVD (Digital Versatile Disc)-ROM, or may be provided online via a network such as the Internet. In this case, the program recorded on the computer-readable recording medium is usually transferred to and stored in a storage unit such as a hard disk. The program may be provided as standalone application software, or may be incorporated as one function into the software of a device such as the detection unit.

１０ 画像認識システム、
１００ 画像認識装置、
１１０ 制御部、
１１１ 画像取得部、
１１２ 人物領域検出部、
１１３ 特徴点推定部、
１１４ 高さ情報取得部、
１１５ 補正部、
１１６ 姿勢推定部、
２００ 撮影装置、
３００ 通信ネットワーク、
３１０ アクセスポイント、
４００ 携帯端末、
５００ 対象者、
６００ 画像、
６１０ 人物領域、
６２０ 特徴点、
６２１ 関節点、
６２２ 頭部矩形の対頂点、
６２３ 頭部矩形。

10 Image recognition system,
100 Image recognition device,
110 Control unit,
111 Image acquisition unit,
112 Person area detection unit,
113 Feature point estimation unit,
114 Height information acquisition unit,
115 Correction unit,
116 Posture estimation unit,
200 Photographing device,
300 Communication network,
310 Access point,
400 Mobile terminal,
500 Subject,
600 Image,
610 Person area,
620 Feature point,
621 Joint point,
622 Opposite vertices of head rectangle,
623 Head rectangle.

（１）所定の撮影領域を俯瞰する設置位置に設置された撮影装置により、前記撮影領域全体が撮影された画像から検出された人物領域を取得する手順（ａ）と、前記設置位置の、所定位置からの高さの情報を取得する手順（ｂ）と、複数の異なる高さの情報と前記複数の異なる高さ情報に対応する人物領域の入力に対する正解として設定された姿勢との組合せを教師データとして学習されている人物の姿勢を推定する学習済みモデルに、前記人物領域および前記高さの情報入力し、人物の姿勢を推定する手順（ｃ）と、を有する処理を、コンピューターに実行させるための画像処理プログラム。
（２）所定の撮影領域を俯瞰する設置位置に設置された撮影装置により、前記撮影領域全体が撮影された画像から検出された人物領域を取得する人物領域取得部と、前記設置位置の、所定位置からの高さの情報を取得する情報取得部と、複数の異なる高さの情報と前記複数の異なる高さ情報に対応する人物領域の入力に対する正解として設定された姿勢との組合せを教師データとして学習されている人物の姿勢を推定する学習済みモデルと、前記人物領域および前記高さの情報を、前記学習済みモデルに入力し、人物の姿勢を推定する姿勢推定部と、を有する画像処理システム。
（３）所定の撮影領域を俯瞰する設置位置に設置された撮影装置により、前記撮影領域全体が撮影された画像から検出された人物領域を取得する段階（ａ）と、前記設置位置の、所定位置からの高さの情報を取得する段階（ｂ）と、複数の異なる高さの情報と前記複数の異なる高さ情報に対応する人物領域の入力に対する正解として設定された姿勢との組合せを教師データとして学習されている人物の姿勢を推定する学習済みモデルに、前記人物領域および前記高さの情報入力し、人物の姿勢を推定する段階（ｃ）と、を有する画像処理方法。
(1) An image processing program for causing a computer to execute a process comprising: a procedure (a) of acquiring a person area detected from an image in which an entire photographing area is captured by a photographing device installed at an installation position overlooking the predetermined photographing area; a procedure (b) of acquiring information on the height of the installation position from a predetermined position; and a procedure (c) of inputting the person area and the height information into a trained model that estimates the posture of a person, and estimating the posture of the person, the trained model having been trained using, as teacher data, combinations of inputs, each consisting of information on one of a plurality of different heights and the person area corresponding to that height information, and the postures set as the correct answers for those inputs.
(2) An image processing system comprising: a person area acquisition unit that acquires a person area detected from an image in which an entire photographing area is captured by a photographing device installed at an installation position overlooking the predetermined photographing area; an information acquisition unit that acquires information on the height of the installation position from a predetermined position; a trained model that estimates the posture of a person, the trained model having been trained using, as teacher data, combinations of inputs, each consisting of information on one of a plurality of different heights and the person area corresponding to that height information, and the postures set as the correct answers for those inputs; and a posture estimation unit that inputs the person area and the height information into the trained model and estimates the posture of the person.
(3) An image processing method comprising: a step (a) of acquiring a person area detected from an image in which an entire photographing area is captured by a photographing device installed at an installation position overlooking the predetermined photographing area; a step (b) of acquiring information on the height of the installation position from a predetermined position; and a step (c) of inputting the person area and the height information into a trained model that estimates the posture of a person, and estimating the posture of the person, the trained model having been trained using, as teacher data, combinations of inputs, each consisting of information on one of a plurality of different heights and the person area corresponding to that height information, and the postures set as the correct answers for those inputs.

（１）撮影装置により、撮影された画像から人物領域を検出する手順（ａ）と、前記撮影装置の設置位置の、所定位置からの高さの情報を取得する手順（ｂ）と、人物領域および高さ情報の入力に対する正解として設定された姿勢との組合せを教師データとして、人物領域および高さ情報から人物の姿勢を推定するための学習がされた学習済みモデルに、前記手順（ａ）において検出された前記人物領域および前記手順（ｂ）において取得された前記高さの情報を、前記学習済みモデルに入力し、人物の姿勢を推定する手順（ｃ）と、を有する処理を、コンピューターに実行させるための画像処理プログラム。
（２）撮影装置により、撮影された画像から人物領域を検出する人物領域検出部と、前記撮影装置の設置位置の、所定位置からの高さの情報を取得する情報取得部と、人物領域および高さ情報の入力に対する正解として設定された姿勢との組合せを教師データとして、人物領域および高さ情報から人物の姿勢を推定するための学習がされた学習済みモデルと、前記人物領域検出部により検出された前記人物領域および前記情報取得部により取得された前記高さの情報を、前記学習済みモデルに入力し、人物の姿勢を推定する姿勢推定部と、を有する画像処理システム。
（３）撮影装置により、撮影された画像から人物領域を検出する段階（ａ）と、前記撮影装置の設置位置の、所定位置からの高さの情報を取得する段階（ｂ）と、人物領域および高さ情報の入力に対する正解として設定された姿勢との組合せを教師データとして、人物領域および高さ情報から人物の姿勢を推定するための学習がされた学習済みモデルに、前記段階（ａ）において検出された前記人物領域および前記段階（ｂ）において取得された前記高さの情報入力し、人物の姿勢を推定する段階（ｃ）と、を有する画像処理方法。
(1) An image processing program for causing a computer to execute a process comprising: a procedure (a) of detecting a person area from an image captured by a photographing device; a procedure (b) of acquiring information on the height of the installation position of the photographing device from a predetermined position; and a procedure (c) of inputting the person area detected in the procedure (a) and the height information acquired in the procedure (b) into a trained model and estimating the posture of a person, the trained model having been trained to estimate the posture of a person from a person area and height information using, as teacher data, combinations of person area and height information inputs and the postures set as the correct answers for those inputs.
(2) An image processing system comprising: a person area detection unit that detects a person area from an image captured by a photographing device; an information acquisition unit that acquires information on the height of the installation position of the photographing device from a predetermined position; a trained model that has been trained to estimate the posture of a person from a person area and height information using, as teacher data, combinations of person area and height information inputs and the postures set as the correct answers for those inputs; and a posture estimation unit that inputs the person area detected by the person area detection unit and the height information acquired by the information acquisition unit into the trained model and estimates the posture of the person.
(3) An image processing method comprising: a step (a) of detecting a person area from an image captured by a photographing device; a step (b) of acquiring information on the height of the installation position of the photographing device from a predetermined position; and a step (c) of inputting the person area detected in the step (a) and the height information acquired in the step (b) into a trained model and estimating the posture of a person, the trained model having been trained to estimate the posture of a person from a person area and height information using, as teacher data, combinations of person area and height information inputs and the postures set as the correct answers for those inputs.

Claims

An image processing system comprising:
an image acquisition unit that acquires an image in which an entire photographing area is captured by a photographing device installed at an installation position overlooking the predetermined photographing area;
a person area detection unit that detects a person area from the image;
an information acquisition unit that acquires information on the height of the installation position from a predetermined position; and
a posture estimation unit that estimates the posture of a person by machine learning based on the person area and the height information.

The image processing system according to claim 1,
further comprising a correction unit that corrects the person area based on the height information so as to match the case where the height from the predetermined position is a preset reference height,
wherein the posture estimation unit is trained in advance using, as teacher data, combinations of the person area detected from the image captured by the photographing device installed at the reference height and the posture of the person, and estimates the posture of the person based on the person area corrected by the correction unit.

The image processing system according to claim 1, further comprising:
a feature point estimation unit that estimates feature points related to the human body from the person area; and
a correction unit that corrects the feature points, based on the height information, so that they correspond to the case where the height from the predetermined position is a preset reference height,
wherein the posture estimation unit has been trained in advance using, as teacher data, combinations of feature points estimated from a person area detected from an image captured by the photographing device installed at the reference height and the posture of the person, and estimates the posture of the person based on the feature points corrected by the correction unit.
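
For illustration only, the correction toward a preset reference height recited in the two preceding claims could be sketched in Python as follows. The claims do not state how the correction is computed, so this sketch rests on assumptions: a pinhole camera looking straight down, for which the offset of an image point from the image centre is inversely proportional to the vertical distance between the camera and the corresponding real-world point. The function `correct_to_reference_height` and all of its parameters are hypothetical names. The same rescaling could be applied either to feature points or to the corner points of a person area.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def correct_to_reference_height(
    points: Iterable[Point],
    camera_height_m: float,
    reference_height_m: float,
    image_center: Point,
    point_height_m: float = 0.0,
) -> List[Point]:
    """Move image points to where they would appear from the reference height.

    Assumes a downward-looking pinhole camera (not stated in the claims).
    point_height_m is the assumed height of the points above the floor.
    """
    actual_distance = camera_height_m - point_height_m
    reference_distance = reference_height_m - point_height_m
    if actual_distance <= 0 or reference_distance <= 0:
        raise ValueError("the camera must be above the points in both cases")
    # A camera mounted higher than the reference sees the scene smaller,
    # so the offsets from the image centre are scaled up, and vice versa.
    scale = actual_distance / reference_distance
    cx, cy = image_center
    return [(cx + (x - cx) * scale, cy + (y - cy) * scale) for x, y in points]


# Hypothetical feature points seen by a camera installed at 3.0 m,
# corrected to a reference height of 2.0 m.
feature_points = [(420.0, 240.0), (320.0, 340.0)]
corrected = correct_to_reference_height(
    feature_points,
    camera_height_m=3.0,
    reference_height_m=2.0,
    image_center=(320.0, 240.0),
)
print(corrected)
```

A model trained only on data captured at the reference height could then be applied to the corrected points. A wide-angle or tilted camera would need a different geometric model from the one assumed here.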

An image processing program for causing a computer to execute a process comprising:
a procedure (a) of acquiring an image of an entire predetermined photographing area, captured by a photographing device installed at an installation position overlooking the photographing area;
a procedure (b) of detecting a person area from the image;
a procedure (c) of acquiring information on the height of the installation position from a predetermined position; and
a procedure (d) of estimating the posture of a person by machine learning based on the person area and the height information.

The image processing program according to claim 4, wherein the process further comprises a procedure (e) of correcting the person area, based on the height information, so that it corresponds to the case where the height from the predetermined position is a preset reference height,
and wherein, in the procedure (d), a posture estimation unit trained in advance using, as teacher data, combinations of a person area detected from an image captured by the photographing device installed at the reference height and the posture of the person estimates the posture of the person based on the person area corrected in the procedure (e).

The image processing program according to claim 4, wherein the process further comprises:
a procedure (f) of estimating feature points related to the human body from the person area; and
a procedure (g) of correcting the feature points, based on the height information, so that they correspond to the case where the height from the predetermined position is a preset reference height,
and wherein, in the procedure (d), a posture estimation unit trained in advance using, as teacher data, combinations of feature points estimated from a person area detected from an image captured by the photographing device installed at the reference height and the posture of the person estimates the posture of the person based on the feature points corrected in the procedure (g).

An image processing method performed by an image processing system, the method comprising:
a step (a) of acquiring an image of an entire predetermined photographing area, captured by a photographing device installed at an installation position overlooking the photographing area;
a step (b) of detecting a person area from the image;
a step (c) of acquiring information on the height of the installation position from a predetermined position; and
a step (d) of estimating the posture of a person by machine learning based on the person area and the height information.

The image processing method according to claim 7, further comprising a step (e) of correcting the person area, based on the height information, so that it corresponds to the case where the height from the predetermined position is a preset reference height,
wherein, in the step (d), a posture estimation unit trained in advance using, as teacher data, combinations of a person area detected from an image captured by the photographing device installed at the reference height and the posture of the person estimates the posture of the person based on the person area corrected in the step (e).

The image processing method according to claim 7, further comprising:
a step (f) of estimating feature points related to the human body from the person area; and
a step (g) of correcting the feature points, based on the height information, so that they correspond to the case where the height from the predetermined position is a preset reference height,
wherein, in the step (d), a posture estimation unit trained in advance using, as teacher data, combinations of feature points estimated from a person area detected from an image captured by the photographing device installed at the reference height and the posture of the person estimates the posture of the person based on the feature points corrected in the step (g).