JP2019185752A

JP2019185752A - Image extracting device

Info

Publication number: JP2019185752A
Application number: JP2019030141A
Authority: JP
Inventors: ヤンチャン; Young Chung; ハオシャ; Hao Sha; パンチャン; Peng Zhang; 媛李; En Ri
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-03-30
Filing date: 2019-02-22
Publication date: 2019-10-24
Anticipated expiration: 2039-02-22
Also published as: JP6666488B2; CN110321767B; CN110321767A

Abstract

To extract a specific image for recognizing the behavior of a subject person in video data from video data derived from a plurality of video sources, and to improve the accuracy of behavior recognition. SOLUTION: This image extracting device comprises: a person detection extraction unit for detecting a subject person from video data and extracting all images of the subject person; a key point extraction unit for performing key point extraction on the images of the subject person extracted by the person detection extraction unit; an interest region specification unit for specifying a region of interest of the subject person; an interest region image extraction unit for extracting the image of the region of interest of the subject person from the images of the subject person on the basis of the region of interest specified by the interest region specification unit; and an image determination output unit for determining whether the image of the region of interest extracted by the interest region image extraction unit is a specific image, and outputting the determined specific image. SELECTED DRAWING: Figure 3

Description

The present invention relates to the field of video surveillance, and more specifically to a technique for extracting specific images for behavior recognition from video data captured by video cameras.

Human action recognition technology is expected to be widely used in many fields and has both economic and social value.

For example, in fields such as medical care, virtual reality, and training support for athletes, attaching multiple wearable sensors to the body of a specific subject makes it possible to collect that subject's behavior data and analyze the subject's behavior patterns. However, although sensor-based human behavior recognition offers excellent accuracy and resistance to interference, it can be applied only to specific subjects and is costly, so its use is extremely limited.

On the other hand, with the spread of video surveillance equipment and the development of computer vision technology, research on human behavior recognition through video analysis has been actively pursued. Because recognition by video analysis obtains its results solely by analyzing collected video data, it can recognize the behavior of unspecified subjects at low cost, which is highly significant in many fields, particularly security.

In behavior recognition research, the movement of the human body is usually expressed by the movement of key points (keypoints) of the human skeleton. The human body is represented by a combination of a dozen or so key points, and behavior is recognized by tracking these key points. For example, at CVPR 2017, an international conference on computer vision and pattern recognition, Carnegie Mellon University (CMU) presented OpenPose, a technique capable of real-time multi-person key point detection and key point relationship estimation, in a paper titled "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" (open-source library, https://github.com/CMU-Perceptual-Computing-Lab/openpose). As shown in FIG. 7, OpenPose uses deep learning to estimate the skeletal key points of multiple people simultaneously from a single image, regardless of the number of people captured. Applying this technique to video data yields information on the movement of key points, and thus information on human behavior.
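As an illustration of how key point movement can be derived from per-frame key points, the following is a minimal sketch. It assumes a hypothetical layout in which each frame is a dictionary from key point name to (x, y) pixel position; this is not the OpenPose output format, and the key point names are invented for the example.

```python
from math import hypot

# Hypothetical layout: each frame maps a key point name to its (x, y) position.
Frame = dict[str, tuple[float, float]]


def keypoint_displacements(frames: list[Frame]) -> dict[str, list[float]]:
    """Return, per key point, the distance moved between consecutive frames."""
    moves: dict[str, list[float]] = {}
    for prev, curr in zip(frames, frames[1:]):
        # Only key points detected in both frames can be compared.
        for name in prev.keys() & curr.keys():
            (x0, y0), (x1, y1) = prev[name], curr[name]
            moves.setdefault(name, []).append(hypot(x1 - x0, y1 - y0))
    return moves


frames = [
    {"left_knee": (100.0, 200.0), "left_ankle": (100.0, 260.0)},
    {"left_knee": (103.0, 204.0), "left_ankle": (100.0, 260.0)},
]
print(keypoint_displacements(frames))
```

A key point missing from either of two consecutive frames simply contributes no displacement for that pair, which keeps the sketch tolerant of detection gaps.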

As a specific application of behavior recognition technology, Patent Document 1 discloses a technique that uses human behavior recognition for medical care in order to cope with population aging. Specifically, the method for recognizing abnormal behavior from video data comprises: a step in which a foreground extraction module extracts, from a video sequence, regions whose pixels have changed to some degree; a step of classifying the moving targets detected by the foreground extraction module, retaining the rectangular frames judged to be pedestrians, and transmitting them to a target tracking module; a step in which the target tracking module continuously tracks each of the multiple targets recognized in the scene; a step in which an abnormal behavior trigger module determines the frame-by-frame posture of each target tracked by the target tracking module and performs behavior analysis for abnormal behavior; and a step in which a behavior recognition module processes the video clip of the abnormal behavior, reports the abnormal behavior, and recognizes the type of behavior.

Patent Document 1 extracts moving targets from a video sequence and performs behavior analysis, but when multiple video sources (for example, video cameras) exist, it does not consider which data from which video source or sources should be used for the analysis.

Patent Document 1: Chinese Patent Application Publication No. 106571014

For example, in care facilities such as nursing homes and rehabilitation centers, in order to grasp the health or recovery state of a care recipient and adjust the treatment plan, drug dosage, and so on, key point data may be extracted from video data of the care recipient, for example with the OpenPose technique described above, and the care recipient's behavior features may be analyzed for behavior recognition. In addition, depending on health condition and other factors, each care recipient has sites that require particular attention, such as the neck, waist, elbows, knees, or ankles. Therefore, for a specific care recipient, the behavior feature data for that person's site of interest within the recognized behavior feature data can be compared with the person's own historical data or with data for the corresponding site of a healthy person, and the health or recovery state can be determined from the result.

However, imaging a care recipient with dedicated monitoring equipment not only limits where imaging can take place; the care recipient may also be unable to behave naturally and as usual because of nervousness or similar causes, so accurate behavior features of the site of interest cannot be obtained from the captured results. To avoid this, it is conceivable to use existing video cameras installed in shared spaces of a care facility, such as the dining room, corridors, elevators, recreation lounge, and park. That is, video data captured by surveillance equipment that is already deployed on a large scale, such as security video cameras, could be used to recognize the behavior of care recipients.

On the other hand, because security video cameras are not installed for any specific care recipient, the position, angle, and focus settings of a camera may not be optimal for each care recipient. Hence, for a given care recipient, not all video data from every camera is suitable for recognizing the behavior of that person's site of interest. Even with a robust technique such as OpenPose, obtaining reliable results for the site of interest requires narrowing down the overall video data to some extent and extracting the video data to be used for behavior recognition.

An object of the present invention is, when the behavior features of a target person's site of interest are analyzed using video data of the target person captured by a plurality of video cameras, to extract from the captured video data specific images suitable for recognizing the behavior of that site of interest, to analyze the target person's behavior features using the specific images, and thereby to improve the reliability and accuracy of behavior recognition.

To solve the above problem, the following inventions are provided. A first invention is an image extraction device that extracts, from video data derived from a plurality of video sources, specific images for recognizing the behavior of a target person in the video data, the device comprising: a person detection extraction unit that detects the target person from the video data and extracts all images of the target person; a key point extraction unit that performs key point extraction on the images of the target person extracted by the person detection extraction unit; a site-of-interest specifying unit that specifies a site of interest of the target person; a site-of-interest image extraction unit that, based on the site of interest specified by the site-of-interest specifying unit, extracts images of the site of interest of the target person from the images of the target person; and an image determination output unit that determines whether each image of the site of interest extracted by the site-of-interest image extraction unit is a specific image and outputs the images determined to be specific images, wherein the image determination output unit determines whether an image of the site of interest is a specific image based on at least one of the sharpness of the image, its number of pixels, and the number of key points.

A second invention is an image extraction method for extracting, from video data derived from a plurality of video sources, specific images for recognizing the behavior of a target person in the video data, the method comprising: a person detection extraction step of detecting the target person from the video data and extracting all images of the target person; a key point extraction step of performing key point extraction on the images of the target person extracted in the person detection extraction step; a site-of-interest specifying step of specifying a site of interest of the target person; a site-of-interest image extraction step of extracting, based on the site of interest specified in the site-of-interest specifying step, images of the site of interest of the target person from the images of the target person; and an image determination output step of determining whether each image of the site of interest extracted in the site-of-interest image extraction step is a specific image and outputting the images determined to be specific images, wherein the image determination output step determines whether an image of the site of interest is a specific image based on at least one of the sharpness of the image, its number of pixels, and the number of key points.

A third invention is a computer-executable program that causes a computer to execute the image extraction method described above.

A fourth invention is a behavior analysis system comprising: the image extraction device described above; a behavior feature analysis unit that recognizes the behavior of the target person using the specific images output from the image extraction device; and a notification unit that outputs the analysis result of the behavior feature analysis unit.

As described above, the image extraction device and method of the present invention extract images of the target person's site of interest from video data derived from a plurality of video sources, and determine and extract specific images based on at least one of the sharpness, the number of pixels, and the number of key points of those images. They can therefore output images better suited to recognizing the behavior of the target person's site of interest, improving the reliability of behavior recognition.
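The determination summarized above can be illustrated with a minimal sketch. The text names the three criteria but does not, at this point, say how each is measured or combined, so the following are assumptions made only for illustration: the gradient-based `sharpness` measure, the `Thresholds` values, the grayscale nested-list image layout, and the rule that all three criteria must pass (the text only requires that at least one of them be used).

```python
from dataclasses import dataclass

Gray = list[list[int]]  # hypothetical layout: rows of 0-255 grayscale values


def sharpness(img: Gray) -> float:
    """Mean squared difference between neighbouring pixels (stand-in metric)."""
    total, count = 0, 0
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if x + 1 < len(row):          # horizontal neighbour
                total += (row[x + 1] - value) ** 2
                count += 1
            if y + 1 < len(img):          # vertical neighbour
                total += (img[y + 1][x] - value) ** 2
                count += 1
    return total / count if count else 0.0


@dataclass(frozen=True)
class Thresholds:
    # Illustrative values only; none of them come from the source text.
    min_sharpness: float = 50.0
    min_pixels: int = 4
    min_keypoints: int = 2


def is_specific_image(img: Gray, keypoint_count: int,
                      t: Thresholds = Thresholds()) -> bool:
    """Accept a site-of-interest image only if every criterion is met."""
    pixels = sum(len(row) for row in img)
    return (sharpness(img) >= t.min_sharpness
            and pixels >= t.min_pixels
            and keypoint_count >= t.min_keypoints)


sharp = [[0, 255], [255, 0]]
flat = [[128, 128], [128, 128]]
print(is_specific_image(sharp, keypoint_count=3))
print(is_specific_image(flat, keypoint_count=3))
```

Using only a subset of the criteria would amount to dropping the corresponding terms from the final conjunction.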

According to the present invention, when the behavior features of a target person's site of interest are analyzed using video data of the target person captured by a plurality of video cameras, and taking into account that the images captured by each camera are not necessarily suitable for recognizing the target person's behavior, it is possible to provide an image extraction device and method that extract, from the captured video data, images in which the behavior of the site of interest is accurately captured, based on at least one of the sharpness, the number of pixels, and the number of key points of the site of interest, and output them as specific images.

Furthermore, the behavior analysis system analyzes the target person's behavior from the specific images and obtains behavior recognition results that are accurate and resistant to interference. In addition, by comparing the results with the target person's historical data or with data from healthy people, the system can accurately grasp the target person's health and recovery state, so that the treatment plan, drug dosage, and so on can be adjusted in real time.

FIG. 1 is a schematic plan view of a care facility to which the image extraction device of the present invention is applied.
FIG. 2 is a schematic block diagram showing the behavior analysis system of FIG. 1.
FIG. 3 is a block diagram showing the structure of the image extraction device.
FIG. 4 is an example of an entry of the person data DBp stored in the person database.
FIG. 5 is a diagram schematically showing the structure of the image determination output unit.
FIG. 6 is a flowchart of the image extraction method of the present invention.
FIG. 7 is a diagram schematically showing the extraction of key points.

Embodiments for carrying out the present invention are described below with reference to the drawings. In the following embodiments, when the number of elements or the like (including counts, numerical values, quantities, and ranges) is mentioned, it is not limited to that specific number and may be greater or smaller, except where specifically stated or where it is clearly limited to a specific number in principle.

Further, in the following embodiments, the constituent elements (including element steps) are not necessarily essential, except where specifically stated or where they are clearly essential in principle, and it goes without saying that elements not described in the specification may be included.

Similarly, in the following embodiments, references to the shape, positional relationship, and the like of structural elements include those that are substantially approximate or similar to that shape or the like, except where specifically stated or where this is clearly not the case in principle. The same applies to the numerical values and ranges mentioned above.

An example in which the present invention is applied to a care facility is described below. The invention is not limited to care facilities; it can be applied wherever the behavior of a specific target person is to be recognized and a plurality of imaging devices that capture the target person are installed. For example, the image extraction device of the present invention can be installed in the target person's home or residential district and connected to a plurality of imaging devices installed there, and the video data captured by those devices can be narrowed down to specific images for recognizing the target person's behavior.

The target person described below is in most cases a person receiving care in the care facility, but is not limited to this and may be a healthy person such as an employee of the care facility. Applying behavior recognition according to the present invention to healthy people can have effects such as the early prevention of disease and stress.

First, the image extraction device of the present invention is described with reference to FIGS. 1 to 5. FIG. 1 is a schematic plan view of a care facility 101 to which the image extraction device of the present invention is applied. The care facility 101 in FIG. 1 contains a plurality of private rooms 102 for care recipients, a dining room 103, a hall 104, a recreation lounge 105, a corridor 106, a control room 107, and so on. Care recipients P1 to Pm live in their respective private rooms 102 and are active in the dining room 103, hall 104, recreation lounge 105, corridor 106, and elsewhere. A plurality of video cameras C1 to Cn (video sources) are installed in shared spaces such as the dining room 103, hall 104, recreation lounge 105, and corridor 106. Each video camera has an imaging range R1 to Rn, captures the care recipients who are active within its range, and transmits the captured video data to a behavior analysis system 100 connected to each camera by wire or wirelessly.

The behavior analysis system 100 includes the image extraction device 200 of the present invention, is installed in the control room 107, and notifies an administrator or caregiver of the results of analyzing the target person's behavior features.

FIG. 2 is a schematic block diagram of the behavior analysis system 100 of FIG. 1. The behavior analysis system 100 includes the image extraction device 200, a behavior feature analysis unit 201, a behavior feature database 202, and a notification unit 203.

The image extraction device 200 determines which care recipient residing in the care facility the target person captured by a video camera is (for example, care recipient Pi), and extracts, from the video data from video cameras C1 to Cn, image data suitable for recognizing the behavior of the site of interest Ii of the target person Pi (hereinafter such image data is referred to as "specific image data").

The image extraction device 200 is described in detail with reference to FIG. 3, a block diagram showing its structure. As shown in FIG. 3, the image extraction device 200 includes a person detection extraction unit 301, a person specifying unit 302, a person database 303, a site-of-interest specifying unit 304, a disease database 305, a key point extraction unit 306, a site-of-interest image extraction unit 307, and an image determination output unit 308.

The person detection extraction unit 301 performs person detection on the video data from video cameras C1 to Cn input to the image extraction device 200 and determines whether any camera has captured a person. Because existing techniques may be used for person detection, a detailed description is omitted here.

When it is determined that a video camera Cw has captured a person, that person is taken as the target person P, and all images of the target person P captured by the camera Cw are extracted. The person detection extraction unit 301 also determines whether the other video cameras C1 to Cw-1 and Cw+1 to Cn have captured the target person P. Existing person-similarity determination techniques may be used for this determination. For example, accuracy can be improved by using the similarity determination method described in Chinese Patent Application No. 201711236903.X.

If the similarity determination shows that other video cameras have also captured the target person P, all video of the target person P captured by those cameras is extracted and output together with the video of the target person P captured by camera Cw extracted above. Information indicating which video camera captured each video (image) may be recorded with the output video.
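The gathering of images across cameras, with the source camera recorded alongside each image, might be sketched as follows. The text leaves the similarity method to existing techniques, so the Euclidean distance between appearance feature vectors, the `max_distance` value, and the `Detection` record are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Detection:
    camera_id: str               # which camera (C1..Cn) produced the image
    frame_index: int
    feature: tuple[float, ...]   # hypothetical appearance feature vector


def gather_target_detections(reference: Detection,
                             candidates: list[Detection],
                             max_distance: float = 0.5) -> list[Detection]:
    """Keep the reference detection plus every candidate that resembles it."""
    matched = [reference]
    for det in candidates:
        # A small feature distance is treated as "same person" (assumption).
        if dist(det.feature, reference.feature) <= max_distance:
            matched.append(det)
    return matched


reference = Detection("Cw", 0, (1.0, 0.0))
others = [Detection("C1", 3, (0.9, 0.1)), Detection("C2", 7, (0.0, 1.0))]
for det in gather_target_detections(reference, others):
    print(det.camera_id, det.frame_index)
```

Because each `Detection` carries its `camera_id`, later stages can tell which camera produced an image without any separate bookkeeping.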

The person specifying unit 302 uses the person database 303 to determine which resident of the care facility 101 the captured target person P is. The person database 303 stores person data DBp for each of the care recipients P1 to Pn residing in the care facility 101. FIG. 4 shows an example of an entry of the person data DBp stored in the person database 303.

As shown in FIG. 4, the person data DBp includes a person ID 401 that uniquely identifies a care recipient, person features 402 used to determine which care recipient a person captured by a video camera is, and a site of interest 403 indicating a site of discomfort (diseased site) on the care recipient's body. The person features 402 may store images of the care recipient's appearance (face, physique) or feature data obtained by processing such images; the present invention does not specifically limit this. The site of interest 403 holds information such as the care recipient's site of discomfort as diagnosed by a doctor. It may be written directly as text such as "neck" or "elbow", as a cause of illness such as "lumbar disc herniation", or as a predetermined number (for example, a number denoting a particular joint, that is, a key point).
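One possible in-memory shape for such an entry is sketched below. The field names, the choice of a numeric feature vector for item 402, and the example values are illustrative assumptions; the text only fixes the three items 401 to 403.

```python
from dataclasses import dataclass, field


@dataclass
class PersonEntry:
    person_id: str                        # item 401: unique ID of a care recipient
    person_features: list[float]          # item 402: here, processed feature data
    sites_of_interest: list[str] = field(default_factory=list)  # item 403


# The person database keyed by person ID (hypothetical in-memory stand-in).
person_db: dict[str, PersonEntry] = {}


def register(entry: PersonEntry) -> None:
    """Add or overwrite the entry for one care recipient."""
    person_db[entry.person_id] = entry


register(PersonEntry("P1", [0.12, 0.80, 0.35], ["neck"]))
register(PersonEntry("P2", [0.90, 0.10, 0.44], ["lumbar disc herniation"]))
print(person_db["P1"].sites_of_interest)
```

Storing item 403 as free text keeps all three notations mentioned in the text (body part, cause of illness, or a number written as a string) representable in one field.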

The person specifying unit 302 compares the image (or feature data) of the target person P with the data stored in the person features 402 in the person database 303. Here too, existing person-similarity determination techniques may be used. For example, to improve accuracy, the similarity determination method described in Chinese Patent Application No. 201711236903.X can be used. The present invention is not limited in this respect.

A predetermined threshold T0 is set for the similarity. If the similarity between the person features 402 of an entry in the person database 303 (for example, Pi) and the target person P is higher than the threshold T0, the target person P is determined to be the person Pi in the person database 303. If the similarity exceeds T0 for multiple entries, the entry with the highest similarity is selected. The person ID of the person Pi is then output to the site-of-interest specifying unit 304, which acquires the site of interest Ii of the target person Pi from the person database 303.
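The threshold rule can be sketched as below. Cosine similarity over feature vectors is an assumed stand-in for the unspecified similarity measure, and the treatment of a score exactly equal to T0 (here: not a match) is likewise an assumption, since the text only covers scores higher than T0.

```python
from math import sqrt
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def identify_person(target: list[float],
                    person_features: dict[str, list[float]],
                    t0: float) -> Optional[str]:
    """Return the ID with the highest similarity above t0, or None."""
    best_id, best_score = None, t0
    for person_id, feature in person_features.items():
        score = cosine_similarity(target, feature)
        if score > best_score:            # strictly higher than T0 / current best
            best_id, best_score = person_id, score
    return best_id


features = {"P1": [1.0, 0.0], "P2": [0.8, 0.6], "P3": [0.0, 1.0]}
print(identify_person([0.9, 0.1], features, t0=0.8))
print(identify_person([-1.0, 0.0], features, t0=0.8))
```

In the first call both P1 and P2 exceed the threshold and the higher-scoring entry wins; in the second call no entry exceeds it and the function reports no match.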

If the similarity between the person features 402 and the target person P is lower than the threshold T0 for every entry in the person database 303, it is determined that no data for the target person P exists in the person database 303, and information to that effect is sent to the site-of-interest specifying unit 304.

In this case, the site-of-interest specifying unit 304 determines the target person's site of interest using the output of the key point extraction unit 306 and the disease database 305.

Specifically, the key point extraction unit 306 extracts key points from the images of the target person P output by the person detection extraction unit 301 and outputs the key point information of the target person P in each image. Existing techniques may be used for key point extraction. For example, the OpenPose technique described above may be used to extract key points from the target person's images, as shown in FIG. 7.

The disease database 305 stores a large amount of behavior feature data (for example, key point information) of patients, classified by diseased site: for example, data for patients with neck discomfort, data for patients with elbow discomfort, data for patients with ankle discomfort, and so on.

The site-of-interest specifying unit 304 compares the key point information of the target person P output by the key point extraction unit 306 with the data stored in the disease database 305, determines which class of data it is closest to, and takes the diseased site of that closest class as the target person P's site of interest. In this way, the site-of-interest specifying unit 304 can obtain the site of interest even for a target person P who is not in the person database 303.
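The "closest class" comparison could be realized in many ways; the sketch below uses a nearest-centroid rule, which is an assumption and not a method stated in the text. The fixed-length feature vectors standing in for key point information and the sample values are also hypothetical.

```python
from math import dist
from statistics import fmean

# Hypothetical disease database: diseased site -> behavior feature samples.
DiseaseDB = dict[str, list[list[float]]]


def centroid(samples: list[list[float]]) -> list[float]:
    """Component-wise mean of equally long feature vectors."""
    return [fmean(column) for column in zip(*samples)]


def nearest_disease_site(target: list[float], disease_db: DiseaseDB) -> str:
    """Return the site whose class centroid lies closest to the target features."""
    return min(disease_db,
               key=lambda site: dist(target, centroid(disease_db[site])))


disease_db = {
    "neck": [[0.9, 0.1], [0.7, 0.3]],
    "ankle": [[0.1, 0.9], [0.3, 0.7]],
}
print(nearest_disease_site([0.75, 0.2], disease_db))
```

A nearest-sample rule, or a trained classifier, would fit the same interface; the centroid variant is used here only because it is short and deterministic.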

なお、注目部位特定部３０４は、この対象人物に対してその注目部位を判定した後、判定結果を人物特定部３０２に通知し、人物特定部３０２が当該対象人物のために人物データベース３０３に新規のエントリを作り、人物検出抽出部３０１が抽出した当該対象人物の画像又は画像に基づく特徴データ、及び注目部位特定部３０４により判定された注目部位を当該エントリに記憶してもよい。 Note that, after determining the attention site for this target person, the attention site specifying unit 304 may notify the person specifying unit 302 of the determination result. The person specifying unit 302 may then create a new entry in the person database 303 for the target person and store in that entry the image of the target person extracted by the person detection extraction unit 301, or feature data based on that image, together with the attention site determined by the attention site specifying unit 304.

次に、注目部位特定部３０４は、取得した注目部位を注目部位画像抽出部３０７に出力する。注目部位画像抽出部３０７は、注目部位特定部３０４から出力した注目部位と人物検出抽出部３０１から出力された対象人物の画像から、対象人物の注目部位の画像を抽出する。 Next, the attention site specifying unit 304 outputs the acquired attention site to the attention site image extraction unit 307. The attention part image extraction unit 307 extracts an attention part image of the target person from the attention part output from the attention part specifying unit 304 and the target person image output from the person detection extraction unit 301.

具体的に、注目部位画像抽出部３０７は、人物検出抽出部３０１から入力された画像を注目部位ごとに分割し、注目部位特定部３０４が特定した注目部位に基づいて、当該注目部位の画像を画像判定出力部３０８に出力する。ここで、画像を分割する際に、注目部位の数および具体的な位置は予め決定されてもよい。例えば、注目部位の数は、疾患データベース３０５の分類数と同じであってもよい。 Specifically, the attention site image extraction unit 307 divides the image input from the person detection extraction unit 301 into attention sites and, based on the attention site specified by the attention site specifying unit 304, outputs the image of that attention site to the image determination output unit 308. When the image is divided, the number and the specific positions of the attention sites may be determined in advance. For example, the number of attention sites may be the same as the number of classes in the disease database 305.

また、注目部位の画像の抽出と共に、注目部位画像抽出部３０７は、キーポイント抽出部３０６からのキーポイント情報から、対応する注目部位のキーポイント情報（以下は「注目部位キーポイント情報」と称する）をも抽出し、各注目部位画像と注目部位キーポイント情報とを関連付けて画像判定出力部３０８に送信してもよい。 In addition to extracting the image of the attention site, the attention site image extraction unit 307 may also extract, from the key point information supplied by the key point extraction unit 306, the key point information of the corresponding attention site (hereinafter referred to as "attention site key point information"), and may transmit each attention site image to the image determination output unit 308 in association with its attention site key point information.
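One way to picture the extraction of an attention site image together with its key point information is a crop around the key points that belong to the site. This is only a sketch under stated assumptions: the text says the positions of the sites may be predetermined and does not prescribe a bounding-box rule. The joint names, the `margin` parameter, and the representation of an image as a list of pixel rows are hypothetical.

```python
def crop_site(image, keypoints, site_joints, margin=1):
    """Crop the bounding box around the detected key points of one attention site.

    image: 2D list of pixel rows; keypoints: dict joint name -> (x, y) or None.
    Returns (crop, site_keypoints), or None if no key point of the site was detected.
    The returned key points keep their coordinates in the original image.
    """
    pts = {j: keypoints[j] for j in site_joints if keypoints.get(j) is not None}
    if not pts:
        return None
    height, width = len(image), len(image[0])
    xs = [p[0] for p in pts.values()]
    ys = [p[1] for p in pts.values()]
    # Expand the box by `margin` pixels and clamp it to the image borders
    x0, x1 = max(min(xs) - margin, 0), min(max(xs) + margin, width - 1)
    y0, y1 = max(min(ys) - margin, 0), min(max(ys) + margin, height - 1)
    crop = [row[x0:x1 + 1] for row in image[y0:y1 + 1]]
    return crop, pts

# Hypothetical 8x6 image whose pixel value encodes its position (y * 10 + x)
image = [[y * 10 + x for x in range(8)] for y in range(6)]
keypoints = {"r_shoulder": (2, 1), "r_elbow": (4, 3), "r_wrist": None}
crop, site_keypoints = crop_site(image, keypoints, ("r_shoulder", "r_elbow", "r_wrist"))
```

Returning the crop and the key points as one pair mirrors the association between attention site image and attention site key point information described above.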

その後、画像判定出力部３０８は注目部位画像抽出部３０７から出力した画像を判定し、所定の絞り込みルールに基づいて、当該対象人物の注目部位の行動特徴の解析に適する特定画像を出力する。 Thereafter, the image determination output unit 308 determines the image output from the target region image extraction unit 307, and outputs a specific image suitable for analyzing the behavioral characteristics of the target region of the target person based on a predetermined narrowing rule.

所定の絞り込みルールの詳細は以下である。ビデオカメラのそれぞれは、位置、角度、ピント調整などの設定が異なるため、撮像された対象人物の画像において、注目部位のシャープネス、大きさ、可視範囲（遮蔽されたか否か）などが異なる。従って、（１）シャープネス、（２）画素数、（３）キーポイント数との３点の少なくともいずれかに基づいて画像を絞り込むことが考えられる。 Details of the predetermined narrowing rule are as follows. Since each video camera has different settings such as position, angle, and focus adjustment, the sharpness, size, visible range (whether or not it is occluded), etc. of the region of interest differ in the captured image of the target person. Therefore, it is conceivable to narrow down the image based on at least one of (1) sharpness, (2) the number of pixels, and (3) the number of key points.

（１）シャープネスに基づく絞り込み (1) Narrowing Based on Sharpness
注目部位画像抽出部３０７からの対象人物Ｐの注目部位の複数の画像に対してそれぞれシャープネスを判定する。そして、シャープネスに対して予め閾値Ｔ１を設定し、シャープネスが閾値Ｔ１より低い画像を廃棄する。 Sharpness is determined for each of the plurality of images of the attention site of the target person P from the attention site image extraction unit 307. A threshold T1 is set in advance for sharpness, and images whose sharpness is lower than the threshold T1 are discarded.

以下はシャープネスの判定方法を説明する。 The sharpness determination method will be described below.

従来技術には、画像のシャープネスを判定する方法が複数存在する。本発明は従来の判定方法のいずれを利用してもよく、特に限定はない。ここで、注目部位毎に機械学習の方法により画像のシャープネスを判定する例を説明する。 In the prior art, there are a plurality of methods for determining the sharpness of an image. The present invention may use any conventional determination method, and is not particularly limited. Here, an example in which the sharpness of an image is determined by a machine learning method for each region of interest will be described.

まず、多数の鮮明な画像と不鮮明な画像を含むサンプルデータセットを用意し、サンプルデータセット内の人物を注目部位ごとに分割し、注目部位ごとのサブサンプルデータセットを構築する。そして、各サブサンプルデータセット内の画像のシャープネスを人工的に判定し、それぞれにシャープネス値を付与する。その後、注目部位のそれぞれに対し、各サブサンプルデータセットの画像を入力とし、シャープネス値を出力とするように、当該部位のシャープネスを取得するためのモデルをトレーニングする。 First, a sample data set containing a large number of sharp and blurred images is prepared, the persons in the sample data set are divided into attention sites, and a subsample data set is built for each attention site. The sharpness of each image in each subsample data set is then judged manually, and a sharpness value is assigned to each image. After that, for each attention site, a model for obtaining the sharpness of that site is trained with the images of the corresponding subsample data set as input and the sharpness values as output.

これにより、注目部位画像抽出部３０７からの対象人物Ｐの注目部位の画像を対応する注目部位のモデルに入力することで、当該画像のシャープネスを得ることができる。 Thereby, by inputting the image of the target region of the target person P from the target region image extraction unit 307 to the corresponding target region model, the sharpness of the image can be obtained.
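The threshold step of (1) can be sketched as follows. The sketch does not reproduce the trained per-site model described above. Since the text allows any conventional sharpness measure, it substitutes the variance of the Laplacian, a common focus measure, as the scoring function. The function names and the tiny 4x4 test images are hypothetical, and a trained model could be passed in through the `score` parameter instead.

```python
from statistics import pvariance

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray: 2D list of grayscale values, at least 3x3.
    """
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, len(gray) - 1)
        for x in range(1, len(gray[0]) - 1)
    ]
    return pvariance(responses)

def filter_by_sharpness(images, t1, score=laplacian_variance):
    """Keep images whose sharpness score is at least the threshold T1."""
    return [img for img in images if score(img) >= t1]

# A featureless image scores 0; a checkerboard has strong edges and scores high
flat = [[100] * 4 for _ in range(4)]
board = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]
kept = filter_by_sharpness([flat, board], t1=100.0)
```

The value of T1 depends entirely on the chosen score and would have to be tuned per attention site.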

（２）画素数に基づく絞り込み (2) Narrowing Based on the Number of Pixels
注目部位画像抽出部３０７からの対象人物Ｐの注目部位の複数の画像に対して、各画像の画素数を算出する。そして、画素数に対して閾値Ｔ２を設定し、画素数が閾値Ｔ２より低い画像を廃棄する。 For the plurality of images of the attention site of the target person P from the attention site image extraction unit 307, the number of pixels of each image is calculated. A threshold T2 is set for the number of pixels, and images whose number of pixels is lower than the threshold T2 are discarded.

具体的に、例えば注目部位ごとに、それぞれの最低画素数Ｗｍｉｎ＊Ｈｍｉｎ（前記閾値Ｔ２に対応する）を予め記憶し、入力された対象人物Ｐの注目部位の複数の画像のそれぞれの画素数が前記最低画素数より低いか否かを判定し、前記最低画素数より低いと、対応する画像を廃棄する。 Specifically, for example, a minimum number of pixels Wmin * Hmin (corresponding to the threshold T2) is stored in advance for each attention site. For each of the input images of the attention site of the target person P, it is determined whether the number of pixels is lower than the minimum number of pixels, and if so, the image is discarded.

或いは、各画像の画素数を順位付け、最下位から上の一定の割合の画像を廃棄する。例えば、画素数が低いほうの５０％の画像を廃棄する。もちろん、閾値と順位付けを組み合わせて絞り込みを行ってもよいが、ここでは説明は割愛する。 Alternatively, the number of pixels of each image is ranked, and a certain percentage of the images from the lowest order are discarded. For example, 50% of the images with the lower number of pixels are discarded. Of course, narrowing down may be performed by combining thresholds and rankings, but the description is omitted here.
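Both variants of (2), the fixed minimum Wmin * Hmin and the ranking that discards a fixed share of the smallest images, can be sketched as below. The dictionary layout of an image record and the concrete sizes are hypothetical.

```python
def filter_by_min_pixels(images, w_min, h_min):
    """Discard images whose pixel count is below W_min * H_min (threshold T2)."""
    t2 = w_min * h_min
    return [img for img in images if img["width"] * img["height"] >= t2]

def drop_lowest_fraction(images, key, fraction=0.5):
    """Rank images by `key` (largest first) and discard the lowest `fraction`."""
    ranked = sorted(images, key=key, reverse=True)
    n_keep = len(ranked) - int(len(ranked) * fraction)
    return ranked[:n_keep]

def pixels(img):
    return img["width"] * img["height"]

# Hypothetical attention site images of one target person
images = [
    {"id": "a", "width": 40, "height": 60},   # 2400 pixels
    {"id": "b", "width": 20, "height": 30},   # 600 pixels
    {"id": "c", "width": 80, "height": 100},  # 8000 pixels
    {"id": "d", "width": 50, "height": 50},   # 2500 pixels
]
by_threshold = [img["id"] for img in filter_by_min_pixels(images, w_min=40, h_min=50)]
by_ranking = [img["id"] for img in drop_lowest_fraction(images, key=pixels)]
```

The two functions compose directly if a combination of threshold and ranking is wanted: apply the threshold first, then rank the survivors.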

（３）キーポイント数に基づく絞り込み (3) Narrowing Based on the Number of Key Points
注目部位画像抽出部３０７からの対象人物Ｐの注目部位の複数の画像に対して、それぞれの関連付けられた注目部位キーポイント情報によりキーポイント数を算出する。そして、例えばキーポイント数に対して閾値Ｔ３を設定し、キーポイント数が閾値Ｔ３より低い画像を廃棄する。 For the plurality of images of the attention site of the target person P from the attention site image extraction unit 307, the number of key points is calculated from the attention site key point information associated with each image. Then, for example, a threshold T3 is set for the number of key points, and images whose number of key points is lower than the threshold T3 are discarded.

具体的に、例えば注目部位ごとに、それぞれの最低キーポイント数Ｎｍｉｎ（前記閾値Ｔ３に対応する）を予め記憶し、入力された画像のそれぞれの注目部位キーポイント数が前記最低キーポイント数Ｎｍｉｎとの関係を判定し、前記最低キーポイント数Ｎｍｉｎよりキーポイント数が低い画像を廃棄する。 Specifically, for example, a minimum number of key points Nmin (corresponding to the threshold T3) is stored in advance for each attention site. The number of attention site key points of each input image is compared with the minimum number of key points Nmin, and images whose number of key points is lower than Nmin are discarded.

或いは、各画像の注目部位キーポイント数を順位付け、最下位から上の一定の割合の画像を廃棄する。例えば、注目部位キーポイント数が低いほうの５０％の画像を廃棄する。もちろん、閾値と順位付けを組み合わせて絞り込みを行ってもよく、ここでは説明は割愛する。 Alternatively, the number of attention site key points in each image is ranked, and a certain percentage of the images from the bottom to the top are discarded. For example, 50% of the images with the lower number of target region key points are discarded. Of course, narrowing down may be performed by combining threshold values and ranking, and the description is omitted here.
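A minimal sketch of (3), including one possible way of combining the threshold Nmin with the ranking, is given below. The text leaves the combination unspecified, so applying the threshold first and ranking the survivors afterwards is an assumption, as are the record layout and the convention that an undetected (for example occluded) key point is stored as `None`.

```python
def count_keypoints(site_keypoints):
    """Number of key points actually detected (non-None) for the attention site."""
    return sum(1 for p in site_keypoints.values() if p is not None)

def filter_by_keypoints(items, n_min, drop_fraction=0.0):
    """Apply threshold T3 = N_min, then optionally drop the lowest-ranked share."""
    kept = [it for it in items if count_keypoints(it["keypoints"]) >= n_min]
    kept.sort(key=lambda it: count_keypoints(it["keypoints"]), reverse=True)
    n_keep = len(kept) - int(len(kept) * drop_fraction)
    return kept[:n_keep]

# Hypothetical arm images; None marks a key point that was not detected
items = [
    {"id": "a", "keypoints": {"shoulder": (1, 1), "elbow": (2, 3), "wrist": (3, 5)}},
    {"id": "b", "keypoints": {"shoulder": (1, 1), "elbow": None, "wrist": None}},
    {"id": "c", "keypoints": {"shoulder": (1, 2), "elbow": (2, 4), "wrist": None}},
    {"id": "d", "keypoints": {"shoulder": (0, 1), "elbow": (1, 3), "wrist": (2, 5)}},
]
threshold_only = [it["id"] for it in filter_by_keypoints(items, n_min=2)]
combined = [it["id"] for it in filter_by_keypoints(items, n_min=2, drop_fraction=0.5)]
```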

図５は画像判定出力部３０８の構造を概略的に示す図である。図５に示すように、画像判定出力部３０８はシャープネス判定部５０１、画素数判定部５０２、キーポイント数判定部５０３を含む。 FIG. 5 is a diagram schematically showing the structure of the image determination output unit 308. As shown in FIG. 5, the image determination output unit 308 includes a sharpness determination unit 501, a pixel number determination unit 502, and a key point number determination unit 503.
シャープネス判定部５０１は前記（１）に記載のシャープネスに基づく絞り込みを実行し、画素数判定部５０２は前記（２）に記載の画素数に基づく絞り込みを実行し、キーポイント数判定部５０３は前記（３）に記載のキーポイント数に基づく絞り込みを実行する。上述のように、画像判定出力部３０８において、シャープネス判定部５０１による処理、画素数判定部５０２による処理、およびキーポイント数判定部５０３による処理は、選択的に少なくとも１つを実行すればよく、全部実行する必要はない。 The sharpness determination unit 501 performs the narrowing based on sharpness described in (1), the pixel number determination unit 502 performs the narrowing based on the number of pixels described in (2), and the key point number determination unit 503 performs the narrowing based on the number of key points described in (3). As described above, in the image determination output unit 308, it is sufficient to selectively execute at least one of the processing by the sharpness determination unit 501, the processing by the pixel number determination unit 502, and the processing by the key point number determination unit 503. It is not necessary to execute all of them.

画像判定出力部３０８は、前記（１）〜（３）の少なくとも１つにより絞り込んだ対象人物の注目部位の画像（注目部位キーポイント情報と関連付けて出力してよい）を、特定画像のデータとしてそのまま出力してよい。また、絞り込んだ画像がそれぞれビデオカメラＣ１〜Ｃｎのいずれからのものかを更に判定し、特定画像の数に基づいてビデオカメラを順位付け、ビデオカメラの優先度を設定してもよい。この場合、絞り込んだ画像から優先度が高い（例えば順位が上位３０％）ビデオカメラからの画像を抽出し、抽出した画像を特定画像として図２の行動特徴解析部２０１に送信する。さらに、前記対象人物について、優先度が低い（例えば順位が下位３０％）ビデオカメラからのビデオデータをこれから受信しなくてもよい。これにより、画像抽出装置２００のリソース消費を低減させることができ、速度増加と共にコスト削減が可能になる。 The image determination output unit 308 may output the images of the attention site of the target person narrowed down by at least one of (1) to (3) above (optionally in association with the attention site key point information) as specific image data without further processing. Alternatively, it may further determine which of the video cameras C1 to Cn each narrowed-down image came from, rank the video cameras based on the number of specific images, and set a priority for each video camera. In this case, images from high-priority video cameras (for example, those ranked in the top 30%) are extracted from the narrowed-down images and transmitted as specific images to the behavior feature analysis unit 201 in FIG. 2. Furthermore, for the target person, video data from low-priority video cameras (for example, those ranked in the bottom 30%) need not be received from then on. This reduces the resource consumption of the image extraction device 200, which increases speed and reduces cost.
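The camera ranking can be sketched as a count of specific images per camera followed by a split into high and low priority groups. The 30% shares come from the examples in the text; the record layout, the rounding rule (truncate, but keep at least one high-priority camera), and the function names are assumptions.

```python
from collections import Counter

def rank_cameras(specific_images, cameras):
    """Order cameras by how many specific images each contributed (most first)."""
    counts = Counter(img["camera"] for img in specific_images)
    return sorted(cameras, key=lambda cam: counts[cam], reverse=True)

def split_by_priority(ranked, top=0.3, bottom=0.3):
    """Return (high priority cameras, low priority cameras) from a ranked list."""
    n_top = max(1, int(len(ranked) * top))
    n_bottom = int(len(ranked) * bottom)
    high = ranked[:n_top]
    low = ranked[len(ranked) - n_bottom:] if n_bottom else []
    return high, low

# Hypothetical result of the narrowing: C3 contributed no specific image
cameras = ["C1", "C2", "C3", "C4", "C5"]
specific_images = (
    [{"camera": "C1"}] * 2 + [{"camera": "C2"}] * 5
    + [{"camera": "C4"}] * 3 + [{"camera": "C5"}]
)
ranked = rank_cameras(specific_images, cameras)
high, low = split_by_priority(ranked)
# Only images from high priority cameras are passed on as specific images
selected = [img for img in specific_images if img["camera"] in high]
```

The `low` list is what a caller would use to stop receiving video for this target person, or to report cameras whose focus, angle, or position might need adjustment.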

或いは、ビデオカメラを順位付けた後、シャープネス判定部５０１、画素数判定部５０２、キーポイント数判定部５０３のそれぞれの処理をこれから実行せず、優先度が高い（例えば順位が上位３０％）ビデオカメラからの対象人物の注目部位の画像をそのまま出力してもよい。 Alternatively, once the video cameras have been ranked, the processing of the sharpness determination unit 501, the pixel number determination unit 502, and the key point number determination unit 503 may be skipped from then on, and the images of the attention site of the target person from high-priority video cameras (for example, those ranked in the top 30%) may be output as they are.

また、優先度が低いビデオカメラについて、その旨（優先度が低いことを示す情報）をリアルタイムに、ビデオカメラの設置を制御する他のコントローラーにフィードバックし、当該コントローラーでこれら優先度が低いビデオカメラのピント調整、角度、位置のいずれを調整してもよい（もしできれば）。 In addition, for low-priority video cameras, information indicating the low priority may be fed back in real time to a separate controller that controls the installation of the video cameras, and that controller may adjust any of the focus, angle, and position of these low-priority video cameras (where such adjustment is possible).

また、以上の説明において、画像判定出力部３０８は特定画像の他に、注目部位キーポイント情報をも出力する。しかし、注目部位キーポイント情報を出力せずに特定画像のみを出力してもよい。そして、このような特定画像を利用して行動認識を行う際は、別途でキーポイントの抽出を再度行ってもよい。この点は後述する画像抽出方法においても同様である。 In the above description, the image determination output unit 308 also outputs attention site key point information in addition to the specific image. However, only the specific image may be output without outputting the attention site key point information. And when performing action recognition using such a specific image, you may perform a key point extraction again separately. This also applies to the image extraction method described later.

次に図２の説明に戻る。画像判定出力部３０８は、対象人物の注目部位の画像を絞り込んだ画像（即ち特定画像）を注目部位キーポイント情報と関連付けて図２の行動特徴解析部２０１に出力する。 Next, the description returns to FIG. The image determination output unit 308 associates an image obtained by narrowing down the image of the target region of the target person (that is, a specific image) with the target region key point information and outputs the image to the behavior feature analysis unit 201 in FIG.

行動特徴解析部２０１は、行動特徴データベース２０２に記憶された各被介護者の行動特徴の履歴データと健常者の行動特徴のデータにより、対象人物の注目部位の健康状態が改善したか又は悪化したかを判定する。 The behavior feature analysis unit 201 determines whether the health condition of the attention site of the target person has improved or deteriorated, based on the history data of the behavior features of each care recipient and the behavior feature data of healthy persons stored in the behavior feature database 202.

具体的に、行動特徴データベース２０２には、人物ＩＤと各被介護者の注目部位の行動特徴の履歴データが記憶され、また、年齢、性別、人種がそれぞれの健常者の各注目部位の行動特徴の履歴データも記憶されている。 Specifically, the behavior feature database 202 stores a person ID and history data of the behavior features of the attention site of each care recipient. It also stores history data of the behavior features of each attention site for healthy persons of various ages, sexes, and races.

対象人物が人物データベース３０３内の人物である場合、その注目部位キーポイント情報に基づいて行動特徴を解析し、解析の結果を行動特徴データベース２０２の対応するエントリの履歴データと照合して、当該注目部位の状態の変化を判定する。また、その判定結果を通知部２０３で管理者又は介護者に通知する。 When the target person is a person in the person database 303, the behavior feature is analyzed based on the attention site key point information, the result of the analysis is compared with the history data of the corresponding entry in the behavior feature database 202, and The change in the state of the part is determined. In addition, the determination result is notified to the administrator or caregiver by the notification unit 203.

対象人物が人物データベース３０３内の人物ではない場合、その注目部位キーポイント情報に基づいて行動特徴を解析し、解析の結果を行動特徴データベース２０２の対応する健常者の対応する注目部位のデータと照合して、当該注目部位の状態と健常者との差異を判定する。また、その判定結果を通知部２０３で管理者又は介護者に通知する。 When the target person is not a person in the person database 303, the behavior feature is analyzed based on the attention site key point information, and the result of the analysis is collated with the corresponding attention site data of the corresponding healthy person in the behavior feature database 202. Then, the difference between the state of the attention site and the healthy person is determined. In addition, the determination result is notified to the administrator or caregiver by the notification unit 203.

さらに、人物特定部３０２と同様に、行動特徴解析部２０１も当該対象人物のために行動特徴データベース２０２に新規のエントリを作り、その行動特徴のデータを当該エントリに記憶し、これからの利用に備えても良い。 Further, like the person specifying unit 302, the behavior feature analysis unit 201 may also create a new entry in the behavior feature database 202 for the target person and store the behavior feature data in that entry for future use.

以上は本発明の画像抽出装置２００と行動解析システム１００を説明した。本発明の画像抽出装置２００によれば、ビデオカメラのそれぞれが撮像した画像が必ずしも対象人物の行動認識に適するとは限らないことを考慮し、対象人物の注目部位の画像に対して、注目部位のシャープネス、画素数、キーポイント数の少なくともいずれかに基づいて絞り込みを行うことで、出力される特定画像を、対象人物の注目部位の行動がより正確に撮像されたものとすることができる。 The image extraction device 200 and the behavior analysis system 100 of the present invention have been described above. The image extraction device 200 of the present invention takes into account that the images captured by each video camera are not necessarily suitable for recognizing the behavior of the target person. By narrowing down the images of the attention site of the target person based on at least one of the sharpness, the number of pixels, and the number of key points of the attention site, the specific images that are output can be limited to those in which the behavior of the attention site of the target person is captured more accurately.

また、本発明の行動解析システム１００によれば、前記特定画像を利用して対象人物の行動を解析するため、正確かつ耐干渉な行動認識結果が得られる。また、当該対象人物の履歴データ又は健常者のデータと照合することにより、対象人物の健康状態、回復状態を正確に把握し、リアルタイムに治療方針、薬剤投与量などを調整することができる。 Further, according to the behavior analysis system 100 of the present invention, since the behavior of the target person is analyzed using the specific image, an accurate and interference-resistant behavior recognition result can be obtained. In addition, by comparing the history data of the target person or the data of the healthy person, it is possible to accurately grasp the health state and recovery state of the target person and adjust the treatment policy, the drug dosage, etc. in real time.

以下は図６を参照しながら本発明の画像抽出方法を説明する。図６は本発明の画像抽出方法のフロー図である。図６に示すように、ステップＳ６０１において、ビデオカメラＣ１〜Ｃｎから入力されたビデオデータに対して人物検出を行い、人物を撮像したビデオカメラがあるか判定する。 The image extraction method of the present invention will be described below with reference to FIG. 6. FIG. 6 is a flowchart of the image extraction method of the present invention. As shown in FIG. 6, in step S601, person detection is performed on the video data input from the video cameras C1 to Cn, and it is determined whether any video camera has captured a person.

あるビデオカメラＣｗが人物を撮像したと判定されると、当該人物を対象人物Ｐとし、当該ビデオカメラＣｗに撮像された対象人物Ｐの全ての画像を抽出する。また、他のビデオカメラＣ１〜Ｃｗ−１、Ｃｗ＋１〜Ｃｎも当該対象人物Ｐを撮像したか否かを判定し、これらビデオカメラに撮像された対象人物Ｐの全ての画像をも抽出する。 When it is determined that a certain video camera Cw has captured a person, the person is set as the target person P, and all images of the target person P imaged by the video camera Cw are extracted. The other video cameras C1 to Cw-1 and Cw + 1 to Cn also determine whether or not the target person P has been imaged, and extract all the images of the target person P captured by these video cameras.

次いで、ステップＳ６０２において、ステップＳ６０１で抽出した対象人物の画像からキーポイントを抽出し、各画像における対象人物のキーポイント情報を出力する。また、ステップＳ６０３において、対象人物の画像（又は特徴データ）を人物データベース内の人物特徴と照合し、上述した人物特定部３０２に関する説明のように、人物データベースを利用して、検出された人物が介護施設に入居中のどの人物であるかを判定する。 Next, in step S602, key points are extracted from the images of the target person extracted in step S601, and the key point information of the target person in each image is output. In step S603, the image (or feature data) of the target person is collated with the person features in the person database and, as described above for the person specifying unit 302, the person database is used to determine which resident of the care facility the detected person is.

ステップＳ６０４において、対象人物を特定したか否かを判定する。対象人物を特定した場合、ステップＳ６０５に進む。ステップＳ６０５において、人物データベースから対象人物の注目部位を特定する。ステップＳ６０４において、対象人物を特定できないと判定されると、ステップＳ６０６に進む。 In step S604, it is determined whether the target person has been specified. If the target person is specified, the process proceeds to step S605. In step S605, the target region of the target person is specified from the person database. If it is determined in step S604 that the target person cannot be specified, the process proceeds to step S606.

ステップＳ６０６において、ステップＳ６０２で出力された対象人物のキーポイント情報を疾患データベースに記憶されたデータと照合し、疾患データベースにおけるどの種類のデータに最も近いか判定する。そして、疾患データベース内の最も近い種類の疾患部位が対象人物の注目部位であるとする。 In step S606, the key point information of the target person output in step S602 is collated with the data stored in the disease database to determine which type of data in the disease database is closest. Then, it is assumed that the closest type of disease site in the disease database is the target site of the target person.
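The branch formed by steps S604 to S606 can be sketched as a lookup with a fallback. The dictionary layout of the person database, the `classify` callback standing in for the disease database matching of step S606, and the returned source label are all hypothetical and serve only to make the control flow visible.

```python
def identify_attention_site(person_id, person_db, keypoint_features, classify):
    """Sketch of S604 to S606.

    If the person was identified (S604: yes), read the attention site from the
    person database (S605). Otherwise fall back to matching the key point
    information against the disease database (S606), represented by `classify`.
    Returns (attention_site, source).
    """
    if person_id is not None and person_id in person_db:
        return person_db[person_id]["attention_site"], "person_db"
    return classify(keypoint_features), "disease_db"

# Hypothetical person database and a stub classifier for the fallback path
person_db = {"resident-001": {"attention_site": "ankle"}}

def stub_classifier(features):
    return "neck"

known = identify_attention_site("resident-001", person_db, [0.1, 0.2], stub_classifier)
unknown = identify_attention_site(None, person_db, [0.1, 0.2], stub_classifier)
```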

次に、ステップＳ６０７において、ステップＳ６０５やステップＳ６０６で出力された注目部位に基づいて対象人物の注目部位の画像を抽出するとともに、ステップＳ６０２で出力された対象人物のキーポイント情報から、対応する注目部位のキーポイント情報を抽出する。その後、ステップＳ６０８において、ステップＳ６０７で出力された対象人物の注目部位の画像を絞り込む。 Next, in step S607, an image of the target portion of the target person is extracted based on the target portion output in step S605 or step S606, and the corresponding target attention is extracted from the key point information of the target person output in step S602. Extract the keypoint information of the part. Thereafter, in step S608, the image of the target region of the target person output in step S607 is narrowed down.

ここでは、上述した画像判定出力部３０８に関する説明のように、（１）シャープネスに基づく絞り込み、（２）画素数に基づく絞り込み、（３）キーポイント数に基づく絞り込みの少なくともいずれかを行う。これら（１）〜（３）について、上記の説明と同様であるため、重複の説明を省略する。 Here, at least one of (1) narrowing based on sharpness, (2) narrowing based on the number of pixels, and (3) narrowing based on the number of key points is performed as described above with respect to the image determination output unit 308. Since these (1) to (3) are the same as those described above, redundant description will be omitted.

次いで、ステップＳ６０９において、ビデオカメラの順位付け、ビデオカメラの順位付けに基づく絞り込み、ビデオカメラの設置の調整を選択的に実行する。これらの詳細は上記と同様であるため、説明を省略する。勿論、当該ステップを省略してもよい。その後、ステップＳ６１０において、対象人物の注目部位の画像を絞り込んだ画像（即ち特定画像）を注目部位キーポイント情報と関連付けて出力し、本発明の画像抽出方法を終了する。 Next, in step S609, the ranking of the video cameras, the narrowing down based on that ranking, and the adjustment of the installation of the video cameras are selectively executed. Since the details are the same as described above, their description is omitted. Of course, this step may be omitted. Thereafter, in step S610, the images obtained by narrowing down the images of the attention site of the target person (that is, the specific images) are output in association with the attention site key point information, and the image extraction method of the present invention ends.

その後、出力された特定画像および関連付けられたキーポイント情報を利用して、図２の行動特徴解析部２０１のように、対象人物の注目部位の健康状態が改善したか又は悪化したかを判定する。 Thereafter, using the output specific image and the associated key point information, as in the behavior feature analysis unit 201 in FIG. 2, it is determined whether the health state of the target region of the target person has improved or deteriorated. .

以上は、本発明を実施するための好ましい形態を説明したが、本発明は上記実施の形態に限定されない。発明の要旨を逸脱しない範囲内で各種変更を行うことができる。例えば、以上では、画像抽出装置２００、行動解析システム１００の説明において、モジュール構造を例として説明した。このようなモジュール構造はその機能を実現するためのハードウエアで実現されてもよく、ＣＰＵ、コンピュータが記憶媒体に記憶されたプログラムを実行することで実現されてもよい。 The preferred embodiments for carrying out the present invention have been described above, but the present invention is not limited to the above embodiments. Various changes can be made without departing from the scope of the invention. For example, in the description of the image extraction apparatus 200 and the behavior analysis system 100, the module structure has been described as an example. Such a module structure may be realized by hardware for realizing the function, or may be realized by a CPU or a computer executing a program stored in a storage medium.

なお、本発明は上記した実施例に限定されるものではなく、様々な変形例が含まれる。例えば、上記した実施例は本発明を分かりやすく説明するために詳細に説明したものであり、必ずしも説明したすべての構成を備えるものに限定されるものではない。また、ある実施例の構成の一部を他の実施例の構成に置き換えることが可能であり、また、ある実施例の構成に他の実施例の構成を加えることも可能である。また、各実施例の構成の一部について、他の構成の追加・削除・置換をすることが可能である。 Note that the present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to embodiments having all of the configurations described. A part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For a part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

また、上記の各構成・機能・処理部等は、それらの一部又は全部を、例えば集積回路で設計する等によりハードウエアで実現してもよい。また、上記の各構成、機能等は、プロセッサがそれぞれの機能を実現するプログラムを解釈し、実行することによりソフトウェアで実現してもよい。各機能を実現するプログラム、テーブル、ファイル等の情報は、メモリや、ハードディスク、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）等の記録装置、または、ＩＣカード、ＳＤカード等の記録媒体に置くことができる。また、制御線や情報線は説明上必要と考えられるものを示しており、製品上必ずしもすべての制御線や情報線を示しているとは限らない。実際には殆どすべての構成が相互に接続されていると考えてもよい。 Each of the above-described configurations, functions, processing units, and the like may be realized by hardware by designing a part or all of them, for example, with an integrated circuit. Each of the above-described configurations, functions, and the like may be realized by software by interpreting and executing a program that realizes each function by the processor. Information such as programs, tables, and files for realizing each function can be stored in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card or an SD card. In addition, the control lines and information lines are those that are considered necessary for the explanation, and not all the control lines and information lines on the product are necessarily shown. In practice, it may be considered that almost all the components are connected to each other.

本発明はビデオ監視分野に関し、行動認識を行う場合であれば適用でき、例えば防犯や介護などの場合に適用できる。 The present invention relates to the field of video surveillance, and can be applied to cases where action recognition is performed. For example, the present invention can be applied to crime prevention or nursing care.

Claims

An image extraction device for extracting a specific image for performing action recognition of a target person in video data from video data derived from a plurality of video sources,
A person detection extraction unit that detects a target person from the video data and extracts all the images of the target person;
A key point extraction unit that performs key point extraction on the image of the target person extracted by the person detection extraction unit;
A site-of-interest identifying unit that identifies a site of interest of the target person;
A site-of-interest image extraction unit that extracts an image of the site of interest of the target person from the image of the target person based on the site of interest specified by the site-of-interest specifying unit;
An image determination output unit that determines whether each of the images of the target region extracted by the target region image extraction unit is the specific image, and outputs the determined specific image;
The image determination output unit determines whether the image of the target region is the specific image based on at least one of sharpness, the number of pixels, and the number of key points of the image of the target region.

The image extraction device according to claim 1,
further,
a person specifying unit that specifies which person the target person detected by the person detection extraction unit is, using a person database in which person data for uniquely identifying each person and the site of interest of that person are stored,
wherein, when the person specifying unit has specified which person in the person database the target person is, the site-of-interest specifying unit specifies the site of interest of the target person using the person database, based on the result output from the person specifying unit.

The image extraction device according to claim 2,
further,
Provided with a disease database storing key point information of images of diseased persons classified for each diseased part,
wherein, when the person specifying unit determines that the target person does not exist in the person database, the site-of-interest specifying unit compares the key point information extracted by the key point extraction unit with the disease database to determine the site of interest of the target person.

The image extraction device according to any one of claims 1 to 3,
wherein the image determination output unit determines the video source from which each specific image is derived, assigns priorities to the plurality of video sources based on the number of specific images from each video source, and further narrows down the specific images based on the priorities of the video sources.

The image extraction device according to any one of claims 1 to 3,
wherein the image determination output unit trains, by a machine learning method, a sharpness determination model for each target region using a sample data set, inputs the image of the target region of the target person to the corresponding model, and determines the sharpness of the image of the target region from the result.

An image extraction method for extracting a specific image for performing action recognition of a target person in video data from video data derived from a plurality of video sources,
A person detection extraction step of detecting a target person from the video data and extracting all the images of the target person;
A key point extraction step for performing key point extraction on the image of the target person extracted in the person detection extraction step;
An attention site specifying step for specifying an attention site of the target person;
An attention site image extracting step for extracting an image of the attention site of the target person from the image of the target person based on the attention site identified in the attention site identification step;
An image determination output step of determining whether each of the images of the target region extracted in the target region image extraction step is the specific image, and outputting the determined specific image;
wherein, in the image determination output step, whether the image of the target region is the specific image is determined based on at least one of the sharpness, the number of pixels, and the number of key points of the image of the target region.

The image extraction method according to claim 6, comprising:
further,
a person specifying step of specifying which person the target person detected in the person detection extraction step is, using a person database in which person data for uniquely identifying each person and the attention site of that person are stored,
wherein, when the person specifying step has specified which person in the person database the target person is, the attention site of the target person is specified in the attention site specifying step using the person database, based on the result output in the person specifying step.

The image extraction method according to claim 7, comprising:
wherein, when it is determined in the person specifying step that the target person does not exist in the person database,
in the attention site specifying step, the key point information extracted in the key point extraction step is collated with a disease database storing key point information of images of diseased persons classified for each diseased site, to determine the attention site of the target person.

The image extraction method according to any one of claims 6 to 8, comprising:
wherein, in the image determination output step, the video source from which each specific image is derived is determined, priorities are assigned to the plurality of video sources based on the number of specific images from each video source, and the specific images are further narrowed down based on the priorities of the video sources.

The image extraction method according to any one of claims 6 to 8, comprising:
wherein, in the image determination output step, a sharpness determination model is trained for each target region by a machine learning method using a sample data set, the image of the target region of the target person is input to the corresponding model, and the sharpness of the image of the target region is determined from the result.

A program executable on a computer,
the program causing the computer to execute the image extraction method according to any one of claims 6 to 10.

A behavior analysis system comprising:
the image extraction device according to any one of claims 1 to 5;
a behavior feature analysis unit that performs behavior recognition of the target person using the specific image output from the image extraction device; and
a notification unit that outputs an analysis result of the behavior feature analysis unit.