JP6773829B2 - Object recognition device, object recognition method, and object recognition program

Info

Publication number: JP6773829B2
Application number: JP2019029747A
Authority: JP
Inventors: 中村 友彦
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2019-02-21
Filing date: 2019-02-21
Publication date: 2020-10-21
Anticipated expiration: 2039-02-21
Also published as: JP2020135551A

Description

The present invention relates to a technique for recognizing an object by detecting parts of a predetermined object from measurement data such as an image.

Methods that use machine learning to detect multiple parts of a person appearing in a captured image are being actively researched.

For example, in the technique described in Non-Patent Document 1 below, a model is trained by deep learning with a large number of training images showing people as input and, as the target output, annotations that record the type and position of each person's parts in those training images. A captured image is then input to the trained model, which outputs the type and position of the parts of the people in the captured image. The annotations are created for the parts that appear in the training images. The per-part information recorded in the annotations, and the per-part information output by the trained model, is referred to as keypoints.

If the parts needed for the various kinds of recognition of a person can be detected, then in addition to recognizing the person's posture it becomes possible to recognize the region where the person is present, to recognize an attribute such as whether the person is an adult or a child based on body proportions, and so on.

Non-Patent Document 1: Z. Cao, T. Simon, S. Wei and Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1302-1310.

However, the prior art estimates parts that do not appear in the captured image with low accuracy, so when occlusion is present it becomes difficult to recognize the posture, presence region, attributes, and the like of the object.

For example, suppose an image in which a person's waist area is hidden by an object such as a table, with the upper body captured above the tabletop and the legs below it, is input to a trained model generated by the prior art. The result is one of the following: neither the upper-body keypoints nor the leg keypoints are detected; only one of the two groups is detected; or the upper-body and leg keypoints are detected separately (that is, the upper body and the legs are not detected as parts of the same person).

Consequently, when the presence region of a person is recognized from such a detection result, the result is no presence region, a presence region for one person, or presence regions for two people, and accurate recognition is difficult. Moreover, because only the positions of some of the person's parts can be identified from a single presence region, recognizing posture and attributes is also difficult.

Thus, because the prior art learns the relationship between a training image and the parts that appear in that image, it has difficulty detecting parts that do not appear in a captured image. As a result, occlusion can make it difficult for the prior art to recognize posture, presence region, attributes, and the like.

This problem arises not only with two-dimensional measurement data (images) but also with three-dimensional measurement data, and likewise with time series of two-dimensional or three-dimensional measurement data.

The present invention has been made in view of the above problem. Its object is to provide an object recognition device, an object recognition method, and an object recognition program that can recognize an object by complementing occluded parts even when the object is measured with some of its parts occluded.

(1) An object recognition device according to the present invention is a device that detects, from measurement data in which a predetermined object is measured, a plurality of detection-required parts constituting the object. The device comprises: complementer storage means storing a complementer generated in advance by learning that uses assigned data, which includes the positions of the detection-required parts assigned to each of a plurality of samples of the object, with degraded data obtained by removing some of the positions from the assigned data as input and the original assigned data as the target output; part detection means that detects, from the measurement data, one or more of the positions of the detection-required parts of the object and thereby acquires detection data for the object; and part complementing means that inputs the detection data to the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data.

(2) Another object recognition device according to the present invention is a device that detects, from measurement data, an object region in which a predetermined object is measured, based on detection-required parts, which are a plurality of parts constituting the object that are defined as required for detecting the object region. The device comprises: complementer storage means storing a complementer generated in advance by learning that uses training assigned data, which includes the positions of the detection-required parts assigned to each of a plurality of samples of the object in training data, with degraded data obtained by removing some of the positions from the training assigned data as input and the original training assigned data as the target output; part detection means that detects, from the measurement data, one or more of the positions of the detection-required parts of the object and thereby acquires detection data for the object; part complementing means that inputs the detection data to the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data; and object region detection means that detects, as the object region in the measurement data, a predetermined range defined relative to the positions of the detection-required parts indicated by the complemented detection data.

(3) Yet another object recognition device according to the present invention is a device that estimates whether a person measured in measurement data is a child, based on detection-required parts, which are a plurality of parts constituting a person (the object) that are defined as required for the estimation. The device comprises: complementer storage means storing a complementer generated in advance by learning that uses assigned data, which includes the positions of the detection-required parts assigned to each of a plurality of samples of the object, with degraded data obtained by removing some of the positions from the assigned data as input and the original assigned data as the target output; part detection means that detects, from the measurement data, one or more of the positions of the detection-required parts of the object and thereby acquires detection data for the object; part complementing means that inputs the detection data to the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data; and child estimation means that compares a ratio of distances between the detection-required parts in the complemented detection data with a predetermined criterion to estimate whether the person measured in the measurement data is a child.

(4) In the object recognition device according to any of (1) to (3) above, the part complementing means may be configured as follows. When the part detection means acquires detection data for two or more objects from a single item of measurement data, the part complementing means estimates a presence region for each object from the positions of the detection-required parts output by the complementer. When the degree of overlap between the presence regions for a plurality of items of detection data is equal to or greater than a predetermined reference value, it treats those items as originating from the same object and generates complemented detection data that integrates them.

(5) In the object recognition device according to (4) above, the part complementing means may be configured so that, when the degree of overlap between the presence regions for the plurality of items of detection data is equal to or greater than the predetermined reference value, it generates the complemented detection data by integrating into one the positions of the same part obtained by the complementer for each of those items of detection data.

(6) In the object recognition device according to (4) above, the part complementing means may be configured so that, when the degree of overlap between the presence regions for the plurality of items of detection data is equal to or greater than the predetermined reference value, it generates the complemented detection data by integrating those items of detection data and inputting the integrated data to the complementer.

(7) An object recognition method according to the present invention is a method of detecting, from measurement data in which a predetermined object is measured, a plurality of detection-required parts constituting the object. The method comprises: a step of preparing a complementer generated in advance by learning that uses assigned data, which includes the positions of the detection-required parts assigned to each of a plurality of samples of the object, with degraded data obtained by removing some of the positions from the assigned data as input and the original assigned data as the target output; a part detection step of detecting, from the measurement data, one or more of the positions of the detection-required parts of the object and thereby acquiring detection data for the object; and a part complementing step of inputting the detection data to the complementer and complementing the positions of the detection-required parts missing from the detection data to generate complemented detection data.

(8) An object recognition program according to the present invention is a program that causes a computer to perform processing for detecting, from measurement data in which a predetermined object is measured, a plurality of detection-required parts constituting the object. The program causes the computer to function as: complementer storage means storing a complementer generated in advance by learning that uses assigned data, which includes the positions of the detection-required parts assigned to each of a plurality of samples of the object, with degraded data obtained by removing some of the positions from the assigned data as input and the original assigned data as the target output; part detection means that detects, from the measurement data, one or more of the positions of the detection-required parts of the object and thereby acquires detection data for the object; and part complementing means that inputs the detection data to the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data.
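The overlap test described in configuration (4) can be illustrated with a minimal sketch. The text does not say how the degree of overlap is computed, so the sketch assumes axis-aligned rectangular presence regions and uses intersection over union as the measure; the names `Box`, `overlap_degree`, and `same_object`, and the reference value 0.5, are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned presence region (an assumed representation)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlap_degree(a: Box, b: Box) -> float:
    """Intersection over union of two regions (assumed overlap measure)."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the regions do not intersect
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    return inter / (area_a + area_b - inter)


def same_object(a: Box, b: Box, reference_value: float = 0.5) -> bool:
    """True when two presence regions overlap enough to be merged.

    reference_value is a hypothetical threshold; the text only says it is
    predetermined.
    """
    return overlap_degree(a, b) >= reference_value
```

With such a test, two items of detection data whose complemented presence regions overlap strongly, for instance an upper body and a pair of legs separated by a tabletop, would be treated as one person and integrated.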

According to the object recognition device, object recognition method, and object recognition program of the present invention, an object can be recognized by complementing occluded parts even when the object is measured with some of its parts occluded.

FIG. 1 is a diagram showing the schematic configuration of an object recognition device according to an embodiment of the present invention.
FIG. 2 is a schematic functional block diagram of the learning stage of the object recognition device according to the embodiment of the present invention.
FIG. 3 is a schematic diagram illustrating an example of assigned data.
FIG. 4 is a schematic functional block diagram of the recognition stage of the object recognition device according to the embodiment of the present invention.
FIG. 5 is a schematic diagram showing an example of a captured image and detection data.
FIG. 6 is a schematic diagram outlining the integration of detection data for the person on the right side of FIG. 5.
FIG. 7 is a flowchart of the learning stage of the object recognition device according to the embodiment of the present invention.
FIG. 8 is a flowchart of the recognition stage of the object recognition device according to the embodiment of the present invention.
FIG. 9 is a schematic flowchart of the duplicate detection determination processing performed by the keypoint complementing means.
FIG. 10 is a flowchart of an example of the recognition stage of the object recognition device according to the embodiment of the present invention in which the integration processing differs.

The object recognition device 1, an embodiment of the present invention (hereinafter, the embodiment), is described below with reference to the drawings. The object recognition device according to the present invention detects, from measurement data, the positions of a plurality of parts constituting a predetermined object and, based on the result, determines the region of the object in the measurement data (the object region) and its attributes (object attributes). The object recognition device 1 presented as an example in this embodiment detects the parts and region of a person appearing in a monitored space from a captured image of that space, and further estimates whether the person is a child. That is, in this embodiment the measurement data is a two-dimensional image and the object is a person. The object recognition device 1 detects the positions of a plurality of parts constituting a person in the two-dimensional image, detects a region enclosing the parts, and estimates whether the person is a child based on the ratio of distances between parts.
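The two recognition steps named above, detecting a region enclosing the parts and estimating from a distance ratio whether the person is a child, can be sketched as follows. This is only an illustration under stated assumptions: the keypoint names `head_top`, `neck`, and `ankle`, the margin, the choice of head length over body length as the ratio, and the threshold 0.25 are all hypothetical and do not come from the text.

```python
from math import dist
from typing import Dict, Tuple

Point = Tuple[float, float]


def object_region(keypoints: Dict[str, Point],
                  margin: float = 10.0) -> Tuple[float, float, float, float]:
    """Rectangle enclosing all keypoints, expanded by a hypothetical margin.

    Returns (x_min, y_min, x_max, y_max).
    """
    xs = [p[0] for p in keypoints.values()]
    ys = [p[1] for p in keypoints.values()]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


def is_child(keypoints: Dict[str, Point],
             ratio_threshold: float = 0.25) -> bool:
    """Compare a ratio of inter-part distances with a fixed criterion.

    Assumed ratio: head length (head_top to neck) over body length
    (neck to ankle). The threshold is an arbitrary illustrative value.
    """
    head = dist(keypoints["head_top"], keypoints["neck"])
    body = dist(keypoints["neck"], keypoints["ankle"])
    if body == 0:
        raise ValueError("neck and ankle coincide; ratio is undefined")
    return head / body >= ratio_threshold
```

Both functions expect complemented keypoints, so they remain usable when some parts were occluded in the image and were filled in afterwards.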

The parts used for the object recognition described above are called detection-required parts, and the representative point of a detection-required part is called a keypoint. Keypoint information is expressed at least as a combination of the type and position of the corresponding part, and data containing this combination is called part data. Detecting each keypoint detects the position of the corresponding detection-required part. The types of parts treated as detection-required parts are determined in advance according to the object and the purpose of recognition.

In particular, the object recognition device 1 stores a complementer that complements hidden parts, trained using annotations (assigned data) for the parts appearing in training images. Here, assigned data is part data assigned to an object appearing in training measurement data, to a three-dimensional model of an object, or the like. The object recognition device 1 detects detection-required parts from a captured image, for example with a conventional detector, to generate detection data, and uses the complementer to fill in the part data of detection-required parts that the detector missed. The object recognition device 1 then uses the detection data after complementation (complemented detection data) to detect the object region and determine the object attributes.

[Configuration of object recognition device 1]
FIG. 1 is a block diagram showing the schematic configuration of the object recognition device 1. The object recognition device 1 comprises a photographing unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and an output unit 6.

The photographing unit 2 is a measuring unit that acquires measurement data; in this embodiment it is a surveillance camera. The photographing unit 2 is connected to the image processing unit 5 via the communication unit 3. It photographs the monitored space at predetermined time intervals to generate captured images and inputs them sequentially to the image processing unit 5. For example, the photographing unit 2 is mounted on a pole installed in a corner of an event venue, which is the monitored space, with a predetermined fixed field of view overlooking that space, and photographs the monitored space with a frame period of 1 second to generate color images. The photographing unit 2 may generate monochrome images instead of color images.

The communication unit 3 is a communication circuit. One end is connected to the image processing unit 5, and the other end is connected to the photographing unit 2 and the output unit 6. The communication unit 3 acquires captured images from the photographing unit 2 and inputs them to the image processing unit 5, and it receives the object recognition result from the image processing unit 5 and outputs it to the output unit 6.

The photographing unit 2, communication unit 3, storage unit 4, image processing unit 5, and output unit 6 are connected in whatever form suits the installation location of each unit. For example, when the photographing unit 2 is installed remotely from the communication unit 3 and the image processing unit 5, the photographing unit 2 and the communication unit 3 can be connected over an Internet line. The communication unit 3 and the image processing unit 5 can be connected by a bus. A LAN (Local Area Network), various cables, and the like can also be used as connection means.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) and stores various programs and data. For example, the storage unit 4 stores training images, the assigned data for the training images, and information on the detector and the complementer, which are trained models. The storage unit 4 is connected to the image processing unit 5 and exchanges this information with it. That is, the information needed to recognize an object and the information produced during recognition processing are passed between the storage unit 4 and the image processing unit 5.

The image processing unit 5 is a measurement data processing unit that processes measurement data. It consists of an arithmetic device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), MCU (Micro Control Unit), or GPU (Graphics Processing Unit). The image processing unit 5 operates as various processing and control means by reading programs from the storage unit 4 and executing them; as needed, it reads data from the storage unit 4 and stores generated data in the storage unit 4. For example, the image processing unit 5 generates the detector and the complementer by learning and stores them in the storage unit 4 via the communication unit 3. The image processing unit 5 also uses the detector and the complementer to recognize objects in captured images.

The output unit 6 is a liquid crystal display, an organic EL (Electro-Luminescence) display, or the like, and displays the recognition result input from the communication unit 3. A monitoring operator judges from the displayed recognition result whether a response is needed and, if so, takes action such as dispatching response personnel to the scene.

The configuration of the object recognition device 1 is described below, first for the learning stage, in which the detector and the complementer are trained, and then for the recognition stage, in which the detector and the complementer are used to recognize objects.

[Configuration of object recognition device 1 for the learning stage]
FIG. 2 is a schematic functional block diagram of the object recognition device 1 for the learning stage. The storage unit 4 functions as training data storage means 40, detector storage means 41, and complementer storage means 42, and the image processing unit 5 functions as detector learning means 50, degraded data generation means 51, and complementer learning means 52.

The training data storage means 40 is training image storage means that stores a large number of training images in advance. It is also assigned data storage means that stores in advance, as assigned data, the part data assigned to the people photographed in those training images. The training data storage means 40 holds each training image in association with the assigned data of each person photographed in it (hereinafter, a sample; different people are different samples, and the same person in different images is also a different sample). Specifically, each sample is given a sample ID that distinguishes it from the others, each training image is given an image ID, and the training data storage means 40 stores the correspondence between these IDs. A training image need not be a real image actually taken by a camera; it may be, for example, an image created with computer graphics (CG). The assigned data includes type and position information for each keypoint of each sample. For a keypoint whose position is unknown, the information can indicate that fact. In other words, the assigned data shows, for each keypoint type of each sample, whether a position has been assigned to the keypoint and, if so, what that position is. The assigned data may be created manually, may be created by having a person check machine-extracted data and correct it as needed, or may be a mixture of the two.

FIG. 3 is a schematic diagram illustrating an example of assigned data. FIG. 3(a) diagrams the keypoint topology of the object for the case of 17 detection-required parts. It is drawn with 17 white circles representing the keypoint positions and 16 line segments representing the connections between keypoints. FIG. 3(b) shows an example in which the assigned data is defined as a database in table form. Each row of the table is the assigned data record for one sample. In each record, the index n (n = 1, 2, ..., N) representing the sample ID is stored first (at the left), followed by 17 sets of three values representing keypoint information, stored in ascending order of the index i (i = 1, 2, ..., 17) corresponding to the keypoint type.

Each set of three values consists of the x coordinate x_{n,i} and the y coordinate y_{n,i} of the keypoint, and a flag v_{n,i} (the assignment flag) indicating whether the keypoint is present or missing. The assignment flag v_{n,i} is set to "1" if coordinates have been assigned and to "0" if they have not. Within each set, the three values are stored in the order x_{n,i}, y_{n,i}, v_{n,i}.
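The record layout of FIG. 3(b) can be made concrete with a short sketch that unpacks one flat record into a sample ID and 17 (x, y, v) triples. The flat-list representation and the names `Keypoint` and `parse_record` are assumptions made for illustration; the text only specifies the order of the stored values.

```python
from typing import List, NamedTuple, Sequence, Tuple

NUM_KEYPOINTS = 17  # number of detection-required parts in the example


class Keypoint(NamedTuple):
    x: float  # x coordinate x_{n,i}
    y: float  # y coordinate y_{n,i}
    v: int    # assignment flag v_{n,i}: 1 = assigned, 0 = not assigned


def parse_record(record: Sequence[float]) -> Tuple[int, List[Keypoint]]:
    """Split [n, x_1, y_1, v_1, ..., x_17, y_17, v_17] into its parts."""
    expected = 1 + 3 * NUM_KEYPOINTS
    if len(record) != expected:
        raise ValueError(f"expected {expected} values, got {len(record)}")
    sample_id = int(record[0])
    keypoints = [
        Keypoint(float(record[1 + 3 * i]),
                 float(record[2 + 3 * i]),
                 int(record[3 + 3 * i]))
        for i in range(NUM_KEYPOINTS)
    ]
    return sample_id, keypoints
```

Note that the list returned here is 0-based, whereas the keypoint index i in the text runs from 1 to 17.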

Because keypoint positions are expressed as positions relative to the image, the degraded data generation means 51 normalizes the keypoint positions in the assigned data before generating degraded data. For example, normalization can be performed by translating the keypoints of each sample whose assignment flag is 1 into a coordinate system whose origin is the midpoint of the keypoints corresponding to the sample's two shoulders. In that case, a sample in which the assignment flag of either the right or the left shoulder is 0 cannot be normalized; such samples can simply be excluded from the training of the complementer.
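The shoulder-midpoint normalization described above can be sketched as follows. Keypoints are represented as (x, y, v) tuples, and the 0-based shoulder indices 5 and 6 are hypothetical defaults, since the text does not say which keypoint index corresponds to which part. Returning `None` for a sample that cannot be normalized is likewise an illustrative choice.

```python
from typing import List, Optional, Tuple

KeypointTuple = Tuple[float, float, int]  # (x, y, assignment flag)


def normalize(keypoints: List[KeypointTuple],
              left_shoulder: int = 5,
              right_shoulder: int = 6) -> Optional[List[KeypointTuple]]:
    """Translate assigned keypoints so the shoulder midpoint is the origin.

    Returns None when either shoulder is missing, so the caller can
    exclude the sample from complementer training.
    The shoulder indices are assumptions, not taken from the text.
    """
    lx, ly, lv = keypoints[left_shoulder]
    rx, ry, rv = keypoints[right_shoulder]
    if lv == 0 or rv == 0:
        return None
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0
    # Only keypoints with flag 1 are moved; missing ones stay as they are.
    return [(x - cx, y - cy, v) if v == 1 else (x, y, v)
            for x, y, v in keypoints]
```

This sketch only translates the coordinates, which is the one method the text gives as an example; scale normalization is not described there and is not attempted here.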

The detector learning means 50 generates a trained model to serve as the detector, by learning with a learning image as the input and the assigned data corresponding to that learning image as the target output value.

Any means may be used as the detector as long as it takes an image as input and outputs one or more parts to be detected. In this embodiment, the method proposed in Non-Patent Document 1 is used as the detector, and it is trained in advance on the learning data. Specifically, in this embodiment the detector is modeled as a CNN (Convolutional Neural Network).

The detector storage means 41 stores, as the detector, information including the filter coefficients of the filters constituting the CNN and the network structure.

Various other known detectors can also be applied as the detector in the object recognition device 1, for example a detector that identifies key point positions based on posture estimation using silhouette matching of the object. When a detector based on silhouette matching is applied, the detector storage means 41 also stores the background image, posture patterns, and other data needed for detection.

The degraded data generation means 51 reads the assigned data from the learning data storage means 40 and creates degraded data by dropping some of the key points of each sample in the assigned data. It then outputs the assigned data and the degraded data as a set to the complementer learning means 52. For example, the degraded data generation means 51 can create degraded data by selecting the key points to drop, either randomly or according to a rule, and replacing the positions of the selected key points with an unknown value. Specifically, it replaces the x-coordinate, y-coordinate, and assignment flag of each selected key point with 0. However, the degraded data generation means 51 leaves at least a predetermined required number of key points whose assignment flag is 1.

That is, the degraded data generation means 51 reads from the learning data storage means 40 assigned data in which more than the required number of key point positions are assigned, drops one or more positions from that assigned data, and generates degraded data that still contains at least the required number of positions. Assigned data in which the number of key points with an assignment flag of 1 is less than or equal to the required number is not used for learning the complementer. In this embodiment, the required number is one.
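A minimal sketch of the degradation step, assuming random selection. The function name `make_degraded` and the choice to draw the number of dropped key points uniformly are assumptions for illustration; the text only requires that one or more positions be dropped and that at least the required number remain.

```python
import random
from typing import List, Optional, Tuple

KeyPoint = Tuple[float, float, int]  # (x, y, assignment flag)
REQUIRED_COUNT = 1  # required number in this embodiment


def make_degraded(
    kps: List[KeyPoint], rng: random.Random
) -> Optional[List[KeyPoint]]:
    """Drop a random subset of assigned key points, keeping REQUIRED_COUNT."""
    present = [i for i, (_, _, v) in enumerate(kps) if v == 1]
    if len(present) <= REQUIRED_COUNT:
        return None  # sample is not used for learning the complementer
    # Drop at least one key point while leaving at least REQUIRED_COUNT.
    n_drop = rng.randint(1, len(present) - REQUIRED_COUNT)
    dropped = set(rng.sample(present, n_drop))
    # Dropped entries get x, y, and flag all replaced with 0.
    return [(0.0, 0.0, 0) if i in dropped else kp for i, kp in enumerate(kps)]
```

Each call yields one (assigned data, degraded data) pair; calling it repeatedly on the same sample produces different degradation patterns.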

In this embodiment, so that the postures of people appearing in fisheye-lens images, omnidirectional (spherical) images, and the like can also be complemented properly, each sample is rotated by a random angle about the point whose xy coordinates are (0, 0) and then used for learning the complementer. That is, the degraded data generation means 51 applies rotation to the normalized assigned data to transform the xy coordinates of the key points, and then generates the degraded data.
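The rotation about the origin can be sketched as below. The function names are hypothetical, and drawing the angle uniformly from [0, 2π) is an assumption; the text says only that the angle is random.

```python
import math
import random
from typing import List, Tuple

KeyPoint = Tuple[float, float, int]  # (x, y, assignment flag)


def rotate_sample(kps: List[KeyPoint], angle_rad: float) -> List[KeyPoint]:
    """Rotate assigned key points about (0, 0); missing ones are untouched."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [
        (c * x - s * y, s * x + c * y, v) if v == 1 else (x, y, v)
        for x, y, v in kps
    ]


def random_rotation(kps: List[KeyPoint], rng: random.Random) -> List[KeyPoint]:
    """Rotate a normalized sample by an angle drawn uniformly from [0, 2*pi)."""
    return rotate_sample(kps, rng.uniform(0.0, 2.0 * math.pi))
```

Because the samples are normalized first, the rotation center (0, 0) coincides with the shoulder midpoint.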

The complementer learning means 52 learns the complementer using the pairs of assigned data and degraded data input from the degraded data generation means 51. That is, it generates the complementer by learning with the degraded data as the input and the assigned data as the target output value. Learning here means determining the parameters of the complementer.

In this embodiment, the complementer is modeled as a variational autoencoder (VAE). The VAE is composed of linear transformations, activation functions, and so on; here, the ReLU function is used as the activation function. The complementer learning means 52 learns the parameters of each element of the VAE so as to minimize an error function. As the error function, for example, the squared error between the key point coordinates obtained by inputting the degraded data into the complementer and the key point coordinates of the assigned data is used. Key points whose assignment flag in the assigned data is 0 are not included in the error function. A method such as stochastic steepest descent (stochastic gradient descent) is used for the minimization.
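The masking of unassigned key points in the error function can be illustrated with the sketch below. It covers only the squared-error term described above and is not a VAE implementation; the name `masked_squared_error` and the list-based data layout are assumptions.

```python
from typing import List, Tuple


def masked_squared_error(
    pred_xy: List[Tuple[float, float]],
    target: List[Tuple[float, float, int]],
) -> float:
    """Sum of squared coordinate errors over key points whose flag is 1.

    pred_xy: coordinates output by the complementer for the degraded input.
    target:  (x, y, assignment flag) triples from the assigned data.
    """
    total = 0.0
    for (px, py), (tx, ty, v) in zip(pred_xy, target):
        if v == 1:  # key points with flag 0 contribute nothing
            total += (px - tx) ** 2 + (py - ty) ** 2
    return total
```

Excluding flag-0 key points keeps the placeholder (0, 0) coordinates in the assigned data from being learned as if they were real positions.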

The complementer storage means 42 stores the complementer parameters obtained by the complementer learning means 52. It also stores the structure of the VAE used as the complementer.

[Configuration of the object recognition device 1 for the recognition stage]
FIG. 4 is a schematic functional block diagram of the object recognition device 1 for the recognition stage. The storage unit 4 functions as the detector storage means 41 and the complementer storage means 42; the image processing unit 5 functions as the key point detecting means 53, the key point complementing means 54, the object area detecting means 55, and the object attribute estimating means 56; and the communication unit 3, in cooperation with the image processing unit 5, functions as the captured image acquisition means 30 and the recognition result output means 31.

The captured image acquisition means 30 sequentially acquires captured images from the photographing unit 2 and outputs them to the image processing unit 5.

As described above, the detector storage means 41 stores the detector generated in the learning stage.

The key point detecting means 53 (part detecting means) uses the detector stored in the detector storage means 41, takes as input the captured images sequentially acquired from the photographing unit 2 (the photographing means), and detects, person by person, the key points of the persons appearing in each image. When a detector based on silhouette matching is applied, this involves processing such as generating a background image from past captured images and extracting silhouettes by background subtraction. In the learning stage the assigned data of each sample is identified by a sample ID; in the recognition stage, an object ID is assigned to the detection data of each detected person to identify it.

As described above, the complementer storage means 42 stores the complementer generated in the learning stage.

The key point complementing means 54 uses the trained complementer stored in the complementer storage means 42 to complement each person's detection data input from the key point detecting means 53. The result is referred to as complemented detection data. In other words, the key point complementing means 54 is a part complementing means that inputs the detection data of each object detected from the captured image by the key point detecting means 53 into the complementer, complements it, and thereby generates complemented detection data for each object. The key point complementing means 54 thus yields coordinates for all key points of each person.

Specifically, when the key point complementing means 54 receives, as detection data, the part and position of each key point detected by the key point detecting means 53, it defines for each part, as position information, a triple of values consisting of the coordinates (x_{n,i} and y_{n,i}) and a state value concerning the position. That is, in the recognition stage a state value is assigned to each part of the detection data in place of the assignment flag used in the learning stage. Compared with the assigned-data record described above, the detection data can take a format in which an index representing the object ID is stored in place of the sample ID, and in which, of the three values stored for each part in the order x_{n,i}, y_{n,i}, v_{n,i}, the state value is stored in v_{n,i}.

The state value is set to "1" for a part whose position was detected by the key point detecting means 53 and to "0" for a part that was not detected, and it is set to "2" for a part whose x-coordinate x_{n,i} and y-coordinate y_{n,i} were set by the complementing process. That is, the position of a key point that was missing on input to the key point complementing means 54 is complemented, and complemented detection data is generated that stores, for that key point, position information consisting of the calculated coordinates and the state value "2". For key points already detected at input, this embodiment uses the input values as the position information in the complemented detection data, but the coordinates output by the complementer may be used instead. For parts that were not detected, x_{n,i} and y_{n,i} are set to 0 as information representing an unknown value, as in the degraded data.

The key point complementing means 54 supplies the generated complemented detection data, together with the object ID of the detection data before complementing, to the object area detecting means 55 and the object attribute estimating means 56.

Because the key point positions in the detection data before complementing are expressed as relative positions on the image, the key point complementing means 54 applies to the detection data the same normalization as the degraded data generation means 51 before performing the complementing process. It then undoes, for the positions obtained by the complementing process, the translation applied by the normalization, and uses the results as the key point positions of the complemented detection data. As noted above, some objects cannot be normalized; for such an object, the detection data before complementing is output as is as the complemented detection data.
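The normalize, complement, and restore sequence, together with the state values "0", "1", and "2", can be sketched as follows. The `complementer` argument stands in for the trained complementer and is assumed to be any callable that maps normalized (x, y, state) triples to a list of (x, y) predictions; the function name and the shoulder indices `l_sh` and `r_sh` are hypothetical.

```python
from typing import Callable, List, Tuple

KeyPoint = Tuple[float, float, int]  # (x, y, state value)
Complementer = Callable[[List[KeyPoint]], List[Tuple[float, float]]]


def complement_with_normalization(
    kps: List[KeyPoint],
    complementer: Complementer,
    l_sh: int = 5,  # hypothetical shoulder indices
    r_sh: int = 6,
) -> List[KeyPoint]:
    """Return complemented detection data for one object."""
    ls, rs = kps[l_sh], kps[r_sh]
    if ls[2] != 1 or rs[2] != 1:
        return list(kps)  # cannot normalize: output the input as is
    ox, oy = (ls[0] + rs[0]) / 2.0, (ls[1] + rs[1]) / 2.0
    normalized = [
        (x - ox, y - oy, s) if s == 1 else (0.0, 0.0, 0) for x, y, s in kps
    ]
    predicted = complementer(normalized)
    out = []
    for (x, y, s), (px, py) in zip(kps, predicted):
        if s == 1:
            out.append((x, y, 1))  # keep the detected position
        else:
            out.append((px + ox, py + oy, 2))  # undo the translation
    return out
```

This sketch keeps detected positions unchanged, matching the choice made in this embodiment; replacing them with the complementer output would be the alternative mentioned in the text.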

In addition, the key point complementing means 54 may perform integration processing on the complemented detection data. The purpose of the integration processing is explained with reference to FIG. 5, which is a schematic diagram showing an example of a captured image and detection data. FIG. 5(a) shows the captured image with a schematic representation of the detection data overlaid. The image shows two persons 60 and 61, and the detected key points are drawn as white circles. FIG. 5(b) shows the detection data for persons 60 and 61 in the table format of FIG. 3(b). In the detection data of FIG. 5(b), the state values described above have been assigned by the key point complementing means 54, and for parts that were not detected, all three values (the coordinates and the state value) are set to 0.

In the example of FIG. 5, all 17 key points "1" to "17" are detected for person 60, and rectangle 62 in FIG. 5(a) is the circumscribed rectangle of that key point group. The whole body of person 60 is detected as one unit, yielding a single piece of detection data (object ID = 001). Person 61, on the other hand, is occluded around the waist by a desk, so the two key points "12" and "13" located at the waist cannot be detected. As a result, the detector has detected the upper-body key points "1" to "11" and the leg key points "14" to "17" of person 61 as separate objects. Rectangle 63 in FIG. 5(a) is the circumscribed rectangle of the key point group detected from the upper body, and rectangle 64 is that of the key point group detected from the legs. Correspondingly, two pieces of detection data are obtained for person 61: one from rectangle 63 (object ID = 002) and one from rectangle 64 (object ID = 003).

The integration processing combines multiple pieces of detection data detected from a single person, as with person 61 in this example, into one piece of detection data. For example, when the key point detecting means 53 has obtained detection data for two or more objects from one captured image, the key point complementing means 54 estimates, in the integration processing, the existence area of each object from the key point positions output by the complementer. When the degree of overlap between the existence areas of multiple pieces of detection data is at or above a predetermined reference value, it treats them as originating from the same object and generates integrated complemented detection data for them. The integration described later is performed when the overlap determination finds that the existence areas of the parts overlap enough to be regarded as the same object. In the example of FIG. 5, the detection data with object ID 001 is not integrated, because its degree of overlap with the detection data of object IDs 002 and 003 is below the reference value, whereas the two pieces of detection data with object IDs 002 and 003 have a degree of overlap at or above the reference value and are integrated.

FIG. 6 is a schematic diagram outlining the integration of the two pieces of detection data for person 61 in FIG. 5. FIGS. 6(a) to 6(c) each show the key points of complemented detection data and their circumscribed rectangles 70 to 72 overlaid on the captured image: FIG. 6(a) shows the complementing result for the detection data from rectangle 63 of FIG. 5 (object ID = 002), FIG. 6(b) shows the complementing result for the detection data from rectangle 64 of FIG. 5 (object ID = 003), and FIG. 6(c) shows the result of integrating those two complementing results. As in FIG. 5, white circles represent detected key points; black circles represent complemented key points. In FIG. 6(a), key points are added by complementing, giving circumscribed rectangle 70, which is enlarged from rectangle 63. In FIG. 6(b), normalization based on the shoulder key points could not be performed, so no complementing took place and the original detection data became the complemented detection data as is, giving circumscribed rectangle 71, which is identical to rectangle 64. In FIG. 6(c), the key points detected in the two areas of rectangles 63 and 64 are integrated, giving circumscribed rectangle 72, which fits person 61 well. FIG. 6(d) shows the detection data for persons 60 and 61 in table format. The data with object ID 001 is the detection data of person 60, as in FIG. 5(b). The detection data of person 61, which was split into object IDs 002 and 003 in FIG. 5(b), is integrated in FIG. 6(d) into one piece of detection data with object ID 002, and for key points "12" and "13" the coordinates obtained by the complementing process and the state value "2" are stored.

In the integration processing, the key point complementing means 54 calculates the degree of overlap of the complemented detection data and integrates pieces of complemented detection data whose degree of overlap is at or above a predetermined reference value. With a suitably designed degree of overlap, integration is not performed between detection data of different persons (for example, the detection data with object IDs 001 and 002 in FIG. 5(b)) but is performed between detection data of the same person that was detected in separate pieces (for example, the detection data with object IDs 002 and 003 in FIG. 5(b)).
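One possible design of the overlap determination and the merge is sketched below. The specific measure (intersection area divided by the area of the smaller circumscribed rectangle) and the merge priority (detected over complemented over missing) are assumptions made for illustration; this section does not define the degree of overlap or the per-key-point merge rule. All function names are hypothetical.

```python
from typing import List, Tuple

KeyPoint = Tuple[float, float, int]  # (x, y, state value)
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bounding_box(kps: List[KeyPoint]) -> Box:
    """Circumscribed rectangle of all key points whose state is not 0."""
    pts = [(x, y) for x, y, s in kps if s != 0]
    if not pts:
        raise ValueError("no key points available")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_degree(a: Box, b: Box) -> float:
    """Assumed measure: intersection area / area of the smaller box."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return (iw * ih) / smaller if smaller > 0 else 0.0


def integrate(a: List[KeyPoint], b: List[KeyPoint]) -> List[KeyPoint]:
    """Per key point, prefer detected (1), then complemented (2), then 0."""
    priority = {1: 2, 2: 1, 0: 0}
    return [
        ka if priority[ka[2]] >= priority[kb[2]] else kb
        for ka, kb in zip(a, b)
    ]
```

Dividing by the smaller area lets a small leg-only rectangle that lies inside a complemented whole-body rectangle score close to 1, which a plain intersection-over-union would not.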

The object area detecting means 55 receives the complemented detection data from the key point complementing means 54, detects as the object area in the measurement data a predetermined range based on the positions of the parts to be detected indicated by the complemented detection data, and outputs information on the detected object area to the recognition result output means 31. For example, it detects the circumscribed rectangle of the key point group contained in each person's complemented detection data as the person area, that is, the object area for that person. In the example of FIG. 5, rectangle 62 of person 60 is a person area.

Alternatively, the circumscribed rectangle may be enlarged by a predetermined ratio to form the person area. That is, when setting the person area, margins are added on the top, bottom, left, and right to allow for detection error and for the fact that key points are assigned or detected slightly inside the true area. The ratio may also differ for each of the four directions, depending on the definition of each key point and the estimated detection error of each key point.
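A minimal sketch of the person area with per-direction margins follows. Expressing each margin as a fraction of the rectangle's width or height is an assumption, as are the function name `person_area` and the default ratios.

```python
from typing import List, Optional, Tuple

KeyPoint = Tuple[float, float, int]  # (x, y, state value)
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def person_area(
    kps: List[KeyPoint],
    ratios: Tuple[float, float, float, float] = (0.1, 0.1, 0.1, 0.1),
) -> Optional[Box]:
    """Circumscribed rectangle enlarged by (left, top, right, bottom) ratios.

    Left and right margins are fractions of the rectangle width; top and
    bottom margins are fractions of its height. Image y grows downward.
    """
    pts = [(x, y) for x, y, s in kps if s != 0]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    w, h = x1 - x0, y1 - y0
    left, top, right, bottom = ratios
    return (x0 - left * w, y0 - top * h, x1 + right * w, y1 + bottom * h)
```

Passing `ratios=(0, 0, 0, 0)` gives the plain circumscribed rectangle; a larger top ratio could, for instance, compensate for a head key point that sits below the actual top of the head.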

Alternatively, each person's complemented detection data may be converted into that person's person area by inputting it into a converter that has learned the conversion from human part data to a person area. The converter is a trained model generated by learning with the assigned data of a sample of the object captured in a learning image as the input and the object area of that sample in the learning image as the target output value. The storage unit 4 functions as a converter storage means (not shown) and stores the converter in advance. The object area detecting means 55 reads the converter from the converter storage means and uses it.

As described above, the object area detecting means 55 detects, in the measurement data, the circumscribed rectangle of the detection data complemented by the key point complementing means 54 as the object area, or detects an object area obtained by further enlarging that circumscribed rectangle by a predetermined factor.

Alternatively, the object area detecting means 55 detects the object area in the measurement data by inputting the detection data complemented by the key point complementing means 54 into a converter generated by learning with the assigned data of a sample of the object captured in a learning image as the input and the object area of that sample in the learning image as the target output value.

Because key points hidden in the measurement data are filled in in the complemented detection data, two problems are greatly reduced: detection of an extremely small object area because of occlusion, and detection of the object area of a single object as several separate areas because of occlusion.

The object attribute estimating means 56 estimates an object attribute based on the complemented detection data input from the key point complementing means 54 and outputs information on the estimated object attribute to the recognition result output means 31. In this embodiment, the object attribute estimating means 56 is a child estimating means that compares a ratio of distances between key points in the complemented detection data with a predetermined reference to estimate whether the person captured in the measurement data is a child. For example, it calculates the head distance from the key point at the top of the head to the key point at the neck, calculates the whole-body distance by adding to the head distance the distance from the shoulder key point through the hip key point and the knee key point to the heel key point, and then calculates the ratio of the whole-body distance to the head distance. The combination of shoulder, hip, knee, and heel is either the right shoulder, right hip, right knee, and right heel, or the left shoulder, left hip, left knee, and left heel; alternatively, both the right and the left combinations are calculated and averaged. The calculated ratio is compared with a predetermined threshold: the person is estimated to be a child if the ratio is at or below the threshold and an adult if the ratio exceeds it. If any of the key points for the top of the head, shoulder, hip, knee, or heel is missing from the complemented detection data, the output indicates that the estimation result is unknown. This ratio is the so-called head-to-body ratio (body height measured in head lengths), and the estimation exploits the tendency of children to have a lower ratio than adults.
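The head-to-body ratio test can be sketched as below. The key point names used as dictionary keys, the threshold value 6.0, and the choice to average whichever sides are fully available are all assumptions for illustration; the text does not give a threshold or an index assignment.

```python
import math
from typing import Dict, Optional, Tuple

KeyPoint = Tuple[float, float, int]  # (x, y, state value)


def _dist(a: KeyPoint, b: KeyPoint) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _side_length(kps: Dict[str, KeyPoint], side: str) -> Optional[float]:
    """Path length shoulder -> hip -> knee -> heel for one body side."""
    names = [f"{side}_{part}" for part in ("shoulder", "hip", "knee", "heel")]
    pts = [kps.get(name) for name in names]
    if any(p is None or p[2] == 0 for p in pts):
        return None
    return sum(_dist(pts[i], pts[i + 1]) for i in range(3))


def estimate_child(kps: Dict[str, KeyPoint], threshold: float = 6.0) -> str:
    """Return "child", "adult", or "unknown" from the head-to-body ratio."""
    head, neck = kps.get("head_top"), kps.get("neck")
    if head is None or neck is None or head[2] == 0 or neck[2] == 0:
        return "unknown"
    head_len = _dist(head, neck)
    sides = [
        length
        for length in (_side_length(kps, "right"), _side_length(kps, "left"))
        if length is not None
    ]
    if not sides or head_len == 0:
        return "unknown"
    # Whole-body distance = head distance + shoulder-to-heel path length.
    body_len = head_len + sum(sides) / len(sides)
    ratio = body_len / head_len
    return "child" if ratio <= threshold else "adult"
```

Because the input is complemented detection data, key points with state value "2" count as available here, which is what allows the ratio to be computed under occlusion.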

Because key points hidden in the measurement data are filled in in the complemented detection data, the object attribute estimating means 56 can, by using the complemented detection data, greatly reduce failures to calculate the ratio caused by occlusion. Estimation using complemented detection data is particularly effective for the child estimation described above, because the neck key point of a child is often hidden.

The recognition result output means 31 outputs to the output unit 6 the object area detected by the object area detecting means 55 and the object attribute estimated by the object attribute estimating means 56. For example, the recognition result output means 31 draws a rectangle representing the object area on the captured image and, if the attribute is "child", also draws the text "child" near the rectangle, then outputs the resulting image to the output unit 6. If no object area is detected, the captured image may be output as is, with the recognition result being that no object is present.

[Operation of the object recognition device 1]
Next, the operation of the object recognition device 1 is described separately for the learning stage and the recognition stage.

[Operation of the object recognition device 1 in the learning stage]
FIG. 7 is a schematic flow chart of the operation of the object recognition device 1 in the learning stage.

Before the operation of recognizing objects appearing in captured images, the object recognition device 1 performs the operation of learning the detector and the complementer.

When the learning operation starts, the image processing unit 5 functions as the detector learning means 50 and reads, from the learning data storage means 40, the learning image and assigned data of each sample as the data used for learning the detector (step S100). The detector learning means 50 then performs detector learning with the learning image as the input and the assigned data corresponding to that learning image as the target output value (step S105), and stores the trained model generated by that learning in the detector storage means 41 as the detector (step S110).

To learn the complementer, the image processing unit 5 also functions as the degraded data generation means 51 and reads the assigned data from the learning data storage means 40 (step S115). The degraded data generation means 51 performs normalization and rotation on the assigned data (step S120) and then creates degraded data by dropping some of the key points (step S125).

The image processing unit 5 then functions as the complementer learning means 52. It receives from the degraded data generation means 51 the normalized and rotated assigned data and the degraded data generated from it, learns the complementer using that data (step S130), and stores the generated complementer in the complementer storage means 42 (step S135).

[Operation of the object recognition device 1 in the recognition stage]
FIG. 8 is a schematic flow chart of the operation of the object recognition device 1 in the recognition stage.

When the object recognition device 1 starts the recognition operation, the photographing unit 2 sequentially outputs images of the monitored space captured at predetermined intervals. In cooperation with the communication unit 3, the image processing unit 5 repeats the operation shown in the flow chart of FIG. 8 each time it receives a captured image from the photographing unit 2.

The communication unit 3 functions as the captured image acquisition means 30; when it receives a captured image, it outputs the image to the image processing unit 5 (step S200).

The image processing unit 5 functions as the key point detecting means 53. Using the detector stored in the detector storage means 41, it detects key points for each person in the input captured image and generates detection data (step S205).

When key points are detected by the key point detecting means 53 ("YES" in step S210), the image processing unit 5 functions as the key point complementing means 54. The key point complementing means 54 applies normalization (step S215) and assigns a state value to each key point (step S220) in the detection data generated by the key point detecting means 53, and then complements the key points using the complementer stored in the complementer storage means 42 to generate complemented detection data (step S225).

The key point complementing means 54 also performs a process of integrating a plurality of detection data (steps S230 to S240). As described above, the integration process combines a plurality of detection data detected from one person. The duplicate detection determination process S230 determines, based on the degree of overlap between detection data, whether duplicate detection has occurred (that is, whether a plurality of detection data have been obtained for the same person). If there is duplicate detection ("YES" in step S235), the key point complementing means 54 integrates the duplicated key points in the complemented detection data determined to be duplicates (step S240). If there is no duplicate detection ("NO" in step S235), step S240 is skipped. The key point complementing means 54 thereby completes the generation of the complemented detection data and supplies the data to the object area detecting means 55 and the object attribute estimation means 56.

The image processing unit 5 functions as the object area detecting means 55 and detects a person area as the object area based on the complemented detection data input from the key point complementing means 54 (step S245). The image processing unit 5 also functions as the object attribute estimation means 56 and estimates an object attribute, such as whether the detected person is a child, based on the complemented detection data input from the key point complementing means 54 (step S250).

The image processing unit 5 and the communication unit 3 cooperate to function as the recognition result output means 31 and output to the output unit 6 a recognition result that includes the person area detected by the object area detecting means 55 and the attribute estimated by the object attribute estimation means 56 (step S255). If no key point was detected from the captured image in step S205 ("NO" in step S210), the recognition result output means 31 outputs a recognition result indicating that fact to the output unit 6.

After outputting the recognition result for the captured image acquired in step S200, the object recognition device 1 returns to step S200 and repeats the above processing for the next captured image.

FIG. 9 is a schematic flow chart of the duplicate detection determination process S230 performed by the key point complementing means 54. The key point complementing means 54 sorts the complemented detection data generated in step S225 of FIG. 8, for example, in descending order of the number of key points whose state value is 1, and sorts data with the same number of such key points in ascending order of object ID (step S300). It then temporarily clears the object IDs of all complemented detection data (step S305), sets the initial value of the object ID to be newly assigned to 1 (step S310), and starts reassigning object IDs to the complemented detection data based on the result of the duplicate detection determination.

If any complemented detection data has not yet been assigned an object ID ("YES" in step S315), the first such data in the sorted order is selected (step S320). The selected data is hereinafter called complemented detection data A. The key point complementing means 54 can calculate the circumscribed rectangle of those key points of a complemented detection data whose state value is not 0, and use its area as the area of the region in which the person exists.
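
The circumscribed rectangle mentioned above can be sketched in Python as follows. The tuple layout `(x, y, state)` and the function names are assumptions for illustration, not the data format of the embodiment.

```python
def bounding_box(keypoints):
    """Circumscribed rectangle of key points whose state value is not 0.

    keypoints: list of (x, y, state) tuples (hypothetical layout).
    Returns (x_min, y_min, x_max, y_max), or None if no position is known.
    """
    pts = [(x, y) for x, y, state in keypoints if state != 0]
    if not pts:
        return None
    xs, ys = zip(*pts)
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box):
    """Area of an axis-aligned rectangle given as (x_min, y_min, x_max, y_max)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


# The key point with state 0 (position unknown) is ignored.
person = [(0.0, 0.0, 1), (4.0, 3.0, 2), (100.0, 100.0, 0)]
person_box = bounding_box(person)
person_area = box_area(person_box)
```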

The current set value of the object ID is assigned to complemented detection data A (step S325). The complemented detection data that have no object ID are then selected one by one as complemented detection data B (step S330). When complemented detection data B has been selected ("YES" in step S335), the degree of overlap between complemented detection data A and B is calculated (step S340).

In this embodiment, the degree of overlap α is calculated from the person area R_A of complemented detection data A and the person area R_B of complemented detection data B as
(Equation 1)
\alpha = \frac{\mathrm{area}(R_A \cap R_B)}{\mathrm{area}(R_A \cup R_B)}
that is, the area of the overlap of R_A and R_B divided by the area of their union. Here, the areas R_A and R_B are the circumscribed rectangles described above. If both complemented detection data A and B could not be normalized and therefore could not be complemented (in this embodiment, complemented detection data in which the state value of either shoulder key point is 0), the duplicate determination always returns a result of no integration.
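
As a minimal sketch, the degree of overlap of Equation 1 for two axis-aligned rectangles could be computed as below. The rectangle layout `(x_min, y_min, x_max, y_max)` and the function name `overlap_iou` are assumptions for illustration.

```python
def overlap_iou(box_a, box_b):
    """Overlap area divided by union area for two axis-aligned rectangles.

    Each box is (x_min, y_min, x_max, y_max) (hypothetical layout).
    """
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Degenerate rectangles with zero union are treated as non-overlapping.
    return inter / union if union > 0 else 0.0


alpha = overlap_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

In this example the overlap is a 1 by 1 square and the union has area 7, so `alpha` is 1/7.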

If one of the complemented detection data A and B (here, B) contains a key point whose state value is 0, the degree of overlap α may instead be calculated as
(Equation 2)
\alpha = \frac{\mathrm{area}(R_A \cap R_B)}{\mathrm{area}(R_B)}
Instead of areas, the degree of overlap may be defined so that it is higher the smaller the average distance between the positions of key points of the same part. In this case, key points of parts present in only one of the two data are excluded from the distance calculation. Preferably, the tendency of each part to deviate is converted into an index value in advance, and the average is obtained by giving lower weights to the distances of parts that deviate more easily. Alternatively, an overlap calculator that outputs, as the degree of overlap, the similarity among two, or three or more, pieces of part data may be trained in advance and used.
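
The distance-based alternative can be sketched as follows. The dictionary layout (part name to position), the per-part weights, and in particular the mapping `1 / (1 + d)` from mean distance to a degree of overlap are assumptions for illustration; the description only requires that a smaller mean distance yields a higher degree of overlap.

```python
import math


def weighted_mean_distance(parts_a, parts_b, weights=None):
    """Weighted mean distance over parts present in both detections.

    parts_a, parts_b: dict mapping part name -> (x, y) (hypothetical layout).
    weights: optional dict of per-part weights; parts that deviate easily
             would be given lower weights. Missing parts default to 1.0.
    Returns None when the two detections share no part.
    """
    shared = parts_a.keys() & parts_b.keys()
    total, total_w = 0.0, 0.0
    for part in shared:
        w = 1.0 if weights is None else weights.get(part, 1.0)
        total += w * math.dist(parts_a[part], parts_b[part])
        total_w += w
    return total / total_w if total_w > 0 else None


def distance_to_overlap(mean_distance):
    """Assumed monotone mapping: smaller distance gives higher overlap."""
    return 1.0 / (1.0 + mean_distance)


a = {"head": (0.0, 0.0), "l_shoulder": (0.0, 0.0), "knee": (5.0, 5.0)}
b = {"head": (3.0, 4.0), "l_shoulder": (0.0, 0.0)}
d_plain = weighted_mean_distance(a, b)
d_weighted = weighted_mean_distance(a, b, {"head": 1.0, "l_shoulder": 3.0})
```

The "knee" entry exists only in `a`, so it is excluded from the calculation, as the text above prescribes.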

If the degree of overlap calculated in step S340 is equal to or greater than a predetermined reference value ("YES" in step S345), the current set value of the object ID is assigned to complemented detection data B (step S350). As a result, complemented detection data A and B share a common object ID. After step S350, the key point complementing means 54 returns to step S330 and selects the next complemented detection data B. If the degree of overlap is less than the reference value ("NO" in step S345), the current complemented detection data B is left without an object ID, and the process returns to step S330 to handle the next complemented detection data B.

When every complemented detection data that had no object ID at the time the current complemented detection data A was selected has been selected as complemented detection data B ("NO" in step S335), the set value of the object ID is incremented (step S355) and the process returns to step S315. If complemented detection data without an object ID remain ("YES" in step S315), new complemented detection data A is selected from them (step S320) and steps S325 to S355 are performed. If none remain ("NO" in step S315), the duplicate detection determination process ends and the process proceeds to step S235 of FIG. 8.
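
The loop of steps S305 to S355 can be summarized with the following Python sketch. It assumes the detections are already sorted as in step S300 and takes the overlap measure as a callable; the function name `assign_object_ids` and the default threshold of 0.5 are assumptions for illustration (the description only speaks of a predetermined reference value).

```python
def assign_object_ids(detections, overlap, threshold=0.5):
    """Group detections judged to be duplicates under a common object ID.

    detections: list already sorted as in step S300.
    overlap:    callable(a, b) -> float, the degree of overlap (step S340).
    threshold:  assumed stand-in for the predetermined reference value.
    Returns a list of object IDs aligned with `detections`.
    """
    ids = [None] * len(detections)      # S305: clear all object IDs
    next_id = 1                         # S310: initial object ID
    for i, det_a in enumerate(detections):
        if ids[i] is not None:          # S315/S320: earliest unassigned is A
            continue
        ids[i] = next_id                # S325
        for j in range(i + 1, len(detections)):   # S330: candidates for B
            if ids[j] is None and overlap(det_a, detections[j]) >= threshold:
                ids[j] = next_id        # S345 "YES" -> S350
        next_id += 1                    # S355
    return ids
```

For example, with one-dimensional intervals standing in for person areas, `assign_object_ids([(0, 10), (1, 11), (50, 60)], iou_1d)` gives the first two intervals the ID 1 and the third the ID 2, where `iou_1d` is any interval overlap measure supplied by the caller.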

If, after the duplicate detection determination process S230, a plurality of complemented detection data have the same object ID, the key point complementing means 54 determines in step S235 that there is duplicate detection and, in that case, performs the integration process S240.

The integration process S240 is described below. In this process, the position and state value of each key point of the complemented detection data after integration (integrated detection data) are determined, according to predetermined rules, from the plurality of complemented detection data to be integrated. Specifically, for each pair of complemented detection data with matching object IDs, the positions and state values are updated as described below, and this is repeated until no pairs remain. For the state values of the same part in the two complemented detection data to be integrated, the following four combinations are possible.
(1) A key point obtained by complementation and a detected key point (state values "2" and "1")
(2) Two detected key points (both state value "1")
(3) Two key points obtained by complementation (both state value "2")
(4) The position of only one key point is known (one has state value "1" or "2" and the other "0")

In this embodiment, for (1), the position and state value of the detected key point are selected preferentially. For (2) and (3), the key point of the complemented detection data with the larger number of detected key points takes priority; if the numbers of detected key points are equal, the position is the average of the two, and the state value is "1" for (2) and "2" for (3). For (4), the position and state value of the key point whose position is known are used. Following these rules, a plurality of complemented detection data with the same object ID are integrated into one integrated detection data. For example, the complemented detection data with object IDs 002 and 003 in FIG. 5(b) are given the common object ID 002 in the duplicate detection determination, and integrating them by the above rules yields, as the integrated detection data, the complemented detection data with object ID 002 in FIG. 6(d).
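
The four rules can be written down as a small Python sketch. The tuple layout `(x, y, state)`, the function names, and the handling of the case where both state values are 0 (which the rules above do not cover; the key point is simply left unknown here) are assumptions for illustration.

```python
def merge_keypoint(kp_a, kp_b, n_det_a, n_det_b):
    """Merge one part from two detections according to rules (1) to (4).

    kp_a, kp_b: (x, y, state) with state 0 = unknown, 1 = detected,
                2 = complemented (hypothetical layout).
    n_det_a, n_det_b: number of detected (state 1) key points in each detection.
    """
    (xa, ya, sa), (xb, yb, sb) = kp_a, kp_b
    if sa == 0 or sb == 0:
        # Rule (4): use whichever position is known (both unknown stays unknown).
        return kp_a if sa != 0 else kp_b
    if sa != sb:
        # Rule (1): a detected key point wins over a complemented one.
        return kp_a if sa == 1 else kp_b
    if n_det_a != n_det_b:
        # Rules (2)/(3): prefer the detection with more detected key points.
        return kp_a if n_det_a > n_det_b else kp_b
    # Rules (2)/(3) with equal counts: average the positions, keep the state.
    return ((xa + xb) / 2, (ya + yb) / 2, sa)


def merge_detections(det_a, det_b):
    """Merge two detections given as equally ordered lists of key points."""
    n_a = sum(1 for kp in det_a if kp[2] == 1)
    n_b = sum(1 for kp in det_b if kp[2] == 1)
    return [merge_keypoint(a, b, n_a, n_b) for a, b in zip(det_a, det_b)]
```

Merging more than two detections with the same object ID would apply `merge_detections` repeatedly, in line with the pairwise update described above.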

In the integration process of step S240, the integrated part data is determined using the key points complemented in step S225 as they are. Alternatively, complementation may be performed again on the detection data given the same object ID in the duplicate detection determination process S230 to determine the integrated part data. FIG. 10 is a schematic flow chart of the recognition-stage operation of an object recognition device 1 configured to integrate with such re-complementation. In FIG. 10, processes common to FIG. 8 carry the same reference signs; FIG. 10 differs in that the integration process S240 of FIG. 8 is replaced by an integration process S400 that involves key point re-complementation for each group of duplicately detected detection data. Here, a group of duplicately detected detection data is obtained by combining the original detection data corresponding to the complemented detection data that were given the same object ID in the duplicate detection determination process S230, and is called composite detection data. In the integration process S400, the complementer is applied to the composite detection data as re-complementation, and its output is used as the integrated detection data. For example, the composite detection data is created from the part data obtained by applying the above rules to the complemented detection data, by keeping only the detected key points (state value "1") and setting the state value of the rest to "0". Re-complementing allows complementation from detection data containing more detected key points, so more accurate key point positions are obtained. The processes described with reference to FIGS. 8 and 9 and the process of FIG. 10 can suppress multiple detection of the same person. In addition, using the complemented detection data makes it possible to determine duplicate detection and to reduce the error of recognizing the part data of one object as belonging to a plurality of objects.

[Modifications]
(1) The above embodiment shows an example in which the object is the whole body of a person, but the object may be a part of the human body, such as the upper body, or a non-human object such as a vehicle or a chair.

(2) The above embodiment shows an example in which the measurement data of the object is a two-dimensional image and the measurement unit that acquires the measurement data is the photographing unit 2, a camera that captures two-dimensional images. The measurement data and the measurement unit are not limited to this example. For example, the measurement data may be a measurement of a three-dimensional space. Examples of three-dimensional measurement data include a distance image obtained by using a distance image sensor as the measurement unit and three-dimensional data constructed from images captured by multi-view cameras. The measurement data may also be a time series of two-dimensional images (a time series of two-dimensional measurement data) or a time series of three-dimensional measurement data.

(3) The above embodiment shows an example in which normalization uses a single set of key points, with the midpoint of the two shoulder key points taken as the origin after normalization. In another embodiment, several normalizations may be defined, including ones that use other sets of key points, and for each given data to be normalized, the normalization matching the set of key points available in that data is selected. This allows training to use the samples without waste and makes the complementer more accurate. It also reduces the possibility of failing to estimate the attribute.

Although the above embodiment determines the origin from two key points, the origin may be determined from one key point or from three or more key points.
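
Selecting a normalization according to the available key points might look like the following sketch. Only the shoulder-midpoint origin comes from the embodiment; the fallback sets (hips, head), the part names, and the function name are hypothetical, and scaling and rotation are deliberately left out.

```python
def normalize_origin(keypoints, candidate_sets=(
        ("l_shoulder", "r_shoulder"),   # origin used in the embodiment
        ("l_hip", "r_hip"),             # hypothetical fallback
        ("head",),                      # hypothetical single-point fallback
)):
    """Translate key points so that the chosen origin becomes (0, 0).

    keypoints: dict mapping part name -> (x, y) for the known parts.
    The first candidate set whose parts are all present defines the origin
    as the mean of those parts. Returns None if no candidate set is usable.
    """
    for parts in candidate_sets:
        if all(p in keypoints for p in parts):
            ox = sum(keypoints[p][0] for p in parts) / len(parts)
            oy = sum(keypoints[p][1] for p in parts) / len(parts)
            return {name: (x - ox, y - oy) for name, (x, y) in keypoints.items()}
    return None


with_shoulders = normalize_origin(
    {"l_shoulder": (2.0, 4.0), "r_shoulder": (4.0, 4.0), "head": (3.0, 2.0)})
without_shoulders = normalize_origin({"head": (3.0, 2.0), "l_hip": (1.0, 1.0)})
```

A sample that lacks a shoulder can still be normalized through a fallback set, which is the benefit described in modification (3).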

(4) In the above embodiment, the part data expresses the positions of key points as coordinates, but they may instead be expressed as images. For example, a binary image in which only the pixel at the key point's coordinates has the value 1 may be created for each key point, or such a binary image with a Gaussian filter applied may be used. In that case, the coordinates of each key point correspond to the point where the image takes its maximum value. The representation of key point positions may also differ between the input and the output of the degraded data generation means 51, the complementer learning means 52, and the key point complementing means 54: the input may express positions as coordinates and the output as images, or conversely the input may be images and the output coordinates.
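
The image representation can be sketched in pure Python as below. As a simplifying assumption, the sketch evaluates a Gaussian centered on the key point directly instead of filtering a binary image; the function names and the `sigma` parameter are hypothetical.

```python
import math


def keypoint_to_heatmap(x, y, width, height, sigma=1.0):
    """Image (list of rows) with a Gaussian bump centered on (x, y)."""
    return [
        [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2.0 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


def heatmap_to_keypoint(heatmap):
    """Recover (x, y) as the pixel where the image takes its maximum value."""
    _, row, col = max(
        (value, r, c)
        for r, line in enumerate(heatmap)
        for c, value in enumerate(line)
    )
    return (col, row)


heatmap = keypoint_to_heatmap(3, 2, width=8, height=6)
recovered = heatmap_to_keypoint(heatmap)
```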

(5) In addition to the estimated coordinates of each key point, the complementer may output the reliability of the estimated coordinates. In this case, the key point complementing means 54 can set the state value in the complemented detection data to "1" only for key points whose reliability is at or above a certain level. For example, when few key points are detected by the key point detecting means 53 and the coordinates are hard to estimate, incorrect coordinates are likely to be estimated. By not setting the state value to "1" for key points whose estimated coordinates have low reliability, the accuracy of object recognition based on the complemented detection data can be improved, and an accuracy index for the recognition result can be conveyed. For example, using this complemented detection data, the object attribute estimation means 56 can estimate the attribute from whichever of the right-side and left-side key point combinations contains more key points with state value "1". The object area detecting means 55 obtains and outputs an accuracy index according to the number of key points with state value "1" among those used to detect the object area, the object attribute estimation means 56 obtains and outputs an accuracy index according to the number of key points with state value "1" among those used to estimate the object attribute, and the recognition result output means 31 includes these accuracy indexes in the recognition result it outputs.

Alternatively, the reliability may be kept as is together with the complemented detection data, without converting it into a binary state value. In that case, the index for selecting a key point combination and the accuracy index are, for example, the sum of the reliabilities.
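
The reliability handling of modification (5) can be illustrated as follows. The threshold of 0.5, the function names, and the tie-breaking toward the right side are assumptions for illustration; the description does not fix any of them.

```python
def states_from_confidence(confidences, threshold=0.5):
    """Binary state values: 1 where the reliability reaches the threshold."""
    return [1 if c >= threshold else 0 for c in confidences]


def choose_side(left_scores, right_scores):
    """Pick the side with the larger total score.

    The scores may be binary state values or raw reliabilities, matching
    the two variants in the text. Ties go to "right" (an assumption).
    """
    return "left" if sum(left_scores) > sum(right_scores) else "right"


states = states_from_confidence([0.9, 0.2, 0.5])
side_by_state = choose_side([1, 1, 0], [1, 0, 0])
side_by_confidence = choose_side([0.4, 0.3], [0.9, 0.1])
```

The same totals could also serve as the accuracy index reported with the recognition result.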

(6) The above embodiment uses a VAE as the complementer, but other models that can output continuous values, such as a neural network or a Gaussian process, may be used. In addition, by discretizing the key point coordinates in advance and formulating the estimation of part positions as a classification problem in which each key point belongs to one of the discretized positions, a discriminative model such as AdaBoost can also be used as the complementer.

(7) The above embodiment uses the ReLU function as the activation function in the key point complementing means 54, but a tanh function, a sigmoid function, or the like may be used instead. A configuration with shortcut connections such as those used in ResNet (residual network) is also possible.

(8) In the above embodiment, the integration process in the key point complementing means 54 calculates the degree of overlap α from the circumscribed rectangles of the complemented detection data before integration. Instead, an area obtained by enlarging the circumscribed rectangle by a predetermined factor may be used, or an area obtained by inputting the complemented detection data before integration into the above-mentioned converter, which has learned the conversion from a person's part data to a person area, may be used.

(9) When integrating the part data of objects, the key point complementing means 54 may use the score or reliability of each part data or each key point obtained from the detector. Misintegration can be expected to decrease by integrating part data with high scores or reliabilities preferentially, or by not integrating part data with low scores or reliabilities in the duplicate detection determination process. In that case, the detector and the key point detecting means 53 output the score or reliability of each key point in addition to its position.

The sort order in the duplicate detection determination by the key point complementing means 54 may also be descending order of area.

(10) The degree of overlap may be updated successively by executing step S240 of FIG. 8 immediately after step S350 of FIG. 9, or by executing step S400 of FIG. 10 immediately after step S350 of FIG. 9.

(11) The above embodiment shows an example in which the complementer and the detector are trained with common given data, but different given data may be used for each. The given data used only for the complementer may be created independently of the learning images (for example, by two-dimensionally projecting a three-dimensional model with part data of the object). The complementer and the detector may also be trained at different times.

In other words, a new recognition process can be added to an existing object recognition device that uses a detector simply by adding the new recognition process and the complementation it requires, without retraining the detector.

(12) The above embodiment shows an example in which the storage unit 4 and the image processing unit 5 are provided on the image center side, but they may be provided on the event venue side.

1 object recognition device, 2 photographing unit, 3 communication unit, 4 storage unit, 5 image processing unit, 6 output unit, 30 captured image acquisition means, 31 recognition result output means, 40 learning data storage means, 41 detector storage means, 42 complementer storage means, 50 detector learning means, 51 degraded data generation means, 52 complementer learning means, 53 key point detecting means, 54 key point complementing means, 55 object area detecting means, 56 object attribute estimation means.

Claims

An object recognition device that detects, from measurement data obtained by measuring a predetermined object, a plurality of detection-required parts constituting the object, the device comprising:
complementer storage means that stores a complementer generated in advance, using given data that include the positions of the detection-required parts assigned to each of a plurality of samples of the object, by learning in which degraded data obtained by omitting some of the positions from the given data are the input and the original given data are the output target value;
part detection means that detects the positions of one or more of the detection-required parts of the object from the measurement data and acquires detection data of the object; and
part complementing means that inputs the detection data to the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data.

An object recognition device that detects, from measurement data, an object area in which a predetermined object is measured, based on detection-required parts that are a plurality of parts constituting the object and are determined to be required for detecting the object area, the device comprising:
complementer storage means that stores a complementer generated in advance, using given data that include the positions of the detection-required parts assigned to each of a plurality of samples of the object, by learning in which degraded data obtained by omitting some of the positions from the given data are the input and the original given data are the output target value;
part detection means that detects the positions of one or more of the detection-required parts of the object from the measurement data and acquires detection data of the object;
part complementing means that inputs the detection data to the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data; and
object area detecting means that detects, as the object area in the measurement data, a predetermined range based on the positions of the detection-required parts indicated by the complemented detection data.

An object recognition device that estimates whether a person measured in measurement data is a child, based on detection-required parts, which are those of the plurality of parts constituting the person, who is the object, that are determined to be required for the estimation, the object recognition device comprising:
complementer storage means that stores a complementer generated in advance, using given data that includes the positions of the detection-required parts assigned to each of a plurality of samples of the object, by learning that takes as input degraded data in which some of the positions are omitted from the given data and takes the original given data as the output target value;
part detection means that detects the positions of one or more of the detection-required parts of the object from the measurement data and acquires detection data for the object;
part complementing means that inputs the detection data into the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data; and
child estimation means that estimates whether the person measured in the measurement data is a child by comparing a ratio of distances between the detection-required parts in the complemented detection data with a predetermined reference.
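The child estimation can be pictured with a small sketch that compares a ratio of inter-part distances with a reference value. The particular ratio used here (head length to body length), the part names `head_top`, `neck`, `left_ankle`, `right_ankle`, and the threshold of 1/6 are all assumptions chosen for illustration; the claim does not fix which distances or which reference are used.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two part positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_child(keypoints: Dict[str, Point], threshold: float = 1 / 6) -> bool:
    """Estimate 'child' when the head is large relative to the body.

    Assumed ratio: (head_top to neck) / (head_top to midpoint of ankles).
    """
    left, right = keypoints["left_ankle"], keypoints["right_ankle"]
    ankle_mid = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    head_len = distance(keypoints["head_top"], keypoints["neck"])
    body_len = distance(keypoints["head_top"], ankle_mid)
    if body_len == 0:
        raise ValueError("head and ankles coincide; ratio is undefined")
    return head_len / body_len >= threshold
```

A ratio is used rather than an absolute distance so that the decision does not depend on how far the person is from the camera.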

The object recognition device according to any one of claims 1 to 3, wherein, when the detection data is acquired for two or more objects from a single piece of measurement data, the part complementing means estimates, for each piece of detection data, the existence area of the object in the measurement data from the positions of the detection-required parts output by the complementer and, when the degree of overlap between the existence areas of a plurality of pieces of detection data is equal to or greater than a predetermined reference value, generates the complemented detection data by integrating those pieces of detection data as originating from the same object.

The object recognition device according to claim 4, wherein, when the degree of overlap between the existence areas of the plurality of pieces of detection data is equal to or greater than the predetermined reference value, the part complementing means generates the complemented detection data by integrating into one the positions of each identical part obtained by the complementer for the respective pieces of detection data.
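The overlap test and the integration of identical parts can be sketched as follows. This example assumes that the degree of overlap is measured as intersection over union (IoU) of rectangular existence areas and that positions of the same part are integrated by averaging; neither choice is stated in the claims, and the names `iou` and `merge_if_same_object` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_if_same_object(
    kp_a: Dict[str, Point], area_a: Box,
    kp_b: Dict[str, Point], area_b: Box,
    reference: float = 0.5,
) -> Optional[Dict[str, Point]]:
    """Average same-named part positions when the areas overlap enough.

    Returns None when the overlap is below `reference`, meaning the two
    pieces of detection data are treated as different objects.
    """
    if iou(area_a, area_b) < reference:
        return None
    shared = kp_a.keys() & kp_b.keys()
    return {
        name: ((kp_a[name][0] + kp_b[name][0]) / 2,
               (kp_a[name][1] + kp_b[name][1]) / 2)
        for name in shared
    }
```

Since both inputs are complementer outputs, they normally contain the same set of parts, so the intersection of keys is the full part list.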

The object recognition device according to claim 4, wherein, when the degree of overlap between the existence areas of the plurality of pieces of detection data is equal to or greater than the predetermined reference value, the part complementing means generates the complemented detection data by integrating the plurality of pieces of detection data and inputting the integrated detection data into the complementer.
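The alternative of integrating the detection data before it is input to the complementer can be sketched as below. The example assumes that an undetected part is represented by `None`, that a part detected in only one piece of detection data is taken as is, and that a part detected in both is averaged; these rules and the name `integrate_detections` are assumptions, not details given in the claim.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Detection = Dict[str, Optional[Point]]  # None marks an undetected part


def integrate_detections(det_a: Detection, det_b: Detection) -> Detection:
    """Combine two partial detections judged to belong to the same object."""
    merged: Detection = {}
    for name in det_a.keys() | det_b.keys():
        a = det_a.get(name)
        b = det_b.get(name)
        if a is not None and b is not None:
            # Detected in both: use the mean position (an assumed rule).
            merged[name] = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        else:
            # Detected in at most one: keep whichever position exists.
            merged[name] = a if a is not None else b
    return merged
```

The integrated detection data has fewer missing parts than either input, so the complementer has more evidence to work from when it is run again on the merged result.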

An object recognition method for detecting, from measurement data obtained by measuring a predetermined object, a plurality of detection-required parts constituting the object, the object recognition method comprising:
a step of preparing a complementer generated in advance, using given data that includes the positions of the detection-required parts assigned to each of a plurality of samples of the object, by learning that takes as input degraded data in which some of the positions are omitted from the given data and takes the original given data as the output target value;
a part detection step of detecting the positions of one or more of the detection-required parts of the object from the measurement data and acquiring detection data for the object; and
a part complementing step of inputting the detection data into the complementer and complementing the positions of the detection-required parts missing from the detection data to generate complemented detection data.
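The preparation of training pairs for the complementer can be sketched as follows: part positions are randomly omitted from the given data to form the degraded input, and the untouched given data serves as the output target. The omission probability `drop_prob`, the use of `None` for an omitted position, the rule of always keeping at least one position, and the name `make_training_pair` are assumptions of this example; the model that is trained on these pairs is not shown.

```python
import random
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Given = Dict[str, Point]
Degraded = Dict[str, Optional[Point]]  # None marks an omitted position


def make_training_pair(
    given: Given,
    drop_prob: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Tuple[Degraded, Given]:
    """Return (degraded input, original target) for one sample."""
    rng = rng or random.Random()
    names = list(given)
    dropped = {name for name in names if rng.random() < drop_prob}
    # Keep at least one position, mirroring detection of "one or more" parts.
    if names and len(dropped) == len(names):
        dropped.discard(rng.choice(names))
    degraded: Degraded = {
        name: (None if name in dropped else given[name]) for name in names
    }
    return degraded, dict(given)
```

Calling this function repeatedly on the same sample yields many different omission patterns, which lets the complementer learn to restore whichever parts happen to be missing at detection time.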

An object recognition program that causes a computer to execute processing for detecting, from measurement data obtained by measuring a predetermined object, a plurality of detection-required parts constituting the object, the object recognition program causing the computer to function as:
complementer storage means that stores a complementer generated in advance, using given data that includes the positions of the detection-required parts assigned to each of a plurality of samples of the object, by learning that takes as input degraded data in which some of the positions are omitted from the given data and takes the original given data as the output target value;
part detection means that detects the positions of one or more of the detection-required parts of the object from the measurement data and acquires detection data for the object; and
part complementing means that inputs the detection data into the complementer and complements the positions of the detection-required parts missing from the detection data to generate complemented detection data.