JP7022040B2

JP7022040B2 - Object identification device, method and program

Info

Publication number: JP7022040B2
Application number: JP2018179892A
Authority: JP
Inventors: 良亮渡邊
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-09-26
Filing date: 2018-09-26
Publication date: 2022-02-17
Anticipated expiration: 2038-09-26
Also published as: JP2020052600A

Description

The present invention relates to an object identification device, method, and program that recognize an ID unique to each object in camera images captured by a plurality of cameras with different viewpoints, and identify each object based on the result of the ID recognition.

Conventionally, techniques have been proposed for extracting and identifying some object, typically a person, from video captured by a camera. To achieve this identification, it is necessary to accurately extract from the video, for example, the uniform number or face if the object is an athlete, or the license plate number if the object is a car, and then to correctly recognize information such as the athlete's uniform number from the extracted part.

For example, if each player in a sports video can be identified accurately, the movement of each player can be captured accurately from the images alone, which is useful for tactical analysis and the like.

As a means of object identification, identification techniques using deep learning have attracted attention in recent years because they can achieve highly accurate identification. Non-Patent Document 1 discloses a technique for identifying athletes' uniform numbers with high accuracy using deep learning. Non-Patent Document 1 shows that, by recognizing a uniform number image with a trained convolutional neural network, the correct number could be recognized with an accuracy of about 83%.

On the other hand, to perform identification at all times in a scene, a unique identifying part such as a face or uniform number must appear in the camera frequently. Consequently, there has been a limit to how robustly identification can be performed with only one camera.

To solve this technical problem, approaches that identify objects efficiently using a plurality of cameras have been proposed. Patent Document 1 uses a plurality of cameras and identifies an individual using images of a specific person captured from a plurality of directions. In Patent Document 1, highly accurate identification is achieved by comparing the plurality of images with registered images based on the relative orientation relationships between the images.

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-001447

Non-Patent Document 1: Sebastian Gerke; Karsten Muller; Ralf Schafer, "Soccer Jersey Number Recognition Using Convolutional Neural Networks," The IEEE International Conference on Computer Vision (ICCV) Workshops, pp. 17-24, 2015.
Non-Patent Document 2: Laurentini, A., "The visual hull concept for silhouette based image understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 150-162 (1994).
Non-Patent Document 3: J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517-6525 (2017).
Non-Patent Document 4: Gandhi, T. and Trivedi, M., "Image based estimation of pedestrian orientation for improving path prediction," in Proc. 2008 IEEE Intelligent Vehicles Symposium, 506-511 (2008).
Non-Patent Document 5: Z. Cao, T. Simon, S. Wei and Y. Sheikh, "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1302-1310 (2017).
Non-Patent Document 6: J. F. Henriques, R. Caseiro, P. Martins and J. Batista, "High-Speed Tracking with Kernelized Correlation Filters," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596 (2015).

Non-Patent Document 1 shows that highly accurate uniform number recognition is possible using deep learning. However, the part to be identified rarely remains visible throughout a video. For example, it is difficult to keep an athlete's uniform number in the camera's view at all times because of the angle at which the athlete stands relative to the camera, overlap between athletes, and similar problems. A car's license plate is likewise visible only from a limited range of angles. Non-Patent Document 1 does not disclose a means for performing highly accurate identification in such situations.

Patent Document 1, on the other hand, uses a plurality of cameras and can therefore solve the problem that the identification target is seldom visible. However, Patent Document 1 is a technique that mainly recognizes a person's face (head), and face-based identification requires the face to appear in the camera clearly enough to be identified.

However, when players across an entire field are to be identified with a relatively small number of cameras in a wide area such as a stadium, the video must be captured at an angle of view that takes in the entire stadium. Capturing faces clearly in such a shooting environment is difficult at the resolution of typical cameras, so the technique is hard to apply to wide-area spaces.

In addition, face-based recognition is difficult to apply to sports, such as American football, in which protective gear may be worn over the entire face or head. Furthermore, although Patent Document 1 can use the uniform number region as well as the face (head) for identification, its algorithm achieves efficient identification on the premise that the feature used for identification is visible from a plurality of cameras. It is therefore difficult to apply effectively to an identification target, such as a uniform number, that is likely to be visible only from specific cameras.

There is also a concern that accuracy will drop significantly if, when a target is captured by a plurality of cameras, the target does not appear because it is occluded by another object. Patent Document 1 does not clearly disclose a solution to this problem.

An object of the present invention is to solve the above technical problems and to provide an object recognition device, method, and program that improve the accuracy of object identification by obtaining, for each camera, the degree to which each object is occluded by other objects, and performing ID recognition for each object on camera images in which its degree of occlusion is small.

To achieve the above object, the present invention is an object identification device, method, and program for identifying objects based on camera images, characterized by the following configurations.

(1) The device comprises means for acquiring camera images of objects captured from a plurality of different viewpoints, means for estimating the position of each object, means for calculating, for each camera, the degree of occlusion between objects based on the viewpoint of each camera and the position of each object, means for selecting the camera to be used for identifying each object based on the degree of occlusion, and means for identifying each object based on the camera image of the camera selected for that object.

(2) Each object carries an ID that can be recognized from camera images, and the device further comprises means for estimating, for each camera, the orientation of each object based on that camera's image. The means for selecting a camera selects, for each object, the camera used to recognize its ID based on the orientation and degree of occlusion of each object.

(3) The means for selecting a camera comprises means for calculating an ID pointing direction for each object, means for calculating a candidate vector for each ID pointing direction of each object, means for repeatedly merging, for each object, two candidate vectors whose angular difference in direction is below a predetermined threshold into one newly generated candidate vector, and means for setting the reliability of the newly generated candidate vector so that it reflects the reliabilities of the two merged candidate vectors. The camera is selected based on candidate vectors whose reliability satisfies a predetermined condition.

(4) The means for identifying an object further comprises means for extracting, from the camera image of the object, an identification region containing the object's ID, and performs ID recognition on the extracted identification region.

(5) The means for estimating the orientation of each object includes at least one of means for estimating the orientation of each object based on object images acquired from the camera images and means for estimating the orientation of each object based on its movement vector.

(6) The means for estimating the orientation of each object further comprises means for acquiring the reliability of each orientation estimation result.

According to the present invention, the following effects are achieved.

(1) The degree of occlusion between objects is obtained for each camera, a camera estimated to have a high likelihood of successful object identification is selected for each object based on its degree of occlusion, and each object is identified using the camera image of the selected camera. This enables highly accurate object identification that eliminates the effect of misrecognition caused by occlusion between objects.

(2) When the ID attached to an object is recognized and the object is identified based on the recognition result, the pointing direction of the ID is determined by estimating the orientation of the object, and the camera is selected based on this pointing direction. This improves the accuracy of ID recognition and enables highly accurate object identification that eliminates the effect of misrecognition caused by occlusion between objects.

(3) The orientation estimate for each object reflects not only the estimation results based on camera images but also the estimation result based on the movement vector, which enables highly accurate orientation estimation.

(4) A reliability is acquired for each orientation estimation result, and the final orientation of each object is estimated based on the orientation estimation results and their reliabilities, which enables highly accurate orientation estimation.

(5) The reliability of the orientation estimation result based on the movement vector is obtained from the moving speed of the object, so this reliability can be obtained simply and accurately.

(6) When the camera that performs ID recognition is selected for each object, candidate vectors with a small angular difference, among the candidate vectors representing the pointing direction of each object obtained from each camera image, are merged; the reliabilities of the merged candidate vectors are carried over to the new candidate vector generated by the merge; and the camera is finally selected based on a candidate vector with high reliability. This eliminates the influence of outlier candidate vectors on camera selection.

(7) For each object, each camera is given a recommendation score based on the direction of each candidate vector and the direction of each camera, and the camera is selected based on the cumulative score obtained by repeating this for all candidate vectors. This makes it possible to select a camera with a high likelihood of successful ID recognition.

(8) An identification region containing the ID is extracted from the object image and ID recognition is performed on that region, so the range of ID recognition can be narrowed in advance. This enables fast, highly accurate ID recognition and, in turn, object identification.

FIG. 1 is a diagram showing the configuration of the main part of an object identification device according to an embodiment of the present invention, together with the signals and information exchanged between the components.
FIG. 2 is a diagram showing a method of constructing a 3D model of an object by the visual hull method.
FIG. 3 is a diagram showing a method of estimating the position of an object.
FIG. 4 is a diagram showing an example of object images extracted from a camera image.
FIG. 5 is a diagram showing the relationship between object images and orientations.
FIG. 6 is a diagram showing a method of calculating the degree of occlusion between objects.
FIG. 7 is a diagram showing the relationship between the orientation of an object and the pointing direction of its ID.
FIG. 8 is a diagram showing a method of selecting the candidate vectors to be merged.
FIG. 9 is a diagram showing a method of determining the direction of a merged candidate vector based on the scores of the two candidate vectors that were merged.
FIG. 10 is a diagram showing a method of selecting a camera based on a plurality of candidate vectors for which merging is complete.
FIG. 11 is a flowchart showing the procedure for selecting the candidate vectors to be merged.
FIG. 12 is a flowchart showing the procedure for merging two candidate vectors.
FIG. 13 is a diagram showing a method of selecting a camera based on its cumulative score.
FIG. 14 is a flowchart showing the procedure for selecting a camera based on its cumulative score.
FIG. 15 is a diagram showing a method of extracting an identification region from an object image.
FIG. 16 is a diagram showing an output example of object identification results.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a diagram showing the configuration of the main part of an object identification device according to an embodiment of the present invention, together with the signals and information exchanged between the components.

The object identification device of the present invention can be configured by installing, on a general-purpose computer, an application (program) that implements the functions described below. Alternatively, it can be configured as a dedicated or single-function machine in which part of the application is implemented in hardware or ROM.

In this embodiment, the objects are assumed to be persons, and each person object is identified based on its identification information (ID). The description assumes a uniform number as the ID, but a face may be used as the ID, and if the object is a vehicle, its license plate or race number may be used as the ID. Object identification is performed continuously, frame by frame, on each camera image, but the description here is limited to the processing of one frame. Well-known tracking techniques can be applied to track identification results across frames.

The camera image acquisition unit 1 acquires camera images Icam1, Icam2 … IcamN from a plurality of cameras cam1, cam2 … camN (n cameras in this embodiment) whose installation positions and orientations are known and whose viewpoints (standpoints) differ.

The object position estimation unit 2 estimates the position of each object extracted from the camera images Icam. The visual hull method (volume intersection) described in Non-Patent Document 2 can be used for position estimation.

As shown in FIG. 2, the visual hull method generates a 3D model of an object by taking the intersection of the cones formed by the silhouettes of the object extracted from a plurality of camera images Icam, and the position of each object can be estimated from the location of the generated 3D model. An object can be judged to exist at a location when the 3D model generated there is at least a certain size.
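The silhouette-intersection idea can be illustrated on a small voxel grid. The following is only a toy sketch, not the implementation of the embodiment: the `project` callables, the silhouette pixel sets, and the `min_voxels` size threshold are hypothetical stand-ins for calibrated camera projections and extracted silhouettes.

```python
def carve_visual_hull(voxels, cameras):
    """Keep only voxels whose projection falls inside every silhouette.

    voxels  : iterable of (x, y, z) grid points
    cameras : list of (project, silhouette) pairs, where project maps a
              voxel to a pixel (u, v) and silhouette is a set of pixels
              (both are hypothetical stand-ins for calibrated cameras)
    """
    return [v for v in voxels
            if all(project(v) in silhouette for project, silhouette in cameras)]


def object_present(hull, min_voxels):
    """Treat the hull as an object only if it is at least a given size."""
    return len(hull) >= min_voxels


# Toy setup: two orthographic views of a 3x3x3 grid.
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
top_view = (lambda v: (v[0], v[1]), {(1, 1)})            # looks along z
side_view = (lambda v: (v[1], v[2]), {(1, 0), (1, 1)})   # looks along x
hull = carve_visual_hull(grid, [top_view, side_view])
```

A voxel survives only if no camera sees it outside the silhouette, which is the set intersection described above; the object position could then be taken, for example, as the centroid of the surviving voxels.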

Besides the visual hull method, it is also possible to locate each object in the image using a deep-learning-based method capable of extracting persons from images, such as that of Non-Patent Document 3, and then determine each object's position by projecting its position in the image onto a position on the field. Alternatively, position information may be estimated by attaching to each object a device, such as a sensor, whose position can be estimated.
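One common way to project an image position onto the field is a planar homography. The sketch below assumes a flat field, a pre-calibrated 3x3 matrix `h` (hypothetical here), and that the bottom-center of a detected bounding box approximates where the person touches the ground; none of these choices is specified in the source.

```python
def foot_point(box):
    """Bottom-center pixel of a bounding box (x_min, y_min, x_max, y_max)."""
    x_min, _y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, y_max


def image_to_field(h, u, v):
    """Map pixel (u, v) to field coordinates with a 3x3 homography h."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    return x / w, y / w


# Hypothetical calibration: a pure scaling of 0.5 field units per pixel.
H = [[0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
u, v = foot_point((100, 50, 120, 200))
field_xy = image_to_field(H, u, v)
```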

The object position estimation unit 2 determines the positions of all objects in the space. The estimated positions may be specified two-dimensionally, as in FIG. 3, or as three-dimensional coordinates.

The object orientation estimation unit 3 includes an object image acquisition unit 301, a classification unit 302, a movement vector calculation unit 303, and a reliability acquisition unit 304, and estimates the orientation of each object for each camera image. The orientation estimates are used by the downstream camera selection unit 5 to select a camera in which the ID unique to each object (the uniform number in this embodiment) is likely to appear.

This embodiment is characterized in that an orientation estimation result is calculated for each object from each of the camera images with different viewpoints. The object image acquisition unit 301 acquires images A1 to A4 of each object, as shown in FIG. 4, based on the position information of each object obtained by the object position estimation unit 2.

The classification unit 302 prepares training images for each object orientation, for example as disclosed in Non-Patent Document 4, and estimates orientation based on their features. In this embodiment, as shown in FIG. 5, the directions to be estimated are limited in advance to eight and training images are prepared for each; each object image is then classified into one of the orientations by comparing the features extracted from the acquired object image with the features of the training images for each orientation.
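The comparison of an extracted feature against per-orientation training features can be sketched as a nearest-template classifier. This is a deliberate simplification rather than the classifier of the embodiment: the eight angles, the two-dimensional toy features, and the Euclidean distance are all assumptions made for illustration.

```python
import math

# Eight orientation classes in degrees (the exact angles are an assumption).
DIRECTIONS = (0, 45, 90, 135, 180, 225, 270, 315)


def classify_orientation(feature, templates):
    """Return the direction whose reference feature is closest.

    feature   : feature vector extracted from one object image
    templates : dict mapping each direction to a reference feature vector
                (hypothetical; e.g. a mean over that direction's training images)
    """
    return min(templates, key=lambda d: math.dist(feature, templates[d]))


# Toy templates: one 2D reference feature per direction.
toy_templates = {
    d: (math.cos(math.radians(d)), math.sin(math.radians(d)))
    for d in DIRECTIONS
}
predicted = classify_orientation((0.1, 0.9), toy_templates)
```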

This embodiment assumes that deep learning, such as a convolutional neural network, is used for orientation estimation. As an alternative, training and classification may be performed using HOG (Histograms of Oriented Gradients) features and an SVM (Support Vector Machine) trained on those features.

Alternatively, based on joint positions obtained by a skeleton detection method such as that disclosed in Non-Patent Document 5, orientation may be estimated by training a convolutional neural network or SVM using, as features, whether particular joints are visible and where the joints are located.

To improve on the accuracy of the deep-learning-based orientation estimation, the movement vector calculation unit 303 performs a further orientation estimation using a different approach based on the movement vector. In this embodiment, the movement vector is obtained using an algorithm that tracks objects between frames, for example as disclosed in Non-Patent Document 6.

Once a movement vector has been obtained, the orientation of the object often coincides with its direction of movement. This is not always accurate, since an object may, for example, move while backing away, but adding the orientation estimate based on the movement vector can improve the accuracy of the object's orientation estimate.
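Taking the direction of movement as the orientation amounts to computing the angle of the displacement between two tracked positions. A minimal sketch follows, assuming 2D field coordinates, angles measured counterclockwise from the +x axis, and a hypothetical `min_displacement` below which no estimate is produced.

```python
import math


def orientation_from_motion(prev_xy, curr_xy, min_displacement=1e-6):
    """Estimate orientation in degrees [0, 360) from two tracked positions.

    Returns None when the object has barely moved, since the direction
    of a near-zero vector is not meaningful.
    """
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    if math.hypot(dx, dy) < min_displacement:
        return None
    return math.degrees(math.atan2(dy, dx)) % 360.0
```

For example, `orientation_from_motion((0, 0), (0, 1))` gives an angle of 90 degrees under this convention.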

Because this embodiment assumes an environment with n cameras, a total of n + 1 orientation estimation results are obtained for each object: n results from applying deep learning to the object images obtained from each camera image, plus, when tracking succeeds, one result from the movement vector.

The reliability acquisition unit 304 acquires the reliability Ri of each orientation estimation result. Here, i is the index of the orientation estimation result; in this embodiment, i takes values from 1 to n + 1 for each object. For an orientation estimate produced by a neural network, for example, the reliability Ri can be calculated from the probabilities output by the output-layer function.
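As one possible reading of "the probabilities output by the output-layer function", the sketch below assumes a softmax output layer over the orientation classes and takes the probability of the winning class as Ri. The source does not state which output function or which mapping to Ri is used, so both are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw network outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def orientation_with_reliability(logits):
    """Return (class index, reliability Ri) for one orientation estimate.

    Assumption: Ri is the softmax probability of the predicted class,
    which already lies between 0 and 1.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```

With eight orientation classes, a network that is entirely undecided yields Ri = 1/8, while a strongly peaked output yields Ri close to 1.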

For orientation estimation based on the movement vector, Ri may be obtained from the player's moving speed, because in general the faster the movement, the less likely the player is to be making an unexpected movement such as turning or backing away. For example, the reliability is taken to be higher as the moving speed increases, and here Ri is normalized to a value from 0 to 1.
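A simple monotonic mapping from speed to a reliability in the range 0 to 1 could look like the following. The linear form and the saturation speed `max_speed` (8.0 here, in whatever speed unit the tracker uses) are assumptions; the source only requires that faster movement gives higher reliability.

```python
def motion_reliability(speed, max_speed=8.0):
    """Map a moving speed to a reliability Ri in [0, 1].

    Assumption: reliability grows linearly with speed and saturates at
    max_speed, a hypothetical tuning parameter.
    """
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    return min(max(speed / max_speed, 0.0), 1.0)
```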

Before the orientation estimation and ID recognition are performed, the object occlusion degree calculation unit 4 determines, for each camera, whether each object whose position was estimated by the object position estimation unit 2 is occluded by another object located in front of it, and finally calculates the degree of occlusion Oj for each object (j is the camera identifier).

The degree of occlusion Oj is likewise normalized to a value from 0 to 1, and is defined so that values closer to 1 indicate greater occlusion and values closer to 0 indicate less. The maximum value, Oj = 1, means that the object of interest is completely occluded by other objects.

In this embodiment, as shown in FIG. 6, the degree of occlusion Oj is calculated for each camera cam based on how much of other objects lies in front of the object of interest. Oj can be calculated using, for example, back-projection masks of visual hulls; if the object position estimation unit 2 computes the visual hulls of the objects and estimates positions from them, the visual hull computed for each object can be reused.

When the visual hull results are used, a region in which other objects that overlap the object of interest (occluding objects) may exist is defined first, as shown by hatching in FIG. 6. This region is defined in advance by the user. For example, a width of length L is defined on each side of the object of interest within the camera's field of view, and it is determined whether any other object lies within the triangular region whose base is that width 2L and whose apex is the camera.
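The triangular region test can be sketched on the 2D ground plane as follows. The sketch assumes that the base of width 2L is centered on the object of interest and perpendicular to the line of sight from the camera, which is one plausible reading of FIG. 6 and is not stated explicitly; all function names are hypothetical.

```python
import math


def existence_triangle(camera, target, half_width):
    """Triangle with the camera as apex and a base of width 2L at the target."""
    dx, dy = target[0] - camera[0], target[1] - camera[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("camera and target coincide")
    # Unit vector perpendicular to the line of sight.
    px, py = -dy / dist, dx / dist
    left = (target[0] + px * half_width, target[1] + py * half_width)
    right = (target[0] - px * half_width, target[1] - py * half_width)
    return camera, left, right


def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, tri):
    """True if point p lies inside or on the edge of triangle tri."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def potential_occluders(camera, target, others, half_width):
    """Other objects whose positions fall inside the existence region."""
    tri = existence_triangle(camera, target, half_width)
    return [o for o in others if in_triangle(o, tri)]
```

With the camera at (0, 0), the object of interest at (10, 0), and L = 1, an object at (5, 0) is a potential occluder, while objects at (5, 2) (off to the side) and (12, 0) (behind the target) are not.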

Next, a mask M1 is computed by back-projecting onto the camera's screen the visual hull of each object determined to lie within this region, and a mask M2 is computed by back-projecting the visual hull of the object of interest onto the camera's screen. The total area (number of pixels) Pall of mask M2 and the number of pixels Ps in which mask M1 overlaps mask M2 are then obtained, and Ps/Pall is taken as the degree of occlusion Oj.
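The ratio Ps/Pall can be computed directly from binary masks. The sketch below represents each mask as a 2D list of 0/1 values of equal shape and treats the union of the occluder masks as M1; the handling of an empty M2 is an assumption, since the source does not cover that case.

```python
def occlusion_degree(target_mask, occluder_masks):
    """Return Oj = Ps / Pall for one camera.

    target_mask    : 2D list of 0/1, mask M2 of the object of interest
    occluder_masks : list of 2D lists of 0/1 with the same shape; their
                     union plays the role of mask M1
    """
    p_all = 0  # pixels covered by the object of interest
    p_s = 0    # of those, pixels also covered by some other object
    for y, row in enumerate(target_mask):
        for x, on in enumerate(row):
            if on:
                p_all += 1
                if any(mask[y][x] for mask in occluder_masks):
                    p_s += 1
    if p_all == 0:
        # Assumption: an object with no pixels in this view is treated as
        # fully occluded; the source does not specify this case.
        return 1.0
    return p_s / p_all
```

For a 2x2 object mask with one pixel covered by another object, the result is 0.25; with no occluders it is 0.0.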

Although the term "occlusion" is used here for convenience, another object may exist behind rather than in front of the object of interest as seen from the camera. If such an object is likely to affect the recognition result for the object of interest, the region may be extended to behind the target object and the same calculation performed.

The method of calculating the occlusion degree Oj is not limited to the above. When the target objects are extracted based on image features or deep learning, bounding boxes of the objects may be obtained and used to calculate the occlusion degree. In this case, letting Pall be the area of the bounding box of the target object and Ps be the area of the portion where the bounding box of another object overlaps it, the occlusion degree Oj can be calculated by the same procedure as above.
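The bounding-box variant can be sketched as follows, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples and a single occluding box. The box format and the function names are assumptions made for illustration.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); empty boxes give 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def bbox_occlusion_degree(target, other):
    """Ps / Pall with Pall the target box area and Ps the overlap area."""
    p_all = box_area(target)
    if p_all == 0:
        return 0.0
    overlap = (max(target[0], other[0]), max(target[1], other[1]),
               min(target[2], other[2]), min(target[3], other[3]))
    return box_area(overlap) / p_all


# The other object's box covers the right half of the target box.
oj_half = bbox_occlusion_degree((0, 0, 10, 20), (5, 0, 15, 20))
# A box far to the right does not overlap at all.
oj_none = bbox_occlusion_degree((0, 0, 10, 20), (20, 0, 30, 20))
```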

The camera selection unit 5 selects, for each object, the camera to be used for identifying that object, based on the orientation of each object estimated by the object orientation estimation unit 3 and the occlusion degree Oj calculated by the object occlusion degree calculation unit 4. In this example, it is assumed that n reliabilities Ri obtained per camera by the deep learning and one reliability Ri obtained from the movement vector have already been acquired for each object.

In the camera selection unit 5, the ID-oriented direction calculation unit 501 calculates the ID-oriented direction for each object based on the n+1 orientation estimation results. If the ID is a uniform number, the ID-oriented direction is the direction squarely facing that number, in other words the direction extending perpendicularly from the object's back.

In general, if the orientation estimation result is 0 degrees, the camera squarely faces the uniform number and can be said to capture it with high probability. If the orientation estimation result is, for example, 90 degrees, the image obtained from that camera is unlikely to show the uniform number. However, as shown in FIG. 7, a camera located in the direction obtained by rotating the estimated direction vector by 90 degrees squarely faces the uniform number and is likely to capture it clearly. That direction rotated by 90 degrees is therefore taken as the ID-oriented direction.

From this viewpoint, in the present embodiment n ID-oriented directions are calculated from the n orientation estimation results obtained for the respective camera images. In this example, direction estimation using the movement vector is also performed for each object. For this estimation result, the direction opposite to the object's movement direction (the direction rotated by 180 degrees) is taken as the ID-oriented direction facing the uniform number.

Since the line-of-sight direction differs from camera to camera, the orientations estimated by the object orientation estimation unit 3 cannot be handled directly in a common frame. For example, an object A1 whose orientation is estimated as 0° on the image of cam1 and an object A2 whose orientation is estimated as 0° on the image of cam2 do not have the same orientation on the field. They differ by an angle corresponding to the difference between the line-of-sight directions of cam1 and cam2.

In the present embodiment, however, the line-of-sight direction of each camera is known. The following description therefore assumes that the orientation estimation result for each object has been calibrated using the line-of-sight direction of each camera, so that the orientation estimated by the object orientation estimation unit 3 coincides with the orientation on the field.

The candidate vector calculation unit 502 calculates n+1 candidate vectors representing the respective ID-oriented directions from the n+1 ID-oriented directions. The camera evaluation unit 503 evaluates each camera, for each object, based on the n+1 candidate vectors.
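The conversion from ID-oriented directions to candidate vectors can be sketched as below. The sketch assumes that directions are expressed in degrees in a common field coordinate frame and that a candidate vector is a 2D unit vector. The function names are hypothetical.

```python
import math


def candidate_vector(direction_deg):
    """Unit vector (x, y) pointing in the given ID-oriented direction."""
    rad = math.radians(direction_deg)
    return (math.cos(rad), math.sin(rad))


def id_direction_from_motion(vx, vy):
    """ID-oriented direction in degrees, taken as the direction opposite
    to the movement vector (vx, vy), i.e. rotated by 180 degrees."""
    return (math.degrees(math.atan2(vy, vx)) + 180.0) % 360.0


# An object moving along +x shows its back (and number) toward 180 degrees.
back_dir = id_direction_from_motion(1.0, 0.0)
vec = candidate_vector(back_dir)
```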

In the present embodiment, either of the following two approaches, described in detail below, can be adopted for camera evaluation by the camera evaluation unit 503: "a method of integrating candidate vectors" and "a method of scoring each camera".

Method A. [Method of integrating candidate vectors]
When a single orientation is finally determined from the n+1 orientation estimation results obtained for each object, simply averaging the n+1 orientations lowers the estimation accuracy. As shown in FIG. 8, if a candidate vector whose estimate deviates greatly (the "candidate vector from camera 4" in FIG. 8) is included, the result is strongly influenced by that outlier.

When a small number of such outliers appear, the orientation estimation results in question are likely to be erroneous, and in particular are likely to come from cameras in which occlusion has occurred. To eliminate such outliers, the present embodiment integrates candidate vectors under a predetermined condition, as described in detail below, and repeats this to finally obtain a single candidate vector.

FIGS. 9 and 10 are diagrams showing the method of integrating candidate vectors, and FIGS. 11 and 12 are flowcharts showing the procedure.

In step S1, each ID-oriented direction is vectorized for each object to calculate the candidate vectors. In step S2, a threshold θth (integration threshold) used when integrating candidate vectors with similar directions is defined. In step S3, the angles ∠ between the candidate vectors (∠A to ∠E in FIG. 8) are calculated.

In step S4, the smallest angle θmin is found and compared with the integration threshold θth. If θmin is below θth, the process proceeds to step S5, where the two candidate vectors forming θmin are integrated into one new candidate vector. In the example of FIG. 9, ∠B is the minimum angle θmin and ∠B < θth, so the process proceeds to step S5 to integrate the "candidate vector from camera cam2" and the "candidate vector from camera cam3".

FIG. 12 is a flowchart showing the candidate vector integration procedure of step S5. In step S101, a score Si is calculated for each of the two candidate vectors to be integrated according to the following equation (1), where i is the index of the candidate vector and j is the index of the camera used to calculate candidate vector i.

Si = Ri × (1 - Oj) … (1)

Ri is the reliability of each orientation estimation result, and Oj is the occlusion degree. For estimation results for which occlusion cannot be taken into account, such as the ID-oriented direction obtained from the movement vector, Oj may be set to a constant value.
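Equation (1) can be written as a one-line function. In this sketch, the default `occlusion=0.0` for estimates derived from a movement vector is an assumed choice of the constant value, which the text leaves open.

```python
def candidate_score(reliability, occlusion=0.0):
    """Score Si = Ri * (1 - Oj) of a candidate vector.

    reliability : Ri, reliability of the orientation estimate (0..1)
    occlusion   : Oj, occlusion degree of the source camera (0..1);
                  the default 0.0 is an assumed constant for estimates
                  such as the movement-vector one, which have no camera
    """
    return reliability * (1.0 - occlusion)


s_camera = candidate_score(0.8, 0.25)  # camera-based estimate, partly occluded
s_motion = candidate_score(0.5)        # movement-vector estimate
```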

In step S102, based on the calculated scores S, an angle division is performed to determine the direction of the one candidate vector newly generated by the integration. In the present embodiment, as shown in FIG. 9, letting S1 be the score of one of the candidate vectors to be integrated and S2 be the score of the other, the angle ∠B between the two is divided in the ratio S2:S1 from the first candidate vector's side toward the second's, and the dividing direction is taken as the direction of the new integrated vector. The integrated vector thus lies closer to the candidate vector with the higher score.

In FIG. 9, the score S2 of one candidate vector (that of camera 2) is 0.4 and the score S3 of the other (that of camera 3) is 0.6, so ∠B is divided in the ratio 0.6:0.4 from the camera 2 side toward the camera 3 side.

In step S103, the direction obtained by this division becomes the direction of the new integrated candidate vector, which is given a new index i (here, i = 6). In step S104, the score S6 of the new candidate vector is calculated as the sum of the scores of the two integrated candidate vectors (= S2 + S3).
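Steps S102 to S104 can be sketched as follows, assuming that each candidate is an (angle in degrees, score) pair and that the angle between two candidates is measured along the shorter arc. The angles 40° and 60° in the example are made-up values; only the scores 0.4 and 0.6 come from the FIG. 9 description.

```python
def signed_diff(a_deg, b_deg):
    """Shortest signed rotation from a to b in degrees, in [-180, 180)."""
    return (b_deg - a_deg + 180.0) % 360.0 - 180.0


def merge(c1, c2):
    """Integrate two candidates given as (angle_deg, score) pairs.

    The angle between them is divided S2:S1 starting from c1's side, so
    the result lies closer to the higher-scoring candidate; the new score
    is the sum of the two scores. Scores are assumed to be positive.
    """
    (a1, s1), (a2, s2) = c1, c2
    total = s1 + s2
    angle = (a1 + signed_diff(a1, a2) * s2 / total) % 360.0
    return (angle, total)


# Scores as in FIG. 9 (0.4 and 0.6); the angles are hypothetical.
merged = merge((40.0, 0.4), (60.0, 0.6))
# Wrap-around case: 350 and 10 degrees with equal scores meet at 0 degrees.
wrapped = merge((350.0, 1.0), (10.0, 1.0))
```

With these inputs the merged direction is 52°, that is, 12° from the 0.4 candidate and 8° from the 0.6 candidate, a 0.6:0.4 split of the 20° gap.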

Returning to FIG. 11, when the integration of the two candidate vectors is complete, the process returns to step S3, and the above processing is repeated, including the new candidate vector, until no angle below the integration threshold θth remains in step S4. As shown in FIG. 10, once no angle is below θth, the process proceeds to step S6.

In step S6, the direction of the candidate vector with the highest score S at that point is fixed as the final ID-oriented direction. In step S7, the camera used to recognize the ID is selected based on the fixed ID-oriented direction.
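The loop of steps S3 to S6 can be sketched end to end as below. The candidates are again assumed to be (angle in degrees, score) pairs, and the example values and the 30° threshold are made up for illustration. This is one possible reading of the flowchart, not a definitive implementation.

```python
from itertools import combinations


def signed_diff(a_deg, b_deg):
    """Shortest signed rotation from a to b in degrees, in [-180, 180)."""
    return (b_deg - a_deg + 180.0) % 360.0 - 180.0


def integrate(candidates, theta_th):
    """Repeatedly merge the closest pair of (angle_deg, score) candidates
    while their angle is below theta_th, then return the top-scoring one."""
    cands = list(candidates)
    while len(cands) > 1:
        # Steps S3/S4: find the pair with the smallest angle between them.
        i, j = min(combinations(range(len(cands)), 2),
                   key=lambda p: abs(signed_diff(cands[p[0]][0], cands[p[1]][0])))
        if abs(signed_diff(cands[i][0], cands[j][0])) >= theta_th:
            break
        # Step S5: score-weighted angle division, scores summed.
        (a1, s1), (a2, s2) = cands[i], cands[j]
        merged = ((a1 + signed_diff(a1, a2) * s2 / (s1 + s2)) % 360.0, s1 + s2)
        cands = [c for k, c in enumerate(cands) if k not in (i, j)] + [merged]
    # Step S6: the highest-scoring remaining candidate gives the direction.
    return max(cands, key=lambda c: c[1])


# Three roughly consistent estimates and one outlier at 200 degrees.
final = integrate([(40.0, 0.4), (60.0, 0.6), (50.0, 0.5), (200.0, 0.9)], 30.0)
```

In this example the three consistent candidates merge into one with score 1.5, which outweighs the outlier even though the outlier has the highest individual score.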

In the present embodiment, the single camera whose angle is closest to the fixed ID-oriented direction may be selected, or all cameras lying within an angular range of ±φ degrees of the ID-oriented direction may be selected. When multiple cameras are selected, the recognition result obtained from the camera with the higher recognition likelihood is taken as the single final identification result, as described in detail later.
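Both selection rules can be sketched as follows. The sketch assumes that each camera is described by its bearing as seen from the object (the direction from the object toward the camera, in degrees), so that the best camera is the one whose bearing is closest to the ID-oriented direction. This representation and the names used are assumptions.

```python
def angular_gap(a_deg, b_deg):
    """Absolute angle between two directions, in [0, 180]."""
    return abs((b_deg - a_deg + 180.0) % 360.0 - 180.0)


def select_cameras(id_dir_deg, camera_bearings, phi=None):
    """Select cameras for ID recognition.

    camera_bearings : dict of camera name -> bearing from the object (deg)
    phi             : if None, return only the nearest camera; otherwise
                      return every camera within +/- phi degrees
    """
    if phi is None:
        nearest = min(camera_bearings,
                      key=lambda n: angular_gap(id_dir_deg, camera_bearings[n]))
        return [nearest]
    return [n for n, b in camera_bearings.items()
            if angular_gap(id_dir_deg, b) <= phi]


bearings = {"cam1": 0.0, "cam2": 90.0, "cam3": 180.0, "cam4": 270.0}
nearest_only = select_cameras(100.0, bearings)
within_phi = select_cameras(100.0, bearings, phi=90.0)
```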

Even when the ID-oriented direction is fixed, a camera squarely facing that direction does not necessarily exist. The angular difference between the direction squarely facing the ID and the actual camera direction may therefore be calculated in advance and incorporated into the likelihood computed by the object identification unit in the subsequent stage.

Method B. [Method of scoring each camera]
Whereas Method A scores each candidate vector, Method B is characterized by scoring each camera. On the view that the camera squarely facing a candidate vector is the one best suited to recognition, Method B sequentially scores the cameras for each candidate vector so that the squarely facing camera receives the highest score.

FIG. 13 is a diagram showing how each camera is scored in Method B, and FIG. 14 is a flowchart showing the procedure.

In step S21, one candidate vector of interest is selected. In step S22, the camera to be scored is selected. In step S23, the evaluation value Pi of that camera is calculated according to the following equation (2). In the present embodiment, the inner product is used as the index of whether a camera squarely faces the candidate vector, and a function that gives a higher score to a camera with a smaller inner product is adopted.

Pi = Ri × (1 - Oj) × (-cos(Φi - C)) … (2)

Here, Ri is the reliability of each orientation estimation result and Oj is the occlusion degree. Φi is the direction of the candidate vector of interest, and C is the direction in which the camera faces. The cosine term corresponds to the inner product (each vector is assumed to be a unit vector). Because a camera is more desirable the more squarely it faces the candidate vector, the most desirable inner product is -1. The cosine is therefore negated so that this case yields the largest positive value.
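Equation (2) can be sketched as below, assuming that Φi and C are given in degrees in the same field coordinate frame. The function name is hypothetical.

```python
import math


def camera_score(reliability, occlusion, phi_deg, cam_deg):
    """Evaluation value Pi = Ri * (1 - Oj) * (-cos(phi_i - C)).

    phi_deg : direction of the candidate vector of interest
    cam_deg : direction in which the camera is facing
    """
    return reliability * (1.0 - occlusion) * -math.cos(math.radians(phi_deg - cam_deg))


facing = camera_score(1.0, 0.0, 0.0, 180.0)    # camera looks straight at the ID
same_way = camera_score(1.0, 0.0, 0.0, 0.0)    # camera looks the same way as the ID
partial = camera_score(0.8, 0.5, 90.0, 270.0)  # facing, but half occluded
```

A camera facing the candidate vector head-on gets the maximum value, a camera looking in the same direction as the candidate vector gets a negative value, and occlusion scales the value down.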

In step S24, the score Pi is added to the total score ΣPi of the camera of interest, updating that total. In step S25, it is determined whether all cameras have been scored for the current candidate vector. If not, the process returns to step S22, the camera to be scored is switched, and the above processing is repeated.

When all cameras have been scored for the current candidate vector, the process proceeds to step S26, where it is determined whether scoring of the cameras has been completed for all candidate vectors. If not, the process returns to step S21, and the above processing is repeated while switching the candidate vector of interest.

When scoring is complete for all candidate vectors, the process proceeds to step S27, where the recommended camera is selected based on the total score ΣP of each camera. Only the one camera with the highest total score ΣP may be selected, all cameras exceeding a predetermined threshold may be selected, or the top N cameras may be selected.
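The double loop of steps S21 to S27 can be sketched as follows. Each candidate is assumed to be a (direction, reliability, occlusion) triple, where the occlusion belongs to the camera that produced the candidate, and each camera is described by the direction it faces. All values in the example are made up.

```python
import math


def total_scores(candidates, camera_dirs):
    """Accumulate Pi over all candidate vectors for every camera.

    candidates  : list of (phi_deg, reliability, occlusion) triples
    camera_dirs : dict of camera name -> facing direction C in degrees
    """
    totals = {name: 0.0 for name in camera_dirs}
    for phi, reliability, occlusion in candidates:   # step S21
        for name, c in camera_dirs.items():          # step S22
            # Step S23: evaluation value of equation (2).
            p = reliability * (1.0 - occlusion) * -math.cos(math.radians(phi - c))
            totals[name] += p                        # step S24
    return totals


candidates = [(0.0, 0.9, 0.0), (10.0, 0.8, 0.5)]
camera_dirs = {"cam1": 180.0, "cam2": 90.0, "cam3": 0.0}
totals = total_scores(candidates, camera_dirs)

# Step S27: pick the single best camera, or the top N (here N = 2).
best = max(totals, key=totals.get)
top2 = sorted(totals, key=totals.get, reverse=True)[:2]
```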

Although the above description calculates scores for all cameras for each candidate vector of interest, the present invention is not limited to this. For each candidate vector, a subset of cameras predicted to score highly may be selected in advance, for example by the inner product calculation.

In that case, scoring is performed only for the preselected cameras and, as shown in FIG. 13, the scores obtained for the same camera are summed, and the camera with the largest total score may finally be selected.

The object identification unit 6 includes an identification region extraction unit 601, performs ID recognition on the identification region extracted by the identification region extraction unit 601, and identifies each object based on the ID recognition result.

The image from which the identification region is extracted is the image of the object captured by the camera selected by the camera selection unit 5. For an object for which the camera selection unit 5 has selected multiple cameras, an identification region is extracted from each camera image. The identification region is the uniform number portion if the uniform number is the ID, or the license plate portion if a car's license plate is the ID.

FIG. 15 is a diagram showing a method of extracting the identification region when the ID is a uniform number. The uniform number portion is extracted from the whole-body image used for the object orientation estimation.

Methods of extracting the identification region include extracting it based on skeleton information of the person, extracting a predetermined region such as the upper half of the target object's image, extracting it by performing deep learning or the like again, and extracting it based on the centroid position of the silhouette formed when the visual hull of the target object is back-projected onto each camera image. Here, an example of extraction based on a person's skeleton information is described.

Patent Document 5 discloses a technique capable of calculating a person's bones (skeleton) from an image alone. Applying this technique to the target object gives the approximate position of each body part. For a uniform number, the number portion can be extracted with high accuracy once the approximate position of the waist is known.

If there is an angular offset between the ID-oriented direction finally calculated by the ID-oriented direction calculation unit 501 of the camera selection unit 5 and the direction in which the camera faces, a function may be added that improves ID recognition accuracy by applying image processing, such as an affine transformation parameterized by this angle, to the extracted identification region image.

One method by which the object identification unit 6 performs ID recognition on the extracted identification region is to recognize the uniform number by machine learning, as described in Non-Patent Document 1. When machine learning is adopted, a model that takes an image showing a uniform number as input and outputs a predicted recognition result (an estimate of what the number is) is required, so a model for uniform number recognition is first generated using training images.

This model is preferably created in advance. For example, a large number of training images are prepared and a model for uniform number recognition is created using a convolutional neural network. The training images may be produced by preparing many images showing uniform numbers and assigning ground-truth labels manually, or may be generated artificially by superimposing digits rendered in a font onto arbitrary background images. The latter method is efficient because labeled training images are generated automatically and no manual labeling is needed.

Furthermore, if diverse training images are generated from the start by rotating, distorting, and resizing the font, accurate recognition becomes possible even when the uniform number in the extracted image is somewhat oblique or not cleanly cropped.

The model generation method is not limited to a convolutional neural network. As long as the uniform number can be recognized, an approach such as template matching may be used, or identification may be performed with a classifier trained on image features combined with an SVM.

When two or more cameras are selected by the camera selection unit, two or more identification regions are extracted and ID recognition is performed on each. There is no problem if the recognition results agree, but they may disagree, for example "38" from one camera and "39" from another.

In this case, as an approach to selecting the more likely correct result, the probability of each recognition result can be calculated. For example, if the uniform number is recognized with a convolutional neural network, using the softmax function as the activation function of the output layer of the recognition model yields such probabilities.
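For reference, the softmax function maps the raw output-layer values of a classifier to probabilities that sum to 1. The sketch below is a generic, numerically stable implementation. The three input values are made up and do not come from any actual recognition model.

```python
import math


def softmax(logits):
    """Convert raw output-layer values into probabilities summing to 1."""
    m = max(logits)                            # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output-layer values for three candidate numbers.
probs = softmax([2.0, 1.0, 0.1])
most_likely = probs.index(max(probs))
```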

Likewise, a likelihood can be calculated for each recognition result with template matching, an SVM, and the like. A function may therefore be provided that finally determines a single ID from the obtained likelihoods when the results from multiple cameras disagree.

In addition, when two or more ID recognition results are obtained because multiple cameras were selected, the occlusion degree Oj calculated by the object occlusion degree calculation unit 4 may be reflected in the likelihood calculation. For example, since an ID recognition result from a camera in which occlusion is likely is itself likely to be wrong, its likelihood can be lowered according to the occlusion degree Oj so that it is less likely to be adopted.
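One way to resolve conflicting results is sketched below. The text only says that the likelihood is lowered according to Oj, so the specific weighting likelihood × (1 - Oj) used here is an assumption, as are the function name and the example numbers.

```python
def resolve_id(results):
    """Pick one ID from per-camera results.

    results : list of (recognized_id, likelihood, occlusion) triples, one
              per selected camera. The likelihood is discounted by the
              occlusion degree (assumed weighting) before comparison.
    """
    best = max(results, key=lambda r: r[1] * (1.0 - r[2]))
    return best[0]


# One camera reads "38" with little occlusion; another reads "39" with a
# higher raw likelihood but heavy occlusion.
final_id = resolve_id([("38", 0.7, 0.1), ("39", 0.8, 0.6)])
```

Here the discounted likelihoods are 0.63 for "38" and 0.32 for "39", so the less occluded camera wins despite its lower raw likelihood.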

For this processing, the occlusion degree Oj used in the likelihood calculation in the object orientation estimation unit 3 may be used as is, or an occlusion degree Ij for the identification region extracted by the identification region extraction unit 601 may be newly calculated.

For example, by having the object occlusion degree calculation unit 4 obtain the degree of overlap of the back-projected visual hulls only for the image region extracted by the identification region extraction unit 601 and estimated to be the uniform number portion, an occlusion degree Ij indicating how much of the extracted uniform number region is occluded can be calculated.

The result output unit 7 displays the ID recognition results by associating the position coordinates on the frame image of each object estimated by the object position estimation unit 2 with the ID of that object identified by the object identification unit 6.

The results can be displayed in various ways. The position coordinates and ID of each object may simply be displayed as numerical values on a console, or, as in the example shown in FIG. 16, the ID of each object may be tied to its position and displayed graphically as a planar map.

In FIG. 16, round markers indicating the position coordinates of the objects (players) are placed on a background representing one half of a soccer field, and an ID indicating the uniform number is superimposed on each marker.

Such a planar map can also be output for each video frame and animated. In this display, the team to which a player belongs may be determined, for example by acquiring uniform color information from the image, and the marker color changed accordingly on the planar map. Furthermore, an object determined from the color information to be a referee may be judged not to be a player and excluded from the displayed results, or may be shown without an ID so that it can easily be recognized visually as a referee.

1 … camera image acquisition unit, 2 … object position estimation unit, 3 … object orientation estimation unit, 4 … object occlusion degree calculation unit, 5 … camera selection unit, 6 … object identification unit, 7 … result output unit, 301 … object image acquisition unit, 302 … classification unit, 303 … movement vector calculation unit, 304 … reliability acquisition unit, 501 … ID-oriented direction calculation unit, 502 … candidate vector calculation unit, 503 … camera evaluation unit, 601 … identification region extraction unit

Claims

An object identification device that identifies objects based on camera images, comprising:
means for acquiring camera images of objects captured from a plurality of different viewpoints;
means for estimating the position of each object;
means for calculating, for each camera, the degree of occlusion between objects based on the viewpoint of each camera and the position of each object;
means for selecting, for each object, a camera to be used for identifying the object based on the degree of occlusion; and
means for identifying each object based on the camera image of the camera selected for that object.

The object identification device according to claim 1, wherein
each object bears an ID that can be recognized from the camera images,
the device further comprises means for estimating the orientation of each object based on the camera images, and
the means for selecting the camera selects the camera used to recognize the ID of each object based on the orientation and the degree of occlusion of that object.

The object identification device according to claim 2, wherein the means for estimating the orientation of each object comprises at least one of means for estimating the orientation of each object based on an object image acquired from a camera image and means for estimating the orientation of each object based on a movement vector of each object.

The object identification device according to claim 3, wherein the means for estimating the orientation of each object further includes means for acquiring the reliability of the orientation estimation result.

The object identification device according to claim 4, wherein the means for estimating the orientation of each object performs deep-learning-based orientation estimation on the object image, and the means for acquiring the reliability of the orientation estimation result acquires, as the reliability, the output value of the output-layer function in the deep-learning-based orientation estimation.

The object identification device according to claim 4 or 5, wherein, in the orientation estimation based on the movement vector, the means for acquiring the reliability of the orientation estimation result obtains a higher reliability as the moving speed of the object increases.

The object identification device according to any one of claims 4 to 6, wherein the means for selecting the camera comprises:
A means for calculating the ID orientation direction of each object,
A means for calculating a candidate vector for each ID orientation direction of each object,
A means for scoring each candidate vector based on the degree of occlusion and the reliability,
A means for repeatedly integrating, for each object, two candidate vectors whose direction angles differ by less than a predetermined threshold value to generate a new candidate vector, and
A means for scoring the newly generated candidate vector based on the scores of the two integrated candidate vectors,
and wherein a camera is selected based on a candidate vector whose score satisfies a predetermined condition.
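
The repeated integration of candidate vectors can be sketched as below. The claim does not say how the merged direction or the merged score is computed, so two assumptions are made and marked in the code: the new direction is the score-weighted mean of the two directions, and the new score is the sum of the two scores. The names `merge_candidates` and `threshold_deg` are hypothetical.

```python
import math


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def merge_candidates(candidates, threshold_deg=30.0):
    """Repeatedly merge candidate vectors that point in similar directions.

    candidates: list of (angle_deg, score) pairs for one object.
    """
    cands = list(candidates)
    merged = True
    while merged:
        merged = False
        for i in range(len(cands)):
            for j in range(i + 1, len(cands)):
                (a1, s1), (a2, s2) = cands[i], cands[j]
                if angle_diff_deg(a1, a2) < threshold_deg:
                    # Assumption: score-weighted mean direction.
                    x = s1 * math.cos(math.radians(a1)) + s2 * math.cos(math.radians(a2))
                    y = s1 * math.sin(math.radians(a1)) + s2 * math.sin(math.radians(a2))
                    # Assumption: merged score is the sum of both scores.
                    new = (math.degrees(math.atan2(y, x)) % 360.0, s1 + s2)
                    cands = [c for k, c in enumerate(cands) if k not in (i, j)]
                    cands.append(new)
                    merged = True
                    break
            if merged:
                break
    return cands
```

Each merge removes two candidates and adds one, so the loop terminates after at most one merge fewer than the number of input candidates. Candidates supported by several cameras or estimators accumulate score, which lets the selection condition favor directions on which the estimates agree.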

The object identification device according to any one of claims 4 to 6, wherein the means for selecting the camera comprises:
A means for calculating the ID orientation direction of each object,
A means for calculating a candidate vector for each ID orientation direction of each object, and
A means for scoring, for each object, a recommendation degree for each camera based on the orientation of the candidate vector and the orientation of each camera, and repeating this for all the candidate vectors to obtain a cumulative score of the recommendation degree,
and wherein a camera is selected based on a candidate vector whose cumulative score satisfies a predetermined condition.

The object identification device according to claim 8, wherein the means for obtaining the cumulative score of the recommendation degree scores the recommendation degree of each camera based on the inner product of the orientation of the candidate vector and the orientation of each camera.
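
The inner-product scoring can be sketched as follows. The sign convention is an assumption: each camera's orientation is represented here as a unit vector pointing from the object toward that camera, so a larger inner product means the ID faces the camera more directly. The function name `recommend_cameras` and the data layout are hypothetical.

```python
import math


def recommend_cameras(candidate_angles_deg, camera_dirs):
    """Accumulate a recommendation score per camera for one object.

    candidate_angles_deg: directions of the candidate vectors, in degrees.
    camera_dirs: {cam_id: (x, y)} unit vectors from the object toward
                 each camera (assumed convention).
    """
    totals = {cam: 0.0 for cam in camera_dirs}
    for angle in candidate_angles_deg:
        vx = math.cos(math.radians(angle))
        vy = math.sin(math.radians(angle))
        for cam, (cx, cy) in camera_dirs.items():
            # Inner product of candidate direction and camera direction.
            totals[cam] += vx * cx + vy * cy
    return totals


cameras = {"east": (1.0, 0.0), "north": (0.0, 1.0), "west": (-1.0, 0.0)}
totals = recommend_cameras([0.0, 45.0], cameras)
print(max(totals, key=totals.get))
```

With the opposite convention (camera viewing direction pointing from the camera toward the scene), the sign of the inner product would be flipped before accumulation.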

The object identification device according to any one of claims 2 to 9, wherein the means for identifying the object further includes a means for extracting, from the camera image of the object, an identification area containing the ID of the object, and ID recognition is executed on the extracted identification area.

The object identification device according to claim 10, wherein the means for extracting the identification area extracts skeleton information from a camera image of an object and extracts the identification area based on the skeleton information.
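
As an illustration of extracting an identification area from skeleton information, the sketch below builds a torso bounding box from shoulder and hip keypoints, which is where a uniform number would typically sit. The keypoint names, the `margin` parameter, and the function name `torso_region` are assumptions for this example; the claim does not specify which joints are used.

```python
def torso_region(keypoints, margin=0.1):
    """Return (x_min, y_min, x_max, y_max) around the torso, or None.

    keypoints: {joint_name: (x, y)} in image coordinates
               (hypothetical joint names).
    margin: fraction of the box size added on every side.
    """
    names = ("left_shoulder", "right_shoulder", "left_hip", "right_hip")
    pts = [keypoints[n] for n in names if n in keypoints]
    if len(pts) < 2:
        return None  # not enough joints detected to define a region
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    return (min(xs) - margin * w, min(ys) - margin * h,
            max(xs) + margin * w, max(ys) + margin * h)
```

Cropping to such a region before ID recognition keeps the recognizer from being distracted by the background or by neighboring objects.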

The object identification device according to any one of claims 1 to 11, wherein the means for calculating the degree of occlusion calculates the degree of occlusion based on the proportion of other objects present within the region that connects the camera to a range of predetermined width containing the object of interest.
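
A ground-plane reading of the claim above can be sketched as a corridor test: other objects that lie between the camera and the object of interest, within half the predetermined width of the line joining them, are counted as occluders. Treating the "proportion" as the fraction of other objects inside the corridor is an assumption, as are the 2D point representation and the name `corridor_occlusion`.

```python
import math


def corridor_occlusion(camera, target, others, width=1.0):
    """Fraction of other objects inside the camera-to-target corridor.

    camera, target: (x, y) ground-plane positions.
    others: list of (x, y) positions of the other objects.
    width: predetermined corridor width (same units as the positions).
    """
    dx, dy = target[0] - camera[0], target[1] - camera[1]
    length = math.hypot(dx, dy)
    if length == 0 or not others:
        return 0.0
    ux, uy = dx / length, dy / length
    inside = 0
    for ox, oy in others:
        rx, ry = ox - camera[0], oy - camera[1]
        along = rx * ux + ry * uy        # distance along the camera-to-target axis
        across = abs(rx * uy - ry * ux)  # perpendicular distance from that axis
        if 0.0 < along < length and across <= width / 2.0:
            inside += 1
    return inside / len(others)
```

Objects behind the target (`along >= length`) are ignored because they cannot hide it from this camera.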

The object identification device according to any one of claims 1 to 12, wherein the means for calculating the degree of occlusion calculates the degree of occlusion based on the amount of overlap between the mask generated when the visual volume of the object of interest is projected onto the camera and the mask generated when the visual volume of another object is projected onto the camera.
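
The mask-overlap variant can be sketched with masks represented as sets of pixel coordinates. Normalizing the overlap by the area of the target's own mask is an assumption, and this simplified example does not check which object is nearer to the camera; the function name `mask_overlap_occlusion` is hypothetical.

```python
def mask_overlap_occlusion(target_mask, other_masks):
    """Share of the target's projected mask that other objects' masks cover.

    target_mask: set of (row, col) pixels of the object of interest.
    other_masks: iterable of pixel sets, one per other object.
    """
    if not target_mask:
        return 0.0
    covered = set()
    for mask in other_masks:
        # Union of overlaps, so a pixel hidden by two objects counts once.
        covered |= target_mask & mask
    return len(covered) / len(target_mask)


target = {(r, c) for r in range(2) for c in range(4)}    # 8 pixels
other = {(r, c) for r in range(2) for c in range(2, 6)}  # overlaps 4 of them
print(mask_overlap_occlusion(target, [other]))
```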

An object identification method in which a computer identifies objects based on camera images, the method including:
A procedure for acquiring camera images of objects taken from multiple different viewpoints,
A procedure for estimating the position of each object,
A procedure for calculating, for each camera, the degree of occlusion between objects based on the viewpoint of each camera and the position of each object,
A procedure for selecting a camera to be used for identifying each object based on the degree of occlusion for each object, and
A procedure for identifying each object based on the camera image of the camera selected for that object.

An object identification program that identifies objects based on camera images, the program causing a computer to perform:
A procedure for acquiring camera images of objects taken from multiple different viewpoints,
A procedure for estimating the position of each object,
A procedure for calculating, for each camera, the degree of occlusion between objects based on the viewpoint of each camera and the position of each object,
A procedure for selecting a camera to be used for identifying each object based on the degree of occlusion for each object, and
A procedure for identifying each object based on the camera image of the camera selected for that object.