JP2021128540A

JP2021128540A - Information processing device, information processing method and program

Info

Publication number: JP2021128540A
Application number: JP2020022687A
Authority: JP
Inventors: 洋東條; Hiroshi Tojo
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2021-09-02

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency with which a user checks the candidate images obtained as matching results. SOLUTION: An information processing device includes: an object detection unit that detects an object in captured video; a feature generation unit that generates a feature of the object on the basis of features extracted for each part constituting the object; a feature matching unit that matches the feature of an object designated as the query (matching source) against the features of objects in the video; and a visibility determination unit that determines the visibility of each object in the video on the basis of the matching result from the feature matching unit and information about the parts. SELECTED DRAWING: FIG. 3

Description

The present invention relates to an information processing device, an information processing method, and a program.

Some systems monitor multiple locations by installing cameras in convenience stores, shopping malls, airports, and similar places, and connecting the cameras over a network. One use of such a system is to find a specific person, such as a shoplifter, in the current video from multiple cameras. That person was captured a short time earlier by one of the cameras. The search can be performed by matching features extracted from the people in the camera video, such as clothing color. For example, the matching results can be sorted in descending order of score and shown to the user as a list of candidate images, which are thumbnails containing the person region. The user can then look at each candidate image and confirm whether the candidate is the same person as the query.

Thumbnail images let the user view a large number of results at a glance, but their resolution is low, so visually checking the candidate images is not easy. A person can be partly hidden in the image, for example by other people in a crowded scene, by background objects such as shelves in a convenience store, or by belongings such as an umbrella. When this happens, the area of the thumbnail that can be used to check the candidate person becomes smaller and visibility decreases. Patent Document 1 discloses a technique that assists the user's check by displaying a score for each part of a person (eyes, nose, mouth, etc.) as a pie chart or the like. Patent Document 2 discloses a technique that calculates a score for each of several types of person characteristics (face, clothing, gait, etc.). It then displays the matching results side by side, with the results for each type sorted separately.

Patent Document 1: Japanese Unexamined Patent Publication No. 2014-11523
Patent Document 2: Japanese Unexamined Patent Publication No. 2019-16098

With the technique of Patent Document 1, the displayed per-part scores make it easier for the user to check a candidate image that has low visibility. However, when there are many candidate images, scores for a large number of parts must be displayed, which makes the results harder to scan as a list. The technique of Patent Document 2 lets the user list the matching results for several types. However, the user must compare those result lists to work out which entries correspond to the same candidate. Checking therefore becomes inefficient, particularly for candidate images with low visibility.

The present invention has been made in view of these circumstances. Its object is to improve the efficiency of the user's work of checking candidate images obtained as matching results.

An information processing device according to the present invention includes:
- a detection means that detects a subject in captured video;
- a generation means that generates a feature of the subject on the basis of features extracted for each part constituting the subject;
- a matching means that matches the feature of a subject designated as the query against the features of subjects in the video; and
- a determination means that determines the visibility of a subject in the video on the basis of the matching result from the matching means and information about the parts.

According to the present invention, the user can check candidate images obtained as matching results more efficiently.

FIG. 1 is a diagram showing a configuration example of a system to which the information processing device according to the present embodiment is applied.
FIG. 2 is a diagram showing an example hardware configuration of the information processing device according to the present embodiment.
FIG. 3 is a diagram showing an example functional configuration of the information processing device according to the present embodiment.
FIG. 4 is a flowchart showing an example of the feature extraction process in the present embodiment.
FIG. 5 is a flowchart showing an example of the subject matching process in the present embodiment.
FIG. 6 is a diagram explaining how easily each part can be identified.
FIG. 7 is a diagram showing an example of image display in the present embodiment.
FIG. 8 is a diagram showing another example of image display in the present embodiment.

Embodiments of the present invention will now be described with reference to the drawings. In the following description, the subject to be processed is a whole person (whole body). The invention is not limited to this, however, and applies equally to other subjects such as a person's face or a vehicle.

FIG. 1 is a diagram showing a configuration example of a system to which the information processing device according to the present embodiment is applied. The system in this embodiment includes imaging devices (cameras) 101 to 103 and an information processing device 105. The imaging devices 101 to 103 and the information processing device 105 are connected to a network 104 and can exchange data with one another. FIG. 1 shows an example with three imaging devices 101 to 103, but the invention is not limited to this, and the system may include any number of imaging devices.

Each of the imaging devices (cameras) 101 to 103 captures video and includes an imaging lens, an image sensor such as a CCD or CMOS sensor, a video signal processing unit, and the like. The imaging devices 101 to 103 transmit the captured video over the network 104. The information processing device 105 matches subjects across the camera video received over the network 104. This allows it, for example, to detect a subject moving from one camera's view to another.

FIG. 2 is a block diagram showing an example hardware configuration of the information processing device 105 according to the present embodiment. The information processing device 105 in this embodiment includes a CPU 201, a ROM 202, a RAM 203, a secondary storage device 204, an input device 205, a display device 206, and a network I/F 207. These components are communicably connected to one another via a bus 208.

The CPU (Central Processing Unit) 201 executes processing according to programs stored in the ROM 202 or the RAM 203. By executing these programs, the CPU 201 controls the functional units of the information processing device 105. It also performs processes such as the feature extraction process and the subject matching process.

The ROM (Read Only Memory) 202 is a non-volatile memory. It stores the program for executing the processing according to the present embodiment, together with other programs and data required for control. The RAM (Random Access Memory) 203 is a volatile memory that stores temporary data such as frame image data and pattern discrimination results.

The secondary storage device 204 is a rewritable storage device such as a hard disk drive, a solid state drive, or a flash memory. It stores image information, image processing programs, various setting information, and the like. This information is transferred to the RAM 203 and used when the CPU 201 executes a program.

The input device 205 is a keyboard, a mouse, or the like, and receives input from the user. The display device 206 is a cathode ray tube (CRT) display, a liquid crystal display, or the like, and displays processing results and other information to the user. The network I/F 207 is an interface for connecting to a network such as the Internet or an intranet.

The information processing device 105 in the present embodiment reads software (a program) from the secondary storage device 204, the RAM 203, or the like, and executes it on the CPU 201. This software implements the processing for each step of the flowcharts described later.

FIG. 3 is a block diagram showing an example functional configuration of the information processing device 105 according to the present embodiment. The information processing device 105 in this embodiment includes a video acquisition unit 301, a video storage unit 302, a subject detection unit 303, a basic feature extraction unit 304, a mask extraction unit 305, a feature generation unit 306, and a feature storage unit 307. It also includes a query designation unit 308, a feature matching unit 309, a visibility determination unit 310, and a display unit 311.

The video acquisition unit 301 acquires the video captured by the imaging devices (cameras) 101 to 103 by receiving it from them over the network 104.

The video storage unit 302 stores the video acquired by the video acquisition unit 301 together with a camera ID and a frame ID. The video storage unit 302 consists of, for example, the RAM 203 or the secondary storage device 204. The camera ID identifies an imaging device. The frame ID identifies one frame image within the multi-frame video captured by that imaging device. Together, the camera ID and the frame ID identify which frame image was captured by which imaging device.

The subject detection unit 303 detects subject regions in the video acquired by the video acquisition unit 301. The basic feature extraction unit 304 extracts basic features, such as color, edges, and texture, from the image of each subject region detected by the subject detection unit 303. The mask extraction unit 305 extracts mask images that mask out everything except predetermined parts constituting the subject. These parts include, for example, the head, arms, torso, and legs.

The feature generation unit 306 builds a feature map from the basic features extracted by the basic feature extraction unit 304. It masks this map with the mask images extracted by the mask extraction unit 305, which yields a feature for each part. For each subject, the feature generation unit 306 then reduces the dimensionality of the per-part features and concatenates them. The result is the subject's feature, a vector of a predetermined dimensionality. The feature storage unit 307 stores each subject feature generated by the feature generation unit 306 in association with information about the parts, the camera ID, and the frame ID. The feature storage unit 307 consists of, for example, the RAM 203 or the secondary storage device 204.

The query designation unit 308 designates the subject that serves as the query (matching source) in the subject matching process. The feature matching unit 309 reads subject features from the feature storage unit 307 and matches them against the feature of the subject designated by the query designation unit 308. Based on the matching results, the feature matching unit 309 also reads frame images from the video storage unit 302, crops the subject region, and generates a thumbnail image of a predetermined size.
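The thumbnail step can be illustrated with a short sketch. It crops a subject rectangle from a frame held as a NumPy array and resizes the crop to a fixed size by nearest-neighbor sampling. The sketch rests on several assumptions that do not come from the source:
- The region is given as the upper-left and lower-right corners (x1, y1, x2, y2), matching the example representation of subject regions given for step S403.
- x2 and y2 are treated as exclusive bounds.
- The function name `make_thumbnail`, the 64 × 128 default size, and the choice of nearest-neighbor resampling are illustrative.

```python
from typing import Tuple

import numpy as np


def make_thumbnail(frame: np.ndarray, box: Tuple[int, int, int, int],
                   size: Tuple[int, int] = (64, 128)) -> np.ndarray:
    """Crop box = (x1, y1, x2, y2) from an (H, W, C) frame and resize the crop
    to size = (width, height) by nearest-neighbor sampling.

    Assumption: x2 and y2 are exclusive bounds.
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    # Clip the rectangle to the frame so boxes touching the border stay valid.
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        raise ValueError("subject region does not overlap the frame")
    crop = frame[y1:y2, x1:x2]
    out_w, out_h = size
    # Map each output row/column back to a source row/column of the crop.
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows[:, None], cols[None, :]]
```

A real system would more likely use a library resize with interpolation. Nearest-neighbor indexing keeps this sketch dependent on NumPy alone.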

The visibility determination unit 310 determines the visibility of each candidate image to be displayed as a matching result obtained by the feature matching unit 309. The display unit 311 presents the matching results from the feature matching unit 309 to the user as thumbnail images. It chooses how to display them based on the determination by the visibility determination unit 310. The display unit 311 consists of, for example, the display device 206.

Next, the processing performed by the information processing device 105 in the present embodiment will be described.
First, the feature extraction process, which generates subject features from camera video, will be described. FIG. 4 is a flowchart showing an example of the feature extraction process in the present embodiment.

In step S401, the video acquisition unit 301 acquires the video captured by the imaging devices (cameras) 101 to 103 over the network 104, one frame image at a time. Each frame comes with the camera ID assigned in advance to its camera as identification information.
Next, in step S402, the frame image acquired in step S401 is stored in the video storage unit 302 together with its camera ID and frame ID.

Next, in step S403, the subject detection unit 303 detects subjects in the frame image acquired in step S401. One specific method for detecting subject regions in a frame image is described in Reference 1 below.
(Reference 1) U.S. Patent Application Publication No. 2007/0237387

The method of Reference 1 works as follows:
- A detection window of a predetermined size is scanned over the input image. The pattern image cut out from each window position is classified into one of two classes, person or not a person.
- For this classification, AdaBoost combines many weak classifiers into a single, more accurate classifier. The weak classifiers use HOG (Histograms of Oriented Gradients) features.
- These classifiers are then connected in series to form a cascade detector. Simple early-stage classifiers immediately discard candidate patterns that are clearly not the subject. Only the remaining candidates reach the more complex later-stage classifiers, which have higher discrimination performance and decide whether the pattern is a person.
- Because the detection window is scanned over the entire frame image, every subject is detected even when a frame contains several.
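The early-rejection logic of a boosted cascade can be sketched as follows. This is a generic illustration of the technique, not the implementation of Reference 1. Each stage is an AdaBoost-style weighted vote of weak classifiers, and `x` stands for the descriptor computed at one detection-window position (for example, a HOG vector). The decision stumps (`stump`), their weights, and the stage thresholds are hypothetical example values.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# A weak classifier votes +1 (person) or -1 (not a person) for one feature vector.
WeakClassifier = Callable[[np.ndarray], int]
# One cascade stage: weak classifiers, their AdaBoost weights, and a rejection threshold.
Stage = Tuple[Sequence[WeakClassifier], Sequence[float], float]


def stump(dim: int, thr: float, polarity: int = 1) -> WeakClassifier:
    """Hypothetical decision stump on a single dimension of the feature vector."""
    return lambda x: polarity if x[dim] >= thr else -polarity


def stage_score(x: np.ndarray, weak: Sequence[WeakClassifier],
                alphas: Sequence[float]) -> float:
    # Strong classifier of one stage: weighted vote of its weak classifiers.
    return float(sum(a * h(x) for h, a in zip(weak, alphas)))


def cascade_is_person(x: np.ndarray, stages: List[Stage]) -> bool:
    """Evaluate stages in order (cheapest first) and reject at the first failure."""
    for weak, alphas, threshold in stages:
        if stage_score(x, weak, alphas) < threshold:
            return False  # early rejection: the costlier later stages never run
    return True
```

Most windows in a frame contain no person, so most evaluations stop at the first, cheapest stage. This is where the speed advantage of the cascade structure comes from.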

The subject in this embodiment is a whole person (whole body), but the method described above can also be applied to other subjects. For example, to treat faces as subjects, the classifier of Reference 1 can be trained for faces. The detection method above is only an example, and the invention is not limited to it. Subject regions may instead be detected with a convolutional neural network (CNN), background subtraction, or a similar method.

Using the method described above, the subject detection unit 303 can detect subject regions in the video (frame images). A subject region is represented, for example, by the x and y coordinates of the upper-left and lower-right corners of a rectangle enclosing the person. The origin is the upper-left corner of the frame image. The representation is not limited to this; any method that uniquely specifies the region can be used.

Next, in step S404, the basic feature extraction unit 304 extracts basic features from the image of each subject region detected in step S403. In this embodiment, basic features are low-level features such as color, edges, and texture. The basic feature extraction unit 304 uses, for example, a CNN such as the ResNet described in Reference 2 below, and takes the output feature map as the basic features. The invention is not limited to this, and a CNN with another architecture, such as AlexNet or VGG, may be used. Alternatively, the basic features may be any of the following:
- a color histogram;
- LBP (Local Binary Pattern) features;
- HOG features;
- features extracted with a Gabor filter, a Schmid filter, or the like.
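Of the non-CNN alternatives listed, a color histogram is the simplest to illustrate. The sketch below computes a per-channel histogram over a subject-region image and concatenates the channels into one L1-normalized vector. It assumes an (H, W, 3) uint8 region, and the bin count of 8 is an arbitrary example value.

```python
import numpy as np


def color_histogram(region: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenated per-channel histogram of an (H, W, 3) uint8 region, L1-normalized."""
    hists = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(region.shape[2])]
    h = np.concatenate(hists).astype(np.float64)
    # Normalize so regions of different sizes are comparable (guard against empty input).
    return h / max(h.sum(), 1.0)
```

Unlike a CNN feature map, this histogram discards all spatial layout. It is therefore a global descriptor, and the part masking described for step S406 would have to be applied to the pixels before the histogram is taken.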

(Reference 2) K. He et al., "Deep Residual Learning for Image Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016

Next, in step S405, the mask extraction unit 305 extracts part masks (mask images) for predetermined parts constituting the subject. For example, following Reference 3 below, the mask extraction unit 305 trains a CNN that outputs one score map per part type (K types in this embodiment). The training data consists of subject images paired with ground-truth part mask images. Part mask images are then obtained by thresholding the resulting part score maps.
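The thresholding step can be sketched as below. The sketch assumes the CNN output is a (K, H, W) array of per-part scores, and the threshold of 0.5 is an illustrative value only. It also returns the pixel area of each part, since part area is one of the kinds of part information mentioned for step S408.

```python
from typing import Tuple

import numpy as np


def part_masks(score_maps: np.ndarray,
               threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
    """Threshold K per-part score maps into binary masks.

    score_maps: (K, H, W) array, one score map per part type.
    Returns (masks, areas): boolean (K, H, W) masks and the pixel area of each part.
    """
    masks = score_maps >= threshold
    areas = masks.reshape(masks.shape[0], -1).sum(axis=1)
    return masks, areas
```

With independent per-part thresholds, a pixel can belong to several parts or to none. The source does not say whether masks are made mutually exclusive.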

(Reference 3) G. Oliveira, A. Valada, C. Bollen, W. Burgard, and T. Brox, "Deep Learning for Human Part Discovery in Images," ICRA, 2016

Step S405 is not limited to this method; any method that yields part mask images indicating the part regions may be used. For example, Reference 4 below trains a predetermined number (K) of part detectors on a dataset of triplets. Each triplet consists of an image of the query person, another image of the same person, and an image of a different person.
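A common way to train on such triplets is the triplet margin loss, sketched below. This is an assumption about how the triplets are used, not a statement of Reference 4's exact objective. In training, the three inputs would be embeddings produced by the network for the three images, and the margin of 0.2 is an example value.

```python
import numpy as np


def triplet_margin_loss(anchor: np.ndarray, positive: np.ndarray,
                        negative: np.ndarray, margin: float = 0.2) -> float:
    """Hinge loss that is zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor vs. same person
    d_neg = np.linalg.norm(anchor - negative)  # anchor vs. different person
    return float(max(0.0, d_pos - d_neg + margin))
```

Minimizing this loss pulls images of the same person together and pushes different people apart. This is why part detectors trained this way tend to focus on regions that help re-identification.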

(Reference 4) L. Zhao et al., "Deeply-Learned Part-Aligned Representations for Person Re-Identification," IEEE International Conference on Computer Vision (ICCV), 2017

Part detectors could also be built with a method such as that of Reference 1. In that case, only a bounding rectangle is obtained for each part rather than a part mask image, so pixels outside the part also contribute to the feature. Processing in keeping with the spirit of the present invention is still possible, however, and similar effects can be obtained.

Next, in step S406, the feature generation unit 306 takes the feature map obtained by the basic feature extraction unit 304 in step S404. It masks this map with the masks obtained by the mask extraction unit 305 in step S405 and extracts a feature for each part. In this embodiment there are K parts, so K features are extracted.
Then, in step S407, the feature generation unit 306 reduces each of the K features extracted in step S406 to D dimensions and concatenates them. The result is the subject feature, a (K × D)-dimensional vector.
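Steps S406 and S407 can be sketched as follows. The source does not specify how the masked feature map is pooled or how each part feature is reduced to D dimensions. The sketch therefore assumes two things:
- masked average pooling over each part's pixels;
- a hypothetical per-part linear projection from C to D dimensions, which could come from PCA or a learned layer.

The feature map and masks are assumed to share the same spatial resolution.

```python
import numpy as np


def subject_feature(feature_map: np.ndarray, masks: np.ndarray,
                    projections: np.ndarray) -> np.ndarray:
    """Build a (K * D,) subject feature from per-part masked features.

    feature_map: (C, H, W) float basic-feature map
    masks:       (K, H, W) boolean part masks at the same resolution
    projections: (K, D, C) hypothetical per-part reduction matrices (C -> D)
    """
    parts = []
    for mask, proj in zip(masks, projections):
        m = mask.astype(feature_map.dtype)
        # Masked average pooling: mean feature over the pixels inside this part
        # (an empty mask yields a zero vector instead of dividing by zero).
        pooled = (feature_map * m).sum(axis=(1, 2)) / max(float(m.sum()), 1.0)
        parts.append(proj @ pooled)  # reduce this part's feature to D dimensions
    return np.concatenate(parts)     # concatenation order = part type order
```

Because every part is reduced to the same D dimensions, part k always occupies the same slice of the vector. This fixed layout is what makes the dimension-based indexing of step S408 possible.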

In the description above, the feature map is generated first and the part masks afterward. Alternatively, the part masks may be created first and the features extracted from the masked images.

Next, in step S408, the subject feature obtained in step S407 is stored in the feature storage unit 307. Each dimension of the feature is associated with information about the part from which it was obtained. In this embodiment, as an example, the part information consists of the part type ID and the part area. The center coordinates of the part, or the part mask image itself, may also be used as part information. The association can use the dimension as an index. For example, dimensions 1 to D of the feature vector hold the feature information of part K1, dimensions D+1 to 2D hold that of part K2, and so on. For later thumbnail generation, the following are also associated with the subject feature:
- the coordinates of the subject region;
- a subject ID identifying the subject;
- the frame ID and camera ID of the frame currently being processed.
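The record stored in step S408 and the dimension-based indexing can be sketched as below. The class `SubjectRecord`, the helper `part_slice`, and the in-memory `store` are hypothetical names. The sketch keys the store by (camera ID, subject ID) so that a query subject can later be looked up from the camera and subject the user picks. This keying assumes subject IDs are unique within a camera, which the source does not state. Indices are 0-based, so part K1 of the text (k = 0) occupies indices 0 to D-1.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

import numpy as np


@dataclass
class SubjectRecord:
    camera_id: int
    frame_id: int
    subject_id: int
    box: Tuple[int, int, int, int]  # subject region (x1, y1, x2, y2) for thumbnails
    feature: np.ndarray             # concatenated (K * D,) subject feature
    part_areas: np.ndarray          # (K,) part info: area per part, indexed by part type ID


def part_slice(k: int, D: int) -> slice:
    """Dimensions of part k (0-based) in the concatenated feature: [k*D, (k+1)*D)."""
    return slice(k * D, (k + 1) * D)


# Hypothetical in-memory store keyed by (camera ID, subject ID).
store: Dict[Tuple[int, int], SubjectRecord] = {}


def save(record: SubjectRecord) -> None:
    store[(record.camera_id, record.subject_id)] = record
```

For example, `store[(1, 3)].feature[part_slice(1, 3)]` returns the three dimensions of the second part for subject 3 of camera 1.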

The information processing device 105 repeats steps S404 to S408 until it determines in step S409 that all subjects detected in step S403 have been processed. In other words, it repeats these steps until features have been extracted from every subject in the frame image.

The information processing device 105 also repeats steps S401 to S409 until it determines in step S410 that all cameras whose video can be acquired over the network 104 have been processed.
This completes the feature extraction process for subjects. The entire process is repeated until the user instructs it to end.

Next, the subject matching process will be described. FIG. 5 is a flowchart showing an example of the subject matching process in the present embodiment.

In step S501, the query designation unit 308 accepts the user's designation of the person to search for (the query subject). For example, the user points at the person in camera video shown on the display device 206, using the mouse of the input device 205. The feature of the target person is then read from the feature storage unit 307, using the camera ID of the designated camera and the subject ID of the designated subject. The feature is temporarily stored in the RAM 203.

Next, in step S502, the feature matching unit 309 reads a subject feature from the feature storage unit 307 together with its associated part information.
Next, in step S503, the feature matching unit 309 matches the feature of the query subject designated by the user in step S501 against the subject feature read in step S502.

In this embodiment, as an example, the Euclidean distance is used for feature matching. Another distance measure, such as the L1 distance or the cosine distance, may be used instead. The distance between the two features is then converted into a score indicating the degree of match, ranging from 0 to 1000 in this embodiment. The minimum distance for a single part, denoted d_min, is determined experimentally. The distance corresponding to the maximum score of 1000 is set to K × d_min. Given the distance dist between two subject features, the score S is obtained by Equation 1.

In step S503, for use in the subsequent visibility determination, the feature amount collation unit 309 also calculates a per-part score S_K in addition to the score over all parts, that is, for the whole person (whole body). The score S_K of each part can be obtained by setting K = 1 in (Equation 1) and substituting the distance of that single part for dist.
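
Because (Equation 1) itself is not reproduced in this text, the sketch below uses a hypothetical stand-in score function. It assumes that K is the number of parts, maps any distance up to K × d_min to the maximum score of 1000, and lowers the score in inverse proportion to the distance beyond that. It also assumes that the whole-body feature is the concatenation of the part features. Only the structure (a whole-body score plus per-part scores S_K computed with K = 1) follows the text.

```python
import math

SCORE_MAX = 1000.0  # scores are expressed in the range 0 to 1000


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hypothetical_score(dist, d_min, k):
    """HYPOTHETICAL stand-in for (Equation 1), not the patent's formula:
    distances up to k * d_min score 1000, larger ones fall off as 1/dist."""
    ref = k * d_min
    if dist <= ref:
        return SCORE_MAX
    return SCORE_MAX * ref / dist


def collate(query_parts, gallery_parts, d_min):
    """Return (whole-body score, {part: per-part score S_K}).

    query_parts / gallery_parts map a part name to its feature vector.
    """
    parts = sorted(query_parts)  # fixed part order for concatenation
    k = len(parts)
    # Assumption: the whole-body feature is the part features concatenated.
    q_all = [v for p in parts for v in query_parts[p]]
    g_all = [v for p in parts for v in gallery_parts[p]]
    whole = hypothetical_score(euclidean(q_all, g_all), d_min, k)
    # Per-part score: K = 1 and dist = distance of that single part.
    per_part = {
        p: hypothetical_score(euclidean(query_parts[p], gallery_parts[p]), d_min, 1)
        for p in parts
    }
    return whole, per_part
```

An occluded part produces a low S_K even when the whole-body score stays high, which is what the later visibility determination relies on.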

In step S504, the feature amount collation unit 309 determines whether the feature amount collation processing of steps S502 to S503 has been performed on the feature amounts of all the subjects in the frame images acquired from the camera to be collated. If it determines that some subject in the frame images has not yet undergone the collation processing (NO), the process returns to step S502, and the feature amount collation unit 309 performs the collation processing of steps S502 to S503 on the feature amount of the unprocessed subject. If, on the other hand, it determines that the collation processing has been completed for all the subjects in the frame images (YES), the process proceeds to step S505.

In step S505, the feature amount collation unit 309 sorts all the candidates obtained by the feature amount collation processing in descending order of score. A candidate whose score is below a predetermined threshold may be judged not to be the same person as (not similar to) the collation source and excluded from the subsequent processing steps.
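
The loop of steps S502 to S505 can be sketched as follows, assuming each stored record is a (camera ID, frame ID, subject ID, feature) tuple and that score_fn returns a score between 0 and 1000. The record layout, the Candidate class, and the optional threshold parameter are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    camera_id: int
    frame_id: int
    subject_id: int
    score: float


def rank_candidates(query_feature, stored_subjects, score_fn, threshold=None):
    """Collate the query against every stored subject (S502-S504),
    then sort by descending score and optionally drop weak matches (S505)."""
    candidates = []
    for camera_id, frame_id, subject_id, feature in stored_subjects:
        score = score_fn(query_feature, feature)  # S503
        candidates.append(Candidate(camera_id, frame_id, subject_id, score))
    candidates.sort(key=lambda c: c.score, reverse=True)  # S505
    if threshold is not None:
        # Below-threshold candidates are treated as "not the same person".
        candidates = [c for c in candidates if c.score >= threshold]
    return candidates
```

Keeping the camera ID and frame ID on each candidate is what allows step S506 to fetch the right frame afterwards.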

Next, in step S506, the feature amount collation unit 309 reads out, for every candidate in the collation result, the corresponding frame image from the video storage unit 302 based on the camera ID and the frame ID. The feature amount collation unit 309 then cuts out the image of the subject region from that frame image based on the coordinates of the subject region and resizes it to a predetermined size, for example by reduction processing, thereby generating a thumbnail image to be displayed as a candidate image.
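
A dependency-free sketch of the cut-out and resize in step S506 follows. It assumes a frame is a list of pixel rows and a subject region is an (x, y, width, height) box. It uses nearest-neighbour sampling as a simple stand-in for whatever reduction processing an actual implementation would use, and the default 64 × 128 thumbnail size is hypothetical.

```python
def crop(frame, box):
    """Cut the subject region (x, y, w, h) out of a frame given as rows of pixels."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def resize_nearest(image, out_w, out_h):
    """Nearest-neighbour resize to out_w x out_h pixels."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def make_thumbnail(frame, box, size=(64, 128)):
    """Return the candidate thumbnail: crop, then resize to size = (width, height)."""
    out_w, out_h = size
    return resize_nearest(crop(frame, box), out_w, out_h)


# Usage: an 8 x 8 test frame whose pixel value encodes its position.
frame = [[r * 8 + c for c in range(8)] for r in range(8)]
print(make_thumbnail(frame, (2, 2, 4, 4), size=(2, 2)))  # [[18, 20], [34, 36]]
```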

Next, in step S507, the visibility determination unit 310 determines the visibility of the subject in each candidate image based on the collation result obtained by the feature amount collation unit 309. The visibility of a candidate image correlates with how much of the subject is visible. Visibility is highest when the entire subject is visible; the more the subject is occluded, or the more it is affected by lighting (such as the shadow of a building), the smaller the clearly visible area becomes. If the effect of occlusion or the like is large, the score does not rise in the subject collation processing and the subject is not judged to be the query person; if the effect is limited to part of the subject, however, its influence on the overall score is small, so the subject still appears as a candidate. Therefore, in the present embodiment, how much of each part constituting the subject is visible is estimated from the per-part scores. The ratio of the summed area of the high-scoring parts to the area of the entire subject is used as a measure of how much of the subject is visible, and visibility is evaluated on that basis.
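
The area-ratio measure described above can be sketched as follows. The score threshold that decides which parts count as high-scoring is a hypothetical parameter, and the subject's total area is assumed to be the sum of its part areas.

```python
def visible_area_ratio(part_areas, part_scores, score_threshold):
    """Fraction of the subject's area covered by parts whose per-part
    score S_K is at least score_threshold (hypothetical cut-off)."""
    total = sum(part_areas.values())
    if total == 0:
        return 0.0
    visible = sum(
        area for part, area in part_areas.items()
        if part_scores[part] >= score_threshold
    )
    return visible / total


# Usage: the head is occluded (low score), so only the body counts as visible.
areas = {"head": 100, "upper_body": 300, "lower_body": 400}
scores = {"head": 150, "upper_body": 900, "lower_body": 800}
print(visible_area_ratio(areas, scores, score_threshold=500))  # 0.875
```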

Visibility also depends on the type of part. In general, humans are highly capable of discriminating faces. The upper body also shows more variation in clothing than the lower body, so it is relatively easy to identify. Therefore, a value indicating how easily each type of part can be identified (identification ease) is defined in advance for each part type. The values are defined so that they sum to 1, as in the example shown in FIG. 6.
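
Since the values of FIG. 6 are not reproduced here, the raw weights below are hypothetical. The sketch only shows how per-part identification-ease values can be scaled so that they sum to 1, as the text requires, while preserving the ordering head > upper body > lower body.

```python
def normalize_ease(raw_weights):
    """Scale non-negative per-part weights so they sum to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {part: w / total for part, w in raw_weights.items()}


# Hypothetical raw weights (not the values in FIG. 6).
ease = normalize_ease({"head": 5, "upper_body": 3, "lower_body": 2})
print(ease)  # {'head': 0.5, 'upper_body': 0.3, 'lower_body': 0.2}
```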

Based on the above, in the present embodiment the visibility V, which represents the degree of visibility, is defined by (Equation 2) in terms of the part area A_K, the per-part score S_K, and the per-part identification ease E_K.
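
(Equation 2) is not reproduced in this text, so the following is only a hypothetical form that combines the three quantities it names. Each part's score S_K (scaled to 0..1) is weighted by its area A_K times its identification ease E_K, and the result is normalized so that V = 1 when every part scores the maximum of 1000. It should not be read as the patent's actual definition.

```python
def hypothetical_visibility(areas, scores, ease, score_max=1000.0):
    """HYPOTHETICAL visibility V (not the patent's Equation 2):
    V = sum(A_K * E_K * S_K / score_max) / sum(A_K * E_K), in [0, 1]."""
    weights = {p: areas[p] * ease[p] for p in areas}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[p] * scores[p] / score_max for p in areas) / total


areas = {"head": 100, "upper_body": 300, "lower_body": 400}
ease = {"head": 0.5, "upper_body": 0.3, "lower_body": 0.2}
# Head fully occluded (S_K = 0), body parts matched perfectly.
v = hypothetical_visibility(areas, {"head": 0, "upper_body": 1000, "lower_body": 1000}, ease)
```

With these made-up numbers V comes out at roughly 0.77: losing the head costs more than its area share alone would suggest, because the head also carries the largest identification ease.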

The visibility determination unit 310 temporarily stores the calculated visibility V in the RAM 203 in association with the current candidate image.

Next, in step S508, the visibility determination unit 310 determines whether the visibility determination processing has been performed on all the candidate images. If it determines that there is a candidate image for which the visibility determination processing has not been performed (NO), the process returns to step S507, and the visibility determination unit 310 performs the visibility determination processing on the unprocessed candidate image. If, on the other hand, it determines that the visibility determination processing has been completed for all the candidate images (YES), the process proceeds to step S509.

In step S509, the display unit 311 displays the thumbnail images (candidate images) generated in step S506 based on the visibility V calculated in step S507. There are several ways to display the candidate images based on the visibility V; examples are described below.

The first method is to display the candidate images while excluding those whose visibility V is equal to or less than a predetermined value. For example, when the user will rush to the scene immediately after checking the collation result, there is neither the need nor enough time to scrutinize every candidate. In such a case, reducing the number of displayed candidate images lightens the user's burden. Therefore, candidate images with low visibility are thinned out based on the visibility V before display.
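
The first method can be sketched as a filter over the score-sorted candidate list. The candidate IDs, the visibility mapping, and the v_min parameter name are illustrative assumptions.

```python
def thin_out(ranked_ids, visibility, v_min):
    """Method 1: drop candidates whose visibility V is <= v_min,
    keeping the remaining ones in their original (score) order."""
    return [cid for cid in ranked_ids if visibility[cid] > v_min]


ranked = ["c7", "c1", "c4", "c2"]
vis = {"c7": 0.9, "c1": 0.3, "c4": 0.6, "c2": 0.5}
print(thin_out(ranked, vis, v_min=0.5))  # ['c7', 'c4']
```

Because the filter preserves order, the display still lists the remaining candidates by descending score.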

The second method is to highlight candidate images whose visibility V is equal to or less than a predetermined value. When the actions of the collation-source person are to be examined thoroughly, every candidate must be scrutinized. In such a case, the low-visibility candidate images are highlighted so that the user is clearly shown which candidates need careful checking.

The second method will be described with reference to FIG. 7A. In FIG. 7A, 701 is a window that displays the collation result. Reference numeral 702 denotes a thumbnail image of the collation source (query). Reference numeral 703 denotes an area that displays thumbnail images of the candidates (gallery); in the example shown in FIG. 7A, five candidate images are displayed. In image 704 the head is hidden by an umbrella, and in image 705 part of a leg is hidden by an overlapping person, so their visibility is reduced (their visibility V is equal to or less than the predetermined value). Images 704 and 705 are displayed with marks 706 and 707, respectively, indicating the reduced visibility.

In the example shown in FIG. 7A, a specific mark is attached to each low-visibility candidate image, but any other method may be used as long as the user can identify the low-visibility candidate images. For example, an outer frame may be added to the thumbnail image. Alternatively, the value of the visibility V may be displayed below each thumbnail image and shown in red when it is equal to or less than the predetermined value.
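
The variant just described, showing V under each thumbnail and flagging values at or below the threshold, can be sketched as follows. The label format and the dictionary layout are hypothetical, and the actual rendering (mark, outer frame, or red text) is left to the UI layer.

```python
def annotate(ranked_ids, visibility, v_min):
    """Method 2: keep every candidate, but attach a V label and a flag
    telling the UI to highlight low-visibility ones (V <= v_min)."""
    return [
        {
            "id": cid,
            "label": f"V={visibility[cid]:.2f}",
            "low_visibility": visibility[cid] <= v_min,
        }
        for cid in ranked_ids
    ]
```

Unlike the first method, nothing is removed; every candidate stays on screen and only its presentation changes.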

The third method is to display candidate images whose visibility V is equal to or less than a predetermined value side by side with the query image of the collation source. This makes it easy to compare a low-visibility candidate with the collation source, which simplifies the user's checking work.

The third method will be described with reference to FIG. 7B. In FIG. 7B, 711 is a window for checking low-visibility candidate images and is displayed separately from the window 701 shown in FIG. 7A. Reference numeral 712 denotes a thumbnail image of the collation source (query), and reference numeral 713 denotes an area that displays thumbnail images of low-visibility candidates (gallery). Because image 712 and the images in area 713 are displayed next to each other, they are easy to compare. Image 712 and the images in area 713 may also be enlarged for display. In that case, super-resolution processing such as that described in Reference 5 may be applied to make checking even easier. Furthermore, because the third method displays multiple low-visibility candidates together, the user can check them in succession.

(Reference 5) J. Kim, J. K. Lee, and K. M. Lee, "Accurate Image Super-Resolution Using Very Deep Convolutional Networks," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

In addition, by providing a subject tracking unit that determines which subject detected in the previous frame corresponds to each subject detected in the current frame, images of the subject region in the frames before and after a low-visibility candidate may be displayed. Various tracking methods exist; one example associates the subject regions in the previous and current frames whose center positions are closest to each other. Any other method that can associate subjects between frames may also be used, such as pattern matching that uses the subject region of the previous frame as the template. Because subjects associated between frames are given the same subject ID, the images of the subject region in the preceding and following frames can be read out based on the subject ID.
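
The nearest-center association mentioned above can be sketched as a greedy matcher. It assumes subject regions are (x, y, w, h) boxes; the optional max_dist gate and the fresh-ID policy for unmatched detections are illustrative assumptions rather than details from the text.

```python
import math


def center(box):
    """Center (cx, cy) of an (x, y, w, h) subject region."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def associate(prev, curr, next_id, max_dist=math.inf):
    """Greedy nearest-center association between consecutive frames.

    prev: {subject_id: box} from the previous frame.
    curr: list of boxes detected in the current frame.
    Returns ({subject_id: box} for the current frame, updated next_id).
    Unmatched current boxes receive fresh subject IDs.
    """
    pairs = sorted(
        (math.dist(center(pb), center(cb)), pid, j)
        for pid, pb in prev.items()
        for j, cb in enumerate(curr)
    )
    assigned = {}  # index in curr -> subject_id
    used = set()  # subject_ids already matched
    for d, pid, j in pairs:
        if d > max_dist:
            break  # pairs are sorted, so all remaining ones are farther
        if pid in used or j in assigned:
            continue
        assigned[j] = pid
        used.add(pid)
    result = {}
    for j, cb in enumerate(curr):
        if j not in assigned:
            assigned[j] = next_id
            next_id += 1
        result[assigned[j]] = cb
    return result, next_id
```

Matching the globally closest pairs first avoids an early detection grabbing a subject that a later detection fits better.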

The display method in this case will be described with reference to FIG. 8. In FIG. 8, 801 is a window showing details of a candidate image selected, for example, with a cursor in the image display shown in FIG. 7B. The thumbnail image 804 of the selected candidate and the thumbnail images 803 and 805 of the same subject extracted from the frames before and after it are displayed side by side in chronological order. Among the frames of a series of movements from which no features were extracted, there may be an image, such as image 805, in which the subject is less occluded than in image 804. Presenting a series of images can also be expected to make checking easier than a single thumbnail image does. Instead of being displayed side by side simultaneously, the images from different times in the video may be played back as a moving image.
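
Once tracking has given every frame of a subject the same subject ID, gathering the neighbouring thumbnails shown in FIG. 8 reduces to a lookup by frame ID. The per-subject track dictionary and the window parameter below are hypothetical.

```python
def neighbor_thumbnails(track, frame_id, window=1):
    """Return the thumbnails of one subject from frame_id - window to
    frame_id + window, in chronological order.

    track: {frame_id: thumbnail} for a single subject ID.
    """
    return [track[f] for f in sorted(track) if abs(f - frame_id) <= window]


track = {10: "t10", 11: "t11", 12: "t12", 13: "t13"}
print(neighbor_thumbnails(track, 12))  # ['t11', 't12', 't13']
```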

In the present embodiment, the whole body of a person is used as an example of the subject, but the invention can also be applied to other objects. For example, it is effective when a face is partially covered by accessories such as sunglasses or a mask, or when part of a face is temporarily hidden by a hand or the like.

As described above, according to the present embodiment, the visibility of a candidate image can be determined based on the per-part collation scores, the identification ease, and the like of the parts constituting the subject. In addition, low-visibility candidate images can be displayed so that the user can check them easily. This improves the efficiency of the user's work of checking candidate images.

In the above-described embodiment, the information processing device processes video received from the imaging devices via the network, but some functions of the information processing device may be incorporated into the imaging devices so that the processing is distributed. For example, the imaging device may perform processing up to subject detection and transmit video of only the subject regions to the information processing device via the network.

(Other Embodiments of the Present Invention)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or device via a network or storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

It should be noted that the above-described embodiments are merely examples of how the present invention may be embodied, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

101 to 103: Imaging device
104: Network
105: Information processing device
301: Video acquisition unit
302: Video storage unit
303: Subject detection unit
304: Basic feature extraction unit
305: Mask extraction unit
306: Feature amount generation unit
307: Feature amount storage unit
308: Collation source designation unit
309: Feature amount collation unit
310: Visibility determination unit
311: Display unit

Claims

A detection means that detects the subject from the captured image,
A generation means for generating the feature amount of the subject based on the features extracted for each part constituting the subject, and
A collation means for collating the feature amount of the subject, which is the designated collation source, with the feature amount of the subject in the image.
An information processing device comprising a determination means for determining the visibility of the subject in the video based on a collation result by the collation means and information on the part.

The information processing apparatus according to claim 1, wherein the determination means determines the visibility of the subject in the image based on the collation result of each part.

The information processing apparatus according to claim 1 or 2, wherein the information about the part includes at least one of the area of the part and the ease of identification of the part.

The information processing apparatus according to claim 3, wherein the determination means calculates a visibility based on the collation result for each part and the information about the part, and determines the visibility of the subject in the image based on the calculated visibility.

4. The information processing apparatus according to item 1.

The information processing apparatus according to any one of claims 1 to 5, further comprising a display means for displaying an image corresponding to the collation result by the collation means based on the determination result by the determination means.

The information processing apparatus according to claim 6, wherein the display means displays an image of a subject in the image similar to the subject of the collation source.

The information processing apparatus according to claim 7, wherein the display means selects an image to be displayed based on the determination result of the determination means.

The information processing device according to claim 7, wherein the display means highlights an image based on the determination result of the determination means.

The information processing device according to claim 7, wherein the display means enlarges and displays an image based on the determination result of the determination means.

The information processing apparatus according to claim 7, wherein the display means performs super-resolution processing on an image based on the determination result of the determination means.

The information processing apparatus according to claim 7, wherein the display means displays, side by side, an image of the subject of the collation source and an image of the subject in the video based on the determination result of the determination means.

The information processing apparatus according to claim 7, wherein the display means simultaneously displays images of the subject captured at different times in the video based on the determination result of the determination means.

A detection step that detects the subject from the captured image,
A generation step of generating a feature amount of the subject based on the features extracted for each part constituting the subject, and
A collation step for collating the feature amount of the subject, which is the designated collation source, with the feature amount of the subject in the image,
An information processing method comprising a determination step of determining the visibility of the subject in the video based on the collation result in the collation step and information on the part.

A detection step that detects the subject from the captured image,
A generation step of generating a feature amount of the subject based on the features extracted for each part constituting the subject, and
A collation step for collating the feature amount of the subject, which is the designated collation source, with the feature amount of the subject in the image,
A program for causing a computer to perform a determination step of determining the visibility of the subject in the video based on the collation result in the collation step and information on the part.