JP2019082959A

JP2019082959A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2019082959A
Application number: JP2017211169A
Authority: JP
Inventors: Masahiro Matsushita (松下昌弘)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-10-31
Filing date: 2017-10-31
Publication date: 2019-05-30

Abstract

To successively output newly retrieved results while maintaining the order of the retrieved results.
SOLUTION: In an information processing method according to the present invention, a multi-dimensional vector is calculated from query data. For a storage means in which first vectors to be searched are registered in a plurality of groups in a multi-dimensional space, the method determines the shortest distance the query vector can have to the multi-dimensional vectors registered in each group of an index. These shortest distances set the order in which the groups are compared with the multi-dimensional vector of the query data. The registered multi-dimensional vectors of each group are then compared with the query vector in that order. Registered vectors whose distance to the query vector is shorter than the shortest distance determined for the next group to be compared are output as retrieval results.
SELECTED DRAWING: Figure 4

Description

The present invention relates to a technique for searching for a target object.

An apparatus of the following kind is conventionally known (see Patent Document 1). It detects a person's face in each frame of a surveillance video, calculates an image feature amount from the face, and stores that feature amount in association with the video frame. With this apparatus, the face of the person being sought can be used as a query and matched against the stored image feature amounts, and the videos showing that person can be displayed.

Non-Patent Document 1 discloses a search method that uses image feature amounts of human faces and progressively expands the search range. The search is terminated, and its results displayed, under either of two conditions:
- the number of search results is sufficient; or
- the results are sufficiently reliable, meaning the ratio of the number of votes for the first-ranked candidate to that for the second-ranked candidate exceeds a predetermined value.

Patent Document 1: JP 2009-199322 A
Patent Document 2: JP 2002-373332 A
Patent Document 3: JP 2010-165156 A

Non-Patent Document 1: Keisuke Maekawa, Yuzuko Utsumi, Masakazu Iwamura, Koichi Kise (Osaka Prefecture University), "Realization of 34 ms matching against a 1-million face image database" (in Japanese), IEICE Technical Report, PRMU (Pattern Recognition and Media Understanding), 111(353), pp. 95-100, 2011-12-08.
Non-Patent Document 2: Erik Murphy-Chutorian, "Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation," in Proc. IEEE Conf. Intelligent Transportation Systems, 2007, pp. 709-714.
Non-Patent Document 3: C. Harris and M. J. Stephens, "A combined corner and edge detector," in Alvey Vision Conference, pp. 147-152, 1988.
Non-Patent Document 4: David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 60(2), 2004, pp. 91-110.

The methods of Patent Document 1 and Non-Patent Document 1 both output and display results only after every search target has been searched. Searching video from many surveillance cameras, or long videos, therefore takes time before any result is available. With this prior art, the observer (user) must wait a long time after issuing a search instruction before the results can be checked. The same problem arises when searching for objects other than persons, not only in surveillance camera video.

Outputting and displaying the results obtained so far, while the search is still running, shortens the time before the observer (user) can check them. However, when the results are updated and new ones are added, the order of the results already presented should be preserved. An object of the present invention is therefore to output new search results one after another while maintaining the order of the search results.

The present invention is an information processing apparatus that works with a storage means in which first vectors to be searched are registered in a plurality of groups in a multi-dimensional space. From that storage means, the apparatus outputs first vectors similar to a second vector representing a feature amount calculated from query data. The apparatus comprises:
- an input means for inputting the query data;
- a calculation means for calculating the second vector from the input query data;
- a first determination means for determining, as a shortest distance, the shortest distance that a first vector registered in a group and the second vector can take;
- a second determination means for determining, based on the determined shortest distances, the order in which the plurality of groups are compared with the second vector; and
- an output means for comparing the first vectors with the second vector group by group in the determined order, and for outputting as search results, from each of the plurality of groups, the first vectors whose distance to the second vector is shorter than the shortest distance.

According to the present invention, new search results can be output one after another while the order of the search results is maintained.

FIG. 1 is a block diagram showing the hardware configuration of the information processing apparatus according to the first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the information processing apparatus according to the first embodiment.
FIG. 3 is a conceptual diagram showing the feature space of the face image features stored by the feature storage unit according to the first embodiment.
FIG. 4 is a conceptual diagram showing the feature space of the face image features searched by the search unit according to the first embodiment.
FIG. 5 is a view showing display examples of search results by the search result display unit according to the first embodiment.
FIG. 6 is a flowchart showing the procedure for accumulating face image features according to the first embodiment.
FIG. 7 is a flowchart showing the procedure for searching for a face image according to the first embodiment.
FIG. 8 is a conceptual diagram showing the feature space of the face image features stored by the feature storage unit according to the second embodiment.
FIG. 9 is a conceptual diagram showing the feature space of the face image features searched by the search unit according to the second embodiment.
FIG. 10 is a flowchart showing the procedure for accumulating face image features according to the second embodiment.
FIG. 11 is a flowchart showing the procedure for searching for a face image according to the second embodiment.
FIG. 12 is a conceptual diagram showing an example of search results from each index in the third embodiment.

[First Embodiment]
The first embodiment of the present invention is described in detail below with reference to the drawings.

In this embodiment, a face image feature is calculated from the image of a person in video captured by a surveillance camera. The feature is stored in association with camera information, the shooting time, and the like. A face image search is then performed based on a face image given as a query (search source), and results are displayed one after another while the search is still in progress.

FIG. 1 is a block diagram showing an example hardware configuration of an information processing apparatus 100 that constitutes a server apparatus or a client apparatus in this embodiment. The server apparatus and the client apparatus may each be implemented by a single information processing apparatus, or their functions may be distributed over multiple apparatuses as needed. When multiple apparatuses are used, they are connected via a LAN (Local Area Network) or the like so that they can communicate with each other. The information processing apparatus can be implemented by a device such as a personal computer (PC) or a workstation (WS).

In FIG. 1, a CPU (Central Processing Unit) 101 controls the entire information processing apparatus 100. A ROM (Read Only Memory) 102 stores programs and parameters that do not need to be changed. A RAM (Random Access Memory) 103 temporarily stores programs and data supplied from an external device or the like.

An external storage device 104 is a storage device, such as a hard disk or a memory card, fixedly installed in the information processing apparatus 100. It may instead be a medium removable from the information processing apparatus 100, such as a flexible disk (FD), an optical disk such as a CD, a magnetic or optical card, an IC card, or a memory card.

Each operation described below is carried out by the CPU 101 executing a program stored in the ROM 102 or the external storage device 104.

An input device interface 105 is an interface to an input device 109, such as a pointing device or a keyboard, that accepts user operations and inputs data. An output device interface 106 is an interface to a monitor 110 that displays data held by, or supplied to, the information processing apparatus 100. A communication interface 107 connects to a network line 111 such as the Internet. A network camera 112 is a video imaging device, such as a surveillance camera, connected to the information processing apparatus 100 via the network line 111. A system bus 108 is a transmission path that communicably connects the units described above.

FIG. 2 is a block diagram showing an example functional configuration of the information processing apparatus 100 according to this embodiment. A video input unit 201 receives video data (a sequence of images) from the network camera 112 via the communication interface 107. A video storage unit 202 stores that video data in the external storage device 104. At this time, information such as the shooting time and the camera that captured the video is stored as video metadata in association with the video data.

A tracking processing unit 203 tracks persons in the video input from the video input unit 201. For person tracking, a known technique such as that of Patent Document 2 may be used. In the method of Patent Document 2, an object is detected from motion vectors, its search position in the next frame is estimated, and the person is tracked by template matching.

The tracking processing unit 203 issues the same tracking track ID to tracks that follow the same person, and different tracking track IDs to tracks of different persons. This guarantees uniqueness, so the same person can be identified from the tracking track ID. Even for the same person, however, a new tracking track ID is issued if tracking is interrupted.

A face detection unit 204 detects faces in each frame image of the person tracked by the tracking processing unit 203. It also detects faces in video containing face images that is input by the video input unit 201 or by a query video input unit 208 described later. To detect a person's face in an image, a known technique such as that of Patent Document 3 may be used. Candidate regions for single eyes are detected in the image to be processed, and pairs are formed from these candidate regions. The face region is then determined from the positions of the paired eyes.

A representative face image determination unit 205 selects a representative face image from the group of frame images of a tracked person. For example, an image with a large face size, as detected by the face detection unit 204, is selected, because a larger face image yields more accurate image features.

The reason is face size normalization, which scales the face image to a fixed size before image features are calculated. If the face image is larger than that size, it is reduced and relatively little information is lost. If it is smaller, pixel interpolation such as super-resolution processing is required, and the information degrades severely.
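The size-based selection rule can be sketched as follows, under stated assumptions:
- The `FaceDetection` record, the 64-pixel normalization size and the helper names are hypothetical.
- The document only says that larger detected faces are preferred, because shrinking to the normalized size loses less information than enlarging.

```python
from dataclasses import dataclass

# Hypothetical normalized face size in pixels; the document gives no value.
NORMALIZED_SIZE = 64


@dataclass
class FaceDetection:
    frame_index: int  # frame within the tracked person's image group
    width: int        # detected face box width in pixels
    height: int       # detected face box height in pixels


def pick_representative(faces):
    """Return the detection with the largest face area."""
    return max(faces, key=lambda f: f.width * f.height)


def normalization_scale(face, target=NORMALIZED_SIZE):
    """Scale factor that maps the face's longer side to `target`.

    A value below 1 means reduction (little information lost); a value
    above 1 means enlargement, which requires interpolating pixels.
    """
    return target / max(face.width, face.height)
```

For detections of 40x48, 80x90 and 60x60 pixels, the 80x90 face is chosen, and it only needs to be reduced (scale 64/90).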

Several images may also be selected from the frame image group as representative face images, for example images with different face orientations. This is because images of the same person yield different image features when the face orientation differs.

To detect a person's face orientation in an image, a known technique such as that of Non-Patent Document 2 may be used. That technique extracts a histogram of oriented gradients (HOG) as the feature amount and estimates the face orientation by support vector regression (SVR). A HOG feature histograms the luminance gradient information of an image for each local region, and is known to be robust to local noise and to changes in image brightness. Because the chosen features are robust to variations unrelated to face orientation, such as noise and illumination changes, face orientation can be estimated stably even in real environments.

An image with little blur may also be selected as the representative face image. As with still cameras, the shutter speed of a video camera may change with the brightness of the scene. Dark scenes or fast-moving subjects can therefore blur the face image, which directly degrades the image feature amounts and attribute information. To estimate blur, the frequency components of the face image region are obtained and the ratio of low-frequency to high-frequency components is calculated. When the proportion of low-frequency components exceeds a predetermined value, the image can be judged to be blurred.

The representative face image may also be selected according to whether the eyes are closed or the mouth is open. Closed eyes or an open mouth may alter the image features of those facial organs, so such images are not selected as representative face images.
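A minimal sketch of the frequency-ratio blur test is shown below. It measures the share of spectral energy at low frequencies and flags the image as blurred when that share exceeds a threshold. Thresholding low/(low + high) is equivalent to thresholding low/high, since one is a monotonic function of the other.

Several details are assumptions, not taken from the document:
- the mean is removed before transforming;
- "low frequency" means a normalized radial frequency of at most 0.25;
- the threshold is 0.9;
- a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

# Hypothetical parameters; the document only says "a predetermined value".
CUTOFF = 0.25     # normalized radial frequency treated as "low"
THRESHOLD = 0.9   # low-frequency energy share above which a face counts as blurred


def low_frequency_share(img, cutoff=CUTOFF):
    """Share of mean-removed spectral energy at normalized radius <= cutoff.

    Uses a naive O(N^4) 2-D DFT so the sketch has no dependencies;
    a practical implementation would use an FFT.
    """
    h, w = len(img), len(img[0])
    mean = sum(map(sum, img)) / (h * w)
    low = total = 0.0
    for v in range(h):
        for u in range(w):
            coeff = sum(
                (img[y][x] - mean) * cmath.exp(-2j * math.pi * (u * x / w + v * y / h))
                for y in range(h)
                for x in range(w)
            )
            energy = abs(coeff) ** 2
            # Map DFT indices to signed frequencies, normalized so that Nyquist = 1.
            fx = (u if u <= w // 2 else u - w) / (w / 2)
            fy = (v if v <= h // 2 else v - h) / (h / 2)
            total += energy
            if math.hypot(fx, fy) <= cutoff:
                low += energy
    # A perfectly flat patch has no detail at all; treat it as fully low-frequency.
    return 1.0 if total == 0 else low / total


def is_blurred(img, threshold=THRESHOLD):
    return low_frequency_share(img) > threshold
```

On an 8x8 patch, a slowly varying cosine is reported as blurred, while a pixel-level checkerboard, whose energy sits at the Nyquist frequency, is not.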

In this embodiment, the tracking processing unit 203 tracks a person, and the face detection unit 204 detects the tracked person's face. The representative face image determination unit 205 then selects a representative face image from the tracked person's frame images. Alternatively, the face detection unit 204 may detect faces in the video input to the video input unit 201 and pass every detected face image to a feature calculation unit 206 described below.

The feature calculation unit 206 calculates face image features. This embodiment uses LBP (Local Binary Pattern) features computed by dividing the whole face into blocks. This feature is only an example, and the embodiment is not limited to it. Alternatively, organ points such as the eyes and mouth may be located in the face image and SIFT (Scale Invariant Feature Transform) features calculated at each organ point, or features may be calculated by deep learning. These face image features are represented as multi-dimensional vectors.
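A block-divided LBP feature can be sketched as follows. The image is split into a grid of blocks, each block gets a histogram of LBP codes, and the histograms are concatenated into one multi-dimensional vector.

The document does not specify the LBP variant or the block layout. The choices below are assumptions:
- the basic 3x3, 8-neighbour operator with 256 bins;
- one histogram per block, normalized within the block;
- a `grid` parameter for the number of blocks per side;
- border pixels are skipped.

```python
# 8-neighbour offsets, clockwise from the top-left (assumed bit order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Basic LBP code: bit i is set when neighbour i is >= the centre pixel."""
    centre = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


def block_lbp_feature(img, grid=2):
    """Concatenate per-block 256-bin LBP histograms over a grid x grid division."""
    h, w = len(img), len(img[0])
    feature = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * 256
            y0, y1 = by * h // grid, (by + 1) * h // grid
            x0, x1 = bx * w // grid, (bx + 1) * w // grid
            # Skip the outermost pixels, which lack a full 3x3 neighbourhood.
            for y in range(max(y0, 1), min(y1, h - 1)):
                for x in range(max(x0, 1), min(x1, w - 1)):
                    hist[lbp_code(img, y, x)] += 1
            n = sum(hist) or 1
            feature.extend(count / n for count in hist)  # per-block normalization
    return feature
```

With `grid=2` the vector has 4 x 256 = 1024 dimensions. A uniform image puts all of each block's mass in bin 255, because every neighbour equals its centre.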

A feature storage unit 207 stores face image features (search target vectors) in the external storage device 104. These features are calculated from the video input to the video input unit 201 by way of the tracking processing unit 203, the face detection unit 204, the representative face image determination unit 205 and the feature calculation unit 206. Each face image feature is stored with metadata such as the person ID, the tracking track ID assigned when the person was tracked, the shooting time and the camera.

An index is also created so that similar face image features can be searched quickly:
- Face image features are registered in groups, and a representative of each group is stored.
- During a search, the query is first compared with the group representatives to narrow down the groups to be compared.
- The query is then compared only with the face image features registered in those groups.

This avoids comparing the query with every registered face image feature and makes fast search possible. In this embodiment, the index also stores, for each group, the longest distance between the group's representative and the face image features registered in that group. This distance is used in the sequential search described later.

FIG. 3 is a conceptual diagram showing the feature space of the face image features stored by the feature storage unit 207. The face image features used in this embodiment are multi-dimensional vectors, for example 256-dimensional, but for simplicity they are illustrated here in two dimensions.

A point 301 is a feature in the feature space. The feature space is divided into several regions by the k-means method or the like, and each feature is registered in the cluster produced by the division. A region 302 of the divided feature space is a cluster, and a line 303 dividing the feature space is a boundary between clusters. In a multi-dimensional feature space, the clusters are separated by hyperplanes.
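As an illustration of the division step, the following is a minimal Lloyd's k-means over plain tuples. It uses Euclidean distance and initial centers drawn from the data with a fixed seed. The document says only "the k-means method or the like", so the initialization, iteration limit and function names are assumptions.

```python
import math
import random


def assign(points, centers):
    """Index of the nearest center for every point (Euclidean distance)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k randomly chosen input points
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster becomes empty.
            new_centers.append([sum(c) / len(members) for c in zip(*members)] if members else centers[j])
        if new_centers == centers:  # converged: assignments no longer move the centers
            break
        centers = new_centers
    return centers, assign(points, centers)
```

Because each point goes to its nearest center, the resulting cells are bounded by hyperplanes midway between centers, which matches the boundaries drawn as lines 303.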

A triangle 304 is the feature representing each cluster (representative vector). The representative feature may be any of the following:
- the mean of the multi-dimensional vectors of the face image features in the cluster;
- the centroid of the cluster;
- the feature closest to that mean or centroid.

For each cluster, the feature storage unit 207 also stores the longest distance r (for example, 305) between the cluster's representative feature and the features registered in it. All features of the cluster therefore lie inside a circle 306 of radius r centered on the representative feature. In a multi-dimensional feature space, this circle is a hypersphere.
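The per-cluster bookkeeping described above can be sketched as follows. The mean vector is taken as the representative, and the radius r is the longest distance from it to any member. The `{cluster_id: (representative, r, members)}` layout is a hypothetical choice for illustration. If the feature closest to the mean is used as the representative instead, r must be recomputed relative to that feature.

```python
import math


def build_index(points, labels):
    """Per-cluster representative (mean vector) and radius r.

    Returns {cluster_id: (representative, r, members)}, where r is the
    longest distance from the representative to any member, so every
    member lies inside the hypersphere of radius r around it.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    index = {}
    for lab, members in clusters.items():
        rep = [sum(c) / len(members) for c in zip(*members)]
        r = max(math.dist(rep, p) for p in members)
        index[lab] = (rep, r, members)
    return index
```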

The query video input unit 208 inputs the face image, specified by the observer (user), of the person to be searched for. Face images of persons stored in the external storage device 104 are first displayed on the monitor 110, and the user specifies one through the input device 109. The query video input unit 208 then inputs the specified face image. In this embodiment, the face image used for the search may be specified by any method, and one or more face images may be used.

A search unit 209 searches the many face image features stored in the external storage device 104. Its query is the face image feature calculated from the query image (search source image) by the face detection unit 204 and the feature calculation unit 206. This query vector has the same number of dimensions as the search target vectors.

Face images whose face image feature similarity is equal to or greater than a predetermined threshold TH1 are identified as search results. In this embodiment, the similarity is obtained by normalizing the reciprocal of the LBP feature distance. While the search is in progress, the search unit 209 outputs each result as it is found to a search result display unit 210.
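The document says only that the reciprocal of the distance is normalized. The mapping 1/(1 + d) below is therefore an assumed normalization, and the function names are hypothetical.

What matters for the rest of the method is that similarity decreases strictly as distance increases. As a result:
- a similarity threshold TH1 corresponds to a maximum distance;
- ascending-distance output is the same as descending-similarity output.

```python
def similarity(distance):
    """Assumed normalization of the reciprocal distance into (0, 1]."""
    return 1.0 / (1.0 + distance)


def distance_limit(th1):
    """Largest distance whose similarity is still >= th1 (inverse of similarity())."""
    return 1.0 / th1 - 1.0
```

For example, TH1 = 0.5 admits exactly the features within distance 1.0 of the query.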

FIG. 4 is a conceptual diagram showing the feature space of the face image features searched by the search unit 209. The features actually used in the search are multi-dimensional vectors, for example 256-dimensional, but for simplicity they are illustrated here in two dimensions.

In FIG. 4:
- a point 401 is a feature in the feature space;
- a region 402 is a cluster;
- a line 403 is a boundary between clusters;
- a triangle 404 is the feature representing each cluster (representative vector);
- 405 is the longest distance r between a cluster's representative feature and the features in that cluster.

Three of the clusters 402 are named here G-1, G-2 and G-3.

A star 406 is the query face image feature, which is assumed here to lie in cluster G-1. The search unit 209 first compares the query face image feature with the representative feature of each cluster and calculates the distance 407 (dk, k = 1, 2, ...). For each cluster, it then estimates the shortest distance between the query face image feature and the image features registered there. This estimate uses the longest distance 405 (rk, k = 1, 2, ...) between the cluster's representative feature and its features.

Take cluster G-2 as an example. The distance between the query face image feature and the representative feature of cluster G-2 is d2, and the longest distance between that representative feature and the features in the cluster is r2. Then (d2 - r2) is the shortest distance from the query face image feature to the circle of radius r2 centered on the representative feature of cluster G-2.

No image feature of cluster G-2 lies closer to the query face image feature than (d2 - r2). The distance between the query face image feature and any image feature registered in cluster G-2 is therefore at least (d2 - r2). In other words, this shortest distance is the smallest distance that the query face image feature and an image feature of the cluster of interest can possibly have.
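The bound follows from the triangle inequality, assuming the distance is a metric such as the Euclidean distance. The notation below is introduced for illustration:
- q is the query feature;
- c_k is the representative of cluster G-k;
- x is any feature registered in G-k, so that ||x - c_k|| <= r_k.

$$
\|q - x\| \;\ge\; \|q - c_k\| - \|x - c_k\| \;\ge\; d_k - r_k, \qquad d_k = \|q - c_k\|.
$$

When d_k - r_k is negative, the query lies inside the circle and the only valid bound is 0. The usable lower bound is therefore max(0, d_k - r_k).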

The search unit 209 estimates, in the same way, the shortest distance between the query face image feature and the image features registered in each of the other clusters. It first compares the query with the image features registered in the cluster that contains the query image feature. It then compares the query with the image features of the remaining clusters in ascending order of estimated shortest distance.
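The comparison order can be sketched as follows. The cluster containing the query comes first, followed by the remaining clusters sorted by the lower bound max(0, dk - rk). Two assumptions apply:
- "the cluster containing the query" is taken to be the one with the nearest representative, which holds for k-means cells;
- the `{cluster_id: (representative, r)}` layout and the function name are hypothetical.

```python
import math


def cluster_order(query, index):
    """Return [(lower_bound, cluster_id), ...] in comparison order.

    index: {cluster_id: (representative, r)}. The cluster containing the
    query (assumed: nearest representative) comes first; the others follow
    in ascending order of max(0, d_k - r_k).
    """
    bounds = {cid: max(0.0, math.dist(query, rep) - r) for cid, (rep, r) in index.items()}
    home = min(index, key=lambda cid: math.dist(query, index[cid][0]))
    rest = sorted((b, cid) for cid, b in bounds.items() if cid != home)
    return [(bounds[home], home)] + rest
```

For a query at (0.5, 0), the clusters are visited in the order G-1, G-2, G-3. Here G-1 has its representative at (0, 0) with r = 1, G-2 at (5, 0) with r = 1, and G-3 at (0, 10) with r = 2. The bound for G-2 is 4.5 - 1 = 3.5.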

After comparing the query face image feature with the image features registered in a cluster, the search unit 209 outputs the comparison results whose distance is shorter than the shortest distance estimated for the next cluster to be compared. In FIG. 4, for example:
- After the comparison with cluster G-1, which contains the query face image feature, the next cluster to be compared is G-2. The results inside a circle 408 are output; its radius is the shortest distance (d2 - r2) estimated for G-2.
- After the comparison with cluster G-2, the next cluster is G-3. The results inside a circle 409 are output; its radius is the shortest distance (d3 - r3) estimated for G-3.

In this way, only results that are closer (more similar) than the shortest distance estimated for the next cluster to be compared are output at each step. Search results can therefore be output one after another in guaranteed ascending order of distance, that is, in descending order of similarity.
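The sequential search can be sketched as a Python generator, under the same assumptions as above: Euclidean distance, and the cluster containing the query taken to be the one with the nearest representative. The index layout and names are hypothetical.

The steps are:
1. Keep comparison results that have not yet been output in a min-heap.
2. After each cluster is processed, emit every pending result that is closer than the next cluster's lower bound.
3. After the last cluster, flush whatever remains.

```python
import heapq
import math


def incremental_search(query, index):
    """Yield (distance, item_id) pairs in non-decreasing distance order.

    index: {cluster_id: (representative, r, members)}, where members is a
    list of (item_id, vector) pairs and r is the cluster radius.
    """
    def lower_bound(cid):
        rep, r, _ = index[cid]
        return max(0.0, math.dist(query, rep) - r)

    # The cluster containing the query (assumed: nearest representative) goes
    # first; the rest follow in ascending order of their lower bound.
    home = min(index, key=lambda cid: math.dist(query, index[cid][0]))
    order = [home] + sorted((cid for cid in index if cid != home), key=lower_bound)

    pending = []  # min-heap of (distance, item_id) compared but not yet output
    for i, cid in enumerate(order):
        for item_id, vec in index[cid][2]:
            heapq.heappush(pending, (math.dist(query, vec), item_id))
        # order[1:] is sorted, so the next cluster has the smallest bound of all
        # unvisited clusters: nothing not yet compared can be closer than this.
        threshold = lower_bound(order[i + 1]) if i + 1 < len(order) else math.inf
        while pending and pending[0][0] < threshold:
            yield heapq.heappop(pending)
```

Results still pending when a cluster is finished stay in the heap. For example, a feature found in G-1 that lies outside circle 408 can be emitted after G-2 has been compared, and this carry-over is what keeps the overall output in ascending distance order.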

The search result display unit 210 displays the search results on the monitor 110 as they are successively output by the search unit 209. FIG. 5 shows display examples. For example, as shown in FIG. 5A, the results may be displayed on the monitor 110 in order of similarity, with images placed further to the left in the upper row being more similar. Alternatively, as shown in FIG. 5B, the results may be grouped by camera and then displayed in order of similarity. In both FIGS. 5A and 5B, the search result display unit 210 successively displays the results output by the search unit 209. In the present embodiment, because the search unit 209 outputs results in descending order of similarity, each newly added result is less similar than those already displayed. Therefore, even if the user stops checking the displayed results partway through, a result more similar than those already seen will not appear later. Any display method may be used as long as it can successively display the results successively output by the search unit 209; the method is not limited to these examples.

Next, the process of storing video data input from the video input unit 201 in a searchable form will be described in detail with reference to FIG. 6. FIG. 6 is a flowchart showing the procedure for accumulating face image features according to the present embodiment. This process corresponds to the processing performed by the units from the video input unit 201 through the feature storage unit 207 described above.

In step S601, the video input unit 201 receives video data from the network camera 112 via the communication interface 107.

In step S602, the video storage unit 202 stores the video data input in step S601 in the external storage device 104. In addition, information such as the shooting time and the camera that captured the video is stored in association with it as video metadata.

Steps S603 and S604 are performed by the tracking processing unit 203. First, in step S603, the tracking processing unit 203 detects persons in each frame image and tracks them. Each detected person is assigned a separate person ID per frame image, which is temporarily stored together with the person's coordinates in that frame image. A person being tracked is assigned the same tracking track ID across frames, which is temporarily stored together with the IDs of the frame images in which the person is tracked.

In step S604, the tracking processing unit 203 determines whether tracking of any person has been interrupted. If so, the group of tracked images for that person is now complete, and the process proceeds to step S605. If not, the process returns to step S601 to continue tracking.

In step S605, the face detection unit 204 performs face detection on each of the frame images containing the person tracked by the tracking processing unit 203.

In step S606, the face detection unit 204 determines whether a face was detected in step S605. If a face was detected, the process proceeds to step S607; otherwise, the process ends.

In step S607, the representative face image determination unit 205 selects one or more face images representative of the detected face from the group of frame images of the tracked person.

In step S608, the feature calculation unit 206 calculates face image features from the one or more representative face images selected in step S607.

In step S609, the feature storage unit 207 clusters the face image features calculated in step S608 together with the face image features accumulated so far. The clustering method is as described above (FIG. 3).

In step S610, the feature storage unit 207 calculates the centroid of each cluster, and in step S611 it calculates, for each cluster, the longest distance from the centroid to any of the face image features in that cluster.
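Steps S610 and S611 reduce to a mean and a maximum per cluster. The following is a minimal sketch in Python, assuming Euclidean distance, features given as equal-length lists of floats, and cluster labels already produced by a clustering method such as k-means; the function name cluster_summary is hypothetical.

```python
import math
from collections import defaultdict


def cluster_summary(features, labels):
    """Return {label: (centroid, radius)} for each cluster.

    features: equal-length float sequences (feature vectors).
    labels:   cluster label of each feature, e.g. from k-means (assumed given).
    radius:   longest Euclidean distance from the centroid to a member.
    """
    members = defaultdict(list)
    for vec, lab in zip(features, labels):
        members[lab].append(vec)
    summary = {}
    for lab, vecs in members.items():
        dim = len(vecs[0])
        # Step S610: centroid = per-dimension mean of the members.
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        # Step S611: longest centroid-to-member distance.
        radius = max(math.dist(centroid, v) for v in vecs)
        summary[lab] = (centroid, radius)
    return summary
```

The stored radius is what later turns a centroid distance d_k into the lower bound d_k - r_k used at search time.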

In step S612, the feature storage unit 207 stores the face image features in the external storage device 104. In addition, information such as the person ID, the tracking track ID assigned while tracking the person, the shooting time, and the camera is stored as metadata in association with each face image feature.

In step S613, if the video continues, the process returns to step S601; if the video has ended, the process ends.

Through the above processing, the feature storage unit 207 accumulates in the external storage device 104 the face image features of persons appearing in video input from the network camera 112, making them searchable.

In the present embodiment, clustering is performed every time video is input and a face is detected. However, when only a small amount of data has been registered, clustering offers little benefit, so features may be registered as-is, without clustering, until the number of detected faces reaches a predetermined number. Clustering is also a relatively heavy process, since it groups data by examining its distribution. Therefore, instead of re-clustering (updating the clusters) every time, newly registered face image features may, for a while after a clustering pass, simply be classified into the existing clusters. In that case, whenever a newly classified feature lies farther from the cluster centroid than the current longest distance, the stored longest distance from that cluster's centroid is updated.
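Classifying new features into existing clusters while only ever enlarging the stored radius keeps d_k - r_k a valid lower bound, because the centroid does not move and r_k never underestimates the longest member distance. A sketch, assuming (this is not stated in the source) that a new feature is classified into the cluster with the nearest centroid; assign_without_reclustering is a hypothetical name.

```python
import math


def assign_without_reclustering(feature, centroids, radii, members):
    """Classify `feature` into an existing cluster without re-clustering.

    Assumption: the nearest centroid decides the cluster. Centroids are left
    unchanged until the next full clustering pass; only the radius
    (longest centroid-to-member distance) of the chosen cluster may grow.
    Returns the index of the chosen cluster.
    """
    k = min(range(len(centroids)), key=lambda j: math.dist(feature, centroids[j]))
    members[k].append(feature)
    d = math.dist(feature, centroids[k])
    if d > radii[k]:
        radii[k] = d  # the new feature is the farthest member so far
    return k
```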

Next, the process of searching for a person's face image using a query image input from the query video input unit 208 will be described in detail with reference to FIG. 7. FIG. 7 is a flowchart showing the procedure for searching for a person's face image in the present embodiment. This process corresponds to the processing performed by the face detection unit 204, the feature calculation unit 206, and the units from the query video input unit 208 through the search result display unit 210 in FIG. 2 described above.

In step S701, the query video input unit 208 inputs a face image of the person to be searched for.

In step S702, the face detection unit 204 performs face detection on the video containing the face image input by the query video input unit 208.

In step S703, the feature calculation unit 206 calculates a face image feature from the face image detected in step S702.

In step S704, the search unit 209 uses the face image feature calculated in step S703 as the query, compares it with each cluster centroid stored in the external storage device 104, and calculates the distances 407 (dk, k = 1, 2, ...).

In step S705, the search unit 209 estimates the shortest distance to the image features registered in each cluster using the longest distance 405 (rk, k = 1, 2, ...) between the cluster's representative feature and the features in that cluster, and determines the comparison order by sorting these estimates. Clusters are compared in ascending order of this estimated shortest distance.
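Steps S704 to S706 can be sketched in Python as follows. This is an illustration, not the patent's implementation: Euclidean distance is assumed, the estimate d_k - r_k is clamped at zero, and the "cluster in which the query lies" is assumed to be the one with the nearest centroid. The function name comparison_order is hypothetical.

```python
import math


def comparison_order(query, centroids, radii):
    """Return (order, lower_bounds) for searching the clusters.

    order:        cluster indices, the query's own cluster first, then the
                  rest in ascending order of estimated shortest distance.
    lower_bounds: max(d_k - r_k, 0) for every cluster k.
    """
    d = [math.dist(query, c) for c in centroids]             # S704: d_k
    lb = [max(dk - rk, 0.0) for dk, rk in zip(d, radii)]     # S705: d_k - r_k
    home = min(range(len(centroids)), key=lambda k: d[k])    # S706 (assumed rule)
    rest = sorted((k for k in range(len(centroids)) if k != home),
                  key=lambda k: lb[k])
    return [home] + rest, lb
```

Sorting the remaining clusters by their bound is what makes "the next cluster's bound" the smallest bound among all clusters not yet compared.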

In step S706, the search unit 209 sets the cluster in which the query face feature lies as the comparison target cluster.

In step S707, the query face feature is compared with each image feature in the comparison target cluster, and the distances are calculated.

In step S708, if a cluster remains to be compared, the process proceeds to step S709; if none remains, the process proceeds to step S712.

In step S709, among the comparison results obtained in step S707, those whose distance is shorter than the shortest distance to the next comparison target cluster are output.

In step S710, the search result display unit 210 adds the comparison results output in step S709 to the previously displayed results and displays them on the monitor 110. When multiple results are obtained from the same camera, instead of displaying all of them, only the result with the highest face image similarity, or a predetermined number of results in descending order of similarity, may be displayed. When there are many results to display, they may be split across several screens, or the display may be updated only after an instruction from the user.

In step S711, the cluster with the next shortest estimated shortest distance is set as the comparison target cluster, and the process returns to step S707.

In step S712, since comparison with all comparison target clusters is complete, all comparison results are output.

In step S713, the search result display unit 210 adds the comparison results output in step S712 to the previous results, displays them on the monitor 110, and ends the processing.
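The loop of steps S706 to S713 amounts to a generator that yields batches of results in guaranteed ascending order of distance. A minimal sketch, assuming Euclidean distance and a comparison order and per-cluster lower bounds computed as in steps S704 and S705; the name progressive_search and the choice to hold not-yet-output results in a heap until a later cluster's bound releases them are assumptions made for this illustration.

```python
import heapq
import math


def progressive_search(query, clusters, order, lower_bounds):
    """Yield batches of (distance, feature), each batch ascending, batches in order.

    clusters:     list of member lists (feature vectors), one per cluster.
    order:        cluster indices in comparison order (query's cluster first).
    lower_bounds: estimated shortest distance from the query to each cluster.
    """
    pending = []  # min-heap of compared results not yet output
    tie = 0       # tie-breaker so features themselves are never compared
    last = len(order) - 1
    for pos, k in enumerate(order):
        for feat in clusters[k]:                   # S707: compare every member
            heapq.heappush(pending, (math.dist(query, feat), tie, feat))
            tie += 1
        if pos < last:                             # S708: clusters remain
            bound = lower_bounds[order[pos + 1]]   # S709: next cluster's bound
            batch = []
            while pending and pending[0][0] < bound:
                dist, _, feat = heapq.heappop(pending)
                batch.append((dist, feat))
        else:                                      # S712: all clusters compared
            batch = [(dist, feat) for dist, _, feat in sorted(pending)]
            pending = []
        yield batch                                # S710 / S713: to the display
```

Each yielded batch can be appended to the display as it arrives; a batch may be empty when nothing yet falls below the next cluster's bound.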

If only a small amount of data has been registered and no clustering was performed at registration time, the usual processing may be performed instead of this flow: the query is compared with all registered features and the results are output.

As described above, according to the present embodiment, face image features are calculated from images of the same person obtained by tracking that person in surveillance camera video, and the features are clustered and stored. When a face image search is performed with a face image given as the query, the shortest distance between the query face feature and the features in each cluster is first estimated. The query face feature is then compared with the stored image features cluster by cluster, in ascending order of estimated shortest distance, and the search results are output successively. The present embodiment can therefore shorten the time until the user can start checking search results.

Further, in the present embodiment, after comparison with each image feature of the cluster containing the query face feature is complete, the query is compared with the features in the remaining clusters in ascending order of estimated shortest distance, and results shorter than the estimated shortest distance to the next cluster to be compared are output. With this configuration, results can be output successively while guaranteeing ascending order of distance, that is, descending order of similarity.

Second Embodiment
Next, a second embodiment of the present invention will be described. In the first embodiment, the feature space was divided into a plurality of regions by the k-means method or the like, and features were registered in the resulting clusters. In the present embodiment, the feature space is divided according to a predetermined rule, and features are registered in the resulting clusters. Here, the space is divided so that the boundary of each cluster can be expressed by a multidimensional equation. Components already described in the first embodiment are denoted by the same reference numerals, and their description is omitted.

FIG. 8 is a conceptual diagram showing the feature space of the face image features accumulated by the feature storage unit 207 according to the present embodiment. The features used in the present embodiment are multidimensional vectors, for example 256-dimensional, but for simplicity they are illustrated here in two dimensions. In the first embodiment, the feature storage unit 207 stored, for each cluster, the longest of the distances between the image features registered in the cluster and the cluster's representative feature; this is unnecessary in the present embodiment.

A point 801 is a feature in the feature space. The feature space is divided into a plurality of lattice-shaped regions, and each feature is registered in the cluster into which it falls. Although the present embodiment shows a square lattice, a triangular or hexagonal lattice may also be used. In a multidimensional space, the division may use a simple hypercubic lattice or a face-centered hypercubic lattice. An area 802 of the divided feature space is a cluster, and a line 803 dividing the feature space is a cluster boundary. In a multidimensional feature space, the clusters are separated by hyperplanes.
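For a square (simple hypercubic) lattice, the predetermined rule can be as simple as taking the integer part of each coordinate divided by the cell width. A minimal sketch, assuming axis-aligned cells of a hypothetical common width cell_size with a cell corner at the origin; triangular, hexagonal, or face-centered lattices would need a different rule. The names lattice_cell and register are hypothetical.

```python
import math
from collections import defaultdict


def lattice_cell(feature, cell_size):
    """Cluster key on a simple (hyper)cubic lattice: integer cell index per axis.

    A coordinate lying exactly on a boundary falls into the upper cell.
    """
    return tuple(math.floor(x / cell_size) for x in feature)


def register(index, feature, cell_size):
    """Register `feature` in the cluster (lattice cell) that contains it."""
    index[lattice_cell(feature, cell_size)].append(feature)


index = defaultdict(list)
for f in ([0.5, 1.2], [0.9, 1.8], [-0.1, 2.0]):
    register(index, f, 1.0)
```

Because the cell follows from the coordinates alone, registration needs no clustering pass and no stored radius, which matches the remark above that the longest distance is unnecessary in this embodiment.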

FIG. 9 is a conceptual diagram showing the feature space of the face image features searched by the search unit 209 according to the present embodiment. The features used in the present embodiment are multidimensional vectors, for example 256-dimensional, but for simplicity they are illustrated here in two dimensions.

A point 901 is a feature in the feature space, and an area 902 is a cluster. A line 903 is a cluster boundary. Some of the clusters 902 are named G-1, G-2, G-3, and G-4.

A star 904 is the query face image feature, which here lies in cluster G-1. The search unit 209 first calculates the shortest distance between the query face image feature and each cluster. For example, the shortest distance between the query face image feature and cluster G-2 is the length of the perpendicular from the query image feature to the boundary between clusters G-1 and G-2. The shortest distance to cluster G-3 is the length of the perpendicular from the query image feature to the boundary between clusters G-1 and G-3. The shortest distance to cluster G-4 is the distance from the query image feature to the point where clusters G-1, G-2, G-3, and G-4 meet.
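For axis-aligned hypercubic cells, the three cases above (perpendicular to a shared boundary, or distance to a shared corner) follow from one formula: on each axis, take the gap between the query coordinate and the cell's interval (zero when the coordinate lies inside it), then combine the gaps as a Euclidean norm. A sketch, assuming a simple hypercubic lattice of hypothetical cell width cell_size with a cell corner at the origin; the function name distance_to_cell is illustrative.

```python
import math


def distance_to_cell(query, cell, cell_size):
    """Shortest Euclidean distance from `query` to the axis-aligned cell `cell`.

    cell: integer index per axis; the cell spans [c * cell_size, (c + 1) * cell_size]
    on each axis. Returns 0.0 when the query lies inside the cell.
    """
    total = 0.0
    for q, c in zip(query, cell):
        lo, hi = c * cell_size, (c + 1) * cell_size
        gap = max(lo - q, 0.0, q - hi)  # 0 on axes where q is within [lo, hi]
        total += gap * gap
    return math.sqrt(total)
```

With the query in cell (0, 0), a cell sharing a boundary differs on one axis only, so the result is that axis's perpendicular gap; a diagonal cell differs on both axes, so the result is the distance to the shared corner.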

The length of the perpendicular from a point (q_1, q_2, ..., q_n) in n-dimensional space to the hyperplane represented by Equation 1 can be calculated by Equation 2.

The distance between two points (p_1, p_2, ..., p_n) and (q_1, q_2, ..., q_n) in n-dimensional space can be calculated by Equation 3.
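Equations 1 to 3 are not reproduced in this text. The standard forms below are a reconstruction under stated assumptions, not the patent's own typesetting: the hyperplane of Equation 1 is assumed to be written with coefficients a_1, ..., a_n and a constant term b (notation introduced here), and distances are Euclidean.

```latex
% Equation 1 (assumed form): a hyperplane in n-dimensional space
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b = 0

% Equation 2: perpendicular distance from (q_1, \ldots, q_n) to that hyperplane
\frac{\lvert a_1 q_1 + a_2 q_2 + \cdots + a_n q_n + b \rvert}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}}

% Equation 3: Euclidean distance between (p_1, \ldots, p_n) and (q_1, \ldots, q_n)
\sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

For an axis-aligned lattice boundary x_j = c (that is, a_j = 1, b = -c, all other coefficients zero), the perpendicular distance reduces to |q_j - c|.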

The search unit 209 first compares the query with each image feature registered in the cluster in which the query face feature lies. It then compares the query with the image features registered in the remaining clusters, in ascending order of shortest distance.

In the present embodiment as well, after comparison with the features registered in a cluster, the results output are those shorter than the estimated shortest distance to the next cluster to be compared. For example, in FIG. 9, after comparison with cluster G-1, the results inside circle 905, whose radius is the shortest distance to cluster G-2 (the next cluster to be compared), are output. After comparison with cluster G-2, the results inside circle 906, whose radius is the shortest distance to cluster G-3 (the next cluster to be compared), are output. By outputting only results shorter (that is, more similar) than the shortest distance to the next cluster to be compared, results can be output successively while guaranteeing ascending order of distance, that is, descending order of similarity.

Next, the process of storing video data input from the video input unit 201 in a searchable form will be described in detail with reference to FIG. 10. FIG. 10 is a flowchart showing the procedure for accumulating face image features in the present embodiment. This process corresponds to the processing performed by the units from the video input unit 201 through the feature storage unit 207 described above.

Steps S1001 to S1008 are the same as steps S601 to S608 in FIG. 6.

In step S1009, the feature storage unit 207 registers the face image feature calculated in step S1008 in the cluster determined by the predetermined rule. In addition, information such as the person ID, the tracking track ID assigned while tracking the person, the shooting time, and the camera is stored as metadata in association with the face image feature. As described above, the predetermined rule that determines the clusters in the present embodiment is the division of the feature space into a plurality of lattice-shaped regions.

Step S1010 is the same as step S613 in FIG. 6.

Through the above processing, the feature storage unit 207 accumulates in the external storage device 104, in searchable form, the face image features of persons appearing in video input from the network camera 112.

Next, the process of searching for a person's face image using a query image input from the query video input unit 208 will be described in detail with reference to FIG. 11. FIG. 11 is a flowchart showing the procedure for searching for a person's face image in the present embodiment. This process corresponds to the processing performed by the face detection unit 204, the feature calculation unit 206, and the units from the query video input unit 208 through the search result display unit 210 in FIG. 2 described above.

Steps S1101 to S1103 are the same as steps S701 to S703 in FIG. 7.

In step S1104, the search unit 209 calculates the shortest distance from the query face feature to each cluster and determines the comparison order by sorting these distances. After comparing with each image feature of the cluster in which the query face feature lies, the search unit 209 compares the remaining clusters in ascending order of this shortest distance.

Steps S1105 to S1112 are the same as steps S706 to S713 in FIG. 7.

As described above, according to the present embodiment, face image features are calculated from images of the same person obtained by tracking that person in surveillance camera video, and the features are stored in clusters obtained by dividing the feature space according to a predetermined rule. When a face image search is performed with a face image given as the query, the shortest distance between the query face feature and the features in each cluster is first estimated. The query face feature is then compared with the stored image features cluster by cluster, in ascending order of shortest distance, and the search results are output successively. The present embodiment can therefore shorten the time until the user can start checking search results.

Third Embodiment
Next, a third embodiment of the present invention will be described. In the first and second embodiments, face image features were registered in a single index, and searches were performed on that index. In the third embodiment, face image features are divided among and registered in a plurality of indexes, and searches are performed across those indexes. Components already described in the first and second embodiments are denoted by the same reference numerals, and their description is omitted.

First, the advantages of dividing face image features among a plurality of indexes will be described. For example, by using a separate index for each camera, a search restricted to particular cameras can be performed easily. Likewise, by starting a new index at fixed time intervals or after a fixed number of face image features, a search restricted to a specified time period can be performed easily. When the features are divided among several indexes, each index is created by the same procedure as in the first or second embodiment.

When a search is performed, the search unit 209 searches each index and obtains search results per index. FIG. 12 is a conceptual diagram showing an example of the search results for each index, with each index's results arranged in ascending order of distance (descending order of similarity). Here, results 1201 to 1203 are obtained from index 1, results 1204 to 1207 from index 2, and results 1208 to 1210 from index 3. The number shown below each search result is its distance from the query feature.

In this state, the search unit 209 first obtains the longest distance among the results of each index. In the example of FIG. 12, the longest distance is 110 for index 1, 140 for index 2, and 120 for index 3. The search unit 209 then outputs the search results whose distance is shorter than the smallest of these longest distances, which in FIG. 12 is 110, from index 1. Because each index returns its results in ascending order of distance, no index can return a result shorter than 110 when the next results are fetched, so the results shorter than 110 can be output. Thereafter, the same processing is repeated: search results are fetched from each index, and those shorter than the smallest of the per-index longest distances are output. This makes it possible to output search results across all indexes successively while guaranteeing ascending order of distance, that is, descending order of similarity.
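One round of this merge can be sketched as follows. The per-index distances in the usage example are hypothetical values chosen only to reproduce the longest distances of FIG. 12 (110, 140, and 120); the individual distances in the figure are not given in this text. The function merge_ready is illustrative and assumes each index has returned a non-empty list sorted in ascending order.

```python
def merge_ready(per_index_results):
    """One merge round across several indexes.

    per_index_results: one non-empty ascending list of distances per index.
    Returns (threshold, ready): no index can later return anything shorter than
    its own longest distance so far, so every result below the smallest of these
    longest distances already has its final rank and can be output.
    """
    threshold = min(res[-1] for res in per_index_results)  # lists are ascending
    ready = sorted(d for res in per_index_results for d in res if d < threshold)
    return threshold, ready


# Hypothetical distances; only the last value of each list matches FIG. 12.
threshold, ready = merge_ready([[80, 95, 110], [90, 100, 130, 140], [85, 105, 120]])
```

In a full implementation the output results would be removed from further consideration and the next round would fetch more results from each index.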

When obtaining the search results of each index, it is also possible to obtain the distance below which that index has output all of its results, that is, the shortest distance to the next cluster that index will compare. For example, suppose that in index 1 of FIG. 12 the shortest distance to the next cluster to be compared is 118, and that the comparison results so far contain no result longer than 110 and shorter than 118. In this case, the results obtainable from index 1 are unchanged. If the information that the shortest distance to the next cluster to be compared is 118 can be obtained in advance, it is known that, when the next search results are fetched, no index will return a result shorter than 118. Therefore, by using this value and outputting only results shorter than 118, more results with confirmed ranks can be output, that is, they can be output sooner.
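Using the reported bound only changes how each index's guarantee is computed: an index that has output every result below the distance to its next cluster guarantees that nothing shorter than that distance will come from it later. A sketch, with the hypothetical function merge_threshold taking one (longest distance returned, next-cluster bound or None) pair per index; with the values of the example above the threshold rises from 110 to 118.

```python
def merge_threshold(indexes):
    """Smallest per-index guarantee across all indexes.

    indexes: one (longest_returned, next_bound) pair per index, where next_bound
    is that index's shortest distance to the next cluster it will compare, or
    None when the index does not report it (then only longest_returned is used).
    Results shorter than the returned value can be output in confirmed order.
    """
    guarantees = []
    for longest, next_bound in indexes:
        guarantees.append(longest if next_bound is None else max(longest, next_bound))
    return min(guarantees)
```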

As described above, according to the present embodiment, even when face image features are divided among and registered in a plurality of indexes, the time until the user can start checking search results can be shortened. In addition, results can be output successively while guaranteeing ascending order of distance, that is, descending order of similarity.

Other Embodiments
In each of the embodiments described above, face image features are extracted from images of persons in surveillance camera video, stored, and made searchable. However, the objects to be searched for in the present invention are not limited to persons. For example, the invention may be applied to animals such as dogs and cats, or to vehicles such as cars. Likewise, the features calculated from the target object are not limited to face features calculated from face images. For example, in a person search, features calculated from images of the whole body, clothing, or belongings may be used besides face images. When the invention is applied to animals, the face image, body pattern, clothing, or overall shape or color may be used as features. When it is applied to cars, the image around the identification number, or the overall shape or color, may be used as features.

The present invention is also applicable to searching for similar images using local feature amounts of images. In this method, characteristic points (local feature points) are first extracted from the image (Non-Patent Document 3). Next, a feature amount corresponding to each local feature point (a local feature amount) is calculated from the local feature point and the image information around it (Non-Patent Document 4). The feature amounts (multidimensional vectors) obtained in this way are clustered and stored. At search time, they are compared with the features in each cluster in turn, and the results are output successively.
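The clustering step can be sketched as below. The text does not name a clustering method. This minimal Python sketch assumes plain k-means with Euclidean distance and a naive initialisation (the first k vectors). For each cluster it records a representative (the mean vector) and a radius (the longest distance from the representative to a member), which are the quantities the claims use to bound distances. All names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_clusters(vectors, k, iters=20):
    """Cluster feature vectors with plain k-means (an assumed choice).

    Returns one dict per non-empty cluster with the member indices, the mean
    vector used as representative, and the radius (longest distance from the
    representative to a member).
    """
    centers = [list(v) for v in vectors[:k]]  # naive initialisation
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            nearest = min(range(k), key=lambda c: dist(v, centers[c]))
            groups[nearest].append(i)
        new_centers = [mean([vectors[i] for i in g]) if g else centers[c]
                       for c, g in enumerate(groups)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    clusters = []
    for g in groups:
        if not g:
            continue
        rep = mean([vectors[i] for i in g])
        radius = max(dist(rep, vectors[i]) for i in g)
        clusters.append({"members": g, "rep": rep, "radius": radius})
    return clusters
```

For local features, `vectors` would hold every local feature amount of every registered image, usually tagged with the image it came from so that matches can be aggregated per image.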

In the above embodiments, the data to be searched and the query data have been described as images. However, the data type is not limited to images. Other types of data, such as audio, may also be used.

The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100 Information processing apparatus
201 Video input unit
202 Video storage unit
203 Tracking processing unit
204 Face detection unit
205 Representative face image determination unit
206 Feature calculation unit
207 Feature storage unit
208 Query video input unit
209 Search unit
210 Search result display unit

Claims

An information processing apparatus that outputs, from a storage means in which first vectors to be searched are registered in a plurality of groups in a multidimensional space, a first vector similar to a second vector representing a feature amount calculated from query data, the apparatus comprising:
input means for inputting the query data;
calculation means for calculating the second vector from the input query data;
first determination means for determining for each group, as a shortest distance, the shortest distance that can be taken between the second vector and a first vector registered in the group;
second determination means for determining, based on the determined shortest distance, the order in which the plurality of groups are compared with the second vector; and
output means for comparing the first vectors with the second vector group by group in the determined order, and for outputting, as a search result, a first vector in each of the plurality of groups whose distance to the second vector is shorter than the shortest distance.
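The search procedure of this claim can be illustrated with the following sketch. It is an assumption-laden illustration, not the claimed implementation: the distance is Euclidean, each group is a hypothetical (representative, radius, members) tuple, and the per-group shortest distance is taken as the triangle-inequality bound max(0, d(q, representative) − radius). Groups are compared in ascending bound order, and a compared vector is output as soon as its distance is shorter than the bound of the next group to be compared.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def incremental_search(query, groups):
    """Yield (distance, item_id) pairs in ascending order of distance.

    `groups` is a list of (representative, radius, members) tuples, where
    `members` maps item_id -> vector and `radius` is the longest distance from
    the representative to any member (an assumed layout, for illustration).
    """
    # Shortest distance any member of each group could have to the query,
    # via the triangle inequality; groups are visited in ascending bound order.
    order = sorted(
        (max(0.0, dist(query, rep) - radius), idx)
        for idx, (rep, radius, _) in enumerate(groups)
    )
    candidates = []  # heap of (distance, item_id): compared, not yet output
    for bound, idx in order:
        # No group still to be compared can hold anything closer than `bound`,
        # so every candidate below it already has its final rank.
        while candidates and candidates[0][0] < bound:
            yield heapq.heappop(candidates)
        for item_id, vec in groups[idx][2].items():
            heapq.heappush(candidates, (dist(query, vec), item_id))
    # All groups compared: the remaining candidates are final as well.
    while candidates:
        yield heapq.heappop(candidates)
```

Because the function is a generator, a caller can display the first results as soon as they are yielded, before the remaining groups are compared.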

The information processing apparatus according to claim 1, wherein the plurality of groups are formed by dividing the multidimensional space.

The information processing apparatus according to claim 2, wherein the plurality of groups are generated by clustering the multidimensional space.

The information processing apparatus according to claim 2 or 3, wherein the first determination means determines the shortest distance based on a longest distance, which is the longest of the distances between a representative vector of the group and the first vectors registered in the group, and on the distance between the representative vector and the second vector.
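The claim does not state an explicit formula. Assuming the distance $d$ is a metric such as the Euclidean distance, one bound consistent with it follows from the triangle inequality. Here $r$ is the representative vector, $R$ is the longest distance from $r$ to a registered first vector, $x$ is any first vector in the group, and $q$ is the second vector:

```latex
d(q, x) \;\ge\; d(q, r) - d(r, x) \;\ge\; d(q, r) - R
\qquad\Longrightarrow\qquad
\min_{x \in \text{group}} d(q, x) \;\ge\; \max\bigl(0,\; d(q, r) - R\bigr)
```

The bound is zero whenever the query lies within distance $R$ of the representative, so such groups are ranked ahead of every group with a positive bound.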

The information processing apparatus according to claim 4, wherein the representative vector is one of: the average vector of the first vectors registered in the group; the center of gravity of the group; the first vector closest to the average vector among the first vectors registered in the group; and the first vector closest to the center of gravity of the group among the first vectors registered in the group.

The information processing apparatus according to claim 2, wherein the first determination means determines the distance between the second vector and the boundary of the group as the shortest distance.
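One way to obtain such a boundary-based value without storing a radius is sketched below. This rests on an assumption the claim does not make: the groups are taken to be the Voronoi cells of their centers, as produced by k-means. Cell i lies on c_i's side of the perpendicular bisector between c_i and every other center c_j, so the distance from the query past any of those bisectors is a lower bound on its distance to the cell. The sketch takes the largest such value, which may be smaller than the exact distance to the cell boundary.

```python
import math


def sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def boundary_lower_bound(query, centers, i):
    """Lower bound on the distance from `query` to the Voronoi cell of centers[i].

    For each other center c_j, (|q - c_i|^2 - |q - c_j|^2) / (2 |c_i - c_j|) is
    the signed distance of q beyond the bisector of (c_i, c_j), measured toward
    c_j. It is negative when q is on c_i's side; the result is clamped at 0.
    """
    ci = centers[i]
    best = 0.0
    for j, cj in enumerate(centers):
        if j == i:
            continue
        gap = math.sqrt(sq(ci, cj))
        if gap == 0.0:
            continue  # coincident centers define no bisector
        beyond = (sq(query, ci) - sq(query, cj)) / (2.0 * gap)
        best = max(best, beyond)
    return best
```

For centers at (0, 0) and (10, 0) and a query at (8, 0), the bound for the first cell is 3, the distance from the query to the bisector x = 5. The bound for the cell containing the query is 0.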

The information processing apparatus according to any one of claims 1 to 6, wherein the second determination means narrows down the plurality of groups based on their shortest distances and determines the order in which the narrowed-down groups are compared with the second vector.
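The claim leaves the narrowing criterion open. A minimal sketch, assuming two hypothetical cut-offs: `max_distance`, a maximum distance of interest, and `max_groups`, a cap on how many groups are compared. The function takes each group's shortest possible distance to the query and returns the group ids to compare, nearest bound first.

```python
def narrow_and_order(bounds, max_distance=None, max_groups=None):
    """Return group ids to compare with the query, smallest bound first.

    bounds: dict mapping group_id -> shortest possible distance to the query.
    max_distance / max_groups: hypothetical narrowing parameters.
    """
    # Sort by bound; the id breaks ties so the order is deterministic.
    order = sorted(bounds, key=lambda g: (bounds[g], g))
    if max_distance is not None:
        # A group whose bound exceeds max_distance cannot contain a vector
        # within max_distance of the query, so dropping it loses no such result.
        order = [g for g in order if bounds[g] <= max_distance]
    if max_groups is not None:
        # Capping the group count trades completeness for speed.
        order = order[:max_groups]
    return order
```

Filtering by `max_distance` is exact for range queries. `max_groups` may skip a group that holds a closer vector than some group that was kept.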

The information processing apparatus according to claim 7, wherein
the first vectors are registered in the storage means for each of a plurality of indexes, and
the output means compares the first vectors with the second vector in each of the plurality of indexes, acquires, for each of the plurality of indexes, the longest of the distances between the first vectors and the second vector, and outputs, as a search result, a first vector whose distance is shorter than the shortest of the longest distances acquired for the plurality of indexes.

The information processing apparatus according to any one of claims 1 to 8, wherein
a multidimensional vector calculated from an image to be searched is registered in the storage means as the first vector,
the input means inputs a query image as the query data, and
the calculation means calculates a multidimensional vector from the input query image as the second vector.

The information processing apparatus according to claim 9, further comprising detection means for detecting a person in the input query image,
wherein the second vector is a multidimensional vector representing a feature amount extracted from the region of the detected person.

The information processing apparatus according to any one of claims 1 to 10, further comprising display means for displaying the output search result on a display unit.

The information processing apparatus according to claim 1, wherein
the input means inputs data to be searched, and
the calculation means calculates the first vector from the data to be searched and registers the calculated first vector in the storage means.

An information processing method for outputting, from a storage means in which first vectors to be searched are registered in a plurality of groups in a multidimensional space, a first vector similar to a second vector representing a feature amount calculated from query data, the method comprising:
inputting the query data;
calculating the second vector from the input query data;
determining for each group, as a shortest distance, the shortest distance that can be taken between the second vector and a first vector registered in the group;
determining, based on the determined shortest distance, the order in which the plurality of groups are compared with the second vector; and
comparing the first vectors with the second vector group by group in the determined order, and outputting, as a search result, a first vector in each of the plurality of groups whose distance to the second vector is shorter than the shortest distance.

A program for causing a computer to function as the information processing apparatus according to any one of claims 1 to 12.