JP2021086573A

JP2021086573A - Image search apparatus, control method thereof, and program

Info

Publication number: JP2021086573A
Application number: JP2019217500A
Authority: JP
Inventors: 博志吉川; Hiroshi Yoshikawa
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2021-06-03

Abstract

To provide a search apparatus which performs high-accuracy search even when there are a plurality of objects similar in feature quantity.

SOLUTION: An image search apparatus 120 includes: a feature quantity extraction unit which extracts features of an object included in an input image; a normal search unit which searches a storage unit, in which each image is stored in association with the features of its object represented by a multidimensional vector, to obtain a candidate image group including objects having features similar to the features extracted by the extraction unit; a different object group generation unit which determines whether non-identical entities exist among the objects included in the candidate image group; a re-search unit which, when it is determined that non-identical entities exist, searches the storage unit to obtain a candidate image group including objects having features similar to the features extracted by the extraction unit in a feature space where the difference in features between the entities is enhanced; and a display unit which outputs the search result of the normal search unit when no non-identical entity exists, or the search result of the re-search unit when a non-identical entity exists.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an image search device, a control method thereof, and a program.

In recent years, systems that use a person image as a query and search surveillance camera footage for scenes containing that person have been put into practical use. A typical example is searching surveillance video for scenes in which a criminal suspect appears in order to trace the suspect's movements. Such systems can also be used to run events smoothly by keeping track of the places that VIPs have visited at an event venue.

A search using a person image is generally performed by extracting a feature amount consisting of a multidimensional vector from the image and searching based on the similarity between feature amounts. Ideally, similar feature amounts are extracted from images of the same person. However, differences in appearance caused by lighting conditions, shooting direction, posture, and the like can produce different feature amounts even for images of the same person. As a result, the similarity to another person's feature amount may exceed the similarity between feature amounts of the same person, which degrades search accuracy. The problem is particularly severe when comparing people in similar clothing such as suits, because only slight differences such as ties and shoes are observed.

Patent Document 1 discloses a method that improves search accuracy, when searching for the original of a printed image, by emphasizing features peculiar to the search query image. Specifically, among the search results obtained with the query image, it determines the difference regions between search result images that are highly similar to each other, and updates the similarity between each search result and the query image by focusing on those difference regions.

Patent Document 2 discloses a method that improves discriminating ability by identifying attribute information with high discriminating ability, based on the attribute information of the search query image and statistics of the attribute information of images of persons other than the person in the query image, and then increasing the weight of that attribute information.

The feature amounts used for person search and their matching methods are also used for person detection, which finds persons registered in a list in advance in video captured in real time, and for person tracking, which tracks a person who has been found.

The technique of determining whether objects are identical based on image similarity is not limited to person images. For example, it is also used to distinguish objects that look alike but differ in small details on close inspection, such as pets like dogs and cats, artifacts excavated from archaeological sites, and handmade crafts.

Patent Document 1: JP-A-2017-138744
Patent Document 2: Japanese Patent No. 6254836

In an environment with many other people wearing clothing similar to that of the person in the query image, such people are likely to rank high in the search results. In other words, the person the user actually wants to find is likely to be ranked low and buried among the results for other people.

Patent Document 1 aims to emphasize the difference between correct and incorrect results by focusing on the difference regions between the images obtained by the search. However, it assumes that the search target contains many images, such as leaflets, that differ only in a very small region such as the store name, and it presupposes that correct and incorrect results are mixed at the top of the search results. In person image search, the feature amount of the same person varies with the shooting conditions, so there may be no correct result at the top of the search results. If only incorrect results appear at the top, and those incorrect images all show the same person (another person in similar clothing), no difference between the top results can be obtained, and features that distinguish different persons cannot be emphasized. In that case, the method of Patent Document 1 cannot update the similarity obtained in the initial search and cannot raise the rank of the person the user wants to find.

In Patent Document 2, images of persons captured at the same time as the search query image are regarded as images of persons other than the target person. Attribute information of the target person is extracted from the search query image, and attribute information is also extracted from the images of the other persons. The discriminating performance of each piece of attribute information is evaluated from the statistics of the attribute information of the target person and of the other persons, and identification is performed with larger weights on attribute information with high discriminating performance, which improves the ability to distinguish the target person from others. Because this method uses the statistics of all persons captured at the same time as the query image, it yields little improvement in discriminating between persons wearing similar clothing. The desired effect of improving the rank of the person the user wants to find is therefore not obtained.

The present invention has been made in view of the above problems, and aims to provide a technique that improves discriminating ability by finding, in the search results, a plurality of persons who are not the same person and increasing the weight of the differences between the feature amounts of those persons.

To solve this problem, for example, the image search device of the present invention comprises:
an extraction means for extracting a feature of an object included in an input image;
a first search means for obtaining a candidate image group including objects having features similar to the feature extracted by the extraction means, by searching a storage means that stores, in association with each image, the feature, represented by a multidimensional vector, of the object included in that image;
a determination means for determining, based on a preset condition, whether the objects included in the candidate image group include individuals that can be regarded as non-identical to each other;
a second search means for obtaining, when the determination means determines that such individuals exist, a candidate image group including objects having features similar to the feature extracted by the extraction means, by searching the storage means in a feature space in which the weight of each dimension of the multidimensional vector is adjusted so as to emphasize the difference in features between those individuals; and
an output means for outputting the search result of the first search means when the determination means determines that no such individuals exist, and outputting the search result of the second search means when the determination means determines that such individuals exist.
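The control flow of the means listed above can be sketched as follows. This is only an illustrative outline: the function name `search_with_refinement` and the three callables passed to it are hypothetical placeholders for the first search means, the determination means, and the second search means, and their internals are left open.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Result = Tuple[str, float]  # (image id, similarity); hypothetical result shape


def search_with_refinement(
    query: Vector,
    first_search: Callable[[Vector], List[Result]],
    find_non_identical: Callable[[List[Result]], Optional[Tuple[str, str]]],
    second_search: Callable[[Vector, Tuple[str, str]], List[Result]],
) -> List[Result]:
    """Return the first search result, or the re-search result when the
    candidates contain individuals regarded as non-identical."""
    candidates = first_search(query)
    pair = find_non_identical(candidates)
    if pair is None:
        # No non-identical individuals found: output the first result as-is.
        return candidates
    # Otherwise search again in the re-weighted feature space.
    return second_search(query, pair)
```

Passing the searches in as callables keeps the sketch independent of how the storage means or the dimension weights are actually implemented.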

According to the present invention, search results can be obtained with higher accuracy even when there are a plurality of objects whose feature amounts are largely similar.

System configuration diagram in the first embodiment.
Schematic diagram of the feature space in the first embodiment.
Flowchart showing the search processing procedure in the first embodiment.
Flowchart showing the process of creating a separate object group.
Diagram showing an application example of the separate object group creation process.
Diagram for explaining the method of integrating the weight vectors of a plurality of separate object groups.
Diagram for explaining the method of showing the user the parts where differences are emphasized.
Diagram for explaining the weight vector calculation method when a whole-body feature amount and a face feature amount are combined.
System configuration diagram in the third embodiment.
Flowchart showing the detection processing procedure in the third embodiment.
Diagram showing an application example of the separate object group creation process in the third embodiment.
Diagram for explaining the difference between a query person image and similar images.
Diagram for explaining the differences between groups.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of them are essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, identical or similar configurations are given the same reference numbers, and duplicate explanations are omitted.

[First Embodiment]
The image search device described in the first embodiment is an example applied to a server that analyzes video transmitted from a photographing device and presents search results to a user.

FIG. 1A is a hardware configuration diagram of the image search device according to the present embodiment. The CPU 101 controls the entire device by executing the control program stored in the ROM 102. The control program stored in the ROM 102 consists of an object extraction unit 103, a feature amount extraction unit 104, a feature amount registration unit 105, and a feature amount search unit 106. The RAM 107 temporarily stores various data from each component. The program is also loaded into the RAM 107 so that the CPU 101 can execute it.

The storage unit 108 stores the data to be processed and the data to be searched. An HDD, flash memory, various optical media, or the like can be used as the medium of the storage unit 108. The input unit 109 consists of a keyboard, a touch panel, and the like, and accepts input from the user, including search conditions and the query image selected by the user. The display unit 110 consists of a liquid crystal display or the like and displays search results to the user. The device can also communicate with other devices, such as a photographing device, via the communication unit 111. The storage unit 108 does not necessarily have to be provided inside the image search device. When it is located outside the image search device, the data to be processed is acquired via the communication unit 111.

FIG. 1B shows the data flow, including the configuration of FIG. 1A. The device (image search device 120) receives the video data 112 acquired by the external photographing device 121 via the communication unit 111. The image search device 120 extracts the registered feature amount 113 from the video data 112 and registers it in the feature amount DB 114. The image search device 120 also searches the feature amount DB 114 for feature amounts similar to the query feature amount extracted from the query image input through the input unit 109.

In the data registration process of the device, the object extraction unit 103 analyzes the received video data 112 and extracts the objects appearing in the video. A whole-body region or a face region of a person, for example, can be used as the object. In this embodiment, the whole-body region of a person is used. This embodiment assumes cases where the face region is not captured at a sufficient size, which makes identification by face feature amount difficult, so identification is performed with the feature amount of the whole-body region. Next, the feature amount extraction unit 104 extracts the registered feature amount 113 from the object extracted by the object extraction unit 103 and supplies it to the feature amount registration unit 105. The feature amount registration unit 105 registers the received feature amount in the feature amount DB (database) 114. Besides accumulating a large number of feature amounts, the feature amount DB 114 also holds thumbnail images of the object regions cut out from the video data 112, shooting time information, and shooting camera information, all of which are available at search time. The feature amount DB 114 may be placed in the storage unit 108, or may be read from the storage unit 108 and loaded into the RAM 107.
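As a rough illustration of what one entry in the feature amount DB 114 holds, the record described above could be modeled as below. The class names `FeatureRecord` and `FeatureDB`, the field names, and the in-memory list are all assumptions made for this sketch; the text does not specify a schema or storage engine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class FeatureRecord:
    feature: Tuple[float, ...]  # multidimensional feature vector
    thumbnail: bytes            # object region cut out of the video frame
    captured_at: datetime       # shooting time information
    camera_id: str              # shooting camera information


class FeatureDB:
    """In-memory stand-in for the feature amount DB 114 (illustrative only)."""

    def __init__(self) -> None:
        self._records: List[FeatureRecord] = []

    def register(self, record: FeatureRecord) -> int:
        self._records.append(record)
        return len(self._records) - 1  # list index doubles as a record id

    def get(self, record_id: int) -> FeatureRecord:
        return self._records[record_id]

    def __len__(self) -> int:
        return len(self._records)
```

Keeping the shooting time and camera alongside each feature is what later allows the search to be narrowed by time or camera and the thumbnails to be shown as results.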

In the search process of the device, the query image 115 is first received from the user via the input unit 109. The method by which the input unit 109 obtains the query image is not restricted. For example, a photograph of the target person prepared by the user may be read with a scanner. Alternatively, a storage medium holding an image file of the target person may be set in the device and read by it, or an image file of the target person may be received through a communication technology such as e-mail. If the query image contains a plurality of persons, the user specifies one of them. In any case, once the input unit 109 has input the query image 115, the object extraction unit 103 extracts the object appearing in the query image 115 in the same way as in the registration process, and the feature amount extraction unit 104 extracts the query feature amount 116 from the image of the object.

The normal search unit 117 searches the feature amount DB 114 for feature amounts similar to the query feature amount 116. The object selection unit 122 selects a plurality of objects from the search results of the normal search unit 117. The separate object group creation unit 118 determines whether the selected objects are the same object and, if they are determined not to be the same, creates a separate object group from them. For persons, this means creating a group consisting of a plurality of different persons. The re-search unit 119 searches the feature amount DB 114 again using the feature amounts of the persons belonging to the separate object group, and outputs the thumbnail images associated with the feature amounts of high similarity to the display unit 110 as the search result.

The input unit 109 and the display unit 110 may be provided in another device, with queries and search results sent and received via the communication unit 111. Part of the control program may also be provided in a separate device. For example, the object extraction unit 103 and the feature amount extraction unit 104 may be placed on another server so that this device contains only the feature amount registration unit 105 and the feature amount search unit 106. In this case, video analysis and search processing run on separate devices, and hardware specifications suited to each can be selected.

The object extraction unit 103 learns the characteristics of object regions in advance, then scans the input image with a search window and determines whether each window position is an object region. Ensemble learning or a deep learning method such as a CNN (Convolutional Neural Network) can be used as the learning method. The feature amount extraction unit 104 may extract color histograms of partial regions within the object region as the feature amount, or may extract the feature amount with a deep learning method such as a CNN.
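One simple way to realize the color-histogram option is sketched below. It is an assumption-laden illustration, not the extraction method of the embodiment: the choice of horizontal stripes as the partial regions, the stripe and bin counts, the per-stripe normalization, and the function name `stripe_color_histogram` are all made up for this example.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def stripe_color_histogram(
    region: Sequence[Sequence[Pixel]], stripes: int = 4, bins: int = 4
) -> List[float]:
    """Concatenate per-stripe, per-channel color histograms of an object region."""
    height = len(region)
    feature: List[float] = []
    for s in range(stripes):
        # Rows belonging to this horizontal stripe (one partial region).
        rows = region[s * height // stripes:(s + 1) * height // stripes]
        pixels = [p for row in rows for p in row]
        total = float(len(pixels)) or 1.0  # guard against empty stripes
        for channel in range(3):
            hist = [0.0] * bins
            for p in pixels:
                hist[min(p[channel] * bins // 256, bins - 1)] += 1.0
            # Normalize so stripes of different sizes are comparable.
            feature.extend(h / total for h in hist)
    return feature
```

With these defaults the feature vector has 4 stripes x 3 channels x 4 bins = 48 dimensions, and each group of dimensions corresponds to a rough body part (head, torso, legs), which is what makes per-dimension weighting meaningful later.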

With reference to FIGS. 12A and 12B, an outline is given of how this embodiment improves the ability to distinguish the target person from others. FIG. 12A is a table summarizing the appearance of three persons wearing similar clothing. The three persons are the target person, who is the same person as in the search query; other person A, who differs from the person in the query image; and other person B, who differs from both the person in the query image and other person A. Because the suit jacket and trousers occupy a large proportion of the whole-body image region, the images of the target person, other person A, and other person B lie close together in the feature space. A similarity search using the query image gives high scores to all three. Even if they differ in shirt, tie, shoes, bag, skin color, hair, and so on, those parts make up only a small proportion of the whole image, so noise factors such as differences in shooting conditions, the orientation of the person, and background information can make another person's score higher than the target person's. Consequently, even when a search is performed with the query image, many other persons are mixed into the search results.

FIG. 12B summarizes the differences in appearance. In this example, the differences in tie and hair exist among all three persons. Therefore, emphasizing the differences between other person A and other person B also improves the ability to distinguish the target person from other person A. In other words, if persons who are different from each other can be selected from among the persons ranked high in the search results, and the differences between their feature amounts can be emphasized, the ability to discriminate between persons wearing similar clothing can be improved.

This is explained further with reference to FIGS. 2A, 2B, and 2C. FIG. 2A shows the distribution of feature amounts in feature space 1, in which the initial search is performed. The space is drawn in two dimensions for clarity, but the actual number of dimensions of a feature amount can range from several hundred to several thousand. The figure should be understood as illustrative only.

In FIG. 2A, the "x" mark with reference code F101 represents the query feature amount, the black squares with reference code F102 represent the feature amounts of the target person, the black triangles with reference code F103 represent the feature amounts of other person A, and the black circles with reference code F104 represent the feature amounts of other person B. Because all of them wear similar clothing, the feature amounts of the query, the target person, other person A, and other person B are clustered in the feature space. When a search is performed with the feature amount at the position of the "x" mark F101, the target person, other person A, and other person B are mixed in the ranking. Three persons are used here to keep the explanation simple, but an actual search covers thousands or tens of thousands of persons, so many more persons cluster in a particular region of the feature space and the search results are flooded with other persons.

Suppose that the two feature amounts enclosed by the dotted circle in FIG. 2A satisfy the condition for being different persons. Feature space 2, shown in FIG. 2B, is obtained by deforming the feature space so as to emphasize the parts of the feature amounts that other person A and other person B have least in common. In this space, the differences in shirt, tie, shoes, and hair between other person A and other person B are emphasized, so the two are separated more widely than in feature space 1. FIG. 2C shows the query image and the feature amounts of the target person mapped into feature space 2. When the target person, other person A, and other person B share common points of difference, as in the example of FIGS. 12A and 12B, emphasizing the differences between other person A and other person B also increases the separation between the target person and other person A and between the target person and other person B in the feature space, which improves the search results.
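The effect of deforming the feature space can be illustrated with a weighted Euclidean distance. The sketch below is one possible reading, not the weighting of the embodiment: it assumes that the weight of each dimension is simply proportional to the absolute gap between the two non-identical persons in that dimension, and the function names `difference_weights` and `weighted_distance` are hypothetical.

```python
import math
from typing import List, Sequence


def difference_weights(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-dimension weights proportional to the gap between two
    individuals regarded as non-identical (assumed weighting rule)."""
    gaps = [abs(x - y) for x, y in zip(a, b)]
    total = sum(gaps)
    if total == 0.0:
        # Identical vectors carry no information: fall back to uniform weights.
        return [1.0 / len(gaps)] * len(gaps)
    return [g / total for g in gaps]


def weighted_distance(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> float:
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(wk * (xk - yk) ** 2 for xk, yk, wk in zip(x, y, w)))
```

For instance, take a two-dimensional toy space in which the first dimension stands for the suit and the second for the tie, with query (0.50, 0.5), target person (0.80, 0.5), other person A (0.50, 0.3), and other person B (0.52, 0.7). With uniform weights, A is closer to the query than the target person is; with weights derived from A and B, the tie dimension dominates and the target person becomes the closest.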

The processing corresponding to each step of the flowcharts described in this embodiment may be realized in software using a CPU, or in hardware such as an electronic circuit. Of the operations described above, the search-related processing is explained below with reference to the flowcharts.

FIG. 3 shows the processing of the feature amount search unit 106 in this embodiment. Of this flow, S401 to S403 are performed by the normal search unit 117, S404 by the object selection unit 122 and the separate object group creation unit 118, and S405 onward by the re-search unit 119. The search conditions are assumed to have been entered in advance.

First, in S401, the normal search unit 117 accepts the query feature amount extracted from the query image input by the user. Because the query feature amount is located at a specific position in feature space 1, this amounts to selecting a feature amount. Once the query feature amount has been input, in S402 the normal search unit 117 calculates its similarity to the feature amounts stored in the feature amount DB 114. The similarity may be calculated against all feature amounts in the feature amount DB 114, or, when searching for a person seen at a particular time or by a particular camera, the comparison targets may be narrowed down by shooting time or camera name. The similarity between feature amounts can be obtained as the reciprocal of any of various distance functions, so that the smaller the distance, the higher the similarity. In this embodiment, the Euclidean distance is used for the distance calculation.
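The similarity calculation of S402 can be sketched as follows. The entry layout `(record id, camera name, feature)` and the function names are assumptions for illustration. Note also one deliberate deviation: the sketch uses 1 / (1 + d) rather than the bare reciprocal 1 / d, purely so that identical feature amounts (d = 0) do not cause a division by zero.

```python
import math
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[str, str, Sequence[float]]  # (record id, camera name, feature)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def similarity(x: Sequence[float], y: Sequence[float]) -> float:
    # Assumed form: 1 / (1 + d) keeps the value finite when d == 0
    # while still growing as the distance shrinks.
    return 1.0 / (1.0 + euclidean(x, y))


def score_all(
    query: Sequence[float], db: Sequence[Entry], camera: Optional[str] = None
) -> List[Tuple[str, float]]:
    """Score every entry, optionally narrowing the targets to one camera."""
    return [
        (record_id, similarity(query, feature))
        for record_id, cam, feature in db
        if camera is None or cam == camera
    ]
```

Narrowing by shooting time would follow the same pattern as the `camera` filter, with a time range compared against a timestamp stored in each entry.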

In S403, the normal search unit 117 sorts the search results in descending order of the similarity calculated in S402 and assigns ranks. At this point, only candidates whose similarity is at or above a predetermined value are kept as search results. If many candidates have a similarity at or above the predetermined value, only the candidates ranked at or above a predetermined rank may be kept as search results.
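The ranking step of S403 can be written compactly as below. The function name `rank_results` and its parameters `min_similarity` and `top_n` are hypothetical names for the predetermined value and predetermined rank mentioned above.

```python
from typing import List, Optional, Sequence, Tuple


def rank_results(
    scored: Sequence[Tuple[str, float]],
    min_similarity: float,
    top_n: Optional[int] = None,
) -> List[Tuple[int, str, float]]:
    """Filter by similarity, sort in descending order, and assign ranks from 1."""
    kept = [(rid, s) for rid, s in scored if s >= min_similarity]
    kept.sort(key=lambda item: item[1], reverse=True)
    if top_n is not None:
        kept = kept[:top_n]  # keep only the top-ranked candidates
    return [(rank, rid, s) for rank, (rid, s) in enumerate(kept, start=1)]
```

The returned rank values start at 1, matching the way the later steps refer to the "i-th" and "(i + j)-th" ranked objects.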

Ｓ４０４にて、対象物選択部１２２及び別対象物グループ作成部１１８は、別対象物グループを作成する。検索結果はＳ４０３にて、上位Ｎ位の結果まで絞り込まれているとする。 In S404, the object selection unit 122 and the separate object group creation unit 118 create separate object groups. It is assumed that the search results have been narrowed down to the top N results in S403.

ここで、Ｓ４０４の別対象物グループの作成処理を、図４のフローチャートを参照して以下に説明する。図４における、Ｓ５０１〜Ｓ５０４、Ｓ５１０〜Ｓ５１３が対象物選択部１２２で実施される処理、Ｓ５０５〜Ｓ５０９が別対象物グループ作成部１１８で実施される処理である。 Here, the process of creating separate object groups in S404 will be described below with reference to the flowchart of FIG. 4. In FIG. 4, S501 to S504 and S510 to S513 are performed by the object selection unit 122, and S505 to S509 are performed by the separate object group creation unit 118.

Ｓ５０１にて、対象物選択部１２２は、制御変数ｉに“１”を設定する。Ｓ５０２にて、対象物選択部１２２は、検索結果のランキングがｉ位の対象物を第一の対象物として選択する。Ｓ５０３にて、対象物選択部１２２は、制御変数ｊに“１”を設定する。Ｓ５０４にて、対象物選択部１２２は、ランキングが「ｉ＋ｊ」位の対象物を第二の対象物として選択する。 In S501, the object selection unit 122 sets “1” in the control variable i. In S502, the object selection unit 122 selects the object whose search result ranking is i-rank as the first object. In S503, the object selection unit 122 sets “1” in the control variable j. In S504, the object selection unit 122 selects an object having a ranking of "i + j" as the second object.

Ｓ５０５にて、別対象物グループ作成部１１８は、第一の対象物を撮影したカメラと第二の対象物を撮影したカメラが同一であるか否かを判定する。別対象物グループ作成部１１８は、この判定がＹｅｓの場合にはＳ５０８に、Ｎｏの場合にはＳ５０６に処理を進める。 In S505, the separate object group creation unit 118 determines whether or not the camera that captured the first object and the camera that captured the second object are the same. The separate object group creation unit 118 proceeds to S508 if the determination is Yes, and to S506 if the determination is No.

Ｓ５０６にて、別対象物グループ作成部１１８は、第一の対象物を撮影したカメラと第二の対象物を撮影したカメラが視野を共有しないか否かを判定する。この判定は、記憶部１０８に予め格納された各カメラの設置位置、撮像方向を定義した情報に基づいて行われるものとする。別対象物グループ作成部１１８は、この判定がＹｅｓの場合はＳ５０７に、Ｎｏの場合はＳ５０８に処理を進める。 In S506, the separate object group creation unit 118 determines whether or not the camera that captured the first object and the camera that captured the second object do not share the field of view. This determination is made based on the information that defines the installation position and the imaging direction of each camera stored in advance in the storage unit 108. The separate object group creation unit 118 proceeds to S507 if the determination is Yes, and proceeds to S508 if the determination is No.

Ｓ５０７にて、別対象物グループ作成部１１８は、第一の対象物の撮影期間と第二の対象物の撮影期間に重なりがあるか否かを判定する。別対象物グループ作成部１１８は、この判定がＹｅｓの場合にはＳ５０９に、Ｎｏの場合にはＳ５０８に処理を進める。 In S507, the separate object group creation unit 118 determines whether or not there is an overlap between the shooting period of the first object and the shooting period of the second object. The separate object group creation unit 118 proceeds to S509 if the determination is Yes, and proceeds to S508 if the determination is No.

Ｓ５０８に処理が到達した場合、第一の対象物と第二の対象物は別対象物である条件を満たさないことを意味する。よって、別対象物グループ作成部１１８は、第一の対象物と第二の対象物とを別対象物グループであるとは判定しない。 When the process reaches S508, the first object and the second object do not satisfy the conditions for being different objects. Therefore, the separate object group creation unit 118 does not determine that the first object and the second object form a separate object group.

一方、Ｓ５０９に処理が到達した場合、第一の対象物と第二の対象物は別対象物である条件（非同一であるという条件）を満たすことになる。よって、別対象物グループ作成部１１８は、第一の対象物と第二の対象物は、別対象物グループであると判定する。 On the other hand, when the process reaches S509, the first object and the second object satisfy the conditions for being different objects (that is, for being non-identical). Therefore, the separate object group creation unit 118 determines that the first object and the second object form a separate object group.

Ｓ５１０にて、対象物選択部１２２は、制御変数ｊに“１”を加算する。そして、Ｓ５１１にて、対象物選択部１２２は、制御変数ｊがランキングの最下位であるＮ位に到達したか否かを判定する。対象物選択部１２２は、この判定がＹｅｓの場合にはＳ５１２に、Ｎｏの場合にはＳ５０４に処理を進める。つまりＳ５０４に進んだ場合には、対象物選択部１２２は、新たな第二の対象物の再選択処理を行うことになる。 In S510, the object selection unit 122 adds “1” to the control variable j. Then, in S511, the object selection unit 122 determines whether or not the control variable j has reached the Nth position, which is the lowest in the ranking. The object selection unit 122 proceeds to S512 if the determination is Yes, and to S504 if the determination is No. That is, when the process proceeds to S504, the object selection unit 122 reselects a new second object.

Ｓ５１２にて、対象物選択部１２２は制御変数ｉに“１”を加算する。そして、Ｓ５１３にて、対象物選択部１２２は、制御変数ｉが絞り込みされたランキングの最下位の一つ上位の順位であるＮ−１位に到達したかを判定する。対象物選択部１２２は、この判定がＹｅｓの場合には、別対象物グループ作成処理を終了する。また、判定がＮｏの場合、対象物選択部１２２は、処理をＳ５０２に進め、第一の対象物の再選択を行い、上記処理を繰り返す。 In S512, the object selection unit 122 adds “1” to the control variable i. Then, in S513, the object selection unit 122 determines whether the control variable i has reached the N-1 position, which is one higher rank than the lowest in the narrowed down ranking. If the determination is Yes, the object selection unit 122 ends the process of creating another object group. If the determination is No, the object selection unit 122 advances the process to S502, reselects the first object, and repeats the above process.
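
The pairwise loop of S501 to S513 might look like the sketch below. The `Detection` record and the `shares_view` callback (standing in for the camera placement information held in the storage unit 108) are hypothetical names. The loop simply visits every pair of ranks in which the first object is ranked above the second, which is one reading of the i and j counters.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    person_id: int
    camera: str
    start: float  # start of the shooting period
    end: float    # end of the shooting period

def periods_overlap(a, b):
    """True when the two shooting periods share at least one instant."""
    return a.start <= b.end and b.start <= a.end

def create_separate_object_groups(ranked, shares_view):
    """ranked: Detections in rank order; shares_view(cam_a, cam_b) -> bool."""
    groups = []
    n = len(ranked)
    for i in range(n - 1):             # first object (S502)
        for k in range(i + 1, n):      # second object (S504)
            first, second = ranked[i], ranked[k]
            if first.camera == second.camera:              # S505: same camera
                continue
            if shares_view(first.camera, second.camera):   # S506: shared view
                continue
            if not periods_overlap(first, second):         # S507: no overlap
                continue
            groups.append((first.person_id, second.person_id))  # S509
    return groups
```

Two detections end up in a group only when they come from different cameras with no shared field of view and their shooting periods overlap, which is the condition under which they cannot be the same person.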

上記の図４の別対象物グループ作成処理に基づく別対象物グループが作成される事例を図５（Ａ）〜（Ｃ）を参照して説明する。 An example in which a separate object group is created based on the separate object group creation process of FIG. 4 will be described with reference to FIGS. 5 (A) to 5 (C).

図５（Ａ）は、１次検索結果の例である。図示では、スコア（類似度）が７８０点以上となる検索結果を１次検索結果として抽出しているものとする。また、カメラは２台（カメラ１、カメラ２）で監視し、且つ、この２台のカメラは撮影視野を共有しないように配置されているものとする。撮影視野が共有されていないため、同一人物が２台のカメラに同時に撮影されることはない。図５（Ｂ）と図５（Ｃ）に横軸に各検索結果の撮影期間、縦軸に対象物の検出位置の画像座標のｘ座標、またはｙ座標をとり、プロットする。フレームをまたがって追跡を行うためには、複数フレーム間の同一人物判定をする必要がある。フレーム間の同一人物の判定には、画像データから色ヒストグラム特徴を抽出して、パーティクルフィルタを用いる方法などを用いることができる。図５（Ａ）に示すＩＤが“３６”の人物が、図５（Ｂ）の“ＩＤ３６”に対応するものと理解されたい。 FIG. 5 (A) is an example of the primary search result. In the illustrated example, search results with a score (similarity) of 780 points or more are extracted as the primary search results. Monitoring is performed with two cameras (camera 1 and camera 2), which are arranged so that they do not share a field of view. Because the fields of view are not shared, the same person is never captured by both cameras at the same time. FIGS. 5 (B) and 5 (C) plot each search result with the shooting period on the horizontal axis and the x-coordinate or y-coordinate, in image coordinates, of the detected position of the object on the vertical axis. Tracking across frames requires determining whether detections in multiple frames are the same person. For this determination, a method such as extracting color histogram features from the image data and applying a particle filter can be used. The person whose ID is "36" in FIG. 5 (A) corresponds to "ID36" in FIG. 5 (B).

本実施形態に従えば、順位が１位となるＩＤ３６の対象物を第一の対象物として先ず選択される。ここで、仮に、順位が４位のＩＤ５６の対象物が第二の対象物として選択されたとする。このＩＤ５６の人物の撮影時刻はｔ１であり、第一の対象物であるＩＤ３６の人物と撮影期間に重なりが発生する。故に、ＩＤ３６とＩＤ５６の人物は、別対象物グループと判定される。この２名をグループＡとする。同様に順位が３位のＩＤ６２の対象物が第一の対象物として選択されたとする。このとき、順位５位のＩＤ６１が第二の対象物として選択された場合に、時刻ｔ２において撮影期間に重なりが発生しているため、別対象物グループと判定される。この２名をグループＢとする。 According to the present embodiment, the object with ID 36 having the first rank is first selected as the first object. Here, it is assumed that the object with ID 56 having the fourth rank is selected as the second object. The shooting time of the person with ID 56 is t1, and the shooting period overlaps with the person with ID 36, which is the first object. Therefore, the persons with ID 36 and ID 56 are determined to be different object groups. These two people will be referred to as Group A. Similarly, it is assumed that the object with ID 62 having the third rank is selected as the first object. At this time, when the ID 61 of the fifth rank is selected as the second object, it is determined to be another object group because the shooting period overlaps at time t2. These two people are referred to as group B.

図３の説明に戻る。図３のＳ４０５にて、再検索部１１９は、別対象物グループが作成されたか否かを判定し、作成されたと判定した場合は処理をＳ４０６に、作成できなかったと判定した場合は本処理を終了する。 Returning to the description of FIG. 3. In S405 of FIG. 3, the re-search unit 119 determines whether a separate object group has been created. If one has been created, the process proceeds to S406; if none could be created, this process ends.

Ｓ４０６にて、再検索部１１９は、Ｓ４０４で作成した別対象物グループ内の複数の人物の特徴量を利用して、特徴空間を変換するために用いるパラメータ群として、重みベクトルｗ＝（ｗ₁，ｗ₂，…，ｗ_d）を算出する。ここでｄは特徴空間の次元数であり、重みベクトルは次元ごとに変換のための重み係数を設定するものである。各特徴量に対して、この重みベクトルを掛け合わせることで、特徴空間変換をおこなうことができる。 In S406, the re-search unit 119 uses the feature amounts of the plurality of persons in the separate object group created in S404 to calculate a weight vector w = (w_1, w_2, ..., w_d) as the set of parameters used to transform the feature space. Here, d is the number of dimensions of the feature space, and the weight vector sets a weighting coefficient for the transformation in each dimension. Multiplying each feature amount by this weight vector, dimension by dimension, performs the feature space transformation.

別対象物グループ内の人物Ａの特徴量をａとし、別対象物グループ内の人物Ｂの特徴量をｂとし、重みベクトルを以下の式（１）および式（２）で算出する。 The feature amount of the person A in the different object group is a, the feature amount of the person B in the different object group is b, and the weight vector is calculated by the following equations (1) and (2).

式（１）では、次元ごとに人物Ａの特徴量と人物Ｂの特徴量との差分を計算し、次元ごとの重みを計算する。本式により、人物Ａの特徴量と人物Ｂの特徴量の数値の差が大きい次元ほど重み値が大きくなる。人物Ａの特徴量と人物Ｂの特徴量の数値の差が大きい次元は、クエリ画像と類似した外観をした複数の人物間を識別するための特徴を良く示していると考えられるため、この次元の重みを大きくすることで、人物Ａと人物Ｂの違いを強調することができる。クエリ画像の人物が人物Ａ、人物Ｂのいずれかであった場合、本人と他者との識別能力が向上する効果がある。クエリ画像の人物が人物Ａ、人物Ｂのいずれでもない場合であっても、クエリ画像の人物と人物Ａ、クエリ画像の人物と人物Ｂの識別能力が向上する効果が得られる。 In the equation (1), the difference between the feature amount of the person A and the feature amount of the person B is calculated for each dimension, and the weight for each dimension is calculated. According to this equation, the larger the difference between the numerical values of the feature amount of the person A and the feature amount of the person B, the larger the weight value. The dimension in which the difference between the feature amount of the person A and the feature amount of the person B is large is considered to well indicate the feature for distinguishing between a plurality of people having an appearance similar to the query image. By increasing the weight of, the difference between the person A and the person B can be emphasized. When the person in the query image is either person A or person B, there is an effect of improving the ability to distinguish between the person and another person. Even when the person in the query image is neither the person A nor the person B, the effect of improving the discriminating ability between the person and the person A in the query image and the person and the person B in the query image can be obtained.

式（２）では、最終重み値を得るための正規化をおこなう。各次元の重みの総和が次元数dとなるように正規化することで、変換前後での特徴空間における距離の絶対値の取りうる値の差を少なくすることができる。 In equation (2), normalization is performed to obtain the final weight value. By normalizing so that the sum of the weights of each dimension is the number of dimensions d, it is possible to reduce the difference between the absolute values of the distances in the feature space before and after the conversion.
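
Equations (1) and (2) themselves are not reproduced in this text. One reading consistent with the description is w'_i = |a_i - b_i| for the raw per-dimension weight and w_i = d * w'_i / (w'_1 + ... + w'_d) for the normalized weight. The sketch below follows that reading; the function name and the uniform fallback for identical vectors are assumptions.

```python
def weight_vector(a, b):
    """Per-dimension weights from the feature vectors of two different people."""
    raw = [abs(x - y) for x, y in zip(a, b)]  # assumed form of equation (1)
    total = sum(raw)
    d = len(raw)
    if total == 0:
        # Assumption: identical vectors carry no information, so stay uniform
        return [1.0] * d
    # Assumed form of equation (2): scale so that the weights sum to d
    return [d * r / total for r in raw]
```

With this normalization a uniform weight vector consists entirely of ones, so the transformed space coincides with the original one.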

前述した識別能力は、ある人物とその人物とは別の人物の識別が期待できる相対的な度合いを評価するものである。例えば、重みの偏差平方和と、式（４）の重みの有効次元に基づいて算出した有効次元の割合を掛け合わせた式（３）で算出できる。 The above-mentioned discriminating ability evaluates the relative degree to which one person and another person can be expected to be discriminated from each other. For example, it can be calculated by the equation (3) obtained by multiplying the sum of squared deviations of the weights and the ratio of the effective dimensions calculated based on the effective dimensions of the weights in the equation (4).

重みベクトル内の各次元の重みが均一である場合には、重みベクトルを掛けても元の特徴空間から変化がないため、識別能力は高くならない。従って、識別能力を高くするためには、次元ごとに重みがある程度異なっている必要がある。そのため、次元ごとの重みの分散度合いを表すために、式（４）の第一項に偏差平方和を用いている。 When the weights of each dimension in the weight vector are uniform, the discrimination ability does not increase because there is no change from the original feature space even if the weight vector is multiplied. Therefore, in order to improve the discriminating ability, the weights need to be different for each dimension to some extent. Therefore, in order to express the degree of variance of the weights for each dimension, the sum of squared deviations is used in the first term of the equation (4).

一方で、重みのばらつきがある場合であっても、ごく一部の次元にのみ重みが集中しており、重みが０に近い次元が大半となる場合には、識別に寄与する次元数が著しく少なくなり、却って識別性が下がることが考えられる。この特性を反映するために、式（４）の第二項に重みが予め定めた基準値以上となる次元の割合を用いている。次元の割合の代わりに、単純に次元数を用いてもよい。 On the other hand, even when the weights vary, if they are concentrated in only a few dimensions and most dimensions have weights close to 0, the number of dimensions that contribute to identification becomes extremely small, and the discriminability may instead decrease. To reflect this characteristic, the second term of equation (4) uses the ratio of dimensions whose weight is equal to or greater than a predetermined reference value. The number of such dimensions may simply be used instead of the ratio.
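
Equations (3) and (4) are likewise not reproduced here. The sketch below assumes the simplest form matching the prose: the sum of squared deviations of the weights multiplied by the ratio of dimensions whose weight is at least a reference value. The function name and the `reference` parameter are hypothetical, and the actual equations may differ in detail.

```python
def discriminability(weights, reference):
    """Sum of squared deviations times the ratio of effective dimensions."""
    d = len(weights)
    mean = sum(weights) / d
    # First factor: how unevenly the weights are spread across dimensions
    sum_sq_dev = sum((w - mean) ** 2 for w in weights)
    # Second factor: share of dimensions whose weight reaches the reference
    effective_ratio = sum(1 for w in weights if w >= reference) / d
    return sum_sq_dev * effective_ratio
```

Uniform weights score zero because the first factor vanishes, while weights concentrated in a single dimension are penalized by the second factor.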

一方、人物Ａの特徴量と人物Ｂの特徴量の数値の差が小さい次元ほど重み値が小さくなる。この数値の差が小さい次元は、クエリ画像と類似した外観をした複数の人物間の識別に寄与しないと考えられるため重みを小さくし、この次元の類似性が検索結果に与える影響を小さくする。 On the other hand, the smaller the difference between the numerical values of the feature amount of the person A and the feature amount of the person B, the smaller the weight value. A dimension with a small difference in numerical values is considered not to contribute to discrimination between a plurality of people having an appearance similar to a query image, so the weight is reduced, and the influence of the similarity of this dimension on the search result is reduced.

Ｓ４０７にて、再検索部１１９は、別対象物グループ数が２つ以上であるかの判定を行う。再検索部１１９は別対象物グループ数が１つであると判定した場合はＳ４０９に、２以上であると判定した場合はＳ４０８に処理を進める。 In S407, the re-search unit 119 determines whether or not the number of different object groups is two or more. When the re-search unit 119 determines that the number of different object groups is one, the process proceeds to S409, and when it is determined that the number of different object groups is two or more, the process proceeds to S408.

Ｓ４０８にて、再検索部１１９は、別対象物グループ毎の重みベクトルを統合する。統合の仕方としては、各グループの重みベクトルの値のうち、最大となる値を残す方法がある。各グループで強調する次元が異なる場合、最大値を残すことで、各グループで強調する次元の情報を保存することができる。 In S408, the re-search unit 119 integrates the weight vectors for each different object group. As a method of integration, there is a method of leaving the maximum value among the values of the weight vector of each group. When the dimension to be emphasized in each group is different, the information of the dimension to be emphasized in each group can be saved by leaving the maximum value.

例えば、グループが２つあり、グループ１で算出された重みベクトルをｗ₁＝（ｗ₁₁，ｗ₁₂，…，ｗ_1d）、グループ２で算出された重みベクトルをｗ₂＝（ｗ₂₁，ｗ₂₂，…，ｗ_2d）であるとする。このとき、統合後の重みベクトルをｗ_u＝（ｗ_u1，ｗ_u2，…，ｗ_ud）とすると、ｗ_uは以下の式（５）で算出される。 For example, suppose there are two groups, the weight vector calculated for group 1 is w_1 = (w_11, w_12, ..., w_1d), and the weight vector calculated for group 2 is w_2 = (w_21, w_22, ..., w_2d). If the integrated weight vector is w_u = (w_u1, w_u2, ..., w_ud), then w_u is calculated by the following equation (5).

ここで、max(…)は、括弧内に並んだ数値列の中の最大値を返す関数である。 Here, max(...) is a function that returns the maximum value in the sequence of numbers listed in parentheses.
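
Equation (5) is not reproduced in this text; given the description, it presumably takes the per-dimension maximum, w_ui = max(w_1i, w_2i). A minimal sketch under that assumption, generalized to any number of groups (the function name is hypothetical):

```python
def integrate_weights(group_weights):
    """Element-wise maximum over the weight vectors of all groups (S408)."""
    # zip(*...) walks the vectors dimension by dimension
    return [max(column) for column in zip(*group_weights)]
```

The text does not say whether the integrated vector is normalized again afterwards, so this sketch leaves it as is.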

ここで、グループを統合することにより識別能力が向上する理由を、図１３と、図５及び図６を参照して説明する。 Here, the reason why integrating the groups improves the discriminating ability will be described with reference to FIG. 13, FIG. 5, and FIG. 6.

図５において、ＩＤ３６、ＩＤ５６の２名をグループＡ、ＩＤ６１、ＩＤ６２の２名をグループＢとする。また理解を容易にするため、ＩＤ６２の人物が本人、ＩＤ３６の人物が人物Ａ、ＩＤ５６およびＩＤ６１の人物が人物Ｂであるとする。図１３に、グループＡの２名の違いとグループＢの２名の違いをまとめた。○印は違いがある部分、×印は違いがない部分である。ここで、グループＡとグループＢの２つのグループを統合するときに、図１３の４列目に記載したようにそれぞれのグループの違いが、保存されるように統合されると、グループの個別の識別能力よりも統合後の識別能力が向上する。 In FIG. 5, the two persons with ID36 and ID56 form group A, and the two persons with ID61 and ID62 form group B. To make this easier to understand, assume that the person with ID62 is the target person, the person with ID36 is person A, and the persons with ID56 and ID61 are person B. FIG. 13 summarizes the differences between the two members of group A and between the two members of group B. A circle (○) marks a part where there is a difference, and a cross (×) marks a part where there is none. When groups A and B are integrated so that the differences found in each group are preserved, as shown in the fourth column of FIG. 13, the discriminating ability after integration is higher than that of either group alone.

図６（Ａ）乃至（Ｇ）を参照して、より具体的に説明する。同図のグラフは横軸が次元数、縦軸が各次元の特徴量の大きさを表している。図６（Ａ）が他人Ａの特徴量、図６（Ｂ）が他人Ｂの特徴量を表している。前述した式（１）でグループＡに属する他人Ａと他人Ｂの特徴量の違いを求めたものが図６（Ｃ）である。他人Ａと他人Ｂとでは靴の色が異なるため、その部分に相当する次元の特徴量差が大きく出る。 A more specific description will be given with reference to FIGS. 6 (A) to 6 (G). In the graph of the figure, the horizontal axis represents the number of dimensions and the vertical axis represents the magnitude of the feature amount of each dimension. FIG. 6A shows the feature amount of another person A, and FIG. 6B shows the feature amount of another person B. FIG. 6 (C) shows the difference in the feature amount between the other person A and the other person B belonging to the group A by the above-mentioned formula (1). Since the color of the shoes is different between the other person A and the other person B, the difference in the feature amount of the dimension corresponding to that part is large.

続いて、図６（Ｄ）が本人の特徴量、図６（Ｅ）が他人Ｂの特徴量を表している。グループＡと同様に前述した式（１）でグループＢに属する本人と他人Ｂの特徴量の違いを求めたものが図６（Ｆ）である。本人と他人Ｂではカバンの有無が異なるため、その部分に相当する次元の特徴量差が大きく出る。 Subsequently, FIG. 6 (D) shows the feature amount of the person himself / herself, and FIG. 6 (E) shows the feature amount of the other person B. FIG. 6 (F) shows the difference in the feature amount between the person belonging to the group B and the other person B by the above-mentioned formula (1) as in the group A. Since the presence or absence of a bag is different between the person and the other person B, there is a large difference in the amount of features in the dimension corresponding to that part.

図６（Ｇ）は、図３のＳ４０８の重みベクトルの統合処理の結果である。グループＡの靴の違いと、グループＢのカバンの違いの双方が強調される重みベクトルが算出される。類似した服装を着用した人物においては、靴の情報やカバンの情報のいずれか一方しか違いが見られないケースもあるため、できるだけ多くの違いを強調可能な重みベクトルを用いた方が、識別能力は高い。したがって、複数のグループの重みベクトルを統合することで、識別能力が向上することが理解できよう。 FIG. 6 (G) shows the result of the weight vector integration process of S408 in FIG. 3. A weight vector is calculated that emphasizes both the difference in shoes in group A and the difference in bags in group B. Among people wearing similar clothes, there are cases where a difference appears in only the shoe information or only the bag information, so a weight vector that can emphasize as many differences as possible gives higher discriminating ability. It can therefore be understood that integrating the weight vectors of a plurality of groups improves the discriminating ability.

図３の説明に戻る。Ｓ４０９にて、再検索部１１９は、重みベクトルを用いて変換した後の特徴空間において、クエリ特徴量と特徴量データベース１１４内の特徴量の類似度を算出することで、画像検索を行う。変換後の特徴空間での類似度を算出する際には、特徴量に重み係数を掛けた上（補正した上で）で、ユークリッド距離を用いてもよいし、重み付きユークリッド距離を用いて、変換後の特徴空間における距離を直接算出してもよい。クエリ特徴量ｑと特徴量データベース１１４内の特徴量ｘの重み付きユークリッド距離は以下の式（６）で算出することができる。人物Ａと人物Ｂを識別するために有効な次元の重みが大きくなっているため、識別に有効な次元の特徴量の差が大きい場合に距離が大きく（類似度が小さく）なり、識別に有効な次元の特徴量の差が小さい場合に距離が小さく（類似度が高く）なる効果がある。また識別に有効ではない次元は、重みが小さくなっており、これらの次元の違いは距離計算結果へ影響しにくくなっている。 Returning to the description of FIG. 3. In S409, the re-search unit 119 performs an image search by calculating, in the feature space transformed with the weight vector, the similarity between the query feature amount and the feature amounts in the feature amount database 114. To calculate the similarity in the transformed feature space, the feature amounts may first be multiplied by the weighting coefficients (that is, corrected) and the Euclidean distance then used, or a weighted Euclidean distance may be used to calculate the distance in the transformed feature space directly. The weighted Euclidean distance between the query feature amount q and a feature amount x in the feature amount database 114 can be calculated by the following equation (6). Because the dimensions that are effective for distinguishing person A from person B have large weights, the distance becomes large (the similarity becomes low) when the feature amounts differ greatly in those dimensions, and becomes small (the similarity becomes high) when they differ little in those dimensions. Dimensions that are not effective for identification have small weights, so differences in those dimensions have little influence on the distance calculation result.
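
Equation (6) is not reproduced in this text, so the exact form of the weighted distance is unknown. The sketch below takes the first option described above: both vectors are scaled by the weights and the ordinary Euclidean distance is then applied, which equals sqrt(sum_i w_i^2 (q_i - x_i)^2). A conventional weighted Euclidean distance would use w_i in place of w_i^2. The function names, database layout, and `eps` are assumptions.

```python
import math

def weighted_euclidean(q, x, w):
    """Euclidean distance after scaling every dimension by its weight."""
    return math.sqrt(sum((wi * (qi - xi)) ** 2 for qi, xi, wi in zip(q, x, w)))

def re_search(query, db, w, top_n, eps=1e-9):
    """Re-rank (record_id, feature_vector) pairs in the transformed space."""
    scored = [
        (rid, 1.0 / (weighted_euclidean(query, feat, w) + eps))
        for rid, feat in db
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]  # S410: the top results are sent to the display
```

A record that differs from the query only in a zero-weight dimension moves to the top of the ranking, even if it was far away in the original feature space.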

最後に、Ｓ４１０にて、再検索部１１９は、前のＳ４０９で算出した類似度が高い特徴量を上位から所定件数分選択し、該特徴量に紐づくサムネイル画像を表示部１１０に出力する。 Finally, in S410, the re-search unit 119 selects a predetermined number of features with high similarity calculated in the previous S409 from the top, and outputs thumbnail images associated with the features to the display unit 110.

なお、実施形態にて利用可能な重み計算方法は、前述の式（１）〜（２）に限定されるものではない。式（１）においては、２つの特徴量の差分の絶対値ではなく、差分の割合を利用してもよいし、算出される重みを０または１の２値にしてもよい。重みを２値にした場合は、重みが０の次元については類似度に寄与しなくなるため、次元削減の効果があり、計算量を削減することが出来る。 The weight calculation method that can be used in the embodiment is not limited to the above equations (1) and (2). In the formula (1), the ratio of the difference may be used instead of the absolute value of the difference between the two features, or the calculated weight may be a binary value of 0 or 1. When the weight is binary, the dimension having a weight of 0 does not contribute to the similarity, so that there is an effect of dimension reduction and the amount of calculation can be reduced.
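
The binary-weight variant could be sketched as follows. The text does not say how a dimension is assigned 0 or 1; thresholding the absolute difference is an assumption here, as are the function names.

```python
def binary_weights(a, b, threshold):
    """1 for dimensions whose difference reaches the threshold, otherwise 0."""
    return [1 if abs(x - y) >= threshold else 0 for x, y in zip(a, b)]

def masked_squared_distance(q, x, w):
    """Squared distance over the dimensions with weight 1 only."""
    # Zero-weight dimensions are skipped entirely, which is where the
    # reduction in computation comes from
    active = [i for i, wi in enumerate(w) if wi]
    return sum((q[i] - x[i]) ** 2 for i in active)
```

In practice the list of active dimensions would be computed once per query and reused for every database entry.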

また、画像位置と、特徴量次元の関係が明確である場合、画像のどの位置に特徴量の違いがあり、強調したのかをユーザに提示してもよい。図７（Ａ）に画像に直接重畳表示する場合の例を、図７（Ｂ）に文字情報で表示する場合の例を示す。これらの画像は表示部１１０に表示されるものと理解されたい。図７（Ａ）、図７（Ｂ）では、特徴量の違いが大きかった次元がネクタイ、カバン、靴の部位に対応する例を示しており、図７（Ａ）では、その部位に対応する画像位置に点線の丸印を重畳表示する例を表している。点線丸印１１０１がネクタイを、点線丸印１１０２がカバンを、点線丸印１１０３が靴に対応する位置に描画されている。図７（Ｂ）では特徴量の差が閾値以上であるネクタイ、カバン、靴を文字情報で表示する。画像への重畳表示と文字の表示の双方を表示してもよい。このように違いが大きかった部位をユーザに提示することで、ユーザは探したい人物と類似した服装を着用した人物の外観上の違いがどこにあるのかを明示的に知ることができる。 Further, when the relationship between image positions and feature dimensions is clear, the user may be shown which positions in the image had differing feature amounts and were therefore emphasized. FIG. 7 (A) shows an example of superimposing this directly on the image, and FIG. 7 (B) shows an example of displaying it as text. These images are displayed on the display unit 110. FIGS. 7 (A) and 7 (B) show an example in which the dimensions with large feature differences correspond to the tie, the bag, and the shoes; in FIG. 7 (A), dotted circles are superimposed at the image positions corresponding to those parts. The dotted circle 1101 is drawn at the position of the tie, the dotted circle 1102 at the position of the bag, and the dotted circle 1103 at the position of the shoes. In FIG. 7 (B), the tie, the bag, and the shoes, whose feature differences are equal to or greater than a threshold value, are displayed as text. The superimposed display and the text display may both be shown. Presenting the parts with large differences in this way lets the user see explicitly where the person being sought differs in appearance from people wearing similar clothes.

本実施形態によって、ユーザが探したい人物と類似した服装を着用した別人が多数含まれる条件下においても、探したい人物を自動的に検索結果の上位にランクインさせることができる。これは検索結果の中から複数の別対象物を見つけ出し、特徴量の差分が大きい次元の重みを大きくして、スコアを再計算することにより、識別性能を改善することで実現される。対象物グループが複数作成された場合には、それぞれのグループで求めた重みベクトルを統合することで、探したい人物の検索結果のランクをさらに改善することができる。また、特徴量に違いが生じた部分を表示部に表示することで、ユーザは探したい人物と類似した服装を着用した人物の外観上の違いがどこにあるのかを明示的に知ることができるようになる。 According to this embodiment, even when the results include many other people wearing clothes similar to those of the person the user is looking for, that person can automatically be ranked near the top of the search results. This is achieved by finding a plurality of different objects in the search results, increasing the weights of the dimensions in which the feature amounts differ greatly, and recalculating the scores, thereby improving the discrimination performance. When a plurality of object groups are created, integrating the weight vectors obtained for the respective groups can further improve the rank of the person being sought. In addition, displaying on the display unit the parts where the feature amounts differ lets the user see explicitly where the person being sought differs in appearance from people wearing similar clothes.

なお、上記実施形態では、カメラが２台の例を説明したが、互いに撮影視野を非共有とする３以上のカメラに適用しても良い。この場合、複数台のカメラのうち２台のカメラ間で、同一時期とみなせる時間帯に撮像された人物は異なる人物であると見なせばよい。 In the above embodiment, the example of two cameras has been described, but it may be applied to three or more cameras that do not share the shooting field of view with each other. In this case, the person imaged in the time zone that can be regarded as the same time between the two cameras out of the plurality of cameras may be regarded as different persons.

以上のように本実施形態によれば、検索結果の中から複数の別の対象物を見つけ出し、その検索結果同士の特徴量の差分の重みを大きくすることで、特徴量の多くの部分が類似した対象物の中から対象物固有に共通して存在する違いの部分を強調し、対象物間の識別能力を改善できる。特に人物画像においては、類似した服装を着た人物の違いがある部分の重みが大きくなるため、人物間の識別能力を改善することができる。 As described above, according to the present embodiment, by finding a plurality of different objects from the search results and increasing the weight of the difference between the features of the search results, many parts of the features are similar. It is possible to emphasize the differences that are common to each object and improve the ability to discriminate between the objects. In particular, in a person image, the weight of the portion where there is a difference between people wearing similar clothes becomes large, so that the ability to discriminate between people can be improved.

［第２の実施形態］ [Second Embodiment]
上記第１の実施形態では、対象物として人物の全身領域を対象としていた。これに対し、人物の他の特徴量を併用する場合の識別能力の向上方法を第２の実施形態として説明する。本第２の実施形態で開示するものは、例えば、学生のように服装や持ち物までもが同一の人物においても本人の識別能力を改善する方法である。 In the first embodiment, the whole body region of a person is targeted as the object. In contrast, a method for improving the discrimination ability when other feature amounts of a person are used in combination will be described as the second embodiment. What is disclosed in the second embodiment is a method for improving the ability to identify the target person even among people whose clothes and belongings are the same, such as students.

本第２の実施形態では顔領域が十分な大きさ（もしくは十分な高い解像度）で撮影されている場合を想定しており、この場合には顔特徴量による識別と全身特徴量による識別の双方が利用可能となる。 In the second embodiment, it is assumed that the face region is photographed with a sufficiently large size (or sufficiently high resolution), and in this case, both the identification by the facial feature amount and the identification by the whole body feature amount are performed. Will be available.

本第２の実施形態における対象画像検索装置のハードウェア構成図は、第１の実施形態で示した図１（Ａ）と同じである。また、本第２の実施形態におけるデータフローも、第１の実施形態で示した図１（Ｂ）と同じである。よって、装置構成についての説明は省略し、第１の実施形態との差について述べることとする。 The hardware configuration diagram of the target image search device in the second embodiment is the same as FIG. 1 (A) shown in the first embodiment. Further, the data flow in the second embodiment is also the same as that shown in FIG. 1 (B) shown in the first embodiment. Therefore, the description of the device configuration will be omitted, and the difference from the first embodiment will be described.

対象物抽出部１０３は、映像データおよびクエリ画像から人物の顔領域と全身領域とを抽出する。特徴量抽出部１０４は人物の顔領域からは顔特徴量を、人物の全身領域からは全身特徴量を抽出する。特徴量登録部１０５では、全身特徴量と、顔特徴量の双方を特徴量データベース１１４に登録する。通常検索部１１７、および、再検索部１１９は、クエリ特徴量１１６に紐づく全身特徴量と顔特徴量を結合した結合クエリ特徴量により、特徴量データベース１１４に登録された登録特徴量１１３との類似度を求める。類似度を算出する前に同一の画像から抽出された全身特徴量と顔特徴量は結合特徴量化される。もし、画像から一方の特徴量しか抽出できなかった場合には、結合時に各次元の特徴量を０とするか、抽出できた顔特徴量あるいは全身特徴量のどちらか一方と類似度演算をする。あるいは、一方の特徴量しか抽出できなかったものは、検索対象から除外してもよい。 The object extraction unit 103 extracts a person's face region and whole body region from the video data and the query image. The feature amount extraction unit 104 extracts a facial feature amount from the face region and a whole body feature amount from the whole body region. The feature amount registration unit 105 registers both the whole body feature amount and the facial feature amount in the feature amount database 114. The normal search unit 117 and the re-search unit 119 use a combined query feature amount, obtained by combining the whole body feature amount and the facial feature amount associated with the query feature amount 116, to calculate the similarity to the registered feature amounts 113 registered in the feature amount database 114. Before the similarity is calculated, the whole body feature amount and the facial feature amount extracted from the same image are combined into a combined feature amount. If only one of the two feature amounts could be extracted from an image, either each dimension of the feature amount that could not be extracted is set to 0 when combining, or the similarity is calculated using only the facial or whole body feature amount that was extracted. Alternatively, items for which only one feature amount could be extracted may be excluded from the search targets.
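
The zero-filling option for building a combined feature amount could be sketched as below. The function name, the use of `None` for a feature amount that could not be extracted, and the explicit dimension arguments are assumptions; the whole body part is placed first and the facial part second.

```python
def combine_features(body, face, body_dim, face_dim):
    """Concatenate whole body and facial features, zero-filling a missing one."""
    body_part = list(body) if body is not None else [0.0] * body_dim
    face_part = list(face) if face is not None else [0.0] * face_dim
    return body_part + face_part
```

The resulting vector can be passed to the same similarity and weighting steps as in the first embodiment, since those steps only assume a fixed-length vector.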

本第２の実施形態における特徴量検索部１０６の処理のフローチャートは第１の実施形態の図３と同じである。図８（Ａ）乃至（Ｃ）を参照し、全身特徴量と顔特徴量の結合特徴量でＳ４０６の重みベクトルを計算する例について説明する。 The flowchart of the processing of the feature amount search unit 106 in the second embodiment is the same as that in FIG. 3 of the first embodiment. An example of calculating the weight vector of S406 from the combined feature amount of the whole body feature amount and the facial feature amount will be described with reference to FIGS. 8A to 8C.

図８（Ａ）は、検索結果の中から抽出された別対象物グループの１人目の人物の結合特徴量、図８（Ｂ）が検索結果の中から抽出された別対象物グループの２人目の人物の結合特徴量を示している。図８（Ａ）と図８（Ｂ）の点線の位置を境にして、左側が全身特徴量、右側が顔特徴量である。この図の例では、図８（Ａ）の人物と図８（Ｂ）の人物が同じ服・靴を着用していて、かつ、持ち物も同じ場合の例である。全身特徴量で差がつかない。一方で顔特徴量は差が大きいため、重みベクトルとしては顔特徴量が全身特徴量よりも重要視される。図８（Ｃ）は、図８（Ａ）の人物と図８（Ｂ）の人物の結合特徴量の差分を計算したものである。顔特徴量の差が大きいため、顔特徴量の違いが強調されることがわかる。 FIG. 8 (A) shows the combined feature amount of the first person in a separate object group extracted from the search results, and FIG. 8 (B) shows that of the second person in the group. In FIGS. 8 (A) and 8 (B), the whole body feature amount is to the left of the dotted line and the facial feature amount is to the right. In this example, the person in FIG. 8 (A) and the person in FIG. 8 (B) wear the same clothes and shoes and also carry the same belongings, so the whole body feature amounts show no difference. The facial feature amounts, on the other hand, differ greatly, so the weight vector gives more importance to the facial feature amount than to the whole body feature amount. FIG. 8 (C) shows the calculated difference between the combined feature amounts of the persons in FIGS. 8 (A) and 8 (B). Because the difference in the facial feature amounts is large, the difference in the facial feature amounts is emphasized.

本第２の実施形態では、顔特徴量と全身特徴量を結合した結合特徴量間で類似度を求める例を説明したが、顔特徴量と全身特徴量のそれぞれで検索を行うこともできる。全身のクエリ特徴量で検索を行い、検索結果の上位から別人グループを作成し、全身用重みベクトルを求める。続いて、顔のクエリ特徴量で検索を行い、検索結果の上位から別人グループを作成し、顔用の重みベクトルを求める。各々の重みベクトルをそれぞれの特徴量のスコア計算に利用してもよい。 In the second embodiment, an example of obtaining the similarity between the combined feature amount of the face feature amount and the whole body feature amount has been described, but it is also possible to search for each of the face feature amount and the whole body feature amount. A search is performed using the query features of the whole body, a different group is created from the top of the search results, and the weight vector for the whole body is obtained. Then, a search is performed using the query feature amount of the face, another person group is created from the top of the search results, and the weight vector for the face is obtained. Each weight vector may be used for score calculation of each feature.

The second embodiment has been described using a case in which the whole body feature amounts show no difference and the facial feature amounts do, but its applicable range is not limited to that case. For example, it can also be applied when the difference appears in attribute information. Attributes such as gender, age, hair color, hair length, and walking style (gait) can be observed as differences even when the clothes and belongings are identical. When the clothes are identical but the belongings differ, information about the belongings can also be used as attribute information.

In the second embodiment, when a plurality of persons wear the same clothes and carry the same belongings, the whole body feature amounts do not differ. Performing a re-search with feature amounts that do differ, such as facial feature amounts or attribute information, allows the person being sought to be ranked near the top of the search results.

[Third Embodiment]
A third embodiment will be described below. The first and second embodiments described examples applied to an image search device. This embodiment describes an example in which the image search device is applied to an image detection device that detects persons registered in a list.

In this case, the image detection device registers images of the persons to be detected in a list and, when a person similar to a registered person is found in the video data, notifies the user in real time. In the image detection device as well, if the appearance of a registered person resembles the appearance of a different person detected in real time from the video data, the similarity becomes high, and a false detection occurs when it exceeds the threshold. Because the image detection device notifies the user of each detection result, frequent false detections may cause the user to stop trusting the results and to stop checking them even when the registered person has been detected correctly. Using this embodiment makes it possible to raise the correct detection rate and reduce the frequency of false detections.

FIG. 9(A) is a hardware configuration diagram of the target image detection device according to the third embodiment. Reference numerals 101 to 105 and 107 to 111 in FIG. 1(A) of the first embodiment correspond to reference numerals 201 to 205 and 207 to 211 in FIG. 9(A). Reference numeral 206 denotes a feature amount detection unit.

FIG. 9(B) shows the data flow in the device according to the third embodiment. The device corresponds to the portion shown as the target image detection device 220 and receives, via the communication unit 211, the video data 212a and 212b acquired by the external imaging devices 221a and 221b, respectively.

In the data registration process, the object extraction unit 203 analyzes the detection target image 215 presented by the user via the input unit 209 and extracts the person region of a specific person from it. Next, the feature amount extraction unit 204 extracts the detection target feature amount 213 and holds it in the feature amount memory 214. Because the third embodiment requires real-time processing, the feature amount memory 214 is implemented in RAM, which allows high-speed reads. So that the detection target image 215 does not have to be input every time, the registered detection target feature amounts are managed as a list in the storage unit 208. The listed detection target feature amounts are read out at each startup and loaded into RAM.

In the detection process, the object extraction unit 203 extracts the persons appearing in the received video data 212a and 212b in the same way as in the registration process, and the feature amount extraction unit 204 extracts the real-time detection feature amounts 216. The normal detection unit 217, which is included in the feature amount detection unit 206, calculates the similarity between the extracted real-time detection feature amounts 216a and 216b and the detection target feature amount 213 held in the feature amount memory 214 and, when the similarity exceeds a predetermined threshold, displays the detection result on the display unit 210.
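The thresholding performed by the normal detection unit can be sketched as follows. This is a minimal sketch, not the device's implementation: the passage does not name the similarity measure, so cosine similarity is assumed here, and `normal_detect`, the registered vectors, and the threshold value are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; an assumed stand-in for the similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def normal_detect(live_feature, registered, threshold):
    """Return (person_id, similarity) pairs above the threshold, best first."""
    hits = []
    for person_id, feature in registered.items():
        similarity = cosine_similarity(live_feature, feature)
        if similarity > threshold:
            hits.append((person_id, similarity))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical registered list (person ID -> detection target feature amount).
registered = {5: [1.0, 0.0, 0.0], 7: [0.0, 1.0, 0.0]}
hits = normal_detect([0.9, 0.1, 0.0], registered, threshold=0.8)
```

Keeping the registered feature amounts in an in-memory dictionary mirrors the idea of holding them in RAM for fast comparison against every incoming frame.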

Next, the object selection unit 222 selects a plurality of objects from the real-time detection results of the normal detection unit 217. The different object group creation unit 218 determines whether the selected objects are the same object and, if they are determined to be different objects, creates a different object group consisting of those objects. The re-detection unit 219 then performs re-detection using the feature amounts of the persons belonging to the different object group. Specifically, it recalculates the similarity for the feature amounts of the images detected by the normal detection unit 217 and outputs those that exceed a predetermined threshold to the display unit 210 as re-detection results. Thereafter, real-time detection uses the similarity calculation method that was used for re-detection.

Of the operations described above, the detection-related processing will now be described with reference to a flowchart.

FIG. 10 shows the processing of the feature amount detection unit 206 in the third embodiment. In this flow, S801 and S802 are performed by the normal detection unit 217, and S803 onward is performed by the re-detection unit 219.

First, in S801, the normal detection unit 217 accepts input of the real-time detection feature amount 216 extracted from the video data 212. Once the real-time detection feature amount has been input, the normal detection unit 217 calculates, in S802, its similarity to the detection target feature amounts 213 of the plurality of persons held in the feature amount memory 214.

In S803, the re-detection unit 219 selects a plurality of objects whose similarity exceeds the threshold. If those objects include persons who satisfy the condition for creating a different object group, a different object group is created. When the selected feature amounts were captured by the same camera and their capture periods overlap, they are determined to belong to different objects. Likewise, when the selected feature amounts were captured by a plurality of cameras that do not share a field of view and their capture periods overlap, they are determined to belong to different objects, and a different object group is created. A case in which a different object group is created is described with reference to FIGS. 11(A) to 11(C).
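The two conditions in S803 can be sketched as a pairwise test. This is a sketch under assumptions: the `Detection` record, the `shared_view_pairs` set describing which cameras share a field of view, and the function names are hypothetical, and each `Detection` is assumed to be one tracked object with a start and end time.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    camera_id: int
    start: float  # capture period start (seconds, hypothetical unit)
    end: float    # capture period end


def periods_overlap(a, b):
    """True when the two capture periods share at least one instant."""
    return a.start <= b.end and b.start <= a.end


def are_different_objects(a, b, shared_view_pairs=frozenset()):
    """Apply the two conditions: same camera, or cameras with no shared view."""
    if not periods_overlap(a, b):
        return False
    if a.camera_id == b.camera_id:
        return True
    return frozenset((a.camera_id, b.camera_id)) not in shared_view_pairs


def build_different_object_group(detections, shared_view_pairs=frozenset()):
    """Collect every detection that is provably distinct from another one."""
    group = []
    for a, b in combinations(detections, 2):
        if are_different_objects(a, b, shared_view_pairs):
            for detection in (a, b):
                if detection not in group:
                    group.append(detection)
    return group


d1 = Detection(camera_id=1, start=0.0, end=10.0)
d2 = Detection(camera_id=2, start=5.0, end=15.0)
d3 = Detection(camera_id=2, start=20.0, end=30.0)
group = build_different_object_group([d1, d2, d3])
```

Cameras that do share a field of view are excluded because one person can legitimately appear in both at the same moment.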

FIG. 11(A) shows a list of detection results, FIG. 11(B) shows the detection results and capture times for camera 1, and FIG. 11(C) shows the detection results and capture times for camera 2. When detections share the same registered person ID and the same capture time, it is determined that their appearance similarity is high and that the condition for creating a different object group is satisfied. In FIG. 11(A), the person with registered person ID 5 is detected by camera 1 and camera 2 at the same time; because these two persons are in different places at the same time, a different object group is created. Feature amounts that are not included in the normal search results but whose similarity is at or above a predetermined value may also be included in the conditions for creating a different object group.
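The check on the detection result list can be sketched as a grouping by registered person ID. The rows below are invented and do not reproduce FIG. 11; the tuple layout `(registered_id, camera_id, capture_time)`, the `tolerance` parameter, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def ids_needing_group(rows, tolerance=0.0):
    """Return registered IDs detected on different cameras at the same time.

    rows: iterable of (registered_id, camera_id, capture_time) tuples.
    tolerance: how far apart two times may be and still count as the same.
    """
    by_id = defaultdict(list)
    for registered_id, camera_id, capture_time in rows:
        by_id[registered_id].append((camera_id, capture_time))

    flagged = []
    for registered_id in sorted(by_id):
        entries = by_id[registered_id]
        conflict = any(
            cam_a != cam_b and abs(time_a - time_b) <= tolerance
            for index, (cam_a, time_a) in enumerate(entries)
            for cam_b, time_b in entries[index + 1:]
        )
        if conflict:
            flagged.append(registered_id)
    return flagged


# Hypothetical detection result list.
rows = [(5, 1, 100.0), (5, 2, 100.0), (3, 1, 120.0), (3, 2, 300.0)]
flagged = ids_needing_group(rows)
```

Here only ID 5 is flagged: it is matched on two cameras at the same time, so at least one of the two matches must be a different person.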

In S804, the re-detection unit 219 determines whether a different object group exists. If it determines that no different object group exists, the processing proceeds to S805; if it determines that one exists, the processing proceeds to S806.

In S805, the re-detection unit 219 reports the persons whose similarity exceeded the threshold in S802 as the normal detection result and ends the processing.

In S806, the re-detection unit 219 calculates a weight vector according to equations (1) and (2), based on the differences in feature amounts among the persons determined to form a different object group. In the example of FIGS. 11(A) to 11(C), the persons detected as ID 5 satisfy the condition for a different object group, so a weight vector is calculated. A weight vector is calculated and stored for each detection target feature amount. When a plurality of different object groups exist, their weight vectors may be integrated.
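Equations (1) and (2) are not reproduced in this passage, so the following is only an assumed stand-in that captures the stated idea: dimensions in which the members of a different object group differ more receive more weight. The normalization to a sum of 1, the averaging used for integration, and both function names are assumptions, not the publication's formulas.

```python
from itertools import combinations


def weight_vector(group_features):
    """Weights proportional to the summed absolute pairwise differences.

    Assumed stand-in for equations (1) and (2); falls back to uniform
    weights when the group members do not differ in any dimension.
    """
    dims = len(group_features[0])
    accumulated = [0.0] * dims
    for a, b in combinations(group_features, 2):
        for i in range(dims):
            accumulated[i] += abs(a[i] - b[i])
    total = sum(accumulated)
    if total == 0:
        return [1.0 / dims] * dims
    return [value / total for value in accumulated]


def integrate(weight_vectors):
    """Integrate several weight vectors by a per-dimension average (assumed)."""
    count = len(weight_vectors)
    return [sum(column) / count for column in zip(*weight_vectors)]


# Two persons who differ only in the last two dimensions.
weights = weight_vector([[0.5, 0.5, 1.0, 0.0], [0.5, 0.5, 0.0, 1.0]])
merged = integrate([weights, [0.25, 0.25, 0.25, 0.25]])
```

Averaging is just one possible way to integrate; taking the per-dimension maximum would instead preserve every dimension that any group found discriminative.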

In S807, the re-detection unit 219 calculates the similarity in the feature space transformed by the weight vector. For the feature amounts of the images determined to have registered person ID 5 in the detection result list, the score is recalculated using the weight vector obtained in S806 and equation (3). Thereafter, the score for the detection target feature amount of the person with registered person ID 5 is calculated using the weight vector obtained in S806 and equation (6). The weight vector should be updated after a predetermined time has elapsed. As the time or date changes, the clothes a person wears may change, and ambient light conditions such as sunrise and sunset may change, so the weight vector may stop working correctly. Updating it at appropriate times therefore enables detection that is robust to such external factors. For feature amounts that are not affected by a change of date, such as facial feature amounts, a separate weight vector may be kept for each time of day.
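The re-scoring and the time-limited reuse of a weight vector can be sketched together. The scoring equations are not reproduced in this passage, so a weighted cosine similarity is assumed in their place; `WeightStore`, its `max_age_seconds` parameter, and all vectors are hypothetical.

```python
import math


def weighted_similarity(a, b, weights):
    """Cosine similarity with per-dimension weights (assumed scoring form)."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class WeightStore:
    """Keeps one weight vector per registered ID and drops stale ones."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self._entries = {}

    def put(self, person_id, weights, now):
        self._entries[person_id] = (list(weights), now)

    def get(self, person_id, now):
        entry = self._entries.get(person_id)
        if entry is None:
            return None
        weights, stored_at = entry
        if now - stored_at > self.max_age_seconds:
            del self._entries[person_id]  # expired: must be recomputed
            return None
        return weights


registered = [1.0, 1.0, 1.0, 0.0]   # dimensions 0-1 body, 2-3 face (invented)
lookalike = [1.0, 1.0, 0.0, 1.0]    # same clothes, different face
true_person = [1.0, 1.0, 0.9, 0.1]  # same clothes, nearly the same face

uniform = [1.0, 1.0, 1.0, 1.0]
face_weights = [0.0, 0.0, 0.5, 0.5]

before = weighted_similarity(registered, lookalike, uniform)
after = weighted_similarity(registered, lookalike, face_weights)
kept = weighted_similarity(registered, true_person, face_weights)

store = WeightStore(max_age_seconds=3600)
store.put(5, face_weights, now=0)
```

With uniform weights the look-alike still scores fairly high, whereas the face-focused weights drive its score down while the true person stays high. The store returns `None` once an entry is older than the configured age, which signals that the weight vector should be recalculated.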

According to the third embodiment, even when the appearance of a person registered as a detection target image in the image detection device resembles the appearance of a different person detected in real time from the video data, only the registered person can be detected correctly.

[Other Embodiments]
The present invention is applicable not only to image search devices and image detection devices but also to image tracking devices. When an object detected by an image detection device is tracked within one camera or across a plurality of cameras, two or more tracking candidates may be found at the same time. Because they appear at the same time, these two or more persons can be regarded as different persons. By calculating a weight vector from the feature amounts of these persons, recalculating the scores with the weight vector, and tracking the person with the highest score, the target person can be tracked accurately.

The present invention can also be applied to objects other than persons. Specifically, it can be used for image search of objects that closely resemble others but differ slightly, and of which only one exists. For example, pets such as dogs and cats show slight individual differences even within the same breed, so the present invention can be applied to them. When several owners gather for a pet photo session, picking out by hand only the photos that show one's own pet is laborious. Applying the present invention makes it possible to extract only the images showing one's own pet automatically and with high accuracy. The present invention can also be used for image search of pottery excavated from archaeological sites and of handmade crafts. Much excavated pottery was made with immature manufacturing techniques, so although the pieces are similar in size, shape, and color, they are not similar enough to be called identical. Likewise, crafts such as handmade vases and works made by potters differ slightly in firing color and shape, and no two look the same. The present invention can be applied when searching a database of images of such objects for images of a desired piece of pottery or craft. Although the object was described above as one of which only one exists, if the capture location or capture time is restricted, it suffices that only one exists within that range.

The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Accordingly, the claims are appended to make the scope of the invention public.

101 ... CPU, 102 ... ROM, 103 ... Object extraction unit, 104 ... Feature amount extraction unit, 105 ... Feature amount registration unit, 106 ... Feature amount search unit, 107 ... RAM, 108 ... Storage unit, 109 ... Input unit, 110 ... Display unit, 111 ... Communication unit

Claims

An image search device comprising:
extraction means for extracting features of an object contained in an input image;
first search means for obtaining a candidate image group including objects having features similar to the features extracted by the extraction means, by searching storage means that stores images in association with features, represented by multidimensional vectors, of the objects included in the images;
determination means for determining, based on preset conditions, whether the objects included in the candidate image group include individuals that can be regarded as non-identical to each other;
second search means for obtaining, when the determination means determines that individuals that can be regarded as non-identical to each other exist, a candidate image group including objects having features similar to the features extracted by the extraction means, by searching the storage means in a feature space in which the weight of each dimension of the multidimensional vector is adjusted so as to emphasize the difference in features between the individuals; and
output means for outputting the search result of the first search means when the determination means determines that such individuals do not exist, and outputting the search result of the second search means when the determination means determines that such individuals exist.

The image search device according to claim 1, wherein
the storage means stores features of objects imaged by a plurality of imaging devices that do not share a field of view with one another,
the input image is a query image, and
the determination means determines that individuals that can be regarded as non-identical to each other exist when the candidate image group obtained by the first search means contains two or more candidates that were imaged by different imaging devices at times that can be regarded as the same.

The image search device according to claim 1, wherein
the storage means stores features of one or more objects to be searched for,
the input image is a real-time image captured by a plurality of imaging devices that do not share a field of view with one another, and
the determination means determines that individuals that can be regarded as non-identical to each other exist when the candidate image group obtained by the first search means contains two or more candidates that were imaged by different imaging devices at times that can be regarded as the same.

The image search device according to any one of claims 1 to 3, wherein, when the determination means determines that there are a plurality of combinations of individuals that can be regarded as non-identical to each other, the second search means performs the search in a feature space obtained by integrating the weighting coefficients of each dimension of the multidimensional vector calculated for each combination.

The search device according to any one of claims 1 to 4, wherein the object is a person, and the extraction means extracts features corresponding to a whole-body image of the person and an image of the person's face.

The image search device according to any one of claims 1 to 5, wherein the feature space is a feature space in which the weighting coefficient of each dimension of the multidimensional vector is adjusted, the image search device further comprising display means for displaying information that identifies the part of the object represented by weighting coefficients equal to or higher than a threshold.

A control method of an image search device, the method comprising:
an extraction step of extracting features of an object contained in an input image;
a first search step of obtaining a candidate image group including objects having features similar to the features extracted in the extraction step, by searching storage means that stores images in association with features, represented by multidimensional vectors, of the objects included in the images;
a determination step of determining, based on preset conditions, whether the objects included in the candidate image group include individuals that can be regarded as non-identical to each other;
a second search step of obtaining, when it is determined in the determination step that individuals that can be regarded as non-identical to each other exist, a candidate image group including objects having features similar to the features extracted in the extraction step, by searching the storage means in a feature space in which the weight of each dimension of the multidimensional vector is adjusted so as to emphasize the difference in features between the individuals; and
an output step of outputting the search result of the first search step when it is determined in the determination step that such individuals do not exist, and outputting the search result of the second search step when it is determined in the determination step that such individuals exist.

A program that, when read and executed by a computer, causes the computer to execute each step of the method according to claim 7.