JPWO2014065033A1

JPWO2014065033A1 - Similar image search device

Info

Publication number: JPWO2014065033A1
Application number: JP2014543185A
Authority: JP
Inventors: 小倉　慎矢; 新保　直之; 平井　誠一; 高田　智巳
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2012-10-26
Filing date: 2013-09-11
Publication date: 2016-09-08
Anticipated expiration: 2033-09-11
Also published as: WO2014065033A8; JP6203188B2; WO2014065033A1

Abstract

The invention simplifies the repetitive operations required when specifying search key images (faces) from a video. When the user clicks a face, face detection is automatically performed on the image one frame ahead, so the user can select the same person frame after frame and pick out scenes in which that person appears continuously in time without operating the viewer. When the person is not detected (for example, because they overlap with a person in front), the user presses the "Next" button. In another example, when one face is specified, all face detection results in the vicinity of the specified image (for example, the 10 seconds from 5 seconds before to 5 seconds after) are displayed, and the user picks the person's face from among them, thereby specifying multiple images of the same person without operating the viewer. In yet another example, the result of automatic person tracking is displayed on the operation terminal so that the user can check whether the tracking result is correct and fix any errors.

Description

The present invention relates to a similar image search device, and more particularly to a similar image search device that performs a search using a plurality of desired search keys designated from a moving image.

Person search systems are known in which a computer uses image recognition technology or the like to search for a desired person in video (moving images) shot or recorded by surveillance cameras and similar devices (see, for example, Patent Documents 1 to 4). Techniques of this kind, which search on the basis of the features of the image itself rather than external information such as tags, are generally called CBIR (Content-Based Image Retrieval).
The image recognition techniques used in Patent Documents 1 to 4 cut out the portion of an image in which a person (or the person's face) appears, extract a color histogram or the like as a feature amount for identifying the individual, and presume that two images show the same person when their feature amounts are similar.

In such a similar face image search system, search accuracy is affected by the conditions of the key image, such as facial expression, face orientation, lighting, and misalignment of the face detection (cropping), and the search results tend to contain images whose conditions are close to those of the key image.
Users, on the other hand, want to find the person designated by the search key regardless of such conditions. In that case the user can use a multiple key search function, which searches with each of several designated key images one at a time and displays the combined results. Specifically, the user first searches with a desired image of the person (a single image suffices) as the key image. The user then checks the search results and, if they include images of the same person, designates each frame of the video made up of the images before and after those images as a key image and searches again using the multiple key search function. This raises the likelihood of finding images taken under various conditions.
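The multiple key search described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used by the system: the names `multi_key_search` and `toy_search`, the single-number "features", and the rule of keeping the best score per image when combining results are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical result type: (image ID, similarity score in [0, 1]).
Hit = Tuple[str, float]


def multi_key_search(
    keys: List[float],
    search_one: Callable[[float], List[Hit]],
) -> List[Hit]:
    """Run one search per key image and combine the results.

    Combining rule (an assumption): keep the highest score seen per image.
    """
    best: Dict[str, float] = {}
    for key in keys:
        for image_id, score in search_one(key):
            if score > best.get(image_id, float("-inf")):
                best[image_id] = score
    # Most similar first; ties are broken by image ID for a stable order.
    return sorted(best.items(), key=lambda hit: (-hit[1], hit[0]))


# Toy database: image ID -> one number standing in for a feature vector.
DATABASE = {"img_a": 0.10, "img_b": 0.50, "img_c": 0.90}


def toy_search(key: float, threshold: float = 0.75) -> List[Hit]:
    """Return images whose toy similarity (1 - |difference|) reaches the threshold."""
    hits = []
    for image_id, feature in DATABASE.items():
        score = 1.0 - abs(feature - key)
        if score >= threshold:
            hits.append((image_id, score))
    return hits


single = toy_search(0.1)                            # one key image
combined = multi_key_search([0.1, 0.8], toy_search)  # two key images
```

In this toy data the single key finds only `img_a`, while the two keys together also find `img_c`, which mirrors the point that keys taken under different conditions widen what the search can reach.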

JP 2009-027393 A
JP 2011-237879 A
JP 2011-107795 A
JP 2011-048668 A
JP 2010-140425 A

Michael J. Swain and Dana H. Ballard, "Color indexing", International Journal of Computer Vision, USA, Springer, January 1991, Vol. 7, No. 1, pp. 11-32

As described above, a user who wants the best possible search identifies as many images of the target person as possible in the video already obtained, designates all of them as key images, and then searches for images not yet identified. If it is guaranteed that the target person appears only once in the obtained video (in one frame, or in one scene of several consecutive frames), for example when shooting is synchronized with passage through a lane that admits only one person at a time, all images of that person in the video can easily be selected automatically.
In ordinary security camera footage, however, the target person may appear several times, and other people may appear in the same images. The user then has to repeat three steps once per image: click with the mouse to operate the playback device and advance the video by one frame, press the image designation button to run face detection, and click the face of the target person among the detected faces. This is very laborious.

This problem is described in detail with reference to FIGS. 8 and 9. FIG. 8 shows a search screen 400 displayed on a conventional terminal device 103.
The search screen 400 of FIG. 8 consists of a playback image display area 301, an image playback operation area 303, a search key image designation area 304, a search refinement parameter designation area 308, a search execution area 317, and a search result display area 320.
The image playback operation area 303 is used to control playback of images recorded in the recording device. Each button in this area is assigned its own playback type, for example, from the left: rewind, frame reverse, reverse playback, stop, forward playback, frame advance, and fast forward. When the user presses a button with the mouse 282, the moving image 302 is played in the playback image display area 301 with the playback type assigned to that button.

The search key image designation area 304 is used to designate and display the search key image. It consists of a key original image display unit 305, an image designation button 306, and a file read button 307.
The key original image display unit 305 displays the key image for the similarity search or the image from which it is taken (called the key original image). In the initial state no key original image has been designated, so no image is displayed.
The image designation button 306 designates the moving image 302 currently displayed in the image playback operation area 303 as the key original image. For example, when the moving image 302 is stopped and the image designation button 306 is pressed, the image at that moment is designated as the key original image and is also displayed on the key original image display unit 305. Each time a new key original image is displayed, face detection is run as needed, and a frame marking the region to be cut out is automatically added for each detected face. The frames are initially unselected (all of them, if there are several). Selecting one of the frames completes the designation of the key image (key face).
The file read button 307 displays, in the image playback operation area 303, an image other than those recorded in the recording device 102, for example a picture taken with a digital camera, an image captured with a scanner, or a moving image. Pressing it opens a file dialog box; the file chosen there is loaded and either becomes playable in the image playback operation area 303 or starts playing automatically. If the file is a still image, it is designated as the key original image as is and displayed on the key original image display unit 305.

The search refinement parameter designation area 308 is used to specify the types of refinement parameters for a search and their values (ranges). It consists of imaging device designation check boxes 309, 310, 311, and 312, time designation check boxes 313 and 314, and time designation fields 315 and 316.
The imaging device designation check boxes 309, 310, 311, and 312 specify which imaging devices are searched. Pressing a check box displays a check mark indicating that it is selected; pressing it again hides the mark, and each further press toggles it. In the initial state all imaging devices are searched, so all of these check boxes are selected.
The time designation check boxes 313 and 314 specify the time range to be searched. They are displayed in the same way as the other check boxes. When the time designation check box 313 is selected, a start time is applied to the time range. When it is not selected, no start time is applied, meaning that the search range extends back to the oldest image recorded in the recording device. The time designation check box 314 works likewise for the end time: when it is not selected, the search range extends to the newest image recorded in the recording device.
The time designation fields 315 and 316 are input fields for the start time and end time values described above.
In the initial state all times are searched, so both time designation check boxes are unselected and the time designation fields are blank.
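The behavior of the refinement parameters can be modeled with a small filter object. The sketch below is an assumption-laden illustration: the class name `SearchFilter`, the camera numbering 1 to 4 (one per check box 309 to 312), and the inclusive time bounds are hypothetical choices, not details given in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class SearchFilter:
    # All four imaging devices are checked in the initial state.
    cameras: Set[int] = field(default_factory=lambda: {1, 2, 3, 4})
    # None models an unchecked time check box: no bound on that side.
    start: Optional[datetime] = None
    end: Optional[datetime] = None

    def accepts(self, camera: int, shot_at: datetime) -> bool:
        """Return True if an image from `camera` taken at `shot_at` is searched."""
        if camera not in self.cameras:
            return False
        if self.start is not None and shot_at < self.start:
            return False
        if self.end is not None and shot_at > self.end:
            return False
        return True


initial = SearchFilter()  # initial state: everything is searched
narrowed = SearchFilter(cameras={1}, start=datetime(2012, 10, 26, 9, 0))
```

Leaving `start` or `end` as `None` corresponds to an unselected time designation check box, so the range stays open toward the oldest or newest recorded image.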

The search execution area 317 is used to start a search. It consists of a similar person search button 318 and an appearance event search button 319.
The similar person search button 318 starts a similar person search based on the key original image display unit 305. If parameters are specified in the search refinement parameter designation area 308, the search is run according to them.
The appearance event search button 319 starts an appearance event search. Surveillance camera systems commonly record events, such as motion detection, human presence sensor alarms, and notifications from other systems such as entry/exit management, in indirect association with the video, or record video only when such an alarm occurs; this is called event recording. An appearance event search targets only the video associated with those recorded events in which a person's face can be expected to have been captured from the front. If parameters are specified in the search refinement parameter designation area 308, the search is run according to them.

FIG. 9 shows the operations performed on the operation screen 400 to search using each frame of a video as one of several key images. The user first uses the image playback operation area 303 to display, in the playback image display area 301, the first image of the video to be used for key images (step 1). The user then presses the image designation button 306 (step 2). Next, from the rectangles marking face images displayed on the key original image display unit 305, the user selects with the mouse the face to use as a search key (step 3). The terminal device 103 then stores the image (its ID) and the region of the designated face. The stored information accumulates until the person search button 318 is pressed. Next, the user advances the moving image 302 by one frame using the frame advance function of the image playback operation area 303 (step 4), and performs steps 2 and 3 again. Steps 2, 3, and 4 are repeated until the last image to be used as a key image is reached.
Repeating steps 2, 3, and 4 is very tedious for the user.

The present invention has been made in view of this problem, and its object is to eliminate monotonous repetitive work through automation and to provide a user interface that is easy for ordinary users to use.

In outline, in one aspect of the present invention, clicking a face causes face detection to be performed on the image one frame ahead, so that the user can select the same person frame after frame and pick out scenes in which that person appears continuously in time without operating the viewer. When the person is not detected (for example, because they overlap with a person in front), the user presses the "Next" button.
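The click-to-advance interaction of this aspect can be sketched as a small state holder. Everything named here is hypothetical: `KeySelector`, `click_face`, and `press_next` are illustrative names, and a precomputed list of face boxes per frame stands in for the face detection that the system would run on each new frame.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face


class KeySelector:
    """Collects one key face per frame while stepping through a video."""

    def __init__(self, detections: List[List[Box]], start: int = 0) -> None:
        self.detections = detections  # face boxes per frame (detector stand-in)
        self.current = start          # index of the frame now on display
        self.keys: List[Tuple[int, Box]] = []

    def faces(self) -> List[Box]:
        """Face boxes shown on the current frame."""
        return self.detections[self.current]

    def _advance(self) -> None:
        # Move one frame ahead; the selector stays on the last frame at the end.
        if self.current < len(self.detections) - 1:
            self.current += 1

    def click_face(self, index: int) -> None:
        """Store the clicked face as a key, then show the next frame."""
        self.keys.append((self.current, self.faces()[index]))
        self._advance()

    def press_next(self) -> None:
        """Skip a frame in which the person was not detected."""
        self._advance()


# Four frames; the person is not detected in the third one.
selector = KeySelector([[(0, 0, 5, 5)], [(1, 0, 5, 5), (9, 9, 5, 5)], [], [(3, 0, 5, 5)]])
selector.click_face(0)
selector.click_face(0)
selector.press_next()
selector.click_face(0)
```

Each user action is a single click, which is the reduction from the three-step loop of the conventional screen that this aspect aims at.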

In another aspect of the present invention, when one face is specified, all face detection results in the vicinity of the specified image (for example, the 10 seconds from 5 seconds before to 5 seconds after) are displayed, and the user picks the person's face from among them, thereby specifying multiple images of the same person without operating the viewer.
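Collecting the detection results around the specified image amounts to a time-window filter. The following is a minimal sketch assuming detections are stored as (timestamp, face ID) pairs; the function name `detections_near` and this record layout are hypothetical, while the default of 5 seconds on each side follows the example in the text.

```python
from typing import List, Tuple

Detection = Tuple[float, str]  # (timestamp in seconds, face ID)


def detections_near(
    detections: List[Detection],
    selected_time: float,
    before: float = 5.0,
    after: float = 5.0,
) -> List[Detection]:
    """Return detections from `before` seconds earlier to `after` seconds later."""
    earliest = selected_time - before
    latest = selected_time + after
    return [d for d in detections if earliest <= d[0] <= latest]


recorded = [(0.0, "a"), (4.0, "b"), (9.0, "c"), (14.0, "d"), (15.5, "e")]
nearby = detections_near(recorded, selected_time=9.0)
```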

In another aspect of the present invention, the result of automatic person tracking is displayed on the operation terminal so that the user can check whether the tracking result is correct and fix any errors. For example, the faces automatically judged to belong to the tracked person are shown with bold frames and the other faces with dashed frames.

In a more specific aspect of the present invention, a similar image search device, which designates as a key a partial image showing a predetermined subject in a moving image shot and recorded by a camera and searches for images having feature amounts close to that of the designated partial image, comprises:
means for providing a preview for designating a desired frame of the moving image;
means for accepting an operation that designates the previewed frame as a key original image;
means for automatically designating, as additional key original images, a plurality of frames temporally near the frame designated through the accepting means;
means for displaying the one or more frames designated as key original images by the accepting means or the additional designation means, with figures added that indicate the regions corresponding to subjects detected in those frames by a predetermined algorithm;
selection means for accepting an operation that puts any one figure per frame into a selected state, the added figures being initially in an unselected state; and
means for requesting a search that uses as keys the plurality of subjects corresponding to the figures in the selected state.

In the above similar image search device, the subject may be a human face, and the feature amounts may be extracted in advance for each subject automatically detected in the moving image and recorded in association with the image in which the subject was detected. The device may further include search execution means that, on receiving a request from the requesting means, regards the plurality of subjects corresponding to the figures in the selected state as the same person, searches using each of the corresponding feature amounts as a key one at a time, and responds with the combined results.

In the above similar image search device, the displaying means may add the figures to the frame designated through the accepting means and display it in a key original image display area.
When the selection means accepts an operation that puts a figure displayed in the key original image display area into the selected state, or in response to just a single press of a predetermined button, the additional designation means designates the frame following the one then displayed in the key original image display area as an additional key original image, so that the next frame is automatically displayed in the key original image display area and the selection means again accepts an operation. This cycle is repeated, and the requesting means may be configured to use as the keys the plurality of subjects corresponding to the figures selected during the repetition.

In the above similar image search device, after the accepting means has accepted the operation designating the frame as a key original image, the additional designation means may, automatically or in response to just a single press of a predetermined button, designate as additional key original images those frames, among a plurality of consecutive frames temporally before or after the designated frame, in which at least a subject was detected. The displaying means then displays the key original images designated by the additional designation means in a designated key display area, each with the figures added; the selection means accepts multiple operations that put figures in the displayed key original images into the selected state; and the requesting means uses as the keys the plurality of subjects corresponding to the selected figures.

In the above similar image search device, the displaying means may add the figures to the frame designated through the accepting means and display it in a key original image display area.
In response to just a single press of a predetermined button after the selection means has accepted an operation putting a figure displayed in the key original image display area into the selected state, the automatic adding means (that is, the additional designation means) starts from the subject corresponding to the selected figure, tracks through the recorded moving image the subjects that satisfy spatio-temporal continuity with that subject, and designates the frames over which tracking succeeded as additional key original images. The displaying means may be configured to display the key original images designated by the additional designation means in a designated key display area, each with the figures added.
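The text does not say how spatio-temporal continuity is tested, so the sketch below substitutes one common stand-in: a face in the neighboring frame continues the track if its box overlaps the current box by at least a threshold, measured by intersection over union (IoU). The function names, the box layout, and the threshold of 0.3 are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(
    frames: List[List[Box]],
    seed_frame: int,
    seed_box: Box,
    min_iou: float = 0.3,
) -> Dict[int, Box]:
    """Follow the seed box forward and backward while the overlap holds."""
    found = {seed_frame: seed_box}
    for step in (1, -1):  # forward in time, then backward
        box = seed_box
        index = seed_frame + step
        while 0 <= index < len(frames):
            best = max(frames[index], key=lambda c: iou(box, c), default=None)
            if best is None or iou(box, best) < min_iou:
                break  # continuity lost: tracking ends in this direction
            found[index] = box = best
            index += step
    return found


frames = [
    [(0, 0, 10, 10)],
    [(1, 0, 11, 10), (50, 50, 60, 60)],
    [(2, 0, 12, 10)],
    [(80, 80, 90, 90)],
]
tracked = track(frames, seed_frame=1, seed_box=(1, 0, 11, 10))
```

The frames in `tracked` would be the ones added as key original images; the face in the last frame is too far away to continue the track, so that frame is left out.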

According to the present invention, multiple search keys are designated from a moving image automatically or semi-automatically, so a highly accurate search can be performed with simple operations.

Block diagram of the similar image search system (Examples 1 to 3).
Hardware configuration diagram of the imaging device 201.
Hardware configuration diagram of the recording device 102.
Hardware configuration diagram of the terminal device 103.
Diagram showing a search screen 300 for multiple key image search (Example 1).
Diagram showing a search screen 330 for multiple key image search (Example 2).
Diagram showing a search screen 340 for multiple key image search (Example 3).
Diagram showing a key image correction operation on the search screen 340 (Example 3).
Diagram showing the search screen 400 displayed on the conventional terminal device 103.
Diagram showing the procedure for multiple key image search on the conventional search screen 400.

An embodiment of the present invention is described below with reference to the drawings. In the description of the figures, components having substantially the same function are given the same reference numerals and their description is not repeated.

First, the configuration of a similar image search system according to an embodiment of the present invention is described with reference to FIGS. 1 to 4. FIG. 1 illustrates the system configuration of the similar image search system according to this embodiment.
As shown in FIG. 1, the similar image search system is configured with an imaging device 201, a recording device 102, and a terminal device 103 connected to a network 200 so that they can communicate with one another.

The network 200 is a communication means that interconnects the devices, such as a dedicated data communication network, an intranet, the Internet, or a wireless LAN (Local Area Network).
The imaging device 201 is a device such as a network camera or surveillance camera that digitizes images captured with a CCD (Charge Coupled Device), a CMOS (Complementary Metal Oxide Semiconductor) element, or the like and outputs the resulting image data to the recording device via the network 200.

The recording device 102 is a device such as a network video recorder that records image data received from the imaging device 201 via the network 200 on a recording medium such as an HDD. It also implements most of the functions for person search.
As its functional configuration, the recording device 102 has an image transmission/reception unit 210, an image recording unit 211, a playback control unit 212, a person area detection unit 213, a person feature amount extraction unit 214, a person feature amount recording unit 215, an attribute information recording unit 216, a request receiving unit 217, a similar person search unit 218, an appearance event search unit 219, and a search result transmission unit 220.

The image transmission/reception unit 210 is a processing unit that handles image input and output with the outside of the device; it receives image data from the imaging device 201, transmits image data to the terminal device 103, and so on.
The image recording unit 211 writes image data to and reads it from the recording medium. When writing, it records, along with the image data, an image ID (image identification information) that is used when the image data is read.
The playback control unit 212 controls video (stream) playback to the terminal device 103.

The person area detection unit 213 performs person detection using image recognition technology on the image data received from the imaging device 201 and determines whether a person is present in the image. If a person is present, it calculates the coordinates of a rectangular region of predetermined aspect ratio that is based on the face and, under predetermined conditions, extends to the area around the face.
The person feature amount extraction unit 214 calculates feature amounts for the region detected by the person area detection unit 213 using image recognition technology. The person feature amount calculated here is one that can be extracted from a still image, for example a multidimensional vector obtained by scaling the detected region to a fixed size, dividing it uniformly into pixel blocks, computing for each block a histogram of color, luminance, or their gradients or textures, and aggregating the results. Besides such appearance-based recognition, there are methods that reconstruct and recognize three-dimensional shape, for example by evaluating in three dimensions the contour, which depends strongly on the facial skeleton, and the relative arrangement of feature points corresponding to the eyes, nose, and mouth. In this embodiment, any type and number of feature amounts may be used. To reduce the dimensionality of the feature amount, a vector quantization method such as the Linde-Buzo-Gray algorithm can be used. For histogram-based feature amounts, Block Truncation Coding, in which each pixel block is ultimately approximated by two colors using Otsu's binarization method, may also be used.
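The block-wise histogram feature described for the person feature amount extraction unit 214 can be sketched as follows. This is a simplified illustration, not the unit's actual algorithm: it handles luminance only, assumes the region has already been scaled to a fixed size of at least `grid` by `grid` pixels, and the function name, the 2 x 2 grid, and the 4 bins are hypothetical choices.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel values in 0..255, row by row


def block_histogram_feature(image: Image, grid: int = 2, bins: int = 4) -> List[float]:
    """Concatenate a normalized luminance histogram for each block of the grid."""
    height, width = len(image), len(image[0])
    feature: List[float] = []
    for block_row in range(grid):
        for block_col in range(grid):
            # Pixel bounds of this block (uniform division of the region).
            y0, y1 = block_row * height // grid, (block_row + 1) * height // grid
            x0, x1 = block_col * width // grid, (block_col + 1) * width // grid
            histogram = [0] * bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    histogram[image[y][x] * bins // 256] += 1
            total = (y1 - y0) * (x1 - x0)
            # Normalize so that each block's histogram sums to 1.
            feature.extend(count / total for count in histogram)
    return feature


# 4 x 4 toy region: dark top half, bright bottom half.
region = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]
vector = block_histogram_feature(region)
```

With a 2 x 2 grid and 4 bins the result is a 16-dimensional vector; a real feature would add color, gradient, or texture histograms and could then be shortened by vector quantization as the text notes.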

The person feature amount recording unit 215 writes the feature amounts calculated by the person feature amount extraction unit 214 to a recording medium and reads them back. To allow fast search among a large number of feature amounts, the feature amounts are classified into clusters, and the recording structure (tree or dictionary organization, sector placement on the HDD, and so on) is optimized accordingly. A simple approach is to classify with a hash function based on simple rules and build a dictionary that maps hash values to recording locations; when the number of feature amounts belonging to one class grows too large, a further level of hierarchy is added to classify them more finely. Various other optimization algorithms, such as the EM method, are also known.
The attribute information recording unit 216 writes attribute information associated with individual image data to the recording medium and reads it back. The attribute information includes, for example, the image capture time, the imaging device number, flags for various events, and the coordinates of the area detected by the person area detection unit 213.
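The simple hash-based classification mentioned for the person feature amount recording unit 215 could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the recording structure of the embodiment: the class `HashBucketIndex`, the helper `quantize`, the rule of hashing on successive vector components, and the bucket size limit are all hypothetical, and feature components are assumed to lie in the range 0 to 1.

```python
def quantize(value, levels=2):
    """Simple-rule hash of one component: which of `levels` equal bins it falls in."""
    return min(int(value * levels), levels - 1)


class HashBucketIndex:
    """Dictionary from hash key to recording locations, split into finer levels on demand."""

    def __init__(self, max_per_bucket=2):
        self.max_per_bucket = max_per_bucket
        self.buckets = {}        # key tuple -> list of (location, feature)
        self.split_keys = set()  # keys that have been subdivided one level deeper

    def _key_for(self, feature):
        key = (quantize(feature[0]),)
        # Descend through every level that has already been subdivided.
        while key in self.split_keys:
            key = key + (quantize(feature[len(key)]),)
        return key

    def insert(self, location, feature):
        key = self._key_for(feature)
        bucket = self.buckets.setdefault(key, [])
        bucket.append((location, feature))
        # Too many entries in one class: add a hierarchy level and reclassify.
        if len(bucket) > self.max_per_bucket and len(key) < len(feature):
            self.split_keys.add(key)
            del self.buckets[key]
            for loc, feat in bucket:
                self.insert(loc, feat)

    def candidates(self, feature):
        """Recording locations in the same class as the query feature."""
        return [loc for loc, _ in self.buckets.get(self._key_for(feature), [])]
```

A search would then compare the query only against the returned candidates rather than against every recorded feature amount.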

The request receiving unit 217 receives search requests from the terminal device 103. There are two kinds of search request: a similar image search request and an appearance event search request.
The similar person search unit 218 performs a similar image search when the request received by the request receiving unit 217 is a similar person search request. Basically, the smaller the difference (norm) between two feature vectors, the more similar they are judged to be. For histogram-based feature amounts, the similarity of one or more blocks can be obtained by the Histogram Intersection described in Non-Patent Document 1, and a weighted sum of these can be used as the overall similarity; items whose similarity is at or above a predetermined value are output as search results.
The appearance event search unit 219 performs an appearance event search when the request received by the request receiving unit is an appearance event search request.
The search result transmission unit 220 transmits the similar person search results and appearance event search results obtained from the similar person search unit 218 and the appearance event search unit 219 to the terminal device 103.
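The similarity calculation described for the similar person search unit 218 can be sketched as follows. The function names, the dictionary used as a stand-in database, and the descending sort of results are assumptions of this sketch; each block histogram is assumed to be normalized to sum to 1.

```python
def histogram_intersection(h1, h2):
    """Histogram Intersection: sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def weighted_similarity(blocks_a, blocks_b, weights):
    """Weighted sum of per-block Histogram Intersection values."""
    return sum(
        w * histogram_intersection(a, b)
        for a, b, w in zip(blocks_a, blocks_b, weights)
    )


def search(key_blocks, database, weights, threshold):
    """Return (image_id, similarity) pairs at or above the threshold, best first.

    database: dict mapping image_id -> list of block histograms (hypothetical layout).
    """
    results = []
    for image_id, blocks in database.items():
        similarity = weighted_similarity(key_blocks, blocks, weights)
        if similarity >= threshold:
            results.append((image_id, similarity))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

If the weights sum to 1 and the histograms are normalized, the similarity lies between 0 and 1, which makes the predetermined threshold easier to choose.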

The terminal device 103 serves as the interface between the user and the image recording unit 211, which actually performs the search, and may be implemented as an ordinary PC (personal computer) with network capability.
As its functional configuration, the terminal device 103 has the following processing units: a search request transmission unit 221, a search result receiving unit 222, a search result display unit 223, a reproduced image display unit 224, and a screen operation detection unit 225.

The search request transmission unit 221 transmits search requests to the recording device 102. For a similar person search, each key image (key face) is accumulated as it is designated; when the similar person search button 318 or the like is subsequently pressed, the unit transmits a search request (query) containing the one or more accumulated search key images and the narrowing parameters, and clears the accumulated key images. Key images may be accumulated and transmitted not as the image data itself but as their feature amounts, or as a pair consisting of the ID of the original image from which the key image was extracted and its position within that image. The transmitted key images may also be saved separately as a search history.
The search result receiving unit 222 receives search results from the recording device 102. The data received as a search result contains the set of images obtained by the similar person search or appearance event search performed in the recording device 102. Each image in the set is generated from the video recorded in the recording device 102 by processing such as image size reduction. Hereinafter, each such image is called a "search result image", and the data transmitted and received as a search result is called "search result data".
The search result display unit 223 displays on screen the search results received by the search result receiving unit 222. Examples of the displayed screen are described later.
The reproduced image display unit 224 uses DirectShow (trademark) or the like to decode the image data received from the recording device 102 and play it back on screen (moving image display).
The screen operation detection unit 225 detects and acquires the operations performed by the user.
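The accumulate, send, and clear behavior described above for the search request transmission unit 221 might look like the following sketch. The class `SearchRequestSender`, the query layout, and the `send` callback standing in for the network transfer are hypothetical; key images are represented here by the pair of original image ID and face position, which is one of the options the text allows.

```python
class SearchRequestSender:
    """Accumulates designated key faces and sends them as one query."""

    def __init__(self, send):
        self._send = send    # callable that delivers a query to the recording device
        self._keys = []      # key images accumulated since the last search
        self.history = []    # optional search history of transmitted queries

    def add_key(self, image_id, face_box):
        """Called each time a key image (key face) is designated."""
        self._keys.append({"image_id": image_id, "face_box": face_box})

    def on_search_button(self, narrowing_params):
        """Called when the similar person search button is pressed."""
        if not self._keys:
            return None
        query = {"keys": list(self._keys), "params": dict(narrowing_params)}
        self._send(query)
        self.history.append(query)
        self._keys.clear()   # accumulated key images are cleared after sending
        return query
```

Because the query holds a copy of the key list, clearing the accumulator does not alter what was sent or stored in the history.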

FIG. 2 illustrates the hardware configuration of the imaging device 201 used in the similar image search system according to an embodiment of the present invention.
As shown in FIG. 2, the imaging device 201 comprises an imaging unit 241, a main storage unit 242, an encoding unit 243, and a network interface (I/F) 245 connected by a bus 240.

The imaging unit 241 converts the optical signal captured through the lens into digital data. The encoding unit 243 encodes the digital data output by the imaging unit 241 and converts it into image data such as JPEG (Joint Photographic Experts Group). The main storage unit 242 stores the captured digital data and the encoded image data. The network I/F 245 is an interface for transmitting the image data in the main storage unit 242 to the recording device 102 via the network 200.

FIG. 3 illustrates the hardware configuration of the recording device 102 used in the similar image search system according to an embodiment of the present invention.
As shown in FIG. 3, the recording device 102 comprises a CPU (Central Processing Unit) 251, a main storage unit 252, an auxiliary storage unit 253, and a network I/F 254 connected by a bus 250.

The CPU 251 controls each part of the recording device 102 and executes the programs that implement its functions. The main storage unit 252 is implemented with a semiconductor device such as DRAM (Dynamic Random Access Memory) and is an intermediate memory into which image data for searching and programs executed by the CPU 251 are loaded and stored. The auxiliary storage unit 253 is implemented with an HDD, flash memory, or the like, has a larger capacity than the main storage unit 252, and stores image data and programs. The network I/F 254 is an interface for receiving image data from the imaging device 201, receiving search keywords from the terminal device 103, and transmitting image data to the terminal device 103 via the network 200.

FIG. 4 illustrates the hardware configuration of the terminal device 103 used in the similar image search system according to an embodiment of the present invention.
As shown in FIG. 4, the terminal device 103 comprises a CPU 261, a main storage unit 262, an auxiliary storage unit 263, a display I/F 264, an input/output I/F 265, and a network I/F 266 connected by a bus 260. The display I/F 264 is connected to a display device 270, and the input/output I/F 265 is connected to input/output devices such as a keyboard 280 and a mouse 282.

The CPU 261 controls each part of the terminal device 103 and executes the programs that implement its functions. The main storage unit 262 is implemented with a semiconductor device such as DRAM and is an intermediate memory into which image data for display and programs executed by the CPU 261 are loaded and stored. The auxiliary storage unit 263 is implemented with an HDD, flash memory, or the like, has a larger capacity than the main storage unit 262, and stores search keywords, image data, and programs. The display I/F 264 is an interface for connecting to the display device 270. The input/output I/F 265 is an interface for connecting to input/output devices such as the keyboard 280 and the mouse 282. The network I/F 266 is an interface for receiving image data from the recording device 102 and transmitting search keywords to the recording device 102 via the network 200. The display device 270 is, for example, an LCD (Liquid Crystal Display) and displays images and moving images on its display unit. The user operates the terminal device 103 and the similar image search system by manipulating the images shown on the display unit of the display device 270 with input/output devices such as the keyboard 280 and the mouse 282, for example through GUI (Graphical User Interface) operations.

Next, multiple-key-image search in the similar image search system according to the embodiment of the present invention is described through Examples 1 to 3.

Example 1 of the similar image search system according to the embodiment of the present invention is described with reference to FIG. 5.
FIG. 5 shows the search screen 300 displayed on the terminal device 103 of Example 1. The search screen 300 of this example differs from the conventional one in that, among other things, a next button 321 is newly provided.

Using the search screen 300 of this example, the procedure by which the user searches using individual frames of a moving image as multiple key images is roughly as follows.
First, Procedure 1 is performed as in the conventional method. That is, the image reproduction operation area 303 is used to display, in the reproduced image display area 301, the first image of the moving image to be used for key images (or any image within the moving image). To play back a moving image recorded in the recording device 201 by specifying a camera and time, techniques common in the field of surveillance camera systems may be used.
Next, Procedure 2 is performed as in the conventional method. That is, when the image designation button 306 is pressed, the moving image 302 displayed at that moment (the first image) is captured as the key original image and displayed as the key original image 305 in the search key image designation area 304.

Next, as Procedure 3, if the displayed key original image 305 contains a face to be used as a search key, the user designates it with the mouse 226 as in the conventional Procedure 3; if it contains no such face, the user presses the next button 321.
The moment a face is designated, the image at that time and the information on the designated face area are held in the terminal device 103. Alternatively, the feature amount of the face image may be held (calculated in the terminal device 103 as needed). After holding this information, the terminal device 103 advances the key original image 305 by one frame and displays the updated image in the search key image designation area 304. When the next button 321 is pressed, the key original image 305 is advanced by one frame without anything being held.

Thereafter, the user need only repeat Procedure 3 while the desired person appears in the image; Procedures 2 and 4 can be omitted. The simplest way to implement this example is to generate the same key codes and mouse events as the user's operations (emulation).

An example of the processing in this example is described in detail below. In realizing this example, several optimal implementations are conceivable, depending on when and where face detection and face image feature calculation are performed, whether images are transmitted and decoded during frame advance, and so on; the simplest implementation is to generate the same key codes and mouse events as the user's operations (emulation). The following description assumes that the face detection results and feature amounts are streamed to the terminal device 103 embedded in the images (moving image).

The processing in Procedure 1, which is the same as in the conventional method, is as follows. For a moving image such as MPEG, a session is started between the recording device 102 and the terminal device 103 using a protocol such as RTSP or MRCP (Media Resource Control Protocol); the reproduced image display unit 224 of the terminal device 103 sends a PLAY message or the like specifying the playback position, and the image of the desired frame is displayed in the reproduced image display area 301. At this point, the receive buffer of the reproduced image display unit 224 also holds the frames that follow the desired frame. Even when still images are transmitted one at a time, prefetching is used so that subsequent frames are buffered in the same way.
For a moving image, the DirectShow input filter (splitter) of the reproduced image display unit 224 extracts the camera name, capture time, image ID, and the serial number and coordinates (and feature amount) of each face area in the image, and passes them to the drawing filter and the application software. For a still image, the same items embedded in the image file header or the like are extracted in the same way. The input filter passes the MPEG video elementary stream to the decoding filter.
The drawing filter (renderer) converts the camera name and capture time into text and then into an image, superimposes them on the moving image (still image) decoded by the decoding filter, and draws to the display I/F 264. The moving image 302 is thereby displayed in the reproduced image display area 301. The writing filter (grabber) passes the decoded moving image (still image) to the application software frame by frame in a predetermined format. Playback control such as rewinding is performed through the IMediaControl and IMediaSeeking interfaces. Rewinding can be done by giving a negative value to the Set_Rate method; if that is not supported, the frames to be played are specified one at a time with the SetPositions method. The same function as the writing filter can also be realized using the GetCurrentImage method of the IBasicVideo interface, the Multimedia Streaming API, or the like.

The processing in Procedure 2, which is the same as in the conventional method, is as follows. When the screen operation detection unit 225 notifies the application software that the image designation button 306 has been pressed, the application software superimposes boxes corresponding to the received face area coordinates on the one-frame image received from the writing filter or the like (the image being displayed as the moving image 302), and draws to the display I/F 264. As a result, the same image as that displayed as the moving image 302 is displayed as the key original image 305.

The processing in Procedure 3 of this example is as follows. When the screen operation detection unit 225 notifies the application software of a mouse operation within the search key image designation area 304, the application software compares the coordinates of the mouse operation with the coordinates at which the face area boxes were drawn and determines whether any box matches. If one does, that box is regarded as selected, and the serial number and coordinates (and feature amount) of the corresponding face area are held in an array together with the image ID. The partial image inside the selected box thereby becomes a designated key image.
Next, the Step method of the IVideoFrameStep interface advances the video by one frame; the image and the serial number and coordinates (and feature amount) of each face area in it are received from the input filter or the like, and the image is displayed as the key original image 305 in the same way as in Procedure 2 above.
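The hit test and frame advance of Procedure 3 can be sketched without any DirectShow dependency as below. This is only an illustration of the control flow: `find_box_at`, `KeySelector`, and the in-memory `frames` list standing in for the streamed images and face metadata are all hypothetical names, and boxes are assumed to be given as (left, top, width, height).

```python
def find_box_at(x, y, face_boxes):
    """Return (serial, box) of the face box containing the point, or None."""
    for serial, (left, top, width, height) in face_boxes:
        if left <= x < left + width and top <= y < top + height:
            return serial, (left, top, width, height)
    return None


class KeySelector:
    """Holds designated key faces and steps the key original image frame by frame."""

    def __init__(self, frames, start=0):
        # frames: list of (image_id, [(serial, box), ...]) in time order
        self.frames = frames
        self.index = start
        self.held = []   # array of (image_id, serial, box) for designated key images

    def _advance(self):
        if self.index < len(self.frames) - 1:
            self.index += 1

    def on_click(self, x, y):
        """Mouse click in the key image area: hold the face, then advance one frame."""
        image_id, boxes = self.frames[self.index]
        hit = find_box_at(x, y, boxes)
        if hit is None:
            return False   # no box at the click position, so nothing changes
        serial, box = hit
        self.held.append((image_id, serial, box))
        self._advance()
        return True

    def on_next_button(self):
        """Next button: advance one frame without holding anything."""
        self._advance()
```

A click that misses every box leaves the current frame in place, so the user can simply click again.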

To keep operation as simple as possible, this example does not explicitly provide interfaces for listing, correcting (reselecting a box in the original image), or deleting designated key images; however, this does not preclude providing them for expert users.

Example 2 of the similar image search system according to the embodiment of the present invention is described with reference to FIG. 6. Description of parts that are the same as in Example 1 is omitted.
FIG. 6 shows the search screen 330 used when the terminal device 103 of Example 2 performs a multiple-key-image search. The search screen 330 of this example differs from Example 1 in that, among other things, it has a show-all-nearby button 331 in place of the next button 321 and additionally has a designated key display area 332 in which faces are selected in automatically added key original images.

Using the search screen 330 of this example, the procedure by which the user designates individual frames of a moving image as multiple key images and searches with them is roughly as follows.
First, as in Example 1, the conventional Procedures 1 and 2 are performed. That is, with a desired image showing the target person displayed in the reproduced image display area 301 (it need not be the earliest of the consecutive images), the image designation button 306 is pressed.

Next, as Procedure 3, the show-all-nearby button 331 is pressed. Then, from the portion of the moving image 302 in the temporal vicinity of the image that was displayed when the image designation button 306 was pressed in Procedure 2 (for example, the 10 seconds from 5 seconds before to 5 seconds after), every image that has one or more face detection results is displayed in the designated key display area 332. FIG. 6 shows an example in which three key original images 333, 334, and 335 are displayed. The terminal device 103 superimposes on the key original images 333 to 335 boxes corresponding to the face area boundaries, based on the serial numbers and coordinates of the face areas in each image.

Next, as Procedure 4, for each image displayed in the designated key display area 332, the face of the target person is selected in the same way as in Procedure 3 of Example 1. When a face is selected, its box is redrawn as a box indicating the selected state (for example, a thick box), and the image ID and the information on the designated face area are held in the terminal device 103.
For an image that does not contain the target person's face, no face need be selected. If another face is clicked in an image in which a face has already been selected, the newly clicked face becomes selected, the original face becomes unselected, and the held content is overwritten with the information on the new face area. The faces that are in the selected state in the designated key display area 332 when the similar person search button 318 is pressed thus become the key images (key faces).
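The selection rule of Procedure 4, at most one selected face per image with a later click overwriting an earlier one, maps naturally onto a dictionary keyed by image ID. The following is a minimal sketch with hypothetical names (`NearbySelection`, `click_face`, `key_faces`); drawing of thick and normal boxes is left out.

```python
class NearbySelection:
    """Tracks which face, if any, is selected in each displayed key original image."""

    def __init__(self):
        self.selected = {}   # image_id -> (serial, box) of the selected face

    def click_face(self, image_id, serial, box):
        # A new click on the same image overwrites the previously held face.
        self.selected[image_id] = (serial, box)

    def key_faces(self):
        """Faces in the selected state when the search button is pressed."""
        return [
            (image_id, serial, box)
            for image_id, (serial, box) in sorted(self.selected.items())
        ]
```

Images in which nothing was clicked never appear in the dictionary and therefore contribute no key image.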

According to this example, the repetition of the conventional Procedures 2 and 4 can be eliminated.
The details of the processing in Procedure 3 of this example are supplemented below.
When the screen operation detection unit 225 notifies the application software that the show-all-nearby button 331 has been pressed, the application software moves the playback position back by, for example, 5 seconds using the SetPositions method. It also manipulates the filter graph to connect the output pin to the Null renderer, and sets the playback rate as high as possible. It then receives, in sequence, each frame image (a GDI bitmap object) and the serial numbers and coordinates of the face areas in that image, holds them in memory, and displays them side by side in the designated key display area 332. The images are held in memory because redrawing is needed when a box is reselected or the designated key display area 332 is scrolled. If it makes processing faster, the images may be stored reduced to the display size used in the designated key display area 332 rather than at their original size. The image IDs and the serial numbers and coordinates of the face areas are stored in an array or the like.
Once 10 seconds' worth of images has been captured, playback is stopped and the output pin is reconnected to the original renderer.
Because the output is connected to the Null renderer, the display of the moving image 302 in the reproduced image display area 301 is not updated during this time. In addition, an ordinary DirectShow filter runs in the thread of its upstream filter and can therefore easily deadlock with the application thread, so it is desirable to use the Multimedia Streaming API to retrieve image data from the filter graph.
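Leaving the DirectShow mechanics aside, the selection of which frames to show in the designated key display area 332 reduces to a time-window filter. The sketch below assumes a hypothetical in-memory list of (timestamp, image ID, face boxes) tuples; the function name `frames_near` and the parameter names are likewise assumptions, with the 5-second margins taken from the example in the text.

```python
def frames_near(frames, center_time, before=5.0, after=5.0):
    """Select frames within the time window that have at least one detected face.

    frames: iterable of (timestamp_seconds, image_id, face_boxes) in time order.
    center_time: timestamp of the image shown when the image designation
    button was pressed.
    """
    start, end = center_time - before, center_time + after
    return [
        (t, image_id, boxes)
        for t, image_id, boxes in frames
        if start <= t <= end and len(boxes) >= 1
    ]
```

Frames inside the window that contain no face detection result are dropped, matching the condition that only images with one or more detections are displayed.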

In this example, many key original images (candidates) are displayed in the designated key display area 332, and the user scrolls through them as needed, selecting the faces to be used as keys one at a time. To make the faces easy to check, the images are therefore displayed at the same size as in the search key image designation area 304. Alternatively, the key original images may be shown reduced (as icons) by default and enlarged to the original size when the mouse 226 is placed over them, and images in which a face has been selected may be hidden in turn.

Example 3 of the similar image search system according to the embodiment of the present invention is described with reference to FIGS. 7A and 7B. Description of parts that are the same as in Examples 1 and 2 is omitted.
FIG. 7A shows the search screen 340 used when the terminal device of Example 3 performs a multiple-key-image search. This example differs from Example 2 in that, among other things, the designation of key images in the designated key display area 332 is further automated.

Using the search screen 340 of this example, the procedure by which the user designates individual frames of a moving image as multiple key images is roughly as follows.
First, as in Example 2, the conventional Procedures 1 and 2 are performed. That is, with a desired image showing the target person displayed in the reproduced image display area 301, the image designation button 306 is pressed.

Next, as Procedure 3, as in Example 1, the user designates with the mouse 226 the face to be used as a search key from among the rectangles indicating faces in the original image displayed in the key original image display section 305.
Next, as Procedure 4, the show-all-nearby button 331 is pressed. Person tracking is then performed, for the person whose face was designated in Procedure 3, over the temporal vicinity of the image designated in Procedure 3, and all images containing a face judged to belong to the same person are displayed in time order. In addition, among the boxes indicating face detection results, the box of the face judged to be the same person is automatically placed in the selected state.

Next, as Procedure 5, the user checks the key original images displayed in the designated key display area 332 to see whether another person's face has been selected by mistake and, if so, reselects the correct face. As shown in FIG. 7B, the box on that face is then redrawn as a box indicating the selected state (for example, a thick box), and the held content is overwritten with the information on the new face area. The faces that are in the selected state in the designated key display area 332 when the similar person search button 318 is pressed thus become the key images (key faces).

The details of step 4 in this example are as follows. When the application software is notified that the all-neighborhood display button 331 has been pressed, it checks whether any rectangle is selected in the key original image 305. If one is selected, its information (image ID, serial number, coordinates, feature amount) is stored, for example at the head of an array. The software then designates the immediately preceding frame, using the SetPositions method or the like, and receives from the input filter the image together with the serial numbers and coordinates (and feature amounts) of the face regions in that image. If the image contains no face region, the data is discarded and the next earlier frame is designated. If it contains one or more face regions, the software finds the one whose coordinates are closest to the coordinates (centroid) of the face region most recently stored in the array (its last element), and checks whether the position (centroid) and size of that region satisfy a predetermined continuity condition (for example, whether the difference from the output of a Kalman filter is within a predetermined value). If the condition is satisfied, the face is presumed to belong to the same person, so it is stored in the array, a still earlier frame is designated, and the same processing is repeated until the person is lost. The person is judged to be lost when the continuity condition fails for several consecutive frames. The same processing is also repeated, advancing frame by frame, for images later in time than the key original image 305 that was displayed when the all-neighborhood display button 331 was pressed.
Because feature amounts are also acquired, better accuracy can be expected if the check also uses the similarity between the feature amount stored at the head of the array (that of the face designated by the user in step 3) and the feature amount of each face region in the current frame.
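The tracking loop described above can be sketched as follows. This is only an illustration under stated assumptions: the continuity condition is replaced by fixed thresholds on centroid displacement and size ratio (the specification mentions a Kalman filter as one example, which is not implemented here), frames without any face are skipped without counting toward the lost decision, and the names `Face`, `track_one_direction`, `track_neighborhood`, `max_jump`, `max_size_ratio` and `max_misses` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Sequence


@dataclass
class Face:
    image_id: int   # identifier of the frame the face was detected in
    serial: int     # serial number of the face region within that frame
    cx: float       # centroid x
    cy: float       # centroid y
    size: float     # size of the face rectangle (assumed positive)


def track_one_direction(start: Face, frames: Sequence[Sequence[Face]],
                        max_jump: float = 40.0, max_size_ratio: float = 1.5,
                        max_misses: int = 3) -> List[Face]:
    """Follow `start` through `frames`, which are ordered in the direction of travel."""
    kept: List[Face] = []
    last, misses = start, 0
    for faces in frames:
        if not faces:
            continue  # no face region: discard and move to the next frame (assumption: not a miss)
        # Candidate = face region whose centroid is closest to the last stored one.
        nearest = min(faces, key=lambda f: hypot(f.cx - last.cx, f.cy - last.cy))
        jump = hypot(nearest.cx - last.cx, nearest.cy - last.cy)
        ratio = max(nearest.size, last.size) / max(min(nearest.size, last.size), 1e-9)
        if jump <= max_jump and ratio <= max_size_ratio:
            kept.append(nearest)   # presumed to be the same person
            last, misses = nearest, 0
        else:
            misses += 1
            if misses >= max_misses:
                break              # lost after several consecutive failures
    return kept


def track_neighborhood(start: Face, earlier: Sequence[Sequence[Face]],
                       later: Sequence[Sequence[Face]]) -> List[Face]:
    """Track backward through `earlier` and forward through `later`; return in time order."""
    backward = track_one_direction(start, list(reversed(earlier)))
    forward = track_one_direction(start, later)
    return list(reversed(backward)) + [start] + forward


earlier = [[Face(1, 0, 10, 10, 20)],
           [Face(2, 0, 14, 10, 20), Face(2, 1, 200, 200, 20)]]
start = Face(3, 0, 18, 10, 20)
later = [[], [Face(5, 0, 26, 10, 20)], [Face(6, 0, 300, 300, 20)],
         [Face(7, 0, 300, 300, 20)], [Face(8, 0, 300, 300, 20)],
         [Face(9, 0, 30, 10, 20)]]
track = track_neighborhood(start, earlier, later)
```

In the usage example the person is followed back to frame 1 and forward to frame 5, then judged lost after three consecutive frames in which the nearest face is too far away. A feature-similarity check against the face designated in step 3 could be added to the acceptance test as a further condition.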

Thus, the person tracking in this example can be performed on the terminal device 103 side based on the spatiotemporal continuity of face regions; for example, the method described in Patent Document 5 can be used. Tracking based on something other than face detection results can also be used in combination. The simplest approach is template matching using the partial image of the face designated in step 3. For the initial template, moving-object detection by the difference method may be performed in the temporal neighborhood of the designated image, and an image of the whole person corresponding to the designated face may be used (preferably one chosen from several frames whose size is close to their median). A body part other than the face may also be cut out from the image of the whole person, and the similarity of feature amounts extracted from that body image may be used together with the face similarity. Various image-processing-based moving-object tracking techniques are known and can be applied to this example. When several tracking techniques are used together, a technique that has lost the target continues to be given the tracking results of the other techniques, so it can be expected to reacquire the target; tracking can therefore continue until every technique has lost the target, which improves robustness.
Alternatively, a means that directly measures distance or three-dimensional shape, such as a laser sensor, may be installed alongside the camera, and objects may be detected and tracked from its measurements. The space referred to in spatiotemporal continuity may be two-dimensional, and is not limited to an image space such as pixel coordinates; it may be a geographic space such as latitude and longitude. By handling the measurement results of position detection means such as multiple cameras in a common coordinate system, tracking can be performed without regard to individual cameras.
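As an illustration of the template matching mentioned above, the following minimal sketch slides a face template over a grayscale image and returns the position with the smallest sum of squared differences. The specification does not prescribe a particular matching score; the sum of squared differences, the function name `match_template`, and the list-of-lists image representation are assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def match_template(image: Image, template: Image) -> Tuple[int, int, int]:
    """Return (x, y, ssd) of the best match of `template` inside `image`."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template is larger than the image")
    best_x, best_y, best_ssd = 0, 0, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # Sum of squared differences between the template and this window.
            ssd = sum((image[y + j][x + i] - template[j][i]) ** 2
                      for j in range(th) for i in range(tw))
            if best_ssd is None or ssd < best_ssd:
                best_x, best_y, best_ssd = x, y, ssd
    return best_x, best_y, best_ssd


frame = [[0, 0, 0, 0, 0],
         [0, 0, 9, 8, 0],
         [0, 0, 7, 6, 0],
         [0, 0, 0, 0, 0]]
face_template = [[9, 8],
                 [7, 6]]
x, y, score = match_template(frame, face_template)
```

In a tracker, the search would normally be restricted to a window around the previous position, and the returned position could be fed to the face-based tracker when that tracker has lost the target.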

As described above, the embodiments of the present invention are suitable when a desired person who can be expected to appear continuously in time-series images (a moving image) is to be designated collectively as search key images. The search target is not limited to recorded images; the invention can also be applied to a system that searches (collates) images being captured in real time. Moreover, key original images need not be designated from the reproduced image display area: selecting multiple still image files (captured in time series) in the dialog box that opens when the file read button 307 is pressed may designate them collectively as key original images.
The configuration of the system, apparatus, and the like according to the present invention is not necessarily limited to those described above, and various configurations may be used. The present invention can also be provided as, for example, a method or scheme for executing the processing according to the present invention, a program for causing a computer to implement such a method or scheme, or a tangible medium on which the program is recorded.

The present invention can be used in CCTV (Closed-Circuit Television) systems, face authentication systems, criminal databases, and the like, as well as in television program production systems, personal electronic albums, and other systems that handle images of people, vehicles, and so on captured by cameras.

102: recording device, 103: terminal device, 113: multiple search key selection unit,
200: network, 201: imaging device, 210: image transmission/reception unit, 211: image recording unit, 212: playback control unit, 213: person region detection unit, 214: person feature amount extraction unit, 215: person feature amount recording unit, 216: attribute information recording unit, 217: request receiving unit, 218: appearing person search unit, 219: appearance event search unit, 220: search result transmission unit,
240: bus, 241: imaging unit, 242: main storage unit, 243: encoding unit, 245: network I/F,
250: bus, 251: CPU, 252: main storage unit, 253: auxiliary storage unit, 254: network I/F,
260: bus, 261: CPU, 262: main storage unit, 263: auxiliary storage unit, 264: display I/F, 266: network I/F, 270: display device, 280: keyboard, 282: mouse,
300, 330, 340: search screen, 301: reproduced image display area, 302: moving image, 303: image playback operation area, 304: search key image designation area, 305: search key image, 306: image designation button, 307: file read button, 308: search refinement parameter designation area, 309 to 312: imaging device designation check boxes, 313, 314: time designation check boxes, 315, 316: time designation fields, 317: search execution area, 318: similar person search button, 319: appearance event search button, 320: search result display area,
331: all-neighborhood display button, 332: designated key display area, 333 to 335: key original images.

Claims

A similar image search device that designates, as a key, a partial image in which a predetermined subject appears from a moving image captured by a camera and recorded, and that searches for images having feature amounts close to the feature amount of the designated partial image, the device comprising:
means for providing a preview for designating a desired frame from the moving image;
accepting means for accepting an operation of designating one previewed frame as a key original image;
additional designation means for automatically designating a plurality of frames temporally adjacent to the one frame designated via the accepting means as additions to the key original images;
display means for displaying the one or more frames designated as key original images by the accepting means or the additional designation means, with a figure added that indicates a region corresponding to a subject detected from the one or more frames by a predetermined algorithm;
selection means for accepting, for one frame, an operation of placing one of the figures in a selected state, the initial state of the added figures being a non-selected state; and
requesting means for requesting a search that uses, as keys, the plurality of subjects corresponding to the figures in the selected state.

The similar image search device according to claim 1, wherein:
the subject is a human face;
the feature amount is extracted in advance for each subject automatically detected from the moving image and is recorded in association with the image from which the subject was detected; and
the device further comprises search execution means that, in response to the request from the requesting means, regards the plurality of subjects corresponding to the figures in the selected state as the same person, searches using each of the plurality of feature amounts corresponding to the plurality of subjects as a key, and merges the results into a response.

The similar image search device according to claim 1, wherein:
the display means adds the figure to the one frame designated via the accepting means and displays it in a key original image display area;
when the selection means accepts an operation of placing a figure displayed in the key original image display area in the selected state, or in response to a single press of a predetermined button, the additional designation means designates the frame following the one frame displayed in the key original image display area as an addition to the key original images, whereby that next frame is automatically displayed in the key original image display area and the selection means again accepts the operation, this action being repeated; and
the requesting means uses, as the keys, the plurality of subjects corresponding to the plurality of figures selected during the repetition.

The similar image search device according to claim 1, wherein:
the additional designation means, either automatically after the accepting means accepts the operation of designating the one frame as a key original image or in response to only a single press of a predetermined button, additionally designates as key original images a plurality of frames in which at least a subject is detected, from among the frames temporally before or after the one frame designated via the accepting means;
the display means displays the plurality of key original images designated by the additional designation means in a designated key display area with the figure added;
the selection means accepts a plurality of operations of placing figures in the selected state for the plurality of key original images displayed in the designated key display area; and
the requesting means uses, as the keys, the plurality of subjects corresponding to the plurality of figures in the selected state.

The similar image search device according to claim 2, wherein:
the display means adds the figure to the one frame designated via the accepting means and displays it in a key original image display area;
the automatic adding means, in response to only a single press of a predetermined button after the selection means has accepted an operation of placing a figure displayed in the key original image display area in the selected state, tracks through the recorded moving image, starting from the subject corresponding to the figure in the selected state, a subject that satisfies spatiotemporal continuity, and designates a plurality of frames in the range over which the tracking succeeded as additions to the key original images; and
the display means displays the plurality of key original images designated by the additional designation means in a designated key display area with the figure added.