JP2010140425A

JP2010140425A - Image processing system

Info

Publication number: JP2010140425A
Application number: JP2008318733A
Authority: JP
Inventors: Hideaki Uchikoshi; 秀昭打越; Seiichi Hirai; 誠一平井
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 2008-12-15
Filing date: 2008-12-15
Publication date: 2010-06-24

Abstract

PROBLEM TO BE SOLVED: To reduce the arithmetic processing required to register, for example, a face image, and thereby shorten the processing time required for image registration, in an image processing system that registers information on a predetermined image part for the images of a plurality of frames. SOLUTION: The images of the plurality of frames are processed in time series. An initial searching means searches for the image part to be searched within a predetermined local area set in the frame, for the images of a predetermined number of initial frames. For the image of a frame subsequent to the initial frames in time series, a follow-up searching means sets a local area in that frame by using information based on the search result obtained in the image of a frame preceding it in time series, and searches for the image part to be searched within that local area. An image part information registering means registers information about a part or the whole of the searched image parts. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to, for example, the registration of face images in a similar-face image search system, and more particularly to a technique for reducing the arithmetic processing required to register face images.

With the rise in crime in recent years, there is a demand to store surveillance video from multiple imaging devices over long periods. Until now, small stores and the like have typically stored only the most recent surveillance video from a single imaging device, but the growing capacity of magnetic recording devices and the spread of imaging devices have made long-duration, multi-point monitoring easy. As a result, the burden on the people who watch the video keeps increasing. To reduce that burden, it is becoming possible to find a specific person or a specific scene in recorded surveillance video by means of object detection such as face detection and similar-image search techniques that use image composition, color distribution, and the like.
However, the arithmetic processing for object detection such as face detection on surveillance video from imaging devices, and for analyzing image composition, places a heavy load on the computer, which makes real-time image search difficult. For this reason, research is under way on reducing the amount of computation in object detection processing such as face detection and shortening its processing time.

FIG. 7 shows an example of the configuration of a similar-face image search system for surveillance video according to the prior art.
The networks 104 and 105 are communication networks, such as a dedicated line, an intranet, the Internet, or a wireless LAN (Local Area Network), that connect the devices to one another for data communication.
The imaging device 101 is a device such as a network camera that digitizes the captured video and outputs the converted video data to the recording device 102 via the network 104.
The recording device 102 is a device such as a network digital recorder that includes a microcomputer and the like and records the image data input from the imaging device 101 via the network 104 on a recording medium such as an HDD (hard disk drive).

The monitoring terminal 103 is a device such as a PC (personal computer) that displays image data acquired from the recording device 102 via the network 105 on a monitor such as a liquid crystal display or a CRT. The monitoring terminal 103 includes a monitor that displays recorded images, search result images, a similar-image search menu, and the like, as well as user input units such as a keyboard and a mouse, and provides a user interface for playing back video recorded in the recording device 102, performing similar-image searches, and so on.

The recording device 102 includes the following functional units: a network transmission/reception unit 111, a video recording unit 112, a video distribution unit 113, an image feature amount recording unit 114, an image feature amount extraction unit 115, an image similarity determination unit 116, and a face search unit 117.
The network transmission/reception unit 111 is a processing unit that exchanges data with the devices connected via the networks 104 and 105. The data exchanged include, for example, captured images transmitted from the imaging device 101; video playback requests, search requests, and image registration (processing) requests from the monitoring terminal 103; and distribution images and search results sent to the monitoring terminal 103.

A captured image received by the network transmission/reception unit 111 from the imaging device 101 via the network 104 is output from the network transmission/reception unit 111 to the video recording unit 112. Of the signals received from the monitoring terminal 103 via the network 105, a video playback request signal is output to the video distribution unit 113, a search request signal to the image similarity determination unit 116, and an image registration request signal to the image feature amount recording unit 114. Distribution images from the video distribution unit 113 and search results from the image similarity determination unit 116 are input to the network transmission/reception unit 111 and transmitted to the monitoring terminal 103.
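The routing of request signals described above can be pictured as a simple dispatch table. The following is only a minimal sketch: the signal-type strings, the `make_router` helper, and the handler stand-ins are all hypothetical names chosen for illustration, and the patent does not specify how the routing is implemented.

```python
from typing import Callable, Dict

# Hypothetical type tags; the source only names the three kinds of request signal.
PLAYBACK, SEARCH, REGISTER = "video_playback", "search", "image_registration"


def make_router(handlers: Dict[str, Callable[[dict], str]]) -> Callable[[dict], str]:
    """Return a function that forwards a request to the handler for its type."""
    def route(request: dict) -> str:
        kind = request["type"]
        if kind not in handlers:
            raise ValueError(f"unknown request type: {kind}")
        return handlers[kind](request)
    return route


# Stand-ins for units 113, 116 and 114; each one only reports that it was reached.
route = make_router({
    PLAYBACK: lambda req: "video distribution unit 113",
    SEARCH: lambda req: "image similarity determination unit 116",
    REGISTER: lambda req: "image feature amount recording unit 114",
})
```

Calling `route({"type": SEARCH})` reaches the stand-in for the image similarity determination unit 116, mirroring the destination named in the text.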

The video recording unit 112 is a processing unit that records the image data input from the imaging device 101 via the network 104 on a recording medium such as an HDD (not shown). When image data is recorded, information for retrieving it later is recorded with it, for example a frame number assigned frame by frame in order from the start of recording so that it is unique within the device. Information identifying when the image was captured, for example an image time, is also recorded. As the image time, for example, the device time output by the clock function unit built into the recording device 102, or the device time output by the clock function unit built into the imaging device 101, is used.

The video distribution unit 113 is a processing unit that distributes image data recorded on the recording media of the video recording unit 112, the image feature amount recording unit 114, and so on. The video distribution unit 113 determines the images to distribute according to the video playback request signal input from the monitoring terminal 103 via the network 105, reads the image data from the recording medium using the frame number or image time, and inputs it to the network transmission/reception unit 111.

The image feature amount recording unit 114 is a processing unit that records image feature amounts on a recording medium. An image feature amount is obtained by passing the image data recorded by the video recording unit 112 to the face search unit 117 to obtain face image data, and then passing that face image data to the image feature amount extraction unit 115. When an image feature amount is recorded, the frame number corresponding to the image data that was input to the face search unit 117 is recorded with it.
Hereinafter, the list data generated by this recording, which consists of frame numbers and image feature amounts, is referred to as image feature amount list data.
The start of this processing is determined by the registration request unit 125.
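One way to picture the image feature amount list data is as a list of (frame number, feature) pairs that grows as faces are found. The sketch below assumes hypothetical `find_face` and `extract` callables standing in for the face search unit 117 and the image feature amount extraction unit 115; the class name, method names, and the tuple layout are illustrative assumptions, not the patent's data format.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Feature = Tuple[float, ...]


@dataclass
class FeatureList:
    """Image feature amount list data: (frame number, feature) pairs."""
    entries: List[Tuple[int, Feature]] = field(default_factory=list)

    def register(
        self,
        frame_number: int,
        image: object,
        find_face: Callable[[object], Optional[object]],  # stand-in for unit 117
        extract: Callable[[object], Feature],             # stand-in for unit 115
    ) -> bool:
        """Record the feature of the face found in image; return False if none."""
        face = find_face(image)
        if face is None:
            return False
        self.entries.append((frame_number, extract(face)))
        return True


# Toy stubs: an empty string has no face; the "feature" is just the string length.
feature_list = FeatureList()
find_face = lambda image: image or None
extract = lambda face: (float(len(face)),)
feature_list.register(1, "face", find_face, extract)
feature_list.register(2, "", find_face, extract)
```

Only frame 1 ends up in `feature_list.entries`, because the stub finds no face in frame 2.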

The image feature amount extraction unit 115 is a processing unit that uses image recognition techniques to calculate the feature amount of the face image data input from the face search unit 117. As the image feature amount, for example, the color distribution of the image, the compositional distribution of edge patterns, or a combination of these is used.

The image similarity determination unit 116 is a processing unit that performs image searches and outputs the search results. It calculates a similarity from two inputs: the image feature amount of the face image obtained by passing the search image, which is designated as the template for the images to be retrieved, to the face search unit 117; and the image feature amounts of the face images in the image data recorded on the recording medium of the video recording unit 112. It then generates a search result from the magnitude of the calculated similarity. Specifically, for example, items whose similarity exceeds a predetermined value are taken as the search result. The search image is contained in the search request signal input from the monitoring terminal 103 via the network 105. The image feature amounts of the face image data recorded on the recording medium are obtained, using the frame number, from the image feature amount list data recorded in the image feature amount recording unit 114.
Various methods may be used to calculate the similarity; see, for example, the paper cited as Non-Patent Document 1.
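The thresholded search can be sketched as follows. The source leaves the similarity measure open, so cosine similarity is used here purely as a stand-in; the function names, the list layout, and the threshold value are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in similarity measure (an assumption; the source does not fix one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: Sequence[float],
    feature_list: List[Tuple[int, Sequence[float]]],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (frame number, similarity) for entries above threshold, best first."""
    scored = [(n, cosine_similarity(query, f)) for n, f in feature_list]
    hits = [(n, s) for n, s in scored if s > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy feature list data: frame 2 is orthogonal to the query and is filtered out.
stored = [(1, (1.0, 0.0)), (2, (0.0, 1.0)), (3, (1.0, 1.0))]
hits = search((1.0, 0.0), stored, threshold=0.5)
```

With these toy values the result lists frame 1 first and frame 3 second, while frame 2 falls below the threshold.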

The monitoring terminal 103 includes the following functional units: a network transmission/reception unit 121, a search request unit 122, a search result display unit 123, a video playback unit 124, and a registration request unit 125.
The network transmission/reception unit 121 is a processing unit that exchanges data with the recording device 102 connected via the network 105. The data exchanged include, for example, video playback request signals and search request signals sent to the recording device 102, and distribution images and search results received from the recording device 102.

Of the signals received from the recording device 102, distribution images are output to the video playback unit 124 and search results to the search result display unit 123. A video playback request signal from the video playback unit 124, a search request signal from the search request unit 122, and a registration request from the registration request unit 125 are each input to the network transmission/reception unit 121 and transmitted from it to the recording device 102 via the network 105.

The search request unit 122 is a processing unit that converts a search operation performed by the user on the screen displayed on the monitor of the monitoring terminal 103 into a search request signal and outputs it.
The search result display unit 123 is a processing unit that draws the search results received from the recording device 102 on the screen displayed on the monitor of the monitoring terminal 103.
The video playback unit 124 is a processing unit that converts a playback operation performed by the user on the screen displayed on the monitor of the monitoring terminal 103 into a video playback request signal and outputs it. The video playback unit 124 also draws the distribution images received from the recording device 102 on the screen displayed on the monitor of the monitoring terminal 103.
The registration request unit 125 is a processing unit that converts an image registration operation, performed by the user on the screen displayed on the monitor of the monitoring terminal 103 as an instruction to start registration, into a registration request signal and outputs it.

Patent documents: JP 2008-109336 A; JP 2007-042072 A
Non-Patent Document 1: Atsushi Hiroike et al., "Representation model for large-scale image collections", Journal of the Society of Photographic Science and Technology of Japan (日本写真学会誌), Vol. 66, No. 1, 2003, pp. 93-101

However, in the similar-face image search system of the prior art, the face detection processing requires a large amount of computation, which has hindered similar-face search over large volumes of stored video and similar-face search in real time. There has therefore been a strong demand, in the prior-art system, to reduce the amount of computation needed to extract face regions from the images captured by the imaging device 101 and to shorten the processing time required to register face image feature amounts.

For example, prior-art approaches to reducing the amount of computation required for face search have not exploited the characteristics of surveillance video. Surveillance video is often captured by a surveillance camera whose shooting direction and shooting range are fixed. In addition, in surveillance video from a camera installed in a narrow corridor inside a building or at the boarding or alighting point of an escalator, the people captured move in a constant direction, and the change in subject size over a sequence of scenes is also nearly constant. In other words, within the frame there is a local region where people tend to start appearing, or where they tend to stop appearing (disappear).

The present invention has been made in view of these circumstances, and its object is to provide an image processing system that can, for example, reduce the arithmetic processing required to register face images and the like and thereby shorten the processing time required for image registration.

To achieve the above object, the present invention configures an image processing system that registers information on a predetermined image portion for the images of a plurality of frames as follows.
That is, the image processing system processes the images of the plurality of frames in time-series order. An initial search means searches, in the images of a predetermined number of initial frames, for the image portion to be searched within a predetermined local region set in each such frame. A follow-up search means, for the image of a frame later in time-series order than the initial frames, sets a local region in that frame using information based on the search results obtained in the images of frames earlier in time-series order, and searches for the image portion to be searched within that local region. An image portion information registration means registers information on some or all of the image portions found by the initial search means and the follow-up search means.

Accordingly, by processing the images of the plurality of frames in time-series order and using information obtained from the image of an earlier frame to search the image of a later frame for the image portion to be searched, the load of the search processing can be reduced. For example, the arithmetic processing required to register face images and the like is reduced, and the processing time required for image registration can be shortened.
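The interplay of the initial search and the follow-up search can be sketched as a single loop. This is a simplified illustration under stated assumptions: `detect` and `next_region` are hypothetical callables, the switch to follow-up search is taken to happen after `n_initial` detections, and a missed detection is assumed to restart the initial search. None of these details is fixed by the text.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def register_faces(
    frames: Iterable[Tuple[int, object]],
    detect: Callable[[object, Box], Optional[Box]],  # hypothetical detector
    initial_region: Box,
    next_region: Callable[[List[Box]], Box],         # hypothetical region predictor
    n_initial: int = 2,
) -> List[Tuple[int, Box]]:
    """Search each frame inside a local region and collect (frame number, face box)."""
    registered: List[Tuple[int, Box]] = []
    history: List[Box] = []  # faces found so far in the current scene
    for frame_number, image in frames:
        # Initial search until n_initial faces are found, follow-up search afterwards.
        region = initial_region if len(history) < n_initial else next_region(history)
        found = detect(image, region)
        if found is None:
            history = []  # assumption: a miss ends the scene and restarts the initial search
            continue
        history.append(found)
        registered.append((frame_number, found))
    return registered


def contains(region: Box, box: Box) -> bool:
    """True when box lies entirely inside region."""
    rx, ry, rw, rh = region
    x, y, w, h = box
    return rx <= x and ry <= y and x + w <= rx + rw and y + h <= ry + rh


# Toy data: each "image" is simply the face box it contains, or None.
frames = [(0, None), (1, (10, 10, 4, 4)), (2, (12, 12, 5, 5)), (3, (50, 50, 8, 8))]
result = register_faces(
    frames,
    detect=lambda image, region: image if image and contains(region, image) else None,
    initial_region=(0, 0, 20, 20),
    next_region=lambda history: (40, 40, 30, 30),
)
```

In the toy run, frames 1 and 2 are found by the initial search and frame 3 only because the follow-up search moved the local region; the detector never scans the whole frame.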

Here, image processing systems of various configurations may be used; for example, one that has an imaging device, a recording device, and a monitoring terminal (device) can be used.
As the images, for example, images (video) captured by an imaging device can be used.
As the plurality of frames, for example, temporally consecutive frames may be used, or frames that are separated in time as a result of thinning or the like may be used.
Various numbers may be used as the number of the plurality of frames.

Various image portions may be used as the predetermined image portion; for example, an image portion showing a human face or a predetermined object can be used.
Various kinds of information may be used as the information on the predetermined image portion; for example, information such as a feature amount representing the characteristics of the image portion can be used.
As the manner of registering the information, for example, recording (storing) the information in a memory can be used.

Various numbers may be used as the number of initial frames (the predetermined number); for example, 2 is used.
Various regions may be used as the predetermined local region set in the initial frames; for example, a region set in advance is used. As a specific example, when the image portion to be searched grows larger as time passes, a region with a small area can be set as the initial local region, and when the image portion to be searched grows smaller as time passes, a region with a large area can be set as the initial local region.

Various methods may be used to set the local region, in the image of a frame later in time-series order than the initial frames, using information based on the search results obtained in the images of frames earlier in time-series order. For example, a method can be used that sets a region computed by a predetermined arithmetic expression from one or more pieces of information such as the motion vector of the image portion to be searched, the size (for example, area) of the image portion to be searched, and how that size (for example, area) is changing.
The information on the found image portions may be registered for, for example, all of the found image portions (for example, all frames) or only some of them (for example, some frames).
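As an illustration of a "predetermined arithmetic expression" for the local region, the sketch below shifts the last face center by the motion vector, scales the last face size by a growth factor, and takes a square around the result, clipped to the frame. The specific formula, the `margin` parameter, and all names are assumptions; the source only lists the kinds of information that may be used.

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def local_region(
    last_center: Tuple[float, float],
    motion: Tuple[float, float],
    last_size: float,
    growth: float,
    frame_size: Tuple[int, int],
    margin: float = 1.5,  # hypothetical safety factor around the predicted face
) -> Box:
    """Square region around the predicted face, clipped to the frame bounds."""
    cx = last_center[0] + motion[0]             # predicted center: shift by motion vector
    cy = last_center[1] + motion[1]
    half = last_size * growth * margin / 2      # half side length of the search square
    frame_w, frame_h = frame_size
    left = max(0, math.floor(cx - half))
    top = max(0, math.floor(cy - half))
    right = min(frame_w, math.ceil(cx + half))
    bottom = min(frame_h, math.ceil(cy + half))
    return (left, top, max(0, right - left), max(0, bottom - top))


# A 40-pixel face at (100, 80) moving by (10, 5) and growing by 25% per frame.
region = local_region((100, 80), (10, 5), 40, 1.25, (640, 480))
```

Rounding outward with `floor` and `ceil` keeps the region from shrinking below the computed extent; for the values above the result is a 76-by-76 square, a small fraction of the 640-by-480 frame.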

The present invention also configures an image processing system that registers information on a predetermined image portion for the images of a plurality of frames as follows.
That is, the image processing system processes the images of the plurality of frames in reverse time-series order (the opposite of time-series order). An initial search means searches, in the images of a predetermined number of initial frames, for the image portion to be searched within a predetermined local region set in each such frame. A follow-up search means, for the image of a frame later in reverse time-series order than the initial frames, sets a local region in that frame using information based on the search results obtained in the images of frames earlier in reverse time-series order, and searches for the image portion to be searched within that local region. An image portion information registration means registers information on some or all of the image portions found by the initial search means and the follow-up search means.

Accordingly, by processing the images of the plurality of frames in reverse time-series order and using information obtained from the image of a frame earlier in reverse time-series order to search the image of a frame later in reverse time-series order for the image portion to be searched, the load of the search processing can be reduced. For example, the arithmetic processing required to register face images and the like is reduced, and the processing time required for image registration can be shortened.
Processing in reverse time-series order differs from the time-series processing described above only in that the temporal order of the frames processed is reversed; with that taken into account, the same remarks apply.

As one configuration example, the image processing system described above, which processes in time-series or reverse time-series order, includes a search omission means. Based on information on the image portions found by the follow-up search means (or on the image portions found by the initial search means and the follow-up search means), the search omission means determines the frames for which the search for the image portion to be searched is to be omitted, and omits the search in the images of the frames so determined.

Accordingly, by omitting the search for the image portion to be searched in certain frames, the load of the search processing can be reduced further. For example, the arithmetic processing required to register face images and the like is reduced, and the processing time required for image registration can be shortened.
Various methods may be used to determine the frames for which the search is omitted; for example, a method can be used that selects frames in which the image portion to be searched is expected to be smaller than (or no larger than) a predetermined size (for example, area).
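The size-based omission rule can be sketched as below. The prediction model (a constant per-frame growth factor applied to the last detected size) and every name in the block are assumptions made for illustration; the source only says that frames with an expected size below a predetermined size may be skipped.

```python
from typing import List, Sequence


def frames_to_search(
    frame_numbers: Sequence[int],
    last_size: float,
    growth: float,    # assumed per-frame size ratio (below 1 when the face shrinks)
    min_size: float,  # the "predetermined size"
) -> List[int]:
    """Keep the frames whose predicted face size is at least min_size."""
    keep: List[int] = []
    size = last_size
    for frame_number in frame_numbers:
        size *= growth  # predicted size in this frame
        if size >= min_size:
            keep.append(frame_number)
    return keep


# Reverse time-series example: a 64-pixel face halves in size with each earlier frame.
kept = frames_to_search([99, 98, 97, 96], last_size=64, growth=0.5, min_size=16)
```

Here the predicted sizes are 32, 16, 8, and 4 pixels, so only frames 99 and 98 are searched and the search in frames 97 and 96 is omitted.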

As described above, the image processing system according to the present invention can reduce the load of the search processing for the image portion to be searched. For example, the arithmetic processing required to register face images and the like is reduced, and the processing time required for image registration can be shortened.

Embodiments of the present invention will be described with reference to the drawings.
First, an outline of the proposal is given.
This example exploits the characteristic of surveillance video that there is a fixed local region where people tend to start appearing. By searching for faces in that local region, a scene in which a person passes through can be detected.
Hereinafter, this local region, which is the region used to detect the first frame of a sequence of scenes, is referred to as the initial search region.

In this example, face search is performed in the initial search region while waiting for a person to pass. In the initial search region the size of a captured person is nearly constant, so it suffices, for example, to search the initial search region for small faces. In frames after the frame in which a face was detected in the initial search region, the centroid coordinates and size of the face region in the next frame are estimated using a motion vector calculated from the centroid coordinates of the face regions already detected in the sequence and the face sizes already detected in the sequence, and the face search region and the minimum and maximum face search sizes are changed accordingly.
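The estimation step can be illustrated with a linear extrapolation from the last two detections. This is a minimal sketch under assumptions: the source does not state the estimation formula, and `predict_next`, its `tolerance` parameter, and the way the minimum and maximum search sizes are derived are all hypothetical.

```python
from typing import Sequence, Tuple


def predict_next(
    centroids: Sequence[Tuple[float, float]],
    sizes: Sequence[float],
    tolerance: float = 0.25,  # hypothetical slack around the predicted face size
) -> Tuple[Tuple[float, float], float, float]:
    """Return (predicted centroid, minimum face size, maximum face size)."""
    if len(centroids) < 2 or len(sizes) < 2:
        raise ValueError("at least two earlier detections are needed")
    (x0, y0), (x1, y1) = centroids[-2], centroids[-1]
    motion = (x1 - x0, y1 - y0)       # motion vector between the last two centroids
    growth = sizes[-1] / sizes[-2]    # ratio of the last two face sizes
    center = (x1 + motion[0], y1 + motion[1])
    size = sizes[-1] * growth
    return center, size * (1 - tolerance), size * (1 + tolerance)


# Two detections: the face moved by (10, 10) and grew from 20 to 25 pixels.
center, min_size, max_size = predict_next([(100, 60), (110, 70)], [20, 25])
```

For these values the predicted centroid is (120, 80) and the face search is limited to sizes between 23.4375 and 39.0625 pixels, instead of scanning every scale.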

In this example, frames may also be registered in reverse time-series order. When surveillance video is registered in time-series order, the amount of computation can be reduced by waiting in a small initial search region and restricting the search to small faces. When registering in reverse time-series order, on the other hand, the local region through which people tend to pass is used as the initial search region and large faces are searched for there, which reduces the amount of computation needed to detect the small faces of distant people. In addition, when registering in reverse time-series order, a large face image is obtained first in a sequence of scenes, so the frames to be face-searched can be selected by estimating the face size and the like that will be detected in the frames processed afterwards.
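The choice between the two registration orders can be sketched as a small planning step that returns the frame order together with the matching initial search settings. The `InitialSearch` structure, the function name, and the concrete regions and face sizes are hypothetical values for a 640x480 camera; they are not taken from the source.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass(frozen=True)
class InitialSearch:
    region: Box    # initial search region
    min_face: int  # smallest face size searched for, in pixels
    max_face: int  # largest face size searched for, in pixels


def plan_registration(
    first_frame: int,
    last_frame: int,
    reverse: bool,
    entry: InitialSearch,  # where people start to appear (small faces)
    exit_: InitialSearch,  # where people pass out of view (large faces)
) -> Tuple[List[int], InitialSearch]:
    """Return the frame processing order and the initial search settings to use."""
    if reverse:
        return list(range(last_frame, first_frame - 1, -1)), exit_
    return list(range(first_frame, last_frame + 1)), entry


# Hypothetical settings for a camera facing people who walk towards it.
entry = InitialSearch(region=(280, 0, 80, 60), min_face=16, max_face=24)
exit_ = InitialSearch(region=(0, 240, 640, 240), min_face=96, max_face=160)
order, initial = plan_registration(10, 13, reverse=True, entry=entry, exit_=exit_)
```

With `reverse=True` the frames are visited from 13 down to 10 and the large-face settings are used first, which matches the idea of finding the large face before following it back towards the distance.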

FIG. 1 shows an example of the configuration of a similar-face image search system for surveillance video according to an embodiment of the present invention.
The similar-face image search system of this example includes an imaging device 1, a recording device 2, a monitoring terminal 3, a network 4 that connects the imaging device 1 and the recording device 2, and a network 5 that connects the recording device 2 and the monitoring terminal 3.
As the imaging device 1 and the networks 4 and 5, for example, devices similar to those shown in FIG. 7 (those with the same names) can be used.
Note that the network 4 and the network 5 need not be separate; the same network may be used for both.

The recording device 2 includes a network transmission/reception unit 11, a video recording unit 12, a video distribution unit 13, an image feature amount recording unit 14, an image feature amount extraction unit 15, an image similarity determination unit 16, a face search unit 17, and a scene face search unit 18.
The recording device 2 is a device such as a network digital recorder that includes a microcomputer and the like and records the image data input from the imaging device 1 via the network 4 on a recording medium such as an HDD (hard disk drive).

As the network transmission/reception unit 11, the video recording unit 12, the video distribution unit 13, the image feature amount extraction unit 15, the image similarity determination unit 16, and the face search unit 17, components similar to those of the same names shown in FIG. 7 can be used, for example.
Compared with, for example, the recording device 102 shown in FIG. 7, the recording device 2 of this example broadly differs in that it includes the scene face search unit 18 in addition to the processing units shown in FIG. 7, and in that the function of the image feature amount recording unit 14 is different.

The monitoring terminal 3 includes a network transmission/reception unit 21, a search request unit 22, a search result display unit 23, a video reproduction unit 24, and a registration request unit 25.
As the network transmission/reception unit 21, the search request unit 22, the search result display unit 23, and the video reproduction unit 24, components similar to those of the same names shown in FIG. 7 can be used, for example.
Compared with, for example, the monitoring terminal 103 shown in FIG. 7, the monitoring terminal 3 of this example broadly differs in the function of the registration request unit 25.

The registration request unit 25 of this example has the same functions as, for example, the registration request unit 125 shown in FIG. 7. In addition, it has a function (image registration request function) of creating a registration request signal from the registration method selected by the user at the monitoring terminal 3 and the start and end frame numbers the user entered, and of instructing the image feature amount recording unit 14 of the recording device 2 to start registration.

The scene face search unit 18 of the recording device 2 is a processing unit that detects face areas in the images recorded in the recording device 2. The face search unit 17 (and the face search unit 117 shown in FIG. 7) has only the function of searching an image for faces of a predetermined minimum or maximum size. In contrast, the scene face search unit 18 records in its internal memory, and refers to, the centroid coordinates, the sizes, and the number of detections of the face areas detected so far in a series of scenes. From these it calculates the area to be searched in the next frame and the minimum and maximum face sizes, and then performs the face search.

The image feature amount recording unit 14 of the recording device 2 uses the scene face search unit 18 to register the feature amounts of face images from the images stored on the HDD or the like of the recording device 2. It also has a function of changing the registration method in accordance with the registration method that the user specified through the registration request unit 25 of the monitoring terminal 3.

FIG. 2 schematically shows an example of video from an imaging device 1 (for example, a surveillance camera) installed to monitor people who walk along a corridor in a building from the far end toward the camera. This is one of the surveillance videos used in the registration process of the similar face image search system for surveillance video.
The face search process performed by the scene face search unit 18 is described below for each of the following face image registration methods, (registration method 1) to (registration method 3).
(Registration method 1) Registering in chronological order
(Registration method 2) Registering in reverse chronological order
(Registration method 3) Registering in reverse chronological order and omitting the search for small faces in a series of scenes

(Registration method 1) The case of registering in chronological order is described.
The frames used for registration run from a frame specified by the user, or the first frame of the video stored in the recording device 2, to a frame specified by the user, or the last frame of the video stored in the recording device 2. Registration proceeds from the first frame onward on the basis of the frame numbers. The method of registering every frame in frame-number order is described here, but when, for example, the image difference between frames is small in a scene in which a person passes by, the frames to be registered may be skipped at a fixed interval.
Hereinafter, skipping the frames to be registered at a fixed interval is referred to as frame skipping.

Rather than searching the entire area 31 of the video captured by the imaging device 1, the scene face search unit 18 searches only the local area 32, which is the position where a person first appears in the imaging range, and continues to do so until it reaches a frame in which a face is detected in the local area 32. In this example, the local area 32 is the initial search area.
Although the local area 32 is used as the initial search area here, the user can specify an appropriate area for each video. Alternatively, sample surveillance video may be acquired before the system goes into operation, a face search may be performed in advance over the entire imaging range of the surveillance camera to collect information on people's trajectories, and an area in which faces of a predetermined size or smaller were detected, for example, may be set automatically as the initial search area.

Once a face is detected in the local area 32, a series of scenes in which a person passes by is considered to have started. From the next frame in chronological order onward, the centroid coordinates of the face areas detected so far are used to estimate the face area in the next frame. The coordinates are defined, for example, with the upper left corner of the image obtained by the imaging device 1 as the origin, the horizontal direction as the X axis, and the vertical direction as the Y axis.

FIG. 3 shows an example of calculating a motion vector for estimating the face area in the next frame from the faces detected so far in a series of scenes. This is an example of obtaining a motion vector in the registration process of the similar face image search system for surveillance video.
The local area 32 shown in FIG. 3 is the local area 32 that serves as the initial search area in FIG. 2. Calculating a motion vector in a series of scenes requires the centroid coordinates of the face area in at least two frames, so estimation of the face search area from the motion vector begins with the third frame after a face is first detected in the local area 32.

The centroid coordinates 41 and 42 are the centroid coordinates of the face areas detected in the local area 32. FIG. 3 assumes that the face of the same person has been detected in each consecutive frame of a series of scenes, so that four sets of face-area centroid coordinates have been obtained, and illustrates how the motion vector for the next frame is calculated.
In addition to the centroid coordinates 41 and 42, the centroid coordinates 43 and 44 are, in order, the centroids of the face areas already obtained.

From the centroid coordinates 41 to 44, the motion vectors 51, 52, and 53 can be calculated as the displacements of the centroid's x and y coordinates between frames. Specifically, the motion vector 51 is calculated from the centroids 41 and 42, the motion vector 52 from the centroids 42 and 43, and the motion vector 53 from the centroids 43 and 44. The motion vector 54 is then calculated from the motion vectors 51, 52, and 53.

The motion vector 54 can be obtained by the following (Equation 1).
(Equation 1) uses the motion vectors Z_i (i = 1, 2, 3, ..., n) obtained from the face coordinates already detected to calculate a motion vector Z that indicates the direction in which the centroid of the face area moves from the current frame to the next frame.
Here Z_i and Z denote vectors, and n is 2 or greater.

A weight W_i (>= 1) (i = 1, 2, 3, ..., n) is defined for each motion vector Z_i. The motion vector Z_i between frames is obtained from the centroid coordinates (X_i, Y_i) of the face area detected in the frame at time i and the centroid coordinates (X_{i-1}, Y_{i-1}) of the face area detected in the frame at time (i-1), as Z_i = (X_i - X_{i-1}, Y_i - Y_{i-1}).

Regarding the weight W_i for each motion vector Z_i: if, for example, W_i = 1, the motion vector Z is obtained with all motion vectors Z_i weighted equally. If the movement of the face area changes little from frame to frame, and the actual movement in the next frame is likely to be about the same as before, an accurate motion vector Z is obtained. On the other hand, if the amount of movement differs greatly between a person captured far away and a person captured close to the camera, W_i = i can be used so that the motion vector Z_i1 obtained far away and the motion vector Z_i2 obtained nearby (i2 > i1) are weighted differently and the most recent motion vector Z_i2 is emphasized. In this way, the motion vector Z can follow a person's movement even when the amount of movement in the surveillance video changes sharply, for example when a walking person suddenly stops or changes direction.
The weights used to calculate the motion vector Z are not limited to the above.
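The weighting described above can be illustrated with a minimal Python sketch. (Equation 1) is not reproduced in this text, so the sketch assumes that Z is the weighted average of the Z_i, normalized by the sum of the weights; that form, and the function names `motion_vectors` and `predict_motion`, are assumptions made for illustration and are not taken from the specification.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def motion_vectors(centroids: List[Point]) -> List[Point]:
    """Z_i = (X_i - X_{i-1}, Y_i - Y_{i-1}) for consecutive centroids."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(centroids, centroids[1:])]


def predict_motion(centroids: List[Point],
                   weight: Callable[[int], float] = lambda i: 1.0) -> Point:
    """Combine the Z_i into one vector Z.

    Assumption: Z = sum(W_i * Z_i) / sum(W_i). The source only says that
    weights W_i >= 1 are applied, e.g. W_i = 1 or W_i = i.
    """
    vectors = motion_vectors(centroids)
    if not vectors:
        raise ValueError("at least two centroids are required")
    weights = [weight(i) for i in range(1, len(vectors) + 1)]
    total = sum(weights)
    zx = sum(w * vx for w, (vx, _) in zip(weights, vectors)) / total
    zy = sum(w * vy for w, (_, vy) in zip(weights, vectors)) / total
    return zx, zy


# Hypothetical centroids standing in for 41, 42, 43, 44 in FIG. 3.
history = [(100.0, 50.0), (110.0, 60.0), (124.0, 74.0), (142.0, 92.0)]
z_equal = predict_motion(history)                      # W_i = 1
z_recent = predict_motion(history, weight=lambda i: i)  # W_i = i
```

With W_i = i the most recent displacement contributes most, so the estimate reacts faster when the person accelerates toward the camera.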

Once the motion vector Z has been obtained, the estimated centroid coordinates of the face in the next frame can be found by applying Z with the centroid coordinates of the face area in the most recent frame as the origin. In the example of FIG. 3, the estimated centroid coordinates 45 are the centroid of the face area in the next frame as estimated from the motion vector 54. Searching for a face only in a small area centered on the estimated centroid coordinates 45 (the area need not be exactly centered on them) reduces the amount of computation required for the face search. The size of this small area can be determined, for example, by estimating the face size in the next frame. As a specific example, the face size in the next frame can be estimated from the history of face sizes detected in the series of scenes, and the face search is performed in an area whose size is the estimated face size multiplied by a predetermined magnification (for example, a preset value).

Hereinafter, the area 55 determined from the area size and the motion vector Z is referred to as the tracking face search area.
Performing the face search in this tracking face search area reduces the amount of computation. To reduce it further, the minimum and maximum face sizes used in the face search also need to be set appropriately. In this example, the minimum and maximum sizes for the face search in the next frame are determined from the estimated face size described above. For example, if the size of the face detected in the most recent frame is taken as the estimated face size and F is its width in pixels, the minimum face search size is set to 0.9 × F and the maximum to 1.1 × F.
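As a rough sketch of how the tracking face search area and the size limits might be derived, the following Python code shifts the last centroid by the motion vector Z and builds a search window around it. The square window, the magnification value `scale=2.0`, and the function names are assumptions made for illustration; only the 0.9 × F and 1.1 × F limits come from the text.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, width, height)


def tracking_search_area(last_centroid: Point, motion: Point,
                         est_face_size: float, scale: float = 2.0) -> Box:
    """Window centered on the estimated centroid in the next frame.

    `scale` is a hypothetical preset magnification applied to the
    estimated face size; a square window is assumed for simplicity.
    """
    cx = last_centroid[0] + motion[0]
    cy = last_centroid[1] + motion[1]
    side = est_face_size * scale
    return (cx - side / 2.0, cy - side / 2.0, side, side)


def face_size_limits(f: float, lower: float = 0.9,
                     upper: float = 1.1) -> Tuple[float, float]:
    """Minimum and maximum face width for the next search (0.9F to 1.1F)."""
    return lower * f, upper * f


area = tracking_search_area((142.0, 92.0), (14.0, 14.0), est_face_size=40.0)
min_size, max_size = face_size_limits(40.0)
```

A real detector would additionally clip the window to the image bounds, which is omitted here.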

(Registration method 2) The case of registering in reverse chronological order is described.
(Registration method 1) above described searching for faces in frames taken in chronological order. This section describes searching for faces in reverse chronological order.
The frames used for registration run from a frame specified by the user, or the last frame of the video stored in the recording device 2, back to a frame specified by the user, or the first frame of the video stored in the recording device 2. As in (registration method 1), registration with frame skipping may be used.
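The frame ranges for (registration method 1) and (registration method 2), including frame skipping, can be sketched as a single helper. This is only an illustration: the function name `frame_order`, the convention that the direction follows from whether the start frame number is below or above the end frame number, and the `step` parameter are all assumptions.

```python
from typing import List


def frame_order(start: int, end: int, step: int = 1) -> List[int]:
    """Frame numbers to process, inclusive of both ends where reachable.

    start <= end gives chronological order (registration method 1);
    start > end gives reverse chronological order (registration method 2).
    step > 1 corresponds to frame skipping at a fixed interval.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    if start <= end:
        return list(range(start, end + 1, step))
    return list(range(start, end - 1, -step))


forward = frame_order(61, 69, step=2)  # chronological, every second frame
backward = frame_order(69, 61)         # reverse chronological, every frame
```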

When registering in reverse chronological order, the face search is performed in the local area 33 (see FIG. 2), starting from the frames in which a person disappears from the surveillance video in a series of scenes. The local area 33 is the area in which a person leaves the field of view, and it is where the face is largest within the series of scenes. The minimum face size used when searching the local area 33 can therefore be made larger than when searching the entire area. The amount of computation is clearly negatively correlated with the minimum face size that the search must detect: the larger the minimum face search size, the smaller the amount of computation.

When the face search is performed in reverse chronological order, the motion vectors between frames simply point in the opposite direction compared with chronological order, and they are calculated in the same way as in (registration method 1).
When registering in reverse chronological order, the face size detected in a series of scenes gradually decreases. The minimum and maximum face search sizes are therefore calculated using, for example, the face size in the most recent frame already detected in the series of scenes reduced by a predetermined amount, or the face size estimated from the reduction rate of the face sizes already detected in the series of scenes.

(Registration method 3) The case of registering in reverse chronological order and omitting the search for small faces in a series of scenes is described.
(Registration method 2) above described registering in reverse chronological order. When frames are registered in reverse chronological order, faces can be detected from the largest downward in each scene in which a person passes by, for example in video of people approaching from a distance as shown in FIG. 2.
For a similar face image search, it is desirable to obtain large faces in a series of scenes. With registration in chronological order, the face search area had to track the face from the initial search window (initial search area), and every face from the smallest to the largest had to be detected in turn. In reverse chronological order, however, the large faces are detected first, so the face search can be omitted for the frames in the series of scenes in which only a small face is presumed to have been captured, that is, the frames that come earlier in chronological order.

The method of selecting the frames in which to search for faces within a series of scenes is described next.
When faces have already been detected in several frames of a series of scenes, two decisions are required: whether to stop searching for faces in the subsequent frames, and up to which frame the search is to be omitted.

Criteria for deciding not to search for faces in the subsequent frames include, for example, whether a predetermined number of faces or more have been detected so far in the series of scenes, and whether a face image of a predetermined size or larger has been obtained. Specifically, it can be decided not to search the subsequent frames when a predetermined number of faces or more have been detected so far in the series of scenes, or when a face image of a predetermined size or larger has not been obtained.
To decide up to which frame the search can be omitted, an estimated size s, calculated from the reduction rate α of the face sizes detected so far and the number of omitted frames β, can be used to estimate the frame at which the series of scenes started in chronological order. The search can then be omitted up to that estimated frame.

FIG. 4 shows an example of the frames for which the face search can be omitted when face images are registered in reverse chronological order. It illustrates the frame control that omits the search for small faces when registering in reverse chronological order in the registration process of the similar face image search system for surveillance video.
Specifically, registration is performed in reverse chronological order; faces are detected in the initial search area in frames 69 and 68, and from frame 68 onward the face search is performed in the tracking face search area in frames 67, 66, and 65. A face is detected in each of these frames, so five face images have been obtained.

Here, the width of the face image in pixels is used as the face image size. Let s65, s66, s67, and s68 be the widths in pixels of the face images obtained in frames 65, 66, 67, and 68, respectively.
Although the width in pixels is used as the face image size in this example, the height in pixels or the area may be used instead. If the criterion for omitting the face search is that four or more face images have been obtained in a series of scenes, the face search can be omitted from frame 64 onward.

In the example of FIG. 4, the reduction rate α of the face image can be obtained by, for example, any of the following (reduction rate example 1) to (reduction rate example 4).
(Reduction rate example 1)
α = (s68 - s67) / s68
(Reduction rate example 2)
α = (s67 - s66) / s67
(Reduction rate example 3)
α = (s66 - s65) / s66
(Reduction rate example 4)
α = {(s68 - s67) / s68 + (s67 - s66) / s67 + (s66 - s65) / s66} / 3
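The four reduction rate examples can be computed from the list of detected face widths as in the following minimal Python sketch. The function name `reduction_rates` and the sample widths are hypothetical; the formulas are the ones listed above.

```python
from typing import List, Tuple


def reduction_rates(sizes: List[float]) -> Tuple[List[float], float]:
    """Per-step reduction rates and their mean.

    `sizes` is ordered as detected in reverse chronological order,
    e.g. [s68, s67, s66, s65]. Each rate is (a - b) / a for adjacent
    sizes a, b (examples 1 to 3); the mean corresponds to example 4.
    """
    if len(sizes) < 2:
        raise ValueError("at least two face sizes are required")
    rates = [(a - b) / a for a, b in zip(sizes, sizes[1:])]
    return rates, sum(rates) / len(rates)


# Hypothetical widths in pixels for s68, s67, s66, s65.
rates, mean_rate = reduction_rates([100.0, 80.0, 60.0, 45.0])
```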

The reduction rate α is defined as the rate of change of the face size between frames, and the expressions above are examples. Let β be the number of times the reduction by α is applied. Taking s65 as the reference, the estimated face size in the frame at which the series of scenes started can then be obtained as (α^β × s65). If the face size Sfirst of the frame in which a scene starts in chronological order is determined in advance, the face search can be omitted up to the frame for which Sfirst > (α^β × s65).
For example, if the number of frames that can be omitted is calculated to be 2, the face search is omitted for frames 64 and 63, and the face search in the initial search area resumes from frame 62 onward (frames 62 and 61 in the example of FIG. 4).
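The condition Sfirst > (α^β × s65) can be turned into a small search for β, sketched below in Python. The sketch uses α exactly as it appears in that expression, namely as a multiplicative per-frame factor between 0 and 1; how that factor is derived from the reduction rate examples is left to the caller and is an assumption of this sketch, as are the function name `frames_to_skip` and the safety limit `max_frames`.

```python
def frames_to_skip(alpha: float, s_last: float, s_first: float,
                   max_frames: int = 1000) -> int:
    """Smallest beta for which s_first > alpha**beta * s_last.

    alpha:   per-frame multiplicative factor, assumed to satisfy 0 < alpha < 1
    s_last:  face size in the last frame searched (s65 in FIG. 4)
    s_first: preset face size at which a scene starts (Sfirst)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1 (exclusive)")
    beta = 0
    size = float(s_last)
    # Shrink the estimated size frame by frame until it drops below Sfirst.
    while size >= s_first and beta < max_frames:
        beta += 1
        size *= alpha
    return beta


# Hypothetical values: factor 0.9, s65 = 40 pixels, Sfirst = 30 pixels.
skip = frames_to_skip(0.9, 40.0, 30.0)
```

For scenes with a nearly constant walking speed, a fixed number of skipped frames may serve equally well and avoids this calculation.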

In addition, when, for example, the surveillance video covers the boarding point of an escalator, so that in most scenes people move at a constant speed and faces are detected in nearly the same frames, the number of frames to be omitted may be fixed in advance.

Next, an example of the flow of the registration process of the similar face image search system for surveillance video in this example is described with reference to FIG. 5, which corresponds to (registration method 1) and (registration method 2), and FIG. 6, which corresponds to (registration method 3).
In this example, the processing of the similar face image search system is broadly divided into four processes: registration, recording, video reproduction, and similar image search. For the recording, video reproduction, and similar image search processes, processing similar to the prior art, such as that shown in FIG. 7, can be used, and a detailed description is omitted here.

FIG. 5 shows an example of the flow of the registration process for (registration method 1) and (registration method 2) in the similar face image search system for surveillance video. Steps S1 to S27 are performed by the recording device 2, and timing S31 relates to processing performed by the monitoring terminal 3.
In step S1, the recording device 2 waits until it receives a registration request signal created by the registration request unit 25 in response to the user's registration operation (including the choice of registration method) at the monitoring terminal 3. When the registration request signal has been received, the process proceeds to step S2.
Timing S31 represents the communication between the monitoring terminal 3 and the recording device 2.

In step S2, the content of the user's operation is verified. The items verified are the type of face image registration method, the registration start frame number, and the registration end frame number. If the user's registration request is incomplete, the process returns to step S1. If the registration request is valid, the process proceeds to step S3.
In step S3, the type of face image registration method received in step S1 is stored in the internal memory, and the process proceeds to step S4.
In step S4, the registration start frame number and the registration end frame number received in step S1 are stored in the internal memory. The registration start frame number is set as the current frame number, and the process proceeds to step S5.

In step S5, the initial search area to be used is selected, and the process proceeds to step S6. In this example, when the face image registration method is (registration method 1), the position and size of the local area 32 are stored in the internal memory as the position and size of the initial search area; when it is (registration method 2), the position and size of the local area 33 are stored instead.
In step S6, the position and size of the face search area are calculated, and the process proceeds to step S7. They are obtained by reading from the internal memory the position and size of the initial search area selected in step S5.

In step S7, the minimum and maximum face search sizes are calculated, and the process proceeds to step S8. They are obtained by reading from the internal memory the minimum and maximum face search sizes for the initial search area selected in step S5. In this example, these values are preset in the system and stored in the internal memory.
In step S8, the image is read on the basis of the current frame number, and the process proceeds to step S9.
In step S9, the current frame number is updated, and the process proceeds to step S10. When the face image registration method is (registration method 1), the current frame number is, for example, incremented by 1; when it is (registration method 2), the current frame number is, for example, decremented by 1.
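Steps S5 and S9 both branch on the registration method, which can be sketched in Python as follows. The `Region` type, the pixel coordinates assigned to the local areas 32 and 33, and the function names are hypothetical values chosen for illustration and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int


# Hypothetical positions and sizes for the local areas of FIG. 2.
LOCAL_AREA_32 = Region(x=280, y=60, width=80, height=60)     # far end of corridor
LOCAL_AREA_33 = Region(x=160, y=300, width=320, height=180)  # near the camera


def select_initial_area(method: int) -> Region:
    """Step S5: pick the initial search area for the registration method."""
    if method == 1:
        return LOCAL_AREA_32
    if method == 2:
        return LOCAL_AREA_33
    raise ValueError("this sketch covers registration methods 1 and 2 only")


def next_frame_number(current: int, method: int) -> int:
    """Step S9: advance by 1 in chronological order, go back by 1 in reverse."""
    if method == 1:
        return current + 1
    if method == 2:
        return current - 1
    raise ValueError("this sketch covers registration methods 1 and 2 only")
```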

In step S10, a face search is performed using the position and size of the face search area calculated in step S6 and the minimum and maximum face search sizes obtained in step S7. If a face (the image portion of a face) is detected, it is determined that a series of scenes has started, and the process proceeds to step S11. If no face is detected, the process proceeds to step S24.
In step S24, it is determined whether registration has ended. This determination compares the current frame number with the end frame number; specifically, for example, registration ends when the current frame number reaches the end frame number. If registration is determined to have ended, the process proceeds to step S27; otherwise, the process returns to step S8.
In step S11, the face detection count, the centroid coordinates of the face area, and the face size detection history for the series of scenes, all held in the internal memory, are initialized.
In step S12, an image is read based on the current frame number, and the process proceeds to step S13.
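The waiting loop formed by steps S8 to S10 and S24 can be sketched as follows. This is only an illustrative sketch: the callables `read_frame` and `detect`, the `step` argument (+1 for registration method 1, -1 for registration method 2), and the return convention are assumptions introduced here and are not defined in the specification.

```python
from typing import Any, Callable, Tuple

Region = Tuple[int, int, int, int]   # x, y, width, height (assumed layout)
SizeRange = Tuple[int, int]          # minimum and maximum face size


def wait_for_scene_start(
    read_frame: Callable[[int], Any],                  # hypothetical frame reader
    detect: Callable[[Any, Region, SizeRange], bool],  # hypothetical face detector
    current: int,
    end: int,
    step: int,
    region: Region,
    size_range: SizeRange,
) -> Tuple[int, bool]:
    """Scan frames until a face appears in the initial search area.

    Returns the updated current frame number and whether a scene started.
    """
    while True:
        image = read_frame(current)            # S8: read the current frame
        current += step                        # S9: update the frame number
        if detect(image, region, size_range):  # S10: search the initial area only
            return current, True               # scene started, continue at S11
        if current == end:                     # S24: registration end determination
            return current, False              # nothing left to read, go to S27
```

Because the frame number is updated in step S9 before the search in step S10, the value returned on detection already points at the frame that step S12 reads next.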

In step S13, if the face detection count in the series of scenes is two or more, the process proceeds to step S14; if it is less than two, the process proceeds to step S25.
In step S14, the position of the follow-up face search area is calculated, and the process proceeds to step S15. As described with reference to FIG. 3, the position of the follow-up face search area is calculated from a motion vector obtained from the history of the centroid coordinates of the face area in the series of scenes.
In step S15, the size of the follow-up face search area is calculated, and the process proceeds to step S16. As described with reference to FIG. 3, the size of the follow-up face search area is calculated from the estimated face size obtained from the face size history in the series of scenes.
In step S16, the minimum and maximum face search sizes for the follow-up face search area are calculated, and the process proceeds to step S17. As described above, these sizes are calculated from the estimated face size obtained from the face size history in the series of scenes.
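One possible reading of steps S14 to S16 is sketched below. The specification only states that the position comes from a motion vector and the sizes from an estimated face size; the linear extrapolation from the last two detections, the `margin` factor, and the `tolerance` band are assumptions made for illustration, not values taken from the specification.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def follow_up_search_window(
    centroids: List[Point],
    sizes: List[float],
    margin: float = 2.0,      # assumed: search area side = margin * estimated size
    tolerance: float = 0.25,  # assumed: accepted size band around the estimate
) -> Tuple[Tuple[float, float, float, float], Tuple[float, float]]:
    """Return ((x, y, width, height), (min_size, max_size)) for the next frame.

    Requires at least two detections, which step S13 guarantees.
    """
    (x0, y0), (x1, y1) = centroids[-2], centroids[-1]
    vx, vy = x1 - x0, y1 - y0        # motion vector between the last two centroids
    cx, cy = x1 + vx, y1 + vy        # S14: predicted centroid in the next frame
    # Estimated face size: continue the last observed change in size.
    estimated = max(1.0, sizes[-1] + (sizes[-1] - sizes[-2]))
    side = margin * estimated        # S15: size of the follow-up search area
    region = (cx - side / 2, cy - side / 2, side, side)
    size_range = (estimated * (1 - tolerance), estimated * (1 + tolerance))  # S16
    return region, size_range
```

For example, with centroids (100, 50) and (110, 60) and face sizes 20 and 24, the predicted centroid is (120, 70) and the estimated face size is 28, so only a 56 by 56 window around that point is searched.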

In step S25, the position and size of the initial search area are calculated, and the process proceeds to step S26. This calculation is performed, for example, in the same way as in step S6.
In step S26, the minimum and maximum face search sizes for the initial search area are obtained, and the process proceeds to step S17. This is done, for example, in the same way as in step S7.

In step S17, a face search is performed using the position and size of the face search area and the minimum and maximum face search sizes calculated either in steps S14, S15, and S16 or in steps S25 and S26. If a face is detected, the process proceeds to step S18; if no face is detected, the process proceeds to step S21.
In step S18, the image feature amount is extracted from the face image detected in step S17, and the process proceeds to step S19.
In step S19, the image feature amount extracted in step S18 is recorded on the recording medium, and the process proceeds to step S20.
In step S20, the centroid coordinates and size of the face area detected in step S17 are added to the detection history in the internal memory, the face detection count is incremented by 1, and the process proceeds to step S21.
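The per-scene state handled in steps S11, S13, and S20 can be pictured as a small history object. The sketch below is illustrative only: the class name `SceneHistory`, the `(x, y, width, height)` box layout, and the use of the longer box side as the "face size" are assumptions that the specification does not fix.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height (assumed layout)


@dataclass
class SceneHistory:
    """Detection history kept in internal memory for one series of scenes."""
    centroids: List[Tuple[float, float]] = field(default_factory=list)
    sizes: List[float] = field(default_factory=list)

    @property
    def detection_count(self) -> int:
        return len(self.sizes)

    def reset(self) -> None:
        """S11: clear the count and both histories when a scene starts."""
        self.centroids.clear()
        self.sizes.clear()

    def add(self, box: Box) -> None:
        """S20: record the centroid and size of a detected face area."""
        x, y, w, h = box
        self.centroids.append((x + w / 2, y + h / 2))
        self.sizes.append(max(w, h))  # assumed definition of face size

    def use_follow_up_search(self) -> bool:
        """S13: follow-up search needs at least two detections."""
        return self.detection_count >= 2
```

Two detections are the minimum from which a motion vector can be formed, which matches the threshold in step S13.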

In step S21, the current frame number is updated, and the process proceeds to step S22. When the face image registration method is (registration method 1) described above, for example, the current frame number is incremented by 1; when it is (registration method 2) described above, for example, the current frame number is decremented by 1.
In step S22, it is determined whether registration has ended. This determination compares the current frame number with the end frame number; specifically, for example, registration ends when the current frame number reaches the end frame number. If registration is determined to have ended, the process proceeds to step S27; otherwise, the process proceeds to step S23.

In step S23, it is determined whether the series of scenes has ended. This determination uses criteria such as the number of consecutive frames in which no face could be detected, or how the size of the face detected in the most recent frame compares with a predetermined value. Specifically, for example, the scene is judged to have ended when the number of consecutive frames without a detected face exceeds a predetermined value, or when the size of the face detected in the most recent frame exceeds (or falls below) a predetermined value. If the series of scenes is determined to be still continuing, the process proceeds to step S12; if it is determined to have ended, the process proceeds to step S6.
In step S27, the registration process is terminated.
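The scene end determination of step S23 can be sketched as a single predicate. The parameter names and the default threshold below are hypothetical; the specification speaks only of "predetermined values".

```python
from typing import Optional


def scene_has_ended(
    consecutive_misses: int,
    last_face_size: Optional[float],
    max_misses: int = 5,                 # assumed threshold
    size_limit: Optional[float] = None,  # assumed; None disables the size criterion
    end_when_larger: bool = True,        # True: end above the limit, False: below it
) -> bool:
    """S23: decide whether the current series of scenes is over."""
    # Criterion 1: too many consecutive frames without a detected face.
    if consecutive_misses > max_misses:
        return True
    # Criterion 2: the most recently detected face crossed the size limit.
    if size_limit is not None and last_face_size is not None:
        if end_when_larger:
            return last_face_size > size_limit
        return last_face_size < size_limit
    return False
```

The direction of the size test depends on the processing order: in chronological order an approaching face grows until it leaves the view, whereas in reverse chronological order it shrinks.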

FIG. 6 shows an example of the flow of registration processing under (registration method 3) described above in the similar face image search system for surveillance video.
The flowchart shown in FIG. 6 adds steps S41, S42, and S43 to the flowchart shown in FIG. 5. Steps S1 to S27 and S31, which are the same as those shown in FIG. 5, are given the same reference numerals, and their detailed description is omitted. Steps S41 to S43 are processes performed by the recording device 2.

Step S41 is executed after step S11, and steps S42 and S43 are executed depending on the condition determination in step S41. In addition, if the scene is determined not to have ended in step S23, the process proceeds to step S41.
In step S41, it is determined whether to end the face search in the frames of the series of scenes early (early scene end determination). This determination can use criteria such as how the face detection count in the series of scenes compares with a predetermined value, or how the size of the face detected in the previous frame compares with a predetermined size. Specifically, for example, the search can be ended early when the face detection count in the series of scenes exceeds a predetermined value, or when the size of the face detected in the previous frame falls below a predetermined size. If the early scene end determination decides that the face search in the scene ends at the current frame, the process proceeds to step S42; if it decides that the face search in the scene continues, the process proceeds to step S12.

In step S42, the reduction ratio α and the number of omitted frames β described above are calculated from the history of face sizes detected in the series of scenes, the number of frames that can be omitted is determined, and the process proceeds to step S43.
In step S43, the current frame number is updated, and the process proceeds to step S24. Here, in order to search for faces in the next scene, the number of omissible frames β obtained in step S42 is subtracted from the current frame number, and the result becomes the new current frame number.
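Steps S41 and S43 can be sketched as follows. The threshold defaults are hypothetical, and the number of omissible frames β is taken as an input here because its calculation in step S42 is defined elsewhere in the specification and is not reproduced in this sketch.

```python
from typing import Optional


def should_end_scene_early(
    detection_count: int,
    previous_face_size: Optional[float],
    max_detections: int = 10,     # assumed threshold
    min_face_size: float = 24.0,  # assumed threshold
) -> bool:
    """S41: early scene end determination."""
    # Enough faces have already been registered for this scene.
    if detection_count > max_detections:
        return True
    # The face has become too small to be worth registering.
    return previous_face_size is not None and previous_face_size < min_face_size


def skip_omissible_frames(current_frame: int, beta: int) -> int:
    """S43: jump over beta frames; frame numbers decrease in reverse order."""
    return current_frame - beta
```

For instance, if the search ends early at frame 200 and β is 15, the search for the next scene resumes at frame 185, so the 15 skipped frames are never passed to the face detector.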

The present proposal has been described above by way of a specific embodiment, but it is not limited to that embodiment. For example, although the above description covers a configuration in which the registration function is implemented in the recording device 2, the function may instead be implemented in a device separate from the recording device.
This example shows a configuration that shortens the registration processing time for similar face images by shortening the face image detection processing time. The approach is also applicable when the detection target is not a face: by using various object detection techniques, it can be applied in the same way to a similar image search system for a specific object.
Furthermore, this example describes the case where the search image is selected from images captured by the imaging device 1 and recorded in the recording device 2, but the search image may also be, for example, an image captured by a device other than the imaging device 1.
Although the present proposal has been described concretely by way of the above embodiment, it goes without saying that the embodiment is merely an illustration of the present proposal and that the present proposal is not limited to it.

A specific example of the effect obtained by this example is described with reference to the surveillance video shown in FIG. 2.
FIG. 2 shows an example of video from a surveillance camera installed to monitor people who walk along a corridor in a building from the far end toward the camera. The entire area 31 of the video captured by the imaging device 1 is shown.
In this example, by exploiting the characteristics of surveillance video, the face search is performed not over the entire area 31 captured by the imaging device 1 but in the local area 32, which is the initial search area; shrinking the search area reduces the amount of computation. Furthermore, in frames after the one in which a face is detected in the initial search area, the face area in the next frame can be estimated from the face areas and face sizes detected so far in the series of scenes, and searching for the face only in that estimated area further reduces the amount of computation.
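To make the reduction concrete, the sketch below counts candidate windows for a sliding-window face detector. The sliding-window model, the stride, the image dimensions, and the face sizes are all assumptions chosen for illustration; the specification does not prescribe a particular detector or give these numbers.

```python
from typing import Iterable


def count_windows(width: int, height: int, face_sizes: Iterable[int], stride: int = 8) -> int:
    """Count the square windows a sliding-window detector would evaluate."""
    total = 0
    for size in face_sizes:
        if size > width or size > height:
            continue  # a window of this size does not fit in the area
        columns = (width - size) // stride + 1
        rows = (height - size) // stride + 1
        total += columns * rows
    return total


# Entire area with a wide range of face sizes (assumed 640 x 480 frame).
full_area = count_windows(640, 480, [32, 64, 128])
# Local area with a single small face size (assumed 160 x 120 region).
local_area = count_windows(160, 120, [32])
print(full_area, local_area)
```

Under these assumed numbers the local area needs roughly two percent of the window evaluations of the full frame. The actual saving depends on the angle of view and the size of the subject.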

As another example, when the face search is performed in the initial search area 33 in reverse chronological order, small faces do not need to be searched for, which reduces the amount of computation. Furthermore, when registering in reverse chronological order, large faces are acquired first in a series of scenes, so there is no need to search the subsequent frames in which only small faces are present. The reduction in computation obtained by shrinking the search area, or by restricting the minimum and maximum search sizes, varies with the angle of view, the size of the subject, and so on, but is qualitatively self-evident. A further effect is that, within a series of scenes, a frame in which a large face was acquired can be selected as the best shot frame, in which the person is most easily identified.

As a specific example, the scene face search unit 18 exploits the characteristics of surveillance video: it focuses on the local area 32 or the local area 33 of the image obtained by the imaging device 1 and waits for a person to pass through that local area. In the frames that follow the initial search area detection, the face search is performed in the follow-up face search area.
When registration is performed in reverse chronological order, the face sizes within a series of scenes are estimated from the face sizes detected so far. This estimate makes it possible to omit the face search for frames that are expected to contain only small faces.

As described above, the similar face image search system of this example has the following configuration examples, (Configuration Example 1) to (Configuration Example 5).
(Configuration Example 1) This is the basic configuration example of the initial window.
In a similar face image search system that has an imaging device 1, a recording device 2 that records images captured by the imaging device 1, and a monitoring terminal 3, and that has a function of searching the images recorded in the recording device 2 for face images similar to an arbitrary face image and a function of displaying the retrieved images on the monitoring terminal 3, an initial search function that uses a fixed local area as the face search area is provided.

(Configuration Example 2) This is a configuration that waits at the initial window in chronological order.
In a similar face image search system that has an imaging device 1, a recording device 2 that records images captured by the imaging device 1, and a monitoring terminal 3, and that has a function of searching the images recorded in the recording device 2 for face images similar to an arbitrary face image, a function of displaying the retrieved images on the monitoring terminal 3, and the function according to (Configuration Example 1) above, an initial search function that uses a small fixed local area as the face search area and searches for small faces in that area, and a function of registering frames in chronological order, are provided.

(Configuration Example 3) This is a configuration that waits at the initial window in chronological order and then follows the face.
In a similar face image search system that has an imaging device 1, a recording device 2 that records images captured by the imaging device 1, and a monitoring terminal 3, and that has a function of searching the images recorded in the recording device 2 for face images similar to an arbitrary face image, a function of displaying the retrieved images on the monitoring terminal 3, and the function according to (Configuration Example 1) above, the face search in frames after the face detection frame is performed using a face search area and minimum and maximum face search sizes calculated from the motion vector and face size of the face areas already detected in the series of scenes.

(Configuration Example 4) This is a configuration that waits at the initial window in reverse chronological order.
In a similar face image search system that has an imaging device 1, a recording device 2 that records images captured by the imaging device 1, and a monitoring terminal 3, and that has a function of searching the images recorded in the recording device 2 for face images similar to an arbitrary face image, a function of displaying the retrieved images on the monitoring terminal 3, and the function according to (Configuration Example 1), an initial search function that uses a large fixed local area as the face search area and searches for large faces in that area, and a function of registering frames in reverse chronological order, are provided.

(Configuration Example 5) This is a configuration that waits at the initial window in reverse chronological order and ends the face search early within a scene.
In a similar face image search system that has an imaging device 1, a recording device 2 that records images captured by the imaging device 1, and a monitoring terminal 3, and that has a function of searching the images recorded in the recording device 2 for face images similar to an arbitrary face image, a function of displaying the retrieved images on the monitoring terminal 3, and the function according to (Configuration Example 4), a function of selecting the frames in which the face search is performed is provided.

In the similar face image search system of this example (an example of an image processing system), the following correspondences hold in the recording device 2. The initial search means is constituted by the function with which the image feature amount recording unit 14 performs the initial search, in chronological or reverse chronological order, by means of the scene face search unit 18. The follow-up search means is constituted by the function of performing the follow-up search thereafter. The search omitting means is constituted by the function of omitting the search in frames that satisfy a predetermined condition. The image part information registration means is constituted by the function with which the image feature amount recording unit 14 records image feature amount list data (the correspondence between frame numbers and image feature amounts) in memory.

The configurations of the system, apparatus, and the like according to the present invention are not necessarily limited to those described above, and various configurations may be used. The present invention can also be provided as, for example, a method or scheme for executing the processing according to the present invention, a program for realizing such a method or scheme, or a recording medium on which the program is recorded, and can also be provided as various systems and apparatuses.
The fields of application of the present invention are not necessarily limited to those described above, and the present invention can be applied to various fields.
The various processes performed in the system, apparatus, and the like according to the present invention may be controlled, for example, by a processor executing a control program stored in a ROM (Read Only Memory) in hardware resources that include a processor, memory, and the like. Alternatively, each functional means for executing the processing may be configured as an independent hardware circuit.
The present invention can also be understood as a computer-readable recording medium, such as a floppy (registered trademark) disk or a CD (Compact Disc)-ROM, that stores the above control program, or as the program itself. The processing according to the present invention can be carried out by inputting the control program from the recording medium into a computer and having the processor execute it.

FIG. 1 is a diagram showing a configuration example of a similar face image search system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of surveillance camera video.
FIG. 3 is a diagram showing an example of calculating motion vectors.
FIG. 4 is a diagram showing an example of frames for which the face search can be omitted when face images are registered in reverse chronological order.
FIG. 5 is a diagram showing an example of the flow of registration processing in (registration method 1) and (registration method 2).
FIG. 6 is a diagram showing an example of the flow of registration processing in (registration method 3).
FIG. 7 is a diagram showing a configuration example of a similar face image search system.

Explanation of symbols

1: imaging device; 2: recording device; 3: monitoring terminal; 4, 5: network; 11: network transmission/reception unit; 12: video recording unit; 13: video distribution unit; 14: image feature amount recording unit; 15: image feature amount extraction unit; 16: image similarity determination unit; 17: face search unit; 18: scene face search unit; 21: network transmission/reception unit; 22: search request unit; 23: search result display unit; 24: video playback unit; 25: registration request unit; 31: entire area; 32, 33: local area; 41 to 44: centroid coordinates of the face area; 45: estimated centroid coordinates of the face area; 51 to 54: motion vector; 55: follow-up face search area; 61, 62, 68, 69: face search frame in the initial search area; 63, 64: frame without face search; 65 to 67: face search frame in the initial search area or the follow-up face search area

Claims

An image processing system for registering information about a predetermined image portion in images of a plurality of frames, wherein the images of the plurality of frames are processed in chronological order, the system comprising:
initial search means for searching, in the images of a predetermined number of initial frames, for an image portion to be searched within a predetermined local region set in the frame;
follow-up search means for setting, in the image of a frame that follows the initial frames in chronological order, a local region within the frame using information based on the search result obtained in the image of a preceding frame in chronological order, and for searching for the image portion to be searched within that local region; and
image part information registration means for registering information about the image portion for some or all of the image portions found by the initial search means and the follow-up search means.

An image processing system for registering information about a predetermined image portion in images of a plurality of frames, wherein the images of the plurality of frames are processed in reverse chronological order, the system comprising:
initial search means for searching, in the images of a predetermined number of initial frames, for an image portion to be searched within a predetermined local region set in the frame;
follow-up search means for setting, in the image of a frame that follows the initial frames in reverse chronological order, a local region within the frame using information based on the search result obtained in the image of a preceding frame in reverse chronological order, and for searching for the image portion to be searched within that local region; and
image part information registration means for registering information about the image portion for some or all of the image portions found by the initial search means and the follow-up search means.