JP2020190850A

JP2020190850A - Image processing device, imaging device, imaging processing method and program

Info

Publication number: JP2020190850A
Application number: JP2019094785A
Authority: JP
Inventors: 裕山下; Yutaka Yamashita
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-05-20
Filing date: 2019-05-20
Publication date: 2020-11-26

Abstract

To reduce a processing load while maintaining the accuracy of collation processing of an object in an image. SOLUTION: An image processing device includes: a detection unit for detecting an object included in a picked-up image; a collation unit for calculating similarity between the detected object and a prescribed registered object, and determining whether or not the detected object is the prescribed registered object on the basis of the similarity; and an optimum area calculation unit for determining a second area included in a first area in accordance with the similarity calculated by the collation unit to the object detected by the detection unit in the first area in the picked-up image. The image processing device determines whether or not the object detected by the detection unit in the second area is the prescribed registered object. SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing device, an imaging device, an image processing method, and a program.

Conventionally, a technique is known that detects the area of a person in video from a camera, acquires feature amounts such as those of the face from the detected area, collates them against the feature amounts of a registered image, and determines from the collation result whether the person in the video is the same person as the one in the registered image.
When such a technique is applied to a surveillance camera, it is difficult to keep capturing images that always yield stable features of a person, because of the person's direction of travel, speed, and similar factors.

Patent Document 1 discloses a technique that determines whether an area is one in which feature amounts can be acquired stably, and thereby supports improved accuracy.

JP-A-2017-76288

However, although the technique disclosed in Patent Document 1 describes areas in which feature amounts can be acquired stably, it says nothing about the stability of the collation that identifies a person. That is, it includes no processing that stably lowers the false rejection rate (the probability of failing to recognize the genuine person) or the false acceptance rate (the probability of mistaking another person for the genuine person), so collation accuracy may decrease.

Furthermore, in surveillance use, the collation result must be reported before the person has moved far away from the scene. As the number of collations increases, both the processing time and the processing load increase.

The present invention has been made in view of the above problems, and its object is to reduce the processing load while maintaining the accuracy of collation processing of objects in an image.

To solve the above problems, one aspect of the image processing device according to the present invention provides an image processing device comprising: detection means for detecting an object included in a captured image; calculation means for calculating a similarity between the detected object and a predetermined registered object; determination means for determining, based on the similarity calculated by the calculation means, whether the detected object is the predetermined registered object; decision means for deciding a second area included in a first area of the captured image, according to the similarity calculated by the calculation means for an object detected by the detection means in the first area; and control means for controlling the determination means so that it determines whether an object detected by the detection means in the second area is the predetermined registered object.

According to the present invention, the processing load can be reduced while maintaining the accuracy of collation processing of objects in an image.

A diagram showing a configuration example of the image processing system of Embodiment 1.
A block diagram showing an example of the functions of the image processing device of Embodiment 1.
A block diagram showing a hardware configuration example of the image processing device of Embodiment 1.
A flowchart showing an example of the optimum area calculation process.
An explanatory diagram showing an example of a frame image.
A diagram showing an example of metadata information.
A diagram showing an example of optimization levels.
A diagram showing an example of the locus of a person's movement.
A diagram showing an example of changes in the collation score.
A diagram showing an example of the collation score for each area in a frame image.
A flowchart showing an example of face image collation processing.
A diagram showing an example of the optimum area.
A diagram showing an example of the installation of imaging devices in the image processing system of Embodiment 2.
A block diagram showing a configuration example of the image processing system of Embodiment 2.
A diagram showing an example of a frame image.
A diagram showing an example of metadata information.
A diagram showing an example of the optimum area.
A conceptual diagram showing the processing in Embodiment 3.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the accompanying drawings. Each embodiment described below is one example of a means of realizing the present invention and should be modified or changed as appropriate according to the configuration of the device to which the present invention is applied and various conditions; the present invention is not necessarily limited to the following embodiments. Moreover, not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical processing is described using the same reference numerals.

(Embodiment 1)
FIG. 1 shows a configuration example of the image processing system of Embodiment 1.
This image processing system includes an image processing device 1 that performs image processing, an imaging device 2 that performs imaging processing, and a display 4. The image processing device 1 and the imaging device 2 are connected via a network 3. The network 3 is composed of a plurality of routers, switches, cables, and the like that conform to a communication standard such as Ethernet (registered trademark). The network 3 may also be configured by the Internet, a wired LAN (Local Area Network), a wireless LAN (Wireless LAN), a WAN (Wide Area Network), or the like.

The image processing device 1 executes image processing for an authentication process that determines whether the image data captured by the imaging device 2 shows the same object as a predetermined object, such as a person, registered in advance. The image processing device 1 is realized by, for example, a personal computer on which a program implementing the image processing functions described later is installed. The imaging device 2 captures subjects within its imaging range. For example, at predetermined time intervals, the imaging device 2 associates image data based on the captured video with identification information identifying the imaging device 2 (device ID) and information on the time at which the image was captured (frame ID), and transmits them to the image processing device 1 via the network 3.

The display 4 is composed of an LCD (Liquid Crystal Display) or the like. The display 4 is connected to the image processing device 1 via a display cable compliant with a communication standard such as HDMI (registered trademark) (High Definition Multimedia Interface). The display 4 functions as a display means and displays images captured by the imaging device 2, setting screens for the image processing described later, and the like. The display 4 and the image processing device 1 may be provided in a single housing.

(Configuration of image processing device 1)
FIG. 2 is a block diagram showing an example of the functions of the image processing device 1.
FIG. 3 is a block diagram showing a hardware configuration example of the image processing device 1.
The image processing device 1 includes a CPU (Central Processing Unit) 31, a RAM (Random Access Memory) 32, and a ROM (Read Only Memory) 33. It also includes a hard disk device (HDD) 34 that stores programs, data, and the like, and an interface (I/F) 35 that handles input and output with the outside, such as communication. The RAM 32 temporarily stores programs and data. The ROM 33 stores programs and parameters that do not need to be changed. The CPU 31 controls the entire image processing device 1. Using the RAM 32, which is the main memory, as work memory, the CPU 31 executes various programs stored in storage media such as the ROM 33 and the HDD 34. Each function shown in FIG. 2 is realized, for example, by the CPU 31 of the image processing device 1 executing a computer program stored in the ROM 33 of the image processing device 1.

As shown in FIG. 2, the image processing device 1 includes a communication unit 11 that communicates with the imaging device 2 via the network 3, a setting unit 12 that makes settings for the processing of the image processing device 1 as a whole, a display unit 13 that displays images and the like, and a detection unit 14 that detects objects in an image. The following describes an example in which a person is detected as the object. The image processing device 1 also includes a feature amount extraction unit 15 that calculates feature amounts from an image containing the face of a person detected by the detection unit 14, and a collation unit 16 that collates the feature amounts of the face image extracted by the feature amount extraction unit 15 with those of a registered image in the storage unit 20 and calculates a collation score representing the degree of similarity to the same person. The image processing device 1 further includes a metadata generation unit 17 that generates metadata from the face position information obtained by the detection unit 14 and the collation score obtained by the collation unit 16, and an optimum area calculation unit 18 that calculates an optimum area based on the metadata 24 and the collation score. Finally, the image processing device 1 includes an operation unit 19 that accepts user operations and a storage unit 20 that stores data and the like needed for processing.

The communication unit 11 is constituted by the I/F 35 of FIG. 3 and communicates with the imaging device 2 via the network 3. For example, the communication unit 11 receives image data of images captured by the imaging device 2 and transmits control commands for controlling the imaging device 2 to the imaging device 2. The control commands include, for example, a command instructing the imaging device 2 to capture an image. The storage unit 20 is constituted by the RAM 32, the HDD 34, and the like of FIG. 3, and stores information and data related to image processing by the image processing device 1. The operation unit 19 receives information on operations performed by the user via an input device (not shown) such as a keyboard or mouse.

The setting unit 12 makes settings related to information processing by the image processing device 1. For example, based on user operations input via the operation unit 19, the setting unit 12 sets items such as a passage line (a line used to determine passage through a predetermined area) specified by the user. The display unit 13 causes the display 4 to display images captured by the imaging device 2, setting screens for configuring image processing, information indicating the results of image processing, and the like.

The detection unit 14 executes a detection process that detects persons in an image. In this embodiment, the detection unit 14 detects a person in the image by pattern matching or a similar method using collation patterns (a dictionary). The detection result includes face position information indicating where in the image the face image is located. The detection unit 14 may also be given a function for tracking a detected person (a tracking function). Here, the tracking function means that, when the same person that the detection unit 14 detected in the image of a frame one or more frames before the current frame is present in the image of the current frame, the persons in the respective frames are associated with each other. In other words, the tracking function tracks a person across the images of a plurality of frames that are close in time. When the tracking function is provided, the detection unit 14 records the face position of the associated person in each frame image together with the time at which that frame image was acquired, or with the frame ID described above, and stores the result in the storage unit 20 as the locus of the person's movement. That is, for each of the plurality of frame images, the detection unit 14 stores the face positions of the same person in the storage unit 20 in association with one another.
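
The tracking function is described only in terms of its result: face positions of the same person are associated across nearby frames and recorded with the frame ID. The following is a minimal sketch of one possible way to do that, assuming a simple overlap (intersection-over-union) test between face boxes. The association rule, the `min_iou` value, and the names `FaceBox`, `iou`, and `update_tracks` are illustrative assumptions, not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceBox:
    """Face position: top-left corner (x, y) plus width and height, in pixels."""
    x: int
    y: int
    w: int
    h: int


def iou(a: FaceBox, b: FaceBox) -> float:
    """Intersection-over-union of two face boxes (0.0 when they do not overlap)."""
    ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0


def update_tracks(tracks, frame_id, detections, min_iou=0.3):
    """Extend each locus with the best-overlapping detection of this frame.

    tracks maps a hypothetical track ID to a list of (frame_id, FaceBox),
    i.e. the locus of one person's movement. min_iou is an assumed value.
    """
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        best_id, best = None, min_iou
        for tid, path in tracks.items():
            last_frame, last_box = path[-1]
            if last_frame == frame_id:
                continue  # this locus was already extended in the current frame
            overlap = iou(last_box, det)
            if overlap >= best:
                best_id, best = tid, overlap
        if best_id is None:  # no earlier person matches: start a new locus
            best_id = next_id
            next_id += 1
            tracks[best_id] = []
        tracks[best_id].append((frame_id, det))
    return tracks
```

Each list in `tracks` then plays the role of the stored movement locus, with every face position keyed by its frame ID.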

The feature amount extraction unit 15 extracts an image containing the face of the person detected by the detection unit 14 (a face image) and calculates the person's feature amounts. The collation unit 16 collates the feature amounts of the face image extracted by the feature amount extraction unit 15 with those of the registered image stored in the storage unit 20, and calculates a collation score representing the degree of similarity, that is, how likely it is that the two show the same person. When the calculated collation score is equal to or greater than a predetermined threshold, the collation unit 16 determines that the person and the person in the registered image are the same person. The predetermined threshold is stored in the storage unit 20. It is desirable to set the threshold appropriately according to the imaging environment, the registered image, the purpose of use, and so on.

The metadata generation unit 17 generates metadata for the captured image from the face position information obtained by the detection unit 14 and the collation score obtained by the collation unit 16, and stores the metadata in the storage unit 20. When there are a plurality of imaging devices, the metadata is recorded in association with the identification information of the imaging device 2 that captured the image, that is, the device ID.

The optimum area calculation unit 18 calculates an optimum area based on the face position information in the metadata 24 stored in the storage unit 20 and the collation score from the collation unit 16, and stores the calculated area in the storage unit 20 as the optimum area 22. Here, the optimum area is an area, within the imaging range corresponding to the captured image obtained by the imaging device 2, in which the same person is easy to collate. Restricting collation processing to face images inside such an optimum area reduces the processing load. An area in which the collation score for the same person is high is suitable as an optimum area, and an area in which the collation score for the same person is low is unsuitable. As an example of expressing this suitability, optimization levels corresponding to the collation score are defined as shown in FIG. 7. Here, the collation score is a value such that the higher it is, the higher the degree of identity match by which a person is judged to be the same person.

For example, when optimization levels such as those shown in FIG. 7 are set, the optimum area can be calculated from the face position information and the collation scores by using the high-valued optimization levels A and B as the threshold for the optimum area. The optimization levels are set by the setting unit 12 and stored in the storage unit 20 as the optimization level 23. The optimum area calculation unit 18 reads the optimization level 23 from the storage unit 20 and uses it as needed.
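
The relationship between collation scores and optimization levels can be sketched as a simple lookup. The contents of FIG. 7 are not reproduced in this text, so the score boundaries below, the number of levels, and the names `LEVEL_THRESHOLDS`, `optimization_level`, and `in_optimum_range` are purely hypothetical; only the idea that levels A and B qualify an area as optimum comes from the description.

```python
# Hypothetical lower bounds per level, highest first. The real values in FIG. 7
# are not given in this text; these numbers are placeholders only.
LEVEL_THRESHOLDS = [("A", 90), ("B", 80), ("C", 70)]


def optimization_level(score: float) -> str:
    """Map a collation score to an optimization level (assumed scale 0-100)."""
    for level, lower_bound in LEVEL_THRESHOLDS:
        if score >= lower_bound:
            return level
    return "D"  # assumed catch-all level for scores below every bound


def in_optimum_range(score: float, accepted=("A", "B")) -> bool:
    """True when the score falls in a level treated as qualifying for the optimum area."""
    return optimization_level(score) in accepted
```

With these placeholder bounds, a face position whose score is 85 would count toward the optimum area, while one scoring 75 would not.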

The optimum area calculation unit 18 recalculates the optimum area at predetermined intervals, based on the collation scores and face position information calculated in the past, and stores it in the storage unit 20 as the optimum area 22. The optimum area may be calculated separately for each time period, or conditioned on factors such as whether the lighting is on or off. The process of calculating the optimum area is described later.

FIG. 4 is a flowchart showing an example of the optimum area calculation process performed by the image processing device 1. The process shown in FIG. 4 is executed each time image data of a captured image is supplied from the imaging device 2 at the predetermined time interval.
First, in S1, the communication unit 11 acquires the image data of the captured image from the imaging device 2 in units of frame images. An example of a frame image is shown in FIG. 5(a). Each frame image is assumed to carry a frame ID as time information, as well as the device ID of the corresponding imaging device 2.

Next, in S2, the detection unit 14 executes a detection process that detects persons in the frame image acquired by the communication unit 11 in S1. If a person is detected in the frame image in S2, the process proceeds to S3. If no person is detected, the image processing device 1 ends processing of the current frame image and waits for the next frame image. When the next frame image is supplied from the imaging device 2, the image processing device 1 executes the process again from S1, and the communication unit 11 acquires the next frame image.

When detecting a person in a frame image, detection accuracy can be expected to improve if both the collation pattern for a front-facing person and the collation pattern for a side-facing person are used. For example, a collation pattern for matching images of a human body facing front (or back) and a collation pattern for matching images of a person facing sideways can be held in the storage unit 20, and both can be used according to the installation state of the imaging device 2 or the user's specification.

Collation patterns for other angles, such as from an oblique direction or from above, may also be prepared. When detecting a person, it is not always necessary to prepare a collation pattern (dictionary) representing features of the whole body; collation patterns for parts of a person, such as the upper body, lower body, head, face, or legs, may be prepared instead. The process by which the detection unit 14 detects a person only needs to be capable of detecting a person in an image. It is not limited to pattern matching and may be, for example, known moving-object detection or skin-color detection.

Next, in S3, the detection unit 14 detects faces in the frame image. Specifically, the detection unit 14 executes a process that detects the image of the face portion (face image) of the person detected in S2. FIG. 5(b) shows an example of the positions at which face images were detected; in this example, each such position is displayed on the acquired frame image as a face position 41. Each detected face position is assumed to be given an ID that identifies it. If a face is detected in the frame image, the process proceeds to S4. If no face is detected, the image processing device 1 ends processing of the current frame image and waits for the next frame image. When the next frame image is supplied from the imaging device 2, the image processing device 1 executes the process again from S1, and the communication unit 11 acquires the next frame image.
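
The early exits in S2 and S3 can be summarized in a short control-flow sketch. The detectors themselves are passed in as callables because the description leaves their implementation open (pattern matching, moving-object detection, and so on); `process_frame`, `detect_persons`, and `detect_faces` are hypothetical names used only for illustration.

```python
def process_frame(frame, detect_persons, detect_faces):
    """Run S2 and S3 on one frame; return (face_id, face_position) pairs.

    An empty list means processing of this frame ends early and the caller
    should wait for the next frame (back to S1).
    """
    persons = detect_persons(frame)          # S2: person detection
    if not persons:
        return []                            # no person: stop here
    faces = detect_faces(frame, persons)     # S3: face detection for those persons
    if not faces:
        return []                            # no face: stop here
    # Give each detected face position an ID so it can be told apart later.
    return list(enumerate(faces))
```

Only frames that return a non-empty list would go on to feature extraction (S4) and collation (S5).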

In S4, the feature amount extraction unit 15 extracts, from the face image detected in S3, feature amounts, which are information representing the features of the person's face. A person's feature amounts include information such as color, edges, and texture, hair color, the shapes of facial organs, the presence or absence of sunglasses, and the presence or absence of a beard.

In S5, the collation unit 16 performs collation processing on the face image. Specifically, the collation unit 16 compares the feature amounts of the face image extracted by the feature amount extraction unit 15 with the registered image feature amounts 21 stored in the storage unit 20, and calculates a collation score indicating the degree of similarity between the person and the person corresponding to the registered image. A higher collation score indicates a higher similarity. When the collation score calculated by comparing the feature amounts of the face image with those of the registered image exceeds a predetermined threshold, the collation unit 16 determines that the person in the face image and the person in the registered image are the same person. In other words, the collation unit 16 determines whether the person in the face image, which is the detected object, is the person in the registered image, which is the predetermined object.
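
The score-and-threshold decision in S5 can be illustrated as follows. The description does not specify how the collation score is computed, so cosine similarity between feature vectors is used here purely as a stand-in; the threshold value and the names `collation_score` and `is_registered_person` are likewise assumptions.

```python
import math


def collation_score(face_feature, registered_feature) -> float:
    """Cosine similarity between two feature vectors (an assumed score definition)."""
    dot = sum(a * b for a, b in zip(face_feature, registered_feature))
    norm_a = math.sqrt(sum(a * a for a in face_feature))
    norm_b = math.sqrt(sum(b * b for b in registered_feature))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero feature vector as "no similarity"
    return dot / (norm_a * norm_b)


def is_registered_person(face_feature, registered_feature, threshold=0.8) -> bool:
    """Same-person decision: the score must exceed the (assumed) threshold."""
    return collation_score(face_feature, registered_feature) > threshold
```

In practice the threshold would be read from storage and tuned to the imaging environment, the registered image, and the purpose of use, as the description recommends.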

Next, in S6, the metadata generation unit 17 generates the metadata 24 based on the face position information obtained by the detection unit 14 and the collation score obtained by the collation unit 16, and stores the generated metadata in the storage unit 20. The image processing device 1 repeats the processing from S1 to S8 until it receives an end instruction from the user via the operation unit 19. In this way, the image processing device 1 keeps generating metadata 24 for each frame image of the captured images it continuously receives from the imaging device 2, and accumulates the metadata 24 in the storage unit 20. FIG. 6 shows an example of the accumulated metadata 24, in which face positions are stored in association with collation scores in descending order of score. A face position includes, for example, X and Y, which give coordinates in the frame image, and W and H, which give the size of the face position. Here, X is the X coordinate of the reference position (for example, the upper left) of the face area in the frame image, in pixels of the frame image, and Y is the Y coordinate of that reference position. W is the width of the face area in pixels of the frame image, and H is the height of the face area.
The metadata 24 may be accumulated for each person corresponding to a registered image, so that an optimum area is calculated for each specific person as described later. Alternatively, the metadata 24 may be accumulated without being limited to a specific person, so that a single optimum area is calculated regardless of the person. The metadata 24 may also be stored in association with the time at which the corresponding frame image was acquired or with the frame ID described above.
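
One way to picture an entry of the metadata 24 is as a small record holding the collation score and the face position (X, Y, W, H), optionally tagged with frame ID and device ID. The sketch below is an illustration only; the class name `MetadataRecord`, the field names, and the in-memory list used as the store are assumptions, not the layout shown in FIG. 6.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataRecord:
    """One accumulated entry: collation score plus face position in pixels."""
    score: float
    x: int  # X coordinate of the face area's reference position (e.g. upper left)
    y: int  # Y coordinate of the same reference position
    w: int  # width of the face area
    h: int  # height of the face area
    frame_id: Optional[int] = None   # optional: frame the face came from
    device_id: Optional[str] = None  # optional: imaging device that captured it


def add_record(store: List[MetadataRecord], record: MetadataRecord) -> None:
    """Append a record and keep the store in descending order of collation score."""
    store.append(record)
    store.sort(key=lambda r: r.score, reverse=True)
```

Keeping one such store per registered person, or a single shared store, corresponds to the two accumulation options mentioned above.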

Next, in S7, the optimum area calculation unit 18 calculates the optimum area.
The optimum area calculation unit 18 calculates the optimum area from the metadata 24 stored in the storage unit 20, based on the collation score for each face position. When a person (subject) moves across frame images, the collation score changes with the movement. This reflects a change in the similarity between the registered image and the face image obtained from the frame image. The change is caused by the person's moving direction, moving speed, the lighting and similar factors, and by the collation algorithm.

Here, the change in the collation score will be described with reference to FIGS. 8 and 9.
FIG. 8(a) shows an example of the registered image 50. FIG. 8(b) shows an example of the movement locus 55 of the same person as in the registered image 50 when the collation unit 16 tracks that person. The locus 55 is calculated from the face positions of the same person collated in each frame image. In FIG. 8(b), face positions 51, 52, and 53 indicate the face positions used for the face image collation processing.

FIG. 9 shows an example of the change in the collation score along the locus 55 shown in FIG. 8(b).
FIG. 9 plots the collation score of each consecutive frame image against the frame ID of the frame image. Even for the same person, the collation score varies with the face position obtained in each frame image. The collation scores and face positions accumulated as described above are limited to those whose collation score is above the boundary value 61, which represents a predetermined threshold value. The boundary value 61 is determined by the optimization level. For example, it may be set to optimization level B or higher in FIG. 7.

Next, the calculation of the optimum area will be described.
The optimum area calculation unit 18 calculates the optimum area based on the collation scores and face positions accumulated as described above. Specifically, the optimum area may be obtained from the number of times (appearance rate) the collation score exceeded the boundary value (predetermined threshold value) 61 at each face position, or from the average collation score at each face position. FIG. 10 shows an example of the optimum area calculated by the optimum area calculation unit 18. In FIG. 10, darker shading indicates a higher collation score.
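One possible way to turn accumulated face positions and collation scores into an area is sketched below in Python. The grid cell size, the use of the face centre, and the bounding-box step in `optimum_area` are assumptions made for illustration; the description only states that the count of scores above the boundary value, or the average score, may be used. The function names and sample records are hypothetical.

```python
def score_map(records, frame_w, frame_h, cell=40, boundary=50.0, mode="count"):
    """Build a per-cell map from (x, y, w, h, score) records.

    mode="count": number of records above `boundary` whose face centre is in the cell.
    mode="mean":  average score of those records in the cell (0.0 where empty).
    """
    cols = -(-frame_w // cell)  # ceiling division
    rows = -(-frame_h // cell)
    counts = [[0] * cols for _ in range(rows)]
    sums = [[0.0] * cols for _ in range(rows)]
    for x, y, w, h, score in records:
        if score <= boundary:
            continue  # only scores above the boundary value are accumulated
        col = min((x + w // 2) // cell, cols - 1)
        row = min((y + h // 2) // cell, rows - 1)
        counts[row][col] += 1
        sums[row][col] += score
    if mode == "count":
        return counts
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(cols)] for r in range(rows)]


def optimum_area(grid, cell, min_value):
    """Bounding box (x, y, w, h) in pixels of all cells with value >= min_value, or None."""
    hits = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v >= min_value]
    if not hits:
        return None
    r0, r1 = min(r for r, _ in hits), max(r for r, _ in hits)
    c0, c1 = min(c for _, c in hits), max(c for _, c in hits)
    return (c0 * cell, r0 * cell, (c1 - c0 + 1) * cell, (r1 - r0 + 1) * cell)


# Invented sample records: (x, y, w, h, score)
sample = [(100, 100, 40, 40, 70.0), (110, 105, 40, 40, 80.0),
          (300, 60, 40, 40, 45.0), (180, 110, 40, 40, 60.0)]
counts = score_map(sample, 320, 240)
means = score_map(sample, 320, 240, mode="mean")
print(optimum_area(counts, 40, 1), means[3][3])
```

A rectangular bounding box is the simplest choice; the shaded map of FIG. 10 suggests that a non-rectangular area could be kept instead.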

Next, in S8, the optimum area calculation unit 18 stores the optimum area calculated in S7 in the storage unit 20 as the optimum area 22. If an optimum area 22 is already stored in the storage unit 20, it is overwritten with the newly obtained value. As a result, the optimum area is updated at predetermined time intervals, so the stored optimum area 22 follows changes in the environment over time and always reflects conditions close to the time of processing. As described above, the image processing device 1 updates the optimum area by periodically repeating the optimum area calculation processing. Although the above description updates the optimum area at predetermined time intervals, a separate optimum area may instead be prepared for each time slot in which the device is used, and the optimum areas may be switched as appropriate.

FIG. 11 is a flowchart showing the face image collation processing using the optimum area. In the following, steps with the same content as in FIG. 4 are denoted by the same reference numerals, and their description is omitted.
First, in S10, the collation unit 16 acquires the optimum area 22 from the storage unit 20.
Next, in S11, when the communication unit 11 acquires a frame image from the image pickup device 2, the collation unit 16 identifies, from the acquired optimum area and the frame image, the area in the frame image (the optimum area) that is to be subjected to the face image collation processing. FIG. 12 shows the optimum area subjected to the face image collation processing. In the following S12, the detection unit 14 executes the person detection processing on the frame image obtained in S11 only within the optimum area. This reduces the number of times the person detection processing is executed on areas where the collation score is low. If a person is detected in S12, the processing proceeds to S3. The processing from S3 to S5 is the same as in FIG. 4, so its description is omitted. After S5, the processing proceeds to S16.
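Restricting detection to the optimum area in S12 can be sketched as cropping the frame before calling the detector and then mapping the detected boxes back to frame coordinates. This is a minimal Python illustration under stated assumptions: the frame is a list of pixel rows, `detector` is a hypothetical callable, and `toy_detector` is a stand-in that has nothing to do with the actual person detection processing.

```python
def crop(frame, area):
    """Return the part of `frame` (a list of pixel rows) covered by area=(x, y, w, h)."""
    x, y, w, h = area
    return [row[x:x + w] for row in frame[y:y + h]]


def detect_in_area(frame, area, detector):
    """Run `detector` only inside the optimum area and map boxes back to frame coordinates.

    `detector` is a hypothetical callable: image -> list of (x, y, w, h) boxes,
    expressed relative to the image it was given.
    """
    ax, ay, _, _ = area
    return [(x + ax, y + ay, w, h) for x, y, w, h in detector(crop(frame, area))]


def toy_detector(image):
    """Stand-in detector: one box around all non-zero pixels, if there are any."""
    pts = [(c, r) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    if not pts:
        return []
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [(min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)]


frame = [[0] * 8 for _ in range(6)]
frame[3][4] = frame[3][5] = frame[4][4] = frame[4][5] = 1  # a 2x2 "face"
frame[0][0] = 1                                           # clutter outside the area
boxes = detect_in_area(frame, (2, 2, 6, 4), toy_detector)
print(boxes)
```

Because the detector never sees pixels outside the crop, the clutter at the top-left corner costs no detection work, which is the load reduction the step is aiming for.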

Next, in S16, the collation unit 16 makes a determination about the person based on the collation score obtained in S5. Specifically, the collation unit 16 determines whether or not the person is the same as the person corresponding to the registered image, based on the collation score indicating the similarity between the registered image and the face image. Here, depending on the result of this determination, the collation score and the face position information may be stored in the storage unit 20 as the metadata 24, and the optimum area calculation unit 18 may calculate the optimum area and update it. That is, the collation processing of FIG. 11 and the optimum area calculation processing of FIG. 4 may be executed within a single authentication flow. Further, the processing from S12 (face detection) to S16 (determination) may be executed only on the portion of the frame image within the optimum area. This further reduces the processing load.

In the following S17, the image processing device 1 determines whether or not an end instruction has been input by the user via the operation unit 19. If no end instruction has been input, the image processing device 1 ends the processing for the current frame image, returns to S10, and waits for the next frame image to be supplied from the image pickup device 2. When the next frame image is supplied, the image processing device 1 repeats the processing from S10 onward. If an end instruction has been input in S17, the image processing device 1 ends the face image collation processing of FIG. 11.

As described above, the image processing device of the present embodiment limits the number of collations by selecting the area in the frame image to be collated with the registered image according to the optimum area obtained from the collation scores. That is, in the present embodiment, an area (the optimum area) included in the area defined by the frame image (the first area) is first determined based on the collation scores. Then, by using the optimum area, which is smaller in size (area) than the first area, as the target area for collation (the second area), the number of collations is limited compared with the case where the entire frame image is targeted. As a result, the present embodiment can reduce the processing load while maintaining the accuracy of the collation processing of objects in the image. Further, because the collation score and the optimum area to be collated can be set automatically, the installation workload of the image processing system can be reduced.

(Embodiment 2)
In the second embodiment, an example is described in which a residence time, that is, the time a person stays in a facility, is obtained for each person who has entered a facility such as a store or a theme park. The residence time is measured, for example, by collating each individual person between two kinds of images, captured by an entrance camera and an exit camera, and taking the difference between the times at which the collations succeeded. Alternatively, the residence time may be obtained from the difference between the acquisition times of the frame images used when the collations succeeded.

FIG. 13 is a diagram showing an example of the installation of image pickup devices in an image processing system that measures the residence time.
The entrance camera 81 captures persons passing through the entrance gate 82. The entry direction 83 indicates the direction of passage through the entrance. Similarly, the exit camera 86 captures persons passing through the exit gate 87. The exit direction 88 indicates the direction of passage through the exit.

FIG. 14 is a block diagram showing a configuration example of the image processing system of the present embodiment. In addition to the configuration of the image processing system shown in FIG. 1 described above, this image processing system includes one more image pickup device 5. Here, an example is described in which the image pickup device 2 serves as the entrance camera 81 and the image pickup device 5 serves as the exit camera 86. In the present embodiment, the image processing device 1 also includes, in addition to the configuration of FIG. 2 described above, a measurement unit that measures the residence time.

FIG. 15(a) is a diagram showing an example of a frame image from the entrance camera 81, and FIG. 15(b) is a diagram showing an example of a frame image from the exit camera 86.
The feature amount extraction unit 15 of the image processing device 1 associates the feature amount of a face image detected in a frame image from the entrance camera 81 with the face position information and a camera identifier (device ID), and saves them in the storage unit 20 as the metadata 24 and the registered image feature amount 21. The collation unit 16, in turn, collates the feature amount of a face image detected in a frame image from the exit camera 86 against the registered image feature amount 21 registered from the face image detected in the frame image from the entrance camera 81, and calculates a collation score. The collation unit 16 stores the calculated collation score and the face position as the metadata 24. FIG. 16 shows an example of the metadata 24 accumulated over a plurality of frame images. Separately, the collation unit 16 stores in the storage unit 20 the times at which the face image was collated for the entrance and for the exit. The measurement unit obtains the residence time from the difference between these entrance and exit collation times stored in the storage unit 20.
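The residence time calculation described above can be sketched in Python as follows. The function names, the feature vectors, the timestamps, and in particular `toy_similarity` are hypothetical stand-ins; the description does not specify how the collation score is computed, only that the residence time is the difference between the entrance and exit collation times.

```python
from datetime import datetime, timedelta


def toy_similarity(a, b):
    """Stand-in collation score: 100 minus the L1 distance between feature vectors."""
    return 100.0 - sum(abs(p - q) for p, q in zip(a, b))


def residence_time(exit_feature, exit_time, entries, threshold, similarity=toy_similarity):
    """Return exit_time minus the entry time of the best-matching entry record.

    entries: list of (feature, entry_time) pairs registered from the entrance camera.
    Returns None when no entry record scores above `threshold`.
    """
    best_score, best_time = None, None
    for feature, entry_time in entries:
        score = similarity(exit_feature, feature)
        if best_score is None or score > best_score:
            best_score, best_time = score, entry_time
    if best_score is None or best_score <= threshold:
        return None
    return exit_time - best_time


# Invented sample data for illustration only.
entries = [
    ([1.0, 2.0], datetime(2019, 5, 20, 10, 0)),
    ([5.0, 5.0], datetime(2019, 5, 20, 10, 5)),
]
stay = residence_time([5.0, 4.0], datetime(2019, 5, 20, 11, 35), entries, threshold=95.0)
print(stay)
```

Returning `None` for an unmatched exit face keeps a failed collation from producing a spurious residence time.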

The optimum area calculation unit 18 calculates the optimum area based on the collation scores and face positions in the accumulated metadata 24. FIG. 17 shows an example of the calculated optimum areas. FIG. 17(a) is the optimum area for frame images from the entrance camera 81, and FIG. 17(b) is the optimum area for frame images from the exit camera 86. In the face image collation processing for measuring the residence time, a person's face image is extracted from the optimum area of a frame image from the entrance camera 81, and the collation processing is executed using it together with a face image extracted from the optimum area of a frame image from the exit camera 86.

As described above, the image processing device of the present embodiment can reduce the processing load while maintaining the accuracy of the collation processing of objects in the image, even when a plurality of image pickup devices are used to measure the residence time.

(Embodiment 3)
The third embodiment describes an image processing device that can reduce the processing load further than the second embodiment. In the third embodiment, a determination result based on a collation priority order is used for the faces detected within the optimum area, thereby further reducing the processing load.
FIG. 18 is a conceptual diagram showing the processing in the present embodiment. FIG. 18 shows an example of the locus 95 of the same person when the detection unit 14 tracks that person, together with the face images 91, 92, and 93 detected within the optimum area in frame images of different frames.

Here, the collation unit 16 sorts the face images 91, 92, and 93 detected within the optimum area in descending order of the collation score of the area in which each was detected. Suppose, for example, that the resulting order is face image 92, face image 93, face image 91.

As described above, in the face image collation processing, the feature amount extraction (S4), the face image collation (S5), and the determination (S16) shown in FIG. 11 are executed on each face image obtained by face detection. In the present embodiment, this processing is executed on the face images in descending order of the collation score of their areas. In the above example, S4, S5, and S16 are first executed on the face image 92. If the determination (S16) finds that the collation score exceeds a predetermined threshold value, the collation unit 16 determines that the person is the same person. Once the person has been determined to be the same person, no further collation processing is executed for that person. That is, S4, S5, and S16 are not executed on the face images 93 and 91, which are images of the same person. In this way, by preferentially collating the areas within the optimum area in descending order of collation score, the present embodiment further reduces the processing load of the collation processing.
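The priority-ordered collation with early termination can be sketched in Python as below. The function name, the per-area scores in `area_score`, the per-face scores in `frame_score`, and the threshold of 70 are invented for illustration; `collate` stands in for S4 and S5 and records which faces were actually processed.

```python
def collate_by_priority(faces, area_score, collate, threshold):
    """Collate faces in descending order of their area's score; stop at the first match.

    faces:      identifiers of face images of one tracked person
    area_score: dict face -> accumulated collation score of the area it was detected in
    collate:    callable face -> collation score (stands in for S4 and S5)
    Returns (face, score) for the first face above `threshold`, else (None, None).
    """
    ordered = sorted(faces, key=lambda f: area_score[f], reverse=True)
    for face in ordered:
        score = collate(face)
        if score > threshold:   # S16: determined to be the same person
            return face, score  # remaining faces of this person are skipped
    return None, None


# Hypothetical values chosen so that the order becomes 92, 93, 91.
area_score = {91: 40.0, 92: 80.0, 93: 60.0}
frame_score = {91: 50.0, 92: 71.0, 93: 65.0}
calls = []


def collate(face):
    calls.append(face)  # record which faces were actually collated
    return frame_score[face]


match = collate_by_priority([91, 92, 93], area_score, collate, threshold=70.0)
print(match, calls)
```

In this example only face image 92 is collated; with a threshold that no face exceeds, all three would be processed and no match returned.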

Alternatively, in the present embodiment, a predetermined number of face images may be collated and the determination made using the highest collation score among them. For example, when the predetermined number is set to 2, the processing is executed on the face images 92 and 93 according to the sorted order. If, for example, the collation score of the face image 92 is 71 and that of the face image 93 is 65, the collation score 71 is used for the determination.
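The alternative of collating a fixed number of face images and taking the best score can be sketched as follows. The scores 71 and 65 follow the example in the text; the function name, the per-area scores, and the score of 50 for face image 91 are hypothetical.

```python
def best_of_top_n(faces, area_score, collate, n):
    """Collate the n faces from the highest-scoring areas and return the best score.

    Returns None when `faces` is empty or n is zero.
    """
    ordered = sorted(faces, key=lambda f: area_score[f], reverse=True)[:n]
    return max((collate(f) for f in ordered), default=None)


area_score = {91: 40.0, 92: 80.0, 93: 60.0}  # hypothetical per-area scores
frame_score = {91: 50, 92: 71, 93: 65}       # 71 and 65 are the values used in the text

best = best_of_top_n([91, 92, 93], area_score, frame_score.__getitem__, n=2)
print(best)
```

The returned score would then be compared with the determination threshold in the same way as in S16.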

(Embodiment 4)
In the first to third embodiments, the optimum area may be corrected. A case where the optimum area is corrected is described below as the fourth embodiment.
For example, when the angle of view of the frame images supplied from the image pickup device 2 is changed by the user, the optimum area information is reset. When the optimum area information has been reset, the detection unit 14 does not restrict face detection to the optimum area, but performs face detection on the entire frame image or on a predesignated area. This maintains the accuracy of the collation processing even when the angle of view of the image pickup device 2 is changed. After the reset, once the collation scores obtained by the collation unit 16 and the face position metadata 24 generated by the metadata generation unit 17 have been accumulated, the optimum area may be set again and used for the collation processing. Alternatively, the previous optimum area may be stored and restored when the angle of view of the frame images returns to the previous angle of view.
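The reset and restore behaviour on an angle-of-view change can be sketched in Python by keeping one optimum area per angle of view. The class `OptimumAreaStore`, the string keys used to identify an angle of view, and the sample areas are assumptions for illustration; the description does not say how an angle of view is identified.

```python
class OptimumAreaStore:
    """Keeps one optimum area per angle of view (the key is a hypothetical identifier)."""

    def __init__(self):
        self._areas = {}   # angle-of-view key -> (x, y, w, h)
        self._view = None

    def change_view(self, view):
        """Switch angle of view; the area is unset unless one was stored for this view."""
        self._view = view

    def update(self, area):
        """Store the newly calculated optimum area for the current angle of view."""
        self._areas[self._view] = area

    def current(self):
        """Optimum area for the current angle of view, or None after a reset."""
        return self._areas.get(self._view)


def detection_area(areas, full_frame):
    """Area to run face detection on: the optimum area if set, else the whole frame."""
    area = areas.current()
    return full_frame if area is None else area


FULL = (0, 0, 320, 240)
areas = OptimumAreaStore()
areas.change_view("view-A")
areas.update((120, 120, 120, 40))
areas.change_view("view-B")          # angle of view changed: optimum area is reset
after_change = detection_area(areas, FULL)
areas.change_view("view-A")          # back to the earlier angle of view: area restored
after_return = detection_area(areas, FULL)
print(after_change, after_return)
```

Falling back to a predesignated area instead of the whole frame would only require passing that area as `full_frame`.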

As another correction of the optimum area, the optimum area is corrected when, for example, the average collation score obtained from the collation scores accumulated for the optimum area falls below a certain value. Specifically, when the average collation score falls below that value, the optimum area is, for example, enlarged, which widens the range over which collation score and face position metadata are accumulated, and the optimum area is corrected accordingly. This maintains the accuracy of the collation processing.
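Enlarging the optimum area when the average collation score drops can be sketched in Python as below. The fixed pixel margin, the function name, and the sample values are assumptions; the description only says that the range of the optimum area is widened when the average falls below a certain value.

```python
def expand_if_degraded(area, recent_scores, min_average, frame_w, frame_h, margin=20):
    """Grow area=(x, y, w, h) by `margin` pixels per side when the mean score is too low.

    The result is clipped to the frame. The area is returned unchanged when there
    are no scores yet or the mean is at least `min_average`.
    """
    if not recent_scores or sum(recent_scores) / len(recent_scores) >= min_average:
        return area
    x, y, w, h = area
    left, top = max(0, x - margin), max(0, y - margin)
    right = min(frame_w, x + w + margin)
    bottom = min(frame_h, y + h + margin)
    return (left, top, right - left, bottom - top)


# Invented sample values for illustration only.
area = (120, 120, 120, 40)
widened = expand_if_degraded(area, [40.0, 45.0, 50.0], 60.0, 320, 240)
unchanged = expand_if_degraded(area, [70.0, 80.0], 60.0, 320, 240)
print(widened, unchanged)
```

After the expansion, metadata accumulated over the wider range would feed the next optimum area calculation.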

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes can be made within the scope of its gist. For example, although the object to be detected has been described as a person, objects other than persons may be detected.

(Other embodiments)
Although each embodiment has been described in detail above, the present invention can be embodied as, for example, a system, a device, a method, a program, or a recording medium (storage medium). Specifically, it may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, an image pickup device, and a web application), or to an apparatus consisting of a single device.

The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or device via a network or a storage medium, and having one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions. ASIC stands for Application Specific Integrated Circuit.

1 ... image processing device, 2 ... image pickup device, 3 ... network, 4 ... display, 11 ... communication unit, 12 ... setting unit, 13 ... display unit, 14 ... detection unit, 15 ... feature amount extraction unit, 16 ... collation unit, 17 ... metadata generation unit, 18 ... optimum area calculation unit, 19 ... operation unit, 20 ... storage unit

Claims

An image processing device comprising:
a detection means for detecting an object included in a captured image;
a calculation means for calculating the similarity between the detected object and a predetermined registered object;
a determination means for determining whether or not the detected object is the predetermined registered object based on the similarity calculated by the calculation means;
a region determination means for determining a second region included in a first region in the captured image according to the similarity calculated by the calculation means for an object detected by the detection means in the first region; and
a control means for controlling the determination means so as to determine whether or not an object detected by the detection means in the second region is the predetermined registered object.

The image processing device according to claim 1, wherein the calculation means calculates the similarity by obtaining a feature amount of the object detected by the detection means and comparing it with a feature amount of the predetermined registered object.

The image processing device according to claim 1 or 2, wherein the determination means determines that the object detected by the detection means is the registered object when the similarity calculated by the calculation means is higher than a predetermined threshold value.

The image processing device according to any one of claims 1 to 3, wherein the region determination means accumulates the position of the object detected by the detection means and the similarity calculated by the calculation means in association with each other, and determines the second region according to the accumulated positions and similarities.

The image processing device according to any one of claims 1 to 4, wherein
the detection means detects an object included in a captured image of a first imaging range and detects an object included in a captured image of a second imaging range,
the calculation means calculates the similarity between the object detected for the first imaging range and the registered object, and calculates a second similarity between the object detected for the second imaging range and the object detected for the first imaging range, and
the region determination means determines the second region for the first imaging range according to the similarity, and determines the second region for the second imaging range according to the second similarity.

The image processing device according to any one of claims 1 to 5, wherein
the captured image has a plurality of frames,
the detection means detects the object included in the captured image in each of the plurality of frames,
the calculation means stores the position of the same object detected in the plurality of frames in association with the similarity in each frame, and
the control means controls the determination means so as to determine whether or not the object is the predetermined registered object according to the similarity stored in association with the position of the object.

The image processing device according to claim 6, wherein the control means controls the determination means so that, for an object whose similarity stored in association with the position of the object exceeds a predetermined threshold value, no subsequent determination is made as to whether or not the object is the predetermined registered object.

The image processing device according to claim 6, wherein the control means controls the determination means so as to determine whether or not the object is the predetermined registered object in descending order of the similarity stored in association with the position of the object.

The image processing device according to any one of claims 1 to 8, wherein the region determination means resets the second region when the angle of view of the captured image is changed.

The image processing device according to claim 9, wherein the region determination means restores the second region used before the reset when the angle of view of the captured image returns to the angle of view before the reset.

The image processing device according to any one of claims 1 to 10, wherein the region determination means expands the second region when the average of the similarity falls below a predetermined threshold value.

An imaging device comprising:
an imaging means for capturing a subject and outputting a captured image;
a detection means for detecting an object included in the image captured by the imaging means;
a calculation means for calculating the similarity between the detected object and a predetermined registered object;
a determination means for determining whether or not the detected object is the predetermined registered object based on the similarity calculated by the calculation means;
a region determination means for determining a second region included in a first region in the captured image according to the similarity calculated by the calculation means for an object detected by the detection means in the first region; and
a control means for controlling the determination means so as to determine whether or not an object detected by the detection means in the second region is the predetermined registered object.

An image processing method comprising:
a step of detecting an object included in a captured image;
a step of calculating the similarity between the detected object and a predetermined registered object;
a step of determining whether or not the detected object is the predetermined registered object based on the calculated similarity;
a step of determining a second region included in a first region in the captured image according to the similarity calculated for an object detected in the first region; and
a step of performing control so that it is determined whether or not an object detected in the second region is the predetermined registered object.

A program for causing a computer to function as each means of the image processing device according to any one of claims 1 to 11.