JP2008085491A - Image processor, and image processing method thereof

Info

Publication number: JP2008085491A
Application number: JP2006261254A
Authority: JP
Inventors: Satoyuki Shibata; 智行柴田; Tsugumi Yamada; 貢己山田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-09-26
Filing date: 2006-09-26
Publication date: 2008-04-10

Abstract

PROBLEM TO BE SOLVED: To provide a tracking image processor which can shorten the time required for getting an optimal reference image when tracking image processing for object tracking is performed by acquiring a reference image representing the appearance of the object and can evaluate the reference image three-dimensionally.

SOLUTION: Computation time is shortened by adding a semi-tracking mode between a detection mode and a tracking mode, and the probability of getting an optimal reference image increases because many frames can be processed. In addition, a success rate of tracking is enhanced. Furthermore, since the reference image is evaluated three-dimensionally for face direction, a face image with a wider facing range is obtained as a reference face image, thus enhancing the success rate of tracking.

COPYRIGHT: (C)2008,JPO&INPIT

Description

本発明は、カメラから入力された画像を処理して、対象物体の画像領域を検出し、効率的に追跡処理へ移行する画像処理装置及びその方法に関する。 The present invention relates to an image processing apparatus that processes an image input from a camera, detects an image area of a target object, and efficiently shifts to a tracking process, and a method thereof.

In methods that detect a specific region in a video and track it in subsequent frames, such as the one disclosed in Patent Document 1, the detection processing unit generally takes a long time, so not all input frames can be processed, and the optimal image that should serve as the initial value for the tracking processing unit is sometimes missed.

Methods for evaluating the optimal image for tracking or recognition processing, such as those disclosed in Patent Document 2 and Patent Document 3, evaluate images from the two-dimensional arrangement of face parts and the appearance of the face and face parts. However, an image selected from this information alone is not truly optimal, and using it for tracking lowers the success rate.
Japanese Patent No. 3576734; JP 2002-49912 A; JP 2005-22795 A

As described above, the prior art includes a method in which information given as prior knowledge is used as a dictionary, a reference image serving as the initial value is acquired in a detection mode that performs detection using the dictionary, and processing then shifts to a tracking mode that performs tracking using the reference image.

しかし、検出モードの処理時間が長く全てのフレームを処理できないため、最適な基準画像を有するフレームを見逃すことがあった。 However, since the processing time in the detection mode is long and all frames cannot be processed, a frame having an optimal reference image may be missed.

When face images are handled, the reference face image that serves as the initial value in the detection mode has been chosen by evaluating, from the two-dimensional arrangement of face parts and the appearance of the face and face parts, whether the face is facing the camera frontally. However, this evaluation method sometimes acquires a reference face image whose face orientation is not optimal, which lowers the tracking success rate.

Accordingly, an object of the present invention is to provide an image processing apparatus and method that can process more frames and thereby improve the tracking success rate.

The present invention is an image processing apparatus comprising: an image input unit that inputs time-series images; a dictionary storage unit that stores dictionary information about a target object to be detected; a reference storage unit that stores a reference image and a reference similarity; a detection mode execution unit that executes a detection mode in which a first corresponding region of the target object is detected from the input image using the dictionary information, a first similarity between the dictionary information and the image of the first corresponding region is obtained, and the reference image and the reference similarity are updated with the image of the first corresponding region and the first similarity; a semi-tracking mode execution unit that executes a semi-tracking mode in which a second corresponding region is detected from the input image using the reference image, a second similarity between the dictionary information and the image of the second corresponding region is obtained, and, when the second similarity is higher than the reference similarity, the reference image and the reference similarity are updated with the image of the second corresponding region and the second similarity; a tracking mode execution unit that executes a tracking mode in which the target region is detected from the input image using the reference image; and a switching unit that (1) initially selects the detection mode, (2) switches to the semi-tracking mode when the first similarity exceeds a first threshold during the detection mode, (3) switches to the tracking mode when the second similarity exceeds a second threshold, which is higher than the first threshold, during the semi-tracking mode, and (4) continues the semi-tracking mode while the second similarity lies between the first threshold and the second threshold during the semi-tracking mode.

従来法と比べ計算時間が短くなり、より多くのフレームを処理できるため最適な基準画像を獲得する確率が上昇し、それにより追跡成功率が向上する。 Compared to the conventional method, the calculation time is shorter and more frames can be processed, so that the probability of obtaining the optimum reference image increases, thereby improving the tracking success rate.

In addition, when the target object is a face, taking the face orientation into account three-dimensionally when evaluating the reference face image yields an image with a more accurately optimal face orientation, which improves the tracking success rate.

以下、図面を参照しながら本発明の各実施形態について説明する。 Embodiments of the present invention will be described below with reference to the drawings.

(First embodiment)
A tracking image processing apparatus 1 according to a first embodiment of the present invention will be described with reference to FIGS. 1, 2, 8, and 9.

本実施形態の追跡画像処理装置１は、対象物体を写した時系列の画像（動画像、映像）を入力すると、選択されたモードにおける処理方法により画像中の対象物体を検出する。また、検出した対象領域の画像が基準画像に適しているか評価し、その評価値をもとに次のフレームにおけるモードを選択する。すなわち、追跡画像処理装置１は、評価値をもとに、検出モード、準追跡モード、追跡モードの３つのモードの切り替えを行う切り替え部５を持つことが特徴である。 When the tracking image processing apparatus 1 according to the present embodiment receives a time-series image (moving image, video) that captures the target object, the tracking image processing apparatus 1 detects the target object in the image by the processing method in the selected mode. Further, it is evaluated whether the detected image of the target area is suitable for the reference image, and the mode in the next frame is selected based on the evaluation value. That is, the tracking image processing device 1 is characterized by having a switching unit 5 that switches between three modes of a detection mode, a semi-tracking mode, and a tracking mode based on the evaluation value.

(1) Configuration of tracking image processing apparatus 1
FIG. 1 is a block diagram showing the tracking image processing apparatus 1 according to the present embodiment.

追跡画像処理装置１は、画像入力部２、検出部３、基準画像評価部４、切り替え部５、辞書記憶部６、基準画像記憶部７、出力部８から構成されている。 The tracking image processing apparatus 1 includes an image input unit 2, a detection unit 3, a reference image evaluation unit 4, a switching unit 5, a dictionary storage unit 6, a reference image storage unit 7, and an output unit 8.

各部２〜７の下記で説明する各機能は、コンピュータに格納されたプログラムによっても実現できる。 Each function described below of each unit 2 to 7 can also be realized by a program stored in a computer.

(1-2) Image input unit 2
The image input unit 2 receives time-series images obtained by imaging the target object, and sends each input image to the detection unit 3.

(1-3) Dictionary storage unit 6
The dictionary storage unit 6 stores dictionary information, learned in advance, for detecting the target object in its most general state, and sends the dictionary information to the detection unit 3 and the reference image evaluation unit 4 according to the mode.

(1-4) Detection unit 3
The detection unit 3 detects the target region of the target object in the input image from the image input unit 2, and obtains the image of that region as the detected image. The process used to detect the target region depends on the mode selected by the switching unit 5. There are three modes: the detection mode, the semi-tracking mode, and the tracking mode.

In the detection mode, the detected image is obtained using the dictionary information in the dictionary storage unit 6. The similarity between this detected image and the dictionary information is sent to the reference image evaluation unit 4, and the detected image and its position information are output to the output unit 8. Here, the subspace method (see Fukui and Yamaguchi, "Face feature point extraction by combination of shape extraction and pattern matching," IEICE Transactions, August 1997, Vol. J80-D-II, No. 8, pp. 2170-2177) is used to detect the region of the target object in the image from the dictionary information. However, any detection method that can identify the target region in the image using dictionary information may be used.

In the semi-tracking mode and the tracking mode, the target region is detected using the reference image in the reference image storage unit 7. The similarity between this detected image and the reference image information is sent to the reference image evaluation unit 4, and the detected image and its position information are output to the output unit 8. The QTR method (see Osamu Yamaguchi and Kazuhiro Fukui, "Image matching based on qualitative ternary representation," IEICE Technical Report, PRMU2002-34, pp. 23-30, June 2002) is used to detect the region of the target object from the reference image. However, any method that can identify the region in the image most similar to the reference image may be used.

また、検出モードで得られる類似度と、準追跡モードと追跡モードで得られる類似度は異なる方法で獲得しているため、スケールが異なる。そこで後の処理を容易に行うため、スケールが同じになるように各類似度を正規化する。 Further, since the similarity obtained in the detection mode and the similarity obtained in the semi-tracking mode and the tracking mode are acquired by different methods, the scales are different. Therefore, in order to facilitate the subsequent processing, each similarity is normalized so that the scales are the same.
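The text does not say how the two similarity scales are aligned. As one possible illustration only, the sketch below rescales each raw score linearly to the range 0 to 1. The function name `normalize_similarity` and the example score ranges are assumptions made for this sketch and do not come from the source.

```python
def normalize_similarity(raw, lo, hi):
    """Linearly map a raw score from [lo, hi] to [0, 1], clamping outliers.

    lo and hi are the assumed minimum and maximum raw scores that a given
    matching method can produce; they are hypothetical calibration values.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (raw - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


# Hypothetical raw ranges for the two kinds of matching.
DICTIONARY_RANGE = (0.0, 1.0)    # dictionary-based similarity (assumed)
REFERENCE_RANGE = (0.0, 255.0)   # reference-image similarity (assumed)

dictionary_score = normalize_similarity(0.5, *DICTIONARY_RANGE)
reference_score = normalize_similarity(127.5, *REFERENCE_RANGE)
```

After this rescaling, both scores can be compared against the same thresholds, which is the stated purpose of the normalization.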

(1-5) Reference image evaluation unit 4
The reference image evaluation unit 4 obtains an evaluation value from the detected image and position information supplied by the detection unit 3, and uses this evaluation value to decide whether to update the reference image.

この評価値は、次のように求める。まず、辞書記憶部６の辞書情報と検出画像との類似度を第１の評価値とする。また、対象物体のカメラに対する向きを対象物体の特定部位の幾何学的配置により評価した第２の評価値を求める。そして、第１の評価値と第２の評価値を合成して最終的な評価値を求める。なお、第２の評価値は、対象物体が特定の方向に向いている場合に高くなるようにする。例えば、対象物体が顔の場合には、正面を向いているほど第２の評価値を高くする。 This evaluation value is obtained as follows. First, the similarity between the dictionary information in the dictionary storage unit 6 and the detected image is set as a first evaluation value. Further, a second evaluation value obtained by evaluating the orientation of the target object with respect to the camera by the geometrical arrangement of the specific part of the target object is obtained. Then, the first evaluation value and the second evaluation value are synthesized to obtain a final evaluation value. Note that the second evaluation value is set to be high when the target object is oriented in a specific direction. For example, when the target object is a face, the second evaluation value is increased as it faces the front.

この第１の評価値を求める方法は、２枚の画像がどれだけ似ているかを数値化できる方法であればよい。しかし、検出部３と同様に、切り替え部５により切り替えられるモードにより異なる処理を行う。 The method for obtaining the first evaluation value may be any method that can quantify how similar the two images are. However, similar to the detection unit 3, different processing is performed depending on the mode switched by the switching unit 5.

検出モードと準追跡モードでは、辞書記憶部６の辞書情報を用いて基準画像としての評価値を求める。追跡モードでは、評価値は求めない。 In the detection mode and the semi-tracking mode, an evaluation value as a reference image is obtained using dictionary information in the dictionary storage unit 6. The evaluation value is not obtained in the tracking mode.

求めた評価値は切り替え部５に送る。また、この求めた評価値が、現在の基準画像の評価値より高い場合には、その検出画像とその位置情報を新たな基準画像として、基準画像記憶部７に送る。 The obtained evaluation value is sent to the switching unit 5. If the obtained evaluation value is higher than the evaluation value of the current reference image, the detected image and its position information are sent to the reference image storage unit 7 as a new reference image.
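The evaluation and update rule of this section can be sketched as follows. This is a minimal illustration, not the patented computation: the orientation score based on the left-right symmetry of three landmark x-coordinates, the plain sum used to combine the two values, and all function names are assumptions chosen for the sketch.

```python
def frontal_score(left_eye_x, right_eye_x, nose_x):
    """Toy second evaluation value: 1.0 when the nose lies midway between
    the eyes (frontal), falling toward 0.0 as the arrangement becomes
    asymmetric. A hypothetical stand-in for the geometric evaluation."""
    d_left = nose_x - left_eye_x
    d_right = right_eye_x - nose_x
    if d_left <= 0 or d_right <= 0:
        return 0.0
    return min(d_left, d_right) / max(d_left, d_right)


def evaluation_value(similarity, orientation_score):
    """Combine the first and second evaluation values (here, a plain sum)."""
    return similarity + orientation_score


def maybe_update_reference(reference, candidate_image, candidate_value):
    """Return the new (image, value) reference pair.

    The candidate replaces the stored reference only when its evaluation
    value is higher; reference is None before any image has been stored.
    """
    if reference is None or candidate_value > reference[1]:
        return (candidate_image, candidate_value)
    return reference
```

For example, `maybe_update_reference(None, "frame_a", 0.75)` stores the first candidate, and a later candidate with a lower value leaves the stored pair unchanged.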

(1-6) Switching unit 5
The switching unit 5 switches the mode according to the evaluation value from the reference image evaluation unit 4. The initial mode is the detection mode. When the evaluation value exceeds the first mode switching threshold S1, the mode is switched to the semi-tracking mode, and when the evaluation value exceeds the second mode switching threshold S2 (where S1 < S2), the mode is switched to the tracking mode. In any mode, if detection of the target object fails for a certain period, the mode returns to the detection mode.
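The switching rule can be sketched as a small state machine. The class and parameter names (`ModeSwitcher`, `max_misses`) are hypothetical, and two points are assumptions of this sketch rather than statements of the source: the "certain period" of failed detection is counted in consecutive frames, and an evaluation value that drops below S1 during the semi-tracking mode leaves the mode unchanged.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    DETECTION = "detection"
    SEMI_TRACKING = "semi-tracking"
    TRACKING = "tracking"


class ModeSwitcher:
    def __init__(self, s1: float, s2: float, max_misses: int):
        if not s1 < s2:
            raise ValueError("S1 must be lower than S2")
        self.s1 = s1
        self.s2 = s2
        self.max_misses = max_misses  # assumed length of the failure period
        self.mode = Mode.DETECTION    # the initial mode is detection
        self.misses = 0

    def update(self, detected: bool, evaluation: Optional[float] = None) -> Mode:
        """Advance the mode for one frame and return the mode for the next."""
        if not detected:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.mode = Mode.DETECTION
                self.misses = 0
            return self.mode
        self.misses = 0
        if self.mode is Mode.TRACKING or evaluation is None:
            return self.mode  # no evaluation is made in the tracking mode
        if self.mode is Mode.DETECTION and evaluation > self.s1:
            self.mode = Mode.SEMI_TRACKING
        elif self.mode is Mode.SEMI_TRACKING and evaluation > self.s2:
            self.mode = Mode.TRACKING
        return self.mode
```

With S1 = 0.5 and S2 = 0.8, evaluation values of 0.6 and then 0.9 move the switcher from detection to semi-tracking and then to tracking, and `max_misses` consecutive failed frames return it to detection.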

(1-7) Reference image storage unit 7
The reference image storage unit 7 records the detected image and its position information sent from the reference image evaluation unit 4 as reference image information, and sends this reference image information to the detection unit 3 and the reference image evaluation unit 4 according to the mode.

(1-8) Output unit 8
The output unit 8 outputs the detected image and its position information sent from the detection unit 3 to the outside of the apparatus.

(2) Contents of the modes
In the detection mode, the detection unit 3 detects the image of the target region (= detected image) in the image using the dictionary information in the dictionary storage unit 6. Next, the reference image evaluation unit 4 uses the dictionary information in the dictionary storage unit 6 to evaluate the optimality of the detected image as a reference image.

準追跡モードは、検出部３において、基準画像記憶部７の基準画像情報を用いて対象領域（＝検出画像）を検出する。次に、基準画像評価部４において、辞書記憶部６の辞書情報を用いて、検出画像に対して基準画像としての最適性の評価を行う。 In the semi-tracking mode, the detection unit 3 detects the target area (= detected image) using the reference image information in the reference image storage unit 7. Next, the reference image evaluation unit 4 uses the dictionary information in the dictionary storage unit 6 to evaluate the optimality of the detected image as a reference image.

追跡モードは、検出部３において、基準画像記憶部７の基準画像情報を用いて対象領域を検出する。なお、基準画像評価部４において評価は行わない。 In the tracking mode, the detection unit 3 detects the target area using the reference image information in the reference image storage unit 7. The reference image evaluation unit 4 does not perform evaluation.

(3) Tracking image processing procedure
FIG. 2 is a flowchart showing an example of the tracking image processing procedure according to the present embodiment. The processing procedure is as follows.

最初に、ステップ１では、入力された時系列の画像の１フレームを画像入力部２に入力する。既に入力した画像がある場合は、その次の１フレーム分の画像を入力する。 First, in step 1, one frame of the input time-series image is input to the image input unit 2. If there is an image that has already been input, an image for the next one frame is input.

Next, in step 2, it is determined whether the current mode is the detection mode, or is the semi-tracking mode or the tracking mode. Initially, the mode is the detection mode.

When step 2 determines that the mode is the detection mode, in step 3 the detection unit 3 detects the target region using the dictionary information in the dictionary storage unit 6, and sends the similarity between the detected image and the dictionary information (= first evaluation value) to the reference image evaluation unit 4.

When step 2 determines that the mode is the semi-tracking mode or the tracking mode, in step 4 the detection unit 3 detects the target region using the reference image in the reference image storage unit 7, and sends the similarity between the detected image and the reference image (= first evaluation value) to the reference image evaluation unit 4.

Next, in step 5, it is determined whether the similarity (= first evaluation value) received from step 3 or step 4 satisfies similarity >= threshold 1. If it does, the process proceeds to step 6; if not, the process proceeds to step 11.

Next, in step 6, it is determined whether the mode is the detection mode or the semi-tracking mode, or is the tracking mode. In the detection mode or the semi-tracking mode, the process proceeds to step 7; in the tracking mode, the process proceeds to step 10.

Next, in step 7, since the mode is the detection mode or the semi-tracking mode, the reference image evaluation unit 4 uses the dictionary information to obtain the evaluation value of the detected image (= first evaluation value + second evaluation value), and the process proceeds to step 8.

Next, in step 8, it is determined whether evaluation value >= threshold 2 holds. If it does, the process proceeds to step 9; if not, the process proceeds to step 10.

次に、ステップ９において、切り替え部５は、前記評価値をもとにモードを切り替える。また、評価値が現在の基準画像の評価値より高い場合は、検出画像を基準画像として基準画像記憶部７に記録し、ステップ１０に進む。 Next, in step 9, the switching unit 5 switches the mode based on the evaluation value. If the evaluation value is higher than the evaluation value of the current reference image, the detected image is recorded as a reference image in the reference image storage unit 7, and the process proceeds to step 10.

次に、ステップ１０では、検出画像と位置情報を出力部８で装置外部に出力し、ステップ１１に進む。 Next, in step 10, the detected image and position information are output to the outside of the apparatus by the output unit 8, and the process proceeds to step 11.

次に、ステップ１１では、処理を終了するかどうか判断し、終了しない場合はステップ１に戻り、終了する場合は追跡画像処理を終了する。 Next, in step 11, it is determined whether or not to end the process. If not, the process returns to step 1, and if it ends, the tracking image process is ended.
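Steps 2 to 10 of the procedure above can be sketched as a single per-frame function. This is only an illustration under stated assumptions: the detectors, the evaluator, and the mode chooser are passed in as callables with hypothetical signatures, the state is a plain dictionary, and the frame input and termination check of steps 1 and 11 are left to the caller.

```python
def process_frame(frame, state, detect_dict, detect_ref, evaluate,
                  choose_mode, threshold1, threshold2):
    """Run steps 2-10 for one frame.

    state holds "mode", "reference" and "reference_value".
    Returns the detected region to output, or None when nothing is output.
    """
    # Steps 2-4: choose the detector according to the current mode.
    if state["mode"] == "detection":
        region, similarity = detect_dict(frame)
    else:
        region, similarity = detect_ref(frame, state["reference"])

    # Step 5: skip output when the similarity is below threshold 1.
    if similarity < threshold1:
        return None

    # Step 6: the tracking mode goes straight to output.
    if state["mode"] != "tracking":
        value = evaluate(region, similarity)        # step 7
        if value >= threshold2:                     # step 8
            # Step 9: update the reference if improved, then switch mode.
            if value > state["reference_value"]:
                state["reference"] = region
                state["reference_value"] = value
            state["mode"] = choose_mode(state["mode"], value)

    return region  # step 10: passed to the output unit


initial_state = {"mode": "detection", "reference": None,
                 "reference_value": float("-inf")}
```

Passing the mode chooser as a callable keeps the sketch independent of how the switching thresholds S1 and S2 relate to thresholds 1 and 2 of the flowchart, which the text does not spell out.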

(4) Effects
The effects of introducing the semi-tracking mode are described below.

図８に、本実施形態と従来技術の計算処理量の変化を示す。 FIG. 8 shows changes in the amount of calculation processing between the present embodiment and the prior art.

時刻ｔ１で本実施形態は準追跡モードに移行し、時刻ｔ２で追跡モードに移行するのに対し、従来技術は時刻ｔ３で追跡モードに移行する。 The present embodiment shifts to the quasi-tracking mode at time t1 and shifts to the tracking mode at time t2, whereas the conventional technology shifts to the tracking mode at time t3.

一般的に、検出モードは追跡モードと比べ計算処理量が多く計算時間が遅い。しかし、追跡モードでは基準画像の獲得を行わないため追跡成功率が向上することはないため、基準画像の評価を行う検出モードは重要である。 In general, the detection mode requires a large amount of calculation processing and the calculation time is slower than the tracking mode. However, in the tracking mode, since the reference image is not acquired and the tracking success rate is not improved, the detection mode for evaluating the reference image is important.

そこで、追跡モードと同様の検出方法を用いることで、追跡モードと比べ評価部分の計算時間だけが遅い準追跡モードを導入することにより、時刻ｔ１から時刻ｔ２までの間は従来技術より本実施形態の方がフレームレートが高くなり、より多くの画像を処理できる。 Therefore, by using a detection method similar to that in the tracking mode, a quasi-tracking mode in which only the calculation time of the evaluation portion is slower than in the tracking mode is introduced. The frame rate becomes higher, and more images can be processed.

次に、図９に最適な基準画像を獲得するまでの本実施形態と従来技術の動作内容を示す。なお、図中の矢印の長さは計算時間である。 Next, FIG. 9 shows the operation contents of this embodiment and the prior art until an optimum reference image is acquired. In addition, the length of the arrow in a figure is calculation time.

本実施形態では、時刻ｔ１で最適に近い基準画像を獲得し、計算時間の速い準追跡モードに移行し、時刻ｔ２で最適な基準画像を獲得し、追跡モードに移行する。 In the present embodiment, a reference image that is close to optimum is acquired at time t1, and the mode shifts to the quasi-tracking mode with a fast calculation time. The optimal reference image is acquired at time t2 and the mode is shifted to the tracking mode.

In the prior art, by contrast, the detection mode is slow, so the reference image cannot be acquired until time t3 and the shift to the tracking mode is delayed until then. Moreover, it is desirable to output the face orientation in as many frames as possible; the prior art produces only 3 outputs by time t3, whereas the present embodiment produces 13, which shows its effectiveness.

The difference between the detection mode and the semi-tracking mode is as follows. The detection mode uses dictionary information for detection, so it can detect target objects in general. The semi-tracking mode uses the reference image for detection, so it can detect only one specific target object.

追跡モードが必要な理由は、検出モードと準追跡モードで評価しているのは追跡処理における初期値なので、最適な初期値を獲得した時点で基準画像の更新は必要ないためである。追跡モードでは、基準画像は一定であるが、例えば追跡処理に用いるテンプレートの更新などを行うことで、追跡成功率は向上する。 The reason why the tracking mode is necessary is that since the initial value in the tracking process is evaluated in the detection mode and the semi-tracking mode, it is not necessary to update the reference image when the optimum initial value is obtained. In the tracking mode, the reference image is constant, but the tracking success rate is improved by, for example, updating a template used for the tracking process.

(5) Modification
In the above embodiment, the evaluation value is obtained as the first evaluation value + the second evaluation value. Instead, only the first evaluation value (similarity) may be used.

(Second embodiment)
A tracking image processing apparatus 101 according to a second embodiment of the present invention will be described with reference to FIGS. 3 and 4.

追跡画像処理装置１０１は、準追跡モードにおいて、パラメタ記憶部１０９に記録してあるパラメタをもとに、基準画像評価部１０４の処理量を調節する。 The tracking image processing apparatus 101 adjusts the processing amount of the reference image evaluation unit 104 based on the parameters recorded in the parameter storage unit 109 in the semi-tracking mode.

(1) Configuration of tracking image processing apparatus 101
FIG. 3 is a configuration diagram showing the tracking image processing apparatus 101 according to the present embodiment.

追跡画像処理装置１０１は、画像入力部１０２、検出部１０３、基準画像評価部１０４、切り替え部１０５、辞書記憶部１０６、基準画像記憶部１０７、出力部１０８、パラメタ記憶部１０９から構成されている。 The tracking image processing apparatus 101 includes an image input unit 102, a detection unit 103, a reference image evaluation unit 104, a switching unit 105, a dictionary storage unit 106, a reference image storage unit 107, an output unit 108, and a parameter storage unit 109. .

なお、追跡画像処理装置１０１の動作のうち、上記実施形態の追跡画像処理装置１と同様な処理については説明を省略する。 Of the operation of the tracking image processing apparatus 101, the description of the same processing as that of the tracking image processing apparatus 1 of the above embodiment will be omitted.

The reference image evaluation unit 104 obtains an evaluation value from the detected image and position information supplied by the detection unit 103, and evaluates the optimality of this detected image as a reference image.

この評価値は、次のように求める。まず、辞書記憶部１０６の辞書画像と検出画像との類似度を第１の評価値とする。また、対象物体のカメラに対する向きを対象物体の特定部位の幾何学的配置により評価した第２の評価値を求める。そして、第１の評価値と第２の評価値とを合成して最終的な評価値を求める。 This evaluation value is obtained as follows. First, the similarity between the dictionary image in the dictionary storage unit 106 and the detected image is set as the first evaluation value. Further, a second evaluation value obtained by evaluating the orientation of the target object with respect to the camera by the geometrical arrangement of the specific part of the target object is obtained. Then, the first evaluation value and the second evaluation value are synthesized to obtain a final evaluation value.

この第１の評価値（＝類似度）を求める方法は、２枚の画像がどれだけ似ているかを数値化でき、かつ、計算コストを調整できる方法であればよい。ここで計算コストは、計算の処理量と頻度を示す。例えば評価の計算の処理量を調整するには、部分空間法の場合、用いる空間の次元数を変化させればよい。また評価を行う頻度を調整するには、毎フレーム処理を行うのではなく一定間隔で処理を行えばよい。切り替え部１０５により切り替えられるモードにより異なる処理を行う。 The method for obtaining the first evaluation value (= similarity) may be any method that can quantify how similar two images are and can adjust the calculation cost. Here, the calculation cost indicates the calculation processing amount and frequency. For example, in order to adjust the processing amount of the calculation for evaluation, in the case of the subspace method, the number of dimensions of the space to be used may be changed. Further, in order to adjust the frequency of performing the evaluation, the processing may be performed at regular intervals instead of performing the processing every frame. Different processing is performed depending on the mode switched by the switching unit 105.

検出モードでは、辞書記憶部１０６の辞書情報を用いて基準画像としての最適性の評価を行う。 In the detection mode, optimality as a reference image is evaluated using dictionary information in the dictionary storage unit 106.

In the semi-tracking mode, the amount of calculation is reduced on the basis of the parameter in the parameter storage unit 109, either by reducing the number of dimensions of the subspace as the parameter approaches the threshold or, likewise, by reducing the frequency of evaluation as the parameter approaches the threshold. The optimality as a reference image is then evaluated with this reduced processing. The evaluation value acquired at this time is recorded in the parameter storage unit 109 as the next parameter.

追跡モードでは、評価を行わない。 No evaluation is performed in the tracking mode.

As the evaluation value increases, the processing in the semi-tracking mode approaches that of the tracking mode.

求めた評価値は切り替え部１０５に送る。また、この求めた評価値が、現在の基準画像の評価値より高い場合には、その検出画像とその位置情報を新たな基準画像として、基準画像記憶部１０７に送る。 The obtained evaluation value is sent to the switching unit 105. If the obtained evaluation value is higher than the evaluation value of the current reference image, the detected image and its position information are sent to the reference image storage unit 107 as a new reference image.

パラメタ記憶部１０９は、前回処理で基準画像評価部から得られた評価値をパラメタとして記憶する。すなわち、パラメタ記憶部１０９は、基準画像評価部１０４からパラメタを受け取り記録し、要請に応じて基準画像評価部１０４に前回のパラメタを送る。 The parameter storage unit 109 stores the evaluation value obtained from the reference image evaluation unit in the previous process as a parameter. That is, the parameter storage unit 109 receives and records the parameter from the reference image evaluation unit 104, and sends the previous parameter to the reference image evaluation unit 104 in response to a request.
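The cost adjustment in the semi-tracking mode can be sketched as below. The text only states that the number of subspace dimensions or the evaluation frequency decreases as the stored parameter approaches the threshold. The linear schedule, the choice of S1 and S2 as the endpoints, and the function names are all assumptions made for this illustration.

```python
def _progress(parameter, s1, s2):
    """Fraction of the way from S1 to S2, clamped to [0, 1]."""
    if s2 <= s1:
        raise ValueError("S2 must be greater than S1")
    return min(1.0, max(0.0, (parameter - s1) / (s2 - s1)))


def subspace_dims(parameter, s1, s2, max_dims, min_dims=1):
    """Use fewer subspace dimensions as the previous evaluation value
    (the stored parameter) gets closer to the upper threshold."""
    t = _progress(parameter, s1, s2)
    return max(min_dims, round(max_dims - t * (max_dims - min_dims)))


def evaluation_interval(parameter, s1, s2, max_interval):
    """Evaluate every N-th frame, with N growing from 1 to max_interval
    as the stored parameter approaches the upper threshold."""
    t = _progress(parameter, s1, s2)
    return 1 + round(t * (max_interval - 1))
```

With S1 = 0.0, S2 = 1.0, and a maximum of 21 dimensions, a stored parameter of 0.5 gives 11 dimensions, and with a maximum interval of 5 frames the same parameter gives evaluation on every third frame.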

(2) Tracking image processing procedure
FIG. 4 is a flowchart showing an example of the tracking image processing procedure according to the present embodiment. The processing procedure is as follows. Description of processing similar to the tracking image processing procedure described with reference to FIG. 2 is omitted.

In step 106, it is determined whether the mode is the detection mode, the semi-tracking mode, or the tracking mode. In the detection mode, the process proceeds to step 107; in the semi-tracking mode, the process proceeds to step 112; in the tracking mode, the process proceeds to step 110.

次に、ステップ１０７において、検出モードと判断されているので、辞書記憶部１０６の辞書情報を用いて検出画像と位置情報により評価値を求め、ステップ１０８に進む。 Next, since the detection mode is determined in step 107, an evaluation value is obtained from the detected image and position information using the dictionary information in the dictionary storage unit 106, and the process proceeds to step 108.

次に、ステップ１１２において、パラメタ記憶部１０９のパラメタを受け取る。 Next, in step 112, parameters in the parameter storage unit 109 are received.

次に、ステップ１１３において、辞書記憶部１０６の辞書情報を用いて検出画像と位置情報により評価値を求める。このときに受け取ったパラメタをもとに計算量を変化させる。 Next, in step 113, an evaluation value is obtained from the detected image and the position information using the dictionary information in the dictionary storage unit 106. The amount of calculation is changed based on the parameters received at this time.

次に、ステップ１１４において、その評価値を次回のパラメタとし、パラメタ記憶部１０９に記録し、ステップ１０８に進む。 Next, in step 114, the evaluation value is set as the next parameter, recorded in the parameter storage unit 109, and the process proceeds to step 108.
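Steps 112 to 114 can be pictured with the following Python sketch. The text only states that the amount of calculation is changed according to the stored parameter; the specific rule used here (comparing against fewer dictionary entries as the previous evaluation value rises), the assumption that values lie in [0, 1], and the names `ParameterStore`, `entries_to_use`, and `evaluate_semi_tracking` are all hypothetical.

```python
from typing import Callable, Optional, Sequence


class ParameterStore:
    """Stand-in for the parameter storage unit 109 (keeps the previous value)."""

    def __init__(self) -> None:
        self._previous: Optional[float] = None

    def load(self) -> Optional[float]:
        return self._previous

    def save(self, value: float) -> None:
        self._previous = value


def entries_to_use(previous: Optional[float], total: int) -> int:
    """Hypothetical rule: the higher the previous value, the fewer entries compared."""
    if previous is None:
        return total
    clamped = min(max(previous, 0.0), 1.0)
    return max(1, int(total * (1.0 - clamped)))


def evaluate_semi_tracking(
    patch: float,
    dictionary: Sequence[float],
    similarity: Callable[[float, float], float],
    store: ParameterStore,
) -> float:
    previous = store.load()                             # step 112: receive parameter
    count = entries_to_use(previous, len(dictionary))   # vary the amount of calculation
    value = max(similarity(patch, e) for e in dictionary[:count])  # step 113
    store.save(value)                                   # step 114: record for next time
    return value
```

The patch and dictionary entries are plain numbers here only to keep the sketch self-contained; in practice they would be image data.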

（第３の実施形態） (Third embodiment)
本発明の第３の実施形態に係わる顔向き推定装置２０１について図５から図７に基づいて説明する。本実施形態に係る顔向き推定装置２０１は、上記実施形態の追跡画像処理装置１において対象物体を人間の顔としたものである。そのため、顔向き推定装置２０１の動作のうち、上述した追跡画像処理装置１と同様な処理については説明を省略する。 A face orientation estimation apparatus 201 according to the third embodiment of the present invention will be described with reference to FIGS. 5 to 7. The face orientation estimation apparatus 201 according to this embodiment is the tracking image processing apparatus 1 of the above embodiment with a human face as the target object. Therefore, among the operations of the face orientation estimation apparatus 201, processing that is the same as in the tracking image processing apparatus 1 described above is not described again.

（１）顔向き推定装置２０１の構成 (1) Configuration of Face Orientation Estimation Apparatus 201
図５は、本実施形態に係る顔向き推定装置２０１を示す構成図である。 FIG. 5 is a configuration diagram showing the face orientation estimation apparatus 201 according to this embodiment.

顔向き推定装置２０１は、画像入力部２０２、検出部２０３、基準画像評価部２０４、切り替え部２０５、辞書記憶部２０６、基準画像記憶部２０７、出力部２０８、顔検出部２１０、特徴点検出部２１１、類似度評価部２１２、顔向き評価部２１３、顔向き推定部２１４、一般顔形状記憶部２１５から構成されている。 The face orientation estimation apparatus 201 includes an image input unit 202, a detection unit 203, a reference image evaluation unit 204, a switching unit 205, a dictionary storage unit 206, a reference image storage unit 207, an output unit 208, a face detection unit 210, a feature point detection unit 211, a similarity evaluation unit 212, a face orientation evaluation unit 213, a face orientation estimation unit 214, and a general face shape storage unit 215.

（１−１）検出部２０３ (1-1) Detection unit 203
検出部２０３は、顔検出部２１０と特徴点検出部２１１とを有する。 The detection unit 203 includes a face detection unit 210 and a feature point detection unit 211.

顔検出部２１０は、画像入力部２０２からの入力画像中から顔領域を検出する。 The face detection unit 210 detects a face area from the input image from the image input unit 202.

特徴点検出部２１１は、前記顔領域における顔特徴点の座標を検出する。 The feature point detection unit 211 detects the coordinates of the face feature points in the face area.

そして、検出部２０３は、顔領域の画像と特徴点の座標の近傍画像を獲得する。しかし、切り替え部２０５により切り替えられるモードにより検出する処理が異なる。 The detection unit 203 then acquires the image of the face area and the neighborhood images around the feature point coordinates. The detection processing, however, differs depending on the mode selected by the switching unit 205.

検出モードでは、辞書記憶部２０６の辞書情報を用いて、画像中の顔領域と顔特徴点の座標を検出する。 In the detection mode, the coordinates of the face area and the face feature point in the image are detected using the dictionary information in the dictionary storage unit 206.

準追跡モードと追跡モードでは、基準画像記憶部２０７の基準顔画像情報を用いて、顔領域と顔特徴点の座標を検出する。検出した顔画像とその位置情報と、特徴点の座標とその近傍画像を基準画像評価部２０４に送り、顔向き推定部２１４に顔の位置情報と特徴点の座標を送る。 In the semi-tracking mode and the tracking mode, the face area and the coordinates of the face feature points are detected using the reference face image information in the reference image storage unit 207. The detected face image and its position information, together with the feature point coordinates and their neighborhood images, are sent to the reference image evaluation unit 204, and the face position information and the feature point coordinates are sent to the face orientation estimation unit 214.

（１−１−１）顔検出部２１０ (1-1-1) Face detection unit 210
顔検出部２１０は、入力画像中から顔領域を検出し、その顔画像を得る。このとき、切り替え部２０５により切り替えられるモードにより領域を検出する処理が異なる。 The face detection unit 210 detects a face area in the input image and obtains the face image. The area detection processing differs depending on the mode selected by the switching unit 205.

検出モードでは、辞書記憶部２０６の辞書情報を用いて入力画像中の顔領域を検出する。 In the detection mode, the face area in the input image is detected using the dictionary information in the dictionary storage unit 206.

準追跡モードと追跡モードでは、基準画像記憶部２０７の基準顔画像情報を用いて顔領域を検出する。 In the semi-tracking mode and the tracking mode, the face area is detected using the reference face image information in the reference image storage unit 207.

（１−１−２）特徴点検出部２１１ (1-1-2) Feature point detection unit 211
特徴点検出部２１１は、入力画像を用いて特徴点の座標を検出し、その座標の近傍画像を得る。このとき、切り替え部２０５により切り替えられるモードにより検出する処理が異なる。 The feature point detection unit 211 detects the coordinates of the feature points in the input image and obtains neighborhood images around those coordinates. The detection processing differs depending on the mode selected by the switching unit 205.

検出モードでは、辞書記憶部２０６の辞書情報を用いて入力画像中の特徴点の座標を検出する。 In the detection mode, the coordinates of the feature points in the input image are detected using the dictionary information in the dictionary storage unit 206.

準追跡モードと追跡モードでは、基準画像記憶部２０７の基準顔画像情報を用いて特徴点の座標を検出する。 In the semi-tracking mode and the tracking mode, the feature point coordinates are detected using the reference face image information in the reference image storage unit 207.

（１−２）基準画像評価部２０４ (1-2) Reference image evaluation unit 204
基準画像評価部２０４は、類似度評価部２１２と３次元的な顔向き評価部２１３と有している。 The reference image evaluation unit 204 includes a similarity evaluation unit 212 and a three-dimensional face orientation evaluation unit 213.

基準画像評価部２０４は、検出部２０３からの検出した顔画像と特徴点の画像と位置情報を用いて、類似度評価部２１２から得た第１の評価値と、３次元的な顔向き評価部２１３から得た第２の評価値を合成することにより、基準顔画像としての最適性の評価を行う。 Using the face image, the feature point images, and the position information detected by the detection unit 203, the reference image evaluation unit 204 evaluates suitability as a reference face image by combining the first evaluation value obtained from the similarity evaluation unit 212 with the second evaluation value obtained from the three-dimensional face orientation evaluation unit 213.

この評価の方法は、切り替え部２０５により切り替えられるモードにより異なる処理を行う。 This evaluation method performs different processing depending on the mode switched by the switching unit 205.

検出モードと準追跡モードでは、辞書記憶部２０６の辞書情報を用いて、類似度評価部２１２から得た第１の評価値と、３次元的な顔向き評価部２１３から得た第２の評価値を合成することにより基準顔画像としての最適性の評価を行う。 In the detection mode and the semi-tracking mode, suitability as a reference face image is evaluated, using the dictionary information in the dictionary storage unit 206, by combining the first evaluation value obtained from the similarity evaluation unit 212 with the second evaluation value obtained from the three-dimensional face orientation evaluation unit 213.

なお、追跡モードでは評価を行わない。 Note that no evaluation is performed in the tracking mode.

求めた評価値は切り替え部２０５に送る。また、この求めた評価値が、現在の基準画像の評価値より高い場合には、その検出画像とその位置情報を新たな基準画像として、基準画像記憶部２０７に送る。 The obtained evaluation value is sent to the switching unit 205. If the obtained evaluation value is higher than the evaluation value of the current reference image, the detected image and its position information are sent to the reference image storage unit 207 as a new reference image.

すなわち、基準画像評価部２０４は、類似度評価部２１２から得た第１の評価値ｆ（ｘ）と、３次元的な顔向き評価部２１３からの第２の評価値ｇ（ｘ）を、αｆ（ｘ）＋βｇ（ｘ）などの式で合成することにより基準顔画像を評価する。但し、ｘは、フレーム番号である。 That is, the reference image evaluation unit 204 evaluates the reference face image by combining the first evaluation value f(x) obtained from the similarity evaluation unit 212 and the second evaluation value g(x) obtained from the three-dimensional face orientation evaluation unit 213 with an expression such as αf(x) + βg(x), where x is the frame number.
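The weighted combination and the reference update rule described above can be sketched as follows. The weights α and β are not given in the text, so the defaults of 0.5 are placeholders, and the `Reference` container and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Reference:
    image: Any
    position: Tuple[int, int]
    score: float


def combined_score(f_x: float, g_x: float, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Combine the two evaluation values as alpha * f(x) + beta * g(x)."""
    return alpha * f_x + beta * g_x


def update_reference(
    current: Optional[Reference],
    image: Any,
    position: Tuple[int, int],
    f_x: float,
    g_x: float,
    alpha: float = 0.5,
    beta: float = 0.5,
) -> Reference:
    """Replace the stored reference only when the new combined value is higher."""
    score = combined_score(f_x, g_x, alpha, beta)
    if current is None or score > current.score:
        return Reference(image, position, score)
    return current
```

A frame whose combined value does not exceed the stored one leaves the reference image unchanged, which matches the update condition stated for the reference image storage unit 207.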

（１−２−１）類似度評価部２１２ (1-2-1) Similarity evaluation unit 212
類似度評価部２１２は、一般的で、かつ、オクルージョンの無い画像を基準画像とするために、検出部２０３から受け取った顔と特徴点の画像と辞書記憶部２０６の辞書情報との類似度を個々に獲得する。 So that a typical image free of occlusion is used as the reference image, the similarity evaluation unit 212 obtains, individually for each image, the similarity between the face and feature point images received from the detection unit 203 and the dictionary information in the dictionary storage unit 206.

辞書情報には、複数の正解の情報を有する辞書情報（正解辞書情報）と、複数の誤りの情報を有する辞書情報（誤り辞書情報）とがある。そして、各画像で２つの辞書情報との類似度を求める。 The dictionary information consists of dictionary information holding a plurality of correct examples (correct dictionary information) and dictionary information holding a plurality of incorrect examples (error dictionary information). For each image, the similarity to each of these two kinds of dictionary information is obtained.

まず、複数の顔の正解辞書情報の中で、顔正解閾値から各類似度を差し引いた中で最も高い値を第１の顔正類似度とする。 First, among the plurality of face correct dictionary information items, the highest of the values obtained by subtracting each similarity from the face correct threshold is taken as the first face correct similarity.

次に、複数の顔の誤り辞書情報の中で、顔誤り閾値から各類似度を差し引いた中で最も高い値を第２の顔正類似度とする。 Next, among the plurality of face error dictionary information, the highest value obtained by subtracting each similarity from the face error threshold is set as the second face correct similarity.

次に、第１の顔正類似度から第２の顔正類似度を差し引いた値を、顔正誤類似度とする。 Next, a value obtained by subtracting the second face correct similarity from the first face correct similarity is set as the face correct / incorrect similarity.

また、全ての特徴点に関して、複数の特徴点の正解辞書情報の中で、特徴点正解閾値から各類似度を差し引いた中で最も高い値を第１の特徴点正類似度とする。 Likewise, for every feature point, among the plurality of feature point correct dictionary information items, the highest of the values obtained by subtracting each similarity from the feature point correct threshold is taken as the first feature point correct similarity.

次に、複数の特徴点の誤り辞書の中で、特徴点誤り閾値から各類似度を差し引いた中で最も高い値を第２の特徴点正類似度とする。 Next, among the plurality of feature point error dictionaries, the highest of the values obtained by subtracting each similarity from the feature point error threshold is taken as the second feature point correct similarity.

次に、第１の特徴点正類似度から第２の特徴点正類似度を差し引いた値を特徴点正誤類似度とする。 Next, the value obtained by subtracting the second feature point correct similarity from the first feature point correct similarity is taken as the feature point correct/incorrect similarity.

各特徴点に関して特徴点正類似度と特徴点正誤類似度のより値の大きい方を特徴点類似度とする。 For each feature point, the greater of the feature point correct similarity and the feature point correct / incorrect similarity is defined as the feature point similarity.

また、各特徴点の中で最も値の低い特徴点類似度の特徴点が有する特徴点正類似度と特徴点正誤類似度と顔正解辞書と顔正誤類似度の４つを加算した値を第１の評価値ｆ（ｘ）とする。 Then, the sum of four terms is taken as the first evaluation value f(x): the feature point correct similarity and the feature point correct/incorrect similarity of the feature point whose feature point similarity is the lowest among all feature points, the value for the face correct answer dictionary, and the face correct/incorrect similarity.
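The calculation of f(x) described in this section can be sketched in Python as below. The sketch follows the wording literally (each margin is the threshold minus the similarity, and the highest margin is kept). It assumes that the "feature point correct similarity" means the first feature point correct similarity and that the face correct answer dictionary term means the first face correct similarity; both readings are interpretations, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Sims = Tuple[Sequence[float], Sequence[float]]  # (correct-dictionary sims, error-dictionary sims)


def margin(similarities: Sequence[float], threshold: float) -> float:
    """Highest value of (threshold - similarity) over one dictionary."""
    return max(threshold - s for s in similarities)


def correct_and_difference(sims: Sims, thresholds: Tuple[float, float]) -> Tuple[float, float]:
    """Return (first correct similarity, correct/incorrect similarity)."""
    first = margin(sims[0], thresholds[0])    # against the correct dictionary
    second = margin(sims[1], thresholds[1])   # against the error dictionary
    return first, first - second


def first_evaluation_value(
    face: Sims,
    points: List[Sims],
    face_thresholds: Tuple[float, float],
    point_thresholds: Tuple[float, float],
) -> float:
    face_correct, face_diff = correct_and_difference(face, face_thresholds)
    scored = [correct_and_difference(p, point_thresholds) for p in points]
    # Feature point similarity is the larger of the two values; pick the weakest point.
    weakest = min(scored, key=lambda pair: max(pair))
    return weakest[0] + weakest[1] + face_correct + face_diff
```

Taking the weakest feature point means that a single poorly matched feature point, for example one hidden by occlusion, pulls f(x) down.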

（１−２−２）３次元的な顔向き評価部２１３ (1-2-2) Three-dimensional face orientation evaluation unit 213
３次元的な顔向き評価部２１３は、検出部２０３から受け取った顔と特徴点の位置情報を受け取る。 The three-dimensional face orientation evaluation unit 213 receives the position information of the face and the feature points from the detection unit 203.

ここで、図７は正面顔に対しての追跡可能な顔向き範囲を示す概念図であり、中央の正面方向を向いている顔画像が、顔向きを変化させた際に追跡が可能な範囲を点線で示している。 Here, FIG. 7 is a conceptual diagram showing the range of face orientations that can be tracked relative to a frontal face; the dotted line indicates the range over which the frontal face image at the center can still be tracked as the face orientation changes.

図７の実線で示す追跡可能な顔向き範囲より、基準顔画像ができるだけ正面方向を向いている画像であれば追跡できる範囲が最も広い。すなわち、正面方向を向いている画像ほど第２の評価値を高くする。 As the trackable face orientation range shown by the solid line in FIG. 7 indicates, the trackable range is widest when the reference face image faces the front as closely as possible. Accordingly, the more frontal the image, the higher the second evaluation value is set.

よって正面方向を向いている顔を評価するため、顔と特徴点の位置情報を顔向き推定部２１４に送り、カメラに対する顔向きの横向き角と縦向き角と首かしげ角を受け取り評価する。 Therefore, to evaluate how frontal the face is, the position information of the face and the feature points is sent to the face orientation estimation unit 214, and the horizontal angle, vertical angle, and head-tilt angle of the face relative to the camera are received and evaluated.

また、各特徴点の画像座標に関する外れ値を評価するため、顔向き推定部２１４より形状当てはめ誤差と、顔に対する各特徴点の事前に学習した平均位置からの誤差を受け取り評価する。形状当てはめ誤差は、特徴点の検出誤差等により生じ、観測特徴点を一般顔形状に当てはめる際の誤差値である。角度の評価と特徴点の位置ずれ評価をもとに３次元的な顔向き評価値である第２の評価値ｇ（ｘ）とする。 In addition, to evaluate outliers in the image coordinates of the feature points, the shape fitting error and the error of each feature point from its previously learned average position relative to the face are received from the face orientation estimation unit 214 and evaluated. The shape fitting error arises from feature point detection errors and the like, and is the error value produced when the observed feature points are fitted to the general face shape. The second evaluation value g(x), which is the three-dimensional face orientation evaluation value, is obtained from the evaluation of the angles and the evaluation of the feature point displacements.
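The text does not give a formula for g(x); it only states that the value is based on the three angles and on the two error terms, and that more frontal faces should score higher. The following Python sketch is therefore purely illustrative: the normalization constants, the equal weighting, and the function name `second_evaluation_value` are all assumptions.

```python
def second_evaluation_value(
    horizontal_deg: float,
    vertical_deg: float,
    tilt_deg: float,
    fitting_error: float,
    mean_position_error: float,
    angle_scale: float = 90.0,   # assumed: angle at which the angle term reaches 0
    error_scale: float = 1.0,    # assumed: error at which the error term reaches 0
    angle_weight: float = 0.5,   # assumed: balance between the two terms
) -> float:
    """Illustrative g(x): 1.0 for a frontal face with no fitting error, lower otherwise."""
    total_angle = abs(horizontal_deg) + abs(vertical_deg) + abs(tilt_deg)
    angle_term = 1.0 - min(1.0, total_angle / (3.0 * angle_scale))
    total_error = fitting_error + mean_position_error
    error_term = 1.0 - min(1.0, total_error / (2.0 * error_scale))
    return angle_weight * angle_term + (1.0 - angle_weight) * error_term
```

Any monotonic penalty on the angles and errors would serve the same purpose; the linear form is chosen here only for brevity.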

（１−３）一般顔形状記憶部２１５ (1-3) General face shape storage unit 215
一般顔形状記憶部２１５は、事前に学習した一般顔形状と、顔に対する各特徴点の平均位置を記録しておき、要請があれば顔向き推定部２１４に送る。 The general face shape storage unit 215 records the general face shape learned in advance and the average position of each feature point relative to the face, and sends them to the face orientation estimation unit 214 on request.

（１−４）顔向き推定部２１４ (1-4) Face orientation estimation unit 214
顔向き推定部２１４は、検出部２０３から受け取った顔と特徴点の画像座標情報を、一般顔形状記憶部２１５に記録してある一般顔形状を用いて、横向き角、縦向き角、首かしげ角、形状当てはめ誤差を求め、出力部２０８に送る。 From the image coordinate information of the face and the feature points received from the detection unit 203, the face orientation estimation unit 214 obtains the horizontal angle, the vertical angle, the head-tilt angle, and the shape fitting error using the general face shape recorded in the general face shape storage unit 215, and sends them to the output unit 208.

また、３次元的な顔向き評価部２１３から受け取った顔と特徴点の位置情報を、一般顔形状記憶部２１５の一般顔形状を用いて、横向き角、縦向き角、首かしげ角、形状当てはめ誤差と、顔に対する各特徴点の平均位置からの誤差を求め、３次元的な顔向き評価部２１３にそれらを返す。なお、各特徴点の平均位置は、一般顔形状記憶部２１５に記録してある。 Similarly, from the position information of the face and the feature points received from the three-dimensional face orientation evaluation unit 213, the horizontal angle, the vertical angle, the head-tilt angle, the shape fitting error, and the error of each feature point from its average position relative to the face are obtained using the general face shape in the general face shape storage unit 215, and are returned to the three-dimensional face orientation evaluation unit 213. The average position of each feature point is recorded in the general face shape storage unit 215.

上記一般的な顔の形状モデルは、因子分解法などで事前に定義する。また、この一般的な顔の形状モデルが、現在の顔の向きが初期値となる。前記因子分解法は、非特許文献（金出，Poelman，森田，「因子分解法による物体形状とカメラ運動の復元」電子情報通信学会論文誌 D-II No.8， pp-1947-1505，Aug.1993）の方法を用いる。 The general face shape model is defined in advance by a factorization method or the like. In this general face shape model, the current face orientation is the initial value. The factorization method used is that of the non-patent document (Kanade, Poelman, Morita, "Restoring Object Shape and Camera Motion by Factorization", IEICE Transactions D-II No.8, pp-1947-1505, Aug. 1993).

（２）顔向き推定手順 (2) Face Orientation Estimation Procedure
図６は、本実施形態に係る顔向き推定手順の一例を示すフローチャートである。処理手順は以下のようになる。図２で説明した追跡画像処理手順と同様な処理に関しては説明を省略する。 FIG. 6 is a flowchart showing an example of the face orientation estimation procedure according to this embodiment. The procedure is as follows. Processing that is the same as in the tracking image processing procedure described with reference to FIG. 2 is not described again.

ステップ２０３について説明する。ステップ２０３は、ステップ２１５とステップ２１６から構成される。 Step 203 will be described. Step 203 is composed of Step 215 and Step 216.

ステップ２１５において、辞書記憶部２０６の辞書情報を用いて画像中の顔領域を検出し、検出画像と位置情報と用いた辞書情報との類似度を基準画像評価部２０４に送る。次に、ステップ２１６において、辞書情報を用いて画像中の特徴点の画像座標を検出し、検出した画像座標とその近傍画像と用いた辞書情報との類似度を基準画像評価部２０４に送る。 In step 215, the face area in the image is detected using the dictionary information in the dictionary storage unit 206, and the detected image, its position information, and its similarity to the dictionary information used are sent to the reference image evaluation unit 204. Next, in step 216, the image coordinates of the feature points in the image are detected using the dictionary information, and the detected image coordinates, their neighborhood images, and their similarity to the dictionary information used are sent to the reference image evaluation unit 204.

ステップ２０４について説明する。ステップ２０４は、ステップ２１７とステップ２１８から構成される。 Step 204 will be described. Step 204 includes step 217 and step 218.

ステップ２１７において、基準画像記憶部２０７の基準顔画像情報を用いて画像中の顔領域を検出し、検出画像と位置情報と用いた辞書情報との類似度を切り替え部２０５に送る。次に、ステップ２１８において、基準顔画像情報を用いて画像中の特徴点の画像座標を検出し、検出した画像座標とその近傍画像と用いた辞書情報との類似度を切り替え部２０５に送る。 In step 217, the face area in the image is detected using the reference face image information in the reference image storage unit 207, and the detected image, its position information, and its similarity to the dictionary information used are sent to the switching unit 205. Next, in step 218, the image coordinates of the feature points in the image are detected using the reference face image information, and the detected image coordinates, their neighborhood images, and their similarity to the dictionary information used are sent to the switching unit 205.

ステップ２０７について説明する。ステップ２０７は、ステップ２１９とステップ２２０から構成される。 Step 207 will be described. Step 207 includes step 219 and step 220.

ステップ２１９において、最初に辞書記憶部２０６の辞書情報を用いて検出部２０３から受け取った顔と特徴点の位置情報から３次元的な顔向きを評価し正面方向を向いている画像の値が高くなるように第１の評価値を求める。次に、ステップ２２０において、検出部２０３から受け取った顔と特徴点の画像の辞書との類似度をもとに第２の評価値を求める。次に獲得した第１の評価値と第２の評価値を合成して、ステップ２０８に進む。 In step 219, the three-dimensional face orientation is first evaluated from the position information of the face and the feature points received from the detection unit 203, using the dictionary information in the dictionary storage unit 206, and the first evaluation value is obtained so that it is higher for images facing the front. Next, in step 220, the second evaluation value is obtained from the similarity between the face and feature point images received from the detection unit 203 and the dictionary. The first and second evaluation values thus obtained are then combined, and the process proceeds to step 208.

ステップ２２１では、受け取った顔と特徴点の位置情報より顔向きを推定し、ステップ２１０に進む。 In step 221, the face orientation is estimated from the received face and feature point position information, and the process proceeds to step 210.

（変更例） (Modifications)
本発明は上記各実施形態に限らず、その主旨を逸脱しない限り種々に変更することができる。 The present invention is not limited to the above embodiments and can be modified in various ways without departing from its gist.

（１）変更例１ (1) Modification 1
上記実施形態では、基準画像を一枚としたが、これに代えて複数枚でもよい。例えば、次のような場合である。 In the above embodiments a single reference image is used, but a plurality of reference images may be used instead, for example as follows.

まず、時系列画像として１〜５フレーム目の画像が入力したときに、各フレームで基準画像を求める。 First, when the images of frames 1 to 5 are input as time-series images, a reference image is obtained for each frame.

次に、６フレーム目の画像が入力したときに、６フレーム目の基準画像を求める。 Next, when the image of the sixth frame is input, the reference image of the sixth frame is obtained.

次に、前記５枚の基準画像の中で最も評価値の低い基準画像（例えば、１フレーム目の基準画像とする）と、６フレーム目の基準画像の評価値を比較する。 Next, the reference image having the lowest evaluation value (for example, the reference image of the first frame) among the five reference images is compared with the evaluation value of the reference image of the sixth frame.

次に、６フレーム目の基準画像の評価値の方が、１フレーム目の基準画像の評価値より高いときに、６フレーム目の基準画像と１フレーム目の基準画像を入れ替える。 Next, when the evaluation value of the frame 6 reference image is higher than that of the frame 1 reference image, the frame 1 reference image is replaced with the frame 6 reference image.

これにより、常に５枚の基準画像を記録してることになる。 As a result, five reference images are always recorded.
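The replacement policy of Modification 1 amounts to keeping the five highest-scoring reference images seen so far. A minimal Python sketch using a min-heap is shown below; the class name `ReferencePool` and its methods are hypothetical, and the image payload is left opaque.

```python
import heapq
from typing import Any, List, Tuple


class ReferencePool:
    """Keeps up to `capacity` reference images, evicting the lowest-scoring one."""

    def __init__(self, capacity: int = 5) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, int, Any]] = []  # min-heap ordered by score
        self._counter = 0  # tie-breaker so that images themselves are never compared

    def offer(self, score: float, image: Any) -> bool:
        """Try to add a reference image; return True if it was stored."""
        entry = (score, self._counter, image)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)       # frames 1 to 5: always stored
            return True
        if score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)    # replace the lowest-scoring image
            return True
        return False

    def scores(self) -> List[float]:
        return sorted(s for s, _, _ in self._heap)
```

The heap root is always the reference image with the lowest evaluation value, so the comparison against a newly obtained reference image takes constant time.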

（２）変更例２ (2) Modification 2
上記実施形態では、対象物体の検出領域を入力画像から一枚だけ検出したが、これに限らず、複数の領域を検出してもよい。例えば、第３の実施形態では、一枚の入力画像に複数の顔が写っている場合である。 In the above embodiments only one detection area of the target object is detected in the input image, but the invention is not limited to this and a plurality of areas may be detected. In the third embodiment, for example, this corresponds to a case where a single input image contains a plurality of faces.

（３）変更例３ (3) Modification 3
第３の実施形態で対象物体として人間の顔を用いたが、これに限らず、人間（人体全体）、自動車などの乗り物などの他の対象物体でもよい。 Although a human face is used as the target object in the third embodiment, the target object is not limited to this and may be another object such as a person (the whole body) or a vehicle such as an automobile.

本発明の第１の実施形態に係わる追跡画像処理装置の構成を示す構成図である。 A configuration diagram showing the configuration of the tracking image processing apparatus according to the first embodiment of the present invention.
第１の実施形態の動作を示すフローチャートである。 A flowchart showing the operation of the first embodiment.
本発明の第２の実施形態に係わる追跡画像処理装置の構成を示す構成図である。 A configuration diagram showing the configuration of the tracking image processing apparatus according to the second embodiment of the present invention.
第２の実施形態の動作を示すフローチャートである。 A flowchart showing the operation of the second embodiment.
本発明の第３の実施形態に係わる顔向き推定装置の構成を示す構成図である。 A configuration diagram showing the configuration of the face orientation estimation apparatus according to the third embodiment of the present invention.
第３の実施形態の動作を示すフローチャートである。 A flowchart showing the operation of the third embodiment.
正面顔に対しての追跡可能な顔向き範囲を示す概念図である。 A conceptual diagram showing the range of face orientations that can be tracked relative to a frontal face.
従来法と本実施形態のフレームレートの変化を示す概念図である。 A conceptual diagram showing the change in frame rate for the conventional method and for this embodiment.
最適画像を獲得するまでの本発明と従来法の挙動を示す概念図である。 A conceptual diagram showing the behavior of the present invention and of the conventional method until the optimal image is acquired.

Explanation of symbols

１、１０１・・・追跡画像処理装置 1, 101: Tracking image processing apparatus
２、１０２、２０２・・・画像入力部 2, 102, 202: Image input unit
３、１０３、２０３・・・検出部 3, 103, 203: Detection unit
４、１０４、２０４・・・基準画像評価部 4, 104, 204: Reference image evaluation unit
５、１０５、２０５・・・切り替え部 5, 105, 205: Switching unit
６、１０６、２０６・・・辞書記憶部 6, 106, 206: Dictionary storage unit
７、１０７、２０７・・・基準画像記憶部 7, 107, 207: Reference image storage unit
８、１０８、２０８・・・出力部 8, 108, 208: Output unit
１０９・・・パラメタ記憶部 109: Parameter storage unit
２０１・・・顔向き推定装置 201: Face orientation estimation apparatus
２１０・・・顔検出部 210: Face detection unit
２１１・・・特徴点検出部 211: Feature point detection unit
２１２・・・類似度評価部 212: Similarity evaluation unit
２１３・・・３次元的な顔向き評価部 213: Three-dimensional face orientation evaluation unit
２１４・・・顔向き推定部 214: Face orientation estimation unit
２１５・・・一般顔形状記憶部 215: General face shape storage unit

Claims

An image processing apparatus comprising:
an image input unit for inputting time-series images;
a dictionary storage unit that stores dictionary information on a target object to be detected;
a reference storage unit that stores a reference image and a reference similarity;
a detection mode execution unit that executes a detection mode in which a first corresponding area of the target object is detected from the input image using the dictionary information, a first similarity between the dictionary information and the image of the first corresponding area is obtained, and the reference image and the reference similarity are updated with the image of the first corresponding area and the first similarity;
a semi-tracking mode execution unit that executes a semi-tracking mode in which a second corresponding area is detected from the input image using the reference image, a second similarity between the dictionary information and the image of the second corresponding area is obtained, and, when the obtained second similarity is higher than the reference similarity, the reference image and the reference similarity are updated with the second corresponding area and the second similarity;
a tracking mode execution unit that executes a tracking mode in which the target area is detected from the input image using the reference image; and
a switching unit that (1) switches to the detection mode at first, (2) switches to the semi-tracking mode when the first similarity becomes higher than a first threshold during execution of the detection mode, (3) switches to the tracking mode when the second similarity becomes higher than a second threshold, which is higher than the first threshold, during execution of the semi-tracking mode, and (4) continues the semi-tracking mode when the second similarity is between the first threshold and the second threshold during execution of the semi-tracking mode.

The image processing apparatus according to claim 1, wherein
when the first similarity is lower than the first threshold, the switching unit continues the detection mode for the next input image, and
the detection mode execution unit detects a next first corresponding area of the target object from the next input image using the dictionary information, obtains a next first similarity from the reference image and the next first corresponding area, and, when the next first similarity is higher than the first similarity of the reference image, sets the next first corresponding area as a new reference image.

The image processing apparatus according to claim 1, wherein
the detection mode execution unit obtains, in addition to the first similarity, a direction evaluation value that increases as the target object in the target area faces a specific direction, using the dictionary information and the first corresponding area,
the semi-tracking mode execution unit obtains the direction evaluation value, in addition to the second similarity, using the reference image and the second corresponding area, and
the switching unit compares a value obtained by adding the direction evaluation value to the first similarity or the second similarity with the first threshold or the second threshold.

The image processing apparatus according to claim 1, wherein, in the detection mode and the semi-tracking mode,
a reference image is held for each of a plurality of the time-series images,
the evaluation value of the reference image having the lowest evaluation value among the plurality of reference images is compared with the evaluation value of a newly obtained reference image, and
when the evaluation value of the new reference image is higher than the evaluation value of the reference image with the lowest evaluation value, the reference image with the lowest evaluation value is replaced with the new reference image.

The image processing apparatus according to claim 1, wherein, in the detection mode and the semi-tracking mode, the calculation cost of the similarity is reduced as the first similarity or the second similarity increases.

The image processing apparatus according to claim 3, wherein
the target object is a human face,
face part position information is obtained from the target area in each input image,
three-dimensional face orientation information is obtained from the face part position information of each input image, and
the direction evaluation value is made higher as the face orientation information indicates a more frontal orientation.

An image processing method comprising:
inputting time-series images;
storing dictionary information on a target object to be detected;
storing a reference image and a reference similarity;
executing a detection mode in which a first corresponding area of the target object is detected from the input image using the dictionary information, a first similarity between the dictionary information and the image of the first corresponding area is obtained, and the reference image and the reference similarity are updated with the image of the first corresponding area and the first similarity;
executing a semi-tracking mode in which a second corresponding area is detected from the input image using the reference image, a second similarity between the dictionary information and the image of the second corresponding area is obtained, and, when the obtained second similarity is higher than the reference similarity, the reference image and the reference similarity are updated with the second corresponding area and the second similarity;
executing a tracking mode in which the target area is detected from the input image using the reference image; and
(1) switching to the detection mode at first, (2) switching to the semi-tracking mode when the first similarity becomes higher than a first threshold during execution of the detection mode, (3) switching to the tracking mode when the second similarity becomes higher than a second threshold, which is higher than the first threshold, during execution of the semi-tracking mode, and (4) continuing the semi-tracking mode when the second similarity is between the first threshold and the second threshold during execution of the semi-tracking mode.

An image processing program that realizes:
an image input function for inputting time-series images;
a dictionary storage function for storing dictionary information on a target object to be detected;
a reference storage function for storing a reference image and a reference similarity;
a detection mode execution function for executing a detection mode in which a first corresponding area of the target object is detected from the input image using the dictionary information, a first similarity between the dictionary information and the image of the first corresponding area is obtained, and the reference image and the reference similarity are updated with the image of the first corresponding area and the first similarity;
a semi-tracking mode execution function for executing a semi-tracking mode in which a second corresponding area is detected from the input image using the reference image, a second similarity between the dictionary information and the image of the second corresponding area is obtained, and, when the obtained second similarity is higher than the reference similarity, the reference image and the reference similarity are updated with the second corresponding area and the second similarity;
a tracking mode execution function for executing a tracking mode in which the target area is detected from the input image using the reference image; and
a switching function that (1) switches to the detection mode at first, (2) switches to the semi-tracking mode when the first similarity becomes higher than a first threshold during execution of the detection mode, (3) switches to the tracking mode when the second similarity becomes higher than a second threshold, which is higher than the first threshold, during execution of the semi-tracking mode, and (4) continues the semi-tracking mode when the second similarity is between the first threshold and the second threshold during execution of the semi-tracking mode.