JP5836724B2

JP5836724B2 - Image recognition method, image recognition apparatus, and program

Info

Publication number: JP5836724B2
Application number: JP2011206206A
Authority: JP
Inventors: 加藤　政美; 山本　貴久
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2011-09-21
Filing date: 2011-09-21
Publication date: 2015-12-24
Anticipated expiration: 2031-09-21
Also published as: JP2013069058A

Description

The present invention relates to an image recognition method, an image recognition apparatus, and a program, and more particularly to a technique suitable for identifying an individual from face image data.

In the recognition of individuals from face image data (hereinafter, face recognition), determining the positions of facial organs or comparably characteristic parts (hereinafter, feature points) is an important task that often governs recognition performance. However, highly accurate feature point localization imposes a high processing load and can become the rate-limiting step of the entire recognition process.

Patent Document 1 therefore discloses a technique that, when recognizing an individual from moving image data, uses the recognition result of the previous frame to reduce the number of feature points extracted from the frame being processed. That is, once the target person has been recognized (tracking state), processing is sped up by reducing the number of feature points extracted in the next frame.

Patent Document 1: JP 2009-75999 A

Beumer, G.M.; Tao, Q.; Bazen, A.M.; Veldhuis, R.N.J., "A landmark paper in face recognition," Automatic Face and Gesture Recognition, 2006 (FGR 2006), 7th International Conference, pp. 73-78

However, in the technique disclosed in Patent Document 1, reducing the number of extracted feature points changes the subsequent processing that uses the feature point positions (such as feature amount extraction and similarity comparison against registration data). Multiple types of registration data, one for each number of feature points, must therefore be prepared in advance. Registration data generated from feature points with low positional accuracy, for example when the number of feature points is small, is likely to be unreliable. If such unreliable data is mixed into the registration data, recognition accuracy drops significantly.

In view of the above problems, an object of the present invention is to mitigate the loss of recognition accuracy even when recognition processing is executed at high speed.

The image recognition method of the present invention comprises: an image data acquisition step of acquiring image data; a first feature point position detection step of detecting positions of feature points from the acquired image data; a second feature point position detection step of detecting, from the acquired image data, positions of more feature points than the first feature point position detection step; a selection step of selecting processing by the first feature point position detection step or the second feature point position detection step; a feature amount calculation position determination step of determining feature amount calculation positions based on the selected result; a feature amount calculation step of calculating a feature amount based on the result of the feature amount calculation position determination step; a determination step of determining a processing mode; a registration step of, when the processing mode determined in the determination step is registration processing, selecting processing by the second feature point position detection step and generating registration data from the feature amount calculated based on the result of the second feature point position detection step and the result of the feature amount calculation position determination step; and a recognition step of, when the processing mode determined in the determination step is recognition processing, selecting processing by the first feature point position detection step and recognizing the image data based on the registration data and the feature amount calculated based on the result of the first feature point position detection step and the result of the feature amount calculation position determination step. When processing by the first feature point position detection step is selected in the selection step, the feature amount calculation position determination step determines the feature amount calculation positions based on the feature points detected in the first feature point position detection step and on feature points obtained by substituting predetermined coordinate values for the positions of feature points that are detected in the second feature point position detection step but not in the first feature point position detection step.

According to the present invention, a reduction in recognition accuracy can be suppressed even when feature point positions are detected at high speed during recognition processing.

A diagram outlining the operation of the first embodiment.
A block diagram showing a configuration example of the image recognition apparatus of the first embodiment.
A diagram illustrating an example of cutting out a face image.
A diagram illustrating examples of feature points.
A diagram illustrating an example of the feature point position detection means.
A diagram illustrating an example of feature point correction.
A diagram illustrating a processing example of the feature point position correction means.
A flowchart illustrating the operation of the first embodiment.
A flowchart illustrating the operation of the second embodiment.
A flowchart illustrating the operation of the third embodiment.
A block diagram showing a configuration example of the image recognition apparatus of the third and fourth embodiments.
A flowchart illustrating the operation of the fourth embodiment.

(First embodiment)
The operation of the first embodiment of the present invention is described below.
FIG. 1 is a diagram outlining the processing of this embodiment.
In FIG. 1, reference numeral 109 denotes the outline of processing during recognition. Recognition processing here means identifying an individual from the face image data acquired in the face image data acquisition process 100. Reference numeral 110 denotes the outline of processing during registration. Registration processing here means generating, from the face image data acquired in the face image data acquisition process 100, the registration data used during recognition. In the recognition process 109 and the registration process 110, processes with the same number are common processes. Whether recognition or registration is performed follows the result of the processing mode determination process 111.

During recognition processing, feature point positions are detected from the face image data acquired in the face image data acquisition process 100. A feature point here is the position of a characteristic image shape related to a facial organ such as an eye, the nose, or the mouth. This embodiment provides two feature point position detection processes. The first feature point position detection process 102 is a high-speed detection process, and the second feature point position detection process 103 is a high-accuracy detection process. For example, the first feature point position detection process 102 calculates fewer feature points than the second feature point position detection process 103.

The recognition operation mode setting process 101 is a setting process that specifies whether recognition is to run with high accuracy or at high speed. The selection process 104 selects a feature point position detection process according to the recognition operation mode setting process 101. To run recognition in the high-speed mode, the first feature point position detection process 102 is selected; to run it in the high-accuracy mode, the second feature point position detection process 103 is selected.

The feature amount calculation position determination process 105 determines the reference positions for calculating the feature amount, based on the result of either the first feature point position detection process 102 or the second feature point position detection process 103, which differ in detection accuracy. A common feature amount calculation position reference is determined regardless of which feature point position detection process was selected.

The feature amount calculation process 106 calculates a predetermined feature amount based on the result of the feature amount calculation position determination process 105. In this embodiment, because the feature amount calculation position determination process 105 yields common reference positions, the same feature amount calculation process 106 can be used regardless of the type of feature point position detection process (number of feature points, and so on). The determination process 107 compares the feature amount of the face image to be recognized, obtained by the feature amount calculation process 106, with the registration data generated by the registration data generation process 108, and thereby determines whether the face image to be recognized corresponds to the registration data.

In the registration process 110, feature point positions are detected from the face image data input from the face image data acquisition process 100. Here the second feature point position detection process 103 detects the feature point positions with high accuracy. The feature amount calculation position determination process 105 determines the reference positions for calculating the feature amount according to the result of the second feature point position detection process 103. The feature amount calculation process 106 calculates a predetermined feature amount based on the result of the feature amount calculation position determination process 105. The registration data generation process 108 records the feature amount calculated for the face image data by the feature amount calculation process 106 together with information identifying the individual corresponding to that face image.

In this embodiment, recognition processing switches the feature point position detection between high-speed and high-accuracy according to the recognition operation mode, whereas registration processing always generates registration data using high-accuracy feature point position detection. That is, recognition is executed against common registration data generated from high-accuracy detection, regardless of which detection process is used at recognition time. The detection accuracy of the feature point position detection process may also be made variable according to the mode.
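The selection rule described above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the names `ProcessMode`, `RecognitionMode`, and `select_detector`, and the string labels `"first"` and `"second"`, are hypothetical.

```python
from enum import Enum


class ProcessMode(Enum):
    REGISTRATION = "registration"
    RECOGNITION = "recognition"


class RecognitionMode(Enum):
    HIGH_SPEED = "high_speed"
    HIGH_ACCURACY = "high_accuracy"


def select_detector(process_mode, recognition_mode=None):
    """Return which feature point position detection process to run.

    "first"  -> fast detection (process 102, fewer feature points)
    "second" -> accurate detection (process 103, more feature points)
    """
    if process_mode is ProcessMode.REGISTRATION:
        # Registration data is always built from the accurate detector,
        # so the recognition mode is ignored here.
        return "second"
    if recognition_mode is RecognitionMode.HIGH_SPEED:
        return "first"
    return "second"
```

Because registration never uses the fast detector, only one kind of registration data has to be stored, whichever detector recognition later uses.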

FIG. 2 is a block diagram showing a configuration example of the image recognition apparatus of this embodiment. The apparatus first extracts face image data from image data. It then detects a plurality of feature point positions from the extracted face image data and identifies the individual from a feature amount calculated based on those positions.

In FIG. 2, the image input unit 201 comprises an optical device, a photoelectric conversion device, a driver circuit that controls the sensor, an AD converter, a signal processing circuit that performs various image corrections, a frame buffer, and the like. Reference numeral 202 denotes a preprocessing unit, which performs preprocessing so that the subsequent processes can be carried out effectively. Specifically, it applies image data conversions such as color conversion and contrast correction, in hardware, to the image data acquired by the image input unit 201.

The face image data cut-out processing unit 203 performs face detection on the image data corrected by the preprocessing unit 202. Any of the various previously proposed face detection methods can be applied. For each detected face, the face image data cut-out processing unit 203 normalizes the face image data to a predetermined size and cuts it out.

FIG. 3 is a diagram illustrating an example of the face image data cut-out processing. A face area 32 is determined from the image 31 corrected by the preprocessing unit 202, and a face image 33 normalized to a predetermined size is cut out. That is, the size of the face image 33 is constant regardless of the face. The cut-out face image 33 is stored in a RAM (Random Access Memory) 209 via a DMAC (Direct Memory Access Controller) 205. Hereinafter, the position of a feature point is defined as its coordinates within the face image 33, expressed in a coordinate system (x coordinate, y coordinate) whose origin is the upper left corner of the face image 33.

Reference numeral 207 denotes a CPU (Central Processing Unit), which executes the main processing of this embodiment and controls the operation of the entire image recognition apparatus. Reference numeral 204 denotes a bridge, which provides a bus bridge function between the image bus 210 and the CPU bus 206. Reference numeral 208 denotes a ROM, which stores the instructions that define the operation of the CPU 207. The RAM 209 is the working memory required for the operation of the CPU 207. The RAM 209 is a relatively large-capacity memory such as a DRAM (Dynamic RAM) and is connected to the CPU bus 206 via a memory controller (not shown). Devices on the image bus 210 and devices on the CPU bus 206 operate concurrently.

Reference numeral 211 denotes a user interface unit, which provides a physical interface (switch, touch panel, or the like) for the user to input the processing mode. The CPU 207 executes recognition processing on the face image 33 stored in the RAM 209.

FIG. 8 is a flowchart showing an example of the operation procedure of the face recognition processing of this embodiment. The flowchart shows the operation of the CPU 207.
First, in step S800, the CPU 207 determines the processing mode. The processing mode here is either recognition processing or registration processing, and is determined based on the setting information of the user interface unit 211. In the following description, the face image data is assumed to be already stored in the RAM 209.

Recognition processing is described first. In step S801, the recognition operation mode is determined. The recognition operation modes are a high-accuracy mode and a high-speed mode. In the high-accuracy mode, face recognition gives priority to recognition accuracy over processing speed. In the high-speed mode, recognition runs fast at the cost of some accuracy. For example, the choice between the two modes is made according to whether a recognition target person (a person registered as a recognition target) was recognized in the previous frame. That is, the CPU 207 determines the recognition operation mode by referring to the recognition result of the previous frame stored in the RAM 209. Specifically, when a person registered as a recognition target was recognized in the previous frame, the current frame is processed in the high-speed mode.

The CPU 207 selects and executes the corresponding feature point position detection process according to the determination result of step S801. If the high-speed mode is selected, high-speed feature point position detection is executed in step S802. If the high-accuracy mode is selected, high-accuracy feature point position detection is executed in step S803.
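The per-frame mode decision of step S801 can be sketched as follows. The function names are hypothetical, and falling back to the high-accuracy mode whenever no registered person was recognized in the previous frame (including on the first frame) is an assumption; the text states only the high-speed case explicitly.

```python
def recognition_mode_for_frame(previous_result):
    """Step S801 sketch: pick the mode for the current frame.

    previous_result is the identity recognized in the previous frame,
    or None when no registered person was recognized (or on the first frame).
    """
    return "high_speed" if previous_result is not None else "high_accuracy"


def modes_for_sequence(results):
    """Given per-frame recognition results, list the mode each frame ran in."""
    modes = []
    previous = None
    for result in results:
        modes.append(recognition_mode_for_frame(previous))
        previous = result
    return modes
```

For a sequence in which a person is recognized in the second and third frames and lost in the fourth, the third and fourth frames run in the high-speed mode and the fifth returns to the high-accuracy mode.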

The case where the number of feature points detected differs between the high-speed mode and the high-accuracy mode is described next. FIG. 4 is a diagram illustrating examples of the positions of the feature points to be detected. FIG. 4(a) shows the feature point positions detected in the high-accuracy mode: 15 feature points 401 to 415 related to facial organs. FIG. 4(b) shows the feature point positions detected in the high-speed mode: six feature point positions 417 to 421. Detecting many feature points improves the reliability of the position data obtained in step S804, described later, which determines the feature amount calculation positions, because using the position information of multiple feature points mitigates the effect of false detections.

FIG. 5 is a diagram illustrating a specific example of steps S802 and S803, the feature point position detection processes. In this embodiment, feature point positions are detected by a CNN (Convolutional Neural Network). For the sake of explanation, FIG. 5 shows a configuration that detects two feature point positions.

A CNN consists of hierarchical feature extraction processes. FIG. 5 shows a two-layer CNN in which the first layer 506 has three features and the second layer 510 has two. Reference numeral 501 denotes face image data, corresponding to the face image 33 described above. Reference numerals 503a to 503c denote the feature planes of the first layer 506. A feature plane is an image data plane that stores the results of scanning the data of the previous layer with a predetermined feature extraction filter (accumulated convolution sums followed by nonlinear processing). Because a feature plane holds detection results for raster-scanned image data, the detection results are also represented as a plane. The feature planes 503a to 503c are calculated from 501 using different feature extraction filters. Each is generated by an operation corresponding to the respective two-dimensional convolution filter 504a to 504c, shown schematically, followed by a nonlinear transformation of the result. Reference numeral 502 denotes the reference image area required for the convolution operation. For example, a convolution filter operation with a filter size (horizontal length × vertical height) of 11 × 11 is performed as the product-sum operation of equation (1) below.

Here, input(x, y) is the reference pixel value at coordinates (x, y), and output(x, y) is the operation result at coordinates (x, y). weight(column, row) is the weight coefficient applied at coordinates (x + column, y + row), and columnSize = 11 and rowSize = 11 give the filter size (number of filter taps).
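A product-sum form consistent with the definitions above can be written as follows. The summation limits are an assumption: they center the filter window on (x, y), so that the offsets run from -5 to 5 for an 11 × 11 filter. The text fixes only the summand and the filter size.

```latex
\mathrm{output}(x, y) =
  \sum_{\mathrm{row} = -\lfloor \mathrm{rowSize}/2 \rfloor}^{\lfloor \mathrm{rowSize}/2 \rfloor}
  \;\sum_{\mathrm{column} = -\lfloor \mathrm{columnSize}/2 \rfloor}^{\lfloor \mathrm{columnSize}/2 \rfloor}
  \mathrm{input}(x + \mathrm{column},\, y + \mathrm{row}) \times \mathrm{weight}(\mathrm{column}, \mathrm{row})
  \tag{1}
```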

The convolution filters 504a to 504c each have different coefficients. The coefficients are determined in advance by learning. The size of the convolution filter also differs from one feature plane to another. In the CNN operation, the product-sum operation is repeated while scanning a plurality of filters pixel by pixel, and a feature plane is generated by applying a nonlinear transformation to the final product-sum result. A sigmoid function or the like is used for the nonlinear transformation. When calculating the feature plane 503a, the number of connections to the previous layer is 1, so a single convolution filter 504a is used.

When calculating the feature planes 507a and 507b, on the other hand, the number of connections to the previous layer is 3, so the results of the three convolution filters corresponding to 508a to 508c and 508d to 508e, respectively, are accumulated. That is, one feature value of the feature plane 507a is obtained by accumulating all the outputs of the convolution filter operations 508a to 508c and finally applying the nonlinear transformation. Reference numerals 505a to 505c denote the reference image areas required for the convolution filter operation 508. The CNN operation is known as a powerful feature extraction technique, but as described here it involves a large number of product-sum operations and imposes a high processing load.
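The accumulation described above can be sketched in plain Python. This is an illustrative sketch, not the patent's implementation: the function names are hypothetical, planes are indexed as `plane[y][x]`, filter sizes are assumed to be odd, the window is assumed to be centered on (x, y), and (x, y) is assumed to lie far enough from the border that the window stays inside the plane.

```python
import math


def convolve_at(plane, weights, x, y):
    """Product-sum of one filter centered on (x, y); plane[y][x] indexing."""
    half_h = len(weights) // 2
    half_w = len(weights[0]) // 2
    total = 0.0
    for row in range(-half_h, half_h + 1):
        for column in range(-half_w, half_w + 1):
            total += plane[y + row][x + column] * weights[row + half_h][column + half_w]
    return total


def sigmoid(value):
    """Nonlinear transformation applied to the accumulated sum."""
    return 1.0 / (1.0 + math.exp(-value))


def feature_value(prev_planes, filters, x, y):
    """One value of a feature plane connected to every plane in prev_planes.

    The outputs of all filters are accumulated first; the nonlinear
    transformation is applied once, to the accumulated sum.
    """
    accumulated = sum(
        convolve_at(plane, weights, x, y)
        for plane, weights in zip(prev_planes, filters)
    )
    return sigmoid(accumulated)
```

With three previous-layer planes, `feature_value` performs three filter product-sums per output value, which is where the high operation count of the second layer comes from.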

In step S802 or step S803, the centers of gravity of the feature planes 507a and 507b, which are the CNN operation results, are taken as the feature point position coordinates. In actual processing, the operation is performed only on a limited region, chosen in consideration of where the feature point to be detected is likely to exist in the image. Reference numerals 509a and 509b denote the regions of the second-layer feature planes that are actually computed. The center of gravity of the result obtained for such a limited region is taken as the feature point position coordinates.
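A minimal sketch of the center-of-gravity calculation over a limited region follows. The function name, the inclusive `(x_min, y_min, x_max, y_max)` region format, and the `None` return for an empty response are assumptions made for illustration; the feature plane values are assumed to be non-negative, as sigmoid outputs are.

```python
def centroid(feature_plane, region):
    """Center of gravity of feature_plane inside region.

    feature_plane is indexed as feature_plane[y][x]; region is
    (x_min, y_min, x_max, y_max), inclusive, in face-image coordinates.
    Returns (x, y), or None when the region holds no positive response.
    """
    x_min, y_min, x_max, y_max = region
    total = sum_x = sum_y = 0.0
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            value = feature_plane[y][x]
            total += value
            sum_x += value * x
            sum_y += value * y
    if total <= 0.0:
        return None
    return (sum_x / total, sum_y / total)
```

Restricting the loops to the region both reduces the work and keeps responses from unrelated parts of the face out of the weighted average.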

FIG. 5 describes the extraction of two features for the sake of explanation, but step S803, the high-accuracy feature point position detection, processes a network configuration capable of detecting 15 feature point positions. In that case the second layer has 15 features. Step S802, the high-speed feature point position detection, processes a network configuration capable of detecting six feature point positions. By reducing the number of feature points to be detected, step S802 greatly reduces the amount of computation.

In step S804, the feature amount calculation position determination process, a geometric correction is applied to the result of step S802 or S803 to determine the reference positions for feature amount calculation. FIG. 6 is a diagram illustrating an example of the correction. Reference numeral 402a denotes a feature point that should mark the outer corner of the eye but has been erroneously located at the end of the eyebrow. In step S804, its position is corrected by statistical processing based on the arrangement of human facial features. That is, 402a is corrected to the position 402b.

FIG. 7 is a diagram illustrating the processing of step S804. In step S700, the number of feature points is adjusted so that it is the same for every recognition operation mode (high-speed mode or high-accuracy mode). In the high-speed mode, the nine feature point positions that are detected in the high-accuracy mode but not in the high-speed mode are replaced with predetermined average coordinate values. The average values used are the corresponding elements of the average vector A in FIG. 7, described later. As a result, the processes of steps S701 to S708 are performed in the same way regardless of the number of feature points detected.
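The adjustment in step S700 can be sketched as follows. The function name, the 0-based indices, and the dictionary representation of the detected points are hypothetical choices made for illustration; the layout of the average vector follows the (x, y) concatenation order used for the feature point vector.

```python
def fill_missing_points(detected, average_vector, num_points=15):
    """Step S700 sketch: return a full list of (x, y) positions.

    detected maps a feature point index (0-based) to its detected (x, y).
    average_vector is the flat list (a_1, ..., a_2f); entries 2i and 2i+1
    hold the average x and y of point i.
    """
    points = []
    for i in range(num_points):
        if i in detected:
            points.append(detected[i])
        else:
            # Not detected in this mode: substitute the average position.
            points.append((average_vector[2 * i], average_vector[2 * i + 1]))
    return points
```

Because a substituted point equals its average, it contributes zero once the average vector is subtracted in the later step, so the undetected points do not pull the projection in any direction.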

In step S701, the position coordinates of the feature points are simply concatenated to generate a single vector. In the present embodiment, a 30-dimensional feature point vector V is generated from the position coordinates of 15 feature points. The data sequence obtained by concatenating the position coordinates $(x_i, y_i)$ of each feature point (i is the feature point number, 1 to 15) is the feature point vector V (elements $v_j$, j = 1 to 30). Feature point numbers 1 to 15 correspond to feature points 401 to 415 in the present embodiment. For example, elements $v_1$ and $v_2$ of the feature point vector are the x and y coordinate values of feature point 401, respectively. The feature point vector V is defined by equation (2), where T denotes transposition and f denotes the number of feature points.

$$V = (v_1, v_2, v_3, \ldots, v_{2 \times f})^{T} \tag{2}$$
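The concatenation in step S701, and the reverse operation used later to read coordinates back out of a vector, might look like the sketch below. The helper names `to_feature_vector` and `from_feature_vector` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_feature_vector(points: Sequence[Point]) -> List[float]:
    """Concatenate (x_i, y_i) pairs into V = (x1, y1, x2, y2, ...)."""
    vector: List[float] = []
    for x, y in points:
        vector.extend((x, y))
    return vector


def from_feature_vector(vector: Sequence[float]) -> List[Point]:
    """Split a vector of length 2*f back into f (x, y) pairs."""
    if len(vector) % 2 != 0:
        raise ValueError("vector length must be even")
    return [(vector[k], vector[k + 1]) for k in range(0, len(vector), 2)]
```

With 15 input points the resulting vector has 30 elements, matching the dimension stated for the embodiment.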

In step S702, which subtracts the average vector, and step S703, which performs the projection operation, a projection vector P is calculated using the average vector A (707) and the projection matrix E (708), respectively. The projection vector P is calculated by equations (3) to (5) from the projection matrix E and the vector obtained by subtracting the average vector A from the feature point vector V. The projection matrix E and the average vector A are calculated in advance by principal component analysis from feature point vectors of a large number of face images (learning feature vectors). The projection matrix E is therefore composed of eigenvectors. Each learning feature vector is generated by concatenating the correct feature point position coordinates of a face image in the same manner.

$$P = E^{T}(V - A) \tag{3}$$

$$A = (a_1, a_2, a_3, \ldots, a_{2 \times f})^{T} \tag{4}$$

$$E = (u_1, u_2, \ldots, u_p) \tag{5}$$
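Equation (3) amounts to one subtraction followed by p dot products. The sketch below is written in plain Python under the assumption that E is stored as a list of its columns; the function name `project` and that storage layout are choices made for this example, not details from the description.

```python
from typing import List, Sequence


def project(v: Sequence[float], a: Sequence[float],
            basis: Sequence[Sequence[float]]) -> List[float]:
    """Compute P = E^T (V - A).

    basis holds the columns u_1 .. u_p of E, each with the same length as v.
    """
    if len(v) != len(a) or any(len(u) != len(v) for u in basis):
        raise ValueError("v, a and every basis vector must have equal length")
    # Step S702: subtract the average vector.
    centered = [vi - ai for vi, ai in zip(v, a)]
    # Step S703: one dot product per basis vector gives one element of P.
    return [sum(uk * ck for uk, ck in zip(u, centered)) for u in basis]
```

For the embodiment, `v` and `a` would have 30 elements and `basis` would hold 8 vectors, so the result has 8 elements.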

$u_1, u_2, \ldots, u_p$ are orthonormal vectors (eigenvectors) of dimension 2 × f obtained by principal component analysis. In the present embodiment each is a 30-dimensional vector. p is the dimension of the projection vector and is 8 in the present embodiment. That is, the projection matrix E is formed from the eight orthogonal vectors with the largest corresponding eigenvalues among those obtained by principal component analysis. The projection matrix E and the average vector A are stored in advance in the ROM 208, the RAM 209, or the like. In steps S702 and S703, the operations of equations (3) to (5) reduce the (2 × f)-dimensional feature point vector to a p-dimensional projection vector. That is, the vector is projected onto a p-dimensional subspace.

In steps S704 and S705, which restore the dimensions, data of the original feature point vector dimension (that is, coordinate positions) are restored from the projection vector P. The restored vector V' is calculated from the projection matrix E and the average vector A by equation (6).

$$V' = EP + A \tag{6}$$
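Substituting equation (3) into equation (6) makes the combined effect of the two operations explicit. The short derivation below relies only on the orthonormality of $u_1, \ldots, u_p$; the symbol $I_p$ (the p × p identity matrix) is introduced here for illustration and does not appear in the original text.

```latex
\begin{aligned}
V' &= E P + A = E E^{T} (V - A) + A, \\
E^{T} E &= I_p \;\Rightarrow\; (E E^{T})^{2} = E \,(E^{T} E)\, E^{T} = E E^{T}.
\end{aligned}
```

Since $E E^{T}$ is symmetric and idempotent, it is an orthogonal projector. $V' - A$ is therefore the component of $V - A$ that lies in the p-dimensional subspace spanned by $u_1, \ldots, u_p$, and the component outside that subspace is discarded.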

In step S706, the corrected coordinate data are extracted from the back-projected restored vector V'. Through steps S701 to S706, the vector formed by concatenating all feature point position data is projected onto the dimension-reduced subspace and then back-projected, which corrects statistical outliers. That is, outliers (erroneous detections) that cannot be expressed in the subspace are corrected. In this way, the geometric arrangement is corrected based on the positional relationship among the feature points, and erroneous detections such as the one shown in FIG. 6 are corrected. This processing is also disclosed in Non-Patent Document 1 and elsewhere.
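Steps S702 to S705 taken together can be sketched as a single function. This is a toy illustration under stated assumptions: E is again stored as a list of orthonormal column vectors, and the name `correct_by_subspace` is hypothetical. In practice E and A would come from principal component analysis of training data, which this sketch does not perform.

```python
from typing import List, Sequence


def correct_by_subspace(v: Sequence[float], a: Sequence[float],
                        basis: Sequence[Sequence[float]]) -> List[float]:
    """Return V' = E E^T (V - A) + A for orthonormal columns in basis."""
    if len(v) != len(a) or any(len(u) != len(v) for u in basis):
        raise ValueError("v, a and every basis vector must have equal length")
    centered = [vi - ai for vi, ai in zip(v, a)]
    # Projection, equation (3): P = E^T (V - A)
    p = [sum(uk * ck for uk, ck in zip(u, centered)) for u in basis]
    # Back-projection, equation (6): V' = E P + A
    restored = list(a)
    for coeff, u in zip(p, basis):
        for k, uk in enumerate(u):
            restored[k] += coeff * uk
    return restored
```

A configuration that already lies in the subspace passes through unchanged, while an inconsistent one is pulled to the nearest configuration the subspace can express. Note that this adjustment is global: a single badly detected point can also shift the restored positions of correctly detected points.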

As described above, step S804, which determines the feature amount calculation positions, absorbs the differences between the feature point position detection processes. That is, it provides a position reference that is common to the subsequent processing regardless of the number of input feature points.

Next, in steps S805 to S807, face recognition processing is executed based on the positions obtained in step S804. Any of the various previously proposed methods may be used for the recognition processing. First, in step S805, a plurality of local regions are cut out with reference to the processing result of step S804, their dimensions are compressed by an orthogonal transform or the like, and the compressed data are used as feature amounts. For example, a plurality of feature amounts are calculated from local regions centered on the outer and inner corners of the eyes.

Next, in step S806, a plurality of registered data items are read from the RAM 209. In step S807, the similarity to each registrant is calculated by a correlation operation between the registered data and the feature amounts obtained in step S805. A plurality of feature amounts are calculated with reference to the feature point positions. A final similarity is calculated by integrating the correlation values obtained for the individual feature amounts, and thresholding this final similarity determines whether the image data being processed shows the registrant.

In step S807, the plurality of registered data items read in step S806 (the registered data group) are compared with the processing result of step S805. If the final similarity for a registrant is the highest and is equal to or greater than a predetermined threshold, the face in the recognition target image is determined to be that registrant. Consequently, if even one registered data item of low reliability (for example, one whose feature point positions are unreliable) is mixed into the registered data group, the overall recognition accuracy may decrease.
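The decision logic of steps S806 and S807 can be illustrated with the sketch below. The description leaves the correlation measure and the integration rule open, so this example assumes normalized correlation (cosine similarity) per local region and a plain mean as the integration; the names `correlation`, `final_similarity`, and `identify` are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Features = List[Sequence[float]]  # one feature vector per local region


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized correlation (cosine similarity); 0.0 if either is all-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def final_similarity(query: Features, registered: Features) -> float:
    """Integrate per-region correlations; using the mean is an assumption."""
    if not query or len(query) != len(registered):
        raise ValueError("both sides need the same non-zero number of regions")
    scores = [correlation(q, r) for q, r in zip(query, registered)]
    return sum(scores) / len(scores)


def identify(query: Features, gallery: Dict[str, Features],
             threshold: float) -> Optional[str]:
    """Return the best-matching registrant, or None if below the threshold."""
    best_name: Optional[str] = None
    best_score = float("-inf")
    for name, registered in gallery.items():
        score = final_similarity(query, registered)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Because the winner is the single highest score across the whole gallery, one unreliable registered entry that happens to score highly can displace the correct registrant, which is the risk the paragraph above points out.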

In step S808, the determination result is recorded in the RAM 209. In the present embodiment, the number and types of local regions for which feature amounts are calculated are the same in every recognition operation mode. That is, steps S805 to S807 always perform recognition with the same processing regardless of the recognition operation mode.

Next, the registration processing will be described. When registration processing is selected in step S800, the CPU 207 executes high-accuracy feature point position detection processing in step S809. This step is the same as step S803 of the recognition processing and detects 15 feature point positions from the face image. The subsequent steps S810 and S811 are the same as steps S804 and S805 of the recognition processing, respectively.

In step S812, the obtained feature amounts are stored in the RAM 209 or the like together with personal information (such as a name) corresponding to the face image data being processed. Alternatively, they are stored in a recording device such as a hard disk (not shown). Because registration processing has few processing-time constraints (real-time processing is rarely required), a large number of feature point positions are always detected by the high-accuracy feature point position detection processing. As described above, detecting many feature points improves the reliability of the position data obtained in step S804, which in turn improves the reliability of the registered data.

In the present embodiment, feature point positions are detected by feature point position detection processes whose processing load differs according to the recognition operation mode. Recognition processing is then executed using feature amounts calculated on a common basis regardless of the recognition operation mode, together with registered data generated from high-accuracy feature point positions. That is, because the feature amount calculation position determination processing absorbs the differences between the feature point position detection processes, the registered data can always be generated with high accuracy.

As described above, according to the present embodiment, feature point positions are always detected with high accuracy at registration time. This reduces the loss of recognition performance that occurs when recognition processing is accelerated by using a simpler feature point position detection process.

(Second Embodiment)
FIG. 9 is a flowchart explaining the operation of the present embodiment. This embodiment also runs on the apparatus shown in FIG. 2, and the flowchart of FIG. 9 shows the operation of the CPU 207. In the following description, the face image data are assumed to be already stored in the RAM 209.

In the present embodiment, the feature point position detection method differs according to the processing mode, and the CPU 207 determines the processing mode in step S900. During recognition processing, high-speed feature point position detection processing is executed in step S901. That is, recognition processing always runs at high speed. The content of step S901 is the same as that of step S802 in FIG. 8, and the positions of six feature points are detected.

Next, in step S902, the feature amount calculation positions are determined from the processing result of step S901. After that, feature amount calculation processing (step S903), registered data reading processing (step S904), determination processing (step S905), and recognition result recording processing (step S906) are executed in sequence. Steps S902 to S906 are the same as steps S804 to S808, respectively. That is, 15 feature point positions are determined in step S902 from the six detected feature points, and recognition processing is executed based on those positions. In other words, step S902 absorbs the difference in the number of detected feature points.

During registration processing, on the other hand, 15 feature point positions are detected in step S907, after which feature amount calculation processing (step S908), registered data reading processing (step S909), and registration data generation processing (step S910) are executed. Steps S908 to S910 are the same as steps S810 to S812, respectively.

As described above, according to the present embodiment, feature points are detected by feature point position determination processes whose processing load differs according to the processing mode, and recognition processing is executed with common feature amounts and registered data. Detecting feature point positions with higher accuracy at registration time improves recognition performance.
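The mode-dependent choice of detector in the second embodiment (steps S900, S901, and S907) can be sketched as a simple dispatch. The function name `detect_for_mode`, the string mode labels, and the callable-based detector interface are assumptions made for this illustration; the detectors themselves are not modeled.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[float, float]
Detector = Callable[[object], Dict[int, Point]]


def detect_for_mode(image: object, mode: str,
                    fast_detector: Detector,
                    precise_detector: Detector) -> Dict[int, Point]:
    """Pick the feature point detector according to the processing mode."""
    if mode == "recognition":
        # Recognition always uses the low-load detector (six points).
        return fast_detector(image)
    if mode == "registration":
        # Registration always uses the high-accuracy detector (15 points).
        return precise_detector(image)
    raise ValueError(f"unknown processing mode: {mode!r}")
```

Whichever detector runs, its output would then go through the same position determination step, so the later feature amount calculation does not need to know which mode was used.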

(Third Embodiment)
FIG. 10 is a flowchart explaining the operation of the present embodiment. This embodiment runs on the image recognition apparatus shown in FIG. 11, and the flowchart of FIG. 10 shows the operation of the CPU 207. The image recognition apparatus shown in FIG. 11 is the image recognition apparatus shown in FIG. 2 with a network interface 212 added. It is also connected over a network to a server device 213. Elements 201 to 211 in FIG. 11 are the same as in FIG. 2.

In the present embodiment, the recognition operation is processed by the image recognition apparatus, and the generation of registered data is processed by the server device 213. In the following description, the face image data are assumed to be already stored in the RAM 209.

The operation of the present embodiment is described below with reference to FIG. 10. In the present embodiment, the feature point position detection method differs according to the processing mode. First, the CPU 207 determines the processing mode in step S1000. During recognition processing, high-speed feature point position detection processing is executed in step S1001. The content of this step is the same as that of step S802 in FIG. 8, and the positions of six feature points are detected.

Next, in step S1002, the feature amount calculation positions are determined from the processing result of step S1001. After that, feature amount calculation processing (step S1003), registered data reading processing (step S1004), determination processing (step S1005), and recognition result recording processing (step S1006) are executed. Steps S1002 to S1006 are the same as steps S804 to S808, respectively. That is, 15 feature point positions are determined in step S1002 from the six detected feature points, and recognition processing is executed with reference to those positions. In other words, step S1002 absorbs the difference in the number of detected feature points.

During registration processing, on the other hand, the face image data and information identifying the individual in the face image are transferred to the server device 213 via the network interface 212 in step S1007. When the server device 213 receives the face image data in step S1008, it starts high-accuracy feature point position detection processing on the data in step S1009. As in step S809, detection is performed for 15 feature points.

Next, in step S1010, reference position determination processing for feature amount calculation is executed using the feature point positions obtained in step S1009. The processing in steps S1010 and S1011 is the same as that in steps S810 and S811. In step S1012, the obtained feature amounts are transmitted to the image recognition apparatus as registered data. In step S1013, the image recognition apparatus receives the registered data generated by the server device 213 via the network interface 212. In step S1014, it stores the data in the RAM 209 as registered data. During recognition processing, the determination processing (step S1005) is executed using the registered data stored here.

In the present embodiment, feature point positions are detected by feature point position detection processes whose processing load differs according to the processing mode, and the high-load feature point position detection processing is performed by the server device 213. Thus, according to the present embodiment, detecting feature point positions with higher accuracy at registration time improves recognition performance. Even so, the processing load at registration time can be kept from increasing.

(Fourth Embodiment)
FIG. 12 is a flowchart explaining the operation of the present embodiment. This embodiment runs on the image recognition apparatus shown in FIG. 11. Only the differences from the third embodiment are described here. The recognition processing is the same as in the third embodiment. That is, steps S1201 to S1206 are the same as steps S1001 to S1006.

During registration processing, the face image data are transferred to the server device 213 via the network interface 212 in step S1207. When the server device 213 receives the face image data in step S1208, it starts high-accuracy feature point position detection processing on the data in step S1209. As in step S809, detection is performed for 15 feature points. In step S1210, the data for the 15 detected feature points are transmitted to the image recognition apparatus over the network.

In step S1211, the image recognition apparatus receives, via the network interface 212, the feature point positions detected by the server device 213. In step S1212, position determination processing for feature amount calculation is executed using the feature point positions obtained in step S1209. Steps S1212 and S1213 are the same as steps S810 and S811. In step S1204, the obtained feature amounts are stored in the RAM 209 as registered data. During recognition processing, the determination processing (step S1205) is executed using the registered data stored here.

In the present embodiment, feature point positions are detected by feature point position detection processes whose processing load differs according to the processing mode, and the high-load feature point position detection processing is performed by the server device 213. In addition, the feature amounts are calculated on the terminal device side, which reduces the processing load on the server device 213. Thus, according to the present embodiment, detecting feature point positions with higher accuracy at registration time improves recognition performance. Even so, the processing load at registration time can be kept from increasing.

(Other Embodiments)
The first embodiment described the case where the recognition operation mode (high-speed mode or high-accuracy mode) is selected according to the recognition result of the previous frame, but the invention is not limited to this case. For example, the high-accuracy mode and the high-speed mode may be switched according to the number of faces in the target image 31, or the user may set the mode through a predetermined user interface. Various other cases are conceivable. When the mode is set through a user interface, for example, step S801 determines the recognition operation mode based on the information specified by the user.
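The alternative ways of choosing the recognition operation mode can be sketched as below. The description does not say which mode goes with which face count, so the rule used here (few faces select high-accuracy mode, many faces select high-speed mode), the default threshold of 2, and all names are assumptions made purely for illustration.

```python
from enum import Enum
from typing import Optional


class RecognitionMode(Enum):
    HIGH_SPEED = "high_speed"
    HIGH_ACCURACY = "high_accuracy"


def select_recognition_mode(num_faces: int,
                            user_choice: Optional[RecognitionMode] = None,
                            max_faces_for_accuracy: int = 2) -> RecognitionMode:
    """Choose the recognition operation mode for step S801.

    A user setting, if present, takes priority. Otherwise images with many
    faces fall back to high-speed mode. Both the threshold and the direction
    of this rule are assumptions of this sketch.
    """
    if num_faces < 0:
        raise ValueError("num_faces must be non-negative")
    if user_choice is not None:
        return user_choice
    if num_faces <= max_faces_for_accuracy:
        return RecognitionMode.HIGH_ACCURACY
    return RecognitionMode.HIGH_SPEED
```

Since the later steps are common to both modes, swapping in a different selection rule would only change this function.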

The embodiments above described recognizing a specific person from a face image within an image, but the present invention is not limited to this. It can be used in various image recognition apparatuses that identify a predetermined object based on the positions of feature points. The embodiments above also described the case where the feature point position detection processes differ in the number of feature points detected, but they may instead differ in detection method. Furthermore, although the embodiments above described implementing the processing of the present invention in software, it can also be implemented in dedicated hardware.

The present invention can also be realized by executing the following processing: software (a program) that implements the functions of the embodiments described above is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads and executes the program.

100 Face image data acquisition processing
102 First feature point position detection processing
103 Second feature point position detection processing
104 Selection processing
105 Feature amount calculation position determination processing
106 Feature amount calculation processing
111 Processing mode determination processing

Claims

1. An image recognition method comprising:
an image data acquisition step of acquiring image data;
a first feature point position detection step of detecting positions of feature points from the acquired image data;
a second feature point position detection step of detecting, from the acquired image data, positions of more feature points than the first feature point position detection step;
a selection step of selecting processing by the first feature point position detection step or the second feature point position detection step;
a feature amount calculation position determination step of determining calculation positions of feature amounts based on the result of the selected processing;
a feature amount calculation step of calculating feature amounts based on the result of the feature amount calculation position determination step;
a determination step of determining a processing mode;
a registration step of, when the processing mode determined in the determination step is registration processing, selecting the processing by the second feature point position detection step and generating registration data from the feature amounts calculated based on the result of the second feature point position detection step and the result of the feature amount calculation position determination step; and
a recognition step of, when the processing mode determined in the determination step is recognition processing, selecting the processing by the first feature point position detection step and recognizing the image data based on the registration data and on the feature amounts calculated based on the result of the first feature point position detection step and the result of the feature amount calculation position determination step,
wherein, when the processing by the first feature point position detection step is selected in the selection step, the feature amount calculation position determination step determines the calculation positions of the feature amounts based on the feature points detected in the first feature point position detection step and on feature points obtained by substituting predetermined coordinate values for the positions of feature points that are detected in the second feature point position detection step but not in the first feature point position detection step.

2. The image recognition method according to claim 1, wherein the predetermined coordinate values are average coordinate values corresponding to an average vector calculated by principal component analysis.

3. The image recognition method according to claim 1, wherein the feature amount calculation position determination step performs geometric correction processing on the positions of the feature points to determine the calculation positions of the feature amounts.

4. An image recognition apparatus comprising:
image data acquisition means for acquiring image data;
first feature point position detection means for detecting positions of feature points from the acquired image data;
second feature point position detection means for detecting, from the acquired image data, positions of more feature points than the first feature point position detection means;
selection means for selecting processing by the first feature point position detection means or the second feature point position detection means;
feature amount calculation position determination means for determining calculation positions of feature amounts based on the result of the selected processing;
feature amount calculation means for calculating feature amounts based on the result of processing by the feature amount calculation position determination means;
determination means for determining a processing mode;
registration means for, when the processing mode determined by the determination means is registration processing, selecting the processing by the second feature point position detection means and generating registration data from the feature amounts calculated based on the result of processing by the second feature point position detection means and the result of processing by the feature amount calculation position determination means; and
recognition means for, when the processing mode determined by the determination means is recognition processing, selecting the processing by the first feature point position detection means and recognizing the image data based on the registration data and on the feature amounts calculated based on the result of processing by the first feature point position detection means and the result of processing by the feature amount calculation position determination means,
wherein, when the processing by the first feature point position detection means is selected by the selection means, the feature amount calculation position determination means determines the calculation positions of the feature amounts based on the feature points detected by the first feature point position detection means and on feature points obtained by substituting predetermined coordinate values for the positions of feature points that are detected in the processing by the second feature point position detection means but not in the processing by the first feature point position detection means.

5. A program for causing a computer to execute each step of the image recognition method according to any one of claims 1 to 3.