JP2019096146A

JP2019096146A - Image identification device, image identification method, computer program, and storage medium

Info

Publication number: JP2019096146A
Application number: JP2017226140A
Authority: JP
Inventors: Yuji Kaneda; Hiroshi Sato; Toshiaki Nakano; Atsuo Nomoto; Daisuke Nishino
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-11-24
Filing date: 2017-11-24
Publication date: 2019-06-20

Abstract

To provide an image identification device that achieves highly accurate face authentication. SOLUTION: A registration face image contained in an image acquired by an image acquisition unit 110 is registered in a face authentication dictionary storage unit 150, and a pair dictionary consisting of a low-resolution dictionary and a high-resolution dictionary, created by learning from the registration face image, is stored in a resolution-enhancement dictionary storage unit 190. The device generates a high-resolution image by using the pair dictionary to enhance the resolution of an authentication face image contained in an image acquired by the image acquisition unit 110, and identifies the authentication face image on the basis of the generated high-resolution image and the registration face image. SELECTED DRAWING: Figure 2

Description

The present invention relates to image identification technology.

Face authentication technology, which extracts a person's face image from an image captured by an imaging device such as a camera and identifies the person from that face image, has come into wide use. Non-Patent Document 1 discloses a technique that extracts a feature called an LBP (Local Binary Pattern) feature from an input luminance face image and identifies the person according to that feature. In this technique, the person is identified by comparing the LBP feature of the input face with features extracted from luminance face images registered in advance. Such face authentication technology has so far been used under relatively favorable imaging conditions, such as a short distance between the imaging device and the subject, as seen in the auto shutter of digital cameras and in entrance and exit management.

In recent years, research has been conducted on face authentication for low-resolution images, such as small, distant faces or blurred faces captured by surveillance cameras. One approach to face authentication with low-resolution images is to enhance the image resolution. For example, Non-Patent Document 2 discloses a technique called face hallucination, which approximates one person's face image by a linear sum of other people's face images. Non-Patent Document 3 discloses a technique that uses hallucination to perform face authentication on resolution-enhanced images.

・Description of the hallucination technique
The hallucination technique generates a high-resolution face image from a low-resolution face image. In outline, the input low-resolution face image is approximated using high-resolution face images of other people. The details are described with reference to FIG. 16, an explanatory diagram of the hallucination technique.
As shown in FIG. 16, a resolution-enhancement dictionary D (Equation 1), in which a high-resolution dictionary and a low-resolution dictionary are paired, is prepared in advance by learning from face images of various people. The dictionary D stores a plurality of such high-resolution and low-resolution dictionary pairs, as shown in Equation 2. The first pair in the dictionary D, the low-resolution dictionary of the first pair, and the high-resolution dictionary of the first pair are each written as in Equation 3. The relationship among the resolution-enhancement dictionary, the low-resolution dictionaries, and the high-resolution dictionaries is given by Equation 4.

When luminance images are used for the resolution-enhancement dictionary D, high-resolution luminance images and low-resolution luminance images are stored as pairs. A high-resolution luminance image is, for example, an image sharp enough to identify whose face appears in the video. A low-resolution luminance image is an image in which it is difficult to identify whose face appears, because the face is too small or blurred.
Next, a low-resolution partial image is cut out from the input low-resolution face image I_L. This partial image is approximated by a linear sum of the low-resolution dictionaries in the high-resolution and low-resolution dictionary pairs (Equation 2) stored in the resolution-enhancement dictionary D. Equation 5 is the result of this approximation. This yields the low-resolution dictionaries that approximate the low-resolution partial image and their coupling coefficients α (α1, α2, α3, ...). A high-resolution partial image is then generated from the high-resolution dictionaries corresponding to those low-resolution dictionaries and the same coupling coefficients α. Equation 6 represents the generated high-resolution partial image.
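The two steps above (fit coefficients on the low-resolution side, reuse them on the high-resolution side) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain least squares instead of the sparse fit of Non-Patent Document 2, and the function names `solve_linear` and `hallucinate_patch` and the flat-list patch layout are hypothetical, not taken from the patent.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def hallucinate_patch(low_patch, low_atoms, high_atoms):
    """Approximate low_patch by a linear sum of low_atoms, then apply the
    same coefficients to the paired high_atoms (assumed least-squares fit)."""
    k = len(low_atoms)
    # Normal equations: G alpha = c with G[i][j] = <l_i, l_j>, c[i] = <l_i, x>.
    gram = [[dot(low_atoms[i], low_atoms[j]) for j in range(k)] for i in range(k)]
    rhs = [dot(atom, low_patch) for atom in low_atoms]
    alpha = solve_linear(gram, rhs)
    size = len(high_atoms[0])
    high_patch = [sum(a * atom[i] for a, atom in zip(alpha, high_atoms))
                  for i in range(size)]
    return alpha, high_patch


# Toy pair dictionary: two 2-pixel low-resolution atoms and their 4-pixel partners.
low_atoms = [[1.0, 0.0], [1.0, 1.0]]
high_atoms = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
alpha, high_patch = hallucinate_patch([3.0, 2.0], low_atoms, high_atoms)
```

The sketch assumes the low-resolution atoms are linearly independent; a practical dictionary is usually overcomplete, which is one reason the cited work adds a sparsity constraint.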

The resolution-enhancement dictionary D may use, instead of luminance images, basis images common to face images, such as edges. Examples of basis images include eigenfaces obtained by principal component analysis, as in Non-Patent Document 4.

Non-Patent Document 1: T. Pajdla and J. Matas, "Face Recognition with Local Binary Patterns", ECCV, pp. 469-481, 2004
Non-Patent Document 2: K. Huang, R. Hu, "Face hallucination via K-selection mean constrained sparse representation", ICIP, pp. 882-885, 2012
Non-Patent Document 3: B. Li, H. Chang, "Hallucinating Facial Images and Features", ICPR, pp. 1-4, 2008
Non-Patent Document 4: M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, 1991

In the face hallucination technique of Non-Patent Document 2, which approximates a face by a linear sum of other people's faces, the resolution-enhancement dictionary D prepared in advance is critical to the quality of the enhancement. The dictionary D must be expressive enough to approximate the faces of various people adequately from limited information, such as small or blurred faces, even when noise has been added. Otherwise, the enhanced face image does not resemble the actual person, and when it is passed to face authentication it may be judged to be a different person. In practice, however, it is very difficult to prepare a dictionary D that both represents the faces of various people from such limited information and copes with noise.

The present invention has been made in view of the above problem, and its object is to provide an image identification device that achieves highly accurate face authentication even for face images that are small because the subject is distant, blurred, or noisy.

The image identification device of the present invention comprises: image acquisition means for acquiring an image; registration means for registering a registration face image contained in the image acquired by the image acquisition means; learning means for learning from the registration face image to generate a pair dictionary of a low-resolution dictionary and a high-resolution dictionary; storage means for storing the generated pair dictionary; resolution-enhanced image generation means for generating a high-resolution image by using the pair dictionary to enhance the resolution of an authentication face image contained in an image acquired by the image acquisition means; and face identification means for identifying the authentication face image on the basis of the high-resolution image generated by the resolution-enhanced image generation means and the registration face image registered in the registration means.

According to the present invention, highly accurate face authentication can be achieved even for face images that are small because the subject is distant, blurred, or noisy.

FIG. 1 is an explanatory diagram of the kinds of faces used when learning a resolution-enhancement dictionary.
FIG. 2 is a functional block diagram of the image identification device.
FIG. 3 is a flowchart of the face identification processing.
FIG. 4 is a flowchart of the face identification processing.
FIG. 5 is a detailed configuration diagram of the resolution-enhancement dictionary learning data generation unit.
FIG. 6 is a flowchart of the resolution-enhancement dictionary learning data generation processing.
FIG. 7 is an explanatory diagram of cutout images.
FIG. 8 is an explanatory diagram of cutout images.
FIG. 9 is an explanatory diagram of low-resolution images.
FIG. 10 is an explanatory diagram of noise addition when video compression noise is assumed.
FIG. 11 is a diagram showing an example of learning data divided into a plurality of blocks.
FIG. 12 is an explanatory diagram of a pair dictionary.
FIG. 13 is an explanatory diagram of a pair dictionary.
FIG. 14 is an explanatory diagram of the effect of learning the resolution-enhancement dictionary using only registered face images.
FIG. 15 is an explanatory diagram of the effect of the resolution-enhancement dictionary.
FIG. 16 is an explanatory diagram of the hallucination technique.

Embodiments are described in detail below with reference to the drawings.

From the standpoint of face authentication, the kinds of faces that must be approximated are limited. Face authentication determines whether an input face image shows the same person as a face image registered in advance, so there is no need to learn a resolution-enhancement dictionary D that can represent every possible face. It suffices to prepare a dictionary D that adequately represents only the registered faces. In the present embodiment, the registered face images used for face authentication are used to build the dictionary D, producing a dictionary that adequately approximates just those registered faces. In addition, to cope with noise caused by video compression, which cannot be reproduced by image processing alone, a moving image is generated from the registered face image, and face images cut out of that moving image are used to learn the dictionary D.

FIG. 1 is an explanatory diagram of the kinds of faces used when learning the resolution-enhancement dictionary D for the hallucination technique. In the conventional technique, the dictionary D is learned from face images of people other than those registered for face authentication (for example, persons X, Y, and Z), such as images collected in advance from the Internet. Face authentication then uses that dictionary D to enhance the resolution of the input face image.

In contrast, the present embodiment either learns one common resolution-enhancement dictionary D from the registered face images (for example, persons A and B) (first method) or learns a separate dictionary D for each individual from the registered face images (second method). As described above, face authentication only needs to determine whether the input shows a registered person, so learning a dictionary D that approximates only the registered face images is intended to improve authentication accuracy.

(Configuration)
FIG. 2 is a functional block diagram of the image identification device of the present embodiment. The device is realized by an information processing device that includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory), none of which are shown. The information processing device either incorporates an imaging device such as a camera or is connected to one. The CPU executes a computer program stored in the ROM, using the RAM as a work area, which causes the information processing device to function as the image identification device. For face authentication, the device functions as an image acquisition unit 110, a face detection unit 120, a face organ detection unit 130, a feature extraction unit 140, a face authentication dictionary storage unit 150, and a face identification unit 160. For resolution enhancement, it also functions as a resolution-enhancement dictionary learning data generation unit 170, a resolution-enhancement dictionary learning unit 180, a resolution-enhancement dictionary storage unit 190, and a resolution-enhanced image generation unit 200. At least part of each functional block may be implemented in hardware.

(Processing)
With the functional blocks described above, the image identification device operates in a first operation mode, in which it registers face images and learns the resolution-enhancement dictionary, and a second operation mode, in which it enhances image resolution and performs face authentication. FIGS. 3 and 4 are flowcharts of the face identification processing in the first and second operation modes.

The image identification device first determines whether the operation mode is the first operation mode (S1000). The operation mode is set by the user, for example with an input device (not shown). If it is the first operation mode (S1000: Y), the device starts operating in the first operation mode. If it is the second operation mode (S1000: N), the device starts operating in the second operation mode shown in FIG. 4.

(First operation mode: image registration and resolution-enhancement dictionary learning mode)
The first operation mode registers the face images used for face authentication and learns the dictionary used to enhance image resolution. The registration image used to register a face image is assumed here to contain the face of the person being registered and to be a high-resolution image captured under good lighting. A high-resolution image is an image whose resolution is sufficient to identify the person from the face image.

The image acquisition unit 110 acquires a registration image from the imaging device (S1100). The imaging device includes a light-collecting element such as a lens, an image sensor that converts light into an electrical signal, and an AD converter that converts the analog signal into a digital signal, and it generates digital image data representing the captured image. The image sensor is, for example, a CMOS (Complementary Metal Oxide Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. The image acquisition unit 110 acquires the registration image as digital image data. By applying thinning or similar processing, it can also acquire the registration image converted to, for example, VGA (640 × 480 pixels) or QVGA (320 × 240 pixels). Besides the imaging device, the image acquisition unit 110 may acquire the registration image via a storage medium such as flash memory. In either case, the image acquisition unit 110 acquires the image from an external device. It sends the acquired registration image to the face detection unit 120 and the resolution-enhancement dictionary learning data generation unit 170.

The face detection unit 120 detects the center positions of the face, the left and right eyes, the mouth, and so on in the registration image (S1200). It performs this processing using, for example, the technique disclosed in P. Viola, M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", in Proc. of CVPR, vol. 1, pp. 511-518, December 2001.
From the detected center positions of the face, eyes, and mouth, the face detection unit 120 uses an affine transformation or the like to generate a first normalized image, which is a face image in which the face has a predetermined size and is upright (S1201). The size of the face can be defined, for example, as the Euclidean distance between the left and right eyes. In the first operation mode, the face detection unit 120 sends the generated first normalized image to the face organ detection unit 130.
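As a rough illustration of the normalization in S1201, the sketch below derives a 2 × 3 affine (similarity) matrix from the two eye positions so that the eyes end up level and a chosen distance apart. The function names and the choice to place the eye midpoint at the origin are assumptions for illustration; the patent does not specify the exact transform.

```python
import math


def eye_alignment_matrix(left_eye, right_eye, target_distance):
    """Return a 2x3 affine matrix that rotates and scales the image so the
    eye line becomes horizontal with length target_distance, and moves the
    eye midpoint to the origin (an illustrative convention)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    distance = math.hypot(dx, dy)          # face size as the inter-eye distance
    scale = target_distance / distance
    angle = math.atan2(dy, dx)             # current tilt of the eye line
    cos_a = scale * math.cos(-angle)       # rotate by -angle to make it level
    sin_a = scale * math.sin(-angle)
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    tx = -(cos_a * cx - sin_a * cy)
    ty = -(sin_a * cx + cos_a * cy)
    return [[cos_a, -sin_a, tx], [sin_a, cos_a, ty]]


def apply_affine(matrix, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


matrix = eye_alignment_matrix((0.0, 0.0), (3.0, 4.0), target_distance=10.0)
normalized_right_eye = apply_affine(matrix, (3.0, 4.0))
```

Warping the whole image would apply the inverse of this matrix to each output pixel and interpolate; that step is omitted here.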

The face organ detection unit 130 detects, in the first normalized image, the center positions of finer feature points such as the outer and inner corners of the eyes (S1300). The face organ detection unit 130 performs this processing using, for example, the technique disclosed in T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active Shape Models - Their Training and Application", Computer Vision and Image Understanding, Vol. 61, No. 1, January, pp. 38-59, 1995.
Using the detected feature point positions, the face organ detection unit 130 generates a second normalized image, which is a face image in which the face has a predetermined size and is upright (S1301). In the first operation mode, the face organ detection unit 130 sends the generated second normalized image to the feature extraction unit 140 and the resolution-enhancement dictionary learning data generation unit 170.

The feature extraction unit 140 sets a feature extraction region in the second normalized image and extracts LBP features, as described in Non-Patent Document 1, from that region (S1400). The feature extraction unit 140 then stores in the face authentication dictionary storage unit 150 a dictionary used for face authentication, in which the extracted LBP features are associated with a personal ID identifying the person (S1500).
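For readers unfamiliar with LBP, the following is a minimal sketch of the basic 8-neighbor LBP code and its histogram over a region. It illustrates the general idea only; the neighbor ordering, the `>=` comparison, and the function names are assumptions, and the cited work uses further refinements (such as region-wise histograms) that are not shown.

```python
# Clockwise 8-neighborhood starting at the top-left pixel (assumed ordering).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """Return the 8-bit LBP code of the pixel at (y, x): each bit records
    whether the corresponding neighbor is at least as bright as the center."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBOR_OFFSETS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a 256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, y, x)] += 1
    return histogram


region = [[10, 20, 30, 40],
          [15, 25, 35, 45],
          [20, 30, 40, 50],
          [25, 35, 45, 55]]
feature = lbp_histogram(region)
```

A stored dictionary entry could then pair such a histogram with a personal ID, and matching would compare histograms with a distance measure chosen by the implementer.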

The processing of S1000 to S1500 described above is the registration processing generally performed in face authentication. The face authentication dictionary storage unit 150 serves as the registration unit in which registration face images are registered. In the processing that follows, the image identification device uses the registration face images to learn the resolution-enhancement dictionary D needed for resolution enhancement.

Based on the registration image received from the image acquisition unit 110 and the second normalized image received from the face organ detection unit 130, the resolution-enhancement dictionary learning data generation unit 170 generates the data needed to learn the resolution-enhancement dictionary D (resolution-enhancement dictionary learning data) (S1700). Specifically, it generates from the registration image pair images in which a high-resolution image and a low-resolution image correspond one to one. FIG. 5 is a detailed configuration diagram of the resolution-enhancement dictionary learning data generation unit 170, which includes an image cutout unit 171, a reduced image generation unit 172, a blur addition unit 173, and a noise addition unit 174. FIG. 6 is a flowchart of the resolution-enhancement dictionary learning data generation processing of S1700.

The image cutout unit 171 generates at least one cutout image from the second normalized image generated in S1301. FIGS. 7 and 8 are explanatory diagrams of cutout images. In the example of FIG. 7, the image cutout unit 171 generates a cutout image 17101 in which the center of the face region coincides with the center of the cutout image, and a cutout image 17102 in which the center of the face region lies to the left of the center of the cutout image. The processing of S1200 (face detection) and S1300 (face organ detection) involves some detection error, and the cutout image 17102 is generated so that the resolution-enhancement dictionary D can cope with this error. In the example of FIG. 8, so that the dictionary can handle face orientation as well as position, the image cutout unit 171 uses a 3D model to generate an obliquely sideways-facing face image 17103 from the cutout image 17101, which is a frontal face image. Processing with the 3D model is disclosed, for example, in I. Kemelmacher-Shlizerman, "3D Face Reconstruction from a Single Image Using a Single Reference Face Shape", PAMI, pp. 394-405, 2011.
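The position-shifted cutouts of FIG. 7 can be illustrated with a simple cropping helper. This is a sketch only: the function names, the list-of-rows image layout, and the rule of skipping shifts that would leave the image are assumptions, and the 3D-model-based pose synthesis of FIG. 8 is not covered.

```python
def crop(image, top, left, height, width):
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def shifted_cutouts(image, height, width, shifts):
    """Cut out one window per (dy, dx) shift, measured from the centered window.
    A shift of (0, 0) gives the centered cutout; moving the window to the right
    makes the face appear left of center, imitating a detection error.
    Shifts that would leave the image are skipped."""
    image_h, image_w = len(image), len(image[0])
    top0 = (image_h - height) // 2
    left0 = (image_w - width) // 2
    cutouts = []
    for dy, dx in shifts:
        top, left = top0 + dy, left0 + dx
        if 0 <= top <= image_h - height and 0 <= left <= image_w - width:
            cutouts.append(crop(image, top, left, height, width))
    return cutouts


normalized = [[r * 4 + c for c in range(4)] for r in range(4)]
cutouts = shifted_cutouts(normalized, 2, 2, shifts=[(0, 0), (0, 1)])
```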

In this way, the resolution-enhancement dictionary learning data generation unit 170 generates the high-resolution images needed to learn the resolution-enhancement dictionary D by cutting them out of the normalized registration image (the second normalized image) and by using a 3D model.
Next, the resolution-enhancement dictionary learning data generation unit 170 generates the low-resolution images needed to learn the dictionary D. In the present embodiment, low-resolution images are generated by reducing the image, adding blur, and adding noise. Whether blur and noise are added is set by the user, for example with an input device (not shown). A low-resolution image may therefore be generated by reduction alone, or by adding at least one of blur and noise to the reduced image. FIG. 9 is an explanatory diagram of low-resolution images.

The reduced image generation unit 172 generates at least one reduced image from the second normalized image (S1720). This processing makes it possible to enhance the resolution of small, distant faces. Various methods are available for enlarging and reducing images, such as bicubic and bilinear interpolation; the present embodiment uses bilinear interpolation. In the example of FIG. 9, a reduced image 17201 is generated from the cutout image 17101. A reduced image may also be generated from the obliquely sideways-facing face image 17103.
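A minimal sketch of bilinear reduction is shown below. The pixel-center sampling convention and the function name `resize_bilinear` are assumptions; the patent only states that bilinear interpolation is used.

```python
def resize_bilinear(image, out_h, out_w):
    """Resize a grayscale image (list of rows) with bilinear interpolation,
    sampling at pixel centers and clamping at the borders."""
    in_h, in_w = len(image), len(image[0])
    result = []
    for oy in range(out_h):
        sy = (oy + 0.5) * in_h / out_h - 0.5
        sy = min(max(sy, 0.0), in_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for ox in range(out_w):
            sx = (ox + 0.5) * in_w / out_w - 0.5
            sx = min(max(sx, 0.0), in_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


cutout = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
reduced = resize_bilinear(cutout, 2, 2)
```

For large reduction factors, plain bilinear sampling can alias, so practical pipelines often pre-filter or reduce in stages; that refinement is outside this sketch.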

The blur addition unit 173 determines whether to add blur to the reduced image 17201 generated by the reduced image generation unit 172 (S1730). If blur is to be added (S1730: Y), the blur addition unit 173 generates an image 17301 (see FIG. 9) by adding blur to the reduced image 17201 (S1731). There are various ways to add blur; the blur addition unit 173 of the present embodiment uses a Gaussian filter with a predetermined kernel size and standard deviation. This makes it possible to enhance the resolution of images that are blurred by, for example, the focal distance.
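The blur step can be sketched as a separable Gaussian filter. The odd kernel size, the border replication, and the function names are assumptions made for this illustration, not details given in the patent.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a normalized 1-D Gaussian kernel; size is assumed to be odd."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    half = len(kernel) // 2
    width = len(image[0])
    result = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xx = min(max(x + k - half, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        result.append(new_row)
    return result


def transpose(image):
    """Swap rows and columns."""
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, size, sigma):
    """Apply a 2-D Gaussian blur as a horizontal pass then a vertical pass."""
    kernel = gaussian_kernel(size, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
blurred = gaussian_blur(impulse, size=3, sigma=1.0)
```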

If blur is not added (S1730: N), or after blur has been added, the noise addition unit 174 determines whether to add noise to the reduced image 17201 (S1740). If noise is to be added (S1740: Y), the noise addition unit 174 generates an image 17401 (see FIG. 9) by adding noise to the reduced image 17201 (S1741). When sensor noise is assumed, the noise addition unit 174 adds Gaussian noise with a predetermined mean and standard deviation. FIG. 10 illustrates noise addition when, in addition to sensor noise, inter-frame video compression noise such as that of H.264 is assumed. As shown in FIG. 10, the noise addition unit 174 generates from the reduced image a plurality of converted face images in which the face position is shifted, and encodes these images with a codec such as H.264 to generate a moving image. The noise addition unit 174 then extracts still images from the generated moving image, which adds video compression noise to them.
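The two noise paths can be sketched as follows. The Gaussian part follows the description directly; for the compression path, only the generation of position-shifted frames is shown, because the encode and decode steps depend on an external codec that is deliberately left out. The function names, the 0 to 255 clamping, and the border replication are assumptions for illustration.

```python
import random


def add_gaussian_noise(image, mean, sigma, seed=None):
    """Add Gaussian noise with the given mean and standard deviation to each
    pixel, clamping to the 0..255 range (assumed 8-bit luminance)."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, pixel + rng.gauss(mean, sigma)))
             for pixel in row] for row in image]


def shifted_frames(image, shifts):
    """Return one frame per (dy, dx) shift, translating the image content and
    replicating border pixels. These frames would then be encoded with a video
    codec and decoded again to pick up compression noise (not done here)."""
    height, width = len(image), len(image[0])
    frames = []
    for dy, dx in shifts:
        frame = [[image[min(max(y - dy, 0), height - 1)]
                       [min(max(x - dx, 0), width - 1)]
                  for x in range(width)] for y in range(height)]
        frames.append(frame)
    return frames


reduced = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
noisy = add_gaussian_noise(reduced, mean=0.0, sigma=5.0, seed=0)
frames = shifted_frames(reduced, shifts=[(0, 0), (0, 1), (1, 0)])
```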

When noise is not applied (S1740: N), or after noise has been applied, the high-resolution dictionary learning data generation unit 170 ends the low-resolution image generation processing. Through this processing, the high-resolution dictionary learning data generation unit 170 generates, from the registration image, various low-resolution images such as the reduced image 17201 and the images 17301 and 17401 shown in FIG. 9.
As described above, through the processing of S1700 the high-resolution dictionary learning data generation unit 170 generates pair images, each consisting of a high-resolution image and a low-resolution image, as the learning data used for learning the high-resolution dictionary D.
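The generation of high-resolution/low-resolution pairs can be sketched as follows. This is only an illustration: reduction by block averaging, the function names `downscale` and `make_training_pairs`, and the `degradations` parameter (callables that add blur or noise) are assumptions, not details taken from the source.

```python
def downscale(image, factor):
    """Reduce a grayscale image by averaging each factor x factor pixel block."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[y * factor + i][x * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]


def make_training_pairs(high_res, factor, degradations=()):
    """Pair one high-resolution image with its reduced and further degraded versions."""
    low = downscale(high_res, factor)
    lows = [low] + [degrade(low) for degrade in degradations]
    return [(high_res, low_variant) for low_variant in lows]
```

Each returned tuple corresponds to one pair image; passing a blur function and a noise function as `degradations` yields three pairs per cut-out face image.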

When the generation of the learning data for the high-resolution dictionary D is complete, the high-resolution dictionary learning unit 180 learns the high-resolution dictionary D using that learning data (S1800). Because resolution enhancement is performed for each local region as shown in FIG. 16, the high-resolution dictionary learning unit 180 divides the learning data generated in S1700 into a plurality of blocks. FIG. 11 shows an example of learning data divided into a plurality of blocks. The high-resolution dictionary D^p for the p-th block is expressed by Equation 1.

As disclosed in, for example, Non-Patent Document 2, the high-resolution dictionary D^p for the p-th block is learned by finding the D^p and α^p that satisfy Equation 7 below, that is, the D^p and α^p that minimize the L2 norm of the difference between the p-th block image I^p and D^p α^p. The second term on the right-hand side of Equation 7 is a penalty term for preventing overfitting and is generally called an L1 regularization term.
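Equation 7 itself is not reproduced in this text. Based on the description above (an L2 reconstruction term followed by an L1 penalty term), it presumably has the standard sparse-coding form below. The weight λ and the use of the squared norm are assumptions and are not taken from the source.

```latex
\min_{D^{p},\,\alpha^{p}} \; \bigl\lVert I^{p} - D^{p}\alpha^{p} \bigr\rVert_{2}^{2} + \lambda \bigl\lVert \alpha^{p} \bigr\rVert_{1}
```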

The p-th block image I^p is the concatenation of the p-th block high-resolution image, obtained by dividing the cut-out image 17101 from the registration image into blocks, and the p-th block low-resolution image generated from the cut-out image 17101. The low-resolution images are the reduced image 17201 and the images 17301 and 17401 to which blur or noise has been added.
The high-resolution dictionary D^p is a pair dictionary in which the high-resolution dictionary and the low-resolution dictionary for the p-th block are concatenated. At the start of learning, D^p is set to initial values such as random numbers. Alternatively, a high-resolution dictionary may be learned beforehand from face images of persons who are not registered and used as the initial value.
W1 and W2 each denote the size of one divided image. α^p denotes the linear combination coefficients.
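The block division and the concatenated block image I^p described above can be sketched as follows. The function names `split_blocks` and `pair_vector`, the non-overlapping block layout, and the flattening order are assumptions made for illustration.

```python
def split_blocks(image, block_h, block_w):
    """Split an image into non-overlapping blocks, listed row by row."""
    blocks = []
    for top in range(0, len(image) - block_h + 1, block_h):
        for left in range(0, len(image[0]) - block_w + 1, block_w):
            blocks.append([row[left:left + block_w] for row in image[top:top + block_h]])
    return blocks


def flatten(block):
    """Turn a 2-D block into a 1-D list of pixel values."""
    return [px for row in block for px in row]


def pair_vector(high_block, low_block):
    """Concatenate a flattened high-resolution block and its low-resolution counterpart."""
    return flatten(high_block) + flatten(low_block)
```

Applying `pair_vector` to the p-th high-resolution block and the p-th low-resolution block gives one training vector for the p-th pair dictionary.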

There are various methods for solving Equation 7; in this embodiment, the K-SVD method is used, as in Non-Patent Document 2. That is, the high-resolution dictionary learning unit 180 approximates the learning image I^p by the linear combination D^p α^p of the current high-resolution dictionary D^p, performs singular value decomposition on the residual, and updates the coefficients of the high-resolution dictionary in the direction that reduces the residual, using quantities such as the magnitudes of the eigenvalues. The high-resolution dictionary D^p created here therefore consists of basis images resembling eigenfaces. The processing of the high-resolution dictionary learning unit 180 is not limited to this.
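A single atom update in the spirit of K-SVD can be sketched as follows. This is an illustrative simplification, not the method of Non-Patent Document 2 as implemented by the learning unit 180: a power iteration replaces the full singular value decomposition to find the best rank-1 fit of the residual, and the data layout (`samples` as a list of vectors, `atoms` as a list of dictionary columns, `coeffs[i][j]` as the weight of atom j in sample i) and the name `ksvd_update_atom` are assumptions.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ksvd_update_atom(samples, atoms, coeffs, k, iters=100):
    """Refit atom k and its coefficients in place to the residual of the samples that use it."""
    users = [i for i, c in enumerate(coeffs) if c[k] != 0.0]
    if not users:
        return  # atom k is unused, so there is nothing to fit
    # Residual of each using sample with the contribution of all other atoms removed.
    residuals = []
    for i in users:
        r = list(samples[i])
        for j, atom in enumerate(atoms):
            if j != k and coeffs[i][j] != 0.0:
                r = [ri - coeffs[i][j] * aj for ri, aj in zip(r, atom)]
        residuals.append(r)
    # Power iteration on E^T E (E = residual rows), warm-started from the current atom.
    d = list(atoms[k])
    for _ in range(iters):
        proj = [dot(r, d) for r in residuals]  # E d
        new = [sum(p * r[t] for p, r in zip(proj, residuals)) for t in range(len(d))]
        norm = math.sqrt(dot(new, new))
        if norm == 0.0:
            break  # current atom is orthogonal to every residual; keep it
        d = [x / norm for x in new]
    atoms[k] = d
    for i, r in zip(users, residuals):
        coeffs[i][k] = dot(r, d)  # best coefficient for the updated atom
```

Starting the iteration from the current atom means the fit to the residual does not get worse; a full K-SVD loop would alternate this update over all atoms with a sparse-coding step.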

FIG. 12 is an explanatory diagram of the pair dictionary generated by the learning processing of S1800. When the learning processing of S1800 ends, a pair dictionary (high-resolution dictionary) consisting of a high-resolution dictionary and a low-resolution dictionary is obtained for each block position, as shown in FIG. 12.

In the first method described at the beginning, one common high-resolution dictionary is learned from the registered face images. The image I^p consists of the p-th block divided images of all registered persons. In the first method, therefore, one high-resolution dictionary (pair dictionary) is generated after learning, as shown in FIG. 12.

In the second method, a high-resolution dictionary is learned for each individual from the registered face images. The image I^p consists of the p-th block divided images of the same person among the registered face images. In the second method, therefore, a high-resolution dictionary (pair dictionary) is generated for each person after learning, as shown in FIG. 13. FIG. 13 is also an explanatory diagram of the pair dictionary generated by the learning processing of S1800.

Having generated the high-resolution dictionary, the high-resolution dictionary learning unit 180 stores it together with the block position p in the high-resolution dictionary storage unit 190, which is a memory (S1900). This completes the processing of the first operation mode (image registration and high-resolution dictionary learning mode).

(Second operation mode: resolution enhancement and face authentication mode)
The second operation mode is an operation mode in which an input face image for authentication (authentication image) is enhanced in resolution and face authentication is performed using the resolution-enhanced face image.

The processing of S1101 to S1203 shown in FIG. 4 is the same as the processing of S1101 to S1201 in the first operation mode, except that the image acquired by the image acquisition unit 110 is an authentication image and the first normalized image generated by the face detection unit 120 is generated from that authentication image. In the second operation mode, the face detection unit 120 transmits the face detection result and the first normalized image to the high-resolution image generation unit 200.

The high-resolution image generation unit 200 acquires, from the high-resolution dictionary storage unit 190, the high-resolution dictionary D stored in S1900 (S2000). The high-resolution image generation unit 200 enhances the resolution of the first normalized image using the high-resolution dictionary D (S2001). The high-resolution image generation unit 200 first determines, on the basis of the face detection result of S1201, whether to perform resolution enhancement. For example, it determines that resolution enhancement is to be performed when the Euclidean distance between the left and right eyes is less than or equal to a predetermined threshold.
When resolution enhancement is to be performed, the high-resolution image generation unit 200 divides the normalized input face image j_L (the first normalized image) into a plurality of blocks as shown in FIG. 11. Next, as shown in Equation 8, the high-resolution image generation unit 200 approximates the p-th block of the input face image j_L by a linear sum using the low-resolution dictionary of the high-resolution dictionary D^p, and obtains the linear combination coefficients.
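The decision of whether to enhance the resolution can be sketched as follows. The function name `needs_upscaling`, the coordinate format, and the default threshold of 20 pixels are assumptions; the source only states that a predetermined threshold on the eye distance is used.

```python
import math


def needs_upscaling(left_eye, right_eye, threshold_px=20.0):
    """Enhance resolution only when the eyes are close together, i.e. the face is small."""
    return math.dist(left_eye, right_eye) <= threshold_px
```

A small inter-eye distance indicates a low-resolution face, which is the case the pair dictionary is intended for.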

Next, as shown in Equation 9, the high-resolution image generation unit 200 generates the high-resolution image of the p-th block as the linear sum of the high-resolution dictionary corresponding to the low-resolution dictionary, weighted by the linear combination coefficients.
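The two steps above (Equations 8 and 9, which are not reproduced in this text) can be sketched as follows for one block. The sketch assumes the coefficients are found by iterative soft thresholding (ISTA) with an L1 weight `lam`; the source does not state which solver is used, and the names `sparse_code` and `reconstruct_high`, as well as the representation of each dictionary as a list of atoms, are hypothetical.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of the L1 penalty)."""
    return max(x - t, 0.0) - max(-x - t, 0.0)


def sparse_code(low_atoms, y, lam=0.01, iters=500):
    """Coefficients a that approximately minimize 0.5*||y - sum_k a[k]*low_atoms[k]||^2 + lam*||a||_1."""
    # A step of 1/||D||_F^2 is conservative but safe, since the Frobenius norm
    # bounds the spectral norm from above. Assumes at least one nonzero atom.
    step = 1.0 / sum(v * v for atom in low_atoms for v in atom)
    a = [0.0] * len(low_atoms)
    for _ in range(iters):
        approx = [sum(a[k] * atom[t] for k, atom in enumerate(low_atoms))
                  for t in range(len(y))]
        resid = [yt - at for yt, at in zip(y, approx)]
        a = [soft_threshold(a[k] + step * sum(r * v for r, v in zip(resid, atom)),
                            step * lam)
             for k, atom in enumerate(low_atoms)]
    return a


def reconstruct_high(high_atoms, a):
    """High-resolution block as the linear sum of the paired high-resolution atoms."""
    n = len(high_atoms[0])
    return [sum(a[k] * atom[t] for k, atom in enumerate(high_atoms)) for t in range(n)]
```

The key point is that the coefficients are estimated with the low-resolution atoms only and then reused unchanged with the paired high-resolution atoms.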

The high-resolution image generation unit 200 performs the above processing on all blocks. In the first method (one common high-resolution dictionary is learned from the registered face images), the high-resolution dictionary is common to all registered face images, so one high-resolution image is generated from the p-th block of the input face image. In the second method (a high-resolution dictionary is learned for each individual from the registered face images), there are as many high-resolution dictionaries as registered persons, so n high-resolution images are generated from the p-th block of the input face image.
The high-resolution image generation unit 200 thus generates a high-resolution image from the p-th block of the input face image for every block and concatenates the results to produce the high-resolution image. In the second method, n high-resolution images are naturally obtained from a single face image for authentication.
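Concatenating the per-block results, and producing one image per registered person in the second method, can be sketched as follows. The names `stitch_blocks` and `enhance_per_person` are hypothetical, as is `person_enhancers`, which is assumed to map each registered person to a callable `(block_index, low_block) -> high_block` backed by that person's pair dictionary.

```python
def stitch_blocks(blocks, blocks_per_row):
    """Concatenate equally sized blocks (listed row by row) back into one image."""
    image = []
    for start in range(0, len(blocks), blocks_per_row):
        row_of_blocks = blocks[start:start + blocks_per_row]
        for r in range(len(row_of_blocks[0])):
            image.append([px for block in row_of_blocks for px in block[r]])
    return image


def enhance_per_person(low_blocks, person_enhancers, blocks_per_row):
    """Return one stitched high-resolution image per registered person (second method)."""
    return {
        person: stitch_blocks(
            [enhance(p, block) for p, block in enumerate(low_blocks)],
            blocks_per_row,
        )
        for person, enhance in person_enhancers.items()
    }
```

With n entries in `person_enhancers`, a single authentication image yields n candidate high-resolution images; the first method corresponds to a single entry.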

The face organ detection unit 130 generates a second normalized image from the high-resolution image generated by the high-resolution image generation unit 200, by processing similar to S1300 and S1301 of FIG. 3 (S1302, S1303). The feature extraction unit 140 extracts feature quantities from the second normalized image by processing similar to S1400 of FIG. 3 (S1401). The face identification unit 160 acquires the dictionary used for face authentication from the face authentication dictionary storage unit 150 (S1600). As exemplified in Non-Patent Document 1, the face identification unit 160 performs face authentication on the basis of the acquired dictionary and the feature quantities extracted in S1401 (S1601).
Face authentication based on the registered face images is performed as described above.

The effect of learning the high-resolution dictionary D using only the registered face images is described next. FIG. 14 is an explanatory diagram of this effect. When the input low-resolution face image is of person A, enhancing it with the high-resolution dictionary generated from person A's face images produces a high-resolution face image that resembles person A. Enhancing person A's low-resolution face image with the high-resolution dictionary generated from person B's face images produces a high-resolution face image that resembles person B. In other words, when the person in the input low-resolution face image matches the person of the high-resolution dictionary, a high-resolution face image that looks like that person is generated; when they do not match, a high-resolution face image that does not look like that person is generated.

FIG. 15 is an explanatory diagram of the effect of the high-resolution dictionary D. As illustrated, the similarity between the registered face image and person A's face image enhanced with a high-resolution dictionary D learned only from person A's face images is higher than the similarity between the registered face image and person A's face image enhanced with a high-resolution dictionary D learned from face images of various persons. Learning the high-resolution dictionary D using only the registration images thus improves face authentication accuracy.

By learning and using a high-resolution dictionary D that can represent only the registered images used for face authentication, the image identification device of this embodiment achieves highly accurate face authentication for low-resolution face images. In addition, by generating a moving image from the registered face images and using face images cut out from that moving image for learning the high-resolution dictionary D, the image identification device can also handle noise caused by video compression, which cannot be reproduced by image processing such as Gaussian noise alone. The image identification device can therefore achieve highly accurate face authentication.

The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Claims

An image identification device comprising:
image acquisition means for acquiring an image;
registration means for registering a face image for registration included in the image acquired by the image acquisition means;
learning means for learning the face image for registration to generate a pair dictionary of a low resolution dictionary and a high resolution dictionary;
storage means for storing the generated pair dictionary;
high resolution image generation means for generating a high resolution image by enhancing the resolution of a face image for authentication included in the image acquired by the image acquisition means, using the pair dictionary; and
face identification means for identifying the face image for authentication based on the high resolution image generated by the high resolution image generation means and the face image for registration registered in the registration means.

The image identification device according to claim 1, wherein the learning means generates the pair dictionary as a dictionary common to all persons registered in the registration means.

The image identification device according to claim 1, wherein the learning means generates the pair dictionary for each person registered in the registration means.

The image identification device according to any one of claims 1 to 3, wherein the learning means sets, as an initial value, a dictionary obtained by learning face images not registered in the registration means, and learns the face image for registration to generate the pair dictionary.

The image identification device according to claim 4, wherein the high resolution image generation means generates a plurality of high resolution images for each face image acquired by the image acquisition means, using the pair dictionary for each person stored in the storage means.

The image identification device according to any one of claims 1 to 5, further comprising learning data generation means for generating a low resolution face image and a high resolution face image from the image acquired by the image acquisition means,
wherein the learning means learns the low resolution face image and the high resolution face image to generate the low resolution dictionary and the high resolution dictionary.

The image identification device according to claim 6, wherein the learning data generation means generates, from the image acquired by the image acquisition means, a moving image composed of a plurality of converted face images that differ in face position and size, and extracts a face image from the moving image, and
the learning means generates the pair dictionary by learning the extracted face image.

The image identification device according to claim 6, wherein the learning data generation means generates the high resolution face image by cutting out a face image from the image acquired by the image acquisition means, and generates the low resolution face image by reducing the cut-out face image.

The image identification device according to claim 8, wherein the learning data generation means generates the low resolution face image by adding at least one of blur and noise to the reduced face image.

An image identification method in which an information processing apparatus:
extracts a face image for registration of a person from an image acquired from an external device, registers the face image in predetermined registration means, and stores, in predetermined storage means, a pair dictionary of a low resolution dictionary and a high resolution dictionary generated by learning the face image for registration; and
generates a high resolution image by enhancing the resolution of a face image for authentication acquired from the external device using the pair dictionary, and performs face authentication of the face image for authentication based on the high resolution image and the face image for registration registered in the registration means.

A computer program for causing a computer to function as:
image acquisition means for acquiring an image;
registration means for registering a face image for registration included in the image acquired by the image acquisition means;
learning means for learning the face image for registration to generate a pair dictionary of a low resolution dictionary and a high resolution dictionary;
storage means for storing the generated pair dictionary;
high resolution image generation means for generating a high resolution image by enhancing the resolution of a face image for authentication included in the image acquired by the image acquisition means, using the pair dictionary; and
face identification means for identifying the face image for authentication based on the high resolution image generated by the high resolution image generation means and the face image for registration registered in the registration means.

A computer readable storage medium storing the computer program according to claim 11.