JP2009294955A

JP2009294955A - Image processor, image processing method, image processing program and recording medium with the same program recorded thereon

Info

Publication number: JP2009294955A
Application number: JP2008148601A
Authority: JP
Inventors: Yoshiko Sugaya; 佳子菅谷; Shingo Ando; 慎吾安藤; Akira Suzuki; 章鈴木; Hideki Koike; 秀樹小池
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-06-05
Filing date: 2008-06-05
Publication date: 2009-12-17

Abstract

PROBLEM TO BE SOLVED: To provide an image correction technique that removes a spectacles pattern from an image by generating a spectacles model adaptable to spectacles with various frame shapes.

SOLUTION: A pre-processing part 12 of an image processor 10 detects a face region in input image data. A spectacles model generation part 15 creates, from learning patterns of sample images, a spectacles model that takes into account the spectacle frame shape, the shape of the eyeballs inside the frame, and their textures. A feature extraction part 14 applies the spectacles model to the face region and extracts spectacles model parameters. A spectacles identification part 16 reads identification parameters for identifying the presence of spectacles and uses the spectacles model parameters to decide whether spectacles are present in the input image data. A region extraction part 18 divides the input image data into arbitrary regions. An interpolation part 19 generates an arbitrary texture from the data of each divided region and integrates the regions. An output part 20 outputs the integration result.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an image processing technique that determines whether spectacles are present in an image, identifies the spectacle region, and accurately removes the spectacle pattern.

In the field of computer vision today, face authentication is frequently used as a form of biometric authentication in surveillance systems, pet robots, human interfaces, and the like. Because face authentication recognizes a person by comparing an input image with a face image registered in advance, the registered face image basically has to be free of partial occlusion and of occluding objects such as spectacles and accessories, which cause misrecognition.

Therefore, when registering the face of a person who wears spectacles, two images have to be registered: one with the spectacles on and one with the spectacles removed. To avoid this duplicated effort in a face authentication system and to reduce misrecognition caused by partial occlusion of the face, a technique is needed that removes only the spectacle region from a face image.

Spectacle removal can be divided into five parts: a "pre-processing unit", a "spectacle model generation unit", a "spectacle detection unit", a "region extraction unit", and an "interpolation unit". One example of a conventional spectacle removal method uses a parametric spectacle frame model (see Non-Patent Document 1). This technique removes the spectacle region as follows. A face image of a person wearing spectacles is taken as input, and a spectacle frame model prepared in advance is compared with line information obtained by edge detection on the input image in order to detect the spectacle frame region. Next, the luminance values of the detected spectacle frame region are removed, and the region is linearly interpolated from the luminance values of the skin-colored portion of the surrounding region. A deviation is then added to the linearly interpolated portion and the result is smoothed to generate the spectacle-removed image.

Non-Patent Document 1: Yasuyuki Saito, Yukiko Kenmochi, and Kazunori Kotani, "Extraction and removal of eyeglass frame regions in facial images using a parametric eyeglass frame model", IEICE Transactions D-II, Vol. J82-D-II, No. 5, pp. 880-890, May 1999.
Non-Patent Document 2: Yuichi Araki, Nobutaka Shimada, and Yoshiaki Shirai, "Detection of faces and estimation of face direction independent of background and face direction", IEICE Technical Report, Vol. 101, No. 569, pp. 87-94, 2002.
Non-Patent Document 3: T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models", IEEE TPAMI, Vol. 23, No. 6, pp. 681-685, June 2001.
Non-Patent Document 4: Nello Cristianini and John Shawe-Taylor, "An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods", Cambridge Univ. Press, 2000, pp. 94-97.
Non-Patent Document 5: Jian Sun, Lin Liang, Fang Wen, and Heung-Yeung Shum, "Image Vectorization using Optimized Gradient Meshes", ACM Transactions on Graphics, Vol. 26, No. 3, Article 11, July 2007.

However, the conventional spectacle removal method has the following problems. Spectacle frames generally vary widely in shape and color, so even when a spectacle frame model is created and used, the frame is sometimes detected incorrectly. This is thought to be because the frame is detected only from line information obtained by edge extraction, and because the model covers the frame alone. To remove the spectacle region from a face image, it is not enough to estimate the frame position from the line information obtained by edge extraction; additional information such as the texture of the spectacle frame and of the area inside the lenses is also needed.

In addition, because the method takes a face image containing spectacles as input, it cannot be applied in practice without prior knowledge that a face is present in the input image and that the face is wearing spectacles.

Furthermore, even if the luminance values of the detected spectacle frame region are linearly interpolated from those of the skin-colored portion of the surrounding region and then smoothed, the contrast can become discontinuous at the boundaries between the frame region and the adjacent regions inside and outside the frame, and the resulting image may look unnatural. The effect therefore cannot be said to be sufficient.

Accordingly, a first object of the present invention is to generate a spectacle model that can handle spectacles with various frame shapes and to remove the spectacle pattern. A second object is to identify the presence of a face and the presence or absence of spectacles, so that spectacle removal works on any input image regardless of whether it contains a face or spectacles. A third object is to interpolate the removed region so that the result of spectacle removal looks natural.

The present invention is a technical idea created to solve the above problems. The inventions of claims 1 to 8 solve the first problem by detecting the spectacle region using texture information in addition to the shapes of the spectacle frame and the eyeballs. The inventions of claims 2 and 5 solve the second problem by making it possible to identify the presence of a face and the presence or absence of spectacles in a captured image. The inventions of claims 3 and 6 solve the third problem by dividing the face region into a plurality of small regions, generating for each region a layer containing a texture that reflects the overall appearance of that region, and compositing the layers.

Specifically, the invention of claim 1 is an image processing apparatus that identifies the presence or absence of spectacles in a captured image of a person and generates a face image from which the spectacle pattern has been removed, comprising: spectacle model generation means for generating, from learning patterns of sample images, a spectacle model that takes into account the spectacle frame shape, the shape of the eyeballs inside the spectacle frame, and the texture of the spectacle frame and of the frame interior including the eyeballs; feature extraction means for extracting a feature amount of the spectacle region from the face region in the input captured image and the spectacle model; spectacle identification means for comparing the feature amount of the spectacle region with the feature amount of the learning patterns to determine whether spectacles are present in the captured image; region extraction means for dividing the captured image into arbitrary regions based on the face region and the feature amount of the spectacle region; and interpolation means for generating an arbitrary texture from the data of each divided region and integrating the regions, wherein the integration result is output.

The invention of claim 2 further comprises pre-processing means for identifying whether a face is present in the captured image and extracting the face region, wherein the feature extraction means searches for the spectacle region and extracts the spectacle model parameters obtained by the search as the feature amount, and the spectacle identification means reads, as a feature amount, the identification parameters of the learning patterns stored in a database and uses the spectacle model parameters to determine whether spectacles are present in the captured image.

In the invention of claim 3, the region extraction means comprises means for calculating the coordinate values of the spectacle region, the spectacle frame region, and the eyeball region in the input image data, and means for dividing the face region into a plurality of regions and cutting them out; and the interpolation means comprises means for filling in the pixel values of holes in the objects in each region cut out by the region extraction means, means for creating a layer for an arbitrary region, means for automatically generating a texture from the objects and correcting the texture, and means for converting the plurality of layers into a single piece of image data and integrating them.

The invention of claim 4 is an image processing method that identifies the presence or absence of spectacles in a captured image of a person and generates a face image from which the spectacle pattern has been removed, comprising: a first step in which spectacle model generation means generates, from learning patterns of sample images, a spectacle model that takes into account the spectacle frame shape, the shape of the eyeballs inside the spectacle frame, and the texture of the spectacle frame and of the frame interior including the eyeballs; a second step in which feature extraction means extracts a feature amount of the spectacle region from the face region in the input captured image and the spectacle model; a third step in which spectacle identification means compares the feature amount of the spectacle region with the feature amount of the learning patterns to determine whether spectacles are present in the captured image; a fourth step in which region extraction means divides the captured image into arbitrary regions based on the face region and the feature amount of the spectacle region; a fifth step in which interpolation means generates an arbitrary texture from the data of each divided region and integrates the regions; and a sixth step of outputting the integration result of the fifth step.

The invention of claim 5 further comprises a step in which pre-processing means identifies whether a face is present in the captured image and extracts the face region, wherein the second step searches for the spectacle region and extracts the spectacle model parameters obtained by the search as the feature amount, and the third step reads, as a feature amount, the identification parameters of the learning patterns stored in a database and uses the spectacle model parameters to determine whether spectacles are present in the captured image.

In the invention of claim 6, the fourth step comprises a step of calculating the coordinate values of the spectacle region, the spectacle frame region, and the eyeball region in the input image data, and a step of dividing the face region into a plurality of regions and cutting them out; and the fifth step comprises a step of filling in the pixel values of holes in the objects in each region cut out by the region extraction means, a step of creating a layer for an arbitrary region, a step of automatically generating a texture from the objects and correcting the texture, and a step of converting the plurality of layers into a single piece of image data and integrating them.

The invention of claim 7 relates to an image processing program that causes a computer to function as each of the means constituting the image processing apparatus according to any one of claims 1 to 3.

The invention of claim 8 relates to a recording medium on which the image processing program of claim 7 is recorded.

According to the inventions of claims 1 to 8, the spectacle region is detected using texture information in addition to the shapes of the spectacle frame and the eyeballs. This makes it possible to handle various types of spectacles and, because more information is used than in the conventional method, to reduce false detections.

In particular, according to the inventions of claims 2 and 5, the presence of a face and the presence or absence of spectacles in the captured image can be identified, so arbitrary image data can be input regardless of whether it contains spectacles or a face.

According to the inventions of claims 3 and 6, the face region is divided into a plurality of small regions, a layer containing a texture that reflects the overall appearance of each region is generated, and the layers are composited. An image from which the spectacle pattern has been removed is thereby obtained without impairing the natural look of the whole.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram of an image processing apparatus according to an embodiment of the present invention. The image processing apparatus 10 is implemented on a computer and has an image data input unit 11, a pre-processing unit 12, a first learning pattern storage unit 13, a feature extraction unit 14, a spectacle model generation unit 15, a spectacle identification unit 16, a second learning pattern storage unit 17, a region extraction unit 18, an interpolation unit 19, and an output unit 20.

Specifically, the image processing apparatus 10 has the hardware resources of an ordinary computer, such as a CPU (Central Processing Unit), memory (RAM), a hard disk drive, and a communication device, and the processing of the functional blocks 11 to 20 is carried out by these hardware resources working together with installed software. The learning pattern storage units 13 and 17 are built as databases on the hard disk drive.

The overall processing of the image processing apparatus 10 is outlined below following the flow of FIG. 2. First, the image data input unit 11, which is implemented by the communication device or the like, receives over a network digital image data captured by imaging means such as a digital camera, and transmits the received captured image data (hereinafter, "input image data") to the pre-processing unit 12 (S1).

Next, the pre-processing unit 12 compares the input image data with the feature amounts of the patterns stored in the first learning pattern storage unit 13, performs face detection, and identifies whether a face is present (S2). If a face is identified in the input image, the portion considered to be the face is cut out and transmitted to the feature extraction unit 14 (S3). If no face is identified in the input image, processing ends.

The feature extraction unit 14 extracts the spectacle model parameters that serve as the feature amount and sends them to the spectacle identification unit 16 (S4). The spectacle model parameters are generated from the input face region image data and the spectacle model generated by the spectacle model generation unit 15.

The spectacle identification unit 16 uses the feature amount (spectacle model parameters) transmitted from the feature extraction unit 14 and the identification parameters stored in the second learning pattern storage unit 17 to obtain an identification result for the presence or absence of spectacles (S5). If spectacles are determined to be present (S6), the subsequent steps (S7) to (S9) are carried out; if not, processing ends. The second learning pattern storage unit 17 generates learning samples from a large number of learning images in advance, generates the identification parameters by learning, holds them, and transmits them to the spectacle identification unit 16.

The region extraction unit 18 divides the digital image data into a plurality of regions and extracts them, based on the face region image data obtained by the pre-processing unit 12 and the feature amount (spectacle model parameters) obtained by the feature extraction unit 14 (S7).

The interpolation unit 19 turns each region obtained by the region extraction unit 18 into a layer as shown in FIG. 3, converts the raster image data of each layer into a vector representation, corrects the colors, integrates the generated layer data, converts the result back into a raster representation, and transmits it to the output unit 20 (S8). The output unit 20 outputs the transmitted image data (S9), which is displayed on display means such as a monitor. The processing of each of the functional blocks 11 to 20 is described individually below.
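The overall flow S1 to S9 described above can be sketched as the following control-flow outline. It is only an illustration of the two early exits (no face, no spectacles); the function name `remove_spectacles` and the callables passed to it are hypothetical placeholders for the functional blocks 12 to 19, not names taken from the patent.

```python
from typing import Any, Callable, Optional


def remove_spectacles(
    image: Any,
    detect_face: Callable[[Any], Optional[Any]],   # pre-processing unit 12 (S2, S3)
    extract_features: Callable[[Any], Any],        # feature extraction unit 14 (S4)
    has_spectacles: Callable[[Any], bool],         # spectacle identification unit 16 (S5, S6)
    extract_regions: Callable[[Any, Any], list],   # region extraction unit 18 (S7)
    interpolate: Callable[[list], Any],            # interpolation unit 19 (S8)
) -> Optional[Any]:
    """Return the processed image, or None when processing ends early."""
    face = detect_face(image)
    if face is None:                 # no face found: processing ends
        return None
    params = extract_features(face)  # spectacle model parameters
    if not has_spectacles(params):   # no spectacles found: processing ends
        return None
    regions = extract_regions(face, params)
    return interpolate(regions)      # result is handed to output unit 20 (S9)
```

Passing the blocks in as callables keeps the sketch independent of how each unit is actually implemented.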

(1) Pre-processing unit 12
The processing performed by the pre-processing unit 12 is shown in the flowchart of FIG. 4. The pre-processing unit 12 performs face detection on the input image data and cuts out the face region (S10) to (S12). The face detection may use the method described in Non-Patent Document 2; here, for example, a face detection method that can also detect faces turned away from the front is used. In face detection generally, learning is performed in advance on a large number of learning images, and the identification parameters obtained are stored in a learning pattern storage unit. At identification time, these parameters are used to determine whether a face is present in the image and to locate it within the image.

Likewise in the present invention, at face detection (S10) the identification parameters are read from the first learning pattern storage unit 13 and used for identification. The first learning pattern storage unit 13 has the function of learning from a large number of input learning images, generating identification parameters, and holding them.

Next, if a face is identified in the input image data (S11), the face region is cut out (S12). If no face is identified, processing ends. Face region cut-out (S12) generates face region image data, which holds the pixel values of the face region, and background region image data, which holds the pixel values of the background outside the face region. First, two copies of the input digital image data are made. Then, using the coordinate values of the facial parts (eyes, nose, and mouth) obtained from face detection (S10), the pixels inside an elliptical region of arbitrary size that contains all of these parts are selected. In one copy, all pixels outside the selected elliptical region are converted to white (or black), giving the face region image data. In the other copy, all pixels inside the selected elliptical region are converted to white (or black), giving the background region image data.
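A minimal sketch of the face region cut-out (S12), assuming a grayscale image stored as a list of rows and an axis-aligned ellipse whose centre `(cx, cy)` and radii `(rx, ry)` have already been chosen to enclose the eyes, nose, and mouth. The function name and parameters are hypothetical; the text does not specify how the ellipse size is chosen.

```python
def split_face_and_background(image, cx, cy, rx, ry, fill=255):
    """Split a grayscale image (list of rows) into face and background copies.

    Pixels outside the ellipse are set to `fill` in the face copy;
    pixels inside the ellipse are set to `fill` in the background copy.
    """
    face = [row[:] for row in image]        # first copy of the input
    background = [row[:] for row in image]  # second copy of the input
    for y, row in enumerate(image):
        for x in range(len(row)):
            inside = ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
            if inside:
                background[y][x] = fill     # blank the face in the background copy
            else:
                face[y][x] = fill           # blank the background in the face copy
    return face, background
```

With `fill=255` the blanked pixels are white; passing `fill=0` gives the black variant mentioned in the text.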

(2) Spectacle model generation unit 15
FIG. 5 shows a flowchart of the processing in the spectacle model generation unit 15. The unit generates the spectacle model (S13, S14) and outputs the data it holds (S15). The spectacle model is generated using the Active Appearance Model (hereinafter, "AAM").

AAM is a technique for matching a generative model against an unknown image, and consists of two stages: building a statistical model called an appearance model, and adjusting its parameters by model search. The appearance model stage builds a statistical model whose parameters capture the correlation between the coordinates of feature points, which represent the shape of the object, and luminance values, which represent its appearance. The parameter adjustment stage is posed as an optimization problem in which the parameters of the appearance model are varied so as to minimize the image residual between the model and the input unknown image; solving this problem matches the model to the input image. The spectacle model generation unit 15 performs only the first stage of AAM, building the appearance model.

Appearance model construction is outlined briefly below. The appearance model is built from a learning image data set in which feature points have been placed. First, the coordinates of the feature points in one learning image are arranged into a shape vector "x", and the mean shape "x̄" of the shape vectors of all learning images is obtained. Next, the shape in each learning image is normalized to the mean shape "x̄", and the luminance values inside the normalized shape are arranged to give a texture vector "g". As with the shape vectors, the mean texture "ḡ" of the texture vectors of all learning images is calculated. The shape vector "x" and the texture vector "g" can be modeled as in Equation (1).
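Equation (1) is not reproduced in this text. In the standard AAM formulation of Non-Patent Document 3, which matches the symbols used in this description, the model takes the following form; it is given here as a likely reading rather than a verbatim copy of the patent's equation.

```latex
\begin{aligned}
x &= \bar{x} + Q_s\, c \\
g &= \bar{g} + Q_g\, c
\end{aligned}
\tag{1}
```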

Here "Q_s" and "Q_g" are eigenvectors, and "c" is the appearance parameter that controls both shape and texture. The appearance parameter "c" thus expresses texture as well as shape. Conversely, once "c" is known, the shape and texture can be calculated. (See Non-Patent Document 3 for the basic principles of AAM.)
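The way a single parameter vector "c" drives both shape and texture can be illustrated with a small sketch. It assumes the linear form used in standard AAM (Non-Patent Document 3), x = x̄ + Q_s c and g = ḡ + Q_g c; the helper `synthesize` and all numeric values are made up for illustration.

```python
def synthesize(mean, basis, c):
    """Return mean + basis @ c, with the basis stored as a list of rows."""
    return [m + sum(b * ck for b, ck in zip(row, c)) for m, row in zip(mean, basis)]


# Toy model (made-up values): 2 appearance parameters,
# 4 shape coordinates, and 3 texture samples.
x_bar = [0.0, 0.0, 10.0, 0.0]
Q_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, -1.0]]
g_bar = [100.0, 120.0, 140.0]
Q_g = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

c = [0.5, -1.0]
x = synthesize(x_bar, Q_s, c)  # shape vector   -> [0.5, -1.0, 10.5, 1.0]
g = synthesize(g_bar, Q_g, c)  # texture vector -> [101.0, 118.0, 139.5]
```

Changing "c" moves the shape and the texture together, which is what lets the search described later recover both from one set of parameters.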

Specifically, a large number of face images of people wearing spectacles are prepared as learning image data, together with the coordinate values of feature points placed on each image as shown in FIG. 6, and "x̄", "ḡ", "Q_s", and "Q_g" are calculated from the shapes and textures of the spectacle frame, the lens interior, and the eyeballs. In FIG. 6, the image coordinates of the spectacle frame feature points are denoted (x_f, y_f), those of the lens feature points (x_l, y_l), and those of the eyeball feature points (x_e, y_e). The shape vector "x" is then given by Equation (2).
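Equation (2) is not reproduced in this text. One plausible reading concatenates the coordinates of the three groups of feature points; the point counts n_f, n_l, n_e and the ordering of the groups are assumptions made for illustration.

```latex
x = \left( x_{f,1}, y_{f,1}, \ldots, x_{f,n_f}, y_{f,n_f},\;
           x_{l,1}, y_{l,1}, \ldots, x_{l,n_l}, y_{l,n_l},\;
           x_{e,1}, y_{e,1}, \ldots, x_{e,n_e}, y_{e,n_e} \right)^{T}
\tag{2}
```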

The "x̄", "ḡ", "Q_s", and "Q_g" obtained from the learning image data set are held as the appearance model in temporary storage such as memory and transmitted to the feature extraction unit 14. The appearance model may also be saved to the hard disk drive.

(3) Feature extraction unit 14
The processing performed by the feature extraction unit 14 is shown in the flowchart of FIG. 7. The unit takes as input the face region image data transmitted from the pre-processing unit 12 and searches for the spectacle region by solving an optimization problem using the spectacle model created by the spectacle model generation unit 15. The spectacle model parameters obtained by the search are used as the feature amount, and the feature amount is transmitted to the spectacle identification unit 16 together with the face region data received from the pre-processing unit 12.

Specifically, the spectacle region is searched for by adjusting parameters through model search, the second stage of AAM (S16). When searching the image data for the spectacle region, a new parameter "p" is used that accounts for the position, size, and rotation of the spectacles within the face region image data, in addition to the appearance parameter "c", which represents the shape and texture of the model. Let "g_s" be the texture vector built from the pixel values of the part of the face region image data presumed to be spectacles, and "g_m" the texture vector generated with the spectacle model; the image residual is defined as r(p) = g_s − g_m. "p" is varied so as to minimize the residual r(p), which gives the optimal solution. "c" is then back-calculated from that solution and used as the spectacle model parameter. The details of this optimization problem follow Non-Patent Document 3.
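The search described above can be illustrated with a minimal sketch. It assumes the usual linear AAM texture model g_m = g_bar + Q_g c, and it replaces the iterative optimization of Non-Patent Document 3 with a plain exhaustive search over a few candidate poses. The function names, the `sample_texture` sampler, and the toy data are all hypothetical and are not taken from the patent.

```python
import numpy as np

def fit_appearance(g_s, g_bar, Q_g):
    """Least-squares c for the assumed linear model g_m = g_bar + Q_g @ c."""
    c, *_ = np.linalg.lstsq(Q_g, g_s - g_bar, rcond=None)
    r = g_s - (g_bar + Q_g @ c)          # image residual r = g_s - g_m
    return c, float(r @ r)

def search_pose(sample_texture, candidate_poses, g_bar, Q_g):
    """Return (p, c, squared residual) for the candidate pose with the smallest residual."""
    best = None
    for p in candidate_poses:
        c, err = fit_appearance(sample_texture(p), g_bar, Q_g)
        if best is None or err < best[2]:
            best = (p, c, err)
    return best

# Toy data (made up): a 3-pixel texture with two appearance modes.
g_bar = np.zeros(3)
Q_g = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
c_true = np.array([0.5, -2.0])

def sample_texture(p):
    # Hypothetical sampler: a wrong pose p adds error the model cannot explain.
    return g_bar + Q_g @ c_true + p * np.array([0.0, 0.0, 1.0])

p_best, c_best, err_best = search_pose(sample_texture, [2.0, 1.0, 0.0], g_bar, Q_g)
```

In this toy setting the pose with zero offset gives the smallest residual, and the appearance parameter recovered for it is the feature amount that would be passed on to identification.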

(4) Second learning pattern storage unit 17
The second learning pattern storage unit 17 generates identification function data for identifying the presence or absence of spectacles and transmits it to the spectacle identification unit 16. A Support Vector Machine (hereinafter, SVM) is used to generate the identification function data. SVM is a technique for constructing a pattern classifier that solves two-class classification problems. The classifier maps an input vector into a virtual high-dimensional feature space by a nonlinear mapping and is constructed as the hyperplane that forms the boundary maximizing generalization ability between the two classes in that feature space. The basic principle of SVM used here follows Non-Patent Document 4.

FIG. 8 is a flowchart of the processing procedure of the second learning pattern storage unit 17. The second learning pattern storage unit 17 consists of a learning sample generation unit 17-1 and an identification function data generation unit 17-2. The learning sample generation unit 17-1 generates, from the many learning images it receives, as many learning samples as there are learning images. Each learning sample consists of a feature amount and a label indicating the class to which it belongs.

In FIG. 8, the number of learning samples is initialized to n = 0 before any label is input (S19). Two kinds of learning images are input to the learning sample generation unit 17-1: face images of people wearing spectacles and face images of people not wearing spectacles (S20). A face image of a person wearing spectacles belongs to the with-spectacles class and is given the label y = 1; a face image of a person not wearing spectacles belongs to the without-spectacles class and is given the label y = −1.

For each input pair of learning image and label, parameter adjustment by model search is performed with the spectacle model generated by the spectacle model generation unit 15, in the same way as the parameter optimization process (S16) of the feature extraction unit 14 (S21). The resulting spectacle model parameters are taken as the feature amount of that learning image, and the set of feature amount and label forms one learning sample (n = 1) (S22). This process is repeated for all prepared learning images (S23, S24), extracting a feature amount from each and incrementing the sample count (n ← n + 1) each time a learning sample is generated. The learning samples may be stored in temporary storage such as memory.
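The loop of steps S19 to S24 can be sketched as follows. This is only an illustration: `build_learning_samples` and the `extract_feature` callback, which stands in for the model search of S21, are hypothetical names that do not appear in the patent.

```python
def build_learning_samples(images, labels, extract_feature):
    """Pair each learning image's feature amount with its class label."""
    samples = []
    n = 0                                      # S19: no samples yet
    for image, y in zip(images, labels):       # S20: next image and label
        if y not in (1, -1):
            raise ValueError("label must be 1 (spectacles) or -1 (no spectacles)")
        feature = extract_feature(image)       # S21: model search (stand-in)
        samples.append((feature, y))           # S22: one learning sample
        n += 1                                 # n <- n + 1
    return samples, n

# Toy usage: the "feature" of each made-up image is simply the sum of its values.
samples, n = build_learning_samples([[1, 2], [3, 4], [5, 6]], [1, -1, 1], sum)
```

The count n ends up equal to the number of learning images, which matches the statement that one learning sample is produced per image.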

The identification function data generation unit 17-2 takes the learning samples generated by the learning sample generation unit 17-1 as input and trains an SVM to obtain the identification function data. In SVM training, when l feature amounts x_i (i = 1, 2, …, l) are given together with the label "y", the identification function f(x) takes the form of Expression (3). Expression (3) is solved and optimized (S25).

Here, α* and β* are the optimal solution of the maximization problem of Expression (4) below.
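Expressions (3) and (4) themselves are not reproduced in this text. For orientation only, the textbook hard-margin linear SVM has the form below. It is an assumption that the patent's expressions correspond to it, and the indexed symbols α_i, y_i and the inner product x_i · x follow common usage rather than the source.

```latex
f(x) = \sum_{i=1}^{l} \alpha_i^{*}\, y_i\, (x_i \cdot x) + \beta^{*}

\max_{\alpha}\; \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\quad \text{subject to} \quad \alpha_i \ge 0,\;\; \sum_{i=1}^{l} \alpha_i y_i = 0
```

In this textbook form the bias β* is not itself a variable of the maximization; it is recovered afterwards from the support vectors, that is, the samples with α_i* > 0.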

The identification function data generation unit 17-2 stores α*, β*, and x_i obtained by training in the second learning pattern storage unit 17 as identification function data (S26). A nonlinear extension of the identification function of Expression (3) may also be used.

(5) Spectacle identification unit 16
The processing performed by the spectacle identification unit 16 is shown in the flowchart of FIG. 9. The spectacle identification unit 16 identifies the presence or absence of spectacles. To decide between the with-spectacles class and the without-spectacles class, it takes the feature amount generated by the feature extraction unit 14, reads the identification function data from the second learning pattern storage unit 17, and uses both to identify the class to which the input belongs (S27).

For example, the feature amount generated by the feature extraction unit 14 is taken as the input x and substituted into Expression (3). In SVM-based identification, when an unknown input x is given, the class of the unknown image is determined by the sign of the identification function f(x): the with-spectacles class when f(x) ≥ 0, and the without-spectacles class when f(x) < 0. Only when the input feature amount is judged to be a spectacle pattern is the data transmitted to the region extraction unit 18; if it is judged that there is no spectacle pattern, processing ends here. Besides SVM, the identification process may use, for example, the nearest neighbor method, Fisher's linear discriminant, a neural network, or the subspace method.
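The sign test can be sketched in a few lines. The sketch assumes a linear-kernel identification function of the common form f(x) = Σ α_i y_i (x_i · x) + β; the stored identification function data below is made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def decision_value(x, support_x, support_y, alpha, beta):
    """f(x) = sum_i alpha_i * y_i * <x_i, x> + beta (assumed linear-kernel form)."""
    return float(np.sum(alpha * support_y * (support_x @ x)) + beta)

def has_spectacles(x, support_x, support_y, alpha, beta):
    """True for the with-spectacles class (f(x) >= 0), False otherwise."""
    return decision_value(x, support_x, support_y, alpha, beta) >= 0.0

# Toy identification function data (made up for illustration).
support_x = np.array([[1.0, 1.0], [-1.0, -1.0]])
support_y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
beta = 0.0

with_glasses = has_spectacles(np.array([2.0, 0.0]), support_x, support_y, alpha, beta)
without_glasses = has_spectacles(np.array([-3.0, 0.0]), support_x, support_y, alpha, beta)
```

Only an input classified as with-spectacles would be handed to the region extraction stage; for the other class, processing stops.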

(6) Region extraction unit 18
The processing performed by the region extraction unit 18 is shown in the flowchart of FIG. 10. The region extraction unit 18 consists of a spectacle region position calculation unit 18-1 and a region cut-out unit 18-2. The spectacle region position calculation unit 18-1 uses the spectacle model parameters obtained by the feature extraction unit 14 to calculate the position of the spectacle region in the image (S30).

Specifically, the shape vector "x" is back-calculated from the spectacle model parameter "c". From the resulting shape vector, the in-image coordinate values (x_f, y_f) of the spectacle frame feature points are taken as the spectacle region position. Next, (x_f, y_f) and (x_l, y_l) are calculated as the spectacle frame region position, and (x_e, y_e) as the eyeball region position (S31).

Next, the processing of the region cut-out unit 18-2 (S32 to S34) is described with reference to FIG. 11, which illustrates each region in the region cut-out process. As in the face region cut-out (S12) of the preprocessing unit 12, the cut-out process generates image data in which every pixel value outside a specified region is replaced with white (or black). First, the spectacle region of FIG. 11(b), calculated from the face region of FIG. 11(a), is selected. The image data that keeps the pixel values of the spectacle region is the spectacle region data; conversely, the image that keeps the pixel values outside the spectacle region and sets the pixel values inside it to white (or black) is the face base region of FIG. 11(c) (S32). Then, image data keeping the pixel values of the spectacle frame region shown in FIG. 11(d) is created from the spectacle region, as is image data keeping only the pixel values of the eyeball region of FIG. 11(e) (S33). Finally, as in FIG. 11(f), the spectacle region with the pixel values of the frame region and the eyeball region set to white (or black) is the lens region image data (S34).
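The cut-out of steps S32 to S34 amounts to masking. The sketch below assumes the regions are already available as boolean masks of the same height and width as the face image; in the apparatus they would be derived from the feature point coordinates. The names `keep_only` and `cut_out_regions` are hypothetical.

```python
import numpy as np

WHITE = 255

def keep_only(image, mask, fill=WHITE):
    """Keep pixel values where mask is True; replace everything else with fill."""
    out = np.full_like(image, fill)
    out[mask] = image[mask]
    return out

def cut_out_regions(face, spectacle_mask, frame_mask, eyeball_mask):
    """Split a face image into the region images of FIG. 11(b) to (f)."""
    lens_mask = spectacle_mask & ~frame_mask & ~eyeball_mask
    return {
        "spectacles": keep_only(face, spectacle_mask),             # S32
        "face_base": keep_only(face, ~spectacle_mask),             # S32
        "frame": keep_only(face, spectacle_mask & frame_mask),     # S33
        "eyeball": keep_only(face, spectacle_mask & eyeball_mask), # S33
        "lens": keep_only(face, lens_mask),                        # S34
    }

# Toy 4x4 grayscale face whose pixel at (row, col) has value 4*row + col.
face = np.arange(16, dtype=np.uint8).reshape(4, 4)
spectacle_mask = np.zeros((4, 4), dtype=bool); spectacle_mask[1:3, :] = True
frame_mask = np.zeros((4, 4), dtype=bool); frame_mask[1, :] = True
eyeball_mask = np.zeros((4, 4), dtype=bool); eyeball_mask[2, 1] = True
regions = cut_out_regions(face, spectacle_mask, frame_mask, eyeball_mask)
```

Because the lens mask is the spectacle mask minus the frame and eyeball masks, the frame, eyeball, and lens images never overlap.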

(7) Interpolation unit 19
The interpolation unit 19 vectorizes each region generated by the region extraction unit 18 and separates the regions into layers. Vectorization converts image data in raster representation, that is, data such as digital images described as a set of pixels that can take only discrete values, into image data in vector representation, a form that can take continuous values and describes objects by line segments connecting their feature points and vertices. A gradation mesh is used here as the concrete vectorization technique. A gradation mesh is a graphics technique that divides the shape of the target object into a mesh and applies color gradation along it, making it possible to render the texture of the object realistically. Vectorization by gradation mesh is generated automatically by an optimization technique such as that of Non-Patent Document 5.

The structure of the gradation mesh in Non-Patent Document 5 is outlined below with reference to FIG. 12. FIG. 12(a) is an example of a parametric surface in which each coordinate is expressed as a function of the parameters u and v; such a surface is hereinafter called a patch. The Ferguson patch used in Non-Patent Document 5 is expressed by Expression (5).
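Expression (5) is not reproduced in this text. As a rough illustration, a Ferguson patch is commonly written as a bicubic Hermite surface m(u, v) = U C Q Cᵀ Vᵀ with zero twist terms, and the sketch below evaluates one coordinate (or one color channel) of such a patch. Treating this as the form meant by Expression (5) is an assumption, and the function name and argument layout are hypothetical.

```python
import numpy as np

# Hermite basis matrix: [1, t, t^2, t^3] @ C gives the four cubic Hermite weights
# for (value at t=0, value at t=1, derivative at t=0, derivative at t=1).
C = np.array([
    [ 1.0,  0.0,  0.0,  0.0],
    [ 0.0,  0.0,  1.0,  0.0],
    [-3.0,  3.0, -2.0, -1.0],
    [ 2.0, -2.0,  1.0,  1.0],
])

def ferguson_patch(u, v, corners, du, dv):
    """Evaluate one scalar channel of a Ferguson patch at (u, v) in [0, 1]^2.

    corners, du, dv are 2x2 arrays indexed [i, j] for the corner (u=i, v=j):
    the value, the u-derivative, and the v-derivative there. Twists are zero.
    """
    Q = np.zeros((4, 4))
    Q[:2, :2] = corners
    Q[:2, 2:] = dv
    Q[2:, :2] = du
    U = np.array([1.0, u, u * u, u ** 3])
    V = np.array([1.0, v, v * v, v ** 3])
    return float(U @ C @ Q @ C.T @ V)

# Toy patch: corner values 0..3 with flat (zero) derivatives.
corners = np.array([[0.0, 1.0], [2.0, 3.0]])
flat = np.zeros((2, 2))
mid_edge = ferguson_patch(0.5, 0.0, corners, flat, flat)
```

The same evaluation would be applied to x, y, and each color channel, which is how a mesh point's position, color, and their derivatives determine the surface between mesh lines.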

A gradation mesh consists of patches and mesh lines, and each point on the Ferguson patch carries four kinds of information for representing the mesh lines: position, the derivative of position, color, and the derivative of color. FIG. 12(b) is an example of a gradation mesh, in which the two curves passing through point "q" are mesh lines. For example, point "q" in FIG. 12(b) holds the position information of Expression (6), the position derivative information of Expression (7), the RGB color information of Expression (8), and the RGB color derivative information of Expression (9), and the gradation mesh is expressed by Expression (10).

In Non-Patent Document 5, the optimization for vectorization proceeds as follows. First, the mesh is initialized by creating multiple patches, for example by evenly dividing the region of the input raster image that is to be vectorized. Given that P patches have been generated and that the image to be vectorized is "I", the mesh is optimized, and the image thereby vectorized, by minimizing the energy function of Expression (11).

The concrete processing procedure of the interpolation unit 19 is shown in the flowchart of FIG. 13. The interpolation process consists of the face base layer creation unit 19-1, the lens layer creation unit 19-2, and the eyeball layer creation unit 19-3 of FIG. 13. All information created by the layer creation units 19-1 to 19-3 is combined into a single image by the layer integration process S43.

The processing of each of the layer creation units 19-1 to 19-3 is described below. FIG. 14 is a conceptual diagram explaining the processing inside the face base layer creation unit 19-1 in the flowchart of FIG. 13. Using the face base region information created by the region extraction unit 18, the face base layer creation unit 19-1 first applies the face base hole filling process (S38) to fill the spectacle region hole left in the face base region by the region cut-out, and then performs the face base vectorization process (S39).

Specifically, the face base hole filling process (S38) interpolates the color of the spectacle region with the average of the pixel values of the pixels on the boundary with the hole region in the face base region of FIG. 14(a), generating an image in which the hole region also has the same color as the skin, as in FIG. 14(b). The face base vectorization process (S39) creates a gradation mesh for the face base region, optimized by the optimization technique described above. FIG. 14(c) shows the shape information of the mesh created automatically by the optimization; a texture is also generated on this mesh by the optimization. The resulting vector image, which has both mesh information and texture as in FIG. 14(d), is the face base layer, and the generated data is transmitted to the layer integration process S43 and to the lens layer creation unit 19-2.
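The hole filling of S38 can be sketched as follows. The text does not say which pixels count as the boundary, so the sketch assumes they are the non-hole pixels that are 4-connected neighbours of a hole pixel; the function name is hypothetical.

```python
import numpy as np

def fill_hole_with_boundary_mean(image, hole_mask):
    """Fill hole pixels with the mean value of the non-hole pixels touching the hole."""
    padded = np.pad(hole_mask, 1, mode="constant", constant_values=False)
    # A pixel is near the hole if its upper, lower, left, or right neighbour is a hole pixel.
    near_hole = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
                 padded[1:-1, :-2] | padded[1:-1, 2:])
    boundary = near_hole & ~hole_mask
    if not boundary.any():
        raise ValueError("hole has no boundary pixels inside the image")
    filled = image.astype(float).copy()
    filled[hole_mask] = image[boundary].mean(axis=0)
    return filled

# Toy 3x3 grayscale face base with a one-pixel hole in the centre.
face_base = np.array([[10, 20, 10],
                      [20,  0, 20],
                      [10, 20, 10]], dtype=np.uint8)
hole = np.zeros((3, 3), dtype=bool); hole[1, 1] = True
filled = fill_hole_with_boundary_mean(face_base, hole)
```

With a color image of shape height × width × 3 and the same two-dimensional mask, the mean is taken per channel, so the hole receives a single average skin color.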

The eyeball layer creation unit 19-3 uses the eyeball region information created by the region extraction unit 18 and performs the eyeball region vectorization process (S42) with the optimization technique described above. The vector image generated here is the eyeball layer, and the generated data is transmitted to the layer integration process S43 and to the lens layer creation unit 19-2.

The lens layer creation unit 19-2 uses the lens region information created by the region extraction unit 18 and performs the lens region vectorization process (S40) with the optimization technique described above. The vector image generated here is the lens layer. Next, lens layer color correction (S41) is performed using the information of the three generated layers: the face base layer, the eyeball layer, and the lens layer. The lens layer color correction (S41) is described with reference to FIG. 15. Each generated layer has a mesh as in FIG. 15(a) and mesh points at the intersections of the mesh. A mesh point holds color information and controls the color in its vicinity. First, the color of each face-base-side mesh point in FIG. 15(b), that is, each mesh point on the outer contour of the lens layer, is replaced with the color at the same coordinates in the face base layer, as in FIG. 15(c). Next, the color of each eyeball-side mesh point in FIG. 15(b), that is, each mesh point on the inner contour, is replaced with the color at the same coordinates in the eyeball layer, as in FIG. 15(d).
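The color replacement of S41 can be sketched as below. For simplicity the sketch assumes the face base and eyeball layers have been rasterized into height × width × 3 arrays so that a color can be looked up at a mesh point's coordinates; the tuple layout of a mesh point and the side labels "face", "eye", and "interior" are hypothetical.

```python
import numpy as np

def correct_lens_layer_colors(mesh_points, face_base_layer, eyeball_layer):
    """Replace contour mesh point colors with the neighbouring layer's color.

    mesh_points: list of (row, col, side, color) tuples, where side is
    "face" (outer contour), "eye" (inner contour) or "interior".
    """
    corrected = []
    for row, col, side, color in mesh_points:
        if side == "face":      # outer contour: take the face base color
            color = tuple(int(v) for v in face_base_layer[row, col])
        elif side == "eye":     # inner contour: take the eyeball layer color
            color = tuple(int(v) for v in eyeball_layer[row, col])
        corrected.append((row, col, side, color))
    return corrected

# Toy layers and mesh points (made-up values).
face_base_layer = np.zeros((2, 2, 3), dtype=np.uint8)
face_base_layer[0, 0] = (200, 150, 120)
eyeball_layer = np.zeros((2, 2, 3), dtype=np.uint8)
eyeball_layer[1, 1] = (255, 255, 255)
lens_points = [(0, 0, "face", (9, 9, 9)),
               (1, 1, "eye", (9, 9, 9)),
               (0, 1, "interior", (9, 9, 9))]
corrected = correct_lens_layer_colors(lens_points, face_base_layer, eyeball_layer)
```

Interior mesh points keep their own color, so the lens layer blends from the skin color at its outer edge to the eyeball color at its inner edge.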

The layer integration process S43 stacks, from the bottom up, the background region cut out in preprocessing, the face base layer, the lens layer, and the eyeball layer, and converts the result into a single piece of raster image data. Conversion to a raster image turns the image held in vector representation into image data that takes discrete values. For automatic generation of vectorized images, a subdivision mesh may be used instead of the technique of the non-patent document.
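The stacking order can be illustrated with a short sketch. It assumes each layer has already been rasterized and is supplied as an (image, mask) pair, with the mask marking the pixels the layer covers; the function name is hypothetical.

```python
import numpy as np

def integrate_layers(layers):
    """Composite (image, mask) pairs from bottom to top into one raster image."""
    base_image, _ = layers[0]
    out = base_image.copy()
    for image, mask in layers[1:]:
        out[mask] = image[mask]     # an upper layer overwrites what lies below it
    return out

def layer(value, covered):
    """Toy 1x4 layer filled with one value and covering the given columns."""
    mask = np.zeros((1, 4), dtype=bool)
    mask[0, covered] = True
    return np.full((1, 4), value, dtype=np.uint8), mask

# Bottom to top: background, face base layer, lens layer, eyeball layer.
result = integrate_layers([
    layer(1, [0, 1, 2, 3]),
    layer(2, [1, 2, 3]),
    layer(3, [2, 3]),
    layer(4, [3]),
])
```

Placing the eyeball layer last means it is never hidden by the lens layer, which matches the order given in the text.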

The present invention is not limited to the above embodiment; for example, it can also be implemented as a program that causes a computer to function as the functional blocks 11 to 20 of the image processing apparatus 10. The program may cause the computer to execute all of the processing of the functional blocks 11 to 20, or only part of their functions.

The program can be provided to a computer by download from a website or the like. It may also be provided to a computer stored on a recording medium such as a CD-ROM, DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, MO, HDD, or Blu-ray Disc (registered trademark). Because the program code read from the recording medium itself implements the processing of the embodiment, the recording medium also constitutes the present invention.

FIG. 1 is a functional block diagram of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the overall processing of the apparatus.
FIG. 3 is a schematic diagram of layering by the interpolation means.
FIG. 4 is a flowchart showing the processing of the preprocessing unit.
FIG. 5 is a flowchart showing the processing of the spectacle model generation unit.
FIG. 6 is a conceptual diagram of the spectacle model used by the feature extraction unit.
FIG. 7 is a flowchart showing the processing of the feature extraction unit.
FIG. 8 is a flowchart showing the processing of the second learning pattern storage unit.
FIG. 9 is a flowchart showing the processing of the spectacle identification unit.
FIG. 10 is a flowchart showing the processing of the region extraction unit.
FIG. 11 is a conceptual diagram of the regions cut out by the region extraction unit.
FIG. 12 is a conceptual diagram showing the structure of the gradation mesh of the interpolation unit.
FIG. 13 is a flowchart showing the processing of the interpolation unit.
FIG. 14 is a conceptual diagram showing the processing procedure in the face base layer creation unit.
FIG. 15 is a conceptual diagram showing the processing procedure of the lens layer color correction.

Explanation of symbols

10 ... Image processing apparatus
11 ... Image data input unit
12 ... Preprocessing unit (preprocessing means)
13 ... First learning pattern storage unit
14 ... Feature extraction unit (feature extraction means)
15 ... Spectacle model generation unit (spectacle model generation means)
16 ... Spectacle identification unit (spectacle identification means)
17 ... Second learning pattern storage unit
17-1 ... Learning sample generation unit
17-2 ... Identification function data generation unit
18 ... Region extraction unit (region extraction means)
18-1 ... Spectacle region position calculation unit
18-2 ... Region cut-out unit
19 ... Interpolation unit (interpolation means)
19-1 ... Face base layer creation unit
19-2 ... Lens layer creation unit
19-3 ... Eyeball layer creation unit
20 ... Output unit

Claims

An image processing apparatus that identifies the presence or absence of glasses from a captured image of a person and generates a face image from which a glasses pattern is removed,
A spectacle model generation means for generating a spectacle model taking into account the spectacle frame shape and the shape of the eyeball inside the spectacle frame, and the texture inside the spectacle frame including the spectacle frame and the eyeball, from the learning pattern of the sample image;
Feature extraction means for extracting a feature amount of a spectacle region from the face region in the input captured image and the spectacle model;
Spectacle identification means for comparing the feature amount of the spectacle region with the feature amount of the learning pattern to determine the presence or absence of spectacles in the captured image;
An area extracting unit that divides the captured image into arbitrary areas from the feature quantities of the face area and the eyeglass area;
Interpolating means for generating an arbitrary texture from the data of each divided area and integrating the areas;
An image processing apparatus that outputs the integration result.

The image processing apparatus further comprising preprocessing means for identifying the presence or absence of a face in the captured image and extracting the face region, wherein
the feature extraction means searches for the spectacle region and extracts, as a feature amount, the spectacle model parameters obtained as a result of the search, and
the spectacle identification means reads, as a feature amount, an identification parameter of the learning pattern stored in a database, and determines the presence or absence of spectacles in the captured image using the spectacle model parameters.

The image processing apparatus according to claim 1, wherein the region extraction means comprises means for calculating the coordinate values of the spectacle region, the spectacle frame region, and the eyeball region in the captured image, and means for dividing the face region into a plurality of regions, and
the interpolation means comprises means for filling in the pixel values of the cavity of an object in each region separated by the region extraction means, means for creating a layer for an arbitrary region, means for automatically generating a texture from the object and correcting the texture, and means for converting and integrating the plurality of layers into a single image.

An image processing method for identifying the presence or absence of glasses from a captured image of a person and generating a face image from which a glasses pattern is removed,
A first step in which the spectacle model generation means generates a spectacle model taking into account the spectacle frame shape and the shape of the eyeball inside the spectacle frame, and the texture inside the spectacle frame including the spectacle frame and the eyeball, from the learning pattern of the sample image;
A second step in which a feature extraction unit extracts a feature amount of the spectacle region from the face region in the input captured image and the spectacle model;
A third step in which the eyeglass identification means compares the feature amount of the eyeglass region and the feature amount of the learning pattern to determine the presence or absence of glasses in the captured image;
A fourth step in which an area extracting unit divides the captured image into arbitrary areas from the feature quantities of the face area and the eyeglass area;
A fifth step in which an interpolating unit generates an arbitrary texture from the data of the divided areas and integrates the areas;
A sixth step of outputting the integration result of the fifth step;
An image processing method comprising:

The image processing method further comprising a step in which the preprocessing means identifies the presence or absence of a face in the captured image and extracts the face region, wherein
in the second step, the spectacle region is searched for and the spectacle model parameters obtained as a result of the search are extracted as a feature amount, and
in the third step, an identification parameter of the learning pattern stored in a database is read as a feature amount, and the presence or absence of spectacles in the captured image is determined using the spectacle model parameters.

The image processing method according to claim 4, wherein the fourth step includes a step of calculating the coordinate values of the spectacle region, the spectacle frame region, and the eyeball region in the input image data, and a step of dividing the face region into a plurality of regions, and
the fifth step includes a step of filling in the pixel values of the cavity of an object in each region separated by the region extraction means, a step of creating a layer for an arbitrary region, a step of automatically generating a texture from the object and correcting the texture, and a step of converting and integrating the plurality of layers into a single piece of image data.

An image processing program for causing a computer to function as each means constituting the image processing apparatus according to claim 1.

A recording medium on which the image processing program according to claim 7 is recorded.