JP2005346474A

JP2005346474A - Image processing method and image processor and program and storage medium

Info

Publication number: JP2005346474A
Application number: JP2004166142A
Authority: JP
Inventors: Kotaro Yano; 光太郎矢野
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-06-03
Filing date: 2004-06-03
Publication date: 2005-12-15

Abstract

PROBLEM TO BE SOLVED: To discriminate a prescribed object in a picture with good accuracy even when the environment surrounding the object, such as the illumination conditions, changes.

SOLUTION: An image processor that detects the prescribed object in the picture comprises: conversion sections 20 and 30, which convert the picture into a picture in a color space based on human visual perception; a feature amount extraction section 40, which extracts a feature amount from the converted picture; and object detection sections 50 and 60, which detect the prescribed object based on the extracted feature amount.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a technique for discriminating a predetermined subject in an image.

Several methods are conventionally known for detecting, by image processing, the area of an image in which a predetermined subject exists.

For example, according to Japanese Patent Laid-Open No. 6-44365 (Patent Document 1), the color information needed for processing is taken from the target image data, portions characteristic of a face such as the skin region, lips, and eyes are extracted, and the position of the face in the image is obtained. According to Japanese Patent Laid-Open No. 11-15979 (Patent Document 2), a region of moderately low brightness and a skin color region are detected in the image, and it is determined whether a predetermined region of the image is a human face.

However, although all of the above methods treat brightness and color characteristics peculiar to the subject as key points in subject discrimination, none of them takes into account changes in the apparent brightness and color of the subject. As a result, discriminating the subject becomes extremely difficult when the environment in which the subject is placed changes significantly.

Patent Document 1: Japanese Patent Laid-Open No. 6-44365
Patent Document 2: Japanese Patent Laid-Open No. 11-15979
Patent Document 3: Japanese Patent Laid-Open No. 11-53547

In view of this, Japanese Patent Laid-Open No. 11-53547 (Patent Document 3), for example, acquires subject images in advance while varying the environment, such as the illumination conditions, creates a reference dictionary from the distribution of the feature amounts of the acquired subject images, collates the subject against it, and extracts the subject from the target image.

However, collation against a reference dictionary created by photographing the subject under varying illumination conditions ends up weighting texture information more heavily than color information. It is therefore not effective for discriminating subjects for which color information is important, such as detecting a person's face. In addition, because subject images must be acquired while varying the environment, such as the illumination conditions, the reference dictionary is not easy to create.

The present invention has been made in view of the above problems, and its object is to make it possible to discriminate a predetermined subject in an image with good accuracy even when the environment surrounding the subject, such as the illumination conditions, changes.

To solve the above problems and achieve the object, an image processing method according to the present invention is an image processing method for detecting a predetermined subject in an image, comprising: a conversion step of converting the image into an image in a color space based on human perceptual quantities; a feature amount extraction step of extracting a feature amount from the image converted in the conversion step; and a subject detection step of detecting the predetermined subject based on the feature amount extracted in the feature amount extraction step.

In the image processing method according to the present invention, the conversion step performs adaptation processing on the image based on the shooting environment in which the image was captured.

In the image processing method according to the present invention, the subject discrimination step discriminates the predetermined subject based on the distribution of feature amounts obtained by applying the conversion step to images, captured in advance in different environments, that contain the subject to be detected, thereby converting them into images in the color space based on human perceptual quantities, and then extracting feature amounts from them in the feature amount extraction step.

A program according to the present invention causes a computer to execute the above image processing method.

A storage medium according to the present invention stores the above program in a computer-readable manner.

An image processing apparatus according to the present invention is an image processing apparatus for detecting a predetermined subject in an image, comprising: conversion means for converting the image into an image in a color space based on human perceptual quantities; feature amount extraction means for extracting a feature amount from the image converted by the conversion means; and subject detection means for detecting the predetermined subject based on the feature amount extracted by the feature amount extraction means.

According to the present invention, a predetermined subject can be discriminated in an image with good accuracy even when the environment surrounding the subject, such as the illumination conditions, changes.

A preferred embodiment of the present invention is described in detail below with reference to the drawings.

The image processing apparatus according to the present embodiment is a computer such as a PC (personal computer) or WS (workstation). Its purpose is to discriminate a predetermined subject in an input image and detect the region of that subject, where the image is input from a digital camera, downloaded from the Internet, read from a storage medium such as a CD-ROM or DVD-ROM, or obtained in a similar way.

The image processing apparatus according to the present embodiment that performs this processing is described in more detail below. In particular, the present embodiment describes an example of detecting a person's face region in an image captured by a camera.

FIG. 2 is a diagram showing the basic configuration of the image processing apparatus according to the present embodiment.

Reference numeral 201 denotes a CPU, which controls the entire apparatus using programs and data stored in the RAM 202 and the ROM 203, and performs each of the processes described later.

Reference numeral 202 denotes a RAM, which has an area for temporarily storing programs and data read from the external storage device 207 or the storage medium drive device 208, as well as a work area that the CPU 201 uses to execute various processes.

Reference numeral 203 denotes a ROM, which stores a boot program, setting data for the apparatus, and the like.

Reference numerals 204 and 205 denote a keyboard and a mouse, respectively, each used to input various instructions to the CPU 201.

Reference numeral 206 denotes a display unit, which consists of a CRT, a liquid crystal screen, or the like, and displays the results of processing by the CPU 201 as characters, images, and so on.

Reference numeral 207 denotes an external storage device, which is a large-capacity information storage device such as a hard disk drive. It stores an OS (operating system) and the programs and data that cause the CPU 201 to execute the processes described later. These are read into the RAM 202 as needed under the control of the CPU 201.

Reference numeral 208 denotes a storage medium drive device, which reads programs and data recorded on a storage medium such as a CD-ROM or DVD-ROM and outputs them to the RAM 202 or the external storage device 207. Some of the programs and data stored in the external storage device 207 may instead be recorded on the storage medium. In that case, when those programs and data are used, the storage medium drive device 208 reads them from the storage medium and outputs them to the RAM 202.

Reference numeral 209 denotes an I/F (interface), to which a digital camera or a network line such as the Internet or a LAN can be connected.

Reference numeral 210 denotes a bus that connects the above units.

FIG. 1 is a block diagram showing the functional configuration of a computer that functions as the image processing apparatus according to the present embodiment.

In FIG. 1, reference numeral 10 denotes an image input unit, through which the data of the image to be processed is input to the image processing apparatus 100. This image (the input image) shows a human face as the subject.

Reference numeral 20 denotes an appearance parameter setting unit, which sets the parameters for converting the input image into appearance color space data. The parameters set here describe the shooting environment in which the input image was captured, such as the environment, the brightness of the subject, and the type of light source.

Reference numeral 30 denotes a color appearance conversion unit, which converts the input image into image data in the appearance color space, based on human perceptual quantities, according to the appearance parameters set by the appearance parameter setting unit 20.

Reference numeral 40 denotes a subject feature amount calculation unit, which calculates feature amounts for detecting a face from the appearance color space image data produced by the color appearance conversion unit 30. That is, it cuts out a predetermined region of interest from the image as a partial image and obtains feature amounts computed from the brightness and color information.

Reference numeral 50 denotes a reference dictionary storage unit, which stores in advance, as dictionary data, the distribution of feature amounts used to discriminate the subject to be detected.

Reference numeral 60 denotes a subject discrimination unit, which compares the feature amounts calculated by the subject feature amount calculation unit 40 with the dictionary data stored in the reference dictionary storage unit 50 and determines whether the target region belongs to the category of the subject to be detected.

Reference numeral 70 denotes a subject region output unit, which outputs the detected subject region as the detection result.

Each unit shown in FIG. 1 may be implemented as dedicated hardware, but in the present embodiment each unit is realized as a program that causes the CPU 201 to execute the processing, described later, that the unit is to perform.

FIG. 3 is a flowchart of the processing performed by the units shown in FIG. 1. A program following the flowchart of FIG. 3 is loaded into the RAM 202 from the external storage device 207 or the storage medium drive device 208 and executed by the CPU 201, whereby the image processing apparatus according to the present embodiment performs each of the processes described below.

First, the data of the image to be processed (the processing target image) is loaded into the RAM 202 (step S101). The form of loading is not particularly limited. For example, the processing target image data may be stored in the external storage device 207, and the user may enter its file name with the keyboard 204 or the mouse 205 to instruct the CPU 201 to load it into the RAM 202. Alternatively, a digital camera (not shown) may be connected to the I/F 209, and an instruction may be sent to the digital camera (in practice, to the driver software for the digital camera loaded in the RAM 202) to transfer image data that the camera has captured and stored in its internal memory to the RAM 202 of the image processing apparatus.

In the present embodiment, the processing target image is an RGB image in which each pixel component is represented by 8 bits, and its size is M × N pixels (where M is the number of horizontal pixels and N is the number of vertical pixels). The processing target image is assumed to consist of three planes: a plane made up only of R (red) pixel values, a plane made up only of G (green) pixel values, and a plane made up only of B (blue) pixel values. It is not limited to this form, however, and may be a single image.

If the processing target image data is compressed, for example with JPEG, the CPU 201 decompresses it and generates an image in which each pixel is represented by R, G, and B components.

Next, the appearance parameters corresponding to the processing target image loaded into the RAM 202 in step S101 are set (step S102). The appearance parameter setting process of step S102 is described in more detail here.

The appearance parameters to be set, and how each is set, are listed below.

Camera profile: A table that associates the image data (RGB) output by the camera with color data (tristimulus values XYZ) in a standard shooting environment. It is created from RGB data obtained by photographing a set of color patches in the environment in which the image data was captured, and from XYZ data obtained by measuring the spectral distribution characteristics of the same patches in a standard shooting environment. Alternatively, shooting scenes may be broadly classified by light source, such as daylight, cloudy, fluorescent, tungsten, and strobe, with a camera profile created in advance for each scene; one of these then serves as the representative profile. In that case, the user selects the camera profile that corresponds to the scene in which the image was captured.

White point: The XYZ data of the white that serves as the reference for the image data. The user designates a white-level region in the image. The image data of the designated region is converted from RGB to XYZ using the camera profile, and the average is used as the white point data. Alternatively, the white point may be obtained by automatically extracting relatively high-luminance portions of the image.
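Both ways of obtaining the white point can be sketched as follows. This is a minimal illustration that assumes the image has already been converted to XYZ with the camera profile; the function names, the end-exclusive region format, and the `top_fraction` threshold are hypothetical and do not appear in the patent.

```python
def mean_xyz(pixels):
    """Average a list of (X, Y, Z) tuples component by component."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def white_point_from_region(xyz_image, region):
    """White point as the mean XYZ of a user-designated region.

    xyz_image: list of rows, each a list of (X, Y, Z) tuples.
    region: (x0, y0, x1, y1), end-exclusive (hypothetical format).
    """
    x0, y0, x1, y1 = region
    pixels = [xyz_image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return mean_xyz(pixels)

def white_point_auto(xyz_image, top_fraction=0.05):
    """White point as the mean XYZ of the brightest pixels (largest Y).

    top_fraction is an assumed tuning parameter, not a value from the patent.
    """
    pixels = sorted((p for row in xyz_image for p in row),
                    key=lambda p: p[1], reverse=True)
    count = max(1, int(len(pixels) * top_fraction))
    return mean_xyz(pixels[:count])
```

The automatic variant is only one plausible reading of "relatively high-luminance portions"; a real implementation would likely also reject clipped or strongly colored highlights.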

Adaptation luminance: The absolute luminance of the adaptation field. The subject luminance at the time of shooting is input as the adaptation luminance. Because the subject luminance can be calculated from the sensitivity, aperture value, and exposure time used for the shot, these camera settings may be input instead.
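The patent does not give the formula that relates the exposure settings to subject luminance. One common choice is the reflected-light exposure equation, N²/t = L·S/K, solved for L. The sketch below assumes that equation and a calibration constant K of 12.5, which is a typical light-meter value rather than anything stated in the source.

```python
def subject_luminance(f_number, exposure_time_s, iso, k=12.5):
    """Estimate scene luminance in cd/m^2 from exposure settings.

    Assumes the reflected-light exposure equation N^2 / t = L * S / K.
    K = 12.5 is an assumed meter calibration constant.
    """
    return k * f_number ** 2 / (exposure_time_s * iso)

# Example: f/8, 1/125 s, ISO 100
adaptation_luminance = subject_luminance(8.0, 1.0 / 125.0, 100.0)
```

This estimate is only as good as the assumption that the camera exposed the scene the way a calibrated meter would; exposure compensation or flash would need separate handling.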

Background luminance: The relative luminance of the background region. The background region is the region within a viewing angle of 10 degrees. The size of the partial region of the image data that corresponds to a viewing angle of 10 degrees is therefore input, and the average luminance within that partial region is used as the background luminance. Here, the image data is converted from RGB to XYZ using the camera profile, and the average of the Y values is output. The calculation is performed for each pixel, with the partial region set each time so that it is centered on the pixel of interest. Because the size of the partial region corresponding to a viewing angle of 10 degrees can be derived from the number of pixels and pixel size of the captured image and the focal length of the taking lens, these values may be input instead of the size of the partial region.
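A minimal sketch of this computation follows. It assumes a simple pinhole model in which a field angle θ covers 2·f·tan(θ/2) on the sensor, and it clips the window at the image border; both choices, and all names, are illustrative assumptions rather than details from the patent.

```python
import math

def window_size_px(focal_length_mm, pixel_pitch_mm, angle_deg=10.0):
    """Side length in pixels of the region spanning angle_deg (pinhole model)."""
    width_mm = 2.0 * focal_length_mm * math.tan(math.radians(angle_deg / 2.0))
    return max(1, round(width_mm / pixel_pitch_mm))

def background_luminance(y_plane, cx, cy, size):
    """Mean Y inside a size x size window centered on (cx, cy).

    y_plane: list of rows of Y values. The window is clipped at the
    image border (an assumed boundary policy).
    """
    half = size // 2
    height, width = len(y_plane), len(y_plane[0])
    x0, x1 = max(0, cx - half), min(width, cx + half + 1)
    y0, y1 = max(0, cy - half), min(height, cy + half + 1)
    values = [y_plane[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)
```

For full-size images, recomputing the window mean at every pixel is expensive; a summed-area table would give the same result in constant time per pixel.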

Ambient conditions: The user selects and specifies whether the shooting environment was an average environment, a dim environment, or a dark environment. Images captured in dark surroundings, such as night scenes, are assigned the dark environment; images captured at dawn or dusk, in dim interiors, and the like are assigned the dim environment; and all other ordinary bright surroundings are assigned the average environment.

In other words, the user enters the above conditions in the appearance parameter setting unit 20 using the keyboard 204 and the mouse 205. Incidental information attached to the image data can be used in place of manual input where available. If a white balance mode for a particular light source is recorded, it can be used to select the camera profile. If the sensitivity, aperture value, and exposure time are recorded as the shooting mode, they can be used to calculate the adaptation luminance; the subject luminance calculated from them can also be used to select the ambient conditions automatically. If the number of pixels, the pixel size, and the focal length of the taking lens are recorded, they can be used to calculate the background luminance. If a scene mode that identifies the environment, such as night scene or sunset, is recorded, it can be used to select the ambient conditions automatically.

Next, the image data is converted into appearance color space data according to the appearance parameters that have been set (step S103).

The conversion to the appearance color space described below is performed by the color appearance conversion unit 30 of FIG. 1, whose function can be divided into four blocks as shown in FIG. 4. Reference numeral 31 denotes a color matching unit, which converts image data captured by the camera in the shooting environment into colors in a standard shooting environment. Reference numeral 32 denotes a chromatic adaptation unit, which performs chromatic adaptation processing based on the white point. Reference numeral 33 denotes a cone response unit, which performs nonlinear compression that simulates the response of the visual cones based on the adaptation luminance. Reference numeral 34 denotes a perceptual attribute extraction unit, which obtains the perceptual attributes of brightness, saturation, and hue from the cone response output based on the background luminance and the ambient conditions. That is, the conversion to the appearance color space comprises color matching across different environments, chromatic adaptation, luminance adaptation, and conversion to perceptual attributes (brightness, saturation, and hue).

First, the color matching unit 31 converts the RGB data acquired by the image input unit 10 into tristimulus value XYZ data in the standard shooting environment, using the camera profile that was set in advance as an appearance parameter. The camera profile stores, as a table, the XYZ values corresponding to representative RGB values on a grid with a predetermined spacing. The XYZ values for an arbitrary RGB value are obtained by interpolation, for example with existing tetrahedral interpolation.
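The table lookup with tetrahedral interpolation can be sketched as below. The table layout (`lut[i][j][k]` holding an XYZ tuple at each grid node, with `n` evenly spaced nodes per axis) is an assumption made for illustration; the patent only says that the values are stored on a grid.

```python
def tetrahedral_lookup(lut, n, rgb, max_value=255.0):
    """Interpolate XYZ for an RGB triple from an n x n x n table.

    lut[i][j][k] is the (X, Y, Z) tuple stored at grid node (i, j, k).
    """
    index, frac = [], []
    for value in rgb:
        pos = value / max_value * (n - 1)
        i = min(int(pos), n - 2)      # keep the upper neighbor inside the grid
        index.append(i)
        frac.append(pos - i)

    # Walk from the lower corner of the cell to the upper corner, stepping
    # along the axes in order of decreasing fractional offset. The four
    # corners visited are the vertices of the enclosing tetrahedron.
    order = sorted(range(3), key=lambda axis: frac[axis], reverse=True)
    corner = list(index)
    previous = lut[corner[0]][corner[1]][corner[2]]
    result = list(previous)
    for axis in order:
        corner[axis] += 1
        current = lut[corner[0]][corner[1]][corner[2]]
        for c in range(3):
            result[c] += frac[axis] * (current[c] - previous[c])
        previous = current
    return tuple(result)
```

Tetrahedral interpolation reads four table nodes per lookup instead of the eight needed for trilinear interpolation, and it reproduces any table that is linear in R, G, and B exactly.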

The chromatic adaptation unit 32 then performs the following chromatic adaptation processing based on the white point.

Ra = {D × Rs / Rw + (1 − D)} × R
Ga = {D × Gs / Gw + (1 − D)} × G
Ba = {D × Bs / Bw + (1 − D)} × B … (1)

Here, (R, G, B) are the values obtained by converting the XYZ output of the color matching unit 31 into RGB according to a predetermined conversion formula (for example, a 3 × 3 matrix operation; hereinafter called the XYZ-RGB conversion). (Rs, Gs, Bs) are the values obtained by applying the XYZ-RGB conversion to the XYZ data of the white point in the standard shooting environment. (Rw, Gw, Bw) are the values obtained by applying the XYZ-RGB conversion to the XYZ data set as the white point from the image data. (Ra, Ga, Ba) are the RGB data after chromatic adaptation. D is a parameter representing the degree of adaptation and is set to a predetermined value such as 0.1. D may also be calculated from the adaptation luminance and the ambient conditions, or the user may be allowed to set it in the appearance parameter setting unit 20. The RGB data after chromatic adaptation is then converted to XYZ according to a predetermined conversion formula (the inverse of the XYZ-RGB conversion, for example a 3 × 3 matrix operation; hereinafter called the RGB-XYZ conversion) and output.
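Equation (1) and one way of deriving D from the adaptation luminance and the ambient conditions can be sketched as follows. The formula for D and the surround factors F are taken from CIECAM02, which the description names later as one possible color appearance model; whether the patent intends exactly this formula is an assumption.

```python
import math

# CIECAM02 surround factor F for each ambient condition.
SURROUND_F = {"average": 1.0, "dim": 0.9, "dark": 0.8}

def degree_of_adaptation(adaptation_luminance, f=1.0):
    """Degree of adaptation D as defined in CIECAM02 (assumed model)."""
    return f * (1.0 - (1.0 / 3.6) * math.exp((-adaptation_luminance - 42.0) / 92.0))

def chromatic_adaptation(rgb, white_standard, white_image, d):
    """Apply equation (1) to one pixel.

    rgb: (R, G, B) after the XYZ-RGB conversion.
    white_standard: (Rs, Gs, Bs); white_image: (Rw, Gw, Bw).
    """
    return tuple((d * s / w + (1.0 - d)) * value
                 for value, s, w in zip(rgb, white_standard, white_image))
```

With D = 1 the image white point maps exactly onto the standard white point (full adaptation), and with D = 0 the pixel is returned unchanged.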

Next, the cone response unit 33 performs nonlinear compression that simulates the response of the visual cones based on the adaptation luminance. First, the XYZ data is converted, according to a predetermined conversion formula (for example, a 3 × 3 matrix operation; hereinafter called the XYZ-LMS conversion), into color data LMS in the three frequency bands that correspond to the cones of human vision. The LMS data is then nonlinearly compressed according to a predetermined compression formula that takes the adaptation luminance as a parameter, and the result is output.
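The patent leaves the compression formula unspecified. As an illustration, the sketch below uses the post-adaptation compression of CIECAM02, in which the adaptation luminance enters through the luminance-level factor F_L. Treat this as one candidate formula under that assumption, not as the patent's own definition.

```python
import math

def luminance_level_factor(adaptation_luminance):
    """CIECAM02 luminance-level adaptation factor F_L."""
    k = 1.0 / (5.0 * adaptation_luminance + 1.0)
    scaled = 5.0 * adaptation_luminance
    return 0.2 * k ** 4 * scaled + 0.1 * (1.0 - k ** 4) ** 2 * scaled ** (1.0 / 3.0)

def compress_cone_response(value, f_l):
    """CIECAM02 nonlinear compression of one cone signal (L, M, or S)."""
    x = (f_l * abs(value) / 100.0) ** 0.42
    compressed = 400.0 * x / (27.13 + x)
    # Negative inputs are mirrored, as in the CIECAM02 definition.
    return math.copysign(compressed, value) + 0.1

def cone_response(lms, adaptation_luminance):
    """Compress an (L, M, S) triple for the given adaptation luminance."""
    f_l = luminance_level_factor(adaptation_luminance)
    return tuple(compress_cone_response(v, f_l) for v in lms)
```

The compression is monotonic and saturates below 400.1, so very bright inputs are squeezed into a bounded range while dark inputs keep most of their contrast.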

Next, the perceptual attribute extraction unit 34 obtains the perceptual attributes of brightness J, saturation C, and hue h from the cone response output, according to predetermined conversion formulas based on the background luminance and the ambient conditions. First, the hue h is obtained from the LMS data that forms the cone response output. Next, luminance adaptation is applied to the LMS data using constant parameters determined by the ambient conditions together with the background luminance, and the brightness J is obtained. The saturation C is then obtained from the hue h and brightness J just computed, the constant parameters determined by the ambient conditions, and the background luminance. These processes can be realized, for example, by conversion based on a color appearance model such as CIECAM97s or CIECAM02.
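As an illustration of the first of these steps, the hue angle in CIECAM02 is computed from opponent signals a (red-green) and b (yellow-blue) formed from the compressed cone responses. The sketch assumes CIECAM02 is the model in use; the brightness and saturation formulas are omitted here.

```python
import math

def hue_angle(compressed_lms):
    """CIECAM02 hue angle h in degrees, in the range [0, 360).

    compressed_lms: cone responses after nonlinear compression.
    """
    l, m, s = compressed_lms
    a = l - 12.0 * m / 11.0 + s / 11.0      # red-green opponent signal
    b = (l + m - 2.0 * s) / 9.0             # yellow-blue opponent signal
    return math.degrees(math.atan2(b, a)) % 360.0
```

Because h depends only on the ratio of the opponent signals, it is largely insensitive to overall intensity, which is what makes it usable for selecting skin color pixels in the feature extraction that follows.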

Next, a partial region is cut out of the image in order to detect the desired subject (step S104).

In step S104, partial regions of several predetermined sizes are moved across the image in steps of a predetermined number of pixels so as to scan it, and a partial region is cut out at each position. When detecting a person's face in a portrait image, for example, partial regions with an area of 5% to 25% of the image size are used in five steps of 5% each, scanning with the largest size first.
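The scan can be sketched as a generator of window rectangles. The sketch assumes that each window keeps the aspect ratio of the image and that the scan step is a fixed number of pixels; the patent specifies neither, so both are illustrative choices.

```python
import math

def partial_regions(width, height,
                    area_fractions=(0.25, 0.20, 0.15, 0.10, 0.05),
                    step=8):
    """Yield (x, y, w, h) windows, largest area fraction first.

    Each window covers the given fraction of the image area and keeps
    the image aspect ratio (assumed). step is the scan stride in pixels.
    """
    for fraction in area_fractions:
        scale = math.sqrt(fraction)
        w = max(1, int(width * scale))
        h = max(1, int(height * scale))
        for y in range(0, height - h + 1, step):
            for x in range(0, width - w + 1, step):
                yield (x, y, w, h)
```

For a 100 × 100 image with `step=50`, the first scale (25% of the area) yields four 50 × 50 windows, at the four corners of the image.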

Next, for each partial region, the subject feature amount calculation unit 40 calculates the feature amounts for detecting a face from the image data in the appearance color space (JCh) (step S105).

First, the cut-out partial region is normalized to a predetermined size. The following five feature amounts are then obtained from the normalized image data.

Average brightness: The average of the J component of the image data.

Brightness of the skin color portion: Pixels whose hue h component falls within a predetermined range are treated as skin color pixels, and the average of the J component of the skin color pixels is obtained.

Brightness of the eyes, mouth, and nose: As shown in FIG. 5, predetermined regions of the image data extracted as the partial region and normalized are taken to be the eyes (A in FIG. 5), the mouth (B in FIG. 5), and the nose (C in FIG. 5), and the average of the J component of the pixels in each region is obtained.

These feature amounts were chosen so that a face can be discriminated on the assumptions that, in a face image, the skin color portion and the nose are brighter than the face as a whole, while the eyes and mouth are darker than the face as a whole. Because skin color pixels are identified from the perceptual attributes of the appearance color space, the method can cope with the loss of accuracy that a change in light source color would otherwise cause, even when differences in the shooting environment shift the hue of the skin color somewhat. Likewise, when illumination casts shadows on the face image, luminance adaptation removes the shadow-dependent component to some extent, so the feature amounts can be extracted accurately.
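The five feature amounts can be sketched as below for a normalized patch given as separate J and h planes. The skin hue range and the eye, mouth, and nose rectangles are hypothetical placeholders; the patent defines them only by reference to FIG. 5 and "a predetermined range".

```python
def mean(values):
    """Mean of an iterable, or 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def face_features(j_plane, h_plane, boxes, skin_hue=(20.0, 60.0)):
    """Compute the five brightness features of a normalized patch.

    j_plane, h_plane: equally sized lists of rows (brightness J, hue h).
    boxes: {"eyes": ..., "mouth": ..., "nose": ...}, each an end-exclusive
           (x0, y0, x1, y1) rectangle (hypothetical layout).
    skin_hue: assumed hue range, in degrees, treated as skin color.
    """
    rows, cols = len(j_plane), len(j_plane[0])
    coords = [(y, x) for y in range(rows) for x in range(cols)]
    features = {
        "mean": mean(j_plane[y][x] for y, x in coords),
        "skin": mean(j_plane[y][x] for y, x in coords
                     if skin_hue[0] <= h_plane[y][x] <= skin_hue[1]),
    }
    for name, (x0, y0, x1, y1) in boxes.items():
        features[name] = mean(j_plane[y][x]
                              for y in range(y0, y1) for x in range(x0, x1))
    return features
```

Returning a dictionary keyed by feature name keeps the later comparison against the reference dictionary straightforward.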

なお、本実施形態のように特徴量算出に明るさ、色相の知覚属性のみが必要であるような場合には、特に必要なもののみ計算すればよく、本例では彩度の算出は特に必要でないことはいうまでもない。 In addition, when only the brightness and hue perceptual attributes are required for the feature amount calculation as in the present embodiment, it is only necessary to calculate what is necessary. In this example, it is particularly necessary to calculate the saturation. It goes without saying that it is not.

Five brightness feature amounts have been described here as feature amounts representative of a face, but feature amounts based on color, or on edge strength and direction, may also be calculated and used for discrimination.

Next, based on the calculated feature amounts, the subject determination unit 50 determines how face-like the partial region is (step S106).

The face-likeness determination uses dictionary data stored in the reference dictionary storage unit 50. The reference dictionary storage unit 50 stores feature amount data calculated in advance from image data containing faces photographed in various environments, using the same processing as steps S101 to S105 of this embodiment. When the reference dictionary is created, however, the partial-region extraction corresponding to step S104 is performed by entering the position of the face in the image while viewing the image. The feature amounts calculated for each face are processed statistically. For each feature amount, and for each evaluation function that combines feature amounts, an upper limit, a lower limit, and a center value are set so as to best separate face images from non-face images, and these values are stored as the reference dictionary. Examples of evaluation functions that combine feature amounts are the ratios of the skin-color, eye, mouth, and nose brightness to the brightness of the whole face, and the differences between the skin-color brightness and the eye and mouth brightness. In this embodiment, the image data is also converted into an image in the appearance color space when the reference dictionary is created, so normalization based on perceptual amounts is applied there as well. Accurate detection is therefore possible even when the shooting environment of the reference images differs from that of the image to be examined.
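One way to picture the reference dictionary is the sketch below. It is an assumption-laden illustration: the text says the limits are set "optimally" by statistical processing without giving the procedure, so the minimum, mean, and maximum over the training faces are used here purely as stand-ins. The feature keys (`face`, `skin`, `eyes`, `mouth`, `nose`) and both function names are hypothetical.

```python
from statistics import mean


def evaluation_values(f):
    """Add the example evaluation functions named in the text to the raw features.

    f is a dict with hypothetical keys "face", "skin", "eyes", "mouth", "nose".
    """
    out = dict(f)
    # Ratio of each part's brightness to the brightness of the whole face.
    for part in ("skin", "eyes", "mouth", "nose"):
        out[f"{part}/face"] = f[part] / f["face"]
    # Differences between skin brightness and eye / mouth brightness.
    out["skin-eyes"] = f["skin"] - f["eyes"]
    out["skin-mouth"] = f["skin"] - f["mouth"]
    return out


def build_reference_dictionary(training_features):
    """Return {key: (lower, center, upper)} over all training faces.

    Assumption: min / mean / max replace the unspecified optimal limits.
    """
    rows = [evaluation_values(f) for f in training_features]
    return {
        key: (
            min(r[key] for r in rows),
            mean(r[key] for r in rows),
            max(r[key] for r in rows),
        )
        for key in rows[0]
    }
```

A real dictionary would presumably also use non-face examples when placing the limits, since the text asks for the best separation between face and non-face images. That step is omitted here.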

Face-likeness is determined by comparing the feature amounts extracted from the image data of the partial region with the data in the reference dictionary. First, it is determined whether the value of each feature amount, and of each evaluation function combining feature amounts, lies between its upper and lower limits. A partial region for which all values lie within their limits is selected as a face candidate. In addition, the reciprocal of the sum of the distances between each of these values and its corresponding center value is calculated as the face-likeness index.
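The two-part judgment described above (range check, then reciprocal of summed distances) can be sketched as follows. The function name `judge_face` and the dictionary layout `{key: (lower, center, upper)}` are assumptions made for illustration, and the absolute difference is assumed as the "distance" because the text does not define it.

```python
def judge_face(values, reference):
    """Return (is_candidate, index) for one partial region.

    values:    {key: value} for each feature amount / evaluation function.
    reference: {key: (lower, center, upper)}, a hypothetical dictionary layout.
    """
    # A region is a face candidate only if every value is within its limits.
    is_candidate = all(
        reference[k][0] <= values[k] <= reference[k][2] for k in reference
    )
    # Face-likeness index: reciprocal of the summed distances to the centers.
    total = sum(abs(values[k] - reference[k][1]) for k in reference)
    index = float("inf") if total == 0 else 1.0 / total
    return is_candidate, index
```

Because the distances are summed without scaling, a feature with a large numeric range dominates the index. Whether the original method rescales the features is not stated.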

In this embodiment, each feature amount and each evaluation function combining feature amounts is judged individually, and the face-likeness index is obtained from the sum of the distances to the center values. Alternatively, the feature amounts and evaluation functions may be treated as a feature vector. In that case the reference dictionary stores the mean and covariance matrix of the feature vectors of the reference face images, together with a decision threshold on the Mahalanobis distance. The Mahalanobis distance is calculated from the feature vector extracted from the image data of the partial region and from the mean and covariance matrix in the reference dictionary. A partial region whose distance is at or below the threshold is determined to be a face candidate, and the reciprocal of the calculated Mahalanobis distance is used as the face-likeness index.
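The Mahalanobis-distance variant can be sketched with the standard library alone. The distance is d(x) = sqrt((x - mu)^T S^-1 (x - mu)), where mu and S are the mean and covariance matrix of the reference feature vectors. All function names are hypothetical, the sample covariance (divisor n - 1) is an assumption, and a small Gaussian-elimination solver stands in for a linear-algebra library.

```python
import math


def mean_vector(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def covariance_matrix(samples, mu):
    """Sample covariance matrix (divisor n - 1, an assumption)."""
    n, d = len(samples), len(mu)
    return [
        [sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def mahalanobis(x, mu, cov):
    """Mahalanobis distance of x from the distribution (mu, cov)."""
    diff = [a - b for a, b in zip(x, mu)]
    y = solve(cov, diff)  # y = cov^-1 * diff without forming the inverse
    return math.sqrt(sum(d * v for d, v in zip(diff, y)))


def judge_by_mahalanobis(x, mu, cov, threshold):
    """Return (is_candidate, index): candidate if distance <= threshold."""
    d = mahalanobis(x, mu, cov)
    index = float("inf") if d == 0 else 1.0 / d
    return d <= threshold, index
```

Unlike the per-feature range check, this variant accounts for the scale of each feature and for correlations between features through the covariance matrix.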

Next, it is determined whether the face-discrimination scan has finished for all partial regions (step S107). If it has not, the process returns to step S104 to perform face discrimination on the next partial region. If it has, the process proceeds to step S108.

Next, the subject area output unit 70 checks all partial regions determined to be face candidates for overlapping face regions (step S108).

First, pairs of partial regions determined to be face candidates are checked in turn to see whether they overlap in position. When an overlapping pair is found, the partial region with the lower face-likeness index obtained in step S106 is removed from the face candidates. This processing is repeated until no partial regions overlap, and the remaining regions become the final face candidates.
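The overlap-removal loop of step S108 can be sketched as below. The rectangle representation `(x, y, w, h)`, the function names, and the rule that any shared area of positive size counts as an overlap are assumptions; the text does not say how overlap is measured or how ties are broken.

```python
from itertools import combinations


def overlaps(a, b):
    """True if axis-aligned rectangles (x, y, w, h) share a region of positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def remove_overlaps(candidates):
    """candidates: list of (rect, index) pairs.

    Repeatedly find an overlapping pair and drop the member with the lower
    face-likeness index, until no two remaining rectangles overlap.
    """
    remaining = list(candidates)
    while True:
        for a, b in combinations(remaining, 2):
            if overlaps(a[0], b[0]):
                # On a tie the second member of the pair is dropped (assumption).
                remaining.remove(a if a[1] < b[1] else b)
                break  # restart the pairwise scan on the reduced list
        else:
            return remaining  # no overlapping pair left
```

Because pairs are handled one at a time, the result can depend on the order in which pairs are visited when three or more candidates form a chain of overlaps.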

Finally, the subject area output unit 70 outputs all partial regions determined to be face candidates to the display unit 206 (step S109). For example, a rectangular frame marking each detected face region is superimposed on the image data acquired by the image input unit 10.
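As a rough illustration of the superimposed frame, the sketch below draws a one-pixel rectangular outline on a grayscale image held as a 2-D list. The function name `draw_frame`, the `(x, y, w, h)` rectangle format, and the frame value 255 are assumptions; the rectangle is assumed to lie entirely inside the image.

```python
def draw_frame(image, rect, value=255):
    """Return a copy of image with a one-pixel frame drawn around rect.

    image: 2-D list of pixel values; rect: (x, y, w, h), assumed in bounds.
    """
    x, y, w, h = rect
    out = [row[:] for row in image]  # leave the input image untouched
    for cx in range(x, x + w):
        out[y][cx] = value           # top edge
        out[y + h - 1][cx] = value   # bottom edge
    for cy in range(y, y + h):
        out[cy][x] = value           # left edge
        out[cy][x + w - 1] = value   # right edge
    return out
```

Calling `draw_frame` once per final face candidate, each time on the previous result, yields an image with every detected region outlined.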

As described above, according to this embodiment, a person's face can be accurately discriminated in an image even when the environment surrounding the subject, such as the illumination conditions, changes.

Although this embodiment has been described for the case of detecting a face, the technique applies broadly to detecting any desired subject, provided that the subject feature amount extraction unit 40 extracts feature amounts suited to the subject to be detected and the subject determination unit 50 discriminates on the basis of a reference dictionary suited to that subject. In particular, this embodiment takes the mechanism of human vision as a model: before subject recognition, the input image undergoes color and luminance adaptation and conversion into perceptual amounts, which achieves the robustness to environmental change that characterizes human vision.

(Other embodiments)
Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a recording medium (or storage medium) on which software program code implementing the functions of the embodiment described above is recorded, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the recording medium. In this case, the program code read from the recording medium itself implements the functions of the embodiment, and the recording medium on which the program code is recorded constitutes the present invention.

Needless to say, the invention covers not only the case where the functions of the embodiment are implemented by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing in accordance with the instructions of that program code, and that processing implements the functions of the embodiment.

Needless to say, the invention further covers the case where the program code read from the recording medium is written to a memory on a function expansion card inserted in the computer or on a function expansion unit connected to the computer, after which a CPU or the like on that card or unit performs part or all of the actual processing in accordance with the instructions of the program code, and that processing implements the functions of the embodiment.

When the present invention is applied to such a recording medium, the recording medium stores program code corresponding to the flowchart described above.

FIG. 1 is a block diagram showing the functional configuration of a computer that functions as an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the basic configuration of the image processing apparatus according to the embodiment of the present invention.
FIG. 3 is a flowchart of the processing performed by the units shown in FIG. 1.
FIG. 4 is a block diagram showing details of the color appearance conversion unit.
FIG. 5 is a diagram explaining the feature amounts of a face image.

Claims

An image processing method for detecting a predetermined subject from an image, comprising:
a conversion step of converting the image into an image in a color space based on a human perception amount;
a feature amount extraction step of extracting a feature amount from the image converted in the conversion step; and
a subject detection step of detecting the predetermined subject based on the feature amount extracted in the feature amount extraction step.

The image processing method according to claim 1, wherein in the conversion step, an adaptation process is performed on the image based on a shooting environment when the image is shot.

The image processing method according to claim 1, wherein, in the subject detection step, the predetermined subject is discriminated based on a distribution of feature amounts extracted in the feature amount extraction step from images that include the subject to be detected, were captured in advance in different environments, and were converted in the conversion step into images in the color space based on the human perception amount.

A program for causing a computer to execute the image processing method according to any one of claims 1 to 3.

A storage medium storing the program according to claim 4 in a computer-readable manner.

An image processing apparatus for detecting a predetermined subject from an image, comprising:
conversion means for converting the image into an image in a color space based on a human perception amount;
feature amount extraction means for extracting a feature amount from the image converted by the conversion means; and
subject detection means for detecting the predetermined subject based on the feature amount extracted by the feature amount extraction means.