JP2019096073A

JP2019096073A - Image processor, method for processing image, and program

Info

Publication number: JP2019096073A
Application number: JP2017225113A
Authority: JP
Inventors: Yuji Kaneda (金田 雄司); Hiroshi Sato (佐藤 博); Toshiaki Nakano (中野 俊亮); Atsuo Nomoto (野本 敦夫); Daisuke Nishino (西野 大輔)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-11-22
Filing date: 2017-11-22
Publication date: 2019-06-20

Abstract

PROBLEM TO BE SOLVED: To recognize attributes more accurately. SOLUTION: An image processor includes: detection means for detecting the position of an object in an image; extraction means for extracting a feature amount from the image based on the position of the object; attribute identification means for outputting, using the feature amount, an attribute score for each attribute of the object; and correction means for correcting the attribute score for at least one attribute of the object based on the attribute score, output by the attribute identification means, for at least one other attribute of the object. SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing apparatus, an image processing method, and a program.

In recent years, there has been active development of technology that not only identifies facial expressions and individuals but also recognizes face-related attributes such as race, age group, gender, and facial hair, as well as body-related attributes such as clothing.
For example, Non-Patent Document 1 detects the positions of the eyes, mouth, and nose in an image and, based on these positions, extracts a wide variety of feature amounts, such as RGB and HSV color features, edges, and histogram-based features. These feature amounts are then input to a classifier called a Support Vector Machine (hereinafter, SVM) to recognize as many as 65 face-related attributes, such as glasses, gender, age group, facial hair, and hair.
In addition, as in Non-Patent Document 2, there is technology for recognizing body-related attributes such as long pants, jeans, and T-shirts.

Non-Patent Document 1: N. Kumar, "Attribute and Simile Classifiers for Face Verification", IEEE ICCV, 2009
Non-Patent Document 2: L. Bourdev, "Describing People: A Poselet-Based Approach to Attribute Classification", IEEE ICCV, 2011
Non-Patent Document 3: P. Viola, M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", in Proc. of CVPR, vol. 1, pp. 511-518, December 2001
Non-Patent Document 4: Xudong Cao, Yichen Wei, Fang Wen, Jian Sun, "Face Alignment by Explicit Shape Regression", CVPR, pp. 2887-2894, 2012

However, technology for recognizing a variety of attributes faces two problems.
The first is that the number of attributes relating to the face and the human body is very large, so the more attributes are added, the more classifiers must be trained, each requiring its own positive and negative samples. In other words, a great deal of effort is required.
The second is that the difficulty of identification differs greatly from attribute to attribute. For example, for glasses, a face-related attribute, the frames can easily be detected as edges. Moreover, by using the face detection technology of Non-Patent Document 3 and the facial feature point detection technology of Non-Patent Document 4, the region where glasses can appear can be set accurately. Glasses can therefore be identified with high accuracy.
By contrast, hair color, also a face-related attribute, changes greatly depending on the white balance of the imaging device and the ambient light, so it is very difficult to identify with high accuracy.
Likewise, for body-related attributes such as long pants and T-shirts, the upper-body and lower-body regions cannot be located as accurately as the face, so these attributes are also very difficult to identify with high accuracy.

The image processing apparatus of the present invention includes: detection means for detecting the position of an object included in an image; extraction means for extracting a feature amount from the image based on the position of the object; attribute identification means for outputting, using the feature amount, an attribute score for each of a plurality of attributes of the object; and correction means for correcting, based on the attribute score output by the attribute identification means for at least one of the plurality of attributes of the object, the attribute score for at least one other of the plurality of attributes.

According to the present invention, the attributes of an object can be recognized with higher accuracy.

FIG. 1 is a diagram showing an example of the hardware configuration of the image processing apparatus.
FIG. 2 is a diagram showing an example of the software configuration of the image processing apparatus.
FIG. 3 is a diagram showing an example of the configuration of the position detection unit.
FIG. 4 is a diagram showing an example of the configuration of the feature extraction unit.
FIG. 5 is a diagram showing an example of the configuration of the attribute identification result correction unit.
FIG. 6 is a flowchart showing an example of information processing in the image processing apparatus.
FIG. 7 is a diagram explaining facial feature points.
FIG. 8 is a diagram explaining local regions.
FIG. 9 is a diagram showing an example of generating a feature vector V_n from a wide variety of feature amounts.
FIG. 10 is a diagram showing an example of the scores output by the attribute classifiers.
FIG. 11 is a diagram showing an example of correcting the score of the mustache attribute, which contradicts the female attribute.
FIG. 12 is a flowchart showing an example of the attribute identification result correction processing.
FIG. 13 is a diagram showing an example of a feature vector whose elements are attribute scores.
FIG. 14 is a diagram showing an example of a sample S_n in the full attribute space V before correction.
FIG. 15 is a diagram showing an example of a sample S_n in the full attribute space V after correction.
FIG. 16 is a diagram showing a modification of the software configuration of the image processing apparatus.
FIG. 17 is a flowchart showing a modification of the attribute identification result correction processing.

Embodiments of the present invention are described below with reference to the drawings.

<First Embodiment>
The first embodiment describes processing that corrects attribute identification results by using correlations between attributes, or correlations between attributes and the shooting conditions (date and time, place) obtainable from other sensors such as the imaging apparatus.
FIG. 1 is a diagram showing an example of the hardware configuration of the image processing apparatus 10. As its hardware configuration, the image processing apparatus 10 includes a CPU (Central Processing Unit) 11, a memory 12, and a communication I/F 13. The CPU 11 controls the entire image processing apparatus 10. The memory 12 is a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), or the like. The memory 12 stores programs, images, data used when the CPU 11 executes processing based on the programs, setting values, and so on. The communication I/F 13 is an interface for connecting the image processing apparatus 10 to a network. The CPU 11 executes processing based on the programs stored in the memory 12, thereby realizing software configurations such as those of FIGS. 2 to 5 and FIG. 16 and the processing of flowcharts such as those of FIGS. 6, 12, and 17, all described later. The image processing apparatus 10 may be an imaging apparatus, or it may be a PC or the like. In the case of an imaging apparatus, the hardware configuration includes an image sensor, an AD converter, and the like in addition to the components shown in FIG. 1. When the image processing apparatus 10 is an imaging apparatus, it processes images it has captured itself; when it is a PC or the like, it acquires images captured by an imaging apparatus via the network and processes the acquired images. The CPU 11 may store data, setting values, and the like in the memory 12 based on user operations made via an input device, a display device, or the like, or based on commands and the like transmitted from an external device via the communication I/F 13.

FIG. 2 is a diagram showing an example of the software configuration of the image processing apparatus 10. As its software configuration, the image processing apparatus 10 includes an image acquisition unit 100, a position detection unit 101, a feature extraction unit 102, an attribute identification unit 103, and an attribute identification result correction unit 104. The processing of each unit is described later with reference to flowcharts and the like.
FIG. 3 is a diagram showing an example of the configuration of the position detection unit 101. The position detection unit 101 includes a face position detection unit 10100 and a facial feature point detection unit 10101. The first embodiment is described using faces as the example target, but both faces and human bodies may be targeted by introducing human body position detection technology. The processing of each unit is described later with reference to flowcharts and the like. Faces and human bodies are examples of objects included in an image.
FIG. 4 is a diagram showing an example of the configuration of the feature extraction unit 102. The feature extraction unit 102 includes a normalization unit 10200, a region setting unit 10201, and a feature extraction unit 10202. The processing of each unit is described later with reference to flowcharts and the like.
FIG. 5 is a diagram showing an example of the configuration of the attribute identification result correction unit 104. The attribute identification result correction unit 104 includes an attribute selection unit 10400, a subspace selection unit 10401, and an attribute score correction unit 10402. The processing of each unit is described later with reference to flowcharts and the like.

FIG. 6 is a flowchart showing an example of information processing in the image processing apparatus 10.
In S10000, the image acquisition unit 100 acquires a digital image (hereinafter, image) obtained by passing light through a condensing element such as a lens, an image sensor such as a CMOS or CCD sensor that converts light into an electric signal, and an AD converter that converts the analog signal into a digital signal. The image acquisition unit 100 may also acquire an image converted to, for example, VGA (640 × 480 [pixel]) or QVGA (320 × 240 [pixel]) by thinning processing or the like.
In S10100, the face position detection unit 10100 detects the positions of faces in the image acquired in S10000. If the image contains multiple faces, the position of each face is detected. For example, the face position detection unit 10100 can detect faces in an image accurately and quickly by using an integral image and a cascade classifier as in Non-Patent Document 3, although the detection method is not limited to this.
In S10101, the face position detection unit 10100 determines whether a face was detected in S10100. If it determines that a face was detected, the processing proceeds to S10102; if it determines that no face was detected, the processing returns to S10000, where the image acquisition unit 100 acquires the next image.
In S10102, the face position detection unit 10100 selects one face from among the faces detected in S10100.
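The integral image that Non-Patent Document 3 relies on for fast face detection in S10100 can be sketched as follows. This is a minimal illustration rather than the detector itself: the function names `integral_image` and `box_sum` and the 3 × 3 test image are assumptions made for this example, and the cascade classifier is omitted.

```python
from typing import List


def integral_image(gray: List[List[int]]) -> List[List[int]]:
    """Return an (H+1) x (W+1) summed-area table with a zero first row and column."""
    height, width = len(gray), len(gray[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            # Sum of everything above and to the left of (x, y), inclusive.
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum of pixels in rows top..bottom-1 and columns left..right-1, using four lookups."""
    return table[bottom][right] - table[top][right] - table[bottom][left] + table[top][left]


# Hypothetical 3 x 3 grayscale image used only for illustration.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
total = box_sum(table, 0, 0, 3, 3)        # whole image
lower_right = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

Once the table is built, the sum over any rectangle costs four lookups regardless of its size, which is what makes evaluating many rectangular features per candidate window affordable.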

In S10103, the facial feature point detection unit 10101 detects the positions of facial feature points for the face selected in S10102. Facial feature points are, as indicated by 101030 in FIG. 7, the corners of parts such as the eyes, nose, and mouth: the outer and inner corners of the eyes, the tip of the nose, the left and right ends of the mouth, and so on. One available method, as in Non-Patent Document 4, sets average facial feature point positions in advance as the initial arrangement, extracts feature amounts based on the facial feature point positions, and then moves and deforms the facial feature point positions based on a table, prepared in advance, that expresses the relationship between feature amounts and the movement and deformation amounts of the facial feature points. However, facial feature point detection is not limited to this method.
In S10200, based on the facial feature points detected in S10103, the normalization unit 10200 performs an affine transformation so that the face has a set size and an upright orientation. For example, the normalization unit 10200 computes, for each eye, the centroid of four eye-related feature points (the outer corner, the inner corner, the upper eyelid, and the lower eyelid), and performs the affine transformation so that the distance between the centroids of the left and right eyes is 75 [pixel] and the inclination from the horizontal is 0 [degrees].
Next, the region setting unit 10201 sets local regions 102010 based on the facial feature points detected in S10103, as shown in FIG. 8. The feature extraction unit 10202 then extracts a wide variety of feature amounts needed for attribute recognition, such as color features (for example, color histograms), edges, and histogram features (for example, Histograms of Oriented Gradients, hereinafter HOG).
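The eye-based normalization of S10200 can be sketched as a rotation plus uniform scaling about the midpoint between the eyes, which is one special case of an affine transformation. The sketch below is an assumption-laden illustration: the function names, the choice of the eye midpoint as the fixed point, and the feature point coordinates are all hypothetical; only the 75-pixel target distance and the 0-degree target tilt come from the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Centroid of the feature points of one eye."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def eye_alignment_matrix(left_eye: Point, right_eye: Point,
                         target_distance: float = 75.0) -> List[List[float]]:
    """2x3 affine matrix that levels the eye line and scales it to target_distance.

    left_eye is the eye appearing on the left side of the image.
    The midpoint between the eyes is kept fixed (an assumption of this sketch).
    """
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    scale = target_distance / math.hypot(dx, dy)
    angle = math.atan2(dy, dx)            # current tilt of the eye line
    a = scale * math.cos(-angle)          # rotate by -angle, then scale
    b = scale * math.sin(-angle)
    cx, cy = (left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2
    return [
        [a, -b, cx - a * cx + b * cy],
        [b, a, cy - b * cx - a * cy],
    ]


def apply_affine(m: List[List[float]], p: Point) -> Point:
    return (m[0][0] * p[0] + m[0][1] * p[1] + m[0][2],
            m[1][0] * p[0] + m[1][1] * p[1] + m[1][2])


# Hypothetical feature points: outer corner, inner corner, upper eyelid, lower eyelid.
left_eye = centroid([(90.0, 120.0), (110.0, 120.0), (100.0, 115.0), (100.0, 125.0)])
right_eye = centroid([(150.0, 150.0), (170.0, 150.0), (160.0, 145.0), (160.0, 155.0)])

matrix = eye_alignment_matrix(left_eye, right_eye)
aligned_left = apply_affine(matrix, left_eye)
aligned_right = apply_affine(matrix, right_eye)
aligned_distance = math.hypot(aligned_right[0] - aligned_left[0],
                              aligned_right[1] - aligned_left[1])
aligned_tilt = aligned_right[1] - aligned_left[1]
```

The same 2x3 matrix would then be applied to every pixel of the face image (typically with interpolation) so that all faces reach the feature extraction stage at a common scale and orientation.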

In S10300, as shown in FIG. 9, the attribute identification unit 103 generates a feature vector V_n from the various feature amounts extracted in S10200 and, with the feature vector V_n as input, identifies attributes such as framed glasses, male, female, child, adult, knit cap, mustache, beard, blond hair, shaved head, wavy hair, and makeup. Each attribute classifier outputs a score for its attribute. When the identification target is a human body, the attribute identification unit 103 may identify clothing such as jackets and pants. The types of attributes to identify are determined manually in advance.
The feature vector V_n is the feature vector for the n-th face. Its elements a_nm are the concatenation of, for a given region m, the bins of the color histogram, the pixel values of the edge image, and the bins of the HOG.
An attribute classifier is trained for each attribute. For example, for the male attribute, positive samples of men and negative samples of women are prepared for n people, and the feature vectors V_n corresponding to these samples are extracted. The classifier is then trained by supplying training data such that its output is 1 for the male positive samples and -1 for the female negative samples. A Support Vector Machine (hereinafter, SVM) is used as the classifier, although the classifier is not limited to this.
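Training one attribute classifier with targets of 1 and -1 can be sketched with a small linear SVM fitted by subgradient descent on the hinge loss. This is a stand-in chosen to keep the example self-contained, not the training procedure of the embodiment: the function names, the hyperparameters, and the two-dimensional toy feature vectors are all assumptions, and a real system would use much longer feature vectors V_n and an established SVM library.

```python
from typing import List, Sequence, Tuple


def decision_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed score of the linear classifier; positive means the attribute is present."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_svm(samples: List[List[float]], labels: List[int],
                     epochs: int = 200, learning_rate: float = 0.1,
                     reg: float = 0.01) -> Tuple[List[float], float]:
    """Fit weights and bias by subgradient descent on the L2-regularized hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * decision_score(weights, bias, x)
            # Regularization step: shrink the weights slightly on every sample.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if margin < 1.0:
                # The sample is inside the margin or misclassified: push it outward.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


# Hypothetical toy feature vectors standing in for V_n.
male_positive = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0]]           # target +1
female_negative = [[-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]]   # target -1
samples = male_positive + female_negative
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(samples, labels)
scores = [decision_score(weights, bias, x) for x in samples]
```

One such classifier would be trained per attribute, each with its own positive and negative samples, which is why the labeling effort grows with the number of attributes.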

In S10400, the attribute identification result correction unit 104 corrects the scores for the attributes identified in S10300. FIG. 10 is a diagram showing an example of the scores output by the attribute classifiers for one person. As FIG. 10 and other figures show, the image processing apparatus 10 has a plurality of attribute classifiers. Because each attribute classifier is trained independently, contradictory cases can arise: in FIG. 10, the female score is 0.9, meaning the person is very likely a woman, yet the mustache score is 0.7.
In this embodiment, therefore, the score of the mustache attribute, which contradicts the female attribute, is corrected as shown in FIG. 11. That is, the attribute score correction unit 10402 selects one or more attribute identification results, trusts the selected results as correct, and corrects the other attribute scores. For example, because the female attribute score is 0.9, close to 1.0, the attribute score correction unit 10402 judges that the face selected in S10102 is a woman's and corrects scores that contradict this, such as the mustache score. As for which attributes to trust, a person may evaluate each attribute in advance to find the accurate ones and register them in a setting, and the attribute score correction unit 10402 may preferentially select these registered attributes. This is an example of processing that selects at least one attribute score from the attribute scores output by the plurality of attribute classifiers and, based on the selected attribute score, corrects at least one of the object's other attribute scores.
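The kind of contradiction shown in FIG. 10 can be made concrete with a small check that flags scores conflicting with a trusted attribute. This sketch only detects the conflict; it is not the correction method of the embodiment, which uses subspace projection. The contradiction table, both thresholds, and the beard and glasses scores are hypothetical; only the female score of 0.9 and the mustache score of 0.7 come from the text.

```python
from typing import Dict, List

# Hypothetical table of attributes assumed to contradict a trusted attribute.
CONTRADICTS: Dict[str, List[str]] = {"female": ["mustache", "beard"]}


def find_contradictions(scores: Dict[str, float], trusted: str,
                        trust_threshold: float = 0.8,
                        conflict_threshold: float = 0.0) -> List[str]:
    """Return attributes whose scores conflict with a confidently detected trusted attribute."""
    if scores[trusted] < trust_threshold:
        return []  # the trusted attribute is not confident enough to rely on
    return [name for name in CONTRADICTS.get(trusted, [])
            if scores.get(name, -1.0) > conflict_threshold]


scores = {"female": 0.9, "mustache": 0.7, "beard": -0.8, "glasses": 0.2}
suspects = find_contradictions(scores, "female")
```

Here `suspects` contains only the mustache attribute, the score that the correction processing described next is meant to bring back into a plausible range.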

FIG. 12 is a flowchart showing an example of the attribute identification result correction processing of S10400.
In S10401, the attribute identification result correction unit 104 selects one or more attributes from among those of the attribute classifiers used in S10300. As described above, a person may evaluate each attribute in advance to find the accurate ones and register them in a setting, and the attribute identification result correction unit 104 may preferentially select accurate attributes based on this setting.
Instead of using such evaluation results, the attributes may be selected by statistically analyzing the scores output by the classifiers and using the result of that analysis. For example, a highly accurate attribute classifier outputs values close to 1.0 for positive samples and values close to -1.0 for negative samples. The attribute identification result correction unit 104 may therefore divide the scores output by each attribute classifier into two classes and select the attribute for which the separability between the two classes (between-class variance / within-class variance) is largest. With this method of statistically analyzing classifier scores to decide which attribute to select, there is no need to know what attribute each classifier identifies.
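The separability criterion (between-class variance divided by within-class variance) can be sketched as follows. How the scores are divided into two classes is not specified in the text, so splitting at a score of 0 is an assumption of this example, as are the function name and the score lists.

```python
from statistics import mean
from typing import Dict, List


def separability(scores: List[float], split: float = 0.0) -> float:
    """Between-class variance divided by within-class variance for a two-class split."""
    high = [s for s in scores if s >= split]
    low = [s for s in scores if s < split]
    if not high or not low:
        return 0.0  # everything fell into one class, so nothing is separated
    n = len(scores)
    w_high, w_low = len(high) / n, len(low) / n
    mu_high, mu_low = mean(high), mean(low)
    between = w_high * w_low * (mu_high - mu_low) ** 2
    var_high = sum((s - mu_high) ** 2 for s in high) / len(high)
    var_low = sum((s - mu_low) ** 2 for s in low) / len(low)
    within = w_high * var_high + w_low * var_low
    return between / within if within > 0 else float("inf")


# Hypothetical classifier outputs collected over six faces.
attribute_scores: Dict[str, List[float]] = {
    "glasses": [0.9, 0.95, 1.0, -0.9, -0.95, -1.0],     # tightly clustered near +1 and -1
    "hair_color": [0.2, 0.5, 0.1, -0.3, -0.1, -0.4],    # spread out around 0
}
selected = max(attribute_scores, key=lambda name: separability(attribute_scores[name]))
```

The glasses classifier wins because its outputs form two tight, well-separated clusters, and the selection never needs to know that the attribute is glasses.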

In S10402, the attribute identification result correction unit 104 determines whether the score of the attribute selected in S10401 is greater than or equal to a set threshold. If the score is greater than or equal to the threshold, the processing proceeds to S10403; if it is less than the threshold, the processing returns to S10401 and another attribute is selected.
In S10403, based on the one or more attribute identification results selected in S10401, the attribute identification result correction unit 104 selects one attribute subspace P_i from a plurality of attribute subspaces P prepared in advance.
The attribute subspace P is described here.
Let the scores output for one person by the first through i-th attribute classifiers be S_1, S_2, ..., S_i, and regard each score as one element of a feature vector. Then, as shown in FIG. 13, the n-th person is represented by S_n = [S_n1, S_n2, ..., S_ni]. S_n is treated as a single point in the space whose basis vectors correspond to the individual attributes (hereinafter, the full attribute space V).
If some of the attributes spanning the full attribute space V are regarded as attributes correlated with the female attribute, a hyperplane on which only women can exist can be defined, as shown in FIG. 14. Let S_n be the sample described above, with a female score of 0.9 and a mustache score of 0.7. As FIG. 14 shows, the sample S_n lies outside the true range in which only women can exist. In this embodiment, the attribute identification result correction unit 104 therefore corrects the attribute scores by moving the sample S_n, within the full attribute space V, into the true range in which only women can exist, as shown in FIG. 15.
The attribute identification result correction unit 104 uses an attribute subspace P to move the sample S_n within the full attribute space V. From the distribution of samples scattered across the full attribute space V, it obtains an attribute subspace P of lower dimension than V and projects the input sample S_n onto P. Denoting the input sample projected onto P by S_n', the unit then back-projects S_n' onto the original full attribute space V, which moves the sample S_n within V.
Principal component analysis (hereinafter, PCA) is one method of obtaining the basis vectors of the attribute subspace P. For example, a large number of female-only samples are prepared, and PCA on these samples yields eigenvectors U = [u_1, u_2, ..., u_k], with the sample variance as the index. The attribute identification result correction unit 104 judges the eigenvectors with small eigenvalues (variance), U = [u_1, u_2, ..., u_i], to be characteristic of female samples, and the eigenvectors with large eigenvalues (variance) not to be characteristic of female samples. It then forms the attribute subspace P from the eigenvectors with small eigenvalues, U = [u_1, u_2, ..., u_i].
Denoting the input sample in the attribute subspace P by S_n'', it can be expressed as S_n'' = [u_1, u_2, ..., u_k] S_n'. Because the transformation from the full attribute space V to the attribute subspace P is a linear projection, the input sample S_n'' in the attribute subspace P can be back-projected onto the original full attribute space V. The attribute subspace P need not be obtained by PCA; the attribute identification result correction unit 104 may use another linear projection, such as locality preserving projection (LPP), which brings samples that are close together in the full attribute space V even closer.
Also in S10403, as described above, the attribute identification result correction unit 104 projects the input sample S_n onto the selected attribute subspace P and then back-projects it onto the full attribute space V.
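Projection onto an attribute subspace followed by back-projection can be sketched as below. The sketch takes the subspace as given: the orthonormal basis and the mean vector are hypothetical values standing in for a subspace prepared in advance (for example by PCA on female-only samples), and subtracting the mean before projecting is an assumption of this example that the text does not state. The attribute order [female, glasses, mustache] and the glasses score are also illustrative.

```python
from typing import List, Sequence


def project(sample: Sequence[float], mean: Sequence[float],
            basis: Sequence[Sequence[float]]) -> List[float]:
    """Coordinates of the mean-centered sample in the subspace spanned by basis."""
    centered = [s - m for s, m in zip(sample, mean)]
    return [sum(u_d * c_d for u_d, c_d in zip(u, centered)) for u in basis]


def back_project(coords: Sequence[float], mean: Sequence[float],
                 basis: Sequence[Sequence[float]]) -> List[float]:
    """Map subspace coordinates back into the full attribute space."""
    return [mean[d] + sum(c * u[d] for c, u in zip(coords, basis))
            for d in range(len(mean))]


# Attribute order: [female, glasses, mustache].
# Hypothetical mean of female-only samples and hypothetical orthonormal basis
# spanning the directions in which such samples are allowed to vary.
female_mean = [0.9, 0.0, -0.9]
female_basis = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

sample = [0.9, 0.3, 0.7]  # female 0.9, mustache 0.7: the contradictory case
coords = project(sample, female_mean, female_basis)
corrected = back_project(coords, female_mean, female_basis)
```

In this toy setup the component of the sample that lies outside the subspace (the mustache deviation) is discarded, so the corrected sample keeps its female and glasses scores while the mustache score falls back to the value typical of the female-only samples.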

In the example described above, the subspace P is selected based on the scores of the attribute discriminators. Instead of attributes, however, the attribute identification result correction unit 104 may use shooting status information obtained as other sensor information, such as the date and time or the place at which the image was captured. That is, the attribute identification result correction unit 104 may exploit correlations between the date and time or the place and the attributes: for example, no one wears a knit cap in midsummer, no children are present late at night, and no one wears a helmet at an elementary school. FIG. 16 is a diagram showing a modified example of the software configuration of the image processing apparatus 10. In the software configuration of FIG. 16, an information acquisition unit 105 is added to the software configuration of FIG. 2. FIG. 17 is a flowchart showing a modified example of the attribute identification result correction process of S10400.
In S10411, the information acquisition unit 105 acquires date and time information and location information that are obtainable from an imaging device with which it can communicate via a network or the like. The date and time information is an example of date and time information relating to the image. The location information is an example of location information relating to the image.
In S10412, the attribute identification result correction unit 104 selects a subspace based on the date and time information and the location information acquired in S10411. For example, subspaces specific to a time period such as midsummer or late night and subspaces specific to a place such as an elementary school are prepared in advance, and the attribute identification result correction unit 104 selects one of these subspaces based on the date and time information and the location information acquired in S10411.
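The subspace selection of S10412 can be sketched as a simple lookup on the acquired date, time, and place. The following is a minimal illustration only: the function name `select_subspace`, the place label `"elementary_school"`, the subspace keys, the month and hour boundaries, and the priority order of the rules are all assumptions made for this sketch and are not specified in the source.

```python
from datetime import datetime


def select_subspace(captured_at, place):
    """Return the key of a subspace prepared in advance (hypothetical keys).

    captured_at: datetime at which the image was captured (date and time information)
    place: string label of the capture location (location information)
    """
    # Assumed rule: a place-specific subspace takes priority over time-specific ones.
    if place == "elementary_school":
        return "elementary_school"
    # Assumed definition of "late night": 23:00 up to, but not including, 04:00.
    if captured_at.hour >= 23 or captured_at.hour < 4:
        return "late_night"
    # Assumed definition of "midsummer": July and August.
    if captured_at.month in (7, 8):
        return "midsummer"
    # Fall back to a general-purpose subspace when no specific rule matches.
    return "default"
```

The returned key would then index a table of subspaces prepared in advance, and the projection and back-projection of S10403 would proceed with the selected subspace.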

Thus, in the first embodiment, the image processing apparatus 10 performs the following processing:
1. Select at least one attribute discriminator from among the attribute discriminators and refer to its score.
2. Select a subspace P prepared in advance, based on the score of the selected attribute discriminator.
3. Correct the attribute scores by projecting from the entire attribute space V onto the selected subspace P and then back-projecting from the subspace P to the entire attribute space V.
This makes it possible to realize more accurate attribute discriminators.
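The three steps can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the implementation of the embodiment: each subspace is assumed to be given by a mean vector and orthonormal basis vectors, the reference discriminator is chosen by a fixed index, the subspace is chosen by thresholding its score at zero, and the two-attribute table `SUBSPACES` (with hypothetical attributes ordered as male, beard) is invented for the example.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_subspace(scores, ref_index, subspaces, threshold=0.0):
    """Steps 1 and 2: read one reference score and pick a prepared subspace."""
    key = "positive" if scores[ref_index] >= threshold else "negative"
    return subspaces[key]


def correct_scores(scores, mean, basis):
    """Step 3: project the scores onto the subspace, then back-project."""
    centered = [s - m for s, m in zip(scores, mean)]
    coeffs = [dot(e, centered) for e in basis]  # projection V -> P
    return [m + sum(c * e[i] for c, e in zip(coeffs, basis))  # back-projection P -> V
            for i, m in enumerate(mean)]


# Hypothetical subspaces for score vectors ordered as (male, beard).
# "positive": male score fixed near 1.0, beard score free to vary.
# "negative": beard score fixed at -1.0, male score free to vary.
SUBSPACES = {
    "positive": {"mean": [1.0, 0.0], "basis": [[0.0, 1.0]]},
    "negative": {"mean": [-1.0, -1.0], "basis": [[1.0, 0.0]]},
}


def correct(scores, ref_index=0):
    """Run the three steps on one score vector."""
    sub = select_subspace(scores, ref_index, SUBSPACES)
    return correct_scores(scores, sub["mean"], sub["basis"])
```

With these toy subspaces, `correct([-0.8, 0.3])` selects the "negative" subspace because the reference score is below zero, and the spurious beard score of 0.3 is pulled to -1.0 while the reference score stays at about -0.8.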

<Other Embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Although an example of an embodiment of the present invention has been described in detail above, the present invention is not limited to that specific embodiment.
For example, the hardware configuration of the image processing apparatus 10 may include a plurality of CPUs, and the plurality of CPUs may execute processing based on a program stored in a memory or the like. A GPU (Graphics Processing Unit) may also be used in place of the CPU in the hardware configuration of the image processing apparatus 10.
Part of the software configuration of the image processing apparatus 10 may be implemented in the image processing apparatus 10 as a hardware configuration.
The image processing apparatus 10 may also correct the group of attribute identification results by selecting a subspace based on both the results of one or more attribute discriminators and the date and time information and/or the location information, and then performing projection and back-projection.

According to each of the embodiments described above, attributes can be recognized with higher accuracy.

10 Image processing apparatus
11 CPU
12 Memory
13 Communication I/F

Claims

An image processing apparatus comprising:
detection means for detecting a position of an object included in an image;
extraction means for extracting a feature amount from the image based on the position of the object;
attribute identification means for outputting an attribute score for each of a plurality of attributes of the object using the feature amount; and
correction means for correcting the attribute score for at least one of the plurality of attributes based on the attribute score, output by the attribute identification means, for at least one of the plurality of attributes of the object.

The image processing apparatus according to claim 1, comprising a plurality of the attribute identification means, wherein the correction means selects at least one attribute score from among the attribute scores output by the plurality of attribute identification means and, based on the selected attribute score, corrects at least one attribute score of the object other than the selected attribute score.

The image processing apparatus according to claim 2, wherein the correction means selects, based on a setting, at least one attribute score from among the attribute scores output by the plurality of attribute identification means.

The image processing apparatus according to claim 2, wherein the correction means statistically analyzes the attribute scores output by the plurality of attribute identification means and selects at least one attribute score from among those attribute scores.

An image processing apparatus comprising:
acquisition means for acquiring shooting status information regarding an image;
detection means for detecting a position of an object included in the image;
extraction means for extracting a feature amount from the image based on the position of the object;
attribute identification means for outputting an attribute score for at least one attribute of the object using the feature amount; and
correction means for correcting the attribute score based on the shooting status information acquired by the acquisition means.

The image processing apparatus according to claim 5, wherein the acquisition means acquires, as the shooting status information, date and time information indicating a date and time at which the image was captured.

The image processing apparatus according to claim 5, wherein the acquisition means acquires, as the shooting status information, location information indicating a location at which the image was captured.

The image processing apparatus according to any one of claims 5 to 7, comprising a plurality of the attribute identification means.

The image processing apparatus according to any one of claims 1 to 8, wherein the object is a face included in an image.

The image processing apparatus according to any one of claims 1 to 8, wherein the object is a human body included in an image.

The image processing apparatus according to any one of claims 1 to 9, wherein the image processing apparatus is an imaging apparatus.

An image processing method comprising:
a detection step of detecting a position of an object included in an image;
an extraction step of extracting a feature amount from the image based on the position of the object;
an attribute identification step of outputting an attribute score for each of a plurality of attributes of the object using the feature amount; and
a correction step of correcting the attribute score for at least one of the plurality of attributes based on the attribute score, output in the attribute identification step, for at least one of the plurality of attributes of the object.

An image processing method comprising:
an acquisition step of acquiring shooting status information regarding an image;
a detection step of detecting a position of an object included in the image;
an extraction step of extracting a feature amount from the image based on the position of the object;
an attribute identification step of outputting an attribute score for at least one attribute of the object using the feature amount; and
a correction step of correcting the attribute score based on the shooting status information acquired in the acquisition step.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 10.