JP6276504B2

JP6276504B2 - Image detection apparatus, control program, and image detection method

Info

Publication number: JP6276504B2
Application number: JP2013010753A
Authority: JP
Inventors: 梅崎　太造; 太造梅崎; 隼廣田; 健太西行; 山口　孝志; 孝志山口; 哲英 ▲高▼曽
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2013-01-24
Filing date: 2013-01-24
Publication date: 2018-02-07
Anticipated expiration: 2033-01-24
Also published as: JP2014142800A

Description

The present invention relates to a technique for extracting feature amounts from an image.

As also described in Patent Document 1, various techniques for extracting feature amounts from images have been proposed. Non-Patent Document 1 describes the LBP (Local Binary Pattern) and the LTP (Local Ternary Pattern), both of which are used in feature amount extraction.

Patent Document 1: JP 2010-44438 A

Non-Patent Document 1: Xiaoyang Tan and Bill Triggs, "Enhanced Local Texture Feature Sets for Face Recognition under Difficult Lighting Conditions", IEEE Transactions on Image Processing, Volume 19, Issue 6, pp. 1635-1650, June 2010

When feature amounts are extracted from an image, appropriate feature amounts need to be obtained.

The present invention has been made in view of the above, and its object is to provide a technique capable of obtaining appropriate feature amounts.

To solve the above problem, one aspect of an image detection apparatus according to the present invention is an image detection apparatus that detects a detection target image from a processing target image, comprising: an evaluation value acquisition unit that takes each of a plurality of pixels included in an image contained in the processing target image as a pixel of interest and obtains a plurality of types of evaluation values, each indicating the relationship between the pixel value of the pixel of interest and a plurality of surrounding pixel values; a feature amount acquisition unit that obtains co-occurrence frequencies of the evaluation values obtained for the plurality of pixels included in the image and uses the co-occurrence frequencies as feature amounts; and a classifier that determines, based on the feature amounts, whether the image is highly likely to be the detection target image. The image detection apparatus selects, from among the plurality of types of evaluation values and according to the brightness of the imaging environment of the processing target image, the evaluation values used to obtain the feature amounts used by the classifier.

In one aspect of the image detection apparatus according to the present invention, the feature amount acquisition unit uses the co-occurrence frequencies for a certain type of evaluation value among the plurality of types of evaluation values as the feature amounts without normalizing them.
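As an illustration of what a co-occurrence frequency of evaluation values might look like, the following minimal sketch counts pairs of evaluation values taken from two pixels separated by a fixed offset. The function name `cooccurrence_counts`, the `offset` parameter, and the pairing rule itself are assumptions made for this example; this passage does not specify how pairs are formed. The counts are returned raw, in line with the aspect that uses the co-occurrence frequencies without normalization.

```python
from collections import Counter


def cooccurrence_counts(code_map, offset=(0, 1)):
    """Count how often each pair of evaluation values co-occurs.

    code_map: 2D list holding one evaluation value per pixel.
    offset:   (d_row, d_col) displacement to the partner pixel.
              The pairing rule is an assumption for this sketch.
    """
    d_row, d_col = offset
    rows, cols = len(code_map), len(code_map[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            # Skip pairs whose partner pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(code_map[r][c], code_map[r2][c2])] += 1
    return counts  # raw counts, deliberately not normalized


codes = [[1, 1, 2],
         [2, 1, 1]]
hist = cooccurrence_counts(codes)
```

Here `hist[(1, 1)]` counts the horizontally adjacent pixel pairs whose evaluation values are both 1.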

In one aspect of the image detection apparatus according to the present invention, when obtaining a certain type of evaluation value among the plurality of types of evaluation values, the evaluation value acquisition unit does not use surrounding pixel values located in directions diagonal to the pixel of interest.

In one aspect of the image detection apparatus according to the present invention, when obtaining that type of evaluation value, the evaluation value acquisition unit uses none of the surrounding pixel values located in the upper-right, upper-left, lower-right, and lower-left directions with respect to the pixel of interest.

In one aspect of the image detection apparatus according to the present invention, the evaluation value acquisition unit generates, for each of the plurality of surrounding pixel values used to obtain that type of evaluation value, one bit indicating the relationship between the surrounding pixel value and the pixel value of the pixel of interest, and uses the binary code composed of the bits generated for the plurality of surrounding pixel values as that type of evaluation value, regardless of the number of bit transitions observed when the bits are read in order.

In one aspect of the image detection apparatus according to the present invention, the plurality of types of evaluation values include a first type of evaluation value. The evaluation value acquisition unit generates, for each of the plurality of surrounding pixel values used to obtain the first type of evaluation value, one bit indicating the relationship between the surrounding pixel value and the pixel value of the pixel of interest, and uses the binary code composed of the bits generated for the plurality of surrounding pixel values as the first type of evaluation value. For the first type of evaluation value, the bit is "1" if the surrounding pixel value is less than or equal to the value obtained by subtracting a predetermined amount from the pixel value of the pixel of interest, and "0" if it is greater than that value.

In one aspect of the image detection apparatus according to the present invention, the plurality of types of evaluation values include a second type of evaluation value. The evaluation value acquisition unit generates, for each of the plurality of surrounding pixel values used to obtain the second type of evaluation value, one bit indicating the relationship between the surrounding pixel value and the pixel value of the pixel of interest, and uses the binary code composed of the bits generated for the plurality of surrounding pixel values as the second type of evaluation value. For the second type of evaluation value, the bit is "1" if the surrounding pixel value is greater than or equal to the value obtained by adding a predetermined amount to the pixel value of the pixel of interest, and "0" if it is less than that value.
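The two bit rules described for the first and second types of evaluation values can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `ternary_bits`, the order in which neighbors are visited, and the choice to place the first neighbor in the most significant bit are all assumptions made for the example.

```python
def ternary_bits(center, neighbors, amount):
    """Return (first_type_code, second_type_code) as integers.

    center:    pixel value of the pixel of interest
    neighbors: surrounding pixel values in a fixed order (order is assumed)
    amount:    the predetermined amount (value chosen by the caller)
    """
    first = 0   # bit is 1 when neighbor <= center - amount
    second = 0  # bit is 1 when neighbor >= center + amount
    for value in neighbors:
        # Assumption: the first neighbor ends up in the most significant bit.
        first = (first << 1) | (1 if value <= center - amount else 0)
        second = (second << 1) | (1 if value >= center + amount else 0)
    return first, second


first_code, second_code = ternary_bits(100, [90, 104, 110, 95], amount=5)
```

With a center value of 100 and an amount of 5, the neighbors 90 and 95 set bits in the first code, and the neighbor 110 sets a bit in the second code; 104 lies inside the band and sets neither.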

In one aspect of the image detection apparatus according to the present invention, the detection target image is a human face image.

One aspect of a control program according to the present invention is a control program for controlling an image detection apparatus that detects a detection target image from a processing target image. The control program causes the image detection apparatus to execute: (a) a step of taking each of a plurality of pixels included in an image contained in the processing target image as a pixel of interest and obtaining a plurality of types of evaluation values, each indicating the relationship between the pixel value of the pixel of interest and a plurality of surrounding pixel values; (b) a step of obtaining co-occurrence frequencies of the evaluation values obtained for the plurality of pixels included in the image and using the co-occurrence frequencies as feature amounts; and (c) a step of determining, based on the feature amounts, whether the image is highly likely to be the detection target image. The control program further causes the image detection apparatus to select, from among the plurality of types of evaluation values and according to the brightness of the imaging environment of the processing target image, the evaluation values used to obtain the feature amounts used in step (c).

One aspect of an image detection method according to the present invention is an image detection method for detecting a detection target image from a processing target image, comprising: (a) a step of taking each of a plurality of pixels included in an image contained in the processing target image as a pixel of interest and obtaining a plurality of types of evaluation values, each indicating the relationship between the pixel value of the pixel of interest and a plurality of surrounding pixel values; (b) a step of obtaining co-occurrence frequencies of the evaluation values obtained for the plurality of pixels included in the image and using the co-occurrence frequencies as feature amounts; and (c) a step of determining, based on the feature amounts, whether the image is highly likely to be the detection target image. The evaluation values used to obtain the feature amounts used in step (c) are selected from among the plurality of types of evaluation values according to the brightness of the imaging environment of the processing target image.

According to the present invention, appropriate feature amounts can be obtained.

A diagram showing the configuration of an image processing system.
A diagram showing the configuration of the plurality of functional blocks of the image detection apparatus.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram showing detection result frames superimposed on a processing target image.
A diagram showing the configuration of a feature amount extraction device.
A diagram for explaining the operation of the evaluation value acquisition unit.
A diagram showing an example of a pixel-of-interest value and a plurality of surrounding pixel values.
A diagram for explaining a method of generating an LBP.
A diagram for explaining a method of generating an LBP.
A diagram showing an example of a processing target image.
A diagram showing an example of an LBP map image.
A diagram for explaining a method of generating an LTP.
A diagram for explaining a method of generating a positive LTP.
A diagram for explaining a method of generating a negative LTP.
A diagram showing an example of a positive LTP map image.
A diagram showing an example of a negative LTP map image.
A diagram showing the positional relationship between a pixel of interest and a plurality of surrounding pixels.
A diagram for explaining a method of obtaining the pixel value at a diagonal surrounding position whose distance from the pixel of interest is "1".
A diagram showing an example of a one-dimensional evaluation value histogram.
A diagram showing an example of a plurality of types of evaluation value pairs.
A diagram for explaining the operation of the feature amount acquisition unit.
A diagram for explaining the operation of the feature amount acquisition unit.
A diagram for explaining the operation of the feature amount acquisition unit.
A diagram showing an example of a two-dimensional evaluation value histogram.
A diagram for explaining a method of generating an output value map.
A diagram for explaining a method of generating an output value map.
A diagram showing an example of an output value map according to the present embodiment.
A diagram showing an example of an output value map generated based only on the first feature amount.
A diagram for explaining the operation of a modification of the feature amount acquisition unit.
A diagram for explaining the operation of a modification of the feature amount acquisition unit.
A diagram showing a bright imaging environment.
A diagram showing a dark imaging environment.
A diagram showing the configuration of a modification of the image processing system.
A diagram showing the configuration of a modification of the image detection apparatus.
A diagram showing a frequency curve representing the frequency distribution of luminance differences between adjacent pixels.

FIG. 1 is a diagram showing the configuration of an image processing system 50 that includes an image detection apparatus 1 according to an embodiment. The image processing system 50 includes the image detection apparatus 1 and an imaging device 5. The imaging device 5 captures an image and outputs image data representing the captured image to the image detection apparatus 1. The image detection apparatus 1 detects a detection target image from the captured image represented by the input image data. The image processing system 50 is used in, for example, a surveillance camera system or a digital camera system. In the present embodiment, the detection target image is, for example, a human face image. Hereinafter, "face image" by itself means a human face image. The captured image in which the detection target image is to be detected is referred to as the "processing target image". The detection target image may be an image other than a face image; for example, it may be an image of a whole person.

As shown in FIG. 1, the image detection apparatus 1 includes a CPU (Central Processing Unit) 2 and a storage unit 3. The storage unit 3 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The storage unit 3 stores, among other things, a control program 4 for controlling the operation of the image detection apparatus 1. The various functions of the image detection apparatus 1 are realized by the CPU 2 executing the control program 4 in the storage unit 3. In the image detection apparatus 1, executing the control program 4 forms a plurality of functional blocks as shown in FIG. 2.

As shown in FIG. 2, the image detection apparatus 1 includes, as functional blocks, an image input unit 11, a detection unit 12, and an output value map generation unit 15. The various functions of the image detection apparatus 1 may be realized by hardware circuits instead of functional blocks.

A plurality of pieces of image data, each representing one of a plurality of images captured sequentially by the imaging device 5, are input sequentially to the image input unit 11. The image input unit 11 outputs image data representing a processing target image. The image input unit 11 may treat every image obtained by the imaging device 5 as a processing target image, or may treat only an image obtained every few seconds as a processing target image. The imaging device 5 captures, for example, L images (L ≥ 2) per second; that is, its imaging frame rate is L fps (frames per second).

In an image captured by the imaging device 5, M pixels (M ≥ 2) are arranged in the row direction and N pixels (N ≥ 2) are arranged in the column direction. The resolution of an image captured by the imaging device 5 is, for example, VGA (Video Graphics Array), with M = 640 and N = 480.

Hereinafter, the size of a region in which m pixels (m ≥ 1) are arranged in the row direction and n pixels (n ≥ 1) are arranged in the column direction is written mp × np, where p stands for pixels. Among a plurality of values arranged in a matrix, the value located in the m-th row and the n-th column, counted from the upper left, may be referred to as the m × n-th value. Likewise, among a plurality of pixels arranged in a matrix, the pixel located in the m-th row and the n-th column, counted from the upper left, may be referred to as the m × n-th pixel.

The detection unit 12 uses the image data output from the image input unit 11 to detect face images in the processing target image. Based on the detection results of the detection unit 12, the output value map generation unit 15 generates an output value map showing the distribution, over the processing target image, of the detection accuracy value, which indicates the likelihood of being a face image.

Next, the operation of each block of the image detection apparatus 1 is described in detail.

<Detection process>
As shown in FIG. 2, the detection unit 12 includes a feature amount extraction unit 13 and a classifier 14. Using a detection frame, the detection unit 12 performs a detection process that detects, as a detection result region, a region of the processing target image that is highly likely to be a face image of the same size as the detection frame. Hereinafter, "detection process" by itself means this detection process in the detection unit 12. The detection process is described in detail later.

To detect face images of various sizes in the processing target image, the detection unit 12 uses a plurality of types of detection frames of different sizes. The detection unit 12 uses, for example, 30 types of detection frames.

In the present embodiment, as described later, the feature amount extraction unit 13 extracts feature amounts from an image. The image from which the feature amount extraction unit 13 extracts feature amounts must be an image of a reference size (normalized size).

In the present embodiment, the plurality of types of detection frames of different sizes include a detection frame of the same size as the reference size and detection frames of sizes different from the reference size. Hereinafter, the detection frame of the same size as the reference size is referred to as the "reference detection frame", and a detection frame of a size different from the reference size is referred to as a "non-reference detection frame". In the present embodiment, the smallest of the plurality of types of detection frames is the reference detection frame, so every non-reference detection frame is larger than the reference size. The size of the reference detection frame is, for example, 16p × 16p. The plurality of types of detection frames also include, for example, a non-reference detection frame of 18p × 18p and a non-reference detection frame of 20p × 20p.

In the present embodiment, when performing the detection process on a processing target image using the reference detection frame, the detection unit 12 moves the reference detection frame over the processing target image, performs face image detection on the image inside the reference detection frame, and determines whether that image is highly likely to be a face image. The detection unit 12 then takes each region of the processing target image that it has determined to be highly likely to be a face image (the image inside the reference detection frame) as a detection result region.

When performing the detection process on a processing target image using a non-reference detection frame, on the other hand, the detection unit 12 resizes the non-reference detection frame so that its size matches the reference size. The detection unit 12 then resizes the processing target image in accordance with the resizing of the non-reference detection frame. The detection unit 12 moves the resized non-reference detection frame over the resized processing target image, performs face image detection on the image inside the frame, and determines whether that image is highly likely to be a face image. Then, based on each region of the resized processing target image that it has determined to be highly likely to be a face image (the image inside the resized non-reference detection frame), the detection unit 12 identifies the region that is highly likely to be a face image in the processing target image at its original, unresized size, and takes that region as a detection result region.
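The resizing step can be illustrated with a short sketch. It assumes that the processing target image is scaled by the same ratio as the non-reference detection frame (reference size divided by frame size) and that the resulting dimensions are rounded to whole pixels; neither the rounding rule nor the function name `resize_for_frame` comes from the text.

```python
def resize_for_frame(image_size, frame_size, reference_size=16):
    """Return (scale, resized_image_size) for a non-reference detection frame.

    image_size:     (width, height) of the processing target image
    frame_size:     side length of the square non-reference detection frame
    reference_size: side length of the reference detection frame
    """
    # Assumption: frame and image are scaled by the same ratio.
    scale = reference_size / frame_size
    width, height = image_size
    # Assumption: dimensions are rounded to the nearest whole pixel.
    return scale, (round(width * scale), round(height * scale))


scale, resized = resize_for_frame((640, 480), frame_size=20)
```

Under these assumptions, a 20p × 20p frame applied to a VGA image gives a scale of 0.8 and a resized image of 512p × 384p, in which the frame is 16p × 16p.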

Hereinafter, the processing target image after resizing, when the detection process is performed on it using a non-reference detection frame, is referred to as a "size-changed image". The non-reference detection frame after resizing, when the detection process is performed on a processing target image using it, is referred to as a "size-change detection frame".

Thus, in the present embodiment, the operation of the detection unit 12 when it performs the detection process on a processing target image using the reference detection frame differs from its operation when it performs the detection process using a non-reference detection frame. The operation of the detection unit 12 is described in detail below.

In the detection unit 12, when the reference detection frame is used for the detection process, the feature amount extraction unit 13 sets the reference detection frame on the processing target image and extracts a plurality of feature amounts from the image inside the reference detection frame. When a non-reference detection frame is used for the detection process, the feature amount extraction unit 13 sets the size-change detection frame, obtained by resizing the non-reference detection frame, on the size-changed image, obtained by resizing the processing target image, and extracts a plurality of feature amounts from the image inside the size-change detection frame. Hereinafter, the image inside the reference detection frame and the image inside the size-change detection frame, from which feature amounts are extracted, may be collectively referred to as the "in-frame image".

Because the size of the reference detection frame matches the reference size, the image inside the reference detection frame in the processing target image is of the reference size. Likewise, because the size of the size-change detection frame matches the reference size, the image inside the size-change detection frame in the size-changed image is of the reference size. The feature amount extraction unit 13 can therefore always extract feature amounts from an image of the reference size. The method by which the feature amount extraction unit 13 extracts feature amounts is described in detail later.

The classifier 14 calculates a real value indicating the likelihood that an in-frame image is a face image, based on a feature vector composed of the plurality of feature amounts that the feature amount extraction unit 13 extracted from the in-frame image and a weight vector composed of a plurality of weight coefficients generated from learning samples (sample images for learning). Specifically, the feature amount extraction unit 13 obtains the inner product of the feature vector for the in-frame image and the weight vector, and the value obtained by adding a predetermined bias value to the inner product is used as the real value indicating the likelihood that the in-frame image is a face image. Hereinafter, the real value indicating the likelihood of being a face image is referred to as the "detection accuracy value". The detection accuracy value calculated by the classifier 14 indicates how face-like the image inside the reference detection frame or the size-change detection frame is. The classifier 14 uses, for example, an SVM (Support Vector Machine) or AdaBoost.

If the calculated detection accuracy value is greater than or equal to a threshold, the classifier 14 determines that the in-frame image is highly likely to be a face image. That is, when the reference detection frame is used, the classifier 14 determines that the image inside the reference detection frame in the processing target image is a region highly likely to be a face image of the same size as the reference detection frame. When a non-reference detection frame is used, the classifier 14 determines that the image inside the size-change detection frame in the size-changed image is a region highly likely to be a face image of the same size as the size-change detection frame.

If the calculated detection accuracy value is less than the threshold, on the other hand, the classifier 14 determines that the in-frame image is highly likely not to be a face image. That is, when the reference detection frame is used, the classifier 14 determines that the image inside the reference detection frame in the processing target image is not a region highly likely to be a face image of the same size as the reference detection frame. When a non-reference detection frame is used, the classifier 14 determines that the image inside the size-change detection frame in the size-changed image is not a region highly likely to be a face image of the same size as the size-change detection frame.
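The calculation of the detection accuracy value and the threshold decision can be sketched as a linear score. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the weights, bias, and threshold shown are made-up numbers rather than values learned from samples.

```python
def detection_accuracy_value(features, weights, bias):
    """Inner product of the feature vector and weight vector, plus a bias."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def is_likely_face(features, weights, bias, threshold):
    """True when the detection accuracy value is at or above the threshold."""
    return detection_accuracy_value(features, weights, bias) >= threshold


# Illustrative numbers only; real weights would come from training.
score = detection_accuracy_value([2, 0, 1], [0.5, -1.0, 0.25], bias=-0.5)
```

A value exactly equal to the threshold counts as "highly likely to be a face image", matching the "greater than or equal to" wording of the decision rule.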

When the classifier 14 determines that the image in the reference detection frame in the processing target image is a region that is highly likely to be a face image of the same size as the reference detection frame, it sets that image as a detection result region and sets that reference detection frame as a detection result frame.

Likewise, when the classifier 14 determines that the image in the size-changed detection frame in the size-changed image is a region that is highly likely to be a face image of the same size as the size-changed detection frame, it sets the outer frame of that region as a temporary detection result frame. Then, based on the temporary detection result frame, the classifier 14 identifies, in the processing target image from which the size-changed image was generated, a region that is highly likely to be a face image of the same size as the non-reference detection frame, sets that region as a detection result region, and sets the outer frame of that detection result region as the final detection result frame.

<Detection process using reference detection frame>
Next, a series of operations of the detection unit 12 will be described for the case in which it moves the reference detection frame over the processing target image and determines whether the image in the reference detection frame is highly likely to be a face image. FIGS. 3 to 6 are diagrams for explaining this operation of the detection unit 12. The detection unit 12 performs face image detection on the image in the reference detection frame while raster-scanning the reference detection frame.

As shown in FIG. 3, the feature amount extraction unit 13 first sets the reference detection frame 100 at the upper left of the processing target image 20 and extracts a plurality of feature amounts from the image in the reference detection frame 100. The classifier 14 obtains a detection accuracy value for the image in the reference detection frame 100 based on a feature vector composed of the feature amounts extracted by the feature amount extraction unit 13 and a weight vector composed of a plurality of weight coefficients. When the calculated detection accuracy value is greater than or equal to the threshold value, the classifier 14 determines that the region in the upper-left reference detection frame 100 in the processing target image 20 is highly likely to be a face image, sets that region as a detection result region, and sets the reference detection frame 100, which is the outer frame of that region, as a detection result frame.

Next, the feature amount extraction unit 13 moves the reference detection frame 100 slightly to the right in the processing target image 20, for example by one pixel or by several pixels. The feature amount extraction unit 13 then extracts a plurality of feature amounts from the image in the moved reference detection frame 100 in the processing target image 20.

The classifier 14 then obtains a detection accuracy value for the image in the moved reference detection frame 100 based on a feature vector composed of the feature amounts extracted by the feature amount extraction unit 13 and a weight vector composed of a plurality of weight coefficients. When the calculated detection accuracy value is greater than or equal to the threshold value, the classifier 14 determines that the image in the moved reference detection frame 100 is highly likely to be a face image, sets that image as a detection result region, and sets the moved reference detection frame 100, which is the outer frame of that image, as a detection result frame.

The detection unit 12 continues to operate in the same manner. As shown in FIG. 4, when the reference detection frame 100 reaches the right end of the processing target image 20, the detection unit 12 obtains a detection accuracy value for the image in the right-end reference detection frame 100. If the obtained detection accuracy value is greater than or equal to the threshold value, the detection unit 12 sets the image in the right-end reference detection frame 100 as a detection result region and sets the right-end reference detection frame 100 as a detection result frame.

Next, as shown in FIG. 5, the feature amount extraction unit 13 moves the reference detection frame 100 to the left end of the processing target image 20 while lowering it slightly, and then extracts a plurality of feature amounts from the image in the reference detection frame 100. The feature amount extraction unit 13 moves the reference detection frame 100 downward in the vertical direction (column direction) by, for example, one pixel or several pixels. The classifier 14 then obtains and outputs a detection accuracy value for the image in the current reference detection frame 100 based on a feature vector composed of the feature amounts extracted by the feature amount extraction unit 13 and a weight vector composed of a plurality of weight coefficients. When the calculated detection accuracy value is greater than or equal to the threshold value, the classifier 14 determines that the image in the current reference detection frame 100 is highly likely to be a face image, sets that image as a detection result region, and sets the reference detection frame 100 as a detection result frame.

The detection unit 12 continues to operate in the same manner. As shown in FIG. 6, when the reference detection frame 100 reaches the lower right of the processing target image 20, the detection unit 12 obtains a detection accuracy value for the image in the lower-right reference detection frame 100. If the obtained detection accuracy value is greater than or equal to the threshold value, the detection unit 12 sets the image in the lower-right reference detection frame 100 as a detection result region and sets the lower-right reference detection frame as a detection result frame.

In this way, the detection unit 12 uses the reference detection frame to detect, as a detection result region, each region in the processing target image that is highly likely to be a face image of the same size as the reference detection frame. In other words, the detection unit 12 uses the reference detection frame to identify face images of the same size as the reference detection frame in the processing target image.
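The raster scan of FIGS. 3 to 6 can be illustrated by the sketch below. It is a simplified model under stated assumptions: the frame is 16 × 16 pixels, the step is a whole number of pixels, and `score_at` is a hypothetical callable standing in for feature extraction plus classification at a given frame position. None of these names come from the specification.

```python
def raster_scan_positions(image_w, image_h, frame=16, step=1):
    """Yield the (x, y) top-left corner of each frame position, left to right, top to bottom."""
    for y in range(0, image_h - frame + 1, step):
        for x in range(0, image_w - frame + 1, step):
            yield x, y


def detect(image_w, image_h, score_at, threshold, frame=16, step=1):
    """Collect (x, y, width, height, score) for every frame whose score reaches the threshold."""
    results = []
    for x, y in raster_scan_positions(image_w, image_h, frame, step):
        score = score_at(x, y)  # hypothetical: detection accuracy value at this position
        if score >= threshold:
            results.append((x, y, frame, frame, score))
    return results


# Toy scorer: only the frame at (2, 1) is considered face-like.
hits = detect(18, 17, score_at=lambda x, y: 1.0 if (x, y) == (2, 1) else -1.0, threshold=0.0)
print(hits)
```

With a step larger than one pixel, the last frame position may stop short of the right or bottom edge; how the embodiment handles that case is not stated, so the sketch simply omits partial positions.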

<Detection process using non-reference detection frame>
When the detection unit 12 performs the detection process using a non-reference detection frame, the feature amount extraction unit 13 resizes the non-reference detection frame so that its size matches the reference size (the size of the reference detection frame). The feature amount extraction unit 13 then resizes the processing target image by the same ratio as the size change ratio applied to the non-reference detection frame.

In the present embodiment, the reference size is 16p × 16p. Therefore, when a non-reference detection frame of size Rp × Rp (R > 16) is used, for example, the feature amount extraction unit 13 reduces the non-reference detection frame by multiplying its vertical width (width in the up-down direction) and horizontal width (width in the left-right direction) each by (16/R), generating a size-changed detection frame. The feature amount extraction unit 13 also reduces the processing target image by multiplying its vertical width (number of pixels) and horizontal width (number of pixels) each by (16/R), generating a size-changed image. Then, in the same manner as the process described with reference to FIGS. 3 to 6, the detection unit 12 moves the size-changed detection frame over the size-changed image, extracts feature amounts from the image in the size-changed detection frame, and determines, based on those feature amounts, whether the image in the size-changed detection frame is highly likely to be a face image of the same size as the size-changed detection frame. That is, the detection unit 12 uses the size-changed detection frame to detect regions in the size-changed image that are highly likely to be face images of the same size as the size-changed detection frame. Hereinafter, this process is referred to as the “size-changed version detection process”. The reference detection frame and the size-changed detection frame, which are used for extracting feature amounts from the in-frame image, are collectively referred to as the “feature amount extraction frame”. The feature amount extraction frame does not include the non-reference detection frame before the size change.
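As a small worked illustration of the (16/R) scaling, the sketch below computes the size change ratio and the dimensions of the size-changed image. The function names are hypothetical, and rounding to the nearest whole pixel is an assumption, since the specification does not say how fractional sizes are handled.

```python
REFERENCE_SIZE = 16  # reference detection frame is 16p x 16p in this embodiment


def resize_ratio(frame_size):
    """Ratio that maps an R x R non-reference detection frame onto the reference size."""
    return REFERENCE_SIZE / frame_size


def resized_dimensions(width, height, frame_size):
    """Dimensions of the size-changed image (rounding is an assumption)."""
    ratio = resize_ratio(frame_size)
    return max(1, round(width * ratio)), max(1, round(height * ratio))


# Example: a 640 x 480 processing target image with a 32p x 32p non-reference frame.
print(resize_ratio(32), resized_dimensions(640, 480, 32))
```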

In the size-changed version detection process, the detection unit 12 sets the size-changed detection frame on the size-changed image, and when it determines that the image in the size-changed detection frame is highly likely to be a face image of the same size as the size-changed detection frame, it sets that size-changed detection frame, which is the outer frame of that image, as a temporary detection result frame.

In the detection unit 12, when at least one temporary detection result frame has been obtained for a size-changed image, the classifier 14 converts the at least one temporary detection result frame into a detection result frame corresponding to the processing target image from which the size-changed image was generated.

Specifically, the classifier 14 first sets the obtained temporary detection result frame or frames on the size-changed image. FIG. 7 is a diagram showing temporary detection result frames 110 set on a size-changed image 120. In the example of FIG. 7, a plurality of temporary detection result frames 110 are set on the size-changed image 120.

Next, as shown in FIG. 8, the classifier 14 enlarges (resizes) the size-changed image 120 on which the temporary detection result frames 110 are set, returning it to its original size and thereby converting the size-changed image 120 into the processing target image 20. As a result, the temporary detection result frames 110 set on the size-changed image 120 are also enlarged, and each temporary detection result frame 110 is converted into a detection result frame 150 corresponding to the processing target image 20, as shown in FIG. 8. The region inside a detection result frame 150 in the processing target image 20 is a detection result region that is highly likely to be a face image of the same size as the non-reference detection frame in the processing target image 20. In this way, based on the temporary detection result frames 110 obtained by the size-changed version detection process, the detection unit 12 identifies detection result regions in the processing target image that are highly likely to be face images of the same size as the non-reference detection frame.
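The conversion from a temporary detection result frame to a detection result frame amounts to scaling the frame by the inverse of the size change ratio. The following sketch assumes frames are represented as (x, y, width, height) tuples and ignores the one-pixel border removed for feature extraction; both the representation and the function name are hypothetical.

```python
def to_detection_result_frame(temp_frame, frame_size, reference_size=16):
    """Scale a frame found in the size-changed image back to processing-target-image coordinates.

    temp_frame: (x, y, width, height) in the size-changed image.
    frame_size: R, the side length of the non-reference detection frame.
    """
    x, y, w, h = temp_frame
    scale = frame_size / reference_size  # inverse of the (16 / R) reduction
    return (x * scale, y * scale, w * scale, h * scale)


# A 16 x 16 temporary frame at (10, 5), found with R = 32, becomes a 32 x 32 frame at (20, 10).
print(to_detection_result_frame((10, 5, 16, 16), frame_size=32))
```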

As described above, when the detection unit 12 performs the detection process on the processing target image using a non-reference detection frame, it performs the size-changed version detection process using the non-reference detection frame resized to match the reference size and the processing target image resized in accordance with the resizing of that non-reference detection frame. Thus, even when a detection frame whose size differs from the reference size is used, the feature amount extraction unit 13 can extract feature amounts from an image of the reference size. Based on the result of the size-changed version detection process, the detection unit 12 then identifies detection result regions in the processing target image that are highly likely to be face images of the same size as the non-reference detection frame. In this way, the detection unit 12 performs the detection process using the non-reference detection frame.

The detection unit 12 performs the detection process described above using each of the plurality of types of detection frames. As a result, when the processing target image contains a face image, a detection result region (a region that is highly likely to be a face image) and a detection result frame (the outer frame of such a region) are obtained, together with a detection accuracy value corresponding to the detection result frame. The detection accuracy value corresponding to a detection result frame obtained for the processing target image indicates the likelihood that the image in that detection result frame in the processing target image is a face image.

FIG. 9 is a diagram showing the detection result frames 150 obtained for the processing target image 20 superimposed on the processing target image 20. As shown in FIG. 9, performing the detection process with a plurality of types of detection frames of mutually different sizes yields detection result frames 150 of various sizes. This means that face images of various sizes contained in the processing target image 20 have been detected.

<Feature amount extraction process>
Next, the operation of the feature amount extraction unit 13 will be described in detail. FIG. 10 is a diagram showing the configuration of the feature amount extraction unit 13. As shown in FIG. 10, the feature amount extraction unit 13 includes an evaluation value acquisition unit 130 and a feature amount acquisition unit 131.

The evaluation value acquisition unit 130 generates, from the processing target image, an evaluation value map composed of a plurality of evaluation values arranged in a matrix. The feature amount acquisition unit 131 uses the evaluation value map to acquire feature amounts for a face image. Here, an evaluation value is a value indicating the relationship between the pixel value of a pixel of interest and a plurality of pixel values surrounding that pixel of interest. Hereinafter, the pixel value of the pixel of interest may be referred to as the “pixel-of-interest value”, and a pixel value surrounding the pixel of interest may be referred to as a “surrounding pixel value”. In the present embodiment, the surrounding pixel values in the eight directions around the pixel of interest are used, for example, to acquire the evaluation value.

<Method for generating evaluation value map>
For each of the plurality of types of detection frames used in the detection process, the evaluation value acquisition unit 130 generates an evaluation value map corresponding to that detection frame. Hereinafter, the evaluation value map corresponding to the reference detection frame is referred to as the “reference evaluation value map”, and an evaluation value map corresponding to a non-reference detection frame is referred to as a “non-reference evaluation value map”.

As will be described later, feature amount extraction for the processing target image uses a processing target image of (M−2)p × (N−2)p, which is smaller than the original size (Mp × Np) by one pixel around the periphery. That is, the detection process described with reference to FIGS. 3 to 6 uses a processing target image that is smaller than the original size by one pixel around the periphery. This processing target image is specifically referred to as the “extraction processing target image”. The processing target image 20 shown in FIGS. 3 to 6 is actually the extraction processing target image. Similarly, when feature amounts are extracted for a size-changed image, a size-changed image that is smaller than its original size by one pixel around the periphery is used. Hereinafter, this size-changed image is specifically referred to as the “extraction size-changed image”.

The reference evaluation value map is generated from the processing target image. Like the extraction processing target image, the reference evaluation value map is composed of a total of ((M−2) × (N−2)) evaluation values, with (M−2) evaluation values arranged in the row direction and (N−2) evaluation values arranged in the column direction. The evaluation values constituting the reference evaluation value map correspond one-to-one to the pixels constituting the extraction processing target image. Specifically, the m × n-th evaluation value in the reference evaluation value map corresponds to the m × n-th pixel in the extraction processing target image. Each evaluation value in the reference evaluation value map indicates the relationship between the pixel value of the pixel of interest and the plurality of surrounding pixel values when the corresponding pixel is taken as the pixel of interest.

The evaluation values, arranged in a matrix, that constitute the non-reference evaluation value map corresponding to a non-reference detection frame correspond one-to-one to the pixels constituting the extraction size-changed image, which is obtained by shrinking the size-changed image corresponding to that non-reference detection frame (the processing target image resized by the same ratio as the size change ratio of that non-reference detection frame) by one pixel around the periphery. The arrangement of the evaluation values in the non-reference evaluation value map is the same as the arrangement of the pixels constituting the extraction size-changed image. The m × n-th evaluation value in the non-reference evaluation value map corresponds to the m × n-th pixel in the extraction size-changed image. Each evaluation value in the non-reference evaluation value map indicates the relationship between the pixel value of the pixel of interest and the plurality of surrounding pixel values when the corresponding pixel is taken as the pixel of interest.

When generating the reference evaluation value map, the evaluation value acquisition unit 130 sets a calculation frame 200 of size 3p × 3p at the upper left of the processing target image 20, as shown in FIG. 11. The evaluation value acquisition unit 130 then takes the central pixel of the nine pixels in the calculation frame 200 as the pixel of interest.

Next, the evaluation value acquisition unit 130 obtains an evaluation value indicating the relationship between the pixel value of the pixel of interest and the eight surrounding pixel values around it. As the eight surrounding pixel values, the evaluation value acquisition unit 130 uses the pixel values of the pixels to the upper left, directly above, upper right, right, lower right, directly below, lower left, and left of the pixel of interest. In the present embodiment, the evaluation value is represented by 8 bits. The evaluation value acquisition unit 130 then sets the obtained evaluation value as the 1 × 1st value of the reference evaluation value map, which corresponds to the central pixel in the calculation frame 200, that is, the 1 × 1st pixel of the extraction processing target image. As the evaluation value, LBP or LTP can be used, for example. How LBP and LTP are obtained will be described in detail later.

FIG. 12 is a diagram showing an example of the pixel values of the nine pixels in the calculation frame 200. In the present embodiment, the pixel value used for obtaining the evaluation value is, for example, luminance. Also in the present embodiment, a pixel value is represented by 8 bits, so that in decimal notation it takes a value from “0” to “255”. The pixel value may instead be a color difference component. Hereinafter, values such as pixel values are given in decimal notation unless otherwise specified.

In the example of FIG. 12, the pixel-of-interest value 210 is “57”. The pixel values of the pixels to the upper left, directly above, upper right, right, lower right, directly below, lower left, and left of the pixel of interest are “50”, “55”, “65”, “75”, “79”, “59”, “48”, and “49”, respectively.

After obtaining the evaluation value corresponding to the calculation frame 200 at the upper left of the processing target image 20, the evaluation value acquisition unit 130 moves the calculation frame 200 one pixel to the right in the processing target image 20. The evaluation value acquisition unit 130 then takes the central pixel of the nine pixels in the moved calculation frame 200 as the pixel of interest and obtains an evaluation value indicating the relationship between the pixel-of-interest value and the eight surrounding pixel values. The evaluation value acquisition unit 130 sets the obtained evaluation value as the 1 × 2nd value of the reference evaluation value map, which corresponds to the central pixel in the moved calculation frame 200, that is, the 1 × 2nd pixel of the extraction processing target image.

Next, the evaluation value acquisition unit 130 moves the calculation frame 200 one more pixel to the right in the processing target image 20. The evaluation value acquisition unit 130 then takes the central pixel of the nine pixels in the moved calculation frame 200 as the pixel of interest and obtains an evaluation value indicating the relationship between the pixel-of-interest value and the eight surrounding pixel values. The evaluation value acquisition unit 130 sets the obtained evaluation value as the 1 × 3rd value of the reference evaluation value map, which corresponds to the central pixel in the moved calculation frame 200, that is, the 1 × 3rd pixel of the extraction processing target image.

Thereafter, the evaluation value acquisition unit 130 raster-scans the calculation frame 200 one pixel at a time to the lower right of the processing target image 20 and, at each position of the calculation frame 200, obtains an evaluation value with the central pixel of the calculation frame 200 as the pixel of interest. This generates a plurality of evaluation values corresponding respectively to the pixels constituting the extraction processing target image, that is, the pixels of the processing target image excluding the one-pixel-wide periphery, and completes the reference evaluation value map composed of those evaluation values. Because the pixels in the one-pixel-wide periphery of the processing target image never become the pixel of interest, no evaluation values are obtained for them.
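The map generation described above can be sketched as a loop over interior pixels. This is an illustration under assumptions: the image is a list of rows of integer pixel values, and `evaluate` is a hypothetical callable that turns a pixel-of-interest value and its eight surrounding pixel values into an evaluation value (for example an LBP). The stand-in evaluator used in the example is not the embodiment's evaluation value.

```python
# (dx, dy) offsets in the order: upper left, directly above, upper right, right,
# lower right, directly below, lower left, left.
NEIGHBOUR_OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]


def build_evaluation_map(image, evaluate):
    """Return a (height - 2) x (width - 2) map; peripheral pixels get no evaluation value."""
    height, width = len(image), len(image[0])
    eval_map = []
    for y in range(1, height - 1):          # skip the top and bottom rows
        row = []
        for x in range(1, width - 1):       # skip the left and right columns
            neighbours = [image[y + dy][x + dx] for dx, dy in NEIGHBOUR_OFFSETS]
            row.append(evaluate(image[y][x], neighbours))
        eval_map.append(row)
    return eval_map


# Stand-in evaluator: count of surrounding values not below the pixel of interest.
def count_not_below(center, neighbours):
    return sum(1 for v in neighbours if v >= center)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
print(build_evaluation_map(image, count_not_below))  # one row, two columns
```

The same routine applies unchanged to a size-changed image, which yields the corresponding non-reference evaluation value map.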

When generating the non-reference evaluation value map corresponding to a non-reference detection frame, the evaluation value acquisition unit 130 sets the calculation frame 200 on the size-changed image corresponding to that non-reference detection frame instead of on the processing target image. In the same manner as above, the evaluation value acquisition unit 130 then obtains an evaluation value at each position of the calculation frame 200 while raster-scanning the calculation frame 200 over the size-changed image one pixel at a time. This generates a plurality of evaluation values corresponding respectively to the pixels constituting the extraction size-changed image for that non-reference detection frame, that is, the pixels of the corresponding size-changed image excluding the one-pixel-wide periphery, and completes the non-reference evaluation value map for that non-reference detection frame. The evaluation value acquisition unit 130 generates a non-reference evaluation value map for each of the plurality of types of non-reference detection frames. Because the pixels in the one-pixel-wide periphery of the size-changed image never become the pixel of interest, no evaluation values are obtained for them.

<Specific examples of evaluation values>
Next, LBP and LTP, which are used as evaluation values, will be described.

＜LBP＞
When an LBP is generated, one bit indicating the relationship between a surrounding pixel value and the target pixel value (hereinafter referred to as a “relation display bit”) is generated for each of the plural surrounding pixel values. If the difference value obtained by subtracting the target pixel value from the surrounding pixel value is zero or greater, the relation display bit is set to “1”; if the difference value is less than zero, the relation display bit is set to “0”. The 8-bit binary code made up of the relation display bits obtained for the surrounding pixel values (hereinafter referred to as a “relation display code”) is the LBP, and this LBP is used as the evaluation value. Specifically, the decimal value of the relation display code serving as the LBP is used as the evaluation value.

For example, suppose that the target pixel value 210 and the surrounding pixel values 220 shown in FIG. 12 are obtained for the pixels in the calculation frame 200. For each surrounding pixel value 220, the evaluation value acquisition unit 130 obtains a difference value 250 by subtracting the target pixel value 210 from that surrounding pixel value 220. As shown in FIG. 13, the difference values 250 for the upper-left, upper, upper-right, right, lower-right, lower, lower-left, and left surrounding pixel values 220 are “−7”, “−2”, “8”, “18”, “22”, “2”, “−9”, and “−8”, respectively. The evaluation value acquisition unit 130 then compares each of the obtained difference values 250 (eight in this example) with zero. When the difference value 250 for a surrounding pixel value 220 is zero or greater, the unit sets the relation display bit 260 for that surrounding pixel value 220 to “1”; when the difference value 250 is less than zero, it sets the relation display bit 260 to “0”. In the example of FIGS. 12 and 13, as shown in FIG. 14, the relation display bits 260 for the upper-left, upper, upper-right, right, lower-right, lower, lower-left, and left surrounding pixel values 220 are “0”, “0”, “1”, “1”, “1”, “1”, “0”, and “0”, respectively.

The evaluation value acquisition unit 130 then arranges the relation display bits 260 obtained for the surrounding pixel values 220 in a predetermined order to generate an 8-bit relation display code as the LBP, and uses the decimal value of that code as the evaluation value corresponding to the target pixel. In the present embodiment, for example, the eight relation display bits 260 are arranged in the order upper left, upper, upper right, right, lower right, lower, lower left, left to generate the relation display code. For the pixel values of FIG. 12, as shown in FIG. 14, the relation display code is “00111100”, and its decimal value “60” is the evaluation value corresponding to the target pixel.
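The LBP calculation for one position of the calculation frame can be sketched in Python as follows. This is only an illustrative sketch: the function name `lbp_code` and the 3×3 list layout are assumptions, and because the pixel values of FIG. 12 are not given in the text, the window uses a hypothetical center value of 100 chosen so that the differences match those of FIG. 13.

```python
# Offsets (row, column) of the eight surrounding pixels, in the bit order used
# in the text: upper left, upper, upper right, right, lower right, lower,
# lower left, left. The first offset becomes the most significant bit.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(window):
    """Return the LBP evaluation value (0 to 255) for a 3x3 window."""
    center = window[1][1]
    code = 0
    for d_row, d_col in NEIGHBOUR_OFFSETS:
        difference = window[1 + d_row][1 + d_col] - center
        bit = 1 if difference >= 0 else 0  # relation display bit
        code = (code << 1) | bit
    return code


# Hypothetical window: the center value 100 is arbitrary; the surrounding
# values give the differences -7, -2, 8, 18, 22, 2, -9, -8 of FIG. 13.
window = [[93, 98, 108],
          [92, 100, 118],
          [91, 102, 122]]

value = lbp_code(window)
print(format(value, "08b"), value)
```

Because only the sign of each difference matters, any center value with the same differences yields the same relation display code.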

As can be understood from the above description, the LBP represents the state (distribution) of pixel values around the target pixel, and can therefore be said to represent a local texture. Hereinafter, an evaluation value map whose evaluation values are LBPs (more precisely, their decimal values) is referred to as an “LBP map”.
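As a rough sketch of how an LBP map might be built by raster-scanning a 3×3 frame, the following Python function computes one evaluation value per non-border pixel. The function name `lbp_map`, the list-of-rows image representation, and the sample pixel values are assumptions for illustration; diagonal neighbors are taken directly from the pixel grid, without interpolation.

```python
def lbp_map(image):
    """Raster-scan a 3x3 frame over `image` (a list of rows) and return the
    LBP map. The one-pixel-wide border has no evaluation value, so the map
    has two fewer rows and two fewer columns than the image."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(image), len(image[0])
    result = []
    for row in range(1, height - 1):          # top to bottom
        map_row = []
        for col in range(1, width - 1):       # left to right
            center = image[row][col]
            code = 0
            for d_row, d_col in offsets:
                bit = 1 if image[row + d_row][col + d_col] >= center else 0
                code = (code << 1) | bit
            map_row.append(code)
        result.append(map_row)
    return result


# Hypothetical 3x4 image; only the two inner pixels become target pixels.
image = [[93, 98, 108, 110],
         [92, 100, 118, 120],
         [91, 102, 122, 119]]
print(lbp_map(image))
```

Using each value of the returned map as a luminance would give a grayscale picture of the map.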

FIG. 15 shows an example of the processing target image 20. FIG. 16 shows a grayscale LBP map image 30 obtained by rendering the LBP map generated from the processing target image 20 of FIG. 15 as an image, with each evaluation value in the map used as a luminance. The texture of the processing target image 20 can be read from the LBP map image 30 shown in FIG. 16.

＜LTP＞
When an LTP is generated, unlike the LBP, ternary data indicating the relationship between a surrounding pixel value and the target pixel value is generated for each of the plural surrounding pixel values. In LTP, an offset value is used when this ternary data is generated in order to suppress the influence of noise.

Specifically, when the absolute value of the difference between a surrounding pixel value and the target pixel value is less than a predetermined offset value, the ternary data is set to “0”. When the absolute value of the difference is equal to or greater than the offset value and the surrounding pixel value is larger than the target pixel value, the ternary data is set to “1”; in other words, the ternary data is “1” when the surrounding pixel value is equal to or greater than the target pixel value plus the offset value. When the absolute value of the difference is equal to or greater than the offset value and the surrounding pixel value is smaller than the target pixel value, the ternary data is set to “−1”; in other words, the ternary data is “−1” when the surrounding pixel value is equal to or less than the target pixel value minus the offset value. As described in Non-Patent Document 1, the offset value is set to “5”, for example.

Providing an offset value in this way when judging the magnitude relationship between a surrounding pixel value and the target pixel value suppresses erroneous judgment of that relationship even when at least one of the two values is affected by noise.

The LTP is a ternary code obtained by arranging, in a predetermined order, the ternary data obtained for the plural surrounding pixel values. For example, the ternary code obtained by arranging the ternary data of the upper-left, upper, upper-right, right, lower-right, lower, lower-left, and left surrounding pixel values in that order is used as the LTP.

FIG. 17 shows the ternary data 270 corresponding to the respective surrounding pixel values 220 when the target pixel value 210 and the surrounding pixel values 220 shown in FIG. 12 are obtained. In the example of FIG. 17, the LTP is (−1)01110(−1)(−1).
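The ternary decision can be illustrated with a short Python sketch applied to the difference values of FIG. 13 (surrounding pixel value minus target pixel value). The function name `ltp_ternary` is hypothetical, and the default offset of 5 follows the example value mentioned in the text.

```python
def ltp_ternary(differences, offset=5):
    """Map each difference (surrounding value minus target value) to
    ternary data: 1, 0, or -1."""
    ternary = []
    for difference in differences:
        if difference >= offset:       # surrounding >= target + offset
            ternary.append(1)
        elif difference <= -offset:    # surrounding <= target - offset
            ternary.append(-1)
        else:                          # |difference| < offset
            ternary.append(0)
    return ternary


# Differences of FIG. 13 in the order upper left, upper, upper right, right,
# lower right, lower, lower left, left.
differences = [-7, -2, 8, 18, 22, 2, -9, -8]
print(ltp_ternary(differences))
```

With the offset of 5, the differences −2 and 2 fall inside the dead zone and become 0, which is what distinguishes the LTP from the LBP for this window.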

When the LTP is used as an evaluation value, it is not used as it is; instead, either the positive LTP or the negative LTP obtained from the LTP is used.

The positive LTP is obtained by converting the LTP into a binary code while focusing only on the “1” values it contains. Specifically, every value other than “1” in the LTP is converted to “0”, and the resulting binary code is the positive LTP. FIG. 18 shows the positive LTP corresponding to the LTP shown in FIG. 17. When every value other than “1” in the LTP of FIG. 17, that is, (−1)01110(−1)(−1), is converted to “0”, the positive LTP “00111000” is obtained, as shown in FIG. 18. The eight bits of the positive LTP correspond, from the most significant bit, to the upper-left, upper, upper-right, right, lower-right, lower, lower-left, and left surrounding pixel values, respectively.

As can be understood from the above description, each bit of the positive LTP indicates the relationship between the corresponding surrounding pixel value and the target pixel value. Each bit is “1” if the corresponding surrounding pixel value is equal to or greater than the target pixel value plus the offset value, and “0” if it is less than that value. The positive LTP therefore represents the state (distribution) of pixel values around the target pixel that are larger than the pixel value of the target pixel, and, like the LBP, can be said to represent a local texture. The evaluation value acquisition unit 130 uses the decimal value of the positive LTP as the evaluation value. In the example of FIG. 18, the decimal value “56” of the binary code “00111000” is the evaluation value. Hereinafter, an evaluation value map whose evaluation values are positive LTPs (more precisely, their decimal values) is referred to as a “positive LTP map”.

The negative LTP, on the other hand, is obtained by converting the LTP into a binary code while focusing only on the “−1” values it contains. Specifically, every value other than “−1” in the LTP is converted to “0” and every “−1” is converted to “1”, and the resulting binary code is the negative LTP. FIG. 19 shows the negative LTP corresponding to the LTP shown in FIG. 17. When every value other than “−1” in the LTP of FIG. 17, that is, (−1)01110(−1)(−1), is converted to “0” and every “−1” is converted to “1”, the negative LTP “10000011” is obtained, as shown in FIG. 19. The eight bits of the negative LTP correspond, from the most significant bit, to the upper-left, upper, upper-right, right, lower-right, lower, lower-left, and left surrounding pixel values, respectively.

As can be understood from the above description, each bit of the negative LTP also indicates the relationship between the corresponding surrounding pixel value and the target pixel value. Each bit is “1” if the corresponding surrounding pixel value is equal to or less than the target pixel value minus the offset value, and “0” if it is greater than that value. The negative LTP therefore represents the state (distribution) of pixel values around the target pixel that are smaller than the pixel value of the target pixel, and, like the LBP and the positive LTP, can be said to represent a local texture. The evaluation value acquisition unit 130 uses the decimal value of the negative LTP as the evaluation value. In the example of FIG. 19, the decimal value “131” of the binary code “10000011” is the evaluation value. Hereinafter, an evaluation value map whose evaluation values are negative LTPs (more precisely, their decimal values) is referred to as a “negative LTP map”.
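The conversion of an LTP into its positive and negative binary codes can be sketched as below. The function name `split_ltp` is an assumption; the input is the ternary code of FIG. 17, with the first element treated as the most significant bit.

```python
def split_ltp(ternary):
    """Return (positive LTP, negative LTP) as integers for a ternary code
    given as a list of 1, 0, and -1 values, most significant element first."""
    positive = 0
    negative = 0
    for value in ternary:
        positive = (positive << 1) | (1 if value == 1 else 0)
        negative = (negative << 1) | (1 if value == -1 else 0)
    return positive, negative


# LTP of FIG. 17: (-1)01110(-1)(-1)
ltp = [-1, 0, 1, 1, 1, 0, -1, -1]
positive, negative = split_ltp(ltp)
print(format(positive, "08b"), positive)
print(format(negative, "08b"), negative)
```

A position whose ternary data is 0 contributes a 0 bit to both codes, so the two binary codes never have a 1 in the same bit position.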

FIG. 20 shows a grayscale positive LTP map image 40p obtained by rendering the positive LTP map generated from the processing target image 20 of FIG. 15 as an image, with each evaluation value in the map used as a luminance. FIG. 21 shows a grayscale negative LTP map image 40n obtained in the same way from the negative LTP map generated from the processing target image 20 of FIG. 15. The texture of the processing target image 20 can be read from the positive LTP map image 40p shown in FIG. 20 and from the negative LTP map image 40n shown in FIG. 21.

Hereinafter, when there is no particular need to distinguish among the LBP, the negative LTP, and the positive LTP, each may be referred to as a “texture expression code”.

＜About uniform＞
As described above, in the present embodiment the evaluation value is expressed in 8 bits and can therefore take 256 different values, from 0 to 255.

On the other hand, as will be understood from the description given later, the number of feature amounts extracted from an in-frame image, that is, the number of dimensions of the feature vector extracted from the in-frame image, depends on the number of different values the evaluation value can take. To reduce the processing load of the detection unit 12, it is therefore effective to limit the number of different values the evaluation value can take.

Accordingly, limiting (reducing) the number of different values the evaluation value can take is considered here using the concept of “uniform”, which is sometimes used when obtaining LBPs and LTPs. This limitation using uniform is described below.

First, the number of bit changes (bit inversions) encountered when the eight bits of a texture expression code such as an LBP are examined in order is obtained. The eight bits may be examined from the most significant bit toward the least significant bit, or in the opposite direction. When the number of bit changes obtained for a texture expression code is two or fewer, the texture expression code is determined to be uniform. When the number of bit changes exceeds two, that is, when it is three or more, the texture expression code is determined not to be uniform.

For a texture expression code determined to be uniform, the decimal value of that texture expression code is used as the evaluation value.

A texture expression code determined not to be uniform, on the other hand, is regarded as having been generated from a target pixel value and surrounding pixel values affected by noise, and is not itself used as the evaluation value. Instead, it is converted into a specific 8-bit binary code, and the decimal value of that specific binary code is used as the evaluation value. Any 8-bit binary code may be used as the specific binary code as long as it is not one of the 8-bit binary codes determined to be uniform. For example, “10101010” is adopted as the specific binary code.

For example, suppose that the texture expression code is “01000000”. Examining it from the most significant bit, it changes from “0” to “1” between the first and second bits and from “1” to “0” between the second and third bits. Since there are two bit changes, “01000000” is uniform, and its decimal value “64” is used as the evaluation value.

Next, suppose that the texture expression code is “00001111”. Examining it from the most significant bit, it changes from “0” to “1” between the fourth and fifth bits, so there is one bit change. Therefore, “00001111” is uniform, and its decimal value “15” is used as the evaluation value.

Next, suppose that the texture expression code is “00110011”. Examining it from the most significant bit, it changes from “0” to “1” between the second and third bits, from “1” to “0” between the fourth and fifth bits, and from “0” to “1” between the sixth and seventh bits. Since there are three bit changes, “00110011” is not uniform. It is therefore converted into the specific binary code “10101010”, and the decimal value “170” of “10101010” is used as the evaluation value.

Finally, suppose that the texture expression code is “01010101”. Examining it in order, there are seven bit changes, so “01010101” is not uniform. It is therefore converted into the specific binary code “10101010”, and the decimal value “170” of “10101010” is used as the evaluation value.

In this way, when the number of bit changes encountered in examining the eight bits of a texture expression code in order exceeds two, the texture expression code is converted into the specific binary code and the decimal value of the specific binary code is used as the evaluation value. As a result, the number of different values the evaluation value can take is limited from 256 to 59: the 58 uniform codes plus the one specific binary code.
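The uniform test and the resulting count of 59 values can be checked with a small Python sketch. The helper names `bit_changes` and `uniform_value` are hypothetical; the specific binary code “10101010” is the example value given in the text.

```python
NON_UNIFORM_CODE = 0b10101010  # specific binary code from the text (170)


def bit_changes(code):
    """Count bit changes when the 8 bits are examined from the most
    significant bit to the least significant bit."""
    bits = format(code, "08b")
    return sum(1 for left, right in zip(bits, bits[1:]) if left != right)


def uniform_value(code):
    """Return the evaluation value after applying the uniform restriction."""
    return code if bit_changes(code) <= 2 else NON_UNIFORM_CODE


for code in (0b01000000, 0b00001111, 0b00110011, 0b01010101):
    print(format(code, "08b"), bit_changes(code), uniform_value(code))

# Number of distinct evaluation values over all 256 possible codes.
distinct_values = {uniform_value(code) for code in range(256)}
print(len(distinct_values))
```

The count works out because 2 codes have no bit change, 14 have exactly one, and 42 have exactly two, giving 58 uniform codes, to which the single specific binary code is added.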

To reduce the number of dimensions of the feature vector, the evaluation value acquisition unit 130 according to the present embodiment uses uniform to limit the number of different values the evaluation value can take to 59.

For convenience of explanation, the numbers 0 to 58 are assigned to the 59 values that the evaluation value can take. These numbers may be used hereinafter in describing the present embodiment.
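One possible way to assign the numbers 0 to 58 is sketched below. The text does not state which number goes with which value, so the ascending-order assignment and the name `build_index_table` are assumptions made purely for illustration.

```python
def build_index_table(non_uniform_code=0b10101010):
    """Return a dict mapping each of the 59 possible evaluation values to a
    number from 0 to 58 (assumed here: ascending order of the value)."""
    def bit_changes(code):
        bits = format(code, "08b")
        return sum(1 for left, right in zip(bits, bits[1:]) if left != right)

    possible_values = sorted(
        {code if bit_changes(code) <= 2 else non_uniform_code
         for code in range(256)}
    )
    return {value: number for number, value in enumerate(possible_values)}


table = build_index_table()
print(len(table), table[0], table[60], table[170], table[255])
```

A lookup table of this kind lets a histogram over evaluation values be stored in 59 bins instead of 256.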

＜About pixel value interpolation＞
To improve the accuracy of the evaluation value, it is desirable that the plural surrounding pixel values used to obtain it be the pixel values at plural surrounding positions that are all at the same distance from the target pixel.

In LBP generation, a “distance from the point of interest” is used as a parameter indicating how far from the target pixel the comparison between surrounding pixel values and the target pixel value extends. When the distance from the point of interest is “1”, the LBP is obtained using eight surrounding pixel values, as described above. A distance of 1 denotes the distance between adjacent pixels in the vertical and horizontal directions. The same applies to LTP.

FIG. 22 shows the positional relationship between the target pixel 300 and the eight surrounding pixels 310a to 310h in the calculation frame 200. As shown in FIG. 22, among the nine pixels in the calculation frame 200, the distance a between the target pixel 300 and a vertically or horizontally adjacent surrounding pixel differs from the distance b between the target pixel 300 and a diagonally adjacent surrounding pixel.

When the pixel values of the eight surrounding pixels 310a to 310h around the target pixel 300 are used as the eight surrounding pixel values for obtaining the evaluation value, as described above, the distance between the target pixel and the position to which each surrounding pixel value corresponds is “1” in the vertical and horizontal directions but is not “1” in the diagonal directions. In this case, therefore, the surrounding pixel values used to obtain the evaluation value are not the pixel values at surrounding positions that are all at the same distance from the target pixel.

For the diagonal surrounding pixel values, it is therefore desirable to use not the pixel values of the diagonally adjacent surrounding pixels but the pixel values at the diagonal surrounding positions 400 whose distance from the target pixel is “1” (see FIG. 22).

The pixel value at a diagonal surrounding position 400 whose distance from the target pixel is “1” can be obtained by pixel value interpolation such as bilinear interpolation.

FIG. 23 illustrates how the pixel value at the upper-right surrounding position 400, whose distance from the target pixel is “1”, is obtained by bilinear interpolation. In the example of FIG. 23, the pixel values of the target pixel 300, the upper surrounding pixel 310b, the upper-right surrounding pixel 310c, and the right surrounding pixel 310d are “57”, “55”, “65”, and “75”, respectively. The pixel 320a located above the surrounding pixel 310c has a pixel value of “50”, and the pixel 320b located to the right of the surrounding pixel 310c has a pixel value of “70”.

In the example of FIG. 23, if the ratio of the vertical distance between the surrounding position 400 and the pixel 320a to the vertical distance between the surrounding position 400 and the surrounding pixel 310d is y1:y2, the vertical interpolation value Y0 is obtained using the following equation (1).

If the ratio of the horizontal distance between the surrounding position 400 and the surrounding pixel 310b to the horizontal distance between the surrounding position 400 and the pixel 320b is x1:x2, the horizontal interpolation value X0 is obtained using the following equation (2).

Then, the pixel value Z0 at the surrounding position 400 is obtained using the following equation (3).

The pixel values at the upper-left, lower-left, and lower-right surrounding positions whose distance from the target pixel is “1” can be obtained in the same manner.

When obtaining the evaluation value, a more accurate evaluation value can be obtained by using, as the diagonal surrounding pixel values, the pixel values at the diagonal surrounding positions whose distance from the target pixel is “1” rather than the pixel values of the diagonally adjacent surrounding pixels. The evaluation value acquisition unit 130 may use either the pixel values of the diagonally adjacent surrounding pixels or the pixel values at the diagonal surrounding positions whose distance from the target pixel is “1” as the surrounding pixel values.
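Equations (1) to (3) are not reproduced in this text, so the following is not the embodiment's formula. As a point of comparison only, the Python sketch below applies textbook bilinear interpolation over the four pixels that enclose the upper-right position at distance 1, using the pixel values of FIG. 23. The function name `bilinear`, the coordinate convention (target pixel at the origin, x to the right, y upward), and the choice of these four pixels are all assumptions; the result may differ from the value given by equations (1) to (3), which also involve the pixels 320a and 320b.

```python
import math


def bilinear(f00, f10, f01, f11, x, y):
    """Textbook bilinear interpolation inside a unit cell.

    f00 is the value at (0, 0), f10 at (1, 0), f01 at (0, 1), f11 at (1, 1),
    and (x, y) is the sample position with 0 <= x <= 1 and 0 <= y <= 1.
    """
    return (f00 * (1 - x) * (1 - y) + f10 * x * (1 - y)
            + f01 * (1 - x) * y + f11 * x * y)


# Position at distance 1 from the target pixel along the upper-right diagonal.
a = math.sqrt(2) / 2

# FIG. 23 values: target pixel 300 = 57, right pixel 310d = 75,
# upper pixel 310b = 55, upper-right pixel 310c = 65.
upper_right = bilinear(57, 75, 55, 65, a, a)
print(round(upper_right, 2))
```

The interpolated value lies between the target pixel value and the upper-right pixel value, weighted toward the latter because the sample position is closer to that pixel.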

＜Description of the operation of the feature amount acquisition unit＞
Next, the operation of the feature amount acquisition unit 131 will be described in detail. For each of the plural types of detection frames, the feature amount acquisition unit 131 acquires feature amounts using the evaluation value map that corresponds to that detection frame and was generated by the evaluation value acquisition unit 130. When acquiring feature amounts for the processing target image, the feature amount acquisition unit 131 uses the extraction processing target image, which is smaller than the original size by the one-pixel-wide border. Likewise, when acquiring feature amounts for a size-changed image, the feature amount acquisition unit 131 uses the extraction size-changed image, which is smaller than the original size by the one-pixel-wide border. Hereinafter, the extraction processing target image and the extraction size-changed images used for acquiring feature amounts may be collectively referred to as “extraction target images”.

本実施の形態では、例えば、２種類の特徴量が使用される。具体的には、評価値マップに設定された特徴量抽出枠内での評価値の頻度（度数）が特徴量とされる。つまり、評価値マップに設定された特徴量抽出枠内での、評価値がとり得る５９種類の値のそれぞれについての頻度（度数）が特徴量とされる。以後、この特徴量を「第１特徴量」とする。 In the present embodiment, for example, two types of feature values are used. Specifically, the frequency (frequency) of the evaluation value within the feature amount extraction frame set in the evaluation value map is used as the feature amount. That is, the frequency (frequency) of each of 59 types of values that can be taken as evaluation values within the feature value extraction frame set in the evaluation value map is used as the feature value. Hereinafter, this feature amount is referred to as a “first feature amount”.

さらに、本実施の形態では、評価値マップに設定された特徴量抽出枠内での評価値の共起頻度（共起度数）が特徴量とされる。つまり、評価値マップに設定された特徴量抽出枠内における、評価値がとり得る値についての組み合わせの頻度（度数）が特徴量とされる。以後、この特徴量を「第２特徴量」とする。第２特徴量は、評価値の共起性を示していると言える。特徴量取得部１３１は、枠内画像から取得した第１及び第２特徴量の両方から成る特徴ベクトルを識別器１４に入力する。以下に、第１及び第２特徴量の求め方について説明する。 Furthermore, in the present embodiment, the co-occurrence frequency of evaluation values within the feature amount extraction frame set in the evaluation value map is also used as a feature amount. That is, the frequency of each combination of values that the evaluation values can take within the feature amount extraction frame set in the evaluation value map is used as a feature amount. Hereinafter, this feature amount is referred to as the “second feature amount”. The second feature amount can be said to indicate the co-occurrence of evaluation values. The feature amount acquisition unit 131 inputs a feature vector composed of both the first and second feature amounts acquired from the in-frame image to the discriminator 14. How the first and second feature amounts are obtained is described below.

＜第１特徴量について＞
特徴量取得部１３１は、上述の図３〜６のようにして、抽出用処理対象画像のある位置に基準検出枠を設定した際には、評価値取得部１３０で生成された基準用評価値マップに対して、当該ある位置と同じ位置に基準検出枠を設定する。そして、特徴量取得部１３１は、基準用評価値マップに設定した基準検出枠内での評価値の頻度分布（度数分布）を示す１次元評価値ヒストグラムを生成する。 <About the first feature>
When a reference detection frame is set at a certain position in the extraction processing target image as in FIGS. 3 to 6 described above, the feature amount acquisition unit 131 sets a reference detection frame at the same position in the reference evaluation value map generated by the evaluation value acquisition unit 130. The feature amount acquisition unit 131 then generates a one-dimensional evaluation value histogram indicating the frequency distribution of the evaluation values within the reference detection frame set in the reference evaluation value map.

図２４は１次元評価値ヒストグラムの一例を示す図である。図２４の横軸は、評価値がとり得る５９種類の値に対してそれぞれ割り当てられた０〜５８の番号を示している。図２４の縦軸は、基準用評価値マップに設定された基準検出枠内の複数の評価値において、横軸に示された番号の値を有する評価値の頻度を示している。本実施の形態では、評価値がとり得る値の種類は５９種類であるため、１次元評価値ヒストグラムは５９個のビンを有する。 FIG. 24 is a diagram illustrating an example of a one-dimensional evaluation value histogram. The horizontal axis of FIG. 24 shows numbers 0 to 58 assigned to 59 types of evaluation values. The vertical axis in FIG. 24 indicates the frequency of evaluation values having the number values shown on the horizontal axis in a plurality of evaluation values within the reference detection frame set in the reference evaluation value map. In the present embodiment, since there are 59 types of values that can be taken as evaluation values, the one-dimensional evaluation value histogram has 59 bins.

特徴量取得部１３１は、１次元評価値ヒストグラムを生成すると、当該１次元評価値ヒストグラムにおける５９個のビンでの頻度のそれぞれを、抽出用処理対象画像に設定した基準検出枠内の画像についての第１特徴量とする。これにより、抽出用処理対象画像に設定された基準検出枠内の画像から、５９個の第１特徴量が抽出される。 After generating the one-dimensional evaluation value histogram, the feature amount acquisition unit 131 takes each of the frequencies in its 59 bins as a first feature amount for the image within the reference detection frame set in the extraction processing target image. As a result, 59 first feature amounts are extracted from the image within the reference detection frame set in the extraction processing target image.
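The extraction of the 59 first feature amounts from one frame position can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes the evaluation value map is held as a NumPy integer array whose entries are already the bin numbers 0 to 58, and the names `first_features`, `eval_map`, `top` and `left` are hypothetical.

```python
import numpy as np

NUM_VALUES = 59  # number of values an evaluation value can take (from the text)


def first_features(eval_map, top, left, height, width):
    """Return the bin counts of the 1-D evaluation value histogram
    for the frame whose upper-left corner is at (top, left)."""
    window = eval_map[top:top + height, left:left + width]
    # One count per possible evaluation value number 0..58.
    return np.bincount(window.ravel(), minlength=NUM_VALUES)


# Hypothetical evaluation value map filled with random bin numbers.
rng = np.random.default_rng(0)
eval_map = rng.integers(0, NUM_VALUES, size=(32, 32))
features = first_features(eval_map, top=4, left=6, height=16, width=16)
```

The counts always sum to the number of evaluation values inside the frame (256 for a 16 × 16 frame in this sketch).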

特徴量取得部１３１は、抽出用処理対象画像に対して基準検出枠をラスタスキャンさせていく際に、基準検出枠の各位置において、上記のようにして、基準用評価値マップを用いて基準検出枠内の画像から５９個の第１特徴量を取得する。 When the reference detection frame is raster scanned with respect to the extraction processing target image, the feature amount acquisition unit 131 uses the reference evaluation value map as a reference at each position of the reference detection frame as described above. 59 first feature amounts are acquired from the image within the detection frame.
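The raster scan of the detection frame can be sketched as a generator of frame positions. This is only an illustration under assumptions: the scan step is not stated in this passage, so `step` is a hypothetical parameter, and `raster_positions` is a hypothetical name.

```python
def raster_positions(map_h, map_w, frame_h, frame_w, step=1):
    """Yield (top, left) positions of a frame moved left to right,
    then top to bottom, while staying fully inside the map."""
    for top in range(0, map_h - frame_h + 1, step):
        for left in range(0, map_w - frame_w + 1, step):
            yield top, left


# A 2 x 2 frame scanned over a 3 x 4 map visits six positions.
positions = list(raster_positions(3, 4, 2, 2))
```

At each yielded position, the feature amounts would be taken from the evaluation value map within the frame placed there.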

特徴量取得部１３１では、非基準検出枠が使用される場合でも、同様にして第１特徴量が取得される。特徴量取得部１３１は、非基準検出枠に対応する抽出用サイズ変更画像のある位置にサイズ変更検出枠（特徴量抽出枠）を設定した際には、当該非基準検出枠に対応する、評価値取得部１３０で生成された非基準用評価値マップに対して、当該ある位置と同じ位置に当該サイズ変更検出枠を設定する。そして、特徴量取得部１３１は、非基準用評価値マップに設定したサイズ変更検出枠内での評価値の頻度分布を示す１次元評価値ヒストグラムを生成する。 The feature amount acquisition unit 131 acquires the first feature amount in the same manner even when the non-reference detection frame is used. When the size change detection frame (feature amount extraction frame) is set at a position of the extraction size change image corresponding to the non-reference detection frame, the feature amount acquisition unit 131 corresponds to the non-reference detection frame. For the non-reference evaluation value map generated by the value acquisition unit 130, the size change detection frame is set at the same position as the certain position. Then, the feature quantity acquisition unit 131 generates a one-dimensional evaluation value histogram indicating the frequency distribution of evaluation values within the size change detection frame set in the non-reference evaluation value map.

特徴量取得部１３１は、１次元評価値ヒストグラムを生成すると、当該１次元評価値ヒストグラムにおける５９個のビンでの頻度のそれぞれを、抽出用サイズ変更画像に設定したサイズ変更検出枠内の画像についての第１特徴量とする。これにより、抽出用サイズ変更画像に設定されたサイズ変更検出枠内の画像から５９個の第１特徴量が抽出される。 When the feature quantity acquisition unit 131 generates a one-dimensional evaluation value histogram, the frequency in 59 bins in the one-dimensional evaluation value histogram is set for each image in the size change detection frame set as the extraction size change image. The first feature amount. As a result, 59 first feature amounts are extracted from the image within the size change detection frame set in the size change image for extraction.

特徴量取得部１３１は、抽出用サイズ変更画像に対してサイズ変更検出枠をラスタスキャンさせていく際に、サイズ変更検出枠の各位置において、上記のようにして、非基準用評価値マップを用いてサイズ変更検出枠内の画像から５９個の第１特徴量を取得する。特徴量取得部１３１は、複数種類の非基準検出枠のそれぞれに関して、当該非基準検出枠に対応する抽出用サイズ変更画像に対して当該非基準検出枠に対応するサイズ変更検出枠をラスタスキャンさせながら、当該非基準検出枠に対応する非基準用評価値マップを用いて当該サイズ変更検出枠内の画像から５９個の第１特徴量を取得する。 The feature amount acquisition unit 131 rasterizes the non-reference evaluation value map as described above at each position of the size change detection frame when raster-scanning the size change detection frame with respect to the size change image for extraction. It is used to obtain 59 first feature values from the image within the size change detection frame. For each of a plurality of types of non-reference detection frames, the feature amount acquisition unit 131 raster-scans the size change detection frame corresponding to the non-reference detection frame with respect to the extracted size-changed image corresponding to the non-reference detection frame. However, 59 first feature values are acquired from the image in the size change detection frame using the non-reference evaluation value map corresponding to the non-reference detection frame.

なお、枠内画像から抽出された５９個の第１特徴量（１次元評価値ヒストグラムの５９個のビンでの頻度）のそれぞれを以下の式（４）を用いて正規化しても良い。第１特徴量を正規化することによって、第１特徴量が、撮像装置５での撮像環境の変化の影響を受けにくくなる。 Each of the 59 first feature values (frequency in 59 bins of the one-dimensional evaluation value histogram) extracted from the in-frame image may be normalized using the following equation (4). By normalizing the first feature value, the first feature value is less affected by the change in the imaging environment of the imaging device 5.

ここで、ｖは正規化前の第１特徴量を示しており、Ｖは正規化後の第１特徴量を示している。また、ｋは、枠内画像から抽出された複数の第１特徴量の数を示しており、本実施の形態ではｋ＝５９である。そして、ｖ（ｉ）は、ｋ個の第１特徴量に対して１番〜ｋ番までをそれぞれ付与した場合における、ｉ番の正規化前の第１特徴量を示している。なお、εは、式（４）の右辺の式において、ｖが零で除算されないために設けられた定数である。 Here, v denotes a first feature amount before normalization, and V denotes the first feature amount after normalization. Further, k denotes the number of first feature amounts extracted from the in-frame image, and k = 59 in the present embodiment. When the k first feature amounts are numbered 1 to k, v(i) denotes the i-th first feature amount before normalization. Note that ε is a constant provided to prevent v from being divided by zero in the right-hand side of equation (4).

＜第２特徴量について＞
特徴量取得部１３１は、抽出用処理対象画像のある位置に基準検出枠を設定した際には、評価値取得部１３０で生成された基準用評価値マップに対して、当該ある位置と同じ位置に基準検出枠を設定する。そして、特徴量取得部１３１は、基準用評価値マップに設定した基準検出枠内での評価値の共起頻度の分布を示す２次元評価値ヒストグラムを生成する。 <About the second feature>
When a reference detection frame is set at a certain position in the extraction processing target image, the feature amount acquisition unit 131 sets a reference detection frame at the same position in the reference evaluation value map generated by the evaluation value acquisition unit 130. The feature amount acquisition unit 131 then generates a two-dimensional evaluation value histogram indicating the distribution of the co-occurrence frequencies of the evaluation values within the reference detection frame set in the reference evaluation value map.

本実施の形態では、基準用評価値マップに設定した基準検出枠（特徴量抽出枠）内において、所定の相対的な位置関係にある２つの評価値がとり得る値の組み合わせについての頻度分布を示す２次元評価値ヒストグラムが生成される。以後、所定の相対的な位置関係にある２つの評価値のペアを「評価値ペア」と呼ぶ。そして、本実施の形態では、評価値ペアを構成する２つの評価値の間での相対的な位置関係が異なる複数種類（複数組）の評価値ペアのそれぞれについて２次元評価値ヒストグラムが生成される。本実施の形態では、例えば３０種類の評価値ペアが使用される。以後、評価値ペアにおいて、一方の評価値を「第１評価値」と呼び、他方の評価値を「第２評価値」と呼ぶ。 In the present embodiment, a two-dimensional evaluation value histogram is generated that indicates the frequency distribution of the combinations of values that two evaluation values in a predetermined relative positional relationship can take within the reference detection frame (feature amount extraction frame) set in the reference evaluation value map. Hereinafter, a pair of two evaluation values in a predetermined relative positional relationship is referred to as an “evaluation value pair”. In the present embodiment, a two-dimensional evaluation value histogram is generated for each of plural types of evaluation value pairs that differ in the relative positional relationship between their two evaluation values. In the present embodiment, for example, 30 types of evaluation value pairs are used. Hereinafter, one evaluation value of an evaluation value pair is referred to as the “first evaluation value” and the other as the “second evaluation value”.

図２５は３０種類の評価値ペアの一例を示す図である。図２５では、特徴量抽出枠内での、第１評価値５００と第２評価値５１０の間の相対的な位置関係が示されている。図２５では、白丸は第１評価値５００を示している。また図２５では、黒丸は第２評価値５１０を示しており、特徴量抽出枠内で位置が互いに異なる３０種類の第２評価値５１０が示されている。 FIG. 25 is a diagram illustrating an example of 30 types of evaluation value pairs. FIG. 25 shows a relative positional relationship between the first evaluation value 500 and the second evaluation value 510 within the feature amount extraction frame. In FIG. 25, white circles indicate the first evaluation value 500. In FIG. 25, black circles indicate the second evaluation values 510, and 30 types of second evaluation values 510 having different positions within the feature amount extraction frame are illustrated.

本実施の形態では、図２５に示される第１評価値５００と、図２５に示される１種類の第２評価値５１０とで、１種類の評価値ペアが形成される。図２５では、第２評価値５１０は３０種類存在することから、３０種類の評価値ペアが形成される。以後、説明の対象の評価値ペアを「対象評価値ペア」と呼ぶ。 In the present embodiment, one type of evaluation value pair is formed by the first evaluation value 500 shown in FIG. 25 and one type of second evaluation value 510 shown in FIG. In FIG. 25, since there are 30 types of second evaluation values 510, 30 types of evaluation value pairs are formed. Hereinafter, the evaluation value pair to be described is referred to as “target evaluation value pair”.

図２６〜２８は、特徴量取得部１３１が、基準用評価値マップに設定した基準検出枠１００内において、対象評価値ペアがとり得る値の組み合わせについての頻度分布を示す２次元評価値ヒストグラムを生成する際の当該特徴量取得部１３１の動作を説明するための図である。図２６〜２８では、対象評価値ペアは、左右方向で互いに隣り合う第１評価値５００及び第２評価値５１０で構成されている。 FIGS. 26 to 28 are diagrams for explaining the operation of the feature amount acquisition unit 131 when it generates a two-dimensional evaluation value histogram indicating the frequency distribution of the combinations of values that the target evaluation value pair can take within the reference detection frame 100 set in the reference evaluation value map. In FIGS. 26 to 28, the target evaluation value pair consists of a first evaluation value 500 and a second evaluation value 510 that are adjacent to each other in the left-right direction.

本実施の形態では、特徴量取得部１３１は、基準検出枠１００において、左上から右下にかけて（ラスタスキャン方向に沿って）順番に評価値を第１評価値５００とし、当該第１評価値５００とペアとなる第２評価値５１０と当該第１評価値５００の組み合わせを記憶する。そして、特徴量取得部１３１は、記憶した複数組の組み合わせについての頻度分布を示す２次元評価値ヒストグラムを生成する。以下にこの点について詳細に説明する。 In the present embodiment, the feature amount acquisition unit 131 sets the evaluation value as the first evaluation value 500 in order from the upper left to the lower right (along the raster scan direction) in the reference detection frame 100, and the first evaluation value 500. The combination of the second evaluation value 510 and the first evaluation value 500 paired with each other is stored. Then, the feature quantity acquisition unit 131 generates a two-dimensional evaluation value histogram indicating the frequency distribution for the stored combinations of the plurality of sets. This point will be described in detail below.

図２６に示されるように、まず特徴量取得部１３１は、基準検出枠１００内の左上の評価値を第１評価値５００とし、その右隣の評価値を第２評価値５１０として、第１評価値５００と第２評価値５１０の組み合わせを記憶する。 As shown in FIG. 26, the feature amount acquisition unit 131 first takes the upper-left evaluation value in the reference detection frame 100 as the first evaluation value 500 and the evaluation value immediately to its right as the second evaluation value 510, and stores the combination of the first evaluation value 500 and the second evaluation value 510.

次に図２７に示されるように、特徴量取得部１３１は、基準検出枠１００内の左上から右方向に見て２番目の評価値を第１評価値５００とし、その右隣の評価値を第２評価値５１０として、第１評価値５００と第２評価値５１０の組み合わせを記憶する。 Next, as shown in FIG. 27, the feature amount acquisition unit 131 takes the evaluation value that is second from the upper left, counting rightward, in the reference detection frame 100 as the first evaluation value 500 and the evaluation value immediately to its right as the second evaluation value 510, and stores the combination of the first evaluation value 500 and the second evaluation value 510.

以後、特徴量取得部１３１は、ラスタスキャン方向に沿って、基準検出枠１００内の評価値を順番に第１評価値５００とし、当該第１評価値５００の右隣の評価値を第２評価値５１０として、各第１評価値５００について、当該第１評価値５００と、それとペアとなる第２評価値５１０の組み合わせを記憶する。図２８では、基準検出枠１００内の右下から左方向に見て２番目の評価値が第１評価値５００とされ、その右隣の評価値が第２評価値５１０とされている様子が示されている。 Thereafter, the feature quantity acquisition unit 131 sequentially sets the evaluation values in the reference detection frame 100 in the raster scan direction as the first evaluation value 500, and sets the evaluation value on the right side of the first evaluation value 500 as the second evaluation value. As each value 510, for each first evaluation value 500, a combination of the first evaluation value 500 and the second evaluation value 510 paired therewith is stored. In FIG. 28, the second evaluation value viewed from the lower right in the reference detection frame 100 in the left direction is the first evaluation value 500, and the evaluation value adjacent to the right is the second evaluation value 510. It is shown.

なお、特徴量取得部１３１は、基準検出枠１００において、左上から右下にかけて順番に評価値を第１評価値５００とし、当該第１評価値５００とペアとなる第２評価値５１０と当該第１評価値５００の組み合わせを記憶していく際に、第１評価値５００とペアとなる第２評価値５１０を定めることができないときには、当該第１評価値５００と、それとペアとなる第２評価値５１０との組み合わせは記憶しない。例えば、図２６〜２８の例では、基準検出枠１００内の右下の評価値を第１評価値５００とした場合には、それとペアとなる第２評価値５１０を定めることができないことから、当該第１評価値５００と、それとペアとなる第２評価値５１０との組み合わせは記憶されない。 Note that, while taking the evaluation values in the reference detection frame 100 as the first evaluation value 500 in order from the upper left to the lower right and storing the combination of each first evaluation value 500 with its paired second evaluation value 510, the feature amount acquisition unit 131 does not store a combination when the second evaluation value 510 to be paired with the first evaluation value 500 cannot be determined. For example, in the example of FIGS. 26 to 28, when the lower-right evaluation value in the reference detection frame 100 is taken as the first evaluation value 500, there is no evaluation value to its right within the frame, so the paired second evaluation value 510 cannot be determined and no combination is stored for that first evaluation value 500.

特徴量取得部１３１は、第１評価値５００と第２評価値５１０の組み合わせの記憶が終了すると、記憶した複数組の組み合わせに基づいて、対象評価値ペアがとり得る値の組み合わせについての頻度分布を示す２次元評価値ヒストグラムを生成する。 When it has finished storing the combinations of the first evaluation value 500 and the second evaluation value 510, the feature amount acquisition unit 131 generates, based on the stored combinations, a two-dimensional evaluation value histogram indicating the frequency distribution of the combinations of values that the target evaluation value pair can take.

図２９は、対象評価値ペアについての２次元評価値ヒストグラムの一例を示す図である。図２９のＸ方向に沿った第１軸は、対象評価値ペアの第１評価値がとり得る５９種類の値に対してそれぞれ割り当てられた０〜５８の番号が示されている。また図２９のＹ方向に沿った第２軸は、対象評価値ペアの第２評価値がとり得る５９種類の値に対してそれぞれ割り当てられた０〜５８の番号が示されている。そして図２９のＺ方向に沿った第３軸は、評価値マップに設定された特徴量抽出枠内における、第１軸に示された番号の値を有する第１評価値と、第２軸に示された番号の値を有する第２評価値との組み合わせの頻度を示している。つまり、図２９の第３軸は、対象評価値ペアについて記憶された複数組の組み合わせにおいて、第１軸に示された番号の値を有する第１評価値と、第２軸に示された番号の値を有する第２評価値との組み合わせがいくつ存在しているかを示している。 FIG. 29 is a diagram illustrating an example of a two-dimensional evaluation value histogram for a target evaluation value pair. The first axis along the X direction in FIG. 29 shows numbers 0 to 58 assigned to 59 types of values that can be taken by the first evaluation value of the target evaluation value pair. In addition, the second axis along the Y direction in FIG. 29 shows numbers 0 to 58 assigned to 59 types of values that can be taken by the second evaluation value of the target evaluation value pair. The third axis along the Z direction in FIG. 29 is the first evaluation value having the number indicated by the first axis in the feature amount extraction frame set in the evaluation value map, and the second axis. The frequency of combination with the second evaluation value having the value of the indicated number is shown. That is, the third axis in FIG. 29 is the first evaluation value having the number indicated on the first axis and the number indicated on the second axis in a plurality of combinations stored for the target evaluation value pair. It shows how many combinations with the second evaluation value having the value of.

例えば、対象評価値ペアについて記憶された複数組の組み合わせにおいて、番号０の値を有する第１評価値と、番号２の値を有する第２評価値との組み合わせが８個存在する場合には、第１軸に示される番号０及び第２軸に示される番号２に対応するビンでの頻度が“８”となる。また、対象評価値ペアについて記憶された複数組の組み合わせにおいて、番号２の値を有する第１評価値と、番号１の値を有する第２評価値との組み合わせが３個存在する場合には、第１軸に示される番号２及び第２軸に示される番号１に対応するビンでの頻度が“３”となる。本実施の形態では、２次元評価値ヒストグラムは、３４８１（＝５９×５９）個のビンを有する。 For example, when there are eight combinations of the first evaluation value having a value of number 0 and the second evaluation value having a value of number 2 in a plurality of combinations stored for the target evaluation value pair, The frequency in the bin corresponding to the number 0 indicated on the first axis and the number 2 indicated on the second axis is “8”. Further, in the combination of a plurality of sets stored for the target evaluation value pair, when there are three combinations of the first evaluation value having the value of number 2 and the second evaluation value having the value of number 1, The frequency in the bin corresponding to the number 2 indicated on the first axis and the number 1 indicated on the second axis is “3”. In the present embodiment, the two-dimensional evaluation value histogram has 3481 (= 59 × 59) bins.
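The generation of a two-dimensional evaluation value histogram for one evaluation value pair can be sketched as follows. This is a minimal sketch under assumptions: the frame contents are given as a NumPy integer array of bin numbers, the relative position of the second evaluation value is expressed as a hypothetical offset `(dy, dx)` in rows and columns, and the function name `cooccurrence_histogram` is illustrative. First evaluation values whose partner falls outside the frame are skipped, as described above.

```python
import numpy as np

NUM_VALUES = 59  # number of values an evaluation value can take (from the text)


def cooccurrence_histogram(window, dy, dx, num_values=NUM_VALUES):
    """Count (first value, second value) combinations inside one frame.

    The second evaluation value lies at offset (dy, dx) from the first.
    """
    h, w = window.shape
    # Rows and columns of first values whose partner is still inside the frame.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    first = window[r0:r1, c0:c1]
    second = window[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    hist = np.zeros((num_values, num_values), dtype=np.int64)
    # Axis 0 indexes the first evaluation value, axis 1 the second.
    np.add.at(hist, (first.ravel(), second.ravel()), 1)
    return hist


# Tiny frame with three possible values; the partner is the right neighbour.
window = np.array([[0, 2, 0],
                   [2, 1, 0]])
hist = cooccurrence_histogram(window, dy=0, dx=1, num_values=3)
```

In this small example the right-hand column has no right neighbour, so only four combinations are counted: (0, 2), (2, 0), (2, 1) and (1, 0). With the default of 59 values the histogram has 59 × 59 = 3481 bins.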

上述のようにして、特徴量取得部１３１は、３０種類の評価値ペアのそれぞれについて２次元評価値ヒストグラムを生成する。これにより、３０個の２次元評価値ヒストグラムが生成される。 As described above, the feature amount acquisition unit 131 generates a two-dimensional evaluation value histogram for each of the 30 types of evaluation value pairs. Thereby, 30 two-dimensional evaluation value histograms are generated.

特徴量取得部１３１は、３０個の２次元評価値ヒストグラムを生成すると、当該３０個の２次元評価値ヒストグラムのそれぞれについて、当該２次元評価ヒストグラムにおける３４８１個のビンでの頻度のそれぞれを、抽出用処理対象画像に設定した基準検出枠内の画像についての第２特徴量とする。これにより、抽出用処理対象画像に設定された基準検出枠内の画像から、１０４４３０（＝３４８１×３０）個の第２特徴量が抽出される。 After generating the 30 two-dimensional evaluation value histograms, the feature amount acquisition unit 131 takes, for each of the 30 histograms, each of the frequencies in its 3481 bins as a second feature amount for the image within the reference detection frame set in the extraction processing target image. As a result, 104430 (= 3481 × 30) second feature amounts are extracted from the image within the reference detection frame set in the extraction processing target image.

特徴量取得部１３１は、抽出用処理対象画像に対して基準検出枠をラスタスキャンさせていく際に、基準検出枠の各位置において、上記のようにして、基準用評価値マップを用いて基準検出枠内の画像から１０４４３０個の第２特徴量を取得する。 When the reference detection frame is raster scanned with respect to the extraction processing target image, the feature amount acquisition unit 131 uses the reference evaluation value map as a reference at each position of the reference detection frame as described above. 104430 second feature values are acquired from the image within the detection frame.

特徴量取得部１３１では、非基準検出枠が使用される場合でも、同様にして第２特徴量が取得される。特徴量取得部１３１は、非基準検出枠に対応する抽出用サイズ変更画像のある位置にサイズ変更検出枠を設定した際には、当該非基準検出枠に対応する、評価値取得部１３０で生成された非基準用評価値マップに対して、当該ある位置と同じ位置に当該サイズ変更検出枠を設定する。そして、特徴量取得部１３１は、非基準用評価値マップに設定したサイズ変更検出枠内での評価値の共起頻度の分布を示す２次元評価値ヒストグラムを３０種類の評価値ペアのそれぞれについて生成する。 The feature amount acquisition unit 131 acquires the second feature amount in the same manner even when the non-reference detection frame is used. When the size change detection frame is set at a position of the extraction size change image corresponding to the non-reference detection frame, the feature amount acquisition unit 131 generates the evaluation value acquisition unit 130 corresponding to the non-reference detection frame. The size change detection frame is set at the same position as the certain position for the non-reference evaluation value map. Then, the feature amount acquisition unit 131 generates a two-dimensional evaluation value histogram indicating the distribution of co-occurrence frequencies of evaluation values within the size change detection frame set in the non-reference evaluation value map for each of the 30 evaluation value pairs. Generate.

特徴量取得部１３１は、３０個の２次元評価値ヒストグラムを生成すると、当該３０個の２次元評価値ヒストグラムのそれぞれについて、当該２次元評価値ヒストグラムにおける３４８１個のビンでの頻度のそれぞれを、抽出用サイズ変更画像に設定したサイズ変更検出枠内の画像についての第２特徴量とする。これにより、抽出用サイズ変更画像に設定されたサイズ変更検出枠内の画像から１０４４３０個の第２特徴量が抽出される。 When the feature quantity acquisition unit 131 generates 30 two-dimensional evaluation value histograms, for each of the 30 two-dimensional evaluation value histograms, the frequency of 3481 bins in the two-dimensional evaluation value histogram is calculated. The second feature amount of the image within the size change detection frame set in the size change image for extraction is used. As a result, 104430 second feature values are extracted from the image within the size change detection frame set in the size change image for extraction.

特徴量取得部１３１は、抽出用サイズ変更画像に対してサイズ変更検出枠をラスタスキャンさせていく際に、サイズ変更検出枠の各位置において、上記のようにして、非基準用評価値マップを用いてサイズ変更検出枠内の画像から１０４４３０個の第２特徴量を取得する。特徴量取得部１３１は、複数種類の非基準検出枠のそれぞれに関して、当該非基準検出枠に対応する抽出用サイズ変更画像に対して当該非基準検出枠に対応するサイズ変更検出枠をラスタスキャンさせながら、当該非基準検出枠に対応する非基準用評価値マップを用いて当該サイズ変更検出枠内の画像から１０４４３０個の第２特徴量を取得する。 The feature amount acquisition unit 131 rasterizes the non-reference evaluation value map as described above at each position of the size change detection frame when raster-scanning the size change detection frame with respect to the size change image for extraction. Using this, 104430 second feature values are acquired from the image within the size change detection frame. For each of a plurality of types of non-reference detection frames, the feature amount acquisition unit 131 raster-scans the size change detection frame corresponding to the non-reference detection frame with respect to the extracted size-changed image corresponding to the non-reference detection frame. However, 104430 second feature values are acquired from the image in the size change detection frame using the non-reference evaluation value map corresponding to the non-reference detection frame.

なお、第１特徴量と同様に、枠内画像から抽出された１０４４３０個の第２特徴量（２次元評価値ヒストグラムの１０４４３０個のビンでの頻度）のそれぞれを上記の式（４）を用いて正規化しても良い。これにより、第２特徴量が、撮像装置５での撮像環境の変化の影響を受けにくくなる。式（４）を用いて第２特徴量を正規化する場合には、ｖは正規化前の第２特徴量となり、Ｖは正規化後の第２特徴量となる。また、ｋは、枠内画像から抽出された第２特徴量の数となり、ｋ＝１０４４３０である。そして、ｖ（ｉ）は、ｋ個の第２特徴量に対して１番〜ｋ番までをそれぞれ付与した場合における、ｉ番の正規化前の第２特徴量となる。 As in the case of the first feature amount, each of the 104430 second feature amounts (frequency in 104430 bins of the two-dimensional evaluation value histogram) extracted from the in-frame image is expressed by the above equation (4). Normalization. As a result, the second feature amount is not easily affected by a change in the imaging environment of the imaging device 5. When the second feature value is normalized using Expression (4), v is the second feature value before normalization, and V is the second feature value after normalization. Further, k is the number of second feature values extracted from the in-frame image, and k = 104430. Then, v (i) is the second feature amount before normalization of the i-th number when the first to k-th numbers are assigned to the k second feature amounts.

＜識別器に入力する特徴ベクトルについて＞
特徴量取得部１３１は、枠内画像から、５９個の第１特徴量と１０４４３０個の第２特徴量を抽出すると、これらの１０４４８９（＝５９＋１０４４３０）個の特徴量を順番で並べて得られる特徴ベクトルを生成する。特徴ベクトルの次元数は“１０４４８９”となる。そして、特徴量取得部１３１は、生成した特徴ベクトルを識別器１４に入力する。識別器１４は、上述のように、入力された特徴ベクトルと、重みベクトルとに基づいて、枠内画像が顔画像である確からしさを示す検出確度値を算出する。識別器１４で使用される重みベクトルは、上記と同様にして学習サンプルから抽出された５９個の第１特徴量及び１０４４３０個の第２特徴量から成る特徴ベクトルに基づいて生成されている。 <About feature vectors input to the classifier>
When the feature quantity acquisition unit 131 extracts 59 first feature quantities and 104430 second feature quantities from the in-frame image, a feature vector obtained by arranging these 104489 (= 59 + 104430) feature quantities in order. Is generated. The number of dimensions of the feature vector is “104489”. Then, the feature quantity acquisition unit 131 inputs the generated feature vector to the classifier 14. As described above, the discriminator 14 calculates a detection accuracy value indicating the likelihood that the in-frame image is a face image based on the input feature vector and the weight vector. The weight vector used in the discriminator 14 is generated based on a feature vector composed of 59 first feature values and 104430 second feature values extracted from the learning sample in the same manner as described above.

このように、本実施の形態に係る特徴量抽出部１３は、枠内画像から、１０４４８９個の特徴量から成る特徴ベクトルを取得して識別器１４に入力する。 As described above, the feature quantity extraction unit 13 according to the present embodiment acquires a feature vector composed of 104489 feature quantities from the in-frame image and inputs the feature vector to the discriminator 14.

なお、uniformを使用して評価値がとり得る値が制限されない場合には、１次元評価値ヒストグラムでのビンの数は２５６個となり、２次元評価値ヒストグラムでのビンの数は１９６６０８０（＝２５６×２５６×３０）個となる。したがって、この場合の特徴ベクトルの次元数は１９６６３３６（＝２５６＋１９６６０８０）となる。 Note that when uniform is not used to restrict the values that an evaluation value can take, the one-dimensional evaluation value histogram has 256 bins, and the 30 two-dimensional evaluation value histograms have a total of 1966080 (= 256 × 256 × 30) bins. The number of dimensions of the feature vector in this case is therefore 1966336 (= 256 + 1966080).
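The dimension counts given above can be checked with a short calculation. The function name `feature_vector_dim` is hypothetical; the numbers 59, 256 and 30 are taken from the text.

```python
def feature_vector_dim(num_values, num_pair_types=30):
    """Return (first feature count, second feature count, total dimension)."""
    first = num_values                                 # bins of the 1-D histogram
    second = num_values * num_values * num_pair_types  # bins of all 2-D histograms
    return first, second, first + second


restricted = feature_vector_dim(59)     # evaluation values limited to 59 types
unrestricted = feature_vector_dim(256)  # all 256 values allowed
```

Restricting the evaluation values to 59 types reduces the feature vector from 1966336 dimensions to 104489 dimensions.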

＜出力値マップ生成処理＞
出力値マップ生成部１５は、検出部１２での検出結果に基づいて、顔画像としての確からしさ（顔画像らしさ）を示す検出確度値についての処理対象画像での分布を示す出力値マップを生成する。 <Output value map generation processing>
Based on the detection result of the detection unit 12, the output value map generation unit 15 generates an output value map indicating the distribution, over the processing target image, of the detection accuracy value, which indicates the likelihood of being a face image.

具体的には、出力値マップ生成部１５は、抽出用処理対象画像と同様に、行方向に（Ｍ−２）個の値が並び、列方向に（Ｎ−２）個の値が並ぶ、合計（（Ｍ−２）×（Ｎ−２））個の値から成るマップ６００を考える。そして、出力値マップ生成部１５は、処理対象画像についての一つの検出結果枠を対象検出結果枠とし、対象検出結果枠と同じ位置に、対象検出結果枠と同じ大きさの枠６１０をマップ６００に対して設定する。図３０は、マップ６００に対して枠６１０を設定した様子を示す図である。 Specifically, the output value map generation unit 15 arranges (M−2) values in the row direction and (N−2) values in the column direction, as in the extraction processing target image. Consider a map 600 consisting of a total of ((M−2) × (N−2)) values. Then, the output value map generation unit 15 sets one detection result frame for the processing target image as the target detection result frame, and maps the frame 610 having the same size as the target detection result frame to the map 600 at the same position as the target detection result frame. Set for. FIG. 30 is a diagram showing a state in which a frame 610 is set for the map 600.

次に出力値マップ生成部１５は、マップ６００における、枠６１０外の各値については“０”とし、枠６１０内の各値については、対象検出結果枠に対応する検出確度値（対象検出結果枠となった検出枠内の画像に対して顔画像の検出を行った結果得られた検出確度値）を用いて決定する。対象検出結果枠の大きさが、例えば１６ｐ×１６ｐであるとすると、枠６１０内には、行方向に１６個、列方向に１６個、合計２５６個の値が存在する。また、対象検出結果枠の大きさが、例えば２０ｐ×２０ｐであるとすると、枠６１０内には、行方向に２０個、列方向に２０個、合計４００個の値が存在する。図３１は、枠６１０内の各値を決定する方法を説明するための図である。 Next, the output value map generation unit 15 sets “0” for each value outside the frame 610 in the map 600, and for each value within the frame 610, a detection accuracy value (target detection result) corresponding to the target detection result frame. This is determined using a detection accuracy value obtained as a result of detecting a face image with respect to an image in the detection frame that is a frame. If the size of the object detection result frame is 16p × 16p, for example, there are 16 values in the frame 610, 16 in the row direction and 16 in the column direction, for a total of 256 values. If the size of the target detection result frame is, for example, 20p × 20p, there are a total of 400 values in the frame 610, 20 in the row direction and 20 in the column direction. FIG. 31 is a diagram for explaining a method of determining each value in the frame 610.

出力値マップ生成部１５は、枠６１０内の中心６１１の値を、検出部１２で求められた、対象検出結果枠に対応する検出確度値とする。そして、出力値マップ生成部１５は、枠６１０内のそれ以外の複数の値を、枠６１０の中心６１１の値を最大値とした正規分布曲線に従って枠６１０内の中心６１１から外側に向けて値が徐々に小さくなるようにする。これにより、マップ６００を構成する複数の値のそれぞれが決定されて、対象検出結果枠に対応するマップ６００が完成する。 The output value map generation unit 15 sets the value of the center 611 in the frame 610 as the detection accuracy value corresponding to the target detection result frame obtained by the detection unit 12. Then, the output value map generation unit 15 sets the other values in the frame 610 to the outside from the center 611 in the frame 610 according to the normal distribution curve with the value at the center 611 of the frame 610 as the maximum value. Is gradually reduced. Thereby, each of the plurality of values constituting the map 600 is determined, and the map 600 corresponding to the target detection result frame is completed.
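The construction of one map 600 can be sketched as follows. This is an illustration only, with several assumptions that the text does not fix: the spread `sigma` of the normal distribution curve is a hypothetical parameter (defaulting here to a quarter of the frame size), the center of an even-sized frame is taken as the element at index `size // 2`, the frame is assumed to lie entirely inside the map, and the name `frame_map` is illustrative.

```python
import numpy as np


def frame_map(map_h, map_w, top, left, size, score, sigma=None):
    """Build a map that is 0 outside the frame and falls off from
    `score` at the frame center following a Gaussian curve inside it."""
    if sigma is None:
        sigma = size / 4.0  # assumed spread; not specified in the text
    out = np.zeros((map_h, map_w))
    # Offsets of each element from the (assumed) center element of the frame.
    coords = np.arange(size) - size // 2
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    bump = score * np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    out[top:top + size, left:left + size] = bump
    return out


# A 16 x 16 detection result frame at (5, 10) with detection accuracy value 2.5.
m = frame_map(30, 40, top=5, left=10, size=16, score=2.5)
```

The center element of the frame holds the detection accuracy value itself, and the values decrease toward the edges of the frame.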

以上のようにして、出力値マップ生成部１５は、処理対象画像についての複数の検出結果枠にそれぞれ対応する複数のマップ６００を生成する。そして、出力値マップ生成部１５は、生成した複数のマップ６００を合成して出力値マップを生成する。 As described above, the output value map generation unit 15 generates a plurality of maps 600 respectively corresponding to a plurality of detection result frames for the processing target image. Then, the output value map generation unit 15 combines the plurality of generated maps 600 to generate an output value map.

具体的には、出力値マップ生成部１５は、生成した複数のマップ６００のｍ×ｎ番目の値を加算し、それによって得られた加算値を出力値マップのｍ×ｎ番目の検出確度値とする。出力値マップ生成部１５は、このようにして、出力値マップを構成する各検出確度値を求める。これにより、処理対象画像での検出確度値の分布を示す出力値マップが完成される。出力値マップは、抽出用処理対象画像と同様に、（（Ｍ−２）×（Ｎ−２））個の検出確度値で構成される。出力値マップを参照すれば、処理対象画像において顔画像らしさが高い領域を特定することができる。つまり、出力値マップを参照することによって、処理対象画像おける顔画像を特定することができる。 Specifically, the output value map generation unit 15 adds the m × n-th value of the plurality of generated maps 600, and uses the obtained addition value as the m × n-th detection accuracy value of the output value map. And In this way, the output value map generation unit 15 obtains each detection accuracy value constituting the output value map. Thereby, an output value map indicating the distribution of detection accuracy values in the processing target image is completed. The output value map is composed of ((M−2) × (N−2)) detection accuracy values, like the extraction target image. By referring to the output value map, it is possible to specify a region having a high likelihood of a face image in the processing target image. That is, the face image in the processing target image can be specified by referring to the output value map.
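The combination of the maps 600 into the output value map is an element-wise sum, which can be sketched as follows. The name `combine_maps` and the two small example maps are hypothetical; the maps are assumed to be NumPy arrays of identical shape.

```python
import numpy as np


def combine_maps(maps):
    """Add the values at the same position of every map."""
    return np.sum(np.stack(maps), axis=0)


# Two hypothetical maps whose frames overlap at position (1, 1).
a = np.zeros((4, 5))
a[0:2, 0:2] = 1.0
b = np.zeros((4, 5))
b[1:3, 1:3] = 0.5
output_value_map = combine_maps([a, b])
```

Where detection result frames overlap, their contributions accumulate, so regions covered by many high-scoring frames receive large detection accuracy values.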

FIG. 32 shows the output value map for the processing target image 20 superimposed on that image. The output value map in FIG. 32 is the one obtained when a negative LTP map is used as the evaluation value map. In FIG. 32 and in FIG. 33 described later, the detection accuracy values are divided into, for example, five levels from a first level to a fifth level for ease of understanding. In the output value maps shown in FIGS. 32 and 33, regions whose detection accuracy values belong to the fifth level, the largest, are shown with sand-pattern hatching, and regions belonging to the fourth level, the second largest, are shown with hatching that rises to the left. Regions belonging to the third level, the third largest, are shown with hatching that rises to the right, and regions belonging to the second level, the fourth largest, are shown with vertical-line hatching. Regions belonging to the first level, the smallest, are shown without hatching.

図３２に示される出力値マップにおいては、処理対象画像２０での顔画像に対応する領域での検出確度値が高くなっている。これは、処理対象画像２０に含まれる顔画像が適切に検出されていることを意味する。 In the output value map shown in FIG. 32, the detection accuracy value in the region corresponding to the face image in the processing target image 20 is high. This means that the face image included in the processing target image 20 is properly detected.

FIG. 33 shows, superimposed on the processing target image 20, the output value map obtained when, unlike in the present embodiment, only the first feature amounts (the frequency of each bin in the one-dimensional evaluation value map) are extracted from the in-frame image and a feature vector consisting only of the first feature amounts is input to the classifier 14. As in FIG. 32, the output value map in FIG. 33 is the one obtained when a negative LTP map is used as the evaluation value map.

As shown in FIG. 33, in the output value map obtained when face detection uses only the first feature amounts, many regions of the processing target image 20 that correspond to non-face images have large detection accuracy values. The possibility of erroneously detecting a face image is therefore higher.

このように、本実施の形態に係る画像検出装置１では、第１特徴量だけが使用されて顔検出が行わる場合と比較して、顔画像についての検出精度が高くなっている。 As described above, in the image detection apparatus 1 according to the present embodiment, the detection accuracy for the face image is higher than when the face detection is performed using only the first feature amount.

After generating the output value map, the image detection device 1 identifies the face image in the processing target image based on that map. Specifically, the image detection device 1 identifies a region of the output value map in which the detection accuracy value is equal to or greater than a threshold value, and regards the region at the same position in the processing target image as a face image. When displaying the processing target image on a display device, the image detection device 1 surrounds the face image in the processing target image with a rectangular frame or the like.
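The thresholding step described above can be sketched as follows. This is a hedged illustration only: `face_mask` and `bounding_box` are hypothetical names, and enclosing all above-threshold positions in a single rectangle is a simplification that assumes one face per image. The text does not specify how separate regions are grouped.

```python
def face_mask(output_map, threshold):
    """Mark positions whose detection accuracy value is >= threshold."""
    return [[value >= threshold for value in row] for row in output_map]

def bounding_box(mask):
    """Return (top, left, bottom, right) enclosing every marked position,
    or None if nothing is marked. Assumes a single face region."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, marked in enumerate(row) if marked]
    return (min(rows), min(cols), max(rows), max(cols))
```

The returned rectangle corresponds to the frame that would be drawn around the face image on the display device. Handling several faces would require grouping connected regions first, which is outside this sketch.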

The image detection device 1 may also compare a face image registered in advance with the face image identified in the processing target image and determine whether the two match. If they do not match, the image detection device 1 may apply mosaic processing to that face image in the processing target image before displaying the processing target image on the display device. Thus, when the image detection device 1 according to the present embodiment is used in a surveillance camera system, a neighbor's face captured by the surveillance camera can be made unrecognizable. That is, a privacy mask can be realized.

以上のように、本実施の形態に係る特徴量抽出装置１３（特徴量抽出部１３）では、注目画素値と複数の周囲画素値との関係を示す評価値についての共起頻度を特徴量としていることから、顔画像の検出等の画像検出に適切な特徴量を得ることができる。よって、本実施の形態のように、処理対象画像から検出対象画像を検出する画像検出装置１において特徴量抽出装置１３を使用し、特徴量抽出装置１３において画像から抽出された特徴量に基づいて、当該画像が検出対象画像である可能性が高いかを判定することによって、判定精度を向上することができる。したがって、検出対象画像についての誤検出を抑制することができる。つまり、検出対象画像についての検出精度が向上する。 As described above, in the feature quantity extraction device 13 (feature quantity extraction unit 13) according to the present embodiment, the co-occurrence frequency for the evaluation value indicating the relationship between the target pixel value and a plurality of surrounding pixel values is used as the feature quantity. Therefore, it is possible to obtain a feature amount suitable for image detection such as face image detection. Therefore, as in the present embodiment, the feature amount extraction device 13 is used in the image detection device 1 that detects the detection target image from the processing target image, and the feature amount extraction device 13 is based on the feature amount extracted from the image. The determination accuracy can be improved by determining whether or not the image is highly likely to be a detection target image. Therefore, erroneous detection of the detection target image can be suppressed. That is, the detection accuracy for the detection target image is improved.

Furthermore, detecting face images using feature amounts based on LBP, negative LTP, or positive LTP, which represent the local texture of an image, improves the detection accuracy for face images compared with using only HOG (Histograms of Oriented Gradients) feature amounts or Haar-like feature amounts.

また、本実施の形態では、第２特徴量を使用することによって検出対象画像についての検出精度が向上することから、処理量を低減するために第１及び第２特徴量の正規化を行わず、そのために第１及び第２特徴量が撮像環境の変化の影響を受けやすくなったとしても、検出対象画像についての検出精度を維持することができる。 In the present embodiment, since the detection accuracy of the detection target image is improved by using the second feature amount, the first and second feature amounts are not normalized in order to reduce the processing amount. For this reason, even if the first and second feature quantities are easily affected by changes in the imaging environment, the detection accuracy for the detection target image can be maintained.

なお、上記の例では、識別器１４に入力される特徴ベクトルには、第１特徴量と第２特徴量の両方が含まれていたが、第１特徴量が含まれていなくても良い。つまり、特徴ベクトルには、少なくとも第２特徴量（評価値の共起頻度）が含まれていれば良い。 In the above example, the feature vector input to the discriminator 14 includes both the first feature value and the second feature value. However, the first feature value may not be included. That is, the feature vector only needs to include at least the second feature amount (co-occurrence frequency of evaluation values).

The feature vector may also include at least two of the following three kinds of second feature amounts: second feature amounts acquired from the LBP map, second feature amounts acquired from the positive LTP map, and second feature amounts acquired from the negative LTP map. For example, the feature vector may include all three kinds, or it may include only the second feature amounts acquired from the positive LTP map and those acquired from the negative LTP map.

また、特徴ベクトルには、ＨＯＧ特徴量、Ｈａａｒ−ｌｉｋｅ特徴量などの他の種類の特徴量が含まれても良い。 In addition, the feature vector may include other types of feature amounts such as an HOG feature amount and a Haar-like feature amount.

<Various Modifications>
Various modifications of the present embodiment are described below.

<First Modification>
When extracting feature amounts from the in-frame image, the in-frame image may be divided into a plurality of blocks, feature amounts may be extracted individually from each block, and the feature amounts extracted for the plurality of blocks may be used as the feature amounts of the in-frame image. This allows independent feature amounts to be extracted for each of the blocks constituting the in-frame image. Therefore, even when part of a face is hidden, the face image of that face can be detected appropriately from the processing target image. This modification is described below, taking as an example the case where the in-frame image is divided into four blocks arranged in a matrix.

When the feature amount acquisition unit 131 sets a feature amount extraction frame 710 at a certain position in the extraction target image 700, as shown in FIG. 34, it divides the image within the feature amount extraction frame 710 (the in-frame image) into four image blocks 720 arranged in a matrix. Likewise, when the feature amount acquisition unit 131 sets the feature amount extraction frame 710 on the evaluation value map 800 at the same position as the frame set on the extraction target image 700, it divides the region within the frame into four evaluation value blocks 820 arranged in a matrix. Then, for each of the four evaluation value blocks 820, the feature amount acquisition unit 131 obtains a plurality of first feature amounts and a plurality of second feature amounts as described above, using the evaluation values included in that block. The number of first feature amounts and the number of second feature amounts depend on the number of evaluation values included in the evaluation value block 820. Hereinafter, the plurality of first feature amounts and the plurality of second feature amounts obtained for one evaluation value block 820 are collectively called a "feature amount group".

After obtaining the feature amount group of each evaluation value block 820, the feature amount acquisition unit 131 treats that feature amount group as the feature amounts of the image block 720 located at the same position as the evaluation value block 820. Feature amounts are thereby extracted independently from each of the four image blocks 720 constituting the in-frame image. After extracting the feature amounts from the four image blocks 720, the feature amount acquisition unit 131 inputs a vector composed of the feature amounts of the four image blocks 720 to the classifier 14 as the feature vector of the in-frame image. Based on the input feature vector and a weight vector, the classifier 14 calculates a detection accuracy value indicating the likelihood that the in-frame image is a face image.
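The block-wise extraction of the first modification can be sketched as below. This is an illustrative sketch under stated assumptions, not the embodiment's code: `block_features` and `frame_feature_vector` are hypothetical names, the region is split into a 2 × 2 arrangement of blocks, the first feature amounts are taken to be a histogram of evaluation values, and the second feature amounts are taken to be co-occurrence counts of evaluation value pairs at the displacements passed in `offsets` (the actual pixel-pair relations are defined elsewhere in the description).

```python
def block_features(block, num_values, offsets):
    """Feature amount group of one evaluation value block: a histogram of
    values followed by one co-occurrence histogram per displacement."""
    h, w = len(block), len(block[0])
    hist = [0] * num_values
    for row in block:
        for v in row:
            hist[v] += 1
    features = list(hist)
    for dy, dx in offsets:
        cooc = [0] * (num_values * num_values)
        for y in range(h):
            for x in range(w):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:  # pair must stay in the block
                    cooc[block[y][x] * num_values + block[yy][xx]] += 1
        features.extend(cooc)
    return features

def frame_feature_vector(region, num_values, offsets):
    """Split the in-frame region into 2 x 2 blocks and concatenate the
    feature amount groups of the four blocks."""
    h, w = len(region), len(region[0])
    hy, hx = h // 2, w // 2
    vector = []
    for y0, y1 in ((0, hy), (hy, h)):
        for x0, x1 in ((0, hx), (hx, w)):
            block = [row[x0:x1] for row in region[y0:y1]]
            vector.extend(block_features(block, num_values, offsets))
    return vector
```

Because pairs are counted only within a block, each block's part of the vector depends solely on that block, which is what lets an occluded block leave the other three unaffected.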

<Second Modification>
As described above, using the co-occurrence frequency of evaluation values as a feature amount suppresses erroneous detection of the detection target image. The detection accuracy for the detection target image can therefore be maintained even if the number of dimensions of the feature vector is reduced in order to reduce the processing amount.

In this modification, therefore, the evaluation value acquisition unit 130 does not use surrounding pixel values in the diagonal directions relative to the target pixel when obtaining an evaluation value. For example, the evaluation value acquisition unit 130 uses none of the surrounding pixel values in the upper-right, upper-left, lower-right, and lower-left directions. Only the surrounding pixel values in the upward, downward, rightward, and leftward directions are then used to obtain the evaluation value, so the evaluation value is expressed in 4 bits instead of 8 bits and its information content is reduced. The evaluation value can therefore take 16 values, 0 to 15, and the number of distinct values it can take is reduced. Note that the number of distinct values can be reduced by omitting at least one of the surrounding pixel values in the upper-right, upper-left, lower-right, and lower-left directions.

このように、評価値がとり得る値の種類の数が低減することによって、１次元評価値ヒストグラム及び２次元評価値ヒストグラムでのビンの数を低減することができる。よって、特徴量抽出部１３での特徴ベクトルの生成処理についての処理量を低減できるとともに、識別器１４での処理量を低減することができる。 Thus, the number of types of values that can be taken by the evaluation value is reduced, whereby the number of bins in the one-dimensional evaluation value histogram and the two-dimensional evaluation value histogram can be reduced. Therefore, it is possible to reduce the processing amount of the feature vector generation processing in the feature amount extraction unit 13 and to reduce the processing amount in the classifier 14.

また、評価値の取得で、斜め方向の周囲画素値が使用されない場合には、注目画素からの距離が“１”である当該斜め方向の周囲位置での画素値を画素値補間処理によって求める必要がないことから、評価値取得部１３０での処理量がさらに低減する。 Further, in the case of obtaining the evaluation value, when the surrounding pixel value in the oblique direction is not used, it is necessary to obtain the pixel value at the surrounding position in the oblique direction whose distance from the target pixel is “1” by the pixel value interpolation process. Therefore, the processing amount in the evaluation value acquisition unit 130 is further reduced.

When none of the surrounding pixel values in the upper-right, upper-left, lower-right, and lower-left directions is used to obtain the evaluation value, the evaluation value is expressed in 4 bits, so restricting the values it can take by means of uniform patterns, as described above, yields little benefit. In this case, therefore, uniform patterns are not used to restrict the values the evaluation value can take. That is, the evaluation value acquisition unit 130 uses the texture expression code composed of a plurality of bits (4 bits), or more precisely its decimal value, as the evaluation value regardless of the number of bit transitions observed when the bits are examined in order. Since the processing for restricting the values the evaluation value can take becomes unnecessary, the processing amount in the evaluation value acquisition unit 130 is reduced.

なお、評価値が４ビットで表現され、uniformを使用して評価値がとり得る値の種類が制限されない場合には、１次元評価値ヒストグラムのビンの数は１６個となり、２次元評価値ヒストグラムのビンの数は２５６（＝１６×１６）個となる。したがって、枠内画像から抽出される特徴ベクトルは、７６９６（＝１６＋２５６×３０）個の特徴量で構成され、７６９６次元となる。 If the evaluation value is expressed in 4 bits and the types of values that the evaluation value can take using uniform are not limited, the number of bins in the one-dimensional evaluation value histogram is 16, and the two-dimensional evaluation value histogram The number of bins is 256 (= 16 × 16). Therefore, the feature vector extracted from the in-frame image is composed of 7696 (= 16 + 256 × 30) feature amounts and has 7696 dimensions.
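The 4-bit evaluation value and the dimension count above can be checked with a short sketch. The names `four_neighbour_code` and `feature_dimension` are hypothetical, the bit order (up, right, down, left) is an assumption, and the per-bit rule used here is the one given in the claims for the first type of evaluation value (a bit is 1 when the surrounding pixel value is at most the target pixel value minus an offset). The factor 30 is taken from the formula above and is interpreted here as the number of two-dimensional histograms.

```python
def four_neighbour_code(img, y, x, offset):
    """4-bit evaluation value from the up, right, down and left neighbours.
    Bit order is an assumption; diagonal neighbours are not used."""
    centre = img[y][x]
    neighbours = (img[y - 1][x], img[y][x + 1], img[y + 1][x], img[y][x - 1])
    code = 0
    for bit, value in enumerate(neighbours):
        if value <= centre - offset:
            code |= 1 << bit
    return code  # always in the range 0..15

def feature_dimension(num_values, num_pair_relations):
    """Bins of the 1-D histogram plus bins of all 2-D histograms."""
    return num_values + num_values * num_values * num_pair_relations

# 16 possible values and 30 two-dimensional histograms give 7696 dimensions.
assert feature_dimension(16, 30) == 7696
```

For comparison, an unrestricted 8-bit code would give `feature_dimension(256, 30)`, which is far larger, so dropping the diagonal neighbours shrinks the feature vector substantially.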

<Third Modification>
As shown in FIG. 36, when the imaging environment is bright, such as in the daytime, the contrast between relatively dark parts of a person's face, such as the eyes, and their surroundings becomes distinct. Therefore, when the imaging environment is bright, detection accuracy can be improved by detecting the face image based on feature amounts acquired from a negative LTP map, which is composed of negative LTPs each indicating the distribution of pixels darker than the target pixel around that target pixel.

On the other hand, as shown in FIG. 37, when the imaging environment is dark, such as at night, the contrast between relatively bright parts of a person's face, such as the cheeks, and their surroundings becomes distinct. Therefore, when the imaging environment is dark, detection accuracy can be improved by detecting the face image based on feature amounts acquired from a positive LTP map, which is composed of positive LTPs each indicating the distribution of pixels brighter than the target pixel around that target pixel.

そこで、本変形例に係る画像処理システム５０では、画像検出装置１が、撮像装置５での撮像環境が明るい場合には、ネガティブＬＴＰマップを使用して顔画像の検出を行い、撮像装置５での撮像環境が暗い場合には、ポジティブＬＴＰマップを使用して顔画像の検出を行う。以下に、本変形例に係る画像処理システム５０について詳細に説明する。 Therefore, in the image processing system 50 according to this modification, when the imaging environment of the imaging device 5 is bright, the image detection device 1 detects a face image using a negative LTP map, and the imaging device 5 When the imaging environment is dark, the face image is detected using the positive LTP map. Hereinafter, the image processing system 50 according to this modification will be described in detail.

図３８は本変形例に係る画像処理システム５０の構成を示す図である。図３８に示されるように、本変形例に係る画像処理システム５０では、撮像装置５に照度センサー５ａが設けられている。照度センサー５ａは、撮像装置５での撮像環境の照度を検出し、検出した照度を示す検出信号を出力する。 FIG. 38 is a diagram showing a configuration of an image processing system 50 according to this modification. As shown in FIG. 38, in the image processing system 50 according to the present modification, the illuminance sensor 5a is provided in the imaging device 5. The illuminance sensor 5a detects the illuminance of the imaging environment in the imaging device 5, and outputs a detection signal indicating the detected illuminance.

FIG. 39 shows the functional blocks of the image detection device 1 according to this modification. As shown in FIG. 39, the image detection device 1 according to this modification includes a determination unit 16 that determines, based on the detection signal output from the illuminance sensor 5a, whether the imaging environment of the imaging device 5 is bright. The determination unit 16 refers to the detection signal output from the illuminance sensor 5a when the detection unit 12 performs detection processing on the processing target image. The determination unit 16 determines that the imaging environment is bright if the illuminance indicated by the detection signal is equal to or greater than a threshold value, and determines that the imaging environment is dark if the illuminance is less than the threshold value.

本変形例では、特徴量抽出部１３は、判定部１６において撮像環境が明るいと判定されると、ネガティブＬＴＰマップを生成する。そして、特徴量抽出部１３は、生成したネガティブＬＴＰマップを用いて特徴ベクトルを生成して識別器１４に入力する。一方で、特徴量抽出部１３は、判定部１６において撮像環境が暗いと判定されると、ポジティブＬＴＰマップを生成する。そして、特徴量抽出部１３は、生成したポジティブＬＴＰマップを用いて特徴ベクトルを生成して識別器１４に入力する。 In this modification, the feature quantity extraction unit 13 generates a negative LTP map when the determination unit 16 determines that the imaging environment is bright. Then, the feature quantity extraction unit 13 generates a feature vector using the generated negative LTP map and inputs it to the classifier 14. On the other hand, when the determination unit 16 determines that the imaging environment is dark, the feature amount extraction unit 13 generates a positive LTP map. Then, the feature quantity extraction unit 13 generates a feature vector using the generated positive LTP map and inputs it to the classifier 14.
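The switching rule of this modification amounts to a single comparison, sketched below. The function name `select_map_type`, the string labels, and the idea of passing the illuminance as a plain number are assumptions for illustration. The text specifies only that an illuminance at or above the threshold selects the negative LTP map and an illuminance below it selects the positive LTP map.

```python
def select_map_type(illuminance, threshold):
    """Choose the evaluation value map from the sensed illuminance:
    bright (>= threshold) -> negative LTP, dark (< threshold) -> positive LTP."""
    return "negative_ltp" if illuminance >= threshold else "positive_ltp"
```

Note that an illuminance exactly equal to the threshold counts as bright, matching the "equal to or greater than" wording used for the determination unit 16.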

このように、撮像環境が明るいか暗いかによって、使用する評価値マップの種類を切り替えることによって、顔画像についての検出精度がさらに向上する。 Thus, the detection accuracy for the face image is further improved by switching the type of the evaluation value map to be used depending on whether the imaging environment is bright or dark.

<Fourth Modification>
When the pixel value is luminance and is expressed in 8 bits, setting the offset value used in generating the LTP to “8” improves the detection accuracy for face images. The reason is explained below.

FIG. 40 shows a frequency curve 900 indicating the frequency distribution of luminance differences between adjacent pixels for face image samples, and a frequency curve 910 indicating the frequency distribution of luminance differences between adjacent pixels for non-face image samples. The horizontal axis in FIG. 40 indicates the values that the luminance difference between pixels can take. The vertical axis indicates the frequency of luminance differences having the value shown on the horizontal axis. The frequency curves 900 and 910 are obtained as follows.

まず、様々な複数枚の顔画像サンプルを用意する。次に、顔画像サンプルを構成する複数の画素のそれぞれについて、当該画素と、それに隣接する３つの画素（真下の画素、右下の画素、右の画素）のそれぞれとの間の輝度差を求める。この輝度差を求める処理を、用意した複数枚の顔画像サンプルのそれぞれについて行う。そして、複数枚の顔画像サンプルについて得られた複数の輝度差についての頻度分布を求める。次に、求めた頻度分布での各頻度を顔画像サンプルの枚数で除算し、１枚の顔画像サンプルについての平均的な頻度分布を生成する。その後、当該頻度分布を示すヒストグラムを生成する。頻度曲線９００は、生成したヒストグラムの複数のビンの頂点を曲線で結んだものである。 First, various face image samples are prepared. Next, for each of a plurality of pixels constituting the face image sample, a luminance difference between the pixel and each of three adjacent pixels (a pixel immediately below, a pixel on the lower right, and a pixel on the right) is obtained. . The process for obtaining the luminance difference is performed for each of the prepared face image samples. Then, a frequency distribution for a plurality of luminance differences obtained for a plurality of face image samples is obtained. Next, each frequency in the obtained frequency distribution is divided by the number of face image samples to generate an average frequency distribution for one face image sample. Thereafter, a histogram indicating the frequency distribution is generated. The frequency curve 900 is obtained by connecting vertices of a plurality of bins of a generated histogram with a curve.

また、様々な複数枚の非顔画像サンプルを用意する。そして、非顔画像サンプルを構成する複数の画素のそれぞれについて、当該画素と、それに隣接する３つの画素（真下の画素、右下の画素、右の画素）のそれぞれとの間の輝度差を求める。この輝度差を求める処理を、用意した複数枚の非顔画像サンプルのそれぞれについて行う。そして、複数枚の非顔画像サンプルについて得られた複数の輝度差についての頻度分布を求める。次に、求めた頻度分布での各頻度を非顔画像サンプルの枚数で除算し、１枚の非顔画像サンプルについての平均的な頻度分布を生成する。その後、当該頻度分布を示すヒストグラムを生成する。頻度曲線９１０は、生成したヒストグラムの複数のビンの頂点を曲線で結んだものである。 Also, various non-face image samples are prepared. Then, for each of a plurality of pixels constituting the non-face image sample, a luminance difference between the pixel and each of three adjacent pixels (a pixel immediately below, a pixel on the lower right, and a pixel on the right) is obtained. . The process for obtaining the luminance difference is performed for each of the prepared non-face image samples. Then, a frequency distribution for a plurality of luminance differences obtained for a plurality of non-face image samples is obtained. Next, each frequency in the obtained frequency distribution is divided by the number of non-face image samples to generate an average frequency distribution for one non-face image sample. Thereafter, a histogram indicating the frequency distribution is generated. The frequency curve 910 is obtained by connecting the vertices of a plurality of bins of the generated histogram with a curve.
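The procedure for obtaining the averaged frequency distributions behind curves 900 and 910 can be sketched as follows. This is a minimal illustration: `luminance_difference_histogram` is a hypothetical name, images are assumed to be lists of rows of 8-bit luminance values, and the absolute value of the difference is used, which is an assumption since the text does not state how the sign is handled.

```python
def luminance_difference_histogram(images, max_value=255):
    """Average, per image, the counts of luminance differences between each
    pixel and its lower, lower-right and right neighbours."""
    hist = [0.0] * (max_value + 1)
    for img in images:
        h, w = len(img), len(img[0])
        for y in range(h):
            for x in range(w):
                for dy, dx in ((1, 0), (1, 1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if yy < h and xx < w:  # skip neighbours outside the image
                        hist[abs(img[y][x] - img[yy][xx])] += 1
    # Divide by the number of samples to get the distribution for one image.
    return [count / len(images) for count in hist]
```

Running this once on the face image samples and once on the non-face image samples yields the two distributions whose bin tops are joined to form the curves.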

図４０に示されるように、非顔画像サンプルについての隣接画素間の輝度差の頻度分布を示す頻度曲線９１０は、全体的に正規分布曲線に近い形を成している。これは、非顔画像については、隣接画素間の輝度差に何ら特徴が見られないからであり、輝度差が２〜３あたりでピークとなっているのはノイズの影響である。 As shown in FIG. 40, the frequency curve 910 indicating the frequency distribution of the luminance difference between adjacent pixels for the non-face image sample has a shape close to a normal distribution curve as a whole. This is because no feature is seen in the luminance difference between adjacent pixels for the non-face image, and the fact that the luminance difference peaks around 2-3 is due to the influence of noise.

On the other hand, the frequency curve 900, which indicates the frequency distribution of luminance differences between adjacent pixels for face image samples, is close to a normal distribution curve and similar in shape to the frequency curve 910 where the luminance difference is less than 8. Where the luminance difference is 8 or more, however, it departs from a normal distribution curve and is not similar in shape to the frequency curve 910. Because luminance differences between adjacent pixels in face images have characteristic features, the frequency curve 900 would not inherently be a normal distribution curve, but noise presumably makes it close to one where the luminance difference is less than 8.

If the frequency curves 900 and 910 were both normal distribution curves, they would not intersect each other. As shown in FIG. 40, however, the frequency curves 900 and 910 intersect at a luminance difference of 8. It can therefore be concluded that the features of the face image appear at luminance differences of 8 or more, causing the shape of the frequency curve 900 to depart from a normal distribution curve.

このように、顔画像サンプルについての頻度曲線９００と、非顔画像サンプルについての頻度曲線９１０とを比較すると、隣接画素間の輝度差が８以上となれば、両者は非相似形となっている。このことから、隣接画素間の輝度差が８以上の場合に、ノイズの影響が小さくなって、顔画像の特徴が現れると考えることができる。 As described above, when the frequency curve 900 for the face image sample and the frequency curve 910 for the non-face image sample are compared, if the luminance difference between adjacent pixels is 8 or more, they are non-similar. . From this, it can be considered that when the luminance difference between adjacent pixels is 8 or more, the influence of noise is reduced and the feature of the face image appears.

そこで、ネガティブＬＴＰマップあるいはポジティブＬＴＰマップを使用して特徴量を抽出する際には、ＬＴＰを生成する際のオフセット値を“８”に設定する。これにより、ネガティブＬＴＰあるいはポジティブＬＴＰは、顔画像の特徴を適切に表すことができ、ネガティブＬＴＰマップあるいはポジティブＬＴＰマップを使用して抽出された特徴量に基づいて顔画像の識別を行うことによって、顔画像についての検出精度が向上する。 Therefore, when extracting the feature amount using the negative LTP map or the positive LTP map, the offset value when generating the LTP is set to “8”. Thereby, the negative LTP or the positive LTP can appropriately represent the feature of the face image, and by identifying the face image based on the feature amount extracted using the negative LTP map or the positive LTP map, The detection accuracy for the face image is improved.
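The effect of the offset value can be made concrete with a sketch of the per-pixel code computation. It follows the bit definitions given in the claims: a negative LTP bit is 1 when the surrounding pixel value is at most the target pixel value minus the offset, and a positive LTP bit is 1 when it is at least the target pixel value plus the offset. The function name `ltp_codes`, the clockwise bit order, and the direct use of the eight pixels of the 3 × 3 neighbourhood (without the interpolation mentioned for diagonal positions) are assumptions.

```python
OFFSET = 8  # offset value for 8-bit luminance, as set in this modification

# Eight neighbours, clockwise from the upper left (bit order is an assumption).
NEIGHBOURS = ((-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1))

def ltp_codes(img, y, x, offset=OFFSET):
    """Return (negative_ltp, positive_ltp) codes for the pixel at (y, x)."""
    centre = img[y][x]
    negative = positive = 0
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        value = img[y + dy][x + dx]
        if value <= centre - offset:   # clearly darker than the target pixel
            negative |= 1 << bit
        if value >= centre + offset:   # clearly brighter than the target pixel
            positive |= 1 << bit
    return negative, positive
```

With an offset of 8, neighbours that differ from the target pixel by 7 or less set no bit in either code, so the small differences attributed to noise above do not affect the evaluation values.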

上記において画像処理システム５０は詳細に説明されたが、上記した説明は、全ての局面において例示であって、この発明がそれに限定されるものではない。また、上述した各種の例は、相互に矛盾しない限り組み合わせて適用可能である。そして、例示されていない無数の変形例が、この発明の範囲から外れることなく想定され得るものと解される。 Although the image processing system 50 has been described in detail above, the above description is illustrative in all aspects, and the present invention is not limited thereto. The various examples described above can be applied in combination as long as they do not contradict each other. And it is understood that the countless modification which is not illustrated can be assumed without deviating from the scope of the present invention.

DESCRIPTION OF SYMBOLS
1 Image detection device
4 Control program
13 Feature amount extraction device (feature amount extraction unit)
130 Evaluation value acquisition unit
131 Feature amount acquisition unit

Claims

An image detection device that detects a detection target image from a processing target image, comprising:
an evaluation value acquisition unit that takes each of a plurality of pixels included in an image included in the processing target image as a target pixel and obtains a plurality of types of evaluation values each indicating a relationship between the pixel value of the target pixel and a plurality of surrounding pixel values;
a feature amount acquisition unit that obtains co-occurrence frequencies of the evaluation values obtained for the plurality of pixels included in the image and uses the co-occurrence frequencies as feature amounts; and
a classifier that determines, based on the feature amounts, whether the image is likely to be the detection target image,
wherein the image detection device selects, from among the plurality of types of evaluation values and according to the brightness of the imaging environment of the processing target image, the evaluation value used to acquire the feature amounts used by the classifier.

The image detection device according to claim 1, wherein
the plurality of types of evaluation values include a first type of evaluation value,
the evaluation value acquisition unit generates, for each of the plurality of surrounding pixel values used to obtain the first type of evaluation value, one bit indicating the relationship between that surrounding pixel value and the pixel value of the target pixel, and uses a binary code composed of the bits generated for the plurality of surrounding pixel values as the first type of evaluation value, and
each bit of the first type of evaluation value indicates "1" if the surrounding pixel value is equal to or less than the value obtained by subtracting a predetermined amount from the pixel value of the target pixel, and indicates "0" if the surrounding pixel value is greater than that value.

The image detection device according to claim 1 or 2, wherein
the plurality of types of evaluation values include a second type of evaluation value,
the evaluation value acquisition unit generates, for each of the plurality of surrounding pixel values used to obtain the second type of evaluation value, one bit indicating the relationship between that surrounding pixel value and the pixel value of the target pixel, and uses a binary code composed of the bits generated for the plurality of surrounding pixel values as the second type of evaluation value, and
each bit of the second type of evaluation value indicates "1" if the surrounding pixel value is equal to or greater than the value obtained by adding a predetermined amount to the pixel value of the target pixel, and indicates "0" if the surrounding pixel value is less than that value.

The image detection device according to any one of claims 1 to 3,
wherein the feature amount acquisition unit uses, as the feature amount, the co-occurrence frequencies for a certain type of evaluation value included in the plurality of types of evaluation values without normalizing them.

The image detection device according to any one of claims 1 to 3,
wherein the evaluation value acquisition unit does not use surrounding pixel values in oblique directions relative to the target pixel when obtaining a certain type of evaluation value included in the plurality of types of evaluation values.

The image detection device according to claim 5,
wherein, when obtaining the certain type of evaluation value, the evaluation value acquisition unit uses none of the surrounding pixel values in the upper-right, upper-left, lower-right, and lower-left directions relative to the target pixel.

The image detection device according to claim 6,
wherein the evaluation value acquisition unit generates, for each of the plurality of surrounding pixel values used to obtain the certain type of evaluation value, one bit indicating the relationship between that surrounding pixel value and the pixel value of the target pixel, and uses the binary code composed of the bits generated for the plurality of surrounding pixel values as the evaluation value of that type, regardless of the number of bit transitions observed when the bits are read in order.

The image detection device according to any one of claims 1 to 7,
wherein the detection target image is a human face image.

A control program for controlling an image detection device that detects a detection target image from a processing target image, the control program causing the image detection device to execute:
(a) a step of taking each of a plurality of pixels included in an image included in the processing target image as a target pixel and obtaining a plurality of types of evaluation values, each indicating a relationship between the pixel value of the target pixel and a plurality of surrounding pixel values;
(b) a step of obtaining co-occurrence frequencies of the evaluation values obtained for the plurality of pixels included in the image and using the co-occurrence frequencies as a feature amount; and
(c) a step of determining, based on the feature amount, whether the image is likely to be the detection target image,
wherein the evaluation value used to obtain the feature amount used in step (c) is selected from among the plurality of types of evaluation values according to the brightness of the imaging environment of the processing target image.

An image detection method for detecting a detection target image from a processing target image, comprising:
(a) a step of taking each of a plurality of pixels included in an image included in the processing target image as a target pixel and obtaining a plurality of types of evaluation values, each indicating a relationship between the pixel value of the target pixel and a plurality of surrounding pixel values;
(b) a step of obtaining co-occurrence frequencies of the evaluation values obtained for the plurality of pixels included in the image and using the co-occurrence frequencies as a feature amount; and
(c) a step of determining, based on the feature amount, whether the image is likely to be the detection target image,
wherein the evaluation value used to obtain the feature amount used in step (c) is selected from among the plurality of types of evaluation values according to the brightness of the imaging environment of the processing target image.
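Steps (b) and the brightness-dependent selection recited above can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names, the single pixel displacement used to pair evaluation values, the brightness threshold, and the mapping from brightness to evaluation value type are all hypothetical and are supplied by the caller. The co-occurrence frequencies are returned as raw counts, without normalization.

```python
from collections import Counter


def cooccurrence_frequencies(code_map, displacement=(0, 1)):
    """Count pairs of evaluation values at pixels separated by `displacement`.

    `code_map` is a list of rows of integer evaluation values.
    `displacement` is a hypothetical (dy, dx) pairing rule for this sketch.
    """
    dy, dx = displacement
    height, width = len(code_map), len(code_map[0])
    counts = Counter()
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            # Skip pairs whose second pixel falls outside the map.
            if 0 <= y2 < height and 0 <= x2 < width:
                counts[(code_map[y][x], code_map[y2][x2])] += 1
    return counts


def select_evaluation_map(maps_by_type, brightness, threshold,
                          dark_type, bright_type):
    """Pick one evaluation value map according to imaging brightness.

    Which type suits which brightness is decided by the caller; the
    threshold and type names are assumptions of this sketch.
    """
    chosen = dark_type if brightness < threshold else bright_type
    return maps_by_type[chosen]
```

A caller would first build one map per evaluation value type, select a map with `select_evaluation_map`, and then pass the resulting counts from `cooccurrence_frequencies` to the classifier as the feature amount.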