JP2012243206A

JP2012243206A - Image processing method, image processor and image processing program

Info

Publication number: JP2012243206A
Application number: JP2011114836A
Authority: JP
Inventors: Naoki Ito; Tetsuya Kinebuchi; Hiroyuki Arai; Akira Suzuki; Isamu Igarashi; Hideki Koike
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-05-23
Filing date: 2011-05-23
Publication date: 2012-12-10
Anticipated expiration: 2031-05-23
Also published as: JP5530399B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor that achieves fast face detection. SOLUTION: The image processing method includes: a step of generating a smoothed image from an acquired image; a step of calculating the mean value of each pixel from the images of a plurality of frames up to and including the processing target image, and calculating a variance value from the calculated pixel means and the pixel values of each frame; a step of applying further smoothing to the luminance values of the smoothed image of the processing target image to generate a normalization image, and normalizing each pixel value of the generated variance value image by the value of the corresponding pixel of the normalization image to generate a normalized variance value image; a step of applying dilation processing to the normalized variance value image to generate a corrected variance value image; a step of applying arbitrary threshold processing to each pixel value of the corrected variance value image to generate a binary image and thereby detect a person area; and a step of executing face detection processing on the pixels whose variance value is determined to be larger than a predetermined value.

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program that, when detecting a person's face from time-series images captured by an imaging device, limit the detection range by detecting the person's movement and thereby detect the face at high speed.

Conventionally, the method described in Non-Patent Document 1 is known as a method for detecting faces in images and video, and the method described in Non-Patent Document 2 is known as a method for estimating the pose of a detected face. Using these methods, the degree of attention paid to digital signage can be measured as an objective index of the advertising effectiveness of digital signage, which has become widespread in recent years.

Non-Patent Document 1: Yuji Mita, Toshimitsu Kaneko, Osamu Hori, "Probabilistic Incremental Code Correlation Suitable for Image Matching of Objects with Individual Differences", IEICE Transactions D-II, Vol. J88-D-II, No. 8, pp. 1614-1623, The Institute of Electronics, Information and Communication Engineers, 2005
Non-Patent Document 2: Shingo Ando, Yoshinori Kusachi, Akira Suzuki, Kenichi Arakawa, "Pose Estimation Method for 3D Objects Using Support Vector Regression", IEICE Transactions D, Vol. J89-D, No. 8, pp. 1840-1847, The Institute of Electronics, Information and Communication Engineers, 2006

However, general face detection methods search the entire image or video for faces, so when image processing techniques that use sophisticated features are employed to suppress false and excessive detections, processing speed drops.

The present invention has been made in view of these circumstances, and its object is to provide an image processing apparatus, an image processing method, and an image processing program that can achieve high-speed face detection while suppressing false and excessive detections by limiting the range in which a person's face is searched for.

The present invention is an image processing method in an image processing apparatus that comprises, in order to perform face detection on an acquired image, a smoothed image generation unit, a variance value image generation unit, a variance value image normalization unit, a dilation/erosion processing unit, a person area detection unit, and a face detection processing unit, the method comprising: a smoothing step in which the smoothed image generation unit performs smoothing processing on the acquired image to generate a smoothed image; a variance value image generation step in which the variance value image generation unit calculates the average value of each pixel from the images of a plurality of frames up to and including the processing target image, and calculates a variance value from the calculated average value of each pixel and the corresponding pixel value of each frame; a normalized image generation step in which the variance value image normalization unit performs further smoothing processing on the luminance values of the smoothed image of the processing target image to generate a normalization image, and normalizes each pixel value of the generated variance value image using the value of the corresponding pixel of the normalization image to generate a normalized variance value image; a dilation/erosion step in which the dilation/erosion processing unit performs dilation processing on the normalized variance value image to generate a corrected variance value image; a person area detection step in which the person area detection unit performs arbitrary threshold processing on each pixel value of the corrected variance value image to generate a binary image and thereby detects a person area; and a face detection step in which the face detection processing unit performs face detection processing on pixels whose variance value is determined to be larger than a predetermined value.

The present invention further comprises a variance value table generation step of generating, from the variance values of the pixels produced in the variance value image generation step, a variance value table that holds the minimum variance value for each luminance value of the processing target image, wherein, in the person area detection step, a threshold for each pixel is calculated automatically from the variance value table generated in the variance value table generation step and from each pixel value of the normalization image generated in the normalized image generation step, and threshold processing is performed with that threshold.

The present invention further comprises a face detection position holding step of holding the face detection position found in the face detection step for the image of the immediately preceding frame, wherein, taking the held face detection position as a reference, face detection in the face detection step gives priority to the direction in which the pixel values of the corrected variance value image generated in the dilation/erosion step are higher, and face detection is performed around the corresponding pixel position even when the variance value around that position has been determined in the person area detection step to be equal to or less than a certain value.

The present invention is an image processing apparatus that performs face detection on an acquired image, comprising: smoothing means for performing smoothing processing on the acquired image to generate a smoothed image; variance value image generation means for calculating the average value of each pixel from the images of a plurality of frames up to and including the processing target image, and calculating a variance value from the calculated average value of each pixel and the corresponding pixel value of each frame; normalized image generation means for performing further smoothing processing on the luminance values of the smoothed image of the processing target image to generate a normalization image, and normalizing each pixel value of the generated variance value image using the value of the corresponding pixel of the normalization image to generate a normalized variance value image; dilation/erosion means for performing dilation processing on the normalized variance value image to generate a corrected variance value image; person area detection means for performing arbitrary threshold processing on each pixel value of the corrected variance value image to generate a binary image and thereby detecting a person area; and face detection means for performing face detection processing on pixels whose variance value is determined to be larger than a predetermined value.

The present invention is an image processing program that causes a computer on an image processing apparatus that performs face detection on an acquired image to perform image processing, the program causing the computer to execute: a smoothing step of performing smoothing processing on the acquired image to generate a smoothed image; a variance value image generation step of calculating the average value of each pixel from the images of a plurality of frames up to and including the processing target image, and calculating a variance value from the calculated average value of each pixel and the corresponding pixel value of each frame; a normalized image generation step of performing further smoothing processing on the luminance values of the smoothed image of the processing target image to generate a normalization image, and normalizing each pixel value of the generated variance value image using the value of the corresponding pixel of the normalization image to generate a normalized variance value image; a dilation/erosion step of performing dilation processing on the normalized variance value image to generate a corrected variance value image; a person area detection step of performing arbitrary threshold processing on each pixel value of the corrected variance value image to generate a binary image and thereby detecting a person area; and a face detection step of performing face detection processing on pixels whose variance value is determined to be larger than a predetermined value.

According to the present invention, limiting the person area on the basis of the person's motion information limits the area in which the prior-art face detection and pose estimation are performed, which enables high-speed detection. In addition, noise in low-luminance regions is reduced by smoothing, the variance value that represents the amount of motion is normalized by the spatial luminance values of the whole image, and the threshold used for person area detection is set automatically from a table of variance values per luminance value; together these reduce excessive detection of person areas caused by noise and missed detections in low-luminance regions. Furthermore, prioritizing search areas according to the amount of the person's movement enables a faster search, and searching for the face around the face position detected in the previous frame makes it possible to detect the face of a stationary person who shows little motion.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the apparatus shown in FIG. 1.
FIG. 3 is a flowchart showing the processing operation of the apparatus shown in FIG. 1.
FIG. 4 is a flowchart showing the processing operation of the apparatus shown in FIG. 1.
FIG. 5 is an explanatory diagram showing an example of a plurality of frames and the variance value.
FIG. 6 is an explanatory diagram showing an example of dilation and erosion by morphological operations.

First, the principle of the image processing is described. To limit the area in which a person's face is searched for, areas that are likely to contain a person are detected in the image or video beforehand. The temporal change of each pixel of the target image is used as the measure of person-likeness. To measure this temporal change, a plurality of consecutive images including the image to be processed is selected from the time-series images, and the amount of change of each pixel is obtained by calculating the variance of the same pixel across those frames.

FIG. 5 shows a conceptual diagram of the calculation of the amount of change. As shown in FIG. 5, the variance value is large in regions with movement, whereas in regions without movement the pixel values change little and the variance value is very small. The variance value is calculated from a plurality of images including the image to be processed. In the example of FIG. 5, the image to be processed is the N-th image, and the variance value is calculated from the images going back from the N-th image to the M-th past image. If only a few images are used, the contribution of each single image becomes large and the variance value fluctuates strongly in response to momentary changes. As a result, the variance value can fluctuate strongly even in regions that actually change little.
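The per-pixel variance over a window of frames can be sketched as follows. This is only an illustration under stated assumptions: frames are plain nested lists, the function name `temporal_variance` is hypothetical, and the variance is taken as the population variance (division by the number of frames), which the text does not specify.

```python
def temporal_variance(frames):
    """Per-pixel mean and variance across a list of equally sized frames.

    frames: list of 2-D lists (rows of pixel values).
    Assumption: population variance, i.e. division by the number of frames.
    """
    m = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[y][x] for f in frames) / m for x in range(width)]
            for y in range(height)]
    var = [[sum((f[y][x] - mean[y][x]) ** 2 for f in frames) / m
            for x in range(width)]
           for y in range(height)]
    return mean, var


# One row, two pixels, four frames: the left pixel is static, the right one changes.
frames = [[[10, 10]], [[10, 30]], [[10, 10]], [[10, 30]]]
mean, var = temporal_variance(frames)
print(var)  # the static pixel has zero variance, the changing pixel does not
```

With only two or three frames in the list, a single outlier frame would dominate the result, which is the sensitivity to momentary changes described above.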

For this reason, a sufficiently large number of images is used to calculate the variance value. When pixel luminance is low because of the shooting environment or the camera settings, noise introduced at capture time has a relatively large influence compared with the actual change in pixel value. Spatial smoothing is therefore applied to the whole image to reduce changes in pixel value caused by noise. A further problem is that the measured amount of change becomes small in regions with small pixel values, for example when brightness varies spatially within one image because of shadows or backlighting, or when an object in the image has low reflectance and its pixel values are small. Therefore, the luminance of all pixels is levelled using a luminance image obtained by spatially smoothing the luminance values of the image, and the variance value is evaluated on that basis, so that minute changes can be measured even in regions with small pixel values.

When a person stays in the same place for a long time, the region around the person's outline moves as the person sways, so the pixel values there change considerably and the variance value is large. Near the center of the person, however, where pixel values vary little spatially, the variance value is small. The dilation and erosion of the morphological operation shown in FIG. 6 are therefore applied to the calculated variance values to fill the hole near the center of the person where the variance value is small. Dilation gives the center of the person, and also the region just outside the person, the same large variance value as the pixels around the person's outline. Erosion then returns only the region outside the person to the small variance values found further out. In this way, a variance value image is generated that has large variance values in the person area and small variance values elsewhere.

To extract the person area using the pixel values of this variance value image as person-likeness, threshold processing must be applied to the person-likeness values. To further reduce the influence of noise from the imaging device mentioned above, the minimum variance value is tabulated for each pixel luminance value observed during measurement, and the threshold is calculated from the tabulated minimum variance value, so that a threshold appropriate to the luminance value is set dynamically. In addition, because pixel values change more in the person area than elsewhere, the search areas for face detection are prioritized according to the magnitude of each pixel value of the variance value image, which speeds up the search. Furthermore, because the amount of motion decreases when a person stands still, face detection is performed around the position at which a face was detected in the previous frame even when the amount of motion has become extremely small, which reduces missed face detections.

<First Embodiment>
An image processing apparatus according to a first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. In FIG. 1, reference numeral 1 denotes a camera serving as an image input device. Reference numeral 2 denotes a smoothed image generation unit that spatially smooths the images acquired by the camera 1. Reference numeral 3 denotes an image storage unit that stores the images generated by the smoothed image generation unit 2. Reference numeral 4 denotes a variance value image generation unit that calculates the average value of each pixel of the images smoothed by the smoothed image generation unit 2 and generates a variance value image. Reference numeral 5 denotes a variance value image normalization unit that smooths the luminance values of the smoothed image generated by the smoothed image generation unit 2 and normalizes the variance value of each pixel calculated by the variance value image generation unit 4 according to the smoothed luminance values.

Reference numeral 6 denotes a dilation/erosion processing unit that applies dilation and erosion to the variance values normalized by the variance value image normalization unit 5. Reference numeral 7 denotes a person area detection unit that detects a person area by applying threshold processing to the normalized variance values after dilation and erosion by the dilation/erosion processing unit 6. Reference numeral 8 denotes a face detection processing unit that performs face detection processing on the person area detected by the person area detection unit 7. The processing in the smoothed image generation unit 2, the variance value image generation unit 4, the variance value image normalization unit 5, the dilation/erosion processing unit 6, the person area detection unit 7, and the face detection processing unit 8 is executed, for example, by a computer or by hardware inside the camera. The image storage unit 3 may use a recording medium such as a hard disk, a RAID device, or a CD-ROM, or may use a remote data resource over a network.

Next, the processing operation of the image processing apparatus shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a flowchart showing the processing operation of the image processing apparatus shown in FIG. 1. The overall flow is described first. The smoothed image generation unit 2 smooths the image captured by the camera 1 and stores it in the image storage unit 3 (step S1). Next, the variance value image generation unit 4 and the variance value image normalization unit 5 read the images stored in the image storage unit 3 (step S2). The variance value image generation unit 4 then calculates the average value of each pixel from the smoothed images of a plurality of frames (step S3) and calculates the variance value of each pixel (step S4). The variance value image generation unit 4 determines whether all pixels have been processed (step S5) and, if unprocessed pixels remain, processes them; the variance value image is generated in this way.

Next, the variance value image normalization unit 5 generates a normalization luminance image by smoothing the luminance values of the smoothed image of the processing target image (step S6), and normalizes the variance value image generated by the variance value image generation unit 4 using the normalization luminance image generated in step S6 (step S7). The dilation/erosion processing unit 6 then applies dilation and erosion to the normalized variance value image (step S8). The person area detection unit 7 applies threshold processing to the variance value image after dilation and erosion (step S9), thereby detecting the person area, and outputs the resulting person area image (step S10). The face detection processing unit 8 detects a person's face from the person area image (step S11).

Next, the operation of each process is described in detail. An image acquired via the camera 1 is smoothed in the smoothed image generation unit 2. The smoothing uses a filter commonly used in image processing, such as a Gaussian filter. The smoothed image is stored in the image storage unit 3, and the variance value image generation unit 4 acquires from the image storage unit 3 M consecutive images including the processing target image. Here, M is an arbitrary, sufficiently large number. Next, for each pixel of the acquired M images, the average pixel value Ī(x, y) (I with an overbar) at coordinates (x, y) is calculated by Expression (1). Here, k is the frame number, with the processing target image as frame 0 and earlier frames numbered back to the M-th past frame, and the quantity averaged is the pixel value at coordinates (x, y) in frame k.

After the average value of each pixel has been calculated, the variance value is calculated by Expression (2).
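Expressions (1) and (2) are not reproduced in this text. Under the assumptions that the M frames are indexed k = 0, ..., M-1, that I_k(x, y) denotes the pixel value at (x, y) in frame k, and that the variance is the population variance, they would take roughly the following form. The symbol I_k and the index range are assumptions made here for illustration, not the original notation.

```latex
\bar{I}(x, y) = \frac{1}{M} \sum_{k=0}^{M-1} I_k(x, y) \qquad (1)

S(x, y) = \frac{1}{M} \sum_{k=0}^{M-1} \bigl( I_k(x, y) - \bar{I}(x, y) \bigr)^2 \qquad (2)
```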

After the same processing has been performed for all pixels, the processing target image is acquired from the image storage unit 3, and its luminance values are smoothed. As in the smoothed image generation unit, this smoothing uses a filter such as a Gaussian filter. The variance value is then normalized by Expression (3), which divides the value of the variance value image S(x, y) by the smoothed luminance value of the processing target image. Here, G(x, y) is the pixel value of the processing target image after this smoothing (the normalization image), and S'(x, y) is the variance value after normalization.
S'(x, y) = S(x, y) / G(x, y) ... (3)
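The normalization of Expression (3) can be sketched as below. This is an illustration only: a 3x3 box filter stands in for the Gaussian filter, and the names `box_smooth`, `normalize_variance` and the floor `eps`, which guards against division by zero in black regions, are hypothetical and not part of the source.

```python
def box_smooth(img):
    """3x3 mean filter with edge clamping (stand-in for a Gaussian filter)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sum(vals) / 9)
        out.append(row)
    return out


def normalize_variance(var, luminance, eps=1.0):
    """S'(x, y) = S(x, y) / G(x, y), with G the smoothed luminance.

    eps is a hypothetical floor that avoids division by zero.
    """
    g = box_smooth(luminance)
    return [[var[y][x] / max(g[y][x], eps) for x in range(len(var[0]))]
            for y in range(len(var))]


# The same raw variance weighs more in a dark region than in a bright one.
var = [[4.0, 4.0]]
dark = normalize_variance(var, [[10, 10]])
bright = normalize_variance(var, [[200, 200]])
print(dark, bright)
```

Dividing by the smoothed luminance rather than the raw luminance keeps single noisy pixels from distorting the normalization.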

After the variance value image has been normalized, the dilation and erosion of the morphological operation are performed. In the dilation, each pixel S'(x, y) of the normalized variance value image is replaced by the maximum value among its eight adjacent pixels, and this operation is repeated an arbitrary number of times. The pixel value of the variance value image after dilation is denoted S''(x, y). Likewise, in the erosion, each pixel S''(x, y) of the dilated variance value image is replaced by the minimum value among its eight adjacent pixels, and this operation is repeated an arbitrary number of times. The pixel value of the variance value image after erosion is denoted S'''(x, y). The dilation and the erosion are executed the same number of times. Finally, the value of the variance value image S'''(x, y) is tested against an arbitrary threshold by Expression (4), and face detection is performed only on pixels whose value exceeds the threshold, which enables high-speed detection. Here, f(x, y) is the pixel value of the binary image that indicates whether the threshold was exceeded: f(x, y) = 1 when S'''(x, y) exceeds the threshold and f(x, y) = 0 when it does not. TH is an arbitrary threshold.
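The dilation, erosion and thresholding steps can be sketched as follows. This is a minimal illustration, not the apparatus itself: the function names are hypothetical, and the 3x3 window here includes the center pixel (the usual convention for grayscale morphology), whereas the text speaks only of the eight adjacent pixels.

```python
def _neighbourhood(img, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[yy][xx]
            for yy in range(max(y - 1, 0), min(y + 2, h))
            for xx in range(max(x - 1, 0), min(x + 2, w))]


def morph(img, pick, times):
    """Apply a 3x3 max (dilation) or min (erosion) filter `times` times."""
    for _ in range(times):
        img = [[pick(_neighbourhood(img, y, x)) for x in range(len(img[0]))]
               for y in range(len(img))]
    return img


def person_mask(norm_var, th, times=1):
    """Dilate, erode the same number of times, then binarize with f = 1 if value > th."""
    closed = morph(morph(norm_var, max, times), min, times)
    return [[1 if v > th else 0 for v in row] for row in closed]


# A ring of high variance with a low-variance hole at its center.
ring = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        if (y, x) != (3, 3):
            ring[y][x] = 9
mask = person_mask(ring, th=5)
print(mask[3][3])  # the hole at the center is filled
```

Running dilation and erosion the same number of times fills interior holes while leaving the outer extent of the region unchanged, which matches the hole-filling purpose described for a stationary person.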

<Second Embodiment>
Next, the processing operation of the image processing apparatus according to a second embodiment is described with reference to FIG. 3. FIG. 3 is a flowchart showing the processing operation of the image processing apparatus according to the second embodiment. In FIG. 3, processing operations identical to those shown in FIG. 2 are given the same reference numerals, and their description is omitted.

In the first embodiment, an arbitrary threshold is set for the variance value image after erosion, and a person area image is generated in which pixels exceeding the threshold are set to 1 and all other pixels are set to 0. The processing operation shown in FIG. 3 adds a processing operation (step S12) that maintains a table holding the minimum variance value for each luminance value at the corresponding position of the smoothed processing target image: after the variance value of each pixel of the processing target image has been calculated, the table entry for that pixel's luminance value is compared with the calculated variance value, and the table entry is updated if the calculated variance value is smaller.

Further, when the threshold processing (step S9) is performed, the threshold for each pixel is generated automatically from two values: the table value corresponding to the luminance value of the pixel at that position in the processing target image, and the corresponding pixel value of the normalization luminance image. This yields threshold processing that accounts for the noise-induced pixel value fluctuation at each luminance value of an imaging device such as a camera. Noise characteristics differ from camera to camera, so it is difficult to exclude the influence of camera noise on the variance value with a single arbitrary threshold. Here, the variance value changes relative to the magnitude of the luminance value and tends to be larger in person areas. By holding the minimum variance value, which depends on position and luminance, as a table, the minimum variance attributable to noise can be excluded even when the position belongs to an area other than a person, and optimal threshold processing becomes possible.
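
The text above does not give the formula that combines the table value with the normalization image, so the following is only a sketch under explicit assumptions: normalization is assumed to be a division by the normalization image value, the threshold is assumed to be the noise floor from the table scaled by a hypothetical safety factor `margin`, and `fallback` is a hypothetical fixed threshold used for luminance levels the table has not yet observed. None of these names or constants come from the source.

```python
from typing import List

INF = float("inf")


def pixel_threshold(table: List[float], lum: int, norm: float,
                    margin: float = 2.0, fallback: float = 1.0,
                    eps: float = 1e-6) -> float:
    """Per-pixel threshold (assumed form): margin * table[lum] / norm.

    table: minimum variance per luminance level (inf = not yet observed).
    lum:   luminance of the pixel in the processing target image.
    norm:  value of the normalization image at the same pixel.
    """
    floor = table[lum]
    if floor == INF:
        return fallback  # no noise estimate for this level yet
    return margin * floor / max(norm, eps)


def detect_person_area(corrected: List[List[float]],
                       luminance: List[List[int]],
                       norm_img: List[List[float]],
                       table: List[float],
                       margin: float = 2.0,
                       fallback: float = 1.0) -> List[List[int]]:
    """Binary person area image: 1 where the corrected variance exceeds its threshold."""
    return [[1 if s > pixel_threshold(table, l, n, margin, fallback) else 0
             for s, l, n in zip(s_row, l_row, n_row)]
            for s_row, l_row, n_row in zip(corrected, luminance, norm_img)]
```

The point of the sketch is the data flow: each pixel gets its own threshold derived from the camera's observed noise floor at that luminance, instead of one global TH.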

<Third Embodiment>
Next, the processing operation of the image processing apparatus according to the third embodiment will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the processing operation of the image processing apparatus according to the third embodiment. In FIG. 4, processing operations identical to those shown in FIG. 3 are given the same reference numerals, and their description is omitted.

In the first and second embodiments, face detection processing is performed on pixels whose value in the person area image is 1. In the processing operation shown in FIG. 4, a processing operation is added that stores the position information of the person's face detected in the previous frame in a buffer as a face detection result (step S13). In addition, the vicinity of the face position detected in the previous frame is searched using the pixel values of the variance value image after the expansion and contraction processing, and the face image is searched for preferentially in the direction of larger variance values. This narrows the search range and enables faster face detection. Furthermore, even when the person's movement is very small and the pixel values of the person area image around the detected face position are 0, that position is still made a search target based on the face position information detected in the previous frame, which reduces missed detections of the person's face.
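
One way to picture the prioritized search is as an ordering of candidate positions around the previous face position. The sketch below is an assumption-laden illustration, not the method of the invention: the function name `search_order`, the square window of half-width `radius`, and the tie-break by distance are all hypothetical choices. Positions inside the window are kept even when their variance is zero, mirroring the handling of a nearly stationary person.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y)


def search_order(variance: List[List[float]],
                 prev_face: Point,
                 radius: int) -> List[Point]:
    """Candidate positions around prev_face, highest variance first.

    variance:  corrected variance value image (after expansion/contraction).
    prev_face: face position detected in the previous frame.
    radius:    half-width of the square search window (hypothetical parameter).
    Ties are broken by squared distance to prev_face, so an all-zero
    window starts the search at the previous face position itself.
    """
    px, py = prev_face
    h, w = len(variance), len(variance[0])
    window = [(x, y)
              for y in range(max(0, py - radius), min(h, py + radius + 1))
              for x in range(max(0, px - radius), min(w, px + radius + 1))]
    return sorted(window,
                  key=lambda p: (-variance[p[1]][p[0]],
                                 (p[0] - px) ** 2 + (p[1] - py) ** 2))
```

A face detector (not shown, and not specified here) would then be run at the positions in this order and stopped at the first hit, so that the direction in which the person most likely moved is examined first.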

As described above, limiting the person area based on the person's motion information limits the area in which conventional face detection or posture estimation is performed, which enables high-speed detection. In addition, three measures reduce over-detection of person areas caused by noise and missed detection in low-luminance areas: noise reduction in low-luminance areas by smoothing, normalization of the variance value (the amount of motion) by the spatial luminance values of the whole image, and automation of the threshold for person area detection by tabulating the variance value for each luminance value. Furthermore, prioritizing the search area according to the amount of the person's movement enables a faster search, and detecting the face position from around the position found in the previous frame makes it possible to detect the face of a stationary person with little movement.

Note that a program for realizing the functions of the processing units in FIG. 1 may be recorded on a computer-readable recording medium, and the image processing may be performed by causing a computer system to read and execute the program recorded on the recording medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line such as a telephone line. The program may realize only a part of the functions described above. Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

The present invention can be applied to uses in which, when a person's face is detected from time-series images captured by an imaging device, it is essential to limit the detection range by detecting the person's movement and to detect the face at high speed.

Description of symbols: 1 ... camera, 2 ... smoothed image generation unit, 3 ... image storage unit, 4 ... variance value image generation unit, 5 ... variance value image normalization unit, 6 ... expansion / contraction processing unit, 7 ... person area detection unit, 8 ... face detection processing unit

Claims

An image processing method in an image processing apparatus that comprises, in order to perform face detection from an acquired image, a smoothed image generation unit, a variance value image generation unit, a variance value image normalization unit, an expansion / contraction processing unit, a person area detection unit, and a face detection processing unit, the method comprising:
a smoothing step in which the smoothed image generation unit performs a smoothing process on the acquired image to generate a smoothed image;
a variance value image generation step in which the variance value image generation unit calculates an average value of each pixel based on images of a plurality of frames preceding and including the processing target image, and calculates a variance value from the calculated average value of each pixel and the corresponding pixel value of each frame;
a normalized image generation step in which the variance value image normalization unit performs a further smoothing process on the luminance values of the smoothed image of the processing target image to generate a normalization image, and normalizes each pixel value of the generated variance value image using the value of the corresponding pixel of the generated normalization image to generate a normalized variance value image;
an expansion / contraction step in which the expansion / contraction processing unit generates a corrected variance value image by performing expansion processing on the normalized variance value image;
a person area detection step in which the person area detection unit detects a person area by performing arbitrary threshold processing on each pixel value of the corrected variance value image to generate a binary image; and
a face detection step in which the face detection processing unit performs face detection processing on pixels for which the variance value is determined to be greater than a predetermined value.

The image processing method according to claim 1, further comprising a variance value table generation step of generating, as a variance value table, the minimum variance value for each luminance value of the processing target image from the variance values of the pixels generated in the variance value image generation step, wherein, in the person area detection step, the threshold value of each pixel is automatically calculated from the variance value table generated in the variance value table generation step and each pixel value of the normalization image generated in the normalized image generation step, and threshold processing is performed.

The image processing method according to claim 1, further comprising a face detection position holding step of holding the face detection position detected in the face detection step for the image of the previous frame, wherein, in the face detection step, face detection is performed with the face detection position held in the face detection position holding step as a reference, giving priority to the direction in which the pixel values of the corrected variance value image generated in the expansion / contraction step are higher, and face detection is performed around the corresponding pixel position even when the variance value around that pixel position is determined in the person area detection step to be equal to or less than a predetermined value.

An image processing apparatus that performs face detection from an acquired image, comprising:
smoothing means for performing a smoothing process on the acquired image to generate a smoothed image;
variance value image generation means for calculating an average value of each pixel based on images of a plurality of frames preceding and including the processing target image, and calculating a variance value from the calculated average value of each pixel and the corresponding pixel value of each frame;
normalized image generation means for performing a further smoothing process on the luminance values of the smoothed image of the processing target image to generate a normalization image, and normalizing each pixel value of the generated variance value image using the value of the corresponding pixel of the generated normalization image to generate a normalized variance value image;
expansion / contraction means for generating a corrected variance value image by performing expansion processing on the normalized variance value image;
person area detection means for detecting a person area by performing arbitrary threshold processing on each pixel value of the corrected variance value image to generate a binary image; and
face detection means for performing face detection processing on pixels for which the variance value is determined to be greater than a predetermined value.

An image processing program for causing a computer on an image processing apparatus that performs face detection from an acquired image to perform image processing, the program causing the computer to execute:
a smoothing step of performing a smoothing process on the acquired image to generate a smoothed image;
a variance value image generation step of calculating an average value of each pixel based on images of a plurality of frames preceding and including the processing target image, and calculating a variance value from the calculated average value of each pixel and the corresponding pixel value of each frame;
a normalized image generation step of performing a further smoothing process on the luminance values of the smoothed image of the processing target image to generate a normalization image, and normalizing each pixel value of the generated variance value image using the value of the corresponding pixel of the generated normalization image to generate a normalized variance value image;
an expansion / contraction step of generating a corrected variance value image by performing expansion processing on the normalized variance value image;
a person area detection step of detecting a person area by performing arbitrary threshold processing on each pixel value of the corrected variance value image to generate a binary image; and
a face detection step of performing face detection processing on pixels for which the variance value is determined to be greater than a predetermined value.