JP2006350576A

JP2006350576A - Image processor, image processing method and image processing program

Info

Publication number: JP2006350576A
Application number: JP2005174410A
Authority: JP
Inventors: Motofumi Fukui; 基文福井; Sukeji Kato; 典司加藤; Hirotsugu Kashimura; 洋次鹿志村
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-06-14
Filing date: 2005-06-14
Publication date: 2006-12-28

Abstract

PROBLEM TO BE SOLVED: To provide an image processor that, when detecting regions of interest such as hand regions or foot regions in image data, detects them in accordance with the number of regions of interest actually included in the image data.

SOLUTION: The image processor acquires a map of score values in which, for each unit region included in the image data, a score value indicating the degree to which the unit region belongs to a region of interest is associated with the position of that unit region in the image data. A detection start point determining part 24 determines a plurality of detection start points, and a candidate region detection part 25 detects, based on the map of score values, a plurality of candidate regions, each associated with one of the detection start points. A hand region specification part 26 then decides whether each candidate region is a region of interest, based on the relation between the mean of the score values associated with the unit regions included in that candidate region and a predetermined threshold.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program for detecting, in image data, a region that includes a person's hand or foot.

Detecting the parts of image data that correspond to a person's hands or feet is a very important technique for recognizing that person's posture and state. One way to detect hands and feet is to acquire the person's skin color information as a color distribution by some method and to use that distribution to delimit the image regions containing the hands or feet. To detect a hand region with this technique, for example, a score value indicating the degree to which each pixel belongs to a hand region is calculated from the acquired skin color distribution, and a map of score values is generated in which the position of each pixel is associated with its score value. The hand region is then detected using this map. First, a detection start point is set on the map of score values based on a predetermined condition. Then, starting from that detection start point, a surrounding region in which pixels with high score values are clustered is detected and determined to be the hand region (see, for example, Patent Document 1).

When regions are detected with the method described above, the number of hand regions and foot regions expected in the image data is specified in advance. The number of detection start points is then set according to the specified number of regions, and a region is determined from each detection start point, which yields the intended number of regions.

Patent Document 1: Japanese Patent Laying-Open No. 2005-078257

However, with the conventional technique described above, the correct number of hand regions or foot regions cannot be detected when the number of such regions included in the image data is not known in advance.

The present invention has been made in view of the above circumstances. One of its objects is to provide an image processing apparatus, an image processing method, and an image processing program that, when detecting regions of interest that include a person's hand or foot in image data, can detect the regions of interest in accordance with the number actually included in the image data, even when that number cannot be known in advance.

To solve the above problem, an image processing apparatus according to the present invention detects, in image data generated by imaging at least a part of a person, a region of interest that includes a part corresponding to the person's hand or foot. The apparatus comprises: map acquisition means for acquiring a map of score values in which, for each unit region included in the image data, a score value relating to the degree to which the unit region is included in a region of interest is associated with the position of the unit region in the image data; detection start point determination means for determining a plurality of detection start points based on a predetermined condition; candidate region detection means for detecting, based on the map of score values, a plurality of candidate regions, which are candidates for regions of interest, each associated with one of the detection start points; and region specifying means for determining, for each candidate region, whether the candidate region is a region of interest based on the relation between the average of the score values associated with the unit regions included in the candidate region and a predetermined threshold.

Here, for the two candidate regions with the highest average score values, the region specifying means may determine whether each candidate region is a region of interest based on the ratio between the average score values of those two candidate regions.

The detection start point determination means may also determine the detection start points so that they are separated from one another by at least a predetermined distance on the map of score values.

An image processing method according to the present invention uses a computer to detect, in image data generated by imaging at least a part of a person, a region of interest that includes a part corresponding to the person's hand or foot. The method comprises the steps of: acquiring a map of score values in which, for each unit region included in the image data, a score value relating to the degree to which the unit region is included in a region of interest is associated with the position of the unit region in the image data; determining a plurality of detection start points based on a predetermined condition; detecting, based on the map of score values, a plurality of candidate regions, which are candidates for regions of interest, each associated with one of the detection start points; and determining, for each candidate region, whether the candidate region is a region of interest based on the relation between the average of the score values associated with the unit regions included in the candidate region and a predetermined threshold.

An image processing program according to the present invention causes a computer to detect, in image data generated by imaging at least a part of a person, a region of interest that includes a part corresponding to the person's hand or foot. The program causes the computer to execute the steps of: acquiring a map of score values in which, for each unit region included in the image data, a score value relating to the degree to which the unit region is included in a region of interest is associated with the position of the unit region in the image data; determining a plurality of detection start points based on a predetermined condition; detecting, based on the map of score values, a plurality of candidate regions, which are candidates for regions of interest, each associated with one of the detection start points; and determining, for each candidate region, whether the candidate region is a region of interest based on the relation between the average of the score values associated with the unit regions included in the candidate region and a predetermined threshold.

According to the present invention, when regions of interest that include a person's hand or foot are detected in image data, the regions of interest can be detected in accordance with the number actually included in the image data, even when the number of hands or feet in the image data cannot be known in advance, and detection accuracy can be improved.

Preferred embodiments of the present invention are described below with reference to the drawings.

As shown in FIG. 1, the image processing apparatus 10 according to an embodiment of the present invention includes a control unit 11, a storage unit 12, an operation unit 13, a display unit 14, and an imaging unit 15.

The control unit 11 is composed of, for example, a CPU, and operates according to a program stored in the storage unit 12. The storage unit 12 includes memory elements such as RAM and ROM and/or a disk device. The storage unit 12 stores the program executed by the control unit 11 and also serves as the work memory of the control unit 11. The processing executed by the control unit 11 in this embodiment is described in detail later.

The operation unit 13 is a keyboard, a mouse, or the like; it accepts instruction operations from the user and outputs their content to the control unit 11. The display unit 14 is a display or the like and shows information according to instructions from the control unit 11.

The imaging unit 15 is a digital still camera, a video camera, or the like, and outputs the image data obtained by capturing an image to the control unit 11. The imaging unit 15 is not strictly required. Without it, the image data to be processed in this embodiment may be stored in the storage unit 12 in advance or may be received from another device over a network by a communication unit (not shown).

The following describes, as an example, the processing executed by the image processing apparatus 10 when it first extracts, from image data generated by imaging a person, a face region that includes the person's face, and then detects a hand region that includes the person's hand based on the color information of the pixels in that face region. Embodiments of the present invention are not limited to this form; for example, a foot region may be detected instead of a hand region. The invention is also applicable to any object when a plurality of image regions (regions of interest), each including at least a part of the object, are detected from image data based on property information of the object. For example, it can be applied not only to people but also to a variety of objects, such as animals and robots, for which a plurality of regions in image data can be detected using information about their appearance. In that case, depending on the appearance of the object and the data format of the captured image, information about other appearance properties, such as pixel density, may be used instead of color information.

Functionally, as shown in FIG. 2, the image processing apparatus 10 includes a face region extraction unit 21, a color histogram generation unit 22, a map generation unit 23, a detection start point determination unit 24, a candidate region detection unit 25, and a hand region specifying unit 26. These functions are stored as a program in the storage unit 12 of the image processing apparatus 10 and executed by the control unit 11.

The face region extraction unit 21 extracts, from image data obtained by the imaging unit 15 or another source, an image region that includes the person's face, to be used as sample data. When the imaging unit 15 is a video camera or the like, moving image data is obtained; in that case, the face region extraction unit 21 processes the still image data of each frame included in the moving image data.

The method of extracting the face region is not particularly limited, and any technique can be used. For example, the face region extraction unit 21 may extract the face region using statistical data on face shape patterns acquired in advance by learning. The face region extraction unit 21 may also identify the orientation and size of the face in addition to its position.

The color histogram generation unit 22 generates a color histogram representing the color distribution of the pixels in the image region extracted by the face region extraction unit 21. Because it is derived from the color of the face of the person shown in the target image data, the generated color histogram represents skin color information specific to that person.

Specifically, the color histogram generation unit 22 first converts the color value of each pixel in the face region into a value in an appropriate color space, as necessary. Suppose, for example, that each pixel in the original image data has a value from 0 to 255 (256 gradations) for each of the three primary colors red, green, and blue (hereinafter R, G, and B), and that the combination of these three values represents the pixel's color. In this case, the color histogram generation unit 22 may use the R, G, and B values as they are, without converting to another color space, to generate a three-dimensional histogram, or it may use only some of the three values to generate a lower-dimensional histogram. When capturing the characteristics of a person's skin color in particular, it is desirable to convert the color represented by the R, G, and B values into a value in the normalized rg space or the HSV space. Here, the values are converted into the normalized rg space. The r and g components in the normalized rg space are calculated as follows.
r = R / (R + G + B)
g = G / (R + G + B)
Using these formulas, the color histogram generation unit 22 obtains, for each pixel, a value (r, g) in the normalized rg space from the value (R, G, B). Here, r and g each take a value from 0 to 1.
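As a minimal illustrative sketch (not taken from the patent text), the conversion to the normalized rg space could be written in Python as follows. The function name `rgb_to_normalized_rg` and the handling of a black pixel, for which R + G + B is 0 and the formulas are undefined, are assumptions made for this example.

```python
def rgb_to_normalized_rg(R, G, B):
    """Convert 8-bit R, G, B values to normalized rg values (r, g)."""
    total = R + G + B
    if total == 0:
        # Assumption: the text does not say how a black pixel is handled;
        # this sketch maps it to (0.0, 0.0) to avoid division by zero.
        return 0.0, 0.0
    return R / total, G / total


# Example: a typical skin-like color and a pure red pixel.
skin_r, skin_g = rgb_to_normalized_rg(200, 150, 120)
red_r, red_g = rgb_to_normalized_rg(255, 0, 0)
```

Because r and g are ratios, a uniform change in brightness that scales R, G, and B by the same factor leaves (r, g) unchanged, which is one reason this space is often chosen for skin color.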

Next, for each pixel in the face region, the color histogram generation unit 22 determines which bin of the color histogram contains the pixel's value in the color space (after conversion, where conversion was performed). A suitable predetermined value is used as the number of bins. For example, if the number of bins is 50 for each of the r and g components, the resulting color histogram is a two-dimensional histogram with 50 × 50 bins. Since r and g each range from 0 to 1, each bin has a width of 1/50 when the bin widths are uniform.

In this way, the color histogram generation unit 22 generates, as a color histogram, the color distribution in a predetermined color space of the pixels in the image region extracted by the face region extraction unit 21. For each bin, the histogram holds as its frequency the number of pixels whose color falls in that bin. Alternatively, the frequency may be that number divided by the total number of pixels examined. In this case, the frequency of each bin is expressed as a proportion of all pixels in the image region, and the frequencies of all bins sum to 1.
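The binning and normalization described above can be sketched roughly as follows. This is an illustration under stated assumptions: the function name `build_rg_histogram`, the nested-list representation, and the choice to place a value of exactly 1.0 in the last bin are not specified in the text.

```python
def build_rg_histogram(rg_pixels, bins=50):
    """Build a bins x bins histogram of (r, g) values with relative frequencies.

    rg_pixels: iterable of (r, g) pairs, each component in [0, 1].
    Returns hist[i][j], the fraction of pixels whose r falls in bin i
    and whose g falls in bin j.
    """
    hist = [[0.0] * bins for _ in range(bins)]
    count = 0
    for r, g in rg_pixels:
        # Uniform bin width of 1/bins; a value of exactly 1.0 is clamped
        # into the last bin (an assumption of this sketch).
        i = min(int(r * bins), bins - 1)
        j = min(int(g * bins), bins - 1)
        hist[i][j] += 1.0
        count += 1
    if count > 0:
        # Divide by the number of pixels so that all bins sum to 1.
        for row in hist:
            for j in range(bins):
                row[j] /= count
    return hist


face_pixels = [(0.51, 0.31), (0.51, 0.31), (1.0, 0.0), (0.0, 0.0)]
hist = build_rg_histogram(face_pixels)
```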

The map generation unit 23 generates a map of score values (an activation map) using the color histogram data generated by the color histogram generation unit 22. For each pixel in the image data being processed, the map of score values associates the position of the pixel in the image data with a score value relating to the degree to which the pixel is included in the hand region to be detected. The score value is calculated by a predetermined method and expresses, for each pixel in the image data, the degree to which the pixel is included in a region of interest (in this embodiment, a hand region). For example, the score value can be calculated from a likelihood representing the probability that the pixel is included in a hand region. In this embodiment, the score value is calculated mainly from information on how likely it is that the color of the pixel is the skin color of the person captured in the image data.

Methods such as back projection or a correlation method can be used to calculate the score value. Specifically, for example, the color value of the pixel whose score is to be calculated is first converted, as necessary, into a value in the color space used by the color histogram generation unit 22. Then, in the color histogram obtained by the color histogram generation unit 22, the frequency of the bin corresponding to the converted color of the pixel is set as the skin color score value of that pixel.
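A rough sketch of this histogram lookup (back projection) is shown below. It assumes the image has already been converted to (r, g) values and that the histogram is a square nested list indexed as `hist[r_bin][g_bin]`; the function name `back_project` and these data layouts are assumptions for illustration only.

```python
def back_project(rg_image, hist):
    """Return a score map: each pixel gets the frequency of its histogram bin.

    rg_image: 2D list (rows of pixels), each pixel an (r, g) pair in [0, 1].
    hist: square 2D list of bin frequencies, indexed as hist[r_bin][g_bin].
    """
    bins = len(hist)
    score_map = []
    for row in rg_image:
        score_row = []
        for r, g in row:
            # Same binning rule as when the histogram was built (assumed).
            i = min(int(r * bins), bins - 1)
            j = min(int(g * bins), bins - 1)
            score_row.append(hist[i][j])
        score_map.append(score_row)
    return score_map


# Tiny 2 x 2 histogram and 2 x 2 image, for illustration.
toy_hist = [[0.1, 0.2], [0.3, 0.4]]
toy_image = [[(0.1, 0.1), (0.9, 0.9)], [(0.1, 0.9), (0.9, 0.1)]]
toy_scores = back_project(toy_image, toy_hist)
```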

Because the face region extracted by the face region extraction unit 21 is excluded from the regions in which the candidate region detection unit 25 searches for hand regions, score values need not be calculated for it. In this case, setting the score values of the pixels in that image region to 0 excludes them from detection by the candidate region detection unit 25.
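Excluding the face region by zeroing its scores might look like the following sketch. It assumes the face region is an axis-aligned rectangle given by its top-left corner and size, which the text does not state; the function name `mask_region` is likewise hypothetical.

```python
def mask_region(score_map, top, left, height, width):
    """Set the scores inside a rectangle to 0 (in place) and return the map.

    Assumption: the face region is an axis-aligned rectangle; parts of the
    rectangle that fall outside the map are ignored.
    """
    rows = len(score_map)
    for y in range(max(top, 0), min(top + height, rows)):
        row = score_map[y]
        for x in range(max(left, 0), min(left + width, len(row))):
            row[x] = 0.0
    return score_map


scores = [[1.0] * 3 for _ in range(3)]
mask_region(scores, top=0, left=1, height=2, width=2)
```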

The positional relationship with the face region may also be taken into account when calculating the score value. That is, based on the position and size of the face region and the orientation of the face, the score values are calculated so that a pixel at a position where a hand is considered likely to be present receives a higher score value than a pixel with the same skin color information at a position where a hand is considered unlikely to be present. This allows the hand region to be detected more accurately.

In the map of score values obtained in this way, pixels whose colors are close to the color information of the face region extracted by the face region extraction unit 21 have high score values, and pixels with high score values cluster in the regions considered to be hand regions.

The detection start point determination unit 24 determines, based on a predetermined condition, a plurality of positions (detection start points) that serve as base points when the candidate region detection unit 25 detects candidate regions for the hand region. One way to determine the detection start points is to use the score value of each pixel on the map of score values. For example, a predetermined number of pixels, taken in descending order of score value from among all pixels, are determined to be the detection start points.
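Selecting the highest-scoring pixels as detection start points could be sketched as below. The function name `top_k_pixels`, the (x, y) coordinate convention, and the tie-breaking order are assumptions of this example.

```python
import heapq


def top_k_pixels(score_map, k):
    """Return the (x, y) positions of the k pixels with the highest scores."""
    candidates = (
        (score, x, y)
        for y, row in enumerate(score_map)
        for x, score in enumerate(row)
    )
    # Ties are broken by the larger x, then the larger y; the text does not
    # specify a tie-breaking rule, so this is an arbitrary choice.
    best = heapq.nlargest(k, candidates)
    return [(x, y) for _, x, y in best]


start_points = top_k_pixels([[0.1, 0.9, 0.2], [0.8, 0.0, 0.3]], k=2)
```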

The detection start points may also be determined, for example, as follows. A square region whose size is set by a predetermined method is prepared, and the entire map of score values is scanned with it. A predetermined number of positions are then selected in descending order of the average score value of the pixels inside the square region, and the center of the square region at each of those positions is determined to be a detection start point. The size of the square region may be set based on the size of the face region extracted by the face region extraction unit 21, for example 0.4 times the size of the face region. Compared with determining the detection start points simply from individual pixels with high score values, this reduces the influence of noise from pixels that have high score values even though they are not in a hand region.
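The window-scanning variant can be illustrated with the brute-force sketch below. The function names, the use of `size // 2` as the window center, and the unoptimized double loop are assumptions; a practical implementation would more likely use an integral image. The window size is passed in directly, so a caller following the example in the text could compute it as 0.4 times the face size.

```python
def window_means(score_map, size):
    """Return (mean, center_x, center_y) for every size x size window."""
    height, width = len(score_map), len(score_map[0])
    results = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            total = sum(
                score_map[y][x]
                for y in range(top, top + size)
                for x in range(left, left + size)
            )
            # For an even size the "center" is the pixel just past the middle.
            results.append((total / (size * size), left + size // 2, top + size // 2))
    return results


def start_points_by_window(score_map, size, count):
    """Return the centers of the `count` windows with the highest mean score."""
    ranked = sorted(window_means(score_map, size), reverse=True)
    return [(cx, cy) for _, cx, cy in ranked[:count]]


noisy_map = [
    [1.0, 0.0, 0.0, 0.0],  # isolated high-score pixel (noise)
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],  # 2 x 2 cluster of high scores
    [0.0, 0.0, 1.0, 1.0],
]
best_start = start_points_by_window(noisy_map, size=2, count=1)
```

In this toy map the isolated pixel at the top left has the same score as each pixel of the cluster, but the window covering the cluster has the higher mean, so the cluster is chosen.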

When the original image data is a frame of a moving image and hand regions have already been detected in a temporally earlier frame, the detection start points may be determined based on the positions of the hand regions obtained in that earlier frame. In a moving image, the position of a hand region is unlikely to change abruptly between consecutive frames, so this method allows positions that are more likely to be in a hand region to be chosen as detection start points. Alternatively, the detection start points may be chosen from regions judged likely to contain a hand region on the basis of their position relative to the face, using the position and size of the face region, the orientation of the face, and the like.

Furthermore, when determining a plurality of detection start points, the detection start point determination unit 24 may determine them so that they are separated from one another by at least a predetermined distance. For example, the expected size of a hand region is estimated from the size of the face region, and the predetermined distance is determined from that estimate. This makes it easier to detect a plurality of hand regions that are apart from one another.
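One simple way to enforce the minimum separation is a greedy pass over candidates that are already ranked from best to worst. The text does not say how the constraint is applied, so the greedy strategy, the Euclidean distance, and the function name `select_separated` are all assumptions of this sketch.

```python
import math


def select_separated(ranked_points, min_distance, count):
    """Greedily pick up to `count` points that are at least min_distance apart.

    ranked_points: (x, y) positions ordered from most to least promising.
    """
    chosen = []
    for x, y in ranked_points:
        # Accept a point only if it is far enough from every accepted point.
        if all(math.hypot(x - cx, y - cy) >= min_distance for cx, cy in chosen):
            chosen.append((x, y))
            if len(chosen) == count:
                break
    return chosen


ranked = [(10, 10), (11, 10), (30, 30), (12, 12), (50, 10)]
separated = select_separated(ranked, min_distance=5, count=3)
```

Here the second and fourth candidates are skipped because they lie within 5 pixels of the first, so the selection spreads over three distinct locations.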

On the map of score values generated by the map generation unit 23, the candidate region detection unit 25 detects candidate regions for the hand region, which is the image region to be detected, using each detection start point determined by the detection start point determination unit 24 as a base point. Specifically, for each detection start point, the candidate region detection unit 25 detects a candidate region by identifying a region near that detection start point on the map in which pixels with high score values are clustered. A plurality of candidate regions are detected, corresponding to the plurality of detection start points.

The method by which the candidate region detection unit 25 detects candidate regions is not particularly limited. Here, an example using the CamShift algorithm (Gary R. Bradski, "Computer Vision Face Tracking For Use in a Perceptual User Interface," Intel Technology Journal Q2, 1998) is described.

When the CamShift algorithm is used, a square region of a certain size (hereinafter, the template) is first set with the detection start point at its center, and values for judging the likelihood of a hand region (hereinafter, moments) are calculated by a predetermined function from the score values of the pixels inside it. The candidate region detection unit 25 determines the center of gravity of the template from these moments, moves the center of the template to that center of gravity, and recalculates the moments at the new position. These steps are repeated until the distance between the newly obtained center of gravity and the previous one falls to or below a predetermined value (that is, the position of the center of gravity converges) or the number of iterations reaches a predetermined value. The position of the center of gravity of the template finally obtained is provisionally taken as the center of the candidate region.
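The center-of-gravity iteration can be sketched as follows, using the zeroth and first moments of the scores inside the template. This is a simplified illustration, not the patent's implementation: the function name `mean_shift_center`, the fixed `half_size`, the default iteration limit and tolerance, and the clipping of the template at the map border are assumptions.

```python
import math


def mean_shift_center(score_map, start, half_size, max_iter=20, tol=0.5):
    """Move a square template toward the local center of gravity of the scores.

    start: (x, y) detection start point; the template spans half_size pixels
    on each side of its center and is clipped to the map.
    Returns the final (x, y) center of gravity as floats.
    """
    height, width = len(score_map), len(score_map[0])
    cx, cy = float(start[0]), float(start[1])
    for _ in range(max_iter):
        x0 = max(int(round(cx)) - half_size, 0)
        x1 = min(int(round(cx)) + half_size, width - 1)
        y0 = max(int(round(cy)) - half_size, 0)
        y1 = min(int(round(cy)) + half_size, height - 1)
        m00 = m10 = m01 = 0.0  # zeroth and first moments
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                s = score_map[y][x]
                m00 += s
                m10 += x * s
                m01 += y * s
        if m00 == 0.0:
            break  # no score mass under the template; stay where we are
        new_cx, new_cy = m10 / m00, m01 / m00
        shift = math.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if shift <= tol:
            break  # center of gravity has converged
    return cx, cy


grid = [[0.0] * 9 for _ in range(9)]
grid[5][6] = 1.0  # single high-score pixel at x=6, y=5
center = mean_shift_center(grid, start=(4, 4), half_size=3)
```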

Next, the size and inclination of the template are changed based on the moment values, and the above processing is repeated until the position of the center of gravity converges again. This series of steps is repeated until the position, size, and inclination all converge or the number of iterations reaches a predetermined value, and the position, size, and inclination of the template finally obtained are taken as the position, size, and inclination of the candidate region.
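The text does not give the formulas used to update the size and inclination. As a hedged sketch, the following uses the second-moment expressions commonly associated with the CamShift paper cited above to estimate the orientation and the length and width of the score distribution inside a window; how these values are mapped onto the next template size is left open. The function name and the window convention are assumptions.

```python
import math


def shape_from_moments(score_map, x0, y0, x1, y1):
    """Estimate (angle, length, width) of the scores in window [x0..x1] x [y0..y1].

    Returns None if the window contains no score mass.
    """
    m00 = m10 = m01 = m20 = m02 = m11 = 0.0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            s = score_map[y][x]
            m00 += s
            m10 += x * s
            m01 += y * s
            m20 += x * x * s
            m02 += y * y * s
            m11 += x * y * s
    if m00 == 0.0:
        return None
    xc, yc = m10 / m00, m01 / m00
    # Central second moments of the score distribution.
    a = m20 / m00 - xc * xc
    b = 2.0 * (m11 / m00 - xc * yc)
    c = m02 / m00 - yc * yc
    root = math.sqrt(b * b + (a - c) ** 2)
    angle = 0.5 * math.atan2(b, a - c)
    # max(..., 0.0) guards against tiny negative values from rounding.
    length = math.sqrt(max((a + c + root) / 2.0, 0.0))
    width = math.sqrt(max((a + c - root) / 2.0, 0.0))
    return angle, length, width


bar = [[0.0] * 5 for _ in range(5)]
bar[2] = [1.0] * 5  # horizontal bar of high scores
angle, length, width = shape_from_moments(bar, 0, 0, 4, 4)
```

For the horizontal bar in the example, the estimated angle is 0 and the width is 0, which matches the intuition that the distribution is elongated along the x axis.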

The hand region specifying unit 26 specifies, as hand regions, those regions that satisfy a predetermined condition from among the plurality of candidate regions that the candidate region detection unit 25 detected for the plurality of detection start points. The procedure by which the hand region specifying unit 26 specifies hand regions is described below with reference to the flowchart of FIG. 3. Here, the number of candidate regions detected by the candidate region detection unit 25 is N, and the average of the score values of the pixels included in each candidate region T_k (k = 1, 2, ..., N) is denoted S_k.

First, the hand region specifying unit 26 determines whether the size of each candidate region T_k satisfies a predetermined condition, and excludes candidate regions that do not satisfy it from subsequent processing (S1). Specifically, it determines, for example, whether the width and height of the candidate region fall within a predetermined range; if they do not, the candidate region is judged not to be a hand region. The predetermined range may be determined based on, for example, the size of the entire image data or the size of the face region. When the original image data is the image data of a frame included in moving image data, the predetermined range may also be determined based on, for example, the size of the hand region detected in a temporally earlier frame.
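The size check of S1 can be sketched as follows. The `Candidate` record, the helper names, and the scale factors `lo` and `hi` used to derive the range from the face width are all hypothetical; the specification only says that the range may depend on the image size, the face region size, or an earlier frame.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical record for one candidate region T_k."""
    x: float           # center x
    y: float           # center y
    width: float
    height: float
    mean_score: float  # S_k, average score value inside the region

def size_range_from_face(face_width, lo=0.5, hi=1.5):
    """Derive an allowed side-length range from the face width (assumed rule)."""
    return lo * face_width, hi * face_width

def filter_by_size(candidates, min_side, max_side):
    """S1: keep only candidates whose width and height are both in range."""
    return [
        c for c in candidates
        if min_side <= c.width <= max_side and min_side <= c.height <= max_side
    ]
```

For example, with a face 100 pixels wide the sketch accepts candidates whose sides lie between 50 and 150 pixels and drops the rest before any score comparison is made.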

Next, among the candidate regions T_k that were not excluded in S1, the hand region specifying unit 26 sets the candidate region whose pixels have the highest average score value S_k as the first candidate region T_f, and sets the candidate region with the second highest average score value as the second candidate region T_s (S2). Of the plurality of candidate regions detected by the candidate region detection unit 25, the first and second candidate regions are the ones that may finally be determined to be hand regions. Here, the average of the score values of the pixels included in the first candidate region is denoted S_f, and the average of the score values of the pixels included in the second candidate region is denoted S_s.
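As a minimal sketch of S2, the average score value of a region and the choice of the two best candidates might be computed as below. The sketch assumes axis-aligned regions given by inclusive pixel bounds, which simplifies the tilted regions CamShift can produce; both function names are hypothetical.

```python
def mean_score(score_map, x0, y0, x1, y1):
    """Average score value S_k over an axis-aligned region (inclusive bounds)."""
    values = [
        score_map[y][x]
        for y in range(y0, y1 + 1)
        for x in range(x0, x1 + 1)
    ]
    return sum(values) / len(values)

def top_two(mean_scores):
    """S2: indices of the candidates with the highest and second highest S_k.

    Returns (first, second); an entry is None when too few candidates remain.
    """
    order = sorted(range(len(mean_scores)),
                   key=lambda k: mean_scores[k], reverse=True)
    first = order[0] if len(order) > 0 else None
    second = order[1] if len(order) > 1 else None
    return first, second
```

The indices returned by `top_two` correspond to T_f and T_s, and the matching entries of `mean_scores` to S_f and S_s.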

Subsequently, the hand region specifying unit 26 first determines whether S_s is greater than a predetermined threshold (S3). The predetermined threshold is set to a value such as 0.005 based on data such as the average score values of hand regions detected in the past. When S_s is equal to or less than the predetermined threshold, the second candidate region T_s is presumed not to be a hand region.

When S_s is equal to or less than the predetermined threshold, the hand region specifying unit 26 next determines whether S_f is greater than the predetermined threshold (S4). The predetermined threshold here is assumed to be the same value as that used in S3. When S_f is also equal to or less than the predetermined threshold, the first candidate region T_f is also presumed not to be a hand region. The hand region specifying unit 26 therefore determines that the image data contains no hand region and ends the processing (S5). When S_f is greater than the predetermined threshold, on the other hand, the first candidate region T_f is considered to be a hand region. The unit therefore determines that the image data contains exactly one hand region and specifies the first candidate region T_f as the hand region (S6).

By contrast, when S_s is determined in S3 to be greater than the predetermined threshold, the hand region specifying unit 26 could simply determine that both the first candidate region T_f and the second candidate region T_s are hand regions. Here, however, to further increase detection accuracy, it determines whether the ratio of S_f to S_s is greater than a predetermined threshold (S7). Specifically, it determines whether S_f / S_s is greater than, for example, 2.0. Thus, when the average score values of the two candidate regions differ greatly, for example when S_f is more than twice S_s, one of the candidate regions is determined not to be a hand region. When the ratio of S_f to S_s is used in this way, a candidate region may be determined not to be a hand region even if its average score value exceeds the threshold used in S3.

When the ratio of S_f to S_s is greater than the predetermined threshold, the second candidate region T_s is determined not to be a hand region, and the hand region specifying unit 26 specifies only the first candidate region T_f as a hand region (S6). When the ratio is equal to or less than the predetermined threshold, on the other hand, both the first candidate region T_f and the second candidate region T_s are considered to be hand regions. The hand region specifying unit 26 therefore determines that the image data contains two hand regions and specifies the first candidate region T_f and the second candidate region T_s as hand regions (S7).
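The branching of S3 through S7 can be summarized in a short sketch. The default thresholds 0.005 and 2.0 are the example values given in the text; the function name and the string labels in the return value are hypothetical, and the sketch assumes a positive score threshold so that the ratio is never computed with S_s equal to zero.

```python
def specify_hand_regions(s_f, s_s, score_threshold=0.005, ratio_threshold=2.0):
    """Decide which of the first and second candidate regions are hand regions.

    s_f: average score value of the first candidate region T_f (highest).
    s_s: average score value of the second candidate region T_s.
    Returns a list containing "first", "second", both, or neither.
    """
    if s_s <= score_threshold:        # S3: T_s is not a hand region
        if s_f <= score_threshold:    # S4: T_f is not a hand region either
            return []                 # S5: no hand region in the image
        return ["first"]              # S6: exactly one hand region
    if s_f / s_s > ratio_threshold:   # S7: scores differ too much
        return ["first"]              # S6: only T_f is a hand region
    return ["first", "second"]        # both candidates are hand regions
```

Because the ratio test runs only after S_s has passed the score threshold, it can still reject a second candidate whose average score value is above that threshold, which is the behavior the text points out.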

The hand region information obtained by the above processing is recorded in the storage unit 12 as information on the posture and state of the person, or is used in other processing. When the image data being processed is a frame image forming part of a moving image, detecting the hand region in each frame of the moving image data and obtaining the difference in hand region information between frames makes it possible to track the movement of the person's hand.
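One simple reading of the inter-frame difference is the displacement of the hand region center between consecutive frames. The sketch below takes that reading as an assumption; the function name and the use of `None` for frames without a detected hand are hypothetical.

```python
def hand_displacements(centers):
    """Per-frame displacement of the hand region center.

    centers: list of (x, y) centers, one per frame, or None for frames in
             which no hand region was specified.
    Returns one (dx, dy) per consecutive frame pair, or None when either
    frame of the pair has no hand region.
    """
    moves = []
    for prev, cur in zip(centers, centers[1:]):
        if prev is None or cur is None:
            moves.append(None)
        else:
            moves.append((cur[0] - prev[0], cur[1] - prev[1]))
    return moves
```

Differences in size or tilt between frames could be tracked in the same way if they are stored alongside the center.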

According to the embodiment of the present invention described above, determining whether a candidate region is a hand region based on predetermined thresholds makes it possible to detect hand regions according to the number of hands included in the image data, and thereby to improve hand region detection accuracy, even when the number of hand regions in the image data cannot be known in advance or when hand regions are detected from a moving image containing occlusion (hidden regions).

Applying this embodiment to foot regions likewise makes it possible, as with hand regions, to detect foot regions according to the number of feet included in the image data even when the number of foot regions cannot be known in advance. In this case, candidate region detection and foot region specification may be performed using a score value map that differs from the one used for hand region detection and is calculated with additional information such as the positional relationship with the face region. Alternatively, candidate regions may be detected using a score value map based on the skin color distribution of the face region, as in hand region detection, and foot regions may be specified based on the positional relationship with the face region, the positional relationship with already detected hand regions, a predetermined threshold set for specifying foot regions, and the like. In that case, whether a candidate region is a foot region may further be determined based on the ratio between the average score value of the pixels included in the hand region in the image data and the average score value of the pixels included in the foot region candidate.

The above description covers an example in which a score value is set for each pixel included in the image data to generate the score value map, but the score value map generated by the map generation unit 23 need not use a pixel as the unit region. For example, by setting a group of pixels, such as a 2 × 2 block of four pixels, as the unit region and setting a score value for each unit region, a score value map carrying coarser information than a per-pixel map is obtained. When the detection start point determination unit 24 and the candidate region detection unit 25 detect candidate regions on this score value map, the accuracy of the detected region of interest decreases, but the region of interest can be detected with less computation.
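A coarser score value map of this kind can be sketched by pooling a per-pixel map over square blocks. The specification does not say how the score value of a multi-pixel unit region is set; averaging the pixel scores is an assumption made here, and the function name `coarsen_score_map` is hypothetical.

```python
def coarsen_score_map(score_map, block=2):
    """Build a score value map whose unit region is a block x block group of pixels.

    Each unit region receives the average of its pixel scores (assumed rule).
    Rows and columns that do not fill a whole block are dropped.
    """
    height = len(score_map) // block * block
    width = len(score_map[0]) // block * block
    coarse = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            total = sum(
                score_map[y][x]
                for y in range(by, by + block)
                for x in range(bx, bx + block)
            )
            row.append(total / (block * block))
        coarse.append(row)
    return coarse
```

With `block=2` the map has one quarter as many unit regions, so each pass over a template touches roughly one quarter as many values, at the cost of locating region boundaries only to the nearest block.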

In the above description, the face region extraction unit 21 first extracts the face region, and the color histogram generation unit 22 generates a color histogram from the pixels included in the extracted face region, but embodiments of the present invention are not limited to this form. The color histogram may be given in advance as a predetermined skin color distribution, or learning data obtained from past processing for detecting regions of interest may be used. When a region of interest is detected from the image data of each frame included in a moving image, the score value map may be generated using the color histogram that the color histogram generation unit 22 generated for the image data of an earlier frame.

A block diagram showing the configuration of the image processing apparatus according to the embodiment of the present invention.
A functional block diagram showing the functions of the image processing apparatus according to the embodiment of the present invention.
A flowchart showing an example of the processing executed by the image processing apparatus according to the embodiment of the present invention.

Explanation of symbols

10 image processing apparatus, 11 control unit, 12 storage unit, 13 operation unit, 14 display unit, 15 imaging unit, 21 face region extraction unit, 22 color histogram generation unit, 23 map generation unit, 24 detection start point determination unit, 25 candidate region detection unit, 26 hand region specifying unit.

Claims

An image processing apparatus for detecting, from image data generated by imaging at least a part of a person, a region of interest including a portion related to a hand or a foot of the person, the apparatus comprising:
map acquisition means for acquiring a score value map in which, for each unit region included in the image data, a score value related to the degree to which the unit region is included in the region of interest is associated with the position of the unit region in the image data;
detection start point determination means for determining a plurality of detection start points based on a predetermined condition;
candidate region detection means for detecting, based on the score value map, a plurality of candidate regions that are candidates for the region of interest, each associated with one of the detection start points; and
region specifying means for determining, for each candidate region, whether the candidate region is a region of interest based on a relationship between the average of the score values associated with the unit regions included in the candidate region and a predetermined threshold.

The image processing apparatus according to claim 1,
wherein, for the two candidate regions having the highest average score values, the region specifying means determines whether each candidate region is a region of interest based on the ratio of the average score values of the two candidate regions.

The image processing apparatus according to claim 1 or 2,
wherein the detection start point determination means determines the plurality of detection start points such that they are separated from one another by at least a predetermined distance on the score value map.

An image processing method for detecting, using a computer, a region of interest including a portion related to a hand or a foot of a person from image data generated by imaging at least a part of the person, the method comprising:
a step of acquiring a score value map in which, for each unit region included in the image data, a score value related to the degree to which the unit region is included in the region of interest is associated with the position of the unit region in the image data;
a step of determining a plurality of detection start points based on a predetermined condition;
a step of detecting, based on the score value map, a plurality of candidate regions that are candidates for the region of interest, each associated with one of the detection start points; and
a step of determining, for each candidate region, whether the candidate region is a region of interest based on a relationship between the average of the score values associated with the unit regions included in the candidate region and a predetermined threshold.

An image processing program for causing a computer to detect a region of interest including a portion related to a hand or a foot of a person from image data generated by imaging at least a part of the person, the program causing the computer to execute:
a step of acquiring a score value map in which, for each unit region included in the image data, a score value related to the degree to which the unit region is included in the region of interest is associated with the position of the unit region in the image data;
a step of determining a plurality of detection start points based on a predetermined condition;
a step of detecting, based on the score value map, a plurality of candidate regions that are candidates for the region of interest, each associated with one of the detection start points; and
a step of determining, for each candidate region, whether the candidate region is a region of interest based on a relationship between the average of the score values associated with the unit regions included in the candidate region and a predetermined threshold.