JP6770363B2

JP6770363B2 - Face direction estimator and its program

Info

Publication number: JP6770363B2
Application number: JP2016154536A
Authority: JP
Inventors: 真介横澤; 高橋　正樹; 山内　結子
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2016-08-05
Filing date: 2016-08-05
Publication date: 2020-10-14
Anticipated expiration: 2036-08-05
Also published as: JP2018022416A

Description

The present invention relates to a face direction estimation device that estimates the face direction of a subject using a color histogram and other features, and to a program for the device.

Various methods have been proposed for estimating the face orientation of a person in video. When a soccer game is shot with a wide-angle fixed camera and the face images of the players are extracted from the video, the resolution of those face images is often low. Methods that handle such low-resolution face images commonly define the face direction as one of eight directions and classify it by pattern recognition.

As features extracted from face images, a method using iDF (Non-local Intensity Difference Feature), cDF (Non-local Color Different Feature), and IF (Intensity Feature) has been proposed (Non-Patent Document 1). A method using HOG (Histograms of Oriented Gradients) and CTC (Color Triplet Comparison) has also been proposed (Non-Patent Document 2).

Non-Patent Document 1: T. Siriteerakul, D. Sugimura and Y. Sato, “Head Pose Classification from Low Resolution Images Using Pairwise Non-Local Intensity and Color Differences”, Proc. Fourth Pacific-Rim Symposium on Image and Video Technology, pp.362-369 (Nov. 2010)
Non-Patent Document 2: B. Benfold and I. Reid, “Unsupervised learning of a scene-specific coarse gaze estimator”, Proc. 2011 International Conference on Computer Vision, pp.2344-2351 (Nov. 2011)

However, the methods described in Non-Patent Documents 1 and 2 use features with a large number of dimensions, so the processing load of learning and classification with those features is heavy. For this reason, it has been difficult to apply these methods to content that requires real-time performance, such as live soccer broadcasts.

Therefore, an object of the present invention is to provide a face direction estimation device, and a program for it, that can estimate the face direction with high accuracy in real time.

In view of the above problem, the face direction estimation device according to the present invention estimates the face direction of a subject from a face image of the subject using a color histogram and one or more types of second feature amounts different from the color histogram. The device comprises an image region dividing unit, a first feature amount calculation unit, a first identification unit, a second feature amount calculation unit, a second identification unit, and a face direction estimation unit.

With this configuration, the image region dividing unit of the face direction estimation device receives the face image and divides it into a plurality of regions. The first feature amount calculation unit then calculates a color histogram for each region and concatenates the per-region color histograms to obtain the color histogram of the entire face image.

Because the face direction estimation device divides the face image into regions to capture the positional information of the pixels and calculates a color histogram in each region, the number of feature dimensions can be kept small. Furthermore, because the color histogram is calculated per region, the calculated feature is less affected when the head position changes within the face image, when the resolution of the face image drops, or when noise is superimposed on the face image.

The first identification unit of the face direction estimation device uses a classifier trained on the color histograms of training data with different face directions to calculate, from the color histogram of the entire face image, a confidence for each face direction, that is, the probability that the subject is facing that direction.

The second feature amount calculation unit of the face direction estimation device calculates the second feature amount of the face image for each type of second feature amount. The second identification unit then calculates, for each type of second feature amount, the confidence from the second feature amount of the face image using a classifier trained on the second feature amounts of the training data. Finally, the face direction estimation unit estimates the face direction of the subject by integrating the confidences calculated for the color histogram and for each type of second feature amount.

Because the number of feature dimensions is small, the face direction estimation device reduces the processing load of learning and classification and can estimate the face direction of the subject in real time. In addition, because the device combines the color histogram with a second feature amount other than the color histogram, it can estimate the face direction of the subject with high accuracy.

The face direction estimation device according to the present invention can also be realized by a face direction estimation program that causes the hardware resources of a computer, such as a CPU, memory, and hard disk, to operate cooperatively as the means described above.

According to the present invention, the following excellent effects are obtained.
Because the face direction estimation device according to the present invention calculates a color histogram for each region of the face image, it is less affected by changes in the head position within the face image, by reduced resolution of the face image, and by superimposed noise, and the number of feature dimensions can be kept small. As a result, the face direction estimation device reduces the processing load of learning and classification and can estimate the face direction of the subject in real time. In addition, because the device combines the color histogram with a second feature amount other than the color histogram, it can estimate the face direction of the subject with high accuracy.

A schematic diagram showing the outline of the face direction estimation system according to the first embodiment of the present invention.
An explanatory diagram of the CG video synthesized by the face direction estimation system.
A block diagram showing the configuration of the face direction estimation device according to the first embodiment of the present invention.
(a) is an example of a face image extracted by the face image extraction unit, and (b) is an example of a normalized face image.
An example of a face image divided by the image region dividing unit.
(a) is an example of a region image, and (b) is an explanatory diagram of the calculation of the color histogram.
(a) is a diagram explaining the coordinate axes that serve as the reference for the face direction, and (b) is a diagram explaining the face directions.
An example of training data.
(a) is a diagram explaining the gradient intensity and gradient direction of the luminance, and (b) is a diagram explaining the luminance histogram.
A block diagram showing the configuration of the face direction estimation device according to the second embodiment of the present invention.
A flowchart showing the operation of the face direction estimation device in the learning mode.
A flowchart showing the operation of the face direction estimation device in the estimation mode.
A table showing the classification performance of Example 1, Reference Example 1, and Comparative Examples 1 to 3.
A table showing the calculation time of Example 1, Reference Example 1, and Comparative Examples 1 to 3.
The confusion matrix of Example 1.
The confusion matrix of Reference Example 1.
The confusion matrix of Comparative Example 1.
The confusion matrix of Comparative Example 2.
The confusion matrix of Comparative Example 3.

Hereinafter, each embodiment of the present invention will be described in detail with reference to the drawings as appropriate. In each embodiment, the same means are given the same reference numerals, and duplicate descriptions are omitted.

(First Embodiment)
[Outline of face direction estimation system]
The outline of the face direction estimation system 100 according to the first embodiment of the present invention will be described with reference to FIG. 1.
The face direction estimation system 100 estimates the face direction of a soccer player (subject) in real time and composites CG showing the estimated face direction onto the broadcast video. As shown in FIG. 1, the face direction estimation system 100 includes a first photographing unit C₁, a second photographing unit C₂, a face direction estimation device 1, and a CG synthesizer 2.

The first photographing unit C₁ is a camera that shoots the video used for face direction estimation. In the present embodiment, the first photographing unit C₁ is placed near the corner area 91 and shoots the soccer game at a wide angle so that multiple players can be captured at the same time. The first photographing unit C₁ is not particularly limited; for example, it is a fixed camera without pan, tilt, and zoom functions (PTZ functions).

The second photographing unit C₂ is a camera that shoots the soccer game video. In the present embodiment, the second photographing unit C₂ is placed near the center line 92 and shoots the soccer game under manual operation by a camera operator or under automatic control. The second photographing unit C₂ is not particularly limited; for example, it is a PTZ camera with PTZ functions.

The face direction estimation device 1 generates in advance a classifier for identifying the face direction of a soccer player. Using this classifier, the face direction estimation device 1 estimates the face direction of a soccer player from the video shot by the first photographing unit C₁. The details of the face direction estimation device 1 will be described later.

The CG synthesizer 2 composites CG indicating the face direction of the soccer player, as estimated by the face direction estimation device 1, onto the video shot by the second photographing unit C₂. For example, as shown in FIG. 2, the CG synthesizer 2 composites the CG of a fan-shaped marker α indicating the face direction of the soccer player onto the soccer game video.
As a result, the face direction estimation system 100 makes it easier for viewers to follow the players' movements and can provide more immersive sports video.

[Configuration of face direction estimation device]
The configuration of the face direction estimation device 1 according to the embodiment of the present invention will be described with reference to FIG. 3.
The face direction estimation device 1 estimates the face direction from the face image of a soccer player using a color histogram and one or more types of second feature amounts different from the color histogram. In the present embodiment, the face direction estimation device 1 uses HOG as the second feature amount. That is, the face direction estimation device 1 combines features with different characteristics: the color histogram, which is a color-related feature, and HOG, which is a shape-related feature.

As shown in FIG. 3, the face direction estimation device 1 includes a feature amount calculation device 3, a face image extraction unit 10, an image size normalization unit 11, a first identification unit 14, a second feature amount calculation unit 15, a second identification unit 16, an identification result integration unit (face direction estimation unit) 17, and an output unit 18.

The operator specifies the learning mode or the estimation mode for the face direction estimation device 1 via operating means such as a mouse or keyboard (not shown).
The learning mode is the mode in which the face direction estimation device 1 generates the classifiers. In the learning mode, the following parts of the face direction estimation device 1 operate: the feature amount calculation device 3, the image size normalization unit 11, the first identification unit 14, the second feature amount calculation unit 15, and the second identification unit 16.
The estimation mode is the mode in which the face direction estimation device 1 estimates the face direction of a soccer player. In the estimation mode, all means of the face direction estimation device 1 operate.

In the estimation mode, the face image extraction unit 10 extracts a face image from the video input from the first photographing unit C₁. For example, the face image extraction unit 10 applies subject tracking to the soccer game video to obtain the positions of the soccer players in the video (for example, Reference 1). The method described in Reference 1 models the movement of soccer players and tracks them with a particle filter.
Reference 1: Takuro Seino, Tetsuya Takiguchi, Yasuo Ariki, "Three-dimensional position estimation of ball and player in monocular motion image", 2009 IEICE General Conference (Information and System Lecture Proceedings 2), p. 213

When position information of the soccer players is provided from an external source (for example, Reference 2), the face image extraction unit 10 may use that position information instead.
Reference 2: ChyronHego, “TRACAB Optical Tracking”, URL <http://chyronhego.com/sports-data/tracab>

Next, the face image extraction unit 10 extracts a face image, which is an image of the face region of the soccer player, using the position of the soccer player as a reference. Because the first photographing unit C₁ shoots at a wide angle, this face image often has a low resolution. The resolution (size) of the face image also varies with the position of the soccer player in the video. In the example of FIG. 4(a), the face image is 15 pixels wide and 15 pixels high.

When the video contains multiple soccer players, the face image extraction unit 10 may extract the face images of all of them. In this case, the face direction estimation device 1 estimates the face directions of all soccer players extracted by the face image extraction unit 10.
Alternatively, the operator may use the operating means to specify the soccer player whose face direction is to be estimated. In this case, the face direction estimation device 1 estimates the face direction of the soccer player specified by the operator.

In the estimation mode, the image size normalization unit 11 normalizes the face image input from the face image extraction unit 10 to a preset size. For example, the image size normalization unit 11 normalizes the face image of FIG. 4(a) to a size of 20 × 20 pixels, as shown in FIG. 4(b).
In the learning mode, the image size normalization unit 11 normalizes the training data input by the operator in the same way as in the estimation mode. The details of the training data will be described later.
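
The size normalization can be illustrated with a minimal sketch. The text does not state which interpolation the image size normalization unit 11 uses, so nearest-neighbor sampling is assumed here purely for illustration; the function name `resize_nearest` and the synthetic 15 × 15 image are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grid of pixels with nearest-neighbor sampling.

    Assumption: the source does not name an interpolation method;
    nearest-neighbor is used only to keep the sketch short.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# A synthetic 15x15 face image with (R, G, B) pixels, as in FIG. 4(a),
# normalized to the 20x20 size of FIG. 4(b).
face = [[(x * 17, y * 17, 128) for x in range(15)] for y in range(15)]
normalized = resize_nearest(face, 20, 20)
print(len(normalized), len(normalized[0]))
```

Whatever interpolation is used in practice, the point of this step is that every face image reaches the later stages with the same fixed size.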

The feature amount calculation device 3 calculates the feature amount of the face image input from the image size normalization unit 11 using a color histogram. As shown in FIG. 3, the feature amount calculation device 3 includes an image region dividing unit 12 and a first feature amount calculation unit 13.

In the estimation mode, the image region dividing unit 12 divides the face image input from the image size normalization unit 11 into i × j regions (i is an integer of 2 or more representing the number of divisions in the vertical direction, and j is an integer of 2 or more representing the number of divisions in the horizontal direction). For example, as shown in FIG. 5, the image region dividing unit 12 divides the 20 × 20 pixel face image into four equal parts both vertically and horizontally, giving 16 regions (i = j = 4). Each region is therefore a 5 × 5 pixel image.
In the learning mode, the image region dividing unit 12 divides the training data input from the image size normalization unit 11 in the same way as in the estimation mode.
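
The region division described above can be sketched as follows. This is an illustrative sketch, not the patented implementation; the function name `split_into_regions` is hypothetical, and the sketch assumes the image size is an exact multiple of the division counts, as in the 20 × 20, i = j = 4 example.

```python
def split_into_regions(image, i, j):
    """Split a 2-D pixel grid into i rows x j columns of equal regions,
    listed from the upper left to the lower right."""
    h, w = len(image), len(image[0])
    if h % i or w % j:
        raise ValueError("image size must be divisible by the division counts")
    rh, rw = h // i, w // j
    regions = []
    for r in range(i):
        for c in range(j):
            regions.append([row[c * rw:(c + 1) * rw]
                            for row in image[r * rh:(r + 1) * rh]])
    return regions


# Each pixel stores its own (row, column) so the split is easy to inspect.
image = [[(y, x) for x in range(20)] for y in range(20)]
regions = split_into_regions(image, 4, 4)
print(len(regions), len(regions[0]), len(regions[0][0]))
```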

In the estimation mode, the first feature amount calculation unit 13 calculates a color histogram for each region of the face image input from the image region dividing unit 12 and concatenates the per-region color histograms to obtain the color histogram of the entire face image.
In the learning mode, the first feature amount calculation unit 13 obtains the color histogram of each entire training image input from the image region dividing unit 12 in the same way as in the estimation mode.

<Calculation of color histogram>
The calculation of the color histogram will be described below with reference to FIG. 6 (see also FIG. 3 as appropriate).
For the region image of FIG. 6(a), the first feature amount calculation unit 13 obtains a histogram of the pixel values (luminance values) in the image of each primary color. The region image of FIG. 6(a) corresponds to the upper-left region of the face image of FIG. 5.

First, from the region image of FIG. 6(a), the first feature amount calculation unit 13 generates an R image containing the red component, a G image containing the green component, and a B image containing the blue component. Then, as shown in FIG. 6(b), the first feature amount calculation unit 13 calculates a histogram of the pixel value distribution for each of the R, G, and B images.

For example, when the pixel values range from 0 to 255, the first feature amount calculation unit 13 divides this range into four equal groups: 0 to 63, 64 to 127, 128 to 191, and 192 to 255. For each of the R, G, and B images, the first feature amount calculation unit 13 then generates an array storing the number of pixels whose values fall in each group. For the R image, for example, it generates an array containing R[0] for the group 0 to 63, R[1] for the group 64 to 127, R[2] for the group 128 to 191, and R[3] for the group 192 to 255 (and likewise for the G and B images).

In this way, the first feature amount calculation unit 13 can calculate, for the region image of FIG. 6(a), a color histogram whose elements are R[0] to R[3], G[0] to G[3], and B[0] to B[3]. The first feature amount calculation unit 13 calculates the color histograms of the regions other than that of FIG. 6(a) in the same way. It then concatenates the color histograms of all region images, from the upper left to the lower right, to obtain the color histogram of the entire face image.
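
The per-region histogram and the concatenation can be sketched as below. This is a minimal illustration under the example settings in the text (four bins per channel, 16 regions of 5 × 5 pixels); the function names and the uniform test regions are hypothetical. With these settings each region contributes 3 × 4 = 12 values, so the concatenated feature has 16 × 12 = 192 dimensions.

```python
BINS = 4                    # groups 0-63, 64-127, 128-191, 192-255
BIN_WIDTH = 256 // BINS     # 64 pixel values per group


def region_histogram(region):
    """Return [R[0..3], G[0..3], B[0..3]] pixel counts for one region."""
    hist = [0] * (3 * BINS)
    for row in region:
        for pixel in row:
            for channel, value in enumerate(pixel):   # 0 = R, 1 = G, 2 = B
                hist[channel * BINS + value // BIN_WIDTH] += 1
    return hist


def face_histogram(regions):
    """Concatenate the region histograms in the given (row-major) order."""
    feature = []
    for region in regions:
        feature.extend(region_histogram(region))
    return feature


# Hypothetical input: 16 regions of 5x5 pixels, all with color (10, 100, 250).
regions = [[[(10, 100, 250)] * 5 for _ in range(5)] for _ in range(16)]
feature = face_histogram(regions)
print(len(feature))     # 192
print(feature[:12])     # [25, 0, 0, 0, 0, 25, 0, 0, 0, 0, 0, 25]
```

Keeping the regions in a fixed order is what preserves the coarse positional information: the same color counted in a different region lands in a different part of the feature vector.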

Returning to FIG. 3, the description of the configuration of the face direction estimation device 1 continues.
In the learning mode, the first identification unit 14 generates a classifier trained on the color histograms of training data with different face directions. In the estimation mode, the first identification unit 14 uses this classifier to calculate, from the color histogram of the entire face image, a confidence for each face direction, that is, the probability that the subject is facing that direction.

The machine learning method used by the first identification unit 14 is not particularly limited; for example, it uses a one-versus-rest multi-class SVM (Support Vector Machine). In the present embodiment, the face direction is defined in eight directions, so the first identification unit 14 uses an eight-class SVM.

An SVM introduces two concepts, support vectors and the margin, to define the boundary between one class and another. The support vectors are the data points of each class that lie closest to the separating hyperplane, and the distance from the support vectors to the separating hyperplane is called the margin.

Suppose that training samples of two classes are given in a two-dimensional feature space. In this case, the SVM places the separating hyperplane midway between the two classes so that the margin is maximized, and it classifies the samples of the two classes using the separating hyperplane as the boundary. A multi-class SVM performs multi-class classification by using multiple two-class SVMs.
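
The geometry described above can be illustrated with a small sketch. It does not train an SVM: the hyperplane parameters `w` and `b` below are hand-picked, hypothetical values, and the one-versus-rest rule shown (choose the class whose two-class decision value is largest) is the usual convention rather than something the text specifies.

```python
import math


def decision_value(w, b, x):
    """Value of w.x + b; its sign says on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0.
    For a support vector, this distance is the margin."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


def one_vs_rest_predict(classifiers, x):
    """classifiers maps a label to the (w, b) of a two-class 'label vs rest'
    separator; the label with the largest decision value is chosen."""
    return max(classifiers, key=lambda label: decision_value(*classifiers[label], x))


# Hypothetical, hand-picked separators in a 2-D feature space (not trained).
classifiers = {
    "0deg": ((1.0, 0.0), 0.0),
    "90deg": ((0.0, 1.0), 0.0),
    "180deg": ((-1.0, 0.0), 0.0),
}
print(one_vs_rest_predict(classifiers, (0.2, 0.9)))             # 90deg
print(distance_to_hyperplane((3.0, 4.0), -5.0, (3.0, 4.0)))     # 4.0
```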

In the present embodiment, as shown in FIG. 7(a), the center mark 93 is taken as the origin, the horizontal direction of the soccer court 90 (downward in the drawing) is the x-axis, and the vertical direction of the soccer court 90 (rightward in the drawing) is the y-axis. As shown in FIG. 7(b), the direction of the x-axis is 0°, and the face direction is defined as one of eight directions spaced 45° apart counterclockwise.
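
As an illustration of the eight-direction definition, the sketch below maps an arbitrary angle to the nearest of the eight classes. The nearest-direction rule and the function name `direction_class` are assumptions made for this example; the text only defines the eight directions themselves.

```python
import math


def direction_class(angle_deg):
    """Map an angle in degrees, measured counterclockwise from the x-axis,
    to the nearest of the eight directions 0, 45, ..., 315.

    Assumption: angles are assigned to the nearest direction; the source
    does not say how intermediate angles are labeled.
    """
    index = math.floor((angle_deg % 360) / 45 + 0.5) % 8
    return index * 45


for angle in (0, 50, 200, 350, -90):
    print(angle, "->", direction_class(angle))
```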

<Generation of classifier and calculation of confidence by classifier>
The generation of the classifier and the calculation of the confidence by the classifier are described below in order.
First, the training data needed to generate the classifier is prepared. The training data associates a teacher signal (annotation) indicating the face direction of a soccer player with the face image of that player. For example, as shown in FIG. 8, face images of soccer players facing the directions from 0° to 315° are prepared as training data.

Although FIG. 8 shows only one training image for each face direction, it is preferable to prepare multiple training images per direction to improve the classification accuracy.
The training data may be generated from video of an actual soccer game, or an existing data set may be used (for example, Reference 3).

Reference 3: S. A. Pettersen et al., “Soccer video and player position dataset”, Proc. of the 5th ACM Multimedia Systems Conference, pp.18-23, Mar. 2014. DOI: 10.1145/2557642.2563677

The operator sets the face direction estimation device 1 to the learning mode and inputs the training data to the image size normalization unit 11. The face direction estimation device 1 then normalizes the size of the training data and divides each training image into a plurality of regions. Next, the face direction estimation device 1 calculates and concatenates the color histograms of the regions to obtain the color histogram of each entire training image. Finally, the first identification unit 14 learns these color histograms with the multi-class SVM and generates the classifier.

Next, the operator sets the face direction estimation device 1 to the estimation mode and shoots the soccer game with the first photographing unit C₁. The face direction estimation device 1 then extracts a face image from the video of the first photographing unit C₁, normalizes the size of the face image, and divides the face image into a plurality of regions. Next, the face direction estimation device 1 calculates and concatenates the color histograms of the regions to obtain the color histogram of the entire face image. Finally, the first identification unit 14 inputs the color histogram of the entire face image to the classifier and obtains the calculated confidences from it.

Returning to FIG. 3, the description of the configuration of the face direction estimation device 1 continues.
In the estimation mode, the second feature amount calculation unit 15 calculates the HOG of the face image input from the image size normalization unit 11.
In the learning mode, the second feature amount calculation unit 15 calculates the HOG of the training data input from the image size normalization unit 11 in the same manner as in the estimation mode.

<Calculation of HOG>
The calculation of the HOG is described below with reference to FIG. 9 (see also FIG. 3 as appropriate).
The HOG is a histogram of the luminance gradient directions in local regions (cells) of the face image. As shown in FIG. 9(a), the entire face image is treated as one block, and each cell is 4 × 4 pixels. That is, one block has 5 × 5 cells.

First, the second feature amount calculation unit 15 obtains the luminance gradient intensity and gradient direction at every pixel of the face image in FIG. 9(a). In the cell of FIG. 9(a), the gradient intensity and gradient direction at each pixel are drawn as the shade and direction of a line segment. That is, the shade of each line segment indicates the luminance gradient intensity, and the direction of each line segment indicates the luminance gradient direction.

Next, as shown in FIG. 9(b), the second feature amount calculation unit 15 quantizes the luminance gradient direction of each cell into nine directions at 20° intervals between 0° and 180° and generates a histogram. In this histogram, the vertical axis is the luminance gradient intensity and the horizontal axis is the luminance gradient direction.
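The two steps above (per-pixel gradients, then a 9-bin direction histogram per 4 × 4 cell) can be sketched as follows. This is an illustrative sketch only: the function name `hog_cells`, the central-difference gradient, the border handling and the absence of block normalization are assumptions, since the text does not specify them.

```python
import math


def hog_cells(gray, cell=4, n_bins=9):
    """Per-cell histograms of unsigned gradient direction, weighted by magnitude.

    gray: list of rows of luminance values (for example a 20x20 face image).
    Returns the concatenated cell histograms.
    """
    height, width = len(gray), len(gray[0])
    cells_y, cells_x = height // cell, width // cell
    hist = [[[0.0] * n_bins for _ in range(cells_x)] for _ in range(cells_y)]
    bin_width = 180.0 / n_bins  # 20 degrees per bin
    for y in range(cells_y * cell):
        for x in range(cells_x * cell):
            # Central differences, clamped at the image border (an assumption).
            gx = gray[y][min(x + 1, width - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, height - 1)][x] - gray[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle // bin_width), n_bins - 1)
            hist[y // cell][x // cell][b] += magnitude
    return [v for row in hist for cell_hist in row for v in cell_hist]


# Brightness increases from left to right, so every gradient points at 0 degrees.
ramp = [[10 * x for x in range(20)] for _ in range(20)]
features = hog_cells(ramp)
print(len(features))  # 5 * 5 cells * 9 bins = 225
```

For the horizontal ramp, all gradient magnitude falls into the first direction bin of every cell, which is a quick sanity check of the binning.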

Returning to FIG. 3, the description of the configuration of the face direction estimation device 1 continues.
In the learning mode, the second identification unit 16 generates a classifier that has learned the HOG of the training data. In the estimation mode, the second identification unit 16 uses this classifier to calculate the reliability from the HOG of the face image.
The second identification unit 16 is the same as the first identification unit 14 except that it uses the HOG instead of the color histogram, so a detailed description is omitted.

In the estimation mode, the identification result integration unit 17 estimates the face direction of the subject by integrating the reliabilities calculated from the color histogram and from the HOG. Specifically, for each face direction, the identification result integration unit 17 multiplies the reliability calculated from the color histogram by the reliability calculated from the HOG, and estimates the face direction with the highest product as the face direction of the subject.

That is, the identification result integration unit 17 performs late fusion based on the reliabilities of the multi-class SVMs, as in equation (1) below. Here, p^h(X) is the reliability that the face image X belongs to the h-th class, that is, the identification result after integration. p^h_n(X) is the posterior probability that the n-th classifier assigns the face image X to the h-th class.
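Equation (1) is not reproduced in this text. Based on the description, in which the per-feature reliabilities are multiplied for each face direction, it can be reconstructed as the following product over the N classifiers. The typesetting is an assumption.

```latex
p^{h}(X) = \prod_{n=1}^{N} p^{h}_{n}(X) \qquad (1)
```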

Note that n is an integer indicating the feature type, with 1 ≤ n ≤ N, and N is the number of feature types used by the face direction estimation device 1. In the present embodiment, the first feature type is the color histogram and the second feature type is the HOG, so N = 2.

Since there are eight face directions, the face direction 0° is defined as class 1, 45° as class 2, 90° as class 3, 135° as class 4, 180° as class 5, 225° as class 6, 270° as class 7, and 315° as class 8. Here, h is an integer indicating the class, with 1 ≤ h ≤ H, and H is the number of classes defined in the face direction estimation device 1. In the present embodiment, eight classes are defined, so H = 8.
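The class definition above amounts to class h corresponding to (h − 1) × 360° / H. The sketch below expresses that mapping; both helper names are hypothetical, and `angle_to_class` (snapping an arbitrary angle to the nearest class) is an illustrative addition that the text does not describe.

```python
def class_to_angle(h, num_classes=8):
    """Face direction in degrees for the 1-based class index h."""
    if not 1 <= h <= num_classes:
        raise ValueError("class index out of range")
    return (h - 1) * 360.0 / num_classes


def angle_to_class(angle_deg, num_classes=8):
    """Nearest 1-based class index for an arbitrary angle in degrees."""
    step = 360.0 / num_classes
    return int(round((angle_deg % 360.0) / step)) % num_classes + 1


print(class_to_angle(6))    # 225.0
print(angle_to_class(350))  # 1, since 350 degrees is closest to 0 degrees
```

Passing `num_classes=4` or `num_classes=16` gives the corresponding mapping for a coarser or finer set of directions.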

In the present embodiment, N = 2 and H = 8, so the identification result integration unit 17 calculates equation (1-1) below. The identification result integration unit 17 then takes, as the estimation result, the face direction of the class with the highest value among the reliabilities p^1(X) to p^8(X).
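Equation (1-1) is likewise not reproduced in this text. With N = 2 and H = 8, the product of the two reliabilities and the selection of the highest class can be written as follows. This is a reconstruction from the description, and the symbol \hat{h} for the selected class is an assumption.

```latex
p^{h}(X) = p^{h}_{1}(X)\, p^{h}_{2}(X), \qquad h = 1, \dots, 8 \qquad (1\text{-}1)

\hat{h} = \operatorname*{arg\,max}_{1 \le h \le 8} \; p^{h}(X)
```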

For example, suppose that the first identification unit 14 calculates the posterior probability of class 1 as p^1_1(X) = 0.8, that of class 2 as p^2_1(X) = 0.4, ..., and that of class 8 as p^8_1(X) = 0.05. Suppose also that the second identification unit 16 calculates the posterior probability of class 1 as p^1_2(X) = 0.7, that of class 2 as p^2_2(X) = 0.5, ..., and that of class 8 as p^8_2(X) = 0.1.
To keep the explanation simple, the posterior probabilities of classes 3 to 7 are omitted.

In this case, the identification result integration unit 17 multiplies the class 1 reliability p^1_1(X) = 0.8 calculated from the color histogram by the class 1 reliability p^1_2(X) = 0.7 calculated from the HOG to obtain the class 1 reliability p^1(X) = 0.56. Likewise, it multiplies the class 2 reliability p^2_1(X) = 0.4 calculated from the color histogram by the class 2 reliability p^2_2(X) = 0.5 calculated from the HOG to obtain the class 2 reliability p^2(X) = 0.2. It also multiplies the class 8 reliability p^8_1(X) = 0.05 calculated from the color histogram by the class 8 reliability p^8_2(X) = 0.1 calculated from the HOG to obtain the class 8 reliability p^8(X) = 0.005. The identification result integration unit 17 then takes the face direction of class 1 (0°), which has the highest value among the reliabilities p^1(X) to p^8(X), as the estimation result.
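The arithmetic of this worked example can be checked with a few lines of Python. The variable names are illustrative; only the three classes whose values are stated in the text are included, since the values for classes 3 to 7 are omitted there.

```python
# Posterior probabilities from the worked example (classes 3 to 7 are omitted
# in the text, so they are omitted here as well).
color_hist = {1: 0.8, 2: 0.4, 8: 0.05}  # first identification unit 14
hog = {1: 0.7, 2: 0.5, 8: 0.1}          # second identification unit 16

# Late fusion: multiply the two reliabilities class by class.
fused = {h: color_hist[h] * hog[h] for h in color_hist}

# The class with the highest fused reliability is the estimation result.
best_class = max(fused, key=fused.get)
best_angle = (best_class - 1) * 45  # class h corresponds to (h - 1) * 45 degrees

print({h: round(p, 3) for h, p in fused.items()})
print(best_class, best_angle)  # 1 0
```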

The output unit 18 outputs the face direction estimated by the identification result integration unit 17 to the outside (for example, to the CG synthesizer 2). In the present embodiment, the output unit 18 outputs a numerical value representing the face direction as the estimation result.
The output unit 18 can output the face direction in any format, and may generate and output CG representing the face direction.

[Action / Effect]
As described above, the face direction estimation device 1 according to the first embodiment of the present invention divides the face image into regions in order to describe the position information of each pixel and calculates a color histogram in each region, so the number of feature dimensions can be made smaller than in the prior art (for example, because each RGB color has 4 bins, the color histogram has 12 dimensions in total). Furthermore, because the face direction estimation device 1 calculates the color histogram for each region of the face image, it is less affected by changes in head position within the face image, by reduced face image resolution, and by superimposed noise. As a result, the face direction estimation device 1 reduces the processing load of learning and identification and can estimate the face direction of a soccer player in real time.

Furthermore, the face direction estimation device 1 combines features with different characteristics, namely the color histogram, which is a color-related feature, and the HOG, which is a shape-related feature, so it can estimate the face direction of a soccer player with high accuracy.
The operation of the face direction estimation device 1 is described in the second embodiment.

(Second Embodiment)
[Configuration of face direction estimation device]
With reference to FIG. 10, the configuration of the face direction estimation device 1B according to the second embodiment of the present invention is described, focusing on the differences from the first embodiment.

The first embodiment was described as using two types of features, the color histogram and the HOG. The second embodiment differs from the first embodiment in that it uses N types of features in total: the color histogram and N−1 types of second features.

As shown in FIG. 10, the face direction estimation device 1B includes a feature amount calculation device 3, a face image extraction unit 10, an image size normalization unit 11, a first identification unit 14, second feature amount calculation units 15 (15_2 to 15_N), second identification units 16 (16_2 to 16_N), an identification result integration unit (face direction estimation unit) 17B, and an output unit 18.

That is, the face direction estimation device 1B includes one pair of a second feature amount calculation unit 15 and a second identification unit 16 for each type of second feature. In other words, the face direction estimation device 1B includes N−1 such pairs.

The types and number of features that the face direction estimation device 1B can combine are not particularly limited, but it is preferable to combine second features with different characteristics. Because the face direction estimation device 1B already uses a color-related feature (the color histogram), it is more preferable to combine it with second features that relate to something other than color.

For example, as in the first embodiment, the face direction estimation device 1B may use the shape-related HOG as the second type of feature. The face direction estimation device 1B may also use the edge-related EOG (Edge of Orientation Histogram) as the third type of feature. In addition, the face direction estimation device 1B may use features such as SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features). When SIFT or SURF is used, the face image has few pixels, so it is preferable that the face direction estimation device 1B does not extract feature points but instead describes the features on a fixed grid (dense sampling).

The second feature amount calculation units 15 (15_2 to 15_N) calculate, for each type of second feature, the second feature of the face image and of the training data input from the image size normalization unit 11. Specifically, the second feature amount calculation unit 15_2 calculates the first type of second feature from the face image and the training data. The second feature amount calculation unit 15_3 calculates the second type of second feature from the face image and the training data. Likewise, the second feature amount calculation unit 15_N calculates the (N−1)-th type of second feature from the face image and the training data.
The processing of the second feature amount calculation units 15 (15_2 to 15_N) in the learning mode and the estimation mode is the same as in the first embodiment, so further description is omitted.

In the learning mode, the second identification units 16 (16_2 to 16_N) generate, for each type of second feature, a classifier that has learned the second feature of the training data. In the estimation mode, the second identification units 16 (16_2 to 16_N) use these classifiers to calculate, for each type of second feature, the reliability from the second feature of the face image.

Specifically, the second identification unit 16_2 generates a classifier and calculates the reliability using the first type of second feature. The second identification unit 16_3 generates a classifier and calculates the reliability using the second type of second feature. Likewise, the second identification unit 16_N generates a classifier and calculates the reliability using the (N−1)-th type of second feature.
The processing of the second identification units 16 (16_2 to 16_N) in the learning mode and the estimation mode is the same as in the first embodiment, so further description is omitted.

In the estimation mode, the identification result integration unit 17B estimates the face direction of the subject by integrating the reliabilities calculated by the first identification unit 14 and the second identification units 16_2 to 16_N. Specifically, for each face direction, the identification result integration unit 17B multiplies together the reliabilities calculated from the color histogram and from each second feature, and estimates the face direction with the highest product as the face direction of the subject. That is, the identification result integration unit 17B calculates the reliability of each face direction by equation (1) above and takes the face direction of the class with the highest value as the estimation result.

[Operation of face direction estimation device: learning mode]
The operation of the face direction estimation device 1B in the learning mode is described with reference to FIG. 11 (see also FIG. 10 as appropriate). In the learning mode, the operator inputs a plurality of training data to the face direction estimation device 1B, and the face direction estimation device 1B learns the training data one item at a time.
In FIG. 11, the n-th type of feature is shown as feature (n) (the same applies to FIG. 12).

The image size normalization unit 11 normalizes the size of the training data (step S10).
The face direction estimation device 1B initializes to 1 the integer n, which indicates the feature type being processed (step S11).

The face direction estimation device 1B determines whether region division is necessary for the n-th type of feature. The features that require region division (for example, the color histogram) and those that do not (for example, the HOG) are set in advance, and the face direction estimation device 1B makes the determination based on these settings. When n = 1 (color histogram), the face direction estimation device 1B determines that region division is necessary. When n = 2 (HOG), it determines that region division is not necessary (step S12).

When region division is necessary (Yes in step S12), the image region division unit 12 divides the training data into i × j regions (step S13).
The first feature amount calculation unit 13 calculates a color histogram for each region of the training data. It then concatenates the color histograms of the regions to obtain the color histogram of the entire training data (step S14).
The first identification unit 14 generates a classifier that has learned the color histogram of the training data (step S15).

When region division is not necessary (No in step S12), the second feature amount calculation unit 15_n calculates the n-th type of feature of the training data (step S16).
The second identification unit 16_n generates a classifier that has learned the n-th type of feature of the training data (step S17).

The face direction estimation device 1B determines whether classifiers have been generated for all feature types by checking whether the integer n equals the number of feature types N (step S18).
When the integer n does not equal N (No in step S18), the face direction estimation device 1B increments the integer n (step S19) and returns to step S12.

When the integer n equals the number of feature types N (Yes in step S18), the face direction estimation device 1B determines whether learning of all training data has finished (step S20).
When learning of all training data has not finished (No in step S20), the face direction estimation device 1B returns to step S10 and learns the next training data.
When learning of all training data has finished (Yes in step S20), the face direction estimation device 1B ends the learning mode.
In this way, the learning mode allows the face direction estimation device 1B to generate the classifiers needed to estimate the face direction of a soccer player.
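The control flow of steps S10 to S20 can be sketched as below. This is a sketch under stated assumptions rather than the device's implementation: `learn`, `normalize`, `divide` and `fit` are hypothetical names standing in for the units and the SVM training, and feature vectors are collected for all training data before each classifier is fitted once, which is a simplification of the per-item flow in the flowchart.

```python
def learn(training_data, feature_types, normalize, divide, fit):
    """Return one classifier per feature type.

    training_data: iterable of (image, label) pairs.
    feature_types: list of (needs_division, extract) in the order n = 1..N.
    normalize, divide, fit: hypothetical callables (size normalization,
        region division, classifier training).
    """
    samples = [([], []) for _ in feature_types]  # (vectors, labels) per type
    for image, label in training_data:           # loop closed by S20
        image = normalize(image)                 # S10
        for n, (needs_division, extract) in enumerate(feature_types):  # S11, S18, S19
            source = divide(image) if needs_division else image        # S12, S13
            samples[n][0].append(extract(source))                      # S14 or S16
            samples[n][1].append(label)
    return [fit(vectors, labels) for vectors, labels in samples]       # S15 or S17


# Toy stand-ins so the control flow can be exercised without images or an SVM.
classifiers = learn(
    training_data=[("img_a", 1), ("img_b", 5)],
    feature_types=[
        (True, lambda regions: ("color_hist", len(regions))),  # n = 1
        (False, lambda image: ("hog", image)),                 # n = 2
    ],
    normalize=lambda image: image.upper(),
    divide=lambda image: [image] * 16,
    fit=lambda vectors, labels: {"n_samples": len(vectors), "labels": labels},
)
print(classifiers)
```

Keeping the "needs region division" flag next to each extractor mirrors the preset per-feature setting that step S12 consults.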

[Operation of face direction estimation device: estimation mode]
The operation of the face direction estimation device 1B in the estimation mode is described with reference to FIG. 12 (see also FIG. 10 as appropriate).

The face direction estimation device 1B receives the soccer match video shot by the first imaging unit C_1 (step S30).
The face image extraction unit 10 applies subject tracking to the video to obtain the position of a soccer player. It then extracts the face image of the soccer player based on that position (step S31).

In step S31, when the soccer match video contains a plurality of soccer players, the face image extraction unit 10 may extract the face images of all the soccer players or only the face image of a soccer player designated by the operator. In the estimation mode, the face direction estimation device 1B processes the face images one at a time.

The image size normalization unit 11 normalizes the size of the face image (step S32).
The face direction estimation device 1B initializes to 1 the integer n, which indicates the feature type being processed (step S33).
As in step S12 of FIG. 11, the face direction estimation device 1B determines whether region division is necessary for the n-th type of feature (step S34).

When region division is necessary (Yes in step S34), the image region division unit 12 divides the face image into i × j regions (step S35).
The first feature amount calculation unit 13 calculates a color histogram for each region of the face image. It then concatenates the color histograms of the regions to obtain the color histogram of the entire face image (step S36).
The first identification unit 14 uses the classifier that has learned the color histogram to calculate the reliability from the color histogram of the entire face image (step S37).

When region division is not necessary (No in step S34), the second feature amount calculation unit 15_n calculates the n-th type of feature of the face image (step S38).
The second identification unit 16_n uses the classifier that has learned the n-th type of feature to calculate the reliability from the n-th type of feature of the face image (step S39).

The face direction estimation device 1B determines whether reliabilities have been calculated for all feature types by checking whether the integer n equals the number of feature types N (step S40).
When the integer n does not equal N (No in step S40), the face direction estimation device 1B increments the integer n (step S41) and returns to step S34.

When the integer n equals the number of feature types N (Yes in step S40), the identification result integration unit 17B integrates the reliabilities of the first through n-th feature types (n = N at this point) and estimates the face direction (step S42).
The face direction estimation device 1B determines whether the face direction has been estimated for all face images (step S43).
When the face direction has not yet been estimated for all face images (No in step S43), the face direction estimation device 1B returns to step S32 and estimates the face direction of the next face image.

When the face direction has been estimated for all face images (Yes in step S43), the output unit 18 outputs the face directions of all face images estimated by the identification result integration unit 17B to the outside (for example, to the CG synthesizer 2) (step S44), and the estimation mode ends.
In this way, the estimation mode allows the face direction estimation device 1B to estimate the face direction of a soccer player.
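Steps S32 to S44 can be sketched in the same style. The names `estimate`, `normalize` and `divide` are hypothetical, each classifier is assumed to be a callable that returns H reliabilities for a feature vector, and the reliabilities in the usage example are made-up illustrative values with H = 4.

```python
import math


def estimate(face_images, feature_types, classifiers, normalize, divide):
    """Return one estimated class (1-based) per face image.

    feature_types: list of (needs_division, extract) in the order n = 1..N.
    classifiers:   classifiers[n](vector) is assumed to return H reliabilities.
    """
    results = []
    for image in face_images:                       # loop closed by S43
        image = normalize(image)                    # S32
        scores = []
        for (needs_division, extract), classify in zip(feature_types, classifiers):
            source = divide(image) if needs_division else image  # S34, S35
            scores.append(classify(extract(source)))             # S36-S37 or S38-S39
        # S42: multiply the N reliabilities class by class, then take the maximum.
        fused = [math.prod(column) for column in zip(*scores)]
        results.append(fused.index(max(fused)) + 1)
    return results                                  # handed to the output unit (S44)


results = estimate(
    face_images=["face_1"],
    feature_types=[(True, lambda regions: regions), (False, lambda image: image)],
    classifiers=[
        lambda vector: [0.1, 0.6, 0.2, 0.1],  # illustrative values only
        lambda vector: [0.2, 0.5, 0.2, 0.1],
    ],
    normalize=lambda image: image,
    divide=lambda image: [image] * 16,
)
print(results)  # [2]
```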

[Action / Effect]
The face direction estimation device 1B according to the second embodiment of the present invention calculates the color histogram for each region of the face image, so, as in the first embodiment, it reduces the number of feature dimensions and can estimate the face direction of a soccer player in real time. Furthermore, because the face direction estimation device 1B combines the color histogram with one or more arbitrary second features, it can estimate the face direction of a soccer player with high accuracy.

(Modification example)
Although the embodiments of the present invention have been described in detail above, the present invention is not limited to these embodiments and includes design changes and the like that do not depart from the gist of the present invention.
In each of the above embodiments, the face direction is identified in eight directions, but the present invention is not limited to this. For example, the face direction estimation device may estimate the face direction in 4 directions or 16 directions.

In each of the above embodiments, the face direction estimation device learns the classifiers in advance, but the present invention is not limited to this. For example, the face direction estimation device can use online learning to estimate the face direction in real time while learning the classifiers.

In each of the above embodiments, the face direction estimation device uses a one-versus-rest multi-class SVM, but the present invention is not limited to this. For example, the face direction estimation device may use other machine learning methods such as a random forest or a neural network.

In each of the above embodiments, the face direction estimation device estimates the face direction of a soccer player, but the present invention is not limited to this. For example, the face direction estimation device can estimate the face direction of players in video of sports other than soccer. The face direction estimation device may also estimate the face direction of a person in surveillance camera video.

In each of the above embodiments, the face direction estimation device is described as independent hardware, but the present invention is not limited to this. For example, the face direction estimation device can also be realized by a face direction estimation program that makes hardware resources of a computer, such as the CPU, memory, and hard disk, operate cooperatively as the means described above. This program may be distributed via a communication line, or may be written to and distributed on a recording medium such as a CD-ROM or flash memory.

In each of the above embodiments, the face direction estimation device includes the feature amount calculation device, but the present invention is not limited to this. That is, the feature amount calculation device can be used as independent hardware without being incorporated in the face direction estimation device.

As an example of the present invention, the results of an evaluation test of the face direction estimation device according to the present invention are described.
The face direction estimation program according to the present invention was installed on a computer and configured in the same way as the first embodiment. The computer had a "Core (registered trademark) i7-4790 3.60 GHz" CPU manufactured by Intel Corporation, 16 GB of RAM, and the "WINDOWS (registered trademark) 7 Pro SP1 64 bit" OS manufactured by Microsoft Corporation. The face direction estimation program was implemented as a single thread in a Python 3.5.1 environment. Hereinafter, the computer on which the face direction estimation program is installed is referred to as the face direction estimation device.

Soccer match video was used for the evaluation test of the face direction estimation device according to the present invention. The first photographing unit was a single "XA25" manufactured by Canon Inc. The first photographing unit was placed in the spectator seats near the center line and shot at an angle of view covering half of the soccer field. Taking the center mark as the origin (0, 0), the coordinates of the first photographing unit correspond to the spectator seats near (34, 0).

In the evaluation test, the correct labels (teacher signals) were entered manually, and a total of 800 samples were prepared, evenly distributed across the classes. Of the samples, 75% were used as training data and the remaining 25% as evaluation data (face images). For the HOG parameters, one cell was 4 × 4 pixels and one block was 5 × 5 cells. For the color histogram parameters, the number of region divisions was i = j = 4, and the number of bins was 4 for each of the R, G, and B colors. Generation of the classifier and estimation of the face direction were then repeated 50 times, and the estimation results were averaged. This is referred to as Example 1.
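As a rough illustration of the region-divided color histogram with the parameters above (i = j = 4 regions, 4 bins per RGB channel), the following Python sketch may help. It is not the implementation used in the test: the function name `region_color_histogram`, the nested-list image representation, the use of a separate histogram per channel (rather than a joint RGB histogram), and the per-region normalization are all assumptions made for illustration.

```python
def region_color_histogram(image, rows=4, cols=4, bins=4):
    """Concatenate per-region, per-channel RGB histograms into one feature vector.

    image: list of pixel rows, each a list of (r, g, b) tuples with values 0-255.
    rows, cols: number of region divisions (i and j in the text).
    bins: number of histogram bins per color channel.
    """
    height, width = len(image), len(image[0])
    bin_width = 256 // bins
    feature = []
    for i in range(rows):
        y0, y1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            x0, x1 = j * width // cols, (j + 1) * width // cols
            hist = [0] * (3 * bins)  # R bins, then G bins, then B bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    for c, value in enumerate(image[y][x]):
                        hist[c * bins + min(value // bin_width, bins - 1)] += 1
            # Assumption: normalize by the region's pixel count.
            n_pixels = max((y1 - y0) * (x1 - x0), 1)
            feature.extend(h / n_pixels for h in hist)
    return feature


# A 16 x 16 all-red dummy image standing in for a normalized face image.
face = [[(255, 0, 0)] * 16 for _ in range(16)]
feature = region_color_histogram(face)
print(len(feature))  # 4 * 4 regions * 3 channels * 4 bins = 192
```

Under these assumptions the feature vector has 192 dimensions. A joint RGB histogram with 4 bins per channel would instead give 4 × 4 × 64 = 1024 dimensions; the text does not state which form was used.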

An evaluation experiment was also conducted on the feature amount calculation device (the region-divided color histogram). This is referred to as Reference Example 1. In Reference Example 1, the evaluation conditions, such as the computer specifications, samples, feature amount parameters, and number of trials, were the same as in Example 1.

For comparison with Example 1, an evaluation experiment was conducted using a combination of iDF, cDF, and IF. The iDF and cDF parameter was set to 10000 pairs. This is referred to as Comparative Example 1. An evaluation experiment was conducted using a combination of HOG and CTC, and this is referred to as Comparative Example 2. Further, an evaluation experiment was conducted using HOG alone, and this is referred to as Comparative Example 3. The evaluation conditions of Comparative Examples 1 to 3 were the same as those of Examples 1 and 2.

In FIG. 13, "iDF+cDF+IF" is Comparative Example 1, "CTC+HOG" is Comparative Example 2, "HOG" is Comparative Example 3, "Color histograms" is Reference Example 1, and "Proposed" is Example 1 (the same applies to FIGS. 14 to 19).

FIG. 13 shows the accuracy (Accuracy), precision (Precision), recall (Recall), and F value (F-measure) as the identification performance of Example 1, Reference Example 1, and Comparative Examples 1 to 3. From FIG. 13, Example 1 exceeded Comparative Examples 1 to 3 in all items, showing that combining HOG with the color histogram yields good identification performance.
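One way the reliabilities from the color histogram classifier and the HOG classifier could be combined, following the per-direction multiplication described in the claims, is sketched below. The function name `integrate_reliabilities`, the dictionary representation, the eight 45° direction labels, and the numeric reliabilities are illustrative assumptions, not values from the evaluation test.

```python
def integrate_reliabilities(color_rel, hog_rel):
    """Multiply the two reliabilities per face direction and pick the largest product.

    color_rel, hog_rel: dicts mapping a face direction label to a reliability
    (the probability that the subject faces that direction).
    Returns (best_direction, combined), where combined holds the products.
    """
    if color_rel.keys() != hog_rel.keys():
        raise ValueError("both classifiers must score the same face directions")
    combined = {d: color_rel[d] * hog_rel[d] for d in color_rel}
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical reliabilities for eight face directions (degrees).
color_rel = {0: 0.30, 45: 0.40, 90: 0.10, 135: 0.05,
             180: 0.05, 225: 0.03, 270: 0.03, 315: 0.04}
hog_rel = {0: 0.50, 45: 0.20, 90: 0.10, 135: 0.05,
           180: 0.05, 225: 0.03, 270: 0.03, 315: 0.04}

best, combined = integrate_reliabilities(color_rel, hog_rel)
print(best, combined[best])
```

In this made-up case the color histogram alone would favor 45°, but the product favors 0° because the HOG classifier is much more confident in that direction.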

FIG. 14 shows the feature extraction time (Feature extraction), training time (Training), and identification time (Classifying) per sample as the calculation times of Example 1, Reference Example 1, and Comparative Examples 1 to 3. From FIG. 14, the total of the three times for Example 1 was about 3.3 ms, showing that processing can be performed in real time (equivalent to 29.97 fps).
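The real-time statement can be checked with simple arithmetic, sketched below in Python. The variable names are illustrative, and the estimate of how many face images fit into one frame interval is an inference that assumes sequential processing and ignores face extraction and any other overhead; it is not a figure given in the text.

```python
FRAME_RATE_FPS = 29.97   # frame rate cited in the text
PER_SAMPLE_MS = 3.3      # approximate total per sample for Example 1 (FIG. 14)

# Time available for one frame at the cited frame rate.
frame_budget_ms = 1000.0 / FRAME_RATE_FPS

# Assumption: samples are processed one after another within a frame interval.
faces_per_frame = int(frame_budget_ms // PER_SAMPLE_MS)

print("frame budget: {:.2f} ms".format(frame_budget_ms))
print("face images per frame interval:", faces_per_frame)
```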

FIGS. 15 to 19 show confusion matrices (Confusion matrix) as the identification results of Example 1, Reference Example 1, and Comparative Examples 1 to 3. In each confusion matrix, the vertical axis represents the face direction of the training data, and the horizontal axis represents the face direction of the evaluation data. The numerical values represent the number of identifications, and the shading represents the reliability. In these confusion matrices, the identification result can be said to be good when the entries on the diagonal from upper left to lower right have large identification counts and high reliability.
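To show how the four measures in FIG. 13 relate to a confusion matrix, the following Python sketch computes them from a small made-up matrix. It assumes that rows hold the correct face direction and columns the estimated one, and that precision, recall, and F value are macro-averaged over classes; the text does not state the averaging method, and the function name `metrics_from_confusion` and the 3-class matrix are hypothetical.

```python
def metrics_from_confusion(matrix):
    """Return accuracy and macro-averaged precision, recall and F value.

    matrix[t][p] is the number of samples whose correct class is t
    and whose estimated class is p (assumed orientation).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    precisions, recalls, f_values = [], [], []
    for k in range(n):
        tp = matrix[k][k]
        estimated = sum(matrix[t][k] for t in range(n))  # column sum
        actual = sum(matrix[k])                          # row sum
        p = tp / estimated if estimated else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_values.append(f)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_values) / n,
    }


# Made-up 3-class example; errors fall mostly on neighboring classes.
example = [[8, 2, 0],
           [1, 8, 1],
           [0, 2, 8]]
result = metrics_from_confusion(example)
print(result)
```

Large diagonal entries raise all four measures, which matches the reading of the confusion matrices given above.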

From FIGS. 15 to 19, Example 1 was found to have estimation accuracy comparable to that of Comparative Examples 1 to 3. In Example 1, misclassification occurred between adjacent classes. Besides the training and identification processes, a likely cause is the judgment of face direction during annotation. That is, there is no clear criterion for annotation and the face direction is judged subjectively by a person, so when, for example, the face direction appears to lie midway between 0° and 45°, it is difficult to decide which label to assign. Annotation is thus considered to be one cause of misclassification.

1, 1B Face direction estimation device
3 Feature amount calculation device
10 Face image extraction unit
11 Image size normalization unit
12 Image area dividing unit
13 First feature amount calculation unit
14 First identification unit
15, 15_2 to 15_N Second feature amount calculation unit
16, 16_2 to 16_N Second identification unit
17, 17B Identification result integration unit (face direction estimation unit)
18 Output unit

Claims

A face direction estimation device that estimates the face direction of a subject from a face image of the subject by using a color histogram and one or more types of second feature amounts different from the color histogram, the device comprising:
an image area dividing unit that receives the face image and divides the received face image into a plurality of regions;
a first feature amount calculation unit that calculates the color histogram for each region and concatenates the calculated per-region color histograms to obtain the color histogram of the entire face image;
a first identification unit that calculates, from the color histogram of the entire face image, a reliability, which is the probability that the subject is facing each face direction, by means of a classifier that has learned the color histograms of training data having different face directions;
a second feature amount calculation unit that calculates the second feature amount of the face image for each type of the second feature amount;
a second identification unit that calculates, for each type of the second feature amount, the reliability from the second feature amount of the face image by means of a classifier that has learned the second feature amount of the training data; and
a face direction estimation unit that estimates the face direction of the subject by integrating the reliabilities calculated for the color histogram and for each type of the second feature amount.

The face direction estimation device according to claim 1, wherein the second feature amount calculation unit calculates the HOG of the face image as the second feature amount, and
the second identification unit calculates the reliability from the HOG of the face image by means of a classifier that has learned the HOG of the training data.

The face direction estimation device according to claim 2, wherein the face direction estimation unit multiplies, for each face direction, the reliability calculated from the color histogram by the reliability calculated from the HOG, and determines the face direction having the highest resulting reliability to be the face direction of the subject.

The face direction estimation device according to any one of claims 1 to 3, further comprising:
a face image extraction unit that receives video of the subject and extracts the low-resolution face image from the received video; and
an image size normalization unit that normalizes the low-resolution face image to a preset size,
wherein the image area dividing unit divides the normalized face image into the plurality of regions.

A face direction estimation program for causing a computer to function as the face direction estimation device according to any one of claims 1 to 4.