JP6744536B1 - Eye-gaze imaging method and eye-gaze imaging system

Info

Publication number: JP6744536B1
Application number: JP2019200130A
Authority: JP
Inventors: 英太郎角田
Original assignee: 株式会社アップステアーズ
Priority date: 2019-11-01
Filing date: 2019-11-01
Publication date: 2020-08-19
Anticipated expiration: 2039-11-01
Also published as: JP2021072597A

Abstract

PROBLEM TO BE SOLVED: To provide a method and system that automatically capture, with higher probability, images in which the subject is looking at the camera. SOLUTION: An eye-gaze imaging system 1 includes a determination unit 3 that determines whether the subject of a captured image is looking at the camera, an image storage unit 4 that temporarily stores images in which the subject is looking at the camera, and a display unit 5 that displays those images. The determination unit 3 has a face detector 7 that predicts whether the target's face or a non-target face is present as an object in the input image and, if so, predicts its position. The determination unit 3 further has a gaze classifier 8 that determines whether the target face image input from the face detector 7 is looking at the camera. SELECTED DRAWING: FIG. 1

Description

The present invention relates to a method and system for automatically photographing people and animals such as cats and dogs, and recording images in which the subject is looking at the camera.

In general, it is not easy for a photographer who is not a professional to consistently capture, with good timing, a so-called "looking at the camera" image when photographing a person's face or figure. When the subject is an animal such as a cat or a dog, capturing such an image is even more difficult, even if the animal is the photographer's own pet.

For example, in the imaging device described in Patent Document 1, the shutter button is first operated to output a sound that draws the subject's attention. When the shutter button is operated again, still images of the subject are captured at predetermined intervals and stored. Image recognition is then performed on each stored still image to detect whether the subject is looking at the camera.

In that document, camera gaze is detected by one of three methods. The first method extracts the eye region from the still image by image recognition, detects its edge coordinates, and judges from a calculated iris ratio that expresses how far those edges lie from the center coordinates of the iris. The second method additionally extracts the face image from the still image by image recognition, detects its edge coordinates, calculates from the distance to the center coordinates of the nose (the center of the face) a ratio indicating how obliquely the face is turned, and judges from this result together with the iris ratio. The third method stores in advance, as sample data, iris-ratio data for camera gaze taken from the face images of many people, and judges by comparing it with data detected from the still image by image recognition.

In recent years, as computer performance has improved, deep learners called convolutional neural networks (CNNs) have been widely adopted for image recognition, which identifies objects appearing in an image (see, for example, Non-Patent Document 1). A CNN generally consists of multiple convolutional layers and an output layer. It extracts features from the input image over multiple stages and recognizes the object at the final output layer.

YOLO (You Only Look Once) is known as an object detection algorithm that uses a CNN (see, for example, Non-Patent Document 2). YOLO divides the image into a grid (for example, 7×7) and performs two tasks simultaneously in parallel: object classification, and object localization, which finds the position coordinates of a rectangular region (bounding box) enclosing the object. It then merges the two results to recognize the object.

Patent Document 1: JP 2007-96440 A

Non-Patent Document 1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", Advances In Neural Information Processing Systems, Vol.25, pp.1106-1114, 2012.
Non-Patent Document 2: Joseph Redmon, Ali Farhadi, "YOLOv3: An Incremental Improvement", Cornell University, Technical Report, 2018.

However, the imaging device of Patent Document 1 must attract the subject's attention with the sound output at the first operation of the shutter button in order to capture the still images used for image recognition. When the subject is an infant or baby, or an animal such as a cat or dog, a mere sound cannot be relied on to ensure that the captured still images include one in which the subject is looking at the camera.

In addition, the third method of Patent Document 1 requires a large volume of sample data representing camera-gaze images to be stored in ROM in advance for comparison with the captured still images. The larger this data volume, the lower the likelihood of false detection, but the longer the processing time, or the greater the need for a high-performance processor and large-capacity memory to achieve high-speed processing. Limiting the amount of sample data may allow detection to finish as soon as a still image is captured, but the likelihood of false detection rises.

The present invention has been made in view of the above problems of the prior art. Its object is to provide a method and device capable of automatically capturing, with higher probability, camera-gaze images of people, including infants and babies, and of animals such as cats and dogs.

A further object of the present invention is to provide a method and system for capturing and recording images of a subject looking at the camera that run at a processing speed fast enough for practical use, not only on personal computers and high-performance electronic devices with relatively ample resources, but also on mobile electronic devices such as smartphones and tablets, whose resources are often limited.

The eye-gaze imaging system of the present invention comprises:
an imaging unit that outputs captured images;
a determination unit that determines whether the target in an image input from the imaging unit is looking at the camera; and
a storage unit that stores images determined by the determination unit to show the target looking at the camera,
wherein the determination unit has a face detector that detects the target's face and non-target faces in the input image and predicts their positions, and
a gaze classifier that determines whether the target face image input from the face detector is looking at the camera,
the face detector has a learner that learns the target's face using image data of the target's face as training data,
the learner further implicitly learns objects other than the target's face and non-target faces as objects that are neither, and explicitly learns non-target faces, and
the system is characterized in that, when the learner learns the target's face, the face detector performs, with a preset probability and for the purpose of the implicit learning, an additional process in which one image showing the target's face is selected from the training data, the target's face is detected in it, and parameters are updated based on the detection error calculated from the detection result and the ground-truth data.

In one embodiment, the face detector creates a mask image of the target for a target image in the training data, uses the mask image to extract only the target portion from the original target image, and places that portion on a background image showing objects other than the target to generate a composite image. The learner is trained implicitly by using the composite image as additional training data.

In one embodiment, the gaze classifier assigns to the target face image determined by the face detector an evaluation value between 0 and 1 according to the degree of camera gaze, based on target values of 1 for a face looking fully at the camera and 0 for a face not looking at the camera at all. The image is determined to be looking at the camera when its evaluation value exceeds a preset threshold.

Another embodiment further comprises an image storage unit that temporarily stores images determined by the determination unit to be target face images looking at the camera, and a display unit that displays a list of the images stored in the image storage unit.

In one embodiment, the target's face is a cat's face and the non-target face is a human face.

The eye-gaze imaging method of the present invention includes: a step of capturing an image; a step of detecting the target's face or a non-target face in the captured image and predicting its detection region; a step of extracting the target face image from the images in which the target's face was detected and the images in which a non-target face was detected; a step of detecting camera gaze in the extracted target face image; and a step of storing the target face image in which camera gaze was detected. The method further includes: a step of learning the target's face using image data of the target's face as training data, and of implicitly learning objects other than the target's face and non-target faces as objects that are neither, in order to detect the target's face or a non-target face in the captured image; and a step of explicitly learning non-target faces, in order to extract the target face image from the images in which the target's face was detected and the images in which a non-target face was detected. The method is characterized in that, when the target's face is learned in the implicit learning step, an additional process is performed with a preset probability, in which one image showing the target's face is selected from the training data, the target's face is detected in it, and parameters are updated based on the detection error calculated from the detection result and the ground-truth data.

In one embodiment, the implicit learning step creates a mask image of the target for an image of the target's face that serves as training data for learning the target's face, uses the mask image to extract only the target portion from the original target image, places that portion on a background image showing objects other than the target to generate a composite image, and uses the composite image as additional training data.

According to the eye-gaze imaging system and method of the present invention, images of a target such as a cat or other animal, or a person, looking at the camera can be captured and recorded at a processing speed fast enough for practical use, even on mobile electronic devices with limited resources.

FIG. 1 is a block diagram showing the configuration of the eye-gaze imaging system of the present invention.
FIG. 2 is an explanatory diagram of an image used as training data for the face detector.
FIG. 3 is an explanatory diagram of an image used as training data for the face detector.
FIG. 4(a) is a cat image used as training data for the face detector, and FIG. 4(b) is its mask image.
FIG. 5(a) is a background image used as training data for the face detector, and FIG. 5(b) is a composite image in which a cat's face is superimposed on it.
FIG. 6(a) is a face image of a cat looking at the camera (target value 1), and FIG. 6(b) is a face image of a cat not looking at the camera (target value 0).
FIG. 7 is a block diagram showing the overall configuration of an eye-gaze imaging device to which the eye-gaze imaging system is applied.
FIG. 8 is a flowchart showing the process of the eye-gaze imaging method according to the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. This embodiment describes an eye-gaze imaging method, device, and system that recognize a cat as the subject and automatically capture and store images of it looking at the camera.

FIG. 1 schematically shows the configuration of the eye-gaze imaging system 1 of this embodiment. The eye-gaze imaging system 1 includes an input unit 2 that receives captured images of the subject, a determination unit 3 that determines whether the subject of an input image is looking at the camera, an image storage unit 4 that temporarily stores images in which the subject is looking at the camera, and a display unit 5 that displays those images.

In this embodiment, the subject is captured as video, which is input to the input unit 2. The input unit 2 extracts frame images from the input video at a preset fixed interval and sends them to the determination unit 3 in sequence. In another embodiment, the subject can be captured as still images. In that case, the input unit 2 sends the input still images to the determination unit 3 one at a time in sequence.
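The frame extraction described above can be sketched as follows. This is only an illustration: the function name `sample_frames` is hypothetical, and the interval is assumed here to be counted in frames, whereas the text says only that a preset fixed interval is used.

```python
def sample_frames(frames, interval):
    """Return every `interval`-th frame of a video, starting with the first.

    `frames` is any iterable of frame images; `interval` is assumed to be
    a frame count (the source does not say whether it is frames or time).
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return (frame for index, frame in enumerate(frames) if index % interval == 0)


# Stand-in "video" of ten frames, sampled every third frame.
picked = list(sample_frames(range(10), 3))
```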

The determination unit 3 has a face detector 7 and a gaze classifier 8. The face detector 7 predicts whether a cat's or a person's face is present as an object in the frame image input from the input unit 2 and, if so, predicts its position. If it predicts that no cat's face is present, the frame image is discarded. In this embodiment, the face detector 7 uses YOLOv3[1], an extension of the object detection algorithm YOLO described above.

Specifically, as illustrated in FIG. 2, YOLOv3[1] divides the input image I into a g×g grid G and, for each cell of the grid G, predicts multiple sets of the following seven values.
・X: X coordinate of the object's center point
・Y: Y coordinate of the object's center point
・W: width of the object
・H: height of the object
・O: probability that the object's center lies within the cell
・C1: probability that the object is a cat's face
・C2: probability that the object is a human face

The face detector 7 generates multiple sets of the seven values (X, Y, W, H, O, C1, C2). Of these sets, it adopts only the predictions for which the probability that a cat's face is present, O×C1, exceeds a preset threshold, and discards the rest.
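The filtering step above can be illustrated with a short sketch. The `Prediction` type, the function name, and the threshold of 0.5 are assumptions made for the example; the text says only that a preset threshold is used.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    x: float   # X coordinate of the object's center
    y: float   # Y coordinate of the object's center
    w: float   # width of the object
    h: float   # height of the object
    o: float   # probability that the object's center is in the cell
    c1: float  # probability that the object is a cat's face
    c2: float  # probability that the object is a human face


def keep_cat_faces(predictions, threshold=0.5):
    """Keep predictions whose cat-face probability O * C1 exceeds the threshold."""
    return [p for p in predictions if p.o * p.c1 > threshold]


preds = [
    Prediction(0.5, 0.5, 0.2, 0.2, 0.9, 0.8, 0.1),  # O*C1 is about 0.72: kept
    Prediction(0.3, 0.3, 0.1, 0.1, 0.9, 0.2, 0.7),  # O*C1 is about 0.18: discarded
    Prediction(0.7, 0.2, 0.1, 0.1, 0.3, 0.9, 0.0),  # O*C1 is about 0.27: discarded
]
kept = keep_cat_faces(preds)
```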

It is also undesirable for the predictions to include multiple overlapping regions detected for a single target in one image. In this embodiment, an algorithm called Non-Maximum Suppression (see Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", University of California, Berkeley, and the International Computer Science Institute, CVPR 2014) is used to remove overlapping regions from the detected regions. As a result, as shown in FIG. 3, only one region B enclosing the face is left for each target in the input image I.
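A common greedy form of Non-Maximum Suppression is sketched below to show how duplicate regions are removed. It is not taken from the patent: the box format (center X, center Y, width, height), the intersection-over-union overlap measure, and the threshold of 0.5 are assumptions for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs.

    Visits detections from highest to lowest score and keeps one only if it
    does not overlap an already kept box by more than `iou_threshold`.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0.50, 0.50, 0.20, 0.20), 0.9),  # best box for the first face
    ((0.52, 0.50, 0.20, 0.20), 0.8),  # near-duplicate of the first box
    ((0.10, 0.10, 0.10, 0.10), 0.7),  # a separate region
]
result = non_max_suppression(detections)
```

With these inputs the second box overlaps the first heavily and is dropped, so one region remains per face.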

The face detector 7 has a learner 9 to reduce the probability of misrecognizing objects other than cat and human faces in the input image (for example, walls or sofas) as a cat's or a person's face. The learner 9 implicitly learns objects other than cat and human faces. Here, "implicitly" means learning that a detected object other than a cat or human face is, whatever it specifically is, at least not a detection target (a cat's face or a human face).

It is of course technically possible to reduce false detections by explicitly detecting every object that might be encountered when photographing a cat and then ignoring it. In general, however, an object detector's detection capacity has an upper limit, and the more detection targets are added, the lower the detection accuracy for each object becomes.

The drop in detection accuracy can be limited by increasing the number of parameters of the object detector. Doing so, however, increases processing time correspondingly and makes real-time detection from images being captured difficult. In particular, mobile electronic devices with relatively limited resources, such as smartphones, cannot always rely on GPU support, which makes real-time detection harder still.

The implicit learning of objects other than cat and human faces in the learner 9 of this embodiment is described in detail below. The face detector 7 learns the texture pattern of a cat's face in advance, using a large number of images of cat faces as training data. For the implicit learning by the learner 9, image segmentation is performed beforehand on the cat image data used as that training data to create a large number of cat mask images. FIG. 4(a) shows an original cat image I1 used as training data, and FIG. 4(b) shows the cat mask image I2 that represents it as a black-and-white binary image.

For the implicit learning, the following additional process is also performed when the learner 9 learns the cat's face. In learning the cat's face, the learner 9 performs a series of training steps: it selects one image showing a cat from the training data, performs cat face detection, calculates the detection error from the detection result and the ground-truth data (the position of the cat's face), and updates the parameters based on that error.

The additional process is performed with a preset probability each time an image showing a cat is selected. In this embodiment a probability of 50% was chosen, but the value can be set according to various conditions, such as the accuracy required of the additional process and the hardware used to run the eye-gaze imaging system of the present invention.
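One possible reading of this probabilistic scheme is sketched below. The names `train_epoch`, `train_step`, and `make_composite` are hypothetical, and the sketch assumes that the additional process is an extra training step on a composite version of the selected image; the text does not spell out the exact control flow.

```python
import random


def train_epoch(cat_samples, train_step, make_composite, p_extra=0.5, rng=None):
    """Run one pass over the cat images and return the number of extra steps.

    train_step(sample) stands for the ordinary sequence: detect the face,
    compare with the ground truth, and update the parameters.
    make_composite(sample) stands for building the composite training image.
    """
    rng = rng or random.Random()
    extra_steps = 0
    for sample in cat_samples:
        train_step(sample)
        # Additional process, performed with the preset probability.
        if rng.random() < p_extra:
            train_step(make_composite(sample))
            extra_steps += 1
    return extra_steps


# Stand-in callables that only record what would be trained on.
calls = []
extra = train_epoch(["a", "b", "c", "d"], calls.append,
                    lambda s: s + "+background", rng=random.Random(0))
```

Raising `p_extra` exposes the detector to more background variety at the cost of more training steps, which is the kind of trade-off the paragraph above leaves to the implementer.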

Before the additional process, multiple background images BI showing objects other than cats and people are prepared, as illustrated in FIG. 5(a). Next, the cat mask image is used to extract only the cat portion from the original cat image. From this cat-only image, only the cat's face is extracted, using the ground-truth data, that is, the position of the cat's face in the original cat image. One background image is selected at random from the prepared background images, and the extracted cat face image is placed on it. In this way, multiple composite images CI, as illustrated in FIG. 5(b), are generated.
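The compositing procedure above can be sketched with small nested lists standing in for images. Everything named here is hypothetical, and two details are assumptions that the text leaves open: the face is pasted at a random position, and the new face position is returned as the ground truth for the composite.

```python
import random


def crop(image, box):
    """Return the sub-image for box = (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def composite(cat_image, cat_mask, face_box, backgrounds, rng=None):
    """Paste the masked cat-face pixels onto a randomly chosen background.

    Returns the composite image and the pasted face box.
    """
    rng = rng or random.Random()
    face = crop(cat_image, face_box)
    mask = crop(cat_mask, face_box)
    out = [row[:] for row in rng.choice(backgrounds)]  # copy, keep the original
    height, width = len(face), len(face[0])
    top = rng.randint(0, len(out) - height)
    left = rng.randint(0, len(out[0]) - width)
    for r in range(height):
        for c in range(width):
            if mask[r][c]:  # copy only pixels that belong to the cat
                out[top + r][left + c] = face[r][c]
    return out, (top, left, height, width)


cat_image = [[1, 1, 1, 1], [1, 7, 8, 1], [1, 9, 6, 1], [1, 1, 1, 1]]
cat_mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
background = [[0] * 6 for _ in range(6)]
image, box = composite(cat_image, cat_mask, (1, 1, 2, 2), [background],
                       rng=random.Random(1))
```

In the example the pixel with value 6 lies inside the face box but outside the mask, so it is not copied and the background shows through at that position.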

In this embodiment, the learner 9 is trained using the composite images obtained in this way as additional training data. This allowed the various objects other than cat and human faces that appear in the background images to be learned implicitly, and reduced false detections by the learner 9.

The learner 9 was also trained explicitly on the texture pattern of a human face. This reduced false detection of objects other than cat and human faces and, at the same time, eliminated the false detection of human faces as cat faces.

The gaze classifier 8 is a two-class image classifier that uses a CNN. It is configured to determine whether a cat face image input from the face detector 7 shows a cat looking at the camera or a cat not looking at the camera. In this embodiment, a target value of 1 is assigned in advance to cat face images judged to be looking fully at the camera, and a target value of 0 to cat face images judged not to be looking at the camera at all. For example, FIG. 6(a) is judged to be looking fully at the camera, and FIG. 6(b) is judged not to be looking at the camera at all. Other cat face images are assigned target values between 0 and 1 according to the degree of camera gaze.

The assessment of camera gaze and the determination and assignment of target values for the cat face images are carried out by the person who trains the gaze classifier 8. The gaze classifier 8 is trained using a large number of cat face images labeled with target values in this way as training data. Based on this training, the gaze classifier 8 assesses the degree of camera gaze of a cat face image input from the face detector 7 and outputs an evaluation value between 0 and 1, on the same scale as the target values. A threshold is set in advance on the evaluation value for determining that a cat face image is looking at the camera. When the evaluation value output by the gaze classifier 8 exceeds the threshold, the determination unit 3 determines that the cat face image is looking at the camera.
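The final decision reduces to a comparison, sketched below. The function name and the threshold of 0.8 are assumptions for the example; the text says only that a threshold is set in advance and that the evaluation value must exceed it.

```python
def is_looking_at_camera(evaluation_value, threshold=0.8):
    """Return True when the evaluation value strictly exceeds the threshold."""
    if not 0.0 <= evaluation_value <= 1.0:
        raise ValueError("evaluation value must lie between 0 and 1")
    return evaluation_value > threshold


# Evaluation values as the gaze classifier might output them.
decisions = [is_looking_at_camera(v) for v in (0.95, 0.80, 0.30)]
```

A value exactly equal to the threshold is rejected here, matching the wording "exceeds".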

The image storage unit 4 temporarily stores, in sequence, the camera-gaze cat face images output from the determination unit 3. When the user inputs an instruction to end the process in which the eye-gaze imaging system 1 automatically captures and stores camera-gaze images of the cat, the image data for the list of temporarily stored camera-gaze cat face images is sent to the display unit 5.

The determination unit 3 also outputs each cat face image determined to be looking at the camera to the image storage unit 4 together with a score reflecting its degree of camera gaze. In this embodiment, the evaluation value that the gaze classifier 8 assigned, based on the target values of the image data used as training data in the machine learning of camera gaze described above, can be used directly as the score. Alternatively, the evaluation values assigned by the gaze classifier 8 can be arranged in descending order and each image's position in that order used as its score.

The display unit 5 displays the list of camera-gaze cat face images output from the image storage unit 4. The display unit 5 can also sort the list in descending order of the score assigned to each image. The user can select preferred camera-gaze cat face images from the list. The cat face images selected by the user can be output from the image storage unit 4 to a storage unit (not shown) of the eye-gaze imaging system 1 or to external storage and saved there.
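The two scoring options and the sorted display can be sketched as follows. The `StoredImage` type and both function names are hypothetical, introduced only to make the ordering concrete.

```python
from dataclasses import dataclass


@dataclass
class StoredImage:
    name: str
    evaluation_value: float  # value from the gaze classifier, between 0 and 1


def sorted_for_display(images):
    """Order images from highest to lowest evaluation value."""
    return sorted(images, key=lambda im: im.evaluation_value, reverse=True)


def rank_scores(images):
    """Alternative score: position in descending order, with 1 as the best."""
    ordered = sorted_for_display(images)
    return {im.name: rank for rank, im in enumerate(ordered, start=1)}


stored = [
    StoredImage("frame_012", 0.84),
    StoredImage("frame_030", 0.97),
    StoredImage("frame_045", 0.91),
]
display_order = [im.name for im in sorted_for_display(stored)]
ranks = rank_scores(stored)
```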

The eye-gaze imaging system 1 can be applied to, for example, mobile electronic devices such as smartphones, or to personal computers. FIG. 7 shows the configuration of the eye-gaze imaging device 11 of this embodiment, which is a smartphone to which the eye-gaze imaging system 1 is applied.

The eye-gaze imaging device 11 includes an imaging device 12 consisting of a camera that captures moving and/or still images of a subject; a storage device 13 that stores the captured image data; an arithmetic processing unit 14 consisting of a GPU (Graphics Processing Unit) that processes the image data; a control unit 15 consisting of a CPU (Central Processing Unit) that controls the operation of the imaging device, the storage device, and the arithmetic processing unit; an input device 16; and a display device 17. The eye-gaze imaging program that executes the eye-gaze imaging method of this embodiment on the eye-gaze imaging device 11 is stored in a ROM (not shown) of the arithmetic processing unit 14 or in the storage device 13.

The input device 16 is an input device with which a smartphone or similar device is ordinarily equipped, such as a keyboard, mouse, touch panel, or numeric keypad, and is operated by the user. The display device 17 is a display, such as a liquid crystal panel, with which a smartphone or similar device is ordinarily equipped. The eye-gaze imaging device 11 can further include a communication device (not shown) for bidirectional wireless or wired communication of image data and other data with external equipment. Through the communication device, the eye-gaze imaging device 11 can store image data in an external storage device.

FIG. 8 is a flowchart of the processing performed by the eye-gaze imaging method of this embodiment using the eye-gaze imaging device 11 of FIG. 7. When the user starts the eye-gaze imaging program via the input device 16, the control unit 15 reads the program from the ROM or the storage device 13 and executes it. First, in step S1, the control unit 15 controls the imaging device 12 to start capturing and recording video of the subject (a cat).

Next, the control unit 15 acquires frames at a preset fixed interval from the image data (video) continuously output from the imaging device 12, and outputs them to the face detector 7 of the determination unit 3, which is implemented by the arithmetic processing unit 14 (step S2). At the same time, the control unit 15 has the acquired frames continuously output to and stored in the storage device 13. The video output from the imaging device 12 can also be stored in the storage device 13 as is.
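Fixed-interval frame acquisition in step S2 could look like the following Python sketch. It assumes the interval is counted in frames (the specification does not say whether the interval is measured in frames or in time), and `sample_frames` is a hypothetical name.

```python
def sample_frames(frames, interval):
    """Yield (index, frame) for every `interval`-th frame of a video stream.

    `frames` is any iterable of frames; `interval` is a positive frame count.
    Counting in frames rather than seconds is an assumption of this sketch.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    for index, frame in enumerate(frames):
        if index % interval == 0:
            yield index, frame


# Usage: letters stand in for video frames; every third frame is kept.
picked = list(sample_frames("abcdefg", 3))
```

Writing the sampler as a generator lets later stages pull frames one at a time while the camera keeps recording, which matches the "wait for the next frame" behaviour in the flow.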

The face detector 7 runs the cat face detection process described above on each input frame, using the arithmetic processing unit 14. If no cat face is detected in the frame (No in step S3), the frame is discarded and the face detector 7 waits for the next frame. If a cat face is detected in the frame (Yes in step S3), the face detector 7 extracts the detection region (B in FIG. 4) from the frame and sends it to the eye-gaze classifier 8.

The eye-gaze classifier 8 runs the process described above for detecting camera gaze from a cat's face on the detection region sent from the face detector 7, using the arithmetic processing unit 14. If no camera gaze is recognized in the detection region (No in step S4), the frame is discarded and the eye-gaze classifier 8 waits for the next frame. If camera gaze is recognized in the detection region (Yes in step S4), the control unit 15 temporarily saves the frame as a camera-gaze image of the cat in the image storage unit 5, which is implemented by the storage device 13 (step S5).
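The two-stage decision in steps S3 and S4 can be sketched in Python as follows. The sketch assumes, as the claims state, that camera gaze is accepted when the evaluation value exceeds a preset threshold. The callables `detect_face` and `rate_gaze` are hypothetical stand-ins for the trained face detector 7 and eye-gaze classifier 8, and the box layout and the default threshold of 0.5 are placeholders, not values from the specification.

```python
def crop(frame, box):
    """Cut the detection region out of a frame stored as a list of pixel rows."""
    top, left, bottom, right = box  # assumed box layout, in pixels
    return [row[left:right] for row in frame[top:bottom]]


def judge_frame(frame, detect_face, rate_gaze, threshold=0.5):
    """Return the evaluation value if the frame is a camera-gaze image, else None.

    detect_face(frame) -> box or None   (stands in for face detector 7)
    rate_gaze(face)    -> float in 0..1 (stands in for eye-gaze classifier 8)
    The threshold of 0.5 is an arbitrary placeholder.
    """
    box = detect_face(frame)             # step S3: is there a cat face?
    if box is None:
        return None                      # No in S3: discard the frame
    value = rate_gaze(crop(frame, box))  # step S4: rate the cropped face
    if value > threshold:
        return value                     # Yes in S4: keep the frame
    return None                          # No in S4: discard the frame


# Usage with stub models on a 3x3 "image".
frame = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
result = judge_frame(frame, lambda f: (0, 1, 2, 3), lambda face: 0.9)
```

Returning the evaluation value rather than a plain boolean keeps the score available for storing alongside the saved frame.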

The eye-gaze imaging program provides an end button for ending capture and recording by the imaging device 12. After step S5, the control unit 15 checks whether the user has operated the input device 16 to press the end button (step S6). If the end button has not been pressed, the process returns to step S2 and the series of steps S2 to S6 is repeated.

If the end button was pressed in step S6, the control unit 15 displays the list of camera-gaze images of the cat temporarily saved in the image storage unit 5 on the display device 17, which constitutes the display unit 6 (step S7). When the user selects desired images from the list shown on the display device 17, the control unit 15 stores the selected camera-gaze images in the storage device 13. In another embodiment, the images selected by the user can be transmitted via the communication device of the eye-gaze imaging device 11 and saved in an external storage device.
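The overall loop of steps S2 to S7 can be summarized in a short Python sketch. All names are hypothetical: `judge` stands for the combined face detection and gaze classification and returns a score or `None`, and `end_requested` stands for polling the end button. As a simplification, the sketch polls the end button after every frame, including discarded ones.

```python
def run_session(frames, judge, end_requested):
    """Collect camera-gaze frames until the user ends the session.

    frames          iterable of sampled frames (step S2)
    judge(frame)    returns a score, or None to discard (steps S3 and S4)
    end_requested() returns True once the end button is pressed (step S6)
    Returns (frame, score) pairs sorted by score, highest first (step S7).
    """
    kept = []
    for frame in frames:
        score = judge(frame)
        if score is not None:
            kept.append((frame, score))  # step S5: temporary storage
        if end_requested():
            break
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Usage with stub inputs: the end button is "pressed" after the third frame.
scores = {"a": 0.6, "b": None, "c": 0.9, "d": 0.7}
presses = iter([False, False, True])
gallery = run_session(["a", "b", "c", "d"], scores.get, lambda: next(presses))
```

The returned list corresponds to what would be shown for selection; saving the user's picks to permanent or external storage is left out of the sketch.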

The present invention has been described above in connection with preferred embodiments, but it is not limited to those embodiments and can be practiced with various changes or modifications within its technical scope. For example, although the above embodiment describes capturing and recording camera-gaze images of a cat, the present invention applies equally when the target is an animal other than a cat or a person (for example, a baby or a small child).

1 Eye-gaze imaging system
2 Input unit
3 Determination unit
4 Image storage unit
5 Display unit
7 Face detector
8 Eye-gaze classifier
9 Learner

Claims

An eye-gaze imaging system comprising:
an imaging unit that outputs a captured image;
a determination unit that determines whether a target in the image input from the imaging unit is looking at the camera; and
a storage unit that stores the image determined by the determination unit to be a camera-gaze image, wherein
the determination unit has a face detector that detects a target face and a non-target face in the input image and predicts their positions,
and an eye-gaze classifier that determines whether the target face image input from the face detector is looking at the camera,
the face detector includes a learner that learns the target face using image data of the target face as learning data,
the learner further implicitly learns objects other than the target face and the non-target face as objects that are neither the target face nor the non-target face, and explicitly learns the non-target face, and
to learn the target face using the image data of the target face as learning data, the face detector performs, for the implicit learning and with a preset probability while the learner is learning the target face, an additional process of selecting one image showing the target face from the learning data, detecting the target face, and updating parameters based on a detection error calculated from the detection result and the correct-answer data.

The eye-gaze imaging system according to claim 1, wherein the face detector creates a mask image of the target for a target image in the learning data, uses the mask image to extract only the target portion from the original target image, generates a composite image by placing the extracted portion on a background image that includes objects other than the target, and causes the learner to perform the implicit learning using the composite image as additional learning data.

The eye-gaze imaging system according to claim 1 or 2, wherein the eye-gaze classifier assigns to the target face image determined by the face detector an evaluation value between 1 and 0 according to the degree of camera gaze, based on target values of 1 for a complete camera gaze and 0 for no camera gaze, and determines that the image shows camera gaze when the evaluation value exceeds a preset threshold.

The eye-gaze imaging system according to any one of claims 1 to 3, further comprising an image storage unit that temporarily stores images determined by the determination unit to be face images of the target looking at the camera, and a display unit that displays a list of the images stored in the image storage unit.

The eye-gaze imaging system according to any one of claims 1 to 4, wherein the target face is a cat face and the non-target face is a human face.

An eye-gaze imaging method comprising:
a step of capturing an image;
a step of detecting a target face or a non-target face in the captured image and predicting its detection region;
a step of extracting the target face image from among images in which the target face is detected and images in which a non-target face is detected;
a step of detecting camera gaze from the extracted target face image; and
a step of saving the target face image in which camera gaze is detected, wherein
in order to detect the target face or a non-target face in the captured image, the method includes a step of learning the target face using image data of the target face as learning data, and further implicitly learning objects other than the target face and the non-target face as objects that are neither the target face nor the non-target face,
in order to extract the target face image from among images in which the target face is detected and images in which a non-target face is detected, the method further includes a step of explicitly learning the non-target face, and
to learn the target face using the image data of the target face as learning data, an additional process of selecting one image showing the target face from the learning data, detecting the target face, and updating parameters based on a detection error calculated from the detection result and the correct-answer data is performed with a preset probability when the target face is learned in the step of implicitly learning.

The eye-gaze imaging method according to claim 6, wherein the step of implicitly learning is performed by creating a mask image of the target for a target face image that is learning data for learning the target face, using the mask image to extract only the target portion from the original target image, generating a composite image by placing the extracted portion on a background image showing objects other than the target, and using the composite image as additional learning data.

The eye-gaze imaging method according to claim 6 or 7, wherein the target face is a cat face and the non-target face is a human face.