JP7385416B2

JP7385416B2 - Image processing device, image processing system, image processing method, and image processing program

Info

Publication number: JP7385416B2
Application number: JP2019186945A
Authority: JP
Inventors: 康博中嶋; 素子黒岩
Original assignee: Glory Ltd
Current assignee: Glory Ltd
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2023-11-22
Anticipated expiration: 2039-10-10
Also published as: JP2021064043A

Description

The present invention relates to an image processing device, an image processing system, an image processing method, and an image processing program.

Methods have conventionally been proposed in which an image processing device analyzes an image of a person and detects the person's face in that image.

For example, Patent Document 1 discloses a face detection device that reduces an input image in stages and performs face detection by template matching on each reduced image.

Non-Patent Document 1 discloses one method of detecting objects in an image using a neural network.

Patent Document 1: JP2008-257321A

Non-Patent Document 1: Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie, "Feature Pyramid Networks for Object Detection," The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2117-2125

However, the face detection device described in Patent Document 1 must perform face detection at every configured reduction stage, which lengthens the processing time. It may also detect many faces of sizes that the user of the device considers unnecessary to detect.

Faces can also be detected using the method described in Non-Patent Document 1. However, if the range of detectable face sizes is made too wide, a complex network is needed to maintain detection accuracy. This increases the time cost of both training and detection, and the larger resource requirements raise the economic cost.

The present invention has been made in view of the above circumstances, and its object is to provide an image processing device, an image processing system, an image processing method, and an image processing program that can efficiently detect faces of the size required by the user.

To solve the above problems and achieve the object, the present invention is an image processing device comprising: a face detection unit capable of detecting, from an image, a face whose size is within a predetermined range; a storage unit that stores a set range, which is set as the range of face sizes to be detected from an input image; and a resolution changing unit that generates, based on the predetermined range and the set range, an enlarged/reduced image, which is an image obtained by changing the resolution of the input image, wherein the face detection unit performs face detection on the enlarged/reduced image.

Further, in the above invention, the resolution changing unit generates the enlarged/reduced image by changing the resolution of the input image so that the face detection unit can detect at least one of a face whose size corresponds to the maximum value of the set range and a face whose size corresponds to the minimum value of the set range.

Further, in the above invention, the resolution changing unit generates the enlarged/reduced image by lowering the resolution of the input image so that the face detection unit can detect at least a face whose size corresponds to the maximum value of the set range.

Further, in the above invention, depending on the minimum value of the set range, the face detection unit additionally performs face detection on the input image.

Further, in the above invention, where X% is the enlargement/reduction ratio of the enlarged/reduced image relative to the input image, Min1 is the minimum value of the predetermined range, and Min2 is the minimum value of the set range, the face detection unit additionally performs face detection on the input image when the relationship (Min2 × X / 100) < Min1 is satisfied.
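The condition above can be illustrated with a minimal Python sketch. The function and parameter names are assumptions introduced here for illustration; only the inequality itself comes from the text.

```python
def needs_input_image_pass(min1: float, min2: float, x_percent: float) -> bool:
    """Return True when the input image should also be searched for faces.

    min1      -- minimum face size the detector can handle (Min1), in pixels
    min2      -- minimum face size the user wants to detect (Min2), in pixels
    x_percent -- enlargement/reduction ratio X of the scaled image, in percent

    A face of size Min2 in the input image appears with size Min2 * X / 100
    in the enlarged/reduced image. If that is below Min1, the detector cannot
    find it there, so the input image is searched as well.
    """
    return (min2 * x_percent / 100.0) < min1
```

For instance, with Min1 = 8, Min2 = 8, and X = 50, a face of 8 pixels shrinks to 4 pixels in the reduced image, so the check returns True and the input image is searched too.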

Further, in the above invention, when a first region, which is the region in the input image corresponding to the region of a face detected in the enlarged/reduced image, overlaps a second region, which is the region of a face detected in the input image, the face detection unit outputs information on one of the two faces as the detection result according to the degree of overlap between the first region and the second region.

Further, in the above invention, the face detection unit further outputs a score indicating the face-likeness of each detected face, and when the first region overlaps the second region and information on only one of the faces is to be output, it outputs information on the face with the higher face-likeness score as the detection result.
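One way to picture this merging step is the sketch below. It is an illustration under stated assumptions, not the method fixed by the text: the degree of overlap is measured here by intersection over union (IoU) with a hypothetical threshold of 0.5, and the `Detection` type and function names are invented for the example. As a side effect of the greedy loop, overlapping detections from the same image are also merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    x: float      # left edge of the bounding box
    y: float      # top edge of the bounding box
    w: float      # box width
    h: float      # box height
    score: float  # face-likeness score reported by the detector


def to_input_coords(det: Detection, x_percent: float) -> Detection:
    """Map a box found in the enlarged/reduced image back to input-image pixels."""
    s = 100.0 / x_percent
    return Detection(det.x * s, det.y * s, det.w * s, det.h * s, det.score)


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two boxes (assumed overlap measure)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def merge_detections(scaled_dets, input_dets, x_percent, iou_threshold=0.5):
    """Keep the higher-scoring face wherever two regions overlap strongly."""
    first_regions = [to_input_coords(d, x_percent) for d in scaled_dets]
    kept = []
    # Visit candidates from highest to lowest score; drop any candidate that
    # overlaps an already kept (and therefore higher-scoring) detection.
    for cand in sorted(first_regions + list(input_dets),
                       key=lambda d: d.score, reverse=True):
        if all(iou(cand, k) < iou_threshold for k in kept):
            kept.append(cand)
    return kept
```

Sorting by score first means that, for any overlapping pair, the face with the higher face-likeness score is the one that survives.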

Further, in the above invention, the image processing device further comprises a face authentication unit that performs authentication processing based on the detection result of the face detection unit, and when the first region overlaps the second region and the face detection unit outputs, as the detection result, information on the face detected in the enlarged/reduced image, the face authentication unit performs authentication processing based on the image region of the input image identified from the input-image coordinates that correspond to the coordinates of the face detected in the enlarged/reduced image.
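The coordinate conversion described above might look like the following sketch. The box layout `(x, y, width, height)`, the rounding outward to whole pixels, and the clamping to the image bounds are assumptions made for this example.

```python
import math


def crop_region_in_input(box_scaled, x_percent, input_w, input_h):
    """Convert a face box found in the enlarged/reduced image into a crop
    rectangle (left, top, right, bottom) in input-image pixels.

    box_scaled -- (x, y, width, height) in the enlarged/reduced image
    x_percent  -- enlargement/reduction ratio X of that image, in percent
    """
    x, y, w, h = box_scaled
    s = 100.0 / x_percent  # factor from scaled-image to input-image pixels
    # Round outward so the crop fully covers the face, then clamp to the image.
    left = max(0, math.floor(x * s))
    top = max(0, math.floor(y * s))
    right = min(input_w, math.ceil((x + w) * s))
    bottom = min(input_h, math.ceil((y + h) * s))
    return left, top, right, bottom
```

The authentication step would then read the pixels of this rectangle from the input image rather than from the enlarged/reduced image.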

Further, in the above invention, the image processing device further comprises a face authentication unit that performs authentication processing based on the detection result of the face detection unit.

Further, in the above invention, when at least part of the predetermined range lies outside the set range and the face detection unit detects a face whose size is outside the set range, the face detection unit does not output information on that face as a detection result.
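As a minimal sketch of this filtering, assuming face sizes are already expressed in input-image pixels and that both bounds of the set range are inclusive (the text does not state inclusivity), with a hypothetical function name:

```python
def filter_by_set_range(face_sizes_px, set_min, set_max):
    """Drop detected faces whose size falls outside the user's set range.

    face_sizes_px -- sizes of detected faces (e.g. inter-eye distances),
                     converted to input-image pixels
    """
    return [size for size in face_sizes_px if set_min <= size <= set_max]
```

For example, with a detector range of 8 to 200 pixels and a set range of 10 to 150 pixels, detected sizes of 8 and 200 pixels would be discarded while 12 and 150 would be reported.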

Further, in the above invention, the face detection unit detects a plurality of feature points of a face included in an image and calculates the size of the face based on the plurality of feature points.
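A minimal sketch of such a size calculation follows, using the inter-eye distance that the embodiment below adopts as its size parameter. The landmark dictionary and its key names are hypothetical.

```python
import math


def inter_eye_distance(landmarks):
    """Face size as the Euclidean distance between the two eye feature points.

    landmarks -- mapping with (x, y) pixel coordinates under the hypothetical
                 keys "left_eye" and "right_eye"
    """
    (lx, ly) = landmarks["left_eye"]
    (rx, ry) = landmarks["right_eye"]
    return math.hypot(rx - lx, ry - ly)
```

Other size parameters mentioned in the description, such as the distance from the midpoint of the eyes to the mouth, could be computed from feature points in the same way.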

Further, in the above invention, the face detection unit has a plurality of face detectors with mutually different memory capacities, and the memory capacities of the face detectors are set according to a plurality of image sizes.

Further, in the above invention, the image processing device further comprises an image size adjustment unit that adds a background image to at least part of the periphery of the enlarged/reduced image so that the size of the enlarged/reduced image matches the closest of the plurality of image sizes, and the face detection unit performs face detection on the enlarged/reduced image to which the background image has been added.
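The padding step can be sketched as follows. This is an illustration only: the image is modeled as a list of pixel rows, the background is a constant value, and padding is added on the right and bottom edges, all of which are assumptions. The target size is passed in by the caller.

```python
def pad_with_background(image, target_w, target_h, background=0):
    """Pad an image (list of pixel rows) to target_w x target_h.

    Background pixels are appended on the right and bottom so that the
    original pixels keep their coordinates and detected boxes need no shift.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    if width > target_w or height > target_h:
        raise ValueError("target size is smaller than the image")
    padded = [list(row) + [background] * (target_w - width) for row in image]
    padded += [[background] * target_w for _ in range(target_h - height)]
    return padded
```

Padding only on the right and bottom is a design choice of this sketch; the text allows the background to be added to any part of the periphery.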

The present invention is also an image processing system comprising: a face detection unit capable of detecting, from an image, a face whose size is within a predetermined range; a storage unit that stores a set range, which is set as the range of face sizes to be detected from an input image; and a resolution changing unit that generates, based on the predetermined range and the set range, an enlarged/reduced image, which is an image obtained by changing the resolution of the input image, wherein the face detection unit performs face detection on the enlarged/reduced image.

The present invention is also an image processing method for detecting a face from an image, comprising the steps of: storing a set range, which is set as the range of face sizes to be detected from an input image; generating an enlarged/reduced image, which is an image obtained by changing the resolution of the input image, based on the set range and a predetermined range, which is the range of detectable face sizes; and performing face detection on the enlarged/reduced image.

The present invention is also an image processing program that, in order to detect a face from an image, causes a computer to function as: means for generating an enlarged/reduced image, which is an image obtained by changing the resolution of an input image, based on a predetermined range, which is the range of detectable face sizes, and a set range, which is set as the range of face sizes to be detected from the input image; and means for performing face detection on the enlarged/reduced image.

According to the image processing device, image processing system, image processing method, and image processing program of the present invention, faces of the size required by the user can be detected efficiently.

FIG. 1 is a schematic diagram for explaining an overview of the face detection method in Embodiment 1.
FIG. 2 is a block diagram illustrating the overall configuration of a system including the image processing device according to Embodiment 1.
FIG. 3 is a block diagram illustrating the configuration of the image processing device according to Embodiment 1.
FIG. 4 is a schematic diagram for explaining an example of how the setting unit sets the set range, in which (a) shows a user being imaged by the camera and (b) shows the set range being set from the user's face displayed on the monitor.
FIG. 5 is a schematic diagram for explaining another example of how the setting unit sets the set range, showing the set range being set from a face model displayed on the monitor.
FIG. 6 is a schematic diagram showing the feature points and bounding box of a face detected by the face detection unit.
FIG. 7 is a schematic diagram showing the correspondence between a face region (bounding box) detected by the face detection unit in the enlarged/reduced image and a face region (bounding box) detected by the face detection unit in the input image.
FIG. 8 is a schematic diagram showing an example of the procedure of the processing, mainly the face detection processing, performed by the image processing device according to Embodiment 1.
FIG. 9 is a flowchart showing an example of the procedure of the processing, mainly the face detection processing, performed by the image processing device according to Embodiment 1.
FIG. 10 is a schematic diagram illustrating the overall configuration of an image processing system according to a modification.

Preferred embodiments of the image processing device, image processing system, image processing method, and image processing program according to the present invention are described below with reference to the drawings. In the present invention, the specific parameter indicating the size of a face (the parameter used to specify the predetermined range and the set range) is not particularly limited. Examples include the distance between the left and right eyes (hereinafter also simply called the inter-eye distance), the distance from the midpoint between the left and right eyes to the mouth, the distance from the tip of the chin to the top of the head, and the distance between the left and right ears. The following description uses the inter-eye distance as an example.

<Overview of this embodiment>
First, an overview of the face detection method in Embodiment 1 is given. In this embodiment, an enlarged/reduced image is generated by changing the size of the input image (enlarging or reducing it, for example compressing the size by a factor of 200/400) according to two ranges: the predetermined range of face sizes that a face detector (face detection engine) built from a machine-learned inference model can detect (for example, an inter-eye distance of 8 to 200 pixels), and the set range that the user has set as the face sizes to be detected (arbitrary values, for example an inter-eye distance of 10 to 400 pixels). Face detection is then performed on the enlarged/reduced image. This makes it possible to detect, from the enlarged/reduced image, even faces of a size (inter-eye distance) that the face detector alone cannot handle. A more detailed explanation follows with reference to FIG. 1.

First, as preprocessing, a face detector capable of detecting faces within a predetermined size range (for example, an inter-eye distance of 8 to 200 pixels) in an input image is created by machine learning (preferably deep learning). Then, in initialization processing at the start of operation or the like, the created face detector is loaded into the storage unit as a plurality of face detectors with different memory capacities (sizes). Here, the memory capacity of each face detector is set according to a standard image size such as VGA, Full HD, or 4K, or any other image size.

In the processing at face detection time, as shown in FIG. 1, an appropriate image resolution (size) at which the face detector can detect the faces is determined according to the set range, which is the range the user has set for the face sizes (for example, inter-eye distances) to be detected, and the resolution (size) of the input image is changed accordingly. That is, the input image is enlarged or reduced so that the faces become a size the face detector can detect, and an enlarged/reduced image is generated.

More specifically, suppose for example that the predetermined range of inter-eye distances that each face detector can detect is 8 to 200 pixels, and that the set range of inter-eye distances that the user has set as the face sizes to be detected from the input image is 8 to 400 pixels. In this case, no face detector can detect faces with an inter-eye distance of 200 to 400 pixels from the input image. Therefore, an enlarged/reduced image is generated by reducing the input image to 50% of its size while maintaining its aspect ratio. In this enlarged/reduced image, each face detector can still detect inter-eye distances of 8 to 200 pixels, but because the enlarged/reduced image is half the size of the input image, this corresponds to faces with twice that inter-eye distance, namely 16 to 400 pixels, in terms of the input image size. Each face detector can therefore detect, from the enlarged/reduced image, faces whose inter-eye distance is 200 to 400 pixels when converted to the size of the input image.
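The conversion in this worked example can be checked with a short sketch. The function name is an assumption introduced here; the numbers are those given in the text.

```python
def effective_range_in_input(det_min, det_max, x_percent):
    """Detectable face-size range, expressed in input-image pixels, when the
    detector runs on an image scaled to x_percent of the input image.

    det_min, det_max -- predetermined range of the detector, in pixels
    """
    factor = 100.0 / x_percent  # a face shrinks by X/100, so the range grows by 100/X
    return det_min * factor, det_max * factor


# Detector range 8 to 200 pixels, input image reduced to 50%:
print(effective_range_in_input(8, 200, 50))   # (16.0, 400.0)
# The unscaled input image (100%) keeps the detector's own range:
print(effective_range_in_input(8, 200, 100))  # (8.0, 200.0)
```

The two printed ranges together span 8 to 400 pixels, which matches the set range in this example.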

In this case, however, faces whose inter-eye distance is less than 16 pixels in terms of the input image size cannot be detected in the enlarged/reduced image, so faces with sizes near the minimum value of the set range cannot be detected. In this embodiment, therefore, face detection is performed not only on the enlarged/reduced image but also on the input image. Each face detector can detect faces with an inter-eye distance of 8 to 200 pixels from the input image.

In this way, faces with an inter-eye distance of 16 to 400 pixels in terms of the input image size are detected from the enlarged/reduced image, and faces with an inter-eye distance of 8 to 200 pixels are detected from the input image. Faces of every size within the set range (an inter-eye distance of 8 to 400 pixels) can thus be detected.

Which face detector is used for the enlarged/reduced image and for the input image is determined by the size of the image to be processed. The face detector whose memory capacity is closest to the size of the image to be processed, but not smaller than the image size, is selected.
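The selection rule above can be sketched as follows, under the assumption that each detector is identified by the image size (width, height) its memory was allocated for. The pixel dimensions used for VGA, Full HD, and 4K are the standard values for those formats; the function name is hypothetical.

```python
# Image sizes the detectors were loaded for: VGA, Full HD, 4K.
DETECTOR_SIZES = [(640, 480), (1920, 1080), (3840, 2160)]


def select_detector_size(image_w, image_h, detector_sizes=DETECTOR_SIZES):
    """Pick the smallest detector size that still contains the whole image."""
    candidates = [(w, h) for (w, h) in detector_sizes
                  if w >= image_w and h >= image_h]
    if not candidates:
        raise ValueError("image is larger than every available detector size")
    # Among detectors that fit, the one with the fewest pixels is the closest.
    return min(candidates, key=lambda size: size[0] * size[1])
```

For a Full HD input reduced to 50% (960 x 540), this sketch skips the VGA detector because it is too small and selects the Full HD detector.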

As shown in FIG. 1, when there are multiple detection results, that is, when faces are detected in overlapping regions by multiple face detectors, the detection results are merged according to the degree of overlap to obtain the final result.

According to this embodiment, an enlarged/reduced image is generated by changing the resolution (size) of the input image based on the predetermined range that the face detector can handle and the set range that the user desires, and face detection is performed on the enlarged/reduced image. Faces of the size required by the user can therefore be detected efficiently.

<Overall configuration of a system including the image processing device>
Next, the overall configuration of a system including the image processing device according to Embodiment 1 is described with reference to FIG. 2. As shown in FIG. 2, this system is built for a predetermined indoor or outdoor area (for example, an outdoor passage or the inside of a store), and includes a camera 1 that functions as an image acquisition unit, a recorder (image recording device) 2 that functions as an image recording unit, and an image processing device 3 that functions as a face detection unit and a face authentication unit. The predetermined area can be set as appropriate for the application.

The camera 1 is installed at a desired location indoors or outdoors and captures a predetermined indoor or outdoor area. The captured video (each frame) is stored in the recorder 2.

A monitor (display device) 4, serving as a display unit that displays the captured video and the like, and an input device 5 (for example, a keyboard or mouse), serving as an input unit with which an operator performs various input operations, are communicably connected to the image processing device 3. The monitor 4 and the input device 5 may be implemented as a single display device with an input function, such as a touch panel display.

The image processing device 3 is configured so that the user can view, on the monitor 4, the video captured by the camera 1 in real time as well as past video recorded on the recorder 2.

<Configuration of the image processing device>
Next, the configuration of the image processing device 3 is described with reference to FIG. 3. The image processing device 3 has functions equivalent to those of a general personal computer and, as shown in FIG. 3, includes a control unit 10 and a storage unit 20.

The control unit 10 provides the functions of a video input unit 11, a setting unit 12, a resolution changing unit 13, an image size adjustment unit 14, a face detection unit 15, and a face recognition unit 16. The control unit 10 is made up of, for example, software programs for implementing various kinds of processing, a CPU (Central Processing Unit) that executes the software programs, and various hardware controlled by the CPU. The software programs and data needed for the operation of the control unit 10 are stored in the storage unit 20.

Each unit of the control unit 10 shown in FIG. 3 is implemented by having the CPU of the control unit 10 execute the image processing program according to this embodiment. The image processing program according to this embodiment may be installed in the image processing device 3 in advance, or may be provided to users as an application program that runs on a general-purpose OS, either recorded on a computer-readable recording medium or delivered over a network.

The storage unit 20 consists of a storage device such as a hard disk drive or nonvolatile memory, and stores an inference model 21.

The video input unit 11 acquires images from the camera 1 or the recorder 2. Images are acquired from the camera 1 when face detection is performed in real time, and from the recorder 2 when face detection is performed on past video. The image acquired by the video input unit 11 is output to the resolution changing unit 13 as the input image. The input image may be a still image or a moving image (video).

The setting unit 12 sets the set range, which is the range of face sizes (inter-eye distances) to be detected from the input image, in response to the user's input operations. The set range is set by specifying the maximum and minimum values of the face size (inter-eye distance) to be detected, and is stored in the storage unit 20. The maximum and minimum values of the set range may be any values, and are preferably numbers of pixels (arbitrary natural numbers). The user may enter the maximum and minimum inter-eye distances directly with the input device 5, with the setting unit 12 setting the range from the entered values, or the set range may be set by one of the methods described below.

For example, as shown in FIG. 4(a), the user is filmed while walking back and forth in front of the camera 1, and the captured video is displayed on the monitor 4. As shown in FIG. 4(b), the monitor 4 then shows the user with a face whose size changes gradually. Next, while checking the change in face size on the monitor 4, the user selects the faces of the sizes to be detected, that is, an image of the smallest face to be detected and an image of the largest face to be detected. For each of these images, the generation of an enlarged/reduced image by the resolution changing unit 13 and the face detection by the face detection unit 15 are repeated with various enlargement/reduction ratios until the face detection unit 15 detects the face. The range (maximum and minimum values) of the inter-eye distance to be detected is then determined from the inter-eye distances of the smallest face and the largest face detected by the face detection unit 15. Alternatively, the user may select on the monitor 4 the image of the smallest face to be detected and the image of the largest face to be detected, and then specify the positions of both eyes of each face with the input device 5, thereby determining the range (maximum and minimum values) of the inter-eye distance to be detected.

Alternatively, as shown in FIG. 5, an illustration of a face or the like may be overlaid on the video from the camera 1, and the user may enlarge or reduce it with a mouse or the like to the face sizes to be detected. In this case, the setting unit 12 determines the minimum and maximum inter-eye distances to be detected from the most reduced face (the smallest face to be detected) and the most enlarged face (the largest face to be detected), respectively, and sets the set range.

Furthermore, the setting unit 12 may automatically set the face sizes to be detected, that is, the set range, from the resolution of the input image. This is suitable for face authentication (login) with the camera of a smartphone or PC. In such applications the face should appear large in the image, so the set range may be, for example, a number of pixels equal to half the resolution of the input image or more.

The resolution changing unit 13 generates an enlarged/reduced image, which is an image obtained by changing the resolution of the input image, based on the predetermined range, which is the range of face sizes (inter-eye distances) that the face detection unit 15 can detect, and the set range described above (the range of face sizes (inter-eye distances) to be detected from the input image). The enlarged/reduced image is an image obtained by raising or lowering the resolution of the input image, that is, by enlarging or reducing the input image. Normally, the resolution changing unit 13 generates the enlarged/reduced image while maintaining the aspect ratio of the input image.

より詳細には、解像度変更部１３は、設定範囲の最大値に係る大きさの顔と、設定範囲の最小値に係る大きさの顔との少なくとも一方を顔検出部１５が検出可能となるように、入力画像の解像度を変更して拡大縮小画像を生成する。これにより、検出可能な所定範囲を超える大きさの顔（顔検出部１５の検出可能な範囲を超える大き過ぎる顔）と、検出可能な所定範囲を下回る大きさの顔（顔検出部１５の検出可能な範囲を下回る小さすぎる顔）との少なくとも一方について、顔検出部１５によって効率的に検出することができる。 More specifically, the resolution changing unit 13 generates the enlarged/reduced image by changing the resolution of the input image so that the face detection unit 15 can detect at least one of a face whose size corresponds to the maximum value of the setting range and a face whose size corresponds to the minimum value of the setting range. As a result, the face detection unit 15 can efficiently detect at least one of a face larger than the predetermined detectable range (a face too large for the face detection unit 15 to detect) and a face smaller than the predetermined detectable range (a face too small for the face detection unit 15 to detect).

より好ましくは、解像度変更部１３は、少なくとも設定範囲の最大値に係る大きさの顔を顔検出部１５が検出可能となるように、入力画像の解像度を低くすることによって拡大縮小画像を生成する。これにより、少なくとも検出可能な所定範囲を超える大きさの顔（顔検出部１５の検出可能な範囲を超える大き過ぎる顔）を効率的に検出することができる。また、入力画像の解像度を低くした拡大縮小画像、すなわち縮小画像に対して顔検出できることから、顔検出処理の負担を軽減することができる。 More preferably, the resolution changing unit 13 generates the enlarged/reduced image by lowering the resolution of the input image so that the face detection unit 15 can detect at least a face whose size corresponds to the maximum value of the setting range. This makes it possible to efficiently detect at least a face larger than the predetermined detectable range (a face too large for the face detection unit 15 to detect). Furthermore, since face detection can be performed on an enlarged/reduced image obtained by lowering the resolution of the input image, that is, on a reduced image, the load of the face detection processing can be reduced.

具体的には、例えば、目間距離の所定範囲が８～２００画素であり、目間距離の設定範囲が８～４００画素である場合、設定範囲の最大値の目間距離である４００画素が所定範囲の最大値と一致するように、入力画像の解像度を低くする。すなわち、（所定範囲の最大値）／（設定範囲の最大値）×１００の式から算出される拡大／縮小率（％）で入力画像を拡大／又は縮小して拡大縮小画像を生成する。 Specifically, for example, when the predetermined range of the inter-eye distance is 8 to 200 pixels and the setting range of the inter-eye distance is 8 to 400 pixels, the resolution of the input image is lowered so that 400 pixels, the maximum inter-eye distance of the setting range, coincides with the maximum value of the predetermined range. That is, the enlarged/reduced image is generated by enlarging or reducing the input image at the enlargement/reduction ratio (%) calculated from the formula (maximum value of the predetermined range)/(maximum value of the setting range)×100.

なお、拡大／縮小率とは、入力画像に対する拡大縮小画像の倍率を意味し、ここでは、入力画像に対する拡大縮小画像の面積比率ではなく、入力画像に対する拡大縮小画像の幅及び高さの比率を表す。具体的には、上述のように、例えば、所定範囲の最大値及び設定範囲の最大値から算出される。 Note that the enlargement/reduction ratio means the magnification of the enlarged/reduced image relative to the input image. Here it represents the ratio of the width and height of the enlarged/reduced image to those of the input image, not the ratio of their areas. Specifically, as described above, it is calculated, for example, from the maximum value of the predetermined range and the maximum value of the setting range.
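
The ratio calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function names `scale_percent` and `scaled_size` and the rounding rule are assumptions introduced here.

```python
def scale_percent(detectable_max: float, setting_max: float) -> float:
    """Enlargement/reduction ratio X (%) =
    (max of predetermined range) / (max of setting range) * 100."""
    if detectable_max <= 0 or setting_max <= 0:
        raise ValueError("range limits must be positive")
    return detectable_max / setting_max * 100.0


def scaled_size(width: int, height: int, percent: float) -> tuple:
    """Apply the ratio to width and height alike (not to the area),
    which keeps the aspect ratio of the input image."""
    factor = percent / 100.0
    # Rounding to whole pixels is an assumption of this sketch.
    return max(1, round(width * factor)), max(1, round(height * factor))


# Example from the text: predetermined range 8-200 px, setting range 8-400 px.
x = scale_percent(200, 400)       # 50.0 (%)
size = scaled_size(800, 800, x)   # (400, 400)
```

Because the ratio applies to each side, a 50% ratio reduces the pixel count to one quarter of the input image.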

顔検出部１５は、入力画像や拡大縮小画像といった検出対象の画像（画像データ）から顔検出する、すなわち顔検出処理を実行する。ただし、顔検出部１５は、画像中のあらゆる大きさ（目間距離）の顔を検出できるものではなく、所定範囲の大きさ（目間距離）の顔のみを検出可能なように構成されている。 The face detection unit 15 detects faces from a detection target image (image data) such as the input image or the enlarged/reduced image, that is, it executes face detection processing. However, the face detection unit 15 cannot detect faces of every size (inter-eye distance) in an image; it is configured to detect only faces whose size (inter-eye distance) falls within the predetermined range.

そこで、顔検出部１５は、少なくとも拡大縮小画像に対して顔検出する。これにより、例え検出すべき目間距離（設定範囲）が検出可能な所定範囲外であっても、拡大縮小画像から設定範囲の顔を検出できるようになる。 Therefore, the face detection unit 15 performs face detection on at least the enlarged/reduced image. As a result, even if the inter-eye distance (set range) to be detected is outside the predetermined detectable range, it becomes possible to detect faces within the set range from the enlarged/reduced image.

具体的には、例えば、目間距離の所定範囲が８～２００画素であり、目間距離の設定範囲が５０～４００画素であり、５０％の拡大／縮小率で入力画像を縮小して拡大縮小画像を生成する場合、顔検出部１５は、拡大縮小画像では、入力画像のサイズに換算して目間距離が１６～４００画素の顔を検出できることから、設定範囲の大きさの顔を検出できることになる。 Specifically, for example, if the predetermined range of the inter-eye distance is 8 to 200 pixels, the set range of the inter-eye distance is 50 to 400 pixels, and the input image is scaled down and enlarged at an enlargement/reduction ratio of 50%. When generating a reduced image, the face detection unit 15 detects a face with a size within the set range, since in an enlarged/reduced image, it is possible to detect faces with an inter-eye distance of 16 to 400 pixels when converted to the size of the input image. It will be possible.

ただし、上述のように、例えば、目間距離の所定範囲が８～２００画素であり、目間距離の設定範囲が８～４００画素であり、５０％の拡大／縮小率で入力画像を縮小して拡大縮小画像を生成する場合、顔検出部１５は、拡大縮小画像では、入力画像のサイズに換算して目間距離が１６画素未満の顔を検出できなくなってしまう。 However, as mentioned above, for example, if the predetermined range of the inter-eye distance is 8 to 200 pixels, the set range of the inter-eye distance is 8 to 400 pixels, and the input image is reduced at a magnification/reduction ratio of 50%. When generating an enlarged/reduced image using the enlarged/reduced image, the face detection unit 15 cannot detect a face whose inter-eye distance is less than 16 pixels in terms of the size of the input image.

そこで、顔検出部１５は、設定範囲の最小値によっては、入力画像に対しても更に顔検出する。これにより、例え検出すべき顔の大きさ（設定範囲）が拡大縮小画像にて顔検出部１５が検出可能な範囲を下回る場合であっても、入力画像に対して顔検出を行うことから、拡大縮小画像では検出できないような小さな顔を検出できる可能性が高くなる。 Therefore, the face detection unit 15 further performs face detection on the input image depending on the minimum value of the setting range. As a result, even if the size of the face to be detected (setting range) is smaller than the range that can be detected by the face detection unit 15 in the enlarged/reduced image, face detection is performed on the input image. This increases the possibility of detecting small faces that cannot be detected in enlarged/reduced images.

具体的には、上述のように、目間距離の所定範囲が８～２００画素であり、目間距離の設定範囲が８～４００画素であり、５０％の拡大／縮小率で入力画像を縮小して拡大縮小画像を生成する場合であっても、顔検出部１５は、入力画像では、目間距離が８～２００画素の顔を検出することができる。すわなち、拡大縮小画像では検出できない目間距離が８～１６画素の顔を入力画像から検出することができる。 Specifically, as described above, the predetermined range of the inter-eye distance is 8 to 200 pixels, the set range of the inter-eye distance is 8 to 400 pixels, and the input image is reduced at an enlargement/reduction ratio of 50%. Even in the case where a scaled image is generated using the input image, the face detection unit 15 can detect a face with an inter-eye distance of 8 to 200 pixels in the input image. In other words, it is possible to detect faces with an inter-eye distance of 8 to 16 pixels, which cannot be detected in enlarged/reduced images, from the input image.
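
The arithmetic behind these examples can be sketched as follows: a detector that handles inter-eye distances within the predetermined range on a scaled image covers, in input-image pixels, that range divided by the scale factor. The function name is hypothetical.

```python
def detectable_range_in_input_pixels(detectable_min: float,
                                     detectable_max: float,
                                     percent: float) -> tuple:
    """Convert the detector's range on an image scaled by `percent` (%)
    into inter-eye distances measured on the original input image."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    factor = percent / 100.0
    return detectable_min / factor, detectable_max / factor


# Predetermined range 8-200 px, image reduced to 50%:
on_scaled = detectable_range_in_input_pixels(8, 200, 50)    # (16.0, 400.0)
# The unscaled input image (100%) keeps the original range:
on_input = detectable_range_in_input_pixels(8, 200, 100)    # (8.0, 200.0)
```

Together the two passes cover 8 to 400 pixels, which is why the input image is also searched when faces of 8 to 16 pixels must be found.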

入力画像に対しても更に顔検出するか否かの基準としては、下記式（１）が好適である。すなわち、入力画像に対する拡大縮小画像の拡大／縮小率をＸ％（ただし、Ｘは０を除く正の数）とし、所定範囲の最小値をＭｉｎ１とし、設定範囲の最小値をＭｉｎ２としたとき、下記式（１）で表される関係を満たす場合は、顔検出部１５は、入力画像に対しても更に顔検出することが好ましい。 The following formula (1) is suitable as a criterion for determining whether to additionally perform face detection on the input image. That is, where the enlargement/reduction ratio of the enlarged/reduced image relative to the input image is X% (X being a positive number other than 0), the minimum value of the predetermined range is Min1, and the minimum value of the setting range is Min2, the face detection unit 15 preferably also performs face detection on the input image when the relationship expressed by the following formula (1) is satisfied.
（Ｍｉｎ２×Ｘ／１００）＜Ｍｉｎ１　（１） (Min2×X/100)<Min1 (1)

他方、上記式（１）で表される関係を満たさない場合、すなわち（Ｍｉｎ２×Ｘ／１００）≧Ｍｉｎ１を満たす場合は、顔検出部１５は、入力画像に対して顔検出しなくてもよい。 On the other hand, when the relationship expressed by formula (1) is not satisfied, that is, when (Min2×X/100)≧Min1 holds, the face detection unit 15 need not perform face detection on the input image.
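
Formula (1) reduces to a single comparison. A minimal sketch follows; the function name `needs_input_image_pass` is an assumption introduced for illustration.

```python
def needs_input_image_pass(min2: float, x_percent: float, min1: float) -> bool:
    """Formula (1): (Min2 * X / 100) < Min1.

    min2      -- minimum of the setting range (smallest face wanted)
    x_percent -- enlargement/reduction ratio X in percent (X > 0)
    min1      -- minimum of the predetermined range (smallest face detectable)
    """
    if x_percent <= 0:
        raise ValueError("X must be a positive number")
    return min2 * x_percent / 100.0 < min1


# Setting range 8-400 px at 50%: 8 * 50 / 100 = 4 < 8, so the input
# image is searched as well.
also_search_input = needs_input_image_pass(8, 50, 8)      # True
# Setting range 50-400 px at 50%: 25 >= 8, the scaled image suffices.
scaled_only = not needs_input_image_pass(50, 50, 8)       # True
```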

図６に示すように、顔検出部１５（各顔検出器）は、少なくとも、検出対象の画像に含まれる顔の左右の目の座標Ｅ１、Ｅ２と、当該顔の領域を示す矩形のバウンディングボックスＢＢと、当該顔の顔らしさに係るスコアとを出力（推定）するものである。なお、顔らしさに係るスコアとは、検出した顔がどの程度顔らしいかを示すパラメータであり、０～１の連続する値で出力され、より１に近いほど当該顔がより顔らしいことを意味する。また、顔検出部１５は、左右の目の座標Ｅ１、Ｅ２に基づいて当該顔の大きさ（本実施形態では目間距離）を算出する処理を行う。すなわち、目間距離は、検出された左右の目の座標Ｅ１及びＥ２間の距離に対応する。 As shown in FIG. 6, the face detection unit 15 (each face detector) outputs (estimates) at least the coordinates E1 and E2 of the left and right eyes of a face included in the detection target image, a rectangular bounding box BB indicating the region of the face, and a score representing the face-likeness of the face. The face-likeness score is a parameter indicating how face-like the detected face is; it is output as a continuous value from 0 to 1, and a value closer to 1 means that the face is more face-like. The face detection unit 15 also calculates the size of the face (the inter-eye distance in this embodiment) based on the coordinates E1 and E2 of the left and right eyes. That is, the inter-eye distance corresponds to the distance between the detected coordinates E1 and E2.
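
One way to hold a detection result and derive the face size from it is sketched below. The `FaceDetection` record and its field names are hypothetical, and the Euclidean distance is an assumption: the text states only that the inter-eye distance corresponds to the distance between E1 and E2.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FaceDetection:
    left_eye: Tuple[float, float]            # E1 as (x, y)
    right_eye: Tuple[float, float]           # E2 as (x, y)
    box: Tuple[float, float, float, float]   # BB as (x, y, width, height)
    score: float                             # face-likeness, 0.0 to 1.0

    @property
    def eye_distance(self) -> float:
        """Face size: Euclidean distance between E1 and E2, in pixels."""
        return math.dist(self.left_eye, self.right_eye)


face = FaceDetection(left_eye=(100, 100), right_eye=(103, 104),
                     box=(90, 80, 30, 40), score=0.9)
size = face.eye_distance   # 5.0
```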

そして、顔検出部１５の検出結果には、検出した顔に係る情報として、左右の目の座標Ｅ１、Ｅ２、バウンディングボックスＢＢの座標（左上のコーナーの座標及びサイズ）、顔らしさに係るスコア、顔の大きさとしての目間距離、等が含まれる。顔検出部１５による検出結果、例えば、左右の目に位置を示す記号やバウンディングボックスＢＢは、モニタ４に表示されることによって利用者に報知される。 The detection result of the face detection unit 15 includes information related to the detected face, including the coordinates E1 and E2 of the left and right eyes, the coordinates of the bounding box BB (coordinates and size of the upper left corner), the score related to face-likeness, This includes the distance between the eyes as the size of the face. The detection results by the face detection unit 15, such as symbols indicating the positions of the left and right eyes and bounding boxes BB, are displayed on the monitor 4 to notify the user.

図７に示すように、拡大縮小画像にて顔検出部１５によって検出された顔Ｆ１の領域ｒ１に対応する入力画像中の第１の領域Ｒ１が、入力画像にて顔検出部１５によって検出された顔Ｆ２の領域である第２の領域Ｒ２と重なる場合、顔検出部１５は、第１の領域Ｒ１及び第２の領域Ｒ２の重なり具合に応じて、いずれか一方の顔Ｆ１又はＦ２に係る情報を検出結果として出力する。これにより、顔認証部１６に出力される顔を減らすことができるので、画像処理装置３の処理コストを削減することができる。 As shown in FIG. 7, a first region R1 in the input image corresponding to region r1 of the face F1 detected by the face detection section 15 in the enlarged/reduced image is detected by the face detection section 15 in the input image. If the area overlaps with the second area R2 which is the area of the face F2, the face detection unit 15 detects the area of the face F1 or F2 according to the degree of overlap between the first area R1 and the second area R2. Output information as detection results. Thereby, the number of faces output to the face authentication section 16 can be reduced, so the processing cost of the image processing device 3 can be reduced.

ここで、第１の領域Ｒ１が第２の領域Ｒ２と重なる場合としては、具体的には、例えば、拡大縮小画像と入力画像との両方にて同一人物の顔が検出された場合と、拡大縮小画像及び入力画像のそれぞれにて異なる人物の顔が互いに近接して検出された場合とが想定できる。そして、前者の場合は、拡大縮小画像と入力画像とのいずれかの検出結果のみを採用すべき場合であり、後者の場合は、拡大縮小画像と入力画像とのそれぞれの検出結果を採用すべき場合である。そこで、これらの場合に対応可能とするために、上述のように、顔検出部１５は、第１の領域Ｒ１及び第２の領域Ｒ２の重なり具合に応じて、いずれか一方の顔Ｆ１又はＦ２に係る情報を検出結果として出力するようになっている。ここで、第１の領域Ｒ１及び第２の領域Ｒ２の重なり具合とは、具体的には、例えば、ＩｏＵ（Intersection over Union）であり、ＩｏＵが所定の基準値以上であれば、第１の領域Ｒ１と第２の領域Ｒ２の重なりが大きいため、顔Ｆ１又はＦ２に係る顔を同一人物に係る顔とみなし、いずれか一方の顔Ｆ１又はＦ２に係る情報を検出結果として出力し、ＩｏＵがその基準値未満であれば、第１の領域Ｒ１と第２の領域Ｒ２の重なりが小さいため、顔Ｆ１又はＦ２に係る顔をそれぞれ異なる人物の顔とみなして両方の顔Ｆ１及びＦ２に係る情報をそれぞれ検出結果として出力する。なお、ＩｏＵは、（領域の共通部分）／（領域の和集合）の式から算出される。 Here, two cases can be assumed in which the first region R1 overlaps the second region R2: the face of the same person is detected in both the enlarged/reduced image and the input image, or the faces of different people are detected close to each other, one in the enlarged/reduced image and the other in the input image. In the former case, only one of the two detection results should be adopted; in the latter case, both detection results should be adopted. To handle both cases, the face detection unit 15 outputs the information on one of the faces F1 and F2 as the detection result according to the degree of overlap between the first region R1 and the second region R2, as described above. The degree of overlap is, for example, the IoU (Intersection over Union). If the IoU is equal to or greater than a predetermined reference value, the overlap between the first region R1 and the second region R2 is large, so the faces F1 and F2 are regarded as the face of the same person and the information on only one of them is output as the detection result. If the IoU is less than the reference value, the overlap is small, so the faces F1 and F2 are regarded as the faces of different people and the information on both is output as detection results. The IoU is calculated from the formula (intersection of the regions)/(union of the regions).

また、第１の領域Ｒ１が第２の領域Ｒ２と重なり、かついずれか一方の顔Ｆ１又はＦ２に係る情報を出力する場合、顔検出部１５は、顔らしさに係るスコアがより高い方の顔Ｆ１又はＦ２に係る情報を検出結果として出力する。これにより、拡大縮小画像と入力画像で検出された２つの顔Ｆ１及びＦ２のうち、より顔らしさのスコアが高い方の顔に係る情報を利用できることから、顔認証部１６による当該顔の認証精度を向上することができる。 Further, when the first region R1 overlaps the second region R2 and outputs information related to either one of the faces F1 or F2, the face detection unit 15 selects the face with a higher face-likeness score. Information related to F1 or F2 is output as a detection result. As a result, information related to the face with a higher face-likeness score among the two faces F1 and F2 detected in the enlarged/reduced image and the input image can be used, so the accuracy of the recognition of the face by the face authentication unit 16 can be improved.
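
The overlap test and score-based selection described above can be sketched as follows. Boxes are (x, y, width, height) with (x, y) the upper-left corner, and both boxes are assumed to be expressed in input-image coordinates already. The reference value 0.5 and the tie-breaking rule are assumptions of this sketch; the text specifies only "a predetermined reference value".

```python
def iou(a, b) -> float:
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0                       # no common area
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def merge_pair(f1, f2, reference=0.5):
    """f1: (box, score) from the scaled image, mapped to input coordinates.
    f2: (box, score) from the input image.
    Returns the detections to output."""
    if iou(f1[0], f2[0]) >= reference:
        # Treated as the same person: keep the higher face-likeness score
        # (on a tie this sketch keeps f1, an arbitrary choice).
        return [f1 if f1[1] >= f2[1] else f2]
    # Treated as different people: keep both.
    return [f1, f2]


same = merge_pair(((0, 0, 10, 10), 0.8), ((1, 0, 10, 10), 0.9))
# same == [((1, 0, 10, 10), 0.9)]
```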

なお、顔Ｆ１に係る領域ｒ１及び顔Ｆ２に係る領域Ｒ２は、いずれも顔検出部１５によって検出されたバウンディングボックスＢＢに相当するものである。また、領域ｒ１の拡大縮小画像内における相対的な位置は、第１の領域Ｒ１の入力画像内における相対的な位置と一致している。 Note that the area r1 related to the face F1 and the area R2 related to the face F2 both correspond to the bounding box BB detected by the face detection unit 15. Further, the relative position of the region r1 within the enlarged/reduced image matches the relative position of the first region R1 within the input image.

また、顔検出部１５は、所定範囲の少なくとも一部が設定範囲を超えており、かつ顔検出部１５によって設定範囲外の大きさの顔が検出された場合、その顔、すなわち当該設定範囲外の大きさの顔に係る情報を、利用者が設定可能な設定条件に応じて処理する。具体的には、顔検出部１５は、当該設定範囲外の大きさの顔に係る情報を検出結果として出力しないことが好ましい。これにより、検出すべきでない大きさの顔、すなわち利用者が希望しない大きさの顔が検出された場合に、当該顔が検出されたことをモニタ４に表示されないようにして、利用者が望む検出結果のみを確認しやすくすることができる。 In addition, when at least part of the predetermined range lies outside the setting range and the face detection unit 15 detects a face whose size is outside the setting range, the face detection unit 15 processes the information on that face according to a setting condition that the user can configure. Specifically, the face detection unit 15 preferably does not output the information on a face whose size is outside the setting range as a detection result. Thus, when a face of a size that should not be detected, that is, a size the user does not want, is detected, that detection is not displayed on the monitor 4, which makes it easier for the user to check only the desired detection results.

なお、所定範囲の少なくとも一部が設定範囲を超える場合とは、より具体的には、所定範囲の最大値が設定範囲の最大値を超える場合、所定範囲の最小値が設定範囲の最小値を下回る場合、所定範囲の最大値が設定範囲の最大値を超え、かつ所定範囲の最小値が設定範囲の最小値を下回る場合が挙げられる。 More specifically, the cases in which at least part of the predetermined range lies outside the setting range are: the maximum value of the predetermined range exceeds the maximum value of the setting range; the minimum value of the predetermined range is below the minimum value of the setting range; or both of these hold at once.

また、設定条件によっては、設定範囲外の大きさの顔が検出された場合、該顔を検出すべき顔（設定範囲内の大きさの顔）とは異なる態様でモニタ４に表示されるようにしてもよい。例えば、設定範囲外の大きさの顔と、設定範囲内の大きさの顔とで、バウンディングボックスの表示態様を異なるもの、例えば異なる色にしてもよい。 Also, depending on the setting conditions, if a face with a size outside the setting range is detected, the face may be displayed on the monitor 4 in a different manner from the face to be detected (face with a size within the setting range). You may also do so. For example, the bounding box may be displayed in a different manner, for example, in a different color, for a face whose size is outside the set range and a face whose size is within the set range.
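
The handling of out-of-range faces can be sketched as a small decision function. The mode names "normal", "alternate" and "hidden" and the boolean flag are hypothetical stand-ins for the user-configurable setting condition, which the text does not specify in detail.

```python
def display_mode(eye_distance: float, setting_min: float, setting_max: float,
                 show_out_of_range: bool = False) -> str:
    """Decide how a detected face is reported.

    "normal"    -- size within the setting range: report as usual
    "alternate" -- outside the range, but shown in a different style
                   (for example, a bounding box of another color)
    "hidden"    -- outside the range and not output as a detection result
    """
    if setting_min <= eye_distance <= setting_max:
        return "normal"
    return "alternate" if show_out_of_range else "hidden"


# Predetermined range 8-200 px, setting range 50-200 px:
display_mode(30, 50, 200)                           # "hidden"
display_mode(30, 50, 200, show_out_of_range=True)   # "alternate"
display_mode(120, 50, 200)                          # "normal"
```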

また、顔検出部１５は、上述のように、メモリ容量が互いに異なる複数の顔検出器を含んでいる。なお、複数の顔検出器は、メモリ容量が異なるだけであり、検出可能な顔の大きさを示す所定範囲は、いずれの顔検出器でも共通である。 Furthermore, as described above, the face detection unit 15 includes a plurality of face detectors having different memory capacities. Note that the plurality of face detectors differ only in memory capacity, and the predetermined range indicating the size of a detectable face is common to all face detectors.

顔検出器のメモリ容量は、顔検出処理中の演算結果を記憶するために確保されたものであり、複数の画像サイズ（以下、基準画像サイズとも言う）に応じて設定される。このように、顔検出器のメモリ容量を複数の基準画像サイズに応じて設定することによって、検出対象となる画像（拡大縮小画像や入力画像）の画像サイズに応じて検出器のメモリ容量を都度確保する場合に比べて、処理時間を短縮することができる。顔検出器のメモリ容量は、上述のように、ＶＧＡ、ＦｕｌｌＨＤ、４Ｋ、任意の画像サイズ等の複数の画像サイズに応じて設定されている。 The memory capacity of the face detector is secured to store calculation results during face detection processing, and is set according to a plurality of image sizes (hereinafter also referred to as reference image sizes). In this way, by setting the memory capacity of the face detector according to multiple reference image sizes, the memory capacity of the detector can be adjusted each time according to the image size of the image to be detected (scaled image or input image). The processing time can be shortened compared to the case where the data is secured. As described above, the memory capacity of the face detector is set according to a plurality of image sizes such as VGA, Full HD, 4K, and any image size.

顔検出部１５は、検出対象の画像内をくまなく検索し、画像内の任意の場所で顔を見つけ出す（推定する）ものである。また、顔検出部１５は、正面を向いていない顔、すなわち様々な方向を向いた顔を検出することができる。 The face detection unit 15 searches the whole image of the detection target and finds (estimates) a face anywhere in the image. Further, the face detection unit 15 can detect faces that are not facing forward, that is, faces facing in various directions.

より具体的には、顔検出部１５は、上述のように複数の顔検出器を含んでおり、各顔検出器は、記憶部２０に記憶された推論モデル２１を制御部１０にて実行することによって、その機能、すなわち顔検出機能を実現する。 More specifically, the face detection unit 15 includes a plurality of face detectors as described above, and each face detector executes the inference model 21 stored in the storage unit 20 by the control unit 10. By doing so, the function, namely the face detection function, is realized.

ここで、推論モデル２１について説明する。推論モデル２１は、ラベル情報（正解データ）が付されたデータセット（教師データ）の教師あり機械学習により作成される。より具体的には、推論モデル２１は、人物の顔及び人物の顔以外の物体が写った画像（例えば二次元の静止画像）を入力データとし、その顔に付与された特徴点（例えば、左右の目）の位置や、その顔の領域を示す矩形のバウンディングボックス、人物の顔を１とし人物の顔以外の物体を０とするスコア情報等の情報をラベルとして、畳み込みニューラルネットワーク（ＣＮＮ：Convolutional Neural Network）を利用した学習用プログラムにより深層学習（ディープラーニング）を行うことによって作成される。 Here, the inference model 21 will be described. The inference model 21 is created by supervised machine learning on a data set (training data) to which label information (ground-truth data) is attached. More specifically, the inference model 21 is created by deep learning with a training program that uses a convolutional neural network (CNN). The input data are images (for example, two-dimensional still images) showing human faces and objects other than human faces, and the labels are information such as the positions of feature points assigned to each face (for example, the left and right eyes), a rectangular bounding box indicating the region of the face, and score information in which a human face is 1 and an object other than a human face is 0.

教師あり機械学習により作成された推論モデル２１は、学習済みパラメータが組み込まれた推論プログラム（学習済みモデル）として機能する。なお、学習済みパラメータは、データセットを用いた学習の結果、得られたパラメータ（係数）である。また、推論プログラムは、入力として与えられた入力画像（静止画像、又は映像（動画像）を構成する各静止画像）に対して、学習の結果として取得された学習済みパラメータを適用し、当該入力画像に対する結果（具体的には、上述したような各特徴点やバウンディングボックスの位置等）を出力するための一連の演算手順を規定したプログラムである。 The inference model 21 created by supervised machine learning functions as an inference program (trained model) in which trained parameters are incorporated. The trained parameters are the parameters (coefficients) obtained as a result of training on the data set. The inference program is a program that defines a series of computation steps for applying the trained parameters to an input image given as input (a still image, or each still image constituting a video) and outputting the result for that input image (specifically, the positions of the feature points and the bounding box described above, and so on).

なお、推論モデル２１（各顔検出器）は、追加学習されてもよい。すなわち、推論モデル２１に異なるデータセットを適用し、更なる学習を行うことによって、新たに学習済みパラメータを生成し、この新たな学習済みパラメータが組み込まれた推論プログラムを推論モデル２１（各顔検出器）として利用してもよい。 Note that the inference model 21 (each face detector) may be additionally trained. That is, new trained parameters may be generated by applying a different data set to the inference model 21 and training it further, and the inference program incorporating these new trained parameters may be used as the inference model 21 (each face detector).

画像サイズ調整部１４は、拡大縮小画像のサイズが複数の基準画像サイズのうちの最も近い画像サイズに一致するように、拡大縮小画像の周囲の少なくとも一部に背景画像を追加する処理を行う。この結果、拡大縮小画像のサイズは、いずれかの顔検出器のメモリ容量に対応するものとなる。そして、背景画像が追加された拡大縮小画像は、複数の顔検出器のうち、背景画像が追加された拡大縮小画像のサイズに対応するメモリ容量が確保された顔検出器によって、顔検出が行われる。これにより、上述のように、検出対象となる画像（拡大縮小画像や入力画像）の画像サイズに応じて顔検出器のメモリ容量を都度確保することなく顔検出処理を実行できるため、処理時間を短縮することができる。 The image size adjustment unit 14 adds a background image to at least part of the periphery of the enlarged/reduced image so that the size of the enlarged/reduced image matches the closest of the plurality of reference image sizes. As a result, the size of the enlarged/reduced image corresponds to the memory capacity of one of the face detectors. Face detection on the enlarged/reduced image with the added background is then performed by the face detector, among the plurality of face detectors, whose reserved memory capacity corresponds to the size of that padded image. Thus, as described above, face detection can be executed without reserving detector memory each time according to the size of the detection target image (the enlarged/reduced image or the input image), which shortens the processing time.

また、上述のように、拡大縮小画像のみならず入力画像に対しても顔検出を行う場合は、同様に、画像サイズ調整部１４は、入力画像のサイズが複数の基準画像サイズのうちの最も近い画像サイズに一致するように、入力画像の周囲の少なくとも一部に背景画像を追加する処理を行う。この結果、入力画像のサイズは、いずれかの顔検出器のメモリ容量に対応するものとなる。そして、背景画像が追加された入力画像は、複数の顔検出器のうち、背景画像が追加された入力画像のサイズに対応するメモリ容量が確保された顔検出器によって、顔検出が行われる。 Likewise, when face detection is performed on the input image as well as on the enlarged/reduced image as described above, the image size adjustment unit 14 adds a background image to at least part of the periphery of the input image so that the size of the input image matches the closest of the plurality of reference image sizes. As a result, the size of the input image corresponds to the memory capacity of one of the face detectors. Face detection on the input image with the added background is then performed by the face detector, among the plurality of face detectors, whose reserved memory capacity corresponds to the size of that padded image.

なお、背景画像は、顔検出部１５（各顔検出器）によってその背景画像内に顔が検出され得ない画像であれば、特に限定されないが、黒等の単色の画像が好適である。また、背景画像を元の画像（拡大縮小画像又は入力画像）のどこに追加するかも特に限定されず、例えば、矩形又は正方形の元の画像の一辺のみに追加してもよいし、隣り合う又は対向する二辺のみに追加してもよいし、三辺のみに追加してもよいし、四辺全てに追加してもよい。何れの場合も背景画像は、通常、元の画像の辺に対して幅が一定の帯状の領域として追加される。 Note that the background image is not particularly limited as long as it is an image in which a face cannot be detected by the face detection unit 15 (each face detector), but a monochromatic image such as black is preferable. Furthermore, there are no particular limitations on where to add the background image to the original image (scaled or reduced image or input image). It may be added to only two sides, only three sides, or all four sides. In either case, the background image is usually added as a band-shaped area with a constant width relative to the sides of the original image.

具体的には、例えば、入力画像のサイズ（解像度）が８００×８００であり、拡大縮小画像のサイズ（解像度）が４００×４００であり、ＶＧＡ（６４０×４８０）、ＦｕｌｌＨＤ（１９２０×１０８０）、４Ｋの画像サイズにそれぞれ対応した３つの顔検出器がある場合、画像サイズ調整部１４は、入力画像及び拡大縮小画像にそれぞれ黒色の背景画像を追加し、入力画像及び拡大縮小画像のサイズをそれぞれＦｕｌｌＨＤ（１９２０×１０８０）及びＶＧＡ（６４０×４８０）に変更する。そして、サイズ変更後の入力画像に対してＦｕｌｌＨＤ（１９２０×１０８０）の画像サイズに対応した顔検出器で顔検出を行い、サイズ変更後の拡大縮小画像に対してＶＧＡ（６４０×４８０）の画像サイズに対応した顔検出器で顔検出を行う。 Specifically, for example, the input image size (resolution) is 800 x 800, the scaled image size (resolution) is 400 x 400, VGA (640 x 480), Full HD (1920 x 1080), When there are three face detectors each corresponding to a 4K image size, the image size adjustment unit 14 adds a black background image to each of the input image and the enlarged/reduced image, and adjusts the sizes of the input image and the enlarged/reduced image, respectively. Change to Full HD (1920x1080) and VGA (640x480). Then, face detection is performed on the input image after resizing using a face detector compatible with the image size of Full HD (1920 x 1080), and a VGA (640 x 480) image is detected on the enlarged/reduced image after resizing. Face detection is performed using a face detector that corresponds to the size.
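
The size adjustment in this example can be sketched as follows. Several points are assumptions of this sketch rather than statements of the source: "closest" is read as the smallest reference size that contains the image in both dimensions (which reproduces the 800×800 to FullHD and 400×400 to VGA assignments above), 4K is taken as 3840×2160, the image is placed at the upper-left corner with the background added on the right and bottom, and images are plain lists of pixel rows.

```python
# Reference image sizes (width, height); the 4K dimensions are assumed.
REFERENCE_SIZES = {"VGA": (640, 480), "FullHD": (1920, 1080), "4K": (3840, 2160)}


def choose_reference_size(width, height, sizes=REFERENCE_SIZES):
    """Pick the smallest reference size that can contain the image."""
    fitting = [(w * h, name, (w, h)) for name, (w, h) in sizes.items()
               if w >= width and h >= height]
    if not fitting:
        raise ValueError("image is larger than every reference size")
    _, name, size = min(fitting)
    return name, size


def pad_rows(rows, target_w, target_h, fill=0):
    """Add a single-color background (fill=0, i.e. black) on the right
    and bottom so the image becomes target_w x target_h."""
    padded = [list(r) + [fill] * (target_w - len(r)) for r in rows]
    padded += [[fill] * target_w for _ in range(target_h - len(rows))]
    return padded


choose_reference_size(400, 400)   # ("VGA", (640, 480))
choose_reference_size(800, 800)   # ("FullHD", (1920, 1080))
```

Placing the image at the upper-left corner keeps detected coordinates unchanged by the padding; padding on the left or top would require subtracting the corresponding offset afterwards.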

顔認証部１６は、顔検出部１５の検出結果に基づき認証処理を行う。具体的な認証処理の内容は、特に限定されず、例えば、本人特定、属性（年齢、性別等）の推定、アクセサリ（眼鏡、サングラスの有無等）の検知等の処理が挙げられ、これらのうちの一つの処理のみを行ってもよいし、２以上の処理を行ってもよいが、顔認証部１６は、少なくとも本人特定を行うことが好ましい。顔認証部１６によって参照される顔検出部１５の検出結果は、顔検出部１５によって検出された顔の画像領域を特定できる情報であれば特に限定されず、例えば、バウンディングボックスＢＢの座標、特徴点の位置（例えば左右の目の座標）等が挙げられる。 The face authentication section 16 performs authentication processing based on the detection result of the face detection section 15. The specific content of authentication processing is not particularly limited, and includes, for example, identification of the person, estimation of attributes (age, gender, etc.), detection of accessories (presence of glasses, sunglasses, etc.); Although only one process or two or more processes may be performed, it is preferable that the face authentication section 16 at least performs identification. The detection result of the face detection unit 15 referred to by the face authentication unit 16 is not particularly limited as long as it is information that can specify the image area of the face detected by the face detection unit 15, and for example, the coordinates of the bounding box BB, the characteristics, etc. Examples include the position of a point (for example, the coordinates of the left and right eyes).

図７で説明したように、拡大縮小画像にて顔検出部１５によって検出された顔Ｆ１の領域ｒ１に対応する入力画像中の第１の領域Ｒ１が、入力画像にて顔検出部１５によって検出された顔Ｆ２の領域である第２の領域Ｒ２と重なり、かつ顔検出部１５が拡大縮小画像にて検出した顔Ｆ１に係る情報を検出結果として出力する場合、顔認証部１６は、拡大縮小画像にて検出された顔Ｆ１の座標に対応する入力画像の座標から特定される入力画像の画像領域に基づいて認証処理を行う。すなわち、顔認証部１６は、入力画像から当該画像領域を切り出す処理を行い、当該画像領域に対して認証処理を行う。これにより、顔認証部１６による認証精度の低下を防ぐことが可能である。画像を縮小すると、画素を間引く、又は画素値を重み付けて平均とる処理が入り、入力画像に比べて情報量が落ちるため、認証精度が低下する可能性がある。拡大の場合は、補間処理が入り、情報量は入力画像以下になるため、認証精度は同等以下になる。したがって、上述のように、入力画像から認証用の画像を切り出すことにより、認証精度の低下を防ぐことが可能である。 As described with reference to FIG. 7, suppose that the first region R1 in the input image, which corresponds to the region r1 of the face F1 detected by the face detection unit 15 in the enlarged/reduced image, overlaps the second region R2, which is the region of the face F2 detected by the face detection unit 15 in the input image, and that the face detection unit 15 outputs the information on the face F1 detected in the enlarged/reduced image as the detection result. In that case, the face authentication unit 16 performs authentication processing based on the image area of the input image specified from the input-image coordinates that correspond to the coordinates of the face F1 detected in the enlarged/reduced image. That is, the face authentication unit 16 cuts out that image area from the input image and performs authentication processing on it. This makes it possible to prevent a decrease in the authentication accuracy of the face authentication unit 16. Reducing an image involves thinning out pixels or taking a weighted average of pixel values, so the amount of information falls compared with the input image and authentication accuracy may decrease. Enlarging an image involves interpolation, and the amount of information does not exceed that of the input image, so authentication accuracy is at best equal. Therefore, cutting out the image for authentication from the input image, as described above, prevents a decrease in authentication accuracy.

ここで、拡大縮小画像にて検出された顔Ｆ１の座標とは、顔Ｆ１の領域ｒ１（バウンディングボックスＢＢ）の座標（左上のコーナーの座標及びサイズ）であることが好ましく、顔検出部１５にて検出された顔Ｆ２の第２の領域Ｒ２とは、上述のように、顔Ｆ２のバウンディングボックスＢＢに相当するものである。したがって、顔認証部１６は、拡大縮小画像にて検出された顔Ｆ１の領域ｒ１（バウンディングボックスＢＢ）に対応する入力画像中の第１の領域Ｒ１が、入力画像にて検出された顔Ｆ２の第２の領域Ｒ２（バウンディングボックスＢＢ）と重なり、かつ拡大縮小画像にて検出された顔Ｆ１に係る情報が顔検出部１５の検出結果として出力される場合、入力画像中の第１の領域Ｒ１に基づいて認証処理を行うことがより好ましい。 Here, the coordinates of the face F1 detected in the enlarged/reduced image are preferably the coordinates (coordinates and size of the upper left corner) of the region r1 (bounding box BB) of the face F1, and The second region R2 of the face F2 detected in this manner corresponds to the bounding box BB of the face F2, as described above. Therefore, the face authentication unit 16 determines that the first region R1 in the input image corresponding to the region r1 (bounding box BB) of the face F1 detected in the enlarged/reduced image is the same as that of the face F2 detected in the input image. When information related to the face F1 that overlaps with the second region R2 (bounding box BB) and is detected in the enlarged/reduced image is output as the detection result of the face detection unit 15, the first region R1 in the input image It is more preferable to perform authentication processing based on.
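
Mapping the region r1 back to the first region R1 and cutting it out of the input image can be sketched as follows. Boxes are (x, y, width, height) with (x, y) the upper-left corner. The function names, the optional padding offsets, the rounding to whole pixels, and the list-of-rows image representation are assumptions introduced for illustration.

```python
def to_input_coords(box, percent, pad_left=0, pad_top=0):
    """Convert a box found on the scaled (and possibly padded) image into
    input-image coordinates. `percent` is the enlargement/reduction ratio;
    pad_left/pad_top are background widths added on the left/top, if any."""
    x, y, w, h = box
    f = 100.0 / percent
    return ((x - pad_left) * f, (y - pad_top) * f, w * f, h * f)


def crop(rows, box):
    """Cut the box out of an image given as a list of pixel rows."""
    x, y, w, h = (int(round(v)) for v in box)
    return [row[x:x + w] for row in rows[y:y + h]]


# r1 detected at 50% scale maps to R1 at twice the position and size:
r1 = (100, 50, 80, 80)
R1 = to_input_coords(r1, 50)   # (200.0, 100.0, 160.0, 160.0)
```

Authentication then runs on `crop(input_image, R1)`, so it sees the full-resolution pixels of the input image rather than the reduced ones.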

顔認証部１６による認証処理の結果は、モニタ４に表示されることによって利用者に報知される。 The result of the authentication process by the face authentication section 16 is displayed on the monitor 4 to notify the user.

＜顔検出処理の手順＞ <Face detection processing procedure>
次に、図８及び９を用いて、画像処理装置３で行われる画像処理の手順、主に顔検出に係る処理の手順について説明する。ここでは、目間距離の所定範囲が８～２００画素である場合、すなわち目間距離が８～２００画素の大きさの顔を顔検出部１５（各顔検出器）が検出可能な場合について説明する。 Next, the procedure of the image processing performed by the image processing device 3, mainly the processing related to face detection, will be described with reference to FIGS. 8 and 9. The description here covers the case where the predetermined range of the inter-eye distance is 8 to 200 pixels, that is, where the face detection unit 15 (each face detector) can detect faces with an inter-eye distance of 8 to 200 pixels.

まず、利用者の入力操作等に基づき、設定部１２によって、入力画像から検出すべき顔の大きさ（目間距離）の範囲として、設定範囲が設定される（ステップＳ１１）。ここでは、目間距離の設定範囲を８～４００画素とする。 First, based on the user's input operation, the setting unit 12 sets a setting range as the range of the size of the face (inter-eye distance) to be detected from the input image (step S11). Here, the setting range of the inter-eye distance is 8 to 400 pixels.

次に、映像入力部１１に、カメラ１又はレコーダ２から入力画像（静止画像又は動画像）が入力される（ステップＳ１２）。例えば、入力画像には、目間距離が４００画素の顔が撮像されている。 Next, an input image (still image or moving image) is input to the video input unit 11 from the camera 1 or the recorder 2 (step S12). For example, the input image includes a face with an inter-eye distance of 400 pixels.

次に、解像度変更部１３が、所定範囲と設定範囲とに基づいて、入力画像の解像度を変更した画像である拡大縮小画像を生成する（ステップＳ１３）。具体的には、（所定範囲の最大値）／（設定範囲の最大値）×１００、すなわち２００／４００×１００の式から算出した拡大／縮小率（＝５０％）で入力画像を拡大／又は縮小（ここでは縮小）し、拡大縮小画像を生成する。この結果、顔検出部１５は、拡大縮小画像では、入力画像の解像度に換算して、目間距離が１６～４００画素の顔を検出可能となる。このように、拡大縮小画像では、目間距離が１６未満の顔を検出できないことから、顔検出部１５は、入力画像に対しても顔検出を行う。 Next, the resolution changing unit 13 generates an enlarged/reduced image, which is an image obtained by changing the resolution of the input image, based on the predetermined range and the setting range (step S13). Specifically, the input image is enlarged or reduced (here, reduced) at the enlargement/reduction ratio (= 50%) calculated from the formula (maximum value of the predetermined range)/(maximum value of the setting range)×100, that is, 200/400×100, to generate the enlarged/reduced image. As a result, in the enlarged/reduced image the face detection unit 15 can detect faces with an inter-eye distance of 16 to 400 pixels in terms of the resolution of the input image. Since faces with an inter-eye distance of less than 16 pixels cannot be detected in the enlarged/reduced image, the face detection unit 15 also performs face detection on the input image.

Next, the image size adjustment unit 14 adds a background image to each of the enlarged/reduced image and the input image and adjusts their sizes to, for example, VGA (640×480) and Full HD (1920×1080), respectively (step S14).
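
The size adjustment of step S14 might look like the sketch below. Several points are assumptions made only for illustration: the image is a row-major list of pixel rows, the background is a constant value added on the right and bottom edges, the target is the smallest supported size that contains the image, and 4K is taken as 3840×2160 (the description gives pixel dimensions only for VGA and Full HD). The function names are hypothetical.

```python
# Supported (width, height) sizes; the 4K dimensions are an assumption.
SUPPORTED_SIZES = [(640, 480), (1920, 1080), (3840, 2160)]


def target_size(width, height, sizes=SUPPORTED_SIZES):
    """Return the smallest supported size that contains a width x height image."""
    for w, h in sorted(sizes, key=lambda s: s[0] * s[1]):
        if width <= w and height <= h:
            return (w, h)
    raise ValueError("image is larger than every supported size")


def pad_with_background(image, background=0):
    """Pad a row-major 2D pixel list on the right and bottom with `background`."""
    height = len(image)
    width = len(image[0]) if height else 0
    w, h = target_size(width, height)
    # Extend each existing row to the target width, then append full background rows.
    padded = [list(row) + [background] * (w - width) for row in image]
    padded += [[background] * w for _ in range(h - height)]
    return padded


scaled = [[1] * 640 for _ in range(360)]  # e.g. a 1280x720 input reduced to 50%
vga_image = pad_with_background(scaled)
print(len(vga_image[0]), len(vga_image))
```

Padding rather than resampling keeps the faces at the size chosen in step S13, so the detectable range computed there still applies after the adjustment.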

Next, the face detection unit 15 performs face detection processing on the enlarged/reduced image and the input image (step S15). The face detection unit 15 has three face detectors whose memory capacities correspond to the VGA, Full HD, and 4K image sizes, and which face detector is used is determined according to the image sizes of the input image and the enlarged/reduced image. Here, the face detector whose memory capacity corresponds to VGA performs face detection on the enlarged/reduced image, and the face detector whose memory capacity corresponds to Full HD performs face detection on the input image.
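
The choice of face detector in step S15 amounts to a lookup keyed by image size. The sketch below only illustrates that dispatch: `make_stub_detector` is a hypothetical placeholder that returns a label instead of running the inference model 21, and the 4K dimensions are again an assumption.

```python
def make_stub_detector(label):
    """Build a placeholder detector; a real one would run the inference model."""
    def detect(image):
        return [{"detector": label, "width": len(image[0]), "height": len(image)}]
    return detect


# One detector per supported (width, height); each is sized for its image size.
DETECTORS = {
    (640, 480): make_stub_detector("VGA"),
    (1920, 1080): make_stub_detector("Full HD"),
    (3840, 2160): make_stub_detector("4K"),
}


def detect_faces(image):
    """Dispatch a non-empty row-major image to the detector matching its size."""
    size = (len(image[0]), len(image))
    detector = DETECTORS.get(size)
    if detector is None:
        raise ValueError(f"no face detector for image size {size}")
    return detector(image)


vga = [[0] * 640 for _ in range(480)]
print(detect_faces(vga)[0]["detector"])
```

The lookup assumes the image has already been adjusted to one of the supported sizes, as in step S14.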

As a result, detection results from the face detection unit 15 are obtained for each of the enlarged/reduced image and the input image.

When detection results are obtained for both the enlarged/reduced image and the input image, they are merged according to their degree of overlap (step S16). Specifically, the face detection unit 15 calculates the IoU between a first region R1 in the input image, which corresponds to the region r1 (bounding box BB) of a face F1 detected in the enlarged/reduced image, and a second region R2 (bounding box BB) of a face F2 detected in the input image. If the IoU is equal to or greater than a predetermined reference value, the information on only one of the faces F1 and F2 (preferably the face with the higher face-likeness score) is adopted; if the IoU is less than the reference value, the information on both faces F1 and F2 is adopted.
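
The merging of step S16 can be sketched as follows. The box format `(x1, y1, x2, y2)`, the dictionary keys, the function names, and the reference value of 0.5 are all assumptions for illustration; the description only states that a predetermined reference value is used.

```python
def to_input_coords(box, percent):
    """Map a box found on the scaled image back to input-image coordinates."""
    factor = 100.0 / percent
    return tuple(v * factor for v in box)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge(scaled_dets, input_dets, percent, reference=0.5):
    """Keep one face per overlapping pair (the higher score), otherwise keep both."""
    result = list(input_dets)
    for det in scaled_dets:
        mapped = {"box": to_input_coords(det["box"], percent), "score": det["score"]}
        match = next(
            (i for i, d in enumerate(input_dets) if iou(mapped["box"], d["box"]) >= reference),
            None,
        )
        if match is None:
            result.append(mapped)      # IoU below the reference value: adopt both
        elif mapped["score"] > result[match]["score"]:
            result[match] = mapped     # same face: adopt the higher-scoring one
    return result


scaled_dets = [{"box": (50, 50, 100, 100), "score": 0.9}]
input_dets = [
    {"box": (102, 98, 198, 202), "score": 0.8},
    {"box": (500, 500, 520, 520), "score": 0.7},
]
merged = merge(scaled_dets, input_dets, percent=50.0)
print(merged)
```

In this example the face from the reduced image overlaps the first input-image face strongly and has the higher score, so it replaces that detection, while the small face found only in the input image is kept as is.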

Then, the face detection unit 15 calculates the inter-eye distance from the coordinates of the left and right eyes of each face adopted in step S16. If the calculated inter-eye distance is within the setting range, the detection result for that face is output; if it is outside the setting range, the information on that face is deleted without outputting a detection result (step S17).
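
A minimal sketch of the filtering in step S17 is shown below. It assumes that eye coordinates are given as `(x, y)` pairs in input-image pixels and that the setting range is inclusive at both ends; the dictionary keys and function names are hypothetical.

```python
import math


def inter_eye_distance(left_eye, right_eye):
    """Euclidean distance between the two eye coordinates, in pixels."""
    return math.dist(left_eye, right_eye)


def filter_by_setting_range(faces, lo, hi):
    """Keep only faces whose inter-eye distance lies within [lo, hi]."""
    return [
        face for face in faces
        if lo <= inter_eye_distance(face["left_eye"], face["right_eye"]) <= hi
    ]


faces = [
    {"left_eye": (0, 0), "right_eye": (3, 4)},          # distance 5: too small
    {"left_eye": (100, 100), "right_eye": (300, 100)},  # distance 200: kept
    {"left_eye": (0, 0), "right_eye": (500, 0)},        # distance 500: too large
]
kept = filter_by_setting_range(faces, lo=8, hi=400)
print(len(kept))
```

This final check is needed because the detectors can still report faces outside the setting range, for example small faces found in the pass over the unscaled input image.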

Thereafter, the face authentication unit 16 performs authentication processing based on the detection results of the face detection unit 15 (step S18), and the processing ends.

As described above, in this embodiment, the resolution of the input image is changed, that is, the input image is enlarged or reduced, based on the predetermined range, which indicates the range of detectable face sizes (inter-eye distances), and the setting range, which is set as the range of face sizes (inter-eye distances) to be detected from the input image, and the face detection unit 15 performs face detection on the resulting enlarged/reduced image. Therefore, even when the size of the face to be detected (the setting range) lies outside the detectable predetermined range, an enlarged/reduced image of an appropriate size in which the face detection unit 15 can detect the face can be generated, and face detection can be performed on that image. Furthermore, because the enlarged/reduced image is generated based on the predetermined range and the setting range, face processing can be performed efficiently on the minimum necessary number of enlarged/reduced images. Consequently, faces of the size requested by the user can be detected efficiently.

In the above embodiment, the case where the input image is enlarged or reduced at the enlargement/reduction ratio calculated from the maximum value of the predetermined range and the maximum value of the setting range has been described specifically. However, the enlargement/reduction ratio of the input image may instead be calculated from the minimum value of the predetermined range and the minimum value of the setting range, or from the median value of the predetermined range and the median value of the setting range.
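
The three ways of deriving the ratio can be compared with a small sketch. The function name `ratio_from_anchor` and its `anchor` parameter are hypothetical, and the median of a range is interpreted here as its midpoint, which is an assumption about the intended meaning.

```python
def ratio_from_anchor(predetermined, setting, anchor="max"):
    """Enlargement/reduction ratio (%) from (lo, hi) ranges, using the chosen anchor."""
    p_lo, p_hi = predetermined
    s_lo, s_hi = setting
    if anchor == "max":
        return p_hi / s_hi * 100.0
    if anchor == "min":
        return p_lo / s_lo * 100.0
    if anchor == "median":
        # Midpoint of each range (assumed meaning of "median value").
        return ((p_lo + p_hi) / 2) / ((s_lo + s_hi) / 2) * 100.0
    raise ValueError(f"unknown anchor: {anchor}")


for name in ("max", "min", "median"):
    print(name, ratio_from_anchor((8, 200), (8, 400), name))
```

For the ranges of the example (8 to 200 and 8 to 400 pixels), the maximum-based ratio is 50%, the minimum-based ratio is 100% (no scaling), and the midpoint-based ratio is roughly 51%.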

In the above embodiment, the case where the inference model 21 is built by deep learning using a convolutional neural network has been described. However, the inference model 21 is not particularly limited as long as it is created by machine learning, and it may be created by machine learning other than deep learning.

In the above embodiment, the case where the image processing device 3 is configured as a single device has been described. However, the functions of the image processing device 3 may be implemented by a distributed processing system in which they are distributed among a plurality of devices as appropriate.

Specifically, as shown in FIG. 10, for example, an image processing system may be configured from a camera 101 serving as an image acquisition unit and an image recording unit, a personal computer 102 that is communicably connected to the camera 101 and has a monitor and an input device, and a cloud server 103 communicably connected to the personal computer 102. The personal computer 102 may be given the functions of the video input unit 11, the setting unit 12, the resolution changing unit 13, the image size adjustment unit 14, and the face detection unit 15 described above, and the cloud server 103 may be given the function of the face authentication unit 16 described above. The monitor and the input device of the personal computer 102 may be used as a display unit and an input unit, respectively, and the monitor may display images from the camera 101, the detection results of the face detection unit 15, and the authentication results of the face authentication unit 16. The function of the image recording unit may instead be given to the personal computer 102 or the cloud server 103.

Although embodiments of the present invention have been described above with reference to the drawings, the present invention is not limited to the above embodiments. The configurations of the embodiments may be combined or modified as appropriate without departing from the gist of the present invention.

As described above, the present invention is a technique useful for efficiently detecting, from an image, a face of the size requested by a user.

1: Camera
2: Recorder
3: Image processing device
4: Monitor
5: Input device
10: Control unit
11: Video input unit
12: Setting unit
13: Resolution changing unit
14: Image size adjustment unit
15: Face detection unit
16: Face authentication unit
20: Storage unit
21: Inference model
101: Camera
102: Personal computer
103: Cloud server

Claims

An image processing device comprising:
a face detection unit capable of detecting, from an image, a face whose size is within a predetermined range;
a storage unit that stores a setting range set by a user as a range of face sizes to be detected from an input image; and
a resolution changing unit that determines, based on the predetermined range and the setting range, a resolution of an image in which the face detection unit can perform detection, and generates an enlarged/reduced image that is the input image with its resolution changed to the determined resolution,
wherein the face detection unit performs face detection on the enlarged/reduced image.

The image processing device according to claim 1, wherein the resolution changing unit generates the enlarged/reduced image by changing the resolution of the input image so that the face detection unit can detect at least one of a face having a size corresponding to the maximum value of the setting range and a face having a size corresponding to the minimum value of the setting range.

The image processing device according to claim 2, wherein the resolution changing unit generates the enlarged/reduced image by lowering the resolution of the input image so that the face detection unit can detect at least a face having a size corresponding to the maximum value of the setting range.

The image processing device according to claim 1, wherein the face detection unit further performs face detection on the input image depending on the minimum value of the setting range.

The image processing device according to claim 4, wherein, where the enlargement/reduction ratio of the enlarged/reduced image with respect to the input image is X%, the minimum value of the predetermined range is Min1, and the minimum value of the setting range is Min2, the face detection unit further performs face detection on the input image when the relationship (Min2 × X / 100) < Min1 is satisfied.

The image processing device according to claim 4 or 5, wherein, when a first region in the input image corresponding to a face region detected in the enlarged/reduced image overlaps a second region that is a face region detected in the input image, the face detection unit outputs, as a detection result, information on one of the faces depending on the degree of overlap between the first region and the second region.

The image processing device according to claim 6, wherein the face detection unit further outputs a score related to the face-likeness of each detected face and, when the first region overlaps the second region and information on one of the faces is to be output, outputs as the detection result the information on the face with the higher face-likeness score.

The image processing device according to claim 6 or 7, further comprising a face authentication unit that performs authentication processing based on the detection result of the face detection unit,
wherein, when the first region overlaps the second region and the face detection unit outputs, as the detection result, information on the face detected in the enlarged/reduced image, the face authentication unit performs the authentication processing based on an image region of the input image specified from coordinates in the input image that correspond to coordinates of the face detected in the enlarged/reduced image.

The image processing device according to claim 1, further comprising a face authentication unit that performs authentication processing based on a detection result of the face detection unit.

The image processing device according to claim 1, wherein, when at least part of the predetermined range extends beyond the setting range and the face detection unit detects a face whose size is outside the setting range, the face detection unit does not output information on that face as a detection result.

The image processing device according to any one of claims 1 to 10, wherein the face detection unit detects a plurality of feature points of a face included in an image and calculates the size of the face based on the plurality of feature points.

The image processing device according to claim 1, wherein the face detection unit includes a plurality of face detectors having different memory capacities, and
the memory capacities of the plurality of face detectors are set according to a plurality of image sizes.

The image processing device according to claim 12, further comprising an image size adjustment unit that adds a background image to at least part of the periphery of the enlarged/reduced image so that the size of the enlarged/reduced image matches the closest image size among the plurality of image sizes,
wherein the face detection unit performs face detection on the enlarged/reduced image to which the background image has been added.

An image processing system comprising:
a face detection unit capable of detecting, from an image, a face whose size is within a predetermined range;
a storage unit that stores a setting range set by a user as a range of face sizes to be detected from an input image; and
a resolution changing unit that determines, based on the predetermined range and the setting range, a resolution of an image in which the face detection unit can perform detection, and generates an enlarged/reduced image that is the input image with its resolution changed to the determined resolution,
wherein the face detection unit performs face detection on the enlarged/reduced image.

An image processing method for detecting a face from an image, the method comprising:
storing a setting range set by a user as a range of face sizes to be detected from an input image;
determining, based on a predetermined range that is a range of detectable face sizes and on the setting range, a resolution of an image in which a face detection unit can perform detection, and generating an enlarged/reduced image that is the input image with its resolution changed to the determined resolution; and
performing face detection on the enlarged/reduced image.

An image processing program that causes a computer, in order to detect a face from an image, to function as:
means for determining, based on a predetermined range that is a range of detectable face sizes and a setting range set by a user as a range of face sizes to be detected from an input image, a resolution of an image in which a face detection unit can perform detection, and for generating an enlarged/reduced image that is the input image with its resolution changed to the determined resolution; and
means for performing face detection on the enlarged/reduced image.