JP2023012364A - Detection apparatus

Info

Publication number: JP2023012364A
Application number: JP2021115968A
Authority: JP
Inventors: 将幸山崎; Masayuki Yamazaki
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2021-07-13
Filing date: 2021-07-13
Publication date: 2023-01-25

Abstract

To provide a detection apparatus that can improve the robustness of the accuracy with which the position of an object on an image is detected. SOLUTION: A detection apparatus includes: a preprocessing unit 31 which applies a plurality of mutually different preprocesses to an image representing a detection target, to generate a plurality of preprocessed images; a detection unit 32 which inputs the image and each of the preprocessed images to a detector for detecting the detection target, to detect a position of the detection target in the image and in each of the preprocessed images; and a position specifying unit 33 which specifies a statistical representative value of the positions of the detection target in the image and in each of the preprocessed images as the position of the detection target on the image. SELECTED DRAWING: Figure 3

Description

The present invention relates to a detection device for detecting an object represented in an image.

Techniques for detecting a given object represented in an image have been studied. In particular, techniques have been studied in which an image is input to a detector that has been trained in advance to detect the given object, so that the object is detected from the image. For such techniques, it is desirable to prepare, as the training images used to train the detector, images obtained under various environments and a large number of images generated by various cameras with different specifications. If only images obtained under one specific environment, or only images generated by a camera with one specific specification, are used as training images, the detector may not achieve sufficient detection accuracy. A technique has therefore been proposed in which a plurality of images are generated from a single image representing an identification target, and the plurality of images are used to identify the state of the identification target (see, for example, Patent Document 1).

The state identification device disclosed in Patent Document 1 applies, to a single input image of an identification target, a process that changes how the identification target appears in that image while preserving its state, and generates an image group that includes the images produced by this process. The state identification device uses the generated image group when building an identification model and/or when identifying a state using the identification model. In addition, for at least one type of appearance, the state identification device applies to the single image at least a change in the direction opposite to the direction present in that image.

Patent Document 1: JP 2018-116589 A

In the above technique, an image group generated from a single image is used to identify the state of the identification target. However, the accuracy with which the position of the object to be detected on an image is detected may be degraded by differences in the imaging environment at the time the image was generated or in the camera that generated the image.

Accordingly, an object of the present invention is to provide a detection device capable of improving the robustness of the accuracy with which the position of an object on an image is detected.

According to one embodiment, a detection device is provided. The detection device includes: a preprocessing unit that generates a plurality of preprocessed images by applying a plurality of mutually different preprocesses to an image representing a detection target; a detection unit that detects the position of the detection target in each of the image and the plurality of preprocessed images by inputting each of them to a detector for detecting the detection target; and a position specifying unit that specifies a statistical representative value of the positions of the detection target in the image and the plurality of preprocessed images as the position of the detection target on the image.

The detection device according to the present invention has the effect of improving the robustness of the accuracy with which the position of an object on an image is detected.

FIG. 1 is a schematic configuration diagram of a vehicle control system in which the detection device is implemented.

FIG. 2 is a hardware configuration diagram of an electronic control unit, which is one embodiment of the detection device.

FIG. 3 is a functional block diagram of the processor of the electronic control unit relating to the detection processing.

FIG. 4 is an explanatory diagram outlining the detection processing according to the present embodiment.

FIG. 5 is an operation flowchart of the detection processing.

A detection device, and a detection method and a detection computer program executed on the detection device, are described below with reference to the drawings. The detection device generates a plurality of preprocessed images by applying a plurality of mutually different preprocesses to an original image obtained by photographing a detection target. The detection device inputs the original image and each of the preprocessed images to a detector for detecting the detection target, and thereby detects the position of the detection target in the original image and in each of the preprocessed images. The detection device then specifies a statistical representative value of the positions of the detection target in the original image and the preprocessed images as the position of the detection target on the original image, thereby improving the robustness of the position detection accuracy.

The following describes an example in which the detection device is applied to a driver monitor device that monitors a driver based on a time series of images obtained by continuously photographing the face of the driver of a vehicle. The driver monitor device detects feature points of individual organs of the driver's face (for example, eyes, nose, and mouth) from an image generated by a driver monitor camera installed so as to photograph the driver's head (hereinafter referred to as a face image for convenience of explanation). The driver monitor device then determines the driver's state based on the detected feature points of each organ. That is, the feature points of each organ of the driver's face are an example of the detection target. Here, the driver monitor device applies a plurality of preprocesses to the face image to generate a plurality of preprocessed images, and inputs the face image and each preprocessed image to a detector, thereby detecting one or more feature points for each organ of the driver's face from the face image and from each preprocessed image. For each feature point, the driver monitor device takes the statistical representative value of the positions of that feature point detected from the face image and the preprocessed images as the position of that feature point in the face image.

The detection device according to the present embodiment is not limited to a driver monitor device. It is suitable for various applications that require detecting a detection target from an image obtained by a camera that photographs the target, such as a web camera or a surveillance camera. The detection target is also not limited to individual organs of a person's face. It may be a part of a person other than the head (for example, a hand or a foot), the whole body, or any of various objects other than a person (for example, a vehicle or a non-human animal).

FIG. 1 is a schematic configuration diagram of a vehicle control system in which the detection device is implemented. FIG. 2 is a hardware configuration diagram of an electronic control unit, which is one embodiment of the detection device. In the present embodiment, a vehicle control system 1, which is mounted on a vehicle 10 and controls the vehicle 10, includes a driver monitor camera 2, a user interface 3, and an electronic control unit (ECU) 4, which is an example of the detection device. The driver monitor camera 2 and the user interface 3 are communicably connected to the ECU 4 via an in-vehicle network conforming to a standard such as Controller Area Network. The vehicle control system 1 may further include a GPS receiver (not shown) for measuring the position of the vehicle 10. The vehicle control system 1 may further include at least one of a camera (not shown) for photographing the surroundings of the vehicle 10 and a distance sensor (not shown), such as a LiDAR or a radar, that measures the distance from the vehicle 10 to objects around the vehicle 10. The vehicle control system 1 may also include a wireless communication terminal (not shown) for wireless communication with other devices, and a navigation device (not shown) for searching for a travel route of the vehicle 10.

The driver monitor camera 2 is an example of a camera or an in-vehicle imaging unit. It includes a two-dimensional detector composed of an array of photoelectric conversion elements sensitive to visible or infrared light, such as a CCD or CMOS sensor, and an imaging optical system that forms an image of the region to be photographed on the two-dimensional detector. The driver monitor camera 2 may further include a light source for illuminating the driver, such as an infrared LED. The driver monitor camera 2 is mounted, for example, on or near the instrument panel, facing the driver, so that the head of the driver seated in the driver's seat of the vehicle 10 is included in the region to be photographed, that is, so that the driver's head can be photographed. The driver monitor camera 2 photographs the driver's head at a predetermined imaging period (for example, 1/30 second to 1/10 second) and generates a face image representing the driver's face. The face image obtained by the driver monitor camera 2 may be a color image or a grayscale image. Each time the driver monitor camera 2 generates a face image, it outputs the generated face image to the ECU 4 via the in-vehicle network.

The user interface 3 is an example of a notification unit and includes a display device such as a liquid crystal display or an organic EL display. The user interface 3 is installed in the cabin of the vehicle 10, for example on the instrument panel, facing the driver. The user interface 3 displays various information received from the ECU 4 via the in-vehicle network, thereby notifying the driver of that information. The user interface 3 may further include a speaker installed in the cabin. In this case, the user interface 3 outputs the various information received from the ECU 4 via the in-vehicle network as an audio signal, thereby notifying the driver of that information.

The ECU 4 detects the orientation of the driver's face based on the face image and determines the driver's state based on that orientation. When the driver is in a state unsuitable for driving, such as looking aside, the ECU 4 warns the driver via the user interface 3.

As shown in FIG. 2, the ECU 4 includes a communication interface 21, a memory 22, and a processor 23. The communication interface 21, the memory 22, and the processor 23 may each be configured as separate circuits, or may be configured integrally as a single integrated circuit.

The communication interface 21 includes an interface circuit for connecting the ECU 4 to the in-vehicle network. Each time the communication interface 21 receives a face image from the driver monitor camera 2, it passes the received face image to the processor 23. When the communication interface 21 receives from the processor 23 information to be displayed on the user interface 3, it outputs that information to the user interface 3.

The memory 22 is an example of a storage unit and includes, for example, a volatile semiconductor memory and a nonvolatile semiconductor memory. The memory 22 stores various algorithms and various data used in the driver monitor processing, including the detection processing, executed by the processor 23 of the ECU 4. For example, the memory 22 stores various parameters used to generate the preprocessed images and various parameters used to detect the feature points of each organ of the face. The memory 22 also temporarily stores various data generated during the driver monitor processing, such as the face images received from the driver monitor camera 2 and the preprocessed images.

The processor 23 includes one or more central processing units (CPUs) and their peripheral circuits. The processor 23 may further include other arithmetic circuits such as a logic operation unit, a numerical operation unit, or a graphics processing unit. The processor 23 executes the driver monitor processing, which includes the detection processing.

FIG. 3 is a functional block diagram of the processor 23 relating to the driver monitor processing. The processor 23 includes a preprocessing unit 31, a detection unit 32, a position specifying unit 33, and a state determination unit 34. These units of the processor 23 are, for example, functional modules implemented by a computer program running on the processor 23. Alternatively, these units may be dedicated arithmetic circuits provided in the processor 23. Of these units, the preprocessing unit 31, the detection unit 32, and the position specifying unit 33 relate to the detection processing.

The preprocessing unit 31 generates a plurality of preprocessed images by applying a plurality of mutually different preprocesses to the face image that the ECU 4 has received from the driver monitor camera 2.

In the present embodiment, the preprocessing unit 31 applies to the face image, as preprocessing, at least one of, for example, conversion of the size or aspect ratio of the face image and conversion of its contrast. The preprocessing unit 31 may also apply color correction, edge enhancement, or smoothing to the face image as preprocessing. The preprocessing unit 31 may further apply, as preprocessing, a combination of several of the above processes, such as size conversion, contrast conversion, and color correction.

For example, the preprocessing unit 31 resamples the face image at a predetermined sampling rate and thereby generates, as one of the preprocessed images, a size-converted image in which the horizontal and vertical sizes of the face image have been changed. In this case, the preprocessing unit 31 may crop from the face image a partial region in which a specific part of the face (for example, an eye) is assumed to be represented, resample the partial region to enlarge it, and add the result as one of the preprocessed images. The partial region can be, for example, the region in which the specific part of the face was represented in the face image obtained immediately before.

The preprocessing unit 31 also resamples the face image at different sampling rates in the vertical and horizontal directions and thereby generates, as another of the preprocessed images, an aspect-ratio-converted image in which the aspect ratio of the face image has been changed. As the resampling method, the preprocessing unit 31 may apply a method such as simple decimation, the nearest-neighbor method, bilinear interpolation, or bicubic interpolation. The preprocessing unit 31 may apply two or more mutually different resampling methods to a single face image to generate two or more size-converted images or aspect-ratio-converted images. The preprocessing unit 31 may also apply mutually different sampling rates to generate, from a single face image, two or more size-converted images or aspect-ratio-converted images that differ in size or aspect ratio.
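As a rough illustration of the resampling described above, the following Python sketch resizes an image by nearest-neighbor sampling with independent horizontal and vertical rates. The function name `resample_nearest`, the list-of-rows image representation, and the convention that a rate is the output-to-input size ratio are assumptions made for this example and are not taken from the embodiment.

```python
def resample_nearest(image, rate_x, rate_y):
    """Resample a 2-D image (a list of rows) by nearest-neighbor sampling.

    rate_x and rate_y are output-to-input size ratios (an assumed
    convention). Equal rates change only the size; different rates also
    change the aspect ratio.
    """
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * rate_y))
    dst_w = max(1, round(src_w * rate_x))
    out = []
    for y in range(dst_h):
        # Map the output row back to the nearest source row.
        sy = min(src_h - 1, int(y / rate_y))
        row = []
        for x in range(dst_w):
            # Map the output column back to the nearest source column.
            sx = min(src_w - 1, int(x / rate_x))
            row.append(image[sy][sx])
        out.append(row)
    return out


# Hypothetical 2x2 image: doubling only the width changes the aspect ratio.
tiny = [[1, 2],
        [3, 4]]
wide = resample_nearest(tiny, rate_x=2.0, rate_y=1.0)
```

Bilinear or bicubic interpolation would replace only the lookup inside the inner loop; the rate handling stays the same.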

The preprocessing unit 31 also converts the value of each pixel of the face image according to a contrast conversion curve that indicates the relationship between input and output pixel values, and thereby generates, as another of the preprocessed images, a contrast-converted image in which the contrast of the face image has been changed. The pixel value can be, for example, the luminance or the value of each color component. Furthermore, the preprocessing unit 31 calculates color conversion coefficients such that the color of a predetermined region in the face image becomes a preset reference color, and converts the value of each pixel of the face image using the calculated color conversion coefficients, thereby generating, as another of the preprocessed images, a color-corrected image. In this case, the predetermined region is preferably a region in which a predetermined object in the cabin (for example, the cabin ceiling) is always represented regardless of the driver's posture.
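The two pixel-value conversions above can be sketched as follows. This is a minimal illustration under stated assumptions: the contrast conversion curve is represented as a lookup table and a gamma curve is used only as one convenient example of such a curve; the color conversion coefficients are assumed to be simple per-channel gains. All function names are hypothetical.

```python
def gamma_curve(gamma, max_value=255):
    """Build a lookup table: index = input value, entry = output value.

    A gamma curve is used here only as an example of a contrast
    conversion curve; the embodiment does not specify the curve shape.
    """
    return [round(max_value * (v / max_value) ** gamma)
            for v in range(max_value + 1)]


def apply_curve(gray, curve):
    """Convert every pixel of a grayscale image through the lookup table."""
    return [[curve[p] for p in row] for row in gray]


def color_gains(rgb, region, reference):
    """Per-channel gains that map the mean color of `region` to `reference`.

    region = (top, left, bottom, right) with exclusive bottom/right.
    Per-channel gains are an assumed form of the conversion coefficients.
    """
    top, left, bottom, right = region
    pixels = [rgb[y][x] for y in range(top, bottom) for x in range(left, right)]
    gains = []
    for c in range(3):
        mean_c = sum(p[c] for p in pixels) / len(pixels)
        gains.append(reference[c] / mean_c if mean_c > 0 else 1.0)
    return tuple(gains)


def apply_gains(rgb, gains, max_value=255):
    """Scale each channel by its gain and clip to the valid range."""
    return [[tuple(min(max_value, round(p[c] * gains[c])) for c in range(3))
             for p in row] for row in rgb]


# Hypothetical data: a 1x2 image whose whole area serves as the region.
patch = [[(100, 50, 200), (100, 50, 200)]]
gains = color_gains(patch, region=(0, 0, 1, 2), reference=(100, 100, 100))
corrected = apply_gains(patch, gains)
```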

Furthermore, the preprocessing unit 31 applies a predetermined edge enhancement filter to the face image and thereby generates, as another of the preprocessed images, an edge-enhanced image in which the edges of the face image are emphasized. The preprocessing unit 31 also applies a predetermined smoothing filter to the face image and thereby generates, as another of the preprocessed images, a smoothed image.
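The embodiment does not name specific filters. As one possible sketch, the following Python code applies a 3x3 kernel to a grayscale image, with a box filter standing in for the smoothing filter and a common sharpening kernel standing in for the edge enhancement filter. The kernels, the border handling (edge replication), and the names are assumptions for illustration.

```python
# Example kernels (assumptions): a 3x3 box filter and a common sharpening kernel.
SMOOTH = [[1 / 9] * 3 for _ in range(3)]
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]


def filter3x3(gray, kernel):
    """Apply a 3x3 kernel to a grayscale image (list of rows).

    Border pixels are handled by replicating the nearest edge pixel.
    Both example kernels are symmetric, so correlation and convolution
    give the same result here.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    sy = min(h - 1, max(0, y + ky - 1))
                    sx = min(w - 1, max(0, x + kx - 1))
                    acc += kernel[ky][kx] * gray[sy][sx]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical image with a single bright pixel in the center.
impulse = [[0, 0, 0],
           [0, 9, 0],
           [0, 0, 0]]
smoothed = filter3x3(impulse, SMOOTH)
sharpened = filter3x3(impulse, SHARPEN)
```

The output values are floats and are not clipped; a real pipeline would round and clip them to the pixel range before passing the image to the detector.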

The preprocessing unit 31 outputs each generated preprocessed image to the detection unit 32.

The detection unit 32 detects one or more feature points for each organ of the driver's face from the face image and from each preprocessed image.

To detect one or more feature points for each organ of the face, the detection unit 32 inputs the face image to a detector that has been trained in advance to detect the feature points of that organ (for example, the inner corner of the eye, the outer corner of the eye, the tip of the nose, and the corners of the mouth). The detection unit 32 thereby detects one or more feature points for each organ of the face from the face image. Similarly, the detection unit 32 inputs each preprocessed image to the detector and thereby detects one or more feature points of each organ of the face in each preprocessed image. As such a detector, the detection unit 32 can use, for example, a deep neural network (DNN) having a convolutional neural network (CNN) architecture, such as Single Shot MultiBox Detector (SSD) or Faster R-CNN. Alternatively, the detection unit 32 may use a detector trained in advance to detect the feature points of individual facial organs based on another machine learning technique, such as a support vector machine or AdaBoost. As a further alternative, the detection unit 32 may use a detector that uses information on the entire face, such as an Active Shape Model (ASM) or an Active Appearance Model (AAM).

For the face image and each preprocessed image, the detection unit 32 outputs to the position specifying unit 33 information representing the detected one or more feature points of each organ of the face. The information representing the one or more feature points of each organ includes, for example, the position coordinates of each feature point and an identification number indicating the facial organ that the feature point represents (for example, eye, nose, or mouth) and its position within the organ (for example, inner corner of the eye, outer corner of the eye, tip of the nose, or corner of the mouth).
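One way to picture the information passed from the detection unit 32 to the position specifying unit 33 is the record below. It is only a sketch: the enumeration values, the field names, and the `source` field that records which image a detection came from are assumptions for illustration, not part of the embodiment.

```python
from dataclasses import dataclass
from enum import IntEnum


class LandmarkId(IntEnum):
    """Hypothetical identification numbers: organ plus position in the organ."""
    LEFT_EYE_INNER_CORNER = 0
    LEFT_EYE_OUTER_CORNER = 1
    NOSE_TIP = 2
    MOUTH_LEFT_CORNER = 3


@dataclass(frozen=True)
class FeaturePoint:
    landmark_id: LandmarkId  # which organ and which position within it
    x: float                 # position coordinates in the image it was found in
    y: float
    source: int              # assumed: 0 = face image, 1..n = preprocessed images


# Hypothetical detections of the same landmark in two images.
detections = [
    FeaturePoint(LandmarkId.NOSE_TIP, 120.0, 88.0, source=0),
    FeaturePoint(LandmarkId.NOSE_TIP, 241.0, 177.0, source=1),
]
nose_points = [d for d in detections if d.landmark_id == LandmarkId.NOSE_TIP]
```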

For each of the one or more feature points of the individual organs of the driver's face detected in the face image and the preprocessed images, the position specifying unit 33 specifies the statistical representative value of the positions of that feature point as the position of the feature point on the face image.

For example, for a feature point of interest, the position specifying unit 33 calculates the centroid or the mean of the positions of that feature point in the face image and the preprocessed images as the statistical representative value of the position of the feature point. When the preprocessed images generated by the preprocessing unit 31 include a size-converted image or an aspect-ratio-converted image, the position specifying unit 33 applies to each feature point the inverse of the conversion used to generate that image. The position specifying unit 33 can thereby express the position of a feature point on the size-converted or aspect-ratio-converted image in the coordinates of the original face image. When the preprocessed images include one obtained by resampling a partial region, the position specifying unit 33 resamples that preprocessed image at the reciprocal of the sampling rate used when it was generated, and thereby obtains the position of the feature point on the resampled preprocessed image. The position specifying unit 33 then corrects the position of the feature point according to the position of the partial region in the original face image, so that the position of the feature point is expressed in the coordinates of the original face image.
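The coordinate handling described above can be sketched in a few lines of Python. The sketch assumes that a sampling rate is the output-to-input size ratio, so the inverse conversion is a division, and that a cropped partial region is described by the coordinates of its top-left corner in the original face image. The function names `to_original` and `mean_position` are hypothetical.

```python
def to_original(point, rate_x=1.0, rate_y=1.0, offset=(0.0, 0.0)):
    """Map a point detected on a resampled (and possibly cropped) image
    back to the coordinates of the original face image.

    rate_x, rate_y: sampling rates used to create the preprocessed image
                    (assumed to be output-to-input size ratios).
    offset:         top-left corner of the cropped partial region in the
                    original image; (0, 0) when the whole image was used.
    """
    x, y = point
    return (x / rate_x + offset[0], y / rate_y + offset[1])


def mean_position(points):
    """Mean (centroid) of a list of (x, y) positions."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Hypothetical detections of one feature point:
in_face_image = (10.0, 10.0)                                 # original image
in_wide_image = to_original((24.0, 10.0), rate_x=2.0)        # aspect-ratio-converted
in_crop = to_original((6.0, 26.0), 2.0, 2.0, offset=(8.0, 0.0))  # enlarged crop
position = mean_position([in_face_image, in_wide_image, in_crop])
```

Contrast-converted, color-corrected, edge-enhanced, and smoothed images keep the geometry of the face image, so their detections need no coordinate conversion before averaging.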

According to a modification, for a feature point of interest, the position specifying unit 33 may exclude, from the positions of that feature point in the face image and the preprocessed images, any position that satisfies a predetermined outlier criterion. The position specifying unit 33 may then calculate the statistical representative value of the position of the feature point of interest based on the remaining positions. In this case, the position specifying unit 33 may set the predetermined outlier criterion according to, for example, the k-nearest neighbor method, the Random Sample Consensus (RANSAC) method, or the Least Median of Squares (LMedS) method. By excluding from the calculation of the statistical representative value those detections of the feature point of interest whose positions satisfy the outlier criterion, the position specifying unit 33 can calculate the position of the feature point of interest more accurately. When a plurality of feature points are detected for a single organ, the position specifying unit 33 calculates, for that organ, the sum of squared distances over the sets of corresponding feature points while changing the correspondence between the feature points in the face image and those in each preprocessed image. The position specifying unit 33 may then treat the sets of feature points for which the sum of squared distances is smallest as mutually corresponding feature points.
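The two steps of this modification can be sketched as follows. The outlier test shown is one simple reading of a k-nearest-neighbor criterion (a detection is dropped when the distance to its k-th nearest neighbor exceeds a threshold); the values of `k` and `threshold`, the brute-force search over permutations, and the function names are assumptions for illustration. RANSAC or LMedS would be alternative criteria and are not shown.

```python
import math
from itertools import permutations


def knn_inliers(points, k=2, threshold=5.0):
    """Keep the points whose k-th nearest neighbor lies within `threshold`.

    This is an assumed, simplified k-nearest-neighbor outlier criterion.
    """
    inliers = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # With fewer than k neighbors there is nothing to test against.
        if len(dists) < k or dists[k - 1] <= threshold:
            inliers.append(p)
    return inliers


def match_points(reference, candidates):
    """Order `candidates` so that the sum of squared distances to
    `reference` is smallest. Assumes both lists have the same length.
    Brute force is acceptable because one organ has only a few points.
    """
    def cost(perm):
        return sum((rx - cx) ** 2 + (ry - cy) ** 2
                   for (rx, ry), (cx, cy) in zip(reference, perm))
    return list(min(permutations(candidates), key=cost))


# Hypothetical detections of one feature point, already in original coordinates.
detections = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
kept = knn_inliers(detections)

# Hypothetical eye corners: face image vs. one preprocessed image.
matched = match_points([(0.0, 0.0), (10.0, 0.0)], [(9.0, 1.0), (1.0, 0.0)])
```

After matching, the statistical representative value would be computed per feature point from the kept positions only.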

FIG. 4 is an explanatory diagram outlining the detection processing according to the present embodiment. Applying n mutually different preprocesses (n is an integer of 2 or more) to an input face image 400 generates n preprocessed images 401-1 to 401-n. The face image 400 and the preprocessed images 401-1 to 401-n are each input to a detector 402, so that feature points 403 are detected for each organ of the face from the face image 400 and from each of the preprocessed images 401-1 to 401-n. For each feature point 403, any position that satisfies the outlier criterion is excluded from the positions of that feature point in the face image 400 and the preprocessed images 401-1 to 401-n. The position of the feature point 403 in the face image 400 is then specified as the statistical representative value of the positions of that feature point in the images that were not excluded.
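The flow of FIG. 4 can be summarized as a small Python sketch. The detector is passed in as a callable so that the sketch stays independent of any particular model; the `variants` list of (preprocess, to_original) pairs, the dictionary returned by the detector, and the toy stand-ins below are assumptions for illustration. Outlier exclusion is omitted here for brevity.

```python
from statistics import mean


def detect_robust(image, variants, detector):
    """Detect feature points on the image and on each preprocessed image,
    map every detection back to original coordinates, and average.

    variants: list of (preprocess, to_original) pairs. An identity pair
              stands for the original face image itself.
    detector: callable returning {feature_id: (x, y)} for one image.
    """
    collected = {}
    for preprocess, to_original in variants:
        for feature_id, point in detector(preprocess(image)).items():
            collected.setdefault(feature_id, []).append(to_original(point))
    # The mean is used as the statistical representative value.
    return {fid: (mean(p[0] for p in pts), mean(p[1] for p in pts))
            for fid, pts in collected.items()}


# Toy stand-ins (hypothetical): the "image" only carries its scale, and the
# "detector" reports a nose position that scales with the image.
def toy_detector(img):
    return {"nose": (10.0 * img["scale"], 20.0 * img["scale"])}


variants = [
    (lambda img: img, lambda p: p),                        # original image
    (lambda img: {"scale": img["scale"] * 2.0},            # size-converted image
     lambda p: (p[0] / 2.0, p[1] / 2.0)),
]
result = detect_robust({"scale": 1.0}, variants, toy_detector)
```

Pairing each preprocess with its own inverse mapping keeps the averaging step unaware of how each preprocessed image was produced.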

The position specifying unit 33 notifies the state determination unit 34 of the position of each of the one or more feature points of each organ of the driver's face.

The state determination unit 34 determines the state of the driver based on the positions of the feature points of each organ of the driver's face represented in the face image.

In the present embodiment, the state determination unit 34 determines whether the driver's state is suitable for driving the vehicle 10 by comparing the orientation of the driver's face represented in the face region with a reference direction of the driver's face. The reference direction of the face is stored in the memory 22 in advance.

The state determination unit 34 fits the feature points of the individual organs of the face to a three-dimensional face model representing the three-dimensional shape of the face. The state determination unit 34 then detects, as the orientation of the driver's face, the orientation of the three-dimensional face model at which the feature points best fit the model. Alternatively, the state determination unit 34 may detect the orientation of the driver's face according to another method of determining the orientation of a face represented in an image based on the feature points of the organs of the face. The orientation of the driver's face is represented by, for example, a combination of a pitch angle, a yaw angle, and a roll angle relative to the direction directly facing the driver monitor camera 2.

The state determination unit 34 calculates the absolute value of the difference between the orientation of the driver's face represented in the face region and the reference direction of the driver's face, and compares the absolute value of the difference with a predetermined allowable face orientation range. If the absolute value of the difference is outside the allowable face orientation range, the state determination unit 34 determines that the driver is looking aside, that is, that the driver's state is not suitable for driving the vehicle 10.
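The comparison described above can be illustrated with the following Python sketch. It assumes, as one possible reading that the embodiment does not spell out, that the orientation is a `(pitch, yaw, roll)` tuple in degrees and that the allowable face orientation range is given as one upper limit per angle; the function name `is_looking_aside` is hypothetical.

```python
def is_looking_aside(face_angles, reference_angles, tolerance):
    """Return True if any angle deviates from the reference beyond its limit.

    face_angles, reference_angles: (pitch, yaw, roll) in degrees.
    tolerance: per-angle upper limit on the absolute difference (assumed form
    of the allowable face orientation range).
    """
    return any(
        abs(angle - ref) > limit
        for angle, ref, limit in zip(face_angles, reference_angles, tolerance)
    )
```

For example, with limits of `(20, 30, 20)` degrees and a reference of `(0, 0, 0)`, a yaw of 35 degrees would be judged as looking aside, whereas a yaw of -10 degrees would not.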

Note that the driver may face a direction other than the front of the vehicle 10 in order to check the situation around the vehicle 10. Even in such a case, however, a driver who is concentrating on driving the vehicle 10 does not keep facing a direction other than the front of the vehicle 10. Therefore, according to a modification, the state determination unit 34 may determine that the driver's state is not suitable for driving the vehicle 10 when the period during which the absolute value of the difference between the orientation of the driver's face and the reference direction of the driver's face is outside the allowable face orientation range continues for a predetermined time (for example, several seconds) or longer.
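The duration condition of this modification could be realized, for instance, as in the following Python sketch. The class name `LookAsideTimer`, the default of 3 seconds, and the use of caller-supplied timestamps in seconds are assumptions made for illustration only.

```python
class LookAsideTimer:
    """Report True only after the out-of-range state has lasted long enough."""

    def __init__(self, hold_seconds=3.0):
        self.hold_seconds = hold_seconds  # assumed "predetermined time"
        self._since = None                # time at which the current run started

    def update(self, out_of_range, timestamp):
        """Feed one per-frame result; return True if the driver is judged unfit."""
        if not out_of_range:
            self._since = None            # the run is broken, start over
            return False
        if self._since is None:
            self._since = timestamp
        return timestamp - self._since >= self.hold_seconds
```

Resetting the start time whenever the face returns to the allowable range means that a brief glance at a mirror does not accumulate toward the predetermined time.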

When the state determination unit 34 determines that the driver's state is not suitable for driving the vehicle 10, it generates warning information including a warning message that warns the driver to face the front of the vehicle 10. The state determination unit 34 then outputs the generated warning information to the user interface 3 via the communication interface 21, thereby causing the user interface 3 to display the warning message. Alternatively, the state determination unit 34 causes a speaker of the user interface 3 to output a voice warning the driver to face the front of the vehicle 10.

FIG. 5 is an operation flowchart of the driver monitor processing executed by the processor 23. The processor 23 may execute the driver monitor processing, which includes the detection processing, according to the following operation flowchart. In the operation flowchart described below, the processing of steps S101 to S103 corresponds to the detection processing.

The preprocessing unit 31 of the processor 23 generates a plurality of preprocessed images by applying a plurality of mutually different preprocesses to the face image that the ECU 4 has received from the driver monitor camera 2 (step S101). The detection unit 32 of the processor 23 detects one or more feature points for each organ of the driver's face from the face image and from each preprocessed image (step S102).

For each of the one or more feature points of the individual organs of the driver's face detected in the face image and each preprocessed image, the position specifying unit 33 of the processor 23 specifies the statistical representative value of the positions of that feature point as the position of that feature point on the face image (step S103). As described above, the position specifying unit 33 may exclude from the calculation of the statistical representative value those feature points, among the feature points detected from the face image and each preprocessed image, whose positions satisfy the predetermined outlier criterion.

The state determination unit 34 of the processor 23 detects the orientation of the driver's face based on the positions of the feature points of the individual organs, and determines whether the driver's state is suitable for driving the vehicle 10 (step S104). The state determination unit 34 then executes warning processing or the like according to the determination result. After that, the processor 23 ends the driver monitor processing.

As described above, this detection device generates a plurality of preprocessed images by applying a plurality of mutually different preprocesses to an original image obtained by photographing a detection target. The detection device inputs the original image and each of the plurality of preprocessed images to a detector for detecting the detection target, thereby detecting the position of the detection target in each of the original image and the plurality of preprocessed images. The detection device then specifies the statistical representative value of the positions of the detection target in the original image and the plurality of preprocessed images as the position of the detection target on the original image. As a result, the detection device can improve the robustness of the accuracy with which the position of the detection target on the image is detected.

According to a modification, when, for a specific type of preprocessed image generated from a series of face images obtained over a predetermined period, the positions of the feature points always satisfy the outlier criterion, the position specifying unit 33 may thereafter prevent the preprocessing unit 31 from generating that specific type of preprocessed image. This reduces the amount of computation required for the detection processing as a whole.
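One way to keep track of which preprocesses to stop generating is sketched below in Python. The sketch assumes that the predetermined period is counted as a number of consecutive face images (`window`) and that the caller reports, for each frame and each preprocess type, whether its feature points satisfied the outlier criterion; the class name `PreprocessPruner` and the preprocess names used in the example are hypothetical.

```python
class PreprocessPruner:
    """Disable a preprocess whose detections were outliers for `window` frames in a row."""

    def __init__(self, names, window=100):
        self.window = window                  # assumed length of the predetermined period
        self.active = set(names)              # preprocesses still being generated
        self._streak = {name: 0 for name in names}

    def record(self, outlier_flags):
        """outlier_flags maps a preprocess name to True if it was an outlier in this frame."""
        for name in list(self.active):
            if outlier_flags.get(name, False):
                self._streak[name] += 1
                if self._streak[name] >= self.window:
                    self.active.discard(name)  # stop generating this type from now on
            else:
                self._streak[name] = 0         # "always" no longer holds, so reset
```

For example, with `window=3`, a preprocess named `"blur"` that is flagged as an outlier in three consecutive frames is removed from `active`, while one that is flagged only intermittently remains.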

A computer program that implements the functions of the processor 23 of the ECU 4 according to the above embodiment or modifications may be provided in a form recorded on a computer-readable portable recording medium, such as a semiconductor memory, a magnetic recording medium, or an optical recording medium.

As described above, those skilled in the art can make various modifications within the scope of the present invention to suit the mode of implementation.

1 Vehicle control system
10 Vehicle
2 Driver monitor camera
3 User interface
4 Electronic control unit (ECU)
21 Communication interface
22 Memory
23 Processor
31 Preprocessing unit
32 Detection unit
33 Position specifying unit
34 State determination unit

Claims

A detection device comprising:
a preprocessing unit that generates a plurality of preprocessed images by applying a plurality of mutually different preprocesses to an image representing a detection target;
a detection unit that detects the position of the detection target in each of the image and the plurality of preprocessed images by inputting each of the image and the plurality of preprocessed images to a detector for detecting the detection target; and
a position specifying unit that specifies a statistical representative value of the positions of the detection target in the image and the plurality of preprocessed images as the position of the detection target on the image.