JP2018147240A

JP2018147240A - Image processing device, image processing method, and image processing program

Info

Publication number: JP2018147240A
Application number: JP2017041837A
Authority: JP
Inventors: Takumi Sakaguchi; Yoshihisa Terada
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2017-03-06
Filing date: 2017-03-06
Publication date: 2018-09-20

Abstract

PROBLEM TO BE SOLVED: To bring forward the start of detection while limiting the loss of accuracy in object detection processing that applies learning model data having a plurality of stages. SOLUTION: An image processing device 10 detects feature points of an object by correcting initial feature points in an input image in stages, applying learning model data of N stages (N is an integer of 2 or more), each of which contributes to the derivation of correction vectors for the feature points. At least at start-up, a correction vector derivation unit and a feature point correction unit start correcting the feature points before loading of the N stages of learning model data into a work area unit 15 is complete. When deriving the correction vector of a stage t (1 ≤ t ≤ N), if the learning model data of stage t has not been loaded into the work area unit 15, the correction vector derivation unit derives the correction vector by substituting the learning model data of another stage. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program that detect feature points of a predetermined object.

In recent years, methods have been proposed for detecting feature points of an object (for example, a human face) in an input image by using a shape regression model. A shape regression model brings initial feature points progressively closer to the correct feature points through a correction process of a plurality of stages, and it requires learning model data for each of those stages.

Japanese Unexamined Patent Application Publication No. 2016-66114 (JP-A-2016-66114)

In feature point detection using a conventional shape regression model, large learning model data is loaded into the work area at startup, and the object detection process cannot start until all of the learning model data has been loaded. As a result, a long time elapses between startup and the start of detection.

The present invention has been made in view of these circumstances, and its object is to provide a technique that brings forward the start of detection while suppressing loss of accuracy in object detection processing that uses learning model data of a plurality of stages.

To solve the above problem, an image processing apparatus according to one aspect of the present invention detects feature points of an object by correcting initial feature points in an input image in stages, using learning model data of N stages (N is an integer of 2 or more), each of which contributes to the derivation of correction vectors for the feature points. The apparatus includes: a work area unit into which the N stages of learning model data are loaded; a correction vector derivation unit that derives a correction vector for each feature point by referring to the learning model data of the corresponding stage held in the work area unit, based on a feature quantity around each initial feature point in the input image or around each feature point corrected in the preceding stage; and a feature point correction unit that corrects each feature point in the input image based on the derived correction vectors. At least when the image processing apparatus starts up, the correction vector derivation unit and the feature point correction unit start the feature point correction process before the N stages of learning model data have been completely loaded into the work area unit from the recording unit that stores them. When deriving the correction vector of a stage t (1 ≤ t ≤ N), if the learning model data of stage t has not been loaded into the work area unit, the correction vector derivation unit derives the correction vector by substituting the learning model data of another stage.

Any combination of the above constituent elements, and any conversion of the expression of the present invention between a method, an apparatus, a system, a computer program, a recording medium storing a computer program, and the like, is also effective as an aspect of the present invention.

According to the present invention, in object detection processing that uses learning model data of a plurality of stages, the start of detection can be brought forward while loss of accuracy is suppressed.

FIG. 1 is a block diagram showing the configuration of an imaging system according to an embodiment of the present invention.
FIGS. 2(a) to 2(c) are diagrams for explaining the algorithm that generates the N stages of learning model data used in the feature point correction process of the feature point detection unit.
FIG. 3 is a block diagram showing in detail the configuration of the object detection unit and the feature point detection unit of FIG. 1.
FIG. 4 is a diagram showing the relationship between each detection stage and the learning model data used, according to Example 1.
FIG. 5 is a diagram showing the relationship between each detection stage and the learning model data used, according to Example 2.
FIG. 6 is a diagram showing an example of the process for determining a stage selection rule according to Example 2.
FIGS. 7(a) and 7(b) are flowcharts showing an operation example of the image processing apparatus according to a comparative example.
FIGS. 8(a) to 8(c) are diagrams showing specific examples of feature points detected by the operation example of FIGS. 7(a) and 7(b).
FIGS. 9(a) and 9(b) are flowcharts showing an operation example of the image processing apparatus according to Example 2.

FIG. 1 is a block diagram showing the configuration of an imaging system 1 according to an embodiment of the present invention. In this specification, the imaging system 1 is assumed to be a camera system for photographing the face of a driver in a vehicle. The imaging system 1 includes an imaging unit 20 and an image processing apparatus 10. The imaging unit 20 is installed in the vehicle above and in front of the driver's seat, oriented so that the driver's face falls within the angle of view. The board on which the imaging unit 20 is mounted and the board on which the image processing apparatus 10 is mounted may be housed together in one housing or separately in different housings. In the latter case, the imaging unit 20 and the image processing apparatus 10 are connected by wire or wirelessly.

The imaging unit 20 includes a solid-state imaging device and a signal processing circuit. A CMOS image sensor or a CCD image sensor, for example, can be used as the solid-state imaging device. The solid-state imaging device converts incident light into an electrical image signal and outputs it to the signal processing circuit. The signal processing circuit performs signal processing such as A/D conversion and noise removal on the image signal received from the solid-state imaging device and outputs the result to the image processing apparatus 10. The signal processing circuit may also perform predetermined preprocessing such as distortion correction and frame or pixel thinning.

The image processing apparatus 10 includes a processing unit 11, a work area unit 15, and a nonvolatile recording unit 16. The work area unit 15 includes a storage area made up of volatile memory such as DRAM or SDRAM. The nonvolatile recording unit 16 includes large-capacity nonvolatile memory such as NAND or NOR flash memory. The nonvolatile recording unit 16 may instead use a removable recording medium, such as a semiconductor memory card, an optical disc, or a removable hard disk. The nonvolatile recording unit 16 stores the N stages (N is an integer of 2 or more) of learning model data used in the feature point correction process of the feature point detection unit 13.

The processing unit 11 includes an object detection unit 12, a feature point detection unit 13, and an application processing unit 14. The processing unit 11 can be realized by hardware and software resources working together, or by hardware resources alone. A CPU, GPU, DSP, FPGA, or other LSI can be used as the hardware resource. Programs such as an operating system, firmware, and applications can be used as the software resources.

The object detection unit 12 detects the approximate position and size of a predetermined object in the input image by using a classifier for that object. In this specification, the predetermined object is assumed to be the driver's face. The feature point detection unit 13 successively corrects, over a plurality of stages, the initial feature point coordinates set in the input image containing the object to be detected, and detects the feature points estimated to be the true feature points of the object. The application processing unit 14 executes predetermined application processing based on the feature points detected by the feature point detection unit 13. In the present embodiment, this includes, for example, personal authentication of the driver, face orientation determination, and inattentive-driving (looking aside) determination.

FIGS. 2(a) to 2(c) are diagrams for explaining the algorithm that generates the N stages of learning model data used in the feature point correction process of the feature point detection unit 13. In the example shown in FIGS. 2(a) to 2(c), nine facial feature points are set in total: the inner ends of the left and right eyebrows, the inner corners of the left and right eyes, the outer corners of the left and right eyes, the point between the nostrils, and both ends of the mouth. In stage 1, the feature points of an average face (dark dots) are set in the input image as the initial feature points (see FIG. 2(a)).

The learning algorithm calculates, for each feature point, a feature quantity around the initial feature point in stage 1 and classifies the pattern around each feature point. The feature quantity may be a luminance gradient, a HOG (Histograms of Oriented Gradients) feature, or any other feature quantity. The extracted feature quantity may also be processed further, for example by clustering with the K-means method or by dimensionality reduction with principal component analysis, and the result used as the feature quantity. The learning algorithm calculates the difference between each initial feature point (dark dot) and the corresponding correct feature point (light circle) as a correction vector. The coordinates of the correct feature points are defined in advance, for example by a person judging them visually. The learning algorithm associates the pattern around each feature point with the correction vector and stores the result as the learning model data of stage 1. For example, a lookup table is prepared for each classification pattern of each feature point, and the table is filled with a representative correction vector, such as the mean or median of the correction vectors derived from the learning images classified into that same pattern. Alternatively, when the relationship between the feature quantity and the correction vector can be approximated by a mathematical expression, the coefficient parameters of that expression may be stored as the learning model data.
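As an illustration of the lookup-table variant described above, the following minimal sketch groups training samples by feature point and pattern class and stores the median correction vector for each group. The function name `build_stage_table`, the tuple layout of the samples, and the use of integer pattern IDs are assumptions made for this example; the specification does not prescribe a data layout.

```python
from collections import defaultdict
from statistics import median


def build_stage_table(samples):
    """Build one stage's lookup table (hypothetical layout).

    samples: iterable of (point_index, pattern_id, (dx, dy)) tuples, where
    (dx, dy) is the difference between the correct and the initial feature point.
    Returns {(point_index, pattern_id): (median_dx, median_dy)}.
    """
    buckets = defaultdict(list)
    for point_index, pattern_id, vec in samples:
        buckets[(point_index, pattern_id)].append(vec)

    table = {}
    for key, vecs in buckets.items():
        # The median is one of the representative values the text mentions;
        # the mean would work the same way.
        table[key] = (median(v[0] for v in vecs), median(v[1] for v in vecs))
    return table


samples = [
    (0, 3, (2.0, -1.0)),
    (0, 3, (4.0, -3.0)),
    (0, 3, (3.0, -2.0)),
    (1, 7, (-1.0, 0.5)),
]
table = build_stage_table(samples)
```

A median is less sensitive than a mean to a few badly annotated learning images, which is one possible reason to prefer it as the representative value.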

Using the learning model data built in stage 1, the learning algorithm then corrects each initial feature point of stage 1 with the learned correction vector that corresponds to the pattern around that feature point. This yields the detected feature points of stage 1, which are set as the initial feature points of stage 2. For example, the learning algorithm reads the nine correction vectors for the nine feature points from the lookup tables and adds each vector to its feature point to obtain the nine initial feature points of stage 2 (see FIGS. 2(a) and 2(b)).

In stage 2, as in stage 1, the learning algorithm calculates the feature quantity around each initial feature point of stage 2 and classifies the pattern around each feature point. It calculates the difference between each initial feature point (dark dot) and the corresponding correct feature point (light circle) as a correction vector, associates the pattern around each feature point with the correction vector, and stores the result as the learning model data of stage 2. The same processing is repeated up to stage N. As the stages progress, the correction applied to the detected feature points becomes smaller, and the deviation between the detected and correct feature points shrinks accordingly (see FIG. 2(c)). Performing this learning on a large number of face images generates the learning model data for N stages.
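The stage-by-stage correction can be sketched as a simple loop in which each stage looks up one correction vector per feature point and adds it to the current coordinates. This is a minimal sketch under assumptions: `run_cascade`, the per-stage dictionary layout, and the `classify` callback (standing in for feature extraction and pattern classification) are hypothetical names introduced here, not part of the specification.

```python
def run_cascade(points, stage_tables, classify):
    """Apply each stage's correction vectors in order.

    points:       list of (x, y) initial feature points
    stage_tables: list of {(point_index, pattern_id): (dx, dy)}, one per stage
    classify:     hypothetical callback (point_index, (x, y)) -> pattern_id
    """
    for table in stage_tables:
        corrected = []
        for i, (x, y) in enumerate(points):
            # Unknown patterns fall back to a zero correction (an assumption).
            dx, dy = table.get((i, classify(i, (x, y))), (0.0, 0.0))
            corrected.append((x + dx, y + dy))
        points = corrected  # output of this stage is the input of the next
    return points


# Toy data: one feature point, a classifier that always returns pattern 0.
stage_tables = [{(0, 0): (4.0, 2.0)}, {(0, 0): (1.0, 0.5)}]
result = run_cascade([(10.0, 10.0)], stage_tables, lambda i, p: 0)
```

In the toy data the second stage applies a smaller correction than the first, mirroring the shrinking corrections described for later stages.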

FIG. 3 is a block diagram showing in detail the configuration of the object detection unit 12 and the feature point detection unit 13 of FIG. 1. The object detection unit 12 includes an object region detection unit 121 and an initial feature point output unit 122. The feature point detection unit 13 includes a load order determination unit 131, a learning model data loading unit 132, a use stage determination unit 133, a correction vector derivation unit 134, a feature point correction unit 135, a reliability determination unit 136, and a feature point output unit 137.

The object region detection unit 121 raster-scans a search window over the input image to find the region in which the object appears. When one raster scan finishes, the size of the search window is changed and the raster scan is run again. A cascade classifier using Haar-like features, for example, can be used for this search. The detection process of the object region detection unit 121 can find the approximate position and size of the object regardless of the object's size or the degree of image blur. Based on the detection window of the object found by the object region detection unit 121, the initial feature point output unit 122 outputs the approximate feature point coordinates within that window to the correction vector derivation unit 134 as initial coordinates.
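The repeated raster scan with a changing window size can be sketched as a generator of candidate windows. This is only an illustration: `sliding_windows`, the square window shape, and the fixed stride are assumptions for the example, and the classifier that would be evaluated on each window is omitted.

```python
def sliding_windows(img_w, img_h, sizes, stride):
    """Yield (left, top, size) for a raster scan repeated once per window size."""
    for size in sizes:  # one full raster scan per search-window size
        for top in range(0, img_h - size + 1, stride):
            for left in range(0, img_w - size + 1, stride):
                yield (left, top, size)


windows = list(sliding_windows(img_w=4, img_h=4, sizes=[2, 4], stride=2))
```

Each yielded window would be passed to the object classifier, and windows that are accepted give the approximate position and size of the object.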

The load order determination unit 131 determines the order in which the N stages of learning model data stored in the nonvolatile recording unit 16 are loaded into the work area unit 15. The learning model data loading unit 132 loads the learning model data from the nonvolatile recording unit 16 into the work area unit 15 one stage at a time, following the order determined by the load order determination unit 131.

The use stage determination unit 133 determines which stage's learning model data is used in each detection stage, based on which of the N stages of learning model data have been loaded into the work area unit 15 so far. In each detection stage, the correction vector derivation unit 134 derives the correction vectors of the feature points in the input image from the learning model data of the stage chosen by the use stage determination unit 133. In each detection stage, the feature point correction unit 135 corrects the feature points in the input image using the correction vectors derived by the correction vector derivation unit 134.

The reliability determination unit 136 determines the reliability of the feature point correction for each detection stage. When the reliability of the correction vectors derived in a detection stage falls below a reference value, the reliability determination unit 136 judges the correction vectors derived in that detection stage to be of low reliability.

The feature point output unit 137 outputs the feature points corrected over the N stages to the application processing unit 14 as the detection result of the feature point detection unit 13. If, during the correction process, the reliability determination unit 136 judges a detection stage to be of low reliability, the feature point output unit 137 cancels the correction made in that detection stage and restores the feature points to their state at the immediately preceding detection stage. Alternatively, the feature point detection for the frame may be invalidated whenever a low-reliability detection stage occurs, or only when low-reliability detection stages occur a predetermined number of times or more.

In the above configuration, loading the N stages of learning model data from the nonvolatile recording unit 16 into the work area unit 15 at startup of the imaging system 1 takes a long time. The amount of learning model data grows with the number of stages, the number of feature points, and the number of pattern classes around each feature point. In an inexpensive embedded system, bus bandwidth is narrow, and loading all of the learning model data from the nonvolatile recording unit 16 into the work area unit 15 can take several seconds or more. If the imaging system 1 is used for personal authentication at the start of driving, for example, several seconds or more would pass between turning on the ignition and the vehicle actually being ready to move. A mechanism for shortening this loading wait is introduced below.

Example 1 is described first. In Example 1, detection starts once the N stages of learning model data have been partly loaded into the work area unit 15, using the learning model data already loaded. That is, detection begins when loading of some of the stages is complete, and in detection stages whose learning model data has not yet been loaded, already-loaded learning model data is used as a substitute.

FIG. 4 is a diagram showing the relationship between each detection stage and the learning model data used, according to Example 1. In the comparative example, detection starts only after all N stages of learning model data have been loaded into the work area unit 15, so each detection stage always uses the learning model data of the same stage. In Example 1, by contrast, detection starts when loading up to stage k is complete, and detection stages k through N all correct the feature points using the learning model data of stage k.
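The substitution rule of Example 1 can be written compactly: a detection stage up to k uses its own data, and any later stage falls back to stage k. The sketch below assumes stages are numbered from 1 and that stages 1 through k are loaded; the function name `model_stage_example1` is introduced here for illustration only.

```python
def model_stage_example1(t, k):
    """Stage of learning model data used at detection stage t when stages 1..k are loaded."""
    return min(t, k)


# N = 7 detection stages, loading complete up to stage k = 3.
mapping = [model_stage_example1(t, 3) for t in range(1, 8)]
```

With k equal to N the mapping reduces to the comparative example, where every detection stage uses its own data.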

Example 2 is described next. In Example 2, the loading order of the learning model data is changed so that data closer to the actual detection stage becomes available, which improves correction accuracy compared with Example 1. For example, a group of stages that includes stage 1 but omits some stages is loaded in ascending order, and the remaining stages are loaded afterwards. The remaining stages may themselves be loaded, in whole or in part, in ascending order.

FIG. 5 is a diagram showing the relationship between each detection stage and the learning model data used, according to Example 2. The example in FIG. 5 uses seven stages of learning model data, loaded in the order 1, 2, 4, 6, 3, 5, 7. If detection started when only the stage 1 data had been loaded, every detection stage would use the stage 1 data and the error would be large. It is therefore preferable to start detection once about half of the stages (3 or 4 of the 7 in FIG. 5) have been loaded. If detection starts when the learning model data of stages 1, 2, 4, and 6 has been loaded, each detection stage can use the data of its own stage or of the immediately preceding stage, which keeps the error to a minimum.
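One way to express the "own stage or immediately preceding stage" behaviour is to pick, for each detection stage, the highest loaded stage that does not exceed it. This is a sketch under that assumption; the specification determines the actual stage selection rule by testing candidates, and `select_model_stage` is a hypothetical name.

```python
def select_model_stage(t, loaded):
    """Return the highest loaded stage not exceeding detection stage t.

    Assumed rule for illustration; requires that stage 1 (or lower than t) is loaded.
    """
    candidates = [s for s in loaded if s <= t]
    if not candidates:
        raise ValueError("no loaded stage at or below detection stage %d" % t)
    return max(candidates)


loaded = {1, 2, 4, 6}  # state after the first four loads of 1, 2, 4, 6, 3, 5, 7
mapping = [select_model_stage(t, loaded) for t in range(1, 8)]
```

With stages 1, 2, 4, and 6 loaded, no detection stage is more than one stage away from the data it uses.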

As this shows, the preferred loading order is essentially one that skips every other stage and passes through the stages twice: either "1, 2, 4, 6, 3, 5, 7" or "1, 3, 5, 7, 2, 4, 6". The following describes how to derive the optimal loading order and the optimal stage selection rule.
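The two preferred orders can be generated mechanically for any number of stages. The sketch below is illustrative; the function names are assumptions, and only the N = 7 results are stated in the specification.

```python
def load_order_odd_first(n):
    """First pass over odd stages, second pass over even stages."""
    return list(range(1, n + 1, 2)) + list(range(2, n + 1, 2))


def load_order_even_first(n):
    """Stage 1, then even stages, then the remaining odd stages."""
    return [1] + list(range(2, n + 1, 2)) + list(range(3, n + 1, 2))


odd_first = load_order_odd_first(7)
even_first = load_order_even_first(7)
```

Both orders are permutations of 1 to N that begin with stage 1, so the earliest detection stage always has its own data available.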

A plurality of test images is used to determine the stage selection rule that minimizes the average error between the corrected feature points and the correct feature points. First, several candidate stage selection rules are prepared. At stage 7, the final stage, the error between each feature point corrected by the correction vectors of the selected learning model data and the corresponding correct feature point is calculated, and these errors are averaged. The error is expressed not in raw pixels but in pixels normalized by the distance between the right eye and the left eye, which prevents differences in face size among the test images from affecting the average error. Any other normalization may be used instead.
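The per-image error described above can be sketched as the mean point-to-point distance divided by the inter-eye distance. In this sketch the eye positions are passed in explicitly, which is an assumption; the specification does not say which coordinates define the eye distance. `math.dist` requires Python 3.8 or later.

```python
import math


def normalized_mean_error(predicted, truth, left_eye, right_eye):
    """Mean distance between predicted and correct points, in units of inter-eye distance."""
    eye_distance = math.dist(left_eye, right_eye)
    errors = [math.dist(p, t) / eye_distance for p, t in zip(predicted, truth)]
    return sum(errors) / len(errors)


err = normalized_mean_error(
    predicted=[(3.0, 4.0), (0.0, 0.0)],
    truth=[(0.0, 0.0), (0.0, 0.0)],
    left_eye=(0.0, 0.0),
    right_eye=(10.0, 0.0),
)
```

Because the result is a ratio, the same physical misalignment produces the same error value whether the face fills the frame or occupies a small part of it.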

Next, the average error of each test image is averaged over all test images to obtain the average error for the whole test set. Finally, the stage selection rule with the smallest average error is adopted. Instead of the average over all test images, the average error of the worst 20% of test images, for example, may be used. In that case, candidates that produce badly wrong corrections on some images are rated lower, and candidates that correct stably on every image are more likely to be adopted. The representative error value may also be derived by other methods.
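The selection step can be sketched as scoring each candidate rule by either the overall mean or the mean of its worst fraction of per-image errors, then taking the minimum. The function names, the dictionary layout, and the error values below are made up for illustration and are not taken from the specification or FIG. 6.

```python
import math


def score(errors, worst_fraction=None):
    """Mean of all per-image errors, or of the worst fraction when given."""
    if worst_fraction is None:
        return sum(errors) / len(errors)
    n = max(1, math.ceil(len(errors) * worst_fraction))
    worst = sorted(errors, reverse=True)[:n]
    return sum(worst) / len(worst)


def pick_rule(errors_by_rule, worst_fraction=None):
    """Return the name of the candidate rule with the smallest score."""
    return min(errors_by_rule, key=lambda name: score(errors_by_rule[name], worst_fraction))


# Illustrative per-image errors for two hypothetical candidate rules.
errors_by_rule = {
    "candidate_a": [0.02, 0.02, 0.02, 0.02, 0.30],
    "candidate_b": [0.08, 0.08, 0.08, 0.08, 0.08],
}
best_by_mean = pick_rule(errors_by_rule)
best_by_worst = pick_rule(errors_by_rule, worst_fraction=0.2)
```

In this toy data the two criteria disagree: the overall mean favours the candidate with one large outlier, while the worst-20% score favours the candidate that is consistently moderate.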

FIG. 6 is a diagram showing an example of the process for determining a stage selection rule according to Example 2. The example in FIG. 6 determines the best rule for selecting the learning model data used in each detection stage at the point when four stages of learning model data have been loaded into the work area unit 15. In FIG. 6, four candidate stage selection rules are tested, and candidate 3 gives the smallest average error. To realize the stage selection rule of candidate 3, the learning model data of stages 1, 2, 4, and 6 must be the four stages loaded into the work area unit 15 at that point. It can therefore be concluded that "1, 2, 4, 6, 3, 5, 7" is an appropriate loading order for the learning model data.

In Examples 1 and 2, a detection stage may use learning model data of a different stage. The reliability determination unit 136 is provided to reduce the adverse effect of this mismatch. When the reliability of the correction vectors derived in a detection stage falls below a reference value, the reliability determination unit 136 judges those correction vectors to be of low reliability. For example, if the average length of the correction vectors derived in stage t (t is an integer from 1 to N) exceeds the average length of the correction vectors derived in stage (t−1) by a predetermined value or more, the correction vectors derived in stage t are judged to be of low reliability. The length of a correction vector can be obtained, for example, as the square root of the sum of the squares of its X and Y components.
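The average-length test can be sketched as follows. The function names and the `margin` parameter (standing in for the "predetermined value") are assumptions for illustration; the specification does not give a concrete threshold.

```python
import math


def mean_length(vectors):
    """Average Euclidean length of a list of (dx, dy) correction vectors."""
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)


def is_low_reliability(prev_vectors, curr_vectors, margin):
    """True when the current stage's mean length exceeds the previous one by margin or more."""
    return mean_length(curr_vectors) - mean_length(prev_vectors) >= margin


prev_stage = [(3.0, 4.0), (0.0, 5.0)]     # mean length 5
grown_stage = [(6.0, 8.0), (0.0, 10.0)]   # mean length 10
shrunk_stage = [(0.0, 1.0), (1.0, 0.0)]   # mean length 1
```

A stage whose corrections grow rather than shrink relative to the preceding stage is flagged, which matches the expectation that a well-behaved cascade converges.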

The reliability determination unit 136 may also perform multiple detections on multiple initial feature points, obtained for example by preparing several initial feature points in advance or by adding random numbers to a single initial feature point, and thereby derive multiple correction vectors at stage t. In that case, when the variance of those correction vectors, or the variance of the corrected feature points, exceeds the corresponding variance at stage (t−1) by a predetermined value or more, the reliability of the correction vectors derived at stage t may be determined to be low. When the feature points are being corrected properly toward the correct positions, the amount of correction should decrease and converge as the stages advance. An increase over the immediately preceding stage in the length or variance of the correction vectors, or in the variance among the feature points, therefore suggests that the correction is not succeeding.
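
A minimal sketch of the variance-based check follows, assuming a single feature point tracked across several detection runs that started from different initial positions. The function names and the `margin` parameter are hypothetical, and the variance is taken here as the mean squared distance from the centroid, which is one possible reading of "variance" in the description.

```python
def spread(points):
    """Mean squared distance of (x, y) points from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n


def is_low_reliability_by_spread(prev_points, curr_points, margin):
    """True if the spread at stage t exceeds the spread at stage (t-1) by margin or more."""
    return spread(curr_points) - spread(prev_points) >= margin


prev_runs = [(0.0, 0.0), (2.0, 0.0)]   # same feature point after stage (t-1), two runs
curr_runs = [(0.0, 0.0), (4.0, 0.0)]   # same feature point after stage t, two runs
print(is_low_reliability_by_spread(prev_runs, curr_runs, margin=2.0))
```

The same function can be applied to the correction vectors themselves instead of the corrected positions.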

FIGS. 7A and 7B are flowcharts illustrating an operation example of the image processing apparatus 10 according to a comparative example. FIG. 7A shows the main routine, and FIG. 7B shows the facial feature point detection subroutine. FIGS. 8A to 8C are diagrams showing specific examples of the feature points detected in the operation example of FIGS. 7A and 7B.

In FIG. 7A, the object detection unit 12 detects a face region Rf in the input image supplied from the imaging unit 20 (S10; see FIG. 8A). The learning model data loading unit 132 loads the learning model data of the N stages recorded in the nonvolatile recording unit 16 into the work area unit 15 in stage order (S12). When loading of the learning model data of all N stages is complete, the facial feature point detection process starts (S13).

In FIG. 7B, the initial feature point output unit 122 sets initial feature points in the face region Rf (S130). The use stage determination unit 133 selects the learning model data of stage t (S132). The correction vector deriving unit 134 derives the correction vectors for the feature points at detection stage t using the learning model data of stage t (S133). The feature point correction unit 135 corrects the feature points at detection stage t using the derived correction vectors (S134). The processing from step S132 to step S134 is repeated N times. FIG. 8B shows the positions of the initial feature points, and FIG. 8C shows the positions of the feature points after correction is complete.

In FIG. 7A, the application processing unit 14 estimates the face orientation from the feature points detected in the facial feature point detection process (S14) and determines whether the driver is looking aside (S15). When the application processing unit 14 determines that the driver is looking aside, it outputs an alert signal. The processing shown in FIG. 7A is executed for each frame. The application processing unit 14 may also determine whether the driver is looking aside by detecting the amount and direction of movement of each feature point across a plurality of frames.

FIGS. 9A and 9B are flowcharts illustrating an operation example of the image processing apparatus 10 according to the second embodiment. In FIG. 9A, the object detection unit 12 detects a face region Rf in the input image supplied from the imaging unit 20 (S10). The load order determination unit 131 determines the order in which the learning model data of the N stages is loaded from the nonvolatile recording unit 16 into the work area unit 15 (S11). The learning model data loading unit 132 loads the learning model data into the work area unit 15 in the determined order (S12a). In the second embodiment, the facial feature point detection process starts before loading of the learning model data of all N stages is complete (S13).

In FIG. 9B, the initial feature point output unit 122 sets initial feature points in the face region Rf (S130). The use stage determination unit 133 determines, in accordance with a preset stage selection rule, the stage m of the learning model data to be used at detection stage t (S131). The use stage determination unit 133 then selects the learning model data of the determined stage m (S132a). The correction vector deriving unit 134 derives the correction vectors for the feature points at detection stage t using the learning model data of stage m (S133). The feature point correction unit 135 corrects the feature points at detection stage t using the derived correction vectors (S134).

The reliability determination unit 136 calculates the reliability of the correction vectors at detection stage t (S135). When the calculated reliability is less than the reference value (N in S136), the feature point output unit 137 cancels the correction made at detection stage t and outputs the feature points of detection stage (t−1) to detection stage (t+1) (S137). When the calculated reliability is equal to or greater than the reference value (Y in S136), step S137 is skipped. The processing from step S131 to step S137 is repeated N times.
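
The subroutine of FIG. 9B (steps S131 to S137) can be sketched as the loop below. It is a simplified illustration rather than the apparatus itself: `select_stage`, `derive_vectors`, and `reliability` are hypothetical callbacks standing in for units 133, 134, and 136, and the toy callbacks exist only to exercise the cancel path.

```python
def detect_feature_points(points, n_stages, select_stage, derive_vectors,
                          reliability, threshold):
    """Run N detection stages, cancelling any stage whose reliability is too low."""
    for t in range(1, n_stages + 1):
        m = select_stage(t)                    # S131, S132a: model stage used at stage t
        vectors = derive_vectors(points, m)    # S133
        corrected = [(x + dx, y + dy)          # S134
                     for (x, y), (dx, dy) in zip(points, vectors)]
        if reliability(vectors) >= threshold:  # S135, S136
            points = corrected
        # Otherwise (S137) the stage t correction is cancelled and the
        # stage (t-1) points are passed on to stage (t+1) unchanged.
    return points


# Toy stand-ins (all hypothetical): model stage 3 produces an oversized correction.
def toy_derive(points, m):
    return [(10.0 if m == 3 else 1.0, 0.0) for _ in points]


def toy_reliability(vectors):
    return 1.0 / (1.0 + max(abs(dx) + abs(dy) for dx, dy in vectors))


result = detect_feature_points([(0.0, 0.0)], 3, lambda t: t,
                               toy_derive, toy_reliability, threshold=0.2)
print(result)
```

In this toy run the first two stages are accepted and the third is cancelled, so the output equals the result of stage 2.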

In FIG. 9A, the application processing unit 14 estimates the face orientation from the feature points detected in the facial feature point detection process (S14) and determines whether the driver is looking aside (S15).

As described above, according to the present embodiment, when the learning model data is loaded from the nonvolatile recording unit 16 into the work area unit 15, the learning model data of the N stages is loaded sequentially, stage by stage, in a predetermined order. Detection starts once loading of some of the stages is complete, and when detection is performed for a stage that has not yet been loaded, the learning model data of the loaded stage closest to the current detection stage is used as a substitute in accordance with a predetermined selection rule. By determining the loading order in advance so that the loss of accuracy under substitution is minimized, detection can be performed with reduced accuracy degradation while shortening the time until detection starts.
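
The nearest-stage substitution can be sketched as follows. The function name is hypothetical, and the tie-break (preferring the lower stage when two loaded stages are equally close) is an assumption, since the description does not say how ties are resolved.

```python
def nearest_loaded_stage(t, loaded_stages):
    """Return the loaded stage closest to detection stage t (lower stage wins ties)."""
    if not loaded_stages:
        raise ValueError("no learning model data has been loaded yet")
    return min(loaded_stages, key=lambda m: (abs(m - t), m))


loaded = {1, 2, 4, 6}  # state after the first four loads of the order 1, 2, 4, 6, 3, 5, 7
print([nearest_loaded_stage(t, loaded) for t in range(1, 8)])
```

A stage that is already loaded is always selected for itself, so the rule degenerates to normal operation once all N stages are in the work area.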

In addition, the reliability is calculated at each detection stage, and when the reliability is judged to be low, the feature points of the preceding stage are output as the detection result. This also enables detection with reduced loss of accuracy.

The present invention has been described above based on the embodiments. Those skilled in the art will understand that the embodiments are examples, that various modifications are possible in the combinations of their components and processing steps, and that such modifications also fall within the scope of the present invention.

The embodiments above describe the case of a single object to be detected. When there are multiple objects to be detected, learning model data of N stages is stored in the nonvolatile recording unit 16 for each object. For example, learning model data A for detecting a normal face, learning model data B for detecting a face wearing a mask, and learning model data C for detecting a face wearing sunglasses are stored in the nonvolatile recording unit 16.

The object detection unit 12 outputs an object ID indicating the object detected in the input image to the feature point detection unit 13. The learning model data loading unit 132 of the feature point detection unit 13 loads the learning model data corresponding to the object ID from the nonvolatile recording unit 16 into the work area unit 15 by the method of the first or second embodiment described above.

In the following, the learning model data used before the object is switched (the learning model data for a normal face) is referred to as learning model data A, and the learning model data used after the switch (the learning model data for a masked face) is referred to as learning model data B. If the reliability of a detection stage using learning model data B is less than the reference value and learning model data A remains in the work area unit 15, the reliability obtained when using learning model data A may be calculated, and if it is equal to or greater than the reference value, detection may revert to using learning model data A. In this case, learning model data A in the work area unit 15 is prevented from being overwritten, which avoids reloading learning model data A.
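
The revert decision can be sketched as below. The function and parameter names are hypothetical. The reliability for learning model data A is passed as a callable so that it is evaluated only when needed, which is a design choice of this sketch and not something stated in the description.

```python
def choose_model(reliability_b, reliability_a_fn, threshold, a_in_work_area):
    """Return "A" or "B" as the learning model data to use for detection."""
    if reliability_b >= threshold:
        return "B"  # the switched-to model is reliable enough
    if a_in_work_area and reliability_a_fn() >= threshold:
        return "A"  # revert to the model still resident in the work area
    return "B"      # no usable fallback; keep the switched-to model


print(choose_model(0.1, lambda: 0.8, threshold=0.5, a_in_work_area=True))
```

In the printed case the reliability with B is below the threshold while A is resident and reliable, so detection reverts to A.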

The application processing unit 14 may determine the start timing of each of a plurality of application processes according to the number of stages of learning model data already loaded into the work area unit 15. For example, an application that estimates emotion from facial expressions requires precise feature point detection, so its processing starts only after the learning model data of all stages has been loaded. A face orientation estimation application, on the other hand, can work from approximate feature points, so its processing starts once the learning model data of some of the stages has been loaded. As the number of stages loaded into the work area unit 15 increases, the application processing unit 14 successively enables those applications, among the plurality of applications, that have become processable.
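
A minimal sketch of this staged activation follows. The application names and the per-application stage counts are invented for illustration; the description only states that emotion estimation waits for all stages while face orientation estimation can start after some of them.

```python
# Hypothetical minimum number of loaded stages per application, assuming N = 7.
REQUIRED_STAGES = {"face_orientation": 4, "emotion": 7}


def enabled_apps(loaded_count, required=REQUIRED_STAGES):
    """Return the applications whose stage requirement is already met."""
    return sorted(name for name, need in required.items() if loaded_count >= need)


for count in (2, 4, 7):
    print(count, enabled_apps(count))
```

Calling `enabled_apps` each time a stage finishes loading gives the set of applications that can currently run.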

The embodiments above describe an example in which the initial feature point coordinates used by the feature point detection unit 13 are set from the approximate feature point coordinates of the object detected by the object detection unit 12. In applications where the size and position of the object in the image are largely fixed (for example, ID photographs), the feature points of an average face may instead be set at predetermined positions as the initial feature points.

In the embodiments above, the image processing apparatus 10 was assumed to be one incorporated in the imaging system 1 installed in a vehicle. However, the image processing apparatus 10 may be mounted on any information processing device, such as a smartphone or a PC. For example, some low-specification PCs take a long time to load learning model data from a hard disk into RAM. In such cases, the method described in the embodiments above works effectively.

The embodiments may also be specified by the following items.

[Item 1]
An image processing apparatus (10) that detects feature points of an object by correcting initial feature points in an input image in stages, using learning model data of N stages (N is an integer of 2 or more), each of which contributes to the derivation of correction vectors for the feature points, the image processing apparatus (10) comprising:
a work area unit (15) into which the learning model data of the N stages is loaded;
a correction vector deriving unit (134) that derives a correction vector for each feature point, based on a feature amount around each initial feature point in the input image or a feature amount around each feature point corrected in the preceding stage, by referring to the learning model data of the corresponding stage held in the work area unit (15); and
a feature point correction unit (135) that corrects each feature point in the input image based on the corresponding derived correction vector,
wherein, at least when the image processing apparatus (10) starts up, the correction vector deriving unit (134) and the feature point correction unit (135) start the feature point correction process before loading of the learning model data of the N stages from a recording unit (16), which records the learning model data of the N stages, into the work area unit (15) is completed, and
wherein, when deriving the correction vector of stage t (1 ≦ t ≦ N), the correction vector deriving unit (134) derives the correction vector by substituting the learning model data of another stage if the learning model data of stage t has not been loaded into the work area unit (15).
According to this, the start timing of the feature point correction processing can be advanced.
[Item 2]
The image processing apparatus (10) according to item 1, wherein, when deriving the correction vector of stage t (1 ≦ t ≦ N), the correction vector deriving unit (134) derives the correction vector by substituting the learning model data of the closest stage among the learning model data already loaded into the work area unit (15) if the learning model data of stage t has not been loaded into the work area unit (15).
According to this, the start timing of the correction process can be advanced while suppressing a decrease in the correction accuracy of the feature points.
[Item 3]
The image processing apparatus (10) according to item 1 or 2, wherein the order in which the learning model data of the N stages is loaded from the recording unit (16) into the work area unit (15) is an order in which a group of stages that includes stage 1 and skips some stages is loaded sequentially in ascending order, after which the skipped stages are loaded sequentially.
According to this, at a stage where the learning model data must be substituted, the learning model data of a nearby stage can be used, and a decrease in the correction accuracy of the feature points can be suppressed.
[Item 4]
The image processing apparatus (10) according to any one of items 1 to 3, further comprising:
a load order determination unit (131) that determines the order in which the learning model data of the N stages is loaded into the work area unit (15); and
a learning model data loading unit (132) that sequentially loads the learning model data of the N stages, stage by stage, based on the load order determined by the load order determination unit (131),
wherein the load order determination unit (131) selects, from among a plurality of load order candidates, the candidate that minimizes the average error from the correct feature points in the feature point correction process on test data as the load order to be used.
According to this, it is possible to further suppress a decrease in the correction accuracy of the feature points.
[Item 5]
The image processing apparatus (10) according to any one of items 1 to 4, further comprising a reliability determination unit (136) that determines the reliability of the correction vectors of the feature points derived at each stage,
wherein the reliability determination unit (136) invalidates the correction of the feature points at stage t when the reliability of the correction vectors derived at stage t is less than a reference value.
According to this, it is possible to suppress a reduction in correction accuracy by invalidating correction of feature points with low reliability.
[Item 6]
The image processing apparatus (10) according to item 5, wherein the reliability determination unit (136) invalidates the correction of the plurality of feature points at stage t when the average length of the plurality of correction vectors derived at stage t is greater than the average length of the plurality of correction vectors derived at stage (t−1) by a predetermined value or more.
According to this, the correction accuracy can be accurately evaluated by comparing the correction of the immediately preceding stage with the correction of the current stage.
[Item 7]
The image processing apparatus (10) according to any one of items 1 to 4, wherein the recording unit (16) records the learning model data of the N stages for each object to be detected,
wherein, when the detection target is switched from a first object to a second object, the correction vector deriving unit (134) and the feature point correction unit (135) start the correction process for the feature points of the second object before loading of the learning model data of the N stages of the second object from the recording unit (16) into the work area unit (15) is completed, and
wherein, in the correction process for the feature points of the second object, when deriving the correction vector of stage t (1 ≦ t ≦ N), the correction vector deriving unit (134) derives the correction vector by substituting the learning model data of another stage if the learning model data of stage t has not been loaded into the work area unit (15).
According to this, at the time of switching from the first object to the second object, it is possible to advance the start timing of the correction process of the feature point of the second object.
[Item 8]
The image processing apparatus (10) according to item 7, further comprising a reliability determination unit (136) that determines the reliability of the correction vectors of the feature points derived at each stage,
wherein the reliability determination unit (136) switches to the detection process for the first object when the reliability of the correction vectors obtained using the learning model data of the second object is less than a reference value and the reliability of the correction vectors obtained using the learning model data of the first object remaining in the work area unit (15) satisfies the reference value.
According to this, the learning model data of the first object remaining in the work area unit (15) can be put to effective use.
[Item 9]
The image processing apparatus (10) according to any one of items 1 to 8, further comprising:
an object area detection unit (121) that searches the input image for a specific object and sets a detection frame for the detected object; and
an initial feature point setting unit (122) that sets feature points of the object within the detection frame as the initial feature points.
According to this, the initial feature point can be set at a position relatively close to the correct feature point.
[Item 10]
The image processing apparatus (10) according to any one of items 1 to 9, further comprising an application processing unit (14) that executes a plurality of application processes based on the feature points obtained after the staged correction of the initial feature points is completed,
wherein the application processing unit (14) determines the start timing of each of the plurality of application processes according to the number of stages of the learning model data already loaded into the work area unit (15).
According to this, the start timing of application processing can be optimized.
[Item 11]
The image processing apparatus (10) according to any one of items 1 to 10, wherein the input image is an image supplied from an imaging unit (20) installed in a vehicle to photograph the driver's face, and
the object to be detected is the driver's face.
According to this, in the driver's face detection system, the start timing of the feature point correction process can be advanced.
[Item 12]
An image processing method for detecting feature points of an object by correcting initial feature points in an input image in stages, using learning model data of N stages (N is an integer of 2 or more), each of which contributes to the derivation of correction vectors for the feature points, the method comprising:
a step of loading the learning model data of the N stages into a work area unit (15);
a step of deriving a correction vector for each feature point, based on a feature amount around each initial feature point in the input image or a feature amount around each feature point corrected in the preceding stage, by referring to the learning model data of the corresponding stage held in the work area unit (15); and
a step of correcting each feature point in the input image based on the corresponding derived correction vector,
wherein, at least at startup, the step of deriving the correction vector and the step of correcting the feature points are started before loading of the learning model data of the N stages from a recording unit (16), which records the learning model data of the N stages, into the work area unit (15) is completed, and
wherein, in the step of deriving the correction vector, when the correction vector of stage t (1 ≦ t ≦ N) is derived and the learning model data of stage t has not been loaded into the work area unit (15), the correction vector is derived by substituting the learning model data of another stage.
According to this, the start timing of the feature point correction processing can be advanced.
[Item 13]
An image processing program for detecting feature points of an object by correcting initial feature points in an input image in stages, using learning model data of N stages (N is an integer of 2 or more), each of which contributes to the derivation of correction vectors for the feature points, the program causing a computer to execute:
a process of loading the learning model data of the N stages into a work area unit (15);
a process of deriving a correction vector for each feature point, based on a feature amount around each initial feature point in the input image or a feature amount around each feature point corrected in the preceding stage, by referring to the learning model data of the corresponding stage held in the work area unit (15); and
a process of correcting each feature point in the input image based on the corresponding derived correction vector,
wherein, at least at startup, the process of deriving the correction vector and the process of correcting the feature points are started before loading of the learning model data of the N stages from a recording unit (16), which records the learning model data of the N stages, into the work area unit (15) is completed, and
wherein, in the process of deriving the correction vector, when the correction vector of stage t (1 ≦ t ≦ N) is derived and the learning model data of stage t has not been loaded into the work area unit (15), the correction vector is derived by substituting the learning model data of another stage.
According to this, the start timing of the feature point correction processing can be advanced.

DESCRIPTION OF SYMBOLS: 1 imaging system, 10 image processing apparatus, 11 processing unit, 12 object detection unit, 13 feature point detection unit, 14 application processing unit, 15 work area unit, 16 nonvolatile recording unit, 20 imaging unit, 121 object area detection unit, 122 initial feature point output unit, 131 load order determination unit, 132 learning model data loading unit, 133 use stage determination unit, 134 correction vector deriving unit, 135 feature point correction unit, 136 reliability determination unit, 137 feature point output unit.

Claims

An image processing apparatus that detects feature points of an object by correcting initial feature points in an input image in stages, using learning model data of N stages (N is an integer of 2 or more), each stage contributing to the derivation of correction vectors for the feature points, the apparatus comprising:
a work area unit into which the learning model data of the N stages is loaded;
a correction vector derivation unit that derives a correction vector for each feature point, based on a feature amount around each initial feature point in the input image or around each feature point corrected in the previous stage, with reference to the learning model data of the corresponding stage held in the work area unit; and
a feature point correction unit that corrects each feature point in the input image based on each derived correction vector,
wherein, at least when the image processing apparatus starts up, the correction vector derivation unit and the feature point correction unit start the feature point correction processing before loading of the learning model data of the N stages into the work area unit, from a recording unit in which that data is recorded, is completed, and
wherein, when deriving the correction vector of a stage t (1 ≦ t ≦ N), the correction vector derivation unit derives the correction vector by substituting the learning model data of another stage if the learning model data of the stage t has not been loaded into the work area unit.
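
As a non-limiting illustration of the staged correction recited above, the following Python sketch applies N stages in sequence and falls back to another stage when the data of stage t is missing. The names `correct_feature_points`, `extract`, and `models` are hypothetical and do not appear in the specification, and the rule of falling back to the lowest-numbered loaded stage is an assumption made only for this sketch (the claim requires only "another stage").

```python
def correct_feature_points(points, extract, models, n_stages):
    """Correct feature points over n_stages stages.

    points:  list of (x, y) initial feature points
    extract: callable(points) -> features around each point (hypothetical)
    models:  dict mapping stage number -> callable(features) -> list of (dx, dy),
             holding only the stages already loaded into the work area
    """
    if not models:
        raise ValueError("no learning model data has been loaded yet")
    for t in range(1, n_stages + 1):
        model = models.get(t)
        if model is None:
            # Stage t is not loaded: substitute another stage.
            # Assumption for this sketch: use the lowest-numbered loaded stage.
            model = models[min(models)]
        vectors = model(extract(points))
        # Apply one correction vector per feature point.
        points = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(points, vectors)]
    return points


# Usage: only stage 1 is loaded, so stages 2 and 3 reuse its data.
loaded = {1: lambda feats: [(1.0, 0.0)] * len(feats)}
result = correct_feature_points([(0.0, 0.0)], lambda pts: pts, loaded, 3)
```

In an actual apparatus the set of loaded stages would grow while frames are being processed; the sketch treats it as a snapshot for one input image.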

The image processing apparatus according to claim 1, wherein, when deriving the correction vector of the stage t (1 ≦ t ≦ N), if the learning model data of the stage t has not been loaded into the work area unit, the correction vector derivation unit derives the correction vector by substituting the learning model data of the stage closest to the stage t among the learning model data already loaded into the work area unit.
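
The closest-stage substitution can be sketched, purely as an illustration, as follows. The function name `pick_stage` is hypothetical, and breaking ties toward the lower stage number is an assumption of this sketch, since the claim does not say which stage to use when two loaded stages are equally close.

```python
def pick_stage(t, loaded):
    """Return stage t if its data is loaded, else the loaded stage closest to t."""
    if not loaded:
        raise ValueError("no learning model data has been loaded yet")
    if t in loaded:
        return t
    # Sort key: distance from t first, then stage number (assumed tie-break).
    return min(loaded, key=lambda s: (abs(s - t), s))


# Usage: with stages 1 and 5 loaded, stage 4 is served by stage 5.
substitute = pick_stage(4, {1, 5})
```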

The image processing apparatus according to claim 1, wherein the learning model data of the N stages is loaded from the recording unit into the work area unit in an order in which stages including stage 1 are first loaded in ascending order while skipping some stages, and the skipped stages are then loaded in sequence.
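
One possible load order of this kind is sketched below for illustration. The function name `skip_load_order` and the uniform stride `step` are assumptions of this sketch; the claim only requires that stage 1 be included in the first pass and that some stages be skipped and loaded afterwards.

```python
def skip_load_order(n_stages, step=2):
    """Return a load order: every step-th stage from stage 1, then the skipped stages."""
    if n_stages < 2 or step < 2:
        raise ValueError("need at least 2 stages and a stride of at least 2")
    first_pass = list(range(1, n_stages + 1, step))  # 1, 1 + step, 1 + 2*step, ...
    taken = set(first_pass)
    second_pass = [s for s in range(1, n_stages + 1) if s not in taken]
    return first_pass + second_pass


order = skip_load_order(8)  # [1, 3, 5, 7, 2, 4, 6, 8]
```

With such an order, coarse coverage of the whole stage range becomes available early, so a nearby stage can stand in for any stage still missing.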

The image processing apparatus according to claim 1, further comprising:
a load order determination unit that determines the order in which the learning model data of the N stages is loaded into the work area unit; and
a learning model data load unit that loads the learning model data of the N stages stage by stage in the load order determined by the load order determination unit,
wherein the load order determination unit selects, from among a plurality of load order candidates, the candidate that minimizes the average error from the correct feature points in feature point correction processing performed on test data, as the load order to be used.
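
The candidate selection can be illustrated with the sketch below. The helpers `mean_error` and `choose_load_order`, and the `evaluate` callback that runs the correction on test data for a given load order, are hypothetical; how the error is accumulated over the loading period is not specified here and is left to the callback.

```python
import math


def mean_error(predicted, truth):
    """Average Euclidean distance between predicted and correct feature points."""
    return sum(math.dist(p, q) for p, q in zip(predicted, truth)) / len(truth)


def choose_load_order(candidates, evaluate):
    """Return the candidate load order with the smallest average error.

    evaluate: callable(order) -> average error on test data (hypothetical).
    """
    return min(candidates, key=evaluate)


# Usage with made-up error values standing in for a real evaluation run.
errors = {(1, 3, 2, 4): 0.5, (1, 2, 3, 4): 0.7}
best = choose_load_order(list(errors), errors.get)
```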

The image processing apparatus according to claim 1, further comprising a reliability determination unit that determines the reliability of the correction vectors of the feature points derived at each stage,
wherein the reliability determination unit invalidates the correction of the feature points at the stage t when the reliability of the correction vectors derived at the stage t is less than a reference value.

The image processing apparatus according to claim 5, wherein the reliability determination unit invalidates the correction of the plurality of feature points at the stage t when the average length of the plurality of correction vectors derived at the stage t is larger than the average length of the plurality of correction vectors derived at the stage (t-1) by a predetermined value or more.
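
The length-based reliability check can be sketched as follows, for illustration only. The names `mean_length`, `is_reliable`, and `margin` (standing in for the predetermined value) are hypothetical.

```python
import math


def mean_length(vectors):
    """Average Euclidean length of a list of (dx, dy) correction vectors."""
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)


def is_reliable(prev_vectors, curr_vectors, margin):
    """Return False when the stage-t vectors are longer on average than the
    stage-(t-1) vectors by margin or more, meaning the correction is invalidated."""
    return mean_length(curr_vectors) - mean_length(prev_vectors) < margin


# Usage: the mean length grows from 5 to 10, which reaches a margin of 5.
keep = is_reliable([(3.0, 4.0)], [(6.0, 8.0)], margin=5.0)  # False
```

The rationale is that correction vectors normally shrink as the stages converge, so a sudden increase suggests that substituted learning model data has pushed the feature points away from the correct positions.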

The image processing apparatus according to claim 1, wherein the recording unit records the learning model data of the N stages for each object to be detected,
wherein, when the detection target is switched from a first object to a second object, the correction vector derivation unit and the feature point correction unit start correction processing of the feature points of the second object before loading of the learning model data of the N stages of the second object from the recording unit into the work area unit is completed, and
wherein, when deriving the correction vector of the stage t (1 ≦ t ≦ N) in the correction processing of the feature points of the second object, the correction vector derivation unit derives the correction vector by substituting the learning model data of another stage if the learning model data of the stage t has not been loaded into the work area unit.

The image processing apparatus according to claim 7, further comprising a reliability determination unit that determines the reliability of the correction vectors of the feature points derived at each stage,
wherein the reliability determination unit switches to detection processing of the first object when the reliability of the correction vectors derived using the learning model data of the second object is less than a reference value and the reliability of the correction vectors derived using the learning model data of the first object remaining in the work area unit satisfies the reference value.

The image processing apparatus according to claim 1, further comprising:
an object area detection unit that searches the input image for a specific object and sets a detection frame for the detected specific object; and
an initial feature point setting unit that sets feature points of the object within the detection frame as the initial feature points.

The image processing apparatus according to any one of the above claims, further comprising an application processing unit that executes a plurality of application processes based on the feature points obtained after the stepwise correction of the initial feature points is completed,
wherein the application processing unit determines the start timing of each of the application processes according to the number of stages whose learning model data has been loaded into the work area unit.
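
A minimal sketch of this start-timing decision is given below for illustration. The function `runnable_apps`, the per-application thresholds, and the application names are all hypothetical; the claim states only that the start timing depends on the number of loaded stages.

```python
def runnable_apps(loaded_count, required_stages):
    """Return the applications whose required number of loaded stages is met.

    required_stages: dict mapping application name -> minimum number of
    loaded stages before that application may start (hypothetical values).
    """
    return [name for name, need in required_stages.items() if loaded_count >= need]


# Usage: an application tolerant of coarse feature points starts earlier
# than one that needs precise feature points.
thresholds = {"coarse_app": 2, "precise_app": 6}
started = runnable_apps(3, thresholds)
```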

The image processing apparatus according to claim 1, wherein the input image is an image input from an imaging unit installed in a vehicle to photograph the driver's face, and
the object to be detected is the driver's face.

An image processing method for detecting feature points of an object by correcting initial feature points in an input image in stages, using learning model data of N stages (N is an integer of 2 or more), each stage contributing to the derivation of correction vectors for the feature points, the method comprising:
a step of loading the learning model data of the N stages into a work area unit;
a step of deriving a correction vector for each feature point, based on a feature amount around each initial feature point in the input image or around each feature point corrected in the previous stage, with reference to the learning model data of the corresponding stage held in the work area unit; and
a step of correcting each feature point in the input image based on each derived correction vector,
wherein, at least at start-up, the step of deriving the correction vector and the step of correcting the feature points start the feature point correction processing before loading of the learning model data of the N stages into the work area unit, from a recording unit in which that data is recorded, is completed, and
wherein, in the step of deriving the correction vector, when the correction vector of a stage t (1 ≦ t ≦ N) is derived, the correction vector is derived by substituting the learning model data of another stage if the learning model data of the stage t has not been loaded into the work area unit.

An image processing program for detecting feature points of an object by correcting initial feature points in an input image in stages, using learning model data of N stages (N is an integer of 2 or more), each stage contributing to the derivation of correction vectors for the feature points, the program causing a computer to execute:
a process of loading the learning model data of the N stages into a work area unit;
a process of deriving a correction vector for each feature point, based on a feature amount around each initial feature point in the input image or around each feature point corrected in the previous stage, with reference to the learning model data of the corresponding stage held in the work area unit; and
a process of correcting each feature point in the input image based on each derived correction vector,
wherein, at least at start-up, the process of deriving the correction vector and the process of correcting the feature points start the feature point correction processing before loading of the learning model data of the N stages into the work area unit, from a recording unit in which that data is recorded, is completed, and
wherein, in the process of deriving the correction vector, when the correction vector of a stage t (1 ≦ t ≦ N) is derived, the correction vector is derived by substituting the learning model data of another stage if the learning model data of the stage t has not been loaded into the work area unit.