JP2022517254A

JP2022517254A - Gaze area detection method, device, and electronic device

Info

Publication number: JP2022517254A
Application number: JP2021540793A
Authority: JP
Inventors: ▲詩▼▲堯▼ 黄; ▲飛▼ 王; 晨 ▲錢▼
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2019-03-18
Filing date: 2019-12-24
Publication date: 2022-03-07
Anticipated expiration: 2039-12-24
Also published as: KR20210104107A; WO2020186867A1; JP7244655B2; CN111723828A

Abstract

The present invention provides a gaze area detection method, an apparatus, and an electronic device. The method includes: acquiring a face image collected in a predetermined three-dimensional space; performing line-of-sight detection based on the face image to obtain a line-of-sight detection result; and using a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, based on the line-of-sight detection result, the type of the target gaze area corresponding to the face image. The target gaze area belongs to one of a plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance. [Selected drawing] FIG. 1

Description

<Cross-reference to related applications>
The present application claims priority to the Chinese patent application filed on March 18, 2019, with application number 201910204793.1 and entitled "Gaze Area Detection Method, Device, and Electronic Device", the entire contents of which are incorporated herein by reference.
The present invention relates to the field of computer vision technology, and in particular to a gaze area detection method, an apparatus, and an electronic device.

Gaze area detection can play an important role in applications such as intelligent driving, human-computer interaction, and security monitoring. In human-computer interaction, the three-dimensional position of the eyes in space is determined and combined with the three-dimensional line-of-sight direction to obtain the position of the person's gaze point in three-dimensional space, which is output to a machine for further interactive processing. In attention detection, estimating the line-of-sight direction of the eyes makes it possible to determine the person's gaze direction, obtain the person's area of interest, and judge whether the person's attention is focused.

According to a first aspect of the present invention, a gaze area detection method is provided. The method includes: acquiring a face image collected in a predetermined three-dimensional space; performing line-of-sight detection based on the face image to obtain a line-of-sight detection result; and using a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, based on the line-of-sight detection result, the type of the target gaze area corresponding to the face image, wherein the target gaze area belongs to one of a plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.

According to a second aspect of the present invention, a gaze area detection apparatus is provided. The apparatus includes: an image acquisition module for acquiring a face image collected in a predetermined three-dimensional space; a line-of-sight detection module for performing line-of-sight detection based on the face image to obtain a line-of-sight detection result; and a gaze area detection module for using a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, based on the line-of-sight detection result, the type of the target gaze area corresponding to the face image, wherein the target gaze area belongs to one of a plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.

According to a third aspect of the present invention, a computer-readable recording medium storing a computer program is provided. When the computer program is executed by a processor, the processor implements the method of the first aspect.

According to a fourth aspect of the present invention, an electronic device is provided. The electronic device includes a memory and a processor; the memory stores a computer program, and the processor implements the method of the first aspect when executing the computer program.

According to the embodiments of the present invention, when the predetermined three-dimensional space changes, only the gaze area classifier corresponding to each three-dimensional space needs to be trained. Because training the classifier does not require a large amount of data and is relatively fast, the time cost and technical difficulty of transferring the gaze area detection method between different three-dimensional spaces (for example, the spaces of different vehicle models) can be greatly reduced.

A flowchart of a gaze area detection method according to an exemplary embodiment of the present invention.
A flowchart of a method of training, in real time, a gaze area classifier for a predetermined three-dimensional space according to an exemplary embodiment of the present invention.
A schematic diagram of a plurality of types of defined gaze areas according to an exemplary embodiment of the present invention.
A flowchart of a method of determining line-of-sight start point information of a person in a face image according to an exemplary embodiment of the present invention.
A flowchart of a method of detecting line-of-sight direction information of a person in a face image according to an exemplary embodiment of the present invention.
A flowchart of a method of detecting head pose information of a person in a face image according to an exemplary embodiment of the present invention.
A flowchart of a method of detecting line-of-sight direction information of a person in a face image based on head pose information according to an exemplary embodiment of the present invention.
A flowchart of a method of normalizing a face image to obtain a normalized face image according to an exemplary embodiment of the present invention.
A schematic diagram of normalizing an acquired face image according to an exemplary embodiment of the present invention.
A schematic diagram of a classifier outputting the type of a target gaze area according to an exemplary embodiment of the present invention.
A schematic diagram of a classifier outputting the name of a target gaze area according to an exemplary embodiment of the present invention.
A flowchart of a method of training a neural network for detecting a three-dimensional line-of-sight direction according to an exemplary embodiment of the present invention.
A block diagram of a gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of a line-of-sight detection module of a gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of another line-of-sight detection module of a gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of the eye position detection submodule in FIGS. 12 and 13 according to an exemplary embodiment of the present invention.
A block diagram of another line-of-sight detection module of a gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of the pose detection submodule of the line-of-sight detection module in FIG. 15 according to an exemplary embodiment of the present invention.
A block diagram of the direction detection submodule of the line-of-sight detection module in FIG. 15 according to an exemplary embodiment of the present invention.
A block diagram of the image processing unit of the direction detection submodule in FIG. 17 according to an exemplary embodiment of the present invention.
A block diagram of another gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of another gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of another gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of another gaze area detection apparatus according to an exemplary embodiment of the present invention.
A block diagram of an electronic device according to an exemplary embodiment of the present invention.

Exemplary embodiments are described in detail here, and examples of them are shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings indicate the same or similar elements unless otherwise stated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present invention as recited in the appended claims.

The terms used in the present invention are intended only to describe specific embodiments and are not intended to limit the present invention. Singular forms such as "a", "the", and "said" used in the present invention are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be understood that the term "and/or" as used herein refers to and includes any one or all possible combinations of one or more of the associated listed items.

In the present invention, terms such as first, second, and third may be used to describe various information, but it should be understood that the information should not be limited by these terms. These terms are used only to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein can be interpreted as "when", "upon", or "in response to".

The present invention provides a gaze area detection method that can be applied to scenarios such as intelligent driving, human-computer interaction, and security monitoring. The present invention is described in detail using the example of applying the gaze area detection method to an intelligent driving scenario.

In the embodiments of the present invention, the executing entities involved may include a computer system and a camera installed in the predetermined three-dimensional space. The camera can send the collected face image data of a user to the computer system. The computer system can process the face image data using an artificial neural network to detect which area of the predetermined three-dimensional space the user's attention is focused on, that is, to detect the user's target gaze area, so that the computer system can output corresponding operation control information, such as an instruction for driving a smart vehicle, based on the user's target gaze area.

The computer system may be provided on a server, a server cluster, or a cloud platform, or may be a computer system in an electronic device such as a personal computer, an in-vehicle device, or a mobile terminal. The camera may be an in-vehicle device, such as a camera in a driving recorder or a camera of a smart terminal. The smart terminal may include an electronic device such as a smartphone, a PDA (Personal Digital Assistant), a tablet computer, or an in-vehicle device. In a specific implementation, the camera and the computer system may be independent of each other while being connected to each other, so as to jointly implement the gaze area detection method provided by the embodiments of the present invention. The gaze area detection method provided by the present invention is described in detail below, taking the computer system as an example.

FIG. 1 is a flowchart of a gaze area detection method according to an exemplary embodiment of the present invention. The method can be executed by a computer system and can be applied to various smart devices (for example, smart vehicles, smart robots, and smart home devices). As shown in FIG. 1, the method may include steps 11 to 13.

In step 11, a face image collected in a predetermined three-dimensional space is acquired.

Taking a vehicle of model M as an example, the predetermined three-dimensional space is the space of that vehicle, and one camera can be fixedly installed in its interior, for example at the position of the center console. The camera can collect face images of a target subject such as the driver, in real time or at a predetermined time interval, and provide them to the computer system, so that the computer system acquires the collected face images.

In step 12, line-of-sight detection is performed based on the face image to obtain a line-of-sight detection result.

In the embodiments of the present invention, the computer system can perform line-of-sight detection based on the face image to obtain a line-of-sight detection result. Line-of-sight detection obtains its result by analyzing the position of the eyes and/or the line-of-sight direction in the face image. The present invention does not limit the method used for line-of-sight detection: the methods mentioned in the embodiments of the present invention may be used, or other conventional methods may be used. The line-of-sight detection result may include line-of-sight start point information and line-of-sight direction information of the person in the face image, and may further include information such as the head pose of the person in the face image.

In step 13, a gaze area classifier trained in advance for the predetermined three-dimensional space is used to detect, based on the line-of-sight detection result, the type of the target gaze area corresponding to the face image.

The target gaze area belongs to one of a plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance. For example, each space that the driver may gaze at while the vehicle is traveling, such as the windshield, a rear-view mirror, or another space inside the vehicle, can be set in the predetermined three-dimensional space.

As in the example above, after obtaining the line-of-sight detection result for the person in the face image, the computer system can input that result into the pre-trained gaze area classifier for the model M intelligent driving vehicle to detect the type of the target gaze area corresponding to the face image, that is, to detect which area of the vehicle the person in the face image, such as the driver, was gazing at when the image was collected.
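Steps 11 to 13 can be read as a two-stage pipeline. The following is a minimal sketch under stated assumptions, not the implementation described in the patent: the names `GazeResult`, `detect_gaze_area`, `dummy_detector`, and `dummy_classifier` are hypothetical, and the detector and classifier are trivial stand-ins used only to show how data flows between the stages.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class GazeResult:
    """Line-of-sight detection result produced by step 12."""
    start_point: Vec3  # 3D line-of-sight start point
    direction: Vec3    # 3D line-of-sight direction


def detect_gaze_area(
    face_image: Any,
    gaze_detector: Callable[[Any], GazeResult],
    area_classifier: Callable[[List[float]], int],
) -> int:
    """Run steps 12 and 13 on a face image already acquired in step 11."""
    gaze = gaze_detector(face_image)                 # step 12
    features = [*gaze.start_point, *gaze.direction]  # 6 values per image
    return area_classifier(features)                 # step 13


# Stand-ins for illustration only (assumptions, not from the patent).
def dummy_detector(image: Any) -> GazeResult:
    return GazeResult(start_point=(0.0, 0.0, 0.0), direction=(-0.5, 0.0, 1.0))


def dummy_classifier(features: List[float]) -> int:
    # Type 1 if the direction's x component is negative, otherwise type 2.
    return 1 if features[3] < 0 else 2


area_type = detect_gaze_area(None, dummy_detector, dummy_classifier)
```

Keeping the detector and the classifier as separate callables mirrors the point made in the text that only the second stage depends on the particular three-dimensional space.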

In the present invention, the gaze area classifier for the predetermined three-dimensional space is trained in advance by the computer system based on a training sample set for that three-dimensional space. The training sample set includes a plurality of line-of-sight feature samples, and each line-of-sight feature sample includes line-of-sight start point information, line-of-sight direction information, and labeling information of the gaze area type corresponding to that sample. The labeled gaze area type belongs to one of the plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space.
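One way to represent such a training sample set is sketched below. The class name `GazeFeatureSample`, its field names, and the helper `validate_samples` are assumptions introduced for illustration; the patent only specifies which three pieces of information each sample carries.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class GazeFeatureSample:
    start_point: Vec3  # line-of-sight start point information
    direction: Vec3    # line-of-sight direction information
    area_label: int    # labeled gaze area type

    def features(self) -> List[float]:
        """Flatten start point and direction into one feature vector."""
        return [*self.start_point, *self.direction]


def validate_samples(
    samples: Iterable[GazeFeatureSample], defined_areas: Set[int]
) -> None:
    """Check that every label is one of the defined gaze area types."""
    for sample in samples:
        if sample.area_label not in defined_areas:
            raise ValueError(f"undefined gaze area type: {sample.area_label}")


sample = GazeFeatureSample((0.1, 0.2, 0.3), (0.0, 0.0, 1.0), area_label=1)
validate_samples([sample], defined_areas={1, 2, 3})
```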

According to the embodiments of the present invention, before the gaze area classifier is trained for the predetermined three-dimensional space, the three-dimensional spatial areas within that space that a person's line of sight may attend to are finely divided to obtain a plurality of types of defined gaze areas, and classifier training is performed based on a training sample set corresponding to those defined gaze areas to obtain the gaze area classifier for the predetermined three-dimensional space. Thereafter, the gaze area classifier can be used to detect target gaze area information accurately based on the line-of-sight detection result. The computation is simple, the misjudgment rate for the target gaze area is effectively reduced, and more accurate information can be provided for subsequent operations.

The line-of-sight detection stage corresponding to step 12 is independent of how the defined gaze areas are distributed in the predetermined three-dimensional space, whereas the gaze area detection stage corresponding to step 13 depends on that distribution. For example, the overall size of the space may differ between vehicle models, and the position of the same type of area, such as the glove box, may differ between vehicle spaces. The division into defined gaze areas may therefore also differ between three-dimensional spaces; for example, the number and types of defined gaze areas may differ. Consequently, different gaze area classifiers need to be trained for different three-dimensional spaces; for example, separate gaze area classifiers need to be trained for a model M vehicle and a model N vehicle whose spatial distributions differ.

Therefore, the same method can be used to perform line-of-sight detection for vehicles of different models, and only the gaze area classifier needs to be retrained when the vehicle model changes. Compared with retraining an entire convolutional neural network end to end, training the gaze area classifier is relatively simple, does not require as much data, and is faster, so the time cost and technical difficulty of transferring the gaze area detection method between different vehicle models can be greatly reduced.
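The separation between a shared line-of-sight detector and a per-space classifier could be organized as below. This is a sketch of one possible arrangement, not something the patent prescribes: `ClassifierRegistry`, the space identifiers `"model_M"` and `"model_N"`, and the lambda classifiers are all hypothetical.

```python
from typing import Callable, Dict, List

Classifier = Callable[[List[float]], int]


class ClassifierRegistry:
    """Maps a 3D-space identifier (e.g. a vehicle model) to its classifier."""

    def __init__(self) -> None:
        self._by_space: Dict[str, Classifier] = {}

    def register(self, space_id: str, classifier: Classifier) -> None:
        # Supporting a new vehicle model only requires adding an entry here.
        self._by_space[space_id] = classifier

    def classify(self, space_id: str, features: List[float]) -> int:
        if space_id not in self._by_space:
            raise LookupError(f"no gaze area classifier trained for {space_id!r}")
        return self._by_space[space_id](features)


registry = ClassifierRegistry()
# Trivial stand-in classifiers; real ones would be trained per vehicle model.
registry.register("model_M", lambda f: 1 if f[3] < 0 else 2)
registry.register("model_N", lambda f: 3 if f[3] < 0 else 4)

# The same line-of-sight features can be classified for either space.
shared_features = [0.0, 0.0, 0.0, -0.5, 0.0, 1.0]
```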

In another embodiment of the present invention, the gaze area detection method may further include, before step 11, obtaining a gaze area classifier whose training for the predetermined three-dimensional space has been completed. In the present invention, the following method 1 or method 2 can be used to obtain the trained gaze area classifier for the predetermined three-dimensional space.

In method 1, the gaze area classifier for the predetermined three-dimensional space is trained in real time when gaze area detection needs to be performed.

As shown in FIG. 2, training the gaze area classifier for the predetermined three-dimensional space in real time may include: step 101, inputting the line-of-sight start point information and line-of-sight direction information of at least one line-of-sight feature sample into the gaze area classifier to be trained to obtain gaze area type prediction information corresponding to that sample; and step 102, adjusting the parameters of the gaze area classifier based on the deviation between the gaze area type prediction information and the gaze area type labeling information corresponding to that sample, thereby training the gaze area classifier.

For example, the predetermined three-dimensional space may be the space of a vehicle of a certain model. First, the fixed position of the camera used to collect face images is determined. For example, the camera is fixed at the position of the center console to collect face images of the driver in the driving area. Thereafter, all face images needed in the classifier training stage and the detection stage are collected using this camera at that fixed position.

At the same time, the different parts of the vehicle are divided into gaze areas. A plurality of types of defined gaze areas are marked out in the vehicle space, mainly according to the areas the driver's eyes need to attend to while driving, and corresponding type information is set for each defined gaze area.

In one embodiment of the present invention, the plurality of types of defined gaze areas obtained by dividing the vehicle space may include at least two of the following: a left windshield area, a right windshield area, an instrument panel area, an interior mirror area, a center console area, a left rear-view mirror area, a right rear-view mirror area, a sun visor area, a shift lever area, an area below the steering wheel, a front passenger area, and a glove box area in front of the front passenger.

FIG. 3 is a schematic diagram of a plurality of types of defined gaze areas according to an exemplary embodiment of the present invention. For a given vehicle model, a plurality of types of defined gaze areas can be determined, such as the left windshield, right windshield, instrument panel, interior mirror, center console, left rear-view mirror, right rear-view mirror, sun visor, shift lever, and mobile phone. Corresponding type information can be set in advance for each defined gaze area; for example, a number can be used to indicate the type value. The correspondence between the defined gaze areas and the predetermined type values may be as shown in Table 1.

It should be noted that the type information may also be indicated by predetermined letters such as A, B, C, ..., J.
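The correspondence between gaze areas and type values can be expressed as an enumeration. Table 1 is not reproduced in this text, so the numeric values 1 to 10 below are assumptions chosen purely for illustration, as are the names `GazeArea` and `letter_code`; only the ten area names and the letter range A to J come from the description.

```python
import string
from enum import IntEnum


class GazeArea(IntEnum):
    """Defined gaze areas; the numeric type values are illustrative only."""
    LEFT_WINDSHIELD = 1
    RIGHT_WINDSHIELD = 2
    INSTRUMENT_PANEL = 3
    INTERIOR_MIRROR = 4
    CENTER_CONSOLE = 5
    LEFT_REAR_VIEW_MIRROR = 6
    RIGHT_REAR_VIEW_MIRROR = 7
    SUN_VISOR = 8
    SHIFT_LEVER = 9
    MOBILE_PHONE = 10


def letter_code(area: GazeArea) -> str:
    """Alternative letter encoding: type value 1 maps to "A", 10 to "J"."""
    return string.ascii_uppercase[area - 1]
```

Under this assumed numbering, `letter_code(GazeArea.SUN_VISOR)` gives `"H"`.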

Face image samples are then collected to obtain a training sample set. The training sample set may include a plurality of line-of-sight feature samples, where each line-of-sight feature sample includes line-of-sight start point information, line-of-sight direction information, and labeling information of the gaze area type corresponding to that sample, and the labeled gaze area type belongs to one of the plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space. How the line-of-sight start point information and line-of-sight direction information of a person are determined from a face image is described in detail later.

Subsequently, the classifier for the predetermined three-dimensional space is trained using the training sample set by repeatedly performing the following steps: inputting the line-of-sight start point information and line-of-sight direction information of one line-of-sight feature sample in the training sample set into the gaze area classifier to be trained to obtain gaze area type prediction information corresponding to that sample; and adjusting the parameters of the gaze area classifier based on the deviation between the gaze area type prediction information and the gaze area type labeling information of that sample, thereby training the gaze area classifier.

In one exemplary embodiment, step 102 above may include: obtaining a loss function value based on the difference between the predicted value and the labeled value of the gaze area type of one line-of-sight feature sample; if the loss function value satisfies a predetermined training end condition, ending the training and taking the classifier at the current training stage as the trained classifier; and if the loss function value does not satisfy the predetermined training end condition, adjusting the parameters of the gaze area classifier based on the loss function value.

In the embodiments of the present invention, the loss function is a mathematical expression that measures, during training, the degree to which the classifier model misclassifies the training samples. The loss function value can be obtained over the entire training sample set. A larger loss function value indicates a higher misclassification rate for the classifier at the current training stage; conversely, a smaller loss function value indicates a lower misclassification rate.

The predetermined training end condition is the condition under which training of the gaze area classifier ends. In one embodiment, the predetermined training end condition may be that the loss function value of a predetermined loss function is smaller than a predetermined threshold. Ideally, the predetermined training end condition is that the loss function value equals 0, which indicates that every gaze area type predicted by the current classifier is correct. In practice, considering the training efficiency and training cost of the gaze area classifier, the predetermined threshold may be an empirical value set in advance.

As in the example above, if the current loss function value is greater than or equal to the predetermined threshold, the accuracy of the predictions made by the classifier at the current training stage is not yet as expected. In that case, the loss function value is used to adjust the relevant parameters of the gaze area classifier, and steps 101 and 102 are repeated with the updated classifier until the predetermined training end condition is satisfied, yielding a trained gaze area classifier for the predetermined three-dimensional space.
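The iterate-until-threshold procedure described above can be sketched as follows. This is a minimal illustration only: the softmax linear model, the cross-entropy loss, the learning rate, and the names `train_gaze_classifier` and `predict` are assumptions made for the example and are not specified by the source, which leaves the classifier type and loss function open.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of class scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_gaze_classifier(samples, num_classes, loss_threshold=0.05,
                          lr=0.5, max_epochs=5000):
    """samples: list of (feature_vector, label), where feature_vector is the
    line-of-sight start point (3 values) followed by the direction (3 values).
    The linear softmax model is an assumed stand-in for the classifier."""
    dim = len(samples[0][0])
    # One weight row per gaze area type; the last entry of each row is a bias.
    weights = [[0.0] * (dim + 1) for _ in range(num_classes)]
    loss = float("inf")
    for _ in range(max_epochs):
        loss = 0.0
        grads = [[0.0] * (dim + 1) for _ in range(num_classes)]
        for features, label in samples:
            x = list(features) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
            # Deviation between the prediction and the labeled gaze area type.
            loss -= math.log(max(probs[label], 1e-12))
            for k in range(num_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for i in range(dim + 1):
                    grads[k][i] += err * x[i]
        loss /= len(samples)
        if loss < loss_threshold:
            break  # predetermined training end condition satisfied
        # Otherwise adjust the parameters based on the loss and iterate again.
        for k in range(num_classes):
            for i in range(dim + 1):
                weights[k][i] -= lr * grads[k][i] / len(samples)
    return weights, loss

def predict(weights, features):
    x = list(features) + [1.0]
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

The threshold plays the role of the empirical value mentioned above: a smaller value demands a lower training loss at the cost of more iterations.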

In the embodiments of the present invention, the computer system may train the gaze area classifier using an algorithm such as a support vector machine, naive Bayes, a decision tree, a random forest, or K-means.

In the embodiments of the present invention, when the predetermined three-dimensional space changes, the training sample set must be determined again and the corresponding gaze area classifier trained. Because training the classifier does not require a large amount of data and is relatively fast, the time cost and technical difficulty of transferring the gaze area detection method between different three-dimensional spaces (for example, the cabins of different vehicle models) can be greatly reduced.

In method 2, when gaze area detection needs to be performed, the gaze area classifier for the predetermined three-dimensional space is obtained directly from a predetermined storage resource.

In one embodiment of the present invention, the computer system stores the trained gaze area classifier for each type of predetermined three-dimensional space, in association with the space identifier of that three-dimensional space, in a designated storage resource such as a cloud server, thereby forming a predetermined gaze area classifier set. In the intelligent driving scenario described above, the predetermined gaze area classifier set may include the correspondence between a plurality of vehicle models and gaze area classifiers, as shown in Table 2.

If the computer system of a new vehicle of a known model (for example, model M01) is not equipped with a gaze area classifier program, the vehicle can, before performing gaze area detection, automatically download the corresponding target gaze area classifier program (for example, the computer program corresponding to the first classifier described above) from the cloud server according to its own model (for example, M01), so that gaze area detection can be deployed quickly.
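The lookup of a classifier by vehicle model can be pictured with a small sketch. The registry contents, the identifier `M02`, the classifier names, and the function `fetch_classifier` are hypothetical; the in-memory dictionary stands in for the cloud server, whose actual interface the source does not describe.

```python
# Hypothetical stand-in for the classifier set stored on the cloud server:
# space identifier (vehicle model) -> trained gaze area classifier program.
CLASSIFIER_REGISTRY = {
    "M01": "first_classifier",
    "M02": "second_classifier",
}

def fetch_classifier(vehicle_model, local_cache, registry=CLASSIFIER_REGISTRY):
    """Return the classifier for this vehicle model, downloading it from the
    (simulated) registry only if it is not already installed locally."""
    if vehicle_model in local_cache:
        return local_cache[vehicle_model]
    if vehicle_model not in registry:
        # No classifier has been trained for this three-dimensional space yet.
        raise KeyError(f"no gaze area classifier for model {vehicle_model!r}")
    local_cache[vehicle_model] = registry[vehicle_model]
    return local_cache[vehicle_model]
```

Keying the registry by space identifier keeps the feature extraction code shared across vehicles while only the classifier differs per model.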

In the embodiments of the present invention, the line-of-sight detection result obtained in step 12 above includes at least the line-of-sight start point information and line-of-sight direction information of the person in the face image, and may further include head pose information of the person in the face image.

According to an embodiment of the present invention, as shown in FIG. 4, the line-of-sight start point information of the person in the face image can be determined by executing steps 1211 to 1212.

In step 1211, the position of the eyes in the face image is detected.

In the embodiments of the present invention, the eye position is the position, in the actual camera coordinate system, of the eyes in the face image. The actual camera coordinate system is a spatial Cartesian coordinate system determined by the computer system based on the camera. The camera is the one that captures the face image in the predetermined three-dimensional space, and may be denoted camera C0.

The Z axis of the actual camera coordinate system is the optical axis of the camera, and the optical center of the camera lens is the origin of the actual camera coordinate system. The horizontal axis (X axis) and the vertical axis (Y axis) of the actual camera coordinate system are parallel to the lens plane of the camera.

In the embodiments of the present invention, the computer system may detect the position of the eyes in the face image in either of the following ways. In the first way, at least two cameras simultaneously capture at least two frames of face images of one target subject, such as the driver, and the position of the eyes in the face image is obtained by a camera calibration method; the at least two cameras include the camera that captures the face image to be detected. In the second way, head pose information of the person in the face image is detected, and the position of the eyes in the face image is detected based on the head pose information.

In one embodiment of the present invention, the computer system may determine the head pose information of the driver from a face image captured by a single camera, using a head pose estimation method from the related art such as the flexible model method or the geometric method, and may then obtain, based on the head pose information, the 3D position of the target subject's eyes in the predetermined actual camera coordinate system. Here, the predetermined actual camera coordinate system is the camera coordinate system determined based on camera C0.

With the second way of determining the eye position, the 3D position of the eyes can be determined from a face image captured by a single camera, that is, a monocular camera, which saves hardware cost for gaze area detection.

In step 1212, the line-of-sight start point information of the person in the face image is determined based on the eye position.

In the present invention, the eye position detected from the face image in step 1211 may include the position of one eye of the target subject (such as the driver) in the face image, or the positions of both eyes (that is, the driver's left eye and right eye).

Accordingly, the embodiments of the present invention may use method 1 or method 2 below to determine the line-of-sight start point information of the person in the face image.

In method 1, the line-of-sight start point information of the person in the face image is determined based on the position of one eye. In one embodiment, if the eye position determined in step 1211 includes the positions of both eyes, the line-of-sight start point information may be determined based on the position of either eye. In another embodiment, if the eye position determined in step 1211 includes the position of only one eye, the line-of-sight start point information may be determined based on the position of that eye.

In method 2, if the eye position determined in step 1211 includes the positions of both eyes, an intermediate position between the two eyes is taken as the line-of-sight start point information. The intermediate position may be the midpoint of the line connecting the 3D coordinates of the two eyes, or another position on that line.

In the embodiments of the present invention, compared with method 1, determining the line-of-sight start point information with method 2 helps to reduce the inaccuracy in the start point caused by single-eye detection error and to improve the accuracy of the line-of-sight detection result.
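Methods 1 and 2 above can be summarized in a few lines. This is an illustrative sketch: the function name `gaze_start_point` and the `weight` parameter (which selects a position on the line connecting the two eyes, 0.5 being the midpoint) are assumptions introduced for the example.

```python
def gaze_start_point(left_eye=None, right_eye=None, weight=0.5):
    """Return the line-of-sight start point in actual camera coordinates.

    left_eye / right_eye: 3D eye positions (x, y, z), or None if undetected.
    weight: position along the segment from left eye to right eye used when
            both eyes are available (0.5 gives the midpoint, as in method 2).
    """
    if left_eye is not None and right_eye is not None:
        # Method 2: a point on the line connecting the two eye positions.
        return tuple(l + weight * (r - l) for l, r in zip(left_eye, right_eye))
    if left_eye is not None:
        return tuple(left_eye)   # Method 1: only one eye is available.
    if right_eye is not None:
        return tuple(right_eye)
    raise ValueError("at least one eye position is required")
```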

According to an embodiment of the present invention, as shown in FIG. 5, the line-of-sight direction information of the person in the face image can be detected by executing steps 1221 to 1222.

In step 1221, head pose information of the person in the face image is detected.

As described above, the computer system may determine the head pose information of the driver from a face image captured by a single camera, using a head pose estimation method from the related art such as the flexible model method or the geometric method.

The flexible model method refers to fitting a flexible model, such as an Active Shape Model (ASM), an Active Appearance Model (AAM), or an Elastic Graph Matching (EGM) model, to the facial structure of the head image in the image plane, and obtaining the final head pose estimate through feature comparison or from the model parameters.

The geometric method refers to estimating the head pose using the shape of the head and precise morphological information about local facial feature points, such as the relative positions of the eyes, nose, and mouth.

According to an embodiment of the present invention, the head pose of the person in an image can be estimated from a single frame captured by a monocular camera.

According to an embodiment of the present invention, as shown in FIG. 6, the head pose information of the person in the face image can be detected (step 1221) by executing steps 1201 to 1202.

In step 1201, a plurality of face key points in the face image are detected.

In one embodiment of the present invention, face key point detection may be performed using an edge detection algorithm such as the Roberts algorithm or the Sobel algorithm, or using a related model such as an active contour model (for example, the Snake model).

In another embodiment of the present invention, face key point detection may be performed using a neural network for face key point detection. Face key point detection may also be performed using a third-party application (for example, the Dlib toolkit).

With the above methods, the positions of a predetermined number (for example, 160) of face key points can be detected, including the position coordinates of key points such as the left eye corner, right eye corner, nose tip, left mouth corner, right mouth corner, and chin. It will be understood that the number of key point coordinates obtained may differ between detection methods. For example, the Dlib toolkit can detect 68 face key point positions.

In step 1202, the head pose information of the person in the face image is determined using the detected face key points and a predetermined average face model.

Returning to FIG. 5, in step 1222, the line-of-sight direction information of the person in the face image is detected based on the head pose information.

In the embodiments of the present invention, the line-of-sight direction information of the person in the face image can be detected, based on the head pose information, using an already trained neural network.

Referring to FIG. 7, step 1222 may include steps 12221 to 12223.

In step 12221, normalization processing is performed on the face image based on the head pose information to obtain a normalized face image.

In practice, across face images captured by camera C0 at different times, the position of the face region within the whole image varies randomly, and so does the head pose of the person in the image. If face images captured directly by the camera were used as sample images when training the neural network, this randomness in head pose and face region position would increase the difficulty and duration of training.

According to an embodiment of the present invention, to reduce the training difficulty when training the neural network for detecting the line-of-sight direction, normalization processing is first performed on each sample image in the training sample set so that the normalized sample image is equivalent to an image captured by a virtual camera directly facing the head. The neural network is then trained on the normalized sample images.

Correspondingly, at the application stage of the neural network, to ensure accurate detection of the line-of-sight direction information, the face image must first be normalized to obtain a normalized face image in the corresponding virtual camera coordinate system, which is then input into the neural network to detect the line-of-sight direction information.

Referring to FIG. 8A, step 12221 may include steps 12-1 to 12-3.

In step 12-1, the head coordinate system of the person in the face image is determined based on the head pose information. For example, the X axis of the head coordinate system is parallel to the line connecting the coordinates of the left eye and the right eye; the Y axis lies in the plane of the face and is perpendicular to the X axis; the Z axis is perpendicular to the plane of the face; and the line-of-sight start point is the origin of the head coordinate system.

In the embodiments of the present invention, detecting the head pose information of the target subject from the face image is equivalent to the computer system predicting a three-dimensional head model of the target subject. The three-dimensional head model represents the pose of the target subject's head relative to camera C0 at the moment camera C0 captured the face image. On this basis, the computer system can determine the head coordinate system of the target subject from the head pose information.

The head coordinate system may be a spatial Cartesian coordinate system. Its X axis is parallel to the line connecting the 3D position coordinates of the two eyes in the three-dimensional head model. The midpoint of that line, that is, the line-of-sight start point, may be taken as the origin of the head coordinate system. The Y axis lies in the plane of the face and is perpendicular to the X axis. The Z axis is perpendicular to the plane of the face.

In step 12-2, the actual camera coordinate system corresponding to the face image is rotated and translated based on the head coordinate system to obtain a virtual camera coordinate system. For example, the Z axis of the virtual camera coordinate system points to the origin of the head coordinate system; the X axis of the virtual camera coordinate system and the X axis of the head coordinate system lie in the same plane; and the origin of the virtual camera coordinate system is separated from the origin of the head coordinate system by a predetermined distance along the Z axis of the virtual camera coordinate system.

In the embodiments of the present invention, after determining the head coordinate system of the target subject, the computer system may, with reference to the head coordinate system, apply rotation and translation operations to the camera to define a virtual camera, and may construct the corresponding virtual camera coordinate system based on the position of the virtual camera in the head coordinate system. The virtual camera coordinate system is constructed in the same way as the predetermined actual camera coordinate system: its Z axis is the optical axis of the virtual camera, its X and Y axes are parallel to the lens plane of the virtual camera, and the optical center of the virtual camera lens is its origin.

The positional relationship between the virtual camera coordinate system and the head coordinate system satisfies the following three conditions.

Condition 1: the Z axis of the virtual camera coordinate system points to the origin of the head coordinate system.

Condition 2: the X axis of the virtual camera coordinate system lies in the same plane as the X axis of the head coordinate system. The relative positional relationship between the two X axes includes, but is not limited to, being parallel.

Condition 3: the origin of the virtual camera coordinate system is separated from the origin of the head coordinate system by a predetermined distance along the Z axis of the virtual camera coordinate system.

The above process is equivalent to defining a virtual camera by applying the following operations to camera C0: rotating camera C0 so that its Z axis points to the start point of the three-dimensional line of sight of the person in the image while its X axis lies in the same plane as the X axis of the head coordinate system; and then translating the rotated camera C0 along its Z axis so that the distance between the optical center of its lens and the origin of the head coordinate system equals a predetermined length.

At this point, the computer system can determine the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system, based on the positional relationship between the actual camera coordinate system and the head coordinate system and the positional relationship between the virtual camera coordinate system and the head coordinate system.

It should be understood that, in the present invention, because the virtual camera coordinate system depends on the head pose of the person in the face image, different face images may correspond to different virtual camera coordinate systems.

In step 12-3, normalization processing is performed on the face image based on the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system, to obtain the normalized face image.

In the embodiments of the present invention, the computer system may use the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to apply processing such as rotation, affine, and scaling transformations to the face image, obtaining a normalized face image in the virtual camera coordinate system.

FIG. 8B is a schematic diagram of normalization processing applied to an acquired face image according to an exemplary embodiment. Image P0 is the face image of the driver captured by the actual in-vehicle camera C0. Image P1 is the normalized face image in the virtual camera coordinate system obtained through the normalization processing, and corresponds to a face image of the driver captured by a virtual camera C1 directly facing the driver's head.

Returning to FIG. 7, in step 12222, line-of-sight direction detection is performed on the normalized face image to obtain a first detected line-of-sight direction. For example, the first detected line-of-sight direction is three-dimensional line-of-sight direction information in the virtual camera coordinate system and may be a three-dimensional direction vector.

In the embodiments of the present invention, the normalized face image obtained through the normalization processing may be input into the already trained neural network for detecting the line-of-sight direction, to detect the three-dimensional line-of-sight direction information of the person in the normalized face image. The neural network for detecting the line-of-sight direction may include a deep neural network (DNN) such as a convolutional neural network (CNN).

In step 12223, inverse coordinate transformation processing is performed on the first detected line-of-sight direction to obtain the line-of-sight direction information of the person in the face image.

In the subsequent gaze area detection stage, a line-of-sight feature vector in the actual camera coordinate system must be input into the gaze area classifier. Therefore, in the present invention, after the computer system detects the first detected line-of-sight direction, which is line-of-sight direction information in the virtual camera coordinate system, it must apply an inverse coordinate transformation from the virtual camera coordinate system to the actual camera coordinate system to the first detected line-of-sight direction, to obtain the line-of-sight direction information in the actual camera coordinate system.
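For a direction vector, the inverse coordinate transformation reduces to undoing the rotation, since translation does not affect directions. A minimal sketch, assuming the real-to-virtual rotation is available as a tuple of three row vectors (the virtual X, Y, and Z axes expressed in actual camera coordinates); the function name `to_real_camera` is hypothetical:

```python
def to_real_camera(direction_virtual, rotation):
    """Map a gaze direction from virtual to actual camera coordinates.

    rotation: rows of the real-to-virtual rotation matrix R, so that
              g_virtual = R @ g_real. Because R is orthonormal, its
              inverse is its transpose: g_real = R^T @ g_virtual.
    """
    return tuple(
        sum(rotation[i][j] * direction_virtual[i] for i in range(3))
        for j in range(3)
    )
```

For example, a direction of `(0, 0, -1)` in virtual coordinates (looking straight back at the virtual camera) maps to the negated virtual Z axis in actual camera coordinates.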

Returning to FIG. 1, step 12 above corresponds to the process of determining the line-of-sight feature vector of the person in the face image. The line-of-sight feature vector includes the line-of-sight start point information and line-of-sight direction information of the person in the face image.

For example, in practical intelligent driving applications, the process of extracting the line-of-sight feature vector from the face image does not change when the vehicle model changes. The artificial neural networks used at this stage (such as the neural network for detecting face key points and the neural network for detecting the line-of-sight direction) can be applied to different vehicle models and therefore have good portability.

As described above, according to one embodiment of the present invention, in step 13 the line-of-sight start point information and the line-of-sight direction information of the person in the face image, as determined in step 12, can be input into a gaze area classifier that has already been trained for the predetermined three-dimensional space, to detect the type of the target gaze area corresponding to the face image.

In an embodiment of the present invention, step 13 above may include determining target gaze area information based on the type of the target gaze area and outputting the target gaze area information.

For example, the classifier can output the type of the target gaze area, as shown in FIG. 9A, or directly output the name of the target gaze area, as shown in FIG. 9B.
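
The two output forms can be sketched as a lookup from a gaze area type to a gaze area name. The numeric type values and the table below are illustrative assumptions; the description does not state which number corresponds to which area.

```python
# Hypothetical mapping from gaze area type to gaze area name (values are assumptions).
GAZE_AREA_NAMES = {
    1: "left windshield",
    2: "right windshield",
    3: "instrument panel",
}

def describe_gaze_area(area_type, names=GAZE_AREA_NAMES):
    """Return the name for a gaze area type, or a placeholder for an unlisted type."""
    return names.get(area_type, f"unknown area {area_type}")
```

A caller that only needs the type can use the classifier output directly, while a display component can pass it through `describe_gaze_area` first.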

In another embodiment of the present invention, the gaze area detection method may further include, before step 11, training a neural network for detecting the line-of-sight direction. This step corresponds to the training process of the three-dimensional line-of-sight direction estimation model. Note that this step and the process of training the gaze area classifier in real time shown in FIG. 2 can be executed on different computer systems.

FIG. 10 is a flowchart of a method for training a neural network for detecting a three-dimensional line-of-sight direction according to an exemplary embodiment of the present invention. The method may include steps 1001 to 1005.

In step 1001, an original sample set containing at least one face sample is determined, where each face sample includes a face image sample and line-of-sight direction labeling information.

In an embodiment of the present invention, the neural network can be trained by supervised learning. Accordingly, each sample in the training sample set may include the input information used for prediction, namely a face image sample, and the ground-truth value corresponding to that input, namely the actually measured line-of-sight direction information in the actual camera coordinate system. In the embodiments of the present invention, this actually measured line-of-sight direction information is also called line-of-sight direction labeling information.

In step 1002, head pose information corresponding to each face image sample is determined based on face key points and an average face model.

In step 1003, based on the head pose information and the actual camera coordinate system, the normalized face image sample corresponding to each face image sample is determined, together with the virtual line-of-sight direction labeling information, that is, the line-of-sight direction labeling information expressed in the virtual camera coordinate system.

Steps 1002 and 1003 are carried out in the same way as step 1202 and steps 12-1 to 12-3 above, respectively, and are not described again here. In addition, the computer system converts the line-of-sight direction labeling information into virtual line-of-sight direction labeling information based on the position transformation relationship from the actual camera coordinate system to the virtual camera coordinate system.

At this point, a sample set in the virtual camera coordinate system has been obtained. This sample set is then used for iterative training, repeating the following steps until the training requirements of the neural network for detecting the three-dimensional line-of-sight direction are met: step 1004, in which each normalized face image sample is input into the three-dimensional line-of-sight direction detection neural network to be trained, to obtain three-dimensional line-of-sight direction prediction information; and step 1005, in which the parameters of the neural network are adjusted based on the deviation between the three-dimensional line-of-sight direction prediction information and the virtual line-of-sight direction labeling information, to obtain the neural network for detecting line-of-sight direction information.
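
The iteration of steps 1004 and 1005 can be sketched as a predict-compare-adjust loop. The sketch below swaps in a tiny linear model trained by gradient descent on squared error in place of a neural network, and short feature vectors in place of normalized face images; both substitutions, and all names, learning rates, and stopping thresholds, are assumptions made only for illustration.

```python
def predict(weights, features):
    """Step 1004 stand-in: map input features to a 3D direction with a linear model."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def train(samples, n_features, learning_rate=0.1, max_epochs=500, tolerance=1e-6):
    """Repeat prediction and parameter adjustment until the mean squared error is small.

    samples: list of (features, label) pairs, where label is the virtual
    line-of-sight direction labeling information as a 3-element list.
    """
    weights = [[0.0] * n_features for _ in range(3)]
    for _ in range(max_epochs):
        total_error = 0.0
        for features, label in samples:
            prediction = predict(weights, features)
            deviation = [p - t for p, t in zip(prediction, label)]
            total_error += sum(d * d for d in deviation)
            # Step 1005 stand-in: adjust parameters along the negative gradient.
            for i in range(3):
                for j in range(n_features):
                    weights[i][j] -= learning_rate * 2.0 * deviation[i] * features[j]
        if total_error / len(samples) < tolerance:  # assumed "training requirement"
            break
    return weights

# Hypothetical toy data: two inputs with known labeled directions.
TOY_SAMPLES = [
    ([1.0, 0.0], [0.0, 0.0, -1.0]),
    ([0.0, 1.0], [0.6, 0.0, -0.8]),
]
trained = train(TOY_SAMPLES, n_features=2)
```

A real implementation would replace the linear model with the network and the feature vectors with image tensors, but the loop structure of predicting, measuring the deviation from the label, and adjusting parameters stays the same.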

In an embodiment of the present invention, using face images normalized in the virtual camera coordinate system as training sample data reduces the training difficulty that head pose variation causes for the neural network and improves the training efficiency of the neural network for detecting the line-of-sight direction.

As one example, after the driver's gaze area is recognized, further operations can be performed based on that gaze area. For example, an attention monitoring result for the person corresponding to the face image can be determined based on the gaze area type detection result. The gaze area type detection result may be, for example, the type of gaze area within a predetermined time period, such as "during the predetermined time period, the driver's gaze area was always area 2." If area 2 is the right windshield, the driver is relatively focused on driving. If area 2 is the glove box area in front of the front passenger seat, the driver is very likely distracted and unable to concentrate.

After the attention monitoring result is determined, it can be output; for example, "Focused on driving" can be displayed in a display area inside the vehicle. Alternatively, distraction prompt information can be output based on the attention monitoring result, prompting the driver with "Please focus on driving to ensure driving safety" by promptly displaying it on a display screen, by a voice prompt, or by similar means. Of course, when information is output, at least one of the attention monitoring result and the distraction prompt information can be output.
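
The attention monitoring logic described above can be sketched as a check over the gaze area types observed during the predetermined time period. The set of areas treated as distracting, the result strings, and the function name are assumptions for illustration; the description only gives the glove box area as an example of a distracting area.

```python
# Assumed set of gaze areas that indicate distraction when watched for the whole window.
DISTRACTION_AREAS = {"glove box", "front passenger"}

PROMPT = "Please focus on driving to ensure driving safety"

def monitor_attention(area_history, distraction_areas=DISTRACTION_AREAS):
    """Return (attention monitoring result, distraction prompt or None).

    area_history: gaze area type detected for each frame in the predetermined
    time period, oldest first.
    """
    if area_history and all(area in distraction_areas for area in area_history):
        return "distracted", PROMPT
    return "attentive", None
```

For instance, a history consisting only of "glove box" yields the distraction prompt, while a history that stays on "right windshield" yields an attentive result with no prompt.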

Determining a person's attention monitoring result or outputting distraction prompt information based on the detected gaze area type is an important aid to driver attention monitoring. It makes it possible to detect effectively when the driver is not paying attention and to remind the driver promptly, which reduces the risk of accidents and helps ensure driving safety.

The example above describes monitoring the driver's attention in an intelligent driving application scenario. Gaze area detection also has many other uses.

For example, vehicle-machine interactive control based on gaze area detection can be performed. Some electronic devices, such as a multimedia player, may be installed in the vehicle. By detecting the gaze area of a person in the vehicle, the multimedia player can be automatically controlled to start its playback function based on the gaze area detection result.

For example, a camera placed in the vehicle captures a face image of a person in the vehicle (such as the driver or a passenger), and a pre-trained neural network is used to obtain the gaze area type detection result. The detection result may be, for example, that during time period T the person's gaze area was always area 1, where the "gaze to start" option of a multimedia player in the vehicle is located. Based on this detection result, it can be determined that the person intends to start the multimedia player, so a corresponding control command can be output to make the multimedia player start playback.

Beyond vehicle-related applications, the scenarios may further include several other types of application, such as game control, smart home device control, and advertisement push. Taking smart home control as an example, a face image of the controlling person can be collected, and the gaze area type detection result can be obtained through a pre-trained neural network. The detection result may be, for example, that during time period T the controlling person's gaze area was always area 1, where the "gaze to start" option on a smart air conditioner is located. Based on this detection result, it can be determined that the person intends to start the smart air conditioner, so a corresponding control command can be output to start the air conditioner.
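
The gaze-triggered control in the multimedia player and smart air conditioner examples can be sketched as a table from gaze areas to control commands, applied only when the gaze stayed on one area for the whole time period T. The area identifiers and command strings are hypothetical placeholders, not names defined in this description.

```python
# Hypothetical gaze areas and the control command each one triggers.
COMMANDS = {
    "media_player_start_option": "start_playback",
    "air_conditioner_start_option": "power_on",
}

def command_for_gaze(area_history, commands=COMMANDS):
    """Return a control command if the gaze stayed on one trigger area for the whole period.

    area_history: gaze area detected for each frame in time period T.
    Returns None when the gaze moved or the area has no associated command.
    """
    if not area_history:
        return None
    first = area_history[0]
    if all(area == first for area in area_history):
        return commands.get(first)
    return None
```

Requiring the gaze to stay on the same area for the full period is one simple way to avoid triggering a command on a passing glance.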

For convenience of description, each of the method embodiments above is described as a series of combined operations. Those skilled in the art should understand that the present invention is not limited to the described order of operations. According to the present invention, some steps may be performed in another order or simultaneously.

The present invention can further provide apparatus and electronic device embodiments corresponding to the method embodiments described above.

FIG. 11 is a block diagram of a gaze area detection apparatus 1100 according to an exemplary embodiment of the present invention. The gaze area detection apparatus 1100 may include an image acquisition module 21, a line-of-sight detection module 22, and a gaze area detection module 23.

The image acquisition module 21 acquires a face image collected in a predetermined three-dimensional space. The line-of-sight detection module 22 performs line-of-sight detection based on the face image to obtain a line-of-sight detection result. In one embodiment of the present invention, the line-of-sight detection result may include line-of-sight start point information and line-of-sight direction information of the person in the face image. The gaze area detection module 23 uses a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, based on the line-of-sight detection result, the type of the target gaze area corresponding to the face image. The target gaze area belongs to one of a plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.

Referring to FIG. 12, the line-of-sight detection module 22 of a gaze area detection apparatus according to an exemplary embodiment of the present invention may include an eye position detection submodule 221 for detecting the eye position in the face image, and a first start point information determination submodule 222 for determining, when the eye position includes the positions of both eyes, the midpoint between the two eyes as the line-of-sight start point information.

Referring to FIG. 13, another line-of-sight detection module 22 of a gaze area detection apparatus according to an exemplary embodiment of the present invention may include an eye position detection submodule 221 for detecting the eye position in the face image, and a second start point information determination submodule 223 for determining, when the eye position includes the positions of both eyes, the position of either one of the two eyes as the line-of-sight start point information, or, when the eye position includes the position of only one eye, the position of that eye as the line-of-sight start point information.
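
The start point rules of FIGS. 12 and 13 can be sketched in one function. The function name, the `use_midpoint` switch, and the choice of the first listed eye as "either one of the two eyes" are assumptions for illustration.

```python
def line_of_sight_start_point(eye_positions, use_midpoint=True):
    """Pick the line-of-sight start point from detected eye positions.

    eye_positions: list of one or two 3D points (x, y, z).
    use_midpoint=True follows the first start point rule (midpoint of both eyes);
    use_midpoint=False follows the second rule (position of a single eye).
    """
    if not eye_positions:
        raise ValueError("no eye position detected")
    if len(eye_positions) == 2 and use_midpoint:
        first_eye, second_eye = eye_positions
        return tuple((a + b) / 2.0 for a, b in zip(first_eye, second_eye))
    # One eye detected, or the single-eye rule selected: use the first listed eye.
    return tuple(eye_positions[0])
```

With two detected eyes at (-3, 0, 60) and (3, 0, 60), the first rule gives (0, 0, 60) and the second rule gives (-3, 0, 60).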

Referring to FIG. 14, the eye position detection submodule 221 in FIGS. 12 and 13 according to an exemplary embodiment of the present invention may include a pose detection unit 2211 for detecting head pose information of the person in the face image, and a position determination unit 2212 for determining the eye position in the face image based on the head pose information.

Referring to FIG. 15, another line-of-sight detection module 22 of a gaze area detection apparatus according to an exemplary embodiment of the present invention may include a pose detection submodule 22-1 for detecting head pose information of the person in the face image, and a direction detection submodule 22-2 for detecting line-of-sight direction information of the person in the face image based on the head pose information.

Referring to FIG. 16, the pose detection submodule 22-1 in FIG. 15 according to an exemplary embodiment of the present invention may include a key point detection unit 22-11 for detecting a plurality of face key points in the face image, and a pose determination unit 22-12 for determining head pose information of the person in the face image based on the face key points and a predetermined average face model.

Referring to FIG. 17, the direction detection submodule 22-2 in FIG. 15 according to an exemplary embodiment of the present invention may include an image processing unit 22-21 for normalizing the face image based on the head pose information to obtain a normalized face image, a first direction detection unit 22-22 for performing line-of-sight direction detection based on the normalized face image to obtain a first detected line-of-sight direction, and a direction determination unit 22-23 for applying a coordinate inverse transformation to the first detected line-of-sight direction to obtain line-of-sight direction information of the person in the face image.

Referring to FIG. 18, the image processing unit 22-21 in FIG. 17 according to an exemplary embodiment of the present invention may include a head coordinate determination subunit 22-211 for determining the head coordinate system of the person in the face image based on the head pose information, a coordinate transformation subunit 22-212 for rotating and translating the actual camera coordinate system corresponding to the face image based on the head coordinate system to obtain a virtual camera coordinate system, and an image processing subunit 22-213 for normalizing the face image based on the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to obtain the normalized face image.

In any of the apparatus embodiments of the present invention described above, the gaze area classifier can be trained in advance based on a training sample set for the predetermined three-dimensional space. The training sample set may include a plurality of line-of-sight feature samples. Each line-of-sight feature sample includes line-of-sight start point information, line-of-sight direction information, and labeling information of the gaze area type corresponding to that sample. The labeled gaze area type belongs to one of the plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space.

FIG. 19 is a block diagram of another gaze area detection apparatus 1900 according to an exemplary embodiment of the present invention. Compared with the gaze area detection apparatus 1100 shown in FIG. 11, the gaze area detection apparatus 1900 may further include a classifier training module 20.

The classifier training module 20 may include a type prediction submodule 201 for inputting the line-of-sight start point information and the line-of-sight direction information of at least one line-of-sight feature sample into the gaze area classifier to be trained, to obtain gaze area type prediction information corresponding to that sample, and a parameter adjustment submodule 202 for adjusting the parameters of the gaze area classifier based on the deviation between the gaze area type prediction information and the gaze area type labeling information corresponding to that sample, thereby training the gaze area classifier.
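
The division of work between the type prediction submodule and the parameter adjustment submodule can be sketched with a small classifier. This section does not specify the classifier type, so the sketch swaps in a multiclass perceptron over a six-element feature (start point followed by direction) plus a bias term; the model choice, function names, and toy samples are all assumptions.

```python
def predict_area(weights, feature):
    """Type prediction stand-in: return the gaze area label with the highest linear score."""
    x = list(feature) + [1.0]  # append a bias input
    scores = {label: sum(w * v for w, v in zip(ws, x)) for label, ws in weights.items()}
    return max(scores, key=scores.get)

def train_classifier(samples, labels, max_epochs=50):
    """Parameter adjustment stand-in: perceptron updates on each misclassified sample.

    samples: list of (feature, labeled_area) pairs, where feature is the
    start point (x, y, z) followed by the direction (dx, dy, dz).
    """
    dim = len(samples[0][0]) + 1
    weights = {label: [0.0] * dim for label in labels}
    for _ in range(max_epochs):
        mistakes = 0
        for feature, target in samples:
            predicted = predict_area(weights, feature)
            if predicted != target:  # deviation between prediction and labeling information
                mistakes += 1
                x = list(feature) + [1.0]
                weights[target] = [w + v for w, v in zip(weights[target], x)]
                weights[predicted] = [w - v for w, v in zip(weights[predicted], x)]
        if mistakes == 0:
            break
    return weights

# Hypothetical toy samples: same start point, directions to the left or to the right.
TOY_FEATURES = [
    ((0.0, 0.0, 0.0, -0.5, 0.0, -1.0), "left windshield"),
    ((0.0, 0.0, 0.0, 0.5, 0.0, -1.0), "right windshield"),
]
area_weights = train_classifier(TOY_FEATURES, ["left windshield", "right windshield"])
```

Because the classifier sees only the start point and direction, it can be retrained for a different three-dimensional space without changing the upstream line-of-sight detection.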

FIG. 20 is a block diagram of another gaze area detection apparatus 2000 according to an exemplary embodiment of the present invention. Compared with the gaze area detection apparatus 1100 shown in FIG. 11, the gaze area detection apparatus 2000 may further include a classifier acquisition module 203.

The classifier acquisition module 203 can acquire, based on the space identifier of the predetermined three-dimensional space, the gaze area classifier corresponding to that space identifier from a predetermined gaze area classifier set. The predetermined gaze area classifier set may include gaze area classifiers that correspond respectively to the space identifiers of different three-dimensional spaces.
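
The classifier lookup can be sketched as a dictionary keyed by space identifier. The identifiers and the placeholder classifier objects below are hypothetical; a real classifier set would hold trained classifier instances.

```python
def get_classifier(classifier_set, space_id):
    """Return the gaze area classifier registered for a space identifier."""
    try:
        return classifier_set[space_id]
    except KeyError:
        raise LookupError(f"no gaze area classifier for space {space_id!r}") from None

# Hypothetical classifier set: one entry per three-dimensional space (e.g. vehicle model).
CLASSIFIER_SET = {
    "vehicle_model_a": "classifier trained for model A",
    "vehicle_model_b": "classifier trained for model B",
}
```

Raising an explicit error for an unregistered space keeps a classifier trained for one space from being applied silently to another.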

In any of the apparatus embodiments of the present invention described above, the predetermined three-dimensional space may include a vehicle space. Accordingly, the face image can be determined based on images collected of the driving area in the vehicle space. The plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space may include at least two of the following: a left windshield area, a right windshield area, an instrument panel area, an interior mirror area, a center console area, a left rearview mirror area, a right rearview mirror area, a sun visor area, a shift lever area, an area below the steering wheel, a front passenger area, and a glove box area in front of the front passenger seat.

FIG. 21 is a block diagram of another gaze area detection apparatus 2100 according to an exemplary embodiment of the present invention. Compared with the gaze area detection apparatus 1100 shown in FIG. 11, the gaze area detection apparatus 2100 may further include an attention monitoring module 24 for determining an attention monitoring result for the person corresponding to the face image based on the gaze area type detection result obtained by the gaze area detection module 23, and a monitoring result output module 25 for outputting the attention monitoring result and/or outputting distraction prompt information based on the attention monitoring result.

FIG. 22 is a block diagram of another gaze area detection apparatus 2200 according to an exemplary embodiment of the present invention. Compared with the gaze area detection apparatus 1100 shown in FIG. 11, the gaze area detection apparatus 2200 may further include a control command determination module 26 for determining a control command corresponding to the gaze area type detection result obtained by the gaze area detection module 23, and an operation control module 27 for controlling an electronic device to perform the operation corresponding to the control command.

Because the apparatus embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for the related parts. The apparatus embodiments described above are merely schematic. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Those skilled in the art can select some or all of the modules according to actual needs to implement the embodiments of the present invention without creative effort.

The present invention can further provide an electronic device corresponding to the gaze area detection method described above. FIG. 23 is a block diagram of an electronic device 2300 according to an exemplary embodiment of the present invention. For example, the electronic device 2300 may include a processor, an internal bus, a network interface, an internal memory, and a non-volatile memory. The processor reads the corresponding computer program from the non-volatile memory into the internal memory and runs it, thereby logically forming a gaze area detection apparatus that implements the gaze area detection method described above.

Those skilled in the art should understand that the present invention can be provided as a method, an apparatus, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.

The present invention can further provide a computer-readable recording medium storing a computer program. When the computer program is executed by a processor, it causes the processor to implement the gaze area detection method of any of the method embodiments described above.

Embodiments of the subject matter and functional operations of the present invention can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in the present invention and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter of the present invention can be implemented as one or more computer programs, that is, as one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, for example a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows of the present invention can be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can also be implemented as such special-purpose logic circuitry.

Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other kind of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The essential components of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or is operatively coupled to such devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, for example a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include various forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, compact disc read-only memory (CD-ROM), and digital versatile discs (DVD). The processor and the memory can be supplemented by, or incorporated in, dedicated logic circuitry.

Although the present invention contains many specific implementation details, these should not be construed as limiting the scope of the invention or of what is claimed; they mainly describe features of particular embodiments of the invention. Certain features described in the context of multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be removed from that combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired results. Moreover, the separation of the various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

The above are merely some embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of the present invention.

Claims

A gaze area detection method, comprising:
acquiring a face image collected in a predetermined three-dimensional space;
performing line-of-sight detection based on the face image to obtain a line-of-sight detection result; and
detecting, using a gaze area classifier trained in advance for the predetermined three-dimensional space, the type of a target gaze area corresponding to the face image based on the line-of-sight detection result,
wherein the target gaze area belongs to one of a plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.

The gaze area detection method according to claim 1, wherein the line-of-sight detection result includes line-of-sight start point information and line-of-sight direction information of a person in the face image.

The gaze area detection method according to claim 2, wherein performing line-of-sight detection based on the face image to obtain the line-of-sight detection result comprises:
detecting the position of the eyes in the face image; and
when the position of the eyes includes the positions of both eyes, determining the middle position between the two eyes as the line-of-sight start point information.

The gaze area detection method according to claim 2, wherein performing line-of-sight detection based on the face image to obtain the line-of-sight detection result comprises:
detecting the position of the eyes in the face image; and
when the position of the eyes includes the positions of both eyes, determining the position of either one of the two eyes as the line-of-sight start point information, or, when the position of the eyes includes the position of only one eye, determining the position of that eye as the line-of-sight start point information.
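
The start point rules in the two preceding claims (the middle position between both eyes, or the position of a single eye) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `line_of_sight_start_point`, the `use_midpoint` flag, and the representation of eye positions as 3D tuples are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def line_of_sight_start_point(
    eyes: Sequence[Point], use_midpoint: bool = True
) -> Optional[Point]:
    """Pick a line-of-sight start point from the detected eye positions.

    eyes: zero, one, or two detected eye positions (hypothetical format).
    use_midpoint: if True and both eyes are present, return the middle
        position between them; otherwise return the first detected eye.
    """
    if not eyes:
        return None  # no eye detected, so no start point can be determined
    if len(eyes) == 1 or not use_midpoint:
        return tuple(float(c) for c in eyes[0])
    left, right = eyes[0], eyes[1]
    return tuple((a + b) / 2.0 for a, b in zip(left, right))
```

Calling `line_of_sight_start_point([(0, 0, 1), (2, 0, 1)])` yields the midpoint of the two eyes, while passing a single position returns that position unchanged.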

The gaze area detection method according to claim 3 or 4, wherein detecting the position of the eyes in the face image comprises:
detecting head pose information of the person in the face image; and
determining the position of the eyes in the face image based on the head pose information.

The gaze area detection method according to claim 2, wherein performing line-of-sight detection based on the face image to obtain the line-of-sight detection result comprises:
detecting head pose information of the person in the face image; and
detecting the line-of-sight direction information of the person in the face image based on the head pose information.

The gaze area detection method according to claim 5 or 6, wherein detecting the head pose information of the person in the face image comprises:
detecting a plurality of face key points in the face image; and
determining the head pose information of the person in the face image based on the face key points and a predetermined average face model.

The gaze area detection method according to claim 6 or 7, wherein detecting the line-of-sight direction information of the person in the face image based on the head pose information comprises:
performing normalization processing on the face image based on the head pose information to obtain a normalized face image;
performing line-of-sight direction detection based on the normalized face image to obtain a first detected line-of-sight direction; and
performing coordinate inverse conversion processing on the first detected line-of-sight direction to obtain the line-of-sight direction information of the person in the face image.

The gaze area detection method according to claim 8, wherein performing normalization processing on the face image based on the head pose information to obtain the normalized face image comprises:
determining a head coordinate system of the person in the face image based on the head pose information;
rotating and translating the actual camera coordinate system corresponding to the face image based on the head coordinate system to obtain a virtual camera coordinate system; and
performing normalization processing on the face image based on the position conversion relationship between the actual camera coordinate system and the virtual camera coordinate system to obtain the normalized face image.

The gaze area detection method according to any one of claims 1 to 9, wherein the gaze area classifier is trained in advance based on a training sample set for the predetermined three-dimensional space, the training sample set includes a plurality of line-of-sight feature samples, each line-of-sight feature sample includes line-of-sight start point information, line-of-sight direction information, and gaze area type labeling information corresponding to the line-of-sight feature sample, and the labeled gaze area type belongs to one of the plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space.

The gaze area detection method according to claim 10, further comprising, before acquiring the face image collected in the predetermined three-dimensional space:
inputting the line-of-sight start point information and the line-of-sight direction information of at least one line-of-sight feature sample into a gaze area classifier to be trained, to obtain gaze area type prediction information corresponding to the line-of-sight feature sample; and
training the gaze area classifier by adjusting its parameters based on the deviation between the gaze area type prediction information and the gaze area type labeling information corresponding to the line-of-sight feature sample.
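
The training loop in the preceding claim (predict a gaze area type, then adjust parameters based on the deviation from the label) can be sketched as follows. The claims do not fix a classifier type; a linear softmax classifier trained by stochastic gradient descent is assumed here purely for illustration. The function names, the hyperparameters, and the six-value feature layout (start point followed by direction) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (start point + direction, area label)


def _softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights: List[List[float]], feature: Sequence[float]) -> List[float]:
    x = list(feature) + [1.0]  # append a constant bias input
    return _softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def predict_area(weights: List[List[float]], feature: Sequence[float]) -> int:
    probs = predict_proba(weights, feature)
    return max(range(len(probs)), key=probs.__getitem__)


def train_classifier(
    samples: Sequence[Sample], num_areas: int, epochs: int = 200, lr: float = 0.5
) -> List[List[float]]:
    dim = len(samples[0][0]) + 1
    weights = [[0.0] * dim for _ in range(num_areas)]
    for _ in range(epochs):
        for feature, label in samples:
            probs = predict_proba(weights, feature)
            x = list(feature) + [1.0]
            for k in range(num_areas):
                # Deviation between the predicted and labeled area type.
                deviation = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * deviation * x[j]
    return weights
```

With a few labeled samples per gaze area, `train_classifier` returns a weight matrix that `predict_area` then applies to new start point and direction features.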

The gaze area detection method according to claim 10, further comprising, before acquiring the face image collected in the predetermined three-dimensional space:
acquiring, based on a spatial identifier of the predetermined three-dimensional space, the gaze area classifier corresponding to the spatial identifier from a predetermined gaze area classifier set,
wherein the predetermined gaze area classifier set includes gaze area classifiers respectively corresponding to the spatial identifiers of different three-dimensional spaces.
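
Selecting a classifier by spatial identifier amounts to a keyed lookup. The sketch below assumes the classifier set is an in-memory mapping from a string identifier (for example, a vehicle model code) to a callable classifier; both the mapping layout and the name `get_classifier` are hypothetical.

```python
from typing import Callable, Dict, Sequence

Classifier = Callable[[Sequence[float]], int]


def get_classifier(classifier_set: Dict[str, Classifier], space_id: str) -> Classifier:
    """Return the gaze area classifier registered for the given space."""
    try:
        return classifier_set[space_id]
    except KeyError:
        raise KeyError(
            f"no gaze area classifier registered for space {space_id!r}"
        ) from None
```

Failing loudly on an unknown identifier is a design choice of this sketch. A classifier trained for a different space would divide that space into different gaze areas, so silently falling back to it could give misleading results.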

The gaze area detection method according to any one of claims 1 to 12, wherein the predetermined three-dimensional space includes a vehicle space.

The gaze area detection method according to claim 13, wherein the face image is determined based on an image collected of a driving area in the vehicle space, and
the plurality of types of defined gaze areas include at least two of: a left windshield area, a right windshield area, an instrument panel area, an interior mirror area, a center console area, a left rearview mirror area, a right rearview mirror area, a sun visor area, a shift lever area, an area below the steering wheel, a front passenger area, and a glove box area in front of the front passenger seat.

The gaze area detection method according to any one of claims 1 to 14, further comprising:
determining an attention monitoring result of the person corresponding to the face image based on the gaze area type detection result; and
outputting the attention monitoring result and/or outputting distraction prompt information based on the attention monitoring result.
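
One way to turn per-frame gaze area types into an attention monitoring result is to measure how often the gaze falls outside a set of areas treated as attentive. The claims do not specify a rule, so the ratio threshold, the area names, the result fields, and the function name `monitor_attention` below are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Union

Result = Dict[str, Union[bool, float, Optional[str]]]


def monitor_attention(
    area_types: Iterable[str],
    attentive_areas: Set[str],
    max_off_ratio: float = 0.5,
) -> Result:
    """Summarize a window of per-frame gaze area types.

    The person is flagged as distracted when the share of frames whose
    gaze area lies outside `attentive_areas` exceeds `max_off_ratio`.
    """
    counts = Counter(area_types)
    total = sum(counts.values())
    if total == 0:
        return {"distracted": False, "off_ratio": 0.0, "prompt": None}
    off = sum(n for area, n in counts.items() if area not in attentive_areas)
    off_ratio = off / total
    distracted = off_ratio > max_off_ratio
    prompt = "Please keep your eyes on the road" if distracted else None
    return {"distracted": distracted, "off_ratio": off_ratio, "prompt": prompt}
```

A caller would feed in the detected area types for a recent window of frames and output the returned `prompt`, if any, as the distraction prompt information.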

The gaze area detection method according to any one of claims 1 to 15, further comprising:
determining a control command corresponding to the gaze area type detection result; and
controlling an electronic device to perform an operation corresponding to the control command.
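
Mapping a detected gaze area type to a control command can be sketched as a table lookup followed by a dispatch to the device. The command table, the command strings, and the `execute` callback standing in for the electronic device are hypothetical placeholders and are not taken from the claims.

```python
from typing import Callable, Dict, Optional


def dispatch_control_command(
    area_type: str,
    command_table: Dict[str, str],
    execute: Callable[[str], None],
) -> Optional[str]:
    """Look up the command for a gaze area type and run it on the device.

    Returns the command that was executed, or None when the area type
    has no associated command.
    """
    command = command_table.get(area_type)
    if command is not None:
        execute(command)  # e.g. forward the command to the device driver
    return command
```

For example, a table entry mapping a center console area to a hypothetical "wake_display" command would cause the device callback to receive that command whenever the center console is detected as the gaze area.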

A gaze area detection device, comprising:
an image acquisition module for acquiring a face image collected in a predetermined three-dimensional space;
a line-of-sight detection module for performing line-of-sight detection based on the face image to obtain a line-of-sight detection result; and
a gaze area detection module for detecting, using a gaze area classifier trained in advance for the predetermined three-dimensional space, the type of a target gaze area corresponding to the face image based on the line-of-sight detection result,
wherein the target gaze area belongs to one of a plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.

The gaze area detection device according to claim 17, wherein the line-of-sight detection result includes line-of-sight start point information and line-of-sight direction information of a person in the face image.

The gaze area detection device according to claim 18, wherein the line-of-sight detection module comprises:
an eye position detection submodule for detecting the position of the eyes in the face image; and
a first start point information determination submodule for determining, when the position of the eyes includes the positions of both eyes, the middle position between the two eyes as the line-of-sight start point information.

The gaze area detection device according to claim 18, wherein the line-of-sight detection module comprises:
an eye position detection submodule for detecting the position of the eyes in the face image; and
a second start point information determination submodule for determining, when the position of the eyes includes the positions of both eyes, the position of either one of the two eyes as the line-of-sight start point information, or, when the position of the eyes includes the position of only one eye, determining the position of that eye as the line-of-sight start point information.

The gaze area detection device according to claim 19 or 20, wherein the eye position detection submodule comprises:
a pose detection unit for detecting head pose information of the person in the face image; and
a position determination unit for determining the position of the eyes in the face image based on the head pose information.

The gaze area detection device according to claim 18, wherein the line-of-sight detection module comprises:
a pose detection submodule for detecting head pose information of the person in the face image; and
a direction detection submodule for detecting the line-of-sight direction information of the person in the face image based on the head pose information.

The gaze area detection device according to claim 22, wherein the pose detection submodule comprises:
a key point detection unit for detecting a plurality of face key points in the face image; and
a pose determination unit for determining the head pose information of the person in the face image based on the face key points and a predetermined average face model.

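The gaze area detection device according to claim 22 or 23, wherein the direction detection submodule comprises:
an image processing unit for performing normalization processing on the face image based on the head pose information to obtain a normalized face image;
a first direction detection unit for performing line-of-sight direction detection based on the normalized face image to obtain a first detected line-of-sight direction; and
a unit for performing coordinate inverse conversion processing on the first detected line-of-sight direction to obtain the line-of-sight direction information of the person in the face image.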

The gaze area detection device according to claim 24, wherein the image processing unit comprises:
a head coordinate determination subunit for determining a head coordinate system of the person in the face image based on the head pose information;
a coordinate conversion subunit for rotating and translating the actual camera coordinate system corresponding to the face image based on the head coordinate system to obtain a virtual camera coordinate system; and
an image processing subunit for performing normalization processing on the face image based on the position conversion relationship between the actual camera coordinate system and the virtual camera coordinate system to obtain the normalized face image.

The gaze area detection device according to any one of claims 17 to 25, wherein the gaze area classifier is trained in advance based on a training sample set for the predetermined three-dimensional space, the training sample set includes a plurality of line-of-sight feature samples, each line-of-sight feature sample includes line-of-sight start point information, line-of-sight direction information, and gaze area type labeling information corresponding to the line-of-sight feature sample, and the labeled gaze area type belongs to one of the plurality of types of defined gaze areas obtained by dividing the predetermined three-dimensional space.

The gaze area detection device according to claim 26, further comprising a classifier training module, wherein the classifier training module comprises:
a type prediction submodule for inputting the line-of-sight start point information and the line-of-sight direction information of at least one line-of-sight feature sample into a gaze area classifier to be trained, to obtain gaze area type prediction information corresponding to the line-of-sight feature sample; and
a parameter adjustment submodule for training the gaze area classifier by adjusting its parameters based on the deviation between the gaze area type prediction information and the gaze area type labeling information corresponding to the line-of-sight feature sample.

The gaze area detection device according to claim 26, further comprising a classifier acquisition module for acquiring, based on a spatial identifier of the predetermined three-dimensional space, the gaze area classifier corresponding to the spatial identifier from a predetermined gaze area classifier set,
wherein the predetermined gaze area classifier set includes gaze area classifiers respectively corresponding to the spatial identifiers of different three-dimensional spaces.

The gaze area detection device according to any one of claims 17 to 28, wherein the predetermined three-dimensional space includes a vehicle space.

The gaze area detection device according to claim 29, wherein the face image is determined based on an image collected of a driving area in the vehicle space, and
the plurality of types of defined gaze areas include at least two of: a left windshield area, a right windshield area, an instrument panel area, an interior mirror area, a center console area, a left rearview mirror area, a right rearview mirror area, a sun visor area, a shift lever area, an area below the steering wheel, a front passenger area, and a glove box area in front of the front passenger seat.

The gaze area detection device according to any one of claims 17 to 30, further comprising:
an attention monitoring module for determining an attention monitoring result of the person corresponding to the face image based on the gaze area type detection result obtained by the gaze area detection module; and
a monitoring result output module for outputting the attention monitoring result and/or outputting distraction prompt information based on the attention monitoring result.

The gaze area detection device according to any one of claims 17 to 31, further comprising:
a control command determination module for determining a control command corresponding to the gaze area type detection result obtained by the gaze area detection module; and
an operation control module for controlling an electronic device to perform an operation corresponding to the control command.

A computer-readable recording medium storing a computer program,
wherein, when the computer program is executed by a processor, the processor implements the method according to any one of claims 1 to 16.

An electronic device, comprising a memory and a processor,
wherein a computer program is stored in the memory, and
the method according to any one of claims 1 to 16 is implemented when the processor executes the computer program.