JP7402239B2

JP7402239B2 - Face recognition method, neural network training method, face recognition device, electronic device and computer readable storage medium

Info

Publication number: JP7402239B2
Application number: JP2021540572A
Authority: JP
Inventors: ワン，フェイ; キアン，チェン
Original assignee: ベイジンセンスタイムテクノロジーデベロップメントシーオー．，エルティーディー
Priority date: 2019-02-26
Filing date: 2019-10-31
Publication date: 2023-12-20
Anticipated expiration: 2039-10-31
Also published as: JP2022521038A; CN109886222B; WO2020173117A1; CN109886222A; KR20210101313A

Description

The present disclosure relates to the technical field of image processing, and in particular to a face recognition method, a neural network training method, an apparatus, and an electronic device.

With the rapid development of artificial intelligence and the vehicle industry, applying the latest artificial intelligence technology to mass-produced vehicles has become the direction with the greatest market potential. Artificial intelligence products currently in demand in the vehicle market include, but are not limited to, driver assistance systems, driver monitoring systems, and vehicle operation management systems. To meet these market needs, it is usually necessary to recognize the driver's face and to perform subsequent management and control on that basis.

The present disclosure provides technical solutions for face recognition and for neural network training.

In a first aspect, a face recognition method according to an embodiment of the present disclosure includes:
acquiring a first face image with a first camera;
extracting a first face feature from the first face image;
comparing the first face feature with a pre-stored second face feature to obtain a reference similarity, the second face feature having been extracted from a second face image acquired by a second camera of a different type from the first camera; and
determining, based on the reference similarity, whether the first face feature and the second face feature correspond to the same person.

In the embodiments of the present disclosure, face recognition can be performed across a first face image and a second face image acquired by different types of cameras. Because the two images may come from different types of cameras, the face recognition method can be applied to more scenarios; this not only makes face authentication easier but also places no restriction on the camera used for face registration, which makes registration more convenient.

In a second aspect, a neural network training method according to an embodiment of the present disclosure includes:
obtaining a first type of image sample and a second type of image sample, which are captured by different types of cameras and each contain a face; and
training a neural network based on the first type of image sample and the second type of image sample.

In the embodiments of the present disclosure, training the neural network with face images captured by different types of cameras can effectively improve the accuracy of the face features output by the neural network; when this neural network is then used to extract face features during face recognition, the accuracy of face recognition is effectively improved as well.

In a third aspect, a face recognition device according to an embodiment of the present disclosure includes:
a first acquisition unit that acquires a first face image with a first camera;
a first extraction unit that extracts a first face feature from the first face image;
a comparison unit that compares the first face feature with a pre-stored second face feature to obtain a reference similarity, the second face feature having been extracted from a second face image acquired by a second camera of a different type from the first camera; and
a determination unit that determines, based on the reference similarity, whether the first face feature and the second face feature correspond to the same person.

In a fourth aspect, a neural network training device according to an embodiment of the present disclosure includes:
an acquisition unit that obtains a first type of image sample and a second type of image sample, which are captured by different types of cameras and each contain a face; and
a training unit that trains a neural network based on the first type of image sample and the second type of image sample.

In a fifth aspect, an electronic device according to an embodiment of the present disclosure includes a processor and a memory. The memory is coupled to the processor and stores program instructions, and the processor is configured to support the electronic device in performing the corresponding functions of the method of the first aspect.

In a sixth aspect, an electronic device according to an embodiment of the present disclosure includes a processor and a memory. The memory is coupled to the processor and stores program instructions, and the processor is configured to support the electronic device in performing the corresponding functions of the method of the second aspect.

In a seventh aspect, a face recognition system according to an embodiment of the present disclosure includes a neural network training device and a face recognition device, the neural network training device being coupled to the face recognition device, wherein:
the neural network training device trains a neural network; and
the face recognition device applies the neural network trained by the neural network training device.

In an eighth aspect, a computer-readable storage medium according to an embodiment of the present disclosure stores instructions that, when executed on a computer, cause the computer to perform the methods described in the above aspects.

In a ninth aspect, a computer program according to an embodiment of the present disclosure includes instructions that, when executed on a computer, cause the computer to perform the methods described in the above aspects.

In the course of implementing the embodiments of the present disclosure, the applicant found that conventional face recognition methods often require the image type used for authentication to be the same as the image type used for registration. For example, if RGB images are used in the registration process, RGB images must also be used in the authentication process, which limits the application of conventional face recognition solutions in scenarios involving multiple types of cameras. The embodiments of the present disclosure provide a face image recognition solution for scenarios involving multiple types of cameras. In these embodiments, the face feature of a second face image acquired by one type of camera serves as a base library feature, the face feature of a first face image acquired by another type of camera is compared with the base library feature, and face recognition is performed based on the comparison result. As a result, the face recognition method can be applied to more scenarios; this not only makes face authentication easier but also places no restriction on the camera used for face registration, which makes registration more convenient.

To explain the technical solutions in the embodiments of the present disclosure or in the background art more clearly, the drawings used in the embodiments or the background art are described below.
The drawings are, in order:
a flowchart of a face recognition method according to an embodiment of the present disclosure;
a flowchart of a neural network training method according to an embodiment of the present disclosure;
a schematic diagram of a training process according to an embodiment of the present disclosure;
a schematic structural diagram of a face recognition device according to an embodiment of the present disclosure;
a schematic structural diagram of another face recognition device according to an embodiment of the present disclosure;
a schematic structural diagram of a neural network training device according to an embodiment of the present disclosure;
a schematic structural diagram of a training unit according to an embodiment of the present disclosure;
a schematic structural diagram of another neural network training device according to an embodiment of the present disclosure; and
a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

To make the objects, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the drawings.

The terms "first", "second", and the like in the specification, the claims, and the above drawings of the present disclosure are used to distinguish different objects and are not intended to describe a particular order. The terms "include" and "comprise", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the steps or units shown; preferably it further includes steps or units that are not shown, or preferably it further includes other steps or units inherent to such a process, method, or apparatus.

FIG. 1 is a flowchart of a face recognition method according to an embodiment of the present disclosure. The face recognition method can be applied to a face recognition device and also to an electronic device. The electronic device may include a server or a terminal device. The server may be any type of server, such as a cloud server, and is not limited in the embodiments of the present disclosure. The terminal device may include a mobile phone, a tablet computer, a desktop computer, an in-vehicle device, a driver status monitoring system, a ride management system, a car rental management system, an online ride-hailing management system, and the like; the embodiments of the present disclosure do not restrict the terminal device to any single form. The following description takes the application of the face recognition method to an electronic device as an example.

As shown in FIG. 1, the face recognition method includes the following steps 101 to 104.

In step 101, a first face image is acquired by a first camera.

In the embodiments of the present disclosure, the first face image may be an image captured by the first camera, or it may be any frame of video stream data captured by the first camera. The embodiments of the present disclosure do not limit the source of the first face image.

In the embodiments of the present disclosure, the first camera may be a thermal camera, or the first camera may be a visible light camera. When the first camera is a thermal camera, the second camera may be a camera other than a thermal camera; for example, the second camera may be a visible light camera. When the first camera is a visible light camera, the second camera may be a camera other than a visible light camera; for example, the second camera may be a thermal camera. In one example, the visible light camera may include a red green blue (RGB) camera, and the thermal camera may include an infrared (infrared radiation, IR) camera. Imaging by an IR camera is not disturbed by ambient light: whether by day or night, in sunny, cloudy, or rainy weather, or in different application scenes such as open roads or tunnels, it can capture images whose quality does not vary greatly. RGB cameras are inexpensive and widely deployed, many terminals and scenes are equipped with them, and RGB images are likewise in widespread general use. Therefore, in vehicle applications the in-vehicle camera may be an IR camera. Face registration can then be performed with an RGB camera, which improves the convenience and flexibility of registration, while face recognition is performed with the IR camera: the in-vehicle camera captures images in real time, and processing such as unlocking, permission control, and personnel/vehicle management is carried out based on the face recognition result. The above is only an example; a specific implementation may also involve other types of cameras, which are not described further here.

In one example, the first camera may be an external camera connected to the electronic device, or a camera built into the electronic device; the embodiments of the present disclosure do not limit the specific implementation of the first camera. The same applies to the second camera. In one example, the first camera and the second camera may be different types of in-vehicle cameras; that is, the first face image may be a face image acquired by an in-vehicle camera in the driving area of the vehicle. In a specific implementation, the first camera and the second camera may be built into various electronic devices. For example, the first camera may be built into a camera device, a mobile phone, or an in-vehicle device. The embodiments of the present disclosure do not restrict the first camera and the second camera to any single form.

In one example, when the first camera is an in-vehicle camera, acquiring the first face image with the first camera includes:
acquiring the first face image with the in-vehicle camera, the first face image including a face image of a vehicle user of the vehicle.

In this embodiment, the vehicle may include an automobile, a light vehicle, a passenger car, a truck, a scheduled-service vehicle, a taxi, a two-wheeled vehicle, a three-wheeled vehicle, a vehicle with four or more wheels, a small vehicle, a vehicle-type robot, a radio-controlled model car, and the like. The embodiments of the present disclosure do not limit the specific type of vehicle.

In this embodiment, the vehicle user may include one or more of the following: a person who drives the vehicle, a person who rides in the vehicle, a person who repairs the vehicle, a person who refuels the vehicle, and a person who controls the vehicle. The person who controls the vehicle may be a person operating a radio-controlled model car; the person who refuels the vehicle may be a refueling worker; the person who repairs the vehicle may be an automobile repair worker; the person who rides in the vehicle may be a passenger in a taxi or a scheduled-service vehicle; and the person who drives the vehicle may be a driver. The embodiments of the present disclosure do not limit the specific type of vehicle user.

In one example, when the vehicle user includes a person who drives the vehicle, the embodiments of the present disclosure further provide trigger conditions that determine when the electronic device acquires the first face image. For example, acquiring the first face image with the in-vehicle camera includes:
acquiring the first face image with the in-vehicle camera when a trigger command is received;
or acquiring the first face image with the in-vehicle camera while the vehicle is running;
or acquiring the first face image with the in-vehicle camera when the traveling speed of the vehicle reaches a reference speed.

In this embodiment, the trigger command may be a command entered by a user and received by the electronic device, or a command sent by another electronic device connected to the electronic device. This embodiment does not limit the source or the specific form of the trigger command.

In this embodiment, "while the vehicle is running" can be understood as the period from when the vehicle is ignited. That is, once the electronic device detects that the vehicle has started running, it can acquire a face image of the user in the driving area of the vehicle, namely the first face image.

In this embodiment, the reference speed is used to decide at what traveling speed the electronic device acquires the first face image, and its specific value is not limited. The reference speed may be set by the user, by a device that is connected to the electronic device and measures the traveling speed of the vehicle, or by the electronic device itself; this embodiment does not limit how it is set.

In this embodiment, acquiring the first face image under a configured trigger condition allows the identity of the vehicle user to be identified and can effectively improve the efficiency with which the electronic device performs face recognition.
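The three alternative trigger conditions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `TriggerMode`, `VehicleState`, and `should_capture`, and the use of km/h for speed, are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TriggerMode(Enum):
    """Hypothetical names for the three alternative trigger conditions."""
    ON_COMMAND = auto()          # capture when a trigger command is received
    WHILE_RUNNING = auto()       # capture while the vehicle is running
    AT_REFERENCE_SPEED = auto()  # capture once the speed reaches a reference speed


@dataclass
class VehicleState:
    """Hypothetical snapshot of the signals visible to the electronic device."""
    trigger_received: bool = False  # command from the user or from another device
    is_running: bool = False        # e.g. ignition has been detected
    speed_kmh: float = 0.0          # current traveling speed (unit is an assumption)


def should_capture(mode: TriggerMode, state: VehicleState,
                   reference_speed_kmh: float = 0.0) -> bool:
    """Return True when the configured trigger condition is satisfied."""
    if mode is TriggerMode.ON_COMMAND:
        return state.trigger_received
    if mode is TriggerMode.WHILE_RUNNING:
        return state.is_running
    # AT_REFERENCE_SPEED: "reaches" is read here as greater than or equal to.
    return state.speed_kmh >= reference_speed_kmh
```

Modeling the conditions as mutually exclusive modes mirrors the "or" wording of the text; a real system could equally combine several conditions.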

In step 102, a first face feature is extracted from the first face image.

In the embodiments of the present disclosure, the electronic device may extract the first face feature from the first face image by any method. For example, the electronic device may extract the first face feature with a feature point extraction algorithm, such as, but not limited to, SUSAN operator feature extraction, Harris operator feature extraction, SIFT feature extraction, or a neural network based feature method. As another example, the electronic device may extract the first face feature with a face feature extraction method based on geometric features or with a face feature extraction method based on template matching. The embodiments of the present disclosure do not limit how the electronic device extracts the first face feature.

In one example, the electronic device may extract the first face feature from the first face image with a neural network, and the neural network may be a pre-trained neural network. The pre-trained neural network may have been trained by the electronic device of the embodiments of the present disclosure, or it may have been trained by another device and then obtained from that device by the electronic device; the embodiments of the present disclosure do not limit this.

In the embodiments of the present disclosure, the neural network may be built by stacking network layers such as convolutional layers, nonlinear layers, and pooling layers in a certain manner; the embodiments of the present disclosure do not limit the specific network structure. Once the structure has been designed, the network may be trained iteratively, for thousands or even tens of thousands of iterations, on images with labeling information using a supervised or weakly supervised approach, for example with gradient backpropagation of the error, adjusting the network parameters until a predetermined training completion condition is met. The embodiments of the present disclosure do not limit the specific training method.

Feature extraction from a face image by a neural network can produce its output end to end. For example, the first face image is input to a pre-trained neural network, and the neural network outputs the feature map obtained by extracting features from the first face image; the face image feature extraction process is thus realized end to end. Face feature extraction means extracting features that characterize a face, and a face feature may also be called a face representation. As one example, face feature extraction by a neural network may specifically be the extraction of deep, abstract features of the face by a deep neural network.

In one example, the first face image is input to a pre-trained neural network, and the neural network outputs the first face feature of the first face image. The neural network is obtained by training on a first type of image sample and a second type of image sample, which are captured by different types of cameras and each contain a face. In the present disclosure, training the neural network with image samples captured by two different types of cameras allows the network to learn to extract features from different types of images, so that face recognition across different types of images can be realized with the trained neural network.

In step 103, the first face feature is compared with a pre-stored second face feature to obtain a reference similarity, the second face feature having been extracted from a second face image acquired by a second camera, which is a camera of a different type from the first camera.

In the embodiments of the present disclosure, the first face image may be understood as a face image to be authenticated, a face image on which face recognition is to be performed, or a face image to be searched. The second face image may be understood as a face image captured at face registration, or as a face image stored in an identity base library. The identity base library stores identity information and the face feature corresponding to each piece of identity information. The following description takes as an example the case where the first face image is the face image on which face recognition is to be performed and the second face image is the face image captured at face registration.

In the embodiments of the present disclosure, the first camera and the second camera are different types of cameras, so the first face image and the second face image can be understood as different types of face images. For example, if the first face image is an RGB face image, the second face image may be an IR face image. Alternatively, the first face image may be an IR face image and the second face image an RGB face image. Other types of face images are also possible; the embodiments of the present disclosure do not limit this. For a specific description of the first camera and the second camera, refer to the foregoing embodiments; the details are not repeated here.

In the embodiments of the present disclosure, the reference similarity is the degree to which the first face image belongs to the user corresponding to the second face image in the identity base library; that is, the reference similarity indicates how likely the first face image and the second face image are to correspond to the same person. For example, the electronic device can obtain the reference similarity by comparing face features.
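The disclosure does not fix the measure used to compare face features. As one common choice, the sketch below assumes cosine similarity between two feature vectors. The function name `reference_similarity` and the representation of a face feature as a sequence of floats are assumptions made for illustration only.

```python
import math
from typing import Sequence


def reference_similarity(first_feature: Sequence[float],
                         second_feature: Sequence[float]) -> float:
    """Cosine similarity between two face feature vectors (an assumed measure).

    Returns a value in [-1, 1]; larger values mean the features are more alike.
    """
    if len(first_feature) != len(second_feature):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(first_feature, second_feature))
    norm_first = math.sqrt(sum(a * a for a in first_feature))
    norm_second = math.sqrt(sum(b * b for b in second_feature))
    if norm_first == 0.0 or norm_second == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_first * norm_second)
```

Cosine similarity ignores the overall magnitude of the vectors, which is one reason it is often used for comparing learned embeddings; other measures, such as Euclidean distance, would fit the text equally well.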

In one example, when the second face image is an image used to register the face of a vehicle user, the embodiments of the present disclosure further provide a method for obtaining the face feature of the second face image. For example, before the first face feature is compared with the pre-stored second face feature, the method shown in FIG. 1 further includes:
acquiring a second face image with a second camera;
extracting a second face feature from the second face image; and
storing the second face feature of the second face image.

This embodiment does not limit the method of extracting the facial features of the second facial image. For example, the second facial feature may be obtained by extracting the facial features of the second facial image with a pre-trained neural network. The facial features may also be extracted with the local binary patterns (LBP) method, or with a SIFT feature extraction method, a facial feature extraction method based on geometric features, a facial feature extraction method based on template matching, or the like. This embodiment does not restrict how the facial features of the second facial image are extracted. In this embodiment, storing the second facial feature in the electronic device provides support for subsequent facial recognition.

ステップ１０４では、参照類似度に基づいて第１の顔特徴と第２の顔特徴とが同じ人に対応するか否かを決定する。 In step 104, it is determined whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity.

In the embodiments of the present disclosure, whether the first facial feature and the second facial feature correspond to the same person may be determined, for example, by comparing the reference similarity with a similarity threshold (which can be understood as a static similarity threshold). If the reference similarity is greater than or equal to the similarity threshold, it can be determined that the first facial feature and the second facial feature correspond to the same person. If the reference similarity is less than the similarity threshold, it can be determined that they correspond to different people.

In one example, the embodiments of the present disclosure further provide a method for determining, with a dynamic similarity threshold, whether the first facial feature and the second facial feature correspond to the same person. For example, determining whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity includes:
determining whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity, a reference false alarm rate, and a similarity threshold, where different false alarm rates correspond to different similarity thresholds.

本実施例では、異なる誤報率は、異なる類似度閾値に対応し、つまり、誤報率と類似度閾値との間に対応関係がある。異なる誤報率が具体的に対応する類似度閾値がどのぐらいであるかについて、本開示の実施例は限定しない。例えば、該誤報率と類似度閾値との間の対応関係は、ユーザにより設定されてよく、或いは電子機器により自主的に設定されてよく、本開示の実施例は限定しない。なお、本開示の実施例における参照誤報率は、電子機器により決定された誤報率であり、例えば電子機器が誤報率と類似度閾値との間の対応関係から決定した１つの誤報率である。 In this example, different false alarm rates correspond to different similarity thresholds, that is, there is a correspondence between false alarm rates and similarity thresholds. The embodiments of the present disclosure do not limit the similarity thresholds to which different false alarm rates specifically correspond. For example, the correspondence between the false alarm rate and the similarity threshold may be set by the user or may be set independently by the electronic device, and the embodiments of the present disclosure are not limited thereto. Note that the reference false alarm rate in the embodiment of the present disclosure is a false alarm rate determined by the electronic device, and is, for example, one false alarm rate determined by the electronic device from the correspondence between the false alarm rate and the similarity threshold.

For example, the relationship between the false alarm rate and the similarity threshold may be as follows: when the false alarm rate is 1 in 10,000, the similarity threshold is 0.7; when the false alarm rate is 1 in 100,000, the similarity threshold may be 0.8; when the false alarm rate is 1 in 1,000,000, the similarity threshold may be 0.9; and when the false alarm rate is 1 in 10,000,000, the similarity threshold may be 0.98. Thus, after determining the reference false alarm rate, the electronic device can determine the similarity threshold from that reference false alarm rate, and can then determine, from the obtained reference similarity and the determined similarity threshold, whether the first facial feature and the second facial feature correspond to the same person. The embodiments of the present disclosure do not limit how the electronic device determines the reference false alarm rate; for example, the electronic device may determine it from a determination instruction input by the user, or by other methods.
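The dynamic-threshold decision can be sketched as a table lookup followed by a comparison. The table below copies the example values from the paragraph above; the names `FAR_TO_THRESHOLD`, `threshold_for`, and `is_same_person` are hypothetical, and the "greater than or equal to" rule is borrowed from the static-threshold description rather than stated for this case.

```python
# Example correspondence from the text: false alarm rate -> similarity threshold.
FAR_TO_THRESHOLD = {
    1e-4: 0.7,   # 1 in 10,000
    1e-5: 0.8,   # 1 in 100,000
    1e-6: 0.9,   # 1 in 1,000,000
    1e-7: 0.98,  # 1 in 10,000,000
}


def threshold_for(reference_far: float) -> float:
    """Look up the similarity threshold that corresponds to the reference false alarm rate."""
    try:
        return FAR_TO_THRESHOLD[reference_far]
    except KeyError:
        raise ValueError(f"no threshold configured for false alarm rate {reference_far}")


def is_same_person(reference_similarity: float, reference_far: float) -> bool:
    """Same person if the similarity reaches the threshold chosen by the false alarm rate."""
    return reference_similarity >= threshold_for(reference_far)
```

With this table, a reference similarity of 0.85 is accepted at a false alarm rate of 1 in 100,000 but rejected at 1 in 1,000,000, which is the behavior a stricter false alarm rate is meant to produce.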

In implementing this embodiment, different similarity thresholds are obtained for different false alarm rates and used to determine whether the first facial feature and the second facial feature correspond to the same person. This avoids authenticating faces with a fixed similarity threshold, so the similarity used to judge the relationship between two facial images can be determined dynamically, which improves the accuracy of facial recognition.

In one example, the embodiments of the present disclosure further provide a method for determining whether the first facial feature and the second facial feature correspond to the same person. For example, determining whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity includes:
determining a normalized reference similarity based on the reference similarity and threshold information; and
determining whether the first facial feature and the second facial feature correspond to the same person based on the normalized reference similarity.

In this embodiment, the threshold information is obtained based on the similarities of positive sample pairs, the similarities of negative sample pairs, and different preset false alarm rates. The positive and negative sample pairs are obtained from images of a first type and images of a second type. Each positive sample pair contains two images whose faces correspond to the same person, and each negative sample pair contains two images whose faces correspond to different people. The similarities of the positive sample pairs and of the negative sample pairs are determined by a pre-trained neural network. In one example, the threshold information may include a first threshold and a second threshold, so that the electronic device can determine the normalized reference similarity based on the reference similarity, the first threshold, and the second threshold. The reference similarity lies between the first threshold and the second threshold, and the first and second thresholds are the thresholds in the threshold information closest to the reference similarity. The first type of image and the second type of image are acquired by different types of cameras.
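The text does not say how the thresholds are computed from the sample-pair similarities. One plausible reading, sketched here as an assumption, is that each preset false alarm rate is turned into a threshold by ranking the negative-pair similarities, and that the positive-pair similarities are used to check how many genuine pairs still pass. The names `thresholds_from_pairs` and `pass_rate` are hypothetical, and acceptance is taken to mean "strictly greater than the threshold".

```python
from typing import Dict, Sequence


def thresholds_from_pairs(negative_scores: Sequence[float],
                          target_fars: Sequence[float]) -> Dict[float, float]:
    """For each false alarm rate, pick a threshold that at most that fraction
    of negative pairs exceeds (assumed procedure, not from the source)."""
    if not negative_scores:
        raise ValueError("at least one negative-pair similarity is required")
    ranked = sorted(negative_scores, reverse=True)  # highest similarity first
    thresholds = {}
    for far in target_fars:
        # Number of negative pairs that are allowed to pass at this rate.
        allowed = min(int(far * len(ranked)), len(ranked) - 1)
        # Only the `allowed` highest scores (at most) lie strictly above this value.
        thresholds[far] = ranked[allowed]
    return thresholds


def pass_rate(positive_scores: Sequence[float], threshold: float) -> float:
    """Fraction of positive (same-person) pairs accepted at the given threshold."""
    return sum(score > threshold for score in positive_scores) / len(positive_scores)
```

In practice the negative set would need far more pairs than the reciprocal of the smallest false alarm rate for the resulting thresholds to be meaningful.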

In this embodiment, the first threshold and the second threshold may be determined from the threshold information according to the reference similarity, so that the electronic device can determine the normalized reference similarity based on the first threshold and the second threshold. The normalized reference similarity is the final similarity used to determine whether the first facial feature and the second facial feature correspond to the same person. For example, when the value of the reference similarity lies between T(n-1) (for example, the first threshold) and T(n) (for example, the second threshold), the normalized reference similarity may be determined as 0.3 + (n-1)/10 + 0.1 * (reference similarity - T(n-1)) * (T(n) - T(n-1)). The above is only an example of a normalization method and should not be understood as limiting this embodiment.
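The example normalization can be transcribed directly. The sketch below follows the formula exactly as the text states it, including the product with (T(n) - T(n-1)). It assumes, beyond the source, that the threshold information is an ascending list T(0), T(1), ... and that n is the index of the first threshold above the reference similarity. The function name `normalized_similarity` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def normalized_similarity(reference_similarity: float,
                          thresholds: Sequence[float]) -> float:
    """Apply the example formula from the text to an ascending threshold list."""
    # n satisfies thresholds[n - 1] <= reference_similarity < thresholds[n].
    n = bisect_right(thresholds, reference_similarity)
    if n == 0 or n == len(thresholds):
        raise ValueError("reference similarity lies outside the threshold table")
    lower, upper = thresholds[n - 1], thresholds[n]
    # Formula as stated: 0.3 + (n-1)/10 + 0.1 * (s - T(n-1)) * (T(n) - T(n-1)).
    return 0.3 + (n - 1) / 10 + 0.1 * (reference_similarity - lower) * (upper - lower)
```

For the hypothetical table [0.5, 0.7, 0.8, 0.9, 0.98], a reference similarity of 0.75 falls between T(1) = 0.7 and T(2) = 0.8, so n = 2 and the result is 0.3 + 0.1 + 0.1 * 0.05 * 0.1 = 0.4005.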

After determining the normalized reference similarity, the electronic device in this embodiment may use a fixed similarity threshold to determine whether the first facial feature and the second facial feature correspond to the same person. In one example, the electronic device may also use a dynamic similarity threshold (that is, a different similarity threshold obtained for each false alarm rate) to make that determination.

Implementing this embodiment can further improve the accuracy of the similarity used to determine whether the first facial feature and the second facial feature correspond to the same person, and thereby improve the accuracy of facial authentication.

In one example, the identity-based library may contain multiple facial images; that is, the pre-stored second facial features may correspond to multiple individuals. In that case there are at least two second facial images and at least two reference similarities. The embodiments of the present disclosure therefore further provide a facial recognition method in which determining whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity, the reference false alarm rate, and the similarity threshold includes:
determining the similarity threshold based on the reference false alarm rate, and determining, from the at least two reference similarities, the second facial feature that is most similar to the first facial feature; and
when the reference similarity between that most similar second facial feature and the first facial feature is greater than the similarity threshold, determining that the most similar second facial feature and the first facial feature correspond to the same person.

In this embodiment, the electronic device may obtain the reference similarity between the first facial feature and each of the at least two second facial features, determine from the at least two reference similarities the second facial feature most similar to the first facial feature, and then check whether the reference similarity between that second facial feature and the first facial feature is greater than the similarity threshold (obtained from the false alarm rate). If it is greater, the most similar second facial feature and the first facial feature correspond to the same person.

Furthermore, the electronic device also obtains at least two normalized reference similarities from the reference similarities between the first facial feature and the at least two second facial features. Therefore, after determining the normalized reference similarities, the electronic device may determine the second facial feature most similar to the first facial feature based on the normalized reference similarities. If the normalized reference similarity between that second facial feature and the first facial feature is greater than the similarity threshold (which may be obtained from the false alarm rate), it can be determined that the faces in the two features belong to the same face, that is, correspond to the same person.
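The one-to-many case described above reduces to picking the best-scoring registered feature and then applying the threshold. A minimal sketch, assuming the reference similarities (raw or normalized) have already been computed and are keyed by a registered user identifier; the name `identify` and the string identifiers are hypothetical.

```python
from typing import Dict, Optional


def identify(reference_similarities: Dict[str, float],
             similarity_threshold: float) -> Optional[str]:
    """Return the registered user whose second facial feature is most similar to the
    first facial feature, or None if even the best match does not exceed the threshold."""
    if not reference_similarities:
        return None
    # The second facial feature with the highest reference similarity.
    best_user = max(reference_similarities, key=reference_similarities.get)
    if reference_similarities[best_user] > similarity_threshold:
        return best_user
    return None
```

Only the best match is compared with the threshold, so a query that resembles several registered users is attributed to at most one of them.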

In the embodiments of the present disclosure, the electronic device needs to compare whether the facial image at registration and the facial image at recognition correspond to the same person. The embodiments therefore do not limit whether the electronic device that acquires the facial image at registration and the electronic device that acquires the facial image at recognition are of the same type (or are the same device).

一例として、以下に例を挙げて本開示の実施例が適用するシーンを説明する。 As an example, a scene to which the embodiment of the present disclosure is applied will be described below.

シーン１ scene 1

(1) Registration process: the user may complete facial registration with a mobile phone, that is, carry out the registration flow with a facial image captured by the camera installed on the mobile phone. Because a mobile phone is generally equipped with a visible light camera (for example, an RGB camera), the registered facial image is the second facial image captured by the RGB camera. The registered facial image is then sent to the in-vehicle device by the mobile phone, or by a server via the mobile phone, and the in-vehicle device stores the registered facial image, that is, the second facial image. Alternatively, the user may capture the second facial image with the visible light camera installed on the mobile phone and send it to the server or the in-vehicle device, so that registration of the second facial image is completed on the server or the in-vehicle device; after registration is completed, the second facial feature extracted from the second facial image may be saved.

(2) Recognition process: the user collects the facial image to be recognized (that is, the first facial image) with the in-vehicle device. The in-vehicle device may be equipped with an infrared camera, so that an image captured by the infrared camera, for example an IR facial image, serves as the image to be recognized, that is, the first facial image. The in-vehicle device extracts the facial features of the IR image and matches them against the facial features of the registered RGB image to determine whether the user to be recognized and the registered user are the same person.

なお、本開示の実施例に示されるシーン１は、運転者状態監視システム、乗車管理システム、自動車レンタル管理システム及びオンライン配車管理システムなどに適用できる。例えば、運転者状態監視システムに対して、該システムは、一般的には、顔認識、運転者動作検出、運転者疲労検出及び運転者注意力監視などのいくつかのモジュールを含む。したがって、上記システムは、顔認識の段階的ではシーン１に示された方法により顔認識を行うことにより、運転者の身分又は車両を賃貸する人の身分などを決定することができる。 Note that scene 1 shown in the embodiment of the present disclosure can be applied to a driver condition monitoring system, a ride management system, an automobile rental management system, an online vehicle dispatch management system, and the like. For example, for a driver condition monitoring system, the system generally includes several modules such as facial recognition, driver motion detection, driver fatigue detection, and driver attention monitoring. Therefore, the above system can determine the identity of the driver, the identity of the person renting the vehicle, etc. by performing facial recognition using the method shown in scene 1 in the facial recognition step.

シーン２ scene 2

本開示の実施例に係る顔認識方法は、小区域入出管理システム及びセキュリティ設定制御システムなどにさらに適用でき、例えば小区域入出管理システムを例とする。 The face recognition method according to the embodiment of the present disclosure can be further applied to a small area access management system, a security setting control system, etc., for example, a small area access management system is taken as an example.

(1) Registration process: the user (a resident of a given small area) may perform facial registration with a mobile phone, that is, acquire the user's facial image, namely the second facial image, with the visible light camera (for example, an RGB camera) installed on the mobile phone. The registered facial image is then sent to the access control device by the mobile phone, or by a server via the mobile phone, and the access control device stores the second facial image. Alternatively, the user may capture the second facial image with the visible light camera installed on the mobile phone and send it to the access control device, so that the access control device completes registration of the second facial image; after registration is completed, the second facial feature extracted from the second facial image may be saved.

(2) Recognition process: when the user needs to enter or leave the small area, the access control device may acquire the user's facial image, namely the first facial image, with an infrared camera (for example, an IR camera). The access control device then extracts the facial features of the first facial image to obtain the first facial feature, and compares the facial features of the IR image (the first facial image captured by the IR camera) with the facial features of the registered RGB image captured by the RGB camera to determine whether the user attempting to enter or leave the small area and the registered user are the same person.

The above are only some of the scenes illustrated in the embodiments of the present disclosure. In concrete implementations, the method according to the embodiments of the present disclosure can be applied to many more scenes, for example terminal unlocking such as mobile phone unlocking, and bank identity authentication systems; the embodiments of the present disclosure are not limited in this respect. For mobile phone unlocking, the user may perform facial registration with the mobile phone, and each time the mobile phone is subsequently used, it can execute the method according to the embodiments of the present disclosure to recognize the facial image. The mobile phone used at registration and the one used at recognition may be different phones; that is, their cameras may be of different types. For a bank identity authentication system, which is a financial identity authentication system, the user may perform facial registration at a bank terminal when opening a bank account; in subsequent banking transactions, other bank terminals can execute the method according to the embodiments of the present disclosure to recognize the facial image and so ensure the security of the user's banking. The cameras of the bank terminal used at registration and of the bank terminal used at recognition may be of different types.

したがって、本開示の実施例において示されたシーンを本開示の実施例を限定するものと理解すべきではない。 Therefore, the scenes shown in the embodiments of the present disclosure should not be understood as limiting the embodiments of the present disclosure.

In the course of implementing the embodiments of the present disclosure, the applicant further discovered the following. When the same person is photographed by an RGB camera and by an IR camera to obtain two facial images, and those images are used to form three comparison combinations (RGB images, IR images, and a mix of RGB and infrared images), the similarities obtained through the same neural network may differ.

Likewise, when two different people are each photographed by an RGB camera and an IR camera to obtain four facial images, the four images can be combined in different ways: user 1 RGB image with user 2 RGB image, user 1 IR image with user 2 IR image, user 1 RGB image with user 2 IR image, and user 1 IR image with user 2 RGB image. The similarities obtained through the same neural network for these four comparison combinations may also differ.

これにより、本開示の実施例は、ニューラルネットワーク（ｎｅｕｒａｌｎｅｔｗｏｒｋ、ＮＮ）をトレーニングする技術手段を提供し、上記発生した問題を効果的に低減するか又は回避することができる。なお、本開示の実施例におけるニューラルネットワークは、ディープニューラルネットワーク（ｄｅｅｐｎｅｕｒａｌｎｅｔｗｏｒｋ、ＤＮＮ）、畳み込みニューラルネットワークなどを含んでよく、本開示の実施例は、該ニューラルネットワークの具体的な形態を限定しない。 Accordingly, embodiments of the present disclosure provide a technical means for training a neural network (NN), which can effectively reduce or avoid the problems encountered above. Note that the neural network in the embodiments of the present disclosure may include a deep neural network (DNN), a convolutional neural network, etc., and the embodiments of the present disclosure do not limit the specific form of the neural network. .

以下、本開示の実施例に係るニューラルネットワークをトレーニングする技術手段を詳細に説明する。 Hereinafter, technical means for training a neural network according to an embodiment of the present disclosure will be described in detail.

Referring to FIG. 2, FIG. 2 is a flowchart of a neural network training method according to an embodiment of the present disclosure. The training method can be applied to a neural network training apparatus and to an electronic device. The electronic device may include a server or a terminal device, and the terminal device may include a mobile phone, a tablet computer, a desktop computer, a palm-sized personal computer, an in-vehicle device, an in-vehicle robot, and the like; the embodiments of the present disclosure do not restrict the specific form of the electronic device. The training method can also be applied to a facial recognition apparatus. That is, the method shown in FIG. 2 and the method shown in FIG. 1 may be executed by the same type of electronic device or by different types of electronic devices. Execution by the same type of electronic device means, for example, that the method shown in FIG. 1 is executed by a terminal device and the method shown in FIG. 2 is also executed by a terminal device. Alternatively, the method shown in FIG. 2 may be executed by the same device as the method shown in FIG. 1; the embodiments of the present disclosure are not limited in this respect.

Hereinafter, the training images used in the neural network training process are called image samples. An image sample includes labeling information, which includes, but is not limited to, at least one of the face ID in the image (which may be understood as the labeling information of the face) and the type of the image. Face IDs corresponding to the same person are the same, and the image type characterizes the type of camera with which the image was collected. As shown in FIG. 2, the neural network training method includes the following steps 201 to 202.

ステップ２０１では、異なるタイプのカメラによって撮影され、かつ顔が含まれる第１のタイプの画像サンプル及び第２のタイプの画像サンプルを取得する。 In step 201, a first type of image sample and a second type of image sample that are captured by different types of cameras and include a face are obtained.

本開示の実施例では、第１のタイプの画像サンプルは、少なくとも顔の画像を含み、第２のタイプの画像サンプルは、少なくとも顔の画像を含み、かつ該第１のタイプの画像サンプルと該第２のタイプの画像サンプルは、異なるタイプのカメラにより取得される。例えば、第１のタイプの画像サンプルがＲＧＢカメラによって取得されると、第２のタイプの画像サンプルは、他のタイプのカメラ、例えばＩＲカメラによって取得されてよい。なお、異なるタイプのカメラの具体的な実施形態について、図１に示される形態を参照することができ、本明細書において繰り返して説明しない。 In embodiments of the present disclosure, the first type of image sample includes at least a facial image, and the second type of image sample includes at least a facial image, and the first type of image sample and the second type of image sample include at least a facial image. A second type of image sample is acquired by a different type of camera. For example, if a first type of image sample is acquired by an RGB camera, a second type of image sample may be acquired by another type of camera, such as an IR camera. It should be noted that for specific embodiments of different types of cameras, reference may be made to the embodiment shown in FIG. 1 and will not be repeatedly described herein.

The embodiments of the present disclosure do not limit the number of image samples of the first type or the number of image samples of the second type. These numbers may be chosen using, for example, the degree of training of the neural network as the criterion.

ステップ２０２では、第１のタイプの画像サンプル及び第２のタイプの画像サンプルに基づいてニューラルネットワークをトレーニングする。 Step 202 trains a neural network based on the first type of image samples and the second type of image samples.

本開示の実施例では、電子機器は、２種類の異なるタイプのカメラによって撮影された画像サンプルを用いてニューラルネットワークをトレーニングしてよく、これによりニューラルネットワークは、トレーニングプロセスにおいて２種類の異なるタイプの画像に対する特徴抽出能力を学習することができる。ニューラルネットワークのトレーニングが完了した後、トレーニングされたニューラルネットワークに基づいて、この２種類の異なる画像のうちの任意の１種類の画像に対して特徴抽出を行い、かつ特徴抽出の精度を保証することができ、これによりこの２種類のカメラに基づいて顔登録及び認識を行う適用需要を満たす。つまり、ニューラルネットワークをトレーニングする時に、単純に１つのタイプの画像サンプルを用いてトレーニングせず、複数のタイプの画像サンプルを用いてトレーニングすることにより、トレーニングされたニューラルネットワークは、異なるタイプの画像サンプルの特徴を効果的に取得することができる。 In embodiments of the present disclosure, the electronic device may train the neural network using image samples taken by two different types of cameras, such that the neural network uses two different types of images in the training process. It is possible to learn feature extraction ability for images. After the training of the neural network is completed, perform feature extraction on any one of these two different types of images based on the trained neural network, and ensure the accuracy of feature extraction. This satisfies the application needs of face registration and recognition based on these two types of cameras. In other words, when training a neural network, instead of simply training with one type of image samples, by training with multiple types of image samples, the trained neural network can be trained with different types of image samples. characteristics can be effectively acquired.

In one example, taking supervised training as an example, the neural network may be trained as follows. The first type of image samples and the second type of image samples are input into the neural network; after processing them, the neural network outputs its prediction results for the first type of image samples and for the second type of image samples (that is, the face prediction result of each image sample). A predetermined loss function is then used to compute the loss between the face prediction result of each image sample and its labeling information (that is, the true face labeling information); in other words, the loss between the prediction results and labeling information of the first type of image samples and the loss between the prediction results and labeling information of the second type of image samples are computed. The loss is back-propagated through the neural network, and the neural network adjusts the values of network parameters such as convolution kernels and weights based on the back-propagated loss. In general, after the adjustment, further image samples of the first type and of the second type may be input and the above training process repeated until a predetermined training completion condition is met (for example, the loss is smaller than a predetermined threshold, or the number of training iterations exceeds a set number). Note that the above is only one training process according to embodiments of the present disclosure; a specific implementation may include other methods, and embodiments of the present disclosure are not limited in this respect.
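The supervised loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a single linear softmax layer in place of the neural network, cross-entropy as the "predetermined loss function", and toy feature vectors standing in for RGB and IR face images. The names `train`, `predict`, `rgb_samples` and `ir_samples` are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Face-ID probabilities for one sample (the 'face prediction result')."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def train(samples, num_ids, dim, lr=0.5, loss_threshold=0.05, max_iters=500):
    """samples: list of (feature_vector, face_id) drawn from both camera types."""
    weights = [[0.0] * dim for _ in range(num_ids)]
    mean_loss, iteration = float("inf"), 0
    for iteration in range(1, max_iters + 1):
        total = 0.0
        for x, label in samples:
            probs = predict(weights, x)
            total += -math.log(probs[label])        # cross-entropy vs. the label
            for k in range(num_ids):                # gradient step on each weight
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
        mean_loss = total / len(samples)
        if mean_loss < loss_threshold:              # training completion condition
            break
    return weights, mean_loss, iteration

# Toy data (assumed): two identities, each seen by an "RGB" and an "IR" camera.
rgb_samples = [([1.0, 0.0, 0.2], 0), ([0.0, 1.0, 0.1], 1)]
ir_samples = [([0.9, 0.1, 0.8], 0), ([0.1, 0.9, 0.9], 1)]
weights, final_loss, iterations = train(rgb_samples + ir_samples, num_ids=2, dim=3)
```

The loop stops either when the mean loss falls below `loss_threshold` or when `max_iters` is reached, mirroring the two example completion conditions in the text.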

一例では、第１のタイプの画像サンプル及び第２のタイプの画像サンプルに基づいてニューラルネットワークをトレーニングするステップは、
第１のタイプの画像サンプルと第２のタイプの画像サンプルとをペアリングして第１のタイプの画像サンプルと第２のタイプの画像サンプルの混合タイプの画像サンプルを取得するステップと、
第１のタイプの画像サンプル、第２のタイプの画像サンプル及び混合タイプの画像サンプルに基づいてニューラルネットワークをトレーニングするステップと、を含む。 In one example, training the neural network based on the first type of image samples and the second type of image samples includes:
pairing the first type of image sample and the second type of image sample to obtain a mixed type of image sample of the first type of image sample and the second type of image sample;
training a neural network based on the first type of image samples, the second type of image samples and the mixed type of image samples.

In this embodiment, the first type of image samples and the second type of image samples are paired; that is, each pair in the mixed type of image samples includes an image sample of the first type and an image sample of the second type, thereby forming training image samples of two different types. By training the neural network with the first type of image samples, the second type of image samples and the mixed type of image samples, the neural network not only learns to extract features from each single type of image but also better learns, jointly, to extract features from the two different types of images. This improves the feature extraction accuracy of the neural network, so that the trained neural network can be effectively applied to the face recognition method according to embodiments of the present disclosure.

In one example, training the neural network based on the first type of image samples, the second type of image samples and the mixed type of image samples includes:
obtaining, by the neural network, a face prediction result for the first type of image samples, a face prediction result for the second type of image samples and a face prediction result for the mixed type of image samples; and
training the neural network based on the difference between the face prediction result and the face labeling result of the first type of image samples, the difference between the face prediction result and the face labeling result of the second type of image samples, and the difference between the face prediction result and the face labeling result of the mixed type of image samples.

In this embodiment, the electronic device may use the neural network to obtain the face prediction result of the first type of image samples, the face prediction result of the second type of image samples and the face prediction result of the mixed type of image samples, respectively. The neural network is then trained based on the difference between the face prediction result and the face labeling result of the first type of image samples, the difference between the face prediction result and the face labeling result of the second type of image samples, and the difference between the face prediction result and the face labeling result of the mixed type of image samples. For example, the neural network may be trained based on the loss between the face prediction result and the face labeling result of the first type of image samples, based on the corresponding loss for the second type of image samples, and based on the corresponding loss for the mixed type of image samples. For the specific training method, reference may be made to the description of the foregoing embodiments, which is not detailed again here.

In one example, to further improve the accuracy with which the neural network extracts facial features, embodiments of the present disclosure further provide a training method in which, for example, the neural network includes a first classifier, a second classifier and a mixed classifier, and obtaining, by the neural network, the face prediction results of the first type of image samples, the second type of image samples and the mixed type of image samples includes:
inputting features of the first type of image samples into the first classifier to obtain the face prediction result of the first type of image samples;
inputting features of the second type of image samples into the second classifier to obtain the face prediction result of the second type of image samples; and
inputting features of the mixed type of image samples into the mixed classifier to obtain the face prediction result of the mixed type of image samples.

In this embodiment, the classifiers may be used to classify image samples of different types; for example, a classifier yields the face prediction result of each image sample input to it, so that a loss can be determined from the classifier output and back-propagated through the neural network to train it. In one example, the first classifier outputs the face prediction result of the first type of image samples, the second classifier outputs the face prediction result of the second type of image samples, and the mixed classifier outputs the face prediction result of the mixed type of image samples. Outputting each face prediction result with the classifier that corresponds to the type of the image sample effectively improves the accuracy of the face prediction results. This in turn improves the accuracy of training a neural network that supports feature extraction from mixed types of images, and improves the accuracy and robustness with which the neural network extracts facial features from different types of images.

A specific implementation of training the neural network based on the first type of image samples, the second type of image samples and the mixed type of image samples may be as shown in FIG. 3, which is a schematic diagram of a training process according to an embodiment of the present disclosure. As an example, the first type of image sample is an RGB image sample, and every image sample in the RGB image sample library is an RGB image sample; the second type of image sample is an IR image sample, and every image sample in the IR image sample library is an IR image sample; and the mixed type of image sample is an RGB&IR image sample, where the mixed type image sample library contains some RGB image samples and some IR image samples, which are denoted as RGB&IR image samples. Likewise, as an example, the first classifier is an RGB classifier, the second classifier is an IR classifier, and the mixed classifier is an RGB&IR classifier. The RGB classifier classifies the RGB image samples in the RGB image sample library, and the classification result of an RGB image sample indicates the probability that the face in that sample belongs to each face ID category in the RGB image sample library. The IR classifier classifies the IR image samples in the IR image sample library, and the classification result of an IR image sample indicates the probability that the face in that sample belongs to each face ID category in the IR image sample library. The RGB&IR classifier classifies the RGB image samples and IR image samples in the mixed type image sample library, and the classification result of an RGB&IR image sample indicates the probability that the face in that sample belongs to each face ID category in the mixed type image sample library. The "&" in "RGB&IR image sample" indicates that an image sample input to the RGB&IR classifier (the mixed classifier) may be an RGB type image sample or an IR type image sample. Therefore, the "&" in the present disclosure should not be understood as limiting the present disclosure.

As shown in FIG. 3, after the RGB image samples, the IR image samples and the RGB&IR image samples are each input into the neural network, the neural network may use a feature extractor to output the features of the RGB image samples, the features of the IR image samples and the features of the RGB&IR image samples. These features are then input into the RGB classifier, the IR classifier and the RGB&IR classifier, respectively. The neural network thus outputs the face prediction result of each image sample through the classifiers; comparing the face prediction result of each image sample with its face labeling result yields the loss between them, and this loss is back-propagated through the entire neural network to train it. In this embodiment, classifiers are added to the neural network to assist in training the entire network, so that the trained neural network can accurately and effectively extract the features of different types of images, thereby improving the accuracy and efficiency of face recognition.
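The FIG. 3 arrangement (one shared feature extractor feeding three type-specific classifiers) can be sketched as below. This is an illustration under stated assumptions: the feature extractor is an identity stand-in, each classifier is a plain linear layer, cross-entropy is used as the loss, and the three losses are combined by an unweighted sum, none of which is specified by the source. All names are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the labeled face ID under a softmax."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def extract_features(image):
    # Stand-in for the shared feature extractor (assumption: identity mapping).
    return image

def total_loss(batches, heads):
    """batches and heads are dicts keyed by 'rgb', 'ir' and 'rgb&ir'."""
    loss = 0.0
    for kind, batch in batches.items():
        for image, face_id in batch:
            features = extract_features(image)        # shared across all types
            logits = linear(heads[kind], features)    # type-specific classifier
            loss += cross_entropy(logits, face_id)
    return loss

# Toy data (assumed): the mixed batch holds both RGB and IR samples.
batches = {
    "rgb": [([1.0, 0.0], 0)],
    "ir": [([0.0, 1.0], 1)],
    "rgb&ir": [([1.0, 0.0], 0), ([0.0, 1.0], 1)],
}
untrained_heads = {k: [[0.0, 0.0], [0.0, 0.0]] for k in batches}
better_heads = {k: [[1.0, 0.0], [0.0, 1.0]] for k in batches}
loss_before = total_loss(batches, untrained_heads)
loss_after = total_loss(batches, better_heads)
```

Because every sample passes through the same `extract_features`, gradients from all three classifier losses would update the shared extractor in a real network, which is the mechanism the paragraph above relies on.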

Note that after training of the entire neural network with the first classifier, the second classifier and the mixed classifier is complete, the method shown in FIG. 2 further includes:
removing the first classifier, the second classifier and the mixed classifier from the trained neural network to obtain a neural network for performing face recognition.

That is, the first classifier, the second classifier and the mixed classifier assist in training the neural network, but in a specific application, for example when face recognition is performed through the neural network using the method according to embodiments of the present disclosure, the neural network need not include the first classifier, the second classifier or the mixed classifier. Accordingly, after training is complete, the neural network training apparatus can remove the first classifier, the second classifier and the mixed classifier from the neural network.
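Removing the classifiers after training can be pictured with the following sketch. It assumes the training-time network is simply a feature extractor plus a dictionary of classifier heads; the class and function names are hypothetical and do not come from the source.

```python
class TrainingNetwork:
    """Training-time network: shared feature extractor plus classifier heads."""

    def __init__(self, feature_extractor, classifiers):
        self.feature_extractor = feature_extractor
        self.classifiers = classifiers  # e.g. {"rgb": ..., "ir": ..., "rgb&ir": ...}

def to_recognition_network(trained):
    """Keep only the feature extractor; the classifier heads are discarded."""
    return trained.feature_extractor

# Toy stand-ins (assumed): a doubling "extractor" and placeholder heads.
trained = TrainingNetwork(
    feature_extractor=lambda image: [2 * v for v in image],
    classifiers={"rgb": object(), "ir": object(), "rgb&ir": object()},
)
recognizer = to_recognition_network(trained)
features = recognizer([1, 2])
```

At inference time only features are compared, so the face ID classifiers used during training are not needed.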

In embodiments of the present disclosure, training the neural network with images captured by different types of cameras effectively improves the efficiency with which the neural network outputs features, and reduces or avoids the differences in similarity that arise when different types of images have their features extracted by the same neural network.

In one example, in order to obtain the threshold information used to determine the normalized reference similarity described with reference to FIG. 1, embodiments of the present disclosure further provide a method for obtaining the threshold information, as follows.

After the neural network is trained based on the first type of image samples, the second type of image samples and the mixed type of image samples, the method shown in FIG. 2 further includes:
acquiring images of a first type and images of a second type that are captured by different cameras;
obtaining, based on the images of the first type and the images of the second type, positive sample pairs and negative sample pairs, where each positive sample pair includes two images whose faces correspond to the same person, and each negative sample pair includes two images whose faces correspond to different people;
determining the similarity of the positive sample pairs and the similarity of the negative sample pairs through the trained neural network; and
determining threshold information, which includes a first threshold and a second threshold, based on the similarity of the positive sample pairs, the similarity of the negative sample pairs and different preset false alarm rates.

In this embodiment, the images of the first type include at least a face image, the images of the second type include at least a face image, and the images of the first type and the images of the second type are acquired by different types of cameras. For example, RGB images of a plurality of people may be acquired by an RGB camera and IR images of the same people may be acquired by an IR camera; for instance, there are N people, and each person has M RGB images and M IR images. M and N are both integers greater than or equal to 2.

一例では、第１のタイプの画像及び第２のタイプの画像に基づいてポジティブサンプルペアを取得するステップは、
第１のタイプの画像をペアリングして第１のタイプの画像のポジティブサンプルペアを取得するステップと、
第２のタイプの画像をペアリングして第２のタイプの画像のポジティブサンプルペアを取得するステップと、
第１のタイプの画像と第２のタイプの画像とをペアリングして混合画像のポジティブサンプルペアを取得するステップと、を含む。 In one example, obtaining positive sample pairs based on the first type of image and the second type of image includes:
pairing the first type of images to obtain a positive sample pair of the first type of images;
pairing the second type of images to obtain a positive sample pair of the second type of images;
pairing the first type of image and the second type of image to obtain a positive sample pair of mixed images.

In this embodiment, taking N people each with M RGB images and M IR images as an example, pairing the M RGB images of each person yields M * (M - 1) / 2 RGB positive sample pairs, pairing the M IR images of each person yields M * (M - 1) / 2 IR positive sample pairs, and pairing the M RGB images of each person with that person's M IR images yields M * M RGB&IR positive sample pairs.

ネガティブサンプルペアの場合、各人の画像と他の異なる人の画像とをペアリングし、ネガティブサンプルペアを構成することができる。例えば、第１のユーザ及び第２のユーザを例とすると、該ネガティブサンプルペアは、第１のユーザのＩＲ画像と第２のユーザのＩＲ画像、第１のユーザのＲＧＢ画像と第２のユーザのＲＧＢ画像、及び第１のユーザのＩＲ画像と第２のユーザのＲＧＢ画像を含んでよい。 For negative sample pairs, images of each person can be paired with images of other different people to form negative sample pairs. For example, taking a first user and a second user as an example, the negative sample pairs include an IR image of the first user and an IR image of the second user, and an RGB image of the first user and the second user. , an IR image of the first user, and an RGB image of the second user.
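The pair construction described above can be sketched with `itertools`. The image identifiers and the `build_pairs` helper are hypothetical; for negative pairs the sketch builds only the three combinations named in the text (IR with IR, RGB with RGB, and IR of one person with RGB of the other), which the text presents as examples rather than an exhaustive list.

```python
from itertools import combinations, product

def build_pairs(images):
    """images: {person_id: {"rgb": [...], "ir": [...]}} -> (positives, negatives)."""
    positives, negatives = [], []
    for person in images.values():
        positives += list(combinations(person["rgb"], 2))        # M*(M-1)/2 RGB pairs
        positives += list(combinations(person["ir"], 2))         # M*(M-1)/2 IR pairs
        positives += list(product(person["rgb"], person["ir"]))  # M*M RGB&IR pairs
    for a, b in combinations(images, 2):                         # two different people
        first, second = images[a], images[b]
        negatives += list(product(first["ir"], second["ir"]))
        negatives += list(product(first["rgb"], second["rgb"]))
        negatives += list(product(first["ir"], second["rgb"]))
    return positives, negatives

# Toy data (assumed): N = 2 people, M = 3 images per camera type.
N, M = 2, 3
images = {
    name: {
        "rgb": [f"{name}_rgb{i}" for i in range(M)],
        "ir": [f"{name}_ir{i}" for i in range(M)],
    }
    for name in ("person_a", "person_b")
}
positives, negatives = build_pairs(images)
```

With N = 2 and M = 3, each person contributes 3 + 3 + 9 positive pairs, matching the M * (M - 1) / 2 and M * M counts given in the text.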

ポジティブサンプルペア及びネガティブサンプルペアを取得した後に、図２に示されるトレーニングされたニューラルネットワーク上でテストすることができ、例えば、トレーニングされたニューラルネットワークによりポジティブサンプルペアの顔特徴及びネガティブサンプルペアの顔特徴を出力し、該ポジティブサンプルペアの顔特徴に基づいて該ポジティブサンプルペアの類似度を取得し、そして該ネガティブサンプルペアの顔特徴に基づいて該ネガティブサンプルペアの類似度を取得する。それにより異なる誤報率での通過率及び対応する閾値を取得する。具体的には、例えば、ポジティブサンプルペアの類似度及びネガティブサンプルペアの類似度を取得した場合、目標誤報率に基づいて、各サンプルペアの類似度から類似度が最も低いサンプルペアを見つけ、かつ該類似度が最も低いサンプルペアが誤報のサンプルペアに属し、これにより該類似度が最も低いサンプルペアに対応する類似度を該目標誤報率に対応する閾値とすることができる。なお、目標誤報率は、予め設定されたか又は選択された１つの誤報率であると理解することができ、本開示の実施例は、該目標誤報率の具体的な値を限定しない。 After obtaining the positive sample pair and the negative sample pair, they can be tested on the trained neural network shown in Figure 2, for example, the facial features of the positive sample pair and the face of the negative sample pair can be tested by the trained neural network. outputting features, obtaining the similarity of the positive sample pair based on the facial features of the positive sample pair, and obtaining the similarity of the negative sample pair based on the facial features of the negative sample pair. Thereby, the passing rate and the corresponding threshold value at different false alarm rates are obtained. Specifically, for example, when the similarity of positive sample pairs and the similarity of negative sample pairs are obtained, the sample pair with the lowest similarity is found from the similarity of each sample pair based on the target false alarm rate, and The sample pair with the lowest degree of similarity belongs to the sample pair of false alarms, so that the degree of similarity corresponding to the sample pair with the lowest degree of similarity can be set as the threshold value corresponding to the target false alarm rate. Note that the target false alarm rate can be understood as one preset or selected false alarm rate, and the embodiments of the present disclosure do not limit the specific value of the target false alarm rate.

表１に示すように、表１は、本開示の実施例に係る閾値情報である。 As shown in Table 1, Table 1 is threshold information according to an example of the present disclosure.

誤報率は、電子機器により自主的に設定されてよいなど、本開示の実施例は、該誤報率の具体的な値を限定しない。 The false alarm rate may be independently set by the electronic device, and the embodiments of the present disclosure do not limit the specific value of the false alarm rate.

For example, if the target false alarm rate is 0.00001 and there are 1,000,000 positive and negative sample pairs in total, the number of false-alarm sample pairs is 10. These 10 false-alarm sample pairs can be found among the positive and negative sample pairs, and the pair with the lowest similarity among them is then selected; the similarity of that pair is the threshold corresponding to the target false alarm rate. In Table 1, the threshold corresponding to the target false alarm rate of 0.00001 is T(5). Embodiments of the present disclosure do not limit how the sample pair with the lowest similarity is selected; for example, it may be obtained by ranking. The correspondences shown in Table 1 are only examples, and embodiments of the present disclosure do not limit the specific values.
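The threshold selection in this example can be sketched as follows. The sketch assumes that the false-alarm pairs are the negative pairs with the highest similarity scores (the source does not spell out how they are identified) and follows the text in counting false alarms against the total number of pairs. The function names and toy scores are hypothetical.

```python
def threshold_for_false_alarm_rate(pos_sims, neg_sims, target_far):
    """Return the similarity threshold corresponding to target_far."""
    total_pairs = len(pos_sims) + len(neg_sims)
    num_false_alarms = round(target_far * total_pairs)
    if not 1 <= num_false_alarms <= len(neg_sims):
        raise ValueError("target_far yields no usable false-alarm count")
    ranked = sorted(neg_sims, reverse=True)    # most similar negatives first
    false_alarms = ranked[:num_false_alarms]   # assumed to be the false alarms
    return min(false_alarms)                   # lowest similarity among them

def pass_rate(pos_sims, threshold):
    """Fraction of positive pairs accepted at the given threshold."""
    return sum(s >= threshold for s in pos_sims) / len(pos_sims)

# Toy similarity scores (assumed).
pos_sims = [0.9, 0.8, 0.95, 0.85, 0.7]
neg_sims = [0.1, 0.6, 0.3, 0.2, 0.5]
threshold = threshold_for_false_alarm_rate(pos_sims, neg_sims, target_far=0.2)
rate = pass_rate(pos_sims, threshold)
```

Repeating the call for several target false alarm rates would give one threshold per rate, which is the kind of mapping Table 1 records.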

In this embodiment, after the neural network is trained, it can further be tested on a large number of images to obtain thresholds at different false alarm rates, that is, to obtain the threshold information. When the neural network is then applied, different threshold information (for example, the first threshold and the second threshold) corresponding to different false alarm rates can be used to determine the normalized reference similarity. For a specific application of Table 1, reference may be made to the implementation shown in FIG. 1; for example, the electronic device may determine the normalized reference similarity based on the threshold information in Table 1 and the reference similarity between the first facial feature and the second facial feature obtained by the electronic device, and then determine, based on the normalized reference similarity, whether the first facial feature and the second facial feature correspond to the same person.

Note that each of the above embodiments has its own emphasis; for implementations not described in detail in one embodiment, reference may be made to the corresponding implementations of the other embodiments, which are not detailed one by one here.

The methods of embodiments of the present disclosure have been described in detail above; the apparatuses and electronic devices of embodiments of the present disclosure are provided below. For brevity, for parts of the apparatuses whose technical principles, technical effects and the like are not described in detail, reference may be made to the corresponding descriptions of the method embodiments above, and the description is not repeated.

Referring to FIG. 4, FIG. 4 is a schematic configuration diagram of a face recognition device according to an embodiment of the present disclosure; the face recognition device can execute the face recognition method shown in FIG. 1. As shown in FIG. 4, the face recognition device includes:
a first acquisition unit 401 that acquires a first face image with a first camera;
a first extraction unit 402 that extracts a first facial feature of the first face image;
a comparison unit 403 that compares the first facial feature with a second facial feature, which is extracted from a second face image acquired by a second camera of a different type from the first camera, to obtain a reference similarity; and
a determining unit 404 that determines, based on the reference similarity, whether the first facial feature and the second facial feature correspond to the same person.

In one example, the first camera is a thermal camera and the second camera is a visible light camera; alternatively, the first camera is a visible light camera and the second camera is a thermal camera.

一例では、決定ユニット４０４は、具体的には、参照類似度、参照誤報率及び類似度閾値に基づいて第１の顔特徴と第２の顔特徴とが同じ人に対応するか否かを決定し、異なる誤報率は異なる類似度閾値に対応する。 In one example, the determining unit 404 specifically determines whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity, the reference false alarm rate, and the similarity threshold. However, different false alarm rates correspond to different similarity thresholds.

In implementing embodiments of the present disclosure, determining whether the first facial feature and the second facial feature correspond to the same person by using different similarity thresholds for different false alarm rates avoids face authentication schemes that rely on a single fixed similarity threshold. The similarity used to judge the relationship between the two face images can thus be determined dynamically, which improves the efficiency and accuracy of face authentication or face recognition.
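A decision that depends on the chosen false alarm rate might look like the sketch below. It assumes cosine similarity as the reference similarity and uses an invented threshold table; neither the similarity measure nor the numeric values come from the source, and the names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical mapping from false alarm rate to similarity threshold.
HYPOTHETICAL_THRESHOLDS = {1e-3: 0.40, 1e-4: 0.50, 1e-5: 0.60}

def is_same_person(first_feature, second_feature, false_alarm_rate,
                   thresholds=HYPOTHETICAL_THRESHOLDS):
    """Compare the reference similarity with the threshold for the chosen rate."""
    reference_similarity = cosine_similarity(first_feature, second_feature)
    return reference_similarity >= thresholds[false_alarm_rate]

# Toy features (assumed): cosine similarity is about 0.707.
same_at_default = is_same_person([1.0, 0.0], [1.0, 1.0], 1e-5)
same_at_strict = is_same_person([1.0, 0.0], [1.0, 1.0], 1e-5, {1e-5: 0.8})
```

The same pair of features is accepted or rejected depending on which threshold the selected false alarm rate maps to, which is the behavior the paragraph above contrasts with a single fixed threshold.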

一例では、決定ユニット４０４は、具体的には、参照類似度及び閾値情報に基づいて、正規化された参照類似度を決定し、そして正規化された参照類似度に基づいて第１の顔特徴と第２の顔特徴とが同じ人に対応するか否かを決定する。 In one example, the determining unit 404 specifically determines the normalized reference similarity based on the reference similarity and the threshold information, and determines the first facial feature based on the normalized reference similarity. and the second facial feature correspond to the same person.

本開示の実施例では、参照類似度及び予め設定された情報に基づいて、正規化された参照類似度を決定することにより、該正規化された参照類似度に基づいて第１の顔特徴と第２の顔特徴とが同じ人に対応するか否かを決定する。固定の類似度閾値を用いる方式で顔を認証する解決手段を効果的に改善することにより、２枚の顔画像の間の関係を判断するための類似度（すなわち正規化された参照類似度）を動的に決定することができ、顔認識の効率及び精度を向上させる。 In the embodiment of the present disclosure, the normalized reference similarity is determined based on the reference similarity and preset information, and the first facial feature is determined based on the normalized reference similarity. It is determined whether the second facial feature corresponds to the same person. By effectively improving the face recognition solution in a manner that uses a fixed similarity threshold, the similarity measure (i.e., the normalized reference similarity measure) for determining the relationship between two facial images. can be determined dynamically, improving the efficiency and accuracy of face recognition.

一例では、第１の抽出ユニット４０２は、具体的には、第１の顔画像を予めトレーニングされたニューラルネットワークに入力し、ニューラルネットワークにより第１の顔画像の第１の顔特徴を出力し、ニューラルネットワークは、異なるタイプのカメラによって撮影され、かつ顔が含まれる第１のタイプの画像サンプル及び第２のタイプの画像サンプルに基づいてトレーニングすることにより取得される。 In one example, the first extraction unit 402 specifically inputs the first facial image to a pre-trained neural network, outputs the first facial feature of the first facial image by the neural network, The neural network is obtained by training on a first type of image samples and a second type of image samples that are taken by different types of cameras and include faces.

本開示の実施例では、異なるタイプの画像サンプルによりニューラルネットワークをトレーニングすることにより、該ニューラルネットワークを適用して顔を認識するなど、顔認識の効率及び精度を効果的に向上させる。 Embodiments of the present disclosure effectively improve the efficiency and accuracy of face recognition, such as applying the neural network to recognize faces, by training the neural network with different types of image samples.

一例では、ニューラルネットワークは、第１のタイプの画像サンプル、第２のタイプの画像サンプル及び混合タイプの画像サンプルに基づいてトレーニングすることにより取得され、混合タイプの画像サンプルは、第１のタイプの画像サンプルと第２のタイプの画像サンプルとをペアリングすることにより取得される。 In one example, the neural network is obtained by training on image samples of a first type, image samples of a second type and image samples of a mixed type, where the image samples of a mixed type are of a first type. It is obtained by pairing the image sample and the second type of image sample.

一例では、第１のカメラは車載カメラを含み、第１の取得ユニット４０１は、具体的には、車載カメラにより第１の顔画像を取得し、第１の顔画像は、車両の車両使用者の顔画像を含む。 In one example, the first camera includes an in-vehicle camera, and the first acquisition unit 401 specifically acquires the first facial image with the in-vehicle camera, where the first facial image includes a facial image of a vehicle user of the vehicle.

本開示の実施例は、運転者監視システムに効果的に適用することにより、運転者の顔認識の効率を向上させることができる。 Embodiments of the present disclosure can be effectively applied to a driver monitoring system to improve the efficiency of driver face recognition.

一例では、車両使用者は、車両を運転する人、車両に乗る人、車両を修理する人、車両に給油する人及び車両を制御する人のうちの１つ以上を含む。 In one example, vehicle users include one or more of a person who drives the vehicle, a person who rides the vehicle, a person who repairs the vehicle, a person who refuels the vehicle, and a person who controls the vehicle.

一例では、上記車両使用者が車両を運転する人を含む場合、第１の取得ユニット４０１は、具体的には、トリガ命令を受信した場合、車載カメラにより第１の顔画像を取得し、
或いは第１の取得ユニット４０１は、具体的には、車両の走行中に、車載カメラにより第１の顔画像を取得し、
或いは第１の取得ユニット４０１は、具体的には、車両の走行速度が参照速度に達した場合、車載カメラにより第１の顔画像を取得する。 In one example, when the vehicle user includes a person driving a vehicle, the first acquisition unit 401 specifically acquires a first facial image by an on-vehicle camera when receiving a trigger command;
Alternatively, the first acquisition unit 401 specifically acquires the first facial image using an on-vehicle camera while the vehicle is running;
Alternatively, the first acquisition unit 401 specifically acquires the first face image using the vehicle-mounted camera when the traveling speed of the vehicle reaches the reference speed.
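The three alternative acquisition conditions above can be sketched as a small predicate. This is an illustrative sketch only: the `VehicleState` fields, the mode names, and the default reference speed of 20 km/h are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    trigger_received: bool  # a trigger command has arrived
    is_moving: bool         # the vehicle is currently travelling
    speed_kmh: float        # current travelling speed


def should_capture(state: VehicleState, mode: str,
                   reference_speed_kmh: float = 20.0) -> bool:
    """Decide whether the in-vehicle camera should capture a face image."""
    if mode == "trigger":
        # Capture when a trigger command has been received.
        return state.trigger_received
    if mode == "while_driving":
        # Capture while the vehicle is travelling.
        return state.is_moving
    if mode == "reference_speed":
        # Capture once the travelling speed reaches the reference speed.
        return state.speed_kmh >= reference_speed_kmh
    raise ValueError(f"unknown capture mode: {mode}")


state = VehicleState(trigger_received=False, is_moving=True, speed_kmh=35.0)
capture_now = should_capture(state, "reference_speed")
```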

一例では、第２の顔画像は、車両使用者が顔登録を行うための画像であり、図５に示すように、顔認識装置は、
第２のカメラにより第２の顔画像を取得する第２の取得ユニット４０５と、
第２の顔画像の第２の顔特徴を抽出する第２の抽出ユニット４０６と、
第２の顔画像の第２の顔特徴を保存する保存ユニット４０７と、をさらに含む。 In one example, the second facial image is an image used by the vehicle user for face registration, and as shown in FIG. 5, the face recognition device further includes:
a second acquisition unit 405 that acquires a second facial image with a second camera;
a second extraction unit 406 for extracting a second facial feature of the second facial image;
and a storage unit 407 for storing second facial features of the second facial image.
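The registration flow handled by units 405 to 407 (acquire the second facial image, extract the second facial feature, store it) can be sketched as follows. The `FeatureGallery` class, the user identifier, and the toy extractor are hypothetical stand-ins; in practice the extractor would be the trained neural network.

```python
import math


def toy_extract_feature(image):
    """Stand-in extractor: L2-normalize a flat list of pixel values."""
    norm = math.sqrt(sum(p * p for p in image))
    return [p / norm for p in image] if norm else list(image)


class FeatureGallery:
    """In-memory store of registered (second) facial features."""

    def __init__(self, extract_feature):
        self._extract = extract_feature
        self._features = {}

    def register(self, user_id, second_face_image):
        # Extract the second facial feature and save it for later comparison.
        feature = self._extract(second_face_image)
        self._features[user_id] = feature
        return feature

    def items(self):
        return self._features.items()


gallery = FeatureGallery(toy_extract_feature)
gallery.register("driver_1", [3.0, 4.0])
```

Only the feature is stored here, not the registration image itself, which mirrors the storage unit's role of saving the second facial feature.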

説明すべきものとして、各ユニットの実現は、さらに図１に示される方法実施例の対応する説明を対応して参照することができる。 It should be noted that, for the implementation of each unit, reference may also be made to the corresponding description of the method embodiment shown in FIG. 1.

図６を参照すると、図６は、本開示の実施例に係るニューラルネットワークのトレーニング装置の概略構成図であり、該ニューラルネットワークのトレーニング装置は、図２に示される顔認識方法を実行することができる。図６に示すように、該ニューラルネットワークのトレーニング装置は、
異なるタイプのカメラによって撮影され、かつ顔が含まれる第１のタイプの画像サンプル及び第２のタイプの画像サンプルを取得する取得ユニット６０１と、
第１のタイプの画像サンプル及び第２のタイプの画像サンプルに基づいてニューラルネットワークをトレーニングするトレーニングユニット６０２と、を含む。 Referring to FIG. 6, FIG. 6 is a schematic configuration diagram of a neural network training device according to an embodiment of the present disclosure, and the neural network training device can execute the face recognition method shown in FIG. 2. As shown in FIG. 6, the neural network training device includes:
an acquisition unit 601 for acquiring a first type of image sample and a second type of image sample taken by different types of cameras and containing a face;
a training unit 602 for training the neural network based on the first type of image samples and the second type of image samples.

本開示の実施例では、異なるタイプのカメラによって撮影された顔画像を用いてニューラルネットワークをトレーニングすることにより、ニューラルネットワークが顔特徴を出力する精度を効果的に向上させることができ、また顔認識を行うとき、該ニューラルネットワークを用いて顔特徴を抽出すると、顔認識の精度を効果的に向上させる。 In the embodiments of the present disclosure, training the neural network with facial images captured by different types of cameras effectively improves the accuracy of the facial features output by the neural network, and when face recognition is performed, extracting facial features with this neural network effectively improves the accuracy of face recognition.

一例では、図７に示すように、トレーニングユニット６０２は、
第１のタイプの画像サンプルと第２のタイプの画像サンプルとをペアリングして第１のタイプの画像サンプルと第２のタイプの画像サンプルの混合タイプの画像サンプルを取得するペアリングサブユニット６０２１と、
第１のタイプの画像サンプル、第２のタイプの画像サンプル及び混合タイプの画像サンプルに基づいてニューラルネットワークをトレーニングするトレーニングサブユニット６０２２と、を含む。 In one example, as shown in FIG. 7, training unit 602 includes:
a pairing subunit 6021 for pairing the first type of image sample with the second type of image sample to obtain mixed type image samples of the first type of image sample and the second type of image sample; and
a training subunit 6022 for training the neural network based on the first type of image samples, the second type of image samples, and the mixed type of image samples.
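The pairing performed by the pairing subunit can be sketched as below. The disclosure states only that first-type and second-type samples are paired into mixed-type samples; the random shuffling, the seed parameter, and the truncation to the shorter list are assumptions made for illustration, and `pair_samples` is a hypothetical name.

```python
import random


def pair_samples(first_type, second_type, seed=0):
    """Pair first-type samples with second-type samples.

    Returns a list of (first_type_sample, second_type_sample) tuples.
    Samples left over in the longer list are dropped.
    """
    rng = random.Random(seed)
    first = list(first_type)
    second = list(second_type)
    # Shuffle each list independently so pairs are not tied to input order.
    rng.shuffle(first)
    rng.shuffle(second)
    n = min(len(first), len(second))
    return list(zip(first[:n], second[:n]))


mixed = pair_samples(["ir_1", "ir_2", "ir_3"], ["rgb_1", "rgb_2"])
```

Other pairing policies, such as pairing by identity label, would fit the same interface.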

一例では、トレーニングサブユニット６０２２は、具体的には、ニューラルネットワークにより第１のタイプの画像サンプルの顔予測結果、第２のタイプの画像サンプルの顔予測結果及び混合タイプの画像サンプルの顔予測結果を取得し、そして第１のタイプの画像サンプルの顔予測結果と顔ラベリング結果との差異、第２のタイプの画像サンプルの顔予測結果と顔ラベリング結果との差異及び混合タイプの画像サンプルの顔予測結果と顔ラベリング結果との差異に基づいてニューラルネットワークをトレーニングする。 In one example, the training subunit 6022 specifically obtains, through the neural network, a face prediction result for the first type of image sample, a face prediction result for the second type of image sample, and a face prediction result for the mixed type image sample, and then trains the neural network based on the difference between the face prediction result and the face labeling result of the first type of image sample, the difference between the face prediction result and the face labeling result of the second type of image sample, and the difference between the face prediction result and the face labeling result of the mixed type image sample.

一例では、ニューラルネットワークには、第１の分類器、第２の分類器及び混合分類器が含まれ、トレーニングサブユニット６０２２は、具体的には、第１のタイプの画像サンプルの特徴を第１の分類器に入力して第１のタイプの画像サンプルの顔予測結果を取得し、第２のタイプの画像サンプルを第２の分類器に入力して前記第２のタイプの画像サンプルの顔予測結果を取得し、そして混合タイプの画像サンプルの顔特徴を混合分類器に入力して混合タイプの画像サンプルの顔予測結果を取得する。 In one example, the neural network includes a first classifier, a second classifier, and a mixed classifier, and the training subunit 6022 specifically inputs features of the first type of image sample into the first classifier to obtain the face prediction result of the first type of image sample, inputs the second type of image sample into the second classifier to obtain the face prediction result of the second type of image sample, and inputs facial features of the mixed type image sample into the mixed classifier to obtain the face prediction result of the mixed type image sample.
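A rough sketch of how the three classifiers could contribute to one training objective is shown below. It assumes that each classifier is a linear layer over already-extracted features, that the "difference" between prediction and label is measured by softmax cross-entropy, and that the three terms are summed with equal weight. None of these choices is stated in the disclosure, and all names are hypothetical.

```python
import math


def linear(weights, feature):
    """Apply a classifier head: one row of weights per identity class."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and the labelled class."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def total_loss(batches, heads):
    """Sum the mean losses of the first, second, and mixed classifiers.

    batches: {"first": [(feature, label), ...], "second": [...], "mixed": [...]}
    heads:   {"first": weights, "second": weights, "mixed": weights}
    """
    loss = 0.0
    for name in ("first", "second", "mixed"):
        samples = batches[name]
        per_sample = [softmax_cross_entropy(linear(heads[name], f), y)
                      for f, y in samples]
        loss += sum(per_sample) / len(per_sample)
    return loss


batches = {name: [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
           for name in ("first", "second", "mixed")}
heads = {name: [[0.0, 0.0], [0.0, 0.0]] for name in batches}
loss = total_loss(batches, heads)
```

In a real training loop the features would come from a shared feature extractor and the gradient of this loss would update both the extractor and the three classifiers.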

一例では、図８に示すように、上記装置は、
トレーニングされたニューラルネットワーク中から第１の分類器、第２の分類器及び混合分類器を除去して、顔認識を行うためのニューラルネットワークを取得するニューラルネットワーク適用ユニットをさらに含む。 In one example, as shown in FIG. 8, the device further includes:
a neural network application unit that removes the first classifier, the second classifier, and the mixed classifier from the trained neural network to obtain a neural network for performing face recognition.
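Removing the classifiers after training can be pictured as keeping only the feature-extraction part of the network. The sketch below uses a hypothetical `FaceNetwork` container with a callable backbone and a dictionary of classifier heads; it is not the structure used in the disclosure.

```python
class FaceNetwork:
    """Feature extractor plus optional training-time classifier heads."""

    def __init__(self, backbone, classifiers=None):
        self.backbone = backbone                 # callable: image -> feature
        self.classifiers = dict(classifiers or {})

    def extract(self, image):
        return self.backbone(image)


def strip_classifiers(trained):
    """Return a recognition network that shares the trained backbone
    but carries none of the first, second, or mixed classifiers."""
    return FaceNetwork(trained.backbone)


trained = FaceNetwork(
    backbone=lambda image: [sum(image) / len(image)],
    classifiers={"first": object(), "second": object(), "mixed": object()},
)
recognizer = strip_classifiers(trained)
feature = recognizer.extract([0.2, 0.4, 0.6])
```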

説明すべきものとして、各ユニットの実現は、さらに図２に示される方法実施例の対応する説明を対応して参照することができる。 It should be noted that, for the implementation of each unit, reference may also be made to the corresponding description of the method embodiment shown in FIG. 2.

図９を参照すると、図９は、本開示の実施例に係る電子機器の概略構成図である。図９に示すように、該電子機器は、バスなどを含んでよい接続線により互いに接続されるプロセッサ９０１、メモリ９０２及び入出力インタフェース９０３を含む。 Referring to FIG. 9, FIG. 9 is a schematic configuration diagram of an electronic device according to an embodiment of the present disclosure. As shown in FIG. 9, the electronic device includes a processor 901, a memory 902, and an input/output interface 903 that are connected to each other by a connection line that may include a bus or the like.

入出力インタフェース９０３は、データ及び／又は信号を入力し、データ及び／又は信号を出力することができる。 The input/output interface 903 can input data and/or signals and output data and/or signals.

メモリ９０２は、ランダムアクセスメモリ（ｒａｎｄｏｍａｃｃｅｓｓｍｅｍｏｒｙ、ＲＡＭ）、リードオンリーメモリ（ｒｅａｄ－ｏｎｌｙｍｅｍｏｒｙ、ＲＯＭ）、消去可能なプログラマブルリードオンリーメモリ（ｅｒａｓａｂｌｅｐｒｏｇｒａｍｍａｂｌｅｒｅａｄｏｎｌｙｍｅｍｏｒｙ、ＥＰＲＯＭ）又はコンパクトリードオンリーメモリ（ｃｏｍｐａｃｔｄｉｓｃｒｅａｄ－ｏｎｌｙｍｅｍｏｒｙ、ＣＤ－ＲＯＭ）を含むが、これらに限定されず、該メモリ９０２は、関連命令及びデータに用いられる。 The memory 902 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and the memory 902 is used for related instructions and data.

プロセッサ９０１は、１つ以上であってよく、上記１つ以上のプロセッサは、１つ以上の中央処理装置（ｃｅｎｔｒａｌｐｒｏｃｅｓｓｉｎｇｕｎｉｔ、ＣＰＵ）及び／又は１つ以上の加速ユニットなどであってよい。ＣＰＵは、シングルコアＣＰＵであってもよく、マルチコアＣＰＵであってもよい。加速ユニットは、画像処理装置（ｇｒａｐｈｉｃｓｐｒｏｃｅｓｓｉｎｇｕｎｉｔ、ＧＰＵ）、フィールドプログラマブルゲートアレイ（ｆｉｅｌｄ－ｐｒｏｇｒａｍｍａｂｌｅｇａｔｅａｒｒａｙ、ＦＰＧＡ）などを含むが、それらに限定されない。 The processor 901 may be one or more, and the one or more processors may be one or more central processing units (CPUs) and/or one or more acceleration units. The CPU may be a single-core CPU or a multi-core CPU. The acceleration unit includes, but is not limited to, a graphics processing unit (GPU), a field-programmable gate array (FPGA), and the like.

なお、本開示の実施例に係るプロセッサは、他のタイプのプロセッサであってよく、本開示の実施例は、該プロセッサのタイプを一意的に限定しない。本開示の実施例に係るメモリは、他のタイプのメモリなどであってもよく、本開示の実施例は、該メモリのタイプも限定しない。 Note that the processor according to the embodiment of the present disclosure may be another type of processor, and the embodiment of the present disclosure does not uniquely limit the type of the processor. The memory according to embodiments of the present disclosure may be other types of memory, and embodiments of the present disclosure do not limit the type of memory.

一例では、各操作の実現は、さらに図１に示される方法実施例の対応する説明を参照することができる。各操作の実現は、さらに図２に示される方法実施例の対応する説明に対応して参照することができる。或いは、各操作の実現は、さらに図４及び図５に示される実施例の対応する説明を対応して参照することができる。各操作の実現は、さらに図６～図８に示される実施例の対応する説明を対応して参照することができる。 In one example, for the implementation of each operation, reference may be made to the corresponding description of the method embodiment shown in FIG. 1. Reference may also be made to the corresponding description of the method embodiment shown in FIG. 2. Alternatively, reference may be made to the corresponding description of the embodiments shown in FIGS. 4 and 5, or to the corresponding description of the embodiments shown in FIGS. 6 to 8.

一実施例では、プロセッサ９０１は、ステップ１０１～ステップ１０４に示す方法を実行することができる。一例として、プロセッサは、さらに入出力インタフェースを制御して第１の顔画像などを取得することができ、本開示の実施例は、第１の顔画像をどのように取得するかについて一意的に限定しない。 In one embodiment, the processor 901 may perform the method shown in steps 101 to 104. As an example, the processor may further control the input/output interface to obtain the first facial image and the like; the embodiments of the present disclosure do not uniquely limit how the first facial image is obtained.

また例えば、一実施例では、プロセッサは、ステップ２０１及びステップ２０２に示す方法を実行することができる。 Also for example, in one embodiment, the processor may perform the method shown in steps 201 and 202.

また例えば、プロセッサ９０１は、第１の取得ユニット４０１、さらに第１の抽出ユニット４０２、対比ユニット４０３、及び決定ユニット４０４が実行する方法などを実行することができる。 Also, for example, the processor 901 may perform the method performed by the first acquisition unit 401, as well as the first extraction unit 402, the comparison unit 403, the determination unit 404, etc.
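The comparison and determination stages that the processor may carry out on behalf of units 403 and 404 can be sketched as a one-to-many match. The sketch assumes cosine similarity as the reference similarity and a single similarity threshold; the disclosure does not fix the similarity measure, and `identify`, the gallery contents, and the threshold values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(first_feature, gallery, similarity_threshold):
    """Find the stored second feature most similar to the first feature.

    Returns (user_id, similarity) when the best similarity exceeds the
    threshold, otherwise (None, best_similarity).
    """
    best_id, best_sim = None, float("-inf")
    for user_id, second_feature in gallery.items():
        sim = cosine_similarity(first_feature, second_feature)
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    if best_id is not None and best_sim > similarity_threshold:
        return best_id, best_sim
    return None, best_sim


gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
match, similarity = identify([0.9, 0.1], gallery, similarity_threshold=0.8)
```

A threshold chosen for a stricter false alarm rate would simply be passed in as a larger `similarity_threshold`.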

また例えば、プロセッサ９０１は、さらに取得ユニット６０１及びトレーニングユニット６０２が実行する方法などを実行することができる。 Also, for example, processor 901 may further perform methods such as those performed by acquisition unit 601 and training unit 602.

これにより、図９に示される電子機器の具体的な実現形態について、前述の各実施例の説明を対応して参照することができ、本明細書において１つずつ詳述しない。 Accordingly, for the specific implementation of the electronic device shown in FIG. 9, reference may be made to the descriptions of the foregoing embodiments, which are not repeated one by one here.

本開示の実施例は、コンピュータ可読記憶媒体をさらに提供する。上記方法実施例における全部又は一部のフローは、コンピュータプログラムにより関連ハードウェアを命令して完了でき、該プログラムは、上記コンピュータ記憶媒体に記憶でき、該プログラムは、実行中に上記各方法実施例のフローを含んでよい。コンピュータ可読記憶媒体は、前述のいずれかの実施例の顔認識装置又はニューラルネットワークのトレーニング装置の内部記憶ユニット、例えば顔認識装置又はニューラルネットワークのトレーニング装置のハードディスク又はメモリであってよい。上記コンピュータ可読記憶媒体は、上記顔認識装置又はニューラルネットワークのトレーニング装置の外部記憶装置、例えば上記顔認識装置又はニューラルネットワークのトレーニング装置に備えられた挿着式ハードディスク、スマートメモリカード（ｓｍａｒｔｍｅｄｉａｃａｒｄ、ＳＭＣ）、セキュア・デジタル（ｓｅｃｕｒｅｄｉｇｉｔａｌ、ＳＤ）カード、フラッシュメモリカード（ｆｌａｓｈｃａｒｄ）などであってもよい。さらに、上記コンピュータ可読記憶媒体は、上記顔認識装置又はニューラルネットワークのトレーニング装置の内部記憶ユニットを含むだけでなく、外部記憶装置を含んでよい。上記コンピュータ可読記憶媒体は、上記コンピュータプログラム及び上記顔認識装置又はニューラルネットワークのトレーニング装置に必要な他のプログラム及びデータを記憶する。上記コンピュータ可読記憶媒体は、出力されたか又は出力しようとするデータを一時的に記憶してもよい。 Embodiments of the present disclosure further provide a computer-readable storage medium. All or part of the flows in the above method embodiments may be completed by a computer program instructing the relevant hardware; the program may be stored in the computer storage medium, and when executed, the program may include the flows of the above method embodiments. The computer-readable storage medium may be an internal storage unit of the face recognition device or the neural network training device of any of the foregoing embodiments, such as a hard disk or memory of the face recognition device or the neural network training device. The computer-readable storage medium may also be an external storage device of the face recognition device or the neural network training device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the face recognition device or the neural network training device. Furthermore, the computer-readable storage medium may include both an internal storage unit and an external storage device of the face recognition device or the neural network training device. The computer-readable storage medium stores the computer program and other programs and data required by the face recognition device or the neural network training device. The computer-readable storage medium may also temporarily store data that has been output or is to be output.

１つ以上の選択可能な実施形態では、本開示の実施例は、実行されると、コンピュータに上記任意の実施例のいずれか１項に記載の方法を実行させるコンピュータ可読命令を記憶するコンピュータプログラムをさらに提供する。 In one or more optional embodiments, the embodiments of the present disclosure further provide a computer program storing computer-readable instructions that, when executed, cause a computer to perform the method described in any one of the above embodiments.

該コンピュータプログラムは、具体的には、ハードウェア、ソフトウェア又はそれらの組み合わせ方式により実現できる。１つの選択可能な例では、上記コンピュータプログラムは、具体的には、コンピュータ記憶媒体として具現化され、別の選択可能な例では、上記コンピュータプログラムは、具体的に、ソフトウェア、例えばソフトウェア開発パケット（ｓｏｆｔｗａｒｅｄｅｖｅｌｏｐｍｅｎｔｋｉｔ、ＳＤＫ）などとして具現化される。 Specifically, the computer program can be implemented in hardware, software, or a combination thereof. In one optional example, the computer program is embodied as a computer storage medium; in another optional example, the computer program is embodied as software, such as a software development kit (SDK).

上記実施例では、全て又は部分的にソフトウェア、ハードウェア、ファームウェア又は他の任意の組み合わせにより実現される。ソフトウェアプログラムで実現されると、全て又は部分的にコンピュータプログラムの形態で実現されてよい。上記コンピュータプログラムは、１つ以上のコンピュータ命令を含む。コンピュータに上記コンピュータプログラム命令をロードし実行する場合、本開示の実施例に記載のフロー又は機能に基づいて全て又は部分的に生成される。上記コンピュータは、汎用コンピュータ、専用コンピュータ、コンピュータネットワーク、又は他のプログラマブル装置であってよい。上記コンピュータ命令は、コンピュータ可読記憶媒体に記憶されてよく、或いは上記コンピュータ可読記憶媒体により伝送されてよい。上記コンピュータ可読記憶媒体は、コンピュータがアクセス可能な任意の利用可能媒体又は１つ以上の利用可能媒体集積を含むサーバ、データセンタなどのデータ記憶装置であってよい。上記利用可能媒体は、磁気媒体（例えば、ソフトディスク、ハードディスク、磁気テープ）、光学媒体（例えば、ＤＶＤ（登録商標））又は半導体媒体（例えば、ソリッドステートデバイス（ｓｏｌｉｄｓｔａｔｅｄｉｓｋ、ＳＳＤ）などであってよい。

The embodiments described above may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented as a software program, they may be implemented in whole or in part in the form of a computer program. The computer program includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present disclosure are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on or transmitted by a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD (registered trademark)), or semiconductor media (e.g., solid state disk (SSD)).

本開示の実施例の方法におけるステップは、実際の需要に応じて順序調整、統合及び削除を行うことができる。 The steps in the method of the embodiments of the present disclosure can be reordered, combined, and deleted according to actual needs.

本開示の実施例装置におけるモジュールは、実際の必要に応じて合併、分割及び削除を行うことができる。 Modules in the embodiment device of the present disclosure can be merged, divided, and deleted according to actual needs.

上記のように、上記実施例は、本開示の技術手段を説明するためのものに過ぎず、限定するものではないと説明すべきであり、前述の実施例を参照して本開示を詳細に説明したが、当業者が理解すべきこととして、依然として、前述の各実施例において記載される技術手段を修正するか、又はその技術的特徴の一部に同等置換を行うことができ、これらの修正や置換によって、対応する技術手段の本質は、本開示の実施例に係る技術手段の範囲から逸脱することはない。

Finally, it should be noted that the above embodiments merely illustrate the technical means of the present disclosure and do not limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical means described in those embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical means to depart from the scope of the technical means of the embodiments of the present disclosure.

Claims

acquiring a first facial image with a first camera;
extracting a first facial feature of the first facial image;
comparing the first facial feature with a pre-stored second facial feature extracted from the features of a second facial image acquired by a second camera that is a different type of camera from the first camera, and obtaining a reference similarity indicating a similarity between the first facial feature and the second facial feature;
determining whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity;
The first camera is a thermo camera and the second camera is a visible light camera, or the first camera is a visible light camera and the second camera is a thermo camera,
The step of determining whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity,
determining whether the first facial feature and the second facial feature correspond to the same person based on a reference false alarm rate and a similarity threshold in addition to the reference similarity, wherein the similarity threshold is a threshold of the reference similarity and different reference false alarm rates correspond to different similarity thresholds, or
determining a normalized reference similarity based on the reference similarity and threshold information, and determining whether the first facial feature and the second facial feature correspond to the same person based on the normalized reference similarity, wherein the threshold information is obtained based on the similarity of positive sample pairs, the similarity of negative sample pairs, and different preset false alarm rates, the threshold information includes a first threshold and a second threshold, the reference similarity lies between the first threshold and the second threshold, and the normalized reference similarity is determined based on the reference similarity, the first threshold, and the second threshold;
determining whether the first facial feature and the second facial feature correspond to the same person based on a reference false alarm rate and a similarity threshold in addition to the reference similarity;
determining the similarity threshold based on the reference false alarm rate and obtaining the reference similarity between the first facial feature and at least two second facial features; determining a second facial feature with the highest degree of similarity;
if the reference similarity between the second facial feature having the highest degree of similarity and the first facial feature is greater than the similarity threshold, determining that the second facial feature having the highest degree of similarity and the first facial feature correspond to the same person;
determining whether the first facial feature and the second facial feature correspond to the same person based on the normalized reference similarity;
determining, based on at least two normalized reference similarities obtained from the reference similarities between the first facial feature and at least two second facial features, a second facial feature having the highest degree of similarity to the first facial feature, and, if the normalized reference similarity between that second facial feature and the first facial feature is greater than the similarity threshold, determining that the second facial feature having the highest degree of similarity to the first facial feature and the first facial feature correspond to the same person. A face recognition method comprising the foregoing steps.

The step of extracting a first facial feature of the first facial image includes:
inputting the first facial image into a pre-trained neural network, and outputting a first facial feature of the first facial image by the neural network, wherein the neural network is obtained by training on a first type of image sample and a second type of image sample, the first type of image sample being taken by a first type of camera, and the second type of image sample being taken by a second type of camera,
the first type of camera is a thermal camera and the second type of camera is a visible light camera, or the first type of camera is a visible light camera and the second type of camera is a thermal camera,
the first type of image sample and the second type of image sample include a face;
The face recognition method according to claim 1, wherein the neural network has feature extraction capabilities for two different types of images.

The face recognition method according to claim 2, wherein the neural network is obtained by training on the first type of image sample, the second type of image sample, and mixed type image samples, the mixed type image samples being obtained by pairing the first type of image sample with the second type of image sample.

The first camera includes an in-vehicle camera, and the step of acquiring a first facial image with the first camera includes:
4. The method according to claim 1, further comprising the step of acquiring the first facial image using the on-vehicle camera, wherein the first facial image includes a facial image of a vehicle user of the vehicle . Facial recognition method described in Section.

The face recognition method according to claim 4, wherein the vehicle user includes one or more of the following: a person who drives the vehicle, a person who rides in the vehicle, a person who repairs the vehicle, a person who refuels the vehicle, and a person who controls the vehicle.

The vehicle user includes a person who drives the vehicle, and the step of acquiring the first facial image with the in-vehicle camera includes:
If a trigger command is received, acquiring the first facial image of the person driving the vehicle using the in-vehicle camera;
Alternatively, while the vehicle is running, acquiring the first facial image of the person driving the vehicle using the in-vehicle camera;
Alternatively, the method further includes the step of acquiring the first facial image of the person driving the vehicle using the in-vehicle camera when the traveling speed of the vehicle reaches a reference speed;
5. The face recognition method according to claim 4, wherein the trigger command is a trigger command input by a user or a trigger command transmitted from another electronic device.

The second face image is an image for the vehicle user to perform face registration, and the face recognition method further includes:
acquiring the second facial image with the second camera;
extracting a second facial feature of the second facial image;
The face recognition method according to any one of claims 4 to 6, further comprising the step of storing a second facial feature of the second facial image.

obtaining a first type of image sample and a second type of image sample, each of which includes facial images captured by different types of cameras;
training a neural network based on the first type of image samples and the second type of image samples;
The different types of cameras include a first camera and a second camera, the first type of image sample being taken by the first camera, and the second type of image sample being taken by the second camera. ,
The first camera is a thermo camera and the second camera is a visible light camera, or the first camera is a visible light camera and the second camera is a thermo camera,
The neural network has feature extraction capabilities for two different types of images,
The step of training a neural network based on the first type of image samples and the second type of image samples includes:
pairing the first type of image sample and the second type of image sample to obtain mixed type image samples of the first type of image sample and the second type of image sample, wherein the mixed type image samples include a plurality of pairs of mixed type image samples, each pair including a first type of image sample and a second type of image sample; and
A method for training a neural network, comprising the step of training the neural network based on the first type of image sample, the second type of image sample, and the mixed type of image sample.

The step of training the neural network based on the first type of image samples, the second type of image samples and the mixed type of image samples includes:
obtaining a face prediction result of the first type image sample, a face prediction result of the second type image sample, and a face prediction result of the mixed type image sample by the neural network;
The difference between the face prediction result and face labeling result of the first type image sample, the difference between the face prediction result and face labeling result of the second type image sample, and the face prediction result of the mixed type image sample. and a face labeling result , the face labeling result being labeling information of a face included in an image sample. A neural network training method described in .

The neural network includes a first classifier, a second classifier, and a mixture classifier, and the neural network calculates a face prediction result for the first type of image sample and a face prediction result for the second type of image sample. The step of obtaining a face prediction result and a face prediction result of the mixed type image sample comprises:
inputting facial features of the first type of image sample into the first classifier to obtain a face prediction result of the first type of image sample;
inputting facial features of the second type of image sample to the second classifier to obtain a face prediction result of the second type of image sample;
inputting facial features of the mixed type image sample to the mixture classifier to obtain a face prediction result of the mixed type image sample. The neural network training method according to claim 9, comprising the foregoing steps.

After said step of training said neural network based on said first type of image samples, said second type of image samples and said mixed type of image samples;
The neural network training method according to claim 10, further comprising the step of removing the first classifier, the second classifier, and the mixed classifier from the neural network to obtain a neural network for performing face recognition.

a first acquisition unit that acquires a first face image with a first camera;
a first extraction unit that extracts a first facial feature of the first facial image;
a comparison unit that compares the first facial feature with a pre-stored second facial feature extracted from the features of a second facial image acquired by a second camera that is a different type of camera from the first camera, and obtains a reference similarity indicating the similarity between the first facial feature and the second facial feature;
a determining unit for determining whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity;
The first camera is a thermo camera and the second camera is a visible light camera, or the first camera is a visible light camera and the second camera is a thermo camera,
The determining unit determines whether the first facial feature and the second facial feature correspond to the same person based on the reference similarity,
determining whether the first facial feature and the second facial feature correspond to the same person based on a reference false alarm rate and a similarity threshold in addition to the reference similarity, wherein the similarity threshold is a threshold of the reference similarity and different reference false alarm rates correspond to different similarity thresholds, or
determining a normalized reference similarity based on the reference similarity and threshold information, and determining whether the first facial feature and the second facial feature correspond to the same person based on the normalized reference similarity, wherein the threshold information is obtained based on the similarity of positive sample pairs, the similarity of negative sample pairs, and different preset false alarm rates, the threshold information includes a first threshold and a second threshold, the reference similarity lies between the first threshold and the second threshold, and the normalized reference similarity is determined based on the reference similarity, the first threshold, and the second threshold;
Determining whether the first facial feature and the second facial feature correspond to the same person based on the reference false alarm rate and a similarity threshold in addition to the reference similarity,
determining the similarity threshold based on the reference false alarm rate and obtaining the reference similarity between the first facial feature and at least two second facial features; determining a second facial feature with the highest degree of similarity;
if the reference similarity between the second facial feature having the highest degree of similarity and the first facial feature is greater than the similarity threshold, determining that the second facial feature having the highest degree of similarity and the first facial feature correspond to the same person;
determining whether the first facial feature and the second facial feature correspond to the same person based on the normalized reference similarity;
determining, based on at least two normalized reference similarities obtained from the reference similarities between the first facial feature and at least two second facial features, a second facial feature having the highest degree of similarity to the first facial feature, and, if the normalized reference similarity between that second facial feature and the first facial feature is greater than the similarity threshold, determining that the second facial feature having the highest degree of similarity to the first facial feature and the first facial feature correspond to the same person. A face recognition device comprising the foregoing units.

An electronic device comprising a processor and a memory, the processor being coupled to the memory, the memory storing program instructions that, when executed by the processor, cause the processor to perform the face recognition method according to any one of claims 1 to 7, or cause the processor to perform the neural network training method according to any one of claims 8 to 11.

A computer-readable storage medium having stored thereon a computer program containing program instructions that, when executed by a processor, cause the processor to perform the face recognition method according to any one of claims 1 to 7, or cause the processor to perform the neural network training method according to any one of claims 8 to 11.