CN112084946B - Face recognition method and device and electronic equipment - Google Patents

Face recognition method and device and electronic equipment

Info

Publication number
CN112084946B
CN112084946B (application CN202010941304.3A)
Authority
CN
China
Prior art keywords
face
characteristic
recognized
fusion
sample object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010941304.3A
Other languages
Chinese (zh)
Other versions
CN112084946A (en)
Inventor
王开业 (Wang Kaiye)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010941304.3A
Publication of CN112084946A
Application granted
Publication of CN112084946B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/172: Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification provide a face recognition method, a face recognition device, and an electronic device. The face recognition method comprises the following steps: acquiring face feature data of at least two modal images corresponding to an object to be recognized; performing feature fusion on the face feature data of the at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized; and inputting the face fusion feature data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is trained based on the face fusion feature data corresponding to a sample object and the recognition classification label corresponding to the sample object, and the face fusion feature data corresponding to the sample object is obtained by fusing the face feature data of the at least two modal images corresponding to the sample object.

Description

Face recognition method and device and electronic equipment
The present document is a divisional application of "Training method of face recognition model, face recognition method and hardware"; the parent application number is 202010388083.1, filed May 9, 2020.
Technical Field
The present disclosure relates to the field of biometric identification technologies, and in particular, to a face recognition method and apparatus, and an electronic device.
Background
With the large-scale application of face recognition in online payment scenarios, the requirements on the false-recognition pass rate of 1:N face recognition keep rising, and further improving face recognition performance to support these services has become an urgent need. The face recognition technology in wide use today is based on face imaging from a visible-light color (RGB) camera: identity verification and recognition are performed on the face features in RGB pictures. However, as the business expands, RGB face recognition, which uses only the texture information of the face, finds it increasingly difficult to meet the stringent performance requirements on pass rate and misrecognition rate.
Therefore, how to realize higher-performance face recognition is a technical problem which needs to be solved urgently at present.
Disclosure of Invention
Embodiments of the present disclosure provide a training method for a face recognition model, a face recognition method, an apparatus, and an electronic device, which can perform face recognition over multiple modalities to improve recognition performance.
In order to achieve the above object, the embodiments of the present specification are implemented as follows:
in a first aspect, a face recognition method is provided, including:
acquiring face feature data of at least two modal images corresponding to an object to be recognized;
performing feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images of the sample object.
In a second aspect, a face recognition apparatus is provided, including:
the acquisition module is used for acquiring the face feature data of the object to be recognized corresponding to at least two modal images;
the fusion module is used for carrying out feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and the recognition module is used for inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images with the sample object.
In a third aspect, an electronic device is provided that includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being executed by the processor to:
acquiring face feature data of at least two modal images corresponding to an object to be recognized;
performing feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images of the sample object.
In a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring face feature data of at least two modal images corresponding to an object to be recognized;
performing feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images of the sample object.
The scheme of the embodiments of this specification fuses the face feature data of images in different modalities into more discriminative face fusion feature data and uses it to train the face recognition model, so that the face recognition model can deliver higher-performance face recognition and significantly reduce the false-recognition pass rate.
Drawings
In order to more clearly illustrate the embodiments of this specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of this specification; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a training method for a face recognition model provided in an embodiment of the present specification.
Fig. 2 is a schematic flow chart of a face recognition method provided in an embodiment of the present specification.
Fig. 3 is a schematic structural diagram of a training apparatus for a face recognition model provided in an embodiment of the present specification.
Fig. 4 is a schematic structural diagram of a face recognition apparatus provided in an embodiment of this specification.
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of this specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are obviously only a part, not all, of the embodiments of this specification. All other embodiments obtained by a person skilled in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
As described above, with the large-scale application of face recognition in online payment scenarios, the requirements on the false-recognition pass rate of 1:N face recognition keep rising, and further improving face recognition performance to support these services has become an urgent need. The face recognition technology in wide use today is based on face imaging from a visible-light color (RGB) camera: identity verification and recognition are performed on the face features in RGB pictures. However, as the business expands, RGB face recognition, which uses only the texture information of the face, finds it increasingly difficult to meet the stringent performance requirements on pass rate and misrecognition rate.
Against this background, this document aims to provide a multi-modal face recognition scheme that achieves better recognition and discrimination performance, so as to meet the ever-stricter requirements on the false-recognition pass rate.
Fig. 1 is a flowchart of a training method of a face recognition model according to an embodiment of the present disclosure. The method shown in fig. 1 may be performed by a corresponding apparatus, comprising:
step S102, acquiring the face feature data of at least two modal images corresponding to the sample object.
The at least two modal images may include, but are not limited to, at least one of a near-infrared (IR) face image, a visible-light color (RGB) face image, and a depth (3D) face image. In practical application, in this step a camera of the terminal device may capture face images of the sample object in the shooting modes of the different modal images, and face feature data may then be extracted from those face images.
Step S104, performing feature fusion on the face feature data of the at least two modal images corresponding to the sample object to obtain face fusion feature data corresponding to the sample object.
It should be understood that the manner of feature fusion is not exclusive and the embodiments of the present specification are not particularly limited. As an exemplary introduction thereto:
In this step, a feature fusion model based on a convolutional neural network may be used to perform feature fusion on the face feature data of the images in different modalities.
The feature fusion model comprises:

A convolution layer, which performs convolution processing on the acquired face feature data of the at least two modal images corresponding to the sample object to obtain convolution-layer output feature data. In the embodiments of this specification, the convolution layer can filter out unneeded information in the modal images.

A pooling layer, which performs pooling processing on the convolution-layer output feature data based on a maximum pooling algorithm and/or a mean pooling algorithm to obtain pooling-layer output feature data. In the embodiments of this specification, the pooling layer can compress the amount of data and the number of parameters to reduce overfitting.

A connection layer, which combines and reduces the dimension of the pooling-layer output feature data to obtain the face fusion feature data corresponding to the sample object. In the embodiments of this specification, the connection layer is used to fuse the face feature data of the images in different modalities.
It should be understood that after the face feature data of the at least two modal images corresponding to the sample object is input into the feature fusion model, the face fusion feature data corresponding to the sample object can be extracted from the connection layer of the feature fusion model.
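For illustration only (this sketch is not part of the patent text), the following is one plausible PyTorch rendering of such a feature fusion model; the modality count, feature dimension, layer sizes, and choice of max pooling are all assumptions.

```python
# Editor's illustrative sketch, not the patented implementation.
# Assumes each modality contributes a fixed-length face feature vector.
import torch
import torch.nn as nn

class FeatureFusionModel(nn.Module):
    """Convolution layer filters out unneeded information, pooling layer
    compresses data and parameters, and the connection (fully connected)
    layer combines and reduces dimension to the fused face feature."""
    def __init__(self, num_modalities=3, feat_dim=256, fused_dim=128):
        super().__init__()
        # Stack per-modality feature vectors as channels of a 1-D signal.
        self.conv = nn.Conv1d(num_modalities, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(2)   # max pooling; mean pooling also works
        self.connect = nn.Linear(16 * (feat_dim // 2), fused_dim)

    def forward(self, x):
        # x: (batch, num_modalities, feat_dim), e.g. rows for IR/RGB/3D
        h = torch.relu(self.conv(x))      # convolution-layer output features
        h = self.pool(h)                  # pooling-layer output features
        return self.connect(h.flatten(1)) # fused face feature data

fusion = FeatureFusionModel()
fused = fusion(torch.randn(4, 3, 256))    # -> shape (4, 128)
```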
Step S106, taking the face fusion feature data corresponding to the sample object as the input of the face recognition model and the recognition classification label corresponding to the sample object as the output of the face recognition model, and training the face recognition model.
In a specific training process, after the face fusion feature data is input into the face recognition model, a training result given by the face recognition model can be obtained. The training result is the model's predicted classification result for the sample object, and it may differ from the true-value classification result indicated by the recognition classification label of the sample object. In the embodiments of this specification, an error value between the predicted classification result and the true-value classification result can be computed with a loss function derived from maximum likelihood estimation, and the parameters of the face recognition model (for example, the weight values of the bottom-layer vectors) can be adjusted with the aim of reducing this error value, thereby achieving the training effect.
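As a concrete, hypothetical rendering of this loop, the sketch below uses cross-entropy, i.e. the negative log-likelihood loss derived from maximum likelihood estimation; the model shape, class count, and optimizer settings are assumptions, not values from the patent.

```python
# Editor's illustrative sketch of the training step described above.
import torch
import torch.nn as nn

num_ids = 1000                         # assumed number of identity classes
recognizer = nn.Linear(128, num_ids)   # stand-in face recognition model
criterion = nn.CrossEntropyLoss()      # loss derived from maximum likelihood
optimizer = torch.optim.SGD(recognizer.parameters(), lr=0.01)

def train_step(fused_features, labels):
    """Predict, measure the error against the true-value labels, and adjust
    the model parameters (weight values) to reduce that error."""
    logits = recognizer(fused_features)   # predicted classification result
    loss = criterion(logits, labels)      # error vs. recognition labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(4, 128), torch.randint(0, num_ids, (4,)))
```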
The model training method of the embodiments of this specification fuses the face feature data of different modal images into more discriminative face fusion feature data and trains the face recognition model with it, so that the face recognition model can deliver higher-performance face recognition and significantly reduce the false-recognition pass rate.
The method of the embodiments of the present disclosure is described below with reference to practical application scenarios.
In this application scenario, feature fusion is performed on the face feature data of the sample object in the IR face image, the RGB face image, and the 3D face image in order to train the face recognition model. The corresponding process mainly comprises the following steps:
the method comprises the steps of firstly, obtaining face feature data of a sample object in an IR face image, an RGB face image and a 3D face image.
Specifically, this step may start a camera of the terminal device, capture face images of the sample object in the shooting modes of the different modal images, and extract face feature data from those face images.
It will be appreciated that face images carry rich feature data, some of which is not useful for face recognition. To filter out the useless information, the face feature data that is meaningful for face recognition can be identified by means of model training and model interpretability.
Specifically, the embodiments of this specification may set up a feature extraction model for each of the three modal images above. Taking the feature extraction model of a target modal image as an example: the model is first trained based on the face feature data of the target modal image corresponding to other sample objects and the recognition classification labels corresponding to those sample objects, which yields interpretation data for the face feature data of the target modal image. The effective feature dimensions of the target modal image are then determined from this interpretation data, and feature extraction is performed on the target modal image of the sample object according to those effective feature dimensions to obtain the face feature data of the target modal image corresponding to the sample object.
For example, after training of the feature extraction model of the target modal image is completed, the useful feature dimensions of the target modal image can be determined from the weight value of each feature vector in the model. It should be understood that the weight value of a feature vector reflects how strongly that feature vector influences the training result: the larger the weight value, the more important the feature vector is for face recognition. Therefore, the feature dimensions to be extracted can be determined from the feature vectors whose weight values reach a preset threshold in the feature extraction model of the target modal image.
For ease of understanding, a simple example: suppose the goal is to find the face feature data of the sample object that is useful in the RGB face image. A feature extraction model is set up in advance for the RGB face image and trained on the various face feature data of other sample objects in RGB face images, together with the recognition classification labels corresponding to those sample objects. After training, each bottom-layer feature vector carries a weight value; these weight values are the interpretation data representing the importance of the bottom-layer feature vectors. The feature dimensions used for feature extraction from RGB face images are then constructed by selecting the bottom-layer feature vectors whose weight values reach a certain threshold. For example, if the trained feature extraction model assigns a high weight value to the bottom-layer feature vector reflecting the gray level of the central region, the central-region gray level can be determined to be an effective feature dimension of the RGB face image.
Accordingly, in the step of obtaining the face feature data of the sample object in the RGB face image, the gray value of the central region of the sample object in the RGB face image can be extracted. This central-region gray value is later used for feature fusion with the feature data of the sample object in the other modal images.
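For instance, the central-region gray value could be computed as follows (illustrative only; the region size and the BT.601 luma weights are assumptions, not the patent's definition):

```python
# Extract the central-region gray value of an RGB face image.
import numpy as np

def central_region_gray(rgb_image, frac=0.5):
    """rgb_image: (H, W, 3) uint8 array. Returns the mean gray level of the
    central frac x frac region (ITU-R BT.601 luma weights assumed)."""
    h, w, _ = rgb_image.shape
    dh, dw = int(h * frac / 2), int(w * frac / 2)
    center = rgb_image[h // 2 - dh: h // 2 + dh, w // 2 - dw: w // 2 + dw]
    gray = center @ np.array([0.299, 0.587, 0.114])  # per-pixel gray level
    return float(gray.mean())
```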
Obviously, by the above method, the face feature data of the sample object in the images of different modalities can be extracted, and the details are not repeated herein.
Second, feature fusion is performed on the face feature data of the sample object in the IR face image, the RGB face image, and the 3D face image to obtain the face fusion feature data RGB_IR_3D.
The specifics of the fusion were introduced above and are not repeated here.
It should be noted that, in the embodiments of this specification, the face feature data of the sample object in the IR, RGB, and 3D face images may be fused directly into the face fusion feature data RGB_IR_3D in a single step; alternatively, feature fusion may first be performed on the face feature data of the sample object in two of the modal images, for example yielding intermediate fused feature data RGB_IR, after which the intermediate fused feature data RGB_IR is fused with the face feature data of the sample object in the remaining modal image to obtain the fused feature data RGB_IR_3D.
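The two fusion orders can be sketched as follows, with plain concatenation standing in for the learned fusion model (an assumption; a trained fusion model need not be associative the way concatenation is):

```python
# Editor's illustration of one-step vs. stepwise fusion.
import numpy as np

def fuse(*feature_vectors):
    """Stand-in fusion: concatenate feature vectors (a real system would
    run the feature fusion model instead)."""
    return np.concatenate(feature_vectors)

ir, rgb, d3 = np.ones(4), np.full(4, 2.0), np.full(4, 3.0)

# One-step: fuse all three modalities directly.
rgb_ir_3d = fuse(rgb, ir, d3)

# Stepwise: fuse two modalities first, then fuse the result with the third.
rgb_ir = fuse(rgb, ir)                 # intermediate fused feature data
rgb_ir_3d_stepwise = fuse(rgb_ir, d3)

# Equal here only because concatenation is associative.
assert np.array_equal(rgb_ir_3d, rgb_ir_3d_stepwise)
```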
Third, the face recognition model is trained based on the face fusion feature data RGB_IR_3D and the recognition classification labels corresponding to the sample objects.
The specific training method is described above and is not repeated here. It should further be noted that any machine learning model with a classification function (such as a gradient boosting tree model or a logistic regression model) can serve as the face recognition model; the embodiments of this specification do not specifically limit it.
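As a hypothetical example of this flexibility, either of the following scikit-learn classifiers could play the role of the face recognition model over fused features (the data shapes and class count here are made up):

```python
# Editor's illustration: interchangeable classifiers over fused features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))     # fused features RGB_IR_3D (assumed dim)
y = rng.integers(0, 4, size=200)   # recognition classification labels

for model in (GradientBoostingClassifier(), LogisticRegression(max_iter=1000)):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))
```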
Clearly, this application scenario performs more effective machine learning on the information in the RGB face image, the IR face image, and the depth face image; compared with machine learning that uses the RGB image alone, the resulting model has better recognition and discrimination performance and can significantly reduce the false-recognition pass rate.
It should be understood that a face recognition model trained by the above training method has the capability of recognizing users based on fused face feature data. Accordingly, the embodiments of this specification further provide a face recognition method based on such a face recognition model. Fig. 2 is a flowchart of a face recognition method according to an embodiment of the present disclosure. The method shown in fig. 2 may be performed by a corresponding apparatus and comprises:
step S202, acquiring the face feature data of the object to be recognized corresponding to at least two modal images.
And step S204, performing feature fusion on the face feature data of the at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized.
Step S206, inputting the face fusion feature data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is trained based on the face fusion feature data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion feature data corresponding to the sample object is obtained by fusing the face feature data of the at least two modal images corresponding to the sample object.
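Putting steps S202 to S206 together, a hypothetical inference helper might look like this (it reuses the stand-in concatenation fusion and a trained classifier from the earlier sketches; both are assumptions):

```python
# Editor's illustration of the recognition flow of fig. 2.
import numpy as np

def recognize(ir_feat, rgb_feat, d3_feat, recognizer):
    """Steps S202-S206: fuse the per-modality face features of the object
    to be recognized, then classify the fused feature with the trained
    recognition model (e.g. a fitted scikit-learn classifier)."""
    fused = np.concatenate([rgb_feat, ir_feat, d3_feat])  # step S204
    return recognizer.predict(fused.reshape(1, -1))[0]    # step S206
```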
The face recognition method of the embodiments of this specification fuses the face feature data of different modal images into more discriminative face fusion feature data; because the face recognition model is trained on such face fusion feature data, it can deliver higher-performance face recognition and significantly reduce the false-recognition pass rate.
In practical applications, the face recognition method of the embodiments of this specification may be used in any application scenario that requires identity verification, such as face payment or screen unlocking. Taking face payment as an example: when the object to be recognized uses a payment device to pay by face, the payment device can be controlled to obtain the face fusion feature data corresponding to the object to be recognized based on the steps shown in fig. 2, and to initiate identity verification to determine whether the object to be recognized is the target user (the legitimate paying user).
The above is an introduction of the training method of the face recognition model and an introduction of the face recognition method by the face recognition model in the embodiment of the present specification. It will be appreciated that appropriate modifications may be made without departing from the principles outlined herein, and such modifications are intended to be included within the scope of the embodiments herein.
Corresponding to the training method of the face recognition model, the embodiment of the specification further provides a training device of the face recognition model. Fig. 3 is a schematic structural diagram of a training device 300 according to an embodiment of the present disclosure, including:
an obtaining module 310, configured to obtain face feature data of at least two modality images corresponding to a sample object;
a fusion module 320, configured to perform feature fusion on the face feature data of the at least two modality images corresponding to the sample object, so as to obtain face fusion feature data corresponding to the sample object;
the training module 330 is configured to train the face recognition model by using the face fusion feature data corresponding to the sample object as an input of the face recognition model and using the recognition classification label corresponding to the sample object as an output of the face recognition model.
The training device of the embodiments of this specification fuses the face feature data of images in different modalities into more discriminative face fusion feature data and trains the face recognition model with it, so that the face recognition model can deliver higher-performance face recognition and significantly reduce the false-recognition pass rate.
Optionally, the fusion module 320 specifically inputs the face feature data of the at least two modal images corresponding to the sample object into a feature fusion model to obtain the face fusion feature data corresponding to the sample object.
The feature fusion model comprises:

A convolution layer, which performs convolution processing on the acquired face feature data of the at least two modal images corresponding to the sample object to obtain convolution-layer output feature data. In the embodiments of this specification, the convolution layer can filter out unneeded information in the modal images.

A pooling layer, which performs pooling processing on the convolution-layer output feature data based on a maximum pooling algorithm and/or a mean pooling algorithm to obtain pooling-layer output feature data. In the embodiments of this specification, the pooling layer can compress the amount of data and the number of parameters to reduce overfitting.

A connection layer, which combines and reduces the dimension of the pooling-layer output feature data to obtain the face fusion feature data corresponding to the sample object. In the embodiments of this specification, the connection layer is used to fuse the face feature data of the images in different modalities.
Optionally, the obtaining module 310 specifically performs feature extraction on a target modal image of the sample object according to the feature dimensions of the target modal image among the at least two modal images corresponding to the sample object, to obtain the face feature data of the target modal image corresponding to the sample object, where the feature dimensions of the target modal image are determined based on interpretation data obtained after the feature extraction model of the target modal image is trained, and the feature extraction model of the target modal image is trained based on the face feature data of the target modal image corresponding to other sample objects and the recognition classification labels corresponding to those sample objects.
Wherein the interpretation data includes the weight values of the feature vectors in the feature extraction model of the target modal image. In this embodiment, the feature dimensions of the target modal image may be determined based on the feature vectors whose weight values reach a preset threshold in the feature extraction model of the target modal image.
Optionally, the at least two modal images include at least one of a near-infrared light face image, a visible color light face image, and a depth face image.
Obviously, the training device of the embodiment of the present specification can be used as the execution subject of the training method shown in fig. 1, and thus can implement the function of the training method implemented in fig. 1. Since the principle is the same, the detailed description is omitted here.
Corresponding to the face recognition method, the embodiment of the specification further provides a face recognition device. Fig. 4 is a schematic structural diagram of a face recognition apparatus 400 according to an embodiment of the present disclosure, including:
the acquiring module 410 acquires face feature data of at least two modal images corresponding to an object to be recognized;
the fusion module 420 is configured to perform feature fusion on the face feature data of the at least two modality images corresponding to the object to be recognized, so as to obtain face fusion feature data corresponding to the object to be recognized;
the recognition module 440 inputs the face fusion feature data corresponding to the object to be recognized into a face recognition model, and obtains a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion feature data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion feature data corresponding to the sample object is obtained by fusing the face feature data corresponding to the at least two modal images with the sample object.
The face recognition device of the embodiments of this specification fuses the face feature data of images in different modalities into more discriminative face fusion feature data; because the face recognition model is trained on such face fusion feature data, it can deliver higher-performance face recognition and significantly reduce the false-recognition pass rate.
Specifically, in the embodiments of this specification, the face feature data of the at least two modal images corresponding to the object to be recognized may be acquired by the terminal device that initiates user recognition. That is, the obtaining module 410 starts the camera function of the terminal device and captures images of the object to be recognized in at least two modal image acquisition modes, obtaining the face feature data of the object to be recognized corresponding to the at least two modal images.
In practical applications, the face recognition apparatus of the embodiment of the present specification may be used in a payment device. When the object to be recognized uses the payment device to perform face payment, the face recognition device based on the payment device can acquire face fusion feature data corresponding to the object to be recognized so as to be used for identity verification required to be executed in the payment process.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present specification. Referring to fig. 5, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include the hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The memory is used for storing a program. In particular, the program may include program code comprising computer operating instructions. The memory may include an internal memory and a non-volatile memory, and provides instructions and data to the processor.
Optionally, the processor reads a corresponding computer program from the non-volatile memory into the internal memory and runs it, forming the training apparatus of the face recognition model at the logic level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
and acquiring the face feature data of the sample object corresponding to the at least two modal images.
And performing feature fusion on the face feature data of the sample object corresponding to the at least two modal images to obtain face fusion feature data corresponding to the sample object.
And training the face recognition model by taking the face fusion characteristic data corresponding to the sample object as the input of the face recognition model and taking the recognition classification label corresponding to the sample object as the output of the face recognition model.
Alternatively, the processor reads a corresponding computer program from the non-volatile memory into the internal memory and runs it, forming the face recognition apparatus at the logic level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
and acquiring the face feature data of the object to be recognized corresponding to at least two modal images.
And performing feature fusion on the face feature data of the at least two modal images corresponding to the object to be recognized to obtain the face fusion feature data corresponding to the object to be recognized.
And inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images of the sample object.
The training method disclosed in the embodiment shown in fig. 1 or the face recognition method disclosed in the embodiment shown in fig. 2 may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logical blocks disclosed in the embodiments of this specification may be implemented or executed by such a processor. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this specification may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or an EEPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It should be understood that the electronic device of the embodiments of this specification may implement the functions of the above-described training apparatus in the embodiment shown in fig. 3, or the functions of the above-described face recognition apparatus in the embodiment shown in fig. 4. Since the principle is the same, the detailed description is omitted here.
Of course, besides the software implementation, the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flows above is not limited to individual logic units and may also be hardware or logic devices.
Furthermore, the present specification embodiments also propose a computer-readable storage medium storing one or more programs, the one or more programs including instructions.
Wherein the above instructions, when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in fig. 1, and are specifically configured to perform the following steps:

Acquiring the face feature data of at least two modal images corresponding to a sample object.

Performing feature fusion on the face feature data of the at least two modal images corresponding to the sample object to obtain face fusion feature data corresponding to the sample object.

Training the face recognition model by taking the face fusion feature data corresponding to the sample object as the input of the face recognition model and the recognition classification label corresponding to the sample object as the output of the face recognition model.
Alternatively, the above instructions, when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in fig. 2, and are specifically configured to perform the following steps:

Acquiring the face feature data of at least two modal images corresponding to an object to be recognized.

Performing feature fusion on the face feature data of the at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized.

Inputting the face fusion feature data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is trained based on the face fusion feature data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion feature data corresponding to the sample object is obtained by fusing the face feature data of the at least two modal images corresponding to the sample object.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification. Moreover, all other embodiments obtained by a person skilled in the art without making any inventive step shall fall within the scope of protection of this document.

Claims (8)

1. A face recognition method, comprising:
acquiring face feature data of at least two modal images corresponding to an object to be recognized, which comprises: performing feature extraction on a target modal image of a sample object according to the feature dimensions of the target modal image among the at least two modal images corresponding to the sample object, to obtain face feature data of the target modal image corresponding to the sample object, wherein the feature dimensions of the target modal image are determined based on interpretation data obtained after a feature extraction model of the target modal image is trained, the feature extraction model of the target modal image is trained based on face feature data of the target modal image corresponding to other sample objects and recognition classification labels corresponding to the other sample objects, and the interpretation data comprises weight values of the feature vectors in the feature extraction model of the target modal image;
performing feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images of the sample object.
2. The method of claim 1, wherein
acquiring the face feature data of the at least two modal images corresponding to the object to be recognized comprises:
and starting a camera shooting function of the terminal equipment, and carrying out image acquisition on the object to be recognized in at least two modal image acquisition modes to obtain the face feature data of the object to be recognized corresponding to at least two modal images.
3. The method of claim 1, wherein
the face fusion feature data corresponding to the sample object is obtained in the following manner:
inputting the face feature data of the at least two modal images corresponding to the sample object into a feature fusion model to obtain the face fusion feature data corresponding to the sample object, wherein the feature fusion model comprises:

a convolution layer, which performs convolution processing on the face feature data of the at least two modal images corresponding to the sample object to obtain convolution-layer output feature data;

a pooling layer, which performs pooling processing on the convolution-layer output feature data based on a maximum pooling algorithm and/or a mean pooling algorithm to obtain pooling-layer output feature data;

and a connection layer, which combines and reduces the dimension of the pooling-layer output feature data to obtain the face fusion feature data corresponding to the sample object.
4. The method of claim 1, wherein
the feature dimensions of the target modal image are determined based on the feature vectors whose weight values reach a preset threshold in the feature extraction model of the target modal image.
5. The method according to any one of claims 1 to 4,
the at least two modal images include at least two of a near-infrared light face image, a visible color light face image, and a depth face image.
6. A face recognition apparatus comprising:
the acquisition module, which acquires face feature data of at least two modal images corresponding to an object to be recognized, comprising: performing feature extraction on a target modal image of a sample object according to the feature dimensions of the target modal image among the at least two modal images corresponding to the sample object, to obtain face feature data of the target modal image corresponding to the sample object, wherein the feature dimensions of the target modal image are determined based on interpretation data obtained after a feature extraction model of the target modal image is trained, the feature extraction model of the target modal image is trained based on face feature data of the target modal image corresponding to other sample objects and recognition classification labels corresponding to the other sample objects, and the interpretation data comprises weight values of the feature vectors in the feature extraction model of the target modal image;
the fusion module is used for carrying out feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and the recognition module is used for inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images with the sample object.
7. An electronic device includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program being executed by the processor to:
acquiring face feature data of at least two modal images corresponding to an object to be recognized, which comprises: performing feature extraction on a target modal image of a sample object according to the feature dimensions of the target modal image among the at least two modal images corresponding to the sample object, to obtain face feature data of the target modal image corresponding to the sample object, wherein the feature dimensions of the target modal image are determined based on interpretation data obtained after a feature extraction model of the target modal image is trained, the feature extraction model of the target modal image is trained based on face feature data of the target modal image corresponding to other sample objects and recognition classification labels corresponding to the other sample objects, and the interpretation data comprises weight values of the feature vectors in the feature extraction model of the target modal image;
performing feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images of the sample object.
8. A computer-readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring face feature data of at least two modal images corresponding to an object to be recognized, which comprises: performing feature extraction on a target modal image of a sample object according to the feature dimensions of the target modal image among the at least two modal images corresponding to the sample object, to obtain face feature data of the target modal image corresponding to the sample object, wherein the feature dimensions of the target modal image are determined based on interpretation data obtained after a feature extraction model of the target modal image is trained, the feature extraction model of the target modal image is trained based on face feature data of the target modal image corresponding to other sample objects and recognition classification labels corresponding to the other sample objects, and the interpretation data comprises weight values of the feature vectors in the feature extraction model of the target modal image;
performing feature fusion on the face feature data of at least two modal images corresponding to the object to be recognized to obtain face fusion feature data corresponding to the object to be recognized;
and inputting the face fusion characteristic data corresponding to the object to be recognized into a face recognition model to obtain a recognition result corresponding to the object to be recognized, wherein the face recognition model is obtained by training based on the face fusion characteristic data corresponding to the sample object and the recognition classification label corresponding to the sample object, and the face fusion characteristic data corresponding to the sample object is obtained by fusing the face characteristic data corresponding to the at least two modal images of the sample object.
CN202010941304.3A 2020-05-09 2020-05-09 Face recognition method and device and electronic equipment Active CN112084946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010941304.3A CN112084946B (en) 2020-05-09 2020-05-09 Face recognition method and device and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010941304.3A CN112084946B (en) 2020-05-09 2020-05-09 Face recognition method and device and electronic equipment
CN202010388083.1A CN111291740B (en) 2020-05-09 2020-05-09 Training method of face recognition model, face recognition method and hardware

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010388083.1A Division CN111291740B (en) 2020-05-09 2020-05-09 Training method of face recognition model, face recognition method and hardware

Publications (2)

Publication Number Publication Date
CN112084946A CN112084946A (en) 2020-12-15
CN112084946B (en) 2022-08-05

Family

ID=71022713

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010388083.1A Active CN111291740B (en) 2020-05-09 2020-05-09 Training method of face recognition model, face recognition method and hardware
CN202010941304.3A Active CN112084946B (en) 2020-05-09 2020-05-09 Face recognition method and device and electronic equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010388083.1A Active CN111291740B (en) 2020-05-09 2020-05-09 Training method of face recognition model, face recognition method and hardware

Country Status (1)

Country Link
CN (2) CN111291740B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022000334A1 (en) * 2020-06-30 2022-01-06 北京小米移动软件有限公司 Biological feature recognition method and apparatus, and device and storage medium
CN112016524B (en) * 2020-09-25 2023-08-08 北京百度网讯科技有限公司 Model training method, face recognition device, equipment and medium
CN112016523B (en) * 2020-09-25 2023-08-29 北京百度网讯科技有限公司 Cross-modal face recognition method, device, equipment and storage medium
CN112132230B (en) * 2020-10-09 2024-02-20 腾讯科技(深圳)有限公司 Image classification method and device
CN112257617B (en) * 2020-10-26 2022-05-20 支付宝(杭州)信息技术有限公司 Multi-modal target recognition method and system
CN112784240A (en) * 2021-01-25 2021-05-11 温州大学 Unified identity authentication platform and face identity recognition method thereof
CN112860069A (en) * 2021-02-19 2021-05-28 浙江大学 Finger pressure and gesture bimodal detection method and device
CN114581978A (en) * 2022-02-28 2022-06-03 支付宝(杭州)信息技术有限公司 Face recognition method and system
CN115984948B (en) * 2023-03-20 2023-05-26 广东广新信息产业股份有限公司 Face recognition method applied to temperature sensing and electronic equipment
CN117649692A (en) * 2023-10-24 2024-03-05 广州像素数据技术股份有限公司 Face recognition method based on fusion of multiple face image feature images and related equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7242810B2 (en) * 2004-05-13 2007-07-10 Proximex Corporation Multimodal high-dimensional data fusion for classification and identification
CN101404060B (en) * 2008-11-10 2010-06-30 北京航空航天大学 Human face recognition method based on visible light and near-infrared Gabor information amalgamation
KR101089287B1 (en) * 2010-06-09 2011-12-05 한국과학기술원 Automatic face recognition apparatus and method based on multiple face information fusion
CN103136504B (en) * 2011-11-28 2016-04-20 汉王科技股份有限公司 Face identification method and device
CN103903004B (en) * 2012-12-28 2017-05-24 汉王科技股份有限公司 Method and device for fusing multiple feature weights for face recognition
CN109492670A (en) * 2018-10-15 2019-03-19 初速度(苏州)科技有限公司 A kind of training system and method for human face recognition model
CN109753875A (en) * 2018-11-28 2019-05-14 北京的卢深视科技有限公司 Face identification method, device and electronic equipment based on face character perception loss
CN110046551B (en) * 2019-03-18 2021-04-20 中国科学院深圳先进技术研究院 Method and equipment for generating face recognition model
CN110516616A (en) * 2019-08-29 2019-11-29 河南中原大数据研究院有限公司 A kind of double authentication face method for anti-counterfeit based on extensive RGB and near-infrared data set
CN111079514A (en) * 2019-10-28 2020-04-28 湖北工业大学 Face recognition method based on CLBP and convolutional neural network
CN111104987B (en) * 2019-12-25 2023-08-01 盛景智能科技(嘉兴)有限公司 Face recognition method and device and electronic equipment

Also Published As

Publication number Publication date
CN112084946A (en) 2020-12-15
CN111291740B (en) 2020-08-18
CN111291740A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN112084946B (en) Face recognition method and device and electronic equipment
CN109815843B (en) Image processing method and related product
CN110188670B (en) Face image processing method and device in iris recognition and computing equipment
JP2022521038A (en) Face recognition methods, neural network training methods, devices and electronic devices
CN110728188B (en) Image processing method, device, system and storage medium
JP2023526899A (en) Methods, devices, media and program products for generating image inpainting models
CN114519877A (en) Face recognition method, face recognition device, computer equipment and storage medium
CN113505682A (en) Living body detection method and device
CN114419712A (en) Feature extraction method for protecting personal data privacy, model training method and hardware
CN111539382A (en) Image recognition model privacy risk assessment method and device and electronic equipment
CN115984977A (en) Living body detection method and system
CN115830362A (en) Image processing method, apparatus, device, medium, and product
CN112041847A (en) Providing images with privacy tags
CN115115552A (en) Image correction model training method, image correction device and computer equipment
CN112487885A (en) Payment method, payment device, electronic equipment and readable storage medium
CN114038045A (en) Cross-modal face recognition model construction method and device and electronic equipment
CN109741243B (en) Color sketch image generation method and related product
CN112949571A (en) Method for identifying age, and training method and device of age identification model
CN108694347B (en) Image processing method and device
CN114596638A (en) Face living body detection method, device and storage medium
WO2020124390A1 (en) Face attribute recognition method and electronic device
CN113657135A (en) In-vivo detection method and device based on deep learning and storage medium
CN111753656A (en) Feature extraction method, device, equipment and computer-readable storage medium
Han et al. Hyperbolic Face Anti-Spoofing
Grabovskyi et al. Facial recognition with using of the microsoft face API Service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant