CN110569826B - Face recognition method, device, equipment and medium - Google Patents

Face recognition method, device, equipment and medium

Info

Publication number
CN110569826B
CN110569826B (application CN201910882737.3A)
Authority
CN
China
Prior art keywords
image
training sample
region
face
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910882737.3A
Other languages
Chinese (zh)
Other versions
CN110569826A (en)
Inventor
唐健
石伟
陶昆
王志元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jieshun Science and Technology Industry Co Ltd
Original Assignee
Shenzhen Jieshun Science and Technology Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jieshun Science and Technology Industry Co Ltd
Priority to CN201910882737.3A
Publication of CN110569826A
Application granted
Publication of CN110569826B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The application discloses a face recognition method, apparatus, device and medium, wherein the method comprises the following steps: obtaining a training sample, the training sample being a face image comprising face label information; performing region division on the training sample to obtain a first region image and a second region image, the first region image being an image that includes the human eye region and the second region image being an image that does not; inputting the first region image and the second region image into a convolutional neural network model, performing convolution operations on each to obtain a corresponding first feature and second feature, fusing the first feature and the second feature to obtain a target feature, and obtaining a trained model by using the target feature, wherein the feature dimension of the first feature is lower than that of the second feature; and, when a face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model. In this way, the accuracy of face recognition for users wearing glasses can be improved.

Description

Face recognition method, device, equipment and medium
Technical Field
The present application relates to the field of face recognition technologies, and in particular, to a face recognition method, apparatus, device, and medium.
Background
With the rapid progress of image processing and pattern recognition technology, and thanks to the convenience of face recognition, face recognition systems based on video image processing are widely applied and are already in use in fields such as attendance checking, access control and security monitoring.
In the prior art, the face recognition effect is often influenced by factors such as lighting, make-up and the wearing of glasses; in particular, large black-framed glasses seriously degrade the face recognition effect and the user experience of face recognition equipment.
Disclosure of Invention
In view of this, an object of the present application is to provide a face recognition method, apparatus, device and medium that can reduce the influence of wearing glasses on face recognition and thereby improve the accuracy of glasses-worn face recognition. The specific scheme is as follows:
in a first aspect, the present application discloses a face recognition method, including:
obtaining a training sample; the training sample is a face image comprising face label information;
performing region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area;
inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, then fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature;
and when the face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model.
Optionally, the performing region division on the training sample to obtain a first region image and a second region image includes:
acquiring position information of eyes and nose tips in the training sample;
determining a first boundary line of the training sample by using the position information of the eyes and the nose tip;
and carrying out region division on the training sample by using the first boundary line to obtain a first region image and a second region image.
Optionally, the performing region division on the training sample to obtain a first region image and a second region image includes:
acquiring position information of a lower frame of the glasses in the training sample;
determining a second boundary line of the training sample by using the position information of the lower frame of the glasses;
and carrying out region division on the training sample by using the second boundary line to obtain a first region image and a second region image.
Optionally, the obtaining a training sample includes:
obtaining an initial training sample;
extracting a face region in the initial training sample;
and adjusting the sizes of the face regions extracted from different initial training samples to the same size to obtain an optimized training sample.
In a second aspect, the present application discloses a face recognition apparatus, comprising:
the sample acquisition module is used for acquiring a training sample; the training sample is a face image comprising face label information;
the region dividing module is used for performing region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area;
the model training module is used for inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature;
and the image recognition module is used for outputting a corresponding recognition result by utilizing the trained model when the face image to be recognized is obtained.
In a third aspect, the present application discloses an electronic device comprising a processor and a memory; wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the foregoing face recognition method.
In a fourth aspect, the application discloses a face recognition device, which includes the electronic device.
In a fifth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned face recognition method.
Therefore, in the present application, a training sample is first obtained, the training sample being a face image comprising face label information, and region division is performed on the training sample to obtain a first region image and a second region image, the first region image being an image including the human eye region and the second region image being an image not including the human eye region; the first region image and the second region image are input to a convolutional neural network model, convolution operations are performed on each to obtain a corresponding first feature and second feature, the first feature and the second feature are fused to obtain a target feature, and a trained model is obtained by using the target feature; when the face image to be recognized is acquired, a corresponding recognition result is output by using the trained model. That is, the present application first divides the training sample into a first region image including the human eye region and a second region image not including the human eye region, and then reduces the proportion of the first region image features among all the face image features during training, so that the trained model reduces the influence of wearing glasses on face recognition and thereby improves the accuracy of glasses-worn face recognition.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a flowchart of a face recognition method disclosed in the present application;
Fig. 2 is a flowchart of a specific face recognition method disclosed in the present application;
Fig. 3 is a specific face image region division diagram disclosed in the present application;
Fig. 4 is a flowchart of a convolutional neural network model training method disclosed in the present application;
Fig. 5 is a flowchart of a specific face recognition method disclosed in the present application;
Fig. 6 is a schematic structural diagram of a face recognition apparatus disclosed in the present application;
Fig. 7 is a block diagram of an electronic device disclosed in the present application;
Fig. 8 is a structural diagram of a face recognition device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
In the prior art, the face recognition effect is often influenced by factors such as lighting, make-up and the wearing of glasses; in particular, large black-framed glasses seriously degrade the face recognition effect and the user experience of face recognition equipment. Therefore, the present application provides a face recognition scheme that can reduce the influence of wearing glasses on face recognition and thereby improve the accuracy of glasses-worn face recognition.
Referring to fig. 1, an embodiment of the present application discloses a face recognition method, which includes:
step S11: obtaining a training sample; the training samples are face images including face label information.
In a specific implementation manner, in this embodiment, an initial training sample may be obtained first, then a face region in the initial training sample is extracted, and the sizes of the face regions extracted from different initial training samples are adjusted to the same size to obtain an optimized training sample. For example, all images in VGGFace2, about 3.2 million images of 9,131 individuals, may be taken as the initial training samples and then detected and aligned with the MTCNN (multi-task cascaded convolutional networks) face detection and alignment algorithm to obtain the optimized training samples, where all training samples may be processed to a preset size such as 128 × 128.
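As an illustration of this sample-preparation step, the following Python sketch normalizes a detected face crop to the preset size. It assumes OpenCV for resizing, and `detect_and_align` is a hypothetical stand-in for the MTCNN detection-and-alignment stage, whose interface the patent does not specify:

```python
import cv2

TARGET_SIZE = (128, 128)  # preset size named in the embodiment

def prepare_sample(image_bgr, detect_and_align):
    """Extract the face region from an initial training sample and
    normalize it to a fixed size (step S11). `detect_and_align` is a
    hypothetical MTCNN-style helper returning an aligned face crop."""
    face = detect_and_align(image_bgr)
    if face is None:
        return None  # no face detected; skip this sample
    return cv2.resize(face, TARGET_SIZE, interpolation=cv2.INTER_LINEAR)
```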
Step S12: carrying out region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area.
In a specific embodiment, the training sample may be divided into a first region image and a second region image of the same size; specifically, the training sample may be divided into a first region image and a second region image each of size 64 × 128. For example, referring to fig. 3, fig. 3 is a specific face image region division diagram disclosed in the embodiment of the present application.
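A minimal sketch of this division, assuming the normalized 128 × 128 sample is split along a horizontal boundary row; fixing the row at the midline yields the two 64 × 128 halves mentioned above, while later embodiments compute the row from landmarks:

```python
import numpy as np

def split_regions(face, boundary_row=64):
    """Divide a normalized face image into the first region (containing
    the eyes) and the second region (without the eyes), per step S12."""
    face = np.asarray(face)
    first_region = face[:boundary_row]   # upper part, includes the eyes
    second_region = face[boundary_row:]  # lower part, excludes the eyes
    return first_region, second_region
```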
Step S13: inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, then fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature.
For example, in this embodiment, an open-source ResNet20 may be used as the convolutional neural network model to be trained. The first region image and the second region image are input to ResNet20, and operations such as convolution and PReLU are performed on each to obtain two partial features. In this embodiment, the feature dimension after final fusion is set to 512, and the features of the first region image and the second region image are set to 112 and 400 dimensions respectively; the two partial features are then fused into a new 512-dimensional feature, which is finally sent to the open-source AM-Softmax loss function for continued training to obtain a trained model. Referring to fig. 4, fig. 4 is a flowchart of the convolutional neural network model training method disclosed in the present application.
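The two-branch fusion described here can be sketched in PyTorch as below. Only the dimensions (112, 400 and 512) follow the text; the tiny convolution stacks merely stand in for the open-source ResNet20 branches, and the AM-Softmax loss applied afterwards is not reproduced:

```python
import torch
import torch.nn as nn

class TwoBranchFaceNet(nn.Module):
    """Sketch of step S13: separate convolution on each region image,
    then fusion of the two partial features into one target feature."""

    def __init__(self, eye_dim=112, rest_dim=400):
        super().__init__()

        def branch(out_dim):
            # Stand-in for a ResNet20 branch: conv + PReLU stages,
            # global pooling, and a projection to the branch dimension.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.PReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.PReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, out_dim),
            )

        self.eye_branch = branch(eye_dim)    # first region (with eyes)
        self.rest_branch = branch(rest_dim)  # second region (no eyes)

    def forward(self, eye_img, rest_img):
        f1 = self.eye_branch(eye_img)      # 112-d first feature
        f2 = self.rest_branch(rest_img)    # 400-d second feature
        return torch.cat([f1, f2], dim=1)  # fused 512-d target feature
```

During training, the fused 512-dimensional feature would be fed, together with the face label, to the AM-Softmax loss; because only 112 of the 512 fused dimensions come from the eye region, the eye-region features carry less weight in the final representation.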
Step S14: and when the face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model.
In this embodiment, a camera may be used to acquire the face image to be recognized; the acquired image is then input to the trained model, which outputs the corresponding recognition result.
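A hedged sketch of this recognition step, assuming the enrolled gallery features have been pre-computed with the same model and L2-normalized; cosine-similarity matching is an illustrative choice the patent does not prescribe:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(model, eye_img, rest_img, gallery_feats, gallery_labels):
    """Embed a probe face and match it against enrolled features
    (step S14). gallery_feats: (N, 512) tensor, assumed L2-normalized."""
    feat = model(eye_img.unsqueeze(0), rest_img.unsqueeze(0))
    feat = F.normalize(feat, dim=1)
    sims = feat @ gallery_feats.t()  # cosine similarities, shape (1, N)
    best = sims.argmax(dim=1).item()
    return gallery_labels[best], sims[0, best].item()
```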
In addition, in this embodiment, the same face recognition data sets were used to run recognition tests with both the prior-art face recognition method and the face recognition method disclosed in this embodiment. The data sets comprise a glasses-worn face recognition data set and a standard face recognition data set, the standard data set containing both glasses-worn and glasses-free face images. The specific results are shown in Table 1. It can be seen that the face recognition method of the embodiment of the present application improves the recognition rate on both the standard and the glasses-worn data sets, and in particular improves the recognition rate on the glasses-worn data set by about 3%, which indicates that the method is effective for improving the glasses-worn face recognition rate.
TABLE 1

Method                                        Standard face recognition data set   Glasses-worn face recognition data set
Existing face recognition method              96.75%                               83.49%
Face recognition method of this application   97.26%                               86%
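As a sketch of how such recognition rates could be computed, assuming a labeled probe set and the `recognize` helper sketched above; the evaluation protocol is an assumption, since the patent reports only the resulting rates:

```python
def recognition_rate(model, probes, gallery_feats, gallery_labels):
    """Fraction of probe faces whose top gallery match carries the true
    label. probes: iterable of ((eye_img, rest_img), true_label)."""
    correct, total = 0, 0
    for (eye_img, rest_img), true_label in probes:
        pred, _ = recognize(model, eye_img, rest_img,
                            gallery_feats, gallery_labels)
        correct += int(pred == true_label)
        total += 1
    return correct / max(total, 1)
```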
Therefore, the embodiment of the application first obtains a training sample, the training sample being a face image comprising face label information, and performs region division on the training sample to obtain a first region image and a second region image, the first region image being an image including the human eye region and the second region image being an image not including the human eye region; the first region image and the second region image are input to a convolutional neural network model, convolution operations are performed on each to obtain a corresponding first feature and second feature, the first feature and the second feature are fused to obtain a target feature, and a trained model is obtained by using the target feature; when the face image to be recognized is acquired, a corresponding recognition result is output by using the trained model. That is, the embodiment of the application divides the training sample into a first region image including the human eye region and a second region image not including the human eye region, and then reduces the proportion of the first region image features among all the face image features during training, so that the trained model reduces the influence of wearing glasses on face recognition and thereby improves the accuracy of face recognition for users wearing glasses.
Referring to fig. 2, an embodiment of the present application discloses a specific face recognition method, including:
step S21: obtaining a training sample; the training samples are face images including face label information.
For the specific process of the step S21, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Step S22: and acquiring the position information of the eyes and the nose tip in the training sample.
Step S23: determining a first boundary line of the training sample using the position information of the eyes and the nose tip.
Step S24: and carrying out region division on the training sample by using the first boundary line to obtain a first region image and a second region image.
In a specific implementation, this embodiment may obtain the position information of three points, namely the two eyes and the nose tip, in the training sample by using the MTCNN detection algorithm, and determine the first boundary line of the training sample by using the position information of the eyes and the nose tip. For example, a first reference line segment containing the two eye points is determined from the position information of the eyes, and a line segment parallel to the first reference line segment and lying between the nose tip position and the first reference line segment is then taken as the first boundary line. The training sample is then divided by the first boundary line to obtain a first region image and a second region image, wherein the first region image is an image including the human eye region and the second region image is an image not including the human eye region.
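A small sketch of steps S22 to S24, assuming (x, y) pixel landmarks from an MTCNN-style detector; the interpolation weight `alpha`, which places the boundary between the eye line and the nose tip, is an assumed parameter that the patent leaves open:

```python
def first_boundary_row(left_eye, right_eye, nose_tip, alpha=0.5):
    """Row index of the first boundary line: a horizontal line lying
    between the eye line (first reference line) and the nose tip."""
    eye_row = (left_eye[1] + right_eye[1]) / 2.0
    return int(round(eye_row + alpha * (nose_tip[1] - eye_row)))

def divide_by_row(face, row):
    """Split the face image at the boundary row into the first region
    (with eyes) and the second region (without eyes)."""
    return face[:row], face[row:]
```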
Step S25: inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, then fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature.
Step S26: and when the face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model.
For the specific processes of the steps S25 and S26, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Referring to fig. 5, an embodiment of the present application discloses a specific face recognition method, including:
step S31: obtaining a training sample; the training samples are face images including face label information.
For the specific process of the step S31, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Step S32: and acquiring the position information of the lower frame of the glasses in the training sample.
Step S33: and determining a second boundary line of the training sample by using the position information of the lower frame of the glasses.
Step S34: and carrying out region division on the training sample by using the second boundary line to obtain a first region image and a second region image.
In a specific implementation, the training sample in this embodiment may be a face image with glasses. In this embodiment, an image detection algorithm may be used to obtain the position information of two points on the lower frame of the glasses in the training sample, and the second boundary line of the training sample is then determined from this position information. For example, the line segment containing the two lower-frame points may be taken directly as the second boundary line; alternatively, that line segment may be taken as a second reference line, and a line segment parallel to the second reference line and lying between it and a pre-detected nose position is taken as the second boundary line. The training sample is then divided by the second boundary line to obtain a first region image and a second region image, wherein the first region image is an image including the human eye region, that is, an image including the region with glasses, and the second region image is an image not including the human eye region.
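The glasses-based variant can be sketched the same way, assuming two (x, y) points detected on the lower frame of the glasses; the optional shift toward a pre-detected nose position and its weight `alpha` are assumptions covering the two alternatives described above:

```python
def second_boundary_row(frame_left, frame_right, nose_tip=None, alpha=0.5):
    """Row index of the second boundary line (steps S32 to S34), taken
    from two points on the lower frame of the glasses; if a pre-detected
    nose position is given, the line is shifted toward it."""
    frame_row = (frame_left[1] + frame_right[1]) / 2.0
    if nose_tip is None:
        return int(round(frame_row))
    return int(round(frame_row + alpha * (nose_tip[1] - frame_row)))
```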
Step S35: inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, then fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature.
Step S36: and when the face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model.
For the specific processes of the steps S35 and S36, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Referring to fig. 6, an embodiment of the present application discloses a face recognition apparatus, including:
a sample obtaining module 11, configured to obtain a training sample; the training sample is a face image comprising face label information;
the region dividing module 12 is configured to perform region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area;
the model training module 13 is configured to input the first region image and the second region image into a convolutional neural network model, perform convolution operation on the first region image and the second region image respectively to obtain a corresponding first feature and a corresponding second feature, then fuse the first feature and the second feature to obtain a target feature, and obtain a trained model by using the target feature; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature;
and the image recognition module 14 is configured to output a corresponding recognition result by using the trained model when the face image to be recognized is acquired.
Therefore, the embodiment of the application first obtains a training sample, the training sample being a face image comprising face label information, and performs region division on the training sample to obtain a first region image and a second region image, the first region image being an image including the human eye region and the second region image being an image not including the human eye region; the first region image and the second region image are input to a convolutional neural network model, convolution operations are performed on each to obtain a corresponding first feature and second feature, the first feature and the second feature are fused to obtain a target feature, and a trained model is obtained by using the target feature; when the face image to be recognized is acquired, a corresponding recognition result is output by using the trained model. That is, the embodiment of the application divides the training sample into a first region image including the human eye region and a second region image not including the human eye region, and then reduces the proportion of the first region image features among all the face image features during training, so that the trained model reduces the influence of wearing glasses on face recognition and thereby improves the accuracy of face recognition for users wearing glasses.
The sample acquiring module 11 may specifically include:
and the initial sample acquisition sub-module is used for acquiring an initial training sample.
And the face region extraction submodule is used for extracting the face region in the initial training sample.
And the face region adjusting submodule is used for adjusting the sizes of the face regions extracted from different initial training samples to the same size so as to obtain an optimized training sample.
In a specific embodiment, the area dividing module 12 may include:
The target position acquisition submodule is used for acquiring the position information of the eyes and the nose tip in the training sample;
a first boundary line determining sub-module for determining a first boundary line of the training sample using the position information of the eyes and the nose tip;
and the area division submodule is used for carrying out area division on the training sample by utilizing the first boundary line to obtain a first area image and a second area image.
In another specific embodiment, the area dividing module 12 may include:
The target position obtaining submodule is used for obtaining the position information of the lower frame of the glasses in the training sample;
the second boundary line determining submodule is used for determining a second boundary line of the training sample by utilizing the position information of the lower frame of the glasses;
and the region division submodule is used for performing region division on the training sample by using the second boundary line to obtain a first region image and a second region image.
Referring to fig. 7, an embodiment of the present application discloses an electronic device, which includes a processor 21 and a memory 22, wherein the memory 22 is used for storing a computer program, and the processor 21 is configured to execute the computer program to implement the following steps:
obtaining a training sample; the training sample is a face image comprising face label information; performing region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area; inputting the first area image and the second area image into a convolutional neural network model, respectively performing convolution operation on the first area image and the second area image to obtain corresponding first features and second features, then fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature; and when the face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model.
Therefore, the embodiment of the application first obtains a training sample, the training sample being a face image comprising face label information, and performs region division on the training sample to obtain a first region image and a second region image, the first region image being an image including the human eye region and the second region image being an image not including the human eye region; the first region image and the second region image are input to a convolutional neural network model, convolution operations are performed on each to obtain a corresponding first feature and second feature, the first feature and the second feature are fused to obtain a target feature, and a trained model is obtained by using the target feature; when the face image to be recognized is acquired, a corresponding recognition result is output by using the trained model. That is, the embodiment of the application divides the training sample into a first region image including the human eye region and a second region image not including the human eye region, and then reduces the proportion of the first region image features among all the face image features during training, so that the trained model reduces the influence of wearing glasses on face recognition and thereby improves the accuracy of face recognition for users wearing glasses.
In this embodiment, when the processor 21 executes the computer program stored in the memory 22, the following steps may be specifically implemented: acquiring the position information of the eyes and the nose tip in the training sample; determining a first boundary line of the training sample by using the position information of the eyes and the nose tip; and performing region division on the training sample by using the first boundary line to obtain a first region image and a second region image.
In this embodiment, when the processor 21 executes the computer program stored in the memory 22, the following steps may be specifically implemented: acquiring the position information of the lower frame of the glasses in the training sample; determining a second boundary line of the training sample by using the position information of the lower frame of the glasses; and performing region division on the training sample by using the second boundary line to obtain a first region image and a second region image.
In this embodiment, when the processor 21 executes the computer program stored in the memory 22, the following steps may be specifically implemented: obtaining an initial training sample; extracting the face region in the initial training sample; and adjusting the sizes of the face regions extracted from different initial training samples to the same size to obtain an optimized training sample.
Referring to fig. 8, an embodiment of the present application discloses a face recognition device 20, which includes the electronic device comprising the processor 21 and the memory 22 disclosed in the foregoing embodiments. For the steps that the processor 21 can specifically execute, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not described herein again.
Further, the face recognition device 20 in this embodiment may further specifically include:
and the camera 23 is used for acquiring a face image to be recognized.
Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, where the computer program implements the following steps when executed by a processor:
obtaining a training sample; the training sample is a face image comprising face label information; carrying out region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area; inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, then fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a feature dimension of the first feature is lower than a feature dimension of the second feature; and when the face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model.
Therefore, the embodiment of the application first obtains a training sample, the training sample being a face image comprising face label information, and performs region division on the training sample to obtain a first region image and a second region image, the first region image being an image including the human eye region and the second region image being an image not including the human eye region; the first region image and the second region image are input to a convolutional neural network model, convolution operations are performed on each to obtain a corresponding first feature and second feature, the first feature and the second feature are fused to obtain a target feature, and a trained model is obtained by using the target feature; when the face image to be recognized is acquired, a corresponding recognition result is output by using the trained model. That is, the embodiment of the application divides the training sample into a first region image including the human eye region and a second region image not including the human eye region, and then reduces the proportion of the first region image features among all the face image features during training, so that the trained model reduces the influence of wearing glasses on face recognition and thereby improves the accuracy of face recognition for users wearing glasses.
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by a processor, the following steps may be specifically implemented: acquiring the position information of the eyes and the nose tip in the training sample; determining a first boundary line of the training sample by using the position information of the eyes and the nose tip; and performing region division on the training sample by using the first boundary line to obtain a first region image and a second region image.
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by a processor, the following steps may be specifically implemented: acquiring the position information of the lower frame of the glasses in the training sample; determining a second boundary line of the training sample by using the position information of the lower frame of the glasses; and performing region division on the training sample by using the second boundary line to obtain a first region image and a second region image.
In this embodiment, when the computer program stored in the computer-readable storage medium is executed by a processor, the following steps may be specifically implemented: obtaining an initial training sample; extracting the face region in the initial training sample; and adjusting the sizes of the face regions extracted from different initial training samples to the same size to obtain an optimized training sample.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The face recognition method, apparatus, device and medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and application scope according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (8)

1. A face recognition method, comprising:
acquiring a training sample; the training sample is a face image comprising face label information;
performing region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area;
inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, then fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a ratio of the feature dimension of the first feature to the feature dimension of the target feature is lower than a ratio of the feature dimension of the second feature to the feature dimension of the target feature;
and when the face image to be recognized is obtained, outputting a corresponding recognition result by using the trained model.
2. The face recognition method of claim 1, wherein the performing region division on the training sample to obtain a first region image and a second region image comprises:
acquiring position information of eyes and nose tips in the training sample;
determining a first boundary line of the training sample by using the position information of the eyes and the nose tip;
and carrying out region division on the training sample by using the first boundary line to obtain a first region image and a second region image.
3. The face recognition method of claim 1, wherein the performing region division on the training sample to obtain a first region image and a second region image comprises:
acquiring position information of a lower frame of the glasses in the training sample;
determining a second boundary line of the training sample by using the position information of the lower frame of the glasses;
and carrying out region division on the training sample by using the second boundary line to obtain a first region image and a second region image.
4. The face recognition method according to any one of claims 1 to 3, wherein the obtaining of the training samples comprises:
obtaining an initial training sample;
extracting a face region in the initial training sample;
and adjusting the sizes of the face regions extracted from different initial training samples to the same size to obtain an optimized training sample.
5. A face recognition apparatus, comprising:
the sample acquisition module is used for acquiring a training sample; the training sample is a face image comprising face label information;
the region dividing module is used for performing region division on the training sample to obtain a first region image and a second region image; the first area image is an image including a human eye area, and the second area image is an image not including the human eye area;
the model training module is used for inputting the first area image and the second area image into a convolutional neural network model, performing convolution operation on the first area image and the second area image respectively to obtain corresponding first features and second features, fusing the first features and the second features to obtain target features, and obtaining a trained model by using the target features; wherein a ratio of the feature dimension of the first feature to the feature dimension of the target feature is lower than a ratio of the feature dimension of the second feature to the feature dimension of the target feature;
and the image recognition module is used for outputting a corresponding recognition result by utilizing the trained model when the face image to be recognized is obtained.
6. An electronic device comprising a processor and a memory; wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the face recognition method according to any one of claims 1 to 4.
7. A face recognition device characterized by comprising the electronic device of claim 6.
8. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the face recognition method according to any one of claims 1 to 4.
CN201910882737.3A 2019-09-18 2019-09-18 Face recognition method, device, equipment and medium Active CN110569826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910882737.3A CN110569826B (en) 2019-09-18 2019-09-18 Face recognition method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110569826A CN110569826A (en) 2019-12-13
CN110569826B (en) 2022-05-24

Family

ID=68780904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910882737.3A Active CN110569826B (en) 2019-09-18 2019-09-18 Face recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110569826B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310808B (en) * 2020-02-03 2024-03-22 平安科技(深圳)有限公司 Training method and device for picture recognition model, computer system and storage medium
CN111598051B (en) * 2020-06-16 2023-11-14 腾讯科技(深圳)有限公司 Face verification method, device, equipment and readable storage medium
CN112115790A (en) * 2020-08-18 2020-12-22 北京嘀嘀无限科技发展有限公司 Face recognition method and device, readable storage medium and electronic equipment
CN112101261B (en) * 2020-09-22 2023-12-26 北京百度网讯科技有限公司 Face recognition method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163289A (en) * 2011-04-06 2011-08-24 北京中星微电子有限公司 Method and device for removing glasses from human face image, and method and device for wearing glasses in human face image
CN103902961A (en) * 2012-12-28 2014-07-02 汉王科技股份有限公司 Face recognition method and device
CN106407912A (en) * 2016-08-31 2017-02-15 腾讯科技(深圳)有限公司 Face verification method and apparatus
CN108846355A (en) * 2018-06-11 2018-11-20 腾讯科技(深圳)有限公司 Image processing method, face identification method, device and computer equipment
CN108985159A (en) * 2018-06-08 2018-12-11 平安科技(深圳)有限公司 Human-eye model training method, eye recognition method, apparatus, equipment and medium
CN108985155A (en) * 2018-06-06 2018-12-11 平安科技(深圳)有限公司 Mouth model training method, mouth recognition methods, device, equipment and medium
CN109033938A (en) * 2018-06-01 2018-12-18 上海阅面网络科技有限公司 A kind of face identification method based on ga s safety degree Fusion Features
CN109241934A (en) * 2018-09-21 2019-01-18 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109416727A (en) * 2016-10-18 2019-03-01 华为技术有限公司 Glasses minimizing technology and device in a kind of facial image
CN109934062A (en) * 2017-12-18 2019-06-25 比亚迪股份有限公司 Training method, face identification method, device and the equipment of eyeglasses removal model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358157B (en) * 2017-06-07 2020-10-02 创新先进技术有限公司 Face living body detection method and device and electronic equipment

Also Published As

Publication number Publication date
CN110569826A (en) 2019-12-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant