CN113239858A - Face detection model training method, face recognition method, terminal and storage medium - Google Patents


Info

Publication number
CN113239858A
CN113239858A CN202110588537.4A
Authority
CN
China
Prior art keywords
face
image
face recognition
algorithm
recognition method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110588537.4A
Other languages
Chinese (zh)
Inventor
陈登峰
陈鹏文
肖海燕
陈俊彤
赵亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Architecture and Technology filed Critical Xian University of Architecture and Technology
Priority to CN202110588537.4A priority Critical patent/CN113239858A/en
Publication of CN113239858A publication Critical patent/CN113239858A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The invention discloses a face detection model training method, a face recognition method, a terminal and a storage medium. A RetinaFace face detection algorithm is integrated with a FaceNet face recognition method, and an improved RetinaFace method is adopted in which the backbone network is an improved MobileNetV2 lightweight neural network. This lightens the backbone network and increases the speed of edge devices; at the same time, the combined method reduces the burden on the limited computing power of edge devices while ensuring the high detection accuracy of the algorithm model.

Description

Face detection model training method, face recognition method, terminal and storage medium
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a face detection model training method, a face recognition method, a terminal and a storage medium.
Background
Face recognition is one of the most successful applications of computer vision and has received wide attention. With advances in image processing and computer vision, and especially in deep learning, the field has made great progress over the past decade, and face recognition technology has reached new heights. However, because deep learning models consume large amounts of compute and are difficult to run on mobile devices, the deployment of face recognition is hindered. Considerable work is therefore needed to develop processing methods better suited to the task and to optimize algorithms that run on mobile devices. The greatest current challenge is to optimize deep learning algorithms so that they are lightweight, increase the speed of edge devices, reduce the burden on their limited computing power, and still ensure the high detection accuracy of the algorithm model.
Disclosure of Invention
The invention aims to overcome the above defects and provides a face detection model training method, a face recognition method, a terminal and a storage medium that can recognize a face efficiently and accurately, making the method suitable for edge devices and enabling edge computing.
In order to achieve the aim, the face detection model training method comprises the following steps:
acquiring a face sample image, labeling a face position and a face key point in the face sample image to form a labeling file, preprocessing the face sample image, and then normalizing;
and performing supervised training of the improved RetinaFace detection algorithm model on the normalized face sample images to obtain the target face detection model.
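The patent does not fix a particular normalization scheme for the preprocessed face sample images. A minimal sketch, assuming plain rescaling of pixel values into [0, 1], might look like this:

```python
import numpy as np

# Minimal sketch of the normalization step: scale uint8 pixel values of a
# face sample image into [0, 1] before supervised training. Mean/std
# normalization variants also exist; plain rescaling is assumed here.

def normalize_image(img_uint8):
    return img_uint8.astype(np.float32) / 255.0

img = np.full((224, 224, 3), 128, dtype=np.uint8)  # stand-in face sample image
norm = normalize_image(img)
print(norm.dtype, round(float(norm.max()), 3))
```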
The backbone network of the improved RetinaFace detection algorithm is a MobileNetV2 neural network whose main component is the inverted residual block (inverted_resblock);
The specific method for improving the RetinaFace detection algorithm is as follows:
increasing the dimension with a 1x1 convolution, then performing a depthwise separable convolution with a 3x3 convolution, and finally reducing the dimension with a 1x1 convolution, together with a residual part;
using an FPN feature pyramid: FPN construction is performed on the three feature layers of size 56x56x64, 28x28x128 and 14x14x256 finally generated by MobileNetV2;
feature extraction is further enhanced using SSH modules.
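The MobileNetV2 backbone is lightweight largely because the 3x3 convolution in each inverted residual block is depthwise separable. A rough parameter-count comparison illustrates why; the channel counts below are hypothetical examples, not values from the patent:

```python
# Illustrative weight counts (ignoring biases) for a standard 3x3 convolution
# versus the depthwise-separable form used in MobileNetV2 inverted residual blocks.

def standard_conv_params(k, c_in, c_out):
    # one k*k kernel across all input channels, per output channel
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # depthwise: one k*k kernel per input channel; pointwise: 1x1 across channels
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 128)   # 147456
sep = separable_conv_params(3, 128, 128)  # 1152 + 16384 = 17536
print(std, sep, round(std / sep, 1))      # roughly 8.4x fewer weights
```

The same substitution underlies the reduced backbone weight and the faster inference on edge devices claimed above.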
A face recognition method comprises the following steps:
step one, a face image or a real-time video is fed into the improved RetinaFace face detection algorithm model to obtain the detected face prediction frame;
step two, cropping by coordinates to obtain a local image of the face;
step three, performing face correction and alignment on the image to obtain a local face prediction frame;
step four, encoding the face with a face recognition algorithm, and finally comparing the face encoding result with the face encodings in the database to obtain the identity information of the face.
The face image input size is 224x224x3, where 224x224 is the image size and 3 is the number of channels.
Each face image has an annotation file comprising the path of the picture, the height and width of the face frame, the center position of the face frame, and five key points: left eye, right eye, left mouth corner, right mouth corner and nose.
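The patent lists the annotation fields but not their on-disk layout. A sketch of parsing one record, assuming a hypothetical whitespace-separated line format, could be:

```python
# Parse one hypothetical annotation line: image path, face-box height/width and
# centre, then five (x, y) keypoints (eyes, nose, mouth corners). The field
# order and format are assumptions for illustration only.

def parse_annotation(line):
    fields = line.strip().split()
    return {
        "path": fields[0],
        "box": {  # height, width, centre x, centre y of the face frame
            "h": float(fields[1]), "w": float(fields[2]),
            "cx": float(fields[3]), "cy": float(fields[4]),
        },
        # five keypoints as (x, y) pairs from the remaining ten numbers
        "keypoints": [(float(fields[i]), float(fields[i + 1]))
                      for i in range(5, 15, 2)],
    }

rec = parse_annotation(
    "img/0001.jpg 80 64 112 112 90 95 130 95 110 115 95 135 125 135")
print(rec["path"], len(rec["keypoints"]))  # img/0001.jpg 5
```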
The face detection algorithm model uses the improved RetinaFace detection algorithm; its backbone network is a MobileNetV2 neural network whose main component is the inverted residual block, which increases the dimension with a 1x1 convolution, performs a depthwise separable convolution with a 3x3 convolution, and finally reduces the dimension with a 1x1 convolution, together with a residual part.
The face prediction frame is cropped by its coordinates, taken relative to the upper-left corner of the face picture, and the cropped face frame is aligned.
The FaceNet algorithm is adopted: the face detection frame detected by the improved face detection algorithm is passed to the face recognition algorithm, and the aligned faces are fed into the FaceNet model to obtain the feature vector of each face.
The face encoding result is compared with the database by taking the Euclidean distance between feature vectors; after the shortest distance is found, a threshold is set, and if that distance is smaller than the set threshold, face recognition is achieved.
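The matching step above can be sketched as follows; the 128-dimensional vectors and the threshold value of 1.0 are illustrative assumptions, as the patent does not specify a threshold:

```python
import numpy as np

# Compare a query face encoding against a database of encodings, take the
# smallest Euclidean distance, and accept the identity only if it falls
# below a threshold (value assumed for illustration).

def match_face(query, database, threshold=1.0):
    names = list(database)
    dists = [float(np.linalg.norm(query - database[n])) for n in names]
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        return names[best], dists[best]
    return None, dists[best]  # unknown face

db = {"alice": np.zeros(128), "bob": np.ones(128)}
name, dist = match_face(np.full(128, 0.01), db)
print(name)  # alice
```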
A terminal comprising a memory configured to store instructions executable by a processor, and the processor configured to perform the face recognition method.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements a face recognition method.
Compared with the prior art, the invention combines a RetinaFace face detection algorithm with a FaceNet face recognition method, adopts an improved RetinaFace method with an improved MobileNetV2 lightweight neural network as the backbone network, and lightens the backbone network so as to increase the speed of edge devices.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the face detection network structure of the improved RetinaFace algorithm of the implementation method of the present invention;
FIG. 3 is a schematic diagram of the FPN feature pyramid formation combined with SSH enhanced feature extraction in the present invention;
FIG. 4 is a schematic diagram of the FaceNet algorithm structure of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
A face detection model training method comprises the following steps:
acquiring a face sample image, labeling a face position and a face key point in the face sample image to form a labeling file, preprocessing the face sample image, and then normalizing;
and performing supervised training of the improved RetinaFace detection algorithm model on the normalized face sample images to obtain the target face detection model.
The backbone network of the improved RetinaFace detection algorithm is a MobileNetV2 neural network whose main component is the inverted residual block;
The specific method for improving the RetinaFace detection algorithm is as follows:
firstly, increasing the dimension with a 1x1 convolution, then performing a depthwise separable convolution with a 3x3 convolution, and finally reducing the dimension with a 1x1 convolution, together with a residual part, to form the inverted residual block module;
using an FPN feature pyramid: FPN construction is performed on the three feature layers of size 56x56x64, 28x28x128 and 14x14x256 finally generated by MobileNetV2;
feature extraction is further enhanced using SSH modules.
The training environment is Windows 10 with TensorFlow 1.13 and Keras 2.56; the GPU is a GeForce GTX 1080Ti, and the software used is PyCharm with Python 3.6.
Referring to fig. 1, a face recognition method includes the following steps:
A face image or a video is fed into the face detection algorithm model to obtain a face prediction frame. The input size of the face image is 224x224x3, where 224x224 is the image size and 3 is the number of channels. Each face image has an annotation file comprising the path of the picture, the height and width of the face frame, the center position of the face frame, and five key points: left eye, right eye, left mouth corner, right mouth corner and nose.
Cropping by coordinates then yields a local image of the face: the coordinates of the face prediction frame, taken relative to the upper-left corner of the face picture, are used to crop the frame, and the cropped face frame is aligned;
then, carrying out face correction and alignment on the image to obtain a local face prediction frame;
and coding the face by using a face recognition algorithm, and finally comparing a face coding result with a database to obtain the identity information of the face.
The face detection algorithm model is the improved RetinaFace detection algorithm; its backbone network is a MobileNetV2 neural network whose main component is the inverted residual block, which increases the dimension with a 1x1 convolution, performs a depthwise separable convolution with a 3x3 convolution, and finally reduces the dimension with a 1x1 convolution, together with a residual part. The FaceNet algorithm is adopted: the face detection frame detected by the improved face detection algorithm is passed to the face recognition algorithm, and the aligned faces are fed into the FaceNet model to obtain the feature vector of each face.
The face encoding result is compared with the database by taking the Euclidean distance between feature vectors; after the shortest distance is found, a threshold is set, and if that distance is smaller than the set threshold, face recognition is achieved.
A terminal comprising a memory configured to store instructions executable by a processor, and the processor configured to perform the face recognition method.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements a face recognition method.
Referring to fig. 2, the input image of the modified MobileNetV2 network is 224x224x3. A strided convolution (s denotes the step size) yields a 112x112x16 feature map, which is batch-normalized (BatchNorm) and activated with the ReLU function before being fed into the inverted_resblock network. The first stage, with stride 1 and a 32-channel convolution kernel, gives a feature layer of shape 112x112x32. The next stage consists of two inverted_resblocks with stride 2, so A1 is obtained with shape 56x56x64; the following stage consists of six inverted_resblocks with stride 2, so A2 is obtained with shape 28x28x128; and the two inverted_resblocks of the last stage, with stride 2, finally give A3 with shape 14x14x256.
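The backbone shapes above can be checked with a simple trace; the stage grouping follows the description, and the helper below only tracks spatial size and channel count, not the convolutions themselves:

```python
# Hypothetical shape trace of the modified MobileNetV2 backbone, verifying
# that the stride-2 stages yield the A1/A2/A3 feature layers (56x56x64,
# 28x28x128, 14x14x256) that feed the FPN.

def apply_stage(shape, stride, channels):
    h, w, _ = shape
    return (h // stride, w // stride, channels)

shape = (224, 224, 3)
shape = apply_stage(shape, 2, 16)        # initial conv + BatchNorm + ReLU
shape = apply_stage(shape, 1, 32)        # inverted_resblock, stride 1
a1 = shape = apply_stage(shape, 2, 64)   # two inverted_resblocks -> A1
a2 = shape = apply_stage(shape, 2, 128)  # six inverted_resblocks -> A2
a3 = shape = apply_stage(shape, 2, 256)  # two inverted_resblocks -> A3
print(a1, a2, a3)  # (56, 56, 64) (28, 28, 128) (14, 14, 256)
```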
Referring to fig. 3, an FPN feature pyramid is formed from the A1, A2 and A3 feature layers obtained in fig. 2. Channel adjustment of A3 gives the B3 feature output layer; A3 is upsampled, channel-adjusted and feature-fused with A2 to obtain the B2 feature output layer; similarly, A2 is upsampled, channel-adjusted and feature-fused with A1 to obtain the B1 feature output layer. Finally, the SSH modules further enhance the feature extraction of B1, B2 and B3 and output the effective feature layers of 80x80, 40x40 and 20x20, from which the face frame and face key points are obtained. The SSH module enhances the receptive field by replacing the 5x5 and 7x7 convolutions with superpositions of 3x3 convolutions.
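The top-down FPN fusion described above can be sketched in numpy. Here the 1x1 channel adjustment is modelled as a matmul over the channel axis, upsampling as nearest-neighbour repetition, and the 64-channel output width is an assumption for illustration:

```python
import numpy as np

# Rough sketch of FPN top-down fusion: adjust A3's channels to get B3,
# upsample and fuse into A2 to get B2, then into A1 to get B1.

def adjust_channels(x, c_out, rng):
    w = rng.standard_normal((x.shape[-1], c_out)) * 0.01  # stand-in for 1x1 conv
    return x @ w

def upsample2x(x):
    return x.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour upsampling

rng = np.random.default_rng(0)
a1 = np.ones((56, 56, 64))
a2 = np.ones((28, 28, 128))
a3 = np.ones((14, 14, 256))

b3 = adjust_channels(a3, 64, rng)
b2 = adjust_channels(a2, 64, rng) + upsample2x(b3)
b1 = adjust_channels(a1, 64, rng) + upsample2x(b2)
print(b1.shape, b2.shape, b3.shape)  # (56, 56, 64) (28, 28, 64) (14, 14, 64)
```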
Referring to fig. 4, the obtained face prediction image is passed through a deep learning network for feature extraction and then L2-normalized to obtain a 128-dimensional feature vector.
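The final L2-normalization step can be sketched as follows; the raw input vector is a stand-in for the network's output, not real data:

```python
import numpy as np

# L2-normalize the raw 128-d network output so every face embedding lies on
# the unit hypersphere, making Euclidean distances directly comparable.

def l2_normalize(v, eps=1e-10):
    return v / (np.linalg.norm(v) + eps)

raw = np.arange(128, dtype=float)  # stand-in for the network's 128-d output
emb = l2_normalize(raw)
print(emb.shape, round(float(np.linalg.norm(emb)), 6))  # (128,) 1.0
```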

Claims (10)

1. A face detection model training method is characterized by comprising the following steps:
acquiring a face sample image, labeling a face position and a face key point in the face sample image to form a labeling file, preprocessing the face sample image, and then normalizing;
and performing supervised training of the improved RetinaFace detection algorithm model on the normalized face sample images to obtain the target face detection model.
2. The face detection model training method according to claim 1, wherein the backbone network of the improved RetinaFace detection algorithm is a MobileNetV2 neural network whose main component is the inverted residual block;
the specific method for improving the RetinaFace detection algorithm is as follows:
increasing the dimension with a 1x1 convolution, then performing a depthwise separable convolution with a 3x3 convolution, and finally reducing the dimension with a 1x1 convolution, together with a residual part, to form the inverted residual block module;
using an FPN feature pyramid: FPN construction is performed on the three feature layers of size 56x56x64, 28x28x128 and 14x14x256 finally generated by MobileNetV2;
feature extraction is further enhanced using SSH modules.
3. A face recognition method is characterized by comprising the following steps:
firstly, a face image or a real-time video is fed into the improved RetinaFace face detection algorithm model to obtain a face prediction frame;
intercepting by a coordinate mode to obtain a local image of the face;
thirdly, performing face correction on the image to align the image to obtain a local face prediction frame;
and fourthly, coding the face by using a face recognition algorithm, and finally comparing a face coding result with the face codes in the database to obtain the identity information of the face.
4. The face recognition method as claimed in claim 3, wherein the input size of the face image is 224x224x3, where 224x224 is the image size and 3 is the number of channels.
5. The face recognition method of claim 3, wherein each face image has an annotation file comprising the path of the picture, the height and width of the face frame, the center position of the face frame, and five key points: left eye, right eye, left mouth corner, right mouth corner and nose;
the face detection algorithm model uses the improved RetinaFace detection algorithm.
6. The face recognition method according to claim 3, wherein the face prediction frame is cropped by its coordinates, taken relative to the upper-left corner of the face picture, and the cropped face frame is aligned.
7. The face recognition method of claim 3, wherein the FaceNet algorithm is adopted: the face detection frame detected by the improved face detection algorithm is passed to the face recognition algorithm, and the aligned faces are fed into the FaceNet model to obtain the feature vector of each face.
8. The face recognition method of claim 3, wherein the face encoding result is compared with the database by taking the Euclidean distance between feature vectors; after the shortest distance is found, a threshold is set, and if the distance is smaller than the set threshold, face recognition is achieved.
9. A terminal comprising a memory configured to store instructions executable by a processor, and the processor configured to perform the method of any one of claims 1-8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202110588537.4A 2021-05-28 2021-05-28 Face detection model training method, face recognition method, terminal and storage medium Pending CN113239858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110588537.4A CN113239858A (en) 2021-05-28 2021-05-28 Face detection model training method, face recognition method, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110588537.4A CN113239858A (en) 2021-05-28 2021-05-28 Face detection model training method, face recognition method, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN113239858A true CN113239858A (en) 2021-08-10

Family

ID=77139273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110588537.4A Pending CN113239858A (en) 2021-05-28 2021-05-28 Face detection model training method, face recognition method, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113239858A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601717A (en) * 2022-10-19 2023-01-13 中诚华隆计算机技术有限公司(Cn) Deep learning-based traffic violation classification detection method and SoC chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263768A (en) * 2019-07-19 2019-09-20 深圳市科葩信息技术有限公司 A kind of face identification method based on depth residual error network
CN111652058A (en) * 2020-04-27 2020-09-11 青岛百灵信息科技股份有限公司 Computer face recognition device
CN112307453A (en) * 2020-11-09 2021-02-02 西安建筑科技大学 Personnel management method and system based on face recognition
WO2021068323A1 (en) * 2019-10-12 2021-04-15 平安科技(深圳)有限公司 Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium
CN112686214A (en) * 2021-01-26 2021-04-20 重庆大学 Face mask detection system and method based on Retinaface algorithm


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LIN, T.-Y. et al.: "Feature pyramid networks for object detection", PROCEEDINGS OF IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 30 November 2017 (2017-11-30) *
SAMUEL W. F. EARP et al.: "Face Detection with Feature Pyramids and Landmarks", ARXIV, 18 December 2019 (2019-12-18), pages 1 - 6 *
ZHANG XIUBAO; LIN ZIYUAN; TIAN WANXIN; WANG YAN; SHEN HAIFENG; YE JIEPING: "Face mask recognition technology in all-weather natural scenes", SCIENTIA SINICA INFORMATIONIS, no. 07, 31 July 2020 (2020-07-31) *
JING CHENKAI et al.: "A survey of face recognition technology based on deep convolutional neural networks", COMPUTER APPLICATIONS AND SOFTWARE, no. 1, 31 January 2018 (2018-01-31) *
CAI JUN; PENG CHENG; SHI XIANGWEN: "A lightweight face recognition algorithm based on MobieNetV2", JOURNAL OF COMPUTER APPLICATIONS, no. 1, 10 July 2020 (2020-07-10) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601717A (en) * 2022-10-19 2023-01-13 中诚华隆计算机技术有限公司(Cn) Deep learning-based traffic violation classification detection method and SoC chip
CN115601717B (en) * 2022-10-19 2023-10-10 中诚华隆计算机技术有限公司 Deep learning-based traffic offence behavior classification detection method and SoC chip

Similar Documents

Publication Publication Date Title
US11908244B2 (en) Human posture detection utilizing posture reference maps
CN112784764B (en) Expression recognition method and system based on local and global attention mechanism
CN111401257B (en) Face recognition method based on cosine loss under non-constraint condition
CN106557743B (en) Facial feature extraction system and method based on FECNN
EP3907653A1 (en) Action recognition method, apparatus and device and storage medium
CN111160269A (en) Face key point detection method and device
CN111274977B (en) Multitasking convolutional neural network model, using method, device and storage medium
CN106599028B (en) Book content searching and matching method based on video image processing
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN109325440B (en) Human body action recognition method and system
CN107704813B (en) Face living body identification method and system
WO2021051868A1 (en) Target location method and apparatus, computer device, computer storage medium
CN112381061B (en) Facial expression recognition method and system
CN112001859A (en) Method and system for repairing face image
CN111091075A (en) Face recognition method and device, electronic equipment and storage medium
CN111401192A (en) Model training method based on artificial intelligence and related device
CN111241924A (en) Face detection and alignment method and device based on scale estimation and storage medium
CN112381045A (en) Lightweight human body posture recognition method for mobile terminal equipment of Internet of things
CN106127193A (en) A kind of facial image recognition method
CN115830652A (en) Deep palm print recognition device and method
CN113239858A (en) Face detection model training method, face recognition method, terminal and storage medium
CN109993135A (en) A kind of gesture identification method based on augmented reality, system and device
CN116012913A (en) Model training method, face key point detection method, medium and device
CN116229528A (en) Living body palm vein detection method, device, equipment and storage medium
CN112784800B (en) Face key point detection method based on neural network and shape constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination