CN115050069A - Face and attribute recognition method and system based on deep learning, and computer equipment


Info

Publication number
CN115050069A
Authority
CN
China
Prior art keywords
face
convolution
result
inputting
residual error
Prior art date
Legal status
Pending
Application number
CN202210602950.6A
Other languages
Chinese (zh)
Inventor
游亚东
王一科
迟大鹏
于佳辰
贾林
涂静一
Current Assignee
Shenzhen Kewei Robot Technology Co., Ltd.
Original Assignee
Shenzhen Kewei Robot Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shenzhen Kewei Robot Technology Co., Ltd.
Priority: CN202210602950.6A
Publication: CN115050069A
Legal status: Pending

Classifications

    • G06V 40/161 Human faces: detection; localisation; normalisation
    • G06N 3/02 Neural networks; G06N 3/08 Learning methods
    • G06V 10/761 Proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/764 Recognition or understanding using classification, e.g. of video objects
    • G06V 10/82 Recognition or understanding using neural networks
    • G06V 20/40 Scenes; scene-specific elements in video content
    • G06V 40/168 Feature extraction; face representation
    • G06V 40/172 Classification, e.g. identification

Abstract

The invention discloses a face and attribute recognition method, system, and computer device based on deep learning. The method comprises: acquiring each frame of a surveillance video as a monitoring image, performing face detection to obtain the face position information and face key point information of each face, and cropping an initial face image according to the face position information; calculating a face rotation angle from the face key point information and rotating the image to obtain a target face image; inputting the target face image into a face feature coding model to obtain face attribution information; and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age and gender of the face, and outputting the face attribution, age, and gender information of the target face image. By obtaining the face attribution, age, and gender information with a face feature coding model and a face attribute recognition model respectively, the invention improves recognition efficiency and yields a more accurate recognition result.

Description

Face and attribute recognition method and system based on deep learning and computer equipment
Technical Field
The invention relates to the technical field of video editing, and in particular to a face and attribute recognition method, system, and computer device based on deep learning.
Background
Most existing face recognition methods are built on traditional techniques: their generalization ability is weak, their recognition error is large, their recognition speed is slow, and their accuracy is low. Moreover, they stop at recognizing the face itself and do not recognize face attributes.
Disclosure of Invention
Embodiments of the invention provide a face and attribute recognition method, system, and computer device based on deep learning, aiming to solve the problems of slow recognition speed and low accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a method for recognizing a face and attributes based on deep learning, including:
acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image;
calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image;
inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image.
In a second aspect, an embodiment of the present invention provides a face and attribute recognition system based on deep learning, including:
an initial face image obtaining unit, configured to acquire a surveillance video shot by a camera, capture each frame of the surveillance video as a monitoring image, perform face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and crop each face according to its face position information to obtain an initial face image;
a target face image obtaining unit, configured to calculate a face rotation angle of the initial face image according to the face key point information, rotate the initial face image by the face rotation angle, and perform size adjustment and edge processing on the rotated initial face image to obtain a target face image;
a face attribution information obtaining unit, configured to input the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and compare the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and a face attribute obtaining unit, configured to input the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and to output the face attribution information, age information, and gender information of the target face image.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the method for recognizing a face and attributes based on deep learning according to the first aspect.
Embodiments of the invention provide a face and attribute recognition method, system, and computer device based on deep learning, wherein the method comprises: acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image; calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image; inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information; and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image. By obtaining the face attribution, age, and gender information with a face feature coding model and a face attribute recognition model respectively, the embodiments improve recognition efficiency and yield a more accurate recognition result.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be derived from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart of a face and attribute recognition method based on deep learning according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a face and attribute recognition system based on deep learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for recognizing a face and attributes based on deep learning according to an embodiment of the present invention, where the method includes steps S101 to S104.
S101, acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image;
S102, calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image;
S103, inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
S104, inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image.
In this embodiment, face detection is performed on each frame of monitoring image in the surveillance video shot by the camera to obtain the face position information and face key point information of all faces in the frame, and an initial face image of each face is then cropped according to the face position information of that face. Each initial face image is rotated by its face rotation angle and subjected to size adjustment and edge processing to obtain the corresponding target face image. The target face image is then input into a face feature coding model and a face attribute recognition model for feature coding and face attribute recognition respectively, yielding the face attribution information, age information, and gender information of the target face image.
Specifically, a surveillance video shot by a USB camera is acquired and each frame is captured as a monitoring image. Whether a face exists in the monitoring image is detected; if so, the number of faces is further counted, and the face position information and face key point information of each face are obtained. In this embodiment there are five face key points: the left eye, the right eye, the nose tip, the left mouth corner, and the right mouth corner. A face frame is then generated from the face position information of each face, and the face is cropped according to the face frame to obtain the corresponding initial face image. Whether the face is level is judged from the left-eye and right-eye key points: if not, the included angle θ between the line through the two eyes and the horizontal is calculated, and the initial face image is rotated clockwise by θ about its center via an affine transformation so that the two eyes become level. A resize operation (size adjustment) then fixes the face size at 112 × 112, and the missing edge portions are zero-filled by padding, giving the target face image. The target face image is then input into the face feature coding model and the face attribute recognition model for feature coding and attribute recognition respectively. In the face feature coding model, the 112 × 112 target face image is encoded into a 128-dimensional coding vector, which is compared in a 1:N manner with the vectors in the face library, where N is the size of the face library; the face attribution information is determined by calculating the cosine distance and cosine similarity between faces. In the face attribute recognition model, the target face image is normalized and standardized, and the attribute recognition model then outputs the age information and gender information of the current face.
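As an illustration of this alignment step, the following minimal sketch (OpenCV and NumPy; the function and variable names are hypothetical) computes the eye angle, rotates the crop about its center by affine transformation, resizes toward 112 × 112, and zero-pads the missing edges:

    import cv2
    import numpy as np

    def align_face(face_img, left_eye, right_eye, out_size=112):
        # Angle between the line through the two eyes and the horizontal.
        dx = right_eye[0] - left_eye[0]
        dy = right_eye[1] - left_eye[1]
        theta = np.degrees(np.arctan2(dy, dx))

        # Affine rotation about the image center so the two eyes become level.
        h, w = face_img.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta, 1.0)
        rotated = cv2.warpAffine(face_img, M, (w, h))

        # Resize so the longer side equals out_size, then zero-pad the rest.
        scale = out_size / float(max(h, w))
        resized = cv2.resize(rotated, (max(1, int(round(w * scale))),
                                       max(1, int(round(h * scale)))))
        rh, rw = resized.shape[:2]
        return cv2.copyMakeBorder(resized, 0, out_size - rh, 0, out_size - rw,
                                  cv2.BORDER_CONSTANT, value=0)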
This embodiment recognizes the face through a 1:N comparison against the face library, judging the face attribution by calculating the cosine distance between the feature coding vector of the current face and each face in the face library. For L2-normalized feature vectors, the calculation formula is:

distance(A, B) = ‖A − B‖² = ‖A‖² + ‖B‖² − 2A·B = 2(1 − cos(A, B)), where cos(A, B) = (A·B) / (‖A‖‖B‖)

The simplified distance is thus 2(1 − cos(A, B)), where cos(A, B) is the cosine similarity between the two vectors and distance is the final cosine distance. Compared with the traditional Euclidean distance, calculating the similarity between faces with the cosine distance on high-dimensional vectors is simpler and requires less computation. Finally, the minimum of the N computed distance values indexes the label of the corresponding feature in the face library, and the face attribution information, such as Zhang San or Li Si, is output.
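Under these definitions, the 1:N comparison can be sketched as follows (NumPy; assumes L2-normalized 128-dimensional encodings, one library row per enrolled face, with hypothetical variable names):

    import numpy as np

    def identify(query, library, labels):
        # query: (128,) L2-normalized vector; library: (N, 128) L2-normalized rows.
        cos_sim = library @ query                 # cosine similarity with each face
        distances = 2.0 * (1.0 - cos_sim)         # simplified cosine distance
        return labels[int(np.argmin(distances))]  # label of the closest face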
In an embodiment, the performing face detection on each frame of monitoring image to obtain the face position information and face key point information of the face in each frame of monitoring image includes:
inputting each frame of monitoring image into a neural network model based on the MobileNet network to perform size adjustment and mean subtraction on the monitoring image, and performing face detection on the monitoring image with a face detection algorithm model to obtain the number of faces in the monitoring image and the face position information and face key point information corresponding to each face.
In this embodiment, a neural network model is built on a MobileNet backbone; the monitoring image is resized and mean-subtracted, and a face detection algorithm model then performs face detection on it to obtain the number of faces in the current frame together with the corresponding face position information and face key point information.
Specifically, a neural network model is built on a MobileNet backbone. The monitoring image is resized to 640 × 640, and a mean subtraction is then performed: the means [104, 117, 123] are subtracted from the R, G, and B channels respectively. The monitoring image is then passed through the face detection algorithm model, which outputs the number of faces detected in the current frame and the face position information and face key point information of each face.
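A sketch of this preprocessing (OpenCV and NumPy; the detector itself is abstracted away, and the BGR-to-RGB conversion is an assumption so that the listed means line up with the R, G, B channels):

    import cv2
    import numpy as np

    def preprocess_for_detection(frame):
        img = cv2.resize(frame, (640, 640))                  # resize to 640 x 640
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
        img -= np.array([104.0, 117.0, 123.0], np.float32)   # subtract R, G, B means
        return img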
In an embodiment, the inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector includes:
performing convolution on the target face image, inputting the result into a BN layer for batch normalization, and activating with an activation function to obtain a first convolution result;
inputting the first convolution result into a residual error network module for convolution processing to obtain a second convolution result;
and performing dimension reduction on the second convolution result with a pre-constructed dimension-reduction mapping matrix to obtain a current face feature vector of the specified dimension.
In this embodiment, the target face image is input into the convolution layer and BN layer in sequence, activated with an activation function, passed through the residual error network for convolution, and finally reduced in dimension with a pre-constructed dimension-reduction mapping matrix to obtain a current face feature vector of the specified dimension. This embodiment constructs a 25088 × 128 mapping matrix, so the parameter count of this step is 25088 × 128 ≈ 3.2M; this reduces the number of parameters by 75%, lowers the amount of computation, and increases the running speed.
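A sketch of this dimension-reduction step (PyTorch; interpreting the 25088 input width as a flattened 512 × 7 × 7 feature map is an assumption, as is the final L2 normalization, which matches the cosine comparison described above):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureHead(nn.Module):
        def __init__(self, in_features=25088, embed_dim=128):
            super().__init__()
            # 25088 x 128 mapping matrix: about 3.2M parameters.
            self.proj = nn.Linear(in_features, embed_dim, bias=False)

        def forward(self, x):                 # x: (B, 512, 7, 7)
            x = torch.flatten(x, 1)           # (B, 25088)
            x = self.proj(x)                  # (B, 128) coding vector
            return F.normalize(x, dim=1)      # L2-normalize for cosine comparison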
In an embodiment, the inputting the first convolution result into the residual error network module for convolution processing to obtain a second convolution result includes:
inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result, inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result, and inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result;
inputting the first rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a second front-section residual error result, inputting the second front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a second middle-section residual error result, and inputting the second middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a second rear-section residual error result;
inputting the second rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a third front-section residual error result, inputting the third front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a third middle-section residual error result, and inputting the third middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a third rear-section residual error result;
and inputting the third rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a fourth front-section residual error result, inputting the fourth front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a fourth middle-section residual error result, and inputting the fourth middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a fourth rear-section residual error result.
In this embodiment, the residual error network module includes a first residual unit, a second residual unit, a third residual unit, and a fourth residual unit, each composed of front-section, middle-section, and rear-section residual error network units. The first residual unit comprises one front-section, one middle-section, and one rear-section residual error network unit; the second residual unit comprises one front-section, two middle-section, and one rear-section unit; the third residual unit comprises one front-section, four middle-section, and one rear-section unit; and the fourth residual unit comprises one front-section, one middle-section, and one rear-section unit.
The processing procedure of the first residual error unit is as follows: inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result, inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result, and inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result; the processing procedure of the second residual error unit is as follows: inputting the first rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a second front-section residual error result, inputting the second front-section residual error result into two continuous middle-section residual error network units for convolution processing to obtain a second middle-section residual error result, and inputting the second middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a second rear-section residual error result; the processing procedure of the third residual error unit is as follows: inputting the second rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a third front-section residual error result, inputting the third front-section residual error result into four continuous middle-section residual error network units for convolution processing to obtain a third middle-section residual error result, and inputting the third middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a third rear-section residual error result; the processing procedure of the fourth residual unit is as follows: and inputting the third rear-section residual result into a front-section residual network unit for convolution processing to obtain a fourth front-section residual result, inputting the fourth front-section residual result into a middle-section residual network unit for convolution processing to obtain a fourth middle-section residual result, and inputting the fourth middle-section residual result into a rear-section residual network unit for convolution processing to obtain a fourth rear-section residual result.
Note that, in this embodiment, describing data as being input into a front-section, middle-section, or rear-section residual error network unit does not mean the same unit is reused repeatedly; rather, several units with the same structure are provided and arranged in the order described above.
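The layout of the four residual units can be sketched as follows (PyTorch; FrontUnit, MidUnit, and BackUnit stand for the front-section, middle-section, and rear-section units sketched later, a single channel width is used for simplicity, and flags such as which middle-section unit keeps its leading BN are omitted):

    import torch.nn as nn

    def make_residual_module(FrontUnit, MidUnit, BackUnit, width):
        def stage(n_mid):
            # Fresh instances each time: units share structure, not parameters.
            units = [FrontUnit(width)]
            units += [MidUnit(width) for _ in range(n_mid)]
            units += [BackUnit(width)]
            return nn.Sequential(*units)

        return nn.Sequential(
            stage(1),   # first residual unit:  front + 1 middle + rear
            stage(2),   # second residual unit: front + 2 middle + rear
            stage(4),   # third residual unit:  front + 4 middle + rear
            stage(1),   # fourth residual unit: front + 1 middle + rear
        )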
In the residual error network module, the residual connections in the front-section, middle-section, and rear-section residual error network units are constructed by connecting a 3 × 3 convolution kernel with stride 2 to a 1 × 1 kernel, which avoids the loss of local information caused by the 1 × 1, stride-2 convolution used in the traditional approach.
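One reading of this shortcut design, sketched under the assumption that the 3 × 3, stride-2 kernel comes first (PyTorch):

    import torch.nn as nn

    def downsample_shortcut(in_ch, out_ch):
        # 3x3 stride-2 then 1x1: every input position contributes, unlike a
        # lone 1x1 stride-2 convolution, which skips three of every four pixels.
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=1, bias=False),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )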
In an embodiment, the inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result includes:
inputting the first convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first front-section convolution result;
inputting the first front-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a second front-section convolution result;
and inputting the second front-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization to obtain a first batch-normalized result, and performing feature fusion on the first batch-normalized result and the first convolution result to obtain a first front-section residual error result.
In this embodiment, the first convolution result is input into a 1 × 1 convolution layer and a BN layer in sequence and activated with a ReLU activation function to obtain the first front-section convolution result; this is passed through a 3 × 3 convolution layer and a BN layer and activated with a ReLU activation function to obtain the second front-section convolution result; finally, the second front-section convolution result is processed by a 1 × 1 convolution layer and a BN layer to obtain the first batch-normalized result, which is feature-fused with the first convolution result to obtain the first front-section residual error result.
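A sketch of this front-section unit (PyTorch; channel widths are placeholders, and the skip addition assumes the input and output shapes match, e.g. by using a shortcut such as the one above when they do not):

    import torch.nn as nn

    class FrontUnit(nn.Module):
        # 1x1 conv-BN-ReLU -> 3x3 conv-BN-ReLU -> 1x1 conv-BN, then fuse with input.
        def __init__(self, channels, bottleneck=None):
            super().__init__()
            bottleneck = bottleneck or channels // 4
            self.conv1 = nn.Conv2d(channels, bottleneck, 1, bias=False)
            self.bn1 = nn.BatchNorm2d(bottleneck)
            self.conv2 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(bottleneck)
            self.conv3 = nn.Conv2d(bottleneck, channels, 1, bias=False)
            self.bn3 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))    # first front-section result
            out = self.relu(self.bn2(self.conv2(out)))  # second front-section result
            out = self.bn3(self.conv3(out))             # first batch-normalized result
            return out + x                              # feature fusion with the input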
In an embodiment, the inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result includes:
inputting the first front-section residual error result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first middle-section convolution result;
inputting the first middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a second middle-section convolution result;
inputting the second middle-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a third middle-section convolution result;
and inputting the third middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, and performing feature fusion on the result and the first front-section residual error result to obtain a first middle-section residual error result.
In this embodiment, the first front-section residual error result is input into a BN layer for batch normalization and activated with a ReLU activation function to obtain the first middle-section convolution result; this is passed through a 1 × 1 convolution layer and a BN layer and activated with a ReLU activation function to obtain the second middle-section convolution result; the second middle-section convolution result is then passed through a 3 × 3 convolution layer and a BN layer and activated with a ReLU activation function to obtain the third middle-section convolution result; finally, the third middle-section convolution result is convolved by a 1 × 1 convolution layer, and the result is feature-fused with the first front-section residual error result to obtain the first middle-section residual error result.
Note that when several consecutive middle-section residual error network units are provided, only the first of them inputs its data into a BN layer for batch normalization; in the remaining middle-section units this BN layer is removed and the data is activated directly with the ReLU activation function.
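A sketch of this middle-section unit (PyTorch, same placeholder conventions as above; the lead_bn flag reflects the note that only the first of several consecutive middle-section units keeps the leading BN layer):

    import torch.nn as nn

    class MidUnit(nn.Module):
        def __init__(self, channels, bottleneck=None, lead_bn=True):
            super().__init__()
            bottleneck = bottleneck or channels // 4
            # Only the first consecutive middle-section unit keeps this BN layer.
            self.lead_bn = nn.BatchNorm2d(channels) if lead_bn else nn.Identity()
            self.conv1 = nn.Conv2d(channels, bottleneck, 1, bias=False)
            self.bn1 = nn.BatchNorm2d(bottleneck)
            self.conv2 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(bottleneck)
            self.conv3 = nn.Conv2d(bottleneck, channels, 1, bias=False)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.lead_bn(x))            # first middle-section result
            out = self.relu(self.bn1(self.conv1(out)))  # second middle-section result
            out = self.relu(self.bn2(self.conv2(out)))  # third middle-section result
            out = self.conv3(out)                       # 1x1 conv, no BN before fusion
            return out + x                              # fuse with the unit's input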
In an embodiment, the inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result includes:
inputting the first middle-section residual error result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first rear-section convolution result;
inputting the first rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a second rear-section convolution result;
inputting the second rear-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a third rear-section convolution result;
and inputting the third rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, performing feature fusion on the result and the first middle-section residual error result, inputting the fused result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first rear-section residual error result.
In this embodiment, the first middle-section residual error result is first input into a BN layer for batch normalization and activated with a ReLU activation function to obtain the first rear-section convolution result; this is passed through a 1 × 1 convolution layer and a BN layer and activated with a ReLU activation function to obtain the second rear-section convolution result; the second rear-section convolution result is then passed through a 3 × 3 convolution layer and a BN layer and activated with a ReLU activation function to obtain the third rear-section convolution result; finally, the third rear-section convolution result is convolved by a 1 × 1 convolution layer, the result is feature-fused with the first middle-section residual error result, the fused result is input into a BN layer for batch normalization, and a ReLU activation function is applied to obtain the first rear-section residual error result.
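A sketch of this rear-section unit (PyTorch, same conventions; it differs from the middle-section unit only in the BN + ReLU applied after the fusion):

    import torch.nn as nn

    class BackUnit(nn.Module):
        def __init__(self, channels, bottleneck=None):
            super().__init__()
            bottleneck = bottleneck or channels // 4
            self.lead_bn = nn.BatchNorm2d(channels)
            self.conv1 = nn.Conv2d(channels, bottleneck, 1, bias=False)
            self.bn1 = nn.BatchNorm2d(bottleneck)
            self.conv2 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(bottleneck)
            self.conv3 = nn.Conv2d(bottleneck, channels, 1, bias=False)
            self.post_bn = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.lead_bn(x))            # first rear-section result
            out = self.relu(self.bn1(self.conv1(out)))  # second rear-section result
            out = self.relu(self.bn2(self.conv2(out)))  # third rear-section result
            out = self.conv3(out)
            return self.relu(self.post_bn(out + x))     # fuse, then BN + ReLU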
In an embodiment, the inputting the target face image into a face attribute recognition model for face attribute recognition to obtain age information and gender information of a face in the target face image includes:
normalizing each pixel of the target face image so that each pixel lies in [0, 1];
setting a corresponding standardization mean coefficient and standardization variance coefficient for each pixel channel of the target face image, so as to standardize the target face image;
inputting the standardized target face image into a ShuffleNet classification neural network for feature coding to obtain a corresponding target feature vector, and classifying the target feature vector according to a pre-constructed component mapping matrix to obtain the age information and gender information.
In this embodiment, the target face image is first normalized: each pixel is converted from an INT type to a FLOAT type and divided by 255, so that every pixel lies in [0, 1]. The image is then standardized and input into a ShuffleNet classification neural network for feature coding to obtain the corresponding target feature vector, which is classified according to a pre-constructed component mapping matrix to obtain the age information and gender information.
The specific standardization procedure processes the three RGB channels of the image separately: the standardization mean coefficient of channel 1 is set to 0.485 and its variance coefficient to 0.229; the mean coefficient of channel 2 is 0.456 and its variance coefficient 0.224; and the mean coefficient of channel 3 is 0.406 and its variance coefficient 0.225.
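These two steps match the standard torchvision preprocessing, sketched here for reference (the coefficients are exactly the values listed above):

    import torchvision.transforms as T

    # ToTensor converts INT pixels to FLOAT and divides by 255, giving [0, 1];
    # Normalize then standardizes each channel with the listed coefficients.
    attribute_preprocess = T.Compose([
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])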
After normalization and standardization, the reconstructed image matrix is passed through the ShuffleNet classification neural network, which encodes it into a 1024-dimensional feature vector. This feature vector is classified according to a pre-constructed 1024 × 2 mapping matrix, whose first output is the gender information (male/female) and whose second output is the age information (0-100 years old).
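A sketch of this classification step (PyTorch; the ShuffleNet backbone is abstracted away, and reading the two outputs of the 1024 × 2 mapping as a gender logit and an age regression value is an assumption based on the description):

    import torch
    import torch.nn as nn

    class AttributeHead(nn.Module):
        def __init__(self, feat_dim=1024):
            super().__init__()
            self.proj = nn.Linear(feat_dim, 2)   # the 1024 x 2 mapping matrix

        def forward(self, feats):                # feats: (B, 1024) ShuffleNet encoding
            out = self.proj(feats)
            gender = torch.sigmoid(out[:, 0])    # first output: male/female probability
            age = out[:, 1].clamp(0.0, 100.0)    # second output: age in years (0-100)
            return gender, age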
Referring to fig. 2, fig. 2 is a schematic block diagram of a deep learning based face and attribute recognition system according to an embodiment of the present invention, where the deep learning based face and attribute recognition system 200 includes:
an initial face image obtaining unit 201, configured to acquire a surveillance video shot by a camera, capture each frame of the surveillance video as a monitoring image, perform face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and crop each face according to its face position information to obtain an initial face image;
a target face image obtaining unit 202, configured to calculate a face rotation angle of the initial face image according to the face key point information, rotate the initial face image by the face rotation angle, and perform size adjustment and edge processing on the rotated initial face image to obtain a target face image;
a face attribution information obtaining unit 203, configured to input the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and compare the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and a face attribute obtaining unit 204, configured to input the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and to output the face attribution information, age information, and gender information of the target face image.
In an embodiment, the initial face image obtaining unit 201 includes:
and a face information obtaining unit, configured to input each frame of monitoring image into a neural network model based on the MobileNet network to perform size adjustment and mean subtraction on the monitoring image, and to perform face detection on the monitoring image with a face detection algorithm model to obtain the number of faces in the monitoring image and the face position information and face key point information corresponding to each face.
In an embodiment, the face attribution information obtaining unit 203 includes:
a first convolution result obtaining unit, configured to perform convolution on the target face image, input the result into a BN layer for batch normalization, and activate with an activation function to obtain a first convolution result;
a residual error network module processing unit, configured to input the first convolution result into the residual error network module for convolution processing to obtain a second convolution result;
and a current face feature vector obtaining unit, configured to perform dimension reduction on the second convolution result with a pre-constructed dimension-reduction mapping matrix to obtain a current face feature vector of the specified dimension.
In one embodiment, the residual network module processing unit includes:
the first residual error processing unit is used for inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result, inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result, and inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result;
the second residual error processing unit is used for inputting the first rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a second front-section residual error result, inputting the second front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a second middle-section residual error result, and inputting the second middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a second rear-section residual error result;
a third residual error processing unit, configured to input the second rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a third front-section residual error result, input the third front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a third middle-section residual error result, and input the third middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a third rear-section residual error result;
and a fourth residual error processing unit, configured to input the third rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a fourth front-section residual error result, input the fourth front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a fourth middle-section residual error result, and input the fourth middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a fourth rear-section residual error result.
In an embodiment, the first residual error processing unit includes:
a first front-section convolution result obtaining unit, configured to input the first convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a first front-section convolution result;
a second front-section convolution result obtaining unit, configured to input the first front-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a second front-section convolution result;
and a first front-section residual error result obtaining unit, configured to input the second front-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization to obtain a first batch-normalized result, and perform feature fusion on the first batch-normalized result and the first convolution result to obtain a first front-section residual error result.
In an embodiment, the first residual error processing unit further includes:
a first middle-section convolution result obtaining unit, configured to input the first front-section residual error result into a BN layer for batch normalization and activate with a ReLU activation function to obtain a first middle-section convolution result;
a second middle-section convolution result obtaining unit, configured to input the first middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a second middle-section convolution result;
a third middle-section convolution result obtaining unit, configured to input the second middle-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a third middle-section convolution result;
and a first middle-section residual error result obtaining unit, configured to input the third middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, and perform feature fusion on the result and the first front-section residual error result to obtain a first middle-section residual error result.
In an embodiment, the first residual error processing unit further includes:
a first rear-section convolution result obtaining unit, configured to input the first middle-section residual error result into a BN layer for batch normalization and activate with a ReLU activation function to obtain a first rear-section convolution result;
a second rear-section convolution result obtaining unit, configured to input the first rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a second rear-section convolution result;
a third rear-section convolution result obtaining unit, configured to input the second rear-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a third rear-section convolution result;
and a first rear-section residual error result obtaining unit, configured to input the third rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, perform feature fusion on the result and the first middle-section residual error result, input the fused result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a first rear-section residual error result.
In an embodiment, the face attribute obtaining unit 204 includes:
a normalization processing unit, configured to normalize each pixel of the target face image so that each pixel lies in [0, 1];
a standardization processing unit, configured to set a corresponding standardization mean coefficient and standardization variance coefficient for each pixel channel of the target face image, so as to standardize the target face image;
and an age and gender obtaining unit, configured to input the standardized target face image into a ShuffleNet classification neural network for feature coding to obtain a corresponding target feature vector, and to classify the target feature vector according to a pre-constructed component mapping matrix to obtain the age information and gender information.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the method for recognizing the human face and the attribute based on the deep learning when executing the computer program.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. The use of the phrase "including a" does not exclude the presence of other, identical elements in a process, method, article, or apparatus that comprises the same element, unless the context clearly dictates otherwise.

Claims (10)

1. A face and attribute recognition method based on deep learning is characterized by comprising the following steps:
acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image;
calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image;
inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image.
2. The method for recognizing the face and the attribute based on the deep learning of claim 1, wherein the step of performing the face detection on each frame of the monitored image to obtain the face position information and the face key point information of the face in each frame of the monitored image comprises the steps of:
inputting each frame of monitoring image into a neural network model based on the MobileNet network to perform size adjustment and mean subtraction on the monitoring image, and performing face detection on the monitoring image with a face detection algorithm model to obtain the number of faces in the monitoring image and the face position information and face key point information corresponding to each face.
3. The deep-learning-based face and attribute recognition method of claim 1, wherein inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector comprises:
performing convolution processing on the target face image, inputting the result into a batch normalization (BN) layer, and applying an activation function to obtain a first convolution result;
inputting the first convolution result into a residual network module for convolution processing to obtain a second convolution result;
and performing dimensionality reduction on the second convolution result with a pre-constructed dimensionality-reduction mapping matrix to obtain a current face feature vector of a specified dimensionality (an encoder skeleton follows this claim).
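A skeleton of the claim-3 encoder under stated assumptions: the 64 stem channels, PReLU activation, global average pooling, and 512-dimensional output are illustrative choices; only the ordering — convolution + BN + activation, then the residual module, then the mapping matrix — comes from the claim.

```python
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, feat_channels=512, embed_dim=512):
        super().__init__()
        self.stem = nn.Sequential(                 # convolution + BN + activation
            nn.Conv2d(3, 64, 3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.PReLU(64))
        self.backbone = backbone                   # residual network module (claim 4)
        self.proj = nn.Linear(feat_channels, embed_dim, bias=False)  # mapping matrix

    def forward(self, x):
        x = self.backbone(self.stem(x))            # second convolution result
        x = x.mean(dim=(2, 3))                     # pool to (N, feat_channels)
        return self.proj(x)                        # feature vector of specified dim
```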
4. The method of claim 3, wherein inputting the first convolution result into the residual network module for convolution processing to obtain a second convolution result comprises:
inputting the first convolution result into a front-section residual network unit for convolution processing to obtain a first front-section residual result, inputting the first front-section residual result into a middle-section residual network unit for convolution processing to obtain a first middle-section residual result, and inputting the first middle-section residual result into a rear-section residual network unit for convolution processing to obtain a first rear-section residual result;
inputting the first rear-section residual result into a front-section residual network unit for convolution processing to obtain a second front-section residual result, inputting the second front-section residual result into a plurality of consecutive middle-section residual network units for convolution processing to obtain a second middle-section residual result, and inputting the second middle-section residual result into a rear-section residual network unit for convolution processing to obtain a second rear-section residual result;
inputting the second rear-section residual result into a front-section residual network unit for convolution processing to obtain a third front-section residual result, inputting the third front-section residual result into a plurality of consecutive middle-section residual network units for convolution processing to obtain a third middle-section residual result, and inputting the third middle-section residual result into a rear-section residual network unit for convolution processing to obtain a third rear-section residual result;
and inputting the third rear-section residual result into a front-section residual network unit for convolution processing to obtain a fourth front-section residual result, inputting the fourth front-section residual result into a middle-section residual network unit for convolution processing to obtain a fourth middle-section residual result, and inputting the fourth middle-section residual result into a rear-section residual network unit for convolution processing to obtain a fourth rear-section residual result (a stage-assembly sketch follows this claim).
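A sketch of the claim-4 backbone layout: four stages, each composed of one front-section unit, one or more middle-section units, and one rear-section unit. The claim specifies a single middle unit in stages 1 and 4 and only "a plurality" in stages 2 and 3, so the counts (1, 3, 5, 1) are an assumption.

```python
import torch.nn as nn

def make_stage(front, middles, rear) -> nn.Sequential:
    """Compose one stage: front unit -> middle units -> rear unit."""
    return nn.Sequential(front, *middles, rear)

def make_backbone(front_unit, middle_unit, rear_unit, counts=(1, 3, 5, 1)):
    """Build the four claim-4 stages; the unit_* args are factories
    returning fresh nn.Module instances on each call."""
    stages = [make_stage(front_unit(),
                         [middle_unit() for _ in range(n)],
                         rear_unit())
              for n in counts]
    return nn.Sequential(*stages)
```

With the unit classes sketched after claims 5 to 7 below, a (simplified, fixed-width) backbone could be assembled as `make_backbone(lambda: FrontResidualUnit(256, 64), lambda: MiddleResidualUnit(256, 64), lambda: RearResidualUnit(256, 64))`; in practice channel widths would grow from stage to stage.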
5. The method of claim 4, wherein inputting the first convolution result into a front-section residual network unit for convolution processing to obtain a first front-section residual result comprises:
inputting the first convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a first front-section convolution result;
inputting the first front-section convolution result into a convolution layer with a 3 × 3 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a second front-section convolution result;
and inputting the second front-section convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization to obtain a first batch-normalization result, and performing feature fusion of the first batch-normalization result with the first convolution result to obtain the first front-section residual result (a unit sketch follows this claim).
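A minimal PyTorch reading of the claim-5 front-section unit: 1 × 1 conv-BN-ReLU, 3 × 3 conv-BN-ReLU, 1 × 1 conv-BN, then feature fusion (addition) with the unit input. The identity shortcut and the channel widths are illustrative assumptions; the claim does not describe the shortcut's shape handling.

```python
import torch.nn as nn

class FrontResidualUnit(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1, bias=False),   # 1x1 convolution
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),  # 3x3
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False),   # 1x1 convolution
            nn.BatchNorm2d(channels))                         # first batch-norm result

    def forward(self, x):
        return x + self.branch(x)   # feature fusion with the first convolution result
```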
6. The method of claim 4, wherein inputting the first front-section residual result into a middle-section residual network unit for convolution processing to obtain a first middle-section residual result comprises:
inputting the first front-section residual result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a first middle-section convolution result;
inputting the first middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a second middle-section convolution result;
inputting the second middle-section convolution result into a convolution layer with a 3 × 3 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a third middle-section convolution result;
and inputting the third middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel, and performing feature fusion of the convolution result with the first front-section residual result to obtain the first middle-section residual result (a unit sketch follows this claim).
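A sketch of the claim-6 middle-section unit, which pre-activates its input (BN + ReLU first) and fuses the final 1 × 1 convolution output with the unit input; channel widths are again assumed.

```python
import torch.nn as nn

class MiddleResidualUnit(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),  # pre-activation
            nn.Conv2d(channels, bottleneck, 1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False))   # no BN before fusion

    def forward(self, x):
        return x + self.branch(x)   # fusion with the incoming residual result
```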
7. The method of claim 4, wherein inputting the first middle-section residual result into a rear-section residual network unit for convolution processing to obtain a first rear-section residual result comprises:
inputting the first middle-section residual result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a first rear-section convolution result;
inputting the first rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a second rear-section convolution result;
inputting the second rear-section convolution result into a convolution layer with a 3 × 3 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a third rear-section convolution result;
and inputting the third rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel, performing feature fusion of the convolution result with the first middle-section residual result, inputting the fused result into a BN layer for batch normalization, and applying a ReLU activation function to obtain the first rear-section residual result (a unit sketch follows this claim).
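A sketch of the claim-7 rear-section unit: the same pre-activated bottleneck as the middle unit, but with an additional BN + ReLU applied after the feature fusion.

```python
import torch.nn as nn

class RearResidualUnit(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, bottleneck, 1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False))
        self.post = nn.Sequential(nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.post(x + self.branch(x))   # BN + ReLU after the fusion
```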
8. The deep-learning-based face and attribute recognition method of claim 1, wherein inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and the gender information of the face in the target face image comprises:
normalizing each pixel of the target face image so that every pixel value lies in [0, 1];
setting a corresponding standardization mean coefficient and standardization variance coefficient for each pixel channel of the target face image, so as to standardize the target face image;
and inputting the standardized target face image into a ShuffleNet classification neural network for feature coding to obtain a corresponding target feature vector, and classifying the target feature vector according to a pre-constructed component mapping matrix to obtain the age information and the gender information (a preprocessing sketch follows this claim).
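A sketch of the claim-8 preprocessing: scale pixels into [0, 1], then standardize each channel with its own mean and variance coefficients. The ImageNet statistics used here are an assumption; the claim only says each channel gets its own pair of coefficients.

```python
import numpy as np

def preprocess_for_attributes(face_img,
                              mean=(0.485, 0.456, 0.406),
                              std=(0.229, 0.224, 0.225)):
    """Normalize to [0, 1], then standardize per channel for the ShuffleNet."""
    x = face_img.astype(np.float32) / 255.0              # pixels into [0, 1]
    x = (x - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)
    return x.transpose(2, 0, 1)[None]                    # 1 x 3 x H x W input tensor
```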
9. A face and attribute recognition system based on deep learning, characterized by comprising:
an initial face image acquisition unit, configured to acquire a monitoring video shot by a camera, capture each frame of the monitoring video as a monitoring image, perform face detection on each frame of monitoring image to obtain face position information and face key point information of the face in each frame, and crop the face according to the face position information to obtain an initial face image;
a target face image acquisition unit, configured to calculate a face rotation angle of the initial face image according to the face key point information, rotate the initial face image by the face rotation angle, and perform size adjustment and edge processing on the rotated initial face image to obtain a target face image;
a face attribution information acquisition unit, configured to input the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and compare the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and a face attribute acquisition unit, configured to input the target face image into a face attribute recognition model for face attribute recognition to obtain age information and gender information of the face in the target face image, and to output the face attribution information, the age information and the gender information of the target face image.
10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the deep-learning-based face and attribute recognition method according to any one of claims 1 to 8.
CN202210602950.6A 2022-05-30 2022-05-30 Face and attribute recognition method and system based on deep learning and computer equipment Pending CN115050069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210602950.6A CN115050069A (en) 2022-05-30 2022-05-30 Face and attribute recognition method and system based on deep learning and computer equipment

Publications (1)

Publication Number Publication Date
CN115050069A true CN115050069A (en) 2022-09-13

Family

ID=83159913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210602950.6A Pending CN115050069A (en) 2022-05-30 2022-05-30 Face and attribute recognition method and system based on deep learning and computer equipment

Country Status (1)

Country Link
CN (1) CN115050069A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416671A (en) * 2023-06-12 2023-07-11 太平金融科技服务(上海)有限公司深圳分公司 Face image correcting method and device, electronic equipment and storage medium
CN116416671B (en) * 2023-06-12 2023-10-03 太平金融科技服务(上海)有限公司深圳分公司 Face image correcting method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Moghaddam et al. Probabilistic visual learning for object detection
Ravichandran et al. Categorizing dynamic textures using a bag of dynamical systems
Darom et al. Scale-invariant features for 3-D mesh models
EP2017770B1 (en) Face meta-data generation and face similarity calculation
CN109145745B (en) Face recognition method under shielding condition
Debiasi et al. PRNU variance analysis for morphed face image detection
US20120183212A1 (en) Identifying descriptor for person or object in an image
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
JP2005149506A (en) Method and apparatus for automatic object recognition/collation
Lepsøy et al. Statistical modelling of outliers for fast visual search
CN106980848A (en) Facial expression recognizing method based on warp wavelet and sparse study
JP6071002B2 (en) Reliability acquisition device, reliability acquisition method, and reliability acquisition program
CN106096517A (en) A kind of face identification method based on low-rank matrix Yu eigenface
CN111931548B (en) Face recognition system, method for establishing face recognition data and face recognition method
Demirkus et al. Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos
CN114359553B (en) Signature positioning method and system based on Internet of things and storage medium
JP2006344236A (en) Face metadata generation method and device, and face recognition method and system
US9081800B2 (en) Object detection via visual search
CN113592769A (en) Abnormal image detection method, abnormal image model training method, abnormal image detection device, abnormal image model training device and abnormal image model training medium
El-Abed et al. Quality assessment of image-based biometric information
CN115240029A (en) Training method of image regression model, image regression analysis method, medium, and terminal
CN115050069A (en) Face and attribute recognition method and system based on deep learning and computer equipment
JP4375571B2 (en) Face similarity calculation method and apparatus
US20200380288A1 (en) Proposal region filter for digital image processing
Kekre et al. Performance comparison of DCT and VQ based techniques for iris recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination