CN115050069A - Face and attribute recognition method and system based on deep learning, and computer equipment


Info

Publication number
CN115050069A
Authority
CN
China
Prior art keywords
face
convolution
result
inputting
residual error
Prior art date
Legal status
Pending
Application number
CN202210602950.6A
Other languages
Chinese (zh)
Inventor
游亚东
王一科
迟大鹏
于佳辰
贾林
涂静一
Current Assignee
Shenzhen Kewei Robot Technology Co., Ltd.
Original Assignee
Shenzhen Kewei Robot Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shenzhen Kewei Robot Technology Co., Ltd.
Priority: CN202210602950.6A
Publication: CN115050069A
Legal status: Pending

Classifications

    • G06V 40/161 Human faces: detection; localisation; normalisation
    • G06N 3/02 Neural networks; G06N 3/08 Learning methods
    • G06V 10/761 Proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/764 Recognition or understanding using classification, e.g. of video objects
    • G06V 10/82 Recognition or understanding using neural networks
    • G06V 20/40 Scenes; scene-specific elements in video content
    • G06V 40/168 Feature extraction; face representation
    • G06V 40/172 Classification, e.g. identification

Abstract

The invention discloses a face and attribute recognition method, system, and computer device based on deep learning. The method comprises: acquiring each frame of a surveillance video as a monitoring image, performing face detection to obtain the face position information and face key point information of each face, and cropping an initial face image according to the face position information; calculating a face rotation angle from the face key point information and rotating the image to obtain a target face image; inputting the target face image into a face feature coding model to obtain face attribution information; and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age and gender of the face, and outputting the face attribution, age, and gender information of the target face image. By obtaining the face attribution, age, and gender information with a face feature coding model and a face attribute recognition model respectively, the invention improves recognition efficiency and yields a more accurate recognition result.

Description

Face and attribute recognition method and system based on deep learning and computer equipment
Technical Field
The invention relates to the technical field of video editing, and in particular to a face and attribute recognition method, system, and computer device based on deep learning.
Background
Most existing face recognition methods are built on traditional techniques: their generalization ability is weak, their recognition error is large, their recognition speed is slow, and their accuracy is low. Moreover, they stop at recognizing the face itself and do not recognize face attributes.
Disclosure of Invention
Embodiments of the invention provide a face and attribute recognition method, system, and computer device based on deep learning, aiming to solve the problems of slow recognition speed and low accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a method for recognizing a face and attributes based on deep learning, including:
acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image;
calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image;
inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image.
In a second aspect, an embodiment of the present invention provides a face and attribute recognition system based on deep learning, including:
an initial face image obtaining unit, configured to acquire a surveillance video shot by a camera, capture each frame of the surveillance video as a monitoring image, perform face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and crop each face according to its face position information to obtain an initial face image;
a target face image obtaining unit, configured to calculate a face rotation angle of the initial face image according to the face key point information, rotate the initial face image by the face rotation angle, and perform size adjustment and edge processing on the rotated initial face image to obtain a target face image;
a face attribution information obtaining unit, configured to input the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and compare the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and a face attribute obtaining unit, configured to input the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and to output the face attribution information, age information, and gender information of the target face image.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the method for recognizing a face and attributes based on deep learning according to the first aspect.
Embodiments of the invention provide a face and attribute recognition method, system, and computer device based on deep learning, wherein the method comprises: acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image; calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image; inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information; and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image. By obtaining the face attribution, age, and gender information with a face feature coding model and a face attribute recognition model respectively, the embodiments improve recognition efficiency and yield a more accurate recognition result.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be derived from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart of a face and attribute recognition method based on deep learning according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a face and attribute recognition system based on deep learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for recognizing a face and attributes based on deep learning according to an embodiment of the present invention, where the method includes steps S101 to S104.
S101, acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image;
S102, calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image;
S103, inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
S104, inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image.
In this embodiment, face detection is performed on each frame of monitoring image in the surveillance video shot by the camera to obtain the face position information and face key point information of all faces in the frame, and an initial face image of each face is then cropped according to the face position information of that face. Each initial face image is rotated by its face rotation angle and subjected to size adjustment and edge processing to obtain the corresponding target face image. The target face image is then input into a face feature coding model and a face attribute recognition model for feature coding and face attribute recognition respectively, yielding the face attribution information, age information, and gender information of the target face image.
Specifically, a surveillance video shot by a USB camera is acquired and each frame is captured as a monitoring image. Whether a face exists in the monitoring image is detected; if so, the number of faces is further counted, and the face position information and face key point information of each face are obtained. In this embodiment there are five face key points: the left eye, the right eye, the nose tip, the left mouth corner, and the right mouth corner. A face frame is then generated from the face position information of each face, and the face is cropped according to the face frame to obtain the corresponding initial face image. Whether the face is level is judged from the left-eye and right-eye key points: if not, the included angle θ between the line through the two eyes and the horizontal is calculated, and the initial face image is rotated clockwise by θ about its center via an affine transformation so that the two eyes become level. A resize operation (size adjustment) then fixes the face size at 112 × 112, and the missing edge portions are zero-filled by padding, giving the target face image. The target face image is then input into the face feature coding model and the face attribute recognition model for feature coding and attribute recognition respectively. In the face feature coding model, the 112 × 112 target face image is encoded into a 128-dimensional coding vector, which is compared in a 1:N manner with the vectors in the face library, where N is the size of the face library; the face attribution information is determined by calculating the cosine distance and cosine similarity between faces. In the face attribute recognition model, the target face image is normalized and standardized, and the attribute recognition model then outputs the age information and gender information of the current face.
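As an illustration of this alignment step, the following minimal sketch (OpenCV and NumPy; the function and variable names are hypothetical) computes the eye angle, rotates the crop about its center by affine transformation, resizes toward 112 × 112, and zero-pads the missing edges:

    import cv2
    import numpy as np

    def align_face(face_img, left_eye, right_eye, out_size=112):
        # Angle between the line through the two eyes and the horizontal.
        dx = right_eye[0] - left_eye[0]
        dy = right_eye[1] - left_eye[1]
        theta = np.degrees(np.arctan2(dy, dx))

        # Affine rotation about the image center so the two eyes become level.
        h, w = face_img.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta, 1.0)
        rotated = cv2.warpAffine(face_img, M, (w, h))

        # Resize so the longer side equals out_size, then zero-pad the rest.
        scale = out_size / float(max(h, w))
        resized = cv2.resize(rotated, (max(1, int(round(w * scale))),
                                       max(1, int(round(h * scale)))))
        rh, rw = resized.shape[:2]
        return cv2.copyMakeBorder(resized, 0, out_size - rh, 0, out_size - rw,
                                  cv2.BORDER_CONSTANT, value=0)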
This embodiment recognizes the face through a 1:N comparison against the face library, judging the face attribution by calculating the cosine distance between the feature coding vector of the current face and each face in the face library. For L2-normalized feature vectors, the calculation formula is:

distance(A, B) = ‖A − B‖² = ‖A‖² + ‖B‖² − 2A·B = 2(1 − cos(A, B)), where cos(A, B) = (A·B) / (‖A‖‖B‖)

The simplified distance is thus 2(1 − cos(A, B)), where cos(A, B) is the cosine similarity between the two vectors and distance is the final cosine distance. Compared with the traditional Euclidean distance, calculating the similarity between faces with the cosine distance on high-dimensional vectors is simpler and requires less computation. Finally, the minimum of the N computed distance values indexes the label of the corresponding feature in the face library, and the face attribution information, such as Zhang San or Li Si, is output.
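Under these definitions, the 1:N comparison can be sketched as follows (NumPy; assumes L2-normalized 128-dimensional encodings, one library row per enrolled face, with hypothetical variable names):

    import numpy as np

    def identify(query, library, labels):
        # query: (128,) L2-normalized vector; library: (N, 128) L2-normalized rows.
        cos_sim = library @ query                 # cosine similarity with each face
        distances = 2.0 * (1.0 - cos_sim)         # simplified cosine distance
        return labels[int(np.argmin(distances))]  # label of the closest face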
In an embodiment, the performing face detection on each frame of monitoring image to obtain the face position information and face key point information of the face in each frame of monitoring image includes:
inputting each frame of monitoring image into a neural network model based on the MobileNet network to perform size adjustment and mean subtraction on the monitoring image, and performing face detection on the monitoring image with a face detection algorithm model to obtain the number of faces in the monitoring image and the face position information and face key point information corresponding to each face.
In this embodiment, a neural network model is built on a MobileNet backbone; the monitoring image is resized and mean-subtracted, and a face detection algorithm model then performs face detection on it to obtain the number of faces in the current frame together with the corresponding face position information and face key point information.
Specifically, a neural network model is built on a MobileNet backbone. The monitoring image is resized to 640 × 640, and a mean subtraction is then performed: the means [104, 117, 123] are subtracted from the R, G, and B channels respectively. The monitoring image is then passed through the face detection algorithm model, which outputs the number of faces detected in the current frame and the face position information and face key point information of each face.
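A sketch of this preprocessing (OpenCV and NumPy; the detector itself is abstracted away, and the BGR-to-RGB conversion is an assumption so that the listed means line up with the R, G, B channels):

    import cv2
    import numpy as np

    def preprocess_for_detection(frame):
        img = cv2.resize(frame, (640, 640))                  # resize to 640 x 640
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
        img -= np.array([104.0, 117.0, 123.0], np.float32)   # subtract R, G, B means
        return img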
In an embodiment, the inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector includes:
performing convolution on the target face image, inputting the result into a BN layer for batch normalization, and activating with an activation function to obtain a first convolution result;
inputting the first convolution result into a residual error network module for convolution processing to obtain a second convolution result;
and performing dimension reduction on the second convolution result with a pre-constructed dimension-reduction mapping matrix to obtain a current face feature vector of the specified dimension.
In this embodiment, the target face image is input into the convolution layer and BN layer in sequence, activated with an activation function, passed through the residual error network for convolution, and finally reduced in dimension with a pre-constructed dimension-reduction mapping matrix to obtain a current face feature vector of the specified dimension. This embodiment constructs a 25088 × 128 mapping matrix, so the parameter count of this step is 25088 × 128 ≈ 3.2M; this reduces the number of parameters by 75%, lowers the amount of computation, and increases the running speed.
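A sketch of this dimension-reduction step (PyTorch; interpreting the 25088 input width as a flattened 512 × 7 × 7 feature map is an assumption, as is the final L2 normalization, which matches the cosine comparison described above):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureHead(nn.Module):
        def __init__(self, in_features=25088, embed_dim=128):
            super().__init__()
            # 25088 x 128 mapping matrix: about 3.2M parameters.
            self.proj = nn.Linear(in_features, embed_dim, bias=False)

        def forward(self, x):                 # x: (B, 512, 7, 7)
            x = torch.flatten(x, 1)           # (B, 25088)
            x = self.proj(x)                  # (B, 128) coding vector
            return F.normalize(x, dim=1)      # L2-normalize for cosine comparison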
In an embodiment, the inputting the first convolution result into the residual error network module for convolution processing to obtain a second convolution result includes:
inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result, inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result, and inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result;
inputting the first rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a second front-section residual error result, inputting the second front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a second middle-section residual error result, and inputting the second middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a second rear-section residual error result;
inputting the second rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a third front-section residual error result, inputting the third front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a third middle-section residual error result, and inputting the third middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a third rear-section residual error result;
and inputting the third rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a fourth front-section residual error result, inputting the fourth front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a fourth middle-section residual error result, and inputting the fourth middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a fourth rear-section residual error result.
In this embodiment, the residual error network module includes a first residual unit, a second residual unit, a third residual unit, and a fourth residual unit, each composed of front-section, middle-section, and rear-section residual error network units. The first residual unit comprises one front-section, one middle-section, and one rear-section residual error network unit; the second residual unit comprises one front-section, two middle-section, and one rear-section unit; the third residual unit comprises one front-section, four middle-section, and one rear-section unit; and the fourth residual unit comprises one front-section, one middle-section, and one rear-section unit.
The processing procedure of the first residual error unit is as follows: inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result, inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result, and inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result; the processing procedure of the second residual error unit is as follows: inputting the first rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a second front-section residual error result, inputting the second front-section residual error result into two continuous middle-section residual error network units for convolution processing to obtain a second middle-section residual error result, and inputting the second middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a second rear-section residual error result; the processing procedure of the third residual error unit is as follows: inputting the second rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a third front-section residual error result, inputting the third front-section residual error result into four continuous middle-section residual error network units for convolution processing to obtain a third middle-section residual error result, and inputting the third middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a third rear-section residual error result; the processing procedure of the fourth residual unit is as follows: and inputting the third rear-section residual result into a front-section residual network unit for convolution processing to obtain a fourth front-section residual result, inputting the fourth front-section residual result into a middle-section residual network unit for convolution processing to obtain a fourth middle-section residual result, and inputting the fourth middle-section residual result into a rear-section residual network unit for convolution processing to obtain a fourth rear-section residual result.
Note that, in this embodiment, describing data as being input into a front-section, middle-section, or rear-section residual error network unit does not mean the same unit is reused repeatedly; rather, several units with the same structure are provided and arranged in the order described above.
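The layout of the four residual units can be sketched as follows (PyTorch; FrontUnit, MidUnit, and BackUnit stand for the front-section, middle-section, and rear-section units sketched later, a single channel width is used for simplicity, and flags such as which middle-section unit keeps its leading BN are omitted):

    import torch.nn as nn

    def make_residual_module(FrontUnit, MidUnit, BackUnit, width):
        def stage(n_mid):
            # Fresh instances each time: units share structure, not parameters.
            units = [FrontUnit(width)]
            units += [MidUnit(width) for _ in range(n_mid)]
            units += [BackUnit(width)]
            return nn.Sequential(*units)

        return nn.Sequential(
            stage(1),   # first residual unit:  front + 1 middle + rear
            stage(2),   # second residual unit: front + 2 middle + rear
            stage(4),   # third residual unit:  front + 4 middle + rear
            stage(1),   # fourth residual unit: front + 1 middle + rear
        )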
In the residual error network module, the residual connections in the front-section, middle-section, and rear-section residual error network units are constructed by connecting a 3 × 3 convolution kernel with stride 2 to a 1 × 1 kernel, which avoids the loss of local information caused by the 1 × 1, stride-2 convolution used in the traditional approach.
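One reading of this shortcut design, sketched under the assumption that the 3 × 3, stride-2 kernel comes first (PyTorch):

    import torch.nn as nn

    def downsample_shortcut(in_ch, out_ch):
        # 3x3 stride-2 then 1x1: every input position contributes, unlike a
        # lone 1x1 stride-2 convolution, which skips three of every four pixels.
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=1, bias=False),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )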
In an embodiment, the inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result includes:
inputting the first convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first front-section convolution result;
inputting the first front-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a second front-section convolution result;
and inputting the second front-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization to obtain a first batch-normalized result, and performing feature fusion on the first batch-normalized result and the first convolution result to obtain a first front-section residual error result.
In this embodiment, the first convolution result is input into a 1 × 1 convolution layer and a BN layer in sequence and activated with a ReLU activation function to obtain the first front-section convolution result; this is passed through a 3 × 3 convolution layer and a BN layer and activated with a ReLU activation function to obtain the second front-section convolution result; finally, the second front-section convolution result is processed by a 1 × 1 convolution layer and a BN layer to obtain the first batch-normalized result, which is feature-fused with the first convolution result to obtain the first front-section residual error result.
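A sketch of this front-section unit (PyTorch; channel widths are placeholders, and the skip addition assumes the input and output shapes match, e.g. by using a shortcut such as the one above when they do not):

    import torch.nn as nn

    class FrontUnit(nn.Module):
        # 1x1 conv-BN-ReLU -> 3x3 conv-BN-ReLU -> 1x1 conv-BN, then fuse with input.
        def __init__(self, channels, bottleneck=None):
            super().__init__()
            bottleneck = bottleneck or channels // 4
            self.conv1 = nn.Conv2d(channels, bottleneck, 1, bias=False)
            self.bn1 = nn.BatchNorm2d(bottleneck)
            self.conv2 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(bottleneck)
            self.conv3 = nn.Conv2d(bottleneck, channels, 1, bias=False)
            self.bn3 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))    # first front-section result
            out = self.relu(self.bn2(self.conv2(out)))  # second front-section result
            out = self.bn3(self.conv3(out))             # first batch-normalized result
            return out + x                              # feature fusion with the input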
In an embodiment, the inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result includes:
inputting the first front-section residual error result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first middle-section convolution result;
inputting the first middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a second middle-section convolution result;
inputting the second middle-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a third middle-section convolution result;
and inputting the third middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, and performing feature fusion on the result and the first front-section residual error result to obtain a first middle-section residual error result.
In this embodiment, the first front-section residual error result is input into a BN layer for batch normalization and activated with a ReLU activation function to obtain the first middle-section convolution result; this is passed through a 1 × 1 convolution layer and a BN layer and activated with a ReLU activation function to obtain the second middle-section convolution result; the second middle-section convolution result is then passed through a 3 × 3 convolution layer and a BN layer and activated with a ReLU activation function to obtain the third middle-section convolution result; finally, the third middle-section convolution result is convolved by a 1 × 1 convolution layer, and the result is feature-fused with the first front-section residual error result to obtain the first middle-section residual error result.
Note that when several consecutive middle-section residual error network units are provided, only the first of them inputs its data into a BN layer for batch normalization; in the remaining middle-section units this BN layer is removed and the data is activated directly with the ReLU activation function.
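A sketch of this middle-section unit (PyTorch, same placeholder conventions as above; the lead_bn flag reflects the note that only the first of several consecutive middle-section units keeps the leading BN layer):

    import torch.nn as nn

    class MidUnit(nn.Module):
        def __init__(self, channels, bottleneck=None, lead_bn=True):
            super().__init__()
            bottleneck = bottleneck or channels // 4
            # Only the first consecutive middle-section unit keeps this BN layer.
            self.lead_bn = nn.BatchNorm2d(channels) if lead_bn else nn.Identity()
            self.conv1 = nn.Conv2d(channels, bottleneck, 1, bias=False)
            self.bn1 = nn.BatchNorm2d(bottleneck)
            self.conv2 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(bottleneck)
            self.conv3 = nn.Conv2d(bottleneck, channels, 1, bias=False)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.lead_bn(x))            # first middle-section result
            out = self.relu(self.bn1(self.conv1(out)))  # second middle-section result
            out = self.relu(self.bn2(self.conv2(out)))  # third middle-section result
            out = self.conv3(out)                       # 1x1 conv, no BN before fusion
            return out + x                              # fuse with the unit's input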
In an embodiment, the inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result includes:
inputting the first middle-section residual error result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first rear-section convolution result;
inputting the first rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a second rear-section convolution result;
inputting the second rear-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, inputting the result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a third rear-section convolution result;
and inputting the third rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, performing feature fusion on the result and the first middle-section residual error result, inputting the fused result into a BN layer for batch normalization, and activating with a ReLU activation function to obtain a first rear-section residual error result.
In this embodiment, the first middle-section residual error result is first input into a BN layer for batch normalization and activated with a ReLU activation function to obtain the first rear-section convolution result; this is passed through a 1 × 1 convolution layer and a BN layer and activated with a ReLU activation function to obtain the second rear-section convolution result; the second rear-section convolution result is then passed through a 3 × 3 convolution layer and a BN layer and activated with a ReLU activation function to obtain the third rear-section convolution result; finally, the third rear-section convolution result is convolved by a 1 × 1 convolution layer, the result is feature-fused with the first middle-section residual error result, the fused result is input into a BN layer for batch normalization, and a ReLU activation function is applied to obtain the first rear-section residual error result.
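A sketch of this rear-section unit (PyTorch, same conventions; it differs from the middle-section unit only in the BN + ReLU applied after the fusion):

    import torch.nn as nn

    class BackUnit(nn.Module):
        def __init__(self, channels, bottleneck=None):
            super().__init__()
            bottleneck = bottleneck or channels // 4
            self.lead_bn = nn.BatchNorm2d(channels)
            self.conv1 = nn.Conv2d(channels, bottleneck, 1, bias=False)
            self.bn1 = nn.BatchNorm2d(bottleneck)
            self.conv2 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(bottleneck)
            self.conv3 = nn.Conv2d(bottleneck, channels, 1, bias=False)
            self.post_bn = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.lead_bn(x))            # first rear-section result
            out = self.relu(self.bn1(self.conv1(out)))  # second rear-section result
            out = self.relu(self.bn2(self.conv2(out)))  # third rear-section result
            out = self.conv3(out)
            return self.relu(self.post_bn(out + x))     # fuse, then BN + ReLU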
In an embodiment, the inputting the target face image into a face attribute recognition model for face attribute recognition to obtain age information and gender information of a face in the target face image includes:
normalizing each pixel of the target face image so that each pixel lies in [0, 1];
setting a corresponding standardization mean coefficient and standardization variance coefficient for each pixel channel of the target face image, so as to standardize the target face image;
inputting the standardized target face image into a ShuffleNet classification neural network for feature coding to obtain a corresponding target feature vector, and classifying the target feature vector according to a pre-constructed component mapping matrix to obtain the age information and gender information.
In this embodiment, the target face image is first normalized: each pixel is converted from an INT type to a FLOAT type and divided by 255, so that every pixel lies in [0, 1]. The image is then standardized and input into a ShuffleNet classification neural network for feature coding to obtain the corresponding target feature vector, which is classified according to a pre-constructed component mapping matrix to obtain the age information and gender information.
The specific standardization procedure processes the three RGB channels of the image separately: the standardization mean coefficient of channel 1 is set to 0.485 and its variance coefficient to 0.229; the mean coefficient of channel 2 is 0.456 and its variance coefficient 0.224; and the mean coefficient of channel 3 is 0.406 and its variance coefficient 0.225.
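These two steps match the standard torchvision preprocessing, sketched here for reference (the coefficients are exactly the values listed above):

    import torchvision.transforms as T

    # ToTensor converts INT pixels to FLOAT and divides by 255, giving [0, 1];
    # Normalize then standardizes each channel with the listed coefficients.
    attribute_preprocess = T.Compose([
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])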
After normalization and standardization, the reconstructed image matrix is passed through the ShuffleNet classification neural network, which encodes it into a 1024-dimensional feature vector. This feature vector is classified according to a pre-constructed 1024 × 2 mapping matrix, whose first output is the gender information (male/female) and whose second output is the age information (0-100 years old).
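A sketch of this classification step (PyTorch; the ShuffleNet backbone is abstracted away, and reading the two outputs of the 1024 × 2 mapping as a gender logit and an age regression value is an assumption based on the description):

    import torch
    import torch.nn as nn

    class AttributeHead(nn.Module):
        def __init__(self, feat_dim=1024):
            super().__init__()
            self.proj = nn.Linear(feat_dim, 2)   # the 1024 x 2 mapping matrix

        def forward(self, feats):                # feats: (B, 1024) ShuffleNet encoding
            out = self.proj(feats)
            gender = torch.sigmoid(out[:, 0])    # first output: male/female probability
            age = out[:, 1].clamp(0.0, 100.0)    # second output: age in years (0-100)
            return gender, age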
Referring to fig. 2, fig. 2 is a schematic block diagram of a deep learning based face and attribute recognition system according to an embodiment of the present invention, where the deep learning based face and attribute recognition system 200 includes:
an initial face image obtaining unit 201, configured to acquire a surveillance video shot by a camera, capture each frame of the surveillance video as a monitoring image, perform face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and crop each face according to its face position information to obtain an initial face image;
a target face image obtaining unit 202, configured to calculate a face rotation angle of the initial face image according to the face key point information, rotate the initial face image by the face rotation angle, and perform size adjustment and edge processing on the rotated initial face image to obtain a target face image;
a face attribution information obtaining unit 203, configured to input the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and compare the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and a face attribute obtaining unit 204, configured to input the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and to output the face attribution information, age information, and gender information of the target face image.
In an embodiment, the initial face image obtaining unit 201 includes:
and a face information obtaining unit, configured to input each frame of monitoring image into a neural network model based on the MobileNet network to perform size adjustment and mean subtraction on the monitoring image, and to perform face detection on the monitoring image with a face detection algorithm model to obtain the number of faces in the monitoring image and the face position information and face key point information corresponding to each face.
In an embodiment, the face attribution information obtaining unit 203 includes:
a first convolution result obtaining unit, configured to perform convolution on the target face image, input the result into a BN layer for batch normalization, and activate with an activation function to obtain a first convolution result;
a residual error network module processing unit, configured to input the first convolution result into the residual error network module for convolution processing to obtain a second convolution result;
and a current face feature vector obtaining unit, configured to perform dimension reduction on the second convolution result with a pre-constructed dimension-reduction mapping matrix to obtain a current face feature vector of the specified dimension.
In one embodiment, the residual network module processing unit includes:
the first residual error processing unit is used for inputting the first convolution result into a front-section residual error network unit for convolution processing to obtain a first front-section residual error result, inputting the first front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a first middle-section residual error result, and inputting the first middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a first rear-section residual error result;
the second residual error processing unit is used for inputting the first rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a second front-section residual error result, inputting the second front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a second middle-section residual error result, and inputting the second middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a second rear-section residual error result;
a third residual error processing unit, configured to input the second rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a third front-section residual error result, input the third front-section residual error result into a plurality of continuous middle-section residual error network units for convolution processing to obtain a third middle-section residual error result, and input the third middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a third rear-section residual error result;
and a fourth residual error processing unit, configured to input the third rear-section residual error result into a front-section residual error network unit for convolution processing to obtain a fourth front-section residual error result, input the fourth front-section residual error result into a middle-section residual error network unit for convolution processing to obtain a fourth middle-section residual error result, and input the fourth middle-section residual error result into a rear-section residual error network unit for convolution processing to obtain a fourth rear-section residual error result.
In an embodiment, the first residual error processing unit includes:
a first front-section convolution result obtaining unit, configured to input the first convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a first front-section convolution result;
a second front-section convolution result obtaining unit, configured to input the first front-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a second front-section convolution result;
and a first front-section residual error result obtaining unit, configured to input the second front-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization to obtain a first batch-normalized result, and perform feature fusion on the first batch-normalized result and the first convolution result to obtain a first front-section residual error result.
In an embodiment, the first residual error processing unit further includes:
a first middle-section convolution result obtaining unit, configured to input the first front-section residual error result into a BN layer for batch normalization and activate with a ReLU activation function to obtain a first middle-section convolution result;
a second middle-section convolution result obtaining unit, configured to input the first middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a second middle-section convolution result;
a third middle-section convolution result obtaining unit, configured to input the second middle-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a third middle-section convolution result;
and a first middle-section residual error result obtaining unit, configured to input the third middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, and perform feature fusion on the result and the first front-section residual error result to obtain a first middle-section residual error result.
In an embodiment, the first residual error processing unit further includes:
a first rear-section convolution result obtaining unit, configured to input the first middle-section residual error result into a BN layer for batch normalization and activate with a ReLU activation function to obtain a first rear-section convolution result;
a second rear-section convolution result obtaining unit, configured to input the first rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a second rear-section convolution result;
a third rear-section convolution result obtaining unit, configured to input the second rear-section convolution result into a convolution layer with a 3 × 3 convolution kernel for convolution, input the result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a third rear-section convolution result;
and a first rear-section residual error result obtaining unit, configured to input the third rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel for convolution, perform feature fusion on the result and the first middle-section residual error result, input the fused result into a BN layer for batch normalization, and activate with a ReLU activation function to obtain a first rear-section residual error result.
In an embodiment, the face attribute obtaining unit 204 includes:
a normalization processing unit, configured to normalize each pixel of the target face image so that each pixel lies in [0, 1];
a standardization processing unit, configured to set a corresponding standardization mean coefficient and standardization variance coefficient for each pixel channel of the target face image, so as to standardize the target face image;
and an age and gender obtaining unit, configured to input the standardized target face image into a ShuffleNet classification neural network for feature coding to obtain a corresponding target feature vector, and to classify the target feature vector according to a pre-constructed component mapping matrix to obtain the age information and gender information.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the method for recognizing the human face and the attribute based on the deep learning when executing the computer program.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. The use of the phrase "including a" does not exclude the presence of other, identical elements in a process, method, article, or apparatus that comprises the same element, unless the context clearly dictates otherwise.

Claims (10)

1. A face and attribute recognition method based on deep learning is characterized by comprising the following steps:
acquiring a surveillance video shot by a camera, capturing each frame of the surveillance video as a monitoring image, performing face detection on each frame of monitoring image to obtain the face position information and face key point information of each face in it, and cropping each face according to its face position information to obtain an initial face image;
calculating a face rotation angle of the initial face image according to the face key point information, rotating the initial face image by the face rotation angle, and performing size adjustment and edge processing on the rotated initial face image to obtain a target face image;
inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and comparing the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and gender information of the face in the target face image, and outputting the face attribution information, age information, and gender information of the target face image.
2. The method for recognizing the face and the attribute based on the deep learning of claim 1, wherein the step of performing the face detection on each frame of the monitored image to obtain the face position information and the face key point information of the face in each frame of the monitored image comprises the steps of:
inputting each frame of monitoring image into a neural network model based on the MobileNet network to perform size adjustment and mean subtraction on the monitoring image, and performing face detection on the monitoring image with a face detection algorithm model to obtain the number of faces in the monitoring image and the face position information and face key point information corresponding to each face.
3. The deep-learning-based face and attribute recognition method of claim 1, wherein inputting the target face image into a face feature coding model for feature coding to obtain a current face feature vector comprises:
performing convolution processing on the target face image, inputting the result into a batch normalization (BN) layer, and applying an activation function to obtain a first convolution result;
inputting the first convolution result into a residual network module for convolution processing to obtain a second convolution result;
and performing dimensionality reduction on the second convolution result with a pre-constructed dimensionality-reduction mapping matrix to obtain a current face feature vector of a specified dimensionality (an encoder skeleton follows this claim).
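A skeleton of the claim-3 encoder under stated assumptions: the 64 stem channels, PReLU activation, global average pooling, and 512-dimensional output are illustrative choices; only the ordering — convolution + BN + activation, then the residual module, then the mapping matrix — comes from the claim.

```python
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, feat_channels=512, embed_dim=512):
        super().__init__()
        self.stem = nn.Sequential(                 # convolution + BN + activation
            nn.Conv2d(3, 64, 3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.PReLU(64))
        self.backbone = backbone                   # residual network module (claim 4)
        self.proj = nn.Linear(feat_channels, embed_dim, bias=False)  # mapping matrix

    def forward(self, x):
        x = self.backbone(self.stem(x))            # second convolution result
        x = x.mean(dim=(2, 3))                     # pool to (N, feat_channels)
        return self.proj(x)                        # feature vector of specified dim
```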
4. The method of claim 3, wherein inputting the first convolution result into the residual network module for convolution processing to obtain a second convolution result comprises:
inputting the first convolution result into a front-section residual network unit for convolution processing to obtain a first front-section residual result, inputting the first front-section residual result into a middle-section residual network unit for convolution processing to obtain a first middle-section residual result, and inputting the first middle-section residual result into a rear-section residual network unit for convolution processing to obtain a first rear-section residual result;
inputting the first rear-section residual result into a front-section residual network unit for convolution processing to obtain a second front-section residual result, inputting the second front-section residual result into a plurality of consecutive middle-section residual network units for convolution processing to obtain a second middle-section residual result, and inputting the second middle-section residual result into a rear-section residual network unit for convolution processing to obtain a second rear-section residual result;
inputting the second rear-section residual result into a front-section residual network unit for convolution processing to obtain a third front-section residual result, inputting the third front-section residual result into a plurality of consecutive middle-section residual network units for convolution processing to obtain a third middle-section residual result, and inputting the third middle-section residual result into a rear-section residual network unit for convolution processing to obtain a third rear-section residual result;
and inputting the third rear-section residual result into a front-section residual network unit for convolution processing to obtain a fourth front-section residual result, inputting the fourth front-section residual result into a middle-section residual network unit for convolution processing to obtain a fourth middle-section residual result, and inputting the fourth middle-section residual result into a rear-section residual network unit for convolution processing to obtain a fourth rear-section residual result (a stage-assembly sketch follows this claim).
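A sketch of the claim-4 backbone layout: four stages, each composed of one front-section unit, one or more middle-section units, and one rear-section unit. The claim specifies a single middle unit in stages 1 and 4 and only "a plurality" in stages 2 and 3, so the counts (1, 3, 5, 1) are an assumption.

```python
import torch.nn as nn

def make_stage(front, middles, rear) -> nn.Sequential:
    """Compose one stage: front unit -> middle units -> rear unit."""
    return nn.Sequential(front, *middles, rear)

def make_backbone(front_unit, middle_unit, rear_unit, counts=(1, 3, 5, 1)):
    """Build the four claim-4 stages; the unit_* args are factories
    returning fresh nn.Module instances on each call."""
    stages = [make_stage(front_unit(),
                         [middle_unit() for _ in range(n)],
                         rear_unit())
              for n in counts]
    return nn.Sequential(*stages)
```

With the unit classes sketched after claims 5 to 7 below, a (simplified, fixed-width) backbone could be assembled as `make_backbone(lambda: FrontResidualUnit(256, 64), lambda: MiddleResidualUnit(256, 64), lambda: RearResidualUnit(256, 64))`; in practice channel widths would grow from stage to stage.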
5. The method of claim 4, wherein inputting the first convolution result into a front-section residual network unit for convolution processing to obtain a first front-section residual result comprises:
inputting the first convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a first front-section convolution result;
inputting the first front-section convolution result into a convolution layer with a 3 × 3 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a second front-section convolution result;
and inputting the second front-section convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization to obtain a first batch-normalization result, and performing feature fusion of the first batch-normalization result with the first convolution result to obtain the first front-section residual result (a unit sketch follows this claim).
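A minimal PyTorch reading of the claim-5 front-section unit: 1 × 1 conv-BN-ReLU, 3 × 3 conv-BN-ReLU, 1 × 1 conv-BN, then feature fusion (addition) with the unit input. The identity shortcut and the channel widths are illustrative assumptions; the claim does not describe the shortcut's shape handling.

```python
import torch.nn as nn

class FrontResidualUnit(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1, bias=False),   # 1x1 convolution
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),  # 3x3
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False),   # 1x1 convolution
            nn.BatchNorm2d(channels))                         # first batch-norm result

    def forward(self, x):
        return x + self.branch(x)   # feature fusion with the first convolution result
```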
6. The method of claim 4, wherein inputting the first front-section residual result into a middle-section residual network unit for convolution processing to obtain a first middle-section residual result comprises:
inputting the first front-section residual result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a first middle-section convolution result;
inputting the first middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a second middle-section convolution result;
inputting the second middle-section convolution result into a convolution layer with a 3 × 3 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a third middle-section convolution result;
and inputting the third middle-section convolution result into a convolution layer with a 1 × 1 convolution kernel, and performing feature fusion of the convolution result with the first front-section residual result to obtain the first middle-section residual result (a unit sketch follows this claim).
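A sketch of the claim-6 middle-section unit, which pre-activates its input (BN + ReLU first) and fuses the final 1 × 1 convolution output with the unit input; channel widths are again assumed.

```python
import torch.nn as nn

class MiddleResidualUnit(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),  # pre-activation
            nn.Conv2d(channels, bottleneck, 1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False))   # no BN before fusion

    def forward(self, x):
        return x + self.branch(x)   # fusion with the incoming residual result
```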
7. The method of claim 4, wherein inputting the first middle-section residual result into a rear-section residual network unit for convolution processing to obtain a first rear-section residual result comprises:
inputting the first middle-section residual result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a first rear-section convolution result;
inputting the first rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a second rear-section convolution result;
inputting the second rear-section convolution result into a convolution layer with a 3 × 3 convolution kernel, inputting the convolution result into a BN layer for batch normalization, and applying a ReLU activation function to obtain a third rear-section convolution result;
and inputting the third rear-section convolution result into a convolution layer with a 1 × 1 convolution kernel, performing feature fusion of the convolution result with the first middle-section residual result, inputting the fused result into a BN layer for batch normalization, and applying a ReLU activation function to obtain the first rear-section residual result (a unit sketch follows this claim).
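A sketch of the claim-7 rear-section unit: the same pre-activated bottleneck as the middle unit, but with an additional BN + ReLU applied after the feature fusion.

```python
import torch.nn as nn

class RearResidualUnit(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, bottleneck, 1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False))
        self.post = nn.Sequential(nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.post(x + self.branch(x))   # BN + ReLU after the fusion
```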
8. The deep-learning-based face and attribute recognition method of claim 1, wherein inputting the target face image into a face attribute recognition model for face attribute recognition to obtain the age information and the gender information of the face in the target face image comprises:
normalizing each pixel of the target face image so that every pixel value lies in [0, 1];
setting a corresponding standardization mean coefficient and standardization variance coefficient for each pixel channel of the target face image, so as to standardize the target face image;
and inputting the standardized target face image into a ShuffleNet classification neural network for feature coding to obtain a corresponding target feature vector, and classifying the target feature vector according to a pre-constructed component mapping matrix to obtain the age information and the gender information (a preprocessing sketch follows this claim).
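A sketch of the claim-8 preprocessing: scale pixels into [0, 1], then standardize each channel with its own mean and variance coefficients. The ImageNet statistics used here are an assumption; the claim only says each channel gets its own pair of coefficients.

```python
import numpy as np

def preprocess_for_attributes(face_img,
                              mean=(0.485, 0.456, 0.406),
                              std=(0.229, 0.224, 0.225)):
    """Normalize to [0, 1], then standardize per channel for the ShuffleNet."""
    x = face_img.astype(np.float32) / 255.0              # pixels into [0, 1]
    x = (x - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)
    return x.transpose(2, 0, 1)[None]                    # 1 x 3 x H x W input tensor
```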
9. A face and attribute recognition system based on deep learning, characterized by comprising:
an initial face image acquisition unit, configured to acquire a monitoring video shot by a camera, capture each frame of the monitoring video as a monitoring image, perform face detection on each frame of monitoring image to obtain face position information and face key point information of the face in each frame, and crop the face according to the face position information to obtain an initial face image;
a target face image acquisition unit, configured to calculate a face rotation angle of the initial face image according to the face key point information, rotate the initial face image by the face rotation angle, and perform size adjustment and edge processing on the rotated initial face image to obtain a target face image;
a face attribution information acquisition unit, configured to input the target face image into a face feature coding model for feature coding to obtain a current face feature vector, and compare the current face feature vector with the historical face feature vectors in a face library to obtain face attribution information;
and a face attribute acquisition unit, configured to input the target face image into a face attribute recognition model for face attribute recognition to obtain age information and gender information of the face in the target face image, and to output the face attribution information, the age information and the gender information of the target face image.
10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the deep-learning-based face and attribute recognition method according to any one of claims 1 to 8.
CN202210602950.6A 2022-05-30 2022-05-30 Face and attribute recognition method and system based on deep learning and computer equipment Pending CN115050069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210602950.6A CN115050069A (en) 2022-05-30 2022-05-30 Face and attribute recognition method and system based on deep learning and computer equipment

Publications (1)

Publication Number Publication Date
CN115050069A true CN115050069A (en) 2022-09-13

Family

ID=83159913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210602950.6A Pending CN115050069A (en) 2022-05-30 2022-05-30 Face and attribute recognition method and system based on deep learning and computer equipment

Country Status (1)

Country Link
CN (1) CN115050069A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416671A (en) * 2023-06-12 2023-07-11 太平金融科技服务(上海)有限公司深圳分公司 Face image correcting method and device, electronic equipment and storage medium
CN116416671B (en) * 2023-06-12 2023-10-03 太平金融科技服务(上海)有限公司深圳分公司 Face image correcting method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Moghaddam et al. Probabilistic visual learning for object detection
Ravichandran et al. Categorizing dynamic textures using a bag of dynamical systems
Darom et al. Scale-invariant features for 3-D mesh models
EP2017770B1 (en) Face meta-data generation and face similarity calculation
CN109145745B (en) Face recognition method under shielding condition
Debiasi et al. PRNU variance analysis for morphed face image detection
US20120183212A1 (en) Identifying descriptor for person or object in an image
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
JP2005149506A (en) Method and apparatus for automatic object recognition/collation
Lepsøy et al. Statistical modelling of outliers for fast visual search
CN106980848A (en) Facial expression recognizing method based on warp wavelet and sparse study
JP6071002B2 (en) Reliability acquisition device, reliability acquisition method, and reliability acquisition program
CN106096517A (en) A kind of face identification method based on low-rank matrix Yu eigenface
CN111931548B (en) Face recognition system, method for establishing face recognition data and face recognition method
Demirkus et al. Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos
CN114359553B (en) Signature positioning method and system based on Internet of things and storage medium
JP2006344236A (en) Face metadata generation method and device, and face recognition method and system
US9081800B2 (en) Object detection via visual search
CN113592769A (en) Abnormal image detection method, abnormal image model training method, abnormal image detection device, abnormal image model training device and abnormal image model training medium
El-Abed et al. Quality assessment of image-based biometric information
CN115240029A (en) Training method of image regression model, image regression analysis method, medium, and terminal
CN115050069A (en) Face and attribute recognition method and system based on deep learning and computer equipment
JP4375571B2 (en) Face similarity calculation method and apparatus
US20200380288A1 (en) Proposal region filter for digital image processing
Kekre et al. Performance comparison of DCT and VQ based techniques for iris recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination