CN110688875A - Face quality evaluation network training method, face quality evaluation method and device


Info

Publication number
CN110688875A
Authority
CN
China
Prior art keywords
network
feature
evaluation
output
evaluation network
Prior art date
Legal status
Granted
Application number
CN201810730258.5A
Other languages
Chinese (zh)
Other versions
CN110688875B (en)
Inventor
蔡晓蕙 (Cai Xiaohui)
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810730258.5A
Publication of CN110688875A
Application granted
Publication of CN110688875B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a training method for a face quality evaluation network, a face quality evaluation method, and corresponding devices. The face quality evaluation network comprises a branch network group, a feature fusion network layer, and a first output network layer connected in sequence. The training method comprises: for each feature evaluation network, training the feature evaluation network based on first sample image data and the output of a second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network; after the training of each feature evaluation network is completed, on the basis of the trained network parameters of the feature extraction network layer of each feature evaluation network, training the face quality evaluation network based on second sample image data and the output of the first output network layer. The scheme can improve the network convergence speed of training the face quality evaluation network.

Description

Face quality evaluation network training method, face quality evaluation method and device
Technical Field
The present application relates to the field of image analysis technologies, and in particular, to a face quality assessment network training method, a face quality assessment method, and an apparatus.
Background
In order to improve the accuracy of face-based tasks such as face recognition, quality evaluation is usually performed on face images, and the face images with higher evaluation scores are then applied to those tasks. Performing face quality evaluation through a face quality evaluation network is a common approach.
In the prior art, a face quality evaluation network is trained by taking sample image data of a training sample set as input and taking manually calibrated evaluation scores of the sample images as ground truth.
Because a face is a three-dimensional target with complex feature details, and the existing training method directly regresses sample images into scores, the network convergence speed is slow.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for training a face quality assessment network, so as to improve a network convergence speed of training the face quality assessment network. In addition, the embodiment of the application also provides a face quality evaluation method so as to quickly obtain an effective face quality evaluation score. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a training method for a face quality evaluation network, where the face quality evaluation network includes: a branch network group, a feature fusion network layer, and a first output network layer connected in sequence; the branch network group includes at least two branch network layers as parallel branches, each branch network layer corresponds to one feature evaluation network, each branch network layer is the feature extraction network layer of the corresponding feature evaluation network, and each feature evaluation network further includes a second output network layer;
the method comprises the following steps:
for each feature evaluation network, training the feature evaluation network based on first sample image data and the output of a second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network;
after the training of each feature evaluation network is completed, on the basis of the trained network parameters of the feature extraction network layer of each feature evaluation network, training the face quality evaluation network based on second sample image data and the output of the first output network layer, to obtain the face quality evaluation network for face quality evaluation.
Optionally, the feature evaluation networks corresponding to the at least two branch network layers include:
the system comprises a first feature evaluation network for evaluating key points of the face, a second feature evaluation network for evaluating the brightness of an image, a third feature evaluation network for evaluating the definition of the image, a fourth feature evaluation network for evaluating the angle of the face and a fifth feature evaluation network for evaluating the degree of face shielding.
Optionally, the step of training, for each feature evaluation network, the feature evaluation network based on the first sample image data and an output of a second output network layer included in the feature evaluation network to obtain a network parameter of a feature extraction network layer of the feature evaluation network includes:
for each feature evaluation network in a first type of network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network; wherein the first type of network comprises: the first feature evaluation network, the second feature evaluation network, and the third feature evaluation network;
for each feature evaluation network in a second type of network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the first type of network as supervision information; wherein the second type of network comprises: the fourth feature evaluation network and the fifth feature evaluation network.
Optionally, the step of training, for each feature evaluation network in the first type of network, the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network to obtain the network parameters of the feature extraction network layer of the feature evaluation network includes:
for each feature evaluation network in the first type of network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the second type of network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
Optionally, the step of training, for each feature evaluation network in the second type of network, the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by one or more feature evaluation networks in the first type of network as the supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network includes:
for each feature evaluation network in the second type of network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by the first feature evaluation network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
Optionally, the feature extraction network layer of the first feature evaluation network includes: a first feature extraction sublayer and a second feature extraction sublayer connected in sequence;
the step of training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by the first feature evaluation network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network, includes:
training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by the first feature extraction sublayer in the first feature evaluation network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
Optionally, the feature extraction network layer of the fourth feature evaluation network includes: a third feature extraction sublayer, a first feature fusion sublayer, and a fourth feature extraction sublayer connected in sequence, where the first feature fusion sublayer is also connected with the first feature extraction sublayer;
the feature extraction network layer of the fifth feature evaluation network includes: a fifth feature extraction sublayer, a second feature fusion sublayer, and a sixth feature extraction sublayer connected in sequence, where the second feature fusion sublayer is also connected with the first feature extraction sublayer.
Optionally, the feature evaluation network corresponding to the at least two branch network layers further includes:
a sixth feature evaluation network for evaluating image categories, wherein the image categories are non-faces, irregular faces or regular faces.
Optionally, for each feature evaluation network, training the feature evaluation network based on the first sample image data and an output of a second output network layer included in the feature evaluation network to obtain a network parameter of a feature extraction network layer of the feature evaluation network, including:
for the sixth feature evaluation network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in a third type of network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network, where the third type of network includes the feature evaluation networks other than the sixth feature evaluation network.
Optionally, the face quality evaluation network further includes: a basic network layer, where the basic network layer is a feature extraction network layer in the first feature evaluation network, and the output content of the basic network layer is the input content of each branch network layer in the branch network group.
Optionally, the first output network layer comprises: a feature extraction sublayer and a score regression sublayer;
the input content of the feature extraction sublayer is the output content of the feature fusion network layer, the input content of the score regression sublayer is the output content of the feature extraction sublayer, and the output content of the score regression sublayer is the quality evaluation score.
Optionally, the step of training the face quality assessment network based on the second sample image data and the output of the first output network layer to obtain a face quality assessment network for face quality assessment includes:
training the face quality evaluation network based on second sample image data, the output of the first output network layer and a quality evaluation score corresponding to the second sample image data to obtain a face quality evaluation network for face quality evaluation;
wherein the quality evaluation score of the second sample image data is: a score determined based on a target similarity, where the target similarity is the similarity between the second sample image data and reference image data corresponding to the second sample image data; the reference image data and the second sample image data are face image data of the same person, and the reference image data is an image including complete face information.
In a second aspect, an embodiment of the present application provides a face quality assessment method, including:
obtaining a target face image to be evaluated;
inputting the target face image into a pre-trained face quality evaluation network to obtain a quality evaluation score of the target face image; the face quality evaluation network is trained based on the training method of the face quality evaluation network provided by the embodiment of the application.
In a third aspect, an embodiment of the present application provides a training apparatus for a face quality evaluation network, where the face quality evaluation network includes: a branch network group, a feature fusion network layer, and a first output network layer connected in sequence; the branch network group includes at least two branch network layers as parallel branches, each branch network layer corresponds to one feature evaluation network, each branch network layer is the feature extraction network layer of the corresponding feature evaluation network, and each feature evaluation network further includes a second output network layer;
the device comprises:
a first training unit, configured to, for each feature evaluation network, train the feature evaluation network based on first sample image data and the output of a second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network;
a second training unit, configured to, after the training of each feature evaluation network is completed and on the basis of the trained network parameters of the feature extraction network layer of each feature evaluation network, train the face quality evaluation network based on second sample image data and the output of the first output network layer, to obtain the face quality evaluation network for face quality evaluation.
Optionally, the feature evaluation networks corresponding to the at least two branch network layers include:
a first feature evaluation network for evaluating face key points, a second feature evaluation network for evaluating image brightness, a third feature evaluation network for evaluating image definition, a fourth feature evaluation network for evaluating face angle, and a fifth feature evaluation network for evaluating face shielding degree.
Optionally, the first training unit comprises:
a first-type network training subunit, configured to, for each feature evaluation network in the first type of network, train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network; wherein the first type of network comprises: the first feature evaluation network, the second feature evaluation network, and the third feature evaluation network;
a second-type network training subunit, configured to, for each feature evaluation network in the second type of network, train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the first type of network as supervision information; wherein the second type of network comprises: the fourth feature evaluation network and the fifth feature evaluation network.
Optionally, the first-type network training subunit includes:
a first-type network training module, configured to train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the second type of network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
Optionally, the second-type network training subunit includes:
a second-type network training module, configured to train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by the first feature evaluation network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
Optionally, the feature extraction network layer of the first feature evaluation network includes: a first feature extraction sublayer and a second feature extraction sublayer connected in sequence;
the second type of network training module is specifically configured to:
and training the feature evaluation network to obtain network parameters of the feature extraction network layer of the feature evaluation network based on the feature graph extracted by the first feature extraction sub-layer in the first feature evaluation network as supervision information based on the first sample image data and the output of the second output network layer included by the feature evaluation network.
Optionally, the feature extraction network layer of the fourth feature evaluation network includes: a third feature extraction sublayer, a first feature fusion sublayer, and a fourth feature extraction sublayer connected in sequence, where the first feature fusion sublayer is also connected with the first feature extraction sublayer;
the feature extraction network layer of the fifth feature evaluation network includes: a fifth feature extraction sublayer, a second feature fusion sublayer, and a sixth feature extraction sublayer connected in sequence, where the second feature fusion sublayer is also connected with the first feature extraction sublayer.
Optionally, the feature evaluation network corresponding to the at least two branch network layers further includes:
a sixth feature evaluation network for evaluating image categories, wherein the image categories are non-faces, irregular faces or regular faces.
Optionally, the first training unit is specifically configured to, for the sixth feature evaluation network, train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in a third type of network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network, where the third type of network includes the feature evaluation networks other than the sixth feature evaluation network.
Optionally, the face quality evaluation network further includes: a basic network layer, where the basic network layer is a feature extraction network layer in the first feature evaluation network, and the output content of the basic network layer is the input content of each branch network layer in the branch network group.
Optionally, the first output network layer comprises: a feature extraction sublayer and a score regression sublayer;
the input content of the feature extraction sublayer is the output content of the feature fusion network layer, the input content of the score regression sublayer is the output content of the feature extraction sublayer, and the output content of the score regression sublayer is the quality evaluation score.
Optionally, the second training unit comprises:
the second training subunit is used for training the face quality evaluation network based on second sample image data, the output of the first output network layer and a quality evaluation score corresponding to the second sample image data to obtain a face quality evaluation network for face quality evaluation;
wherein the quality evaluation score of the second sample image data is: a score determined based on a target similarity, where the target similarity is the similarity between the second sample image data and reference image data corresponding to the second sample image data; the reference image data and the second sample image data are face image data of the same person, and the reference image data is an image including complete face information.
In a fourth aspect, an embodiment of the present application provides a face quality assessment apparatus, including:
the device comprises an obtaining unit, a judging unit and a judging unit, wherein the obtaining unit is used for obtaining a target face image to be evaluated;
the determining unit is used for inputting the target face image into a pre-trained face quality evaluation network to obtain a quality evaluation score of the target face image; the face quality evaluation network is trained based on the training method of the face quality evaluation network provided by the embodiment of the application.
In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
and a processor for implementing the training method of the face quality evaluation network provided by the embodiments of the present application when executing the program stored in the memory.
In a sixth aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
and a processor for implementing the face quality evaluation method provided by the embodiments of the present application when executing the program stored in the memory.
In the training method of the face quality evaluation network provided by the embodiments of the present application, each feature evaluation network is first trained to obtain the network parameters of its feature extraction network layer; then, taking the trained network parameters of each feature extraction network layer as a starting point, all network layers in the face quality evaluation network are trained together. Because every branch network layer in the face quality evaluation network has already converged once before the face quality evaluation network itself is trained, the unified training of all network layers only fine-tunes the network parameters, which greatly reduces the number of parameter adjustments. Therefore, compared with the prior art, this scheme can improve the network convergence speed of training the face quality evaluation network.
In addition, the face quality evaluation method provided by the embodiments of the present application obtains a target face image to be evaluated and inputs the target face image into a pre-trained face quality evaluation network to obtain a quality evaluation score of the target face image, where the face quality evaluation network is trained based on the training method of the face quality evaluation network provided by the embodiments of the present application. Therefore, the face quality evaluation method provided by the embodiments of the present application can quickly obtain an effective face quality evaluation score.
Of course, it is not necessary for any product or method of the present application to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a face quality evaluation network according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a face quality evaluation network according to an embodiment of the present application;
fig. 3 is a flowchart of a training method for a face quality assessment network according to an embodiment of the present application;
fig. 4 is another flowchart of a training method for a face quality assessment network according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a face quality evaluation network according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a face quality evaluation network according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a face quality evaluation network according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a training apparatus of a face quality assessment network according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 10 is a flowchart of a face quality evaluation method according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a face quality evaluation apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to improve the network convergence speed of training a face quality evaluation network, the embodiments of the present application provide a training method and apparatus for a face quality evaluation network, an electronic device, and a storage medium. The face quality evaluation network is a network that scores the image quality of a face image; the score may be on a 100-point, 10-point, or 5-point scale, or the like.
First, a training method of a face quality assessment network provided by the embodiment of the present application is introduced below.
It should be noted that the training method for the face quality assessment network provided by the embodiment of the present application is applied to an electronic device. In a specific application, the electronic device may be a terminal device or a server.
Specifically, as shown in fig. 1, the face quality evaluation network provided in the embodiment of the present application may include: a branch network group, a feature fusion network layer 120, and a first output network layer 130 connected in sequence; the branch network group includes at least two branch network layers 110 as parallel branches, each branch network layer 110 corresponds to one feature evaluation network, each branch network layer 110 is the feature extraction network layer of the corresponding feature evaluation network, and each feature evaluation network further includes a second output network layer. It can be understood that various image features are related to one another, so the branches can be used to fuse various kinds of detailed information; moreover, any network layer of the face quality evaluation network and of the feature evaluation networks may include convolutional layers, pooling layers, and the like.
It is understood that, from a functional point of view, each feature evaluation network can be divided into: a network layer for extracting the feature map and a network layer for outputting the result. For convenience of description, in the embodiments of the present application, the network layer for extracting the feature map in a feature evaluation network is named the feature extraction network layer, and the network layer for outputting the result is named the second output network layer. A feature map is an image that fuses at least one of characteristics such as color, texture, shape, and spatial relationship. In addition, the second output network layer may be specifically configured for classification or regression, depending on the evaluated feature: the output of a second output network layer used for classification is the confidence of each class, and the output of a second output network layer used for regression is a specific numerical value. For example, the output of the feature evaluation network for evaluating the face shielding degree is a degree value of face shielding, so its second output network layer is used for regression; the output of the feature evaluation network for evaluating image categories is the confidence of each image category, so its second output network layer is used for classification.
Also, from a functional point of view, the face quality evaluation network can be divided into: network layers for extracting feature maps of individual features, a network layer for fusing the feature maps, and a network layer for outputting the result. For convenience of description, in the embodiments of the present application, a network layer for extracting the feature map of a certain feature is named a branch network layer, the network layer for fusing the feature maps is named the feature fusion network layer, and the network layer for outputting the result is named the first output network layer. The first output network layer is specifically used for regression, that is, regressing image data into a quality evaluation score. It is understood that, in a specific application, since the feature fusion network layer is used to fuse feature maps, for better learning of feature details the first output network layer may include: a feature extraction sublayer and a score regression sublayer; the input content of the feature extraction sublayer is the output content of the feature fusion network layer, the input content of the score regression sublayer is the output content of the feature extraction sublayer, and the output content of the score regression sublayer is the quality evaluation score. The feature extraction sublayer further learns the detail features of the fused feature map to obtain a new feature map, which serves as the input of the score regression sublayer; the face quality evaluation score is finally obtained through the regression processing of the score regression sublayer.
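For illustration only, the following is a minimal PyTorch sketch of the structure just described: parallel branch network layers, a feature fusion network layer, and a first output network layer consisting of a feature extraction sublayer and a score regression sublayer. All layer depths, channel counts, and the concatenation-based fusion are assumptions made for the sketch, not details taken from the patent.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One branch network layer, i.e. the feature extraction network layer of a
    feature evaluation network (depth and widths are illustrative)."""
    def __init__(self, out_channels: int = 32):
        super().__init__()
        self.out_channels = out_channels
        self.net = nn.Sequential(
            nn.Conv2d(3, out_channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2))

    def forward(self, x):
        return self.net(x)

class FaceQualityNet(nn.Module):
    """Branch network group -> feature fusion network layer -> first output layer."""
    def __init__(self, branches):
        super().__init__()
        self.branches = nn.ModuleList(branches)              # parallel branches
        fused_ch = sum(b.out_channels for b in branches)
        self.fusion = nn.Conv2d(fused_ch, 128, kernel_size=1)  # fuse feature maps
        # first output network layer: feature extraction sublayer + score regression
        self.feat_sublayer = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.score_regression = nn.Linear(64, 1)

    def forward(self, x):
        maps = [b(x) for b in self.branches]             # per-feature maps
        fused = self.fusion(torch.cat(maps, dim=1))      # feature fusion layer
        feat = self.feat_sublayer(fused)                 # learn detail features
        return self.score_regression(feat)               # quality evaluation score

net = FaceQualityNet([Branch() for _ in range(5)])       # e.g. five branches
score = net(torch.randn(1, 3, 112, 112))                 # one 112x112 face image
```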
In a specific application, the feature evaluation networks corresponding to the at least two branch network layers may include at least two of: a first feature evaluation network for evaluating face key points, a second feature evaluation network for evaluating image brightness, a third feature evaluation network for evaluating image definition, a fourth feature evaluation network for evaluating face angle, and a fifth feature evaluation network for evaluating face shielding degree. The face key points may include the nose, mouth, eyebrows, cheeks, and so on, and can be determined by a key point detection algorithm; the image brightness may be determined by a brightness determination algorithm; and the image definition, the face angle, and the face shielding degree can be calibrated manually.
For convenience of understanding, fig. 2 shows a schematic structural diagram of the face quality evaluation network when the feature evaluation networks corresponding to the at least two branch network layers include the above five feature evaluation networks. In fig. 2, the face quality evaluation network includes: a branch network group, a feature fusion network layer 220, and a first output network layer 230 connected in sequence, the branch network group including: the system comprises a branch network layer 211 corresponding to a face key point, a branch network layer 212 corresponding to image brightness, a branch network layer 213 corresponding to image definition, a branch network layer 214 corresponding to a face angle and a branch network layer 215 corresponding to face shielding degree, wherein the branch network layer 211 corresponding to the face key point is a branch network layer corresponding to a first feature evaluation network, the branch network layer 212 corresponding to the image brightness is a branch network layer corresponding to a second feature evaluation network, the branch network layer 213 corresponding to the image definition is a branch network layer corresponding to a third feature evaluation network, the branch network layer 214 corresponding to the face angle is a branch network layer corresponding to a fourth feature evaluation network, and the branch network layer 215 corresponding to the face shielding degree is a branch network layer corresponding to a fifth feature evaluation network.
As shown in fig. 3, the training method for a face quality assessment network provided in the embodiment of the present application may include the following steps:
s301, aiming at each feature evaluation network, training the feature evaluation network based on the first sample image data and the output of a second output network layer included in the feature evaluation network to obtain the network parameters of the feature extraction network layer of the feature evaluation network;
In order to solve the problems in the prior art, based on the composition of the face quality evaluation network, in the embodiments of the present application each feature evaluation network is first trained to obtain the network parameters of its feature extraction network layer; because each feature evaluation network learns a single feature, its convergence rate is high. Then, on the basis of the trained network parameters of each feature extraction network layer, all network layers in the face quality evaluation network are trained together. In this way, when the face quality evaluation network is trained, its network parameters only need to be fine-tuned, which greatly reduces the number of parameter adjustments and achieves rapid convergence of the face quality evaluation network.
Specifically, when training each feature evaluation network, for each feature evaluation network, the feature evaluation network may be trained based on the first sample image data, the output of the second output network layer included in the feature evaluation network, and the feature content corresponding to the first sample image data, to obtain the network parameters of the feature extraction network layer of the feature evaluation network. That is, the first sample image data is used as the input content, the feature content corresponding to the first sample image data is used as the ground truth for that input content, and the network parameters of the feature evaluation network are adjusted according to the difference between the output of the second output network layer and the ground truth.
It should be emphasized that training a feature evaluation network is a process of adjusting some or all of its network parameters; back propagation or gradient descent may be used when adjusting the network parameters, but the method is not limited thereto. In addition, when a feature evaluation network is trained, the training process can be terminated once a preset end condition is met, yielding the network parameters of the feature extraction network layer of the feature evaluation network. The preset end condition may be, but is not limited to, the feature evaluation network reaching network convergence. Specifically, the feature evaluation network reaches network convergence when the loss value, calculated with a loss function, between the output of the second output network layer and the ground truth corresponding to the input content falls below a predetermined threshold.
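A hedged sketch of the S301 procedure, continuing the PyTorch notation above. The optimizer choice, the loss threshold, and the assumption that the feature evaluation network exposes its feature extraction network layer as `feature_extractor` are all illustrative, not mandated by the patent.

```python
import torch

def train_feature_eval_net(feature_net, loader, loss_fn,
                           threshold=1e-3, max_epochs=50, lr=1e-2):
    """Train one feature evaluation network on (first sample image, truth) pairs
    until the average loss between the second output network layer's output and
    the ground truth falls below a preset threshold (the end condition)."""
    opt = torch.optim.SGD(feature_net.parameters(), lr=lr)  # gradient descent
    for _ in range(max_epochs):
        total, batches = 0.0, 0
        for images, truth in loader:        # truth: calibrated feature content
            out = feature_net(images)       # output of second output network layer
            loss = loss_fn(out, truth)
            opt.zero_grad()
            loss.backward()                 # back propagation adjusts parameters
            opt.step()
            total, batches = total + loss.item(), batches + 1
        if total / batches < threshold:     # preset end condition reached
            break
    # trained parameters of the feature extraction network layer
    return feature_net.feature_extractor.state_dict()
```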
In addition, it is understood that the first sample image data may be an image acquired by a dedicated face information acquisition device, or an image obtained by performing face detection processing on an image including a face region; either is reasonable.
S302, after the training of each feature evaluation network is completed, on the basis of the trained network parameters of the feature extraction network layer of each feature evaluation network, training the face quality evaluation network based on second sample image data and the output of the first output network layer, to obtain the face quality evaluation network for face quality evaluation.
After the training of each feature evaluation network is completed, on the basis of the trained network parameters of the feature extraction network layer of each feature evaluation network, the face quality evaluation network may be trained based on the second sample image data and the output of the first output network layer, that is, the network parameters in the face quality evaluation network are fine-tuned, to obtain the face quality evaluation network for face quality evaluation. Here, training on the basis of the trained network parameters specifically means: taking the trained network parameters of the feature extraction network layer of each feature evaluation network as the initial network parameters of the corresponding branch network layers in the face quality evaluation network, and then training each network layer in the face quality evaluation network based on the second sample image data and the output of the first output network layer.
Specifically, when the face quality evaluation network is trained, it may be trained based on the second sample image data, the output of the first output network layer, and the quality evaluation scores corresponding to the second sample image data, to obtain the face quality evaluation network for face quality evaluation. That is to say, the second sample image data is used as the input content, the quality evaluation score corresponding to the second sample image data is used as the ground truth for that input content, and the network parameters of the face quality evaluation network are adjusted according to the difference between the output of the first output network layer and the ground truth.
It should be emphasized that training the face quality evaluation network is a process of adjusting some or all of its network parameters; back propagation or gradient descent may be used when adjusting the network parameters, but the method is not limited thereto. Those skilled in the art will understand that, when the face quality evaluation network is trained, the network parameters of each branch network layer may either be updated or be kept fixed. When a preset end condition is met, the training process is terminated and the face quality evaluation network for face quality evaluation is obtained. The preset end condition may be, but is not limited to, the face quality evaluation network reaching network convergence, which specifically means that the loss value, calculated with a loss function, between the output of the first output network layer and the ground truth corresponding to the input content falls below a predetermined threshold.
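Under the same assumptions, S302 can be sketched as follows: the trained feature-extraction parameters initialize the corresponding branch network layers, and the whole network is then fine-tuned against quality evaluation scores. Whether the branch parameters stay fixed or keep training is a choice, as noted above; MSE is one plausible regression loss, not necessarily the patent's.

```python
import torch

def finetune_quality_net(quality_net, branch_states, loader,
                         freeze_branches=False, epochs=5, lr=1e-3):
    """Initialize each branch network layer from step one, then train the full
    face quality evaluation network on (second sample image, score) pairs."""
    for branch, state in zip(quality_net.branches, branch_states):
        branch.load_state_dict(state)            # initial parameters from S301
        for p in branch.parameters():            # branch parameters may be fixed
            p.requires_grad_(not freeze_branches)
    loss_fn = torch.nn.MSELoss()
    params = [p for p in quality_net.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for images, score_truth in loader:       # truth: quality evaluation score
            loss = loss_fn(quality_net(images), score_truth)
            opt.zero_grad()
            loss.backward()                      # fine-tune network parameters
            opt.step()
    return quality_net
```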
It is understood that the second sample image data may be an image acquired by a dedicated face information acquisition device, or an image obtained by performing face detection processing on an image including a face region; either is reasonable. Also, the second sample image data and the first sample image data may be the same or different data.
In addition, it is emphasized that there may be a variety of ways to determine the quality assessment score of the second sample image data. Optionally, in an implementation, the quality assessment score of the second sample image data may be calibrated manually, that is, the quality of the second sample image data is assessed by manual observation.
Optionally, in another implementation, in order to ensure the objectivity of the quality evaluation score and thus improve its reliability, the quality evaluation score of the second sample image data may be: a score determined based on a target similarity, where the target similarity is the similarity between the second sample image data and reference image data corresponding to the second sample image data; the reference image data and the second sample image data are face image data of the same person, and the reference image data is an image including complete face information.
In specific applications, the reference image may be an identification card image or another image that includes complete face information. Moreover, the plurality of second sample image data used to train the face quality evaluation network may be images of one person or of multiple persons. When the plurality of second sample image data are images of one person, they all correspond to the same reference image; when they are images of at least two persons, the second sample image data belonging to the same person correspond to the same reference image.
In addition, any image similarity calculation method may be used to calculate the similarity between the second sample image data and the corresponding reference image. In a specific application, the quality evaluation score may be the similarity itself or the product of the similarity and a preset adjustment value, where the preset adjustment value may be set according to the actual situation, for example 10, 100, or 1000, but is not limited thereto.
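As a concrete illustration of this scoring rule, the sketch below derives the quality evaluation score from the similarity between a sample image and its reference image. Cosine similarity over face embeddings is only one of the "any image similarity calculation method" options, and both the use of embeddings and the scale of 100 are this sketch's assumptions.

```python
import torch
import torch.nn.functional as F

def quality_score(sample_emb: torch.Tensor,
                  reference_emb: torch.Tensor,
                  scale: float = 100.0) -> torch.Tensor:
    """Quality evaluation score of a second sample image: similarity to the
    reference image (same person, complete face information, e.g. an ID photo)
    multiplied by a preset adjustment value such as 10, 100, or 1000."""
    sim = F.cosine_similarity(sample_emb, reference_emb, dim=-1)
    return sim.clamp(min=0.0) * scale  # e.g. similarity 0.87 -> score 87
```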
In the training method of the face quality evaluation network provided by the embodiments of the present application, each feature evaluation network is first trained to obtain the network parameters of its feature extraction network layer; then, taking the trained network parameters of each feature extraction network layer as a starting point, all network layers in the face quality evaluation network are trained together. Because every branch network layer in the face quality evaluation network has already converged once before the face quality evaluation network itself is trained, the unified training of all network layers only fine-tunes the network parameters, which greatly reduces the number of parameter adjustments. Therefore, compared with the prior art, this scheme can improve the network convergence speed of training the face quality evaluation network.
On the basis of the above embodiments, a training method for a face quality assessment network provided by the embodiments of the present application is introduced below with reference to specific embodiments.
It should be noted that the training method for the face quality assessment network provided by the present embodiment is applied to an electronic device. In a specific application, the electronic device may be a terminal device or a server.
In this embodiment, the face quality evaluation network may include: a branch network group, a feature fusion network layer, and a first output network layer connected in sequence; the branch network group includes five branch network layers as parallel branches, each branch network layer corresponds to one feature evaluation network, each branch network layer is the feature extraction network layer of the corresponding feature evaluation network, and each feature evaluation network further includes a second output network layer. The feature evaluation networks corresponding to the five branch network layers include: a first feature evaluation network for evaluating face key points, a second feature evaluation network for evaluating image brightness, a third feature evaluation network for evaluating image definition, a fourth feature evaluation network for evaluating face angle, and a fifth feature evaluation network for evaluating face shielding degree. A schematic diagram of the structure of the face quality evaluation network can be seen in fig. 2.
It will be appreciated by those skilled in the art that certain image features may have a supervisory role with respect to other image features, that is, certain image features may influence the specific feature values of other image features. Therefore, in this embodiment, when a feature evaluation network is trained, the feature maps extracted by other feature evaluation networks can be used as supervision information, so that various kinds of detailed information are fused and the evaluation accuracy of the face quality evaluation network is further improved. For example, the face key points, as basic features of the face, have a supervisory role with respect to the face angle and the face shielding degree, so the feature map extracted by the feature evaluation network corresponding to the face key points can be used as supervision information for the feature evaluation networks of the face angle and the face shielding degree.
As shown in fig. 4, a training method for a face quality assessment network may include the following steps:
s401, aiming at each feature evaluation network in a first type of network, training the feature evaluation network based on first sample image data and the output of a second output network layer included in the feature evaluation network to obtain the network parameters of a feature extraction network layer of the feature evaluation network; wherein the first type of network comprises: the first feature evaluation network, the second feature evaluation network and the third feature evaluation network;
In this embodiment, the first feature evaluation network, the second feature evaluation network, and the third feature evaluation network are used as the first type of network, and each feature evaluation network in the first type of network may be trained independently, that is, without setting supervision information.
Specifically, when training each feature evaluation network in the first type of network, for each feature evaluation network, the feature evaluation network may be trained based on the first sample image data, the output of the second output network layer included in the feature evaluation network, and the feature content corresponding to the first sample image data, to obtain the network parameters of the feature extraction network layer of the feature evaluation network. That is, the first sample image data is used as the input content, the feature content corresponding to the first sample image data is used as the ground truth for that input content, and the network parameters of the feature evaluation network are adjusted according to the difference between the output of the second output network layer and the ground truth.
It should be noted that training each feature evaluation network in the first type of network is a process of adjusting some or all of its network parameters; back propagation or gradient descent may be used when adjusting the network parameters, but the method is not limited thereto. In addition, when each feature evaluation network in the first type of network is trained, the training process can be terminated once a preset end condition is met, yielding the network parameters of the feature extraction network layer of the feature evaluation network. The preset end condition may be, but is not limited to, the feature evaluation network reaching network convergence.
S402, for each feature evaluation network in a second type of network, training the feature evaluation network based on the first sample image data and the output of a second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the first type of network as supervision information; wherein the second type of network may include: the fourth feature evaluation network and the fifth feature evaluation network;
In this embodiment, the fourth feature evaluation network and the fifth feature evaluation network are used as the second type of network, and when each feature evaluation network in the second type of network is trained, supervision information is added, namely the feature maps extracted by one or more feature evaluation networks in the first type of network. Using the feature maps extracted by one or more feature evaluation networks in the first type of network as supervision information for a feature evaluation network in the second type of network specifically means: the feature maps extracted by one or more feature evaluation networks in the first type of network are fused into the feature extraction network layer of the feature evaluation network in the second type of network, that is, the feature extraction network layer of the feature evaluation network in the second type of network utilizes the supervision information during training.
Specifically, when training each feature evaluation network in the second type of network, for each feature evaluation network, the feature evaluation network is trained based on the first sample image data, the output of the second output network layer included in the feature evaluation network, and the feature content corresponding to the first sample image data, with the feature maps extracted by one or more feature evaluation networks in the first type of network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network. That is to say, the first sample image data is used as the input content, the feature content corresponding to the first sample image data is used as the ground truth for that input content, the supervision information is fused into the feature extraction network layer of the feature evaluation network, and the network parameters of the feature evaluation network are then adjusted according to the difference between the output of the second output network layer and the ground truth.
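To make the fusion concrete, here is a hedged sketch of a second-type branch whose feature extraction network layer consumes a supervising feature map from a first-type network. Channel-wise concatenation followed by a 1x1 convolution is one common fusion choice; the patent does not fix the fusion operator, and detaching the supervising map is this sketch's assumption, made to keep the already-trained first-type network unchanged.

```python
import torch
import torch.nn as nn

class SupervisedBranch(nn.Module):
    """Feature extraction network layer of a second-type feature evaluation
    network (e.g. face angle) that fuses a supervising feature map extracted
    by a first-type network (e.g. face key points)."""
    def __init__(self, in_ch=3, sup_ch=32, out_ch=32):
        super().__init__()
        self.extract = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # fusion sublayer: concatenate own features with the supervision map
        self.fuse = nn.Conv2d(out_ch + sup_ch, out_ch, kernel_size=1)
        self.refine = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU())

    def forward(self, x, supervising_map):
        # supervising_map must spatially match self.extract's output
        own = self.extract(x)
        # detach: gradients do not flow back into the trained first-type network
        fused = self.fuse(torch.cat([own, supervising_map.detach()], dim=1))
        return self.refine(fused)
```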
Optionally, since the face key points provide good supervision for the face shielding degree and the face angle, the feature map extracted by the first feature evaluation network may be used as the supervision information when training each feature evaluation network in the second type of network.
When the feature map extracted by the first feature evaluation network serves as the supervision information for the second type of network, in a specific implementation manner, the feature extraction network layer of the first feature evaluation network may include: a first feature extraction sublayer and a second feature extraction sublayer connected in sequence; in this case, when each feature evaluation network in the second type of network is trained, the feature map extracted by the first feature extraction sublayer of the first feature evaluation network is used as the supervision information.
Correspondingly, when the feature map extracted by the first feature extraction sublayer serves as the supervision information, the feature extraction network layer of the fourth feature evaluation network includes: a third feature extraction sublayer, a first feature fusion sublayer and a fourth feature extraction sublayer connected in sequence, wherein the first feature fusion sublayer is also connected with the first feature extraction sublayer; the feature extraction network layer of the fifth feature evaluation network includes: a fifth feature extraction sublayer, a second feature fusion sublayer and a sixth feature extraction sublayer connected in sequence, wherein the second feature fusion sublayer is also connected with the first feature extraction sublayer.
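As a non-limiting illustration of this wiring, the following sketch models one second-type branch (for example, the fourth feature evaluation network): a shallow extraction sublayer, a fusion sublayer that merges the supervising feature map from the first feature extraction sublayer, and a deep extraction sublayer. Concatenation followed by a 1x1 convolution is assumed as the fusion operation, and all channel counts are placeholders.

```python
import torch
import torch.nn as nn

class FusedBranch(nn.Module):
    """Sketch of a second-type branch: shallow extraction sublayer,
    fusion sublayer, deep extraction sublayer (all shapes assumed)."""
    def __init__(self):
        super().__init__()
        self.shallow = nn.Sequential(          # e.g. third feature extraction sublayer
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(16 + 16, 16, 1)  # fusion sublayer: concat then 1x1 conv (assumed)
        self.deep = nn.Sequential(             # e.g. fourth feature extraction sublayer
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())

    def forward(self, x, supervising_map):
        # supervising_map: feature map from the first feature extraction sublayer
        # of the first feature evaluation network (same spatial size assumed)
        h = self.shallow(x)
        h = self.fuse(torch.cat([h, supervising_map], dim=1))
        return self.deep(h)

# usage with dummy tensors:
branch = FusedBranch()
out = branch(torch.randn(1, 3, 64, 64), torch.randn(1, 16, 64, 64))
```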
When the feature map extracted by the first feature extraction sublayer serves as the supervision information for the second type of network, a schematic structural diagram of the face quality evaluation network provided in this embodiment may be as shown in fig. 5. In fig. 5, the face quality evaluation network includes: a branch network group, a feature fusion network layer 520 and a first output network layer 530 connected in sequence, the branch network group including: a branch network layer 511 corresponding to the face key points, a branch network layer 512 corresponding to the image brightness, a branch network layer 513 corresponding to the image definition, a branch network layer 514 corresponding to the face angle, and a branch network layer 515 corresponding to the face shielding degree; the correspondence between each branch network layer and each feature evaluation network in fig. 5 is the same as in fig. 2 and is not repeated here. The branch network layer 511 corresponding to the face key points includes: a first feature extraction sublayer 5111 and a second feature extraction sublayer 5112; the branch network layer 514 corresponding to the face angle includes: a third feature extraction sublayer 5141, a first feature fusion sublayer 5142 and a fourth feature extraction sublayer 5143; the branch network layer 515 corresponding to the face shielding degree includes: a fifth feature extraction sublayer 5151, a second feature fusion sublayer 5152 and a sixth feature extraction sublayer 5153. It should be emphasized that, since the output of the first feature extraction sublayer in the first feature evaluation network is applied to the training of other feature evaluation networks, while the output of the second feature extraction sublayer serves as the output of the feature extraction network layer, the first feature extraction sublayer may be used to extract shallow features and the second feature extraction sublayer to extract deep features; in other words, the feature extraction capability of the first feature extraction sublayer may be lower than that of the second feature extraction sublayer. Based on this requirement, the first feature extraction sublayer and the second feature extraction sublayer may be constructed with different numbers of convolutional layers, pooling layers, and the like.
Similarly, since the outputs of the third and fifth feature extraction sublayers are fused with the output of the first feature extraction sublayer, while the outputs of the fourth and sixth feature extraction sublayers serve as the outputs of the corresponding feature extraction network layers, the third and fifth feature extraction sublayers may be used to extract shallow features, and the fourth and sixth feature extraction sublayers to extract deep features; in other words, the feature extraction capability of the third feature extraction sublayer may be lower than that of the fourth, and that of the fifth lower than that of the sixth. Based on this requirement, the third and fourth feature extraction sublayers may be constructed with different numbers of convolutional layers, pooling layers, and the like; likewise for the fifth and sixth feature extraction sublayers.
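The capability difference between a shallow and a deep extraction sublayer can be expressed by building them with different convolution counts; the helper below and its depth and width values are illustrative assumptions only.

```python
import torch.nn as nn

def make_extraction_sublayer(in_ch, out_ch, num_convs):
    """Build an extraction sublayer; more convs -> stronger extraction capability."""
    layers, ch = [], in_ch
    for _ in range(num_convs):
        layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.ReLU()]
        ch = out_ch
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# Shallow sublayer (e.g. third/fifth) vs. deep sublayer (e.g. fourth/sixth):
shallow = make_extraction_sublayer(3, 16, num_convs=1)   # lower capability
deep = make_extraction_sublayer(16, 32, num_convs=3)     # higher capability
```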
S403, after the training of each feature evaluation network is completed, training the face quality evaluation network based on the second sample image data and the output of the first output network layer, on the basis of the network parameters of the feature extraction network layer of each feature evaluation network obtained by training, so as to obtain the face quality evaluation network for face quality evaluation.
S403 in this embodiment is the same as S102 in the above embodiments, and is not described herein again.
In addition, to further improve the evaluation accuracy of the face quality evaluation network, supervision information may also be added when training the first type of network. Specifically, each feature evaluation network in the first type of network may be trained based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the second type of network as supervision information, so as to obtain the network parameters of the feature extraction network layer of the feature evaluation network. Using the feature maps extracted by one or more feature evaluation networks in the second type of network as supervision information for a feature evaluation network in the first type of network specifically means: fusing those feature maps into the feature extraction network layer of the feature evaluation network in the first type of network.
In this embodiment, before the face quality evaluation network is trained as a whole, each branch network layer in the face quality evaluation network has already undergone one convergence process, so the unified training of all network layers in the face quality evaluation network only fine-tunes their network parameters, which greatly reduces the number of parameter adjustments. Therefore, compared with the prior art, this scheme can improve the network convergence speed when training the face quality evaluation network. In addition, since the feature maps extracted by other feature evaluation networks are used as supervision information when training a feature evaluation network, more detailed information can be fused in, which improves the evaluation accuracy.
In addition, during face detection, images containing irregular faces or non-faces may be detected as face images, and using such images as sample image data can affect the regression of the quality evaluation score. To reduce this influence, a branch for judging the image category may be added. Based on this requirement, in this embodiment of the application, the feature evaluation networks corresponding to the at least two branch network layers may further include:
a sixth feature evaluation network for evaluating an image category, wherein the image category is a non-face, an irregular face, or a regular face.
Specifically, the sixth feature evaluation network may be trained based on the first sample image data, the output of the second output network layer included in the sixth feature evaluation network, and the image category corresponding to the first sample image data, so as to obtain the network parameters of the feature extraction network layer of the sixth feature evaluation network. That is, the first sample image data serves as the input, the image category corresponding to the first sample image data serves as the ground truth for that input, and the network parameters of the sixth feature evaluation network are adjusted according to the difference between the output of the second output network layer and the ground truth.
It can be understood that the image category corresponding to the first sample image data may be labeled manually, but is not limited thereto.
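As a hedged sketch, the image-category branch can be trained as a three-way classifier over the manually labeled categories with a cross-entropy loss; the class encoding, layer shapes, and dummy data below are assumptions for illustration.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 3  # 0: non-face, 1: irregular face, 2: regular face (encoding assumed)

category_branch = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, NUM_CLASSES),               # second output network layer (assumed form)
)
criterion = nn.CrossEntropyLoss()             # difference between output and category label
optimizer = torch.optim.Adam(category_branch.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)            # first sample image data (dummy)
labels = torch.randint(0, NUM_CLASSES, (8,))  # manually labeled image categories
optimizer.zero_grad()
loss = criterion(category_branch(images), labels)
loss.backward()
optimizer.step()
```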
Optionally, to further improve the evaluation accuracy of the face quality evaluation network, when training the sixth feature evaluation network, the network may be trained based on the first sample image data and the output of the second output network layer included in the network, with the feature maps extracted by one or more feature evaluation networks in a third type of network as supervision information, so as to obtain the network parameters of its feature extraction network layer; the third type of network includes the feature evaluation networks other than the sixth feature evaluation network. In a specific implementation manner, when the sixth feature evaluation network is trained, the feature map extracted by the feature extraction network layer of the fourth feature evaluation network may be used as the supervision information. Specifically, on the basis that the fourth feature evaluation network includes the third feature extraction sublayer, the first feature fusion sublayer and the fourth feature extraction sublayer, the feature extraction network layer of the sixth feature evaluation network may include: a seventh feature extraction sublayer, a third feature fusion sublayer and an eighth feature extraction sublayer connected in sequence, wherein the third feature fusion sublayer is also connected with the first feature fusion sublayer.
When the feature extraction network layer of the sixth feature evaluation network includes a seventh feature extraction sublayer, a third feature fusion sublayer and an eighth feature extraction sublayer connected in sequence, a schematic structural diagram of the face quality evaluation network provided in the embodiment of the present application may be as shown in fig. 6. Compared with the face quality evaluation network shown in fig. 5, the network shown in fig. 6 adds a branch network layer 516 corresponding to the image category, where the branch network layer 516 is the feature extraction network layer of the sixth feature evaluation network and includes: a seventh feature extraction sublayer 5161, a third feature fusion sublayer 5162 and an eighth feature extraction sublayer 5163.
In addition, since the face key points serve as basic features of the face and provide supervision for the face angle and the face shielding degree, the face quality evaluation network may further include: a basic network layer, where the basic network layer is the feature extraction network layer in the first feature evaluation network, and the output content of the basic network layer is the input content of each branch network layer in the branch network group.
To facilitate understanding of the relationship between the basic network layer and each branch network layer, fig. 7 shows a schematic structural diagram of a face quality evaluation network with a basic network layer. The face quality evaluation network shown in fig. 7 adds a basic network layer 500 to the face quality evaluation network shown in fig. 6.
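Putting the pieces of fig. 7 together, a minimal sketch of the overall topology, namely a shared basic network layer feeding parallel branch network layers whose feature maps pass through the feature fusion network layer into the first output network layer, might look as follows; every module and channel count is a placeholder, and concatenation is assumed as the fusion operation.

```python
import torch
import torch.nn as nn

class FaceQualityNet(nn.Module):
    def __init__(self, num_branches=6, base_ch=16, branch_ch=32):
        super().__init__()
        self.base = nn.Sequential(                     # basic network layer (shared)
            nn.Conv2d(3, base_ch, 3, padding=1), nn.ReLU())
        self.branches = nn.ModuleList([                # branch network group (parallel)
            nn.Sequential(nn.Conv2d(base_ch, branch_ch, 3, padding=1), nn.ReLU())
            for _ in range(num_branches)])
        self.fusion = nn.Conv2d(num_branches * branch_ch, branch_ch, 1)  # feature fusion network layer
        self.output = nn.Sequential(                   # first output network layer
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(branch_ch, 1))                   # quality evaluation score

    def forward(self, x):
        h = self.base(x)
        feats = [branch(h) for branch in self.branches]
        fused = self.fusion(torch.cat(feats, dim=1))   # fuse the branch feature maps
        return self.output(fused)

score = FaceQualityNet()(torch.randn(1, 3, 64, 64))   # scalar quality score
```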
Corresponding to the above training method for the face quality evaluation network, an embodiment of the present application further provides a training device for a face quality evaluation network, where the face quality evaluation network includes: a branch network group, a feature fusion network layer and a first output network layer which are connected in sequence; the branch network group includes at least two branch network layers which are parallel branches, each branch network layer corresponds to one feature evaluation network, each branch network layer is the feature extraction network layer in the corresponding feature evaluation network, and each feature evaluation network further includes a second output network layer;
as shown in fig. 8, an apparatus for training a face quality assessment network may include:
a first training unit 810, configured to train, for each feature evaluation network, the feature evaluation network based on the first sample image data and an output of a second output network layer included in the feature evaluation network, so as to obtain a network parameter of a feature extraction network layer of the feature evaluation network;
and a second training unit 820, configured to train the face quality evaluation network based on second sample image data and the output of the first output network layer after the training of each feature evaluation network is completed and on the basis of the network parameters of the feature extraction network layer of each feature evaluation network obtained through training, so as to obtain a face quality evaluation network used for face quality evaluation.
The training device for the face quality evaluation network provided by the embodiment of the application first trains each feature evaluation network to obtain the network parameters of its feature extraction network layer, and then, on the basis of these trained parameters, uniformly trains all network layers in the face quality evaluation network. Because each branch network layer has already undergone one convergence process before the face quality evaluation network is trained, the unified training only fine-tunes the network parameters of each network layer, which greatly reduces the number of parameter adjustments. Therefore, compared with the prior art, this scheme can improve the network convergence speed when training the face quality evaluation network.
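The two-stage procedure implemented by these units, pre-training each branch and then fine-tuning the assembled network on quality scores, can be sketched as below; branch_tasks, the dummy data, and the smaller fine-tuning learning rate are assumptions, and FaceQualityNet refers to the illustrative class sketched earlier, not to a structure fixed by this application.

```python
import torch
import torch.nn as nn

# Hypothetical per-branch pre-training task: (branch, second output layer,
# data, loss); shapes and label types are illustrative assumptions.
branch = nn.Conv2d(3, 8, 3, padding=1)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
branch_tasks = [(branch, head,
                 [(torch.randn(4, 3, 64, 64), torch.randn(4, 10))], nn.MSELoss())]

# Stage 1 (first training unit): pre-train each branch on its own labels.
for br, hd, data, criterion in branch_tasks:
    opt = torch.optim.Adam(list(br.parameters()) + list(hd.parameters()), lr=1e-3)
    for images, labels in data:                       # first sample image data
        opt.zero_grad()
        criterion(hd(br(images)), labels).backward()
        opt.step()

# Stage 2 (second training unit): fine-tune the assembled network on scores.
net = FaceQualityNet()                                # sketch from the fig. 7 example above
opt = torch.optim.Adam(net.parameters(), lr=1e-4)     # smaller lr: parameters only fine-tuned
mse = nn.MSELoss()
for images, scores in [(torch.randn(4, 3, 64, 64), torch.rand(4))]:
    opt.zero_grad()                                   # second sample image data + quality scores
    mse(net(images).squeeze(1), scores).backward()
    opt.step()
```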
Optionally, in a specific implementation manner, the feature evaluation networks corresponding to the at least two branch network layers include:
the system comprises a first feature evaluation network for evaluating key points of the face, a second feature evaluation network for evaluating the brightness of an image, a third feature evaluation network for evaluating the definition of the image, a fourth feature evaluation network for evaluating the angle of the face and a fifth feature evaluation network for evaluating the degree of face shielding.
Optionally, on the basis that the feature evaluation networks corresponding to the at least two branch network layers include a first feature evaluation network, a second feature evaluation network, a third feature evaluation network, a fourth feature evaluation network, and a fifth feature evaluation network, the first training unit 810 may include:
the first-class network training subunit is used for training the characteristic evaluation network aiming at each characteristic evaluation network in the first-class network and based on the first sample image data and the output of a second output network layer included by the characteristic evaluation network to obtain the network parameters of the characteristic extraction network layer of the characteristic evaluation network; wherein the first type of network comprises: the first feature evaluation network, the second feature evaluation network and the third feature evaluation network;
a second-type network training subunit, configured to train, for each feature evaluation network in the second type of network, the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the first type of network as supervision information; wherein the second type of network includes: the fourth feature evaluation network and the fifth feature evaluation network.
Optionally, the first-type network training subunit may include:
a first-type network training module, configured to train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the second type of network as supervision information, so as to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
Optionally, the second-type network training subunit may include:
a second-type network training module, configured to train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by the first feature evaluation network as supervision information, so as to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
Optionally, the feature extraction network layer of the first feature evaluation network includes: a first feature extraction sublayer and a second feature extraction sublayer connected in sequence;
the second type of network training module is specifically configured to:
and training the feature evaluation network to obtain network parameters of the feature extraction network layer of the feature evaluation network based on the feature graph extracted by the first feature extraction sub-layer in the first feature evaluation network as supervision information based on the first sample image data and the output of the second output network layer included by the feature evaluation network.
Optionally, the feature extraction network layer of the fourth feature evaluation network includes: a third feature extraction sublayer, a first feature fusion sublayer and a fourth feature extraction sublayer connected in sequence, wherein the first feature fusion sublayer is also connected with the first feature extraction sublayer;
the feature extraction network layer of the fifth feature evaluation network includes: a fifth feature extraction sublayer, a second feature fusion sublayer and a sixth feature extraction sublayer connected in sequence, wherein the second feature fusion sublayer is also connected with the first feature extraction sublayer.
Optionally, on the basis that the feature evaluation networks corresponding to the at least two branch network layers include a first feature evaluation network, a second feature evaluation network, a third feature evaluation network, a fourth feature evaluation network, and a fifth feature evaluation network, the feature evaluation networks corresponding to the at least two branch network layers may further include:
a sixth feature evaluation network for evaluating image categories, wherein the image categories are non-faces, irregular faces or regular faces.
Optionally, the first training unit 810 is specifically configured to: for the sixth feature evaluation network, train the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in a third type of network as supervision information, so as to obtain the network parameters of the feature extraction network layer of the feature evaluation network, where the third type of network includes the feature evaluation networks other than the sixth feature evaluation network.
Optionally, the face quality evaluation network may further include: and the basic network layer is a feature extraction network layer in the first feature evaluation network, and the output content of the basic network layer is the input content of each branch network layer in the branch network group.
Optionally, the first output network layer includes: a feature extraction sublayer and a score regression sublayer;
the input content of the feature extraction sublayer is the output content of the feature fusion network layer, the input content of the score regression sublayer is the output content of the feature extraction sublayer, and the output content of the score regression sublayer is the quality evaluation score.
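A hedged sketch of such a first output network layer follows: a feature extraction sublayer consuming the fused feature map, then a score regression sublayer emitting the scalar quality evaluation score; the layer shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

first_output_layer = nn.Sequential(
    # feature extraction sublayer: input is the feature fusion network layer's output
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    # score regression sublayer: output is the quality evaluation score
    nn.Linear(64, 1),
)
fused_map = torch.randn(1, 32, 16, 16)    # dummy output of the feature fusion network layer
quality_score = first_output_layer(fused_map)
```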
Optionally, the second training unit 820 may include:
the second training subunit is used for training the face quality evaluation network based on second sample image data, the output of the first output network layer and a quality evaluation score corresponding to the second sample image data to obtain a face quality evaluation network for face quality evaluation;
wherein the quality evaluation score of the second sample image data is a score determined based on a target similarity, where the target similarity is the similarity between the second sample image data and the reference image data corresponding to the second sample image data; the reference image data and the second sample image data are face image data of the same person, and the reference image data is an image including complete face information.
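One plausible way to produce such a similarity-derived label, sketched under the assumption that some face recognition embedder and cosine similarity are used (neither is fixed by this application), is:

```python
import torch
import torch.nn.functional as F

def quality_score_label(sample_img, reference_img, embedder):
    """Target similarity between a sample face and its same-person reference
    (an image with complete face information); cosine similarity is assumed."""
    with torch.no_grad():
        e_sample = embedder(sample_img)    # embedder: any face feature extractor (hypothetical)
        e_ref = embedder(reference_img)
    return F.cosine_similarity(e_sample, e_ref, dim=1)  # higher -> better quality

# usage with a stand-in embedder (not a real face recognition model):
embedder = lambda img: img.flatten(1)
label = quality_score_label(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64), embedder)
```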
Corresponding to the training method for the face quality evaluation network provided by the embodiment of the present application, an embodiment of the present application further provides an electronic device, as shown in fig. 9, including a processor 910, a communication interface 920, a memory 930 and a communication bus 940, where the processor 910, the communication interface 920 and the memory 930 communicate with each other through the communication bus 940,
a memory 930 for storing a computer program;
the processor 910 is configured to implement the training method for the face quality assessment network provided in the embodiment of the present application when executing the program stored in the memory 930.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a random access memory (RAM), or may include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the training method for the face quality assessment network provided in the embodiment of the present application.
In addition, the embodiment of the application also provides a face quality evaluation method, a face quality evaluation device, electronic equipment and a storage medium.
It should be noted that the face quality evaluation method provided by the embodiment of the present application is applied to an electronic device. In a specific application, the electronic device may be a terminal device or a server.
As shown in fig. 10, the method for evaluating face quality provided in the embodiment of the present application may include the following steps:
S1001, obtaining a target face image to be evaluated;
when the quality score of a face image needs to be evaluated, the electronic device may obtain a target face image to be evaluated.
It should be noted that the target face image may be a face image captured by a dedicated face information capture device, or an image obtained by performing face detection processing on an image containing a face region.
S1002, inputting the target face image into a pre-trained face quality evaluation network to obtain a quality evaluation score of the target face image.
The face quality evaluation network is trained based on the training method of the face quality evaluation network provided by the embodiment of the application.
It is understood that after the target face image to be evaluated is obtained, the target face image may be directly input to the face quality evaluation network. Of course, before the target face image is input to the face quality evaluation network, image preprocessing may be performed on the target face image, and the target face image after image preprocessing is input to the face quality evaluation network.
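A hedged sketch of this evaluation flow, with resizing and normalization assumed as the optional preprocessing (the application leaves preprocessing open), is:

```python
import torch
import torch.nn.functional as F

def evaluate_face_quality(net, face_img):
    """face_img: HxWx3 uint8-like tensor; net: trained face quality evaluation network."""
    x = face_img.permute(2, 0, 1).float().unsqueeze(0) / 255.0  # to NCHW in [0, 1] (assumed)
    x = F.interpolate(x, size=(64, 64), mode='bilinear', align_corners=False)  # assumed input size
    net.eval()
    with torch.no_grad():
        return net(x).item()                                    # quality evaluation score

# usage (untrained placeholder network, dummy image):
score = evaluate_face_quality(FaceQualityNet(), torch.randint(0, 256, (128, 128, 3)))
```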
In the face quality evaluation method provided by the embodiment of the application, a target face image to be evaluated is obtained and input into a pre-trained face quality evaluation network to obtain the quality evaluation score of the target face image, where the face quality evaluation network is trained with the training method for the face quality evaluation network provided by the embodiment of the application. Therefore, the face quality evaluation method provided by the embodiment of the application can quickly obtain an effective face quality evaluation score.
Corresponding to the face quality evaluation method provided by the embodiment of the application, the embodiment of the application also provides a face quality evaluation device. As shown in fig. 11, the face quality evaluation apparatus provided in the embodiment of the present application may include:
a face image obtaining unit 1110, configured to obtain a target face image to be evaluated;
an evaluation score obtaining unit 1120, configured to input the target face image into a pre-trained face quality evaluation network, so as to obtain a quality evaluation score of the target face image; the face quality evaluation network is trained based on the training method of the face quality evaluation network provided by the embodiment of the application.
The face quality evaluation device provided by the embodiment of the application obtains a target face image to be evaluated and inputs it into a pre-trained face quality evaluation network to obtain the quality evaluation score of the target face image, where the face quality evaluation network is trained with the training method for the face quality evaluation network provided by the embodiment of the application. Therefore, the face quality evaluation device provided by the embodiment of the application can quickly obtain an effective face quality evaluation score.
Corresponding to the face quality evaluation method provided by the embodiment of the present application, an embodiment of the present application further provides an electronic device, as shown in fig. 12, including a processor 1210, a communication interface 1220, a memory 1230 and a communication bus 1240, where the processor 1210, the communication interface 1220 and the memory 1230 communicate with each other through the communication bus 1240,
a memory 1230 for storing computer programs;
the processor 1210 is configured to implement the method for evaluating human face quality provided by the embodiment of the present application when executing the program stored in the memory 1230.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a random access memory (RAM), or may include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the face quality assessment method provided in the embodiment of the present application are implemented.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (17)

1. A training method of a face quality evaluation network, characterized in that:
the face quality evaluation network comprises: a branch network group, a feature fusion network layer and a first output network layer which are connected in sequence; the branch network group comprises at least two branch network layers which are parallel branches, each branch network layer corresponds to one feature evaluation network, each branch network layer is the feature extraction network layer in the corresponding feature evaluation network, and each feature evaluation network further comprises a second output network layer;
the method comprises the following steps:
for each feature evaluation network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network;
after the training of each feature evaluation network is completed, on the basis of the network parameters of the feature extraction network layer of each feature evaluation network obtained by training, the face quality evaluation network is trained on the basis of second sample image data and the output of the first output network layer, and the face quality evaluation network for face quality evaluation is obtained.
2. The method of claim 1, wherein the feature evaluation networks corresponding to the at least two branch network layers comprise:
the system comprises a first feature evaluation network for evaluating key points of the face, a second feature evaluation network for evaluating the brightness of an image, a third feature evaluation network for evaluating the definition of the image, a fourth feature evaluation network for evaluating the angle of the face and a fifth feature evaluation network for evaluating the degree of face shielding.
3. The method of claim 2, wherein the step of training, for each feature evaluation network, the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network to obtain the network parameters of the feature extraction network layer of the feature evaluation network comprises:
for each feature evaluation network in the first type of network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network; wherein the first type of network comprises: the first feature evaluation network, the second feature evaluation network and the third feature evaluation network;
for each feature evaluation network in the second type of network, training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the first type of network as supervision information; wherein the second type of network comprises: the fourth feature evaluation network and the fifth feature evaluation network.
4. The method according to claim 3, wherein the step of training, for each feature evaluation network in the first type of network, the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, to obtain the network parameters of the feature extraction network layer of the feature evaluation network comprises:
training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature maps extracted by one or more feature evaluation networks in the second type of network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
5. The method according to claim 3, wherein the step of training, for each feature evaluation network in the second type of network, the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by one or more feature evaluation networks in the first type of network as the supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network comprises:
and training the feature evaluation network by taking the feature graph extracted by the first feature evaluation network as supervision information to obtain network parameters of the feature extraction network layer of the feature evaluation network based on the first sample image data and the output of the second output network layer included by the feature evaluation network aiming at each feature evaluation network in the second type of network.
6. The method of claim 5, wherein the feature extraction network layer of the first feature evaluation network comprises: a first feature extraction sublayer and a second feature extraction sublayer connected in sequence;
the step of training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by the first feature evaluation network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network comprises:
training the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network, with the feature map extracted by the first feature extraction sublayer in the first feature evaluation network as supervision information, to obtain the network parameters of the feature extraction network layer of the feature evaluation network.
7. The method of claim 6, wherein the feature extraction network layer of the fourth feature evaluation network comprises: a third feature extraction sublayer, a first feature fusion sublayer and a fourth feature extraction sublayer which are connected in sequence, wherein the first feature fusion sublayer is also connected with the first feature extraction sublayer;
the feature extraction network layer of the fifth feature evaluation network comprises: a fifth feature extraction sublayer, a second feature fusion sublayer and a sixth feature extraction sublayer which are connected in sequence, wherein the second feature fusion sublayer is also connected with the first feature extraction sublayer.
8. The method according to any one of claims 2-7, wherein the feature evaluation network corresponding to the at least two branch network layers further comprises:
a sixth feature evaluation network for evaluating image categories, wherein the image categories are non-faces, irregular faces or regular faces.
9. The method of claim 8, wherein the step of training, for each feature evaluation network, the feature evaluation network based on the first sample image data and the output of the second output network layer included in the feature evaluation network to obtain the network parameters of the feature extraction network layer of the feature evaluation network comprises:
and training the feature evaluation network based on the first sample image data and the output of a second output network layer included by the feature evaluation network, wherein feature graphs extracted from one or more feature evaluation networks in a third type of network are used as supervision information, and network parameters of the feature extraction network layer of the feature evaluation network are obtained, and the third type of network includes feature evaluation networks except the sixth feature evaluation network.
10. The method of claim 8, wherein the face quality assessment network further comprises: and the basic network layer is a feature extraction network layer in the first feature evaluation network, and the output content of the basic network layer is the input content of each branch network layer in the branch network group.
11. The method of any of claims 1-7, wherein the first output network layer comprises: a feature extraction sublayer and a score regression sublayer;
the input content of the feature extraction sublayer is the output content of the feature fusion network layer, the input content of the score regression sublayer is the output content of the feature extraction sublayer, and the output content of the score regression sublayer is the quality evaluation score.
12. The method according to any one of claims 1 to 7, wherein the step of training the face quality assessment network based on the second sample image data and the output of the first output network layer to obtain a face quality assessment network for face quality assessment comprises:
training the face quality evaluation network based on second sample image data, the output of the first output network layer and a quality evaluation score corresponding to the second sample image data to obtain a face quality evaluation network for face quality evaluation;
wherein the quality evaluation score of the second sample image data is a score determined based on a target similarity, wherein the target similarity is the similarity between the second sample image data and reference image data corresponding to the second sample image data, the reference image data and the second sample image data are face image data of the same person, and the reference image data is an image including complete face information.
13. A face quality assessment method is characterized by comprising the following steps:
obtaining a target face image to be evaluated;
inputting the target face image into a pre-trained face quality evaluation network to obtain a quality evaluation score of the target face image; wherein the face quality assessment network is a network trained based on the method of any one of claims 1-12.
14. An apparatus for training a face quality assessment network, the face quality assessment network comprising: the system comprises a branch network group, a feature fusion network layer and a first output network layer which are connected in sequence; the branch network group comprises at least two branch network layers which are parallel branches, each branch network layer corresponds to one feature evaluation network, each branch network layer is a feature extraction network layer in the corresponding feature evaluation network, and each feature evaluation network further comprises a second output network layer;
the device comprises:
the first training unit is used for training the characteristic evaluation network aiming at each characteristic evaluation network based on the first sample image data and the output of a second output network layer included by the characteristic evaluation network to obtain the network parameters of the characteristic extraction network layer of the characteristic evaluation network;
and the second training unit is used for training the face quality evaluation network on the basis of second sample image data and the output of the first output network layer after the training of each feature evaluation network is finished and on the basis of the network parameters of the feature extraction network layer of each feature evaluation network obtained through training, so as to obtain the face quality evaluation network for face quality evaluation.
15. A face quality assessment apparatus, comprising:
the device comprises an obtaining unit, a judging unit and a judging unit, wherein the obtaining unit is used for obtaining a target face image to be evaluated;
the determining unit is used for inputting the target face image into a pre-trained face quality evaluation network to obtain a quality evaluation score of the target face image; wherein the face quality assessment network is a network trained based on the method of any one of claims 1-12.
16. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-12 when executing a program stored in the memory.
17. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of claim 13 when executing a program stored in the memory.
CN201810730258.5A 2018-07-05 2018-07-05 Face quality evaluation network training method, face quality evaluation method and device Active CN110688875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810730258.5A CN110688875B (en) 2018-07-05 2018-07-05 Face quality evaluation network training method, face quality evaluation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810730258.5A CN110688875B (en) 2018-07-05 2018-07-05 Face quality evaluation network training method, face quality evaluation method and device

Publications (2)

Publication Number Publication Date
CN110688875A true CN110688875A (en) 2020-01-14
CN110688875B CN110688875B (en) 2022-11-04

Family

ID=69106601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810730258.5A Active CN110688875B (en) 2018-07-05 2018-07-05 Face quality evaluation network training method, face quality evaluation method and device

Country Status (1)

Country Link
CN (1) CN110688875B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834898A (en) * 2015-04-09 2015-08-12 华南理工大学 Quality classification method for portrait photography image
CN105631439A (en) * 2016-02-18 2016-06-01 北京旷视科技有限公司 Human face image collection method and device
CN106951825A (en) * 2017-02-13 2017-07-14 北京飞搜科技有限公司 A kind of quality of human face image assessment system and implementation method
CN106897748A (en) * 2017-03-02 2017-06-27 上海极链网络科技有限公司 Face method for evaluating quality and system based on deep layer convolutional neural networks
CN107103585A (en) * 2017-04-28 2017-08-29 广东工业大学 A kind of image super-resolution system
CN108171256A (en) * 2017-11-27 2018-06-15 深圳市深网视界科技有限公司 Facial image matter comments model construction, screening, recognition methods and equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VIGNESH S et al.: "Face image quality assessment for face selection in surveillance video using convolutional neural networks", 2015 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION *
ZHANG Yuanlin et al.: "Face recognition based on second-degree redundancy network", Journal of Computer Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274993A (en) * 2020-02-12 2020-06-12 深圳数联天下智能科技有限公司 Eyebrow recognition method and device, computing equipment and computer-readable storage medium
CN111274993B (en) * 2020-02-12 2023-08-04 深圳数联天下智能科技有限公司 Eyebrow recognition method, device, computing equipment and computer readable storage medium
CN112990156A (en) * 2021-05-12 2021-06-18 深圳市安软科技股份有限公司 Optimal target capturing method and device based on video and related equipment

Also Published As

Publication number Publication date
CN110688875B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN110163300B (en) Image classification method and device, electronic equipment and storage medium
CN108615071B (en) Model testing method and device
US10262190B2 (en) Method, system, and computer program product for recognizing face
CN109389135B (en) Image screening method and device
CN108124486A (en) Face living body detection method based on cloud, electronic device and program product
WO2017143921A1 (en) Multi-sampling model training method and device
CN105303150B (en) Realize the method and system of image procossing
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN112464809A (en) Face key point detection method and device, electronic equipment and storage medium
CN110956615B (en) Image quality evaluation model training method and device, electronic equipment and storage medium
CN110472611A (en) Method, apparatus, electronic equipment and the readable storage medium storing program for executing of character attribute identification
CN110909784B (en) Training method and device of image recognition model and electronic equipment
WO2020088029A1 (en) Liveness detection method, storage medium, and electronic device
CN110705428B (en) Facial age recognition system and method based on impulse neural network
WO2024051597A1 (en) Standard pull-up counting method, and system and storage medium therefor
CN111144398A (en) Target detection method, target detection device, computer equipment and storage medium
CN110688875B (en) Face quality evaluation network training method, face quality evaluation method and device
CN110738702B (en) Three-dimensional ultrasonic image processing method, device, equipment and storage medium
CN111144369A (en) Face attribute identification method and device
CN109754077B (en) Network model compression method and device of deep neural network and computer equipment
CN110751171A (en) Image data classification method and device, computer equipment and storage medium
CN111428655A (en) Scalp detection method based on deep learning
CN111401343A (en) Method for identifying attributes of people in image and training method and device for identification model
CN108596094B (en) Character style detection system, method, terminal and medium
CN117392733A (en) Acne grading detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant