CN111814759A - Method and device for acquiring face quality label value, server and storage medium - Google Patents
- Publication number
- CN111814759A CN111814759A CN202010854320.9A CN202010854320A CN111814759A CN 111814759 A CN111814759 A CN 111814759A CN 202010854320 A CN202010854320 A CN 202010854320A CN 111814759 A CN111814759 A CN 111814759A
- Authority
- CN
- China
- Prior art keywords
- face quality
- label value
- face
- image
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The application discloses a method, an apparatus, a server, and a storage medium for acquiring a face quality label value. The method comprises the following steps: obtaining a first image set, the first image set comprising a plurality of first image samples; obtaining a first face quality label value of each first image sample and a manually labelled preset quality label value corresponding to that first face quality label value; training a pre-constructed label training model by taking at least the first face quality label value of each first image sample as an input sample of the label training model and the corresponding preset quality label value as an output sample; and processing at least the first face quality label value of each second image sample in a second image set with the trained label training model to obtain a second face quality label value of each second image sample, the second face quality label values of the second image samples being used to train a face quality model.
Description
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method and an apparatus for obtaining a face quality label value, a server, and a storage medium.
Background
As an artificial intelligence (AI) technology, face recognition is widely applied in fields such as face-scan payment and returning-customer recognition. In scenarios such as face-scan payment, however, the accuracy requirement is very high because the face is bound to an individual account, so the user needs to cooperate, for example by keeping the face upright and unoccluded, with a clear image and good illumination. When the user does not comply, the user needs to be prompted accurately and the improper behavior corrected, which requires a quality attribution-score model to perform a structured, per-dimension analysis of images unsuitable for face recognition. On the other hand, in tasks such as selecting the best supermarket snapshot photo for archiving, no per-dimension structured analysis is needed; only the video frame or image most suitable for face recognition is selected for the archive, which requires a quality total-score model to evaluate how friendly a face image is to the face recognition model.
The face quality score is generally used to screen and filter the input images of a face recognition system. It is a front-end stage of face recognition and plays a critical role in the accuracy and stability of the system. Face quality score evaluation methods can be roughly divided into traditional image-processing methods and deep-learning methods. A traditional image-processing method evaluates the angle attribution score by using the similarity between histograms of local features of the left and right face halves as a measure of facial symmetry; evaluates the blur attribution score through the Laplacian operator response; estimates the illumination quality by determining the length of the used range of gray-scale intensities, yielding an illumination attribution score; and applies a mapping over the four-dimensional attribution scores to obtain a total quality score. Deep-learning methods are mainly divided into manual-labeling and automatic-labeling approaches according to how the labels are obtained.
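The traditional measures above can be sketched with plain NumPy. This is a minimal illustration, not the patent's implementation: the Laplacian stencil, histogram-intersection symmetry measure, and percentile-based illumination range are common textbook choices assumed here for concreteness.

```python
import numpy as np

def blur_score(gray: np.ndarray) -> float:
    """Variance of the Laplacian response over the image interior; higher = sharper."""
    lap = (-4 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def symmetry_score(gray: np.ndarray, bins: int = 32) -> float:
    """Histogram intersection between the left half and the mirrored right half (0..1)."""
    h, w = gray.shape
    left = gray[:, : w // 2]
    right = gray[:, w - w // 2:][:, ::-1]          # mirror the right half
    hl, _ = np.histogram(left, bins=bins, range=(0, 255), density=True)
    hr, _ = np.histogram(right, bins=bins, range=(0, 255), density=True)
    return float(np.minimum(hl, hr).sum() / max(hl.sum(), 1e-9))

def illumination_score(gray: np.ndarray) -> float:
    """Fraction of the gray-level range actually used (0..1), robust to outliers."""
    lo, hi = np.percentile(gray, [2, 98])
    return float((hi - lo) / 255.0)
```

A total quality score would then be some mapping over these per-dimension scores (plus an occlusion score), e.g. a weighted combination; the patent does not fix the mapping.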
Based on the above, there are two methods for obtaining a face quality attribution-score model: the first is to manually judge the type and degree of each sample's degradation and then use the samples for model training; the second is to artificially degrade an ideal sample, for example by adding occlusion, blur, or brightness changes to an image, label the samples according to the degradation degree, and use them for model training. Two methods are likewise used to obtain a face quality total-score model: the first is to manually judge the total quality score of a sample and then use it for model training; the second is to use a face recognition model to compute the face similarity between the picture to be labelled and a standard reference picture, and directly substitute that similarity for the total quality score in model training.
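The second attribution-labelling route, i.e. applying controlled degradations to an ideal sample and using the degradation strength as the label, can be sketched as follows. The specific degradation operators (corner occlusion, iterated box blur, exposure shift) and the label scale in [0, 1] are illustrative assumptions, not values from the patent.

```python
import numpy as np

def degrade(img: np.ndarray, kind: str, level: float) -> np.ndarray:
    """Return a degraded copy; `level` in [0, 1] doubles as the attribution label."""
    out = img.astype(np.float64).copy()
    h, w = out.shape[:2]
    if kind == "occlusion":            # black box covering `level` of the area
        bh, bw = int(h * level ** 0.5), int(w * level ** 0.5)
        out[:bh, :bw] = 0
    elif kind == "blur":               # repeated box blur; more passes = blurrier
        for _ in range(int(level * 10)):
            out[1:-1, 1:-1] = (out[:-2, 1:-1] + out[2:, 1:-1]
                               + out[1:-1, :-2] + out[1:-1, 2:]) / 4
    elif kind == "brightness":         # shift exposure up or down
        out = np.clip(out + (level - 0.5) * 255, 0, 255)
    return out

def make_labelled_pairs(img, kinds=("occlusion", "blur", "brightness")):
    """(degraded image, {kind: level}) pairs for attribution-model training."""
    return [(degrade(img, k, lv), {k: lv})
            for k in kinds for lv in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

As the description notes later, samples degraded this way differ from real-world captures, which is precisely the accuracy weakness the claimed method addresses.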
However, quality score labels obtained by manual scoring suffer from high cost and long labelling time, so their feasibility is low. The currently common automatic labelling methods are cheap and efficient, but their accuracy is insufficient, so the accuracy of the face quality model trained on them is low.
Disclosure of Invention
In view of this, the present application provides a method, an apparatus, a server and a storage medium for obtaining a face quality label value, so as to improve the accuracy of a face quality model trained by the label value.
In order to achieve the above object, in one aspect, the present application provides a method for obtaining a face quality label value, including:
obtaining a first image set, wherein the first image set comprises a plurality of first image samples;
obtaining a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, wherein the first face quality label value comprises a test label value of a face quality total score and/or a test label value of a face quality attribution score, and the preset quality label value is a preset label value of the face quality total score or a preset label value of the face quality attribution score;
training the label training model by taking at least a first face quality label value of the first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
the method comprises the steps of processing a first face quality label value of each second image sample in a second image set by using a trained label training model to obtain a second face quality label value of each second image sample, wherein the second face quality label value of each second image sample is used for training a face quality model, the face quality model comprises a face quality total score model or a face quality attribution score model, the face quality total score model is used for obtaining a face quality total score of a test image, and the face quality attribution score model is used for obtaining a face quality attribution score of the test image.
In one possible implementation, obtaining a first face quality label value for each of the first image samples includes:
performing image processing on each first image sample by using an initially trained face quality total score model to obtain a test label value of the face quality total score of each first image sample;
and/or,
performing image processing on each first image sample by using an initially trained face quality attribution score model to obtain a test label value of the face quality attribution score of each first image sample;
the face quality total score model is obtained by training a plurality of third image samples with initial label values of face quality total scores, and the face quality attribution score model is obtained by training a plurality of third image samples with initial label values of face quality attribution scores.
Optionally, the initial label value of the total face quality score of the third image sample is obtained by:
performing face recognition on each third image sample by using a preset face recognition model to obtain an initial label value of the total face quality score of the third image sample;
the initial label value of the face quality attribution score of the third image sample is obtained by:
performing degradation processing on each third image sample in at least one degradation dimension to obtain an initial label value of at least one face quality attribution of the third image sample.
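The first initial-label route above, i.e. deriving the total-score label from face-recognition similarity against a standard reference photo, can be sketched as follows. The embedding function is a placeholder: any face-recognition model producing feature vectors could supply it, and the affine mapping from similarity to a [0, 1] label is an illustrative assumption.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def initial_total_score(sample_emb: np.ndarray, reference_emb: np.ndarray) -> float:
    """Map similarity in [-1, 1] to an initial total-score label in [0, 1]."""
    return (cosine_similarity(sample_emb, reference_emb) + 1.0) / 2.0
```

As the description notes, similarity and quality are not fully equivalent, which is why these labels serve only as initial values to be corrected by the label training model.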
Optionally, the first image sample is an image sample selected from the third image samples, or the first image sample is different from the third image sample.
Optionally, the first image sample includes an original first image sample and/or an image sample obtained by left-right flipping the original first image sample.
Optionally, the input samples of the label training model further include:
at least one item of face attribute information obtained by carrying out face attribute recognition on the first image sample, and/or at least one item of face key point information obtained by carrying out face key point recognition on the first image sample.
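The optional extended input can be sketched as simple feature concatenation: the first face quality label value, the face-attribute items, and the key-point coordinates are flattened into one vector before being fed to the label training model. The attribute names and key-point layout are illustrative assumptions.

```python
import numpy as np

def build_input(quality_label: float,
                attributes: dict,
                keypoints: np.ndarray) -> np.ndarray:
    """Flatten label + attribute values + (N, 2) key points into one feature vector."""
    attr = [attributes[k] for k in sorted(attributes)]   # stable key ordering
    return np.concatenate([[quality_label], attr, keypoints.ravel()])

x = build_input(0.7,
                {"eyes_open": 1.0, "wearing_mask": 0.0},   # hypothetical attributes
                np.array([[30.0, 40.0], [70.0, 40.0]]))    # e.g. two eye centers
```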
Optionally, a difference between a first face quality label value of a second image sample in the second image set and a second face quality label value of the second image sample is greater than or equal to a preset threshold.
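This optional filter amounts to hard-sample mining: keep the second-set samples whose corrected label disagrees most with the original automatic label. A minimal sketch, with the threshold value chosen purely for illustration:

```python
import numpy as np

def select_hard_samples(first_labels: np.ndarray,
                        second_labels: np.ndarray,
                        threshold: float = 0.2) -> np.ndarray:
    """Indices where |corrected label - automatic label| >= threshold."""
    return np.flatnonzero(np.abs(second_labels - first_labels) >= threshold)

idx = select_hard_samples(np.array([0.9, 0.5, 0.1]),    # first label values
                          np.array([0.3, 0.55, 0.1]))   # second label values
```

Only such high-disagreement samples carry enough new information to be worth adding to the face quality model's training set, which is the information-density argument made later in the description.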
In another aspect, the present application further provides an apparatus for obtaining a face quality label value, including:
an image set obtaining unit, configured to obtain a first image set, where the first image set includes a plurality of first image samples;
a label value testing unit, configured to obtain a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, where the first face quality label value includes a test label value of a total score of face quality and/or a test label value of an attribution score of face quality, and the preset quality label value is a preset label value of the total score of face quality or a preset label value of the attribution score of face quality;
the label value training unit is used for training the label training model by taking at least a first face quality label value of the first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
the label value obtaining unit is configured to at least process a first face quality label value of each second image sample in a second image set by using a trained label training model to obtain a second face quality label value of each second image sample, where the second face quality label value of the second image sample is used to train a face quality model, the face quality model includes a face quality total score model or a face quality attribution score model, the face quality total score model is used to obtain a face quality total score of a test image, and the face quality attribution score model is used to obtain a face quality attribution score of the test image.
In another aspect, the present application further provides a server, including:
a processor and a memory;
wherein the processor is configured to execute a program stored in the memory;
the memory is to store a program to at least:
obtaining a first image set, wherein the first image set comprises a plurality of first image samples;
obtaining a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, wherein the first face quality label value comprises a test label value of a face quality total score and/or a test label value of a face quality attribution score, and the preset quality label value is a preset label value of the face quality total score or a preset label value of the face quality attribution score;
training the label training model by taking at least a first face quality label value of the first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
the method comprises the steps of processing a first face quality label value of each second image sample in a second image set by using a trained label training model to obtain a second face quality label value of each second image sample, wherein the second face quality label value of each second image sample is used for training a face quality model, the face quality model comprises a face quality total score model or a face quality attribution score model, the face quality total score model is used for obtaining a face quality total score of a test image, and the face quality attribution score model is used for obtaining a face quality attribution score of the test image.
In still another aspect, the present application further provides a storage medium, where computer-executable instructions are stored, and when the computer-executable instructions are loaded and executed by a processor, the method for obtaining a face quality label value as described in any one of the above is implemented.
According to the above scheme, in the method, apparatus, server, and storage medium for obtaining a face quality label value provided by the present application, after the test label value of the face quality total score and/or the test label value of the face quality attribution score of each first image sample and the corresponding manually labelled preset quality label value are obtained, at least the first face quality label value of the first image sample is taken as the input sample of the label training model and the more accurate preset quality label value corresponding to it is taken as the output sample, and the label training model is trained. The trained label training model can then predict label values for second image samples that already have a first face quality label value, yielding a more accurate second face quality label value for each second image sample. On this basis, the second face quality label values of the second image samples can be used to train the corresponding face quality model, improving the accuracy of the trained face quality total score model or face quality attribution score model; accordingly, when a test image is evaluated with the more accurate total score model or attribution score model, a more accurate face quality total score or face quality attribution score is obtained.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram illustrating a composition framework of a face quality label value obtaining system according to an embodiment of the present application;
fig. 2 to 4 are schematic diagrams of other architectures of a system for acquiring a face quality label value according to an embodiment of the present application, respectively;
fig. 5 is a schematic diagram illustrating a hardware component structure of a server for implementing acquisition of a face quality label value according to an embodiment of the present application;
fig. 6 is a flowchart illustrating a method for obtaining a face quality label value according to an embodiment of the present application;
FIG. 7 is a diagram showing an example of a face image in an embodiment of the present application;
Figs. 8-9 are flowcharts of application examples of the face quality total score and the face quality attribution score in an embodiment of the present application;
FIG. 10 is a flow chart for acquiring a face quality label value in an embodiment of the present application;
fig. 11 is a schematic diagram illustrating a configuration of an embodiment of an apparatus for acquiring a face quality label value according to an embodiment of the present application.
Detailed Description
The scheme of the application is suitable for obtaining label values for a face quality recognition model such as a face quality total score model or a face quality attribution score model, and is mainly applied to scenarios in the technical field of Artificial Intelligence (AI), such as smart-retail scenarios of face-scan payment and returning-customer recognition.
The inventor of the present application found through research that, in existing schemes, quality score labels obtained by manual scoring are simple and direct. In general deep-learning tasks, manual scoring is the simplest and most reliable approach, but the quality-scoring task is peculiarly complex and subjective, so the feasibility of manual scoring is low. Firstly, manual scoring has a workload problem: the total score and the 4 attribution values in the quality score are independent, so an annotator must make 5 independent judgments to label one sample; the workload is huge, the feasibility is very low, and no large-scale public data set exists in this field. Secondly, the human eye's ability to quantify a quality score accurately is insufficient; manual labelling carries both random and systematic errors, is not accurate enough, and can at best be divided into about 5 coarse grades, which makes the labels discrete. Thirdly, manual labelling has a labelability problem: when an image is degraded along several dimensions at once, manual labelling becomes infeasible; for example, the angle degradation of an occluded image is difficult to label, as is the blur degradation of an underexposed image. Fourthly, as business and data iterate continuously, some customized tasks place special requirements on the quality model, for example the strong requirement to judge mask-occlusion attribution during an epidemic; if the data labels were re-annotated for every customized task, the labor cost would be very high.
Therefore, automatic label-value annotation schemes are commonly used at present, but the existing automatic labelling methods, while cheap, are insufficiently accurate. For example, degraded samples generated by image processing differ from real pictures, so the resulting attribution-score model performs poorly on real data. When the face quality total score is labelled directly by face similarity, the labels, on the one hand, depend strongly on the face recognition model and inherit its biases; on the other hand, face similarity and the total quality score are not fully equivalent, so a numerical mapping is needed to use similarity as a total-score label value, and the labels contain a great deal of noise, i.e. samples whose similarity is high but whose actual total score is low, or vice versa. As a result, the trained total-score model has low accuracy and poor robustness.
Therefore, the inventor of the present application further found through research that a basic face quality model can be obtained from a traditional automatic-labelling scheme, and that a model capable of predicting accurate label values can then be trained with a small number of manually labelled values, yielding accurate sample label values and, in turn, more accurate face quality total-score and attribution-score models. Furthermore, the mutually constraining or mutually influencing relationship between the total score and the attribution scores can be learned during label-value training; cross-optimization between the face quality total score and the face quality attribution scores further improves the accuracy of the label-predicting model, so the face quality total-score model and the face quality attribution-score model trained with the predicted label values become more accurate still. In addition, the hardest samples needed for model training, i.e. the samples with the largest difference between input and output label values, can be mined from the unlabelled image set, so the accuracy of the face quality total-score and attribution-score models can be improved from two directions: the information density of the sample data and the accuracy of the label values.
For the convenience of understanding, a system to which the solution of the present application is applied is described herein, and reference is made to fig. 1, which is a schematic diagram illustrating a component architecture of a face quality label value obtaining system according to the present application.
As can be seen from fig. 1, the system may include a server 10 and a terminal 20, which are communicatively connected through a network.
The server 10 may be a front-end server or a background server, and the terminal 20 may be a client device such as a mobile phone, a tablet, or a computer. In this case, the user may acquire a test image through the terminal 20 and send it to the server 10; the server 10 predicts more accurate face quality label values from image samples, uses them to train a more accurate face quality model, and then performs image processing on the test image with that model to obtain a face quality total score and/or a face quality attribution score.
It should be noted that, in another implementation, the system for obtaining a face quality label value may omit the server 10 and include only the terminal 20: the label-value prediction, model-training, and image-processing functions of the server 10 are integrated into the terminal 20, which predicts more accurate face quality label values from image samples, trains a more accurate face quality model, and then processes the acquired test image with that model to obtain the face quality total score and/or attribution score. Alternatively, the system may omit the terminal 20 and include only the server 10: the image-acquisition function of the terminal 20 is integrated into the server 10, which predicts the label values, trains the model, and processes the acquired test image in the same way.
For example, after a mobile phone terminal acquires a face image of its user, the image is transmitted to the payment server, which performs face recognition and prompts the user according to the obtained face quality total score and/or attribution score, e.g. whether and how to adjust position, and then completes the payment once the face image satisfies the payment conditions, as shown in fig. 2. Alternatively, after the camera on the mobile phone acquires the user's face image, the image is transmitted to the phone's central processing unit, which performs face recognition and prompts the user according to the obtained total score and/or attribution score, and then sends the face image to the payment server once it satisfies the payment conditions, as shown in fig. 3. Or, after the payment server acquires the user's face image through its own camera, the image is transmitted to the server's processor, which performs face recognition, prompts the user according to the obtained total score and/or attribution score, and completes the payment once the face image satisfies the payment conditions, as shown in fig. 4.
A device capable of image acquisition, such as a camera or video recorder, may be configured on the terminal 20 or the server 10 to acquire the face image of a user requiring face recognition.
In order to implement the function of the corresponding image processing on the terminal or the server, a program for implementing the corresponding function needs to be stored in the memory of the terminal or the server. In order to facilitate understanding of the hardware configuration of the terminal or the server, the server is described as an example below. As shown in fig. 5, which is a schematic structural diagram of a server of the present application, the server 10 in this embodiment may include: a processor 101, a memory 102, a communication interface 103, an input unit 104, a display 105, and a communication bus 106.
The processor 101, the memory 102, the communication interface 103, the input unit 104, and the display 105 all communicate with each other through the communication bus 106.
In this embodiment, the processor 101 may be a Central Processing Unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, or another programmable logic device.
The processor 101 may call a program stored in the memory 102. Specifically, the processor 101 may perform operations performed by the server in the following embodiments of the face quality label value acquisition method.
The memory 102 is used for storing one or more programs, which may include program codes including computer operation instructions, and in this embodiment, the memory stores at least the programs for implementing the following functions:
obtaining a first image set, wherein the first image set comprises a plurality of first image samples;
obtaining a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, wherein the first face quality label value comprises a test label value of a face quality total score and/or a test label value of a face quality attribution score, and the preset quality label value is the preset label value of the face quality total score or the preset label value of the face quality attribution score;
training a label training model by taking at least a first face quality label value of a first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
and processing at least a first face quality label value of each second image sample in the second image set by using the trained label training model to obtain a second face quality label value of each second image sample, wherein the second face quality label value of each second image sample is used for training the face quality model, the face quality model comprises a face quality total score model or a face quality attribution score model, the face quality total score model is used for obtaining the face quality total score of the test image, and the face quality attribution score model is used for obtaining the face quality attribution score of the test image.
In one possible implementation, the memory 102 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as model training, etc.), and the like; the storage data area may store data created during use of the computer, such as image samples, tag values, and the like.
Further, the memory 102 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device.
The communication interface 103 may be an interface of a communication module, such as an interface of a GSM module.
Of course, the structure of the server shown in fig. 5 does not constitute a limitation on the server in the embodiments of the present application; in practical applications, the server may include more or fewer components than those shown in fig. 5, or some components may be combined. It is understood that the hardware composition of the terminal may refer to the hardware composition of the server in fig. 5.
It should be noted that the server in this embodiment may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big-data and artificial-intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
That is to say, the server in the present application may be a cloud server, and the technical solution of the present application is implemented by cloud technology. Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. It is the general term for the network, information, integration, management-platform, and application technologies applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: background services of a technical network system, such as video websites, picture websites, and other web portals, require a large amount of computing and storage resources. As the internet industry develops, each article may carry its own identification mark that must be transmitted to a background system for logical processing; data of different levels are processed separately, and industrial data of all kinds require strong back-end system support, which can only be realized through cloud computing.
Among them, cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To users, the resources in the "cloud" appear infinitely expandable: they can be obtained at any time, used on demand, expanded at any time, and paid for according to use.
As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short, generally called an Infrastructure as a Service (IaaS) platform) is established, and multiple types of virtual resources are deployed in the pool for external clients to select and use. The cloud computing resource pool mainly includes computing devices (virtualized machines, including operating systems), storage devices, and network devices.
According to logical function division, a Platform as a Service (PaaS) layer can be deployed on the IaaS layer, and a Software as a Service (SaaS) layer can be deployed on the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS is business software of various kinds, such as a web portal or a bulk SMS service. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
The server mentioned in the foregoing embodiment is a server capable of performing cloud computing on a cloud platform, and is used to implement image processing and model training in the present application.
With reference to fig. 6, which shows a schematic flow chart of an embodiment of the method for obtaining a face quality label value according to the present application, and in combination with the commonalities described above, the method in this embodiment may include:
s601: a first set of images is obtained.
The first image set includes a plurality of first image samples, and each of the first image samples is an image including at least a part of a face region, such as an image including a left region of the face, an image including a right region of the face, an image including a front region of the face, and so on, as shown in fig. 7.
S602: and obtaining a first face quality label value of each first image sample and a manually labeled preset quality label value corresponding to the first face quality label value.
In this embodiment, each first image sample may be subjected to image processing by using a face quality model subjected to initial training, such as a face quality total score model and/or a face quality attribution score model, so as to obtain a first face quality label value including a test label value of the face quality total score and/or a test label value of the face quality attribution score.
Further, in this embodiment, after receiving a manual labeling operation performed on the first image sample by a user, a preset quality label value corresponding to the first face quality label value is obtained according to the manual labeling operation. For example, manual labeling operations of a plurality of users for the face quality total score of the same first image sample are obtained, and then the labeling values in the manual labeling operations are averaged to serve as the preset label value of the face quality total score of the first image sample; for another example, manual labeling operations of a plurality of users for face quality attribution of the same first image sample are obtained, and then the labeling values in the manual labeling operations are averaged to serve as the preset label value of the face quality attribution of the first image sample.
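As a minimal illustrative sketch (the helper name and scores are hypothetical; the patent only states that the annotation values of multiple users are averaged), the preset quality label value could be computed as:

```python
def averaged_preset_label(annotations):
    """Average several annotators' scores for one image sample to
    reduce random labeling error (hypothetical helper; the patent
    only says the manual annotation values are averaged)."""
    if not annotations:
        raise ValueError("need at least one annotation")
    return sum(annotations) / len(annotations)

# e.g. three annotators score the face quality total score of one sample
preset_total = averaged_preset_label([0.80, 0.74, 0.77])
```

The same averaging would apply per attribution dimension when multiple users label the face quality attribution scores.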
In an implementation manner, the first face quality label value may include a test label value of the total score of the face quality, and correspondingly, in this embodiment, an initially trained face quality total score model may be used to perform image processing on each first image sample to obtain a test label value of the total score of the face quality of each first image sample, and then the obtained preset quality label value according to the manual labeling operation is a preset label value of the total score of the face quality;
in another implementation manner, the first face quality label value may include at least one test label value of the face quality attribution score, and accordingly, in this embodiment, an initially trained face quality attribution score model may be used to perform image processing on each first image sample to obtain the test label value of the at least one face quality attribution score of each first image sample, and then the obtained preset quality label value according to the manual tagging operation includes the preset label value of each face quality attribution score;
in another implementation, the first face quality label value may include a test label value of the face quality total score and a test label value of at least one face quality attribution score. Accordingly, in this embodiment, the initially trained face quality total score model may be used to perform image processing on each first image sample to obtain the test label value of the face quality total score of each first image sample, and the initially trained face quality attribution score model may be used to perform image processing on each first image sample to obtain the test label value of at least one face quality attribution score of each first image sample. The preset quality label value obtained according to the manual labeling operation is then the preset label value of the face quality total score, or the preset label value of each face quality attribution score.
In a specific implementation, the above initially trained face quality total score model is obtained by training a plurality of third image samples with initial label values of face quality total scores, and the face quality attribution score model is obtained by training a plurality of third image samples with initial label values of face quality attribution scores.
In one implementation, the initial label value of the face quality total score of the third image sample may be obtained in the following manner:
and carrying out face recognition on each third image sample by using a preset face recognition model to obtain an initial label value of the face quality total score of the third image sample. For example, a third image sample to be marked and a standard certificate image are subjected to recognition calculation of the face similarity by using a face recognition model, and then the face similarity value can be directly used as an initial label value of the face quality total score;
in one implementation, the initial label value of the face quality attribution score of the third image sample may be obtained in the following manner:
and performing degradation processing on each third image sample in at least one degradation dimension to obtain at least one initial label value of the third image sample due to the face quality. For example, an image processing tool is used to add degradation operations such as occlusion, blurring, brightness variation and the like to the third image sample, and then the third image sample is set with an initial label value of the corresponding face quality attribution according to different degradation degrees.
Based on the above implementation, the first image sample in the first image set may be a portion of the image sample selected from the third image sample, or the first image sample in the first image set may also be a reselected image sample, and the first image sample may be different from the third image sample.
S603: and training the label training model by taking at least the first face quality label value of the first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model.
The label training model may be a regression model constructed based on a gradient boosting regression tree (GBRT).
Specifically, in this embodiment, the pre-constructed label training model sequentially performs learning prediction on the first face quality label value of each first image sample to obtain a face quality prediction value of each first image sample, where the face quality prediction value may be a face quality total score prediction value or at least one face quality attribution score prediction value. On this basis, the face quality prediction value is compared with the first face quality label value, and the model parameters of the label training model are adjusted according to the difference represented by the comparison result, so that the loss function of the label training model decreases; training is complete when the loss function converges.
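For illustration, a minimal gradient-boosting regressor on decision stumps captures the GBRT idea behind the label training model (a sketch under squared loss, not the patent's actual implementation):

```python
import numpy as np

class StumpGBRT:
    """Minimal gradient-boosting regressor on decision stumps,
    illustrating the GBRT idea used for the label training model."""
    def __init__(self, n_rounds=50, lr=0.1):
        self.n_rounds, self.lr = n_rounds, lr
        self.stumps = []   # each: (feature index, threshold, left, right)
        self.base = 0.0

    def _fit_stump(self, X, r):
        # Exhaustively pick the split minimizing squared error on residuals.
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                m = X[:, j] <= t
                if m.all() or (~m).all():
                    continue
                left, right = r[m].mean(), r[~m].mean()
                sse = ((r[m] - left) ** 2).sum() + ((r[~m] - right) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left, right)
        return best[1:]

    def fit(self, X, y):
        self.base = y.mean()
        pred = np.full(len(y), self.base)
        for _ in range(self.n_rounds):
            r = y - pred        # negative gradient of squared loss
            j, t, left, right = self._fit_stump(X, r)
            self.stumps.append((j, t, left, right))
            pred += self.lr * np.where(X[:, j] <= t, left, right)
        return self

    def predict(self, X):
        pred = np.full(len(X), self.base)
        for j, t, left, right in self.stumps:
            pred += self.lr * np.where(X[:, j] <= t, left, right)
        return pred
```

Here the rows of `X` would be the first face quality label values (the test label values) and `y` the manually labeled preset quality label values; the fitted model then maps test labels toward the precise labels.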
It should be noted that, in this embodiment, the first image sample serving as the input sample of the label training model may include an original first image sample, or the first image sample may include an image sample obtained by left-right flipping the original first image sample, or the first image sample may include the original first image sample and also include an image sample obtained by left-right flipping the original first image sample.
Based on this, the first face quality label value of the input sample as the label training model may include a test label value of the face quality total score of the original first image sample and a test label value of the face quality total score of the image sample obtained by left-right turning the original first image sample; or, the first face quality label value may include a test label value of at least one face quality attribution score of the original first image sample and a test label value of at least one face quality attribution score of an image sample obtained by left-right flipping the original first image sample; or, the first face quality label value may include a test label value of the face quality total score of the original first image sample and a test label value of the at least one face quality attribution score, and further include a test label value of the face quality total score of the image sample obtained by left-right turning the original first image sample and a test label value of the at least one face quality attribution score.
Further, the input samples of the label training model may also include at least one item of face attribute information obtained by performing face attribute recognition on the first image sample. In this embodiment, the face attribute information is added to the input samples of the label training model, and face attributes such as sunglasses and bangs in the first image sample are used to optimize the learning prediction of the occlusion attribution label value in the label training model;
and/or the input samples of the label training model may also include at least one item of face key point information obtained by performing face key point recognition on the first image sample. In this embodiment, the face key point information is added to the input samples of the label training model, and the completeness of the face key points in the first image sample is used to optimize the learning prediction of the angle attribution label value in the label training model.
In one implementation, when the output sample of the label training model is the preset label value of the face quality total score corresponding to the first face quality label value, the trained label training model can be used to predict the predicted label value of the face quality total score of an image sample, and the accuracy of this predicted label value is higher than that of the test label value of the face quality total score of the image sample. On this basis, when the first face quality label value includes the test label value of the face quality total score and the test label value of at least one face quality attribution score of the original first image sample, as well as the test label value of the face quality total score and the test label value of at least one face quality attribution score of the image sample obtained by left-right flipping the original first image sample, this embodiment performs cross optimization training on the label training model using both the total score and attribution score test label values, which improves the accuracy with which the label training model predicts the face quality total score;
in another implementation, when the output sample of the label training model is the preset label value of the face quality attribution score corresponding to the first face quality label value, the trained label training model can be used to predict the predicted label value of the face quality attribution score of an image sample, and the accuracy of this predicted label value is higher than that of the test label value of the face quality attribution score of the image sample. On this basis, when the first face quality label value includes the test label value of the face quality total score and the test label value of at least one face quality attribution score of the original first image sample, as well as the test label value of the face quality total score and the test label value of at least one face quality attribution score of the image sample obtained by left-right flipping the original first image sample, this embodiment performs cross optimization training on the label training model using both the total score and attribution score test label values, which improves the accuracy with which the label training model predicts the face quality attribution score.
S604: and processing the first face quality label value of each second image sample in the second image set by using the trained label training model to obtain a second face quality label value of each second image sample.
The second face quality label value of the second image sample is used for training a face quality model, the face quality model comprises a face quality total score model or a face quality attribution score model, the face quality total score model is used for obtaining a face quality total score of the test image, and the face quality attribution score model is used for obtaining at least one face quality attribution score of the test image.
In an implementation manner, in this embodiment, a label training model is used to perform learning prediction on a test label value of the total score of the face quality in each second image sample, so as to obtain a predicted label value of the total score of the face quality of each second image sample; or, in this embodiment, a label training model is used to perform learning prediction on the test label value of the face quality total score and the test label value of the face quality attribution score of each second image sample, so as to obtain a prediction label value of the face quality total score of each second image sample; or, in this embodiment, the label training model is used to perform learning prediction on the test label value of the face quality total score and the test label value of the face quality attribution score of each second image sample, and the test label value of the face quality total score and the test label value of the face quality attribution score of the image sample after the left-right turning of each second image sample, so as to obtain the prediction label value of the face quality total score of each second image sample; or, in this embodiment, a label training model is used to perform learning prediction on the test label value of the face quality total score and the test label value of the face quality attribution score of each second image sample, and the test label value of the face quality total score and the test label value of the face quality attribution score of each image sample after the image sample is left-right turned, as well as the face attribute information and the face key point information, so as to obtain a prediction label value of the face quality total score of each second image sample;
then, in this embodiment, the face quality total score model may be trained using the prediction label value of the face quality total score of each second image sample, so as to obtain a face quality total score model capable of accurately obtaining the face quality total score of a test image;
in one implementation manner, in this embodiment, the label training model is used to perform learning prediction on the test label value of the face quality attribution of each second image sample, so as to obtain a predicted label value of the face quality attribution of each second image sample; or, in this embodiment, a label training model is used to perform learning prediction on the test label value of the face quality total score and the test label value of the face quality attribution score of each second image sample, so as to obtain a prediction label value of the face quality attribution score of each second image sample; or, in this embodiment, a label training model is used to perform learning prediction on the test label value of the face quality total score and the test label value of the face quality attribution score of each second image sample, and the test label value of the face quality total score and the test label value of the face quality attribution score of each image sample after the image sample is inverted left and right, so as to obtain a prediction label value of the face quality attribution score of each second image sample; or, in this embodiment, a label training model is used to perform learning prediction on the test label value of the face quality total score and the test label value of the face quality attribution score of each second image sample, and the test label value of the face quality total score and the test label value of the face quality attribution score of each image sample after left-right turning of each second image sample, as well as the face attribute information and the face key point information, so as to obtain a prediction label value of the face quality attribution score of each second image sample;
and then the face quality attribution score model is trained using the prediction label value of the face quality attribution score of each second image sample, so as to obtain a face quality attribution score model capable of accurately obtaining the face quality attribution score of a test image.
Therefore, in this embodiment, the test label value of the face quality total score of the first image sample can be used alone to train a label training model capable of predicting an accurate prediction label value of the face quality total score; or the test label value of the face quality attribution score of the first image sample can be used alone to train a label training model capable of predicting an accurate prediction label value of the face quality attribution score; or the test label value of the face quality total score and the test label value of the face quality attribution score of the first image sample can be used for cross guidance to train a label training model capable of predicting an accurate prediction label value of the face quality total score or the face quality attribution score, so as to further improve the accuracy of the prediction label values of the face quality total score and the face quality attribution score, thereby improving the accuracy of the subsequently trained face quality total score model and face quality attribution score model.
It should be noted that, the manner of obtaining the first face quality label value of the second image sample may refer to the manner of obtaining the first face quality label value of the first image sample in the foregoing, for example, performing image processing on each second image sample by using the initially trained face quality total score model to obtain a test label value of the face quality total score of each second image sample, performing image processing on each second image sample by using the initially trained face quality attribution score model to obtain a test label value of the face quality attribution score of each second image sample, where the test label value of the face quality total score of the second image sample and/or the test label value of the face quality attribution score of the second image sample constitute the first face quality label value of the second image sample.
Based on the above implementation, the second image samples participating in the face quality model training in this embodiment are screened image samples. Specifically, in this embodiment, the label training model performs learning prediction on the label values of all image samples in the initially selected full training image set, and the image samples whose difference between the predicted second face quality label value and the first face quality label value before prediction is greater than or equal to a preset threshold are then screened out; these screened image samples are the second image samples. That is, in this embodiment, the image samples in the full training image set whose label values differ greatly before and after prediction are taken as the image samples participating in the face quality model training, so that the face quality model is trained with hard samples, which improves the accuracy of the trained face quality model.
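The hard-sample screening step can be sketched as a simple filter (sample identifiers and the threshold value are illustrative):

```python
def screen_hard_samples(pairs, threshold):
    """Keep samples whose corrected (second) label differs from the
    original (first) label by at least `threshold` -- the hard
    samples retained for face quality model training."""
    return [sample_id for sample_id, first, second in pairs
            if abs(second - first) >= threshold]

# (sample_id, first label value, corrected second label value)
hard = screen_hard_samples(
    [("a", 0.90, 0.88), ("b", 0.40, 0.75), ("c", 0.60, 0.30)], 0.2)
# samples "b" and "c" change by at least 0.2 and are retained
```

Samples whose labels barely move under correction carry little new training signal and are dropped.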
It can be seen from the foregoing solution that, in the method for obtaining a face quality label value provided in this embodiment of the present application, after the test label value of the face quality total score and/or the test label value of the face quality attribution score of the first image sample and the corresponding manually labeled preset quality label value are obtained, at least the first face quality label value of the first image sample is used as an input sample of the label training model, and the more accurate preset quality label value corresponding to the first face quality label value is used as an output sample to train the label training model. The trained label training model can then perform learning prediction of label values on the second image samples having first face quality label values, thereby obtaining a more accurate second face quality label value for each second image sample. On this basis, the corresponding face quality model can be trained with the second face quality label values of the second image samples, which improves the accuracy of the trained face quality total score model or face quality attribution score model; accordingly, when the more accurate face quality total score model or face quality attribution score model is used to test a test image, a more accurate face quality total score or face quality attribution score can be obtained.
For ease of understanding, the following illustrates an application of the face quality total score in a registration photo archiving task, in a scenario such as smart retail. Based on the technical solution of the application, this embodiment realizes a simple and reliable registration photo capture scheme: the face quality total score model can accurately evaluate how friendly the face region in a surveillance video is to the face recognition model, and thereby decide whether to send it for archiving. As shown in fig. 8, a segment of surveillance video is decomposed into image frames; after a face region is cut out by a face detection tool, the image in the region is given to the face quality total score model for scoring. Among the several face images cut from a segment of video, the face image with the highest face quality total score output by the model is selected. The backend sends a face quality total score requirement to the system, usually a fixed total score threshold: if the score is above the threshold, the image is sent for archiving; if below, the video continues to be cut and the next round of total score evaluation is performed.
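The frame-selection and threshold logic described above can be sketched as follows (function name and frame identifiers are hypothetical):

```python
def best_frame_for_archive(frame_scores, threshold):
    """Pick the frame with the highest face quality total score from
    one video segment; archive it only if it clears the backend
    threshold. Returns (frame_id, score), or None to signal another
    round of capture and scoring."""
    if not frame_scores:
        return None
    frame_id, score = max(frame_scores.items(), key=lambda kv: kv[1])
    return (frame_id, score) if score >= threshold else None
```

A `None` result corresponds to the "continue cutting the video" branch of the flow in fig. 8.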
The following illustrates an application of the face quality attribution score in practical scenarios such as payment and face verification. Based on the technical solution of the application, this embodiment realizes an efficient and reliable face correction reminder scheme: the degradation cause of a face image can be accurately analyzed through face quality attribution evaluation, and a correction prompt is fed back to the user. As shown in fig. 9, after a face region is cut out from a photo taken by the user by a face detection tool, the image of the region is given to the face quality attribution score model for scoring, and the backend sends an attribution score requirement, generally a fixed threshold, to the system. The attribution scores of the four dimensions are then compared with the requirements one by one; unqualified images are intercepted, and after the user is reminded to correct the corresponding attribution dimensions, the image is input again.
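The per-dimension comparison can be sketched as below (the dimension names and threshold values are illustrative assumptions; the patent only specifies four attribution dimensions compared against fixed thresholds):

```python
def correction_prompts(attribution_scores, thresholds):
    """Compare each attribution dimension (e.g. occlusion, blur,
    brightness, angle) against its required threshold and return the
    dimensions the user should correct before re-capturing."""
    return [dim for dim, score in attribution_scores.items()
            if score < thresholds.get(dim, 0.0)]
```

An empty result means the photo passes all four dimensions; otherwise the returned dimensions drive the correction reminder shown to the user.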
An example of the practical application of the present solution is described below with reference to the logical architecture diagram of the server shown in fig. 10, which is used to obtain the face quality label value:
First, as shown in table 1, the total score archiving criteria are as follows:
TABLE 1 Total score archiving criteria
With reference to fig. 10, the process of obtaining the label values of the training samples of the face quality total score model and the face quality attribution score model in the server is as follows:
step one, a large number of total score attribution pre-labeling data sets with low possible label accuracy, namely data sets composed of the third image samples in the foregoing text, are obtained, wherein the pre-labeling data sets can label the third image samples with the initial label values of the total score of the face quality and the attribute score of the face quality in a traditional automatic labeling mode. For example, by applying a degradation operation to the third image sample, such as adding an occlusion, a blur, a brightness change, and the like to the image, and then marking the third image sample with an initial label value of the face quality attribution score, that is, a pre-labeled attribution score, according to the difference of the degradation degree; and calculating the face similarity of the third image sample to be marked, such as a photo and the like, and the standard identification photo by using the face recognition model, so that the face similarity is directly used as an initial label value of the face quality total score, namely the pre-marked face quality total score.
Step two: obtain an initial-version total score model and attribution score model in the traditional manner, namely the initially trained face quality total score model and face quality attribution score model. For example, in this embodiment, a deep learning model for estimating the face quality total score and attribution score is built in advance with a framework such as TensorFlow, and supervised learning is then performed with the pre-labeled data set obtained in step one to train the initial-version total score model and attribution score model.
Step three: obtain a small, precisely labeled guidance data set, namely the first image set described above; the first image samples in the first image set are used to guide the correction of the pre-labeled data. For example, a small number of image samples are precisely labeled manually multiple times, that is, the preset quality label value corresponding to the first face quality label value of a first image sample is labeled several times and the results are averaged, which reduces the random error of the labeling result and improves accuracy.
Step four: obtain the initial-version total score and attribution score of each first image sample in the guidance data set, namely the test label value of the face quality total score and the test label value of the face quality attribution score of the first image sample. For example, in this embodiment, the initial-version total score model and attribution score model obtained in step two are used to predict the initial-version total score and attribution score of the first image sample; at this point, each first image sample in the guidance data set has both a preset label value and a test label value, that is, both the precise annotation and the initial-version total score and attribution score.
Step five, learn a score correction strategy from the difference between the initial scores and the accurately labeled scores in the guide data set: the initial total scores and attribution scores of the guide data set, together with the accurately labeled scores, serve as training data, and a regression model such as GBRT (Gradient Boosted Regression Trees) learns the mapping from the initial scores to the accurately labeled scores, so that the regression model acquires the ability to correct the pre-labeled data. When the regression model learns the attribution score correction, this embodiment uses as input samples the original attribution score (the test label value of the face quality attribution score of the original first image sample), the original total score (the test label value of the face quality total score of the original first image sample), and the attribution score and total score after horizontal flipping (left-right flipping of the original first image sample), with the accurately labeled attribution score as the output sample. Likewise, when the regression model learns the total score correction, the same four inputs are used, with the accurately labeled total score as the output sample. By introducing this cross-guidance strategy, label correction of a full set of training labels (for example, 1,000,000 second image samples) can be achieved with only a small number of accurately labeled samples (for example, 10,000 first image samples).
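The correction regressor in step five can be sketched with scikit-learn's gradient-boosted trees (assuming scikit-learn is available; the patent only names GBRT generically). All scores here are synthetic: the "accurate" labels are generated from a made-up linear rule so the fit can be checked, whereas in the patent they come from manual annotation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500
orig_total = rng.uniform(0, 1, n)                  # test label: total score
orig_attr = rng.uniform(0, 1, n)                   # test label: attribution score
flip_total = orig_total + rng.normal(0, 0.02, n)   # scores of the flipped image
flip_attr = orig_attr + rng.normal(0, 0.02, n)

# Four inputs per sample: original and flipped attribution/total scores.
X = np.column_stack([orig_attr, orig_total, flip_attr, flip_total])
# Stand-in "accurately labeled" attribution scores.
y_attr = 0.7 * orig_attr + 0.3 * orig_total

# Learn the mapping from initial scores to accurate scores.
model = GradientBoostingRegressor(n_estimators=100, max_depth=3,
                                  random_state=0)
model.fit(X, y_attr)
corrected = model.predict(X)
mae = float(np.mean(np.abs(corrected - y_attr)))
```

A second regressor trained with the accurately labeled total score as output would implement the total-score correction strategy in the same way.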
In addition, when the regression model is trained in step five, other guidance information may be added in this embodiment to further improve the effect of the cross optimization. For example, face attribute information of the first image sample can be added, so that information such as sunglasses or bangs strengthens the optimization of the occlusion attribution value; similarly, face key point information of the first image sample can be added, so that the completeness of the key points strengthens the optimization of the angle attribution value.
Step six, collect valuable samples with the learned strategy, that is, collect valuable samples serving as difficult second image samples to further optimize the subsequently trained face quality total score model and face quality attribution score model. In the model optimization process, simply enlarging the training set does not necessarily yield gains, because repeated learning of simple samples does not improve the model and may instead introduce noise; it is therefore the valuable samples that matter. Accordingly, in this embodiment, the regression model obtained in step five predicts and corrects the face quality total score and attribution score of new unlabeled image samples, and the image samples whose total scores and attribution scores differ greatly before and after correction, namely the second image samples, are selected as the difficult samples the current model lacks and added to the training set, yielding the full training set.
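The hard-sample selection in step six reduces to a threshold on the correction magnitude; a minimal sketch (the threshold value and score lists are illustrative):

```python
def select_hard_samples(pre_scores, corrected_scores, threshold=0.2):
    """Keep the indices of samples whose label changed substantially
    after correction -- the 'difficult' samples the current model lacks."""
    return [i for i, (p, c) in enumerate(zip(pre_scores, corrected_scores))
            if abs(p - c) >= threshold]

# Only the first sample's score moved a lot (0.9 -> 0.4), so only it is kept.
hard = select_hard_samples([0.9, 0.5, 0.3], [0.4, 0.52, 0.31])
```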
Step seven, following the approach of step six, predict the initial total scores and attribution scores of the full training set with the total score model and attribution score model obtained in step two.
Step eight, apply the regression model obtained in step five to the full training set, taking the initial total scores and attribution scores obtained in step seven as input and outputting the corrected final face quality total scores and face quality attribution scores, which can then be used to train the final total score model and attribution score model.
Step nine, train the total score model and the attribution score model with the corrected total scores and attribution scores of the full training set obtained in step eight, yielding the cross-guided, optimized total score model and attribution score model.
According to the above technical scheme, cross guidance between the total score model and the attribution score model reduces the amount of manual annotation, allows an accurate and reliable quality score model to be built quickly for a new scenario, task, or data set, and accelerates the iteration of subsequent model versions. Correspondingly, cross guidance between the two models yields a total score model and an attribution score model of higher accuracy. For a given model, raising the classification score threshold makes the model stricter: it reduces the proportion of bad samples misjudged as good, but also reduces the proportion of good samples correctly classified as good. Conversely, lowering the threshold increases the proportion of good samples correctly classified as good, but also increases the proportion of bad samples misjudged as good. Therefore, when evaluating a model, the proportion of correctly classified good samples should be fixed, and the model should be evaluated by the proportion of bad samples misjudged as good.
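The fixed-recall evaluation described above can be sketched as follows: pick the threshold at which the required fraction of good samples still passes, then measure how many bad samples pass too. The score distributions below are synthetic stand-ins.

```python
import numpy as np

def bad_pass_rate_at_fixed_recall(good_scores, bad_scores, recall=0.99):
    """Choose the classification threshold so that `recall` of good
    samples pass, then report the fraction of bad samples misjudged
    as good at that threshold (plus the threshold itself)."""
    good = np.sort(np.asarray(good_scores))
    # Threshold below which at most (1 - recall) of good samples fall.
    thr = good[int(np.floor((1 - recall) * len(good)))]
    bad = np.asarray(bad_scores)
    return float(np.mean(bad >= thr)), float(thr)

good = np.linspace(0.5, 1.0, 100)   # scores of good samples
bad = np.linspace(0.0, 0.6, 100)    # scores of bad samples (overlapping tail)
rate, thr = bad_pass_rate_at_fixed_recall(good, bad, recall=0.99)
```

This is the metric used in the beard-sample experiment below, where the bad-pass rate is compared before and after cross optimization at a fixed 99% good-sample recall.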
Take the ID-photo filing business for Chinese users as an example. Because Chinese faces tend to have less facial hair, the existing attribution model treats beards as occlusion; foreign users, however, wear beards at a high rate, and such photos cannot all be rejected in this business, so the occlusion prediction score of bearded samples needs to be raised. Beard information from the face attributes is therefore introduced for cross optimization. The results show that by screening 10,000 bearded samples, multiplying their original occlusion attribution scores by a coefficient of 1.5, and then using them to guide the label correction of 1,000,000 training samples, the proportion of bearded samples misjudged as occluded dropped from 72.67% to 20.67% while 99% of good samples remained correctly classified. Model iteration and optimization were thus achieved quickly without any manual annotation, with a marked effect.
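The coefficient adjustment in this experiment amounts to scaling the occlusion attribution score of attribute-flagged samples before the corrected labels guide retraining. A minimal sketch (the function name, clamping to 1.0, and the example scores are assumptions; the coefficient 1.5 is the one reported above):

```python
def boost_occlusion_score(scores, has_beard, coef=1.5):
    """Multiply the pre-labeled occlusion attribution score by `coef`
    for beard-flagged samples, clamping to the [0, 1] score range."""
    return [min(1.0, s * coef) if flagged else s
            for s, flagged in zip(scores, has_beard)]

boosted = boost_occlusion_score([0.4, 0.4, 0.8], [True, False, True])
```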
Therefore, the scheme of cross-guided optimization of the face quality total score model and attribution score model evaluates face image quality accurately and reliably in actual business, and can be widely applied to scenarios such as WeChat payment, face verification, and smart retail.
In another aspect, the present application further provides an apparatus for obtaining a face quality label value, as shown in fig. 11, which shows a schematic composition diagram of an embodiment of an apparatus for obtaining a face quality label value according to the present application, where the apparatus of the present embodiment may be applied to a server, and the apparatus may include:
an image set obtaining unit 1101, configured to obtain a first image set, where the first image set includes a plurality of first image samples;
a label value testing unit 1102, configured to obtain a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, where the first face quality label value includes a test label value of a face quality total score and/or a test label value of a face quality attribution score, and the preset quality label value is a preset label value of the face quality total score or a preset label value of the face quality attribution score;
a label value training unit 1103, configured to train the label training model by using at least a first face quality label value of the first image sample as an input sample of a pre-constructed label training model and using a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
a label value obtaining unit 1104, configured to process at least a first face quality label value of each second image sample in the second image set by using a trained label training model to obtain a second face quality label value of each second image sample, where the second face quality label value of each second image sample is used to train a face quality model, the face quality model includes a face quality total score model or a face quality attribution score model, the face quality total score model is used to obtain a face quality total score of the test image, and the face quality attribution score model is used to obtain a face quality attribution score of the test image.
Optionally, the tag value testing unit 1102 is specifically configured to:
performing image processing on each first image sample by using an initially trained face quality total score model to obtain a test label value of the face quality total score of each first image sample;
and/or,
performing image processing on each first image sample by using an initially trained face quality attribution model to obtain a test label value of the face quality attribution of each first image sample;
the face quality total score model is obtained by training a plurality of third image samples with initial label values of face quality total scores, and the face quality attribution score model is obtained by training a plurality of third image samples with initial label values of face quality attribution scores.
Optionally, the label value testing unit 1102 obtains an initial label value of the total face quality score of the third image sample by:
performing face recognition on each third image sample by using a preset face recognition model to obtain an initial label value of the total face quality score of the third image samples;
the label value testing unit 1102 obtains an initial label value of the face quality attribution of the third image sample by:
and performing degradation processing on each third image sample in at least one degradation dimension to obtain at least one initial label value of the third image sample due to the face quality.
Optionally, the first image sample is an image sample selected from the third image samples, or the first image sample is different from the third image sample.
Optionally, the first image sample includes an original first image sample and/or an image sample obtained by left-right flipping the original first image sample.
Optionally, the input samples of the label training model further include:
at least one item of face attribute information obtained by carrying out face attribute recognition on the first image sample, and/or at least one item of face key point information obtained by carrying out face key point recognition on the first image sample.
Optionally, a difference between a first face quality label value of a second image sample in the second image set and a second face quality label value of the second image sample is greater than or equal to a preset threshold.
On the other hand, the embodiment of the present application further provides a storage medium, where computer-executable instructions are stored in the storage medium, and when the computer-executable instructions are loaded and executed by a processor, the method for obtaining the face quality tag value performed by the server in any one of the above embodiments is implemented.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.
Claims (10)
1. A method for acquiring a face quality label value is characterized by comprising the following steps:
obtaining a first image set, wherein the first image set comprises a plurality of first image samples;
obtaining a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, wherein the first face quality label value comprises a test label value of a face quality total score and/or a test label value of a face quality attribution score, and the preset quality label value is a preset label value of the face quality total score or a preset label value of the face quality attribution score;
training the label training model by taking at least a first face quality label value of the first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
the method comprises the steps of processing a first face quality label value of each second image sample in a second image set by using a trained label training model to obtain a second face quality label value of each second image sample, wherein the second face quality label value of each second image sample is used for training a face quality model, the face quality model comprises a face quality total score model or a face quality attribution score model, the face quality total score model is used for obtaining a face quality total score of a test image, and the face quality attribution score model is used for obtaining a face quality attribution score of the test image.
2. The method of claim 1, wherein obtaining a first face quality label value for each of the first image samples comprises:
performing image processing on each first image sample by using an initially trained face quality total score model to obtain a test label value of the face quality total score of each first image sample;
and/or,
performing image processing on each first image sample by using an initially trained face quality attribution model to obtain a test label value of the face quality attribution of each first image sample;
the face quality total score model is obtained by training a plurality of third image samples with initial label values of face quality total scores, and the face quality attribution score model is obtained by training a plurality of third image samples with initial label values of face quality attribution scores.
3. The method of claim 2, wherein the initial label value of the face quality score of the third image sample is obtained by:
performing face recognition on each third image sample by using a preset face recognition model to obtain an initial label value of the total face quality score of the third image sample;
the initial label value of the face quality attribution score of the third image sample is obtained by:
performing degradation processing on each third image sample in at least one degradation dimension to obtain an initial label value of at least one face quality attribution of the third image sample.
4. The method of claim 2, wherein the first image sample is an image sample selected from the third image samples, or wherein the first image sample is different from the third image sample.
5. The method according to claim 1 or 2, wherein the first image sample comprises an original first image sample and/or an image sample obtained by left-right flipping the original first image sample.
6. The method of claim 1 or 2, wherein the input samples of the label training model further comprise:
at least one item of face attribute information obtained by carrying out face attribute recognition on the first image sample, and/or at least one item of face key point information obtained by carrying out face key point recognition on the first image sample.
7. A method according to claim 1 or 2, wherein the difference between the first face quality label value of a second image sample in the second image set and the second face quality label value of the second image sample is greater than or equal to a preset threshold.
8. An apparatus for obtaining a face quality label value, comprising:
an image set obtaining unit, configured to obtain a first image set, where the first image set includes a plurality of first image samples;
a label value testing unit, configured to obtain a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, where the first face quality label value includes a test label value of a total score of face quality and/or a test label value of an attribution score of face quality, and the preset quality label value is a preset label value of the total score of face quality or a preset label value of the attribution score of face quality;
the label value training unit is used for training the label training model by taking at least a first face quality label value of the first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
the label value obtaining unit is configured to at least process a first face quality label value of each second image sample in a second image set by using a trained label training model to obtain a second face quality label value of each second image sample, where the second face quality label value of the second image sample is used to train a face quality model, the face quality model includes a face quality total score model or a face quality attribution score model, the face quality total score model is used to obtain a face quality total score of a test image, and the face quality attribution score model is used to obtain a face quality attribution score of the test image.
9. A server, comprising:
a processor and a memory;
wherein the processor is configured to execute a program stored in the memory;
the memory is to store a program to at least:
obtaining a first image set, wherein the first image set comprises a plurality of first image samples;
obtaining a first face quality label value of each first image sample and a preset quality label value of an artificial label corresponding to the first face quality label value, wherein the first face quality label value comprises a test label value of a face quality total score and/or a test label value of a face quality attribution score, and the preset quality label value is a preset label value of the face quality total score or a preset label value of the face quality attribution score;
training the label training model by taking at least a first face quality label value of the first image sample as an input sample of a pre-constructed label training model and taking a preset quality label value corresponding to the first face quality label value as an output sample of the label training model;
the method comprises the steps of processing a first face quality label value of each second image sample in a second image set by using a trained label training model to obtain a second face quality label value of each second image sample, wherein the second face quality label value of each second image sample is used for training a face quality model, the face quality model comprises a face quality total score model or a face quality attribution score model, the face quality total score model is used for obtaining a face quality total score of a test image, and the face quality attribution score model is used for obtaining a face quality attribution score of the test image.
10. A storage medium having stored thereon computer-executable instructions which, when loaded and executed by a processor, implement the method of obtaining a face quality label value according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010854320.9A CN111814759B (en) | 2020-08-24 | 2020-08-24 | Method and device for acquiring face quality label value, server and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814759A true CN111814759A (en) | 2020-10-23 |
CN111814759B CN111814759B (en) | 2020-12-18 |
Family
ID=72860312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010854320.9A Expired - Fee Related CN111814759B (en) | 2020-08-24 | 2020-08-24 | Method and device for acquiring face quality label value, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814759B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113537101A (en) * | 2021-07-22 | 2021-10-22 | 中科曙光国际信息产业有限公司 | Human body attribute identification method and device, computer equipment and storage medium |
CN114372532A (en) * | 2022-01-11 | 2022-04-19 | 腾讯科技(深圳)有限公司 | Method, device, equipment, medium and product for determining label marking quality |
US20220300767A1 (en) * | 2021-03-19 | 2022-09-22 | Boe Technology Group Co., Ltd. | Neural network of predicting image definition, training method and prediction method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120002862A1 (en) * | 2010-06-30 | 2012-01-05 | Takeshi Mita | Apparatus and method for generating depth signal |
US20160147987A1 (en) * | 2013-07-18 | 2016-05-26 | Samsung Electronics Co., Ltd. | Biometrics-based authentication method and apparatus |
CN109360197A (en) * | 2018-09-30 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of image |
CN110175990A (en) * | 2019-05-17 | 2019-08-27 | 腾讯科技(深圳)有限公司 | Quality of human face image determination method, device and computer equipment |
CN111163338A (en) * | 2019-12-27 | 2020-05-15 | 广州市百果园网络科技有限公司 | Video definition evaluation model training method, video recommendation method and related device |
CN111292839A (en) * | 2020-05-13 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN111353555A (en) * | 2020-05-25 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Label detection method and device and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
WANG Kai et al.: "A Self-training Semi-supervised Method for Imbalanced Biomedical Data", Journal of Daqing Normal University (《大庆师范学院学报》) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220300767A1 (en) * | 2021-03-19 | 2022-09-22 | Boe Technology Group Co., Ltd. | Neural network of predicting image definition, training method and prediction method |
CN113537101A (en) * | 2021-07-22 | 2021-10-22 | 中科曙光国际信息产业有限公司 | Human body attribute identification method and device, computer equipment and storage medium |
CN113537101B (en) * | 2021-07-22 | 2024-04-30 | 中科曙光国际信息产业有限公司 | Human body attribute identification method, device, computer equipment and storage medium |
CN114372532A (en) * | 2022-01-11 | 2022-04-19 | 腾讯科技(深圳)有限公司 | Method, device, equipment, medium and product for determining label marking quality |
Also Published As
Publication number | Publication date |
---|---|
CN111814759B (en) | 2020-12-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40030160; Country of ref document: HK |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20201218 |