CN109034106B - Face data cleaning method and device

Info

Publication number: CN109034106B
Application number: CN201810930672.0A
Authority: CN (China)
Prior art keywords: face, face data, data, target person, features
Legal status: Active
Other language: Chinese (zh)
Other version: CN109034106A
Inventor: 范晓
Current Assignee: Beijing Xiaomi Mobile Software Co Ltd
Original Assignee: Beijing Xiaomi Mobile Software Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd, with priority to CN201810930672.0A
Publication of CN109034106A, application granted, publication of CN109034106B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; face representation

Abstract

The disclosure relates to a face data cleaning method and device. The method includes the following steps: acquiring first to-be-processed face data related to a target person; extracting the face features of the face data based on a first face recognition model, and determining a first face feature of the target person according to the face features of the face data; screening, from the face data, first face data which does not need to be cleaned, second face data which needs to be deleted, and third face data which needs to be cleaned, according to the face features of the face data and the first face feature of the target person; and cleaning the third face data to obtain fourth face data, and determining the first face data and the fourth face data as first target face data corresponding to the target person. The method and device can reduce the time and labor required by the data cleaning operation, lower the cost of data cleaning, and improve the efficiency of data cleaning.

Description

Face data cleaning method and device
Technical Field
The disclosure relates to the technical field of terminals, in particular to a face data cleaning method and device.
Background
The face recognition technology based on deep learning greatly improves the accuracy of face recognition and promotes the application of face recognition in the fields of finance, public security, payment and the like.
However, face recognition based on deep learning requires a large amount of labeled face data as training samples, so the workload of collecting and organizing such data is large. Moreover, face data obtained through web image search and similar means is often mixed with face data of other people, or even non-face data, so the data needs to be cleaned.
Disclosure of Invention
In order to overcome the problems in the related art, the embodiments of the present disclosure provide a method and an apparatus for cleaning face data. The technical scheme is as follows:
according to a first aspect of the embodiments of the present disclosure, a method for cleaning face data is provided, the method including:
acquiring first face data to be processed related to a target person;
respectively extracting the face features of the face data based on a first face recognition model, and determining the first face feature of the target person according to the face features of the face data;
screening out first face data which do not need to be cleaned, second face data which need to be deleted and third face data which need to be cleaned in the face data according to the face features of the face data and the first face features of the target person;
and cleaning the third face data to obtain fourth face data, and determining the first face data and the fourth face data as first target face data corresponding to the target person.
In one embodiment, the screening out the first face data that does not need to be cleaned, the second face data that needs to be deleted, and the third face data that needs to be cleaned from the face data according to the face features of the face data and the first face feature of the target person includes:
calculating the similarity between the face features of the face data and the first face feature of the target person;
determining the face data with the similarity larger than a first threshold value in each face data as first face data which does not need to be cleaned; determining the face data with the similarity not greater than the first threshold and greater than the second threshold in each face data as third face data needing cleaning; and determining the face data with the similarity not greater than the second threshold value in each face data as second face data needing to be deleted.
In one embodiment, the determining the first facial feature of the target person according to the facial features of each of the facial data includes:
calculating the average value of the face features of the face data;
and determining the calculated average value as the first face characteristic of the target person.
In one embodiment, after the determining the first and fourth face data as the first target face data corresponding to the target person, the method further includes:
training the first face recognition model based on the first face data and the fourth face data to obtain a trained second face recognition model;
respectively extracting the face features of the first target face data and the second to-be-processed face data based on the second face recognition model, and determining the second face feature of the target person according to the face features of the first target face data;
screening out fifth face data which does not need to be cleaned, sixth face data which needs to be deleted and seventh face data which needs to be cleaned in the second face data to be processed according to the second face features of the target person and the face features of the second face data to be processed;
and cleaning the seventh face data to obtain eighth face data, and determining the first face data, the fourth face data, the fifth face data and the eighth face data as second target face data corresponding to the target person.
In one embodiment, the acquiring the first to-be-processed face data related to the target person includes:
acquiring reference data related to a target person;
respectively carrying out face detection on each reference data based on the first face recognition model;
and determining the data of the detected face in each reference data as the first face data to be processed related to the target person.
According to a second aspect of the embodiments of the present disclosure, there is provided a face data cleaning apparatus including:
the acquisition module is used for acquiring first face data to be processed related to a target person;
the first determining module is used for respectively extracting the face features of the face data based on a first face recognition model and determining the first face features of the target person according to the face features of the face data;
the first screening module is used for screening out first face data which do not need to be cleaned, second face data which need to be deleted and third face data which need to be cleaned in the face data according to the face features of the face data and the first face features of the target person;
and the first cleaning module is used for cleaning the third face data to obtain fourth face data, and determining the first face data and the fourth face data as first target face data corresponding to the target person.
In one embodiment, the first screening module includes:
the calculation submodule is used for calculating the similarity between the face characteristics of the face data and the first face characteristics of the target person;
the first determining submodule is used for determining the face data with the similarity larger than a first threshold value in each face data as first face data which does not need to be cleaned; determining the face data with the similarity not greater than the first threshold and greater than the second threshold in each face data as third face data needing cleaning; and determining the face data with the similarity not greater than the second threshold value in each face data as second face data needing to be deleted.
In one embodiment, the first determining module calculates an average value of the face features of each of the face data, and determines the calculated average value as the first face feature of the target person.
In one embodiment, the apparatus further comprises:
the training module is used for training the first face recognition model based on the first face data and the fourth face data to obtain a trained second face recognition model;
the second determining module is used for respectively extracting the face characteristics of each first target face data and second face data to be processed based on the second face recognition model, and determining the second face characteristics of the target person according to the face characteristics of each first target face data;
the second screening module is used for screening fifth face data which does not need to be cleaned, sixth face data which needs to be deleted and seventh face data which needs to be cleaned in each piece of second face data to be processed according to the second face features of the target person and the face features of each piece of second face data to be processed;
and the second cleaning module is used for cleaning the seventh face data to obtain eighth face data, and determining the first face data, the fourth face data, the fifth face data and the eighth face data as second target face data corresponding to the target person.
In one embodiment, the obtaining module includes:
the acquisition submodule is used for acquiring reference data related to the target person;
the face detection submodule is used for respectively carrying out face detection on each reference data based on the first face recognition model;
and the second determining submodule is used for determining the data of the detected face in each reference data as the first face data to be processed related to the target person.
According to a third aspect of the embodiments of the present disclosure, there is provided a face data cleaning apparatus, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring first face data to be processed related to a target person;
respectively extracting the face features of the face data based on a first face recognition model, and determining the first face feature of the target person according to the face features of the face data;
screening out first face data which do not need to be cleaned, second face data which need to be deleted and third face data which need to be cleaned in the face data according to the face features of the face data and the first face features of the target person;
and cleaning the third face data to obtain fourth face data, and determining the first face data and the fourth face data as first target face data corresponding to the target person.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method embodiments of any one of the above-mentioned first aspects.
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: a face recognition model is used to extract the face features of the to-be-processed face data and to calculate the first face feature of the target person, and the to-be-processed face data is then screened according to those face features and the first face feature, so that only part of the face data needs to be cleaned. This greatly reduces the time and labor required by the data cleaning operation, lowers the cost of data cleaning, and improves the efficiency of data cleaning.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment.
FIG. 4 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating a face data cleaning apparatus according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating a face data cleaning apparatus according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating a face data cleaning apparatus according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating a face data cleaning apparatus according to an exemplary embodiment.
FIG. 9 is a block diagram illustrating a face data cleaning apparatus according to an exemplary embodiment.
FIG. 10 is a block diagram illustrating an apparatus according to an exemplary embodiment.
FIG. 11 is a block diagram illustrating an apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the related art, face recognition based on deep learning requires a large amount of labeled face data as training samples, so the workload of collecting and organizing such data is large. Moreover, face data obtained through web image search and similar means is often mixed with face data of other people, or even non-face data, and needs to be cleaned; as a result, the data cleaning operation takes considerable time and labor, and the cost of data cleaning is high.
In order to solve the above problem, an embodiment of the present disclosure provides a face data cleaning method, where the method includes: acquiring first face data to be processed related to a target person; respectively extracting the face features of each face data based on a first face recognition model, and determining the first face features of the target person according to the face features of each face data; screening out first face data which do not need to be cleaned, second face data which need to be deleted and third face data which need to be cleaned in the face data according to the face features of the face data and the first face features of the target person; cleaning the third face data to obtain fourth face data; and determining the first face data and the fourth face data as first target face data corresponding to the target person.
In this method, a face recognition model is used to extract the face features of the to-be-processed face data and to calculate the first face feature of the target person; the to-be-processed face data is then screened according to these face features and the first face feature, so that only part of the face data needs to be cleaned. This greatly reduces the time and labor required by the data cleaning operation and lowers the cost of data cleaning.
FIG. 1 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment. The method may be executed by a terminal or a server, where the terminal may be an electronic device such as a smartphone, a tablet computer, or a camera. As shown in fig. 1, the method includes the following steps 101-104:
in step 101, first face data to be processed related to a target person is acquired.
For example, reference data related to the target person, such as reference data associated with the target person's name, may be obtained through a web image search or from pictures stored in advance. In practice, the reference data associated with the target person's name may be mixed with non-face data, which increases the workload of data cleaning. The data in which a face is detected is determined as the first to-be-processed face data related to the target person, and the non-face data in which no face is detected is deleted from the reference data, which reduces the workload of the data cleaning operation.
In step 102, face features of each face data are respectively extracted based on the first face recognition model, and the first face feature of the target person is determined according to the face features of each face data.
For example, the face features of the first to-be-processed face data are respectively extracted based on an existing first face recognition model, and the first face feature of the target person is determined according to the face features of the face data. For example, an average value of the face features of the face data is calculated, and the calculated average value is determined as the first face feature of the target person.
For example, for the first to-be-processed face data Ii related to the target person, where i takes a value of 1 to n, a face feature Fi of each face data Ii is extracted based on the first face recognition model; an average value Mn over all the face data Ii is calculated, for example Mn = (F1 + … + Fi + … + Fn)/n, where Fi and Mn may be feature vectors; and Mn is taken as the first face feature of the target person.
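As an illustrative aside (a sketch, not part of the original disclosure), the feature averaging described above could be written as follows in Python; the `extract_feature` callable is an assumed stand-in for the embedding function of the first face recognition model:

```python
import numpy as np

def mean_face_feature(face_images, extract_feature):
    """Compute Mn, the average face feature of the target person.

    face_images: the n face data I1..In related to the target person.
    extract_feature: assumed callable mapping one image to a 1-D feature
        vector Fi (the embedding of the first face recognition model).
    """
    features = [extract_feature(img) for img in face_images]  # F1..Fn
    return np.mean(features, axis=0)  # Mn = (F1 + ... + Fn) / n
```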
In step 103, first face data which does not need to be cleaned, second face data which needs to be deleted, and third face data which needs to be cleaned are screened out from the face data according to the face features of the face data and the first face features of the target person.
For example, the similarity between the face feature of each face data and the first face feature of the target person is calculated: for instance, the similarity Si between the face feature Fi of the face data Ii and the first face feature Mn of the target person may be taken as the cosine of the angle between the feature vectors Fi and Mn.
Furthermore, the first face data which does not need to be cleaned, the second face data which needs to be deleted, and the third face data which needs to be cleaned are screened out of the face data according to these similarities. For example, the similarity of each face data is compared with a preset first threshold and a preset second threshold, where the first threshold is greater than the second threshold: the face data whose similarity is greater than the first threshold is determined as first face data which does not need to be cleaned; the face data whose similarity is not greater than the first threshold but greater than the second threshold is determined as third face data which needs to be cleaned; and the face data whose similarity is not greater than the second threshold is determined as second face data which needs to be deleted.
Taking any face data Ii as an example, assume that the similarity between the face feature Fi of the face data Ii and the first face feature Mn of the target person is Si, the first threshold is T1, and the second threshold is T2.
If the similarity Si is greater than the first threshold T1, it may be determined that the face data Ii is a face image of the target person, in which case the face data Ii belongs to first face data that does not need to be cleaned;
if the similarity Si is not greater than the first threshold T1 but is greater than the second threshold T2, the face data Ii may be a face image of the target person, but whether it actually is needs to be further determined, for example by manual confirmation; in this case, the face data Ii belongs to the third face data that needs to be cleaned;
if the similarity Si is not greater than the second threshold T2, the possibility that the face data Ii is the face image of the target person is relatively low, and in this case, the face data Ii belongs to the second face data that needs to be deleted.
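For illustration only, the similarity computation and the three-way screening of steps 102 and 103 might be sketched as below; the threshold values T1 = 0.8 and T2 = 0.5 are assumptions, since the disclosure does not fix them:

```python
import numpy as np

def cosine_similarity(fi, mn):
    # Si: cosine of the angle between the feature vectors Fi and Mn
    return float(np.dot(fi, mn) / (np.linalg.norm(fi) * np.linalg.norm(mn)))

def screen_face_data(face_images, features, mn, t1=0.8, t2=0.5):
    """Split face data into kept / to-clean / to-delete groups (t1 > t2)."""
    keep, to_clean, to_delete = [], [], []  # first / third / second face data
    for img, fi in zip(face_images, features):
        si = cosine_similarity(fi, mn)
        if si > t1:
            keep.append(img)        # treated as a face image of the target person
        elif si > t2:
            to_clean.append(img)    # uncertain: routed to confirmation
        else:
            to_delete.append(img)   # unlikely to be the target person
    return keep, to_clean, to_delete
```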
In step 104, the third face data is cleaned to obtain fourth face data, and the first face data and the fourth face data are determined as first target face data corresponding to the target person.
It should be noted that the first face data, the second face data, the third face data, and the fourth face data are different types of face data, respectively.
For example, the third face data screened out of the face data is judged, for example by asking a user to confirm whether each piece of face data belonging to the third face data is a face image of the target person. Face data judged to be a face image of the target person needs no further cleaning, while face data judged not to be a face image of the target person is deleted from the third face data, and the result is the fourth face data. Thus, the fourth face data consists of the face data in the third face data that belongs to face images of the target person.
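A minimal sketch of this cleaning step, assuming a `confirm_is_target` callback (for example, a manual-review prompt) that the disclosure leaves unspecified:

```python
def clean_face_data(third_face_data, confirm_is_target):
    """Return the fourth face data: the items of the third face data that
    are confirmed to be face images of the target person."""
    return [img for img in third_face_data if confirm_is_target(img)]

# Hypothetical usage: the first target face data is then the union of the
# first face data and the fourth face data.
# first_target = keep + clean_face_data(to_clean, confirm_is_target)
```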
According to the technical scheme, the face characteristics of the face data to be processed are extracted through the face recognition model, the first face characteristics of the target person are calculated, then the face data to be processed are screened according to the face characteristics of the face data to be processed and the first face characteristics of the target person, only part of face data need to be cleaned, time and labor required by data cleaning operation can be greatly reduced, the cost of data cleaning is reduced, and the efficiency of data cleaning is improved.
FIG. 2 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment. As shown in fig. 2, on the basis of the embodiment shown in fig. 1, the face data cleaning method according to the present disclosure may include the following steps 201-208:
in step 201, first face data to be processed related to a target person is acquired.
In step 202, face features of each face data are respectively extracted based on the first face recognition model, and the first face feature of the target person is determined according to the face features of each face data.
In step 203, the first face data which does not need to be cleaned, the second face data which needs to be deleted, and the third face data which needs to be cleaned are screened out from the face data according to the face features of the face data and the first face feature of the target person.
In step 204, the third face data is cleaned to obtain fourth face data, and the first face data and the fourth face data are determined as first target face data corresponding to the target person.
It should be noted that, for the descriptions of step 201 to step 204, reference may be made to the corresponding descriptions in the embodiment shown in fig. 1.
In step 205, a first face recognition model is trained based on the first face data and the fourth face data, and a trained second face recognition model is obtained.
For example, steps 201 to 204 are executed to obtain the first target face data corresponding to the target person; since these data are face images of the target person, their reliability is high. The first face recognition model is then trained using the first target face data; for example, the first face recognition model can be fine-tuned on the first target face data to obtain a second face recognition model with better recognition performance.
Alternatively, a brand-new face recognition network can be designed and trained on the first target face data, yielding a third face recognition model that is different from the first face recognition model and has better recognition performance than the first face recognition model.
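The disclosure does not fix a training procedure or framework. As one hedged illustration, fine-tuning the first model on the cleaned data could look roughly like the following PyTorch-style sketch, where the model, its classification head over person identities, and `train_loader` are all assumptions:

```python
import torch

def finetune(model, train_loader, epochs=3, lr=1e-4):
    """Fine-tune a first face recognition model on the first target face
    data (the first and fourth face data) to obtain a second model.

    model: assumed torch.nn.Module producing per-identity logits.
    train_loader: assumed DataLoader yielding (image_tensor, label) pairs.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # small lr for fine-tuning
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model  # the trained second face recognition model
```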
In step 206, the face features of each first target face data and the second to-be-processed face data are respectively extracted based on the second face recognition model, and the second face feature of the target person is determined according to the face features of each first target face data.
Illustratively, the face features of each first target face data and of the second to-be-processed face data are respectively extracted based on the second face recognition model; the second to-be-processed face data refers to unprocessed face data that is related to the target person and is different from the first to-be-processed face data. The second face feature of the target person is determined according to the face features of the first target face data: for example, an average value of the face features of the first target face data is calculated, and the calculated average value is determined as the second face feature of the target person.
For example, for the second to-be-processed face data Ij related to the target person, where j takes a value of 1 to m, the face feature Fj of each face data Ij is extracted based on the second face recognition model;
the face feature Fk of each first target face data Ik is likewise extracted based on the second face recognition model, and an average value Mt of the face features Fk is calculated, for example Mt = (F1 + … + Fk + … + Ft)/t, where k takes a value of 1 to t and Fj, Fk, and Mt may be feature vectors; Mt is taken as the second face feature of the target person.
In step 207, according to the second face features of the target person and the face features of the second face data to be processed, fifth face data which does not need to be cleaned, sixth face data which needs to be deleted, and seventh face data which needs to be cleaned are screened out from the second face data to be processed.
For example, the similarity between the face feature of each second to-be-processed face data and the second face feature of the target person is calculated: for instance, the similarity Sj between the face feature Fj of the second to-be-processed face data Ij and the second face feature Mt of the target person may be taken as the cosine of the angle between the feature vectors Fj and Mt.
Furthermore, the fifth face data which does not need to be cleaned, the sixth face data which needs to be deleted, and the seventh face data which needs to be cleaned are screened out of the second to-be-processed face data according to these similarities. For example, the similarity of each second to-be-processed face data is compared with a preset third threshold and a preset fourth threshold, where the third threshold is greater than the fourth threshold: the face data whose similarity is greater than the third threshold is determined as fifth face data which does not need to be cleaned, these being face images of the target person; the face data whose similarity is not greater than the third threshold but greater than the fourth threshold is determined as seventh face data which needs to be cleaned, of which part or all may be face images of the target person; and the face data whose similarity is not greater than the fourth threshold is determined as sixth face data which needs to be deleted, these being unlikely to be face images of the target person.
In step 208, the seventh face data is cleaned to obtain eighth face data, and the first face data, the fourth face data, the fifth face data, and the eighth face data are determined as second target face data corresponding to the target person.
It should be noted that the fifth face data, the sixth face data, and the seventh face data are different types of face data, respectively.
For example, the seventh face data screened out of the second to-be-processed face data is judged, for example by asking a user to confirm whether each piece of face data belonging to the seventh face data is a face image of the target person. Face data judged to be a face image of the target person needs no further cleaning, while face data judged not to be a face image of the target person is deleted from the seventh face data, and the result is the eighth face data. Thus, the eighth face data consists of the face data in the seventh face data that belongs to face images of the target person.
Illustratively, through the first round of screening and cleaning in steps 201 to 204 and the second round of screening and cleaning in steps 205 to 208, trusted first face data, fourth face data, fifth face data, and eighth face data are obtained, and these four types of face data are merged together as the final face data processing result, i.e. the second target face data corresponding to the target person.
For example, the second target face data may be used as training data for a face recognition model. If there is still more unprocessed face data related to the target person, the second target face data may be used to train a face recognition model with stronger face recognition capability, and steps 205 to 208 may be executed repeatedly to continue the iterative processing.
In the technical solution provided by this embodiment of the disclosure, the to-be-processed face data is screened and cleaned using an existing face recognition model, which greatly reduces the amount of face data that needs to be cleaned. The cleaned face data is then used to train a new face recognition model, the new model is used to screen and clean the remaining data again, and the recognition performance of the face recognition model is improved through model iteration, thereby further reducing the face data to be cleaned and lowering the cost of data cleaning.
FIG. 3 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment. As shown in fig. 3, on the basis of the embodiment shown in fig. 1, the face data cleaning method according to the present disclosure may include the following steps 301-306:
in step 301, first face data to be processed related to a target person is acquired.
In step 302, face features of each face data are respectively extracted based on the first face recognition model, and the first face feature of the target person is determined according to the face features of each face data.
In step 303, the similarity between the face feature of each face data and the first face feature of the target person is calculated.
In step 304, determining the face data with the similarity greater than a first threshold value in each face data as first face data which does not need to be cleaned; determining the face data with the similarity not greater than the first threshold and greater than the second threshold in each face data as third face data needing cleaning; and determining the face data with the similarity not greater than a second threshold value in each face data as second face data needing to be deleted.
In step 305, the third face data is cleaned to obtain fourth face data.
In step 306, the first face data and the fourth face data are determined as first target face data corresponding to the target person.
According to the technical solution provided by this embodiment of the disclosure, the face features of the to-be-processed face data are extracted based on a face recognition model, similarity evaluation is performed on the first to-be-processed face data using these face features, and the to-be-processed face data is then screened according to the similarity. Only part of the face data needs to be cleaned, which greatly reduces the workload of the data cleaning operation.
FIG. 4 is a flow diagram illustrating a face data cleaning method according to an exemplary embodiment. As shown in fig. 4, on the basis of the embodiment shown in fig. 1, the face data cleaning method according to the present disclosure may include the following steps 401-407:
in step 401, reference data related to the target person is acquired.
In step 402, face detection is performed on each reference data based on the first face recognition model.
In step 403, the data of the detected face in each reference data is determined as the first face data to be processed related to the target person.
In step 404, the face features of each face data are respectively extracted based on the first face recognition model, and the first face feature of the target person is determined according to the face features of each face data.
In step 405, according to the face features of each face data and the first face features of the target person, first face data which does not need to be cleaned, second face data which needs to be deleted, and third face data which needs to be cleaned are screened out from each face data.
In step 406, the third face data is cleaned to obtain fourth face data;
in step 407, the first face data and the fourth face data are determined as first target face data corresponding to the target person.
According to the technical solution provided by this embodiment of the disclosure, after the reference data related to the target person is obtained, face detection is first performed on each reference data based on an existing first face recognition model; the data in which a face is detected is determined as the first to-be-processed face data related to the target person, and the non-face data in which no face can be detected is deleted from the reference data, which reduces the workload of the data cleaning operation.
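The disclosure performs detection with the first face recognition model itself; for a self-contained sketch, OpenCV's stock Haar cascade detector is substituted below, which is an assumption rather than the patent's method:

```python
import cv2

# Assumption: OpenCV's bundled Haar cascade stands in for the face detector
# of the first face recognition model described in steps 401-403.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def filter_reference_data(reference_images):
    """Keep only reference data in which a face is detected."""
    face_data = []
    for img in reference_images:                      # BGR images (ndarrays)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        if len(detector.detectMultiScale(gray, 1.1, 5)) > 0:
            face_data.append(img)                     # first to-be-processed face data
    return face_data                                  # non-face data is dropped
```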
As a possible embodiment, the face data cleaning method provided in the embodiment of the present disclosure may include the following steps:
step 1, processing data to be cleaned by using an existing face recognition model:
the existing face recognition model is used for carrying out face detection, face normalization and face feature extraction on the data to be cleaned, similarity evaluation is carried out on the data to be cleaned by using the face features, and part of face data is selected for cleaning. The method specifically comprises the following steps:
a. and carrying out face detection on the data to be cleaned by using an existing face recognition model, such as face detection in the VGG face. And cleaning the non-face data of which the face cannot be detected from the data set to be cleaned.
b. For the face data of the detected face, the face features of the face data are extracted by using the existing face recognition model, such as VGG face. Face features Fi are extracted for all face data Ii related to the target user, and all the features are fused into Mn (for example, an average value Mn is obtained (F1+ … + Fn)/n, where n is the number of face data related to the target user)), and Mn is used as the first face feature of the target user. And calculating the similarity Si of the face features Fi and Mn of the face data Ii, (such as the cosine value of an included angle between two feature vectors). If the similarity Si is larger than the threshold value T1, the face data Ii are considered as the face data of the target user, and the face data Ii are determined to be the face data D1 which do not need to be processed any more; if the similarity Si is smaller than the threshold T1 but larger than the threshold T2, it is considered that the face data Ii may be the face data of the target user, the face data Ii is determined to be data Ds which needs to be cleaned, and it is determined whether the face data Ii is the face data of the target user through methods such as manual confirmation; if the similarity Si is smaller than the threshold T2, the probability that the face data Ii is the face data of the target user is relatively low, and the face data Ii can be cleaned without processing for the moment.
c. And c, carrying out final judgment on the data Ds which are selected in the step b and need to be cleaned in a mode of manual confirmation and the like: the face data not of the target user in Ds is deleted, and only the face data confirmed to be of the target user is retained, thereby obtaining the face data D2 after cleaning.
Step 2, iteratively training a face recognition model by using the obtained face data:
after the data cleaning operation in step 1, the face data D1 and D2 obtained in step 1 can be used to train a face recognition model, and a face recognition model with higher face recognition accuracy is obtained. The method specifically comprises the following steps:
1) because the face data D1 obtained in step 1 and the face data D2 after cleaning are both face data with high reliability, the face data D1 and D2 are superimposed together to be used as a training data set Dt of the face recognition model.
2) The face recognition model is trained on the training data set Dt. For example, a face recognition model with better face recognition performance can be obtained by fine-tuning (refining) on the training data set Dt for the existing face recognition model; a brand-new face recognition model different from the existing face recognition model can be designed, and a brand-new face recognition model is obtained by training on the training data set Dt.
Step 3, selecting and cleaning the unprocessed face data by using the face recognition model obtained by training in the step 2:
Because the face recognition model obtained by the training in step 2 has learned from highly reliable face data, it has stronger discriminative power and face recognition capability for the data set to be processed. The unprocessed face data is therefore screened and cleaned using the face recognition model trained in step 2. Specifically:
a) Face features are extracted from the face data associated with the target user using the face recognition model trained in step 2. The face feature Fk of each face data Ik in the training data set Dt is extracted based on that model; an average value Mt of the face features Fk is calculated, for example Mt = (F1 + … + Fk + … + Ft)/t, where k takes a value of 1 to t and Fk and Mt may be feature vectors; Mt is taken as the second face feature of the target person. The face feature Fj of each unprocessed face data Ij associated with the target user, where j takes a value of 1 to m, is extracted based on the same model, and the similarity Sj between Fj and the second face feature Mt is calculated, for example as the cosine of the angle between the feature vectors Fj and Mt.
If the similarity Sj is greater than the threshold T3, the face data Ij is considered face data of the target user and is determined to be face data D3 that needs no further processing; if the similarity Sj is smaller than the threshold T3 but greater than the threshold T4, the face data Ij may be face data of the target user, so it is determined to be data that needs to be cleaned, and whether it is face data of the target user is judged by, for example, manual confirmation; if the similarity Sj is smaller than the threshold T4, the probability that the face data Ij is face data of the target user is relatively low, and it can be cleaned out without further processing for the time being.
b) A final judgment is made on the face data selected in step a) that needs to be cleaned, for example through manual confirmation: the face data that does not belong to the target user is deleted, and only the face data confirmed to be of the target user is retained, thereby obtaining the cleaned face data D4.
Step 4, multiple iteration treatment:
through the data selection and cleaning processing in the steps 1 to 3, credible face data D1, D2, D3 and D4 are obtained. The four parts of data can be combined together as a final face data processing result D as a training set of the face recognition model. If other unprocessed face data still exist, a face recognition model with stronger face recognition capability can be trained by using D; by repeatedly executing the face data cleaning method provided by the embodiment of the present disclosure, the iterative processing is continued.
In the technical solution provided by the embodiments of the disclosure, the face data to be cleaned is processed and screened using an existing face recognition model, which greatly reduces the face data that needs to be cleaned; part of the cleaned face data is then used to train a new face recognition model, the new model is used to process the data and pick out the face data to be cleaned again, and the process is iterated. This reduces the workload of data processing, improves the efficiency of data cleaning, and finally yields a data set suitable for training a face recognition model.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
FIG. 5 is a block diagram illustrating a face data cleaning apparatus according to an exemplary embodiment. The apparatus may be implemented in various ways: for example, all components of the apparatus may be implemented in a terminal or a server, or components of the apparatus may be implemented in a coupled manner on the terminal side and the server side. The apparatus can implement the methods of the present disclosure through software, hardware, or a combination of the two. As shown in fig. 5, the face data cleaning apparatus includes: an obtaining module 501, a first determining module 502, a first screening module 503, and a first cleaning module 504, wherein:
the obtaining module 501 is configured to obtain first face data to be processed related to a target person;
the first determining module 502 is configured to extract face features of each face data based on the first face recognition model, respectively, and determine a first face feature of the target person according to the face features of each face data;
the first screening module 503 is configured to screen out first face data that does not need to be cleaned, second face data that needs to be deleted, and third face data that needs to be cleaned from each face data according to the face features of each face data and the first face features of the target person;
the first cleaning module 504 is configured to clean the third face data to obtain fourth face data, and determine the first face data and the fourth face data as first target face data corresponding to a target person.
The device provided by the embodiment of the disclosure can be used for executing the technical scheme of the embodiment shown in fig. 1, and the execution mode and the beneficial effect are similar, and are not described again here.
In one possible embodiment, as shown in fig. 6, in the face data cleaning apparatus shown in fig. 5, the first screening module 503 may include: a calculation submodule 601 and a first determination submodule 602, wherein:
the calculation sub-module 601 is configured to calculate a similarity between the face features of each piece of face data and the first face feature of the target person;
the first determining sub-module 602 is configured to determine, as first face data that does not need to be cleaned, face data whose similarity is greater than a first threshold in each face data; determining the face data with the similarity not greater than the first threshold and greater than the second threshold in each face data as third face data needing cleaning; and determining the face data with the similarity not greater than a second threshold in the face data as second face data needing to be deleted.
In one possible implementation, the first determining module 502 is configured to calculate an average value of the facial features of each of the facial data; and determining the calculated average value as the first face characteristic of the target person.
In one possible embodiment, as shown in fig. 7, the face data cleaning apparatus shown in fig. 5 may further include: a training module 701, a second determining module 702, a second screening module 703, and a second cleaning module 704, wherein:
the training module 701 is configured to train a first face recognition model based on the first face data and the fourth face data to obtain a trained second face recognition model;
the second determining module 702 is configured to extract the face features of each first target face data and the second to-be-processed face data based on the second face recognition model, and determine the second face feature of the target person according to the face features of each first target face data;
the second screening module 703 is configured to screen out fifth face data that does not need to be cleaned, sixth face data that needs to be deleted, and seventh face data that needs to be cleaned from each second face data to be processed according to the second face features of the target person and the face features of each second face data to be processed;
the second cleaning module 704 is configured to clean the seventh face data to obtain eighth face data, and determine the first face data, the fourth face data, the fifth face data, and the eighth face data as second target face data corresponding to the target person.
In one possible embodiment, as shown in fig. 8, in the face data cleaning apparatus shown in fig. 5, the obtaining module 501 may include: an acquisition sub-module 801, a face detection sub-module 802, and a second determination sub-module 803, wherein:
the acquisition sub-module 801 is configured to acquire reference data related to a target person;
The face detection sub-module 802 is configured to perform face detection on each reference data based on the first face recognition model;
the second determining sub-module 803 is configured to determine the data of the detected face in each reference data as the first face data to be processed related to the target person.
Fig. 9 is a block diagram illustrating a face data cleaning apparatus 900 according to an exemplary embodiment. The face data cleaning apparatus 900 is applicable to a terminal or a server, and the face data cleaning apparatus 900 includes:
a processor 901;
a memory 902 for storing processor-executable instructions;
wherein the processor 901 is configured to:
acquiring first face data to be processed related to a target person;
respectively extracting the face features of each face data based on a first face recognition model, and determining the first face features of the target person according to the face features of each face data;
screening out first face data which do not need to be cleaned, second face data which need to be deleted and third face data which need to be cleaned in the face data according to the face features of the face data and the first face features of the target person;
and cleaning the third face data to obtain fourth face data, and determining the first face data and the fourth face data as first target face data corresponding to the target person.
In one embodiment, the processor 901 may be further configured to:
calculating the similarity between the face characteristics of each face data and the first face characteristics of the target person;
determining the face data with the similarity larger than a first threshold value in each face data as first face data which does not need to be cleaned; determining the face data with the similarity not greater than the first threshold and greater than the second threshold in each face data as third face data needing cleaning; and determining the face data with the similarity not greater than a second threshold value in each face data as second face data needing to be deleted.
In one embodiment, the processor 901 may be further configured to:
calculating the average value of the human face characteristics of each piece of human face data;
and determining the calculated average value as the first face characteristic of the target person.
In one embodiment, the processor 901 may be further configured to:
after the first face data and the fourth face data are determined as first target face data corresponding to a target person, a first face recognition model is trained on the basis of the first face data and the fourth face data, and a trained second face recognition model is obtained;
respectively extracting the face features of each first target face data and second face data to be processed based on a second face recognition model, and determining the second face feature of the target person according to the face features of each first target face data;
screening out fifth face data which does not need to be cleaned, sixth face data which needs to be deleted and seventh face data which needs to be cleaned in the second face data to be processed according to the second face features of the target person and the face features of the second face data to be processed;
and cleaning the seventh face data to obtain eighth face data, and determining the first face data, the fourth face data, the fifth face data and the eighth face data as second target face data corresponding to the target person.
In one embodiment, the processor 901 may be further configured to:
acquiring reference data related to a target person;
respectively carrying out face detection on each reference data based on the first face recognition model;
and determining the data of the detected face in each reference data as the first face data to be processed related to the target person.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 10 is a block diagram illustrating an apparatus in accordance with an example embodiment. For example, the apparatus 1000 may be an electronic device such as a camera, a mobile phone, a tablet computer, or a smart wearable device. Referring to fig. 10, the apparatus 1000 may include one or more of the following components: processing component 1002, memory 1004, power component 1006, multimedia component 1008, audio component 1010, input/output (I/O) interface 1012, sensor component 1014, and communications component 1016.
The processing component 1002 is generally configured to control overall operation of the apparatus 1000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1002 may include one or more processors 1020 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1002 may include one or more modules that facilitate interaction between the processing component 1002 and other components. For example, the processing component 1002 may include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operations at the apparatus 1000. Examples of such data include instructions for any application or method operating on device 1000, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1004 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1006 provides power to the various components of the device 1000. The power components 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1000.
The multimedia component 1008 includes a screen that provides an output interface between the device 1000 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1008 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 1000 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, audio component 1010 includes a Microphone (MIC) configured to receive external audio signals when apparatus 1000 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 1004 or transmitted via the communication component 1016. In some embodiments, audio component 1010 also includes a speaker for outputting audio signals.
I/O interface 1012 provides an interface between processing component 1002 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1014 includes one or more sensors for providing status assessments of various aspects of the device 1000. For example, the sensor assembly 1014 may detect an open/closed state of the device 1000 and the relative positioning of components, such as the display and keypad of the device 1000. The sensor assembly 1014 may also detect a change in position of the device 1000 or a component of the device 1000, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and a change in temperature of the device 1000. The sensor assembly 1014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate communications between the apparatus 1000 and other devices in a wired or wireless manner. The device 1000 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1016 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1016 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 1004 comprising instructions, executable by the processor 1020 of the device 1000 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
FIG. 11 is a block diagram illustrating an apparatus in accordance with an example embodiment. For example, the apparatus 1100 may be provided as a server. The apparatus 1100 includes a processing component 1102, which in turn includes one or more processors, and memory resources, represented by a memory 1103, for storing instructions (e.g., applications) executable by the processing component 1102. An application stored in the memory 1103 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1102 is configured to execute the instructions to perform the methods described above.
The apparatus 1100 may also include a power component 1106 configured to perform power management of the apparatus 1100, a wired or wireless network interface 1105 configured to connect the apparatus 1100 to a network, and an input/output (I/O) interface 1108. The apparatus 1100 may operate based on an operating system stored in the memory 1103, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of the apparatus 1000 or 1100, enable the apparatus 1000 or 1100 to perform a method comprising:
acquiring first to-be-processed face data related to a target person;
extracting face features from each piece of the face data based on a first face recognition model, and determining a first face feature of the target person according to the face features of each piece of face data;
screening out, from the face data, first face data that does not need to be cleaned, second face data that needs to be deleted, and third face data that needs to be cleaned, according to the face features of each piece of face data and the first face feature of the target person; and
cleaning the third face data to obtain fourth face data, and determining the first face data and the fourth face data as first target face data corresponding to the target person.
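Taken together, the four steps above form a single pipeline. The following Python sketch is illustrative only: every helper it calls (extract_features, clean_image, and partition_face_data, which is defined in the per-step sketch further below) is a hypothetical placeholder chosen for this description, not an API defined by the disclosure.

import numpy as np

def clean_face_data(raw_data, model):
    """Illustrative end-to-end pass over the first to-be-processed face data."""
    # Step 2: extract a feature vector for each piece of face data.
    feats = np.array([extract_features(model, d) for d in raw_data])
    # The first face feature of the target person is the mean feature.
    target_feat = feats.mean(axis=0)
    # Step 3: partition by similarity (see the screening sketch below).
    keep_idx, clean_idx, _delete_idx = partition_face_data(feats, target_feat)
    # Step 4: keep the clean items, repair the borderline ones, drop the rest.
    first = [raw_data[i] for i in keep_idx]
    fourth = [clean_image(raw_data[i]) for i in clean_idx]
    return first + fourth  # first target face data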
In one embodiment, screening out the first face data that does not need to be cleaned, the second face data that needs to be deleted, and the third face data that needs to be cleaned, according to the face features of each piece of face data and the first face feature of the target person, includes:
calculating the similarity between the face feature of each piece of face data and the first face feature of the target person; and
determining face data whose similarity is greater than a first threshold as the first face data that does not need to be cleaned; determining face data whose similarity is not greater than the first threshold but greater than a second threshold as the third face data that needs to be cleaned; and determining face data whose similarity is not greater than the second threshold as the second face data that needs to be deleted.
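A minimal sketch of this screening step, assuming cosine similarity and illustrative threshold values of 0.8 and 0.5; the disclosure fixes neither the similarity measure nor the threshold values.

import numpy as np

def partition_face_data(feats, target_feat, t1=0.8, t2=0.5):
    """Split face data indices by similarity to the target person's feature.

    feats: (N, D) array of per-item face features.
    target_feat: (D,) first face feature of the target person.
    t1, t2: assumed first and second thresholds, with t1 > t2.
    """
    # Cosine similarity between each face feature and the target feature.
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    target = target_feat / np.linalg.norm(target_feat)
    sims = unit @ target

    keep_idx = np.where(sims > t1)[0]                    # first face data: no cleaning needed
    clean_idx = np.where((sims <= t1) & (sims > t2))[0]  # third face data: needs cleaning
    delete_idx = np.where(sims <= t2)[0]                 # second face data: delete
    return keep_idx, clean_idx, delete_idx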
In one embodiment, determining the first face feature of the target person according to the face features of each piece of face data includes:
calculating the average value of the face features of all the face data; and
determining the calculated average value as the first face feature of the target person.
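In the NumPy terms of the sketches above, this averaging embodiment reduces to a single line; feats is the assumed (N, D) feature matrix from the earlier sketches.

# feats: (N, D) array holding the face feature of each piece of face data.
first_face_feature = feats.mean(axis=0)  # the average value is the first face feature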
In one embodiment, after determining the first face data and the fourth face data as the first target face data corresponding to the target person, the method further includes:
training the first face recognition model based on the first face data and the fourth face data to obtain a trained second face recognition model;
extracting, based on the second face recognition model, the face features of each piece of the first target face data and of second to-be-processed face data, and determining a second face feature of the target person according to the face features of each piece of the first target face data;
screening out, from the second to-be-processed face data, fifth face data that does not need to be cleaned, sixth face data that needs to be deleted, and seventh face data that needs to be cleaned, according to the second face feature of the target person and the face features of the second to-be-processed face data; and
cleaning the seventh face data to obtain eighth face data, and determining the first face data, the fourth face data, the fifth face data, and the eighth face data as second target face data corresponding to the target person.
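One iteration of this retrain-and-reclean cycle might be sketched as follows, continuing the NumPy sketches above. train_model, extract_features, and clean_image remain hypothetical placeholders, since the disclosure does not specify a training procedure; partition_face_data is the screening sketch given earlier.

def second_round(first_model, first_target_data, new_raw_data):
    """Illustrative second cleaning pass using a retrained model."""
    # Train the second face recognition model on the first target face data.
    second_model = train_model(first_model, first_target_data)

    # Re-embed the first target face data with the stronger model; the mean
    # of these features is the second face feature of the target person.
    target_feats = np.array([extract_features(second_model, d) for d in first_target_data])
    second_face_feature = target_feats.mean(axis=0)

    # Screen the second to-be-processed face data against the new feature.
    new_feats = np.array([extract_features(second_model, d) for d in new_raw_data])
    keep_idx, clean_idx, _delete_idx = partition_face_data(new_feats, second_face_feature)

    # fifth = kept as-is, eighth = cleaned; merged with the first target data
    # they form the second target face data.
    fifth = [new_raw_data[i] for i in keep_idx]
    eighth = [clean_image(new_raw_data[i]) for i in clean_idx]
    return first_target_data + fifth + eighth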
In one embodiment, acquiring the first to-be-processed face data related to the target person includes:
acquiring reference data related to the target person;
performing face detection on each piece of reference data based on the first face recognition model; and
determining the reference data in which a face is detected as the first to-be-processed face data related to the target person.
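This acquisition step is in effect a detection filter. In the sketch below, detect_faces stands in for whatever face detector the first face recognition model exposes and is purely an assumption.

def acquire_face_data(reference_data, model):
    """Keep only the reference items in which at least one face is detected."""
    face_data = []
    for item in reference_data:
        boxes = detect_faces(model, item)  # hypothetical detector call
        if boxes:  # a non-empty result means a face was found
            face_data.append(item)
    return face_data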
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. A face data cleaning method, characterized by comprising:
acquiring first to-be-processed face data related to a target person;
extracting face features from each piece of the face data based on a first face recognition model, and determining a first face feature of the target person according to the face features of each piece of face data;
screening out, from the face data, first face data that does not need to be cleaned, second face data that needs to be deleted, and third face data that needs to be cleaned, according to the face features of each piece of face data and the first face feature of the target person; and
cleaning the third face data to obtain fourth face data, and determining the first face data and the fourth face data as first target face data corresponding to the target person;
wherein the screening out of the first face data that does not need to be cleaned, the second face data that needs to be deleted, and the third face data that needs to be cleaned comprises:
calculating the similarity between the face feature of each piece of face data and the first face feature of the target person; and
determining face data whose similarity is greater than a first threshold as the first face data that does not need to be cleaned; determining face data whose similarity is not greater than the first threshold but greater than a second threshold as the third face data that needs to be cleaned; and determining face data whose similarity is not greater than the second threshold as the second face data that needs to be deleted;
and wherein the determining of the first face feature of the target person according to the face features of each piece of face data comprises:
calculating the average value of the face features of all the face data, and determining the calculated average value as the first face feature of the target person.
2. The method according to claim 1, wherein, after the determining of the first face data and the fourth face data as the first target face data corresponding to the target person, the method further comprises:
training the first face recognition model based on the first face data and the fourth face data to obtain a trained second face recognition model;
extracting, based on the second face recognition model, the face features of each piece of the first target face data and of second to-be-processed face data, and determining a second face feature of the target person according to the face features of each piece of the first target face data;
screening out, from the second to-be-processed face data, fifth face data that does not need to be cleaned, sixth face data that needs to be deleted, and seventh face data that needs to be cleaned, according to the second face feature of the target person and the face features of the second to-be-processed face data; and
cleaning the seventh face data to obtain eighth face data, and determining the first face data, the fourth face data, the fifth face data, and the eighth face data as second target face data corresponding to the target person.
3. The method of claim 1, wherein the acquiring of the first to-be-processed face data related to the target person comprises:
acquiring reference data related to the target person;
performing face detection on each piece of reference data based on the first face recognition model; and
determining the reference data in which a face is detected as the first to-be-processed face data related to the target person.
4. A face data cleaning apparatus, comprising:
an acquisition module configured to acquire first to-be-processed face data related to a target person;
a first determining module configured to extract face features from each piece of the face data based on a first face recognition model, and to determine a first face feature of the target person according to the face features of each piece of face data;
a first screening module configured to screen out, from the face data, first face data that does not need to be cleaned, second face data that needs to be deleted, and third face data that needs to be cleaned, according to the face features of each piece of face data and the first face feature of the target person; and
a first cleaning module configured to clean the third face data to obtain fourth face data, and to determine the first face data and the fourth face data as first target face data corresponding to the target person;
wherein the first screening module comprises:
a calculation submodule configured to calculate the similarity between the face feature of each piece of face data and the first face feature of the target person; and
a first determining submodule configured to determine face data whose similarity is greater than a first threshold as the first face data that does not need to be cleaned, to determine face data whose similarity is not greater than the first threshold but greater than a second threshold as the third face data that needs to be cleaned, and to determine face data whose similarity is not greater than the second threshold as the second face data that needs to be deleted;
and wherein the first determining module is further configured to calculate the average value of the face features of all the face data, and to determine the calculated average value as the first face feature of the target person.
5. The apparatus of claim 4, further comprising:
a training module configured to train the first face recognition model based on the first face data and the fourth face data to obtain a trained second face recognition model;
a second determining module configured to extract, based on the second face recognition model, the face features of each piece of the first target face data and of second to-be-processed face data, and to determine a second face feature of the target person according to the face features of each piece of the first target face data;
a second screening module configured to screen out, from the second to-be-processed face data, fifth face data that does not need to be cleaned, sixth face data that needs to be deleted, and seventh face data that needs to be cleaned, according to the second face feature of the target person and the face features of the second to-be-processed face data; and
a second cleaning module configured to clean the seventh face data to obtain eighth face data, and to determine the first face data, the fourth face data, the fifth face data, and the eighth face data as second target face data corresponding to the target person.
6. The apparatus of claim 4, wherein the acquisition module comprises:
an acquisition submodule configured to acquire reference data related to the target person;
a face detection submodule configured to perform face detection on each piece of reference data based on the first face recognition model; and
a second determining submodule configured to determine the reference data in which a face is detected as the first to-be-processed face data related to the target person.
7. A face data cleaning apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquire first to-be-processed face data related to a target person;
extract face features from each piece of the face data based on a first face recognition model, and determine a first face feature of the target person according to the face features of each piece of face data;
screen out, from the face data, first face data that does not need to be cleaned, second face data that needs to be deleted, and third face data that needs to be cleaned, according to the face features of each piece of face data and the first face feature of the target person; and
clean the third face data to obtain fourth face data, and determine the first face data and the fourth face data as first target face data corresponding to the target person;
wherein the screening out of the first face data that does not need to be cleaned, the second face data that needs to be deleted, and the third face data that needs to be cleaned comprises:
calculating the similarity between the face feature of each piece of face data and the first face feature of the target person; and
determining face data whose similarity is greater than a first threshold as the first face data that does not need to be cleaned; determining face data whose similarity is not greater than the first threshold but greater than a second threshold as the third face data that needs to be cleaned; and determining face data whose similarity is not greater than the second threshold as the second face data that needs to be deleted;
and wherein the determining of the first face feature of the target person according to the face features of each piece of face data comprises:
calculating the average value of the face features of all the face data, and determining the calculated average value as the first face feature of the target person.
8. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 3.
CN201810930672.0A 2018-08-15 2018-08-15 Face data cleaning method and device Active CN109034106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810930672.0A CN109034106B (en) 2018-08-15 2018-08-15 Face data cleaning method and device

Publications (2)

Publication Number Publication Date
CN109034106A (en) 2018-12-18
CN109034106B (en) 2022-06-10

Family

ID=64631329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810930672.0A Active CN109034106B (en) 2018-08-15 2018-08-15 Face data cleaning method and device

Country Status (1)

Country Link
CN (1) CN109034106B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807108A (en) * 2019-10-15 2020-02-18 华南理工大学 Asian face data automatic collection and cleaning method and system
CN114730309A (en) * 2019-11-20 2022-07-08 Oppo广东移动通信有限公司 Data cleaning device, data cleaning method and face verification method
CN111966671A (en) * 2020-08-04 2020-11-20 深圳追一科技有限公司 Digital human training data cleaning method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934364A (en) * 2017-03-09 2017-07-07 腾讯科技(上海)有限公司 The recognition methods of face picture and device
CN107203593A (en) * 2017-04-26 2017-09-26 深圳市华傲数据技术有限公司 A kind of data cleaning method and system
CN108319938A (en) * 2017-12-31 2018-07-24 奥瞳系统科技有限公司 High quality training data preparation system for high-performance face identification system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260732A (en) * 2015-11-26 2016-01-20 小米科技有限责任公司 Image processing method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant