CN115439919A - Model updating method, apparatus, device, storage medium and program product


Info

Publication number
CN115439919A
Authority
CN
China
Prior art keywords
face
model
feature
features
quality
Prior art date
Legal status
Granted
Application number
CN202211353988.0A
Other languages
Chinese (zh)
Other versions
CN115439919B (en)
Inventor
张韵璇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211353988.0A (granted as CN115439919B)
Priority to CN202310237157.5A (published as CN116978087A)
Publication of CN115439919A
Application granted
Publication of CN115439919B
Legal status: Active

Classifications

    • G06V 40/172: Recognition of human faces, e.g. facial parts, sketches or expressions; classification, e.g. identification
    • G06V 10/764: Image or video recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/774: Image or video recognition using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Image or video recognition using pattern recognition or machine learning; fusion of extracted features, i.e. combining data at the feature extraction level
    • G06V 10/82: Image or video recognition using pattern recognition or machine learning; using neural networks

Abstract

Embodiments of this application disclose a model updating method, apparatus, device, storage medium, and program product. In these embodiments, the face features of a training sample are fused with the center features of the target cluster corresponding to the training sample's label to obtain the fusion features of the training sample; the center features of the target cluster are adjusted according to the fusion features and the face features to obtain adjusted center features of the target cluster; the face features are fused with the adjusted center features of the target cluster to obtain adjusted fusion features of the training sample; according to the adjusted fusion features, a compatibility loss value between the fusion features and the adjusted fusion features is determined, and a classification loss value of the face recognition model is determined; the face recognition model is then trained according to the classification loss value and the compatibility loss value to obtain an updated target face recognition model. The method and apparatus can improve the compatibility of the target face recognition model.

Description

Model updating method, apparatus, device, storage medium and program product
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a model updating method, apparatus, device, storage medium, and program product.
Background
With the development of science and technology, neural network models are increasingly widely applied. For example, in the field of face recognition, a face recognition model extracts the facial features of a face image and matches them against the features in a face library to obtain a recognition result for the image.
While a face recognition model is used to recognize face images, the model is also updated according to those images. However, the face features extracted by the updated model are incompatible with the features already stored in the face library, so the features of the face images in the library must be re-extracted with the updated model, which takes a long time and slows down recognition.
Disclosure of Invention
The embodiments of this application provide a model updating method, apparatus, device, storage medium, and program product, which can solve the technical problem that re-extracting the features of the face images in the face library with the updated face recognition model takes a long time and slows down recognition.
The embodiment of the application provides a model updating method, which comprises the following steps:
acquiring a training set for updating a face recognition model, wherein the training set comprises at least one training sample with a label, and the training sample is a face image historically recognized by the face recognition model;
extracting features of the training samples to obtain facial features corresponding to the training samples, and performing fusion processing on the facial features and central features of the target clusters corresponding to the labels to obtain fusion features of the training samples;
adjusting the central feature of the target cluster according to the fusion feature and the facial feature to obtain an adjusted central feature of the target cluster;
performing fusion processing on the face features and the adjusted central features of the target cluster to obtain adjusted fusion features of the training sample;
determining, according to the adjusted fusion features, a compatibility loss value between the fusion features and the adjusted fusion features, and determining a classification loss value of the face recognition model;
and training the face recognition model according to the classification loss value and the compatibility loss value to obtain an updated target face recognition model.
Accordingly, an embodiment of the present application provides a model updating apparatus, including:
an acquisition module, configured to acquire a training set for updating a face recognition model, where the training set includes at least one labeled training sample, and the training sample is a face image historically recognized by the face recognition model;
an extraction module, configured to perform feature extraction on the training sample to obtain face features corresponding to the training sample, and to fuse the face features with the center features of the target cluster corresponding to the label to obtain fusion features of the training sample;
an adjusting module, configured to adjust a central feature of the target cluster according to the fusion feature and the face feature, so as to obtain an adjusted central feature of the target cluster;
a fusion module, configured to perform fusion processing on the facial features and the adjusted central features of the target cluster to obtain adjusted fusion features of the training sample;
a determining module, configured to determine, according to the adjusted fusion features, a compatibility loss value between the fusion features and the adjusted fusion features, and to determine a classification loss value of the face recognition model;
and an update module, configured to train the face recognition model according to the classification loss value and the compatibility loss value to obtain an updated target face recognition model.
Optionally, the adjusting module is specifically configured to perform:
determining a prediction label of the training sample according to the fusion features;
and adjusting the central feature of the target cluster according to the facial feature and the prediction label to obtain the adjusted central feature of the target cluster.
Optionally, the extraction module is specifically configured to perform:
extracting the features of the training sample through a feature extraction layer in the face recognition model to obtain initial face features corresponding to the training sample;
and performing feature mapping on the initial face features through a feature mapping layer in the face recognition model to obtain face features corresponding to the training samples.
Optionally, the adjusting module is specifically configured to perform:
acquiring first direction information corresponding to the facial features and second direction information of the initial facial features;
and adjusting the center feature of the target cluster according to the first direction information, the second direction information and the prediction label to obtain an adjusted center feature of the target cluster.
Optionally, the adjusting module is specifically configured to perform:
acquiring third direction information of the central feature of the target cluster;
and adjusting the center feature of the target cluster according to the first direction information, the second direction information, the third direction information and the prediction label to obtain an adjusted center feature of the target cluster (one possible form of this update is sketched below).
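The disclosure does not give a concrete formula for this direction-based adjustment, so the following Python sketch is only a minimal illustration under stated assumptions: "direction information" is taken to mean the L2-normalized feature vector, the center is nudged only when the prediction label matches the target cluster, and the function name adjust_center_feature, the step size lr, and the combination rule are all hypothetical.

    import numpy as np

    def l2_normalize(v, eps=1e-12):
        # Direction information of a feature vector: its L2-normalized form.
        return v / (np.linalg.norm(v) + eps)

    def adjust_center_feature(center, face_feat, init_face_feat,
                              predicted_label, target_label, lr=0.1):
        # Illustrative update: move the target cluster's center along the
        # directions of the mapped face feature (first direction information)
        # and the initial face feature (second direction information),
        # relative to the center's own direction (third direction information).
        if predicted_label != target_label:
            return center  # leave the center unchanged for misclassified samples
        d1 = l2_normalize(face_feat)
        d2 = l2_normalize(init_face_feat)
        d3 = l2_normalize(center)
        update_dir = l2_normalize(d1 + d2 - d3)
        return center + lr * update_dir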
Optionally, the fusion module is specifically configured to perform:
performing first fusion processing on the adjusted fusion features and the initial facial features to obtain target fusion features of the training sample;
and determining a compatibility loss value between the fusion feature and the adjusted fusion feature according to the target fusion feature.
Optionally, the model updating apparatus further includes:
the mapping module is used for acquiring the initial central characteristics of the target cluster corresponding to the label;
and mapping the initial central features through a central mapping layer of the face recognition model to obtain the central features of the target clusters corresponding to the labels.
Optionally, the obtaining module is specifically configured to perform:
acquiring a face image historically recognized by the face recognition model;
screening a picture cluster group to be labeled from picture clusters corresponding to the face image, labeling the face image according to the picture cluster group to be labeled, and obtaining a labeled face image;
and determining a training set for updating the face recognition model according to the labeled face image.
Optionally, the obtaining module is specifically configured to perform:
acquiring a historical face image historically recognized by the face recognition model;
extracting key points of the historical face image, and determining a quality score corresponding to the historical face image according to the key points;
and taking the historical face image corresponding to the quality score meeting the preset score threshold value as the face image historically recognized by the face recognition model.
Optionally, the obtaining module is specifically configured to perform:
obtaining a plurality of quality models of the historical face image;
determining a plurality of initial quality scores of the historical face image according to the key points through the quality model;
and determining the quality score corresponding to the historical face image according to the initial quality score.
Optionally, the historical face image has a plurality of quality evaluation dimensions, each quality evaluation dimension corresponds to a dimension quality model, and accordingly, the obtaining module is specifically configured to perform:
determining an initial quality score of the historical face image for each quality evaluation dimension according to the key points through the dimension quality model;
and performing weighting operation on each initial quality score to obtain a quality score corresponding to the historical face image.
Optionally, the model updating apparatus further includes:
a training module to perform:
acquiring a first training set of a dimension quality model to be trained, wherein the first training set comprises a plurality of first training samples;
extracting sample key points of the first training sample, and determining a pair loss value and/or an anchor loss value of the dimension quality model to be trained according to the sample key points;
and training the dimension quality model to be trained according to the pair loss value and/or the anchor loss value to obtain the dimension quality model.
Optionally, the training module is specifically configured to perform:
acquiring an anchor label corresponding to the first training sample, wherein the anchor label represents the grade of the quality evaluation dimension, and at least three anchor labels exist in the first training set;
determining a prediction score of the first training sample according to the key point;
and determining an anchor loss value of the dimension quality model to be trained according to the anchor label, the prediction score, and the score interval corresponding to the anchor label.
Optionally, the plurality of quality models of the historical face image include a first comprehensive quality model and a second comprehensive quality model, and accordingly, the obtaining module is specifically configured to perform:
determining a first initial quality score corresponding to the historical face image according to the key point through the first comprehensive quality model;
determining a second initial quality score corresponding to the historical face image according to the key point through the second comprehensive quality model;
and screening out a quality score corresponding to the historical face image from the first initial quality score and the second initial quality score.
Optionally, the obtaining module is specifically configured to perform:
acquiring a preset screening threshold;
if the first initial quality score and the second initial quality score are both smaller than or equal to the preset screening threshold, taking the first initial quality score as a quality score corresponding to the historical face image;
and if the first initial quality score and the second initial quality score are both larger than the preset screening threshold, taking the second initial quality score as the quality score corresponding to the historical face image.
Optionally, the obtaining module is specifically configured to perform:
determining the association degree between the picture clusters corresponding to the face images according to the face features in the picture clusters corresponding to the face images;
and screening out a group of picture clusters to be labeled from the picture clusters corresponding to the face images according to the association degree.
Optionally, the obtaining module is specifically configured to perform:
constructing a data structure according to the picture clusters corresponding to the face images and the association degrees, wherein the data structure comprises a plurality of nodes and a plurality of edges, each node represents a picture cluster corresponding to the face images, and the weight of each edge represents the association degrees between the picture clusters corresponding to the face images;
performing annular adjustment on the data structure according to the weight of the edge in the data structure to obtain the data structure which does not contain annular connection;
and screening out a node group from the data structure that does not contain the annular connection, where the node group corresponds to the picture cluster group to be labeled (one way to implement the annular adjustment is sketched below).
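The text does not spell out how the annular (ring) adjustment is performed. One natural reading is to keep only a maximum spanning forest over the cluster graph, discarding the weakest edges of every cycle; the Python sketch below implements that reading, and the interpretation itself is an assumption.

    from typing import List, Tuple

    def remove_ring_connections(num_clusters: int,
                                edges: List[Tuple[float, int, int]]) -> List[Tuple[int, int]]:
        # Each edge is (association_degree, cluster_u, cluster_v). Keeping a
        # maximum spanning forest retains the strongest associations while
        # guaranteeing the result contains no annular (cyclic) connections.
        parent = list(range(num_clusters))

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        kept = []
        for degree, u, v in sorted(edges, reverse=True):
            ru, rv = find(u), find(v)
            if ru != rv:          # this edge does not close a ring
                parent[ru] = rv
                kept.append((u, v))
        return kept

The connected components of the kept edges can then be read off as node groups, that is, as candidate picture cluster groups to be labeled.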
Optionally, the obtaining module is specifically configured to perform:
extracting a plurality of modal information corresponding to the face images in the picture cluster group to be labeled;
displaying the group of the picture clusters to be marked and the modal information;
and receiving labeling information for labeling the face image in the to-be-labeled picture cluster group by a user according to the modal information to obtain a labeled face image.
In addition, an electronic device is further provided in an embodiment of the present application, and includes a processor and a memory, where the memory stores a computer program, and the processor is configured to run the computer program in the memory to implement the model updating method provided in the embodiment of the present application.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, where the computer program is suitable for being loaded by a processor to execute any one of the model updating methods provided in the embodiments of the present application.
In addition, the embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements any one of the model updating methods provided by the embodiment of the present application.
In the embodiments of this application, a training set for updating a face recognition model is acquired, where the training set includes at least one labeled training sample and the training sample is a face image historically recognized by the face recognition model. Feature extraction is performed on the training sample to obtain its face features, and the face features are fused with the center features of the target cluster corresponding to the label to obtain the fusion features of the training sample. The center features of the target cluster can then be adjusted according to the fusion features and the face features to obtain adjusted center features. After the face features are fused with the adjusted center features to obtain the adjusted fusion features of the training sample, a compatibility loss value between the fusion features and the adjusted fusion features can be determined according to the adjusted fusion features, and a classification loss value of the face recognition model is determined. The face recognition model is then trained according to the classification loss value and the compatibility loss value to obtain the updated target face recognition model. Because the features extracted by the target face recognition model are compatible with the features in the face library, the features of the face images in the library do not need to be re-extracted, which saves time and keeps recognition fast.
Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic view of a scenario of a model update process provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a data loop provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a model updating method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a face image provided by an embodiment of the application;
FIG. 5 is a schematic diagram of a first training set provided by an embodiment of the present application;
FIG. 6 is a schematic view of an anchor tag provided by embodiments of the present application;
FIG. 7 is a schematic diagram of a dimensional quality model provided by an embodiment of the present application;
FIG. 8 is a graphical illustration of the effect of models trained with various loss values provided by embodiments of the present application;
FIG. 9 is a diagram illustrating the quality evaluation effect of an unsupervised model provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of a model perturbation integrated quality model provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of a similarity distribution distance integrated quality model according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of an adaptive feature-norm integrated quality model provided by an embodiment of the present application;
FIG. 13 is a graphical illustration of the effect of the first comprehensive quality model and the second comprehensive quality model provided by embodiments of the present application;
FIG. 14 is a diagram of a data structure provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of a method of active learning provided by an embodiment of the present application;
FIG. 16 is a schematic diagram of a set of queries provided by an embodiment of the present application;
FIG. 17 is a schematic diagram of another method for active learning provided by embodiments of the present application;
FIG. 18 is a schematic illustration of a tagging interface provided by an embodiment of the present application;
FIG. 19 is a schematic diagram of a new face recognition model and the original face recognition model provided by an embodiment of the present application;
FIG. 20 is a schematic illustration of compatibility provided by embodiments of the present application;
fig. 21 is a flowchart illustrating a training set constructing method according to an embodiment of the present application;
FIG. 22 is a schematic flow chart diagram illustrating another model updating method provided in an embodiment of the present application;
FIG. 23 is a schematic structural diagram of a model updating apparatus according to an embodiment of the present application;
fig. 24 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Embodiments of the present application provide a model updating method, apparatus, device, storage medium, and program product, where the device may be an electronic device, the storage medium may be a computer-readable storage medium, and the program product may be a computer program product. The model updating apparatus may be integrated into an electronic device, and the electronic device may be a server or a terminal.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
And, a plurality of servers can be grouped into a blockchain, and the servers are nodes on the blockchain.
The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
For example, as shown in fig. 1, the terminal may acquire a face image and transmit the face image to the server. The server identifies the face image through the face identification model to obtain an identification result of the face image, and returns the identification result to the terminal.
After obtaining the face image, the server may use it as a training sample and construct a training set for updating the face recognition model. The server then performs feature extraction on the training samples through the face recognition model to obtain the corresponding face features, fuses the face features with the center features of the target clusters corresponding to the training samples' labels to obtain the fusion features of the training samples, and adjusts the center features of the target clusters according to the fusion features and the face features to obtain adjusted center features. Next, the server fuses the face features with the adjusted center features of the target clusters to obtain the adjusted fusion features of the training samples. Finally, the server determines a compatibility loss value between the fusion features and the adjusted fusion features according to the adjusted fusion features, determines a classification loss value of the face recognition model, and trains the face recognition model according to the classification loss value and the compatibility loss value to obtain the updated target face recognition model. A sketch of one such training step is given below.
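Since the passage describes the training step only at the level of operations, the following PyTorch-style sketch fills in concrete choices that the text does not fix: fusion is written as simple averaging, the center adjustment as a small normalized step toward the face feature, and the compatibility loss as a mean squared error. All of these, along with the names model_update_step, centers, and alpha, are assumptions for illustration rather than the patent's actual formulas.

    import torch
    import torch.nn.functional as F

    def model_update_step(model, centers, images, labels, optimizer, alpha=1.0):
        # `model(images)` is assumed to return the face features; `centers`
        # is a (num_classes, dim) tensor of cluster center features kept as
        # a buffer (updated here, not trained by gradient descent).
        face_feat = model(images)
        target_centers = centers[labels]            # centers of the target clusters

        # Fusion of face features with the (old) center features.
        fused = 0.5 * (face_feat + target_centers)

        # Adjust the center features (simplified: the patent adjusts according
        # to the fusion features and the face features; only the face-feature
        # direction is used here).
        adjusted_centers = F.normalize(
            target_centers + 0.1 * (face_feat.detach() - target_centers), dim=1)

        # Fusion of face features with the adjusted center features.
        adjusted_fused = 0.5 * (face_feat + adjusted_centers)

        # Compatibility loss between the two fusion features, plus the
        # classification loss of the face recognition model.
        compat_loss = F.mse_loss(adjusted_fused, fused.detach())
        logits = adjusted_fused @ F.normalize(centers, dim=1).t()
        cls_loss = F.cross_entropy(logits, labels)

        loss = cls_loss + alpha * compat_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()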
The "plurality" in the embodiment of the present application means two or more. "first" and "second" and the like in the embodiments of the present application are used for distinguishing the description, and are not to be construed as implying relative importance.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer Vision (CV) is a science that studies how to make machines "see": using cameras and computers in place of human eyes to identify and measure targets, and performing further image processing so that the processed image is better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The scheme provided by the embodiment of the application relates to the technologies of machine learning of artificial intelligence, computer vision and the like, and is specifically explained by the following embodiment. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
Referring to FIG. 2, a usage scenario of the embodiments of this application is described in detail. While a face recognition model is used to recognize face images, the model is updated according to those images to obtain an updated face recognition model, and the updated model is then used for face recognition. The face images and the face recognition model thus promote each other and form a closed-loop data-model feedback system; this process is called the data closed loop.
For example, when the face recognition model is a human face recognition model, the data closed loop may be as shown in FIG. 2: key point detection is performed on the recognized face images to obtain face key points; quality filtering is performed according to the key points to obtain filtered face images, and a training set is constructed from the filtered images. The face features in the training set are then extracted, and the face recognition model is updated according to these features to obtain an updated face recognition model.
Therefore, the method of updating a face recognition model includes both a process of constructing a training set for updating the model and a process of updating the model itself.
In this embodiment, the model updating method is described from the perspective of the model updating apparatus. For convenience of description, the model updating apparatus is treated below as being integrated in a server; that is, the server is the execution subject.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a model updating method according to an embodiment of the present application. The model updating method can comprise the following steps:
s301, a training set for updating the face recognition model is obtained, the training set comprises at least one training sample with a label, and the training sample is a face image which is historically recognized by the face recognition model.
A face image is an image containing the head region that carries identity information. The type of face image may be set according to the actual situation; for example, the face image may be a human face image or an animal face image, and an animal face image may in turn be, for example, a pet face image. This embodiment is not limited here.
The server may obtain the training set for updating the face recognition model upon receiving an acquisition instruction. Alternatively, the server may obtain the training set when it detects that the number of face images historically recognized by the face recognition model satisfies a preset condition. As a further alternative, the server may periodically obtain a training set for updating the face recognition model.
The server may directly use all historical face images recognized by the face recognition model as the face images for training. Alternatively, the server may screen out a preset number of them (the preset number being smaller than the total number of historical face images) as the face images historically recognized by the face recognition model.
When the quality of the historical face image is low, the identity information of the historical face image cannot be judged, so that the identification result of the historical face image is wrong. For example, as shown in fig. 4, a historical face image with a low quality cannot be recognized. Therefore, when the server screens out a preset number of historical face images as face images from historical face images historically recognized by the face recognition model, the server may screen out a preset number of high-quality historical face images as face images from historical face images historically recognized by the face recognition model.
Alternatively, the server may manually screen a preset number of historical face images with better quality from historical face images historically recognized by the face recognition model, and then use the historical face images with better quality as face images, or the server may automatically screen a preset number of historical face images with higher quality scores from historical face images historically recognized by the face recognition model as face images.
If the server automatically screens out a preset number of historical face images with higher quality scores as the face images, acquiring the training set for updating the face recognition model may include:
obtaining a historical face image of historical recognition of a face recognition model;
extracting key points of the historical face image, and determining a quality score corresponding to the historical face image according to the key points;
and taking the historical face image corresponding to the quality score meeting the preset score threshold value as the face image historically identified by the face identification model.
In this embodiment, the quality score corresponding to each historical face image is determined according to its key points, and the historical face images whose quality scores meet the preset score threshold are used as the face images historically recognized by the face recognition model. Low-quality historical face images are thus filtered out automatically, which improves both the efficiency of updating the face recognition model on the training set and the screening speed. A sketch of this filtering step follows.
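A minimal sketch of this filtering step, assuming a keypoint detector and a quality model are available as callables; detect_keypoints, quality_score, and the threshold value are illustrative names and numbers, not from the patent.

    def filter_by_quality(historical_images, detect_keypoints, quality_score,
                          score_threshold=0.6):
        # Keep only the historical face images whose keypoint-based quality
        # score meets the preset score threshold.
        kept = []
        for image in historical_images:
            keypoints = detect_keypoints(image)      # extract facial key points
            score = quality_score(image, keypoints)  # quality score, e.g. in [0, 1]
            if score >= score_threshold:
                kept.append(image)                   # usable for the training set
        return kept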
The method for determining the quality score corresponding to the historical face image according to the key points can be selected according to the actual situation. For example, the sharpness of the historical face image can be determined from the key points, and the quality score then determined from the sharpness. Alternatively, the deflection angle of the face in the historical face image can be determined from the key points, and the quality score determined from the deflection angle. Alternatively, the quality score of the historical face image may be determined from the key points by a quality model. This embodiment is not limited here.
When the quality scores of the historical face images are determined according to the key points through the quality models, if only one quality model is used, the quality scores of the historical face images are determined according to the key points, and the accuracy of the obtained quality scores is low.
To improve the accuracy of the quality scores, in some embodiments, the quality scores corresponding to the historical facial images are determined from the keypoints, including:
obtaining a plurality of quality models of historical face images;
determining a plurality of initial quality scores of the historical face image according to the key points through a quality model;
and determining the quality score corresponding to the historical face image according to the initial quality score.
The plurality of initial quality scores can be subjected to weighting operation, so that the quality scores corresponding to the historical face images are obtained. Alternatively, one initial quality score may be screened out from a plurality of initial quality scores to be used as the quality score corresponding to the historical face image, which is not limited herein.
In this embodiment, a plurality of initial quality scores of the historical face image are obtained through a plurality of quality models, and the quality score of the historical face image is then determined from these initial quality scores, which improves the accuracy of the quality score.
In other embodiments, the determining of the plurality of initial quality scores of the historical face image according to the quality model and the key points comprises:
determining an initial quality score of the historical face image for each quality evaluation dimension according to the key points through a dimension quality model;
and performing weighting operation on each initial quality score to obtain a quality score corresponding to the historical face image.
The quality evaluation dimension is a dimension in which a factor affecting the quality of the historical face image lies. It may be selected according to the actual situation: for example, the sharpness of the historical face image, the deflection angle of the face, the occlusion degree of the face, the brightness of the image, or the color of the image. Correspondingly, the dimension quality models (each dimension quality model may also be called a multi-expert model) may be a sharpness quality model, an angle quality model, an occlusion quality model, a brightness quality model, a color quality model, and so on, and the initial quality scores may be a sharpness quality score, a deflection-angle quality score, an occlusion quality score, a brightness quality score, a color quality score, and so on. This embodiment is not limited here.
The weighting operation may mean directly adding the initial quality scores, or it may mean multiplying each initial quality score by its corresponding weight to obtain adjusted initial quality scores and then adding these; this embodiment is not limited here. A sketch of such a weighted aggregation follows.
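A minimal sketch of the two variants of the weighting operation described above; the dimension names and weight values are illustrative assumptions.

    def weighted_quality_score(dim_scores, dim_weights=None):
        # dim_scores maps a quality evaluation dimension, e.g. "sharpness",
        # "occlusion" or "angle", to its initial quality score.
        if dim_weights is None:
            return sum(dim_scores.values())  # variant 1: direct addition
        return sum(score * dim_weights.get(dim, 1.0)  # variant 2: weighted addition
                   for dim, score in dim_scores.items())

    # Example usage (illustrative numbers):
    # score = weighted_quality_score(
    #     {"sharpness": 0.9, "occlusion": 0.4, "angle": 0.7},
    #     {"sharpness": 1.0, "occlusion": 2.0, "angle": 0.5})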
In this embodiment, each quality evaluation dimension corresponds to one dimension quality model, then an initial quality score of the historical face image for each quality evaluation dimension is obtained through the dimension quality model, and finally, weighting operation is performed on each initial quality score to obtain a quality score corresponding to the historical face image, so that quality scores can be obtained from different quality evaluation dimensions, and the accuracy of the quality scores is improved.
When the dimension quality model corresponding to a quality evaluation dimension is obtained without training, for example when the quality evaluation dimension is the deflection angle of the face in the historical face image, the dimension quality model may be an angle quality model, and the process of determining the deflection-angle quality score of the face from the key points through the angle quality model may be:
computing a rotation matrix from the key points and the coordinates of a reference frame by the Perspective-n-Point (PnP) method, and then determining the deflection angle of the face in the historical face image from the rotation matrix, so that the angle quality model does not need to be trained. A sketch of this computation is given below.
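A sketch of the PnP-based angle computation using OpenCV. The five-point 3D reference model, the pinhole camera approximation, and the Euler-angle convention are illustrative assumptions; only the overall recipe (solve PnP from key points and reference coordinates, then read the deflection angle off the rotation matrix) comes from the text.

    import cv2
    import numpy as np

    # Rough 3D reference coordinates for five facial key points: left eye,
    # right eye, nose tip, left mouth corner, right mouth corner (illustrative).
    REF_3D = np.array([[-30.0,  30.0, -30.0],
                       [ 30.0,  30.0, -30.0],
                       [  0.0,   0.0,   0.0],
                       [-25.0, -30.0, -20.0],
                       [ 25.0, -30.0, -20.0]], dtype=np.float64)

    def face_yaw_angle(keypoints_2d, image_w, image_h):
        # Pinhole camera approximation: focal length set to the image width,
        # principal point at the image center.
        focal = float(image_w)
        camera = np.array([[focal, 0.0, image_w / 2.0],
                           [0.0, focal, image_h / 2.0],
                           [0.0, 0.0, 1.0]], dtype=np.float64)
        ok, rvec, _ = cv2.solvePnP(REF_3D,
                                   np.asarray(keypoints_2d, dtype=np.float64),
                                   camera, np.zeros(4))
        rot, _ = cv2.Rodrigues(rvec)  # rotation matrix computed via PnP
        # Deflection (yaw) angle about the vertical axis, in degrees.
        return float(np.degrees(np.arctan2(-rot[2, 0],
                                           np.sqrt(rot[0, 0] ** 2 + rot[1, 0] ** 2))))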
When the dimension quality model corresponding to a quality evaluation dimension is obtained through training, for example when the quality evaluation dimension is the sharpness or the occlusion degree of the historical face image, the corresponding dimension quality model may be a sharpness quality model or an occlusion quality model, and either may be obtained through training, where the training may be supervised or unsupervised.
If the dimension quality model is obtained through a supervised training mode, before determining the quality score corresponding to the historical face image according to the key point through the dimension quality model, the method further comprises the following steps:
acquiring a first training set of a dimension quality model to be trained, wherein the first training set comprises a plurality of first training samples;
extracting sample key points of a first training sample, and determining a pair loss value and/or an anchor loss value of a dimension quality model to be trained according to the sample key points;
and training the dimension quality model to be trained according to the pair-wise loss value and/or the anchor loss value to obtain the dimension quality model.
The pairwise loss value of the dimension quality model to be trained may be determined from the sample key points by a metric learning method. The metric learning method may be selected according to the actual situation, for example, the classical contrastive loss, InfoNCE loss, or margin ranking loss (if the pairwise loss value is determined with a margin ranking loss, training the dimension quality model to be trained is cast as a ranking problem); an N-pair loss method may also be chosen. This embodiment is not limited here.
When the pairwise Loss value of the dimensional quality model to be trained is determined according to the sample key points by the N-pair Loss method, the process of determining the pairwise Loss value of the dimensional quality model to be trained according to the sample key points may be:
screening a positive training sample corresponding to the first training sample from a first sample cluster corresponding to the first training sample, screening a plurality of negative training samples corresponding to the first training sample from a second sample cluster, wherein the second sample cluster is a sample cluster except the first sample cluster in the sample cluster corresponding to the first training set;
determining a first distance between the first training sample and the positive training sample according to the sample key points of the first training sample and the sample key points of the positive training sample, and determining a second distance between the first training sample and the negative training sample according to the sample key points of the first training sample and the sample key points of the negative training sample;
and determining a pair loss value of the dimension quality model to be trained according to the first distance and the second distance.
Unlike attributes such as gender or whether a mask is worn, which can be judged absolutely, some quality evaluation dimensions (such as sharpness and occlusion degree) reflect a relative relationship: given a pair of images, one judges which image is more heavily occluded or which is sharper. In this example, point-wise labels are therefore converted into pair-wise labels, and a contrastive loss function is used for training, so that the model learns to give the better image the higher score. A minimal sketch of such a pairwise training loss is given below.
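A minimal sketch of a pair-wise training objective using the margin ranking form mentioned above; the margin value is illustrative, and score_better/score_worse stand for the quality scores the model assigns to the better and worse image of each pair.

    import torch
    import torch.nn.functional as F

    def pairwise_quality_loss(score_better, score_worse, margin=0.1):
        # Each pair holds a better and a worse image along the dimension,
        # e.g. less occluded vs. more occluded; the loss pushes the model to
        # give the better image the higher score by at least the margin.
        target = torch.ones_like(score_better)  # "first argument ranks higher"
        return F.margin_ranking_loss(score_better, score_worse, target,
                                     margin=margin)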
Optionally, according to the sample key points, the process of determining the Anchor Loss value (Anchor Loss) of the dimension quality model to be trained may be:
acquiring an anchor label corresponding to the first training sample, and acquiring a fraction interval corresponding to the anchor label;
determining a prediction score of a first training sample according to the sample key points;
and determining an anchor loss value of the dimension quality model to be trained according to the anchor label, the prediction score and the score interval corresponding to the anchor label.
The anchor label represents a true category or a false category, and may include two categories, for example, the true category may be 1, and the false category may be 0, that is, the anchor label may be 0 or 1.
That the anchor label represents a true or false category means that, when the dimension quality model to be trained is trained, its classification result on a first training sample takes a yes-or-no form. However, some quality evaluation dimensions have multiple levels: for the occlusion degree and sharpness of a face, for example, the occlusion degree can be of several kinds (strong occlusion, slight occlusion, light occlusion, no occlusion, and so on), and sharpness likewise has several kinds. If the anchor labels only include two categories during training, the scores obtained through the dimension quality model will therefore have low accuracy.
To further improve the accuracy of the dimension quality model, in other embodiments, determining an anchor loss value of the dimension quality model to be trained according to the key point includes:
obtaining an anchor label corresponding to a first training sample, wherein the anchor label represents the grade of a quality evaluation dimension, and at least three anchor labels exist in a first training set;
determining a prediction score of the first training sample according to the key points;
and determining an anchor loss value of the dimension quality model to be trained according to the anchor label, the prediction score, and the score interval corresponding to the anchor label.
The grade of the quality assessment dimension represents the severity of the quality assessment dimension. The lower the rank of the quality assessment dimension, the more severe the quality assessment dimension. For example, the quality evaluation dimension is an occlusion degree, and the lower the level of the occlusion degree, the more severe the occlusion degree, that is, the more occluded the occlusion. For another example, the quality evaluation dimension is sharpness, and the lower the level of sharpness, the less sharp the representation.
Optionally, the level of the quality evaluation dimension may be quantified according to the number of visible key points, for example, the higher the number of visible key points is, the higher the level of the quality evaluation dimension is.
The first training set can comprise first training samples of different levels of quality evaluation dimensionality, and then a dimensionality quality model to be trained is trained to carry out at least three classifications on each first training sample.
For example, when the dimension quality model to be trained is an occlusion quality model that performs a three-way classification of each first training sample, the first training set may include strongly occluded first training samples, unoccluded first training samples, and slightly occluded first training samples. In this case, three occlusion-degree levels exist in the first training set: strong occlusion, slight occlusion, and no occlusion.
When the first training set includes strongly occluded, unoccluded, and slightly occluded first training samples, it may be as shown in FIG. 5.
If three levels of the quality evaluation dimension exist in the first training set, at least three anchor labels exist in the first training set. When three anchor labels exist and the dimension quality model to be trained is an occlusion quality model, the three anchor labels may be as shown in FIG. 6: the horizontal axis represents the score intervals corresponding to the anchor labels, and the three anchor labels are 0, 1, and 2, where 0 represents strong occlusion, 1 represents slight occlusion, and 2 represents no occlusion.
The process of determining the anchor loss value of the dimension quality model to be trained according to the anchor label, the prediction score and the score interval corresponding to the anchor label may be as follows:
determining an anchor loss value to be adjusted according to the anchor label and the prediction score;
determining a correct classification score corresponding to the anchor label according to the score interval corresponding to the anchor label;
and adjusting the anchor loss value to be adjusted according to the difference between the prediction score and the correct classification score corresponding to the anchor label to obtain the anchor loss value of the dimensional quality model to be trained.
In this embodiment, the anchor labels represent the grade of the quality evaluation dimension, at least three anchor labels exist in the first training set, and then the dimension quality model to be trained is trained according to the first training set, so that the quality of the face image obtained through the dimension quality model is further improved, and the accuracy and the efficiency of labeling can be improved subsequently when the face image is labeled.
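The patent does not give the anchor loss formula at this point, so the sketch below writes the dynamic re-weighting by prediction difficulty in a focal-loss style over the graded anchor labels; the exponent gamma and the exact weighting are assumptions for illustration.

    import torch
    import torch.nn.functional as F

    def anchor_loss(logits, anchor_labels, gamma=2.0):
        # anchor_labels are graded, e.g. 0 = strong occlusion,
        # 1 = slight occlusion, 2 = no occlusion.
        log_probs = F.log_softmax(logits, dim=1)
        true_logp = log_probs.gather(1, anchor_labels.unsqueeze(1)).squeeze(1)
        p_true = true_logp.exp()          # true-class prediction probability
        weight = (1.0 - p_true) ** gamma  # harder samples get larger weight
        return (-weight * true_logp).mean()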
This embodiment may train the dimension quality model to be trained according to the pairwise loss value. Alternatively, it may train the model according to the anchor loss value (Anchor Loss). Alternatively, it may train the model according to the pairwise loss value and the anchor loss value simultaneously; in that case, when the plurality of dimension quality models are the angle quality model, the occlusion quality model, and the sharpness quality model respectively, the quality models of this embodiment may be as shown in FIG. 7, where the pair-wise labels consist of positive pairs (a positive example with a positive example) and negative pairs (a positive example with a negative example).
When the dimension quality model to be trained is trained according to the pairwise loss value, the pairwise loss value constrains both the distance between positive sample pairs and the distance between positive and negative samples. Compared with training on a binary classification loss, the dimension quality model obtained after training with the pairwise loss value performs better in quality evaluation.
When the dimension quality model to be trained is trained according to the anchor loss value, unlike the binary classification loss, which only constrains the class, the anchor loss value can determine how difficult a sample is to predict from the difference between the true-class prediction probability and the false-class prediction probability, and then dynamically re-weight the loss according to that difficulty. This constrains the score interval (the score interval may be the interval in which the true-class prediction probability lies), alleviates overfitting, and improves the performance of the dimension quality model in quality evaluation.
When the dimensional quality model to be trained is trained according to the paired loss values and the anchor loss values, the distance between the pair of positive samples and the distance between the positive sample and the negative sample can be restrained, and the fraction interval can be restrained, so that the effect of the dimensional quality model in quality evaluation can be further improved.
The effect of dimension quality models trained with the various loss values is described below with reference to FIG. 8. The abscissa in FIG. 8 represents the filtering ratio of the dimension quality model, that is, the ratio of the number of historical face images filtered out by the dimension quality model to the number of historical face images historically recognized by the face recognition model. The ordinate represents the correct filtering rate (a recall-style metric, TPR), that is, the ratio of the historical face images correctly filtered out by the dimension quality model to all historical face images filtered out by the model.
As can be seen from fig. 8, if the dimension quality model to be trained is trained according to the two-classification loss value, a severe over-fitting phenomenon occurs, which results in a low recall rate and a poor filtering result. If the dimension quality model to be trained is trained according to the anchor loss value, the over-fitting phenomenon is alleviated, and the recall rate and the filtering result improve. If the pairwise loss value is added on the basis of the anchor loss value, the prediction score of the dimension quality model becomes smoother, and the performance of the dimension quality model in quality evaluation improves further.
In this embodiment, initial quality scores are obtained through the dimension quality models of the respective quality evaluation dimensions, and a weighting operation is then performed on the initial quality scores to obtain the quality score corresponding to the historical face image. However, the quality score of a historical face image is not a simple linear superposition of the initial quality scores: for example, it is hard to say whether a clear side face or a blurred front face has the better quality. The quality evaluation dimensions therefore need to be weighed against each other accurately so that the quality score is obtained more precisely.
In order to obtain the quality scores more accurately, in other embodiments, the plurality of quality models of the historical face images include a first comprehensive quality model and a second comprehensive quality model, and the determining the quality scores corresponding to the historical face images according to the key points includes:
determining a first initial quality score corresponding to the historical face image according to the key points through a first comprehensive quality model;
determining a second initial quality score corresponding to the historical face image according to the key points through a second comprehensive quality model;
and screening out a quality score corresponding to the historical face image from the first initial quality score and the second initial quality score.
In this embodiment, the first and second comprehensive quality models may determine the first and second initial quality scores of the historical face image according to the respective quality evaluation dimensions of the historical face image at the same time, so that the first and second initial quality scores are more accurate.
The training mode of the dimension quality model of each quality evaluation dimension is supervised training, which makes the training of the dimension quality model to be trained dependent on the labeling precision and labeling speed of the first training samples. Thus, to obtain the quality scores more accurately, in other embodiments the first comprehensive quality model and the second comprehensive quality model may be a first comprehensive unsupervised quality model and a second comprehensive unsupervised quality model.
Screening out a quality score corresponding to the historical face image from the first initial quality score and the second initial quality score, wherein the screening out the quality score corresponding to the historical face image comprises the following steps:
acquiring a preset screening threshold;
if the first initial quality score and the second initial quality score are both smaller than or equal to a preset screening threshold value, taking the first initial quality score as a quality score corresponding to the historical face image;
and if the first initial quality score and the second initial quality score are both larger than a preset screening threshold value, taking the second initial quality score as a quality score corresponding to the historical face image.
The condition that the first initial quality score and the second initial quality score are both smaller than or equal to the preset screening threshold covers four cases: both scores are smaller than the threshold; both scores are equal to the threshold; the first score is smaller than and the second equal to the threshold; or the first score is equal to and the second smaller than the threshold.
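A minimal sketch of this screening rule follows; the threshold value and the fallback for the case in which the two scores straddle the threshold (which the embodiment leaves open) are illustrative assumptions.

```python
def select_quality_score(score_low_model: float, score_high_model: float,
                         threshold: float = 0.5) -> float:
    """Screen the final quality score from the two initial scores.

    score_low_model:  first initial quality score (model better on low quality)
    score_high_model: second initial quality score (model better on high quality)
    """
    if score_low_model <= threshold and score_high_model <= threshold:
        return score_low_model    # low-quality image: trust the first model
    if score_low_model > threshold and score_high_model > threshold:
        return score_high_model   # high-quality image: trust the second model
    # the embodiment leaves the disagreement case open; averaging is one fallback
    return (score_low_model + score_high_model) / 2.0
```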
In practical application, it is found that different comprehensive quality models perform differently on face images of different qualities. For example, as shown in fig. 9, the model perturbation comprehensive quality model (SER-FIQ) and the similarity distribution distance comprehensive quality model (Similarity Distribution Distance for Face Image Quality Assessment) perform better on low-quality face images, that is, the quality evaluation scores they produce for low-quality face images are more accurate. The adaptive feature module length comprehensive quality model (a universal representation for face recognition and quality assessment) performs better on high-quality face images, that is, the quality evaluation score it produces for a high-quality face image is more accurate.
The principle of the model perturbation comprehensive quality model can be as follows: as shown in fig. 10, the model contains a plurality of parallel random sub-networks. During training, neurons in the random sub-networks are randomly removed (dropout), so that a plurality of different features are obtained for the same second training sample, and the variance of these features is calculated. A larger variance indicates lower quality of the second training sample, and a smaller variance indicates higher quality.
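A minimal sketch of this principle follows, assuming a PyTorch model containing dropout layers; mapping the variance to a score via 1/(1+variance) is an illustrative choice.

```python
import torch

@torch.no_grad()
def perturbation_quality(model: torch.nn.Module, image: torch.Tensor,
                         n_passes: int = 10) -> float:
    """Score a face image by embedding stability under dropout: run several
    stochastic forward passes with dropout kept active, then turn the
    variance of the resulting features into a quality score. A larger
    variance means a less stable embedding and hence a lower-quality face."""
    model.train()   # keep dropout active so each pass drops different neurons
    feats = torch.stack([model(image.unsqueeze(0)).squeeze(0)
                         for _ in range(n_passes)])
    variance = feats.var(dim=0).mean().item()
    return 1.0 / (1.0 + variance)   # low variance maps to high quality
```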
The similarity distribution distance comprehensive quality model can also be called an intra-class/inter-class similarity distribution model, and its principle can be as follows: as shown in fig. 11, the third sample cluster in which the second training sample is located is obtained through the face recognition model. The intra-class distribution of the similarities between the second training sample and the face images in the third sample cluster is determined, as well as the inter-class distribution of the similarities between the second training sample and the face images in a fourth sample cluster (a sample cluster in which the second training sample is not located). The Wasserstein distance between the intra-class distribution and the inter-class distribution is then computed and used as a pseudo quality score to train the similarity distribution distance comprehensive quality model.
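A minimal sketch of computing such a pseudo quality score follows, assuming L2-normalized features so that dot products are cosine similarities.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sdd_pseudo_quality(sample_feat: np.ndarray,
                       intra_feats: np.ndarray,
                       inter_feats: np.ndarray) -> float:
    """Pseudo quality score: the Wasserstein distance between the sample's
    intra-class similarity distribution (to face images in its own cluster)
    and its inter-class similarity distribution (to face images in other
    clusters). Well separated distributions yield a high pseudo score."""
    intra_sims = intra_feats @ sample_feat   # similarities inside the sample's cluster
    inter_sims = inter_feats @ sample_feat   # similarities to other clusters
    return float(wasserstein_distance(intra_sims, inter_sims))
```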
The principle of the adaptive feature module length comprehensive quality model can be as follows: as shown in fig. 12 (where w' and w denote two class centers, B' denotes the boundary of class center w', B denotes the boundary of class center w, m denotes the angle interval between the two classes, and 1, 2 and 3 denote three face images of different qualities), the module length (L2 norm) of the face image's feature is determined during training, and the adaptive feature module length comprehensive quality model is trained based on the feature module length: the larger the module length of the feature, the higher the quality of the face image, and the smaller the module length, the lower the quality.
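A minimal sketch of the scoring side of this principle follows; the training-time coupling of the module length to the adaptive angular margin is omitted.

```python
import numpy as np

def module_length_quality(feature: np.ndarray) -> float:
    """Quality score from the module length of the (un-normalized) face
    feature: the larger the L2 norm, the higher the quality of the face."""
    return float(np.linalg.norm(feature))
```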
Therefore, in this embodiment, the comprehensive quality model that obtains more accurate quality scores for low-quality face images is used as the first comprehensive quality model, and the comprehensive quality model that obtains more accurate quality scores for high-quality face images is used as the second comprehensive quality model; initial quality scores of historical face images of different qualities are then obtained through the two models, which improves the accuracy of the quality scores corresponding to the historical face images. However, at the moment the first comprehensive quality model determines the first initial quality score and the second comprehensive quality model determines the second initial quality score, the server does not yet know whether the historical face image is of low or high quality.
Therefore, in this embodiment, a preset screening threshold is obtained. If the first initial quality score and the second initial quality score are both smaller than or equal to the preset screening threshold, the historical face image is a low-quality face image, and the first initial quality score is taken as its quality score; if both scores are greater than the preset screening threshold, the historical face image is a high-quality face image, and the second initial quality score is taken as its quality score. In this way, different comprehensive quality models provide the initial quality scores for historical face images of different qualities, which improves the accuracy of the quality scores corresponding to the historical face images.
It should be understood that, when the first comprehensive quality model and the second comprehensive quality model are trained in this embodiment, face recognition may first be performed on the historical face image through the face recognition model to obtain a recognition result; a first target loss value is then determined according to the recognition result and the first sample intrinsic quality score of the first comprehensive quality model to be trained; and the first comprehensive quality model to be trained is trained according to the first target loss value to obtain the first comprehensive quality model.
The training process of the second comprehensive quality model may refer to that of the first comprehensive quality model, and this embodiment is not limited herein.
The effect of the integrated mass model will be described with reference to fig. 13. The ordinate of 1301 in fig. 13 represents the recall ratio, and the ordinate of 1302 in fig. 13 represents the correct filtering ratio, and it can be seen from fig. 13 that the first and second comprehensive quality models can better solve the coupling problem between the quality evaluation dimensions, so that the recall ratio and the correct filtering ratio of the first and second comprehensive quality models are better than those of the quality models of the quality evaluation dimensions.
In other embodiments, obtaining a training set of updated face recognition models includes:
acquiring a face image historically recognized by a face recognition model;
screening a picture cluster group to be labeled from picture clusters corresponding to the face image, and labeling the face image according to the picture cluster group to be labeled to obtain a labeled face image;
from the tagged face images, a training set is determined that updates the face recognition model.
After acquiring the face images historically recognized by the face recognition model, the server clusters them to obtain the picture clusters corresponding to the face images. It then screens the picture cluster groups to be labeled out of these picture clusters, labels the face images according to the picture cluster groups to be labeled to obtain labeled face images, and finally determines the training set for updating the face recognition model according to the labeled face images.
Labeling the face images according to the picture cluster groups to be labeled can be understood as judging whether the face images between the picture cluster groups to be labeled belong to the same object. Optionally, the face images of the picture cluster groups to be labeled may be displayed so that a user judges them, or the server may automatically judge whether the face images between the picture cluster groups to be labeled belong to the same object. This embodiment is not limited herein.
In other embodiments, the process of screening the group of picture clusters to be labeled from the picture clusters corresponding to the face image may be:
and screening a first to-be-labeled picture cluster from the candidate picture clusters, and then screening a first number of second to-be-labeled picture clusters from the candidate picture clusters to respectively form a to-be-labeled picture cluster group with the first to-be-labeled picture cluster.
However, the number of picture cluster groups to be labeled obtained in this way is large, and labeling errors accumulate. In order to reduce the number of picture cluster groups to be labeled and reduce labeling errors, in other embodiments, screening the picture cluster groups to be labeled out of the picture clusters corresponding to the face images includes:
determining the association degree between the picture clusters corresponding to the face images according to the face features in the picture clusters corresponding to the face images;
and screening out a picture cluster group to be labeled from the picture cluster corresponding to the face image according to the association degree.
The server may calculate the cluster similarity between the picture clusters corresponding to two face images according to their central face features (or randomly select one face feature from each picture cluster to calculate the cluster similarity between the two picture clusters). It may calculate the tag (ID) heat of a picture cluster according to the number of face images in that cluster (i.e. the number of times the tag corresponding to the cluster appears) and the total number of face images. The association degree between the picture clusters corresponding to the face images is then determined according to their tag heats and cluster similarity.
Optionally, the tag heat, the cluster similarity, and the association of the picture cluster corresponding to the face image satisfy the following relation:
[The relation appears in the source only as an image and is not reproduced here.]
In the relation, cost(u_p, u_q) represents the association degree between picture cluster u_p and picture cluster u_q corresponding to the face images, info_u_p represents the tag heat of picture cluster u_p, and info_u_q represents the tag heat of picture cluster u_q.
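Since the exact relation survives only as an image, the sketch below merely illustrates the shape such a function could take; the specific way the cluster similarity and the two tag heats are combined is an assumption, not the patent's formula.

```python
def association_degree(cluster_sim: float, info_u_p: float, info_u_q: float) -> float:
    """Illustrative cost(u_p, u_q): combines the cluster similarity with the
    two tag heats. The exact combination in the patent is not recoverable, so
    this particular form is purely an assumption for illustration."""
    return cluster_sim / (1.0 + info_u_p + info_u_q)
```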
In this embodiment, after the picture clusters corresponding to the face images are obtained, the association degree between the picture clusters is determined according to the face features of the face images in them, and the picture cluster groups to be labeled are then screened out of the picture clusters according to the association degree. The greater the association degree, the greater the probability that the two picture clusters are similar, and the two picture clusters are then used as a picture cluster group to be labeled; this reduces the number of picture cluster groups to be labeled.
Optionally, when the picture cluster groups to be labeled are screened out of the picture clusters corresponding to the face images according to the association degree, the sum of the association degrees of the screened picture cluster groups may be the minimum, or the second smallest; this embodiment is not limited herein.
In other embodiments, screening a group of picture clusters to be labeled from picture clusters corresponding to the face image according to the association degree includes:
constructing a data structure according to the picture clusters corresponding to the face images and the association degrees, wherein the data structure comprises a plurality of nodes and a plurality of edges, each node represents the picture cluster corresponding to the face images, and the weights of the edges represent the association degrees among the picture clusters corresponding to the face images;
performing annular adjustment on the data structure according to the weight of the edge in the data structure to obtain the data structure which does not contain annular connection;
and screening out node groups from a data structure which does not contain annular connection, wherein the node groups correspond to the picture cluster groups to be labeled.
For example, the constructed data structure may be as shown in fig. 14. The data structure contains a ring: in fig. 14, edge 1, edge 2 and edge 3 form a ring, so edge 1 may be removed to obtain a data structure that does not contain ring connections. A data structure without ring connections can also be understood as a tree-shaped data structure.
The method for adjusting the data structure in a ring manner according to the weight of the edge in the data structure to obtain the data structure not including the ring connection may be selected according to an actual situation, for example, a Prim (Prim) algorithm, a Kruskal (Kruskal) algorithm, or a shortest path algorithm may be used, and this embodiment is not limited herein.
It should be understood that when the sum of the relevance degrees of the finally screened group of the picture clusters to be labeled is minimum, the data structure not containing the ring connection can be understood as a minimum spanning tree.
In this embodiment, a data structure is constructed according to the picture clusters corresponding to the face images and the association degrees: the data structure contains a plurality of nodes and a plurality of edges, each node represents a picture cluster corresponding to face images, and the weight of an edge represents the association degree between two such picture clusters. The data structure is then ring-adjusted according to the weights of its edges to obtain a data structure that contains no ring connection, and finally node groups are screened out of that structure; the node groups correspond to the picture cluster groups to be labeled. Obtaining the picture cluster groups to be labeled from the association degrees through a data structure in this way both reduces the number of picture cluster groups to be labeled and obtains them more accurately.
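A minimal sketch of the ring-removal step using Kruskal's algorithm (one of the options named above) follows; the retained edges correspond to the node groups, i.e. the picture cluster groups to be labeled.

```python
def kruskal_mst(num_clusters: int, edges):
    """Build the cycle-free (tree-shaped) structure described above with
    Kruskal's algorithm. `edges` is a list of (cost, p, q) tuples, where cost
    is the association degree between picture clusters p and q; the returned
    (p, q) pairs are the cluster groups sent out for labeling."""
    parent = list(range(num_clusters))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for cost, p, q in sorted(edges):  # cheapest edges first -> minimum total cost
        rp, rq = find(p), find(q)
        if rp != rq:                  # adding this edge creates no ring
            parent[rp] = rq
            pairs.append((p, q))
    return pairs

# usage: three clusters, two pairs retained for labeling
# kruskal_mst(3, [(0.2, 0, 1), (0.9, 0, 2), (0.4, 1, 2)])  # -> [(0, 1), (1, 2)]
```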
In other embodiments, in order to improve the recall rate of the face recognition model, candidate picture clusters may be screened out of the picture clusters corresponding to the face images by an active learning method, and the picture cluster groups to be labeled are then determined according to the candidate picture clusters. Active learning means that sample data which is hard to classify is found by a machine learning method and manually re-confirmed, and the manually confirmed data is then used to retrain the supervised or semi-supervised learning model; this improves the model and integrates human experience into the machine learning model.
The method of active learning may be as shown in fig. 15. The face images historically recognized by the face recognition model are clustered periodically (clustering is performed again after a period of time has passed). The clustered picture clusters are then purified (the purification operation can include quality filtering) to obtain purified picture clusters; the purified picture clusters are preprocessed to obtain preprocessed picture clusters; the preprocessed picture clusters are manually labeled to obtain a labeling result; and finally the labeling result is post-processed (preprocessed picture clusters belonging to the same object are merged into one picture cluster) to obtain the training set.
In the active learning method, the difficulty of distinguishing data is generally determined by the uncertainty sampling (US) method, the diversity sampling (DS) method, the expected model change (EMC) method, the query-by-committee (QBC) method, or the density-weighted method.
When the data distinguishing difficulty is determined according to the density weight method, determining the association degree between the picture clusters corresponding to the face image according to the face features in the picture clusters corresponding to the face image, and screening out a picture cluster group to be labeled according to the association degree from the picture clusters corresponding to the face image, wherein the method comprises the following steps:
determining the density weight of a picture cluster corresponding to the face image according to the face feature of the face image, wherein the density weight is used for representing the distinguishing difficulty of the face image;
screening out a picture cluster corresponding to the density weight meeting the preset weight from picture clusters corresponding to the face image to obtain a candidate picture cluster;
determining the association degree between the candidate image clusters according to the face features in the candidate image clusters;
and screening the picture cluster group to be marked from the candidate picture clusters according to the association degree.
At this time, constructing a data structure according to the picture clusters corresponding to the face image and the association degree, where the data structure includes a plurality of nodes and a plurality of edges, each node represents a picture cluster corresponding to the face image, and the weights of the edges represent the association degree between the picture clusters corresponding to the face image, may include:
and constructing a data structure according to the candidate picture clusters and the association degree, wherein the data structure comprises a plurality of nodes and a plurality of edges, each node represents a candidate picture cluster, and the weight of each edge represents the association degree between the candidate picture clusters.
The density weight may be understood as follows: if a piece of data is abnormal or deviates greatly from most other data, it is not suitable for sampling, whereas dense, hard-to-distinguish data is more valuable. That is, the density weight represents how hard a face image is to distinguish: the larger the density weight, the harder the face image is to distinguish, and the smaller the density weight, the easier it is to distinguish.
For example, as shown in fig. 16, the picture cluster c3 is not a face picture cluster of the same object as the picture cluster c1, the picture cluster c3 is not a face picture cluster of the same object as the picture cluster c2, and the picture cluster c1 and the picture cluster c2 are face picture clusters of the same object, so that the density weight of the picture cluster c3 is smaller than the density weight of the picture cluster c1 and the density weight of the picture cluster c 2.
In this embodiment, the density weight of the image cluster corresponding to the face image is obtained according to the face feature of the face image to represent the distinguishing difficulty, and then the image cluster corresponding to the density weight meeting the preset weight is screened out from the image cluster corresponding to the face image according to the density weight to obtain a candidate image cluster, so that when the face recognition model is trained according to the candidate image cluster, the obtained target face recognition model has a higher recall rate.
The process of determining the density weight of the picture cluster corresponding to the face image according to the face feature of the face image may be:
calculating the similarity between the central feature of the picture cluster corresponding to each face image and the face feature of the face image;
determining a first prediction probability and a second prediction probability corresponding to the face image according to the face feature of the face image;
and determining the density weight of the face image according to the first prediction probability, the second prediction probability and the similarity.
The first prediction probability and the second prediction probability may be two of prediction probabilities of the face image by the face recognition model. For example, the first prediction probability may be a maximum probability that the face recognition model predicts the face image, and the second prediction probability may be a second maximum probability that the face recognition model predicts the face image. For another example, the first prediction probability may be a second highest probability that the face recognition model predicts the face image, and the second prediction probability may be a third highest probability that the face recognition model predicts the face image. The present embodiment is not limited herein.
Alternatively, the first prediction probability, the second prediction probability and the similarity may be substituted into the following relation, so as to obtain the density weight of the face image:
$$x^{*}_{\mathrm{ID}} \;=\; \operatorname*{argmax}_{x}\; \phi(x)\cdot\left(\frac{1}{U}\sum_{u=1}^{U}\operatorname{sim}\!\left(x,\,x^{(u)}\right)\right)^{\beta}$$

$$x^{*}_{\mathrm{M}} \;=\; \operatorname*{argmin}_{x}\;\left(P\!\left(\hat{y}_{1}\mid x\right)-P\!\left(\hat{y}_{2}\mid x\right)\right)$$

where x*_ID represents the density-weight selection for the facial feature x; x^(u) represents the central feature of each picture cluster, u denotes the u-th picture cluster, and U denotes the number of picture clusters; argmax denotes taking the value of the independent variable that maximizes the expression and argmin the one that minimizes it; sim represents the similarity operation; β represents the index (exponent) parameter; P(ŷ1 | x) represents the first prediction probability and ŷ1 the class it corresponds to; P(ŷ2 | x) represents the second prediction probability and ŷ2 the class it corresponds to; x*_M represents the facial feature with the smallest difference between the first prediction probability and the second prediction probability, which can be obtained by the edge (margin) sampling principle; and φ(x) is the informativeness term derived from that margin.
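A minimal sketch of the density weight computed for a single face image follows, using the margin between the first and second prediction probabilities as the informativeness term; variable names and the exponent value are illustrative.

```python
import numpy as np

def density_weight(feat: np.ndarray, probs: np.ndarray,
                   cluster_centers: np.ndarray, beta: float = 1.0) -> float:
    """Density-weighted margin sampling as in the relation above. `feat` is
    the L2-normalized face feature, `probs` the recognition model's class
    probabilities, and `cluster_centers` the normalized central features
    x^(u) of the U picture clusters. A small gap between the two largest
    probabilities (hard sample) and a high average similarity to the cluster
    centers (dense region) both increase the weight."""
    top2 = np.sort(probs)[-2:]
    margin = top2[1] - top2[0]        # first minus second prediction probability
    informativeness = 1.0 - margin    # hard-to-distinguish samples approach 1
    avg_sim = (cluster_centers @ feat).mean()  # density term; assumes sims >= 0
    return float(informativeness * avg_sim ** beta)
```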
The active learning method works better on closed-set tasks, where closed set means that the classes of the samples in the training set are the same as the classes of the samples in the test set; for example, the samples in the training set include class A, class B and class C, and the samples in the test set also include class A, class B and class C.
That is, if the training set and the test set of the face recognition model are in the form of a closed set, the recall rate of the face recognition model is high after the face recognition model is trained according to the indistinguishable face images screened by the active learning method.
However, in practical application, samples can be hard to obtain; in the medical field or the aircraft field, for example, obtaining samples is very difficult. The test set may therefore contain classes of samples that are not in the training set, and this kind of problem is called the open set recognition problem.
Face recognition belongs to the open set recognition problem, and face images contain larger noise. If the picture cluster groups to be labeled are screened directly out of the picture clusters corresponding to the face images according to the active learning method, and the face images are labeled according to those groups to obtain labeled face images, the recall rate of the face recognition model trained on the labeled face images suffers.
Therefore, in order to reduce the influence on the recall rate of the face recognition model, in other embodiments, determining a density weight of a picture cluster corresponding to the face image according to the face feature of the face image, and screening a picture cluster corresponding to the density weight meeting a preset weight from the picture clusters corresponding to the face image to obtain a candidate picture cluster includes:
acquiring the label heat of a picture cluster of a face image;
screening out a picture cluster meeting the label heat of a preset heat from the picture cluster of the face image to obtain a primary screening picture cluster;
determining the density weight of the primary screening picture cluster according to the facial features of the facial images in the primary screening picture cluster;
and screening out the picture clusters corresponding to the density weight meeting the preset weight from the primarily screened picture clusters to obtain candidate picture clusters.
The tag (ID) heat may be the ratio between the number of tag occurrences and the number of total face images, i.e.: ID heat = number of ID occurrences/total number of face images.
In this embodiment, candidate picture clusters are not screened directly from the picture clusters corresponding to the face images; instead the active learning method is improved (the improved part may be improved part 1 shown in fig. 17). The picture clusters whose tag heat meets the preset heat are first screened from the picture clusters of the face images according to the tag heat, yielding the primary screening picture clusters; the density weights of the primary screening picture clusters are then determined according to the face features of the face images in them, and the candidate picture clusters are obtained according to the density weights.
Because a high tag heat means that samples corresponding to the tag are easy to obtain, the difference between the classes of the training set and the classes of the test set is small. The face images in the primary screening picture clusters, and hence in the candidate picture clusters, are therefore easy to obtain, which reduces the impact on the recall rate of the face recognition model.
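A minimal sketch of this two-stage screening follows; the data layout and both threshold values are illustrative assumptions.

```python
def screen_candidate_clusters(clusters, total_faces: int,
                              min_heat: float = 0.001, min_weight: float = 0.5):
    """Two-stage screening sketch. Each cluster is a dict with a 'faces' list
    and a precomputed 'density_weight'. Stage 1 keeps clusters whose tag heat
    (cluster size / total number of face images) is high enough; stage 2
    keeps those whose density weight meets the preset weight."""
    prescreened = [c for c in clusters
                   if len(c["faces"]) / total_faces >= min_heat]  # tag-heat filter
    return [c for c in prescreened
            if c["density_weight"] >= min_weight]                 # density-weight filter
```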
After the server obtains the group of the picture clusters to be labeled, the server can label the face image according to the group of the picture clusters to be labeled to obtain the labeled face image. Optionally, labeling the face image according to the group of to-be-labeled picture clusters to obtain a labeled face image, including:
extracting a plurality of modal information corresponding to the face image in the picture cluster group to be labeled;
displaying the picture cluster group to be marked and the modal information;
and receiving the labeling information of labeling the face image in the picture cluster group to be labeled by the user according to the modal information to obtain the labeled face image.
The modality information may refer to attribute information of the face image, which may be information contained in the face image itself, attribute information of the face image, or attribute information of the video in which the face image is located. For example, the attribute information may be at least one of the picture information of the face image, subtitle information, the video IP (the video in which the face image is located), person information, watermark information and text information. Displaying the multiple pieces of modality information lets the user label according to them, realizing multi-modal labeling and thereby improving labeling accuracy.
For example, as shown in fig. 18, a set a and a set B are to-be-labeled picture cluster groups, the set a is derived from a video a, the set B is derived from a video B, and the set a, the set B, the modality information of the face images in the set a, and the modality information of the face images in the set B are displayed.
It should be understood that the multi-modal labeling process in this embodiment can be understood as an improvement of the labeling method in active learning. As shown in fig. 15, in the related art the labeling in active learning is done directly by a human without displaying multiple pieces of modality information; in this embodiment, as shown in fig. 17, multiple pieces of modality information are displayed so that manual labeling can be done according to them.
S302, extracting features of the training sample to obtain facial features corresponding to the training sample, and performing fusion processing on the facial features and the central features of the target cluster corresponding to the label to obtain fusion features of the training sample.
The training sample can be subjected to feature extraction through a feature extraction layer in the face recognition model, and face features corresponding to the training sample are obtained. Or, feature extraction may be performed on the training sample through a feature extraction layer in the face recognition model to obtain initial face features corresponding to the training sample, and then feature mapping is performed on the initial face features through a feature mapping layer in the face recognition model to obtain face features corresponding to the training sample.
In some embodiments, the feature mapping layer in the face recognition model includes a first full connection layer and a first activation layer, and the feature mapping layer in the face recognition model performs feature mapping on the initial face features to obtain face features corresponding to the training sample, including:
performing feature dimension reduction mapping on the initial face features through a first full-connection layer in the face recognition model to obtain dimension-reduced face features corresponding to the training samples;
and carrying out nonlinear feature mapping on the face features subjected to dimensionality reduction through a first activation layer in the face recognition model to obtain face features corresponding to the training samples.
In this embodiment, the initial facial features are subjected to dimension reduction mapping through the first full-connected layer, so that the computation amount of extracting the nonlinear features of the initial facial features is reduced, and then the nonlinear feature mapping is performed on the dimension-reduced facial features to obtain the facial features corresponding to the training samples, that is, the facial features include the nonlinear features of the initial facial features.
In other embodiments, in order to solve the over-fitting problem, the feature mapping layer in the face recognition model may be a residual structure, in this case, the face recognition model may be as shown in fig. 19, the feature mapping layer in the face recognition model further includes a second full connection layer, and the nonlinear feature mapping is performed on the face features after the dimensionality reduction through the first activation layer in the face recognition model, so as to obtain the face features corresponding to the training sample, including:
carrying out nonlinear feature mapping on the face features subjected to dimensionality reduction through a first activation layer in the face recognition model to obtain nonlinear face features corresponding to the training samples;
performing feature dimension-raising mapping on the nonlinear face features through a second fully connected layer in the face recognition model to obtain dimension-raised face features corresponding to the training samples;
and determining the face features corresponding to the training samples according to the dimension-raised face features and the initial face features.
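A minimal PyTorch sketch of such a residual feature mapping layer follows; the feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ResidualFeatureMapping(nn.Module):
    """Sketch of the residual feature mapping layer described above: a first
    fully connected layer reduces the dimension, an activation layer performs
    the nonlinear mapping, a second fully connected layer raises the
    dimension back, and the result is added to the initial face feature."""
    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)   # first FC layer: dimension-reduction mapping
        self.act = nn.ReLU()                # first activation layer: nonlinear mapping
        self.fc2 = nn.Linear(hidden, dim)   # second FC layer: dimension-raising mapping

    def forward(self, initial_feat: torch.Tensor) -> torch.Tensor:
        mapped = self.fc2(self.act(self.fc1(initial_feat)))
        return initial_feat + mapped        # residual connection eases over-fitting
```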
In other embodiments, the face recognition model further comprises a center mapping layer, and the method further comprises:
acquiring initial central characteristics of a target cluster corresponding to a label;
and mapping the initial central features through a central mapping layer of the face recognition model to obtain the central features of the target cluster corresponding to the label.
The structure of the center mapping layer may be the same as the structure of the feature mapping layer, and specifically, reference may be made to the description of the feature mapping layer and fig. 19, which is not limited herein.
In some embodiments, the parameters of the center mapping layer may be the same as the parameters of the feature mapping layer, that is, when the face recognition model is obtained through training, the parameters of the feature mapping layer may be obtained, and then the parameters of the feature mapping layer are shared with the center mapping layer, where the face recognition model may be an adaptive face recognition model.
The adaptive face recognition model may refer to: and in the process of processing and analyzing the face image, automatically adjusting a processing method, a processing sequence, a processing parameter, a boundary condition or a constraint condition according to the feature of the face image, so that the statistical distribution feature and the structural feature of the processed face image are adapted to obtain the optimal processing effect. The adaptive process is a process of continuously approaching an object and can be represented by a mathematical model.
S303, adjusting the central feature of the target cluster according to the fusion feature and the face feature to obtain the adjusted central feature of the target cluster.
The central feature of the target cluster can be adjusted according to the face feature and the prediction label, so as to obtain the adjusted central feature of the target cluster. Or, according to the fusion features, determining a prediction label of the training sample, then adjusting the central feature of the target cluster by using the face feature, the prediction label and the initial central feature of the target cluster to obtain an adjusted central feature of the target cluster.
When the predicted label of the training sample is determined according to the fusion feature, then the central feature of the target cluster is adjusted according to the facial feature and the predicted label to obtain the adjusted central feature of the target cluster, and the method comprises the following steps:
acquiring first direction information corresponding to the facial features and second direction information of the initial facial features;
and adjusting the central feature of the target cluster according to the first direction information, the second direction information and the prediction label to obtain the adjusted central feature of the target cluster.
In this embodiment, the center feature of the target cluster is adjusted according to the first direction information, the second direction information and the prediction tag to obtain an adjusted center feature of the target cluster, so that the center feature of the target cluster is adjusted from the direction information, that is, from the boundary information, to implement boundary constraint on the center feature of the target cluster, thereby obtaining the adjusted center feature of the target cluster.
Or, according to the facial feature and the prediction label, adjusting the central feature of the target cluster to obtain the adjusted central feature of the target cluster, including:
acquiring first direction information corresponding to the face features and acquiring third direction information of central features of the target cluster;
and adjusting the central feature of the target cluster according to the first direction information, the third direction information and the prediction label to obtain the adjusted central feature of the target cluster.
In this embodiment, the center feature of the target cluster is adjusted according to the first direction information, the third direction information and the prediction tag to obtain an adjusted center feature of the target cluster, so that the center feature of the target cluster is adjusted from the direction information, that is, from the boundary information, to implement boundary constraint on the center feature of the target cluster, thereby obtaining the adjusted center feature of the target cluster.
In other embodiments, the adjusting the center feature of the target cluster according to the first direction information, the second direction information, and the prediction label, and the process of obtaining the adjusted center feature of the target cluster may include:
acquiring a direction weight corresponding to the first direction information and a direction weight corresponding to the second direction information;
adjusting the first direction information according to the direction weight corresponding to the first direction information to obtain adjusted first direction information, and adjusting the second direction information according to the direction weight corresponding to the second direction information to obtain adjusted second direction information;
determining first included angle information between the initial facial features and the facial features according to the adjusted first direction information and the adjusted second direction information;
and adjusting the central feature of the target cluster according to the first included angle information and the prediction label to obtain the adjusted central feature of the target cluster.
Or, the process of adjusting the center feature of the target cluster according to the first direction information, the second direction information, and the prediction tag to obtain the adjusted center feature of the target cluster may also be:
acquiring third direction information of the central feature of the target cluster;
and adjusting the central feature of the target cluster according to the first direction information, the second direction information, the third direction information and the prediction label to obtain the adjusted central feature of the target cluster.
The method comprises the following steps of adjusting the central feature of the target cluster according to the first direction information, the second direction information, the third direction information and the prediction label, and obtaining the adjusted central feature of the target cluster:
determining initial facial features and first included angle information between the facial features according to the first direction information and the second direction information;
determining second included angle information between the initial face feature and the central feature of the target cluster according to the second direction information and the third direction information;
and adjusting the central feature of the target cluster according to the first included angle information, the second included angle information and the prediction label to obtain the adjusted central feature of the target cluster.
In other embodiments, the process of adjusting the center feature of the target cluster according to the first included angle information, the second included angle information, and the prediction tag to obtain the adjusted center feature of the target cluster may be:
determining an included angle difference value between the first included angle information and the second included angle information;
and adjusting the central feature of the target cluster according to the predicted label, the included angle difference value and the label to obtain the adjusted central feature of the target cluster.
It should be understood that the center feature of the target cluster may be the initial center feature of the target cluster, or may be a feature mapped to the initial center feature of the target cluster.
When the central feature of the target cluster is the initial central feature of the target cluster, the predicted label, the included angle difference value, the label and the adjusted central feature of the target cluster meet the following relational expression:
[The relational expression appears in the source only as an image and is not reproduced here.]
In the expression, K_ada' indicates the adjusted central feature of the target cluster; N represents the classes of the training samples and i indexes the training samples; y_i represents the label of training sample i and ŷ_i its prediction label; b_func represents the function that calculates the included angle; x̄_i represents the initial facial features of training sample i; x_i represents the facial features of training sample i; and K_old represents the central feature of the target cluster in which training sample i is located.
In this embodiment, the boundary constraint is performed on the central feature of the target cluster through the difference between the first included angle information and the second included angle information, so that when the face recognition model is trained according to the compatible loss value obtained by the adjusted center of the target cluster, the face recognition model can reach a better convergence state.
S304, carrying out fusion processing on the face features and the adjusted central features of the target cluster to obtain adjusted fusion features of the training sample.
The process of obtaining the adjusted fusion features of the training sample by performing fusion processing on the face features and the adjusted central features of the target cluster may refer to the process of obtaining the fusion features of the training sample by performing fusion processing on the face features and the central features of the target cluster corresponding to the label, and the embodiment is not limited herein.
It should be understood that, after the central feature of the target cluster is changed into the adjusted central feature of the target cluster, the face recognition model may be referred to as the new face recognition model. In this case, the face feature used in the fusion processing of the face feature and the adjusted central feature of the target cluster may refer to the face feature obtained by feature extraction of the training sample through the new face recognition model; that is, at this point, the face feature x_i of training sample i may mean the feature extracted by the new face recognition model. The relationship between the face recognition model and the new face recognition model may be as shown in fig. 18.
S305, according to the adjusted fusion characteristics, determining a compatibility loss value between the fusion characteristics and the adjusted fusion characteristics, and determining a classification loss value of the face recognition model.
The compatibility loss value between the fusion feature and the adjusted fusion feature may also be understood as the compatibility loss value between the facial feature and the adjusted fusion feature. The adjusted fusion feature and the face feature may be subjected to a first fusion processing to obtain the target fusion feature of the training sample, and the compatibility loss value between the fusion feature and the adjusted fusion feature is then obtained according to the target fusion feature.
Or, the adjusted fusion feature and the initial face feature may be subjected to a first fusion process to obtain a target fusion feature of the training sample, and then a compatibility loss value between the fusion feature and the adjusted fusion feature is obtained according to the target fusion feature.
The first fusion process is a process of determining a difference between the adjusted fusion feature and the initial face feature, and may be a multiplicative fusion process, or may be a subtractive fusion process.
When the first fusion processing is subtraction fusion processing, a logarithm operation may first be performed on the adjusted fusion features to obtain simplified adjusted fusion features; subtraction fusion processing is then performed on the simplified adjusted fusion features and the initial face features to obtain the target fusion features of the training sample; finally, the target fusion features are divided by the number of classes of the training set to obtain the compatibility loss value. At this time, the adjusted fusion features, the initial face features and the target fusion features satisfy a relational expression that appears in the source only as an image and is not reproduced here; in that expression, the left-hand side indicates the compatibility loss value between the fusion feature and the adjusted fusion feature, and y_i represents the target cluster in which training sample i is located.
In the related art, the face recognition model is updated according to the face images it recognizes while it is in use. However, the facial features extracted by the updated face recognition model are incompatible with the features in the face library (incompatible meaning that distances between the features extracted by the updated model and the features in the face library cannot be directly calculated), so the features of the facial images in the face library need to be re-extracted by the updated face recognition model, which takes a long time and slows down recognition.
For example, as shown in fig. 20, the facial features in the query set extracted by the updated face recognition model are incompatible with the facial features in the registration set (gallery) extracted by the original face recognition model, so the updated model has to refresh the gallery library; when the gallery is large, the device cost and time cost of refreshing it are high.
In this embodiment, the face feature and the central feature of the target cluster corresponding to the label are fused to obtain the fusion feature of the training sample, so that the central feature of the target cluster can be adjusted according to the fusion feature and the face feature to obtain the adjusted central feature, realizing a boundary constraint on the center of the target cluster. After the face feature and the adjusted central feature of the target cluster are fused to obtain the adjusted fusion feature of the training sample, the compatibility loss value between the fusion feature and the adjusted fusion feature can be determined according to the adjusted fusion feature, and the face recognition model is trained according to this compatibility loss value, completing the correction of the central feature of the target cluster. Information of the central feature of the target cluster is thereby retained, so the face features obtained through the target face recognition model are compatible with those obtained through the face recognition model: the target face recognition model does not need to re-extract features from the images in the face library, which saves the time of feature re-extraction, improves the recognition speed, and reduces time cost and equipment cost.
In addition, compared with a mode of directly calculating the compatible loss value according to the central feature of the target cluster, the embodiment of the application adjusts the central feature of the target cluster and calculates the compatible loss value according to the adjusted central feature of the target cluster, so that the face recognition model can achieve a better convergence state.
S306, training the face recognition model according to the classification loss value and the compatible loss value to obtain a target face recognition model after the face recognition model is updated.
The server may directly add the classification loss value and the compatibility loss value to obtain the target loss value of the face recognition model. Alternatively, the server may obtain a classification weight corresponding to the classification loss value and a compatibility weight corresponding to the compatibility loss value, multiply the classification weight by the classification loss value to obtain an adjusted classification loss value, multiply the compatibility weight by the compatibility loss value to obtain an adjusted compatibility loss value, and finally add the two adjusted loss values to obtain the target loss value.
After the target loss value is obtained, if it is greater than or equal to the preset loss value, the parameters of the face recognition model are updated according to the target loss value (the parameters of the mapping layers may not be updated), and the procedure returns to the step of performing feature extraction on the training sample to obtain the face features corresponding to the training sample. At that point it may be unnecessary to repeat the fusion processing of the face features and the central feature of the target cluster corresponding to the label, and the step of adjusting the central feature of the target cluster according to the fusion features and the face features may also be performed again; details are not described here again.
Alternatively, the classification weight and the compatibility weight may be fixed during training, or they may change dynamically, i.e. they are learned while the face recognition model is trained; in that case the classification weight and the compatibility weight are updated together with the parameters of the face recognition model according to the target loss value.
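A minimal PyTorch sketch covering both the fixed-weight and the dynamically changing (learnable) variants follows; the initial weight values are illustrative.

```python
import torch
import torch.nn as nn

class WeightedTargetLoss(nn.Module):
    """Combine the classification and compatibility losses as described above.
    With `learnable=False` the weights stay fixed; with `learnable=True` they
    are updated together with the model parameters, matching the dynamically
    changing variant."""
    def __init__(self, cls_weight: float = 1.0, compat_weight: float = 1.0,
                 learnable: bool = False):
        super().__init__()
        w = torch.tensor([cls_weight, compat_weight])
        self.w = nn.Parameter(w) if learnable else w   # learned with the model if learnable

    def forward(self, cls_loss: torch.Tensor, compat_loss: torch.Tensor) -> torch.Tensor:
        return self.w[0] * cls_loss + self.w[1] * compat_loss  # target loss value
```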
It should be noted that, after the target face recognition model is applied to perform face recognition, the feature mapping layer and the center mapping layer in the target face recognition model may be removed, so that no additional parameters are introduced during face recognition by the target face recognition model. At this time, the process of recognizing the face image to be recognized by using the target face recognition model may be:
acquiring a facial image to be recognized, and performing feature extraction on the facial image to be recognized through a feature extraction layer in the target facial recognition model to obtain facial features to be recognized corresponding to the facial image to be recognized;
and classifying the facial features to be recognized through a classification layer in the target facial recognition model to obtain a recognition result of the facial image to be recognized.
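A minimal sketch of this inference path follows, assuming the feature extraction layer and the classification layer are available as separate modules.

```python
import torch

@torch.no_grad()
def recognize(feature_extractor: torch.nn.Module,
              classifier: torch.nn.Module,
              image: torch.Tensor) -> int:
    """Inference sketch for the target face recognition model: the feature
    mapping layer and center mapping layer have been removed, so recognition
    only runs the feature extraction layer and the classification layer and
    introduces no additional parameters."""
    feature = feature_extractor(image.unsqueeze(0))   # face feature to be recognized
    logits = classifier(feature)                      # classification layer
    return logits.argmax(dim=1).item()                # recognition result (class id)
```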
After the target face recognition model is obtained, an update-gate method is adopted to evaluate the compatibility index of the target face recognition model, and the evaluation formula can be:

Gain(M_new, M_old) = [ TPR(M_new(Q), M_old(D)) − TPR(M_old(Q), M_old(D)) ] / [ TPR(M_new(Q), M_new(D)) − TPR(M_old(Q), M_old(D)) ]

where Gain denotes the update gain; M_new denotes the target face recognition model; M_old denotes the face recognition model before updating; Q denotes the query set; D denotes the registry set (gallery); TPR(M_new(Q), M_old(D)) denotes the true positive rate (TPR) in the case where the facial features of the face images in the query set are extracted by the target face recognition model and the facial features of the face images in the registry set are extracted by the face recognition model; TPR(M_old(Q), M_old(D)) denotes the TPR in the case where the facial features of both the registry set and the query set are extracted by the face recognition model; and TPR(M_new(Q), M_new(D)) denotes the TPR in the case where the facial features of both the registry set and the query set are extracted by the target face recognition model.
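As an illustration (function and variable names are ours, not the patent's), the update gain can be computed directly from the three measured true positive rates:

def update_gain(tpr_cross: float, tpr_old: float, tpr_new: float) -> float:
    # tpr_cross: TPR with query features from the new model and registry
    #            features from the old model (the cross-model setting).
    # tpr_old:   TPR with both feature sets from the old model.
    # tpr_new:   TPR with both feature sets from the new model.
    # A gain of 1.0 means the cross-model setting recovers the full
    # accuracy improvement of the new model; 0.0 means no improvement.
    return (tpr_cross - tpr_old) / (tpr_new - tpr_old)

# Example: old model 0.90 TPR, new model 0.95, cross-model 0.94
# -> update gain = (0.94 - 0.90) / (0.95 - 0.90) = 0.8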
Tables 1 and 2 show the performance of face recognition models obtained by various update methods; the training set used for Table 1 differs from that used for Table 2. In the tables, model v0 denotes the face recognition model before updating; ours denotes the target face recognition model; model denotes a face recognition model updated without compatibility; adapter wo boundary denotes the adaptive model without the boundary constraint; adapter wo residual denotes the adaptive model without the residual structure; adapter 0.5 beta denotes the target face recognition model with the classification weight and the compatible weight fixed at 0.5; and adapter learnable beta denotes the target face recognition model whose classification weight and compatible weight change dynamically.
(Table 1: performance under the first training set; provided as an image in the original publication.)
(Table 2: performance under the second training set; provided as an image in the original publication.)
As can be seen from Table 1, the accuracy and the update gain of the target face recognition model are higher than those of face recognition models obtained by the related art. As can be seen from Table 2, the true positive rate and the update gain of the target face recognition model with fixed classification and compatible weights are higher than those of the variant whose weights change dynamically; the true positive rate and the update gain of the target face recognition model with the residual structure are higher than those of the variant without the residual structure; and the true positive rate of the adaptive model without the boundary constraint is higher than that of the adaptive model without the residual structure, from which the importance of the residual structure and the boundary constraint can be inferred.
As can be seen from the above, in the embodiment of the present application, a training set for updating the face recognition model is obtained, where the training set includes at least one labeled training sample and the training sample is a face image historically recognized by the face recognition model. Feature extraction is performed on the training sample to obtain the corresponding face features, and the face features are fused with the central feature of the target cluster corresponding to the label to obtain the fusion features of the training sample, so that the central feature of the target cluster can be adjusted according to the fusion features and the face features to obtain the adjusted central feature. After the face features are fused with the adjusted central feature of the target cluster to obtain the adjusted fusion features of the training sample, a compatible loss value between the fusion features and the adjusted fusion features can be determined from the adjusted fusion features, and a classification loss value of the face recognition model is determined. The face recognition model is then trained according to the classification loss value and the compatible loss value to obtain the updated target face recognition model; because the update is compatible, facial features already extracted by the face recognition model before the update do not need to be extracted again.
The method described in the above embodiments is further illustrated below in detail by way of example.
Referring to fig. 21, fig. 21 is a flowchart of a method for constructing a training set for updating a face recognition model according to an embodiment of the present application, where the method includes:
S2101, the server obtains the historical face images historically recognized by the face recognition model, and extracts key points from the historical face images.
S2102, the server determines a first initial quality score corresponding to the historical face image according to the key point through the first comprehensive quality model, determines a second initial quality score corresponding to the historical face image according to the key point through the second comprehensive quality model, and obtains a preset screening threshold.
S2103, if the first initial quality score and the second initial quality score are both smaller than or equal to a preset screening threshold value, the server takes the first initial quality score as a quality score corresponding to the historical face image.
S2104, if the first initial quality score and the second initial quality score are both greater than the preset screening threshold, the server takes the second initial quality score as the quality score corresponding to the historical face image.
S2105, the server takes the historical face image corresponding to the quality score meeting the preset score threshold value as the face image historically recognized by the face recognition model.
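A minimal sketch of the screening in S2103-S2105 (the interfaces are assumptions; the patent leaves unspecified the case where the two initial scores fall on opposite sides of the screening threshold, so the sketch discards such images):

def quality_score(s1: float, s2: float, screen_thresh: float):
    # S2103: both scores at or below the screening threshold -> use the first.
    if s1 <= screen_thresh and s2 <= screen_thresh:
        return s1
    # S2104: both scores above the screening threshold -> use the second.
    if s1 > screen_thresh and s2 > screen_thresh:
        return s2
    return None  # case not specified by the patent

def screen_images(scored_images, screen_thresh, score_thresh):
    # scored_images: iterable of (image_id, s1, s2), where s1/s2 are the
    # initial quality scores from the first and second comprehensive quality
    # models, computed from the extracted key points.
    kept = []
    for image_id, s1, s2 in scored_images:
        score = quality_score(s1, s2, screen_thresh)
        if score is not None and score >= score_thresh:  # S2105
            kept.append(image_id)
    return kept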
In the active learning method, the purification operation, that is, quality filtering, is performed manually. In this embodiment, the quality scores corresponding to the historical face images are determined by the first comprehensive quality model and the second comprehensive quality model, which automates the data mining part and improves the quality filtering step of the active learning method (see improvement 2 in fig. 17). This improves the quality of the training samples in the training set used to update the face recognition model and, in turn, the recall rate of the face recognition model.
S2106, the server clusters the face images through the face recognition model to obtain the picture clusters in which the face images are located.
S2107, the server acquires the label heat of the picture clusters in which the face images are located, and screens out, from these picture clusters, the picture clusters whose label heat meets the preset heat, to obtain primary screening picture clusters.
S2108, the server determines the density weight of each primary screening picture cluster according to the face features of the face images in the cluster, and screens out, from the primary screening picture clusters, the picture clusters whose density weight meets the preset weight, to obtain candidate picture clusters.
S2109, the server determines the association degree between the candidate picture clusters according to the face features of the face images in the candidate picture clusters, and constructs a tree structure from the candidate picture clusters and the association degrees; the tree structure comprises a plurality of nodes and a plurality of edges, each node represents a candidate picture cluster, and the weight of each edge represents the association degree between the corresponding candidate picture clusters.
S21010, the server adjusts the tree structure according to the weights of the edges in the tree structure to obtain a minimum spanning tree, and screens node groups from the minimum spanning tree; the node groups correspond to the picture cluster groups to be labeled, as shown in the sketch below.
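A sketch of S2109-S21010 under stated assumptions (association is taken here as cosine similarity between cluster mean features, and a node group is read off as a minimum-spanning-tree edge whose association exceeds a threshold; the patent fixes neither choice):

import networkx as nx
import numpy as np

def select_cluster_groups(cluster_centers: dict, assoc_threshold: float):
    # cluster_centers: {cluster_id: mean face-feature vector (np.ndarray)}.
    graph = nx.Graph()
    ids = list(cluster_centers)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            va, vb = cluster_centers[a], cluster_centers[b]
            sim = float(np.dot(va, vb) /
                        (np.linalg.norm(va) * np.linalg.norm(vb)))
            # The MST minimizes total weight, so use distance = 1 - similarity.
            graph.add_edge(a, b, weight=1.0 - sim, similarity=sim)

    mst = nx.minimum_spanning_tree(graph, weight="weight")

    # Node groups: cluster pairs joined by a strong-association MST edge,
    # i.e. clusters likely to cover the same identity and therefore worth
    # sending to annotation together.
    return [(a, b) for a, b, d in mst.edges(data=True)
            if d["similarity"] >= assoc_threshold]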
S21011, the server extracts a plurality of pieces of modal information corresponding to the face images in the picture cluster group to be labeled, and displays the picture cluster group to be labeled together with the modal information.
S21012, the server receives labeling information obtained by labeling the face images in the picture cluster group to be labeled according to the modal information, yielding labeled face images.
In the embodiment of the present application, displaying multiple pieces of modal information effectively suppresses the interference caused by makeup, multiple angles, age span, and the like, so that the user can determine whether the face images in the picture cluster group to be labeled belong to the same person. This effectively improves labeling efficiency and accuracy and reduces labeling cost: accuracy rises from 85% to 99%, and labeling efficiency improves fivefold.
Displaying multiple pieces of modal information for the user to label, as in this embodiment, can be regarded as an improvement on the labeling operation of the active learning method (see improvement 3 in fig. 17).
S21013, the server constructs a training set for updating the face recognition model according to the labeled face images, wherein the training set comprises at least one labeled training sample.
For the specific implementation and corresponding beneficial effects of this embodiment, refer to the above embodiments of the model updating method; details are not repeated here.
Referring to fig. 22, fig. 22 is a flowchart of a method for updating a face recognition model according to an embodiment of the present application, where the method includes:
S2201, the server performs feature extraction on the training samples through a feature extraction layer in the face recognition model to obtain initial face features corresponding to the training samples.
S2202, the server performs feature dimension reduction mapping on the initial face features through a first full connection layer in the face recognition model to obtain dimension reduced face features corresponding to the training samples, and performs nonlinear feature mapping on the dimension reduced face features through a first activation layer in the face recognition model to obtain nonlinear face features corresponding to the training samples.
S2203, the server performs feature dimension-raising mapping on the nonlinear face features through a second full connection layer in the face recognition model to obtain the dimension-raised face features corresponding to the training samples, and determines the face features corresponding to the training samples according to the dimension-raised face features and the initial face features.
S2204, the server obtains the initial central feature of the target cluster corresponding to the label, and performs dimensionality reduction mapping on the initial central feature of the target cluster through a third full-connection layer in the face recognition model to obtain the dimensionality-reduced initial central feature.
S2205, the server conducts nonlinear feature mapping on the initial central feature after dimension reduction through a second activation layer in the face recognition model to obtain a nonlinear initial central feature, and conducts feature dimension increasing mapping on the nonlinear initial central feature through a fourth full connection layer in the face recognition model to obtain an initial central feature after dimension increasing.
S2206, the server determines the central feature of the target cluster according to the initial central feature after the dimensionality raising and the initial central feature of the target cluster.
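S2202-S2206 describe, for both the face features and the central features, a bottleneck mapping (dimension reduction, nonlinearity, dimension raising) whose output is combined with the input. A PyTorch sketch under the assumption that the combination is a residual addition (layer sizes are illustrative, not from the patent):

import torch
import torch.nn as nn

class ResidualMappingLayer(nn.Module):
    # Down-project, apply a nonlinearity, up-project, add the input back.
    # The same design serves as the feature mapping layer (first/second full
    # connection layers) and the center mapping layer (third/fourth).
    def __init__(self, dim: int = 512, bottleneck: int = 128):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # feature dimension-reduction mapping
        self.act = nn.ReLU()                    # nonlinear feature mapping
        self.up = nn.Linear(bottleneck, dim)    # feature dimension-raising mapping

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

feature_mapping = ResidualMappingLayer()  # applied to the initial face features
center_mapping = ResidualMappingLayer()   # applied to the initial central features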
S2207, the server performs fusion processing on the face features and the central features of the target cluster through the feature fusion layer in the face recognition model to obtain fusion features of the training samples, and determines prediction labels of the training samples according to the fusion features through the classification layer in the face recognition model.
S2208, the server obtains first direction information corresponding to the face features, obtains second direction information of the initial face features, and obtains third direction information of the initial center features of the target cluster.
S2209, the server determines first included angle information between the initial face features and the face features according to the first direction information and the second direction information, and determines second included angle information between the initial face features and the central features of the target cluster according to the second direction information and the third direction information.
S22010, the server determines an included angle difference value between the first included angle information and the second included angle information, and adjusts the central feature of the target cluster according to the predicted label, the label and the included angle difference value to obtain the adjusted central feature of the target cluster.
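S2208-S22010 compare feature directions through included angles. A hedged sketch of the measured quantities only (the exact rule by which the predicted label, the label, and the angle difference adjust the central feature is not reproduced here):

import torch
import torch.nn.functional as F

def included_angle(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Angle, in radians, between two feature directions.
    cos = F.cosine_similarity(u, v, dim=-1).clamp(-1.0, 1.0)
    return torch.acos(cos)

def angle_difference(face_feat, init_face_feat, init_center):
    # First included angle: initial face features vs. (mapped) face features.
    theta1 = included_angle(init_face_feat, face_feat)
    # Second included angle: initial face features vs. central feature of the target cluster.
    theta2 = included_angle(init_face_feat, init_center)
    # Their difference drives the boundary constraint on the center adjustment.
    return theta1 - theta2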
S22011, the server carries out fusion processing on the face features and the adjusted central features of the target cluster to obtain the adjusted fusion features of the training samples.
S22012, the server performs first fusion processing on the adjusted fusion features and the initial face features to obtain target fusion features of the training samples.
S22013, the server determines a compatible loss value between the fusion feature and the adjusted fusion feature according to the target fusion feature, and determines a classification loss value of the face recognition model.
S22014, the server trains the face recognition model according to the classification loss value and the compatible loss value to obtain a target face recognition model after the face recognition model is updated.
For the specific implementation and corresponding beneficial effects of this embodiment, refer to the above embodiments of the model updating method; details are not repeated here.
To better implement the model updating method provided in the embodiments of the present application, an apparatus based on the model updating method is further provided. The terms have the same meanings as in the model updating method above; for implementation details, refer to the description in the method embodiments.
For example, as shown in fig. 23, the model updating means may include:
an obtaining module 2301, configured to obtain a training set for updating a face recognition model, where the training set includes at least one labeled training sample, and the training sample is a face image historically recognized by the face recognition model;
the extraction module 2302 is used for performing feature extraction on the training sample to obtain facial features corresponding to the training sample, and performing fusion processing on the facial features and the central features of the target cluster corresponding to the label to obtain fusion features of the training sample;
an adjusting module 2303, configured to adjust the central feature of the target cluster according to the fusion feature and the face feature, to obtain an adjusted central feature of the target cluster;
a fusion module 2304, configured to perform fusion processing on the face features and the adjusted center features of the target cluster to obtain adjusted fusion features of the training samples;
a determining module 2305, for determining a compatible loss value between the fusion feature and the adjusted fusion feature according to the adjusted fusion feature, and determining a classification loss value of the face recognition model;
a training module 2306, configured to train the face recognition model according to the classification loss value and the compatible loss value to obtain the target face recognition model after the face recognition model is updated.
Optionally, the adjusting module 2303 is specifically configured to perform:
determining a prediction label of the training sample according to the fusion characteristics;
and adjusting the central feature of the target cluster according to the face feature and the prediction label to obtain the adjusted central feature of the target cluster.
Optionally, the extraction module 2302 is specifically configured to perform:
performing feature extraction on the training sample through a feature extraction layer in the face recognition model to obtain initial face features corresponding to the training sample;
and performing feature mapping on the initial face features through a feature mapping layer in the face recognition model to obtain face features corresponding to the training samples.
Optionally, the adjusting module 2303 is specifically configured to perform:
acquiring first direction information corresponding to the facial features and second direction information of the initial facial features;
and adjusting the central feature of the target cluster according to the first direction information, the second direction information and the prediction label to obtain the adjusted central feature of the target cluster.
Optionally, the adjusting module 2303 is specifically configured to perform:
acquiring third direction information of the central characteristics of the target cluster;
and adjusting the central feature of the target cluster according to the first direction information, the second direction information, the third direction information and the prediction label to obtain the adjusted central feature of the target cluster.
Optionally, the fusion module 2304 is specifically configured to perform:
performing first fusion processing on the adjusted fusion features and the initial face features to obtain target fusion features of the training sample;
and determining a compatible loss value between the fusion feature and the adjusted fusion feature according to the target fusion feature.
Optionally, the model updating apparatus further includes:
a mapping module to perform:
acquiring initial central characteristics of a target cluster corresponding to a label;
and mapping the initial central features through a central mapping layer of the face recognition model to obtain the central features of the target cluster corresponding to the label.
Optionally, the obtaining module 2301 is specifically configured to perform:
acquiring a face image historically recognized by a face recognition model;
screening a picture cluster group to be labeled from picture clusters corresponding to the face image, and labeling the face image according to the picture cluster group to be labeled to obtain a labeled face image;
from the tagged face images, a training set is determined that updates the face recognition model.
Optionally, the obtaining module 2301 is specifically configured to perform:
obtaining a historical face image of historical recognition of a face recognition model;
extracting key points of the historical face image, and determining a quality score corresponding to the historical face image according to the key points;
and taking the historical face image corresponding to the quality score meeting the preset score threshold value as the face image historically identified by the face identification model.
Optionally, the obtaining module 2301 is specifically configured to perform:
obtaining a plurality of quality models of historical face images;
determining a plurality of initial quality scores of the historical face image according to the key points through a quality model;
and determining the quality score corresponding to the historical face image according to the initial quality score.
Optionally, the historical face image has a plurality of quality evaluation dimensions, each quality evaluation dimension corresponds to a dimension quality model, and accordingly, the obtaining module 2301 is specifically configured to perform:
determining an initial quality score of the historical face image for each quality evaluation dimension according to the key points through a dimension quality model;
and performing weighting operation on each initial quality score to obtain a quality score corresponding to the historical face image.
Optionally, the model updating apparatus further includes:
a training module to perform:
acquiring a first training set of a dimension quality model to be trained, wherein the first training set comprises a plurality of first training samples;
extracting sample key points of a first training sample, and determining a pair loss value and/or an anchor loss value of the dimension quality model to be trained according to the sample key points;
and training the dimension quality model to be trained according to the pairwise loss value and/or the anchor loss value to obtain the dimension quality model.
Optionally, the training module is specifically configured to perform:
obtaining an anchor label corresponding to a first training sample, wherein the anchor label represents the grade of a quality evaluation dimension, and at least three anchor labels exist in a first training set;
determining a prediction score of the first training sample according to the key points;
and determining the anchor loss value of the dimension quality model to be trained according to the anchor label, the prediction score, and the score interval corresponding to the anchor label.
Optionally, the multiple quality models of the historical facial image include a first comprehensive quality model and a second comprehensive quality model, and accordingly, the obtaining module 2301 is specifically configured to perform:
determining a first initial quality score corresponding to the historical face image according to the key points through a first comprehensive quality model;
determining a second initial quality score corresponding to the historical face image according to the key points through a second comprehensive quality model;
and screening out a quality score corresponding to the historical face image from the first initial quality score and the second initial quality score.
Optionally, the obtaining module 2301 is specifically configured to perform:
acquiring a preset screening threshold;
if the first initial quality score and the second initial quality score are both smaller than or equal to a preset screening threshold value, taking the first initial quality score as a quality score corresponding to the historical face image;
and if the first initial quality score and the second initial quality score are both larger than a preset screening threshold value, taking the second initial quality score as a quality score corresponding to the historical face image.
Optionally, the obtaining module 2301 is specifically configured to perform:
determining the association degree between the picture clusters corresponding to the face images according to the face features in the picture clusters corresponding to the face images;
and screening out a picture cluster group to be labeled from the picture cluster corresponding to the face image according to the association degree.
Optionally, the obtaining module 2301 is specifically configured to perform:
constructing a data structure according to the picture clusters corresponding to the face images and the association degrees, wherein the data structure comprises a plurality of nodes and a plurality of edges, each node represents the picture cluster corresponding to the face images, and the weights of the edges represent the association degrees among the picture clusters corresponding to the face images;
performing a loop-removing adjustment on the data structure according to the weights of the edges in the data structure, to obtain a data structure that contains no loop connections;
and screening out a node group from the data structure that contains no loop connections, wherein the node group corresponds to the picture cluster group to be labeled.
Optionally, the obtaining module 2301 is specifically configured to perform:
extracting a plurality of modal information corresponding to the face image in the picture cluster group to be labeled;
displaying the group of pictures to be labeled and the modal information;
and receiving labeling information for labeling the face image in the picture cluster group to be labeled by the user according to the modal information to obtain the labeled face image.
In specific implementation, each of the above modules may be implemented as an independent entity, or the modules may be combined arbitrarily and implemented as one or several entities. For the specific implementation and corresponding beneficial effects of the above modules, refer to the foregoing method embodiments; details are not repeated here.
An embodiment of the present application further provides an electronic device, which may be a server or a terminal. As shown in fig. 24, which is a schematic structural diagram of the electronic device according to an embodiment of the present application, specifically:
the electronic device may include components such as a processor 2401 of one or more processing cores, memory 2402 of one or more computer-readable storage media, a power supply 2403, and an input unit 2404. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 24 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 2401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by operating or executing computer programs and/or modules stored in the memory 2402 and calling data stored in the memory 2402. Optionally, processor 2401 may include one or more processing cores; preferably, the processor 2401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 2401.
The memory 2402 may be used to store computer programs and modules, and the processor 2401 executes various functional applications and performs data processing by running the computer programs and modules stored in the memory 2402. The memory 2402 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, a computer program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the electronic device, and the like. In addition, the memory 2402 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 2402 may further include a memory controller to provide the processor 2401 with access to the memory 2402.
The electronic device further includes a power supply 2403 for supplying power to the components. Preferably, the power supply 2403 is logically connected to the processor 2401 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 2403 may further include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components.
The electronic device may further include an input unit 2404, and the input unit 2404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 2401 in the electronic device loads an executable file corresponding to one or more processes of the computer program into the memory 2402 according to the following instructions, and the processor 2401 runs the computer program stored in the memory 2402, so as to implement various functions, such as:
acquiring a training set for updating the face recognition model, wherein the training set comprises at least one training sample with a label, and the training sample is a face image historically recognized by the face recognition model;
extracting features of the training samples to obtain face features corresponding to the training samples, and performing fusion processing on the face features and central features of the target clusters corresponding to the labels to obtain fusion features of the training samples;
according to the fusion features and the face features, the center features of the target cluster are adjusted to obtain adjusted center features of the target cluster;
performing fusion processing on the face features and the adjusted central features of the target cluster to obtain adjusted fusion features of the training sample;
according to the adjusted fusion features, determining compatible loss values between the fusion features and the adjusted fusion features, and determining classification loss values of the face recognition model;
and training the face recognition model according to the classification loss value and the compatible loss value to obtain a target face recognition model after the face recognition model is updated.
The specific implementation of each operation and the corresponding beneficial effects can be referred to the above detailed description of the model updating method, which is not repeated herein.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be performed by a computer program, or by a computer program controlling associated hardware; the computer program may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program can be loaded by a processor to execute the steps in any one of the model updating methods provided in the present application. For example, the computer program may perform the steps of:
acquiring a training set for updating the face recognition model, wherein the training set comprises at least one training sample with a label, and the training sample is a face image historically recognized by the face recognition model;
extracting the features of the training samples to obtain face features corresponding to the training samples, and fusing the face features and the central features of the target clusters corresponding to the labels to obtain fused features of the training samples;
according to the fusion features and the face features, the center features of the target cluster are adjusted to obtain adjusted center features of the target cluster;
performing fusion processing on the face features and the adjusted central features of the target cluster to obtain adjusted fusion features of the training sample;
according to the adjusted fusion features, determining compatible loss values between the fusion features and the adjusted fusion features, and determining classification loss values of the face recognition model;
and training the face recognition model according to the classification loss value and the compatible loss value to obtain a target face recognition model after the face recognition model is updated.
The specific implementation of the above operations and the corresponding beneficial effects can be referred to the foregoing embodiments, and are not described herein again.
The computer-readable storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, or the like.
Since the computer program stored in the computer-readable storage medium can execute the steps of any model updating method provided in the embodiments of the present application, beneficial effects that can be achieved by any model updating method provided in the embodiments of the present application can be achieved, for details, see the foregoing embodiments, and are not described herein again.
According to an aspect of the present application, a computer program product or a computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the model updating method described above.
The model updating method, apparatus, device, storage medium, and program product provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

1. A model update method, comprising:
acquiring a training set for updating a face recognition model, wherein the training set comprises at least one labeled training sample, and the training sample is a face image historically recognized by the face recognition model;
extracting features of the training samples to obtain facial features corresponding to the training samples, and performing fusion processing on the facial features and central features of the target clusters corresponding to the labels to obtain fusion features of the training samples;
according to the fusion feature and the face feature, adjusting the central feature of the target cluster to obtain an adjusted central feature of the target cluster;
performing fusion processing on the face features and the adjusted central features of the target cluster to obtain adjusted fusion features of the training samples;
according to the adjusted fusion features, determining compatible loss values between the fusion features and the adjusted fusion features, and determining classification loss values of the face recognition model;
and training the face recognition model according to the classification loss value and the compatible loss value to obtain a target face recognition model after the face recognition model is updated.
2. The model updating method according to claim 1, wherein the adjusting the central feature of the target cluster according to the fused feature and the facial feature to obtain an adjusted central feature of the target cluster comprises:
determining a prediction label of the training sample according to the fusion characteristics;
and adjusting the central feature of the target cluster according to the face feature and the prediction label to obtain the adjusted central feature of the target cluster.
3. The model updating method according to claim 2, wherein the performing feature extraction on the training sample to obtain the facial features corresponding to the training sample comprises:
performing feature extraction on the training sample through a feature extraction layer in the face recognition model to obtain an initial face feature corresponding to the training sample;
and performing feature mapping on the initial face features through a feature mapping layer in the face recognition model to obtain face features corresponding to the training samples.
4. The model updating method of claim 3, wherein the adjusting the center feature of the target cluster according to the facial feature and the prediction label to obtain an adjusted center feature of the target cluster comprises:
acquiring first direction information corresponding to the facial features and second direction information of the initial facial features;
and adjusting the central feature of the target cluster according to the first direction information, the second direction information and the prediction label to obtain the adjusted central feature of the target cluster.
5. The model updating method according to claim 4, wherein the adjusting the center feature of the target cluster according to the first direction information, the second direction information and the prediction label to obtain the adjusted center feature of the target cluster comprises:
acquiring third direction information of the central feature of the target cluster;
and adjusting the central feature of the target cluster according to the first direction information, the second direction information, the third direction information and the prediction label to obtain the adjusted central feature of the target cluster.
6. The model updating method of claim 3, wherein the face recognition model further comprises a center mapping layer, the method further comprising:
acquiring initial central features of a target cluster corresponding to the label;
and mapping the initial central feature through a central mapping layer of the face recognition model to obtain the central feature of the target cluster corresponding to the label.
7. The model updating method according to any one of claims 1 to 6, wherein the obtaining a training set for updating the face recognition model comprises:
acquiring a face image historically recognized by the face recognition model;
screening a picture cluster group to be labeled from a picture cluster corresponding to the face image, and labeling the face image according to the picture cluster group to be labeled to obtain a labeled face image;
and determining a training set for updating the face recognition model according to the labeled face image.
8. The model updating method according to claim 7, wherein obtaining the face image historically recognized by the face recognition model comprises:
acquiring a historical face image historically recognized by the face recognition model;
extracting key points of the historical face image, and determining a quality score corresponding to the historical face image according to the key points;
and taking the historical face image corresponding to the quality score meeting the preset score threshold value as the face image historically identified by the face identification model.
9. The model updating method of claim 8, wherein said determining a quality score corresponding to the historical facial image from the keypoints comprises:
obtaining a plurality of quality models of the historical facial image;
determining, by the quality model, a plurality of initial quality scores of the historical facial image according to the keypoints;
and determining the quality score corresponding to the historical face image according to the initial quality score.
10. The model updating method according to claim 9, wherein the historical face image has a plurality of quality evaluation dimensions, each corresponding to a dimensional quality model;
determining, by the quality model and according to the key points, a plurality of initial quality scores of the historical face image, and determining, according to the initial quality scores, a quality score corresponding to the historical face image, including:
determining an initial quality score of the historical facial image for each quality evaluation dimension according to the key points through the dimension quality model;
and performing weighting operation on each initial quality score to obtain a quality score corresponding to the historical face image.
11. The model updating method of claim 10, further comprising, before said determining a quality score corresponding to the historical facial image from the keypoints by the dimensional quality model:
acquiring a first training set of a dimension quality model to be trained, wherein the first training set comprises a plurality of first training samples;
extracting sample key points of the first training sample, and determining a pair loss value and/or an anchor loss value of the dimension quality model to be trained according to the sample key points;
and training the dimension quality model to be trained according to the pair-wise loss value and/or the anchor loss value to obtain the dimension quality model.
12. The model updating method of claim 9, wherein the plurality of quality models of the historical facial image includes a first comprehensive quality model and a second comprehensive quality model, and the determining the corresponding quality score of the historical facial image according to the keypoint comprises:
determining a first initial quality score corresponding to the historical face image according to the key point through the first comprehensive quality model;
determining a second initial quality score corresponding to the historical facial image according to the key point through the second comprehensive quality model;
screening out the quality scores corresponding to the historical face images from the first initial quality scores and the second initial quality scores.
13. The model updating method of claim 12, wherein the filtering out the quality score corresponding to the historical facial image from the first initial quality score and the second initial quality score comprises:
acquiring a preset screening threshold;
if the first initial quality score and the second initial quality score are both smaller than or equal to the preset screening threshold, taking the first initial quality score as a quality score corresponding to the historical face image;
and if the first initial quality score and the second initial quality score are both larger than the preset screening threshold, taking the second initial quality score as the quality score corresponding to the historical face image.
14. The model updating method of claim 7, wherein the step of screening out the group of picture clusters to be labeled from the picture clusters corresponding to the facial image comprises:
determining the association degree between the picture clusters corresponding to the face images according to the face features in the picture clusters corresponding to the face images;
and screening out a picture cluster group to be labeled from the picture cluster corresponding to the face image according to the association degree.
15. The model updating method according to claim 14, wherein the step of screening out a group of picture clusters to be labeled from the picture clusters corresponding to the face images according to the association degree comprises:
constructing a data structure according to the picture clusters corresponding to the face images and the association degrees, wherein the data structure comprises a plurality of nodes and a plurality of edges, each node represents a picture cluster corresponding to the face images, and the weight of each edge represents the association degrees between the picture clusters corresponding to the face images;
performing a loop-removing adjustment on the data structure according to the weights of the edges in the data structure to obtain a data structure that does not contain loop connections;
and screening out a node group from the data structure that does not contain loop connections, wherein the node group corresponds to the picture cluster group to be labeled.
16. The model updating method of claim 7, wherein labeling the face image according to the group of picture clusters to be labeled to obtain a labeled face image comprises:
extracting a plurality of modal information corresponding to the face image in the picture cluster group to be labeled;
displaying the picture cluster group to be marked and the modal information;
and receiving labeling information for labeling the face image in the to-be-labeled picture cluster group by a user according to the modal information to obtain a labeled face image.
17. A model updating apparatus, comprising:
the face recognition system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a training set for updating a face recognition model, the training set comprises at least one labeled training sample, and the training sample is a face image historically recognized by the face recognition model;
the extraction module is used for extracting the features of the training sample to obtain the facial features corresponding to the training sample, and performing fusion processing on the facial features and the central features of the target cluster corresponding to the label to obtain the fusion features of the training sample;
the adjusting module is used for adjusting the central feature of the target cluster according to the fusion feature and the face feature to obtain the adjusted central feature of the target cluster;
the fusion module is used for carrying out fusion processing on the face features and the adjusted central features of the target cluster to obtain the adjusted fusion features of the training samples;
the determining module is used for determining a compatible loss value between the fusion feature and the adjusted fusion feature according to the adjusted fusion feature and determining a classification loss value of the face recognition model;
and training the face recognition model according to the classification loss value and the compatible loss value to obtain a target face recognition model after the face recognition model is updated.
18. An electronic device, comprising a processor and a memory, the memory storing a computer program, the processor being configured to execute the computer program in the memory to perform the model updating method of any one of claims 1 to 16.
19. A computer-readable storage medium, characterized in that it stores a computer program adapted to be loaded by a processor for performing the model updating method of any one of claims 1 to 16.
20. A computer program product, characterized in that it stores a computer program adapted to be loaded by a processor for performing the model updating method of any one of claims 1 to 16.
CN202211353988.0A 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product Active CN115439919B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211353988.0A CN115439919B (en) 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product
CN202310237157.5A CN116978087A (en) 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211353988.0A CN115439919B (en) 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310237157.5A Division CN116978087A (en) 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product

Publications (2)

Publication Number Publication Date
CN115439919A true CN115439919A (en) 2022-12-06
CN115439919B CN115439919B (en) 2023-03-24

Family

ID=84252116

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310237157.5A Pending CN116978087A (en) 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product
CN202211353988.0A Active CN115439919B (en) 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310237157.5A Pending CN116978087A (en) 2022-11-01 2022-11-01 Model updating method, device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (2) CN116978087A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029556A (en) * 2023-03-21 2023-04-28 支付宝(杭州)信息技术有限公司 Service risk assessment method, device, equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9552648B1 (en) * 2012-01-23 2017-01-24 Hrl Laboratories, Llc Object tracking with integrated motion-based object detection (MogS) and enhanced kalman-type filtering
CN106570499A (en) * 2016-10-28 2017-04-19 南京航空航天大学 Object tracking method based on probability graph model
CN110414432A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 Training method, object identifying method and the corresponding device of Object identifying model
CN111507419A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Training method and device of image classification model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9552648B1 (en) * 2012-01-23 2017-01-24 Hrl Laboratories, Llc Object tracking with integrated motion-based object detection (MogS) and enhanced kalman-type filtering
CN106570499A (en) * 2016-10-28 2017-04-19 南京航空航天大学 Object tracking method based on probability graph model
CN110414432A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 Training method, object identifying method and the corresponding device of Object identifying model
CN111507419A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Training method and device of image classification model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIARUN LIU ET AL., "Co-Correcting: Noise-tolerant Medical Image Classification via Mutual Label Correction," arXiv:2109.05159v1 [eess.IV]
YE Jixiang et al., "Gaussian Mixture Model Update Algorithm Based on Histogram Comparison," Computer Engineering

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116029556A (en) * 2023-03-21 2023-04-28 支付宝(杭州)信息技术有限公司 Service risk assessment method, device, equipment and readable storage medium
CN116029556B (en) * 2023-03-21 2023-05-30 支付宝(杭州)信息技术有限公司 Service risk assessment method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN116978087A (en) 2023-10-31
CN115439919B (en) 2023-03-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant