CN113361363A - Training method, device and equipment for face image recognition model and storage medium - Google Patents


Info

Publication number: CN113361363A (application CN202110604787.2A; granted as CN113361363B)
Authority: CN (China)
Prior art keywords: face image, face, scene, sample, features
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventor: 杨馥魁
Assignee (current and original): Beijing Baidu Netcom Science and Technology Co Ltd


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features

Abstract

The present disclosure provides a training method, apparatus, device, and storage medium for a face image recognition model, relating to the technical field of artificial intelligence, in particular to deep learning and computer vision, and applicable to smart city scenarios. The specific implementation scheme is as follows: acquire a sample face image and the annotated face features corresponding to the sample face image; extract the sample scene category corresponding to the sample face image; acquire the scene edge features corresponding to the sample scene category; and train an initial face image recognition model according to the sample face image, the annotated face features, and the scene edge features to obtain a target face image recognition model. In this way, the target face image recognition model's ability to represent face image features under different scenes can be effectively improved, and the accuracy and reliability of face image recognition are effectively improved.

Description

Training method, device and equipment for face image recognition model and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning and computer vision; it can be applied to smart city scenarios, and specifically relates to a training method, apparatus, device, and storage medium for a face image recognition model.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), covering technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
When face image recognition models in the related art recognize face images of different scene categories, the recognition accuracy of the models is poor.
Disclosure of Invention
The present disclosure provides a training method of a face image recognition model, a face image recognition method, an apparatus, an electronic device, a storage medium, and a computer program product.
According to a first aspect of the present disclosure, there is provided a training method for a face image recognition model, including: acquiring a sample face image and an annotated face feature corresponding to the sample face image; extracting a sample scene category corresponding to the sample face image; acquiring scene edge characteristics corresponding to the sample scene categories; and training an initial face image recognition model according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model.
According to a second aspect of the present disclosure, there is provided a face image recognition method, including: acquiring a face image; and inputting the face image into the target face image recognition model obtained by training with the training method for a face image recognition model according to the first aspect, so as to obtain the target face features output by the target face image recognition model.
According to a third aspect of the present disclosure, there is provided a training apparatus for a face image recognition model, comprising: the first acquisition module is used for acquiring a sample face image and the annotated face features corresponding to the sample face image; the extraction module is used for extracting a sample scene category corresponding to the sample face image; the second acquisition module is used for acquiring scene edge features corresponding to the sample scene category; and the training module is used for training an initial face image recognition model according to the sample face image, the annotated face features, and the scene edge features to obtain a target face image recognition model.
According to a fourth aspect of the present disclosure, there is provided a face image recognition apparatus, including: the third acquisition module is used for acquiring a face image; and the input module is used for inputting the face image into the target face image recognition model obtained by training with the training apparatus for a face image recognition model according to the third aspect, so as to obtain the target face features output by the target face image recognition model.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of training a facial image recognition model according to the first aspect or to perform the method of facial image recognition according to the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the training method of the face image recognition model according to the first aspect or execute the face image recognition method according to the second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of training a face image recognition model according to the first aspect or performs the method of face image recognition according to the second aspect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a training method of a face image recognition model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a seventh embodiment of the present disclosure; and
FIG. 9 illustrates a schematic block diagram of an example electronic device that may be used to implement the training method of the face image recognition model of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.
It should be noted that the execution subject of the training method for a face image recognition model in this embodiment is a training apparatus for a face image recognition model. The apparatus may be implemented in software and/or hardware and may be configured in an electronic device, which may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the disclosure relates to the technical field of artificial intelligence, in particular to the technical field of computer vision and deep learning, and can be applied to smart city scenes.
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
Deep learning learns the intrinsic laws and representation levels of sample data; the information obtained in the learning process is of great help in interpreting data such as text, images, and sounds. Its ultimate goal is to enable machines to analyze and learn like humans, and to recognize data such as text, images, and sounds.
Computer vision means using cameras and computers instead of human eyes to identify, track, and measure targets, and to further process the resulting images so that they become more suitable for human observation or for transmission to instruments for detection.
A smart city scenario applies new-generation information technologies such as the Internet of Things, cloud computing, and big data to promote intelligent city planning, construction, management, and service; it is an advanced form of urban informatization in which these technologies are fully applied across a city's industries.
In this embodiment, an execution subject of the training method for the face image recognition model may obtain the sample face image in various public and legal compliance manners, for example, the sample face image may be obtained from a public sample face image set, or obtained from a user after authorization of the user. The sample face image does not reflect the personal information of a particular user.
As shown in fig. 1, the training method of the face image recognition model includes:
S101: And acquiring a sample face image and an annotated face feature corresponding to the sample face image.
The face images used for training the model may be referred to as sample face images. There may be one or more sample face images, which may be obtained by photographing a user's face or by parsing a video; for example, a sample face image may be a frame extracted from the video frames contained in a video. No limitation is imposed here.
It should be noted that, in the embodiments of the present disclosure, the sample face images are obtained in compliance with relevant laws and regulations.
In the training process of the face image recognition model, the face features used as the reference for determining whether the model has converged (i.e., whether the model meets the required standard) may be referred to as labeled face features. The face features may be, for example, facial-organ features, skin color features, and face shape features; the facial-organ features may be, for example, the positions of the five sense organs and the relative distances between them. No limitation is imposed here.
For example, local image regions (e.g., the facial-organ region, the skin region, and the facial contour region) may be identified from the sample face image, and image feature analysis may then be performed on these local regions to determine the facial-organ features, skin color features, face shape features, and the like as the labeled face features, without limitation.
S102: and extracting a sample scene category corresponding to the sample face image.
After the sample face image is obtained, the sample scene type corresponding to the sample face image can be extracted.
The category used for describing the scene corresponding to the sample face image may be referred to as a sample scene category, and the category of the scene includes, for example, a scene category with a mask and a scene category without a mask.
In some embodiments, when the sample scene category corresponding to the sample face image is extracted, the sample scene category corresponding to the sample face image may be analyzed by obtaining a facial feature, an accessory feature, or a skin color feature of a face in the sample face image and then using the facial feature, the accessory feature, or the skin color feature of the face, without limitation.
For example, the face region of the sample face image may be identified to determine whether the face is covered by a mask. If the face is covered by a mask, the sample scene category may be determined as the scene category with a mask; if the face is not covered by a mask, the sample scene category may be determined as the scene category without a mask.
The sample scene category may also be, for example, an elderly people scene category, a children scene category, a certificate-photo scene category, and the like.
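Following the mask example above, the sketch below illustrates one way such scene-category extraction could look; the `mask_probability` helper and the 0.5 threshold are hypothetical assumptions for illustration and are not part of the patent.

```python
def extract_scene_category(sample_face_image, mask_probability, threshold=0.5):
    """Map a sample face image to a coarse scene category (S102).

    `mask_probability` is a hypothetical callable returning a value in
    [0, 1] estimating how likely the face region is covered by a mask;
    the threshold value is illustrative only.
    """
    p = mask_probability(sample_face_image)
    return "masked" if p >= threshold else "unmasked"
```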
In the embodiments of the present disclosure, extracting the sample scene category corresponding to the sample face image during training of the face image recognition model makes it possible to subsequently use the sample scene category to assist the model in learning and modeling face features. By jointly taking the sample scene category as a consideration in training, the trained target face image recognition model can effectively learn the scene features of face images, which assists in improving the modeling and representation of face features.
S103: scene edge features corresponding to the sample scene categories are obtained.
After the sample scene category corresponding to the sample face image is extracted, the scene edge features corresponding to the sample scene category may be obtained. A scene edge feature is an edge feature, related to the sample scene category, that describes differences between scenes; it may reflect differences between the features the face image recognition model predicts in different scenes, for example the distribution difference between predicted features in different scenes, or any other difference between them. No limitation is imposed here.
For example, when acquiring the scene edge features corresponding to the sample scene category, a first face feature corresponding to a face sample image of the elderly people scene category and a second face feature corresponding to a face sample image of the children scene category may be determined. The difference features between the first face feature and the second face feature across the two scene categories may then be analyzed: the relative difference feature of the first face feature may be used as the scene edge feature of the elderly people scene category, and the relative difference feature of the second face feature may be used as the scene edge feature of the children scene category. Alternatively, the scene edge features corresponding to the sample scene category may be acquired in any other possible manner, such as an engineering manner or a model prediction manner, without limitation.
S104: and training an initial face image recognition model according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model.
After the scene edge features corresponding to the sample scene types are obtained, an initial face image recognition model can be trained according to the sample face images, the labeled face features and the scene edge features, so that a target face image recognition model is obtained.
The face image recognition model obtained at the initial stage of training may be referred to as an initial face image recognition model, and the face image recognition model may be an artificial intelligence model, specifically, for example, a neural network model or a machine learning model, or of course, any other possible artificial intelligence model capable of executing a face image recognition task may be adopted, without limitation.
For example, a sample face image, the annotated face features, and the scene edge features may be input into the initial face image recognition model to obtain the predicted face features output by the model. If the predicted face features and the annotated face features satisfy a convergence condition (for example, the loss value between them is less than a loss threshold), the face recognition model is judged to have converged, and the trained face image recognition model may be used as the target face image recognition model.
In some embodiments, a loss function may be preconfigured for the initial face image recognition model. During training, the predicted face features and the annotated face features are used as input parameters of the loss function, the loss value output by the loss function is determined, and the loss value is then compared with a set loss threshold to determine whether the face image recognition model has converged. No limitation is imposed here.
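As a rough illustration of this check, the sketch below compares the loss between predicted and annotated features against a preset threshold; the tensor types, the `loss_fn` argument, and the threshold value are assumptions for illustration, not details fixed by the patent.

```python
import torch

def has_converged(predicted: torch.Tensor, annotated: torch.Tensor,
                  loss_fn, loss_threshold: float = 0.01) -> bool:
    """Return True when the loss between the predicted and the annotated
    face features falls below the configured threshold."""
    loss_value = loss_fn(predicted, annotated)
    return loss_value.item() < loss_threshold
```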
In this embodiment, a sample face image and its annotated face features are obtained, the sample scene category corresponding to the sample face image is extracted, the scene edge features corresponding to the sample scene category are obtained, and an initial face image recognition model is trained according to the sample face image, the annotated face features, and the scene edge features to obtain a target face image recognition model. This effectively improves the target model's ability to represent face image features under different scenes, and effectively improves the accuracy and reliability of face image recognition.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.
As shown in fig. 2, the training method of the face image recognition model includes:
S201: And acquiring a sample face image and an annotated face feature corresponding to the sample face image.
S202: and extracting a sample scene category corresponding to the sample face image.
For the description of S201-S202, reference may be made to the above embodiments, which are not described herein again.
S203: and inputting the sample face image into an initial face image recognition model to obtain the predicted face features output by the face image recognition model.
This can be described with reference to fig. 3, which is a schematic flow chart of a training method of a face image recognition model in an embodiment of the present disclosure. A sample face image sequence may be composed of an acquired sample face image 1, sample face image 2, and sample face image 3. The sample face image sequence is input into the initial face image recognition model to obtain the predicted face features output by the model for each sample face image (for example, predicted face feature 1 corresponding to sample face image 1, predicted face feature 2 corresponding to sample face image 2, and predicted face feature 3 corresponding to sample face image 3). Normalization processing may then be performed on predicted face features 1, 2, and 3 to obtain the normalized predicted face features.
For example, each sample face image in the sample face image sequence may be input into the initial face image recognition model in parallel to obtain a plurality of predicted face features predicted by the face image recognition model, and then, normalization processing may be performed on the plurality of predicted face features to obtain normalized predicted face features (the normalized predicted face features may be represented by a matrix F with a dimension M × N), where M is the number of sample face images and N is the dimension of the face features.
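A minimal sketch of this step follows, assuming PyTorch and L2 normalization; the patent says only "normalization processing", so the choice of norm is an assumption.

```python
import torch
import torch.nn.functional as nnf

def predict_normalized_features(model: torch.nn.Module,
                                images: torch.Tensor) -> torch.Tensor:
    """Run a batch of M sample face images through the recognition model
    and L2-normalize each row, yielding the M x N matrix F described
    above (M sample images, N-dimensional face features)."""
    features = model(images)                  # shape (M, N)
    return nnf.normalize(features, p=2, dim=1)
```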
S204: and processing the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category.
After the sample face image is input into the initial face image recognition model to obtain the predicted face features output by the model, the predicted face features can be processed according to the recognized sample scene category so that the scene category is integrated into the features. This facilitates modeling and learning: the face recognition model can effectively recognize the difference features between predicted face features under different scene categories, improving its ability to learn and model those differences.
When the predicted face features are processed according to the sample scene categories, weighted fusion processing may be performed on the sample scene categories and the predicted face features, or fusion processing may be performed in any other possible manner, which is not limited to this.
Optionally, in some embodiments, as shown in fig. 4, fig. 4 is a schematic diagram according to a third embodiment of the present disclosure, where the processing the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category includes:
S401: Scene image features corresponding to the sample scene category are determined.
After the sample face image is input into the initial face image recognition model to obtain the predicted face features output by the face image recognition model, the scene image features corresponding to the sample scene category can be determined.
Image features that can be used to characterize a scene category may be referred to as scene image features. For example, the scene image features corresponding to the elderly people scene category may be image features characterizing massage chairs or walking sticks, and the scene image features corresponding to the children scene category may be image features characterizing red scarves or amusement parks. No limitation is imposed here.
The scene image features corresponding to the sample scene categories may be obtained by labeling in advance.
In the embodiment of the present disclosure, the scene image features corresponding to the sample scene categories may be obtained by performing a certain conversion processing on the scene edge features corresponding to the sample scene categories.
For example, the present embodiment may be specifically described with reference to fig. 3, as shown in fig. 3, a random initialization process may be performed on scene edge features corresponding to sample scene categories to obtain scene image features (the scene image features may be represented by a matrix w1 with a dimension N × S), where N is the dimension of the scene image features, and S is the number of the scene categories, and then, a normalization process may be performed on the matrix w1 to obtain normalized scene image features.
S402: and fusing the scene image characteristics and the predicted human face characteristics to obtain fused image characteristics.
After the sample face image is input into the initial face image recognition model to obtain the predicted face features output by the face image recognition model and the scene image features corresponding to the scene category are determined, the predicted face features and the scene image features can be subjected to fusion processing to obtain the fusion image features.
For example, the embodiment may be specifically described with reference to fig. 3, as shown in fig. 3, the predicted face image feature F after normalization and the scene image feature w1 obtained after normalization may be multiplied to implement fusion processing of the predicted face image feature and the scene image feature to obtain a fused image feature (the fused image feature may be represented by a matrix T), or any other possible manner may be adopted to perform fusion processing on the predicted face image feature and the scene image feature, which is not limited in this regard.
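A sketch of this branch under the same PyTorch assumptions, covering both the random initialization of w1 described above and the multiplication; the illustrative shapes and the column-wise normalization axis are assumptions, since the patent fixes only the dimensions N x S.

```python
import torch
import torch.nn.functional as nnf

N, S = 512, 4  # illustrative: feature dimension N, number of scene categories S

# Randomly initialized, trainable scene image features w1 (dimension N x S).
w1 = torch.nn.Parameter(torch.randn(N, S))

def fuse_scene_features(F: torch.Tensor, w1: torch.Tensor) -> torch.Tensor:
    """Multiply the normalized predicted face features F (M x N) with the
    normalized scene image features w1 (N x S) to obtain the fused image
    features T (M x S)."""
    w1_norm = nnf.normalize(w1, p=2, dim=0)  # column-wise normalization assumed
    return F @ w1_norm
```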
S403: and generating scene edge characteristics corresponding to the sample scene categories according to the fusion image characteristics.
After the scene image features and the predicted face features are fused to obtain the fused image features, the scene edge features corresponding to the sample scene categories can be generated according to the fused image features.
For example, the fused image features may be input into a pre-trained scene edge feature determination model to obtain the scene edge features output by the scene edge feature determination model.
In this embodiment, the scene image features corresponding to the sample scene category are determined, the scene image features and the predicted face features are fused to obtain the fused image features, and the scene edge features corresponding to the sample scene category are generated from the fused image features. Because the scene edge features are obtained by fusing the scene image features with the predicted face features, the modeling and representation capacity of the target face image recognition model for different scenes is improved to a great extent, while the efficiency of optimizing the scene edge features is also improved.
Optionally, in some embodiments, when generating the scene edge features corresponding to the sample scene category according to the fused image features, an angle relation value corresponding to the fused image features may be determined, and the scene edge features may then be generated according to the fused image features and the angle relation value.
After the scene image features and the predicted face image features are fused, the angle relation value corresponding to the fused image features can be determined.
The angle value used to describe the relationship between the scene image feature and the predicted face image feature may be referred to as an angle relationship value.
In some embodiments, the corresponding angle relationship module may be configured in advance for the model, and then the fused image feature is input to the angle relationship module to obtain the angle relationship value output by the angle relationship module and corresponding to the fused image feature, or the angle relationship value corresponding to the fused image feature may also be determined in any other possible manner, which is not limited to this.
After the fusion image features are obtained by fusing the scene image features and the predicted face image features and the angle relation values corresponding to the fusion image features are obtained, the scene edge features can be generated according to the fusion image features and the angle relation values.
For example, the embodiment may be described in detail with reference to fig. 3. As shown in fig. 3, a mathematical operation may be performed on the fused image features and the angle relation value to obtain the scene edge features. The calculation can be written as soft-margin = max(arccos(T), dim), where T is the fused image features, arccos(T) is the angle relation value corresponding to the fused image features, dim denotes the first dimension along which the maximum is taken, the soft-margin value is the resulting matrix of maxima, and this soft-margin value is used as the scene edge features.
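A minimal PyTorch sketch of this step follows. Since the original formula is partially garbled, reading "first dimension" as the batch axis (dim=0) and taking the maximum over the angles arccos(T) rather than over T itself are both assumptions.

```python
import torch

def scene_edge_features(T: torch.Tensor) -> torch.Tensor:
    """soft-margin = max(arccos(T), dim): convert the fused image
    features T (M x S) to angles, then reduce with a maximum along the
    first dimension to get one edge value per scene category."""
    angles = torch.acos(T.clamp(-1.0, 1.0))  # clamp keeps acos in its domain
    soft_margin, _ = angles.max(dim=0)       # shape (S,)
    return soft_margin
```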
In this embodiment, the angle relation value corresponding to the fused image features is determined, and the scene edge features are generated according to the fused image features and the angle relation value. Generating the scene edge features by combining the two effectively improves generation efficiency and, at the same time, effectively improves the model's capacity to represent scene edge features under different scenes, which in turn assists in improving the face recognition effect of the target face image recognition model.
Of course, any other possible manner may also be adopted to generate the scene edge feature corresponding to the sample scene category according to the fused image feature, such as a modeling manner, an engineering manner, and the like, which is not limited herein.
S205: a category feature corresponding to the sample scene category is determined.
After the scene edge features are generated according to the fused image features and the angle relation values, the category features corresponding to the sample scene categories can be determined.
For example, the embodiment may be described in detail with reference to fig. 3, as shown in fig. 3, a random initialization process may be performed on a sample scene category to obtain a category feature corresponding to the sample scene category (the category feature may be represented by a matrix w2 with a dimension N × C), where N is the dimension of the category feature, and C is the number of categories, and then, a normalization process may be performed on a w2 matrix to obtain a normalized category feature.
S206: and fusing the category characteristics and the predicted face characteristics to obtain target predicted face characteristics.
After the category characteristics corresponding to the sample scene categories are determined, the category characteristics and the predicted face characteristics can be fused to obtain the target predicted face characteristics.
For example, the embodiment may be specifically described with reference to fig. 3, as shown in fig. 3, the predicted face image feature F after normalization and the category feature w2 obtained after normalization may be multiplied to implement fusion processing of the predicted face image feature and the category feature, so as to obtain a target predicted face feature (the target predicted face feature may be represented by a matrix M), or any other possible manner may be adopted to perform fusion processing on the predicted face image feature and the category feature, which is not limited thereto.
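The category branch mirrors the scene branch; a sketch under the same PyTorch assumptions follows, where the illustrative value of C and the normalization axis are again assumptions.

```python
import torch
import torch.nn.functional as nnf

N, C = 512, 1000  # illustrative: feature dimension N, number of categories C

# Randomly initialized, trainable category features w2 (dimension N x C).
w2 = torch.nn.Parameter(torch.randn(N, C))

def target_predicted_features(F: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    """Multiply the normalized predicted face features F (M x N) with the
    normalized category features w2 (N x C) to obtain the target predicted
    face features M (M x C)."""
    w2_norm = nnf.normalize(w2, p=2, dim=0)
    return F @ w2_norm
```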
In the embodiment of the disclosure, the category characteristics corresponding to the category of the sample scene are determined; the category characteristics and the predicted face characteristics are fused to obtain the target predicted face characteristics, so that the influence of scene categories on the face recognition accuracy can be effectively reduced, the face recognition accuracy and reliability can be effectively improved in an auxiliary mode, the recognition application scene of a face image recognition model is effectively expanded, and the applicability of the face image recognition model is improved.
S207: and determining loss values among the target prediction face features, the scene edge features and the labeled face features.
In some embodiments, as shown in fig. 3, a loss function may be preconfigured for the model. During training, the target predicted face features, the scene edge features, and the annotated face features are used as input parameters of the loss function, the loss value output by the loss function is determined, and the loss value is then compared with a set loss threshold to determine whether the face image recognition model has converged.
In the embodiments of the present disclosure, the optimized target predicted face features and the scene edge features soft-margin may be input into the loss function to obtain the loss value it outputs. The loss is calculated as: loss = softmax(cos(arccos(M) + soft-margin)), where M is the target predicted face features. The descent gradient of the model can be computed from the loss function; the optimizer is then updated with the computed gradient, which is applied to the model parameters to obtain updated parameters, and the training of the face image recognition model can be supervised with reference to the updated parameters.
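A sketch of this loss and update step under the same PyTorch assumptions follows. The patent writes the loss as a softmax expression without naming the pairing with labels; pairing the margin-adjusted logits with identity labels via cross-entropy, indexing the per-scene margin by each sample's scene category, and the scale factor are all assumptions.

```python
import torch
import torch.nn.functional as nnf

def margin_loss(M: torch.Tensor, soft_margin: torch.Tensor,
                scene_ids: torch.Tensor, labels: torch.Tensor,
                scale: float = 64.0) -> torch.Tensor:
    """loss = softmax(cos(arccos(M) + soft-margin)), paired with labels.

    M: (batch, C) target predicted face features; soft_margin: (S,)
    scene edge features; scene_ids: (batch,) scene category per sample.
    """
    angles = torch.acos(M.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
    margin = soft_margin[scene_ids].unsqueeze(1)  # (batch, 1), broadcast over C
    logits = torch.cos(angles + margin)
    return nnf.cross_entropy(scale * logits, labels)  # softmax + NLL

# Illustrative update step: compute the descent gradient and let the
# optimizer apply it to the model parameters.
# loss = margin_loss(M, soft_margin, scene_ids, labels)
# loss.backward()
# optimizer.step(); optimizer.zero_grad()
```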
S208: and if the loss value is smaller than the loss threshold value, taking the face image recognition model obtained by training as a target face image recognition model.
For example, if the loss value is smaller than the set loss threshold, it may be determined that the loss value satisfies the set condition, or the set condition may be configured as any other possible condition, which is not limited.
Therefore, in this embodiment, by determining the loss value among the target predicted face features, the scene edge features, and the annotated face features, and by using the trained face image recognition model as the target face image recognition model when the loss value is less than the loss threshold, the point of convergence can be judged accurately. This avoids feature differences affecting the accuracy of that judgment, effectively improving both the accuracy of convergence judgment and the training effect of the model.
In this embodiment, a sample face image and its annotated face features are obtained. After the sample face image is input into the initial face image recognition model to obtain the predicted face features, the predicted face features are processed according to the recognized sample scene category so that the scene category is integrated into the features, facilitating modeling and learning; the face recognition model can thus effectively recognize the difference features between predicted face features under different scene categories, improving its ability to learn and model those differences. The category features corresponding to the sample scene category are determined and fused with the predicted face features to obtain the target predicted face features, which effectively reduces the influence of scene categories on face recognition, improves its accuracy and reliability, and effectively improves the recognition effect of the face image recognition model. By determining the loss value among the target predicted face features, the scene edge features, and the annotated face features, and by using the trained model as the target face image recognition model when the loss value is less than the loss threshold, the point of convergence can be judged accurately, avoiding the influence of feature differences on that judgment and improving the training effect of the model.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present disclosure.
As shown in fig. 5, the face image recognition method includes:
S501: And acquiring a face image.
The number of the face images may be one or more, and the face images may also be partial frame video images extracted from a plurality of video frames, which is not limited to this.
S502: And inputting the face image into the target face image recognition model obtained by training with the above training method for a face image recognition model, so as to obtain the target face features output by the target face image recognition model.
After the face image is obtained, the face image can be input into the target face image recognition model obtained by the training method of the face image recognition model, so as to obtain the target face feature output by the target face image recognition model.
In the embodiment, the face image is obtained and input into the target face image recognition model obtained by the training method of the face image recognition model, so as to obtain the target face feature output by the target face image recognition model.
Fig. 6 is a schematic diagram according to a fifth embodiment of the present disclosure.
As shown in fig. 6, the training apparatus 60 for face image recognition model includes:
a first obtaining module 601, configured to obtain a sample face image and an annotated face feature corresponding to the sample face image;
an extracting module 602, configured to extract a sample scene category corresponding to the sample face image;
a second obtaining module 603, configured to obtain a scene edge feature corresponding to the sample scene category; and
the training module 604 is configured to train an initial face image recognition model according to the sample face image, the labeled face features, and the scene edge features to obtain a target face image recognition model.
In some embodiments of the present disclosure, as shown in fig. 7, fig. 7 is a schematic diagram of a training apparatus 70 for a face image recognition model according to a sixth embodiment of the present disclosure, which includes: the system comprises a first obtaining module 701, an extracting module 702, a second obtaining module 703 and a training module 704, wherein the second obtaining module 703 comprises:
an input sub-module 7031, configured to input the sample face image into an initial face image recognition model to obtain a predicted face feature output by the face image recognition model;
and the processing sub-module 7032 is configured to process the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category.
In some embodiments of the present disclosure, among others, processing submodule 7032 includes:
a determining unit 70321 for determining scene image features corresponding to the sample scene category;
a fusion unit 70322, configured to fuse the scene image feature and the predicted face feature to obtain a fused image feature;
a generating unit 70323, configured to generate a scene edge feature corresponding to the sample scene type according to the fused image feature.
In some embodiments of the present disclosure, the generating unit 70323 is specifically configured to:
determining an angle relation value corresponding to the fusion image characteristic;
and generating scene edge characteristics according to the fusion image characteristics and the angle relation value.
In some embodiments of the present disclosure, further comprising:
a determining module 705, configured to determine a category feature corresponding to a category of the sample scene;
and a fusion module 706, configured to fuse the category features and the predicted face features to obtain target predicted face features.
In some embodiments of the present disclosure, the training module 704 is specifically configured to:
determining target prediction face features, scene edge features and loss values among labeled face features;
and if the loss value is smaller than the loss threshold value, taking the face image recognition model obtained by training as a target face image recognition model.
It is understood that the training apparatus 70 of the facial image recognition model in fig. 7 of the present embodiment and the training apparatus 60 of the facial image recognition model in the above-mentioned embodiment, the first obtaining module 701 and the first obtaining module 601 in the above-mentioned embodiment, the extracting module 702 and the extracting module 602 in the above-mentioned embodiment, the second obtaining module 703 and the second obtaining module 603 in the above-mentioned embodiment, and the training module 704 and the training module 604 in the above-mentioned embodiment may have the same functions and structures.
It should be noted that the explanation of the aforementioned training method for the face image recognition model is also applicable to the training apparatus for the face image recognition model of the present embodiment.
In this embodiment, a sample face image and its annotated face features are obtained, the sample scene category corresponding to the sample face image is extracted, the scene edge features corresponding to the sample scene category are obtained, and an initial face image recognition model is trained according to the sample face image, the annotated face features, and the scene edge features to obtain a target face image recognition model. This effectively improves the target model's ability to represent face image features under different scenes, and effectively improves the accuracy and reliability of face image recognition.
Fig. 8 is a schematic diagram according to a seventh embodiment of the present disclosure.
As shown in fig. 8, the face image recognition apparatus 80 includes:
a third obtaining module 801, configured to obtain a face image;
the input module 802 is configured to input the face image into a target face image recognition model obtained by training with the training apparatus of the face image recognition model, so as to obtain a target face feature output by the target face image recognition model.
It should be noted that the explanation of the face image recognition method is also applicable to the face image recognition apparatus of the present embodiment, and is not repeated herein.
In the embodiment, the face image is obtained and input into the target face image recognition model obtained by the training method of the face image recognition model, so as to obtain the target face feature output by the target face image recognition model.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device that may be used to implement the training method of the face image recognition model of an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the above-described methods and processes, such as a training method of a face image recognition model or a face image recognition method. For example, in some embodiments, the training method of the face image recognition model, or the face image recognition method, may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908.
In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communications unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, the above-described training method of the face image recognition model, or one or more steps of the face image recognition method, may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform a training method of a face image recognition model, or a face image recognition method, in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; by way of example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions provided by this disclosure can be achieved, which are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A training method of a face image recognition model comprises the following steps:
acquiring a sample face image and an annotated face feature corresponding to the sample face image;
extracting a sample scene category corresponding to the sample face image;
acquiring scene edge features corresponding to the sample scene categories; and
and training an initial face image recognition model according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model.
2. The method of claim 1, wherein the obtaining scene edge features corresponding to the sample scene category comprises:
inputting the sample face image into the initial face image recognition model to obtain a predicted face feature output by the face image recognition model;
and processing the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category.
3. The method of claim 2, wherein the processing the predicted face features according to the sample scene class to obtain scene edge features corresponding to the sample scene class comprises:
determining scene image features corresponding to the sample scene category;
fusing the scene image features and the predicted face features to obtain fused image features;
and generating scene edge characteristics corresponding to the sample scene categories according to the fusion image characteristics.
4. The method of claim 3, wherein the generating the scene edge features corresponding to the sample scene category according to the fused image features comprises:
determining an angle relation value corresponding to the fused image features; and
generating the scene edge features according to the fused image features and the angle relation value.
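The claim does not define the angle relation value; the sketch below reads it as the angle between the fused feature and a reference weight vector, in the spirit of additive-angular-margin losses such as ArcFace. The margin value and the reference weight are assumptions:

import torch
import torch.nn.functional as F

def scene_edge_features(fused_feat: torch.Tensor,
                        class_weight: torch.Tensor,
                        margin: float = 0.5) -> torch.Tensor:
    # Angle relation value: cosine of the angle between the fused feature
    # and a reference class weight (an assumed reading of the claim).
    f = F.normalize(fused_feat, dim=-1)
    w = F.normalize(class_weight, dim=-1)
    cos_theta = (f * w).sum(dim=-1).clamp(-1.0, 1.0)
    theta = torch.acos(cos_theta)
    # Derive the edge feature by rescaling the fused feature with the
    # margin-shifted cosine, in the style of additive angular margins.
    return fused_feat * torch.cos(theta + margin).unsqueeze(-1)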
5. The method of claim 2, further comprising, before training the initial face image recognition model according to the sample face image, the annotated face features, and the scene edge features to obtain the target face image recognition model:
determining a category feature corresponding to the sample scene category; and
fusing the category feature and the predicted face features to obtain target predicted face features.
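Here too the fusion is left open by the claim; a minimal sketch assuming simple concatenation along the feature axis:

import torch

def target_predicted_features(category_feat: torch.Tensor,
                              pred_face_feat: torch.Tensor) -> torch.Tensor:
    # Concatenation is one assumed way to "fuse" the scene-category
    # feature with the predicted face features.
    return torch.cat([category_feat, pred_face_feat], dim=-1)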
6. The method of claim 5, wherein the training the initial face image recognition model according to the sample face image, the annotated face features, and the scene edge features to obtain the target face image recognition model comprises:
determining a loss value among the target predicted face features, the scene edge features, and the annotated face features; and
if the loss value is smaller than a loss threshold, taking the trained face image recognition model as the target face image recognition model.
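A minimal sketch of the claimed stopping rule; the combined loss and the threshold value are illustrative assumptions, since the claim only requires some loss over the three feature sets and some threshold:

import torch
import torch.nn.functional as F

LOSS_THRESHOLD = 1e-3  # hypothetical value

def joint_loss(target_pred_feat: torch.Tensor,
               scene_edge_feat: torch.Tensor,
               annotated_feat: torch.Tensor) -> torch.Tensor:
    # One plausible loss over the three feature sets named in the claim.
    return (F.mse_loss(target_pred_feat, annotated_feat)
            + F.mse_loss(scene_edge_feat, annotated_feat))

def training_converged(loss: torch.Tensor) -> bool:
    # The current model is taken as the target model once the loss
    # falls below the threshold.
    return loss.item() < LOSS_THRESHOLD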
7. A face image recognition method, comprising:
acquiring a face image; and
inputting the face image into a target face image recognition model obtained by the training method of a face image recognition model according to any one of claims 1-6, to obtain target face features output by the target face image recognition model.
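Once trained, recognition reduces to a forward pass; a sketch assuming the model is a PyTorch module and the face image has already been preprocessed into a tensor:

import torch

def extract_target_face_features(model: torch.nn.Module,
                                 face_image: torch.Tensor) -> torch.Tensor:
    # Forward pass of the trained target model on a face image tensor of
    # shape (N, 3, H, W); gradients are disabled for inference.
    model.eval()
    with torch.no_grad():
        return model(face_image)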
8. A training apparatus for a face image recognition model, comprising:
a first acquisition module configured to acquire a sample face image and annotated face features corresponding to the sample face image;
an extraction module configured to extract a sample scene category corresponding to the sample face image;
a second acquisition module configured to acquire scene edge features corresponding to the sample scene category; and
a training module configured to train an initial face image recognition model according to the sample face image, the annotated face features, and the scene edge features to obtain a target face image recognition model.
9. The apparatus of claim 8, wherein the second acquisition module comprises:
an input submodule configured to input the sample face image into the initial face image recognition model to obtain predicted face features output by the initial face image recognition model; and
a processing submodule configured to process the predicted face features according to the sample scene category to obtain the scene edge features corresponding to the sample scene category.
10. The apparatus of claim 9, wherein the processing submodule comprises:
a determining unit configured to determine scene image features corresponding to the sample scene category;
a fusion unit configured to fuse the scene image features and the predicted face features to obtain fused image features; and
a generating unit configured to generate the scene edge features corresponding to the sample scene category according to the fused image features.
11. The apparatus of claim 10, wherein the generating unit is specifically configured to:
determine an angle relation value corresponding to the fused image features; and
generate the scene edge features according to the fused image features and the angle relation value.
12. The apparatus of claim 9, further comprising:
a determining module configured to determine a category feature corresponding to the sample scene category; and
a fusion module configured to fuse the category feature and the predicted face features to obtain target predicted face features.
13. The apparatus of claim 12, wherein the training module is specifically configured to:
determine a loss value among the target predicted face features, the scene edge features, and the annotated face features; and
if the loss value is smaller than a loss threshold, take the trained face image recognition model as the target face image recognition model.
14. A face image recognition apparatus, comprising:
a third acquisition module configured to acquire a face image; and
an input module configured to input the face image into a target face image recognition model obtained by the training apparatus for a face image recognition model according to any one of claims 8-13, to obtain target face features output by the target face image recognition model.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6 or to perform the method of claim 7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6 or to perform the method of claim 7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-6 or performs the method of claim 7.
CN202110604787.2A 2021-05-31 2021-05-31 Training method, device, equipment and storage medium for face image recognition model Active CN113361363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110604787.2A CN113361363B (en) 2021-05-31 2021-05-31 Training method, device, equipment and storage medium for face image recognition model

Publications (2)

Publication Number Publication Date
CN113361363A (en) 2021-09-07
CN113361363B CN113361363B (en) 2024-02-06

Family

ID=77530636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110604787.2A Active CN113361363B (en) 2021-05-31 2021-05-31 Training method, device, equipment and storage medium for face image recognition model

Country Status (1)

Country Link
CN (1) CN113361363B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901904A (en) * 2021-09-29 2022-01-07 北京百度网讯科技有限公司 Image processing method, face recognition model training method, device and equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170169295A1 (en) * 2015-12-15 2017-06-15 Samsung Electronics Co., Ltd. Method, storage medium and electronic apparatus for providing service associated with image
WO2020024744A1 (en) * 2018-08-01 2020-02-06 Oppo广东移动通信有限公司 Image feature point detecting method, terminal device, and storage medium
CN111259936A (en) * 2020-01-09 2020-06-09 北京科技大学 Image semantic segmentation method and system based on single pixel annotation
CN112348117A (en) * 2020-11-30 2021-02-09 腾讯科技(深圳)有限公司 Scene recognition method and device, computer equipment and storage medium
CN112560698A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Image processing method, apparatus, device and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Donghang; Zhang Baoming; Zhao Chuan; Guo Haitao; Lu Jun: "Remote Sensing Image Scene Classification Combining Convolutional Neural Network and Ensemble Learning", Journal of Remote Sensing, no. 06 *

Also Published As

Publication number Publication date
CN113361363B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN113642431B (en) Training method and device of target detection model, electronic equipment and storage medium
CN114187624B (en) Image generation method, device, electronic equipment and storage medium
CN114092759A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN112818227B (en) Content recommendation method and device, electronic equipment and storage medium
CN113177449B (en) Face recognition method, device, computer equipment and storage medium
CN113657269A (en) Training method and device for face recognition model and computer program product
CN113361572A (en) Training method and device of image processing model, electronic equipment and storage medium
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113407850A (en) Method and device for determining and acquiring virtual image and electronic equipment
CN113177466A (en) Identity recognition method and device based on face image, electronic equipment and medium
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN113379877A (en) Face video generation method and device, electronic equipment and storage medium
CN113947189A (en) Training method and device for image generation model, electronic equipment and storage medium
CN113361363B (en) Training method, device, equipment and storage medium for face image recognition model
CN113011309A (en) Image recognition method, apparatus, device, medium, and program product
CN114972910B (en) Training method and device for image-text recognition model, electronic equipment and storage medium
CN114267375B (en) Phoneme detection method and device, training method and device, equipment and medium
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN115393488A (en) Method and device for driving virtual character expression, electronic equipment and storage medium
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium
CN114863450A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114037052A (en) Training method and device for detection model, electronic equipment and storage medium
CN114445668A (en) Image recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant