CN113361363B - Training method, device, equipment and storage medium for face image recognition model - Google Patents

Training method, device, equipment and storage medium for face image recognition model

Info

Publication number
CN113361363B
CN113361363B (application CN202110604787.2A)
Authority
CN
China
Prior art keywords
face image
face
scene
features
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110604787.2A
Other languages
Chinese (zh)
Other versions
CN113361363A (en)
Inventor
杨馥魁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110604787.2A
Publication of CN113361363A
Application granted
Publication of CN113361363B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The disclosure provides a training method, device, equipment and storage medium for a face image recognition model, and relates to the technical field of artificial intelligence, in particular to technical fields such as deep learning and computer vision; it can be applied in smart city scenarios. The specific implementation scheme is as follows: obtaining a sample face image and labeled face features corresponding to the sample face image; extracting a sample scene category corresponding to the sample face image; acquiring scene edge features corresponding to the sample scene category; and training an initial face image recognition model according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model. In this way, the target face image recognition model's ability to recognize and characterize face image features under different scenes can be effectively improved, and the accuracy and reliability of face image recognition are effectively improved.

Description

Training method, device, equipment and storage medium for face image recognition model
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to technical fields such as deep learning and computer vision; it can be applied in smart city scenarios, and specifically concerns a training method, device, equipment and storage medium for a face image recognition model.
Background
Artificial intelligence is the discipline that studies how to make a computer mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking and planning), and it spans both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing and knowledge graph technologies.
Face image recognition models in the related art show poor recognition accuracy when recognizing face images of different scene categories.
Disclosure of Invention
The present disclosure provides a training method for a face image recognition model, a face image recognition method, corresponding apparatuses, an electronic device, a storage medium and a computer program product.
According to a first aspect of the present disclosure, there is provided a training method of a face image recognition model, including: obtaining a sample face image and labeled face features corresponding to the sample face image; extracting a sample scene category corresponding to the sample face image; acquiring scene edge features corresponding to the sample scene category; and training an initial face image recognition model according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model.
According to a second aspect of the present disclosure, there is provided a face image recognition method, including: acquiring a face image; and inputting the face image into a target face image recognition model obtained by training the training method of the face image recognition model so as to obtain target face characteristics output by the target face image recognition model.
According to a third aspect of the present disclosure, there is provided a training apparatus for a face image recognition model, including: the first acquisition module is used for acquiring a sample face image and labeled face features corresponding to the sample face image; the extraction module is used for extracting a sample scene category corresponding to the sample face image; the second acquisition module is used for acquiring scene edge features corresponding to the sample scene category; and the training module is used for training an initial face image recognition model according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model.
According to a fourth aspect of the present disclosure, there is provided a face image recognition apparatus, comprising: the third acquisition module is used for acquiring the face image; the input module is used for inputting the face image into the target face image recognition model obtained by training by the training device of the face image recognition model so as to obtain the target face characteristics output by the target face image recognition model.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method of the face image recognition model as described in the first aspect or to perform the face image recognition method as described in the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method of the face image recognition model as described in the first aspect, or to perform the face image recognition method as described in the second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the training method of the face image recognition model as described in the first aspect, or performs the face image recognition method as described in the second aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a flow chart of a training method of a face image recognition model in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a seventh embodiment of the present disclosure; and
FIG. 9 illustrates a schematic block diagram of an example electronic device that may be used to implement the training method of the face image recognition model of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.
It should be noted that the training device of the face image recognition model may be implemented by software and/or hardware, and the device may be configured in an electronic device, where the electronic device may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the disclosure relates to the technical field of artificial intelligence, in particular to the technical field of computer vision and deep learning, and can be applied to smart city scenes.
Artificial intelligence (abbreviated AI) is a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during such learning helps interpret data such as text, images and sounds. The final goal of deep learning is to enable a machine to analyze and learn like a person, and to recognize text, image and sound data.
Computer vision means using a camera and a computer instead of human eyes to identify, track and measure targets and perform other machine vision tasks, and further performing graphic processing so that the result becomes an image more suitable for human eyes to observe or for transmission to an instrument for detection.
A smart city scenario refers to a new concept and mode that applies new-generation information technologies such as the Internet of Things, cloud computing and big data to make city planning, construction, management and services intelligent; it is an advanced form of city informatization in which the new-generation information technologies are fully applied across a city's industries.
In this embodiment, the execution body of the training method of the face image recognition model may obtain the sample face image in various public and legal manners, for example, from a public sample face image set, or from the user after the user's authorization. The sample face image does not reflect the personal information of any particular user.
As shown in fig. 1, the training method of the face image recognition model includes:
S101: And obtaining a sample face image and labeled face features corresponding to the sample face image.
The number of sample face images may be one or more. A sample face image may be obtained by photographing a user's face, or by parsing a video; for example, sample face images may be partial frame images extracted from the multiple video frames contained in a video, which is not limited.
It should be noted that, in the embodiments of the present disclosure, the sample face images are obtained under the condition of meeting the related laws and regulations.
In the training process of the face image recognition model, the face features used as the reference for determining the convergence timing of the model (i.e., for determining whether the model meets the standard) may be referred to as labeled face features. The face features may be, for example, facial-organ features, skin color features, face shape features and the like; the facial-organ features may specifically be, for example, the positions of the facial organs and the relative distances between them, which is not limited.
After the sample face image is obtained, image feature analysis may be performed on the sample face image in real time to determine the labeled face features corresponding to it. For example, local image areas (such as the facial-organ areas, skin areas and facial contour area) may be identified from the sample face image, and image feature analysis may then be performed on those local areas to determine the facial-organ features, skin color features, face shape features and the like as the labeled face features, which is not limited.
S102: and extracting a sample scene category corresponding to the sample face image.
After the sample face image is obtained, the sample scene category corresponding to the sample face image can be extracted.
The scene category describing the scene corresponding to the sample face image may be referred to as the sample scene category; scene categories include, for example, a scene category with a mask and a scene category without a mask.
In some embodiments, when the sample scene category corresponding to the sample face image is extracted, the sample scene category corresponding to the sample face image may be analyzed by obtaining facial features, accessory features, or skin color features of a face in the sample face image, and then using the facial features, accessory features, or skin color features of the face, which is not limited.
For example, face region recognition may be performed on the sample face image to determine whether the face is occluded by a mask; if a mask is recognized, the sample scene category is determined as the scene category with a mask, and if no mask is recognized, the sample scene category is determined as the scene category without a mask.
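As an illustration of the mask-based rule above, the following minimal sketch maps an upstream mask-detection result to a sample scene category; the function name, the mask_detected flag and the category labels are hypothetical and not part of the disclosure:

```python
def extract_sample_scene_category(mask_detected: bool) -> str:
    """Map a mask-detection result to a sample scene category."""
    # mask_detected is assumed to come from an upstream face-region
    # detector; the label strings are purely illustrative.
    return "scene_with_mask" if mask_detected else "scene_without_mask"

print(extract_sample_scene_category(True))   # scene_with_mask
print(extract_sample_scene_category(False))  # scene_without_mask
```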
The sample scene category may also be, for example, an elderly scene category, a child scene category, a credential-photo scene category, and so forth.
In the embodiment of the disclosure, the sample scene category corresponding to the sample face image is extracted during the training of the face image recognition model, so that the subsequent learning and modeling of face features by the model can be supported by the sample scene category. Taking the sample scene category into account during training and learning enables the trained target face image recognition model to effectively learn the scene features of face images, which assists and promotes the modeling and characterization of face features.
S103: and acquiring scene edge characteristics corresponding to the sample scene category.
After the sample scene category corresponding to the sample face image is extracted, the scene edge feature corresponding to the sample scene category may be obtained. The scene edge feature may be an edge feature that is related to the sample scene category corresponding to the sample face image and describes scene differences; it may embody the difference features between the features predicted by the face image recognition model in different scenes, for example, the distribution difference between the features predicted in different scenes, or any other difference feature between them, which is not limited.
For example, when the scene edge feature corresponding to the sample scene category is obtained, a first face feature corresponding to a face sample image of the elderly scene category and a second face feature corresponding to a face sample image of the child scene category may be determined; the difference feature between the first face feature and the second face feature under the two scene categories is then analyzed, and the relative difference feature between them is used as the scene edge feature related to the elderly scene category and the child scene category. Alternatively, the scene edge feature corresponding to the sample scene category may be obtained in any other possible manner, for example, in an engineering manner or through model prediction, which is not limited.
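The elderly/child example above can be sketched as follows, under the assumption that the difference feature is taken as the difference of the per-category mean predicted features; this specific computation is an assumption, since the disclosure leaves the exact difference analysis open:

```python
import numpy as np

def scene_difference_feature(first_feats: np.ndarray,
                             second_feats: np.ndarray) -> np.ndarray:
    """Relative difference feature between two scene categories.

    first_feats / second_feats: (num_samples, N) face features from,
    e.g., the elderly and child scene categories. Differencing the
    per-category means is one illustrative analysis only.
    """
    return first_feats.mean(axis=0) - second_feats.mean(axis=0)

elderly = np.random.randn(8, 128)   # first face features (elderly)
child = np.random.randn(8, 128)     # second face features (child)
edge = scene_difference_feature(elderly, child)   # shape (128,)
```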
S104: training an initial face image recognition model according to the sample face image, the marked face features and the scene edge features to obtain a target face image recognition model.
After the scene edge features corresponding to the sample scene category are obtained, the initial face image recognition model can be trained according to the sample face image, the labeled face features and the scene edge features, so that the target face image recognition model can be obtained.
The face image recognition model obtained in the initial stage of training may be referred to as an initial face image recognition model, and the face image recognition model may be an artificial intelligent model, specifically, for example, a neural network model or a machine learning model, and of course, any other possible artificial intelligent model capable of performing a face image recognition task may be adopted, which is not limited.
For example, the sample face image, the labeled face features and the scene edge features may be input into the initial face image recognition model to obtain the predicted face features output by the initial face image recognition model. If the convergence condition is satisfied between the predicted face features and the labeled face features (for example, the loss value between the predicted face features and the labeled face features is smaller than a loss threshold), it is determined that the face recognition model has converged, and the face image recognition model obtained by training may be used as the target face image recognition model.
In some embodiments, a loss function may be preconfigured for the initial face image recognition model. In the process of training the initial face image recognition model, the predicted face features and the labeled face features are used as input parameters of the loss function, the loss value output by the loss function is determined, and the loss value is then compared with a set loss threshold to determine whether the face image recognition model has converged, which is not limited.
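The convergence check just described can be sketched as a training loop; model, loss_fn and optimizer are framework-agnostic placeholders and loss_threshold is an illustrative value, none of which come from the disclosure:

```python
def train_until_converged(model, loss_fn, optimizer, sample_images,
                          labeled_feats, scene_edge_feats,
                          loss_threshold=0.05, max_steps=100_000):
    """Train until the loss value falls below the set loss threshold."""
    for _ in range(max_steps):
        predicted_feats = model(sample_images)
        loss = loss_fn(predicted_feats, labeled_feats, scene_edge_feats)
        if loss < loss_threshold:   # convergence condition satisfied
            break                   # the trained model becomes the target model
        optimizer.step(loss)        # update the model parameters
    return model
```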
In this embodiment, the sample face image and the labeled face features corresponding to the sample face image are obtained, and the sample scene category corresponding to the sample face image is extracted; the scene edge features corresponding to the sample scene category are obtained, and an initial face image recognition model is trained according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model. This effectively improves the target face image recognition model's ability to recognize and characterize face image features under different scenes, as well as the accuracy and reliability of face image recognition.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.
As shown in fig. 2, the training method of the face image recognition model includes:
S201: and obtaining a sample face image and labeling face features corresponding to the sample face image.
S202: and extracting a sample scene category corresponding to the sample face image.
The descriptions of S201 to S202 may be specifically referred to the above embodiments, and are not repeated herein.
S203: and inputting the sample face image into an initial face image recognition model to obtain the predicted face characteristics output by the face image recognition model.
This embodiment may be described with reference to fig. 3, which is a schematic flow chart of the training method of the face image recognition model in this embodiment of the disclosure. As shown in fig. 3, the obtained sample face image 1, sample face image 2 and sample face image 3 may be formed into a sample face image sequence, and the sample face image sequence is input into the initial face image recognition model to obtain the predicted face features corresponding to each sample face image output by the model (for example, predicted face feature 1 corresponding to sample face image 1, predicted face feature 2 corresponding to sample face image 2, and predicted face feature 3 corresponding to sample face image 3); normalization processing may then be performed on predicted face feature 1, predicted face feature 2 and predicted face feature 3 to obtain the normalized predicted face features.
For example, the sample face images in the sample face image sequence may be input into the initial face image recognition model in parallel to obtain a plurality of predicted face features output by the face image recognition model, and normalization processing is then performed on the plurality of predicted face features to obtain the normalized predicted face features (the normalized predicted face features may be represented by a matrix F with dimensions M×N, where M is the number of sample face images and N is the dimension of the face features).
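A minimal sketch of this normalization step, assuming L2 (row-wise) normalization; the disclosure only says "normalization processing", but L2 normalization is consistent with the later steps reading F·w1 and F·w2 as cosine values:

```python
import numpy as np

def normalize_rows(features: np.ndarray) -> np.ndarray:
    """L2-normalize each row of the M x N predicted-feature matrix F."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)   # guard zero rows

raw = np.random.randn(3, 128)   # predicted features for 3 sample images
F = normalize_rows(raw)         # M x N, unit-length rows
assert np.allclose(np.linalg.norm(F, axis=1), 1.0)
```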
S204: and processing the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category.
In this way, after the sample face image is input into the initial face image recognition model to obtain the predicted face features output by the model, the predicted face features can be processed according to the recognized sample scene category, so that the sample scene category is integrated into the features. This facilitates the modeling and learning of the face recognition model, enabling it to effectively recognize the difference features among the face features predicted under different scene categories, thereby improving its ability to learn and model the difference features of face features under different scene categories.
The above-mentioned processing of the predicted face features according to the sample scene category may be a weighted fusion processing of the sample scene category and the predicted face features, or may be a fusion processing performed by any other possible method, which is not limited thereto.
Optionally, in some embodiments, as shown in fig. 4, fig. 4 is a schematic diagram of a third embodiment according to the present disclosure, where the processing the predicted face feature according to the sample scene category to obtain a scene edge feature corresponding to the sample scene category includes:
s401: scene image features corresponding to the sample scene category are determined.
After the sample face image is input into the initial face image recognition model to obtain the predicted face features output by the face image recognition model, the scene image features corresponding to the sample scene category can be determined.
Scene image features are image features that can characterize the scene category. For example, the scene image features corresponding to the elderly scene category may be image features characterizing a massage chair or a walking stick, and the scene image features corresponding to the child scene category may be image features characterizing a red scarf or a playground, which is not limited.
The scene image features corresponding to the sample scene category may be labeled in advance.
In the embodiment of the disclosure, the scene image features corresponding to the sample scene category may be obtained by performing a certain conversion process on the scene edge features corresponding to the sample scene category.
For example, this embodiment may be described with reference to fig. 3: a random initialization process may be performed on the scene edge feature corresponding to the sample scene category to obtain the scene image features (the scene image features may be represented by a matrix w1 with dimensions N×S, where N is the dimension of the scene image features and S is the number of scene categories); the w1 matrix may then be normalized to obtain the normalized scene image features.
S402: and fusing scene image features and predicted face features to obtain fused image features.
After the sample face image is input into the initial face image recognition model to obtain the predicted face features output by the model, and the scene image features corresponding to the sample scene category are determined, the predicted face features and the scene image features can be fused to obtain the fused image features.
For example, as shown in fig. 3, the fusion of the predicted face features and the scene image features may be implemented by multiplying the normalized predicted face features F by the normalized scene image features w1 to obtain the fused image features (the fused image features may be represented by the matrix T); alternatively, the fusion of the predicted face features and the scene image features may be implemented in any other possible manner, which is not limited.
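A sketch combining the initialization of S401 with the fusion of S402, assuming random-normal initialization and column normalization of w1 (the exact initialization and normalization are not fixed by the disclosure); with row-normalized F, every entry of T is then a cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(0)
M_samples, N, S = 3, 128, 2    # samples, feature dimension, scene categories

# Scene image features w1 (N x S): randomly initialized, then
# column-normalized, matching the description of S401.
w1 = rng.standard_normal((N, S))
w1 /= np.linalg.norm(w1, axis=0, keepdims=True)

# Row-normalized predicted face features F (M x N).
F = rng.standard_normal((M_samples, N))
F /= np.linalg.norm(F, axis=1, keepdims=True)

# Fused image feature T = F @ w1: cosine similarity of every sample
# against every scene category.
T = F @ w1                     # shape (M_samples, S), entries in [-1, 1]
```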
S403: and generating scene edge features corresponding to the sample scene categories according to the fused image features.
After the scene image features and the predicted face features are fused to obtain the fused image features, scene edge features corresponding to sample scene categories can be generated according to the fused image features.
For example, the fused image features may be input into a pre-trained scene edge feature determination model to obtain scene edge features output by the scene edge feature determination model.
In this embodiment, the scene image features and the predicted face features corresponding to the sample scene category are determined, and are fused to obtain the fused image features, and the scene edge features corresponding to the sample scene category are generated according to the fused image features.
Optionally, in some embodiments, generating the scene edge feature corresponding to the sample scene category according to the fused image feature may include: determining an angle relation value corresponding to the fused image feature; and generating the scene edge feature according to the fused image feature and the angle relation value.
After the scene image features and the predicted face image features are fused, the angle relation value corresponding to the fused image features can be determined.
Among other things, the angle values used to describe the relationship between scene image features and predicted face image features may be referred to as angle relationship values.
In some embodiments, a corresponding angular relationship module may be configured for the model in advance, and then the fused image feature is input to the angular relationship module to obtain an angular relationship value corresponding to the fused image feature output by the angular relationship module, or the angular relationship value corresponding to the fused image feature may be determined by any other possible manner, which is not limited.
After the scene image features and the predicted face image features are fused to obtain the fused image features and the angle relation values corresponding to the fused image features, scene edge features can be generated according to the fused image features and the angle relation values.
For example, as shown in fig. 3, a mathematical operation may be performed on the fused image feature and the angle relation value to obtain the scene edge feature, specifically calculated as follows: soft-margin = reduce_max(arccos(T), dim=0), where T is the fused image feature, arccos(T) is the angle relation value corresponding to the fused image feature, reduce_max takes the maximum along dimension dim=0 (the first dimension), and soft-margin is the scene edge feature; the soft-margin value calculated by the above formula may be used to represent the scene edge feature.
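A direct transcription of this formula in NumPy; the clipping of T before arccos is our addition, to guard against floating-point values just outside [-1, 1]:

```python
import numpy as np

def scene_soft_margin(T: np.ndarray) -> np.ndarray:
    """soft-margin = reduce_max(arccos(T), dim=0).

    T is the M x S fused image feature of cosine values; arccos maps
    each entry to an angle, and the maximum over dim 0 (the sample
    axis) yields one scene edge value per scene category.
    """
    angles = np.arccos(np.clip(T, -1.0, 1.0))   # clip guards rounding error
    return angles.max(axis=0)                    # shape (S,)

T = np.array([[0.9, 0.1],
              [0.7, 0.4],
              [0.8, -0.2]])
print(scene_soft_margin(T))   # one angle (radians) per scene category
```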
In this embodiment, the angle relation value corresponding to the fused image feature is determined, and the scene edge feature is generated according to the fused image feature and the angle relation value. Combining the fused image feature with the angle relation value to generate the scene edge feature effectively improves the generation efficiency of the scene edge feature, effectively improves the model's ability to express and model scene edge features under different scenes, and thus helps improve the face recognition effect of the target face image recognition model.
Of course, generating scene edge features corresponding to the sample scene category from the fused image features may be implemented in any other possible manner, such as modeling, engineering, etc., which is not limited.
S205: a category characteristic corresponding to the sample scene category is determined.
After generating the scene edge feature according to the fused image feature and the angle relation value, the category feature corresponding to the sample scene category can be determined.
For example, as shown in fig. 3, a random initialization process may be performed for the sample scene category to obtain the category features corresponding to the sample scene category (the category features may be represented by a matrix w2 with dimensions N×C, where N is the dimension of the category features and C is the number of categories); the w2 matrix may then be normalized to obtain the normalized category features.
S206: and fusing the category characteristics and the predicted face characteristics to obtain target predicted face characteristics.
After determining the category characteristics corresponding to the sample scene category, the category characteristics and the predicted face characteristics can be fused to obtain the target predicted face characteristics.
For example, as shown in fig. 3, the fusion of the predicted face features and the category features may be implemented by multiplying the normalized predicted face features F by the normalized category features w2 to obtain the target predicted face features (the target predicted face features may be represented by the matrix M); alternatively, the fusion of the predicted face features and the category features may be implemented in any other possible manner, which is not limited.
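A sketch of S205-S206 under the same assumptions as the w1 sketch above (random-normal initialization, column normalization); the target predicted face feature M then holds the cosine of every sample against every category:

```python
import numpy as np

rng = np.random.default_rng(1)
M_samples, N, C = 3, 128, 1000   # samples, feature dimension, categories

# Category features w2 (N x C): randomly initialized and
# column-normalized, per the description of S205.
w2 = rng.standard_normal((N, C))
w2 /= np.linalg.norm(w2, axis=0, keepdims=True)

# Row-normalized predicted face features F (M x N).
F = rng.standard_normal((M_samples, N))
F /= np.linalg.norm(F, axis=1, keepdims=True)

# Target predicted face features M = F @ w2 (S206).
M = F @ w2                       # shape (M_samples, C)
```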
In the embodiment of the disclosure, the category characteristics corresponding to the sample scene category are determined; the category characteristics and the predicted face characteristics are fused to obtain target predicted face characteristics, so that the influence of scene categories on the accuracy of face recognition can be effectively reduced, the accuracy and reliability of face recognition are effectively improved in an auxiliary mode, the recognition application scene of the face image recognition model is effectively expanded, and the applicability of the face image recognition model is improved.
S207: and determining target predicted face features, scene edge features and loss values among the labeled face features.
In some embodiments, as shown in fig. 3, a loss function may be preconfigured for the model. In the model training process, the target predicted face features, the scene edge features and the labeled face features are used as input parameters of the loss function, the loss value output by the loss function is determined, and the loss value is then compared with the set loss threshold to determine whether the face image recognition model has converged.
In the embodiment of the disclosure, the target predicted face features and the scene edge features obtained above may be input into the loss function to obtain the loss value output by the loss function; the specific calculation is: loss = softmax(cos(M) + soft-margin). The descent gradient of the model can be calculated through the loss function; the optimizer then applies the calculated descent gradient to the model parameters to obtain updated model parameters, and the training process of the face image recognition model can be supervised with reference to the updated model parameters.
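A sketch of this loss, transcribing cos(M) + soft-margin literally; reducing the softmax output to a scalar with a cross-entropy against the labeled category is our assumption (the formula in the description stops at the softmax), as is treating soft_margin as a scalar selected for the sample's scene category:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def margin_softmax_loss(M: np.ndarray, soft_margin: float,
                        labels: np.ndarray) -> float:
    """loss = softmax(cos(M) + soft-margin), then cross-entropy (assumed)."""
    logits = np.cos(M) + soft_margin          # cos(M) + soft-margin, verbatim
    probs = softmax(logits, axis=1)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

M = np.random.uniform(-1.0, 1.0, size=(3, 5))   # toy target predicted features
labels = np.array([0, 2, 4])                     # labeled categories
print(margin_softmax_loss(M, soft_margin=0.3, labels=labels))
```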
S208: and if the loss value is smaller than the loss threshold value, taking the face image recognition model obtained through training as a target face image recognition model.
For example, if the loss value is smaller than the set loss threshold, it may be determined that the loss value satisfies the set condition, or the set condition may be configured as any other possible condition, which is not limited.
Therefore, in this embodiment, the loss value among the target predicted face features, the scene edge features and the labeled face features is determined, and if the loss value is smaller than the loss threshold, the face image recognition model obtained through training is used as the target face image recognition model. The convergence timing of the model can thus be judged accurately, avoiding the influence of feature differences on the accuracy of that judgment, which effectively improves the accuracy of the convergence judgment and the training effect of the model.
In this embodiment, the sample face image and the labeled face features corresponding to the sample face image are obtained, and after the sample face image is input into the initial face image recognition model to obtain the predicted face features output by the model, the predicted face features can be processed according to the recognized sample scene category, so that the sample scene category is integrated into the features. This facilitates the modeling and learning of the face recognition model, enabling it to effectively recognize the difference features among the face features predicted under different scene categories, thereby improving its ability to learn and model those difference features. The category features corresponding to the sample scene category are determined and fused with the predicted face features to obtain the target predicted face features, which effectively reduces the influence of the scene category on face recognition, effectively improves the accuracy and reliability of face recognition, and effectively improves the recognition effect of the face image recognition model. By determining the loss value among the target predicted face features, the scene edge features and the labeled face features, and using the trained face image recognition model as the target face image recognition model when the loss value is smaller than the loss threshold, the convergence timing of the model can be judged accurately, avoiding the influence of feature differences on that judgment, which effectively improves the accuracy of the convergence judgment and the training effect of the model.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present disclosure.
As shown in fig. 5, the face image recognition method includes:
s501: and acquiring a face image.
The number of the face images may be one or more, and the face images may be partial frame video images extracted from a plurality of video frames, which is not limited.
S502: and inputting the face image into the target face image recognition model obtained by training the training method of the face image recognition model so as to obtain the target face characteristics output by the target face image recognition model.
After the face image is obtained, the face image can be input into the target face image recognition model obtained through training by the training method of the face image recognition model, so that the target face characteristics output by the target face image recognition model can be obtained.
In this embodiment, the face image is obtained and input into the target face image recognition model obtained by training the training method of the face image recognition model, so as to obtain the target face feature output by the target face image recognition model.
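A usage sketch of the recognition method; the target_model callable, the preprocessing and the 112x112 input size are assumptions for illustration:

```python
import numpy as np

def recognize(target_model, face_image: np.ndarray) -> np.ndarray:
    """Return the target face features for one face image."""
    batch = face_image[None, ...]        # add a batch dimension
    return target_model(batch)[0]        # N-dimensional feature vector

# face_image = np.zeros((112, 112, 3), np.float32)   # placeholder input
# features = recognize(target_model, face_image)
```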
Fig. 6 is a schematic diagram according to a fifth embodiment of the present disclosure.
As shown in fig. 6, the training device 60 for a face image recognition model includes:
the first obtaining module 601 is configured to obtain a sample face image and labeled face features corresponding to the sample face image;
an extracting module 602, configured to extract a sample scene category corresponding to a sample face image;
a second obtaining module 603, configured to obtain a scene edge feature corresponding to the sample scene category; and
the training module 604 is configured to train the initial face image recognition model according to the sample face image, the labeled face feature, and the scene edge feature, so as to obtain the target face image recognition model.
In some embodiments of the present disclosure, as shown in fig. 7, fig. 7 is a schematic diagram of a training apparatus 70 of a face image recognition model according to a sixth embodiment of the present disclosure, including: a first acquisition module 701, an extraction module 702, a second acquisition module 703, a training module 704, wherein the second acquisition module 703 comprises:
an input submodule 7031, configured to input a sample face image into an initial face image recognition model, so as to obtain predicted face features output by the face image recognition model;
the processing sub-module 7032 is configured to process the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category.
In some embodiments of the present disclosure, wherein the processing sub-module 7032 comprises:
a determining unit 70321 for determining a scene image feature corresponding to the sample scene category;
the fusion unit 70322 is used for fusing the scene image features and the predicted face features to obtain fused image features;
the generating unit 70323 is configured to generate a scene edge feature corresponding to the sample scene category according to the fused image feature.
In some embodiments of the present disclosure, the generating unit 70323 is specifically configured to:
determining an angle relation value corresponding to the fused image characteristic;
and generating scene edge features according to the fused image features and the angle relation values.
In some embodiments of the present disclosure, further comprising:
a determining module 705, configured to determine a category feature corresponding to a sample scene category;
and the fusion module 706 is configured to fuse the category feature and the predicted face feature to obtain a target predicted face feature.
In some embodiments of the present disclosure, training module 704 is specifically configured to:
determining the loss value among the target predicted face features, the scene edge features and the labeled face features;
and if the loss value is smaller than the loss threshold value, taking the face image recognition model obtained through training as a target face image recognition model.
It can be understood that, compared with the training device 60 of the foregoing embodiment, the first acquisition module 701, the extraction module 702, the second acquisition module 703 and the training module 704 of the training device 70 in fig. 7 of this embodiment may have the same functions and structures as the first acquisition module 601, the extraction module 602, the second acquisition module 603 and the training module 604, respectively.
The explanation of the training method of the face image recognition model is also applicable to the training device of the face image recognition model of this embodiment, and will not be repeated here.
In this embodiment, the sample face image and the labeled face features corresponding to the sample face image are obtained, and the sample scene category corresponding to the sample face image is extracted; the scene edge features corresponding to the sample scene category are obtained, and an initial face image recognition model is trained according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model. This effectively improves the target face image recognition model's ability to recognize and characterize face image features under different scenes, as well as the accuracy and reliability of face image recognition.
Fig. 8 is a schematic diagram according to a seventh embodiment of the present disclosure.
As shown in fig. 8, the face image recognition apparatus 80 includes:
a third acquiring module 801, configured to acquire a face image;
the input module 802 is configured to input the face image into the target face image recognition model trained by the above training device of the face image recognition model, so as to obtain the target face features output by the target face image recognition model.
It should be noted that the foregoing explanation of the face image recognition method is also applicable to the face image recognition device of the present embodiment, and will not be repeated here.
In this embodiment, the face image is obtained and input into the target face image recognition model trained by the above training method of the face image recognition model, so as to obtain the target face features output by the target face image recognition model, which effectively improves the accuracy and reliability of face image recognition.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 9 illustrates a schematic block diagram of an example electronic device that may be used to implement the training method of the face image recognition model of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a training method of a face image recognition model, or a face image recognition method. For example, in some embodiments, the face image recognition model training method, or the face image recognition method, may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908.
In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the face image recognition model described above, or the face image recognition method may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform a training method of the face image recognition model, or a face image recognition method, by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of difficult management and weak service expansibility in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired result of the technical solution provided in the present disclosure is achieved, and the present disclosure is not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. A training method of a face image recognition model comprises the following steps:
obtaining a sample face image and labeled face features corresponding to the sample face image;
extracting a sample scene category corresponding to the sample face image;
inputting the sample face image into an initial face image recognition model to obtain predicted face characteristics output by the face image recognition model;
processing the predicted face features according to the sample scene categories to obtain scene edge features corresponding to the sample scene categories, wherein the scene edge features are difference features related to the sample scene categories corresponding to the sample face images and describing scene differences; and
training the initial face image recognition model according to the sample face image, the labeled face features and the scene edge features to obtain a target face image recognition model.
2. The method of claim 1, wherein the processing the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category comprises:
determining scene image features corresponding to the sample scene categories;
fusing the scene image features and the predicted face features to obtain fused image features;
and generating scene edge features corresponding to the sample scene categories according to the fused image features.
3. The method of claim 2, wherein the generating scene edge features corresponding to the sample scene category from the fused image features comprises:
determining an angle relation value corresponding to the fused image feature;
and generating the scene edge feature according to the fused image feature and the angle relation value.
4. The method of claim 1, further comprising, prior to said training the initial face image recognition model from the sample face image, the labeled face features, and the scene edge features to arrive at a target face image recognition model:
determining class characteristics corresponding to the sample scene class;
and fusing the category characteristics and the predicted face characteristics to obtain target predicted face characteristics.
5. The method of claim 4, wherein the training of the initial face image recognition model according to the sample face image, the labeled face features, and the scene edge features to obtain a target face image recognition model comprises:
determining a loss value among the target predicted face features, the scene edge features, and the labeled face features;
and if the loss value is smaller than a loss threshold, taking the face image recognition model obtained through training as the target face image recognition model.
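A hedged sketch of claims 4 and 5 follows: the additive fusion of the category features, the additive margin, and the MSE loss are illustrative assumptions; only the threshold-based stopping rule is taken directly from claim 5.

```python
import torch
import torch.nn.functional as F

def training_loss(predicted: torch.Tensor, category_features: torch.Tensor,
                  scene_edge: torch.Tensor, labeled: torch.Tensor,
                  loss_threshold: float = 1e-2):
    # Claim 4: fuse the category features with the predicted face features to
    # obtain the target predicted face features (additive fusion assumed).
    target_predicted = predicted + category_features
    # Claim 5: a loss value among the target predicted face features, the
    # scene edge features, and the labeled face features (MSE assumed).
    loss = F.mse_loss(target_predicted + scene_edge, labeled)
    # Stop once the loss falls below the threshold; the trained model is then
    # taken as the target face image recognition model.
    return loss, loss.item() < loss_threshold
```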
6. A face image recognition method, comprising:
acquiring a face image;
inputting the face image into a target face image recognition model trained by the training method of a face image recognition model according to any one of claims 1-5, to obtain target face features output by the target face image recognition model.
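Inference per claim 6 reduces to a single forward pass; the eval() call and the batch-dimension handling below are ordinary PyTorch practice, not elements of the claim.

```python
import torch

@torch.no_grad()
def recognize_face(target_model: torch.nn.Module,
                   face_image: torch.Tensor) -> torch.Tensor:
    # Feed a face image to the trained target face image recognition model and
    # return the target face features it outputs.
    target_model.eval()
    return target_model(face_image.unsqueeze(0)).squeeze(0)
```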
7. A training device for a face image recognition model, comprising:
the first acquisition module is used for acquiring a sample face image and labeled face features corresponding to the sample face image;
the extraction module is used for extracting a sample scene category corresponding to the sample face image;
a second acquisition module, wherein the second acquisition module comprises:
the input sub-module is used for inputting the sample face image into an initial face image recognition model to obtain predicted face features output by the face image recognition model;
the processing sub-module is used for processing the predicted face features according to the sample scene category to obtain scene edge features corresponding to the sample scene category, wherein the scene edge features are difference features that are related to the sample scene category corresponding to the sample face image and that describe scene differences; and
the training module is used for training the initial face image recognition model according to the sample face image, the labeled face features, and the scene edge features to obtain a target face image recognition model.
8. The apparatus of claim 7, wherein the processing sub-module comprises:
a determining unit, configured to determine scene image features corresponding to the sample scene category;
the fusion unit is used for fusing the scene image features and the predicted face features to obtain fused image features;
and the generating unit is used for generating scene edge features corresponding to the sample scene category according to the fused image features.
9. The apparatus of claim 8, wherein the generating unit is specifically configured to:
determining an angle relation value corresponding to the fused image features;
and generating the scene edge features according to the fused image features and the angle relation value.
10. The apparatus of claim 7, further comprising:
the determining module is used for determining category features corresponding to the sample scene category;
and the fusion module is used for fusing the category features and the predicted face features to obtain target predicted face features.
11. The apparatus of claim 10, wherein the training module is specifically configured to:
determining a loss value among the target predicted face features, the scene edge features, and the labeled face features;
and if the loss value is smaller than a loss threshold, taking the face image recognition model obtained through training as the target face image recognition model.
12. A face image recognition apparatus comprising:
the third acquisition module is used for acquiring a face image;
and the input module is configured to input the face image into a target face image recognition model trained by the training device for a face image recognition model according to any one of claims 7 to 11, to obtain target face features output by the target face image recognition model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or to perform the method of claim 6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5 or to perform the method of claim 6.
CN202110604787.2A 2021-05-31 2021-05-31 Training method, device, equipment and storage medium for face image recognition model Active CN113361363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110604787.2A CN113361363B (en) 2021-05-31 2021-05-31 Training method, device, equipment and storage medium for face image recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110604787.2A CN113361363B (en) 2021-05-31 2021-05-31 Training method, device, equipment and storage medium for face image recognition model

Publications (2)

Publication Number Publication Date
CN113361363A (en) 2021-09-07
CN113361363B (en) 2024-02-06

Family

ID=77530636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110604787.2A Active CN113361363B (en) 2021-05-31 2021-05-31 Training method, device, equipment and storage medium for face image recognition model

Country Status (1)

Country Link
CN (1) CN113361363B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901904A (en) * 2021-09-29 2022-01-07 北京百度网讯科技有限公司 Image processing method, face recognition model training method, device and equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102345579B1 (en) * 2015-12-15 2021-12-31 삼성전자주식회사 Method, storage medium and apparatus for providing service associated with images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020024744A1 (en) * 2018-08-01 2020-02-06 Oppo广东移动通信有限公司 Image feature point detecting method, terminal device, and storage medium
CN111259936A (en) * 2020-01-09 2020-06-09 北京科技大学 Image semantic segmentation method and system based on single pixel annotation
CN112348117A (en) * 2020-11-30 2021-02-09 腾讯科技(深圳)有限公司 Scene recognition method and device, computer equipment and storage medium
CN112560698A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Image processing method, apparatus, device and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Remote sensing image scene classification combining convolutional neural networks and ensemble learning; Yu Donghang; Zhang Baoming; Zhao Chuan; Guo Haitao; Lu Jun; Journal of Remote Sensing (06); full text *

Also Published As

Publication number Publication date
CN113361363A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN113361572B (en) Training method and device for image processing model, electronic equipment and storage medium
CN114187624B (en) Image generation method, device, electronic equipment and storage medium
CN113177472B (en) Dynamic gesture recognition method, device, equipment and storage medium
CN112861885B (en) Image recognition method, device, electronic equipment and storage medium
CN113642431A (en) Training method and device of target detection model, electronic equipment and storage medium
CN112818227B (en) Content recommendation method and device, electronic equipment and storage medium
CN114092759A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN113657269A (en) Training method and device for face recognition model and computer program product
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
CN113379877B (en) Face video generation method and device, electronic equipment and storage medium
CN113361363B (en) Training method, device, equipment and storage medium for face image recognition model
CN113177466A (en) Identity recognition method and device based on face image, electronic equipment and medium
CN113011309A (en) Image recognition method, apparatus, device, medium, and program product
CN113177451A (en) Training method and device of image processing model, electronic equipment and storage medium
CN114972910B (en) Training method and device for image-text recognition model, electronic equipment and storage medium
CN114267375B (en) Phoneme detection method and device, training method and device, equipment and medium
CN114416941B (en) Knowledge graph-fused dialogue knowledge point determination model generation method and device
CN113704256B (en) Data identification method, device, electronic equipment and storage medium
CN115393488A (en) Method and device for driving virtual character expression, electronic equipment and storage medium
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN114037052A (en) Training method and device for detection model, electronic equipment and storage medium
CN114445668A (en) Image recognition method and device, electronic equipment and storage medium
CN114241544B (en) Image recognition method, device, electronic equipment and storage medium
CN113379813B (en) Training method and device of depth estimation model, electronic equipment and storage medium
CN116309977B (en) Face driving and model obtaining method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant