CN116524574A - Facial area recognition method and device and electronic equipment

Facial area recognition method and device and electronic equipment

Info

Publication number
CN116524574A
Authority
CN
China
Prior art keywords
image
face
processed
saliency
facial
Prior art date
Legal status
Pending
Application number
CN202310638763.8A
Other languages
Chinese (zh)
Inventor
张翱翔
刘文瑞
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202310638763.8A priority Critical patent/CN116524574A/en
Publication of CN116524574A publication Critical patent/CN116524574A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the application provide a facial region recognition method and device and an electronic device. The facial region recognition method includes the following steps: acquiring an image to be processed, where the image to be processed includes a facial region; performing saliency detection on the image to be processed by using a saliency feature generation model to obtain a corresponding target saliency feature map; performing enhancement processing on the image to be processed according to the target saliency feature map to obtain an enhanced image; performing feature extraction on the enhanced image by using a facial region recognition model to obtain enhanced facial region features; and determining a facial region recognition result of the image to be processed based on the enhanced facial region features.

Description

Facial area recognition method and device and electronic equipment
Technical Field
The embodiments of the application relate to the field of computer technology, and in particular to a facial region recognition method and device and an electronic device.
Background
In some scenarios, the images collected by cameras often include many low-quality face images. Such images tend to be rejected or misidentified at a high rate by recognition systems, which compromises the security of the recognition system and degrades the user experience.
In the related art, most recognition methods address this problem by filtering out low-quality face images before recognition. However, it is difficult to filter out only the low-quality face images, and some high-quality face images may be filtered out as well, which further lowers the recognition rate of face images.
Disclosure of Invention
The embodiments of the application provide a facial region recognition method and device and an electronic device.
The technical solutions of the embodiments of the application are implemented as follows:
An embodiment of the application provides a facial region recognition method, which includes the following steps:
acquiring an image to be processed, wherein the image to be processed comprises a face area;
performing saliency detection on the image to be processed by using a saliency feature generation model to obtain a corresponding target saliency feature map;
performing enhancement processing on the image to be processed according to the target saliency feature map to obtain an enhanced image;
extracting features of the enhanced image by using a facial region recognition model to obtain enhanced facial region features;
and determining a facial region recognition result of the image to be processed based on the enhanced facial region features.
In some embodiments, the facial region recognition model is trained by:
acquiring an image training data set; the image training data set comprises a plurality of facial region images and quality labels corresponding to the respective facial region images; the quality labels comprise a first quality label and a second quality label, and the second quality is lower than the first quality;
acquiring a saliency feature map and a saliency feature map weight matrix corresponding to each second-quality facial region image in the image training data set;
obtaining each first-enhanced facial region image based on the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image;
and performing iterative training on an initial facial region recognition model using the first-enhanced facial region images until a trained facial region recognition model is obtained.
In some embodiments, the performing iterative training on the initial facial region recognition model using the first-enhanced facial region images until a trained facial region recognition model is obtained includes:
determining a first loss of the initial facial region recognition model using the first-enhanced facial region images;
performing gradient back-propagation and network parameter adjustment on the initial facial region recognition model according to the first loss, to obtain the facial region recognition model after the first training;
acquiring first-layer gradient values of the facial region recognition model after the first training, and updating the saliency feature map weight matrix corresponding to each second-quality facial region image according to the first-layer gradient values, to obtain an updated saliency feature map weight matrix;
obtaining each second-enhanced facial region image based on the updated saliency feature map weight matrix;
and continuing iterative training of the facial region recognition model after the first training using the second-enhanced facial region images, until a trained facial region recognition model is obtained.
In some embodiments, the obtaining each first-enhanced facial region image based on the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image includes:
normalizing each second-quality facial region image to obtain a corresponding normalized image;
multiplying the normalized image, the saliency feature map, and the saliency feature map weight matrix corresponding to each second-quality facial region image to obtain each first-enhanced facial region image.
In some embodiments, the saliency feature generation model is trained by:
determining, using an initial saliency feature generation model, the saliency feature map corresponding to each first-enhanced facial region image;
determining a second loss of the initial saliency feature generation model according to the saliency feature maps corresponding to the first-enhanced facial region images and the first-layer gradient values;
adjusting network parameters of the initial saliency feature generation model using the second loss, to obtain the saliency feature generation model after the first training;
determining, using the saliency feature generation model after the first training, the saliency feature map corresponding to each second-enhanced facial region image;
and performing iterative training on the saliency feature generation model after the first training according to the saliency feature maps corresponding to the second-enhanced facial region images, until a trained saliency feature generation model is obtained.
In some embodiments, the obtaining each second-enhanced facial region image based on the updated saliency feature map weight matrix includes:
updating the saliency feature map corresponding to each second-quality facial region image using the saliency feature generation model after the first training, to obtain an updated saliency feature map;
and obtaining each second-enhanced facial region image based on the updated saliency feature map weight matrix and the updated saliency feature map.
In some embodiments, the facial region recognition model and the saliency feature generation model share a preprocessing network, and the method further comprises:
performing facial region detection on the image to be processed using the preprocessing network to obtain a facial region detection result;
judging, based on the facial region detection result, whether the facial region quality of the image to be processed meets the quality requirement, to obtain a judgment result;
correspondingly, the performing saliency detection on the image to be processed using a saliency feature generation model to obtain a corresponding target saliency feature map includes:
determining, according to the judgment result, that the facial region quality of the image to be processed does not meet the quality requirement, and performing saliency detection on the image to be processed using the saliency feature generation model.
In some embodiments, the performing enhancement processing on the image to be processed according to the target saliency feature map to obtain an enhanced image includes:
normalizing the image to be processed to obtain a corresponding target normalized image;
determining a target saliency feature map weight matrix corresponding to the image to be processed according to the facial region recognition model;
multiplying the target normalized image, the target saliency feature map weight matrix, and the target saliency feature map corresponding to the image to be processed to obtain the enhanced image.
An embodiment of the present application provides a facial region recognition apparatus, including:
an acquisition module, configured to acquire an image to be processed, where the image to be processed includes a facial region;
a detection module, configured to perform saliency detection on the image to be processed using a saliency feature generation model, to obtain a corresponding target saliency feature map;
an enhancement module, configured to perform enhancement processing on the image to be processed according to the target saliency feature map, to obtain an enhanced image;
an extraction module, configured to perform feature extraction on the enhanced image using a facial region recognition model, to obtain enhanced facial region features;
and a determining module, configured to determine a facial region recognition result of the image to be processed based on the enhanced facial region features.
An embodiment of the application provides an electronic device, including a memory, at least one processor, and a computer program stored in the memory and executable on the processor, where the facial region recognition method provided by the embodiments of the application is implemented when the program is executed by the at least one processor.
An embodiment of the application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the facial region recognition method provided by the embodiments of the application.
With the facial region recognition method and device and the electronic device provided by the embodiments of the application, an image to be processed including a facial region is acquired; saliency detection is performed on the image to be processed using a saliency feature generation model to obtain a corresponding target saliency feature map; enhancement processing is performed on the image to be processed according to the target saliency feature map to obtain an enhanced image; feature extraction is performed on the enhanced image using a facial region recognition model to obtain enhanced facial region features; and a facial region recognition result of the image to be processed is determined based on the enhanced facial region features. In this way, the embodiments of the application perform saliency detection on the image to be processed including the facial region so as to enhance the salient regions within the facial region, so that extracting features from the enhanced facial region improves the accuracy of facial region feature extraction, and recognizing those features effectively improves the facial region recognition rate.
Drawings
FIG. 1 is a schematic diagram of an application scenario of the facial region recognition method provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of an optional facial region recognition method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of saliency detection of an image to be processed according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of enhancement processing of an image to be processed according to an embodiment of the present application;
FIG. 5 is a flowchart of a training method of the facial region recognition model according to an embodiment of the present application;
FIG. 6 is a flowchart of a training method of the saliency feature generation model according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of the model training phase provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of an optional facial region recognition method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the composition structure of a facial region recognition device provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of the composition structure of an electronic device according to an embodiment of the present application.
Detailed Description
For a clearer description of the objects, technical solutions, and advantages of the embodiments of the present application, the embodiments are described in detail below with reference to the accompanying drawings. It is to be understood that the following description of the embodiments is intended to illustrate and explain the general concepts of the embodiments of the application and should not be construed as limiting them. In the description and drawings, the same or similar reference numerals refer to the same or similar parts or components. For clarity, the drawings are not necessarily drawn to scale, and some well-known components and structures may be omitted.
In some embodiments, unless otherwise defined, technical or scientific terms used in the embodiments of the present application should be given the ordinary meanings as understood by those of ordinary skill in the art to which the embodiments belong. The terms "first," "second," and the like, as used in the embodiments of the present application, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The terms "a" or "an" do not exclude a plurality. The word "comprising" or "comprises" means that the elements or items preceding the word include the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", "top", "bottom", and the like are used only to indicate a relative positional relationship, which may change accordingly when the absolute position of the described object changes. When an element such as a layer, film, region, or substrate is referred to as being "on" or "under" another element, it can be directly on or under the other element, or intervening elements may be present.
In the related art, two methods are generally adopted to address the high rejection rate or misidentification of low-quality face images. One is to filter out low-quality face images before recognition; however, it is difficult to filter out only the low-quality images, some high-quality face images may be filtered out as well, and the recognition rate of face images may be reduced. The other is to improve the quality of a low-quality face image using dim-light enhancement, deblurring, or super-resolution techniques; however, the high-quality face image generated in this way does not necessarily improve the recognition rate. For example, super-resolution techniques can distort the face image when generating a high-quality face image.
In view of the problems in the related art, the embodiments of the application provide a facial region recognition method: acquiring an image to be processed, where the image to be processed includes a facial region; performing saliency detection on the image to be processed using a saliency feature generation model to obtain a corresponding target saliency feature map; performing enhancement processing on the image to be processed according to the target saliency feature map to obtain an enhanced image; performing feature extraction on the enhanced image using a facial region recognition model to obtain enhanced facial region features; and determining a facial region recognition result of the image to be processed based on the enhanced facial region features. In this way, the embodiments of the application perform saliency detection on the image to be processed including the facial region so as to enhance the salient regions within the facial region, so that extracting features from the enhanced facial region improves the accuracy of facial region feature extraction, and recognizing those features effectively improves the facial region recognition rate.
The facial region recognition method provided by the embodiments of the application may be executed by an electronic device. The electronic device may be any of various types of terminals, such as a notebook computer, tablet computer, desktop computer, set-top box, or mobile device (for example, a mobile phone, portable music player, personal digital assistant, dedicated messaging device, or portable game device), and may also be implemented as a server. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
FIG. 1 is a schematic diagram of an application scenario of the facial region recognition method according to an embodiment of the present application. As shown in FIG. 1, a facial region recognition system 10 for implementing the method includes a terminal 100, a network 200, and a server 300, where the network 200 may be a wide area network or a local area network, or a combination of the two. The server 300 and the electronic device may be physically separate or integrated. When performing facial region recognition, the server 300 may acquire, through the network 200, an image to be processed from the terminal 100, where the image to be processed includes a facial region; perform saliency detection on the image to be processed using a saliency feature generation model to obtain a corresponding target saliency feature map; perform enhancement processing on the image to be processed according to the target saliency feature map to obtain an enhanced image; perform feature extraction on the enhanced image using a facial region recognition model to obtain enhanced facial region features; determine a facial region recognition result of the image to be processed based on the enhanced facial region features; and send the facial region recognition result and related information to the terminal 100, which displays the result on its display interface 100-1.
The facial region recognition method provided by the embodiments of the present application is described below in connection with exemplary applications and implementations of the server provided by the embodiments of the present application. Referring to FIG. 2, FIG. 2 is a schematic flowchart of an optional facial region recognition method according to an embodiment of the present application; the flow may include the following steps:
step S201, acquiring an image to be processed, where the image to be processed includes a face area.
In some embodiments, the image to be processed represents an image that requires facial region recognition; the facial region may be, for example, a human face. The manner of acquiring the image to be processed is not limited here; for example, it may be collected by an image acquisition device such as a camera, or obtained by other means.
Step S202, performing saliency detection on the image to be processed by using a saliency feature generation model to obtain a corresponding target saliency feature map.
Illustratively, the purpose of performing saliency detection on the image to be processed is to obtain saliency data of the image to be processed; the saliency data includes the saliency of each pixel in the image to be processed. Based on the saliency data, a target saliency feature map corresponding to the image to be processed can be obtained; the target saliency feature map has the same size as the image to be processed.
In the embodiment of the application, to ensure the accuracy of saliency detection, a deep-learning-based saliency feature generation model may be used to perform saliency detection on the image to be processed. In some embodiments, the saliency feature generation model may include a preprocessing network; referring to FIG. 3, performing saliency detection on the image to be processed using the saliency feature generation model may include the following steps:
step S2021, performing face area detection on the image to be processed by using a preprocessing network to obtain a face area detection result;
step S2022, based on the face region detection result, judging whether the face region quality of the image to be processed meets the quality requirement, and obtaining a judgment result;
step S2023, determining that the quality of the face area of the image to be processed does not meet the quality requirement according to the judging result, and performing saliency detection on the image to be processed by using a saliency feature generation model.
Illustratively, the preprocessing network may be a network related to face detection; performing facial region detection on the image to be processed with this network yields the corresponding facial region detection result. The detection result may include information affecting image quality, such as the confidence and sharpness of each key point in the facial region, where the key points may include the eye positions, nose position, mouth corner positions, and the like.
Further, based on the face region detection result, a quality score of the face region of the image to be processed may be determined; and judging whether the face area of the image to be processed meets the quality requirement or not according to the comparison result of the quality score and the quality threshold value, and obtaining a judgment result.
Here, the quality threshold may be preset, or set according to actual conditions during implementation; the embodiments of the present application do not specifically limit its value.
For example, if the comparison result indicates that the quality score of the facial region is smaller than the quality threshold, the facial region quality is low, and it may be determined that the facial region of the image to be processed does not meet the quality requirement. Conversely, if the quality score is greater than or equal to the quality threshold, the facial region quality is high, and it may be determined that the facial region meets the quality requirement. That is, the judgment result covers two cases: the facial region quality meets the quality requirement, or it does not.
Here, factors affecting the quality of the face region include, but are not limited to: illumination (dim light, bright light, backlight), whether or not there is occlusion, noise, resolution, etc.
In the embodiment of the application, after the judgment result is obtained, if it indicates that the facial region quality of the image to be processed does not meet the quality requirement, saliency detection is performed on the image to be processed using the saliency feature generation model to obtain the corresponding target saliency feature map. Otherwise, if the judgment result indicates that the facial region quality meets the quality requirement, no saliency detection is needed: feature extraction is performed directly on the image to be processed using the facial region recognition model to obtain facial region features, and the facial region recognition result of the image to be processed is determined based on those features.
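As a concrete illustration of this gate, the following Python sketch fuses two detection cues into a quality score and decides whether saliency-based enhancement is needed. The score fields, the equal-weight fusion, and the 0.6 threshold are assumptions for illustration, not values specified by the embodiment.

```python
# Hedged sketch of the quality gate; fields, fusion rule, and threshold are
# illustrative assumptions rather than values from the embodiment.
from dataclasses import dataclass

@dataclass
class FaceDetectionResult:
    keypoint_confidence: float  # mean confidence over eyes, nose, mouth corners
    sharpness: float            # e.g. a normalized sharpness measure in [0, 1]

QUALITY_THRESHOLD = 0.6  # preset, or tuned to actual deployment conditions

def quality_score(det: FaceDetectionResult) -> float:
    # One simple way to fuse the detection cues into a single score.
    return 0.5 * det.keypoint_confidence + 0.5 * det.sharpness

def needs_saliency_enhancement(det: FaceDetectionResult) -> bool:
    # Saliency detection runs only when the face quality is below threshold;
    # otherwise the image goes straight to the recognition model.
    return quality_score(det) < QUALITY_THRESHOLD

print(needs_saliency_enhancement(FaceDetectionResult(0.4, 0.3)))  # True
```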
It can be seen that, in the embodiment of the present application, not every acquired image to be processed undergoes saliency detection; it is performed only when the facial region quality of the image is determined not to meet the quality requirement. This saves a certain amount of computation and improves recognition efficiency.
It should be noted that the facial region recognition model and the saliency feature generation model may share a preprocessing network, which is used to perform facial region detection on the image to be processed and thereby determine whether the facial region quality meets the quality requirement. Sharing part of the underlying network between the two models saves memory resources and speeds up network inference.
Step S203, performing enhancement processing on the image to be processed according to the target saliency feature map to obtain an enhanced image.
Here, the enhanced image has the same size as the image to be processed. In some embodiments, referring to FIG. 4, the performing enhancement processing on the image to be processed according to the target saliency feature map to obtain an enhanced image may include the following steps:
step S2031, carrying out normalization processing on the image to be processed to obtain a corresponding target normalized image;
step S2032, determining a target saliency feature map weight matrix corresponding to the image to be processed according to the facial area recognition model;
step S2033, multiplying the target normalized image, the target saliency feature map weight matrix and the target saliency feature map corresponding to the image to be processed, and obtaining the enhanced image.
Illustratively, normalizing the image to be processed means normalizing the pixel value of each pixel point in the image to be processed; it may be understood that, assuming that the pixel value of each pixel point ranges from 0 to 255, performing normalization processing on the image to be processed may include: calculating the ratio of the pixel value of each pixel point to 255 in the image to be processed, determining the ratio corresponding to each pixel point as the pixel value after normalization processing of each pixel point, and obtaining the target normalized image corresponding to the image to be processed based on the pixel values.
Illustratively, the target saliency feature map weight matrix may be determined as follows: acquire the first-layer gradient values of the trained facial region recognition model, update the saliency feature map weight matrix obtained in the most recent round of training according to those gradient values, and determine the updated weight matrix as the target saliency feature map weight matrix corresponding to the image to be processed. The training process of the facial region recognition model is described later and is not repeated here.
Here, the target saliency feature map weight matrix has the same size as the image to be processed. Further, after the target normalized image, the target saliency feature map weight matrix, and the target saliency feature map corresponding to the image to be processed are obtained, the enhanced image can be obtained by multiplying the three together.
Illustratively, the enhanced image may also be obtained by multiplying the image to be processed (without normalization), the target saliency feature map weight matrix, and the target saliency feature map; that is, normalization is not an indispensable step. Whether the image to be processed is normalized should simply be consistent with whether the training images were normalized during model training. Likewise, the target saliency feature map weight matrix and the target saliency feature map may themselves be normalized in the process of obtaining the enhanced image, as long as consistency with the model training process is maintained; details are not repeated here.
It can be understood that normalizing the image to be processed maps the pixel value of each pixel into the range 0 to 1 rather than 0 to 255, which reduces the amount of computation and improves data processing efficiency.
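The normalization and element-wise enhancement described above can be sketched as follows; the NumPy implementation, function name, and 112 x 112 crop size are assumptions for illustration.

```python
# Minimal sketch of the enhancement step, assuming 8-bit image inputs and
# H x W saliency/weight maps; names and shapes are illustrative only.
import numpy as np

def enhance_image(image: np.ndarray,
                  saliency_map: np.ndarray,
                  weight_matrix: np.ndarray) -> np.ndarray:
    """Element-wise product of the normalized image, the target saliency
    feature map, and the saliency feature map weight matrix."""
    normalized = image.astype(np.float32) / 255.0  # pixel values into [0, 1]
    if image.ndim == 3:  # broadcast the H x W maps over the channel axis
        saliency_map = saliency_map[..., None]
        weight_matrix = weight_matrix[..., None]
    return normalized * saliency_map * weight_matrix

# Example with random stand-ins for the saliency map and weight matrix.
img = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)
sal = np.random.rand(112, 112).astype(np.float32)
w = np.random.rand(112, 112).astype(np.float32)
enhanced = enhance_image(img, sal, w)  # same spatial size as the input
```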
Step S204, performing feature extraction on the enhanced image by using the facial region recognition model to obtain enhanced facial region features.
In the embodiment of the present application, after the enhanced image is obtained according to the above steps, the enhanced image is input into the facial region recognition model, and feature extraction is performed on the enhanced image, so that the enhanced facial region features can be obtained.
By way of example, the embodiments of the present application do not limit the type of the facial region recognition model. For instance, the model may be constructed based on deep learning and may include a preset number of shallow convolution layers and deep convolution layers: the shallow convolution layers extract shallow image features of the facial region in the enhanced image, such as texture and brightness, while the deep convolution layers extract deep image features, such as the eyes, nose, and mouth. Combining these two kinds of image features yields the enhanced facial region features.
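As a rough sketch of such a backbone (not the patent's actual network), a minimal PyTorch model with shallow and deep convolution stages might look as follows; every layer size and the embedding dimension are assumptions.

```python
# Illustrative backbone only: shallow layers capture texture/brightness-level
# cues, deep layers build semantic features, and the head produces an
# embedding. Layer sizes are assumptions, not the patent's architecture.
import torch
import torch.nn as nn

class FaceRegionNet(nn.Module):
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.shallow = nn.Sequential(  # shallow image features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.deep = nn.Sequential(     # deep semantic features
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(256, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.deep(self.shallow(x)).flatten(1))

features = FaceRegionNet()(torch.randn(1, 3, 112, 112))  # -> shape (1, 128)
```

Since the deep stage operates on the shallow features, the final embedding reflects both texture-level and semantic cues.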
Step S205, based on the enhanced facial region characteristics, determining a facial region recognition result of the image to be processed.
Illustratively, after the enhanced facial region features are obtained, the similarity between the enhanced facial region features and the respective reference facial region features in the facial region database is determined, and the facial region recognition result of the image to be processed is determined based on the similarity.
Here, the facial region database may be a preset database that stores reference facial region features at per-person granularity. Illustratively, each reference facial region feature in the database has a unique label carrying the user identity information corresponding to that feature, e.g., user A, user B, and so on.
For example, a similarity algorithm may be employed to determine the similarity between the enhanced facial region features and the respective reference facial region features in the facial region database; in the embodiment of the present application, the type of the similarity algorithm is not particularly limited, and may be, for example, a cosine similarity algorithm or other similarity algorithms.
In some embodiments, determining a facial region recognition result of the image to be processed based on the similarity includes: judging whether the similarity is larger than or equal to the preset similarity, obtaining a judging result, and determining a facial area recognition result corresponding to the image to be processed according to the judging result.
Here, the preset similarity may be preset according to an actual situation, and the size of the preset similarity is not limited in this embodiment of the present application.
For example, if the judgment result indicates that the similarity is greater than or equal to the preset similarity, it is indicated that there is a reference facial region feature in the facial region database, which is matched with the enhanced facial region feature, and at this time, the user identity information corresponding to the reference facial region feature may be determined as the facial region recognition result of the image to be processed. Otherwise, if the judging result shows that the similarity is smaller than the preset similarity, the fact that the reference facial area characteristics matched with the enhanced facial area characteristics do not exist in the facial area database is indicated, and at the moment, the facial area recognition result of the image to be processed can be determined to be recognition failure.
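A sketch of this matching step, assuming cosine similarity and a small in-memory database keyed by illustrative user labels:

```python
# Hedged sketch: the database layout, feature dimension, and 0.5 threshold
# are assumptions; the embodiment only requires some similarity measure.
from typing import Dict, Optional
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recognize(query: np.ndarray,
              database: Dict[str, np.ndarray],
              threshold: float = 0.5) -> Optional[str]:
    """Return the identity of the best-matching reference feature, or None
    (recognition failure) if no similarity reaches the preset threshold."""
    best_id, best_sim = None, threshold
    for user_id, ref in database.items():
        sim = cosine_similarity(query, ref)
        if sim >= best_sim:
            best_id, best_sim = user_id, sim
    return best_id

db = {"user_A": np.random.rand(128), "user_B": np.random.rand(128)}
print(recognize(np.random.rand(128), db))  # identity label, or None
```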
As can be seen, in the embodiment of the present application, saliency detection is performed on the image to be processed that includes a facial region, so that the salient regions within the facial region are enhanced. Extracting features from the enhanced facial region improves the accuracy of facial region feature extraction, and recognizing those features effectively improves the facial region recognition rate. This further mitigates the problems of low recognition rate and misidentification for low-quality face images in the related art.
In some embodiments, the facial region recognition method provided in the embodiments of the present application may be implemented by a facial region recognition model and a salient feature generation model together. The training process of the two models will be described in the following by way of example. Fig. 5 is a flowchart of a training method of a face area recognition model according to an embodiment of the present application, as shown in fig. 5, the method may include the following steps:
step S301, acquiring an image training data set.
For example, the image training dataset may include a plurality of facial region images and quality labels corresponding to the respective images; the quality labels include a first quality label and a second quality label, the second quality being lower than the first quality. Here, for convenience of description, the second quality may be referred to as low quality and the first quality as high quality; that is, the image training dataset includes both low-quality and high-quality facial region images. The low-quality facial region images may include face images that are occluded, blurry, noisy, or of low resolution, while the high-quality facial region images include face images whose quality does not impede facial region recognition.
Step S302, a saliency feature map corresponding to each second-quality facial region image in the image training data set is obtained, and a saliency feature map weight matrix is obtained.
For example, after the image training data set is acquired, each second-quality facial region image may be input into the initial saliency feature generation model to obtain its corresponding saliency feature map. In addition, before training the facial region recognition model, a saliency feature map weight matrix is initialized and used as the weight matrix corresponding to each second-quality facial region image; the matrix has the same size as the facial region image, and the value of each element is random.
Step S303, obtaining each first-enhanced facial region image based on the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image.
In some embodiments, the implementation of step S303 may include: normalizing each second-quality facial region image to obtain a corresponding normalized image; and multiplying the normalized image, the saliency feature map, and the saliency feature map weight matrix corresponding to each second-quality facial region image to obtain each first-enhanced facial region image.
For example, to reduce the amount of computation, after the image training data set is acquired, each facial region image in it may be normalized to obtain a corresponding normalized image. The normalized image corresponding to each second-quality facial region image is then multiplied by its saliency feature map and saliency feature map weight matrix, yielding the first-enhanced version of that image.
And step S304, performing iterative training on the initial face region recognition model by utilizing each face region image after the first enhancement until a trained face region recognition model is obtained.
In some embodiments, the implementation of step S304 may include: determining a first loss of the initial facial region recognition model using the first-enhanced facial region images; performing gradient back-propagation and network parameter adjustment on the initial facial region recognition model according to the first loss, to obtain the facial region recognition model after the first training; acquiring first-layer gradient values of the facial region recognition model after the first training, and updating the saliency feature map weight matrix corresponding to each second-quality facial region image according to the first-layer gradient values, to obtain an updated saliency feature map weight matrix; obtaining each second-enhanced facial region image based on the updated saliency feature map weight matrix; and continuing iterative training of the facial region recognition model after the first training using the second-enhanced facial region images, until a trained facial region recognition model is obtained.
Illustratively, after obtaining each face region image after the first enhancement, each face region image after the first enhancement may be input to an initial face region recognition model, to obtain a model output result; and determining the first loss of the initial face area identification model according to the model output result and the corresponding quality label.
Here, the type of the loss function used for determining the first loss is not limited, and may be, for example, a triplet loss function or another type of loss function.
For example, after the first loss is obtained, gradient back-propagation and network parameter adjustment may be performed on the initial facial region recognition model using it, so that the model error meets the set requirement; at this point, the facial region recognition model after the first training is obtained. The first-layer gradient values of this model can then be acquired and used to update the saliency feature map weight matrix corresponding to each second-quality facial region image, and each second-enhanced facial region image is further obtained based on the updated weight matrix. It should be noted that obtaining the second-enhanced images involves the saliency feature generation model; this step is described later together with the training process of that model and is not repeated here.
For example, after obtaining the face area images after the second enhancement, the face area images after the second enhancement may be input into the face area recognition model after the first training, and the iterative training may be continued until the face area recognition model after the training is obtained.
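The interplay between the first loss, gradient back-propagation, and the weight-matrix update can be sketched for a single image as follows. This is a hedged illustration: the stand-in backbone and loss, and in particular reading the "first-layer gradient" as the gradient reaching the model input (reduced to an H x W map) with a simple gradient-step update of the weight matrix, are assumptions, since the embodiment does not spell out these details.

```python
# One illustrative training iteration; backbone, loss, and the weight-matrix
# update rule are assumptions standing in for the patent's unspecified choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(  # stand-in facial region recognition backbone
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 56 * 56, 128),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

image = torch.rand(3, 112, 112)       # normalized second-quality face image
saliency_map = torch.rand(112, 112)   # from the initial saliency model
weight_matrix = torch.rand(112, 112)  # randomly initialized, same H x W size

# First enhancement: element-wise product, broadcast over the channel axis.
enhanced = (image * saliency_map * weight_matrix).unsqueeze(0)
enhanced.requires_grad_(True)  # expose the gradient arriving at the input

target = torch.randn(1, 128)                # stand-in identity target
loss = F.mse_loss(model(enhanced), target)  # stand-in for the first loss
optimizer.zero_grad()
loss.backward()   # gradient back-propagation
optimizer.step()  # network parameter adjustment

# Reduce the input gradient over channels to an H x W map and use it to
# update the saliency feature map weight matrix for this image.
first_layer_grad = enhanced.grad[0].mean(dim=0)
weight_matrix = (weight_matrix - 1e-2 * first_layer_grad).clamp(min=0.0)
```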
The following describes a training process of the salient feature generation model; fig. 6 is a flowchart of a training method of a salient feature generation model according to an embodiment of the present application, as shown in fig. 6, the method may include the following steps:
step S401, generating a model according to the initial salient features, and determining salient feature diagrams corresponding to the facial region images after the first enhancement.
Illustratively, the second-quality facial region images in the image training data set are input into the initial saliency feature generation model to obtain the saliency feature map corresponding to each second-quality facial region image; these maps are taken as the saliency feature maps corresponding to the first-enhanced facial region images.
Step S402, determining a second loss of the initial saliency feature generation model according to the saliency feature maps corresponding to the first-enhanced facial region images and the first-layer gradient values.
For example, the first-layer gradient values of the facial region recognition model after the first training serve as saliency feature map labels for the first-enhanced facial region images; in this way, the second loss of the initial saliency feature generation model can be determined from the saliency feature maps and the corresponding labels.
Here, the type of loss function used to determine the second loss is not limited; it may be, for example, a mean absolute error loss function or another type of loss function.
Step S403, adjusting network parameters of the initial saliency feature generation model using the second loss, to obtain the saliency feature generation model after the first training.
For example, after the second loss is obtained, the network parameters of the initial saliency feature generation model may be adjusted using it so that the model error meets the set requirement; at this point, the saliency feature generation model after the first training is obtained.
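A minimal sketch of this second loss, assuming (as in the previous sketch) that the first-layer gradient has been reduced to an H x W map so that it can serve as a per-pixel label; the tiny generation network is a stand-in:

```python
# The mean-absolute-error (L1) loss between the generated saliency map and
# the gradient-derived label; the network and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

saliency_model = nn.Sequential(  # stand-in saliency feature generation model
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.SGD(saliency_model.parameters(), lr=1e-2)

image = torch.rand(1, 3, 112, 112)           # first-enhanced face region image
gradient_label = torch.rand(1, 1, 112, 112)  # first-layer gradient as label

predicted = saliency_model(image)            # generated saliency feature map
second_loss = F.l1_loss(predicted, gradient_label)  # mean absolute error
optimizer.zero_grad()
second_loss.backward()
optimizer.step()  # adjust the generation model's network parameters
```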
Step S404, determining, using the saliency feature generation model after the first training, the saliency feature map corresponding to each second-enhanced facial region image.
Illustratively, after the saliency feature generation model after the first training is obtained, each second-quality facial region image may be input into it again to obtain updated saliency feature maps; these updated maps are the saliency feature maps corresponding to the second-enhanced facial region images.
Further, after the updated saliency feature map weight matrix and the updated saliency feature maps are obtained, each second-quality facial region image can again be multiplied by its corresponding updated weight matrix and updated saliency feature map to obtain the second-enhanced facial region images.
Step S405, continuing iterative training of the saliency feature generation model after the first training according to the saliency feature maps corresponding to the second-enhanced facial region images, until a trained saliency feature generation model is obtained.
For example, after the saliency feature maps corresponding to the second-enhanced facial region images are obtained, iterative training may continue according to these maps and the first-layer gradient values of the facial region recognition model after the second training, until a trained saliency feature generation model is obtained.
It can be understood that, since the salient regions of the low-quality facial region images in the image training dataset differ from one another, the saliency feature generation model trained through the above steps can dynamically learn the salient-region feature map of each low-quality facial region image, so that low-quality facial regions can be enhanced and their recognition rate improved.
Fig. 7 is a schematic flow chart of a model training phase provided in an embodiment of the present application, as shown in fig. 7, the flow may include the following steps:
step S501, acquiring an image training data set.
The image training dataset includes a plurality of facial region images with quality labels, wherein the quality labels include a first quality label and a second quality label, the second quality being lower than the first quality. Here, the face area image with the first quality tag may be referred to as a high quality face image, and the face area image with the second quality tag may be referred to as a low quality face image.
Step S502, initializing a saliency feature map and a saliency feature map weight matrix of each low-quality face image.
Illustratively, each low-quality face image corresponds to one of the second-quality facial region images described above. Here, initializing the saliency feature map of each low-quality face image may include inputting each low-quality face image into the initial saliency feature generation model to obtain its corresponding saliency feature map. Meanwhile, a saliency feature map weight matrix is initialized and used as the weight matrix corresponding to each low-quality face image.
Step S503, face detection is performed on each face area image included in the image training data set.
Illustratively, face detection can be performed on each face region image included in the image training data set through a preprocessing network shared by the face region recognition model and the saliency feature generation model to obtain a face detection result, and whether each face region image in the image training data set is a low-quality face image is determined according to the face detection result.
Step S504, training an initial face area recognition model.
When a facial region image in the image training data set is determined to be a low-quality face image according to the face detection result, the image (or its normalized version) may be multiplied by the corresponding saliency feature map and saliency feature map weight matrix to obtain the first-enhanced face image; the first-enhanced face image is input into the initial facial region recognition model to obtain a corresponding model output result, and the initial facial region recognition model is trained using that result.
Step S505, determining a first loss of the initial face region identification model.
For example, a first loss of the initial facial region recognition model may be determined according to the model output result and the corresponding quality label; gradient back-propagation and network parameter adjustment are then performed on the initial facial region recognition model using this loss, until a trained facial region recognition model is obtained.
Step S506, training the initial saliency feature generation model.
When a facial region image in the image training dataset is determined to be a low-quality face image according to the face detection result, the face image can be input into the initial saliency feature generation model to obtain a corresponding saliency feature map, and the initial saliency feature generation model is trained using that saliency feature map.
Step S507, determining a second loss of the initial saliency feature generation model.
Illustratively, the first-layer gradient values obtained through gradient back-propagation of the initial facial region recognition model are used as labels for the saliency feature maps; together with the saliency feature maps output by the initial saliency feature generation model, they determine the second loss of that model. The network parameters of the initial saliency feature generation model are then adjusted using this loss, until a trained saliency feature generation model is obtained.
Further, on the basis of obtaining the trained salient feature generation model and the facial region recognition model according to fig. 7, an optional flowchart of the facial region recognition method is provided in the embodiment of the present application, as shown in fig. 8, and the facial region recognition may be implemented through steps S601 to S609:
step S601, acquiring a face image.
For example, a face image including a face may be acquired by an image acquisition device such as a camera, where the face image corresponds to the image to be processed and the face included in the face image corresponds to the face region.
Step S602, face detection.
Illustratively, face detection may be performed on the face image through a preprocessing network shared by a face recognition model (corresponding to the above-mentioned face region recognition model) and a salient feature generation model, so as to obtain a face detection result.
Step S603, judging whether the face quality meets the quality requirement. If not, steps S604 to S607 are performed, and if yes, steps S608 to S609 are performed.
For example, whether the face quality meets the quality requirement can be judged according to the face detection result; if not, the face is a low-quality face, and if so, it is a high-quality face.
Step S604: and acquiring a face saliency feature image corresponding to the face image.
Here, the face saliency feature map corresponds to the target saliency feature map described above; for example, saliency detection can be performed on the face image using the saliency feature generation model to obtain the face saliency feature map corresponding to the face image.
Step S605: and carrying out enhancement processing on the face image to obtain an enhanced face image.
Here, the enhanced face image corresponds to the enhanced image described above, and may be obtained by multiplying the face image (or the normalized face image), the saliency feature map weight matrix, and the face saliency feature map.
Step S606: and inputting the enhanced face image into a face recognition model.
The enhanced face image may be input to a face recognition model, and feature extraction may be performed on the enhanced image using the face recognition model to obtain enhanced face features.
Step S607: and carrying out similarity judgment on the enhanced face features.
The similarity between the enhanced face features and each reference face feature in the face database can be calculated, and the face recognition result corresponding to the face image can be determined according to the judgment result of the similarity and the preset similarity. Here, the face database corresponds to the face region database, and the reference face features correspond to the reference face region features.
Step S608: and extracting the characteristics of the face image.
For example, the face image may be directly input to a face recognition model, and feature extraction is performed on the face image by using the face recognition model, so as to obtain the face features of the face image.
Step S609: and judging the similarity of the face features of the face image.
By way of example, the similarity between the face features of the face image and each reference face feature in the face database may be calculated, and the face recognition result corresponding to the face image may be determined according to the determination result of the similarity and the preset similarity.
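Tying steps S601 through S609 together, the following sketch shows the two branches as one function; the callables are placeholders for the models described above, and all parameter names and the threshold are illustrative assumptions.

```python
# Hedged end-to-end sketch of the inference flow; the callables stand in for
# the detection, saliency, recognition, and matching components.
from typing import Callable, Optional
import numpy as np

def recognize_face(
    image: np.ndarray,                               # H x W x 3, 8-bit
    detect_quality: Callable[[np.ndarray], float],   # face detection -> score
    generate_saliency: Callable[[np.ndarray], np.ndarray],  # -> H x W map
    extract_features: Callable[[np.ndarray], np.ndarray],   # -> embedding
    match: Callable[[np.ndarray], Optional[str]],    # database lookup
    weight_matrix: np.ndarray,                       # H x W, from training
    quality_threshold: float = 0.6,                  # illustrative value
) -> Optional[str]:
    """High-quality faces go straight to feature extraction (S608-S609);
    low-quality faces are saliency-enhanced first (S604-S607)."""
    normalized = image.astype(np.float32) / 255.0
    if detect_quality(image) < quality_threshold:
        saliency = generate_saliency(image)  # face saliency feature map
        normalized = normalized * saliency[..., None] * weight_matrix[..., None]
    return match(extract_features(normalized))
```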
It can be seen that the facial region recognition method provided by the embodiments of the application can recognize not only high-quality face images but also low-quality ones. In the process of recognizing a low-quality face image, the low-quality face can be enhanced through the saliency feature generation model, which effectively alleviates the problems of low recognition rate and misidentification for low-quality faces and improves the face recognition pass rate.
Fig. 9 is a schematic diagram of the composition structure of a face area recognition device provided in an embodiment of the present application, and as shown in fig. 9, the face area recognition device 110 includes:
an acquisition module 111, configured to acquire an image to be processed, where the image to be processed includes a facial region;
a detection module 112, configured to perform saliency detection on the image to be processed using a saliency feature generation model, to obtain a corresponding target saliency feature map;
an enhancement module 113, configured to perform enhancement processing on the image to be processed according to the target saliency feature map, to obtain an enhanced image;
an extraction module 114, configured to perform feature extraction on the enhanced image using a facial region recognition model, to obtain enhanced facial region features;
and a determining module 115, configured to determine a facial region recognition result of the image to be processed based on the enhanced facial region features.
In some embodiments, the facial region recognition device 110 further includes a first training module configured to: acquire an image training dataset, where the image training dataset includes a plurality of facial region images and quality labels corresponding to the facial region images, the quality labels include a first quality label and a second quality label, and the second quality is lower than the first quality; acquire a saliency feature map and a saliency feature map weight matrix corresponding to each second-quality facial region image in the image training dataset; obtain each first-enhanced facial region image based on the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image; and iteratively train an initial facial region recognition model using the first-enhanced facial region images until a trained facial region recognition model is obtained.
In some embodiments, the first training module is further configured to: determine a first loss of the initial facial region recognition model using the first-enhanced facial region images; perform gradient back-propagation and network parameter adjustment on the initial facial region recognition model according to the first loss to obtain a first-trained facial region recognition model; acquire a first-layer gradient value of the first-trained facial region recognition model, and update the saliency feature map weight matrix corresponding to each second-quality facial region image according to the first-layer gradient value to obtain an updated saliency feature map weight matrix; obtain each second-enhanced facial region image based on the updated saliency feature map weight matrix; and continue to iteratively train the first-trained facial region recognition model using the second-enhanced facial region images until a fully trained facial region recognition model is obtained.
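The patent states that the saliency feature map weight matrix is updated according to the first-layer gradient value but does not give the update rule; the gradient-descent-style update below is therefore only one plausible, hypothetical reading, with the step size and non-negativity constraint as assumptions:

```python
import numpy as np

def update_weight_matrix(weight_matrix: np.ndarray,
                         first_layer_gradient: np.ndarray,
                         step_size: float = 0.01) -> np.ndarray:
    """Hypothetical update of the saliency feature map weight matrix
    from the first-layer gradient of the recognition model: take a
    small step against the gradient, then keep the weights
    non-negative so they still act as per-pixel emphasis."""
    updated = weight_matrix - step_size * first_layer_gradient
    return np.clip(updated, 0.0, None)
```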
In some embodiments, the first training module is further configured to normalize each second-quality facial region image to obtain a corresponding normalized image, and to multiply the normalized image, the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image to obtain each first-enhanced facial region image.
In some embodiments, the facial region recognition device 110 further includes a second training module configured to: determine, according to an initial saliency feature generation model, the saliency feature map corresponding to each first-enhanced facial region image; determine a second loss of the initial saliency feature generation model according to the saliency feature maps corresponding to the first-enhanced facial region images and the first-layer gradient values; adjust network parameters of the initial saliency feature generation model using the second loss to obtain a first-trained saliency feature generation model; determine, according to the first-trained saliency feature generation model, the saliency feature map corresponding to each second-enhanced facial region image; and iteratively train the first-trained saliency feature generation model according to the saliency feature maps corresponding to the second-enhanced facial region images until a trained saliency feature generation model is obtained.
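The patent only says the second loss is determined from the first-enhanced images' saliency feature maps and the first-layer gradient values; the L2 discrepancy below is an assumed concrete form, shown purely for illustration:

```python
import numpy as np
from typing import List

def second_loss(saliency_maps: List[np.ndarray],
                first_layer_gradients: List[np.ndarray]) -> float:
    """One assumed form of the second loss: penalize disagreement
    between each predicted saliency map and the normalized magnitude
    of the recognition model's first-layer gradient for that image
    (both arrays are assumed to share one spatial shape)."""
    total = 0.0
    for s_map, grad in zip(saliency_maps, first_layer_gradients):
        grad_mag = np.abs(grad)
        grad_mag = grad_mag / (grad_mag.max() + 1e-8)  # scale to [0, 1]
        total += float(np.mean((s_map - grad_mag) ** 2))
    return total / max(len(saliency_maps), 1)
```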
In some embodiments, the first training module is configured to update the saliency feature map corresponding to each second-quality facial region image using the first-trained saliency feature generation model to obtain an updated saliency feature map, and to obtain each second-enhanced facial region image based on the updated saliency feature map weight matrix and the updated saliency feature map.
In some embodiments, the facial region recognition model and the saliency feature generation model share a preprocessing network, and the facial region recognition device 110 further includes a judging module configured to perform facial region detection on the image to be processed using the preprocessing network to obtain a facial region detection result, and to judge, based on the facial region detection result, whether the quality of the facial region of the image to be processed meets the quality requirement, thereby obtaining a judgment result;
correspondingly, the detection module 112 is configured to determine, according to the judgment result, that the quality of the facial region of the image to be processed does not meet the quality requirement, and to perform saliency detection on the image to be processed using the saliency feature generation model.
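Putting the shared preprocessing network and the quality judgment together, the routing logic can be sketched as follows; the scalar quality score, its threshold, and the callables' signatures are all assumptions, since the patent leaves the quality criterion open (the enhance_face_image sketch above is reused):

```python
def recognize_with_quality_gate(image,
                                preprocess_net,
                                saliency_model,
                                recognition_model,
                                weight_matrix,
                                quality_threshold: float = 0.5):
    """Route an image through the enhanced or the direct recognition
    path depending on the detected facial-region quality."""
    # Shared preprocessing network: detect the facial region and score it.
    face_region, quality = preprocess_net(image)
    if quality < quality_threshold:
        # Low quality: generate a saliency feature map and enhance first.
        saliency_map = saliency_model(face_region)
        face_region = enhance_face_image(face_region, saliency_map, weight_matrix)
    # Either way, extract features with the facial region recognition model.
    return recognition_model(face_region)
```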
In some embodiments, the enhancement module 113 is configured to: normalize the image to be processed to obtain a corresponding target normalized image; determine, according to the facial region recognition model, a target saliency feature map weight matrix corresponding to the image to be processed; and multiply the target normalized image, the target saliency feature map weight matrix and the target saliency feature map corresponding to the image to be processed to obtain the enhanced image.
It should be noted that the description of the apparatus in the embodiments of the present application is similar to that of the method embodiments above and has similar beneficial effects, so a detailed description is omitted here. For technical details not disclosed in the apparatus embodiments, please refer to the description of the method embodiments of the present application.
In the embodiments of the present application, if the above facial region recognition method is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a computer software product, which is stored in a storage medium and includes several instructions for causing a terminal to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read Only Memory), a magnetic disk, an optical disk, or other media capable of storing program code. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
An embodiment of the present application provides an electronic device. Fig. 10 is a schematic diagram of the composition structure of the electronic device provided in the embodiment of the present application. As shown in Fig. 10, the electronic device 120 includes at least: a processor 121 and a computer-readable storage medium 122 configured to store executable instructions, where the processor 121 generally controls the overall operation of the electronic device. The computer-readable storage medium 122 is configured to store instructions and applications executable by the processor 121, and may also cache data to be processed or already processed by each module in the processor 121 and the electronic device 120; it may be implemented by a flash memory or a random access memory (RAM, Random Access Memory).
The embodiments of the present application provide a storage medium storing executable instructions that, when executed by a processor, cause the processor to perform the facial region recognition method provided by the embodiments of the present application, for example, the method shown in Fig. 2.
In some embodiments, the storage medium may be a computer-readable storage medium, such as a ferroelectric memory (FRAM, Ferroelectric Random Access Memory), a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read Only Memory), a flash memory, a magnetic surface memory, an optical disk, or a compact disk read-only memory (CD-ROM, Compact Disk-Read Only Memory); it may also be any device including one or any combination of the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a hypertext markup language (HTML, HyperText Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). As an example, executable instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site, or distributed across multiple sites and interconnected by a communication network.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present application are intended to be included within the scope of the present application.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The foregoing embodiment numbers of the present application are for description only and do not represent advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are only illustrative; for example, the division of units is only a logical function division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The foregoing is merely an embodiment of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of facial region identification, the method comprising:
acquiring an image to be processed, wherein the image to be processed comprises a face area;
performing saliency detection on the image to be processed by using a saliency feature generation model to obtain a corresponding target saliency feature map;
performing enhancement processing on the image to be processed according to the target significance characteristic diagram to obtain an enhanced image;
extracting features of the enhanced image by using a facial region recognition model to obtain enhanced facial region features;
and determining a facial region recognition result of the image to be processed based on the enhanced facial region features.
2. The method of claim 1, wherein the facial region recognition model is trained by:
acquiring an image training dataset, wherein the image training dataset comprises a plurality of facial region images and quality labels corresponding to the facial region images, the quality labels comprise a first quality label and a second quality label, and the second quality is lower than the first quality;
acquiring a saliency feature map and a saliency feature map weight matrix corresponding to each second-quality facial region image in the image training dataset;
obtaining each first-enhanced facial region image based on the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image;
and iteratively training an initial facial region recognition model using the first-enhanced facial region images until the trained facial region recognition model is obtained.
3. The method according to claim 2, wherein iteratively training the initial facial region recognition model using the first-enhanced facial region images until the trained facial region recognition model is obtained comprises:
determining a first loss of the initial facial region recognition model using the first-enhanced facial region images;
performing gradient back-propagation and network parameter adjustment on the initial facial region recognition model according to the first loss to obtain a first-trained facial region recognition model;
acquiring a first-layer gradient value of the first-trained facial region recognition model, and updating the saliency feature map weight matrix corresponding to each second-quality facial region image according to the first-layer gradient value to obtain an updated saliency feature map weight matrix;
obtaining each second-enhanced facial region image based on the updated saliency feature map weight matrix;
and continuing to iteratively train the first-trained facial region recognition model using the second-enhanced facial region images until the fully trained facial region recognition model is obtained.
4. The method according to claim 2 or 3, wherein obtaining each first-enhanced facial region image based on the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image comprises:
normalizing each second-quality facial region image to obtain a corresponding normalized image;
multiplying the normalized image, the saliency feature map and the saliency feature map weight matrix corresponding to each second-quality facial region image to obtain each first-enhanced facial region image.
5. The method according to claim 3, wherein the saliency feature generation model is trained by:
determining, according to an initial saliency feature generation model, the saliency feature map corresponding to each first-enhanced facial region image;
determining a second loss of the initial saliency feature generation model according to the saliency feature maps corresponding to the first-enhanced facial region images and the first-layer gradient values;
adjusting network parameters of the initial saliency feature generation model using the second loss to obtain a first-trained saliency feature generation model;
determining, according to the first-trained saliency feature generation model, the saliency feature map corresponding to each second-enhanced facial region image;
and iteratively training the first-trained saliency feature generation model according to the saliency feature maps corresponding to the second-enhanced facial region images until the trained saliency feature generation model is obtained.
6. The method of claim 5, wherein obtaining each second-enhanced facial region image based on the updated saliency feature map weight matrix comprises:
updating the saliency feature map corresponding to each second-quality facial region image using the first-trained saliency feature generation model to obtain an updated saliency feature map;
and obtaining each second-enhanced facial region image based on the updated saliency feature map weight matrix and the updated saliency feature map.
7. The method of claim 1, wherein the facial region recognition model and the saliency feature generation model share a preprocessing network, the method further comprising:
performing facial region detection on the image to be processed using the preprocessing network to obtain a facial region detection result;
judging, based on the facial region detection result, whether the quality of the facial region of the image to be processed meets the quality requirement, to obtain a judgment result;
correspondingly, performing saliency detection on the image to be processed using the saliency feature generation model to obtain the corresponding target saliency feature map comprises:
determining, according to the judgment result, that the quality of the facial region of the image to be processed does not meet the quality requirement, and performing saliency detection on the image to be processed using the saliency feature generation model.
8. The method according to claim 1 or 7, wherein enhancing the image to be processed according to the target saliency feature map to obtain the enhanced image comprises:
normalizing the image to be processed to obtain a corresponding target normalized image;
determining, according to the facial region recognition model, a target saliency feature map weight matrix corresponding to the image to be processed;
multiplying the target normalized image, the target saliency feature map weight matrix and the target saliency feature map corresponding to the image to be processed to obtain the enhanced image.
9. A facial region recognition apparatus, the apparatus comprising:
an acquisition module, used for acquiring an image to be processed, wherein the image to be processed comprises a facial region;
the detection module is used for carrying out saliency detection on the image to be processed by using a saliency feature generation model to obtain a corresponding target saliency feature map;
the enhancement module is used for enhancing the image to be processed according to the target significance characteristic image to obtain an enhanced image;
the extraction module is used for performing feature extraction on the enhanced image using the facial region recognition model to obtain enhanced facial region features;
and the determining module is used for determining, based on the enhanced facial region features, a facial region recognition result of the image to be processed.
10. An electronic device comprising a memory, at least one processor, and a computer program stored on the memory and executable on the processor, wherein the at least one processor implements the method of any one of claims 1 to 8 when executing the program.
CN202310638763.8A 2023-05-31 2023-05-31 Facial area recognition method and device and electronic equipment Pending CN116524574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310638763.8A CN116524574A (en) 2023-05-31 2023-05-31 Facial area recognition method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116524574A true CN116524574A (en) 2023-08-01

Family

ID=87396011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310638763.8A Pending CN116524574A (en) 2023-05-31 2023-05-31 Facial area recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116524574A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination