CN115147894A - Image processing method, image processing apparatus, electronic device, and medium - Google Patents

Image processing method, image processing apparatus, electronic device, and medium

Info

Publication number
CN115147894A
CN115147894A
Authority
CN
China
Prior art keywords
face
image
attribute
face attribute
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210642616.3A
Other languages
Chinese (zh)
Inventor
Zhou Jiancan (周坚灿)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lumi United Technology Co Ltd
Original Assignee
Lumi United Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lumi United Technology Co Ltd
Priority to CN202210642616.3A
Publication of CN115147894A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an image processing method and apparatus, an electronic device, and a storage medium, relating to the field of computer technologies. The method includes: acquiring an image to be recognized; performing image feature extraction on the image to be recognized to obtain its image features; performing multi-face-attribute recognition on the face in the image to be recognized according to those image features to obtain a plurality of attribute recognition results, each attribute recognition result indicating one recognized face attribute; and performing corresponding scene processing based on at least one recognized face attribute. The image processing method provided by the embodiments of the present application can improve the accuracy of scene processing.

Description

Image processing method, image processing apparatus, electronic device, and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer technology, face attributes, as an important component of basic face information, are gradually being applied in different scenarios. However, a single face attribute often expresses content inaccurately, which in turn tends to affect the accuracy of scene processing.
At present, multi-task learning or multi-label learning is generally adopted to recognize multiple face attributes, but the low recognition accuracy of multiple face attributes under these approaches still affects the accuracy of scene processing.
It follows that how to improve the accuracy of scene processing remains to be solved.
Disclosure of Invention
Embodiments of the present application provide an image processing method and apparatus, an electronic device, and a storage medium, which can solve the problem of low scene-processing accuracy in the related art. The technical solution is as follows:
According to an aspect of the embodiments of the present application, an image processing method includes: acquiring an image to be recognized; performing image feature extraction on the image to be recognized to obtain its image features; performing multi-face-attribute recognition on the face in the image to be recognized according to those image features to obtain a plurality of attribute recognition results, each attribute recognition result indicating one recognized face attribute; and performing corresponding scene processing based on at least one recognized face attribute.
According to an aspect of the embodiments of the present application, an image processing apparatus includes: an image acquisition module for acquiring an image to be recognized; a feature extraction module for performing image feature extraction on the image to be recognized to obtain its image features; an attribute recognition module for performing multi-face-attribute recognition on the face in the image to be recognized according to those image features to obtain a plurality of attribute recognition results, each attribute recognition result indicating one recognized face attribute; and a scene processing module for performing corresponding scene processing based on at least one recognized face attribute.
In an exemplary embodiment, the attribute recognition results are obtained by invoking a face recognition model that has completed face attribute extension.
In an exemplary embodiment, the apparatus further comprises a training module for training the face recognition model, the training module comprising: a first training unit for training an initial face recognition model on a first training set for face recognition until the initial face recognition model converges to meet the training conditions, yielding a trained base model; a second training unit for training the face attribute recognition model corresponding to each face attribute on a second training set for face attribute recognition until each face attribute recognition model converges to meet the training conditions, yielding a trained face attribute recognition model for each face attribute, where the face attribute recognition model for each face attribute shares the same feature extraction layer with the base model; and an extension unit for performing face attribute extension processing on the base model according to the face attribute recognition models, yielding the face recognition model that completes face attribute extension.
In an exemplary embodiment, the apparatus further comprises a separation module for separating the trained feature extraction layer from the base model, the separation module comprising: a layer determining unit for determining the trained feature extraction layer and face recognition layer based on the base model; and a parameter processing unit for freezing the parameters of the feature extraction layer and deleting the parameters of the face recognition layer during the training of the face attribute recognition model for each face attribute.
In an exemplary embodiment, the apparatus further comprises a construction module for constructing the face attribute recognition models containing the base model's feature extraction layer, the construction module comprising: a layer definition unit for taking the feature extraction layer of the base model as the feature extraction layer of the face attribute recognition model for each face attribute, where each face attribute recognition model contains a face attribute recognition layer for its face attribute; and a layer connection unit for connecting the output end of the feature extraction layer in each face attribute recognition model to the input end of the corresponding face attribute recognition layer, yielding the constructed face attribute recognition model for each face attribute.
In an exemplary embodiment, the second training unit includes: a category prediction subunit, configured to, for the training images carrying face attribute labels in the second training set, input the current training image into the face attribute recognition model corresponding to each face attribute and predict the face attribute category, obtaining a prediction result for the current training image, where the face attribute label indicates the real face attribute of the face in the training image and the prediction result indicates the predicted face attribute of the face in the training image; a difference determining subunit, configured to determine the difference loss for the current training image according to the difference between its prediction result and the face attribute label it carries; a parameter updating subunit, configured to update the parameters of the face attribute recognition layer in the face attribute recognition model if the difference loss for the current training image does not satisfy the convergence condition; and a training completion subunit, configured to input the next training image into the face attribute recognition model, and so on, until the difference loss satisfies the convergence condition and the training of the face attribute recognition model is completed.
In an exemplary embodiment, the extension unit includes: a layer sharing subunit, configured to take the feature extraction layer shared by the base model and the several face attribute recognition models as the first layer of the face recognition model that completes face attribute extension; a layer paralleling subunit, configured to connect the face recognition layer of the base model in parallel with the face attribute recognition layers of the several face attribute recognition models to form the second layer of that face recognition model; and a layer connecting subunit, configured to connect the output end of the first layer to the parallel input end of the second layer, forming the face recognition model that completes face attribute extension.
In an exemplary embodiment, the apparatus further comprises: a face recognition module, configured to perform face recognition on the face in the image to be recognized according to the image features of the image to be recognized, obtaining a face recognition result. The scene processing module includes: a quality evaluation unit, configured to determine a scene execution scheme related to face image quality according to the face image quality indicated by an attribute recognition result, so that the device can evaluate the quality of the face image according to the determined scheme.
In an exemplary embodiment, the apparatus further comprises: a face detection module, configured to perform face detection on the image to be recognized; and a face extraction module, configured to extract the face region from the image to be recognized to obtain the face image if a face is detected in the image to be recognized, so that image feature extraction is performed on the face image.
According to an aspect of the embodiments of the present application, an electronic device includes at least one processor, at least one memory, and at least one communication bus, where the memory stores a computer program and the processor reads the computer program from the memory through the communication bus; the computer program, when executed by the processor, implements the image processing method described above.
According to an aspect of embodiments of the present application, a storage medium has stored thereon a computer program which, when executed by a processor, implements an image processing method as described above.
According to an aspect of the embodiments of the present application, a computer program product includes a computer program stored in a storage medium; a processor of a computer device reads the computer program from the storage medium and executes it, causing the computer device to implement the image processing method described above.
The technical solution provided by this application brings the following beneficial effects:
In this technical solution, image feature extraction is performed on the acquired image to be recognized to obtain its image features, so that multi-face-attribute recognition can be performed on the face in the image to be recognized according to those image features, yielding a plurality of attribute recognition results, and corresponding scene processing can then be performed based on at least one recognized face attribute.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a first schematic diagram of an implementation environment according to an embodiment of the present application;
FIG. 2 is a second schematic diagram of an implementation environment according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating an image processing method according to an exemplary embodiment;
FIG. 4 is a schematic diagram of a face image according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating another image processing method according to an exemplary embodiment;
FIG. 6 is a schematic diagram showing the face attribute recognition models and the base model sharing the same feature extraction layer, according to the embodiment of FIG. 5;
FIG. 7 is a flowchart of step 430 in one embodiment, according to the embodiment of FIG. 5;
FIG. 8 is a schematic diagram of the face recognition model, extended from the base model, that completes face attribute extension, according to the embodiment of FIG. 5;
FIG. 9 is a schematic diagram of an image processing method in an application scenario according to an embodiment;
FIG. 10 is a schematic diagram of presenting basic face information to a user in the application scenario of FIG. 9;
FIG. 11 is a block diagram of an image processing apparatus according to an exemplary embodiment;
FIG. 12 is a diagram illustrating a hardware configuration of an electronic device according to an exemplary embodiment;
FIG. 13 is a block diagram illustrating the structure of an electronic device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are exemplary only for explaining the present application and are not construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
The following is a description and an explanation of several terms referred to in this application:
Face attributes refer to semantic features of a face's appearance; they describe visual characteristics that humans can understand. For example, face attributes include, but are not limited to: age, gender, expression, whether a mask is worn, whether glasses are worn, face image quality, and so on.
As mentioned above, both a single face attribute and multiple face attributes can affect the accuracy of scene processing.
On the one hand, a single face attribute affects the accuracy of scene processing because it expresses content inaccurately. Taking a smart speaker application scenario as an example: if the user's expression is determined to be happy from that face attribute alone, the smart speaker pushes a cheerful song to the user; but because the user's age has not been determined, the pushed song may well fail to match the preferences of the user's age group, which degrades the user experience.
On the other hand, multiple face attributes affect the accuracy of scene processing because their recognition accuracy is low. Specifically, multiple face attributes are usually recognized with multi-task learning or multi-label learning, so the recognition accuracy of the attributes is not high, which in turn affects the accuracy of scene processing.
In multi-label learning, the training images come from a single training set whose images carry several kinds of face attribute labels. In this case the data volume of a single training set is often insufficient to support learning all the parameters of the recognition network, so problems such as overfitting and low accuracy easily arise, and recognition accuracy is hard to guarantee.
In multi-task learning, the training images come from several training sets, each containing training images that carry the same kind of face attribute label. The training sets may be learned simultaneously, or one training set may be used to pre-train the recognition network to initialize its parameters, after which the corresponding training sets are used for fine-tuning. Although the data volumes complement one another, the tasks differ in difficulty and the numbers of training images carrying different face attribute labels are unbalanced across the training sets, so the difficulty increases. As the number of face attribute types grows, learning becomes more complex and gradually harder, and the recognition accuracies of the different face attributes interfere with one another, which hinders any improvement in recognition accuracy.
In addition, after multi-task learning is completed, if a new task related to a face attribute must be added, all previously learned tasks have to be combined with the new task and relearned, which gives poor extensibility and flexibility and prevents the amount of computation from being minimized.
Consequently, the low recognition accuracy of multiple face attributes makes the accuracy of scene processing difficult to improve.
For this reason, the image processing method provided by this application can effectively improve the recognition accuracy of multiple face attributes and thereby the accuracy of scene processing. Accordingly, the method is suitable for an image processing apparatus, which can be deployed in an electronic device configured with a von Neumann architecture; for example, the electronic device may be a desktop computer, a notebook computer, a tablet computer, a gateway, a server, and the like.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment related to an image processing method. The implementation environment includes a user terminal 110, a smart device 130, a server 170, and a network device 190.
Specifically, the user terminal 110, which may also be regarded simply as a terminal, can deploy (that is, install) a client associated with the smart device 130. The user terminal 110 may be an electronic device with display and control functions, such as a smartphone, tablet computer, notebook computer, desktop computer, or intelligent control panel, which is not limited here.
The client is associated with the smart device 130 in the sense that a user registers an account in the client and configures the smart device 130 there, for example by adding the device identifier of the smart device 130, so that when the client runs in the user terminal 110 it can provide the user with functions related to displaying the smart device 130. The client may take the form of an application program or a web page.
The smart device 130 stands for any one of a plurality of smart devices; it is used here only as an example, that is, the embodiments of the present application do not limit the number or type of smart devices. The smart device 130 may be a smart printer, smart fax machine, smart camera, smart air conditioner, smart door lock, or smart lamp, or an electronic device equipped with a communication module, such as a human body sensor, door/window sensor, temperature/humidity sensor, water sensor, natural gas alarm, smoke alarm, wall switch, wall socket, wireless switch, wireless wall switch, magic cube controller, curtain motor, and the like.
The interaction between the user terminal 110 and the smart device 130 may take place over a local area network or a wide area network. In one application scenario, the user terminal 110 establishes a wired or wireless communication connection with the smart device 130 through the network device 190 (e.g., a router or gateway), for example over WIFI among others; the user terminal 110 and the smart device 130 are thus deployed in the same local area network, and the user terminal 110 can interact with the smart device 130 over a local area network path. In another application scenario, the user terminal 110 establishes a wired or wireless communication connection with the smart device 130 through the server 170, for example over 2G, 3G, 4G, 5G, WIFI, and the like; the two are thus deployed in the same wide area network, and the user terminal 110 can interact with the smart device 130 over a wide area network path.
The server 170 may also be regarded as a cloud or cloud platform. It may be a single server, a server cluster formed by several servers, or a cloud computing center formed by several servers, so as to better provide background services to a massive number of user terminals 110. For example, the background services include a face attribute recognition service.
Taking the case where the face attribute recognition service is provided by the server 170 (i.e., the cloud) as an example: in one application scenario, the smart device 130 (e.g., a smart camera) captures an image to be recognized and sends it to the server 170, which provides the face attribute recognition service for that image.
After obtaining the image to be recognized, the server 170 can perform multi-face-attribute recognition on it through the face attribute recognition service. Specifically: image feature extraction is performed on the image to be recognized to obtain its image features; according to those image features, face attribute recognition is performed on the face in the image to be recognized to obtain a plurality of attribute recognition results; and a scene execution scheme related to at least one face attribute is determined according to the face attributes indicated by the attribute recognition results.
The smart device 130 and/or the user terminal 110 can then perform the scene operations corresponding to the face attributes according to the scene execution scheme. For example, the user terminal 110 presents basic face information, such as age and gender, to the user.
Of course, the face attribute recognition service may also be provided by an electronic device with an image capture function (i.e., an edge device), such as a smart camera or a smartphone, as shown in fig. 2; this is not specifically limited here.
Referring to fig. 3, an embodiment of the present application provides an image processing method, which is applicable to an electronic device, where the electronic device may specifically be the user terminal 110 in the implementation environment shown in fig. 1, or may also be the server 170 in the implementation environment shown in fig. 1.
In the following method embodiments, for convenience of description, the main execution subject of each step of the method is taken as an electronic device for illustration, but the method is not particularly limited to this configuration.
As shown in fig. 3, the method may include the steps of:
step 310, acquiring an image to be identified.
The image to be recognized is obtained by photographing the environment in which a target is located, so that the target in the image can subsequently be recognized. In one possible implementation, the target is a human face.
It should be understood that the shooting may be a single shot or continuous shooting. For the same target, continuous shooting yields a video, and the image to be recognized may be any frame of that video; multiple shots yield multiple photos, and the image to be recognized may be any one of them. In other words, the image to be recognized in this embodiment may come from a dynamic source, such as the frames of a video or a set of photos, or from a static source, such as a single video frame or a single photo; accordingly, recognition of the target in this embodiment is performed frame by frame.
It should be added that the image acquisition device that captures the image to be recognized may be an electronic device equipped with a camera, such as a camcorder, video camera, smartphone, or tablet computer, and may be deployed around the environment in which the target is located so as to photograph that environment and form the image to be recognized.
The image to be recognized may be captured by the image acquisition device in real time, or it may have been captured during a historical time period and stored in the electronic device in advance. Accordingly, the electronic device may process the image to be recognized in real time after it is captured, or store it for later processing, for example when the CPU load of the electronic device is low or according to an operator's instruction. Therefore, recognition of the target in this embodiment may be performed on an image acquired in real time or on an image acquired during a historical time period, which is not specifically limited here.
Step 330: perform image feature extraction on the image to be recognized to obtain the image features of the image to be recognized.
The image features are an accurate description of the face in the image to be recognized. It should be understood that different faces in images to be recognized yield different image features; in other words, the image features of an image to be recognized uniquely identify the face in it.
In one possible implementation, image feature extraction is implemented by a feature extraction algorithm such as histogram of oriented gradients (HOG), local binary patterns (LBP), or Haar-like features.
In another possible implementation, image feature extraction is implemented by several convolution layers. It should be noted that, depending on the number and size of the convolution kernels in the different convolution layers, image features of different lengths can be obtained, reflecting the face in the image to be recognized at different resolutions. A minimal sketch of such an extractor appears below.
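The following is a minimal sketch, in PyTorch, of what such a convolutional feature extraction layer could look like; the layer sizes, the 128-dimensional feature length, and all names are illustrative assumptions rather than values specified by this application.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Convolutional feature extraction layer (illustrative sizes)."""
    def __init__(self, feature_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # pool to a fixed-size descriptor
        )
        self.fc = nn.Linear(64, feature_dim)      # image feature of the chosen length

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return self.fc(h)                         # one feature vector per image
```

Changing the convolution stack here is what would produce image features of different lengths and resolutions, as the paragraph above describes.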
In one possible implementation, the image to be recognized is itself a face image; in another, the face image is obtained by extracting the face region from the image to be recognized. It should be noted that a face image refers to an image that mainly contains a human face, as shown in fig. 4.
Regarding the acquisition of the face image: before step 330, in one possible implementation, the method may further include the following steps: performing face detection on the image to be recognized; and, if the image to be recognized is detected to contain a face, extracting the face region from the image to be recognized to obtain the face image.
In one possible implementation, the face image is an ROI (region of interest) image. For a face image, the region of interest is the face region; it is generated by marking the region where the face is located in the image to be recognized with a bounding box, circle, ellipse, irregular polygon, or the like, so that the face image can be obtained by extracting that region. A minimal sketch of this detect-and-crop step appears below.
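As one possible illustration of the detect-and-crop step, the sketch below uses an OpenCV Haar cascade as a stand-in face detector; the helper name and the choice of detector are assumptions, not the detector this application prescribes.

```python
import cv2

def extract_face_image(image_bgr):
    """Detect the face region in the image to be recognized and crop it as
    the face (ROI) image; the Haar cascade is a stand-in detector."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                     # no face detected: skip feature extraction
    x, y, w, h = faces[0]               # bounding box of the first detected face
    return image_bgr[y:y + h, x:x + w]  # the face region, i.e., the ROI image
```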
Step 350: perform multi-face-attribute recognition on the face in the image to be recognized according to the image features of the image to be recognized, obtaining a plurality of attribute recognition results.
Each attribute recognition result indicates one recognized face attribute.
Multi-face-attribute recognition means recognizing each of several face attributes separately. The attributes may be recognized synchronously (for example, age and expression at the same time) or sequentially (for example, age first, then expression), which is not limited here.
In one possible implementation, multi-face-attribute recognition is implemented by multiple face attribute recognition layers. In this way the face attributes are recognized separately, which makes each recognition easier and more targeted and ensures the accuracy of single-attribute recognition.
In one possible implementation, multi-face-attribute recognition is implemented by a face recognition model that has completed face attribute extension. This approach makes full use of the fact that training sets for face recognition are especially plentiful and diverse: the face attributes are extended on the basis of the face recognition model, avoiding the low recognition accuracy that arises when the data volume is insufficient to support learning the parameters of the whole recognition network.
Regarding the recognition of each face attribute, it specifically includes: predicting the face attribute category of the face in the image to be recognized according to the image features of the image to be recognized, obtaining an attribute recognition result.
Taking gender as an example, the face attribute categories are male and female. Prediction yields a probability P1 that the face in the image to be recognized is male and a probability P2 that it is female; if P1 > P2, the attribute recognition result indicates that the recognized face attribute is male, and conversely, if P1 < P2, it indicates female. The attribute recognition result can thus be understood as indicating the face attribute category to which the face in the image belongs. Of course, when the face attribute is face image quality, the attribute category corresponds to a range of values, for example 0 to 100, and the attribute recognition result is a specific value indicating the recognized face image quality. The gender case is sketched below.
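A minimal sketch of the gender prediction just described, assuming a trained attribute head that maps the shared image feature to two class logits; names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def predict_gender(attribute_head, image_feature):
    """Turn the shared image feature into gender probabilities and pick the
    more probable category; `attribute_head` maps the feature to two logits."""
    logits = attribute_head(image_feature)       # shape [1, 2]
    p1, p2 = F.softmax(logits, dim=1)[0]         # P1 = male, P2 = female
    return "male" if p1 > p2 else "female"
```

A regression-style attribute such as face image quality would instead use a single-output head whose value is the attribute recognition result directly.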
Step 370: perform corresponding scene processing based on at least one recognized face attribute.
In one possible implementation, scene processing means determining a scene execution scheme related to at least one face attribute according to the face attributes indicated by the attribute recognition results, so that a device executes the scene operation corresponding to those face attributes according to the scheme.
For example, if the face attributes indicated by the attribute recognition results include at least face image quality, age, gender, and expression, the scene execution scheme related to those attributes may designate an information presentation operation as the scene operation the user terminal should execute. In this case, the user terminal presents basic face information, such as age, gender, and expression, to the user according to the scene execution scheme.
Alternatively, if the indicated face attributes include at least age and expression, the scene execution scheme may designate a song push operation as the scene operation the smart speaker should execute. The smart speaker then pushes a corresponding song to the user according to the scheme; for example, if the expression is happy and the age is 20, the smart speaker selects a popular, cheerful song from the local/cloud media library to push.
That is to say, the scene execution scheme may be configured in advance by the user in the user terminal, based on several face attributes, and uploaded to the gateway/server, so that, through the interaction between the gateway/server and each device, each device can respond differently according to the user-configured scheme. A sketch of such a scheme appears below.
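The sketch below shows one hypothetical shape such a user-configured scene execution scheme could take; the rule format, device names, and actions are all assumptions for illustration, not the patent's data format.

```python
# Hypothetical user-configured rules mapping recognized face attributes
# to device operations.
SCENE_SCHEMES = [
    {"when": {"expression": "happy", "age_max": 30},
     "device": "smart_speaker", "action": "push_cheerful_song"},
    {"when": {"expression": "sad"},
     "device": "living_room_light", "action": "dim_to_20_percent"},
]

def matches(condition, attributes):
    """Check whether the recognized attributes satisfy a rule's condition."""
    if "expression" in condition and attributes.get("expression") != condition["expression"]:
        return False
    if "age_max" in condition and attributes.get("age", 0) > condition["age_max"]:
        return False
    return True

def select_operations(attributes):
    """Pick the device operations triggered by the recognized attributes."""
    return [(s["device"], s["action"]) for s in SCENE_SCHEMES
            if matches(s["when"], attributes)]

# e.g. select_operations({"expression": "happy", "age": 20})
# -> [("smart_speaker", "push_cheerful_song")]
```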
Through the above process, image processing is achieved. On the one hand, the face attributes are recognized separately, which makes recognition easier and more targeted and ensures the accuracy of single-attribute recognition; on the other hand, the recognition of the several face attributes is based on the image features of the same image to be recognized, so the attributes are correlated with one another, which helps improve the recognition accuracy of multiple face attributes and, in turn, the accuracy of scene processing. This effectively solves the problem of low scene-processing accuracy in the related art.
In one possible implementation, the attribute recognition results are obtained by calling a face recognition model that has completed face attribute extension. That is to say, such a model can not only recognize the face in the image to be recognized but also perform multi-face-attribute recognition on it.
Referring to fig. 5, in an exemplary embodiment, the process of training the face recognition model that completes face attribute extension may include the following steps:
Step 410: train the initial face recognition model on the first training set for face recognition until the initial face recognition model converges to meet the training conditions, obtaining a trained base model.
The training images in the first training set carry face labels; here a training image is a face image used to train the initial face recognition model, and a face label indicates the real identity of the face in the training image.
After the initial face recognition model is constructed and its parameters are randomly initialized, it can be trained for face recognition on the training images carrying face labels in the first training set. The initial face recognition model may be a machine learning model with any network structure, for example a convolutional neural network model; this is not a specific limitation here.
The training process may specifically include the following steps:
For the training images in the first training set, input the current training image into the initial face recognition model and predict the face category, obtaining a prediction result for the current training image that indicates the predicted identity of the face in it.
Determine the difference loss for the current training image according to the difference between its prediction result and the face label it carries. The difference loss may be determined by any type of loss function, for example a cross-entropy loss function or a regression loss function, which is not limited here.
If the difference loss for the current training image does not satisfy the convergence condition, update the parameters of the initial face recognition model. The convergence condition may be set flexibly according to the actual needs of the application scenario; for example, it may require that the difference loss be minimized, to improve recognition accuracy, or that the number of iterations exceed an iteration threshold, to improve training efficiency.
Input the next training image into the initial face recognition model, and so on, until the difference loss satisfies the convergence condition; the initial face recognition model then converges and the base model is obtained. A minimal sketch of this loop appears below.
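A minimal sketch of this training loop, assuming the first training set yields batches of face images and identity labels, and using cross-entropy as the difference loss; the epoch-based stopping rule stands in for whatever convergence condition is actually configured.

```python
import torch.nn as nn
import torch.optim as optim

def train_base_model(model, first_training_set, epochs=10, lr=1e-3):
    """Step 410 sketch: train the initial face recognition model on the
    first training set, which is assumed to yield batches of
    (face image tensor, face identity label)."""
    criterion = nn.CrossEntropyLoss()            # one possible difference loss
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                      # stands in for the convergence test
        for images, face_labels in first_training_set:
            optimizer.zero_grad()
            predictions = model(images)          # predicted face identities
            loss = criterion(predictions, face_labels)
            loss.backward()
            optimizer.step()                     # update all model parameters
    return model                                 # the trained base model
```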
In this way, the base model has the capability of face recognition and provides the basis for the subsequent face attribute extension.
Step 430: train the face attribute recognition model corresponding to each face attribute on the second training set for face attribute recognition until each model converges to meet the training conditions, obtaining the trained face attribute recognition model for each face attribute.
The face attribute recognition model for each face attribute shares the same feature extraction layer with the base model.
FIG. 6 illustrates the face attribute recognition models sharing a feature extraction layer with the base model in one embodiment. In fig. 6, the base model 400 consists of a trained feature extraction layer 401 and face recognition layer 403. To make the face attribute recognition models share the same feature extraction layer as the base model, the feature extraction layer 401 serves as the first layer of each face attribute recognition model 500, the face attribute recognition layers 501 to 503 serve as the second layers of the respective models 500, and the output end of the first layer is connected to the input end of the second layer, completing the construction of each face attribute recognition model 500. In one possible implementation, the feature extraction layer and the various recognition layers are implemented with fully connected layers. Note that the input to the feature extraction layer is a face image; the face detection that produces the face image may be implemented as part of the model, i.e., inside it, as shown in fig. 6, or independently outside the model, which is not specifically limited here.
That is to say, each face attribute recognition model is built on the base model, taking the trained feature extraction layer as its foundation and attaching the corresponding face attribute recognition layer to it. This makes full use of the fact that the first training set is especially plentiful and diverse, and avoids the low recognition accuracy that arises when the data volume is insufficient to support learning the parameters of the whole recognition network.
Moreover, because each face attribute recognition model is built on the base model, in the subsequent face attribute recognition training the parameters of the feature extraction layer are frozen (for example, by setting their learning rate to 0): they participate in the training but are not updated. The parameters of the face recognition layer are deleted, i.e., they do not participate in the training at all. By sharing the computation of the feature extraction part, the amount of computation during training is reduced and training efficiency is improved. A sketch of this construction and freezing appears below.
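A minimal sketch of this sharing and freezing, assuming the 128-dimensional feature extractor sketched earlier; setting requires_grad to False plays the role of fixing the learning rate at 0 for the shared layer.

```python
import torch.nn as nn

def build_attribute_model(base_feature_extractor, num_classes):
    """Fig. 6 sketch: reuse the base model's trained feature extraction layer
    and attach a fresh face attribute recognition layer. The 128-dimensional
    feature length matches the extractor sketched earlier (an assumption)."""
    for p in base_feature_extractor.parameters():
        p.requires_grad = False                    # frozen: shared, never updated
    attribute_layer = nn.Linear(128, num_classes)  # randomly initialized head
    # The base model's face recognition layer is simply not included here,
    # which corresponds to deleting its parameters from this training branch.
    return nn.Sequential(base_feature_extractor, attribute_layer)
```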
For each face attribute recognition model, after it is constructed and the parameters of its face attribute recognition layer are randomly initialized, it can be trained for face attribute recognition on the training images carrying face attribute labels in the second training set. A face attribute label indicates the real face attribute of the face in a training image. Note that, for multiple face attributes, the second training set may contain multiple training subsets whose images carry the same type of face attribute label, or a single training image in the second training set may carry several types of face attribute labels, which is not limited here.
The training process may specifically include the following steps:
As shown in fig. 7, in step 431, for the training images in the second training set, the current training image is input into the face attribute recognition model and the face attribute category is predicted, obtaining a prediction result for the current training image that indicates the predicted face attribute of the face in it.
In step 433, the difference loss for the current training image is determined according to the difference between its prediction result and the face attribute label it carries. The difference loss may be determined by any type of loss function, for example a cross-entropy loss function or a regression loss function, which is not limited here.
In step 435, if the difference loss for the current training image does not satisfy the convergence condition, the parameters of the face attribute recognition layer in the face attribute recognition model are updated.
The convergence condition may be set flexibly according to the actual needs of the application scenario; for example, it may require that the difference loss be minimized, to improve recognition accuracy, or that the number of iterations exceed an iteration threshold, to improve training efficiency.
It should be noted that, because the parameters of the feature extraction layer were already updated during the training of the initial face recognition model on the first training set, they are not updated during the training of the face attribute recognition model on the second training set; only the parameters of the face attribute recognition layer are updated. In this way, the especially plentiful and diverse first training set is fully exploited while the differences between the face attribute labels in the second training set are also incorporated, which reduces the complexity of parameter updating, lowers the training difficulty, and fully ensures the accuracy of face attribute recognition.
In step 437, the next training image is input into the face attribute recognition model, and so on, until the difference loss satisfies the convergence condition and the training of the face attribute recognition model is completed, as sketched below.
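A minimal sketch of steps 431 to 437, again with cross-entropy standing in for the difference loss and an epoch count standing in for the convergence condition; only the attribute recognition layer's parameters reach the optimizer.

```python
import torch.nn as nn
import torch.optim as optim

def train_attribute_head(attribute_model, second_training_set, epochs=10, lr=1e-3):
    """Steps 431-437 sketch: only the attribute recognition layer's parameters
    are optimized; the frozen shared feature extraction layer is not updated."""
    criterion = nn.CrossEntropyLoss()            # stands in for the difference loss
    trainable = [p for p in attribute_model.parameters() if p.requires_grad]
    optimizer = optim.SGD(trainable, lr=lr)      # head parameters only
    for _ in range(epochs):                      # stands in for the convergence test
        for images, attribute_labels in second_training_set:
            optimizer.zero_grad()
            loss = criterion(attribute_model(images), attribute_labels)
            loss.backward()                      # frozen layer gets no updates
            optimizer.step()
    return attribute_model
```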
In this way, each face attribute recognition model gains the capability of recognizing its corresponding face attribute and provides the basis for the subsequent face attribute extension.
Step 450: perform face attribute extension processing on the base model according to the face attribute recognition models corresponding to the face attributes, obtaining the face recognition model that completes face attribute extension.
FIG. 8 illustrates the face recognition model extended from the base model in one embodiment. In fig. 8, the face attribute extension process specifically takes the feature extraction layer 401 shared by the base model and the face attribute recognition models as the first layer of the face recognition model 600 that completes face attribute extension; connects the face recognition layer 403 of the base model in parallel with the face attribute recognition layers 501 to 503 of the face attribute recognition models to form the second layer of the model 600; and connects the output end A of the first layer to the parallel input end B of the second layer, forming the face recognition model 600 that completes face attribute extension. The model 600 not only gains the capability of multi-face-attribute recognition but also retains the capability of face recognition. A sketch of this merged structure appears below.
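A minimal sketch of the merged structure of fig. 8: one shared feature extraction layer feeding the face recognition layer and all attribute recognition layers in parallel. The dictionary-of-heads layout is an illustrative choice, not the patent's prescribed interface.

```python
import torch
import torch.nn as nn

class ExtendedFaceModel(nn.Module):
    """Fig. 8 sketch: a shared feature extraction layer (first layer) feeding
    the face recognition layer and all face attribute recognition layers
    connected in parallel (second layer)."""
    def __init__(self, feature_extractor, face_layer, attribute_layers):
        super().__init__()
        self.feature_extractor = feature_extractor   # output end A
        self.face_layer = face_layer                 # retained from the base model
        self.attribute_layers = nn.ModuleDict(attribute_layers)

    def forward(self, x):
        features = self.feature_extractor(x)         # computed once, shared
        outputs = {"identity": self.face_layer(features)}
        for name, layer in self.attribute_layers.items():
            outputs[name] = layer(features)          # e.g. "age", "gender", ...
        return outputs
```

Built this way, adding one more attribute later only requires training one new head against the frozen extractor and registering it alongside the existing heads; the original tasks are untouched, which is the extensibility the embodiment claims.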
Likewise, the face detection that produces the face image may be implemented as part of the model, i.e., inside it, as shown in fig. 8, or independently outside the model, which is not limited here.
Under this embodiment, the face attributes are extended on the basis of the face recognition model, exploiting the especially plentiful and diverse first training set and sharing the computation of the feature extraction part, which reduces as much as possible the computation of the several training branches for face attribute recognition. Deploying a new task is flexible and highly extensible: an added task does not affect the original tasks, i.e., old and new tasks do not need to be combined and retrained together, which further minimizes the amount of computation.
Fig. 9 is a schematic diagram of an implementation of the image processing method in an application scenario. The implementation environments shown in fig. 1 and fig. 2 both apply to this scenario.
In step 801, the image acquisition device captures an image to be recognized, and in step 802 it transmits the image to the cloud, for example the server 170 shown in fig. 1, or to an edge device. The edge device may be the image acquisition device itself, for example the smart device 130 (such as a smart camera) shown in fig. 1, or another electronic device, for example the user terminal (such as a smartphone) shown in fig. 2.
After receiving the image to be recognized, the cloud/edge can call the face attribute recognition service so that a device can perform the corresponding scene processing based on at least one face attribute. Specifically, in step 803, a scene execution scheme related to at least one face attribute is determined according to the attribute recognition results obtained by multi-face-attribute recognition.
After obtaining the scene execution scheme, the cloud/edge can determine from it the scene operation, corresponding to the face attributes, that a device should execute, and send a device control instruction to that device, which is thereby controlled to execute the scene operation in response to the instruction.
In step 804, the device executes the scene operation corresponding to the face attributes.
For example, as shown in fig. 10, based on the several face attributes, the user terminal presents to the user a face image containing the face together with basic face information, including but not limited to: age, gender, expression, whether a mask is worn, whether glasses are worn, face image quality, and so on.
Taking the case where the face attributes include face image quality, the quality of the face image can be evaluated on that basis. Specifically, the user terminal can evaluate the face image quality behind a face recognition result in order to recognize strangers accurately. For example, if the face recognition result indicates a stranger but the face image quality is only 50 points, the stranger may have been misrecognized, and the smart camera is notified to recapture the image to be recognized, as sketched below.
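A minimal sketch of this quality gate; the 60-point threshold and all names are assumed for illustration.

```python
QUALITY_THRESHOLD = 60   # assumed configuration value

def check_stranger(face_recognition_result, face_image_quality):
    """If a stranger is reported but the face image quality is too low
    (e.g. only 50 points), treat it as a possible misrecognition and ask
    the smart camera to recapture the image to be recognized."""
    if face_recognition_result == "stranger" and face_image_quality < QUALITY_THRESHOLD:
        return "recapture"
    return "accept"
```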
Taking the case where the face attributes include age and/or expression: if the expression is happy, the smart speaker pushes a cheerful song or voice message matching the user's age group, or the user terminal stores the face image and shares it to the social application associated with the user.
Of course, if the age and/or expression indicate sadness, the smart speaker may push a soothing song or voice message matching the user's age group, and the device automation of a soothing scene configured by the user in the user terminal may also be triggered; for example, the configuration of the soothing scene includes but is not limited to: adjusting the living room lamp brightness to 20%, setting the living room fan to a breeze gear, and having the smart television play comedy programs.
In this application scenario, accurate scene processing based on face attributes is achieved, including but not limited to: evaluating the face image quality behind a face recognition result, presenting basic face information, pushing/sharing information, and triggering device linkage/device automation in a smart home scenario, thereby effectively improving the user experience.
The following are apparatus embodiments of the present application, which can be used to perform the image processing method of the present application. For details not disclosed in the apparatus embodiments, reference is made to the method embodiments of the image processing method of the present application.
Referring to fig. 11, an embodiment of the present application provides an image processing apparatus 900, including but not limited to: an image acquisition module 910, a feature extraction module 930, an attribute identification module 950, and a scene processing module 970.
The image obtaining module 910 is configured to obtain an image to be identified.
The feature extraction module 930 is configured to perform image feature extraction on the image to be recognized to obtain an image feature of the image to be recognized.
The attribute identification module 950 is configured to perform multiple face attribute identification on a face in an image to be identified according to image features of the image to be identified, so as to obtain multiple attribute identification results, where each attribute identification result is used to indicate an identified face attribute.
The scene processing module 970 is configured to perform corresponding scene processing based on the identified at least one face attribute.
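Read as a pipeline, the four modules compose as in the short sketch below; the callables are placeholders standing in for modules 910-970, not the patented implementation:

    def run_pipeline(acquire, extract_features, identify_attributes, process_scene):
        image = acquire()                         # image acquisition module 910
        features = extract_features(image)        # feature extraction module 930
        results = identify_attributes(features)   # attribute identification module 950
        return process_scene(results)             # scene processing module 970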
In an exemplary embodiment, the attribute recognition result is obtained by invoking a face recognition model for which face attribute extension has been completed.
In one exemplary embodiment, the apparatus further comprises a training module for training the face recognition model. The training module comprises: a first training unit, configured to train an initial face recognition model according to a first training set for face recognition until the initial face recognition model converges to satisfy the training condition, thereby obtaining a trained basic model; a second training unit, configured to train the face attribute recognition model corresponding to each face attribute according to a second training set for face attribute recognition until the face attribute recognition model corresponding to each face attribute converges to satisfy the training condition, thereby obtaining the trained face attribute recognition model corresponding to each face attribute, where the face attribute recognition model corresponding to each face attribute shares the same feature extraction layer with the basic model; and an extension unit, configured to perform face attribute extension processing on the basic model according to the face attribute recognition model corresponding to each face attribute, thereby obtaining the face recognition model for which face attribute extension has been completed.
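As a rough sketch of this two-stage procedure, the PyTorch-style outline below trains the basic model first and then one head per face attribute on the shared feature extraction layer; the helper callables and data structures are assumptions, not the claimed implementation:

    import torch.nn as nn

    def train_two_stage(backbone, id_head, attr_heads, train_base, train_head,
                        face_set, attr_sets):
        # Stage 1: basic model = shared feature extraction layer + face
        # recognition layer, trained on the first training set.
        train_base(nn.Sequential(backbone, id_head), face_set)
        # Stage 2: each attribute head is trained on the second training set,
        # reusing (and not updating) the shared feature extraction layer.
        for name, head in attr_heads.items():
            train_head(backbone, head, attr_sets[name])
        return backbone, id_head, attr_heads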
In one exemplary embodiment, the apparatus further comprises a separation module for separating the trained feature extraction layer from the basic model. The separation module comprises: a level determining unit, configured to determine, based on the basic model, the feature extraction layer and the face recognition layer that have completed training; and a parameter processing unit, configured to freeze the parameters of the feature extraction layer and to delete the parameters of the face recognition layer while the face attribute recognition model corresponding to each face attribute is trained.
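In PyTorch terms, this freezing could look like the sketch below; the module names and learning rate are assumptions:

    import torch

    def freeze_for_attribute_training(backbone, attr_head, lr=1e-3):
        for p in backbone.parameters():
            p.requires_grad_(False)   # the feature extraction layer stays fixed
        # Only the attribute recognition layer is handed to the optimizer; the
        # face recognition layer's parameters are simply omitted, which plays
        # the role of "deleting" them during attribute training.
        return torch.optim.SGD(attr_head.parameters(), lr=lr)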
In one exemplary embodiment, the apparatus further comprises a construction module for constructing the face attribute recognition models that contain the feature extraction layer of the basic model. The construction module comprises: a layer definition unit, configured to take the feature extraction layer of the basic model as the feature extraction layer of the face attribute recognition model corresponding to each face attribute, where each face attribute recognition model further comprises a face attribute recognition layer corresponding to its face attribute; and a layer connecting unit, configured to connect the output of the feature extraction layer in each face attribute recognition model to the input of the corresponding face attribute recognition layer, thereby obtaining the constructed face attribute recognition model corresponding to each face attribute.
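A minimal sketch of this construction step follows, assuming the shared feature extraction layer outputs a 512-dimensional feature (an assumption; the embodiments do not fix a width):

    import torch.nn as nn

    def build_attribute_model(backbone: nn.Module, num_classes: int,
                              feat_dim: int = 512) -> nn.Module:
        # The output of the shared feature extraction layer feeds the input
        # of the new face attribute recognition layer.
        head = nn.Linear(feat_dim, num_classes)
        return nn.Sequential(backbone, head)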
In one exemplary embodiment, the second training unit comprises: a class prediction subunit, configured to input, for the training images in the second training set that carry face attribute labels, the current training image into the face attribute recognition model corresponding to each face attribute and to predict the face attribute class, thereby obtaining a prediction result for the current training image, where the face attribute label indicates the real face attribute of the face in the training image and the prediction result indicates the predicted face attribute of that face; a difference determining subunit, configured to determine the difference loss for the current training image according to the difference between its prediction result and the face attribute label it carries; a parameter updating subunit, configured to update the parameters of the face attribute recognition layer in the face attribute recognition model if the difference loss for the current training image does not satisfy the convergence condition; and a training completion subunit, configured to input the next training image into the face attribute recognition model until the difference loss for a training image satisfies the convergence condition, at which point training of the face attribute recognition model is complete.
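This loop can be sketched as follows; cross-entropy as the difference loss and a fixed threshold as the convergence condition are assumptions for illustration:

    import torch.nn.functional as F

    def train_attribute_model(model, loader, optimizer, eps=1e-3):
        for image, label in loader:              # label: real face attribute
            pred = model(image)                  # predicted face attribute class
            loss = F.cross_entropy(pred, label)  # difference loss
            if loss.item() < eps:                # convergence condition satisfied
                break                            # training of this model is done
            optimizer.zero_grad()
            loss.backward()                      # backbone is frozen, so only the
            optimizer.step()                     # attribute recognition layer updates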
In one exemplary embodiment, the extension unit comprises: a layer sharing subunit, configured to take the feature extraction layer shared by the basic model and the plurality of face attribute recognition models as the first layer of the face recognition model for which face attribute extension is completed; a level parallel subunit, configured to connect the face recognition layer in the basic model in parallel with the face attribute recognition layers in the plurality of face attribute recognition models, forming the second layer of that face recognition model; and a layer connection subunit, configured to connect the output of the first layer to the parallel inputs of the second layer, forming the face recognition model for which face attribute extension is completed.
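Structurally, the extended model can be sketched as one shared first layer feeding a parallel second layer; the class below is an illustrative reading, not the patented architecture:

    import torch.nn as nn

    class ExtendedFaceModel(nn.Module):
        def __init__(self, backbone, id_head, attr_heads):
            super().__init__()
            self.backbone = backbone                     # first layer (shared)
            self.id_head = id_head                       # face recognition layer
            self.attr_heads = nn.ModuleDict(attr_heads)  # attribute layers

        def forward(self, x):
            feat = self.backbone(x)                  # features computed once
            out = {"identity": self.id_head(feat)}   # second layer, in parallel
            for name, head in self.attr_heads.items():
                out[name] = head(feat)
            return out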
In one exemplary embodiment, the apparatus further comprises a face recognition module for performing face recognition on the face in the image to be recognized according to the image features of the image to be recognized, thereby obtaining a face recognition result. The scene processing module comprises a quality evaluation unit, configured to determine a scene execution scheme related to face image quality according to the face image quality indicated by the attribute recognition result, so that the device evaluates the face image quality of the face recognition result according to the determined scheme.
In one exemplary embodiment, the apparatus further comprises: a face detection module for performing face detection on the image to be recognized; and a face extraction module for extracting, if the image to be recognized is detected to contain a face, the face region in the image to be recognized to obtain a face image, so that image feature extraction is performed on the basis of the face image.
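A sketch of this detection-then-crop step follows; detect_faces stands in for any off-the-shelf detector and is an assumption, as is the NumPy-style image indexing:

    def extract_face(image, detect_faces):
        boxes = detect_faces(image)          # e.g. a list of (x, y, w, h) boxes
        if not boxes:
            return None                      # no face detected: nothing to extract
        x, y, w, h = boxes[0]
        return image[y:y + h, x:x + w]       # face image used for feature extraction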
It should be noted that the image processing apparatus provided in the above embodiment is illustrated with the above division of functional modules only as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the image processing apparatus may be divided into different functional modules to complete all or part of the functions described above.
In addition, the image processing apparatus provided by the above embodiment and the image processing method belong to the same concept; the specific manner in which each module operates has been described in detail in the method embodiments and is not repeated here.
FIG. 12 shows a schematic structural diagram of an electronic device according to an exemplary embodiment. The electronic device is suitable for the user terminal 110, the smart device 130, the gateway 150, the server 170, and the like in the implementation environment shown in fig. 1.
It should be noted that this electronic device is only an example adapted to the present application and should not be considered as limiting the scope of use of the application in any way. Nor should the electronic device be construed as needing to rely on or include one or more components of the exemplary electronic device 2000 shown in fig. 12.
The hardware structure of the electronic device 2000 may vary considerably depending on configuration or performance. As shown in fig. 12, the electronic device 2000 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
Specifically, the power supply 210 is used to provide an operating voltage for each hardware device on the electronic device 2000.
The interface 230 includes at least one wired or wireless network interface for interacting with external devices. For example, interaction between user terminal 110 and gateway 150 is conducted in the implementation environment shown in FIG. 1.
Of course, in other examples of the present application, the interface 230 may further include at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, and the like, as shown in fig. 12, which is not limited herein.
The memory 250 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system 251, applications 253, and data 255, and the storage manner may be transient or permanent.
The operating system 251 is used for managing and controlling the hardware devices and the applications 253 on the electronic device 2000, so as to implement the operation and processing of the mass data 255 in the memory 250 by the central processing unit 270, and may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
An application 253 is a computer program that performs at least one specific task on top of the operating system 251 and may comprise at least one module (not shown in fig. 12), each of which may contain a computer program for the electronic device 2000. For example, the image processing apparatus can be regarded as an application 253 deployed on the electronic device 2000.
The data 255 may be photos, pictures, and the like stored on a disk, or may be the face recognition model and the like stored in the memory 250.
The central processing unit 270 may include one or more processors and is configured to communicate with the memory 250 through at least one communication bus to read the computer program stored in the memory 250, thereby implementing the operation and processing of the mass data 255 in the memory 250. For example, the image processing method is accomplished by the central processing unit 270 reading a series of computer programs stored in the memory 250.
Furthermore, the present application can be implemented by hardware circuits or by hardware circuits in combination with software, and therefore, the implementation of the present application is not limited to any specific hardware circuits, software, or a combination of the two.
Referring to fig. 13, in an embodiment of the present application, an electronic device 4000 is provided. The electronic device 4000 may include: a desktop computer, a notebook computer, a tablet computer, a smart phone, a gateway, a server, and the like.
In fig. 13, the electronic device 4000 includes at least one processor 4001, at least one communication bus 4002, and at least one memory 4003.
The processor 4001 is coupled to the memory 4003, for example via the communication bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between this electronic device and other electronic devices, such as sending and/or receiving data. Note that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or of a DSP and a microprocessor.
The communication bus 4002 may include a path that carries information between the aforementioned components. It may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 13, but this does not mean there is only one bus or one type of bus.
The memory 4003 may be a ROM (Read-Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
A computer program is stored in the memory 4003, and the processor 4001 reads the computer program stored in the memory 4003 through the communication bus 4002.
The computer program realizes the image processing method in the above embodiments when executed by the processor 4001.
Furthermore, in an embodiment of the present application, a storage medium is provided, and a computer program is stored on the storage medium, and when being executed by a processor, the computer program realizes the image processing method in each of the above embodiments.
A computer program product is provided in an embodiment of the present application, the computer program product comprising a computer program stored in a storage medium. The processor of the computer device reads the computer program from the storage medium, and the processor executes the computer program, so that the computer device executes the image processing method in each of the embodiments described above.
Compared with the prior art, in the image processing realized here, on one hand the multiple face attributes are recognized separately, so that recognition is easier and more targeted and the accuracy of single face attribute recognition can be guaranteed; on the other hand, the recognition of the multiple face attributes is based on the image features of the same image to be recognized, so that the multiple face attributes are associated with one another. This helps improve the accuracy of multi-face-attribute recognition and, in turn, the accuracy of scene processing, effectively solving the problem of low scene-processing accuracy in the related art.
In addition, face attribute recognition is realized through the face recognition model with extended face attributes. This makes full use of the large data volume and rich diversity of the training set for face recognition, and avoids the low recognition accuracy caused by an insufficient number of learnable parameters in the recognition network. Deployment of new tasks is flexible and the extensibility is strong, since new and old tasks do not need to be combined and relearned together. At the same time, the heavy computation of the feature extraction part is shared, which minimizes the amount of computation and improves recognition efficiency and accuracy.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing is only a part of the embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (12)

1. An image processing method, comprising:
acquiring an image to be recognized;
carrying out image feature extraction on the image to be recognized to obtain the image features of the image to be recognized;
according to the image characteristics of the image to be recognized, recognizing multiple face attributes of the face in the image to be recognized to obtain multiple attribute recognition results, wherein each attribute recognition result is used for indicating one recognized face attribute;
and carrying out corresponding scene processing based on the identified at least one face attribute.
2. The method of claim 1, wherein the attribute recognition result is obtained by calling a face recognition model which completes face attribute extension.
3. The method of claim 2, wherein the face recognition model is trained by a training step comprising:
training an initial face recognition model to be trained according to a first training set for face recognition until the initial face recognition model converges to meet training conditions to obtain a trained basic model;
training a face attribute recognition model corresponding to each face attribute according to a second training set for face attribute recognition until the face attribute recognition model corresponding to each face attribute converges to meet training conditions, and obtaining a trained face attribute recognition model corresponding to each face attribute; the human face attribute recognition model corresponding to each human face attribute shares the same feature extraction layer with the basic model;
and performing face attribute extension processing on the basic model according to the face attribute identification model corresponding to each face attribute to obtain a face identification model for finishing face attribute extension.
4. A method as claimed in claim 3, wherein before the training of the face attribute recognition model corresponding to each face attribute according to the second training set for face attribute recognition, the method further comprises: separating a feature extraction layer which completes training from the basic model;
the step of separating the trained feature extraction layer from the basic model comprises the following steps:
determining a feature extraction layer and a face recognition layer which finish training based on the basic model;
and in the process of training the face attribute recognition model corresponding to each face attribute, freezing the parameters of the feature extraction layer and deleting the parameters of the face recognition layer.
5. A method as claimed in claim 3, wherein before the training of the face attribute recognition model corresponding to each face attribute according to the second training set for face attribute recognition, the method further comprises: constructing a human face attribute recognition model of a feature extraction layer containing the basic model;
the constructing of the face attribute recognition model of the feature extraction layer including the basic model includes:
taking the feature extraction layer of the basic model as a feature extraction layer of a face attribute identification model corresponding to each face attribute; each face attribute identification model comprises a face attribute identification layer corresponding to each face attribute;
and connecting the output end of the feature extraction layer in each face attribute recognition model to the input end of the corresponding face attribute recognition layer to obtain the constructed face attribute recognition model corresponding to each face attribute.
6. The method according to claim 3, wherein the training of the face attribute recognition model corresponding to each face attribute according to the second training set for face attribute recognition comprises:
for training images carrying face attribute labels in the second training set, inputting a current training image into the face attribute recognition model corresponding to each face attribute, and predicting the face attribute class to obtain a prediction result of the current training image, wherein the face attribute label is used for indicating the real face attribute of the face in the training image, and the prediction result is used for indicating the predicted face attribute of the face in the training image;
determining the difference loss corresponding to the current training image according to the difference between the prediction result of the current training image and the face attribute label carried by the current training image;
if the difference loss corresponding to the current training image does not meet the convergence condition, updating the parameters of a face attribute recognition layer in the face attribute recognition model;
and inputting the next training image into the face attribute recognition model until the difference loss corresponding to the next training image meets the convergence condition, and finishing the training of the face attribute recognition model.
7. The method according to claim 3, wherein the performing the face attribute extension process on the basic model according to the face attribute identification model corresponding to each face attribute to obtain the face identification model with the face attribute extension completed comprises:
taking the feature extraction layer shared by the basic model and the plurality of face attribute recognition models as a first layer of the face recognition model for finishing face attribute expansion;
connecting the face recognition layer in the basic model with the face attribute recognition layers in the plurality of face attribute recognition models in parallel respectively to form a second layer of the face recognition model for finishing face attribute expansion;
and connecting the output end of the first layer to the parallel input end of the second layer to form a face recognition model for finishing the face attribute expansion.
8. The method of claim 1, wherein the face attributes include face image quality; the method further comprises the following steps:
according to the image characteristics of the image to be recognized, carrying out face recognition on the face in the image to be recognized to obtain a face recognition result;
the corresponding scene processing based on the identified at least one face attribute comprises:
and determining a scene execution scheme related to the quality of the face image according to the quality of the face image indicated by the attribute recognition result, so that the equipment performs face image quality evaluation on the face recognition result according to the determined scene execution scheme.
9. The method of any of claims 1 to 8, wherein after acquiring the image to be identified, the method further comprises:
carrying out face detection on the image to be recognized;
if it is detected that the image to be recognized contains a face, extracting the face region in the image to be recognized to obtain a face image, so that the image feature extraction is performed on the basis of the face image.
10. An image processing apparatus characterized by comprising:
the image acquisition module is used for acquiring an image to be identified;
the characteristic extraction module is used for extracting image characteristics of the image to be identified to obtain the image characteristics of the image to be identified;
the attribute identification module is used for identifying multiple face attributes of the face in the image to be identified according to the image characteristics of the image to be identified to obtain multiple attribute identification results, and each attribute identification result is used for indicating one identified face attribute;
and the scene processing module is used for carrying out corresponding scene processing based on the identified at least one face attribute.
11. An electronic device, comprising: at least one processor, at least one memory, and at least one communication bus, wherein,
the memory has a computer program stored thereon, and the processor reads the computer program in the memory through the communication bus;
the computer program, when executed by the processor, implements the image processing method of any one of claims 1 to 9.
12. A storage medium on which a computer program is stored, which computer program, when being executed by a processor, carries out the image processing method according to any one of claims 1 to 9.
CN202210642616.3A 2022-06-08 2022-06-08 Image processing method, image processing apparatus, electronic device, and medium Pending CN115147894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210642616.3A CN115147894A (en) 2022-06-08 2022-06-08 Image processing method, image processing apparatus, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210642616.3A CN115147894A (en) 2022-06-08 2022-06-08 Image processing method, image processing apparatus, electronic device, and medium

Publications (1)

Publication Number Publication Date
CN115147894A true CN115147894A (en) 2022-10-04

Family

ID=83408053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210642616.3A Pending CN115147894A (en) 2022-06-08 2022-06-08 Image processing method, image processing apparatus, electronic device, and medium

Country Status (1)

Country Link
CN (1) CN115147894A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115733705A (en) * 2022-11-08 2023-03-03 深圳绿米联创科技有限公司 Space-based information processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination