CN111191527B - Attribute identification method, attribute identification device, electronic equipment and readable storage medium - Google Patents
- Publication number: CN111191527B
- Application number: CN201911295248.4A
- Authority
- CN
- China
- Prior art keywords
- attribute
- attributes
- image
- attribute identification
- image features
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The embodiment of the application provides an attribute identification method, an attribute identification device, electronic equipment and a readable storage medium, aiming at improving the accuracy of attribute identification. The attribute identification method comprises the following steps: obtaining image features of an image to be identified; inputting the image features into a plurality of attribute identification branches, so as to respectively identify different groups of attributes of the image to be identified through different attribute identification branches; wherein each attribute identification branch corresponds to a group of attributes, and for each attribute in a group, the image regions that determine the attribute identification result lie at the same or similar positions on the image.
Description
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to an attribute identification method, an attribute identification device, electronic equipment and a readable storage medium.
Background
In the technical field of image processing, an attribute identification task is to automatically identify attribute information of an object in a picture according to the image features of the picture. In general, the attributes to be identified in such a task are of multiple kinds. Taking attribute identification of a pedestrian image as an example, the attributes of the pedestrian image may include: gender, age, hair length, hair color, whether a hat is worn, whether glasses are worn, whether a backpack is carried, whether short sleeves are worn, upper-body clothing color, whether a bicycle is being ridden, etc. Taking attribute identification of a vehicle image as an example, the attributes of the vehicle image may include: vehicle model, body color, vehicle brand, vehicle orientation, whether there is a roof rack, whether there is a front-seat passenger, whether the seat belt is fastened, whether the sun visor is down, etc.
In the related art, in order to make attribute recognition as efficient as possible and shorten the time needed to obtain attribute information, the attribute information of every attribute category is usually obtained from the image to be recognized at one time. For example, after a pedestrian image to be recognized is obtained, the gender, age, hair length, hair color, whether a hat is worn, whether glasses are worn, whether a backpack is carried, whether short sleeves are worn, upper-body clothing color, whether a bicycle is being ridden, etc. are all recognized at once. However, the accuracy of such an attribute identification method is not high, and errors in some of the attribute recognition results occur easily.
Disclosure of Invention
The embodiment of the application provides an attribute identification method, an attribute identification device, electronic equipment and a readable storage medium, aiming at improving the accuracy of attribute identification.
A first aspect of an embodiment of the present application provides an attribute identifying method, including:
obtaining image characteristics of an image to be identified;
inputting the image characteristics into a plurality of attribute identification branches to respectively identify different groups of attributes of the image to be identified through different attribute identification branches;
wherein each attribute identification branch corresponds to a group of attributes, and for each attribute in a group, the image regions that determine the attribute identification result lie at the same or similar positions on the image.
A second aspect of embodiments of the present application provides an attribute identification device, the device including:
the image characteristic obtaining module is used for obtaining the image characteristics of the image to be identified;
the attribute identification module is used for inputting the image characteristics into a plurality of attribute identification branches so as to respectively identify different groups of attributes of the image to be identified through different attribute identification branches;
wherein each attribute identification branch corresponds to a group of attributes, and for each attribute in a group, the image regions that determine the attribute identification result lie at the same or similar positions on the image.
A third aspect of the embodiments of the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described in the first aspect of the present application.
A fourth aspect of the embodiments of the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed implements the steps of the method described in the first aspect of the present application.
By adopting the attribute identification method provided by the application, the image features of the image to be identified are first obtained, and the image features are then input into a plurality of attribute identification branches, so that different groups of attributes of the image to be identified are respectively identified through different attribute identification branches. Each attribute identification branch corresponds to a group of attributes, and for each attribute in the same group, the image regions that determine the attribute identification result lie at the same or similar positions on the image. Therefore, the attributes identified by the same attribute identification branch promote one another, which is beneficial to improving the accuracy of the attribute identification result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for attribute identification according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for attribute identification according to another embodiment of the present application;
FIG. 3 is a schematic diagram illustrating attribute identification according to an embodiment of the present application;
FIG. 4 is a schematic diagram of attribute identification according to another embodiment of the present application;
FIG. 5 is a flow chart of model training according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a plurality of class activation graphs in accordance with one embodiment of the present application;
FIG. 7 is a schematic diagram of training a preset model according to an embodiment of the present application;
fig. 8 is a schematic diagram of an attribute identifying apparatus according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the technical field of image processing, an attribute identification task is to automatically identify attribute information of an object in a picture according to the image features of the picture. In general, the attributes to be identified in such a task are of multiple kinds. Taking attribute identification of a pedestrian image as an example, the attributes of the pedestrian image may include: gender, age, hair length, hair color, whether a hat is worn, whether glasses are worn, whether a backpack is carried, whether short sleeves are worn, upper-body clothing color, whether a bicycle is being ridden, etc. Taking attribute identification of a vehicle image as an example, the attributes of the vehicle image may include: vehicle model, body color, vehicle brand, vehicle orientation, whether there is a roof rack, whether there is a front-seat passenger, whether the seat belt is fastened, whether the sun visor is down, etc.
In the related art, in order to make attribute recognition as efficient as possible and shorten the time needed to obtain attribute information, the attribute information of every attribute category is usually obtained from the image to be recognized at one time. However, the accuracy of such an attribute identification method is not high, and errors in some of the attribute recognition results occur easily.
After analysis, the applicant of the present application determined that one of the reasons for the low accuracy of such an attribute recognition method is the following: for some attributes, the image regions that determine the attribute recognition result are the same or similar, while for other attributes these regions differ from one another. When two attributes whose decisive image regions differ (hereinafter collectively referred to as mutually exclusive attributes) are recognized simultaneously, they negatively affect each other, which reduces the accuracy of attribute recognition.
Taking attribute identification of a pedestrian image as an example, the attributes of the pedestrian image may include: gender, age, hair length, hair color, whether a hat is worn, whether glasses are worn, whether a backpack is carried, whether short sleeves are worn, upper-body clothing color, whether a bicycle is being ridden, etc. The applicant of the application trained an attribute recognition model and, taking a pedestrian image as an example, used the trained model to generate the activation map corresponding to each attribute of the pedestrian image, so that the activation map reveals which region of the pedestrian image the model actually focuses on while an attribute is being recognized. In other words, the position of the image region that plays a decisive role in the recognition result of each attribute is determined from its activation map.
Here, the activation map corresponding to an attribute is used to represent the weight with which each region of the pedestrian image determines the recognition result of that attribute. Put differently, the activation map corresponding to an attribute reflects the model parameters of the trained attribute recognition model that participate in identifying that attribute. The manner in which the activation map is generated is described in detail below and is not repeated here.
By observing the activation maps corresponding to the attributes of the pedestrian image, the applicant found that the image regions that determine the recognition result differ from attribute to attribute. In the activation maps of attributes such as gender, hair length, and whether a hat is worn, the highlighted regions are concentrated near the neck and shoulders of the pedestrian; in other words, when attributes such as gender, hair length, and whether a hat is worn are identified, the image regions that determine the recognition result are the neck and shoulder regions of the pedestrian in the pedestrian image.
Likewise, in the activation maps of attributes such as whether a backpack is carried or whether short sleeves are worn, the highlighted regions are concentrated near the arms and upper body of the pedestrian; in other words, when these attributes are recognized, the image regions that determine the recognition result are the arm and upper-body regions of the pedestrian in the pedestrian image.
Since the image regions on the pedestrian image that determine the recognition result differ between, for instance, the gender attribute and the backpack attribute, recognizing them together causes mutual negative influence and reduces the accuracy of attribute recognition.
In view of this, in order to improve attribute recognition accuracy, the applicant proposes to divide the plurality of attributes into a plurality of groups in advance according to the positions of the image regions that determine the recognition result when each attribute is recognized. Each group of attributes corresponds to one attribute identification branch, and for each attribute in a group, the image regions that determine the attribute identification result lie at the same or similar positions on the image. When attribute identification is performed on an image to be identified, the image features of the image are first obtained and then input into each attribute identification branch, so that different groups of attributes are respectively identified through different attribute identification branches, and mutually exclusive attributes are prevented from negatively affecting each other within the same branch.
Referring to fig. 1, fig. 1 is a flowchart of an attribute identifying method according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
step S11: image features of the image to be identified are obtained.
Since the application does not limit the application field of the attribute identification method, the subject of the image to be identified is not limited either. For example, the image to be recognized may be a pedestrian image with a pedestrian as its subject, or a vehicle image with a vehicle as its subject.
Illustratively, after the attribute identifying method of the present application is applied to the pedestrian attribute identifying system, the pedestrian attribute identifying system obtains the image characteristics of the pedestrian image when executing step S11.
Or, for example, after the attribute identifying method of the present application is applied to the vehicle attribute identifying system, the vehicle attribute identifying system obtains the image features of the vehicle image when step S11 is performed.
Step S12: and inputting the image characteristics into a plurality of attribute identification branches to respectively identify different groups of attributes of the image to be identified through the different attribute identification branches.
Wherein each attribute identification branch corresponds to a group of attributes, and for each attribute in a group, the image regions that determine the attribute identification result lie at the same or similar positions on the image.
A group of attributes typically includes two or more attributes. However, a group may also include only one attribute, which is not limited in this application. For example, when a certain attribute is identified, the position of the image region that determines its recognition result may differ from that of every other attribute; such an attribute can then be treated as a group by itself, with one attribute identification branch established for it alone.
Here, "the image regions that determine the attribute recognition result lie at the same or similar positions on the image" means that those image regions belong to the same location on the target object. Taking a pedestrian as the target object: when the long-hair attribute and the gender attribute are identified, the image regions that determine the recognition result both belong to the neck and shoulders of the pedestrian, so it can be considered that, for these two attributes, the positions of the image regions that determine the recognition result are the same or similar.
By way of example, attribute recognition is performed on a pedestrian image, and the attributes of the pedestrian image are divided into three groups. The first group of attributes includes: whether the hair is long, gender, whether a hat is worn, and whether an umbrella is open. When each attribute in the first group is identified, the image region that determines the recognition result is the neck and shoulder region of the pedestrian, and the first group of attributes is identified by the first attribute identification branch.
The second group of attributes includes: whether a backpack is carried, whether a single-shoulder bag is carried, and whether short sleeves are worn. When each attribute in the second group is identified, the image region that determines the recognition result is the arm and upper-body region of the pedestrian, and the second group of attributes is identified by the second attribute identification branch.
The third group of attributes includes: whether a skirt is worn, whether a bicycle is being ridden, and age. When each attribute in the third group is identified, the image region that determines the recognition result is the whole-body region of the pedestrian, and the third group of attributes is identified by the third attribute identification branch.
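A minimal sketch of this grouping in Python is given below; the attribute names and branch labels are illustrative, not taken from the patent.

```python
# Hypothetical grouping of pedestrian attributes by the image region that
# determines their recognition result; one branch per group.
ATTRIBUTE_GROUPS = {
    "branch_1_neck_and_shoulder": ["long_hair", "gender", "hat", "umbrella_open"],
    "branch_2_arm_and_upper_body": ["backpack", "single_shoulder_bag", "short_sleeves"],
    "branch_3_whole_body": ["skirt", "riding_bicycle", "age"],
}
```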
After the image features of the pedestrian image are obtained through the above-described step S11, the image features are input into the first, second, and third attribute identification branches. Having received the image features, the first attribute identification branch identifies each attribute included in the first group based on them, the second branch identifies each attribute in the second group, and the third branch identifies each attribute in the third group.
In this way, the plurality of attributes of the pedestrian image are recognized in parallel by the three attribute identification branches, and each branch, when recognizing its corresponding group of attributes, relies on image regions of the pedestrian image that lie at the same or similar positions. Therefore, while attribute identification efficiency is preserved, the attributes identified by the same branch promote one another, which improves the accuracy of the recognition results, and mutually exclusive attributes are prevented from negatively affecting each other within the same branch.
Furthermore, since the attribute identification method of the application first obtains the image features of the image to be identified and then inputs them into each attribute identification branch, every branch performs its recognition based on the same image features. These shared features carry the latent correlations among the attributes, and those correlations further improve the accuracy of attribute recognition.
It should be understood that the number of groups and the grouping results mentioned in the above examples are only used to schematically explain the present application; the number of groups and the grouping results in actual implementations may be the same as or different from those in the examples.
For example, steps S11 and S12 may be performed by a trained model after a preset model is built in advance and trained. For how to build and train the preset model, please refer to the description below; it is not repeated here.
Referring to fig. 2, fig. 2 is a flowchart of an attribute identification method according to another embodiment of the present application. As shown in fig. 2, the method comprises the steps of:
step S11-1: and obtaining shallow image characteristics of the image to be identified.
Step S12-1: the shallow image features are input into a plurality of attribute identification branches.
Step S12-2: for each attribute identification branch, obtain, through that branch, the deep image features corresponding to the branch based on the shallow image features, and identify, through that branch, its corresponding group of attributes based on those deep image features.
Here, shallow image features and deep image features are relative concepts. Shallow image features are those extracted by the first few layers of a convolutional neural network (CNN), and deep image features are those extracted by its last few layers. Alternatively, shallow image features are the features extracted by a first CNN, while deep image features are obtained by a second CNN that, after receiving the shallow image features as input, extracts them further.
In general, shallow image features may include color, texture, lines, edges, etc. Deep image features may include semantic features such as faces, limbs, backpacks, vehicles, etc.
Wherein step S11-1 is one embodiment of step S11, and step S12-1 and step S12-2 are one embodiment of step S12.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating attribute identification according to an embodiment of the present application. As shown in fig. 3, a pedestrian image is used as the image to be identified. After the shallow image features of the image are obtained, they are input into the first, second, and third attribute identification branches; each branch further derives the deep image features corresponding to the shallow features and then performs attribute identification based on those deep features, so as to obtain the recognition result of each attribute in the group corresponding to that branch.
It should be understood that the number of groups and the grouping results shown in fig. 3 are only used to schematically explain the present application; the number of groups and the grouping results in actual implementations may be the same as or different from those in fig. 3.
Illustratively, in the above step S11-1 and step S12-2, shallow and deep image features may be extracted by different levels of a ResNet (Residual Neural Network). Specifically, if the ResNet includes N residual blocks, the first N-1 blocks can be used to extract the shallow image features and the last (i.e., the N-th) block to extract the deep image features. In other words, the features extracted by the first N-1 blocks serve as shallow image features, and the features extracted by the last block serve as deep image features.
Thus, in fig. 3, the first N-1 blocks of the ResNet are located before the three attribute identification branches and extract the shallow image features, while the last block is duplicated into three copies, one placed in each attribute identification branch, to extract the deep image features. The blocks in the three branches have the same structure, but because each is updated from different sample results during training, their parameters differ from one another.
In the above example, only the last block is used to extract deep image features, i.e., only the last block is duplicated into the three attribute identification branches. This effectively reduces the parameters of the whole network (the module for extracting shallow image features plus the three attribute identification branches) and improves model training efficiency, as sketched below.
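As a concrete illustration of this split, the sketch below builds the shared shallow extractor and the per-branch deep extractors from a torchvision ResNet-50; the patent only specifies a ResNet with N residual blocks, so the choice of ResNet-50 and PyTorch is an assumption.

```python
import copy

import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)

# Shared shallow feature extractor: all stages up to and including layer3
# (the "first N-1 blocks" of the description above).
shallow_extractor = nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3,
)

# The last residual stage (layer4, the "last block") is duplicated, one copy
# per attribute identification branch. The copies share structure but not
# parameters, since each is updated from different losses during training.
deep_extractors = nn.ModuleList(
    [copy.deepcopy(backbone.layer4) for _ in range(3)]
)
```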
In addition, the shallow image features of the image to be identified are obtained and input into the plurality of attribute identification branches, each branch extracts its corresponding deep image features, and finally each branch performs attribute identification based on its own deep image features. The shallow image features, shared by all branches, reflect the latent correlations among the attributes, and those correlations further improve recognition accuracy; the deep image features extracted separately by each branch reflect the differences between mutually exclusive attributes, so extracting them per branch further prevents mutually exclusive attributes from negatively affecting each other and further improves attribute recognition accuracy.
For example, when each attribute identification branch identifies its corresponding group of attributes based on its corresponding deep image features, a specific embodiment may be: the branch pools its deep image features to obtain pooled image features; the pooled image features are then input into the plurality of attribute identification units of the branch, so that different attributes in the branch's group are respectively identified by different units.
Multiple pooling methods may be adopted, and the specific choice is not limited in this application; for example, it may be global average pooling (GAP), max pooling, mean pooling, or general pooling.
As shown in fig. 3, taking the first attribute identification branch as an example: after obtaining its corresponding deep image features, the branch pools them to obtain pooled image features and then inputs these into its four attribute identification units, which respectively identify whether the hair is long, gender, whether a hat is worn, and whether an umbrella is open. Specifically, each attribute identification unit may include a fully connected layer and a softmax classification output layer; since each unit is updated from different sample results during training, the parameters of the units differ from one another.
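A sketch of such a unit, assuming PyTorch; the fully connected layer plus softmax structure follows the description above, while the class counts and helper names are illustrative.

```python
import torch
import torch.nn as nn

class AttributeUnit(nn.Module):
    """One attribute identification unit: fully connected layer + softmax."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, in_channels), produced once per branch by pooling.
        return torch.softmax(self.fc(pooled), dim=1)

def run_branch(deep_features: torch.Tensor, units: nn.ModuleList) -> list:
    # Pool the branch's deep feature map once (here with global average
    # pooling, one of the options named above), then feed every unit.
    pooled = nn.functional.adaptive_avg_pool2d(deep_features, 1).flatten(1)
    return [unit(pooled) for unit in units]  # one prediction per attribute
```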
For example, the attribute identification unit that identifies the gender attribute determines a loss value from the sample result for the gender attribute during training and is then updated based on that loss value. Likewise, the unit that identifies whether a hat is worn determines its loss value from the sample result for that attribute and is updated based on it.
In the above embodiments, different attribute identification branches correspond to different attribute groups. If the method of each embodiment is performed by a pre-trained preset model, then, because different branches correspond to different attribute groups, attributes that promote each other's learning are within the same group during training, while attributes that suppress each other's learning are placed in different groups. In other words, two or more mutually exclusive attributes are placed in different attribute groups and are identified by different attribute identification branches.
Furthermore, two attributes located in two different attribute groups are not necessarily mutually exclusive; there may instead be a correlation between them. For example, in the above example, the skirt attribute is in the third group, yet it correlates with first-group attributes such as hair length and gender, because, as common sense suggests, long hair is usually worn by women, a person wearing a skirt is usually female, and a person wearing a skirt often has long hair.
Such correlations can therefore be exploited to further improve attribute identification accuracy. To this end, the present application treats an attribute that has an association with another group of attributes as a cross attribute. For example, the skirt attribute is taken as a cross attribute that has an association with the first group of attributes.
When the above step S12-2 is executed, specifically: for each attribute identification branch, in the case where the branch's group of attributes includes a cross attribute, the deep image features corresponding to the other group of attributes associated with that cross attribute are obtained, and the cross attribute is identified from the deep image features of the branch itself together with the deep image features of the other group. Here, the deep image features corresponding to the other group of attributes are those of the attribute identification branch corresponding to that group.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating attribute identification according to another embodiment of the present application. As shown in fig. 4, in the group of attributes corresponding to the third attribute identification branch (i.e., the third group), the skirt attribute is a cross attribute that has an association with the first group of attributes (i.e., the group corresponding to the first attribute identification branch). When this cross attribute is identified, as shown in fig. 4, the deep image features corresponding to the first branch are obtained and fused with the deep image features corresponding to the third branch to obtain a fused feature. Finally, the fused feature is input into the attribute identification unit of the cross attribute so as to identify it.
For example, during fusion, the two deep image feature maps may first be spliced (concatenated) to obtain a spliced feature, and a channel-reduction operation is then applied to the spliced feature to obtain the fused feature, whose number of channels equals that of each deep image feature before splicing.
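A sketch of this fusion step, assuming PyTorch; the patent does not fix the channel-reduction operator, so the 1x1 convolution below is one plausible choice.

```python
import torch
import torch.nn as nn

class CrossAttributeFusion(nn.Module):
    """Splice two branches' deep feature maps, then reduce channels back."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution brings the spliced channel count back to `channels`.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, own: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        spliced = torch.cat([own, other], dim=1)  # (B, 2C, H, W)
        return self.reduce(spliced)               # (B, C, H, W)
```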
It should be understood that fig. 4 only schematically shows one cross attribute; in actual implementations, one or more cross attributes may be determined according to the actual situation, which is not limited in this application.
By utilizing the association between the cross attribute and the other group of attributes, the recognition of the cross attribute can draw on more deep image features, and those features positively influence it, so the recognition accuracy of the cross attribute can be further improved.
The above embodiments have introduced the application process of the attribute identification method. Some embodiments mention that the method may be implemented by a pre-trained preset model; the following describes the training process of the preset model by way of examples.
Referring to fig. 5, fig. 5 is a model training flowchart according to an embodiment of the present application. As shown in fig. 5, the training process includes the following steps:
Step S51: build a preset model, wherein the preset model comprises a feature extraction module and a plurality of attribute identification branches, and each attribute identification branch is connected with the feature extraction module.
Wherein, in order to determine the number of attribute identifying branches and the attribute specifically included in the set of attributes corresponding to each attribute identifying branch, the following sub-steps may be performed:
step S51-1: for each attribute of a preset image, obtaining an activation graph corresponding to the attribute, wherein the activation graph is used for representing: when the attribute is identified, the weight of each area of the preset image for determining the attribute identification result;
step S51-2: clustering multiple attributes into multiple groups according to weight distribution represented by each activation graph, wherein the positions of areas with highest weights in each activation graph corresponding to the various attributes included in each group are the same;
step S51-3: for each set of attributes, an attribute identification branch is established.
For example, an existing attribute recognition model (hereinafter, the existing model) may be used to obtain the activation map of each of the plurality of attributes. Specifically, one or more preset images may be input into the feature extraction network of the existing model to extract an image feature map of the preset image. Global average pooling (GAP) is then applied to the feature map to obtain pooled image features, which are input into the plurality of attribute identification units of the existing model to obtain the prediction results of the attributes. Each unit identifies one attribute and comprises a fully connected layer (FC) and a softmax classification output layer.
Finally, for each attribute, the class activation mapping (CAM) algorithm is used: the weights of the FC layer corresponding to the attribute are applied to the channels of the image feature map; in other words, the FC weights are regarded as per-channel weights, and the weighted sum over all channels yields the activation map corresponding to the attribute.
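A sketch of this CAM computation, assuming PyTorch tensors; the function and argument names are illustrative.

```python
import torch

def class_activation_map(feature_map: torch.Tensor,
                         fc_weight: torch.Tensor,
                         class_idx: int) -> torch.Tensor:
    # feature_map: (C, H, W), output of the feature extraction network.
    # fc_weight:   (num_classes, C), the attribute's fully connected layer.
    # The FC weights of the chosen class act as per-channel weights, and the
    # weighted sum over channels yields the activation map.
    weights = fc_weight[class_idx]                        # (C,)
    return torch.einsum("c,chw->hw", weights, feature_map)
```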
Referring to fig. 6, fig. 6 is a schematic diagram of a plurality of class activation maps according to an embodiment of the present application. As shown in fig. 6, in the activation maps of the attributes of whether the hair is long, gender, whether a hat is worn, and whether an umbrella is open, the highlighted region (i.e., the highest-weight region) is concentrated near the neck and shoulders of the pedestrian. In other words, when these attributes are identified, the neck and shoulder region of the pedestrian plays the most important role in determining their recognition results.
As shown in fig. 6, in the activation maps of the attributes of whether a backpack is carried, whether a single-shoulder bag is carried, and whether short sleeves are worn, the highlighted region (i.e., the highest-weight region) is concentrated near the arms and upper body of the pedestrian. In other words, when these attributes are recognized, their recognition results are most determined by the arm and upper-body regions of the pedestrian.
As shown in fig. 6, in the activation maps of the attributes of whether a skirt is worn, whether a bicycle is being ridden, and age, the highlighted region (i.e., the highest-weight region) is concentrated on the whole-body region of the pedestrian. In other words, when these attributes are recognized, the whole-body region of the pedestrian plays the major role in determining their recognition results.
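Step S51-2 can be realized, for example, by clustering the peak locations of the activation maps; the patent does not prescribe a clustering algorithm, so the k-means choice below is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def group_attributes(activation_maps: dict, n_groups: int = 3) -> dict:
    # activation_maps: attribute name -> (H, W) array of weights.
    names = list(activation_maps)
    peaks = np.array(
        [np.unravel_index(np.argmax(m), m.shape) for m in activation_maps.values()],
        dtype=float,
    )  # (num_attributes, 2): row/column of each map's highest-weight region
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(peaks)
    groups: dict = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    return groups
```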
Referring to fig. 7, fig. 7 is a schematic diagram of training a preset model according to an embodiment of the present application. As shown in fig. 7, the attributes of whether the hair is long, gender, whether a hat is worn, whether an umbrella is open, etc. are clustered into the first group of attributes; the attributes of whether a backpack is carried, whether a single-shoulder bag is carried, whether short sleeves are worn, etc. are clustered into the second group; and the attributes of whether a skirt is worn, whether a bicycle is being ridden, and age are clustered into the third group. An attribute identification branch is established for each of the three groups: the first group corresponds to the first attribute identification branch, the second group to the second, and the third group to the third.
As shown in fig. 7, the three attribute identification branches are connected to the feature extraction module, and each branch includes attribute identification units. The number of units in a branch equals the number of attributes in its corresponding group; in other words, one unit identifies one attribute. Each unit specifically comprises a fully connected layer and a softmax classification output layer; to simplify the drawing, these are not shown in fig. 7.
Step S52: a plurality of sample images are obtained, each sample image carrying a plurality of attribute tags.
Taking the attribute identification task for pedestrian images as an example, after a plurality of sample images whose subjects are pedestrians are obtained, the true value of each attribute is annotated for each sample image and serves as the label of that attribute. For example, if a sample image shows a woman wearing a skirt and carrying a single-shoulder bag, the label for the gender attribute is "female", the label for the single-shoulder-bag attribute is "yes", the label for the backpack attribute is "no", and the label for the skirt attribute is "yes".
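For illustration, such a labelled sample could be encoded as below; the attribute keys are hypothetical.

```python
# One sample image's attribute labels, matching the example above.
sample_labels = {
    "gender": "female",
    "skirt": "yes",
    "single_shoulder_bag": "yes",
    "backpack": "no",
}
```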
Step S53: training the preset model based on the plurality of sample images, including: and inputting each sample image into the preset model, carrying out feature extraction on the sample image through the feature extraction module to obtain the image features of the sample image, and respectively identifying different groups of attributes based on the image features through different attribute identification branches to obtain the identification results of the attributes.
Continuing with the task of attribute identification for pedestrian images as an example, as shown in fig. 7, after an image feature of a certain sample image is extracted by the feature extraction module, the image feature is input into three attribute identification branches. Each attribute identification branch identifies a corresponding group of attributes by respective attribute identification units based on the obtained image characteristics, and obtains an identification result.
Step S54: and updating the attribute identification branch corresponding to each attribute according to the identification result of the attribute and the attribute label corresponding to the attribute.
As an example, as shown in fig. 7, attribute identification unit 1 of the first attribute identification branch obtains the recognition result of the long-hair attribute and determines the loss value based on that result and the label (i.e., the true value) of the long-hair attribute. The loss value is then used to update attribute identification unit 1 and the parts of the first attribute identification branch other than attribute identification units 2 to 4.
Based on the same concept, the other attribute identification units of the first branch are updated, as are the other attribute identification branches and the units they include. Owing to limited space, fig. 7 only schematically shows how the model is updated by the recognition results of the "whether the hair is long" and "whether a skirt is worn" attributes; the updating for the remaining attribute recognition results follows the same pattern as these two.
When determining the loss value, the following rule is followed: when the recognition result is consistent with the label (i.e., the true value), the loss value is small, e.g., equal to 0; when the recognition result is inconsistent with the label, the loss value is larger, e.g., equal to 1.
Step S55: and updating the feature extraction module according to the respective identification results of the plurality of attributes and the attribute labels corresponding to the plurality of attributes.
For example, after the above step S54 is performed to obtain the loss values corresponding to the respective attributes, the average of these loss values may be computed and used to update the feature extraction module.
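A sketch of steps S54 and S55 combined, assuming PyTorch autograd and cross-entropy in place of the 0/1-style loss described above (the patent only requires a loss that is small when prediction and label agree). Summing the per-attribute losses and backpropagating updates each attribute identification unit and its branch from its own loss, while the shared feature extraction module receives the combined gradient of all attributes.

```python
import torch
import torch.nn.functional as F

def training_step(predictions: dict, labels: dict,
                  optimizer: torch.optim.Optimizer) -> float:
    # predictions: attribute name -> (batch, num_classes) raw logits
    # labels:      attribute name -> (batch,) ground-truth class indices
    losses = [F.cross_entropy(predictions[a], labels[a]) for a in predictions]
    total = torch.stack(losses).mean()  # average loss drives the shared module
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```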
After the training of the preset model is finished, the feature extraction module is used for obtaining the image features of the image to be identified.
As shown in fig. 7, the feature extraction module is configured to extract shallow image features, and each attribute identification branch may include a deep feature extraction module configured to extract deep image features based on the shallow image features.
Furthermore, as previously described, two attributes located in two different attribute groups are not necessarily mutually exclusive; there may instead be a correlation between them. For example, in the above example, the skirt attribute is in the third group, yet it correlates with first-group attributes such as hair length and gender, because, as common sense suggests, long hair is usually worn by women, a person wearing a skirt is usually female, and a person wearing a skirt often has long hair.
Therefore, this correlation can be exploited when training the preset model, so that a model with higher recognition accuracy is obtained after training. To this end, the present application treats an attribute that has an association with another group of attributes as a cross attribute. For example, the skirt attribute is taken as a cross attribute that has an association with the first group of attributes.
When the above step S53 is executed, the step of respectively identifying different groups of attributes based on the image features through different attribute identification branches specifically comprises the following substeps:
substep S53-1: for each attribute identification branch, in the case where the group of attributes corresponding to the branch includes a cross attribute, obtain the deep image features corresponding to the other group of attributes associated with the cross attribute;
substep S53-2: identify the cross attribute according to the deep image features corresponding to the branch and the deep image features corresponding to the other group of attributes.
As shown in fig. 7, in the group of attributes corresponding to the third attribute identification branch (i.e., the third group), the skirt attribute is a cross attribute that has an association with the first group of attributes (i.e., the group corresponding to the first attribute identification branch). When this cross attribute is identified, as shown in fig. 7, the deep image features corresponding to the first branch are obtained and fused with the deep image features corresponding to the third branch to obtain a fused feature. Finally, the fused feature is input into the attribute identification unit of the cross attribute so as to identify it.
For example, during fusion, the two deep image feature maps may first be spliced (concatenated) to obtain a spliced feature, and a channel-reduction operation is then applied to the spliced feature to obtain the fused feature, whose number of channels equals that of each deep image feature before splicing.
It should be understood that fig. 7 only schematically shows one cross attribute; in actual implementations, one or more cross attributes may be determined according to the actual situation, which is not limited in this application.
When the above step S54 is executed, the step of updating, for each attribute, the attribute identification branch corresponding to the attribute according to the recognition result of the attribute and the attribute label corresponding to the attribute specifically comprises the following substep:
substep S54-1: for a cross attribute, update the attribute identification branch corresponding to the cross attribute according to the recognition result of the cross attribute and its attribute label, and also update the attribute identification branches corresponding to the other groups of attributes associated with the cross attribute.
As shown in fig. 7, attribute identification unit 9 of the third attribute identification branch obtains the recognition result of the skirt attribute and determines the loss value based on that result and the label (i.e., the true value) of the skirt attribute. The loss value is then used to update attribute identification unit 9, the parts of the third attribute identification branch other than attribute identification units 10 and 11, and the parts of the first attribute identification branch other than attribute identification units 1 to 4.
Based on the same inventive concept, an embodiment of the present application provides an attribute identification device. Referring to fig. 8, fig. 8 is a schematic diagram of an attribute identifying apparatus according to an embodiment of the present application. As shown in fig. 8, the apparatus includes:
an image feature obtaining module 81 for obtaining image features of an image to be identified;
the attribute identifying module 82 is configured to input the image feature into a plurality of attribute identifying branches, so as to identify different sets of attributes of the image to be identified respectively through different attribute identifying branches;
wherein each attribute identification branch corresponds to a group of attributes, and for each attribute in a group, the image regions that determine the attribute identification result lie at the same or similar positions on the image.
Optionally, the image feature obtaining module is specifically configured to: obtaining shallow image characteristics of the image to be identified;
the attribute identification module includes:
the feature input sub-module is used for inputting the shallow image features into a plurality of attribute identification branches;
the attribute identification sub-module is used for, for each attribute identification branch, obtaining, through that branch, the deep image features corresponding to the branch based on the shallow image features, and identifying, through that branch, its corresponding group of attributes based on those deep image features.
Optionally, the attribute identification submodule is specifically configured to: under the condition that a group of attributes corresponding to the attribute identification branch comprises the cross attribute, obtaining deep image features corresponding to other groups of attributes with association relation with the cross attribute; and identifying the intersection attribute according to the deep image features corresponding to the attribute identification branches and the deep image features corresponding to the other group of attributes.
Optionally, the attribute identification sub-module is specifically configured to: perform pooling processing on the deep image features corresponding to the attribute identification branch through the attribute identification branch to obtain pooled image features; and input the pooled image features into a plurality of attribute identification units of the attribute identification branch, so that different attributes in the group of attributes corresponding to the attribute identification branch are respectively identified by different attribute identification units.
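A compact sketch of one such branch head follows, under the same caveat that the feature sizes and the three attribute identification units are hypothetical:

```python
import torch
import torch.nn as nn

deep_features = torch.randn(2, 128, 8, 8)  # this branch's deep features
# pooling processing: one pooled vector shared by all units of the branch
pooled = nn.AdaptiveAvgPool2d(1)(deep_features).flatten(1)  # shape (2, 128)

# several attribute identification units, one per attribute in the group
units = nn.ModuleList(nn.Linear(128, 1) for _ in range(3))
scores = [torch.sigmoid(unit(pooled)) for unit in units]  # one score each
```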
Optionally, the apparatus further comprises:
the model building module is used for building a preset model before the image features of the image to be identified are obtained, the preset model comprising a feature extraction module and a plurality of attribute identification branches, each attribute identification branch being connected with the feature extraction module;
the sample image obtaining module is used for obtaining a plurality of sample images, and each sample image carries a plurality of attribute tags;
The model training module is configured to train the preset model based on the plurality of sample images, the training comprising: inputting each sample image into the preset model, performing feature extraction on the sample image through the feature extraction module to obtain image features of the sample image, and respectively identifying different groups of attributes based on the image features through different attribute identification branches to obtain identification results of the attributes (a minimal training sketch follows this list of modules);
the first updating module is used for updating the attribute identification branch corresponding to each attribute according to the identification result of the attribute and the attribute label corresponding to the attribute;
the second updating module is used for updating the feature extraction module according to the respective identification results of the plurality of attributes and the respective attribute labels corresponding to the plurality of attributes;
after the training of the preset model is finished, the feature extraction module is used for obtaining the image features of the image to be identified.
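To make the two update paths concrete, here is a minimal PyTorch sketch of one training step under the scheme these modules describe. The two-branch layout, the loss function, and every hyper-parameter are assumptions for illustration; the application does not prescribe them:

```python
import torch
import torch.nn as nn

extractor = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
branches = nn.ModuleList([
    nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 3)),
    nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 2)),
])
optimizer = torch.optim.Adam(
    list(extractor.parameters()) + list(branches.parameters()), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

images = torch.randn(4, 3, 96, 96)              # stand-in sample batch
labels = [torch.randint(0, 2, (4, 3)).float(),  # group-1 attribute tags
          torch.randint(0, 2, (4, 2)).float()]  # group-2 attribute tags

features = extractor(images)  # shared image features of the samples
loss = sum(bce(branch(features), lbl) for branch, lbl in zip(branches, labels))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With the losses summed, each attribute identification branch receives gradients only from its own attributes (the first updating module), while the shared feature extraction module receives gradients from all of them (the second updating module).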
Optionally, the model building module includes:
the activation map obtaining sub-module is used for obtaining, for each attribute of a preset image, an activation map corresponding to the attribute, wherein the activation map is used to represent the weight of each area of the preset image in determining the attribute identification result when the attribute is identified;
the grouping determination sub-module is used for clustering the plurality of attributes into a plurality of groups according to the weight distribution represented by each activation map, wherein the positions of the highest-weight areas in the activation maps corresponding to the attributes included in each group are the same (a sketch of one possible grouping procedure follows this list);
the branch establishment sub-module is used for establishing one attribute identification branch for each group of attributes.
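One plausible realization of the grouping performed by these sub-modules is sketched below with scikit-learn. Using k-means on activation-map peak locations is an assumption of this example; the application only requires that attributes whose highest-weight areas coincide end up in the same group:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume one HxW activation map per attribute (e.g. from a CAM-style
# method); random data stands in for real maps here.
rng = np.random.default_rng(0)
activation_maps = {name: rng.random((7, 7)) for name in
                   ["hat", "glasses", "hair color", "skirt", "shoes"]}

# Represent each attribute by the (row, col) of its highest-weight area.
names = list(activation_maps)
peaks = np.array([np.unravel_index(m.argmax(), m.shape)
                  for m in activation_maps.values()], dtype=float)

# Attributes whose decisive regions coincide fall into the same cluster,
# and each cluster receives its own attribute identification branch.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(peaks)
for g in range(3):
    print(f"branch {g}:", [n for n, k in zip(names, groups) if k == g])
```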
Optionally, the feature extraction module is configured to extract shallow image features, and each attribute identification branch includes a deep feature extraction module, where the deep feature extraction module is configured to extract deep image features based on the shallow image features;
the model training module is specifically used for: for each attribute identification branch, in the case that the group of attributes corresponding to the attribute identification branch includes a cross attribute, obtaining the deep image features corresponding to the other groups of attributes that have an association relationship with the cross attribute; and identifying the cross attribute according to the deep image features corresponding to the attribute identification branch and the deep image features corresponding to the other groups of attributes;
the first updating module is specifically configured to: for the cross attribute, update the attribute identification branch corresponding to the cross attribute according to the identification result of the cross attribute and the attribute label corresponding to the cross attribute, and also update the attribute identification branches corresponding to the other groups of attributes that have an association relationship with the cross attribute.
Based on the same inventive concept, another embodiment of the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the attribute identification method according to any of the above embodiments of the present application.
Based on the same inventive concept, another embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps in the attribute identification method described in any one of the foregoing embodiments of the present application.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications to those embodiments may occur to those skilled in the art once they learn of the basic inventive concept. It is therefore intended that the appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the present application.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal device that comprises that element.
The foregoing has described in detail an attribute identification method, an attribute identification device, an electronic device, and a readable storage medium provided by the present application. Specific examples have been used herein to illustrate the principles and embodiments of the present application, and the description of these examples is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make modifications to the specific embodiments and application scope in accordance with the ideas of the present application; in view of the above, the contents of this specification should not be construed as limiting the present application.
Claims (8)
1. A method of attribute identification, the method comprising:
obtaining image features of an image to be identified;
inputting the image features into a plurality of attribute identification branches to respectively identify different groups of attributes of the image to be identified through different attribute identification branches; wherein each attribute identification branch corresponds to a group of attributes, and when each attribute in each group of attributes is identified, the positions of the image areas on the image which determine the attribute identification result are the same or similar;
the obtaining the image characteristics of the image to be identified comprises the following steps:
obtaining shallow image characteristics of the image to be identified;
The step of inputting the image features into a plurality of attribute identification branches to identify different groups of attributes of the image to be identified through different attribute identification branches comprises the following steps:
inputting the shallow image features into a plurality of attribute identification branches;
for each attribute identification branch, acquiring deep image features corresponding to the attribute identification branch based on the shallow image features through the attribute identification branch, and identifying a group of attributes corresponding to the attribute identification branch based on the deep image features through the attribute identification branch;
in the case that a group of attributes corresponding to one attribute identification branch comprises a cross attribute, obtaining deep image features corresponding to other groups of attributes having an association relationship with the cross attribute;
and identifying the cross attribute according to the deep image features corresponding to the attribute identification branch and the deep image features corresponding to the other groups of attributes.
2. The method of claim 1, wherein identifying, by each attribute identification branch, its corresponding group of attributes based on its corresponding deep image features comprises:
performing pooling processing on the deep image features corresponding to the attribute identification branch through the attribute identification branch to obtain pooled image features;
inputting the pooled image features into a plurality of attribute identification units of the attribute identification branch, so that different attributes in a group of attributes corresponding to the attribute identification branch are respectively identified by different attribute identification units.
3. The method according to any one of claims 1 to 2, wherein prior to obtaining the image features of the image to be identified, the method further comprises:
building a preset model, wherein the preset model comprises a feature extraction module and a plurality of attribute identification branches, and each attribute identification branch is connected with the feature extraction module;
obtaining a plurality of sample images, wherein each sample image carries a plurality of attribute tags;
training the preset model based on the plurality of sample images, including: inputting each sample image into the preset model, carrying out feature extraction on the sample image through the feature extraction module to obtain image features of the sample image, and respectively identifying different groups of attributes based on the image features through different attribute identification branches to obtain identification results of the attributes;
updating the attribute identification branch corresponding to each attribute according to the identification result of the attribute and the attribute label corresponding to the attribute;
updating the feature extraction module according to the respective identification results of the plurality of attributes and the respective attribute labels corresponding to the plurality of attributes;
after the training of the preset model is finished, the feature extraction module is used for obtaining the image features of the image to be identified.
4. A method according to claim 3, wherein building the preset model comprises:
for each attribute of a preset image, obtaining an activation map corresponding to the attribute, wherein the activation map is used to represent the weight of each area of the preset image in determining the attribute identification result when the attribute is identified;
clustering the plurality of attributes into a plurality of groups according to the weight distribution represented by each activation map, wherein the positions of the highest-weight areas in the activation maps corresponding to the attributes included in each group are the same;
for each set of attributes, an attribute identification branch is established.
5. A method according to claim 3, wherein the feature extraction module is configured to extract shallow image features, each attribute identification branch comprising a deep feature extraction module configured to extract deep image features based on shallow image features; when training the preset model, the identifying different groups of attributes through different attribute identification branches based on the image features respectively comprises the following steps:
for each attribute identification branch, in the case that a group of attributes corresponding to the attribute identification branch comprises a cross attribute, obtaining deep image features corresponding to other groups of attributes having an association relationship with the cross attribute;
identifying the cross attribute according to the deep image features corresponding to the attribute identification branch and the deep image features corresponding to the other groups of attributes;
for each attribute, updating the attribute identification branch corresponding to the attribute according to the identification result of the attribute and the attribute label corresponding to the attribute, including:
for the cross attribute, updating the attribute identification branch corresponding to the cross attribute according to the identification result of the cross attribute and the attribute label corresponding to the cross attribute, and updating the attribute identification branches corresponding to the other groups of attributes having an association relationship with the cross attribute.
6. An attribute identification device, the device comprising:
the image feature obtaining module is used for obtaining image features of an image to be identified;
the attribute identification module is used for inputting the image features into a plurality of attribute identification branches so as to respectively identify different groups of attributes of the image to be identified through different attribute identification branches; wherein each attribute identification branch corresponds to a group of attributes, and when each attribute in each group of attributes is identified, the positions of the image areas on the image which determine the attribute identification result are the same or similar;
The image feature obtaining module is specifically configured to:
obtaining shallow image features of the image to be identified;
the attribute identification module is specifically configured to:
inputting the shallow image features into a plurality of attribute identification branches;
for each attribute identification branch, acquiring deep image features corresponding to the attribute identification branch based on the shallow image features through the attribute identification branch, and identifying a group of attributes corresponding to the attribute identification branch based on the deep image features through the attribute identification branch;
in the case that a group of attributes corresponding to one attribute identification branch comprises a cross attribute, obtaining deep image features corresponding to other groups of attributes having an association relationship with the cross attribute;
and identifying the cross attribute according to the deep image features corresponding to the attribute identification branch and the deep image features corresponding to the other groups of attributes.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911295248.4A CN111191527B (en) | 2019-12-16 | 2019-12-16 | Attribute identification method, attribute identification device, electronic equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111191527A CN111191527A (en) | 2020-05-22 |
CN111191527B true CN111191527B (en) | 2024-03-12 |
Family
ID=70709783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911295248.4A Active CN111191527B (en) | 2019-12-16 | 2019-12-16 | Attribute identification method, attribute identification device, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111191527B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111639705B (en) * | 2020-05-29 | 2021-06-29 | 江苏云从曦和人工智能有限公司 | Batch picture marking method, system, machine readable medium and equipment |
CN113762108A (en) * | 2021-08-23 | 2021-12-07 | 浙江大华技术股份有限公司 | Target identification method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6255944B2 (en) * | 2013-11-27 | 2018-01-10 | 株式会社リコー | Image analysis apparatus, image analysis method, and image analysis program |
2019-12-16: CN application CN201911295248.4A granted as patent CN111191527B (status: Active)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0916713A (en) * | 1995-06-26 | 1997-01-17 | Sharp Corp | Image area dividing method |
DE102018110947A1 (en) * | 2017-05-05 | 2018-11-08 | Jonah Alben | Loss scaling for deep neural network training with reduced precision |
CN108229296A (en) * | 2017-09-30 | 2018-06-29 | 深圳市商汤科技有限公司 | Face skin attribute recognition method and device, electronic equipment, and storage medium |
CN109829356A (en) * | 2018-12-05 | 2019-05-31 | 科大讯飞股份有限公司 | Neural network training method and neural-network-based pedestrian attribute recognition method |
CN109800679A (en) * | 2018-12-29 | 2019-05-24 | 上海依图网络科技有限公司 | Method and device for determining attribute information of an object to be identified |
CN109815902A (en) * | 2019-01-24 | 2019-05-28 | 北京邮电大学 | Pedestrian attribute region information acquisition method, device and equipment |
CN110163078A (en) * | 2019-03-21 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Living body detection method and device, and service system applying the living body detection method |
CN110162639A (en) * | 2019-04-16 | 2019-08-23 | 深圳壹账通智能科技有限公司 | Knowledge graph intent recognition method, apparatus, device and storage medium |
CN110175595A (en) * | 2019-05-31 | 2019-08-27 | 北京金山云网络技术有限公司 | Human body attribute recognition method, recognition model training method and device |
CN110516560A (en) * | 2019-08-05 | 2019-11-29 | 西安电子科技大学 | Remote sensing image object detection method based on FPGA heterogeneous deep learning |
Non-Patent Citations (3)
Title |
---|
Attribute Aware Pooling for Pedestrian Attribute Recognition; Kai Han, Yunhe Wang, Han Shu, Chuanjian Liu, Chunjing Xu, Chang Xu; arXiv. *
Localization Guided Learning for Pedestrian Attribute Recognition; Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao; arXiv. *
Zheng Shaofei. Fine-grained pedestrian attribute recognition in surveillance scenes. China Master's Theses Full-text Database, Information Science and Technology, 2019, (No. 7). *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |