CN112800978A - Attribute recognition method, and training method and device for part attribute extraction network


Info

Publication number
CN112800978A
Authority
CN
China
Prior art keywords
attribute
human body
body part
feature
extraction network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110133441.9A
Other languages
Chinese (zh)
Inventor
Wang Sen (王森)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110133441.9A
Publication of CN112800978A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an attribute identification method and a training method and device for a part attribute extraction network. After a plurality of human body part images corresponding to a target object are obtained, the attribute characteristics of the human body part contained in each human body part image are extracted, and the attribute characteristics of the target object are determined based on the extracted attribute characteristics. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.

Description

Attribute recognition method, and training method and device for part attribute extraction network
Technical Field
The invention relates to the technical field of image processing, and in particular to an attribute identification method, a training method for a part attribute extraction network, and a device.
Background
Pedestrian attribute identification is a technical means for performing high-level semantic mining on pedestrians; it can generalize rules from large volumes of pedestrian data and is of great significance in many fields. Specifically, pedestrian attribute identification first detects a target pedestrian from an image and then identifies attribute features such as the target pedestrian's age, gender, hair, and clothing based on the pedestrian's image region.
Disclosure of Invention
In view of the above, the present invention provides an attribute identification method, a training method for a part attribute extraction network, and an apparatus, so as to improve the accuracy of the identification results of attribute features.
In a first aspect, an embodiment of the present invention provides an attribute identification method, where the method includes: acquiring a plurality of human body part images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object; extracting attribute characteristics of human body parts contained in each human body part image; and determining the attribute characteristics of the target object based on the extracted attribute characteristics of each human body part image.
In a second aspect, an embodiment of the present invention provides a method for training a part attribute extraction network, where the method includes: acquiring a sample image containing a sample object and a plurality of human body part images corresponding to the sample object; each of the human body part images contains at least one human body part of the sample object; for each human body part, inputting a human body part image containing the human body part into a preliminarily trained part attribute extraction network corresponding to the human body part, and outputting attribute characteristics of the human body part; inputting the sample image into a preliminarily trained whole body attribute extraction network, and outputting whole body attribute characteristics of the sample object; performing feature fusion on the attribute characteristics of each human body part and the whole body attribute characteristics to obtain fusion characteristics, and determining the attribute characteristics of the sample object based on the fusion characteristics; and adjusting network parameters of the part attribute extraction network corresponding to each human body part and of the whole body attribute extraction network based on the attribute characteristics of the sample object and a preset loss function until the attribute characteristics of the sample object converge, so as to obtain the trained part attribute extraction network corresponding to each human body part and the trained whole body attribute extraction network.
In a third aspect, an embodiment of the present invention provides an attribute identification apparatus, where the apparatus includes: the first acquisition module is used for acquiring a plurality of human body part images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object; the extraction module is used for extracting the attribute characteristics of the human body part contained in each human body part image; and the first determining module is used for determining the attribute characteristics of the target object based on the extracted attribute characteristics of each human body part image.
In a fourth aspect, an embodiment of the present invention provides a training apparatus for a part attribute extraction network, where the apparatus includes: a second acquisition module, configured to acquire a sample image containing a sample object and a plurality of human body part images corresponding to the sample object, each of the human body part images containing at least one human body part of the sample object; an output module, configured to, for each human body part, input a human body part image containing the human body part into a preliminarily trained part attribute extraction network corresponding to the human body part and output attribute characteristics of the human body part, and to input the sample image into a preliminarily trained whole body attribute extraction network and output whole body attribute characteristics of the sample object; a second determination module, configured to perform feature fusion on the attribute characteristics of each human body part and the whole body attribute characteristics to obtain fusion characteristics, and to determine the attribute characteristics of the sample object based on the fusion characteristics; and an adjusting module, configured to adjust network parameters of the part attribute extraction network corresponding to each human body part and of the whole body attribute extraction network based on the attribute characteristics of the sample object and a preset loss function until the attribute characteristics of the sample object converge, so as to obtain the trained part attribute extraction network corresponding to each human body part and the trained whole body attribute extraction network.
In a fifth aspect, an embodiment of the present invention provides a server, including a processor and a memory, where the memory stores machine executable instructions capable of being executed by the processor, and the processor executes the machine executable instructions to implement the attribute identification method according to any one of the above first aspects or the training method for the part attribute extraction network according to any one of the above second aspects.
In a sixth aspect, embodiments of the present invention provide a machine-readable storage medium storing machine-executable instructions, which when invoked and executed by a processor, cause the processor to implement the method for attribute recognition according to any one of the above first aspects or the method for training a part attribute extraction network according to any one of the above second aspects.
According to the attribute identification method and the training method and device for the part attribute extraction network, after a plurality of human body part images corresponding to a target object are obtained, the attribute characteristics of the human body part contained in each human body part image are extracted, and the attribute characteristics of the target object are determined based on the extracted attribute characteristics. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of an attribute identification method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another attribute identification method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another attribute identification method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another attribute identification method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for training a part attribute extraction network according to an embodiment of the present invention;
FIG. 6 is a flowchart of another method for training a part attribute extraction network according to an embodiment of the present invention;
FIG. 7 is a flowchart of a pedestrian attribute model training process according to an embodiment of the present invention;
FIG. 8 is a flowchart of pedestrian attribute model prediction according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an attribute identification apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a training apparatus for a part attribute extraction network according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
With the development of society, the flow of people through public places such as malls and stations has increased rapidly, and quickly generalizing rules from large volumes of user behavior data is of great significance for smart city construction, targeted advertising, criminal investigation, and the like. Pedestrian attribute identification is a technical means for mining high-level semantics about pedestrians: it detects a pedestrian target from a video and automatically identifies attribute features such as age, gender, hair, and clothing, which makes it convenient to efficiently mine more semantic information about pedestrians under surveillance, and it has wide application in smart city construction, targeted advertising, criminal investigation, and the like. In the related art, most pedestrian attribute identification learns to predict all attributes from the whole pedestrian picture; that is, a target pedestrian is first detected from an image, and attribute features such as the target pedestrian's age, gender, hair, and clothing are then identified based on the pedestrian's whole image region. However, the whole image region carries fewer local features and less detail information for any single body part, which limits the accuracy of the identification results. Based on this, embodiments of the present invention provide an attribute identification method and a training method and device for a part attribute extraction network, and the technique can be applied wherever pedestrian attribute identification is required.
To facilitate understanding of the present embodiment, a detailed description is first provided for an attribute identification method disclosed in the embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S102, acquiring a plurality of human body part images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object.
The target object can be a target person, such as a pedestrian in a monitoring scene of a market, a station or a street; the human body part may be various body parts of the target object, such as a head, an upper body or a lower body; the human body part image includes one or more human body parts of the target object, for example, the human body part image may be a head image including only a head, an upper body image including only an upper body, or an image including both a head and an upper body; in actual implementation, when the attribute of the target object needs to be recognized, a plurality of body part images of the target object, such as a head image, an upper body image, a lower body image, and the like of the target object, are generally acquired first, and the number of the specific body part images may be set according to actual needs.
Step S104, extracting the attribute characteristics of the human body part contained in each human body part image.
The attribute features generally include color features, shape features, and the like of the corresponding human body part. In actual implementation, after a plurality of human body part images corresponding to the target object are acquired, the attribute features of the corresponding human body part can be extracted from each human body part image. For example, attribute features corresponding to the head, such as those for gender, hair color, and age, may be extracted from the head image; attribute features such as the type and color of the jacket and upper-body accessories such as a scarf can be extracted from the upper body image.
And step S106, determining the attribute characteristics of the target object based on the attribute characteristics extracted from each human body part image.
After the attribute features of the human body part contained in each human body part image are extracted, determining the attribute features of the target object according to the extracted attribute features of each human body part image corresponding to the target object; for example, the plurality of body region images corresponding to the target object include a head image, an upper body image, and a lower body image, and the attribute feature of the whole body of the target object may be determined based on the attribute features of the corresponding body regions extracted from the head image, the upper body image, and the lower body image.
According to the attribute identification method provided by the embodiment of the invention, after a plurality of human body part images corresponding to a target object are obtained, the attribute characteristics of the human body part contained in each human body part image are extracted; and determining the attribute feature of the target object based on the extracted attribute feature. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
The embodiment of the invention provides another attribute identification method, which is realized on the basis of the method of the embodiment; the method mainly describes a specific process of acquiring a plurality of human body part images corresponding to a target object, and as shown in fig. 2, the method comprises the following steps:
in step S202, an initial image including the target object is acquired.
The initial image may be understood as an original image containing a target object, such as an original image containing a target object captured by a monitoring camera, or a picture or a photograph containing the target object; the initial image is typically an image of all human body parts including the head, upper body and lower body of the target object; in practical implementation, an initial image including a target object may be acquired first, so as to identify an attribute of the target object based on the initial image.
Step S204, a plurality of human body parts of the target object are identified from the initial image, and an identification result is obtained.
When an initial image including the target object is acquired, the initial image may generally be recognized first so as to identify a plurality of human body parts, such as the head, the upper body, or the lower body of the target object, from the initial image; specifically, pixel-level labels may be obtained from the initial image to recognize the head, the jacket, the arms, and the like.
In step S206, image segmentation processing is performed on the initial image based on the recognition result to obtain a human body part image corresponding to each human body part.
After the recognition result is obtained, image segmentation is usually performed on the initial image based on it; specifically, each human body part of the target object in the initial image can be segmented along its contour. The segmentation of the initial image can generally be performed by a pedestrian segmentation model. In practical implementation, the pedestrian segmentation model can be trained in advance on the open LIP (Look into Person) dataset or an ART dataset (datasets usable for development work such as image recognition, segmentation, and annotation), and each human body part of the target object is extracted to obtain a human body part image corresponding to each part. The LIP dataset consists of pictures containing persons, cropped from the COCO (Common Objects in Context) dataset.
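For illustration, below is a minimal Python sketch of this segmentation-to-crop step. It assumes a LIP-style pixel-level parsing mask is already available from a pedestrian segmentation model; the label ids, part grouping, and function name are illustrative assumptions, not taken from the patent.

```python
import numpy as np

# Hypothetical label ids for a LIP-style parsing mask; real datasets
# define their own label scheme.
PART_LABELS = {"head": [1, 2], "upper_body": [3, 4, 5], "lower_body": [6, 7]}

def crop_part_images(image: np.ndarray, mask: np.ndarray) -> dict:
    """Crop one sub-image per human body part from a pixel-level mask.

    image: H x W x 3 RGB array; mask: H x W array of per-pixel part labels.
    Returns a dict mapping part name -> cropped image (None if absent).
    """
    parts = {}
    for name, labels in PART_LABELS.items():
        ys, xs = np.nonzero(np.isin(mask, labels))
        if ys.size == 0:                      # part not visible in this image
            parts[name] = None
            continue
        y0, y1 = ys.min(), ys.max() + 1       # tight bounding box around the part
        x0, x1 = xs.min(), xs.max() + 1
        parts[name] = image[y0:y1, x0:x1]
    return parts
```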
In step S208, the attribute features of the human body part included in each human body part image are extracted.
Step S210 is to determine the attribute feature of the target object based on the extracted attribute feature of each human body part image.
According to the attribute identification method provided by this embodiment of the invention, a plurality of human body parts of the target object are identified from an obtained initial image containing the target object to obtain a recognition result; based on the recognition result, image segmentation is performed on the initial image to obtain a human body part image corresponding to each human body part; the attribute features of the human body part contained in each human body part image are extracted; and the attribute features of the target object are determined based on the extracted attribute features. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
The embodiment of the invention provides another attribute identification method, which is realized on the basis of the method of the embodiment; the method mainly describes a specific process of extracting attribute features of human body parts contained in each human body part image, and in the method, each human body part image contains one human body part; for example, each human body part image only includes a head, an upper body or a lower body; each human body part corresponds to a part attribute extraction network which is trained in advance; wherein, the part attribute extraction network can be a part attribute neural network of a corresponding human body part; the part attribute extraction network corresponding to each human body part is used for extracting attribute features contained in the human body part; for example, the head corresponds to a head attribute neural network, and the head attribute neural network is used for extracting attribute features contained in the head; the upper half body corresponds to an upper half body attribute neural network, and the upper half body attribute neural network is used for extracting attribute characteristics contained in the upper half body; the lower body corresponds to a lower body attribute neural network, and the lower body attribute neural network is used for extracting attribute characteristics contained in the lower body; as shown in fig. 3, the method comprises the steps of:
step S302, acquiring a plurality of human body part images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object.
Step S304, aiming at each human body part, inputting the human body part image corresponding to the human body part into the part attribute extraction network corresponding to the human body part, and outputting the attribute characteristics of the human body part.
The part attribute extraction network generally comprises a feature extraction sub-network and an attribute classification sub-network. The feature extraction sub-network produces a compact representation of the high-dimensional image data and can be implemented with various convolutional neural networks, such as a residual network or a VGG network; the attribute classification sub-network integrates the features extracted by the feature extraction sub-network so as to classify the human body part image and obtain the attribute features of the corresponding human body part.
Specifically, step S304 can be implemented through the following steps one and two:
step one, inputting the human body part image corresponding to the human body part into the feature extraction subnetwork corresponding to the human body part to obtain the part feature of the human body part.
In actual implementation, if the human body part is the head, the head image corresponding to the head can be input into the head attribute feature extraction sub-network to obtain the part feature of the head. For example, the head image may be converted into a three-dimensional input in RGB (R: red; G: green; B: blue) format, represented by a 3 × 112 × 112 vector, where 3 × 112 × 112 denotes a three-channel picture of 112 × 112 pixels; in a computer, the three channels can be used to represent the picture. The three channels of data are concatenated and input into a head attribute convolutional neural network, such as a ResNet-18 (Residual Network with 18 layers), to obtain a feature map vector of size 512 × 7 × 7, which denotes a 512-channel feature map of 7 × 7 pixels.
If the human body part is the upper half body, the upper body image corresponding to the upper half body can be input into the upper body feature extraction sub-network to obtain the part feature of the upper body; for example, the upper body image may be converted into a three-dimensional RGB input, represented by a 3 × 112 × 112 vector, which is input into an upper body attribute convolutional neural network, such as a ResNet-18 residual network, to obtain a feature map vector of size 512 × 7 × 7.
If the human body part is the lower half body, the lower body image corresponding to the lower half body can be input into the lower body feature extraction sub-network to obtain the part feature of the lower body; for example, the lower body image may be converted into a three-dimensional RGB input, represented by a 3 × 112 × 112 vector, which is input into a lower body attribute convolutional neural network, such as a ResNet-18 residual network, to obtain a feature map vector of size 512 × 7 × 7.
Step two, inputting the part feature into the attribute classification sub-network corresponding to the human body part to obtain the attribute features of the human body part; the attribute features include a first attribute value corresponding to each preset attribute category.
The preset attribute categories can be understood as a classification, defined in advance, of the attributes contained in the target object. For example, the attributes contained in the target object may be arranged into several sets, namely whole body attributes, upper body attributes, lower body attributes, head attributes, and the like, where the whole body attributes comprise all attributes of the target object. The first attribute value may be understood as the predicted value for an attribute category obtained through the feature extraction sub-network and attribute classification sub-network of the part attribute extraction network. If the human body part is the head, the preset attribute categories can include gender, age, eyes, ears, hair color and length, head accessories such as hats or glasses, and other attribute categories that can be judged from the head. Usually, each attribute category corresponds to one predicted value, so the predicted values of the head attributes usually comprise a plurality of values. The part feature of the head obtained in step one may first be reduced to a 512-dimensional feature through a pooling layer and then passed through a fully connected layer serving as the head attribute classification layer to obtain the attribute features of the head, which include a plurality of predicted values corresponding to the plurality of attribute categories of the head.
If the human body part is the upper half body, the preset attribute categories can include the jacket type, the jacket color, upper-body accessories such as a scarf, and other attribute categories that can be judged from the upper body. Each attribute category generally corresponds to one predicted value, so the predicted values of the upper body attributes generally comprise a plurality of values. The part feature of the upper body obtained in step one may first be reduced to a 512-dimensional feature through a pooling layer and then passed through a fully connected layer serving as the upper body attribute classification layer to obtain the attribute features of the upper body, which include a plurality of predicted values corresponding to the plurality of attribute categories of the upper body.
If the human body part is the lower half body, the preset attribute categories can include the lower clothes type, the lower clothes color, the shoe type and color, and other attribute categories that can be judged from the lower body. Each attribute category generally corresponds to one predicted value, so the predicted values of the lower body attributes generally comprise a plurality of values. The part feature of the lower body obtained in step one may first be reduced to a 512-dimensional feature through a pooling layer and then passed through a fully connected layer serving as the lower body attribute classification layer to obtain the attribute features of the lower body, which include a plurality of predicted values corresponding to the plurality of attribute categories of the lower body.
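To make the two sub-networks concrete, here is a minimal PyTorch sketch of a part attribute extraction network as described in steps one and two: a ResNet-18 backbone as the feature extraction sub-network, followed by a pooling layer and a fully connected classification layer. The class name and per-part attribute counts are illustrative assumptions; note that stock torchvision strides yield a spatial size other than 7 × 7 for a 112 × 112 input, so an adaptive pooling layer is used to reach the 512-dimensional feature either way.

```python
import torch
import torch.nn as nn
from torchvision import models

class PartAttributeNet(nn.Module):
    """Feature extraction sub-network + attribute classification sub-network."""

    def __init__(self, num_attributes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep everything up to (not including) the stock avgpool/fc head.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)      # feature map -> 512-dim feature
        self.classifier = nn.Linear(512, num_attributes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x)                     # N x 512 x h x w feature map
        f = self.pool(f).flatten(1)              # N x 512
        return self.classifier(f)                # one logit per attribute category

# Illustrative attribute counts per part (assumed, not from the patent).
head_net = PartAttributeNet(num_attributes=8)    # gender, hair colour, age, ...
upper_net = PartAttributeNet(num_attributes=6)   # jacket type, jacket colour, ...
lower_net = PartAttributeNet(num_attributes=5)   # lower clothes type, shoes, ...
```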
Step S306, the attribute characteristics of the target object are determined based on the attribute characteristics extracted from each human body part image.
According to the attribute identification method provided by the embodiment of the invention, after a plurality of human body part images corresponding to a target object are obtained, the human body part image corresponding to the human body part is input into the part attribute extraction network corresponding to the human body part for each human body part, and the attribute characteristics of the human body part are output. And determining the attribute characteristics of the target object based on the attribute characteristics extracted from each human body part image. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
The embodiment of the invention provides another attribute identification method, implemented on the basis of the method of the foregoing embodiment. This method mainly describes a specific process of determining the attribute features of a target object based on the attribute features extracted from each human body part image, where the attribute features include a first attribute value corresponding to each preset attribute category. As shown in FIG. 4, the method includes the following steps:
step S402, acquiring a plurality of human body position images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object.
In step S404, the attribute features of the human body part included in each human body part image are extracted.
Step S406, inputting an initial image containing a target object into a whole body attribute extraction network which is trained in advance, and outputting whole body attribute characteristics of the target object; the whole-body attribute feature includes an attribute feature of each body part of the target object.
The whole body attribute extraction network can be used for extracting the attribute characteristics of each human body part in the whole body of the target object; for example, if an initial image including a target object includes the head, upper body, and lower body of the target object, the initial image may be input to a whole-body attribute extraction network trained in advance, and then the attribute features of the head, upper body, and lower body of the target object may be output.
In practical implementation, the whole pedestrian picture can be converted into a three-dimensional RGB input, represented by a 3 × 112 × 112 vector, and input into the whole body attribute extraction network trained in advance, such as a ResNet-18 network, to obtain a 512 × 7 × 7 feature map vector; 512-dimensional features are then obtained through a pooling layer, and all pedestrian attribute classification layers are obtained through a fully connected layer, giving the attribute features of each human body part of the target object.
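Under the same assumptions, the whole body attribute extraction network can be sketched by reusing the PartAttributeNet class from the earlier example, since the description gives it the same backbone-pool-classifier structure; only the attribute count differs, and the value of 19 here is an assumption for illustration.

```python
import torch

# Reuses PartAttributeNet from the earlier sketch; predicts all preset
# attribute categories of the pedestrian (count is illustrative).
whole_body_net = PartAttributeNet(num_attributes=19)

logits = whole_body_net(torch.randn(1, 3, 112, 112))  # one 3 x 112 x 112 image
probs = torch.sigmoid(logits)                         # per-category predicted values
```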
Step S408, for each attribute category, if the first attribute value corresponding to the attribute category is greater than or equal to the first preset threshold, it is determined that the target object has the attribute feature corresponding to the attribute category.
The first preset threshold can be set according to actual requirements and is usually a larger value between 0 and 1; for example, it may be set to 0.8. Each attribute category has a corresponding first attribute value, i.e., the predicted value for the attribute category obtained through the feature extraction sub-network and attribute classification sub-network of the corresponding part attribute extraction network; the predicted value is usually a probability between 0 and 1. In practical implementation, for each attribute category, the first attribute value can be compared with the first preset threshold; taking 0.8 as an example, if the first attribute value is greater than or equal to 0.8, it can be directly determined that the target object has the attribute feature corresponding to the attribute category.
Step S410, if the first attribute value corresponding to the attribute category is less than or equal to a second preset threshold value, determining that the target object does not have the attribute feature corresponding to the attribute category; the first preset threshold is larger than the second preset threshold.
The second preset threshold can also be set according to actual requirements and is usually a smaller value between 0 and 1; for example, it may be set to 0.2. In practical implementation, for each attribute category, the first attribute value can be compared with the second preset threshold; taking 0.2 as an example, if the first attribute value is less than or equal to 0.2, it can be directly determined that the target object does not have the attribute feature corresponding to the attribute category.
Step S412, if the first attribute value corresponding to the attribute category is smaller than a first preset threshold and larger than a second preset threshold, acquiring a second attribute value corresponding to the attribute category from the whole body attribute feature; and determining whether the target object has the attribute feature corresponding to the attribute category or not based on the first attribute value and the second attribute value.
The second attribute value may be understood as the predicted value for the attribute category output by the whole body attribute extraction network; this predicted value is usually a probability between 0 and 1. For ease of description, take the first preset threshold as 0.8 and the second preset threshold as 0.2: for each attribute category, if the first attribute value is smaller than 0.8 and larger than 0.2, whether the target object has the attribute feature corresponding to the attribute category cannot be determined directly from the part attribute extraction network alone. In this case, the second attribute value corresponding to the attribute category may be obtained from the whole body attribute features, the first and second attribute values are summed and averaged, and the average is taken as the final predicted probability; whether the target object has the attribute feature corresponding to the attribute category can then be determined according to this average.
It should be noted that some attribute categories do not exist among the attribute categories of any single human body part and are only contained in the whole body attribute features, for example the orientation of the pedestrian; for these, whether the target object has the attribute feature corresponding to the attribute category can be determined directly from the corresponding second attribute value in the whole body attribute features.
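The two-threshold decision of steps S408 to S412 can be summarized in a short sketch. The thresholds 0.8 and 0.2 follow the examples in the text; the 0.5 cut-offs applied to the average and to whole-body-only categories are assumptions, since the text does not state them.

```python
from typing import Optional

def decide_attribute(part_value: Optional[float], whole_body_value: float,
                     hi: float = 0.8, lo: float = 0.2) -> bool:
    """Two-threshold decision rule sketched from steps S408-S412.

    part_value: first attribute value from the part attribute extraction
    network, or None for categories only the whole-body network predicts
    (e.g. pedestrian orientation).
    whole_body_value: second attribute value from the whole body
    attribute extraction network.
    """
    if part_value is None:                     # whole-body-only category
        return whole_body_value >= 0.5         # assumed cut-off, not in the text
    if part_value >= hi:                       # confidently present (S408)
        return True
    if part_value <= lo:                       # confidently absent (S410)
        return False
    # Ambiguous zone: average the two predicted values (S412); the 0.5
    # decision cut-off on the average is an assumption.
    return (part_value + whole_body_value) / 2 >= 0.5
```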
According to the attribute identification method provided by the embodiment of the invention, after a plurality of human body part images corresponding to a target object are obtained, the attribute characteristics of a human body part contained in each human body part image are extracted, an initial image containing the target object is input into a whole body attribute extraction network which is trained in advance, and the whole body attribute characteristics of the target object are output; for each attribute category, if a first attribute value corresponding to the attribute category is greater than or equal to a first preset threshold value, determining that the target object has an attribute feature corresponding to the attribute category; if the first attribute value corresponding to the attribute category is smaller than or equal to a second preset threshold value, determining that the target object does not have the attribute feature corresponding to the attribute category; if the first attribute value corresponding to the attribute category is smaller than a first preset threshold and larger than a second preset threshold, acquiring a second attribute value corresponding to the attribute category from the whole body attribute characteristics; and determining whether the target object has the attribute feature corresponding to the attribute category or not based on the first attribute value and the second attribute value. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
The embodiment of the invention provides a training method for a part attribute extraction network, as shown in fig. 5, the method comprises the following steps:
step S502, acquiring a sample image containing a sample object and a plurality of human body position images corresponding to the sample object; each human body part image contains at least one human body part of the sample object.
The sample object may be understood as a sample person; the sample image is generally an image of all human body parts including the head, upper body, and lower body of the sample object; the human body part may be each body part of the sample object, such as the head, the upper body or the lower body; the plurality of human body part images corresponding to the sample object include one or more human body parts of the sample object, for example, the human body part image may be a head image including only a head of the sample object, an upper body image including only an upper body of the sample object, an image including both the head and the upper body of the sample object, or the like; when the part attribute extraction network needs to be trained, in order to ensure the accuracy of the part attribute extraction network training, a large number of sample images containing a large number of sample objects and a plurality of human body part images corresponding to the sample objects can be acquired.
Step S504, aiming at each human body part, inputting a human body part image containing the human body part into a part attribute extraction network corresponding to the human body part and subjected to preliminary training, and outputting attribute characteristics of the human body part; and inputting the sample image into a whole body attribute extraction network after the initial training, and outputting the whole body attribute characteristics of the sample object.
In the preliminarily trained part attribute extraction network, all parameters, such as convolution kernel parameters, can be randomly initialized; all parameters in the preliminarily trained whole body attribute extraction network, such as convolution kernel parameters, can likewise be randomly initialized. If the human body part is the head, the head image containing the head can be input into the preliminarily trained head attribute extraction network corresponding to the head, and the attribute features of the head are output; if the human body part is the upper half body, the upper body image containing the upper body can be input into the preliminarily trained upper body attribute extraction network, and the attribute features of the upper body are output; if the human body part is the lower half body, the lower body image containing the lower body can be input into the preliminarily trained lower body attribute extraction network, and the attribute features of the lower body are output. Since the sample image is usually an image containing all the human body parts of the sample object, including the head, the upper body, and the lower body, the sample image can be input into the preliminarily trained whole body attribute extraction network, and the whole body attribute features of the sample object are output.
And S506, performing feature fusion on the attribute features of each human body part and the whole body attribute features to obtain fusion features, and determining the attribute features of the sample object based on the fusion features.
Feature fusion can be understood as generating a new feature, namely the fusion feature, from the output attribute features of each human body part and the whole body attribute features through a specified feature fusion algorithm, such as an algorithm based on Bayesian decision theory or on deep learning theory; the fused feature is more effective for classification. For example, the attribute features of the head, the upper body, and the lower body and the whole body attribute features may be feature-fused to obtain the corresponding fusion feature, and the attribute features of the sample object determined based on the fusion feature.
Step S508, based on the attribute characteristics of the sample object and the preset loss function, network parameters of the part attribute extraction network and the whole body attribute extraction network corresponding to each human body part are adjusted until the attribute characteristics of the sample object converge, so as to obtain the trained part attribute extraction network and the trained whole body attribute extraction network corresponding to each human body part.
The loss function can calculate the difference between the predicted value and the true value of each attribute category of the sample object; the network parameters generally include all parameters of the part attribute extraction network corresponding to each human body part and of the whole body attribute extraction network, such as convolution kernel parameters. In practical implementation, the part attribute extraction network corresponding to each human body part and the whole body attribute extraction network are each trained with their respective loss values calculated through the loss function, the part networks being trained in parallel alongside the whole body attribute extraction network; in this way, the network parameters of the part attribute extraction networks and the whole body attribute extraction network are continuously adjusted based on the attribute features of the sample object and the preset loss function until the attribute features of the sample object converge, giving the trained part attribute extraction network corresponding to each human body part and the trained whole body attribute extraction network.
In practical implementation, sigmoid cross-entropy loss is used for supervised learning; the loss function L is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{l=1}^{M}\left[\,y_{il}\log p_{il} + (1 - y_{il})\log(1 - p_{il})\,\right]$$

where $N$ represents the number of samples, $i$ indexes the $i$-th sample, $M$ represents the number of attribute categories, $l$ indexes the $l$-th attribute, $y_{il}$ is the ground-truth value of attribute $l$ for the $i$-th sample, $p_i$ is the vector of predicted values output by the model's classification layer for the $i$-th sample, and $p_{il}$ is the predicted value output by the classification layer for attribute $l$ of the $i$-th sample.
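As a sketch, the loss above corresponds to PyTorch's built-in sigmoid cross-entropy, which additionally averages over the attribute dimension; that extra averaging only rescales the loss and does not change the optimum.

```python
import torch
import torch.nn.functional as F

def attribute_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Sigmoid cross-entropy over N samples and M attribute categories.

    logits:  N x M raw classification-layer outputs (pre-sigmoid).
    targets: N x M ground-truth attribute values in {0, 1}.
    With p_il = sigmoid(logits_il), this equals the loss L above up to
    averaging over the attribute dimension as well as the samples.
    """
    return F.binary_cross_entropy_with_logits(logits, targets)

# Example: 4 samples, 19 attribute categories (counts are illustrative).
logits = torch.randn(4, 19, requires_grad=True)
targets = torch.randint(0, 2, (4, 19)).float()
loss = attribute_loss(logits, targets)
loss.backward()   # gradients drive the parameter adjustment of step S508
```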
The embodiment of the invention provides a training method of a part attribute extraction network, which comprises the steps of obtaining a sample image containing a sample object and a plurality of human body part images corresponding to the sample object; for each human body part, inputting a human body part image containing the human body part into a part attribute extraction network corresponding to the human body part and subjected to preliminary training, and outputting attribute characteristics of the human body part; and inputting the sample image into a whole body attribute extraction network after the initial training, and outputting the whole body attribute characteristics of the sample object. And performing feature fusion on the attribute features of each human body part and the whole body attribute features to obtain fusion features, and determining the attribute features of the sample object based on the fusion features. And adjusting network parameters of a part attribute extraction network and a whole body attribute extraction network corresponding to each human body part based on the attribute characteristics of the sample object and a preset loss function until the attribute characteristics of the sample object are converged to obtain the trained part attribute extraction network and the trained whole body attribute extraction network corresponding to each human body part. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
The embodiment of the invention provides another training method for a part attribute extraction network, which is realized on the basis of the method of the embodiment; the method mainly describes a specific process of performing feature fusion on the attribute features of each human body part and the whole body attribute features to obtain fusion features, and as shown in fig. 6, the method comprises the following steps:
step S602, a sample image containing a sample object and a plurality of human body part images corresponding to the sample object are obtained; each human body part image contains at least one human body part of the sample object.
Step S604, aiming at each human body part, inputting a human body part image containing the human body part into a part attribute extraction network corresponding to the human body part and subjected to preliminary training, and outputting attribute characteristics of the human body part; and inputting the sample image into a whole body attribute extraction network after the initial training, and outputting the whole body attribute characteristics of the sample object.
Step S606, all human body parts are combined into a part set, two human body parts are determined from the part set, and the two determined human body parts are deleted from the part set; and performing feature fusion processing on the determined attribute features of the two human body parts to obtain a first fusion feature.
The part set usually includes all human parts, for example, if the human parts include a head, an upper body and a lower body, the part set includes the head, the upper body and the lower body, two human parts are determined from the part set, for example, the two determined human parts are the head and the upper body, and a new feature, namely, the first fused feature is generated by a specified feature fusion algorithm according to the attribute feature of the head and the attribute feature of the upper body; usually, after two human body parts are determined from the part set, the two human body parts can be deleted from the part set, and repeated processing is avoided.
In practical implementation, the step of performing feature fusion processing on the determined attribute features of the two human body parts to obtain a first fusion feature may include: splicing the determined attribute characteristics of the two human body parts to obtain a first splicing characteristic; and performing feature fusion processing on the first splicing feature through a preset convolution layer to obtain a first fusion feature.
Specifically, the preset convolution layer may be set according to actual requirements; for example, it may be a convolution layer with 1 × 1 kernels. After two human body parts are determined from the part set, their attribute features may be spliced. For example, if the two determined human body parts are the head and the upper body, and the attribute feature of the head and the attribute feature of the upper body are each a vector of size 512 × 7 × 7, the two can be spliced to obtain a 1024 × 7 × 7 feature map vector, which is the first splicing feature; feature fusion is then performed through 512 convolution kernels of size 1 × 1 to obtain a fused feature map of size 512 × 7 × 7, namely the first fusion feature.
Step S608, if the part set is not empty, determining a target human body part from the part set, and deleting the target human body part from the part set; and performing feature fusion processing on the first fusion feature and the attribute feature of the target human body part to obtain an updated first fusion feature.
The target human body part may be a designated human body part from among the remaining human body parts in the set of parts from which the two determined human body parts are deleted; for example, also taking the two human body parts identified above as the head and the upper body as an example, if the lower body remains in the part set after the head and the upper body are deleted from the part set, the lower body can be identified as the target human body part.
In practical implementation, the step of performing feature fusion processing on the first fusion feature and the attribute feature of the target human body part to obtain an updated first fusion feature may include: splicing the first fusion characteristic and the attribute characteristic of the target human body part to obtain a second splicing characteristic; and performing feature fusion processing on the second splicing feature through a preset convolution layer to obtain an updated first fusion feature.
The preset convolution layer here may also be set according to actual requirements; for example, it may have 1 × 1 kernels. After the target human body part is determined from the part set, the first fusion feature and the attribute feature of the target human body part can be spliced. For example, if the target human body part is the lower body and the attribute feature of the lower body is a vector of size 512 × 7 × 7, the first fusion feature may be spliced with it to obtain a 1024 × 7 × 7 feature map vector, which is the second splicing feature; feature fusion is then performed through 512 convolution kernels of size 1 × 1 to obtain a fused feature map of size 512 × 7 × 7, namely the updated first fusion feature.
Step S610, if the part set is still not empty, the step of determining a target human body part from the part set is repeatedly executed until the part set is empty; a final fusion feature is then determined based on the final first fusion feature and the whole body attribute feature.
In practical implementation, the step of determining the final fusion feature based on the final first fusion feature and the whole-body attribute feature may include: performing feature fusion processing on the final first fusion feature and the whole-body attribute feature to obtain the final fusion feature. Specifically, after the final first fusion feature is obtained, it may be spliced with the whole-body attribute feature. For example, taking both the final first fusion feature and the whole-body attribute feature as 512 × 7 × 7 feature maps, the two may first be spliced to obtain a feature map of size 1024 × 7 × 7, and feature fusion is then performed through 512 convolution kernels of size 1 × 1 to obtain a fused feature map of size 512 × 7 × 7, that is, the final fusion feature.
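Steps S606 to S610 can be read as a simple loop over the part set. The sketch below is illustrative only, under the same assumptions as above (PyTorch, 512 × 7 × 7 feature maps, and a single shared 1 × 1 fusion layer; the patent leaves open whether each fusion level has its own preset convolution layer):

```python
import torch
import torch.nn as nn

fuse_conv = nn.Conv2d(1024, 512, kernel_size=1)  # preset 1 x 1 convolution layer

def fuse(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Splice two feature maps along the channel dimension, then fuse them."""
    return fuse_conv(torch.cat([a, b], dim=1))

# Attribute features of each human body part plus the whole-body feature
# (random stand-ins here; in training these come from the four networks).
part_set = {name: torch.randn(1, 512, 7, 7) for name in ("head", "upper", "lower")}
whole_body_feat = torch.randn(1, 512, 7, 7)

# S606: determine two parts, delete them from the set, fuse their features.
first = part_set.pop(next(iter(part_set)))
second = part_set.pop(next(iter(part_set)))
fused = fuse(first, second)

# S608/S610: while the set is not empty, fold in one target part at a time.
while part_set:
    target = part_set.pop(next(iter(part_set)))
    fused = fuse(fused, target)

# Final fusion with the whole-body attribute feature (still 1 x 512 x 7 x 7).
final_fused = fuse(fused, whole_body_feat)
```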
Step S612, determining the attribute feature of the sample object based on the final fusion feature.
Step S614, based on the attribute characteristics of the sample object and the preset loss function, network parameters of the part attribute extraction network and the whole body attribute extraction network corresponding to each human body part are adjusted until the attribute characteristics of the sample object are converged, and the part attribute extraction network and the whole body attribute extraction network corresponding to each trained human body part are obtained.
In actual implementation, the attribute characteristics of the sample object can be input into a final classification layer to obtain predicted values of the fused global pedestrian attributes, and supervised learning is performed using a sigmoid cross-entropy loss, with the loss function calculated in the same way as above. It should be noted that the attribute categories of the global pedestrian attributes mentioned here are the same as the attribute categories of the pedestrian attributes output for the sample image; only the predicted values may differ.
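One way to realize this supervision in PyTorch is sketched below; `BCEWithLogitsLoss` implements the sigmoid cross-entropy mentioned above, and the attribute count of 26 is a hypothetical value, not one given in the patent:

```python
import torch
import torch.nn as nn

num_attrs = 26  # hypothetical number of global pedestrian attribute categories

# Final classification layer: pool the fused 512 x 7 x 7 feature, then classify.
classifier = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, num_attrs),
)
criterion = nn.BCEWithLogitsLoss()  # sigmoid cross-entropy loss

final_fused = torch.randn(8, 512, 7, 7)               # batch of fused features
labels = torch.randint(0, 2, (8, num_attrs)).float()  # ground-truth attributes

loss = criterion(classifier(final_fused), labels)
# In the full model, this backward pass propagates gradients through the
# fusion layers into all four attribute networks; here it reaches only the
# classifier, since the fused features are random stand-ins.
loss.backward()
```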
The embodiment of the invention provides a training method of a part attribute extraction network, which comprises the steps of obtaining a sample image containing a sample object and a plurality of human body part images corresponding to the sample object; for each human body part, inputting a human body part image containing the human body part into a part attribute extraction network corresponding to the human body part and subjected to preliminary training, and outputting attribute characteristics of the human body part; and inputting the sample image into a whole body attribute extraction network after the initial training, and outputting the whole body attribute characteristics of the sample object. Combining all human body parts into a part set, determining two human body parts from the part set, and deleting the determined two human body parts; and performing feature fusion processing on the determined attribute features of the two human body parts to obtain a first fusion feature. If the part set is not empty, determining a target human body part from the part set, and deleting the target human body part; and performing feature fusion processing on the first fusion feature and the attribute feature of the target human body part to obtain an updated first fusion feature. And if the part set is not empty, continuing to execute the step of determining a target human body part from the part set until the part set is empty, and determining a final fusion feature based on the final first fusion feature and the whole body attribute feature. Attribute features of the sample object are determined based on the final fused features. And adjusting network parameters of a part attribute extraction network and a whole body attribute extraction network corresponding to each human body part based on the attribute characteristics of the sample object and a preset loss function until the attribute characteristics of the sample object are converged to obtain the trained part attribute extraction network and the trained whole body attribute extraction network corresponding to each human body part. The method can extract the attribute characteristics of the corresponding human body part through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of the target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
To further explain the above embodiments, an embodiment of the present invention provides a pedestrian attribute model training flowchart as shown in fig. 7, which mainly includes four stages: a pedestrian segmentation model training stage, a feature extraction stage for each pedestrian part, a multilevel feature fusion stage, and an attribute prediction learning stage. Specifically, a pedestrian picture is input into a pedestrian segmentation model, and pictures of each part, namely a head picture, an upper half body picture and a lower half body picture, are obtained through the pedestrian segmentation model. The head picture is converted into a three-dimensional tensor in RGB format, represented as a 3 × 112 × 112 vector, and input into the corresponding head attribute convolutional neural network (the part attribute extraction network corresponding to the head after preliminary training) to obtain a head feature map; the upper half body picture is likewise converted into a 3 × 112 × 112 RGB tensor and input into the corresponding upper half body attribute convolutional neural network (the part attribute extraction network corresponding to the upper half body after preliminary training) to obtain an upper half body feature map; the lower half body picture is converted into a 3 × 112 × 112 RGB tensor and input into the corresponding lower half body attribute convolutional neural network (the part attribute extraction network corresponding to the lower half body after preliminary training) to obtain a lower half body feature map; and the whole pedestrian picture is converted into a 3 × 112 × 112 RGB tensor and input into the corresponding whole body attribute neural network (the whole body attribute extraction network after preliminary training) to obtain a whole body feature map. The head feature map, the upper half body feature map, the lower half body feature map and the whole body feature map may each be feature maps of size 512 × 7 × 7. Feature fusion is performed on these four feature maps to obtain a final pedestrian attribute feature map (corresponding to the final fusion feature), usually also of size 512 × 7 × 7. The final pedestrian attribute feature map is input into a final classification layer to obtain predicted values of the fused global pedestrian attributes; the loss between the real values and the predicted values is calculated, and supervised learning with a sigmoid cross-entropy loss is used to update the network parameters of the head attribute neural network, the upper half body attribute neural network, the lower half body attribute neural network and the whole body attribute neural network, finally obtaining the trained head, upper half body, lower half body and whole body attribute neural networks.
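Condensing the fig. 7 flow, a toy forward pass for the four branches could be sketched as follows; the one-layer `backbone` is merely a stand-in for a real part attribute convolutional neural network, mapping a 3 × 112 × 112 RGB crop to a 512 × 7 × 7 feature map, and all names are illustrative:

```python
import torch
import torch.nn as nn

def backbone() -> nn.Module:
    # Stand-in for a part attribute CNN: 3 x 112 x 112 -> 512 x 7 x 7
    # (a 16 x 16 patchify convolution reduces 112 to 7 in one step).
    return nn.Sequential(nn.Conv2d(3, 512, kernel_size=16, stride=16), nn.ReLU())

# One network per branch: head, upper half body, lower half body, whole body.
nets = {part: backbone() for part in ("head", "upper", "lower", "whole")}

# Segmented RGB crops from the pedestrian segmentation model (random stand-ins).
crops = {part: torch.randn(1, 3, 112, 112) for part in nets}

# Each branch produces its 1 x 512 x 7 x 7 feature map; these are then fused
# level by level and classified as in the sketches above.
feats = {part: nets[part](crop) for part, crop in crops.items()}
```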
The embodiment of the present invention further provides a pedestrian attribute model prediction flowchart as shown in fig. 8. A pedestrian image is input into a pedestrian segmentation model, and pictures of each part, namely a head picture, an upper half body picture and a lower half body picture, are obtained through the pedestrian segmentation model. The head picture is converted into a preset format, e.g., a three-dimensional 3 × 112 × 112 tensor in RGB format, and input into the corresponding trained head attribute convolutional neural network to obtain a head feature map, which is input into a head attribute classifier to obtain predicted values of the head attributes; the upper half body picture is converted into the preset format and input into the corresponding trained upper half body attribute convolutional neural network to obtain an upper half body feature map, which is input into an upper half body attribute classifier to obtain predicted values of the upper half body attributes; the lower half body picture is converted into the preset format and input into the corresponding trained lower half body attribute convolutional neural network to obtain a lower half body feature map, which is input into a lower half body attribute classifier to obtain predicted values of the lower half body attributes; and the whole body picture is converted into the preset format and input into the corresponding trained whole body attribute convolutional neural network to obtain a whole body feature map, which is input into a whole body attribute classifier to obtain predicted values of the whole body attributes. Each predicted value is usually a probability value between 0 and 1. The final pedestrian attributes are then determined by a pedestrian attribute decision engine. Specifically, because the prediction result obtained through a part network is more reliable, local attributes are determined by the part prediction vectors. For example, thresholds of 0.8 and 0.2 may be set: if the probability of an attribute is greater than 0.8, it can be directly determined that the target pedestrian has the attribute; if the probability of the attribute is less than 0.2, it can be directly determined that the target pedestrian does not have the attribute; and if the probability of the attribute is between 0.2 and 0.8, the corresponding value in the whole body attribute vector and the value in the part attribute prediction vector are summed and averaged, and attribute identification is performed according to the resulting final prediction probability.
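The decision engine described above can be summarized as a small rule. The sketch below is illustrative; in particular, the 0.5 cut-off applied to the averaged probability is an assumption, since the patent only says identification is performed according to the final prediction probability:

```python
def decide_attribute(part_prob: float, whole_prob: float,
                     hi: float = 0.8, lo: float = 0.2) -> bool:
    """Return True if the target pedestrian is judged to have the attribute."""
    if part_prob > hi:   # part prediction is confidently positive
        return True
    if part_prob < lo:   # part prediction is confidently negative
        return False
    # Uncertain band: average the part and whole-body probabilities and
    # re-threshold (0.5 is an assumed cut-off, not stated in the patent).
    return (part_prob + whole_prob) / 2 >= 0.5
```

For example, `decide_attribute(0.6, 0.7)` averages to 0.65 and would report the attribute as present, while `decide_attribute(0.9, 0.1)` trusts the confident part prediction directly.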
According to the above attribute identification method based on pedestrian segmentation and multilevel feature fusion, the corresponding pedestrian attributes are predicted from different body parts, and better learning is then achieved through level-by-level feature fusion. For example, when the head or the torso is partially occluded or blurred in surveillance footage, the overall attribute identification effect is degraded; by extracting the corresponding attribute features from local images of each part of the target pedestrian, the occluded or blurred parts can be identified with the help of the clear parts, which improves accuracy compared with learning all attributes from the whole pedestrian picture.
The embodiment of the present invention provides a schematic structural diagram of an attribute identification device, as shown in fig. 9, the device includes: a first obtaining module 90, configured to obtain multiple human body part images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object; the extracting module 91 is configured to extract attribute features of the human body part included in each human body part image; and the first determining module 92 is used for determining the attribute characteristics of the target object based on the attribute characteristics extracted from each human body part image.
After acquiring a plurality of human body part images corresponding to the target object, the attribute identification device extracts the attribute characteristics of the human body part contained in each human body part image; and determining the attribute feature of the target object based on the extracted attribute feature. The device can extract the attribute characteristics of corresponding human body parts through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of a target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
Further, the first obtaining module 90 is further configured to: acquiring an initial image containing a target object; recognizing a plurality of human body parts of a target object from the initial image to obtain a recognition result; based on the recognition result, image segmentation processing is performed on the initial image to obtain a human body part image corresponding to each human body part.
Furthermore, each human body part image comprises a human body part; each human body part corresponds to a part attribute extraction network which is trained in advance; the part attribute extraction network corresponding to each human body part is used for extracting attribute features contained in the human body part; the extraction module 91 is further configured to: and inputting the human body part image corresponding to each human body part into the part attribute extraction network corresponding to the human body part and outputting the attribute characteristics of the human body part.
Further, the part attribute extraction network comprises a feature extraction sub-network and an attribute classification sub-network; the extraction module 91 is further configured to: inputting the human body part image corresponding to the human body part into the feature extraction subnetwork corresponding to the human body part to obtain the part feature of the human body part; inputting the part characteristics into an attribute classification subnetwork corresponding to the human body part to obtain the attribute characteristics of the human body part; the attribute features include: and presetting a first attribute value corresponding to the attribute type.
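A minimal sketch of such a two-sub-network structure, assuming PyTorch and the shapes used elsewhere in this document (3 × 112 × 112 inputs, 512 × 7 × 7 part features); the layer choices and attribute count are illustrative, not the patent's:

```python
import torch.nn as nn

class PartAttributeNet(nn.Module):
    """Feature extraction sub-network followed by an attribute classification sub-network."""

    def __init__(self, num_attrs: int = 10):  # hypothetical attribute count
        super().__init__()
        # Feature extraction sub-network: body part image -> part feature map.
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 512, kernel_size=16, stride=16), nn.ReLU()
        )
        # Attribute classification sub-network: part feature -> first attribute
        # values, one per preset attribute category, each in [0, 1].
        self.attribute_classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, num_attrs), nn.Sigmoid()
        )

    def forward(self, image):
        part_feature = self.feature_extractor(image)
        return self.attribute_classifier(part_feature)
```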
Further, the attribute features include: presetting a first attribute value corresponding to an attribute category; the first determination module 92 is further configured to: for each attribute category, if a first attribute value corresponding to the attribute category is greater than or equal to a first preset threshold value, determining that the target object has an attribute feature corresponding to the attribute category; if the first attribute value corresponding to the attribute category is smaller than or equal to a second preset threshold value, determining that the target object does not have the attribute feature corresponding to the attribute category; the first preset threshold is larger than the second preset threshold.
Further, the apparatus is further configured to: inputting an initial image containing a target object into a whole body attribute extraction network which is trained in advance, and outputting whole body attribute characteristics of the target object; the whole body attribute characteristics comprise attribute characteristics of each human body part of the target object; the first determination module 92 is further configured to: if the first attribute value corresponding to the attribute category is smaller than a first preset threshold and larger than a second preset threshold, acquiring a second attribute value corresponding to the attribute category from the whole body attribute characteristics; and determining whether the target object has the attribute feature corresponding to the attribute category or not based on the first attribute value and the second attribute value.
The implementation principle and the generated technical effect of the attribute identification device provided by the embodiment of the invention are the same as those of the embodiment of the attribute identification method, and for the sake of brief description, the corresponding contents in the embodiment of the attribute identification method can be referred to where the embodiment of the attribute identification device is not mentioned.
An embodiment of the present invention provides a schematic structural diagram of a training apparatus for a part attribute extraction network, as shown in fig. 10, the apparatus includes: a second obtaining module 100, configured to obtain a sample image including a sample object and a plurality of human body part images corresponding to the sample object; each human body part image comprises at least one human body part of the sample object; an output module 101, configured to, for each human body part, input a human body part image including the human body part into a part attribute extraction network corresponding to the human body part and having been preliminarily trained, and output attribute features of the human body part; inputting the sample image into a whole body attribute extraction network after the initial training, and outputting the whole body attribute characteristics of the sample object; the second determining module 102 is configured to perform feature fusion on the attribute features of each human body part and the whole body attribute features to obtain fusion features, and determine the attribute features of the sample object based on the fusion features; and the adjusting module 103 is configured to adjust network parameters of the part attribute extraction network and the whole body attribute extraction network corresponding to each human body part based on the attribute features of the sample object and the preset loss function until the attribute features of the sample object converge, so as to obtain the trained part attribute extraction network and the trained whole body attribute extraction network corresponding to each human body part.
The training device of the part attribute extraction network acquires a sample image containing a sample object and a plurality of human body part images corresponding to the sample object; for each human body part, inputting a human body part image containing the human body part into a part attribute extraction network corresponding to the human body part and subjected to preliminary training, and outputting attribute characteristics of the human body part; and inputting the sample image into a whole body attribute extraction network after the initial training, and outputting the whole body attribute characteristics of the sample object. And performing feature fusion on the attribute features of each human body part and the whole body attribute features to obtain fusion features, and determining the attribute features of the sample object based on the fusion features. And adjusting network parameters of a part attribute extraction network and a whole body attribute extraction network corresponding to each human body part based on the attribute characteristics of the sample object and a preset loss function until the attribute characteristics of the sample object are converged to obtain the trained part attribute extraction network and the trained whole body attribute extraction network corresponding to each human body part. The device can extract the attribute characteristics of corresponding human body parts through different human body part images, and compared with the mode of identifying various attribute characteristics from the whole image area of a target pedestrian in the prior art, the accuracy of the identification result of the attribute characteristics can be improved because the human body part images contain more local characteristics and detail information.
Further, the second determining module 102 is further configured to: combining all human body parts into a part set, determining two human body parts from the part set, and deleting the determined two human body parts from the part set; performing feature fusion processing on the determined attribute features of the two human body parts to obtain a first fusion feature; if the part set is not empty, determining a target human body part from the part set, and deleting the target human body part from the part set; performing feature fusion processing on the first fusion feature and the attribute feature of the target human body part to obtain an updated first fusion feature; and if the part set is not empty, continuing to execute the step of determining a target human body part from the part set until the part set is empty, and determining a final fusion feature based on the final first fusion feature and the whole body attribute feature.
Further, the second determining module 102 is further configured to: and performing feature fusion processing on the final first fusion feature and the whole body attribute feature to obtain a final fusion feature.
Further, the second determining module 102 is further configured to: splicing the determined attribute characteristics of the two human body parts to obtain a first splicing characteristic; performing feature fusion processing on the first splicing feature through a preset convolution layer to obtain a first fusion feature; the second determining module 102 is further configured to: splicing the first fusion characteristic and the attribute characteristic of the target human body part to obtain a second splicing characteristic; and performing feature fusion processing on the second splicing feature through a preset convolution layer to obtain an updated first fusion feature.
The implementation principle and the generated technical effect of the training device of the part attribute extraction network provided by the embodiment of the invention are the same as those of the embodiment of the training method of the part attribute extraction network, and for the sake of brief description, corresponding contents in the embodiment of the training method of the part attribute extraction network can be referred to.
An embodiment of the present invention further provides a server, as shown in fig. 11, where the server includes a processor 130 and a memory 131, the memory 131 stores machine executable instructions capable of being executed by the processor 130, and the processor 130 executes the machine executable instructions to implement the attribute identification method or the training method for the part attribute extraction network.
Further, the server shown in fig. 11 further includes a bus 132 and a communication interface 133, and the processor 130, the communication interface 133 and the memory 131 are connected through the bus 132.
The memory 131 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between a network element of the system and at least one other network element is realized through at least one communication interface 133 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like may be used. The bus 132 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in FIG. 11, but this does not indicate only one bus or one type of bus.
The processor 130 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 130. The processor 130 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by it. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 131, and the processor 130 reads the information in the memory 131 and completes the steps of the method of the foregoing embodiments in combination with its hardware.
The embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the above attribute identification method or the training method for the part attribute extraction network, and specific implementation may refer to method embodiments, and is not described herein again.
The computer program product of the attribute identification method, the training method for the part attribute extraction network, and the training device provided in the embodiments of the present invention includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiments, and specific implementations may refer to the method embodiments and will not be described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the foregoing embodiments are merely illustrative of the present invention and not restrictive, and the scope of the present invention is not limited thereto: any person skilled in the art can, within the technical scope of the present disclosure, modify or easily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. An attribute identification method, comprising:
acquiring a plurality of human body part images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object;
extracting attribute characteristics of human body parts contained in each human body part image;
and determining the attribute characteristics of the target object based on the extracted attribute characteristics of each human body part image.
2. The method of claim 1, wherein the step of obtaining a plurality of body region images corresponding to the target object comprises:
acquiring an initial image containing a target object;
identifying a plurality of human body parts of the target object from the initial image to obtain an identification result;
and carrying out image segmentation processing on the initial image based on the identification result to obtain a human body part image corresponding to each human body part.
3. The method of claim 1, wherein each of said images of body parts comprises a body part; each human body part corresponds to a part attribute extraction network which is trained in advance; the part attribute extraction network corresponding to each human body part is used for extracting attribute features contained in the human body part;
the step of extracting the attribute features of the human body part included in each of the human body part images includes: and inputting the human body part image corresponding to each human body part into the part attribute extraction network corresponding to the human body part and outputting the attribute characteristics of the human body part.
4. The method of claim 3, wherein the site attribute extraction network comprises a feature extraction subnetwork and an attribute classification subnetwork;
the step of inputting the human body part image corresponding to the human body part into the part attribute extraction network corresponding to the human body part and outputting the attribute characteristics of the human body part comprises the following steps:
inputting the human body part image corresponding to the human body part into the feature extraction subnetwork corresponding to the human body part to obtain the part feature of the human body part;
inputting the part characteristics into an attribute classification subnetwork corresponding to the human body part to obtain the attribute characteristics of the human body part; the attribute features include: and presetting a first attribute value corresponding to the attribute type.
5. The method of claim 1, wherein the attribute features comprise: presetting a first attribute value corresponding to an attribute category; the step of determining the attribute feature of the target object based on the attribute feature extracted from each of the human body part images includes:
for each attribute category, if a first attribute value corresponding to the attribute category is greater than or equal to a first preset threshold value, determining that the target object has an attribute feature corresponding to the attribute category;
if the first attribute value corresponding to the attribute category is smaller than or equal to a second preset threshold value, determining that the target object does not have the attribute feature corresponding to the attribute category; wherein the first preset threshold is greater than the second preset threshold.
6. The method according to claim 5, wherein the step of determining the attribute feature of the target object based on the extracted attribute feature of each of the human body part images is preceded by the method further comprising:
inputting an initial image containing a target object into a whole body attribute extraction network which is trained in advance, and outputting whole body attribute characteristics of the target object; the whole body attribute features comprise attribute features of each human body part of the target object;
the method further comprises the following steps: if the first attribute value corresponding to the attribute category is smaller than the first preset threshold and larger than the second preset threshold, acquiring a second attribute value corresponding to the attribute category from the whole body attribute feature; and determining whether the target object has the attribute feature corresponding to the attribute category or not based on the first attribute value and the second attribute value.
7. A method for training a partial attribute extraction network, the method comprising:
acquiring a sample image containing a sample object and a plurality of human body part images corresponding to the sample object; each of the human body part images contains at least one human body part of the sample object;
for each human body part, inputting a human body part image containing the human body part into a part attribute extraction network corresponding to the human body part and subjected to preliminary training, and outputting attribute characteristics of the human body part; inputting the sample image to a whole body attribute extraction network after primary training is completed, and outputting whole body attribute characteristics of the sample object;
performing feature fusion on the attribute features of each human body part and the whole body attribute features to obtain fusion features, and determining the attribute features of the sample object based on the fusion features;
and adjusting network parameters of a part attribute extraction network and a whole body attribute extraction network corresponding to each human body part based on the attribute characteristics of the sample object and a preset loss function until the attribute characteristics of the sample object are converged to obtain the trained part attribute extraction network and the trained whole body attribute extraction network corresponding to each human body part.
8. The method according to claim 7, wherein the step of feature fusing the attribute feature of each human body part and the whole body attribute feature to obtain a fused feature comprises:
combining all human body parts into a part set, determining two human body parts from the part set, and deleting the determined two human body parts from the part set; performing feature fusion processing on the determined attribute features of the two human body parts to obtain a first fusion feature;
if the part set is not empty, determining a target human body part from the part set, and deleting the target human body part from the part set; performing feature fusion processing on the first fusion feature and the attribute feature of the target human body part to obtain an updated first fusion feature;
if the set of parts is not empty, continuing to perform the step of determining a target human body part from the set of parts until the set of parts is empty, and determining a final fusion feature based on the final first fusion feature and the whole-body attribute feature.
9. The method of claim 8, wherein the step of determining a final fused feature based on the final first fused feature and the whole-body attribute feature comprises: and performing feature fusion processing on the final first fusion feature and the whole body attribute feature to obtain a final fusion feature.
10. The method according to claim 8, wherein the step of performing feature fusion processing on the determined attribute features of the two human body parts to obtain a first fusion feature comprises:
splicing the determined attribute characteristics of the two human body parts to obtain a first splicing characteristic; performing feature fusion processing on the first splicing feature through a preset convolution layer to obtain a first fusion feature;
the step of performing feature fusion processing on the first fusion feature and the attribute feature of the target human body part to obtain an updated first fusion feature includes:
splicing the first fusion characteristic and the attribute characteristic of the target human body part to obtain a second splicing characteristic; and performing feature fusion processing on the second splicing feature through a preset convolution layer to obtain an updated first fusion feature.
11. An attribute identification device, the device comprising:
the first acquisition module is used for acquiring a plurality of human body part images corresponding to a target object; wherein each human body part image comprises at least one human body part of the target object;
the extraction module is used for extracting the attribute characteristics of the human body part contained in each human body part image;
and the first determining module is used for determining the attribute characteristics of the target object based on the extracted attribute characteristics of each human body part image.
12. An apparatus for training a partial attribute extraction network, the apparatus comprising:
the second acquisition module is used for acquiring a sample image containing a sample object and a plurality of human body part images corresponding to the sample object; each of the human body part images contains at least one human body part of the sample object;
the output module is used for inputting the human body part image containing the human body part to the corresponding part attribute extraction network after the preliminary training of the human body part and outputting the attribute characteristics of the human body part aiming at each human body part; inputting the sample image to a whole body attribute extraction network after primary training is completed, and outputting whole body attribute characteristics of the sample object;
the second determination module is used for performing feature fusion on the attribute features of each human body part and the whole body attribute features to obtain fusion features, and determining the attribute features of the sample object based on the fusion features;
and the adjusting module is used for adjusting the network parameters of the part attribute extraction network and the whole body attribute extraction network corresponding to each human body part based on the attribute characteristics of the sample object and a preset loss function until the attribute characteristics of the sample object are converged to obtain the trained part attribute extraction network and the trained whole body attribute extraction network corresponding to each human body part.
13. A server comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the method of attribute recognition of any one of claims 1-6 or the method of training a site attribute extraction network of any one of claims 7-10.
14. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of attribute recognition of any of claims 1-6, or the method of training a site attribute extraction network of any of claims 7-10.
CN202110133441.9A 2021-01-29 2021-01-29 Attribute recognition method, and training method and device for part attribute extraction network Pending CN112800978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110133441.9A CN112800978A (en) 2021-01-29 2021-01-29 Attribute recognition method, and training method and device for part attribute extraction network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110133441.9A CN112800978A (en) 2021-01-29 2021-01-29 Attribute recognition method, and training method and device for part attribute extraction network

Publications (1)

Publication Number Publication Date
CN112800978A true CN112800978A (en) 2021-05-14

Family

ID=75813181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110133441.9A Pending CN112800978A (en) 2021-01-29 2021-01-29 Attribute recognition method, and training method and device for part attribute extraction network

Country Status (1)

Country Link
CN (1) CN112800978A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203548A (en) * 2016-03-17 2017-09-26 阿里巴巴集团控股有限公司 Attribute acquisition methods and device
CN109829356A (en) * 2018-12-05 2019-05-31 科大讯飞股份有限公司 The training method of neural network and pedestrian's attribute recognition approach neural network based
CN110175595A (en) * 2019-05-31 2019-08-27 北京金山云网络技术有限公司 Human body attribute recognition approach, identification model training method and device
CN110569779A (en) * 2019-08-28 2019-12-13 西北工业大学 Pedestrian attribute identification method based on pedestrian local and overall attribute joint learning
CN111814857A (en) * 2020-06-29 2020-10-23 浙江大华技术股份有限公司 Target re-identification method, network training method thereof and related device
CN112001353A (en) * 2020-09-03 2020-11-27 杭州云栖智慧视通科技有限公司 Pedestrian re-identification method based on multi-task joint supervised learning
CN112183672A (en) * 2020-11-05 2021-01-05 北京金山云网络技术有限公司 Image classification method, and training method and device of feature extraction network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113869357A (en) * 2021-08-17 2021-12-31 浙江大华技术股份有限公司 Attribute class identification method, attribute class identification device and computer storage medium
CN115049895A (en) * 2022-06-15 2022-09-13 北京百度网讯科技有限公司 Image attribute identification method, attribute identification model training method and device
CN115049895B (en) * 2022-06-15 2024-01-05 北京百度网讯科技有限公司 Image attribute identification method, attribute identification model training method and device
CN117037218A (en) * 2023-10-08 2023-11-10 腾讯科技(深圳)有限公司 Object attribute identification method, related device, equipment and medium
CN117037218B (en) * 2023-10-08 2024-03-15 腾讯科技(深圳)有限公司 Object attribute identification method, related device, equipment and medium

Similar Documents

Publication Publication Date Title
Hsu et al. Ratio-and-scale-aware YOLO for pedestrian detection
CN109447169B (en) Image processing method, training method and device of model thereof and electronic system
WO2021051601A1 (en) Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium
CN112381775B (en) Image tampering detection method, terminal device and storage medium
CN112800978A (en) Attribute recognition method, and training method and device for part attribute extraction network
KR101896357B1 (en) Method, device and program for detecting an object
CN111814902A (en) Target detection model training method, target identification method, device and medium
US20210027081A1 (en) Method and device for liveness detection, and storage medium
CN111814620A (en) Face image quality evaluation model establishing method, optimization method, medium and device
CN111881849A (en) Image scene detection method and device, electronic equipment and storage medium
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111739027A (en) Image processing method, device and equipment and readable storage medium
CN113344000A (en) Certificate copying and recognizing method and device, computer equipment and storage medium
CN114037640A (en) Image generation method and device
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN113591758A (en) Human behavior recognition model training method and device and computer equipment
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
CN112183542A (en) Text image-based recognition method, device, equipment and medium
CN117765485A (en) Vehicle type recognition method, device and equipment based on improved depth residual error network
CN113012030A (en) Image splicing method, device and equipment
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN116798041A (en) Image recognition method and device and electronic equipment
CN111080748A (en) Automatic picture synthesis system based on Internet
CN113724269B (en) Instance segmentation method, instance segmentation network training method and related equipment
CN113822199B (en) Object attribute identification method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination