CN111881762A - Method for training attribute recognition and identity recognition of pedestrian in combined manner

Method for training attribute recognition and identity recognition of pedestrian in combined manner

Info

Publication number
CN111881762A
Authority
CN
China
Prior art keywords
pedestrian
feature vector
probability distribution
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010620356.0A
Other languages
Chinese (zh)
Inventor
蒲恒
邵新庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd, Nanjing ZNV Software Co Ltd filed Critical Shenzhen ZNV Technology Co Ltd
Priority to CN202010620356.0A priority Critical patent/CN111881762A/en
Publication of CN111881762A publication Critical patent/CN111881762A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A method for training attribute recognition and identity recognition of pedestrians in a combined manner comprises the following steps: inputting the pedestrian image for training into a neural network model; calculating the pedestrian image through the neural network model to obtain the probability distribution of the pedestrian attribute and the probability distribution of the pedestrian identity; calculating a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute according to a first loss function; calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function; and performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference. The model trained by the method can simultaneously output the pedestrian attribute recognition result and the pedestrian identity recognition result, and the recognition accuracy is effectively improved.

Description

Method for training attribute recognition and identity recognition of pedestrian in combined manner
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for jointly training pedestrian attribute recognition and pedestrian identity recognition.
Background
Pedestrian attribute recognition aims to mine the attributes of a given pedestrian image, such as hairstyle, gender and clothing. Given a pedestrian image I and a predefined attribute set A, the goal of pedestrian attribute recognition is to predict from this image a subset B of the attribute set A that characterizes the pedestrian image. A common pedestrian attribute recognition approach is to input the pedestrian image into a neural network for feature extraction, obtain a high-dimensional vector that represents the features of the input image, and then perform classification based on this feature vector. Pedestrian attributes are high-level semantic features that are relatively robust to changes in viewing angle and observation conditions.
Pedestrian identity recognition refers to the technology of judging, by means of computer vision, whether a specific pedestrian is present in an image or a video sequence. A common approach is to extract features of the target pedestrian with a trained model, and then judge whether the specific pedestrian is present according to the similarity between features. In the training stage, after features are extracted by the model, pedestrians are classified according to the feature vectors. As a technique that enables cross-camera tracking of people, pedestrian identity recognition has attracted extensive research attention.
Most existing methods train these two tasks independently, which limits the achievable improvement in model performance.
Disclosure of Invention
The application provides a method, a system and a storage medium for jointly training pedestrian attribute recognition and pedestrian identity recognition, which are used for improving the performance of a pedestrian attribute recognition model and a pedestrian identity recognition model.
According to a first aspect, the present invention provides a method for training pedestrian attribute recognition and pedestrian identity recognition in a combined manner, comprising:
inputting the pedestrian image for training into a neural network model;
calculating the pedestrian image through the neural network model to obtain the probability distribution of the pedestrian attribute and the probability distribution of the pedestrian identity, wherein the neural network model comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch comprises a fusion layer and is used for fusing the first feature vector and the global feature vector to obtain a second feature vector, and calculating the probability distribution of the pedestrian identity according to the second feature vector;
calculating a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute according to a first loss function;
calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function;
and performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference.
In one embodiment, the fusing of the first feature vector and the global feature vector by the fusion layer to obtain a second feature vector includes: the fusion layer calculates the Kronecker product of the first feature vector and the global feature vector to obtain the second feature vector.
In one embodiment, the global feature extraction network is a pre-trained ResNet50 network that includes a portion from the input layer to the global average pooling layer.
In one embodiment, before inputting the pedestrian image into the neural network model, the method further includes: normalizing the original image using a preset mean value and a preset standard deviation, and scaling it so that the size of the pedestrian image meets the input requirement of the neural network model.
In one embodiment, the first loss function is determined by: for each pedestrian attribute, calculating two-class cross entropy loss functions of the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute, and adding the two-class cross entropy loss functions corresponding to all the pedestrian attributes to form a first loss function;
the second loss function is determined by: and taking a multi-classification cross entropy loss function of the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity as a second loss function.
In one embodiment, the iteratively optimizing the parameters of the neural network model according to the first difference and the second difference comprises: according to the sum of the first difference and the second difference, iteratively optimizing the parameters of the neural network model using a stochastic gradient descent method, so that the sum of the first difference and the second difference is reduced until a preset stop condition is reached.
According to a second aspect, the invention provides a system for training pedestrian attribute recognition and pedestrian identity recognition in a combined manner, comprising:
the input module is used for acquiring a pedestrian image for training;
the neural network model is used for calculating the pedestrian images acquired by the input module and used for training to obtain the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities, and comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
a loss calculation module for calculating a difference between the probability distribution of the pedestrian attributes calculated by the neural network model and an actual probability distribution of the pedestrian attributes, and a difference between the probability distribution of the pedestrian identities calculated by the neural network model and an actual probability distribution of the pedestrian identities, according to a predefined loss function;
and the parameter optimization module is used for performing iterative optimization on the parameters of the neural network model according to the difference between the probability distribution of the pedestrian attributes obtained by calculation of the neural network model and the actual probability distribution of the pedestrian attributes and the difference between the probability distribution of the pedestrian identities obtained by calculation of the neural network model and the actual probability distribution of the pedestrian identities.
According to a third aspect, the present invention provides a pedestrian attribute identification and pedestrian identity identification system comprising:
the receiving module is used for receiving a pedestrian image to be identified;
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
and the output module is used for outputting the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities.
In one embodiment, the pedestrian identity prediction branch includes a fusion layer, and the fusion layer is configured to calculate the Kronecker product of the first feature vector and the global feature vector to obtain the second feature vector.
According to a fourth aspect, the invention provides a computer readable storage medium comprising a program executable by a processor to implement a method of jointly training pedestrian attribute recognition and pedestrian identity recognition as described above.
According to the method, the system and the computer-readable storage medium for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the application, the correlation between pedestrian attributes and pedestrian identities is exploited: the global features of the pedestrian image are fused with the features used for attribute classification to obtain features with stronger representation capability, and the fused features are used for pedestrian identity recognition. Jointly training pedestrian attribute recognition and pedestrian identity recognition yields a model that can output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
Drawings
FIG. 1 is a flow chart of a method for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the present invention;
FIG. 2 is a diagram of a neural network model structure according to an embodiment of the method for jointly training attribute recognition and identity recognition of pedestrians provided by the present invention;
FIG. 3 is a schematic structural diagram of a system for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the present invention;
fig. 4 is a schematic structural diagram of a pedestrian attribute identification and pedestrian identity identification system provided by the invention.
Detailed Description
The present invention will be described in further detail below with reference to the detailed description and the accompanying drawings, wherein like elements in different embodiments are given like reference numerals. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail in order to avoid obscuring the core of the present application with excessive description; a detailed description of these operations is not necessary for those skilled in the art, who can fully understand them from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the various steps or actions in the method descriptions may be reordered or transposed, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required sequence unless otherwise indicated where such sequence must be followed.
The numbering of components herein, such as "first" and "second", is used only to distinguish the objects described and does not carry any sequential or technical meaning. The terms "connected" and "coupled", when used in this application, include both direct and indirect connections (couplings), unless otherwise indicated.
The attribute characteristics of the same pedestrian generally remain the same across different cameras, scenes and postures, so pedestrian attributes and pedestrian identity are correlated. During training, the model extracts global features before attribute classification, fuses the global features with the features used for attribute classification, and uses the fused features for pedestrian identity recognition. Because the fused features have stronger representation capability, better pedestrian attribute recognition and pedestrian identity recognition performance can be achieved. The model trained by the method can output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
Fig. 1 is a flowchart of a method for jointly training pedestrian attribute recognition and pedestrian identity recognition according to the present invention. As shown in fig. 1, the method for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the invention comprises the following steps:
step 102: inputting the pedestrian image for training into the neural network model.
Step 103: the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities are obtained by calculating the pedestrian images input in the step 102 through a neural network model.
Fig. 2 is a diagram of a neural network model structure of an embodiment of a method for jointly training attribute recognition and identity recognition of a pedestrian according to the present invention. As shown in fig. 2, a neural network model for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by an embodiment of the present invention may include:
and the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector. In this embodiment, the global feature extraction network uses a pre-trained ResNet50 network including a portion from the input layer to the global average pooling layer, and the ResNet50 network structure can avoid the problem of gradient disappearance during model training. After the pedestrian image is input into the global feature extraction network, the global feature vector f _ glb is obtained through forward calculation of the global feature extraction network.
And the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector. In the present embodiment, the pedestrian attribute prediction branch includes a fully connected layer FC1 and a fully connected layer FC2. The global feature vector f_glb is input into the pedestrian attribute prediction branch; after passing through the fully connected layer FC1, a first feature vector f_attr for pedestrian attribute classification is obtained, and this feature vector is then input into the fully connected layer FC2 for pedestrian attribute classification to obtain the probability distribution of the pedestrian attributes, where the number of output neurons of the fully connected layer FC2 is equal to the number of pedestrian attributes in the training set. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC2 is transformed using a Sigmoid layer, thereby obtaining the probability that the pedestrian image has each attribute. For each attribute prediction value y_i, the output after the Sigmoid function is:
Sigmoid(y_i) = 1 / (1 + e^(-y_i))
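A minimal sketch of such an attribute branch might look as follows; the FC1 output size and the number of attributes are illustrative assumptions, not values taken from the patent.

import torch
import torch.nn as nn

class AttributeBranch(nn.Module):
    # FC1 produces the first feature vector f_attr; FC2 followed by Sigmoid gives one
    # probability per pedestrian attribute.
    def __init__(self, glb_dim=2048, attr_feat_dim=128, num_attributes=30):
        super().__init__()
        self.fc1 = nn.Linear(glb_dim, attr_feat_dim)
        self.fc2 = nn.Linear(attr_feat_dim, num_attributes)  # one neuron per attribute

    def forward(self, f_glb):
        f_attr = self.fc1(f_glb)                       # first feature vector f_attr
        attr_probs = torch.sigmoid(self.fc2(f_attr))   # probability of having each attribute
        return f_attr, attr_probs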
And the pedestrian identity prediction branch comprises a fusion layer and is used for fusing the first feature vector and the global feature vector to obtain a second feature vector, and for calculating the probability distribution of the pedestrian identity according to the second feature vector. In this embodiment, the pedestrian identity prediction branch includes a fusion layer, a fully connected layer FC3 and a fully connected layer FC4. The fusion layer has no parameters; it only fuses the global feature vector f_glb and the first feature vector f_attr to obtain the second feature vector f_id. The second feature vector f_id is input into the fully connected layer FC3, and the probability distribution of the pedestrian identities is obtained through the computation of the fully connected layers FC3 and FC4, where the number of output neurons of the fully connected layer FC4 is equal to the number of pedestrian identities in the training set. The fusion layer may fuse the global feature vector f_glb and the first feature vector f_attr using the Kronecker product, which is calculated by the following formula:
u ⊗ v = (u_1 v_1, u_1 v_2, ..., u_1 v_n, u_2 v_1, ..., u_m v_n), where u = (u_1, ..., u_m) and v = (v_1, ..., v_n)
Substituting the global feature vector f_glb and the first feature vector f_attr for u and v in this formula yields the second feature vector f_id. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC4 is transformed using a Softmax layer, thereby obtaining the probability that the pedestrian image corresponds to each pedestrian identity. Suppose the output of the fully connected layer FC4 is y_1, y_2, y_3, ..., y_n.
Then for each pedestrian identity prediction value y_i, the output after the Softmax function is:
Softmax(y_i) = e^(y_i) / (e^(y_1) + e^(y_2) + ... + e^(y_n))
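The fusion and identity classification described above can be sketched as below. For batched vectors, the per-sample Kronecker product of f_glb and f_attr is the flattened outer product, computed here with einsum; the FC3 size and the identity count are illustrative assumptions rather than values from the patent.

import torch
import torch.nn as nn

class IdentityBranch(nn.Module):
    # Parameter-free fusion (Kronecker product) followed by FC3, FC4 and Softmax.
    def __init__(self, glb_dim=2048, attr_feat_dim=128, hidden_dim=1024, num_identities=700):
        super().__init__()
        # the fused vector has dimension glb_dim * attr_feat_dim, which can be large
        self.fc3 = nn.Linear(glb_dim * attr_feat_dim, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, num_identities)  # one neuron per training identity

    def forward(self, f_glb, f_attr):
        # per-sample Kronecker product u ⊗ v: the outer product flattened into one vector
        f_id = torch.einsum('bi,bj->bij', f_glb, f_attr).flatten(1)
        id_probs = torch.softmax(self.fc4(self.fc3(f_id)), dim=1)
        return id_probs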
it should be understood that the structure of the neural network model can be designed according to the needs of a specific task, and is not limited to the structure provided in the present embodiment. For example, the global feature extraction network may use other image classification network models such as vgnet, and the pedestrian attribute prediction branch and the pedestrian identity prediction branch may increase or decrease the number of fully connected layers, or add other types of hidden layers, as required by the specific task.
Step 104: A first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute is calculated according to the first loss function.
In some embodiments, a cross entropy loss function is used as the loss function. Cross entropy can be used to measure how similar two probability distributions are, and a cross entropy loss function is often used to compute the difference between the predicted and actual distributions of the network during training of the neural network model. The cross entropy loss function is defined as follows:
H(p, q) = -Σ_x p(x) log q(x)
where p(x) denotes the actual distribution and q(x) denotes the distribution predicted by the network.
The first loss function may be determined by: the identification of each pedestrian attribute is treated as a binary classification problem, namely whether the input pedestrian image has that attribute; for each pedestrian attribute, a binary cross-entropy loss is calculated between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of that pedestrian attribute, and the binary cross-entropy losses corresponding to all the pedestrian attributes are added together to form the first loss function.
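Under this interpretation, a sketch of the first loss (one binary cross-entropy term per attribute, summed over attributes and averaged over the batch) could be:

import torch

def attribute_loss(attr_probs, attr_targets):
    # attr_probs:   (B, num_attributes) Sigmoid outputs of FC2
    # attr_targets: (B, num_attributes) ground-truth attribute labels in {0, 1}
    eps = 1e-7  # numerical stability for the logarithm
    bce = -(attr_targets * torch.log(attr_probs + eps)
            + (1 - attr_targets) * torch.log(1 - attr_probs + eps))
    return bce.sum(dim=1).mean()  # sum over attributes, average over the batch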
Step 105: and calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function.
The second loss function may be determined by: pedestrian identity recognition is treated as a multi-class classification problem, and the multi-class cross-entropy loss between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity is used as the second loss function.
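A corresponding sketch of the second loss, computed directly on the Softmax probabilities, might be:

import torch

def identity_loss(id_probs, id_labels):
    # id_probs:  (B, num_identities) Softmax outputs of FC4
    # id_labels: (B,) integer identity labels
    eps = 1e-7
    picked = id_probs[torch.arange(id_probs.size(0)), id_labels]  # q(x) of the true identity
    return -torch.log(picked + eps).mean()  # -sum p(x) log q(x) with one-hot p(x)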
Step 106: and performing iterative optimization on the parameters of the neural network model according to the first difference calculated in the step 104 and the second difference calculated in the step 105.
In some embodiments, the parameters of the neural network model are iteratively optimized using a stochastic gradient descent method according to the sum of the first difference and the second difference, so that this sum decreases until a preset stop condition is reached, for example a target precision or a maximum number of iterations.
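Putting the pieces together, a training step under the assumptions of the earlier sketches could look like the following; the learning rate, momentum and epoch count are illustrative, and train_loader is an assumed DataLoader yielding (images, attr_targets, id_labels).

import torch

backbone, attr_branch, id_branch = GlobalFeatureExtractor(), AttributeBranch(), IdentityBranch()
params = (list(backbone.parameters())
          + list(attr_branch.parameters())
          + list(id_branch.parameters()))
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # illustrative hyper-parameters
max_epochs = 60                                             # illustrative stop condition

for epoch in range(max_epochs):
    for images, attr_targets, id_labels in train_loader:
        f_glb = backbone(images)
        f_attr, attr_probs = attr_branch(f_glb)
        id_probs = id_branch(f_glb, f_attr)
        # sum of the first difference and the second difference
        loss = attribute_loss(attr_probs, attr_targets.float()) + identity_loss(id_probs, id_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()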
In some embodiments, before step 102, further comprising:
step 101: and preprocessing the pedestrian image. In some embodiments, for a pedestrian image with a given pixel value distributed in the [0, 255] interval, the original image is normalized by using a preset mean value and standard deviation, and the size of the input image is scaled to the input size of the neural network model, so that the size of the pedestrian image meets the input requirement of the neural network model, and the training process can be more stable.
The method for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the invention uses the correlation between pedestrian attributes and pedestrian identity to train the two tasks simultaneously. During training, the model extracts global features before attribute classification, fuses the global features with the features used for attribute classification, and uses the fused features for pedestrian identity recognition. Because the fused features have stronger representation capability, better pedestrian attribute recognition and pedestrian identity recognition performance can be achieved. The model trained by the method can output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
The invention also provides a system for training pedestrian attribute recognition and pedestrian identity recognition in a combined manner, as shown in fig. 3, the system comprises: an input module 301, a neural network model 302, a loss calculation module 303 and a parameter optimization module 304.
And the input module 301 is configured to acquire a pedestrian image for training, and input the pedestrian image into the neural network model 302.
The neural network model 302 is configured to calculate a pedestrian image for training acquired by the input module 301 to obtain a probability distribution of a pedestrian attribute and a probability distribution of a pedestrian identity, where the neural network model 302 includes:
and the global feature extraction network 312 is used for calculating the pedestrian image to obtain a global feature vector. In this embodiment, the global feature extraction network 312 uses a pre-trained ResNet50 network including a portion from the input layer to the global average pooling layer to perform forward calculation on the pedestrian image, so as to obtain a global feature vector f _ glb.
And a pedestrian attribute prediction branch 322, configured to calculate the global feature vector to obtain a first feature vector, and to calculate the probability distribution of the pedestrian attributes according to the first feature vector. In the present embodiment, the pedestrian attribute prediction branch 322 includes a fully connected layer FC1 and a fully connected layer FC2. The global feature vector f_glb is input into the pedestrian attribute prediction branch 322; a first feature vector f_attr for pedestrian attribute classification is obtained after the computation of the fully connected layer FC1, and this feature vector is then input into the fully connected layer FC2 for pedestrian attribute classification, whose computation yields the probability distribution of the pedestrian attributes; the number of output neurons of the fully connected layer FC2 is equal to the number of pedestrian attributes. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC2 is transformed using a Sigmoid layer, thereby obtaining the probability that the pedestrian image has each attribute. For each attribute prediction value y_i, the output after the Sigmoid layer is:
Sigmoid(y_i) = 1 / (1 + e^(-y_i))
And a pedestrian identity prediction branch 332, configured to fuse the first feature vector and the global feature vector to obtain a second feature vector, and to calculate the probability distribution of the pedestrian identity according to the second feature vector. In this embodiment, the pedestrian identity prediction branch 332 includes a fusion layer, a fully connected layer FC3 and a fully connected layer FC4. The fusion layer has no parameters; it only calculates the Kronecker product of the global feature vector f_glb and the first feature vector f_attr to obtain the second feature vector f_id. The Kronecker product is calculated as follows:
u ⊗ v = (u_1 v_1, u_1 v_2, ..., u_1 v_n, u_2 v_1, ..., u_m v_n), where u = (u_1, ..., u_m) and v = (v_1, ..., v_n) are taken as f_glb and f_attr respectively
The second feature vector f_id is input into the fully connected layer FC3, and the probability distribution of the pedestrian identities is obtained through the computation of the fully connected layers FC3 and FC4, where the number of output neurons of the fully connected layer FC4 is equal to the number of pedestrian identities in the training set. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC4 is transformed using a Softmax layer, thereby obtaining the probability that the pedestrian image corresponds to each pedestrian identity. Suppose the output of the fully connected layer FC4 is y_1, y_2, y_3, ..., y_n.
Then for each pedestrian identity prediction value y_i, the output after the Softmax layer is:
Softmax(y_i) = e^(y_i) / (e^(y_1) + e^(y_2) + ... + e^(y_n))
a loss calculating module 303, configured to calculate a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the probability distribution of the actual pedestrian attribute according to a first loss function, and calculate a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity according to a second loss function.
In some embodiments, a cross entropy loss function is used as the loss function. Cross entropy can be used to measure how similar two probability distributions are, and a cross entropy loss function is often used to compute the difference between the predicted and actual distributions of the network during training of the neural network model. The cross entropy loss function is defined as follows:
H(p, q) = -Σ_x p(x) log q(x)
where p(x) denotes the actual distribution and q(x) denotes the distribution predicted by the network.
The first loss function may be determined by: the identification of each pedestrian attribute is treated as a binary classification problem, namely whether the input pedestrian image has that attribute; for each pedestrian attribute, a binary cross-entropy loss is calculated between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of that pedestrian attribute, and the binary cross-entropy losses corresponding to all the pedestrian attributes are added together to form the first loss function.
The second loss function may be determined by: pedestrian identity recognition is treated as a multi-class classification problem, and the multi-class cross-entropy loss between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity is used as the second loss function.
And a parameter optimization module 304, configured to perform iterative optimization on parameters of the neural network model according to the first difference and the second difference.
In some embodiments, the parameter optimization module 304 iteratively optimizes the parameters of the neural network model using a stochastic gradient descent method according to the sum of the first difference and the second difference, so that this sum decreases until a preset stop condition is reached, for example a target precision or a maximum number of iterations.
The system for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the invention exploits the correlation between pedestrian attributes and pedestrian identities: the global features of the input pedestrian image are extracted first, the global features are fused with the features used for attribute classification to obtain features with stronger representation capability, and the fused features are used for pedestrian identity recognition. Training the two tasks of pedestrian attribute recognition and pedestrian identity recognition simultaneously yields a neural network model that can output the pedestrian attribute recognition result and the pedestrian identity recognition result at the same time, effectively improving recognition accuracy.
The present invention also provides a pedestrian attribute identification and pedestrian identity identification system, as shown in fig. 4, the system includes: the system comprises a receiving module 401, a global feature extraction network 402, a pedestrian attribute prediction branch 403 and a pedestrian identity prediction branch 404.
The receiving module 401 is configured to receive an image of a pedestrian to be identified.
And the global feature extraction network 402 is used for calculating the pedestrian image received by the receiving module 401 to obtain a global feature vector. In this embodiment, the global feature extraction network 402 calculates the pedestrian image using a pre-trained ResNet50 network comprising the portion from the input layer to the global average pooling layer, obtaining the global feature vector f_glb.
And a pedestrian attribute prediction branch 403, configured to calculate the global feature vector to obtain a first feature vector, and to calculate the probability distribution of the pedestrian attributes according to the first feature vector. In the present embodiment, the pedestrian attribute prediction branch 403 includes a fully connected layer FC1 and a fully connected layer FC2. The global feature vector f_glb is input into the pedestrian attribute prediction branch 403; a first feature vector f_attr for pedestrian attribute classification is obtained after the computation of the fully connected layer FC1, and this feature vector is then input into the fully connected layer FC2 for pedestrian attribute classification, whose computation yields the probability distribution of the pedestrian attributes; the number of output neurons of the fully connected layer FC2 is equal to the number of pedestrian attributes. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC2 is transformed using a Sigmoid layer, thereby obtaining the probability that the pedestrian image has each attribute. For each attribute prediction value y_i, the output after the Sigmoid layer is:
Sigmoid(y_i) = 1 / (1 + e^(-y_i))
And a pedestrian identity prediction branch 404, configured to fuse the first feature vector and the global feature vector to obtain a second feature vector, and to calculate the probability distribution of the pedestrian identity according to the second feature vector. In this embodiment, the pedestrian identity prediction branch includes a fusion layer 414, a fully connected layer FC3 and a fully connected layer FC4. The fusion layer is used for calculating the Kronecker product of the global feature vector f_glb and the first feature vector f_attr to obtain the second feature vector f_id. The Kronecker product is calculated as follows:
u ⊗ v = (u_1 v_1, u_1 v_2, ..., u_1 v_n, u_2 v_1, ..., u_m v_n), where u = (u_1, ..., u_m) and v = (v_1, ..., v_n) are taken as f_glb and f_attr respectively
The second feature vector f_id is input into the fully connected layer FC3, and the probability distribution of the pedestrian identities is obtained through the computation of the fully connected layers FC3 and FC4, where the number of output neurons of the fully connected layer FC4 is equal to the number of pedestrian identities in the training set. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC4 is transformed using a Softmax layer, thereby obtaining the probability that the pedestrian image corresponds to each pedestrian identity. Suppose the output of the fully connected layer FC4 is y_1, y_2, y_3, ..., y_n.
Then for each pedestrian identity prediction value y_i, the output after the Softmax layer is:
Softmax(y_i) = e^(y_i) / (e^(y_1) + e^(y_2) + ... + e^(y_n))
and an output module 405, configured to output the probability distribution of the attribute of the pedestrian and the probability distribution of the identity of the pedestrian.
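At inference time, the trained components from the sketches above could be combined roughly as follows to obtain both outputs for a single image; preprocess, backbone, attr_branch and id_branch are the assumed objects defined earlier, and the modules are assumed to be in eval mode.

import torch
from PIL import Image

@torch.no_grad()
def recognize(image_path):
    # Returns the attribute probability distribution and the identity probability
    # distribution for one pedestrian image (illustrative sketch).
    img = preprocess(Image.open(image_path).convert('RGB')).unsqueeze(0)  # (1, 3, H, W)
    f_glb = backbone(img)
    f_attr, attr_probs = attr_branch(f_glb)
    id_probs = id_branch(f_glb, f_attr)
    return attr_probs.squeeze(0), id_probs.squeeze(0)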
The pedestrian attribute recognition and pedestrian identity recognition system provided by the invention exploits the correlation between pedestrian attributes and pedestrian identities to recognize both at the same time: the global features of the input pedestrian image are extracted first, the global features are fused with the features used for attribute classification to obtain features with stronger representation capability, and the fused features are used for pedestrian identity recognition. The system can therefore output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
The present invention also provides a computer-readable storage medium including a program executable by a processor to implement the aforementioned method of jointly training pedestrian attribute recognition and pedestrian identity recognition.
Reference is made herein to various exemplary embodiments. However, those skilled in the art will recognize that changes and modifications may be made to the exemplary embodiments without departing from the scope hereof. For example, the various operational steps, as well as the components used to perform the operational steps, may be implemented in differing ways depending upon the particular application or consideration of any number of cost functions associated with operation of the system (e.g., one or more steps may be deleted, modified or incorporated into other steps).
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. Additionally, as will be appreciated by one skilled in the art, the principles herein may be reflected in a computer program product on a computer readable storage medium, which is pre-loaded with computer readable program code. Any tangible, non-transitory computer-readable storage medium may be used, including magnetic storage devices (hard disks, floppy disks, etc.), optical storage devices (CD-ROM, DVD, Blu-Ray discs, etc.), flash memory, and/or the like. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including means for implementing the function specified. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified.
While the principles herein have been illustrated in various embodiments, many modifications of structure, arrangement, proportions, elements, materials, and components particularly adapted to specific environments and operative requirements may be employed without departing from the principles and scope of the present disclosure. The above modifications and other changes or modifications are intended to be included within the scope of this document.
The foregoing detailed description has been given with reference to various embodiments. However, one skilled in the art will recognize that various modifications and changes may be made without departing from the scope of the present disclosure. Accordingly, the disclosure is to be considered in an illustrative and not a restrictive sense, and all such modifications are intended to be included within the scope thereof. Also, advantages, other advantages, and solutions to problems have been described above with regard to various embodiments. However, the benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims. As used herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, system, article, or apparatus. Furthermore, the term "coupled," and any other variation thereof, as used herein, refers to a physical connection, an electrical connection, a magnetic connection, an optical connection, a communicative connection, a functional connection, and/or any other connection.
Those skilled in the art will recognize that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. Accordingly, the scope of the invention should be determined only by the claims.

Claims (10)

1. A method for training attribute recognition and identity recognition of pedestrians in a combined manner is characterized by comprising the following steps:
inputting the pedestrian image for training into a neural network model;
calculating the pedestrian image through the neural network model to obtain the probability distribution of the pedestrian attribute and the probability distribution of the pedestrian identity, wherein the neural network model comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch comprises a fusion layer and is used for fusing the first feature vector and the global feature vector to obtain a second feature vector, and calculating the probability distribution of the pedestrian identity according to the second feature vector;
calculating a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute according to a first loss function;
calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function;
and performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference.
2. The method of claim 1, wherein the fusion layer fuses the first feature vector with the global feature vector to obtain a second feature vector, comprising: the fusion layer calculates the Kronecker product of the first feature vector and the global feature vector to obtain the second feature vector.
3. The method of claim 2, wherein the global feature extraction network is a pre-trained ResNet50 network that includes a portion of an input layer to a global average pooling layer.
4. The method of claim 1, prior to inputting the pedestrian image for training into the neural network model, further comprising: and normalizing the original image by using a preset mean value and a preset standard deviation so as to enable the size of the pedestrian image to meet the input requirement of the neural network model.
5. The method of claim 1, wherein the first loss function is determined by: for each pedestrian attribute, calculating two-class cross entropy loss functions of the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute, and adding the two-class cross entropy loss functions corresponding to all the pedestrian attributes to form a first loss function;
the second loss function is determined by: and taking a multi-classification cross entropy loss function of the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity as a second loss function.
6. The method of claim 1, wherein iteratively optimizing the parameters of the neural network model based on the first and second differences comprises: according to the sum of the first difference and the second difference, iteratively optimizing the parameters of the neural network model using a stochastic gradient descent method, so that the sum of the first difference and the second difference is reduced until a preset stop condition is reached.
7. A system for combined training of pedestrian attribute recognition and pedestrian identity recognition, comprising:
the input module is used for acquiring a pedestrian image for training;
the neural network model is used for calculating the pedestrian images acquired by the input module and used for training to obtain the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities, and comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
a loss calculation module, configured to calculate a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the probability distribution of the actual pedestrian attribute according to a first loss function, and calculate a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity according to a second loss function;
and the parameter optimization module is used for performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference.
8. A pedestrian attribute identification and pedestrian identity recognition system, comprising:
the receiving module is used for receiving a pedestrian image to be identified;
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
and the output module is used for outputting the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities.
9. The system of claim 8, wherein the pedestrian identity prediction branch comprises a fusion layer configured to compute the Kronecker product of the first feature vector and the global feature vector, resulting in the second feature vector.
10. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1 to 6.
CN202010620356.0A 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner Pending CN111881762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010620356.0A CN111881762A (en) 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010620356.0A CN111881762A (en) 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner

Publications (1)

Publication Number Publication Date
CN111881762A true CN111881762A (en) 2020-11-03

Family

ID=73157895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010620356.0A Pending CN111881762A (en) 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner

Country Status (1)

Country Link
CN (1) CN111881762A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784166A (en) * 2018-12-13 2019-05-21 北京飞搜科技有限公司 The method and device that pedestrian identifies again
CN110046553A (en) * 2019-03-21 2019-07-23 华中科技大学 A kind of pedestrian weight identification model, method and system merging attributive character

Similar Documents

Publication Publication Date Title
JP6873237B2 (en) Image-based vehicle damage assessment methods, equipment, and systems, as well as electronic devices
CN108304882B (en) Image classification method and device, server, user terminal and storage medium
CN109145766B (en) Model training method and device, recognition method, electronic device and storage medium
WO2020047420A1 (en) Method and system for facilitating recognition of vehicle parts based on a neural network
CN109190470B (en) Pedestrian re-identification method and device
US20230134967A1 (en) Method for recognizing activities using separate spatial and temporal attention weights
CN106372666B (en) A kind of target identification method and device
CN108960211A (en) A kind of multiple target human body attitude detection method and system
CN111275060B (en) Identification model updating processing method and device, electronic equipment and storage medium
CN110096938B (en) Method and device for processing action behaviors in video
CN109034086B (en) Vehicle weight identification method, device and system
CN110598603A (en) Face recognition model acquisition method, device, equipment and medium
JP7327077B2 (en) Road obstacle detection device, road obstacle detection method, and road obstacle detection program
KR102225613B1 (en) Person re-identification apparatus and method
US11410327B2 (en) Location determination apparatus, location determination method and computer program
CN110598019B (en) Repeated image identification method and device
KR20220076398A (en) Object recognition processing apparatus and method for ar device
CN109376736A (en) A kind of small video target detection method based on depth convolutional neural networks
CN110992404B (en) Target tracking method, device and system and storage medium
Mahpod et al. Facial landmarks localization using cascaded neural networks
CN113569070A (en) Image detection method and device, electronic equipment and storage medium
CN111241873A (en) Image reproduction detection method, training method of model thereof, payment method and payment device
KR20190018274A (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
CN114880513A (en) Target retrieval method and related device
CN113505716B (en) Training method of vein recognition model, and recognition method and device of vein image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination