CN111881762A - Method for training attribute recognition and identity recognition of pedestrian in combined manner

Method for training attribute recognition and identity recognition of pedestrian in combined manner

Info

Publication number
CN111881762A
Authority
CN
China
Prior art keywords
pedestrian
feature vector
probability distribution
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010620356.0A
Other languages
Chinese (zh)
Inventor
蒲恒
邵新庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd, Nanjing ZNV Software Co Ltd filed Critical Shenzhen ZNV Technology Co Ltd
Priority to CN202010620356.0A priority Critical patent/CN111881762A/en
Publication of CN111881762A publication Critical patent/CN111881762A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A method for training attribute recognition and identity recognition of pedestrians in a combined manner comprises the following steps: inputting the pedestrian image for training into a neural network model; calculating the pedestrian image through the neural network model to obtain the probability distribution of the pedestrian attribute and the probability distribution of the pedestrian identity; calculating a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute according to a first loss function; calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function; and performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference. The model trained by the method can simultaneously output the pedestrian attribute recognition result and the pedestrian identity recognition result, and the recognition accuracy is effectively improved.

Description

Method for training attribute recognition and identity recognition of pedestrian in combined manner
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for jointly training pedestrian attribute recognition and pedestrian identity recognition.
Background
Pedestrian attribute recognition aims to mine the attributes of a given pedestrian image, such as hairstyle, gender and clothing. Given a pedestrian image I and a predefined attribute set A, the goal of pedestrian attribute recognition is to predict from this image a subset B of the attribute set A that characterizes the pedestrian image. A common pedestrian attribute recognition approach is to input the pedestrian image into a neural network for feature extraction, obtain a high-dimensional vector that represents the features of the input image, and then perform classification based on this feature vector. Pedestrian attributes are high-level semantic features that are relatively robust to changes in viewing angle and observation conditions.
Pedestrian identity recognition refers to the technology of judging, by means of computer vision, whether a specific pedestrian is present in an image or a video sequence. A common approach is to extract features of the target pedestrian with a trained model, and then judge whether the specific pedestrian is present according to the similarity between features. In the training stage, after features are extracted by the model, pedestrians are classified according to the feature vectors. As a technique that enables cross-camera tracking of people, pedestrian identity recognition has attracted extensive research attention.
Most existing methods train these two tasks independently, which limits the achievable improvement in model performance.
Disclosure of Invention
The application provides a method, a system and a storage medium for jointly training pedestrian attribute recognition and pedestrian identity recognition, which are used for improving the performance of a pedestrian attribute recognition model and a pedestrian identity recognition model.
According to a first aspect, the present invention provides a method for training pedestrian attribute recognition and pedestrian identity recognition in a combined manner, comprising:
inputting the pedestrian image for training into a neural network model;
calculating the pedestrian image through the neural network model to obtain the probability distribution of the pedestrian attribute and the probability distribution of the pedestrian identity, wherein the neural network model comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch comprises a fusion layer and is used for fusing the first feature vector and the global feature vector to obtain a second feature vector, and calculating the probability distribution of the pedestrian identity according to the second feature vector;
calculating a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute according to a first loss function;
calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function;
and performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference.
In one embodiment, the fusing of the first feature vector and the global feature vector by the fusion layer to obtain a second feature vector includes: the fusion layer calculates the Kronecker product of the first feature vector and the global feature vector to obtain the second feature vector.
In one embodiment, the global feature extraction network is a pre-trained ResNet50 network that includes a portion from the input layer to the global average pooling layer.
In one embodiment, before inputting the pedestrian image into the neural network model, the method further includes: normalizing the original image using a preset mean value and a preset standard deviation, and scaling it so that the size of the pedestrian image meets the input requirement of the neural network model.
In one embodiment, the first loss function is determined by: for each pedestrian attribute, calculating two-class cross entropy loss functions of the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute, and adding the two-class cross entropy loss functions corresponding to all the pedestrian attributes to form a first loss function;
the second loss function is determined by: and taking a multi-classification cross entropy loss function of the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity as a second loss function.
In one embodiment, the iteratively optimizing the parameters of the neural network model according to the first difference and the second difference comprises: according to the sum of the first difference and the second difference, iteratively optimizing the parameters of the neural network model using a stochastic gradient descent method, so that the sum of the first difference and the second difference is reduced until a preset stop condition is reached.
According to a second aspect, the invention provides a system for training pedestrian attribute recognition and pedestrian identity recognition in a combined manner, comprising:
the input module is used for acquiring a pedestrian image for training;
the neural network model is used for calculating the pedestrian images acquired by the input module and used for training to obtain the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities, and comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
a loss calculation module for calculating a difference between the probability distribution of the pedestrian attributes calculated by the neural network model and an actual probability distribution of the pedestrian attributes, and a difference between the probability distribution of the pedestrian identities calculated by the neural network model and an actual probability distribution of the pedestrian identities, according to a predefined loss function;
and the parameter optimization module is used for performing iterative optimization on the parameters of the neural network model according to the difference between the probability distribution of the pedestrian attributes obtained by calculation of the neural network model and the actual probability distribution of the pedestrian attributes and the difference between the probability distribution of the pedestrian identities obtained by calculation of the neural network model and the actual probability distribution of the pedestrian identities.
According to a third aspect, the present invention provides a pedestrian attribute identification and pedestrian identity identification system comprising:
the receiving module is used for receiving a pedestrian image to be identified;
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
and the output module is used for outputting the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities.
In one embodiment, the pedestrian identity prediction branch includes a fusion layer, and the fusion layer is configured to calculate the Kronecker product of the first feature vector and the global feature vector to obtain the second feature vector.
According to a fourth aspect, the invention provides a computer readable storage medium comprising a program executable by a processor to implement a method of jointly training pedestrian attribute recognition and pedestrian identity recognition as described above.
According to the method, the system and the computer-readable storage medium for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the application, the correlation between pedestrian attributes and pedestrian identities is exploited: the global features of the pedestrian image are fused with the features used for attribute classification to obtain features with stronger representation capability, and the fused features are used for pedestrian identity recognition. Jointly training pedestrian attribute recognition and pedestrian identity recognition yields a model that can output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
Drawings
FIG. 1 is a flow chart of a method for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the present invention;
FIG. 2 is a diagram of a neural network model structure according to an embodiment of the method for jointly training attribute recognition and identity recognition of pedestrians provided by the present invention;
FIG. 3 is a schematic structural diagram of a system for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the present invention;
fig. 4 is a schematic structural diagram of a pedestrian attribute identification and pedestrian identity identification system provided by the invention.
Detailed Description
The present invention will be described in further detail below with reference to the detailed description and the accompanying drawings, wherein like elements in different embodiments are given like reference numerals. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail in order to avoid obscuring the core of the present application with excessive description; a detailed description of these operations is not necessary for those skilled in the art, who can fully understand them from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the various steps or actions in the method descriptions may be reordered or transposed, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required sequence unless otherwise indicated where such sequence must be followed.
The numbering of components herein, such as "first" and "second", is used only to distinguish the objects described and does not carry any sequential or technical meaning. The terms "connected" and "coupled", when used in this application, include both direct and indirect connections (couplings), unless otherwise indicated.
The attribute characteristics of the same pedestrian generally remain the same across different cameras, scenes and postures, so pedestrian attributes and pedestrian identity are correlated. During training, the model extracts global features before attribute classification, fuses the global features with the features used for attribute classification, and uses the fused features for pedestrian identity recognition. Because the fused features have stronger representation capability, better pedestrian attribute recognition and pedestrian identity recognition performance can be achieved. The model trained by the method can output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
Fig. 1 is a flowchart of a method for jointly training pedestrian attribute recognition and pedestrian identity recognition according to the present invention. As shown in fig. 1, the method for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the invention comprises the following steps:
step 102: inputting the pedestrian image for training into the neural network model.
Step 103: the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities are obtained by calculating the pedestrian images input in the step 102 through a neural network model.
Fig. 2 is a diagram of a neural network model structure of an embodiment of a method for jointly training attribute recognition and identity recognition of a pedestrian according to the present invention. As shown in fig. 2, a neural network model for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by an embodiment of the present invention may include:
and the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector. In this embodiment, the global feature extraction network uses a pre-trained ResNet50 network including a portion from the input layer to the global average pooling layer, and the ResNet50 network structure can avoid the problem of gradient disappearance during model training. After the pedestrian image is input into the global feature extraction network, the global feature vector f _ glb is obtained through forward calculation of the global feature extraction network.
And the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector. In the present embodiment, the pedestrian attribute prediction branch includes a fully connected layer FC1 and a fully connected layer FC2. The global feature vector f_glb is input into the pedestrian attribute prediction branch; after passing through the fully connected layer FC1, a first feature vector f_attr for pedestrian attribute classification is obtained, and this feature vector is then input into the fully connected layer FC2 for pedestrian attribute classification to obtain the probability distribution of the pedestrian attributes, where the number of output neurons of the fully connected layer FC2 is equal to the number of pedestrian attributes in the training set. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC2 is transformed using a Sigmoid layer, thereby obtaining the probability that the pedestrian image has each attribute. For each attribute prediction value y_i, the output after the Sigmoid function is:
Sigmoid(y_i) = 1 / (1 + e^(-y_i))
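A minimal sketch of such an attribute branch might look as follows; the FC1 output size and the number of attributes are illustrative assumptions, not values taken from the patent.

import torch
import torch.nn as nn

class AttributeBranch(nn.Module):
    # FC1 produces the first feature vector f_attr; FC2 followed by Sigmoid gives one
    # probability per pedestrian attribute.
    def __init__(self, glb_dim=2048, attr_feat_dim=128, num_attributes=30):
        super().__init__()
        self.fc1 = nn.Linear(glb_dim, attr_feat_dim)
        self.fc2 = nn.Linear(attr_feat_dim, num_attributes)  # one neuron per attribute

    def forward(self, f_glb):
        f_attr = self.fc1(f_glb)                       # first feature vector f_attr
        attr_probs = torch.sigmoid(self.fc2(f_attr))   # probability of having each attribute
        return f_attr, attr_probs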
And the pedestrian identity prediction branch comprises a fusion layer and is used for fusing the first feature vector and the global feature vector to obtain a second feature vector, and for calculating the probability distribution of the pedestrian identity according to the second feature vector. In this embodiment, the pedestrian identity prediction branch includes a fusion layer, a fully connected layer FC3 and a fully connected layer FC4. The fusion layer has no parameters; it only fuses the global feature vector f_glb and the first feature vector f_attr to obtain the second feature vector f_id. The second feature vector f_id is input into the fully connected layer FC3, and the probability distribution of the pedestrian identities is obtained through the computation of the fully connected layers FC3 and FC4, where the number of output neurons of the fully connected layer FC4 is equal to the number of pedestrian identities in the training set. The fusion layer may fuse the global feature vector f_glb and the first feature vector f_attr using the Kronecker product, which is calculated by the following formula:
u ⊗ v = (u_1 v_1, u_1 v_2, ..., u_1 v_n, u_2 v_1, ..., u_m v_n), where u = (u_1, ..., u_m) and v = (v_1, ..., v_n)
Substituting the global feature vector f_glb and the first feature vector f_attr for u and v in this formula yields the second feature vector f_id. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC4 is transformed using a Softmax layer, thereby obtaining the probability that the pedestrian image corresponds to each pedestrian identity. Suppose the output of the fully connected layer FC4 is y_1, y_2, y_3, ..., y_n.
Then for each pedestrian identity prediction value y_i, the output after the Softmax function is:
Softmax(y_i) = e^(y_i) / (e^(y_1) + e^(y_2) + ... + e^(y_n))
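The fusion and identity classification described above can be sketched as below. For batched vectors, the per-sample Kronecker product of f_glb and f_attr is the flattened outer product, computed here with einsum; the FC3 size and the identity count are illustrative assumptions rather than values from the patent.

import torch
import torch.nn as nn

class IdentityBranch(nn.Module):
    # Parameter-free fusion (Kronecker product) followed by FC3, FC4 and Softmax.
    def __init__(self, glb_dim=2048, attr_feat_dim=128, hidden_dim=1024, num_identities=700):
        super().__init__()
        # the fused vector has dimension glb_dim * attr_feat_dim, which can be large
        self.fc3 = nn.Linear(glb_dim * attr_feat_dim, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, num_identities)  # one neuron per training identity

    def forward(self, f_glb, f_attr):
        # per-sample Kronecker product u ⊗ v: the outer product flattened into one vector
        f_id = torch.einsum('bi,bj->bij', f_glb, f_attr).flatten(1)
        id_probs = torch.softmax(self.fc4(self.fc3(f_id)), dim=1)
        return id_probs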
it should be understood that the structure of the neural network model can be designed according to the needs of a specific task, and is not limited to the structure provided in the present embodiment. For example, the global feature extraction network may use other image classification network models such as vgnet, and the pedestrian attribute prediction branch and the pedestrian identity prediction branch may increase or decrease the number of fully connected layers, or add other types of hidden layers, as required by the specific task.
Step 104: A first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute is calculated according to the first loss function.
In some embodiments, a cross entropy loss function is used as the loss function. Cross entropy can be used to measure how similar two probability distributions are, and a cross entropy loss function is often used to compute the difference between the predicted and actual distributions of the network during training of the neural network model. The cross entropy loss function is defined as follows:
H(p, q) = -Σ_x p(x) log q(x)
where p(x) denotes the actual distribution and q(x) denotes the distribution predicted by the network.
The first loss function may be determined by: the identification of each pedestrian attribute is treated as a binary classification problem, namely whether the input pedestrian image has that attribute; for each pedestrian attribute, a binary cross-entropy loss is calculated between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of that pedestrian attribute, and the binary cross-entropy losses corresponding to all the pedestrian attributes are added together to form the first loss function.
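Under this interpretation, a sketch of the first loss (one binary cross-entropy term per attribute, summed over attributes and averaged over the batch) could be:

import torch

def attribute_loss(attr_probs, attr_targets):
    # attr_probs:   (B, num_attributes) Sigmoid outputs of FC2
    # attr_targets: (B, num_attributes) ground-truth attribute labels in {0, 1}
    eps = 1e-7  # numerical stability for the logarithm
    bce = -(attr_targets * torch.log(attr_probs + eps)
            + (1 - attr_targets) * torch.log(1 - attr_probs + eps))
    return bce.sum(dim=1).mean()  # sum over attributes, average over the batch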
Step 105: and calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function.
The second loss function may be determined by: pedestrian identity recognition is treated as a multi-class classification problem, and the multi-class cross-entropy loss between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity is used as the second loss function.
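A corresponding sketch of the second loss, computed directly on the Softmax probabilities, might be:

import torch

def identity_loss(id_probs, id_labels):
    # id_probs:  (B, num_identities) Softmax outputs of FC4
    # id_labels: (B,) integer identity labels
    eps = 1e-7
    picked = id_probs[torch.arange(id_probs.size(0)), id_labels]  # q(x) of the true identity
    return -torch.log(picked + eps).mean()  # -sum p(x) log q(x) with one-hot p(x)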
Step 106: and performing iterative optimization on the parameters of the neural network model according to the first difference calculated in the step 104 and the second difference calculated in the step 105.
In some embodiments, the parameters of the neural network model are iteratively optimized using a stochastic gradient descent method according to the sum of the first difference and the second difference, so that this sum decreases until a preset stop condition is reached, for example a target precision or a maximum number of iterations.
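Putting the pieces together, a training step under the assumptions of the earlier sketches could look like the following; the learning rate, momentum and epoch count are illustrative, and train_loader is an assumed DataLoader yielding (images, attr_targets, id_labels).

import torch

backbone, attr_branch, id_branch = GlobalFeatureExtractor(), AttributeBranch(), IdentityBranch()
params = (list(backbone.parameters())
          + list(attr_branch.parameters())
          + list(id_branch.parameters()))
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # illustrative hyper-parameters
max_epochs = 60                                             # illustrative stop condition

for epoch in range(max_epochs):
    for images, attr_targets, id_labels in train_loader:
        f_glb = backbone(images)
        f_attr, attr_probs = attr_branch(f_glb)
        id_probs = id_branch(f_glb, f_attr)
        # sum of the first difference and the second difference
        loss = attribute_loss(attr_probs, attr_targets.float()) + identity_loss(id_probs, id_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()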
In some embodiments, before step 102, further comprising:
step 101: and preprocessing the pedestrian image. In some embodiments, for a pedestrian image with a given pixel value distributed in the [0, 255] interval, the original image is normalized by using a preset mean value and standard deviation, and the size of the input image is scaled to the input size of the neural network model, so that the size of the pedestrian image meets the input requirement of the neural network model, and the training process can be more stable.
The method for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the invention uses the correlation between pedestrian attributes and pedestrian identity to train the two tasks simultaneously. During training, the model extracts global features before attribute classification, fuses the global features with the features used for attribute classification, and uses the fused features for pedestrian identity recognition. Because the fused features have stronger representation capability, better pedestrian attribute recognition and pedestrian identity recognition performance can be achieved. The model trained by the method can output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
The invention also provides a system for training pedestrian attribute recognition and pedestrian identity recognition in a combined manner, as shown in fig. 3, the system comprises: an input module 301, a neural network model 302, a loss calculation module 303 and a parameter optimization module 304.
And the input module 301 is configured to acquire a pedestrian image for training, and input the pedestrian image into the neural network model 302.
The neural network model 302 is configured to calculate a pedestrian image for training acquired by the input module 301 to obtain a probability distribution of a pedestrian attribute and a probability distribution of a pedestrian identity, where the neural network model 302 includes:
and the global feature extraction network 312 is used for calculating the pedestrian image to obtain a global feature vector. In this embodiment, the global feature extraction network 312 uses a pre-trained ResNet50 network including a portion from the input layer to the global average pooling layer to perform forward calculation on the pedestrian image, so as to obtain a global feature vector f _ glb.
And a pedestrian attribute prediction branch 322, configured to calculate the global feature vector to obtain a first feature vector, and to calculate the probability distribution of the pedestrian attributes according to the first feature vector. In the present embodiment, the pedestrian attribute prediction branch 322 includes a fully connected layer FC1 and a fully connected layer FC2. The global feature vector f_glb is input into the pedestrian attribute prediction branch 322; a first feature vector f_attr for pedestrian attribute classification is obtained after the computation of the fully connected layer FC1, and this feature vector is then input into the fully connected layer FC2 for pedestrian attribute classification, whose computation yields the probability distribution of the pedestrian attributes; the number of output neurons of the fully connected layer FC2 is equal to the number of pedestrian attributes. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC2 is transformed using a Sigmoid layer, thereby obtaining the probability that the pedestrian image has each attribute. For each attribute prediction value y_i, the output after the Sigmoid layer is:
Sigmoid(y_i) = 1 / (1 + e^(-y_i))
And a pedestrian identity prediction branch 332, configured to fuse the first feature vector and the global feature vector to obtain a second feature vector, and to calculate the probability distribution of the pedestrian identity according to the second feature vector. In this embodiment, the pedestrian identity prediction branch 332 includes a fusion layer, a fully connected layer FC3 and a fully connected layer FC4. The fusion layer has no parameters; it only calculates the Kronecker product of the global feature vector f_glb and the first feature vector f_attr to obtain the second feature vector f_id. The Kronecker product is calculated as follows:
u ⊗ v = (u_1 v_1, u_1 v_2, ..., u_1 v_n, u_2 v_1, ..., u_m v_n), where u = (u_1, ..., u_m) and v = (v_1, ..., v_n) are taken as f_glb and f_attr respectively
The second feature vector f_id is input into the fully connected layer FC3, and the probability distribution of the pedestrian identities is obtained through the computation of the fully connected layers FC3 and FC4, where the number of output neurons of the fully connected layer FC4 is equal to the number of pedestrian identities in the training set. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC4 is transformed using a Softmax layer, thereby obtaining the probability that the pedestrian image corresponds to each pedestrian identity. Suppose the output of the fully connected layer FC4 is y_1, y_2, y_3, ..., y_n.
Then for each pedestrian identity prediction value y_i, the output after the Softmax layer is:
Softmax(y_i) = e^(y_i) / (e^(y_1) + e^(y_2) + ... + e^(y_n))
a loss calculating module 303, configured to calculate a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the probability distribution of the actual pedestrian attribute according to a first loss function, and calculate a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity according to a second loss function.
In some embodiments, a cross entropy loss function is used as the loss function. Cross entropy can be used to measure how similar two probability distributions are, and a cross entropy loss function is often used to compute the difference between the predicted and actual distributions of the network during training of the neural network model. The cross entropy loss function is defined as follows:
H(p, q) = -Σ_x p(x) log q(x)
where p(x) denotes the actual distribution and q(x) denotes the distribution predicted by the network.
The first loss function may be determined by: the identification of each pedestrian attribute is treated as a binary classification problem, namely whether the input pedestrian image has that attribute; for each pedestrian attribute, a binary cross-entropy loss is calculated between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of that pedestrian attribute, and the binary cross-entropy losses corresponding to all the pedestrian attributes are added together to form the first loss function.
The second loss function may be determined by: pedestrian identity recognition is treated as a multi-class classification problem, and the multi-class cross-entropy loss between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity is used as the second loss function.
And a parameter optimization module 304, configured to perform iterative optimization on parameters of the neural network model according to the first difference and the second difference.
In some embodiments, the parameter optimization module 304 iteratively optimizes the parameters of the neural network model using a stochastic gradient descent method according to the sum of the first difference and the second difference, so that this sum decreases until a preset stop condition is reached, for example a target precision or a maximum number of iterations.
The system for jointly training pedestrian attribute recognition and pedestrian identity recognition provided by the invention exploits the correlation between pedestrian attributes and pedestrian identities: the global features of the input pedestrian image are extracted first, the global features are fused with the features used for attribute classification to obtain features with stronger representation capability, and the fused features are used for pedestrian identity recognition. Training the two tasks of pedestrian attribute recognition and pedestrian identity recognition simultaneously yields a neural network model that can output the pedestrian attribute recognition result and the pedestrian identity recognition result at the same time, effectively improving recognition accuracy.
The present invention also provides a pedestrian attribute identification and pedestrian identity identification system, as shown in fig. 4, the system includes: the system comprises a receiving module 401, a global feature extraction network 402, a pedestrian attribute prediction branch 403 and a pedestrian identity prediction branch 404.
The receiving module 401 is configured to receive an image of a pedestrian to be identified.
And the global feature extraction network 402 is used for calculating the pedestrian image received by the receiving module 401 to obtain a global feature vector. In this embodiment, the global feature extraction network 402 calculates the pedestrian image using a pre-trained ResNet50 network comprising the portion from the input layer to the global average pooling layer, obtaining the global feature vector f_glb.
And a pedestrian attribute prediction branch 403, configured to calculate the global feature vector to obtain a first feature vector, and to calculate the probability distribution of the pedestrian attributes according to the first feature vector. In the present embodiment, the pedestrian attribute prediction branch 403 includes a fully connected layer FC1 and a fully connected layer FC2. The global feature vector f_glb is input into the pedestrian attribute prediction branch 403; a first feature vector f_attr for pedestrian attribute classification is obtained after the computation of the fully connected layer FC1, and this feature vector is then input into the fully connected layer FC2 for pedestrian attribute classification, whose computation yields the probability distribution of the pedestrian attributes; the number of output neurons of the fully connected layer FC2 is equal to the number of pedestrian attributes. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC2 is transformed using a Sigmoid layer, thereby obtaining the probability that the pedestrian image has each attribute. For each attribute prediction value y_i, the output after the Sigmoid layer is:
Sigmoid(y_i) = 1 / (1 + e^(-y_i))
And a pedestrian identity prediction branch 404, configured to fuse the first feature vector and the global feature vector to obtain a second feature vector, and to calculate the probability distribution of the pedestrian identity according to the second feature vector. In this embodiment, the pedestrian identity prediction branch includes a fusion layer 414, a fully connected layer FC3 and a fully connected layer FC4. The fusion layer is used for calculating the Kronecker product of the global feature vector f_glb and the first feature vector f_attr to obtain the second feature vector f_id. The Kronecker product is calculated as follows:
u ⊗ v = (u_1 v_1, u_1 v_2, ..., u_1 v_n, u_2 v_1, ..., u_m v_n), where u = (u_1, ..., u_m) and v = (v_1, ..., v_n) are taken as f_glb and f_attr respectively
The second feature vector f_id is input into the fully connected layer FC3, and the probability distribution of the pedestrian identities is obtained through the computation of the fully connected layers FC3 and FC4, where the number of output neurons of the fully connected layer FC4 is equal to the number of pedestrian identities in the training set. In order to ensure that the distribution predicted by the network is a probability distribution, the output of the fully connected layer FC4 is transformed using a Softmax layer, thereby obtaining the probability that the pedestrian image corresponds to each pedestrian identity. Suppose the output of the fully connected layer FC4 is y_1, y_2, y_3, ..., y_n.
Then for each pedestrian identity prediction value y_i, the output after the Softmax layer is:
Softmax(y_i) = e^(y_i) / (e^(y_1) + e^(y_2) + ... + e^(y_n))
and an output module 405, configured to output the probability distribution of the attribute of the pedestrian and the probability distribution of the identity of the pedestrian.
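At inference time, the trained components from the sketches above could be combined roughly as follows to obtain both outputs for a single image; preprocess, backbone, attr_branch and id_branch are the assumed objects defined earlier, and the modules are assumed to be in eval mode.

import torch
from PIL import Image

@torch.no_grad()
def recognize(image_path):
    # Returns the attribute probability distribution and the identity probability
    # distribution for one pedestrian image (illustrative sketch).
    img = preprocess(Image.open(image_path).convert('RGB')).unsqueeze(0)  # (1, 3, H, W)
    f_glb = backbone(img)
    f_attr, attr_probs = attr_branch(f_glb)
    id_probs = id_branch(f_glb, f_attr)
    return attr_probs.squeeze(0), id_probs.squeeze(0)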
The pedestrian attribute recognition and pedestrian identity recognition system provided by the invention exploits the correlation between pedestrian attributes and pedestrian identities to recognize both at the same time: the global features of the input pedestrian image are extracted first, the global features are fused with the features used for attribute classification to obtain features with stronger representation capability, and the fused features are used for pedestrian identity recognition. The system can therefore output the pedestrian attribute recognition result and the pedestrian identity recognition result simultaneously, effectively improving recognition accuracy.
The present invention also provides a computer-readable storage medium including a program executable by a processor to implement the aforementioned method of jointly training pedestrian attribute recognition and pedestrian identity recognition.
Reference is made herein to various exemplary embodiments. However, those skilled in the art will recognize that changes and modifications may be made to the exemplary embodiments without departing from the scope hereof. For example, the various operational steps, as well as the components used to perform the operational steps, may be implemented in differing ways depending upon the particular application or consideration of any number of cost functions associated with operation of the system (e.g., one or more steps may be deleted, modified or incorporated into other steps).
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. Additionally, as will be appreciated by one skilled in the art, the principles herein may be reflected in a computer program product on a computer readable storage medium, which is pre-loaded with computer readable program code. Any tangible, non-transitory computer-readable storage medium may be used, including magnetic storage devices (hard disks, floppy disks, etc.), optical storage devices (CD-ROM, DVD, Blu-Ray discs, etc.), flash memory, and/or the like. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including means for implementing the function specified. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified.
While the principles herein have been illustrated in various embodiments, many modifications of structure, arrangement, proportions, elements, materials, and components particularly adapted to specific environments and operative requirements may be employed without departing from the principles and scope of the present disclosure. The above modifications and other changes or modifications are intended to be included within the scope of this document.
The foregoing detailed description has been given with reference to various embodiments. However, one skilled in the art will recognize that various modifications and changes may be made without departing from the scope of the present disclosure. Accordingly, the disclosure is to be considered in an illustrative and not a restrictive sense, and all such modifications are intended to be included within the scope thereof. Also, advantages, other advantages, and solutions to problems have been described above with regard to various embodiments. However, the benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or element of any or all the claims. As used herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, system, article, or apparatus. Furthermore, the term "coupled," and any other variation thereof, as used herein, refers to a physical connection, an electrical connection, a magnetic connection, an optical connection, a communicative connection, a functional connection, and/or any other connection.
Those skilled in the art will recognize that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. Accordingly, the scope of the invention should be determined only by the claims.

Claims (10)

1. A method for training attribute recognition and identity recognition of pedestrians in a combined manner is characterized by comprising the following steps:
inputting the pedestrian image for training into a neural network model;
calculating the pedestrian image through the neural network model to obtain the probability distribution of the pedestrian attribute and the probability distribution of the pedestrian identity, wherein the neural network model comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch comprises a fusion layer and is used for fusing the first feature vector and the global feature vector to obtain a second feature vector, and calculating the probability distribution of the pedestrian identity according to the second feature vector;
calculating a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute according to a first loss function;
calculating a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the actual probability distribution of the pedestrian identity according to a second loss function;
and performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference.
2. The method of claim 1, wherein the fusion layer fuses the first feature vector with the global feature vector to obtain a second feature vector, comprising: the fusion layer calculates the Kronecker product of the first feature vector and the global feature vector to obtain the second feature vector.
3. The method of claim 2, wherein the global feature extraction network is a pre-trained ResNet50 network that includes a portion of an input layer to a global average pooling layer.
4. The method of claim 1, prior to inputting the pedestrian image for training into the neural network model, further comprising: and normalizing the original image by using a preset mean value and a preset standard deviation so as to enable the size of the pedestrian image to meet the input requirement of the neural network model.
5. The method of claim 1, wherein the first loss function is determined by: for each pedestrian attribute, calculating two-class cross entropy loss functions of the probability distribution of the pedestrian attribute calculated by the neural network model and the actual probability distribution of the pedestrian attribute, and adding the two-class cross entropy loss functions corresponding to all the pedestrian attributes to form a first loss function;
the second loss function is determined by: and taking a multi-classification cross entropy loss function of the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity as a second loss function.
6. The method of claim 1, wherein iteratively optimizing the parameters of the neural network model based on the first and second differences comprises: according to the sum of the first difference and the second difference, iteratively optimizing the parameters of the neural network model using a stochastic gradient descent method, so that the sum of the first difference and the second difference is reduced until a preset stop condition is reached.
7. A system for combined training of pedestrian attribute recognition and pedestrian identity recognition, comprising:
the input module is used for acquiring a pedestrian image for training;
the neural network model is used for calculating the pedestrian images acquired by the input module and used for training to obtain the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities, and comprises:
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
a loss calculation module, configured to calculate a first difference between the probability distribution of the pedestrian attribute calculated by the neural network model and the probability distribution of the actual pedestrian attribute according to a first loss function, and calculate a second difference between the probability distribution of the pedestrian identity calculated by the neural network model and the probability distribution of the actual pedestrian identity according to a second loss function;
and the parameter optimization module is used for performing iterative optimization on the parameters of the neural network model according to the first difference and the second difference.
8. A pedestrian attribute identification and pedestrian identity recognition system, comprising:
the receiving module is used for receiving a pedestrian image to be identified;
the global feature extraction network is used for calculating the pedestrian image to obtain a global feature vector;
the pedestrian attribute prediction branch is used for calculating the global feature vector to obtain a first feature vector and calculating the probability distribution of the pedestrian attribute according to the first feature vector;
the pedestrian identity prediction branch is used for fusing the first feature vector and the global feature vector to obtain a second feature vector and calculating the probability distribution of the pedestrian identity according to the second feature vector;
and the output module is used for outputting the probability distribution of the pedestrian attributes and the probability distribution of the pedestrian identities.
9. The system of claim 8, wherein the pedestrian identity prediction branch comprises a fusion layer configured to compute the Kronecker product of the first feature vector and the global feature vector, resulting in the second feature vector.
10. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1 to 6.
CN202010620356.0A 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner Pending CN111881762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010620356.0A CN111881762A (en) 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010620356.0A CN111881762A (en) 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner

Publications (1)

Publication Number Publication Date
CN111881762A true CN111881762A (en) 2020-11-03

Family

ID=73157895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010620356.0A Pending CN111881762A (en) 2020-06-30 2020-06-30 Method for training attribute recognition and identity recognition of pedestrian in combined manner

Country Status (1)

Country Link
CN (1) CN111881762A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784166A (en) * 2018-12-13 2019-05-21 北京飞搜科技有限公司 The method and device that pedestrian identifies again
CN110046553A (en) * 2019-03-21 2019-07-23 华中科技大学 A kind of pedestrian weight identification model, method and system merging attributive character

Similar Documents

Publication Publication Date Title
JP6873237B2 (en) Image-based vehicle damage assessment methods, equipment, and systems, as well as electronic devices
CN108304882B (en) Image classification method and device, server, user terminal and storage medium
CN109145766B (en) Model training method and device, recognition method, electronic device and storage medium
WO2020047420A1 (en) Method and system for facilitating recognition of vehicle parts based on a neural network
CN109190470B (en) Pedestrian re-identification method and device
US20230134967A1 (en) Method for recognizing activities using separate spatial and temporal attention weights
CN106372666B (en) A kind of target identification method and device
CN108960211A (en) A kind of multiple target human body attitude detection method and system
CN111275060B (en) Identification model updating processing method and device, electronic equipment and storage medium
CN110096938B (en) Method and device for processing action behaviors in video
CN109034086B (en) Vehicle weight identification method, device and system
CN110598603A (en) Face recognition model acquisition method, device, equipment and medium
JP7327077B2 (en) Road obstacle detection device, road obstacle detection method, and road obstacle detection program
KR102225613B1 (en) Person re-identification apparatus and method
US11410327B2 (en) Location determination apparatus, location determination method and computer program
CN110598019B (en) Repeated image identification method and device
KR20220076398A (en) Object recognition processing apparatus and method for ar device
CN109376736A (en) A kind of small video target detection method based on depth convolutional neural networks
CN110992404B (en) Target tracking method, device and system and storage medium
Mahpod et al. Facial landmarks localization using cascaded neural networks
CN113569070A (en) Image detection method and device, electronic equipment and storage medium
CN111241873A (en) Image reproduction detection method, training method of model thereof, payment method and payment device
KR20190018274A (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
CN114880513A (en) Target retrieval method and related device
CN113505716B (en) Training method of vein recognition model, and recognition method and device of vein image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination