CN113936246A - Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning - Google Patents

Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning

Info

Publication number
CN113936246A
Authority
CN
China
Prior art keywords
local
feature
features
learning
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111076953.2A
Other languages
Chinese (zh)
Inventor
田月媛
付苗苗
邓苗磊
张德贤
吴雨露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology
Priority to CN202111076953.2A
Publication of CN113936246A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning. A Joint Local Feature Extraction Network (JLFEN), formed by parallel spatial transformers and several simple convolutional neural networks, dynamically divides the same pedestrian image in the horizontal direction into two groups of local regions of different number and scale and extracts features from them, so that the local features are effectively aligned in space. A Feature Joint Discrimination (FJD) loss function, consisting of a Local Feature Discrimination (LFD) loss function and a Cascade Feature Discrimination (CFD) loss function, is used to optimize the model and to perform discriminant learning on the unsupervised local features, which reduces the influence of different pedestrians with similar appearance on local feature learning.

Description

Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning
Technical Field
The invention relates to an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, and belongs to the field of computer vision.
Background
In deep-learning-based pedestrian re-identification research, the quality of local features also influences how well a re-identification model learns features from unlabeled data. To learn more effective local features, an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is provided. The joint local feature extraction network dynamically divides the pedestrian features in the horizontal direction and extracts the features of the corresponding regions to obtain two groups of local features, which reduces the influence of pedestrian posture changes and camera angle on local feature alignment. A local feature discrimination loss function guides discriminant learning of the local features and improves the ability of the unsupervised pedestrian re-identification model to learn them. To reduce the influence on model learning of local features that look similar across different pedestrians, a cascade feature discrimination loss function computes the relative and absolute distances between features, pulling similar features together and pushing different features apart, which further strengthens the recognition performance of the unsupervised pedestrian re-identification model.
Disclosure of Invention
In order to solve some problems in the background art, the invention provides an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning.
1. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising a joint local feature extraction network;
To enable the network to extract local features of different regions according to changes in pedestrian posture and viewing angle in the image, and to reduce the problem that local features cannot be well aligned, two parallel spatial transformers are added after the ResNet50 network to divide the local regions, and simple convolutional neural networks are used for feature extraction. The intermediate feature map of the image is fed into several localization networks; applying the spatial transformation to the feature map rather than to the original image reduces the computational complexity of the network. Each localization network consists of one convolutional layer with a 3 × 3 kernel and two fully connected layers, uses ReLU as the activation function, and has its last fully connected layer initialized with a bias. To obtain local sampling grids for the two division modes, the localization networks predict two groups of spatial position parameters, $\theta = \{\theta_1, \theta_2, \ldots, \theta_M\}$ and $\eta = \{\eta_1, \eta_2, \ldots, \eta_N\}$.
To ensure that the predicted spatial positions yield effective fine-grained features and that the features are aligned in space, the position parameters of the two spatial transformers are predicted according to the parts of the human body along the vertical direction. The human body can be divided into three parts: the head, the upper body, and the lower body. In general the upper body is shorter than the lower body, so the head occupies the smallest share, followed by the upper body and then the lower body. Under a camera, however, changes in camera angle and pedestrian posture alter the proportions of the pedestrian in the captured image; if, for example, the upper body appears long and the lower body short, the local regions divided for the same pedestrian cannot be aligned.
The localization network first predicts three groups of spatial position parameters that divide the pedestrian horizontally in unequal proportions, giving three local regions whose heights are in a ratio close to 1:2:3. Local features are obtained from top to bottom: the head region occupies the smallest share, the middle region is close to the size of the upper body at a normal viewing angle, and the last region contains the feature information below the pedestrian's hips. Different local regions can thus be divided in image space according to how the pedestrian actually appears in the image, which addresses the failure of local features to align when the pedestrian's proportions, posture, and so on change.
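As a rough illustration of the nominal 1:2:3 division, the sketch below computes fixed row ranges for a feature map. It is only a reference point under stated assumptions: in the method the regions are predicted dynamically by the localization network, and the function name `split_rows` and the 24-row height are hypothetical.

```python
def split_rows(height, ratios=(1, 2, 3)):
    """Return (start, end) row ranges dividing `height` rows in the given ratio.

    Boundaries are rounded to whole rows, so the split is only approximately
    proportional when `height` is not a multiple of sum(ratios).
    """
    total = sum(ratios)
    bounds = []
    start = 0
    cumulative = 0
    for r in ratios:
        cumulative += r
        end = round(height * cumulative / total)
        bounds.append((start, end))
        start = end
    return bounds


# Hypothetical 24-row feature map split into head / upper body / lower body.
print(split_rows(24))                       # [(0, 4), (4, 12), (12, 24)]
# The finer six-region branch, shown here with equal shares as an assumption.
print(split_rows(24, (1, 1, 1, 1, 1, 1)))
```

Such fixed ranges could serve as a starting point that the predicted parameters then shift and rescale per image.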
In addition, the upper and lower body of a pedestrian contain more detailed information and can be further divided into parts such as the chest, abdomen, thighs, calves, and feet. Different parts may carry different feature information, and the features obtained from a finer division of local regions allow the model to better extract fine-grained features from the image. Therefore, considering the several parts that make up the human body, six groups of spatial position parameters are predicted to divide six local regions from the image, which makes it easier for the network to mine more effective fine-grained feature information from the pedestrian's local regions. Combining the two kinds of local feature information improves the robustness and recognition accuracy of the model. The position parameters take the form of a 2 × 3 affine transformation, and the local regions are obtained by cropping the feature map:
Figure BDA0003262581570000031
Figure BDA0003262581570000032
where $A_\theta$ and $A_\eta$ respectively denote the unknown parameters of the two groups of localization networks, and the image is cropped locally using the predicted parameters a, b, c, and d.
According to the predicted parameters, the feature map is cropped into local sampling grids of different positions and scales that follow the spatial position of the pedestrian in the image. The grids are generated as
Figure BDA0003262581570000033
where, for each spatial location parameterized by the localization network,
Figure BDA0003262581570000034
representing the spatial location coordinates of the input,
Figure BDA0003262581570000035
representing the spatial location coordinates of the output. Two groups of local sampling grid parameters, one per division mode, are thus obtained, and a sampler uses them to sample the same image, yielding three local regions whose heights are in a ratio close to 1:2:3 and six local regions, respectively. Because different local regions contain feature information of different parts of the pedestrian, each local region is fed into a simple convolutional neural network and encoded into a local feature. This convolutional neural network consists of an adaptive average pooling function, a convolutional layer, a BN layer, and two fully connected layers. The adaptive average pooling function ensures that the region fed to the convolutional layer is a local feature map of fixed size 2048 × 1, after which the convolutional layer, the BN layer, and the two fully connected layers extract features from each local region. In addition, concatenating the local features yields the concatenated feature of the image, which gives the model the overall important information about the pedestrian and reduces inaccurate matching caused by similar local features of different pedestrians.
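The grid generation described above can be sketched as follows, assuming the usual spatial-transformer convention in which normalized output coordinates in [-1, 1] are mapped to input coordinates by a 2 × 3 affine matrix. The crop-only matrix layout, the name `affine_grid`, and the example values are illustrative assumptions and are not taken from the patent's equation images.

```python
def affine_grid(theta, out_h, out_w):
    """Map every output pixel to source coordinates via a 2x3 affine matrix.

    Coordinates are normalized to [-1, 1] on both axes. Returns a nested
    list grid[i][j] = (x_source, y_source).
    """
    (a11, a12, a13), (a21, a22, a23) = theta
    grid = []
    for i in range(out_h):
        # Normalized target y; a single row maps to the center (0.0).
        y_t = (-1.0 + 2.0 * i / (out_h - 1)) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = (-1.0 + 2.0 * j / (out_w - 1)) if out_w > 1 else 0.0
            x_s = a11 * x_t + a12 * y_t + a13
            y_s = a21 * x_t + a22 * y_t + a23
            row.append((x_s, y_s))
        grid.append(row)
    return grid


# Hypothetical crop of the top sixth of the map: full width, one sixth of
# the height, centered at y = -5/6 in normalized coordinates.
theta_head = ((1.0, 0.0, 0.0), (0.0, 1.0 / 6.0, -5.0 / 6.0))
grid = affine_grid(theta_head, out_h=3, out_w=2)
for row in grid:
    print(row)
```

A sampler (for example, bilinear interpolation) would then read the feature map at these source coordinates; that step is omitted here.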
2. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising local feature discriminant learning;
Through the two local-region division modes of the joint local feature extraction network, local features of different positions and scales can be extracted from the same image. These local features must then be compared against the other unlabeled local features, which is very difficult in deep learning based on mini-batch optimization, so a feature memory is used to store and update the unsupervised sample features. Using the Euclidean distance as the metric, the similarity between each local feature of the input image
Figure BDA0003262581570000036
and the features at similar positions in other images is evaluated to perform local feature learning, where
Figure BDA0003262581570000037
denotes the m-th local feature of the i-th image. The feature memory
Figure BDA0003262581570000038
is updated by using the similarity of sample features as supervisory information to assist clustering: for each training sample feature, the nearest similar feature is found, and the feature is updated according to whether the pseudo-label classes of the two agree. The dynamic update process is:
Figure BDA0003262581570000039
where
Figure BDA00032625815700000310
is the update rate of
Figure BDA00032625815700000311
and is set to 0.1,
Figure BDA00032625815700000312
is the latest updated local feature, and P is the training epoch. When P is 0,
Figure BDA00032625815700000313
is the feature library initialized from the unlabeled dataset before training. Through repeated updates, the feature in memory
Figure BDA00032625815700000314
comes arbitrarily close to
Figure BDA00032625815700000315
over training. We compute the Euclidean distance between each local feature
Figure BDA0003262581570000041
and the sample features in the feature memory, find the K local features nearest to
Figure BDA0003262581570000042
which form the set
Figure BDA0003262581570000043
and then calculate the sum of the similarities between these K local features and
Figure BDA0003262581570000044
as well as the sum of the similarities between
Figure BDA0003262581570000045
and all m-th local features in the feature memory, which gives the Local Feature Discrimination (LFD) loss function:
Figure BDA0003262581570000046
Figure BDA0003262581570000047
where M denotes the number of local features into which the image is divided, here 3 and 6, and $\|\cdot\|_2$ denotes the Euclidean distance.
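A minimal sketch of a feature-memory update with rate 0.1 follows. It assumes the update is an exponential moving average; the exact formula sits in an equation image that is not reproduced here, so this form is an assumption. The pseudo-label consistency check is omitted, and `update_memory` is a hypothetical name.

```python
def update_memory(memory, index, feature, rate=0.1):
    """Move the stored feature at `index` a fraction `rate` toward `feature`.

    Assumed form: memory <- (1 - rate) * memory + rate * feature.
    Repeated calls with the same `feature` bring the stored vector
    arbitrarily close to it.
    """
    old = memory[index]
    memory[index] = [(1.0 - rate) * o + rate * f for o, f in zip(old, feature)]
    return memory[index]


# Hypothetical two-dimensional features for one memory slot.
memory = [[0.0, 1.0]]
latest = [1.0, 0.0]
for epoch in range(50):
    update_memory(memory, 0, latest)
print(memory[0])  # close to `latest` after many updates
```

A small rate keeps the memory stable between epochs while still letting it track the features the network currently produces.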
3. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising cascade feature discriminant learning;
When the data have no class labels, clustering local features that look similar but carry different identity information easily separates the local features of the same pedestrian and pulls together those of different pedestrians, so that the local features of the same pedestrian cannot be matched and the model's ability to learn local features declines. To make the sample features learned by the model more robust, a Cascade Feature Discrimination (CFD) loss function is therefore used to optimize the model. The local features output for an unlabeled image are concatenated to obtain the sum of all local feature information of the image, and the discriminability of the concatenated local features is learned by maximizing the inter-class distance and minimizing the intra-class distance. Learning sample features from the hardest positive sample and the hardest negative samples strengthens the model's feature learning ability and improves its robustness and accuracy, so a corresponding quadruplet loss function is proposed to guide the learning of the cascade features. Because this loss uses the hardest positive sample and the hardest negative sample pair, these must be found for each sample, and they are found in different ways.
First, a mini-batch of samples is given:
Figure BDA0003262581570000048
For each input image $X_i$, a series of simple random transformations, including image cropping and changes in contrast, saturation, and brightness, produces a pseudo-positive sample $X_{pi}$. The pseudo-positive sample is assigned the same identity label as the input image and serves as its hardest positive sample. All image samples are fed into the network, and the randomly generated pseudo-positive samples benefit the feature discriminant learning of the unsupervised model. Next, samples that are not nearest neighbors of each other are dissimilar in identity and do not belong to the same class; whether two samples are nearest neighbors can be determined from their similarity, and the hardest negative sample pair is determined from the ranked similarity results. The Euclidean distance between every pair of samples is therefore measured, and for each sample $X_i$ the results are sorted into a ranking list $N_i$. The farther a sample $X_j$ is from $X_i$, the lower its similarity to $X_i$; if $X_j$ is not among the top-$n$ nearest neighbors of $X_i$, it is regarded as a negative sample of $X_i$. To mine the hardest negative pair, the first two negative samples in the ranking list, $x_{mi}$ and $x_{ni}$, are selected, with $x_{mi}$ ranked before $x_{ni}$. Finally, the model performs feature learning on the four samples obtained, and its Cascade Feature Discrimination (CFD) loss function is expressed as:
$$L_{CFD} = \left(\|x_{ai} - x_{pi}\|_2 - \|x_{ai} - x_{mi}\|_2 + \alpha\right)_+ + \left(\|x_{ai} - x_{pi}\|_2 - \|x_{mi} - x_{ni}\|_2 + \beta\right)_+$$
where $x_{ai}$ denotes the input image, $x_{pi}$ the pseudo-positive sample, and $x_{mi}$, $x_{ni}$ the hardest negative sample pair; $(\cdot)_+$ denotes $\max(\cdot, 0)$, and the parameters $\alpha$ and $\beta$ are thresholds.
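The four-sample hinge loss described here can be sketched as below. Following the definitions of the symbols, the anchor in both terms is taken to be the input image feature. The default values of `alpha` and `beta` are hypothetical, since the text does not state them, and `cfd_loss` is an illustrative name.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cfd_loss(x_a, x_p, x_m, x_n, alpha=0.3, beta=0.1):
    """Quadruplet hinge loss on one input, one pseudo-positive, two negatives.

    First term: the positive should be closer to the input than the hardest
    negative, by a margin alpha (relative distance).
    Second term: the positive distance should be smaller than the distance
    between the two negatives, by a margin beta (absolute distance).
    """
    d_ap = euclidean(x_a, x_p)
    term_relative = max(d_ap - euclidean(x_a, x_m) + alpha, 0.0)
    term_absolute = max(d_ap - euclidean(x_m, x_n) + beta, 0.0)
    return term_relative + term_absolute


# Well-separated negatives give zero loss; crowded negatives are penalized.
print(cfd_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.0, 1.0]))
print(cfd_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]))
```

In the first call both hinge terms are inactive; in the second, both margins are violated and contribute to the loss.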
4. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising feature joint discriminant learning;
The local feature discrimination loss function mainly performs discriminant learning on each local feature in the image, while the cascade feature discrimination loss function mainly learns the discriminability of all local features of each image taken together. Combining the two loss functions in a reasonable way gives the feature joint discrimination loss function, which improves the ability of the pedestrian re-identification model to learn features from unlabeled data. The Feature Joint Discrimination (FJD) loss function is expressed as:
Figure BDA0003262581570000051
where λ represents a weight.
Drawings
FIG. 1 shows the experimental results for parameter K on different datasets.
FIG. 2 shows the experimental results for parameter n on different datasets.
FIG. 3 shows the results for different values of parameter λ on the Market-1501 dataset.
FIG. 4 shows the results for different values of parameter λ on the DukeMTMC-reID dataset.
FIG. 5 shows the structure of the method's model.
Detailed Description
The invention comprises the following technical scheme:
1. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising a joint local feature extraction network;
To enable the network to extract local features of different regions according to changes in pedestrian posture and viewing angle in the image, and to reduce the problem that local features cannot be well aligned, two parallel spatial transformers are added after the ResNet50 network to divide the local regions, and simple convolutional neural networks are used for feature extraction. The intermediate feature map of the image is fed into several localization networks; applying the spatial transformation to the feature map rather than to the original image reduces the computational complexity of the network. Each localization network consists of one convolutional layer with a 3 × 3 kernel and two fully connected layers, uses ReLU as the activation function, and has its last fully connected layer initialized with a bias. To obtain local sampling grids for the two division modes, the localization networks predict two groups of spatial position parameters, $\theta = \{\theta_1, \theta_2, \ldots, \theta_M\}$ and $\eta = \{\eta_1, \eta_2, \ldots, \eta_N\}$.
To ensure that the predicted spatial positions yield effective fine-grained features and that the features are aligned in space, the position parameters of the two spatial transformers are predicted according to the parts of the human body along the vertical direction. The human body can be divided into three parts: the head, the upper body, and the lower body. In general the upper body is shorter than the lower body, so the head occupies the smallest share, followed by the upper body and then the lower body. Under a camera, however, changes in camera angle and pedestrian posture alter the proportions of the pedestrian in the captured image; if, for example, the upper body appears long and the lower body short, the local regions divided for the same pedestrian cannot be aligned.
The localization network first predicts three groups of spatial position parameters that divide the pedestrian horizontally in unequal proportions, giving three local regions whose heights are in a ratio close to 1:2:3. Local features are obtained from top to bottom: the head region occupies the smallest share, the middle region is close to the size of the upper body at a normal viewing angle, and the last region contains the feature information below the pedestrian's hips. Different local regions can thus be divided in image space according to how the pedestrian actually appears in the image, which addresses the failure of local features to align when the pedestrian's proportions, posture, and so on change.
In addition, the upper and lower body of a pedestrian contain more detailed information and can be further divided into parts such as the chest, abdomen, thighs, calves, and feet. Different parts may carry different feature information, and the features obtained from a finer division of local regions allow the model to better extract fine-grained features from the image. Therefore, considering the several parts that make up the human body, six groups of spatial position parameters are predicted to divide six local regions from the image, which makes it easier for the network to mine more effective fine-grained feature information from the pedestrian's local regions. Combining the two kinds of local feature information improves the robustness and recognition accuracy of the model. The position parameters take the form of a 2 × 3 affine transformation, and the local regions are obtained by cropping the feature map:
Figure BDA0003262581570000061
Figure BDA0003262581570000062
where $A_\theta$ and $A_\eta$ respectively denote the unknown parameters of the two groups of localization networks, and the image is cropped locally using the predicted parameters a, b, c, and d.
According to the predicted parameters, the feature map is cropped into local sampling grids of different positions and scales that follow the spatial position of the pedestrian in the image. The grids are generated as
Figure BDA0003262581570000063
where, for each spatial location parameterized by the localization network,
Figure BDA0003262581570000064
representing the spatial location coordinates of the input,
Figure BDA0003262581570000065
representing the spatial location coordinates of the output. Two groups of local sampling grid parameters, one per division mode, are thus obtained, and a sampler uses them to sample the same image, yielding three local regions whose heights are in a ratio close to 1:2:3 and six local regions, respectively. Because different local regions contain feature information of different parts of the pedestrian, each local region is fed into a simple convolutional neural network and encoded into a local feature. This convolutional neural network consists of an adaptive average pooling function, a convolutional layer, a BN layer, and two fully connected layers. The adaptive average pooling function ensures that the region fed to the convolutional layer is a local feature map of fixed size 2048 × 1, after which the convolutional layer, the BN layer, and the two fully connected layers extract features from each local region. In addition, concatenating the local features yields the concatenated feature of the image, which gives the model the overall important information about the pedestrian and reduces inaccurate matching caused by similar local features of different pedestrians.
2. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising local feature discriminant learning;
Through the two local-region division modes of the joint local feature extraction network, local features of different positions and scales can be extracted from the same image. These local features must then be compared against the other unlabeled local features, which is very difficult in deep learning based on mini-batch optimization, so a feature memory is used to store and update the unsupervised sample features. Using the Euclidean distance as the metric, the similarity between each local feature of the input image
Figure BDA0003262581570000071
and the features at similar positions in other images is evaluated to perform local feature learning, where
Figure BDA0003262581570000072
denotes the m-th local feature of the i-th image. The feature memory
Figure BDA0003262581570000073
is updated by using the similarity of sample features as supervisory information to assist clustering: for each training sample feature, the nearest similar feature is found, and the feature is updated according to whether the pseudo-label classes of the two agree. The dynamic update process is:
Figure BDA0003262581570000074
where
Figure BDA0003262581570000075
is the update rate of
Figure BDA0003262581570000076
and is set to 0.1,
Figure BDA0003262581570000077
is the latest updated local feature, and P is the training epoch. When P is 0,
Figure BDA0003262581570000078
is the feature library initialized from the unlabeled dataset before training. Through repeated updates, the feature in memory
Figure BDA0003262581570000079
comes arbitrarily close to
Figure BDA00032625815700000710
over training. We compute the Euclidean distance between each local feature
Figure BDA00032625815700000711
and the sample features in the feature memory, find the K local features nearest to
Figure BDA00032625815700000712
which form the set
Figure BDA00032625815700000713
and then calculate the sum of the similarities between these K local features and
Figure BDA00032625815700000714
as well as the sum of the similarities between
Figure BDA00032625815700000715
and all m-th local features in the feature memory, which gives the Local Feature Discrimination (LFD) loss function:
Figure BDA00032625815700000716
Figure BDA00032625815700000717
where M denotes the number of local features into which the image is divided, here 3 and 6, and $\|\cdot\|_2$ denotes the Euclidean distance.
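The retrieval of the K nearest local features from the feature memory can be sketched as follows. The sketch covers only the neighbor search by Euclidean distance, not the loss itself; whether a query's own memory slot is excluded is not stated in the text, so it is not excluded here, and `k_nearest` is a hypothetical name.

```python
import math


def k_nearest(query, memory, k):
    """Return indices of the k memory features closest to `query`.

    Distances are Euclidean; ties are broken by the lower memory index.
    """
    ranked = sorted(
        (math.dist(query, feat), idx) for idx, feat in enumerate(memory)
    )
    return [idx for _, idx in ranked[:k]]


# Hypothetical memory holding the m-th local feature of four images.
memory_m = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
print(k_nearest([0.1, 0.0], memory_m, k=2))  # [0, 1]
```

The similarities to these K neighbors and to all entries of the same part index would then feed the loss.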
3. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: judging and learning cascade features;
under the condition that data does not have class labels, when local features which are similar in appearance but contain different identity information are clustered, the problems that the local features of the same pedestrian are separated easily and the local features of different pedestrians are drawn close easily occur in the extracted local features, so that the local features of the same pedestrian cannot be registered and the learning capability of the local features of the model is reduced, therefore, in order to improve the feature robustness of a model learning sample, a cascade feature discrimination loss function (CFD) optimization model is adopted, the sum of all local feature information of an image is obtained by connecting the local features output by unlabeled images, the discrimination of the local cascade features is learned by maximizing the inter-class distance and minimizing the intra-class distance, and learning the feature of the sample by using the hardest positive sample and the hardest negative sample of the sample is beneficial to enhancing the feature learning capability of the model, the robustness and the accuracy of the model are improved, so that a corresponding quaternary loss function is provided to guide the learning of the cascade characteristics, and the most difficult positive sample and the most difficult negative sample are used in the cascade characteristic discrimination loss function, so that the learning of the characteristics by the model is improved, and therefore, the most difficult positive sample and the most difficult negative sample pair of the samples need to be found in different modes;
first, a small sample batch is given
Figure BDA0003262581570000081
For the input image XiPerforming a series of simple random transformations including image cropping, contrast, saturation and brightness to obtain pseudo-positive sample X by image processingpiThe same as the marked identity label and the input image are used as the most difficult positive samples, all the image samples are sent to the network for experiment, and the marked identity label and the input image are usedThe machine-generated false positive sample is beneficial to feature discriminant learning of an unsupervised model; then, if they are not nearest neighbors, the identities of the samples are not similar and do not belong to the same class, whether the samples are nearest neighbors can be determined through the similarity between the samples, and the hardest negative sample pair is determined by using the cyclic ordering similarity result, so that the Euclidean distance between every two samples is measured according to the Euclidean distance to obtain each sample XiGenerates an ordered list N of the measurement resultsiSorting by measurement results, if sample XjThe farther the distance is from the sample XiThe lower the similarity of (A) is, not XiSo that X can be identified as the nearest neighbor top-njIs XiIn order to mine the most difficult negative pairs, the first two negative x samples in the ranking list are selectedmiAnd xniAs the most difficult negative sample pair, where xmiRank at xniBefore; and finally, performing feature learning on the model through the obtained four samples, wherein a Cascade Feature Discrimination (CFD) loss function of the model is expressed as:
$$L_{CFD} = \left(\|x_{ai}-x_{pi}\|_2 - \|x_{ai}-x_{mi}\|_2 + \alpha\right)_+ + \left(\|x_{ai}-x_{pi}\|_2 - \|x_{mi}-x_{ni}\|_2 + \beta\right)_+$$
wherein $x_{ai}$ represents the input image, $x_{pi}$ represents the pseudo-positive sample, $x_{mi}$ and $x_{ni}$ represent the hardest negative sample pair, $(\cdot)_+$ denotes $\max(\cdot, 0)$, and the parameters $\alpha$ and $\beta$ are thresholds.
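The negative mining and the CFD loss described above can be sketched in a few lines. This is a minimal illustration on plain Python sequences, not the implementation of the method: the function names are hypothetical, the features are assumed to be already extracted concatenated vectors, and the margins α and β are left as arguments because no values are stated here.

```python
import math

def hinge(z):
    """(z)_+ : keep positive values, clamp negative values to zero."""
    return max(z, 0.0)

def mine_hardest_negatives(features, i, top_n):
    """Return the indices of the two samples closest to sample i
    that fall outside its top-n nearest neighbours (hypothetical helper)."""
    others = [j for j in range(len(features)) if j != i]
    # Ranking list N_i: the other samples sorted by ascending Euclidean distance to sample i.
    ranking = sorted(others, key=lambda j: math.dist(features[i], features[j]))
    negatives = ranking[top_n:]          # samples beyond the top-n nearest neighbours
    if len(negatives) < 2:
        raise ValueError("batch too small to supply two negative samples")
    return negatives[0], negatives[1]    # x_mi is ranked before x_ni

def cfd_loss(x_a, x_p, x_m, x_n, alpha, beta):
    """Quadruplet-style loss over anchor, pseudo-positive and two negatives."""
    d_ap = math.dist(x_a, x_p)           # anchor to pseudo-positive
    d_am = math.dist(x_a, x_m)           # anchor to first negative
    d_mn = math.dist(x_m, x_n)           # between the two negatives
    return hinge(d_ap - d_am + alpha) + hinge(d_ap - d_mn + beta)
```

The first term pushes the anchor farther from its closest negative than from its pseudo-positive by at least α; the second term requires the two negatives to be at least β farther apart than the anchor and its pseudo-positive.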
4. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: feature joint discrimination learning;
the local feature discrimination loss function mainly performs discriminant learning on each individual local feature of an image, whereas the cascade (spliced) feature discrimination loss function mainly learns the discriminability of all local features of each image taken together; the two loss functions are therefore combined into a feature joint discrimination loss function, which improves the ability of the pedestrian re-identification model to learn features from unlabeled data, and the feature joint discrimination (FJD) loss function is expressed as:
Figure BDA0003262581570000082
where λ represents a weight.
Results and analysis of the experiments
To verify the effect of the joint local feature extraction network on the alignment of pedestrian local features, the network was evaluated on the DukeMTMC-reID and Market-1501 datasets, using mean average precision (mAP) and the top-k matching rate as evaluation indices. When the two local feature branches are combined, the mAP and top-1 values of the model are better than those obtained with either local feature branch alone. The joint local feature discriminant learning method extracts local features at different scales and positions from the same pedestrian image, so the network captures effective local feature information of pedestrians in finer detail, which improves the accuracy of local feature alignment and comparison and, in turn, the ability of the unsupervised pedestrian re-identification model to learn local features. The experimental results are shown in the following tables.
Table 1. Effect of the local feature branches on model performance on the Market-1501 dataset
Figure BDA0003262581570000091
Table 2. Effect of the local feature branches on model performance on the DukeMTMC-reID dataset
Figure BDA0003262581570000092
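For reference, the two evaluation indices used above, mAP and the top-k matching rate, can be sketched as follows. This is a minimal illustration, not the evaluation code behind the tables: the function names and the boolean-list input format are assumptions, and the usual re-identification protocol step of discarding gallery images taken by the same camera as the query is omitted.

```python
def average_precision(ranked_matches):
    """AP for one query. ranked_matches[r] is True when the gallery image
    at rank r (most similar first) shares the query identity."""
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank   # precision at each correct match
    return precision_sum / hits if hits else 0.0

def top_k_hit(ranked_matches, k):
    """True when at least one correct match appears in the first k results."""
    return any(ranked_matches[:k])

def evaluate(all_ranked_matches, ks=(1, 5)):
    """Mean AP and top-k matching rates averaged over all queries."""
    n = len(all_ranked_matches)
    m_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    top_k = {k: sum(top_k_hit(r, k) for r in all_ranked_matches) / n for k in ks}
    return m_ap, top_k
```

Here each query contributes one ranked list of gallery results; mAP rewards placing every correct match early, while top-k only checks whether any correct match appears among the first k results.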
Unlike the unsupervised local feature method PAUL, the present method uses joint local feature branches divided according to the structure of human body parts, so that local features containing more effective feature information are obtained for local feature comparison. In addition, the local features are learned with the FJD loss function, which has better discriminant learning capability. As the tables show, the mAP and top-1 results are higher than those of PAUL on both datasets: on the Market-1501 dataset, the mAP value increases to 41.4% and the top-1 value to 70.2%; on the DukeMTMC-reID dataset, the mAP value increases to 54.1% and the top-1 value to 73.9%.
Table 3. Comparison with the latest methods on the Market-1501 dataset
Figure BDA0003262581570000101
Table 4. Comparison with the latest methods on the DukeMTMC-reID dataset
Figure BDA0003262581570000102
To analyze how the local feature discrimination (LFD) loss function and the cascade feature discrimination (CFD) loss function affect the discriminability of the local features learned by the model, the result of the pre-trained network JLFEN is taken as the experimental baseline, and experiments are carried out on the Market-1501 and DukeMTMC-reID datasets. The results show that the mAP and the top-k matching rate improve markedly after learning with the LFD loss function, and that they also improve after the CFD loss function is used, which indicates that the CFD loss function effectively discriminates between the local features of different pedestrians with similar appearance. Finally, when the two loss functions are combined, the mAP and top-k results are clearly better than those of the other two configurations: on the Market-1501 dataset, the mAP value improves by 8.7%, the top-1 value by 8.3%, the top-5 value by 6.3% and the top-15 value by 5.5%; on the DukeMTMC-reID dataset, the mAP value improves by 7.2%, the top-1 value by 7.5%, the top-5 value by 2.8% and the top-15 value by 1.7%.
Table 5. Ablation results for the loss functions on the Market-1501 dataset
Figure BDA0003262581570000111
Table 6. Ablation results for the loss functions on the DukeMTMC-reID dataset
Figure BDA0003262581570000112
The parameter values in the loss functions also influence model performance. Experiments on the different datasets are used to analyze the effect on model performance of the weight parameter λ in the FJD loss function, the parameter K in the LFD loss function, and the parameter n in the CFD loss function. Each parameter is studied in turn while the other parameters are held fixed, and the first hit rate top-1 is used as the evaluation index. The top-1 value was found to be relatively good when K = 15.
The CFD loss function also plays an important role in the discriminability of the features learned by the model. Finding better hardest negative samples effectively improves feature learning and strengthens the robustness of the unsupervised model, so the effect of the parameter n in the nearest-neighbor top-n on the cascade feature discrimination loss function is analyzed experimentally. Because CFD loss learning requires two negative samples, when n is too small the obtained samples are insufficient and the loss function mistakenly learns positive samples as negative samples, which degrades the learning performance of the model. As n gradually increases, model performance first rises and then falls: when n is large, enough samples are available that the model finds the hardest samples easily, which lowers the learning difficulty of the model. The model performs well when n = 6.
Based on the above results, the parameters are fixed at K = 15 and n = 6 as controlled variables, and the parameter λ is analyzed experimentally. As Fig. 3 shows, on the different datasets the mAP and the first hit rate top-1 first increase and then decrease as λ gradually increases, and the model performs best at λ = 2.

Claims (4)

1. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: joint local feature extraction network;
to enable the network to extract local features of different regions as the posture and viewing angle of pedestrians in the image change, and to mitigate the problem that local features cannot be well aligned, two parallel spatial transformers are added after the ResNet50 network to divide the local regions, and a simple convolutional neural network is used for feature extraction; the intermediate feature map of the image is sent into a plurality of positioning networks, since applying the spatial transformation to the feature map rather than to the original image reduces the computational complexity of the network; each positioning network consists of one convolutional layer with a 3 × 3 kernel and two fully connected layers, ReLU is used as the activation function, and the last fully connected layer is initialized with a bias; to obtain local sampling grids for the two division modes, the positioning networks respectively predict two groups of spatial position parameters $\theta = \{\theta_1, \theta_2, \ldots, \theta_M\}$ and $\eta = \{\eta_1, \eta_2, \ldots, \eta_N\}$;
To ensure that the predicted spatial positions yield effective fine-grained features that are spatially aligned, the position parameters in the two spatial transformers are predicted according to the parts of the human body in the vertical direction; the human body can be divided into three parts, namely the head, the upper body and the lower body, where the head is generally the smallest, followed by the upper body and then the lower body; under a camera, however, changes in camera angle and pedestrian posture alter the proportions of the pedestrian in the captured image, for example making the upper body appear long and the lower body short, so that the local regions divided for the same pedestrian may fail to align;
the positioning network first predicts three groups of spatial position parameters that divide the pedestrian along the horizontal direction in unequal proportions, giving three local regions whose heights are in a ratio close to 1:2:3; local features are obtained from top to bottom, so that the head occupies the smallest proportion, the middle region is close to the size of the upper body at a normal viewing angle, and the last region contains the feature information below the pedestrian's hips;
in addition, the upper body and the lower body of a pedestrian contain more detailed information and can be further divided into parts such as the chest, abdomen, thighs, shanks and feet; different parts carry different feature information, and the feature information obtained from finer local region division helps the model extract fine-grained features from the image; therefore, in view of the several parts of the human body, six groups of spatial position parameters are predicted to divide the image into six local regions, so that the network can mine more effective fine-grained feature information from the local regions of the pedestrian; combining the two kinds of local feature information improves the robustness and recognition accuracy of the model; the position parameters take the form of a 2 × 3 affine transformation, and the local regions are obtained by cropping the feature map;
Figure FDA0003262581560000011
Figure FDA0003262581560000012
wherein $A_\theta$ and $A_\eta$ respectively represent the unknown parameters in the two groups of positioning networks, and the image is locally cropped using the predicted parameters a, b, c and d;
according to the predicted parameters, the feature map is cropped into local sampling grids of different positions and scales according to the spatial position of the pedestrian in the image, and the generation process is
Figure FDA0003262581560000013
Wherein, for each spatial position parameterized by the positioning network,
Figure FDA0003262581560000021
representing the spatial location coordinates of the input,
Figure FDA0003262581560000022
representing the spatial location coordinates of the output; finally, two groups of local sampling grid parameters for the different division modes are obtained and sampled with a sampler, yielding, from the same image, three local regions whose heights are in a ratio close to 1:2:3 and six local regions; because different local regions contain feature information of different parts of the pedestrian, each local region is sent into a simple convolutional neural network and encoded into a local feature; the convolutional neural network consists of an adaptive average pooling function, a convolutional layer, a BN layer and two fully connected layers, where the adaptive average pooling function ensures that the feature region input to the convolutional layer is a local feature map of fixed size 2048 × 1; features are then extracted from each local region through the convolutional layer, the BN layer and the two fully connected layers, and the local features are concatenated to obtain the spliced local feature information of the image, so that the model obtains the overall important information of the pedestrian in the image and the problem of inaccurate matching caused by similar local features of different pedestrians is solved.
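As an illustration of how a 2 × 3 affine parameter matrix produces a local sampling grid, the following sketch maps normalized output coordinates to input coordinates. It is only a sketch under stated assumptions: coordinates are normalized to [-1, 1], the parameters are fixed by hand for three horizontal strips with a height ratio of 1:2:3, whereas in the method they are predicted by the positioning networks, and the names `vertical_strip_theta` and `affine_grid` are hypothetical.

```python
def vertical_strip_theta(y0, y1):
    """2 x 3 affine parameters selecting the full-width strip between the
    height fractions y0 and y1 (0 = top, 1 = bottom). Hand-set, for illustration."""
    scale_y = y1 - y0                 # strip height relative to the full height
    shift_y = y0 + y1 - 1.0           # strip centre in [-1, 1] coordinates
    return ((1.0, 0.0, 0.0),
            (0.0, scale_y, shift_y))

def affine_grid(theta, out_h, out_w):
    """For every output cell centre (x_t, y_t), compute the input location
    (x_s, y_s) = theta * (x_t, y_t, 1)^T at which the feature map is sampled."""
    (a11, a12, a13), (a21, a22, a23) = theta
    grid = []
    for r in range(out_h):
        y_t = -1.0 + (2.0 * r + 1.0) / out_h      # normalized row centre
        row = []
        for c in range(out_w):
            x_t = -1.0 + (2.0 * c + 1.0) / out_w  # normalized column centre
            x_s = a11 * x_t + a12 * y_t + a13
            y_s = a21 * x_t + a22 * y_t + a23
            row.append((x_s, y_s))
        grid.append(row)
    return grid

# Three strips with heights in the ratio 1:2:3 (boundaries at 0, 1/6, 1/2 and 1).
bounds = [0.0, 1.0 / 6.0, 1.0 / 2.0, 1.0]
strip_grids = [affine_grid(vertical_strip_theta(bounds[k], bounds[k + 1]), 4, 2)
               for k in range(3)]
```

A sampler would then read the feature map at each `(x_s, y_s)` location, for example by bilinear interpolation, to produce the cropped local region.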
2. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: local feature discrimination learning;
through the two local region division modes in the joint local feature extraction network, local features of different positions and scales are extracted from the same image; because comparing unlabeled local features across the dataset is difficult in deep learning based on mini-batch optimization, a feature memory is used to store and update the unsupervised sample features; according to the Euclidean distance metric, each local feature of the input image
Figure FDA0003262581560000023
is compared for similarity with the features at similar positions in other images to perform local feature learning, where
Figure FDA0003262581560000024
denotes the m-th local feature of the i-th image; the feature memory
Figure FDA0003262581560000025
is updated using the similarity of sample features as supervisory information to assist clustering: the sample features are trained to find their nearest similar features, and the features are updated according to whether the pseudo-label categories are consistent; the dynamic updating process is:
Figure FDA0003262581560000026
wherein
Figure FDA0003262581560000027
is the update rate of
Figure FDA0003262581560000028
and is set to 0.1,
Figure FDA0003262581560000029
is the updated latest local feature, and P is the training epoch; when P = 0,
Figure FDA00032625815600000210
is the feature library initialized from the unlabeled database before training; through updating, the feature in memory
Figure FDA00032625815600000211
comes arbitrarily close to
Figure FDA00032625815600000212
Next, for each local feature
Figure FDA00032625815600000213
the Euclidean distances to the sample features in the feature memory are computed, and the K local features nearest to
Figure FDA00032625815600000214
are found to obtain the set
Figure FDA00032625815600000215
Then the sum of the similarities between these K local features and the feature
Figure FDA00032625815600000216
is calculated, together with the similarities between
Figure FDA00032625815600000217
and all m-th local features in the feature memory, to obtain the local feature discrimination (LFD) loss function:
Figure FDA00032625815600000218
Figure FDA00032625815600000219
wherein M represents the number of local features into which the image is divided (here 3 and 6), and $\|\cdot\|_2$ denotes the Euclidean distance.
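The retrieval of the K nearest local features from the feature memory, and a memory update with rate 0.1, can be sketched as follows. The update formula itself is given only in the formula image, so the moving-average form used here is an assumption chosen because it makes the stored feature approach the latest feature; the function names are hypothetical, and the LFD loss itself is not reproduced.

```python
import math

def k_nearest_in_memory(feature, memory, k):
    """Indices of the k memory entries closest to `feature` in Euclidean distance."""
    ranked = sorted(range(len(memory)), key=lambda j: math.dist(feature, memory[j]))
    return ranked[:k]

def update_memory(old, new, rate=0.1):
    """Assumed moving-average update: the stored feature moves a fraction
    `rate` of the way toward the latest feature on every call."""
    return [(1.0 - rate) * o + rate * n for o, n in zip(old, new)]

# Repeated updates drive the stored feature toward the latest feature.
slot = [0.0, 0.0]
latest = [1.0, 2.0]
for _ in range(200):
    slot = update_memory(slot, latest)
```

With a rate of 0.1 the gap between the stored and the latest feature shrinks by a factor of 0.9 per update, which is one way to realize the behaviour described above, in which the stored feature comes arbitrarily close to the latest one.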
3. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: cascade feature discrimination learning;
when the data have no class labels, clustering local features that look similar but carry different identity information tends to push apart the local features of the same pedestrian and pull together the local features of different pedestrians, so that the local features of the same pedestrian cannot be aligned and the model's ability to learn local features is reduced; to improve the robustness of the features learned by the model, a cascade feature discrimination (CFD) loss function is therefore used to optimize the model; the local features output for each unlabeled image are concatenated to obtain the sum of all local feature information of the image, and the discriminability of the concatenated local features is learned by maximizing the inter-class distance and minimizing the intra-class distance; because learning from the hardest positive sample and the hardest negative samples of each sample strengthens the feature learning ability of the model and improves its robustness and accuracy, a corresponding quadruplet loss function is provided to guide the learning of the cascade features, and the hardest positive sample and the hardest negative sample pair of each sample are found in different ways;
first, a small sample batch is given
Figure FDA0003262581560000031
For each input image $X_i$, a series of simple random transformations, including image cropping and contrast, saturation and brightness adjustment, is applied to obtain a pseudo-positive sample $X_{pi}$, which is assigned the same identity label as the input image and is used as the hardest positive sample; all image samples are sent to the network, and the randomly generated pseudo-positive samples benefit the feature discriminant learning of the unsupervised model; next, samples that are not nearest neighbors are regarded as dissimilar in identity and as not belonging to the same class; whether two samples are nearest neighbors is determined from their similarity, and the hardest negative sample pair is determined from the cyclically ranked similarity results; specifically, the Euclidean distance between every two samples is measured, and for each sample $X_i$ a list $N_i$ ordered by the measurement results is generated; the farther a sample $X_j$ is from $X_i$, the lower its similarity to $X_i$; if $X_j$ is not among the top-n nearest neighbors of $X_i$, $X_j$ is identified as a negative sample of $X_i$; to mine the hardest negative pair, the first two negative samples $x_{mi}$ and $x_{ni}$ in the ranking list are selected as the hardest negative sample pair, where $x_{mi}$ ranks before $x_{ni}$; finally, the model performs feature learning with the four samples thus obtained, and its cascade feature discrimination (CFD) loss function is expressed as:
$$L_{CFD} = \left(\|x_{ai}-x_{pi}\|_2 - \|x_{ai}-x_{mi}\|_2 + \alpha\right)_+ + \left(\|x_{ai}-x_{pi}\|_2 - \|x_{mi}-x_{ni}\|_2 + \beta\right)_+$$
wherein $x_{ai}$ represents the input image, $x_{pi}$ represents the pseudo-positive sample, $x_{mi}$ and $x_{ni}$ represent the hardest negative sample pair, $(\cdot)_+$ denotes $\max(\cdot, 0)$, and the parameters $\alpha$ and $\beta$ are thresholds.
4. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: feature joint discrimination learning;
the local feature discrimination loss function mainly performs discriminant learning on each individual local feature of an image, whereas the cascade (spliced) feature discrimination loss function mainly learns the discriminability of all local features of each image taken together; the two loss functions are therefore combined into a feature joint discrimination loss function, which improves the ability of the pedestrian re-identification model to learn features from unlabeled data, and the feature joint discrimination (FJD) loss function is expressed as:
Figure FDA0003262581560000041
where λ represents a weight.
CN202111076953.2A 2021-09-14 2021-09-14 Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning Pending CN113936246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111076953.2A CN113936246A (en) 2021-09-14 2021-09-14 Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111076953.2A CN113936246A (en) 2021-09-14 2021-09-14 Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning

Publications (1)

Publication Number Publication Date
CN113936246A true CN113936246A (en) 2022-01-14

Family

ID=79275732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111076953.2A Pending CN113936246A (en) 2021-09-14 2021-09-14 Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning

Country Status (1)

Country Link
CN (1) CN113936246A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310648A (en) * 2023-03-23 2023-06-23 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310648A (en) * 2023-03-23 2023-06-23 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN116310648B (en) * 2023-03-23 2023-12-12 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
CN112101150B (en) Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
CN105956560B (en) A kind of model recognizing method based on the multiple dimensioned depth convolution feature of pondization
Senior A combination fingerprint classifier
CN102682302B (en) Human body posture identification method based on multi-characteristic fusion of key frame
CN110197502B (en) Multi-target tracking method and system based on identity re-identification
CN109165540B (en) Pedestrian searching method and device based on prior candidate box selection strategy
CN108595636A (en) The image search method of cartographical sketching based on depth cross-module state correlation study
CN106257498A (en) Zinc flotation work condition state division methods based on isomery textural characteristics
CN105975932A (en) Gait recognition and classification method based on time sequence shapelet
CN106778501A (en) Video human face ONLINE RECOGNITION method based on compression tracking with IHDR incremental learnings
CN110334628B (en) Outdoor monocular image depth estimation method based on structured random forest
CN113706547B (en) Unsupervised domain adaptive semantic segmentation method based on category dissimilarity guidance
CN112767447A (en) Time-sensitive single-target tracking method based on depth Hough optimization voting, storage medium and terminal
CN111814705B (en) Pedestrian re-identification method based on batch blocking shielding network
CN113486902A (en) Three-dimensional point cloud classification algorithm automatic selection method based on meta-learning
CN115527269A (en) Intelligent human body posture image identification method and system
CN113936246A (en) Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
CN112084353A (en) Bag-of-words model method for rapid landmark-convolution feature matching
CN116935411A (en) Radical-level ancient character recognition method based on character decomposition and reconstruction
CN112465054B (en) FCN-based multivariate time series data classification method
CN113887509B (en) Rapid multi-modal video face recognition method based on image set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220114